Summary
Technological advances enable multiplexed, spatially resolved profiling of RNA and protein expression in individual cells, thereby capturing molecular variation in physiological contexts. While these methods are increasingly accessible, computational approaches for studying the interplay between the spatial structure of tissues and cell-cell heterogeneity are only beginning to emerge. Here, we present spatial variance component analysis (SVCA), a computational framework for the analysis of spatial molecular data. SVCA enables quantifying different dimensions of spatial variation and, in particular, quantifies the effect of cell-cell interactions on gene expression. In a breast cancer Imaging Mass Cytometry dataset, our model yields interpretable spatial variance signatures, which reveal cell-cell interactions as a major driver of protein expression heterogeneity. Applied to high-dimensional imaging-derived RNA data, SVCA identifies plausible gene families that are linked to cell-cell interactions. SVCA is available as a free software tool that can be widely applied to spatial data from different technologies.
Keywords: Gaussian process, random effect model, multiplexed imaging

Highlights
• Statistical method to assess cell-cell interactions in spatial expression data
• Generally applicable to diverse data types and biological systems
• Illustrated on IMC data in human cancer and seqFISH data in mouse hippocampus
• Open-source software available on GitHub
Arnol et al. present a statistical method for analyzing single-cell expression data in a spatial context. The method identifies the sources of gene expression variability by decomposing it into different components, each attributable to a different source. These sources include aspects of spatial variation, in particular cell-cell interactions.
Introduction
Experimental advances enable assaying RNA and protein abundances of single cells in spatial contexts, thereby allowing the study of single-cell variations in tissue. Already, these technologies have delivered insights into the spatial structure of cell types in tissue and its effect on gene expression programs (Bodenmiller, 2016, Battich et al., 2013). These new dimensions of gene expression variation also have the potential to deliver biomarkers in health and disease (Bodenmiller, 2016).
Several technologies currently exist for spatially resolved expression profiling. Imaging Mass Cytometry (IMC) (Giesen et al., 2014, Chang et al., 2017) and Multiplexed Ion Beam Imaging (MIBI) (Angelo et al., 2014) rely on labeling proteins with antibodies coupled to metal isotopes of specific masses, followed by high-resolution tissue ablation and ionization. IMC currently allows up to 37 targeted proteins to be assayed at subcellular resolution. Alternative methods such as multiplex immunofluorescence (MxIF) and cyclic immunofluorescence (CycIF) use immunofluorescence to quantify dozens of protein markers in single cells (Gerdes et al., 2013, Lin et al., 2015). There are also rapidly evolving fluorescence-based technologies to measure single-cell RNA levels in spatial context (Strell et al., 2018). Among these, multiplexed error-robust fluorescence in situ hybridization (MERFISH) and sequential FISH (seqFISH) use a combinatorial approach of fluorescence-labeled small RNA probes to identify and localize single RNA molecules (Shah et al., 2017, Chen et al., 2015, Gerdes et al., 2013, Lin et al., 2015), which has dramatically increased the number of readouts (currently between 130 and 250). Even higher-dimensional expression profiles can be obtained from spatial expression profiling techniques such as spatial transcriptomics (Ståhl et al., 2016). However, these techniques currently do not offer single-cell resolution and are therefore not suitable for studying cell-to-cell variation.
The availability of spatially resolved expression profiles from a population of cells provides new opportunities to disentangle the sources of gene expression variation in a fine-grained manner. Spatial methods can be utilized to distinguish intrinsic sources of variation, such as the cell-cycle stages (Buettner et al., 2015, Scialdone et al., 2015), from sources of variation that relate to the spatial structure of the tissue, such as microenvironmental effects linked to the cell position (Fukumura, 2005), access to glucose or other metabolites (Meugnier et al., 2007, Lyssiotis and Kimmelman, 2017), or cell-cell interactions. To perform their function, proximal cells need to interact via direct molecular signals (Sieck, 2014), adhesion proteins (Franke, 2009), or other types of physical contacts (Varol et al., 2015). In addition, certain cell types such as immune cells may migrate to specific locations in a tissue to perform their function in tandem with local cells (Moreau et al., 2018). In the following we refer to cell-cell interactions as a general term regardless of the underlying mechanism, while more specific biological interpretations are discussed in the context of the specific biological use cases we present.
While intrinsic sources of variation have been extensively studied, cell-cell interactions are arguably less well explored, despite their importance for understanding tissue-level functions. Experimentally, the required spatial omics profiles can already be generated at high throughput, and hence there is an opportunity for computational methods that allow for identifying and quantifying the impact of cell-cell interactions.
Existing analysis approaches for spatial omics data can be broadly classified into two groups. On the one hand, there exist statistical tests to explore the relevance of the spatial position of cells for the expression profiles of individual genes (Svensson et al., 2018). Genes with distinct spatial expression patterns have also been used as markers to map cells from dissociated single-cell RNA sequencing (RNA-seq) to reconstructed spatial coordinates (Achim et al., 2015, Satija et al., 2015). However, these approaches do not consider cell-cell interactions.
On the other hand, there exist methods to test for qualitative patterns of cell-type organization. For example, recent methods designed for IMC datasets (Schapiro et al., 2017, Schulz et al., 2018) identify discrete cell types that co-occur in cellular neighborhoods more or less frequently than expected by chance. While these enrichment tests yield qualitative insights into interactions between cell types, they do not quantify the effect of cell-cell interactions on gene expression programs. Alternatively, regression-based models have been used to assess the effect of cell-cell interactions on the expression of individual genes, based on predefined features that capture specific aspects of the cell neighborhood (Goltsev et al., 2018, Battich et al., 2015). These models are conceptually closely related to our approach; however, they rely on a careful choice of relevant features and tend to require ad hoc discretization steps to define cell neighborhoods (see STAR Methods).
Here, we present spatial variance component analysis (SVCA), a computational framework based on Gaussian processes (Rasmussen and Williams, 2006), to model spatial sources of variation of individual genes. SVCA allows for decomposing gene expression variation into intrinsic effects, environmental effects, and, most importantly, an explicit cell-cell interaction component. In contrast to previous methods, our model directly uses the spatial coordinates and the gene expression profile of each cell as input, thereby avoiding the need to define discrete cell types and other microenvironmental variables.
We validate our model using simulated data, demonstrating the accuracy of the model and its robustness to technical sources of variation, including mis-segmentation. We then apply SVCA to two datasets from different technologies and biological domains: IMC proteomics data from human breast cancer tissue (Schapiro et al., 2017) and spatial single-cell RNA profiles from the mouse hippocampus generated using seqFISH (Shah et al., 2017). Across these domains, we find that the cell-cell interaction component in our model explains a major share of expression variability, thus facilitating the identification of biologically relevant genes and pathways that participate in cell-cell interactions.
Results
SVCA: A Statistical Framework for Decomposing Spatial and Non-spatial Sources of Variation
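SVCA builds upon the random effect framework to model the expression variation of individual genes as a function of additive components: intrinsic cell state effects, $\mathbf{i}$; an environmental effect linked to the cell position, $\mathbf{e}$; and an effect due to cell-cell interactions, $\mathbf{c}$:
$$\mathbf{y} = \mathbf{i} + \mathbf{e} + \mathbf{c} + \boldsymbol{\psi}.$$
Here, $\mathbf{y}$ denotes the vector of the expression levels of a gene of interest across all cells and $\boldsymbol{\psi} \sim \mathcal{N}(\mathbf{0}, \sigma_n^2 \mathbf{I})$ denotes Gaussian measurement noise. These random effects are assumed to follow multivariate normal distributions, defined by covariance matrices that are functions of the cell spatial positions and expression profiles: $\mathbf{i} \sim \mathcal{N}(\mathbf{0}, \mathbf{K}_{\mathrm{int}})$, where $\mathbf{K}_{\mathrm{int}}$ is a covariance matrix that quantifies the pairwise similarity of cells in terms of their intrinsic state; $\mathbf{e} \sim \mathcal{N}(\mathbf{0}, \mathbf{K}_{\mathrm{env}})$, where $\mathbf{K}_{\mathrm{env}}$ quantifies the similarity between the environmental context of cells based on their spatial proximity; and $\mathbf{c} \sim \mathcal{N}(\mathbf{0}, \mathbf{K}_{\mathrm{cc}})$, where $\mathbf{K}_{\mathrm{cc}}$ measures the similarity between the cellular neighborhoods of cells, thereby accounting for cell-cell interactions. Equivalently, this model can be expressed as a joint normal distribution with additive covariance terms (Figure 1B):
$$\mathbf{y} \sim \mathcal{N}\left(\mathbf{0},\; \mathbf{K}_{\mathrm{int}} + \mathbf{K}_{\mathrm{env}} + \mathbf{K}_{\mathrm{cc}} + \sigma_n^2 \mathbf{I}\right).$$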
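The equivalence between the additive random effects and the joint normal distribution can be checked numerically. The sketch below is a minimal illustration, assuming zero-mean expression and arbitrary positive semidefinite covariance matrices supplied by the caller; the function names `sample_svca` and `joint_covariance` are hypothetical and not part of the SVCA software. It draws the three random effects and the noise independently and compares the empirical covariance of their sum with the joint covariance.

```python
import numpy as np


def sample_svca(K_int, K_env, K_cc, noise_var, rng, n_samples=1):
    """Draw expression vectors y = i + e + c + psi from the additive model.

    Returns an array of shape (n_samples, n_cells).
    """
    n = K_int.shape[0]
    zeros = np.zeros(n)
    # Each random effect is an independent zero-mean multivariate normal draw.
    i = rng.multivariate_normal(zeros, K_int, size=n_samples)
    e = rng.multivariate_normal(zeros, K_env, size=n_samples)
    c = rng.multivariate_normal(zeros, K_cc, size=n_samples)
    # Independent Gaussian measurement noise with variance noise_var.
    psi = rng.normal(0.0, np.sqrt(noise_var), size=(n_samples, n))
    return i + e + c + psi


def joint_covariance(K_int, K_env, K_cc, noise_var):
    """Covariance of y in the equivalent joint normal formulation."""
    return K_int + K_env + K_cc + noise_var * np.eye(K_int.shape[0])


# Toy 3-cell example with hand-picked covariance matrices.
rng = np.random.default_rng(0)
K_int = np.array([[1.0, 0.5, 0.0], [0.5, 1.0, 0.2], [0.0, 0.2, 1.0]])
K_env = 0.5 * np.eye(3)
K_cc = 0.3 * np.ones((3, 3))
Y = sample_svca(K_int, K_env, K_cc, 0.2, rng, n_samples=20000)
empirical = np.cov(Y, rowvar=False)  # approaches joint_covariance(...) as n_samples grows
```

Because the components are independent, their covariances simply add, which is why fitting the joint model is equivalent to fitting the sum of random effects.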
Figure 1.
Spatial Variance Component Analysis (SVCA): A Framework for Decomposing Spatial and Non-spatial Sources of Variation
(A) SVCA decomposes the variability of individual genes into (1) cell intrinsic effects (due to differences in intrinsic cell type or state, blue); (2) general environmental effects that capture expression differences due to non-specific local factors (green); and (3) a cell-cell interaction effects that capture differences in expression level attributable to different cellular composition of a cell’s neighborhood (yellow).
(B) SVCA builds on the random effect framework to model additive contributions of these components. See Figure S1 and STAR Methods for details on the definition of the corresponding covariance terms.
(C) SVCA output: gene-level breakdown of the proportion of variance attributable to different components.
The intrinsic cell-state covariance $\mathbf{K}_{\mathrm{int}}$ is estimated based on the expression profiles of all genes except the focal gene:
$$K_{\mathrm{int}}(i,j) = \sigma_{\mathrm{int}}^2\, \mathbf{x}_i^{\top} \mathbf{x}_j.$$
Here, $\mathbf{x}_i$ and $\mathbf{x}_j$ denote the vectors of expression levels of all genes but the target gene in cells $i$ and $j$, and $\sigma_{\mathrm{int}}^2$ is a scaling parameter that is proportional to the variance explained by this covariance. The covariance for the environmental context is calculated based on the pairwise distances between all cells, $K_{\mathrm{env}}(i,j) = \sigma_{\mathrm{env}}^2 \exp\left(-d_{ij}^2 / (2 l^2)\right)$, where $d_{ij}$ denotes the physical distance between cells $i$ and $j$ and $l$ is a length-scale parameter. This component captures differences in the (local) environment or technical drift in the measurement process. The cell-cell interaction covariance term $\mathbf{K}_{\mathrm{cc}}$ quantifies the similarity of the cellular composition in the neighborhoods of cells. Borrowing concepts from social genetic effect studies (Baud et al., 2017), we define this covariance by aggregating, for each cell, the molecular composition of all other cells weighted by their distance, $\mathbf{K}_{\mathrm{cc}} = \sigma_{\mathrm{cc}}^2\, \mathbf{Z} \mathbf{X} \mathbf{X}^{\top} \mathbf{Z}^{\top}$, where the rows of $\mathbf{X}$ are the vectors $\mathbf{x}_i$. Here, $\mathbf{Z}$ is a matrix that defines the continuous neighborhood of each cell, weighted by an exponential decay with cell distance, $Z_{ij} = \exp\left(-d_{ij}^2 / (2 l^2)\right)$ for $i \neq j$ and $Z_{ii} = 0$. Finally, the noise term $\boldsymbol{\psi}$ captures the unexplained residual gene expression variation. Figure 1A provides a schematic overview of these different variance components in SVCA, and Figure S1 presents further details on the definition of the covariance terms used by the model (see also the STAR Methods).
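The three covariance terms can be assembled from an expression matrix and cell coordinates in a few lines. This is a minimal sketch, not the published SVCA implementation: it assumes a per-gene standardized expression matrix, a squared-exponential decay shared by the environmental kernel and the neighborhood matrix $\mathbf{Z}$, and uses the hypothetical function name `svca_covariances`.

```python
import numpy as np
from scipy.spatial.distance import cdist


def svca_covariances(X, coords, gene, sigma_int2, sigma_env2, sigma_cc2, length_scale):
    """Build K_int, K_env and K_cc for one focal gene.

    X      : (n_cells, n_genes) expression matrix, assumed standardized per gene
    coords : (n_cells, 2) cell centroid coordinates
    gene   : column index of the focal gene, excluded from the similarity
    """
    X_rest = np.delete(X, gene, axis=1)           # all genes except the focal gene
    R = X_rest @ X_rest.T                          # x_i^T x_j for every pair of cells
    D = cdist(coords, coords)                      # pairwise distances d_ij
    G = np.exp(-D**2 / (2.0 * length_scale**2))    # squared-exponential decay in distance
    Z = G.copy()
    np.fill_diagonal(Z, 0.0)                       # aggregate over all *other* cells only
    K_int = sigma_int2 * R
    K_env = sigma_env2 * G
    K_cc = sigma_cc2 * Z @ R @ Z.T                 # similarity of distance-weighted neighborhoods
    return K_int, K_env, K_cc


rng = np.random.default_rng(1)
X = rng.normal(size=(6, 4))
coords = rng.uniform(0, 10, size=(6, 2))
K_int, K_env, K_cc = svca_covariances(X, coords, gene=0, sigma_int2=1.0,
                                      sigma_env2=0.5, sigma_cc2=0.2, length_scale=2.0)
```

Writing $\mathbf{K}_{\mathrm{cc}}$ as $\mathbf{Z}\mathbf{R}\mathbf{Z}^{\top}$ makes it positive semidefinite by construction, so it is a valid covariance matrix for any choice of decay.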
The SVCA model is fitted for every target gene using maximum likelihood to determine the scaling parameters $\sigma_{\mathrm{int}}^2$, $\sigma_{\mathrm{env}}^2$, $\sigma_{\mathrm{cc}}^2$, and $\sigma_n^2$, as well as the length-scale parameter $l$. See Rasmussen and Williams (2006) for an overview of parameter inference in this class of multivariate normal models. The fitted model can also be used to estimate the fraction of variance explained by each term after appropriate rescaling, using Gower factors (Searle, 1982, Kostem and Eskin, 2013; STAR Methods). This results in a breakdown for each gene of the fraction of variance explainable by spatial and non-spatial variance components, yielding a compact representation of the major drivers of gene expression variation (Figure 1C). In the following, we refer to this representation as a spatial variance signature. Additionally, SVCA can be used to assess the statistical significance of individual variance terms, using model comparisons between the full SVCA model and reduced models in which individual covariance terms are omitted (STAR Methods). Finally, SVCA can also be used to predict expression profiles of held-out cells (STAR Methods).
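Maximum-likelihood fitting, Gower-factor rescaling, and model comparison can be sketched together. This is an illustrative sketch under simplifying assumptions, not the SVCA code: the length scale is held fixed inside precomputed base kernels (SVCA also fits it), the scales are optimized on the log scale with SciPy's L-BFGS-B, and significance uses a likelihood ratio test with a $\chi^2_1$ reference. The function names are hypothetical.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve
from scipy.optimize import minimize
from scipy.stats import chi2


def neg_log_lik(y, K):
    """Negative log density of zero-mean y under N(0, K)."""
    c, low = cho_factor(K, lower=True)
    alpha = cho_solve((c, low), y)
    log_det = 2.0 * np.log(np.diag(c)).sum()
    return 0.5 * (y @ alpha + log_det + len(y) * np.log(2.0 * np.pi))


def fit_scales(y, kernels):
    """ML estimates of one scale per fixed base kernel plus the noise variance."""
    n = len(y)

    def objective(log_theta):
        theta = np.exp(log_theta)  # log parameterization keeps scales positive
        K = sum(t * B for t, B in zip(theta[:-1], kernels)) + theta[-1] * np.eye(n)
        return neg_log_lik(y, K)

    n_par = len(kernels) + 1
    res = minimize(objective, np.zeros(n_par), method="L-BFGS-B",
                   bounds=[(-10.0, 10.0)] * n_par)  # bounds keep K well conditioned
    return np.exp(res.x), res.fun


def gower_factor(K):
    """Expected sample variance implied by covariance K: (tr K - 1'K1/n) / (n - 1)."""
    n = K.shape[0]
    return (np.trace(K) - K.sum() / n) / (n - 1)


def variance_fractions(theta, kernels):
    """Fraction of variance per term; the last entry is noise (Gower factor of I is 1)."""
    parts = np.array([t * gower_factor(B) for t, B in zip(theta[:-1], kernels)] + [theta[-1]])
    return parts / parts.sum()


def lrt_pvalue(y, kernels, drop):
    """Likelihood ratio test of the full model against one omitting kernels[drop]."""
    _, nll_full = fit_scales(y, kernels)
    _, nll_red = fit_scales(y, [B for k, B in enumerate(kernels) if k != drop])
    stat = max(2.0 * (nll_red - nll_full), 0.0)
    # A chi2(1) reference ignores that the null variance sits on the boundary (0),
    # which makes this p value conservative.
    return chi2.sf(stat, df=1)


rng = np.random.default_rng(2)
coords = rng.uniform(0, 10, size=(60, 2))
d2 = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(-1)
G = np.exp(-d2 / (2 * 2.0**2))                     # fixed environmental base kernel
y = rng.multivariate_normal(np.zeros(60), 2.0 * G + np.eye(60))
theta, _ = fit_scales(y, [G])
fractions = variance_fractions(theta, [G])
p = lrt_pvalue(y, [G], drop=0)
```

The Gower factor rescales each fitted scale by the variance that its kernel actually induces in a sample, so terms built on kernels of different magnitude become comparable as fractions of the total.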
Notably, SVCA does not require discrete cell-type assignments; instead, it is based on continuous measures of cell-cell similarity that are directly estimated from cell expression profiles (Figure S1). The model also circumvents the need to define local cell neighborhoods and instead weights interactions between pairs of cells as a function of their distance (Figure S1). Additionally, SVCA includes a non-linear environmental component that captures non-specific variation linked to the location of a cell. As we show below, this includes confounding factors such as technical drift.
Initially, we used data simulated from the SVCA generative model to validate the model. We simulated expression profiles with no interaction effects to assess the calibration of the statistical test for cell-cell interactions, finding that the test is conservative (Figure S2A). We also compared the estimated cell-cell interaction variance components with the simulated values when simulating increasing fractions of variance explained by interaction effects, observing that the model yields accurate variance estimates (Figure S2B). We then assessed the power for detecting true cell-cell interactions as a function of the simulated fraction of cell-cell interaction variance (Figure S2C) and of the number of cells in the dataset (Figure S2D). To investigate the empirical identifiability of cell-cell interactions versus environmental effects, we also compared the estimates of the full model to those of a reduced model without the cell-cell interaction component (Figure S2E). These results show that, if cell-cell interactions are not accounted for by the dedicated term, the environmental component can falsely explain this spatial variation. This indicates that the environmental component has the capacity to absorb confounding effects from other spatial sources of variation, as we observed in the first round of simulations. Overall, we demonstrate that SVCA can be used to estimate and test for spatial drivers of single-cell variability, in particular cell-cell interactions.
SVCA Yields More Accurate Cell Interaction Estimates than Alternative Models
Next, we considered a more complex simulation using empirical parameters derived from 11 real datasets to compare SVCA to alternative models. Briefly, we simulated gene expression profiles based on a linear model that accounts for intrinsic effects and cell-cell interactions of variable size, as well as confounding effects due to cell mis-segmentation (STAR Methods). Cell intrinsic effects were simulated as a linear combination of the empirical expression profiles of all other genes, $\mathbf{y}_{\mathrm{int}} = \mathbf{X}\boldsymbol{\beta}$, where $\boldsymbol{\beta}$ is a fixed effect size and $\mathbf{X}$ is the matrix of expression profiles for all cells. Cell-cell interactions were simulated using a linear combination of the nearest-neighbor expression profiles, weighted by a function of the distance, $\mathbf{y}_{\mathrm{cc}} = \alpha\, \mathbf{W}\mathbf{X}\boldsymbol{\beta}$, where $\alpha$ controls the size of the cell-cell interactions, $W_{ij} = f(d_{ij})$ for the $k$ nearest neighbors $j$ of each focal cell $i$, and $W_{ij} = 0$ otherwise ($d_{ij}$ is the distance between cells $i$ and $j$) (Figure 2A; STAR Methods). To simulate errors due to mis-segmentation, the generated expression profiles were perturbed by assigning to affected cells a share of the expression profiles of mis-segmented neighboring cells, which results in perturbed expression profiles $\tilde{\mathbf{X}}$ and $\tilde{\mathbf{y}}$. We varied the number of cell neighbors $k$, the magnitude $\alpha$ of the cell-cell interactions, and the extent of mis-segmentation effects (Figure 2B; STAR Methods).
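The simulation scheme can be sketched as follows. This is a minimal illustration under stated assumptions, not the benchmark code: the distance decay $f(d) = 1/d^2$, the reuse of a single random $\boldsymbol{\beta}$ for both terms, and the function names `knn_weights`, `simulate_target`, and `missegment` are all hypothetical choices. Mis-segmented pairs are drawn with probability inversely proportional to the squared distance, as described for Figure 2B.

```python
import numpy as np
from scipy.spatial.distance import cdist


def knn_weights(coords, k, f=lambda d: 1.0 / d**2):
    """W[i, j] = f(d_ij) for the k nearest neighbors j of cell i, and 0 otherwise.

    The decay f (inverse squared distance) is a hypothetical choice.
    """
    D = cdist(coords, coords)
    np.fill_diagonal(D, np.inf)                      # a cell is not its own neighbor
    nn = np.argsort(D, axis=1)[:, :k]                # k nearest neighbors per cell
    rows = np.arange(len(coords))[:, None]
    W = np.zeros_like(D)
    W[rows, nn] = f(D[rows, nn])
    return W


def simulate_target(X, coords, k, alpha, rng, noise_sd=1.0):
    """y = X beta + alpha * W X beta + noise, with one fixed random beta."""
    beta = rng.normal(size=X.shape[1])
    W = knn_weights(coords, k)
    y_int = X @ beta                                 # cell intrinsic effect
    y_cc = alpha * (W @ X @ beta)                    # distance-weighted neighbor effect
    return y_int + y_cc + rng.normal(scale=noise_sd, size=len(X))


def missegment(M, coords, n_pairs, share, rng):
    """Mix rows of M for random cell pairs chosen with probability proportional to 1 / d^2."""
    D = cdist(coords, coords)
    iu, ju = np.triu_indices(len(M), k=1)
    p = 1.0 / D[iu, ju] ** 2
    chosen = rng.choice(len(p), size=n_pairs, replace=False, p=p / p.sum())
    out = M.astype(float).copy()
    for idx in chosen:
        i, j = iu[idx], ju[idx]
        # Mixing uses the unperturbed rows of M, so a cell that appears in
        # several chosen pairs keeps only its last mixing.
        out[i] = (1 - share) * M[i] + share * M[j]
        out[j] = (1 - share) * M[j] + share * M[i]
    return out


rng = np.random.default_rng(3)
X = rng.normal(size=(40, 5))
coords = rng.uniform(0, 20, size=(40, 2))
y = simulate_target(X, coords, k=4, alpha=0.5, rng=rng)
perturbed = missegment(np.column_stack([X, y]), coords, n_pairs=5, share=0.2, rng=rng)
X_tilde, y_tilde = perturbed[:, :-1], perturbed[:, -1]
```

Perturbing $\mathbf{X}$ and $\mathbf{y}$ jointly, as above, mimics the fact that segmentation errors affect all measured channels of a cell at once.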
Figure 2.
SVCA Is More Conservative and Robust than Alternative Linear Models
(A) Simulation approach: the expression profile of a simulated target gene Y is generated as a linear combination of the empirically observed cell expression profiles of all genes (X) and a linear combination of the expression profiles of the first k neighbors (here, k = 4). The effect of the first k neighbors is weighted by a function of their distance to the focal cell.
(B) Simulation of cell mis-segmentation effects. Pairs of cells are randomly selected as mis-segmented with probability inversely proportional to the square of their distance (STAR Methods).
(C) Inferred cell-cell interactions versus simulated true values for and .
(D) Error in the inferred cell-cell interactions as a function of the simulated interaction component.
(E) Spurious cell-cell interactions as a function of the simulated mis-segmentation effect (as in B).
(F) Distribution of the cell-cell interaction error as a function of the number of neighbors (k).
We compared SVCA to four baseline methods: (1) a reduced random effect model with the same covariance terms as SVCA but omitting the environmental term; (2) linear regression using the average of the expression profiles of the five nearest neighbors as input features; (3) linear regression accounting for cell-cell interactions between all pairs of cells, weighted by the distance between cells; and (4) a combination of the last two methods, considering a fixed cellular neighborhood and weighting cell-cell interactions as a function of cell distance (STAR Methods). SVCA yielded the most accurate estimates of the cell-cell interaction component (Figures 2C and 2D) and was more robust to spurious effects due to mis-segmentation (spurious variance component below 1% for SVCA compared to up to 6% on average for the linear models; Figure 2E). The largest relative gains in accuracy were observed for small cell-cell interaction effects (Figure 2C). The lower accuracy of the reduced SVCA model without the environmental term indicates that this term plays an important role and, in particular, absorbs spurious effects from segmentation errors (spurious cell-cell interaction variance components of up to 3% for the reduced model versus < 1% for SVCA, with higher variance; Figure 2E). Additionally, SVCA was generally less biased than alternative methods across the full range of simulation settings (Figure 2F).
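For comparison, baseline (2) can be sketched as an ordinary least-squares model. This is one plausible reading, not the benchmark implementation: the regression uses the cell's own profile (all other genes) plus the average profile of its five nearest neighbors, and the cell-cell interaction share is taken as the variance of the fitted neighbor term relative to the variance of the target gene. The function names are hypothetical.

```python
import numpy as np
from scipy.spatial.distance import cdist


def knn_average(X, coords, k=5):
    """Average expression profile of the k nearest neighbors of each cell."""
    D = cdist(coords, coords)
    np.fill_diagonal(D, np.inf)                  # exclude the cell itself
    nn = np.argsort(D, axis=1)[:, :k]
    return X[nn].mean(axis=1)                    # (n_cells, k, n_genes) -> (n_cells, n_genes)


def linear_cc_fraction(y, X_rest, N):
    """OLS of y on intrinsic features X_rest and neighbor features N.

    Returns the variance of the fitted neighbor term relative to the variance of y.
    """
    n = len(y)
    A = np.column_stack([np.ones(n), X_rest, N])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    cc_part = N @ coef[1 + X_rest.shape[1]:]
    return cc_part.var() / y.var()


rng = np.random.default_rng(4)
X = rng.normal(size=(50, 3))
coords = rng.uniform(0, 10, size=(50, 2))
N = knn_average(X, coords, k=5)
w = np.array([1.0, 2.0, -1.0])
frac_neighbors_only = linear_cc_fraction(N @ w, X, N)   # y driven by neighbors only
frac_intrinsic_only = linear_cc_fraction(X @ w, X, N)   # y driven by the cell itself
```

Unlike SVCA, this baseline fixes the neighborhood size in advance and has no term that can absorb non-specific spatial variation, which is one reason such models can attribute mis-segmentation effects to cell-cell interactions.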
Application of SVCA to a Breast Cancer Proteomics Dataset Identifies Cell-Cell Interactions as a Major Driver of Expression Variation
Next, we applied SVCA to an IMC dataset from human breast cancer, where 26 protein expression levels were quantified at the single-cell level in 46 breast cancer biopsies (Schapiro et al., 2017). SVCA revealed substantial differences of the overall importance of cell-cell interaction components across proteins, explaining up to 25% of the total expression variance on average (Figure 3A). Immune cell markers in particular were identified among the set of proteins with the largest cell-cell interaction effects: CD44, CD20, CD3, and CD68, for which cell-cell interaction explained more than 10% of the variance in 36, 35, 34, and 28 out of the 46 images, respectively (Figure 3A). We hypothesize that this effect could reflect the recruitment of immune cells by specific cellular environments (Moreau et al., 2018, Chlon and Markowetz, 2017). CAHIX, a marker of hypoxia, was also found among the top markers linked to cell-cell interaction effects. We confirmed the consistency of the variance estimates from SVCA using cross-validation, where SVCA yielded more accurate out-of-sample gene expression imputations than alternative regression models, as well as simplified models that ignore cell-cell interactions (Figures 3B and S3; STAR Methods). As an additional sanity check, we also compared the variance estimates to results obtained after permuting the cell positions, which as expected resulted in near-zero cell-cell interaction components (Figure S3).
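Out-of-sample imputation under a fitted multivariate normal model reduces to the Gaussian conditional mean. The sketch below is illustrative only: it assumes a zero-mean model, a signal covariance equal to the sum of the fitted terms ($\mathbf{K}_{\mathrm{int}} + \mathbf{K}_{\mathrm{env}} + \mathbf{K}_{\mathrm{cc}}$), and takes $r^2$ as the squared Pearson correlation between out-of-fold predictions and observations. The function names and the one-dimensional toy data are hypothetical.

```python
import numpy as np


def predict_heldout(y, K_signal, noise_var, train, test):
    """Gaussian conditional mean of y[test] given y[train] (zero-mean model)."""
    K_tt = K_signal[np.ix_(train, train)] + noise_var * np.eye(len(train))
    K_st = K_signal[np.ix_(test, train)]
    return K_st @ np.linalg.solve(K_tt, y[train])


def cv_r2(y, K_signal, noise_var, n_folds=5, seed=0):
    """Squared Pearson correlation between out-of-fold predictions and observations."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    pred = np.empty_like(y, dtype=float)
    for test in folds:
        train = np.setdiff1d(np.arange(len(y)), test)
        pred[test] = predict_heldout(y, K_signal, noise_var, train, test)
    return np.corrcoef(pred, y)[0, 1] ** 2


# Toy example: a smooth signal along one spatial axis.
x = np.linspace(0, 10, 100)
rng = np.random.default_rng(5)
y = np.sin(x) + 0.1 * rng.normal(size=100)
K = np.exp(-(x[:, None] - x[None, :]) ** 2 / 2.0)
score = cv_r2(y, K, noise_var=0.01)
```

Dropping a term from `K_signal` and recomputing the score gives the kind of comparison with simplified models described above.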
Figure 3.
Application of SVCA to 46 Breast Cancer Samples Profiled Using IMC
(A) Bottom panel: SVCA signatures for 26 proteins. Shown are averages of the fraction of variance explained by intrinsic effects, environmental effects, and cell-cell interactions, across 46 images. Proteins are ordered by the magnitude of the cell-cell interaction component. Top panel: number of images with a cell-cell interaction component greater than 10% variance.
(B) Accuracy of SVCA and alternative models for predicting gene expression out of sample (r2 assessed using 5-fold cross validation). Shown are average coefficients of determination (r2) between predicted and observed gene expression profiles, averaged across proteins and images. Error bars correspond to ±1 SD across images and proteins.
(C) First two principal components for 38 images with clinical annotations, calculated based on the spatial variance signature (variance breakdown as in A for each protein), with individual images colored by the clinical tumor grade.
(D) Loadings of the principal components as in (C), displaying the relevance of individual proteins and types of variance components.
We also observed substantial variation of the estimated spatial variance signatures between images (Figure S3), motivating an investigation of the relationship between spatial variance components and clinical covariates, including tumor grade. A projection of the full SVCA output (spatial variance signature; Figure 3A) using principal-component analysis (PCA) revealed substructure among images that was significantly aligned with tumor grade (Figure 3C; p = 3.8 × 10⁻³; STAR Methods). Inspection of the PCA loadings (Figure 3D) identified the cell-cell interaction component and the environmental component for a subset of proteins (including CD20 and CD44) as the most informative SVCA features for PC1, which correlates with tumor grade. We also noticed that the images with the strongest separation in the PCA representation (image names highlighted in Figure 3C) had previously been highlighted in the primary analysis of this dataset, where they were identified as exhibiting a different tissue organization compared to other images (Schapiro et al., 2017). That study also used a permutation-based approach to identify cell types that are enriched or depleted for co-occurrence, followed by hierarchical clustering to detect images with similar cellular neighborhood structures. This procedure separated the highlighted images into a grade-1-enriched cluster containing the images Ay6x7 and Ay8x8 and a grade-3-enriched cluster containing the images Cy7x8, Cy8x4, Cy8x5, Cy8x6, Cy8x7, Cy13x6, Cy13x7, and Cy13x8 (Figure 3C) (Schapiro et al., 2017). This indicates that SVCA signatures capture variation that is also identified using classical neighborhood statistics. Importantly, however, SVCA does not rely on cell-type classification and does not require a predefined definition of cell neighborhoods.
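The PCA projection of spatial variance signatures can be sketched with a plain SVD. This is a minimal illustration under assumptions: each image's signature (protein × variance term) is flattened into one feature vector, features are centered but not scaled, and the random Dirichlet signatures below are synthetic stand-ins, not the study's data. The function name `signature_pca` is hypothetical.

```python
import numpy as np


def signature_pca(signatures, n_components=2):
    """PCA of spatial variance signatures.

    signatures : (n_images, n_proteins, n_terms) variance fractions per image,
                 flattened to one feature vector per image before PCA.
    Returns PC scores, loadings and explained-variance ratios.
    """
    F = signatures.reshape(signatures.shape[0], -1)
    F = F - F.mean(axis=0)                          # center each feature
    U, S, Vt = np.linalg.svd(F, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    loadings = Vt[:n_components]                    # one row per PC, one column per protein x term
    explained = S**2 / np.sum(S**2)
    return scores, loadings, explained[:n_components]


# Synthetic signatures: 3 spatial/non-spatial terms plus noise per protein, summing to 1.
rng = np.random.default_rng(6)
sig = rng.dirichlet(np.ones(4), size=(38, 26))
scores, loadings, explained = signature_pca(sig[..., :3])
```

Large absolute entries in a row of `loadings` point to the protein and variance term that drive the corresponding principal component, which is how Figure 3D is read.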
Tumor progression is characterized by disorganization and irregular cellular architecture, which is associated with larger cells, increased proliferation, and thus higher cell density in comparison to healthy breast tissue (Elston and Ellis, 1991). We investigated how SVCA signatures are affected by these environmental features and discovered a significant correlation (linear regression; p = 3.0 × 10⁻³) between the average number of neighbors per cell and the average cell-cell interaction component across proteins (using CellProfiler to estimate the number of cells). This relationship may in part explain the separation by tumor grade. In general, it is not surprising that the magnitude of cell-cell interactions is higher in tissue with increased cell density than in adipose tissue with sparse cell coverage.
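A density-versus-interaction analysis of this kind can be sketched with a radius-based neighbor count and `scipy.stats.linregress`. This is a hedged illustration: the study estimated neighbors with CellProfiler, whereas the fixed-radius count here, the function name `mean_neighbors`, and the per-image numbers are hypothetical.

```python
import numpy as np
from scipy.spatial.distance import cdist
from scipy.stats import linregress


def mean_neighbors(coords, radius):
    """Average number of other cells within `radius` of each cell."""
    D = cdist(coords, coords)
    counts = (D <= radius).sum(axis=1) - 1          # subtract the cell itself
    return counts.mean()


# One value per image: neighbor density vs. mean cell-cell interaction fraction
# (hypothetical numbers for illustration only).
density = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
mean_cc = 0.02 * density + 0.01
fit = linregress(density, mean_cc)                   # fit.slope, fit.pvalue, fit.rvalue
```

In practice, `density` would hold `mean_neighbors(...)` for each image and `mean_cc` the cell-cell interaction fraction averaged over proteins.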
Application of SVCA to a Hippocampus RNA Dataset Identifies Relevant Gene Families Involved in Cell-Cell Interactions
SVCA can be used for the analysis of data from a broad range of spatially resolved technologies, including optical-imaging-based assays. To explore this, we considered a mouse hippocampus dataset profiled using seqFISH (Shah et al., 2017), in which the expression of 249 genes was assayed in 21 distinct brain regions of a single animal. Spatial variance signatures for the 20 genes with the largest cell-cell interaction component are shown in Figure 4A. Analogous to the IMC dataset, SVCA signatures were robust, and models that account for cell-cell interactions yielded more accurate gene expression predictions (Figure S4).
Figure 4.
Application of SVCA to 21 Images Profiled Using seqFISH
(A) Left: SVCA signatures for the 20 genes with the largest cell-cell interaction component. Shown are averages of the fraction of variance explained by intrinsic effects, environmental effects, and cell-cell interactions, across 21 images. Genes are ordered by the magnitude of the cell-cell interaction component. Right: variance estimate distribution across images and genes for all 249 genes contained in this dataset (violin plots).
(B) Spatial organization of the mouse hippocampus with dots corresponding to individual images. Colors and shapes denote regions using the classification as in Shah et al. (2017).
(C) First two principal components of the spatial variance signatures for individual images from the DG, the dorsal region, and the ventral region. Color and shape represent the location of the biopsy in the hippocampus.
(D) First two principal components of the spatial variance signatures for all 21 images.
(E) Enrichment of gene categories for cell-cell interactions (top) and intrinsic effect (bottom) (negative log Benjamini-Hochberg adjusted p values).
Similarly to results obtained from the IMC datasets, we observed differences in the spatial variance signatures across images, which were sampled from functionally distinct regions of the hippocampus (Shah et al., 2017). Principal components of the spatial variance signature for the dorsal region clustered together, irrespective of their CA1/CA3 location (Figure 4B). Similarly, images from the dentate gyrus (DG) also clustered together, and there was some proximity between signatures from the ventral region, although with more variation between them (Figures 4C and 4D). This is consistent with the observation by Shah et al. (2017) that the ventral and dorsal regions of the CA1 and CA3 mirror each other with respect to their cellular compositions and ventral regions are more heterogeneous in their cellular composition. Spatial variance signatures for intermediate regions, however, did not show much resemblance (Figure 4D).
Leveraging the higher dimensionality of these data, we sought to identify gene families that participate in cell-cell interactions. First, we manually classified genes into non-overlapping categories based on prior annotations (Table S1), retaining categories with more than five genes for further analysis: genes involved in the cell cycle, cell junctions, the immune system, neurotransmitter transporters, and transcription factors. The neurotransmitter transporter category consisted of six glutamate transporters of the solute carrier family (slc genes; Masson et al., 1999, Iversen, 2006). The immune system category consisted of six genes with multiple functions that are consistently associated with the immune response, such as MFGE8, which is involved in phagocytosis, or the interferon regulatory factor IRF2. The eight cell junction genes included ACTA2 (Actin), Opalin (Yoshikawa et al., 2008), and MOG. The largest group was made up of annotated transcription factors, consisting of 166 genes.
We tested which of these categories are enriched for large cell-cell interaction components (STAR Methods), finding that cell junction genes and neurotransmitter transporters were the most enriched groups (Q = 6 × 10⁻⁴ and Q = 1 × 10⁻³, Benjamini-Hochberg adjusted across gene sets) (Figure 4E). Individual cell junction genes, such as GJA1 (connexin), are involved in gap junction intercellular communication (Cheng et al., 2015), while, for example, the actin skeleton has a known role in adapting tissue structure and geometry to external stimuli (Carpenter, 2000, Brakebusch and Fässler, 2003). This may explain why the single-cell expression levels of cell junction genes appeared to be regulated by cell-cell interactions. The enrichment of glutamate transporters is also consistent with their involvement in the transport and (re)uptake of the neurotransmitter at neuronal synapses, a critical cell-cell interaction in the brain (Masson et al., 1999, Iversen, 2006, Angulo et al., 2004, Mason, 2017). In addition, Slc5a7 (CHT) was also found to be preferentially expressed in specific interneurons with a link to the spatial organization of the tissue (Yi et al., 2015). To a smaller extent, genes related to the immune system were enriched for cell-cell interactions (Q = 2 × 10⁻²). Among the top cell-cell-interaction-related genes were CTSS (Cathepsin) and MFGE8 (Lactadherin), which play a role in phagocytosis in the brain, a form of cell-cell interaction (Fricker et al., 2012, Neher et al., 2013, Vitner et al., 2010). Notably, however, cell junction genes and neurotransmitter transporters were also enriched among genes with high intrinsic effects, suggesting that the expression levels of these genes also relate to intracellular processes.
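A gene-set enrichment test of this kind can be sketched as follows. The exact test used is described in the STAR Methods; here, as an assumption, each category is compared against all remaining genes with a one-sided Mann-Whitney U test on the cell-cell interaction fractions, and the resulting p values are Benjamini-Hochberg adjusted across categories. The function names and the toy gene values are hypothetical.

```python
import numpy as np
from scipy.stats import mannwhitneyu


def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p values (q values)."""
    p = np.asarray(pvals, dtype=float)
    m = len(p)
    order = np.argsort(p)
    ranked = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity: q_(i) = min over j >= i of p_(j) * m / j.
    q_sorted = np.minimum.accumulate(ranked[::-1])[::-1]
    q = np.empty(m)
    q[order] = np.minimum(q_sorted, 1.0)
    return q


def category_enrichment(cc_fraction, categories):
    """One-sided Mann-Whitney test per category: are its cell-cell components larger?

    cc_fraction : dict gene -> cell-cell interaction variance fraction
    categories  : dict category name -> list of member genes
    """
    genes = np.array(list(cc_fraction))
    values = np.array([cc_fraction[g] for g in genes])
    names, pvals = [], []
    for name, members in categories.items():
        inside = np.isin(genes, list(members))
        res = mannwhitneyu(values[inside], values[~inside], alternative="greater")
        names.append(name)
        pvals.append(res.pvalue)
    return dict(zip(names, bh_adjust(pvals)))


# Toy data: category "A" holds the genes with the largest cell-cell components.
genes = [f"g{i}" for i in range(40)]
values = np.concatenate([np.linspace(0.5, 0.9, 8), np.linspace(0.0, 0.1, 32)])
q = category_enrichment(dict(zip(genes, values)), {"A": genes[:8], "B": genes[8:16]})
```

A rank-based test avoids assuming any particular distribution for the variance fractions, which are bounded between 0 and 1 and often skewed.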
Five out of the ten genes with the highest cell-cell interaction variance components did not fall into any of the considered gene set categories. NGEF (Ephexin) is an exchange factor that plays a role in axon guidance (Shamah et al., 2001, O’Donnell et al., 2009), CAMK2 is a kinase shown to play a role in long-term potentiation and neurotransmitter release (Wang, 2008, Lisman et al., 2012), LYVE is a membrane receptor (Banerji et al., 1999), and SNCG (synuclein gamma) is involved in axonal architecture (Surguchov et al., 2001, Vargas et al., 2017). Taken together, this shows that genes with large cell-cell interaction components, as identified using SVCA, have known implications in cell-cell communication between neurons, or have known annotations for regulating the spatial architecture of the tissue.
Discussion
We have presented SVCA, a regression-based framework for the analysis of spatially resolved molecular expression data. Our model computes a spatial variance signature for individual mRNA or protein levels, decomposing their sources of variation into spatial and non-spatial components. Most prominently, SVCA provides a quantitative assessment of the effect of cell-cell interactions on the expression profile of individual molecules. SVCA tackles the problem of cellular classification and neighborhood definition using a continuous representation of space and cellular identity (Wagner et al., 2016).
We have applied SVCA to multiple datasets generated using alternative technologies, probing either RNA transcripts or proteins, demonstrating the broad applicability of the approach. Across these applications, we observed that cell-cell interactions can substantially contribute to gene expression variation, which is consistent with previous reports (Battich et al., 2015, Goltsev et al., 2018, Kamińska et al., 2015, Ayuob and Ali, 2012) and supports the concept that studying single-cell expression in the native context is important for understanding the sources of these variations.
We noticed variation in the SVCA signatures across images and investigated the possible causes of this variability. We provided evidence that differences in SVCA signatures could result from differences in the spatial structure of tissue, as well as different clinical and biological contexts. For the IMC data, we also noticed that this variability reflected previous findings about different tissue organizations between samples.
We used gene annotation to interpret the spatial variance signatures of individual genes and pathways. This showed that genes with known roles in cellular interactions, including brain-specific ones such as SLCs, are predominantly enriched in the corresponding terms of our models. In addition to confirming the biological relevance of SVCA signatures, these results suggest that spatial variance signatures can be used to study the involvement of individual genes in tissue-level functions. However, further interpretation of these signatures, in particular of the cell-cell interaction term, remains challenging. This could be due to our limited knowledge of such multicellular processes compared with intracellular pathways. In addition, cell-cell interactions may arise from a diversity of biological contexts and processes; for example, it is intrinsically challenging to distinguish simple cell-type co-occurrence from more specific molecular interactions. As emerging technologies provide even richer and larger datasets, methods such as SVCA will allow a more fine-grained interpretation of signatures of cell-cell interactions. More hypothesis-driven research, possibly in simpler biological systems with clear positive and negative controls, could be instrumental toward this goal.
Although we have tested the calibration and robustness of SVCA, the model has limitations. At present, the model does not account for technology-specific noise and instead assumes Gaussian-distributed residuals. This requires suitable processing of the raw data so that this assumption is sufficiently met (see the STAR Methods). Further development could consider a generalized random effect model, for example coupling the random effect components with a negative binomial likelihood. A second limitation of SVCA is that the model is univariate, which means that individual genes or proteins are modeled independently of each other. Multivariate extensions could account for relationships between genes involved in the same pathways, either in an unsupervised manner or using prior knowledge (Buettner et al., 2017). Such approaches could provide a more comprehensive understanding of how biological processes are affected by tissue structure. Additionally, extensions could model interactions between environmental and cell-cell interaction effects, which are currently treated as independent additive factors. As spatial expression datasets grow with the development of higher-throughput technologies, scalability will also become an important challenge for SVCA. The computational cost scales linearly in the number of genes, and because genes are fitted independently, massive parallelization is possible with adequate computational infrastructure. The random effect approach, however, typically scales cubically in the number of cells. This can be circumvented by splitting larger images into multiple patches and averaging the resulting SVCA signatures. Future work will focus on faster inference schemes based on sparse approximations (Hensman et al., 2013, Quiñonero-Candela and Rasmussen, 2005, Snelson and Ghahramani, 2006) or random feature approximations (Rahimi and Recht, 2008, Oliva et al., 2016).
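The patch-based workaround for the cubic cost can be sketched as follows. This is a minimal illustration under stated assumptions, not part of the SVCA package. Cells are assigned to a regular rectangular grid, and `fit_fn` is a hypothetical placeholder for a single SVCA fit that returns a vector of variance fractions. The helper names and the `min_cells` cutoff are likewise illustrative. Because one fit costs roughly O(n³) in the number n of cells, splitting N cells into P equally sized patches reduces the total cost from O(N³) to about P·(N/P)³ = N³/P².

```python
import numpy as np
from typing import Callable, List


def split_into_patches(coords: np.ndarray, n_x: int, n_y: int) -> List[np.ndarray]:
    """Assign cells to an n_x-by-n_y grid of rectangular patches.

    Returns one array of cell indices per non-empty patch.
    """
    x, y = coords[:, 0], coords[:, 1]
    x_edges = np.linspace(x.min(), x.max(), n_x + 1)
    y_edges = np.linspace(y.min(), y.max(), n_y + 1)
    # np.digitize is 1-based for interior points; clip so that cells lying
    # exactly on the upper image border fall into the last patch.
    ix = np.clip(np.digitize(x, x_edges) - 1, 0, n_x - 1)
    iy = np.clip(np.digitize(y, y_edges) - 1, 0, n_y - 1)
    patch_id = ix * n_y + iy
    return [np.flatnonzero(patch_id == p) for p in np.unique(patch_id)]


def patchwise_signature(
    coords: np.ndarray,
    expr: np.ndarray,
    fit_fn: Callable[[np.ndarray, np.ndarray], np.ndarray],
    n_x: int = 2,
    n_y: int = 2,
    min_cells: int = 50,
) -> np.ndarray:
    """Fit each patch separately and average the resulting variance signatures.

    fit_fn is a hypothetical stand-in for one SVCA fit: it receives the
    coordinates and expression matrix of the cells in one patch and returns
    a vector of variance fractions.
    """
    signatures = [
        fit_fn(coords[idx], expr[idx])
        for idx in split_into_patches(coords, n_x, n_y)
        if idx.size >= min_cells  # skip patches too small for a stable fit
    ]
    if not signatures:
        raise ValueError("no patch contains at least min_cells cells")
    return np.mean(signatures, axis=0)
```

One caveat of this design is that interactions between cells on opposite sides of a patch boundary are not captured. Patches should therefore be large relative to the spatial range of the interactions of interest.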
There is a growing appreciation of the role of spatial distribution of proteins, transcripts, and other molecules in determining tissue functioning and its deregulation in disease, with potential value as predictors of clinical outcomes (Bodenmiller, 2016). This is largely driven by vigorous development of novel technologies that enable us to capture such data (Bodenmiller, 2016, Lin et al., 2017, Goltsev et al., 2018, Aichler and Walch, 2015, Schulz et al., 2018). Future datasets at increased scale and resolution will enable powerful applications of the SVCA framework, which we have presented in this manuscript.
STAR★Methods
Key Resources Table
| REAGENT or RESOURCE | SOURCE | IDENTIFIER |
|---|---|---|
| Deposited Data | ||
| IMC data | Schapiro et al., 2017 | https://www.nature.com/articles/nmeth.4391 |
| seqFISH data | Shah et al., 2017 | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5087994/ |
| Software and Algorithms | ||
| SVCA | this paper | https://github.com/damienArnol/svca |
| Limix | Lippert et al., 2014 | https://github.com/limix/limix |
Lead Contact and Materials Availability
Further information and requests for resources and reagents should be directed to and will be fulfilled by the Lead Contact, Oliver Stegle (Oliver.stegle@embl.de).
Method Details
SVCA Model Overview
SVCA uses a random effect approach based on the Gaussian process (GP) framework (Rasmussen and Williams, 2006, Lippert et al., 2014) with additive covariance functions. The covariance is composed of four terms. These reflect our assumption that the variance of a gene's expression level across cells arises from three additive effects plus residual noise: an intrinsic effect due to the cell state; a cell-cell interaction effect due to the state of the neighboring cells; and an environmental effect due to unobserved factors in the cell microenvironment, such as local access to oxygen or nutrients.
In the following, we will define the different terms of the covariance, show how they are parameterized and how these parameters are optimized. We will then explain how this model can be used to assess the proportion of variance explained by each effect, as well as the statistical significance of each effect. See also Figure S1.
We will rely on the following nomenclature and notations:
Nomenclature:
- Molecule of interest: individual molecule, typically a gene or protein, to which SVCA is fitted.
- Cell state: intrinsic characteristic of a cell. In this paper, we take the overall expression profile excluding the gene or protein of interest as a multidimensional and continuous measure of cell state. Alternatively, cells could be classified into cell types.
- Cellular neighborhood composition: continuous measure of the molecular composition of a cell's neighbors. It is obtained by weighting the molecular profiles of all neighboring cells with a squared exponential function of their distance to the focal cell.
- Intrinsic effect: effect of the cell state on the expression level of the molecule of interest.
- Cell-cell interaction effect: effect of cell-cell interactions on the expression level of the molecule of interest. These interactions may reflect signaling between cells, but also, for example, co-occurrence of cell types.
- Environmental effect: effect of the cell's position on the expression level of the molecule of interest. This effect accounts for unmeasured variables in the microenvironment that affect gene expression, such as local glucose or oxygen access.
- Spatial variance signature: concatenation of all variance estimates (intrinsic effect, environmental effect, cell-cell interaction effect, and residual noise) across all molecules for a given image.
Notations:
- $N$: number of cells in a given image
- $G$: number of molecules (e.g., genes or proteins) in a given image
- $\mathbf{y}$: expression level of the molecule of interest in all cells. Dimensions: $N \times 1$
- $X$: cell state matrix made of the entire expression profile of each cell minus the molecule of interest. Dimensions: $N \times (G-1)$. The molecule of interest is removed from the cell state matrix for two reasons. This prevents false-positive cell-cell interactions due to signal spillover between cells, and it avoids a trivial intrinsic effect.
- $d_{ij}$: Euclidean distance between cells $i$ and $j$
- $K_{\mathrm{int}}$: cell-cell covariance for the intrinsic effect. Dimensions: $N \times N$
- $K_{\mathrm{cc}}$: cell-cell covariance for the cell-cell interaction effect. Dimensions: $N \times N$
- $K_{\mathrm{env}}$: cell-cell covariance for the environmental effect. Dimensions: $N \times N$
With these notations, SVCA models the expression level across cells with the following Gaussian process model:
$$\mathbf{y} \sim \mathcal{N}\left(\mathbf{0},\; K_{\mathrm{int}} + K_{\mathrm{env}} + K_{\mathrm{cc}} + \sigma_n^2 I_N\right)$$
where $\sigma_n^2$ is the residual noise variance and $I_N$ is the $N \times N$ identity matrix.
Definition of Covariance Terms
The intrinsic effect is the effect of the cell state on the expression level of the molecule of interest. In our framework, it is modeled with the linear covariance term $K_{\mathrm{int}} = \sigma_{\mathrm{int}}^2 X X^\top$, where $X$ is the $N \times (G-1)$ cell state matrix. This covariance term corresponds to a Bayesian linear regression that models the effect of the cell's expression profile on the expression of the gene of interest, $y_i = \sum_{g \neq g^*} x_{ig}\beta_g$. Here $g^*$ denotes the index of the gene of interest, and the effect sizes have the following normal prior: $\beta_g \sim \mathcal{N}(0, \sigma_{\mathrm{int}}^2)$. This covariance function has a scaling hyperparameter $\sigma_{\mathrm{int}}^2$, which is proportional to the variance explained by this component.
The environmental effect accounts for other local sources of variation in the cell microenvironment. These sources are not measured in the data but affect the expression level of the modeled gene. To model this unobserved source of variation, we consider a squared exponential kernel, $K_{\mathrm{env}}(i,j) = \sigma_{\mathrm{env}}^2 \exp\left(-\frac{d_{ij}^2}{2\ell^2}\right)$, where $d_{ij}$ is the Euclidean distance between cells $i$ and $j$ and $\ell$ is a length scale. This kernel can capture complex nonlinear dependencies and has previously been applied to spatial expression data (Svensson et al., 2018).
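The cell-cell interaction effect models the effect of the types or states of all neighboring cells on the expression level of the molecule of interest. In the GP framework, it is modeled with the covariance function $K_{\mathrm{cc}} = \sigma_{\mathrm{cc}}^2\, Z X X^\top Z^\top$. Here $X$ is the cell state matrix, $Z_{ij} = \exp\left(-\frac{d_{ij}^2}{2\ell^2}\right)$ for every pair of distinct cells $i$ and $j$, and $Z_{ii} = 0$ for all $i$. This covariance term is equivalent to a Bayesian linear regression in which the gene expression profiles of all neighboring cells are used as covariates. The effect of a cell $j$ on a cell $i$ is weighted by a function of the distance between them: $y_i = \sum_{j \neq i} Z_{ij} \sum_{g \neq g^*} x_{jg}\beta_g$, with $\beta_g \sim \mathcal{N}(0, \sigma_{\mathrm{cc}}^2)$.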
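The three covariance terms can be assembled in a few lines of NumPy. This is a minimal sketch, not the SVCA/limix implementation. It assumes coordinates in an (N, 2) array and a cell-state matrix `X` that already excludes the molecule of interest. It also assumes the squared exponential form exp(−d²/(2ℓ²)) for both the environmental kernel and the neighbor weights Z. The function names are hypothetical.

```python
import numpy as np


def squared_distances(coords: np.ndarray) -> np.ndarray:
    """Pairwise squared Euclidean distances d_ij^2 between cell positions."""
    diff = coords[:, None, :] - coords[None, :, :]
    return np.sum(diff ** 2, axis=-1)


def svca_covariances(X, coords, length_scale, s2_int=1.0, s2_env=1.0, s2_cc=1.0):
    """Return (K_int, K_env, K_cc) for one molecule of interest.

    X: (N, G-1) cell-state matrix with the molecule of interest removed.
    coords: (N, 2) cell positions.
    """
    se = np.exp(-squared_distances(coords) / (2.0 * length_scale ** 2))

    # Intrinsic effect: linear kernel on each cell's own expression profile.
    K_int = s2_int * X @ X.T

    # Environmental effect: squared exponential kernel on cell positions.
    K_env = s2_env * se

    # Cell-cell interactions: distance-weighted neighborhood composition ZX,
    # with Z_ii = 0 so that a cell is not counted as its own neighbor.
    Z = se.copy()
    np.fill_diagonal(Z, 0.0)
    ZX = Z @ X
    K_cc = s2_cc * ZX @ ZX.T
    return K_int, K_env, K_cc
```

The intermediate `ZX` is exactly the continuous "cellular neighborhood composition" from the nomenclature. Writing `K_cc` as a product of `ZX` with its transpose makes it positive semidefinite by construction.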
Parameters Inference
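The variance parameters of the SVCA model are optimized by maximizing the log marginal likelihood of the data using a gradient-based optimizer (Rasmussen and Williams, 2006, Lippert et al., 2014):
$$\log p(\mathbf{y} \mid \boldsymbol{\theta}) = -\frac{1}{2}\mathbf{y}^\top K^{-1}\mathbf{y} - \frac{1}{2}\log\lvert K\rvert - \frac{N}{2}\log 2\pi$$
where $K = K_{\mathrm{int}} + K_{\mathrm{env}} + K_{\mathrm{cc}} + \sigma_n^2 I_N$ is the total covariance over the $N$ cells. It is the sum of the intrinsic, environmental, and cell-cell interaction covariances plus residual noise, and $\boldsymbol{\theta}$ collects its parameters.
The scales of the covariance terms, $\sigma_{\mathrm{int}}^2$, $\sigma_{\mathrm{env}}^2$, $\sigma_{\mathrm{cc}}^2$ and $\sigma_n^2$, are optimized with the L-BFGS optimizer. It updates the parameters iteratively along the gradient of the likelihood until it reaches a local optimum (zero gradient). The length scale $\ell$ of the environmental and the local (cell-cell interaction) terms is optimized with a grid search strategy, which avoids possible local optima.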
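This two-level optimization can be sketched with NumPy and SciPy. L-BFGS-B (`scipy.optimize.minimize`) runs over the logarithms of the four scales, which keeps them positive, and this is nested inside a grid search over a shared length scale. The log-space parametrization, the bounds, and the finite-difference gradients (no analytic gradient is supplied) are assumptions of this sketch, as are all function names. SVCA itself builds on limix.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve
from scipy.optimize import minimize


def unit_kernels(X, coords, length_scale):
    """Unit-scale K_int, K_env, K_cc (squared exponential weights, Z_ii = 0)."""
    d2 = np.sum((coords[:, None, :] - coords[None, :, :]) ** 2, axis=-1)
    se = np.exp(-d2 / (2.0 * length_scale ** 2))
    Z = se.copy()
    np.fill_diagonal(Z, 0.0)
    ZX = Z @ X
    return [X @ X.T, se, ZX @ ZX.T]


def neg_log_marginal_likelihood(log_scales, y, kernels):
    """-log N(y | 0, sum_k s_k K_k + s_noise I), with scales given in log space."""
    scales = np.exp(log_scales)
    n = y.shape[0]
    K = sum(s * Kk for s, Kk in zip(scales[:-1], kernels)) + scales[-1] * np.eye(n)
    c, lower = cho_factor(K, lower=True)
    alpha = cho_solve((c, lower), y)
    # 0.5 * log|K| equals the sum of the log-diagonal of the Cholesky factor.
    return 0.5 * y @ alpha + np.sum(np.log(np.diag(c))) + 0.5 * n * np.log(2 * np.pi)


def fit_scales(y, X, coords, length_scales):
    """Grid search over the length scale; L-BFGS-B over the four log-scales."""
    best = None
    for ell in length_scales:
        kernels = unit_kernels(X, coords, ell)
        res = minimize(neg_log_marginal_likelihood, x0=np.zeros(4),
                       args=(y, kernels), method="L-BFGS-B",
                       bounds=[(-8.0, 8.0)] * 4)  # keeps K well conditioned
        if best is None or res.fun < best["nll"]:
            best = {"nll": res.fun, "length_scale": ell, "scales": np.exp(res.x)}
    return best
```

Because SciPy approximates the gradient by finite differences here, each iteration costs several Cholesky factorizations of the N × N covariance. This is the cubic bottleneck mentioned in the Discussion.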
Estimates of Variance Components
Variance components for each effect are estimated using Gower factors (Searle, 1982, Kostem and Eskin, 2013):
$$\mathrm{GF}(K) = \frac{1}{N-1}\left(\mathrm{tr}(K) - \frac{1}{N}\mathbf{1}^\top K \mathbf{1}\right)$$
The Gower factor of a covariance term is the expected sample variance of a random vector that is normally distributed with that covariance. In other words, the Gower factor of each covariance term of the SVCA model quantifies the amount of gene or protein variance across cells that is explained by the corresponding effect:
For $k \in \{\mathrm{int}, \mathrm{env}, \mathrm{cc}, n\}$ (intrinsic, environmental, cell-cell interaction, noise), $V_k = \mathrm{GF}(K_k)$, with $K_n = \sigma_n^2 I_N$.
To compute the fraction of variance explained by each effect modeled in SVCA (intrinsic, environmental, cell-cell interactions, and noise), we normalize the Gower factors as follows:
$$\mathrm{FVE}_k = \frac{V_k}{V_{\mathrm{int}} + V_{\mathrm{env}} + V_{\mathrm{cc}} + V_n}$$
This procedure enables us to decompose the variance of every protein across cells into the three effects of interest plus noise.
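The Gower-factor decomposition is short enough to sketch directly. This minimal NumPy version assumes the fitted covariance terms are available as dense N × N arrays in a dictionary keyed by effect name, with the noise term given as σ²ₙ·I. The function names are hypothetical.

```python
import numpy as np


def gower_factor(K: np.ndarray) -> float:
    """Expected sample variance of y ~ N(0, K): (tr K - 1'K1 / N) / (N - 1)."""
    n = K.shape[0]
    return float((np.trace(K) - K.sum() / n) / (n - 1))


def variance_fractions(covariances: dict) -> dict:
    """Normalize Gower factors so that the fractions sum to one."""
    gf = {name: gower_factor(K) for name, K in covariances.items()}
    total = sum(gf.values())
    return {name: value / total for name, value in gf.items()}
```

A constant covariance such as `np.ones((N, N))` has a Gower factor of zero. This reflects that a shift shared by all cells contributes no variance across cells. By the same token, a scaled identity σ²·I contributes exactly σ².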
Comparison to Related Models
Schapiro et al. (2017) - HistoCAT
HistoCAT (Schapiro et al., 2017) measures spatial co-occurrence of different cell types. Briefly, cells of one or multiple images are classified into discrete cell types based on their expression profiles using a clustering algorithm. For every cell, a neighborhood is defined as containing all cells within a fixed distance threshold (measured from membrane to membrane). Using this fixed neighborhood definition, histoCAT counts the number of occurrences of a given pair of cell types in the same neighborhood. This number is then compared to a null distribution obtained by permuting the cells' positions, which gives p values for positive and negative cell-type interactions.
Unlike SVCA, histoCAT does not quantify the effect of these interactions on individual expression levels.
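The permutation idea behind this kind of co-occurrence test can be sketched as follows. This is a simplified illustration, not histoCAT's implementation. It uses distances between cell centroids rather than membrane-to-membrane distances and counts ordered neighbor pairs. It also shuffles cell-type labels over fixed positions, which is equivalent to permuting positions. All names are hypothetical.

```python
import numpy as np


def cooccurrence_test(labels, coords, type_a, type_b, radius, n_perm=1000, seed=0):
    """Permutation test for neighborhood co-occurrence of two cell types.

    Returns the observed count of (type_a, type_b) neighbor pairs and
    one-sided empirical p values for enrichment and depletion.
    """
    d = np.sqrt(np.sum((coords[:, None, :] - coords[None, :, :]) ** 2, axis=-1))
    # Ordered pairs (i, j) of distinct cells within the fixed distance threshold.
    i, j = np.nonzero((d <= radius) & ~np.eye(len(coords), dtype=bool))

    def count(lab):
        return int(np.sum((lab[i] == type_a) & (lab[j] == type_b)))

    observed = count(labels)
    rng = np.random.default_rng(seed)
    null = np.array([count(rng.permutation(labels)) for _ in range(n_perm)])
    p_enriched = (1 + np.sum(null >= observed)) / (n_perm + 1)
    p_depleted = (1 + np.sum(null <= observed)) / (n_perm + 1)
    return observed, p_enriched, p_depleted
```

As the surrounding text notes, such a test answers whether two cell types are neighbors more often than expected. It does not quantify how much the expression of any individual molecule depends on those neighbors.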
Battich et al. (2015)
Battich et al. (2015) use a regression approach to measure the effect of the cell microenvironment on individual expression levels. Briefly, 183 features are collected, quantifying intrinsic cell properties and microenvironmental properties. Microenvironmental features notably account for local cell crowding, number of adjacent neighbors, intercellular space around the cell, and the molecular profile of the neighbors, based on a fixed distance threshold. The dimensionality of this feature set is then reduced using principal component analysis (PCA). Single-cell expression profiles are modeled with a fixed effect linear model with the first 20 PCs as covariates. The PCs are then linked a posteriori to the microenvironmental features of interest. Biological replicates are used to quantify the amount of variance explained by each covariate using out-of-sample prediction.
This method therefore quantifies directly the effect of microenvironmental features including cell-cell interactions. Unlike SVCA however, it relies on a definition of discrete microenvironmental features and the definition of fixed parameters such as a distance threshold to define a cell’s neighborhood, which limits the applicability of the method to general spatial data.
Goltsev et al. (2018)
The approach of Goltsev et al. (2018) also relies on the definition of discrete microenvironmental variables, used in a fixed effect linear model to predict the expression level of individual markers out of sample. In contrast to Battich et al., microenvironmental variables are not defined directly from the molecular profile of neighboring cells. Instead, they are based on the cell-type composition of the neighborhood. The different neighborhood cell-type compositions are clustered into discrete i-niches, which are used as a discrete input for the linear model.
This method therefore enables direct quantification of the effect of cell-cell interactions on the molecular profiles of individual cells. However, it again relies on an a priori definition of microenvironmental variables, this time based on discrete cell-type assignments.
Model Validation Using Simulated Data
To make the simulations as realistic as possible, we based them on real data from 11 images of the IMC dataset (Giesen et al., 2014). Real cell positions, cell states, and intrinsic and environmental effects were used, and only the cell-cell interaction effect was rescaled for the purpose of the simulations.
Our workflow was as follows:
- Fitting the SVCA model to the real dataset considered here (11 images and 26 proteins)
- Simulating data from a multivariate normal distribution, with a covariance made of:
  - the intrinsic covariance from the fitted model
  - the environmental covariance from the fitted model
  - the noise covariance from the fitted model
  - a cell-cell interaction covariance that is a rescaled version of the one fitted to the data: $\tilde{K}_{\mathrm{cc}} = \alpha \hat{K}_{\mathrm{cc}}$, where the hat denotes fitted covariance terms and $\alpha$ is a rescaling factor
- Refitting SVCA to the simulated data, for which the variance explained by cell-cell interactions is known from the rescaling step
- Comparing the variance estimates for cell-cell interactions with the ground truth
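In the following, the proportion of variance attributable to cell-cell interactions in the simulated data, $v_{\mathrm{cc}}$, ranged from 10% to 90%. The rescaling factor $\alpha$, which multiplies the fitted cell-cell interaction covariance $\hat{K}_{\mathrm{cc}}$, was chosen accordingly. Because the Gower factor is linear in the covariance, $\alpha$ solves $\alpha \hat{V}_{\mathrm{cc}} / (\alpha \hat{V}_{\mathrm{cc}} + \hat{V}_{\mathrm{int}} + \hat{V}_{\mathrm{env}} + \hat{V}_n) = v_{\mathrm{cc}}$, where $\hat{V}_k$ are the Gower factors of the fitted covariance terms. This gives:
$$\alpha = \frac{v_{\mathrm{cc}}}{1 - v_{\mathrm{cc}}} \cdot \frac{\hat{V}_{\mathrm{int}} + \hat{V}_{\mathrm{env}} + \hat{V}_n}{\hat{V}_{\mathrm{cc}}}$$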
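The rescaling and sampling step can be sketched as follows. The sketch assumes the fitted covariance terms are available as dense arrays. It uses the closed-form factor α = v/(1 − v) · (V_int + V_env + V_n) / V_cc, which follows from the linearity of the Gower factor. The function names are hypothetical.

```python
import numpy as np


def gower_factor(K):
    n = K.shape[0]
    return (np.trace(K) - K.sum() / n) / (n - 1)


def rescale_factor(K_int, K_env, K_cc, s2_noise, v_cc):
    """alpha such that alpha * K_cc explains a fraction v_cc of the total variance."""
    n = K_int.shape[0]
    v_other = gower_factor(K_int) + gower_factor(K_env) + gower_factor(s2_noise * np.eye(n))
    return v_cc * v_other / ((1.0 - v_cc) * gower_factor(K_cc))


def simulate_with_target(K_int, K_env, K_cc, s2_noise, v_cc, n_draws=1, seed=0):
    """Draw expression vectors whose cell-cell share of variance is v_cc."""
    n = K_int.shape[0]
    alpha = rescale_factor(K_int, K_env, K_cc, s2_noise, v_cc)
    K = K_int + K_env + alpha * K_cc + s2_noise * np.eye(n)
    rng = np.random.default_rng(seed)
    Y = rng.multivariate_normal(np.zeros(n), K, size=n_draws)  # (n_draws, n)
    return Y, alpha
```

Refitting SVCA to each row of `Y` and comparing the estimated cell-cell fraction with `v_cc` then gives the calibration check described above.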
Out of Sample Prediction on Real Data
The prediction performance of alternative models was assessed using 5-fold cross-validation. In order to assess the utility of different covariance terms in the model, we considered the following models:
- a model with only an intrinsic covariance term
- a model with an intrinsic component and a local component
- the full model with all three terms
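Models were assessed by the mean prediction of Gaussian process regression (Rasmussen and Williams, 2006):
$$\hat{y}_* = \mathbf{k}_*^\top\left(K + \sigma_n^2 I\right)^{-1}\mathbf{y}_{\mathrm{train}}$$
where $K$ corresponds to the sum of the fitted covariance terms evaluated on the training samples and $\sigma_n^2$ corresponds to the fitted noise scale. $\mathbf{k}_*$ corresponds to the fitted covariance function evaluated between the input $\mathbf{x}_*$ for the held-out sample and the inputs $X_{\mathrm{train}}$ for the training samples.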
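The predictive mean and the 5-fold scheme can be sketched as follows. `K_full` is assumed to be the sum of the fitted signal covariance terms evaluated over all cells of an image. The sketch treats the hyperparameters as already fitted; ideally they would be fitted on the training folds only, which is not shown. R² as the summary metric and the function names are assumptions.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve


def gp_predict(K_full, s2_noise, y, train, test):
    """GP posterior mean for held-out cells given the training cells."""
    K_tt = K_full[np.ix_(train, train)] + s2_noise * np.eye(len(train))
    K_st = K_full[np.ix_(test, train)]  # covariance between test and training cells
    return K_st @ cho_solve(cho_factor(K_tt, lower=True), y[train])


def cross_validated_r2(K_full, s2_noise, y, n_folds=5, seed=0):
    """Out-of-sample R^2 from k-fold cross-validation over cells."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    y_hat = np.empty(len(y))
    for test in folds:
        train = np.setdiff1d(np.arange(len(y)), test)
        y_hat[test] = gp_predict(K_full, s2_noise, y, train, test)
    return 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)
```

Comparing nested models then amounts to calling `cross_validated_r2` with `K_full` built from the intrinsic term alone, then with an additional term, and finally with all three terms.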
Identifiability of Cell-Cell Interactions versus Environmental Effects
To understand the identifiability of cell-cell interactions versus environmental effects, we compared the variance estimates of SVCA with the variance estimates of a reduced model which does not account for cell-cell interactions. Both models were fitted in the simulation setting described in the main text (26 proteins, 11 images and 10 repeat experiments). Variance estimates of SVCA and the reduced model were averaged across proteins, images, and experiments.
Results were visualized using a Sankey plot (Figure 2E), which illustrates which terms of the reduced model capture the variance that is explained by the cell-cell interaction term of the full model. The width of each edge corresponds to the increase in the variance estimate for the intrinsic effect, the environmental effect, or the noise from the SVCA model to the reduced model. This represents the redistribution of the cell-cell interaction component to the other variance components from the SVCA model to the reduced model.
Comparison to Baseline Models
In order to compare SVCA to simpler baseline approaches, we considered simulated data derived from a linear model which included an intrinsic effect, a cell-cell interaction effect and a confounding effect due to cell mis-segmentation.
As before, in silico gene expression profiles are generated from real IMC data. For a given IMC dataset, let $X$ be the expression profiles across cells for all genes, of dimensions $N \times G$, and let $d_{ij}$ be the distance between cells $i$ and $j$. We first simulated the expression profile of an in silico gene using the following linear model:
$$\mathbf{y} = X\boldsymbol{\beta} + A X \boldsymbol{\gamma} + \boldsymbol{\epsilon}$$
where $A_{ij} = 1$ if cell $j$ is among the $k$ nearest neighbors of cell $i$ and $A_{ij} = 0$ otherwise.
The number of nearest neighbors $k$ involved in cell-cell interactions was also varied in order to simulate cell-cell interactions of variable ranges. The effect sizes $\boldsymbol{\beta}$ and $\boldsymbol{\gamma}$ were drawn from standard normal distributions. The features $X\boldsymbol{\beta}$ and $AX\boldsymbol{\gamma}$ were standardized such that the variance explained by cell-cell interactions is $v_{\mathrm{cc}}$, and $\boldsymbol{\epsilon}$ is standard Gaussian noise.
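The nearest-neighbor simulation can be sketched as follows. The adjacency matrix has A[i, j] = 1 when cell j is one of the k nearest neighbors of cell i. The exact scaling of the two signal terms is not specified. Here we assume that both terms are standardized and then mixed so that the cell-cell term contributes a share `v_cc` of the non-noise signal. The function names are hypothetical.

```python
import numpy as np


def knn_adjacency(coords, k):
    """A[i, j] = 1 if cell j is among the k nearest neighbors of cell i."""
    d2 = np.sum((coords[:, None, :] - coords[None, :, :]) ** 2, axis=-1)
    np.fill_diagonal(d2, np.inf)  # a cell is not its own neighbor
    nn = np.argsort(d2, axis=1)[:, :k]
    A = np.zeros_like(d2)
    A[np.arange(len(coords))[:, None], nn] = 1.0
    return A


def _standardize(v):
    return (v - v.mean()) / v.std()


def simulate_knn_gene(X, coords, k, v_cc, seed=0):
    """y = intrinsic signal + cell-cell signal + standard Gaussian noise."""
    rng = np.random.default_rng(seed)
    A = knn_adjacency(coords, k)
    beta = rng.standard_normal(X.shape[1])
    gamma = rng.standard_normal(X.shape[1])
    f_int = _standardize(X @ beta)
    f_cc = _standardize(A @ X @ gamma)
    # Hypothetical scaling: v_cc is the cell-cell share of the signal (noise
    # excluded); the two terms may be correlated, so the share is approximate.
    signal = np.sqrt(1.0 - v_cc) * f_int + np.sqrt(v_cc) * f_cc
    return signal + rng.standard_normal(len(signal))
```

Varying `k` changes the spatial range of the simulated interactions, while the kernel-based SVCA term is not tied to any fixed neighbor count.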
We then simulated mis-segmentation between neighboring cells. For every cell in the image, two cells were chosen as mis-segmented with the focal cell. The probability for a cell $j$ to be mis-segmented with a cell $i$ was taken from a probability vector that decreases with the distance $d_{ij}$ between the two cells.
This models our assumption that the closer two cells are, the more likely they are to be mis-segmented.
The expression profile $\mathbf{y}$ was then perturbed by mis-segmentation by combining each cell's expression level $y_i$ with $\bar{y}_i$, the mean of the expression profile in the cells that are mis-segmented with cell $i$. Simulations were then performed while varying the relative effect of mis-segmentation.
The matrix $X$ was perturbed in the same way.
This models our assumption that all genes are affected in the same way by mis-segmentation. This is reasonable but does not account for the different subcellular localization of genes.
We then compared SVCA to four simpler models using data generated from the simulation setting described above. All models accounted for a cell-intrinsic effect on the expression level of the simulated gene and a cell-cell interaction effect. The first three models were linear regressions with ridge regularization. The regularization coefficient was learned by cross-validation using the RidgeCV function from the scikit-learn package (Pedregosa et al., 2011) with default parameters.
In all three linear regression models, the intrinsic effect was modeled as a linear combination of the expression profile of all genes measured in the cell, excluding the gene of interest. The three models differed in how they accounted for cell-cell interactions. The first model used all cells in the image, with the impact of each cell weighted by a function $w(d_{ij})$ of its distance $d_{ij}$ to the focal cell. The second model considered the average expression profile of the 5 nearest neighboring cells. The third model took a weighted average of these five nearest neighbors with the same weighting function $w$.
The fourth model which was compared to SVCA was a reduced GP model containing all the covariance terms of SVCA apart from the local effect.
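The three ridge baselines differ only in how the neighborhood block of the design matrix is built. This sketch of that construction assumes a squared exponential weighting function w(d) = exp(−d²/(2ℓ²)); the exact function is an assumption here. The helper name is hypothetical. Each returned design matrix could then be passed to scikit-learn's `RidgeCV` with default parameters, as described above.

```python
import numpy as np


def baseline_features(X, coords, length_scale, k=5):
    """Design matrices for the three ridge baselines (hypothetical helper).

    X: (N, G-1) expression of all genes except the gene of interest.
    Each design concatenates the intrinsic features X with one neighborhood block.
    """
    d2 = np.sum((coords[:, None, :] - coords[None, :, :]) ** 2, axis=-1)
    # Assumed weighting function w(d) = exp(-d^2 / (2 * length_scale^2)).
    W = np.exp(-d2 / (2.0 * length_scale ** 2))
    np.fill_diagonal(W, 0.0)

    # Baseline 1: all cells in the image, weighted by distance.
    f_all = W @ X

    # Baseline 2: unweighted mean of the k nearest neighbors.
    d2_no_self = d2.copy()
    np.fill_diagonal(d2_no_self, np.inf)
    nn = np.argsort(d2_no_self, axis=1)[:, :k]  # (N, k) neighbor indices
    f_mean = X[nn].mean(axis=1)                 # (N, G-1)

    # Baseline 3: distance-weighted mean of the same k nearest neighbors.
    w_nn = np.take_along_axis(W, nn, axis=1)    # (N, k) weights
    f_wmean = np.einsum("nk,nkg->ng", w_nn, X[nn]) / w_nn.sum(axis=1, keepdims=True)

    return [np.hstack([X, f]) for f in (f_all, f_mean, f_wmean)]
```

Because the weights in baseline 3 are normalized per cell, very distant neighbors whose weights underflow to zero would cause a division by zero. A larger length scale avoids this.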
Data Processing and Experimental Procedures
Imaging Mass Cytometry (IMC) Data
With IMC, the analyzed tissue or cell culture is laser-ablated on a subcellular-resolution grid of so-called voxels of dimension 1 μm × 1 μm. The material from every voxel of this grid is then analyzed by mass cytometry (CyTOF), an antibody-based method. This yields counts for 26 proteins per voxel, which can be aggregated into single-cell counts after cell segmentation (Giesen et al., 2014, Sommer et al., 2011, Schüffler et al., 2015, Carpenter et al., 2006). We analyzed a dataset of 46 breast cancer biopsies from 23 patients imaged with Imaging Mass Cytometry (Schapiro et al., 2017). Six images were removed from the original dataset because one or more markers had zero variance. 38 of these images are associated with clinical data.
These images contain between 267 and 1,455 cells, with an average of around 900 cells. Counts for the 26 proteins are quantified at a subcellular level (between 10 and 100 pixels, i.e., measurements, per cell).
The single cell expression levels were computed by taking the median protein count across pixels.
Data processing
In all cases, the data were then transformed with Anscombe's transformation for variance stabilization of negative binomial data (Anscombe, 1948). The dispersion parameter of the negative binomial model is optimized with gradient descent, and a log transformation is then applied to the data.
The resulting signal is then normalized by regressing out the log of the total signal in the cell. This last step aims at accounting for local batch effects that would make some cells “brighter” overall.
Before fitting SVCA, the stabilized expression profile of the target gene is rank-standardized and transformed into normally distributed values using the probit function. Because the transformed data are less sensitive to outliers, this makes the fitting process more robust.
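The two normalization steps can be sketched for a single gene as follows. The sketch assumes an ordinary least-squares fit with an intercept for the regression on log total signal. It also assumes the common (r − 0.5)/N offset to map ranks into (0, 1) before applying the probit function. Both choices, and the function names, are assumptions. Ties are broken by position here; `scipy.stats.rankdata` would average them instead.

```python
import numpy as np
from statistics import NormalDist


def regress_out_log_total(expr, total_signal):
    """Residuals of a linear regression of expr on log(total signal per cell).

    expr: (N,) stabilized expression of one gene; total_signal: (N,), positive.
    """
    design = np.column_stack([np.ones(len(expr)), np.log(total_signal)])
    coef, *_ = np.linalg.lstsq(design, expr, rcond=None)
    return expr - design @ coef


def rank_probit(values):
    """Rank-based inverse normal transform: ranks -> (r - 0.5) / N -> probit."""
    n = len(values)
    ranks = np.argsort(np.argsort(values)) + 1  # 1..n; ties broken by position
    quantiles = (ranks - 0.5) / n
    probit = NormalDist().inv_cdf  # standard normal quantile function
    return np.array([probit(q) for q in quantiles])
```

After `rank_probit`, the values follow the standard normal quantiles regardless of the original distribution, so a few extreme cells cannot dominate the likelihood.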
Analysis
We fitted SVCA on all processed IMC images independently and validated the results using 5-fold cross-validation, as described in the Out of Sample Prediction on Real Data section. We then performed principal component analysis on all SVCA variance signatures and used the ClusterSignificance R package to quantify tumor grade separation in the principal component space, as explained in the Downstream Analysis section.
mer-FISH and seq-FISH Data
Although mer-FISH and seq-FISH techniques differ slightly, the data they produce and that are available online (Moffitt et al., 2016, Shah et al., 2017) come in a similar format. Briefly, the data consist of a list of detected individual RNA molecules. Each molecule is associated with a precise position in the tissue and the index of the cell it belongs to (obtained by automatic cell segmentation). Summarizing these data into molecule counts at the single-cell level is therefore straightforward.
We analyzed a mer-FISH dataset of 20 images taken on a single plate of breast cancer cell culture. Each image contained between 2500 and 2900 cells and 130 genes were measured. Additionally, we analyzed a seqFISH dataset consisting of 20 images of a single mouse hippocampus (Shah et al., 2017). The images were taken in different regions of the hippocampus and 249 genes were measured.
Analysis
We fitted SVCA on all processed seq-FISH images independently and validated the results using 5-fold cross-validation, as described in the Out of Sample Prediction on Real Data section. We then performed principal component analysis on all SVCA variance signatures and gene set enrichment analysis for the genes with larger cell-cell interaction components, as described in the Downstream Analysis section.
Downstream Analysis
Gene Categories Enrichment in seqFISH
The statistical significance of the enrichment of gene categories for the cell-cell interaction and intrinsic components was assessed using a permutation strategy similar to the one used in GSEA (Subramanian et al., 2005, Mootha et al., 2003):
- Genes were ranked based on the size of the tested variance component (cell-cell interactions or intrinsic effect).
- A GSEA-like trace was computed for each gene category, and the height of this trace was used as the test statistic.
- Gene names were permuted 10,000 times to estimate an empirical p value for this statistic.
- p values were adjusted for multiple testing using the Benjamini-Hochberg procedure (Benjamini and Hochberg, 1995).
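The permutation test for one gene category can be sketched as follows. The sketch assumes an unweighted running-sum trace, with steps of +1/n_hit for genes in the category and −1/n_miss otherwise, whose maximum is taken as its height. GSEA as described by Subramanian et al. (2005) uses a weighted variant. The function names are hypothetical.

```python
import numpy as np


def running_sum_height(in_set_ranked):
    """Maximum of an unweighted running-sum trace over a ranked gene list.

    Steps up by 1/n_hit at genes in the category and down by 1/n_miss
    otherwise; requires at least one gene inside and one outside the set.
    """
    n = in_set_ranked.size
    n_hit = int(in_set_ranked.sum())
    steps = np.where(in_set_ranked, 1.0 / n_hit, -1.0 / (n - n_hit))
    return np.cumsum(steps).max()


def category_enrichment(component, in_set, n_perm=10_000, seed=0):
    """Empirical p value for one gene category and one variance component."""
    ranked = in_set[np.argsort(-component)]  # genes by decreasing component
    observed = running_sum_height(ranked)
    rng = np.random.default_rng(seed)
    # Permuting gene names amounts to shuffling category membership along the ranking.
    null = np.array([running_sum_height(rng.permutation(ranked)) for _ in range(n_perm)])
    p_value = (1 + np.sum(null >= observed)) / (n_perm + 1)
    return observed, p_value
```

The resulting p values across categories would then be adjusted with the Benjamini-Hochberg procedure.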
Grade Separation in Principal Component Analysis
We used the Mlp method of the ClusterSignificance package in R (Serviss et al., 2017) to quantify the separation between grades 1 and 3 in the PCA projection of the SVCA signatures.
Briefly, the package computes the centroids of grade 1 and grade 3 tumors in principal component space. It then projects the samples onto the line passing through the two centroids, providing a one-dimensional representation of the samples. Finally, it computes a separation score between grade 1 and grade 3 tumors, using the perpendicular line that best separates the two classes in this one-dimensional representation.
Finally, ClusterSignificance uses permutations to compute the null distribution of the score defined above and deduce a quantile-based p value for the separation between grade 1 tumors and grade 3 tumors.
Quantification and Statistical Analysis
Significance of Variance Components
This section describes the procedure to assess the significance of the cell-cell interaction component, as this is the variance component of main interest for our study. The significance of other variance components in SVCA can be assessed analogously.
The significance of the cell-cell interaction component is assessed based on the log likelihood ratio (LLR) between the full SVCA model and a reduced model omitting the cell-cell interaction component. Because the reduced model is nested in the full model, we rely on Wilks' theorem (Wilks, 1938). According to this theorem, the LLR statistic is expected to follow a $\chi^2$ distribution if the null hypothesis (no cell-cell interactions) is true. In practice, we calibrate this distribution by fitting its parameter to an empirical null distribution of LLRs obtained from simulations (Bůžková et al., 2011, Casale et al., 2017).
The simulation procedure is as follows. For all proteins and all images, we fitted the null model (no cell-cell interactions) and simulated data from the fitted normal distribution. We simulated 100 datasets for each test, computed their LLRs, and fitted a $\chi^2$ distribution to those LLRs using an off-the-shelf nonlinear optimization method (“Package ‘nloptr’,” CRAN, n.d.). We then compared the LLR obtained on the real data for each protein and each image to the corresponding fitted distribution and estimated p values from this comparison.
For every test, we computed a cell-cell interaction p value using the method described above and used the Benjamini-Hochberg procedure (Benjamini and Hochberg, 1995) to adjust p values for multiple testing. For each protein, we then counted the number of images in which the cell-cell interaction component was significant at an FDR threshold of 1%.
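The calibration and multiple-testing steps can be sketched as follows. This sketch assumes that the test statistic is twice the log likelihood difference. It also assumes that the null distribution is a scaled χ² with one degree of freedom, whose scale is fitted to the simulated null statistics by moment matching. This parametrization and fitting method stand in for the nloptr-based fit described above and do not reproduce it. The function names are hypothetical.

```python
import numpy as np
from scipy.stats import chi2


def calibrated_pvalue(stat_obs, stat_null):
    """p value of an observed statistic under a scaled chi2(1) null.

    Statistics are 2 * (loglik_full - loglik_reduced). The scale a of
    a * chi2(1) is fitted by moment matching, since its mean equals a.
    """
    stat_null = np.clip(np.asarray(stat_null, dtype=float), 0.0, None)
    scale = stat_null.mean()
    return chi2.sf(max(stat_obs, 0.0) / scale, df=1)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p values."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)
    ranked = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity, moving from the largest p value downward.
    q_sorted = np.minimum.accumulate(ranked[::-1])[::-1]
    q = np.empty(m)
    q[order] = np.minimum(q_sorted, 1.0)
    return q
```

A test is then called significant when its adjusted p value falls below the chosen FDR threshold (1% in the analysis above). If every simulated null statistic is zero, the fitted scale is zero and this sketch would divide by zero, so such cases need separate handling.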
P Value Calibration for the Cell-Cell Interaction Component
We used simulations to assess the p value calibration for the cell-cell interaction component. For 11 random images and all 26 proteins of the IMC dataset, we simulated from the null model (SVCA without the cell-cell interaction term, with parameters fitted to the data). We then used the procedure described above to compute p values under the null and computed the empirical false positive rate for multiple p value thresholds (Figure 2).
Error Bars and Boxplots
Error bars correspond to plus and minus one standard deviation across images and proteins (Figure 3). The lower and upper hinges of boxplots correspond to the 25th and 75th percentiles. The lower and upper whiskers extend from the hinges to the most extreme values no further than 1.5 × IQR (interquartile range) from the lower and upper hinge, respectively.
Data and Software Availability
An open source implementation of SVCA is available at https://github.com/damienArnol/svca, which builds on the limix package (Lippert et al., 2014).
Acknowledgments
D.A., J.S.-R., and O.S. acknowledge EMBL core funding. D.S. was supported by the Forschungskredit of the University of Zurich grant FK-74419-01-01, and the BioEntrepreneur-Fellowship of the University of Zurich, reference BIOEF-17-001. B.B.’s research is funded by an SNSF R’Equip grant; an SNSF Assistant Professorship grant; the SystemsX Transfer Project “Friends and Foes;” the SystemsX MetastasiX and PhosphoNetX grant; NIH grant UC4 DK108132; and the European Research Council (ERC) under the European Union’s Seventh Framework Program (FP/2007-2013)/ERC grant agreement 336921. We thank R. Argelaguet, V. Svensson, R. Vento, H. Jackson, A. Baud, N. Cai, F.P. Casale, and D. Horta for discussions on data processing, model design, implementation, and results visualization. We thank A. César Razquin and E. Girardi for insights into SLCs.
Author Contributions
D.A. and O.S. developed the statistical method. D.A. implemented the model and analyzed all the data. D.S. and B.B. contributed to the interpretation of the results. D.A., D.S., O.S., and J.S.-R. wrote the manuscript with input from all authors. J.S.-R. and O.S. conceived the project and supervised the work.
Declaration of Interests
The authors declare no competing interests.
Published: October 1, 2019
Footnotes
Supplemental Information can be found online at https://doi.org/10.1016/j.celrep.2019.08.077.
Contributor Information
Julio Saez-Rodriguez, Email: julio.saez@bioquant.uni-heidelberg.de.
Oliver Stegle, Email: oliver.stegle@embl.de.
Supplemental Information
The first column corresponds to the gene name as given in Shah et al. (2017). The second column corresponds to the manual classification of the gene.
References
- Achim K., Pettit J.-B., Saraiva L.R., Gavriouchkina D., Larsson T., Arendt D., Marioni J.C. High-throughput spatial mapping of single-cell RNA-seq data to tissue of origin. Nat. Biotechnol. 2015;33:503–509. doi: 10.1038/nbt.3209.
- Aichler M., Walch A. MALDI Imaging mass spectrometry: current frontiers and perspectives in pathology research and practice. Lab. Invest. 2015;95:422–431. doi: 10.1038/labinvest.2014.156.
- Angelo M., Bendall S.C., Finck R., Hale M.B., Hitzman C., Borowsky A.D., Levenson R.M., Lowe J.B., Liu S.D., Zhao S. Multiplexed ion beam imaging of human breast tumors. Nat. Med. 2014;20:436–442. doi: 10.1038/nm.3488.
- Angulo M.C., Kozlov A.S., Charpak S., Audinat E. Glutamate released from glial cells synchronizes neuronal activity in the hippocampus. J. Neurosci. 2004;24:6920–6927. doi: 10.1523/JNEUROSCI.0473-04.2004.
- Anscombe F.J. The Transformation of Poisson, Binomial and Negative-Binomial Data. Biometrika. 1948;35:246–254.
- Ayuob N.N., Ali S.S. Cell-cell interactions and cross talk described in normal and disease conditions: Morphological approach. In: Gowder S., editor. Cell Interaction. InTech; 2012. https://www.intechopen.com/books/cell-interaction/cell-cell-interactions-and-cross-talk-described-in-normal-and-disease-conditions-morphological-appro
- Banerji S., Ni J., Wang S.X., Clasper S., Su J., Tammi R., Jones M., Jackson D.G. LYVE-1, a new homologue of the CD44 glycoprotein, is a lymph-specific receptor for hyaluronan. J. Cell Biol. 1999;144:789–801. doi: 10.1083/jcb.144.4.789.
- Battich N., Stoeger T., Pelkmans L. Image-based transcriptomics in thousands of single human cells at single-molecule resolution. Nat. Methods. 2013;10:1127–1133. doi: 10.1038/nmeth.2657.
- Battich N., Stoeger T., Pelkmans L. Control of Transcript Variability in Single Mammalian Cells. Cell. 2015;163:1596–1610. doi: 10.1016/j.cell.2015.11.018.
- Baud A., Mulligan M.K., Casale F.P., Ingels J.F., Bohl C.J., Callebert J., Launay J.-M., Krohn J., Legarra A., Williams R.W., Stegle O. Genetic Variation in the Social Environment Contributes to Health and Disease. PLoS Genet. 2017;13:e1006498. doi: 10.1371/journal.pgen.1006498.
- Benjamini Y., Hochberg Y. Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing. J. R. Stat. Soc. Series B Stat. Methodol. 1995;57:289–300.
- Bodenmiller B. Multiplexed Epitope-Based Tissue Imaging for Discovery and Healthcare Applications. Cell Syst. 2016;2:225–238. doi: 10.1016/j.cels.2016.03.008.
- Brakebusch C., Fässler R. The integrin-actin connection, an eternal love affair. EMBO J. 2003;22:2324–2333. doi: 10.1093/emboj/cdg245.
- Buettner F., Natarajan K.N., Casale F.P., Proserpio V., Scialdone A., Theis F.J., Teichmann S.A., Marioni J.C., Stegle O. Computational analysis of cell-to-cell heterogeneity in single-cell RNA-sequencing data reveals hidden subpopulations of cells. Nat. Biotechnol. 2015;33:155–160. doi: 10.1038/nbt.3102.
- Buettner F., Pratanwanich N., McCarthy D.J., Marioni J.C., Stegle O. f-scLVM: scalable and versatile factor analysis for single-cell RNA-seq. Genome Biol. 2017;18:212. doi: 10.1186/s13059-017-1334-8.
- Bůžková P., Lumley T., Rice K. Permutation and parametric bootstrap tests for gene-gene and gene-environment interactions. Ann. Hum. Genet. 2011;75:36–45. doi: 10.1111/j.1469-1809.2010.00572.x.
- Carpenter A.E., Jones T.R., Lamprecht M.R., Clarke C., Kang I.H., Friman O., Guertin D.A., Chang J.H., Lindquist R.A., Moffat J. CellProfiler: image analysis software for identifying and quantifying cell phenotypes. Genome Biol. 2006;7:R100. doi: 10.1186/gb-2006-7-10-r100.
- Carpenter C.L. Actin cytoskeleton and cell signaling. Crit. Care Med. 2000;28(4, Suppl):N94–N99. doi: 10.1097/00003246-200004001-00011.
- Casale F.P., Horta D., Rakitsch B., Stegle O. Joint genetic analysis using variant sets reveals polygenic gene-context interactions. PLoS Genet. 2017;13:e1006693. doi: 10.1371/journal.pgen.1006693.
- Chang Q., Ornatsky O.I., Siddiqui I., Loboda A., Baranov V.I., Hedley D.W. Imaging Mass Cytometry. Cytometry A. 2017;91:160–169. doi: 10.1002/cyto.a.23053. [DOI] [PubMed] [Google Scholar]
- Chen K.H., Boettiger A.N., Moffitt J.R., Wang S., Zhuang X. RNA imaging. Spatially resolved, highly multiplexed RNA profiling in single cells. Science. 2015;348:aaa6090. doi: 10.1126/science.aaa6090. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Cheng J.-C., Chang H.-M., Fang L., Sun Y.-P., Leung P.C.K. TGF-β1 up-regulates connexin43 expression: a potential mechanism for human trophoblast cell differentiation. J. Cell. Physiol. 2015;230:1558–1566. doi: 10.1002/jcp.24902. [DOI] [PubMed] [Google Scholar]
- Chlon L., Markowetz F. Causal Modeling Dissects Tumour–Microenvironment Interactions In Breast Cancer. bioRxiv. 2017 [Google Scholar]
- Elston C.W., Ellis I.O. Pathological prognostic factors in breast cancer. I. The value of histological grade in breast cancer: experience from a large study with long-term follow-up. Histopathology. 1991;19:403–410. doi: 10.1111/j.1365-2559.1991.tb00229.x. [DOI] [PubMed] [Google Scholar]
- Franke W.W. Discovering the molecular components of intercellular junctions--a historical view. Cold Spring Harb. Perspect. Biol. 2009;1:a003061. doi: 10.1101/cshperspect.a003061. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Fricker M., Neher J.J., Zhao J.-W., Théry C., Tolkovsky A.M., Brown G.C. MFG-E8 mediates primary phagocytosis of viable neurons during neuroinflammation. J. Neurosci. 2012;32:2657–2666. doi: 10.1523/JNEUROSCI.4837-11.2012. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Fukumura D. Role of Microenvironment on Gene Expression, Angiogenesis and Microvascular Function in Tumors. In: Meadows G.G., editor. Integration/Interaction of Oncologic Growth. Springer; 2005. pp. 23–36. [Google Scholar]
- Gerdes M.J., Sevinsky C.J., Sood A., Adak S., Bello M.O., Bordwell A., Can A., Corwin A., Dinn S., Filkins R.J. Highly multiplexed single-cell analysis of formalin-fixed, paraffin-embedded cancer tissue. Proc. Natl. Acad. Sci. USA. 2013;110:11982–11987. doi: 10.1073/pnas.1300136110. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Giesen C., Wang H.A.O., Schapiro D., Zivanovic N., Jacobs A., Hattendorf B., Schüffler P.J., Grolimund D., Buhmann J.M., Brandt S. Highly multiplexed imaging of tumor tissues with subcellular resolution by mass cytometry. Nat. Methods. 2014;11:417–422. doi: 10.1038/nmeth.2869. [DOI] [PubMed] [Google Scholar]
- Goltsev Y., Samusik N., Kennedy-Darling J., Bhate S., Hale M., Vazquez G., Black S., Nolan G.P. Deep Profiling of Mouse Splenic Architecture with CODEX Multiplexed Imaging. Cell. 2018;174:968–981.e15. doi: 10.1016/j.cell.2018.07.010. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Hensman J., Fusi N., Lawrence N.D. Gaussian Processes for Big Data. arXiv. 2013 https://arxiv.org/abs/1309.6835 [Google Scholar]
- Iversen L. Neurotransmitter transporters and their impact on the development of psychopharmacology. Br. J. Pharmacol. 2006;147(Suppl 1):S82–S88. doi: 10.1038/sj.bjp.0706428. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Kamińska K., Szczylik C., Bielecka Z.F., Bartnik E., Porta C., Lian F., Czarnecka A.M. The role of the cell-cell interactions in cancer progression. J. Cell. Mol. Med. 2015;19:283–296. doi: 10.1111/jcmm.12408. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Kostem E., Eskin E. Improving the accuracy and efficiency of partitioning heritability into the contributions of genomic regions. Am. J. Hum. Genet. 2013;92:558–564. doi: 10.1016/j.ajhg.2013.03.010. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Lin J.-R., Fallahi-Sichani M., Sorger P.K. Highly multiplexed imaging of single cells using a high-throughput cyclic immunofluorescence method. Nat. Commun. 2015;6:8390. doi: 10.1038/ncomms9390. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Lin J.-R., Izar B., Mei S., Wang S., Shah P., Sorger P. A Simple Open-Source Method for Highly Multiplexed Imaging of Single Cells in Tissues and Tumours. bioRxiv. 2017 [Google Scholar]
- Lippert C., Casale F.P., Rakitsch B., Stegle O. LIMIX: Genetic Analysis of Multiple Traits. bioRxiv. 2014 [Google Scholar]
- Lisman J., Yasuda R., Raghavachari S. Mechanisms of CaMKII action in long-term potentiation. Nat. Rev. Neurosci. 2012;13:169–182. doi: 10.1038/nrn3192. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Lyssiotis C.A., Kimmelman A.C. Metabolic Interactions in the Tumor Microenvironment. Trends Cell Biol. 2017;27:863–875. doi: 10.1016/j.tcb.2017.06.003. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Mason S. Lactate Shuttles in Neuroenergetics-Homeostasis, Allostasis and Beyond. Front. Neurosci. 2017;11:43. doi: 10.3389/fnins.2017.00043. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Masson J., Sagné C., Hamon M., El Mestikawy S. Neurotransmitter transporters in the central nervous system. Pharmacol. Rev. 1999;51:439–464. [PubMed] [Google Scholar]
- Meugnier E., Rome S., Vidal H. Regulation of gene expression by glucose. Curr. Opin. Clin. Nutr. Metab. Care. 2007;10:518–522. doi: 10.1097/MCO.0b013e3281298fef. [DOI] [PubMed] [Google Scholar]
- Moffitt J.R., Hao J., Wang G., Chen K.H., Babcock H.P., Zhuang X. High-throughput single-cell gene-expression profiling with multiplexed error-robust fluorescence in situ hybridization. Proc. Natl. Acad. Sci. USA. 2016;113:11046–11051. doi: 10.1073/pnas.1612826113. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Mootha V.K., Lindgren C.M., Eriksson K.-F., Subramanian A., Sihag S., Lehar J., Puigserver P., Carlsson E., Ridderstråle M., Laurila E. PGC-1alpha-responsive genes involved in oxidative phosphorylation are coordinately downregulated in human diabetes. Nat. Genet. 2003;34:267–273. doi: 10.1038/ng1180. [DOI] [PubMed] [Google Scholar]
- Moreau H.D., Piel M., Voituriez R., Lennon-Duménil A.-M. Integrating Physical and Molecular Insights on Immune Cell Migration. Trends Immunol. 2018;39:632–643. doi: 10.1016/j.it.2018.04.007. [DOI] [PubMed] [Google Scholar]
- Neher J.J., Emmrich J.V., Fricker M., Mander P.K., Théry C., Brown G.C. Phagocytosis executes delayed neuronal death after focal brain ischemia. Proc. Natl. Acad. Sci. USA. 2013;110:E4098–E4107. doi: 10.1073/pnas.1308679110. [DOI] [PMC free article] [PubMed] [Google Scholar]
- O’Donnell M., Chance R.K., Bashaw G.J. Axon growth and guidance: receptor regulation and signal transduction. Annu. Rev. Neurosci. 2009;32:383–412. doi: 10.1146/annurev.neuro.051508.135614. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Oliva J.B., Dubey A., Wilson A.G., Poczos B., Schneider J., Xing E.P. Bayesian Nonparametric Kernel-Learning. In: Proceedings of the 19th International Conference on Artificial Intelligence and Statistics. 2016. pp. 1078–1086.
- Pedregosa F., Varoquaux G., Gramfort A., Michel V., Thirion B., Grisel O., Blondel M., Prettenhofer P., Weiss R., Dubourg V. Scikit-Learn: Machine Learning in Python. J. Mach. Learn. Res. 2011;12:2825–2830.
- Quiñonero-Candela J., Rasmussen C.E. A Unifying View of Sparse Approximate Gaussian Process Regression. J. Mach. Learn. Res. 2005;6:1939–1959.
- Rahimi A., Recht B. Random Features for Large-Scale Kernel Machines. In: Platt J.C., Koller D., Singer Y., Roweis S.T., editors. Advances in Neural Information Processing Systems. Vol. 20. Curran Associates; 2008. pp. 1177–1184.
- Rasmussen C.E., Williams C.K.I. Gaussian Processes for Machine Learning. MIT Press; 2006.
- Satija R., Farrell J.A., Gennert D., Schier A.F., Regev A. Spatial reconstruction of single-cell gene expression data. Nat. Biotechnol. 2015;33:495–502. doi: 10.1038/nbt.3192.
- Schapiro D., Jackson H.W., Raghuraman S., Fischer J.R., Zanotelli V.R.T., Schulz D., Giesen C., Catena R., Varga Z., Bodenmiller B. histoCAT: analysis of cell phenotypes and interactions in multiplex image cytometry data. Nat. Methods. 2017;14:873–876. doi: 10.1038/nmeth.4391.
- Schüffler P.J., Schapiro D., Giesen C., Wang H.A.O., Bodenmiller B., Buhmann J.M. Automatic single cell segmentation on highly multiplexed tissue images. Cytometry A. 2015;87:936–942. doi: 10.1002/cyto.a.22702.
- Schulz D., Zanotelli V.R.T., Fischer J.R., Schapiro D., Engler S., Lun X.-K., Jackson H.W., Bodenmiller B. Simultaneous Multiplexed Imaging of mRNA and Proteins with Subcellular Resolution in Breast Cancer Tissue Samples by Mass Cytometry. Cell Syst. 2018;6:25–36.e5. doi: 10.1016/j.cels.2017.12.001.
- Scialdone A., Natarajan K.N., Saraiva L.R., Proserpio V., Teichmann S.A., Stegle O., Marioni J.C., Buettner F. Computational assignment of cell-cycle stage from single-cell transcriptome data. Methods. 2015;85:54–61. doi: 10.1016/j.ymeth.2015.06.021.
- Searle S.R. Matrix Algebra Useful for Statistics. Wiley-Interscience; 1982.
- Serviss J.T., Gådin J.R., Eriksson P., Folkersen L., Grandér D. ClusterSignificance: a bioconductor package facilitating statistical analysis of class cluster separations in dimensionality reduced data. Bioinformatics. 2017;33:3126–3128. doi: 10.1093/bioinformatics/btx393.
- Shah S., Lubeck E., Zhou W., Cai L. seqFISH Accurately Detects Transcripts in Single Cells and Reveals Robust Spatial Organization in the Hippocampus. Neuron. 2017;94:752–758.e1. doi: 10.1016/j.neuron.2017.05.008.
- Shamah S.M., Lin M.Z., Goldberg J.L., Estrach S., Sahin M., Hu L., Bazalakova M., Neve R.L., Corfas G., Debant A., Greenberg M.E. EphA receptors regulate growth cone dynamics through the novel guanine nucleotide exchange factor ephexin. Cell. 2001;105:233–244. doi: 10.1016/s0092-8674(01)00314-2.
- Sieck G. Physiology in perspective: cell-cell interactions: the physiological basis of communication. Physiology (Bethesda) 2014;29:220–221. doi: 10.1152/physiol.00031.2014.
- Snelson E., Ghahramani Z. Sparse Gaussian processes using pseudo-inputs. In: Weiss Y., Schölkopf B., Platt J.C., editors. Advances in Neural Information Processing Systems. Vol. 18. MIT Press; 2006. pp. 1257–1264.
- Sommer C., Straehle C., Köthe U., Hamprecht F.A. Ilastik: Interactive Learning and Segmentation Toolkit. In: Proceedings of the 2011 IEEE International Symposium on Biomedical Imaging: From Nano to Macro. IEEE; 2011. pp. 230–233.
- Ståhl P.L., Salmén F., Vickovic S., Lundmark A., Navarro J.F., Magnusson J., Giacomello S., Asp M., Westholm J.O., Huss M. Visualization and analysis of gene expression in tissue sections by spatial transcriptomics. Science. 2016;353:78–82. doi: 10.1126/science.aaf2403.
- Strell C., Hilscher M.M., Laxman N., Svedlund J., Wu C., Yokota C., Nilsson M. Placing RNA in Context and Space - Methods for Spatially Resolved Transcriptomics. FEBS J. 2018;286:1468–1481. doi: 10.1111/febs.14435.
- Subramanian A., Tamayo P., Mootha V.K., Mukherjee S., Ebert B.L., Gillette M.A., Paulovich A., Pomeroy S.L., Golub T.R., Lander E.S., Mesirov J.P. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl. Acad. Sci. USA. 2005;102:15545–15550. doi: 10.1073/pnas.0506580102.
- Surguchov A., Palazzo R.E., Surgucheva I. Gamma synuclein: subcellular localization in neuronal and non-neuronal cells and effect on signal transduction. Cell Motil. Cytoskeleton. 2001;49:218–228. doi: 10.1002/cm.1035.
- Svensson V., Teichmann S.A., Stegle O. SpatialDE: Identification of spatially variable genes. Nat. Methods. 2018;15:343–346. doi: 10.1038/nmeth.4636.
- Vargas K.J., Schrod N., Davis T., Fernandez-Busnadiego R., Taguchi Y.V., Laugks U., Lucic V., Chandra S.S. Synucleins Have Multiple Effects on Presynaptic Architecture. Cell Rep. 2017;18:161–173. doi: 10.1016/j.celrep.2016.12.023.
- Varol C., Mildner A., Jung S. Macrophages: development and tissue specialization. Annu. Rev. Immunol. 2015;33:643–675. doi: 10.1146/annurev-immunol-032414-112220.
- Vitner E.B., Dekel H., Zigdon H., Shachar T., Farfel-Becker T., Eilam R., Karlsson S., Futerman A.H. Altered expression and distribution of cathepsins in neuronopathic forms of Gaucher disease and in other sphingolipidoses. Hum. Mol. Genet. 2010;19:3583–3590. doi: 10.1093/hmg/ddq273.
- Wagner A., Regev A., Yosef N. Revealing the vectors of cellular identity with single-cell genomics. Nat. Biotechnol. 2016;34:1145–1160. doi: 10.1038/nbt.3711.
- Wang Z.-W. Regulation of synaptic transmission by presynaptic CaMKII and BK channels. Mol. Neurobiol. 2008;38:153–166. doi: 10.1007/s12035-008-8039-7.
- Wilks S.S. The Large-Sample Distribution of the Likelihood Ratio for Testing Composite Hypotheses. Ann. Math. Stat. 1938;9:60–62.
- Yi F., Catudio-Garrett E., Gábriel R., Wilhelm M., Erdelyi F., Szabo G., Deisseroth K., Lawrence J. Hippocampal “cholinergic interneurons” visualized with the choline acetyltransferase promoter: anatomical distribution, intrinsic membrane properties, neurochemical characteristics, and capacity for cholinergic modulation. Front. Synaptic Neurosci. 2015;7:4. doi: 10.3389/fnsyn.2015.00004.
- Yoshikawa F., Sato Y., Tohyama K., Akagi T., Hashikawa T., Nagakura-Takagi Y., Sekine Y., Morita N., Baba H., Suzuki Y. Opalin, a transmembrane sialylglycoprotein located in the central nervous system myelin paranodal loop membrane. J. Biol. Chem. 2008;283:20830–20840. doi: 10.1074/jbc.M801314200.
Associated Data
Supplementary Materials
The first column corresponds to the gene name as given in Shah et al. (2017). The second column corresponds to the manual classification of the gene.
Data Availability Statement
An open source implementation of SVCA is available at https://github.com/damienArnol/svca, which builds on the limix package (Lippert et al., 2014).




