Computational analysis of super-resolved in situ sequencing data reveals genes modified by immune–tumor contact events

Michal Danino-Levi; Tal Goldberg; Maya Keter; Nikol Akselrod; Noa Shprach-Buaron; Modi Safra; Gonen Singer; Shahar Alon

doi:10.1261/rna.079801.123

2024 Jul;30(7):749–759. doi: 10.1261/rna.079801.123

PMCID: PMC11182005 PMID: 38575346

Abstract

Cancer cells can manipulate immune cells and escape from the immune system response. Quantifying the molecular changes that occur when an immune cell touches a tumor cell can increase our understanding of the underlying mechanisms. Recently, it became possible to perform such measurements in situ—for example, using expansion sequencing, which enabled in situ sequencing of genes with super-resolution. We systematically examined whether individual immune cells from specific cell types express genes differently when in physical proximity to individual tumor cells. First, we demonstrated that a dense mapping of genes in situ can be used for the segmentation of cell bodies in 3D, thus improving our ability to detect likely touching cells. Next, we used three different computational approaches to detect the molecular changes that are triggered by proximity: differential expression analysis, tree-based machine learning classifiers, and matrix factorization analysis. This systematic analysis revealed tens of genes, in specific cell types, whose expression separates immune cells that are proximal to tumor cells from those that are not proximal, with a significant overlap between the different detection methods. Remarkably, an order of magnitude more genes are triggered by proximity to tumor cells in CD8 T cells compared to CD4 T cells, in line with the ability of CD8 T cells to directly bind major histocompatibility complex (MHC) class I on tumor cells. Thus, in situ sequencing of an individual biopsy can be used to detect genes likely involved in immune–tumor cell–cell interactions. The data used in this manuscript and the code of the InSituSeg, machine learning, cNMF, and Moran's I methods are publicly available at doi:10.5281/zenodo.7497981.

Keywords: RNA imaging, cancer biology, cell–cell interactions, genomics, spatial genomics

INTRODUCTION

Communication between cancer cells and the different cell types that surround them, in particular immune cells, can inhibit or promote tumor proliferation (Nishida-Aoki and Gujral 2019). Therefore, the study of cellular interactions within tumor tissues is essential for understanding disease progression and the potential for treatment (Wang et al. 2018a). However, immune–tumor interactions in cancer tissue remain largely uncharacterized (Giladi et al. 2020). To obtain an in-depth characterization of immune–tumor cell–cell interactions, single-cell quantification is needed. Standard single-cell genomic technologies can profile each cell separately, but only after tissue dissociation, thereby losing all information on cell locations in general, and on cell–cell interactions in particular. Single-cell sequencing protocols can be modified to characterize immune–tumor interactions (e.g., using PIC-seq; Giladi et al. 2020). However, because this method uses small aggregates of cells, it is not trivial to reconstruct single-cell information (i.e., to accurately assign the sequenced genes in each aggregate to their cell of origin).

A direct quantification of cell–cell interactions between individual immune and tumor cells can be obtained via in situ approaches, which use imaging to assess the identity and location of expressed genes. Spatially resolved transcriptomics technologies, such as Slide-seq and spatial transcriptomics (ST), allow sequenced RNA fragments, potentially from all genes, to be mapped to their spatial location in human tissues and biopsies (Ståhl et al. 2016; Rodriques et al. 2019; Vickovic et al. 2019; Stickels et al. 2021). However, to date, these technologies do not allow the detection and quantification of single cells. This is mainly because the tissue is dissolved in the process, prohibiting the acquisition of cellular morphological features such as DAPI staining for the nucleus, and also because the resolution is not high enough for single-cell analysis. Although there are computational attempts to reconstruct single-cell information from these data (Elosua-Bayes et al. 2021; Rao et al. 2021), and to integrate these data with single-cell sequencing (Kleshchevnikov et al. 2020; Longo et al. 2021; Cable et al. 2022), accurate assignment of genes to single cells is still a challenge. Technologies based on multiplexed fluorescent in situ hybridization (FISH) allow measuring tens and even hundreds of genes in situ with single-cell resolution. These technologies include MERFISH (Moffitt et al. 2016) as well as SeqFISH, STARmap, ISS, RNAscope, and BOLORAMIS (Ke et al. 2013; Codeluppi et al. 2018; Wang et al. 2018b; Eng et al. 2019; Liu et al. 2021). A recent technology, termed expansion sequencing or ExSeq, allows in situ sequencing with super-resolution (Alon et al. 2021). Here, we use an ExSeq measurement of 297 genes in a human breast cancer biopsy to perform a new kind of analysis: quantification of gene expression modifications in single interacting cells in situ. We identify physically touching cells with super-resolution, quantify immune–tumor cell–cell interactions, and determine how an immune cell changes its gene expression profile when it is close to a tumor cell, and vice versa (Fig. 1).

FIGURE 1. — Overview of the detection of immune–tumor cross talk genes. First, the ExSeq images were segmented using InSituSeg. Next, cell typing was performed using the cells' expression profiles, clustered after dimension reduction, and displayed via uniform manifold approximation and projection (UMAP). Finally, cross talk genes were detected using differential expression, tree-based machine learning methods, and matrix factorization with cNMF (Kotliar et al. 2019). In the cNMF panel, a gene expression program (GEP) can define a cell type (blue = T cells, brown = tumor cells), or be proximity-related (yellow). In the schema, two GEPs represent cell types, and one GEP is triggered by proximity. The pie chart inside each cell describes its GEP usage.

RESULTS AND DISCUSSION

3D segmentation of single-cell bodies using in situ sequencing data

With the aim to characterize immune–tumor cell–cell interactions, we used a spatial data set of a core biopsy, 1347 × 621 × 8 µm in size, taken from a patient with metastatic breast cancer infiltration into the liver, and sequenced in situ via targeted ExSeq (Alon et al. 2021). With targeted ExSeq, a set of genes is selected and then oligonucleotide padlock probes bearing barcodes for each selected gene are hybridized to specific transcripts. These padlock probes are amplified in situ to generate amplicons for subsequent readout through in situ sequencing of the barcodes. The resulting sequenced amplicons (termed reads) give the precise location of the transcripts in situ. In this biopsy, 297 genes were characterized in situ with super-resolution (Supplemental Table S1), because of 3.3× physical expansion of the tissue. The interrogated genes included gene markers for cell types and genes known or suspected to be associated with cancer tissues (Materials and Methods).

We developed a pipeline for ascribing in situ sequencing reads to cell bodies, termed InSituSeg, which aids in pinpointing touching cells in 3D, even in a densely packed tumor tissue (Supplemental Figs. S1 and S2). The main idea of InSituSeg is to use the dense mapping of genes in situ for the segmentation of cell bodies in 3D (Fig. 2). Segmentation of cells is typically performed using only nuclei staining, without using information about RNA location (Stringer et al. 2021; Hollandi et al. 2022). This procedure does not maximize the number of sequencing reads assigned to cells, mainly because sequencing reads are often located in the cell soma outside the nucleus. In contrast to recent tools (Hu et al. 2021; Littman et al. 2021; Petukhov et al. 2022), InSituSeg does not use prior information about cell types, or even information about RNA identities (i.e., genes). Instead, it uses only imaging data resulting from a typical in situ sequencing experiment: DAPI staining and the locations of the sequenced RNA molecules (Fig. 2; Supplemental Figs. S3 and S4; Materials and Methods). InSituSeg is performed in 3D, which aids in the separation of cells that seem to be overlapping when looking only at the x–y plane (Supplemental Fig. S1); and therefore has an advantage compared to 2D watershed-based segmentation, which is performed on individual z-planes. Importantly, because cell type information is not used, InSituSeg can ascribe an atypical gene to a cell from a given cell type, and thus can possibly better represent the heterogeneity of individual cells.

FIGURE 2. — Scheme of the InSituSeg pipeline. The pipeline uses dense mapping of genes in situ for segmentation of cell bodies, using pixel intensity thresholding of 3D images. The input is a DAPI-stained image and the spatial locations of the mRNA molecules, and the output is a segmented 3D image with the mRNA assigned to each cell body. (A) Three cells are presented (columns). (I) After DAPI staining, the signal of the cell bodies is weaker compared to the strong nucleus staining of genomic DNA, but nevertheless can be clearly detected in the examined cells with pixel intensity thresholding (II). (III) A clear overlap between the hues observed in the DAPI image and the sequenced RNA (red dots, the DAPI intensities are shown in red–blue for better visualization). This overlap further confirms that the hues correspond to cell bodies. (B) The segmentation pipeline is composed of six steps (Materials and Methods): (I) Illumination correction. (II) Detection of nuclei voxels. (III) Refinement of nuclei voxels. (IV) Splitting of large nuclei. For example, the large putative nucleus marked by a yellow rectangle in III is split in IV into two nuclei. (V) Detection of cell body voxels using watershed segmentation. (VI) Assignment of mRNA molecules to cell bodies.

InSituSeg uses pixel intensity thresholding to reduce the strong nuclei staining of the DAPI and reveal residual DAPI staining in the cytosol (Fig. 2; Materials and Methods). Residual DAPI staining in the cytosol was demonstrated before in the context of multiplexed FISH imaging, and was termed “cytoDAPI” (Wang et al. 2021). This residual DNA staining can be a result of RNA staining (as suggested in Wang et al. 2021) or due to staining of rolonies (i.e., the padlocks that bind single-molecule RNA, after φ29 amplification). Rolonies might be double-stranded to some extent because of the limited template switching of φ29 (Ducani et al. 2014). Residual DAPI staining might also be influenced by cytoplasmic DNA, which is more prevalent in tumor cells (Anindya 2022). However, in our data tumor cells do not have on average larger residual DAPI staining in the cytosol compared to other cell types (Supplemental Fig. S5). Details about the parameters used by InSituSeg and the sensitivity to fine tuning them are provided (Materials and Methods; Supplemental Fig. S6).

To test the performance of InSituSeg, we (1) showed that it is in agreement with manual segmentation (Supplemental Fig. S7A); (2) tested it on a different core biopsy that was analyzed by expansion sequencing (Supplemental Fig. S7B); and (3) demonstrated that it outperforms two recent neural network–based segmentation tools, ilastik (Berg et al. 2019) and Mesmer (Supplemental Fig. S7C; Greenwald et al. 2022). We note that in contrast to ilastik, Mesmer, and other recent segmentation tools, InSituSeg is specifically designed for in situ sequencing image data, and therefore InSituSeg is not a general-purpose segmentation tool. We also (4) showed that InSituSeg is superior to using RNA sequencing data alone for grouping reads into cells (i.e., without using nuclei information [Supplemental Fig. S7C; Materials and Methods]); (5) demonstrated that InSituSeg can be applied to in situ imaging data generated with MERFISH (Supplemental Fig. S8); and finally (6) estimated that InSituSeg captures between 65% and 71% of the cell body area, as determined via cytosolic and membrane staining (Supplemental Fig. S9). We further estimated that segmentation with InSituSeg can add 20% to the cell body volume, compared to nuclei segmentation alone (Supplemental Fig. S9).

Overall, the data set of the core biopsy contained 1,146,615 spatially resolved sequenced reads from 297 genes. Manual segmentation of nuclei using the tool VAST (Berger et al. 2018) resulted in 2395 cells (reporting only cells with at least 100 reads per cell), and 771,904 reads were assigned to them (Alon et al. 2021). In contrast, using InSituSeg, 2748 cells were detected, and 939,764 reads were assigned to them (again only cells containing at least 100 reads are reported). Thus, InSituSeg gives a 15% and 22% increase in the number of segmented cells and the number of assigned reads, respectively (Supplemental Fig. S10A), which can lead to better characterization of the molecular content of the cells. Moreover, the detection of cell bodies with InSituSeg, combined with the super-resolution of ExSeq, allowed us to pinpoint touching cells in 3D, as described below (Fig. 1, segmentation step; Supplemental Figs. S1 and S2).
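The reported gains follow directly from the counts given above. A minimal worked check, using only the numbers quoted in the text (the function name is illustrative and not part of InSituSeg):

```python
def percent_increase(before: int, after: int) -> float:
    """Relative increase of `after` over `before`, in percent."""
    return 100.0 * (after - before) / before


# Counts quoted in the text (cells with at least 100 reads).
cells_manual, cells_insituseg = 2395, 2748
reads_manual, reads_insituseg = 771_904, 939_764
total_reads = 1_146_615

cell_gain = percent_increase(cells_manual, cells_insituseg)
read_gain = percent_increase(reads_manual, reads_insituseg)
assigned_fraction = reads_insituseg / total_reads

print(f"cells: +{cell_gain:.1f}%, reads: +{read_gain:.1f}%")
print(f"fraction of all reads assigned by InSituSeg: {assigned_fraction:.2f}")
```

The unrounded gains are about 14.7% and 21.7%, which round to the 15% and 22% stated above.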

We next performed expression clustering on the InSituSeg results and compared it to manual segmentation of nuclei (Supplemental Fig. S10B,C; Alon et al. 2021). The analysis was done using principal component analysis (PCA)-based expression clustering of Seurat (Hao et al. 2021) and displayed using the UMAP representation (Materials and Methods; Becht et al. 2018). Overall, in both approaches, the expression clustering revealed the expected mixture of cell types, including tumor, immune (T cell, B cell, and macrophage), and fibroblast cell clusters (Supplemental Fig. S10B,C). However, with InSituSeg the higher number of reads assigned to cells allowed us to classify an additional tumor subtype, marked by the gene EPCAM. Finally, the transcriptionally defined cell clusters were mapped onto tissue context (Supplemental Fig. S10D,E).

Identification of genes involved in cell–cell interactions using differential expression

We next used the ExSeq data, after processing with InSituSeg and expression clustering, to characterize immune–tumor cell–cell interactions in situ. Specifically, we aimed to detect genes in a given cell type that are expressed differently when the cell is in proximity to another cell type. These genes can either be influenced by the proximity between the cells or contribute to the proximity occurring. We first used a differential expression approach (Materials and Methods): For any pair of cell clusters X and Y, cluster X was partitioned into two subsets: a subset of X cells that are proximal to Y cells (1), and a subset of X cells that are not proximal to Y cells (2). All differentially expressed genes between (1) and (2) were then detected using DESeq2 (Love et al. 2014). The resulting P-values were further validated using bootstrapping (Materials and Methods). Cell–cell proximity was estimated using the smallest Euclidean distance between the mRNA molecules in two adjacent cells, using the InSituSeg cell body segmentation. We set a threshold of 3 µm (before expansion) for that distance, and validated the robustness of the results to changes in this parameter (Supplemental Figs. S11 and S12). Taking advantage of the super-resolution, which is a result of the physical expansion in ExSeq, we examined distances between cell bodies down to half a micrometer (Supplemental Fig. S12), further increasing the likelihood that the cells are physically touching. The genes detected below as induced by proximity are consistent between the different distance cutoffs (Supplemental Fig. S12; Supplemental Table S3). The physical expansion of ExSeq also allows a large number of transcripts to be quantified together, because neighboring RNA molecules can be better resolved (Xia et al. 2019; Alon et al. 2021). We estimate that without expansion most amplified transcripts would not be resolved because of spatial overlap (Supplemental Fig. S13; Materials and Methods). This dramatic decrease in the number of resolved amplified transcripts would also have reduced the number of proximity-induced genes that can be detected (Supplemental Fig. S13E; Materials and Methods).
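The proximity criterion described above (smallest Euclidean distance between the mRNA molecules of two cells, compared with a cutoff given in pre-expansion micrometers) can be sketched as follows. This is a minimal illustration, not the published code: the function names and the toy coordinates are assumptions, and the brute-force pairwise search stands in for whatever spatial index a real pipeline would use on thousands of reads per cell.

```python
import math
from itertools import product

# An (x, y, z) read position in pre-expansion micrometers.
Point = tuple[float, float, float]


def min_read_distance(reads_a: list[Point], reads_b: list[Point]) -> float:
    """Smallest Euclidean distance between any read in cell A and any read in cell B."""
    return min(math.dist(p, q) for p, q in product(reads_a, reads_b))


def are_proximal(reads_a: list[Point], reads_b: list[Point], cutoff_um: float = 3.0) -> bool:
    """True when the closest pair of reads is within the cutoff (3 µm by default)."""
    return min_read_distance(reads_a, reads_b) <= cutoff_um


# Toy coordinates (hypothetical, for illustration only).
t_cell = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
tumor_near = [(3.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
tumor_far = [(20.0, 5.0, 1.0)]

print(are_proximal(t_cell, tumor_near))                 # closest pair is 2 µm apart
print(are_proximal(t_cell, tumor_near, cutoff_um=0.5))  # stricter cutoff
print(are_proximal(t_cell, tumor_far))
```

Lowering `cutoff_um` toward 0.5 µm mirrors the robustness check in the text: fewer cell pairs qualify, but those that do are more likely to be physically touching.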

We systematically examined all possible interactions between tumor (five cell clusters, Supplemental Fig. S10C) and nontumor cell types (seven cell clusters, Supplemental Fig. S10C), overall 108 comparisons (Materials and Methods). We accounted for multiple testing using a Benjamini–Hochberg false discovery rate (FDR) of 0.1. The systematic search resulted in 11.8 genes, on average, detected as differentially expressed in the 108 comparisons performed (Fig. 3; Supplemental Figs. S14 and S15). Note that in the bootstrapped realizations, which are used to compute the P-values (Materials and Methods), on average fewer than one gene was detected as proximity-induced. This is true for the original cutoff distance of 3 µm, as well as for all the other cutoff distances down to 0.5 µm. An example of a proximity-induced gene is thymosin β 4 X-linked (TMSB4X), which is involved in the organization of the cytoskeleton and is overexpressed when CD3D⁺ T cells are in proximity to tumor cells in general, compared to CD3D⁺ T cells that are not proximal (Supplemental Fig. S14). This gene is also up-regulated when T cells in general are proximal to EPCAM-positive tumor cells, when CD8A⁺ T cells are proximal to tumor cells in general, and when T cells in general are proximal to CD44⁺ tumor cells (Supplemental Fig. S14). Consequently, comparing all T cells that are proximal to tumor cells in general to T cells that are not proximal also reveals that this gene is overexpressed (Fig. 3A; Supplemental Fig. S14). Interestingly, in the last few years, this gene was detected as up-regulated in breast cancer, and it was suggested that its expression correlates with poor prognosis (Zhang et al. 2017; Morita and Hayashi 2018). The data presented here might point to the exact settings in which this gene is up-regulated.
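For readers unfamiliar with the multiple-testing correction mentioned above, the Benjamini–Hochberg step-up procedure can be sketched in a few lines. This is a generic textbook implementation, not the code used in the study; the P-values in the example are made up for illustration.

```python
def benjamini_hochberg(p_values: list[float], fdr: float = 0.1) -> list[bool]:
    """Return, for each P-value, whether it is rejected at the given FDR level."""
    m = len(p_values)
    # Indices of the P-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k such that p_(k) <= (k / m) * fdr.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            max_rank = rank
    # Reject every hypothesis ranked at or below that k.
    rejected = [False] * m
    for idx in order[:max_rank]:
        rejected[idx] = True
    return rejected


# Hypothetical per-gene P-values for one comparison.
example_p = [0.001, 0.04, 0.03, 0.5, 0.02]
example_calls = benjamini_hochberg(example_p, fdr=0.1)
print(example_calls)
```

Note the step-up behavior: a P-value is rejected whenever some larger P-value still meets its own rank-scaled threshold, which is why the procedure is less conservative than a Bonferroni correction.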

FIGURE 3. — Examples of genes identified as induced in T cells when proximal to tumor cells. Sequencing read locations (red spots) of four induced genes are overlaid on the DAPI staining of the nuclei, as well as the segmentation of T cells (blue) and tumor cell types (yellow). The cell bodies were detected using InSituSeg, and the cell types were identified using clustering of the gene expression profiles. Only segmentations of T cells and tumor cells are presented. Genes up-regulated in T cells due to proximity to tumor cells have more red spots when proximal to tumor cells (exemplars in full red arrows vs. hollow red arrows). (A) The gene *TMSB4X* was detected by differential expression (DE), by matrix factorization (MF), and by machine learning (ML), when examining all T cells and all tumor cells. (B) The gene ribosomal protein SA (*RPSA*) was detected by DE, by ML, and by MF, when examining all T cells and all tumor cells. (C) The gene Complement Component 1, Q Subcomponent, A Chain (*C1QA*) was detected by DE when examining all T cells and all tumor cells. (D) The gene laminin subunit α 1 (*LAMA1*) was detected by DE and by ML, when examining the subtype T cell-CD3D and the subtype tumor-EPCAM. Each panel shows a subset region from the biopsy, acquired with a 40× objective, 100 × 100 µm in size (before expansion). Note that a max projection is shown and therefore some cells seem to overlap, but they are clearly separated in 3D (Supplemental Fig. S1). DE was performed with DESeq2 (Love et al. 2014), ML with CatBoost (Dorogush et al. 2018), and MF with cNMF (Kotliar et al. 2019). Permutation analysis was performed on all methods to assess statistical significance.

Identification of genes involved in cell–cell interactions using machine learning tools

We then applied supervised machine learning tools to identify genes whose expression separates, for cell type X, cells that are proximal to cell type Y from nonproximal cells. We focused on Decision Tree (Quinlan 1986), a classifier with a high level of interpretability, and on algorithms that are based on Decision Tree but have a lower level of interpretability, including Random Forest (Ho 2002), XGBoost (Chen and Guestrin 2016), and CatBoost (Dorogush et al. 2018). We designed and applied a machine learning pipeline (Supplemental Fig. S16; Materials and Methods) on each one of the 108 comparisons between nontumor cell types and tumor cell types as described above, using the same measure of physical proximity between cells (Materials and Methods; sensitivity test in Supplemental Fig. S17). The data for each comparison (i.e., the gene expression of the cells that are either proximal or not proximal to the other cell type) were randomly split into training and testing sets. The split was stratified so that the relative proportion of proximal and nonproximal cells was retained. The testing set was not used during the training phase, and on the training set we applied the stratified k-fold cross-validation strategy (Tan and Gilbert 2003). In most cases, the number of nonproximal cells was higher than the number of proximal cells; therefore, the data set was imbalanced and we used oversampling methods to correct this effect (Materials and Methods). We ran multiple combinations of the classifiers’ hyperparameters to find the best ones for each classification algorithm (“best model,” Materials and Methods). For each comparison, we determined the classifier with the best performance (best classifier), which was then applied to the test set.

When detecting genes that classify cells as proximal or nonproximal, the results are expected to be more robust when the number of cells, both proximal and nonproximal, is high. On the other hand, when studying a biopsy from an individual patient using spatially resolved transcriptomics, the overall number of cells studied is typically on the order of thousands (Supplemental Fig. S10E) or tens of thousands. In addition, refining the cell types in the comparison (e.g., studying subtypes of T cells and subtypes of tumor cells) is expected to produce more specific results but reduces the number of cells even further. Therefore, the number of cells fed to the classifier was not high overall (Supplemental Tables S4 and S5). Specifically, in most comparisons, the number of proximal cells per cross-validation fold was ∼10–20 (or the number of nonproximal cells in cases when their number is lower than that of proximal cells) (Supplemental Tables S4 and S5), making the detection of proximity-induced genes challenging with machine learning tools. Therefore, we (1) quantified the sensitivity of the results with respect to the initial random split between train and test data (Materials and Methods); this sensitivity is expected to be high due to the small number of cells; and (2) performed an additional evaluation of the performance of the classifiers using nonbiological realizations. These realizations were generated by using the same data set, but with the cell labels (proximal or nonproximal) shuffled such that they should not contain biological meaning (Materials and Methods). For the best classifier, we generated 30 nonbiological realizations for each comparison, and for each realization, the machine learning pipeline was rerun. We compared the results to 30 runs of the pipeline with the best classifier using the original (unshuffled) data, each run with a different initial random split between train and test data, and computed bootstrap P-values. We kept only the machine learning results that had a Benjamini–Hochberg FDR < 1 × 10⁻⁴. Finally, for comparisons that passed the aforementioned test, the best classifiers were applied to the complete data set (i.e., without splitting into train and test [Materials and Methods]).
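The data-handling steps of this pipeline (stratified train/test split, oversampling of the minority class in the training set only, and label shuffling for the nonbiological realizations) can be sketched with the standard library alone. This is a hedged illustration under stated assumptions: the 20% test fraction, the plain random duplication used for oversampling, and all function names are choices made here for clarity, and the study's actual methods are described in its Materials and Methods.

```python
import random


def stratified_split(labels: list[int], test_fraction: float = 0.2, seed: int = 0):
    """Split cell indices into train/test while keeping class proportions."""
    rng = random.Random(seed)
    train, test = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = max(1, round(len(idx) * test_fraction))
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return train, test


def oversample_minority(indices: list[int], labels: list[int], seed: int = 0) -> list[int]:
    """Duplicate randomly chosen minority-class cells until all classes are equal in size."""
    rng = random.Random(seed)
    by_class: dict[int, list[int]] = {}
    for i in indices:
        by_class.setdefault(labels[i], []).append(i)
    target = max(len(members) for members in by_class.values())
    balanced = []
    for members in by_class.values():
        balanced.extend(members)
        balanced.extend(rng.choices(members, k=target - len(members)))
    return balanced


def shuffled_labels(labels: list[int], seed: int) -> list[int]:
    """A 'nonbiological realization': same label counts, random assignment to cells."""
    rng = random.Random(seed)
    out = list(labels)
    rng.shuffle(out)
    return out


# Toy comparison: 10 proximal cells (label 1) and 40 nonproximal cells (label 0).
labels = [1] * 10 + [0] * 40
train_idx, test_idx = stratified_split(labels)
balanced_train = oversample_minority(train_idx, labels)
null_labels = [shuffled_labels(labels, seed=s) for s in range(30)]

print(len(train_idx), len(test_idx), len(balanced_train))
```

Oversampling is applied after the split so that duplicated cells never leak into the test set, which would otherwise inflate the measured classification performance.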

Interestingly, the CatBoost classification method was found to outperform the three other classification methods in all 108 comparisons. Overall, only 60 out of the 108 comparisons resulted in classifications that were deemed significant with FDR < 1 × 10⁻⁴ (Supplemental Tables S4 and S5). The top 10 features (i.e., genes) that give rise to the significant classifications are presented in Supplemental Figure S18. The detected genes can either be up-regulated or down-regulated because of the proximity between cells from different types (Supplemental Fig. S18). Note that errors that result from inaccurate boundary detection of two adjacent cells, or missegmentation, can lead to cases of false detection of proximity-induced genes. We filtered genes that were likely to be detected because of inaccurate boundary detection using two approaches (see Materials and Methods). However, some cases of missegmentation-based errors might still occur. For example, in Supplemental Figure S18, the gene Pecam1 (CD31) appears as proximity-induced in B cells when close to tumor-ALDH1A3. Although Pecam1 can be expressed in B cells, it is known to be expressed in endothelial cells, and therefore missegmentation of blood vessels might have contributed to this result. Likewise, while LYZ might be expressed in tumor cells in breast cancer (Vizoso et al. 2001), it is known to be expressed in immune cells, and therefore the detection of this gene as proximity-induced in tumor-PGR when close to T cells (Supplemental Fig. S18), might be due to missegmentation.

The differential expression approach and the machine learning classifiers are fundamentally different, so we did not expect a one-to-one agreement between the genes detected using both approaches. Nevertheless, many genes did overlap; comparing nontumor cell types that are proximal to tumor cell types versus nontumor cell types that are not proximal to tumor cells, out of 436 genes detected using differential expression, 61 genes were also detected using machine learning (Supplemental Table S6). The overlap is even more pronounced when examining the other direction (i.e., tumor cell types that are proximal to nontumor cell types versus tumor cell types that are not proximal to nontumor cells); out of 840 genes detected using differential expression, 166 genes were also detected using machine learning (Supplemental Table S6). Overall, in 56 out of the 108 comparisons performed between all tumor and immune cell types, genes were detected as proximity-related using both the differential expression approach and machine learning. Importantly, the overlap between the detected genes was statistically significant (P-value < 0.05, bootstrapping; Materials and Methods) in 37 out of these 56 comparisons (Supplemental Table S6). This overlap between the approaches provides additional support for the validity of the findings. Examining the genes detected by both differential expression and machine learning, taking T cells as an example, clearly shows the overexpression of these genes when proximal to tumor cells (Fig. 4).
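One way to judge whether an overlap between two gene lists is larger than expected by chance is to resample random gene sets of the same sizes from the gene panel. The sketch below illustrates that idea; it is a generic resampling test and may differ from the bootstrapping procedure in the Materials and Methods. The gene names `G0`…`G296` and both example lists are placeholders, not genes from the study.

```python
import random


def overlap_p_value(set_a: set[str], set_b: set[str], panel: list[str],
                    n_draws: int = 2000, seed: int = 0) -> float:
    """Empirical P-value: how often random sets of the same sizes overlap at least as much."""
    rng = random.Random(seed)
    observed = len(set_a & set_b)
    hits = 0
    for _ in range(n_draws):
        rand_a = set(rng.sample(panel, len(set_a)))
        rand_b = set(rng.sample(panel, len(set_b)))
        if len(rand_a & rand_b) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_draws + 1)


# Placeholder panel of 297 genes and two hypothetical detection results.
panel = [f"G{i}" for i in range(297)]
detected_de = set(panel[:12])    # e.g., 12 genes from differential expression
detected_ml = set(panel[6:16])   # e.g., 10 genes from machine learning, 6 shared

p_overlap = overlap_p_value(detected_de, detected_ml, panel)
print(f"shared genes: {len(detected_de & detected_ml)}, empirical P = {p_overlap:.4f}")
```

With 12 and 10 genes drawn from a panel of 297, the expected chance overlap is well below one gene, so sharing six genes yields a very small empirical P-value.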

FIGURE 4. — Overexpression of a group of genes in T cells when proximal to tumor cells. Six genes were detected as induced by both differential expression and machine learning when T cells are proximal to tumor cells: *RPSA*, *CD63*, *LYZ*, *TMSB4X*, *S100A14*, and *LAMA1*. (A) Sequencing reads locations (red spots) of these genes are overlaid on the DAPI staining of the nuclei, as well as the segmentation of T cells in blue and tumor cell types in yellow. The cell bodies were detected using InSituSeg, and the cell types were identified using clustering of the gene expression profiles. Only segmentations of T cells and tumor cells are presented. Genes up-regulated in T cells because of proximity to tumor cells have more red spots (overexpression) when proximal to tumor cells (exemplars in full red arrows vs. hollow red arrows). (B) The biopsy with DAPI staining. Each panel in A shows a max projection of a subset region from the biopsy (orange square in B), acquired with a 40× objective, 100 × 100 µm in size (before expansion).

An example of a gene detected using both differential expression analysis and machine learning is Keratin 19 (KRT19). KRT19 was detected using differential expression analysis as up-regulated in tumor cells proximal to T cells, compared to tumor cells not proximal to T cells (Supplemental Fig. S14). This gene was also detected using machine learning as the highest classification feature for all tumor cells proximal to all T cells versus tumor cells not proximal (Supplemental Fig. S18). This gene is also the second highest classification feature for all tumor cells proximal to CD8A⁺ T cells versus not proximal tumor cells, and the second highest classification feature for EGFR⁺ tumor cells proximal to CD8A⁺ T cells versus not proximal EGFR⁺ tumor cells (Supplemental Fig. S18). KRT19 is known to be important for the structural integrity of epithelial cells and is a marker gene for breast tumors (Saha et al. 2017). Our analysis pinpoints the settings in which this gene is up-regulated—namely, that this gene expression might be higher when tumor cells are proximal to T cells, and in particular to CD8A⁺ T cells.

Remarkably, a clear difference is observed between CD4 and CD8 T cells, in line with the ability of CD8 T cells to directly bind major histocompatibility complex (MHC) class I on tumor cells. Twelve and 10 genes were detected as overexpressed when CD8⁺ T cells were in proximity to tumor cells, using differential expression and machine learning analysis, respectively. In contrast, only one and zero such genes were detected when CD4⁺ T cells were in proximity to tumor cells, using differential expression and machine learning analysis, respectively (Supplemental Table S6). Likewise, 38 and 10 genes were detected as overexpressed when tumor cells were in proximity to CD8⁺ T cells, using differential expression and machine learning analysis, respectively. In contrast, only six and zero such genes were detected when tumor cells were in proximity to CD4⁺ T cells, using differential expression and machine learning analysis, respectively (Supplemental Table S6). Thus, although in CD8 T cells physical proximity to tumor cells triggers changes in gene expression in both the T cell and the tumor cell, in CD4 T cells the changes in gene expression might be more gradual with respect to the distance to tumor cells.

Identification of genes involved in cell–cell interactions using matrix factorization

We then applied matrix factorization to identify a battery of genes that change their expression together as a result of proximity between immune and tumor cells (Supplemental Fig. S19). We used cNMF (Kotliar et al. 2019) to discover gene signatures, termed gene expression programs (GEPs), which define cell types as well as cell states. Specifically, we examined whether GEPs can be proximity-related (i.e., can be overexpressed or underexpressed in a cell type as a result of physical distance from other cell types). To do so, we divided each nontumor cell type into two subgroups: cells proximal to tumor cells and cells that are not close to tumor cells (Materials and Methods). A similar analysis was performed in the other direction (i.e., tumor cells proximal or not proximal to immune cells). Then, for each GEP in each cell type, we compared the usage of that GEP in the proximal cells’ subgroup to the usage in the nonproximal subgroup and computed statistical significance using permutation analysis (Materials and Methods). This analysis revealed six GEPs that are induced by proximity to tumor cells (Supplemental Fig. S19; Supplemental Table S7). Importantly, in one such proximity-related GEP, expressed in T cells, eight out of the 15 genes in this GEP overlapped with the genes detected using differential expression (significant overlap, P-value < 0.01, bootstrapping; Supplemental Tables S6–S8). In addition, five out of the 15 genes in this GEP overlapped with the genes detected using machine learning (significant overlap, P-value < 0.01, bootstrapping; Supplemental Tables S6–S8). The overlaps found between these three computational approaches further support the possibility that the detected genes are indeed involved in cell–cell interactions between immune and tumor cell types (Figs. 3 and 4). Thus, the detected genes can potentially be markers for immune reactions toward tumor cells, or vice versa.
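The comparison of GEP usage between proximal and nonproximal cells can be illustrated with a simple two-sided permutation test on the difference in mean usage. This is a sketch under assumptions: the test statistic (difference of means), the function name, and the usage values are chosen here for illustration, and the study's own permutation analysis is specified in its Materials and Methods.

```python
import random
from statistics import mean


def usage_permutation_p(proximal: list[float], nonproximal: list[float],
                        n_perm: int = 2000, seed: int = 0) -> float:
    """Two-sided permutation P-value for the difference in mean GEP usage."""
    rng = random.Random(seed)
    observed = abs(mean(proximal) - mean(nonproximal))
    pooled = list(proximal) + list(nonproximal)
    n_prox = len(proximal)
    hits = 0
    for _ in range(n_perm):
        # Reassign cells to the two groups at random, keeping group sizes fixed.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_prox]) - mean(pooled[n_prox:]))
        if diff >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)


# Hypothetical usage of one GEP (fraction of each cell's expression explained by it).
usage_proximal = [0.62, 0.55, 0.70, 0.66, 0.58, 0.61]
usage_nonproximal = [0.20, 0.25, 0.18, 0.30, 0.22, 0.27, 0.24, 0.19]

p_usage = usage_permutation_p(usage_proximal, usage_nonproximal)
print(f"empirical P = {p_usage:.4f}")
```

Because the test only reshuffles group membership, it makes no distributional assumption about GEP usage values, which is convenient when subgroup sizes are small.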

Detection of proximity-induced genes as a function of the fraction of data used

We explored how the number of proximity-induced genes depends on the fraction of the data used, and on the number of nontumor cells adjacent to tumor cells, via a scale-down analysis (Materials and Methods). This analysis revealed a linear trend between the fraction of the data used and the number of proximity-induced genes detected in T cells (Supplemental Fig. S20). Importantly, a linear trend is also observed between the number of proximity-induced genes in T cells and the number of T cells and tumor cells that are adjacent to each other (Supplemental Fig. S20). The linear trend is also evident in other nontumor cell types (Supplemental Fig. S21). This trend suggests that a rational design of experiments aimed at detecting proximity-induced genes is feasible, given the number of adjacent cells present in the studied biopsy.
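
A scale-down analysis of this kind can be sketched in Python as repeated random subsampling of the cells followed by rerunning the detection step. This is a minimal sketch under stated assumptions: `scale_down` and `toy_detector` are hypothetical names, and `toy_detector` is a stand-in for whichever detection method is being evaluated, not one of the methods used in this study.

```python
import random

def scale_down(cells, count_detected, fractions, n_repeats=5, seed=0):
    """Average number of detected genes at each subsampled fraction of the data.

    cells: list of per-cell records of any type.
    count_detected: callable that returns the number of proximity-induced
        genes found in a subset of cells (hypothetical stand-in).
    """
    rng = random.Random(seed)
    curve = {}
    for fraction in fractions:
        k = max(1, round(fraction * len(cells)))
        counts = [count_detected(rng.sample(cells, k)) for _ in range(n_repeats)]
        curve[fraction] = sum(counts) / n_repeats
    return curve

def toy_detector(subset):
    # Toy assumption: one detected gene per ten cells, which gives a linear trend.
    return len(subset) // 10

cells = list(range(1000))
curve = scale_down(cells, toy_detector, [0.25, 0.5, 0.75, 1.0])
```

Plotting `curve` against the fraction of data used, or against the number of adjacent cells retained in each subsample, gives the kind of trend line discussed above.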

Detecting spatially dependent genes and cell types

We next examined whether the genes detected as involved in cell–cell interactions tend to be spatially dependent. For this, we implemented Moran's I measurement for segmentation-free detection of spatially dependent genes (Hao et al. 2021; Hu et al. 2021). Similar to a recent implementation (Miller et al. 2021), we account for nonuniform cell distribution in the tissue. Our implementation detects specific genes that have higher spatial dependence, relative to other expressed genes, by using the distribution of locations of all the genes in the tissue (Materials and Methods). For the Moran's I calculation, we automatically select the grid (spatial bins) that produces the most robust results for the spatially dependent genes (Materials and Methods).

Overall, 169 of the 297 interrogated genes were detected as spatially dependent (FDR < 0.01). Note that the selection of genes in the ExSeq gene panel potentially explains the large fraction of spatially variable genes. We ranked the genes by p-value, from smallest to largest, and examined the top detected genes; six of them are presented in Supplemental Figure S22: KIT, S100A8, SOX18, COBL, RPSA, and XBP1. RPSA was detected as regulated by proximity between T cells and tumor cells in the differential expression analysis, the machine learning analysis, and the cNMF analysis (Fig. 3B; Supplemental Figs. S14 and S18; Supplemental Table S6). The expression of RPSA is increased in many cancers, including breast cancer, and clinical trials are ongoing to test whether it can serve as a biomarker of tumor invasion in pancreatic ductal adenocarcinoma (clinical trials identifier [NCT number]: NCT04575363). The spatial dependence of this gene (Supplemental Fig. S22E), as well as its possible up-regulation due to proximity between T cells and tumor cells (Supplemental Figs. S14 and S18), suggests that this gene might serve as a biomarker in breast cancer as well. However, given that most of the examined genes were detected as spatially dependent, it is unlikely that involvement in cell–cell interactions is the main cause of the spatial dependence. We note that the spatial dependence of the genes cannot be fully explained by the uneven spatial distribution of cell types, because we detected genes that are spatially variable in spite (or in excess) of cell type spatial variability (Materials and Methods; Supplemental Fig. S23; Supplemental Table S9). Manual examination of the data did not reveal a clear link between the locations of genes that are spatially variable in excess of their cell type and the locations of the potentially interacting cell types (Supplemental Fig. S24).

Last, we also examined the spatial dependence of the cell types, revealing a clear difference between nontumor cell types and tumor cells (Supplemental Figs. S25 and S26). Segmentation-free detection opens the door to the analysis of several genes and cell types that might be interacting in specific locations in the tissue.

MATERIALS AND METHODS

Description of the data sets

Biopsies were collected from patients at Dana-Farber Cancer Institute and originally described in Alon et al. (2021). The sample used in this study was of a liver metastasis of hormone receptor–positive breast cancer. The region sequenced in situ with ExSeq was 1347 × 621 × 8 µm in size (before expansion). Full description of the biopsy and the 297 interrogated genes is in the Supplemental Information.

Segmentation of cell bodies

We developed a segmentation pipeline, termed InSituSeg, that takes advantage of the dense mapping of genes in situ for segmentation of cell bodies in 3D, using staining of cell nuclei and RNA locations (Fig. 2A). The steps of the segmentation pipeline (Fig. 2B) and its parameters (Supplemental Table S2) are described in the Supplemental Information.

Clustering segmented cells

To identify and cluster the segmented cells according to their expression pattern, we used the R toolkit Seurat (Hao et al. 2021) and followed the analysis in Alon et al. (2021). The procedure is described in the Supplemental Information.

Detecting differentially expressed genes

For any pair of cell clusters X and Y, cluster X was partitioned into two subsets: a subset of X cells that are proximal to Y cells, and a subset of X cells that are not proximal to Y cells. Comparisons were performed to observe differences in nontumor cell types when in proximity to tumor cell types, and vice versa. Gene expression change (fold change) and p-value per gene in each comparison were calculated using DESeq2 (Love et al. 2014), and we proceeded with genes that passed a Benjamini–Hochberg FDR threshold of 0.1. To further assess the statistical significance of the results, we used permutation analysis. To avoid errors that result from inaccurate boundary detection of two adjacent cells, we filtered genes in two different ways: (1) We filtered up-regulated genes detected in X cells if they are known cell markers for the Y cells (the known marker genes are listed in the Supplemental Information, section “Description of the data sets”). (2) We filtered genes detected in X cells (i.e., induced in the subset of X cells that are proximal to Y cells compared to the subset of X cells that are not proximal) if they are highly differentially expressed in the Y cells (i.e., induced in the subset of Y cells that are proximal to X cells compared to the subset of Y cells that are not proximal). A high degree of overlap exists between the two filtering methods; full details are in the Supplemental Information.
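
The multiple-testing correction and the first filter can be illustrated with a short Python sketch. It starts from per-gene p-values and fold changes such as those reported by DESeq2 and does not reimplement DESeq2 itself; the gene names, the record layout, and the function `select_genes` are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def select_genes(results, y_markers, fdr=0.1):
    """Keep genes passing the FDR threshold, dropping up-regulated Y markers."""
    adjusted = benjamini_hochberg([r["pvalue"] for r in results])
    kept = []
    for record, padj in zip(results, adjusted):
        if padj > fdr:
            continue
        if record["log2fc"] > 0 and record["gene"] in y_markers:
            continue  # likely spillover from an adjacent Y cell
        kept.append(record["gene"])
    return kept

# Hypothetical per-gene results for X cells proximal vs. nonproximal to Y cells.
results = [
    {"gene": "GENE_A", "log2fc": 1.2, "pvalue": 0.01},
    {"gene": "GENE_B", "log2fc": 0.8, "pvalue": 0.04},
    {"gene": "GENE_C", "log2fc": -0.9, "pvalue": 0.03},
    {"gene": "GENE_D", "log2fc": 0.1, "pvalue": 0.5},
]
adjusted = benjamini_hochberg([r["pvalue"] for r in results])
kept = select_genes(results, y_markers={"GENE_B"})
```

In this toy example, GENE_B passes the FDR threshold but is removed because it is an up-regulated marker of the Y cells, whereas GENE_D fails the threshold.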

Machine learning pipeline

We applied machine learning tools to detect genes whose expression separates, for cell type X, cells that are proximal to cell type Y from nonproximal cells. In contrast to the detection of differentially expressed genes described above, machine learning tools can detect genes that change their expression in concert because of the proximity between cells. Overall, four machine learning classifiers were applied to the data set: Decision Trees (Quinlan 1986), Random Forest (Ho 2002), XGBoost (Chen and Guestrin 2016), and CatBoost (Dorogush et al. 2018). To evaluate the performance of the classifiers, we first checked how sensitive the results are to the initial (random) decision of which part of the data set serves as the training set and which part serves as the test set. Then we compared the results obtained to the results for the same data set, but with the class labels shuffled such that it should not contain biological meaning. To avoid errors that result from inaccurate boundary detection of two adjacent cells, we filtered up-regulated genes detected in X cells if they are known cell markers for the Y cells (the known marker genes are listed in the Supplemental Information, section “Description of the data sets”). Full details are in the Supplemental Information.
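
The two evaluation checks (sensitivity to the train/test split and comparison with shuffled labels) can be illustrated with a self-contained Python sketch. For brevity it uses a depth-one decision tree (a decision stump) written from scratch rather than any of the four classifiers listed above, and it runs on synthetic counts; all names and data are hypothetical.

```python
import random

def fit_stump(X, y):
    """Pick the gene, threshold, and direction with the best training accuracy."""
    best = (-1.0, 0, 0.0, 1)  # accuracy, gene index, threshold, sign
    for gene in range(len(X[0])):
        for threshold in sorted({row[gene] for row in X}):
            pred = [int(row[gene] > threshold) for row in X]
            acc = sum(p == label for p, label in zip(pred, y)) / len(y)
            for sign, a in ((1, acc), (-1, 1 - acc)):
                if a > best[0]:
                    best = (a, gene, threshold, sign)
    return best[1:]

def predict(stump, X):
    gene, threshold, sign = stump
    # sign == 1 predicts "proximal" above the threshold, sign == -1 below it.
    return [int((row[gene] > threshold) == (sign == 1)) for row in X]

def holdout_accuracy(X, y, seed, train_frac=0.7):
    """Accuracy on a random held-out split; the seed controls the split."""
    rng = random.Random(seed)
    idx = list(range(len(y)))
    rng.shuffle(idx)
    cut = int(train_frac * len(idx))
    train, test = idx[:cut], idx[cut:]
    stump = fit_stump([X[i] for i in train], [y[i] for i in train])
    pred = predict(stump, [X[i] for i in test])
    return sum(p == y[i] for p, i in zip(pred, test)) / len(test)

# Synthetic counts: gene 0 is higher in proximal cells (label 1); genes 1-2 are noise.
rng = random.Random(0)
y = [1] * 200 + [0] * 200
X = [[rng.randint(5, 9) if label else rng.randint(0, 3),
      rng.randint(0, 9), rng.randint(0, 9)] for label in y]

acc_real = [holdout_accuracy(X, y, seed) for seed in range(3)]
y_shuffled = y[:]
rng.shuffle(y_shuffled)  # destroys the link between labels and expression
acc_shuffled = [holdout_accuracy(X, y_shuffled, seed) for seed in range(3)]
```

A classifier that captures a genuine proximity signal should score consistently well across splits on the real labels and fall to roughly chance level on the shuffled labels.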

cNMF analysis

We implemented cNMF (Kotliar et al. 2019) analysis with the aim of detecting a battery of genes that change their expression together as a result of proximity between immune and tumor cells. Using this analysis we discovered gene signatures—namely, gene expression programs (GEPs)—that define cell types as well as cell states. We examined whether GEPs can be overexpressed or underexpressed in a cell type as a result of physical distance from other cell types (“proximity-related” GEPs). Full details are in the Supplemental Information.

Quantifying the statistical significance of overlapping genes

We assessed the statistical significance of the overlapping genes between any two detection methods (differential expression, machine learning, and matrix factorization) with a bootstrapping approach (Supplemental Information).
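
As an illustration, the significance of an overlap between two gene lists can be estimated by repeatedly drawing random gene lists from the interrogated panel and counting how often the overlap is at least as large as the observed one. The Python sketch below makes this concrete under stated assumptions: the null model (uniform draws without replacement from the 297-gene panel) and the list size of 40 genes are hypothetical, and the exact hypergeometric tail is included only as a cross-check on the resampling estimate, not as the procedure used in this study.

```python
import random
from math import comb

def overlap_pvalue_resampling(n_universe, size_a, size_b, observed,
                              n_draws=2000, seed=0):
    """Fraction of random lists of size_b sharing >= observed genes with list A."""
    rng = random.Random(seed)
    list_a = set(range(size_a))  # gene identities are arbitrary under the null
    hits = 0
    for _ in range(n_draws):
        draw = rng.sample(range(n_universe), size_b)
        if sum(1 for gene in draw if gene in list_a) >= observed:
            hits += 1
    return (hits + 1) / (n_draws + 1)

def overlap_pvalue_hypergeom(n_universe, size_a, size_b, observed):
    """Exact upper-tail probability of the same null model."""
    total = comb(n_universe, size_b)
    tail = sum(comb(size_a, k) * comb(n_universe - size_a, size_b - k)
               for k in range(observed, min(size_a, size_b) + 1))
    return tail / total

# 15-gene GEP, 8 shared genes, 297-gene panel (from the text);
# a second list of 40 genes is a hypothetical size chosen for illustration.
p_resampled = overlap_pvalue_resampling(297, 15, 40, 8)
p_exact = overlap_pvalue_hypergeom(297, 15, 40, 8)
```

The resampling estimate cannot fall below 1/(n_draws + 1), so the number of draws sets the smallest p-value that can be reported.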

Moran's I calculation

We implemented Moran's I calculation in the context of spatially resolved transcriptomics. We compute a p-value for the spatial dependence of each gene by taking into account the locations of all the genes expressed in the tissue. p-values were estimated using bootstrapping. We also implemented Moran's I on the level of cells from a given cell type (i.e., for each cell type, we generate a p-value for the spatial dependence of the cells in the given type). Full details are in the Supplemental Information.
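
For orientation, the basic Moran's I statistic on a regular grid of spatial bins can be sketched in Python as follows. This is the textbook form with rook (edge-sharing) neighbors and unit weights; it does not include the correction for nonuniform cell distribution, the comparison against all expressed genes, or the automatic grid selection described above. The function names and the toy grids are hypothetical.

```python
def bin_counts(points, bin_size, n_rows, n_cols):
    """Count transcript locations (x, y) of one gene in square spatial bins."""
    grid = [[0] * n_cols for _ in range(n_rows)]
    for x, y in points:
        row, col = int(y // bin_size), int(x // bin_size)
        if 0 <= row < n_rows and 0 <= col < n_cols:
            grid[row][col] += 1
    return grid

def morans_i(grid):
    """Moran's I with unit weights between edge-sharing bins."""
    n_rows, n_cols = len(grid), len(grid[0])
    values = [v for row in grid for v in row]
    mean = sum(values) / len(values)
    denom = sum((v - mean) ** 2 for v in values)
    if denom == 0:
        raise ValueError("Moran's I is undefined for a constant grid")
    cross, weight_sum = 0.0, 0
    for r in range(n_rows):
        for c in range(n_cols):
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < n_rows and 0 <= cc < n_cols:
                    cross += (grid[r][c] - mean) * (grid[rr][cc] - mean)
                    weight_sum += 1
    return len(values) * cross / (weight_sum * denom)

# Toy grids: a checkerboard (perfect dispersion) and two halves (clustering).
checkerboard = [[(r + c) % 2 for c in range(4)] for r in range(4)]
halves = [[1, 1, 0, 0] for _ in range(4)]
i_checkerboard = morans_i(checkerboard)
i_halves = morans_i(halves)
example_grid = bin_counts([(0.5, 0.5), (1.5, 0.2), (0.1, 0.9)], 1.0, 2, 2)
```

Values near 1 indicate that neighboring bins carry similar counts, values near -1 indicate alternating counts, and values near 0 indicate no spatial dependence at the chosen bin size.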

DATA DEPOSITION

The data used in this manuscript and the code of the InSituSeg, machine learning, cNMF, and Moran's I methods are publicly available at doi:10.5281/zenodo.7497981. Test data for running the code are also provided in the same deposit.

SUPPLEMENTAL MATERIAL

Supplemental material is available for this article.

ACKNOWLEDGMENTS

This work was supported by the Israel Science Foundation (ISF) (grant numbers 2958/21 and 3363/21); Israel Cancer Association (ICA) (grant number 20220069); Israel Ministry of Science (grant number 2180); BrightFocus Foundation (grant number 929965); Joint Sheba-Bar Ilan Research Grant; and European Research Council (ERC) (grant number 101117324).

Footnotes

Article is online at http://www.rnajournal.org/cgi/doi/10.1261/rna.079801.123.

Freely available online through the RNA Open Access option.

MEET THE FIRST AUTHOR

Meet the First Author(s) is an editorial feature within RNA, in which the first author(s) of research-based papers in each issue have the opportunity to introduce themselves and their work to readers of RNA and the RNA research community. Michal Danino-Levi is the first author of this paper, “Computational analysis of super-resolved in situ sequencing data reveals genes modified by immune–tumor contact events.” Michal performed this work as a PhD student in Dr. Shahar Alon's Laboratory, Faculty of Engineering at Bar-Ilan University, Israel, specializing in computational analysis of biological data.

What are the major results described in your paper and how do they impact this branch of the field?

Cancer cells can manipulate immune cells and escape from the immune system response. Quantifying the molecular changes that occur when an immune cell touches a tumor cell can increase our understanding of the underlying mechanisms. In our work, we systematically examined whether individual immune cells from specific cell types express genes differently when in physical proximity to individual tumor cells. This systematic analysis revealed tens of genes, in specific cell types, whose expression separates immune cells that are proximal to tumor cells from those that are not proximal, with a significant overlap between different analysis methods. These findings advance our understanding of immune–tumor cell–cell interactions within cancer tissues, shedding light on the complex molecular mechanisms underlying disease progression. Furthermore, our study demonstrates the utility of in situ sequencing in detecting genes implicated in these interactions, offering promising avenues for future research and potential therapeutic interventions in cancer treatment.

What led you to study RNA or this aspect of RNA science?

During my search for a laboratory for an advanced degree, I grappled with the choice between genomics, especially RNA science, and signal and image processing, both of which intrigued me. Eventually, I decided to concentrate on the new field of spatial genomics, as it combines both. However, despite its great importance, there are gaps in our understanding, especially when examining RNA in 3D tissues, as we do in our laboratory. Additionally, although there is rapid progress in RNA-related technologies, the lack of robust computational tools for data analysis presents a significant challenge. These factors led me to choose RNA science as my study focus, driven by a passion for unraveling its mysteries and contributing to advancements in both knowledge and technology.

During the course of these experiments, were there any surprising results or particular difficulties that altered your thinking and subsequent focus?

During our experiments, we used three different computational methods to detect whether individual immune cells from specific cell types express genes differently when in physical proximity to individual tumor cells. To our surprise, we found that there were overlaps between the genes detected by these three computational methods, despite each one being based on a different approach.

What are some of the landmark moments that provoked your interest in science or your development as a scientist?

The moment that sparked the first light of my interest in science was during my final project for my bachelor's degree in biomedical engineering. This project focused on image processing on cerebral blood vessels from MRI images. I have always enjoyed creating and building new things, developing and promoting existing things, and thinking creatively. The final project required these abilities, to think of new and innovative solutions to the challenge at hand, and to express my thoughts, in addition to continuous self-learning of the tools that exist in the field. Through the project, I realized that I like research and the way of thinking that comes with it. For these reasons, I decided to experiment in new fields, pursue advanced degrees, and develop in the field of science.

What are your subsequent near- or long-term career plans?

In the near term, I plan to further investigate the spatial distribution of genes in tissues, continue to explore the interactions between cells in different tissues at a high resolution and in 3D, and develop new computational methods to analyze the innovative data generated in our laboratory. In the long term, regardless of the path I choose, what matters most to me is to continue pushing myself forward, facing new challenges, learning, staying curious, growing, and enjoying the journey.

REFERENCES

Alon S, Goodwin DR, Sinha A, Sinha A, Wassie AT, Chen F, Daugharthy ER, Bando Y, Kajita A, Xue AG, et al. 2021. Expansion sequencing: spatially precise in situ transcriptomics in intact biological systems. Science 371: eaax2656. doi:10.1126/science.aax2656
Anindya R. 2022. Cytoplasmic DNA in cancer cells: several pathways that potentially limit DNase2 and TREX1 activities. Biochim Biophys Acta Mol Cell Res 1869: 119278. doi:10.1016/j.bbamcr.2022.119278
Becht E, McInnes L, Healy J, Dutertre CA, Kwok IWH, Ng LG, Ginhoux F, Newell EW. 2018. Dimensionality reduction for visualizing single-cell data using UMAP. Nat Biotechnol 37: 38–44. doi:10.1038/nbt.4314
Berg S, Kutra D, Kroeger T, Straehle CN, Kausler BX, Haubold C, Schiegg M, Ales J, Beier T, Rudy M, et al. 2019. Ilastik: interactive machine learning for (bio)image analysis. Nat Methods 16: 1226–1232. doi:10.1038/s41592-019-0582-9
Berger DR, Seung HS, Lichtman JW. 2018. VAST (Volume Annotation and Segmentation Tool): efficient manual and semi-automatic labeling of large 3D image stacks. Front Neural Circuits 12: 88. doi:10.3389/fncir.2018.00088
Cable DM, Murray E, Zou LS, Goeva A, Macosko EZ, Chen F, Irizarry RA. 2022. Robust decomposition of cell type mixtures in spatial transcriptomics. Nat Biotechnol 40: 517–526. doi:10.1038/s41587-021-00830-w
Chen T, Guestrin C. 2016. XGBoost. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.
Codeluppi S, Borm LE, Zeisel A, La Manno G, van Lunteren JA, Svensson CI, Linnarsson S. 2018. Spatial organization of the somatosensory cortex revealed by osmFISH. Nat Methods 15: 932–935. doi:10.1038/s41592-018-0175-z
Dorogush AV, Ershov V, Gulin A. 2018. CatBoost: gradient boosting with categorical features support. arXiv doi:10.48550/arxiv.1810.11363
Ducani C, Bernardinelli G, Högberg B. 2014. Rolling circle replication requires single-stranded DNA binding protein to avoid termination and production of double-stranded DNA. Nucleic Acids Res 42: 10596–10604. doi:10.1093/nar/gku737
Elosua-Bayes M, Nieto P, Mereu E, Gut I, Heyn H. 2021. SPOTlight: seeded NMF regression to deconvolute spatial transcriptomics spots with single-cell transcriptomes. Nucleic Acids Res 49: e50. doi:10.1093/nar/gkab043
Eng C-HL, Lawson M, Zhu Q, Dries R, Koulena N, Takei Y, Yun J, Cronin C, Karp C, Yuan G-C, et al. 2019. Transcriptome-scale super-resolved imaging in tissues by RNA seqFISH. Nature 568: 235–239. doi:10.1038/s41586-019-1049-y
Giladi A, Cohen M, Medaglia C, Baran Y, Li B, Zada M, Bost P, Blecher-Gonen R, Salame T-M, Mayer JU, et al. 2020. Dissecting cellular crosstalk by sequencing physically interacting cells. Nat Biotechnol 38: 629–637. doi:10.1038/s41587-020-0442-2
Greenwald NF, Miller G, Moen E, Kong A, Kagel A, Dougherty T, Fullaway CC, McIntosh BJ, Leow KX, Schwartz MS, et al. 2022. Whole-cell segmentation of tissue images with human-level performance using large-scale data annotation and deep learning. Nat Biotechnol 40: 555–565. doi:10.1038/s41587-021-01094-0
Hao Y, Hao S, Andersen-Nissen E, Mauck WM, Zheng S, Butler A, Lee MJ, Wilk AJ, Darby C, Zager M, et al. 2021. Integrated analysis of multimodal single-cell data. Cell 184: 3573–3587.e29. doi:10.1016/j.cell.2021.04.048
Ho TK. 2002. Random decision forests. In Proceedings of 3rd International Conference on Document Analysis and Recognition. IEEE Computer Society Press.
Hollandi R, Moshkov N, Paavolainen L, Tasnadi E, Piccinini F, Horvath P. 2022. Nucleus segmentation: towards automated solutions. Trends Cell Biol 32: 295–310. doi:10.1016/j.tcb.2021.12.004
Hu J, Li X, Coleman K, Schroeder A, Ma N, Irwin DJ, Lee EB, Shinohara RT, Li M. 2021. SpaGCN: integrating gene expression, spatial location and histology to identify spatial domains and spatially variable genes by graph convolutional network. Nat Methods 18: 1342–1351. doi:10.1038/s41592-021-01255-8
Ke R, Mignardi M, Pacureanu A, Svedlund J, Botling J, Wählby C, Nilsson M. 2013. In situ sequencing for RNA analysis in preserved tissue and cells. Nat Methods 10: 857–860. doi:10.1038/nmeth.2563
Kleshchevnikov V, Shmatko A, Dann E, Aivazidis A, King HW, Li T, Lomakin A, Kedlian V, Jain MS, Park JS, et al. 2020. Comprehensive mapping of tissue cell architecture via integrated single cell and spatial transcriptomics. bioRxiv doi:10.1101/2020.11.15.378125
Kotliar D, Veres A, Nagy MA, Tabrizi S, Hodis E, Melton DA, Sabeti PC. 2019. Identifying gene expression programs of cell-type identity and cellular activity with single-cell RNA-Seq. Elife 8: e43803. doi:10.7554/eLife.43803
Littman R, Hemminger Z, Foreman R, Arneson D, Zhang G, Gómez-Pinilla F, Yang X, Wollman R. 2021. Joint cell segmentation and cell type annotation for spatial transcriptomics. Mol Syst Biol 17: e10108. doi:10.15252/msb.202010108
Liu S, Punthambaker S, Iyer EPR, Ferrante T, Goodwin D, Fürth D, Pawlowski AC, Jindal K, Tam JM, Mifflin L. 2021. Barcoded oligonucleotides ligated on RNA amplified for multiplexed and parallel in situ analyses. Nucleic Acids Res 49: e58. doi:10.1093/nar/gkab120
Longo SK, Guo MG, Ji AL, Khavari PA. 2021. Integrating single-cell and spatial transcriptomics to elucidate intercellular tissue dynamics. Nat Rev Genet 22: 627–644. doi:10.1038/s41576-021-00370-8
Love MI, Huber W, Anders S. 2014. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol 15: 550.
Miller BF, Bambah-Mukku D, Dulac C, Zhuang X, Fan J. 2021. Characterizing spatial gene expression heterogeneity in spatially resolved single-cell transcriptomic data with nonuniform cellular densities. Genome Res 31: 1843–1855. doi:10.1101/gr.271288.120
Moffitt JR, Hao J, Wang G, Chen KH, Babcock HP, Zhuang X. 2016. High-throughput single-cell gene-expression profiling with multiplexed error-robust fluorescence in situ hybridization. Proc Natl Acad Sci 113: 11046–11051. doi:10.1073/pnas.1612826113
Morita T, Hayashi K. 2018. Tumor progression is mediated by thymosin-β4 through a TGFβ/MRTF signaling axis. Mol Cancer Res 16: 880–893. doi:10.1158/1541-7786.MCR-17-0715
Nishida-Aoki N, Gujral TS. 2019. Emerging approaches to study cell-cell interactions in tumor microenvironment. Oncotarget 10: 785–797. doi:10.18632/oncotarget.26585
Petukhov V, Xu RJ, Soldatov RA, Cadinu P, Khodosevich K, Moffitt JR, Kharchenko PV. 2022. Cell segmentation in imaging-based spatial transcriptomics. Nat Biotechnol 40: 345–354. doi:10.1038/s41587-021-01044-w
Quinlan JR. 1986. Induction of decision trees. Mach Learn 1: 81–106. doi:10.1007/BF00116251
Rao A, Barkley D, França GS, Yanai I. 2021. Exploring tissue architecture using spatial transcriptomics. Nature 596: 211–220. doi:10.1038/s41586-021-03634-9
Rodriques SG, Stickels RR, Goeva A, Murray E, Vanderburg CR, Welch J, Chen LM, Chen F, Macosko EZ. 2019. Slide-seq: a scalable technology for measuring genome-wide expression at high spatial resolution. Science 363: 1463–1467. doi:10.1126/science.aaw1219
Saha SK, Choi HY, Kim BW, Dayem AA, Yang G-M, Kim KS, Yin YF, Cho S-G. 2017. KRT19 directly interacts with β-catenin/RAC1 complex to regulate NUMB-dependent NOTCH signaling pathway and breast cancer properties. Oncogene 36: 332–349. doi:10.1038/onc.2016.221
Ståhl PL, Salmén F, Vickovic S, Lundmark A, Navarro JF, Magnusson J, Giacomello S, Asp M, Westholm JO, Huss M, et al. 2016. Visualization and analysis of gene expression in tissue sections by spatial transcriptomics. Science 353: 78–82. doi:10.1126/science.aaf2403
Stickels RR, Murray E, Kumar P, Li J, Marshall JL, Di Bella DJ, Arlotta P, Macosko EZ, Chen F. 2021. Highly sensitive spatial transcriptomics at near-cellular resolution with Slide-seqV2. Nat Biotechnol 39: 313–319. doi:10.1038/s41587-020-0739-1
Stringer C, Wang T, Michaelos M, Pachitariu M. 2021. Cellpose: a generalist algorithm for cellular segmentation. Nat Methods 18: 100–106. doi:10.1038/s41592-020-01018-x
Tan AC, Gilbert D. 2003. Ensemble machine learning on gene expression data for cancer classification. Appl Bioinformatics 2: S75–S83.
Vickovic S, Eraslan G, Salmén F, Klughammer J, Stenbeck L, Schapiro D, Äijö T, Bonneau R, Bergenstråhle L, Navarro JF, et al. 2019. High-definition spatial transcriptomics for in situ tissue profiling. Nat Methods 16: 987–990. doi:10.1038/s41592-019-0548-y
Vizoso F, Plaza E, Vázquez J, Serra C, Lamelas M, González LO, Merino AM, Méndez J. 2001. Lysozyme expression by breast carcinomas, correlation with clinicopathologic parameters, and prognostic significance. Ann Surg Oncol 8: 667–674. doi:10.1007/s10434-001-0667-3
Wang J-J, Lei K-F, Han F. 2018a. Tumor microenvironment: recent advances in various cancer treatments. Eur Rev Med Pharmacol Sci 22: 3855–3864.
Wang X, Allen WE, Wright MA, Sylwestrak EL, Samusik N, Vesuna S, Evans K, Liu C, Ramakrishnan C, Liu J, et al. 2018b. Three-dimensional intact-tissue sequencing of single-cell transcriptional states. Science 361: eaat5691. doi:10.1126/science.aat5691
Wang Y, Eddison M, Fleishman G, Weigert M, Xu S, Wang T, Rokicki K, Goina C, Henry FE, Lemire AL, et al. 2021. EASI-FISH for thick tissue defines lateral hypothalamus spatio-molecular organization. Cell 184: 6361–6377.e24. doi:10.1016/j.cell.2021.11.024
Xia C, Fan J, Emanuel G, Hao J, Zhuang X. 2019. Spatial transcriptome profiling by MERFISH reveals subcellular RNA compartmentalization and cell cycle-dependent gene expression. Proc Natl Acad Sci 116: 19490–19499. doi:10.1073/pnas.1912459116
Zhang X, Ren D, Guo L, Wang L, Wu S, Lin C, Ye L, Zhu J, Li J, Song L, et al. 2017. Thymosin β 10 is a key regulator of tumorigenesis and metastasis and a novel serum marker in breast cancer. Breast Cancer Res 19: 15. doi:10.1186/s13058-016-0785-2

[RNA079801DANC1] Alon S, Goodwin DR, Sinha A, Sinha A, Wassie AT, Chen F, Daugharthy ER, Bando Y, Kajita A, Xue AG, et al. 2021. Expansion sequencing: spatially precise in situ transcriptomics in intact biological systems. Science 371: eaax2656. 10.1126/science.aax2656 [DOI] [PMC free article] [PubMed] [Google Scholar]

[RNA079801DANC2] Anindya R. 2022. Cytoplasmic DNA in cancer cells: several pathways that potentially limit DNase2 and TREX1 activities. Biochim Biophys Acta Mol Cell Res 1869: 119278. 10.1016/j.bbamcr.2022.119278 [DOI] [PubMed] [Google Scholar]

[RNA079801DANC3] Becht E, McInnes L, Healy J, Dutertre CA, Kwok IWH, Ng LG, Ginhoux F, Newell EW. 2018. Dimensionality reduction for visualizing single-cell data using UMAP. Nat Biotechnol 37: 38–44. 10.1038/nbt.4314 [DOI] [PubMed] [Google Scholar]

[RNA079801DANC4] Berg S, Kutra D, Kroeger T, Straehle CN, Kausler BX, Haubold C, Schiegg M, Ales J, Beier T, Rudy M, et al. 2019. Ilastik: interactive machine learning for (bio)image analysis. Nat Methods 16: 1226–1232. 10.1038/s41592-019-0582-9 [DOI] [PubMed] [Google Scholar]

[RNA079801DANC5] Berger DR, Seung HS, Lichtman JW. 2018. VAST (Volume Annotation and Segmentation Tool): efficient manual and semi-automatic labeling of large 3D image stacks. Front Neural Circuits 12: 88. 10.3389/fncir.2018.00088 [DOI] [PMC free article] [PubMed] [Google Scholar]

[RNA079801DANC6] Cable DM, Murray E, Zou LS, Goeva A, Macosko EZ, Chen F, Irizarry RA. 2022. Robust decomposition of cell type mixtures in spatial transcriptomics. Nat Biotechnol 40: 517–526. 10.1038/s41587-021-00830-w [DOI] [PMC free article] [PubMed] [Google Scholar]

[RNA079801DANC7] Chen T, Guestrin C. 2016. XGBoost. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. [Google Scholar]

[RNA079801DANC8] Codeluppi S, Borm LE, Zeisel A, La Manno G, van Lunteren JA, Svensson CI, Linnarsson S. 2018. Spatial organization of the somatosensory cortex revealed by osmFISH. Nat Methods 15: 932–935. 10.1038/s41592-018-0175-z [DOI] [PubMed] [Google Scholar]

[RNA079801DANC9] Dorogush AV, Ershov V, Gulin A. 2018. CatBoost: gradient boosting with categorical features support. arXiv 10.48550/arxiv.1810.11363 [DOI]

[RNA079801DANC10] Ducani C, Bernardinelli G, Högberg B. 2014. Rolling circle replication requires single-stranded DNA binding protein to avoid termination and production of double-stranded DNA. Nucleic Acids Res 42: 10596–10604. 10.1093/nar/gku737 [DOI] [PMC free article] [PubMed] [Google Scholar]

[RNA079801DANC11] Elosua-Bayes M, Nieto P, Mereu E, Gut I, Heyn H. 2021. SPOTlight: seeded NMF regression to deconvolute spatial transcriptomics spots with single-cell transcriptomes. Nucleic Acids Res 49: e50. 10.1093/nar/gkab043 [DOI] [PMC free article] [PubMed] [Google Scholar]

[RNA079801DANC12] Eng C-HL, Lawson M, Zhu Q, Dries R, Koulena N, Takei Y, Yun J, Cronin C, Karp C, Yuan G-C, et al. 2019. Transcriptome-scale super-resolved imaging in tissues by RNA seqFISH. Nature 568: 235–239. 10.1038/s41586-019-1049-y [DOI] [PMC free article] [PubMed] [Google Scholar]

[RNA079801DANC13] Giladi A, Cohen M, Medaglia C, Baran Y, Li B, Zada M, Bost P, Blecher-Gonen R, Salame T-M, Mayer JU, et al. 2020. Dissecting cellular crosstalk by sequencing physically interacting cells. Nat Biotechnol 38: 629–637. 10.1038/s41587-020-0442-2 [DOI] [PubMed] [Google Scholar]

[RNA079801DANC14] Greenwald NF, Miller G, Moen E, Kong A, Kagel A, Dougherty T, Fullaway CC, McIntosh BJ, Leow KX, Schwartz MS, et al. 2022. Whole-cell segmentation of tissue images with human-level performance using large-scale data annotation and deep learning. Nat Biotechnol 40: 555–565. 10.1038/s41587-021-01094-0 [DOI] [PMC free article] [PubMed] [Google Scholar]

[RNA079801DANC15] Hao Y, Hao S, Andersen-Nissen E, Mauck WM, Zheng S, Butler A, Lee MJ, Wilk AJ, Darby C, Zager M, et al. 2021. Integrated analysis of multimodal single-cell data. Cell 184: 3573–3587.e29. 10.1016/j.cell.2021.04.048 [DOI] [PMC free article] [PubMed] [Google Scholar]

[RNA079801DANC16] Ho TK. 2002. Random decision forests. In Proceedings of 3rd International Conference on Document Analysis and Recognition. IEEE Computer Society Press. [Google Scholar]

[RNA079801DANC17] Hollandi R, Moshkov N, Paavolainen L, Tasnadi E, Piccinini F, Horvath P. 2022. Nucleus segmentation: towards automated solutions. Trends Cell Biol 32: 295–310. 10.1016/j.tcb.2021.12.004 [DOI] [PubMed] [Google Scholar]

[RNA079801DANC18] Hu J, Li X, Coleman K, Schroeder A, Ma N, Irwin DJ, Lee EB, Shinohara RT, Li M. 2021. SpaGCN: integrating gene expression, spatial location and histology to identify spatial domains and spatially variable genes by graph convolutional network. Nat Methods 18: 1342–1351. 10.1038/s41592-021-01255-8 [DOI] [PubMed] [Google Scholar]

[RNA079801DANC19] Ke R, Mignardi M, Pacureanu A, Svedlund J, Botling J, Wählby C, Nilsson M. 2013. In situ sequencing for RNA analysis in preserved tissue and cells. Nat Methods 10: 857–860. 10.1038/nmeth.2563 [DOI] [PubMed] [Google Scholar]

[RNA079801DANC20] Kleshchevnikov V, Shmatko A, Dann E, Aivazidis A, King HW, Li T, Lomakin A, Kedlian V, Jain MS, Park JS, et al. 2020. Comprehensive mapping of tissue cell architecture via integrated single cell and spatial transcriptomics. bioRxiv 10.1101/2020.11.15.378125 [DOI]

[RNA079801DANC21] Kotliar D, Veres A, Nagy MA, Tabrizi S, Hodis E, Melton DA, Sabeti PC. 2019. Identifying gene expression programs of cell-type identity and cellular activity with single-cell RNA-Seq. Elife 8: e43803. 10.7554/eLife.43803 [DOI] [PMC free article] [PubMed] [Google Scholar]

[RNA079801DANC22] Littman R, Hemminger Z, Foreman R, Arneson D, Zhang G, Gómez-Pinilla F, Yang X, Wollman R. 2021. Joint cell segmentation and cell type annotation for spatial transcriptomics. Mol Syst Biol 17: e10108. 10.15252/msb.202010108 [DOI] [PMC free article] [PubMed] [Google Scholar]

Liu S, Punthambaker S, Iyer EPR, Ferrante T, Goodwin D, Fürth D, Pawlowski AC, Jindal K, Tam JM, Mifflin L. 2021. Barcoded oligonucleotides ligated on RNA amplified for multiplexed and parallel in situ analyses. Nucleic Acids Res 49: e58. doi:10.1093/nar/gkab120

Longo SK, Guo MG, Ji AL, Khavari PA. 2021. Integrating single-cell and spatial transcriptomics to elucidate intercellular tissue dynamics. Nat Rev Genet 22: 627–644. doi:10.1038/s41576-021-00370-8

Love MI, Huber W, Anders S. 2014. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol 15: 550.

Miller BF, Bambah-Mukku D, Dulac C, Zhuang X, Fan J. 2021. Characterizing spatial gene expression heterogeneity in spatially resolved single-cell transcriptomic data with nonuniform cellular densities. Genome Res 31: 1843–1855. doi:10.1101/gr.271288.120

Moffitt JR, Hao J, Wang G, Chen KH, Babcock HP, Zhuang X. 2016. High-throughput single-cell gene-expression profiling with multiplexed error-robust fluorescence in situ hybridization. Proc Natl Acad Sci 113: 11046–11051. doi:10.1073/pnas.1612826113

Morita T, Hayashi K. 2018. Tumor progression is mediated by thymosin-β4 through a TGFβ/MRTF signaling axis. Mol Cancer Res 16: 880–893. doi:10.1158/1541-7786.MCR-17-0715

Nishida-Aoki N, Gujral TS. 2019. Emerging approaches to study cell-cell interactions in tumor microenvironment. Oncotarget 10: 785–797. doi:10.18632/oncotarget.26585

Petukhov V, Xu RJ, Soldatov RA, Cadinu P, Khodosevich K, Moffitt JR, Kharchenko PV. 2022. Cell segmentation in imaging-based spatial transcriptomics. Nat Biotechnol 40: 345–354. doi:10.1038/s41587-021-01044-w

Quinlan JR. 1986. Induction of decision trees. Mach Learn 1: 81–106. doi:10.1007/BF00116251

Rao A, Barkley D, França GS, Yanai I. 2021. Exploring tissue architecture using spatial transcriptomics. Nature 596: 211–220. doi:10.1038/s41586-021-03634-9

Rodriques SG, Stickels RR, Goeva A, Murray E, Vanderburg CR, Welch J, Chen LM, Chen F, Macosko EZ. 2019. Slide-seq: a scalable technology for measuring genome-wide expression at high spatial resolution. Science 363: 1463–1467. doi:10.1126/science.aaw1219

Saha SK, Choi HY, Kim BW, Dayem AA, Yang G-M, Kim KS, Yin YF, Cho S-G. 2017. KRT19 directly interacts with β-catenin/RAC1 complex to regulate NUMB-dependent NOTCH signaling pathway and breast cancer properties. Oncogene 36: 332–349. doi:10.1038/onc.2016.221

Ståhl PL, Salmén F, Vickovic S, Lundmark A, Navarro JF, Magnusson J, Giacomello S, Asp M, Westholm JO, Huss M, et al. 2016. Visualization and analysis of gene expression in tissue sections by spatial transcriptomics. Science 353: 78–82. doi:10.1126/science.aaf2403

Stickels RR, Murray E, Kumar P, Li J, Marshall JL, Di Bella DJ, Arlotta P, Macosko EZ, Chen F. 2021. Highly sensitive spatial transcriptomics at near-cellular resolution with Slide-seqV2. Nat Biotechnol 39: 313–319. doi:10.1038/s41587-020-0739-1

Stringer C, Wang T, Michaelos M, Pachitariu M. 2021. Cellpose: a generalist algorithm for cellular segmentation. Nat Methods 18: 100–106. doi:10.1038/s41592-020-01018-x

Tan AC, Gilbert D. 2003. Ensemble machine learning on gene expression data for cancer classification. Appl Bioinformatics 2: S75–S83.

Vickovic S, Eraslan G, Salmén F, Klughammer J, Stenbeck L, Schapiro D, Äijö T, Bonneau R, Bergenstråhle L, Navarro JF, et al. 2019. High-definition spatial transcriptomics for in situ tissue profiling. Nat Methods 16: 987–990. doi:10.1038/s41592-019-0548-y

Vizoso F, Plaza E, Vázquez J, Serra C, Lamelas M, González LO, Merino AM, Méndez J. 2001. Lysozyme expression by breast carcinomas, correlation with clinicopathologic parameters, and prognostic significance. Ann Surg Oncol 8: 667–674. doi:10.1007/s10434-001-0667-3

Wang J-J, Lei K-F, Han F. 2018a. Tumor microenvironment: recent advances in various cancer treatments. Eur Rev Med Pharmacol Sci 22: 3855–3864.

Wang X, Allen WE, Wright MA, Sylwestrak EL, Samusik N, Vesuna S, Evans K, Liu C, Ramakrishnan C, Liu J, et al. 2018b. Three-dimensional intact-tissue sequencing of single-cell transcriptional states. Science 361: eaat5691. doi:10.1126/science.aat5691

Wang Y, Eddison M, Fleishman G, Weigert M, Xu S, Wang T, Rokicki K, Goina C, Henry FE, Lemire AL, et al. 2021. EASI-FISH for thick tissue defines lateral hypothalamus spatio-molecular organization. Cell 184: 6361–6377.e24. doi:10.1016/j.cell.2021.11.024

Xia C, Fan J, Emanuel G, Hao J, Zhuang X. 2019. Spatial transcriptome profiling by MERFISH reveals subcellular RNA compartmentalization and cell cycle-dependent gene expression. Proc Natl Acad Sci 116: 19490–19499. doi:10.1073/pnas.1912459116

Zhang X, Ren D, Guo L, Wang L, Wu S, Lin C, Ye L, Zhu J, Li J, Song L, et al. 2017. Thymosin β 10 is a key regulator of tumorigenesis and metastasis and a novel serum marker in breast cancer. Breast Cancer Res 19: 15. doi:10.1186/s13058-016-0785-2
