Reconstructing physical cell interaction networks from single-cell data using Neighbor-seq

Bassel Ghaddar; Subhajyoti De

doi:10.1093/nar/gkac333

. 2022 May 10;50(14):e82. doi: 10.1093/nar/gkac333

Reconstructing physical cell interaction networks from single-cell data using Neighbor-seq

Bassel Ghaddar ¹, Subhajyoti De ^2,^✉

PMCID: PMC9371920 PMID: 35536255

Abstract

Cell-cell interactions are the fundamental building blocks of tissue organization and multicellular life. We developed Neighbor-seq, a method to identify and annotate the architecture of direct cell–cell interactions and relevant ligand–receptor signaling from the undissociated cell fractions in massively parallel single cell sequencing data. Neighbor-seq accurately identifies microanatomical features of diverse tissue types such as the small intestinal epithelium, terminal respiratory tract, and splenic white pulp. It also captures the differing topologies of cancer-immune-stromal cell communications in pancreatic and skin tumors, which are consistent with the patterns observed in spatial transcriptomic data. Neighbor-seq is fast and scalable. It draws inferences from routine single-cell data and does not require prior knowledge about sample cell-types or multiplets. Neighbor-seq provides a framework to study the organ-level cellular interactome in health and disease, bridging the gap between single-cell and spatial transcriptomics.

Graphical Abstract

INTRODUCTION

The spatial context of cells in a tissue and their resulting cell–cell communications influence numerous processes, including cellular differentiation, organ development and homeostasis, and immune interactions in disease (1). Few high-throughput methods exist that can resolve direct cellular communications in vivo at single-cell resolution. Single-cell RNA sequencing (scRNA-seq) can identify cell-types and states in heterogeneous tissues, but the tissue structure is largely destroyed in the process (2). Microscopy-based methods such as RNAscope and FISH can interrogate only preselected genes at high spatial resolution. Spatial transcriptomics allows profiling of microscopic regions, but still samples 10–100 cells per region (3,4). Sequencing of partially dissociated tissues and subsequent multiplet deconvolution permits inference of physical cell interactions, but this requires specialized experimental modifications (5–8). A general method that can infer direct cell–cell interactions and concurrent transcriptomic changes in vivo at single cell resolution would provide unprecedented insight into the building blocks of tissue architecture in healthy and diseased tissues.

Cell aggregates (multiplets) naturally arise in scRNA-seq experiments when two or more cells are captured in the same reaction droplet, and they typically represent at least several percent of all capture events (9,10). Such multiplets occur primarily due to incomplete tissue dissociation (6), or occasionally by random co-encapsulation. We developed Neighbor-seq, a method to infer physical cell–cell communications by identifying, annotating, and analyzing cell multiplets from the undissociated cell fractions in scRNA-seq data using computational approaches (see Methods, Figure 1A). Neighbor-seq provides a framework to study the cellular interactome in health and disease using standard scRNA-seq data.

Figure 1. — Benchmarking cell neighbor detection and annotation. (A) A schematic representation of the Neighbor-seq workflow. (B) Uniform manifold approximation and projection (UMAP) of barcode RNA sequencing data singlets and multiplets of known composition from three cancer cell lines, colored by cell-type identities and shaped by the number of cells per barcode. (C) Neighbor-seq barcode composition annotation performance of cell-line barcodes in (B), plotted by known barcode type. AUC = area under the receiver operator curve. (D) Benchmarking Neighbor-seq doublet detection against nine other methods using 16 datasets of diverse tissue types with experimentally annotated doublets (see Supplementary Table S1). Among these methods, only Neighbor-seq is explicitly optimized to infer doublet composition. Comparison of singlet vs. doublet classification area under the receiver operator curve (AUROC) distributions are shown. Boxplots show median (line), 25th and 75th percentiles (box) and 1.5× IQR (whiskers). Points represent outliers. (E) Comparison of the number of co-expressed ligand–receptor pairs in enriched (statistically significant) doublet types, not enriched doublets, and random synthetic doublets from the benchmarking studies in (D). Descriptions of the boxplots are as in (D). (Wilcoxon tests, ****P < 0.0001. See also Supplementary Figure S1B, C). (F) Neighbor-seq example receiver operator curve for classifying artificial multiplet types in the cline-ch study, one of the benchmark studies used in (D). AUC, area under the curve. (G) Robustness of cell neighbor type annotation. Comparison of the distribution of coefficients of variation for neighbor type counts from the cline-ch benchmark dataset across n = 10 runs of Neighbor-seq and scDblFinder (Wilcoxon test, boxplots as in (D)).

MATERIALS AND METHODS

Neighbor-seq algorithm

Neighbor-seq is a method to infer physical cell–cell communications by identifying, annotating and analyzing cell multiplets from the undissociated cell fractions in scRNA-seq data using machine learning approaches. The Neighbor-seq algorithm consists of the following components, each further described below: (i) barcode clustering and marker gene identification, (ii) Random Forest classifier training to identify multiplets and their cell type compositions, (iii) calculating enrichment scores for cell–cell interactions and (iv) construction of cell–cell interactome network and analysis of cell-neighbor transcriptomes, including ligand–receptor interactions (Supplementary Figure S4).

Input and cell type identification

The input for Neighbor-seq is a cell by gene counts matrix, and optionally, cell-type cluster labels. If cell-type labels are not provided, Neighbor-seq utilizes a wrapper function to run Seurat (11) functions that TP10K normalize and scale the scRNA-seq data, find a default of 5000 variable genes, perform principle component analysis with n = 50 components, and identify cell type clusters. Cell-type marker genes are then identified using the FindAllMarkers function using a default of 200 cells subsampled for each cluster, an average log fold change threshold cutoff of 1, and minimum fraction of cells expressing a gene of 0.2 for computational efficiency. If cell-type labels are known a priori, only normalization and marker finding functions are run.

Random forest classifier training

Next, all homotypic and heterotypic combinations for a default of two cell-types are enumerated (e.g. AA, AB, etc.). Cells forming a doublet or multiplet are hereon referred to as ‘neighbors’, and the ‘neighbor-type’ indicates the identities of the 2+ cells on a single barcode. For each neighbor-type, artificial multiplets are created by randomly sampling cells from the constituent cell types and the prepared input gene by cell matrix and summing their gene counts. A random forest is then trained using a balanced set of singlets and artificial multiplets to predict the cell type composition of the barcodes from the assembled dataset. Although the majority of barcodes are expected to contain single cells, a balanced training set is used to increase random forest prediction accuracy. As such, a default of 100 artificial multiplets is created for each neighbor-type, and these are pooled with a default of 100 singlets from each originally identified cell-type to create the training set. A random forest trained, and hold-out artificial multiplets data are used to assess random forest performance. The multiROC R package is used to compute receiver operator curves. The process is iterated multiple times to retrain and generate an ensemble annotation result. The trained random forest is then used to predict the barcode composition of all barcodes in the original dataset. Two random forest implementations are incorporated in Neighbor-seq: the randomForest (12) R package, run with a default of 500 trees, and a faster implementation via the xgboost (13) R package, configured with the following parameters: objective = multi:softprob, eval_metric = mlogloss, nround = 1, max_depth = 20, num_parallel_tree = 200, subsample = 0.632, colsample_bytree = sqrt(number cells)/total number of cells, and colsample_bynode = sqrt(number of cells)/total number of cell. The faster implementation is preferred with large datasets and is set as the default in Neighbor-seq. Both implementations generated consistent results on benchmark analyses (Supplementary Figure S5b).

Determining significantly enriched cell–cell interactions

Neighbor-seq calculates an enrichment score for each cell-type interaction and compares it to the distribution of enrichment scores expected by chance. The interaction enrichment score reflects the proportion of counts of a neighbor type relative to the product of the total number of edges detected from each constituent cell and all other cell types. For neighbor type C₁C_n composed of cell types C₁ … C_n, the enrichment score is specifically defined as:

The observed enrichment score for a neighbor type is compared to that expected by chance. The null hypothesis assumes that multiplet formation is random and thus the distribution of neighbor-types follows the underlying singlet population counts. As such, for each sample in a dataset, given n predicted singlets and m predicted multiplets consisting of x constituent cells, Neighbor-seq simulates the synthetic creation of m multiplets drawing without replacement from n + x cells. The resulting neighbor-types are tallied and their enrichment scores are computed. This simulation is repeated for a default of 100 times; for each neighbor type, lower tailed Wilcoxon testing quantifies the probability that the simulated enrichment scores have a central tendency greater than the observed enrichment score. All probabilities are adjusted using the Holm correction.

Construction of a cell–cell interactome network and transcriptomic analysis

The primary outputs of Neighbor-seq are: (i) the artificial multiplet training and test sets, (ii) the trained random forest, (iii) the barcode classification probabilities for all barcode classes and (iv) the neighbor enrichment analysis. The latter reports the counts of each multiplet class detected in a sample as well as its enrichment score and corresponding P-value. When an ensemble result is generated by retraining the algorithm over multiple iterations, Neighbor-seq additionally reports the mean counts for each multiplet class and a combined P-value using Fisher's method. Using the cell neighbor type enrichment data, we use igraph (14) to plot a network, and it is possible to calculate network statistics such as degree and centrality. We use ligand–receptor data from Ramilowski et al. (15) or Shao et al. (16) to identify highly expressed ligand–receptor pairs in cell neighbors. Differentially expressed genes and other transcriptomic features in neighbor-types can be assessed using Seurat (11).

Benchmarking, robustness and reproducibility

We used a number of different metrics for benchmarking Neighbor-seq.

Comparative assessment with other doublet finding methods

We obtained scRNA-seq data for the benchmark annotated doublet datasets from Xi et al. (10) (see Supplementary Table S1 for dataset details) and compared Neighbor-seq's ability to identify singlets versus doublets against the following nine published methods: doubletCells (17), Scrublet (9), DoubletDetection (https://github.com/JonathanShor/DoubletDetection), cxds (18),bcds (18), hybrid (18), solo (19), DoubletFinder (20) and scDblFinder (https://github.com/plger/scDblFinder). These methods are optimized to distinguish singlets and filter out doublets in scRNA-seq data, and only scDblFinder has the ability to propose doublet composition. These methods generally rely on simulating artificial doublets, projecting data into a lower dimensional space, and using a machine or deep learning classifier to annotate barcodes. In contrast, Neighbor-seq utilizes cell-type specific clustering and marker gene sets to train a classifier directly on the gene counts of all possible expected barcode compositions, and it explicitly assigns multiplet class probabilities for each barcode rather than annotating the broader label of singlet or doublet. To evaluate Neighbor-seq, we ran Neighbor-seq using the default parameters on each benchmark dataset. To compute the probability of each barcode being either a singlet or doublet, we summed the Neighbor-seq probabilities for all doublet classes. Classification performance (area under the receiver operator curves) for the nine benchmark doublet finding methods tested on these datasets were obtained from Xi et al. (10), who ran these methods using their default settings. Area under the receiver operator curves for the Neighbor-seq singlet versus doublet classification were calculated as above with multiROC and compared with all other methods.

To compare the stability of doublet population predictions across multiple runs from Neighbor-seq and scDblFinder, we ran each algorithm 10 times on the cline-ch benchmark dataset obtained from Xi et al. (10) For each doublet class, the coefficient of variation of its counts (mean/standard deviation) was computed, and the distributions of coefficients of variations were compared using Wilcoxon testing.

Ensemble prediction

To increase reproducibility of results, artificial multiplet construction and/or random forest training can be run for multiple iterations and an ensemble results can be computed. For each neighboring cell-type pair, the mean observed counts and Fisher's combined adjusted P-value are reported. We observed a high degree of reproducibility across multiple iterations and with a range of random seeds. These multiplet type counts and P-value thresholds for determining significant interactions were set to counts >10 and P-value <0.05 for most analyses in this study, but they can be adjusted and interpreted in light of data quality and known biology.

Evaluation of ligand–receptor co-expression in cell neighbors

Ligand-receptor data was obtained from Ramilowski et al. (15) or Shao et al. (16). We compared three classes of barcodes: (i) multiplet-classes with statistically significant enrichment, (ii) multiplet-classes without statistical enrichment and (iii) randomly synthesized multiplets from the same datasets. Multiplet class statistical enrichment was determined as described above. Random multiplets of degree 2 were created by aggregating the read counts of randomly sampled cells. We determined the number of matched ligand–receptor pairs as follows. Using TP10K normalized scRNA data, we called a gene ‘expressed’ in a barcode if its normalized count was greater than the 25th percentile of expression across all barcodes. For each barcode in a dataset, we counted the number of expressed ligands whose receptor was also expressed. We compared the distributions of the number of co-expressed ligand–receptor pairs across the three barcode classes (enriched multiplet types, non-enriched multiplet types, random multiplets) using Wilcoxon testing.

Identifying cell–cell interactions in the small intestine

We obtained scRNA-seq data and metadata, including cell-type annotations, from Haber et al. (21) UMAP plots were drawn using Seurat (11). Neighbor-seq was run for n = 10 iterations using default parameters. Cell-cell interactions were considered significant for those interactions with mean counts >10 and combined P-value <0.05. The validation analysis in known intestinal singlets multiplets was done using data from Andrews et al. (6), who also provided metadata and cell-type annotations. Neighbor-seq was run with default parameters as above on the small intestine scRNA-seq data. The resulting classifier was used to predict interactions in the accompanying small intestine partially dissociated clumps. Interactions were kept for those with counts >10 and adjusted P-value <0.05. Quantitative comparison of cell–cell networks from different studies was done by converting the networks into adjacency matrices and computing the inner product correlation. A permutation-based test was used to calculate the statistical significance of the inner product correlation. For a pair of matrices, random symmetric adjacency matrices are generated with the same number of nodes and edges as the test matrix, and a distribution of inner product correlations is computed. The center of this distribution is compared to the actual inner product correlation between the two test matrices using a one-sided Wilcoxon test.

Identifying cell–cell interactions in the lung

We obtained scRNA-seq data and metadata, including cell-type annotations, from Travaglini et al. (22) To reduce computational complexity, cell-type annotations were collapsed into the following parent classes: alveolar, basal, ciliated, club, endothelial, fibroblast, goblet, immune, mucous, smooth muscle. UMAP plots were drawn using Seurat (11). Neighbor-seq was run for n = 10 iterations using default parameters. Cell-cell interactions were considered significant for those interactions with mean counts >10 and combined P-value <0.05. The validation analysis in known lung singlets and multiplets was done using data from Andrews et al. (6), who also provided metadata and cell-type annotations. Neighbor-seq was run with default parameters as above on the lung scRNA-seq data. The resulting classifier was used to predict interactions in the accompanying lung partially dissociated clumps. Interactions were kept for those with counts >10 and adjusted P-value <0.05. Quantitative comparison of cell–cell networks from different studies was done with the permutation-based inner correlation test as described for the small intestine analysis.

Identifying cell–cell interactions in the spleen

We obtained scRNA-seq data and metadata, including cell-type annotations, from Madissoon et al. (23) To reduce computational complexity, cell-type annotations were collapsed into the following parent classes: CD34 progenitor, CD4 T-cells, CD8 T-cells, Cycling T-cells, dendritic cells, follicular B cells, germinal center B cells, innate lymphoid cells, macrophage, mantle B cells, monocytes, natural killer cells, plasma B cells and platelets. UMAP plots were drawn using Seurat (11). Neighbor-seq was run for n = 10 iterations using default parameters. This study contained data from 19 human samples. Cell-cell interactions were considered significant for those interactions found in >1 sample and with mean counts >5 and combined P-value <0.05. The validation analysis was done using spleen data from Tabula Muris (24), who also provided metadata and cell-type annotations. Cell-type annotations were appended with their cluster label, resulting in multiple B-cell, T-cell and myeloid clusters. Neighbor-seq was run with default parameters as above. Cell–cell interactions were considered significant for those interactions with mean counts >5 and combined P-value <0.05.

Identifying cell–cell interactions in pancreatic cancer

We obtained scRNA-seq data and metadata, including cell-type annotations, from Peng et al. (25) UMAP plots were drawn using Seurat (11). Neighbor-seq was run for n = 10 iterations using default parameters. Cell–cell interactions were considered significant for those interactions with mean counts >10 and combined P-value <0.05. Ligand-receptor analysis was done as described above. Betweenness centrality was calculated using igraph (14). Hierarchical clustering of betweenness centralities scaled by cell-type was done using the pheatmap R package (https://CRAN.R-project.org/package = pheatmap). Spatial transcriptomic data for PDAC tumors was obtained from Moncada et al. (26) Cell-type scores for each spatial barcode were calculated as follows. We used the FindAllMarkers function from Seurat (11) to find differentially expressed genes for the PDAC cell-types in Peng et al. (25). Only highly cell-type specific genes were found by using these parameters: logfc.threshold = 2, min.diff.pct = 0.5, min.pct = 0.5. Spatial barcodes were TP10K normalized, and cell-type scores were calculated as the average expression of each cell-type gene signature. Scores Spearman correlations were computed using the cor.test function in R. Quantitative comparison of cell–cell networks from different studies was done by converting the networks into adjacency matrices and computing the inner product correlation. The Spearman correlation matrix of the transcriptomic data was used as the adjacency matrix, only keeping those edges with values r > 0.2 and P < 0.05. A permutation-based test was used to calculate the statistical significance of the inner product correlation as described above.

Identifying cell–cell interactions in skin cancer

We obtained scRNA-seq data and metadata, including cell-type annotations, from Ji et al. (27). To reduce computational complexity, immune cell-types annotations were collapsed into parent lymphoid and myeloid classes. UMAP plots were drawn using Seurat (11). Neighbor-seq was run for n = 10 iterations using default parameters. Cell-cell interactions were considered significant for those interactions with mean counts >10 and combined P-value <0.05. Ligand-receptor analysis was done as described above. Cell-type scores for each spatial barcode was calculated as follows. We used the FindAllMarkers function from Seurat (11) to find differentially expressed genes for the cell-types in Ji et al. Only highly cell-type specific genes were found by using these parameters: logfc.threshold = 0.5, min.diff.pct = 0.52. Marker genes were only kept if they were differentially expressed in <5 cell types. Spatial barcodes were TP10K normalized, and cell-type scores were calculated as the average expression of each cell-type gene signature. Scores Pearson correlations were computed using the cor.test function in R.

RESULTS

Neighbor-seq identifies physical cell interactions by using a machine learning classifier that annotates the cellular composition of each scRNA-seq barcode. Briefly, scRNA-seq data is filtered to remove low quality cells, the remaining barcodes are clustered based on their gene expression, and cluster marker genes are identified using a standard approach (11). A vast majority of the barcodes typically represent genuine single cells, with a minority being doublets or multiplets. Next, a training set of artificial multiplets is constructed representing all possible multiplet types by randomly sampling cells from each cell cluster and aggregating their raw read counts. The artificial multiplets can be homotypic (same cell-type) or heterotypic (different cell-type) and can be of order 2 (e.g. AA, AB, BB, etc.) or higher (e.g. AAA, ABA, etc.). A Random Forest cell-type classifier is trained based on the expression of marker gene sets from n (default: 100) randomly sampled barcodes from each artificial multiplet type, as well as n barcodes from each original cell type cluster, which are predominantly singlets. Neighbor-seq then applies the classifier to annotate each barcode in the original dataset and identifies those classified as multiplets. An ensemble annotation result is generated by iterating the algorithm. Neighbor-seq computes the enrichment and corresponding statistical significance of the observed frequencies of multiplet types compared to that expected based on the prevalence of the original cell-types in the dataset. Accordingly, Neighbor-seq constructs a network representing significant cell-type interactions and enables transcriptomic analysis of interacting cells.

Benchmarking and validation with known multiplets

We evaluated Neighbor-seq's ability to identify physically interacting cells via (i) annotation of multiplets with known composition, (ii) detection of known doublets in diverse tissues and (iii) identification of known cellular architectures in solid tissues. First, we tested Neighbor-seq's ability to annotate barcode composition in a controlled setting. We obtained scRNA-seq data (6) for three cancer cell lines that had been sorted as either singlets or multiplets of known composition (Figure 1B); we trained Neighbor-seq on the singlet data and used it to annotate singlet and multiplet barcode compositions, which contained between 1 and 4 cells in nonsymmetrical combinations. Neighbor-seq classified all droplets with high accuracy (AUC: singlets: 1, doublets: 0.99, triplets: 0.94, quadruplets: 0.82; Figure 1C), indicating that it can reliably infer barcode composition.

Next, we benchmarked Neighbor-seq against nine other published doublet-finding methods using 16 datasets of diverse tissue types in which doublets were experimentally annotated (10) (Supplementary Table S1). Neighbor-seq performed better or equally well to all published methods in identifying doublets (Wilcoxon P > 0.8, Figure 1D), but with the unique advantages of (i) determining multiplet cell-types, (ii) identifying significantly enriched cell-type interactions, (iii) requiring no parameter tuning and (iv) making no assumptions about the underlying doublet frequencies, which ranged from 2.5% to 37% in the benchmark datasets (Supplementary Figure S1A) and may not be known a priori in a real dataset. Moreover, Neighbor-seq is also the only method that can annotate higher order multiplets and can transfer knowledge across datasets.

We further hypothesized that since doublets contain neighboring cells, they would have increased co-expression of ligand–receptor pairs. Indeed, enriched doublets-types co-expressed significantly more ligand–receptor pairs than non-enriched doublets, and significantly more pairs than randomly synthesized doublets (both Wilcoxon P < 2.2e–16, Figure 1E, Supplementary Figure S1B, see Materials and Methods for details). This was true in aggregate using two different ligand–receptor databases (15,16) and for each benchmark study individually (Supplementary Figure S1B).

Neighbor-seq performed exceptionally well at classifying singlets and all artificial doublet types (average AUC = 0.99, Figure 1F) and performed significantly better than expected by chance when cell-type labels were randomized (cline_ch data; average AUC = 0.501, Wilcoxon P < 2.2e–16, Supplementary Figure S1C, D). Neighbor-seq classification accuracies across all benchmark datasets were similarly high (all AUC > 0.92, Supplementary Figure S1E). While Neighbor-seq is the only method designed to infer doublet-types, scDblFinder also proposes doublet compositions, but when comparing the stability of doublet-type counts across multiple runs on the same dataset, Neighbor-seq results were significantly more stable (Wilcoxon P < 2.2e–16, Figure 1G). These results indicate that Neighbor-seq successfully and identifies doublets across a range of tissue types and reproducibly annotates their cellular composition. Taken together, these attributes make Neighbor-seq first in the class of innovative computational methods that can help reconstruct physical cell interaction networks from single-cell sequencing data.

Identifying cell–cell interactions in the small intestine

We examined whether Neighbor-seq could identify known interaction architectures in multiple tissue types with varying levels of cell-type diversity and organization. First, we tested Neighbor-seq on a scRNA-seq survey of the small intestinal epithelium containing 11 666 cells from n = 2 mice (21). The small intestine consists of alternating units of villi and crypts. Paneth cells in the crypt protect the residing stem cells, which differentiate and migrate upwards through various progenitor stages until becoming mature enterocytes, while goblet cells are scattered throughout villus (28) (Figure 2A). We identified the major cell types (Figure 2B) and used Neighbor-seq to recover their cellular adjacencies. Neighbor-seq correctly detected Paneth-stem interactions, a progression from stem to mature enterocytes, and multiple goblet cell interactions in the villus (Figure 2C, see Materials and Methods for details). We confirmed this microanatomy by training Neighbor-seq on a separate study of 5279 small intestinal singlets and used it to identify interactions in a dataset of 3671 intestinal multiplets (6) (Figure 2D). This validation analysis revealed a similar interaction network as obtained only from scRNA-seq data (Figure 2E). Differences between the schematic illustration and the two networks are primarily due to different cell types annotated in the two datasets and different optimized network layouts; nonetheless, both networks recapitulate a progression from the Paneth-stem crypt to the mature enterocytes at the top of the villus whilst passing through transition and progenitor cells. Furthermore, both networks supported known ligand–receptor signaling (28), such as LGR, LRP, BMP and NOTCH-mediated interactions between Paneth and stem cells. The directionality of this signaling could be inferred by inspecting the stem and Paneth cell singlets data, in which stem cells had significantly higher expression of the aforementioned genes compared to Paneth cells (Wilcoxon testing, P < 2e–16 for all four genes). To quantitatively compare the networks obtained from the two studies, we harmonized the cell labels based on cell ontology and represented them as adjacency matrices (Figure 2F). The two matrices were significantly more correlated than expected by chance (inner product correlation = 0.53, Wilcoxon permutation test, P = 4e–17, see Materials and Methods), indicating that Neighbor-seq identified similar cell networks from independent studies of the same tissue types.

Figure 2. — Detecting known microanatomical features of the small intestinal epithelium. (A) Illustration of the main cell types in the small intestinal crypt and villus. (B) Uniform manifold approximation and projection (UMAP) of 11 665 small intestinal cells (21) from the duodenum, jejunum, and ileum (n = 2 mice) colored by cell type. TA, transit amplifying; EP, enterocyte progenitor. (C) Network diagram of significant cell type interactions from (B) identified by Neighbor-seq. Neighbor-seq is run for n = 10 iterations, and interactions are shown for edges with a mean count >10 and enrichment score combined adjusted P-value <0.05 (see Materials and Methods). Edge thickness represents interaction P-value and edge color represents counts. Green color scale represents anatomical progression from crypt to villus. (D) UMAP of 5279 cells from the small intestine (6) (n = 1 mouse) colored by cell type. Neighbor-seq is trained on these cells and used to predict the interaction network in a dataset of partially dissociated intestinal clumps. (E) Network diagram of significant cell type interactions identified by Neighbor-seq from 3671 small intestinal clumps. Methods, edge color and thickness, and colors scale are the same as in (C). (F) Adjacency matrix representation of the networks from (C) and (E). Cell labels were harmonized based on cell ontologies. Red color indicates the presence of a connection, white indicates no connection.

Identifying cell–cell interactions in the lung

Second, we tested Neighbor-seq's ability to identify microanatomical structures in the lung, which contains interactions between cells of multiple lineages. The terminal bronchioles contain ciliated epithelium and club cells, smooth muscle, mucous secreting cells, and basal stem cells; gas exchange occurs in the alveoli, which contain alveolar pneumocytes and macrophages and are lined by vessels, fibroblasts, and smooth muscle (22) (Figure 3A). We analyzed a human lung cell atlas containing 65 662 cells from n = 5 human donors (22), identified the cell-types (Figure 3B) and recovered an interaction network (Figure 3C, see Materials and Methods for details). Neighbor-seq correctly separated the bronchiolar and alveolar compartments and identified the gas exchange membrane between alveolar and endothelial cells. We confirmed detection of these cell–cell interactions in a dataset of lung singlets and multiplets (6) (Figure 3D). Training Neighbor-seq on the singlet data and deconvoluting interactions from the known multiplets again revealed bronchiolar and alveolar compartments as well as several immune cell interactions (Figure 3E). Neighbor-seq networks also corroborated known signaling pathways in lung development (29), such as by expression of NOTCH, FGF, and EGF ligands in basal- cell neighbors. Representing the networks derived from the two studies as adjacency matrices with harmonized cell-ontology labeling further indicated significant similarity in the cell–cell connections identified (Figure 3F, inner product correlation = 0.73, Wilcoxon permutation test, P = 5e–17, see Materials and Methods).

Figure 3. — Detecting known microanatomical features of the terminal respiratory tract. (A) Illustration of the main cell types in the terminal bronchioles and alveolus. AT1, alveolar type 1 cell, AT2, alveolar type 2 cell (B) Uniform manifold approximation and projection (UMAP) 65,662 cells (22) from the terminal respiratory tract (n = 5 human samples) colored by cell type. (C) Network diagram of significant cell type interactions from (B) identified by Neighbor-seq, colored by known microanatomical compartment. Data is shown for n = 10 iterations, mean counts >10, combined P-value < 0.05. (D) UMAP of 6084 mouse lung cells (6) (n = 3 mice) colored by cell type. Neighbor-seq is trained on these cells and used to predict the interaction network in a dataset of partially dissociated lung clumps. (E) Network diagram of significant cell type interactions identified by Neighbor-seq from 4729 lung clumps. Methods, edge color and thickness, and colors scale are the same as in (C). (F) Adjacency matrix representation of the networks from (C) and (E). Cell labels were harmonized based on cell ontologies. Red color indicates the presence of a connection, white indicates no connection.

Identifying cell–cell interactions in the spleen

Third, we used Neighbor-seq to predict interactions in the splenic white pulp, a structure not held by tight junctions but whose organization is chemokine driven (30). The white pulp is surrounded by myeloid cells and consists of separate B- and T-cell zones, with B-cell follicles further separating into germinal centers and mantle zones and T-cells interacting with antigen presenting myeloid cells (30) (Figure 4A). We used Neighbor-seq to analyze a dataset of human spleen containing 94,050 immune cells from n = 19 human samples (23) (Figure 4B, see Materials and Methods for details). Neighbor-seq correctly identified the B and T-cell zones; germinal centers did not interact with non-B-cells, and myeloid cells interacted with both B and T lymphocytes (Figure 4C). We confirmed these structures by analyzing splenic tissue from the Tabula Muris (24) containing 9552 cells of lymphoid and myeloid origin (Figure 4D). Neighbor-seq again identified B and T-cell zones with myeloid interactions (Figure 4E). Collectively, these analyses confirmed that Neighbor-seq can correctly identify direct cell–cell interactions and microanatomies in vivo in diverse tissue types.

Figure 4. — Detecting known microanatomical features of the splenic white pulp. (A) Illustration of the main cell types in the splenic white pulp. (B) Uniform manifold approximation and projection (UMAP) of 94 050 splenic white pulp immune cells (23) (n = 19 human samples) colored by cell type. DC, dendritic cell; GC, germinal center; ILC, innate lymphoid cell; NK, natural killer cell. (C) Network diagram of significant cell type interactions from (B) identified by Neighbor-seq, colored by known cell type lineage. Data is shown for n = 10 iterations, mean counts >5, combined P-value < 0.05. (D) UMAP of 9552 murine immune cells from Tabula Muris (24) colored by cell type. (E) Network diagram of significant cell type interactions from (D) identified by Neighbor-seq, colored by known cell compartment. Methods, edge color and thickness, and colors scale are the same as in (C).

Identifying cell–cell interactions in pancreatic cancer

We next used Neighbor-seq to analyze inter-cellular interactions in pancreatic ductal adenocarcinoma (PDAC). We obtained scRNA-seq data for 24 tumors and 11 control samples (25); these tissues had normal to poorly differentiated histopathology, were from anatomic locations throughout the pancreas (Supplementary Table S2), and in total contained 57 530 cells from a mixture of normal and malignant epithelial cells and immune, stromal and endothelial cells (Figure 5A). Neighbor-seq detected 17 580 doublets (31% of barcodes), of which 1373 were heterotypic doublets (2.3% of all barcodes)—this fraction of heterotypic cell neighbors was typical for all datasets analyzed.

Out of 45 possible heterotypic cell-type pairs, Neighbor-seq identified 19 enriched interactions (Figure 5B). Normal pancreatic epithelium (ductal, endocrine, acinar cells) interacted with each other and with endothelial cells, whereas tumor cells had stronger connections with ductal cells and fibroblasts and immune cells interacted with most cell-types. These interactions were detected across multiple samples, the most common being B cell-myeloid, ductal-tumor and tumor-T cell interactions (Figure 5B, Supplementary Figure S2A, Table S3). Again, enriched cell neighbors co-expressed significantly more matched ligand–receptor pairs than non-enriched doublet-types or random synthetic doublets (Wilcoxon P < 2.2e–16, Figure 5C). The detected neighboring cell communications were also consistent with existing literature. For example, BTK-dependent B-cell interactions with FCγR+ myeloid cells were recently implicated in PDAC progression (31), and we observed significantly greater BTK (Wilcoxon, P = 6.3e–40) and FCGR1A expression (Wilcoxon, P = 6.8e–59) in B cell-myeloid doublets compared to all other doublet-types. This highlights how Neighbor-seq can be used to identify interaction-specific transcriptional changes with functional relevance.

The topology of cell–cell interactions varied across samples, which could be due to intra- or inter-tumor heterogeneity in cell interactions or differential presence/absence of doublet-types in a sample. We computed the betweenness centrality of cell-types in each sample, and clustering of these centrality scores revealed three classes of tissue microenvironments (ME, Figure 5D). ME1 contained mostly normal ductal epithelial connections, ME2 was dominated by tumor cell connections and ME3 by immune cell edges (Figure 5D). These ME classes had significantly different histopathological characteristics (Fisher's exact test; pathology: P = 5e–4; stage: P = 5e–4; lymph node invasion: P = 3e–3). These analyses indicate that Neighbor-seq can uncover direct cellular interactions and tumor microenvironment characteristics with functional or clinical relevance.

Next, we analyzed spatial transcriptomic data for pancreatic ductal adenocarcinoma (PDAC) samples from a published study (26) where there were 243–996 spatially annotated barcodes per sample, each capturing the aggregated transcriptomic makeup of ∼20–70 cells in tumor and adjacent normal tissue microenvironments. We observed organizational contexts similar to what we identified from scRNA-seq data alone. We used marker genes to calculate cell-type scores for each spatial barcode (see Materials and Methods; PDAC-A: n = 428, PDAC-B: n = 224 spatial barcodes) and visualized the distribution of cell-types in the tissues (Figure 5E). Tumor cells co-localized with ductal cells (Spearman correlation; A: r = 0.19, P = 2e–2; B: r = 0.44, P = 4e–10) and fibroblasts (Spearman correlation; A: r = 0.30, P = 8e–9). Ductal cells co-localized with acinar cells (Spearman correlation; A: r = 0.16, P = 2e–2), endocrine cells (Spearman correlation; A: r = 0.29, P = 5e–8), and endothelial cells (Spearman correlation; A: r = 0.41, P = 1e–17). B and T cells were scattered throughout the tissue, but myeloid cells did not significantly co-localize with tumor cells. We observed similar patterns when visualizing spatial transcriptomic data for n = 7 other PDAC tumors (Supplementary Figure S3A). These findings are broadly consistent with the architecture of cell-networks we deduced from scRNA-seq data of a different cohort of PDAC tumors (Figure 5C), suggesting that despite intra- and inter-tumor heterogeneity, the topology of cell–cell interactions may be similar. We represented the cell–cell networks derived from scRNA-seq and the spatial transcriptomic data as cell adjacency matrices (Supplementary Figure S2B, see Materials and Methods). Inner product correlation further indicated that the cell adjacencies identified from the two studies were significantly more similar than expected by chance (r = 0.33, P = 2e–11). This validation across two different technology platforms reinforces Neighbor-seq's ability to infer cellular connectomes from scRNA-seq data and bridge between single-cell and spatial transcriptomics, which has a resolution of 10¹–10² cells.

Identifying cell–cell interactions in skin cancer

Lastly, we used Neighbor-seq to analyze physical cell interactions in cutaneous squamous cell carcinoma (SCC). We obtained scRNA-seq data for 10 tumors, of which four tumors also had spatial transcriptomic data (27). These tissues had moderate or well differentiated histology, were from various primary sites on the body, and the dataset in total contained 48 164 single-cell transcriptomes from 16 cell subtypes of epithelial, stromal, and immune origin (Figure 6A). Neighbor-seq detected 15 626 doublets (32% of all barcodes), of which 5346 were heterotypic doublets (11% of all barcodes). Of 105 possible heterotypic cell-type pairs, Neighbor-seq identified 38 significant pairwise interactions (Figure 6B). Tumor cell populations were on the edge of the cell interaction network and showed significant interactions mostly with normal epithelium. Unlike in pancreatic cancer (Figure 5B), immune cells were located peripherally in the network and interacted mostly with normal epithelium. We did not observe direct connections between lymphoid cells and any tumor subpopulation (Figure 6B), which might suggest potential immune evasion.

Figure 6. — Identification of cell type interactions in skin cancer. (A) Uniform manifold approximation and projection (UMAP) of 48 164 primary human cells from n = 10 cutaneous squamous cell carcinomas (27) colored by cell type. TSK, tumor-specific keratinocyte; KC, keratinocyte; Cyc, cycling; Diff, differentiating. (B) Network diagram of significant cell type interactions from (A) identified by Neighbor-seq, colored by known cell compartment. Edge thickness represents interaction P-value and edge color represents counts. Neighbor-seq is run for n = 10 iterations, and interactions are shown for edges with a mean count >10 and enrichment score combined adjusted P-value <0.05 (see Materials and Methods). (C) Spatial transcriptomic maps of four patients (P2: n = 666, P5: n = 521, P9: n = 1145, P10: n = 608 barcodes) colored by cell-type abundance scores (see Materials and Methods). (D) Boxplots comparing spatial barcode cell-type abundance scores between cell-type pairs with enriched doublets from (B) compared to all other possible pairs. Boxplots show median (line), 25th and 75th percentiles (box) and 1.5× IQR (whiskers). Points represent outliers. Wilcoxon testing. (E) Comparison of the number of co-expressed ligand–receptor pairs in enriched (statistically significant) doublet types, not enriched doublets and random synthetic doublets from (A). Boxplots are the same as in (D).

These general patterns identified from the inferred cell-interaction network agreed with observations made from spatial transcriptomic sequencing data of the same tumors profiled at 8179 spots. Using the single-cell data, we identified cell-type marker genes and used these to calculate cell-type scores for each spatial sequencing spot in four patients (see Methods). Coloring the spatial maps by composite tumor, normal epithelium and stroma, or immune scores revealed spatial contexts similar to those we identified with Neighbor-seq from the single-cell data alone (Figure 6C). Most prominently in patient P2, tumor populations occupied a separate region of the spatial map with minimal immune presence, while the adjacent regions were rich in normal epithelium, stromal, and immune genetic activity. We next compared the specific cell-type pair associations identified by Neighbor-seq (Figure 6B) with the cell-type score correlations in the matched spatial data. For all patients, cell-types with enriched neighbor interactions in scRNA-seq were significantly more correlated with each other across spatial spots than were cell-type pairs that did not have enriched doublets in scRNA-seq (P2: P = 2.3e–7, P5: P = 3.8e–4, P9: P = 3.4e–6, P10: P = 4.2e–10; Wilcoxon; Figure 6D). Lastly, we confirmed that the enriched cell neighbors in scRNA-seq exhibited increased pairwise crosstalk by assessing the number of co-expressed ligand–receptor pairs (see Materials and Methods). Consistent with our previous analyses, enriched doublets expressed significantly more ligand–receptor pairs than non-enriched doublets or randomly synthesized doublets (both Wilcoxon P < 2.2e–16, Figure 6E). Taken together, these results indicate that Neighbor-seq accurately identifies and annotates multiplets that contain interacting cells across a range of normal and diseased tissues, and these interactions are consistent with spatial contexts identified from both the same and different samples.

DISCUSSION

In summary, Neighbor-seq infers direct cell–cell interactions by identifying undissociated cell multiplets in standard scRNA-seq data and classifying them according to their constituent cell-types, ultimately building the cellular interactome in diverse normal and diseased tissues. It shows high accuracy and reproducibility, and the results are in agreement with prior knowledge about tissue microenvironments in well-studied tissues and with spatial transcriptomic data. Neighbor-seq complements emerging methods (5–8) that are optimized for deconvoluting the transcriptomics of single cells and cell clumps. It, however, reconstructs tissue-scale cell interaction networks using undissociated multiplets from standard scRNA-seq alone, thereby eliminating the need for specialized sample preparation and boosting scalability. Neighbor-seq provides a framework to study the topology of cell–cell interactome leading to the organization of tissue microenvironment, bridging the gap in resolution between single-cell and spatial transcriptomics. Neighbor-seq is available at https://github.com/sjdlabgroup/Neighborseq.

DATA AVAILABILITY

The Neighbor-seq resource and user-friendly documentations are freely available on Github at https://github.com/sjdlabgroup/Neighborseq.

Supplementary Material

gkac333_Supplemental_Files

Click here for additional data file.^{(4.4MB, zip)}

ACKNOWLEDGEMENTS

We acknowledge the Office of Advanced Research Computing (OARC) at Rutgers, The State University of New Jersey for providing access to the Amarel cluster URL: https://it.rutgers.edu/oarc.

Contributor Information

Bassel Ghaddar, Rutgers Cancer Institute of New Jersey, Rutgers the State University of New Jersey, New Brunswick, NJ 08901, USA.

Subhajyoti De, Rutgers Cancer Institute of New Jersey, Rutgers the State University of New Jersey, New Brunswick, NJ 08901, USA.

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online.

FUNDING

National Institutes of Health [R21CA248122, R01GM129066 to S.D.]; National Institutes of Health, National Center for Advancing Translational Sciences, Rutgers Clinical and Translational Science Award [TL1TR003019 to B.G.]. The open access publication charge for this paper has been waived by Oxford University Press - NAR Editorial Board members are entitled to one free paper per year in recognition of their work on behalf of the journal.

Conflict of interest statement. None declared.

REFERENCES

1. Armingol E., Officer A., Harismendy O., Lewis N.E.. Deciphering cell–cell interactions and communication from gene expression. Nat. Rev. Genet. 2021; 22:71–88. [DOI] [PMC free article] [PubMed] [Google Scholar]
2. Regev A., Teichmann S.A., Lander E.S., Amit I., Benoist C., Birney E., Bodenmiller B., Campbell P., Carninci P., Clatworthy M.et al.. Science forum: the human cell atlas. Elife. 2017; 6:e27041. [DOI] [PMC free article] [PubMed] [Google Scholar]
3. Asp M., Bergenstråhle J., Lundeberg J.. Spatially resolved transcriptomes—next generation tools for tissue exploration. BioEssays. 2020; 42:1900221. [DOI] [PubMed] [Google Scholar]
4. Crosetto N., Bienko M., van Oudenaarden A.. Spatially resolved transcriptomics and beyond. Nat. Rev. Genet. 2015; 16:57–66. [DOI] [PubMed] [Google Scholar]
5. Manco R., Averbukh I., Porat Z., Halpern K.B., Amit I., Itzkovitz S.. Clump sequencing exposes the spatial expression programs of intestinal secretory cells. Nat. Commun. 2021; 12:3074. [DOI] [PMC free article] [PubMed] [Google Scholar]
6. Andrews N., Serviss J.T., Geyer N., Andersson A.B., Dzwonkowska E., Šutevski I., Heijboer R., Baryawno N., Gerling M., Enge M.. An unsupervised method for physical cell interaction profiling of complex tissues. Nat. Methods. 2021; 18:912–920. [DOI] [PubMed] [Google Scholar]
7. Boisset J.-C., Vivié J., Grün D., Muraro M.J., Lyubimova A., van Oudenaarden A.. Mapping the physical network of cellular interactions. Nat. Methods. 2018; 15:547–553. [DOI] [PubMed] [Google Scholar]
8. Giladi A., Cohen M., Medaglia C., Baran Y., Li B., Zada M., Bost P., Blecher-Gonen R., Salame T.-M., Mayer J.U.et al.. Dissecting cellular crosstalk by sequencing physically interacting cells. Nat. Biotechnol. 2020; 38:629–637. [DOI] [PubMed] [Google Scholar]
9. Wolock S.L., Lopez R., Klein A.M.. Scrublet: computational identification of cell doublets in single-cell transcriptomic data. Cell Syst. 2019; 8:281–291. [DOI] [PMC free article] [PubMed] [Google Scholar]
10. Xi N.M., Li J.J.. Benchmarking computational doublet-detection methods for single-cell RNA sequencing data. Cell Syst. 2021; 12:176–194. [DOI] [PMC free article] [PubMed] [Google Scholar]
11. Stuart T., Butler A., Hoffman P., Stoeckius M., Smibert P., Satija R., Stuart T., Butler A., Hoffman P., Hafemeister C.et al.. Comprehensive integration of single-cell data. Cell. 2019; 177:1888–1902. [DOI] [PMC free article] [PubMed] [Google Scholar]
12. Liaw A., Wiener M.. Classification and regression by randomForest. R News. 2002; 2:18–22. [Google Scholar]
13. Chen T., Guestrin C.. XGBoost: a scalable tree boosting system. 2016; arXiv doi:10 June 2016, preprint: not peer reviewedhttps://arxiv.org/abs/1603.02754.
14. Csardi G., Nepusz T.. The igraph software package for complex network research. InterJ. Complex Syst. 2006; 1695:1696. [Google Scholar]
15. Ramilowski J.A., Goldberg T., Harshbarger J., Kloppmann E., Lizio M., Satagopam V.P., Itoh M., Kawaji H., Carninci P., Rost B.et al.. A draft network of ligand–receptor-mediated multicellular signalling in human. Nat. Commun. 2015; 6:7866. [DOI] [PMC free article] [PubMed] [Google Scholar]
16. Shao X., Liao J., Li C., Lu X., Cheng J., Fan X.. CellTalkDB: a manually curated database of ligand–receptor interactions in humans and mice. Brief. Bioinform. 2021; 22:bbaa269. [DOI] [PubMed] [Google Scholar]
17. Lun A.T.L., McCarthy D.J., Marioni J.C.. A step-by-step workflow for low-level analysis of single-cell RNA-seq data with Bioconductor. F1000Research. 2016; 5:2122. [DOI] [PMC free article] [PubMed] [Google Scholar]
18. Bais A.S., Kostka D.. scds: computational annotation of doublets in single-cell RNA sequencing data. Bioinformatics. 2020; 36:1150–1158. [DOI] [PMC free article] [PubMed] [Google Scholar]
19. Bernstein N.J., Fong N.L., Lam I., Roy M.A., Hendrickson D.G., Kelley D.R.. Solo: doublet identification in single-cell RNA-Seq via semi-supervised deep learning. Cell Syst. 2020; 11:95–101. [DOI] [PubMed] [Google Scholar]
20. McGinnis C.S., Murrow L.M., Gartner Z.J.. DoubletFinder: doublet detection in single-cell RNA sequencing data using artificial nearest neighbors. Cell Syst. 2019; 8:329–337. [DOI] [PMC free article] [PubMed] [Google Scholar]
21. Haber A.L., Biton M., Rogel N., Herbst R.H., Shekhar K., Smillie C., Burgin G., Delorey T.M., Howitt M.R., Katz Y.et al.. A single-cell survey of the small intestinal epithelium. Nature. 2017; 551:333–339. [DOI] [PMC free article] [PubMed] [Google Scholar]
22. Travaglini K.J., Nabhan A.N., Penland L., Sinha R., Gillich A., Sit R.V, Chang S., Conley S.D., Mori Y., Seita J.et al.. A molecular cell atlas of the human lung from single-cell RNA sequencing. Nature. 2020; 587:619–625. [DOI] [PMC free article] [PubMed] [Google Scholar]
23. Madissoon E., Wilbrey-Clark A., Miragaia R.J., Saeb-Parsy K., Mahbubani K.T., Georgakopoulos N., Harding P., Polanski K., Huang N., Nowicki-Osuch K.et al.. scRNA-seq assessment of the human lung, spleen, and esophagus tissue stability after cold preservation. Genome Biol. 2019; 21:1. [DOI] [PMC free article] [PubMed] [Google Scholar]
24. Schaum N., Karkanias J., Neff N.F., May A.P., Quake S.R., Wyss-Coray T., Darmanis S., Batson J., Botvinnik O., Chen M.B.et al.. Single-cell transcriptomics of 20 mouse organs creates a Tabula Muris. Nature. 2018; 562:367–372. [DOI] [PMC free article] [PubMed] [Google Scholar]
25. Peng J., Sun B.F., Chen C.Y., Zhou J.Y., Chen Y.S., Chen H., Liu L., Huang D., Jiang J., Cui G.S.et al.. Single-cell RNA-seq highlights intra-tumoralheterogeneity and malignant progression in pancreatic ductal adenocarcinoma. Cell Res. 2019; 29:725–738. [DOI] [PMC free article] [PubMed] [Google Scholar]
26. Moncada R., Barkley D., Wagner F., Chiodin M., Devlin J.C., Baron M., Hajdu C.H., Simeone D.M., Yanai I.. Integrating microarray-based spatial transcriptomics and single-cell RNA-seq reveals tissue architecture in pancreatic ductal adenocarcinomas. Nat. Biotechnol. 2020; 38:333–342. [DOI] [PubMed] [Google Scholar]
27. Ji A.L., Rubin A.J., Thrane K., Jiang S., Reynolds D.L., Meyers R.M., Guo M.G., George B.M., Mollbrink A., Bergenstråhle J.et al.. Multimodal Analysis of Composition and Spatial Architecture in Human Squamous Cell Carcinoma. Cell. 2020; 182:497–514. [DOI] [PMC free article] [PubMed] [Google Scholar]
28. Gehart H., Clevers H.. Tales from the crypt: new insights into intestinal stem cells. Nat. Rev. Gastroenterol. Hepatol. 2019; 16:19–34. [DOI] [PubMed] [Google Scholar]
29. Whitsett J.A., Kalin T.V, Xu Y., Kalinichenko V.V. Building and regenerating the lung cell by cell. Physiol. Rev. 2018; 99:513–554. [DOI] [PMC free article] [PubMed] [Google Scholar]
30. M. L.S., Adam W., C. E.S. Structure and function of the immune system in the spleen. Sci. Immunol. 2019; 4:eaau6085. [DOI] [PMC free article] [PubMed] [Google Scholar]
31. Gunderson A.J., Kaneda M.M., Tsujikawa T., Nguyen A.V, Affara N.I., Ruffell B., Gorjestani S., Liudahl S.M., Truitt M., Olson P.et al.. Bruton tyrosine kinase–dependent immune cell cross-talk drives pancreas cancer. Cancer Discov. 2016; 6:270–285. [DOI] [PMC free article] [PubMed] [Google Scholar]
32. Moncada R., Barkley D., Wagner F., Chiodin M., Devlin J.C., Baron M., Hajdu C.H., Simeone D.M., Yanai I.. Integrating microarray-based spatial transcriptomics and single-cell RNA-seq reveals tissue architecture in pancreatic ductal adenocarcinomas. Nat. Biotechnol. 2020; 38:333–342. [DOI] [PubMed] [Google Scholar]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

gkac333_Supplemental_Files

Click here for additional data file.^{(4.4MB, zip)}

Data Availability Statement

The Neighbor-seq resource and user-friendly documentations are freely available on Github at https://github.com/sjdlabgroup/Neighborseq.

[B1] 1. Armingol E., Officer A., Harismendy O., Lewis N.E.. Deciphering cell–cell interactions and communication from gene expression. Nat. Rev. Genet. 2021; 22:71–88. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B2] 2. Regev A., Teichmann S.A., Lander E.S., Amit I., Benoist C., Birney E., Bodenmiller B., Campbell P., Carninci P., Clatworthy M.et al.. Science forum: the human cell atlas. Elife. 2017; 6:e27041. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B3] 3. Asp M., Bergenstråhle J., Lundeberg J.. Spatially resolved transcriptomes—next generation tools for tissue exploration. BioEssays. 2020; 42:1900221. [DOI] [PubMed] [Google Scholar]

[B4] 4. Crosetto N., Bienko M., van Oudenaarden A.. Spatially resolved transcriptomics and beyond. Nat. Rev. Genet. 2015; 16:57–66. [DOI] [PubMed] [Google Scholar]

[B5] 5. Manco R., Averbukh I., Porat Z., Halpern K.B., Amit I., Itzkovitz S.. Clump sequencing exposes the spatial expression programs of intestinal secretory cells. Nat. Commun. 2021; 12:3074. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B6] 6. Andrews N., Serviss J.T., Geyer N., Andersson A.B., Dzwonkowska E., Šutevski I., Heijboer R., Baryawno N., Gerling M., Enge M.. An unsupervised method for physical cell interaction profiling of complex tissues. Nat. Methods. 2021; 18:912–920. [DOI] [PubMed] [Google Scholar]

[B7] 7. Boisset J.-C., Vivié J., Grün D., Muraro M.J., Lyubimova A., van Oudenaarden A.. Mapping the physical network of cellular interactions. Nat. Methods. 2018; 15:547–553. [DOI] [PubMed] [Google Scholar]

[B8] 8. Giladi A., Cohen M., Medaglia C., Baran Y., Li B., Zada M., Bost P., Blecher-Gonen R., Salame T.-M., Mayer J.U.et al.. Dissecting cellular crosstalk by sequencing physically interacting cells. Nat. Biotechnol. 2020; 38:629–637. [DOI] [PubMed] [Google Scholar]

[B9] 9. Wolock S.L., Lopez R., Klein A.M.. Scrublet: computational identification of cell doublets in single-cell transcriptomic data. Cell Syst. 2019; 8:281–291. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B10] 10. Xi N.M., Li J.J.. Benchmarking computational doublet-detection methods for single-cell RNA sequencing data. Cell Syst. 2021; 12:176–194. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B11] 11. Stuart T., Butler A., Hoffman P., Stoeckius M., Smibert P., Satija R., Stuart T., Butler A., Hoffman P., Hafemeister C.et al.. Comprehensive integration of single-cell data. Cell. 2019; 177:1888–1902. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B12] 12. Liaw A., Wiener M.. Classification and regression by randomForest. R News. 2002; 2:18–22. [Google Scholar]

[B13] 13. Chen T., Guestrin C.. XGBoost: a scalable tree boosting system. 2016; arXiv doi:10 June 2016, preprint: not peer reviewedhttps://arxiv.org/abs/1603.02754.

[B14] 14. Csardi G., Nepusz T.. The igraph software package for complex network research. InterJ. Complex Syst. 2006; 1695:1696. [Google Scholar]

[B15] 15. Ramilowski J.A., Goldberg T., Harshbarger J., Kloppmann E., Lizio M., Satagopam V.P., Itoh M., Kawaji H., Carninci P., Rost B.et al.. A draft network of ligand–receptor-mediated multicellular signalling in human. Nat. Commun. 2015; 6:7866. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B16] 16. Shao X., Liao J., Li C., Lu X., Cheng J., Fan X.. CellTalkDB: a manually curated database of ligand–receptor interactions in humans and mice. Brief. Bioinform. 2021; 22:bbaa269. [DOI] [PubMed] [Google Scholar]

[B17] 17. Lun A.T.L., McCarthy D.J., Marioni J.C.. A step-by-step workflow for low-level analysis of single-cell RNA-seq data with Bioconductor. F1000Research. 2016; 5:2122. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B18] 18. Bais A.S., Kostka D.. scds: computational annotation of doublets in single-cell RNA sequencing data. Bioinformatics. 2020; 36:1150–1158. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B19] 19. Bernstein N.J., Fong N.L., Lam I., Roy M.A., Hendrickson D.G., Kelley D.R.. Solo: doublet identification in single-cell RNA-Seq via semi-supervised deep learning. Cell Syst. 2020; 11:95–101. [DOI] [PubMed] [Google Scholar]

[B20] 20. McGinnis C.S., Murrow L.M., Gartner Z.J.. DoubletFinder: doublet detection in single-cell RNA sequencing data using artificial nearest neighbors. Cell Syst. 2019; 8:329–337. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B21] 21. Haber A.L., Biton M., Rogel N., Herbst R.H., Shekhar K., Smillie C., Burgin G., Delorey T.M., Howitt M.R., Katz Y.et al.. A single-cell survey of the small intestinal epithelium. Nature. 2017; 551:333–339. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B22] 22. Travaglini K.J., Nabhan A.N., Penland L., Sinha R., Gillich A., Sit R.V, Chang S., Conley S.D., Mori Y., Seita J.et al.. A molecular cell atlas of the human lung from single-cell RNA sequencing. Nature. 2020; 587:619–625. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B23] 23. Madissoon E., Wilbrey-Clark A., Miragaia R.J., Saeb-Parsy K., Mahbubani K.T., Georgakopoulos N., Harding P., Polanski K., Huang N., Nowicki-Osuch K.et al.. scRNA-seq assessment of the human lung, spleen, and esophagus tissue stability after cold preservation. Genome Biol. 2019; 21:1. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B24] 24. Schaum N., Karkanias J., Neff N.F., May A.P., Quake S.R., Wyss-Coray T., Darmanis S., Batson J., Botvinnik O., Chen M.B.et al.. Single-cell transcriptomics of 20 mouse organs creates a Tabula Muris. Nature. 2018; 562:367–372. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B25] 25. Peng J., Sun B.F., Chen C.Y., Zhou J.Y., Chen Y.S., Chen H., Liu L., Huang D., Jiang J., Cui G.S.et al.. Single-cell RNA-seq highlights intra-tumoralheterogeneity and malignant progression in pancreatic ductal adenocarcinoma. Cell Res. 2019; 29:725–738. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B26] 26. Moncada R., Barkley D., Wagner F., Chiodin M., Devlin J.C., Baron M., Hajdu C.H., Simeone D.M., Yanai I.. Integrating microarray-based spatial transcriptomics and single-cell RNA-seq reveals tissue architecture in pancreatic ductal adenocarcinomas. Nat. Biotechnol. 2020; 38:333–342. [DOI] [PubMed] [Google Scholar]

[B27] 27. Ji A.L., Rubin A.J., Thrane K., Jiang S., Reynolds D.L., Meyers R.M., Guo M.G., George B.M., Mollbrink A., Bergenstråhle J.et al.. Multimodal Analysis of Composition and Spatial Architecture in Human Squamous Cell Carcinoma. Cell. 2020; 182:497–514. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B28] 28. Gehart H., Clevers H.. Tales from the crypt: new insights into intestinal stem cells. Nat. Rev. Gastroenterol. Hepatol. 2019; 16:19–34. [DOI] [PubMed] [Google Scholar]

[B29] 29. Whitsett J.A., Kalin T.V, Xu Y., Kalinichenko V.V. Building and regenerating the lung cell by cell. Physiol. Rev. 2018; 99:513–554. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B30] 30. M. L.S., Adam W., C. E.S. Structure and function of the immune system in the spleen. Sci. Immunol. 2019; 4:eaau6085. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B31] 31. Gunderson A.J., Kaneda M.M., Tsujikawa T., Nguyen A.V, Affara N.I., Ruffell B., Gorjestani S., Liudahl S.M., Truitt M., Olson P.et al.. Bruton tyrosine kinase–dependent immune cell cross-talk drives pancreas cancer. Cancer Discov. 2016; 6:270–285. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B32] 32. Moncada R., Barkley D., Wagner F., Chiodin M., Devlin J.C., Baron M., Hajdu C.H., Simeone D.M., Yanai I.. Integrating microarray-based spatial transcriptomics and single-cell RNA-seq reveals tissue architecture in pancreatic ductal adenocarcinomas. Nat. Biotechnol. 2020; 38:333–342. [DOI] [PubMed] [Google Scholar]

PERMALINK

Reconstructing physical cell interaction networks from single-cell data using Neighbor-seq

Bassel Ghaddar

Subhajyoti De

Abstract

Graphical Abstract

Graphical Abstract.

INTRODUCTION

Figure 1.

MATERIALS AND METHODS

Neighbor-seq algorithm

Input and cell type identification

Random forest classifier training

Determining significantly enriched cell–cell interactions

Construction of a cell–cell interactome network and transcriptomic analysis

Benchmarking, robustness and reproducibility

Comparative assessment with other doublet finding methods

Ensemble prediction

Evaluation of ligand–receptor co-expression in cell neighbors

Identifying cell–cell interactions in the small intestine

Identifying cell–cell interactions in the lung

Identifying cell–cell interactions in the spleen

Identifying cell–cell interactions in pancreatic cancer

Identifying cell–cell interactions in skin cancer

RESULTS

Benchmarking and validation with known multiplets

Identifying cell–cell interactions in the small intestine

Figure 2.

Identifying cell–cell interactions in the lung

Figure 3.

Identifying cell–cell interactions in the spleen

Figure 4.

Identifying cell–cell interactions in pancreatic cancer

Figure 5.

Identifying cell–cell interactions in skin cancer

Figure 6.

DISCUSSION

DATA AVAILABILITY

Supplementary Material

ACKNOWLEDGEMENTS

Contributor Information

SUPPLEMENTARY DATA

FUNDING

REFERENCES

Associated Data

Supplementary Materials

Data Availability Statement

ACTIONS

PERMALINK

RESOURCES

Similar articles

Cited by other articles

Links to NCBI Databases