Nature Communications. 2022 Jun 9;13:3224. doi: 10.1038/s41467-022-30755-0

Comparison of methods and resources for cell-cell communication inference from single-cell RNA-Seq data

Daniel Dimitrov 1, Dénes Türei 1, Martin Garrido-Rodriguez 1, Paul L Burmedi 1, James S Nagai 2,3, Charlotte Boys 1, Ricardo O Ramirez Flores 1, Hyojin Kim 1, Bence Szalai 4, Ivan G Costa 2,3, Alberto Valdeolivas 5,#, Aurélien Dugourd 1,#, Julio Saez-Rodriguez 1
PMCID: PMC9184522  PMID: 35680885

Abstract

The growing availability of single-cell data, especially transcriptomics, has sparked an increased interest in the inference of cell-cell communication. Many computational tools have been developed for this purpose. Each consists of a resource of prior knowledge on intercellular interactions and a method to predict potential cell-cell communication events. Yet the impact of the choice of resource and method on the resulting predictions is largely unknown. To shed light on this, we systematically compare 16 cell-cell communication inference resources and 7 methods, plus the consensus between the methods’ predictions. Among the resources, we find few unique interactions, a varying degree of overlap, and an uneven coverage of specific pathways and tissue-enriched proteins. We then examine all possible combinations of methods and resources and show that both strongly influence the predicted intercellular interactions. Finally, we assess the agreement of cell-cell communication methods with spatial colocalisation, cytokine activities, and receptor protein abundance and find that predictions are generally coherent with those data modalities. To facilitate the use of the methods and resources described in this work, we provide LIANA, a LIgand-receptor ANalysis frAmework, as an open-source interface to all the resources and methods.

Subject terms: Cellular signalling networks, Computational platforms and environments, Computational models, Cancer genomics


Multiple methods to infer cell-cell communication (CCC) from single cell data are currently available. Here, the authors systematically compare 16 CCC inference resources and 7 methods, and develop the LIANA framework as an interface to use and compare all these approaches.

Introduction

Single-cell RNA sequencing (scRNA-Seq) data has become a driving force in the analysis of the cellular heterogeneity of tissues. Furthermore, Spatial Transcriptomics has recently emerged as a technology to measure gene expression while preserving the spatial distribution of cells in a sample, thus providing an unprecedented opportunity to decipher tissue architecture1. These advancements have in turn led to an increased interest in the development of tools for cell-cell communication (CCC) inference. CCC events are essential for homeostasis, development, and disease, and their estimation is becoming a routine approach in scRNA-seq data analysis2. CCC commonly refers to interactions between secreted ligands and plasma membrane receptors. This picture can be broadened to include secreted enzymes, extracellular matrix proteins, transporters, and interactions that require the physical contact between cells, such as cell-cell adhesion proteins and gap junctions3. For simplicity, we refer to all of these events involving protein-protein interactions as CCC.

A number of computational tools and resources have emerged that can be further classified as those that predict CCC interactions alone4–17, and those that additionally estimate intracellular activities related to CCC18–24. Here, we focus on the former (Table 1). These CCC tools typically use gene expression information obtained by scRNA-Seq. In general, single cells are clustered by their gene expression profile and cell type identities are assigned to the clusters based on known gene markers. Then, CCC tools can predict intercellular crosstalk between any pair of clusters, one cluster being the source and the other the target of a CCC event. CCC events are thus typically represented as a one-to-one interaction between a transmitter and receiver protein, accordingly expressed by the source and target cell clusters. The information about which transmitter binds to which receiver is extracted from diverse sources of prior knowledge. Roughly, CCC tools then estimate the likelihood of crosstalk based on the expression level of the transmitter and the receiver in the source and target clusters, respectively. Every tool has two major components: a resource of prior knowledge on CCC (interactions), and a method to estimate CCC from the known interactions and the dataset at hand. Most tools have been published as the combination of one resource and one method, but in principle any resource could be combined with any method.
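The generic workflow described above can be illustrated with a minimal sketch. It is not the implementation of any of the tools compared here: the function names, the toy gene names (`LIG`, `REC`) and the use of a simple expression product as the score are assumptions made for illustration only.

```python
from collections import defaultdict
from itertools import product

def cluster_means(expression, labels):
    """Mean expression of each gene per cluster.

    expression: dict mapping gene -> list of per-cell values
    labels: list of cluster labels, one per cell
    """
    means = defaultdict(dict)
    for gene, values in expression.items():
        sums, counts = defaultdict(float), defaultdict(int)
        for value, label in zip(values, labels):
            sums[label] += value
            counts[label] += 1
        for label in sums:
            means[label][gene] = sums[label] / counts[label]
    return dict(means)

def score_interactions(means, resource):
    """Score every (source, target, transmitter, receiver) combination as the
    product of cluster-level mean expression (an illustrative choice)."""
    scores = {}
    for source, target in product(means, repeat=2):
        for transmitter, receiver in resource:
            scores[(source, target, transmitter, receiver)] = (
                means[source].get(transmitter, 0.0) * means[target].get(receiver, 0.0)
            )
    return scores

# Toy data: four cells, two clusters, one hypothetical transmitter-receiver pair.
expression = {"LIG": [4.0, 2.0, 0.0, 0.0], "REC": [0.0, 0.0, 1.0, 3.0]}
labels = ["A", "A", "B", "B"]
resource = [("LIG", "REC")]

means = cluster_means(expression, labels)
scores = score_interactions(means, resource)
ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

In this toy example the only non-zero score is assigned to the interaction from cluster A (source) to cluster B (target). The resource and the scoring function are separate inputs, which is what allows any resource to be combined with any method.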

Table 1.

Tools included in the framework.

Tool/Method | Resource | Methods’ scoring systems
CellChat#14 | CellChatDB | (1) Probability: based on the expression of differentially expressed transmitter and receiver genes and their mediators, calculated with the law of mass action. (2) P-values†: significance identified via permutation of cell cluster labels and recalculating the probabilities for each cell pair and each transmitter-receiver interaction.
CellPhoneDBv2#8 | CellPhoneDB | (1) Truncated Mean: average expression of transmitter and receivers, the minimum expression (by default) of heteromeric complex of subunits. (2) P-values†: significance identified via permutation of cell cluster labels to determine a null distribution of means for each receiver-transmitter interaction.
Connectome10 | Ramilowski | (1) weight_norm: derived via the product (by default) of the normalised expression of transmitter and receiver genes. (2) weight_scale†: derived from a function (mean, by default) of the z-scores of the transmitter and the receiver, scaled according to cell cluster specificity.
Crosstalk scores | - | (1) Crosstalk score†: Cytotalk-inspired22 crosstalk scores were derived from the expression of transmitters and receivers, weighted by the likelihood of autocrine signalling between the source and target cell types.
logFC Mean | - | (1) logFC Mean†: iTALK-inspired6 logFC means, derived using the mean of the logged one-versus-all fold change of receiver and transmitter gene expression.
NATMI11 | ConnectomeDB | (1) Mean-expression edge weight: transmitter and receiver gene expression product. (2) Specificity-based edge weight†: the mean expression of the transmitter and receiver are divided by the sum of the means of the same transmitters/receivers across all cell clusters.
SingleCellSignalR#12 | LRdb | (1) LRscore: a regularised score calculated using the squared expression of the transmitter and receiver (sqTRE) divided by sum of the mean of the count matrix and sqTRE.
Consensus | - | (1) Robust Rank Aggregate65: preferentially highly-ranked interactions are obtained from a distribution generated from the interaction rankings of other methods.

Each method considers expression at the cell cluster level, and all of the scoring systems presented here use the expression of transmitters and receiver genes in the source and target cells, respectively. In addition to the seven methods, we included their consensus.

In bold are the names of cell-cell communication inference methods and their scoring functions.

Dagger (†): Explicitly incorporates communicating cell-pair specificity in interaction predictions

Hashtag (#): CellPhoneDB, CellChat, and SingleCellSignalR provide explicit thresholds to control for false positive interaction predictions. In the case of the former two, these are permutation-based p-values, whereas SingleCellSignalR’s LRscore has a suggested threshold of 0.5.

Methods that additionally infer intracellular processes, such as NicheNet19, Cytotalk22, and SoptSC20 are not directly comparable but instead provide complementary analyses.
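To make the difference between a magnitude-based and a specificity-based score in Table 1 concrete, the following sketch contrasts an expression product with a specificity weight in the spirit of the description given for NATMI. It is an illustration under stated assumptions rather than NATMI's code: the function name, the toy cluster means, and the choice to multiply the transmitter and receiver shares are assumptions.

```python
def specificity_weight(means, source, target, transmitter, receiver):
    """Share of the transmitter's total cluster-level expression found in the
    source, multiplied by the receiver's share found in the target.
    Combining the two shares by multiplication is an assumption."""
    total_t = sum(cluster[transmitter] for cluster in means.values())
    total_r = sum(cluster[receiver] for cluster in means.values())
    if total_t == 0 or total_r == 0:
        return 0.0
    return (means[source][transmitter] / total_t) * (means[target][receiver] / total_r)

# Hypothetical cluster-level mean expression for three clusters.
means = {
    "A": {"LIG": 3.0, "REC": 1.0},
    "B": {"LIG": 1.0, "REC": 1.0},
    "C": {"LIG": 0.0, "REC": 2.0},
}

# Magnitude-based score for A -> C: depends only on the two clusters involved.
expression_product = means["A"]["LIG"] * means["C"]["REC"]

# Specificity-based score for A -> C: depends on all clusters in the dataset.
specific = specificity_weight(means, "A", "C", "LIG", "REC")

# Summed over all source-target pairs, the weights of one interaction add up to 1.
total = sum(
    specificity_weight(means, s, t, "LIG", "REC") for s in means for t in means
)
```

Because the specificity weight is normalised across clusters, a highly but ubiquitously expressed pair receives a low weight for every cluster pair, whereas the expression product would rank it highly everywhere.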

Despite the aforementioned common premises to explore CCC events, each tool uses a different method, such as permutation of cluster labels, regularisations, and scaling, to prioritise interactions according to the input datasets (Table 1). In turn, these different approaches result in diverse scoring systems that are challenging to compare and evaluate. The difficulties are further exacerbated by the lack of an appropriate gold standard to benchmark the performance of CCC methods2,25. Nevertheless, different strategies have been used to indirectly evaluate the methods’ performance, including a presumed correlation between CCC predictions and spatial adjacency14,22, recovering the effect of receptor gene knockouts22, robustness to subsampling14, agreement with proteomics12, simulated scRNA-Seq data9, and the agreement among methods10,12,14,22.
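As an illustration of the label-permutation strategy mentioned above, the sketch below estimates an empirical p-value for one transmitter-receiver pair and one source-target cluster pair. It is a simplified stand-in, not the procedure of CellPhoneDB or CellChat: the score (average of the two cluster means), the add-one correction, and all names and toy values are assumptions.

```python
import random

def mean(values):
    return sum(values) / len(values)

def interaction_score(ligand, receptor, labels, source, target):
    """Average of the mean transmitter expression in the source cluster and
    the mean receiver expression in the target cluster."""
    lig = [v for v, lab in zip(ligand, labels) if lab == source]
    rec = [v for v, lab in zip(receptor, labels) if lab == target]
    return (mean(lig) + mean(rec)) / 2

def permutation_pvalue(ligand, receptor, labels, source, target, n_perm=200, seed=0):
    """Fraction of label permutations whose score reaches the observed one."""
    rng = random.Random(seed)
    observed = interaction_score(ligand, receptor, labels, source, target)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # cluster sizes are preserved, only membership changes
        if interaction_score(ligand, receptor, shuffled, source, target) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)

# Toy data: 10 cells in cluster A, 10 in cluster B.
labels = ["A"] * 10 + ["B"] * 10
ligand_specific = [5.0] * 10 + [0.0] * 10    # transmitter expressed only in A
receptor_specific = [0.0] * 10 + [5.0] * 10  # receiver expressed only in B
flat = [1.0] * 20                            # uniformly expressed gene

p_specific = permutation_pvalue(ligand_specific, receptor_specific, labels, "A", "B")
p_flat = permutation_pvalue(flat, flat, labels, "A", "B")
```

A cluster-specific pair yields a small p-value, whereas a uniformly expressed pair is indistinguishable from the null distribution regardless of its expression level. This is why permutation-based scores carry cell-pair specificity, marked with a dagger in Table 1.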

The available prior knowledge resources, largely composed of ligand-receptor, extracellular matrix, and adhesion interactions, are typically distinct but often show partial overlap3,26. Some of these resources also provide additional details for the interactions, such as information about subcellular localisation3,14 and classification into signalling pathways and categories14,27 (Supplementary Table 1). Notably, some resources3,8,14,27,28 (Supplementary Table 1), and consequently their corresponding methods, focus on protein complexes as the functional units of CCC, which are crucial for the coordination of signalling as different subunit combinations may induce distinct responses8. Although CCC inference is constrained by the prior knowledge used, the impact of resource choice is largely unexplored, with the exception of a descriptive comparison of 4 resources with one method26. Thus, it remains unclear how the choice of resource and method affects the results and thereby the biological interpretation of the scRNA-seq data.

In this work, we systematically compared all combinations of 16 resources and 7 CCC methods, plus their consensus (Fig. 1). First, we explored the degree of overlap among resources and whether certain resources are biased toward specific biological terms, such as pathways and tissue-enriched proteins. Then, we analysed how different combinations of resources and methods influence CCC inference by decoupling the methods from their corresponding resources and applying all method-resource combinations on six different datasets. Finally, we evaluated the agreement of the different CCC methods with additional modalities, including spatial adjacency, cytokine activities, and protein abundance. All results were generated using LIANA, a LIgand-receptor ANalysis frAmework (Fig. 1; available at https://github.com/saezlab/liana).

Fig. 1. LIANA—a LIgand-receptor ANalysis frAmework.

LIANA takes any annotated single-cell RNA (scRNA) dataset as input and establishes a common interface to all the resources and methods in any combination. LIANA also provides a consensus ranking for the methods’ predictions.
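The idea of a consensus ranking can be sketched as follows. Note that this example deliberately swaps in a much simpler technique, a mean of normalised ranks, in place of the Robust Rank Aggregate listed in Table 1, and it does not reproduce LIANA's interface. The method names, scores, and the handling of missing interactions are assumptions.

```python
def rank_interactions(scores, higher_is_better=True):
    """Map each interaction to its 1-based rank within one method's output."""
    ordered = sorted(scores, key=scores.get, reverse=higher_is_better)
    return {interaction: pos for pos, interaction in enumerate(ordered, start=1)}

def mean_rank_consensus(method_scores):
    """Order interactions by their mean normalised rank across methods.

    method_scores: dict method -> (scores dict, higher_is_better flag)
    Interactions not reported by a method get the worst normalised rank (1.0),
    which is an assumption of this sketch.
    """
    normalised = {}
    for scores, higher_is_better in method_scores.values():
        ranks = rank_interactions(scores, higher_is_better)
        n = len(ranks)
        for interaction, rank in ranks.items():
            normalised.setdefault(interaction, []).append(rank / n)
    n_methods = len(method_scores)
    consensus = {
        interaction: (sum(r) + (n_methods - len(r)) * 1.0) / n_methods
        for interaction, r in normalised.items()
    }
    return sorted(consensus, key=consensus.get)

# Hypothetical outputs of three methods with differently oriented scores.
method_scores = {
    "product": ({"L1-R1": 6.0, "L2-R2": 2.0, "L3-R3": 1.0}, True),
    "pvalue": ({"L1-R1": 0.01, "L2-R2": 0.20, "L3-R3": 0.04}, False),
    "specificity": ({"L1-R1": 0.4, "L3-R3": 0.5}, True),
}
consensus = mean_rank_consensus(method_scores)
```

Working on ranks rather than raw scores sidesteps the fact that the scoring systems of the individual methods are on different scales and, for p-values, point in the opposite direction.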

Results

Resource uniqueness and overlap

To investigate the lineages of CCC resources, we manually gathered information about the origins of every resource. Many of these resources share the same original data sources, including general biological databases such as KEGG29,30, Reactome31, and STRING32 (Fig. 2). Moreover, interactions from the Guide to Pharmacology33, CellPhoneDB8, HMPR34, and in particular Ramilowski (FANTOM5)35, which are manually curated, were commonly incorporated into subsequently published resources (Fig. 2; Supplementary Table 2). All the resources included in this analysis are integrated into OmniPath’s CCC resource3, along with additional CCC interactions from other sources (e.g. SIGNOR36, Adhesome37, SignaLink38). A part of the OmniPath CCC resource, also referred to as ‘OmniPath’ and used in this work, was filtered by curation and protein localisation quality (“Processing of CCC resources” Methods).

Fig. 2. Dependencies and overlap between CCC resources.

The lineages of CCC interaction database knowledge. General biological knowledge databases (blue), CCC-dedicated resources (magenta), manual literature curation effort (yellow), additional resources included in iTALK (cyan), and OmniPath (green). Arrows show the data transfers between resources. The yus symbol (Ѫ) indicates the manual curation of resources, defined by explicitly mentioning that these resources are ‘manually’ or ‘expert’ curated. The asterisk (*) indicates that the resource was included in the analyses presented here.

As a consequence of their common origins, we noted limited uniqueness across the resources, with mean percentages of 6.4% unique receivers, 5.7% unique transmitters, and 10.4% unique interactions (Fig. 3A; Supplementary Table 1). One notable exception was Cellinker’s resource16, as 39.3% of its interactions were not present in any other resource. Despite the fact that few components were unique to any given resource, the pairwise overlap between the resources varied and was often limited (Fig. 3B; Supplementary Fig. S1). Yet, high similarity was observed between CellTalkDB26, ConnectomeDB11, iTALK6, LRdb12, and Ramilowski (Fig. 3B). Each of these resources, together with OmniPath and Cellinker, contained an average of at least 60% of the interactions present in other resources, largely explained by each containing a large proportion (>80%) of the interactions present in Ramilowski (Supplementary Fig. S2). Baccin28, CellPhoneDB, CellChatDB, and EMBRACE showed limited similarity with other resources, as each included on average ~40–50% of the interactions present in any other resource. These latter resources, except EMBRACE, include protein complexes, which were dissociated and treated as distinct protein subunits in our resource analyses. The relatively smaller resources CellCall23, ICELLNET13, Guide to Pharmacology, HMPR and Kirouac201039 were the most dissimilar from the remainder. Finally, the similarity among the resources was generally higher when considering transmitters and receivers (Supplementary Figs. S1, S2), rather than the interactions themselves, suggesting that different resources account for different interactions between the same proteins.

Fig. 3. Cell-cell communication resources—uniqueness and overlap.

A Shared and unique Interactions, Receivers and Transmitters for each resource. B Similarity between the different resources based on the interactions (Jaccard Index). Source data are provided as a Source Data file.
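The two quantities summarised in Fig. 3, the share of unique interactions per resource and the pairwise Jaccard index, can be computed as in the following sketch. The resource names and interactions are hypothetical placeholders, and the helper names are assumptions, not part of LIANA.

```python
from itertools import combinations

def jaccard(a, b):
    """Jaccard index of two sets: |intersection| / |union|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def unique_fraction(name, resources):
    """Share of a resource's interactions that appear in no other resource."""
    others = set().union(*(v for k, v in resources.items() if k != name))
    own = resources[name]
    return len(own - others) / len(own) if own else 0.0

# Hypothetical resources, each a set of (transmitter, receiver) pairs.
resources = {
    "res_a": {("L1", "R1"), ("L2", "R2"), ("L3", "R3")},
    "res_b": {("L1", "R1"), ("L2", "R2"), ("L4", "R4")},
    "res_c": {("L1", "R1"), ("L5", "R5")},
}

pairwise = {
    (x, y): jaccard(resources[x], resources[y])
    for x, y in combinations(sorted(resources), 2)
}
uniqueness = {name: unique_fraction(name, resources) for name in resources}
```

Applying the same functions to sets of transmitters or receivers instead of interaction pairs gives the component-level similarities, which is how resources can share most proteins while still differing in the interactions they list.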

Resource prior knowledge bias

Since CCC inference relies heavily on prior knowledge to estimate intercellular communication events, the choice of resource and any potential bias in it is expected to impact the results. We therefore explored whether the coverage of each CCC resource, when compared to the collection of all resources, is biased toward specific functional categories, tissue-enriched proteins, disease-associated genes, or subcellular locations.

To examine whether specific pathways and biological functions are unevenly represented in CCC resources, we matched the interactions, receivers and transmitters from each resource to well-known pathways and functional categories from SignaLink38, NetPath40, and CancerSEA41 (“Descriptive analysis of resources” Methods) and compared the resulting distributions across 16 CCC-dedicated resources (Supplementary Table 2).

The Receptor tyrosine kinase (RTK), JAK/STAT, TGF, WNT, and Notch pathways covered the largest proportions of interactions matched to SignaLink (Fig. 4A), with analogous results observed for receivers and transmitters (Supplementary Fig. S3). The interactions from Ramilowski, ConnectomeDB, CellTalkDB, LRdb, and iTALK showed highly similar patterns, explained by the high overlap of these resources, with all of them showing significant underrepresentation of the T cell receptor pathway (Fig. 4B). A more pronounced underrepresentation of the same pathway was observed in Guide to Pharmacology, ICELLNET, CellPhoneDB, CellCall, CellChatDB, HMPR, Baccin2019, EMBRACE, and Kirouac2010. On the contrary, the T-cell receptor pathway was significantly overrepresented in OmniPath and Cellinker. When we used NetPath instead of SignaLink to define the T-cell receptor pathway, we also observed underrepresentation in HMPR, CellCall, EMBRACE, and Kirouac2010 and overrepresentation in OmniPath (Supplementary Fig. S4A). Moreover, the SignaLink WNT pathway was underrepresented in Guide to Pharmacology, ICELLNET, CellPhoneDB, HMPR, and Kirouac2010, and on the contrary overrepresented in CellCall. We saw similar results when using NetPath’s WNT pathway (Supplementary Fig. S4A). We also observed uneven representations across the resources, in particular for the Hedgehog, Notch, and Innate Immune pathways (Fig. 4A; Supplementary Fig. S4A).

Fig. 4. Representation of functional categories in CCC resources.

Distributions of CCC resources in terms of number of interactions (A) and relative abundance (B) matched to the SignaLink database. Relative abundance of interactions categorised by (C) CancerSEA’s cancer-related gene sets, and (D) organ-enriched proteins from the Human Protein Atlas (HPA). Fisher’s exact test was used to estimate the differentially-represented categories. Differentially represented (absolute(log2(Odds ratio)) > 1) categories were marked according to FDR-corrected p-values ≤ 0.05 (diamond, ♢), 0.01 (triangle, △), and 0.001 (8-pointed asterisk; ❋). Source data are provided as a Source Data file.
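The flagging rule stated in this caption (absolute log2 odds ratio above 1 and an FDR-corrected p-value at or below a threshold) can be sketched with the standard library alone. The layout of the contingency table, the use of the Benjamini-Hochberg procedure for FDR correction, and the counts and category names are assumptions; the exact setup of the original analysis is described in the Methods and is not reproduced here.

```python
from math import comb, log2

def fisher_exact(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].
    Returns (odds_ratio, p_value)."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(x):
        # Hypergeometric probability of observing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # Sum over all tables at least as extreme (as unlikely) as the observed one.
    p_value = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))
    odds_ratio = (a * d) / (b * c) if b * c else float("inf")
    return odds_ratio, min(p_value, 1.0)

def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical counts per category:
# (in resource & in category, in resource & not in category,
#  in other resources & in category, in other resources & not in category)
counts = {"pathway_x": (5, 95, 60, 140), "pathway_y": (30, 70, 55, 145)}

results = {name: fisher_exact(*table) for name, table in counts.items()}
adjusted = benjamini_hochberg([results[name][1] for name in counts])
flagged = [
    name
    for name, p_adj in zip(counts, adjusted)
    if 0 < results[name][0] < float("inf")
    and abs(log2(results[name][0])) > 1
    and p_adj <= 0.05
]
```

Requiring both an effect-size cut-off and a corrected p-value prevents categories with many interactions from being flagged on the basis of statistically significant but small differences.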

We then matched interactions to cancer-related gene sets from CancerSEA41, which were also unevenly represented. For example, interactions from the CellPhoneDB resource were overrepresented in gene sets associated with inflammation, proliferation, and quiescence (Fig. 4C; Supplementary Fig. S5). Gene sets associated with epithelial-mesenchymal transition were underrepresented in CellPhoneDB, Guide to Pharmacology, CellCall, ICELLNET, and Kirouac2010. This observation was further supported by the underrepresentation of direct-contact signalling in the latter two resources (See Supplementary Note 1; Supplementary Fig. S6).

We also examined the coverage of tissue-enriched proteins and disease markers from the Human Protein Atlas42 and DisGeNet43, respectively. Organ-enriched proteins were largely uniformly distributed across the CCC resources, with some exceptions, such as organ-associated proteins from the Breast, Bone Marrow, Lymph Nodes, and the Hypothalamus (Fig. 4D; Supplementary Figs. S7, S8). Similarly, tissue-enriched proteins were generally distributed evenly across most CCC resources, with some exceptions including the underrepresentation of interactions associated with cardiomyocyte proteins in ICELLNET and Kirouac2010, as well as the overrepresentation of proteins associated with Glial cells in Guide to Pharmacology (Supplementary Figs. S9, S10).

Finally, no differentially represented disease markers were noted in any of the CCC resources (Supplementary Fig. S11).

In summary, our results indicated biases towards certain pathways, functional categories, and tissue-enriched proteins across the different CCC resources, implying that resource choice can influence the functional interpretation of CCC predictions.

Using LIANA to systematically compare CCC predictions

To estimate the relative agreement between CCC methods and the importance of the resources, we built LIANA—a framework to decouple tools from their inbuilt resources. LIANA enabled us to combine the 16 CCC resources detailed in the descriptive resource analysis above (Supplementary Table 2), with 7 CCC methods used to prioritise ligand-receptor interactions from scRNA-Seq data (Table 1). We then predicted the interactions from all possible method-resource combinations for 6 single-cell RNA datasets from three different subtypes of breast cancer44, cord blood mononucleated cells45, Pancreatic Islets46, and colorectal cancer47 (Methods “Data availability”).

We first looked at the overlap between the 1000 highest ranked interactions predicted for every method-resource combination. Whenever available, we used the recommended scoring functions (Supplementary Table 3), each tailored for predicting relevant interactions. We found consistently low overlap in the top predicted interactions when using either different methods or different resources (Fig. 5). The median pairwise Jaccard index when using different methods ranged from 0.045 to 0.112 across datasets (median = 0.080) (Fig. 5A). The overlap when using different resources was slightly higher, as the median pairwise Jaccard index ranged from 0.085 to 0.132 (median = 0.119) (Fig. 5B). We found similar results when considering the top 1% predicted interactions instead of the top 1000 (Supplementary Fig. S12; Supplementary Note 2). These analyses revealed substantial discrepancies in the highest-ranked predicted interactions by the different methods under study.
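The overlap statistic used in this comparison, the median pairwise Jaccard index between the top-ranked interactions of different methods run with the same resource, can be sketched as follows. The method names, scores, and the small `k` are placeholders chosen for illustration, whereas the analysis above uses the 1000 highest ranked interactions.

```python
from itertools import combinations
from statistics import median

def top_k(scores, k, higher_is_better=True):
    """Set of the k best-scoring interactions of one method."""
    ordered = sorted(scores, key=scores.get, reverse=higher_is_better)
    return set(ordered[:k])

def median_pairwise_jaccard(top_sets):
    """Median Jaccard index over all pairs of top-k sets."""
    values = [len(a & b) / len(a | b) for a, b in combinations(top_sets, 2)]
    return median(values)

# Hypothetical scores of three methods for the same six interactions.
method_a = {"i1": 6, "i2": 5, "i3": 4, "i4": 3, "i5": 2, "i6": 1}
method_b = {"i1": 1, "i2": 9, "i3": 8, "i4": 7, "i5": 2, "i6": 0}
method_c = {"i1": 0.5, "i2": 0.4, "i3": 0.3, "i4": 0.01, "i5": 0.02, "i6": 0.03}  # p-values

tops = [
    top_k(method_a, 3),
    top_k(method_b, 3),
    top_k(method_c, 3, higher_is_better=False),  # lower p-values rank higher
]
overlap = median_pairwise_jaccard(tops)
```

The same function applies unchanged to the complementary comparison, in which the method is fixed and the top-ranked sets come from different resources.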

Fig. 5. Overlap of predictions using any combination of CCC methods and resources.

Overlap (Jaccard index) in the 1000 highest ranked interactions (A) when using the same Resource with different Methods (Blue; n = 7) and (B) when using the same Method with different Resources (Red; n = 16). Boxplots represent the median pairwise Jaccard index with hinges showing the first and third quartiles and whiskers extending up to 1.5 times the interquartile range beyond the hinges. The dashed lines represent the median when using different resources (red) and methods (blue); the lines overlap for the CBMCs dataset. Source data are provided as a Source Data file.

These discrepancies reflect the diverse nature of the scoring systems used to prioritise interactions of interest, and in particular, the different approaches used to assign communicating cell cluster pair specificity to the interactions (marked with a dagger (†) in Table 1; used by all methods except SingleCellSignalR). The low overlap between the results of the different methods was also reflected by dissimilarities in the relative importances assigned to different cell types (See Supplementary Note 2).

On the other hand, the low overlap between the highest ranking interactions using different resources was largely expected due to the limited overlap between the CCC resources as described in “Resource Uniqueness and Overlap”.

Taken together, our results suggest that both the choice of method and the resource can have a considerable impact on the predicted interactions.

Robustness to noise in resources and data

We then analysed the sensitivity of the methods to the addition of noise in the data and resource (“Robustness analyses” Methods). We found that most methods were fairly robust to subsampling of the total number of cells (Supplementary Fig. S13A), while erroneous annotation of cell types had a stronger effect, highlighting the importance of preprocessing and proper cluster annotation (Supplementary Fig. S13B). The methods were also adequately robust to the selective replacement of interactions from the original canonical resources with spurious, putatively false interactions (“Robustness analyses” Methods), in which the highest ranked interactions for each method were preserved (Supplementary Fig. S13C). The non-selective replacement of interactions, meant to simulate a change of resource (Supplementary Fig. S13D), had a strong effect on all methods, reflecting the low overlap observed when using different resources in the overlap analysis above.

Overall, our analysis showed that all methods, especially CellChat, CellPhoneDB, and SingleCellSignalR, were fairly robust to noise in both the data and the resource.

Association between CCC predictions and cytokine expression signatures

Next, given the lack of a ground truth, we used other data modalities to indirectly evaluate the methods using OmniPath, the resource with the largest coverage.

First, we noted that all methods appropriately detect specifically-expressed receptor proteins across seven CITE-seq datasets (See Supplementary Note 3). Since protein levels of receptors do not necessarily imply activity, we evaluated the methods’ agreement with predicted cytokine activities using 43 cytokine expression signatures48 on two datasets coming from two subtypes of breast cancer44 (Methods “Agreement with cytokine signatures”). To show the association between CCC predictions and cytokine activities, we calculated the odds ratios between preferentially ranked interactions and positively enriched cytokines across a range of ranks. We found generally positive trends between cytokine activities and the most prioritised CCC interactions across all methods. The observed trends largely converged toward the random baseline as the number of considered interactions increased (Fig. 6A). Connectome, the Crosstalk scores, and NATMI showed a consistent trend across both datasets, while SingleCellSignalR, logFC Mean, CellChat, CellPhoneDB, and the consensus of the methods (Table 1) showed a negative signal or no signal for the higher ranks of the HER2+ dataset (Fig. 6A; Supplementary Fig. S14). Notably, a high agreement with cytokine activities was observed for CellChat and CellPhoneDB in the HER2+ dataset when considering all of their predictions subsequent to false-positive filtering (vertical line in Fig. 6A), highlighting the value of the false-positive control steps of these methods.

Fig. 6. Agreement of CCC predictions with other modalities.

Odds ratios of (A) active cytokines and (B) colocalized cell types among the highest ranked interaction predictions, across a ranked range between 100 and 10,000. Odds ratios representing the association of preferentially ranked CCC predictions and (A) cytokine activities and (B) spatial adjacencies were calculated using Fisher’s exact test. Asterisk (*): Consensus represents the aggregated ranks of all interactions predicted by all the methods. Dashed horizontal line is the baseline represented by an odds ratio of 1. The dashed vertical lines represent the truncated ranges of CellChat, CellPhoneDB, and LogFC Mean, arising from their relatively stricter preprocessing steps. Source data are provided as a Source Data file.
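The rank-dependent odds ratios plotted in this figure can be illustrated with a short sketch. It computes only the odds ratio of a 2x2 table at each rank cut-off and omits the accompanying test. The Haldane-Anscombe correction of 0.5, the toy ranking, and the set of "supported" interactions are assumptions introduced for illustration.

```python
def odds_ratio_at_rank(ranked, positives, n_top):
    """Odds ratio of being in `positives` for the top `n_top` interactions
    versus the remaining ones.

    ranked: list of interactions, best first
    positives: set of interactions supported by the other modality
    """
    top, rest = ranked[:n_top], ranked[n_top:]
    a = sum(1 for x in top if x in positives)   # top-ranked and supported
    b = len(top) - a                            # top-ranked, not supported
    c = sum(1 for x in rest if x in positives)  # lower-ranked and supported
    d = len(rest) - c                           # lower-ranked, not supported
    # Haldane-Anscombe correction (an assumption) avoids division by zero.
    return ((a + 0.5) * (d + 0.5)) / ((b + 0.5) * (c + 0.5))

# Toy ranking of ten interactions, four of which are supported, e.g. by an
# active cytokine or by a spatially adjacent cell-type pair.
ranked = [f"i{n}" for n in range(10)]
positives = {"i0", "i1", "i2", "i7"}

curve = {n_top: odds_ratio_at_rank(ranked, positives, n_top) for n_top in (3, 5, 8)}
```

An odds ratio above 1 at a given cut-off means that supported interactions are enriched among the top-ranked predictions. As the cut-off grows and includes more of the ranking, the ratio moves toward the baseline of 1.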

These results suggest that the interactions identified as relevant by all methods were largely concordant with cytokine activities, confirming the agreement of predicted CCC interactions with downstream signalling events.

Enrichment of predicted interactions between spatially adjacent cell types

Next, we leveraged spatial information as a way to support the methods’ predictive potential, under the assumption that, while many other factors are involved, colocalized cell populations are expected to have a higher chance to interact with each other than other non-adjacent cell types14,22,49,50. That is, the highest ranked interactions predicted between various cell populations are expected to be positively associated with pairs of adjacent cell types (Methods “Agreement with spatially adjacent cell types”).

We used the spatial mapping information from eight 10× Visium slides (see Methods), corresponding to murine brain cortex51 and triple negative breast cancer44 datasets, to identify the colocalized cell types in the tissues. We observed a positive association between the colocalisation of cell types in Visium and the prioritisation of CCC interactions in the scRNA datasets (Fig. 6B). This trend was particularly consistent for the well-structured murine brain cortex dataset, where all methods, except the Crosstalk scores, showed an association between cell type spatial adjacency and CCC predictions, with Connectome, LogFC Mean, and the consensus displaying the most positive associations. In the case of the triple negative breast cancer dataset, only the predictions by the consensus and LogFC Mean showed a consistent, positive association with spatial adjacency (Fig. 6B).

We conducted a similar analysis with seqFISH52 and merFISH53 datasets (“Agreement with spatially adjacent cell types” Methods). In this case, we made use of the single-cell resolution of these datasets both to identify the spatially adjacent cell types and to obtain the interaction predictions. For the seqFISH dataset, we found a clear association between the predicted CCC interactions and the spatial adjacency of their corresponding cell types for NATMI, and moderate associations for logFC Mean and Connectome, while the other methods showed inconsistent trends or lack of signal (Supplementary Fig. S15). There was no trend in the merFISH dataset, likely due to the lower gene space of that dataset (Supplementary Fig. S15).

In summary, our results showed a positive association of interactions predicted by most methods and spatially-adjacent cell types in the well-structured brain cortex, while the associations were less consistent in the breast cancer subtypes. This positive association suggests that, despite the dissociation of single-cells and their grouping into cell types, CCC predictions partly reflect the expression patterns encoded by tissue spatial context.

Discussion

The growing interest in CCC inference has led to the recent emergence of a number of methods and prior knowledge resources. To shed light on the impact of the choice of method and resource on the inference of CCC events, we built a framework to systematically combine and compare 16 resources and 7 methods, plus their consensus. We used this framework to explore in detail the content of the different resources, to compare the predictions on six different datasets when using all combinations of methods and resources, and to assess the agreement of the methods with other data modalities. Our results suggest that both the method and resource can considerably impact CCC inference predictions, and that most methods generally capture the biological signals from other data modalities.

Resource overlap and bias

Despite their largely common origins, different resources covered varying proportions of the collective prior knowledge. A large share of the observed overlap among resources was a result of the frequent inclusion of certain resources8,31,33,34, particularly Ramilowski et al.35.

When inspecting the relative compositions of the resources, we noted biases towards certain organ- or tissue-enriched proteins and functional terms. Some resources are predominantly manually-curated8,11,16,26,27,54, while others6,12,28,55 are composites which also import non-curated interactions. Thus, this suggests a quality-coverage trade-off, as is commonly the case for biological prior knowledge. Of note, the literature support reported by different authors for the same resources does not always agree23,26, suggesting different interpretations of what defines a curated interaction.

These findings highlight an inherent limitation of knowledge-based inference, as any prior knowledge resource has its own biases and only represents a limited proportion of biology. Taken together, the variable overlap between the resources, their uneven functional distributions, and the reported curation disagreements are a call for further large-scale curation efforts.

Impact of methods and resources

Our systematic analysis using different combinations of resources and methods revealed that both had a considerable effect on the predictions. In the case of the resources, the disagreements were largely expected as a consequence of their varying overlap. However, this was not necessarily the case for the methods, given their conceptually common aim, similar assumptions, and previously reported agreement among some of them10–12,14.

A major reason for the low overlap between the methods was their distinct approaches to identify the most relevant interactions. Hence, the common practice of using the number of interactions reported between two cell types as a proxy for their communication intensity is likely biased by the choice of CCC inference method. Reassuringly, our robustness analyses highlighted that the methods are fairly robust to cluster subsampling, as well as the introduction of noise to both the dataset and the resource. Collectively, these results indicate that while the methods are fairly robust to technical noise, the choice of method and resource is likely to have a major impact on the results. Therefore, downstream analyses and biological interpretation of the predicted ligand-receptor interactions should be considered with caution.

Agreement with other data modalities

Motivated by the observed discrepancies, we supported the methods’ performance using complementary data modalities. We found concordance of the CCC predictions with receptor protein specificity and with cytokine activities estimated from downstream gene-expression signatures48. Of note, the cytokine activities and receptor proteins, presented in this work as an evaluation, could also be used to improve the confidence in predictions56. Similarly, other analyses such as pathway14 or transcription factor activities15,57, as well as other types of cell-communication dedicated methods, including NicheNet19, CytoTalk22, and SoptSC20, could be utilised to provide further confidence in the predicted ligand-receptor interactions.

Furthermore, similarly to previous efforts, we used spatial information to support the methods’ predictions14,22. We saw that most methods prioritise interactions between colocalized cell types, and this was clearer in the well-structured brain cortex than in breast cancer tissue. These results suggest that the performance of the methods depends on the type of tissue, and that, if available, spatial information should be used to inform58,59 or constrain60 the predictions.

Our agreement analyses are based on assumptions that are only approximations of reality. The limitations include the restricted coverage of the cytokine activity signatures and receptor proteins, and the technical shortcomings of current spatial transcriptomics technologies. Furthermore, such benchmarks cannot distinguish simple co-expression from actual CCC events, and do not capture complex relationships between CCC events. Since a gold standard is currently not available and the biological ground truth is largely unknown2,25, our analyses cannot give a definitive answer of what method is best. However, we believe that these results are useful to indirectly support the methods’ predictive potential.

Overall, our results suggest that despite their relatively low agreement, the CCC methods are generally able to capture relevant biological signals, and that leveraging information from additional modalities and analyses could help to refine the predictions.

CCC inference assumptions and limitations

The shared purpose of the methods considered in this work is to predict the most relevant interactions, commonly between a secreted ligand and its receptor, each expressed by a particular cell type. All methods work under the assumption that the expression of a pair of genes at the cell type level is informative of CCC events. Some of the methods such as CellChat14, CellPhoneDB8, and others16,27,28, go a step further by considering heteromeric complexes. Ensuring that all subunits of a protein complex are expressed to consider a cell-cell interaction valid has been shown to reduce false positive predictions, and can thus impact significantly downstream interpretation and validation8,14. CellChat additionally accounts for interaction mediator proteins14. Another common assumption among the CCC methods is that cell-type-specific interactions are more informative than those shared by multiple cell types8,10,11,14. Yet by focusing on the cluster-specific interactions, the predictions may not capture biologically relevant processes that are common between multiple cell types12.

Gene expression provided by scRNA-Seq is typically limited to the cells within the dataset, and hence does not capture long-distance endocrine signalling events. In addition, CCC inference from scRNA-Seq data assumes that gene expression of a transmitter and a receiver is a good proxy for their joint activity, without considering any of the processes preceding transmitter-receiver interactions, including protein translation and processing, secretion, and diffusion2. Furthermore, gene expression is a proxy of protein levels alone, yet recent efforts attempt to capture signalling events mediated by other molecules such as neuro-transmitters15,16. Finally, current methods are limited to single species although some information about interspecies communication can be inferred61,62.

Conclusions

Considerable efforts have been made to develop CCC inference tools and resources, and we expect that further advancements will be key for the systems-level analysis of single-cell data. The popularity of CCC inference is anticipated to increase as spatial transcriptomics1 and single-cell proteomics63 continue their rapid development. We regard the results presented here as steps towards an understanding of the strengths and weaknesses of CCC methods, and LIANA as a framework for their further analysis, benchmark, use and development.

Methods

Processing of CCC resources

The connections between resources shown in the dependency plot were manually gathered from the publications and the web pages of each CCC resource.

OmniPath is a comprehensive knowledge database with more than 100 intracellular and intercellular resources3. The OmniPath intercellular component is a composite resource which contains interactions from all of the CCC dedicated resources compared here, along with some additional resources3. All the CCC resources used in the analyses presented in this work were queried from OmniPath3, with the exception of CellCall which was processed to OmniPath format separately. The contents of the resources are identical to their original formats, apart from minor processing differences (Supplementary Table 2), such as removal of duplicates, updating to the latest gene symbols, or removal of genes lacking reviewed Uniprot IDs. All complex-containing resources were dissociated into individual subunits for the resource-focused analyses presented in this work.

OmniPath’s version used in this work was filtered according to the following criteria: (i) we only retained interactions with literature references, (ii) we kept interactions only where the receiver protein was plasma membrane transmembrane or peripheral according to the 51st consensus percentile of the localisation annotations, and (iii) we only considered interactions between single proteins (interactions between complexes are also available in OmniPath). Tutorials on how to customise OmniPath as well as how to make use of the intracellular functional information available at OmniPath are available at https://saezlab.github.io/liana/. OmniPath’s intra- and intercellular components were both obtained and are both available via the OmnipathR package (https://github.com/saezlab/OmnipathR).

Descriptive analysis of resources

We defined unique and shared interactions, receivers and transmitters between the CCC resources if they could be found in only one or at least two of the resources, respectively.
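The definition above can be illustrated with a minimal Python sketch. It is not the code used in this work: it assumes that each resource is represented as a set of (transmitter, receiver) pairs, and the resource names, pairs, and function name are hypothetical placeholders.

```python
from collections import Counter


def split_unique_shared(resources):
    """Split elements into unique (found in exactly one resource)
    and shared (found in at least two resources)."""
    counts = Counter(item for items in resources.values() for item in set(items))
    unique = {item for item, n in counts.items() if n == 1}
    shared = {item for item, n in counts.items() if n >= 2}
    return unique, shared


# Toy resources with hypothetical contents, for illustration only
toy_resources = {
    "resource_A": {("L1", "R1"), ("L2", "R2")},
    "resource_B": {("L1", "R1"), ("L3", "R3")},
    "resource_C": {("L1", "R1"), ("L3", "R3"), ("L4", "R4")},
}
unique, shared = split_unique_shared(toy_resources)
```

The same function applies unchanged to sets of receivers or transmitters instead of interaction pairs.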

To identify uneven distributions of transmitters, receivers, and interactions toward biological terms or protein localisations, we used Fisher’s exact test to compare each individual resource to the collection of all the resources. The test p-values were FDR corrected. We performed the analysis using the aforementioned functional annotation databases in three distinct categories. For the overrepresentation of interactions, we considered annotations when both the transmitter and receiver were matched to the same category, while the enrichments of annotations matched to transmitters and to receivers were examined independently. We allowed the same protein or interaction to be matched to multiple pathways or functional categories from the same database. Interactions, receivers, and transmitters were independently matched to the 10 pathways from SignaLink38, and the 15 largest categories from CancerSEA41, and NetPath40. The same procedure was also applied to organ- and tissue-enriched proteins from the Human Protein Atlas42, accessible at https://proteinatlas.org, and disease-associated genes from DisGeNet43. Pathology-associated, uncertain, and unsupported proteins with a low/non-representative level of expression were excluded from the Human Protein Atlas database, while DisGeNet gene-disease associations were filtered to include only literature-supported associations (GDA Score ≥ 0.3). Each of the aforementioned general functional annotation databases was obtained via OmniPath and their protein complexes, if present, were also dissociated.
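The enrichment procedure can be sketched in a few lines of self-contained Python. Several details are assumptions made for illustration, since they are not specified above: the test is taken to be two-sided, the 2×2 table is taken to contrast one resource with the collection of all resources (annotated versus not annotated with a category), and the FDR correction is taken to be Benjamini-Hochberg. Function names are illustrative.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]].

    Assumed layout: a/b = annotated/not annotated in the resource,
    c/d = annotated/not annotated in the collection of all resources.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of observing x in the top-left cell
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum over all tables that are at most as probable as the observed one
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, p)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


p_example = fisher_exact_two_sided(3, 1, 1, 3)
adj_example = benjamini_hochberg([0.01, 0.04, 0.03, 0.5])
```

In practice one p-value would be computed per resource and category, and the correction applied across all of them.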

We also obtained protein localisations from OmniPath which collects this information from 20 databases3. Then we kept consensus protein localisations above the 51st percentile. We classified CCC interactions using the localisation combinations of proteins involved in the interactions, which included secreted, plasma membrane peripheral and transmembrane proteins.

Input specifics

For the method-resource comparisons and evaluations, we used Seurat46,64 objects which were converted to the appropriate data format when calling each method. Whenever available, we used the recommended conversion method or wrapper for each method. Log-transformed counts were used when this was not done internally by the method.

The complex-containing interactions, if present in a given resource, were dissociated for the methods which do not take complexes into account, namely the original implementations of NATMI, SingleCellSignalR, and Connectome.

Method specifics

CellChat

CellChat was run using its default settings with 1000 permutations and the gene expression diffusion-based smoothing process was omitted.

CellPhoneDBv2

CellPhoneDB’s algorithm8 was re-implemented in LIANA and used throughout this manuscript with 1000 permutations. Identical to the original implementation, cluster labels were reshuffled and a one-sided empirical p-value was calculated for the interactions with a mean expression higher than random. Only interactions whose transmitter and receiver genes were expressed in at least 10% of the cells were considered, and the subunit with the minimum expression was used for complexes.
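The label-permutation idea can be illustrated with the following Python sketch for a single ligand-receptor pair and a single source-target cell type pair. It is a simplified illustration rather than LIANA's or CellPhoneDB's implementation: the statistic is assumed to be the mean of the ligand average in the source cluster and the receptor average in the target cluster, complexes and the 10% expression filter are omitted, and all names are hypothetical.

```python
import random


def permutation_pvalue(ligand_expr, receptor_expr, labels, source, target,
                       n_perm=1000, seed=0):
    """One-sided empirical p-value that the ligand-receptor mean exceeds random.

    ligand_expr, receptor_expr: per-cell expression values, in the same
    order as labels (the per-cell cluster labels).
    """
    def lr_mean(current_labels):
        lig = [x for x, lab in zip(ligand_expr, current_labels) if lab == source]
        rec = [x for x, lab in zip(receptor_expr, current_labels) if lab == target]
        return (sum(lig) / len(lig) + sum(rec) / len(rec)) / 2

    observed = lr_mean(labels)
    rng = random.Random(seed)
    shuffled = list(labels)
    n_as_high = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # reshuffle cluster labels across cells
        if lr_mean(shuffled) >= observed:
            n_as_high += 1
    return observed, n_as_high / n_perm


# Toy data: ligand specific to cluster A, receptor specific to cluster B
toy_labels = ["A"] * 10 + ["B"] * 10
toy_ligand = [5.0] * 10 + [0.0] * 10
toy_receptor = [0.0] * 10 + [5.0] * 10
obs_specific, p_specific = permutation_pvalue(toy_ligand, toy_receptor, toy_labels, "A", "B")

# Uniform expression: every permutation matches the observed mean
obs_flat, p_flat = permutation_pvalue([1.0] * 20, [1.0] * 20, toy_labels, "A", "B")
```

Because shuffling preserves the number of cells per label, each permuted cluster has the same size as the original one.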

Connectome

Connectome was run with its default settings and filtered for differentially expressed genes (p-value ≤ 0.05), as identified via a Wilcoxon test.

logFC Mean

The logFC Mean score implemented in LIANA was inspired by iTALK6, and it represents the average of one-versus-the-rest log2FC expression changes for the transmitter and receiver cell types. The logFC Mean score uses LIANA’s default filtering settings, namely both the transmitter and receiver genes of any interaction evaluated must be expressed in at least 10% of the cells, and it considers the subunit with the minimum expression for complex-containing interactions.
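A minimal Python sketch of this score for one interaction and one cell type pair is given below. How the log2FC is computed is an assumption made here for illustration (difference of log2 mean expression with a pseudocount of 1, target cell type versus all remaining cells); the filtering and complex handling described above are omitted, and the function names are hypothetical.

```python
from math import log2


def one_vs_rest_log2fc(expr_by_cell_type, cell_type, pseudocount=1.0):
    """log2 fold change of one cell type versus all other cells for one gene.

    expr_by_cell_type: dict mapping cell type -> list of per-cell values.
    The pseudocount is an assumption to avoid taking the log of zero.
    """
    target = expr_by_cell_type[cell_type]
    rest = [x for ct, values in expr_by_cell_type.items()
            if ct != cell_type for x in values]
    mean_target = sum(target) / len(target)
    mean_rest = sum(rest) / len(rest)
    return log2(mean_target + pseudocount) - log2(mean_rest + pseudocount)


def logfc_mean(ligand_expr, receptor_expr, source, target):
    """Average of the ligand log2FC in the source and the receptor log2FC in the target."""
    return (one_vs_rest_log2fc(ligand_expr, source)
            + one_vs_rest_log2fc(receptor_expr, target)) / 2


# Toy expression values (hypothetical)
toy_ligand = {"A": [3, 3], "B": [1, 1], "C": [1, 1]}
toy_receptor = {"A": [0, 0], "B": [7, 7], "C": [0, 0]}
toy_score = logfc_mean(toy_ligand, toy_receptor, source="A", target="B")
```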

SingleCellSignalR

SingleCellSignalR was run with the processed gene counts, considering differentially expressed genes with a log2 fold change threshold of 1.5 or above, and we filtered LRscores ≥ 0.5 for the evaluations. The “int.type” parameter was set to “autocrine”. We noted that this option returned both paracrine and autocrine signalling interactions. The source code of SingleCellSignalR was modified to work with external resources (available at https://github.com/saezlab/SingleCellSignalR_v1).

NATMI

NATMI’s implementation is command-line based, thus a system command is invoked via R that calls the NATMI python module and passes the appropriate command line arguments. NATMI was run with its default settings using the processed gene expression matrix, converted from Seurat. The source code of NATMI was modified to be path-agnostic and to work with integers as cluster names (available at https://github.com/saezlab/NATMI).

Crosstalk scores

Crosstalk scores, inspired by CytoTalk22, were implemented in LIANA. CytoTalk’s crosstalk scores are composed of two metrics: the preferential expression measures (PEMs) and the non-self talk scores (NSTs). The first one reflects the specific expression for quantified genes across all the cell types. The latter is defined on the basis of information theoretic measures and quantifies the mutual information (Shannon entropy) for a pair of genes (ligand and receptor) within the same cell type, and is thus designed to penalise autocrine signalling. Once NST and PEM are calculated, the crosstalk score is calculated for each ligand-receptor pair and for each cell type pair as the product of the minmax normalised PEM and NST values. To enable the comparison to the rest of the methods, and in contrast to the crosstalk scores implemented in CytoTalk, we calculated the crosstalk scores by cell type pairs and used the inverse of the non-self-talk scores for autocrine signalling interactions. Moreover, our implementation considers complexes, and interactions with transmitters or receivers with preferential expression measures of 0 are also assigned 0.
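The final combination step can be sketched in Python as follows. The sketch assumes that PEM and NST values have already been computed per interaction and cell type pair (their calculation is not shown), and it only illustrates the min-max normalisation, the product, and the rule that a PEM of 0 yields a score of 0. Function names and input values are hypothetical.

```python
def minmax(values):
    """Scale values to the [0, 1] range; constant inputs are mapped to 0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def crosstalk_scores(pem, nst):
    """Product of min-max normalised PEM and NST values.

    pem, nst: equally long lists, one entry per ligand-receptor and
    cell type pair combination. Entries with a PEM of 0 are assigned 0.
    """
    pem_norm, nst_norm = minmax(pem), minmax(nst)
    return [0.0 if p == 0 else pn * nn
            for p, pn, nn in zip(pem, pem_norm, nst_norm)]


toy_scores = crosstalk_scores(pem=[0.0, 1.0, 2.0], nst=[0.5, 1.0, 1.5])
```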

Robust-rank aggregate

A consensus rank is generated across all methods using Robust Rank Aggregation65. These aggregated ranks can in turn be interpreted as a probabilistic distribution for interactions that are preferentially highly-ranked. The aggregate ranks are built across the universe of all interaction predictions, after independent filtering by each method. By default, missing interactions are imputed as the max ranks.
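To make the aggregation step more concrete, the sketch below implements a rank-aggregation score in the spirit of Robust Rank Aggregation in plain Python. It is an illustration under stated assumptions, not the implementation used in this work: ranks are normalised by the size of the universe of all predicted interactions, missing interactions are imputed with the maximum rank, the score of an interaction is the smallest order-statistic probability under uniformly distributed ranks, and a Bonferroni-style correction by the number of methods is applied. All names are hypothetical.

```python
from math import comb


def order_statistic_pvalue(r, k, n):
    """Probability that the k-th smallest of n uniform(0, 1) values is <= r."""
    return sum(comb(n, j) * r ** j * (1 - r) ** (n - j) for j in range(k, n + 1))


def aggregate_ranks(rankings):
    """rankings: dict mapping method -> list of interactions, best first.

    Returns a dict mapping interaction -> aggregate score (lower is better).
    """
    universe = set().union(*rankings.values())
    n_items, n_methods = len(universe), len(rankings)
    scores = {}
    for item in universe:
        normalised = []
        for ranked in rankings.values():
            # Interactions missing from a method are imputed as the max rank
            rank = ranked.index(item) + 1 if item in ranked else n_items
            normalised.append(rank / n_items)
        normalised.sort()
        rho = min(order_statistic_pvalue(r, k, n_methods)
                  for k, r in enumerate(normalised, start=1))
        scores[item] = min(1.0, rho * n_methods)
    return scores


# Toy rankings from three hypothetical methods
toy_rankings = {
    "method_1": ["A", "B", "C", "D"],
    "method_2": ["A", "C", "B", "D"],
    "method_3": ["A", "B", "D"],
}
toy_aggregate = aggregate_ranks(toy_rankings)
```

Interactions that are consistently ranked near the top by all methods, such as the hypothetical interaction "A" here, receive the lowest aggregate scores.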

Overlap analysis

To compare the overlap between the interactions predicted by each method-resource combination, we kept the 1,000 highest ranked interactions by default, including ties. We also considered the highest ranked 1% of interactions for each method, including ties. We then generated a presence-absence matrix of predicted interactions with method-resource combinations. These matrices were subsequently used to calculate the reported Jaccard indices.
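The overlap calculation can be sketched in Python as follows, assuming that each method-resource combination provides a score per interaction where higher is better (for methods where lower is better, the scores would need to be negated first). The function names and toy values are hypothetical.

```python
def top_ranked(scores, n=1000):
    """Return the n highest-scoring interactions, keeping ties at the cutoff.

    scores: dict mapping interaction -> score (higher is better).
    """
    if len(scores) <= n:
        return set(scores)
    cutoff = sorted(scores.values(), reverse=True)[n - 1]
    return {key for key, value in scores.items() if value >= cutoff}


def jaccard(a, b):
    """Jaccard index between two sets of interactions."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Toy predictions from two hypothetical method-resource combinations
combo_1 = {"i1": 0.9, "i2": 0.8, "i3": 0.8, "i4": 0.1}
combo_2 = {"i1": 5, "i4": 4, "i2": 1}
top_1 = top_ranked(combo_1, n=2)
top_2 = top_ranked(combo_2, n=2)
toy_jaccard = jaccard(top_1, top_2)
```

Because ties at the cutoff are kept, a selection can contain more than n interactions, as for the first combination in this toy example.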

Unless explicitly mentioned, and if available, we used the scoring functions for each method recommended for single-condition interaction predictions (Supplementary Table 3).

Frequencies of interactions per cell type were calculated using the highest ranked hits for each method-resource combination. These frequencies represent the proportion of top predicted interactions (or edges) that stem from or lead to a source or target cell type, respectively. In other words, interaction frequencies represent the relative number of interactions per cell type within the highest ranked 1000 interactions.

The relative interaction strength by cell type was calculated using the regularised scores from each method, i.e. all scoring functions were scaled between 0 and 1. Then the mean regularised score per cell type, categorised as source or target, was divided by the average score of all interactions predicted.
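Both summaries can be illustrated with a short Python sketch. It assumes that predictions are given as (source, target) cell type pairs with a score where higher means stronger, and that the scores are not all identical; the function names and toy values are hypothetical.

```python
from collections import Counter


def interaction_frequencies(pairs, role=0):
    """Proportion of top-ranked interactions per cell type.

    pairs: list of (source, target) cell type pairs, one per top interaction.
    role: 0 to summarise by source, 1 to summarise by target.
    """
    counts = Counter(pair[role] for pair in pairs)
    return {cell_type: n / len(pairs) for cell_type, n in counts.items()}


def relative_strength(scored, role=0):
    """Mean 0-1 scaled score per cell type, divided by the overall mean.

    scored: list of ((source, target), score) entries.
    """
    values = [score for _, score in scored]
    lo, hi = min(values), max(values)
    scaled = [(pair, (score - lo) / (hi - lo)) for pair, score in scored]
    overall = sum(score for _, score in scaled) / len(scaled)
    by_type = {}
    for pair, score in scaled:
        by_type.setdefault(pair[role], []).append(score)
    return {ct: (sum(v) / len(v)) / overall for ct, v in by_type.items()}


# Toy predictions (hypothetical)
toy_scored = [(("A", "B"), 0.0), (("A", "C"), 2.0), (("B", "A"), 4.0)]
toy_freq = interaction_frequencies([pair for pair, _ in toy_scored])
toy_strength = relative_strength(toy_scored)
```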

Agreement with other modalities and robustness

All of the comparisons with other modalities were performed using the OmniPath CCC resource. For murine datasets, we converted the OmniPath resource to murine gene symbols using the biomaRt package66.

For the binary categorisations used in the agreement with cytokine activity analysis and spatial adjacencies, we performed Fisher’s exact test, sequentially in rank intervals ranging from 100 to 10,000, to obtain the Odds ratios of the positive and negative classes against a background universe. In the case of the spatial adjacency analysis, the background universe contained all predicted interactions, while for the cytokine activities, we only considered those matched to cytokines from CytoSig48.
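The rank-interval evaluation can be sketched as follows in Python. The construction of the 2×2 table is an assumption made for illustration: interactions within the top-ranked interval are contrasted with the remaining interactions of the background universe, and only the sample odds ratio is computed (the accompanying test is omitted). Function names and toy labels are hypothetical.

```python
def odds_ratio_at_cutoff(is_positive, cutoff):
    """Sample odds ratio of the positive class within the top `cutoff` ranks.

    is_positive: list of booleans ordered from best- to worst-ranked
    interaction (True = positive class, e.g. a colocalised cell type pair).
    """
    top, rest = is_positive[:cutoff], is_positive[cutoff:]
    a, c = sum(top), sum(rest)
    b, d = len(top) - a, len(rest) - c
    if b * c == 0:
        return None  # the odds ratio is undefined for this table
    return (a * d) / (b * c)


def odds_ratios_by_rank(is_positive, cutoffs):
    """Odds ratios evaluated sequentially over a series of rank cutoffs."""
    return {n: odds_ratio_at_cutoff(is_positive, n) for n in cutoffs}


# Toy ranking with positives concentrated at the top
toy_ranked = [True, True, True, False, True, False, False, False, False, False]
toy_odds = odds_ratios_by_rank(toy_ranked, cutoffs=[2, 4])
```

An odds ratio above 1 indicates that the positive class is enriched among the highest-ranked interactions.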

Agreement with cytokine signatures

CytoSig provides a collection of consensus, data-driven, cytokine-activity signatures compiled using a compendium of transcriptomic profiles48. We used CytoSig’s 43 high-quality signatures to infer which cytokines induce signalling activities in each cell type. We then used this information to assess if a cytokine-receptor interaction reported by the different CCC methods was supported by the corresponding cytokine downstream signalling activities.

We computed the cytokine activity scores for all cell types with the multivariate linear regression model (‘mlm’) method of decoupleR at the pseudobulk level. We chose the mlm method as an approach that models the effect of multiple cytokines and that performed best in a recent footprint-focused analysis benchmark67.

To build the pseudobulk profiles, we log2-transformed the summed counts within each cell type, and kept only genes which were expressed in at least 10% of the cells and with a summed raw count above 5.
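A minimal Python sketch of this step for a single cell type is shown below. Two details are assumptions made for illustration: the 10% threshold is applied to the cells of the given cell type, and a pseudocount of 1 is added before the log2 transformation. The function name and toy counts are hypothetical.

```python
from math import log2


def pseudobulk_profile(counts, min_fraction=0.1, min_count=5):
    """Build a log2-transformed pseudobulk profile for one cell type.

    counts: dict mapping gene -> list of raw counts, one per cell.
    Genes are kept if expressed in at least min_fraction of the cells
    and if their summed raw count is above min_count.
    """
    profile = {}
    for gene, values in counts.items():
        fraction_expressing = sum(1 for v in values if v > 0) / len(values)
        total = sum(values)
        if fraction_expressing >= min_fraction and total > min_count:
            profile[gene] = log2(total + 1)  # pseudocount of 1 is an assumption
    return profile


# Toy counts for ten cells (hypothetical)
toy_counts = {
    "g1": [0] * 9 + [15],      # expressed in 10% of cells, summed count 15
    "g2": [1, 1, 1] + [0] * 7,  # summed count too low
    "g3": [0] * 10,             # not expressed
}
toy_profile = pseudobulk_profile(toy_counts)
```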

In this evaluation, we used both the autocrine and paracrine CCC predictions, calculated using expression counts at the cell-type level for all cell types, from the HER2+ and triple negative breast cancer subtype datasets44. We considered any cytokine signature with a positive score and FDR-corrected p-value ≤ 0.05 in the target cell types as an active cytokine. We considered all CCC predictions with a ligand corresponding to a CytoSig signature, including the same ligand to multiple receptors, matched to any of the aliases of the cytokines. Odds ratios were then calculated as the ratio between any CCC prediction with corresponding active cytokine in a given receiver cell type, and those assigned to the negative class, i.e. the remainder of the cytokine signatures.

Agreement with spatially adjacent cell types

We used the SPOTlight68 deconvolution method with default parameters to spatially map the cell types present in our scRNAseq datasets into their corresponding 10× Visium slides. SPOTlight provides cell type proportions per spot that were subsequently used to identify colocalized cell types by computing Pearson’s correlation. The Pearson coefficients were scaled to create a distribution of correlations, and we only considered the most strongly correlated cell type densities (z-score ≥ 1.645) as colocalized, while the remainder of the cell pairs were considered as non-colocalised.
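The colocalisation call can be sketched in Python as follows, starting from per-spot cell type proportions such as those produced by a deconvolution method. The sketch assumes that correlations are z-scored across all cell type pairs using the sample standard deviation; the function names and toy values are hypothetical.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, stdev


def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def pairwise_correlations(proportions):
    """proportions: dict mapping cell type -> list of per-spot proportions."""
    return {(a, b): pearson(proportions[a], proportions[b])
            for a, b in combinations(sorted(proportions), 2)}


def colocalised_pairs(correlations, z_cutoff=1.645):
    """Cell type pairs whose z-scored correlation reaches the cutoff."""
    mu, sd = mean(correlations.values()), stdev(correlations.values())
    return {pair for pair, r in correlations.items() if (r - mu) / sd >= z_cutoff}


# Toy per-spot proportions (hypothetical)
toy_cors = pairwise_correlations({
    "A": [0.1, 0.2, 0.3],
    "B": [0.2, 0.4, 0.6],
    "C": [0.3, 0.2, 0.1],
})

# Toy correlations for six cell type pairs, one of them clearly elevated
toy_pairs = {("A", "B"): 0.9, ("A", "C"): 0.0, ("A", "D"): 0.0,
             ("B", "C"): 0.0, ("B", "D"): 0.0, ("C", "D"): 0.0}
toy_colocalised = colocalised_pairs(toy_pairs)
```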

The mer- and seqFISH datasets were already annotated and provide single-cell spatial resolution, hence the same dataset was used to obtain CCC predictions and spatial information. To identify the enriched neighbouring cells for each cell type in the mer- and seqFISH datasets, we used Squidpy’s69 Neighbourhood Enrichment analysis with its default parameters. In accordance with the approach followed with the 10× Visium slides, we considered significantly colocalized cell type pairs with a normalised neighbourhood enrichment score ≥ 1.645 as spatially adjacent.

Agreement with receptor protein abundance

To identify specifically expressed receptors across clusters, we z-transformed receptor protein abundance across cell types. Receptors with an abundance z-score ≥ 1.645 were considered specifically abundant at the protein level. These receptors were then treated as the positive class, while all others were assigned to the negative class. AUROC and AUPRC metrics were calculated using yardstick70. For the AUPRC calculations, we downsampled the negative class 100 times to match the (lower) number of receptors assigned to the positive class. With balanced classes, the expected AUPRC of a random classifier is fixed at 0.5.
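The balanced downsampling can be illustrated with the Python sketch below. It is not the yardstick-based calculation used in this work: the area under the precision-recall curve is approximated here by average precision, ties between scores are not treated specially, and the function names and toy values are hypothetical.

```python
import random


def average_precision(scores, labels):
    """Average precision, used here as an estimate of the AUPRC.

    scores: prediction scores (higher = more confident);
    labels: booleans marking the positive class.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i]:
            hits += 1
            total += hits / rank  # precision at each recovered positive
    return total / hits


def balanced_auprc(scores, labels, n_iter=100, seed=0):
    """Mean average precision over repeated downsampling of the negatives."""
    rng = random.Random(seed)
    pos = [i for i, lab in enumerate(labels) if lab]
    neg = [i for i, lab in enumerate(labels) if not lab]
    results = []
    for _ in range(n_iter):
        keep = pos + rng.sample(neg, len(pos))  # as many negatives as positives
        results.append(average_precision([scores[i] for i in keep],
                                         [labels[i] for i in keep]))
    return sum(results) / len(results)


toy_ap = average_precision([0.9, 0.8, 0.7, 0.6], [True, False, True, False])
toy_balanced = balanced_auprc([0.9, 0.8, 0.1, 0.2, 0.3, 0.4],
                              [True, True, False, False, False, False])
```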

We allowed surface protein receptors to match multiple genes (e.g. T-cell receptors subunits), and vice versa. Gene aliases of proteins were obtained using the human and mouse gene databases from the org.Hs.eg.db71 and org.Mm.eg.db72 BioConductor packages. Proteins with non-standard names, or absent aliases in the aforementioned databases, were manually annotated using UniProt73 as a reference.

Robustness analyses

To evaluate sensitivity of the methods to noise, we performed four distinct robustness analyses. We simulated noise in the data by subsampling the number of cells per cluster and by reshuffling the cell type labels.

Additionally, to simulate the impact of false interactions in the resource, we randomly generated interactions from the 2000 most variable genes in the dataset and randomly replaced proportions of the resource with these putative false interactions. In one scenario, we selectively replaced interactions in the resource and preserved the highest ranked interactions, while in the other scenario we non-selectively swapped any of the interactions.

All four analyses were done in an iterative manner over a range of manipulations (0–40%). We treated the highest ranked 250 interactions from the non-modified resource/data as ground truth and repeated the randomisation process 5 times.
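The resource-manipulation scenarios can be sketched in Python as follows. The sketch is an illustration under stated assumptions: false interactions are random pairs of distinct genes drawn from a candidate list (standing in for the most variable genes), the selective scenario is represented by a set of protected interactions that are never replaced, and agreement with the ground truth is summarised as the fraction of ground-truth interactions recovered. All names are hypothetical.

```python
import random


def dilute_resource(resource, candidate_genes, proportion, protected=frozenset(), seed=0):
    """Replace a proportion of the resource with random false interactions.

    resource: list of (transmitter, receiver) pairs.
    protected: pairs that are never replaced (selective scenario);
    leave empty for the non-selective scenario.
    """
    rng = random.Random(seed)
    diluted = list(resource)
    existing = set(diluted)
    replaceable = [i for i, pair in enumerate(diluted) if pair not in protected]
    n_replace = min(round(proportion * len(diluted)), len(replaceable))
    for i in rng.sample(replaceable, n_replace):
        while True:
            fake = tuple(rng.sample(candidate_genes, 2))
            if fake not in existing:  # avoid duplicating an interaction
                break
        existing.add(fake)
        diluted[i] = fake
    return diluted


def fraction_recovered(ground_truth, predicted_top):
    """Fraction of ground-truth interactions found among the new top hits."""
    return len(set(ground_truth) & set(predicted_top)) / len(ground_truth)


# Toy resource and candidate genes (hypothetical)
toy_resource = [(f"L{i}", f"R{i}") for i in range(10)]
toy_genes = [f"G{i}" for i in range(50)]
toy_diluted = dilute_resource(toy_resource, toy_genes, proportion=0.4,
                              protected={("L0", "R0")})
```

In a full analysis, the CCC method would be rerun with each diluted resource, and the recovered fraction tracked across the range of manipulations and repetitions.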

Data processing

All 10× Genomics datasets, including all CITE-Seq and the 3k PBMC datasets, were processed using the standard Seurat pipeline. Namely, filtered gene expression count matrices were log-normalised, and if cell type annotations were not provided, the cells were clustered, following scaling, identification of variable features, and PCA dimensionality reduction, using Seurat’s64 (v4.0.3) default settings. For 10× Genomics CITE-Seq datasets we used a clustering resolution of 0.4 and the protein abundances were centred-log-ratio transformed. In the murine spleen-lymph CITE-Seq datasets74, duplicated and low-quality cells, as annotated by the original authors, were filtered out. In agreement with the other CITE-Seq datasets, gene counts were log-normalised, while protein abundances were centred-log-ratio transformed.

For the colorectal cancer dataset, we kept the original subtype labels, reformatted the names to work with each CCC method, and sparsified the counts into a Seurat64 object. The pre-processed and labelled Pancreatic islet46 and cord blood mononuclear cell45 datasets were log-normalised, and subsequently used for CCC inference. In the latter dataset, any murine and doublet/multiplet cells, as annotated by the authors, were excluded.

We used ComplexHeatmap75 to generate the heatmaps and ggplot276 for any of the other plots presented in this work.

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Acknowledgements

This work was supported in part by the European Union’s Horizon 2020 research and innovation program (860329 Marie-Curie ITN “STRATEGY-CKD”) and the German Federal Ministry of Education and Research (Bundesministerium für Bildung und Forschung, BMBF; Computational Life Sciences LaMarck grant no. 031L0181B), awarded to J.S.R. This work was in part funded by the clinical research unit CRU344 supported by the German Research Foundation (DFG) and the E:MED Consortia Fibromap funded by the German Ministry of Education and Science (BMBF) awarded to I.C. We express our gratitude to Erick Armingol, Pau Badia i Mompel, Hratch Baghdassarian, Luz Garcia-Alonso and Suoqin Jin for their helpful feedback and discussions and to Ece Kartal for the design of LIANA’s outline graphic. For the publication fee we acknowledge financial support by Deutsche Forschungsgemeinschaft within the funding programme “Open Access Publikationskosten” as well as by Heidelberg University.

Author contributions

J.S.R. conceived the project. D.D. set up the framework used in this manuscript, with the help of D.T., M.G.R., and J.S.N. D.D. performed the comparisons and evaluations presented in this work with the support of A.D., A.V., R.O.R.F., and J.S.R. D.T. set up the resource formatting infrastructure with the help of D.D. D.T., D.D., and C.B. created the resource analysis pipeline. P.L.B. performed the robustness analysis under the guidance of D.D. J.S.R. supervised the project with the help of A.V. and A.D. H.K., R.O.R.F., and B.S. performed preliminary and supplementary analyses that helped shape the work presented here. I.C. supervised J.S.N. A.D. and A.V. contributed equally to the manuscript. All authors contributed and revised the final version of the manuscript.

Peer review

Peer review information

Nature Communications thanks Qing Nie, Xiaohui Fan and the other anonymous reviewer(s) for their contribution to the peer review of this work. Peer review reports are available.

Funding

Open Access funding enabled and organized by Projekt DEAL.

Data availability

The processed and annotated Human Breast Cancer single-cell atlas44 is available via the GEO accession number: GSE176078. The filtered breast cancer 10× Visium slides from the same publication are available at https://zenodo.org/record/4739739. Processed seqFISH77 [https://content.cruk.cam.ac.uk/jmlab/SpatialMouseAtlas2020/] and merFISH53 (GEO accession number: GSE113576) datasets were obtained via the spatial single-cell analysis framework—Squidpy (v1.1.0)69 [https://squidpy.readthedocs.io/en/latest/api.html#module-squidpy.datasets].

Pancreatic islet46 (GEO accession numbers: GSE84133, GSE81076, GSE85241, GSE86469; ArrayExpress: E-MTAB-5061) and cord blood mononuclear cells45 (GEO accession number: GSE100866) scRNA-Seq datasets were obtained via SeuratData (https://github.com/satijalab/seurat-data).

Publicly available 5K PBMC, 5K PBMC NextGem, 10K PBMC, and 10K MALT CITE-Seq datasets were obtained from 10× Genomics (accessible under the list of datasets at https://tinyurl.com/10xCITEseq).

Processed and annotated murine spleen-lymph CITE-Seq datasets74 are available via the GEO accession number: GSE150599.

The processed single cell RNA-Seq data47 for 23 Korean colorectal cancer patients are available via the GEO accession number: GSE132465.

Spatial transcriptomics datasets (10× Visium slides) on sagittal adult mouse brain anterior and posterior slices were obtained from SeuratData, available at https://github.com/satijalab/seurat-data, under the dataset name of ‘stxBrain’, or publicly via the 10× Genomics website under Spatial Gene Expression v1 Chemistry datasets [https://tinyurl.com/10xVisiumDemonstration]. The single-cell data (Allen Brain Atlas51) used for the cell type mapping (deconvolution) was obtained as a Seurat object, accessible at https://www.dropbox.com/s/cuowvm4vrf65pvq/allen_cortex.rds?dl=1, and is alternatively available via accession number: GSE71585.

The 10× Genomics’ 3k PBMC dataset used in the robustness analysis is available at https://cf.10xgenomics.com/samples/cell/pbmc3k/pbmc3k_filtered_gene_bc_matrices.tar.gz. Source Data for all Supplementary Figures, along with preprocessed outputs, are available at: https://zenodo.org/record/6531218. Source data are provided with this paper.

Code availability

The LIANA framework is available at https://github.com/saezlab/liana, and the version used to generate the results presented here is available via Zenodo78. The scripts used to generate the results presented here can be accessed at https://github.com/saezlab/ligrec_decouple.

Competing interests

J.S.R. has received funding from GSK and Sanofi and fees from Travere Therapeutics. A.V. is currently employed by F. Hoffmann-La Roche Ltd. The remaining authors declare no competing interests.

Footnotes

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

These authors contributed equally: Alberto Valdeolivas, Aurélien Dugourd.

Supplementary information

The online version contains supplementary material available at 10.1038/s41467-022-30755-0.

References

  • 1.Chen X, Teichmann SA, Meyer KB. From Tissues to Cell Types and Back: Single-Cell Gene Expression Analysis of Tissue Architecture. Annu. Rev. Biomed. Data Sci. 2018;1:29–51. doi: 10.1146/annurev-biodatasci-080917-013452. [DOI] [Google Scholar]
  • 2.Armingol E, Officer A, Harismendy O, Lewis NE. Deciphering cell-cell interactions and communication from gene expression. Nat. Rev. Genet. 2021;22:71–88. doi: 10.1038/s41576-020-00292-x. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 3.Türei D, et al. Integrated intra- and intercellular signaling knowledge for multicellular omics analysis. Mol. Syst. Biol. 2021;17:e9923. doi: 10.15252/msb.20209923. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 4.Kumar, M. P. et al. Analysis of Single-Cell RNA-Seq Identifies Cell-Cell Communication Associated with Tumor Characteristics. Cell Rep.25, 1458–1468.e4 (2018). [DOI] [PMC free article] [PubMed]
  • 5.Cillo AR, et al. Immune Landscape of Viral- and Carcinogen-Driven Head and Neck Cancer. Immunity. 2020;52:183–199.e9. doi: 10.1016/j.immuni.2019.11.014. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 6.Wang, Y. et al. iTALK: an R Package to Characterize and Illustrate Intercellular Communication. BioRxiv10.1101/507871 (2019).
  • 7.Tyler SR, et al. PyMINEr Finds Gene and Autocrine-Paracrine Networks from Human Islet scRNA-Seq. Cell Rep. 2019;26:1951–1964.e8. doi: 10.1016/j.celrep.2019.01.063. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 8.Efremova M, Vento-Tormo M, Teichmann SA, Vento-Tormo R. CellPhoneDB: inferring cell-cell communication from combined expression of multi-subunit ligand-receptor complexes. Nat. Protoc. 2020;15:1484–1506. doi: 10.1038/s41596-020-0292-x. [DOI] [PubMed] [Google Scholar]
  • 9.Tsuyuzaki, K., Ishii, M. & Nikaido, I. Uncovering hypergraphs of cell-cell interaction from single cell RNA-sequencing data. BioRxiv10.1101/566182 (2019).
  • 10.Raredon, M. S. B. et al. Connectome: computation and visualization of cell-cell signaling topologies in single-cell systems data. BioRxiv10.1101/2021.01.21.427529 (2021). [DOI] [PMC free article] [PubMed]
  • 11.Hou R, Denisenko E, Ong HT, Ramilowski JA, Forrest ARR. Predicting cell-to-cell communication networks using NATMI. Nat. Commun. 2020;11:5011. doi: 10.1038/s41467-020-18873-z. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 12.Cabello-Aguilar S, et al. SingleCellSignalR: inference of intercellular networks from single-cell transcriptomics. Nucleic Acids Res. 2020;48:e55. doi: 10.1093/nar/gkaa183. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 13.Noël, F. et al. ICELLNET: a transcriptome-based framework to dissect intercellular communication. BioRxiv10.1101/2020.03.05.976878 (2020).
  • 14.Jin S, et al. Inference and analysis of cell-cell communication using CellChat. Nat. Commun. 2021;12:1088. doi: 10.1038/s41467-021-21246-9. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 15.Jakobsson, J. E. T., Spjuth, O. & Lagerström, M. C. scConnect: a method for exploratory analysis of cell-cell communication based on single cell RNA sequencing data. Bioinformatics10.1093/bioinformatics/btab245 (2021). [DOI] [PMC free article] [PubMed]
  • 16.Zhang, Y. et al. Cellinker: a platform of ligand-receptor interactions for intercellular communication analysis. Bioinformatics10.1093/bioinformatics/btab036 (2021). [DOI] [PMC free article] [PubMed]
  • 17.Lagger, C. et al. scAgeCom: a murine atlas of age-related changes in intercellular communication inferred with the package scDiffCom. BioRxiv10.1101/2021.08.13.456238 (2021).
  • 18.Choi H, et al. Transcriptome analysis of individual stromal cell populations identifies stroma-tumor crosstalk in mouse lung cancer model. Cell Rep. 2015;10:1187–1201. doi: 10.1016/j.celrep.2015.01.040. [DOI] [PubMed] [Google Scholar]
  • 19.Browaeys R, Saelens W, Saeys Y. NicheNet: modeling intercellular communication by linking ligands to target genes. Nat. Methods. 2020;17:159–162. doi: 10.1038/s41592-019-0667-5. [DOI] [PubMed] [Google Scholar]
  • 20.Wang S, Karikomi M, MacLean AL, Nie Q. Cell lineage and communication network inference via optimization for single-cell transcriptomics. Nucleic Acids Res. 2019;47:e66. doi: 10.1093/nar/gkz204. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 21.Cheng J, Zhang J, Wu Z, Sun X. Inferring microenvironmental regulation of gene expression from single-cell RNA sequencing data using scMLnet with an application to COVID-19. Brief. Bioinforma. 2021;22:988–1005. doi: 10.1093/bib/bbaa327. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 22.Hu, Y., Peng, T., Gao, L. & Tan, K. CytoTalk: De novo construction of signal transduction networks using single-cell transcriptomic data. Sci. Adv. 7, eabf1356 (2021). [DOI] [PMC free article] [PubMed]
  • 23.Zhang Y, et al. CellCall: integrating paired ligand-receptor and transcription factor activities for cell-cell communication. Nucleic Acids Res. 2021;49:8520–8534. doi: 10.1093/nar/gkab638. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 24.Mishra V, et al. Systematic elucidation of neuron-astrocyte interaction in models of amyotrophic lateral sclerosis using multi-modal integrated bioinformatics workflow. Nat. Commun. 2020;11:5579. doi: 10.1038/s41467-020-19177-y. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 25.Almet AA, Cang Z, Jin S, Nie Q. The landscape of cell-cell communication through single-cell transcriptomics. Curr. Opin. Syst. Biol. 2021;26:12–23. doi: 10.1016/j.coisb.2021.03.007. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 26.Shao, X. et al. CellTalkDB: a manually curated database of ligand-receptor interactions in humans and mice. Brief. Bioinformatics 22, bbaa269 (2021). [DOI] [PubMed]
  • 27.Noël F, et al. Dissection of intercellular communication using the transcriptome-based framework ICELLNET. Nat. Commun. 2021;12:1089. doi: 10.1038/s41467-021-21244-x. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 28.Baccin C, et al. Combined single-cell and spatial transcriptomics reveal the molecular, cellular and spatial bone marrow niche organization. Nat. Cell Biol. 2020;22:38–48. doi: 10.1038/s41556-019-0439-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 29.Kanehisa M, Goto S. KEGG: Kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 2000;28:27–30. doi: 10.1093/nar/28.1.27. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 30.Kanehisa M, Furumichi M, Sato Y, Ishiguro-Watanabe M, Tanabe M. KEGG: integrating viruses and cellular organisms. Nucleic Acids Res. 2021;49:D545–D551. doi: 10.1093/nar/gkaa970. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 31.Fabregat A, et al. The Reactome Pathway Knowledgebase. Nucleic Acids Res. 2018;46:D649–D655. doi: 10.1093/nar/gkx1132. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 32.Szklarczyk D, et al. The STRING database in 2017: quality-controlled protein-protein association networks, made broadly accessible. Nucleic Acids Res. 2017;45:D362–D368. doi: 10.1093/nar/gkw937. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 33.Harding SD, et al. The IUPHAR/BPS Guide to PHARMACOLOGY in 2018: updates and expansion to encompass the new guide to IMMUNOPHARMACOLOGY. Nucleic Acids Res. 2018;46:D1091–D1106. doi: 10.1093/nar/gkx1121. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 34.Ben-Shlomo I, Yu Hsu S, Rauch R, Kowalski HW, Hsueh AJW. Signaling receptome: a genomic and evolutionary perspective of plasma membrane receptors involved in signal transduction. Sci. STKE. 2003;2003:RE9. doi: 10.1126/stke.2003.187.re9. [DOI] [PubMed] [Google Scholar]
  • 35.Ramilowski JA, et al. A draft network of ligand-receptor-mediated multicellular signalling in human. Nat. Commun. 2015;6:7866. doi: 10.1038/ncomms8866. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 36.Licata L, et al. SIGNOR 2.0, the SIGnaling Network Open Resource 2.0: 2019 update. Nucleic Acids Res. 2020;48:D504–D510. doi: 10.1093/nar/gkz949. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 37.Winograd-Katz SE, Fässler R, Geiger B, Legate KR. The integrin adhesome: from genes and proteins to human disease. Nat. Rev. Mol. Cell Biol. 2014;15:273–288. doi: 10.1038/nrm3769. [DOI] [PubMed] [Google Scholar]
  • 38.Fazekas D, et al. SignaLink 2—a signaling pathway resource with multi-layered regulatory networks. BMC Syst. Biol. 2013;7:7. doi: 10.1186/1752-0509-7-7. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 39.Kirouac DC, et al. Dynamic interaction networks in a hierarchically organized tissue. Mol. Syst. Biol. 2010;6:417. doi: 10.1038/msb.2010.71. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 40.Kandasamy K, et al. NetPath: a public resource of curated signal transduction pathways. Genome Biol. 2010;11:R3. doi: 10.1186/gb-2010-11-1-r3. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 41.Yuan, H. et al. CancerSEA: a cancer single-cell state atlas. Nucleic Acids Res. 47, D900–D908 (2018). [DOI] [PMC free article] [PubMed]
  • 42.Uhlén M, et al. Proteomics. Tissue-based map of the human proteome. Science. 2015;347:1260419. doi: 10.1126/science.1260419. [DOI] [PubMed] [Google Scholar]
  • 43.Piñero J, et al. DisGeNET: a comprehensive platform integrating information on human disease-associated genes and variants. Nucleic Acids Res. 2017;45:D833–D839. doi: 10.1093/nar/gkw943. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 44.Wu, S. Z. et al. A single-cell and spatially resolved atlas of human breast cancers. Nat. Genet. 53, 1334–1347 (2021). [DOI] [PMC free article] [PubMed]
  • 45.Stoeckius M, et al. Simultaneous epitope and transcriptome measurement in single cells. Nat. Methods. 2017;14:865–868. doi: 10.1038/nmeth.4380. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 46.Butler A, Hoffman P, Smibert P, Papalexi E, Satija R. Integrating single-cell transcriptomic data across different conditions, technologies, and species. Nat. Biotechnol. 2018;36:411–420. doi: 10.1038/nbt.4096. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 47.Lee H-O, et al. Lineage-dependent gene expression programs influence the immune landscape of colorectal cancer. Nat. Genet. 2020;52:594–603. doi: 10.1038/s41588-020-0636-z. [DOI] [PubMed] [Google Scholar]
  • 48.Jiang P, et al. Systematic investigation of cytokine signaling activity at the tissue and single-cell levels. Nat. Methods. 2021;18:1181–1191. doi: 10.1038/s41592-021-01274-5. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 49.Armingol, E. et al. Inferring the spatial code of cell-cell interactions and communication across a whole animal body. BioRxiv. doi: 10.1101/2020.11.22.392217 (2020).
  • 50.Palla G, Fischer DS, Regev A, Theis FJ. Spatial components of molecular tissue biology. Nat. Biotechnol. 2022;40:308–318. doi: 10.1038/s41587-021-01182-1. [DOI] [PubMed] [Google Scholar]
  • 51.Tasic B, et al. Adult mouse cortical cell taxonomy revealed by single cell transcriptomics. Nat. Neurosci. 2016;19:335–346. doi: 10.1038/nn.4216. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 52.Lohoff, T. et al. Highly multiplexed spatially resolved gene expression profiling of mouse organogenesis. BioRxiv. doi: 10.1101/2020.11.20.391896 (2020).
  • 53.Moffitt, J. R. et al. Molecular, spatial, and functional single-cell profiling of the hypothalamic preoptic region. Science 362, eaau5324 (2018). [DOI] [PMC free article] [PubMed]
  • 54.Jin, S. et al. Inference and analysis of cell-cell communication using CellChat. BioRxiv. doi: 10.1101/2020.07.21.214387 (2020). [DOI] [PMC free article] [PubMed]
  • 55.Sheikh BN, et al. Systematic Identification of Cell-Cell Communication Networks in the Developing Brain. iScience. 2019;21:273–287. doi: 10.1016/j.isci.2019.10.026. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 56.Kim HJ, Lin Y, Geddes TA, Yang JYH, Yang P. CiteFuse enables multi-modal analysis of CITE-seq data. Bioinformatics. 2020;36:4137–4143. doi: 10.1093/bioinformatics/btaa282. [DOI] [PubMed] [Google Scholar]
  • 57.Jung, S., Singh, K. & del Sol, A. FunRes: resolving tissue-specific functional cell states based on a cell–cell communication network model. Brief. Bioinformatics 22, bbaa283 (2021). [DOI] [PMC free article] [PubMed]
  • 58.Fischer, D. S., Schaar, A. C. & Theis, F. J. Learning cell communication from spatial graphs of cells. BioRxiv. doi: 10.1101/2021.07.11.451750 (2021).
  • 59.Tanevski, J., Ramirez Flores, R. O., Gabor, A., Schapiro, D. & Saez-Rodriguez, J. Explainable multi-view framework for dissecting inter-cellular signaling from highly multiplexed spatial data. BioRxiv. doi: 10.1101/2020.05.08.084145 (2020).
  • 60.Garcia-Alonso L, et al. Mapping the temporal and spatial dynamics of the human endometrium in vivo and in vitro. Nat. Genet. 2021;53:1698–1711. doi: 10.1038/s41588-021-00972-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 61.Gul, L. et al. Extracellular vesicles produced by the human commensal gut bacterium Bacteroides thetaiotaomicron affect host immune pathways in a cell-type specific manner that are altered in inflammatory bowel disease. BioRxiv. doi: 10.1101/2021.03.20.436262 (2021). [DOI] [PMC free article] [PubMed]
  • 62.Westermann AJ, Vogel J. Cross-species RNA-seq for deciphering host-microbe interactions. Nat. Rev. Genet. 2021;22:361–378. doi: 10.1038/s41576-021-00326-y. [DOI] [PubMed] [Google Scholar]
  • 63.Mahdessian D, et al. Spatiotemporal dissection of the cell cycle with single-cell proteogenomics. Nature. 2021;590:649–654. doi: 10.1038/s41586-021-03232-9. [DOI] [PubMed] [Google Scholar]
  • 64.Satija R, Farrell JA, Gennert D, Schier AF, Regev A. Spatial reconstruction of single-cell gene expression data. Nat. Biotechnol. 2015;33:495–502. doi: 10.1038/nbt.3192. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 65.Kolde R, Laur S, Adler P, Vilo J. Robust rank aggregation for gene list integration and meta-analysis. Bioinformatics. 2012;28:573–580. doi: 10.1093/bioinformatics/btr709. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 66.Durinck S, Spellman PT, Birney E, Huber W. Mapping identifiers for the integration of genomic datasets with the R/Bioconductor package biomaRt. Nat. Protoc. 2009;4:1184–1191. doi: 10.1038/nprot.2009.97. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 67.Badia-i-Mompel, P. et al. decoupleR: Ensemble of computational methods to infer biological activities from omics data. BioRxiv. doi: 10.1101/2021.11.04.467271 (2021). [DOI] [PMC free article] [PubMed]
  • 68.Elosua-Bayes M, Nieto P, Mereu E, Gut I, Heyn H. SPOTlight: seeded NMF regression to deconvolute spatial transcriptomics spots with single-cell transcriptomes. Nucleic Acids Res. 2021;49:e50. doi: 10.1093/nar/gkab043. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 69.Palla, G. et al. Squidpy: a scalable framework for spatial single cell analysis. BioRxiv. doi: 10.1101/2021.02.19.431994 (2021).
  • 70.Kuhn, M. & Vaughan, D. yardstick: Tidy Characterizations of Model Performance. (CRAN, 2021).
  • 71.Carlson, M. org.Hs.eg.db: Genome wide annotation for Human. R package version 3.8.2. Bioconductor. doi: 10.18129/b9.bioc.org.hs.eg.db (2017).
  • 72.Carlson, M. org.Mm.eg.db. Bioconductor. doi: 10.18129/b9.bioc.org.mm.eg.db (2017).
  • 73.UniProt Consortium. UniProt: the universal protein knowledgebase in 2021. Nucleic Acids Res. 2021;49:D480–D489. doi: 10.1093/nar/gkaa1100. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 74.Gayoso A, et al. Joint probabilistic modeling of single-cell multi-omic data with totalVI. Nat. Methods. 2021;18:272–282. doi: 10.1038/s41592-020-01050-x. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 75.Gu Z, Eils R, Schlesner M. Complex heatmaps reveal patterns and correlations in multidimensional genomic data. Bioinformatics. 2016;32:2847–2849. doi: 10.1093/bioinformatics/btw313. [DOI] [PubMed] [Google Scholar]
  • 76.Wickham H. ggplot2. WIREs Comp. Stat. 2011;3:180–185. doi: 10.1002/wics.147. [DOI] [Google Scholar]
  • 77.Lohoff T, et al. Integration of spatial and single-cell transcriptomic data elucidates mouse organogenesis. Nat. Biotechnol. 2022;40:74–85. doi: 10.1038/s41587-021-01006-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 78.Dimitrov, D. saezlab/liana version 0.05 (Devel). Zenodo. doi: 10.5281/zenodo.6475164 (2022).

Associated Data

Data Availability Statement

The processed and annotated Human Breast Cancer single-cell atlas44 is available via the GEO accession number: GSE176078. The filtered breast cancer 10× Visium slides from the same publication are available at https://zenodo.org/record/4739739. Processed seqFISH77 [https://content.cruk.cam.ac.uk/jmlab/SpatialMouseAtlas2020/] and merFISH53 (GEO accession number: GSE113576) datasets were obtained via the spatial single-cell analysis framework—Squidpy (v1.1.0)69 [https://squidpy.readthedocs.io/en/latest/api.html#module-squidpy.datasets].

Pancreatic islet46 (GEO accession numbers: GSE84133, GSE81076, GSE85241, GSE86469; ArrayExpress: E-MTAB-5061) and cord blood mononuclear cells45 (GEO accession number: GSE100866) scRNA-Seq datasets were obtained via SeuratData (https://github.com/satijalab/seurat-data).

Publicly available 5K PBMC, 5K PBMC NextGem, 10K PBMC, and 10K MALT CITE-Seq datasets were obtained from 10× Genomics (accessible under the list of datasets at https://tinyurl.com/10xCITEseq).

Processed and annotated murine spleen-lymph CITE-Seq datasets74 are available via the GEO accession number: GSE150599.

The processed single cell RNA-Seq data47 for 23 Korean colorectal cancer patients are available via the GEO accession number: GSE132465.

Spatial transcriptomics datasets (10× Visium slides) on sagittal adult mouse brain anterior and posterior slices were obtained from SeuratData, available at https://github.com/satijalab/seurat-data under the dataset name ‘stxBrain’, or publicly via the 10× Genomics website under Spatial Gene Expression v1 Chemistry datasets [https://tinyurl.com/10xVisiumDemonstration]. The single-cell data (Allen Brain Atlas51) used for the cell type mapping (deconvolution) was obtained as a Seurat object, accessible at https://www.dropbox.com/s/cuowvm4vrf65pvq/allen_cortex.rds?dl=1, and is alternatively available via the GEO accession number: GSE71585.

The 10× Genomics’ 3k PBMC dataset used in the robustness analysis is available at https://cf.10xgenomics.com/samples/cell/pbmc3k/pbmc3k_filtered_gene_bc_matrices.tar.gz. Source Data for all Supplementary Figures, along with preprocessed outputs, are available at: https://zenodo.org/record/6531218. Source data are provided with this paper.

The LIANA framework is available at https://github.com/saezlab/liana, and the version used to generate the results presented here is available via Zenodo78. The scripts used to generate the results presented here can be accessed at https://github.com/saezlab/ligrec_decouple.
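
For readers who want to locate the GEO-hosted datasets listed above, the following is a minimal sketch that collects the accession numbers quoted in this statement and builds their GEO landing-page URLs. It is not part of the LIANA framework or the authors' scripts. The dataset labels, the names `GEO_ACCESSIONS`, `geo_url` and `list_geo_urls`, and the use of the standard GEO accession query address are assumptions made for illustration. The sketch only constructs URLs and does not download anything.

```python
import re

# GEO series accessions quoted in the Data Availability Statement.
# The dataset labels are informal descriptions chosen for this sketch.
GEO_ACCESSIONS = {
    "Human breast cancer single-cell atlas": ["GSE176078"],
    "merFISH dataset": ["GSE113576"],
    "Pancreatic islet scRNA-Seq": ["GSE84133", "GSE81076", "GSE85241", "GSE86469"],
    "Cord blood mononuclear cells": ["GSE100866"],
    "Murine spleen-lymph CITE-Seq": ["GSE150599"],
    "Colorectal cancer scRNA-Seq": ["GSE132465"],
    "Allen Brain Atlas single-cell data": ["GSE71585"],
}

# Assumed address pattern of the GEO accession viewer.
GEO_QUERY = "https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc={}"


def geo_url(accession):
    """Return the GEO landing-page URL for a series accession (GSE...)."""
    if not re.fullmatch(r"GSE\d+", accession):
        raise ValueError(f"not a GEO series accession: {accession!r}")
    return GEO_QUERY.format(accession)


def list_geo_urls(table=GEO_ACCESSIONS):
    """Map each dataset label to the URLs of its GEO accessions."""
    return {name: [geo_url(acc) for acc in accs] for name, accs in table.items()}


if __name__ == "__main__":
    for name, urls in list_geo_urls().items():
        for url in urls:
            print(f"{name}: {url}")
```

Non-GEO identifiers in the statement (for example the ArrayExpress accession E-MTAB-5061 and the Zenodo records) are deliberately left out of the table; `geo_url` raises a `ValueError` for them rather than guessing an address.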


Articles from Nature Communications are provided here courtesy of Nature Publishing Group