Briefings in Functional Genomics. 2022 Jul 12;21(5):339–356. doi: 10.1093/bfgp/elac019

A systematic evaluation of the computational tools for ligand-receptor-based cell–cell interaction inference

Saidi Wang, Hansi Zheng, James S Choi, Jae K Lee, Xiaoman Li, Haiyan Hu
PMCID: PMC9479691  PMID: 35822343

Abstract

Cell–cell interactions (CCIs) are essential for multicellular organisms to coordinate biological processes and functions. One classical type of CCI is the interaction between secreted ligands and cell surface receptors, i.e. ligand-receptor (LR) interactions. With the recent development of single-cell technologies, a large amount of single-cell ribonucleic acid (RNA) sequencing (scRNA-Seq) data has become widely available. This data availability motivated the single-cell-resolution study of CCIs, particularly LR-based CCIs. Dozens of computational methods and tools have been developed to predict CCIs by identifying LR-based CCIs. Many of these tools have been theoretically reviewed. However, few studies have examined the performance and running results of current LR-based CCI prediction tools on public scRNA-Seq datasets. In this work, to fill this gap, we tested and compared nine of the most recent computational tools for LR-based CCI prediction. We used 15 well-studied scRNA-Seq samples that correspond to approximately 100K single cells under different experimental conditions for testing and comparison. Besides briefly describing the methodology used in these nine tools, we summarized the similarities and differences of these tools in terms of both LR prediction and CCI inference between cell types. We provided insight into using these tools to make meaningful discoveries in understanding cell communications.

Keywords: ligand-receptor interaction, cell–cell interaction, single-cell RNA sequencing, computational prediction tools

Introduction

Cell–cell interactions (CCIs) are essential for multicellular organisms to develop tissue structure and regulate individual cell processes [1–4]. They also contribute to maintaining intercellular relationships and coordinating diverse biological processes, such as development, differentiation and inflammation [5–7]. A CCI occurs when one cell called the sender cell transmits information via signaling molecules to another cell called the receiver cell [8]. Various signaling molecules, such as ions, metabolites, integrins, receptors, junction proteins, structural proteins, ligands and secreted proteins of the extracellular matrix, are involved in CCIs [9, 10]. A typical signaling cascade begins with a single key event, such as ligand-receptor (LR) interactions, which trigger the activation of downstream signaling pathways. Finally, it affects the activities of transcription factors (TFs) and their target gene expression [11–13]. CCIs mediated by LR interactions have been the most common scenario for the computational study of CCIs in recent years [9].

With the recent advancement of single-cell technologies [14], a large amount of single-cell ribonucleic acid (RNA)-sequencing (scRNA-Seq) data has become publicly available [15–21]. scRNA-Seq has enabled new and potentially unexpected biological discoveries compared to traditional profiling methods. These include revealing complex and rare cell populations, uncovering regulatory relationships between genes and tracking the trajectories of distinct cell lineages in development [19, 20, 22]. Compared to the traditional assays that directly measure protein–protein interactions (PPI), the transcriptomic measurements at single-cell resolution greatly facilitate the computational study of CCIs [23]. With the LR interaction information, scRNA-Seq has demonstrated its effectiveness in exploring CCIs at single-cell resolution.

Many computational tools have been developed to identify CCIs through scRNA-Seq data integration under specific cellular and physiological conditions [24]. These CCI prediction tools, in general, follow a common pipeline, including cell-type classification, LR interaction inference, CCI network construction and CCI visualization. However, each tool has its specific emphasis and algorithmic details. Existing comparative studies of CCI tools mainly report their advantages and disadvantages based on theoretical analysis [24]. Hands-on assessments of the performance and effectiveness of the most recent CCI inference tools in real application scenarios are lacking. This work attempts to compare and illustrate applications of LR-based CCI inference tools to public scRNA-Seq datasets using nine recently developed tools [5, 8, 12, 24–29]. In the following, we first provide an overview of these tools. We next describe the LR resources used by different tools and the benchmark LR interaction database used for the later tool evaluation. We then introduce the three benchmark scRNA-Seq datasets involving transcriptional profiling of 95 145 single cells. Finally, we discuss the systematic evaluation of nine tools in LR interaction prediction and CCI network construction.

Computational tools for LR-based CCI inference

We compiled nine tools that can predict LR-based CCIs based on scRNA-Seq gene expression measurements, including CellPhoneDB, iTALK, Network Analysis Toolkit for Multicellular Interactions (NATMI), PyMINEr, NicheNet, SingleCellSignalR, CellChat, ICELLNET and scMLnet [5, 8, 12, 24–29]. Note that there are more than nine tools published on this subject. We finalized our list of nine tools for comparison because these tools were published in the last 4 years and can run smoothly without further debugging (Table 1). We also limit our comparison to tools not designed for spatial transcriptomics data analysis because spatial measurements of gene expression levels are not as readily accessible as regular scRNA-Seq data [30, 31]. All nine tools were developed using either R or Python. Almost all the tools, except PyMINEr, require a curated LR interaction database in addition to raw or normalized unique molecular identifier (UMI) counts as input. A UMI is a type of molecular barcode that provides error correction and increased accuracy during sequencing. Among these tools, CellPhoneDB, iTALK, NATMI, PyMINEr and scMLnet also need the cell-type annotation as input. The others perform cell-type annotation by embedding cell-clustering procedures, such as Seurat or K-means, in their pipelines and then assuming that clusters correspond to cell types [32]. All of these tools output the predicted LR interaction pairs between cell types. Such LR pairs can then be used to construct CCI networks, suggesting the potential communication between cells. In addition, all of them can provide visualization of CCIs. The nine tools are briefly described as follows.

Table 1.

Overview of the CCI inference tools

| Tool | Reference | Database (# LR pairs) | Code | CCI inference | Input: gene expression (normalized/raw data) | Input: predefined cell types | Visualization |
|---|---|---|---|---|---|---|---|
| iTALK | Wang et al. [25] | 2648 | R | CCIs are inferred by finding significant LR interactions based on differentially expressed ligand and receptor identification | Raw | Yes | Circos plots and boxplots of CCI networks |
| PyMINEr | Tyler et al. [28] | NA | Python | CCIs are inferred by identifying autocrine and paracrine signaling through LR-involved pathway analysis | Normalized | Yes | Network graph and circular plot of CCIs |
| NicheNet | Browaeys et al. [12] | 12 652 | R | CCIs are inferred based on the influence of a ligand in a cell type on target genes of a signaling path in another cell type through prior ligand-to-target signaling path integration | Raw | No | Circos plot of interactions between cells or clusters |
| CellPhoneDB | Efremova et al. [5] | 1396 | Python | CCIs are based on identifying statistically significant LR interactions based on their gene expression in involved cell states | Normalized | Yes | Dot plot of cluster combinations |
| NATMI | Hou et al. [29] | 2293 | Python | CCIs are based on the summation of LR interactions weighted by multiple metrics considering the ligand and receptor expression levels in the two involved cell clusters | Raw | Yes | Heatmap-view, network graph and circular plot of CCIs |
| SingleCellSignalR | Cabello-Aguilar et al. [27] | 3251 | R | CCI probabilities are based on the nonlinear function of the product of ligand and receptor expressions | Normalized | No | Network graph and circular plot of CCIs |
| CellChat | Jin et al. [8] | 2021 | R | CCI probability derived using the law of mass action according to the ligand- and receptor-expression levels averaged over their associated cell groups | Normalized | No | Alluvial and Circos plots of communication pathways |
| ICELLNET | Noël et al. [24] | 752 | R | CCI scores are calculated based on the mean expression profiles of their respective ligands and receptors in two cell clusters | Raw | No | Network visualization of communication between interacting groups |
| scMLnet | Cheng et al. [26] | 2557 | R | CCIs are inferred by the multilayer network constructed through integrating expression-based reconstruction of LR interaction, receptor-TF, TF-target gene networks | Raw | Yes | Multilayer network visualization of ligand–receptor-TF–target gene links for each cell cluster |

iTALK

iTALK [25] was developed as a tool for identifying and illustrating CCIs. iTALK was motivated by understanding the crosstalk between tumor cells and other cells in a tumor microenvironment. iTALK manually curated a database that contains 2648 unique LR pairs. According to the functions of the ligands, iTALK classifies the interacting pairs into four categories, including cytokine/chemokine, immune checkpoint, growth factor and others. With the cell-gene expression matrix and predefined cell types as input, iTALK identifies CCIs as differentially expressed LR pairs. The identification of differential expression is made by integrating existing tools, such as DESeq, scde and Monocle [33–35]. iTALK provides various ways to visualize the CCIs, including network, circular and errorbar plots.

PyMINEr

PyMINEr [28] is an open-source program that can perform multifaceted analysis of gene expression data, such as clustering for cell-type identification, differential expression identification and pathway analyses. Its LR-based CCI inference was illustrated by predicting autocrine and paracrine signaling networks. PyMINEr first identifies differentially expressed genes between cell types as cell-type-enriched genes. Two separate lists of cell-type-enriched genes that have the potential to function as ligands and receptors in cell communication are then generated according to their Gene Ontology (GO) annotation and subcellular localization analysis. An LR pair is defined when one cell-type-enriched gene comes from each list and the two can physically interact at the protein level according to the PPI information in StringDB. The LR pairs obtained in this way are then integrated into the pathway analysis to identify autocrine and paracrine signaling between all cell types. PyMINEr was applied to human pancreatic islet scRNA-Seq datasets, and it was able to identify the bone morphogenic protein-WNT signaling responsible for cystic fibrosis pancreatic acinar cell loss. PyMINEr outputs an HTML webpage to display all the analysis results, including a network graph and a circular plot of CCI networks.

NicheNet

NicheNet [12] attempts to elucidate the functional understanding of CCIs by inferring the functional effect of ligands in the sender cells on the expression of genes in the receiver cells. To do that, NicheNet integrates LR interaction, signal transduction and gene regulatory interaction information. The LR interactions were compiled from KEGG, Reactome, the IUPHAR/BPS Guide to pharmacology and PPI databases [10, 36–38]. The signaling and PPI information was obtained from multiple pathways and PPI databases such as ConsensusPathDB, Omnipath and PathwayCommons [39–47]. The gene regulatory interactions were compiled from multiple resources, including TRANSFAC, JASPAR, MSigDB and so on [48–60]. Individual interactions were organized as weighted networks where LR and signaling networks were combined as a weighted ligand-signaling network, and gene regulatory interactions were converted to a weighted gene regulatory network. A weighted sum of the individual networks was then performed to integrate different data sources. Network operations, such as PageRank and matrix multiplication, were applied to this integrated network to derive a prior model of ligand-target regulatory potential. After combining the expression profiles of interacting cells, NicheNet can prioritize the regulatory potential of ligands on the target genes. When applied to the HNSCC tumor single-cell data, NicheNet identified TGFB3 as the most probable ligand that regulates the epithelial-to-mesenchymal transition program in a group of malignant cells. The signaling paths from TGFB3 to its target genes were also inferred. NicheNet, when applied to study the CCIs among immune cells using the mouse immune cells, identified antiviral-relevant ligands such as Il27, Ifng and Il12a.

SingleCellSignalR

SingleCellSignalR [27] curated an LR interaction database, LRdb, from multiple resources such as FANTOM5 [61], HPRD [62], Reactome [63] and HPMR [64]. LRdb requires that the two members of a potential LR pair be annotated in GO as ligand and receptor, respectively, and be involved in the Reactome Pathway database [65, 66]. LRdb contains 3251 reliable LR pairs. The LR interactions are determined by a regularized product score that is mainly derived from the product of the expression levels of the associated ligand and receptor. A scoring threshold is estimated using multiple benchmark datasets, including a metastatic melanoma dataset, two peripheral blood mononuclear cell datasets, a pan T-cell dataset and a head and neck squamous cell carcinoma dataset. The usage of SingleCellSignalR was demonstrated by identifying LR interactions between cells in a mouse interfollicular epidermis dataset.

CellPhoneDB

Unlike previous tools, such as iTALK, SingleCellSignalR and NicheNet, CellPhoneDB v2.0 considers the multi-subunit architecture of both ligands and receptors [5]. CellPhoneDB also curated an LR database, which contains multiple LR subunits. Significant LR interaction pairs are defined based on the likelihood estimation of their respective cell-type enrichment. The number of significant LR interaction pairs is used to prioritize the CCIs specific to two given cell types. The CCI-based networks can be constructed to assess cellular crosstalk between different cell types. A cell subsampling method is used to reduce memory usage and runtime.

NATMI

NATMI [29] compiled connectomeDB2020, which contains 2293 manually curated LR pairs. NATMI considers a ligand or receptor as expressed if it is expressed in at least 20% of the cells of a given cell type. NATMI then defines an edge weight corresponding to an LR pair using three metrics that differ in how they summarize the expression levels of a ligand or receptor in cells of the same cell type. For example, one metric is the mean-expression weight, which is defined as the product of the ligand’s mean expression and the receptor’s mean expression in the cells of the cell type under consideration. Based on the edge weights between LR pairs, NATMI constructs a cell connectivity summary network that summarizes CCIs. The criteria can be specified by users, for example, simply counting the number of LR pairs whose weights pass user-defined thresholds. NATMI was applied to the Tabula Muris atlas that contains single cells from 20 organs in mice [27]. Autocrine, intra-organ and inter-organ signaling were identified among 117 cell types in the Tabula Muris dataset. Cellular communities and differential networks were also predicted. NATMI provides a function to visualize the extracted LR pairs and their edge weights.
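The detection threshold and the mean-expression weight described above can be sketched in a few lines of Python. This is a minimal illustration, not NATMI's code: the function names, the toy expression values and the choice to return `None` for undetected genes are assumptions made here.

```python
from statistics import mean


def detected_fraction(values):
    """Fraction of cells in which the gene has non-zero expression."""
    return sum(v > 0 for v in values) / len(values)


def mean_expression_weight(ligand_in_sender, receptor_in_receiver, min_fraction=0.2):
    """Product of the mean ligand and mean receptor expression.

    Returns None when either gene is detected in fewer than `min_fraction`
    of the cells of its cell type (the 20% rule described in the text).
    """
    if detected_fraction(ligand_in_sender) < min_fraction:
        return None
    if detected_fraction(receptor_in_receiver) < min_fraction:
        return None
    return mean(ligand_in_sender) * mean(receptor_in_receiver)


# Hypothetical toy values: one number per cell of the given cell type.
ligand = [0.0, 2.0, 4.0, 0.0]    # detected in 50% of sender cells, mean 1.5
receptor = [1.0, 1.0, 0.0, 2.0]  # detected in 75% of receiver cells, mean 1.0
weight = mean_expression_weight(ligand, receptor)
rare = mean_expression_weight([0.0, 0.0, 0.0, 0.0, 0.0, 1.0], [1.0])
```

In this toy case the ligand and the receptor both pass the detection rule, so the edge weight is the product of the two means; the second call fails the rule because the ligand is detected in only one of six cells.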

CellChat

Like CellPhoneDB, CellChat [8] also incorporates multiple ligand/receptor subunits. It further extends their consideration to additional cofactors such as soluble agonists, antagonists and stimulatory and inhibitory membrane-bound co-receptors. Based on manual curation, CellChatDB was constructed with over 2K LR interactions, each of which is associated with a literature-supported signaling pathway. CCI prediction between a given pair of cell groups is based on a probability calculated by integrating the PPI network and the differentially expressed ligands/receptors in the involved cell types. Based on the predicted CCIs, CellChat also performs network analysis on the intercellular CCI networks to identify the dominant roles of different cell types and CCI patterns. CellChat demonstrated its functions using several scRNA-Seq datasets, e.g. the single-cell mouse skin datasets covering embryonic development and adult wound healing stages. With the adult wound healing data, CellChat identified TGFβ signaling from myeloid cells to fibroblasts, which is consistent with the role of myeloid cells reported in the literature. CellChat’s pattern recognition module also revealed connections between cells and signaling pathways. For instance, multiple pathways, such as ncWNT, SPP1, MK and PROS, were identified corresponding to the outgoing signaling of fibroblast cells. With the embryonic day 14.5 mouse skin dataset, CellChat showed its ability to identify CCIs in continuous cell states inferred by the pseudotemporal trajectory.

scMLnet

Similar to NicheNet, scMLnet [26] integrates multiple types of information, including LR interactions, signaling and gene regulatory interactions, as subnetworks to study CCIs. scMLnet focuses more on context-dependent integration by incorporating scRNA-Seq expression data into the construction of each subnetwork. The constructed subnetworks are output as a multilayer network representing CCIs. scMLnet was applied to a single-cell dataset of bronchoalveolar lavage fluid samples from nine COVID patients and four healthy controls. The CCIs between secretory cells and other cell types enabled the identification of ACE2 regulatory pathways, such as PI3K-Akt, JAK–STAT, TNF and MAPK signaling pathways, which were further validated using bulk gene expression data and additional experiments.

ICELLNET

ICELLNET [24] is a computational framework that can infer CCIs from bulk transcriptomic and scRNA-Seq data. ICELLNET integrated hundreds of literature-annotated and experimentally validated LR interactions as a database. Similar to previous methods, ICELLNET scores the LR interactions using the expression levels of both ligands and receptors in the corresponding cells. One unique feature of ICELLNET is its ability to incorporate gene expression profiles of other cell types not from the same dataset. For example, a cell type can be from the Human Primary Cell Atlas. Briefly, given the expression profiling of a specific cell type, called ‘central cell’, and that of other cell types provided by the user, called ‘partner cells’, ICELLNET can predict the potential CCIs between the central cell and partner cells. Due to the incompleteness, complexity and possibilities to cause false CCI predictions, ICELLNET did not integrate pathway and gene regulatory information [67–70]. ICELLNET was applied to human breast cancer-associated fibroblast (CAF) cells and demonstrated its capability to reconstruct the CCIs and identify the LR interactions between CAFs and 14 other cell types involved in the tumor microenvironment. ICELLNET also provides a few visualization tools for result interpretation.

Even though all the tools can predict CCI networks, they have different considerations in defining LR interactions. We can thus further classify the tools into two categories: pathway-involved and non-pathway-involved. Tools like NicheNet, PyMINEr and scMLnet fall into the pathway-involved category, while the others are mainly non-pathway-involved. The pathway-involved tools are primarily motivated by the essential role of intracellular signaling and transcriptional regulation events during CCIs [12, 26, 71, 72]. Therefore, these tools take into account the signaling pathway components by performing pathway integration and analysis when inferring LR interactions and CCIs. Because the current understanding of pathway and gene regulatory mechanisms is incomplete, the integration of downstream signaling pathways and targets can lead to false predictions [24, 69, 73, 74]. On the other hand, non-pathway-involved tools focus on the transcriptional abundance of the ligands and receptors. Some non-pathway-involved tools, such as CellChat and SingleCellSignalR, also perform pathway analysis but only after the CCI inference [8, 27]. Meanwhile, most of these tools utilize the expression levels of ligands and receptors to score the potential LR interactions, which is common in the CCI literature [75–79]. Nevertheless, the formulas used to compute the score vary. For example, NATMI offers three different metrics for calculating the LR interaction scores, SingleCellSignalR uses a regularized product and CellPhoneDB applies a modified mean P-value. The corresponding score cutoffs are also defined differently. Although all the tools can generate predictions for important LR interaction pairs, they do not always directly output the LR interaction scores, which is particularly true for the pathway-involved tools.

Testing data compilation and tool comparison methods

LR interaction data used for comparison

To evaluate the predicted LR interactions from the nine tools, we downloaded the compiled LR interactions in 23 resources from recent studies [8, 10, 12, 24, 25, 27, 29, 80–92] (Supplementary Table 1 available online at http://bib.oxfordjournals.org/). The majority of the LR interactions were inferred from protein interaction, gene function and pathway annotations in the literature. Although many LR interactions are shared among these resources, many more are unique to individual resources. To find the most reliable LR interactions, we calculated the overlap of the LR pairs between resources (Figure 1). We defined the most reliable LR interactions as those occurring in at least four resources. This definition resulted in 3779 LR interaction pairs forming the consensus LR interaction database (C-LRI). We used the C-LRI database as a benchmark dataset to evaluate the LR predictions from different tools (Supplementary File 1 available online at http://bib.oxfordjournals.org/).
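The consensus rule (keep an LR pair if it occurs in at least four resources) amounts to counting, for each pair, how many resources list it. The Python sketch below shows that counting on toy data; the resource names and the placeholder pairs such as `("L1", "R1")` are hypothetical and are not taken from the 23 resources.

```python
from collections import Counter


def consensus_pairs(resources, min_resources=4):
    """Return the LR pairs that occur in at least `min_resources` resources.

    `resources` maps a resource name to an iterable of (ligand, receptor) pairs.
    A pair listed twice in the same resource is counted once.
    """
    counts = Counter()
    for pairs in resources.values():
        counts.update(set(pairs))
    return {pair for pair, n in counts.items() if n >= min_resources}


# Hypothetical toy resources with placeholder gene names.
resources = {
    "res1": [("L1", "R1"), ("L2", "R2")],
    "res2": [("L1", "R1"), ("L2", "R2")],
    "res3": [("L1", "R1"), ("L3", "R3")],
    "res4": [("L1", "R1")],
    "res5": [("L2", "R2"), ("L2", "R2")],
}
consensus = consensus_pairs(resources)
```

Here only `("L1", "R1")` reaches four resources; `("L2", "R2")` appears in three, because its duplicate entry within `res5` is not counted twice.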

Figure 1.

The overlap of the LR pairs in the 23 resources. Although many LR interactions are shared among these resources, many more are unique to individual resources.

scRNA-Seq datasets

We evaluated the nine tools using three well-studied scRNA-Seq datasets (Table 2), involving 15 scRNA-Seq samples.

Table 2.

Details of the 15 samples

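| Dataset | GEO | Name | Data size (Windows 10) | Data format | Number of cells |
|---|---|---|---|---|---|
| Embryonic mouse skin | GSM3453535 | e13.5control | 379 MB | 10X Genomics | 7067 |
| Embryonic mouse skin | GSM3453536 | e13.5control_replicate | 326 MB | 10X Genomics | 6098 |
| Embryonic mouse skin | GSM3453537 | e14.5control | 343 MB | 10X Genomics | 6394 |
| Embryonic mouse skin | GSM3453538 | e14.5control_replicate | 329 MB | 10X Genomics | 6153 |
| Mouse cerebral cortex | GSE60361 | NA | 115 MB | UMI counts | 3005 |
| Mouse spinal cord | GSM4955359 | uninj_sample1 | 151 MB | UMI counts | 2757 |
| Mouse spinal cord | GSM4955360 | uninj_sample2 | 61 MB | UMI counts | 1024 |
| Mouse spinal cord | GSM4955361 | uninj_sample3 | 547 MB | UMI counts | 8858 |
| Mouse spinal cord | GSM4955362 | 1 dpi_sample1 | 404 MB | UMI counts | 7360 |
| Mouse spinal cord | GSM4955363 | 1 dpi_sample2 | 309 MB | UMI counts | 5216 |
| Mouse spinal cord | GSM4955364 | 1 dpi_sample3 | 526 MB | UMI counts | 8520 |
| Mouse spinal cord | GSM4955365 | 3 dpi_sample1 | 507 MB | UMI counts | 9263 |
| Mouse spinal cord | GSM4955366 | 3 dpi_sample2 | 489 MB | UMI counts | 8237 |
| Mouse spinal cord | GSM4955367 | 7 dpi_sample1 | 408 MB | UMI counts | 7467 |
| Mouse spinal cord | GSM4955368 | 7 dpi_sample2 | 459 MB | UMI counts | 7726 |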

The first dataset is the scRNA-Seq data from embryonic mouse skin [93]. These scRNA-Seq data correspond to the gene expression measurement of single cells of embryonic dorsolateral/flank skin at embryonic days 13.5 and 14.5. We obtained two biological replicates for both days (GSM3453535, GSM3453536, GSM3453537 and GSM3453538). We then performed the cell-level filtering by removing the cells with UMI counts <2500 or >50 000 [8]. We also removed cells with gene numbers <1000 and the fraction of mitochondrial counts >20%.
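The cell-level filters above can be written as a single predicate. The sketch below is illustrative only: the dictionary keys and toy cells are hypothetical, and it assumes that a cell is removed if it fails any one of the three criteria, which is how we read the description.

```python
def passes_qc(total_umis, n_genes, mito_fraction):
    """True if a cell survives the filters described in the text.

    Removed: UMI counts < 2500 or > 50 000, fewer than 1000 detected genes,
    or a mitochondrial count fraction above 20%.
    """
    return (
        2500 <= total_umis <= 50_000
        and n_genes >= 1000
        and mito_fraction <= 0.20
    )


# Hypothetical per-cell summaries.
cells = [
    {"id": "c1", "umis": 3000, "genes": 1500, "mito": 0.05},   # kept
    {"id": "c2", "umis": 2000, "genes": 1500, "mito": 0.05},   # too few UMIs
    {"id": "c3", "umis": 60000, "genes": 4000, "mito": 0.05},  # too many UMIs
    {"id": "c4", "umis": 8000, "genes": 900, "mito": 0.05},    # too few genes
    {"id": "c5", "umis": 8000, "genes": 2000, "mito": 0.35},   # high mito fraction
]
kept = [c["id"] for c in cells if passes_qc(c["umis"], c["genes"], c["mito"])]
```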

The second dataset is the mouse cerebral cortex scRNA-Seq data [94], which measures gene expression in 3005 high-quality single cells isolated from the mouse cerebral cortex (GSE60361). It contains the main cell types in the hippocampus and somatosensory cortex. We then filtered out unreliable genes based on the total number of reads per gene and only kept the genes detected in more than 30 cells [95].

We also compared the nine tools on the third scRNA-Seq dataset from the mouse spinal cord (GSE162610) [96]. This dataset measures gene expression in all cell types of the uninjured and injured spinal cord of wild-type mice at 1 day, 3 days and 7 days after injury. The data contain 10 samples including three uninjured samples, three 1 dpi, two 3 dpi and two 7 dpi samples (GSM4955359, GSM4955360, GSM4955361, GSM4955362, GSM4955363, GSM4955364, GSM4955365, GSM4955366, GSM4955367 and GSM4955368). We used the processed dataset from the original paper.

Compare predicted LR pairs

For a given scRNA-Seq sample, the LR interactions between specific cell types were compared. Cell clusters were first obtained by Seurat [32]. Each cell cluster was then annotated to a cell type by cross-checking the cells’ annotation in the original literature reference. Each CCI tool was run to output the top LR predictions between cell-type-annotated clusters. Given the LR interactions obtained in this way between any two cell types, all the tools were compared regarding the predicted number of LR interactions. The predicted LR interaction pairs were also compared against the C-LRI database. The LR pairs were used to infer CCIs for each tool. These CCIs were then compared across different tools. To determine how cell clustering results affected the CCI inference results, we varied the resolution parameter of Seurat. Note that the resolution parameter is associated with the Seurat clusters’ granularity, where a larger resolution value corresponds to a larger number of clusters.

Compare the inferred CCI networks

For each CCI inference tool, a CCI network was constructed based on the predicted LR interactions. In a CCI network, each node represents a cell type, and each edge corresponds to the LR-based CCIs between two cell types. Similar to ICELLNET [24], we defined the pairwise dissimilarity $d_{\alpha,\beta}$ between two CCI networks α and β as

graphic file with name DmEquation1.gif

with

graphic file with name DmEquation2.gif

where $w_{ij}^{\alpha}$ is the weight of the directed edge from node i to node j in the CCI network α, N is the total number of nodes, $E_{\alpha}$ is the set of edges in the CCI network α and $|E_{\alpha} \cup E_{\beta}|$ is the cardinality of the union of all edges in CCI networks α and β. Note that the edge weight can be defined as the number of predicted LRs between two corresponding cell types.
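To make the ingredients of such a dissimilarity concrete, the following Python sketch compares two small directed, weighted networks. The two equations are not reproduced in this text, so the function below is only an illustrative stand-in and not the paper's definition: it assumes the dissimilarity is the sum of absolute edge-weight differences divided by the number of edges in the union of the two networks, with a missing edge treated as weight 0. The cell-type labels and weights are hypothetical.

```python
def cci_dissimilarity(net_a, net_b):
    """Mean absolute edge-weight difference over the union of edges.

    Each network maps a directed edge (sender, receiver) to a weight,
    e.g. the number of predicted LR pairs. An edge absent from one
    network contributes with weight 0 (an assumption of this sketch).
    """
    edges = set(net_a) | set(net_b)
    if not edges:
        return 0.0
    total = sum(abs(net_a.get(e, 0) - net_b.get(e, 0)) for e in edges)
    return total / len(edges)


# Hypothetical networks over cell types "A" and "B".
net_alpha = {("A", "B"): 12, ("B", "A"): 5}
net_beta = {("A", "B"): 10, ("A", "A"): 3}
d = cci_dissimilarity(net_alpha, net_beta)
```

In this toy case the union contains three edges with absolute differences 2, 5 and 3, so the value is 10/3. Under this stand-in, identical networks have dissimilarity 0.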

Compare the effect of subsampling

To compare the predicted communications on different subsets of the original input data, we used the ‘geometric sketching’ approach to perform subsampling of the scRNA-Seq datasets [97]. Geometric sketching is a data sampling approach that has been shown effective in using a small portion of the original single-cell data to represent the full data and consequently reduce the data volume. Compared with the commonly used downsampling strategies, the geometric sketching method can keep the rare cell types and maintain the heterogeneity of the original data. To calculate the scores for the tools, given a CCI inferred from the sampled data, if it is also inferred from the original data, it is defined as a true positive. Any CCI predictions from the original data but not in the sampled data are false negatives. Such a sampling analysis was used to evaluate the consistency between tools rather than the accuracy of each tool.
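The true-positive and false-negative definitions above translate directly into set operations. The sketch below counts both and, as an assumption of this example, summarizes them as the fraction of original CCIs recovered from the subsample; the text does not state which score was derived from these counts. The CCIs are hypothetical (sender, receiver) pairs.

```python
def subsampling_consistency(original_ccis, sampled_ccis):
    """Count true positives and false negatives as defined in the text.

    True positive: a CCI inferred from the sampled data that is also
    inferred from the original data.
    False negative: a CCI inferred from the original data but not from
    the sampled data.
    """
    original, sampled = set(original_ccis), set(sampled_ccis)
    tp = len(sampled & original)
    fn = len(original - sampled)
    recovered = tp / (tp + fn) if original else float("nan")
    return tp, fn, recovered


# Hypothetical CCIs between cell types.
original = {("A", "B"), ("B", "C"), ("A", "C"), ("C", "A")}
sampled = {("A", "B"), ("B", "C"), ("D", "A")}
tp, fn, recovered = subsampling_consistency(original, sampled)
```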

Running time and memory usage

We collected the runtime and memory usage of the nine tools on a dedicated machine. All the tools were run on the same computer running Ubuntu 18.04 Long Term Support (LTS) with an Intel Core i7-10875H CPU @ 2.3 GHz (16 cores) and 128 GB memory. We used the Python ‘time’ package to record the time used for each tool. Briefly, we recorded the time between the tool command start and the tool command end. For memory usage, we used the ‘top’ command to monitor the memory used while each tool was running. The maximum memory used was taken as the memory usage for each tool. When we ran the code, we tried to suspend all other activities to make sure we could obtain the actual maximum memory usage.
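Timing a tool from command start to command end can be done with Python's standard `time` and `subprocess` modules. The sketch below is one way to do it, not the script used in this study; the command shown is a placeholder that simply starts and exits a Python interpreter, and memory monitoring with `top` is not covered.

```python
import subprocess
import sys
import time


def time_command(command):
    """Run a command and return (elapsed seconds, exit code)."""
    start = time.perf_counter()
    completed = subprocess.run(command, check=False)
    elapsed = time.perf_counter() - start
    return elapsed, completed.returncode


# Placeholder command: replace with the actual tool invocation.
elapsed, exit_code = time_command([sys.executable, "-c", "pass"])
print(f"elapsed: {elapsed:.3f} s, exit code: {exit_code}")
```

`time.perf_counter` measures wall-clock time, which matches the start-to-end interval described above.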

Comparison of the nine tools using mouse skin and cortex data

Comparison of the predicted LR interaction pairs

We found a large inconsistency in the number of predicted LR interactions among the different tools. The number of LR interaction pairs predicted by the nine tools for a given sample often varies from dozens to thousands. To make the predicted LR pairs from different tools comparable, we selected the top predictions accordingly for each tool. For example, we kept the top 10% of LR pairs from iTALK, ICELLNET, NicheNet and NATMI.
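Keeping the top 10% of LR pairs for a tool is a rank-and-truncate step. The sketch below assumes each prediction comes with a numeric score where higher is better, and that at least one pair is kept for a non-empty list; both are assumptions of this example, and the pair names and scores are hypothetical.

```python
def top_fraction(scored_pairs, fraction=0.10):
    """Return the highest-scoring `fraction` of (pair, score) items."""
    ranked = sorted(scored_pairs, key=lambda item: item[1], reverse=True)
    if not ranked:
        return []
    keep = max(1, round(len(ranked) * fraction))
    return ranked[:keep]


# Hypothetical predictions: 20 placeholder pairs with scores 1..20.
predictions = [((f"L{i}", f"R{i}"), float(i)) for i in range(1, 21)]
top = top_fraction(predictions)
```

With 20 predictions, the top 10% are the two highest-scoring pairs.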

After selecting the top LR pairs, the difference in the number of predicted LR pairs is lessened but persists. For example, for the embryonic day 13.5 control replicate sample (GSM3453536) in the embryonic mouse skin dataset, the number of predicted LR interaction pairs between fibroblast type A (FIB-A) and fibroblast type B (FIB-B) cells still varies across tools, ranging from 12 to 192 (Figure 2A). This situation is similar for all four samples in the mouse skin dataset and the mouse cerebral cortex sample. In the mouse cortex sample, the number of LR interactions from pyramidal CA1 cells to oligodendrocytes varies from 44 (CellChat) to 235 (ICELLNET) (Figure 2C). CellChat, scMLnet and SingleCellSignalR typically predicted no more than a few dozen LR interactions, while iTALK, ICELLNET and NATMI often output >100 LR interactions based on our selection criteria above. The tools differ not only in the number of LR interactions predicted for the same sample but also in how that number is distributed across samples. For example, CellChat, CellPhoneDB and PyMINEr each predicted their largest number of LR pairs in the embryonic day 14.5 control sample (GSM3453537) in the mouse skin dataset, while SingleCellSignalR predicted its smallest number of LR interactions in the same sample.

Figure 2.

Tool results on different datasets. (A) The number of predicted LR pairs varies from FIB-A to FIB-B cell types in the embryonic mouse skin samples. (B) A large percentage of predicted LR pairs by all tools except PyMINEr are in C-LRI from FIB-A to FIB-B cell types in embryonic mouse skin samples. (C) The number of predicted LR pairs varies from pyramidal CA1 to oligodendrocytes cell types in mouse cerebral cortex. (D) A large percentage of predicted LR pairs by all tools except PyMINEr are in C-LRI from pyramidal CA1 to oligodendrocytes cell types in mouse cerebral cortex.

A closer examination of the LR pairs shows that few predictions are shared between tools (Supplementary Figures 1 and 2 available online at http://bib.oxfordjournals.org/). Although iTALK, ICELLNET and NATMI all predicted over 100 LR interaction pairs, they share only 20–30% of their predictions. For example, for the embryonic day 13.5 control replicate sample (GSM3453536), iTALK shares 23 (19.7%) and 39 (33.3%) of its 117 LR predictions with ICELLNET and NATMI, respectively. Meanwhile, among its 192 LR predictions, ICELLNET has only 23 (12%) and 38 (19.8%) LR pairs in common with iTALK and NATMI, respectively. On the other hand, for the tools that frequently predicted fewer LR pairs in a given sample, the overlap between the predicted LR pairs is even smaller. For instance, out of its 31 predicted LR pairs, SingleCellSignalR has only one LR pair in common with scMLnet.

We compared the LR predictions against the C-LRI database (Figure 2B). All the predictions from iTALK and scMLnet are in C-LRI, as is a large percentage of the NATMI, SingleCellSignalR and ICELLNET predictions. For example, for the embryonic day 13.5 control replicate sample (GSM3453536), 86.36% of NATMI’s 110 LR predictions, 77.42% of SingleCellSignalR’s 31 predictions and 77.08% of ICELLNET’s 192 predictions were found in C-LRI. In contrast, CellChat, NicheNet and CellPhoneDB have a smaller percentage of overlap with C-LRI (41.67%, 34.74% and 44.83%, respectively). The mouse cortex sample follows approximately the same pattern: iTALK and scMLnet predicted only C-LRI LR pairs, while ICELLNET and NATMI predicted more C-LRI pairs than the other tools relative to their number of predictions (Figure 2D). Among the nine tools, PyMINEr is the only exception in that it has little overlap with C-LRI. Further inspection shows that PyMINEr’s predictions are largely PPIs that are not often annotated as LR interactions.
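
As a rough illustration of the comparison against C-LRI, the sketch below computes the share of a tool's predictions found in a reference set and back-calculates the number of matching pairs implied by the percentages quoted above. It assumes predictions and the reference are both available as sets of (ligand, receptor) tuples; `fraction_in_reference` and `implied_hits` are hypothetical helper names, and the implied counts are derived arithmetic, not values reported separately in this study.

```python
def fraction_in_reference(predicted, reference):
    """Share of predicted LR pairs that also appear in the reference set."""
    if not predicted:
        return 0.0
    return len(predicted & reference) / len(predicted)


def implied_hits(percent, total):
    """Back-calculate the number of matching pairs from a reported percentage."""
    return round(percent / 100 * total)


# Toy sets (hypothetical) to show the basic calculation.
reference = {("Tgfb1", "Tgfbr1"), ("Wnt5a", "Fzd2"), ("Pdgfa", "Pdgfra")}
predicted = {("Tgfb1", "Tgfbr1"), ("Wnt5a", "Fzd2"), ("Fgf7", "Fgfr2"), ("Igf1", "Igf1r")}
toy_fraction = fraction_in_reference(predicted, reference)
print(f"toy example: {toy_fraction:.2%} of predictions are in the reference")

# Percentages and prediction counts quoted in the text for GSM3453536.
reported = {
    "NATMI": (86.36, 110),
    "SingleCellSignalR": (77.42, 31),
    "ICELLNET": (77.08, 192),
}
implied = {tool: implied_hits(pct, total) for tool, (pct, total) in reported.items()}
for tool, hits in implied.items():
    total = reported[tool][1]
    print(f"{tool}: about {hits} of {total} predictions ({hits / total:.2%})")
```

Each reported percentage corresponds to a whole number of matching pairs, which is a quick consistency check on figures of this kind.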

The above observations were made when the resolution parameter of Seurat was set to 0.4 (corresponding to 20 clusters assigned to seven cell types). We also investigated how cell clustering with various Seurat resolution settings might change the LR interaction predictions of the nine tools. To do that, we set the resolution to 0.1, 0.3, 0.5, 0.7 and 0.9, corresponding to 8, 15, 20, 22 and 22 clusters, respectively. We then compared all LR pairs predicted by each tool. We found that cell clustering impacted the tools differently. Take the mouse cerebral cortex sample as an example. Although the resulting cluster number changes, the overall number of LR pairs remains similar for most tools (Supplementary Figure 3 available online at http://bib.oxfordjournals.org/). Only ICELLNET and NATMI show a slight difference at resolution 0.1 compared with the other settings. However, when examining LR interactions between specific cell types, we observed that the number of LR pairs consistently increases with the higher resolution setting, except for SingleCellSignalR (Figure 3). For example, we observed the number of CellChat-predicted LR pairs between cell types pyramidal CA1 and pyramidal SS is 15, 21, 24, 26 and 26 for resolutions 0.1, 0.3, 0.5, 0.7 and 0.9, respectively. However, the corresponding number of SingleCellSignalR-predicted LR pairs is 15, 21, 24, 26 and 26, respectively. The distributions of LR interactions for the nine tools are similar for other cell types, e.g. the number of LR pairs between cell types pyramidal CA1 and oligodendrocytes (Supplementary Figure 4 available online at http://bib.oxfordjournals.org/). With more clusters generated, the same LR pair can be predicted multiple times by multiple clusters corresponding to the same cell types, leading to more predicted LR pairs. The exception of SingleCellSignalR might be due to its specific procedure to select significantly expressed genes for consideration in LR prediction. Therefore, variations in cell clustering and cell-type classification can impact the prediction of LR interactions.
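
The effect of splitting one cell type into several clusters can be sketched as follows. This is a toy illustration under stated assumptions, not the counting procedure of any specific tool: predictions are assumed to be made per ordered cluster pair and then pooled by the cell types assigned to those clusters, and the cluster labels, cell-type names and LR pairs are hypothetical.

```python
from collections import defaultdict


def count_lr_pairs(cluster_predictions, cluster_to_type, deduplicate):
    """Count LR pairs per (sender cell type, receiver cell type).

    `cluster_predictions` maps (sender cluster, receiver cluster) to a list of
    (ligand, receptor) tuples; `cluster_to_type` maps a cluster to a cell type.
    With deduplicate=False, a pair found by two clusters of the same cell type
    is counted twice.
    """
    pooled = defaultdict(list)
    for (sender, receiver), lr_pairs in cluster_predictions.items():
        key = (cluster_to_type[sender], cluster_to_type[receiver])
        pooled[key].extend(lr_pairs)
    return {
        key: len(set(pairs)) if deduplicate else len(pairs)
        for key, pairs in pooled.items()
    }


# Hypothetical higher-resolution clustering: cell type "CA1" splits into clusters 0 and 1.
cluster_to_type = {0: "CA1", 1: "CA1", 2: "SS"}
cluster_predictions = {
    (0, 2): [("L1", "R1"), ("L2", "R2")],
    (1, 2): [("L1", "R1"), ("L3", "R3")],
}

raw = count_lr_pairs(cluster_predictions, cluster_to_type, deduplicate=False)
unique = count_lr_pairs(cluster_predictions, cluster_to_type, deduplicate=True)
print("pooled count:", raw[("CA1", "SS")])
print("unique count:", unique[("CA1", "SS")])
```

In this toy case the pooled count exceeds the unique count because one pair is reported by both CA1 clusters, while the unique count can still grow when a sub-cluster reveals a pair that was not detectable in the merged cell type.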

Figure 3.

The number of LR pairs between pyramidal CA1 and pyramidal SS cell types in mouse cerebral cortex consistently increases with the higher resolution setting, except for SingleCellSignalR.

Comparison of the predicted CCI networks

We compared the CCI networks generated by the nine tools using the dissimilarity score (Section ‘Testing data compilation and tool comparison methods’). We found that the CCI dissimilarity patterns are conserved between different samples. Figure 4 shows the dissimilarity scores between tools applied to the four samples in the embryonic mouse skin dataset. We observed that ICELLNET, NATMI, iTALK and NicheNet often have similar CCI outputs, while scMLnet and SingleCellSignalR, in general, have CCIs similar to each other. However, CCI networks from SingleCellSignalR are often very different from those produced by the other tools. For example, the mean dissimilarity score between PyMINEr and SingleCellSignalR was 8.12. In contrast, ICELLNET and NATMI have an average score of 0.78. The mouse cortex sample follows the same dissimilarity pattern between tools (Supplementary Figure 5 available online at http://bib.oxfordjournals.org/).
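
To make the idea of a network-level comparison concrete, the sketch below computes one possible dissimilarity between two weighted CCI networks. It is an illustrative stand-in only: it uses the Euclidean distance between edge-weight vectors, which is an assumption and not necessarily the score defined in Section ‘Testing data compilation and tool comparison methods’. The function name, edge weights and network contents are hypothetical.

```python
import math


def network_dissimilarity(net_a, net_b):
    """Euclidean distance between two CCI networks.

    Each network maps a directed edge (sender cell type, receiver cell type)
    to a weight. An edge missing from one network is treated as weight 0.
    """
    edges = set(net_a) | set(net_b)
    return math.sqrt(
        sum((net_a.get(edge, 0.0) - net_b.get(edge, 0.0)) ** 2 for edge in edges)
    )


# Hypothetical networks: the second one lacks the immune-to-basal edge.
dense = {("FIB-A", "FIB-B"): 3.0, ("Immune", "Basal"): 4.0}
sparse = {("FIB-A", "FIB-B"): 3.0}

print(network_dissimilarity(dense, dense))
print(network_dissimilarity(dense, sparse))
```

Under any distance of this kind, a tool that omits edges or assigns much smaller weights will appear far from the tools that report dense networks, which matches the qualitative pattern described above.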

Figure 4.

The CCI network dissimilarity scores between different tools on the four samples in the embryonic mouse skin dataset. A subset of tools have similar CCI outputs (darker blocks), while scMLnet and SingleCellSignalR have quite different CCI output from other tools.

Further inspection of the CCIs from different tools shows that the differences between CCI networks largely derive from the LR interaction predictions. For the embryonic day 13.5 control replicate sample (GSM3453536) in the mouse skin dataset, ICELLNET, iTALK and NATMI identified the same CCI edges with slightly varied edge weights. In contrast, SingleCellSignalR and scMLnet generally have much smaller edge weights and fewer CCI edges between cells (Figure 5A and B). In the CCI network corresponding to SingleCellSignalR, the signaling from immune cells to basal cells was not inferred due to the lack of LR interaction predictions. For the same reason, SingleCellSignalR did not infer the communication from muscle cells to MELA cells. The same observation was made for the mouse cortex sample. A larger number of LR predictions, in general, leads to a denser CCI network. Tools such as SingleCellSignalR and scMLnet that tend to predict a smaller number of LRs can have a lower CCI detection rate. On the other hand, tools such as ICELLNET, iTALK and NATMI tend to predict a larger number of CCIs, which can be associated with lower confidence in the predictions.

Figure 5.

CCI network examples of the selected six tools on the embryonic mouse skin dataset (GSM3453536). SingleCellSignalR and scMLnet generally have much smaller edge weights and fewer CCI edges between cells than other tools. (A) The CCI subnetwork corresponding to the basal, immune, FIB-A and FIB-B cell types. (B) The CCI subnetwork corresponding to the endothelial, spinous, MELA and muscle cell types.

We performed the geometric sketching subsampling procedure to compare the consistency of the CCI predictions by the nine tools (Section ‘Testing data compilation and tool comparison methods’). Briefly, we separately sampled 90%, 80% and 70% of the total number of cells in each dataset. We then ran the nine tools using the sampled data as input. We computed the precision, recall and F1 scores based on the running results (Supplementary Tables 2 and 3 available online at http://bib.oxfordjournals.org/). The precision increases with the subsampling percentage for both datasets. In general, ICELLNET and NATMI have higher subsampling precision and recall than scMLnet and SingleCellSignalR (Figure 6A and B).
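
The consistency scores can be sketched as below. This is a minimal illustration that assumes the predictions obtained on the full dataset serve as the reference set and the predictions on the subsampled data are scored against it; the exact definition used in this study is the one given in the Methods section. The function name and the toy interactions are hypothetical.

```python
def consistency_scores(full, subsampled):
    """Precision, recall and F1 of subsampled predictions against full-data predictions.

    Both arguments are sets of hashable prediction identifiers, e.g.
    (sender, receiver, ligand, receptor) tuples.
    """
    true_positives = len(full & subsampled)
    precision = true_positives / len(subsampled) if subsampled else 0.0
    recall = true_positives / len(full) if full else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy predictions (hypothetical): 4 on the full data, 3 on a subsample, 2 shared.
full = {"a", "b", "c", "d"}
subsampled = {"a", "b", "e"}

precision, recall, f1 = consistency_scores(full, subsampled)
print(f"precision={precision:.3f} recall={recall:.3f} F1={f1:.3f}")
```

With this convention, precision penalizes predictions that appear only after subsampling, and recall penalizes predictions that are lost when cells are removed.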

Figure 6.

ICELLNET and NATMI have higher subsampling consistency scores (precision, recall and F1 score) than other tools. (A) The subsampling consistency scores on the Embryonic mouse skin data. (B) The subsampling consistency scores on the Mouse cerebral cortex data.

Comparison of the nine tools using mouse spinal cord injury data

We performed the tool comparison on the recently published mouse spinal cord data. The scRNA-Seq data, comprising 10 wild-type mouse samples, correspond to the transcriptional profiling of the uninjured and injured spinal cord at 1, 3 and 7 dpi. For the mouse spinal cord samples, we set Seurat parameters to discover the 15 cell types based on the original UMAP plot [96].

Similar to the mouse skin and cortex datasets, the nine tools predicted different numbers of LR interactions. For example, the number of LR interactions from microglia to endothelial cells varies from 11 (CellChat) to 242 (iTALK) for the ‘uninjured sample3’ (GSM4955361) (Figure 7A). Consistent with the observations from mouse skin and cortex samples, iTALK, ICELLNET and NATMI often predicted hundreds of LR interaction pairs. Their top 10% of pairs still number over 100. However, the predicted LR pairs largely do not overlap. For example, for the ‘uninjured sample3’ (GSM4955361), iTALK shares 31 (12.8%) and 50 (20.7%) of its 242 LR predictions with ICELLNET and NATMI, respectively. Meanwhile, ICELLNET has only 31 (17.8%) and 46 (26.4%) of its 174 LR predictions in common with iTALK and NATMI, respectively. While NATMI has nearly 40% of its 127 predictions overlapping with iTALK’s 242 predictions, SingleCellSignalR has 74 predictions, with only 6 overlapping with ICELLNET, 5 overlapping with NATMI and 0 overlapping with CellChat, CellPhoneDB and PyMINEr. Although CellPhoneDB has 64 predictions, it shares few with other tools. This situation is similar across all 10 samples (Supplementary Figure 6 available online at http://bib.oxfordjournals.org/). This observation again suggests a large inconsistency in LR interaction ranking and predictions.

Figure 7.

Comparative performance of tools on the mouse spinal cord injury dataset. (A) The number of predicted LR pairs from microglia to endothelial cell types varies greatly. (B) A large proportion of predicted LR pairs by all tools except PyMINEr are in C-LRI from microglia to endothelial cell types.

Comparing the nine tools’ LR pair predictions to the C-LRI database for all samples, we found that almost all predictions from iTALK and scMLnet and the majority of the predictions by CellChat, NATMI, SingleCellSignalR and ICELLNET are in C-LRI. At the same time, NicheNet and PyMINEr have a lower percentage of their predictions in C-LRI (Figure 7B). Taking the ‘uninjured sample3’ (GSM4955361) as an example, 90.91% of CellChat’s 11 predictions, 90.55% of NATMI’s 127 predictions, 81.08% of SingleCellSignalR’s 74 predictions and 67.82% of the ICELLNET predictions are also in C-LRI. In contrast, NicheNet, CellPhoneDB and PyMINEr had only 24.48%, 34.38% and 2.48% of their 143, 64 and 121 predictions, respectively, in C-LRI. The resulting statistics for each tool are similar across the other nine samples (Figure 7B). Considering the total number of LR predictions, NATMI’s LR predictions are most consistent with the C-LRI information, indicating that NATMI is most likely to predict well-annotated LR interactions.

The large variance of LR interaction predictions by different tools has an immediate impact on the study of LR interaction dynamics. We compared the nine tools to see how LR interactions between certain cell types change with the number of days after the injury. Multiple tools consistently identified some dynamics of LR interactions. For instance, the THBS1-CD47 interactions between the monocyte and endothelial cells were not found in uninjured mice but were identified in injured mice by ICELLNET, NATMI, PyMINEr and iTALK. It is also interesting that the SPP1-ITGA5 interactions between microglial and endothelial cells were not discovered in the uninjured mice or in injured mice at 7 dpi, but were found only in the injured mice at 1 and 3 dpi. This discovery was made consistently by CellChat, PyMINEr, iTALK and scMLnet. Also, the TGFB1-ENG interactions between the microglial and endothelial cells became identifiable by ICELLNET, NATMI and iTALK starting at 3 dpi and continuing into 7 dpi. However, most LR pairs were not supported by the majority of the nine tools. For example, the GDNF-GFRA2 interactions between neutrophil and endothelial cells, which occur only at 1 dpi, were identified by ICELLNET and iTALK but not by the other tools. In some cases, the discoveries of the LR dynamics patterns conflict with each other. For example, GNAI2-ADORA1 interactions between neutrophil and endothelial cells were discovered by NATMI at 1 and 7 dpi but not at 3 dpi or in uninjured mice. In contrast, the same LR interaction was identified by SingleCellSignalR only at 3 dpi. In general, we observed that ICELLNET, iTALK and NATMI have high similarities in terms of LR dynamics identification.
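
A conflict such as the GNAI2-ADORA1 case can be detected mechanically once each tool's predictions are organized by time point. The sketch below is a minimal illustration; it encodes only the detection pattern described in the text for that one interaction, and the data layout, the function name `detection_profile` and the time-point labels are assumptions made for the example.

```python
def detection_profile(results, lr_pair, timepoints):
    """For each tool, list the time points at which `lr_pair` was predicted.

    `results` maps tool name -> {time point -> set of (ligand, receptor) tuples}.
    """
    return {
        tool: [t for t in timepoints if lr_pair in by_time.get(t, set())]
        for tool, by_time in results.items()
    }


timepoints = ["uninjured", "1dpi", "3dpi", "7dpi"]
pair = ("GNAI2", "ADORA1")

# Detection pattern for neutrophil-to-endothelial signaling as described in the text.
results = {
    "NATMI": {"1dpi": {pair}, "7dpi": {pair}},
    "SingleCellSignalR": {"3dpi": {pair}},
}

profile = detection_profile(results, pair, timepoints)
for tool, detected in profile.items():
    print(tool, detected)

# Two tools conflict when both detect the pair but never at the same time point.
conflict = set(profile["NATMI"]).isdisjoint(profile["SingleCellSignalR"])
print("conflicting dynamics:", conflict)
```

Tabulating profiles in this way for every LR pair makes it straightforward to separate dynamics supported by several tools from those reported by a single tool.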

The comparison between the CCI networks generated by the nine tools shows CCI dissimilarity patterns are largely conserved between different samples (Supplementary Figure 7 available online at http://bib.oxfordjournals.org/). Again, we observed that ICELLNET, NATMI, iTALK and NicheNet often have similar CCI outputs, while scMLnet and SingleCellSignalR generally have similar CCIs. We also tested the robustness of the nine tools on the mouse spinal cord data with the Geometric sketching approach. We found iTALK and NATMI often show better consistency than scMLnet and SingleCellSignalR (Supplementary Table 4 available online at http://bib.oxfordjournals.org/).

Runtime and memory analysis

We compared the runtime and memory cost of the nine tools. We found that the running time varies from seconds to hours. For example, on the mouse cerebral cortex dataset, iTALK used the shortest time (~20 s), while scMLnet took the longest (8 h) (Figure 8). The variation in the time cost is largely attributed to the specific algorithms for CCI inference. For instance, iTALK predicts LR pairs among all cell types simultaneously, while scMLnet and NicheNet consider LR pairs for two cell types at a time. In certain cases, such as PyMINEr and SingleCellSignalR, additional outputs such as networks and images can make the process longer. The running time is also affected by the number of identified cell clusters.

Figure 8.

The comparison of the running time and memory usage of the nine tools.

We also estimated the memory usage of each tool. Almost all of the tools use between 0.5 and 2.5 GB of memory, except for PyMINEr and SingleCellSignalR, which use 13.3 and 5.2 GB, respectively. For PyMINEr, calculating the correlation between gene pairs can be memory-intensive. For SingleCellSignalR, a large amount of memory is required to obtain a signature cell matrix by computing the average gene signature expression across all the cells and signatures.
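
For readers who wish to repeat such measurements on their own data, one possible approach is sketched below. It is not the procedure used in this study, which is not detailed here: it assumes a Unix-like system, launches the tool as a child process, and reads the wall-clock time and the peak resident set size from the operating system. The placeholder command simply allocates a list and should be replaced with the actual tool invocation.

```python
import resource
import subprocess
import sys
import time


def run_and_measure(command):
    """Run `command` as a child process and return (seconds, peak_rss).

    peak_rss is the maximum resident set size over all child processes waited
    for so far in this Python process, in kilobytes on Linux and in bytes on
    macOS. Measure one tool per Python process to keep the value meaningful.
    """
    start = time.perf_counter()
    subprocess.run(command, check=True)
    elapsed = time.perf_counter() - start
    peak_rss = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    return elapsed, peak_rss


# Placeholder workload (an assumption for illustration, not a CCI tool).
command = [sys.executable, "-c", "data = list(range(10**6))"]
elapsed, peak_rss = run_and_measure(command)
print(f"wall-clock time: {elapsed:.2f} s, peak RSS: {peak_rss}")
```

Measuring at the process level captures memory used by R or compiled code inside the tool, which in-process Python profilers would miss.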

Conclusions and outlook

We performed an initial study of LR-based CCI inference by applying nine of the most recent prediction tools to well-studied scRNA-Seq samples. We provided a side-by-side comparison regarding LR interaction resources, required input, LR output, CCI inference, run time, memory consumption and visualization capability.

In terms of LR interaction predictions, the number of predicted LR interaction pairs can differ substantially among the nine tools. Compared with the other tools, which often predict hundreds of LR interactions, CellChat and SingleCellSignalR generate only dozens of LR interaction predictions for the testing datasets. We found that iTALK, ICELLNET, NicheNet and NATMI provide numbers of LR interaction predictions that are more consistent with each other than with those of the other tools. However, their predicted LR interaction pairs often do not form a consensus. CCI inference is directly impacted by the preceding LR interaction prediction results. Naturally, we observed that tools with similar LR predictions result in similar CCI networks. We tested the robustness of these tools by subsampling the original input data. We found that most of the current tools have good robustness regarding data selection.

From the above analysis of the 15 samples containing nearly 100K cells, we observed a large discrepancy among the nine tools in LR predictions and the subsequent CCI inference. One reason can be that different tools run on their own LR databases, and most of the tools do not allow user-specified LR databases in their current pipeline. To compare the predicted LR interactions, we constructed a more reliable LR interaction database, C-LRI, containing LR interaction pairs supported by at least four literature resources. Therefore, LR pairs in C-LRI are mostly well-annotated LR interactions. We used C-LRI as one way to benchmark the LR predictions from these tools. These benchmark data provided us with additional insight into the tools. For instance, PyMINEr predicted a decent number of LR interaction pairs for our test data, but few of PyMINEr’s predictions are in the C-LRI database. We note that LR predictions not in C-LRI cannot be claimed to be false predictions. However, the comparison does provide an intuition of the LR prediction accuracy, assuming well-known and relatively new LR interactions are equally distributed.

We tested the sensitivity of LR predictions by enumerating cell clustering/classification at different resolutions. We found that most tools generated more LR predictions when more clusters were obtained. One possible explanation is that as the number of clusters increases, the number of possible cell-type pairs increases, and the number of predicted LR interactions also increases. This could indicate issues relevant to multiple hypothesis testing. Meanwhile, because clusters often correspond to cell-type classification, the clustering results would affect the CCI inference in the later stage. Besides choosing clustering algorithms that are robust to parameter settings, knowing the approximate number of cell types in the data would increase the confidence of the inferred CCIs. However, accurate specification of cell-type composition for a given sample is often unavailable and requires domain expertise.
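
The growth in the number of comparisons can be illustrated with the cluster counts obtained at the different Seurat resolutions (8, 15, 20 and 22). The short sketch below counts ordered sender-receiver pairs among k groups; whether a given tool also tests a group against itself (autocrine signaling) differs between tools, so both conventions are shown. The function name `ordered_pairs` is a hypothetical helper.

```python
def ordered_pairs(k, include_self=True):
    """Number of ordered (sender, receiver) pairs among k groups."""
    return k * k if include_self else k * (k - 1)


cluster_counts = (8, 15, 20, 22)
with_self = [ordered_pairs(k) for k in cluster_counts]
without_self = [ordered_pairs(k, include_self=False) for k in cluster_counts]

for k, a, b in zip(cluster_counts, with_self, without_self):
    print(f"{k} groups: {a} ordered pairs with self-pairs, {b} without")
```

Because every candidate LR pair is evaluated for each sender-receiver pair, the number of tests grows roughly quadratically with the number of groups, so a fixed significance threshold admits more predictions by chance as the clustering becomes finer.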

We found that the number of single cells in a given sample might also affect the number of identified LR pairs for tools such as iTALK, for which fewer cells often correspond to fewer LR predictions. This is evident when comparing the mouse skin samples with the mouse cortex sample, which contains a smaller number (3005) of high-quality single cells. We observed that the number of iTALK-predicted LR interactions dropped from 108 (the lowest among the mouse skin samples) to 44 in the mouse cortex sample. At the same time, ICELLNET, NicheNet and NATMI still predicted over 100 LR interactions in the mouse cortex sample.

A few caveats apply to usage and interpretation when applying a computational tool for LR-based CCI inference. First, the current LR annotation is incomplete. Even though there is a long list of LR pairs in various resources, many of them are computational predictions, e.g. from PPI databases, that might contain false positives. Second, most tools have specific details about calculating significance scores, e.g. P-values, and defining a threshold to filter confident LR pairs [27, 29]. These details need to be considered to interpret the predictions correctly. Third, the cell-type annotation or classification can affect the final LR-based CCI inference. Careful selection of the cell clustering parameters is required for a more accurate prediction. How to achieve accurate cell-type classification from scRNA-Seq data is still an active topic for computational method development [95, 98, 99].

Hence, this initial comparison shows that when applied to scRNA-Seq samples, the current LR-based CCI tools can provide insight into cell communications at single-cell resolution. Although different tools are not often consistent, predictions from some of these tools, such as iTALK, ICELLNET and NATMI, have shown good overlap with each other. They also predicted a larger percentage of well-annotated LR pairs. For specific biological applications, it would be practical to run multiple tools to generate a consensus before biological validation. Finally, with more tools for CCI prediction and more experimental techniques for CCI measurement being published, gold standards need to be developed for more sophisticated evaluations of the tools in the near future. For example, with the rapidly developing spatial transcriptomics technologies [100], methods have been developed to integrate spatial transcriptomics data and scRNA-Seq data [101, 102]. Such integrated data have shown promise for understanding cell subpopulations and would hold promise for improved CCI inference [77, 81]. Also, additional information, such as that from proteomics data and experiments on the effect of receptor gene knockouts, can be used for further testing and benchmarking when available.
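
One simple way to form such a consensus is to keep the interactions reported by at least a minimum number of tools. The sketch below is illustrative only: the voting threshold is an assumption to be chosen by the user, the function name `consensus_pairs` is hypothetical, and the input encodes just the two spinal cord interactions whose supporting tools are named in the Results, not the tools' full outputs.

```python
from collections import Counter


def consensus_pairs(predictions, min_tools):
    """Return the interactions predicted by at least `min_tools` tools.

    `predictions` maps a tool name to a set of interaction identifiers, here
    (sender, receiver, ligand, receptor) tuples. Sets ensure one vote per tool.
    """
    votes = Counter(item for items in predictions.values() for item in items)
    return {item for item, count in votes.items() if count >= min_tools}


thbs1 = ("monocyte", "endothelial", "THBS1", "CD47")
gdnf = ("neutrophil", "endothelial", "GDNF", "GFRA2")

# Supporting tools as described in the text; all other predictions are omitted.
predictions = {
    "ICELLNET": {thbs1, gdnf},
    "NATMI": {thbs1},
    "PyMINEr": {thbs1},
    "iTALK": {thbs1, gdnf},
}

consensus = consensus_pairs(predictions, min_tools=3)
print(consensus)
```

A stricter threshold yields fewer but better-supported candidates for validation, while a looser one retains interactions that only tools with larger LR databases can report.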

Key Points

  • Nine computational tools for LR-based CCI inference published in the last 4 years are surveyed.

  • Nine tools are systematically evaluated on 15 scRNA-Seq samples containing nearly 100K single cells under different conditions in mouse.

  • The numbers of predicted LR interaction pairs and the CCI networks are consistent within subsets of the nine tools but vary widely across all nine tools.

  • The performance of the nine tools is affected by the cell clustering, cell-type classification and the scale and quality of scRNA-Seq experiments but shows robustness to data sampling.

  • Obtaining a consensus CCI by running multiple tools is recommended for CCI interpretation in specific biological applications.

Supplementary Material

Supplementary_file_S1_elac019
Supplementary_figures_tables_elac019

Saidi Wang is a graduate student at the Department of Computer Science, University of Central Florida. He mainly works on gene transcriptional regulation and metagenomics.

Hansi Zheng is a graduate student at the Department of Computer Science, University of Central Florida. He mainly works on non-coding RNAs and epigenomics.

James S. Choi is a research associate at the Miami Project to Cure Paralysis, University of Miami School of Medicine. He mainly works on nervous system cell biology, bioinformatics, data analysis and drug development.

Jae K. Lee is a professor at the Miami Project to Cure Paralysis, University of Miami School of Medicine. He mainly works on investigating the mechanisms of scar formation after CNS injury.

Xiaoman Li is an associate professor at the Burnett School of Biomedical Science, University of Central Florida. He works on chromatin interactions and metagenomics.

Haiyan Hu is an associate professor at the Department of Computer Science, University of Central Florida. She works on non-coding RNAs, epigenomics and gene transcriptional regulation.

Authors’ contributions

H.H. and X.L. conceived the idea. S.W. and H.Z. processed the data. S.W., X.L. and H.H. implemented the idea and generated results. S.W., J.S.C., J.K.L., X.L. and H.H. analyzed the results and contributed to the writing of the manuscript. All authors read and approved the final manuscript.

Funding

This work was supported by the National Science Foundation [2120907, 1661414, 2015838].

Conflict of Interest

The authors declare that there is no conflict of interest.

References

  • 1. Zhai Y, Li G, Li R, et al. Single-cell RNA-sequencing shift in the interaction pattern between glioma stem cells and immune cells during tumorigenesis. Front Immunol 2020;11:581209.
  • 2. Norman TM, Horlbeck MA, Replogle JM, et al. Exploring genetic interaction manifolds constructed from rich single-cell phenotypes. Science 2019;365:786–93.
  • 3. Dong K, Zhang S. Joint reconstruction of cis-regulatory interaction networks across multiple tissues using single-cell chromatin accessibility data. Brief Bioinform 2021;22(3):bbaa120.
  • 4. Almet AA, Cang Z, Jin S, et al. The landscape of cell-cell communication through single-cell transcriptomics. Curr Opin Syst Biol 2021;26:12–23.
  • 5. Efremova M, Vento-Tormo M, Teichmann SA, et al. CellPhoneDB: inferring cell-cell communication from combined expression of multi-subunit ligand-receptor complexes. Nat Protoc 2020;15:1484–506.
  • 6. Junttila MR, de Sauvage FJ. Influence of tumour micro-environment heterogeneity on therapeutic response. Nature 2013;501:346–54.
  • 7. Hanahan D, Weinberg RA. Hallmarks of cancer: the next generation. Cell 2011;144:646–74.
  • 8. Jin S, Guerrero-Juarez CF, Zhang L, et al. Inference and analysis of cell-cell communication using CellChat. Nat Commun 2021;12:1088.
  • 9. Armingol E, Officer A, Harismendy O, et al. Deciphering cell-cell interactions and communication from gene expression. Nat Rev Genet 2021;22:71–88.
  • 10. Ramilowski JA, Goldberg T, Harshbarger J, et al. A draft network of ligand-receptor-mediated multicellular signalling in human. Nat Commun 2015;6:7866.
  • 11. Ma F, Zhang S, Song L, et al. Applications and analytical tools of cell communication based on ligand-receptor interactions at single cell level. Cell Biosci 2021;11:121.
  • 12. Browaeys R, Saelens W, Saeys Y. NicheNet: modeling intercellular communication by linking ligands to target genes. Nat Methods 2020;17:159–62.
  • 13. Wang Y, Goodison S, Li X, et al. Prognostic cancer gene signatures share common regulatory motifs. Sci Rep 2017;7:4750.
  • 14. Hwang B, Lee JH, Bang D. Single-cell RNA sequencing technologies and bioinformatics pipelines. Exp Mol Med 2018;50:1–14.
  • 15. Zhang Y, Lv J, Liu H, et al. HHMD: the human histone modification database. Nucleic Acids Res 2010;38:D149–54.
  • 16. Barrett T, Troup DB, Wilhite SE, et al. NCBI GEO: archive for high-throughput functional genomic data. Nucleic Acids Res 2009;37:D885–90.
  • 17. Wheeler DL, Barrett T, Benson DA, et al. Database resources of the National Center for Biotechnology Information. Nucleic Acids Res 2008;36:D13–21.
  • 18. Parkinson H, Kapushesky M, Shojatalab M, et al. ArrayExpress—a public database of microarray experiments and gene expression profiles. Nucleic Acids Res 2007;35:D747–50.
  • 19. Svensson V, Veiga BE, Pachter L. A curated database reveals trends in single-cell transcriptomics. Database (Oxford) 2020;2020:baaa073.
  • 20. Jiang J, Wang C, Qi R, et al. scREAD: a single-cell RNA-Seq database for Alzheimer's disease. iScience 2020;23:101769.
  • 21. Cao Y, Zhu J, Jia P, et al. scRNASeqDB: a database for RNA-Seq based gene expression profiles in human single cells. Genes (Basel) 2017;8(12):368.
  • 22. Zhao T, Lyu S, Lu G, et al. SC2disease: a manually curated database of single-cell transcriptome for human diseases. Nucleic Acids Res 2021;49:D1413–9.
  • 23. Shao X, Lu X, Liao J, et al. New avenues for systematically inferring cell-cell communication: through single-cell transcriptomics data. Protein Cell 2020;11:866–80.
  • 24. Noël F, Massenet-Regad L, Carmi-Levy I, et al. Dissection of intercellular communication using the transcriptome-based framework ICELLNET. Nat Commun 2021;12:1089.
  • 25. Wang Y, Wang R, Zhang S, et al. iTALK: an R package to characterize and illustrate intercellular communication. bioRxiv 2019;507871.
  • 26. Cheng J, Zhang J, Wu Z, et al. Inferring microenvironmental regulation of gene expression from single-cell RNA sequencing data using scMLnet with an application to COVID-19. Brief Bioinform 2021;22:988–1005.
  • 27. Cabello-Aguilar S, Alame M, Kon-Sun-Tack F, et al. SingleCellSignalR: inference of intercellular networks from single-cell transcriptomics. Nucleic Acids Res 2020;48:e55.
  • 28. Tyler SR, Rotti PG, Sun X, et al. PyMINEr finds gene and autocrine-paracrine networks from human islet scRNA-Seq. Cell Rep 2019;26:1951–1964.e8.
  • 29. Hou R, Denisenko E, Ong HT, et al. Predicting cell-to-cell communication networks using NATMI. Nat Commun 2020;11:5011.
  • 30. Dries R, Zhu Q, Dong R, et al. Giotto: a toolbox for integrative analysis and visualization of spatial expression data. Genome Biol 2021;22:78.
  • 31. Cang Z, Nie Q. Inferring spatial and signaling relationships between cells from single cell transcriptomic data. Nat Commun 2020;11:2084.
  • 32. Stuart T, Butler A, Hoffman P, et al. Comprehensive integration of single-cell data. Cell 2019;177:1888–1902.e21.
  • 33. Qiu X, Mao Q, Tang Y, et al. Reversed graph embedding resolves complex single-cell trajectories. Nat Methods 2017;14:979–82.
  • 34. Kharchenko PV, Silberstein L, Scadden DT. Bayesian approach to single-cell differential expression analysis. Nat Methods 2014;11:740–2.
  • 35. Love MI, Huber W, Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol 2014;15:550.
  • 36. Kanehisa M, Furumichi M, Tanabe M, et al. KEGG: new perspectives on genomes, pathways, diseases and drugs. Nucleic Acids Res 2017;45:D353–61.
  • 37. Rouillard AD, Gundersen GW, Fernandez NF, et al. The harmonizome: a collection of processed datasets gathered to serve and mine knowledge about genes and proteins. Database (Oxford) 2016;2016:baw100.
  • 38. Pawson AJ, Sharman JL, Benson HE, et al. The IUPHAR/BPS guide to PHARMACOLOGY: an expert-driven knowledgebase of drug targets and their ligands. Nucleic Acids Res 2014;42:D1098–106.
  • 39. Damle NP, Kohn M. The human DEPhOsphorylation database DEPOD: 2019 update. Database (Oxford) 2019;2019:baz133. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 40. Linding R, Jensen LJ, Pasculescu A, et al. NetworKIN: a resource for exploring cellular phosphorylation networks. Nucleic Acids Res 2008;36:D695–9. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 41. Lachmann A, Ma'ayan A. KEA: kinase enrichment analysis. Bioinformatics 2009;25:684–6. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 42. Van Landeghem S, Hakala K, Ronnqvist S, et al. Exploring biomolecular literature with EVEX: connecting genes through events. Homology, and Indirect Associations, Adv Bioinformatics 2012;2012:582765. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 43. Vinayagam A, Stelzl U, Foulle R, et al. A directed protein interaction network for investigating intracellular signal transduction. Sci Signal 2011;4:rs8.
  • 44. Kamburov A, Stelzl U, Lehrach H, et al. The ConsensusPathDB interaction database: 2013 update. Nucleic Acids Res 2013;41:D793–800.
  • 45. Li T, Wernersson R, Hansen RB, et al. A scored human protein-protein interaction network to catalyze genomic interpretation. Nat Methods 2017;14:61–4.
  • 46. Cerami EG, Gross BE, Demir E, et al. Pathway commons, a web resource for biological pathway data. Nucleic Acids Res 2011;39:D685–90.
  • 47. Turei D, Korcsmaros T, Saez-Rodriguez J. OmniPath: guidelines and gateway for literature-curated signaling pathway resources. Nat Methods 2016;13:966–7.
  • 48. Subramanian A, Tamayo P, Mootha VK, et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci U S A 2005;102:15545–50.
  • 49. Barrett T, Wilhite SE, Ledoux P, et al. NCBI GEO: archive for functional genomics data sets--update. Nucleic Acids Res 2013;41:D991–5.
  • 50. Xie X, Rigor P, Baldi P. MotifMap: a human genome-wide map of candidate regulatory motif sites. Bioinformatics 2009;25:167–74.
  • 51. Matys V, Kel-Margoulis OV, Fricke E, et al. TRANSFAC and its module TRANSCompel: transcriptional gene regulation in eukaryotes. Nucleic Acids Res 2006;34:D108–10.
  • 52. Mathelier A, Zhao X, Zhang AW, et al. JASPAR 2014: an extensively expanded and updated open-access database of transcription factor binding profiles. Nucleic Acids Res 2014;42:D142–7.
  • 53. ENCODE Project Consortium. The ENCODE (ENCyclopedia of DNA elements) project. Science 2004;306:636–40.
  • 54. Lachmann A, Xu H, Krishnan J, et al. ChEA: transcription factor regulation inferred from integrating genome-wide ChIP-X experiments. Bioinformatics 2010;26:2438–44.
  • 55. Heng TS, Painter MW, Immunological Genome Project Consortium. The Immunological Genome Project: networks of gene expression in immune cells. Nat Immunol 2008;9:1091–4.
  • 56. Jojic V, Shay T, Sylvia K, et al. Identification of transcriptional regulators in the mouse immune system. Nat Immunol 2013;14:633–43.
  • 57. Griffon A, Barbier Q, Dalino J, et al. Integrative analysis of public ChIP-seq experiments reveals a complex multi-cell regulatory landscape. Nucleic Acids Res 2015;43:e27.
  • 58. Bovolenta LA, Acencio ML, Lemke N. HTRIdb: an open-access database for experimentally verified human transcriptional regulation interactions. BMC Genomics 2012;13:405.
  • 59. Han H, Shim H, Shin D, et al. TRRUST: a reference database of human transcriptional regulatory interactions. Sci Rep 2015;5:11432.
  • 60. Liu ZP, Wu C, Miao H, et al. RegNetwork: an integrated database of transcriptional and post-transcriptional regulatory networks in human and mouse. Database (Oxford) 2015;2015:bav095.
  • 61. Kawaji H, Kasukawa T, Forrest A, et al. The FANTOM5 collection, a data series underpinning mammalian transcriptome atlases in diverse cell types. Sci Data 2017;4:170113.
  • 62. Yu NY, Hallstrom BM, Fagerberg L, et al. Complementing tissue characterization by integrating transcriptome profiling from the human protein atlas and from the FANTOM5 consortium. Nucleic Acids Res 2015;43:6787–98.
  • 63. Jupe S, Akkerman JW, Soranzo N, et al. Reactome - a curated knowledgebase of biological pathways: megakaryocytes and platelets. J Thromb Haemost 2012;10:2399–402.
  • 64. Ben-Shlomo I, Yu Hsu S, Rauch R, et al. Signaling receptome: a genomic and evolutionary perspective of plasma membrane receptors involved in signal transduction. Sci STKE 2003;2003:RE9.
  • 65. Gillespie M, Jassal B, Stephan R, et al. The reactome pathway knowledgebase 2022. Nucleic Acids Res 2022;50:D687–92.
  • 66. Dimmer EC, Huntley RP, Alam-Faruque Y, et al. The UniProt-GO annotation database in 2011. Nucleic Acids Res 2012;40:D565–70.
  • 67. Talukder A, Saadat S, Li X, et al. EPIP: a novel approach for condition-specific enhancer-promoter interaction prediction. Bioinformatics 2019;35(20):3877–83.
  • 68. Cappuccio A, Zollinger R, Schenk M, et al. Combinatorial code governing cellular responses to complex stimuli. Nat Commun 2015;6:6847.
  • 69. Ogris C, Guala D, Helleday T, et al. A novel method for crosstalk analysis of biological networks: improving accuracy of pathway annotation. Nucleic Acids Res 2017;45:e8.
  • 70. Castresana-Aguirre M, Sonnhammer ELL. Pathway-specific model estimation for improved pathway annotation by network crosstalk. Sci Rep 2020;10:13585.
  • 71. Ruppert SM, Chehtane M, Zhang G, et al. JunD/AP-1-mediated gene expression promotes lymphocyte growth dependent on interleukin-7 signal transduction. PLoS One 2012;7:e32262.
  • 72. Ding J, Cai X, Wang Y, et al. ChIPModule: systematic discovery of transcription factors and their cofactors from ChIP-seq data. Pac Symp Biocomput 2013;320–31.
  • 73. Zhao C, Li X, Hu H. PETModule: a motif module based approach for enhancer target gene prediction. Sci Rep 2016;6:30043.
  • 74. Petri T, Altmann S, Geistlinger L, et al. Addressing false discoveries in network inference. Bioinformatics 2015;31:2836–43.
  • 75. Boisset JC, Vivie J, Grun D, et al. Mapping the physical network of cellular interactions. Nat Methods 2018;15:547–53.
  • 76. Joost S, Jacob T, Sun X, et al. Single-cell transcriptomics of traced epidermal and hair follicle stem cells reveals rapid adaptations during wound healing. Cell Rep 2018;25:585–597.e7.
  • 77. Halpern KB, Shenhav R, Massalha H, et al. Paired-cell sequencing enables spatial gene expression mapping of liver endothelial cells. Nat Biotechnol 2018;36:962–70.
  • 78. Kumar MP, Du J, Lagoudas G, et al. Analysis of single-cell RNA-Seq identifies cell-cell communication associated with tumor characteristics. Cell Rep 2018;25:1458–1468.e4.
  • 79. Cohen M, Giladi A, Gorki AD, et al. Lung single-cell signaling interaction map reveals basophil role in macrophage imprinting. Cell 2018;175:1031–1044.e18.
  • 80. Cain MP, Hernandez BJ, Chen J. Quantitative single-cell interactomes in normal and virus-infected mouse lungs. Dis Model Mech 2020;13(6):dmm044404.
  • 81. Baccin C, Al-Sabah J, Velten L, et al. Combined single-cell and spatial transcriptomics reveal the molecular, cellular and spatial bone marrow niche organization. Nat Cell Biol 2020;22:38–48.
  • 82. Sheikh BN, Bondareva O, Guhathakurta S, et al. Systematic identification of cell-cell communication networks in the developing brain. iScience 2019;21:273–87.
  • 83. Skelly DA, Squiers GT, McLellan MA, et al. Single-cell transcriptional profiling reveals cellular diversity and intercommunication in the mouse heart. Cell Rep 2018;22:600–10.
  • 84. Yuzwa SA, Yang G, Borrett MJ, et al. Proneurogenic ligands defined by modeling developing cortex growth factor communication networks. Neuron 2016;91:988–1004.
  • 85. Ding C, Li Y, Guo F, et al. A cell-type-resolved liver proteome. Mol Cell Proteomics 2016;15:3190–202.
  • 86. Turei D, Valdeolivas A, Gul L, et al. Integrated intra- and intercellular signaling knowledge for multicellular omics analysis. Mol Syst Biol 2021;17:e9923.
  • 87. Shao X, Liao J, Li C, et al. CellTalkDB: a manually curated database of ligand-receptor interactions in humans and mice. Brief Bioinform 2021;22(4):bbaa269.
  • 88. Ximerakis M, Lipnick SL, Innes BT, et al. Single-cell transcriptomic profiling of the aging mouse brain. Nat Neurosci 2019;22:1696–708.
  • 89. Pavlicev M, Wagner GP, Chavan AR, et al. Single-cell transcriptomics of the human placenta: inferring the cell communication network of the maternal-fetal interface. Genome Res 2017;27:349–61.
  • 90. Choi H, Sheng J, Gao D, et al. Transcriptome analysis of individual stromal cell populations identifies stroma-tumor crosstalk in mouse lung cancer model. Cell Rep 2015;10:1187–201.
  • 91. Qiao W, Wang W, Laurenti E, et al. Intercellular network structure and regulatory motifs in the human hematopoietic system. Mol Syst Biol 2014;10:741.
  • 92. Kirouac DC, Ito C, Csaszar E, et al. Dynamic interaction networks in a hierarchically organized tissue. Mol Syst Biol 2010;6:417.
  • 93. Gupta K, Levinsohn J, Linderman G, et al. Single-cell analysis reveals a hair follicle dermal niche molecular differentiation trajectory that begins prior to morphogenesis. Dev Cell 2019;48:17–31.e16.
  • 94. Zeisel A, Muñoz-Manchado AB, Codeluppi S, et al. Brain structure. Cell types in the mouse cortex and hippocampus revealed by single-cell RNA-seq. Science 2015;347:1138–42.
  • 95. Aibar S, Gonzalez-Blas CB, Moerman T, et al. SCENIC: single-cell regulatory network inference and clustering. Nat Methods 2017;14:1083–6.
  • 96. Milich LM, Choi JS, Ryan C, et al. Single-cell analysis of the cellular heterogeneity and interactions in the injured mouse spinal cord. J Exp Med 2021;218(8):e20210040.
  • 97. Hie B, Cho H, DeMeo B, et al. Geometric sketching compactly summarizes the single-cell transcriptomic landscape. Cell Syst 2019;8:483–493.e7.
  • 98. Alquicira-Hernandez J, Sathe A, Ji HP, et al. scPred: accurate supervised method for cell-type classification from single-cell RNA-seq data. Genome Biol 2019;20:264.
  • 99. Stegle O, Teichmann SA, Marioni JC. Computational and analytical challenges in single-cell transcriptomics. Nat Rev Genet 2015;16:133–45.
  • 100. Satija R, Farrell JA, Gennert D, et al. Spatial reconstruction of single-cell gene expression data. Nat Biotechnol 2015;33:495–502.
  • 101. Moncada R, Barkley D, Wagner F, et al. Integrating microarray-based spatial transcriptomics and single-cell RNA-seq reveals tissue architecture in pancreatic ductal adenocarcinomas. Nat Biotechnol 2020;38:333–42.
  • 102. Achim K, Pettit JB, Saraiva LR, et al. High-throughput spatial mapping of single-cell RNA-seq data to tissue of origin. Nat Biotechnol 2015;33:503–9.

Supplementary Materials

Supplementary_file_S1_elac019
Supplementary_figures_tables_elac019

Articles from Briefings in Functional Genomics are provided here courtesy of Oxford University Press