Author manuscript; available in PMC: 2024 Apr 19.
Published in final edited form as: Cell Syst. 2023 Feb 13;14(4):302–311.e4. doi: 10.1016/j.cels.2023.01.004

scTenifoldXct: a semi-supervised method for predicting cell-cell interactions and mapping cellular communication graphs

Yongjian Yang 1, Guanxun Li 2, Yan Zhong 4, Qian Xu 3, Yu-Te Lin 5, Cristhian Roman-Vicharra 3, Robert S Chapkin 6,*, James J Cai 1,3,7,*,#
PMCID: PMC10121998  NIHMSID: NIHMS1873963  PMID: 36787742

Summary

We present scTenifoldXct, a semi-supervised computational tool for detecting ligand-receptor (LR)-mediated cell-cell interactions and mapping cellular communication graphs. Our method is based on manifold alignment, using LR pairs as inter-data correspondences to embed ligand and receptor genes expressed in interacting cells into a unified latent space. Neural networks are employed to minimize the distance between corresponding genes while preserving the structure of gene regression networks. We apply scTenifoldXct to real data sets for testing and demonstrate that our method detects interactions with high consistency compared to other methods. More importantly, scTenifoldXct uncovers weak but biologically relevant interactions overlooked by other methods. We also demonstrate how scTenifoldXct can be used to compare different samples, such as healthy vs. diseased and wild type vs. knockout, to identify differential interactions, thereby revealing functional implications associated with changes in cellular communication status.

Keywords: Cell-cell interaction, cellular communication, gene regression network, single-cell RNA sequencing, scRNA-seq, machine learning, manifold alignment

eTOC

scTenifoldXct is a semi-supervised computational tool for detecting LR-mediated cell-cell interactions. It uncovers weak but biologically important interactions often overlooked by other methods. It can also be used to compare different samples to identify differential interactions, revealing functional implications associated with changes in cellular communication status.

Introduction

Single-cell technology has revolutionized biomedical research. For example, single-cell RNA sequencing (scRNA-seq) allows the transcriptomic information from tens of thousands of cells to be gathered in parallel in a robust and reproducible way.1–3 The unprecedented resolution in scRNA-seq data has been exploited to reveal cellular diversity, such as various cell types and cellular states in tissue samples.4 With scRNA-seq data, it is possible to study the cellular communication network through the mapping of cell-specific ligand-receptor (LR) connectivity in complex tissues.5,6 The evolving scRNA-seq data space has sparked the development of numerous computational tools for mining cell-to-cell communication information.7–18 Nevertheless, robust statistical confidence in the detected results has proved difficult to achieve.19,20

This paper presents scTenifoldXct—a computational tool that incorporates intra- and inter-cellular gene networks to detect cell-cell interaction using scRNA-seq data. scTenifoldXct is semi-supervised and thus can be used with or without reference LR pairs. A redesigned crosstalk scoring metric is also introduced to estimate the interaction strength of each LR pair. The crosstalk scoring metric extends the commonly used metric, i.e., the product of ligand and receptor gene expressions, which is adopted by almost all existing methods. The built-in functionality involving single-cell gene network construction allows scTenifoldXct to couple cell-cell interactions with intracellular activities. scTenifoldXct is also able to perform differential interaction analysis, in which cell-cell interaction patterns are compared between tissue samples. When performing comparative analysis, scTenifoldXct combines and analyzes data from two samples in an integrative manner rather than processing the two samples separately, making the analysis more powerful in detecting subtle differential interactions. To demonstrate these features of scTenifoldXct, we applied scTenifoldXct to scRNA-seq data sets under the single-sample and two-sample application settings.

Results

The scTenifoldXct framework for single-sample application

The simplest application of scTenifoldXct is single-sample based, i.e., an application using scRNA-seq data from one sample that contains different cell types. The goal is to detect LR pairs with gene products that interact between two given cell types. In such a single-sample application setting, scTenifoldXct requires two inputs: a gene-by-cell count matrix, X (n × m), and a cell type label vector, c (m × 1), where n is the number of genes and m is the number of cells. Given any cell type A, let XA be the count matrix whose cells belong to type A. The analytical objectives are: (1) to examine LR pairs in cell-cell interaction databases and identify those pairs that show significant activities and contribute to the LR-mediated interactions in the given sample, and (2) to discover LR pairs that are absent from the databases. The framework of scTenifoldXct involves three successive steps (Fig. 1), as briefly described below. More technical details are given in Methods.

Fig. 1. Overview of the scTenifoldXct workflow.

scTenifoldXct is designed to identify LR-mediated interactions using scRNA-seq data. The scTenifoldXct workflow involves three successive steps, namely, (i) gene similarity matrix construction, (ii) manifold alignment, and (iii) significant LR pairs detection. PC regression is used in the construction of single-cell gene regression networks (scGRNs), WA and WB, to approximate gene regulatory networks for cell types A and B, respectively. The joint similarity matrix is composed of WA, WB, and S, the crosstalk score matrix. Neural networks are used to learn latent representations of each gene pair. Two genes of an LR pair are more likely to interact when their low-dimensional latent representations are more similar.

Step 1. Construction of single-cell gene networks and crosstalk score matrix

We started by creating two gene networks for the two cell types of interest, respectively (Substep 1.1). Subsequently, we computed a gene-gene crosstalk score matrix (Substep 1.2). Finally, we concatenated the two gene networks and the gene-gene crosstalk score matrix to create a joint similarity matrix (Substep 1.3), subject to manifold alignment in Step 2.

Substep 1.1. Constructing gene network for each of the cell types

We used principal component (PC) regression to construct single-cell gene networks as previously described.21 Briefly, for each gene, the expression of the gene was used as the response variable, and the expressions of all other genes were used as the independent (predictor) variables. The constructed gene networks for cell types A and B were then saved as graphs with signed, weighted, and directed edges, represented as adjacency matrices WA and WB. Each column of an adjacency matrix stores the PC regression coefficients of a gene, indicating the regulatory relationships between this gene and all other genes. WA and WB were normalized separately by dividing each matrix by the maximum absolute value of its entries. Note that PC regression infers gene-gene expression relationships to approximate gene regulatory networks without requiring any information on transcription factors (TFs) and their targets or knowledge of regulatory elements such as enhancers and promoters. To avoid confusion, the networks WA and WB are hereafter referred to as gene regression networks. Users can supply their own WA and WB at this step to replace these two gene regression networks.
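The PC regression step can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function name `pc_regression_network`, the number of components `n_pcs`, and the toy count matrix are assumptions introduced here, and NumPy is assumed to be available.

```python
import numpy as np

def pc_regression_network(X, n_pcs=3):
    """Return an n-by-n matrix W whose column g holds the PC regression
    coefficients of gene g on all other genes (the diagonal stays zero).
    X is genes-by-cells; n_pcs is an illustrative choice, not a published default."""
    n_genes = X.shape[0]
    Z = X.T - X.T.mean(axis=0)            # cells-by-genes, centered per gene
    W = np.zeros((n_genes, n_genes))
    for g in range(n_genes):
        others = np.delete(np.arange(n_genes), g)
        P = Z[:, others]                   # predictors: all other genes
        y = Z[:, g]                        # response: the target gene
        # Principal axes of the predictor matrix via SVD.
        _, _, Vt = np.linalg.svd(P, full_matrices=False)
        V = Vt[:n_pcs].T                   # loadings of the leading PCs
        scores = P @ V                     # PC scores, cells-by-n_pcs
        # Regress the response on the PC scores, then map back to gene space.
        gamma, *_ = np.linalg.lstsq(scores, y, rcond=None)
        W[others, g] = V @ gamma
    # Normalize by the maximum absolute entry, as described in the text.
    max_abs = np.abs(W).max()
    return W / max_abs if max_abs > 0 else W

rng = np.random.default_rng(0)
X_A = rng.poisson(2.0, size=(6, 50)).astype(float)   # toy counts: 6 genes, 50 cells
W_A = pc_regression_network(X_A)
```

Each column of `W_A` is signed and weighted, and the matrix is generally asymmetric, which matches the directed networks described above.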

Substep 1.2. Calculating crosstalk score between gene pairs from different cell types

We computed the crosstalk score, score(iA,jB), between gene i in cell type A and gene j in cell type B. The score was designed to incorporate information from the mean and variance of expression of the two genes. The score is greater when two genes have a higher and more variable expression (see Methods). The crosstalk scores between pairs of all genes form the matrix S.
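To make the stated behavior of the score concrete, the sketch below uses a hypothetical formula, (mean² + variance) of the ligand multiplied by the same quantity for the receptor. This form is an assumption chosen only because it grows with both the mean and the variability of expression; the published formula is given in Methods and may differ. The function name `crosstalk_score` and the toy expression vectors are likewise illustrative.

```python
from statistics import fmean, pvariance

def crosstalk_score(ligand_expr, receptor_expr):
    """Illustrative score (an assumption, not the published formula):
    (mean^2 + variance) of the ligand times the same term for the receptor."""
    def term(values):
        return fmean(values) ** 2 + pvariance(values)
    return term(ligand_expr) * term(receptor_expr)

# Toy expression vectors across four cells; the receptor is held fixed.
receptor = [1, 1, 1, 1]
low_flat = crosstalk_score([1, 1, 1, 1], receptor)      # low mean, no variability
low_variable = crosstalk_score([0, 2, 0, 2], receptor)  # same mean, more variability
high_flat = crosstalk_score([3, 3, 3, 3], receptor)     # higher mean, no variability
```

Under this assumed form, both a more variable ligand and a more highly expressed ligand score above the flat, lowly expressed one, which is the qualitative property the text describes.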

Substep 1.3. Constructing gene similarity matrix

The joint similarity matrix W was composed of four blocks. The gene regression networks, WA and WB, were set as the diagonal blocks and the crosstalk score matrix S as the off-diagonal blocks to yield the similarity matrix W between genes from the two cell types, A and B.
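The block layout can be sketched with plain nested lists. Placing the transpose of S in the lower-left block is an assumption made here so that rows and columns index genes consistently; the helper name `joint_similarity` and the toy values are illustrative.

```python
def joint_similarity(W_A, W_B, S):
    """Assemble W = [[W_A, S], [S^T, W_B]] from n-by-n nested lists.
    Using S^T in the lower-left block is an assumption of this sketch."""
    n = len(W_A)
    S_T = [[S[i][j] for i in range(n)] for j in range(n)]
    top = [W_A[i] + S[i] for i in range(n)]        # rows for genes of cell type A
    bottom = [S_T[i] + W_B[i] for i in range(n)]   # rows for genes of cell type B
    return top + bottom

# Toy inputs with n = 2 genes.
W_A = [[0.0, 0.5], [-0.2, 0.0]]
W_B = [[0.0, 0.1], [0.3, 0.0]]
S = [[1.0, 2.0], [3.0, 4.0]]
W = joint_similarity(W_A, W_B, S)
```

The result is a 2n-by-2n matrix in which entry (i, n + j) holds the crosstalk score between gene i in cell type A and gene j in cell type B.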

Step 2. Manifold alignment

The manifold alignment method was applied to the joint similarity matrix W to recover latent representations of gene expression of two cell types. We used neural networks to learn the unified low-dimensional latent representation.22 The manifold alignment problem solved by neural networks has been shown to be computationally efficient while preserving nonlinearity properties,23 and numerically more stable than methods based on eigen decomposition.24,25
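For orientation, a common way to write a manifold alignment objective on a joint similarity matrix is sketched below. This is a generic Laplacian-based formulation, stated for a symmetric W and given as an assumption for illustration; the loss actually used by scTenifoldXct is specified in Methods and may differ. Here f_i denotes the d-dimensional latent representation of gene i (the i-th row of F), D is the diagonal degree matrix, and L is the graph Laplacian; these symbols are introduced only for this sketch.

```latex
\min_{F \in \mathbb{R}^{2n \times d}} \;
  \frac{1}{2} \sum_{i,j} W_{ij} \, \lVert f_i - f_j \rVert^{2}
  \;=\; \operatorname{tr}\!\left( F^{\top} L F \right),
\qquad L = D - W, \quad D_{ii} = \sum_{j} W_{ij},
\qquad \text{subject to } F^{\top} F = I .
```

In a neural-network formulation, F would be the network output and the expression above would serve as the training loss, with the constraint ruling out the trivial all-zero solution.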

Step 3. Determination of significant LR pairs

Euclidean distance between projected genes on the aligned manifold subspace was used to determine significant LR pairs. The distances were computed between all gene pairs from cell type A to B. Based on the distance values, gene pairs were ranked. The closer two genes are, the more likely they are to interact. To select significant LR pairs, p-values were calculated using a nonparametric test. A list of significant LR pairs was selected using a 5% p-value cutoff.
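The ranking step can be sketched as follows. The text does not specify the nonparametric test, so the empirical left-tail p-value used here (the share of all gene pairs that are at least as close as the pair in question) is an assumption, as are the function name `rank_pairs`, the gene labels, and the toy coordinates.

```python
from math import dist

def rank_pairs(emb_A, emb_B):
    """emb_A / emb_B map gene name -> latent coordinates in cell types A / B.
    Returns (gene_A, gene_B, distance, p_value) tuples, closest pair first."""
    pairs = [(a, b, dist(va, vb))
             for a, va in emb_A.items()
             for b, vb in emb_B.items()]
    all_d = [d for _, _, d in pairs]
    total = len(all_d)
    ranked = []
    for a, b, d in pairs:
        # Empirical left-tail p-value: fraction of pairs at least as close.
        p = sum(1 for x in all_d if x <= d) / total
        ranked.append((a, b, d, p))
    return sorted(ranked, key=lambda t: t[2])

# Toy two-dimensional embeddings with hypothetical gene labels.
emb_A = {"L1": (0.0, 0.0), "L2": (5.0, 5.0)}
emb_B = {"R1": (0.1, 0.0), "R2": (3.0, 4.0)}
ranked = rank_pairs(emb_A, emb_B)
```

With realistic gene counts the null distribution contains on the order of n² distances, so a 5% cutoff selects only pairs in the closest tail.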

Single-sample scTenifoldXct analysis with inflammatory skin data

We validated scTenifoldXct outcomes using published scRNA-seq data sets. The first data set was generated with lesional and non-lesional skin samples in a study of atopic dermatitis.26 This inflammatory skin data set contains ten different subpopulations of cells, including three subpopulations of fibroblasts and two subpopulations of dendritic cells. Experimental evidence shows that lesional skin has enhanced CCL19-CCR7 interaction between inflammatory fibroblasts and dendritic cells. CCL19 is required for lymphocyte recirculation, homing, and migration to secondary lymphoid organs.27 Lesional skin in atopic dermatitis shows enhanced chemokine signals with higher expression of CCL19 in inflammatory fibroblasts.28

We first focused on interactions between fibroblasts and dendritic cells and identified 30 significant LR pairs (Supplementary Table S1), including CCL19-CCR7. These significant interactions shed light on the function of inflammatory fibroblasts interacting with lymphoid cells and regulating type-2 inflammation, an inflammatory pathway involving a subpopulation of CD4+ T cells known as Th2 cells, which is consistent with the findings of the original study.26 Gene ontology (GO) enrichment analysis further suggested that significant LR genes were enriched in dendritic cell chemotaxis and dendritic cell migration pathways. This result is in line with a previous study,29 showing that the cutaneous immune response depends on dendritic cell migration from the skin to draining lymph nodes. VCAM1-ITGB2 and CCL2-CXCR4 were identified by scTenifoldXct as the top-ranked pairs. These genes are not more highly expressed (measured by either the average expression of a single gene among cells or the product of the average expressions of two genes) than many other LR pairs. Nevertheless, both VCAM1-ITGB2 and CCL2-CXCR4 are biologically relevant: VCAM1-associated type-2 inflammation causes the over-expression of cytokines (CCL2 and CCL19) in fibroblasts,26 and CCL2 is known to be induced by inflammatory stimuli such as tumor necrosis factor α (TNF-α).30 scTenifoldXct also identified CCL2-TNF. Again, neither of the genes was highly expressed. These examples demonstrate the capability of scTenifoldXct in detecting lowly expressed LR pairs by considering intracellular regulatory activities.

We then focused on interactions between dendritic cells and T cells and identified 33 significant LR pairs (Supplementary Table S2). These interactions centered around two ligand genes: CCL17 and CCL22, both closely associated with the pathophysiology of atopic dermatitis.31 Serum levels of CCL17 and CCL22 are known to be correlated with disease severity.32 In the context of inflamed skin, dendritic cell-derived CCL17 and CCL22 primarily attract T cells that express the cutaneous homing receptor; they bind to CCR chemokine receptors, which are preferentially expressed in T cells, resulting in inflammation.33 Genes of significant LR pairs also included CCR6, CXCR4, CCL17, S100A8, and S100B. The abundant expression of CCR6 in T cells in skin lesions suggests its important role in early inflammatory T cell recruitment. CXCR4 is a critical receptor involved in both homeostatic and pathological leukocyte trafficking, attracting cells to inflammatory sites and contributing to the activation of integrins required for T cell activation.34 CCL17 binds to the receptor CCR4, which is known to be expressed on activated/memory T cells. Recent data show enhanced expression of CCL17 in the skin lesions and serum of patients with atopic dermatitis, leading to dendritic cell migration from the skin to the skin-draining lymph nodes.35 S100 proteins such as S100A8 and S100B are required for immunological homeostasis and inflammation and have been linked to various inflammatory skin diseases, including psoriasis and atopic dermatitis.36,37 S100A8 and S100B are further examples of lowly expressed genes that were nevertheless detected and ranked among the top LR pairs by scTenifoldXct.

Finally, we combined the interactions across three cell types, i.e., fibroblast–dendritic cell–T cell. The combined results implied a cascade of intercellular signaling pathways in which fibroblasts activate dendritic cell inflammatory responsiveness. The dendritic cells in turn interact with T cells to facilitate T-cell trafficking, lymphoid tissue organization, and type-2 cell recruitment (Fig. 2A). Gene regression networks allowed us to approximate and examine intracellular systems, locate LR genes in the intracellular networks, and trace the upstream TFs that regulate the expression of the LR genes. Fig. 2B illustrates such an integrated intra- and inter-cellular network centered around two significant interactions: CCL19-CCR7 and CXCL12-CXCR4, as predicted by scTenifoldXct. In dendritic cells, REL (Proto-oncogene c-Rel, an NF-κB subunit) was strongly regulated by CCR7, which is consistent with the experimental results showing that CCR7 activates NF-κB.38 NF-κB is also known to be co-activated with AP1 and regulates CCR7 expression.39 JUN (a subunit of AP1) is a TF that positively regulates the expression of CCR7. Fig. 2C depicts another integrated network centered around ligands CCL17 and CCL22 and their receptors CCR6 and CXCR4 in dendritic and T cells. We found that AP1 subunits, JUNB and JUN, are linked with these receptors, suggesting a role of JUN/AP-1 proteins in skin inflammation.40 In conclusion, these constructed networks were consistent with our prior knowledge about gene regulatory relationships in the studied inflammatory skin system.

Fig. 2. Cell-cell interactions between fibroblasts, dendritic cells and T cells with their intracellular networks in inflamed skin.

(A) An illustration of representative interactions between cell types in skin lesions.26 (B) An integrated network across fibroblasts and dendritic cells with interactions: CCL19-CCR7, CXCL12-CXCR4, and CCL2-TNF (boldfaced in Supplementary Table S1). Blue and red edges indicate negative and positive regulatory relationships between genes, respectively. Genes like HES1 and JUNB are present in both cell types and thus appear twice in the figure. (C) An integrated network across dendritic cells and T cells with interactions: CCL17-CCR6, CCL22-CCR6, and CCL17-CXCR4 (boldfaced in Supplementary Table S2).

Comparison between scTenifoldXct and other existing methods

We compared the prediction results of scTenifoldXct with those produced by five existing methods, namely CellChat,10 Connectome,11 iTALK,9 NATMI,7 and SingleCellSignalR.8 The prediction of each method was generated from the same input data, i.e., fibroblasts and dendritic cells in the inflammatory skin data set, and the same reference cell-cell interaction database, OmniPath,5 was used. We ran each method at its own optimal threshold and retained equal numbers of the most significant LR pairs as reported (except that CellChat produced only nine significant pairs) to perform an overlap analysis using an UpSet plot (Supplementary Fig. S1). Five interactions (CCL2-CCR7, CCL19-CCR7, CCL19-CXCR4, CXCL12-CCR7, and CXCL14-CCR7) were detected by all six methods. Predictions made by scTenifoldXct tended to overlap more with the consensus inferred by the other methods. For example, scTenifoldXct detected all interactions that CellChat detected. scTenifoldXct also detected six interactions that were missed by CellChat and Connectome but detected by the other three methods, and three additional interactions that were missed by CellChat but detected by all other methods. CytoTalk,18 another method considering intracellular networks, detected four significant LR pairs (CCL19-CCR7, CCL26-CCR7, CCL2-VEGFA, and CCL26-CXCR4); all were detected by scTenifoldXct (Supplementary Table S1).

Additionally, scTenifoldXct detected S100A8-ITGB2 and CCL26-CCR6. All other methods failed to detect these two LR pairs because the genes involved were lowly expressed (Supplementary Fig. S2), whereas the sensitivity of scTenifoldXct was augmented by the manifold alignment of intracellular networks. There is mounting evidence that the two LR pairs are key factors in the pathogenesis of atopic dermatitis. S100A8 is an important molecule in the pathogenesis and progression of atopic dermatitis via altering cytokine and skin barrier protein expression levels.41 S100A8 is the first S100 family member found to have potent chemokine-like activity toward murine phagocytes in vitro and in vivo.42 S100A8 and S100A9 induce neutrophil adhesion to fibrinogen in vitro via upregulating Mac-1 (a heterodimer of CD11b and ITGB2), suggesting that S100A8-ITGB2 is an important mechanism associated with the pathogenesis and progression of atopic dermatitis. CCL26-CCR6 is another pair of molecules uniquely detected by scTenifoldXct. The lesional atopic dermatitis samples were characterized by the expansion of inflammatory dendritic cells and tissue-resident memory T cells.26 CCR6, a β-chemokine receptor, mediates the migration of dendritic cells and several lymphocyte subsets to sites of epithelial inflammation.43,44 It has also been reported that CCR6 is required for IL-23–induced psoriasis-like inflammation in mice.45 CCL26 is another molecule that may play an important role in the pathogenesis of atopic dermatitis.46 The expression of CCL26 is known to be increased in lesional atopic dermatitis fibroblasts;26 the serum CCL26 levels in patients with atopic dermatitis tend to decrease after treatment.46 These analyses demonstrate that scTenifoldXct is capable of predicting both strong and relatively weak interactions.

To further assess the performance of scTenifoldXct quantitatively, we used 40 LR pairs that were identified by all five existing methods as the “ground truth” and plotted the receiver operating characteristic (ROC) curve and precision-recall curve. We found that scTenifoldXct achieved the greatest area under the ROC curve (AUROC, 0.89) and the highest average precision (AP, 0.87) (Supplementary Fig. S3).
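The two summary metrics can be computed from a ranked list as in the sketch below. The scores and labels are toy values, not data from this study; a higher score is taken to mean a stronger predicted interaction (for example, a negated manifold distance), which is an assumption of this example. The average precision helper also assumes there are no tied scores.

```python
def auroc(scores, labels):
    """Probability that a random positive outscores a random negative,
    counting ties as one half (equivalent to the area under the ROC curve)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def average_precision(scores, labels):
    """Mean of the precision values at the rank of each true positive."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i]:
            hits += 1
            total += hits / rank
    return total / hits

# Toy example: 1 marks a pair in the consensus "ground truth" set.
scores = [0.9, 0.8, 0.7, 0.4, 0.2]
labels = [1, 0, 1, 1, 0]
roc_area = auroc(scores, labels)
avg_prec = average_precision(scores, labels)
```

Because the consensus set is defined by the competing methods, both metrics measure agreement with that consensus rather than accuracy against experimentally verified interactions.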

The scTenifoldXct framework for two-sample application

The basic scTenifoldXct framework for single-sample applications can be extended to the comparison between two samples, detecting LR pairs that show a significant difference in interaction strength between, for example, healthy and diseased samples. Similar to the basic framework, in which manifold alignment is achieved by solving a generalized eigenvalue problem with a joint matrix (Fig. 3A), the extended framework applies neural networks, as in the single-sample setting, to solve the manifold alignment problem in the two-sample setting. The difference is that the coupled joint matrix is formed by including four gene regression networks (from two cell types of two samples) and two crosstalk score matrices (between two cell types from each sample) (Fig. 3B).

Fig. 3. Joint similarity matrices used in scTenifoldXct single- and two-sample analyses.

(A) Joint similarity matrix for manifold alignment in the single-sample analysis. WA and WB are the gene regression networks of cell types A and B, respectively. S is the crosstalk score matrix, and n is the number of genes. (B) Coupled joint similarity matrix for manifold alignment in two-sample analysis. W1,A and W1,B are the gene regression networks of cell types A and B from sample 1, W2,A and W2,B from sample 2. S1 and S2 are the crosstalk matrices for samples 1 and 2, respectively. βI is an identity matrix with a tuning hyperparameter β.

Step 1. Constructing joint similarity matrices for two samples separately

We computed a joint matrix for each sample separately using the procedure described in the single-sample application above.

Step 2. Constructing a coupled joint similarity matrix

We then placed the two joint matrices in the diagonal blocks of the coupled joint matrix V (Fig. 3B). To make the low-dimensional representations of the two samples numerically comparable, a constraint factor was included by setting the off-diagonal blocks of the coupled joint matrix to βI, where I is an identity matrix that reflects the binary correspondence between genes in different samples and β is a tuning hyperparameter.
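The coupled matrix can be sketched by extending the block construction to two samples. The helper name `coupled_joint`, the toy joint matrices, and the value of β are illustrative assumptions; the source does not state how β is chosen.

```python
def coupled_joint(W1, W2, beta):
    """Assemble V = [[W1, beta*I], [beta*I, W2]] from two m-by-m joint
    similarity matrices given as nested lists (m = 2n in the text)."""
    m = len(W1)

    def beta_row(i):
        # Row i of beta * I: links gene i in one sample to itself in the other.
        return [beta if i == j else 0.0 for j in range(m)]

    top = [W1[i] + beta_row(i) for i in range(m)]
    bottom = [beta_row(i) + W2[i] for i in range(m)]
    return top + bottom

# Toy joint matrices for sample 1 and sample 2 (m = 2 for brevity).
W1 = [[0.0, 1.0], [1.0, 0.0]]
W2 = [[0.0, 2.0], [2.0, 0.0]]
V = coupled_joint(W1, W2, beta=0.9)
```

A larger β pulls the two embeddings of the same gene closer together, so distance differences between samples then reflect changes in the surrounding network rather than arbitrary offsets between two separately learned spaces.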

Step 3. Calculating the distance differences between two samples

To determine significant differentially interacted LR pairs, we calculated the Euclidean distance between every pairwise combination of gene i in cell type A and gene j in cell type B for two samples, respectively. The distance difference between each gene pair was then computed across samples.

Step 4. Determining significantly differential LR-mediated interactions

We considered gene pairs with a greater distance difference between two samples to be more significantly differentiated. With this, we obtained a list of ranked gene pairs. We computed p-values for the gene pairs using the chi-square test, adjusted the p-values with a multiple testing correction, and selected significant LR pairs using a 5% false discovery rate (FDR) cutoff against the OmniPath database.5
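Steps 3 and 4 can be sketched together as follows. Two details are assumptions of this sketch rather than statements from the text: the test statistic is taken to be the squared distance difference divided by the mean squared difference, compared against a chi-square distribution with one degree of freedom, and the multiple testing correction is taken to be the Benjamini-Hochberg procedure. Function names, pair labels, and distances are toy values.

```python
from math import erfc, sqrt

def chi2_sf_1df(x):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return erfc(sqrt(x / 2.0))

def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running = 1.0
    for rank in range(m, 0, -1):          # walk from largest to smallest p
        i = order[rank - 1]
        running = min(running, pvals[i] * m / rank)
        adjusted[i] = running
    return adjusted

def differential_pairs(dist_1, dist_2):
    """Map each pair to (distance difference, p-value, adjusted p-value)."""
    keys = list(dist_1)
    diffs = [dist_1[k] - dist_2[k] for k in keys]
    mean_sq = sum(d * d for d in diffs) / len(diffs)
    # Assumed statistic: squared difference scaled by the mean squared difference.
    pvals = [chi2_sf_1df(d * d / mean_sq) for d in diffs]
    return {k: (d, p, q) for k, d, p, q in zip(keys, diffs, pvals, bh_adjust(pvals))}

# Toy manifold distances for three hypothetical gene pairs in two samples.
dist_sample1 = {"A-B": 0.0010, "C-D": 0.0200, "E-F": 0.0105}
dist_sample2 = {"A-B": 0.0203, "C-D": 0.0201, "E-F": 0.0105}
result = differential_pairs(dist_sample1, dist_sample2)
```

The sign of each difference is kept alongside the p-values so that the direction of change in interaction strength can be read off after filtering at the chosen FDR cutoff.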

Differential interaction analysis between normal and tumor samples

Additional applications of scTenifoldXct include multi-sample comparison. To demonstrate the performance of scTenifoldXct comparative analysis, we obtained the scRNA-seq data set from a hepatocellular carcinoma study.47 The data set contains cells from the tumor and adjacent normal tissues, allowing for the comparison (Fig. 4A). Using scTenifoldXct to compare tumor and normal samples, we identified 11 interactions that showed significantly different strengths of hepatocyte-endothelial cell interactions (Table 1). Among them, eight LR pairs interact more strongly in tumor samples. For example, PLA2G2A (secretory calcium-dependent phospholipase A2), previously implicated in host antimicrobial defense, inflammatory response, and tissue regeneration,48–50 was found to interact more strongly with ITGA5 in the tumor. GO enrichment analysis of the genes in these eight LR pairs showed enrichment for beta-1 integrin cell surface interactions (Supplementary Table S3), highlighting the role of beta-1 integrin in the progression of hepatocellular carcinoma.51 Fig. 4B depicts an integrated network showing four out of the eight tumor-enhanced pairs of LR-mediated interactions: PLA2G2A-ITGA5, MDK-ITGA6, SPP1-ITGA5, and SPP1-ITGB5. Corresponding intracellular networks of different cell types were included. In this integrated network, TCF4 is linked with ITGA5 and ITGA6 in the endothelial cells, indicating a functional relationship between TCF4 and the two receptor genes. TCF4 is a key component of the Wnt signaling pathway, which has been linked to the proliferation of hepatocellular carcinoma cells. It could be an effective therapeutic target for blocking the growth of hepatocellular tumors.52 Specifically, the mutation of beta-catenin (leading to its nuclear and cytoplasmic accumulation) renders it capable of bypassing APC-targeted degradation and accumulating in the nucleus to form a complex with TCF4, aberrantly activating downstream transcriptional events.53 Similarly, ERG is also linked with ITGA5 and ITGA6 in endothelial cells. A previous study has found more ERG-positive endothelial cells in hepatocellular carcinoma tissue than in adjacent normal tissue.54 FOS is linked with ITGA5 and ITGB5. FOS is an oncogene upregulated in hepatocellular carcinoma.55 Fig. 4C shows two identical cellular networks of endothelial cells connected with hepatocytes via LR-mediated interactions IGF2-FGFR3 and IGF2-PDGFRB. In the tumor sample, the reduced intracellular regulation between PDGFRB and ID1 and ID3 was evident. PDGFRB has been implicated in the development and metastasis of hepatocellular carcinoma; ID1 and ID3 are associated with hepatocellular carcinoma dedifferentiation.56

Fig. 4. Cell-cell interactions between hepatocytes (senders) and endothelial cells (receivers).

(A) An illustration of representative interactions in hepatocellular carcinoma (HCC).47 (B) An integrated network across hepatocytes and endothelial cells, connected by interactions PLA2G2A-ITGA5, MDK-ITGA6, SPP1-ITGA5, and SPP1-ITGB5 (boldfaced in Table 1). The top 30 weighted edges in each network of the cell type are shown. (C) Integrated networks connected through IGF2-FGFR3 and IGF2-PDGFRB across hepatocytes and endothelial cells in the adjacent normal (left) and tumor (right) tissues.

Table 1.

Differential interactions between hepatocytes and endothelial cells.

LR pairs | LR distance on manifold (tumor) | LR distance on manifold (normal) | Difference in LR distances between tumor and normal | p-value | Adj. p-value
PLA2G2A-ITGA5 | 0.0010 | 0.0203 | −0.0193 | 3.6E-07 | 8.0E-04
TFF3-ACKR3 | 0.0147 | 0.0330 | −0.0183 | 1.5E-06 | 1.6E-03
IGF2-PDGFRB | 0.0290 | 0.0108 | 0.0182 | 1.8E-06 | 1.7E-03
IGF2-FGFR3 | 0.0269 | 0.0105 | 0.0164 | 1.6E-05 | 6.1E-03
MDK-PTPRB | 0.0041 | 0.0193 | −0.0152 | 6.6E-05 | 1.5E-02
CXCL3-ACKR1 | 0.0078 | 0.0228 | −0.0150 | 8.1E-05 | 1.8E-02
MDK-ITGA6 | 0.0059 | 0.0206 | −0.0147 | 1.1E-04 | 2.1E-02
SPP1-ITGA5 | 0.0058 | 0.0203 | −0.0145 | 1.3E-04 | 2.4E-02
IGF2-ERBB2 | 0.0275 | 0.0135 | 0.0140 | 2.1E-04 | 3.3E-02
SPP1-ITGB5 | 0.0173 | 0.0312 | −0.0139 | 2.4E-04 | 3.6E-02
SPP1-ITGA6 | 0.0068 | 0.0206 | −0.0138 | 2.8E-04 | 3.9E-02

LR pairs were ranked according to the difference in distances from the ligand gene to the receptor gene on their aligned manifold in the latent space between two cell types. The sign of difference indicates the direction of change in the interaction strength between the tumor and normal tissues. A positive difference indicates a greater LR distance or decreased interaction strength in the tumor. In contrast, a negative difference indicates a smaller LR distance or increased interaction strength in the tumor. LR pairs that appear in Fig. 4B are highlighted in bold for cross reference.

Differential interaction analysis between wild-type (WT) and gene knockout (KO) samples

To test the generalizability of scTenifoldXct, especially when applied to the comparison of two samples, we performed an additional two-sample comparison analysis. We obtained scRNA-seq data derived from lung alveolar type II (AT2) cells and basophils in a study utilizing WT and IL1RL1 (IL33R) KO mice.57 It is known that AT2 cells produce IL33 (ligand) that binds to IL1RL1 (receptor). In IL1RL1-deficient lungs, basophils lack the expression of a large number of lung basophil-specific genes. We predicted that scTenifoldXct-based comparison analysis between the WT and KO samples should be able to identify significant differential LR pairs. Indeed, IL33-IL1RL1 ranked at the top of all LR pairs that were predicted by scTenifoldXct to be significantly differentiated (Supplementary Table S4). In addition to IL33-IL1RL1, scTenifoldXct also identified six other significant differentially interactive LR pairs, including CSF1-TNF, CSF2-TNF, EDN1-EDNRB, and EDN3-EDNRB. CSF1 and CSF2 are CSF family cytokines that have been shown to play crucial roles in shaping the lung microenvironment.58–60 The interaction strengths of CSF1-TNF and CSF2-TNF were both significantly diminished in the KO sample. This is consistent with the known functions of these genes and their interplay in lung development.57,61 Similarly, the interaction strengths of EDN1-EDNRB and EDN3-EDNRB were also diminished in the KO sample, which is consistent with the observed downregulation of EDNRB in lung disease, where cells lose the ability to respond to endothelin stimulation.62

Discussion

We have showcased the functionality and performance of scTenifoldXct. Using real-data examples, we demonstrated single-sample and two-sample applications of scTenifoldXct. A two-sample application involves comparative analysis between two samples, for which none of the existing methods adopts the integrative strategy used by scTenifoldXct. Methodologically, scTenifoldXct leverages information on the topological placements of genes in gene regression networks to explore intracellular connections of LR pairs with other genes. Since the scTenifoldXct method takes into account the networks underlying genes of LR pairs, it allows for the reconstruction of a more detailed map of biological pathways linking the two cell types of interest. Incorporating intracellular networks enables a comprehensive inference of signaling pathways across cells. To the best of our knowledge, only a few existing tools, such as NATMI and scMLnet,7,17 incorporate intracellular network or signaling pathway information in their analyses. The difference is that scTenifoldXct requires no prior knowledge of intracellular networks, while other methods depend on signaling pathways and networks in databases such as KEGG. scTenifoldXct computes gene regression networks from the input data and operates entirely in a data-driven manner. Furthermore, in the initial stage of the analysis, scTenifoldXct assigns scores to all combinations of gene pairs without requiring prior knowledge of known LR pairs. Thus, significant LR pairs detected by scTenifoldXct may include those undocumented in the databases.

scTenifoldXct adopts the redesigned crosstalk score to quantify the interaction strength between an LR pair. The product of ligand and receptor expression has been widely adopted as an indicator for this purpose, which is intuitively sound. When a ligand and its receptor interact, they are expected to be highly expressed, and the value of the product of their gene expression should be greater. The crosstalk score we redesigned for scTenifoldXct follows the same principle, but we expanded its formula by incorporating gene expression variability. The latter has been shown to be as important as the average gene expression in reflecting cellular functions.63,64 Rather than focusing just on the expression mean, scTenifoldXct considers gene expression variability across cells and calculates the interaction score, balancing the contributions of the mean and the variability. The effectiveness of this redesigned metric has been demonstrated in our results, showing that biologically significant LR pairs with low expression can be detected.

In terms of differential interaction analysis, while many tools allow users to compare two samples,7–11 they initially infer interactions for each sample independently before comparison. The differences in obtained interaction scores are used to assess the differences between the two samples. In contrast, the scTenifoldXct comparative algorithm uses an integrated method to learn the correspondences rather than processing two samples independently. The level of differential interactions is measured by the distance differences in the joint latent space of aligned manifolds. Our manifold alignment algorithm implicitly assumes that data sets share the same underlying structure across cell types. Such an assumption can be easily violated by the presence of cell type-specific cell states across heterogeneous single-cell data sets. Thus, it remains computationally challenging for manifold alignment algorithms to preserve both shared and data set-specific cellular structures across samples.65 Despite this concern, our common manifold-based integrative strategy has been shown to be highly effective. For example, retrospectively, we tried to apply the independent processing strategy to repeat the comparative analysis, but this strategy did not work even with scTenifoldXct, i.e., no LR pairs were detected under independent processing.

For further development, because temporally and spatially precise cell communication is key to cellular differentiation, we consider that the next version of scTenifoldXct could be directed toward incorporating time-series and spatial transcriptomic information. For time-series scRNA-seq data, existing analytical frameworks such as GraphFP and SoptSC66,67 have demonstrated the feasibility of simultaneously inferring cell lineages and cell-cell communications. Recent improvements in pseudotemporal ordering enable us to map the underlying regulatory networks over time.68,69 Thus, the scTenifoldXct framework can be further updated by taking time-series or pseudotime information as input and integrating dynamic inference modules, enabling the assessment of changes in interaction strength throughout the processes of cellular differentiation or organ development.

STAR★Methods

Resource availability

Lead contact

Further information and requests for resources and reagents should be directed to and will be fulfilled by the Lead Contact, James J. Cai (jcai@tamu.edu).

Materials availability

This study did not generate new materials.

Data and code availability

  • This paper analyzes existing, publicly available data. The accession numbers for the data sets are listed in the key resources table. Known LR pairs (n = 8,047) were retrieved from OmniPath (v1.0.4),5 a knowledge database of intracellular and intercellular signaling pathways. TF gene names (n = 1,564) were downloaded from the Human TFome database.70

  • All original code has been deposited at GitHub (https://github.com/cailab-tamu/scTenifoldXct) and is publicly available as of the date of publication. This repository has been archived at Zenodo (https://doi.org/10.5281/zenodo.7453377).

  • Any additional information required to reanalyze the data reported in this paper is available from the lead contact upon request.

Key resources table.
REAGENT or RESOURCE | SOURCE | IDENTIFIER
Antibodies
Bacterial and virus strains
Biological samples
Chemicals, peptides, and recombinant proteins
Critical commercial assays
Deposited data
Inflammatory skin data set | He, H., et al.26 | GEO: GSE147424
Hepatocellular carcinoma data set | Sharma, A., et al.47 | GEO: GSE156337
Lung microenvironment data set | Cohen, M., et al.57 | GEO: GSE119228
Experimental models: Cell lines
Experimental models: Organisms/strains
Oligonucleotides
Recombinant DNA
Software and algorithms
scTenifoldXct | This paper | DOI:10.5281/zenodo.7453377
R | https://www.cran.r-project.org | v4.0.4
Python | https://www.python.org/ | v3.9
OmniPath | Turei, D., et al.5 | v1.0.4
Seurat (R package) | https://cran.r-project.org/web/packages/Seurat | v4.0.2
Ray Tune (Python package) | https://www.ray.io/ray-tune | v1.8.0
Pytorch (Python package) | https://pytorch.org | v1.9.0
igraph (Python package) | https://igraph.org | v0.9.6
LIANA (R package) | Dimitrov, D., et al.20 | v0.0.5
UpSetR (R package) | https://cran.r-project.org/web/packages/UpSetR | v1.4.0
CytoTalk (R package) | Hu, Y., et al.18 | v0.99.0
Other

Methods details

Data preprocessing

The input of scTenifoldXct is a gene-by-cell count matrix with annotated cell types. In addition, sample information is required for the comparative analysis. For all scRNA-seq data sets, we performed log-normalization using the NormalizeData function in Seurat (v4.0.2).71 Highly variable genes were selected using the FindVariableFeatures function in Seurat (selection.method = “vst”). For each gene, Seurat computed the standardized variance of its expression across cells, controlling for the mean expression.71 For each data set, the top 3,000 highly variable genes were included in subsequent analyses. Cell annotations from original studies were retained and used.

Network construction

Given cell types A and B, we employed PC regression to construct the intracellular gene regression networks, denoting them as $W^A$ and $W^B$. Specifically, suppose $X \in \mathbb{R}^{n \times p}$ is the count matrix with $n$ genes and $p$ cells for cell type A. The gene expression level of the $i$th gene in all $p$ cells is represented by the $i$th row of $X$, denoted by $X_i \in \mathbb{R}^{p}$. Denote $X_{-i} \in \mathbb{R}^{(n-1) \times p}$ as the matrix obtained by deleting $X_i$ from $X$. To estimate the effects of the other $n-1$ genes on the $i$th gene, we constructed the PC regression model for $X_i$. First, we applied principal component analysis (PCA) to $X_{-i}^T$ and selected the first $M$ leading PCs to construct $Z^i = (Z_1^i, \ldots, Z_M^i) \in \mathbb{R}^{p \times M}$, where $Z_m^i \in \mathbb{R}^{p}$ is the $m$th PC of $X_{-i}^T$, $m = 1, 2, \ldots, M$ and $M \ll \min(n, p)$. Because PC regression only uses $M$ PCs as covariates, it reduces the risk of over-fitting and the computation time. Denote $V^i \in \mathbb{R}^{(n-1) \times M}$ as the PC loading matrix for the first $M$ leading PCs; then $Z^i = X_{-i}^T V^i$, where $V^i$ satisfies $(V^i)^T V^i = I_M$. The next step was to generate the regression coefficients by regressing $X_i$ on $Z^i$ and solving the following optimization problem:

$$\hat{\beta}^i = \arg\min_{\beta^i \in \mathbb{R}^{M}} \left\| X_i - Z^i \beta^i \right\|_2^2,$$

which can be easily solved by the ordinary least squares (OLS) method. Then, the effects of the other $n-1$ genes on the $i$th gene were obtained by $\hat{\alpha}^i = V^i \hat{\beta}^i \in \mathbb{R}^{n-1}$. Repeating this process another $n-1$ times, with a different gene as the response gene each time, we assembled $\{\hat{\alpha}^i\}_{i=1}^{n}$ and constructed an $n \times n$ weighted adjacency matrix $W$ of the intracellular gene regression network. The $i$th row of $W$ is $\hat{\alpha}^i$, and all diagonal entries of $W$ are 0. Eventually, the gene regression network $W^A$ for cell type A and the gene regression network $W^B$ for cell type B were obtained. $W^A$ and $W^B$ could be symmetrized when necessary.
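The PC regression steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not the package implementation: the function name pc_regression_network, the default of three PCs, the toy Poisson data, and the choice to center each gene before PCA and regression are all assumptions made here.

```python
import numpy as np


def pc_regression_network(X, n_pcs=3):
    """Build an n x n gene regression network from a genes-by-cells matrix X."""
    n_genes, _ = X.shape
    W = np.zeros((n_genes, n_genes))
    for i in range(n_genes):
        others = np.delete(np.arange(n_genes), i)
        # Cells are observations and the other n-1 genes are features: p x (n-1).
        X_rest = X[others].T
        # Assumption: center each gene before PCA and regression.
        X_rest = X_rest - X_rest.mean(axis=0)
        y = X[i] - X[i].mean()
        # Rows of Vt are PC loadings; keep the first M as columns of V.
        _, _, Vt = np.linalg.svd(X_rest, full_matrices=False)
        V = Vt[:n_pcs].T                              # (n-1) x M, V.T @ V = I_M
        Z = X_rest @ V                                # p x M matrix of PC scores
        beta, *_ = np.linalg.lstsq(Z, y, rcond=None)  # OLS on the M PCs
        W[i, others] = V @ beta                       # effects of other genes on gene i
    return W


rng = np.random.default_rng(0)
X_demo = rng.poisson(2.0, size=(6, 40)).astype(float)  # toy data: 6 genes, 40 cells
W_demo = pc_regression_network(X_demo, n_pcs=3)
```

Each row of the result holds the coefficients for one response gene, and the diagonal stays at 0 because a gene is never used to predict itself.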

Crosstalk score

The product of the mean expression of known ligands and receptors has been widely used in most computational approaches to gauge cellular interactions. To account for the gene variance within cell groups, we additionally incorporated gene expression variances in the definition of the crosstalk score between gene $i$ in cell type A and gene $j$ in cell type B, as shown below:

$$\mathrm{score}_{i^A, j^B} = \left[ (1-\lambda)\,(u_i^A)^2 + \lambda\,(\sigma_i^A)^2 \right] \times \left[ (1-\lambda)\,(u_j^B)^2 + \lambda\,(\sigma_j^B)^2 \right],$$

where $u_i^A$, $(\sigma_i^A)^2$ and $u_j^B$, $(\sigma_j^B)^2$ are the expression mean and variance of gene $i$ in cell type A and gene $j$ in cell type B, respectively. The hyperparameter $\lambda$ scales the relative contribution of the mean and the variance. By default, $\lambda = 1/2$, in which case each factor equals half of the second moment of the corresponding gene, and the crosstalk score is the product of these two halves.
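As a small worked illustration of the crosstalk score, the sketch below evaluates the formula for every gene pair at once. The function name crosstalk_scores and the toy matrices are hypothetical, and the use of the population variance (NumPy's default, ddof=0) is an assumption; with that choice and $\lambda = 1/2$, each factor is exactly half of the second moment.

```python
import numpy as np


def crosstalk_scores(expr_A, expr_B, lam=0.5):
    """Score every gene in cell type A against every gene in cell type B.

    expr_A and expr_B are genes-by-cells matrices; entry (i, j) of the
    result is the score between gene i in A and gene j in B.
    """
    mean_A, var_A = expr_A.mean(axis=1), expr_A.var(axis=1)
    mean_B, var_B = expr_B.mean(axis=1), expr_B.var(axis=1)
    term_A = (1.0 - lam) * mean_A**2 + lam * var_A
    term_B = (1.0 - lam) * mean_B**2 + lam * var_B
    return np.outer(term_A, term_B)


# Toy example: 2 genes x 2 cells per cell type (made-up values).
expr_A_demo = np.array([[1.0, 3.0], [0.0, 4.0]])
expr_B_demo = np.array([[2.0, 2.0], [1.0, 5.0]])
scores_demo = crosstalk_scores(expr_A_demo, expr_B_demo)
```

In the toy data, the second gene of each cell type has the larger variance and therefore the larger factor, which shows how a variable gene can score well even when its mean is no higher than that of its neighbor.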

Manifold alignment

Manifold alignment is a nonlinear feature projection method by which we embed the genes of sender and receiver cells into a unified space while simultaneously minimizing the distance between corresponding genes and preserving the original structure of the gene regression networks of each cell type. Manifold alignment allows the low-dimensional projections of genes to be comparable and preserves the information of the gene regression networks. To summarize the strength of interactions for each pair of genes across the cell types, we defined the crosstalk score matrix $S$ with $S_{i,j} = \mu\,\mathrm{score}_{i^A, j^B}$, where $\mu$ is a scale factor indicating how much we want to preserve the weight of intercellular correspondences relative to the edge weight of the gene regression networks. Following ref.24, we set $\mu = \frac{\sum_{i,j} (W^A)_{ij} + \sum_{i,j} (W^B)_{ij}}{2 \sum_{i,j} (S^{\mathrm{score}})_{ij}}$, where $(S^{\mathrm{score}})_{ij} = \mathrm{score}_{i^A, j^B}$ is the unscaled crosstalk score. In this way, we ensured that the correspondences and the gene regression networks are on a comparable scale so that the manifold alignment result is not biased toward either metric. We found that this setting of $\mu$ generated more robust and unbiased detection results than other scaling settings, including $\mu = 1$ (unscaled), $0.1\mu$, and $10\mu$ (Supplementary Fig. S4B). The joint similarity matrix is then constructed as follows:

$$W = \begin{bmatrix} W^A & S \\ S^T & W^B \end{bmatrix}$$

Note that $S$ is asymmetric, while, when $W^A$ and $W^B$ are symmetric, $W$ is symmetric. $W^A$ and $W^B$ may contain negative values when gene expressions are negatively correlated, and in such cases, the properties of the corresponding Laplacian are not well understood.24 We resolved this problem by adding 1 to all entries of $W^A$ and $W^B$, transforming the range of $W^A$ and $W^B$ from [−1, 1] to [0, 2]. As a result, the projected features of two genes with a positive correlation would be closer than those with a negative correlation. For convenience, we still used $W^A$ and $W^B$ to denote the transformed gene regression networks of the two cell types. The loss function for this manifold alignment is

$$\mathcal{L}(\mathcal{F}) = \sum_{i,j} \left\| \mathcal{F}_i - \mathcal{F}_j \right\|^2 W_{ij} = \sum_{i,j} \left\| \mathcal{F}_i^A - \mathcal{F}_j^A \right\|^2 W_{ij}^A + \sum_{i,j} \left\| \mathcal{F}_i^B - \mathcal{F}_j^B \right\|^2 W_{ij}^B + 2 \sum_{i,j} \left\| \mathcal{F}_i^A - \mathcal{F}_j^B \right\|^2 S_{ij},$$

where $\mathcal{F} = \begin{bmatrix} \mathcal{F}^A \\ \mathcal{F}^B \end{bmatrix} \in \mathbb{R}^{2n \times d}$ is the low-dimensional representation for cell type A and cell type B, and $d\ (\ll n)$ is the dimension of the latent space. Moreover, we needed an additional constraint

$$\mathcal{F}^T D \mathcal{F} = I_d$$

for this loss function to work properly, i.e., to avoid the trivial solution of mapping all instances to zero, where $D$ is a diagonal matrix with $D_{ii} = \sum_j W_{ij}$ and $I_d$ is a $d \times d$ identity matrix. Solutions for manifold alignment traditionally rely on eigendecomposition, which is computationally demanding. To speed up the computation, neural networks were used to learn the unified low-dimensional latent representation $\mathcal{F}$.23 Let $\mathcal{F} = \begin{bmatrix} \mathcal{F}^A \\ \mathcal{F}^B \end{bmatrix} = \begin{bmatrix} \mathcal{F}^A(\cdot, \theta_A) \\ \mathcal{F}^B(\cdot, \theta_B) \end{bmatrix}$, where $\theta_A$ and $\theta_B$ are the parameters of the two neural networks. By minimizing the loss function $\mathcal{L}(\mathcal{F}; \theta_A, \theta_B)$, we obtained the parameters $\hat{\theta}_A$ and $\hat{\theta}_B$. To guarantee that our solution $\hat{\mathcal{F}} = \mathcal{F}(\cdot; \hat{\theta}_A, \hat{\theta}_B)$ satisfies the constraint $\hat{\mathcal{F}}^T D \hat{\mathcal{F}} = I_d$, we followed the optimization method described in ref.23, forcing the outputs of the optimization problem onto the Stiefel manifold.72 This nonlinear method yielded the low-dimensional representation $\hat{\mathcal{F}}$, which reveals the information on both intra- and inter-cellular networks.
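To make the objects in this section concrete, the sketch below assembles the joint similarity matrix, evaluates the alignment loss, and solves the constrained problem with the traditional eigendecomposition route mentioned above. It deliberately does not reproduce the neural-network solver. All function names and the random toy inputs are hypothetical, and computing $\mu$ from the shifted (non-negative) networks is an assumption, since the text does not state the order of the two steps.

```python
import numpy as np


def joint_similarity(W_A, W_B, score):
    """Assemble the joint matrix W from two networks and raw crosstalk scores."""
    W_A, W_B = W_A + 1.0, W_B + 1.0          # shift the range [-1, 1] to [0, 2]
    # Assumption: mu is computed from the shifted networks.
    mu = (W_A.sum() + W_B.sum()) / (2.0 * score.sum())
    S = mu * score
    return np.block([[W_A, S], [S.T, W_B]])


def alignment_loss(F, W):
    """sum_{i,j} ||F_i - F_j||^2 W_ij for an embedding F with one row per gene."""
    sq_dist = ((F[:, None, :] - F[None, :, :]) ** 2).sum(axis=-1)
    return float((sq_dist * W).sum())


def spectral_embedding(W, d):
    """Classical solution: generalized eigenvectors of L f = lambda D f."""
    deg = W.sum(axis=1)
    L = np.diag(deg) - W                     # graph Laplacian
    inv_sqrt = 1.0 / np.sqrt(deg)
    L_sym = inv_sqrt[:, None] * L * inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L_sym)          # eigenvalues in ascending order
    # Skip the first (trivial) eigenvector and keep the next d.
    return inv_sqrt[:, None] * vecs[:, 1:d + 1]


rng = np.random.default_rng(0)
n = 5                                         # toy number of genes per cell type
A = rng.uniform(-1, 1, (n, n))
B = rng.uniform(-1, 1, (n, n))
W_A_demo, W_B_demo = (A + A.T) / 2, (B + B.T) / 2   # symmetrized toy networks
np.fill_diagonal(W_A_demo, 0.0)
np.fill_diagonal(W_B_demo, 0.0)
score_demo = rng.uniform(0, 1, (n, n))
W_joint_demo = joint_similarity(W_A_demo, W_B_demo, score_demo)
F_demo = spectral_embedding(W_joint_demo, d=3)
```

For symmetric $W$, the loss equals $2\,\mathrm{tr}(\mathcal{F}^T L \mathcal{F})$ with $L = D - W$, which is why the constrained minimizer can be read off from the smallest non-trivial generalized eigenvectors. The eigendecomposition scales poorly with the number of genes, which is the motivation given in the text for switching to neural networks.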

The neural networks have an input layer whose nodes correspond to samples (cells) and two hidden layers (by default 32/16 hidden units) with sigmoid nonlinearity, followed by a linear output layer (3 units). All the layers are fully connected. Hyperparameters such as the number of iterations and the learning rate were optimized to ensure the functionality and reproducibility of the networks for single- and two-sample analyses. For each data set, we performed a random hyperparameter search of 100 trials using Ray Tune (v1.8.0),73 with the hidden neurons randomly selected from 64 and 32, the embedding dimension sampled from 2 to 10, and the learning rate sampled uniformly in log space from $10^{-5}$ to 0.1. Supplementary Fig. S4A shows some experimental runs, including the optimal output for each data set. To implement the model, we initialized two neural networks with uniform weights following the heuristic recommended by PyTorch (v1.9.0) for linear layers and trained them from scratch. For the inflammatory skin data set in the single-sample analysis, we trained the model for 1,000 iterations using the Adam optimizer with a learning rate of 0.01.
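The architecture and search ranges described above can be sketched without a deep learning framework. The block below is a NumPy stand-in for the forward pass only (no training), written to show the layer shapes; it is not the PyTorch code used in the study. The helper names, the toy input, and the reading of the hidden-unit choice as a single value drawn from {64, 32} are assumptions. The uniform bound of $1/\sqrt{\text{fan-in}}$ mirrors the default range PyTorch uses for linear layers.

```python
import numpy as np


def init_mlp(n_cells, hidden=(32, 16), out_dim=3, seed=0):
    """Fully connected layers with uniform(-1/sqrt(fan_in), 1/sqrt(fan_in)) init."""
    rng = np.random.default_rng(seed)
    sizes = [n_cells, *hidden, out_dim]
    params = []
    for fan_in, fan_out in zip(sizes[:-1], sizes[1:]):
        bound = 1.0 / np.sqrt(fan_in)
        weights = rng.uniform(-bound, bound, (fan_in, fan_out))
        bias = rng.uniform(-bound, bound, fan_out)
        params.append((weights, bias))
    return params


def forward(params, X):
    """Map a genes-by-cells matrix to a genes-by-out_dim embedding."""
    h = X
    for weights, bias in params[:-1]:
        h = 1.0 / (1.0 + np.exp(-(h @ weights + bias)))  # sigmoid hidden layers
    weights, bias = params[-1]
    return h @ weights + bias                             # linear output layer


def sample_trial(rng):
    """Draw one random-search configuration from the ranges quoted in the text."""
    return {
        "hidden_units": int(rng.choice([64, 32])),
        "embedding_dim": int(rng.integers(2, 11)),          # 2 to 10 inclusive
        "learning_rate": float(10 ** rng.uniform(-5, -1)),  # log-uniform
    }


rng = np.random.default_rng(1)
X_demo = rng.random((8, 20))                   # toy input: 8 genes, 20 cells
emb_demo = forward(init_mlp(n_cells=20), X_demo)
trial_demo = sample_trial(rng)
```

Each gene is one input row, so the network maps a gene's expression profile across cells to a point in the latent space. One network of this kind is used per cell type.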

Determining the statistical significance of interactions between cell types in a single sample

With $\hat{\mathcal{F}}^A = \mathcal{F}^A(\cdot, \hat{\theta}_A)$ and $\hat{\mathcal{F}}^B = \mathcal{F}^B(\cdot, \hat{\theta}_B) \in \mathbb{R}^{n \times d}$ being the representations of genes from cell types A and B in the low-dimensional embedding, respectively, we calculated the Euclidean distance $d_{ij}$ across cell types for every pairwise combination of gene $i$ in cell type A and gene $j$ in cell type B, and denoted the squared distance between the projected representations of gene $i$ and gene $j$ as $d_{ij}^2 = \| \hat{\mathcal{F}}_i^A - \hat{\mathcal{F}}_j^B \|^2$. We implemented a nonparametric test to identify significant gene pairs among all combinations under the null hypothesis that there is no LR-mediated interaction between gene pairs. The null distribution was obtained by collecting $d_{ij}^2$ of all gene pairs, excluding LR pairs in the OmniPath database. Compared to gene pairs that appear in the database, gene pairs absent from the database were considered much less likely to confer LR-mediated interactions. Alternatively, the null distribution could be constructed by random shuffling of the data, which would make the test less conservative. Next, we calculated the quantiles of the LR pairs under the null distribution and used them as the raw p-values. The threshold was set to 0.05 for all data sets. We did not use the Chi-square test in this case because the pairs of interest are close gene pairs with short distances, which lie in the left tail of the distribution, and this is incompatible with the Chi-square test.
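The left-tail empirical test can be illustrated as follows. This is a sketch under one reading of the text, in which the p-value of an LR pair is the fraction of null (non-database) squared distances that are no larger than the observed one. The function name and the toy distances are hypothetical.

```python
import numpy as np


def empirical_left_tail_pvalues(d2, is_lr_pair):
    """Quantile of each known LR pair's squared distance within the null set.

    d2: squared latent distances for gene pairs (any shape).
    is_lr_pair: boolean array of the same shape, True for database LR pairs.
    """
    null = np.sort(d2[~is_lr_pair])            # distances of non-LR gene pairs
    observed = d2[is_lr_pair]
    # Number of null distances less than or equal to each observed distance.
    ranks = np.searchsorted(null, observed, side="right")
    return ranks / null.size


# Toy values: three LR pairs and four null pairs.
d2_demo = np.array([0.5, 1.0, 2.0, 2.5, 3.0, 4.0, 10.0])
mask_demo = np.array([True, False, False, True, False, False, True])
p_demo = empirical_left_tail_pvalues(d2_demo, mask_demo)
```

A short distance places an LR pair deep in the left tail and yields a small p-value, while an LR pair that sits farther apart than every null pair receives a p-value of 1.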

Comparative manifold alignment

Comparative studies of gene function look for significant differential interactions between two samples, such as healthy (H) and diseased (D). One direct method is to run the above pipeline on the two samples separately and obtain the distances $d_{ij}^H$ and $d_{ij}^D$ for the respective samples. However, $d_{ij}^H$ and $d_{ij}^D$ obtained this way are not comparable because the low-dimensional representations of the two samples belong to different latent spaces. We therefore needed to add constraints to make them numerically comparable, so that the same genes from the two samples would have similar low-dimensional representations. Following the general manifold alignment framework,25 we constructed the coupled joint similarity matrix across the two samples, denoted by $V$, as

$$V = \begin{bmatrix} W^H & \beta I \\ \beta I & W^D \end{bmatrix},$$

where $W^H$ and $W^D$ are the joint similarity matrices constructed by the above manifold alignment method for each sample, respectively, $I$ is a $2n \times 2n$ identity matrix, and $\beta$ is a tuning hyperparameter, which is by default 0.9 times the mean value of the row sums of $W^H$ and $W^D$. Intuitively, a smaller $\beta$ would not enforce correspondence between identical genes from different samples, whereas a larger one would produce close distances between identical genes without much consideration of the given gene regression networks. We showed that within an optimal range of the scale factor in $\beta$, the aligned distances remained highly correlated (Supplementary Fig. S4C). The loss function is defined as

$$\mathcal{L}(F) = \sum_{i,j} \left\| F(i, \cdot) - F(j, \cdot) \right\|^2 V(i,j).$$

The solution is denoted as $\hat{F} = \begin{bmatrix} \hat{F}^H \\ \hat{F}^D \end{bmatrix} \in \mathbb{R}^{4n \times d}$, where $\hat{F}^H$ and $\hat{F}^D \in \mathbb{R}^{2n \times d}$ are analogous to $\hat{\mathcal{F}}$ in the single-sample analysis; each is the vertically stacked representation of genes from cell types A and B in one sample. In the two-sample analysis setting, four neural networks were initialized to solve the optimization problem described above, and their architectures are identical to those used for the single-sample analysis. We performed a random search again for the learning rate. We trained the model for up to 3,000 iterations using the Adam optimizer with the same learning rate of 0.01 as in the single-sample analysis (Supplementary Fig. S4A).
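The coupled matrix for the two-sample setting can be assembled as in the sketch below. The function name and toy inputs are hypothetical, and taking the mean over the row sums of both matrices pooled together is one interpretation of the default described above.

```python
import numpy as np


def coupled_similarity(W_H, W_D, factor=0.9):
    """Couple two per-sample joint similarity matrices through beta * I."""
    m = W_H.shape[0]                               # m = 2n (genes of both cell types)
    # Assumption: pool the row sums of both matrices before averaging.
    row_sums = np.concatenate([W_H.sum(axis=1), W_D.sum(axis=1)])
    beta = factor * row_sums.mean()
    coupling = beta * np.eye(m)                    # links each gene to itself across samples
    V = np.block([[W_H, coupling], [coupling, W_D]])
    return V, beta


rng = np.random.default_rng(2)
H = rng.uniform(0, 2, (4, 4))
D = rng.uniform(0, 2, (4, 4))
W_H_demo, W_D_demo = (H + H.T) / 2, (D + D.T) / 2   # toy symmetric inputs
V_demo, beta_demo = coupled_similarity(W_H_demo, W_D_demo)
```

The off-diagonal blocks contain only $\beta$ on their diagonals, so the only cross-sample edges connect a gene in one sample to the same gene in the other. This is what pulls the two embeddings into a shared latent space.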

Determining LR pair statistical significance of differential interactions between two samples

From $\hat{F}^H$ and $\hat{F}^D$, we calculated the Euclidean distances $d_{ij}^H$ and $d_{ij}^D$ for every pairwise combination of gene $i$ in cell type A and gene $j$ in cell type B in samples H and D, respectively, similar to the single-sample scenario. We then computed the squared distance difference of each gene pair across samples as $\Delta d_{ij}^2 = (d_{ij}^H - d_{ij}^D)^2$. This test statistic is higher for gene pairs that differ more between the two samples; thus, gene pairs with larger $\Delta d_{ij}^2$ are considered more differentiated. To determine the significance, we used the Chi-square test because our test statistic takes the form of a sum of squares. Here we set df = 1 to make a conservative selection of gene pairs with high precision. Using the right tail $P(X > x)$ of the Chi-square distribution, we assigned a p-value to each gene pair. Finally, we applied FDR correction to generate the adjusted p-values and selected significant gene pairs with an adjusted p-value < 0.05.
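The significance step can be sketched with the standard library alone, because for df = 1 the Chi-square right tail has the closed form $P(X > x) = \mathrm{erfc}(\sqrt{x/2})$. The function names are hypothetical. The text says only "FDR", so the Benjamini-Hochberg procedure used here is an assumption, and any rescaling of $\Delta d_{ij}^2$ before the test is not specified in the text and is left out.

```python
import math


def chi2_sf_df1(x):
    """Right-tail probability P(X > x) for a Chi-square variable with df = 1."""
    return math.erfc(math.sqrt(x / 2.0))


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda k: pvals[k])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvals[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def differential_pvalues(d_H, d_D):
    """Adjusted p-values from per-pair distances in samples H and D."""
    stats = [(h - d) ** 2 for h, d in zip(d_H, d_D)]
    raw = [chi2_sf_df1(s) for s in stats]
    return benjamini_hochberg(raw)


adj_demo = benjamini_hochberg([0.01, 0.04, 0.03, 0.5])
```

Gene pairs whose adjusted p-value falls below 0.05 would then be retained, matching the threshold stated above.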

Validate the predicted interactions between LR pairs

After the Chi-square test, significant LR interactions were queried against the OmniPath database.5 The predicted LR pairs in the database were retained for subsequent functional enrichment analyses performed using Enrichr.74

Visualization of integrated networks

Python package igraph (v0.9.6) was used to generate network plots. In each of the network plots, only direct connections between TFs and the enriched LR pair of interest were shown, and the LR interaction itself was highlighted in green. The edge thickness was adjusted to be proportional to the absolute value of the coefficient between gene pairs in the intracellular networks. Positive and negative coefficients were indicated in red and blue, respectively.

Systematic comparisons between scTenifoldXct and existing tools

We compared scTenifoldXct with NATMI,7 SingleCellSignalR,8 Connectome,11 iTALK,9 and CellChat.10 The test data set was the inflammatory skin scRNA-seq data, and the two cell types were fibroblasts and dendritic cells. For all the methods, the reference LR database was OmniPath.5 The comparison analysis was facilitated by using LIANA (v0.0.5).20 To show the overlap between significant results, the upset plot was generated using UpSetR (v1.4.0).75 Equal numbers of pairs ranked by each method's default scoring metric were retained. CytoTalk (v0.99.0)18 was executed independently to report significant signaling pathways, and LR pairs from those pathways were subsequently used for the comparison. For plotting the ROC and precision-recall curves, a total of 160 LR pairs, for which both ligand and receptor genes were expressed in the two cell types, were included in the evaluation. Among them, the 40 consensus pairs detected by all five other methods were used as positive pairs, and the remaining 120 were used as negative pairs. The computation cost of scTenifoldXct was estimated on the public platform Google Colaboratory,76 and the results are available in Supplementary Table S5.
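For readers who want to reproduce this kind of evaluation, the area under the ROC curve can be computed directly from a method's scores and the positive/negative labels. The sketch below uses the pairwise-ranking definition of AUC. The function name and the four toy values are illustrative only and are not results from this study.

```python
def roc_auc(scores, labels):
    """AUC as the probability that a positive pair outscores a negative pair."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    # Count wins of positives over negatives; ties count as half a win.
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Toy example: higher score means stronger predicted interaction.
auc_demo = roc_auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
```

In the setting described above, labels would mark the 40 consensus pairs as 1 and the remaining 120 pairs as 0, and scores would be a method's ranking metric oriented so that larger means more likely to interact.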

Supplementary Material

Highlights.

  • scTenifoldXct detects ligand-receptor (LR)-mediated cell-cell interactions.

  • scTenifoldXct is based on the manifold alignment of gene regression networks.

  • scTenifoldXct detects weak but biologically important interactions.

  • scTenifoldXct detects differential interactions between two different samples.

Acknowledgments

This research was funded by Texas A&M University 2019 X-Grants and DoD grant GW200026 for J.J.C.; and by NIH R35 CA197707, the Allen Endowed Chair in Nutrition & Chronic Disease Prevention, and support from the Institute for Advancing Health through Agriculture for R.S.C.

Footnotes

Competing Interest Statement

Y.Y., G.L., Y.Z. and J.J.C. are listed as inventors on a patent application related to this work.

References

  • 1. Macosko EZ, Basu A, Satija R, Nemesh J, Shekhar K, Goldman M, Tirosh I, Bialas AR, Kamitaki N, Martersteck EM, et al. (2015). Highly Parallel Genome-wide Expression Profiling of Individual Cells Using Nanoliter Droplets. Cell 161, 1202–1214. 10.1016/j.cell.2015.05.002.
  • 2. Zilionis R, Nainys J, Veres A, Savova V, Zemmour D, Klein AM, and Mazutis L (2017). Single-cell barcoding and sequencing using droplet microfluidics. Nat Protoc 12, 44–73. 10.1038/nprot.2016.154.
  • 3. Zheng GX, Terry JM, Belgrader P, Ryvkin P, Bent ZW, Wilson R, Ziraldo SB, Wheeler TD, McDermott GP, Zhu J, et al. (2017). Massively parallel digital transcriptional profiling of single cells. Nat Commun 8, 14049. 10.1038/ncomms14049.
  • 4. Aldridge S, and Teichmann SA (2020). Single cell transcriptomics comes of age. Nat Commun 11, 4307. 10.1038/s41467-020-18158-5.
  • 5. Turei D, Valdeolivas A, Gul L, Palacio-Escat N, Klein M, Ivanova O, Olbei M, Gabor A, Theis F, Modos D, et al. (2021). Integrated intra- and intercellular signaling knowledge for multicellular omics analysis. Mol Syst Biol 17, e9923. 10.15252/msb.20209923.
  • 6. Armingol E, Officer A, Harismendy O, and Lewis NE (2021). Deciphering cell-cell interactions and communication from gene expression. Nat Rev Genet 22, 71–88. 10.1038/s41576-020-00292-x.
  • 7. Hou R, Denisenko E, Ong HT, Ramilowski JA, and Forrest ARR (2020). Predicting cell-to-cell communication networks using NATMI. Nat Commun 11, 5011. 10.1038/s41467-020-18873-z.
  • 8. Cabello-Aguilar S, Alame M, Kon-Sun-Tack F, Fau C, Lacroix M, and Colinge J (2020). SingleCellSignalR: inference of intercellular networks from single-cell transcriptomics. Nucleic Acids Res 48, e55. 10.1093/nar/gkaa183.
  • 9. Wang Y, Wang R, Zhang S, Song S, Jiang C, Han G, Wang M, Ajani J, Futreal A, and Wang L (2019). iTALK: an R Package to Characterize and Illustrate Intercellular Communication. bioRxiv, 507871. 10.1101/507871.
  • 10. Jin S, Guerrero-Juarez CF, Zhang L, Chang I, Ramos R, Kuan CH, Myung P, Plikus MV, and Nie Q (2021). Inference and analysis of cell-cell communication using CellChat. Nat Commun 12, 1088. 10.1038/s41467-021-21246-9.
  • 11. Raredon MSB, Yang J, Garritano J, Wang M, Kushnir D, Schupp JC, Adams TS, Greaney AM, Leiby KL, Kaminski N, et al. (2022). Computation and visualization of cell-cell signaling topologies in single-cell systems data using Connectome. Sci Rep 12, 4187. 10.1038/s41598-022-07959-x.
  • 12. Efremova M, Vento-Tormo M, Teichmann SA, and Vento-Tormo R (2020). CellPhoneDB: inferring cell-cell communication from combined expression of multi-subunit ligand-receptor complexes. Nat Protoc 15, 1484–1506. 10.1038/s41596-020-0292-x.
  • 13. Ramilowski JA, Goldberg T, Harshbarger J, Kloppmann E, Lizio M, Satagopam VP, Itoh M, Kawaji H, Carninci P, Rost B, and Forrest AR (2015). A draft network of ligand-receptor-mediated multicellular signalling in human. Nat Commun 6, 7866. 10.1038/ncomms8866.
  • 14. Noel F, Massenet-Regad L, Carmi-Levy I, Cappuccio A, Grandclaudon M, Trichot C, Kieffer Y, Mechta-Grigoriou F, and Soumelis V (2021). Dissection of intercellular communication using the transcriptome-based framework ICELLNET. Nat Commun 12, 1089. 10.1038/s41467-021-21244-x.
  • 15. Wang Y (2020). talklr uncovers ligand-receptor mediated intercellular crosstalk. bioRxiv, 2020.2002.2001.930602. 10.1101/2020.02.01.930602.
  • 16. Browaeys R, Saelens W, and Saeys Y (2020). NicheNet: modeling intercellular communication by linking ligands to target genes. Nat Methods 17, 159–162. 10.1038/s41592-019-0667-5.
  • 17. Cheng J, Zhang J, Wu Z, and Sun X (2021). Inferring microenvironmental regulation of gene expression from single-cell RNA sequencing data using scMLnet with an application to COVID-19. Brief Bioinform 22, 988–1005. 10.1093/bib/bbaa327.
  • 18. Hu Y, Peng T, Gao L, and Tan K (2021). CytoTalk: De novo construction of signal transduction networks using single-cell transcriptomic data. Sci Adv 7. 10.1126/sciadv.abf1356.
  • 19. Almet AA, Cang Z, Jin S, and Nie Q (2021). The landscape of cell-cell communication through single-cell transcriptomics. Curr Opin Syst Biol 26, 12–23. 10.1016/j.coisb.2021.03.007.
  • 20. Dimitrov D, Turei D, Garrido-Rodriguez M, Burmedi PL, Nagai JS, Boys C, Ramirez Flores RO, Kim H, Szalai B, Costa IG, et al. (2022). Comparison of methods and resources for cell-cell communication inference from single-cell RNA-Seq data. Nat Commun 13, 3224. 10.1038/s41467-022-30755-0.
  • 21. Osorio D, Zhong Y, Li G, Huang JZ, and Cai JJ (2020). scTenifoldNet: A Machine Learning Workflow for Constructing and Comparing Transcriptome-wide Gene Regulatory Networks from Single-Cell Data. Patterns (N Y) 1, 100139. 10.1016/j.patter.2020.100139.
  • 22. Golovko V, Kroshchanka A, and Treadwell D (2016). The nature of unsupervised learning in deep neural networks: A new understanding and novel approach. Optical Memory and Neural Networks 25, 127–141.
  • 23. Nguyen ND, Huang J, and Wang D (2022). A deep manifold-regularized learning model for improving phenotype prediction from multi-modal data. Nat Comput Sci 2, 38–46. 10.1038/s43588-021-00185-x.
  • 24. Vu HT, Carey CJ, and Mahadevan S (2012). Manifold warping: manifold alignment over time. Proceedings of the Twenty-Sixth AAAI Conference on Artificial Intelligence. AAAI Press.
  • 25. Wang C, and Mahadevan S (2009). A General Framework for Manifold Alignment. AAAI Fall Symposium: Manifold Learning and Its Applications. Association for the Advancement of Artificial Intelligence.
  • 26. He H, Suryawanshi H, Morozov P, Gay-Mimbrera J, Del Duca E, Kim HJ, Kameyama N, Estrada Y, Der E, Krueger JG, et al. (2020). Single-cell transcriptome analysis of human skin identifies novel fibroblast subpopulation and enrichment of immune subsets in atopic dermatitis. J Allergy Clin Immunol 145, 1615–1628. 10.1016/j.jaci.2020.01.042.
  • 27. Takamura K, Fukuyama S, Nagatake T, Kim DY, Kawamura A, Kawauchi H, and Kiyono H (2007). Regulatory role of lymphoid chemokine CCL19 and CCL21 in the control of allergic rhinitis. J Immunol 179, 5897–5906. 10.4049/jimmunol.179.9.5897.
  • 28. Nedoszytko B, Sokolowska-Wojdylo M, Ruckemann-Dziurdzinska K, Roszkiewicz J, and Nowicki RJ (2014). Chemokines and cytokines network in the pathogenesis of the inflammatory skin diseases: atopic dermatitis, psoriasis and skin mastocytosis. Postepy Dermatol Alergol 31, 84–91. 10.5114/pdia.2014.40920.
  • 29. Saalbach A, Janik T, Busch M, Herbert D, Anderegg U, and Simon JC (2015). Fibroblasts support migration of monocyte-derived dendritic cells by secretion of PGE2 and MMP-1. Exp Dermatol 24, 598–604. 10.1111/exd.12722.
  • 30. Gschwandtner M, Derler R, and Midwood KS (2019). More Than Just Attractive: How CCL2 Influences Myeloid Cell Behavior Beyond Chemotaxis. Front Immunol 10, 2759. 10.3389/fimmu.2019.02759.
  • 31. Mantovani A, Gray PA, Van Damme J, and Sozzani S (2000). Macrophage-derived chemokine (MDC). J Leukoc Biol 68, 400–404.
  • 32. Saeki H, and Tamaki K (2006). Thymus and activation regulated chemokine (TARC)/CCL17 and skin diseases. J Dermatol Sci 43, 75–84. 10.1016/j.jdermsci.2006.06.002.
  • 33. Hirota T, Saeki H, Tomita K, Tanaka S, Ebe K, Sakashita M, Yamada T, Fujieda S, Miyatake A, Doi S, et al. (2011). Variants of C-C motif chemokine 22 (CCL22) are associated with susceptibility to atopic dermatitis: case-control studies. PLoS One 6, e26987. 10.1371/journal.pone.0026987.
  • 34. Garcia-Cuesta EM, Santiago CA, Vallejo-Diaz J, Juarranz Y, Rodriguez-Frade JM, and Mellado M (2019). The Role of the CXCL12/CXCR4/ACKR3 Axis in Autoimmune Diseases. Front Endocrinol (Lausanne) 10, 585. 10.3389/fendo.2019.00585.
  • 35. Stutte S, Quast T, Gerbitzki N, Savinko T, Novak N, Reifenberger J, Homey B, Kolanus W, Alenius H, and Forster I (2010). Requirement of CCL17 for CCR7- and CXCR4-dependent migration of cutaneous dendritic cells. Proc Natl Acad Sci U S A 107, 8736–8741. 10.1073/pnas.0906126107.
  • 36. Xia C, Braunstein Z, Toomey AC, Zhong J, and Rao X (2017). S100 Proteins As an Important Regulator of Macrophage Inflammation. Front Immunol 8, 1908. 10.3389/fimmu.2017.01908.
  • 37. Broome AM, Ryan D, and Eckert RL (2003). S100 protein subcellular localization during epidermal differentiation and psoriasis. J Histochem Cytochem 51, 675–685. 10.1177/002215540305100513.
  • 38. Liu FY, Zhao ZJ, Li P, Ding X, Guo N, Yang LL, Zong ZH, and Sun CF (2011). NF-kappaB participates in chemokine receptor 7-mediated cell survival in metastatic squamous cell carcinoma of the head and neck. Oncol Rep 25, 383–391. 10.3892/or.2010.1090.
  • 39. Mburu YK, Egloff AM, Walker WH, Wang L, Seethala RR, van Waes C, and Ferris RL (2012). Chemokine receptor 7 (CCR7) gene expression is regulated by NF-kappaB and activator protein 1 (AP1) in metastatic squamous cell carcinoma of head and neck (SCCHN). J Biol Chem 287, 3581–3590. 10.1074/jbc.M111.294876.
  • 40. Schonthaler HB, Guinea-Viniegra J, and Wagner EF (2011). Targeting inflammation by modulating the Jun/AP-1 pathway. Ann Rheum Dis 70 Suppl 1, i109–112. 10.1136/ard.2010.140533.
  • 41. Kim MJ, Im MA, Lee JS, Mun JY, Kim DH, Gu A, and Kim IS (2019). Effect of S100A8 and S100A9 on expressions of cytokine and skin barrier protein in human keratinocytes. Mol Med Rep 20, 2476–2483. 10.3892/mmr.2019.10454.
  • 42. Wang S, Song R, Wang Z, Jing Z, Wang S, and Ma J (2018). S100A8/A9 in Inflammation. Front Immunol 9, 1298. 10.3389/fimmu.2018.01298.
  • 43. Phadke AP, Akangire G, Park SJ, Lira SA, and Mehrad B (2007). The role of CC chemokine receptor 6 in host defense in a model of invasive pulmonary aspergillosis. Am J Respir Crit Care Med 175, 1165–1172. 10.1164/rccm.200602-256OC.
  • 44. Paradis TJ, Cole SH, Nelson RT, and Gladue RP (2008). Essential role of CCR6 in directing activated T cells to the skin during contact hypersensitivity. J Invest Dermatol 128, 628–633. 10.1038/sj.jid.5701055.
  • 45. Hedrick MN, Lonsdorf AS, Shirakawa AK, Richard Lee CC, Liao F, Singh SP, Zhang HH, Grinberg A, Love PE, Hwang ST, and Farber JM (2009). CCR6 is required for IL-23-induced psoriasis-like inflammation in mice. J Clin Invest 119, 2317–2329. 10.1172/jci37378.
  • 46. Kagami S, Kakinuma T, Saeki H, Tsunemi Y, Fujita H, Nakamura K, Takekoshi T, Kishimoto M, Mitsui H, Torii H, et al. (2003). Significant elevation of serum levels of eotaxin-3/CCL26, but not of eotaxin-2/CCL24, in patients with atopic dermatitis: serum eotaxin-3/CCL26 levels reflect the disease activity of atopic dermatitis. Clin Exp Immunol 134, 309–313. 10.1046/j.1365-2249.2003.02273.x.
  • 47. Sharma A, Seow JJW, Dutertre CA, Pai R, Bleriot C, Mishra A, Wong RMM, Singh GSN, Sudhagar S, Khalilnezhad S, et al. (2020). Onco-fetal Reprogramming of Endothelial Cells Drives Immunosuppressive Macrophages in Hepatocellular Carcinoma. Cell 183, 377–394 e321. 10.1016/j.cell.2020.08.040.
  • 48. Kramer RM, Hession C, Johansen B, Hayes G, McGray P, Chow EP, Tizard R, and Pepinsky RB (1989). Structure and properties of a human non-pancreatic phospholipase A2. J Biol Chem 264, 5768–5775.
  • 49. Suzuki N, Ishizaki J, Yokota Y, Higashino K, Ono T, Ikeda M, Fujii N, Kawamoto K, and Hanasaki K (2000). Structures, enzymatic properties, and expression of novel human and mouse secretory phospholipase A(2)s. J Biol Chem 275, 5785–5793. 10.1074/jbc.275.8.5785.
  • 50. Ishizaki J, Suzuki N, Higashino K, Yokota Y, Ono T, Kawamoto K, Fujii N, Arita H, and Hanasaki K (1999). Cloning and characterization of novel mouse and human secretory phospholipase A(2)s. J Biol Chem 274, 24973–24979. 10.1074/jbc.274.35.24973.
  • 51.Tian T, Li CL, Fu X, Wang SH, Lu J, Guo H, Yao Y, Nan KJ, and Yang YJ (2018). beta1 integrin-mediated multicellular resistance in hepatocellular carcinoma through activation of the FAK/Akt pathway. J Int Med Res 46, 1311–1325. 10.1177/0300060517740807. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 52.Zhao DH, Hong JJ, Guo SY, Yang RL, Yuan J, Wen CY, Zhou KY, and Li CJ (2004). Aberrant expression and function of TCF4 in the proliferation of hepatocellular carcinoma cell line BEL-7402. Cell Res 14, 74–80. 10.1038/sj.cr.7290205. [DOI] [PubMed] [Google Scholar]
  • 53.Huang H, Fujii H, Sankila A, Mahler-Araujo BM, Matsuda M, Cathomas G, and Ohgaki H (1999). Beta-catenin mutations are frequent in human hepatocellular carcinomas associated with hepatitis C virus infection. Am J Pathol 155, 1795–1801. 10.1016/s0002-9440(10)65496-x. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 54.Moh-Moh-Aung A, Fujisawa M, Ito S, Katayama H, Ohara T, Ota Y, Yoshimura T, and Matsukawa A (2020). Decreased miR-200b-3p in cancer cells leads to angiogenesis in HCC by enhancing endothelial ERG expression. Sci Rep 10, 10418. 10.1038/s41598-020-67425-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 55.Arbuthnot P, Kew M, and Fitschen W (1991). c-fos and c-myc oncoprotein expression in human hepatocellular carcinomas. Anticancer Res 11, 921–924. [PubMed] [Google Scholar]
  • 56.Damdinsuren B, Nagano H, Kondo M, Yamamoto H, Hiraoka N, Yamamoto T, Marubashi S, Miyamoto A, Umeshita K, Dono K, et al. (2005). Expression of Id proteins in human hepatocellular carcinoma: relevance to tumor dedifferentiation. Int J Oncol 26, 319–327. [PubMed] [Google Scholar]
  • 57.Cohen M, Giladi A, Gorki AD, Solodkin DG, Zada M, Hladik A, Miklosi A, Salame TM, Halpern KB, David E, et al. (2018). Lung Single-Cell Signaling Interaction Map Reveals Basophil Role in Macrophage Imprinting. Cell 175, 1031–1044 e1018. 10.1016/j.cell.2018.09.009. [DOI] [PubMed] [Google Scholar]
  • 58.Shibata Y, Berclaz PY, Chroneos ZC, Yoshida M, Whitsett JA, and Trapnell BC (2001). GM-CSF regulates alveolar macrophage differentiation and innate immunity in the lung through PU.1. Immunity 15, 557–567. 10.1016/s1074-7613(01)00218-7. [DOI] [PubMed] [Google Scholar]
  • 59.Guilliams M, De Kleer I, Henri S, Post S, Vanhoutte L, De Prijck S, Deswarte K, Malissen B, Hammad H, and Lambrecht BN (2013). Alveolar macrophages develop from fetal monocytes that differentiate into long-lived cells in the first week of life via GM-CSF. J Exp Med 210, 1977–1992. 10.1084/jem.20131199. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 60.Ginhoux F, and Jung S (2014). Monocytes and macrophages: developmental pathways and tissue homeostasis. Nat Rev Immunol 14, 392–404. 10.1038/nri3671. [DOI] [PubMed] [Google Scholar]
  • 61.Jones CV, Alikhan MA, O’Reilly M, Sozo F, Williams TM, Harding R, Jenkin G, and Ricardo SD (2014). The effect of CSF-1 administration on lung maturation in a mouse model of neonatal hyperoxia exposure. Respir Res 15, 110. 10.1186/s12931-014-0110-5. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 62.Wei F, Ge Y, Li W, Wang X, and Chen B (2020). Role of endothelin receptor type B (EDNRB) in lung adenocarcinoma. Thorac Cancer 11, 1885–1890. 10.1111/1759-7714.13474. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 63.Dueck H, Eberwine J, and Kim J (2016). Variation is function: Are single cell differences functionally important?: Testing the hypothesis that single cell variation is required for aggregate function. Bioessays 38, 172–180. 10.1002/bies.201500124. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 64.Osorio D, Yu X, Zhong Y, Li G, Yu P, Serpedin E, Huang JZ, and Cai JJ (2019). Single-Cell Expression Variability Implies Cell Function. Cells 9. 10.3390/cells9010014. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 65.Cao K, Hong Y, and Wan L (2021). Manifold alignment for heterogeneous single-cell multi-omics data integration using Pamona. Bioinformatics. 10.1093/bioinformatics/btab594. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 66.Jiang Q, Zhang S, and Wan L (2022). Dynamic inference of cell developmental complex energy landscape from time series single-cell transcriptomic data. PLoS Comput Biol 18, e1009821. 10.1371/journal.pcbi.1009821. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 67.Wang S, Karikomi M, MacLean AL, and Nie Q (2019). Cell lineage and communication network inference via optimization for single-cell transcriptomics. Nucleic Acids Res 47, e66. 10.1093/nar/gkz204. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 68.Qiu X, Mao Q, Tang Y, Wang L, Chawla R, Pliner HA, and Trapnell C (2017). Reversed graph embedding resolves complex single-cell trajectories. Nat Methods 14, 979–982. 10.1038/nmeth.4402. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 69.Xu Q, Li G, Osorio D, Zhong Y, Yang Y, Lin YT, Zhang X, and Cai JJ (2022). scInTime: A Computational Method Leveraging Single-Cell Trajectory and Gene Regulatory Networks to Identify Master Regulators of Cellular Differentiation. Genes (Basel) 13. 10.3390/genes13020371. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 70.Ng AHM, Khoshakhlagh P, Rojo Arias JE, Pasquini G, Wang K, Swiersy A, Shipman SL, Appleton E, Kiaee K, Kohman RE, et al. (2021). A comprehensive library of human transcription factors for cell fate engineering. Nat Biotechnol 39, 510–519. 10.1038/s41587-020-0742-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 71.Stuart T, Butler A, Hoffman P, Hafemeister C, Papalexi E, Mauck WM 3rd, Hao Y, Stoeckius M, Smibert P, and Satija R (2019). Comprehensive Integration of Single-Cell Data. Cell 177, 1888–1902 e1821. 10.1016/j.cell.2019.05.031. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 72.Stiefel E (1935). Richtungsfelder und fernparallelismus in n-dimensionalen mannigfaltigkeiten. Commentarii Mathematici Helvetici 8, 305–353. [Google Scholar]
  • 73.Liaw R, Liang E, Nishihara R, Moritz P, Gonzalez JE, and Stoica I (2018). Tune: A Research Platform for Distributed Model Selection and Training. arXiv:1807.05118. [Google Scholar]
  • 74.Chen EY, Tan CM, Kou Y, Duan Q, Wang Z, Meirelles GV, Clark NR, and Ma’ayan A (2013). Enrichr: interactive and collaborative HTML5 gene list enrichment analysis tool. BMC Bioinformatics 14, 128. 10.1186/1471-2105-14-128. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 75.Conway JR, Lex A, and Gehlenborg N (2017). UpSetR: an R package for the visualization of intersecting sets and their properties. Bioinformatics 33, 2938–2940. 10.1093/bioinformatics/btx364. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 76.Carneiro T, NóBrega RVMD, Nepomuceno T, Bian GB, Albuquerque VHCD, and Filho PPR (2018). Performance Analysis of Google Colaboratory as a Tool for Accelerating Deep Learning Applications. IEEE Access 6, 61677–61685. 10.1109/ACCESS.2018.2874767. [DOI] [Google Scholar]

Data Availability Statement

  • This paper analyzes existing, publicly available data. The accession numbers for the datasets are listed in the key resources table. Known LR pairs (n = 8,047) were retrieved from OmniPath (v1.0.4),5 a knowledge database of intracellular and intercellular signaling pathways. TF gene names (n = 1,564) were downloaded from the Human TFome database.70

  • All original code has been deposited at GitHub (https://github.com/cailab-tamu/scTenifoldXct) and is publicly available as of the date of publication. This repository has been archived at Zenodo (https://doi.org/10.5281/zenodo.7453377).

  • Any additional information required to reanalyze the data reported in this paper is available from the lead contact upon request.

Key resources table.

REAGENT or RESOURCE | SOURCE | IDENTIFIER
Antibodies
Bacterial and virus strains
Biological samples
Chemicals, peptides, and recombinant proteins
Critical commercial assays
Deposited data
Inflammatory skin data set | He, H., et al.26 | GEO: GSE147424
Hepatocellular carcinoma data set | Sharma, A., et al.47 | GEO: GSE156337
Lung microenvironment data set | Cohen, M., et al.57 | GEO: GSE119228
Experimental models: Cell lines
Experimental models: Organisms/strains
Oligonucleotides
Recombinant DNA
Software and algorithms
scTenifoldXct | This paper | DOI:10.5281/zenodo.7453377
R | https://www.cran.r-project.org | v4.0.4
Python | https://www.python.org/ | v3.9
OmniPath | Turei, D., et al.5 | v1.0.4
Seurat (R package) | https://cran.r-project.org/web/packages/Seurat | v4.0.2
Ray Tune (Python package) | https://www.ray.io/ray-tune | v1.8.0
Pytorch (Python package) | https://pytorch.org | v1.9.0
igraph (Python package) | https://igraph.org | v0.9.6
LIANA (R package) | Dimitrov, D., et al.20 | v0.0.5
UpSetR (R package) | https://cran.r-project.org/web/packages/UpSetR | v1.4.0
CytoTalk (R package) | Hu, Y., et al.18 | v0.99.0
Other
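
For readers who want to reproduce the software environment, the Python package versions in the key resources table can be compared against a local installation. The following is a minimal sketch and not part of the scTenifoldXct repository. The PyPI distribution names ("ray", "torch", "python-igraph") are assumptions, since the table gives only project names and versions, and the helper names are hypothetical.

```python
from importlib import metadata

# Versions copied from the key resources table. The PyPI distribution names
# used as keys are assumptions; the table lists only project names.
PINNED = {"ray": "1.8.0", "torch": "1.9.0", "python-igraph": "0.9.6"}


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def check_versions(pinned, lookup=installed_version):
    """Map each distribution name to (expected, found, matches)."""
    report = {}
    for name, expected in pinned.items():
        found = lookup(name)
        # Ignore local build tags such as "+cu111" when comparing.
        base = found.split("+")[0] if found else None
        report[name] = (expected, found, base == expected)
    return report


if __name__ == "__main__":
    for name, (expected, found, ok) in check_versions(PINNED).items():
        status = "ok" if ok else "MISMATCH"
        print(f"{name}: expected {expected}, found {found} [{status}]")
```

The `lookup` argument is injectable so the comparison can be exercised without the packages installed. A mismatch does not necessarily mean the analysis will fail, only that the environment differs from the one reported.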
