Author manuscript; available in PMC: 2023 Mar 28.
Published in final edited form as: Cytometry A. 2022 May 20;101(7):547–551. doi: 10.1002/cyto.a.24656

Distinguishing cell-cell complexes from dual lineage cells using single-cell transcriptomics is not trivial

Julie G Burel 1,*, Ashu Chawla 1, Jason A Greenbaum 1, Bjoern Peters 1,*
PMCID: PMC10049842  NIHMSID: NIHMS1881723  PMID: 35594038

Abstract

In their recent correspondence, Jie et al. strongly maintain that the DE cell population they discovered consists exclusively of dual lineage co-expressing cells and not of complexes of B cells and T cells, which we have previously described as frequently present in single-cell RNA sequencing data. Here, we respond to the specific arguments made in their correspondence. Specifically, we demonstrate that the presence of a gene signature in a given cell population is not enough to ascertain that it does not contain cell-cell complexes, or that it represents a biologically distinct cell type. We also show that the gene signature of DE cells contains several genes from the myeloid lineage, suggesting that DE cells are either a triple-lineage co-expressing cell population or a three-component cell aggregate. Finally, we identify multiple transcriptomic features of DE cells that correspond to B cell-T cell complexes, namely a lower average expression of B- and T-cell specific genes and a higher number of detected genes per cell. Taken together, our results demonstrate that, based solely on their scRNAseq profile, it is not possible to ascertain that DE cells are dual expressing cells and not cell-cell complexes.


We have recently discovered that cell-cell doublet populations pairing a T cell and an antigen-presenting cell (APC), such as a monocyte or a B cell, can be detected in the singlet gate of ex vivo human blood samples analyzed by flow cytometry (Burel, Pomaznoy et al. 2019, Burel, Pomaznoy et al. 2020). Strikingly, the T cell-monocyte complexes we detected showed LFA1/ICAM1 polarization at their point of contact, their frequency fluctuated over time following immune perturbations such as tuberculosis (TB) infection, and their T cell subset reflected the expected polarization of immune responses, suggesting biological relevance rather than random association (Burel, Pomaznoy et al. 2019). Importantly, these doublets were not removed by the conventional forward and side scatter gating approaches used to exclude cell aggregates in flow cytometry acquisition and cell sorting (Burel, Pomaznoy et al. 2019), and can thus be present in single-cell data. Building on our continued experience in the field since our first discovery of T cell-monocyte complexes (Burel, Pomaznoy et al. 2019), we subsequently published flow cytometry and transcriptomic signatures of cell-cell complexes to facilitate their detection in datasets derived from single-cell techniques, such as single-cell RNA sequencing (scRNAseq) (Burel, Pomaznoy et al. 2020). In particular, we demonstrated that undetected cell-cell complexes in single-cell data will appear at first glance to be a distinct “halfway” cell population with mixed lineage features of both cell components of the complex, at both the protein and mRNA level (Burel, Pomaznoy et al. 2020).

Ahmed et al. identified a dual lineage co-expressing cell population (dual expressors, or DE cells) with expression of both B- and T-cell lineage markers, at both the protein and RNA level, in the peripheral blood of Type 1 diabetes (T1D) patients (Ahmed, Omidian et al. 2019). Using newly generated flow cytometry data and the scRNAseq data from Ahmed et al., we found that cell-cell complex signatures were present in DE cells, suggesting that the phenotypic definition of DE cells may capture a dual lineage expressing cell population but also encompasses a significant proportion of cell-cell complexes pairing a B cell and a T cell (Burel, Pomaznoy et al. 2020).

In their recent correspondence (Jie, Ahmed et al. 2022), Jie et al. argue that their DE cells are never cell-cell complexes but only dual lineage co-expressing cells because 1) they have a unique gene signature that distinguishes them from singlet B cells and T cells, 2) B- and T-cell specific genes are similarly expressed in DE cells compared to singlet B cells and T cells, 3) they have a distinct Principal Component Analysis pattern compared to singlet B cells or T cells, and 4) the BCR-X clonotype (specific to DE cells) was found in an independent publicly available BCR dataset derived from conventional B cells. Here we explain why, in our opinion, each of their arguments is insufficient and/or not supported statistically.

Having a unique gene expression signature does not exclude being cell doublets

In their correspondence, Jie et al. claim that unlike CD3+CD14+ doublets, DE cells have a unique gene signature compared to singlet T cells and B cells, and thus represent a novel cell type (Jie, Ahmed et al. 2022). We did not test for a specific gene signature in CD3+CD14+ doublets in our original manuscript, as the purpose of our study was to demonstrate that when looking at genes unique to monocytes or T cells, CD3+CD14+ doublets have positive expression for both sets of genes, and thus appear to have a mixed lineage nature (Figure 1 in (Burel, Pomaznoy et al. 2020)). However, when specifically testing for genes that are differentially expressed in CD3+CD14+ doublets compared to T cells or monocytes, we can identify a distinct expression pattern (Figure 1A) and cell type score (Figure 1B) in CD3+CD14+ doublets compared to singlet monocytes and T cells. These results were highly similar to the expression pattern of the gene signature of DE cells when compared to singlet B cells and T cells, as displayed in Figure 1B from Jie et al. (Jie, Ahmed et al. 2022). Genes in the signature of CD3+CD14+ doublets included several genes associated with cell adhesion (ITGA5, SKAP2), cell proliferation (CCDC6, ETS1, NCOR2, PIN1, STAT5B), and the inflammatory p38 MAPK signaling pathway (MAPKAPK2, RELL1, SRF, TCF3) (full gene list in Table S1). The identification of a gene signature in CD3+CD14+ doublets was not surprising, as we have shown that a significant proportion of cell-cell complexes are not the result of random association and instead hold biological relevance. Doublets may thus be enriched for cells that are more likely to stick together, such as those that have been recently activated, and that carry unique transcriptomic programs. Together, our results demonstrate that the existence of a gene signature in a given cell population is not sufficient to eliminate the suspicion that it may contain cell-cell complexes.

Figure 1: The existence of a gene signature in a cell population is insufficient to claim that it is a biologically distinct cell type.

A) Gene expression heatmap and B) gene score of the top 100 differentially expressed genes in CD3+CD14+ cells compared to T cells and monocytes. ScRNAseq data were derived from 21 monocytes, 22 T cells, and 22 CD3+CD14+ cells isolated from PBMC of one healthy subject, and are available under GEO accession number GSE117435. C) Gene expression heatmap and D) gene score of the top 100 differentially expressed genes in Group 1 versus Group 2 and Group 3. Each group contained 25 cells randomly selected from a larger unpublished scRNAseq dataset of sorted CD3+CD14- T cells from five active tuberculosis patients. This dataset was sequenced and analyzed as described in GSE117435. Differentially expressed genes were identified using the two-comparison test statistic function from Qlucore. Heatmaps were drawn using Qlucore, with samples ordered by cell population and genes by hierarchical clustering. Gene scores were calculated for each cell by summing the normalized TPM counts of all genes included in the signature. E) Normalized counts for 4 myeloid genes present in the gene signature of DE cells as identified by Jie et al. (Jie, Ahmed et al. 2022). Statistical differences between cell populations were defined with the non-parametric unpaired Mann-Whitney test (*p < 0.05; **p < 0.01; ***p < 0.001; ****p < 0.0001).

The identification of a gene signature in a cell subset without subsequent validation is not necessarily meaningful

In their first publication (Ahmed, Omidian et al. 2019) as well as their most recent report (Jie, Ahmed et al. 2022), Hamad and colleagues identified the gene signature of DE cells on a small number of cells (n=77, split into three groups: DE cells, singlet B cells and singlet T cells, GSE129112) and did not perform a validation step in an independent dataset. This is problematic, as single-cell transcriptomics is known to show substantial variability across cells, samples and cohorts, and it is all too easy to identify “false positive” gene signatures, especially in small datasets with fewer than 100 cells. To illustrate this point, we randomly selected 75 cells from a T cell scRNAseq dataset (sorted CD3+CD14- T cells from five patients with active tuberculosis, unpublished dataset generated for an unrelated project) and split them into three groups of 25 cells each: Group 1, Group 2 and Group 3. Thus, each group represented a random combination of T cells from each subject, with no obvious biological distinction between groups. Using the same strategy as in Figure 1A and 1B, we identified a unique gene signature that distinguished Group 1 from the other two groups (Figure 1C and Figure 1D). This example demonstrates that the identification of a gene signature in a given cell subset is not in itself meaningful. Any signature, especially one that claims to identify a fundamentally new cell type, needs to be tested and confirmed on new data.
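The random-group experiment can be mimicked on purely simulated noise. The sketch below is not the analysis performed for Figure 1C and 1D: it assumes Gaussian noise in place of real expression values, and it ranks genes by a simple difference in means as a stand-in for the Qlucore two-comparison test statistic. All function and variable names are hypothetical.

```python
import random

def avg(values):
    """Arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)

def random_group_signature(n_cells=75, n_genes=2000, n_top=100, seed=0):
    """Select a 'signature' for an arbitrary group of cells drawn from noise."""
    rng = random.Random(seed)
    # Assumption: every cell is drawn from the same distribution, so there is
    # no true biological difference between the groups.
    cells = [[rng.gauss(0.0, 1.0) for _ in range(n_genes)] for _ in range(n_cells)]
    order = list(range(n_cells))
    rng.shuffle(order)
    third = n_cells // 3
    groups = [order[:third], order[third:2 * third], order[2 * third:]]
    group1, rest = groups[0], groups[1] + groups[2]

    # Rank genes by mean expression in Group 1 minus the other two groups.
    def mean_difference(gene):
        return (avg(cells[c][gene] for c in group1)
                - avg(cells[c][gene] for c in rest))

    signature = sorted(range(n_genes), key=mean_difference, reverse=True)[:n_top]

    # Gene score per cell: sum of expression over the signature genes.
    def gene_score(cell):
        return sum(cells[cell][gene] for gene in signature)

    return [avg(gene_score(c) for c in group) for group in groups]

group_scores = random_group_signature()
score_gap = group_scores[0] - avg(group_scores[1:])
print("Mean gene score per group:", group_scores)
print("Group 1 minus mean of Groups 2 and 3:", score_gap)
```

Because the signature genes are selected on the same cells that are then scored, Group 1 ends up with a higher mean score even though the groups are interchangeable by construction. This is why the score should be checked on cells that were not used for gene selection.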

The gene signature of DE cells contains many myeloid lineage genes

Surprisingly, while Hamad and colleagues claim that the DE cell population they discovered is unique based on its gene signature, they never discuss the genes making up that signature. From the list of genes specific to DE cells that are legible in Figure 1B from (Jie, Ahmed et al. 2022), we identified many genes that are known to be exclusively expressed by the myeloid cell lineage, such as CD14, CD33, CLEC4A and VCAN (Figure 1E). Using imaging flow cytometry, we have repeatedly observed three-cell complexes in the CD3+CD14+ population gated from live PBMC, in which the third cell partner was neither a T cell nor a monocyte (unpublished observations). Thus, DE cells are either: i) a triple-lineage co-expressing cell population with shared expression of B cell, T cell and myeloid lineage markers, or ii) complexes of T cells, B cells and myeloid cells with either two or three cell components.

The vast majority of B- and T-cell specific genes have an average expression lower in DE cells compared to B cells and T cells

In our original paper, we observed that because scRNAseq is a relative measurement of gene expression and complexes contain twice as much mRNA as singlets, expression of genes unique to one cell component of a complex is lower in the sequenced complex than in its singlet counterpart. Using the publicly available dataset of DE cells originally reported in (Ahmed, Omidian et al. 2019) (GSE129112), we showed that when restricting our analysis to the top 100 B- and T-cell specific genes, DE cells have a lower gene expression score compared to singlet T cells or B cells (Figure 4G (Burel, Pomaznoy et al. 2020)). In their correspondence, Jie et al. used a permutation test on the same dataset to show that most B- and T-cell specific genes have similar expression between DE cells and singlet T cells and B cells. They even find a handful of genes with higher expression in individual DE cells. The justification for selecting such a test is not clear, and comparing individual gene expression for individual DE cells against the average expression for B cells or T cells is not appropriate. In scRNAseq data, binary (ON/OFF) expression patterns are often encountered, meaning that within a group of cells of identical lineage, many individual cells will have null expression values for some genes previously identified as highly specific to this cell lineage using bulk RNA sequencing. Thus, it is likely that the permutation test results from Jie et al. are driven by only a few outliers, and are not a good representation of the overall DE population. This is further supported by the heatmap representation in Figure 1C from their correspondence (Jie, Ahmed et al. 2022), where the expression of B- and T-cell specific genes seems overall of much lower magnitude in DE cells compared to B cells and T cells.
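The dilution effect described at the start of this section follows directly from relative normalization, and a small numerical sketch makes it concrete. The raw counts below are invented for illustration, and `to_tpm` is a simplified, hypothetical normalization that omits the gene-length correction of a full TPM calculation.

```python
def to_tpm(counts):
    """Scale raw counts so that they sum to one million (relative abundance).

    Simplification: no gene-length correction is applied.
    """
    total = sum(counts.values())
    return {gene: 1e6 * n / total for gene, n in counts.items()}

# Invented raw transcript counts for one B cell and one T cell.
b_cell = {"CD19": 200, "CD3D": 0, "ACTB": 800}
t_cell = {"CD19": 0, "CD3D": 300, "ACTB": 700}

# Assumption: a B cell-T cell complex pools the mRNA of both cells.
complex_bt = {gene: b_cell[gene] + t_cell[gene] for gene in b_cell}

tpm_b = to_tpm(b_cell)
tpm_t = to_tpm(t_cell)
tpm_complex = to_tpm(complex_bt)

for gene in ("CD19", "CD3D"):
    print(gene, "B:", tpm_b[gene], "T:", tpm_t[gene], "complex:", tpm_complex[gene])
```

In this toy case the two cells contribute equal amounts of mRNA, so each lineage-specific gene appears in the complex at half the normalized level seen in the corresponding singlet, even though its absolute transcript count is unchanged.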
Altogether, when comparing cell populations at the single-cell level, it is important not to restrict the analysis to one gene but to include several (as we did for our cell type score calculation reported in (Burel, Pomaznoy et al. 2020)). If individual gene analysis is necessary, the average expression within a cell population should be preferred for quantification and statistical comparisons, rather than focusing on individual cells. For this particular analysis, we therefore compared the expression of the top 100 B-cell specific genes between DE cells and B cells by calculating a fold change between the average TPM expression in B cells versus DE cells. We found that 80 of the top 100 genes had a positive fold change, meaning they have a higher average expression in B cells compared to DE cells (Figure 2A, Table S2). Amongst the genes with higher average expression in B cells compared to DE cells were the key B cell lineage markers CD19, CD79B and MS4A1 (CD20). This also holds true for the comparison of DE cells with T cells, where 92 of the top 100 genes expressed by T cells have a higher average expression in T cells compared to DE cells (Figure 2B, Table S3), including the key T cell lineage markers CD2, CD3D, CD8A and IL7R. Thus, the vast majority of B- and T-cell specific genes have an average expression lower in DE cells compared to B cells and T cells, similar to what would be obtained for cell complexes pairing a B cell and a T cell.
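A population-level fold change of this kind can be sketched as follows. This is an illustration only: the expression values are invented, and both the log2 scale and the pseudocount are assumptions of this sketch, since the text does not specify how the fold change was scaled. The function name `mean_fold_changes` is hypothetical.

```python
from math import log2

def mean_fold_changes(ref_cells, test_cells, genes, pseudocount=1.0):
    """Per gene: log2(mean expression in ref_cells / mean in test_cells)."""
    result = {}
    for gene in genes:
        ref_mean = sum(cell[gene] for cell in ref_cells) / len(ref_cells)
        test_mean = sum(cell[gene] for cell in test_cells) / len(test_cells)
        # The pseudocount avoids division by zero for genes that are
        # undetected in an entire population.
        result[gene] = log2((ref_mean + pseudocount) / (test_mean + pseudocount))
    return result

# Invented normalized counts showing ON/OFF patterns across individual cells.
b_cells = [
    {"CD19": 120, "MS4A1": 0, "CD79B": 90},
    {"CD19": 0, "MS4A1": 200, "CD79B": 60},
    {"CD19": 150, "MS4A1": 180, "CD79B": 0},
]
de_cells = [
    {"CD19": 300, "MS4A1": 0, "CD79B": 0},
    {"CD19": 0, "MS4A1": 90, "CD79B": 30},
    {"CD19": 0, "MS4A1": 60, "CD79B": 45},
]

genes = ["CD19", "MS4A1", "CD79B"]
fold_changes = mean_fold_changes(b_cells, de_cells, genes)
higher_in_b = [gene for gene in genes if fold_changes[gene] > 0]
print(fold_changes)
print("Genes with higher average expression in B cells:", higher_in_b)
```

In this toy data a single DE cell with very high CD19 is enough to pull the DE average above the B cell average for that gene, which shows how one outlier cell can drive a per-gene result. Counting the direction of the fold change across many genes is less sensitive to any single cell.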

Figure 2: DE cells have multiple transcriptomic features that are hallmarks of cell-cell complexes.

A) Fold change average expression of the top 100 B-cell specific genes in B cells versus DE cells. B) Fold change average expression of the top 100 T-cell specific genes in T cells versus DE cells. C) Number of genes expressed per cell in DE cells, B cells and T cells. Any cell with a number of genes detected per cell > 3,000 was flagged as suspected doublet. PCA analysis of the 2,000 most variable genes in DE cells, B cells and T cells D) using all cells or E) after exclusion of suspected doublets. UMAP analysis of DE cells, B cells and T cells F) using all cells or G) after exclusion of suspected doublets. ScRNAseq data were derived from 20 B cells, 23 T cells and 34 DE cells isolated from PBMC of T1D subjects as described in (Ahmed, Omidian et al. 2019) and are available under GEO accession number GSE129112. Fold change average expression values were calculated using normalized TPM counts provided in the GEO submission. The remaining analyses were performed with Seurat using raw fastq files.

The distinct Principal Component Analysis pattern of DE cells is driven by outliers with gene features consistent with doublets

When performing Principal Component Analysis (PCA) on the scRNAseq data of sorted DE cells, B cells and T cells originally reported in (Ahmed, Omidian et al. 2019) (GSE129112), Jie et al. found that DE cells presented a much higher dispersion than B cells and T cells, and were not positioned halfway between them, as B cell-T cell complexes would be (Jie, Ahmed et al. 2022). The high dispersion of DE cells suggests the presence of outliers (as already noted in the previous section), and prompted us to look further into the heterogeneity of this cell population. Using the same scRNAseq dataset as Hamad and colleagues, we extracted general gene features for each cell type. Cell doublets or multiplets are expected to have higher gene counts than singlets, and in standard scRNAseq analysis pipelines such as Seurat, it is recommended to exclude cells displaying an aberrantly high number of detected genes from downstream analysis (Butler, Hoffman et al. 2018). We found that more than a third of DE cells (12 out of 34, 35%) had a higher-than-expected number of detected genes per cell (>3,000) (Figure 2C), which is the typical cutoff used in our scRNAseq analysis workflows to exclude doublets. In contrast, no T cell and only one B cell had more than 3,000 detected genes per cell. PCA on all cells showed high heterogeneity within the DE population, with some cells clustering with T cells or B cells, but others forming a separate cluster (red) that could indeed represent a distinct cell population, as Jie et al. also suggested in their analysis (Figure 2D). However, when removing the cells with more than 3,000 detected genes per cell (i.e., the suspected doublets), this separate cluster disappeared, and the remaining DE cells clustered either with B cells or with T cells (Figure 2E). This finding was even more prominent when using UMAP dimensionality reduction (Figure 2F and 2G).
Thus, the distinct PCA pattern of DE cells is driven by a subset of outlier cells that do have a signature of cell-cell complexes.
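The doublet-flagging step used above reduces to counting detected genes per cell and applying a cutoff. The sketch below illustrates that filter on invented count vectors. It is not the Seurat workflow used for Figure 2: the function names and cell labels are hypothetical, and only the >3,000 threshold is taken from the text.

```python
def genes_detected(cell_counts):
    """Number of genes with at least one count in a cell."""
    return sum(1 for n in cell_counts if n > 0)

def flag_suspected_doublets(cells, max_genes=3000):
    """Return the names of cells with more than max_genes detected genes."""
    return {name for name, counts in cells.items()
            if genes_detected(counts) > max_genes}

def toy_cell(n_detected, n_genes=20000):
    """Invented count vector with n_detected genes ON and the rest OFF."""
    return [1] * n_detected + [0] * (n_genes - n_detected)

# Hypothetical cells; the number of detected genes is chosen by hand.
cells = {
    "T_1": toy_cell(1800),
    "B_1": toy_cell(2100),
    "DE_1": toy_cell(3600),
    "DE_2": toy_cell(2000),
}

suspected = flag_suspected_doublets(cells)
kept = sorted(name for name in cells if name not in suspected)
print("Suspected doublets:", sorted(suspected))
print("Cells kept for PCA/UMAP:", kept)
```

A fixed cutoff is a heuristic, since the number of detected genes also depends on sequencing depth and cell type, so the threshold would normally be chosen by inspecting the distribution within each dataset.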

Publicly available antigen receptor repertoire datasets cannot differentiate between cell complexes and dual expressing cells

Finally, Jie et al. conclude that because the BCR-X clonotype identified in DE cells (Ahmed, Omidian et al. 2019) is present in a publicly available dataset of BCR sequences derived from conventional B cells, DE cells must be real. It is not at all clear how that finding supports the existence of dual expressors. If anything, the fact that the clonotype was found in a dataset derived from sorted singlet B cells (not expressing CD3) weighs more in favor of the doublet hypothesis. Thus, the BCR-X clonotype could still very well be expressed by a B cell that is forming a complex with a T cell, and be as “real” as it would be in a dual-lineage co-expressing cell population.

Conclusions

We believe that none of the claims from Jie et al. rule out that the DE cells they have found are primarily cell-cell complexes. Throughout this manuscript, our analysis highlighted similarities between the transcriptomic profile of DE cells and that of cell-cell complexes. As we have repeatedly stated here and in our previous publications (Burel, Pomaznoy et al. 2019, Burel, Pomaznoy et al. 2020), dual B- and T-lineage co-expressing cells may well exist. But quantifying and characterizing them will require careful gating, replication across multiple cohorts and proper statistical analysis. Without these steps, the data will be dominated by cell-cell complexes.

Supplementary Material

Table S1
Table S2
Table S3

Footnotes

Conflict of interest statement

The authors declare no conflicts of interest.

Data availability statement

The data that support the findings of this study are available from the corresponding authors, JGB and BP, upon reasonable request.

References

  1. Ahmed R, Omidian Z, Giwa A, Cornwell B, Majety N, Bell DR, Lee S, Zhang H, Michels A, Desiderio S, Sadegh-Nasseri S, Rabb H, Gritsch S, Suva ML, Cahan P, Zhou R, Jie C, Donner T and Hamad ARA (2019). “A Public BCR Present in a Unique Dual-Receptor-Expressing Lymphocyte from Type 1 Diabetes Patients Encodes a Potent T Cell Autoantigen.” Cell 177(6): 1583–1599 e1516.
  2. Burel JG, Pomaznoy M, Lindestam Arlehamn CS, Seumois G, Vijayanand P, Sette A and Peters B (2020). “The Challenge of Distinguishing Cell-Cell Complexes from Singlet Cells in Non-Imaging Flow Cytometry and Single-Cell Sorting.” Cytometry A 97(11): 1127–1135.
  3. Burel JG, Pomaznoy M, Lindestam Arlehamn CS, Weiskopf D, da Silva Antunes R, Jung Y, Babor M, Schulten V, Seumois G, Greenbaum JA, Premawansa S, Premawansa G, Wijewickrama A, Vidanagama D, Gunasena B, Tippalagama R, deSilva AD, Gilman RH, Saito M, Taplitz R, Ley K, Vijayanand P, Sette A and Peters B (2019). “Circulating T cell-monocyte complexes are markers of immune perturbations.” Elife 8.
  4. Butler A, Hoffman P, Smibert P, Papalexi E and Satija R (2018). “Integrating single-cell transcriptomic data across different conditions, technologies, and species.” Nat Biotechnol 36(5): 411–420.
  5. Jie C, Ahmed R and Hamad ARA (2022). “Expression of unique gene signature distinguishes TCRalphabeta(+) /BCR(+) dual expressers from CD3(+) CD14(+) doublets.” Cytometry A.
