Briefings in Bioinformatics. 2023 Jan 11;24(1):bbac616. doi: 10.1093/bib/bbac616

A novel Bayesian framework for harmonizing information across tissues and studies to increase cell type deconvolution accuracy

Wenxuan Deng 1,#, Bolun Li 2,3,#, Jiawei Wang 4, Wei Jiang 5, Xiting Yan 6, Ningshan Li 7, Milica Vukmirovic 8,9, Naftali Kaminski 10, Jing Wang 11, Hongyu Zhao 12
PMCID: PMC9851324  PMID: 36631398

Abstract

Computational cell type deconvolution on bulk transcriptomics data can reveal cell type proportion heterogeneity across samples. One critical factor for accurate deconvolution is the reference signature matrix for different cell types. Compared with inferring reference signature matrices from cell lines, rapidly accumulating single-cell RNA-sequencing (scRNA-seq) data provide a richer and less biased resource. However, deriving cell type signatures from scRNA-seq data is challenging due to high biological and technical noise. In this article, we introduce a novel Bayesian framework, tranSig, to improve signature matrix inference from scRNA-seq by leveraging shared cell type-specific expression patterns across different tissues and studies. Our simulations show that tranSig is robust to the number of signature genes and tissues specified in the model. Applications of tranSig to bulk RNA sequencing data from peripheral blood, bronchoalveolar lavage and aorta demonstrate its accuracy and power to characterize biological heterogeneity across groups. In summary, tranSig offers an accurate and robust approach to defining gene expression signatures of different cell types, facilitating improved in silico cell type deconvolutions.

Keywords: cell type deconvolution, scRNA-seq, reference signature matrix, harmonize information

Introduction

Characterizing cell type composition in tissues can help better understand disease pathogenesis and progression. Traditional approaches for measuring cell type proportions, such as flow cytometry [1–3] and immunohistochemistry [4, 5], involve complicated protocols, expensive antibodies, high-end platforms and expertise. Because these approaches mainly rely on antigen–antibody reactions, they are limited by the specificity of the antibodies. In addition, some of these techniques, such as cytometry-based platforms, may not always yield accurate cell type proportions because preservation rates differ across cell types [6]. With the rapid development of RNA sequencing (RNA-seq) technologies and the accumulation of bulk RNA-seq data in recent years, there has been a surge of computational deconvolution methods [7–11] that factorize tissue-level RNA-seq data, which measure mixed gene expression profiles, to infer cellular compositions. One critical component of these methods is the set of signature expression profiles (the signature matrix) of known cell types, which are largely derived from RNA-seq data of enriched or purified cell populations [12–14]. This limits the analysis to well-defined cell types and makes it hard to incorporate less-studied cell types or subtypes. In addition, the procedure of cell population enrichment and purification puts the targeted cells under stress, which may cause systematic transcriptomic changes. Taken together, the signature matrices derived using traditional approaches can be limited and/or inaccurate.

The development of single cell RNA sequencing (scRNA-seq) technologies [12–14] has enabled the characterization of transcriptomes at single cell resolution. In recent years, scRNA-seq has been applied to study cell types [15, 16], embryonic and organ development [17–19], disease mechanisms and other topics [20, 21]. With several cell atlas projects, such as the Human Cell Atlas [22, 23] and the Human Cell Landscape (HCL) [24], which aim to profile the primary tissues of healthy donors, single cell datasets are accumulating rapidly. These single cell datasets, together with bulk RNA-seq data, can facilitate cell type deconvolution by providing an accurate signature matrix at a higher resolution. However, scRNA-seq data are sparser and noisier than bulk RNA-seq data, and there are platform differences between scRNA-seq and bulk RNA-seq data. Furthermore, although cell type proportions can be obtained directly from scRNA-seq data, the estimates may be biased because capture rates differ across cell types. Therefore, there is a strong need to conduct cell type deconvolution on bulk RNA-seq data based on a signature matrix derived from scRNA-seq data.

Existing computational cell type deconvolution methods can be divided into two categories. Methods in the first category use bulk RNA-seq data both to generate the reference signature matrix and to perform the deconvolution, with CIBERSORT [8], dtangle [25], DeconRNASeq [26] and quanTIseq [27] as representative methods. These methods depend on reliable cell type annotations and informative cell-type-specific markers to generate the cell type signature matrix from bulk RNA-seq, so the cell types that can be considered are limited. Methods in the second category use scRNA-seq data to derive the signature matrix. For example, BSEQ-sc extends CIBERSORT to use a single cell dataset as reference [28]. CIBERSORTx leverages single cell data and offers both S mode and B mode to address the technical differences between scRNA-seq and bulk RNA-seq data. Bisque also estimates cell type proportions from transformed bulk data with reference to a single cell dataset [29]. Another commonly used method, MuSiC [30], selects cell type signatures through multi-subject comparisons and uses them in the subsequent deconvolution. However, current deconvolution methods using single cell datasets are limited by the quality of those datasets. As a result, most deconvolution methods focus on a limited number of tissues and cell types, such as immune cells in peripheral blood mononuclear cells (PBMCs).

As mentioned above, HCL covers a wide range of tissues but at a low sequencing depth. There is evidence that cells show conserved transcriptomes [31, 32] across tissues and developmental stages, and such shared patterns can provide additional information for signature matrix construction. Therefore, we take advantage of the variety of tissues and overcome the low sequencing depth by capitalizing on the patterns shared by the same cell type across different tissues and studies. In this article, we propose a novel Bayesian framework, called tranSig, to infer the signature matrix from scRNA-seq data by leveraging cross-tissue information. Through benchmarking comparisons and real applications on peripheral blood (PB), bronchoalveolar lavage (BAL) and aortic aneurysm, we show that tranSig facilitates more accurate estimates of cell type proportions and better interpretations of the pathogenesis of diseases.

Methods

Single cell RNA-seq batch correction: To prepare for downstream analysis, we first obtained batch-corrected expression profiles Inline graphic by comparing every reference single cell dataset Inline graphic with the target one through LIGER [33]. We then derived the empirical signature matrix from the reference tissue expression profile Inline graphic by averaging the expression levels for each cell type to obtain Inline graphic, where Inline graphic and Inline graphic is the number of reference datasets; we denote the matrix from the target tissue as Inline graphic. After comparing every reference signature matrix Inline graphic with the target signature matrix Inline graphic element-wise, we used k-means to group all the cell-type-specific signature ratios Inline graphic into three groups. The groups with ratios close to 1 or 0 are considered tissue-specific signatures. Therefore, the signatures in the reference dataset whose ratios fall in the middle range are assumed to share similar expression patterns with the target tissue and are passed to the tranSig model.
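The grouping step can be illustrated with a minimal sketch. The exact definition of the signature ratio is not recoverable from the text here, so the code assumes a bounded ratio, reference / (reference + target), which is close to 0 or 1 when one tissue dominates and near 0.5 when the two agree. The function names and the simple one-dimensional k-means are illustrative choices, not the authors' implementation.

```python
def signature_ratios(ref, target, eps=1e-9):
    """Element-wise ratio ref / (ref + target), bounded in [0, 1] (assumed form)."""
    return [[r / (r + t + eps) for r, t in zip(ref_row, tgt_row)]
            for ref_row, tgt_row in zip(ref, target)]


def kmeans_1d(values, k=3, n_iter=100):
    """Plain Lloyd's algorithm on scalars; returns (centers, labels)."""
    lo, hi = min(values), max(values)
    # Spread the initial centers evenly across the observed range.
    centers = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]
    labels = [0] * len(values)
    for _ in range(n_iter):
        labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        new_centers = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            # Keep the old center if a cluster happens to be empty.
            new_centers.append(sum(members) / len(members) if members else centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels


def shared_signature_mask(ref, target):
    """True where a (gene, cell type) ratio falls in the middle k-means group."""
    ratios = signature_ratios(ref, target)
    flat = [x for row in ratios for x in row]
    centers, labels = kmeans_1d(flat, k=3)
    middle = sorted(range(3), key=lambda c: centers[c])[1]
    n_cols = len(ref[0])
    return [[labels[i * n_cols + j] == middle for j in range(n_cols)]
            for i in range(len(ref))]


# Toy matrices: rows are genes, columns are cell types.
ref = [[10.0, 0.1], [5.0, 50.0], [8.0, 7.0]]
target = [[9.0, 20.0], [5.5, 0.2], [0.05, 7.5]]
mask = shared_signature_mask(ref, target)
```

In this toy input, entries with similar expression in both tissues land in the middle group and would be kept, while entries dominated by one tissue are treated as tissue-specific.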

tranSig Bayesian model

Given the bulk RNA-seq data of target tissue Inline graphic of Inline graphic subjects with Inline graphic signature genes, Inline graphic, we can factorize it as the product of the cell type-specific signature matrix Inline graphic and the cell type proportion matrix Inline graphic, where Inline graphic is the number of cell types. In general, cell type deconvolution methods assume Inline graphic, and Inline graphic characterizes differential signatures across cell types by averaging the expression levels of signature genes.

In addition, Inline graphic is the expected expression level of gene Inline graphic in cell type Inline graphic from tissue Inline graphic, where Inline graphic. We also impose sparsity on the signature gene matrix by introducing the Bernoulli variable Inline graphic. The Gaussian mixture distribution captures the heterogeneity of the signature matrix.

graphic file with name DmEquation1.gif
graphic file with name DmEquation2.gif

In addition, the terms in the Gaussian mixture distribution have the following prior specifications.

graphic file with name DmEquation3.gif
graphic file with name DmEquation4.gif


The other parameters are given uninformative conjugate priors. In the current implementation of tranSig, the initial Inline graphic is arbitrary, where Inline graphic represents the hyperparameter for the precision of the signature gene expression levels. A better choice of Inline graphic in future research may further improve performance.

The goal of the tranSig Bayesian model is to estimate Inline graphic and Inline graphic based on the single cell data from multiple tissues.

tranSig optimization algorithm

We employed SAME [34] to accelerate the algorithm and estimate the Gaussian mixture priors. Let Inline graphic and Inline graphic. We are primarily interested in the inference of the parameters Inline graphic and Inline graphic in the tranSig matrix. We set the strictly positive increasing integer sequence as Inline graphic. When the iteration Inline graphic, we first initialize Inline graphic. At iteration Inline graphic:

  • For Inline graphic, sample Inline graphic
  • Sample Inline graphic

Therefore, in the last iteration Inline graphic and Inline graphic, we have Inline graphic and Inline graphic. We derive the tranSig matrix by averaging Inline graphic across Inline graphic. This averaging increases confidence in the tranSig estimate, and each Inline graphic can be considered an indicator of whether Inline graphic should be taken into account. Compared with using the estimate of Inline graphic from the last iteration alone, deconvolution with the averaged Inline graphic improved substantially and was more consistent with the ground truth (Supplementary Figure S3). Here Inline graphic is a binary weight indicating whether a gene is taken as a signature, further describing the relationships between signatures and cell types.
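The averaging step can be sketched as follows. Because the symbols in this passage are not recoverable, the sketch assumes that the last iteration yields several replicated draws, each consisting of a continuous signature matrix and a matching 0/1 indicator matrix, and that the final matrix is the mean of their element-wise products. All names are hypothetical and this is not the authors' code.

```python
def average_replicated_signatures(continuous_draws, indicator_draws):
    """Average indicator-masked signature draws over replicated copies.

    continuous_draws[r][g][k]: continuous part for gene g, cell type k, copy r.
    indicator_draws[r][g][k]:  0/1 indicator for the same entry (assumed layout).
    """
    n_rep = len(continuous_draws)
    n_genes = len(continuous_draws[0])
    n_types = len(continuous_draws[0][0])
    averaged = [[0.0] * n_types for _ in range(n_genes)]
    for cont, ind in zip(continuous_draws, indicator_draws):
        for g in range(n_genes):
            for k in range(n_types):
                # An entry contributes only when its indicator is switched on.
                averaged[g][k] += ind[g][k] * cont[g][k] / n_rep
    return averaged


# Two replicated copies of a 1-gene x 2-cell-type signature.
continuous = [[[2.0, 4.0]], [[4.0, 6.0]]]
indicators = [[[1, 0]], [[1, 1]]]
avg = average_replicated_signatures(continuous, indicators)
```

Under this reading, an entry that is switched off in some copies is shrunk toward zero in the average, which is one way the indicator can act as a weight on each signature gene.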

Bulk RNA-seq batch correction: Due to the technical differences between UMI-based scRNA-seq (e.g. Microwell-seq and 10x Genomics) and bulk RNA-seq, deconvolution with a signature matrix derived from UMI-based single cell expression profiles is far from ideal. Therefore, we leverage ComBat (an empirical Bayes batch-effect removal model) and a pseudo mixture constructed from single cell expression profiles to minimize the technical variation. We re-denote bulk RNA-seq Inline graphic to be Inline graphic as the first batch and denote a pseudo mixture to be Inline graphic as the second batch. To adjust the raw bulk RNA-seq to the space of scRNA-seq by ComBat, the empirical Bayes (EB) model can be formulated as follows:

graphic file with name DmEquation10.gif
graphic file with name DmEquation11.gif

where Inline graphic is the overall gene expression, and Inline graphic and Inline graphic are the batch- and gene-specific random effects. Using the parametric model of ComBat, we obtain the adjusted bulk RNA-seq data as:

graphic file with name DmEquation12.gif

where Inline graphic are the standardized data, Inline graphic is estimated by ordinary least squares, and Inline graphic and Inline graphic are estimated iteratively by EB as the first moments of their posterior distributions.
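To make the adjustment concrete, the following is a simplified location-and-scale sketch for a single gene. It follows the same sequence of steps (overall level, standardization, additive and multiplicative batch effects, back-transformation) but replaces the EB posterior means with plain per-batch moment estimates, so it is an approximation of the idea rather than ComBat itself. The function and variable names are assumptions made for illustration.

```python
from statistics import mean, pstdev


def location_scale_adjust(values, batches):
    """Adjust one gene's values for batch effects (moment estimates, no EB shrinkage).

    Assumes every batch has at least two samples with non-identical values,
    so that the scale estimates below are strictly positive.
    """
    groups = {}
    for i, b in enumerate(batches):
        groups.setdefault(b, []).append(i)

    alpha = mean(values)  # overall expression level of the gene
    batch_mean = {b: mean([values[i] for i in idx]) for b, idx in groups.items()}
    # Pooled residual standard deviation after removing batch means.
    sigma = mean([(v - batch_mean[b]) ** 2 for v, b in zip(values, batches)]) ** 0.5
    z = [(v - alpha) / sigma for v in values]  # standardized data

    adjusted = list(values)
    for b, idx in groups.items():
        gamma = mean([z[i] for i in idx])    # additive batch effect
        delta = pstdev([z[i] for i in idx])  # multiplicative batch effect
        for i in idx:
            adjusted[i] = sigma * (z[i] - gamma) / delta + alpha
    return adjusted


# One gene measured in three bulk samples and three pseudo-mixture samples.
values = [1.0, 2.0, 3.0, 10.0, 12.0, 14.0]
batches = ["bulk"] * 3 + ["pseudo"] * 3
adj = location_scale_adjust(values, batches)
```

After this adjustment both batches share the same mean and spread for the gene. The EB step in ComBat additionally shrinks the batch-effect estimates by borrowing information across genes, which matters when batches are small.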

Pseudo-bulk construction by single cell expression profile: For bulk RNA-seq batch correction, we generated pseudo-bulk expression profiles based on single cell datasets. Single cell expression profiles were normalized in transcripts per million (TPM) space, and cells were sampled according to the empirical proportion of each cell type (10 000 times in total). The average of the 10 000 cells was considered as one pseudo-bulk sample (TPM space).
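A minimal sketch of this construction is shown below. It draws a cell type with probability equal to its empirical proportion, then a cell of that type, and averages the normalized profiles over all draws. The function names, the seed argument and the per-cell scaling to one million are illustrative assumptions.

```python
import random
from collections import defaultdict


def to_tpm(counts):
    """Scale one cell's expression vector so that it sums to one million."""
    total = sum(counts)
    if total == 0:
        return [0.0] * len(counts)
    return [c * 1e6 / total for c in counts]


def make_pseudo_bulk(cell_counts, cell_types, n_draws=10_000, seed=0):
    """Average n_draws cells sampled according to empirical cell type proportions."""
    rng = random.Random(seed)
    profiles = [to_tpm(c) for c in cell_counts]

    by_type = defaultdict(list)
    for idx, label in enumerate(cell_types):
        by_type[label].append(idx)
    labels = sorted(by_type)
    weights = [len(by_type[label]) for label in labels]  # empirical proportions

    n_genes = len(profiles[0])
    totals = [0.0] * n_genes
    for label in rng.choices(labels, weights=weights, k=n_draws):
        cell = profiles[rng.choice(by_type[label])]
        for g in range(n_genes):
            totals[g] += cell[g]
    return [t / n_draws for t in totals]


# Three toy cells with two genes each.
cells = [[1, 3], [2, 2], [0, 4]]
types = ["T cell", "T cell", "B cell"]
bulk = make_pseudo_bulk(cells, types, n_draws=500)
```

Because every sampled profile sums to one million, the averaged pseudo-bulk sample stays in the same space.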

Simulation setup

Although we do not assume the continuous part of the signature matrix Inline graphic to be non-negative, we still coerce Inline graphic to be non-negative in simulation for convenience. We sample Inline graphic from a half-normal distribution and Inline graphic from a Bernoulli distribution.

graphic file with name DmEquation13.gif
graphic file with name DmEquation14.gif

We set Inline graphic and Inline graphic.

Per the model’s assumption, the intermediate dataset-specific signature gene matrix in the first layer Inline graphic shares the same Gaussian distribution, with a mean of Inline graphic, across all tissues. Therefore, we sample the elements of the matrix Inline graphic from the following distribution:

graphic file with name DmEquation18.gif

Here we set Inline graphic.

To further simulate single cell data from multiple tissues, we sample Inline graphic from a Gamma distribution. Then we have

graphic file with name DmEquation20.gif
graphic file with name DmEquation21.gif

where Inline graphic.

Finally, we generated the simulated cell type fraction matrix Inline graphic and bulk data Inline graphic. We sampled Inline graphic from a uniform distribution and scaled it so that the fractions sum to 1. Then we obtained

graphic file with name DmEquation22.gif
graphic file with name DmEquation23.gif

where Inline graphic is the tissue-specific tranSig matrix when given a target tissue Inline graphic.
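The core of this setup can be sketched in a few lines: a signature matrix whose entries are a half-normal magnitude multiplied by a Bernoulli indicator, cell type fractions drawn from a uniform distribution and rescaled to sum to 1, and bulk data formed as their product. The parameter values in the original equations are not recoverable here, so the defaults below are placeholders. The tissue-specific layers and any noise term are omitted.

```python
import random


def simulate_bulk(n_genes=50, n_types=4, n_samples=10,
                  slab_sd=2.0, inclusion_prob=0.5, seed=0):
    """Simulate a sparse signature matrix, cell type fractions and noise-free bulk data.

    slab_sd and inclusion_prob are placeholder values, not those of the paper.
    """
    rng = random.Random(seed)

    # Signature entry = half-normal magnitude x Bernoulli indicator.
    signature = [
        [abs(rng.gauss(0.0, slab_sd)) if rng.random() < inclusion_prob else 0.0
         for _ in range(n_types)]
        for _ in range(n_genes)
    ]

    # One fraction vector per sample: uniform draws rescaled to sum to 1.
    fractions = []
    for _ in range(n_samples):
        raw = [rng.random() for _ in range(n_types)]
        total = sum(raw)
        fractions.append([r / total for r in raw])

    # bulk[g][j] = sum over cell types of signature[g][k] * fractions[j][k].
    bulk = [
        [sum(signature[g][k] * fractions[j][k] for k in range(n_types))
         for j in range(n_samples)]
        for g in range(n_genes)
    ]
    return signature, fractions, bulk


signature, fractions, bulk = simulate_bulk(n_genes=5, n_types=3, n_samples=4)
```

Any deconvolution method can then be scored by how well it recovers `fractions` from `bulk`, which is the role the ground truth plays in the benchmarking that follows.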

Benchmarking

We compared the results of the tranSig model with six deconvolution methods: non-negative least squares (NNLS), quadratic programming (QP), CIBERSORTx, MuSiC, Bisque and BSEQ-sc. NNLS, QP and BSEQ-sc take the average expression of signature genes in each cell type as input, whereas CIBERSORTx, MuSiC and Bisque take single cell expression profiles as input. For CIBERSORTx, S mode and B mode were used to correct the technical variation between scRNA-seq and bulk RNA-seq. CIBERSORTx deconvolution was run with default parameters, and single cell references were normalized in TPM space as the CIBERSORTx input. MuSiC needs multiple subjects to select signature genes, so ‘sample’ in the HCL metadata was used as the MuSiC subject parameter. However, because there was only one sample in the artery dataset of HCL, MuSiC could not be applied in the aneurysm application. The accuracy of cell type estimation was assessed by correlation, root-mean-squared error (RMSE) and Kullback–Leibler (K-L) divergence. Correlation and RMSE between estimates and the ground truth are widely used to evaluate deconvolution methods. K-L divergence measures the difference between two probability distributions, indicating the accuracy of the estimates [11].
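The three accuracy measures can be written compactly as below. This sketch assumes Pearson correlation and a K-L divergence taken from the true proportions to the estimated ones, with a small floor on the estimates to avoid division by zero. The text does not specify these details, so they are assumptions.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length vectors."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return float("nan")  # undefined when one vector is constant
    return cov / math.sqrt(vx * vy)


def rmse(x, y):
    """Root-mean-squared error between estimates and ground truth."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for proportion vectors; q is floored at eps (assumed convention)."""
    return sum(a * math.log(a / max(b, eps)) for a, b in zip(p, q) if a > 0)


truth = [0.5, 0.3, 0.2]
estimate = [0.4, 0.4, 0.2]
scores = {
    "correlation": pearson(truth, estimate),
    "rmse": rmse(truth, estimate),
    "kl": kl_divergence(truth, estimate),
}
```

Higher correlation and lower RMSE and K-L divergence indicate estimates closer to the ground truth.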

Data processing

We used the published HCL, which covers 60 tissue types, as the single cell reference for deconvolution on all tissues. We manually cleaned the cell type annotations in the 30 most common tissues from HCL, comprising 25 adult tissues, four fetal tissues and cord blood (CB). For instance, we annotated all macrophage subtypes as macrophages to accommodate the different annotation resolutions in the 30 tissue-specific single cell datasets. We also removed cells that did not have a precise annotation, i.e. unknown cell clusters and distal cells in lung. Overall, there are 96 distinct cell types across all 30 tissues (Table 1).
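The annotation cleaning amounts to a lookup from original subtype labels to unified cell types, with some labels dropped. The sketch below uses a small subset of the mappings listed in Table 1. The dictionary, the case-insensitive matching and the function name are illustrative assumptions rather than the authors' code.

```python
# Illustrative subset of Table 1; None marks labels whose cells are removed.
SUBTYPE_TO_CELL_TYPE = {
    "m1 macrophage": "Macrophage",
    "m2 macrophage": "Macrophage",
    "macrophage": "Macrophage",
    "cd8 t cell": "T cell",
    "nk cell": "Natural killer cell",
    "vascular endothelial cell": "Endothelial cell",
    "unknown": None,
    "distal cell": None,
}


def harmonize(labels):
    """Return (cell index, unified label) pairs for cells kept after cleaning."""
    kept = []
    for i, label in enumerate(labels):
        key = label.strip().lower()
        # Labels without an entry are kept unchanged.
        unified = SUBTYPE_TO_CELL_TYPE.get(key, label.strip())
        if unified is not None:
            kept.append((i, unified))
    return kept


cells = ["M2 Macrophage", "Unknown", "NK cell", "Fibroblast"]
kept = harmonize(cells)
```

Returning the original indices makes it straightforward to subset the expression matrix to the retained cells.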

Table 1. Unified cell type annotations of the HCL dataset across tissues.

Subtype new_cell_type Tissue
Neutrophil Neutrophil Adult-Adipose
Mast cell Mast cell Adult-Adipose
Stromal cell Stromal cell Adult-Adipose
Adipocyte Adipocyte Adult-Adipose
Proliferating cell Proliferating cell Adult-Adipose
Adipocyte Adipocyte Adult-Adipose
M2 Macrophage Macrophage Adult-Adipose
T cell T cell Adult-Adrenal-Gland
inflammatory cell Inflammatory cell Adult-Adrenal-Gland
Proliferating cell Proliferating cell Adult-Adrenal-Gland
Neutrophil Neutrophil Adult-Adrenal-Gland
Stromal cell Stromal cell Adult-Adrenal-Gland
Endothelial cell Endothelial cell Adult-Adrenal-Gland
Zona fasciculata cell Zona fasciculata cell Adult-Adrenal-Gland
Erythroid cell Erythroid cell Adult-Adrenal-Gland
Dendritic cell Dendritic cell Adult-Adrenal-Gland
Macrophage Macrophage Adult-Adrenal-Gland
Smooth muscle cell Smooth muscle cell Adult-Adrenal-Gland
Endothelial Endothelial cell Adult-Artery
Smooth muscle cell Smooth muscle cell Adult-Artery
B cell (Plasmocyte) B cell Adult-Artery
Endothelial cell Endothelial cell Adult-Artery
Epithelial cell Epithelial cell Adult-Artery
Fibroblast Fibroblast Adult-Artery
Basal cell Basal cell Adult-Artery
T cell T cell Adult-Artery
Mast cell Mast cell Adult-Artery
Stromal cell Stromal cell Adult-Artery
M1 Macrophage Macrophage Adult-Artery
Macrophage Macrophage Adult-Artery
Oligodendrocyte Oligodendrocyte Fetal-Brain
Purkinje cell Purkinje cell Fetal-Brain
Neuron Neuron Fetal-Brain
Proliferating radial glia Radial glia Fetal-Brain
Unknown (DELETED) Fetal-Brain
Fibroblast Fibroblast Fetal-Brain
Microglia Microglia Fetal-Brain
Erythroid cell Erythroid cell Fetal-Brain
Radial glia Radial glia Fetal-Brain
Ependymal cell Ependymal cell Fetal-Brain
Stromal cell Stromal cell Fetal-Brain
Oligodendrocyte progenitor cell Oligodendrocyte Fetal-Brain
Macrophage Macrophage Fetal-Brain
Proliferating cell Proliferating cell Fetal-Brain
Astrocyte Astrocyte Fetal-Brain
Endothelial cell Endothelial cell Fetal-Brain
Neutrophil Neutrophil Fetal-Brain
Macrophage Macrophage Adult-Colon
Unknown (DELETED) Adult-Colon
Enterocyte Enterocyte Adult-Colon
T cell T cell Adult-Colon
Smooth muscle cell Smooth muscle cell Adult-Colon
Mast cell Mast cell Adult-Colon
Enteric glial cell Enteric glial cell Adult-Colon
B cell (Plasmocyte) B cell Adult-Colon
Stromal cell Stromal cell Adult-Colon
Goblet cell Goblet cell Adult-Colon
Enterocyte progenitor Enterocyte Adult-Colon
Fibroblast Fibroblast Adult-Esophagus
Lymphocyte Lymphocyte Adult-Esophagus
B cell (Plasmocyte) B cell Adult-Esophagus
Basal cell Basal cell Adult-Esophagus
Unknown (DELETED) Adult-Esophagus
Epithelial cell Epithelial cell Adult-Esophagus

Endothelial cell Endothelial cell Adult-Esophagus
Smooth muscle cell Smooth muscle cell Adult-Esophagus
Mast cell Mast cell Adult-Esophagus
Mucosal cell Mucosal cell Adult-Esophagus
MT high cell MT high cell Adult-Esophagus
Macrophage Macrophage Adult-Esophagus
B cell B cell Adult-Esophagus
Neutrophil Neutrophil Adult-Esophagus
Stromal cell Stromal cell Adult-Esophagus
Keratinocyte Keratinocyte Adult-Esophagus
Neutrophil Neutrophil Adult-Esophagus
Neutrophil Neutrophil Adult-Heart
Smooth muscle cell Smooth muscle cell Adult-Heart
M2 Macrophage Macrophage Adult-Heart
M1 Macrophage Macrophage Adult-Heart
Apoptotic cell Apoptotic cell Adult-Heart
Conventional dendritic cell Dendritic cell Adult-Heart
Mast cell Mast cell Adult-Heart
T cell T cell Adult-Heart
Cardiomyocyte Cardiomyocyte Adult-Heart
Vascular endothelial cell Endothelial cell Adult-Heart
Ventricle cardiomyocyte Cardiomyocyte Adult-Heart
Fibroblast Fibroblast Adult-Heart
Dendritic cell Dendritic cell Adult-Heart
Endothelial cell Endothelial cell Adult-Heart
Macrophage Macrophage Adult-Heart
Distal tubule cell Tubule cell Adult-Kidney
Thick ascending limb of the loop of Henle Loop of Henle Adult-Kidney
Smooth muscle cell Smooth muscle cell Adult-Kidney
Macrophage Macrophage Adult-Kidney
B cell (Plasmocyte) B cell Adult-Kidney
Proximal tubule cell Tubule cell Adult-Kidney
Mast cell Mast cell Adult-Kidney
Fenestrated endothelial cell Endothelial cell Adult-Kidney
Unknown (DELETED) Adult-Kidney
Neutrophil Neutrophil Adult-Kidney
IC-tran-PC IC-tran-PC Adult-Kidney
Principle cell Principle cell Adult-Kidney
Fibroblast Fibroblast Adult-Kidney
Myeloid cell Myeloid cell Adult-Kidney
Endothelial cell Endothelial cell Adult-Kidney
Ureteric epithelial cell Epithelial cell Adult-Kidney
Glomerular endothelial cell Endothelial cell Adult-Kidney
Epithelial cell Epithelial cell Adult-Kidney
B cell(Plasmocyte) B cell Adult-Kidney
Conventional dendritic cell Dendritic cell Adult-Kidney
Epithelial Epithelial cell Adult-Kidney
Dendritic cell Dendritic cell Adult-Kidney
Loop of Henle Loop of Henle Adult-Kidney
B cell B cell Adult-Kidney
Loop of Henle Loop of Henle Adult-Kidney
Intercalated cell Intercalated cell Adult-Kidney
Myocyte Myocyte Adult-Kidney
T cell T cell Adult-Kidney
Ciliated cell Ciliated cell Adult-Lung
B cell (Plasmocyte) B cell Adult-Lung
Myeloid cell Myeloid cell Adult-Lung
AT1 cell AT1 cell Adult-Lung
Dendritic cell Dendritic cell Adult-Lung
Epithelial cell Epithelial cell Adult-Lung
Plasmocyte Plasmocyte Adult-Lung
Arterial endothelial cell Endothelial cell Adult-Lung
Macrophage Macrophage Adult-Lung
Alveolar bipotent progenitor(cell cycle) Alveolar bipotent Adult-Lung

Clara cell Clara cell Adult-Lung
Bronchial chondrocyte Bronchial chondrocyte Adult-Lung
Smooth muscle cell Smooth muscle cell Adult-Lung
B cell B cell Adult-Lung
Neutrophil Neutrophil Adult-Lung
NKT cell NKT cell Adult-Lung
Fibroblast Natural killer cell Adult-Lung
Natural killer cell Natural killer cell Adult-Lung
Megakaryocyte Megakaryocyte Adult-Lung
Unknown (DELETED) Adult-Lung
Endothelial cell Endothelial cell Adult-Lung
Proliferating cell Proliferating cell Adult-Lung
Conventional dendritic cell Dendritic cell Adult-Lung
Mast cell Mast cell Adult-Lung
Lymphatic endothelial cell Endothelial cell Adult-Lung
Bronchial Epithelial cell Epithelial cell Adult-Lung
AT2 cell AT2 cell Adult-Lung
Activated T cell (DELETED) Adult-Lung
Proliferating alveolar bipotent progenitor cell Alveolar bipotent Adult-Lung
M2 macrophage Macrophage Adult-Lung
Proliferating T cell Proliferating T cell Adult-Lung
Artery endothelial cell Endothelial cell Adult-Lung
Stromal cell Stromal cell Adult-Lung
T cell T cell Adult-Lung
AT1 cell AT1 cell Adult-Lung
Smooth muscle cell Smooth muscle cell Adult-Muscle
Endothelial cell Endothelial cell Adult-Muscle
Stromal cell Stromal cell Adult-Muscle
Fast skeletal muscle cell Fast skeletal muscle cell Adult-Muscle
Neutrophil Neutrophil Adult-Muscle
T cell T cell Adult-Muscle
B cell (Plasmocyte) B cell Adult-Muscle
Muscle progenitor cell Muscle progenitor cell Adult-Muscle
Unknown (DELETED) Adult-Muscle
Fibroblast Fibroblast Adult-Muscle
Myogenic precursor cell Myogenic precursor cell Adult-Muscle
NK cell Natural killer cell Adult-Muscle
Conventional dendritic cell Dendritic cell Adult-Muscle
Proliferating cell Proliferating cell Adult-Muscle
Mast cell Mast cell Adult-Muscle
M2 Macrophage Macrophage Adult-Muscle
Fibroblast Fibroblast Adult-Pancreas
Smooth muscle cell Smooth muscle cell Adult-Pancreas
Alpha cell Alpha cell Adult-Pancreas
Endothelial cell Endothelial cell Adult-Pancreas
Beta cell Beta cell Adult-Pancreas
M2 Macrophage Macrophage Adult-Pancreas
Acinar cell (DELETED) Adult-Pancreas
Exocrine cell Exocrine cell Adult-Pancreas
Ductal cell Ductal cell Adult-Pancreas
Acinar cell Acinar cell Adult-Pancreas
Neuroendocrine cell Neuroendocrine cell Adult-Prostate
Endothelial cell Endothelial cell Adult-Prostate
M1 Macrophage Macrophage Adult-Prostate
Smooth muscle cell Smooth muscle cell Adult-Prostate
Epithelial cell Epithelial cell Adult-Prostate
Unknown epithelial cell Epithelial cell Adult-Prostate
Unknown (DELETED) Adult-Prostate
Intermediate epithelial cell Epithelial cell Adult-Prostate
Basal cell Basal cell Adult-Prostate
Luminal cell Luminal cell Adult-Prostate
T cell T cell Adult-Prostate
Neutrophil Neutrophil Adult-Prostate
Fibroblast Fibroblast Adult-Prostate
M2 macrophage Macrophage Adult-Spleen

Lymphoid progenitor cell Lymphoid progenitor cell Adult-Spleen
Neutrophil Neutrophil Adult-Spleen
B cell (centrocyte) B cell Adult-Spleen
Erythroid cell Erythroid cell Adult-Spleen
B cell (Plasmocyte) B cell Adult-Spleen
CD8 T cell Adult-Spleen
T cell T cell Adult-Spleen
Endothelial cell Endothelial cell Adult-Spleen
Smooth muscle cell Smooth muscle cell Adult-Stomach
CD8 T cell T cell Adult-Stomach
Gastric mucosa cell Gastric mucosa cell Adult-Stomach
Mast Mast cell Adult-Stomach
Macrophage Macrophage Adult-Stomach
Chromaffin cell Chromaffin cell Adult-Stomach
Epithelial cell Epithelial cell Adult-Stomach
Fibroblast Fibroblast Adult-Stomach
Parietal cell Parietal cell Adult-Stomach
Stromal cell Stromal cell Adult-Stomach
Pit cell Pit cell Adult-Stomach
B cell(plasmocyte) B cell Adult-Stomach
B cell (Plasmocyte) B cell Adult-Stomach
Gastric chief cell Gastric chief cell Adult-Stomach
Endothelial cell Endothelial cell Adult-Stomach
Granulocyte Granulocyte Adult-Stomach
Inflammatory cell Inflammatory cell Adult-Stomach
D cell/ X/A cell D cell/ X/A cell Adult-Stomach
T cell T cell Adult-Stomach
Myeloid cell Myeloid cell Adult-Stomach
Mast cell Mast cell Adult-Stomach
Follicular cell Follicular cell Adult-Thyroid
Follicular B cell B cell Adult-Thyroid
Thyroid epithelial cell Epithelial cell Adult-Thyroid
NK cell Natural killer cell Adult-Thyroid
T cell T cell Adult-Thyroid
Proliferating cell Proliferating cell Adult-Thyroid
Stromal cell Stromal cell Adult-Thyroid
Smooth muscle cell Smooth muscle cell Adult-Thyroid
B cell (Plasmocyte) B cell Adult-Thyroid
Plasmacytoid dendritic cell Dendritic cell Adult-Thyroid
Thyroid follicular cell Follicular cell Adult-Thyroid
Fibroblast Fibroblast Adult-Thyroid
Endothelial cell Endothelial cell Adult-Thyroid
Conventional dendritic cell Dendritic cell Adult-Thyroid
Neutrophil Neutrophil Adult-Thyroid
Endothelial cell in EMT Endothelial cell Adult-Uterus
Endothelial cell Endothelial cell Adult-Uterus
T cell T cell Adult-Uterus
Mast cell Mast cell Adult-Uterus
M1 Macrophage Macrophage Adult-Uterus
Unknown (DELETED) Adult-Uterus
Stromal cell Stromal cell Adult-Uterus
Endometrial cell Endometrial cell Adult-Uterus
Smooth muscle cell Smooth muscle cell Adult-Uterus
Unknown epithelial cell Epithelial cell Adult-Uterus
Fibroblast Fibroblast Adult-Uterus
Vascular smooth muscle cell Vascular smooth muscle cell Adult-Uterus
Epithelial cell Epithelial cell Adult-Uterus
Proliferating cell Proliferating cell Adult-Peripheral-Blood
Neutrophil Neutrophil Adult-Peripheral-Blood
Plasmacytoid dendritic cell Dendritic cell Adult-Peripheral-Blood
Monocyte Monocyte Adult-Peripheral-Blood
Macrophage Macrophage Adult-Peripheral-Blood
NK cell Natural killer cell Adult-Peripheral-Blood
Eosinophil Eosinophil Adult-Peripheral-Blood

activative T cell activative T cell Adult-Peripheral-Blood
CD8+ T cell T cell Adult-Peripheral-Blood
CD4 T cell Adult-Peripheral-Blood
B cell(Centrocyte) B cell Adult-Peripheral-Blood
B cell(Plasmocyte) B cell Adult-Peripheral-Blood
Proliferating T cell T cell Adult-Peripheral-Blood
CD8 T cell Adult-Peripheral-Blood
Myeloid progenitor cell Myeloid progenitor cell Adult-Peripheral-Blood
Erythroid cell Erythroid cell Adult-Peripheral-Blood
Conventional dendritic cell Dendritic cell Adult-Peripheral-Blood
B cell B cell Adult-Peripheral-Blood
Dendritic cell Dendritic cell Adult-Peripheral-Blood
Proliferating B cell Proliferating B cell Adult-Peripheral-Blood
T cell T cell Adult-Peripheral-Blood
Erythroid cell Erythroid cell Fetal-Skin
Vascular endothelial cell Endothelial cell Fetal-Skin
Keratinocyte Keratinocyte Fetal-Skin
Lymphatic endothelial cell Endothelial cell Fetal-Skin
Neutrophil Neutrophil Fetal-Skin
Melanocyte Melanocyte Fetal-Skin
Osteoblast Osteoblast Fetal-Skin
Smooth muscle cell Smooth muscle cell Fetal-Skin
Fibroblast Fibroblast Fetal-Skin
Mesenchymal cell Mesenchymal cell Fetal-Skin
Mast cell Mast cell Fetal-Skin
Proliferating cell Proliferating cell Fetal-Skin
Dermis fibroblast Fibroblast Fetal-Skin
M2 Macrophage Macrophage Fetal-Skin
Macrophage Macrophage Fetal-Skin
Endothelial cell Endothelial cell Fetal-Skin
Skeletal muscle cell Skeletal muscle cell Fetal-Skin
Enterocyte progenitor Enterocyte Adult-Ileum
B cell (Plasmocyte) B cell Adult-Ileum
Goblet cell Goblet cell Adult-Ileum
T cell T cell Adult-Ileum
Neuron Neuron Adult-Ileum
Macrophage Macrophage Adult-Ileum
Fibroblast Fibroblast Adult-Ileum
Stromal cell Stromal cell Adult-Ileum
Endothelial cell Endothelial cell Adult-Ileum
Mast cell Mast cell Adult-Ileum
Paneth cell Paneth cell Adult-Ileum
Enterocyte Enterocyte Adult-Ileum
Myeloid cell Myeloid cell Adult-Ileum
Conventional dendritic cell Dendritic cell Adult-Ileum
Smooth muscle cell Smooth muscle cell Adult-Ileum
Epithelial cell Epithelial cell Fetal-Intestine
Proliferating cell Proliferating cell Fetal-Intestine
Enterocyte progenitor Enterocyte Fetal-Intestine
Enteroendocrine cell Enteroendocrine cell Fetal-Intestine
Enterocyte Enterocyte Fetal-Intestine
Goblet cell Goblet cell Fetal-Intestine
Endothelial cell Endothelial cell Fetal-Intestine
Fibroblast Fibroblast Fetal-Intestine
Macrophage Macrophage Fetal-Intestine
Stromal cell Stromal cell Fetal-Intestine
Neuron Neuron Fetal-Intestine
Smooth muscle cell Smooth muscle cell Fetal-Intestine
Enterocyte Enterocyte Fetal-Intestine
Myeloid cell Myeloid cell Fetal-Intestine
Erythroid cell Erythroid cell Fetal-Intestine
Vascular endothelial cell Endothelial cell Fetal-Intestine
Fibroblast Fibroblast Fetal-Intestine
Erythroid cell Erythroid cell Fetal-Intestine

Lymphatic endothelial cell Endothelial cell Fetal-Intestine
T cell T cell Fetal-Intestine
Dendritic cell Dendritic cell Fetal-Intestine
B cell B cell Fetal-Intestine
Antigen-presenting cell Antigen-presenting cell Fetal-Intestine
B cell (Plasmocyte) B cell Adult-Bladder
Endothelial cell (non-professional APC) Endothelial cell Adult-Bladder
Fibroblast Fibroblast Adult-Bladder
Macrophage Macrophage Adult-Bladder
Neutrophil Neutrophil Adult-Bladder
Smooth muscle cell Smooth muscle cell Adult-Bladder
T cell T cell Adult-Bladder
Urothelial cell Urothelial cell Adult-Bladder
M2 Macrophage Macrophage Adult-Bladder
Mast cell Mast cell Adult-Bladder
NK cell Natural killer cell Adult-Bladder
Stromal cell Stromal cell Adult-Bladder
Urothelial cell Urothelial cell Adult-Bladder
Vascular endothelial cell Endothelial cell Adult-Bladder
B cell (Centrocyte) B cell Adult-Bone-Marrow
Conventional dendritic cell Dendritic cell Adult-Bone-Marrow
Erythroid progenitor cell Erythroid cell Adult-Bone-Marrow
M2 Macrophage Macrophage Adult-Bone-Marrow
Neutrophil Neutrophil Adult-Bone-Marrow
NK cell Natural killer cell Adult-Bone-Marrow
B cell (Plasmocyte) B cell Adult-Bone-Marrow
Erythroid cell Erythroid cell Adult-Bone-Marrow
HSPC HSPC Adult-Bone-Marrow
Monocyte Monocyte Adult-Bone-Marrow
Neutrophil Neutrophil Adult-Bone-Marrow
T cell T cell Adult-Bone-Marrow
Astrocyte Astrocyte Adult-Cerebellum
Astrocyte(Bergmann glia) Astrocyte Adult-Cerebellum
B cell B cell Adult-Cerebellum
Endothelial cell Endothelial cell Adult-Cerebellum
Epithelial cell Epithelial cell Adult-Cerebellum
Excitatory neuron Neuron Adult-Cerebellum
Inhibitory neuron Neuron Adult-Cerebellum
Interneuron Neuron Adult-Cerebellum
Macrophage Macrophage Adult-Cerebellum
Microglia Microglia Adult-Cerebellum
Neutrophil Neutrophil Adult-Cerebellum
Oligodendrocyte Oligodendrocyte Adult-Cerebellum
Oligodendrocyte progenitor cell Oligodendrocyte Adult-Cerebellum
Smooth muscle cell Smooth muscle cell Adult-Cerebellum
Stromal cell Stromal cell Adult-Cerebellum
T cell T cell Adult-Cerebellum
B cell (Centrocyte) B cell Cord-Blood
B cell(Centrocyte) B cell Cord-Blood
B cell(Plasmocyte) B cell Cord-Blood
B cell(Unknown) B cell Cord-Blood
Conventional dendritic cell Dendritic cell Cord-Blood
Dendritic cell Dendritic cell Cord-Blood
Eosinophil Eosinophil Cord-Blood
Erythroid cell Erythroid cell Cord-Blood
Erythroid/Basophil Progenitor Erythroid cell Cord-Blood
HSPC HSPC Cord-Blood
Megakaryocyte Megakaryocyte Cord-Blood
Monocyte Monocyte Cord-Blood
Neutrophil Neutrophil Cord-Blood
NK cell Natural killer cell Cord-Blood
Plasmacytoid dendritic cell Dendritic cell Cord-Blood
Proliferating cell Proliferating cell Cord-Blood
T cell T cell Cord-Blood

Dendritic cell Dendritic cell Cord-Blood-CD34P
Eosinophil Eosinophil Cord-Blood-CD34P
Erythroid/Basophil Progenitor Erythroid cell Cord-Blood-CD34P
HSPC HSPC Cord-Blood-CD34P
Megakaryocyte Megakaryocyte Cord-Blood-CD34P
Neutrophil Neutrophil Cord-Blood-CD34P
NK cell Natural killer cell Cord-Blood-CD34P
Proliferating cell Proliferating cell Cord-Blood-CD34P
Monocyte Monocyte Cord-Blood-CD34P
Airway smooth muscle cell Smooth muscle cell Fetal-Lung
Basal stem cell Basal cell Fetal-Lung
CD8 T cell T cell Fetal-Lung
Chondrocyte Bronchial chondrocyte Fetal-Lung
Distal cell (DELETED) Fetal-Lung
Distal progenitor cell (DELETED) Fetal-Lung
Endothelial cell Endothelial cell Fetal-Lung
Erythroid cell Erythroid cell Fetal-Lung
Fibroblast Fibroblast Fetal-Lung
Lung mesenchyme cell (cardiopulmonary progenitor) Mesenchymal cell Fetal-Lung
Macrophage Macrophage Fetal-Lung
Megakaryocyte/Erythroid progenitor cell Megakaryocyte Fetal-Lung
Neuron Neuron Fetal-Lung
Neutrophil Neutrophil Fetal-Lung
NK cell Natural killer cell Fetal-Lung
Pericyte Pericyte Fetal-Lung
Proliferating cell Proliferating cell Fetal-Lung
Proliferating lung mesenchyme cell Mesenchymal cell Fetal-Lung
Proliferating smooth muscle cell Smooth muscle cell Fetal-Lung
Proliferating T cell Proliferating T cell Fetal-Lung
Proximal progenitor cell (DELETED) Fetal-Lung
Smooth muscle cell Smooth muscle cell Fetal-Lung
T cell T cell Fetal-Lung
Vascular smooth muscle cell Vascular smooth muscle cell Fetal-Lung
B cell (Plasmocyte) B cell Adult-Liver
Hepatocyte Hepatocyte Adult-Liver
Sinusoidal endothelial cell Sinusoidal endothelial cell Adult-Liver
Activated T cell activative T cell Adult-Liver
Myeloid cell Myeloid cell Adult-Liver
Vascular endothelial cell Endothelial cell Adult-Liver
Neutrophil Neutrophil Adult-Liver
Motile liver macrophage Motile liver macrophage Adult-Liver
Kupffer cell Kupffer cell Adult-Liver
Macrophage macrophage Adult-Liver
Epithelial cell Epithelial cell Adult-Liver
Mast cell Mast cell Adult-Liver
Dendritic cell Dendritic cell Adult-Liver
Conventional dendritic cell Dendritic cell Adult-Liver
Kupffer cell Kupffer cell Adult-Liver
Smooth muscle cell Smooth muscle cell Adult-Liver
Proliferating cell (DELETED) Adult-Liver
Granulocyte Granulocyte Adult-Liver
B cell (Plasmocyte) B cell Adult-Duodenum
T cell T cell Adult-Duodenum
Enterocyte Enterocyte Adult-Duodenum
Goblet cell Goblet cell Adult-Duodenum
Gastric chief cell Gastric chief cell Adult-Duodenum
Mast cell Mast cell Adult-Duodenum
Enterocyte progenitor Enterocyte Adult-Duodenum
Contamination (DELETED) Adult-Duodenum
Endothelial cell Endothelial cell Adult-Duodenum
Fibroblast Fibroblast Adult-Duodenum
Macrophage Macrophage Adult-Duodenum
Immune response stromal cell Immune response stromal cell Adult-Gallbladder
Macrophage Macrophage Adult-Gallbladder

Epithelial progenitor cell Epithelial cell Adult-Gallbladder
T cell T cell Adult-Gallbladder
Endothelial cell Endothelial cell Adult-Gallbladder
Stromal cell Stromal cell Adult-Gallbladder
Inflammatory stromal cell Inflammatory stromal cell Adult-Gallbladder
Mucous epithelial cell Mucosal cell Adult-Gallbladder
Epithelial cell Epithelial cell Adult-Gallbladder
Mast cell Mast cell Adult-Gallbladder
Neutrophil Neutrophil Adult-Gallbladder
Smooth muscle cell Smooth muscle cell Adult-Gallbladder
Antigen presenting cell (DELETED) Adult-Gallbladder
Lymphocyte (DELETED) Adult-Gallbladder
Dendritic cell Dendritic cell Adult-Gallbladder
Fibroblast Fibroblast Adult-Gallbladder
B cell(Plasmocyte) B cell Adult-Gallbladder
B cell(Plasmocyte) B cell Adult-Jejunum
Enterocyte Enterocyte Adult-Jejunum
Enterocyte progenitor Enterocyte Adult-Jejunum
Paneth cell Paneth cell Adult-Jejunum
Goblet cell Goblet cell Adult-Jejunum
Fibroblast Fibroblast Adult-Jejunum
X/A cell X/A cell Adult-Jejunum
Dendritic cell Dendritic cell Adult-Jejunum
T cell T cell Adult-Jejunum
Macrophage Macrophage Adult-Jejunum
Endothelial cell Endothelial cell Adult-Jejunum
Mast cell Mast cell Adult-Jejunum
Smooth muscle cell Smooth muscle cell Adult-Jejunum
Enterocyte Enterocyte Adult-Rectum
B cell(Plasmocyte) B cell Adult-Rectum
B cell B cell Adult-Rectum
Mast cell Mast cell Adult-Rectum
Enteric glial cell Enteric glial cell Adult-Rectum
Inflamed epithelial cell Epithelial cell Adult-Rectum
Macrophage Macrophage Adult-Rectum
T cell T cell Adult-Rectum
Goblet cell Goblet cell Adult-Rectum
Stromal cell Stromal cell Adult-Rectum
Smooth muscle cell Smooth muscle cell Adult-Rectum

To further validate the tranSig model in real studies, we used bulk RNA-seq of ascending aorta media from healthy donors and aneurysm tissues [35] from ascending aneurysm patients, and compared the cell type proportions corresponding to aneurysm pathological changes between the two groups. To evaluate the effect of sequencing depth and single cell platform, we used single cell datasets from three healthy donors and eight ascending thoracic aortic aneurysm (ATAA) patients generated with 10× Genomics. The dataset includes the main cell types in the aorta, i.e. endothelial cells, smooth muscle cells, fibroblasts, macrophages, T cells, B cells, NK cells, mast cells and plasma cells.

For real applications, we used bulk RNA-seq data of whole blood from 12 healthy adults from the Stanford Blood Center [36], with ground truth validated by FACS and immunofluorescence.

The BAL bulk RNA-seq data [37] were obtained from Vukmirovic et al. (2021). The samples were collected from 184 individuals in a sarcoidosis patient cohort by the Genomic Research in Alpha-1 Antitrypsin Deficiency and Sarcoidosis (GRADS) study [38]. BAL is composed of four cell populations: alveolar macrophages (AMs), eosinophils, lymphocytes and neutrophils. The mean proportion of AMs is 89%. Therefore, a reliable cell type deconvolution tool should identify the AMs as the dominant cell type in bulk RNA-seq data.

The aneurysm bulk RNA-seq data [39] were from Chen et al. (2020). The samples included aortic tissues from six healthy donors and aneurysm tissues from six patients with ascending aortic aneurysm. For aortic tissues, two samples were taken from each donor, from the middle and distal parts. For aneurysms, the neck and belly were collected from each patient.

All the bulk datasets were normalized by TPM.
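As a reminder of what TPM normalization does, the following is a minimal sketch for a single sample. The function name `tpm` and the toy counts and gene lengths are illustrative assumptions, not taken from the study's pipeline.

```python
def tpm(counts, lengths_kb):
    """Convert raw read counts of one sample to transcripts per million.

    counts     -- raw read counts, one entry per gene
    lengths_kb -- gene (transcript) lengths in kilobases, same order
    """
    # Step 1: reads per kilobase removes the gene-length bias.
    rpk = [c / length for c, length in zip(counts, lengths_kb)]
    scale = sum(rpk)
    if scale == 0:
        return [0.0 for _ in rpk]
    # Step 2: rescale so that the values of the sample sum to one million.
    return [r / scale * 1e6 for r in rpk]


# Toy example (hypothetical numbers): three genes in one bulk sample.
example_tpm = tpm([100, 300, 600], [1.0, 3.0, 2.0])
print(example_tpm)
```

Because every sample is rescaled to the same total, TPM values are comparable across samples in terms of relative abundance, which is what the deconvolution step relies on.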

Signature gene list construction

Based on the HCL datasets, significantly highly expressed genes of each cell type from the five tissues [artery, lung, PBMC, bone marrow (BM) and liver] used in the tranSig model were identified with the ‘FindAllMarkers’ function of the Seurat R package. We set 0.25 as the cutoff for logarithmic fold changes and kept only positive fold changes via the ‘only.pos’ parameter of the ‘FindAllMarkers’ function. We used the union of these differentially expressed genes (DEGs) as the signature gene list for the tranSig model (Table 2), which covers most of the immune cells. However, we recommend generating a customized signature gene list by finding DEGs when immune cells are not the only major cell populations in the target tissue.
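The selection logic described above (positive markers above a log fold-change cutoff, then a union across tissues) can be sketched as follows. This is not the Seurat implementation: it applies only the fold-change filter and omits the statistical test that ‘FindAllMarkers’ also performs. The function names, the log2 scale with a pseudocount of 1, and the toy expression values are assumptions made for illustration.

```python
import math


def positive_markers(mean_expr, cutoff=0.25):
    """Select positive marker genes per cell type within one tissue.

    mean_expr -- {cell_type: {gene: mean expression}}
    Returns {cell_type: set of genes whose log2 fold change against the
    average of the other cell types exceeds `cutoff`}.
    """
    markers = {}
    for cell_type, profile in mean_expr.items():
        others = [p for name, p in mean_expr.items() if name != cell_type]
        selected = set()
        if others:
            for gene, value in profile.items():
                rest = sum(p.get(gene, 0.0) for p in others) / len(others)
                # Assumed convention: log2 fold change with a pseudocount of 1.
                log_fc = math.log2(value + 1.0) - math.log2(rest + 1.0)
                if log_fc > cutoff:  # keep positive markers only
                    selected.add(gene)
        markers[cell_type] = selected
    return markers


def signature_union(tissues, cutoff=0.25):
    """Union of positive markers over all cell types and all tissues."""
    genes = set()
    for mean_expr in tissues.values():
        for selected in positive_markers(mean_expr, cutoff).values():
            genes |= selected
    return sorted(genes)


# Toy input (hypothetical values): two tissues, two cell types each.
toy = {
    "lung": {"T cell": {"CD3D": 9.0, "LYZ": 0.5, "ACTB": 20.0},
             "Monocyte": {"CD3D": 0.2, "LYZ": 12.0, "ACTB": 20.0}},
    "liver": {"T cell": {"CD3D": 7.0, "CD14": 0.1, "ACTB": 18.0},
              "Monocyte": {"CD3D": 0.1, "CD14": 6.0, "ACTB": 18.0}},
}
print(signature_union(toy))
```

In the toy input, a gene expressed equally in both cell types (ACTB) has a fold change of zero and is excluded, while cell-type-enriched genes from either tissue enter the union.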

Table 2.

The signature gene list

ACTB NPM1 ARRB1 IL7 BMX
ACTG1 NUP214 ASGR1 IL7R C1QA
AGTRAP OAZ1 ASGR2 ITK C1QB
AIF1 OST4 ATP8B4 KCNA3 CA4
ANXA1 OSTC AZU1 KCNG2 CACNA2D3
ANXA3 P4HB BACH2 KIR2DL1 CALB2
ANXA5 PARK7 BARX2 KIR2DL4 CALML4
AP1S2 PCBP1 BCL11B KIR2DS4 CAMK4
APOBEC3A PDIA4 BCL7A KIR3DL2 CASP1
ARHGDIB PDIA6 BEND5 KLRC3 CD14
ARL4C PEBP1 BFSP1 KLRC4 CD163
ARPC1B PECAM1 BHLHE41 KLRF1 CD207
ARPC2 PFDN5 BLK KLRG1 CDC14B
ARPC3 PFN1 BMP2K KLRK1 CDC42EP4
ASAH1 PGK1 BPI KYNU CDK2AP2
ATP6V1F PILRA BRAF LAG3 CDR2L
B2M PIM2 BRSK2 LAIR2 CEACAM1
BANK1 PLAC8 BST1 LAMP3 CES1
BCL2A1 PLBD1 BTNL8 LAT CFB
BIRC3 PLD4 C11orf80 LCK CHMP7
BLVRB PLPP5 C1orf54 LEF1 CIB2
BTG1 POMP C3AR1 LHCGR CIITA
C12orf75 POU2F2 C5AR2 LILRA2 CLCC1
C1orf162 PPIA CA8 LILRA4 CLCF1
C4orf3 PPIB CASP5 LIME1 CLEC5A
C5AR1 PPT1 CCDC102B LRMP CNNM1
CALM1 PRDX1 CCL1 LTA CNOT1
CALM2 PRDX4 CCL13 LY9 COL4A3
CALR PSAP CCL14 MAK CP
CAMP PSMA2 CCL17 MAN1A1 CRISPLD2
CAPG PSMA3 CCL18 MANEA CRLF2
CARD16 PSMB2 CCL19 MAP3K13 CYP4F3
CD1C PSME2 CCL20 MAP4K1 CYSLTR1
CD24 PYCARD CCL22 MAP4K2 DEFB1
CD27 RABAC1 CCL23 MAP9 DENND3
CD36 RACK1 CCL4 MARCO DPYD
CD37 RAN CCL5 MAST1 DUSP4
CD38 RETN CCL7 MEFV DYSF
CD3D RGS2 CCL8 MEP1A EDN1
CD48 RHOA CCND2 MGAM EMILIN2
CD53 RHOC CCR10 MICAL3 ESPL1
CD59 RNASE2 CCR2 MMP12 EVL
CD63 RNASE3 CCR3 MMP25 FARS2
CD74 RNASE6 CCR5 MMP9 FBLN1
CD79A RNASET2 CCR6 MROH7 FCHO1
CD79B RNF130 CCR7 MS4A2 FGFR3
CD99 ROMO1 CD160 MS4A3 FLT4
CDA RPL10 CD180 MSC FMO5
CDKN1C RPL10A CD19 MXD1 FST
CFD RPL11 CD1A MYB FSTL1
CFL1 RPL12 CD1B NAALADL1 FUT3
CHCHD2 RPL13 CD1D NCR3 FXYD6
CLEC10A RPL13A CD1E NFE2 GAS7
CLEC12A RPL14 CD2 NIPSNAP3B GATA2
CLEC4E RPL15 CD209 NKG7 GBP1
CLEC7A RPL18 CD22 NLRP3 GCH1
CMC1 RPL18A CD244 NME8 GFOD1
COPE RPL19 CD247 NOD2 GIMAP4
CORO1A RPL21 CD28 NOX3 GJB1
COTL1 RPL22 CD300A NPAS1 GLRX2
COX4I1 RPL22L1 CD33 NPIPB15 GP5
COX5B RPL23 CD3E NPL GPNMB
COX6A1 RPL23A CD3G NR4A3 HAGH
COX6C RPL24 CD4 NTRK1 HAVCR1
COX7A2 RPL27 CD40 ORC1 HBD
COX7B RPL27A CD40LG OSM HGD

CPVL RPL29 CD5 P2RX1 HOMER2
CSF3R RPL3 CD6 P2RX5 HOXA2
CST3 RPL30 CD68 P2RY10 HTRA1
CSTA RPL31 CD69 P2RY13 IFI27
CSTB RPL32 CD7 P2RY14 IFNB1
CTSA RPL34 CD70 P2RY2 IFT20
CTSB RPL35 CD72 PADI4 IGFBP2
CTSC RPL35A CD80 PAQR5 IL15RA
CTSD RPL36 CD86 PASK IL1R2
CTSS RPL36AL CD8A PBXIP1 IL1RAP
CUTA RPL37 CD8B PCDHA5 IL32
CUX1 RPL37A CD96 PDCD1 IL6ST
CYBA RPL39 CDC25A PDCD1LG2 ING2
CYBB RPL4 CDH12 PDE6C IRS1
CYCS RPL41 CDHR1 PDK1 JRKL
DAD1 RPL5 CDK6 PGLYRP1 KCNJ15
DEFA3 RPL7 CEACAM3 PIK3IP1 KIF22
DEFA4 RPL7A CEACAM8 PKD2L2 KIR3DL1
DERL3 RPL8 CEMP1 PLA1A KLHL18
DNAJA1 RPL9 CFP PLA2G7 KRT19
DUSP1 RPLP0 CHI3L1 PLCH2 KRT5
DUSP11 RPLP1 CHI3L2 PLEKHF1 KSR1
DYNLRB1 RPLP2 CHST15 PLEKHG3 LAIR1
DYNLT1 RPN1 CHST7 PMCH LILRA5
EAF2 RPN2 CLC PNOC LILRB1
EDF1 RPS10 CLEC2D PPBP LIMA1
EEF1B2 RPS11 CLEC4A PPFIBP1 LIMK2
EEF1D RPS12 CLIC2 PRF1 LRP5L
EEF2 RPS14 CMA1 PRG2 LRRC8D
EIF1 RPS15 COL8A2 PRR5L LSM4
EMB RPS15A COLQ PSG2 MAG
ERH RPS16 CPA3 PTGDR MAL
ERP29 RPS17 CR2 PTGER2 MAOA
EVI2B RPS18 CREB5 PTGIR MAPK7
FCER1A RPS19 CRISP3 PTPRG MAT2B
FCER1G RPS2 CRTAM QPCT MEST
FCGR3A RPS20 CRYBB1 RAB27B MME
FCGRT RPS21 CSF1 RALGPS2 MMP8
FCMR RPS23 CSF2 RASA3 MOCS3
FCN1 RPS25 CST7 RASGRP2 MPO
FGFBP2 RPS26 CTLA4 RASGRP3 MPPED2
FGL2 RPS27 CTSG RASSF4 MRPL3
FGR RPS27A CTSW RCAN3 MRPL4
FKBP11 RPS28 CXCL10 REN MT1X
FKBP2 RPS29 CXCL11 RENBP MTMR11
FOLR3 RPS3 CXCL13 REPS2 MTSS1
FOS RPS3A CXCL3 RGS1 MUC1
FPR1 RPS4X CXCL5 RGS13 MYLIP
FTL RPS5 CXCL9 RRP12 NAGA
GAPDH RPS6 CXCR1 RRP9 NBN
GCA RPS7 CXCR2 RSAD2 NBR1
GLIPR1 RPS8 CXCR5 RYR1 NDRG2
GLRX RPS9 CXCR6 S1PR5 NEFL
GMFG RPSA CYP27A1 SAMSN1 NOTCH4
GNAS S100A11 CYP27B1 SCN9A NPEPPS
GNG7 S100A12 DACH1 SEC31B NR2E3
GRN S100A6 DAPK2 SERGEF NR4A2
GSTP1 S100A8 DCSTAMP SH2D1A NRG1
GTF3A S100A9 DENND5B SIGLEC1 NRGN
GZMA S100P DEPDC5 SIK1 NUDT1
GZMB SAMHD1 DGKA SIRPG NUDT18
GZMK SAT1 DHRS11 SIT1 NXT1
HCK SDCBP DHX58 SKA1 NXT2
HERPUD1 SDF2L1 DPEP2 SKAP1 OLFM1

HINT1 SEC11C DPP4 SLAMF1 ORM1
HLA-A SEC61B DSC1 SLAMF8 OSBPL10
HLA-DMA SEC61G DUSP2 SLC12A8 PALLD
HLA-DMB SEC62 EBI3 SLC15A3 PANX1
HLA-DPA1 SELL EFNA5 SLC2A6 PAX5
HLA-DPB1 SERF2 EGR2 SLC7A10 PCGF2
HLA-DQA1 SERPINA1 ELANE SLCO5A1 PDGFB
HLA-DQA2 SERPINF1 EPB41 SMPD3 PDK4
HLA-DQB1 SH3BGRL EPHA1 SMPDL3B PHEX
HLA-DRA SH3BGRL3 EPN2 SOCS1 PI3
HLA-DRB1 SLC25A6 ETS1 SP140 PIK3CG
HLA-DRB5 SLIRP ETV3 SPAG4 PLAT
HM13 SMDT1 FAM124B SPOCK2 POU2AF1
HMGN1 SNRPD2 FAM174B ST3GAL6 PPA1
HMOX1 SNRPG FASLG ST6GALNAC4 PROM1
HSBP1 SNU13 FBXL8 ST8SIA1 PRR5
HSP90AA1 SNX3 FCER2 STAP1 PSAT1
HSP90B1 SOD2 FCGR2B STEAP4 PTPN13
HSPA5 SPCS1 FCGR3B STXBP6 PTPRK
IFI44L SPCS2 FCRL2 TBX21 PTPRS
IFITM1 SPCS3 FES TCF7 PTTG2
IFITM2 SPI1 FFAR2 TCL1A QPRT
IFITM3 SPIB FLT3LG TEC RAB9A
IGLL5 SRP14 FLVCR2 TEP1 RAMP1
IGSF6 SSR2 FOSB TGM5 RARRES2
ILF2 SSR3 FOXP3 TLR2 RNASE1
IRF7 SSR4 FPR2 TLR7 RNASE4
IRF8 STXBP2 FPR3 TLR8 RNF122
ISG15 SUB1 FRK TMEM255A RRAS
ISG20 SYNGR2 FRMD4A TNFAIP6 S100B
ITM2B TAGLN2 FRMD8 TNFRSF10C SCRN1
ITM2C TALDO1 FZD2 TNFRSF11A SDC1
JAML TIMP1 FZD3 TNFRSF13B SEC63
JCHAIN TKT GAL3ST4 TNFRSF4 SERPINF2
KCTD12 TMBIM6 GFI1 TNFSF14 SETBP1
KDELR2 TMED9 GGT5 TNIP3 SF3A3
KLRB1 TMEM156 GIPR TPSAB1 SFTPD
KLRD1 TMEM176B GNLY TRAF4 SFXN3
LCN2 TMEM258 GPC4 TRAT1 SH3BP2
LCP1 TMSB10 GPR1 TREM1 SIDT1
LDHB TMSB4X GPR171 TREM2 SIGLEC6
LGALS2 TNFRSF17 GPR18 TREML2 SLC12A3
LILRB2 TNFSF10 GPR183 TRIB2 SLC17A5
LIMD2 TNFSF13B GPR19 TRPM4 SLC1A4
LMAN2 TPI1 GPR25 TRPM6 SLC38A1
LSP1 TRMT112 GPR65 TSHR SLC4A1AP
LST1 TSPO GRAP2 TTC38 SLC6A13
LTA4H TUBA1B GYPE TXK SLC7A7
LTB TXN GZMH UBASH3A SLC9A3R1
LY6E TYMP GZMM UPK3A SLCO2B1
LY86 UBA52 HAL VILL SMARCD3
LY96 UBE2J1 HDC VNN1 SOCS2
LYZ UBE2N HESX1 VNN2 STAB2
MANF UCP2 HHEX VNN3 SYNE1
MCL1 UFM1 HIC1 WNT5B TAGLN
MIF UQCR11 HK3 WNT7A TBC1D8
MNDA UQCRH HLA-DOB ZAP70 TFEC
MPEG1 UQCRQ HNMT ZBP1 TGM3
MRPL33 VAMP8 HOXA1 ZBTB10 TLL1
MRPL52 VCAN HPGDS ZBTB32 TLR5
MS4A1 VPREB3 HPSE ZNF135 TMC6
MS4A6A XBP1 HRH1 ZNF165 TMEM9B
MS4A7 YWHAB HSPA6 ZNF222 TMF1
MT-ATP6 ZFP36L2 HTR2B ZNF286A TNFRSF25
MT-ATP8 ZNF706 ICA1 ZNF324 TNNI2


Results

tranSig framework

To leverage information from multiple single cell references, we designed a new framework, called tranSig, based on transfer learning to handle cross-platform and cross-tissue variations to derive a more accurate signature matrix for downstream cell type deconvolution (Figure 1).

Figure 1.


Illustration of the tranSig framework. The framework starts by selecting signatures from reference single cell datasets for downstream harmonization. Based on LIGER, we filter out genes in reference tissues that are unlikely to share common expression distributions with the target tissue. All the selected signatures and their corresponding single cell expression profiles are then input into the tranSig Bayesian model to derive a more reliable signature matrix. In parallel, we remove batch effects between the bulk and target single cell datasets with Combat to project the bulk data onto the space of the target single cell dataset. Finally, the tranSig signature matrix and the corrected bulk RNA-seq data can be coupled with other cell type deconvolution optimization tools, e.g. NNLS and CIBERSORTx.

As mentioned above, existing methods mostly generate their signature matrices from a single scRNA-seq dataset of the same tissue type as the bulk dataset, denoted as the target tissue. Because of technical batch effects, it is challenging to combine data from different tissues, or from different studies on the same tissue. In tranSig, we assemble the scRNA-seq data of the target tissue with those of other tissues or studies, denoted as the reference tissues/studies, into an integrated dataset to derive the cell type signature matrix. For instance, when performing tranSig on the bulk data of PB, we took PB as the target tissue, and BM, CB, lung, kidney and liver as reference tissues. Specifically, we considered the HCL [24] and manually cleaned the cell type annotations to cover 25 adult and five fetal tissue types as sources for both target and reference tissues. We project the reference scRNA-seq datasets onto the data of the target tissue by adopting the matrix factorization-based single cell batch correction method, LIGER [33, 40]. In addition, we identify the cell-type-conserved signatures after batch-effect correction and remove tissue-specific signatures in the reference datasets. To accomplish this, as detailed in the Methods section, we compare every reference signature matrix with the target signature matrix to select cell-type-conserved signatures that may share a common distribution with the target tissue. Finally, we keep only the cell-type-conserved expression profiles in the references that have a similar distribution to the target tissue.

Our hierarchical Bayesian model takes the batch-effect-corrected single cell expression profiles as input. To deconvolve bulk RNA-seq data, we first identify its tissue type and denote it as the target tissue type. We then assume that the expression level of each gene in a given cell type from a given tissue follows a Gaussian distribution whose mean follows a Gaussian mixture distribution shared across tissues. The mixture distribution serves to distinguish the signature genes from the others. With the implementation of the State-Augmentation for Marginal Estimation (SAME) [34], the algorithm is substantially accelerated.

Table 2.

Continued

MT-CO2 ABCB4 ICOS ZNF442 TOMM22
MT-CO3 ABCB9 IDO1 ABCA5 TOMM34
MT-CYB ACAP1 IFNG ABCB1 TRAF3IP2
MT-ND1 ACHE IL12B ABHD5 TRAK1
MT-ND2 ACP5 IL12RB2 ACSM3 TRIB1
MYDGF ADAM28 IL17A ADAM19 TSPAN7
MYL6 ADAMDEC1 IL18R1 ADAMTS5 TUBB6
MZB1 ADAMTS3 IL18RAP ADI1 TULP2
NAAA ADRB2 IL1A AGPAT5 TYRO3
NACA AIM2 IL1B ALAS1 ULK2
NAP1L1 ALOX15 IL1RL1 ALPL WEE1
NCF2 ALOX5 IL21 ANK3 YTHDF3
NDUFA1 AMPD1 IL26 AOC2 ZC3H12A
NDUFA11 ANGPT4 IL2RA APOE ZDHHC13
NDUFA4 ANKRD55 IL2RB ARID4A ZNF180
NDUFB1 APOBEC3G IL3 ARNT2 ZNF189
NDUFB4 APOL3 IL4 ASRGL1 ZNF34
NDUFB6 APOL6 IL4R ATP2A1 ZNF552
NME1 AQP9 IL5 ATP2B1 ZNF593
NPC2 ARHGAP22 IL5RA BEX1

To better align the bulk RNA-seq data with the signature matrix inferred from reference scRNA-seq data, we perform batch correction similar to the CIBERSORTx S mode [36], which uses pseudo-bulk mixtures derived from scRNA-seq to reduce the batch effects between scRNA-seq and bulk RNA-seq. We generate pseudo-bulk mixtures by sampling cells from the scRNA-seq dataset of the target tissue and apply the EB batch-effect removal model, Combat [41], to adjust the bulk RNA-seq mixtures. The adjusted bulk RNA-seq data are taken as the input for downstream deconvolution along with the tranSig signature matrix.
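The pseudo-bulk step can be illustrated with a short sketch. It covers only the generation of one pseudo-bulk mixture, not the Combat adjustment. Sampling with replacement, the function name `pseudo_bulk` and the toy cells are assumptions, since the text does not specify the sampling scheme.

```python
import random


def pseudo_bulk(cells, n_cells, seed=0):
    """Build one pseudo-bulk mixture from single cell profiles.

    cells   -- list of (cell_type, {gene: count}) tuples
    n_cells -- number of cells to draw (with replacement, an assumption)
    Returns (mixture, proportions): summed counts per gene and the
    cell type proportions of the sampled cells.
    """
    rng = random.Random(seed)
    sampled = [rng.choice(cells) for _ in range(n_cells)]
    mixture = {}
    type_counts = {}
    for cell_type, profile in sampled:
        type_counts[cell_type] = type_counts.get(cell_type, 0) + 1
        for gene, count in profile.items():
            # Summing counts over cells mimics bulk sequencing of the pool.
            mixture[gene] = mixture.get(gene, 0) + count
    proportions = {ct: n / n_cells for ct, n in type_counts.items()}
    return mixture, proportions


# Toy single cell data (hypothetical counts); every cell has 10 counts.
toy_cells = [
    ("T cell", {"CD3D": 8, "LYZ": 0, "ACTB": 2}),
    ("T cell", {"CD3D": 7, "LYZ": 1, "ACTB": 2}),
    ("Monocyte", {"CD3D": 0, "LYZ": 9, "ACTB": 1}),
]
mix, props = pseudo_bulk(toy_cells, n_cells=50, seed=1)
print(mix, props)
```

Because the true proportions of each pseudo-bulk mixture are known, such mixtures provide a bridge between the single cell space and the bulk space when estimating the batch effect.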

Overall, with the above batch effect corrections, both the original bulk RNA-seq and the scRNA-seq dataset of reference tissues/studies were aligned with the scRNA-seq data of the target tissue.

We view the tranSig framework as an add-on step for cell type deconvolution. Therefore, it can be coupled with any existing cell type deconvolution method, e.g. NNLS or CIBERSORTx, to estimate cell type proportions.
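To make the coupling concrete, the sketch below estimates proportions from a signature matrix and a bulk profile by non-negative least squares. It is a minimal illustration under stated assumptions: it solves the NNLS problem by projected gradient descent rather than with the active-set solver that standard NNLS routines use, and the function name, iteration count and toy signature matrix are hypothetical.

```python
def nnls_proportions(signature, bulk, n_iter=5000):
    """Estimate cell type proportions by non-negative least squares.

    signature -- list of rows, one per gene, each a list with one value
                 per cell type (the signature matrix S)
    bulk      -- list with one expression value per gene (the mixture b)
    Minimizes ||S p - b||^2 subject to p >= 0 by projected gradient
    descent, then rescales p to sum to one.
    """
    n_genes = len(signature)
    n_types = len(signature[0])
    # Precompute S^T S and S^T b.
    gram = [[sum(signature[g][i] * signature[g][j] for g in range(n_genes))
             for j in range(n_types)] for i in range(n_types)]
    stb = [sum(signature[g][i] * bulk[g] for g in range(n_genes))
           for i in range(n_types)]
    trace = sum(gram[i][i] for i in range(n_types))
    if trace == 0:
        return [0.0] * n_types
    # The trace bounds the largest eigenvalue of S^T S, so this step is safe.
    step = 1.0 / trace
    coef = [0.0] * n_types
    for _ in range(n_iter):
        grad = [sum(gram[i][j] * coef[j] for j in range(n_types)) - stb[i]
                for i in range(n_types)]
        # Gradient step followed by projection onto the non-negative orthant.
        coef = [max(0.0, c - step * g) for c, g in zip(coef, grad)]
    total = sum(coef)
    return [c / total for c in coef] if total > 0 else coef


# Toy signature matrix (hypothetical): four genes, two cell types.
toy_signature = [[10.0, 0.0], [0.0, 8.0], [5.0, 5.0], [1.0, 2.0]]
true_props = [0.7, 0.3]
toy_bulk = [sum(s * p for s, p in zip(row, true_props)) for row in toy_signature]
print(nnls_proportions(toy_signature, toy_bulk))
```

In this setting the signature matrix is the only input that changes between the empirical approach and tranSig, which is why the framework can be combined with different downstream solvers.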

Robustness evaluation through simulations

We have performed simulations to evaluate the robustness of our proposed tranSig model. Since it is challenging to simulate cross-platform or cross-tissue effects, we focused on assessing the robustness with the assumption that batch effects in both bulk RNA-seq and multi-tissue scRNA-seq have been successfully removed. We assumed that the true signature matrix of the target tissue type (Inline graphic) contains two parts: a matrix Inline graphic with binary entries indicating whether each gene is an expressed cell-type-specific signature gene, and a matrix Inline graphic with continuous entries quantifying the average expression levels of signature genes. The signatures from all the input single cell datasets, including both the target and reference datasets, share the same underlying distribution, which is the product of Inline graphic and Inline graphic. Specifically, we simulated Inline graphic from a Gaussian distribution with mean Inline graphic and let Inline graphic be the mean of Gaussian distribution of Inline graphic

We note the possibility that an expressed signature gene may not be detected due to the prevalent dropout events in scRNA-seq data. Droplet-based single cell RNA-seq can theoretically detect 5000 genes per cell as the saturated number of detected genes [42], but the number of detected genes in each cell is often lower than the saturated number in published studies because of low sequencing depth (median number of detected genes: 256–602 in HCL; 1973 in ATAA [35]). Therefore, in our simulations, we set the expression level of an expressed gene to zero with a certain probability, corresponding to the undetected rate. A higher undetected rate can result in a loss of signature information in the empirical signature matrix constructed by averaging the single cell expression profiles grouped by cell type. We compared the performance of the tranSig model coupled with both NNLS and CIBERSORTx with that of other methods, including MuSiC, CIBERSORTx with the empirical signature matrix as input (empirical + CIBERSORTx) and NNLS from MuSiC without weights. When the ground truth is available, a higher correlation between the estimates and the ground truth suggests better performance.
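The effect of the undetected rate on the empirical signature matrix can be reproduced with a small simulation. This is a sketch of the general idea, not the authors' simulation code: the names `z_row` (binary signature indicators) and `mu_row` (mean expression levels), the unit standard deviation, the truncation at zero and all numbers are assumptions.

```python
import random


def simulate_cells(z_row, mu_row, n_cells, undetected_rate, sd=1.0, seed=0):
    """Simulate single cell profiles of one cell type with dropout.

    z_row           -- 0/1 indicator per gene (is it a signature gene?)
    mu_row          -- mean expression level per gene
    undetected_rate -- probability that a value is set to zero
    """
    rng = random.Random(seed)
    cells = []
    for _ in range(n_cells):
        cell = []
        for z, mu in zip(z_row, mu_row):
            # The mean is the product of the indicator and the expression
            # level; negative draws are truncated at zero (an assumption).
            value = max(0.0, rng.gauss(z * mu, sd))
            if rng.random() < undetected_rate:
                value = 0.0  # dropout: the gene is not detected in this cell
            cell.append(value)
        cells.append(cell)
    return cells


def empirical_signature(cells):
    """Average the profiles of one cell type gene by gene."""
    n = len(cells)
    return [sum(cell[g] for cell in cells) / n for g in range(len(cells[0]))]


# One signature gene (mean 10) and one non-signature gene.
z_row, mu_row = [1, 0], [10.0, 10.0]
for rate in (0.0, 0.25, 0.5):
    cells = simulate_cells(z_row, mu_row, n_cells=2000, undetected_rate=rate)
    print(rate, empirical_signature(cells))
```

The averaged value of the signature gene shrinks roughly in proportion to the undetected rate, which is the loss of signature information that the empirical signature matrix suffers from.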

We investigated how the undetected rate and the number of signature genes affect cell type deconvolution (Figure 2A). Compared with the other methods, tranSig combined with CIBERSORTx (tranSig + CIBERSORTx) achieved more accurate cell type proportion estimates, with higher correlations between the true and estimated cell type proportions. In the left panel, the performance of tranSig + CIBERSORTx was the most stable across different undetected rates. When a signature matrix is constructed by differential expression (DE) analysis, the number of signature genes depends on the method and threshold selected in the DE analysis. We therefore also investigated how the number of signature genes influenced cell type proportion estimation to evaluate the robustness of the methods. The right panel in Figure 2A shows that the two tranSig methods and CIBERSORTx had the best performance with a relatively small number of signature genes, e.g. 150. In contrast, MuSiC and NNLS in the MuSiC R package required multiple subjects and cross-subject variation in single cell expression profiles to make an accurate estimation. With respect to the number of tissues, estimation accuracy increased when information was transferred from reference tissues by tranSig (Supplementary Figure S1).

Figure 2.


Model robustness assessment through simulations. (A) Robustness of the tranSig signature matrix against the undetected rate of non-zero expression (left) and the number of signature genes (right). Colors encode the combinations of signature matrix and deconvolution tool. The vertical lines are error bars of mean ± 0.5 × s.d. (B) Benchmarking comparisons between the true and estimated labels. The x- and y-axes are the true and estimated cell type proportions. Colors indicate the cell type labels.

We also show scatter plots of the true and estimated cell type proportions to assess the performance of tranSig across cell types (Figure 2B). With eight cell types, 500 signature genes and an undetected rate of 25%, the tranSig-based methods coupled with CIBERSORTx and NNLS had the best estimation, and all points centered around a single line. CIBERSORTx had the second-best performance but was not as accurate. None of the methods could successfully identify the rare cell types with proportions <10%. The two tranSig-based methods tended to overestimate the proportions of more prevalent cell types and underestimate those of rare cell types.

Bulk PB deconvolution to handle cross-tissue and cross-platform variations

To evaluate the performance of tranSig on real data, we first analyzed PB, which is composed of easily distinguishable immune cells, including monocytes, neutrophils, T cells, B cells and others. We applied tranSig to bulk RNA-seq data of whole blood from 12 healthy adults [36], for which cell differentials were measured by flow cytometry. For scRNA-seq references, PB was taken as the target tissue, with BM, CB, lung, kidney and liver as reference tissues. The union of highly expressed genes (HEGs) of each cell type in each tissue was used as the signature gene list (details in Methods). NNLS, CIBERSORTx and QP [43] were used for deconvolution analysis. The correlations between the estimated cell type proportions and the ‘ground truth’ obtained from flow cytometry were calculated to assess the deconvolution performance. As shown in Figure 3A, tranSig coupled with each of the three deconvolution methods (i.e. NNLS, CIBERSORTx and QP) gave more accurate estimates than the same methods based on the empirical signature matrix. Among the three deconvolution methods, both NNLS and CIBERSORTx had more accurate proportion estimates for most cell types when coupled with tranSig. Overall, tranSig coupled with CIBERSORTx gave the estimates closest to the ground truth.

Figure 3.


PB bulk data cell type deconvolution benchmarking analysis. (A) Box plots of the correlations between the estimated cell type proportions and the ground truth for the Newman et al. blood samples (n = 12), color coded by cell type. CIBERSORTx is denoted by squares, NNLS by circles and QP (quadprog) by triangles. Statistical significance is calculated by the Wilcoxon test. Data are presented as medians ± interquartile range. (B) Benchmarking of deconvolution methods shown as jitter plots of the same correlations as in (A). Data are expressed as means ± s.d.

Within the tranSig framework, we made adjustments to both the scRNA-seq references and the bulk RNA-seq expression profiles, so we also evaluated these adjustments in real applications of PB deconvolution (Supplementary Figure S2). With the LIGER implementation, the signature genes shared across tissues were selected, which improved the deconvolution results of tranSig. In addition, we compared two types of single cell expression profiles (i.e. raw counts and TPM-normalized values) as the input of the tranSig model. The results suggest that using raw counts as input may outperform TPM normalization due to the estimation of Inline graphic in the tranSig model. The details are discussed in the Methods section. Therefore, we used raw counts as the input of the tranSig model in all subsequent analyses, unless stated otherwise. To adjust the bulk RNA-seq expression profiles, we generated pseudo-bulk expression profiles by sampling from the target single cell data. There was an apparent technical batch effect between the bulk and pseudo-bulk expression profiles. After adjustment by Combat, the bulk mixture was projected onto the space of scRNA-seq (Supplementary Figure S3). Comparing tranSig with tranSig_nonadj in Supplementary Figure S2, we found that tranSig with the adjusted bulk mixture as input outperformed tranSig with the original bulk mixture.

To systematically benchmark the performance of different methods (Figure 3B and Supplementary Figure S4), we implemented MuSiC, bisque, BSEQ-sc and CIBERSORTx with the S mode and B mode, and evaluated the accuracy using correlation, RMSE and K-L divergence between the estimates and the ground truth. Overall, the performance of tranSig + CIBERSORTx was superior to that of the other methods. It is interesting to note that all three CIBERSORTx modes, i.e. the mode with batch correction disabled, the S mode and the B mode, accurately estimated the proportions of B cells and T cells but failed to estimate those of monocytes and neutrophils. Because the HCL datasets were generated by Microwell-seq and were UMI-based sequencing data, the deconvolution results show that the S mode may substantially improve the overall estimation accuracy but perform poorly for neutrophils and monocytes (Supplementary Figure S5). Specifically, the estimated neutrophil proportions were smaller than 10% when the ground truth was around 60%. For MuSiC, although the proportions of neutrophils, T cells and B cells were accurately estimated, the monocyte estimates were worse than those of either tranSig or CIBERSORTx. Bisque and BSEQ-sc accurately estimated the proportion of T cells but failed for the other cell types. Taken together, cell type deconvolution by tranSig coupled with CIBERSORTx achieved higher accuracy and lower variance across cell types than the other methods.
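The three accuracy measures named above can be computed for one sample as in the following sketch. The direction of the K-L divergence (ground truth relative to estimate), the small constant `eps` guarding against zero proportions, and the toy proportions are assumptions, since the text does not state these details.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    # Undefined when either input is constant (zero variance).
    return cov / math.sqrt(vx * vy)


def rmse(x, y):
    """Root mean squared error between truth and estimate."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))


def kl_divergence(p, q, eps=1e-12):
    """K-L divergence of q from p; terms with p_i = 0 contribute nothing."""
    return sum(a * math.log((a + eps) / (b + eps))
               for a, b in zip(p, q) if a > 0)


# Hypothetical proportions for one sample over four cell types.
truth = [0.60, 0.25, 0.10, 0.05]
estimate = [0.50, 0.30, 0.15, 0.05]
print(pearson(truth, estimate), rmse(truth, estimate),
      kl_divergence(truth, estimate))
```

Correlation rewards agreement in ranking and trend, whereas RMSE and K-L divergence penalize absolute deviations, so the three measures can disagree when a method captures the trend but is systematically biased.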

Bulk BAL deconvolution in a sarcoidosis cohort to identify dominant cell type

In real data, samples may have only one dominant cell type, leading to highly unbalanced proportions across cell types. In such cases, it is important for a cell type deconvolution method to identify the dominant cell type and distinguish it from the others. To compare the performance of different methods under this scenario, we considered the BAL bulk RNA-seq dataset from the GRADS Sarcoidosis cohort [37], where the proportion of AMs was around 80% [44].

We used adult PB in HCL as the target single cell dataset and the other five tissues (adult adipose, adult BM, adult lung, CB and fetal lung) as reference single cell datasets in tranSig because all these five tissues have immune cells as their major cell populations.

AMs are critical to lung inflammation and repair [45]. They are tissue-resident cells, so they differ cellularly and functionally from the monocyte-derived macrophages in PBMC. We removed 48 AM differentially expressed genes (Table 3) from the signature gene list so that only signature genes shared by AMs and macrophages were used for deconvolution. This largely reduces the bias caused by the heterogeneity between AMs and macrophages in PB.

Table 3.

Differentially expressed genes between AMs and macrophages

MARCKS EMP1 ZFP36L1 FNIP2 IER3 CXCL2
C15orf48 CTSL CCL2 CCL4 SPP1 CCL18
MCEMP1 CCL3 CCL20 LGMN G0S2 MT1X
PLA2G7 FOLR3 NEAT1 TIMP1 CCL3L1 MT2A
SOD2 SDS IFITM3 HIF1A SGK1 CXCL3
GPR183 TMEM176B CD36 CXCL8 MT1G IL1B
HP VCAN CCL4L2 CXCL10 AREG HSPA1B
BASP1 TNFAIP6 RNASE1 MAFB NFKBIA HSPA1A

This data set has macrophages as the largest cell population and lymphocytes as the second largest. As shown in Figure 4, both tranSig and NNLS with the empirical signature matrix estimated the proportion of macrophages to be around 70%. However, the latter failed to identify T cells and estimated the second dominant cell type to be dendritic cells, with a mean proportion around 30%, which differs from the measured cell differentials [37]. In comparison, tranSig successfully recognized T cells as another major cell type in these samples. Among the other benchmarked methods, MuSiC and BSEQ-sc estimated dendritic cells as the largest cell type, and bisque and CIBERSORTx failed to identify the unbalanced cell composition. Collectively, tranSig coupled with CIBERSORTx was able to identify the dominant cell type as well as the remaining cell types, making it suitable for deconvolving samples with unbalanced cell proportions.

Figure 4.


Cell type deconvolution of BAL bulk data, in which tranSig identifies macrophages as the dominant cell type. Boxplots of the estimated cell type proportions across cell types. The x-axis represents the cell types, coded by color. The y-axis represents the estimated cell type proportions.

Bulk aorta deconvolution to depict cellular pathological changes of aneurysm (AN)

We further assessed whether tranSig deconvolution can identify pathological changes in tissues (Figure 5) by applying tranSig to bulk RNA-seq data of aortic or aneurysm tissues [39] from six healthy donors and six patients with ascending aortic aneurysm. Aortic aneurysm [46, 47] is a permanent and localized dilation of the aorta, and is a fatal vascular disease. The pathological features [48] of aortic aneurysm have been well studied, including apoptosis of smooth muscle cells (SMCs) [49], infiltration of immune cells [50] (i.e. macrophages [51, 52] and T cells [53, 54]), increased matrix metalloproteinase [55] and elastin degradation. We took the artery dataset from HCL as the target tissue and lung, BM, PB, CB, kidney and liver as the reference tissues. Deconvolution with the empirical signature matrix by both NNLS and CIBERSORTx could infer increased macrophages and reduced SMCs and stromal cells but missed the signals of other immune cells. The deconvolution results of tranSig showed increased macrophages and T cells, consistent with inflammatory infiltration, and decreased SMCs and stromal cells, suggesting SMC apoptosis in the pathogenesis of aortic aneurysm. Similar to the results of PB deconvolution, CIBERSORTx was more stable than NNLS. We found that CIBERSORTx failed to estimate immune cells, and there was no obvious improvement when the S mode or B mode was used.

Figure 5.


Application to aorta to depict cellular pathological changes of aneurysm (AN). Jitter plots of estimated cell type proportions for the samples in Chen et al., including aortic tissues from six healthy donors as control and aneurysm tissues from patients with ascending aortic aneurysm as AN, color coded by individual. Color bars on the right annotate the deconvolution methods and single cell references, with the artery dataset of HCL in red and the ATAA dataset in green. Data are expressed as means ± s.d.

Because the artery dataset from HCL lacks multiple subjects, MuSiC and bisque could not be applied in this case. Therefore, we implemented the benchmarking methods on another scRNA-seq expression dataset of aortic tissues [35] from three healthy donors and eight patients with ascending aortic aneurysm (the ATAA dataset, which has a deeper sequencing depth than the HCL artery dataset). In the tranSig framework, we took the aorta from the ATAA dataset as the target tissue and lung, BM, PB, CB, kidney and liver from HCL datasets as the reference tissues to evaluate the performance of leveraging information across studies and platforms (Supplementary Figure S6). With tranSig coupled with CIBERSORTx, the results reflected the pathological changes in SMCs and immune cells. Consistent with the above results, deconvolution with tranSig + CIBERSORTx was more stable than NNLS. Similar to the deconvolution with HCL artery data, CIBERSORTx alone missed immune cells. The substantial improvement of CIBERSORTx when coupled with tranSig demonstrated the benefit of tranSig across studies and platforms. In addition, almost all cells in the aorta or aneurysm were estimated as SMCs by MuSiC, which cannot account for the inflammatory infiltration in the pathogenesis of aortic aneurysm (Figure 5). Intriguingly, the bisque estimates reflected the pathogenesis of aortic aneurysm, but showed a high variance in T and NK cell estimates, probably because these similar cell types are difficult to distinguish (Figure 5).

Discussion

In this study, we have developed tranSig, a novel Bayesian model to better infer a signature matrix by transfer learning across multiple scRNA-seq datasets. In the tranSig framework, we use SAME [34] for statistical inference and estimate a more reliable signature matrix with a Gaussian mixture prior. Highly expressed genes of cell types are screened as signature genes. LIGER [33, 40] was implemented and k-means was used to integrate scRNA-seq datasets from different tissues and studies. Specifically, tranSig selects target tissue-specific signature genes from multiple reference tissues. It aims to integrate informative and conserved signature genes as input to the tranSig Bayesian model and removes the reference tissue-specific genes whose expression distributions differ from those in the target tissue. In addition, we apply ComBat [56, 57] to the bulk RNA-seq mixture and the pseudo-bulk mixture derived from scRNA-seq to correct for batch effects between bulk RNA-seq and scRNA-seq data. The final tranSig signature matrix and the batch-effect-corrected bulk RNA-seq data can be input to NNLS or any other external cell type deconvolution tool, e.g. CIBERSORTx or quadratic programming, for cell type deconvolution.
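The final deconvolution step described above can be illustrated with a small sketch. It assumes a signature matrix is already available (genes in rows, cell types in columns) and solves the non-negative least squares problem with a simple dependency-free coordinate-descent solver. This solver is a stand-in chosen for illustration, not the NNLS implementation used in the study, and the function names and toy numbers are hypothetical.

```python
def nnls_coordinate_descent(signature, bulk, n_iter=500):
    """Minimize ||bulk - signature @ p||^2 subject to p >= 0.

    signature: list of rows (one per gene), each a list over cell types.
    bulk: list of mixture expression values, one per gene.
    This is an illustrative solver, not the one used by tranSig.
    """
    n_genes = len(signature)
    n_types = len(signature[0])
    p = [0.0] * n_types
    residual = list(bulk)  # equals bulk - signature @ p while p is all zeros
    col_sq = [sum(signature[g][k] ** 2 for g in range(n_genes))
              for k in range(n_types)]
    for _ in range(n_iter):
        for k in range(n_types):
            if col_sq[k] == 0.0:
                continue  # a cell type with an all-zero signature cannot be estimated
            # Exact minimizer along coordinate k, clipped at zero.
            grad = sum(signature[g][k] * residual[g] for g in range(n_genes))
            new_pk = max(0.0, p[k] + grad / col_sq[k])
            delta = new_pk - p[k]
            if delta != 0.0:
                for g in range(n_genes):
                    residual[g] -= delta * signature[g][k]
                p[k] = new_pk
    return p


def to_proportions(coefficients):
    """Rescale non-negative coefficients so that they sum to one."""
    total = sum(coefficients)
    return [c / total for c in coefficients] if total > 0 else list(coefficients)


# Toy signature matrix: 4 genes x 3 cell types (made-up values).
signature = [
    [10.0, 1.0, 0.0],
    [0.0, 8.0, 1.0],
    [1.0, 0.0, 9.0],
    [2.0, 2.0, 2.0],
]
true_props = [0.7, 0.2, 0.1]
# Noise-free bulk mixture built from the toy signature matrix.
bulk = [sum(s * p for s, p in zip(row, true_props)) for row in signature]
estimated = to_proportions(nnls_coordinate_descent(signature, bulk))
```

In this noise-free toy case the estimated proportions recover the mixing proportions. With real data the quality of the estimate depends strongly on the signature matrix, which is the component tranSig aims to improve.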

To investigate the robustness of tranSig, we conducted a number of simulations under different conditions. For simplicity, the simulations did not model tissue and platform effects. They therefore mainly examined whether the tranSig model can construct an accurate signature matrix under the assumption that all tissue and platform effects have been eliminated. Simulations demonstrated higher stability of tranSig than other methods (e.g. NNLS, CIBERSORTx and MuSiC) across different numbers of tissues, numbers of signature genes and undetected rates (Figure 2 and Supplementary Figure S1). We also noticed that the existing method bisque performed poorly under all simulation scenarios but performed better in real applications, which may result from how the cell proportions of the single cell data were simulated. We generated cell proportions of single cell data from a uniform distribution, which may influence the transformation step in bisque. In addition, for SAME sampling, we take Inline graphic and Inline graphic as examples in simulations (Supplementary Figure S7). We observe convergence of the algorithm after around 50 iterations, indicating the robustness of the algorithm. Notably, the robustness to undetected rates suggests that our approach can handle single cell datasets of low quality.
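To make the simulation setting more concrete, the following minimal sketch draws cell type proportions from a uniform distribution, mixes a toy signature matrix into a pseudo-bulk profile, and zeroes out counts at a chosen undetected rate. The rescaling of uniform draws to sum to one, the way the undetected rate is applied, and all function names are assumptions made for illustration; the actual simulation design is described in the Methods and Supplementary Material.

```python
import random


def simulate_proportions(n_cell_types, rng):
    """Draw uniform(0, 1) weights and rescale them to sum to one (assumed scheme)."""
    raw = [rng.random() for _ in range(n_cell_types)]
    total = sum(raw)
    return [r / total for r in raw]


def mix_pseudo_bulk(signature, proportions):
    """Weighted sum of cell type signatures, one value per gene."""
    return [sum(s * p for s, p in zip(row, proportions)) for row in signature]


def apply_undetected_rate(counts, rate, rng):
    """Set each count to zero with probability `rate` (hypothetical dropout model)."""
    return [0 if rng.random() < rate else c for c in counts]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
# Toy signature matrix: 3 genes x 3 cell types (made-up values).
signature = [
    [10.0, 1.0, 0.0],
    [0.0, 8.0, 1.0],
    [1.0, 0.0, 9.0],
]
proportions = simulate_proportions(3, rng)
pseudo_bulk = mix_pseudo_bulk(signature, proportions)
sparse_cell = apply_undetected_rate([5, 0, 3, 7, 1], rate=0.5, rng=rng)
```

Varying `rate` in such a sketch mimics single cell datasets of different quality, which is the dimension along which robustness to undetected rates is assessed.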

Applications of tranSig to real datasets demonstrated more accurate and stable deconvolution results when coupled with CIBERSORTx. The effectiveness of LIGER and ComBat was also evaluated. We deployed the tranSig pipeline on a BAL bulk RNA-seq dataset and showed that tranSig could successfully identify the top two dominant cell types, AMs and T cells, whereas the other methods failed to do so. Although there was still a gap between the true and estimated cell type proportions of AMs, tranSig had the highest accuracy. Unlike the first two applications, the deconvolution methods did not perform well on the aortic aneurysm dataset, which does not have ‘true’ cell type proportions estimated through sorting experiments. Therefore, we tried to validate our results indirectly by comparing the estimates between normal aortas and aortic aneurysms. The results showed that tranSig+CIBERSORTx could capture the pathological changes of aneurysms, including SMC apoptosis and inflammatory infiltration. Building upon the current deconvolution tools, we developed a Bayesian framework to infer the signature matrix and proposed a more accurate deconvolution framework. In future real applications, this framework can provide pathological information on cell types, including rare subtypes, without the need for fresh tissues and specialized platforms required by experimental methods.
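The indirect validation described above amounts to comparing estimated proportions between control and aneurysm samples for each cell type. A minimal sketch of such a comparison is given below; the numbers are toy values for illustration only and are not estimates from the study, and the helper names are hypothetical.

```python
from statistics import mean, stdev


def summarize_group(proportions):
    """Return (mean, sample s.d.) of one cell type's estimated proportions."""
    return mean(proportions), stdev(proportions)


def direction_of_change(control, disease):
    """Report whether the mean proportion is higher or lower in disease."""
    diff = mean(disease) - mean(control)
    if diff > 0:
        return "increased"
    if diff < 0:
        return "decreased"
    return "unchanged"


# Toy estimated proportions per cell type and group (not real results).
estimates = {
    "SMC": {"control": [0.62, 0.58, 0.65], "AN": [0.41, 0.45, 0.38]},
    "Macrophage": {"control": [0.05, 0.07, 0.06], "AN": [0.15, 0.18, 0.14]},
    "T cell": {"control": [0.03, 0.04, 0.02], "AN": [0.10, 0.09, 0.12]},
}

changes = {
    cell_type: direction_of_change(groups["control"], groups["AN"])
    for cell_type, groups in estimates.items()
}
smc_control_mean, smc_control_sd = summarize_group(estimates["SMC"]["control"])
```

A pattern of decreased SMCs together with increased macrophages and T cells would agree with the known pathology, whereas a formal analysis would additionally require an appropriate statistical test given the small group sizes.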

Based on the simulation and real application results, tranSig+CIBERSORTx performed better than other methods, specifically when using scRNA-seq datasets with low sequencing depth to derive a signature matrix. In the tranSig framework, we mainly utilized the HCL datasets generated by Microwell-seq [58, 59] with a low sequencing depth (~500 detected genes in each cell) over 30 main tissues, including some rarely studied ones. Thus, the tranSig framework takes advantage of the comprehensive tissue types and overcomes various common sources of technical noise [60–64] in scRNA-seq data, such as the bias caused by insufficient sequencing depth, low capture efficiency, high drop-out rate or cell type misclassification. As a result, the deconvolution results of tranSig are more robust than those of the other benchmarking methods. Furthermore, we used the ATAA dataset as the target single cell dataset and other tissues of the HCL datasets as references in the real application to aortic aneurysms, which shows the robustness and effectiveness of tranSig in transferring cross-study and cross-platform information.

The tranSig framework requires raw counts for scRNA-seq datasets and TPM-normalized data for bulk RNA-seq. Our Bayesian model allows unnormalized scRNA-seq data as input because of our Gaussian mixture priors. Using the raw count matrix for scRNA-seq data shows improved performance over the TPM-normalized matrix in terms of correlations between estimates and ground truth (Supplementary Figure S6).
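For readers preparing bulk input, the standard TPM calculation can be sketched as follows: counts are divided by gene length in kilobases and then rescaled so that each sample sums to one million. The function name and the example gene lengths are hypothetical, and in practice effective transcript lengths from the quantification tool would usually be used.

```python
def counts_to_tpm(counts, lengths_bp):
    """Convert raw counts for one sample to TPM given gene lengths in base pairs."""
    if len(counts) != len(lengths_bp):
        raise ValueError("counts and lengths_bp must have the same length")
    # Reads per kilobase of gene length.
    rates = [c / (length / 1000.0) for c, length in zip(counts, lengths_bp)]
    total = sum(rates)
    if total == 0:
        return [0.0] * len(counts)
    # Rescale so that the sample sums to one million.
    return [r / total * 1e6 for r in rates]


# Toy sample: counts scale with gene length, so all genes get the same TPM.
counts = [100, 200, 300]
lengths_bp = [1000, 2000, 3000]
tpm = counts_to_tpm(counts, lengths_bp)
```

Because TPM removes the dependence on gene length and library size, it suits the bulk mixture, whereas the scRNA-seq input to the Bayesian model is kept as raw counts as stated above.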

We note that the primary goal of tranSig is to harmonize information from other tissues. This enables tranSig to perform cell type deconvolution on rarely studied tissues when matched single cell datasets are lacking. We used a Gaussian mixture prior distribution to model the tranSig signature matrix Inline graphic. In the tranSig model, Inline graphic is not constrained to be non-negative, which better represents the relative relationships between the signatures across cell types. Therefore, tranSig can be viewed as fine-tuning the signature matrix to better capture the relationships between signatures and cell types, as well as among the signatures themselves.

Our proposed method has some limitations. Tissue types should be chosen carefully when implementing tranSig, because tissues with similar cellular compositions may provide more accurate information for the tranSig signature matrix. In addition, tranSig may be more time-consuming than other methods, especially when incorporating more tissues as input. Furthermore, the tranSig framework was developed with the HCL dataset as input. If users need to use external single cell datasets, such as the ATAA dataset used in the AN application, cell type annotations are required.

In conclusion, tranSig is a novel Bayesian framework to infer a signature matrix by leveraging cross-tissue or -study information. Deconvolution based on the signature matrix inferred by tranSig leads to more accurate cell type proportion estimates and gains additional insights from analyzing bulk sample data. Coupled with HCL data, tranSig is applicable to deconvolution of various tissues. In a broader scheme, our approach may be considered as transfer learning. Future directions can focus on how to better incorporate information by integrative analysis and design more plausible models to derive signature matrices.

Key points

  • We developed a novel Bayesian model, tranSig, to infer an accurate and robust signature matrix by transfer learning across multiple scRNA-seq datasets, where SAME was implemented for statistical inference and signature matrix estimation.

  • In real applications, tranSig can infer pathological information on cell types in diseases, including rare subtypes, without the need for fresh tissues and specialized platforms, which is useful and applicable in basic biological and medical research.

  • tranSig takes advantage of the comprehensive tissue types in the HCL and overcomes the problem of its relatively low sequencing depth, making it widely applicable to almost all tissue types.

Supplementary Material

tranSig_supplementary_info_bbac616

Wenxuan Deng is a PhD student at the Department of Biostatistics, Yale School of Public Health, Yale University. Her research interests are integrative computational methodologies on single cell datasets.

Bolun Li is a PhD student in the Institute of Basic Medicine, Chinese Academy of Medical Sciences and studied at the Department of Biostatistics, Yale School of Public Health, Yale University as a visiting student in 2020. His research interests are the applications and methodologies of multi-omics in cardiopulmonary diseases.

Jiawei Wang is a postdoctoral associate at Yale School of Medicine. His research interest lies in imaging genetics and mental diseases.

Wei Jiang is an Associate Research Scientist in the Department of Biostatistics, Yale School of Public Health. His current research topic is to develop computational and statistical analysis methods in genome-wide association studies.

Xiting Yan is an Assistant Professor of Pulmonary and Biostatistics; Director of Data Analysis and Bioinformatics Hub, The Center for Precision Pulmonary Medicine. Her current research focuses on developing novel statistical and computational models to analyze large-scale omics and drug perturbation data to better understand disease pathogenesis.

Ningshan Li is a PhD student at SJTU-Yale Joint Center for Biostatistics and Data Science, Department of Bioinformatics and Biostatistics, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University.

Milica Vukmirovic is an Associate Research Scientist in the Center for Precision Pulmonary Medicine at Yale. Her research focuses on understanding gene regulatory networks in Idiopathic Pulmonary Fibrosis.

Naftali Kaminski is the Boehringer-Ingelheim Endowed Professor of Internal Medicine and Chief of Pulmonary, Critical Care and Sleep Medicine, at Yale School of Medicine. He has a strong interest in integrating high throughput ‘omics’ data with clinical information to generate systems biology models of lung diseases and to develop precision medicine approaches.

Jing Wang is a professor and principal investigator of the Peking Union Medical College and the Deputy head of the Institute of Basic Medicine, Chinese Academy of Medical Sciences. Her main research interests involve the pathological mechanisms, molecular diagnosis and therapy of cardiovascular and pulmonary diseases.

Hongyu Zhao is the Ira V. Hiscock Professor of Biostatistics and Professor of Statistics and Data Science and Genetics. His research interests are the developments and applications of novel statistical methods to address scientific questions in genetics, molecular biology, drug developments and precision medicine.

Contributor Information

Wenxuan Deng, Department of Biostatistics, Yale School of Public Health, 60 College Street, New Haven, CT, USA.

Bolun Li, Department of Biostatistics, Yale School of Public Health, 60 College Street, New Haven, CT, USA; State Key Laboratory of Medical Molecular Biology, Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences, Department of Pathophysiology, Peking Union Medical College, Beijing, China.

Jiawei Wang, Department of Biostatistics, Yale School of Public Health, 60 College Street, New Haven, CT, USA.

Wei Jiang, Department of Biostatistics, Yale School of Public Health, 60 College Street, New Haven, CT, USA.

Xiting Yan, Section of Pulmonary, Critical Care and Sleep Medicine, Department of Internal Medicine, Yale School of Medicine, New Haven, CT, USA.

Ningshan Li, Section of Pulmonary, Critical Care and Sleep Medicine, Department of Internal Medicine, Yale School of Medicine, New Haven, CT, USA.

Milica Vukmirovic, Section of Pulmonary, Critical Care and Sleep Medicine, Department of Internal Medicine, Yale School of Medicine, New Haven, CT, USA; Leslie Dan Faculty of Pharmacy, University of Toronto, 144 College St., ON, Canada.

Naftali Kaminski, Section of Pulmonary, Critical Care and Sleep Medicine, Department of Internal Medicine, Yale School of Medicine, New Haven, CT, USA.

Jing Wang, State Key Laboratory of Medical Molecular Biology, Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences, Department of Pathophysiology, Peking Union Medical College, Beijing, China.

Hongyu Zhao, Department of Biostatistics, Yale School of Public Health, 60 College Street, New Haven, CT, USA.

Funding

This work was supported in part by the National Institutes of Health [R56 AG074015, P50 CA196530], the National Key Research and Development Program of China [2019YFA0801703, 2019YFA0802600], and the CAMS Innovation Fund for Medical Sciences [2021-I2M-1-049].

Data Availability

All data can be downloaded from the GEO database (http://www.ncbi.nlm.nih.gov/geo/). The HCL can be accessed under the GEO accession number GSE134355. It can also be obtained at http://bis.zju.edu.cn/HCL/ or https://db.cngb.org/HCL/. The PB bulk data are available through GEO with accession number GSE127472. The GEO accession number of the BAL dataset is GSE109516. The ATAA dataset can be accessed at GSE155468 and the aorta bulk data at GSE140947.

References

  • 1. O’Neill K, Aghaeepour N, Spidlen J, et al. Flow cytometry bioinformatics. PLoS Comput Biol 2013;9:e1003365.
  • 2. Lugli E, Roederer M, Cossarizza A. Data analysis in flow cytometry: the future just started. Cytometry A 2010;77A:705–13.
  • 3. Watson JV. Introduction to Flow Cytometry. Cambridge, United Kingdom: Cambridge University Press, 2004.
  • 4. Ramos-Vara JA, Miller MA. When tissue antigens and antibodies get along: revisiting the technical aspects of immunohistochemistry--the red, brown, and blue technique. Vet Pathol 2014;51:42–87.
  • 5. Buchwalow IB, Bocker W. Immunohistochemistry: Basics and Methods. Heidelberg, Dordrecht, London, New York: Springer, 2010.
  • 6. Madissoon E, Wilbrey-Clark A, Miragaia RJ, et al. scRNA-seq assessment of the human lung, spleen, and esophagus tissue stability after cold preservation. Genome Biol 2019;21:1.
  • 7. Cobos FA, Alquicira-Hernandez J, Powell JE, et al. Benchmarking of cell type deconvolution pipelines for transcriptomics data. Nat Commun 2020;11:1–14.
  • 8. Newman AM, Liu CL, Green MR, et al. Robust enumeration of cell subsets from tissue expression profiles. Nat Methods 2015;12:453–7.
  • 9. Vallania F, Tam A, Lofgren S, et al. Leveraging heterogeneity across multiple datasets increases cell-mixture deconvolution accuracy and reduces biological and technical biases. Nat Commun 2018;9:4735.
  • 10. Tang D, Park S, Zhao H. NITUMID: nonnegative matrix factorization-based immune-TUmor MIcroenvironment deconvolution. Bioinformatics 2020;36:1344–50.
  • 11. Tang D, Park S, Zhao H. SCADIE: simultaneous estimation of cell type proportions and cell type-specific gene expressions using SCAD-based iterative estimating procedure. Genome Biol 2022;23:129.
  • 12. Bezginov A, Clark GW, Charlebois RL, et al. Coevolution reveals a network of human proteins originating with multicellularity. Mol Biol Evol 2013;30:332–46.
  • 13. Lukk M, Kapushesky M, Nikkilä J, et al. A global map of human gene expression. Nat Biotechnol 2010;28:322–4.
  • 14. Lahti L, Torrente A, Elo LL, et al. A fully scalable online pre-processing algorithm for short oligonucleotide microarray atlases. Nucleic Acids Res 2013;41:e110.
  • 15. Zappia L, Phipson B, Oshlack A. Exploring the single-cell RNA-seq analysis landscape with the scRNA-tools database. PLoS Comput Biol 2018;14:e1006245.
  • 16. Venteicher AS, Tirosh I, Hebert C, et al. Decoupling genetics, lineages, and microenvironment in IDH-mutant gliomas by single-cell RNA-seq. Science 2017;355.
  • 17. Slyper M, Porter CBM, Ashenberg O, et al. Author correction: a single-cell and single-nucleus RNA-Seq toolbox for fresh and frozen human tumors. Nat Med 2020;26:1307.
  • 18. Chu L-F, Leng N, Zhang J, et al. Single-cell RNA-seq reveals novel regulators of human embryonic stem cell differentiation to definitive endoderm. Genome Biol 2016;17:173.
  • 19. Harland RM. A new view of embryo development and regeneration. Science 2018;360:967–8.
  • 20. Boroughs AC, Larson RC, Marjanovic ND, et al. A distinct transcriptional program in human CAR T cells bearing the 4-1BB signaling domain revealed by scRNA-Seq. Mol Ther 2020;28:2577–92.
  • 21. Lavaert M, Liang KL, Vandamme N, et al. Integrated scRNA-Seq identifies human postnatal thymus seeding progenitors and regulatory dynamics of differentiating immature thymocytes. Immunity 2020;52:1088–1104.e6.
  • 22. Regev A, Teichmann SA, Lander ES, et al. Science forum: the human cell atlas. Elife 2017;6:e27041.
  • 23. Rozenblatt-Rosen O, Stubbington MJT, Regev A, et al. The human cell atlas: from vision to reality. Nature 2017;550:451–3.
  • 24. Han X, Zhou Z, Fei L, et al. Construction of a human cell landscape at single-cell level. Nature 2020;581:303–9.
  • 25. Hunt GJ, Freytag S, Bahlo M, et al. Dtangle: accurate and robust cell type deconvolution. Bioinformatics 2019;35:2093–9.
  • 26. Gong T, Szustakowski JD. DeconRNASeq: a statistical framework for deconvolution of heterogeneous tissue samples based on mRNA-Seq data. Bioinformatics 2013;29:1083–5.
  • 27. Plattner C, Finotello F, Rieder D. Deconvoluting tumor-infiltrating immune cells from RNA-seq data using quanTIseq. Methods Enzymol 2020;636:261–85.
  • 28. Baron M, Veres A, Wolock SL, et al. A single-cell transcriptomic map of the human and mouse pancreas reveals inter- and intra-cell population structure. Cell Syst 2016;3:346–360.e4.
  • 29. Jew B, Alvarez M, Rahmani E, et al. Accurate estimation of cell composition in bulk expression through robust integration of single-cell information. Nat Commun 2020;11:1971.
  • 30. Wang X, Park J, Susztak K, et al. Bulk tissue cell type deconvolution with multi-subject single-cell expression reference. Nat Commun 2019;10:1–9.
  • 31. Szabo PA, Miron M, Farber DL. Location, location, location: tissue resident memory T cells in mice and humans. Sci Immunol 2019;4:eaas9673.
  • 32. Meng W, Zhang B, Schwartz GW, et al. An atlas of B-cell clonal distribution in the human body. Nat Biotechnol 2017;35:879–84.
  • 33. Liu J, Gao C, Sodicoff J, et al. Jointly defining cell types from multiple single-cell datasets using LIGER. Nat Protoc 2020;15:3632–62.
  • 34. Doucet A, Godsill SJ, Robert CP. Marginal maximum a posteriori estimation using Markov chain Monte Carlo. Stat Comput 2002;12:77–84.
  • 35. Li Y, Ren P, Dawson A, et al. Single-cell transcriptome analysis reveals dynamic cell populations and differential gene expression patterns in control and aneurysmal human aortic tissue. Circulation 2020;142:1374–88.
  • 36. Newman AM, Steen CB, Liu CL, et al. Determining cell type abundance and expression from bulk tissues with digital cytometry. Nat Biotechnol 2019;37:773–82.
  • 37. Vukmirovic M, Yan X, Gibson KF, et al. Transcriptomics of bronchoalveolar lavage cells identifies new molecular endotypes of sarcoidosis. Eur Respir J 2021;58:2002950.
  • 38. Moller DR, Koth LL, Maier LA, et al. Rationale and design of the genomic research in Alpha-1 antitrypsin deficiency and sarcoidosis (GRADS) study. Alpha-1 protocol. Ann Am Thorac Soc 2015;12:1561–71.
  • 39. Chen P-Y, Qin L, Li G, et al. Smooth muscle cell reprogramming in aortic aneurysms. Cell Stem Cell 2020;26:542–557.e11.
  • 40. Welch JD, Kozareva V, Ferreira A, et al. Single-cell multi-omic integration compares and contrasts features of brain cell identity. Cell 2019;177:1873–1887.e17.
  • 41. Hansen KD, Irizarry RA, Wu Z. Removing technical variability in RNA-seq data using conditional quantile normalization. Biostatistics 2012;13:204–16.
  • 42. Ziegenhain C, Vieth B, Parekh S, et al. Comparative analysis of single-cell RNA sequencing methods. Mol Cell 2017;65:631–643.e4.
  • 43. Gong T, Hartmann N, Kohane IS, et al. Optimal deconvolution of transcriptional profiling data using quadratic programming with application to complex clinical blood samples. PLoS One 2011;6:e27156.
  • 44. Patel VI, Metcalf JP. Airway macrophage and dendritic cell subsets in the resting human lung. Crit Rev Immunol 2018;38:303–31.
  • 45. Hu G, Christman JW. Editorial: alveolar macrophages in lung inflammation and resolution. Front Immunol 2019;10:2275.
  • 46. Sakalihasan N, Limet R, Defawe OD. Abdominal aortic aneurysm. Lancet 2005;365:1577–89.
  • 47. Ernst CB. Abdominal aortic aneurysm. N Engl J Med 1993;328:1167–72.
  • 48. Shimizu K, Mitchell RN, Libby P. Inflammation and cellular immune responses in abdominal aortic aneurysms. Arterioscler Thromb Vasc Biol 2006;26:987–94.
  • 49. Rateri DL, Davis FM, Balakrishnan A, et al. Angiotensin II induces region-specific medial disruption during evolution of ascending aortic aneurysms. Am J Pathol 2014;184:2586–95.
  • 50. Quintana RA, Taylor WR. Cellular mechanisms of aortic aneurysm formation. Circ Res 2019;124:607–18.
  • 51. Curci JA, Liao S, Huffman MD, et al. Expression and localization of macrophage elastase (matrix metalloproteinase-12) in abdominal aortic aneurysms. J Clin Invest 1998;102:1900–10.
  • 52. Raffort J, Lareyre F, Clément M, et al. Monocytes and macrophages in abdominal aortic aneurysm. Nat Rev Cardiol 2017;14:457–71.
  • 53. Xiong W, Zhao Y, Prall A, et al. Key roles of CD4+ T cells and IFN-γ in the development of abdominal aortic aneurysms in a murine model. J Immunol 2004;172:2607–12.
  • 54. Ait-Oufella H, Wang Y, Herbin O, et al. Natural regulatory T cells limit angiotensin II-induced aneurysm formation and rupture in mice. Arterioscler Thromb Vasc Biol 2013;33:2374–9.
  • 55. Fanjul-Fernández M, Folgueras AR, Cabrera S, et al. Matrix metalloproteinases: evolution, gene regulation and functional analysis in mouse models. Biochim Biophys Acta 2010;1803:3–19.
  • 56. Johnson WE, Li C, Rabinovic A. Adjusting batch effects in microarray expression data using empirical Bayes methods. Biostatistics 2007;8:118–27.
  • 57. Zhang Y, Jenkins DF, Manimaran S, et al. Alternative empirical Bayes models for adjusting for batch effects in genomic studies. BMC Bioinformatics 2018;19:262.
  • 58. Han X, Wang R, Zhou Y, et al. Mapping the mouse cell atlas by microwell-Seq. Cell 2018;173:1307.
  • 59. Ding J, Adiconis X, Simmons SK, et al. Systematic comparative analysis of single cell RNA-sequencing methods. bioRxiv 2019;632216.
  • 60. Chen G, Ning B, Shi T. Single-cell RNA-Seq technologies and related computational data analysis. Front Genet 2019;10:317.
  • 61. Lähnemann D, Köster J, Szczurek E, et al. Eleven grand challenges in single-cell data science. Genome Biol 2020;21:31.
  • 62. Hicks SC, Townes FW, Teng M, et al. Missing data and technical variability in single-cell RNA-sequencing experiments. Biostatistics 2018;19:562–78.
  • 63. Bacher R, Kendziorski C. Design and computational analysis of single-cell RNA-sequencing experiments. Genome Biol 2016;17:63.
  • 64. Korthauer KD, Chu L-F, Newton MA, et al. A statistical approach for identifying differential distributions in single-cell RNA-seq experiments. Genome Biol 2016;17:222.


Articles from Briefings in Bioinformatics are provided here courtesy of Oxford University Press
