Abstract
Computational cell type deconvolution on bulk transcriptomics data can reveal cell type proportion heterogeneity across samples. One critical factor for accurate deconvolution is the reference signature matrix for different cell types. Compared with inferring reference signature matrices from cell lines, rapidly accumulating single-cell RNA-sequencing (scRNA-seq) data provide a richer and less biased resource. However, deriving cell type signatures from scRNA-seq data is challenging due to high biological and technical noise. In this article, we introduce a novel Bayesian framework, tranSig, to improve signature matrix inference from scRNA-seq by leveraging shared cell type-specific expression patterns across different tissues and studies. Our simulations show that tranSig is robust to the number of signature genes and tissues specified in the model. Applications of tranSig to bulk RNA sequencing data from peripheral blood, bronchoalveolar lavage and aorta demonstrate its accuracy and power to characterize biological heterogeneity across groups. In summary, tranSig offers an accurate and robust approach to defining gene expression signatures of different cell types, facilitating improved in silico cell type deconvolution.
Keywords: cell type deconvolution, scRNA-seq, reference signature matrix, harmonize information
Introduction
Characterizing cell type composition in tissues can help better understand disease pathogenesis and progression. Traditional approaches for measuring cell type proportions, such as flow cytometry [1–3] and immunohistochemistry [4, 5], require complicated protocols, expensive antibodies, high-end platforms and specialized expertise. As these approaches are mainly based on the antigen–antibody reaction, they are limited by the specificity of antibodies. In addition, some of these techniques, such as cytometry-based platforms, may not always yield accurate cell type proportions due to different preservation rates across cell types [6]. With the rapid development of RNA sequencing (RNA-seq) technologies and the accumulation of bulk RNA-seq data in recent years, there has been a surge of computational deconvolution methods [7–11] that factorize tissue-level RNA-seq data, which measure mixture gene expression profiles, to infer cellular compositions. One critical component of these methods is the signature expression profiles (signature matrix) of known cell types, which are largely derived from RNA-seq data of enriched or purified cell populations [12–14]. This limits the analysis to well-defined cell types and makes it hard to incorporate less-studied cell types or subtypes. In addition, the procedure of cell population enrichment and purification puts the targeted cells under stress, which may cause systematic transcriptomic changes. Taken together, the signature matrices derived using traditional approaches can be limited and/or inaccurate.
The development of single cell RNA sequencing (scRNA-seq) technologies [12–14] has enabled characterization of transcriptomes at single cell resolution. In recent years, scRNA-seq has been applied to study cell types [15, 16], embryonic and organ development [17–19], disease mechanisms and so on [20, 21]. With several cell atlas projects, such as the Human Cell Atlas [22, 23] and the Human Cell Landscape (HCL) [24], which aim to cover the primary tissues of healthy donors, single cell datasets are accumulating rapidly. These single cell datasets together with bulk RNA-seq data can facilitate cell type deconvolution by providing an accurate signature matrix at a higher resolution. However, scRNA-seq data are sparser and noisier than bulk RNA-seq data, and there are platform differences between scRNA-seq and bulk RNA-seq data. Furthermore, although cell type proportions can be directly obtained from scRNA-seq data, the estimates may be biased due to different cell capture rates across cell types. Therefore, there is a strong need to conduct cell type deconvolution on bulk RNA-seq data based on the signature matrix derived from scRNA-seq data.
Existing computational cell type deconvolution methods can be divided into two categories. The methods in the first category use bulk RNA-seq data both to generate the reference signature matrix and to perform cell type deconvolution, with CIBERSORT [8], dtangle [25], DeconRNASeq [26] and quanTIseq [27] as representative methods. These methods depend on reliable cell type annotations and informative cell-type-specific markers to generate the cell type signature matrix from bulk RNA-seq; thus, the cell types that can be considered are limited. The methods in the second category use scRNA-seq data to derive the signature matrix. For example, BSEQ-sc extends CIBERSORT to use a single cell dataset as reference [28]. CIBERSORTx leverages single cell data and offers both S mode and B mode to address the technical differences between scRNA-seq and bulk RNA-seq data. Bisque also estimates cell type proportions from transformed bulk data with reference to the single cell dataset [29]. Another commonly used method, MuSiC [30], selects cell type signatures by multi-subject comparisons and uses them in the subsequent deconvolution. However, current deconvolution methods that use single cell datasets are limited by the quality of those datasets. As a result, most deconvolution methods focus on a limited number of tissues and cell types, such as immune cells in peripheral blood mononuclear cells (PBMCs).
As mentioned above, HCL covers a wide range of tissues but at a low sequencing depth. There is evidence that cells show conserved transcriptomes [31, 32] across tissues and developmental stages, and such shared patterns can provide additional information for signature matrix construction. Therefore, we take advantage of the various tissues and overcome the low sequencing depth by capitalizing on shared patterns for the same cell type across different tissues and studies. In this article, we propose a novel Bayesian framework, called tranSig, to infer the signature matrix from scRNA-seq data by leveraging cross-tissue information. Through benchmarking comparisons and real applications on peripheral blood (PB), bronchoalveolar lavage (BAL) and aortic aneurysm, we show that tranSig facilitates more accurate estimates of cell type proportions and better interpretations of the pathogenesis of diseases.
Methods
Single cell RNA-seq batch correction: To prepare for downstream analysis, we first obtained batch-corrected expression profiles
by comparing every reference single cell dataset
with the target one through LIGER [33]. We then derived the empirical signature matrix from the reference tissue expression profile
by averaging the expression levels for each cell type to obtain
where
and
is the number of reference datasets, and we denote the matrix from the target tissue as
. After comparing every reference signature matrix
with the target signature matrix
element-wise, we used k-means to group all the cell-type-specific signature ratios
into three groups. The groups with ratios close to 1 or 0 are considered tissue-specific signatures. Therefore, the signatures in the reference dataset whose ratios fall in the middle range are assumed to share similar expression patterns with the target tissue and are included in the tranSig model.
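The grouping step can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the element-wise ratios have already been computed and scaled so that tissue-specific signatures lie near 0 or 1, and it uses a plain one-dimensional Lloyd's k-means. The function names `kmeans_1d` and `shared_signature_indices` and the toy ratios are hypothetical.

```python
from statistics import mean


def kmeans_1d(values, k=3, n_iter=100):
    """Plain Lloyd's k-means on scalars; returns (centers, labels)."""
    ordered = sorted(values)
    # Deterministic start: spread the initial centers over the sorted values.
    centers = [ordered[round(i * (len(ordered) - 1) / (k - 1))] for i in range(k)]
    labels = None
    for _ in range(n_iter):
        # Assign every value to its nearest center.
        new_labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        if new_labels == labels:
            break  # assignments are stable, so the centers are final
        labels = new_labels
        # Move each center to the mean of its members (empty clusters stay put).
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centers[c] = mean(members)
    return centers, labels


def shared_signature_indices(ratios):
    """Indices of the ratios that fall in the middle of the three groups."""
    centers, labels = kmeans_1d(ratios, k=3)
    middle = sorted(range(3), key=lambda c: centers[c])[1]
    return [i for i, lab in enumerate(labels) if lab == middle]


# Hypothetical ratios for seven (gene, cell type) entries.
ratios = [0.02, 0.05, 0.45, 0.50, 0.55, 0.95, 0.98]
shared = shared_signature_indices(ratios)
```

Only the entries returned by `shared_signature_indices` would be carried forward as shared signatures; entries in the two outer groups would be treated as tissue-specific.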
tranSig Bayesian model
Given the bulk RNA-seq data of target tissue
of
subjects with
signature genes,
, we can do matrix factorization as the product of cell type-specific signature matrix
and cell type proportion matrix
where
is the number of cell types. In general, the cell type deconvolution methods assume
and
characterizes differential signatures across cell types by averaging the expression levels of signature genes.
In addition,
is the expected expression level of gene
in cell type
from tissue
, where
. We also impose sparsity on the signature gene matrix by introducing the Bernoulli variable
. The Gaussian mixture distribution captures the heterogeneity of the signature matrix.
![]() |
![]() |
In addition, the terms in the Gaussian mixture distribution have the following prior specifications.
![]() |
![]() |
.
The other parameters are given uninformative conjugate priors. In the current implementation of tranSig, the initial
is arbitrary, where
represents the hyperparameter for the precision of the signature gene expression levels. In future research, however, a better choice of
can further improve its performance.
The goal of the tranSig Bayesian model is to estimate
and
based on the single cell data from multiple tissues.
tranSig optimization algorithm
We employed state augmentation for marginal estimation (SAME) [34] to accelerate the algorithm and estimate the Gaussian mixture priors. Let
and
. We are primarily interested in the inference of the parameters
and
in the tranSig matrix. We set the strictly positive increasing integer sequence as
. When the iteration
, we first initialize
. As the iteration 
➔ For
, sample 
Sample 
Therefore, in the last iteration
and
, we have
and
. We derive the tranSig matrix by averaging
across
. This average can enhance the confidence of tranSig estimation and each
can be considered as an indicator of whether
should be taken into account. By comparing with the estimation of
in the last iteration, deconvolution results of averaged
improved substantially and were more consistent with the ground truth (Supplementary Figure S3). Here
is considered as the binary weight suggesting whether we take the genes as signatures, further describing the relationships between signatures and cell types.
Bulk RNA-seq batch correction: Due to the technical differences between UMI-based scRNA-seq (e.g. Microwell-seq and 10x Genomics) and bulk RNA-seq, deconvolution with a signature matrix derived from UMI-based single cell expression profiles is far from ideal. Therefore, we leverage ComBat (an empirical Bayes batch-effect removal model) and a pseudo mixture constructed from single cell expression profiles to minimize the technical variation. We re-denote bulk RNA-seq
to be
as the first batch and denote a pseudo mixture to be
as the second batch. To adjust the raw bulk RNA-seq to the space of scRNA-seq by ComBat, the empirical Bayes (EB) model can be formulated as follows:
![]() |
![]() |
where
is the overall gene expression,
and
are the batch- and gene-specific random effects. By using the parametric model of ComBat, we can get the adjusted bulk RNA-seq data as:
![]() |
where
are the standardized data,
is estimated from ordinary least squares and
and
are estimated iteratively by EB as the first moments of their posterior distributions.
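For orientation, the standard ComBat location-scale formulation is reproduced below. The symbols follow the notation conventionally used for ComBat and are an assumption here; they may differ from the notation used in this article, and covariate terms are omitted.

```latex
% Model for gene g in sample j of batch i (no covariates):
Y_{ijg} = \alpha_g + \gamma_{ig} + \delta_{ig}\,\varepsilon_{ijg},
\qquad \varepsilon_{ijg} \sim \mathcal{N}\!\left(0, \sigma_g^{2}\right)

% Standardized data:
Z_{ijg} = \frac{Y_{ijg} - \hat{\alpha}_g}{\hat{\sigma}_g}

% Batch-adjusted data:
Y^{*}_{ijg} = \frac{\hat{\sigma}_g}{\hat{\delta}^{*}_{ig}}
\left( Z_{ijg} - \hat{\gamma}^{*}_{ig} \right) + \hat{\alpha}_g
```

In this notation, \alpha_g is the overall expression of gene g, \gamma_{ig} and \delta_{ig} are the additive and multiplicative batch effects, and the starred quantities are the EB posterior estimates.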
Pseudo-bulk construction by single cell expression profile: For bulk RNA-seq batch correction, we generated pseudo-bulk expression profiles based on single cell datasets. Single cell expression profiles were normalized in transcripts per million (TPM) space, and cells were sampled according to the empirical proportion of each cell type (10 000 times in total). The average of the 10 000 cells was considered as one pseudo-bulk sample (TPM space).
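The pseudo-bulk construction can be sketched as follows. This is a minimal illustration under stated assumptions: cells are drawn uniformly with replacement (which reproduces the empirical cell type proportions in expectation), the input is already in TPM space, and the function name `pseudo_bulk` and the toy profiles are hypothetical.

```python
import random


def pseudo_bulk(cells, n_draws=10_000, seed=0):
    """Average the TPM profiles of n_draws cells sampled with replacement.

    `cells` is a list of (cell_type, tpm_vector) pairs. Sampling cells
    uniformly keeps each cell type at its empirical proportion on average.
    """
    rng = random.Random(seed)
    n_genes = len(cells[0][1])
    totals = [0.0] * n_genes
    for _ in range(n_draws):
        _, tpm = rng.choice(cells)
        for g, value in enumerate(tpm):
            totals[g] += value
    # The mean over all sampled cells is one pseudo-bulk sample.
    return [total / n_draws for total in totals]


# Hypothetical single cell profiles over three genes.
toy_cells = [
    ("T cell", [100.0, 0.0, 5.0]),
    ("T cell", [90.0, 0.0, 7.0]),
    ("B cell", [0.0, 80.0, 6.0]),
    ("Monocyte", [0.0, 0.0, 50.0]),
]
profile = pseudo_bulk(toy_cells)
```

Whether the original sampling was done with or without replacement is not stated in the text; the choice above is only one plausible reading.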
Simulation setup
Although we do not assume the continuous part of the signature matrix
to be non-negative, we still coerce
to be non-negative in simulation for convenience. We sample
from a half-normal distribution and
from a Bernoulli distribution.
![]() |
![]() |
And we set
, and 
Per the model’s assumption, the intermediate dataset-specific signature gene matrix in the first layer
over all the tissues shares the same Gaussian distribution with a mean of
. Therefore, we sample the elements in the matrix
from the following distribution:
![]() |
Here we set
.
To further simulate single cell data from multiple tissues, we let
be sampled from a Gamma distribution. Then we have
![]() |
![]() |
where
.
Finally, we obtained the simulated cell type fraction matrix
and bulk data
. We sampled
from a uniform distribution such that the proportions sum to 1. Then we obtained
![]() |
![]() |
where
is the tissue-specific tranSig matrix when given a target tissue 
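The core of this simulation can be sketched as follows. This is a simplified illustration, not the authors' code: it omits the tissue-specific layer, the Gamma-distributed single cell layer and any noise term, and every parameter value (`n_genes`, `n_types`, `n_subjects`, `p_signature`, `sigma`) is a hypothetical choice. It also assumes that proportions are obtained by normalizing uniform draws.

```python
import random


def simulate(n_genes=50, n_types=4, n_subjects=10,
             p_signature=0.3, sigma=1.0, seed=1):
    """Simulate a sparse non-negative signature matrix, proportions and bulk data."""
    rng = random.Random(seed)
    # Signature entry = half-normal magnitude times a Bernoulli indicator.
    signature = [
        [abs(rng.gauss(0.0, sigma)) * (rng.random() < p_signature)
         for _ in range(n_types)]
        for _ in range(n_genes)
    ]
    # Proportions: uniform draws rescaled so each subject sums to 1.
    proportions = []
    for _ in range(n_subjects):
        raw = [rng.random() for _ in range(n_types)]
        total = sum(raw)
        proportions.append([r / total for r in raw])
    # Bulk expression of gene g in subject n is the proportion-weighted
    # sum of the cell type signatures of that gene.
    bulk = [
        [sum(s * p for s, p in zip(sig_row, prop)) for prop in proportions]
        for sig_row in signature
    ]
    return signature, proportions, bulk


signature, proportions, bulk = simulate()
```

Because the signature entries and the proportions are both non-negative, every simulated bulk value is non-negative as well, which matches the non-negativity imposed in the simulation.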
Benchmarking
We compared the results of the tranSig model with six deconvolution methods: NNLS, quadratic programming (QP), CIBERSORTx, MuSiC, Bisque and BSEQ-sc. NNLS, QP and BSEQ-sc take the average expression of signature genes in each cell type as input, whereas CIBERSORTx, MuSiC and Bisque take single cell expression profiles as input. For CIBERSORTx, S mode and B mode were used to correct the technical variation between scRNA-seq and bulk RNA-seq. CIBERSORTx deconvolution was run with default parameters, and single cell references were normalized in TPM space as the CIBERSORTx input. MuSiC needs multiple subjects to select signature genes; thus, ‘sample’ in the HCL metadata was used as the MuSiC subject parameter. However, because there was only one sample in the artery dataset of HCL, MuSiC could not be applied in the aneurysm application. The accuracy of cell type estimation was assessed by correlation, root-mean-squared error (RMSE) and Kullback-Leibler (K-L) divergence. Correlation and RMSE between estimates and the ground truth are widely used by previous deconvolution methods. K-L divergence measures the difference between two probability distributions and thus indicates the accuracy of the estimates [11].
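The three accuracy metrics can be sketched as follows for a single sample. This is a minimal illustration: the text does not specify the type of correlation or the direction of the K-L divergence, so Pearson correlation and the divergence of the estimate from the ground truth are assumptions, and the function names and toy proportions are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def rmse(x, y):
    """Root-mean-squared error between two equally long sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))


def kl_divergence(p, q, eps=1e-12):
    """K-L divergence D(p || q); eps guards against zero entries in q."""
    return sum(a * math.log((a + eps) / (b + eps)) for a, b in zip(p, q) if a > 0)


# Hypothetical true and estimated proportions for one sample.
truth = [0.5, 0.3, 0.2]
estimate = [0.4, 0.4, 0.2]
scores = {
    "correlation": pearson(truth, estimate),
    "rmse": rmse(truth, estimate),
    "kl": kl_divergence(truth, estimate),
}
```

Higher correlation and lower RMSE and K-L divergence indicate estimates that are closer to the ground truth.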
Data processing
We used the published HCL with 60 tissue types as the single cell reference to perform deconvolution on all tissues. We manually cleaned the cell types in the 30 most common tissues from HCL, including 25 adult tissues, four fetal tissues and cord blood (CB). For instance, we annotated all macrophage subtypes as macrophages to accommodate different annotation resolutions in the 30 tissue-specific single cell datasets. We also removed the cells that did not have a precise annotation, i.e. unknown cell clusters and distal cells in lung. Overall, there are 96 distinct cell types across all 30 tissues (Table 1).
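The annotation cleaning can be sketched as a simple lookup. This is a minimal illustration: the mapping below contains only a handful of entries taken from Table 1, and the names `SUBTYPE_TO_UNIFIED` and `harmonize` are hypothetical.

```python
# A few subtype-to-unified mappings from Table 1; None marks cells to remove.
SUBTYPE_TO_UNIFIED = {
    "M1 Macrophage": "Macrophage",
    "M2 Macrophage": "Macrophage",
    "Conventional dendritic cell": "Dendritic cell",
    "B cell (Plasmocyte)": "B cell",
    "NK cell": "Natural killer cell",
    "Unknown": None,
}


def harmonize(labels, mapping=SUBTYPE_TO_UNIFIED):
    """Return (cell_index, unified_label) pairs, dropping removed cells.

    Labels that are absent from the mapping keep their original name.
    """
    kept = []
    for index, label in enumerate(labels):
        unified = mapping.get(label, label)
        if unified is not None:
            kept.append((index, unified))
    return kept


cell_labels = ["M2 Macrophage", "Unknown", "T cell", "NK cell"]
cleaned = harmonize(cell_labels)
```

Keeping the original cell index alongside the unified label makes it straightforward to subset the expression matrix to the retained cells.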
Table 1.
Unified cell type annotations of the HCL dataset across tissues.
| Original subtype | Unified cell type | Tissue | |
|---|---|---|---|
| Neutrophil | Neutrophil | Adult-Adipose | |
| Mast cell | Mast cell | Adult-Adipose | |
| Stromal cell | Stromal cell | Adult-Adipose | |
| Adipocyte | Adipocyte | Adult-Adipose | |
| Proliferating cell | Proliferating cell | Adult-Adipose | |
| Adipocyte | Adipocyte | Adult-Adipose | |
| M2 Macrophage | Macrophage | Adult-Adipose | |
| T cell | T cell | Adult-Adrenal-Gland | |
| inflammatory cell | Inflammatory cell | Adult-Adrenal-Gland | |
| Proliferating cell | Proliferating cell | Adult-Adrenal-Gland | |
| Neutrophil | Neutrophil | Adult-Adrenal-Gland | |
| Stromal cell | Stromal cell | Adult-Adrenal-Gland | |
| Endothelial cell | Endothelial cell | Adult-Adrenal-Gland | |
| Zona fasciculata cell | Zona fasciculata cell | Adult-Adrenal-Gland | |
| Erythroid cell | Erythroid cell | Adult-Adrenal-Gland | |
| Dendritic cell | Dendritic cell | Adult-Adrenal-Gland | |
| Macrophage | Macrophage | Adult-Adrenal-Gland | |
| Smooth muscle cell | Smooth muscle cell | Adult-Adrenal-Gland | |
| Endothelial | Endothelial cell | Adult-Artery | |
| Smooth muscle cell | Smooth muscle cell | Adult-Artery | |
| B cell (Plasmocyte) | B cell | Adult-Artery | |
| Endothelial cell | Endothelial cell | Adult-Artery | |
| Epithelial cell | Epithelial cell | Adult-Artery | |
| Fibroblast | Fibroblast | Adult-Artery | |
| Basal cell | Basal cell | Adult-Artery | |
| T cell | T cell | Adult-Artery | |
| Mast cell | Mast cell | Adult-Artery | |
| Stromal cell | Stromal cell | Adult-Artery | |
| M1 Macrophage | Macrophage | Adult-Artery | |
| Macrophage | Macrophage | Adult-Artery | |
| Oligodendrocyte | Oligodendrocyte | Fetal-Brain | |
| Purkinje cell | Purkinje cell | Fetal-Brain | |
| Neuron | Neuron | Fetal-Brain | |
| Proliferating radial glia | Radial glia | Fetal-Brain | |
| Unknown | (DELETED) | Fetal-Brain | |
| Fibroblast | Fibroblast | Fetal-Brain | |
| Microglia | Microglia | Fetal-Brain | |
| Erythroid cell | Erythroid cell | Fetal-Brain | |
| Radial glia | Radial glia | Fetal-Brain | |
| Ependymal cell | Ependymal cell | Fetal-Brain | |
| Stromal cell | Stromal cell | Fetal-Brain | |
| Oligodendrocyte progenitor cell | Oligodendrocyte | Fetal-Brain | |
| Macrophage | Macrophage | Fetal-Brain | |
| Proliferating cell | Proliferating cell | Fetal-Brain | |
| Astrocyte | Astrocyte | Fetal-Brain | |
| Endothelial cell | Endothelial cell | Fetal-Brain | |
| Neutrophil | Neutrophil | Fetal-Brain | |
| Macrophage | Macrophage | Adult-Colon | |
| Unknown | (DELETED) | Adult-Colon | |
| Enterocyte | Enterocyte | Adult-Colon | |
| T cell | T cell | Adult-Colon | |
| Smooth muscle cell | Smooth muscle cell | Adult-Colon | |
| Mast cell | Mast cell | Adult-Colon | |
| Enteric glial cell | Enteric glial cell | Adult-Colon | |
| B cell (Plasmocyte) | B cell | Adult-Colon | |
| Stromal cell | Stromal cell | Adult-Colon | |
| Goblet cell | Goblet cell | Adult-Colon | |
| Enterocyte progenitor | Enterocyte | Adult-Colon | |
| Fibroblast | Fibroblast | Adult-Esophagus | |
| Lymphocyte | Lymphocyte | Adult-Esophagus | |
| B cell (Plasmocyte) | B cell | Adult-Esophagus | |
| Basal cell | Basal cell | Adult-Esophagus | |
| Unknown | (DELETED) | Adult-Esophagus | |
| Epithelial cell | Epithelial cell | Adult-Esophagus | |
| Endothelial cell | Endothelial cell | Adult-Esophagus | |
| Smooth muscle cell | Smooth muscle cell | Adult-Esophagus | |
| Mast cell | Mast cell | Adult-Esophagus | |
| Mucosal cell | Mucosal cell | Adult-Esophagus | |
| MT high cell | MT high cell | Adult-Esophagus | |
| Macrophage | Macrophage | Adult-Esophagus | |
| B cell | B cell | Adult-Esophagus | |
| Neutrophil | Neutrophil | Adult-Esophagus | |
| Stromal cell | Stromal cell | Adult-Esophagus | |
| Keratinocyte | Keratinocyte | Adult-Esophagus | |
| Neutrophil | Neutrophil | Adult-Esophagus | |
| Neutrophil | Neutrophil | Adult-Heart | |
| Smooth muscle cell | Smooth muscle cell | Adult-Heart | |
| M2 Macrophage | Macrophage | Adult-Heart | |
| M1 Macrophage | Macrophage | Adult-Heart | |
| Apoptotic cell | Apoptotic cell | Adult-Heart | |
| Conventional dendritic cell | Dendritic cell | Adult-Heart | |
| Mast cell | Mast cell | Adult-Heart | |
| T cell | T cell | Adult-Heart | |
| Cardiomyocyte | Cardiomyocyte | Adult-Heart | |
| Vascular endothelial cell | Endothelial cell | Adult-Heart | |
| Ventricle cardiomyocyte | Cardiomyocyte | Adult-Heart | |
| Fibroblast | Fibroblast | Adult-Heart | |
| Dendritic cell | Dendritic cell | Adult-Heart | |
| Endothelial cell | Endothelial cell | Adult-Heart | |
| Macrophage | Macrophage | Adult-Heart | |
| Distal tubule cell | Tubule cell | Adult-Kidney | |
| Thick ascending limb of the loop of Henle | Loop of Henle | Adult-Kidney | |
| Smooth muscle cell | Smooth muscle cell | Adult-Kidney | |
| Macrophage | Macrophage | Adult-Kidney | |
| B cell (Plasmocyte) | B cell | Adult-Kidney | |
| Proximal tubule cell | Tubule cell | Adult-Kidney | |
| Mast cell | Mast cell | Adult-Kidney | |
| Fenestrated endothelial cell | Endothelial cell | Adult-Kidney | |
| Unknown | (DELETED) | Adult-Kidney | |
| Neutrophil | Neutrophil | Adult-Kidney | |
| IC-tran-PC | IC-tran-PC | Adult-Kidney | |
| Principle cell | Principle cell | Adult-Kidney | |
| Fibroblast | Fibroblast | Adult-Kidney | |
| Myeloid cell | Myeloid cell | Adult-Kidney | |
| Endothelial cell | Endothelial cell | Adult-Kidney | |
| Ureteric epithelial cell | Epithelial cell | Adult-Kidney | |
| Glomerular endothelial cell | Endothelial cell | Adult-Kidney | |
| Epithelial cell | Epithelial cell | Adult-Kidney | |
| B cell(Plasmocyte) | B cell | Adult-Kidney | |
| Conventional dendritic cell | Dendritic cell | Adult-Kidney | |
| Epithelial | Epithelial cell | Adult-Kidney | |
| Dendritic cell | Dendritic cell | Adult-Kidney | |
| Loop of Henle | Loop of Henle | Adult-Kidney | |
| B cell | B cell | Adult-Kidney | |
| Loop of Henle | Loop of Henle | Adult-Kidney | |
| Intercalated cell | Intercalated cell | Adult-Kidney | |
| Myocyte | Myocyte | Adult-Kidney | |
| T cell | T cell | Adult-Kidney | |
| Ciliated cell | Ciliated cell | Adult-Lung | |
| B cell (Plasmocyte) | B cell | Adult-Lung | |
| Myeloid cell | Myeloid cell | Adult-Lung | |
| AT1 cell | AT1 cell | Adult-Lung | |
| Dendritic cell | Dendritic cell | Adult-Lung | |
| Epithelial cell | Epithelial cell | Adult-Lung | |
| Plasmocyte | Plasmocyte | Adult-Lung | |
| Arterial endothelial cell | Endothelial cell | Adult-Lung | |
| Macrophage | Macrophage | Adult-Lung | |
| Alveolar bipotent progenitor(cell cycle) | Alveolar bipotent | Adult-Lung | |
| Clara cell | Clara cell | Adult-Lung | |
| Bronchial chondrocyte | Bronchial chondrocyte | Adult-Lung | |
| Smooth muscle cell | Smooth muscle cell | Adult-Lung | |
| B cell | B cell | Adult-Lung | |
| Neutrophil | Neutrophil | Adult-Lung | |
| NKT cell | NKT cell | Adult-Lung | |
| Fibroblast | Fibroblast | Adult-Lung | |
| Natural killer cell | Natural killer cell | Adult-Lung | |
| Megakaryocyte | Megakaryocyte | Adult-Lung | |
| Unknown | (DELETED) | Adult-Lung | |
| Endothelial cell | Endothelial cell | Adult-Lung | |
| Proliferating cell | Proliferating cell | Adult-Lung | |
| Conventional dendritic cell | Dendritic cell | Adult-Lung | |
| Mast cell | Mast cell | Adult-Lung | |
| Lymphatic endothelial cell | Endothelial cell | Adult-Lung | |
| Bronchial Epithelial cell | Epithelial cell | Adult-Lung | |
| AT2 cell | AT2 cell | Adult-Lung | |
| Activated T cell | (DELETED) | Adult-Lung | |
| Proliferating alveolar bipotent progenitor cell | Alveolar bipotent | Adult-Lung | |
| M2 macrophage | Macrophage | Adult-Lung | |
| Proliferating T cell | Proliferating T cell | Adult-Lung | |
| Artery endothelial cell | Endothelial cell | Adult-Lung | |
| Stromal cell | Stromal cell | Adult-Lung | |
| T cell | T cell | Adult-Lung | |
| AT1 cell | AT1 cell | Adult-Lung | |
| Smooth muscle cell | Smooth muscle cell | Adult-Muscle | |
| Endothelial cell | Endothelial cell | Adult-Muscle | |
| Stromal cell | Stromal cell | Adult-Muscle | |
| Fast skeletal muscle cell | Fast skeletal muscle cell | Adult-Muscle | |
| Neutrophil | Neutrophil | Adult-Muscle | |
| T cell | T cell | Adult-Muscle | |
| B cell (Plasmocyte) | B cell | Adult-Muscle | |
| Muscle progenitor cell | Muscle progenitor cell | Adult-Muscle | |
| Unknown | (DELETED) | Adult-Muscle | |
| Fibroblast | Fibroblast | Adult-Muscle | |
| Myogenic precursor cell | Myogenic precursor cell | Adult-Muscle | |
| NK cell | Natural killer cell | Adult-Muscle | |
| Conventional dendritic cell | Dendritic cell | Adult-Muscle | |
| Proliferating cell | Proliferating cell | Adult-Muscle | |
| Mast cell | Mast cell | Adult-Muscle | |
| M2 Macrophage | Macrophage | Adult-Muscle | |
| Fibroblast | Fibroblast | Adult-Pancreas | |
| Smooth muscle cell | Smooth muscle cell | Adult-Pancreas | |
| Alpha cell | Alpha cell | Adult-Pancreas | |
| Endothelial cell | Endothelial cell | Adult-Pancreas | |
| Beta cell | Beta cell | Adult-Pancreas | |
| M2 Macrophage | Macrophage | Adult-Pancreas | |
| Acinar cell | (DELETED) | Adult-Pancreas | |
| Exocrine cell | Exocrine cell | Adult-Pancreas | |
| Ductal cell | Ductal cell | Adult-Pancreas | |
| Acinar cell | Acinar cell | Adult-Pancreas | |
| Neuroendocrine cell | Neuroendocrine cell | Adult-Prostate | |
| Endothelial cell | Endothelial cell | Adult-Prostate | |
| M1 Macrophage | Macrophage | Adult-Prostate | |
| Smooth muscle cell | Smooth muscle cell | Adult-Prostate | |
| Epithelial cell | Epithelial cell | Adult-Prostate | |
| Unknown epithelial cell | Epithelial cell | Adult-Prostate | |
| Unknown | (DELETED) | Adult-Prostate | |
| Intermediate epithelial cell | Epithelial cell | Adult-Prostate | |
| Basal cell | Basal cell | Adult-Prostate | |
| Luminal cell | Luminal cell | Adult-Prostate | |
| T cell | T cell | Adult-Prostate | |
| Neutrophil | Neutrophil | Adult-Prostate | |
| Fibroblast | Fibroblast | Adult-Prostate | |
| M2 macrophage | Macrophage | Adult-Spleen | |
| Lymphoid progenitor cell | Lymphoid progenitor cell | Adult-Spleen | |
| Neutrophil | Neutrophil | Adult-Spleen | |
| B cell (centrocyte) | B cell | Adult-Spleen | |
| Erythroid cell | Erythroid cell | Adult-Spleen | |
| B cell (Plasmocyte) | B cell | Adult-Spleen | |
| CD8 | T cell | Adult-Spleen | |
| T cell | T cell | Adult-Spleen | |
| Endothelial cell | Endothelial cell | Adult-Spleen | |
| Smooth muscle cell | Smooth muscle cell | Adult-Stomach | |
| CD8 T cell | T cell | Adult-Stomach | |
| Gastric mucosa cell | Gastric mucosa cell | Adult-Stomach | |
| Mast | Mast cell | Adult-Stomach | |
| Macrophage | Macrophage | Adult-Stomach | |
| Chromaffin cell | Chromaffin cell | Adult-Stomach | |
| Epithelial cell | Epithelial cell | Adult-Stomach | |
| Fibroblast | Fibroblast | Adult-Stomach | |
| Parietal cell | Parietal cell | Adult-Stomach | |
| Stromal cell | Stromal cell | Adult-Stomach | |
| Pit cell | Pit cell | Adult-Stomach | |
| B cell(plasmocyte) | B cell | Adult-Stomach | |
| B cell (Plasmocyte) | B cell | Adult-Stomach | |
| Gastric chief cell | Gastric chief cell | Adult-Stomach | |
| Endothelial cell | Endothelial cell | Adult-Stomach | |
| Granulocyte | Granulocyte | Adult-Stomach | |
| Inflammatory cell | Inflammatory cell | Adult-Stomach | |
| D cell/ X/A cell | D cell/ X/A cell | Adult-Stomach | |
| T cell | T cell | Adult-Stomach | |
| Myeloid cell | Myeloid cell | Adult-Stomach | |
| Mast cell | Mast cell | Adult-Stomach | |
| Follicular cell | Follicular cell | Adult-Thyroid | |
| Follicular B cell | B cell | Adult-Thyroid | |
| Thyroid epithelial cell | Epithelial cell | Adult-Thyroid | |
| NK cell | Natural killer cell | Adult-Thyroid | |
| T cell | T cell | Adult-Thyroid | |
| Proliferating cell | Proliferating cell | Adult-Thyroid | |
| Stromal cell | Stromal cell | Adult-Thyroid | |
| Smooth muscle cell | Smooth muscle cell | Adult-Thyroid | |
| B cell (Plasmocyte) | B cell | Adult-Thyroid | |
| Plasmacytoid dendritic cell | Dendritic cell | Adult-Thyroid | |
| Thyroid follicular cell | Follicular cell | Adult-Thyroid | |
| Fibroblast | Fibroblast | Adult-Thyroid | |
| Endothelial cell | Endothelial cell | Adult-Thyroid | |
| Conventional dendritic cell | Dendritic cell | Adult-Thyroid | |
| Neutrophil | Neutrophil | Adult-Thyroid | |
| Endothelial cell in EMT | Endothelial cell | Adult-Uterus | |
| Endothelial cell | Endothelial cell | Adult-Uterus | |
| T cell | T cell | Adult-Uterus | |
| Mast cell | Mast cell | Adult-Uterus | |
| M1 Macrophage | Macrophage | Adult-Uterus | |
| Unknown | (DELETED) | Adult-Uterus | |
| Stromal cell | Stromal cell | Adult-Uterus | |
| Endometrial cell | Endometrial cell | Adult-Uterus | |
| Smooth muscle cell | Smooth muscle cell | Adult-Uterus | |
| Unknown epithelial cell | Epithelial cell | Adult-Uterus | |
| Fibroblast | Fibroblast | Adult-Uterus | |
| Vascular smooth muscle cell | Vascular smooth muscle cell | Adult-Uterus | |
| Epithelial cell | Epithelial cell | Adult-Uterus | |
| Proliferating cell | Proliferating cell | Adult-Peripheral-Blood | |
| Neutrophil | Neutrophil | Adult-Peripheral-Blood | |
| Plasmacytoid dendritic cell | Dendritic cell | Adult-Peripheral-Blood | |
| Monocyte | Monocyte | Adult-Peripheral-Blood | |
| Macrophage | Macrophage | Adult-Peripheral-Blood | |
| NK cell | Natural killer cell | Adult-Peripheral-Blood | |
| Eosinophil | Eosinophil | Adult-Peripheral-Blood | |
| activative T cell | activative T cell | Adult-Peripheral-Blood | |
| CD8+ T cell | T cell | Adult-Peripheral-Blood | |
| CD4 | T cell | Adult-Peripheral-Blood | |
| B cell(Centrocyte) | B cell | Adult-Peripheral-Blood | |
| B cell(Plasmocyte) | B cell | Adult-Peripheral-Blood | |
| Proliferating T cell | T cell | Adult-Peripheral-Blood | |
| CD8 | T cell | Adult-Peripheral-Blood | |
| Myeloid progenitor cell | Myeloid progenitor cell | Adult-Peripheral-Blood | |
| Erythroid cell | Erythroid cell | Adult-Peripheral-Blood | |
| Conventional dendritic cell | Dendritic cell | Adult-Peripheral-Blood | |
| B cell | B cell | Adult-Peripheral-Blood | |
| Dendritic cell | Dendritic cell | Adult-Peripheral-Blood | |
| Proliferating B cell | Proliferating B cell | Adult-Peripheral-Blood | |
| T cell | T cell | Adult-Peripheral-Blood | |
| Erythroid cell | Erythroid cell | Fetal-Skin | |
| Vascular endothelial cell | Endothelial cell | Fetal-Skin | |
| Keratinocyte | Keratinocyte | Fetal-Skin | |
| Lymphatic endothelial cell | Endothelial cell | Fetal-Skin | |
| Neutrophil | Neutrophil | Fetal-Skin | |
| Melanocyte | Melanocyte | Fetal-Skin | |
| Osteoblast | Osteoblast | Fetal-Skin | |
| Smooth muscle cell | Smooth muscle cell | Fetal-Skin | |
| Fibroblast | Fibroblast | Fetal-Skin | |
| Mesenchymal cell | Mesenchymal cell | Fetal-Skin | |
| Mast cell | Mast cell | Fetal-Skin | |
| Proliferating cell | Proliferating cell | Fetal-Skin | |
| Dermis fibroblast | Fibroblast | Fetal-Skin | |
| M2 Macrophage | Macrophage | Fetal-Skin | |
| Macrophage | Macrophage | Fetal-Skin | |
| Endothelial cell | Endothelial cell | Fetal-Skin | |
| Skeletal muscle cell | Skeletal muscle cell | Fetal-Skin | |
| Enterocyte progenitor | Enterocyte | Adult-Ileum | |
| B cell (Plasmocyte) | B cell | Adult-Ileum | |
| Goblet cell | Goblet cell | Adult-Ileum | |
| T cell | T cell | Adult-Ileum | |
| Neuron | Neuron | Adult-Ileum | |
| Macrophage | Macrophage | Adult-Ileum | |
| Fibroblast | Fibroblast | Adult-Ileum | |
| Stromal cell | Stromal cell | Adult-Ileum | |
| Endothelial cell | Endothelial cell | Adult-Ileum | |
| Mast cell | Mast cell | Adult-Ileum | |
| Paneth cell | Paneth cell | Adult-Ileum | |
| Enterocyte | Enterocyte | Adult-Ileum | |
| Myeloid cell | Myeloid cell | Adult-Ileum | |
| Conventional dendritic cell | Dendritic cell | Adult-Ileum | |
| Smooth muscle cell | Smooth muscle cell | Adult-Ileum | |
| Epithelial cell | Epithelial cell | Fetal-Intestine | |
| Proliferating cell | Proliferating cell | Fetal-Intestine | |
| Enterocyte progenitor | Enterocyte | Fetal-Intestine | |
| Enteroendocrine cell | Enteroendocrine cell | Fetal-Intestine | |
| Enterocyte | Enterocyte | Fetal-Intestine | |
| Goblet cell | Goblet cell | Fetal-Intestine | |
| Endothelial cell | Endothelial cell | Fetal-Intestine | |
| Fibroblast | Fibroblast | Fetal-Intestine | |
| Macrophage | Macrophage | Fetal-Intestine | |
| Stromal cell | Stromal cell | Fetal-Intestine | |
| Neuron | Neuron | Fetal-Intestine | |
| Smooth muscle cell | Smooth muscle cell | Fetal-Intestine | |
| Enterocyte | Enterocyte | Fetal-Intestine | |
| Myeloid cell | Myeloid cell | Fetal-Intestine | |
| Erythroid cell | Erythroid cell | Fetal-Intestine | |
| Vascular endothelial cell | Endothelial cell | Fetal-Intestine | |
| Fibroblast | Fibroblast | Fetal-Intestine | |
| Erythroid cell | Erythroid cell | Fetal-Intestine | |
| Lymphatic endothelial cell | Endothelial cell | Fetal-Intestine | |
| T cell | T cell | Fetal-Intestine | |
| Dendritic cell | Dendritic cell | Fetal-Intestine | |
| B cell | B cell | Fetal-Intestine | |
| Antigen-presenting cell | Antigen-presenting cell | Fetal-Intestine | |
| B cell (Plasmocyte) | B cell | Adult-Bladder | |
| Endothelial cell (non-professional APC) | Endothelial cell | Adult-Bladder | |
| Fibroblast | Fibroblast | Adult-Bladder | |
| Macrophage | Macrophage | Adult-Bladder | |
| Neutrophil | Neutrophil | Adult-Bladder | |
| Smooth muscle cell | Smooth muscle cell | Adult-Bladder | |
| T cell | T cell | Adult-Bladder | |
| Urothelial cell | Urothelial cell | Adult-Bladder | |
| M2 Macrophage | Macrophage | Adult-Bladder | |
| Mast cell | Mast cell | Adult-Bladder | |
| NK cell | Natural killer cell | Adult-Bladder | |
| Stromal cell | Stromal cell | Adult-Bladder | |
| Urothelial cell | Urothelial cell | Adult-Bladder | |
| Vascular endothelial cell | Endothelial cell | Adult-Bladder | |
| B cell (Centrocyte) | B cell | Adult-Bone-Marrow | |
| Conventional dendritic cell | Dendritic cell | Adult-Bone-Marrow | |
| Erythroid progenitor cell | Erythroid cell | Adult-Bone-Marrow | |
| M2 Macrophage | Macrophage | Adult-Bone-Marrow | |
| Neutrophil | Neutrophil | Adult-Bone-Marrow | |
| NK cell | Natural killer cell | Adult-Bone-Marrow | |
| B cell (Plasmocyte) | B cell | Adult-Bone-Marrow | |
| Erythroid cell | Erythroid cell | Adult-Bone-Marrow | |
| HSPC | HSPC | Adult-Bone-Marrow | |
| Monocyte | Monocyte | Adult-Bone-Marrow | |
| Neutrophil | Neutrophil | Adult-Bone-Marrow | |
| T cell | T cell | Adult-Bone-Marrow | |
| Astrocyte | Astrocyte | Adult-Cerebellum | |
| Astrocyte(Bergmann glia) | Astrocyte | Adult-Cerebellum | |
| B cell | B cell | Adult-Cerebellum | |
| Endothelial cell | Endothelial cell | Adult-Cerebellum | |
| Epithelial cell | Epithelial cell | Adult-Cerebellum | |
| Excitatory neuron | Neuron | Adult-Cerebellum | |
| Inhibitory neuron | Neuron | Adult-Cerebellum | |
| Interneuron | Neuron | Adult-Cerebellum | |
| Macrophage | Macrophage | Adult-Cerebellum | |
| Microglia | Microglia | Adult-Cerebellum | |
| Neutrophil | Neutrophil | Adult-Cerebellum | |
| Oligodendrocyte | Oligodendrocyte | Adult-Cerebellum | |
| Oligodendrocyte progenitor cell | Oligodendrocyte | Adult-Cerebellum | |
| Smooth muscle cell | Smooth muscle cell | Adult-Cerebellum | |
| Stromal cell | Stromal cell | Adult-Cerebellum | |
| T cell | T cell | Adult-Cerebellum | |
| B cell (Centrocyte) | B cell | Cord-Blood | |
| B cell(Centrocyte) | B cell | Cord-Blood | |
| B cell(Plasmocyte) | B cell | Cord-Blood | |
| B cell(Unknown) | B cell | Cord-Blood | |
| Conventional dendritic cell | Dendritic cell | Cord-Blood | |
| Dendritic cell | Dendritic cell | Cord-Blood | |
| Eosinophil | Eosinophil | Cord-Blood | |
| Erythroid cell | Erythroid cell | Cord-Blood | |
| Erythroid/Basophil Progenitor | Erythroid cell | Cord-Blood | |
| HSPC | HSPC | Cord-Blood | |
| Megakaryocyte | Megakaryocyte | Cord-Blood | |
| Monocyte | Monocyte | Cord-Blood | |
| Neutrophil | Neutrophil | Cord-Blood | |
| NK cell | Natural killer cell | Cord-Blood | |
| Plasmacytoid dendritic cell | Dendritic cell | Cord-Blood | |
| Proliferating cell | Proliferating cell | Cord-Blood | |
| T cell | T cell | Cord-Blood | |
| Dendritic cell | Dendritic cell | Cord-Blood-CD34P | |
| Eosinophil | Eosinophil | Cord-Blood-CD34P | |
| Erythroid/Basophil Progenitor | Erythroid cell | Cord-Blood-CD34P | |
| HSPC | HSPC | Cord-Blood-CD34P | |
| Megakaryocyte | Megakaryocyte | Cord-Blood-CD34P | |
| Neutrophil | Neutrophil | Cord-Blood-CD34P | |
| NK cell | Natural killer cell | Cord-Blood-CD34P | |
| Proliferating cell | Proliferating cell | Cord-Blood-CD34P | |
| Monocyte | Monocyte | Cord-Blood-CD34P | |
| Airway smooth muscle cell | Smooth muscle cell | Fetal-Lung | |
| Basal stem cell | Basal cell | Fetal-Lung | |
| CD8 T cell | T cell | Fetal-Lung | |
| Chondrocyte | Bronchial chondrocyte | Fetal-Lung | |
| Distal cell | (DELETED) | Fetal-Lung | |
| Distal progenitor cell | (DELETED) | Fetal-Lung | |
| Endothelial cell | Endothelial cell | Fetal-Lung | |
| Erythroid cell | Erythroid cell | Fetal-Lung | |
| Fibroblast | Fibroblast | Fetal-Lung | |
| Lung mesenchyme cell (cardiopulmonary progenitor) | Mesenchymal cell | Fetal-Lung | |
| Macrophage | Macrophage | Fetal-Lung | |
| Megakaryocyte/Erythroid progenitor cell | Megakaryocyte | Fetal-Lung | |
| Neuron | Neuron | Fetal-Lung | |
| Neutrophil | Neutrophil | Fetal-Lung | |
| NK cell | Natural killer cell | Fetal-Lung | |
| Pericyte | Pericyte | Fetal-Lung | |
| Proliferating cell | Proliferating cell | Fetal-Lung | |
| Proliferating lung mesenchyme cell | Mesenchymal cell | Fetal-Lung | |
| Proliferating smooth muscle cell | Smooth muscle cell | Fetal-Lung | |
| Proliferating T cell | Proliferating T cell | Fetal-Lung | |
| Proximal progenitor cell | (DELETED) | Fetal-Lung | |
| Smooth muscle cell | Smooth muscle cell | Fetal-Lung | |
| T cell | T cell | Fetal-Lung | |
| Vascular smooth muscle cell | Vascular smooth muscle cell | Fetal-Lung | |
| B cell (Plasmocyte) | B cell | Adult-Liver | |
| Hepatocyte | Hepatocyte | Adult-Liver | |
| Sinusoidal endothelial cell | Sinusoidal endothelial cell | Adult-Liver | |
| Activated T cell | activative T cell | Adult-Liver | |
| Myeloid cell | Myeloid cell | Adult-Liver | |
| Vascular endothelial cell | Endothelial cell | Adult-Liver | |
| Neutrophil | Neutrophil | Adult-Liver | |
| Motile liver macrophage | Motile liver macrophage | Adult-Liver | |
| Kupffer cell | Kupffer cell | Adult-Liver | |
| Macrophage | macrophage | Adult-Liver | |
| Epithelial cell | Epithelial cell | Adult-Liver | |
| Mast cell | Mast cell | Adult-Liver | |
| Dendritic cell | Dendritic cell | Adult-Liver | |
| Conventional dendritic cell | Dendritic cell | Adult-Liver | |
| Kupffer cell | Kupffer cell | Adult-Liver | |
| Smooth muscle cell | Smooth muscle cell | Adult-Liver | |
| Proliferating cell | (DELETED) | Adult-Liver | |
| Granulocyte | Granulocyte | Adult-Liver | |
| B cell (Plasmocyte) | B cell | Adult-Duodenum | |
| T cell | T cell | Adult-Duodenum | |
| Enterocyte | Enterocyte | Adult-Duodenum | |
| Goblet cell | Goblet cell | Adult-Duodenum | |
| Gastric chief cell | Gastric chief cell | Adult-Duodenum | |
| Mast cell | Mast cell | Adult-Duodenum | |
| Enterocyte progenitor | Enterocyte | Adult-Duodenum | |
| Contamination | (DELETED) | Adult-Duodenum | |
| Endothelial cell | Endothelial cell | Adult-Duodenum | |
| Fibroblast | Fibroblast | Adult-Duodenum | |
| Macrophage | Macrophage | Adult-Duodenum | |
| Immune response stromal cell | Immune response stromal cell | Adult-Gallbladder | |
| Macrophage | Macrophage | Adult-Gallbladder | |
| Epithelial progenitor cell | Epithelial cell | Adult-Gallbladder | |
| T cell | T cell | Adult-Gallbladder | |
| Endothelial cell | Endothelial cell | Adult-Gallbladder | |
| Stromal cell | Stromal cell | Adult-Gallbladder | |
| Inflammatory stromal cell | Inflammatory stromal cell | Adult-Gallbladder | |
| Mucous epithelial cell | Mucosal cell | Adult-Gallbladder | |
| Epithelial cell | Epithelial cell | Adult-Gallbladder | |
| Mast cell | Mast cell | Adult-Gallbladder | |
| Neutrophil | Neutrophil | Adult-Gallbladder | |
| Smooth muscle cell | Smooth muscle cell | Adult-Gallbladder | |
| Antigen presenting cell | (DELETED) | Adult-Gallbladder | |
| Lymphocyte | (DELETED) | Adult-Gallbladder | |
| Dendritic cell | Dendritic cell | Adult-Gallbladder | |
| Fibroblast | Fibroblast | Adult-Gallbladder | |
| B cell(Plasmocyte) | B cell | Adult-Gallbladder | |
| B cell(Plasmocyte) | B cell | Adult-Jejunum | |
| Enterocyte | Enterocyte | Adult-Jejunum | |
| Enterocyte progenitor | Enterocyte | Adult-Jejunum | |
| Paneth cell | Paneth cell | Adult-Jejunum | |
| Goblet cell | Goblet cell | Adult-Jejunum | |
| Fibroblast | Fibroblast | Adult-Jejunum | |
| X/A cell | X/A cell | Adult-Jejunum | |
| Dendritic cell | Dendritic cell | Adult-Jejunum | |
| T cell | T cell | Adult-Jejunum | |
| Macrophage | Macrophage | Adult-Jejunum | |
| Endothelial cell | Endothelial cell | Adult-Jejunum | |
| Mast cell | Mast cell | Adult-Jejunum | |
| Smooth muscle cell | Smooth muscle cell | Adult-Jejunum | |
| Enterocyte | Enterocyte | Adult-Rectum | |
| B cell(Plasmocyte) | B cell | Adult-Rectum | |
| B cell | B cell | Adult-Rectum | |
| Mast cell | Mast cell | Adult-Rectum | |
| Enteric glial cell | Enteric glial cell | Adult-Rectum | |
| Inflamed epithelial cell | Epithelial cell | Adult-Rectum | |
| Macrophage | Macrophage | Adult-Rectum | |
| T cell | T cell | Adult-Rectum | |
| Goblet cell | Goblet cell | Adult-Rectum | |
| Stromal cell | Stromal cell | Adult-Rectum | |
| Smooth muscle cell | Smooth muscle cell | Adult-Rectum | |
To further validate the tranSig model in real studies, we used bulk RNA-seq of ascending aorta media from healthy donors and of aneurysm tissues [35] from patients with ascending aortic aneurysm, and compared the cell type proportions associated with aneurysm pathological changes between the two groups. To evaluate the effect of sequencing depth and single cell platform, we used single cell datasets from three healthy donors and eight ascending thoracic aortic aneurysm (ATAA) patients generated with 10x Genomics. This dataset includes the main cell types in the aorta: endothelial cells, smooth muscle cells, fibroblasts, macrophages, T cells, B cells, NK cells, mast cells and plasma cells.
For real applications, we used bulk RNA-seq of whole blood from 12 healthy adults from the Stanford Blood Center [36], with ground truth validated by FACS and immunofluorescence.
The BAL bulk RNA-seq data [37] were obtained from Vukmirovic et al. (2021). The samples were collected from 184 individuals in a sarcoidosis patient cohort by the Genomic Research in Alpha-1 Antitrypsin Deficiency and Sarcoidosis (GRADS) study [38]. BAL is composed of four cell populations: alveolar macrophages (AMs), eosinophils, lymphocytes and neutrophils. The mean proportion of AMs is 89%. Therefore, a reliable cell type deconvolution tool should identify the AMs as the dominant cell type in bulk RNA-seq data.
The aneurysm bulk RNA-seq data [39] were from Chen et al. (2020). The samples included aortic tissues from six healthy donors and aneurysm tissues from six patients with ascending aortic aneurysm. For aortic tissues, each donor contributed two samples, from the middle and distal parts. For aneurysms, the neck and belly were collected from each patient.
All bulk datasets were normalized to transcripts per million (TPM).
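As a reminder of what TPM normalization does, the following minimal sketch divides each gene's read count by its length and rescales the resulting rates so that they sum to one million. The function name `tpm`, the toy counts and the gene lengths are illustrative assumptions and are not taken from the datasets used here.

```python
def tpm(counts, lengths_kb):
    """Convert raw read counts to transcripts per million (TPM).

    counts[g] is the read count of gene g and lengths_kb[g] its length in
    kilobases (both hypothetical inputs for this sketch).
    """
    # Length-normalised rates (reads per kilobase)
    rates = [c / l for c, l in zip(counts, lengths_kb)]
    total = sum(rates)
    # Rescale so that the values of one sample sum to one million
    return [r / total * 1e6 for r in rates]


# Toy sample: three genes whose counts scale exactly with their lengths
counts = [100, 200, 300]
lengths_kb = [1.0, 2.0, 3.0]
tpm_values = tpm(counts, lengths_kb)
```

Because the three toy genes have identical length-normalised rates, they receive equal TPM values even though their raw counts differ.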
Signature gene list construction
Based on the HCL datasets, significantly highly expressed genes of each cell type from the five tissues [artery, lung, PBMC, bone marrow (BM) and liver] used in the tranSig model were identified with the ‘FindAllMarkers’ function of the Seurat R package. We set 0.25 as the cutoff for the logarithmic fold change and kept only genes with positive fold changes via the ‘only.pos’ parameter of ‘FindAllMarkers’. We used the union of these differentially expressed (DE) genes as the signature gene list for the tranSig model (Table 2), which covers most immune cells. However, we recommend generating a customized signature gene list by finding DE genes when immune cells are not the only major cell populations in the target tissue.
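The filtering and union step can be sketched as follows. This is a minimal illustration in Python rather than a call to Seurat: it assumes that per-tissue marker tables have already been exported, and the field names (`gene`, `cell_type`, `log_fc`), the gene names and the fold-change values are all hypothetical.

```python
LOGFC_CUTOFF = 0.25  # cutoff for the logarithmic fold change, as stated in the text


def signature_gene_list(marker_tables, cutoff=LOGFC_CUTOFF):
    """Union of marker genes across tissues and cell types.

    marker_tables maps a tissue name to a list of marker records
    (hypothetical layout: gene, cell_type, log_fc).
    """
    genes = set()
    for rows in marker_tables.values():
        for row in rows:
            # A positive cutoff keeps only up-regulated genes that pass the threshold
            if row["log_fc"] > cutoff:
                genes.add(row["gene"])
    return sorted(genes)


# Toy marker tables for two tissues
marker_tables = {
    "PBMC": [
        {"gene": "geneA", "cell_type": "T cell", "log_fc": 1.8},
        {"gene": "geneB", "cell_type": "T cell", "log_fc": 0.1},
        {"gene": "geneC", "cell_type": "B cell", "log_fc": 2.1},
    ],
    "liver": [
        {"gene": "geneA", "cell_type": "T cell", "log_fc": 1.2},
        {"gene": "geneD", "cell_type": "Macrophage", "log_fc": 1.5},
        {"gene": "geneE", "cell_type": "T cell", "log_fc": -0.9},
    ],
}
signature_genes = signature_gene_list(marker_tables)
```

In this toy input, `geneB` falls below the cutoff and `geneE` is down-regulated, so neither enters the list, while `geneA` is counted once although it is a marker in both tissues.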
Table 2.
The signature gene list
| ACTB | NPM1 | ARRB1 | IL7 | BMX | |
|---|---|---|---|---|---|
| ACTG1 | NUP214 | ASGR1 | IL7R | C1QA | |
| AGTRAP | OAZ1 | ASGR2 | ITK | C1QB | |
| AIF1 | OST4 | ATP8B4 | KCNA3 | CA4 | |
| ANXA1 | OSTC | AZU1 | KCNG2 | CACNA2D3 | |
| ANXA3 | P4HB | BACH2 | KIR2DL1 | CALB2 | |
| ANXA5 | PARK7 | BARX2 | KIR2DL4 | CALML4 | |
| AP1S2 | PCBP1 | BCL11B | KIR2DS4 | CAMK4 | |
| APOBEC3A | PDIA4 | BCL7A | KIR3DL2 | CASP1 | |
| ARHGDIB | PDIA6 | BEND5 | KLRC3 | CD14 | |
| ARL4C | PEBP1 | BFSP1 | KLRC4 | CD163 | |
| ARPC1B | PECAM1 | BHLHE41 | KLRF1 | CD207 | |
| ARPC2 | PFDN5 | BLK | KLRG1 | CDC14B | |
| ARPC3 | PFN1 | BMP2K | KLRK1 | CDC42EP4 | |
| ASAH1 | PGK1 | BPI | KYNU | CDK2AP2 | |
| ATP6V1F | PILRA | BRAF | LAG3 | CDR2L | |
| B2M | PIM2 | BRSK2 | LAIR2 | CEACAM1 | |
| BANK1 | PLAC8 | BST1 | LAMP3 | CES1 | |
| BCL2A1 | PLBD1 | BTNL8 | LAT | CFB | |
| BIRC3 | PLD4 | C11orf80 | LCK | CHMP7 | |
| BLVRB | PLPP5 | C1orf54 | LEF1 | CIB2 | |
| BTG1 | POMP | C3AR1 | LHCGR | CIITA | |
| C12orf75 | POU2F2 | C5AR2 | LILRA2 | CLCC1 | |
| C1orf162 | PPIA | CA8 | LILRA4 | CLCF1 | |
| C4orf3 | PPIB | CASP5 | LIME1 | CLEC5A | |
| C5AR1 | PPT1 | CCDC102B | LRMP | CNNM1 | |
| CALM1 | PRDX1 | CCL1 | LTA | CNOT1 | |
| CALM2 | PRDX4 | CCL13 | LY9 | COL4A3 | |
| CALR | PSAP | CCL14 | MAK | CP | |
| CAMP | PSMA2 | CCL17 | MAN1A1 | CRISPLD2 | |
| CAPG | PSMA3 | CCL18 | MANEA | CRLF2 | |
| CARD16 | PSMB2 | CCL19 | MAP3K13 | CYP4F3 | |
| CD1C | PSME2 | CCL20 | MAP4K1 | CYSLTR1 | |
| CD24 | PYCARD | CCL22 | MAP4K2 | DEFB1 | |
| CD27 | RABAC1 | CCL23 | MAP9 | DENND3 | |
| CD36 | RACK1 | CCL4 | MARCO | DPYD | |
| CD37 | RAN | CCL5 | MAST1 | DUSP4 | |
| CD38 | RETN | CCL7 | MEFV | DYSF | |
| CD3D | RGS2 | CCL8 | MEP1A | EDN1 | |
| CD48 | RHOA | CCND2 | MGAM | EMILIN2 | |
| CD53 | RHOC | CCR10 | MICAL3 | ESPL1 | |
| CD59 | RNASE2 | CCR2 | MMP12 | EVL | |
| CD63 | RNASE3 | CCR3 | MMP25 | FARS2 | |
| CD74 | RNASE6 | CCR5 | MMP9 | FBLN1 | |
| CD79A | RNASET2 | CCR6 | MROH7 | FCHO1 | |
| CD79B | RNF130 | CCR7 | MS4A2 | FGFR3 | |
| CD99 | ROMO1 | CD160 | MS4A3 | FLT4 | |
| CDA | RPL10 | CD180 | MSC | FMO5 | |
| CDKN1C | RPL10A | CD19 | MXD1 | FST | |
| CFD | RPL11 | CD1A | MYB | FSTL1 | |
| CFL1 | RPL12 | CD1B | NAALADL1 | FUT3 | |
| CHCHD2 | RPL13 | CD1D | NCR3 | FXYD6 | |
| CLEC10A | RPL13A | CD1E | NFE2 | GAS7 | |
| CLEC12A | RPL14 | CD2 | NIPSNAP3B | GATA2 | |
| CLEC4E | RPL15 | CD209 | NKG7 | GBP1 | |
| CLEC7A | RPL18 | CD22 | NLRP3 | GCH1 | |
| CMC1 | RPL18A | CD244 | NME8 | GFOD1 | |
| COPE | RPL19 | CD247 | NOD2 | GIMAP4 | |
| CORO1A | RPL21 | CD28 | NOX3 | GJB1 | |
| COTL1 | RPL22 | CD300A | NPAS1 | GLRX2 | |
| COX4I1 | RPL22L1 | CD33 | NPIPB15 | GP5 | |
| COX5B | RPL23 | CD3E | NPL | GPNMB | |
| COX6A1 | RPL23A | CD3G | NR4A3 | HAGH | |
| COX6C | RPL24 | CD4 | NTRK1 | HAVCR1 | |
| COX7A2 | RPL27 | CD40 | ORC1 | HBD | |
| COX7B | RPL27A | CD40LG | OSM | HGD | |
| CPVL | RPL29 | CD5 | P2RX1 | HOMER2 | |
| CSF3R | RPL3 | CD6 | P2RX5 | HOXA2 | |
| CST3 | RPL30 | CD68 | P2RY10 | HTRA1 | |
| CSTA | RPL31 | CD69 | P2RY13 | IFI27 | |
| CSTB | RPL32 | CD7 | P2RY14 | IFNB1 | |
| CTSA | RPL34 | CD70 | P2RY2 | IFT20 | |
| CTSB | RPL35 | CD72 | PADI4 | IGFBP2 | |
| CTSC | RPL35A | CD80 | PAQR5 | IL15RA | |
| CTSD | RPL36 | CD86 | PASK | IL1R2 | |
| CTSS | RPL36AL | CD8A | PBXIP1 | IL1RAP | |
| CUTA | RPL37 | CD8B | PCDHA5 | IL32 | |
| CUX1 | RPL37A | CD96 | PDCD1 | IL6ST | |
| CYBA | RPL39 | CDC25A | PDCD1LG2 | ING2 | |
| CYBB | RPL4 | CDH12 | PDE6C | IRS1 | |
| CYCS | RPL41 | CDHR1 | PDK1 | JRKL | |
| DAD1 | RPL5 | CDK6 | PGLYRP1 | KCNJ15 | |
| DEFA3 | RPL7 | CEACAM3 | PIK3IP1 | KIF22 | |
| DEFA4 | RPL7A | CEACAM8 | PKD2L2 | KIR3DL1 | |
| DERL3 | RPL8 | CEMP1 | PLA1A | KLHL18 | |
| DNAJA1 | RPL9 | CFP | PLA2G7 | KRT19 | |
| DUSP1 | RPLP0 | CHI3L1 | PLCH2 | KRT5 | |
| DUSP11 | RPLP1 | CHI3L2 | PLEKHF1 | KSR1 | |
| DYNLRB1 | RPLP2 | CHST15 | PLEKHG3 | LAIR1 | |
| DYNLT1 | RPN1 | CHST7 | PMCH | LILRA5 | |
| EAF2 | RPN2 | CLC | PNOC | LILRB1 | |
| EDF1 | RPS10 | CLEC2D | PPBP | LIMA1 | |
| EEF1B2 | RPS11 | CLEC4A | PPFIBP1 | LIMK2 | |
| EEF1D | RPS12 | CLIC2 | PRF1 | LRP5L | |
| EEF2 | RPS14 | CMA1 | PRG2 | LRRC8D | |
| EIF1 | RPS15 | COL8A2 | PRR5L | LSM4 | |
| EMB | RPS15A | COLQ | PSG2 | MAG | |
| ERH | RPS16 | CPA3 | PTGDR | MAL | |
| ERP29 | RPS17 | CR2 | PTGER2 | MAOA | |
| EVI2B | RPS18 | CREB5 | PTGIR | MAPK7 | |
| FCER1A | RPS19 | CRISP3 | PTPRG | MAT2B | |
| FCER1G | RPS2 | CRTAM | QPCT | MEST | |
| FCGR3A | RPS20 | CRYBB1 | RAB27B | MME | |
| FCGRT | RPS21 | CSF1 | RALGPS2 | MMP8 | |
| FCMR | RPS23 | CSF2 | RASA3 | MOCS3 | |
| FCN1 | RPS25 | CST7 | RASGRP2 | MPO | |
| FGFBP2 | RPS26 | CTLA4 | RASGRP3 | MPPED2 | |
| FGL2 | RPS27 | CTSG | RASSF4 | MRPL3 | |
| FGR | RPS27A | CTSW | RCAN3 | MRPL4 | |
| FKBP11 | RPS28 | CXCL10 | REN | MT1X | |
| FKBP2 | RPS29 | CXCL11 | RENBP | MTMR11 | |
| FOLR3 | RPS3 | CXCL13 | REPS2 | MTSS1 | |
| FOS | RPS3A | CXCL3 | RGS1 | MUC1 | |
| FPR1 | RPS4X | CXCL5 | RGS13 | MYLIP | |
| FTL | RPS5 | CXCL9 | RRP12 | NAGA | |
| GAPDH | RPS6 | CXCR1 | RRP9 | NBN | |
| GCA | RPS7 | CXCR2 | RSAD2 | NBR1 | |
| GLIPR1 | RPS8 | CXCR5 | RYR1 | NDRG2 | |
| GLRX | RPS9 | CXCR6 | S1PR5 | NEFL | |
| GMFG | RPSA | CYP27A1 | SAMSN1 | NOTCH4 | |
| GNAS | S100A11 | CYP27B1 | SCN9A | NPEPPS | |
| GNG7 | S100A12 | DACH1 | SEC31B | NR2E3 | |
| GRN | S100A6 | DAPK2 | SERGEF | NR4A2 | |
| GSTP1 | S100A8 | DCSTAMP | SH2D1A | NRG1 | |
| GTF3A | S100A9 | DENND5B | SIGLEC1 | NRGN | |
| GZMA | S100P | DEPDC5 | SIK1 | NUDT1 | |
| GZMB | SAMHD1 | DGKA | SIRPG | NUDT18 | |
| GZMK | SAT1 | DHRS11 | SIT1 | NXT1 | |
| HCK | SDCBP | DHX58 | SKA1 | NXT2 | |
| HERPUD1 | SDF2L1 | DPEP2 | SKAP1 | OLFM1 |
| HINT1 | SEC11C | DPP4 | SLAMF1 | ORM1 | |
| HLA-A | SEC61B | DSC1 | SLAMF8 | OSBPL10 | |
| HLA-DMA | SEC61G | DUSP2 | SLC12A8 | PALLD | |
| HLA-DMB | SEC62 | EBI3 | SLC15A3 | PANX1 | |
| HLA-DPA1 | SELL | EFNA5 | SLC2A6 | PAX5 | |
| HLA-DPB1 | SERF2 | EGR2 | SLC7A10 | PCGF2 | |
| HLA-DQA1 | SERPINA1 | ELANE | SLCO5A1 | PDGFB | |
| HLA-DQA2 | SERPINF1 | EPB41 | SMPD3 | PDK4 | |
| HLA-DQB1 | SH3BGRL | EPHA1 | SMPDL3B | PHEX | |
| HLA-DRA | SH3BGRL3 | EPN2 | SOCS1 | PI3 | |
| HLA-DRB1 | SLC25A6 | ETS1 | SP140 | PIK3CG | |
| HLA-DRB5 | SLIRP | ETV3 | SPAG4 | PLAT | |
| HM13 | SMDT1 | FAM124B | SPOCK2 | POU2AF1 | |
| HMGN1 | SNRPD2 | FAM174B | ST3GAL6 | PPA1 | |
| HMOX1 | SNRPG | FASLG | ST6GALNAC4 | PROM1 | |
| HSBP1 | SNU13 | FBXL8 | ST8SIA1 | PRR5 | |
| HSP90AA1 | SNX3 | FCER2 | STAP1 | PSAT1 | |
| HSP90B1 | SOD2 | FCGR2B | STEAP4 | PTPN13 | |
| HSPA5 | SPCS1 | FCGR3B | STXBP6 | PTPRK | |
| IFI44L | SPCS2 | FCRL2 | TBX21 | PTPRS | |
| IFITM1 | SPCS3 | FES | TCF7 | PTTG2 | |
| IFITM2 | SPI1 | FFAR2 | TCL1A | QPRT | |
| IFITM3 | SPIB | FLT3LG | TEC | RAB9A | |
| IGLL5 | SRP14 | FLVCR2 | TEP1 | RAMP1 | |
| IGSF6 | SSR2 | FOSB | TGM5 | RARRES2 | |
| ILF2 | SSR3 | FOXP3 | TLR2 | RNASE1 | |
| IRF7 | SSR4 | FPR2 | TLR7 | RNASE4 | |
| IRF8 | STXBP2 | FPR3 | TLR8 | RNF122 | |
| ISG15 | SUB1 | FRK | TMEM255A | RRAS | |
| ISG20 | SYNGR2 | FRMD4A | TNFAIP6 | S100B | |
| ITM2B | TAGLN2 | FRMD8 | TNFRSF10C | SCRN1 | |
| ITM2C | TALDO1 | FZD2 | TNFRSF11A | SDC1 | |
| JAML | TIMP1 | FZD3 | TNFRSF13B | SEC63 | |
| JCHAIN | TKT | GAL3ST4 | TNFRSF4 | SERPINF2 | |
| KCTD12 | TMBIM6 | GFI1 | TNFSF14 | SETBP1 | |
| KDELR2 | TMED9 | GGT5 | TNIP3 | SF3A3 | |
| KLRB1 | TMEM156 | GIPR | TPSAB1 | SFTPD | |
| KLRD1 | TMEM176B | GNLY | TRAF4 | SFXN3 | |
| LCN2 | TMEM258 | GPC4 | TRAT1 | SH3BP2 | |
| LCP1 | TMSB10 | GPR1 | TREM1 | SIDT1 | |
| LDHB | TMSB4X | GPR171 | TREM2 | SIGLEC6 | |
| LGALS2 | TNFRSF17 | GPR18 | TREML2 | SLC12A3 | |
| LILRB2 | TNFSF10 | GPR183 | TRIB2 | SLC17A5 | |
| LIMD2 | TNFSF13B | GPR19 | TRPM4 | SLC1A4 | |
| LMAN2 | TPI1 | GPR25 | TRPM6 | SLC38A1 | |
| LSP1 | TRMT112 | GPR65 | TSHR | SLC4A1AP | |
| LST1 | TSPO | GRAP2 | TTC38 | SLC6A13 | |
| LTA4H | TUBA1B | GYPE | TXK | SLC7A7 | |
| LTB | TXN | GZMH | UBASH3A | SLC9A3R1 | |
| LY6E | TYMP | GZMM | UPK3A | SLCO2B1 | |
| LY86 | UBA52 | HAL | VILL | SMARCD3 | |
| LY96 | UBE2J1 | HDC | VNN1 | SOCS2 | |
| LYZ | UBE2N | HESX1 | VNN2 | STAB2 | |
| MANF | UCP2 | HHEX | VNN3 | SYNE1 | |
| MCL1 | UFM1 | HIC1 | WNT5B | TAGLN | |
| MIF | UQCR11 | HK3 | WNT7A | TBC1D8 | |
| MNDA | UQCRH | HLA-DOB | ZAP70 | TFEC | |
| MPEG1 | UQCRQ | HNMT | ZBP1 | TGM3 | |
| MRPL33 | VAMP8 | HOXA1 | ZBTB10 | TLL1 | |
| MRPL52 | VCAN | HPGDS | ZBTB32 | TLR5 | |
| MS4A1 | VPREB3 | HPSE | ZNF135 | TMC6 | |
| MS4A6A | XBP1 | HRH1 | ZNF165 | TMEM9B | |
| MS4A7 | YWHAB | HSPA6 | ZNF222 | TMF1 | |
| MT-ATP6 | ZFP36L2 | HTR2B | ZNF286A | TNFRSF25 | |
| MT-ATP8 | ZNF706 | ICA1 | ZNF324 | TNNI2 | |
(continued)
Results
tranSig framework
To leverage information from multiple single cell references, we designed a new framework, called tranSig, based on transfer learning to handle cross-platform and cross-tissue variations to derive a more accurate signature matrix for downstream cell type deconvolution (Figure 1).
Figure 1.
Illustration of the tranSig framework. The framework starts by selecting signatures from reference single cell datasets for downstream harmonization. Using LIGER, we filter out the genes in reference tissues that are unlikely to share common expression distributions with the target tissue. All the selected signatures and their corresponding single cell expression profiles are then input into the tranSig Bayesian model to derive a more reliable signature matrix. In parallel, we remove batch effects between the bulk and target single cell datasets with ComBat to project the bulk data onto the space of the target single cell dataset. Finally, the tranSig signature matrix and the corrected bulk RNA-seq data can be coupled with cell type deconvolution optimization tools, e.g. NNLS and CIBERSORTx.
As mentioned above, existing methods mostly generate their signature matrices from one scRNA-seq dataset of the same tissue type as the bulk dataset, denoted as the target tissue. Because of technical batch effects, it is challenging to combine data from different tissues, or from different studies on the same tissue. In tranSig, we assemble the scRNA-seq data of the target tissue with those of other tissues or studies, denoted as the reference tissues/studies, into an integrated dataset to derive the cell type signature matrix. For instance, when applying tranSig to the bulk data of PB, we took PB as the target tissue, and BM, CB, lung, kidney and liver as reference tissues. Specifically, we used the HCL [24] and manually cleaned the cell type annotations to cover 25 adult and five fetal tissue types as sources for both target and reference tissues. We project the reference scRNA-seq datasets onto the data of the target tissue using the matrix factorization-based single cell batch correction method LIGER [33, 40]. In addition, we identify the cell-type-conserved signatures after batch-effect correction and remove tissue-specific signatures from the reference datasets. To accomplish this, as detailed in the Methods section, we compare every reference signature matrix with the target signature matrix to select cell-type-conserved signatures that may share a common distribution with the target tissue. Finally, we keep only the cell-type-conserved expression profiles in the references that have a distribution similar to that of the target tissue.
Our hierarchical Bayesian model takes the batch-corrected single cell expression profiles as input. To deconvolve bulk RNA-seq data, we first identify its tissue type and denote it as the target tissue type. We then assume that the expression level of each gene in a given cell type from a given tissue follows a Gaussian distribution whose mean follows a Gaussian mixture distribution shared across tissues. The mixture distribution serves to distinguish the signature genes from the others. The implementation of State-Augmentation for Marginal Estimation (SAME) [34] substantially accelerates the algorithm.
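To make the two-level structure described above concrete, it can be written schematically as below. The notation is an assumption introduced only for this sketch and is not the notation of the Methods section: $g$ indexes genes, $k$ cell types, $t$ tissues and $c$ cells; $x$ is the batch-corrected expression; $\mu$ is the tissue-level mean; and the mixture is assumed here to have two components, one for signature genes (subscript 1) and one for the others (subscript 0), with mixing weight $\pi$. The indexing of the variance terms is likewise assumed.

```latex
% Schematic only; all symbols are hypothetical notation for this sketch.
\begin{aligned}
x_{gktc} \mid \mu_{gkt} &\sim \mathcal{N}\!\left(\mu_{gkt},\ \sigma_{gk}^{2}\right), \\
\mu_{gkt} &\sim \pi\,\mathcal{N}\!\left(m_{1},\ \tau_{1}^{2}\right)
            + (1-\pi)\,\mathcal{N}\!\left(m_{0},\ \tau_{0}^{2}\right).
\end{aligned}
```

Under this reading, sharing the mixture distribution across tissues is what allows reference tissues to inform the signature of the target tissue.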
Table 2.
Continued
| ACTB | NPM1 | ARRB1 | IL7 | BMX | |
|---|---|---|---|---|---|
| MT-CO2 | ABCB4 | ICOS | ZNF442 | TOMM22 | |
| MT-CO3 | ABCB9 | IDO1 | ABCA5 | TOMM34 | |
| MT-CYB | ACAP1 | IFNG | ABCB1 | TRAF3IP2 | |
| MT-ND1 | ACHE | IL12B | ABHD5 | TRAK1 | |
| MT-ND2 | ACP5 | IL12RB2 | ACSM3 | TRIB1 | |
| MYDGF | ADAM28 | IL17A | ADAM19 | TSPAN7 | |
| MYL6 | ADAMDEC1 | IL18R1 | ADAMTS5 | TUBB6 | |
| MZB1 | ADAMTS3 | IL18RAP | ADI1 | TULP2 | |
| NAAA | ADRB2 | IL1A | AGPAT5 | TYRO3 | |
| NACA | AIM2 | IL1B | ALAS1 | ULK2 | |
| NAP1L1 | ALOX15 | IL1RL1 | ALPL | WEE1 | |
| NCF2 | ALOX5 | IL21 | ANK3 | YTHDF3 | |
| NDUFA1 | AMPD1 | IL26 | AOC2 | ZC3H12A | |
| NDUFA11 | ANGPT4 | IL2RA | APOE | ZDHHC13 | |
| NDUFA4 | ANKRD55 | IL2RB | ARID4A | ZNF180 | |
| NDUFB1 | APOBEC3G | IL3 | ARNT2 | ZNF189 | |
| NDUFB4 | APOL3 | IL4 | ASRGL1 | ZNF34 | |
| NDUFB6 | APOL6 | IL4R | ATP2A1 | ZNF552 | |
| NME1 | AQP9 | IL5 | ATP2B1 | ZNF593 | |
| NPC2 | ARHGAP22 | IL5RA | BEX1 |
To better align the bulk RNA-seq data with the signature matrix inferred from reference scRNA-seq data, we perform batch correction similar to the CIBERSORTx S mode [36], which uses pseudo-bulk mixtures derived from scRNA-seq to reduce the batch effects between scRNA-seq and bulk RNA-seq. We generate pseudo-bulk mixtures by sampling cells from the scRNA-seq dataset of the target tissue and apply the empirical Bayes (EB) batch-effect removal model ComBat [41] to adjust the bulk RNA-seq mixtures. The adjusted bulk RNA-seq data are taken as the input for downstream deconvolution along with the tranSig signature matrix.
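The pseudo-bulk step can be illustrated with a minimal sketch. It assumes that cells are drawn uniformly with replacement and that a mixture is the sum of the sampled cells' counts; neither detail is stated above, and the function name, the cell records and the counts are hypothetical. The subsequent ComBat adjustment is not shown.

```python
import random


def make_pseudo_bulk(cells, n_cells, rng):
    """Build one pseudo-bulk mixture from single cell profiles.

    cells is a list of records with a "cell_type" label and a "counts"
    vector (hypothetical layout). Returns the summed counts and the cell
    type proportions of the sampled cells.
    """
    sampled = [rng.choice(cells) for _ in range(n_cells)]  # with replacement
    n_genes = len(sampled[0]["counts"])
    mixture = [sum(c["counts"][g] for c in sampled) for g in range(n_genes)]
    tallies = {}
    for c in sampled:
        tallies[c["cell_type"]] = tallies.get(c["cell_type"], 0) + 1
    proportions = {k: v / n_cells for k, v in tallies.items()}
    return mixture, proportions


# Toy target-tissue dataset: four cells, three genes, ten counts per cell
cells = [
    {"cell_type": "T cell", "counts": [8, 1, 1]},
    {"cell_type": "T cell", "counts": [7, 2, 1]},
    {"cell_type": "B cell", "counts": [1, 8, 1]},
    {"cell_type": "Monocyte", "counts": [1, 1, 8]},
]
rng = random.Random(1)
mixture, proportions = make_pseudo_bulk(cells, 50, rng)
```

Because the proportions of each pseudo-bulk mixture are known by construction, such mixtures provide a bridge between the scRNA-seq space and the bulk space for batch correction.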
Overall, with the above batch effect corrections, both the original bulk RNA-seq and the scRNA-seq dataset of reference tissues/studies were aligned with the scRNA-seq data of the target tissue.
We view the tranSig framework as an add-on step for cell type deconvolution. It can therefore be coupled with any existing cell type deconvolution method, e.g. NNLS or CIBERSORTx, to estimate cell type proportions.
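As an illustration of how a signature matrix feeds into a downstream solver, the sketch below estimates proportions by non-negative least squares. It is a minimal stand-in under stated assumptions: it uses a simple projected gradient solver written for this example rather than the NNLS routine of any particular package, and the signature matrix, bulk vector and function name are toy values invented for the sketch.

```python
def nnls_projected_gradient(S, b, n_iter=5000):
    """Minimise ||S p - b||^2 subject to p >= 0 by projected gradient descent.

    S is a genes x cell-types signature matrix (list of rows) and b the
    bulk expression vector.
    """
    n_genes, n_types = len(S), len(S[0])
    # 1 / ||S||_F^2 is a conservative step size: the squared Frobenius norm
    # bounds the largest eigenvalue of S^T S from above.
    step = 1.0 / sum(v * v for row in S for v in row)
    p = [0.0] * n_types
    for _ in range(n_iter):
        residual = [sum(S[g][k] * p[k] for k in range(n_types)) - b[g]
                    for g in range(n_genes)]
        grad = [sum(S[g][k] * residual[g] for g in range(n_genes))
                for k in range(n_types)]
        # Gradient step followed by projection onto the non-negative orthant
        p = [max(0.0, p[k] - step * grad[k]) for k in range(n_types)]
    return p


# Toy signature matrix (4 genes x 3 cell types) and a noiseless bulk sample
S = [
    [10.0, 1.0, 0.0],
    [0.0, 8.0, 1.0],
    [1.0, 0.0, 9.0],
    [5.0, 5.0, 5.0],
]
true_proportions = [0.6, 0.3, 0.1]
b = [sum(S[g][k] * true_proportions[k] for k in range(3)) for g in range(4)]

coefficients = nnls_projected_gradient(S, b)
total = sum(coefficients)
estimated_proportions = [c / total for c in coefficients]  # rescale to sum to 1
```

In this noiseless toy case the estimated proportions match the true ones closely; with real data, the quality of the signature matrix `S` is what the tranSig step is meant to improve.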
Robustness evaluation through simulations
We performed simulations to evaluate the robustness of the proposed tranSig model. Since it is challenging to simulate cross-platform or cross-tissue effects, we focused on assessing robustness under the assumption that batch effects in both bulk RNA-seq and multi-tissue scRNA-seq have been successfully removed. We assumed that the true signature matrix of the target tissue type contains two parts: an indicator matrix with binary entries indicating whether each gene is an expressed cell-type-specific signature gene, and an expression matrix with continuous entries quantifying the average expression levels of signature genes. The signatures from all the input single cell datasets, including both the target and reference datasets, share the same underlying distribution, which is the product of the indicator matrix and the expression matrix. Specifically, we simulated the continuous expression levels from a Gaussian distribution, and these values then served as the means of the Gaussian distributions used in the subsequent sampling.
We note that an expressed signature gene may not be detected because of the prevalent dropout events in scRNA-seq data. Droplet-based scRNA-seq can theoretically detect 5000 genes per cell at saturation [42], but the number of detected genes per cell in published studies is often lower because of low sequencing depth (median number of detected genes: 256–602 in HCL; 1973 in ATAA [35]). Therefore, in our simulations, we set the expression level of an expressed gene to zero with a certain probability, corresponding to the undetected rate. A higher undetected rate can result in a loss of signature information in the empirical signature matrix, which is constructed by averaging the single cell expression profiles grouped by cell type. We compared the performance of the tranSig model coupled with both NNLS and CIBERSORTx with that of other methods, including MuSiC, CIBERSORTx with the empirical signature matrix as input (empirical + CIBERSORTx) and NNLS from MuSiC without weights. When the ground truth is available, a higher correlation between the estimates and the ground truth suggests better performance.
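The effect of the undetected rate on the empirical signature can be seen in a small sketch. It assumes, for illustration only, that each expressed value is drawn from a Gaussian centred on the product of a binary indicator and an expression level, with unit standard deviation; the function name, the three toy genes and all numeric settings are hypothetical and are not the settings used in our simulations.

```python
import random


def simulate_empirical_signature(indicator, level, n_cells, undetected_rate,
                                 rng, sd=1.0):
    """Average simulated single cell profiles of one cell type.

    indicator[g] is 1 if gene g is a signature gene of this cell type and 0
    otherwise; level[g] is its average expression level when expressed.
    """
    n_genes = len(indicator)
    totals = [0.0] * n_genes
    for _ in range(n_cells):
        for g in range(n_genes):
            if indicator[g] == 0:
                continue  # non-signature genes stay at zero in this toy setup
            value = rng.gauss(indicator[g] * level[g], sd)
            if rng.random() < undetected_rate:
                value = 0.0  # expressed but undetected (dropout)
            totals[g] += value
    # Empirical signature: mean over the cells of this cell type
    return [t / n_cells for t in totals]


rng = random.Random(0)
indicator = [1, 1, 0]
level = [10.0, 4.0, 6.0]
sig_no_dropout = simulate_empirical_signature(indicator, level, 2000, 0.0, rng)
sig_dropout = simulate_empirical_signature(indicator, level, 2000, 0.25, rng)
```

With an undetected rate of 25%, the averaged values of the two signature genes shrink toward roughly three quarters of their true levels, which is the loss of signature information referred to above.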
We investigated how the undetected rate and the number of signature genes affect cell type deconvolution (Figure 2A). Compared with the other methods, tranSig combined with CIBERSORTx (tranSig + CIBERSORTx) achieved more accurate cell type proportion estimates, with higher correlations between the true and estimated cell type proportions. In the left panel, the performance of tranSig + CIBERSORTx was the most stable across undetected rates. When a signature matrix is constructed by differential expression (DE) analysis, the number of signature genes depends on the method and threshold selected in the DE analysis. We therefore also investigated how the number of signature genes influenced cell type proportion estimation to evaluate the robustness of the methods. The right panel of Figure 2A shows that the two tranSig methods and CIBERSORTx performed best with a relatively small number of signature genes, e.g. 150. In contrast, MuSiC and NNLS in the MuSiC R package required multiple subjects and cross-subject variation in single cell expression profiles to make accurate estimates. With respect to the number of tissues, estimation accuracy increased when tranSig transferred information from reference tissues (Supplementary Figure S1).
Figure 2.
Model robustness assessment through simulations. (A) Robustness of the tranSig signature matrix against the undetected rate of non-zero expression (left) and the number of signature genes (right). Colors encode the combinations of signature matrix and deconvolution tool. The vertical lines are error bars showing mean ± 0.5 s.d. (B) Benchmarking comparisons between the true and estimated labels. The x- and y-axes are the true and estimated cell type proportions. Colors indicate the cell type labels.
We also show scatter plots of the true and estimated cell type proportions to assess the performance of tranSig across cell types (Figure 2B). With eight cell types, 500 signature genes and an undetected rate of 25%, the tranSig-based methods coupled with CIBERSORTx and NNLS gave the best estimates, with all points clustered around a single line. CIBERSORTx had the second-best performance. None of the methods could successfully identify the rare cell types with proportions <10%. The two tranSig-based methods tended to overestimate the proportions of more prevalent cell types and underestimate those of rare cell types.
Bulk PB deconvolution to handle cross-tissue and cross-platform variations
To evaluate the performance of tranSig on real data, we first analyzed PB, which is composed of easily distinguishable immune cells, including monocytes, neutrophils, T cells, B cells and others. We applied tranSig to bulk RNA-seq data of whole blood from 12 healthy adults [36], for which cell differentials were measured by flow cytometry. For scRNA-seq references, PB was taken as the target tissue, with BM, CB, lung, kidney and liver as reference tissues. The union of highly expressed genes (HEGs) of each cell type in each tissue was used as the signature gene list (details in Methods). NNLS, CIBERSORTx and QP [43] were used for deconvolution analysis. The correlations between the estimated cell type proportions and the ‘ground truth’ obtained from flow cytometry were calculated to assess deconvolution performance. As shown in Figure 3A, tranSig coupled with each of the three deconvolution methods (i.e. NNLS, CIBERSORTx and QP) gave more accurate estimates than the same methods based on the empirical signature matrix. Among the three deconvolution methods, both NNLS and CIBERSORTx gave more accurate proportion estimates for most cell types when coupled with tranSig. Overall, tranSig coupled with CIBERSORTx gave the most accurate estimates relative to the ground truth.
Figure 3.
PB bulk data cell type deconvolution benchmarking analysis. (A) Box plots of the correlations between the estimated cell type proportions and the ground truth for Newman et al. blood samples (n = 12), with color coded by cell types. CIBERSORTx is denoted as square, NNLS is denoted as circles and QP (quadprog) is denoted as triangles. Statistical significance is calculated by the Wilcoxon test. Data are presented as medians ± interquartile range. (B) Benchmarking of deconvolution methods shown on jitter plots of correlations same as (A). Data are expressed as means ± s.d.
Within the tranSig framework, we made adjustments to both the scRNA-seq references and the bulk RNA-seq expression profiles, so we also evaluated these adjustments in the PB deconvolution application (Supplementary Figure S2). With the LIGER implementation, signature genes shared across tissues were selected, which improved the deconvolution results of tranSig. In addition, we compared two types of single cell expression profiles (i.e. raw counts and TPM normalization) as the input of the tranSig model. The results suggest that using raw counts as input may outperform TPM normalization, owing to the parameter estimation in the tranSig model. The details are discussed in the Methods section. Therefore, we used the raw counts as the input of the tranSig model in all subsequent analyses, unless stated otherwise. For the adjustment of bulk RNA-seq expression profiles, we generated pseudo-bulk expression profiles by sampling from the target single cell data. There was an apparent technical batch effect between the bulk and pseudo-bulk expression profiles. After adjustment with ComBat, the bulk mixture was mapped to the scRNA-seq space (Supplementary Figure S3). Comparing tranSig with tranSig_nonadj in Supplementary Figure S2, we found that tranSig with the adjusted bulk mixture as input outperformed tranSig with the original bulk mixture.
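As an illustration of the pseudo-bulk step, the sketch below builds pseudo-bulk profiles by drawing cells at random from a single cell count matrix and summing their counts. The number of cells per sample, sampling with replacement and the function name `make_pseudo_bulk` are assumptions made for this example; the exact sampling scheme used by tranSig is described in the Methods and may differ. The subsequent batch adjustment is not shown.

```python
import numpy as np


def make_pseudo_bulk(counts, n_samples, cells_per_sample, seed=0):
    """Build pseudo-bulk profiles from a (cells x genes) raw count matrix.

    Each pseudo-bulk sample is the gene-wise sum of counts over a random
    draw of cells (with replacement, an assumption of this sketch).
    """
    rng = np.random.default_rng(seed)
    n_cells = counts.shape[0]
    profiles = []
    for _ in range(n_samples):
        idx = rng.choice(n_cells, size=cells_per_sample, replace=True)
        profiles.append(counts[idx].sum(axis=0))
    return np.vstack(profiles)


# Toy single cell data (hypothetical): 200 cells, 6 genes.
toy_rng = np.random.default_rng(1)
counts = toy_rng.poisson(2.0, size=(200, 6))

pseudo_bulk = make_pseudo_bulk(counts, n_samples=4, cells_per_sample=50)
```

The resulting matrix has one row per pseudo-bulk sample and can be paired with the real bulk samples as two batches for batch-effect adjustment.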
To systematically benchmark the performance of different methods (Figure 3B and Supplementary Figure S4), we implemented MuSiC, bisque, BSEQ-sc and CIBERSORTx with the S mode and B mode and evaluated the accuracy using correlation, RMSE and K-L divergence between estimates and the ground truth. Overall, the performance of tranSig + CIBERSORTx was superior to other methods. It is interesting to note that all three CIBERSORTx modes, including disabled batch correction mode, S mode and B mode, accurately estimated the proportions of B cells and T cells but failed to estimate those of monocytes and neutrophils. Because the HCL datasets were generated by Microwell-seq and were UMI-based sequencing data, the deconvolution results show that the S mode may substantially improve the overall estimation accuracy but perform poorly for neutrophils and monocytes (Supplementary Figure S5). Specifically, the estimated neutrophil proportions were smaller than 10% when the ground truth was around 60%. For MuSiC, although the proportions of neutrophils, T cells and B cells were accurately estimated, the performance of monocyte estimation was worse than either tranSig or CIBERSORTx. With respect to bisque and BSEQ-sc, they accurately estimated the proportion of T cells but failed in other cell type estimations. Taken together, the cell type deconvolution by tranSig coupled with CIBERSORTx achieved higher accuracy and less across-cell type variance than the other methods.
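The three accuracy measures named above can be sketched as follows. The direction of the K-L divergence (ground truth relative to estimate) and the small constant used to avoid taking the logarithm of zero are assumptions of this example, and the proportions are toy values.

```python
import numpy as np


def pearson(est, truth):
    """Pearson correlation between estimated and true proportions."""
    return float(np.corrcoef(est, truth)[0, 1])


def rmse(est, truth):
    """Root mean squared error between estimated and true proportions."""
    est, truth = np.asarray(est), np.asarray(truth)
    return float(np.sqrt(np.mean((est - truth) ** 2)))


def kl_divergence(truth, est, eps=1e-12):
    """K-L divergence of the true proportions relative to the estimates."""
    p = np.clip(np.asarray(truth, dtype=float), eps, None)
    q = np.clip(np.asarray(est, dtype=float), eps, None)
    p, q = p / p.sum(), q / q.sum()  # renormalize after clipping
    return float(np.sum(p * np.log(p / q)))


# Toy proportions (hypothetical) for three cell types in one sample.
truth = np.array([0.6, 0.3, 0.1])
est = np.array([0.5, 0.35, 0.15])

scores = {
    "correlation": pearson(est, truth),
    "rmse": rmse(est, truth),
    "kl": kl_divergence(truth, est),
}
```

Higher correlation and lower RMSE and K-L divergence indicate estimates closer to the ground truth.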
Bulk BAL deconvolution in a sarcoidosis cohort to identify dominant cell type
In some real datasets, the samples have only one dominant cell type, leading to highly unbalanced proportions across cell types. It is important for a cell type deconvolution method to identify the dominant cell type and distinguish it from the others in such cases. To compare the performance of different methods under this scenario, we considered the BAL bulk RNA-seq dataset from the GRADS Sarcoidosis cohort [37], where the proportion of AMs was around 80% [44].
We used adult PB in HCL as the target single cell dataset and the other five tissues (adult adipose, adult BM, adult lung, CB and fetal lung) as reference single cell datasets in tranSig because all these five tissues have immune cells as their major cell populations.
AMs are critical to lung inflammation and repair [45]. They are tissue-resident cells, so they are cellularly and functionally different from the monocyte-derived macrophages in PBMC. We removed 48 AM differentially expressed genes (Table 3) from the signature gene list so that only signature genes shared by AMs and macrophages were used for deconvolution. This can largely reduce the bias caused by the heterogeneity between AMs and macrophages in PB.
Table 3.
Differentially expressed genes between AMs and macrophages
| MARCKS | EMP1 | ZFP36L1 | FNIP2 | IER3 | CXCL2 |
|---|---|---|---|---|---|
| C15orf48 | CTSL | CCL2 | CCL4 | SPP1 | CCL18 |
| MCEMP1 | CCL3 | CCL20 | LGMN | G0S2 | MT1X |
| PLA2G7 | FOLR3 | NEAT1 | TIMP1 | CCL3L1 | MT2A |
| SOD2 | SDS | IFITM3 | HIF1A | SGK1 | CXCL3 |
| GPR183 | TMEM176B | CD36 | CXCL8 | MT1G | IL1B |
| HP | VCAN | CCL4L2 | CXCL10 | AREG | HSPA1B |
| BASP1 | TNFAIP6 | RNASE1 | MAFB | NFKBIA | HSPA1A |
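The gene-removal step can be sketched as a simple filter on the signature gene list. The excluded set below is only a subset of Table 3, and the signature gene list is hypothetical; the helper name `drop_genes` is likewise an assumption of this example.

```python
def drop_genes(signature_genes, excluded):
    """Return the signature genes that are not in the excluded set,
    keeping the original order."""
    excluded = set(excluded)
    return [gene for gene in signature_genes if gene not in excluded]


# A few of the AM differentially expressed genes listed in Table 3.
am_de_genes = {"SPP1", "CCL18", "MARCKS"}

# Hypothetical signature gene list before filtering.
signature_genes = ["CD14", "SPP1", "LYZ", "CCL18", "CD3E"]

shared_genes = drop_genes(signature_genes, am_de_genes)
```

Only the rows of the signature matrix and bulk profiles that correspond to `shared_genes` would then be passed to the deconvolution step.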
This dataset has macrophages as the largest cell population and lymphocytes as the second largest. As shown in Figure 4, both tranSig and NNLS with the empirical signature matrix estimated the proportion of macrophages to be around 70%. However, the latter failed to identify T cells and estimated the second dominant cell type as dendritic cells with a mean proportion around 30%, which differs from the measured cell differentials [37]. In comparison, tranSig successfully recognized T cells as another major cell type in these samples. Among the other benchmarking methods, MuSiC and BSEQ-sc estimated dendritic cells as the largest cell type, and bisque and CIBERSORTx failed to capture the unbalanced cell composition. Collectively, tranSig coupled with CIBERSORTx was able to identify the most dominant cell type as well as the remaining cell types, and is suited to the deconvolution of samples with unbalanced cell proportions.
Figure 4.
BAL bulk data cell type deconvolution, in which tranSig identifies macrophages as the dominant cell type. Box plots of the estimated cell type proportions are shown. The x-axis represents the cell types, which are color coded. The y-axis represents the estimated cell type proportions.
Bulk aorta deconvolution to depict cellular pathological changes of aneurysm (AN)
We further assessed whether tranSig deconvolution can identify pathological changes of tissues (Figure 5) by applying tranSig to bulk RNA-seq of aortic or aneurysm [39] tissues from six healthy donors and six patients with ascending aortic aneurysm. Aortic aneurysm [46, 47] is a permanent and localized dilation of the aorta, and is a fatal vascular disease. The pathological features [48] of aortic aneurysm have been well studied, including apoptosis of smooth muscle cells (SMCs) [49], infiltration of immune cells [50] (i.e. macrophages [51, 52] and T cells [53, 54]), matrix metalloproteinase increase [55] and elastin degradation. We took the artery dataset from HCL as the target tissue and lung, BM, PB, CB, kidney and liver as the reference tissues. Deconvolution with the empirical signature matrix by both NNLS and CIBERSORTx could infer increased macrophages and reduced SMCs and stromal cells but missed the signals of other immune cells. Deconvolution results of tranSig showed increased macrophages and T cells, consistent with inflammatory infiltration, and decreased SMCs and stromal cells, suggesting SMC apoptosis in the pathogenesis of aortic aneurysm. Similar to the results of PB deconvolution, CIBERSORTx was more stable than NNLS when coupled with tranSig. We found that CIBERSORTx on its own failed to estimate immune cells, and there was no obvious improvement when implementing the S mode or B mode.
Figure 5.
Application to aorta to depict cellular pathological changes of aneurysm (AN). Jitter plots of estimated cell type proportions for the samples in Chen et al., including aortic tissues from six healthy donors as controls and aneurysm tissues from patients with ascending aortic aneurysm as AN, color coded by individual. Color bars on the right annotate the deconvolution methods and single cell references, with the artery dataset of HCL in red and the ATAA dataset in green. Data are expressed as means ± s.d.
Due to the lack of multiple subjects in the artery dataset from HCL, MuSiC and bisque could not be applied in this case. Therefore, we ran the benchmarking methods on another scRNA-seq dataset of aortic tissues [35] from three healthy donors and eight patients with ascending aortic aneurysm (the ATAA dataset, which has a deeper sequencing depth than the HCL artery dataset). In the tranSig framework, we took the aorta from the ATAA dataset as the target tissue and lung, BM, PB, CB, kidney and liver from the HCL datasets as the reference tissues to evaluate the performance of leveraging information across studies and platforms (Supplementary Figure S6). With tranSig coupled with CIBERSORTx, the results reflected the pathological changes of SMCs and immune cells. Consistent with the above results, deconvolution with tranSig + CIBERSORTx was more stable than with NNLS. Similar to the deconvolution with HCL artery data, CIBERSORTx on its own missed immune cells. The substantial improvement of CIBERSORTx by coupling with tranSig demonstrated the benefit of tranSig across studies and platforms. In addition, almost all cells in the aorta or aneurysm were estimated as SMCs by MuSiC, which does not reflect the inflammatory infiltration in the pathogenesis of aortic aneurysm (Figure 5). Intriguingly, the estimates of bisque reflected the pathogenesis of aortic aneurysm but showed a high variance for T and NK cells, probably because these similar cell types are difficult to distinguish (Figure 5).
Discussion
In this study, we have developed tranSig, a novel Bayesian model to better infer a signature matrix by transfer learning across multiple scRNA-seq datasets. In the tranSig framework, we use SAME [34] for statistical inference and estimate a more reliable signature matrix with a Gaussian mixture prior. Highly expressed genes of cell types are screened as signature genes. LIGER [33, 40] and k-means were used to integrate scRNA-seq datasets from different tissues and studies. Specifically, tranSig selects target tissue-specific signature genes from multiple reference tissues. It aims to integrate informative and conserved signature genes as input of the tranSig Bayesian model and removes the reference tissue-specific genes that have distinct expression distributions compared with those in the target tissue. In addition, we apply ComBat [56, 57] to the bulk RNA-seq mixture and the pseudo-bulk mixture derived from scRNA-seq to correct for the batch effects between bulk RNA-seq and scRNA-seq data. The final tranSig signature matrix and the batch-effect-corrected bulk RNA-seq data can be input to NNLS or any other external cell type deconvolution tool, e.g. CIBERSORTx and quadratic programming, for cell type deconvolution.
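The screening of highly expressed genes and their union across cell types and tissues can be illustrated with the sketch below. It assumes, purely for illustration, that highly expressed genes are the top `k` genes by mean expression within each cell type of each tissue; the actual criterion is given in the Methods and may differ. The tissues, cell types and expression values are hypothetical.

```python
def top_k_genes(mean_expr, k):
    """Return the k genes with the highest mean expression.

    mean_expr: dict mapping gene name -> mean expression in one cell type.
    """
    ranked = sorted(mean_expr, key=mean_expr.get, reverse=True)
    return set(ranked[:k])


def signature_gene_union(profiles, k):
    """Union of the top-k genes over every cell type in every tissue.

    profiles: dict tissue -> dict cell type -> dict gene -> mean expression.
    """
    genes = set()
    for per_cell_type in profiles.values():
        for mean_expr in per_cell_type.values():
            genes |= top_k_genes(mean_expr, k)
    return sorted(genes)


# Toy mean expression profiles (hypothetical values).
profiles = {
    "PB": {
        "T cell": {"CD3E": 9.0, "LYZ": 0.5, "MS4A1": 0.2, "ACTB": 5.0},
        "Monocyte": {"CD3E": 0.1, "LYZ": 12.0, "MS4A1": 0.3, "ACTB": 5.0},
    },
    "BM": {
        "B cell": {"CD3E": 0.2, "LYZ": 0.4, "MS4A1": 7.0, "ACTB": 5.0},
    },
}

signature_genes = signature_gene_union(profiles, k=1)
```

A gene that is moderately expressed everywhere, such as `ACTB` in this toy input, is never the top gene of any cell type and is therefore excluded from the union.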
To investigate the robustness of tranSig, we conducted a number of simulations under different conditions. For simplicity, we did not simulate tissue and platform effects. Therefore, the simulations mainly examined whether the tranSig model can construct an accurate signature matrix assuming that all tissue and platform effects have been eliminated. Simulations demonstrated higher stability of tranSig than other methods (e.g. NNLS, CIBERSORTx and MuSiC) across different tissue numbers, numbers of signature genes and undetected rates (Figure 2 and Supplementary Figure S1). In addition, we noticed that bisque performed poorly under all simulation scenarios but better in real applications, which may result from the way cell proportions of the single cell data were simulated. We generated cell proportions of single cell data from a uniform distribution, which may influence the transformation step in bisque. For the SAME sampling, we took two quantities of the model as examples in simulations (Supplementary Figure S7). We observed convergence of the algorithm after around 50 iterations, indicating the robustness of the algorithm. Notably, the robustness to undetected rates suggests that our approach can handle single cell datasets of low quality.
Applications of tranSig to real datasets demonstrated more accurate and stable deconvolution results when coupled with CIBERSORTx. The effectiveness of LIGER and ComBat was also evaluated. We deployed the tranSig pipeline on a BAL bulk RNA-seq dataset and showed that tranSig could successfully identify the top two dominant cell types, AMs and T cells, while the other methods failed to do so. Although there was still a gap between the true and estimated cell type proportions of AMs, tranSig had the highest accuracy. Unlike the first two applications, the deconvolution methods did not perform well on the aortic aneurysm dataset, which does not have ‘true’ cell type proportions estimated through sorting experiments. Therefore, we tried to validate our results indirectly by comparing the estimates between the normal aorta and aortic aneurysms. The results showed that tranSig + CIBERSORTx could capture the pathological changes of aneurysms, including SMC apoptosis and inflammatory infiltration. Building upon the current deconvolution tools, we developed a Bayesian framework to infer the signature matrix and proposed a more accurate deconvolution framework. In future applications, this framework can provide pathological information on cell types, including rare subtypes, without the need for the fresh tissues and specialized platforms that experimental methods require.
Based on the simulation and real application results, tranSig + CIBERSORTx performed better than other methods, specifically when using scRNA-seq datasets with low sequencing depth to derive a signature matrix. In the tranSig framework, we mainly utilized the HCL datasets generated by Microwell-seq [58, 59] with a low sequencing depth (~500 detected genes in each cell) over 30 main tissues, including some rarely studied ones. Thus, the tranSig framework takes advantage of the comprehensive tissue types and overcomes various common sources of technical noise [60–64] in scRNA-seq data, such as the bias caused by insufficient sequencing depth, low capture efficiency, high drop-out rate or cell type misclassifications. As a result, the deconvolution results of tranSig are more robust than those of the other benchmarking methods. Furthermore, we used the ATAA dataset as the target single cell dataset and other tissues of the HCL datasets as references in the real application to aortic aneurysms, which shows the robustness and effectiveness of tranSig in transferring information across studies and platforms.
The tranSig framework requires raw counts for scRNA-seq datasets and TPM-normalized data for bulk RNA-seq. Our Bayesian model allows unnormalized scRNA-seq data as input because of its Gaussian mixture priors. Using the raw count matrix for scRNA-seq data shows improved performance over the TPM-normalized matrix in terms of correlations between estimates and ground truth (Supplementary Figure S6).
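For reference, the standard TPM transformation applied to bulk RNA-seq counts can be sketched as follows. The gene lengths and counts are toy values, and the function name `counts_to_tpm` is an assumption of this example rather than part of the tranSig code.

```python
import numpy as np


def counts_to_tpm(counts, gene_lengths_bp):
    """Convert raw counts for one sample to transcripts per million (TPM).

    counts: (genes,) raw read counts; gene_lengths_bp: (genes,) gene lengths
    in base pairs.
    """
    counts = np.asarray(counts, dtype=float)
    lengths_kb = np.asarray(gene_lengths_bp, dtype=float) / 1000.0
    rate = counts / lengths_kb        # reads per kilobase of gene length
    return rate / rate.sum() * 1e6    # scale so the sample sums to one million


# Toy sample (hypothetical): counts proportional to gene length.
counts = [100, 200, 300]
gene_lengths_bp = [1000, 2000, 3000]

tpm = counts_to_tpm(counts, gene_lengths_bp)
```

Because the three toy genes have the same number of reads per kilobase, they receive equal TPM values even though their raw counts differ.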
We note that the primary goal of tranSig is to harmonize information from other tissues. This enables tranSig to perform cell type deconvolution on rarely studied tissues when matched single cell datasets are lacking. We used a Gaussian mixture prior distribution to model the tranSig signature matrix. In the tranSig model, this signature matrix is not constrained by non-negativity, better representing the relative relationships between the signatures across cell types. Therefore, tranSig can be regarded as fine-tuning the signature matrix to better capture the relationships between signatures and cell types and among signatures.
There are some limitations of our proposed method. Appropriate tissue types should be carefully chosen when implementing tranSig, as tissues with similar cellular compositions may provide more accurate information for the tranSig signature matrix. In addition, tranSig may be more time-consuming than other methods, especially when incorporating more tissues as input. Furthermore, the tranSig framework takes the HCL dataset as input. If users need to use external single cell datasets, such as the ATAA dataset used in the AN application, cell type annotations are required.
In conclusion, tranSig is a novel Bayesian framework to infer a signature matrix by leveraging cross-tissue or -study information. Deconvolution based on the signature matrix inferred by tranSig leads to more accurate cell type proportion estimates and gains additional insights from analyzing bulk sample data. Coupled with HCL data, tranSig is applicable to deconvolution of various tissues. In a broader scheme, our approach may be considered as transfer learning. Future directions can focus on how to better incorporate information by integrative analysis and design more plausible models to derive signature matrices.
Key points
We developed a novel Bayesian model, tranSig, to infer an accurate and robust signature matrix by transfer learning across multiple scRNA-seq datasets, where SAME was implemented for statistical inference and signature matrix estimation.
In real applications, tranSig can infer pathological information of cell types in diseases as well as rare subtypes without the need for fresh tissues and specialized platforms, which is useful and applicable in basic biological and medical research.
TranSig takes advantage of the HCL, which includes comprehensive tissue types, and overcomes the problem of its relatively low sequencing depth, leading to widespread application in almost all tissue types.
Supplementary Material
Wenxuan Deng is a PhD student at the Department of Biostatistics, Yale School of Public Health, Yale University. Her research interests are integrative computational methodologies on single cell datasets.
Bolun Li is a PhD student in the Institute of Basic Medicine, Chinese Academy of Medical Sciences and studied at the Department of Biostatistics, Yale School of Public Health, Yale University as a visiting student in 2020. His research interests are the applications and methodologies of multi-omics in cardiopulmonary diseases.
Jiawei Wang is a postdoctoral associate at Yale School of Medicine. His research interest lies in imaging genetics and mental diseases.
Wei Jiang is an Associate Research Scientist in the Department of Biostatistics, Yale School of Public Health. His current research topic is to develop computational and statistical analysis methods in genome-wide association studies.
Xiting Yan is an Assistant Professor of Pulmonary and Biostatistics; Director of Data Analysis and Bioinformatics Hub, The Center for Precision Pulmonary Medicine. Her current research focuses on developing novel statistical and computational models to analyze large-scale omics and drug perturbation data to better understand disease pathogenesis.
Ningshan Li is a PhD student at SJTU-Yale Joint Center for Biostatistics and Data Science, Department of Bioinformatics and Biostatistics, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University.
Milica Vukmirovic is an Associate Research Scientist in the Center for Precision Pulmonary Medicine at Yale. Her research focuses on understanding gene regulatory networks in Idiopathic Pulmonary Fibrosis.
Naftali Kaminski is the Boehringer-Ingelheim Endowed Professor of Internal Medicine and Chief of Pulmonary, Critical Care and Sleep Medicine, at Yale School of Medicine. He has a strong interest in integrating high throughput ‘omics’ data with clinical information to generate systems biology models of lung diseases and to develop precision medicine approaches.
Jing Wang is a professor and principal investigator of the Peking Union Medical College and the Deputy head of the Institute of Basic Medicine, Chinese Academy of Medical Sciences. Her main research interests involve the pathological mechanisms, molecular diagnosis and therapy of cardiovascular and pulmonary diseases.
Hongyu Zhao is the Ira V. Hiscock Professor of Biostatistics and Professor of Statistics and Data Science and Genetics. His research interests are the developments and applications of novel statistical methods to address scientific questions in genetics, molecular biology, drug developments and precision medicine.
Contributor Information
Wenxuan Deng, Department of Biostatistics, Yale School of Public Health, 60 College Street, New Haven, CT, USA.
Bolun Li, Department of Biostatistics, Yale School of Public Health, 60 College Street, New Haven, CT, USA; State Key Laboratory of Medical Molecular Biology, Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences, Department of Pathophysiology, Peking Union Medical College, Beijing, China.
Jiawei Wang, Department of Biostatistics, Yale School of Public Health, 60 College Street, New Haven, CT, USA.
Wei Jiang, Department of Biostatistics, Yale School of Public Health, 60 College Street, New Haven, CT, USA.
Xiting Yan, Section of Pulmonary, Critical Care and Sleep Medicine, Department of Internal Medicine, Yale School of Medicine, New Haven, CT, USA.
Ningshan Li, Section of Pulmonary, Critical Care and Sleep Medicine, Department of Internal Medicine, Yale School of Medicine, New Haven, CT, USA.
Milica Vukmirovic, Section of Pulmonary, Critical Care and Sleep Medicine, Department of Internal Medicine, Yale School of Medicine, New Haven, CT, USA; Leslie Dan Faculty of Pharmacy, University of Toronto, 144 College St., ON, Canada.
Naftali Kaminski, Section of Pulmonary, Critical Care and Sleep Medicine, Department of Internal Medicine, Yale School of Medicine, New Haven, CT, USA.
Jing Wang, State Key Laboratory of Medical Molecular Biology, Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences, Department of Pathophysiology, Peking Union Medical College, Beijing, China.
Hongyu Zhao, Department of Biostatistics, Yale School of Public Health, 60 College Street, New Haven, CT, USA.
Funding
This work was supported in part by the National Institutes of Health [R56 AG074015, P50 CA196530], the National Key Research and Development Program of China [2019YFA0801703, 2019YFA0802600], and the CAMS Innovation Fund for Medical Sciences [2021-I2M-1-049].
Data Availability
All the data can be downloaded from the GEO database (http://www.ncbi.nlm.nih.gov/geo/). The HCL can be accessed under the GEO accession number GSE134355. It can also be obtained at http://bis.zju.edu.cn/HCL/ or https://db.cngb.org/HCL/. The PB bulk data are available through GEO with accession number GSE127472. The GEO accession number of the BAL dataset is GSE109516. The ATAA dataset can be accessed at GSE155468 and the aorta bulk data at GSE140947.
References
- 1. O’Neill K, Aghaeepour N, Spidlen J, et al. Flow cytometry bioinformatics. PLoS Comput Biol 2013;9:e1003365.
- 2. Lugli E, Roederer M, Cossarizza A. Data analysis in flow cytometry: the future just started. Cytometry A 2010;77A:705–13.
- 3. Watson JV. Introduction to Flow Cytometry. Cambridge, United Kingdom: Cambridge University Press, 2004.
- 4. Ramos-Vara JA, Miller MA. When tissue antigens and antibodies get along: revisiting the technical aspects of immunohistochemistry—the red, brown, and blue technique. Vet Pathol 2014;51:42–87.
- 5. Buchwalow IB, Bocker W. Immunohistochemistry: Basics and Methods. Heidelberg, Dordrecht, London, New York: Springer, 2010.
- 6. Madissoon E, Wilbrey-Clark A, Miragaia RJ, et al. scRNA-seq assessment of the human lung, spleen, and esophagus tissue stability after cold preservation. Genome Biol 2019;21:1.
- 7. Cobos FA, Alquicira-Hernandez J, Powell JE, et al. Benchmarking of cell type deconvolution pipelines for transcriptomics data. Nat Commun 2020;11:1–14.
- 8. Newman AM, Liu CL, Green MR, et al. Robust enumeration of cell subsets from tissue expression profiles. Nat Methods 2015;12:453–7.
- 9. Vallania F, Tam A, Lofgren S, et al. Leveraging heterogeneity across multiple datasets increases cell-mixture deconvolution accuracy and reduces biological and technical biases. Nat Commun 2018;9:4735.
- 10. Tang D, Park S, Zhao H. NITUMID: nonnegative matrix factorization-based immune-TUmor MIcroenvironment deconvolution. Bioinformatics 2020;36:1344–50.
- 11. Tang D, Park S, Zhao H. SCADIE: simultaneous estimation of cell type proportions and cell type-specific gene expressions using SCAD-based iterative estimating procedure. Genome Biol 2022;23:129.
- 12. Bezginov A, Clark GW, Charlebois RL, et al. Coevolution reveals a network of human proteins originating with multicellularity. Mol Biol Evol 2013;30:332–46.
- 13. Lukk M, Kapushesky M, Nikkilä J, et al. A global map of human gene expression. Nat Biotechnol 2010;28:322–4.
- 14. Lahti L, Torrente A, Elo LL, et al. A fully scalable online pre-processing algorithm for short oligonucleotide microarray atlases. Nucleic Acids Res 2013;41:e110.
- 15. Zappia L, Phipson B, Oshlack A. Exploring the single-cell RNA-seq analysis landscape with the scRNA-tools database. PLoS Comput Biol 2018;14:e1006245.
- 16. Venteicher AS, Tirosh I, Hebert C, et al. Decoupling genetics, lineages, and microenvironment in IDH-mutant gliomas by single-cell RNA-seq. Science 2017;355.
- 17. Slyper M, Porter CBM, Ashenberg O, et al. Author correction: a single-cell and single-nucleus RNA-Seq toolbox for fresh and frozen human tumors. Nat Med 2020;26:1307.
- 18. Chu L-F, Leng N, Zhang J, et al. Single-cell RNA-seq reveals novel regulators of human embryonic stem cell differentiation to definitive endoderm. Genome Biol 2016;17:173.
- 19. Harland RM. A new view of embryo development and regeneration. Science 2018;360:967–8.
- 20. Boroughs AC, Larson RC, Marjanovic ND, et al. A distinct transcriptional program in human CAR T cells bearing the 4-1BB Signaling domain revealed by scRNA-Seq. Mol Ther 2020;28:2577–92.
- 21. Lavaert M, Liang KL, Vandamme N, et al. Integrated scRNA-Seq identifies human postnatal thymus seeding progenitors and regulatory dynamics of differentiating immature thymocytes. Immunity 2020;52:1088–1104.e6.
- 22. Regev A, Teichmann SA, Lander ES, et al. Science forum: the human cell atlas. Elife 2017;6:e27041.
- 23. Rozenblatt-Rosen O, Stubbington MJT, Regev A, et al. The human cell atlas: from vision to reality. Nature 2017;550:451–3.
- 24. Han X, Zhou Z, Fei L, et al. Construction of a human cell landscape at single-cell level. Nature 2020;581:303–9.
- 25. Hunt GJ, Freytag S, Bahlo M, et al. Dtangle: accurate and robust cell type deconvolution. Bioinformatics 2019;35:2093–9.
- 26. Gong T, Szustakowski JD. DeconRNASeq: a statistical framework for deconvolution of heterogeneous tissue samples based on mRNA-Seq data. Bioinformatics 2013;29:1083–5.
- 27. Plattner C, Finotello F, Rieder D. Deconvoluting tumor-infiltrating immune cells from RNA-seq data using quanTIseq. Methods Enzymol 2020;636:261–85.
- 28. Baron M, Veres A, Wolock SL, et al. A single-cell transcriptomic map of the human and mouse pancreas reveals inter- and intra-cell population structure. Cell Syst 2016;3:346–360.e4.
- 29. Jew B, Alvarez M, Rahmani E, et al. Accurate estimation of cell composition in bulk expression through robust integration of single-cell information. Nat Commun 2020;11:1971.
- 30. Wang X, Park J, Susztak K, et al. Bulk tissue cell type deconvolution with multi-subject single-cell expression reference. Nat Commun 2019;10:1–9.
- 31. Szabo PA, Miron M, Farber DL. Location, location, location: tissue resident memory T cells in mice and humans. Sci Immunol 2019;4:eaas9673.
- 32. Meng W, Zhang B, Schwartz GW, et al. An atlas of B-cell clonal distribution in the human body. Nat Biotechnol 2017;35:879–84.
- 33. Liu J, Gao C, Sodicoff J, et al. Jointly defining cell types from multiple single-cell datasets using LIGER. Nat Protoc 2020;15:3632–62.
- 34. Doucet A, Godsill SJ, Robert CP. Marginal maximum a posteriori estimation using Markov chain Monte Carlo. Stat Comput 2002;12:77–84.
- 35. Li Y, Ren P, Dawson A, et al. Single-cell transcriptome analysis reveals dynamic cell populations and differential gene expression patterns in control and aneurysmal human aortic tissue. Circulation 2020;142:1374–88.
- 36. Newman AM, Steen CB, Liu CL, et al. Determining cell type abundance and expression from bulk tissues with digital cytometry. Nat Biotechnol 2019;37:773–82.
- 37. Vukmirovic M, Yan X, Gibson KF, et al. Transcriptomics of bronchoalveolar lavage cells identifies new molecular endotypes of sarcoidosis. Eur Respir J 2021;58:2002950.
- 38. Moller DR, Koth LL, Maier LA, et al. Rationale and design of the genomic research in Alpha-1 antitrypsin deficiency and sarcoidosis (GRADS) study. Alpha-1 protocol. Ann Am Thorac Soc 2015;12:1561–71.
- 39. Chen P-Y, Qin L, Li G, et al. Smooth muscle cell reprogramming in aortic aneurysms. Cell Stem Cell 2020;26:542–557.e11.
- 40. Welch JD, Kozareva V, Ferreira A, et al. Single-cell multi-omic integration compares and contrasts features of brain cell identity. Cell 2019;177:1873–1887.e17.
- 41. Hansen KD, Irizarry RA, Wu Z. Removing technical variability in RNA-seq data using conditional quantile normalization. Biostatistics 2012;13:204–16.
- 42. Ziegenhain C, Vieth B, Parekh S, et al. Comparative analysis of single-cell RNA sequencing methods. Mol Cell 2017;65:631–643.e4.
- 43. Gong T, Hartmann N, Kohane IS, et al. Optimal deconvolution of transcriptional profiling data using quadratic programming with application to complex clinical blood samples. PLoS One 2011;6:e27156.
- 44. Patel VI, Metcalf JP. Airway macrophage and dendritic cell subsets in the resting human lung. Crit Rev Immunol 2018;38:303–31.
- 45. Hu G, Christman JW. Editorial: alveolar macrophages in lung inflammation and resolution. Front Immunol 2019;10:2275.
- 46. Sakalihasan N, Limet R, Defawe OD. Abdominal aortic aneurysm. Lancet 2005;365:1577–89.
- 47. Ernst CB. Abdominal aortic aneurysm. N Engl J Med 1993;328:1167–72.
- 48. Shimizu K, Mitchell RN, Libby P. Inflammation and cellular immune responses in abdominal aortic aneurysms. Arterioscler Thromb Vasc Biol 2006;26:987–94.
- 49. Rateri DL, Davis FM, Balakrishnan A, et al. Angiotensin II induces region-specific medial disruption during evolution of ascending aortic aneurysms. Am J Pathol 2014;184:2586–95.
- 50. Quintana RA, Taylor WR. Cellular mechanisms of aortic aneurysm formation. Circ Res 2019;124:607–18.
- 51. Curci JA, Liao S, Huffman MD, et al. Expression and localization of macrophage elastase (matrix metalloproteinase-12) in abdominal aortic aneurysms. J Clin Invest 1998;102:1900–10.
- 52. Raffort J, Lareyre F, Clément M, et al. Monocytes and macrophages in abdominal aortic aneurysm. Nat Rev Cardiol 2017;14:457–71.
- 53. Xiong W, Zhao Y, Prall A, et al. Key roles of CD4+ T cells and IFN-γ in the development of abdominal aortic aneurysms in a murine model. J Immunol 2004;172:2607–12.
- 54. Ait-Oufella H, Wang Y, Herbin O, et al. Natural regulatory T cells limit angiotensin II-induced aneurysm formation and rupture in mice. Arterioscler Thromb Vasc Biol 2013;33:2374–9.
- 55. Fanjul-Fernández M, Folgueras AR, Cabrera S, et al. Matrix metalloproteinases: evolution, gene regulation and functional analysis in mouse models. Biochim Biophys Acta 2010;1803:3–19.
- 56. Johnson WE, Li C, Rabinovic A. Adjusting batch effects in microarray expression data using empirical Bayes methods. Biostatistics 2007;8:118–27.
- 57. Zhang Y, Jenkins DF, Manimaran S, et al. Alternative empirical Bayes models for adjusting for batch effects in genomic studies. BMC Bioinformatics 2018;19:262.
- 58. Han X, Wang R, Zhou Y, et al. Mapping the mouse cell atlas by microwell-Seq. Cell 2018;173:1307.
- 59. Ding J, Adiconis X, Simmons SK, et al. Systematic comparative analysis of single cell RNA-sequencing methods. bioRxiv 2019;632216.
- 60. Chen G, Ning B, Shi T. Single-cell RNA-Seq technologies and related computational data analysis. Front Genet 2019;10:317.
- 61. Lähnemann D, Köster J, Szczurek E, et al. Eleven grand challenges in single-cell data science. Genome Biol 2020;21:31.
- 62. Hicks SC, Townes FW, Teng M, et al. Missing data and technical variability in single-cell RNA-sequencing experiments. Biostatistics 2018;19:562–78.
- 63. Bacher R, Kendziorski C. Design and computational analysis of single-cell RNA-sequencing experiments. Genome Biol 2016;17:63.
- 64. Korthauer KD, Chu L-F, Newton MA, et al. A statistical approach for identifying differential distributions in single-cell RNA-seq experiments. Genome Biol 2016;17:222.
Data Availability Statement
All data can be downloaded from the GEO database (http://www.ncbi.nlm.nih.gov/geo/). The HCL dataset is available under GEO accession number GSE134355 and can also be obtained at http://bis.zju.edu.cn/HCL/ or https://db.cngb.org/HCL/. The PB bulk data are available under GEO accession number GSE127472, and the BAL dataset under GSE109516. The ATAA dataset is available under GEO accession number GSE155468, and the aorta bulk data under GSE140947.