Abstract
The distribution and abundance of immune cells, particularly T‐cell subsets, play pivotal roles in cancer immunology and therapy. T cells have many subsets with specific function and current methods are limited in estimating them, thus, a method for predicting comprehensive T‐cell subsets is urgently needed in cancer immunology research. Here, Immune Cell Abundance Identifier (ImmuCellAI), a gene set signature‐based method, is introduced for precisely estimating the abundance of 24 immune cell types including 18 T‐cell subsets, from gene expression data. Performance evaluation on both the sequencing data with flow cytometry results and public expression data indicate that ImmuCellAI can estimate the abundance of immune cells with superior accuracy to other methods especially on many T‐cell subsets. Application of ImmuCellAI to immunotherapy datasets reveals that the abundance of dendritic cells, cytotoxic T, and gamma delta T cells is significantly higher both in comparisons of on‐treatment versus pre‐treatment and responders versus non‐responders. Meanwhile, an ImmuCellAI result‐based model is built for predicting the immunotherapy response with high accuracy (area under curve 0.80–0.91). These results demonstrate the powerful and unique function of ImmuCellAI in tumor immune infiltration estimation and immunotherapy response prediction.
Keywords: cancer, immune cells, immunotherapy, T‐cell subsets
Immune Cell Abundance Identifier (ImmuCellAI) is a gene set signature‐based method for precisely estimating the abundance of 24 immune cell types including 18 T‐cell subsets. Application of ImmuCellAI to immunotherapy datasets reveals the dynamic change of immune cell abundance. An ImmuCellAI result‐based model for predicting the immunotherapy response achieves high accuracy with area under curve 0.80–0.91.
1. Introduction
The immune system, comprising various proteins, immune cells, and tissues, is complex and important for host defense.[qv: 1] Immune cells, including innate immune cells [e.g., macrophages, neutrophils, natural killer (NK) cells, and dendritic cells (DC)] and adaptive immune cells (e.g., B and T cells), are important components of the immune system. Dysfunctions of immune cells such as abnormal distributions of abundance and type as well as abnormal development and functions are always associated with diseases, including cancers.[qv: 2,3] Thus, investigating immune cell distribution in individuals could provide important insights into immune status, disease progression, and prognosis, as well as therapy (particularly in cancer immunotherapy).[qv: 4]
Tumor‐infiltrating immune cells are considered to be primary immune signatures and are strongly associated with the clinical outcomes of immunotherapies.[qv: 5] T cells play pivotal roles in cancer initiation, progression, and therapy (particularly immunotherapy)[qv: 6] and are composed of two major groups: CD4+ T cells and CD8+ T cells with each including numerous functional subpopulations (or subsets). The CD4+ T‐cell subsets, such as T helper cells (e.g., Th1, Th2, Th17, and Tfh) and regulatory T cells (e.g., nTreg, iTreg, and Tr1), primarily display helper and/or regulatory activities on other immune cells.[qv: 7] The CD8+ T‐cell subsets, cytotoxic T cells (Tc) and mucosal‐associated invariant T cells (MAIT), function in killing target cells. Importantly, the abundance of T‐cell subsets, particularly that of tumor‐infiltrating T cells, could influence clinical curative effects and prognosis.[qv: 8] In addition, strategies used for adjusting the proportion of T‐cell subsets have demonstrated profound efficacy in cancer immunotherapies. For example, increasing the ratio between effector T cell and Treg cell subsets could enhance the antitumor effects of anti‐CTLA‐4 therapy against melanoma.[qv: 9] Thus, investigating the landscape of immune cells, particularly T cells, can help us better understand the interplay between the immune system and diseases and provide important clues for improving the efficacy of immunotherapy in precision medicine.[qv: 10]
High‐throughput technologies, including microarrays and RNA sequencing (RNA‐Seq), produce large‐scale transcriptome data and provide opportunities for estimating the abundance of immune cells using gene expression profiles. Several methods, including xCell,[qv: 11] CIBERSORT,[qv: 12] EPIC,[qv: 13] TIMER,[qv: 14] MCP‐counter,[qv: 15] and DeconRNASeq[qv: 16] have been developed for enumerating immune cells from bulk transcriptome data of tumor samples, whereas a rare method has been designed for estimating the abundance of numerous T‐cell subsets, such as iTreg, Tc, and exhausted T cells (Tex). As such, there is an urgent need to develop a method focusing on abundance prediction of T‐cell subsets and other important immune cells in immuno‐oncology and immunotherapy studies.
In this study, we developed Immune Cell Abundance Identifier (ImmuCellAI), a method to robustly and precisely estimate the abundance of 24 immune cell types (including 18 T‐cell subsets) from transcriptome data. ImmuCellAI was applicable to both microarray and RNA‐Seq expression profiles from various resources (e.g., tumor, adjacent or normal tissue, and peripheral blood). Furthermore, we applied ImmuCellAI to cancer immunotherapy and The Cancer Genome Atlas (TCGA) pan‐cancer data to explore the influence of immune cells on the efficacy of immunotherapy and clinical progression of patients with cancer.
2. Results
2.1. Algorithmic Overview of the ImmuCellAI Method
ImmuCellAI was designed to estimate the abundance of 18 T‐cell subsets [CD4+, CD8+, CD4+ naïve, CD8+ naïve, central memory T (Tcm), effector memory T (Tem), Tr1, iTreg, nTreg, Th1, Th2, Th17, Tfh, Tc, MAIT, Tex, gamma delta T (γδ T), and natural killer T (NKT) cells] and six other important immune cells (B cells, macrophages, monocytes, neutrophils, DC, and NK cells) (Figure 1a). A brief illustration of the core algorithm of ImmuCellAI is represented in Figure 1b, and its detailed algorithm is described in the Experimental Section. Briefly, we curated a specific gene set from publications as gene signature (Table S1, Supporting Information) and obtained the reference expression profile from the Gene Expression Omnibus (GEO) database for each cell type (Table S2, Supporting Information). Then, we calculated the total expression deviation of the gene signature in the input expression profile in comparison with the reference expression profiles of the 24 immune cell types. We assigned the deviation to corresponding immune cell type based on the enrichment score of its gene signature, which was calculated using the single sample gene set enrichment analysis (ssGSEA) algorithm.[qv: 17] To correct the bias due to shared genes in the gene signatures of different immune cell types, a compensation matrix was introduced and least square regression was implemented to measure the weight of shared genes on these immune cells and to re‐estimate their abundance (Figure 1b). ImmuCellAI was suitable for application to both RNA‐Seq and microarray expression data from blood or tissue samples. To better utilize ImmuCellAI, we designed a user‐friendly web server, which is freely available at https://bioinfo.life.hust.edu.cn/web/ImmuCellAI/, for estimating the abundance of 24 immune cell types from gene expression profiles.
2.2. Performance of ImmuCellAI in RNA‐Seq and Microarray Datasets
To evaluate the performance of ImmuCellAI, we applied it to multiple RNA‐Seq and microarray expression datasets, performed benchmark tests, and compared the results with other five methods (xCell,[qv: 11] CIBERSORT,[qv: 12] EPIC,[qv: 13] MCP‐counter,[qv: 15] and TIMER[qv: 14]). Pearson correlation between the abundance estimated by flow cytometry and in silico method was used to assess the performance of each method in estimating the abundance of individual immune cell type, whereas the correlation deviation for all cell types was calculated to systematically evaluate the overall prediction power of each method (details are discussed in the Experimental Section).
First, we enumerated the amount of immune cell types available in each of the six analytical methods, among which ImmuCellAI proved capable of predicting more T cell subsets than other methods (Figure 2a). Then, we used six RNA‐Seq datasets as benchmark resources for evaluating the performance of ImmuCellAI (Figure 2b,c) on RNA‐Seq data. Three of them were simulated and integrated from single‐cell sequencing data of liver cancer (GSE98638),[qv: 18] lung cancer (GSE99254),[qv: 19] and melanoma (GSE72056),[qv: 20] their immune cell proportions were calculated from single cell barcode information (Tables S5–S7, Supporting Information). One dataset was taken from the lymph nodes of four patients with melanoma included in the EPIC[qv: 13] project and their flow cytometry result was also obtained. Furthermore, because of the limited number of T‐cell subsets in currently available data, to evaluate the performance of ImmuCellAI in estimating the abundance of unique T‐cell subsets, we generated two datasets using flow cytometry analysis for all 24 immune cell types (Table S6, Supporting Information) and sequenced their RNA (BIG Data Center ids: CRA001839 and CRA001840). One of these datasets contained five samples from healthy donors and the other contained seven samples from patients with acute myelocytic leukemia. Based on the results, the abundance of most immune cells estimated by ImmuCellAI showed a higher positive correlation with the counting results of flow cytometry than that estimated by the other methods, particularly for T‐cell subsets. These results suggested that ImmuCellAI is able to robustly and accurately estimate the abundance of 24 immune cell types in RNA‐Seq datasets (Figure 2b,c; Figures S1a,b and S2a, Supporting Information).
Meanwhile, we used two microarray datasets (GSE65135[qv: 12] and GSE65133[qv: 12] from GEO), which were disaggregated lymph node biopsy samples from patients with follicular lymphoma and peripheral blood samples from individuals vaccinated for influenza with immune cell fractions determined by flow cytometry. The abundance of each cell type measured by ImmuCellAI showed overall high positive correlations with the flow cytometry results in both datasets (Figure 2d; Figure S2b, Supporting Information). In addition, ImmuCellAI showed the least correlation deviation in both RNA‐Seq and microarray datasets (Figure 2e). The performance evaluation results indicated that ImmuCellAI has the best performance in both microarray and RNA‐Seq data with stable and high precision in terms of estimation of abundance of the 24 immune cell types.
2.3. Case Study of ImmuCellAI Application for Cancer Immunotherapy Response Prediction
To investigate the impact of immune cell abundance on cancer immunotherapy, we applied ImmuCellAI to a dataset, GSE91061,[qv: 21] which comprised 58 melanoma samples from a clinical trial on anti‐PD1 therapy. We analyzed the results using two comparisons: responders versus non‐responders and on‐treatment versus pre‐treatment. The abundance of three immune cells including Tc, γδ T, and DC cells significantly increased with anti‐PD1 treatment (Figure 3a; Mann–Whitney U‐test, p < 0.05). Besides, Tc, γδ T, and DC cells also significantly infiltrated more in responders compared with non‐responders at the on‐treatment time point (Figure 3b; Mann–Whitney U‐test, p < 0.05). The results suggested that ImmuCellAI can provide important insights on the dynamic immune cell infiltration during immunotherapy and offer valuable indicator for immunotherapy response during the treatment.
Next, we built a prediction model for immune checkpoint blockade (ICB) therapy response based on the immune cell abundance estimated by ImmuCellAI. Five anti‐PD1 or anti‐CTLA4 therapy datasets (GSE91061,[qv: 21] GSE78220,[qv: 22] and GSE115821,[qv: 23] ERP107734,[qv: 24] and SRP011540[qv: 25]) with a total of 176 patients were involved in the process. The former three datasets from GEO were used for training and testing in a support vector machine model based on the abundance of immune cells, and the last two cohorts from dbGAP were used for further validation (details in methods). By taking the predicted abundance of 22 immune cells (B cell, CD4+ naïve, CD8+ naïve, Tcm, Tc, DC, γδ T, Tem, Tex, iTreg, macrophage, MAIT, monocyte, neutrophil, NK, NKT, nTreg, Tfh, Th1, Th17, Th2, and Tr1) as features, the model achieved relatively high accuracy in predicting immunotherapy response in the test data (area under curve (AUC) 0.91) and other two validation cohorts (AUC 0.84 and 0.80, Figure 3c). And T‐cell subsets made great contribution in the model by using only the abundance of T‐cell subsets as features achieved AUC of 0.77–0.85 (Figure 3d). Furthermore, using the abundance of immune cell subsets available in both ImmuCellAI and xCell as features, the model with abundance predicted by ImmuCellAI performed better than xCell in all three cohorts (Figure 3e,f). Overall, the abundance of immune cells measured by ImmuCellAI was highly predictive of immune therapy sensitivity (Figure 3c–f), suggesting that ImmuCellAI can serve as an ideal method for immunotherapy studies. We implemented the model for immune therapy response prediction as a functional module on the ImmuCellAI server.
2.4. Case Study of ImmuCellAI Application to TCGA Pan‐Cancer Data for Predicting the Infiltration of Immune Cells and Patient Survival
Increasing evidence has demonstrated that immune cells are critical in cancer progression, and the infiltration of different T‐cell subsets could dramatically influence the treatment strategy and prognosis.[qv: 26] In this study, to demonstrate the application of ImmuCellAI to cancer research, we analyzed 17 cancer types in TCGA with gene expression data of both the tumor and adjacent tissues to survey the infiltration difference of immune cells. Partial correlation analysis was implemented to reduce false correlation, which may be caused by other factors, such as age and gender (Figure S3, Supporting Information). The results indicated that the abundance of many immune cell types was significantly different (false discovery rate (FDR) < 0.1) between samples from tumor and adjacent tissue in most cancers, particularly for Tc, NK, NKT, Th2, iTreg, nTreg, and DC (Figure 4a). The iTreg, nTreg, Tr1, and monocyte cells were markedly enriched in the nidus of most cancer types, which is consistent with their immunosuppressive properties (Figure 4a). In contrast, several antitumor cells, such as γδ T, MAIT, NK, NKT, and Th2 cells, showed higher infiltration in adjacent tissues of most cancers, indicating that the tumor microenvironment may prohibit their access to the nidus.
Furthermore, we investigated the effects of the immune cell infiltration on patient survival by controlling other factors (i.e., age, gender, and stage) using cox regression analysis. In a nutshell, the infiltration of most immune cell types significantly affected the overall survival of patients in different cancers (Figure 4b). The infiltration of most immune cells had opposite effects on survival in brain lower grade glioma and uveal melanoma compared with that in other cancers (Figure 4b). Furthermore, skin cutaneous melanoma (SKCM) had the most immune cell types (12/24) significantly associated with patient survival (FDR < 0.1; Figure 4b; Figure S4, Supporting Information). The infiltration of T‐cell subsets (e.g., γδ T cells, Th1, Th2, and iTreg) had positive effects on long‐term survival in patients with SKCM, whereas patients with high infiltration of CD8+ naïve and neutrophils were associated with worse outcomes (Figure 4b; Figure S4, Supporting Information). Although the infiltration of a single immune cell type in some cancer types was not related with patient survival, the infiltration of a combination of multiple immune cell types was significantly associated with survival (multivariate Cox regression, p < 0.05), such as Tc + Th17 in adrenocortical carcinoma (ACC), Tc + Tcm in ACC, Tc + Tem in mesothelioma, γδ T + Tcm in cervical and endocervical cancers, MAIT + Th1 in diffuse large B‐cell lymphoma, and NKT + Th1 in uterine carcinosarcoma (Figure 4c). In addition, the infiltration of immune cells was correlated with microsatellite instability (MSI) after partial correlation analysis was performed to reduce false correlation caused by other features, such as age and gender (Figure S5a,b, Supporting Information). In colon adenocarcinoma (COAD) and stomach adenocarcinoma (STAD), patients with high‐MSI cancer (MSI‐H) showed a significantly higher infiltration of antitumor and tumor helper cells, such as Tc, γδ T, NK, and DC (Figure S6a, Supporting Information), but a significantly lower infiltration of tumor suppressor cells (Tr1 and neutrophils) and CD8 naïve cells (Figure S6b, Supporting Information). These results may partially explain the better outcomes of patients with MSI‐H colorectal cancer undergoing immunotherapy.[qv: 27] Furthermore, we observed that some immune cell types showed stage‐related profiles in cancers. For example, the infiltration of Tex, Th1, iTreg, and CD8+ T cells gradually increased with the development of kidney renal clear cell carcinoma (Figure S7, Supporting Information).
3. Discussions
Increasing evidence suggests that immune cells play critical roles in carcinogenesis and progression, and a proper proportion of T‐cell subsets could contribute to long‐term clinical benefits of anticancer treatments.[qv: 28] Investigating the abundance of immune cell types could help to have a more comprehensive understanding of the immune status of patients and could thus benefit disease therapy.[qv: 29] In this study, we developed ImmuCellAI, a highly accurate method of estimating the abundance of immune cells, particularly T‐cell subsets, from transcriptome data. The case study application results on immunotherapy and pan‐cancer data suggest that ImmuCellAI is a very useful tool in cancer immunology.
ImmuCellAI focuses on immune cell prediction from expression profiles of array or RNA‐seq data and is able to enumerate the abundance of 18 T‐cell subsets (Figure 1a). Although there are several methods available for immune cell abundance estimation, most of them have intrinsic limitations. The 4 immune cell focused tools (CIBERSORT, EPIC, TIMER, and MCP‐counter) can only predict two to four subsets of T cells and only CIBERSORT can be used on both array and RNA‐Seq data (Figure 1a). xCell was designed for immune and stroma cells including 10 T‐cell subsets, while it cannot retrieve the abundance of some important T‐cell subsets (iTreg, Tc, and exhausted T cells etc., which were proved playing vital roles in the immune system and immunotherapy[qv: 30–32]) and does not show robust accuracy in some cases (e.g., Figure 2b). Compared with those alive tools, the unique function of ImmuCellAI is that it can accurately estimate the abundance of comprehensive T‐cell subsets, which is particularly important in cancer therapy. For those immune cells that could be identified by other methods, our comparison results showed that ImmuCellAI had the highest consistency with flow cytometry results for most cells (Figure 2b–e). Because there are very limited T‐cell subsets with both flow cytometry data and RNA‐Seq data, we produced two datasets using flow cytometry analysis for all 24 immune cell types and sequenced their RNA. The results confirmed that ImmuCellAI has the best performance in terms of accurately identifying these T‐cell subsets, which is an advantage of this method.
Tumor‐infiltrating T cells could serve as a prognostic factor and predictor of therapeutic efficacy.[qv: 33] In our result, the abundance of Tc, γδ T, and DC cells were significantly increased in both comparisons of on‐treatment versus pre‐treatment and responders versus non‐responders (Figure 3a,b). DC cells are antigen‐presenting cells (APC) that are essential for the activation of immune responses, which have the potential to turn immunologically “cold” tumors into “hot” tumors.[qv: 34] Tc cells play key roles in the tumor cell killing process,[qv: 35] and γδ T cells also function in the antigen recognition and tumor killing process.[qv: 36] To extend these results, taking the abundance of 22 immune cells of pre‐treatment samples into account, an immunotherapy model was proposed and reached high accuracy (AUC 0.80–0.91) (Figure 3c). The model using abundance of 22 immune cells predicted by ImmuCellAI was implemented for immunotherapy response prediction on the ImmuCellAI server. This is the second user‐friendly web server for immunotherapy response prediction, except for the TIDE.[qv: 37] Comparing with the reported accuracy of TIDE, our ImmuCellAI has a little bit higher accuracy. In addition, to date, most studies have focused on the infiltration of CD8+ T cells as a predictive biomarker for response to ICB therapy.[qv: 38] Our study indicated that integrating many immune cells, particularly different T‐cell subsets, could serve as a biomarker for better therapy response prediction (Figure 3c). Thus, the systematic evaluation of immune cell abundance could be an effective approach for predicting immunotherapy response and improving the effects of cancer immunotherapy.[qv: 39] Some of the results of the infiltration of immune cells in TCGA cancer data were consistent with those of previous reports, for example, the infiltration of Tc cells is more often observed in kidney cancer but less so in colorectal cancer.[qv: 40] These results indicate the powerful and unique function of ImmuCellAI on cancer immunology and immunotherapy research.
Although ImmuCellAI had the best performance in comparison with other methods, it still has several limitations that need to be addressed. First, ImmuCellAI could only estimate the relative abundance of immune cells based on the deviation of gene signatures. It could not provide the absolute amount of each immune cell type. Moreover, ImmuCellAI did not consider the spatiotemporal localization of immune cells and the abundance of cancer cells. In addition, the sample size used in the immunotherapy case study was relatively small, and the performance of our model needs to be tested in larger cohorts. Other immune cell subsets, besides the T cells used in our method, also need to be tested in future studies.
In summary, this study presented an accurate and reliable tool ImmuCellAI to dissect T‐cell properties and explore the infiltration of immune cells in cancer. The best advantage of ImmuCellAI is its ability to accurately estimate the abundance of 18 T‐cell subsets, which is its unique function. Besides, ImmuCellAI can be applied to predict the patient response of ICB therapy. The results of ImmuCellAI provided valuable prognostic predictors and comprehensive resources to elucidate cancer‐immune interactions, which could facilitate applications of cancer immunotherapy and precision medicine.
4. Experimental Section
The Main Algorithm of ImmuCellAI
The main algorithm of ImmuCellAI, presented in Figure 1b, includes three main steps: 1) reference expression matrix (RT) and marker gene preparation, 2 enrichment score calculation, and 3 compensation matrix correction.
Reference Expression Matrix and Marker Gene Preparation
The datasets of the expression profiles of 24 immune cell types (Figure 1a) were downloaded from the GEO database. In total, 415 datasets from 26 studies were manually curated to build RT of the immune cell types (Table S2, Supporting Information). Gene expression data were obtained from CEL files according to the frozen robust multiarray analysis protocol with batch effect correction.[qv: 41] Each line of the matrix denotes the expression of a gene in the 24 immune cell types. The median value was used if there were multiple samples of a cell type.
Furthermore, a gene signature per cell type was developed by integrating the marker genes obtained from the literature and other analytical methods, such as CIBERSORT and xCell; thus, a total of 2547 genes were collected (denoted as Ga, Table S1, Supporting Information). Next, a robust marker gene set per immune cell type was selected using in silico simulated data by taking advantage of the TCGA data, which was based on the work of Li et al.[qv: 14] For each cancer type in the TCGA data, the expression of Ga in the samples (log2 transferred) and immune cell reference profiles were used to simulate the immune cell infiltrated tumor samples with known fractions. To control the mixing ratios of immune cell components for maintaining the correlative structure of real data, the gene–gene covariance matrix Σa was first calculated for all genes in Ga using tumor expression data. Then, 24 numbers (f1–f24) were randomly sampled from Uniform (0,1) and μa (length n) was calculated, which was the average of gene expression in the reference profiles of the 24 immune cell types weighted by f1–f24. Next, a vector of length n was sampled from the multivariate normal distribution with mean μa and covariance Σa. For each cancer type, the same number of samples was simulated as its sample size in the TCGA data.
Then, for all collected marker genes in Ga, the average correlation between gene expression in simulated samples with cell fractions was calculated using Pearson correlation for all cancers, and genes with an average correlation of r ≥ 0.6 were selected (denoted as G1). Next, for each marker gene per immune cell, the standard correlation deviation among the cell with other cells was calculated, and genes with standard deviation larger than 1.5 were selected (denoted as G2). The deviation between CD4+ T and CD4+ T‐cell subsets (such as CD4+ naïve and Th1) as well as that between CD8+ T and CD8+ T‐cell subsets was not calculated. Finally, a robust marker gene set per immune cell type was obtained by intersecting G1 with G2 (denoted as Gf), which included 344 marker genes of the 24 immune cell types (Table S1, Supporting Information). In addition, a sparse matrix (ST) was constructed for these marker genes in which “1” means that the gene is a marker gene in the corresponding cell type.
Enrichment Score Calculation
For a user‐uploaded expression dataset, ImmuCellAI first calculates the expression deviation of all marker genes compared with RT. Here two different approaches were implemented to deal with the microarray and RNA‐Seq datasets.
(1) |
(2) |
(3) |
where vector D denotes the relative deviation of marker genes and STi is the vector in ST for marker genes of cell type i. RTi is the reference marker gene expression in cell type i, whereas S indicates the gene expression in the user‐provided dataset.
The ssGSEA algorithm in the GSVA package[qv: 42] was used to estimate the abundance of immune cell types. The ssGSEA enrichment score for deviation vector D of the gene signature of each immune cell type (named ES) was used to indicate the relative abundance of immune cell types in the user‐provided dataset. A higher enrichment score indicates a higher abundance of the immune cell type in the mixture sample than that of other cell types.
Compensation Matrix Correction
Some immune cell types may share a part of common marker genes, which will cause bias in the estimation of abundance of these immune cell types. Thus, ImmuCellAI used a compensation matrix and least square regression method based on the work of Aran et al.[qv: 11] to fix this issue. After the estimation of abundance of the detected immune cell types in a dataset, the weights of common marker genes for these immune cell types were reassigned with the following steps: 1) A N * N contribution matrix was produced by calculating the mutual contributions of marker genes in RT using ssGSEA. 2) Each column of the contribution matrix was divided by a diagonal value and weighted by the proportion of non‐diagonal elements, and a compensation matrix was obtained (named C). 3) To reduce redundancy and overestimation of compensation between detected immune cell types, ImmuCellAI discarded the compensational calibration between the parental immune cell type and its subsets (e.g., CD4+ T and Th1 cells), and limited the total compensation level at 0.5 for the non‐diagonal cell types. 4) The least square method was used to calibrate the enrichment score based on the compensation matrix C.
(4) |
where the parameter ES is the ssGSEA enrichment score of detected immune cell types and C is the compensation matrix. Finally, after calibration, we deemed the abundance of 24 immune cell types (named I) to be high confidence.
Benchmark Dataset Preparation: Our Datasets for the 24 Immune Cell Types
Heparinized blood samples from seven patients with leukemia and five healthy adult volunteers (Table S3, Supporting Information) were collected from Wuhan Central Hospital, China. Fresh blood samples were treated with Pharm Lyse (BD Biosciences, San Jose, CA, USA) to remove erythrocytes. Cells from each sample were used in parallel experiments of flow cytometry and RNA extraction. This study was approved by the ethics committee of Tongji Medical College, Huazhong University of Science and Technology, and followed the Declaration of Helsinki principles. The informed consent was obtained from all volunteers. The proportions of the 24 immune cell types used in the study were examined by flow cytometry using the combined markers listed in Table S3, Supporting Information, and antibodies listed in Table S4, Supporting Information. All antibodies were purchased from BD Biosciences, except those used against TCR‐Vβ2 and TCR‐Va7.2 (Miltenyi Biotec, Bergisch Gladbach, Germany).
The total RNA extracted from the cells of all 12 samples was used for RNA sequencing (RNA‐Seq) (PE150) via the Illumina HiSeqTM4000 platform by Haplox (Jiangxi, China). RNA‐Seq reads were mapped to Ensembl v81 (GRCH38) and processed using the HISAT2‐StringTie‐ballgown pipeline. Fragments per kilobase per million mapped reads were used to calculate gene expression levels.
Other Public Datasets
The microarray datasets and corresponding flow cytometry results were obtained from GEO (accession nos. GSE65135 and GSE65133), which included 14 disaggregated lymph node biopsie samples from patients with follicular lymphoma and 20 peripheral blood samples from individuals vaccinated for influenza, respectively.
Besides, RNA‐Seq expression profile of samples from melanoma patients and their corresponding flow cytometry of four immune cell types (B, CD4+ T, CD8+ T, and NK) result were collected from EPIC publication.[qv: 13] Because of the scarcity of bulk RNA‐Seq datasets containing both gene expression profiles and flow cytometry counts for different immune cell types, particularly for T‐cell subsets, we simulated two bulk RNA‐Seq datasets by integrating the expression profiles of seven cell types (CD4+ naïve, CD8+ naïve, MAIT, Tcm, Tex, Treg, Th) from single‐cell RNA sequencing data. The transcripts per million (TPM) normalized expression of liver and lung cancers from two Nature papers were collected from GEO (GSE98638[qv: 18] and GSE99254[qv: 19]). Based on the work of Max et al.,[qv: 43] single‐cell expression was normalized as follows:
(5) |
for each single‐cell dataset, the TPM values were transformed to
To ensure cross‐sample comparability, the expression of all single‐cell samples from the same dataset were normalized to the average expression of 3686 housekeeping genes[qv: 44] as follows:
(6) |
where expi represents the gene expression profile of sample i, HKi denotes the average gene expression of all housekeeping genes in sample i, and is the average expression of all housekeeping genes in all samples. Besides, a single‐cell sequencing dataset from 19 patients with melanoma was collected from GEO (accession GSE72056), which is the normalized expression matrix as described above by Tirosh et al.[qv: 20] and contains the single‐cell RNA‐Seq of B cells, T cells, macrophages, NK cells, and three other nonimmune cell types. Because CD8+ and CD4+ T cells can be easily distinguished by CD4, CD8A, and CD8B expression, we divided T cells into CD8+ T cells, CD4+ T cells, and others. Then, the bulk expression of each sample was identified by aggregating normalized expression from all cell barcodes for each patient sample. The cell ratio per cell type in a sample was calculated by the cell number of a specific cell type divided by the total number of cells (Tables S5–S7, Supporting Information).
Performance Assessment of ImmuCellAI
The performance of ImmuCellAI was evaluated using both microarray and RNA‐Seq datasets and compared with that of five other methods (CIBERSORT, EPIC, MCP‐counter, TIMER, and xCell). For a given immune cell type, the accuracy and sensitivity of each method were measured using the Pearson correlation between the results of in silico method and flow cytometry counting in samples (named ri). In addition, we introduced the correlation deviation to measure the global performance of each method, which took the sample size and overall accuracy into consideration.
(7) |
where n is the amount of immune cell types detected in samples and ri is the Pearson correlation of immune cell type i.
Case Study of Immune Therapy and Prediction Model Building
Five immune checkpoint therapy datasets, including those from anti‐PD1‐ or anti‐CTLA4‐treated patients with melanoma or gastric cancer, were collected from the GEO database (GSE91061,[qv: 21] GSE78220,[qv: 22] and GSE115821[qv: 23]) and dbGAP (ERP107734[qv: 24] and SRP011540[qv: 25]). The abundance of infiltrating immune cells was calculated by ImmuCellAI and used to build the response prediction model.
The immunotherapy response prediction model was built using support vector machine with the radial basis function kernel. The training features were the abundance of immune cell types. The sequential backward feature selection algorithm was used to minimize the feature number and improve the performance. At first, three GEO datasets composed of 91 pre‐treatment samples (response: complete response and partial response, n = 27, non‐response: stable disease and progressive disease, n = 64) were used to train and test the model. The undersampling method was used to fit the unbalanced sample size between responders and non‐responders with 38 samples in the training and validation cohort (19 responders and 19 non‐responders, fivefold cross validation) and 53 samples in the test cohort (eight responders and 45 non‐responders). Then, the other two cohorts from dbGAP (ERP107734: 12 responders and 33 non‐responders; SRP011540: seven responders and 33 non‐responders) were used to further validate the model. The area under curve (AUC) was used to measure the model performance.
The gene expression profiles of TCGA samples and the clinical information were downloaded from Broad GDAC Firehose (https://gdac.broadinstitute.org/).
Statistical Analysis
Basic statistical analyses, such as Wilcoxon rank sum test and Pearson correlation, were performed using R language. The correlations between clinical indicators and the abundance of immune cell types were evaluated using partial correlation analysis in the R package “ppcor.” Multivariate Cox regression, log‐rank test, and Kaplan–Meier in R package “survival” were used to assess the relationships between the abundance of immune cell types and survival time. The p values for each test were calibrated using FDR, and the FDR threshold was 0.1 in case studies. All results supported the current study and were deposited into the ImmuCellAI website (https://bioinfo.life.hust.edu.cn/web/ImmuCellAI/).
Data and Materials Availability
The sequence data sets reported in this paper have been deposited in the National Genomics Data Center with Accession Nos. CRA001839 and CRA001840.
Conflict of Interest
The authors declare no conflict of interest.
Authors Contributions
Y.R.M. and Q.Z. performed formal analysis, conceptualized, conceived method, and wrote the original draft and edited the manuscript; Q.L. collected samples and performed the experiments; M.L. and G.Y.X. performed formal analysis; H.X.W. provided sample and experiment assistance; A.Y.G. conceptualized, wrote‐reviewed and edited the manuscript, and funded and supervised the study
Supporting information
Acknowledgements
Y.‐.M., Q.Z., and Q.L. contributed equally to this work. The authors express their gratitude to all study participants. The authors acknowledge funding from the National Natural Science Foundation of China (Grant Nos. 31822030, 31801113, and 31771458), National Key Research and Development Program of China (Grant No. 2017YFA0700403), China Postdoctoral Science Foundation (Grant Nos. 2018M632830, 2017M622455, and 2019T120664)
Miao Y.‐R., Zhang Q., Lei Q., Luo M., Xie G.‐Y., Wang H., Guo A.‐Y., ImmuCellAI: A Unique Method for Comprehensive T‐Cell Subsets Abundance Prediction and its Application in Cancer Immunotherapy. Adv. Sci. 2020, 7, 1902880 10.1002/advs.201902880
References
- 1. Chaplin D. D., J. Allergy Clin. Immunol. 2010, 125, S3. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 2. Brodin P., Davis M. M., Nat. Rev. Immunol. 2017, 17, 21. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 3. Dunn G. P., Koebel C. M., Schreiber R. D., Nat. Rev. Immunol. 2006, 6, 836. [DOI] [PubMed] [Google Scholar]
- 4. Gajewski T. F., Schreiber H., Fu Y.‐X., Nat. Immunol. 2013, 14, 1014. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 5. Pandya P. H., Murray M. E., Pollok K. E., Renbarger J. L., J. Immunol. Res. 2016, 2016, 1. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 6. Kochenderfer J. N., Dudley M. E., Kassim S. H., Somerville R. P. T., Carpenter R. O., Stetler‐Stevenson M., Yang J. C., Phan G. Q., Hughes M. S., Sherry R. M., Raffeld M., Feldman S., Lu L., Li Y. F., Ngo L. T., Goy A., Feldman T., Spaner D. E., Wang M. L., Chen C. C., Kranick S. M., Nath A., Nathan D.‐A. N., Morton K. E., Toomey M. A., Rosenberg S. A., J. Clin. Oncol. 2015, 33, 540. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 7. Luckheeram R. V., Zhou R., Verma A. D., Xia B., Clin. Dev. Immunol. 2012, 2012, 1. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 8. Kamphorst A. O., Pillai R. N., Yang S., Nasti T. H., Akondy R. S., Wieland A., Sica G. L., Yu K., Koenig L., Patel N. T., Behera M., Wu H., McCausland M., Chen Z., Zhang C., Khuri F. R., Owonikoko T. K., Ahmed R., Ramalingam S. S., Proc. Natl. Acad. Sci. U. S. A. 2017, 114, 4993. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 9. Simpson T. R., Li F., Montalvo‐Ortiz W., Sepulveda M. A., Bergerhoff K., Arce F., Roddie C., Henry J. Y., Yagita H., Wolchok J. D., Peggs K. S., Ravetch J. V., Allison J. P., Quezada S. A., J. Exp. Med. 2013, 210, 1695. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 10. Mittal D., Gubin M. M., Schreiber R. D., Smyth M. J., Curr. Opin. Immunol. 2014, 27, 16. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 11. Aran D., Hu Z., Butte A. J., Genome Biol. 2017, 18, 220. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 12. Newman A. M., Liu C. L., Green M. R., Gentles A. J., Feng W., Xu Y., Hoang C. D., Diehn M., Alizadeh A. A., Nat. Methods 2015, 12, 453. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 13. Racle J., de Jonge K., Baumgaertner P., Speiser D. E., Gfeller D., eLife 2017, 6, e26476. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 14. Li T., Fan J., Wang B., Traugh N., Chen Q., Liu J. S., Li B., Liu X. S., Cancer Res. 2017, 77, e108. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 15. Becht E., Giraldo N. A., Lacroix L., Buttard B., Elarouci N., Petitprez F., Selves J., Laurent‐Puig P., Sautès‐Fridman C., Fridman W. H., de Reyniès A., Genome Biol. 2016, 17, 218. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 16. Gong T., Szustakowski J. D., Bioinformatics 2013, 29, 1083. [DOI] [PubMed] [Google Scholar]
- 17. Barbie D. A., Tamayo P., Boehm J. S., Kim S. Y., Moody S. E., Dunn I. F., Schinzel A. C., Sandy P., Meylan E., Scholl C., Fröhling S., Chan E. M., Sos M. L., Michel K., Mermel C., Silver S. J., Weir B. A., Reiling J. H., Sheng Q., Gupta P. B., Wadlow R. C., Le H., Hoersch S., Wittner B. S., Ramaswamy S., Livingston D. M., Sabatini D. M., Meyerson M., Thomas R. K., Lander E. S. et al., Nature 2009, 462, 108. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 18. Zheng C., Zheng L., Yoo J.‐K., Guo H., Zhang Y., Guo X., Kang B., Hu R., Huang J. Y., Zhang Q., Liu Z., Dong M., Hu X., Ouyang W., Peng J., Zhang Z., Cell 2017, 169, 1342. [DOI] [PubMed] [Google Scholar]
- 19. Guo X., Zhang Y., Zheng L., Zheng C., Song J., Zhang Q., Kang B., Liu Z., Jin L., Xing R., Gao R., Zhang L., Dong M., Hu X., Ren X., Kirchhoff D., Roider H. G., Yan T., Zhang Z., Nat. Med. 2018, 24, 978. [DOI] [PubMed] [Google Scholar]
- 20. Tirosh I., Izar B., Prakadan S. M., Wadsworth M. H., Treacy D., Trombetta J. J., Rotem A., Rodman C., Lian C., Murphy G., Fallahi‐Sichani M., Dutton‐Regester K., Lin J.‐R., Cohen O., Shah P., Lu D., Genshaft A. S., Hughes T. K., Ziegler C. G. K., Kazer S. W., Gaillard A., Kolb K. E., Villani A.‐C., Johannessen C. M., Andreev A. Y., Van Allen E. M., Bertagnolli M., Sorger P. K., Sullivan R. J., Flaherty K. T. et al., Science 2016, 352, 189. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 21. Riaz N., Havel J. J., Makarov V., Desrichard A., Urba W. J., Sims J. S., Hodi F. S., Martín‐Algarra S., Mandal R., Sharfman W. H., Bhatia S., Hwu W.‐J., Gajewski T. F., Slingluff C. L., Chowell D., Kendall S. M., Chang H., Shah R., Kuo F., Morris L. G. T., Sidhom J.‐W., Schneck J. P., Horak C. E., Weinhold N., Chan T. A., Cell 2017, 171, 934. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 22. Hugo W., Zaretsky J. M., Sun L., Song C., Moreno B. H., Hu‐Lieskovan S., Berent‐Maoz B., Pang J., Chmielowski B., Cherry G., Seja E., Lomeli S., Kong X., Kelley M. C., Sosman J. A., Johnson D. B., Ribas A., Lo R. S., Cell 2017, 168, 542. [DOI] [PubMed] [Google Scholar]
- 23. Auslander N., Zhang G., Lee J. S., Frederick D. T., Miao B., Moll T., Tian T., Wei Z., Madan S., Sullivan R. J., Boland G., Flaherty K., Herlyn M., Ruppin E., Nat. Med. 2018, 24, 1545. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 24. Kim S. T., Cristescu R., Bass A. J., Kim K.‐M., Odegaard J. I., Kim K., Liu X. Q., Sher X., Jung H., Lee M., Lee S., Park S. H., Park J. O., Park Y. S., Lim H. Y., Lee H., Choi M., Talasaz A., Kang P. S., Cheng J., Loboda A., Lee J., Kang W. K., Nat. Med. 2018, 24, 1449. [DOI] [PubMed] [Google Scholar]
- 25. Miao D., Margolis C. A., Vokes N. I., Liu D., Taylor‐Weiner A., Wankowicz S. M., Adeegbe D., Keliher D., Schilling B., Tracy A., Manos M., Chau N. G., Hanna G. J., Polak P., Rodig S. J., Signoretti S., Sholl L. M., Engelman J. A., Getz G., Jänne P. A., Haddad R. I., Choueiri T. K., Barbie D. A., Haq R., Awad M. M., Schadendorf D., Hodi F. S., Bellmunt J., Wong K.‐K., Hammerman P. et al., Nat. Genet. 2018, 50, 1271. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 26. Fend F., Quintanilla‐Martínez L., Haematologica 2014, 99, 599. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 27. Kalyan A., Kircher S., Shah H., Mulcahy M., Benson A., J. Gastrointest. Oncol. 2018, 9, 160. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 28. Schaaf M. B., Garg A. D., Agostinis P., Cell Death Dis. 2018, 9, 115. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 29. Delhalle S., Bode S. F. N., Balling R., Ollert M., He F. Q., npj Syst. Biol. Appl. 2018, 4, 9. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 30. Adeegbe D. O., Nishikawa H., Front. Immunol. 2013, 4, 190. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 31. Thommen D. S., Schumacher T. N., Cancer Cell 2018, 33, 547. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 32. Ganesan A.‐P., Clarke J., Wood O., Garrido‐Martin E. M., Chee S. J., Mellows T., Samaniego‐Castruita D., Singh D., Seumois G., Alzetani A., Woo E., Friedmann P. S., King E. V., Thomas G. J., Sanchez‐Elsner T., Vijayanand P., Ottensmeier C. H., Nat. Immunol. 2017, 18, 940. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 33. Pardoll D. M., Nat. Rev. Cancer 2012, 12, 252. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 34. Teng M. W. L., Ngiow S. F., Ribas A., Smyth M. J., Cancer Res. 2015, 75, 2139. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 35. Martínez‐Lostao L., Anel A., Pardo J., Clin. Cancer Res. 2015, 21, 5047. [DOI] [PubMed] [Google Scholar]
- 36. Lawand M., Déchanet‐Merville J., Dieu‐Nosjean M.‐C., Front. Immunol. 2017, 8, 10.3389/fimmu.2017.00761. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 37. Jiang P., Gu S., Pan D., Fu J., Sahu A., Hu X., Li Z., Traugh N., Bu X., Li B., Liu J., Freeman G. J., Brown M. A., Wucherpfennig K. W., Liu X. S., Nat. Med. 2018, 24, 1550. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 38. Pfannenstiel L. W., Diaz‐Montero C. M., Tian Y. F., Scharpf J., Ko J. S., Gastman B. R., Cancer Immunol. Res. 2019, 7, 510. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 39. Finn O. J., Ann. Oncol. 2012, 23, viii6. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 40. Rooney M. S., Shukla S. A., Wu C. J., Getz G., Hacohen N., Cell 2015, 160, 48. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 41. McCall M. N., Bolstad B. M., Irizarry R. A., Biostatistics 2010, 11, 242. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 42. Hänzelmann S., Castelo R., Guinney J., BMC Bioinformatics 2013, 14, 7. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 43. Schelker M., Feau S., Du J., Ranu N., Klipp E., MacBeath G., Schoeberl B., Raue A., Nat. Commun. 2017, 8, 2032. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 44. Eisenberg E., Levanon E. Y., Trends Genet. 2013, 29, 569. [DOI] [PubMed] [Google Scholar]
Associated Data
This section collects any data citations, data availability statements, or supplementary materials included in this article.