Abstract
Immune checkpoint inhibitors (ICIs) have significantly changed cancer therapy, yet their response rates remain relatively low. Identifying methods for robust prediction is crucial. This study evaluates the efficacy of gene-based methods for deriving predictive tumor-microenvironment scores in cancer patients, focusing on their performance in predicting survival outcomes and response to ICI therapy across various cancer types. The TIP Hot method demonstrated robustness as a predictive method for ICI response, particularly in Non-Small Cell Lung Cancer, Head and Neck Squamous Cell Carcinoma, and Urothelial Cancer. However, no score is robustly applicable to all cancer types. Significant challenges therefore remain due to the variability of tumor biology and host immune responses, and universally applicable methods should be further explored. Future research should aim to refine these predictive scoring methods through larger and more diverse datasets, and integrate advanced computational techniques to enhance predictive accuracy and utility in personalized cancer treatment.
Subject terms: Cancer, Computational biology and bioinformatics
Introduction
In recent years, the development of Immune Checkpoint Inhibitors (ICIs) has significantly advanced cancer treatment and revolutionized cancer immunotherapy1. ICIs are drugs designed to target immune checkpoints within the immune system, ultimately enabling the immune system to recognize and kill tumor cells. ICIs such as anti-Programmed Death-Ligand 1 (anti-PD-L1), anti-Programmed Death 1 (anti-PD-1), and anti-Cytotoxic T-Lymphocyte-Associated protein 4 (anti-CTLA-4) have been approved by the U.S. Food and Drug Administration as anti-cancer agents2. These agents have demonstrated their capacity to extend patients’ survival3. Despite the enduring benefits of ICIs, several limitations and challenges persist. Primarily, a subset of patients fails to respond to ICIs due to either primary or acquired resistance, resulting in low response rates4. Only approximately 19%–44% of patients with advanced melanoma, Non-Small-Cell Lung Cancer (NSCLC), or Renal Cell Carcinoma (RCC) respond to ICI therapy5. In addition to low response rates, ICIs may induce side effects in non-responders, potentially introducing toxicity and disease progression6,7. Moreover, ICIs have demonstrated limited effectiveness for certain cancer types, such as breast, prostate, and colon cancers8. Consequently, identifying potential ICI responders at baseline is crucial for improving patient selection.
Extracting reliable biomarkers to predict ICI response in various cancers remains a significant challenge. Conventional biomarkers such as tumor mutation burden, somatic copy number alterations, HLA gene expression, and PD-L1 expression are reported to be associated with ICI response9. However, the predictive accuracy of these conventional biomarkers, when used independently, remains suboptimal, with less than 60% accuracy reported in some studies10. Moreover, the predictive power of some conventional biomarkers in immune response prediction remains unclear, especially the controversial roles of high mutation burden and high PD-L1 expression level11–14. Therefore, efforts have been made to construct novel methods depicting the Tumor Microenvironment (TME). The TME is an ecosystem containing tumor cells, blood vessels, extracellular matrix, and other cellular components. Tumor cells can functionally reframe their TMEs, leading to the reprogramming of the related cells and thereby impacting tumor survival and progression15. To tackle the outstanding challenge of identifying patients who would respond to ICI therapy, various TME-related methods have been developed. These methods were trained on either non-ICI-treated patient data or ICI-treated patient data. Methods derived from non-ICI-treated patient data, such as Tumor Immune Dysfunction and Exclusion (TIDE), Immune Cell Abundance Identifier (ImmuCellAI), and Signature of Immune Activation (SIA), focus on depicting the immune system and demonstrate predictive power in melanoma, breast, and bladder cancer16–20. Beyond the immune system, the ISTME method provides scores for both immune and stromal systems, capturing more information in the TME and serving as a robust biomarker for melanoma21. However, these methods lack sufficient information from ICI-treated patients, which makes it difficult to identify signatures specific to ICI treatments22.
Consequently, several methods have been developed using data from ICI-treated patients, such as radiomic biomarkers identified by Trebeschi et al. and NetBio signature developed by Kong et al.22,23. These methods reached higher Area Under Curves (AUCs) than conventional biomarkers and demonstrated significantly longer overall survival for the predicted responders.
In this paper, we conduct a systematic review and meta-analysis of TME scores developed from bulk RNA-seq data. We evaluate 17 TME scores that have been developed to predict ICI response (Table 1). Notably, some methods were developed primarily for survival prognosis irrespective of treatment, but have demonstrated the capability to predict ICI response, for instance the CD8Treg, ISTME, and SIA methods. Some were specifically designed to assess immune response, yet have also been reported as significantly associated with post-ICI survival. Additionally, some methods, although not intended for either purpose, have shown the capability to predict ICI response and survival outcome. Through this review, we aim to identify methods that robustly predict both survival and ICI response, and discuss the applicability and future perspectives of TME scoring methods.
Table 1.
TME score description
| Reference | Name | ICI-oriented | Description | Method | Signature |
|---|---|---|---|---|---|
| Sato et al. | CD8Treg | No | Abundance of CD8+ T cells and regulatory T cells | Ratio | CD8T/Treg |
| Chang et al. | B_cells | Yes | Tumor and blood B-cell abundance | Deconvolution | B-cell abundance |
| Wang et al. | TIP Hot | Yes | Tumor immunological phenotype-based hot tumor genes | Mean | CXCL9-11, CXCR3, CD3, CD4, CD8A, CD8B, CD274, PDCD1, CXCR4, CCL5 |
| | TIP Cold | | Tumor immunological phenotype-based cold tumor genes | Mean | CXCL1, CXCL2, CCL20 |
| Rooney et al. | CYT1 | No | key cytolytic effectors | Geometric mean | GZMA, PRF1 |
| | CYT2 | | spontaneous cytolytic activity | Geometric mean | B2M, HLA-A, HLA-B, HLA-C, CASP8 |
| Auslander et al. | IMPRES | Yes | Immune checkpoints pairs | Sum of indicator function | (PD-1,OX40L), (CD27,PD-1), (CTLA4,OX40L), (CD40,CD28), (CD86,OX40L), (CD28,CD86), (CD80,CD137L), (PDL-1,VISTA), (CD86,TIM-3), (CD40,PD-1), (CD86,CD200), (CD40,CD80), (CD28,CD276), (CD40,PDL-1), (HVEM,CD86) |
| Mezheyeuski et al. | SIA | No | CD8+ lymphocytes and CD68+CD163+ cells | Ratio | CD8A/C1QA |
| Damotte et al. | TIS | Yes | Tumor inflammation signature | Geometric mean | CD276, HLA-DQA1, CD274, IDO1, HLA-DRB1, HLA-E, CMKLR1, PDCD1LG2, PSMB10, LAG3, CXCL9, STAT1, CD8A, CCL5, NKG7, TIGIT, CD27, CXCR6 |
| Cabrita et al. | TLS | Yes | Tertiary lymphoid structures genes | Geometric mean | BCL6, CD86, CXCR4, LAMP3, SELL, CCR7, CXCL13, CCL21, CCL19 |
| Jiang et al. | TIDE | Yes | T cell exclusion and dysfunction | Pearson correlation | Did not provide in original publication |
| Ayers et al. | IFNγ | Yes | Interferon-γ signature | Geometric mean | CD3D, IDO1, CIITA, CD3E, CCL5, GZMK, CD2, HLA-DRA, CXCL3, IL2RG, NKG7, HLA-E, CXCR6, LAG3, TAGAP, CXCL10, STAT1, GZMB |
| Ni et al. | TGFβ | Yes | TGF-β signaling pathway genes | ssGSEA | SLC20A1, XIAP, TGFBR1, BMPR2, FKBP1A, SKIL |
| Zeng et al. | IS_immune | No | Immune system gene signature | ssGSEA | see Table S6 in original publication |
| | IS_stromal | | Stromal system gene signature | ssGSEA | see Table S6 in original publication |
| Kong et al. | NetBio | Yes | Pathways in PD-1 and PD-L1 network | ssGSEA | see Data S4 in original publication |
| Bill et al. | CS Polarity | No | Macrophage polarity | Ratio | CXCL9/SPP1 |
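The "Mean", "Geometric mean", and "Ratio" entries in the Method column of Table 1 can be illustrated with a minimal sketch. The expression values below are invented toy numbers, the function names are hypothetical, and the optional pseudocount is an assumption for handling zero expression; the original publications may normalize or transform expression differently.

```python
import math

# Hypothetical normalized expression values for one tumor sample (toy numbers).
sample = {
    "GZMA": 8.0, "PRF1": 2.0,                    # CYT1 genes
    "CD8A": 6.0, "C1QA": 3.0,                    # SIA genes
    "CXCL9": 10.0, "SPP1": 5.0,                  # CS Polarity genes
    "CXCL1": 1.0, "CXCL2": 2.0, "CCL20": 3.0,    # TIP Cold genes
}

def mean_score(expr, genes):
    """Arithmetic mean of the signature genes ('Mean' method)."""
    return sum(expr[g] for g in genes) / len(genes)

def geometric_mean_score(expr, genes, pseudocount=0.0):
    """Geometric mean of the signature genes, computed in log space.

    The pseudocount is an assumption to avoid log(0); it is 0 by default.
    """
    logs = [math.log(expr[g] + pseudocount) for g in genes]
    return math.exp(sum(logs) / len(logs))

def ratio_score(expr, numerator, denominator, pseudocount=0.0):
    """Ratio of two marker genes ('Ratio' method)."""
    return (expr[numerator] + pseudocount) / (expr[denominator] + pseudocount)

tip_cold = mean_score(sample, ["CXCL1", "CXCL2", "CCL20"])
cyt1 = geometric_mean_score(sample, ["GZMA", "PRF1"])
sia = ratio_score(sample, "CD8A", "C1QA")
cs_polarity = ratio_score(sample, "CXCL9", "SPP1")
```

Working in log space for the geometric mean avoids overflow when a signature contains many genes, which matters for longer signatures such as TIS or IFNγ.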
Results
Overview of TME scores
The TME can be quantified and analyzed through various experimental methods, including immunohistochemistry, cytometry, and transcriptomics. Among these, RNA sequencing and microarrays stand out for their ability to profile a higher number of markers with precise quantification and greater accessibility24. Therefore, using bulk RNA-seq data to assess the TME has become a prevalent approach for constructing TME scoring methods. The development of a TME score typically involves two main steps: (i) identifying critical features from the full genome, and (ii) assigning scores to these critical features. The critical features may include biomarkers, genes, biological pathways, and newly clustered features. Since these critical features reflect underlying changes in the TME, they often correlate with cancer risk and patient survival, which makes the resulting TME scores capable of predicting responses to ICI treatments. The primary distinction between TME scoring methods lies in how the critical features are selected. In this context, we summarize two types of strategies that use bulk RNA-seq data to derive TME scores: scoring by immune cells and molecular mediators, and scoring by computational methods.
Immune cells are one of the key components in the TME and have significant influence on immune response. For instance, CD8 T cells play a crucial role in tumor cell elimination25, and memory T cells contribute to long-term immunologic memory under the guidance of CD4+ helper T cells26,27. Other immune cells, including B cells, NK cells, dendritic cells, and macrophages, are also associated with immune checkpoint activities and are important for both survival and immune response28–32. Given the importance of immune cells in immune response, a straightforward strategy to capture TME characteristics is to estimate immune cell abundance from bulk RNA-seq data using deconvolution methods such as CIBERSORTx and Kassandra33,34. These methods employ well-defined algorithms to quantify the relative proportion of immune cell types within a bulk cell population. In this review, we include two TME scores constructed directly using immune cell abundance (Table 1 Rows 1–2). Sato et al. defined the ratio between CD8 T cells and regulatory T cells (CD8Treg) to quantify tumor-infiltrating lymphocytes35. Chang et al. found that B cell abundance in tumor and blood samples (B_cells score) was associated with the promotion of tumor eradication and was predictive of ICI response, especially in head and neck cancer36. Another common approach is to leverage established TME-related gene markers that are significantly expressed by immune cells or molecular mediators. Informative TME scores can be constructed by systematically evaluating the relevance of these markers to ICI response (Table 1 Rows 3–8). For example, Wang et al. extracted immunotherapy-related genes from the literature and identified the Tumor Immunological Phenotype (TIP) Hot and Cold signatures as the top-ranked genes for hot and cold tumors37.
Rooney et al. assessed the association between predefined cytolytic activity gene sets and immune cell types, and identified key cytolytic effectors (CYT1) as well as genes reflecting spontaneous cytolytic activity (CYT2), both intended to measure T cell cytolytic activity38. Similarly, Auslander et al. established the IMmuno-PREdictive Score (IMPRES) by first constructing a list of immune checkpoints with known co-stimulatory or co-inhibitory effects from published studies, and then forming immune checkpoint pairs based on their relevance to anti-PD-1 or anti-CTLA-4 therapy39. Mezheyeuski et al. extracted cell counts from tumor slides and found that CD68+CD163+ and CD8 cells had the highest and lowest hazard ratios for overall survival, hence defining the ratio between them as the prognostic SIA score20. Previous studies have shown that CD8A and the C1Qx family are gene markers for CD8 T cells and CD68+CD163+ cells, respectively. Therefore, for bulk RNA-seq data, the SIA score was computed as the ratio of CD8A to C1Qx. The SIA score was originally developed for survival prognosis, but was shown to be associated with ICI response in colon cancer in its original publication. Likewise, Damotte et al. and Cabrita et al. also collected data from stained tumor slides and identified over-expressed genes as the Tumor Inflammation Signature (TIS) and Tertiary Lymphoid Structure (TLS), respectively40,41.
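The "sum of indicator function" form used by pair-based scores such as IMPRES can be sketched as a count of checkpoint pairs whose expression satisfies a fixed ordering. This is only an illustration under stated assumptions: the direction of the comparison (first gene lower than second) is assumed here and should be taken from the original publication, the function name is hypothetical, and the expression values are invented. Only three of the fifteen pairs from Table 1 are used.

```python
def pairwise_indicator_score(expr, pairs):
    """Count the pairs (a, b) for which expr[a] < expr[b].

    The '<' direction is an assumption made for illustration only.
    """
    return sum(1 for a, b in pairs if expr[a] < expr[b])

# Three of the checkpoint pairs listed in Table 1 (names as given there).
pairs = [("PD-1", "OX40L"), ("CD27", "PD-1"), ("CTLA4", "OX40L")]

# Hypothetical expression values for one sample.
expr = {"PD-1": 2.0, "OX40L": 5.0, "CD27": 3.0, "CTLA4": 6.0}

score = pairwise_indicator_score(expr, pairs)
```

Because the score depends only on within-sample orderings, it is unaffected by any monotone rescaling applied to a whole sample, which is one practical appeal of pair-based signatures.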
Instead of relying on established biomarkers or immune cell infiltration, many TME scores are based on computational methods. These approaches typically begin with the whole gene profile and apply feature selection techniques to identify key features. Traditional approaches for feature selection include machine learning algorithms, mathematical clustering techniques, and statistical models. Statistical and mathematical models are instrumental in the development of TME scores (Table 1 Rows 9–12). Jiang et al. used the Cox-PH model to identify genes related to Cytotoxic T Lymphocyte (CTL) levels and defined these genes as the T cell dysfunction signature16. They defined the TIDE score as the Pearson correlation between a tumor’s gene expression profile and either the dysfunction signature or the predefined exclusion signature, depending on the tumor’s CTL level. Ayers et al. discovered the Interferon-γ (IFNγ) gene signature across patient data in an initial study by identifying its association with immune response using a one-sided t-test42. Similarly, Ni et al. derived the Transforming Growth Factor-β (TGFβ) score by performing differential gene analysis across various cancer types43. Zeng et al. constructed an Immune and Stromal TME scoring system (ISTME) using Non-negative Matrix Factorization (NMF)21, which decomposed the gene expression matrix into two components: a coefficient matrix and a weight matrix. NMF reduced the original high-dimensional gene expression data into a smaller set of features shared across all samples. As a result, the coefficient matrix captured a lower-dimensional representation of each tumor’s gene expression profile. Each row (feature) in this matrix was referred to as a cluster.
Zeng et al. derived the ISTME immune and stromal gene signatures from genes associated with clusters enriched with immune- or stromal-related genes, and constructed the ISTME immune (IS_immune) and stromal (IS_stromal) scores based on the corresponding gene signatures. Although the two ISTME scores were constructed to depict the TME, they showed the capability to predict ICI response because of the association between the TME and immune response. Machine learning models, particularly classifiers, are also employed in the construction of TME scores (Table 1 Rows 13–14). Kong et al. developed the network-based (NetBio) method based on enriched biological pathways22. They used the PageRank algorithm to identify genes highly influential on the immune checkpoints PD-1 and PD-L144, and incorporated the significant Reactome biological pathways associated with these influential genes into a logistic regression model to predict ICI response. Alternatively, Bill et al. identified the macrophage polarity defined by CXCL9 and SPP1 (CS Polarity) using the pairwise log-ratio approach with an elastic net classifier on overall survival45.
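The NMF step described above, which splits a genes × samples expression matrix into a weight matrix and a coefficient matrix, can be sketched with the classic multiplicative update rules. This is a generic NMF illustration on synthetic data, not the ISTME pipeline itself: the function name `nmf`, the rank `k`, the iteration count, and the toy matrix are all assumptions.

```python
import numpy as np

def nmf(V, k, n_iter=500, seed=0, eps=1e-9):
    """Factor a non-negative matrix V (genes x samples) as W @ H.

    W (genes x k) plays the role of the weight matrix and H (k x samples)
    the coefficient matrix; each row of H is one 'cluster' feature.
    Uses multiplicative updates for the Frobenius-norm objective.
    """
    rng = np.random.default_rng(seed)
    n_genes, n_samples = V.shape
    W = rng.random((n_genes, k)) + eps
    H = rng.random((k, n_samples)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Synthetic non-negative "expression" matrix with two latent features.
rng = np.random.default_rng(1)
V = rng.random((30, 2)) @ rng.random((2, 12))

W, H = nmf(V, k=2)

# Genes with the largest weights on the first cluster (indices only).
top_genes_cluster0 = np.argsort(W[:, 0])[::-1][:5]
```

In a signature-building workflow, the top-weighted genes of each cluster would then be checked for enrichment in immune or stromal genes; that enrichment step is not shown here.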
Data
To obtain a comprehensive view of TME scores, we evaluated them on both individual cancer cohorts, each consisting of a single-cancer dataset from one study, and merged cohorts aggregating data from multiple sources and cancer types. Integrating data across various studies and cancer types enhances the robustness of the findings, and sometimes leads to new discoveries46. We therefore evaluated three aggregated cohorts: MELANOMA-I, MERGED-ICI, and TCGA. The MELANOMA-I cohort is a single-cancer cohort merging data from four sources with two treatment types. This cohort includes two datasets of metastatic melanoma treated with anti-CTLA-4 therapy from Snyder et al.47 and Van Allen et al.48, and two datasets of patients with metastatic melanoma treated with anti-PD-1 or anti-PD-L1 therapy from Hugo et al.49 and Cui et al.50. The MERGED-ICI cohort is a mixed-cancer, multi-source cohort. To create it, we merged the aggregated MELANOMA-I cohort, a clear cell Renal Cell Carcinoma (ccRCC) cohort ccRCC51, two Urothelial Cancer (UC) cohorts UNC-10852 and UC-II53, a NSCLC cohort NSCLC-II54, and a mixed cancer cohort, the INvestigator-initiated Phase-II Study of Pembrolizumab Immunological Response Evaluation (INSPIRE)55. The INSPIRE cohort contains 21 patients with known response and RNA-seq data from breast, melanoma, ovary, and head and neck cancers. Due to its limited sample size, we did not analyze this dataset independently but only merged it into the MERGED-ICI cohort. Except for patients from the two anti-CTLA-4-treated datasets in MELANOMA-I, all patients in the MERGED-ICI cohort underwent anti-PD-1 or anti-PD-L1 therapy. The TCGA cohort is a non-ICI-treated patient cohort including 904 samples collected from The Cancer Genome Atlas (TCGA)56.
For all aggregated cohorts, we used batch correction methods to reduce confounding. We compared three batch effect removal algorithms, ComBat57, Harmony58, and scBatch59, and selected the one that showed the highest mean entropy after correction and the best-integrated batches in PCA (Supplementary Table 1). Based on this analysis, we selected the ComBat method for all cohorts. Figure 1 illustrates the batch correction effect on the three merged cohorts by visualizing principal components.
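One way to turn "how well are batches mixed" into a number is a neighborhood entropy: for each sample, take its nearest neighbors in a low-dimensional embedding and compute the Shannon entropy of their batch labels, then average over samples. The sketch below illustrates that idea only. The exact entropy definition used for Supplementary Table 1 is not stated in this section, so the formula, the function name `mean_batch_entropy`, the choice of `k`, and the toy coordinates are all assumptions.

```python
import numpy as np

def mean_batch_entropy(X, batches, k=5):
    """Average Shannon entropy (bits) of batch labels among each sample's
    k nearest neighbors. Higher values indicate better-mixed batches.

    X: (n_samples, n_dims) coordinates, e.g. leading principal components.
    batches: length-n sequence of batch labels.
    """
    X = np.asarray(X, dtype=float)
    batches = np.asarray(batches)
    labels = np.unique(batches)
    # Pairwise squared Euclidean distances; exclude each sample itself.
    d = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d, np.inf)
    neighbors = np.argsort(d, axis=1)[:, :k]
    entropies = []
    for row in neighbors:
        p = np.array([(batches[row] == b).mean() for b in labels])
        p = p[p > 0]
        entropies.append(-(p * np.log2(p)).sum())
    return float(np.mean(entropies))

# Toy 1-D embeddings: two batches far apart vs. two batches interleaved.
separated = mean_batch_entropy(
    [[0], [1], [2], [3], [4], [100], [101], [102], [103], [104]],
    ["A"] * 5 + ["B"] * 5, k=4)
mixed = mean_batch_entropy([[i] for i in range(10)], ["A", "B"] * 5, k=4)
```

Under this definition, a correction method that leaves batches in separate clusters scores near zero, while one that interleaves them approaches the maximum of log2(number of batches).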
Fig. 1. Illustration of the three merged cohorts before and after ComBat batch effect correction using PCA analysis.
Mean entropy is annotated on the plot. Details of datasets and the sample size in each dataset are available in the legend. a MELANOMA-I. b MERGED-ICI. c TCGA.
In addition to the three aggregated cohorts, our analysis also includes seven individual cancer cohorts spanning multiple cancer types. All the individual cancer cohorts are anti-PD-1 or anti-PD-L1 treated cohorts. Among them, we analyzed UNC-108, UC-II, NSCLC-II, and ccRCC both independently and collectively after aggregating them into the MERGED-ICI cohort. We also analyzed three additional cohorts independently: a melanoma cohort MELANOMA-II60, a NSCLC cohort NIVOBIO61, and a Head and Neck Squamous Cell Carcinoma (HNSCC) cohort Centre Léon Bérard-Immunotherapy for Head and Neck patients (CLB-IHN)61.
In the following, we present the results of our analyses of TME score performance in the two aggregated ICI-treated cohorts and then in the individual cohorts. We identify the optimal scores in each cohort and the most robust scores across cohorts. Lastly, we examine their prognostic value under traditional cancer therapies using the non-ICI-treated TCGA cohort. The list of all analyzed cohorts and their corresponding optimal scores and performances is available in Table 2.
Table 2.
Cohort description and the optimal score in each cohort
| Cohort | Cancer | R | NR | Size | Optimal | AUC | c-index | p-value |
|---|---|---|---|---|---|---|---|---|
| MERGED-ICI | mixed | 328 | 277 | 605 | CS Polarity | 0.6073 | 0.5807 | 0.0018 |
| MELANOMA-I | melanoma | 50 | 76 | 126 | CYT1 | 0.6703 | 0.6584 | 0.0440 |
| | | | | | TIDE | 0.6453 | 0.6770 | 0.0080 |
| TCGA | mixed | | | 904 | IS_immune | | 0.7101 | 0.0002 |
| MELANOMA-II | melanoma | 70 | 34 | 104 | CS Polarity | 0.6608 | | 0.0332 |
| UNC-108 | UC | 60 | 29 | 89 | IS_immune | 0.7529 | | 0.0005 |
| UC-II | UC | 13 | 49 | 62 | TIP Hot | 0.6829 | | 0.0077 |
| NIVOBIO | NSCLC | 41 | 34 | 75 | TGFβ | 0.6622 | | 0.0595 |
| NSCLC-II | NSCLC | 91 | 45 | 136 | TIP Hot | 0.5978 | | 0.0110 |
| CLB-IHN | HNSCC | 38 | 64 | 102 | CYT1 | 0.6534 | | 0.0330 |
| | | | | | SIA | 0.6461 | | 0.0688 |
| ccRCC | ccRCC | 103 | 68 | 171 | None | | | |
The AUC column reports the AUC of ICI response prediction for the optimal method. The R and NR columns report the number of responders and non-responders in each cohort. For mixed cohorts, the c-index of the multivariate Cox-PH model is shown. The p-value column reports the p-value in survival prognosis: for mixed cohorts, it is the p-value of the score in multivariate Cox-PH regression; for individual cancer cohorts, it is the p-value of the log-rank test comparing patients with high and low scores.
TME scores in aggregated ICI-treated cohorts
We evaluated TME scores on two aggregated ICI-treated cohorts: MERGED-ICI and MELANOMA-I. We performed Receiver Operating Characteristic (ROC) curve analysis on ICI response prediction, where the TME scores were used to classify samples into responders and non-responders. In MERGED-ICI, CS Polarity demonstrated the highest AUC among all TME scores (0.6073) and was the only score reaching an AUC over 0.6 in this cohort (Fig. 2a). The average AUC of all scores was 0.5525, with a small variance of 0.0015 (Supplementary Data 1). Although only a few scores performed worse than random prediction, the remaining scores showed very similar and suboptimal performance. We then examined the differences in score distribution between responders and non-responders in the MERGED-ICI cohort (Fig. 2b). In agreement with the ROC analysis, the score distributions of responders and non-responders were significantly different for most scores. Consistent with the original studies, TIDE had a negative association with favorable response, while other scores were positively correlated with better response.
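The AUC used in the ROC analysis above has a direct probabilistic reading: it is the probability that a randomly chosen responder has a higher score than a randomly chosen non-responder, with ties counted as one half. The minimal sketch below computes it by comparing all responder/non-responder pairs; the function name `auc` and the toy scores are hypothetical. It also shows why a negatively associated score, such as TIDE, can be handled by negating it.

```python
def auc(scores, labels):
    """AUC as P(responder score > non-responder score), ties count 0.5.

    labels: 1 for responder, 0 for non-responder.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Toy example: five patients, two responders.
scores = [0.9, 0.8, 0.4, 0.35, 0.1]
labels = [1, 0, 1, 0, 0]

auc_raw = auc(scores, labels)
# Negating a score reverses every pairwise comparison, so the AUC flips
# around 0.5: a score with AUC 0.3 becomes 0.7 once negated.
auc_negated = auc([-s for s in scores], labels)
```

The pairwise count in the numerator is the Mann-Whitney U statistic, which is why the ROC analysis and the Mann-Whitney comparison of score distributions tend to agree.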
Fig. 2. Performance of each TME score in the MERGED-ICI cohort.
a Barplot of AUCs for each TME score in predicting ICI response. The horizontal gray line represents AUC = 0.5. b Boxplots comparing all scores' difference in score distributions between responders (green) and non-responders (yellow). c Forest plot of multivariate (dark) and univariate (light) Cox-PH regression on overall survival. The x-axis shows the coefficient (log of hazard ratio) of the regression. The y-axis depicts the TME scores. The vertical light gray dashed line shows the level of log of hazard ratio equals 0. The c-indices of both multivariate and univariate Cox-PH models are shown on the right side of the forest plot.
We further assessed whether these scores could predict post-ICI overall survival in MERGED-ICI. We constructed two models: a univariate Cox Proportional Hazard (Cox-PH) model using the score as the sole covariate, and a multivariate Cox-PH model with additional clinical covariates. Among the available clinical factors, survival status differed significantly across treatment type, age, and gender in at least one of the aggregated cohorts (Fig. 3). Additionally, considering that batch effects from different data sources and tumor types might not be fully removed from the aggregated cohorts, we added data source and tumor type as two additional covariates in the multivariate Cox-PH model when more than one level was present in the cohort. We used the concordance index (c-index) to measure the model’s ability to predict survival62. Among all scores, TIS showed the strongest ability as an independent predictor of overall survival for the entire MERGED-ICI cohort (Fig. 2c, Supplementary Data 2, c-index = 0.5677). It also achieved the second highest c-index within the non-responder sub-group in MERGED-ICI (Supplementary Data 3, c-index = 0.5740). CS Polarity achieved the highest c-index in the multivariate Cox-PH model (Fig. 2c, c-index = 0.5807), indicating its strong synergistic effect with clinical covariates. However, although TME score performance varied in the univariate model, the scores performed similarly and remained suboptimal in the multivariate regression. The average c-index of the multivariate regression was 0.5667, only a modest increase of 0.0273 compared to the univariate model (Supplementary Data 2). This limited improvement may be because, among the additional covariates included in the multivariate Cox-PH models, only the batch-effect-related variable data source was significant, while none of the clinical covariates contributed significantly to the prediction of overall survival in MERGED-ICI (Supplementary Data 2).
Regarding the optimal score in the MERGED-ICI cohort, CS Polarity demonstrated the most robust performance, with the highest c-index in multivariate Cox-PH regression as well as the highest AUC in ICI response prediction. However, given the modest performance in both ICI response and overall survival prediction, further research is warranted to develop more robust scoring methods capable of reliably predicting outcomes across mixed cancer ICI cohorts.
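The c-index reported above can be read as the fraction of comparable patient pairs in which the patient with the higher predicted risk is the one who dies earlier. A minimal sketch of Harrell's c-index follows, assuming right-censored data; the function name and the toy follow-up data are hypothetical, and in practice the risk would be the linear predictor of the fitted Cox-PH model.

```python
def concordance_index(times, events, risk):
    """Harrell's c-index for right-censored survival data.

    times:  follow-up time per patient
    events: 1 if death was observed, 0 if censored
    risk:   predicted risk (higher means shorter expected survival)

    A pair (i, j) is comparable when patient i has an observed event
    strictly before patient j's follow-up time.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / comparable

# Toy cohort of five patients.
times = [5, 8, 12, 3, 10]
events = [1, 0, 1, 1, 0]
risk = [0.4, 0.3, 0.6, 0.9, 0.5]

c_index = concordance_index(times, events, risk)
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, so the values around 0.57 to 0.58 reported for MERGED-ICI sit only slightly above chance.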
Fig. 3. Illustration of significant clinical covariates.
a, b Stacked barplot comparing the survival rates across different treatments in MERGED-ICI (a) and MELANOMA-I (b). The counts of each sub-category are annotated on the bars. Blue represents alive patients and gray represents dead patients. Fisher’s exact test was used to compare survival rate distributions between CTLA-4 and PD-1/PD-L1 treatments. c Stacked barplot comparing survival rates of gender in TCGA, assessed with Fisher’s exact test. d Illustration of the effect of age on survival status in TCGA, assessed with Mann-Whitney U test. In all panels, p-values are indicated by asterisks, with significance levels defined as: p-value < 0.001 (****); p-value < 0.01 (***); p-value < 0.05 (**); p-value < 0.1 (*); non-significant (ns).
We also evaluated TME scores on the other ICI-treated aggregated cohort, MELANOMA-I. Compared to the MERGED-ICI cohort, TME scores had higher AUCs in predicting ICI response. The average AUC was 0.6024, which was 0.0498 higher than that of the MERGED-ICI cohort (Fig. 4a, Supplementary Data 1). Among all scores, CYT1 and CYT2 both achieved relatively good prediction with AUCs exceeding 0.65 (CYT1: AUC = 0.6582, CYT2: AUC = 0.6703), demonstrating the importance of local immune cytolytic activity in melanoma ICI therapy. Similar to MERGED-ICI, most scores showed consistent performance when comparing score distributions between responders and non-responders (Fig. 4b). Consistent with the ROC analysis, TME scores exhibited a stronger ability to predict post-therapy survival in the MELANOMA-I cohort than in the MERGED-ICI cohort. TIDE reported the highest c-index of 0.6248 in the univariate Cox-PH model (Fig. 4c), and it was the only score reaching a c-index over 0.6 in univariate Cox-PH regression for both ICI-treated merged cohorts. It also exhibited a favorable AUC of 0.6453, ranking third among all scores. However, after adding clinical covariates, CYT2 became the optimal score with a c-index of 0.6786 (Fig. 4c). Although none of the clinical factors demonstrated statistical significance in MELANOMA-I Cox-PH regression, they still contributed an average increase of 0.0953 in multivariate c-index compared to the univariate model (Supplementary Data 2). This further supported that in the MERGED-ICI cohort, clinical factors were potentially confounded by tumor types and did not make valuable contributions in the Cox regression model. Overall, the robust performance of CYT2 and TIDE in the MELANOMA-I cohort supported them as reliable predictors for melanoma under various ICI treatment types. Note that both CYT2 and TIDE focus on the level of cytotoxic T cells, highlighting the importance of cytotoxic T cell activity in melanoma immune response.
To investigate further, we performed univariate Cox-PH regression separately within responder and non-responder subgroups. SIA and NetBio reached the highest c-indices of 0.7531 and 0.5851 in responders and non-responders, respectively (Supplementary Data 3). The high c-index of SIA among responders suggested its ability to capture survival heterogeneity within this subgroup, while its performance was modest in non-responders (c-index = 0.5162), resulting in limited predictive power in the entire MELANOMA-I cohort. CYT2 and TIDE, which were identified as optimal in the entire MELANOMA-I cohort, did not perform well within either subgroup. This suggested that both scores exhibited relatively homogeneous distributions within responders and non-responders, and that their predictive ability primarily reflected differences between the subgroups. Overall, TME scores demonstrated stronger prediction performance for post-therapy overall survival in responders compared to non-responders (Supplementary Data 3, responders: average c-index = 0.6411; non-responders: average c-index = 0.5200), indicating that survival in non-responders may be less dependent on TME-related factors in melanoma.
Fig. 4. Performance of each TME score in MELANOMA-I.
a Barplot of AUCs for each TME score in predicting ICI response. The horizontal gray line represents AUC = 0.5. b Boxplots comparing all scores' difference in score distributions between responders (green) and non-responders (yellow). c Forest plot of multivariate (dark) and univariate (light) Cox-PH regression models on overall survival. The x-axis shows the coefficient (log of hazard ratio) of the regression. The y-axis is the TME scores. The vertical light gray dashed line shows the level of log of hazard ratio equals 0. The c-indices of both multivariate and univariate Cox-PH model are shown on the right side of the forest plot.
TME scores in individual cancer cohorts
Aggregating cohorts can provide a larger and more complex dataset to evaluate the robustness of TME scores, yet it also introduces unobserved confounding that is difficult to control and fully eliminate. Given tumor heterogeneity and the specificity of TME scores, we examined performance at the per-dataset level in order to offer a more precise assessment. Figure 5a–e illustrates the AUCs of ICI response prediction for the individual cohorts in which at least one optimal score was identified. All optimal scores and their key performance metrics are summarized in Table 2. Some tumor types demonstrated generally higher AUCs, suggesting that their outcomes were more effectively predicted by TME-related changes. For example, in the CLB-IHN cohort of patients with HNSCC, nearly half of the scores achieved AUCs above 0.6 (Fig. 5a). Conversely, in the ccRCC cohort, most scores performed no better than random prediction (Supplementary Data 1). Among these individual cohorts, some scores demonstrated superior predictive power. In the metastatic UC cohort UC-II, both TIP Hot and IFNγ scores achieved high predictive performance (Fig. 5b, TIP Hot: AUC = 0.6829, IFNγ: AUC = 0.6701). In several other cohorts, a single score stood out: IS_immune in UNC-108 (Fig. 5c, AUC = 0.7529), CS Polarity in MELANOMA-II (Fig. 5d, AUC = 0.6608), TGFβ in NIVOBIO (Supplementary Data 1, AUC = 0.6622), and CYT1 in CLB-IHN (Fig. 5a, AUC = 0.6534). In contrast, no score demonstrated outstanding performance in ccRCC. Furthermore, Supplementary Table 2 indicated that, for the IS_stromal and TIDE methods on the ccRCC cohort, the Spearman correlations between the scores and ICI response were close to zero and the score distributions of responders and non-responders did not differ significantly (high p-values in the Mann-Whitney test).
Because the AUC is a sample-based estimate, such near-zero, non-significant associations are still compatible with empirical AUC values slightly below 0.5.
Fig. 5. Illustration of TME score performance in individual cohorts.
a–e AUCs in ICI response prediction in individual cohorts. For scores known to be negatively associated with treatment response, we used their negative values to compute the AUC, ensuring that higher AUCs consistently reflect better predictive ability. Only scores with AUC exceeding 0.5 are shown. The x-axis is the TME score and the y-axis is the AUC. f Volcano plot of the Mann-Whitney test results in each cohort. The x-axis is the AUC for predicting binary ICI response, and the y-axis is the negative log10-transformed p-value from the Mann-Whitney U test comparing score distributions between responders and non-responders. Significant points are colored by cohort, with score names annotated alongside. Non-significant points are shown in gray. The horizontal red dashed line marks p-value = 0.1, and the vertical red dashed line marks the level of random prediction. Points inside the gray dashed box are magnified for clearer visualization.
We identified the optimal scores for each individual cancer cohort based on key metrics in ROC analysis, score distribution, and survival analysis (Table 2). Multiple scores significantly differentiated responders’ score distribution from non-responders’ distribution (Fig. 5f, Supplementary Table 2). Notably, significant score-response associations were more common in the CLB-IHN and NSCLC-II cohorts, aligning with the higher number of predictive scores identified in the ROC analysis (Fig. 5a, e). However, no score showed an AUC exceeding 0.65 in either cohort, indicating that none of the scores provided strong predictive performance for ICI response in these two cohorts. Among the associated scores in the HNSCC cohort CLB-IHN, CYT1 (Mann-Whitney U test p-value = 0.0172) and SIA (p-value = 0.0160) showed the greatest differences between responders and non-responders. These two scores also significantly predicted post-ICI survival in the same cohort (Fig. 6a, b, CYT1: Log-rank test p-value = 0.0330; SIA: p-value = 0.0688). Given their strong associations with both ICI response and survival, CYT1 and SIA demonstrated promising predictive power for HNSCC. In the NSCLC-II and MELANOMA-II cohorts, CS Polarity outperformed the other scores in distinguishing responders’ score distributions (Fig. 5f, NSCLC-II: p-value = 0.0149; MELANOMA-II: p-value = 0.093). Specifically, it was significantly associated with survival in MELANOMA-II (Fig. 6f, log-rank test p-value = 0.0332), highlighting it as the optimal score for metastatic melanoma. Regarding UC, IS_immune demonstrated the most significant score difference between responders and non-responders in UNC-108 (Fig. 5f, p-value = 0.0001), and was positively associated with favorable survival (Fig. 6c, p-value = 0.0005). TIP Hot exhibited the strongest performance in UC-II (Fig. 5f, p-value = 0.0449), and higher TIP Hot scores were significantly associated with better survival (Fig. 6i, p-value = 0.077). 
These results supported TIP Hot and IS_immune as leading predictive indicators in UC-II and UNC-108, respectively. TGFβ was uniquely associated with responders in the metastatic NSCLC cohort NIVOBIO (Fig. 5f, p-value = 0.0212), and was negatively related to survival (Fig. 6d, p-value = 0.0595).
Fig. 6. Kaplan-Meier survival curves of the optimal scores.
a–d Survival curves of the optimal scores in CLB-IHN, UNC-108, NIVOBIO, respectively. Patients with scores above the 75% quantile were classified into the H group, while those with scores below the 25% quantile were classified into the L group. Red curve represents the H group, and green curve represents the L group. The p-value of log-rank test comparing H and L groups, the total sample size of the cohort, and the number of patients in each group are annotated. e, f Survival curves of CS Polarity in its robust cohorts. The H and L groups are colored as black and yellow. g–i Survival curves of TIP Hot in its robust cohorts. The H and L groups are colored as blue and gray, respectively.
We performed univariate Cox-PH regression within responder and non-responder subgroups for NIVOBIO, NSCLC-II, CLB-IHN, and ccRCC, which had sufficient sample sizes for both subgroups. We identified several scores with high c-indices within these subgroups that exceeded the highest univariate c-index observed in the two aggregated ICI-treated cohorts MERGED-ICI and MELANOMA-I (Supplementary Data 3). For example, in the two NSCLC cohorts, CYT2 reached a c-index of 0.6800 in the non-responder subgroup of NIVOBIO, while TIP Cold exhibited the highest c-index of 0.6786 in the responder subgroup of NSCLC-II. Similarly, TIDE showed a high c-index of 0.6703 in the responder subgroup of the HNSCC cohort CLB-IHN. However, we observed that none of these subgroup optimal scores overlapped with the optimal scores identified in the corresponding full cohorts. Instead, they demonstrated relatively poor predictive performance for ICI response. For instance, TIP Cold and IS_immune achieved the highest c-indices for the responder and non-responder subgroups of NSCLC-II, respectively. However, they both had AUCs below 0.6 and were not ranked among the top-performing TME scores in this cohort (Fig. 5e). Similarly, in CLB-IHN, although TIDE and B_cells yielded the highest c-indices for responders and non-responders respectively, their AUCs were suboptimal. Taken together with the subgroup survival analysis results from MELANOMA-I, these findings suggested that while certain TME scores may capture survival-relevant heterogeneity within different response subgroups, this heterogeneity may confound response prediction across the entire cohort and thereby limit their overall predictive performance.
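As a reading aid for the c-indices reported above, the following minimal sketch computes Harrell's concordance index in plain Python. It is an illustration only, not the code used in this study: the function name `harrell_c_index`, the toy inputs in the usage note, and the choice to skip pairs with tied survival times are all assumptions. For a univariate Cox-PH model the predicted risk is a monotone function of the score, so the c-index depends only on how the score ranks patients and on the sign of the fitted coefficient.

```python
from itertools import combinations


def harrell_c_index(times, events, risk):
    """Harrell's concordance index for right-censored survival data.

    times  : follow-up time per patient
    events : 1 if the event was observed, 0 if the patient was censored
    risk   : predicted risk per patient (higher = shorter expected survival)
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that patient `a` has the shorter follow-up time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # A pair is comparable only if the shorter time is an observed event.
        # Simplifying assumption: pairs with tied times are skipped.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk[a] > risk[b]:
            concordant += 1.0  # the higher-risk patient failed first
        elif risk[a] == risk[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

For a score assumed to be protective (higher score, longer survival), the risk can be taken as the negated score, for example `harrell_c_index(times, events, [-s for s in scores])`. A value near 0.5 indicates no better than random ranking.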
We identified two outstanding TME scores showing consistently robust performance across multiple cohorts: TIP Hot and CS Polarity. Although they might not serve as the optimal scores, they remained robust in all evaluation metrics in certain cohorts. TIP Hot and CS Polarity have been identified as robust predictive scores for the UC-II and MELANOMA-II cohorts respectively, and both continued to demonstrate consistent performance across some other cohorts. As concluded previously, CS Polarity was also found optimal in the MERGED-ICI cohort. As for TIP Hot, it remained robust in the CLB-IHN and NSCLC-II cohorts. While its predictive power in NSCLC-II was moderate (Fig. 5e, AUC = 0.5978), it was the only score that consistently exhibited robust behavior across all evaluation metrics. Notably, it showed a significant association with ICI response (Fig. 5f, p-value = 0.0649) and survival outcome (Fig. 6h, p-value = 0.0110). Similarly, although TIP Hot did not outperform CYT1 and SIA in the CLB-IHN cohort, it achieved an AUC of 0.6222 (Fig. 5a), showed a significant difference in the distribution of scores between responders and non-responders (Fig. 5f, p-value = 0.0500), and was significantly associated with survival (Fig. 6g, p-value = 0.0122). These results indicated the potential of TIP Hot as a robust predictor for UC, NSCLC, and HNSCC.
TME scores performance in TCGA cohort
We used the TCGA cohort to evaluate the prognostic value of TME scores in relation to patients’ survival. IS_immune achieved the highest c-index in multivariate Cox-PH regression (Fig. 7a, c-index = 0.7101), representing the best survival prediction performance across all merged cohorts. In univariate regression, the IFNγ score yielded the optimal c-index and showed a negative association with hazard (Fig. 7a, c-index = 0.5825). While the average c-index for univariate regression (Supplementary Data 2, c-index = 0.5457) was comparable to that of MERGED-ICI and MELANOMA-I, the multivariate Cox-PH regression presented a better performance in TCGA cohort than the other two aggregated cohorts (Supplementary Data 2, c-index = 0.7000). This may be due to the significant contribution of tumor type and age to overall survival in the regression analysis (Fig. 3c-d, Supplementary Data 2), making the TCGA cohort the one with the greatest number of significant clinical covariates in multivariate regression. This finding highlighted the importance of clinical covariates in survival prognosis.
Fig. 7. Illustration of performance in the TCGA cohort.
a Forest plot of multivariate (dark) and univariate (light) Cox-PH regression on overall survival in TCGA. The x-axis is the regression coefficient and the y-axis lists the TME scores. The vertical light gray dashed line marks a log hazard ratio of 0. The c-indices of both the multivariate and univariate Cox-PH models are shown on the right side of the forest plot. b Volcano plot of Cox-PH regression results in TCGA sub-cohorts. The x-axis denotes the log hazard ratio, and the y-axis denotes the negative log10-transformed p-value. Non-significant points are shown in gray, and significant points are colored by TCGA sub-cohort, with the name of the TME score annotated alongside. The horizontal red dashed line marks p-value = 0.1, and the vertical red dashed line marks a hazard ratio of 1.
The prognostic performance of TME scores varied across TCGA sub-cohorts. We included eight TCGA sub-cohorts in this part, namely Breast Invasive Carcinoma (BRCA), Bladder Urothelial Carcinoma (BLCA), Colon Adenocarcinoma (COAD), Colorectal Adenocarcinoma (COADREAD), Lung Squamous Cell Carcinoma (LUSC), Head-Neck Squamous Cell Carcinoma (HNSC), Kidney Renal Clear Cell Carcinoma (KIRC), and Skin Cutaneous Melanoma (SKCM). Two scores were significantly associated with overall survival in more than one cancer (Fig. 7b). TIDE demonstrated prognostic value in BLCA (Supplementary Table 3, Hazard Ratio (HR) = 13.8345, Cox-PH p-value = 0.0399) and SKCM (HR = 8.3127, p-value = 0.0008). CYT2 significantly predicted survival in two TCGA cancers: BLCA (HR = 0.1241, p-value = 0.0399) and SKCM (HR = 0.1868, p-value = 0.0318). In addition to these relatively robust prognostic scores, several others showed specific associations with some TCGA sub-cohorts. The IS_immune score showed the strongest association with survival in SKCM (HR = 0.0573, p-value = 0.0001). TIP Cold and TGFβ were the only two scores that demonstrated associations with HNSC survival (TIP Cold: HR = 6.7923, p-value = 0.0674; TGFβ: HR = 7.4208, p-value = 0.1130).
Discussion
Numerous novel biomarkers have been identified for predicting ICI response, including those based on tumor and host genomics9. While many reviews have been published on TME scoring methods for deriving immune responsive biomarkers, they often focus on specific types of methods such as network-based graphical models or include a broad range of biomarker types63–67. In this paper, we presented a comprehensive meta-analysis reviewing TME scoring methods for predicting ICI therapy responses based on bulk RNA-seq data. Unlike previous reviews that compare methods specifically designed for ICI response, we included several methods which were initially developed for survival prognosis for traditional cancer treatments but have demonstrated their predictive power on ICI response. These methods, together with the other response-specific methods, have been applied in multiple clinical studies18,19,21,68,69. The TME scores have also been identified as important tools for cancer immunology, contributing to the discovery of novel biomarkers70–74. TIDE was applied to validate immune-related biomarkers such as POLE and PBRM1 co-mutation and estimate T cell exclusion signature75–77. Additionally, some TME scores have been evaluated using samples from clinical trials. For example, the performance of IMPRES was examined on a dataset comprising samples from multiple clinical trials39.
Despite the promising applications of these TME scores, our findings revealed substantial variability in their performance across different cancer types. No single score consistently demonstrated robust predictiveness across all cohorts, indicating the need for further development of universally applicable TME scores. Particularly, for TME scores originally developed for survival prognosis, their performance was limited, because we assumed that patients with better survival outcomes tend to be more responsive to ICI therapy and used this as a proxy in defining predicted responders. This assumption might not hold true across different datasets and tumor types, resulting in the inconsistent performance. However, note that the limited performance in predicting ICI response does not affect their effectiveness in survival prognosis, especially within the cancer types for which they were designed.
Although some scores showed inconsistent performance, we identified two robust TME scores across different data cohorts. The TIP Hot score showed potential as a robust score for NSCLC, HNSCC, and UC. These tumor types are usually associated with “hot” immunophenotypes and higher response rates to ICI78,79, aligning with our enrichment analysis for the TIP Hot signature. The enrichment analysis showed that pathways related to cold to hot tumor transformation such as chemokine-mediated signaling (Supplementary Data 4, adjusted p-value = 0, Odds Ratio (OR) = 191.6923) and inflammatory response (Supplementary Data 4, adjusted p-value = 0.0001, OR = 43.7212) ranked top among all biological pathways, further indicating the association between the TIP Hot score and hot tumors80,81. Notably, patients with high TIP Hot scores exhibited favorable responses to ICI therapy and improved survival outcomes. The mass screening used to retrieve TIP genes from relevant literature enhances the robustness of the score, as these genes repeatedly appear across studies targeting different tumor types and therapies. Importantly, the CS Polarity score was identified as a robust predictor for melanoma and the mixed cancer cohort MERGED-ICI. This score is defined as a simple ratio between only CXCL9 and SPP1 genes, which are the dominant markers of macrophages. Macrophages have been demonstrated as one of the two independent prognostic immune cells in the original study, and they have the property of memoryless phenotypic plasticity to different stimuli45,82. This biological flexibility of macrophages potentially contributes to CS Polarity’s robustness and predictive performance across certain cohorts. Besides the two robust scores, the IS_immune score demonstrated the strongest predictive performance for UNC-108, extending its application beyond melanoma. Its prognostic value has also been validated through the TCGA cohort. 
Although the ISTME method was only trained on non-ICI treated patient data, immune response-related enrichment terms ranked top and contributed to its ability to predict ICI response (Supplementary Data 4).
Furthermore, our analysis highlighted several findings in immune resistance that align with prior research, while also raising important questions about the robustness of existing TME scores. All scoring methods showed poor performance in predicting overall survival or ICI response for certain tumor types, for instance renal cancer. In the TCGA cohort, no score was identified as significantly prognostic for TCGA KIRC (Fig. 7b), which indicated a potential association of KIRC with low tumor T cell infiltration83. Additionally, while multiple scores exhibited differences in distribution between responders and non-responders in the ccRCC cohort (Fig. 5f), these associations were insufficient for accurate response prediction. The highest AUC was 0.6109 for B_cells, with all others falling below 0.5600, further demonstrating the link between immune resistance and TME scores’ survival prognosis ability irrespective of treatments (Supplementary Data 1). The poor performance on ccRCC ICI response prediction aligns with the association of renal cancer with the “cold” tumor type, as cold tumors are typically characterized by a lack of tumor antigens and insufficient T cell co-stimulation and activation even when the antigen is present83. Among the top 20 enriched pathways for ccRCC responders, only “neutrophil mediated immunity” (Supplementary Table 4, false discovery rate (FDR) = 0), “neutrophil degranulation” (FDR = 0), and “neutrophil activation involved in immune response” (FDR = 0) were related to immune response, further demonstrating ccRCC’s association with cold tumors84. Conversely, our review also revealed consistent results for hot tumors. In cohorts NSCLC-II, CLB-IHN, UNC-108, and UC-II, consisting of the hot tumor types NSCLC, HNSCC, and UC, all TME scores except CD8Treg achieved AUCs greater than 0.5 (Fig. 5a, c, e). 
Since CD8Treg was originally developed for survival prognosis in cold tumor-related ovarian cancer85, its limited predictive performance in hot tumors was expected. Additionally, responders in NSCLC-II exhibited a significantly higher proportion of immune-related pathways among the top 20 enriched terms compared to ccRCC (Supplementary Table 4). In NSCLC-II, the top-ranked immune-related pathways for responders were associated with innate immune response, immune cell migration, and tumor microenvironment structural or immune regulatory processes. Similarly, the hot tumor-related cohort MELANOMA-I showed a strong enrichment of immune response and immunity-related pathways among responders (Supplementary Table 4). For example, “regulation of immune response” (FDR = 0) and “cellular response to interferon-gamma” (FDR = 0) were both ranked within the top five enriched terms. This contrast further underscored the significant heterogeneity between hot and cold tumors. Therefore, therapeutic strategies transforming cold tumors into hot ones are important in enhancing the effectiveness of ICI therapy and improving TME scores86.
Nevertheless, our study presented some contradictory results in TME scoring methods’ applicability. Referring to the ROC analysis, all methods exhibited relatively low AUCs compared to their own analyses for merged ICI-treated cohorts (Figs. 2a, 4a). Specifically, while SIA was found to be significantly associated with melanoma20, its predictive utility for ICI response in MELANOMA-II was limited (Fig. 5d). Likewise, CYT2 only demonstrated favorable performance in MELANOMA-I cohort but failed to outperform random prediction in MELANOMA-II. For TLS, both the predictive performance in MELANOMA-II and the performance of survival analysis in MELANOMA-I were poor, despite TLS being validated to be effective in its original research41. This discrepancy may be due to an incomplete consideration of the broader immune context. For example, CYT2 primarily captures the cytolytic activities of cytotoxic T cells, which only function effectively when T cells are not in a dysfunctional state70. By overlooking this condition, CYT2 presented inconsistent performance across melanoma cohorts. In contrast, TIDE explicitly accounts for T cell dysfunction by incorporating a dysfunction signature for patients with low CTL levels, thus exhibiting a better performance. Similarly, while the TLS score emphasizes tertiary lymphoid structure, its positive association with immune response depends on the maturity of tertiary lymphoid structures87. Immature tertiary lymphoid structures contribute to immune heterogeneity, and the TLS score’s nine-gene signature may not fully capture this maturation, leading to inconsistent results. The SIA score, which includes only two immune cell markers, also lacks generalizability. Unlike the construction of the robust CS Polarity score where the prognostic relevance of macrophages was validated across multiple datasets, the SIA score’s immune cell selection was based on a single dataset, potentially limiting its robustness. 
These findings underscore that while TME scores targeting specific immune features may demonstrate predictive value under certain circumstances, their limited scope may fail to capture the complexity of immune responses, emphasizing the need for further refinement to ensure broad applicability and predictive power in heterogeneous immune responses. Having a larger signature also does not necessarily guarantee good performance. For example, the NetBio model incorporates approximately 400 enriched pathways in a logistic regression model, which has a higher dimension than its training data. This high dimensionality can introduce multi-collinearity and possibly explains the suboptimal performance of NetBio in the review. While the B_cells score outperformed the other scores in ccRCC ICI response prediction, it did not achieve the expected performance in HNSCC36. A potential explanation lies in the sensitivity of immune cell deconvolution algorithms to missing genes. These algorithms typically require all genes in each immune cell signature to be present, whereas the CLB-IHN cohort contains a limited gene set, which may have impaired the score’s performance.
Significant challenges remain in identifying robust TME scores for ICI therapy, including tumoral heterogeneity, variability in host immunity, the complexity of tumor-immune cell interactions, and the evolution of cancer through treatment88. The inconsistencies in the predictive power of TME scores across different tumor types and datasets suggest that further development is needed to enhance their robustness. The limitations of methods differ based on the data used for constructing the method. For scoring methods using non-ICI-treated patient data, the scores may fail to capture the dynamic changes induced by ICI therapies, as immune responses are often transient89. In cancers such as renal cancer, where the TME is characterized by an immunosuppressive milieu that hinders immune cell function, these methods are less effective90. For methods that rely on ICI-treated patient data, a major challenge is the insufficient ICI datasets. Research has found that among 16 identified studies using genomics and transcriptomics data, the top three researched cancers are NSCLC, melanoma, and bladder cancer, collectively reaching 41% of the total studies discussed91. Therefore, extracting valuable information from limited training data remains a critical challenge.
Several strategies could potentially enhance the predictiveness and robustness of TME scoring methods. In survival analysis, incorporating clinical covariates alongside TME scores could account for potential confounders and improve accuracy. However, when predicting post-ICI survival, the clinical factors should be strictly screened to ensure that they contribute effectively. Constructing the gene signature from the whole gene set in a way that maximizes the capture of immune activities might yield a more robust performance, which also requires novel and powerful dimension reduction methods. Therefore, incorporating novel methods such as DR-FS-MFMR and NMF-based Nonnegative Spatial Factorization (NSF) could better address the sparsity issue in gene expression data and efficiently reduce the dimension92,93. Furthermore, mathematical mechanistic models that are able to capture the dynamic changes in tumor progression could also be employed to decrease the confounding effect of baseline survival on immune response and better address the problem of tumor heterogeneity94–100. For methods that utilize ICI-treated patient data, employing larger and more diverse training datasets could better capture variability across different cancer types. Additionally, employing semi-supervised learning approaches could enable models to leverage the extensive data available from non-ICI-treated patients, such as survival data, while also learning from ICI-treated cohorts. Excluding data from cold tumor-related immunotherapy-resistant cancer types such as pancreatic cancer, prostate cancer, and glioblastoma during model training could improve accuracy by focusing on more immunotherapy-responsive cancers101–103. Moreover, incorporating host-related biomarkers, including metastatic status, microbiome composition, and peripheral blood features, could provide a more complete understanding of patient response104–106. 
Interactions between genes and existing biomarkers could also be considered, which may provide more information about the dynamic changes in TME. For example, integrating immune cell-cell interactions and baseline tumor mutations into the models could also enhance predictive power by accounting for their influence on immunity and immune response107,108.
Ultimately, developing more robust TME scores capable of predicting immune responses across a broader spectrum of cancer types remains a complex but crucial objective. Successfully implementing TME scores that effectively correlate with survival and immune responses across multiple cancers could significantly advance research in cancer ICI therapy.
Methods
Data pre-processing
For the three aggregated cohorts, we used the following packages to correct batch effect: ComBat by inmoose Python package109, Harmony by harmonypy Python package58, and scBatch by scBatch R package59. We compared the effect of batch correction using these three methods and eventually used ComBat for all three merged cohorts since this method provided the highest mean entropy and a more random data distribution after correction (Supplementary Table 1). All gene expression data were first log2 transformed and then min-max normalized within each sample. For genes with multiple transcript values, we used the average of expression for that gene. The classification of responders and non-responders was consistent with the data cohorts used, where patients with complete response, partial response, and stable disease were classified as responders. We used the binary encoding to encode ICI response, where responders were encoded as 1 and 0 otherwise. Only samples with known survival or response status were considered.
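The per-sample steps described above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions rather than the study's pipeline: the function names, the pseudocount of 1 inside the log2 transform, the choice to average duplicate gene entries before transforming, and the RECIST-style labels (`CR`, `PR`, `SD`, `PD`) are all hypothetical. Batch correction is not shown.

```python
import math
from collections import defaultdict

# Label sets are assumptions; the text only states that complete response,
# partial response, and stable disease were treated as responders.
RESPONDER_LABELS = {"CR", "PR", "SD"}
NON_RESPONDER_LABELS = {"PD"}


def preprocess_sample(transcripts, pseudocount=1.0):
    """Normalize one sample given as a list of (gene, expression) pairs."""
    # Average genes that appear with several transcript values.
    by_gene = defaultdict(list)
    for gene, value in transcripts:
        by_gene[gene].append(value)
    averaged = {g: sum(v) / len(v) for g, v in by_gene.items()}
    # log2 transform (the pseudocount is an assumption to handle zeros).
    logged = {g: math.log2(v + pseudocount) for g, v in averaged.items()}
    # Min-max normalize within the sample.
    lo, hi = min(logged.values()), max(logged.values())
    if hi == lo:
        return {g: 0.0 for g in logged}
    return {g: (v - lo) / (hi - lo) for g, v in logged.items()}


def encode_response(label):
    """Return 1 for responders, 0 for non-responders, None if unknown."""
    if label in RESPONDER_LABELS:
        return 1
    if label in NON_RESPONDER_LABELS:
        return 0
    return None  # samples with unknown status would be excluded
```

Returning `None` for unknown labels keeps the exclusion of samples without a known response status explicit instead of silently encoding them as non-responders.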
TME scores computation
The TIDE and ISTME scores were obtained from the Python TIDEpy and R ISTMEscore packages, respectively. The SIA score was computed as the CD8A/C1QA ratio, as this ratio demonstrated relevance for immunotherapy. We calculated the NetBio score as the odds ratio between pathway coefficients from the NetBio model and pathway normalized enrichment scores for each validation cohort. The other scores were computed using the methods described in the original papers. Among them, the immune cell infiltration levels required by CD8Treg and B_cells were calculated using the Kassandra algorithm, as it demonstrated superior performance to other commonly used deconvolution methods34. The two TIP scores were computed as the averaged expression of genes in the signatures. The CS Polarity was computed as the ratio of the genes CXCL9 and SPP1. The IMPRES score was the summation of the indicator function values comparing each immune checkpoint set in the IMPRES signature. The TGFβ score was calculated by the single sample Geneset Enrichment Analysis (ssGSEA) method110. All the other scores were computed as the geometric means of expression in the gene signatures. Scores exceeding 1.5 times the interquartile range were considered outliers and were removed before the analyses. In order to facilitate score comparison and visualization, we normalized each score by min-max normalization. Scores were used as continuous variables in Cox-PH regression for merged cohorts and in the analysis within responder and non-responder subgroups. However, for the survival analysis in the entire individual cancer cohorts and TCGA sub-cohorts, samples were divided into groups based on quantiles. For smaller TCGA individual cancer datasets with sample sizes smaller than 100, we divided samples into only H and L groups by the median score. 
For all the other cohorts, including larger TCGA sub-cohorts and individual ICI-treated cohorts, samples were divided into H (score ≥ 75%), L (score ≤ 25%), and M (25% < score < 75%) groups.
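A minimal sketch of the simpler score definitions and the post-processing described above is given below, using only the Python standard library. It is not the study's code. The helper names are hypothetical, the ratio direction (CXCL9 over SPP1, CD8A over C1QA) follows the order in which the genes are named in the text, the outlier rule is interpreted as Tukey fences at 1.5 times the interquartile range beyond the quartiles, and quartiles use the "inclusive" method; each of these is an assumption.

```python
import statistics


def gene_ratio(expr, numerator, denominator):
    """Ratio score, e.g. CS Polarity (CXCL9/SPP1) or SIA (CD8A/C1QA)."""
    return expr[numerator] / expr[denominator]


def mean_signature(expr, genes):
    """Averaged expression of a signature, as described for the TIP scores."""
    return statistics.fmean(expr[g] for g in genes)


def geometric_mean_signature(expr, genes):
    """Geometric mean of a signature; expression values must be positive."""
    return statistics.geometric_mean(expr[g] for g in genes)


def remove_outliers(scores):
    """Drop scores outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR] (assumed rule)."""
    q1, _, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    iqr = q3 - q1
    return [s for s in scores if q1 - 1.5 * iqr <= s <= q3 + 1.5 * iqr]


def min_max(scores):
    """Rescale scores to the [0, 1] interval."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def assign_groups(scores, median_split=False):
    """Label each score H, M, or L by quartiles, or H/L by the median."""
    if median_split:
        # Used in the text for TCGA sub-cohorts with fewer than 100 samples;
        # assigning ties at the median to H is an assumption.
        med = statistics.median(scores)
        return ["H" if s >= med else "L" for s in scores]
    q1, _, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    return ["H" if s >= q3 else "L" if s <= q1 else "M" for s in scores]
```

Only the H and L groups enter the log-rank comparison described in the Statistical testing section; the M group is set aside.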
Statistical testing
The Kaplan-Meier curves were used to estimate the survival outcomes between groups, and log-rank tests were implemented to compare the survival outcomes between H and L groups. To compare the score distributions of responders versus non-responders, we conducted the Mann-Whitney U test. Additionally, we conducted the univariate and multivariate Cox-PH regression with l1 ratio 0.9 and penalizer 0.0001 to measure the hazard ratios, 90% confidence intervals, and their corresponding p-values for each score. In multivariate Cox-PH models, categorical variables with more than two levels were encoded by frequency encoding, and categorical variables with only two levels were encoded by binary encoding. For univariate Cox-PH regression within responders and non-responders, we only considered cohorts with sample size greater than 30 in both subgroups. To determine clinical covariates in the multivariate Cox-PH model, we applied Fisher’s exact tests to assess survival rate differences across categorical covariates, and we conducted Mann-Whitney U tests for continuous covariates. We conducted ROC analysis to determine the AUC of each score, where responders were binary encoded as 1 and non-responders as 0.
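To make the H versus L comparison concrete, the sketch below implements the standard two-group log-rank test in plain Python. It is an illustrative implementation, not the routine used in this study; the function name and argument layout are assumptions. The p-value uses the chi-square distribution with one degree of freedom, whose upper tail can be written with the complementary error function.

```python
import math


def logrank_test(times_h, events_h, times_l, events_l):
    """Two-group log-rank test; returns (chi-square statistic, p-value).

    times_*  : follow-up times for the H and L groups
    events_* : 1 if the event was observed, 0 if censored
    """
    times_h, times_l = list(times_h), list(times_l)
    events_h, events_l = list(events_h), list(events_l)
    pooled = zip(times_h + times_l, events_h + events_l)
    event_times = sorted({t for t, e in pooled if e})

    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        # Patients still at risk just before time t in each group.
        n_h = sum(1 for x in times_h if x >= t)
        n_l = sum(1 for x in times_l if x >= t)
        # Events occurring exactly at time t.
        d_h = sum(1 for x, e in zip(times_h, events_h) if x == t and e)
        d_l = sum(1 for x, e in zip(times_l, events_l) if x == t and e)
        n, d = n_h + n_l, d_h + d_l
        observed_minus_expected += d_h - d * n_h / n
        if n > 1:
            # Hypergeometric variance of the event count in group H.
            variance += d * (n_h / n) * (n_l / n) * (n - d) / (n - 1)
    if variance == 0.0:
        raise ValueError("variance is zero; the groups cannot be compared")
    chi2 = observed_minus_expected ** 2 / variance
    # Upper tail of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value
```

Under the significance level adopted in this study, a returned p-value below 0.1 would be read as a significant survival difference between the two groups.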
In ROC analysis, higher scores indicate a greater likelihood of response, and the continuous scores are used to discriminate between responders and non-responders by varying the threshold. Given that TME scores are associated with ICI response in different directions, we applied transformations to negatively correlated scores to facilitate interpretability and comparability. For scores such as TIDE, IS_stromal, and TGFβ, which have been identified as negative predictors to ICI response, we transformed the score value by multiplying by negative one (i.e., − 1 × score) to account for the opposite directionality of association. After this adjustment, a higher AUC ideally indicates better predictive ability. It should be noted that this transformation was applied only for the ROC analysis. In other parts of this meta-analysis, we used the raw values of these scores. Importantly, the AUC transformation strictly follows the original publication of each TME score rather than the direction implied by each validation cohort. For TME scores developed to predict ICI response, we used their original reported direction of association. For scores developed for survival prognosis, we assumed that patients with better survival tend to be more responsive to ICI therapy. Accordingly, if a high (or low) score is associated with better survival outcome, we treat the score as positively (or negatively) correlated with ICI response. Therefore, there are cases in which a score may appear negatively correlated with ICI response in a particular validation dataset, yet no transformation is applied, leading to an AUC less than 0.5. For instance, in the original publication of the CD8Treg score, Sato et al. reported that a higher CD8+ T cells and regulatory T cell ratio was associated with favorable survival35, so we adopted this positive association and did not apply any transformation. 
However, the positive correlation between CD8Treg and ICI response was not consistent across different validation cohorts. In UNC-108, a negative correlation was shown (Supplementary Table 2, Spearman correlation = −0.1909) and hence resulted in an AUC less than 0.5 (Fig. 5f).
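The direction handling described above can be illustrated with a short sketch. The AUC is computed here as the fraction of responder and non-responder pairs in which the responder has the higher score (ties count as half), which is equivalent to the Mann-Whitney U statistic divided by the number of such pairs. The function names are hypothetical, and the set of negative predictors contains only the three scores named in the text as examples, so it may be incomplete.

```python
# Scores named in the text as negative predictors of ICI response.
# The text says "such as", so this set may be incomplete (assumption).
NEGATIVE_PREDICTORS = {"TIDE", "IS_stromal", "TGFβ"}


def auc(labels, scores):
    """AUC as P(responder score > non-responder score), ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def directed_auc(score_name, labels, scores):
    """Flip the sign of published negative predictors before computing AUC.

    The direction comes from the original publication of each score,
    not from the validation cohort being analyzed.
    """
    sign = -1.0 if score_name in NEGATIVE_PREDICTORS else 1.0
    return auc(labels, [sign * s for s in scores])
```

Because negating a score reverses every pairwise comparison, the AUC of the negated score equals one minus the AUC of the raw score. A score whose published direction disagrees with a given validation cohort, as with CD8Treg in UNC-108, therefore yields an AUC below 0.5.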
The enrichment analysis on all gene-based scores was performed using over-representation analysis implemented in the gseapy package with default settings111. We showed the top 20 significantly enriched pathways for gene-based scores in Supplementary Data 4. If the total number of significant pathways was less than 20, we showed all significant pathways. For responder-specific enrichment, preranked geneset enrichment analysis was conducted in gseapy using log fold-change values between responders and non-responders with 100 permutations. We only conducted enrichment analysis for responders in cohorts with full genome data: MERGED-ICI, MELANOMA-I, UNC-108, UC-II, NSCLC-II, and ccRCC. We considered enrichment terms with FDR less than 0.01 as significantly enriched. Similarly, we only included the top 20 significantly enriched pathways in Supplementary Table 4 if more than 20 pathways were significant. All p-values in this meta-analysis were rounded to four decimal places. Except for the enrichment analysis, statistical tests with p-values < 0.1 were considered statistically significant.
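For readers unfamiliar with over-representation analysis, the sketch below shows the hypergeometric upper-tail probability on which such analyses are commonly based. It is a conceptual illustration only and does not reproduce gseapy's implementation, its gene-set libraries, or its multiple-testing adjustment; the function name and arguments are hypothetical.

```python
from math import comb


def ora_p_value(n_background, n_pathway, n_signature, n_overlap):
    """Probability of observing at least n_overlap pathway genes by chance.

    n_background : number of genes in the background set
    n_pathway    : number of background genes annotated to the pathway
    n_signature  : number of genes in the score's signature
    n_overlap    : number of signature genes annotated to the pathway
    """
    total = comb(n_background, n_signature)
    upper = min(n_pathway, n_signature)
    # Sum the hypergeometric probabilities for overlaps >= n_overlap.
    tail = sum(
        comb(n_pathway, k) * comb(n_background - n_pathway, n_signature - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total
```

For example, `round(ora_p_value(20000, 150, 12, 4), 4)` would report the probability to four decimal places, matching the rounding convention stated above. A value that rounds to 0 is simply smaller than 0.00005.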
Acknowledgements
Research reported in this publication was partially supported by the National Institute of General Medical Sciences of the National Institutes of Health under Award Number R35GM159993. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
Author contributions
Q.Z. performed the analyses and wrote the codes and the paper; A.K. improved documentation consistency within the codes; and L.S. supervised the project; all authors reviewed and edited the manuscript before submission.
Data availability
For the MELANOMA-I cohort, data from Hugo et al. is available in GSE78220 under GEO database112. Data from Van Allen et al. and PUCH are available in GitHub https://github.com/vanallenlab/VanAllen_CTLA4_Science_RNASeq_TPM and https://github.com/xmuyulab/ims_gene_signature/tree/main, respectively. Data from Snyder et al. is available at cBioPortal under Melanoma (MSK, NEJM 2014). Regarding the individual data cohorts, UC-II, MELANOMA-II, CLB-IHN, NIVOBIO are available in the GEO database under GSE176307, GSE215868, GSE159067, and GSE161537, respectively. The other data are available in the supplementary materials from their original publications. The TCGA cancer genomic data and clinical data can be downloaded at Firehose113.
Code availability
The source code for reproducing the results is available in the GitHub repository https://github.com/qiluzhou/TMEscore_review.
Competing interests
The authors declare no competing interests.
Footnotes
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
The online version contains supplementary material available at 10.1038/s41698-025-01221-z.
References
- 1. Bagchi, S., Yuan, R. & Engleman, E. G. Immune checkpoint inhibitors for the treatment of cancer: Clinical impact and mechanisms of response and resistance. Annu. Rev. Pathol.: Mechanisms Dis. 16, 223–249 (2021).
- 2. Fessas, P., Lee, H., Ikemizu, S. & Janowitz, T. A molecular and preclinical comparison of the pd-1–targeted t-cell checkpoint inhibitors nivolumab and pembrolizumab. Semin. Oncol. 44, 136–140 (2017).
- 3. Banday, A. & Abdalla, M. Immune checkpoint inhibitors: Recent clinical advances and future prospects. Curr. Med. Chem. 30, 3215–3237 (2022).
- 4. Jenkins, R. W., Barbie, D. A. & Flaherty, K. T. Mechanisms of resistance to immune checkpoint inhibitors. Br. J. Cancer 118, 9–16 (2018).
- 5. Topalian, S. L., Taube, J. M., Anders, R. A. & Pardoll, D. M. Mechanism-driven biomarkers to guide immune checkpoint blockade in cancer therapy. Nat. Rev. Cancer 16, 275–287 (2016).
- 6. Schoenfeld, A. J. & Hellmann, M. D. Acquired resistance to immune checkpoint inhibitors. Cancer cell 37, 443–455 (2020).
- 7. Johnson, D. B., Nebhan, C. A., Moslehi, J. J. & Balko, J. M. Immune-checkpoint inhibitors: long-term implications of toxicity. Nat. Rev. Clin. Oncol. 19, 254–267 (2022).
- 8. Sharma, P., Hu-Lieskovan, S., Wargo, J. A. & Ribas, A. Primary, adaptive, and acquired resistance to cancer immunotherapy. Cell 168, 707–723 (2017).
- 9. Havel, J. J., Chowell, D. & Chan, T. A. The evolving landscape of biomarkers for checkpoint inhibitor immunotherapy. Nat. Rev. Cancer 19, 133–150 (2019).
- 10. Litchfield, K. et al. Meta-analysis of tumor-and t cell-intrinsic mechanisms of sensitization to checkpoint inhibition. Cell 184, 596–614 (2021).
- 11. Addeo, A., Friedlaender, A., Banna, G. L. & Weiss, G. J. Tmb or not tmb as a biomarker: That is the question. Crit. Rev. Oncol./Hematol. 163, 103374 (2021).
- 12. McGrail, D. et al. High tumor mutation burden fails to predict immune checkpoint blockade response across all cancer types. Ann. Oncol. 32, 661–672 (2021).
- 13. Strickler, J. H., Hanks, B. A. & Khasraw, M. Tumor mutational burden as a predictor of immunotherapy response: Is more always better? Clin. Cancer Res. 27, 1236–1241 (2021).
- 14. Lei, Y., Li, X., Huang, Q., Zheng, X. & Liu, M. Progress and challenges of predictive biomarkers for immune checkpoint blockade. Front. Oncol. 11, 617335 (2021).
- 15. Hinshaw, D. C. & Shevde, L. A. The Tumor Microenvironment Innately Modulates Cancer Progression. Cancer Res. 79, 4557–4566 (2019).
- 16. Jiang, P. et al. Signatures of T cell dysfunction and exclusion predict cancer immunotherapy response. Nat. Med. 24, 1550–1558 (2018).
- 17. Miao, Y.-R. et al. Immucellai: a unique method for comprehensive t-cell subsets abundance prediction and its application in cancer immunotherapy. Adv. Sci. 7, 1902880 (2020).
- 18. Sammut, S.-J. et al. Multi-omic machine learning predictor of breast cancer therapy response. Nature 601, 623–629 (2022).
- 19. Lapuente-Santana, Ó., van Genderen, M., Hilbers, P. A., Finotello, F. & Eduati, F. Interpretable systems biomarkers predict response to immune-checkpoint inhibitors. Patterns 2, 100293 (2021).
- 20. Mezheyeuski, A. et al. An immune score reflecting pro- and anti-tumoural balance of tumour microenvironment has major prognostic impact and predicts immunotherapy response in solid cancers. EBioMedicine 88, 104452 (2023).
- 21. Zeng, Z. et al. Immune and stromal scoring system associated with tumor microenvironment and prognosis: a gene-based multi-cancer analysis. J. Transl. Med. 19, 330 (2021).
- 22. Kong, J. et al. Network-based machine learning approach to predict immunotherapy response in cancer patients. Nat. Commun. 13, 3703 (2022).
- 23. Trebeschi, S. et al. Predicting response to cancer immunotherapy using noninvasive radiomic biomarkers. Ann. Oncol. 30, 998–1004 (2019).
- 24. Petitprez, F. et al. Quantitative analyses of the tumor microenvironment composition and orientation in the era of precision medicine. Front. Oncol. 8, 390 (2018).
- 25. O’Donnell, J. S., Long, G. V., Scolyer, R. A., Teng, M. W. & Smyth, M. J. Resistance to pd1/pdl1 checkpoint inhibition. Cancer Treat. Rev. 52, 71–81 (2017).
- 26. Farber, D. L., Yudanin, N. A. & Restifo, N. P. Human memory t cells: generation, compartmentalization and homeostasis. Nat. Rev. Immunol. 14, 24–35 (2014).
- 27. Ribas, A. et al. Pd-1 blockade expands intratumoral memory t cells. Cancer Immunol. Res. 4, 194–203 (2016).
- 28. Tran Janco, J. M., Lamichhane, P., Karyampudi, L. & Knutson, K. L. Tumor-infiltrating dendritic cells in cancer pathogenesis. J. Immunol. 194, 2985–2991 (2015).
- 29. Wouters, M. C. A. & Nelson, B. H. Prognostic Significance of Tumor-Infiltrating B Cells and Plasma Cells in Human Cancer. Clin. Cancer Res. 24, 6125–6135 (2018).
- 30. Cassetta, L. & Kitamura, T. Targeting tumor-associated macrophages as a potential strategy to enhance the response to immune checkpoint inhibitors. Front. cell developmental Biol. 6, 38 (2018).
- 31. Lee, Y. S. & Radford, K. J. Chapter three - the role of dendritic cells in cancer. In Lhuillier, C. & Galluzzi, L. (eds.) Immunobiology of Dendritic Cells Part A, vol. 348 of International Review of Cell and Molecular Biology, 123–178 (Academic Press, 2019).
- 32. Nersesian, S. et al. NK cell infiltration is associated with improved overall survival in solid cancers: A systematic review and meta-analysis. Transl. Oncol. 14, 100930 (2021).
- 33. Newman, A. M. et al. Determining cell type abundance and expression from bulk tissues with digital cytometry. Nat. Biotechnol. 37, 773–782 (2019).
- 34. Zaitsev, A. et al. Precise reconstruction of the tme using bulk RNA-seq and a machine learning algorithm trained on artificial transcriptomes. Cancer Cell 40, 879–894 (2022).
- 35. Sato, E. et al. Intraepithelial CD8+ tumor-infiltrating lymphocytes and a high cd8+/regulatory t cell ratio are associated with favorable prognosis in ovarian cancer. Proc. Natl Acad. Sci. 102, 18538–18543 (2005).
- 36. Chang, T.-G. et al. Tumor and blood B-cell abundance outperforms established immune checkpoint blockade response prediction signatures in head and neck cancer. Ann. Oncol. 36, 309–320 (2025).
- 37. Wang, H. et al. Tumor immunological phenotype signature-based high-throughput screening for the discovery of combination immunotherapy compounds. Sci. Adv. 7, eabd7851 (2021).
- 38. Rooney, M. S., Shukla, S. A., Wu, C. J., Getz, G. & Hacohen, N. Molecular and genetic properties of tumors associated with local immune cytolytic activity. Cell 160, 48–61 (2015).
- 39. Auslander, N. et al. Robust prediction of response to immune checkpoint blockade therapy in metastatic melanoma. Nat. Med. 24, 1545–1549 (2018).
- 40. Damotte, D. et al. The tumor inflammation signature (TIS) is associated with anti-PD-1 treatment benefit in the Certim pan-cancer cohort. J. Transl. Med. 17, 1–10 (2019).
- 41. Cabrita, R. et al. Tertiary lymphoid structures improve immunotherapy and survival in melanoma. Nature 577, 561–565 (2020).
- 42. Ayers, M. et al. Ifn-γ–related mRNA profile predicts clinical response to PD-1 blockade. J. Clin. Investig. 127, 2930–2940 (2017).
- 43. Ni, Y. et al. High tgf-β signature predicts immunotherapy resistance in gynecologic cancer patients treated with immune checkpoint inhibition. NPJ Precis. Oncol. 5, 101 (2021).
- 44. Page, L., Brin, S., Motwani, R. & Winograd, T. The pagerank citation ranking: Bringing order to the web. Technical Report 1999-66, Stanford InfoLab http://ilpubs.stanford.edu:8090/422/ (1999).
- 45. Bill, R. et al. Cxcl9: Spp1 macrophage polarity identifies a network of cellular programs that control human cancers. Science 381, 515–524 (2023).
- 46. Jiang, P. et al. Big data in basic and translational cancer research. Nat. Rev. Cancer 22, 625–639 (2022).
- 47. Snyder, A. et al. Genetic basis for clinical response to CTLA-4 blockade in melanoma. N. Engl. J. Med. 371, 2189–2199 (2014).
- 48. Van Allen, E. M. et al. Genomic correlates of response to CTLA-4 blockade in metastatic melanoma. Science 350, 207–211 (2015).
- 49. Hugo, W. et al. Genomic and transcriptomic features of response to anti-PD-1 therapy in metastatic melanoma. Cell 165, 35–44 (2016).
- 50. Cui, C. et al. Ratio of the interferon-γ signature to the immunosuppression signature predicts anti-PD-1 therapy response in melanoma. NPJ Genom. Med. 6, 7 (2021).
- 51. Braun, D. A. et al. Interplay of somatic alterations and immune infiltration modulates response to PD-1 blockade in advanced clear cell renal cell carcinoma. Nat. Med. 26, 909–918 (2020).
- 52. Damrauer, J. S. et al. Collaborative study from the bladder cancer advocacy network for the genomic analysis of metastatic urothelial cancer. Nat. Commun. 13, 6658 (2022).
- 53. Rose, T. L. et al. Fibroblast growth factor receptor 3 alterations and response to immune checkpoint inhibition in metastatic urothelial cancer: a real world experience. Br. J. Cancer 125, 1251–1260 (2021).
- 54. Ravi, A. et al. Genomic and transcriptomic analysis of checkpoint blockade response in advanced non-small cell lung cancer. Nat. Genet. 55, 807–819 (2023).
- 55. Yang, H. et al. Identification of lactylation related model to predict prognostic, tumor infiltrating immunocytes and response of immunotherapy in gastric cancer. Front. Immunol. 14, 1149989 (2023).
- 56. Weinstein, J. N. et al. The Cancer Genome Atlas Pan-Cancer analysis project. Nat. Genet. 45, 1113–1120 (2013).
- 57. Leek, J. T., Johnson, W. E., Parker, H. S., Jaffe, A. E. & Storey, J. D. The sva package for removing batch effects and other unwanted variation in high-throughput experiments. Bioinformatics 28, 882–883 (2012).
- 58. Korsunsky, I. et al. Fast, sensitive and accurate integration of single-cell data with harmony. Nat. methods 16, 1289–1296 (2019).
- 59. Fei, T. & Yu, T. scbatch: batch-effect correction of RNA-seq data through sample distance matrix adjustment. Bioinformatics 36, 3115–3123 (2020).
- 60. Vathiotis, I. A. et al. Baseline gene expression profiling determines long-term benefit to programmed cell death protein 1 axis blockade. NPJ Precis. Oncol. 6, 92 (2022).
- 61. Foy, J.-P. et al. Datasets for gene expression profiles of head and neck squamous cell carcinoma and lung cancer treated or not by pd1/pd-l1 inhibitors. Data Brief. 44, 108556 (2022).
- 62. Harrell, F. E., Califf, R. M., Pryor, D. B., Lee, K. L. & Rosati, R. A. Evaluating the Yield of Medical Tests. JAMA 247, 2543–2546 (1982).
- 63. Hong, M. et al. Rna sequencing: new technologies and applications in cancer research. J. Hematol. Oncol. 13, 1–16 (2020).
- 64. Zhang, N. et al. Machine learning-based identification of tumor-infiltrating immune cell-associated lncrnas for improving outcomes and immunotherapy responses in patients with low-grade glioma. Theranostics 12, 5931 (2022).
- 65. Li, T. et al. Artificial intelligence in cancer immunotherapy: applications in neoantigen recognition, antibody design and immunotherapy response prediction. In Seminars in Cancer Biology, vol. 91, 50–69 (Elsevier, 2023).
- 66. Addala, V. et al. Computational immunogenomic approaches to predict response to cancer immunotherapies. Nat. Rev. Clin. Oncol. 21, 28–46 (2024).
- 67. Liu, Y. et al. Predicting patient outcomes after treatment with immune checkpoint blockade: A review of biomarkers derived from diverse data modalities. Cell Genomics 4, 100444 (2024).
- 68. Dorjkhorloo, G. et al. Prognostic value of a modified-immune scoring system in patients with pathological t4 colorectal cancer. Oncol. Lett. 27, 104 (2024).
- 69. Li, X. et al. A muti-modal feature fusion method based on deep learning for predicting immunotherapy response. J. Theor. Biol. 586, 111816 (2024).
- 70. Wherry, E. J. & Kurachi, M. Molecular and cellular insights into t cell exhaustion. Nat. Rev. Immunol. 15, 486–499 (2015).
- 71. Song, C. et al. A prognostic nomogram combining immune-related gene signature and clinical factors predicts survival in patients with lung adenocarcinoma. Front. Oncol. 10, 1300 (2020).
- 72. Backman, M. et al. Spatial immunophenotyping of the tumour microenvironment in non–small cell lung cancer. Eur. J. Cancer 185, 40–52 (2023).
- 73. Yuan, J. & Yu, S. Comprehensive analysis reveals prognostic and therapeutic immunity-related biomarkers for pediatric metastatic osteosarcoma. Medicina 60, 95 (2024).
- 74. Pan, J.-W. et al. The molecular landscape of asian breast cancers reveals clinically relevant population-specific differences. Nat. Commun. 11, 6433 (2020).
- 75. Vos, J. L. et al. Nivolumab plus ipilimumab in advanced salivary gland cancer: a phase 2 trial. Nat. Med. 29, 3077–3089 (2023).
- 76. Jin, Y. et al. A phase II clinical trial of toripalimab in advanced solid tumors with polymerase epsilon/polymerase delta (pole/pold1) mutation. Signal Transduct. Target. Ther. 9, 227 (2024).
- 77. Cheng, B. et al. The activity and immune dynamics of PD-1 inhibition on high-risk pulmonary ground glass opacity lesions: insights from a single-arm, phase ii trial. Signal Transduct. Target. Ther. 9, 93 (2024).
- 78. Maleki Vareki, S. High and low mutational burden tumors versus immunologically hot and cold tumors and response to immune checkpoint inhibitors. J. Immunother. cancer 6, 157 (2018).
- 79. Hodgson, A., Liu, S. K., Vesprini, D., Xu, B. & Downes, M. R. Basal-subtype bladder tumours show a ‘hot’ immunophenotype. Histopathology 73, 748–757 (2018).
- 80. Karin, N. Chemokines in the landscape of cancer immunotherapy: how they and their receptors can be used to turn cold tumors into hot ones? Cancers 13, 6317 (2021).
- 81. Prendergast, G. C., Mondal, A., Dey, S., Laury-Kleintop, L. D. & Muller, A. J. Inflammatory reprogramming with ido1 inhibitors: turning immunologically unresponsive ‘cold’ tumors ‘hot’. Trends Cancer 4, 38–58 (2018).
- 82. Liu, S. X., Gustafson, H. H., Jackson, D. L., Pun, S. H. & Trapnell, C. Trajectory analysis quantifies transcriptional plasticity during macrophage polarization. Sci. Rep. 10, 12273 (2020).
- 83. Bonaventura, P. et al. Cold tumors: a therapeutic challenge for immunotherapy. Front. Immunol. 10, 168 (2019).
- 84. Kobayashi, S. D., Voyich, J. M., Burlak, C. & DeLeo, F. R. Neutrophils in the innate immune response. Archivum Immunol. Et. Therapiae Experimentalis-Engl. Ed. 53, 505 (2005).
- 85. Wu, J. W. et al. T-cell receptor therapy in the treatment of ovarian cancer: A mini review. Front. Immunol. 12, 672502 (2021).
- 86. Liu, Y.-T. & Sun, Z.-J. Turning cold tumors into hot tumors by improving t-cell infiltration. Theranostics 11, 5365 (2021).
- 87. Chen, Y., Wu, Y., Yan, G. & Zhang, G. Tertiary lymphoid structures in cancer: maturation and induction. Front. Immunol. 15, 1369626 (2024).
- 88. McKean, W. B., Moser, J. C., Rimm, D. & Hu-Lieskovan, S. Biomarkers in precision cancer immunotherapy: Promise and challenges. In American Society of Clinical Oncology Educational book. American Society of Clinical Oncology. Annual Meeting, vol. 40, e275–e291 (2020).
- 89. Ratnam, N. M., Frederico, S. C., Gonzalez, J. A. & Gilbert, M. R. Clinical correlates for immune checkpoint therapy: significance for CNS malignancies. Neuro-Oncol. Adv. 3, vdaa161 (2021).
- 90. Monjaras-Avila, C. U. et al. The tumor immune microenvironment in clear cell renal cell carcinoma. Int. J. Mol. Sci. 24, 7946 (2023).
- 91. Prelaj, A. et al. Artificial intelligence for predictive biomarker discovery in immuno-oncology: a systematic review. Ann. Oncol. 35, 29–65 (2023).
- 92. Saberi-Movahed, F. et al. Dual regularized unsupervised feature selection based on matrix factorization and minimum redundancy with application in gene selection. Knowl.-Based Syst. 256, 109884 (2022).
- 93. Townes, F. W. & Engelhardt, B. E. Nonnegative spatial factorization applied to spatial genomics. Nat. methods 20, 229–238 (2023).
- 94. Yin, A., Moes, D. J. A., van Hasselt, J. G., Swen, J. J. & Guchelaar, H.-J. A review of mathematical models for tumor dynamics and treatment resistance evolution of solid tumors. CPT: Pharmacomet. Syst. Pharmacol. 8, 720–737 (2019).
- 95. Sofia, D., Zhou, Q. & Shahriyari, L. Mathematical and machine learning models of renal cell carcinoma: A review. Bioengineering 10, 1320 (2023).
- 96. Budithi, A., Su, S., Kirshtein, A. & Shahriyari, L. Data driven mathematical model of folfiri treatment for colon cancer. Cancers 13, 2632 (2021).
- 97. Mohammad Mirzaei, N. et al. A Mathematical Model of Breast Tumor Progression Based on Immune Infiltration. J. Personalized Med. 11, 1031 (2021).
- 98. Mirzaei, N. M. et al. A PDE model of breast tumor progression in MMTV-Pymt mice. J. Personalized Med. 12, 807 (2022).
- 99. Mirzaei, N. M., Hao, W. & Shahriyari, L. Investigating the spatial interaction of immune cells in colon cancer. iScience 26, 106596 (2023).
- 100. Hu, Y., Mirzaei, N. M. & Shahriyari, L. Bio-mechanical model of osteosarcoma tumor microenvironment: A porous media approach. Cancers 14, 6143 (2022).
- 101. Jackson, C. M., Choi, J. & Lim, M. Mechanisms of immunotherapy resistance: lessons from glioblastoma. Nat. Immunol. 20, 1100–1109 (2019).
- 102. Bear, A. S., Vonderheide, R. H. & O’Hara, M. H. Challenges and opportunities for pancreatic cancer immunotherapy. Cancer cell 38, 788–802 (2020).
- 103. Cha, H.-R., Lee, J. H. & Ponnazhagan, S. Revisiting immunotherapy: a focus on prostate cancer. Cancer Res. 80, 1615–1623 (2020).
- 104. Rosellini, M. et al. Prognostic and predictive biomarkers for immunotherapy in advanced renal cell carcinoma. Nat. Rev. Urol. 20, 133–157 (2023).
- 105. Kerekes, D. M. et al. Immunotherapy initiation at the end of life in patients with metastatic cancer in the us. JAMA Oncol. 10, 342–351 (2024).
- 106. Yoo, S.-K. et al. Prediction of checkpoint inhibitor immunotherapy efficacy for cancer using routine blood tests and clinical data. Nat. Med. 31, 869–880 (2025).
- 107. Wang, X. Q. et al. Spatial predictors of immunotherapy response in triple-negative breast cancer. Nature 621, 868–876 (2023).
- 108. Niknafs, N. et al. Persistent mutation burden drives sustained anti-tumor immune responses. Nat. Med. 29, 440–449 (2023).
- 109. Behdenna, A. et al. pycombat, a python tool for batch effects correction in high-throughput molecular data using empirical Bayes methods. BMC Bioinforma. 24, 459 (2023).
- 110. Barbie, D. A. et al. Systematic RNA interference reveals that oncogenic KRAS-driven cancers require TBK1. Nature 462, 108–112 (2009).
- 111. Fang, Z., Liu, X. & Peltz, G. Gseapy: a comprehensive package for performing gene set enrichment analysis in python. Bioinformatics 39, btac757 (2023).
- 112. Barrett, T. et al. NCBI GEO: archive for functional genomics data sets–update. Nucleic Acids Res. 41, D991–995 (2013).
- 113. Deng, M., Brägelmann, J., Kryukov, I., Saraiva-Agostinho, N. & Perner, S. FirebrowseR: an R client to the Broad Institute’s Firehose Pipeline. Database 2017, baw160 (2017).