Automated quantification of tumor-infiltrating lymphocytes by machine learning reveals prognostic and immunogenomic features in lung cancer

Ang Li; Yutao Pang; Hongfei Zhang; Dong Wu; Liyao Lin; Zhan He; Zhu Liang; Jie Chen; Fasheng Li

doi:10.1038/s41598-026-37076-y

Published 2026 Feb 2; 16:7006.


Corresponding authors: Jie Chen; Fasheng Li

PMCID: PMC12920713 PMID: 41629429

Abstract

Tumor-infiltrating lymphocytes (TILs) are key components of the tumor microenvironment (TME) and are recognized as prognostic and predictive biomarkers in non-small cell lung cancer (NSCLC). However, manual TIL assessment on hematoxylin and eosin (H&E)-stained slides is subjective and poorly reproducible. This study aimed to develop and validate an automated, machine learning–based framework for TIL quantification and to explore its associations with immunogenomic features and patient outcomes. H&E-stained slides and transcriptomic, genomic, and clinical data from lung adenocarcinoma patients were retrieved from The Cancer Genome Atlas (TCGA). An automated TIL quantification pipeline was built in QuPath (v0.5.1) with stain normalization, watershed cell segmentation, and a supervised cell classifier to identify tumor cells, stromal cells, and TILs. In a separate step, a random forest model based on aggregated Haralick texture features and tumor stage was trained to classify patients into high- and low-TIL subgroups. TIL density cut-offs were defined by maximally selected rank statistics. Survival was analyzed using the Kaplan–Meier method and Cox regression. ssGSEA, ESTIMATE, GSVA, and WGCNA were applied to characterize immune infiltration and transcriptomic modules. Somatic mutations were compared between groups, and drug sensitivity was predicted using GDSC-derived ridge regression models. Model performance was evaluated by 10-fold cross-validation with SMOTE oversampling. Automated quantification showed moderate concordance with manual pathologist counts and RNA-seq inference. An optimal TIL cut-off of 135 cells/mm² was used to stratify patients into high- and low-density groups. High-TIL tumors were enriched for adaptive immune infiltration, antigen presentation, and TCR signaling, and exhibited greater mutational diversity, whereas low-TIL tumors were enriched in ribosome biogenesis and protein translation pathways. Prognostically, high TIL density was associated with improved overall survival (HR = 0.48, 95% CI 0.29–0.79; P = 0.004). Predicted IC50 values did not differ for standard chemotherapies but differed for selected compounds. The Haralick-based classification model achieved an AUC of 0.87 (95% CI 0.835–0.901) in internal cross-validation, which improved to 0.892 (95% CI 0.848–0.913) when tumor stage was incorporated. This study demonstrated that automated TIL quantification is feasible and prognostically relevant in lung cancer and may provide a hypothesis-generating marker of immune activation for future immunotherapy studies; however, direct validation in immunotherapy-treated cohorts is required before clinical implementation.

Supplementary Information

The online version contains supplementary material available at 10.1038/s41598-026-37076-y.

Keywords: Tumor-infiltrating lymphocytes, Lung adenocarcinoma, Machine learning, Digital pathology, Immune microenvironment

Subject terms: Biomarkers, Cancer, Computational biology and bioinformatics, Oncology

Introduction

Lung cancer remains one of the most common malignancies worldwide and continues to be the leading cause of cancer-related death¹. Over the past decade, increasing evidence has underscored the pivotal role of the tumor microenvironment (TME) in the initiation and progression of primary lung cancer. The TME encompasses both cellular and acellular components—such as stromal and immune cells, soluble signaling molecules, and the extracellular matrix—which interact in complex ways to shape tumor growth, metastatic potential, and responsiveness to therapeutic interventions².

Among emerging therapeutic strategies, immune checkpoint blockade targeting the programmed cell death protein 1/programmed death-ligand 1 (PD-1/PD-L1) pathway has reshaped the treatment landscape for advanced non-small cell lung cancer (NSCLC). These agents have achieved durable responses and extended survival in a subset of patients^3–6. Nevertheless, a considerable proportion of patients fail to benefit from these therapies^7–11.

Tumor-infiltrating lymphocytes (TILs) have been identified as both prognostic indicators and predictive biomarkers of immunotherapy efficacy in patients with NSCLC^12,13. At present, the standard approach for evaluating TILs relies on manual assessment of hematoxylin and eosin (H&E)-stained slides by pathologists, a method prone to observer bias and limited reproducibility¹⁴. In recent years, the integration of machine learning and deep learning into digital pathology has gained momentum, particularly for the automated detection and quantification of TILs¹⁵.

Deep learning architectures, including convolutional neural networks (CNNs) and fully convolutional networks (FCNs), excel at image processing and feature extraction¹⁶. These capabilities make them particularly suitable for automated TIL identification and analysis. Building on this progress^17,18, we developed an automated digital pathology pipeline for TIL quantification in H&E slides and a complementary Haralick texture + stage classifier for TIL-based risk stratification in lung adenocarcinoma.

Methods

Patients and pathological materials

The TCGA-LUAD datasets analysed in this study are publicly available via the NCI Genomic Data Commons portal (https://portal.gdc.cancer.gov/)¹⁹. The full list of TCGA case barcodes and image identifiers included in the present analysis is provided in Supplementary Table S1. Processed data matrices and analysis scripts are available from the corresponding author on reasonable request. From an initial set of 1,067 H&E-stained whole-slide images, we excluded normal tissues, nonprimary lung cancers, duplicate sections, and slides with motion artifacts or blur. This filtering yielded 304 high-quality images for subsequent analysis. All analyses used deidentified public data in compliance with the GDC Data Use Policy; institutional review board approval and informed consent were therefore not required.

In addition, we assembled an independent external validation cohort at our institution from patients treated between 2019 and 2020, using the same inclusion and exclusion criteria as for the TCGA dataset. Consecutive patients with pathologically confirmed lung adenocarcinoma who underwent curative-intent surgical resection were screened. Only postoperative H&E-stained whole-slide sections from the resection specimens were included, and patients who had received any preoperative antitumour therapy (including chemotherapy, radiotherapy, targeted therapy, or immunotherapy) were excluded. After excluding slides with insufficient tumour tissue, poor staining quality, or severe scanning artefacts, 93 cases were retained for external validation. The study protocol for the external cohort was approved by the institutional ethics committee, and the requirement for individual informed consent was waived owing to the retrospective design and use of anonymized data.

Quantification of TIL via machine learning

Whole-slide H&E images from resection samples were analyzed via a supervised machine learning workflow in QuPath v0.5.1²⁰. The analysis regions were delineated by two board-certified pathologists to encompass the tumor nests and reactive/desmoplastic stroma, explicitly excluding necrotic or artifact-affected areas; disagreements were resolved by consensus.

Cell detection parameters (including pixel size, background radius, median filter radius, and watershed settings) were initialized from QuPath defaults and then iteratively tuned on 10 representative slides spanning the range of staining intensities. Once a visually satisfactory segmentation agreed upon by two pathologists was achieved, the same parameter set was fixed and applied unchanged to all remaining slides to ensure reproducibility across the cohort.

TIL levels were categorized according to an optimal cut-off identified in the training set using maximally selected rank statistics with Hothorn’s correction, as implemented in the survminer package in R (v4.4.2)^21,22. The training-set cut-off was then locked and applied unchanged to all slides. Cases with a TIL density exactly equal to the cut-off were assigned to the high-TIL group (TIL density ≥ cut-off) to avoid ambiguity in group definition.
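The cut-off search can be illustrated with a minimal Python sketch (the study itself used R's survminer). It scans every observed TIL density as a candidate threshold, assigns densities ≥ the candidate to the high-TIL group as described above, and keeps the threshold with the largest absolute standardized log-rank statistic. Assumptions: the minimum group proportion `min_prop=0.1` is a hypothetical choice, and Hothorn's p-value correction for the many cut-offs examined is not implemented, so the returned statistic cannot be read as a p-value on its own.

```python
import math
from typing import Optional, Sequence, Tuple


def logrank_z(time: Sequence[float], event: Sequence[int],
              high: Sequence[bool]) -> float:
    """Standardized two-group log-rank statistic (observed - expected deaths in `high`)."""
    obs_minus_exp, variance = 0.0, 0.0
    for t in sorted({t for t, e in zip(time, event) if e}):
        at_risk = [i for i, ti in enumerate(time) if ti >= t]
        n = len(at_risk)
        n_high = sum(1 for i in at_risk if high[i])
        d = sum(1 for i in at_risk if time[i] == t and event[i])
        d_high = sum(1 for i in at_risk if time[i] == t and event[i] and high[i])
        obs_minus_exp += d_high - d * n_high / n
        if n > 1:
            # hypergeometric variance of deaths in the high group at time t
            variance += d * (n_high / n) * (1 - n_high / n) * (n - d) / (n - 1)
    return obs_minus_exp / math.sqrt(variance) if variance > 0 else 0.0


def max_selected_cutoff(density: Sequence[float], time: Sequence[float],
                        event: Sequence[int],
                        min_prop: float = 0.1) -> Tuple[Optional[float], float]:
    """Return the cut-off maximizing |log-rank Z|; high group is density >= cut-off."""
    n = len(density)
    best_cut, best_stat = None, -1.0
    for cut in sorted(set(density)):
        high = [x >= cut for x in density]
        n_high = sum(high)
        if min(n_high, n - n_high) < min_prop * n:
            continue  # skip splits that leave either group too small
        stat = abs(logrank_z(time, event, high))
        if stat > best_stat:
            best_cut, best_stat = cut, stat
    return best_cut, best_stat
```

Once chosen on the training set, the threshold is simply reused (`density >= cut`) for every other slide, which is what "locking" the cut-off amounts to.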

Data preprocessing and stratification strategy

Cell detection and Haralick texture feature extraction were performed in QuPath, and the results were exported for downstream analysis²³. TIL, tumor cell, and stromal cell densities (cells/mm²) were computed. Using the caret package, cases were randomly assigned to training and validation cohorts at a 7:3 ratio by patient ID, with stratified sampling to preserve the distribution of TIL density across subsets²⁴. The optimal TIL cut-off derived from the training set was applied to categorize patients into high-TIL or low-TIL groups.
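A sketch of the density calculation and of a patient-level 7:3 split stratified by TIL density. The helper names, the four quantile bins, and the fixed seed are hypothetical; the binning is similar in spirit to caret's percentile-based stratification of numeric outcomes but does not reproduce caret exactly.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def cells_per_mm2(n_cells: int, area_um2: float) -> float:
    """Convert a cell count in an annotated region (area in µm²) to cells/mm²."""
    return n_cells / (area_um2 / 1e6)  # 1 mm² = 1,000,000 µm²


def stratified_split(patient_ids: Sequence[str], til_density: Sequence[float],
                     train_frac: float = 0.7, n_bins: int = 4,
                     seed: int = 1) -> Tuple[List[str], List[str]]:
    """Split patients roughly 7:3 within quantile bins of TIL density."""
    order = sorted(range(len(patient_ids)), key=lambda i: til_density[i])
    bins: Dict[int, List[str]] = defaultdict(list)
    for rank, i in enumerate(order):
        bins[rank * n_bins // len(order)].append(patient_ids[i])
    rng = random.Random(seed)
    train: List[str] = []
    valid: List[str] = []
    for members in bins.values():
        rng.shuffle(members)  # random assignment within each density stratum
        k = round(train_frac * len(members))
        train.extend(members[:k])
        valid.extend(members[k:])
    return train, valid
```

Splitting on patient IDs rather than slides keeps all material from one patient on the same side of the split.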

Descriptive statistical analysis

Descriptive statistics—including the mean, median, interquartile range (IQR), maximum, minimum, Q1, and Q3—were calculated via the dplyr package, with all values reported to one decimal place²⁵.
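For reference, the per-variable summaries can be sketched in Python. `statistics.quantiles(..., method="inclusive")` interpolates quartiles the same way as R's default `quantile()` (type 7); that the study used R's default quantile definition is an assumption.

```python
import statistics
from typing import Dict, Sequence


def describe(values: Sequence[float], digits: int = 1) -> Dict[str, float]:
    """Summary statistics for one variable, rounded for reporting."""
    # method="inclusive" interpolates like R's default quantile (type 7)
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    summary = {
        "mean": statistics.fmean(values),
        "min": min(values),
        "Q1": q1,
        "median": median,
        "Q3": q3,
        "max": max(values),
        "IQR": q3 - q1,
    }
    return {name: round(value, digits) for name, value in summary.items()}
```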

Intraclass correlation analysis (ICC)

Within the training set, tumor cell proportion estimates obtained via automated detection, manual counting, and transcriptomic inference were compared. Intraclass correlation coefficients were computed via the psych package, with 95% confidence intervals provided²⁶.
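The section does not state which ICC form was reported (psych's `ICC()` returns several). As an illustration only, the sketch below computes ICC(2,1), a two-way random-effects, absolute-agreement, single-measure coefficient, treating each slide as a row and each method (automated, manual, RNA-seq) as a column; confidence intervals are omitted.

```python
from typing import Sequence


def icc_2_1(ratings: Sequence[Sequence[float]]) -> float:
    """ICC(2,1): rows are subjects (slides), columns are raters/methods."""
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    # two-way ANOVA sums of squares
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n)
```

On the classic six-subject, four-rater example of Shrout and Fleiss (1979), this returns approximately 0.29; perfectly agreeing methods return 1.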

Survival analysis

The optimal cut-off value for TIL density was determined via the log-rank test²⁷. Kaplan–Meier survival curves were then generated to compare overall survival between the high- and low-TIL groups²⁸. Hazard ratios (HRs) and 95% confidence intervals (CIs) were estimated via univariable Cox proportional hazards models, without the inclusion of additional covariates²⁹.
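The product-limit estimator behind the Kaplan–Meier curves is short enough to sketch directly. This pure-Python version is an illustration, not the R survival code used in the study: it returns the survival probability at each distinct event time, and censored patients leave the risk set without lowering the curve. Cox hazard ratios are not computed here.

```python
from typing import List, Sequence, Tuple


def kaplan_meier(time: Sequence[float], event: Sequence[int]) -> List[Tuple[float, float]]:
    """Kaplan-Meier survival estimate S(t) at each distinct event time."""
    surv = 1.0
    curve = []
    for t in sorted({t for t, e in zip(time, event) if e}):
        at_risk = sum(1 for ti in time if ti >= t)  # still under observation at t
        deaths = sum(1 for ti, e in zip(time, event) if ti == t and e)
        surv *= 1 - deaths / at_risk
        curve.append((t, surv))
    return curve
```

Running it separately on the high- and low-TIL groups gives the two step functions compared in the Kaplan–Meier plots.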

Immune infiltration analysis

Immune cell subset enrichment scores were calculated using single-sample GSEA (ssGSEA) implemented in the gene set variation analysis (GSVA) package³⁰, with gene sets from MSigDB (c2.cp.v2024.1.Hs.symbols.gmt). Differential enrichment was assessed with limma, and visualizations were produced with ggplot2³¹. Spearman correlations among immune cell subsets were plotted as heatmaps with the corrplot package. ESTIMATE scores were calculated with the estimate package, and group differences were tested with the Wilcoxon rank-sum test³².
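To make the ssGSEA score concrete, the sketch below computes an unnormalized single-sample enrichment score in the style of Barbie et al.: genes are ranked by expression within one sample, gene-set hits are weighted by their rank raised to `alpha=0.25`, and the score is the sum of the running difference between the weighted hit and unweighted miss distributions. Tie handling and GSVA's default rescaling of scores across samples are omitted, so the values will not match GSVA output exactly.

```python
from typing import Dict, Iterable


def ssgsea_score(expr: Dict[str, float], gene_set: Iterable[str],
                 alpha: float = 0.25) -> float:
    """Unnormalized ssGSEA enrichment score for one sample."""
    genes = sorted(expr, key=expr.get, reverse=True)  # highest expression first
    n = len(genes)
    # rank weight: top gene gets n, bottom gene gets 1, raised to alpha
    weight = {g: (n - i) ** alpha for i, g in enumerate(genes)}
    members = set(gene_set) & set(genes)
    n_miss = n - len(members)
    if not members or n_miss == 0:
        raise ValueError("gene set must overlap the genes but not contain all of them")
    hit_total = sum(weight[g] for g in members)
    p_hit = p_miss = score = 0.0
    for g in genes:
        if g in members:
            p_hit += weight[g] / hit_total
        else:
            p_miss += 1 / n_miss
        score += p_hit - p_miss  # ssGSEA integrates the running difference
    return score
```

A gene set concentrated among the most highly expressed genes yields a positive score; one concentrated at the bottom yields a negative score.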

Pathway enrichment and coexpression network analysis

Gene ontology enrichment was conducted with GO sets from MSigDB. Coexpression modules were identified via weighted gene coexpression network analysis (WGCNA)³³, and their eigengenes were correlated with the TIL category via Pearson’s method. Modules with FDR p < 0.05 were subjected to functional annotation via clusterProfiler and enrichplot. Protein–protein interaction networks from STRING (https://string-db.org/) were visualized via Cytoscape (https://cytoscape.org/).

Somatic mutation analysis

Somatic mutation profiles from TCGA MAF files were processed with the maftools package³⁴.
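The section does not say which test was used to compare mutation frequencies between TIL groups. One common approach, shown here purely as an assumption, is a per-gene two-sided Fisher's exact test on a 2×2 table of mutated versus non-mutated patients in each group, followed by FDR correction across genes.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows: high-TIL, low-TIL; columns: mutated, not mutated (for one gene).
    """
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x: int) -> float:
        # hypergeometric probability of x mutated cases in row 1, margins fixed
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_obs = prob(a)
    lo, hi = max(0, row1 - (n - col1)), min(row1, col1)
    # sum all tables that are no more probable than the observed one
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)
```

The small relative tolerance keeps tables whose probability equals the observed one (up to rounding) in the two-sided sum.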

Drug sensitivity and immune score comparisons

Drug response was estimated via OncoPredict (ridge regression), comparing IC50 values between TIL groups with nonparametric tests³⁵. The cytolytic (CYT) score was calculated as the geometric mean of PRF1 and GZMA expression³⁶.
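The CYT score is a simple per-sample formula: the geometric mean of two values is the square root of their product. In this sketch the expression scale (e.g., TPM) and the optional `pseudocount` are assumptions, since the section does not state them.

```python
import math


def cyt_score(prf1: float, gzma: float, pseudocount: float = 0.0) -> float:
    """Geometric mean of PRF1 and GZMA expression for one sample."""
    return math.sqrt((prf1 + pseudocount) * (gzma + pseudocount))
```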

Classification model construction and validation

After Haralick feature extraction, a classification model was developed with the randomForest package³⁷. Class imbalance was mitigated with SMOTE and adjusted class weights, both applied within each training fold to avoid information leakage. The top 10 features ranked by the Gini index formed the final model. Tenfold cross-validation was used to assess model performance, reporting AUC, sensitivity, and specificity. ROC curves were generated with the pROC package, and 95% CIs were calculated. In addition, we generated precision–recall (PR) curves and calculated the PR-AUC to account for class imbalance between high- and low-TIL cases. Overall calibration was assessed using Brier scores and bootstrap calibration curves (200 resamples), with predicted probabilities for the high-TIL class on the x-axis and observed event frequencies on the y-axis. Confusion matrices were derived by applying the Youden index to determine the optimal probability threshold for classifying slides as high versus low TIL.
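Three of the metrics named above can be sketched from predicted probabilities and true labels: the ROC AUC (computed here through its equivalence with the Mann–Whitney statistic), the Brier score, and a Youden-optimal threshold. This is an illustrative pure-Python version, not pROC's implementation; classifying probabilities ≥ the threshold as high-TIL and keeping the first threshold on ties are assumptions.

```python
from typing import Optional, Sequence, Tuple


def roc_auc(y: Sequence[int], p: Sequence[float]) -> float:
    """AUC as P(random positive scores above random negative), ties counted as 1/2."""
    pos = [pi for yi, pi in zip(y, p) if yi == 1]
    neg = [pi for yi, pi in zip(y, p) if yi == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def brier_score(y: Sequence[int], p: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and 0/1 outcome."""
    return sum((pi - yi) ** 2 for yi, pi in zip(y, p)) / len(y)


def youden_threshold(y: Sequence[int], p: Sequence[float]) -> Tuple[Optional[float], float]:
    """Threshold maximizing J = sensitivity + specificity - 1 (high-TIL if p >= t)."""
    n_pos = sum(y)
    n_neg = len(y) - n_pos
    best_t, best_j = None, -1.0
    for t in sorted(set(p)):
        tp = sum(1 for yi, pi in zip(y, p) if yi == 1 and pi >= t)
        tn = sum(1 for yi, pi in zip(y, p) if yi == 0 and pi < t)
        j = tp / n_pos + tn / n_neg - 1
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j
```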

All statistical analyses were performed in R (v4.4.2). For multiple comparisons, FDR-adjusted P values < 0.05 were considered significant; for primary survival analyses, P < 0.05 was considered significant.
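The section only says "FDR"; the Benjamini–Hochberg procedure (R's `p.adjust(method = "BH")`) is the most common choice and is assumed in this minimal sketch, which returns adjusted p-values in the input order.

```python
from typing import List, Sequence


def bh_adjust(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for position, i in enumerate(order):
        rank = m - position  # rank of p-value i in ascending order (1 = smallest)
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min  # enforce monotonicity from the largest p down
    return adjusted
```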

Results

Development of an automated TIL quantification pipeline

We established an automated TIL quantification pipeline using the open-source QuPath platform (v0.5.1). Given the interslide variability in H&E staining intensity, we applied QuPath’s “Estimate stain vectors” function to normalize the stain color profiles for each whole-slide image. The cells were segmented via watershed cell detection on the basis of nuclear morphology³⁸ with the following parameters: detection image: hematoxylin OD; requested pixel size: 0.5 μm; background radius: 10 μm; median filter radius: 1 μm; sigma: 1.5 μm; minimum cell area: 10 μm²; maximum cell area: 500 μm²; threshold: 0.1; maximum background intensity: 2; and cell expansion: 2 μm. Smoothed object features (radius = 25 μm) and Haralick texture features (distance = 1, level = 32) were computed to enhance cell classification. Segmentation quality control was performed independently by two pathologists.
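Haralick features are derived from a grey-level co-occurrence matrix (GLCM). The sketch below bins intensities into 32 levels and builds a symmetric GLCM at distance 1, matching the parameters above, then computes three classic descriptors. QuPath's internal implementation (which channel, how intensities are binned, whether several directions are averaged) is not described in the text, so the single horizontal offset, the min/max binning, and the helper names are assumptions.

```python
from typing import Dict, List, Sequence

Matrix = List[List[float]]


def quantize(image: Sequence[Sequence[float]], vmin: float, vmax: float,
             levels: int = 32) -> List[List[int]]:
    """Map intensities (e.g., hematoxylin OD) onto integer grey levels 0..levels-1."""
    scale = levels / (vmax - vmin)
    return [[min(levels - 1, max(0, int((v - vmin) * scale))) for v in row]
            for row in image]


def glcm(image: Sequence[Sequence[int]], levels: int = 32,
         dx: int = 1, dy: int = 0) -> Matrix:
    """Symmetric, normalized co-occurrence matrix for one pixel offset."""
    counts = [[0.0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                counts[i][j] += 1
                counts[j][i] += 1  # count the pair in both directions
    total = sum(map(sum, counts))
    return [[v / total for v in row] for row in counts]


def haralick_subset(p: Matrix) -> Dict[str, float]:
    """Angular second moment, contrast, and sum average from a normalized GLCM."""
    idx = range(len(p))
    return {
        "angular_second_moment": sum(p[i][j] ** 2 for i in idx for j in idx),
        "contrast": sum((i - j) ** 2 * p[i][j] for i in idx for j in idx),
        # grey levels are 0-based here; with 1-based levels this shifts by +2
        "sum_average": sum((i + j) * p[i][j] for i in idx for j in idx),
    }
```

A uniform patch gives an angular second moment of 1 and zero contrast, whereas a fine checkerboard lowers the former and raises the latter.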

The overall computational workflow is summarized in Fig. 1. An initial classifier was trained and iteratively applied to the training set, with repeated review and correction of misclassified cells. The final locked model achieved consistent classification of tumor cells, TILs, and stromal cells, as confirmed by pathologists.

TIL and clinical outcomes

By applying the trained model, we quantified TIL, tumor, and stromal cell densities for each patient (Table 1). The median TIL density was 72.5 cells/mm² (mean: 145; maximum: 1,061), indicating marked intersample heterogeneity. The median tumor and stromal cell densities were 4,794.5 and 3,637.5 cells/mm², respectively. On average, TILs comprised 1.6% of the total cells (maximum: 12.8%), whereas tumor and stromal cells accounted for 54.6% and 43.8%, respectively, indicating generally low TIL prevalence.

Table 1.

Cell density and composition.

	Mean	Min	Median	Max	IQR	Q1
TIL cells/mm²	145	1	72.5	1061	157.8	32
Tumor cells/mm²	4873.6	408	4794.5	25355	2539	3399.8
Stroma cells/mm²	3831.5	676	3637.5	26104	1757.8	2779.2
TIL (%)	1.6	0	0.9	12.6	1.8	0.4
Tumor (%)	54.6	9.9	54.6	89.8	21.4	43.1
Stroma (%)	43.8	6.8	44	89.3	20.7	33.7
TSP (%)	44.5	7	44.7	90.1	21.8	34.5


Summary of TIL, tumor, and stromal cell densities (cells/mm²) and percentages, with mean, median, minimum, maximum, interquartile range (IQR), and Q1.

Within the training set, tumor cell proportions derived from the model showed moderate concordance with manual counts and transcriptomic inference, as assessed by intraclass correlation coefficients (ICCs) (Fig. 2 A-B). Bland–Altman plots further demonstrated small mean bias and acceptable 95% limits of agreement (Figure S1 F).
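The Bland–Altman quantities are the mean paired difference (bias) and the 95% limits of agreement, bias ± 1.96 × SD of the differences. A minimal sketch, where `a` and `b` stand for hypothetical per-slide tumor cell proportions from two methods:

```python
import statistics
from typing import Sequence, Tuple


def bland_altman(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float, float]:
    """Return (bias, lower limit, upper limit) for paired measurements a and b."""
    diffs = [x - y for x, y in zip(a, b)]
    bias = statistics.fmean(diffs)
    sd = statistics.stdev(diffs)  # sample SD of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd
```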

Fig. 2 — Survival analysis by TIL density. (A–B) Concordance of tumor cell estimates between automated quantification, manual counts, and RNA-seq inference. (C–E) Optimal TIL cutoffs identified in training, validation, and combined cohorts. (F–G) Kaplan–Meier curves of overall survival stratified by TIL density.

Patients were randomly split into training (n = 204) and validation (n = 100) sets (7:3 ratio) by patient ID. In the training set, the optimal TIL density cut-off identified via maximally selected rank statistics (log-rank) was 135 cells/mm² and was locked for validation (Fig. 2 C). Using this prespecified cut-off, the high-TIL group showed significantly better overall survival than the low-TIL group in both the training cohort (Cox HR=0.52, 95% CI 0.29–0.95; P=0.035) and the validation cohort (Cox HR=0.39, 95% CI 0.16–0.97; P=0.044) (Fig. 2F–G). Notably, the training set cut-off closely matched values derived independently in the validation set (110 cells/mm²) and combined cohort (124 cells/mm²) (Fig. 2 D-E).

Immune infiltration analysis

To characterize the immune microenvironment systematically, we quantified the ssGSEA scores of immune-related gene sets via transcriptomic data. The high-TIL group exhibited increased infiltration of multiple adaptive and antigen-presenting cell subsets, including activated B cells, activated CD8⁺ T cells, effector CD4⁺ T cells, and dendritic cells, with several differences that reached statistical significance (Fig. 3 A). A heatmap of more than 20 immune cell subsets revealed consistent upregulation of immune-related gene expression in the high-TIL subgroup (Fig. 3 B). Correlation analysis further demonstrated predominantly positive associations among immune cell subsets (Fig. 3 C).

Fig. 3 — Immune infiltration by TIL groups. (A) ssGSEA scores of immune cell subsets in high- and low-TIL tumors. (B) Heatmap of immune-related gene expression. (C) Correlation matrix of immune cell subsets. (D) ESTIMATE-derived stromal, immune, and composite scores.

ESTIMATE analysis revealed that the immune score and the composite ESTIMATE score were significantly greater in the high-TIL subgroup than in the low-TIL subgroup (P < 0.05), whereas the stromal score did not differ significantly between the groups (Fig. 3 D).

Pathway enrichment and network analysis

GSVA revealed distinct functional divergence between TIL groups. The high-TIL group was significantly enriched in immune-related pathways, including antigen processing and presentation, T-cell receptor (TCR) signaling, NK cell–mediated cytotoxicity, and leukocyte migration, whereas the low-TIL group was enriched in protein synthesis–related pathways such as ribosome biogenesis, translation, and rRNA metabolism (Fig. 4 A).

Fig. 4 — Transcriptomic profiling by TIL status. (A) GSVA heatmap of enriched pathways. (B–C) WGCNA parameter selection and clustering of genes into modules. (D) Network heatmap of representative modules. (E) Module–trait correlations associated with TIL status.

To further investigate the transcriptional architecture associated with TIL status, we applied WGCNA to 4,000 candidate genes. After setting the optimal soft-thresholding power at 5 (scale-free topology R² > 0.9), hierarchical clustering was used to divide the genes into 26 coexpression modules (Fig. 4 B–E). Module–trait correlation analysis indicated that the magenta module was positively correlated with the high-TIL group, whereas the yellow module was significantly correlated with the low-TIL group (Fig. 5 A,C). Although the brown module showed the strongest correlation with high-TIL status and the light green module correlated most strongly with low-TIL status, further analyses—including functional annotation and gene significance (GS) versus module membership (MM) correlations—failed to yield meaningful biological interpretations (Fig. 5 B,D). Specifically, the brown module lacked specific biological content, and the light green module did not exhibit statistically significant GS–MM associations. Therefore, subsequent enrichment and network analyses focused on the magenta and yellow modules as the most biologically interpretable representatives for the high- and low-TIL groups, respectively.
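In an unsigned WGCNA network, the adjacency between two genes is the absolute Pearson correlation of their expression profiles raised to the soft-thresholding power, here β = 5. The sketch below shows that step and the resulting per-gene connectivity; the unsigned network type is an assumption, and the scale-free fit, topological overlap, clustering, and eigengene steps are omitted.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def soft_threshold_adjacency(genes: Sequence[Sequence[float]],
                             beta: int = 5) -> List[List[float]]:
    """Unsigned adjacency |cor|^beta; diagonal left at 0 so connectivity excludes self."""
    g = len(genes)
    adj = [[0.0] * g for _ in range(g)]
    for i in range(g):
        for j in range(i + 1, g):
            a = abs(pearson(genes[i], genes[j])) ** beta
            adj[i][j] = adj[j][i] = a
    return adj


def connectivity(adj: Sequence[Sequence[float]]) -> List[float]:
    """Whole-network connectivity k_i = sum of adjacencies to all other genes."""
    return [sum(row) for row in adj]
```

Raising correlations to a power keeps strong co-expression near 1 while shrinking weak correlations toward 0, which is what pushes the network toward scale-free topology.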

Fig. 5 — Functional analysis of TIL-related modules. (A–D) Module–trait correlations and gene significance plots. (E–F) GO enrichment of magenta (high-TIL) and yellow (low-TIL) modules. (G–H) PPI networks of representative modules. (I–J) Somatic mutation landscapes of high- and low-TIL groups.

In the high-TIL group, 92 genes from the magenta module were successfully mapped to the STRING database, yielding 666 significant enrichments (FDR < 0.05). These genes were enriched in GO biological processes related to antigen presentation, T-cell activation, leukocyte adhesion, and T-cell receptor signaling (Fig. 5 E). Protein–protein interaction (PPI) network analysis revealed that CD8A, HLA family members, and B2M were central hubs that formed densely interconnected clusters (Fig. 5 G).

In the low-TIL group, 346 genes from the yellow module yielded 808 significant enrichments (FDR < 0.05), spanning multiple annotation categories. These genes were predominantly associated with cytoplasmic translation, ribosome biogenesis and assembly, and rRNA metabolism (Fig. 5 F). The corresponding PPI network was enriched for ribosomal proteins, with RPS27A, RPS3, and RPS5 forming tightly clustered interaction networks (Fig. 5 H).

Mutation analysis

Comparative mutation profiling revealed that the high-TIL subgroup presented increased mutation frequencies across recurrent genes such as CD163, KCNK9, NINL, ZNF683, and PRLR, with a predominance of missense variants (Fig. 5 I-J). Mutation diversity was greater in the high-TIL subgroup, whereas the low-TIL subgroup presented fewer mutations overall and lower per-gene frequencies.

Drug sensitivity and immune activity

Using cell line data from the GDSC database, IC50 values for multiple compounds were estimated in the training cohort via ridge regression with 10-fold cross-validation. Overall, no significant differences were observed between the high- and low-TIL groups for common chemotherapeutic or targeted agents. However, several compounds—including navitoclax, PLX-4720, and PRIMA-1MET—exhibited significantly different IC50 distributions (P < 0.05) (Fig. 6 A-C).

Fig. 6 — Drug sensitivity, immune activity, and model performance. (A–C) Predicted IC50 distributions of compounds with significant differences between TIL groups. (D) CYT scores in high- and low-TIL tumors. (E) Kaplan–Meier survival by TIL density. (F–G) ROC curves of models based on Haralick features alone and with tumor stage integration.

Analysis of immune-related features further revealed that the CYT score was markedly higher in the high-TIL subgroup than in the low-TIL subgroup (Fig. 6 D). In addition, both the immune score and ESTIMATE score were significantly elevated in the high-TIL subgroup, which was consistent with a more activated immune microenvironment.

Haralick + stage classification model for TIL-based risk stratification

Using the TIL density estimated from the QuPath pipeline, all 304 patients were stratified into high-TIL (n=96) and low-TIL (n=208) groups. In the combined cohort, patients in the high-TIL subgroup experienced significantly better overall survival than those in the low-TIL subgroup (Cox HR = 0.48, 95% CI 0.29–0.79; P = 0.004) (Fig. 6E). We then asked whether a compact set of image-derived Haralick features could recapitulate this high/low TIL stratification. Feature selection with a random forest classifier identified the top 10 Haralick features, which were used to construct a predictive model. To address class imbalance, SMOTE oversampling and class weighting (low-TIL weight = 1.2) were applied. The model achieved an AUC of 0.87 in 10-fold cross-validation (Fig. 6 F).
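The leakage-safe ordering of SMOTE and cross-validation can be sketched as follows: each fold's held-out samples are set aside first, and synthetic minority samples are generated only from the remaining training samples. This pure-Python version (binary labels, k = 5 nearest neighbours, unstratified folds, all names hypothetical) omits the random forest and the class weighting described above.

```python
import random
from typing import Iterator, List, Optional, Sequence, Tuple

Vector = List[float]


def smote(minority: Sequence[Vector], n_new: int, k: int = 5,
          rng: Optional[random.Random] = None) -> List[Vector]:
    """Synthesize n_new points on segments between minority samples and neighbours."""
    rng = rng or random.Random(0)
    synthetic: List[Vector] = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        others = [m for j, m in enumerate(minority) if j != i]
        # k nearest minority neighbours by squared Euclidean distance
        neighbours = sorted(others, key=lambda m: sum((a - b) ** 2 for a, b in zip(x, m)))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # position along the segment x -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(x, nb)])
    return synthetic


def balance_training_fold(X: Sequence[Vector], y: Sequence[int],
                          train_idx: Sequence[int],
                          rng: random.Random) -> Tuple[List[Vector], List[int]]:
    X_tr = [X[i] for i in train_idx]
    y_tr = [y[i] for i in train_idx]
    minority_label = min(sorted(set(y_tr)), key=y_tr.count)
    minority = [x for x, lab in zip(X_tr, y_tr) if lab == minority_label]
    n_new = len(y_tr) - 2 * len(minority)  # bring minority up to majority size
    if len(minority) < 2 or n_new <= 0:
        return X_tr, y_tr
    new = smote(minority, n_new, rng=rng)
    return X_tr + new, y_tr + [minority_label] * len(new)


def smote_cv_splits(X: Sequence[Vector], y: Sequence[int], n_folds: int = 10,
                    seed: int = 0) -> Iterator[Tuple[List[Vector], List[int], List[int]]]:
    idx = list(range(len(y)))
    rng = random.Random(seed)
    rng.shuffle(idx)
    for f in range(n_folds):
        test_idx = idx[f::n_folds]
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        X_bal, y_bal = balance_training_fold(X, y, train_idx, rng)
        yield X_bal, y_bal, test_idx  # fit on X_bal/y_bal, evaluate on test_idx only
```

Because oversampling happens after the held-out fold is removed, no synthetic point is ever interpolated from a sample that is later used for evaluation.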

To further improve performance, tumor stage was incorporated with the top Haralick features in a combined random forest model. This integrated approach yielded superior predictive accuracy, with the AUC increasing to 0.892 (Fig. 6 G), underscoring the additive value of clinical stage in predicting TIL density. Internal precision–recall analysis demonstrated a PR-AUC of 0.889, and bootstrap calibration analysis showed good agreement between predicted probabilities and observed TIL groups, with a Brier score of 0.134 and a calibration curve closely following the ideal diagonal (Figure S1 A, B). Feature-importance plots indicated that both histogram- and texture-based Haralick descriptors (e.g. F5, F1, F0) and tumour stage contributed substantially to model discrimination (Figure S1 C).

To assess generalizability, we applied the combined Haralick + stage model to the independent external validation cohort after harmonizing stage categories (Stage I–IV) and using the same cut-off–derived TIL groups as in the TCGA cohort. In this external dataset, the model retained moderate discriminative ability, with an AUC of 0.715 (95% CI 0.575–0.855) (Fig. 6 H). The external PR-AUC was 0.285, and the Brier score increased to 0.219. Calibration analysis showed an approximately unit slope (0.98) but a negative intercept (−1.90), indicating that the model systematically overestimated the probability of high-TIL status in the external cohort (Figure S1 D, E).
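Calibration intercept and slope are commonly obtained by logistic recalibration, regressing the observed outcome on the logit of the predicted probability. Which definition the study used is not stated; the sketch below assumes the joint fit of both coefficients by Newton–Raphson, for which perfectly calibrated predictions give an intercept of 0 and a slope of 1. Predicted probabilities of exactly 0 or 1 are not handled.

```python
import math
from typing import Sequence, Tuple


def calibration_intercept_slope(y: Sequence[int], p: Sequence[float],
                                max_iter: int = 50) -> Tuple[float, float]:
    """Fit logit P(y=1) = a + b * logit(p) by Newton-Raphson; return (a, b)."""
    x = [math.log(pi / (1 - pi)) for pi in p]  # predicted log-odds
    a, b = 0.0, 1.0  # start from perfect calibration
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            mu = 1 / (1 + math.exp(-(a + b * xi)))
            w = mu * (1 - mu)
            g0 += yi - mu          # score for the intercept
            g1 += (yi - mu) * xi   # score for the slope
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: solve [[h00, h01], [h01, h11]] @ (da, db) = (g0, g1)
        da = (h11 * g0 - h01 * g1) / det
        db = (h00 * g1 - h01 * g0) / det
        a, b = a + da, b + db
        if abs(da) + abs(db) < 1e-10:
            break
    return a, b
```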

Discussion

This study developed and validated an automated QuPath-based pipeline for TIL quantification and, in a separate step, a Haralick texture + stage classifier for TIL-based risk stratification in lung adenocarcinoma. By integrating the open-source QuPath platform with a watershed cell detection workflow, our model automatically identified and quantified TILs, tumor cells, and stromal cells on H&E-stained whole-slide images. Quality control and iterative classifier optimization, guided by pathologist review, yielded classifications that pathologists judged to be accurate and showed moderate concordance with both manual counts and transcriptomic inference, underscoring the potential of artificial intelligence (AI)-driven histopathology for scalable immune profiling.

Automated TIL quantification pipeline

To mitigate inter-slide variability in H&E staining, we applied QuPath’s “Estimate stain vectors” function for per-image stain normalization, a strategy previously validated to reduce technical variation²⁰. For cell segmentation and classification, we used QuPath’s implementation, which internally combines ImageJ-based image processing with an OpenCV-based random-forest classifier and incorporates Haralick texture features to enhance discrimination among tumour, immune and stromal cells^23,39. The classifier was iteratively refined with pathologist feedback until, on review, the vast majority of cells appeared correctly labeled across diverse tissue morphologies, supporting the feasibility of integrating AI into digital pathology for quantitative immune microenvironment assessment.

TIL density and clinical outcomes

Automated machine learning–based analysis enabled the identification and quantification of TILs, tumor cells, and stromal cells in H&E-stained lung cancer slides. TIL density exhibited marked interpatient heterogeneity and generally accounted for only a minor fraction of total cells, whereas tumor and stromal cells were more abundant, reflecting the complexity of the tumor microenvironment. The proportion of automatically identified tumor cells showed moderate concordance with both manual counts and sequencing-based inferences, indicating reasonable accuracy of the model while also highlighting room for further optimization. These results support the feasibility of integrating automated approaches into pathological image analysis for clinical research applications.

Using the maximally selected rank statistics method, we determined an optimal cut-off of 135 cells/mm² to stratify patients into high- and low-TIL groups. Survival analyses demonstrated that patients in the high-TIL subgroup had significantly better overall survival than those in the low-TIL subgroup, a finding observed in both the training and validation cohorts. Importantly, the cut-off derived from the training cohort (135 cells/mm²) was highly consistent with values obtained in the validation (110 cells/mm²) and combined cohorts (124 cells/mm²), suggesting robustness and generalizability. These results reinforce the prognostic value of TIL density in lung cancer, which is consistent with previous studies, and highlight the potential of automated quantification to provide a standardized, reproducible biomarker for clinical application¹².

In addition, although we quantified agreement between automated and manual assessments using ICCs and Bland–Altman analysis, we did not compute cell-level confusion matrices or per-class precision/recall for the QuPath classifier, which should be addressed in future prospective technical validation studies.

Immune infiltration and the tumor immune microenvironment

In this study, we applied ssGSEA to systematically evaluate the TME of patients stratified by TIL density. The high-TIL group exhibited markedly increased infiltration of adaptive immune populations—including activated B cells, CD8⁺ cytotoxic T lymphocytes (CTLs), and effector CD4⁺ T cells—as well as antigen-presenting cells such as dendritic cells. Heatmap analysis revealed broad upregulation of immune-related gene expression in the high-TIL subgroup, which was consistent with increased immune activity and more effective immune surveillance. Correlation analysis further revealed that most immune cell subsets were positively associated with each other, highlighting a coordinated immune network in the high-TIL subgroup.

ESTIMATE analysis corroborated these findings: the immune score and ESTIMATE score were significantly greater in the high-TIL subgroup than in the low-TIL subgroup, whereas the stromal score did not differ between the groups, suggesting that stromal components may play a relatively stable role across immune contexts. These results are in line with the well-established distinction between “hot” and “cold” tumors⁴⁰. Hot tumors are characterized by dense immune infiltration, particularly by activated CD8⁺ CTLs capable of directly killing tumor cells through the release of cytotoxic granules (perforin, granzymes) and the secretion of cytokines (IFNγ, TNF)^41,42. Effector CD4⁺ T cells also contribute by producing proinflammatory cytokines and modulating B-cell and CTL responses in secondary lymphoid organs^43,44. In contrast, cold tumors exhibit poor immune infiltration, often driven by oncogenic signaling pathways such as the WNT/β-catenin, MAPK, and JAK/STAT3 pathways, which suppress T-cell recruitment and activation⁴⁵.

Clinically, this dichotomy has important therapeutic implications. High-TIL (“hot”) tumors are generally more responsive to immune checkpoint blockade, whereas low-TIL (“cold”) tumors may require additional immunomodulatory interventions to convert them into more immunogenic phenotypes and improve therapeutic benefit^46–48.

Our current workflow quantifies TIL density at the whole-slide level and does not explicitly capture spatial heterogeneity (e.g. peritumoral versus intratumoral or regional ‘hot spots’ of infiltration). Future work leveraging region-level TIL maps and compartment-specific analyses will be needed to better characterise intra-tumoural immune architecture.

Pathway and coexpression module differences

Through GSVA and GO enrichment analyses, we observed striking pathway differences between the high- and low-TIL groups. The high-TIL group was enriched for immune-related pathways, including antigen processing and presentation, TCR signaling, NK cell–mediated cytotoxicity, and leukocyte migration, which is consistent with enhanced immune surveillance and antitumor activity. In contrast, the low-TIL group was significantly enriched in ribosome biogenesis, protein translation, and rRNA metabolic processes, suggesting lower immune activity but enhanced anabolic programming, potentially supporting immune evasion and tumor progression.

WGCNA further partitioned the candidate genes into 26 coexpression modules. Correlation analysis revealed that the magenta module was positively associated with the high-TIL group and enriched for immune-regulatory genes, whereas the yellow module correlated with the low-TIL group and was enriched for ribosomal protein genes. PPI analysis revealed that the hub genes in the magenta module, including CD8A, HLA family members, and B2M, were central to immune activation. In contrast, the yellow module contained ribosome-related hub genes such as RPS27A, underscoring a metabolic growth program in the low-TIL group.

The known biological roles of these hub genes support the functional plausibility of the modules. CD8A, which encodes the CD8α chain expressed by CTLs, has been proposed as a quantifiable marker of CTL recruitment and activity and has been reported as a biomarker of response to PD-1/PD-L1 blockade^49–51. Its expression has been associated not only with survival outcomes but also with immunotherapy efficacy, highlighting its potential predictive value. RPS27A, a component of the 40S ribosomal subunit, is frequently overexpressed in multiple cancers (e.g., CML, colon cancer and lung adenocarcinoma)⁵². Functional studies have shown that RPS27A knockdown induces cell cycle arrest and apoptosis by enhancing the RPL11–MDM2 interaction, which inhibits MDM2-mediated p53 ubiquitination and degradation and thereby stabilizes the tumor suppressor p53^53–56. These previously described mechanisms are consistent with the hub genes identified in our PPI network analysis and lend biological plausibility to our findings.

Although WGCNA also identified brown and light-green modules that showed the strongest correlations with the high- and low-TIL phenotypes, respectively, their functional annotation and GS–MM relationships did not yield clear tumour- or immune-related biology. For this reason, we focused our mechanistic interpretation on the magenta and yellow modules, which showed coherent pathway enrichment, and treated the brown and light-green modules as exploratory findings. The potential biological roles of these less well-annotated modules will require further validation in larger cohorts and with complementary approaches (e.g., single-cell or spatial transcriptomics) in future studies.
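The GS–MM check mentioned above can be sketched as follows. Under common WGCNA conventions, gene significance (GS) is the correlation between a gene and the trait, and module membership (MM, or kME) is the correlation between a gene and its module eigengene; a positive correlation between |GS| and |MM| across a module's genes suggests that the genes most central to the module are also those most related to the phenotype. This NumPy sketch uses synthetic data and hypothetical names, and defines GS as a plain Pearson correlation (some analyses use −log10 p-values instead).

```python
import numpy as np

def eigengene(expr: np.ndarray) -> np.ndarray:
    """PC1 of z-scored module expression, sign-aligned to mean expression."""
    z = (expr - expr.mean(axis=0)) / expr.std(axis=0, ddof=1)
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    me = u[:, 0] * s[0]
    return me if np.corrcoef(me, z.mean(axis=1))[0, 1] >= 0 else -me

def gs_mm(expr: np.ndarray, trait: np.ndarray):
    """Per-gene |GS| (gene-trait correlation) and |MM| (gene-eigengene correlation)."""
    me = eigengene(expr)
    cols = range(expr.shape[1])
    gs = np.abs([np.corrcoef(expr[:, j], trait)[0, 1] for j in cols])
    mm = np.abs([np.corrcoef(expr[:, j], me)[0, 1] for j in cols])
    return gs, mm

rng = np.random.default_rng(7)
trait = np.repeat([1, 0], 20)                        # hypothetical high/low-TIL labels
loadings = np.linspace(0.2, 2.0, 30)                 # genes track the trait to different degrees
module = trait[:, None] * loadings + rng.normal(size=(40, 30))

gs, mm = gs_mm(module, trait)
r = np.corrcoef(gs, mm)[0, 1]
print(f"GS-MM correlation: {r:.2f}")
```

For a module such as brown or light-green, a weak or absent GS–MM relationship would indicate that its overall correlation with TIL status is not driven by its most connected genes, which is one reason to treat such modules as exploratory.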

Somatic mutation differences

Compared with the low-TIL group, the high-TIL group showed significantly higher somatic mutation frequencies and a greater diversity of altered genes, including recurrent mutations in CD163, KCNK9, and NINL. The predominance of missense mutations, together with nonsense, frameshift, and splice-site variants, may reflect tumor evolution under immune pressure. Notably, the mutation spectrum in the high-TIL subgroup was more complex, particularly for genes such as CD163 and KCNK9, which may point to mechanisms of immune evasion.

Accumulated somatic mutations can generate neoantigens that may be recognized by the immune system⁵⁷. Although only a fraction of mutations yield effective neoantigens that are processed, presented by MHC molecules, and recognized by T cells, a higher tumor mutational burden (TMB) increases the likelihood of neoantigen formation^58–60. This mechanistic link may help explain why the high-TIL group displayed a more active immune microenvironment and may be more susceptible to immune checkpoint inhibitor (ICI) therapy⁶¹.

Collectively, these findings underscore the interplay between mutational burden and immune activation and support further evaluation of TIL density as a predictive biomarker of immunotherapy responsiveness in patients with lung cancer.

The higher tumour mutational burden and broader spectrum of nonsynonymous mutations observed in high-TIL tumours are consistent with the concept that an increased pool of immunogenic mutations and neoantigens can drive enhanced immune recognition. Although we did not perform formal neoantigen prediction in this study, integrating mutation profiles with in silico neoepitope analyses will be an important next step to quantify immunogenic mutation load and to more directly link TMB, neoantigenicity, and TIL accumulation.
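TMB is usually expressed as the number of nonsynonymous somatic mutations per megabase of sequenced territory. The minimal sketch below computes it per sample from (sample, `Variant_Classification`) pairs as found in MAF files. The set of nonsynonymous classes is a commonly used one, and the 38 Mb exome size is an assumed value often used for whole-exome data; pipelines differ on both, so neither should be read as the settings used in this study.

```python
from collections import Counter

# Commonly used nonsynonymous MAF Variant_Classification values (assumption)
NONSYNONYMOUS = {
    "Missense_Mutation", "Nonsense_Mutation", "Frame_Shift_Del", "Frame_Shift_Ins",
    "Splice_Site", "In_Frame_Del", "In_Frame_Ins", "Nonstop_Mutation",
    "Translation_Start_Site",
}

def tmb_per_sample(records, capture_mb: float = 38.0) -> dict:
    """Nonsynonymous mutations per Mb for every sample seen in `records`.

    records: iterable of (sample_id, variant_classification) pairs.
    Samples with only silent or non-coding variants get a TMB of 0.
    """
    samples, counts = set(), Counter()
    for sample, vc in records:
        samples.add(sample)
        if vc in NONSYNONYMOUS:
            counts[sample] += 1
    return {s: counts[s] / capture_mb for s in samples}

# Hypothetical toy MAF rows
records = ([("A", "Missense_Mutation")] * 70 + [("A", "Frame_Shift_Del")] * 6
           + [("A", "Silent")] * 10 + [("B", "Missense_Mutation")] * 19
           + [("C", "Silent")] * 5)
tmb = tmb_per_sample(records)
print(tmb)
```

Per-sample TMB values computed this way can then be compared between the high- and low-TIL groups, and the same filtered variant list is the natural input for downstream neoepitope prediction.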

Drug sensitivity and immune activation

Using GDSC-derived drug sensitivity data in combination with ridge regression modeling and 10-fold cross-validation, we estimated IC50 values for multiple chemotherapeutic and targeted agents in the TCGA cohort. Predicted IC50 values for commonly used chemotherapeutic and targeted agents did not differ significantly between the high- and low-TIL groups, suggesting that TIL density is not associated with predicted sensitivity to conventional treatments. However, for a subset of compounds, including BI-2536, leflunomide, and WEHI-539, the predicted IC50 distributions differed significantly between groups, suggesting that patients with distinct TIL levels may respond differently to drugs that act through specific molecular mechanisms.
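The drug sensitivity estimates rest on a simple idea, implemented for example in oncoPredict (ref. 35): fit a penalized regression from cell-line expression to drug response (log IC50) in GDSC, then apply the fitted model to patient tumours. The NumPy sketch below shows only the core step, closed-form ridge regression with the penalty chosen by 10-fold cross-validation, on synthetic arrays. It is not oncoPredict's implementation, all names and data are hypothetical, and real use also requires harmonizing (e.g., batch-correcting) cell-line and tumour expression, which is omitted here.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge on centred data: beta = (Xc'Xc + lam*I)^-1 Xc'yc."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ yc)
    intercept = y_mean - x_mean @ beta
    return intercept, beta

def cv_select_lambda(X, y, lambdas, k=10, seed=0):
    """Return the penalty with the lowest mean squared error across k folds."""
    idx = np.random.default_rng(seed).permutation(len(y))
    folds = np.array_split(idx, k)
    mse = []
    for lam in lambdas:
        errs = []
        for f in folds:
            train = np.setdiff1d(idx, f)
            b0, b = ridge_fit(X[train], y[train], lam)
            errs.append(np.mean((y[f] - (b0 + X[f] @ b)) ** 2))
        mse.append(np.mean(errs))
    return lambdas[int(np.argmin(mse))]

rng = np.random.default_rng(1)
X_cells = rng.normal(size=(200, 50))                  # hypothetical cell-line expression
true_beta = np.zeros(50)
true_beta[:5] = 1.0
ic50_cells = X_cells @ true_beta + rng.normal(scale=0.5, size=200)  # toy log IC50

lam = cv_select_lambda(X_cells, ic50_cells, [0.1, 1.0, 10.0, 100.0])
b0, b = ridge_fit(X_cells, ic50_cells, lam)
X_patients = rng.normal(size=(30, 50))                # hypothetical patient expression
predicted_ic50 = b0 + X_patients @ b
print(lam, predicted_ic50[:3])
```

The predicted values for each tumour are what get compared between TIL groups; because they are model outputs rather than measured responses, group differences describe predicted, not observed, sensitivity.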

Further immune microenvironment analysis revealed significantly higher CYT scores, immune scores, and ESTIMATE scores in the high-TIL subgroup than in the low-TIL subgroup, consistent with a more active and immunologically enriched tumor microenvironment. These analyses indicate that TIL density is more strongly linked to cytolytic and immune activation scores than to predicted sensitivity to specific cytotoxic agents. These findings align with prior reports showing that immune infiltration scores are positively associated with prognosis and immunotherapy efficacy^62,63. Taken together, our results suggest that the clinical relevance of TIL density more likely lies in its ability to reflect immune activity and, potentially, responsiveness to immunotherapy than in any direct relationship with chemotherapy or targeted drug sensitivity. Because detailed treatment and response data were not available for the TCGA cohort, our drug sensitivity and CYT analyses should be regarded as hypothesis-generating markers of immune activation rather than direct predictors of therapeutic response. Prospective studies in clinically annotated, immunotherapy-treated cohorts will be required to determine whether TIL density and derived signatures have true predictive value.
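The cytolytic activity (CYT) score is commonly computed following Rooney et al. (ref. 36) as the geometric mean of GZMA and PRF1 expression. The short sketch below implements that definition on TPM values; the 0.01 offset, the toy expression values and the group medians are assumptions for illustration, not the values or exact preprocessing used in this study.

```python
import math
from statistics import median

def cyt_score(gzma_tpm: float, prf1_tpm: float, offset: float = 0.01) -> float:
    """Cytolytic activity: geometric mean of GZMA and PRF1 expression (TPM + offset)."""
    return math.sqrt((gzma_tpm + offset) * (prf1_tpm + offset))

# Hypothetical (GZMA, PRF1) TPM values for a few tumours per group
high_til = [(40.0, 12.0), (25.0, 9.0), (60.0, 20.0)]
low_til = [(5.0, 1.5), (8.0, 2.0), (3.0, 0.8)]

high_cyt = [cyt_score(g, p) for g, p in high_til]
low_cyt = [cyt_score(g, p) for g, p in low_til]
print(median(high_cyt), median(low_cyt))
```

Because the geometric mean requires both effector transcripts to be expressed, the score stays low when only one of them is elevated, which makes it a compact readout of cytolytic, rather than merely infiltrative, activity.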

Classification model performance

Our machine learning framework provides an end-to-end, open-source workflow that starts from routine H&E slides and yields quantitative TIL density, coexpression module signatures, and a deployable Haralick + stage classifier. This integration of digital pathology with transcriptomic and mutational profiling demonstrates how AI-assisted image analysis can be embedded into biomarker discovery pipelines, enabling reproducible risk stratification in lung adenocarcinoma.

From a machine-learning perspective, the internal cross-validated AUCs of 0.87 and 0.892 for the Haralick-only and Haralick + stage models, respectively, indicate strong discrimination between high- and low-TIL tumours within the TCGA cohort, with good precision–recall behaviour and overall calibration. In the independent single-centre validation cohort, the Haralick + stage model achieved an AUC of 0.715 (95% CI 0.575–0.855), with a lower PR-AUC and higher Brier score than in the TCGA cohort. These findings indicate that the model retains moderate discriminative ability when transferred to a different population, and that the performance attenuation likely reflects cohort-specific differences in tissue processing, staining protocols, scanning hardware and case mix. Accordingly, the current external validation should be interpreted as an initial proof of concept, and larger multicentre studies will be required to rigorously assess robustness and clinical utility.

Because this classifier relies on pre-computed Haralick texture features and a conventional random-forest algorithm implemented in QuPath and R, it can be run on standard CPU-only workstations using freely available software, without requiring dedicated GPU hardware or complex deep-learning infrastructure. Nevertheless, we did not formally benchmark runtime or computational cost, and future work should systematically compare deployment efficiency against deep-learning approaches.
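The discrimination and calibration metrics quoted above can be computed from a vector of 0/1 labels and predicted probabilities. The sketch below shows ROC AUC via its rank (Mann–Whitney) interpretation, the Brier score, and a 95% CI for the AUC. How the reported CI was obtained is not stated here; the percentile bootstrap used below is an assumption (DeLong's method is a common alternative), and the data are synthetic.

```python
import numpy as np

def roc_auc(y, p) -> float:
    """AUC as P(random positive scores higher than random negative); ties count 0.5."""
    y, p = np.asarray(y), np.asarray(p)
    pos, neg = p[y == 1], p[y == 0]
    diff = pos[:, None] - neg[None, :]
    return float(((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size)

def brier(y, p) -> float:
    """Mean squared difference between predicted probability and 0/1 outcome."""
    y, p = np.asarray(y, float), np.asarray(p, float)
    return float(np.mean((p - y) ** 2))

def bootstrap_auc_ci(y, p, n_boot=2000, seed=0, alpha=0.05):
    """Percentile bootstrap CI for the AUC, resampling patients with replacement."""
    rng = np.random.default_rng(seed)
    y, p = np.asarray(y), np.asarray(p)
    stats = []
    while len(stats) < n_boot:
        i = rng.integers(0, len(y), len(y))
        if y[i].min() == y[i].max():   # skip resamples containing only one class
            continue
        stats.append(roc_auc(y[i], p[i]))
    lo, hi = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return float(lo), float(hi)

rng = np.random.default_rng(3)
y = np.repeat([1, 0], [25, 75])                       # hypothetical: few high-TIL cases
p = np.clip(0.5 * y + rng.normal(0.3, 0.2, size=100), 0, 1)  # toy predicted probabilities
print(roc_auc(y, p), brier(y, p), bootstrap_auc_ci(y, p))
```

With few positive cases, the bootstrap interval widens noticeably, which is consistent with the broad 0.575–0.855 interval reported for the external cohort.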

A key limitation is that the external validation was restricted to a single centre with a limited number of high-TIL cases and without immunotherapy-treated patients. Consequently, our results primarily support the prognostic relevance of automated TIL density, while any potential predictive value for immunotherapy response should be regarded as hypothesis-generating. Future work will focus on multi-centre and multi-platform validation, incorporation of explicit domain-adaptation strategies and spatial TIL features, and evaluation in immunotherapy cohorts, in line with current guidelines for reporting machine-learning studies in computational pathology.

Beyond methodological performance, the practical value of this approach lies in its potential integration into digital pathology pipelines. By providing an efficient and reproducible tool for quantifying immune infiltration, the model could complement pathologist evaluation, facilitate large-scale biomarker studies, and ultimately support the clinical application of TILs as prognostic and predictive markers in lung cancer.

Conclusion

In this study, we developed a machine learning–driven, image analysis–based framework for automated quantification of tumor-infiltrating lymphocytes (TILs) and applied it to profile the lung cancer immune microenvironment. Our results demonstrated that high-TIL tumors were characterized by stronger immune activation, broader mutational spectra, and improved overall survival, whereas low-TIL tumors displayed enrichment of biosynthetic pathways and features indicative of immune evasion. These findings suggest that TIL density may function as both a prognostic indicator and a potential predictive biomarker for immunotherapy response, although direct validation is needed.

Nonetheless, several limitations should be acknowledged. First, this study relied mainly on retrospective TCGA data, which may not fully capture clinical heterogeneity or real-world variability in H&E staining protocols. Second, although we included an independent external validation cohort, this dataset was derived from a single centre, had a relatively small sample size, and did not include patients treated with immune checkpoint inhibitors; therefore, the generalizability and predictive value of the Haralick + stage model remain to be confirmed. Third, we did not explicitly model spatial patterns of TIL distribution, which may carry additional prognostic and biological information beyond global density measures. Finally, although the classification model achieved high AUCs in internal cross-validation and retained moderate discrimination in the external cohort, prospective evaluation in larger, multi-institutional cohorts will be necessary before considering clinical implementation.

Future work should focus on integrating additional molecular layers, such as genomic, epigenomic, and spatial transcriptomic features, to refine TIL quantification and improve the mechanistic understanding of immune evasion. Prospective clinical studies are essential for establishing robust cut-offs, assessing the utility of this approach in immunotherapy-treated patients, and advancing automated TIL quantification toward clinical translation.

Supplementary Information

Supplementary Information (PDF, 203.2 KB).

Acknowledgements

The authors thank the TCGA Research Network for providing data resources.

Author contributions

Ang Li and Fasheng Li conceived and designed the study. Ang Li performed image processing and machine learning analysis. Ang Li, Yutao Pang, Hongfei Zhang, Dong Wu, Liyao Lin, Zhan He, Zhu Liang and Jie Chen conducted immunogenomic and statistical analyses. Ang Li drafted the manuscript. All authors reviewed and approved the final version of the manuscript.

Funding

This project was supported by the Key Clinical Projects of Affiliated Hospital of Guangdong Medical University (LCYJ2022DL003), the Supported Projects of Zhanjiang (2021A05076), and the Zhanjiang Science and Technology Plan Project (2010H20190029).

Data availability

The datasets analysed in this study are publicly available in TCGA (https://portal.gdc.cancer.gov/). Case identifiers and slide barcodes are listed in Supplementary Table S1. All processed feature matrices, model predictions, and the R scripts and trained random-forest models used for Haralick-based classification and performance evaluation are available from the corresponding authors on reasonable request.

Declarations

Competing interests

The authors declare no competing interests.

Ethics

This study was conducted in accordance with the Declaration of Helsinki. For the TCGA cohort, only publicly available, de-identified data were used, and ethical approval and informed consent were obtained in the original studies. The external validation cohort of postoperative, treatment-naïve lung adenocarcinoma patients from the Affiliated Hospital of Guangdong Medical University was approved by the Ethics Committee of Clinical Research, Affiliated Hospital of Guangdong Medical University (approval No. KT2025-149), which waived the requirement for written informed consent owing to the retrospective design and the use of anonymized archival H&E whole-slide images.

Footnotes

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Contributor Information

Jie Chen, Email: chen.jie13579@163.com.

Fasheng Li, Email: lfs1020@foxmail.com.

References

1. Bray, F. et al. Global cancer statistics 2022: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J. Clin. 74, 229–263 (2024).
2. Li, C., Yuan, Y., Jiang, X. & Wang, Q. Identification and validation of tumor microenvironment-related signature for predicting prognosis and immunotherapy response in patients with lung adenocarcinoma. Sci. Rep. 13, 13568 (2023).
3. Johnson, M. L. et al. Durvalumab with or without tremelimumab in combination with chemotherapy as first-line therapy for metastatic non-small-cell lung cancer: the phase III POSEIDON study. J. Clin. Oncol. 41, 1213–1227 (2023).
4. Paz-Ares, L. et al. First-line nivolumab plus ipilimumab combined with two cycles of chemotherapy in patients with non-small-cell lung cancer (CheckMate 9LA): an international, randomised, open-label, phase 3 trial. Lancet Oncol. 22, 198–211 (2021).
5. Paz-Ares, L. et al. A randomized, placebo-controlled trial of pembrolizumab plus chemotherapy in patients with metastatic squamous NSCLC: protocol-specified final analysis of KEYNOTE-407. J. Thorac. Oncol. 15, 1657–1669 (2020).
6. Garassino, M. C. et al. Pembrolizumab plus pemetrexed and platinum in nonsquamous non-small-cell lung cancer: 5-year outcomes from the phase 3 KEYNOTE-189 study. J. Clin. Oncol. 41, 1992–1998 (2023).
7. Schoenfeld, A. J. & Hellmann, M. D. Acquired resistance to immune checkpoint inhibitors. Cancer Cell 37, 443–455 (2020).
8. Schoenfeld, J. D. et al. Durvalumab plus tremelimumab alone or in combination with low-dose or hypofractionated radiotherapy in metastatic non-small-cell lung cancer refractory to previous PD(L)-1 therapy: an open-label, multicentre, randomised, phase 2 trial. Lancet Oncol. 23, 279–291 (2022).
9. de Castro Jr, G. et al. Five-year outcomes with pembrolizumab versus chemotherapy as first-line therapy in patients with non–small-cell lung cancer and programmed death ligand-1 tumor proportion score ≥1% in the KEYNOTE-042 study. J. Clin. Oncol. 41(11), 1986–1991 (2023).
10. Gettinger, S. N. et al. Clinical features and management of acquired resistance to PD-1 axis inhibitors in 26 patients with advanced non-small cell lung cancer. J. Thorac. Oncol. 13, 831–839 (2018).
11. Ricciuti, B. et al. Genomic and immunophenotypic landscape of acquired resistance to PD-(L)1 blockade in non-small-cell lung cancer. J. Clin. Oncol. 42, 1311–1321 (2024).
12. Rakaee, M. et al. Evaluation of tumor-infiltrating lymphocytes using routine H&E slides predicts patient survival in resected non-small cell lung cancer. Hum. Pathol. 79, 188–198 (2018).
13. Gataa, I. et al. Tumour-infiltrating lymphocyte density is associated with favourable outcome in patients with advanced non-small cell lung cancer treated with immunotherapy. Eur. J. Cancer 145, 221–229 (2021).
14. Kos, Z. et al. Pitfalls in assessing stromal tumor infiltrating lymphocytes (sTILs) in breast cancer. NPJ Breast Cancer 6, 17 (2020).
15. Saltz, J. et al. Spatial organization and molecular correlation of tumor-infiltrating lymphocytes using deep learning on pathology images. Cell Rep. 23, 181–193.e7 (2018).
16. Ugolini, F. et al. Tumor-infiltrating lymphocyte recognition in primary melanoma by deep learning convolutional neural network. Am. J. Pathol. 193, 2099–2110 (2023).
17. Vayrynen, J. P. et al. Prognostic significance of immune cell populations identified by machine learning in colorectal cancer using routine hematoxylin and eosin-stained sections. Clin. Cancer Res. 26, 4326–4338 (2020).
18. Acs, B. et al. An open source automated tumor infiltrating lymphocyte algorithm for prognosis in melanoma. Nat. Commun. 10, 5440 (2019).
19. Grossman, R. L. et al. Toward a shared vision for cancer genomic data. N. Engl. J. Med. 375, 1109–1112 (2016).
20. Bankhead, P. et al. QuPath: Open source software for digital pathology image analysis. Sci. Rep. 7, 16878 (2017).
21. Hothorn, T. & Lausen, B. On the exact distribution of maximally selected rank statistics. Comput. Stat. Data Anal. 43, 121–137 (2003).
22. Kassambara, A., Kosinski, M. & Biecek, P. survminer: drawing survival curves using ‘ggplot2’ (2024).
23. Lofstedt, T., Brynolfsson, P., Asklund, T., Nyholm, T. & Garpebring, A. Gray-level invariant Haralick texture features. PLoS One 14, e0212110 (2019).
24. Kuhn, M. Building predictive models in R using the caret package. J. Stat. Softw. 28, 1–26 (2008).
25. Wickham, H., François, R., Henry, L., Müller, K. & Vaughan, D. dplyr: a grammar of data manipulation (2025).
26. Revelle, W. psych: procedures for psychological, psychometric, and personality research. R package version 1.0–95. Evanston, Illinois (2013).
27. Bland, J. M. & Altman, D. G. The logrank test. BMJ 328, 1073 (2004).
28. Guyot, P., Ades, A. E., Ouwens, M. J. & Welton, N. J. Enhanced secondary analysis of survival data: reconstructing the data from published Kaplan-Meier survival curves. BMC Med. Res. Methodol. 12, 9 (2012).
29. Dunkler, D., Ploner, M., Schemper, M. & Heinze, G. Weighted cox regression using the R package coxphw. J. Stat. Softw. 84, 1–26 (2018).
30. Hanzelmann, S., Castelo, R. & Guinney, J. GSVA: gene set variation analysis for microarray and RNA-seq data. BMC Bioinform. 14, 7 (2013).
31. Ritchie, M. E. et al. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 43, e47 (2015).
32. Yoshihara, K. et al. Inferring tumour purity and stromal and immune cell admixture from expression data. Nat. Commun. 4, 2612 (2013).
33. Langfelder, P. & Horvath, S. WGCNA: an R package for weighted correlation network analysis. BMC Bioinform. 9, 559 (2008).
34. Mayakonda, A., Lin, D. C., Assenov, Y., Plass, C. & Koeffler, H. P. Maftools: efficient and comprehensive analysis of somatic variants in cancer. Genome Res. 28, 1747–1756 (2018).
35. Maeser, D., Gruener, R. F. & Huang, R. S. oncoPredict: an R package for predicting in vivo or cancer patient drug response and biomarkers from cell line screening data. Brief. Bioinform. 22, bbab260 (2021).
36. Rooney, M. S., Shukla, S. A., Wu, C. J., Getz, G. & Hacohen, N. Molecular and genetic properties of tumors associated with local immune cytolytic activity. Cell 160, 48–61 (2015).
37. Liaw, A. & Wiener, M. Classification and regression by randomForest. R News 2(3), 18–22 (2002).
38. Malpica, N. et al. Applying watershed algorithms to the segmentation of clustered nuclei. Cytometry 28, 289–297 (1997).
39. Schneider, C. A., Rasband, W. S. & Eliceiri, K. W. NIH Image to ImageJ: 25 years of image analysis. Nat. Methods 9, 671–675 (2012).
40. Rajbhandary, S., Dhakal, H. & Shrestha, S. Tumor immune microenvironment (TIME) to enhance antitumor immunity. Eur. J. Med. Res. 28, 169 (2023).
41. Durgeau, A., Virk, Y., Corgnac, S. & Mami-Chouaib, F. Recent advances in targeting CD8 T-cell immunity for more effective cancer immunotherapy. Front. Immunol. 9, 14 (2018).
42. Martinez-Lostao, L., Anel, A. & Pardo, J. How do cytotoxic lymphocytes kill cancer cells? Clin. Cancer Res. 21, 5047–5056 (2015).
43. Kim, H. J. & Cantor, H. CD4 T-cell subsets and tumor immunity: the helpful and the not-so-helpful. Cancer Immunol. Res. 2, 91–98 (2014).
44. Borst, J., Ahrends, T., Babala, N., Melief, C. J. M. & Kastenmuller, W. CD4(+) T cell help in cancer immunology and immunotherapy. Nat. Rev. Immunol. 18, 635–647 (2018).
45. Zabransky, D. J., Yarchoan, M. & Jaffee, E. M. Strategies for heating up cold tumors to boost immunotherapies. Annu. Rev. Cancer Biol. 7, 149–170 (2023).
46. Frederico, S. C. et al. Making a cold tumor hot: the role of vaccines in the treatment of glioblastoma. Front. Oncol. 11, 672508 (2021).
47. June, C. H., O’Connor, R. S., Kawalekar, O. U., Ghassemi, S. & Milone, M. C. CAR T cell immunotherapy for human cancer. Science 359, 1361–1365 (2018).
48. Abdalsalam, N. M. F. et al. MDSC: a new potential breakthrough in CAR-T therapy for solid tumors. Cell Commun. Signal. 22, 612 (2024).
49. Ock, C. Y. et al. Pan-cancer immunogenomic perspective on the tumor microenvironment based on PD-L1 and CD8 T-cell infiltration. Clin. Cancer Res. 22, 2261–2270 (2016).
50. Chen, Y. P. et al. Genomic analysis of tumor microenvironment immune types across 14 solid cancer types: immunotherapeutic implications. Theranostics 7, 3585–3594 (2017).
51. Niu, D., Chen, Y., Mi, H., Mo, Z. & Pang, G. The epiphany derived from T-cell-inflamed profiles: pan-cancer characterization of CD8A as a biomarker spanning clinical relevance, cancer prognosis, immunosuppressive environment, and treatment responses. Front. Genet. 13, 974416 (2022).
52. Hong, S. W., Kim, S. M., Jin, D. H., Kim, Y. S. & Hur, D. Y. RPS27a enhances EBV-encoded LMP1-mediated proliferation and invasion by stabilizing of LMP1. Biochem. Biophys. Res. Commun. 491, 303–309 (2017).
53. Redman, K. L. & Rechsteiner, M. Identification of the long ubiquitin extension as ribosomal protein S27a. Nature 338, 438–440 (1989).
54. Li, H. et al. Loss of RPS27a expression regulates the cell cycle, apoptosis, and proliferation via the RPL11-MDM2-p53 pathway in lung adenocarcinoma cells. J. Exp. Clin. Cancer Res. 41, 33 (2022).
55. Duan, J. et al. Knockdown of ribosomal protein S7 causes developmental abnormalities via p53 dependent and independent pathways in zebrafish. Int. J. Biochem. Cell Biol. 43, 1218–1227 (2011).
56. Zhang, Y. et al. Negative regulation of HDM2 to attenuate p53 degradation by ribosomal protein L26. Nucleic Acids Res. 38, 6544–6554 (2010).
57. Riaz, N. et al. The role of neoantigens in response to immune checkpoint blockade. Int. Immunol. 28, 411–419 (2016).
58. Coulie, P. G., Van den Eynde, B. J., van der Bruggen, P. & Boon, T. Tumour antigens recognized by T lymphocytes: at the core of cancer immunotherapy. Nat. Rev. Cancer 14, 135–146 (2014).
59. Carreno, B. M. et al. Cancer immunotherapy. A dendritic cell vaccine increases the breadth and diversity of melanoma neoantigen-specific T cells. Science 348, 803–808 (2015).
60. Schumacher, T. N. & Schreiber, R. D. Neoantigens in cancer immunotherapy. Science 348, 69–74 (2015).
61. Chabanon, R. M. et al. Mutational landscape and sensitivity to immune checkpoint blockers. Clin. Cancer Res. 22, 4309–4321 (2016).
62. Tian, J. et al. Construction of immune cell infiltration score model to assess prognostic ability of tumor immune environment in lung adenocarcinoma. Am. J. Transl. Res. 15, 1730–1743 (2023).
63. Liao, Y., He, D. & Wen, F. Analyzing the characteristics of immune cell infiltration in lung adenocarcinoma via bioinformatics to predict the effect of immunotherapy. Immunogenetics 73, 369–380 (2021).

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

Supplementary Information.^{(203.2KB, pdf)}

Data Availability Statement

[CR1] 1.Bray, F. et al. Global cancer statistics 2022: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J. Clin.74, 229–263 (2024). [DOI] [PubMed] [Google Scholar]

[CR2] 2.Li, C., Yuan, Y., Jiang, X. & Wang, Q. Identification and validation of tumor microenvironment-related signature for predicting prognosis and immunotherapy response in patients with lung adenocarcinoma. Sci. Rep.13, 13568 (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR3] 3.Johnson, M. L. et al. Durvalumab with or without tremelimumab in combination with chemotherapy as first-line therapy for metastatic non-small-cell lung cancer: the phase III POSEIDON study. J. Clin. Oncol.41, 1213–1227 (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR4] 4.Paz-Ares, L. et al. First-line nivolumab plus ipilimumab combined with two cycles of chemotherapy in patients with non-small-cell lung cancer (CheckMate 9LA): an international, randomised, open-label, phase 3 trial. Lancet Oncol.22, 198–211 (2021). [DOI] [PubMed] [Google Scholar]

[CR5] 5.Paz-Ares, L. et al. A randomized, placebo-controlled trial of pembrolizumab plus chemotherapy in patients with metastatic squamous NSCLC: protocol-specified final analysis of KEYNOTE-407. J. Thorac. Oncol.15, 1657–1669 (2020). [DOI] [PubMed] [Google Scholar]

[CR6] 6.Garassino, M. C. et al. Pembrolizumab plus pemetrexed and platinum in nonsquamous non-small-cell lung cancer: 5-year outcomes from the phase 3 KEYNOTE-189 study. J. Clin. Oncol.41, 1992–1998 (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR7] 7.Schoenfeld, A. J. & Hellmann, M. D. Acquired resistance to immune checkpoint inhibitors. Cancer Cell37, 443–455 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR8] 8.Schoenfeld, J. D. et al. Durvalumab plus tremelimumab alone or in combination with low-dose or hypofractionated radiotherapy in metastatic non-small-cell lung cancer refractory to previous PD(L)-1 therapy: an open-label, multicentre, randomised, phase 2 trial. Lancet Oncol.23, 279–291 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR9] 9.de Castro Jr, G. et al. Five-year outcomes with pembrolizumab versus chemotherapy as first-line therapy in patients with non–small-cell lung cancer and programmed death ligand-1 tumor proportion score≥ 1% in the KEYNOTE-042 study. J. Clin. Oncol.41(11), 1986–1991 (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR10] 10.Gettinger, S. N. et al. Clinical features and management of acquired resistance to PD-1 axis inhibitors in 26 patients with advanced non-small cell lung cancer. J. Thorac. Oncol.13, 831–839 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR11] 11.Ricciuti, B. et al. Genomic and immunophenotypic landscape of acquired resistance to PD-(L)1 blockade in non-small-cell lung cancer. J. Clin. Oncol.42, 1311–1321 (2024). [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR12] 12.Rakaee, M. et al. Evaluation of tumor-infiltrating lymphocytes using routine H&E slides predicts patient survival in resected non-small cell lung cancer. Hum. Pathol.79, 188–198 (2018). [DOI] [PubMed] [Google Scholar]

[CR13] 13.Gataa, I. et al. Tumour-infiltrating lymphocyte density is associated with favourable outcome in patients with advanced non-small cell lung cancer treated with immunotherapy. Eur. J. Cancer145, 221–229 (2021). [DOI] [PubMed] [Google Scholar]

[CR14] 14.Kos, Z. et al. Pitfalls in assessing stromal tumor infiltrating lymphocytes (sTILs) in breast cancer. NPJ Breast Cancer6, 17 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR15] 15.Saltz, J. et al. Spatial organization and molecular correlation of tumor-infiltrating lymphocytes using deep learning on pathology images. Cell Rep.23(181–193), e187 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR16] 16.Ugolini, F. et al. Tumor-infiltrating lymphocyte recognition in primary melanoma by deep learning convolutional neural network. Am. J. Pathol.193, 2099–2110 (2023). [DOI] [PubMed] [Google Scholar]

[CR17] 17.Vayrynen, J. P. et al. Prognostic significance of immune cell populations identified by machine learning in colorectal cancer using routine hematoxylin and eosin-stained sections. Clin. Cancer Res.26, 4326–4338 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR18] 18.Acs, B. et al. An open source automated tumor infiltrating lymphocyte algorithm for prognosis in melanoma. Nat. Commun.10, 5440 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR19] 19.Grossman, R. L. et al. Toward a shared vision for cancer genomic data. N. Engl. J. Med.375, 1109–1112 (2016). [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR20] 20.Bankhead, P. et al. QuPath: Open source software for digital pathology image analysis. Sci. Rep.7, 16878 (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR21] 21.Hothorn, T. & Lausen, B. On the exact distribution of maximally selected rank statistics. Comput. Stat. Data Anal.43, 121–137 (2003). [Google Scholar]

[CR22] 22.Biecek AKaMKaP: survminer: drawing survival curves using ‘ggplot2’. (2024).

[CR23] 23.Lofstedt, T., Brynolfsson, P., Asklund, T., Nyholm, T. & Garpebring, A. Gray-level invariant Haralick texture features. PLoS One14, e0212110 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR24] 24.Kuhn, M. Building predictive models in R using the caret package. J. Stat. Softw.28, 1–26 (2008).27774042 [Google Scholar]

[CR25] 25.Vaughan HWaRFaLHaKMaD: dplyr: a grammar of data manipulation. (2025).

[CR26] 26.Revelle W: psych: procedures for psychological, psychometric, and personality research. R package version 1.0–95. Evanston, Illinois (2013).

27. Bland, J. M. & Altman, D. G. The logrank test. BMJ 328, 1073 (2004).

28. Guyot, P., Ades, A. E., Ouwens, M. J. & Welton, N. J. Enhanced secondary analysis of survival data: reconstructing the data from published Kaplan-Meier survival curves. BMC Med. Res. Methodol. 12, 9 (2012).

29. Dunkler, D., Ploner, M., Schemper, M. & Heinze, G. Weighted Cox regression using the R package coxphw. J. Stat. Softw. 84, 1–26 (2018).

30. Hänzelmann, S., Castelo, R. & Guinney, J. GSVA: gene set variation analysis for microarray and RNA-seq data. BMC Bioinform. 14, 7 (2013).

31. Ritchie, M. E. et al. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 43, e47 (2015).

32. Yoshihara, K. et al. Inferring tumour purity and stromal and immune cell admixture from expression data. Nat. Commun. 4, 2612 (2013).

33. Langfelder, P. & Horvath, S. WGCNA: an R package for weighted correlation network analysis. BMC Bioinform. 9, 559 (2008).

34. Mayakonda, A., Lin, D. C., Assenov, Y., Plass, C. & Koeffler, H. P. Maftools: efficient and comprehensive analysis of somatic variants in cancer. Genome Res. 28, 1747–1756 (2018).

35. Maeser, D., Gruener, R. F. & Huang, R. S. oncoPredict: an R package for predicting in vivo or cancer patient drug response and biomarkers from cell line screening data. Brief. Bioinform. 22, bbab260 (2021).

36. Rooney, M. S., Shukla, S. A., Wu, C. J., Getz, G. & Hacohen, N. Molecular and genetic properties of tumors associated with local immune cytolytic activity. Cell 160, 48–61 (2015).

37. Liaw, A. & Wiener, M. Classification and regression by randomForest. R News 2(3), 18–22 (2002).

38. Malpica, N. et al. Applying watershed algorithms to the segmentation of clustered nuclei. Cytometry 28, 289–297 (1997).

39. Schneider, C. A., Rasband, W. S. & Eliceiri, K. W. NIH Image to ImageJ: 25 years of image analysis. Nat. Methods 9, 671–675 (2012).

40. Rajbhandary, S., Dhakal, H. & Shrestha, S. Tumor immune microenvironment (TIME) to enhance antitumor immunity. Eur. J. Med. Res. 28, 169 (2023).

41. Durgeau, A., Virk, Y., Corgnac, S. & Mami-Chouaib, F. Recent advances in targeting CD8 T-cell immunity for more effective cancer immunotherapy. Front. Immunol. 9, 14 (2018).

42. Martinez-Lostao, L., Anel, A. & Pardo, J. How do cytotoxic lymphocytes kill cancer cells? Clin. Cancer Res. 21, 5047–5056 (2015).

43. Kim, H. J. & Cantor, H. CD4 T-cell subsets and tumor immunity: the helpful and the not-so-helpful. Cancer Immunol. Res. 2, 91–98 (2014).

44. Borst, J., Ahrends, T., Babala, N., Melief, C. J. M. & Kastenmüller, W. CD4+ T cell help in cancer immunology and immunotherapy. Nat. Rev. Immunol. 18, 635–647 (2018).

45. Zabransky, D. J., Yarchoan, M. & Jaffee, E. M. Strategies for heating up cold tumors to boost immunotherapies. Annu. Rev. Cancer Biol. 7, 149–170 (2023).

46. Frederico, S. C. et al. Making a cold tumor hot: the role of vaccines in the treatment of glioblastoma. Front. Oncol. 11, 672508 (2021).

47. June, C. H., O’Connor, R. S., Kawalekar, O. U., Ghassemi, S. & Milone, M. C. CAR T cell immunotherapy for human cancer. Science 359, 1361–1365 (2018).

48. Abdalsalam, N. M. F. et al. MDSC: a new potential breakthrough in CAR-T therapy for solid tumors. Cell Commun. Signal. 22, 612 (2024).

49. Ock, C. Y. et al. Pan-cancer immunogenomic perspective on the tumor microenvironment based on PD-L1 and CD8 T-cell infiltration. Clin. Cancer Res. 22, 2261–2270 (2016).

50. Chen, Y. P. et al. Genomic analysis of tumor microenvironment immune types across 14 solid cancer types: immunotherapeutic implications. Theranostics 7, 3585–3594 (2017).

51. Niu, D., Chen, Y., Mi, H., Mo, Z. & Pang, G. The epiphany derived from T-cell-inflamed profiles: pan-cancer characterization of CD8A as a biomarker spanning clinical relevance, cancer prognosis, immunosuppressive environment, and treatment responses. Front. Genet. 13, 974416 (2022).

52. Hong, S. W., Kim, S. M., Jin, D. H., Kim, Y. S. & Hur, D. Y. RPS27a enhances EBV-encoded LMP1-mediated proliferation and invasion by stabilizing of LMP1. Biochem. Biophys. Res. Commun. 491, 303–309 (2017).

53. Redman, K. L. & Rechsteiner, M. Identification of the long ubiquitin extension as ribosomal protein S27a. Nature 338, 438–440 (1989).

54. Li, H. et al. Loss of RPS27a expression regulates the cell cycle, apoptosis, and proliferation via the RPL11-MDM2-p53 pathway in lung adenocarcinoma cells. J. Exp. Clin. Cancer Res. 41, 33 (2022).

55. Duan, J. et al. Knockdown of ribosomal protein S7 causes developmental abnormalities via p53 dependent and independent pathways in zebrafish. Int. J. Biochem. Cell Biol. 43, 1218–1227 (2011).

56. Zhang, Y. et al. Negative regulation of HDM2 to attenuate p53 degradation by ribosomal protein L26. Nucleic Acids Res. 38, 6544–6554 (2010).

57. Riaz, N. et al. The role of neoantigens in response to immune checkpoint blockade. Int. Immunol. 28, 411–419 (2016).

58. Coulie, P. G., Van den Eynde, B. J., van der Bruggen, P. & Boon, T. Tumour antigens recognized by T lymphocytes: at the core of cancer immunotherapy. Nat. Rev. Cancer 14, 135–146 (2014).

59. Carreno, B. M. et al. Cancer immunotherapy. A dendritic cell vaccine increases the breadth and diversity of melanoma neoantigen-specific T cells. Science 348, 803–808 (2015).

60. Schumacher, T. N. & Schreiber, R. D. Neoantigens in cancer immunotherapy. Science 348, 69–74 (2015).

61. Chabanon, R. M. et al. Mutational landscape and sensitivity to immune checkpoint blockers. Clin. Cancer Res. 22, 4309–4321 (2016).

62. Tian, J. et al. Construction of immune cell infiltration score model to assess prognostic ability of tumor immune environment in lung adenocarcinoma. Am. J. Transl. Res. 15, 1730–1743 (2023).

63. Liao, Y., He, D. & Wen, F. Analyzing the characteristics of immune cell infiltration in lung adenocarcinoma via bioinformatics to predict the effect of immunotherapy. Immunogenetics 73, 369–380 (2021).
