Summary
A robust predictive biomarker is critical for identifying patients with NSCLC who may benefit from immunotherapy. This study developed a CT-based habitat model using 590 advanced NSCLC cases. The model was constructed on contrast-enhanced CT images and validated on an independent cohort with non-contrast CT. Tumor volumes were segmented into three subregions via K-means clustering. Radiomic features were extracted from each habitat and used to build predictive models with six machine learning classifiers. The ExtraTrees-based habitat model demonstrated superior predictive performance in the test cohort (AUC = 0.814). Compared to traditional radiomics, 3D deep learning, clinical, and PD-L1 expression models, the habitat model maintained strong predictive advantages, enabling efficient prediction of immunotherapy benefit and aiding in the identification of patients suitable for personalized treatment.
Subject areas: Health sciences, Medicine, Medical specialty, Internal medicine, Oncology
Graphical abstract

Highlights
- A multi-algorithm Habitat model predicts NSCLC immunotherapy response
- Habitat model outperforms radiomics, deep learning, and clinical models
- Imaging-based models surpass PD-L1 in predicting immunotherapy benefit
- Habitat analysis serves as a non-invasive biomarker for treatment decisions
Introduction
Non-small cell lung cancer (NSCLC) is the most common histological type of lung cancer,1 with a 5-year survival rate of around 18% for advanced-stage disease.2
Recently, combining antibodies that block the programmed death-1 (PD-1) receptor or its ligand (PD-L1) with chemotherapy has significantly improved patients' overall survival (OS) and progression-free survival (PFS).3 However, only 20% of patients with lung cancer respond to PD-1/PD-L1 immunotherapy and achieve meaningful clinical benefits.4 Therefore, identifying robust biomarkers that can effectively predict response to PD-1/PD-L1 checkpoint blockade antibodies is crucial to avoid unnecessary immune-related toxicities associated with immunotherapy.
PD-L1 expression in tumor tissue is a primary biomarker, approved by the U.S. Food and Drug Administration (FDA), for predicting patient response to immunotherapy. Patients with positive PD-L1 expression usually have better clinical outcomes.5 However, recent studies have shown that patients with lung cancer and a PD-L1 tumor proportion score (TPS) <1% achieve superior objective response rates (ORRs) when treated with immunotherapy combined with chemotherapy compared to chemotherapy alone.6 Moreover, among patients with lung cancer and PD-L1 TPS <1% who respond to treatment, anti-PD-1/PD-L1 inhibitors not only prolong OS but also yield a duration of response (DOR) comparable to that of patients with PD-L1 TPS ≥50%.7,8 This indicates that even PD-L1-negative patients can achieve favorable objective response rates with PD-1/PD-L1 immunotherapy, and that PD-L1 expression alone cannot accurately predict clinical benefit from immunotherapy. For patients with advanced, inoperable lung cancer, PD-L1 expression assessment relies primarily on tumor tissue biopsy. However, the widespread intratumoral heterogeneity of PD-L1 may lead to sampling bias, preventing accurate evaluation of PD-L1 expression levels in tumors.9,10 Therefore, researchers continue to explore stable molecular biomarkers to predict immunotherapy response.
Recently, radiomics has emerged as a crucial non-invasive tool for evaluating therapeutic efficacy and disease diagnosis in patients with cancer,11,12,13 owing to its ability to extract high-throughput quantitative information from medical images such as computed tomography (CT), magnetic resonance imaging (MRI), and positron emission tomography (PET).14 Because PD-L1 assessment relies on diagnostic biopsy, the specimens collected are typically small portions of tumor tissue, which often leads to heterogeneous results. In contrast, medical imaging provides a view of the whole tumor, and radiomics extracts high-dimensional features from the entire tumor image, thereby avoiding the omission of information from any one region. Additionally, because radiological examinations are non-invasive and repeatable, radiomics can be used continuously to monitor therapeutic response. Numerous previous studies have demonstrated that radiomics can effectively predict PD-L1 expression levels in tumor tissues and further evaluate the durable clinical benefits and survival outcomes of anti-PD-1/PD-L1 immunotherapy in patients with NSCLC.15,16,17
Numerous studies have consistently demonstrated the considerable potential of radiomics in predicting the prognosis of lung cancer treatment. Research by Zhu, Liu, Wu et al.16,18,19 has shown that radiomic models constructed using machine learning classifiers, such as logistic regression (LR) and support vector machines (SVMs), can effectively predict whether patients will benefit from immunotherapy and enable stratification of survival risk. Traditional radiomics approaches offer good model interpretability; however, their feature extraction process relies on manual design and selection. In contrast, deep learning leverages an end-to-end feature learning paradigm to automatically extract complex, high-level patterns from medical images, exhibiting greater potential in terms of prediction accuracy and automation. Studies by Mu, Saad et al.,20,21 focusing specifically on the application of deep learning in predicting immunotherapy response in patients with non-small cell lung cancer, have also demonstrated superior predictive performance, highlighting the broad prospects of such methods in clinical prognostic assessment.
Previous studies on radiomics in lung cancer immunotherapy primarily treated lung cancer tissue as a homogeneous entity for feature extraction. However, numerous studies have demonstrated that tumors are heterogeneous and consist of complex internal components.1 Tumor heterogeneity often leads to uneven growth of intratumoral cells, which can contribute to treatment resistance.22 Since intratumoral heterogeneity frequently results in varying therapeutic outcomes among individuals, treating the tumor as a uniform entity may fail to accurately predict treatment response. Moreover, many studies delineate the tumor region of interest (ROI) based on only a few slices,15,23 which may lead to the loss of critical imaging features. Recently, a new radiomics method named “Habitat” has been used to divide tumors into different subregions by grouping voxels with similar gray-level characteristics in images of tumor tissue.24 This method can effectively evaluate tumor heterogeneity and has proven useful in the treatment and metastasis evaluation of breast cancer, rectal cancer, and liver cancer.25,26,27 Nevertheless, the literature on its application in lung cancer remains relatively limited.
In this study, we employed Habitat imaging technology to partition intratumoral subregions and extract radiomic features, thereby enabling a more comprehensive characterization of spatial heterogeneity within tumors. Subsequently, we constructed predictive models based on the Habitat model, the traditional radiomic model, the 3D deep learning model, and the clinical model, systematically evaluating their respective values in predicting the efficacy of anti-PD-1/PD-L1 immunotherapy in patients with NSCLC. To ensure consistency and comparability across modeling approaches, identical machine learning classifiers were applied in the development of the clinical, Habitat, and radiomics models. The study utilized a multi-center retrospective cohort for model training and validation, allowing for comprehensive performance evaluation and comparison among the four types of predictive models. Our objective was to elucidate the relative advantage of the Habitat-based model in predicting immunotherapy benefits, thereby providing reliable evidence for its potential integration into clinical decision-support systems.
Results
Patients’ baseline clinical characteristics
The patient inclusion process is shown in Figure 1. In our study, 75% of patients responded to immune checkpoint inhibitors after 6 months of immunotherapy. The average age of the patients was 60.32 ± 9.77 years, with a predominance of males, and most patients were classified as clinical stage IV and N2-3. The demographic and clinical characteristics are detailed in Table 1. The experimental workflow is shown in Figure 2.
Figure 1.
Patient enrollment flowchart
Table 1.
Patient characteristics in the Train, Val, and Test cohorts
| Characteristic | ALL cohort | Train cohort | Val cohort | Test cohort | p-value |
|---|---|---|---|---|---|
| Age | 60.32 ± 9.77 | 59.92 ± 10.02 | 60.52 ± 9.52 | 62.48 ± 8.50 | 0.189 |
| Gender | – | – | – | – | 0.039 |
| Female | 121(20.51) | 75(20.00) | 28(17.39) | 18(33.33) | – |
| Male | 469(79.49) | 300(80.00) | 133(82.61) | 36(66.67) | – |
| Smoking_Status | – | – | – | – | 0.122 |
| No | 249(42.20) | 152(40.53) | 65(40.37) | 32(59.26) | – |
| 0-30 years | 264(44.75) | 172(45.87) | 74(45.96) | 18(33.33) | – |
| ≥30 years | 77(13.05) | 51(13.60) | 22(13.66) | 4(7.41) | – |
| Histopathology | – | – | – | – | 0.393 |
| adenocarcinoma | 333(56.44) | 207(55.20) | 92(57.14) | 34(62.96) | – |
| SCC1 | 236(40.00) | 151(40.27) | 65(40.37) | 20(37.04) | – |
| others | 21(3.92) | 17(4.53) | 4(2.48) | None | – |
| Clinical_stage | – | – | – | – | 0.084 |
| Ⅲ | 150(25.42) | 87(23.20) | 43(26.71) | 20(37.04) | – |
| Ⅳ | 440(74.58) | 288(76.80) | 118(73.29) | 34(62.96) | – |
| T | – | – | – | – | 0.912 |
| 1–2 | 118(20.00) | 74(19.73) | 32(19.87) | 12(22.23) | – |
| 3–4 | 472(80.00) | 301(80.26) | 129(80.13) | 42(77.78) | – |
| N | – | – | – | – | 0.011 |
| 0–1 | 92(15.59) | 52(13.86) | 24(14.91) | 15(27.77) | – |
| 2–3 | 498(84.41) | 323(86.13) | 137(85.1) | 39(71.37) | – |
| M | – | – | – | – | 0.108 |
| No | 152(25.76) | 87(23.20) | 46(28.57) | 19(35.19) | – |
| Yes | 438(74.24) | 288(76.80) | 115(71.43) | 35(64.81) | – |
| CEA | – | – | – | – | 0.071 |
| No | 328(55.59) | 204(54.40) | 86(53.42) | 38(70.37) | – |
| Yes | 262(44.41) | 171(45.60) | 75(46.58) | 16(29.63) | – |
| CA125 | – | – | – | – | 0.164 |
| No | 356(60.34) | 217(57.87) | 101(62.73) | 38(70.37) | – |
| Yes | 234(39.66) | 158(42.13) | 60(37.27) | 16(29.63) | – |
| CA153 | – | – | – | – | 0.922 |
| No | 478(81.02) | 302(80.53) | 132(81.99) | 44(81.48) | – |
| Yes | 112(18.98) | 73(19.47) | 29(18.01) | 10(18.52) | – |
| CYFRA21_1 | – | – | – | – | 0.001 |
| No | 302(51.19) | 187(49.87) | 68(42.24) | 51(94.44) | – |
| Yes | 288(48.81) | 188(50.13) | 93(57.76) | 3(5.56) | – |
| SCC2 | – | – | – | – | 0.049 |
| No | 457(77.46) | 285(76.00) | 123(76.40) | 49(90.74) | – |
| Yes | 133(22.54) | 90(24.00) | 38(23.60) | 5(9.26) | – |
| CA199 | – | – | – | – | 0.420 |
| No | 495(83.90) | 320(85.33) | 132(81.99) | 43(79.63) | – |
| Yes | 95(16.10) | 55(14.67) | 29(18.01) | 11(20.37) | – |
| DCB | – | – | – | – | <0.001 |
| No | 145(24.58) | 109(29.07) | 21(13.04) | 15(27.78) | – |
| Yes | 445(75.42) | 266(70.93) | 140(86.96) | 39(72.22) | – |
Continuous variables are expressed as mean ± standard deviation (SD). Categorical variables are described as frequencies and percentages. SCC1: Squamous Cell Carcinoma; CEA: Carcinoembryonic Antigen; CA125: Cancer Antigen 125; CA153: Cancer Antigen 15-3; CYFRA21_1: Cytokeratin 19 Fragment; SCC2: Squamous Cell Carcinoma Antigen; CA199: Carbohydrate Antigen 19-9; DCB: Durable Clinical Benefit.
Figure 2.
The experimental workflow implemented in this study
The process included image preprocessing and segmentation, feature extraction, model construction, and the development of distinct predictive models based on Habitat, radiomics, deep learning, and clinical data. Finally, the models were evaluated.
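The habitat partitioning step of this workflow (three subregions obtained by K-means clustering, as stated in the Summary) can be illustrated with a minimal sketch. It clusters a flat list of voxel intensities in one dimension using only the Python standard library. The single-intensity input, the quantile-based initialization, and the function name `kmeans_1d` are assumptions made for illustration; the study's actual clustering features and implementation are not described in this section.

```python
def kmeans_1d(values, k=3, n_iter=50):
    """Cluster scalar voxel values into k groups (illustrative sketch only)."""
    srt = sorted(values)
    # Deterministic initialization: evenly spaced quantiles of the data.
    centers = [srt[int((i + 0.5) * len(srt) / k)] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(n_iter):
        # Assignment step: each voxel goes to its nearest center.
        labels = [min(range(k), key=lambda c: (v - centers[c]) ** 2)
                  for v in values]
        # Update step: each center becomes the mean of its members.
        new_centers = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            new_centers.append(sum(members) / len(members) if members
                               else centers[c])
        if new_centers == centers:  # converged
            break
        centers = new_centers
    return labels, centers


# Hypothetical voxel intensities inside one tumor volume.
voxels = [-50, -45, -48, 20, 25, 22, 90, 95, 100]
labels, centers = kmeans_1d(voxels, k=3)
print(labels)  # [0, 0, 0, 1, 1, 1, 2, 2, 2]
```

Each label then defines one habitat mask from which radiomic features would be extracted separately.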
Habitat model feature extraction and selection
In our study, we manually extracted 1,834 radiomics features from each of Habitat1, Habitat2, and Habitat3. The features of Habitat were integrated from the three subregions (Habitat1, Habitat2, and Habitat3), resulting in a total of 5,502 features for Habitat. Dimensionality reduction and feature selection were performed using the Pearson correlation coefficient, the mRMR algorithm, and LASSO regression. Ultimately, 17 features were selected for Habitat1, 9 features for Habitat2, 8 features for Habitat3, and 24 features for Habitat. Figure 3 presents the histogram of Rad-scores for the final selected features in each Habitat model, predicting the outcome. The extracted radiomics features included shape features, first-order features, and texture features. Figure 4 illustrates the quantity and proportion of manually extracted features, as well as the LASSO regression coefficients and mean squared error (MSE) of the features incorporated into the Habitat models.
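The first stage of the selection pipeline, the Pearson correlation filter, can be sketched as follows. This is a minimal standard-library illustration, not the study's code: the 0.9 threshold, the greedy keep-first rule, and the feature names are all assumptions, and the mRMR and LASSO stages are omitted.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:  # constant feature: treat as uncorrelated
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def drop_correlated(features, threshold=0.9):
    """Greedy redundancy filter (threshold is an assumed value).

    features maps a feature name to its values across patients. A feature
    is kept only if |r| with every already-kept feature is <= threshold.
    """
    kept = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical feature table: three features measured on five patients.
table = {
    "firstorder_mean": [1, 2, 3, 4, 5],
    "firstorder_median": [2, 4, 6, 8, 10.5],  # nearly collinear with the mean
    "glcm_contrast": [5, 1, 4, 2, 3],
}
print(drop_correlated(table))  # ['firstorder_mean', 'glcm_contrast']
```

Filtering near-duplicate features first reduces the number of candidates passed to the later, more expensive selection stages.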
Figure 3.
Histogram of Rad-score of radiomics features in different habitat regions
Habitat1 (A), Habitat2 (B), Habitat3 (C), and Habitat (D).
Figure 4.
Habitat feature selection and model performance
Pie charts of radiomics features in Habitat (A); bar charts of radiomics features in Habitat (B); the coefficients for the cross-validation of LASSO regression (C); and the MSE of LASSO analysis (D).
Construction of habitat models and evaluation
We used six machine learning classifiers (LR, SVM, RandomForest, ExtraTrees, LightGBM, and multilayer perceptron [MLP]) to develop habitat models based on the final selected features from Habitat1, Habitat2, Habitat3, and Habitat. Model performance was evaluated by comparing AUC, sensitivity, specificity, positive predictive value, negative predictive value, recall, and F1-score in both the train and validation cohorts to determine the most effective habitat imaging model.
Tables 2 and 3 present the predictive performance of the Habitat1, Habitat2, Habitat3, and Habitat models based on different machine learning classifiers in the train and validation cohorts (for model performance comparisons in the test cohort, please refer to Table S1). As shown in Figures 5 and 6, the Habitat model consistently outperformed the three subregional models in terms of AUC across various classifiers. This suggests that the integrated Habitat model, incorporating features from all subregions, offers the highest predictive capability. Through comparative analysis, the Habitat-ExtraTrees model achieved the highest AUC values, with an AUC of 0.865 in the train cohort, 0.845 in the validation cohort, and 0.814 in the test cohort (Figure S1). Compared to other models, the Habitat-ExtraTrees model demonstrated superior performance, with closely matched AUC values across all three cohorts, indicating its excellent robustness and generalization capability.
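For readers unfamiliar with how an AUC and its 95% confidence interval can be obtained from predicted scores, the following standard-library sketch may help. It computes the AUC as the fraction of correctly ordered positive/negative pairs and derives a percentile bootstrap interval. The bootstrap is an assumed choice made for illustration; this section does not state how the reported intervals were computed, and the labels and scores below are hypothetical.

```python
import random


def auc(labels, scores):
    """AUC as the probability that a positive outranks a negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the AUC (assumed method, see lead-in)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # a resample needs both classes
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Hypothetical benefit labels (1 = DCB) and model scores.
y = [1, 1, 1, 0, 0, 1, 0, 1]
s = [0.9, 0.8, 0.4, 0.5, 0.3, 0.7, 0.2, 0.6]
print(round(auc(y, s), 3))  # 0.933
print(bootstrap_auc_ci(y, s))
```

With small cohorts such as the 54-patient test set, intervals of this kind are wide, which is consistent with the broad test-cohort intervals reported in the tables.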
Table 2.
Comparative performance of the Habitat, Habitat1, Habitat2, and Habitat3 models in the train cohort
| model_name | Accuracy | AUC | 95%CI | Sensitivity | Specificity | PPV | NPV | Recall | F1 |
|---|---|---|---|---|---|---|---|---|---|
| LR | |||||||||
| Habitat | 0.749 | 0.744 | 0.688–0.799 | 0.820 | 0.578 | 0.826 | 0.568 | 0.82 | 0.823 |
| Habitat1 | 0.689 | 0.708 | 0.648–0.768 | 0.727 | 0.598 | 0.812 | 0.478 | 0.727 | 0.767 |
| Habitat2 | 0.668 | 0.685 | 0.626–0.744 | 0.709 | 0.569 | 0.800 | 0.446 | 0.709 | 0.752 |
| Habitat3 | 0.669 | 0.654 | 0.590–0.719 | 0.727 | 0.521 | 0.793 | 0.431 | 0.727 | 0.759 |
| SVM | |||||||||
| Habitat | 0.688 | 0.772 | 0.719–0.826 | 0.662 | 0.752 | 0.867 | 0.477 | 0.662 | 0.751 |
| Habitat1 | 0.777 | 0.736 | 0.680–0.792 | 0.922 | 0.430 | 0.795 | 0.697 | 0.922 | 0.854 |
| Habitat2 | 0.709 | 0.295 | 0.236–0.355 | 1.000 | 0.000 | 0.709 | 0.000 | 1.000 | 0.829 |
| Habitat3 | 0.601 | 0.662 | 0.597–0.727 | 0.562 | 0.698 | 0.824 | 0.387 | 0.562 | 0.668 |
| RandomForest | |||||||||
| Habitat | 0.781 | 0.823 | 0.776–0.869 | 0.816 | 0.697 | 0.868 | 0.608 | 0.816 | 0.841 |
| Habitat1 | 0.785 | 0.778 | 0.721–0.835 | 0.812 | 0.720 | 0.874 | 0.616 | 0.812 | 0.842 |
| Habitat2 | 0.741 | 0.771 | 0.719–0.822 | 0.785 | 0.633 | 0.839 | 0.548 | 0.785 | 0.811 |
| Habitat3 | 0.678 | 0.736 | 0.680–0.792 | 0.674 | 0.687 | 0.845 | 0.455 | 0.674 | 0.749 |
| ExtraTrees | |||||||||
| Habitat | 0.816 | 0.865 | 0.825–0.905 | 0.872 | 0.679 | 0.869 | 0.685 | 0.872 | 0.871 |
| Habitat1 | 0.747 | 0.798 | 0.748–0.848 | 0.777 | 0.673 | 0.850 | 0.558 | 0.777 | 0.812 |
| Habitat2 | 0.730 | 0.827 | 0.781–0.873 | 0.683 | 0.844 | 0.914 | 0.523 | 0.683 | 0.782 |
| Habitat3 | 0.698 | 0.763 | 0.709–0.816 | 0.702 | 0.687 | 0.850 | 0.478 | 0.702 | 0.769 |
| LGBM | |||||||||
| Habitat | 0.891 | 0.942 | 0.915–0.968 | 0.891 | 0.890 | 0.952 | 0.770 | 0.891 | 0.920 |
| Habitat1 | 0.837 | 0.923 | 0.895–0.951 | 0.828 | 0.860 | 0.934 | 0.676 | 0.828 | 0.878 |
| Habitat2 | 0.837 | 0.881 | 0.842–0.920 | 0.849 | 0.807 | 0.915 | 0.687 | 0.849 | 0.881 |
| Habitat3 | 0.820 | 0.875 | 0.835–0.916 | 0.814 | 0.833 | 0.925 | 0.640 | 0.814 | 0.866 |
| MLP | |||||||||
| Habitat | 0.720 | 0.774 | 0.722–0.826 | 0.729 | 0.697 | 0.855 | 0.514 | 0.729 | 0.787 |
| Habitat1 | 0.758 | 0.756 | 0.702–0.811 | 0.820 | 0.607 | 0.833 | 0.586 | 0.820 | 0.827 |
| Habitat2 | 0.644 | 0.717 | 0.659–0.775 | 0.592 | 0.771 | 0.863 | 0.437 | 0.592 | 0.702 |
| Habitat3 | 0.689 | 0.659 | 0.594–0.725 | 0.777 | 0.469 | 0.787 | 0.455 | 0.777 | 0.782 |
AUC: area under the receiver operating characteristic curve; CI: confidence interval; PPV: Positive Predictive Value; NPV: Negative Predictive Value; F1: F1 Score; LR: Logistic Regression; SVM: Support Vector Machine; LGBM: Light Gradient Boosting Machine; MLP: Multilayer Perceptron.
Bold values indicate the AUC of the optimal habitat model in the training cohort.
Table 3.
Comparative performance of the Habitat, Habitat1, Habitat2, and Habitat3 models in the validation cohort
| model_name | Accuracy | AUC | 95%CI | Sensitivity | Specificity | PPV | NPV | Recall | F1 |
|---|---|---|---|---|---|---|---|---|---|
| LR | |||||||||
| Habitat | 0.602 | 0.767 | 0.674–0.859 | 0.557 | 0.905 | 0.975 | 0.235 | 0.557 | 0.709 |
| Habitat1 | 0.647 | 0.749 | 0.645–0.853 | 0.630 | 0.762 | 0.944 | 0.242 | 0.630 | 0.756 |
| Habitat2 | 0.388 | 0.678 | 0.565–0.791 | 0.300 | 1.000 | 1.000 | 0.169 | 0.300 | 0.462 |
| Habitat3 | 0.651 | 0.645 | 0.512–0.778 | 0.664 | 0.571 | 0.904 | 0.218 | 0.664 | 0.766 |
| SVM | |||||||||
| Habitat | 0.677 | 0.710 | 0.571–0.850 | 0.671 | 0.714 | 0.940 | 0.246 | 0.671 | 0.783 |
| Habitat1 | 0.699 | 0.697 | 0.561–0.833 | 0.711 | 0.619 | 0.923 | 0.250 | 0.711 | 0.803 |
| Habitat2 | 0.163 | 0.239 | 0.134–0.344 | 0.043 | 1.000 | 1.000 | 0.130 | 0.043 | 0.082 |
| Habitat3 | 0.826 | 0.565 | 0.410–0.719 | 0.906 | 0.333 | 0.892 | 0.368 | 0.906 | 0.899 |
| RandomForest | |||||||||
| Habitat | 0.795 | 0.790 | 0.695–0.886 | 0.814 | 0.667 | 0.942 | 0.350 | 0.814 | 0.874 |
| Habitat1 | 0.756 | 0.745 | 0.629–0.860 | 0.785 | 0.571 | 0.922 | 0.293 | 0.785 | 0.848 |
| Habitat2 | 0.594 | 0.618 | 0.493–0.742 | 0.586 | 0.650 | 0.921 | 0.183 | 0.586 | 0.716 |
| Habitat3 | 0.490 | 0.656 | 0.547–0.766 | 0.422 | 0.905 | 0.964 | 0.204 | 0.422 | 0.587 |
| ExtraTrees | |||||||||
| Habitat | 0.758 | 0.845 | 0.777–0.913 | 0.743 | 0.857 | 0.972 | 0.333 | 0.743 | 0.842 |
| Habitat1 | 0.654 | 0.773 | 0.666–0.879 | 0.622 | 0.857 | 0.966 | 0.261 | 0.622 | 0.757 |
| Habitat2 | 0.806 | 0.622 | 0.477–0.768 | 0.857 | 0.450 | 0.916 | 0.310 | 0.857 | 0.886 |
| Habitat3 | 0.839 | 0.691 | 0.566–0.816 | 0.906 | 0.429 | 0.906 | 0.429 | 0.906 | 0.906 |
| LGBM | |||||||||
| Habitat | 0.720 | 0.789 | 0.695–0.883 | 0.714 | 0.762 | 0.952 | 0.286 | 0.714 | 0.816 |
| Habitat1 | 0.564 | 0.740 | 0.643–0.837 | 0.504 | 0.952 | 0.986 | 0.230 | 0.504 | 0.667 |
| Habitat2 | 0.537 | 0.629 | 0.501–0.757 | 0.507 | 0.750 | 0.934 | 0.179 | 0.507 | 0.657 |
| Habitat3 | 0.423 | 0.579 | 0.468–0.691 | 0.344 | 0.905 | 0.957 | 0.184 | 0.344 | 0.506 |
| MLP | |||||||||
| Habitat | 0.534 | 0.776 | 0.677–0.874 | 0.471 | 0.952 | 0.985 | 0.213 | 0.471 | 0.638 |
| Habitat1 | 0.756 | 0.690 | 0.553–0.828 | 0.785 | 0.571 | 0.922 | 0.293 | 0.785 | 0.848 |
| Habitat2 | 0.406 | 0.605 | 0.478–0.733 | 0.336 | 0.900 | 0.959 | 0.162 | 0.336 | 0.497 |
| Habitat3 | 0.604 | 0.647 | 0.508–0.785 | 0.586 | 0.714 | 0.926 | 0.221 | 0.586 | 0.718 |
AUC: area under the receiver operating characteristic curve; CI: confidence interval; PPV: Positive Predictive Value; NPV: Negative Predictive Value; F1: F1 Score; LR: Logistic Regression; SVM: Support Vector Machine; LGBM: Light Gradient Boosting Machine; MLP: Multilayer Perceptron.
Bold values indicate the AUC of the optimal habitat model in the validation cohort.
Figure 5.
The ROC curves of habitat models constructed based on different machine learning classifiers in the train cohort
Habitat1 (A), Habitat2 (B), Habitat3 (C), and Habitat (D).
Figure 6.
The ROC curves of habitat models constructed based on different machine learning classifiers in the validation cohort
Habitat1 (A), Habitat2 (B), Habitat3 (C), and Habitat (D).
Radiomics model construction and evaluation
We extracted 1,834 features from the tumor VOI. These features were then screened using the same dimensionality reduction method as the habitat analysis, retaining the 2 most representative features (Figure 7). Based on these 2 features, we constructed a radiomics model employing six machine learning classifiers, consistent with the habitat analysis approach.
Figure 7.
Radiomics feature selection
The coefficients for the cross-validation of the LASSO regression of the radiomics model (A); the MSE of LASSO analysis of the radiomics model (B); and final feature weights for the radiomics model (C).
As demonstrated in Figure 8 and Table 4, the performance of the radiomics models constructed using the six machine learning classifiers was generally suboptimal. The AUC values exhibited instability across the three cohorts, with noticeable decreases observed in both the validation and test cohorts. Among the six models, the one based on LightGBM demonstrated relatively acceptable performance, achieving AUC values of 0.766, 0.506, and 0.520 in the train, validation, and test cohorts, respectively.
Figure 8.
The ROC curves of radiomics models constructed based on different machine learning classifiers
Train cohort (A), validation cohort (B), and test cohort (C).
Table 4.
Performance comparison of radiomics models constructed by different machine learning classifiers in different cohorts
| model_name | Accuracy | AUC | 95%CI | Sensitivity | Specificity | PPV | NPV | Recall | F1 |
|---|---|---|---|---|---|---|---|---|---|
| LR | |||||||||
| Train | 0.675 | 0.612 | 0.545–0.678 | 0.748 | 0.495 | 0.783 | 0.446 | 0.783 | 0.748 |
| Val | 0.696 | 0.511 | 0.361–0.660 | 0.736 | 0.429 | 0.896 | 0.196 | 0.896 | 0.736 |
| Test | 0.333 | 0.458 | 0.291–0.624 | 0.077 | 1.000 | 1.000 | 0.294 | 1.000 | 0.077 |
| SVM | |||||||||
| Train | 0.699 | 0.610 | 0.543–0.675 | 0.816 | 0.413 | 0.772 | 0.479 | 0.772 | 0.816 |
| Val | 0.534 | 0.504 | 0.364–0.642 | 0.514 | 0.667 | 0.911 | 0.171 | 0.911 | 0.514 |
| Test | 0.333 | 0.400 | 0.235–0.564 | 0.077 | 1.000 | 1.000 | 0.294 | 1.000 | 0.077 |
| RandomForest | |||||||||
| Train | 0.685 | 0.716 | 0.658–0.772 | 0.737 | 0.560 | 0.803 | 0.466 | 0.803 | 0.737 |
| Val | 0.714 | 0.494 | 0.348–0.640 | 0.764 | 0.381 | 0.892 | 0.195 | 0.892 | 0.764 |
| Test | 0.444 | 0.528 | 0.371–0.684 | 0.256 | 0.933 | 0.909 | 0.326 | 0.909 | 0.256 |
| ExtraTrees | |||||||||
| Train | 0.720 | 0.622 | 0.558–0.686 | 0.868 | 0.358 | 0.767 | 0.527 | 0.767 | 0.868 |
| Val | 0.571 | 0.490 | 0.370–0.608 | 0.579 | 0.524 | 0.890 | 0.157 | 0.890 | 0.579 |
| Test | 0.648 | 0.479 | 0.301–0.656 | 0.795 | 0.267 | 0.738 | 0.333 | 0.738 | 0.795 |
| LGBM | |||||||||
| Train | 0.701 | 0.766 | 0.714–0.817 | 0.703 | 0.697 | 0.850 | 0.490 | 0.850 | 0.703 |
| Val | 0.745 | 0.506 | 0.368–0.644 | 0.814 | 0.286 | 0.884 | 0.187 | 0.884 | 0.814 |
| Test | 0.500 | 0.520 | 0.354–0.684 | 0.385 | 0.800 | 0.833 | 0.333 | 0.833 | 0.385 |
| MLP | |||||||||
| Train | 0.693 | 0.619 | 0.554–0.684 | 0.812 | 0.404 | 0.769 | 0.468 | 0.769 | 0.812 |
| Val | 0.236 | 0.514 | 0.378–0.648 | 0.121 | 1.000 | 1.000 | 0.146 | 1.000 | 0.121 |
| Test | 0.444 | 0.508 | 0.337–0.678 | 0.256 | 0.933 | 0.909 | 0.326 | 0.909 | 0.256 |
AUC: area under the receiver operating characteristic curve; CI: confidence interval; PPV: Positive Predictive Value; NPV: Negative Predictive Value; F1: F1 Score; LR: Logistic Regression; SVM: Support Vector Machine; LGBM: Light Gradient Boosting Machine; MLP: Multilayer Perceptron.
Bold values indicate the AUC of the best-performing model in the training, validation, and test cohorts.
3D deep learning model construction and evaluation
We evaluated three deep learning architectures (DenseNet121, ResNet18, and DenseNet169) to construct end-to-end 3D deep learning models. The 3D deep learning model based on DenseNet121 exhibited the strongest performance, attaining AUC values of 0.733, 0.676, and 0.624 in the train, validation, and test cohorts, respectively (Figure 9; Table 5).
Figure 9.
The ROC curves of 3D deep learning models constructed based on different algorithms
Train cohort (A), validation cohort (B), and test cohort (C).
Table 5.
Performance comparison of 3D deep learning models constructed in different cohorts
| model_name | Accuracy | AUC | 95%CI | Sensitivity | Specificity | PPV | NPV | Recall | F1 |
|---|---|---|---|---|---|---|---|---|---|
| DenseNet121 | |||||||||
| Train | 0.618 | 0.733 | 0.681–0.785 | 0.529 | 0.835 | 0.885 | 0.423 | 0.885 | 0.529 |
| Val | 0.703 | 0.676 | 0.537–0.815 | 0.709 | 0.667 | 0.931 | 0.264 | 0.931 | 0.709 |
| Test | 0.648 | 0.624 | 0.461–0.786 | 0.641 | 0.667 | 0.833 | 0.417 | 0.833 | 0.641 |
| ResNet18 | |||||||||
| Train | 0.715 | 0.770 | 0.721–0.819 | 0.715 | 0.716 | 0.858 | 0.510 | 0.858 | 0.715 |
| Val | 0.561 | 0.710 | 0.589–0.832 | 0.515 | 0.857 | 0.958 | 0.217 | 0.958 | 0.515 |
| Test | 0.463 | 0.491 | 0.316–0.666 | 0.333 | 0.800 | 0.812 | 0.316 | 0.812 | 0.333 |
| DenseNet169 | |||||||||
| Train | 0.578 | 0.549 | 0.485–0.613 | 0.601 | 0.523 | 0.752 | 0.352 | 0.752 | 0.601 |
| Val | 0.348 | 0.525 | 0.391–0.659 | 0.261 | 0.905 | 0.946 | 0.161 | 0.946 | 0.261 |
| Test | 0.704 | 0.395 | 0.218–0.572 | 0.897 | 0.200 | 0.745 | 0.429 | 0.745 | 0.897 |
AUC: area under the receiver operating characteristic curve; CI: confidence intervals; PPV: positive predictive value; NPV: negative predictive value; F1: F1 score
Bold values indicate the AUC of the best-performing model in the training, validation, and test cohorts.
Clinical model construction and evaluation
We conducted a comprehensive univariate analysis of all clinical features and calculated the odds ratios and corresponding p-values for each feature. As shown in Table 6, Age, Gender, Smoking_Status, CEA, CA125, CA153, CA199, CYFRA21_1, SCC, Histopathology, Clinical_stage, and TNM staging were all associated with immunotherapy benefit (all p-values <0.05). However, when these factors were included in the multivariate analysis, only age remained significantly associated with immunotherapy benefit. Subsequently, we incorporated age into the clinical model and constructed it using different classifiers.
Table 6.
Univariate and multivariate analyses of clinical factors in NSCLC patients
| Variable | Univariate OR_UNI | Univariate p_value | Multivariate OR_MULTI | Multivariate p_value |
|---|---|---|---|---|
| Age | 1.015(1.012–1.018) | 0.000 | 1.03(1.015–1.046) | 0.001 |
| Gender | 2.333(1.896–2.872) | 0.000 | 0.833(0.474–1.465) | 0.595 |
| Smoking_Status | 1.788 (1.481–2.158) | 0.000 | 0.941 (0.687–1.289) | 0.752 |
| CEA | 2.226 (1.696–2.921) | 0.000 | 1.021(0.666–1.564) | 0.936 |
| CA125 | 2.908(1.586–2.776) | 0.000 | 0.961(0.615–1.501) | 0.882 |
| CA153 | 2.174(1.436–3.290) | 0.002 | 1.047(0.619–1.770) | 0.886 |
| CA199 | 1.750(1.104–2.776) | 0.046 | 0.835(0.482–1.445) | 0.589 |
| CYFRA21_1 | 2.133(1.649–2.759) | 0.000 | 0.859(0.550–1.343) | 0.576 |
| SCC | 2.462(1.679–3.607) | 0.000 | 0.966(0.572–1.632) | 0.913 |
| Histopathology | 1.779(1.560–2.028) | 0.000 | 1.045(0.730–1.496) | 0.841 |
| Clinical_stage | 2.064(1.679–2.537) | 0.000 | 0.586(0.120–2.852) | 0.578 |
| T | 1.260(1.195–1.328) | 0.000 | 0.964(0.783–1.186) | 0.773 |
| N | 1.364(1.265–1.470) | 0.000 | 0.953(0.776–1.192) | 0.722 |
| M | 2.064(1.679–2.537) | 0.000 | 0.965(0.197–4.721) | 0.970 |
OR_UNI: the OR of univariate analysis; OR_MULTI: the OR of multivariate analysis.
Bold p-values indicate statistical significance (p < 0.05) in the multivariate analysis.
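As a reading aid for the odds ratios in Table 6, the sketch below computes an odds ratio and its Wald 95% confidence interval from a 2×2 table using the standard library. For a single binary predictor, this cross-product ratio equals the odds ratio from a univariate logistic regression; the multivariate estimates require a fitted regression model and are not reproduced here. The counts and the function name `odds_ratio_ci` are hypothetical and are not taken from the study data.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio with a Wald confidence interval from a 2x2 table.

    a: factor present, benefit      b: factor present, no benefit
    c: factor absent, benefit       d: factor absent, no benefit
    All four counts must be non-zero for this formula.
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts, for illustration only.
or_, lo, hi = odds_ratio_ci(a=30, b=10, c=20, d=20)
print(f"OR = {or_:.3f} ({lo:.3f}-{hi:.3f})")
```

An interval that excludes 1 corresponds to p < 0.05 for the Wald test, which is how the significance of a single factor in such a table is usually read.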
Figure 10 displays the ROC curves of the clinical models built with different machine learning classifiers. Comparing AUC values across the train, validation, and test cohorts, the LR-clinical model emerged as the optimal clinical model, with AUC values of 0.625, 0.608, and 0.526 in the train, validation, and external test cohorts, respectively. Table 7 provides a detailed comparison of the performance metrics of models constructed with different classifiers.
Figure 10.
Clinical feature selection and model construction
Forest plot of clinical univariate analysis (A), forest plot of multivariable analysis (B), heatmap of clinical variables' correlations (C), the ROC curves of clinical models across machine learning classifiers on train cohort (D), validation cohort (E), and testing cohort (F).
Table 7.
Performance comparison of clinical models constructed by different machine learning classifiers in different cohorts
| model_name | Accuracy | AUC | 95%CI | Sensitivity | Specificity | PPV | NPV | Recall | F1 |
|---|---|---|---|---|---|---|---|---|---|
| LR | |||||||||
| Train | 0.600 | 0.625 | 0.564–0.685 | 0.571 | 0.670 | 0.809 | 0.39 | 0.571 | 0.670 |
| Val | 0.528 | 0.608 | 0.484–0.733 | 0.500 | 0.714 | 0.921 | 0.176 | 0.500 | 0.648 |
| Test | 0.593 | 0.526 | 0.355–0.698 | 0.564 | 0.667 | 0.815 | 0.37 | 0.564 | 0.667 |
| SVM | |||||||||
| Train | 0.795 | 0.835 | 0.793–0.878 | 0.808 | 0.761 | 0.892 | 0.619 | 0.808 | 0.848 |
| Val | 0.640 | 0.639 | 0.508–0.771 | 0.636 | 0.667 | 0.927 | 0.215 | 0.636 | 0.754 |
| Test | 0.352 | 0.441 | 0.271–0.611 | 0.128 | 0.933 | 0.833 | 0.292 | 0.128 | 0.222 |
| RandomForest | |||||||||
| Train | 0.749 | 0.754 | 0.702–0.807 | 0.827 | 0.560 | 0.821 | 0.570 | 0.827 | 0.824 |
| Val | 0.379 | 0.580 | 0.461–0.698 | 0.293 | 0.952 | 0.976 | 0.168 | 0.293 | 0.451 |
| Test | 0.407 | 0.516 | 0.340–0.692 | 0.179 | 1.000 | 1.000 | 0.319 | 0.179 | 0.304 |
| ExtraTrees | |||||||||
| Train | 0.568 | 0.698 | 0.641–0.754 | 0.440 | 0.881 | 0.900 | 0.392 | 0.440 | 0.591 |
| Val | 0.410 | 0.556 | 0.454–0.659 | 0.321 | 1.000 | 1.000 | 0.181 | 0.321 | 0.486 |
| Test | 0.704 | 0.574 | 0.392–0.757 | 0.821 | 0.400 | 0.780 | 0.462 | 0.821 | 0.800 |
| LGBM | |||||||||
| Train | 0.456 | 0.622 | 0.566–0.678 | 0.271 | 0.908 | 0.878 | 0.338 | 0.271 | 0.414 |
| Val | 0.410 | 0.614 | 0.499–0.729 | 0.350 | 0.810 | 0.925 | 0.157 | 0.350 | 0.508 |
| Test | 0.593 | 0.513 | 0.351–0.674 | 0.692 | 0.333 | 0.730 | 0.294 | 0.692 | 0.711 |
| MLP | |||||||||
| Train | 0.643 | 0.716 | 0.659–0.773 | 0.609 | 0.725 | 0.844 | 0.432 | 0.609 | 0.707 |
| Val | 0.509 | 0.646 | 0.523–0.769 | 0.464 | 0.810 | 0.942 | 0.185 | 0.464 | 0.622 |
| Test | 0.556 | 0.537 | 0.356–0.718 | 0.487 | 0.733 | 0.826 | 0.355 | 0.487 | 0.613 |
AUC: area under the receiver operating characteristic curve; CI: confidence intervals; PPV: positive predictive value; NPV: negative predictive value; F1: F1 score; LR: logistic regression; SVM: Support Vector Machine; LGBM: light gradient boosting machine; MLP: multilayer perceptron.
Bold values indicate the AUC of the best-performing model in the training, validation, and test cohorts.
Comparison of the habitat, radiomics, 3D deep learning, and clinical models
As shown in Figure 11 and Table 8, among all the models included for comparison, the Habitat-ExtraTrees model demonstrated the best performance in predicting the benefit of immunotherapy for lung cancer, with AUC values of 0.865, 0.845, and 0.814 in the train, validation, and test cohorts, respectively. The DeLong test results indicated that, except for the absence of a statistically significant difference in AUC between Habitat-ExtraTrees and DenseNet121 in the validation cohort (p = 0.062), the model exhibited statistically significant differences in AUC compared to the Radiomics, 3D DeepLearning, and Clinical models in all other cohorts.
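The DeLong test used for these comparisons assesses whether two AUCs measured on the same patients differ. A compact standard-library sketch of the standard two-model formulation is given below, assuming binary labels coded 0/1. It is an illustration of the technique rather than the authors' implementation, and the labels and scores are hypothetical.

```python
import math


def _placements(pos, neg):
    """Per-case placement values (structural components) for one model."""
    def psi(x, y):
        return 1.0 if x > y else 0.5 if x == y else 0.0
    v10 = [sum(psi(x, y) for y in neg) / len(neg) for x in pos]
    v01 = [sum(psi(x, y) for x in pos) / len(pos) for y in neg]
    return v10, v01


def _cov(a, b):
    """Sample covariance with an (n - 1) denominator."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def delong_test(labels, scores_a, scores_b):
    """Two-sided DeLong test for two correlated AUCs.

    Returns (auc_a, auc_b, z, p). z and p are NaN if the variance is zero.
    """
    pos_idx = [i for i, y in enumerate(labels) if y == 1]
    neg_idx = [i for i, y in enumerate(labels) if y == 0]
    m, n = len(pos_idx), len(neg_idx)
    v10a, v01a = _placements([scores_a[i] for i in pos_idx],
                             [scores_a[i] for i in neg_idx])
    v10b, v01b = _placements([scores_b[i] for i in pos_idx],
                             [scores_b[i] for i in neg_idx])
    auc_a, auc_b = sum(v10a) / m, sum(v10b) / m
    # Variance of the AUC difference from positive and negative components.
    var = ((_cov(v10a, v10a) + _cov(v10b, v10b) - 2 * _cov(v10a, v10b)) / m
           + (_cov(v01a, v01a) + _cov(v01b, v01b) - 2 * _cov(v01a, v01b)) / n)
    if var <= 0:
        return auc_a, auc_b, float("nan"), float("nan")
    z = (auc_a - auc_b) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return auc_a, auc_b, z, p


# Hypothetical labels and scores from two models on the same eight patients.
y = [1, 1, 1, 1, 0, 0, 0, 0]
model_a = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
model_b = [0.9, 0.3, 0.7, 0.2, 0.8, 0.4, 0.6, 0.1]
print(delong_test(y, model_a, model_b))
```

Because both models are scored on the same patients, the covariance terms matter: ignoring them would overstate the variance of the difference whenever the two models' scores are positively correlated.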
Figure 11.
The best comparison of image models
Comparison of ROC curves among different models in the train, validation, and test cohorts (A), comparison of DeLong tests among different models AUC in the train (B), validation (C), and test cohorts (D).
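As a rough illustration of how two models' AUCs can be compared on the same patients, the sketch below computes the AUC from pairwise rankings and a percentile confidence interval for the AUC difference using a paired bootstrap. This is not the DeLong test used in the study; the bootstrap is swapped in because it fits in a few lines of standard-library Python. The function names, the toy labels, and the toy scores are all assumptions made for illustration.

```python
import random


def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def paired_bootstrap_auc_diff(labels, scores_a, scores_b, n_boot=2000, seed=0):
    """Percentile 95% CI for AUC(a) - AUC(b), resampling patients with replacement."""
    rng = random.Random(seed)
    n = len(labels)
    diffs = []
    while len(diffs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        y = [labels[i] for i in idx]
        if len(set(y)) < 2:
            continue  # this resample lacks one class, so the AUC is undefined
        a = auc(y, [scores_a[i] for i in idx])
        b = auc(y, [scores_b[i] for i in idx])
        diffs.append(a - b)
    diffs.sort()
    return diffs[int(0.025 * n_boot)], diffs[int(0.975 * n_boot) - 1]


# Toy data (hypothetical): 1 = benefit, 0 = no benefit.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores_a = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]  # separates the classes perfectly
scores_b = [0.9, 0.3, 0.7, 0.2, 0.8, 0.4, 0.6, 0.1]  # ranks only 9 of 16 pairs correctly

ci_low, ci_high = paired_bootstrap_auc_diff(labels, scores_a, scores_b)
print(auc(labels, scores_a), auc(labels, scores_b), (ci_low, ci_high))
```

Resampling the same patient indices for both models keeps the comparison paired, which is also the setting the DeLong test addresses.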
Table 8.
Comparative performance of the Habitat, Radiomics, 3D DeepLearning, and clinical models
| model_name | Accuracy | AUC | 95%CI | Sensitivity | Specificity | PPV | NPV | Recall | F1 |
|---|---|---|---|---|---|---|---|---|---|
| Train | |||||||||
| Clinical-LR | 0.600 | 0.625 | 0.564–0.685 | 0.571 | 0.670 | 0.809 | 0.390 | 0.571 | 0.670 |
| Rad-LGBM | 0.701 | 0.766 | 0.714–0.817 | 0.703 | 0.697 | 0.850 | 0.490 | 0.850 | 0.703 |
| DenseNet121 | 0.618 | 0.733 | 0.681–0.785 | 0.529 | 0.835 | 0.885 | 0.423 | 0.885 | 0.529 |
| Habitat-ExtraTrees | 0.816 | 0.865 | 0.825–0.905 | 0.872 | 0.679 | 0.869 | 0.685 | 0.872 | 0.871 |
| Val | |||||||||
| Clinical-LR | 0.528 | 0.608 | 0.484–0.733 | 0.500 | 0.714 | 0.921 | 0.176 | 0.500 | 0.648 |
| Rad-LGBM | 0.745 | 0.506 | 0.368–0.644 | 0.814 | 0.286 | 0.884 | 0.187 | 0.884 | 0.814 |
| DenseNet121 | 0.703 | 0.676 | 0.537–0.815 | 0.709 | 0.667 | 0.931 | 0.264 | 0.931 | 0.709 |
| Habitat-ExtraTrees | 0.758 | 0.845 | 0.777–0.913 | 0.743 | 0.857 | 0.972 | 0.333 | 0.743 | 0.842 |
| Test | |||||||||
| Clinical-LR | 0.593 | 0.526 | 0.355–0.698 | 0.564 | 0.667 | 0.815 | 0.37 | 0.564 | 0.667 |
| Rad-LGBM | 0.500 | 0.520 | 0.354–0.684 | 0.385 | 0.800 | 0.833 | 0.333 | 0.833 | 0.385 |
| DenseNet121 | 0.648 | 0.624 | 0.461–0.786 | 0.641 | 0.667 | 0.833 | 0.417 | 0.833 | 0.641 |
| Habitat-ExtraTrees | 0.648 | 0.814 | 0.697–0.930 | 0.513 | 1.000 | 1.000 | 0.441 | 0.513 | 0.678 |
AUC: area under the receiver operating characteristic curve; CI: confidence intervals; PPV: positive predictive value; NPV: negative predictive value; F1: F1 score; Clinical-LR: a clinical model built with logistic regression; Rad-LGBM: a radiomics model built with LightGBM; DenseNet121: a 3D deep learning model built with the DenseNet121 architecture; Habitat-ExtraTrees: a habitat analysis model built with ExtraTrees.
Bold values denote the top performance for each metric per cohort. Notably, the Habitat-ExtraTrees model demonstrates the highest AUC in all three cohorts.
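The metrics in these tables all follow from the four cells of a confusion matrix. The sketch below shows the standard definitions. The counts used in the example (20 true positives, 0 false positives, 15 true negatives, 19 false negatives) are not reported in the article; they are back-calculated here, as an assumption, from the 54-patient test cohort and the Habitat-ExtraTrees test-row rates, and they happen to reproduce that row to three decimals. The function name `binary_metrics` is likewise illustrative.

```python
def _ratio(num, den):
    """Return num/den, or NaN when the denominator is zero."""
    return num / den if den else float("nan")


def binary_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts."""
    sensitivity = _ratio(tp, tp + fn)          # also called recall
    specificity = _ratio(tn, tn + fp)
    ppv = _ratio(tp, tp + fp)                  # positive predictive value (precision)
    npv = _ratio(tn, tn + fn)                  # negative predictive value
    accuracy = _ratio(tp + tn, tp + fp + tn + fn)
    f1 = _ratio(2 * tp, 2 * tp + fp + fn)      # harmonic mean of PPV and sensitivity
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "recall": sensitivity,
        "f1": f1,
    }


# Assumed counts, back-calculated from the reported test-cohort rates (not from the source data).
metrics = binary_metrics(tp=20, fp=0, tn=15, fn=19)
print({name: round(value, 3) for name, value in metrics.items()})
```

Recall is by definition the same quantity as sensitivity, which is why the function returns one value under both names.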
Comparison of the habitat, radiomics, 3D deep learning, and clinical models with the PD-L1 expression models
To further validate the clinical value of the Habitat model in predicting the efficacy of immunotherapy in NSCLC, we enrolled 146 patients with NSCLC from Center A who had undergone PD-L1 immunohistochemical testing. We systematically compared the predictive performance of the best-performing Habitat model, radiomics (Rad) model, 3D deep learning (3D-DL) model, and clinical model with that of the PD-L1 expression models (positive/negative and high/low expression groupings). The Habitat model was developed using the ExtraTrees classifier, the Rad model using the LGBM classifier, the 3D-DL model using the DenseNet121 architecture, and the clinical model using the LR classifier. As shown in Figure 12 and Table 9, the Habitat model demonstrated superior performance across multiple key metrics, achieving an accuracy of 0.803 and an AUC of 0.889 (95% CI: 0.835–0.942), while maintaining high sensitivity (0.800) and specificity (0.809). The 3D-DL model had the second-highest AUC (0.714) and high specificity (0.894), though its overall discriminative performance remained lower than that of the Habitat model. In contrast, both PD-L1 expression models (PDL1-PN and PDL1-HL) showed limited predictive capability, with AUCs of 0.570 and 0.535, respectively.
Figure 12.
Comparative Performance of the Habitat, Radiomics, 3D DeepLearning, Clinical, and PD-L1 Models
Rad: Radiomics model; 3D-DL: the 3D deep learning model constructed using the DenseNet121 architecture; PDL1-PN: PD-L1 positive/negative expression; PDL1-HL: PD-L1 high/low expression.
Table 9.
Comparative performance of the Habitat, Radiomics, 3D DeepLearning, clinical, and PD-L1 models
| model_name | Accuracy | AUC | 95%CI | Sensitivity | Specificity | PPV | NPV | Recall | F1 |
|---|---|---|---|---|---|---|---|---|---|
| Clinical | 0.476 | 0.571 | 0.474–0.666 | 0.271 | 0.915 | 0.871 | 0.371 | 0.27 | 0.412 |
| Rad | 0.707 | 0.582 | 0.471–0.691 | 0.813 | 0.489 | 0.771 | 0.548 | 0.81 | 0.790 |
| Habitat | 0.803 | 0.889 | 0.835–0.942 | 0.800 | 0.809 | 0.899 | 0.655 | 0.80 | 0.847 |
| 3D-DL | 0.626 | 0.714 | 0.629–0.798 | 0.501 | 0.894 | 0.909 | 0.457 | 0.50 | 0.645 |
| PDL1-PN | 0.592 | 0.570 | 0.483–0.656 | 0.632 | 0.511 | 0.733 | 0.393 | 0.63 | 0.677 |
| PDL1-HL | 0.429 | 0.535 | 0.466–0.603 | 0.245 | 0.830 | 0.750 | 0.339 | 0.24 | 0.364 |
AUC: area under the receiver operating characteristic curve; CI: confidence intervals; PPV: positive predictive value; NPV: negative predictive value; F1: F1 score; Rad: radiomics model; 3D-DL: the 3D deep learning model constructed using the DenseNet121 architecture; PDL1-PN: PD-L1 positive/negative expression; PDL1-HL: PD-L1 high/low expression.
Model explainability
To further enhance the interpretability of the Habitat model, we employed the SHAP algorithm to quantify the contribution of each radiomics feature to the final prediction. The Habitat model was constructed based on the ExtraTrees classifier. As illustrated in Figure 13, the SHAP summary plot provides a global interpretation of the ExtraTrees_Habitat model. This analysis identifies wavelet_HHL_firstorder_Maximum_h1_CT and square_firstorder_Median_h2_CT as critical predictive radiomic features for determining response to anti-PD-1/PD-L1 immunotherapy in patients with NSCLC. Figure 14 demonstrates the application of this ExtraTrees classifier-based Habitat model to an individual patient case: a 62-year-old male who achieved clinical benefit following 6 months of immunotherapy. Notably, the top-ranked predictors for this patient’s favorable outcome were also wavelet_HHL_firstorder_Maximum_h1_CT and square_firstorder_Median_h2_CT.
Figure 13.
Global interpretation of the Habitat-ExtraTrees Model
Bee swarm plot of all features in the ExtraTrees-Habitat model in the training cohort (A), Bar plot of radiomics features importance for the ExtraTrees-Habitat model in the training cohort (B). The wavelet_HHL_firstorder_Maximum_h1_CT and square_firstorder_Median_h2_CT were identified as the most significant features, with larger negative values exerting stronger inhibitory effects on the predicted outcomes.
Figure 14.
Case-specific interpretation of the Habitat-ExtraTrees Model
Visualization of the original lung tumor image, segmentation map, and habitat regions (A), waterfall plot of the model for a single sample (B), and force plot of the model for a single sample (C).
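To make the idea behind the SHAP attributions in Figures 13 and 14 concrete, the sketch below computes exact Shapley values by brute force for a tiny stand-in model. It is an illustration of the underlying game-theoretic definition only: the study's analysis was run on an ExtraTrees classifier, whereas the toy `predict` function, the three unnamed features, and the all-zero baseline here are assumptions, and "removing" a feature is approximated by replacing it with its baseline value.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict(x) relative to predict(baseline).

    Features outside a coalition are set to their baseline value.
    Cost grows as 2^n, so this is only practical for a handful of features.
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for k in range(n):
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def predict(z):
    """Toy score (hypothetical): two main effects plus one interaction."""
    return 0.5 + 0.3 * z[0] - 0.2 * z[1] + 0.1 * z[0] * z[2]


x = [1.0, 1.0, 1.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(predict, x, baseline)
print(phi)
# The contributions sum to the gap between the prediction and the baseline prediction.
print(sum(phi), predict(x) - predict(baseline))
```

A positive value pushes the prediction above the baseline and a negative value pushes it below, which is how the signed bars of a waterfall or force plot are read.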
Discussion
Although anti-PD-1/PD-L1 immunotherapy can significantly improve the OS and PFS of patients with advanced NSCLC, only a subset of patients currently responds to this treatment strategy. This study therefore aimed to construct a habitat model from CT images of patients with lung cancer, with the goal of establishing a non-invasive, robust, and safe biomarker to predict whether patients with NSCLC can benefit from immunotherapy. To rigorously evaluate the model’s generalization ability, it was trained exclusively on contrast-enhanced CT images and subsequently validated on an independent external test set consisting of non-contrast CT images. We used the K-Means algorithm to divide the tumor VOI into three sub-regions and extracted radiomics features from them. Subsequently, we constructed habitat models based on six machine learning classifiers, namely LR, SVM, RF, ExtraTrees, LGBM, and MLP. A comprehensive comparison showed that the Habitat model built with the ExtraTrees classifier had the best performance, with AUC values of 0.865, 0.845, and 0.814 in the training set, validation set, and test set, respectively.
To further validate the predictive efficacy of the Habitat model, we simultaneously constructed a radiomics model and a 3D deep learning model based on the tumor VOI. The comprehensive comparison results indicated that the habitat model performed best in multiple evaluations. To more comprehensively assess whether this model could replace tumor tissue PD-L1 expression for predicting the benefits of immunotherapy, we enrolled 146 patients who underwent PD-L1 testing at Center A and systematically compared the Clinical-LR, Habitat-ExtraTrees, and radiomics-LGBM models, the 3D deep learning model based on the DenseNet121 architecture, and the PD-L1 expression models (including positive/negative and high/low expression groups). The results showed that the Habitat model performed excellently in several key metrics, with an accuracy of 0.803 and an AUC of 0.889, while also maintaining high sensitivity (0.800) and specificity (0.809).
We noticed that the performance of the habitat model declined on the external test set, specifically in sensitivity and accuracy. Several factors may explain this degradation. First, the unbalanced distribution of the outcome measure DCB in the dataset is a potential major factor. In this study, the DCB ratios differed markedly among the training set, validation set, and external test set (71%, 87%, and 72%, respectively). A higher DCB ratio in the internal validation set may cause the model to over-learn to identify DCB-positive cases during the optimization process, thereby resulting in an inflated sensitivity under this distribution. However, when the model was applied to an external test set where the DCB ratio dropped back to a level close to that of the training set, it showed a decline in the recognition ability for non-beneficial cases, and the sensitivity also decreased accordingly. Second, the model was developed using contrast-enhanced CT scans from Center A and tested using non-contrast CT scans from Center B. Because the model depended heavily on contrast uptake patterns during training, the absence of contrast enhancement in the test set may reduce the discriminative power of certain imaging features, thereby compromising the model’s classification stability and resulting in decreased sensitivity. Third, the heterogeneity in the selection of immunotherapy drugs among different centers may introduce uncontrollable confounding factors, further affecting the consistency of patients' immune response patterns and thereby weakening the predictive stability of the model in external cohorts.
Both conventional radiomics and deep learning have demonstrated promising progress in predicting the efficacy of immunotherapy for lung cancer. Numerous scholars have developed radiomics models by extracting CT features before lung cancer treatment.16,19 Their findings demonstrate that, compared to tumor PD-L1 expression levels, radiomics models constructed based on CT imaging features exhibit superior efficacy in predicting anti-PD-L1 immunotherapy over conventional PD-L1 expression assessment methods. The radiomics models developed by the teams of Zhu et al.16 and Wu et al.19 achieved AUC values of 0.800 and 0.752 in the test set, respectively. In this study, we also attempted to construct radiomics models based on six machine learning classifiers (including LR, SVM, RF, ExtraTrees, LGBM, and MLP), but their prediction performance was generally poor. In the test sets, the AUC of most models was only in the range of 0.400–0.528, and the sensitivity and specificity were also unsatisfactory. Compared with the approach of Wu et al., who expanded the tumor VOI to include a 20 mm peritumoral area to capture more peri-tumoral information, our study, while incorporating some peri-tumoral context in the VOI delineation, defined a notably smaller peritumoral space. This difference in spatial coverage of the tumor microenvironment may have limited the extraction of informative radiomic features related to the tumor’s interaction with its surroundings, thereby contributing to the comparatively lower model performance. In addition, for multiple lesions, Zhu et al. retained up to five lesions according to the longest diameter of the lesions and used the multiple instance learning (MIL) method to adaptively weight the prediction results of each lesion, thus constructing a comprehensive radiomics model. In contrast, this study only constructed models based on single lesions and failed to fully capture the spatial heterogeneity and multifocal characteristics of tumors. 
A radiomics strategy combined with multiple instance learning may therefore evaluate the potential benefits of lung cancer immunotherapy more comprehensively and accurately.
Compared with traditional radiomics methods, deep learning can automatically extract higher-level and more abstract imaging features by learning from large-scale data, and has stronger representational ability.28 Leveraging the advantage of deep learning, our study constructed 3D deep learning models using three neural network architectures: DenseNet121, ResNet18, and DenseNet169. Among them, the model based on DenseNet121 performed relatively well, with AUC values of 0.733, 0.676, and 0.624 in the training set, validation set, and test set, respectively. Compared with the previously constructed traditional radiomics models, the deep learning models demonstrated certain improvements in prediction performance. Similarly, the SimTA deep learning model developed by Yang et al.17 demonstrated excellent performance in predicting DCB of immunotherapy for patients with NSCLC with an AUC of 0.80 (95% CI: 0.74–0.86). This model also showed effective predictive ability for patients' OS and PFS. Additionally, Wang et al.29 constructed a model integrating deep learning and radiomics features to predict the survival outcomes of patients with NSCLC after immunotherapy. Their research results also indicated that the predictive performance of the deep learning model was superior to that of traditional radiomics models.
We found that although deep learning models have more advantages than traditional radiomics in predicting the immunotherapeutic benefits of lung cancer, both typically regard tumors as homogeneous entities and fail to fully consider their spatial heterogeneity. This heterogeneity is usually manifested as irregular tumor growth patterns and changes in cell distribution and density, which profoundly affect the treatment outcomes.30 To address this challenge, researchers have proposed a novel radiomics approach—Habitat analysis. Habitat analysis identifies voxels with similar greyscale intensities in medical images and subsequently divides the tumor into several subregions for radiomic feature extraction.24 This method effectively simulates the spatial heterogeneity of tumors and enables accurate prediction of treatment response and prognostic evaluation.31,32
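The clustering step behind habitat analysis can be sketched as follows: voxels inside the tumor VOI are grouped by similarity, and each group becomes a subregion from which features are extracted separately. The example below is a minimal one-dimensional K-means on voxel intensities with three clusters, written with the standard library only. Several points are assumptions rather than the study's implementation: clustering uses intensity alone, the centers are initialized at evenly spaced quantiles to keep the result deterministic, and the intensity values are invented.

```python
def _assign(values, centers):
    """Index of the nearest center for every value."""
    return [min(range(len(centers)), key=lambda c: abs(v - centers[c])) for v in values]


def kmeans_1d(values, k=3, n_iter=100):
    """Minimal 1-D K-means; returns (labels, centers) with clusters ordered by intensity."""
    ordered = sorted(values)
    n = len(ordered)
    # Deterministic quantile initialization (a simplification chosen for this sketch).
    centers = [ordered[int((i + 0.5) / k * n)] for i in range(k)]
    for _ in range(n_iter):
        labels = _assign(values, centers)
        updated = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            # Keep the old center if a cluster ends up empty.
            updated.append(sum(members) / len(members) if members else centers[c])
        converged = updated == centers
        centers = updated
        if converged:
            break
    labels = _assign(values, centers)
    # Relabel so that habitat 0 has the lowest mean intensity.
    order = sorted(range(k), key=lambda c: centers[c])
    rank = {old: new for new, old in enumerate(order)}
    return [rank[lab] for lab in labels], [centers[c] for c in order]


# Hypothetical voxel intensities (HU) sampled from inside one tumor VOI.
voxels = [-20, -15, -10, 30, 35, 40, 90, 95, 100]
habitat_labels, habitat_centers = kmeans_1d(voxels, k=3)
print(habitat_labels, habitat_centers)
```

In a full pipeline each label would be mapped back to its voxel position to form a subregion mask, and radiomic features would then be extracted per mask.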
Cai et al.33 developed deep learning (DL) and Habitat models based on enhanced lung CT images to evaluate the DCB of immunotherapy for NSCLC. In the test set, the AUC of the DL model was 0.631 (95% CI:0.517–0.735), and the AUC of Habitat4 was 0.781 (95% CI:0.676–0.865). The Habitat model was slightly superior to deep learning. By fusing the DL and Habitat models, the model performance was improved, with an AUC of 0.865 (95%CI:0.772–0.931) in the test set. Similarly, Ye et al.34 developed a Habitat model utilizing arterial-phase contrast-enhanced CT images to forecast the pathological complete response of patients with NSCLC undergoing neoadjuvant immunochemotherapy. In comparison to the conventional radiomics model, the Habitat model exhibited superior performance, achieving an AUC of 0.781 (95% CI: 0.673–0.889) in the external validation set, while the AUC of the traditional radiomics model was 0.723 (95% CI: 0.591–0.855). These findings suggest that habitat analysis holds promise for assessing the efficacy of lung cancer treatments. Although CT examination is a routine method for patients with lung cancer, contrast-enhanced chest CT is not mandatory for elderly patients with advanced-stage disease. Additionally, patients with renal insufficiency are generally not recommended to undergo CT scans with iodine contrast agents. Based on these clinical considerations, our study aimed to develop a Habitat model using contrast-enhanced CT for training, and rigorously evaluate its generalizability on an independent external test set consisting of non-contrast CT images. This design directly addresses the common clinical scenario where contrast-enhanced CT may be contraindicated or unavailable. To comprehensively evaluate the performance of the machine learning-based habitat analysis model, we employed six different classifiers for comparative analysis, avoiding the limitations of a single classifier. 
Our study also shows that the habitat model outperforms the deep learning model and the traditional radiomics model in predicting the benefits of NSCLC immunotherapy. On the test set, the AUCs of ExtraTrees-Habitat, DenseNet121, and radiomics-LGBM are 0.814 (95% CI:0.697–0.930), 0.624 (95% CI:0.461–0.786), and 0.520 (95% CI:0.354–0.684), respectively. Although previous studies by Cai and Ye established habitat models based on contrast-enhanced CT, our work further demonstrates that a model trained on contrast-enhanced CT can maintain robust predictive performance when applied to non-contrast CT images. This supports the potential clinical translation of habitat analysis across varying imaging protocols.
However, in this study, both the traditional radiomics model and the deep learning model constructed from the same non-contrast CT scans performed poorly, with the radiomics model particularly showing limited discriminative ability. We speculate that the reasons may include the following: non-contrast images lack the hemodynamic information provided by contrast agents, and their relatively lower image contrast and signal-to-noise ratio restrict the ability of handcrafted features to characterize biological behaviors; meanwhile, the deep learning model may have been constrained by the training sample size and model architecture, preventing it from effectively extracting predictive deep features. In contrast, the habitat model, by characterizing the heterogeneity distribution within different subregions of the tumor, reduces reliance on absolute greyscale values to some extent. This approach allows it to maintain relatively robust performance even on non-contrast CT images.
To improve the clinical interpretability of the key features of the habitat model in predicting the benefit of immunotherapy for lung cancer, we used the game theory-based SHAP analysis framework. This methodology uniformly assesses the direction and magnitude of each feature’s contribution to individual predictions, elucidating the specific impact of each feature on the model’s decision-making process.35 Through SHAP analysis, we identified two key Habitat features that were significantly associated with the benefit of immunotherapy in NSCLC: wavelet_HHL_firstorder_Maximum_h1_CT and square_firstorder_Median_h2_CT. Formally, both are first-order features. Traditionally, first-order features describe only the distribution of pixel grey values within an image region, without considering their spatial relationships.36 Although both features belong to the first-order category, neither is derived directly from the greyscale information of the original image. They are extracted from different tumor subregions after applying two image filters, the wavelet transform and the square filter, respectively. More importantly, SHAP analysis reveals significant differences in their contributions to the model’s final prediction. This finding supports our earlier hypothesis that the advantage of the Habitat model does not depend on the original, absolute CT grey values but rather stems from its ability to extract and parse deeper image features.
Specifically, the wavelet transform decomposes the image into components of different frequencies and orientations, capturing texture information that is difficult to recognize by eye or with conventional features.14 This enhances the details of the heterogeneous structures inside the tumor (Habitat1), while the square filter enhances the subtle density contrasts of the peritumoral region (Habitat2) through a nonlinear transformation.37 This suggests that our habitat model reduces direct reliance on absolute grey values and instead captures the filter-enhanced changes in grey-level distribution and the implicit spatial structure within different regions. Importantly, this mechanism offers a plausible explanation for why our model still performs well on unenhanced CT images. Without the hemodynamic information provided by contrast agents, the discriminative power of conventional CT features is limited; the image transformations described above indirectly “mine” and “amplify” the grey-level changes and texture features in different regions of the image.
From a biological perspective, the constructed subregions in this study include Habitat1 and Habitat3, which originate from the tumor core region, while Habitat2 corresponds to the tumor invasive margin. The results of the swarm plot in Figure 13A demonstrate that lower values of the feature wavelet_HHL_firstorder_Maximum_h1_CT are more likely to predict poor prognosis. The underlying biological mechanism may lie in the fact that a lower greyscale maximum in this region reflects the absence of high-density cellular structures, which could be attributed to two potential reasons: first, the lack of high-density lymphocyte aggregation; second, a reduced number of tumor cells accompanied by extensive necrotic areas. Insufficient lymphocyte infiltration in the tumor core suggests that immune cells fail to effectively penetrate and eliminate tumor cells, manifesting as an “immune-excluded” or “immune-desert” phenotype,38 which is typically associated with resistance to immunotherapy.39,40 Additionally, tumor necrosis is often accompanied by hypoxia, enhanced glycolysis, and the formation of an immunosuppressive microenvironment, further impairing the response to immunotherapy.
On the other hand, the feature square_firstorder_Median_h2_CT originates from the tumor invasive margin region (Habitat2). Its median value reflects the central tendency of pixel grey values in this region, and a higher feature value tends to predict a positive treatment outcome. Previous studies have demonstrated that the texture features of the peritumoral region in lung cancer possess predictive value for immunotherapy efficacy. For instance, the model established by Khorrami et al.41 extracted radiomic features from both intratumoral and peritumoral regions and found that peritumoral textures could effectively distinguish immunotherapy responders from non-responders. Their preliminary research in breast cancer42 also suggested that peritumoral imaging features could be used to assess treatment response. This study further reveals that first-order texture features based on the invasive margin region can similarly predict immunotherapy efficacy, although no prior studies have reported characterization using first-order category features. We hypothesize that the increase in Median values may be associated with enhanced lymphocyte infiltration within the invasive margin, where elevated cellular density leads to an overall rise in grey values on CT imaging. The tumor invasive margin is often accompanied by abundant tumor-infiltrating lymphocytes and tertiary lymphoid structure formation,43,44 indicating a more active immune microenvironment, which may contribute to improved responses to immunotherapy. As this study employed a retrospective design and most enrolled cases were derived from biopsy specimens, we were unable to obtain whole-tumor pathological sections for comparative analysis. This limitation hindered our ability to directly validate the correspondence between the constructed imaging habitat subregions and the spatial cellular distribution structure of the tumor at the histological level. 
Although we could speculate, based on existing literature and pathophysiological mechanisms, that the radiomic features reflecting immune response changes might be associated with biological behaviors such as lymphocyte infiltration, necrosis, and vascular architecture, we were unable to provide further precise biological interpretations at the spatial cellular distribution level. Future prospective studies incorporating multi-region biopsies or whole-tumor resection specimens with spatial transcriptomics or multiplex immunohistochemistry are warranted to elucidate the true spatial relationship between imaging habitats and the tumor microenvironment.
Our study demonstrates that key imaging features derived from distinct subregions of the habitat collectively influence the prediction of immunotherapy efficacy in significantly different or even opposite directions. This finding strongly suggests that tumors and their microenvironments exhibit high spatial heterogeneity, with distinct anatomical subregions potentially representing vastly different biological behaviors, which can be separately extracted and quantified through imaging features. Notably, these regional features, particularly first-order features, exhibit complementary rather than uniform predictive value for treatment response, further highlighting the unique advantages and potential of habitat-based analysis in deciphering the spatial complexity of tumors and precisely assessing the immune microenvironment.
In conclusion, this study demonstrates that the “Habitat” analysis technique based on lung CT images can serve as a non-invasive and clinically feasible imaging biomarker to effectively predict the clinical benefits of immunotherapy in patients with NSCLC. The research provides a promising new strategy for identifying potential immunotherapy beneficiaries before treatment, which aids in optimizing clinical decision-making, improving treatment selection, and ultimately enhancing patient prognosis.
Limitations of the study
This study has several limitations. First, as a retrospective study, there was some heterogeneity in the immunotherapy regimens of the included patients, which may affect the generalizability of the findings. Second, most cases in this study were derived from biopsy specimens, and whole-tumor pathological sections were unavailable for comparative analysis. This limitation restricted our ability to directly validate the spatial correspondence between imaging-based habitat subregions and tumor cellular distribution patterns at the histological level. Finally, as a dual-center study, the size and diversity of the external validation cohort remained relatively limited. Future prospective, multicenter studies incorporating multi-region biopsies or whole-tumor resection specimens with spatial transcriptomics or multiplex immunofluorescence analyses are warranted to further elucidate the intrinsic relationship between imaging habitats and the tumor microenvironment at the spatial molecular level and to validate the model’s generalizability.
Resource availability
Lead contact
Further information and requests for data should be directed to and will be fulfilled by the lead contact, Guanqiao Jin (jinguanqiao77@gxmu.edu.cn).
Materials availability
This study did not generate new unique reagents.
Data and code availability
• The radiological images and clinical data reported in this article can be obtained from the lead contact upon reasonable request.
• All original code has been deposited on GitHub and is publicly available as of the date of publication. The link is listed in the key resources table.
• Any additional information required to reanalyze the data reported in this article is available from the lead contact upon request.
Acknowledgments
We thank the Onekey AI platform for its assistance with this work. This research was funded by the Beijing Medical Award Foundation (Grant No. YXJL-2022-0665-0210), the Guangxi Key Research and Development Program (Grant No. GuikeAB23026087), the Natural Science Foundation of Guangxi Zhuang Autonomous Region (Grant No. 2023GXNSFAA026225), and the Baise Science Research and Technology Development Program (Grant No. Baike 20250338).
Author contributions
Conceptualization: X.X.H.1 and G.Q.J.; methodology: X.X.H.1, Y.R.X., and H.Y.N.; investigation: H.Y.N., X.X.H.2, D.L.G., and K.W.; writing – original draft: X.X.H.1 and Y.R.X.; writing – review and editing: X.X.H.1, D.Y.H., and G.Q.J.; funding acquisition: X.X.H.1, Y.R.X., and G.Q.J.; resources: X.X.H.1, Y.R.X., and H.Y.N.; supervision: D.Y.H. and G.Q.J. X.X.H.1: Xiaoxiao Huang; X.X.H.2: Xiaoxin Huang.
Declaration of interests
The authors declare that they have no competing interests.
STAR★Methods
Key resources table
| REAGENT or RESOURCE | SOURCE | IDENTIFIER |
|---|---|---|
| Deposited data | ||
| Code for models | This paper | https://github.com/Huang-xx-521/Habiitat-DL-Radiomics |
| Clinical data of patients | This paper | N/A |
| chest CT images | This paper | N/A |
| Software and algorithms | ||
| ITK-SNAP (version 4.4.0) | ITK-SNAP software | https://www.itksnap.org/ |
| Python (version 3.14.0) | Python Software Foundation | https://www.python.org/ |
| Onekey (version 4.9.1) | OnekeyAl-Platform | https://github.com/OnekeyAl-Platform/onekey |
Experimental model and study participant details
Ethics statement
This retrospective study was conducted in accordance with the principles of the Declaration of Helsinki and was approved by the Ethics Committee of Guangxi Medical University Cancer Hospital (approval number: KY-2022-301) and the Ethics Committee of the Affiliated Hospital of Youjiang Medical University for Nationalities (approval number: YYFY-LL-2025-228). The requirement for informed consent was waived by the ethics committees due to the retrospective nature of this study.
Patients and data information
This retrospective study collected data from NSCLC patients at two centers, Guangxi Medical University Cancer Hospital (Center A) and the Affiliated Hospital of Youjiang Medical University for Nationalities (Center B), between January 2021 and December 2024. Inclusion criteria were as follows: (1) receipt of at least two cycles of anti-PD-1/PD-L1 immunotherapy; (2) pathologically confirmed NSCLC. The exclusion criteria were as follows: (1) stage I-II NSCLC according to the 9th edition of the AJCC/UICC staging system; (2) any treatment (surgery, radiotherapy, etc.) before the first CT examination; (3) incomplete clinical history data; (4) CT image slice thickness > 1.25 mm; (5) follow-up time less than 6 months or loss to follow-up. All patients from Center A underwent contrast-enhanced CT, whereas those from Center B underwent non-contrast CT. A total of 590 NSCLC patients met the inclusion and exclusion criteria for this study, with 536 from Center A and 54 from Center B. Patients from Center A were randomly split into a training cohort and a validation cohort at a 7:3 ratio, while those from Center B served as the test cohort (Figure 1). To assess the clinical utility of the imaging model, 146 NSCLC patients from Center A who underwent PD-L1 immunohistochemical testing were included. Center B was excluded from this analysis due to the limited number of patients with PD-L1 testing. Two pathologists jointly assessed the PD-L1 tumor proportion score (TPS). The expression levels were defined as follows: TPS < 1% was considered negative; TPS ≥ 1% was considered positive. Additionally, TPS < 50% was classified as low expression, and TPS ≥ 50% was classified as high expression. Subsequently, we collected patient clinical data, including age, gender, smoking history, histopathological type, clinical stage, TNM stage, and tumor markers, from clinical electronic databases.
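The two PD-L1 groupings and the 7:3 split described above can be written down compactly. The sketch below encodes the stated TPS cut-offs (1% for positive/negative, 50% for high/low) and a simple shuffled 7:3 split. The function names, the random seed, the rounding rule, and the use of integer patient IDs are assumptions; the article does not state whether the split was stratified, and this sketch is not.

```python
import random


def pdl1_groups(tps_percent):
    """Map a PD-L1 tumor proportion score (in percent) to the two groupings."""
    positivity = "positive" if tps_percent >= 1 else "negative"   # PDL1-PN
    level = "high" if tps_percent >= 50 else "low"                # PDL1-HL
    return positivity, level


def split_7_3(patient_ids, seed=0):
    """Shuffle patient IDs and split them 7:3 into training and validation lists."""
    ids = list(patient_ids)
    random.Random(seed).shuffle(ids)
    n_train = round(0.7 * len(ids))
    return ids[:n_train], ids[n_train:]


for tps in (0.5, 1, 49, 50):
    print(tps, pdl1_groups(tps))

# 536 placeholder IDs standing in for the Center A patients.
train_ids, val_ids = split_7_3(range(536))
print(len(train_ids), len(val_ids))
```

Note that a TPS between 1% and 49% is positive under the first grouping but low under the second, so the two PD-L1 models classify such patients differently.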
Method details
CT scanning protocol and image acquisition
All patients underwent CT scanning on one of the following scanners: 128-slice Siemens Sensation, 64-slice GE Revolution, 64-slice GE Discovery CT750 HD, or 256-slice GE Revolution Ace. For patients from Center A, contrast-enhanced CT was performed via intravenous injection of a non-ionic iodinated contrast agent at a dose of 1.2–1.5 mL/kg body weight and an injection rate of 2.5–3.0 mL/s. Arterial-phase scanning was initiated using a bolus-tracking technique, with a region of interest placed in the descending aorta and a trigger threshold of 100–120 Hounsfield units. The scanning parameters were typically as follows: tube voltage, 120 kV; tube current, automated dose modulation; and reconstruction using a bone algorithm. Patients were positioned supine, head-first, and the scanning range extended from the thoracic inlet to 2–3 cm below the costophrenic angle during a single inspiratory breath-hold. All lung CT images were retrieved from the Picture Archiving and Communication System (PACS) in Digital Imaging and Communications in Medicine (DICOM) format for subsequent segmentation of the tumor region of interest (ROI).
Tumor segmentation
A junior doctor with five years of experience first delineated the tumor ROI slice by slice in ITK-SNAP (version 3.8.0, available at www.itksnap.org), manually tracing tumor boundaries with the brush tool to create a complete tumor volume of interest (VOI). A senior doctor with a decade of experience then reviewed the delineations. An 8-unit cubic brush was subsequently used to expand the delineated tumor ROI so that both intratumoral and peritumoral regions could be analyzed. This approach is based on evidence that peritumoral features can effectively predict treatment response.45,46 For inconsistent segmentations, a third senior radiologist provided expert consultation.
Generation of habitat subregions
Within the VOI of each tumor, we extracted a total of 19 local features per voxel. These features included a range of shape descriptors, textural features, and first-order statistical attributes. The specific features extracted were: firstorder_Entropy, firstorder_MeanAbsoluteDeviation, firstorder_Median, glcm_DifferenceAverage, glcm_DifferenceEntropy, glcm_DifferenceVariance, glcm_Imc1, glcm_Imc2, glcm_InverseVariance, glcm_JointEnergy, glcm_JointEntropy, glcm_SumEntropy, glrlm_LongRunEmphasis, glrlm_RunEntropy, glrlm_RunVariance, glszm_SizeZoneNonUniformityNormalized, glszm_SmallAreaHighGrayLevelEmphasis, ngtdm_Contrast, and ngtdm_Strength.
We employed the K-means algorithm to cluster all voxels by their associated features, exploring 3 to 10 cluster centers to delineate different intratumoral habitat regions. Clustering performance was evaluated using the Calinski-Harabasz (CH) index, Davies-Bouldin (DB) index, and Silhouette score to optimize segmentation efficacy. Following cluster analysis, subregions sharing identical clustering features were merged, with each region representing a unique intratumoral microenvironmental signature. We systematically assessed the impact of varying the number of cluster centers from 3 to 10 on analytical validity. After a comprehensive evaluation, the optimal number of cluster centers (K) was determined to be 3 (detailed information is provided in Figure S1 and Table S2). Thus, we categorized the habitat into three subregions, designated as Habitat1, Habitat2, and Habitat3.
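The search over K described above could look roughly like the following scikit-learn sketch. It is not the authors' code: the synthetic 19-dimensional data, the seed, `n_init=10`, and the helper name `score_k_range` are assumptions, and picking K by the Silhouette score alone is a simplification of the "comprehensive evaluation" of all three indices.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import (
    calinski_harabasz_score,
    davies_bouldin_score,
    silhouette_score,
)

def score_k_range(voxel_features, k_values=range(3, 11), seed=0):
    """Fit K-means for each K and report the three cluster-validity indices."""
    results = {}
    for k in k_values:
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(voxel_features)
        results[k] = {
            # Higher is better for CH and Silhouette; lower is better for DB.
            "calinski_harabasz": calinski_harabasz_score(voxel_features, labels),
            "davies_bouldin": davies_bouldin_score(voxel_features, labels),
            "silhouette": silhouette_score(voxel_features, labels),
        }
    return results

# Synthetic stand-in for per-voxel features: three well-separated groups
# in a 19-dimensional space (one dimension per local feature).
rng = np.random.default_rng(0)
centers = rng.normal(scale=10.0, size=(3, 19))
voxel_features = np.vstack([c + rng.normal(size=(100, 19)) for c in centers])

scores = score_k_range(voxel_features)
best_k = max(scores, key=lambda k: scores[k]["silhouette"])
print(best_k, scores[best_k])
```

On real tumors the number of voxels is large, so the Silhouette score may need to be computed on a random subsample (the `sample_size` argument of `silhouette_score`) to keep the run time manageable.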
Feature extraction and selection of radiomics and habitat
For both the radiomics model and the habitat model, we followed the Image Biomarker Standardization Initiative (IBSI) guidelines and used the Python package PyRadiomics for feature extraction. The same extraction method was applied to the whole-tumor radiomics model and to the habitat subregions, and the extracted features were divided into geometric, intensity, and texture categories. GLCM, GLRLM, GLSZM, and NGTDM techniques were used to evaluate tumor shape, voxel brightness, and spatial patterns.
The extracted features were normalized to ensure the consistency of all data dimensions and facilitate subsequent feature selection. For feature selection, we first retained features with an intraclass correlation coefficient (ICC) greater than 0.75 to ensure good reproducibility. The remaining features were then subjected to Pearson correlation analysis, and only one feature was retained from each group of highly correlated features (corr > 0.9). Subsequently, we employed the Minimum Redundancy Maximum Relevance (mRMR) algorithm and LASSO regression with 10-fold cross-validation to identify the final radiomics features, retaining only the most predictive ones to construct a robust and streamlined model.
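Part of the selection cascade above can be illustrated with scikit-learn. The sketch below covers only normalization, the Pearson filter, and LASSO with 10-fold cross-validation; the ICC and mRMR steps are omitted. The greedy keep-first rule for correlated pairs, the use of the absolute correlation, the synthetic data, and all function names are assumptions rather than details given in the text.

```python
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.preprocessing import StandardScaler

def drop_correlated(X, threshold=0.9):
    """Greedy Pearson filter: keep a column only if |r| <= threshold with every kept column."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    kept = []
    for j in range(X.shape[1]):
        if all(corr[j, i] <= threshold for i in kept):
            kept.append(j)
    return kept

def lasso_select(X, y, cv=10, seed=0):
    """Return indices of columns with non-zero LASSO coefficients (alpha chosen by CV)."""
    model = LassoCV(cv=cv, random_state=seed).fit(X, y)
    return np.flatnonzero(model.coef_)

# Synthetic features: column 1 nearly duplicates column 0; the label depends on columns 0 and 2.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
X[:, 1] = X[:, 0] + 0.01 * rng.normal(size=200)
y = (X[:, 0] + X[:, 2] + 0.1 * rng.normal(size=200) > 0).astype(float)

X_scaled = StandardScaler().fit_transform(X)  # z-score normalization
kept = drop_correlated(X_scaled)
selected = [kept[i] for i in lasso_select(X_scaled[:, kept], y)]
print("after correlation filter:", kept)
print("after LASSO:", selected)
```

In a real pipeline the scaler and the selection steps would be fitted on the training cohort only and then applied unchanged to the validation and test cohorts.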
Construction of radiomics, habitat, and clinical models
We employed six machine learning classifiers (Logistic Regression, Support Vector Machine, Random Forest, ExtraTrees, LightGBM, and Multilayer Perceptron) to construct predictive models based on radiomic features and habitat-derived features for predicting durable clinical benefit (DCB) from anti-PD-1/PD-L1 therapy in NSCLC patients. For the habitat-based approach, we developed four distinct models: Habitat, Habitat1, Habitat2, and Habitat3. The Habitat model integrated imaging features from all three subregions (Habitat1, Habitat2, and Habitat3). Additionally, clinical features were screened through univariable and multivariable analyses, and the selected clinical variables were used to build a clinical prediction model using the same set of machine learning classifiers. DCB was defined as PFS exceeding 6 months, as evaluated by the Response Evaluation Criteria in Solid Tumors (RECIST version 1.1). Accordingly, patients were classified into either the DCB cohort (PFS > 6 months) or the no-DCB cohort (PFS ≤ 6 months).
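A minimal sketch of the labeling rule and the multi-classifier comparison follows. It uses scikit-learn with default-style hyperparameters and synthetic data, all of which are assumptions; LightGBM is left out only because it lives in a separate package, and the study's actual settings are not given in the text.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

def dcb_label(pfs_months):
    """1 = durable clinical benefit (PFS > 6 months), 0 = no DCB (PFS <= 6 months)."""
    return int(pfs_months > 6)

# Synthetic stand-in for selected features and follow-up data.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
pfs_months = np.clip(6 + 2 * X[:, 0] + rng.normal(scale=0.5, size=300), 0.5, None)
y = np.array([dcb_label(m) for m in pfs_months])

X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

classifiers = {
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "SVM": SVC(probability=True, random_state=0),
    "RandomForest": RandomForestClassifier(random_state=0),
    "ExtraTrees": ExtraTreesClassifier(random_state=0),
    "MLP": MLPClassifier(max_iter=1000, random_state=0),
}

aucs = {}
for name, clf in classifiers.items():
    clf.fit(X_train, y_train)
    aucs[name] = roc_auc_score(y_val, clf.predict_proba(X_val)[:, 1])

for name, auc in sorted(aucs.items(), key=lambda item: -item[1]):
    print(f"{name}: validation AUC = {auc:.3f}")
```

Because the synthetic label is driven by a single feature, the AUC values here say nothing about the relative merit of the classifiers on real data.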
Construction of 3D deep learning model
In this study, three deep learning architectures, namely DenseNet121, ResNet18, and DenseNet169, were employed to construct 3D deep learning models. By extending the two-dimensional convolution, pooling, and normalization operations in the original network to three-dimensional forms and replacing the fully connected layer with global average pooling 3D, the models can effectively process three-dimensional medical imaging data. The input images were uniformly sampled to 64×64×64 voxels and subjected to Z-score normalization. During the training phase, data augmentation strategies such as random rotation, translation, and scaling were adopted to enhance the generalization ability of the models. The Adam optimizer was used for model training, with an initial learning rate of 0.01. The cosine annealing scheduling strategy was applied to dynamically adjust the learning rate, and the maximum number of training epochs was set to 100.
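The learning-rate schedule mentioned above can be written out explicitly. The sketch below uses the standard closed-form cosine annealing rule with the stated initial rate of 0.01 and 100 epochs; the minimum learning rate of 0 and the choice of one full half-cosine over all 100 epochs are assumptions, as the text does not give them.

```python
import math

def cosine_annealing_lr(epoch, initial_lr=0.01, max_epochs=100, min_lr=0.0):
    """Closed-form cosine annealing: decays from initial_lr to min_lr over max_epochs."""
    cosine = math.cos(math.pi * epoch / max_epochs)
    return min_lr + 0.5 * (initial_lr - min_lr) * (1 + cosine)

schedule = [cosine_annealing_lr(epoch) for epoch in range(101)]
for epoch in (0, 25, 50, 75, 100):
    print(epoch, round(schedule[epoch], 6))
```

The rate falls slowly at first, fastest around the midpoint, and flattens again near the end, which is why the schedule is often paired with a relatively high initial rate.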
Model evaluation
The diagnostic performance of each predictive model was evaluated using receiver operating characteristic (ROC) curve analysis, including sensitivity, specificity, accuracy, positive predictive value (PPV), negative predictive value (NPV), recall, F1-score, and area under the ROC curve (AUC). The optimal imaging predictive model was determined based on ROC curves. A detailed flowchart of the process from CT scanning to model development is illustrated in Figure 2.
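For reference, the listed metrics follow directly from the confusion matrix, and the AUC equals the probability that a randomly chosen positive case scores higher than a randomly chosen negative one. The pure-Python sketch below shows both; the toy labels, scores, and the 0.5 threshold are assumptions, and the functions assume that both classes occur among the true and the predicted labels. Sensitivity and recall are the same quantity.

```python
def confusion_metrics(y_true, y_pred):
    """Threshold-based metrics from binary labels (1 = DCB, 0 = no DCB)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    sensitivity = tp / (tp + fn)  # identical to recall
    ppv = tp / (tp + fp)          # identical to precision
    return {
        "sensitivity": sensitivity,
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(y_true),
        "ppv": ppv,
        "npv": tn / (tn + fn),
        "f1": 2 * ppv * sensitivity / (ppv + sensitivity),
    }

def auc_from_scores(y_true, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy example with an assumed decision threshold of 0.5.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2]
y_pred = [int(s >= 0.5) for s in scores]

metrics = confusion_metrics(y_true, y_pred)
auc = auc_from_scores(y_true, scores)
print(metrics)
print("AUC:", auc)
```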
Model interpretability
We employed SHAP (SHapley Additive exPlanations) visualization to enhance the interpretability of the model, thereby improving its credibility for clinical applications. SHAP quantifies the contribution of each feature to the model's prediction as a “Shapley value,” elucidating the impact of individual features on the model's output. The SHAP analysis was performed using the KernelExplainer of the SHAP package in Python.
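To make the notion of a Shapley value concrete, the sketch below computes exact Shapley values for a toy three-feature model by enumerating all feature subsets. It is a didactic illustration, not the SHAP library: the toy model, the single all-zero baseline, and the function names are assumptions, and kernel-based explainers approximate this quantity with sampling and a background dataset instead of full enumeration.

```python
from itertools import combinations
from math import factorial

def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict at x, with absent features set to baseline values."""
    n = len(x)

    def value(subset):
        mixed = [x[i] if i in subset else baseline[i] for i in range(n)]
        return predict(mixed)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Weight of a coalition of this size in the Shapley formula.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi

def toy_model(features):
    """Hypothetical model: one additive term plus one interaction term."""
    return 2 * features[0] + features[1] * features[2]

x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
print(phi)
print(sum(phi), toy_model(x) - toy_model(baseline))
```

The additive feature receives its full contribution, the interaction term is shared equally between the two interacting features, and the values sum to the difference between the prediction and the baseline prediction.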
Quantification and statistical analysis
We assessed the normality of clinical characteristics using the Shapiro-Wilk test. Continuous variables were analyzed using t-tests or Mann-Whitney U tests, while categorical variables were analyzed using chi-square (χ2) tests. All analyses and model construction in this study were performed on the OnekeyAI platform (v4.9.1, Python 3.14.0). We used PyRadiomics (v3.1.0) for radiomics feature extraction, Scikit-learn (v1.0.2) for machine learning training, and PyTorch (v1.11.0) for deep learning training.
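The test-selection logic described above might be sketched with SciPy as follows. The 0.05 cut-off, the rule that both groups must pass the Shapiro-Wilk test before a t-test is used, the example data, and the function names are all assumptions; the text does not state these details.

```python
import numpy as np
from scipy import stats

def compare_continuous(group_a, group_b, alpha=0.05):
    """Use a t-test if both groups pass Shapiro-Wilk, otherwise Mann-Whitney U."""
    both_normal = all(stats.shapiro(g)[1] > alpha for g in (group_a, group_b))
    if both_normal:
        return "t-test", stats.ttest_ind(group_a, group_b)[1]
    p_value = stats.mannwhitneyu(group_a, group_b, alternative="two-sided")[1]
    return "Mann-Whitney U", p_value

def compare_categorical(contingency_table):
    """Chi-square test of independence; returns the p-value."""
    return stats.chi2_contingency(contingency_table)[1]

# Hypothetical continuous variable in two groups (strongly right-skewed values).
skewed_a = np.exp(np.linspace(0, 5, 30))
skewed_b = np.exp(np.linspace(0, 4, 30))
test_name, p_continuous = compare_continuous(skewed_a, skewed_b)

# Hypothetical 2x2 table, e.g. smoking history (rows) by DCB status (columns).
p_categorical = compare_categorical([[30, 20], [15, 35]])

print(test_name, p_continuous)
print("chi-square p-value:", p_categorical)
```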
Published: December 24, 2025
Footnotes
Supplemental information can be found online at https://doi.org/10.1016/j.isci.2025.114522.
Contributor Information
Deyou Huang, Email: fzxyh2012@126.com.
Guanqiao Jin, Email: jinguanqiao77@gxmu.edu.cn.
References
1. Jamal-Hanjani M., Wilson G.A., McGranahan N., Birkbak N.J., Watkins T.B.K., Veeriah S., Shafi S., Johnson D.H., Mitter R., Rosenthal R., et al. Tracking the Evolution of Non-Small-Cell Lung Cancer. N. Engl. J. Med. 2017;376:2109–2121. doi: 10.1056/NEJMoa1616288.
2. Doroshow D.B., Herbst R.S. Treatment of Advanced Non-Small Cell Lung Cancer in 2018. JAMA Oncol. 2018;4:569–570. doi: 10.1001/jamaoncol.2017.5190.
3. Spurr L.F., Martinez C.A., Kang W., Chen M., Zha Y., Hseu R., Gutiontov S.I., Turchan W.T., Lynch C.M., Pointer K.B., et al. Highly aneuploid non-small cell lung cancer shows enhanced responsiveness to concurrent radiation and immune checkpoint blockade. Nat. Cancer. 2022;3:1498–1512. doi: 10.1038/s43018-022-00467-x.
4. John T., Sakai H., Ikeda S., Cheng Y., Kasahara K., Sato Y., Nakahara Y., Takeda M., Kaneda H., Zhang H., et al. First-line nivolumab plus ipilimumab combined with two cycles of chemotherapy in advanced non-small cell lung cancer: a subanalysis of Asian patients in CheckMate 9LA. Int. J. Clin. Oncol. 2022;27:695–706. doi: 10.1007/s10147-022-02120-0.
5. Rittmeyer A., Barlesi F., Waterkamp D., Park K., Ciardiello F., von Pawel J., Gadgeel S.M., Hida T., Kowalski D.M., Dols M.C., et al. Atezolizumab versus docetaxel in patients with previously treated non-small-cell lung cancer (OAK): a phase 3, open-label, multicentre randomised controlled trial. Lancet. 2017;389:255–265. doi: 10.1016/s0140-6736(16)32517-x.
6. Hellmann M.D., Paz-Ares L., Bernabe Caro R., Zurawski B., Kim S.W., Carcereny Costa E., Park K., Alexandru A., Lupinacci L., de la Mora Jimenez E., et al. Nivolumab plus Ipilimumab in Advanced Non-Small-Cell Lung Cancer. N. Engl. J. Med. 2019;381:2020–2031. doi: 10.1056/NEJMoa1910231.
7. Gettinger S., Horn L., Jackman D., Spigel D., Antonia S., Hellmann M., Powderly J., Heist R., Sequist L.V., Smith D.C., et al. Five-Year Follow-Up of Nivolumab in Previously Treated Advanced Non-Small-Cell Lung Cancer: Results From the CA209-003 Study. J. Clin. Oncol. 2018;36:1675–1684. doi: 10.1200/jco.2017.77.0412.
8. Borghaei H., Paz-Ares L., Horn L., Spigel D.R., Steins M., Ready N.E., Chow L.Q., Vokes E.E., Felip E., Holgado E., et al. Nivolumab versus Docetaxel in Advanced Nonsquamous Non-Small-Cell Lung Cancer. N. Engl. J. Med. 2015;373:1627–1639. doi: 10.1056/NEJMoa1507643.
9. Nishino M., Ramaiya N.H., Hatabu H., Hodi F.S. Monitoring immune-checkpoint blockade: response evaluation and biomarker development. Nat. Rev. Clin. Oncol. 2017;14:655–668. doi: 10.1038/nrclinonc.2017.88.
10. Topalian S.L., Taube J.M., Anders R.A., Pardoll D.M. Mechanism-driven biomarkers to guide immune checkpoint blockade in cancer therapy. Nat. Rev. Cancer. 2016;16:275–287. doi: 10.1038/nrc.2016.36.
11. Huang E.P., O'Connor J.P.B., McShane L.M., Giger M.L., Lambin P., Kinahan P.E., Siegel E.L., Shankar L.K. Criteria for the translation of radiomics into clinically useful tests. Nat. Rev. Clin. Oncol. 2023;20:69–82. doi: 10.1038/s41571-022-00707-0.
12. Gillies R.J., Kinahan P.E., Hricak H. Radiomics: Images Are More than Pictures, They Are Data. Radiology. 2016;278:563–577. doi: 10.1148/radiol.2015151169.
13. Cen Y.Y., Nong H.Y., Huang X.X., Lu X.X., Pu C.H., Huang L.H., Zheng X.J., Pan Z.L., Huang Y., Ding K., Huang D.Y. Computed tomography-based deep learning and multi-instance learning for predicting microvascular invasion and prognosis in hepatocellular carcinoma. World J. Gastroenterol. 2025;31 doi: 10.3748/wjg.v31.i30.109186.
14. Lambin P., Rios-Velazquez E., Leijenaar R., Carvalho S., van Stiphout R.G.P.M., Granton P., Zegers C.M.L., Gillies R., Boellard R., Dekker A., Aerts H.J.W.L. Radiomics: extracting more information from medical images using advanced feature analysis. Eur. J. Cancer. 2012;48:441–446. doi: 10.1016/j.ejca.2011.11.036.
15. Mu W., Tunali I., Gray J.E., Qi J., Schabath M.B., Gillies R.J. Radiomics of (18)F-FDG PET/CT images predicts clinical benefit of advanced NSCLC patients to checkpoint blockade immunotherapy. Eur. J. Nucl. Med. Mol. Imaging. 2020;47:1168–1182. doi: 10.1007/s00259-019-04625-9.
16. Zhu Z., Chen M., Hu G., Pan Z., Han W., Tan W., Zhou Z., Wang M., Mao L., Li X., et al. A pre-treatment CT-based weighted radiomic approach combined with clinical characteristics to predict durable clinical benefits of immunotherapy in advanced lung cancer. Eur. Radiol. 2023;33:3918–3930. doi: 10.1007/s00330-022-09337-7.
17. Yang Y., Yang J., Shen L., Chen J., Xia L., Ni B., Ge L., Wang Y., Lu S. A multi-omics-based serial deep learning approach to predict clinical outcomes of single-agent anti-PD-1/PD-L1 immunotherapy in advanced stage non-small-cell lung cancer. Am. J. Transl. Res. 2021;13:743–756.
18. Liu C., Gong J., Yu H., Liu Q., Wang S., Wang J. A CT-Based Radiomics Approach to Predict Nivolumab Response in Advanced Non-Small-Cell Lung Cancer. Front. Oncol. 2021;11 doi: 10.3389/fonc.2021.544339.
19. Wu S., Zhan W., Liu L., Xie D., Yao L., Yao H., Liao G., Huang L., Zhou Y., You P., et al. Pretreatment radiomic biomarker for immunotherapy responder prediction in stage IB-IV NSCLC (LCDigital-IO Study): a multicenter retrospective study. J. Immunother. Cancer. 2023;11 doi: 10.1136/jitc-2023-007369.
20. Mu W., Jiang L., Shi Y., Tunali I., Gray J.E., Katsoulakis E., Tian J., Gillies R.J., Schabath M.B. Non-invasive measurement of PD-L1 status and prediction of immunotherapy response using deep learning of PET/CT images. J. Immunother. Cancer. 2021;9 doi: 10.1136/jitc-2020-002118.
21. Saad M.B., Hong L., Aminu M., Vokes N.I., Chen P., Salehjahromi M., Qin K., Sujit S.J., Lu X., Young E., et al. Predicting benefit from immune checkpoint inhibitors in patients with non-small-cell lung cancer by CT-based ensemble deep learning: a retrospective study. Lancet Digit. Health. 2023;5:e404–e420. doi: 10.1016/s2589-7500(23)00082-1.
22. Kashyap A., Rapsomaniki M.A., Barros V., Fomitcheva-Khartchenko A., Martinelli A.L., Rodriguez A.F., Gabrani M., Rosen-Zvi M., Kaigala G. Quantification of tumor heterogeneity: from data acquisition to metric generation. Trends Biotechnol. 2022;40:647–676. doi: 10.1016/j.tibtech.2021.11.006.
23. Tian P., He B., Mu W., Liu K., Liu L., Zeng H., Liu Y., Jiang L., Zhou P., Huang Z., et al. Assessing PD-L1 expression in non-small cell lung cancer and predicting responses to immune checkpoint inhibitors using deep learning on computed tomography images. Theranostics. 2021;11:2098–2107. doi: 10.7150/thno.48027.
24. Chaudhury B., Zhou M., Goldgof D.B., Hall L.O., Gatenby R.A., Gillies R.J., Patel B.K., Weinfurtner R.J., Drukteinis J.S. Heterogeneity in intratumoral regions with rapid gadolinium washout correlates with estrogen receptor status and nodal metastasis. J. Magn. Reson. Imaging. 2015;42:1421–1430. doi: 10.1002/jmri.24921.
25. Shi Z., Huang X., Cheng Z., Xu Z., Lin H., Liu C., Chen X., Liu C., Liang C., Lu C., et al. MRI-based Quantification of Intratumoral Heterogeneity for Predicting Treatment Response to Neoadjuvant Chemotherapy in Breast Cancer. Radiology. 2023;308 doi: 10.1148/radiol.222830.
26. Huang H., Chen H., Zheng D., Chen C., Wang Y., Xu L., Wang Y., He X., Yang Y., Li W. Habitat-based radiomics analysis for evaluating immediate response in colorectal cancer lung metastases treated by radiofrequency ablation. Cancer Imaging. 2024;24:44. doi: 10.1186/s40644-024-00692-w.
27. Liu H.F., Wang M., Lu Y.J., Wang Q., Lu Y., Xing F., Xing W. CEMRI-Based Quantification of Intratumoral Heterogeneity for Predicting Aggressive Characteristics of Hepatocellular Carcinoma Using Habitat Analysis: Comparison and Combination of Deep Learning. Acad. Radiol. 2024;31:2346–2355. doi: 10.1016/j.acra.2023.11.024.
28. Chen M., Copley S.J., Viola P., Lu H., Aboagye E.O. Radiomics and artificial intelligence for precision medicine in lung cancer treatment. Semin. Cancer Biol. 2023;93:97–113. doi: 10.1016/j.semcancer.2023.05.004.
29. Wang C., Ma J., Shao J., Zhang S., Li J., Yan J., Zhao Z., Bai C., Yu Y., Li W. Non-Invasive Measurement Using Deep Learning Algorithm Based on Multi-Source Features Fusion to Predict PD-L1 Expression and Survival in NSCLC. Front. Immunol. 2022;13 doi: 10.3389/fimmu.2022.828560.
30. Gatenby R.A., Grove O., Gillies R.J. Quantitative Imaging in Cancer Evolution and Ecology. Radiology. 2013;269:8–15. doi: 10.1148/radiol.13122697.
31. Kim J., Ryu S.Y., Lee S.H., Lee H.Y., Park H. Clustering approach to identify intratumour heterogeneity combining FDG PET and diffusion-weighted MRI in lung adenocarcinoma. Eur. Radiol. 2019;29:468–475. doi: 10.1007/s00330-018-5590-0.
32. Park J.E., Kim H.S., Kim N., Park S.Y., Kim Y.H., Kim J.H. Spatiotemporal Heterogeneity in Multiparametric Physiologic MRI Is Associated with Patient Outcomes in IDH-Wildtype Glioblastoma. Clin. Cancer Res. 2021;27:237–245. doi: 10.1158/1078-0432.Ccr-20-2156.
33. Caii W., Wu X., Guo K., Chen Y., Shi Y., Chen J. Integration of deep learning and habitat radiomics for predicting the response to immunotherapy in NSCLC patients. Cancer Immunol. Immunother. 2024;73:153. doi: 10.1007/s00262-024-03724-3.
34. Ye G., Wu G., Zhang C., Wang M., Liu H., Song E., Zhuang Y., Li K., Qi Y., Liao Y. CT-based quantification of intratumoral heterogeneity for predicting pathologic complete response to neoadjuvant immunochemotherapy in non-small cell lung cancer. Front. Immunol. 2024;15 doi: 10.3389/fimmu.2024.1414954.
35. Qi X., Wang S., Fang C., Jia J., Lin L., Yuan T. Machine learning and SHAP value interpretation for predicting comorbidity of cardiovascular disease and cancer with dietary antioxidants. Redox Biol. 2025;79 doi: 10.1016/j.redox.2024.103470.
36. Schöneck M., Lennartz S., Zopfs D., Sonnabend K., Wawer Matos Reimer R., Rinneburger M., Graffe J., Persigehl T., Hentschke C., Baeßler B., et al. Robustness of radiomic features in healthy abdominal parenchyma of patients with repeated examinations on dual-layer dual-energy CT. Eur. J. Radiol. 2024;175 doi: 10.1016/j.ejrad.2024.111447.
37. van Griethuysen J.J.M., Fedorov A., Parmar C., Hosny A., Aucoin N., Narayan V., Beets-Tan R.G.H., Fillion-Robin J.C., Pieper S., Aerts H.J.W.L. Computational Radiomics System to Decode the Radiographic Phenotype. Cancer Res. 2017;77:e104–e107. doi: 10.1158/0008-5472.Can-17-0339.
38. Galon J., Bruni D. Approaches to treat immune hot, altered and cold tumours with combination immunotherapies. Nat. Rev. Drug Discov. 2019;18:197–218. doi: 10.1038/s41573-018-0007-y.
39. Herbst R.S., Soria J.C., Kowanetz M., Fine G.D., Hamid O., Gordon M.S., Sosman J.A., McDermott D.F., Powderly J.D., Gettinger S.N., et al. Predictive correlates of response to the anti-PD-L1 antibody MPDL3280A in cancer patients. Nature. 2014;515:563–567. doi: 10.1038/nature14011.
40. Tumeh P.C., Harview C.L., Yearley J.H., Shintaku I.P., Taylor E.J.M., Robert L., Chmielowski B., Spasic M., Henry G., Ciobanu V., et al. PD-1 blockade induces responses by inhibiting adaptive immune resistance. Nature. 2014;515:568–571. doi: 10.1038/nature13954.
41. Khorrami M., Prasanna P., Gupta A., Patil P., Velu P.D., Thawani R., Corredor G., Alilou M., Bera K., Fu P., et al. Changes in CT Radiomic Features Associated with Lymphocyte Distribution Predict Overall Survival and Response to Immunotherapy in Non-Small Cell Lung Cancer. Cancer Immunol. Res. 2020;8:108–119. doi: 10.1158/2326-6066.Cir-19-0476.
42. Braman N., Prasanna P., Whitney J., Singh S., Beig N., Etesami M., Bates D.D.B., Gallagher K., Bloch B.N., Vulchi M., et al. Association of Peritumoral Radiomics With Tumor Biology and Pathologic Response to Preoperative Targeted Therapy for HER2 (ERBB2)-Positive Breast Cancer. JAMA Netw. Open. 2019;2 doi: 10.1001/jamanetworkopen.2019.2561.
43. Rodriguez A.B., Peske J.D., Woods A.N., Leick K.M., Mauldin I.S., Meneveau M.O., Young S.J., Lindsay R.S., Melssen M.M., Cyranowski S., et al. Immune mechanisms orchestrate tertiary lymphoid structures in tumors via cancer-associated fibroblasts. Cell Rep. 2021;36 doi: 10.1016/j.celrep.2021.109422.
44. Lopez de Rodas M., Villalba-Esparza M., Sanmamed M.F., Chen L., Rimm D.L., Schalper K.A. Biological and clinical significance of tumour-infiltrating lymphocytes in the era of immunotherapy: a multidimensional approach. Nat. Rev. Clin. Oncol. 2025;22:163–181. doi: 10.1038/s41571-024-00984-x.
45. Li X., Guo Y., Huang S., Wang F., Dai C., Zhou J., Lin D. A CT-based intratumoral and peritumoral radiomics nomogram for postoperative recurrence risk stratification in localized clear cell renal cell carcinoma. BMC Med. Imaging. 2025;25:167. doi: 10.1186/s12880-025-01715-z.
46. Chen Z., Zhu H., Shu H., Zhang J., Gu K., Yao W. Preoperative prediction of WHO/ISUP grade of ccRCC using intratumoral and peritumoral habitat imaging: multicenter study. Cancer Imaging. 2025;25:59. doi: 10.1186/s40644-025-00875-z.
Data Availability Statement
• The radiological images and clinical data reported in this article can be obtained from the lead contact upon reasonable request.
• All original code has been deposited on GitHub and is publicly available as of the date of publication. The DOI is listed in the key resources table.
• Any additional information required to reanalyze the data reported in this article is available from the lead contact upon request.














