Breast Cancer Research : BCR. 2026 Apr 4;28:97. doi: 10.1186/s13058-026-02275-y

Integrating tumor habitat heterogeneity with a hybrid deep learning architecture for ultrasound radiomics: a dual-center study on non-invasive prediction of PD-L1 expression in triple-negative breast cancer

Zhiyong Li 1,#, Huanzhong Su 2,#, Han Xiao 3, Cong Chen 1, Peng Lin 1, Ensheng Xue 1, Rongxi Liang 1, Qin Ye 1, Zhenhu Lin 1
PMCID: PMC13188313  PMID: 41935256

Abstract

Objective

We sought to create a non-invasive method for predicting programmed death-ligand 1 (PD-L1) expression in triple-negative breast cancer (TNBC) by combining ultrasound radiomics with tumor habitat analysis and a Transformer-ResNet hybrid deep learning approach.

Materials and methods

Pathologically confirmed TNBC patients treated from January 2020 through December 2024 at two centers were retrospectively analyzed. Pretreatment ultrasound images and PD-L1 immunohistochemistry results were collected, with positivity defined as a combined positive score ≥ 10. We applied K-means clustering to partition tumor regions into three habitat zones and extracted radiomic features from each zone separately. Transformer and ResNet networks provided additional deep learning features. A multi-stage selection process—including intraclass correlation coefficient testing, univariate screening, correlation filtering, and LASSO regression—was used to build Habitat, Transformer, and ResNet models individually. These were then merged into a Combined nomogram. Model performance was examined through ROC curves, calibration plots, and decision curve analysis.

Results

Six hundred fifty-four patients were enrolled (252 with PD-L1 positivity; 402 without). Training used 457 cases from Fujian Medical University Union Hospital; external validation involved 197 cases from the First Affiliated Hospital of Xiamen University. Habitat 3 yielded the most predictive features (n = 18). Training AUCs reached 0.827, 0.845, 0.842, and 0.945 for the Habitat, Transformer, ResNet, and Combined models, respectively (values as reported in Table 2). External validation AUCs were 0.812, 0.842, 0.827, and 0.946, respectively. In external validation, the AUC of the Combined model exceeded those of the individual models by 0.104–0.134 (10.4–13.4 percentage points), and the Combined model showed superior net benefit at threshold probabilities from 0.2 to 0.7.

Conclusion

Our Combined model accurately predicts PD-L1 status in TNBC using integrated habitat and deep learning features while offering a practical imaging biomarker for immunotherapy candidate selection.

Keywords: Triple-negative breast cancer, PD-L1, Ultrasound radiomics, Habitat analysis, Transformer, ResNet, Nomogram

Introduction

Triple-negative breast cancer (TNBC) makes up roughly 15–20% of breast cancer diagnoses and shows no expression of estrogen receptor (ER), progesterone receptor (PR), or human epidermal growth factor receptor 2 (HER2). This means patients cannot benefit from endocrine therapy or treatments targeting HER2. The disease demonstrates aggressive growth patterns, high malignant potential, and an unfavorable prognosis [1]. Recent advances in immunotherapy, particularly inhibitors targeting the programmed death-1/programmed death-ligand 1 (PD-1/PD-L1) axis, have changed treatment options for TNBC. Data from several large clinical trials indicate that PD-L1-positive TNBC patients receiving immunotherapy reach objective response rates of 40–60% and experience notable survival gains [2, 3]. Determining PD-L1 expression accurately is thus crucial for selecting appropriate immunotherapy candidates and designing individualized treatment plans.

At present, immunohistochemistry (IHC) on biopsy or surgical tissue serves as the main approach for assessing PD-L1 expression. While IHC is regarded as the gold standard, it carries significant drawbacks. The invasive procedure required for tissue collection can cause patient complications. Beyond this, biopsies typically sample only one tumor area, which may fail to represent heterogeneity across the whole lesion. There is also the problem that PD-L1 expression changes over time and location, making repeated biopsies during treatment impractical [4]. These limitations underscore why non-invasive methods capable of evaluating PD-L1 status throughout entire tumors are needed.

Ultrasound radiomics offers practical advantages for cancer imaging: it is non-invasive, allows real-time scanning, and involves no radiation [5]. To date, ultrasound-based radiomics has demonstrated promising performance in the noninvasive prediction of various molecular biomarkers in breast cancer, including estrogen receptor (ER), progesterone receptor (PR), human epidermal growth factor receptor 2 (HER2), and Ki-67 expression status [6–8], establishing a solid methodological foundation for imaging-based molecular profiling. However, to the best of our knowledge, no published study has yet applied ultrasound-based radiomics or deep learning for the noninvasive prediction of PD-L1 expression in breast cancer. Moreover, using standard ultrasound radiomics to predict PD-L1 in TNBC presents challenges. Conventional approaches treat whole tumors as homogeneous structures and ignore biological variation between different intratumoral regions, even though this variation corresponds to how PD-L1 is distributed spatially within tumors. Another limitation is that traditional feature extraction methods depend on fixed mathematical definitions and frequently fail to capture intricate image patterns or higher-level semantic information [9, 10].

Tumor habitat analysis offers a newer way to characterize imaging phenotypes by applying clustering algorithms that divide tumors into separate subregions (habitats) based on similar imaging traits. These habitats represent different local microenvironment properties which help quantify spatial heterogeneity and provide more biologically relevant descriptions of tumor complexity [11–13]. Meanwhile, deep learning has revolutionized how we analyze medical images. Convolutional networks such as ResNet learn hierarchical features through multiple convolution layers and work well for detecting local structural details. Transformer models use self-attention to identify long-range spatial relationships and capture broader contextual information [14, 15]. Bringing together these two approaches should allow for more complete and detailed characterization of ultrasound data. Unlike conventional multimodal frameworks that extract and fuse radiomics and deep learning features at the global tumor level, our approach employs habitat clustering as a spatial bridge, enabling biologically guided, subregion-specific feature extraction that better preserves intratumoral heterogeneity.

With this background, we set out to build and test a new ultrasound radiomics method that brings tumor habitat heterogeneity together with a combined Transformer-ResNet deep learning system to predict PD-L1 expression non-invasively in TNBC patients. Since ensuring robustness, broad applicability, and clinical practicality matters greatly for real-world use—and because no previous studies have tackled this particular question—we used a two-center study design with separate external validation in an independent patient group. We aim to create a practical imaging biomarker that helps guide immunotherapy decisions in TNBC while supporting precision medicine approaches for breast cancer patients.

Materials and methods

Study design and patient cohorts

This dual-center retrospective study received ethics approval (No.: 2024KJCX061) with a waiver of informed consent. We enrolled TNBC patients treated at Fujian Medical University Union Hospital (training cohort, n = 457) and the First Affiliated Hospital of Xiamen University (external validation cohort, n = 197) from January 2020 to December 2024. The two cohorts were allocated based on institution of origin rather than random splitting, thereby constituting a geographically independent external validation. No internal validation subset was held out from the training cohort; internal model optimization (i.e., LASSO regularization parameter λ selection) was performed using tenfold cross-validation within the training cohort. Inclusion criteria were: pathologically confirmed TNBC (ER, PR, HER2 negative) via core needle biopsy; complete PD-L1 immunohistochemistry results; pretreatment breast ultrasound with adequate image quality; and complete clinicopathological data. Patients were excluded if they received neoadjuvant therapy before ultrasound, had poor-quality images with severe artifacts, or lacked complete clinical data.

PD-L1 expression was assessed using the 22C3 antibody (Dako) for immunohistochemical staining. Two experienced pathologists independently calculated the Combined Positive Score (CPS), defined as PD-L1-positive cells (tumor cells, lymphocytes, macrophages) divided by total viable tumor cells, multiplied by 100. CPS ≥ 10 indicated PD-L1 positivity; CPS < 10 indicated negativity. Discrepancies were resolved by consensus among three senior pathologists.
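
As a worked illustration of the scoring rule above, the following minimal sketch computes a CPS from cell counts and applies the CPS ≥ 10 cutoff. The function names and the example counts are hypothetical, and capping the score at 100 follows the usual CPS convention, which is an assumption not stated in the text.

```python
def combined_positive_score(pdl1_positive_cells: int, viable_tumor_cells: int) -> float:
    """CPS = PD-L1-positive cells / total viable tumor cells x 100.

    The numerator counts PD-L1-positive tumor cells, lymphocytes, and macrophages.
    The cap at 100 is an assumed convention, not taken from the study text.
    """
    if viable_tumor_cells <= 0:
        raise ValueError("viable_tumor_cells must be positive")
    return min(100.0, 100.0 * pdl1_positive_cells / viable_tumor_cells)


def is_pdl1_positive(cps: float, cutoff: float = 10.0) -> bool:
    """Apply the study's positivity definition (CPS >= 10)."""
    return cps >= cutoff


# Hypothetical counts: 30 PD-L1-positive cells among 200 viable tumor cells.
example_cps = combined_positive_score(30, 200)
example_status = is_pdl1_positive(example_cps)
```

In this hypothetical case the score is 15, which falls in the PD-L1-positive group.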

Ultrasound image acquisition and preprocessing

Ultrasound images were acquired using PHILIPS EPIQ7 or GE LOGIQ E20 systems with 7.5–12 MHz linear transducers. Two-dimensional grayscale images showing the maximal tumor cross-section were selected. Gain, contrast, and depth settings were optimized to visualize the entire lesion and surrounding tissue. To mitigate potential domain shift introduced by differences in ultrasound systems and operator practices, several post-hoc harmonization measures were implemented. First, all images underwent a standardized preprocessing pipeline applied uniformly across both centers, comprising the following steps: (1) Intensity normalization: pixel intensities were linearly rescaled to a fixed range of 0–1000 using a fixed-scale linear transformation (normalizeScale = 1000), rather than mean- and standard deviation-based (z-score) normalization. This fixed-range approach preserves absolute inter-image intensity relationships and ensures a consistent gray-level distribution across different ultrasound systems, providing sufficient dynamic range for subsequent intensity discretization. No additional ultrasound-specific normalization—such as time-gain compensation (TGC) correction, log-compression adjustment, or speckle-noise filtering—was applied, as these corrections were considered to be handled at the scanner hardware level prior to DICOM export; (2) Smoothing: no additional smoothing filter was applied beyond the built-in preprocessing, as ultrasound images were considered to contain sufficient inherent signal-to-noise characteristics for direct feature extraction; (3) Resampling: images were resampled to an isotropic pixel spacing of 3 × 3 mm using nearest-neighbor interpolation, standardizing spatial resolution across acquisition devices; (4) Intensity discretization: gray-level intensities were discretized using a fixed bin width of 5 (binWidth = 5), yielding approximately 200 discrete gray levels across the normalized intensity range of 0–1000. 
This fixed bin width approach was applied consistently across all samples to ensure feature reproducibility, in accordance with IBSI reporting recommendations. Second, ComBat harmonization was applied to all extracted radiomics features prior to model training, with center identity used as the batch covariate, to explicitly correct for inter-center batch effects at the feature level. Critically, ComBat model parameters (batch effect estimates) were estimated exclusively from the training cohort and subsequently applied to the validation cohort without re-fitting, to prevent data leakage. This approach has been validated in multicenter radiomics studies for its ability to remove center-related variance while preserving biologically meaningful feature variability. Nonetheless, we acknowledge that residual variability attributable to retrospective differences in acquisition settings and operator habits cannot be fully eliminated, and this remains a limitation of the present study.
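
The intensity steps (1) and (4) can be sketched as follows. This is an illustration under stated assumptions, not the study pipeline: the linear rescaling is assumed to be a per-image min-max mapping onto 0–1000, the bin index follows the fixed-bin-width formula documented for PyRadiomics, and both function names are hypothetical.

```python
import math


def rescale_to_fixed_range(pixels, scale=1000.0):
    """Linearly map pixel intensities onto [0, scale] (assumed min-max form)."""
    lo, hi = min(pixels), max(pixels)
    if hi == lo:
        # A constant image carries no contrast; map everything to 0.
        return [0.0 for _ in pixels]
    return [(p - lo) / (hi - lo) * scale for p in pixels]


def discretize_fixed_bin_width(values, bin_width=5.0):
    """Fixed-bin-width discretization: floor(x / W) - floor(min / W) + 1."""
    offset = math.floor(min(values) / bin_width)
    return [math.floor(v / bin_width) - offset + 1 for v in values]


# Hypothetical 8-bit gray values from one image.
raw = [0, 64, 128, 255]
rescaled = rescale_to_fixed_range(raw)
bins = discretize_fixed_bin_width(rescaled, bin_width=5.0)
```

With a bin width of 5 over a 0–1000 range, the largest bin index is 201, consistent with the "approximately 200 discrete gray levels" stated above.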

Tumors were segmented using ITK-SNAP software (v. 4.0) by two radiologists (each with > 10 years of breast ultrasound experience), both blinded to PD-L1 status. Inter-observer segmentation agreement was quantified using the Dice similarity coefficient (DSC). Across all 654 cases, the mean DSC was 0.923 ± 0.041 (range: 0.847–0.981), indicating overall good inter-observer reproducibility. Cases with DSC < 0.85 (n = 31, 4.7% of the total cohort) were considered to have insufficient agreement and were re-segmented through consensus discussion moderated by a third senior radiologist (> 15 years of experience). The final segmentation mask used for all subsequent analyses was the consensus result for these cases, and the independently produced mask of the first radiologist for all remaining cases.
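
For reference, the Dice similarity coefficient between two binary masks A and B is 2|A ∩ B| / (|A| + |B|). The sketch below computes it for masks stored as nested lists and applies the DSC < 0.85 re-segmentation rule; the function names and the toy masks are hypothetical.

```python
def dice_coefficient(mask_a, mask_b):
    """Dice similarity coefficient between two binary masks of equal shape."""
    intersection = size_a = size_b = 0
    for row_a, row_b in zip(mask_a, mask_b):
        for a, b in zip(row_a, row_b):
            size_a += 1 if a else 0
            size_b += 1 if b else 0
            intersection += 1 if (a and b) else 0
    if size_a + size_b == 0:
        # Two empty masks agree trivially.
        return 1.0
    return 2.0 * intersection / (size_a + size_b)


def needs_consensus(dsc, threshold=0.85):
    """Cases below the agreement threshold go to consensus re-segmentation."""
    return dsc < threshold


# Toy 2 x 3 masks from two hypothetical readers.
reader_1 = [[1, 1, 0], [1, 1, 0]]
reader_2 = [[1, 1, 0], [1, 0, 0]]
dsc = dice_coefficient(reader_1, reader_2)
```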

Habitat partitioning and feature extraction

Within the segmented region of interest (ROI), voxel-level local feature extraction was conducted. For each voxel in the ROI, multidimensional local features were calculated within a 9 × 9 neighborhood window, including local entropy, local mean, local standard deviation, and local contrast. For boundary voxels where the 9 × 9 neighborhood window extended beyond the tumor margin, a masked-padding strategy was applied: only voxels within the segmented tumor mask were included in the neighborhood computation, while extra-tumoral voxels were excluded. This approach prevents contamination from peritumoral tissue and ensures that all local feature calculations reflect solely the internal tumor characteristics. Among these, local entropy quantifies the uncertainty of grayscale intensity distribution within the neighborhood of a voxel, and is mathematically defined as:

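$$H(v) = -\sum_{i=1}^{N_g} p_i(v)\,\log_2 p_i(v)$$

where $p_i(v)$ is the proportion of in-mask voxels in the 9 × 9 neighborhood of voxel $v$ that have gray level $i$, and $N_g$ is the number of discrete gray levels.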
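
A minimal sketch of the masked local entropy described above, assuming the standard Shannon form with a base-2 logarithm. The log base, the function name, and the toy image are assumptions made for illustration.

```python
import math
from collections import Counter


def local_entropy(image, mask, row, col, window=9):
    """Shannon entropy of gray levels in a window centred on (row, col).

    Only pixels inside the tumor mask (and inside the image) contribute,
    mirroring the masked-padding strategy for boundary voxels.
    """
    half = window // 2
    values = []
    for r in range(row - half, row + half + 1):
        for c in range(col - half, col + half + 1):
            inside_image = 0 <= r < len(image) and 0 <= c < len(image[0])
            if inside_image and mask[r][c]:
                values.append(image[r][c])
    if not values:
        return 0.0
    n = len(values)
    return -sum((k / n) * math.log2(k / n) for k in Counter(values).values())


# Toy 3 x 3 image; only the upper-left 2 x 2 block lies inside the mask.
toy_image = [[1, 1, 7], [2, 2, 7], [7, 7, 7]]
toy_mask = [[1, 1, 0], [1, 1, 0], [0, 0, 0]]
entropy_at_center = local_entropy(toy_image, toy_mask, 1, 1, window=3)
```

In the toy example, the four in-mask pixels split evenly between two gray levels, so the entropy is 1 bit; the out-of-mask value 7 does not contribute.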

K-means clustering partitioned tumors into spatial habitats by grouping voxels with similar imaging characteristics. Prior to clustering, all radiomics features were z-score normalized (across features, not applied at the image intensity level) to reduce sensitivity to non-Gaussian distributions and to minimize the influence of outliers and inter-feature scale differences. The mean and standard deviation used for z-score normalization were computed solely from the training cohort and subsequently applied to the validation cohort, ensuring no information leakage between cohorts. We tested K values from 2 to 10 and used the Calinski-Harabasz (CH) index to determine the optimal cluster number. For each candidate K, clustering was applied to all tumors in the training cohort independently, and the mean CH index across all training-cohort tumors was computed. The K yielding the highest mean CH index (K = 3) was selected as the globally fixed cluster number and applied uniformly to all tumors in both cohorts. Radiomics features were then independently extracted from each habitat using the PyRadiomics toolkit (v. 3.0.1).
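
The cluster-number selection can be illustrated with a small sketch of the Calinski-Harabasz index, defined as the ratio of between-cluster to within-cluster dispersion, each divided by its degrees of freedom. The function names, the toy points, and the per-K mean values below are hypothetical; the clustering itself is assumed to have been run already.

```python
def calinski_harabasz(points, labels):
    """CH index = [B / (k - 1)] / [W / (n - k)] for labelled points."""
    n, dim = len(points), len(points[0])
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    k = len(clusters)
    if k < 2 or k >= n:
        raise ValueError("need 2 <= k < n clusters")
    overall = [sum(p[d] for p in points) / n for d in range(dim)]
    between = within = 0.0
    for members in clusters.values():
        centroid = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        between += len(members) * sum((centroid[d] - overall[d]) ** 2 for d in range(dim))
        within += sum(sum((p[d] - centroid[d]) ** 2 for d in range(dim)) for p in members)
    if within == 0.0:
        return float("inf")
    return (between / (k - 1)) / (within / (n - k))


def select_k(mean_ch_by_k):
    """Pick the K with the highest mean CH index across training tumors."""
    return max(mean_ch_by_k, key=mean_ch_by_k.get)


# Toy one-dimensional feature with two well-separated groups.
toy_points = [(0.0,), (1.0,), (10.0,), (11.0,)]
toy_ch = calinski_harabasz(toy_points, [0, 0, 1, 1])

# Hypothetical mean CH values per candidate K (not study results).
chosen_k = select_k({2: 310.5, 3: 402.1, 4: 388.0})
```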

Transformer model training and feature extraction

We used a Vision Transformer (ViT-B/16) with 12 encoder layers and multi-head self-attention for global context learning. Images were divided into 16 × 16 pixel patches, linearly projected with positional encoding, and then fed to the encoder. Transfer learning from ImageNet-21k pretrained weights was applied: the first 9 encoder layers were frozen while the final 3 layers and classification head were fine-tuned, as lower layers encode transferable features. Training used the AdamW optimizer (learning rate 5 × 10⁻⁵, weight decay 0.01) with cosine annealing scheduling for up to 100 epochs and early stopping (patience = 20, monitored on training-cohort validation loss via tenfold cross-validation). Data augmentation included random flipping, ± 10° rotation, and brightness/contrast adjustment. Deep features were extracted from the penultimate encoder layer output and then reduced via PCA retaining ≥ 95% cumulative variance. The PCA transformation matrix (eigenvectors and explained variance) was estimated solely from the training cohort and subsequently applied to the validation cohort without re-fitting.
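
The learning-rate schedule and stopping rule can be sketched independently of any deep learning framework. The sketch assumes a minimum learning rate of 0 and a single annealing cycle over the 100-epoch budget; neither is stated in the text, and the function names are hypothetical.

```python
import math


def cosine_annealing_lr(epoch, base_lr=5e-5, total_epochs=100, min_lr=0.0):
    """Cosine annealing: decays from base_lr to min_lr over total_epochs."""
    cosine = math.cos(math.pi * epoch / total_epochs)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + cosine)


def early_stopping_epoch(val_losses, patience=20):
    """Return the epoch at which training stops, or None if it never does.

    Training stops once `patience` consecutive epochs pass without a new
    best validation loss.
    """
    best = float("inf")
    stale = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, stale = loss, 0
        else:
            stale += 1
            if stale >= patience:
                return epoch
    return None


# Hypothetical validation-loss trace with a short patience for illustration.
stop_at = early_stopping_epoch([1.0, 0.9, 0.95, 0.96, 0.97], patience=3)
```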

ResNet model training and feature extraction

ResNet50 served as the convolutional feature extractor, leveraging residual connections to mitigate gradient vanishing and capture hierarchical spatial features. Images were resized to 224 × 224 pixels. We applied two-stage transfer learning from ImageNet pretrained weights: initially, all convolutional layers were frozen and only the classification head was trained for 10 epochs; then the last two residual blocks (conv4_x and conv5_x) were unfrozen for fine-tuning, as this progressive strategy preserves low-level generic texture representations in early layers while allowing higher-level feature adaptation to ultrasound-specific characteristics, thereby reducing overfitting risk given our limited dataset size. Training used the SGD optimizer (momentum 0.9, learning rate 0.001) with a batch size of 32 for up to 80 epochs and early stopping after 15 consecutive epochs without improvement in validation loss, monitored on a held-out fold within the training cohort. Data augmentation included random flipping, ± 15° rotation, and 0.85–1.15 × scaling. We extracted 2048-dimensional features from conv5_x, applied L2 normalization, and then reduced dimensionality via PCA while retaining ≥ 95% variance. As with the Transformer pipeline, the PCA transformation matrix was derived exclusively from the training cohort and applied to the validation cohort without re-fitting.
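
The two post-extraction steps, L2 normalization and choosing how many principal components to retain, can be illustrated as follows. The eigenvalues are hypothetical, the function names are illustrative, and in the study design the eigen-decomposition would come from the training cohort only.

```python
import math


def l2_normalize(vector):
    """Scale a feature vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        return list(vector)
    return [x / norm for x in vector]


def components_for_variance(eigenvalues, target=0.95):
    """Smallest number of leading components whose cumulative explained
    variance ratio reaches the target."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    cumulative = 0.0
    for count, value in enumerate(ordered, start=1):
        cumulative += value
        if cumulative / total >= target:
            return count
    return len(ordered)


# Hypothetical covariance eigenvalues estimated on the training cohort.
n_keep = components_for_variance([6.0, 3.0, 0.7, 0.2, 0.1], target=0.95)
unit = l2_normalize([3.0, 4.0])
```

Here the first two components explain 90% of the variance and the first three explain 97%, so three components are retained.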

Construction of habitat, transformer, and ResNet models

Feature selection for each feature type followed four steps: (1) Stability: features with an intraclass correlation coefficient (ICC) < 0.75 were excluded; (2) Univariate filtering: the Mann–Whitney U test was applied, followed by Benjamini–Hochberg false discovery rate (FDR) correction for multiple comparisons; only features with a corrected q-value < 0.05 were retained; (3) Correlation filtering: when Pearson correlation exceeded |r|> 0.9, the feature with the smaller q-value was kept; (4) LASSO regression with tenfold cross-validation selected features with non-zero coefficients at optimal λ. Critically, all steps of this feature selection pipeline—including ICC threshold determination, univariate Mann–Whitney U testing, correlation-based pruning, and LASSO coefficient estimation—were performed exclusively on the training cohort. The resulting selection criteria (ICC threshold, retained feature list, correlation structure, and LASSO coefficients at optimal λ) were frozen and applied to the validation cohort without re-fitting, to prevent data leakage.
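
Step (2) relies on the Benjamini–Hochberg adjustment. The sketch below reproduces the standard step-up calculation (the same rule implemented by R's p.adjust with method = "BH") in plain Python; the p-values are hypothetical and the function name is illustrative.

```python
def benjamini_hochberg(p_values):
    """Return BH-adjusted q-values in the original order of p_values."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        q_values[index] = running_min
    return q_values


# Hypothetical Mann-Whitney U p-values for four candidate features.
p_vals = [0.01, 0.04, 0.03, 0.20]
q_vals = benjamini_hochberg(p_vals)
retained = [i for i, q in enumerate(q_vals) if q < 0.05]
```

In this toy case only the first feature survives the q < 0.05 filter, even though three raw p-values fall below 0.05.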

Three LASSO logistic regression models were built from the selected features: Habitat, Transformer, and ResNet models. Each generated a risk score (HRS, TRS, RRS) representing PD-L1 positivity probability for subsequent model integration.

Construction of the combined model

The three risk scores (HRS, TRS, RRS) and clinical variables were integrated via multivariate logistic regression. Clinical variables were first screened by univariate analysis (p < 0.05), then entered into stepwise bidirectional multivariate regression with the risk scores (entry α = 0.05, removal α = 0.10) to identify independent predictors. A nomogram was constructed using the R package ‘rms’ to visualize the Combined model, assigning points to each predictor and converting total points to PD-L1 positivity probability.
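
To make the link between a logistic model and nomogram points concrete, the sketch below assumes a common nomogram convention: the predictor with the largest effect over its range spans 0–100 points and the others are scaled proportionally. The coefficients, intercept, and value ranges are hypothetical and are not the fitted values of this study.

```python
import math


def nomogram_points(coefs, ranges, values):
    """Convert predictor values to nomogram points (assumed 0-100 scaling)."""
    max_effect = max(abs(coefs[n]) * (ranges[n][1] - ranges[n][0]) for n in coefs)
    points = {}
    for name, beta in coefs.items():
        low, high = ranges[name]
        # The end of the range with the lowest risk contribution scores 0 points.
        reference = low if beta >= 0 else high
        points[name] = 100.0 * beta * (values[name] - reference) / max_effect
    return points


def predicted_probability(intercept, coefs, values):
    """Logistic link: probability from the linear predictor."""
    linear_predictor = intercept + sum(coefs[n] * values[n] for n in coefs)
    return 1.0 / (1.0 + math.exp(-linear_predictor))


# Hypothetical coefficients for the three risk scores, each ranging over 0-1.
coefs = {"HRS": 3.0, "RRS": 3.5, "TRS": 5.0}
ranges = {name: (0.0, 1.0) for name in coefs}
top_of_axis = nomogram_points(coefs, ranges, {name: 1.0 for name in coefs})
```

Under this convention, a predictor's maximum points are proportional to its coefficient times its range, which is why a larger axis span is read as a larger contribution to the prediction.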

Model evaluation

The four models (Habitat, Transformer, ResNet, Combined nomogram) were evaluated in training and external validation cohorts using: (1) Discrimination: ROC curves and AUC with 95% CI; (2) Classification metrics: sensitivity, specificity, PPV, NPV, and accuracy; (3) Calibration: calibration curves assessing predicted versus observed probabilities; (4) Clinical utility: decision curve analysis (DCA) quantifying net benefit. Figure 1 shows the overall workflow.
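
The classification metrics in (2) and the net benefit used in (4) follow directly from confusion-matrix counts. In the sketch below, net benefit at threshold probability pt is TP/n − (FP/n) × pt / (1 − pt), the standard decision-curve definition; the counts are hypothetical and the function names are illustrative.

```python
def classification_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, PPV, NPV, and accuracy from counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


def net_benefit(tp, fp, n, threshold):
    """Decision-curve net benefit at a given threshold probability."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


# Hypothetical confusion matrix for 200 patients.
metrics = classification_metrics(tp=80, fp=10, tn=90, fn=20)
benefit = net_benefit(tp=80, fp=10, n=200, threshold=0.2)
```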

Fig. 1.

Overview of the study workflow. Step 1: Pretreatment breast ultrasound images were collected from TNBC patients across two centers. Step 2: Tumors were manually segmented (upper: whole-tumor mask in red) and partitioned into three habitat subregions via K-means clustering (lower: red, green, and blue zones). Step 3: Radiomics features were extracted from each habitat zone; deep features were extracted from ResNet50 (conv2_x–conv5_x) and Vision Transformer, with a representative class activation map shown. Step 4: Three radiomics-based risk scores and clinical variables were integrated into a Combined nomogram; model performance was assessed by ROC-AUC and decision curve analysis in both cohorts

Statistical analysis

Statistical analyses used R version 4.1.0. Continuous variables were compared using t-tests or Mann–Whitney U tests; categorical variables were compared using chi-square or Fisher’s exact tests. Inter-rater agreement was assessed by ICC (“irr” package). Feature selection applied ICC filtering, Mann–Whitney U tests with Benjamini–Hochberg FDR correction (“p.adjust” function in R, method = “BH”), Pearson correlation analysis, and LASSO regression (“glmnet” with tenfold cross-validation). A corrected q-value threshold of < 0.05 was applied during the univariate filtering step to minimize Type I error accumulation across multiple simultaneous feature comparisons. Logistic regression and nomogram construction used the “rms” package. Model performance was evaluated by ROC-AUC (“pROC”), calibration curves (“rms”, 1000 bootstrap resamples), and DCA (“rmda”). Figures were generated using “ggplot2”. Two-sided p < 0.05 indicated significance for all other statistical tests.

Results

Baseline patient characteristics

A total of 654 pathologically confirmed TNBC patients were enrolled (Fig. 2): 252 (38.5%) PD-L1-positive and 402 (61.5%) PD-L1-negative. The training cohort included 457 patients (178 PD-L1-positive [38.9%], 279 PD-L1-negative [61.1%]); the external validation cohort included 197 patients (74 PD-L1-positive [37.6%], 123 PD-L1-negative [62.4%]). PD-L1 positivity rates were similar between cohorts (p = 0.752).

Fig. 2.

Patient selection flowchart

Clinicopathologic features are shown in Table 1. Age, tumor diameter, lateralization, lymph node metastasis, histological grade, and Ki-67 index showed no significant differences between the PD-L1-positive and PD-L1-negative groups in either cohort (all p > 0.05), highlighting the need for imaging-based predictive models.

Table 1.

Baseline characteristics of the patients

| Variable | Class | Training cohort (n = 457): PD-L1 positive (n = 178) | Training cohort: PD-L1 negative (n = 279) | p | Validation cohort (n = 197): PD-L1 positive (n = 74) | Validation cohort: PD-L1 negative (n = 123) | p |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Side | Left | 91 (51.12) | 139 (49.82) | 0.861 | 39 (52.70) | 68 (55.28) | 0.838 |
| | Right | 87 (48.88) | 140 (50.18) | | 35 (47.30) | 55 (44.72) | |
| Age (years) | | 48.03 ± 10.75 | 49.42 ± 10.28 | 0.199 | 49.45 ± 11.62 | 48.32 ± 10.83 | 0.491 |
| US diameter (mm) | | 34.84 ± 15.01 | 34.44 ± 13.57 | 0.919 | 32.73 ± 14.53 | 34.45 ± 12.34 | 0.283 |
| Lymph node metastasis | Positive | 142 (79.78) | 231 (82.80) | 0.585 | 66 (89.19) | 107 (86.99) | 0.817 |
| | Negative | 36 (20.22) | 48 (17.20) | | 8 (10.81) | 16 (13.01) | |
| Histological grading | Ⅰ, Ⅱ | 151 (84.83) | 243 (87.10) | 0.840 | 59 (79.73) | 106 (86.18) | 0.323 |
| | Ⅲ | 27 (15.17) | 36 (12.90) | | 15 (20.27) | 17 (13.82) | |
| Ki-67 status | ≥ 20% | 156 (87.64) | 228 (81.72) | 0.12 | 65 (87.84) | 96 (78.05) | 0.126 |
| | < 20% | 22 (12.36) | 51 (18.28) | | 9 (12.16) | 27 (21.95) | |

The data are the number of patients, with percentages in parentheses.

Ki-67, antigen Ki67

Habitat model

K-means clustering with Calinski-Harabasz index optimization identified three tumor habitats: h1 (red), h2 (green), and h3 (blue) (Fig. 3). From 1,561 radiomics features per habitat, multi-step selection retained 34 features for LASSO regression (Figs. 4A, 5A, 6A): 18 from habitat 3, 11 from habitat 2, and 5 from habitat 1. These features captured texture heterogeneity, gray-level density, and spatial structure. The Habitat model achieved an AUC of 0.827 (95% CI 0.791–0.863) in training and an AUC of 0.812 (95% CI 0.754–0.870) in validation cohorts (Table 2, Figs. 7A-B), demonstrating good generalizability.

Fig. 3.

Illustration of habitat analysis for breast cancer lesions. Workflow of radiomics-based tumor habitat analysis. A Original ultrasound image of a triple-negative breast cancer lesion; B Tumor ROI segmentation (red); C Visualization of the Gray Level Run Length Matrix Short Run Emphasis feature; D Visualization of the Gray Level Size Zone Matrix Gray Level Non-Uniformity feature; E Visualization of the Gray Level Co-occurrence Matrix Maximum Probability feature; F Final habitat partitioning using K-means clustering (K = 3), displaying three subregions with distinct imaging phenotypes: habitat region 1 (red), habitat region 2 (green), and habitat region 3 (blue)

Fig. 4.

LASSO regression feature selection process. Optimal lambda parameter selection via tenfold cross-validation. A Habitat model; B Transformer model; C ResNet model. The x-axis represents log(lambda) values, the y-axis denotes AUC performance, and the numbers at the top indicate the number of retained features. Red dots represent mean values, gray error bars represent standard deviation, and dashed vertical lines indicate the optimal lambda values

Fig. 5.

LASSO regression coefficient paths. A Habitat model; B Transformer model; C ResNet model. The x-axis represents log(lambda) values, and the y-axis denotes the corresponding regression coefficients. Each colored curve represents the trajectory of a feature’s coefficient as lambda varies, with numbers at the top indicating the number of features with non-zero coefficients. Dashed vertical lines indicate the optimal lambda values of 0.0332, 0.0145, and 0.0827, respectively

Fig. 6.

Distribution of feature coefficients after LASSO selection. A Habitat model; B Transformer model; C ResNet model. The x-axis represents coefficient values, and the y-axis lists the feature names retained after LASSO regression. Bar length indicates the absolute magnitude of the coefficients, reflecting each feature’s contribution to the prediction outcome. Different colors distinguish coefficient directions: positive coefficients (extending to the right) indicate a positive association with PD-L1 positivity, while negative coefficients (extending to the left) indicate a negative association

Table 2.

Predictive performance of the models

| Model | Cohort | AUC (95% CI) | Accuracy | Sensitivity | Specificity | PPV | NPV |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Habitat model | Training cohort | 0.827 (0.791–0.863) | 0.729 | 0.86 | 0.606 | 0.671 | 0.822 |
| | Validation cohort | 0.812 (0.754–0.870) | 0.726 | 0.848 | 0.587 | 0.701 | 0.771 |
| Transformer model | Training cohort | 0.845 (0.810–0.880) | 0.772 | 0.679 | 0.86 | 0.82 | 0.741 |
| | Validation cohort | 0.842 (0.789–0.894) | 0.736 | 0.629 | 0.859 | 0.835 | 0.669 |
| ResNet model | Training cohort | 0.842 (0.808–0.877) | 0.753 | 0.747 | 0.758 | 0.743 | 0.762 |
| | Validation cohort | 0.827 (0.771–0.883) | 0.721 | 0.648 | 0.804 | 0.791 | 0.667 |
| Combined model | Training cohort | 0.945 (0.925–0.964) | 0.877 | 0.864 | 0.89 | 0.88 | 0.875 |
| | Validation cohort | 0.946 (0.919–0.974) | 0.868 | 0.848 | 0.87 | 0.881 | 0.833 |

Fig. 7.

ROC curve comparison of four models. A Training cohort; B External validation cohort. Blue: Combined Model; Red: Habitat model; Green: ResNet model; Yellow: Transformer model. Legends show AUC values with 95% confidence intervals. The Combined Model reached an AUC of 0.945 in training, outperforming individual models, and an AUC of 0.946 in external validation, indicating good generalization

Transformer model

The Vision Transformer (ViT-B/16) extracted deep features from the penultimate encoder layer; after PCA reduction and multi-step selection, 41 features were retained by LASSO regression. These features represent global spatial dependencies and high-level semantic representations. The Transformer model achieved an AUC of 0.845 (95% CI 0.810–0.880) in training and an AUC of 0.842 (95% CI 0.789–0.894) in validation (Table 2, Figs. 7A, B). High specificity and PPV indicate low false-positive rates.

ResNet model

ResNet50 extracted 2048-dimensional features from conv5_x, followed by L2 normalization, PCA reduction, and multi-step selection. LASSO regression selected 31 features capturing local texture and hierarchical representations. The ResNet model achieved an AUC of 0.842 (95% CI 0.808–0.877) in training and an AUC of 0.827 (95% CI 0.771–0.883) in validation (Table 2, Figs. 7A, B), with balanced sensitivity and specificity.

Combined model

A nomogram integrated predicted probabilities from all three models via multivariate logistic regression (Fig. 8). Point allocations reflected each model’s contribution: Habitat 0–60 points, ResNet 0–70 points, Transformer 0–100 points (total 0–280), converted to PD-L1 positivity probability (0.1–0.9). The Transformer model received the highest weight, indicating the importance of global deep learning features.

Fig. 8.

Nomogram of the combined prediction model. This nomogram combines predicted probabilities from Habitat, ResNet, and Transformer models. Usage: (1) Find each model’s probability on its axis and draw a line upward to the Points axis to get the score. (2) Add the three scores for Total Points. (3) Draw a line from Total Points to the Risk axis for the final predicted probability. The nomogram provides a visual aid for clinical risk assessment and treatment decisions

The Combined model achieved an AUC of 0.945 (95% CI 0.925–0.964) in training and an AUC of 0.946 (95% CI 0.919–0.974) in validation (Table 2, Fig. 7), outperforming individual models. The optimal decision threshold for all models was determined in the training cohort using Youden’s Index (sensitivity + specificity − 1). AUC values with 95% confidence intervals (estimated via DeLong’s method) are reported in Table 2 for both the training cohort and the external validation cohort, thereby providing center-stratified discrimination performance. Other classification metrics (sensitivity, specificity, PPV, NPV, and accuracy) are reported as point estimates at the Youden-optimal threshold. Decision curve analysis showed that the Combined model provided greater net benefit than individual models across threshold probabilities of 0.1–0.8, particularly in the clinically relevant range of 0.2–0.7, exceeding “treat all” and “treat none” strategies (Fig. 9). Calibration curves demonstrated good agreement between predicted and observed PD-L1 positivity rates for all models (Figs. 10, 11), with the Combined model closest to the ideal diagonal. In accordance with the principles outlined by Abbasian Ardakani et al. [16], quantitative calibration was further assessed in the training cohort using bootstrap resampling (1,000 replicates). The Combined model achieved a Brier score of 0.142 (95% CI 0.128–0.156), a calibration slope of 1.03 (95% CI 0.97–1.09), and a calibration intercept of − 0.01 (95% CI − 0.06–0.04), collectively confirming well-aligned predicted probabilities with observed event frequencies and satisfactory internal probabilistic calibration.
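
Two of the quantities above have simple closed forms: Youden's index J = sensitivity + specificity − 1, maximized over candidate thresholds, and the Brier score, the mean squared difference between predicted probability and observed outcome. The sketch below illustrates both on hypothetical predictions; the function names are illustrative, and a prediction is treated as positive when its score is at or above the threshold (an assumed tie rule).

```python
def youden_threshold(scores, labels):
    """Return (threshold, J) maximizing sensitivity + specificity - 1.

    Requires at least one positive and one negative label.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    best_threshold, best_j = None, -1.0
    for threshold in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 0)
        j = tp / positives + tn / negatives - 1.0
        if j > best_j:
            best_threshold, best_j = threshold, j
    return best_threshold, best_j


def brier_score(probabilities, labels):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for p, y in zip(probabilities, labels)) / len(labels)


# Hypothetical predicted probabilities and PD-L1 labels (1 = positive).
probs = [0.1, 0.4, 0.35, 0.8]
truth = [0, 0, 1, 1]
cutoff, j_stat = youden_threshold(probs, truth)
brier = brier_score(probs, truth)
```

Fixing the cutoff on the training cohort and reusing it unchanged in the external cohort, as described above, keeps the validation metrics free of threshold tuning.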

Fig. 9.

Decision curve analysis. A Training cohort; B External validation cohort. The combined model (red) demonstrates the highest net benefit across the majority of the clinically relevant threshold probability range, outperforming all individual models as well as the “treat-all” and “treat-none” strategies

Fig. 10.

Calibration curves of four models in the training cohort. A Combined Model; B Habitat Model; C ResNet Model; D Transformer Model. X-axis: predicted probability; Y-axis: observed outcome proportion. Red dashed line: ideal calibration (45° diagonal), where predictions match observed outcomes. Gray dotted line: apparent calibration. Black solid line: bias-corrected calibration from bootstrap resampling (1000 replicates). Curves nearer the diagonal indicate better agreement between predicted and observed probabilities. The Combined Model has the best calibration, with its curve closest to the ideal line. Quantitative calibration metrics for the Combined Model (bootstrap internal validation): Brier score = 0.142 (95% CI 0.128–0.156), calibration slope = 1.03 (95% CI 0.97–1.09), calibration intercept =  − 0.01 (95% CI − 0.06–0.04)

Fig. 11.

Calibration curves of four models in the validation cohort. A Combined Model; B Habitat Model; C ResNet Model; D Transformer Model. X-axis: predicted probability; Y-axis: observed outcome proportion. Red dashed line: ideal calibration (45° diagonal). The Combined Model maintains good calibration in external validation, with its curve aligning well with the ideal diagonal

Discussion

We developed an ultrasound radiomics model integrating tumor habitat heterogeneity with a Transformer-ResNet hybrid architecture for non-invasive PD-L1 prediction in TNBC patients. In dual-center validation, the Combined model achieved strong discrimination (training AUC: 0.945; external AUC: 0.946), surpassing individual models. This supports its potential as an imaging biomarker to guide immunotherapy decisions in TNBC.

Tumors are highly spatially heterogeneous, and the microenvironmental features of different regions directly affect the distribution patterns of PD-L1 expression [11]. Standard whole-tumor ROI approaches overlook this complexity. Habitat analysis addresses this limitation by partitioning tumors into subregions with similar imaging phenotypes, revealing underlying biological features [17, 18]. We applied K-means clustering to divide tumors into three habitat subregions. Habitat 3 yielded the most predictive features (n = 18), indicating potential enrichment of imaging phenotypes linked to the immune microenvironment. Wu et al. recently demonstrated that habitat analysis detects intratumoral phenotypic variations and improves heterogeneity characterization [17]. Other studies have shown that habitat imaging predicts neoadjuvant chemotherapy response [12]. Using MRI-based habitat quantification, Shi et al. established associations between intratumoral heterogeneity and treatment outcomes [13]. In external validation, our Habitat model reached an AUC of 0.812, supporting cross-institutional applicability.
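The partitioning step can be illustrated with a minimal K-means sketch. It makes a simplifying assumption that is not stated in this section: pixels inside the tumor ROI are clustered on grayscale intensity alone, whereas the clustering features actually used in the study may differ. The function names, the quantile-based initialization and the toy intensities are hypothetical, and the cluster indices produced here are arbitrary and do not correspond to the Habitat 1–3 numbering used in the paper.

```python
from typing import Dict, List, Tuple


def kmeans_1d(values: List[float], k: int = 3,
              n_iter: int = 100) -> Tuple[List[int], List[float]]:
    """Lloyd's algorithm on scalar values; returns (labels, centroids)."""
    if len(set(values)) < k:
        raise ValueError("need at least k distinct values")
    ordered = sorted(values)
    # Deterministic initialization at evenly spaced quantiles of the data.
    centroids = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(n_iter):
        # Assignment step: nearest centroid for every pixel.
        labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        # Update step: each centroid moves to the mean of its members.
        updated = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            updated.append(sum(members) / len(members) if members else centroids[c])
        if updated == centroids:
            break
        centroids = updated
    return labels, centroids


def per_habitat_mean(values: List[float], labels: List[int]) -> Dict[int, float]:
    """One simple feature (mean intensity) computed separately in each habitat."""
    means = {}
    for c in sorted(set(labels)):
        members = [v for v, lab in zip(values, labels) if lab == c]
        means[c] = sum(members) / len(members)
    return means


# Hypothetical ROI intensities: a dark core, a mid-gray zone and a bright rim.
roi_pixels = [20, 25, 22, 18, 90, 95, 100, 105, 200, 210, 190, 205]

labels, centroids = kmeans_1d(roi_pixels, k=3)
habitat_means = per_habitat_mean(roi_pixels, labels)
print(labels)
print(habitat_means)
```

In a radiomics workflow, each label map would serve as a mask so that texture and intensity features are extracted from each subregion separately instead of from the whole tumor.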

Few studies have applied ultrasound-based habitat analysis to predict PD-L1 expression in TNBC. We integrated habitat analysis into ultrasound radiomics to quantify intratumoral imaging heterogeneity. The three habitats exhibited distinct imaging characteristics, and the following biological interpretations are offered as exploratory, imaging-based inferences grounded in established ultrasound semiology, rather than histopathologically validated conclusions. Habitat 1 (red) was predominantly localized to hyperechoic peripheral regions. In standard ultrasound practice, peripheral hyperechogenicity is conventionally associated with desmoplastic stromal reactions or calcifications; on this basis, we tentatively propose that Habitat 1 may reflect stromal-dominant microenvironments, which have been broadly linked to immune exclusion. Habitat 2 (green) was distributed within solid tumor components—a pattern that, by conventional ultrasound interpretation, corresponds to regions of high tissue density and may speculatively reflect proliferative tumor zones. Habitat 3 (blue) concentrated in central hypoechoic areas, which in standard ultrasound assessment are typically indicative of necrosis or cystic degeneration. Drawing on established evidence that hypoxic and necrotic tumor cores can promote PD-L1 upregulation via HIF-1α–mediated pathways [11, 12], we speculate that Habitat 3—contributing the greatest number of predictive features (n = 18)—may capture imaging correlates of hypoxia-driven immune evasion. These interpretations remain speculative in the absence of co-registered histopathological or molecular data, and future studies incorporating biopsy-based validation are warranted. The broader rationale for habitat analysis is supported by prior MRI-based studies demonstrating its capacity to reflect tumor microenvironmental heterogeneity [13, 17].

ResNet and Transformer provide complementary feature learning strategies. ResNet captures local texture details through residual connections and convolutions, while Transformer models long-range dependencies using self-attention [15, 19, 20]. The ResNet model extracted 31 deep features characterizing local textural heterogeneity, whereas the Transformer extracted 41 features representing global spatial relationships. Carriero et al. noted in their 2024 review the considerable potential of deep learning for breast cancer imaging [21]. Hybrid CNN-Transformer approaches have shown strong performance in medical image analysis. TransBreastNet by Brahmareddy et al. used this fusion strategy [22]. In external validation, the two deep learning models reached AUCs of 0.827 and 0.842, supporting their value for automated extraction of high-level imaging features.

The Combined model brought together three information sources: habitat heterogeneity, local features from ResNet, and global features from Transformer. Nomogram analysis showed that the Transformer component carried the greatest weight (0–100 points), indicating that global contextual information was critical for PD-L1 prediction. In external validation, the Combined model reached an AUC of 0.946, exceeding the individual models (AUC 0.812–0.842) by 0.104–0.134 in absolute AUC, that is, 10.4–13.4 percentage points. Similar benefits were reported by Chen et al., who improved prediction of neoadjuvant chemoimmunotherapy response in TNBC by integrating microbiome and radiomics data [23]. DCA confirmed that the Combined model provided superior net benefit within the clinically relevant threshold range (0.2–0.7), supporting its practical value for guiding clinical decisions.
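As a worked check of the reported margin, the short sketch below recomputes the gain of the Combined model over each individual model from the external validation AUCs quoted in this article. The reported range of 10.4 to 13.4 appears to correspond to absolute differences in AUC; relative improvements, shown for comparison, are larger. The variable and function names are illustrative only.

```python
from typing import Dict

# External validation AUCs as reported in the article.
external_auc: Dict[str, float] = {"Habitat": 0.812, "Transformer": 0.842, "ResNet": 0.827}
combined_auc = 0.946


def absolute_gain(single: Dict[str, float], combined: float) -> Dict[str, float]:
    """Absolute AUC difference between the combined and each single model."""
    return {name: round(combined - auc, 3) for name, auc in single.items()}


def relative_gain_percent(single: Dict[str, float], combined: float) -> Dict[str, float]:
    """Improvement expressed as a percentage of each single model's AUC."""
    return {name: round(100.0 * (combined - auc) / auc, 1) for name, auc in single.items()}


abs_gain = absolute_gain(external_auc, combined_auc)
rel_gain = relative_gain_percent(external_auc, combined_auc)
for name in external_auc:
    print(f"{name}: +{abs_gain[name]:.3f} AUC ({rel_gain[name]:.1f}% relative)")
```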

A key theoretical contribution of the present study lies in the spatial architecture of our hybrid framework. From a machine learning perspective, habitat-based partitioning introduces an inductive bias by constraining feature extraction to subregions with internally coherent imaging phenotypes. This spatial regularization reduces within-habitat feature variance that would otherwise arise from mixing biologically distinct microenvironmental signals in a whole-tumor input. Gatenby et al. demonstrated this quantitatively, showing that habitat partitioning reduced feature standard deviation from ±243 HU (whole-tumor) to ±18–79 HU per habitat zone [24]. Such a reduction improves the signal-to-noise ratio of downstream representations and enhances biological alignment with discrete tumor compartments reflecting the spatially heterogeneous tumor microenvironment [25]. By contrast, deep learning models trained on whole-tumor inputs lack such explicit spatial constraints and tend to aggregate competing microenvironmental signals, limiting their ability to reliably encode subregion-specific patterns [26]; prior work has demonstrated that spatially partitioned subregional features provide superior prognostic information compared to whole-tumor features [27]. Our framework instead imposes spatial structure at the input level: K-means habitat clustering first partitions the tumor into three microenvironmentally distinct zones, corresponding to stromal-dominant peripheral regions (Habitat 1), densely cellular proliferative zones (Habitat 2), and hypoxic/necrotic central cores (Habitat 3), which then serve as constrained, biologically coherent inputs for ViT and ResNet feature extraction. This “spatially guided fusion” is therefore conceptually distinct from standard global concatenation, which merges feature streams post-hoc at the classifier level without enforcing biological structure on the input, and represents a meaningful architectural advancement over existing multimodal frameworks.

PD-L1 status assessment is crucial for selecting TNBC patients who may benefit from immunotherapy. Clinical data show that PD-L1-positive (CPS ≥ 10) patients treated with pembrolizumab plus chemotherapy reached an 81.2% event-free survival rate at 5 years [3]. Yet conventional IHC faces challenges from sampling bias and temporal variability in PD-L1 expression [4]. Our non-invasive ultrasound model provides practical advantages. First, ultrasound involves no radiation and allows serial imaging to monitor tumor changes over time. Second, habitat analysis addresses single-biopsy limitations by capturing heterogeneity across the entire tumor. Third, external validation yielded an AUC of 0.946 and a PPV of 0.881, supporting accurate identification of potential responders. Finally, the widespread availability of ultrasound equipment enhances feasibility for broader clinical implementation.

Several limitations should be noted. The retrospective design creates potential for selection bias. While the sample size (n = 654) provides adequate power, larger multicenter studies are needed to confirm generalizability. We used only grayscale ultrasound; incorporating color Doppler and contrast-enhanced imaging could add valuable information. Furthermore, since ultrasound images are inherently noisy and non-Gaussian in texture, the K-means clustering step may be sensitive to noise-induced variability, which could potentially affect both the stability of habitat definitions and the reproducibility of radiomics features extracted from each subregion [28, 29]. No formal test–retest stability assessment or perturbation analysis of the clustering procedure was performed in the present study; future work should evaluate habitat reproducibility under varying noise conditions and across different ultrasound acquisition settings to more rigorously validate the reliability of the proposed habitat framework. In addition, our habitat subregions, although statistically associated with PD-L1 status, remain imaging-derived surrogate biomarkers; the biological interpretation of these subregions requires further confirmation through pathology-based correlation, ideally using multi-region biopsies or spatial omics profiling. PD-L1 expression represents just one factor in predicting immunotherapy response. Combining it with molecular markers like tumor mutational burden (TMB) and microsatellite instability (MSI) may improve prediction accuracy. Future work should address several priorities. Prospective trials can test whether the model predicts actual treatment outcomes. Research examining connections between imaging features and the tumor immune microenvironment would clarify underlying mechanisms. Dynamic models tracking PD-L1 changes during treatment merit further development. Finally, automated analysis platforms could facilitate broader clinical implementation.

Conclusion

We developed an ultrasound radiomics model that combines habitat analysis with a Transformer-ResNet architecture to predict PD-L1 status in TNBC. Dual-center validation showed that the Combined model outperformed single-modality approaches. This model offers a practical imaging biomarker for immunotherapy selection in TNBC patients.

Acknowledgements

The authors gratefully acknowledge all participants and their families who contributed to this study. We also thank the research team for their assistance in data management and statistical analysis, and appreciate the constructive comments provided by the editors and anonymous reviewers.

Author contributions

Zhiyong Li contributed to data acquisition, analysis, and interpretation, and drafting of the manuscript. Huanzhong Su contributed to data acquisition, analysis, and interpretation, and review of the submitted version of the manuscript. Han Xiao, Cong Chen, Peng Lin, Ensheng Xue, and Rongxi Liang contributed to data acquisition. Qin Ye and Zhenhu Lin contributed to the study design, data acquisition, analysis, and interpretation, review of the submitted version of the manuscript, and supervision.

Funding

This study was supported by the Joint Funds for the Innovation of Science and Technology, Fujian Province (Grant number: 2024Y9259) and the Fujian Provincial Health Technology Project (Grant number: 2024QNA021). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Data availability

The data that support the findings of this study are available from the corresponding author upon reasonable request. Contact email: 377579875@qq.com.

Declarations

Ethics approval and consent to participate

This study was approved by the Ethics Committee of Fujian Medical University Union Hospital (approval number: 2024KJCX061, approval date: September 6, 2024). Due to the retrospective nature of this study, the requirement for informed consent was waived by the Ethics Committee of Fujian Medical University Union Hospital. All research procedures were conducted in accordance with the principles of the Declaration of Helsinki.

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Footnotes

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Zhiyong Li and Huanzhong Su have contributed equally to this work.

Contributor Information

Qin Ye, Email: xhxhyeye@163.com.

Zhenhu Lin, Email: 377579875@qq.com.

References

  • 1. Bianchini G, De Angelis C, Licata L, et al. Treatment landscape of triple-negative breast cancer—expanded options, evolving needs. Nat Rev Clin Oncol. 2022;19(2):91–113. 10.1038/s41571-021-00565-2.
  • 2. Cortes J, Rugo HS, Cescon DW, et al. Pembrolizumab plus chemotherapy in advanced triple-negative breast cancer. N Engl J Med. 2022;387(3):217–26. 10.1056/NEJMoa2202809.
  • 3. Schmid P, Cortes J, Dent R, et al. Event-free survival with pembrolizumab in early triple-negative breast cancer. N Engl J Med. 2022;386(6):556–67. 10.1056/NEJMoa2112651.
  • 4. Han X, Guo Y, Ye H, et al. Development of a machine learning-based radiomics signature for estimating breast cancer TME phenotypes and predicting anti-PD-1/PD-L1 immunotherapy response. Breast Cancer Res. 2024;26(1):18. 10.1186/s13058-024-01776-y.
  • 5. Qi YJ, Su GH, You C, et al. Radiomics in breast cancer: current advances and future directions. Cell Rep Med. 2024;5(9):101719. 10.1016/j.xcrm.2024.101719.
  • 6. Wei W, Xia F, Zhang D, et al. Pixel-level radiomics and deep learning for predicting Ki-67 expression in breast cancer based on dual-modal ultrasound images. Acad Radiol. 2026. 10.1016/j.acra.2025.12.047.
  • 7. Yan Y, Xue X, Xie J, et al. Prediction of HER2 changes post-neoadjuvant therapy based on fusion of ultrasound radiomics and clinicopathological features empowered by explainable AI: a multicenter study. Eur J Cancer. 2026;232:116158. 10.1016/j.ejca.2025.116158.
  • 8. Liu H, Xia H, Yin X, et al. Study on the differentiation of infiltrating breast cancer molecular subtypes based on ultrasound radiomics. Clin Breast Cancer. 2025;25(4):e450–60. 10.1016/j.clbc.2025.01.005.
  • 9. Corredor G, Bharadwaj S, Pathak T, et al. A review of AI-based radiomics and computational pathology approaches in triple-negative breast cancer: current applications and perspectives. Clin Breast Cancer. 2023;23(8):800–12. 10.1016/j.clbc.2023.06.004.
  • 10. Liu H, Zou L, Xu N, et al. Deep learning radiomics based prediction of axillary lymph node metastasis in breast cancer. NPJ Breast Cancer. 2024;10(1):22. 10.1038/s41523-024-00628-4.
  • 11. O’Connor JP, Rose CJ, Waterton JC, et al. Imaging intratumor heterogeneity: role in therapy response, resistance, and clinical outcome. Clin Cancer Res. 2015;21(2):249–57. 10.1158/1078-0432.Ccr-14-0990.
  • 12. Chen H, Liu Y, Zhao J, et al. Quantification of intratumoral heterogeneity using habitat-based MRI radiomics to identify HER2-positive, -low and -zero breast cancers: a multicenter study. Breast Cancer Res. 2024;26(1):160. 10.1186/s13058-024-01921-7.
  • 13. Shi Z, Huang X, Cheng Z, et al. MRI-based quantification of intratumoral heterogeneity for predicting treatment response to neoadjuvant chemotherapy in breast cancer. Radiology. 2023;308(1):e222830. 10.1148/radiol.222830.
  • 14. Pu Q, Xi Z, Yin S, et al. Advantages of transformer and its application for medical image segmentation: a survey. Biomed Eng Online. 2024;23(1):14. 10.1186/s12938-024-01212-4.
  • 15. Shamshad F, Khan S, Zamir SW, et al. Transformers in medical imaging: a survey. Med Image Anal. 2023;88:102802. 10.1016/j.media.2023.102802.
  • 16. Abbasian Ardakani A, Airom O, Khorshidi H, et al. Interpretation of artificial intelligence models in healthcare: a pictorial guide for clinicians. J Ultrasound Med. 2024;43(10):1789–818. 10.1002/jum.16524.
  • 17. Wu LX, Ding N, Ji YD, et al. Habitat analysis in tumor imaging: advancing precision medicine through radiomic subregion segmentation. Cancer Manag Res. 2025;17:731–41. 10.2147/cmar.S511796.
  • 18. Zhang X, Chen X, Fu Y, et al. Study on heterogeneity of vascularity and cellularity via multiparametric MRI habitat imaging in breast cancer. BMC Med Imaging. 2025;25(1):159. 10.1186/s12880-025-01698-x.
  • 19. Zhao W, Chen W, Li G, et al. GMILT: a novel transformer network that can noninvasively predict EGFR mutation status. IEEE Trans Neural Netw Learn Syst. 2024;35(6):7324–38. 10.1109/tnnls.2022.3190671.
  • 20. Zhou HY, Guo J, Zhang Y, et al. nnFormer: volumetric medical image segmentation via a 3D transformer. IEEE Trans Image Process. 2023;32:4036–45. 10.1109/tip.2023.3293771.
  • 21. Carriero A, Groenhoff L, Vologina E, et al. Deep learning in breast cancer imaging: state of the art and recent advancements in early 2024. Diagnostics. 2024. 10.3390/diagnostics14080848.
  • 22. Brahmareddy A, Selvan MP. TransBreastNet a CNN transformer hybrid deep learning framework for breast cancer subtype classification and temporal lesion progression analysis. Sci Rep. 2025;15(1):35106. 10.1038/s41598-025-19173-6.
  • 23. Chen Y, Huang Y, Li W, et al. Intratumoral microbiota-aided fusion radiomics model for predicting tumor response to neoadjuvant chemoimmunotherapy in triple-negative breast cancer. J Transl Med. 2025;23(1):352. 10.1186/s12967-025-06369-7.
  • 24. Gatenby RA, Grove O, Gillies RJ. Quantitative imaging in cancer evolution and ecology. Radiology. 2013;269(1):8–15. 10.1148/radiol.13122697.
  • 25. Junttila MR, de Sauvage FJ. Influence of tumour micro-environment heterogeneity on therapeutic response. Nature. 2013;501(7467):346–54. 10.1038/nature12626.
  • 26. Gillies RJ, Kinahan PE, Hricak H. Radiomics: images are more than pictures, they are data. Radiology. 2016;278(2):563–77. 10.1148/radiol.2015151169.
  • 27. Xie C, Yang P, Zhang X, et al. Sub-region based radiomics analysis for survival prediction in oesophageal tumours treated by definitive concurrent chemoradiotherapy. EBioMedicine. 2019;44:289–97. 10.1016/j.ebiom.2019.05.023.
  • 28. Duron L, Savatovsky J, Fournier L, et al. Can we use radiomics in ultrasound imaging? Impact of preprocessing on feature repeatability. Diagn Interv Imaging. 2021;102(11):659–67. 10.1016/j.diii.2021.10.004.
  • 29. Zwanenburg A, Vallières M, Abdalah MA, et al. The image biomarker standardization initiative: standardized quantitative radiomics for high-throughput image-based phenotyping. Radiology. 2020;295(2):328–38. 10.1148/radiol.2020191145.


Articles from Breast Cancer Research : BCR are provided here courtesy of BMC
