Cancer Medicine. 2025 Sep 8;14(17):e71221. doi: 10.1002/cam4.71221

Interpretable Machine Learning for Predicting Neoadjuvant Chemotherapy Response in Breast Cancer Using the Baseline Clinical and Pathological Characteristics

Shan Fang 1, Jun Zhang 2, Chengyan Han 3, Mingxiang Kong 4, Haibo Zhang 5, Miaochun Zhong 6, Wuzhen Chen 7, Hongjun Yuan 6, Wenjie Xia 6, Wei Zhang 8
PMCID: PMC12415587  PMID: 40917012

ABSTRACT

Background

The pathological response to neoadjuvant chemotherapy (NAC) has become a vital prognostic indicator for patients with breast cancer (BC). Previously published prediction models relied on rather basic imaging and pathology characteristics and did not sufficiently explain the importance of the features they incorporated. The purpose of this study was to develop and validate a machine learning (ML) model for predicting pathological complete response (pCR) to NAC using baseline clinical and pathological features in BC patients.

Methods

Data were collected from hospitalized BC patients treated with NAC at Zhejiang Provincial People's Hospital between January 2014 and August 2023. The dataset was randomly split, with 70% allocated for model training and 30% for validation. LASSO regression was used to select predictive features. Six ML models—XGBoost, LightGBM, CatBoost, logistic regression, random forest (RF), and support vector machine (SVM)—were developed, with performance assessed using the area under the curve (AUC) and accuracy, precision, recall, F1 score, and Brier score. Clinical benefits were evaluated using decision curve analysis (DCA), and SHapley Additive exPlanation (SHAP) was applied to interpret the features of the optimal ML model.

Results

A total of 303 BC patients treated with NAC were included, with a pCR rate of 29.37% (89/303). Twelve features, including age, menopausal status, PR, HER2 status, Ki‐67 expression, and stromal tumor‐infiltrating lymphocytes (sTILs), were selected for model construction. Among the six models, the CatBoost model demonstrated the best predictive performance, achieving an AUC of 0.853 after Bayesian hyperparameter tuning. SHAP analysis ranked sTILs as the most critical predictive feature. In fivefold cross‐validation, the CatBoost model incorporating sTILs achieved an average AUC of 0.83.

Conclusions

The ML‐based pCR prediction model enables more accurate pCR prediction for BC patients at baseline, aiding in optimizing treatment strategies. Additionally, the interpretable SHAP framework enhances model transparency, fostering clinical trust and understanding among doctors.

Keywords: breast cancer, interpretable machine learning, neoadjuvant chemotherapy, pathological complete response, tumor‐infiltrating lymphocytes


Abbreviations

AUC: area under the receiver operating characteristic curve
BC: breast cancer
BMI: body mass index
CatBoost: categorical boosting
CEA: carcinoembryonic antigen
DCA: decision curve analysis
LightGBM: light gradient boosting machine
LR: logistic regression
ML: machine learning
NAC: neoadjuvant chemotherapy
pCR: pathologic complete response
ROC: receiver operating characteristic curve
SHAP: SHapley additive exPlanations
sTILs: stromal tumor infiltrating lymphocytes
SVM: support vector machine
XGBoost: eXtreme gradient boosting

1. Introduction

Breast cancer (BC) is one of the most prevalent tumors among women worldwide [1]. While early BC generally has a favorable prognosis, the overall prognosis remains suboptimal because of the high risk of metastasis [1]. Pathological response to neoadjuvant chemotherapy (NAC) is an essential predictor of survival in BC patients, with pathological complete response (pCR) having a strong, positive link to survival outcomes [2, 3, 4]. Customized post‐neoadjuvant therapy holds promise for enhancing long‐term prognosis in patients who do not achieve pCR [5, 6]. Ethnic, racial, and tumor‐specific characteristics also influence BC outcomes, underscoring the importance of diverse and inclusive datasets in developing accurate predictive models [7, 8].

Recent studies have shown that artificial intelligence (AI) has remarkable potential to advance breast cancer diagnosis, treatment, and management, and that machine learning (ML)‐based models can be used to predict BC outcomes [9, 10, 11, 12]. These studies underscore the value of ML in developing accurate predictive models and personalized precision therapies compared with traditional approaches. Because ML can handle complex datasets without requiring the relationships among a large number of variables to be specified a priori, it is considered a promising analytical method. However, the previous models had several shortcomings. First, they relied on relatively simple imaging features and very few pathological characteristics. Second, they did not adequately explain the significance of the included features, so their interpretability was relatively low [9, 10, 11, 12]. Additionally, some incorporated post‐treatment information, which is unsuitable for predictive models intended for baseline evaluation. In this study, we aimed to develop a model that predicts BC outcomes more accurately than prior models and is more interpretable.

The tumor microenvironment exerts a significant influence on the initiation and progression of tumors. Tumor‐infiltrating lymphocytes (TILs), essential regulators in the tumor microenvironment, are classified into intratumoral and stromal TILs (sTILs) subtypes based on their location in the tumor nest or stroma [13, 14, 15]. Recent research has demonstrated that an immune‐rich microenvironment is associated with improved prognosis in both early and advanced BC. As critical markers of the tumor microenvironment [16, 17, 18], TILs have been identified as significant predictors of response to NAC in BC [19, 20, 21]. Moreover, our previous study highlighted the important role of sTILs in predicting BC outcomes [22]. Building on these findings, we aimed to develop an accurate and promising ML‐based prediction model incorporating various pathological and clinical features to facilitate personalized treatment and improve BC prognosis.

2. Materials and Methods

2.1. Study Population and Data Source

This retrospective study analyzed clinicopathological data from 386 BC patients treated with NAC followed by surgery at Zhejiang Provincial People's Hospital between January 2014 and August 2023. Of these, 83 were excluded due to missing data or a history of prior BC, as illustrated in Figure 1. This study received ethical approval from the Ethics Committee at Zhejiang Provincial People's Hospital (Ethics number: 2019KY274). The requirement for informed consent was waived because the data were gathered retrospectively. The study was designed in accordance with the TRIPOD+AI statement, the guidance for reporting clinical prediction models that use machine learning methods [23].

FIGURE 1.


Study flowchart of 386 breast cancer patients treated with NAC and surgery. A total of 303 patients without missing data were incorporated into the final analysis. A Python function was used to randomly divide the dataset into training and testing sets.

2.2. Predictor Variables

The prediction outcome was pCR status, defined according to the Miller–Payne system following surgery [24]. A total of 22 baseline features were collected from BC patients, including body mass index (BMI), age, menopausal status, red blood cell distribution width (RDW), mean platelet volume (MPV), platelet distribution width (PDW), carcinoembryonic antigen (CEA), and cancer antigens (CA125 and CA153). Ultrasound features (tumor size, shape, internal echo, aspect ratio, calcification, posterior echo, abnormal blood flow signal, and lymphatic metastasis) were also gathered. In addition, pathological markers, namely Ki‐67 expression, estrogen receptor (ER), progesterone receptor (PR), and human epidermal growth factor receptor 2 (HER2) levels, were obtained from core‐needle biopsy specimens before NAC. sTILs were quantified by two pathologists using the baseline core‐needle biopsy specimens.

2.3. Data Preprocessing and Feature Selection

Age, BMI, RDW, PDW, MPV, CEA, CA125, and CA153 were treated as continuous variables, while the eight ultrasonic features were treated as categorical variables. ER, PR, and HER2 status were classified into three categories: negative, weakly positive, and strongly positive. Numerical values of Ki‐67 expression and sTILs were analyzed directly to retain all available information. Feature selection was performed with the least absolute shrinkage and selection operator (LASSO) method, which filters predictive variables and helps prevent overfitting [25]. LASSO feature selection was carried out on the training dataset only.
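For reference, the objective usually meant by LASSO for a binary outcome is the L1‐penalized logistic regression shown below. This is a standard textbook form, not a formula given by the authors. The symbols (n patients, p candidate features, feature vector x_i, outcome y_i in {0, 1}, intercept β_0, coefficients β) are notation introduced here for illustration. Larger values of λ shrink more coefficients exactly to zero, which is what makes the method usable for feature selection.

```latex
(\hat{\beta}_0, \hat{\beta}) = \arg\min_{\beta_0,\,\beta}
\left\{
  -\frac{1}{n} \sum_{i=1}^{n}
  \left[ y_i \left( \beta_0 + x_i^{\top}\beta \right)
       - \log\!\left( 1 + e^{\beta_0 + x_i^{\top}\beta} \right) \right]
  + \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert
\right\}
```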

2.4. Model Development and Model Evaluation

A stratified random sampling method was employed to divide the dataset into training (70%) and testing (30%) datasets. The training dataset was employed to develop six ML models, and the testing dataset was used to evaluate their prediction performance. Bayesian optimization and grid‐search method were applied to fine‐tune the hyperparameters of the ML models in the training dataset [26]. Each model's performance was evaluated on the testing dataset using the area under the receiver operating characteristic curve (AUC) and decision curve analysis (DCA) to determine the best‐performing model, as well as accuracy, precision, recall, F1 score, and Brier score. The calibration curve was utilized to assess the best‐performing ML model.
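As a minimal sketch of two of the metrics named above, the block below computes the AUC through its rank (Mann–Whitney) formulation and the Brier score as the mean squared difference between predicted probability and observed outcome. The function names and the toy labels and probabilities are illustrative assumptions, not the study's code or data.

```python
def roc_auc(y_true, y_prob):
    """AUC as the chance that a random positive is scored above a random negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


# Hypothetical toy data: 1 = pCR, 0 = non-pCR.
y_true = [1, 0, 1, 0, 0, 1]
y_prob = [0.9, 0.2, 0.6, 0.7, 0.1, 0.4]

auc = roc_auc(y_true, y_prob)
brier = brier_score(y_true, y_prob)
print(f"AUC = {auc:.3f}, Brier = {brier:.3f}")
```

The pairwise loop is quadratic in the number of patients, which is acceptable for a cohort of a few hundred; library implementations use sorting instead.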

2.5. ML Explainable Tool

The best‐performing ML model was interpreted using SHapley Additive exPlanations (SHAP), a unified framework for quantifying the contribution and influence of each feature on the ultimate prediction [27]. SHAP values highlighted the positive or negative contribution of each predictor to the target variable. Additionally, SHAP values were utilized to interpret individual cases within the dataset, enhancing the transparency and clinical applicability of the model.
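To illustrate the idea behind SHAP values, the sketch below computes exact Shapley values for a small hypothetical scoring function by averaging each feature's marginal contribution over every feature ordering. The toy model, its coefficients, and the single reference patient used as a baseline are assumptions made for illustration. SHAP libraries use faster tree‐specific algorithms and a background dataset rather than this brute‐force loop.

```python
from itertools import permutations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of `predict` at `x` relative to `baseline`.

    Each feature is switched from its baseline value to its value in `x`
    in every possible order; the resulting changes are averaged.
    """
    n = len(x)
    phi = [0.0] * n
    for order in permutations(range(n)):
        current = list(baseline)
        previous = predict(current)
        for i in order:
            current[i] = x[i]
            updated = predict(current)
            phi[i] += updated - previous
            previous = updated
    n_orders = factorial(n)
    return [value / n_orders for value in phi]


def toy_model(features):
    """Hypothetical score with an sTILs x HER2 interaction (not the fitted CatBoost model)."""
    stils, her2, age = features
    return 0.02 * stils + 0.5 * her2 - 0.01 * age + 0.01 * stils * her2


x = [30.0, 2.0, 46.0]         # hypothetical patient: sTILs 30%, HER2 label 2, age 46
baseline = [20.0, 1.0, 50.0]  # hypothetical reference patient

phi = shapley_values(toy_model, x, baseline)
for name, value in zip(["sTILs", "HER2", "age"], phi):
    print(f"{name}: {value:+.2f}")
```

The contributions sum to the difference between the prediction for the patient and the prediction for the baseline, which is the additivity property that force plots visualize.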

2.6. Statistical Analysis

All analyses were conducted using R V4.3.2 and Python V3.7.0. Categorical variables were reported as total numbers and percentages, and group comparisons were made using either chi‐square tests or Fisher's exact test. Continuous variables were expressed as the mean ± standard deviation for normally distributed data or as the median and interquartile range (median, IQR) for skewed distributions. Group differences were evaluated using the t‐test for normally distributed data and the Mann–Whitney U test for skewed data. Statistical significance was set at p < 0.05.

Six ML models—light gradient boosting machine (LightGBM), eXtreme gradient boosting (XGBoost), categorical boosting (CatBoost), logistic regression (LR), random forest (RF), and support vector machine (SVM)—were developed for pCR prediction. Model performance was compared using receiver operating characteristic (ROC) curves and decision curve analysis (DCA). The SHAP framework was employed to interpret the importance of features in the best‐performing model.

3. Results

3.1. Patient Characteristics

A total of 386 BC patients were retrospectively included in this study. The patient screening process and the study design flow chart are presented in Figure 1. After exclusions, 303 patients without missing information were included in the final ML analysis. Among these were 89 patients (29.37%) who achieved pCR following NAC. The mean age of the cohort was 49.45 years, and the mean BMI was 23.83. Differences between the training and testing datasets are presented in Table 1. No major differences were observed between the training and testing datasets.

TABLE 1.

Patient features.

| Features | N = 303 | Training dataset (n = 212) | Testing dataset (n = 91) | p |
|---|---|---|---|---|
| Age | 49.45 ± 10.45 | 49.97 ± 10.66 | 48.24 ± 9.88 | 0.2 |
| BMI | 23.83 ± 3.93 | 23.62 ± 3.45 | 24.30 ± 4.86 | 0.2 |
| RDW | 13.15 ± 1.58 | 13.11 ± 1.45 | 13.26 ± 1.84 | 0.5 |
| PDW | 13.67 ± 2.82 | 13.63 ± 2.67 | 13.77 ± 3.16 | 0.7 |
| MPV | 10.99 ± 1.23 | 10.97 ± 1.16 | 11.03 ± 1.39 | 0.7 |
| CEA | 2.0 (1.3, 3.2) | 2.1 (1.4, 3.3) | 1.8 (1.3, 2.7) | 0.08 |
| CA125 | 14 (10, 20) | 14 (10, 20) | 15 (10, 20) | 0.8 |
| CA153 | 18 (12, 26) | 18 (13, 27) | 15 (11, 25) | 0.07 |
| Ki‐67 expression (%) | 30 (18, 50) | 30 (20, 50) | 30 (15, 43) | 0.2 |
| sTILs (%) | 20 (10, 30) | 20 (10, 30) | 20 (15, 30) | 0.04 |
| Menopausal | | | | 0.04 |
| Premenopausal | 162 (53%) | 105 (50%) | 57 (63%) | |
| Postmenopausal | 141 (47%) | 107 (50%) | 34 (37%) | |
| Tumor size (US) | | | | 0.2 |
| ≤ 2 cm | 48 (16%) | 31 (15%) | 17 (19%) | |
| 2–5 cm | 196 (65%) | 138 (65%) | 58 (64%) | |
| ≥ 5 cm | 50 (17%) | 39 (18%) | 11 (12%) | |
| Trespass chest wall/skin | 9 (3.0%) | 4 (1.9%) | 5 (5.5%) | |
| Tumor shape (US) | | | | 0.7 |
| Regular | 11 (3.6%) | 7 (3.3%) | 4 (4.4%) | |
| Irregular | 292 (96%) | 205 (97%) | 87 (96%) | |
| Tumor internal echo (US) | | | | 0.1 |
| Uniform | 22 (7.3%) | 19 (9.0%) | 3 (3.3%) | |
| Uneven | 281 (93%) | 193 (91%) | 88 (97%) | |
| Aspect ratio (US) | | | | 0.1 |
| Balance | 281 (93%) | 200 (94%) | 81 (89%) | |
| Imbalance | 22 (7.3%) | 12 (5.7%) | 10 (11%) | |
| Calcification (US) | | | | 0.2 |
| Negative | 126 (42%) | 83 (39%) | 43 (47%) | |
| Positive | 177 (58%) | 129 (61%) | 48 (53%) | |
| Posterior echo (US) | | | | 0.8 |
| Attenuation | 83 (27%) | 56 (26%) | 27 (30%) | |
| Unchanged | 214 (71%) | 152 (72%) | 62 (68%) | |
| Enhancement | 6 (2.0%) | 4 (1.9%) | 2 (2.2%) | |
| Abnormal blood flow signal (US) | | | | 0.2 |
| Negative | 50 (17%) | 31 (15%) | 19 (21%) | |
| Positive | 253 (83%) | 181 (85%) | 72 (79%) | |
| Lymphatic metastasis (US) | | | | 0.01 |
| Negative | 119 (39%) | 73 (34%) | 46 (51%) | |
| Positive | 184 (61%) | 139 (66%) | 45 (49%) | |
| ER | | | | 0.4 |
| Negative | 135 (45%) | 92 (43%) | 43 (47%) | |
| Weakly positive | 36 (12%) | 23 (11%) | 13 (14%) | |
| Strongly positive | 132 (44%) | 97 (46%) | 35 (38%) | |
| PR | | | | 0.3 |
| Negative | 143 (47%) | 98 (46%) | 45 (49%) | |
| Weakly positive | 83 (27%) | 55 (26%) | 28 (31%) | |
| Strongly positive | 77 (25%) | 59 (28%) | 18 (20%) | |
| HER2 | | | | 0.04 |
| Negative | 63 (21%) | 36 (17%) | 27 (30%) | |
| Weakly positive | 108 (36%) | 79 (37%) | 29 (32%) | |
| Strongly positive | 132 (44%) | 97 (46%) | 35 (38%) | |
| pCR | 89 (29%) | 65 (31%) | 24 (26%) | 0.5 |

Note: Data format: x ± s, Median (IQR); n (%). Training set and Testing set were separated by Python function: train_test_split. The 22 features were tested simultaneously, the p‐value threshold was corrected by Bonferroni correction, and the statistical significance was set at p‐value < 0.0023 (0.05/22).

Abbreviations: BMI, body mass index; ER, estrogen receptor; HER2, human epidermal growth factor receptor‐2; Ki‐67, Ki‐67 expression (%); MPV, mean platelet volume; PDW, platelet distribution width; PR, progesterone receptor; RDW, red blood cell distribution width; sTILs, stromal tumor infiltrating lymphocytes; US, ultrasound.
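The Bonferroni threshold stated in the Table 1 note can be checked with a few lines of Python. The sketch below transcribes the 22 feature p‐values from Table 1 (the pCR outcome row is left out) and lists which features fall below the nominal 0.05 level and which fall below the corrected level. The dictionary keys are shortened labels chosen here for illustration.

```python
# p-values transcribed from Table 1 (22 baseline features; outcome row excluded).
p_values = {
    "Age": 0.2, "BMI": 0.2, "RDW": 0.5, "PDW": 0.7, "MPV": 0.7,
    "CEA": 0.08, "CA125": 0.8, "CA153": 0.07, "Ki-67": 0.2, "sTILs": 0.04,
    "Menopausal": 0.04, "Tumor size": 0.2, "Tumor shape": 0.7,
    "Internal echo": 0.1, "Aspect ratio": 0.1, "Calcification": 0.2,
    "Posterior echo": 0.8, "Blood flow": 0.2, "Lymphatic metastasis": 0.01,
    "ER": 0.4, "PR": 0.3, "HER2": 0.04,
}

alpha = 0.05
threshold = alpha / len(p_values)  # Bonferroni: divide alpha by the number of tests

nominal = sorted(name for name, p in p_values.items() if p < alpha)
significant = sorted(name for name, p in p_values.items() if p < threshold)

print(f"Corrected threshold: {threshold:.4f}")
print("Below 0.05 before correction:", nominal)
print("Below corrected threshold:", significant)
```

Four features differ at the uncorrected 0.05 level, but none passes the corrected threshold, which is consistent with the statement that no major differences were observed between the training and testing datasets.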

3.2. Feature Selection

A correlation analysis was performed on the 22 predictive features of the model; the results demonstrated no strong correlation (Figure S1). Features were automatically recognized in the training set using LASSO regression (Figure 2). LASSO regression effectively prevents overfitting and reduces the loss function by adjusting the regularization coefficient, lambda (λ). Twelve of 22 features were chosen with nonzero coefficients in the LASSO regression analysis, which were identified at a shrinkage parameter (lambda.min) of 0.01737013. These 12 features were used to construct the ML models, including baseline age, menopausal status, tumor size, aspect ratio, posterior echo, lymphatic metastasis, RDW, PDW, PR, HER2 status, Ki‐67 expression, and sTILs.

FIGURE 2.


Feature selection using the LASSO binary logistic regression model. (A) LASSO coefficient profiles of 22 features in the training set. The trajectory of the coefficient of each feature was tracked in the LASSO coefficient profiles as lambda varied in the LASSO algorithm. (B) Identification of the optimal penalization coefficient lambda (λ) in the LASSO model in the training set with threefold cross‐validation.

3.3. Model Building and Evaluation

We developed six widely used ML models (XGBoost, LightGBM, CatBoost, LR, RF, and SVM) to predict pathological complete response to NAC in BC patients. Hyperparameters were tuned on the training dataset using Bayesian optimization and grid search. The performance of the six ML models was then evaluated on the testing dataset. The AUC values of XGBoost, LightGBM, CatBoost, LR, RF, and SVM for the testing dataset were 0.848, 0.831, 0.853, 0.837, 0.805, and 0.733, respectively (Figure 3). Accuracy, precision, recall, F1 score, and Brier score for all six models are provided in Table 2, and the tuned hyperparameters for each ML model are listed in Table 3. Among these models, CatBoost demonstrated the highest predictive performance (AUC = 0.853); its accuracy, precision, recall, F1 score, and Brier score were 0.846, 0.692, 0.750, 0.720, and 0.141, respectively. DCA was performed to assess the clinical utility of the models (Figure 4). According to the DCA plot, using treatment strategies based on any ML model resulted in a greater net benefit than the default strategies of treating either all or no patients. At a threshold probability of 50%, the net benefit of the CatBoost model surpassed that of the other five models. Taken together, the AUC, precision, recall, F1 score, Brier score, and DCA indicated that CatBoost had the strongest clinical utility. Based on these findings, CatBoost was selected as the best‐performing model for subsequent analysis. The calibration plot for the CatBoost model is presented in Figure 5. The calibration curve's proximity to the diagonal line shows that the model's predicted probabilities align well with the observed proportions, indicating strong calibration.

FIGURE 3.


ROC curve for pCR prediction of six ML models in the testing set. CatBoost prediction model (AUC = 0.913).

TABLE 2.

Performance of different ML models.

| Models | AUC | Accuracy | Precision | Recall | F1 score | Brier score |
|---|---|---|---|---|---|---|
| CatBoost | 0.852612 | 0.846154 | 0.692308 | 0.75 | 0.72 | 0.141 |
| LightGBM | 0.830846 | 0.78022 | 0.6 | 0.5 | 0.545455 | 0.154 |
| Logistic regression | 0.837065 | 0.78022 | 0.576923 | 0.625 | 0.6 | 0.146 |
| Random forest | 0.804726 | 0.758242 | 0.535714 | 0.625 | 0.576923 | 0.159 |
| SVM | 0.733209 | 0.692308 | 0.416667 | 0.416667 | 0.416667 | 0.175 |
| XGBoost | 0.848259 | 0.813187 | 0.64 | 0.666667 | 0.653061 | 0.147 |
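The CatBoost row of Table 2 can be cross‐checked against the testing dataset size. The sketch below assumes 91 testing patients with 24 pCR cases (Table 1) and infers the confusion‐matrix counts from the reported precision and recall. These counts are a reconstruction for illustration; the authors do not report them.

```python
from fractions import Fraction

n_test, n_pcr = 91, 24   # testing dataset size and pCR cases, from Table 1
tp = 18                  # inferred: recall 0.75 x 24 pCR cases
fp = 8                   # inferred: precision 18 / (18 + 8) = 0.692308
fn = n_pcr - tp          # pCR cases the model missed
tn = n_test - n_pcr - fp # non-pCR cases correctly predicted

accuracy = Fraction(tp + tn, n_test)
precision = Fraction(tp, tp + fp)
recall = Fraction(tp, tp + fn)
f1 = 2 * precision * recall / (precision + recall)

for name, value in [("accuracy", accuracy), ("precision", precision),
                    ("recall", recall), ("F1", f1)]:
    print(f"{name}: {float(value):.6f}")
```

Under these assumed counts, all four values reproduce the CatBoost row of Table 2 to six decimal places.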

TABLE 3.

ML optimal parameter tuning.

| Models | Optimal parameter |
|---|---|
| Random Forest Classifier | max_depth = 7, max_features = 6, n_estimators = 78, min_samples_leaf = 3, min_samples_split = 6 |
| CatBoost Classifier | depth = 3, learning_rate = 0.06, l2_leaf_reg = 4, iterations = 143 |
| XGBClassifier (XGBoost) | n_estimators = 294, learning_rate = 0.03, max_depth = 2, colsample_bytree = 0.8, alpha = 1, gamma = 1 |
| SVC (SVM) | kernel = "rbf", C = 1, gamma = 0.01, probability = True |
| LGBMClassifier (LightGBM) | n_estimators = 143, learning_rate = 0.01, max_depth = 3 |
| Logistic Regression | C = 10, max_iter = 50, solver = "liblinear" |

FIGURE 4.


Decision curve analysis (DCA) for six ML models in the testing sets.
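Decision curve analysis compares strategies by their net benefit at a chosen threshold probability. A minimal sketch of the standard formula follows. The counts used for the model (18 true positives and 8 false positives among 91 testing patients, 24 of whom achieved pCR) are inferred from the precision and recall in Table 2 under the assumption that those metrics were computed at a 0.5 probability cutoff. They are not reported by the authors.

```python
def net_benefit(tp, fp, n, threshold):
    """Net benefit = TP/n - FP/n * pt / (1 - pt) at threshold probability pt."""
    if not 0 < threshold < 1:
        raise ValueError("threshold probability must lie strictly between 0 and 1")
    return tp / n - fp / n * threshold / (1 - threshold)


n, n_pcr = 91, 24  # testing dataset size and pCR cases (Table 1)
pt = 0.5           # threshold probability discussed in the text

nb_model = net_benefit(tp=18, fp=8, n=n, threshold=pt)               # inferred counts
nb_treat_all = net_benefit(tp=n_pcr, fp=n - n_pcr, n=n, threshold=pt)
nb_treat_none = 0.0  # nobody is classified positive, so no benefit and no harm

print(f"model: {nb_model:.3f}, treat all: {nb_treat_all:.3f}, treat none: {nb_treat_none:.3f}")
```

At a 50% threshold, one false positive cancels exactly one true positive, so a model is only useful if it finds more true than false positives. Under the assumed counts the model's net benefit is positive, while treating everyone yields a negative value.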

FIGURE 5.


Calibration curve of CatBoost.
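A calibration curve of this kind is typically built by grouping patients into bins of predicted probability and comparing the mean prediction in each bin with the observed event rate. The sketch below shows that binning step on hypothetical data. The function name, the number of bins, and the toy values are assumptions, since the paper does not describe how its curve was constructed.

```python
def calibration_bins(y_true, y_prob, n_bins=5):
    """Return (mean predicted probability, observed event rate, count) per non-empty bin."""
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, y_prob):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes into the last bin
        bins[idx].append((y, p))
    rows = []
    for members in bins:
        if members:
            mean_pred = sum(p for _, p in members) / len(members)
            observed = sum(y for y, _ in members) / len(members)
            rows.append((mean_pred, observed, len(members)))
    return rows


# Hypothetical toy data: 1 = pCR, 0 = non-pCR.
y_true = [0, 0, 1, 0, 1, 1, 0, 1]
y_prob = [0.05, 0.15, 0.30, 0.35, 0.55, 0.65, 0.70, 0.95]

rows = calibration_bins(y_true, y_prob, n_bins=5)
for mean_pred, observed, count in rows:
    print(f"predicted {mean_pred:.3f} | observed {observed:.2f} | n = {count}")
```

Plotting observed rate against mean predicted probability gives the curve; points on the diagonal indicate that predicted probabilities match observed proportions.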

Feature importance rankings for four of the six prediction models are shown in Figure 6: (A) RF, (B) XGBoost, (C) LightGBM, and (D) CatBoost. The importance rankings were derived using built‐in attributes of each ML algorithm. Across these models, sTILs, HER2, and age consistently ranked among the top five predictors most associated with pCR.

FIGURE 6.


Importance ranking of features in four prediction algorithms. (A) Random Forest, (B) XGBoost, (C) LightGBM, and (D) CatBoost.

3.4. Explanation of CatBoost Model With the SHAP Method

The SHAP algorithm was used to quantify the contribution of each predictor variable to the outcomes predicted by the CatBoost model. The feature importance rankings for the CatBoost model are presented in Figure 7A. sTILs demonstrated the highest predictive value, followed by HER2 status, age, lymphatic metastasis, aspect ratio, Ki‐67 expression, tumor size, RDW, menopausal status, posterior echo, PDW, and PR. SHAP values were further employed to identify positive and negative correlations between predictors and the pCR. As illustrated in Figure 7B, the horizontal axis indicates whether a predictor value contributes to a higher (red) or lower (blue) likelihood of pCR. For example, an increase in sTILs positively influenced the prediction, shifting it toward pCR. Similarly, HER2 status and Ki‐67 expression showed positive impacts on pCR prediction. In contrast, an increase in age, tumor size, and RDW negatively influenced the prediction, shifting it toward non‐pCR.

FIGURE 7.


SHAP interpretation of the CatBoost model. (A) Feature importance rankings based on SHAP weights. (B) The effect of each feature on the model output. (C, D) SHAP force plot for individualized predictions of two patients in the testing set, one of whom was predicted to achieve pCR and the other to not achieve pCR. Red bars represent features positively influencing pCR, whereas blue bars represent features negatively influencing pCR. Longer bars indicate greater functional importance of the respective features.

3.5. SHAP Individual Force Plots

This study provided examples of individual predictions using SHAP force plots, in which red bars represent features that increase the predicted likelihood of pCR and blue bars represent features that decrease it. Figure 7C illustrates a patient who was predicted to achieve pCR. This patient had sTILs at 30%, strongly positive HER2 status (label 2), tumor size ≤ 2 cm (label 1), and an age of 46 years. These features collectively increased the likelihood of achieving pCR. Conversely, Figure 7D illustrates a patient who did not achieve pCR. This patient had a sTILs level of 10%, Ki‐67 expression of 60%, and negative HER2 status (label 0). These features, particularly the lower sTILs and HER2 status, contributed to a decreased likelihood of pCR.

3.6. Importance of sTILs

Because sTILs had the strongest predictive value, fivefold cross‐validation on the entire dataset was used to compare the performance of the CatBoost model with and without sTILs. The CatBoost model with sTILs had an average AUC of 0.83, as presented in Figure 8A, whereas the average AUC of the CatBoost model without sTILs decreased to 0.70, as shown in Figure 8B. Thus, the sTILs obtained from core‐needle biopsy specimens before NAC had substantial predictive value in the ML model. The optimal sTILs cutoff according to Youden's index was 17.5%. A higher sTILs value indicated that breast cancer patients receiving neoadjuvant chemotherapy were more likely to achieve pCR.
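Youden's index, J = sensitivity + specificity − 1, is commonly used to pick a cutoff for a continuous marker such as sTILs. The sketch below scans the observed values and returns the cutoff with the largest J. The sTILs percentages and pCR labels are hypothetical toy data, and the function name is an illustrative choice; this is not the authors' procedure or cohort.

```python
def youden_cutoff(values, labels):
    """Return (cutoff, J) maximizing J = sensitivity + specificity - 1.

    A case is called positive when its value is >= cutoff.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best_cutoff, best_j = None, float("-inf")
    for cutoff in sorted(set(values)):
        tp = sum(1 for v, y in zip(values, labels) if v >= cutoff and y == 1)
        tn = sum(1 for v, y in zip(values, labels) if v < cutoff and y == 0)
        j = tp / n_pos + tn / n_neg - 1
        if j > best_j:
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j


# Hypothetical sTILs percentages and pCR labels (1 = pCR).
stils = [5, 10, 10, 15, 20, 20, 30, 40, 50, 60]
pcr = [0, 0, 0, 0, 0, 1, 1, 0, 1, 1]

cutoff, j = youden_cutoff(stils, pcr)
print(f"best cutoff: sTILs >= {cutoff}%, J = {j:.3f}")
```

This sketch only considers observed values as candidate cutoffs; other implementations may report thresholds that lie between observed values.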

FIGURE 8.


Importance of sTILs. The average AUC of the CatBoost model with a fivefold cross‐validation containing sTILs was 0.83, and the average AUC of the CatBoost model with a fivefold cross‐validation without sTILs was 0.7.
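The fivefold cross‐validation used for this comparison can be sketched as a stratified split in which each fold keeps roughly the cohort's pCR proportion. The block below is one simple way to build such folds for a cohort of 89 pCR and 214 non‐pCR patients. Whether the authors stratified their folds is not stated, so the stratification, the round‐robin assignment, and the seed are assumptions.

```python
import random


def stratified_kfold(labels, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs with each class spread evenly over k folds."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for position, i in enumerate(idx):
            folds[position % k].append(i)  # deal indices round-robin
    for f in range(k):
        test_idx = sorted(folds[f])
        train_idx = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train_idx, test_idx


labels = [1] * 89 + [0] * 214  # cohort composition reported in the study
splits = list(stratified_kfold(labels, k=5))

test_sizes = [len(test) for _, test in splits]
pcr_per_fold = [sum(labels[i] for i in test) for _, test in splits]
print("test fold sizes:", test_sizes)
print("pCR cases per test fold:", pcr_per_fold)
```

A model would be fitted on each training part and scored on the matching test part, once with and once without the sTILs column, and the five AUCs averaged for each variant.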

4. Discussion

Accurately predicting cancer prognosis is critical for guiding individualized treatment strategies in BC [14, 28, 29]. This study focused on BC patients undergoing NAC, using postoperative pCR as the primary outcome. By analyzing clinicopathological characteristics, predictive factors closely associated with the Miller–Payne grading were identified, and ML prediction models with significant clinical application potential were developed. We did not impute missing clinical information because, in our data, missingness affected whole groups of features from ultrasound exams or biopsy markers rather than individual features. As a result, 303 BC patients treated with NAC were included, with a pCR rate of 29.37% (89/303). In recent ML studies, the Synthetic Minority Over‐sampling Technique (SMOTE) has been used to address class imbalance by generating synthetic samples for the minority class [25]. SMOTE is generally advisable when the class ratio exceeds 1:10 or 1:20. Model type also matters: some models are sensitive to imbalance, whereas decision tree models are less affected. Because our study had a class ratio of approximately 1:3 and primarily used tree‐based models, we analyzed the data directly.

Artificial intelligence is now widely applied across research fields, and AI‐related studies have shown promise in enhancing accuracy and efficiency in mammographic screening programs, although limitations remain in clinical practice [30]. In our study, the ML‐based pCR prediction model enables more accurate pCR prediction for BC patients, aiding in optimizing treatment strategies. This further supports the broad prospects of AI in clinical applications.

Unlike previous studies on pCR in BC, this study incorporated sTILs as a key feature [9, 10, 11, 31]. Previous studies have reported that the progression of breast tumors is closely related to the tumor immune microenvironment and tumor resistance [32, 33]. As a critical marker of the tumor microenvironment, the sTILs level has demonstrated a strong correlation with tumor response to NAC in BC [19, 20, 22, 34, 35]. In this study, the percentage of sTILs was quantified by two pathologists using baseline biopsy specimens examined under a microscope. To our knowledge, this study is one of the first to develop and validate six ML algorithms that include sTILs as a feature for predicting pCR status in BC patients undergoing NAC.

The LASSO algorithm was used to select features; by adjusting the regularization coefficient, it helps prevent overfitting while reducing the loss function. Twelve of 22 features were selected automatically by the LASSO regression analysis in the training dataset: baseline age, menopausal status, tumor size, aspect ratio, posterior echo, lymphatic metastasis, RDW, PDW, PR, HER2 status, Ki‐67 expression, and sTILs. These 12 features were used to construct the six ML models.

Among the six models tested, the CatBoost model performed best, outperforming LR, RF, SVM, LightGBM, and XGBoost and achieving an AUC of 0.853 in the testing dataset after hyperparameter tuning with Bayesian optimization [36]. Bayesian optimization is recognized for its efficiency in solving black‐box problems and offers significant advantages over traditional methods such as GridSearch and RandomSearch [37]. We rigorously evaluated the performance of our model, and the results from the testing dataset support its effectiveness. The CatBoost model had the highest accuracy, precision, recall, and F1 score and the lowest Brier score of the six models. Its precision and recall (0.692 and 0.750) were similar, indicating balanced overall performance. Expanding the sample size could improve model performance. The Brier score measures the mean squared difference between predicted probabilities and actual outcomes, with a lower score indicating a more accurate prediction model [38]. DCA further confirmed that the CatBoost model performed best at a threshold probability of approximately 50%. The calibration curve for the CatBoost model lies close to the diagonal line, showing that the model's predicted probabilities align well with the observed proportions. Notably, the AUC in this study exceeded those reported in similar research [9, 10, 31]. Pu et al. established a nomogram‐derived prediction of pCR in breast cancer patients treated with neoadjuvant chemotherapy with an AUC of 0.758 [9]. Dell'Aquila et al. evaluated four ML algorithms in predicting pCR in BC patients with AUCs of 0.743–0.754 [10]. A study from South Korea found that LightGBM had the highest prediction performance among six models, with an AUC of 0.7845 [31].

The SHAP method was employed to enhance the interpretability of the CatBoost model, ensuring both robust model performance and clinical relevance. To show the average importance of each feature, the global SHAP values were plotted as a bar chart. Applying SHAP identified several key features associated with pCR in BC patients undergoing NAC: sTILs emerged as the most important predictor, followed by HER2 status, age, lymphatic metastasis, aspect ratio, Ki‐67 expression, tumor size, RDW, menopausal status, posterior echo, PDW, and PR.

Additionally, the CatBoost model incorporating sTILs achieved an average AUC of 0.83 in fivefold cross‐validation on the whole dataset, whereas the average AUC of the CatBoost model without sTILs decreased to 0.70. This comparison demonstrated that sTILs are significant predictors of pCR status, highlighting their value in ML‐based prognostic models.

Furthermore, interpretable ML revealed key relationships between predictive features and pCR outcomes. Higher sTILs levels and Ki‐67 expression were associated with a greater likelihood of achieving pCR following NAC. Conversely, older age, higher baseline RDW, and larger tumor size were linked to a lower probability of achieving pCR. Additionally, attenuation of the aspect ratio and posterior echo in ultrasound imaging was related to a decreased probability of pCR. HER2 strongly and positively influenced pCR, whereas PR and PDW had a weak impact.

This study contributed to a broader comprehension of the relationship between pCR and clinicopathological features in breast cancer. However, several limitations must be recognized. First, the sample size was relatively small, which may constrain the generalizability of our findings. Second, the retrospective nature of this analysis introduced potential biases that could impact the outcomes. These limitations underscored the need for larger studies to validate and extend our findings. A comprehensive trial is anticipated to both corroborate these results and refine the predictive model for more extensive clinical applications moving forward.

5. Conclusions

The interpretable CatBoost prediction model developed in this study demonstrated good predictive performance and clinical relevance in forecasting pCR following NAC in BC patients. By leveraging the clinicopathological characteristics of newly diagnosed patients, this model provides a valuable tool for personalized treatment planning and for enhancing clinical decision‐making.

Author Contributions

Shan Fang: conceptualization, formal analysis, validation, writing – original draft. Jun Zhang: writing – review and editing. Chengyan Han: formal analysis. Mingxiang Kong: data curation. Haibo Zhang: project administration. Miaochun Zhong: investigation, funding acquisition. Wuzhen Chen: data curation. Hongjun Yuan: investigation. Wenjie Xia: conceptualization, writing – original draft, funding acquisition. Wei Zhang: validation, writing – review and editing.

Ethics Statement

Ethical approval was granted by the Ethics Committee of Zhejiang Provincial People's Hospital (Approval No. 2019KY274).

Conflicts of Interest

The authors declare no conflicts of interest.

Supporting information

Figure S1: The correlation heatmap of all features (p < 0.8).


Fang S., Zhang J., Han C., et al., “Interpretable Machine Learning for Predicting Neoadjuvant Chemotherapy Response in Breast Cancer Using the Baseline Clinical and Pathological Characteristics,” Cancer Medicine (2025): e71221, 10.1002/cam4.71221.

Funding: This work was supported by the Medical and Health Science and Technology Project of Zhejiang Province (No. 2022KY068, 2023KY046, 2025KY574), Project of the Education Department of Zhejiang Province (Y202044656), the Public Welfare Technology Application Research Project of Zhejiang Province (LGD22H070003), and Zhejiang Clinovation Pride (CXTD202501004).

Shan Fang and Jun Zhang contributed equally to this study.

Contributor Information

Wenjie Xia, Email: xiawenjie1031@zju.edu.cn.

Wei Zhang, Email: zhangwei1@hmc.edu.cn.

Data Availability Statement

The data that support the findings of this study are available on request from the corresponding author. The data are not publicly available due to privacy or ethical restrictions.

References

  • 1. Siegel R. L., Giaquinto A. N., and Jemal A., “Cancer Statistics, 2024,” CA: A Cancer Journal for Clinicians 74, no. 1 (2024): 12–49, 10.3322/caac.21820.
  • 2. Spring L. M., Bar Y., and Isakoff S. J., “The Evolving Role of Neoadjuvant Therapy for Operable Breast Cancer,” Journal of the National Comprehensive Cancer Network 20, no. 6 (2022): 723–734, 10.6004/jnccn.2022.7016.
  • 3. Huang M., O'Shaughnessy J., Zhao J., et al., “Association of Pathologic Complete Response With Long‐Term Survival Outcomes in Triple‐Negative Breast Cancer: A Meta‐Analysis,” Cancer Research 80, no. 24 (2020): 5427–5434, 10.1158/0008-5472.CAN-20-1792.
  • 4. Cortazar P., Zhang L., Untch M., et al., “Pathological Complete Response and Long‐Term Clinical Benefit in Breast Cancer: The CTNeoBC Pooled Analysis,” Lancet 384, no. 9938 (2014): 164–172, 10.1016/S0140-6736(13)62422-8.
  • 5. Pusztai L., Foldi J., Dhawan A., DiGiovanna M. P., and Mamounas E. P., “Changing Frameworks in Treatment Sequencing of Triple‐Negative and HER2‐Positive, Early‐Stage Breast Cancers,” Lancet Oncology 20, no. 7 (2019): e390–e396, 10.1016/S1470-2045(19)30158-5.
  • 6. Jang M. K., Park S., Park C., Doorenbos A. Z., Go J., and Kim S., “Body Composition Change During Neoadjuvant Chemotherapy for Breast Cancer,” Frontiers in Oncology 12 (2022): 941496, 10.3389/fonc.2022.941496.
  • 7. Acs B., Hartman J., Sonmez D., Lindman H., Johansson A. L. V., and Fredriksson I., “Real‐World Overall Survival and Characteristics of Patients With ER‐Zero and ER‐Low HER2‐Negative Breast Cancer Treated as Triple‐Negative Breast Cancer: A Swedish Population‐Based Cohort Study,” Lancet Regional Health 40 (2024): 100886, 10.1016/j.lanepe.2024.100886.
  • 8. Bennett A., Shaver N., Vyas N., et al., “Screening for Breast Cancer: A Systematic Review Update to Inform the Canadian Task Force on Preventive Health Care Guideline,” Systematic Reviews 13, no. 1 (2024): 304, 10.1186/s13643-024-02700-3.
  • 9. Pu S., Wang K., Liu Y., et al., “Nomogram‐Derived Prediction of Pathologic Complete Response (pCR) in Breast Cancer Patients Treated With Neoadjuvant Chemotherapy (NCT),” BMC Cancer 20, no. 1 (2020): 1120, 10.1186/s12885-020-07621-7.
  • 10. Dell'Aquila K., Vadlamani A., Maldjian T., et al., “Machine Learning Prediction of Pathological Complete Response and Overall Survival of Breast Cancer Patients in an Underserved Inner‐City Population,” Breast Cancer Research 26, no. 1 (2024): 7, 10.1186/s13058-023-01762-w.
  • 11. Rahadian R. E., Tan H. Q., Ho B. S., et al., “Using Machine Learning Models to Predict Pathologic Complete Response to Neoadjuvant Chemotherapy in Breast Cancer,” JCO Clinical Cancer Informatics 8 (2024): e2400071, 10.1200/CCI.24.00071.
  • 12. Chia J. L. L., He G. S., Ngiam K. Y., Hartman M., Ng Q. X., and Goh S. S. N., “Harnessing Artificial Intelligence to Enhance Global Breast Cancer Care: A Scoping Review of Applications, Outcomes, and Challenges,” Cancers 17, no. 2 (2025): 197, 10.3390/cancers17020197.
  • 13. Tarantino P., Hortobagyi G., Tolaney S. M., and Mittendorf E. A., “Heterogeneity of Residual Disease After Neoadjuvant Systemic Therapy in Breast Cancer: A Review,” JAMA Oncology 10, no. 11 (2024): 1578–1584, 10.1001/jamaoncol.2024.3679.
  • 14. Rayson V. C., Harris M. A., Savas P., et al., “The Anti‐Cancer Immune Response in Breast Cancer: Current and Emerging Biomarkers and Treatments,” Trends Cancer 10, no. 6 (2024): 490–506, 10.1016/j.trecan.2024.02.008.
  • 15. Xu H., Jia Z., Liu F., et al., “Biomarkers and Experimental Models for Cancer Immunology Investigation,” MedComm (London) (2020) 4, no. 6 (2023): e437, 10.1002/mco2.437.
  • 16. de Melo Gagliato D., Cortes J., Curigliano G., et al., “Tumor‐Infiltrating Lymphocytes in Breast Cancer and Implications for Clinical Practice,” Biochimica et Biophysica Acta 1868, no. 2 (2017): 527–537, 10.1016/j.bbcan.2017.10.003.
  • 17. Denkert C., von Minckwitz G., Darb‐Esfahani S., et al., “Tumour‐Infiltrating Lymphocytes and Prognosis in Different Subtypes of Breast Cancer: A Pooled Analysis of 3771 Patients Treated With Neoadjuvant Therapy,” Lancet Oncology 19, no. 1 (2018): 40–50, 10.1016/S1470-2045(17)30904-X.
  • 18. Kos Z., Roblin E., Kim R. S., et al., “Pitfalls in Assessing Stromal Tumor Infiltrating Lymphocytes (sTILs) in Breast Cancer,” npj Breast Cancer 6 (2020): 17, 10.1038/s41523-020-0156-0.
  • 19. Quintana A., Saini K. S., Vidal L., et al., “Window of Opportunity Trials With Immune Checkpoint Inhibitors in Triple‐Negative Breast Cancer,” ESMO Open 9, no. 10 (2024): 103713, 10.1016/j.esmoop.2024.103713.
  • 20. Pasetto C. V., Aguiar F. N., Peixoto M. B., et al., “Evaluation of Tumor Infiltrating Lymphocytes as a Prognostic Biomarker in Patients With Ductal Carcinoma In Situ of the Breast,” Breast Cancer Research and Treatment 208, no. 1 (2024): 9–18, 10.1007/s10549-024-07466-9.
  • 21. Tay J. Y., Ho J. X., Cheo F. F., and Iqbal J., “The Tumour Microenvironment and Epigenetic Regulation in BRCA1 Pathogenic Variant‐Associated Breast Cancers,” Cancers 16, no. 23 (2024): 3910, 10.3390/cancers16233910.
  • 22. Fang S., Xia W., Zhang H., et al., “A Real‐World Clinicopathological Model for Predicting Pathological Complete Response to Neoadjuvant Chemotherapy in Breast Cancer,” Frontiers in Oncology 14 (2024): 1323226, 10.3389/fonc.2024.1323226.
  • 23. “TRIPOD+AI Statement: Updated Guidance for Reporting Clinical Prediction Models That Use Regression or Machine Learning Methods,” BMJ (Clinical Research Ed.) 385 (2024): q902, 10.1136/bmj.q902.
  • 24. Ogston K. N., Miller I. D., Payne S., et al., “A New Histological Grading System to Assess Response of Breast Cancers to Primary Chemotherapy: Prognostic Significance and Survival,” Breast 12, no. 5 (2003): 320–327, 10.1016/s0960-9776(03)00106-1.
  • 25. Liu X., Xie Z., Zhang Y., et al., “Machine Learning for Predicting In‐Hospital Mortality in Elderly Patients With Heart Failure Combined With Hypertension: A Multicenter Retrospective Study,” Cardiovascular Diabetology 23, no. 1 (2024): 407, 10.1186/s12933-024-02503-9.
  • 26. Ghahramani Z., “Probabilistic Machine Learning and Artificial Intelligence,” Nature 521, no. 7553 (2015): 452–459, 10.1038/nature14541.
  • 27. Ali S., Akhlaq F., Imran A. S., Kastrati Z., Daudpota S. M., and Moosa M., “The Enlightening Role of Explainable Artificial Intelligence in Medical & Healthcare Domains: A Systematic Literature Review,” Computers in Biology and Medicine 166 (2023): 107555, 10.1016/j.compbiomed.2023.107555.
  • 28. Zhang J., Wu J., Zhou X. S., Shi F., and Shen D., “Recent Advancements in Artificial Intelligence for Breast Cancer: Image Augmentation, Segmentation, Diagnosis, and Prognosis Approaches,” Seminars in Cancer Biology 96 (2023): 11–25, 10.1016/j.semcancer.2023.09.001.
  • 29. van't Erve I., Alipanahi B., Lumbard K., et al., “Cancer Treatment Monitoring Using Cell‐Free DNA Fragmentomes,” Nature Communications 15, no. 1 (2024): 8801, 10.1038/s41467-024-53017-7.
  • 30. Goh S., Goh R. S. J., Chong B., et al., “Challenges in Implementing Artificial Intelligence in Breast Cancer Screening Programs: Systematic Review and Framework for Safe Adoption,” Journal of Medical Internet Research 27 (2025): e62941, 10.2196/62941.
  • 31. Kim J. Y., Jeon E., Kwon S., et al., “Prediction of Pathologic Complete Response to Neoadjuvant Chemotherapy Using Machine Learning Models in Patients With Breast Cancer,” Breast Cancer Research and Treatment 189, no. 3 (2021): 747–757, 10.1007/s10549-021-06310-8.
  • 32. Chen W., Jiang M., Zou X., et al., “Fibroblast Activation Protein (FAP)(+) Cancer‐Associated Fibroblasts Induce Macrophage M2‐Like Polarization via the Fibronectin 1‐Integrin alpha5beta1 Axis in Breast Cancer,” Oncogene 44, no. 28 (2025): 2396–2412, 10.1038/s41388-025-03359-3.
  • 33. Xia W., Chen W., Ni C., et al., “Chemotherapy‐Induced Exosomal circBACH1 Promotes Breast Cancer Resistance and Stemness via miR‐217/G3BP2 Signaling Pathway,” Breast Cancer Research 25, no. 1 (2023): 85, 10.1186/s13058-023-01672-x.
  • 34. Zhong M., Ren X., Xia W., Qian Y., Sun K., and Wu J., “The Role of Adjuvant Endocrine Treatment in ER+, PR‐, HER2‐ Early Breast Cancer: A Retrospective Study of Real‐World Data,” Scientific Reports 14, no. 1 (2024): 26377, 10.1038/s41598-024-78341-2.
  • 35. Wu W., Yang Y., Yang W., et al., “Tailoring Escalation Adjuvant Therapy for Early‐Stage Triple‐Negative Breast Cancer in the CBCSG010 Clinical Trial Biomarker Analysis,” Journal of the National Comprehensive Cancer Network 22, no. 8 (2024): 528–536, 10.6004/jnccn.2024.7032.
  • 36. Hancock J. T. and Khoshgoftaar T. M., “CatBoost for Big Data: An Interdisciplinary Review,” Journal of Big Data 7, no. 1 (2020): 94, 10.1186/s40537-020-00369-8.
  • 37. Guo Z., Ong Y. S., He T., and Liu H., “Co‐Learning Bayesian Optimization,” IEEE Transactions on Cybernetics 52, no. 9 (2022): 9820–9833, 10.1109/TCYB.2022.3168551.
  • 38. Angraal S., Mortazavi B. J., Gupta A., et al., “Machine Learning Prediction of Mortality and Hospitalization in Heart Failure With Preserved Ejection Fraction,” JACC Heart Failure 8, no. 1 (2020): 12–21, 10.1016/j.jchf.2019.06.013.


Articles from Cancer Medicine are provided here courtesy of Wiley
