Scientific Reports. 2025 Oct 30;15:38079. doi: 10.1038/s41598-025-21932-4

Ensemble model for neoadjuvant chemotherapy response prediction and treatment sensitivity in TNBC based on DNA replication stress signatures

Meishun Hu 1, Qifan Luo 1, Jun Li 1, Yanyan Chen 1, Mengting Chen 1, Zhuowan Tian 1, Lei Wei 2, Fangfang Chen 1, Jingwei Zhang 1
PMCID: PMC12575693  PMID: 41168278

Abstract

Triple-negative breast cancer (TNBC) is a highly aggressive subtype of breast cancer. Although neoadjuvant chemotherapy (NACT) has some effectiveness in TNBC, a portion of patients still do not benefit from it. The critical role of DNA replication stress (DRS) in cancer therapy has been recognized, but its study in TNBC NACT remains relatively limited. Affymetrix microarray data were obtained from the GEO database for both training and test sets and processed using the “affy” R package. The Boruta algorithm and SVM-RFE method were employed for key gene selection, and an integrated model based on multiple algorithms was developed to establish a risk score. Additionally, the tumor microenvironment (TME) was analyzed, and the correlation between risk score and drug sensitivity was explored using several drug databases. Through the analysis of TNBC patients’ responses to NACT, we found a close correlation between DRS and TNBC treatment responses and identified eight key genes. The developed ensemble model (ENS) achieved AUC values of 0.922, 0.886, and 0.858 in GSE25066, GSE20194, and GSE20271, respectively, indicating a strong ability to predict the effectiveness of NACT. The study also revealed that patients with higher risk scores are more prone to recurrence and metastasis and have a rich TME composition. Additionally, drug sensitivity analysis offered potentially effective personalized treatment options for high-risk TNBC. This study successfully constructed an ensemble model to predict TNBC patients’ response to NACT, and the risk score proved valuable for analyzing the correlation between TNBC patients’ TME and drug sensitivity. These findings offer important new insights into personalized treatment strategies for TNBC.

Supplementary Information

The online version contains supplementary material available at 10.1038/s41598-025-21932-4.

Keywords: TNBC, Machine learning, DNA replication stress, Neoadjuvant chemotherapy

Subject terms: Breast cancer

Introduction

Breast cancer is the most common tumor in women, with its incidence ranking highest and its mortality rate among the top for malignant tumors in women worldwide1. TNBC, as a distinct subtype of breast cancer, accounts for approximately 15–20% of all breast cancer cases2. Compared to the other breast cancer subtypes, TNBC is more invasive, has a poorer prognosis, and lacks targeted treatment methods3,4. This is because TNBC cells do not express Estrogen Receptor (ER), Progesterone Receptor (PR), and Human Epidermal Growth Factor Receptor 2 (HER2), rendering conventional hormone and HER2-targeted treatments ineffective5. Hence, the development of new therapeutic strategies has become a focal point in clinical research.

NACT, initially applied only to locally advanced breast cancer, is now widely used in the treatment of early-stage breast cancer. Especially in TNBC, NACT has demonstrated significant therapeutic effects, effectively reducing the risks of recurrence and metastasis6. However, due to the lack of clear molecular therapeutic targets in TNBC, NACT still primarily employs traditional chemotherapy methods, such as the combination of anthracyclines and taxanes7,8. Although about 40% of TNBC patients achieve a pathologic complete response (pCR), a substantial number remain unresponsive to this treatment9, potentially missing the optimal window for surgical intervention. Therefore, accurately identifying high-risk patients and formulating individualized treatment plans to avoid the adverse effects of ineffective therapies and improve patient outcomes has become a critical research focus for achieving the goals of stratified treatment and precision medicine.

DNA replication is a central process in cellular life and has garnered increasing attention in the field of oncology in recent years. The stress and obstacles potentially encountered during this process, collectively termed DRS, are known to be associated with DNA damage repair, genomic instability, and the origin and progression of tumors10,11. Accumulating research data indicates that DRS plays a pivotal role in the treatment of various cancers, especially in TNBC12,13. For instance, the combined use of RNA polymerase I inhibitors and p53 inhibitors can enhance DNA damage and replication stress, effectively suppressing the development of TNBC14. Moreover, studies have shown that the absence of DDX11 can induce DRS and heighten the sensitivity of various cancer cells to DNA damaging agents15. Given these findings, it can be hypothesized that different levels of DRS might lead to diverse responses to NACT, offering a new insight for the deep understanding and exploration of TNBC treatment methods. Therefore, it is essential to delve deeply into the levels of DRS in TNBC, aiding in devising more precise treatment strategies for patients and further improving the prognosis.

Given the aforementioned background, this study aims to explore the potential predictive value and treatment relevance of DRS in the response to NACT for TNBC, with the goal of providing a new scientific basis for improving treatment outcomes and expanding precision treatment strategies. We believe that this research will help enhance the therapeutic outcomes for TNBC patients, thereby improving their survival rates and quality of life.

Materials and methods

Data collection and processing

The training and test data consisted of three raw Affymetrix microarray gene expression datasets downloaded from the Gene Expression Omnibus (GEO) database. The training data were derived from GSE25066 (ref. 16), encompassing 170 biopsy samples taken prior to NACT (regimen: taxane + anthracycline) from TNBC patients, defined as ER-negative, PR-negative, and HER2-negative. The test data were sourced from GSE20194 (ref. 17), containing 71 biopsy samples taken before NACT (regimen: paclitaxel, 5-fluorouracil, cyclophosphamide, and doxorubicin) from TNBC patients defined in the same way, and from GSE20271 (ref. 18), comprising 59 biopsy samples taken prior to NACT (regimen: paclitaxel, 5-fluorouracil, doxorubicin, and cyclophosphamide) from TNBC patients, again defined as ER-negative, PR-negative, and HER2-negative. All microarray gene expression data were processed using the robust multi-array average (RMA) algorithm for background correction and normalization to obtain the final expression intensities, as provided by the “affy” R package (v1.78.2) (ref. 19). After splitting the datasets into training and test sets, we performed z-score normalization on each set. Specifically, the mean and standard deviation of the training set were used to normalize the test sets as well. This kept the data on a consistent scale during the model training and testing phases, thereby improving the robustness and reliability of the model.
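The normalization step described above can be illustrated with a minimal Python sketch. It assumes one gene's expression values per list; the function names `fit_zscore` and `apply_zscore` and the toy values are hypothetical and are not taken from the study's R code.

```python
from statistics import mean, stdev

def fit_zscore(train_values):
    """Estimate the mean and sample standard deviation from the training set only."""
    return mean(train_values), stdev(train_values)

def apply_zscore(values, mu, sd):
    """Scale any set (training or test) with the training-set parameters."""
    return [(v - mu) / sd for v in values]

# Toy expression values for a single gene (illustrative, not study data).
train = [5.0, 7.0, 9.0]
test = [7.0, 11.0]

mu, sd = fit_zscore(train)
train_z = apply_zscore(train, mu, sd)
test_z = apply_zscore(test, mu, sd)   # test set reuses the training mean and sd
```

Reusing the training parameters keeps information from the test sets out of the preprocessing, which is the consistency the paragraph refers to.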

Feature selection

Twenty-one DRS-related signatures were obtained from previous studies20,21, encompassing a total of 982 genes (Table S1). To ensure the applicability of these signatures within our dataset, we first compared these 982 genes with the genes present in our training and test sets. Genes that did not appear in either the training or test sets were excluded. This filtering left 832 genes expressed in our dataset, which served as the basis for the DRS-related signatures used in our analysis. In GSE25066, we applied the Boruta algorithm22 to identify important genes associated with the response to NACT. We further screened these important genes using the SVM-RFE method in the “caret” R package (v6.0-94)23, with 5-fold cross-validation and 1000 rounds of resampling with replacement on GSE25066, to identify key genes related to the response to TNBC NACT.
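The resampling step can be sketched as follows, assuming a generic selector. This Python sketch does not implement SVM-RFE: `toy_selector` is a hypothetical stand-in that keeps genes whose class means differ by more than 1.0, and the sample data are invented for illustration. The 0.8 frequency threshold matches the one reported in the Results.

```python
import random
from collections import Counter

def selection_frequency(samples, select_fn, n_resamples=1000, seed=0):
    """Fraction of bootstrap resamples in which each gene is selected."""
    rng = random.Random(seed)
    counts = Counter()
    n = len(samples)
    for _ in range(n_resamples):
        # Draw n samples with replacement.
        boot = [samples[rng.randrange(n)] for _ in range(n)]
        counts.update(set(select_fn(boot)))
    return {gene: c / n_resamples for gene, c in counts.items()}

def toy_selector(boot):
    """Hypothetical stand-in for SVM-RFE: keep genes whose class means differ by > 1.0."""
    pos = [expr for label, expr in boot if label == 1]
    neg = [expr for label, expr in boot if label == 0]
    if not pos or not neg:          # a resample may contain a single class
        return []
    selected = []
    for gene in pos[0]:
        diff = sum(e[gene] for e in pos) / len(pos) - sum(e[gene] for e in neg) / len(neg)
        if abs(diff) > 1.0:
            selected.append(gene)
    return selected

# Toy data: (label, expression dict). G1 separates the classes, G2 does not.
samples = [(1, {"G1": 2.0, "G2": 1.0})] * 4 + [(0, {"G1": 0.0, "G2": 1.0})] * 4

freq = selection_frequency(samples, toy_selector)
kept = sorted(g for g, f in freq.items() if f > 0.8)   # genes passing the 0.8 threshold
```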

Development of the ensemble model

The “caret” R package (v6.0-94) was utilized to develop a stacked ENS. Six machine learning algorithms were selected, including Neural Networks (NNET)24, Random Forest (RF, implemented via Ranger)25, XGBoost (XGB)26, Naive Bayes (BN)27, Gradient Boosting Machine (GBM)28, and Support Vector Machine (SVM)29 as base models. Each model was trained on a 70% random split of the GSE25066 dataset, with 10-fold cross-validation applied to enhance robustness and reduce overfitting. Hyperparameter tuning was performed using grid search (Table S2). To integrate base model predictions, a two-layer stacking framework was employed, where each base model independently generated probability scores, which were then used as input features for a generalized linear model (GLM)30 meta-model. GLM was chosen due to its simplicity and effectiveness in minimizing overfitting, ensuring stable performance across datasets. A total of 63 model combinations were evaluated, and the optimal ENS was selected based on the highest average AUC across three independent datasets (GSE25066, GSE20194, and GSE20271), ultimately consisting of NNET, RF, BN, and GBM as base models, with predictions aggregated through GLM-based stacking. The ENS’s predicted probabilities were used as risk scores. To assess the relative importance of each feature in the ENS, we performed Min-Max normalization on the feature importance in all base models. The detailed process of obtaining feature importance is described in the supplementary material. Subsequently, we calculated the sum of the products of the absolute values of each base model coefficient and its corresponding feature importance. This was used to help explain the contribution of each feature in the predictions of the ENS.
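The feature-importance aggregation described above can be written as I_j = Σ_m |β_m| · Ĩ_{m,j}, where β_m is the meta-model coefficient of base model m and Ĩ_{m,j} is the Min-Max normalized importance of feature j in that model. A minimal Python sketch of this reading follows; the model names, gene names, and numbers are hypothetical placeholders, and the study's own procedure is the one in its supplementary material.

```python
def min_max(values):
    """Min-Max normalize a {feature: importance} dict to the range [0, 1]."""
    lo, hi = min(values.values()), max(values.values())
    if hi == lo:                       # avoid division by zero for constant importances
        return {k: 0.0 for k in values}
    return {k: (v - lo) / (hi - lo) for k, v in values.items()}

def ensemble_importance(base_importance, meta_coef):
    """Sum over base models of |meta coefficient| x normalized feature importance."""
    total = {}
    for model, importance in base_importance.items():
        weight = abs(meta_coef[model])
        for gene, value in min_max(importance).items():
            total[gene] = total.get(gene, 0.0) + weight * value
    return total

# Hypothetical inputs (not values from the study).
base_importance = {
    "model_1": {"gene_a": 10, "gene_b": 30, "gene_c": 20},
    "model_2": {"gene_a": 2, "gene_b": 1, "gene_c": 5},
}
meta_coef = {"model_1": 2.0, "model_2": -0.5}

scores = ensemble_importance(base_importance, meta_coef)
```

Taking the absolute value of each coefficient means a base model with a negative meta-model weight still contributes positively to the importance total, which is how the text describes the calculation.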

Risk scoring and risk stratification

In this study, we predict the potential occurrence of residual disease (RD) in TNBC patients after receiving NACT based on the ENS. The probability values generated by this prediction process, namely the likelihood of each patient not fully responding to NACT, are directly defined as risk scores. The core purpose of this risk scoring is to quantify patients’ sensitivity to NACT, providing a pre-quantitative indicator for patients’ treatment response. To effectively stratify patients based on risk scores, we employ ROC curve analysis to determine an optimal cutoff point. Patients with risk scores higher than or equal to this cutoff point are classified into a high-risk group, indicating they may be insensitive to NACT; whereas patients with risk scores below the cutoff point are placed into a low-risk group, suggesting they may have a better expected response to NACT. This grouping strategy aims to reveal patients’ potential reactions to NACT, providing a scientific basis for subsequent personalized treatment choices.
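The text states only that ROC analysis was used to choose the cutoff. One common choice, assumed here for illustration, is the threshold that maximizes Youden's J (sensitivity + specificity − 1). The Python sketch below uses invented scores and labels, with 1 denoting residual disease and 0 denoting pCR.

```python
def youden_cutoff(scores, labels):
    """Return (cutoff, J) maximizing sensitivity + specificity - 1.

    Assumption: samples with score >= cutoff are called high risk;
    labels use 1 for residual disease (RD) and 0 for pCR.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    best_cut, best_j = None, -1.0
    for cut in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= cut and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= cut and y == 0)
        j = tp / pos - fp / neg        # TPR - FPR equals sensitivity + specificity - 1
        if j > best_j:
            best_cut, best_j = cut, j
    return best_cut, best_j

# Toy risk scores and outcomes (illustrative only).
scores = [0.1, 0.2, 0.35, 0.5, 0.6, 0.8, 0.9, 0.95]
labels = [0, 0, 0, 1, 1, 1, 0, 1]

cutoff, j_stat = youden_cutoff(scores, labels)
groups = ["high" if s >= cutoff else "low" for s in scores]
```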

Model evaluation

A comprehensive evaluation of the ENS was conducted using Decision Curve Analysis (DCA), calibration curves, and the Precision-Recall (PR) curve. DCA analyzed the net benefit at different probability thresholds, the calibration curve assessed the agreement between predicted probabilities and observed outcomes, and the area under the PR curve (PRAUC) summarized the model’s precision and recall. Additionally, to benchmark the ENS, we refitted and evaluated five published TNBC chemotherapy response prediction scores and models: the three-gene score31, four-gene score32, six-gene score33, sixteen-gene model34, and eighty-six-gene model35. Detailed procedures for obtaining these scores and constructing the models are provided in the supplementary materials. Notably, two different versions of the sixteen-gene model were included in the comparison. The performance of all these models was compared using the AUC metric.
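For readers unfamiliar with DCA, the net benefit at a threshold probability p_t is conventionally defined as TP/n − (FP/n) · p_t/(1 − p_t). The Python sketch below applies this standard definition to invented data; it is not the study's implementation.

```python
def net_benefit(probs, labels, threshold):
    """Standard DCA net benefit when patients with prob >= threshold are flagged."""
    n = len(labels)
    tp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 1)
    fp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1 - threshold)

def net_benefit_treat_all(labels, threshold):
    """Reference strategy 'all': every patient is flagged regardless of the model."""
    prevalence = sum(labels) / len(labels)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)

# Toy predicted probabilities and outcomes (illustrative only).
probs = [0.9, 0.8, 0.3, 0.6, 0.2]
labels = [1, 1, 0, 0, 0]

nb_model = net_benefit(probs, labels, 0.5)
nb_all = net_benefit_treat_all(labels, 0.5)
nb_none = 0.0   # reference strategy 'none' always has zero net benefit
```

A model is considered useful at a given threshold when its net benefit exceeds both reference strategies.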

Differential gene expression and functional enrichment analysis

Differential expression genes (DEGs) between different risk groups were analyzed using the “limma” package (v 3.58.0), with the threshold set at FDR less than 0.05 and |log2FC| greater than 0.5. Subsequently, we performed Gene Ontology (GO)36 and Kyoto Encyclopedia of Genes and Genomes (KEGG)37,38 pathway enrichment analyses on these DEGs using the “clusterProfiler” package (v 4.8.3)39 to reveal the roles of these genes in biological functions and metabolic signaling pathways. Additionally, we employed Gene Set Variation Analysis (GSVA, v1.50.0)—an unsupervised method—to investigate the enrichment differences in tumor hallmark pathways among different risk groups. This method allows for the assessment of the activity of specific gene sets in individual samples40.
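The DEG thresholds stated above can be expressed as a simple filter. This Python sketch assumes that log2FC is defined as high-risk versus low-risk, so positive values mean higher expression in the high-risk group; the function name and labels are hypothetical.

```python
def classify_deg(log2fc, fdr, fc_cut=0.5, fdr_cut=0.05):
    """Apply the thresholds FDR < 0.05 and |log2FC| > 0.5.

    Assumption: log2fc is high-risk minus low-risk on the log2 scale.
    """
    if fdr < fdr_cut and abs(log2fc) > fc_cut:
        return "up_in_high_risk" if log2fc > 0 else "up_in_low_risk"
    return "not_significant"

# Toy inputs (illustrative only).
calls = [
    classify_deg(0.8, 0.01),    # passes both thresholds, positive fold change
    classify_deg(-0.9, 0.001),  # passes both thresholds, negative fold change
    classify_deg(0.3, 0.001),   # fold change too small
    classify_deg(1.2, 0.20),    # FDR too high
]
```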

Tumor microenvironment analysis

To explore the relationship between the tumor microenvironment and risk scores, we employed the “immunedeconv” R package (v2.1.0) and applied six algorithms—XCELL41, TIMER42, QUANTISEQ43, MCPCOUNTER44, EPIC45, and CIBERSORT46—to assess the infiltration of immune cells. Through Pearson correlation analysis, we calculated the association between risk scores and these immune cell infiltration indices. In addition, we utilized the “estimate” R package (v1.0.13)47 to obtain immune scores, stromal scores, and tumor purity within the TME, enabling a more comprehensive understanding of the tumor’s immune context.

Drug sensitivity analysis

The Cancer Drug Sensitivity Genomics (GDSC, https://www.cancerrxgene.org/) database was utilized to obtain data on drug sensitivity. The “oncoPredict” package (v1.2)48 was used to calculate the IC50 values of drugs in samples, a key metric for measuring drug inhibitory effects. To determine the correlation between drug sensitivity and risk scores, we employed Pearson correlation analysis. For further screening of small molecule drugs sensitive to the high-risk TNBC, we also used the “oncoPredict” package (v1.2) to analyze 481 drugs in the Cancer Therapeutics Response Portal (CTRP, https://portals.broadinstitute.org/ctrp.v2.1/) database, calculating their AUC values. Additionally, we leveraged the Connectivity Map (Cmap, https://clue.io/query) database, submitting the top 150 upregulated and downregulated DEGs from high-risk and low-risk groups to obtain drug sensitivity scores.
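The correlation screen between predicted drug response and risk score can be sketched in Python as follows. The drug names and values are invented. The |r| > 0.2 threshold is the one reported in the Results for the GDSC analysis, and the accompanying P-value filter is omitted here because it requires a t-distribution that the Python standard library does not provide.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Toy risk scores and predicted IC50 values per drug (illustrative only).
risk = [0.1, 0.2, 0.3, 0.4]
predicted_ic50 = {
    "drug_a": [4.0, 3.0, 2.0, 1.0],    # IC50 falls as risk rises
    "drug_b": [1.0, 2.0, 3.0, 4.0],    # IC50 rises as risk rises
    "drug_c": [1.0, -1.0, -1.0, 1.0],  # no linear relationship
}

r_values = {drug: pearson_r(risk, ic50) for drug, ic50 in predicted_ic50.items()}
# Keep drugs whose correlation magnitude exceeds 0.2.
hits = sorted(d for d, r in r_values.items() if abs(r) > 0.2)
```

A negative correlation between IC50 and risk score suggests greater sensitivity in higher-risk samples, whereas a positive correlation suggests lower sensitivity.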

Statistical analysis

In this research, we performed the majority of the data analysis using the R programming language, version 4.3.1. To create graphical illustrations, we utilized the “ggplot2” R package (v 3.5.1)49. We employed the Wilcoxon test to gauge the statistical significance of the quantitative data when analyzing more than two groups. The Kaplan-Meier survival analysis was conducted, with the log-rank method applied to test for statistical significance. In this context, a two-sided P-value below 0.05 or an FDR below 0.05 was considered statistically significant. To control the FDR when conducting multiple hypothesis testing, we applied the Benjamini-Hochberg method.
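The Benjamini-Hochberg adjustment mentioned above can be illustrated with a short Python sketch. It follows the standard step-up procedure; the input P-values are invented for demonstration.

```python
def benjamini_hochberg(pvals):
    """Return BH-adjusted P-values (FDR) in the original input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])   # indices sorted by ascending P
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity of adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Toy P-values (illustrative only).
pvals = [0.01, 0.04, 0.03, 0.20]
fdr = benjamini_hochberg(pvals)
significant = [q < 0.05 for q in fdr]
```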

Results

DNA replication stress in TNBC NACT

To explore the potential biological mechanisms underlying the response to NACT in TNBC, we first performed Gene Set Variation Analysis (GSVA) to assess functional differences between the pCR and RD groups. The results revealed significant differences in Mismatch Repair (MMR) and DNA replication repair, both enriched in the pCR group (Fig. 1A). Since these pathways are closely associated with genomic instability and replication stress, we hypothesized that DRS might be a key factor influencing NACT response in TNBC. To further investigate this, we applied single-sample Gene Set Enrichment Analysis (ssGSEA) to quantify DRS levels in GSE25066 samples. A complex clustering heatmap demonstrated that DRS-related signatures were significantly enriched in samples from the pCR group (Fig. 1B), reinforcing their association with a favorable response to NACT. To identify DRS-linked genes related to NACT response, we screened 832 genes from 21 DRS-related signatures in GSE25066. Screening with the Boruta algorithm identified 13 important genes significantly associated with the response to NACT (Fig. 1C). Next, we performed 1000 rounds of resampling with replacement on these 13 important genes and further screened them using the SVM-RFE method. Eight genes had an occurrence frequency exceeding 0.8 (Fig. 1D): CCND1, DCTN3, FEN1, GMNN, H2BC9, ILF2, PMF1, and SKA1. The expression levels of these 8 key genes are significantly correlated (Fig. 1E), and notably, CCND1 is negatively correlated with the other 7 genes. To further illustrate their expression patterns, we generated a heatmap (Fig. S1), which clearly shows distinct clustering of these genes between pCR and RD patients.
GO functional enrichment analysis showed that these genes are predominantly involved in cellular component functions related to cell cycle DNA replication, chromosomal regions, centromeres, and chromosome condensation, as well as molecular functions such as histone deacetylation, transcriptional co-repressor activity, and endoribonuclease activity (Fig. 1F). This suggests these genes play crucial roles in cell cycle control, chromosome maintenance, and gene expression regulation. Additionally, the interacting proteins of the key genes were retrieved from the STRING database (https://string-db.org) and used to construct a protein-protein interaction (PPI) network. PPI network analysis further revealed that these key genes and their interacting partners are clustered into functional modules, including DNA replication and repair, chromatin remodeling, microtubule dynamics, chromosome segregation, and RNA processing (Fig. S2). These biological functions suggest that disruptions in these pathways can induce DRS, a state characterized by stalled replication forks and genomic instability, which can sensitize tumor cells to DNA-damaging chemotherapeutic agents or, conversely, promote resistance by activating DNA repair pathways. In summary, our study demonstrated a strong link between DRS and the response to TNBC NACT, identifying eight key genes associated with the response to TNBC NACT.

Fig. 1.


Association of DRS with response to TNBC NACT and identification of key genes. (A) Significant difference in MMR and DNA replication repair between the pCR and RD groups. (B) Enrichment of DRS-related signatures in the pCR group as assessed by ssGSEA in the GSE25066 samples. (C) Boruta algorithm identified 13 important genes related to the response to NACT. Yellow indicates confirmed features, while other colors represent shadow attributes. (D) Radar chart displaying the occurrence frequency of the 13 important genes during the SVM-RFE process with 1000 rounds of resampling with replacement. (E) Expression correlation among the 8 key genes. (F) GO functional enrichment of key genes.

Development and evaluation of ENS

Six machine learning models known for their robustness (NNET, RF, XGB, BN, GBM, and SVM) were selected to develop a robust prediction model. Each model was trained on the training set, created as a 70% split of the GSE25066 dataset. Subsequently, these models were combined in a stacking approach, resulting in a total of 63 different model combinations (Fig. 2A). Based on the average AUC performance across the three datasets (GSE25066, GSE20194, and GSE20271), NNET, RF, BN, and GBM were selected as the base models for final stacking, leading to the construction of the final ensemble model, ENS. The ENS achieved AUCs of 0.922 (95% CI: 0.882–0.963) in GSE25066, 0.886 (95% CI: 0.806–0.966) in GSE20194, and 0.858 (95% CI: 0.764–0.952) in GSE20271, confirming its robust predictive performance. Hyperparameter sensitivity analysis further showed modest AUC variation for three of the four base models (RF: 0.04, GBM: 0.08, BN: 0.03), while NNET exhibited greater sensitivity (0.12), indicating the need for careful tuning (Table S3).
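The count of 63 corresponds to all non-empty subsets of the six base models (2^6 − 1). A minimal Python sketch of this enumeration, together with the mean-AUC criterion applied to the AUCs reported above for the ENS, is given below; the variable names are illustrative and the actual stacking was done in R with “caret”.

```python
from itertools import combinations

base_models = ["NNET", "RF", "XGB", "BN", "GBM", "SVM"]

# All non-empty subsets of the six base models: 2**6 - 1 = 63 candidates.
combos = [c for k in range(1, len(base_models) + 1)
          for c in combinations(base_models, k)]

def mean_auc(aucs):
    """Selection criterion: average AUC across the evaluated datasets."""
    return sum(aucs) / len(aucs)

# AUCs reported in the text for the selected ENS (GSE25066, GSE20194, GSE20271).
ens_mean_auc = mean_auc([0.922, 0.886, 0.858])
```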

Fig. 2.


Comparison of multiple model performances and comprehensive evaluation of the ENS. (A) The left side of the figure shows the AUC performance of various combinations of six base models (NNET, RF, XGB, BN, GBM, SVM) across three datasets (GSE25066, GSE20194, and GSE20271), while the right side displays the mean AUC for each combination. The color scale represents the AUC values for the individual datasets, with red indicating higher AUCs and blue indicating lower AUCs. (B–D) DCA curves demonstrating the clinical utility of the model. The x-axis represents the threshold probability of patients. The y-axis represents the net benefit of the model, with ‘none’ indicating that no patients receive treatment, and ‘all’ indicating that all patients undergo treatment. (E) Calibration curves showing the closeness of the ENS predictions to the actual observed results across the GSE25066, GSE20194, and GSE20271.

To ensure the stability of the models and prevent overfitting or underfitting, 10-fold cross-validation was employed during the modeling process. The consistent AUC scores across all folds confirmed the robustness of the models (Fig. S3A). In addition, we compared 10-fold CV with leave-one-out cross-validation (LOOCV), which yielded highly similar AUC values across datasets (Fig. S3B), further supporting the stability of the modeling results. A sensitivity analysis was also conducted by evaluating the model’s performance at different data split percentages (50%, 60%, 70%, and 80%). The model maintained consistent performance across all splits, indicating that it generalizes well across varying training set sizes (Fig. S3C). The ENS achieved consistently high AUC scores across all datasets. This was further supported by the DCA results, which showed a net clinical benefit of the ENS over threshold probability ranges of 0.01–0.81 in the GSE25066 dataset, 0.01–0.81 in the GSE20194 dataset, and 0.01–0.55 in the GSE20271 dataset (Fig. 2B-D), suggesting the potential importance of the ENS in guiding clinical decision-making during NACT. The calibration curve shows that the predictions of the ENS are very close to the actual observed outcomes (Fig. 2E), indicating that the predicted probabilities are well calibrated and can be reliably interpreted in clinical practice. The PR curve further characterizes the predictive performance of the ENS across the three datasets (Fig. S3D). Given the class imbalance between pCR and RD, the PR curve provides complementary information to ROC analysis, with PRAUC values of 0.867 for GSE25066, 0.801 for GSE20194, and 0.487 for GSE20271. These results support the model’s ability to predict outcomes in the context of imbalanced data, although the PRAUC was markedly lower in GSE20271.
Similarly, the prediction probability distribution (Fig. S3E) reveals consistent patterns across the datasets, suggesting the ENS model’s strong generalization capability. The learning curve (Fig. S3F) shows that as the training sample size increases, the log loss for both the training and validation sets gradually decreases and stabilizes at a low level, further supporting the model’s accuracy and reliability. To evaluate the model’s robustness to missing data, missing values were simulated and then imputed using median and KNN methods. The resulting AUCs differed from the original by < 0.03 across all datasets (Fig. S3G), indicating minimal impact on model performance. Finally, we extracted the weights of the base models and the feature importance, providing interpretability for the ENS (Fig. S3H, I). SHAP analysis further illustrated the contribution and direction of each key gene to the predictions (Fig. S3J). In summary, by combining multiple machine learning models, we developed a well-performing ENS. This model demonstrated excellent predictive performance and discriminative ability across multiple datasets and showed potential for practical clinical decision-making, providing a valuable reference for future NACT.

Further evaluation and performance comparison of the ENS

Following a thorough evaluation, the ENS was found to demonstrate significant potential value in clinical applications. Univariate and multivariate logistic regression analyses confirmed that the risk score possesses independent predictive ability for the response to NACT (Fig. 3A, B). To further optimize clinical decision-making, we constructed a nomogram incorporating the patients’ clinical characteristics, which visually represents the relationship between different variables and treatment response (Fig. 3C). The results of the DCA and calibration curve further validate the nomogram as an accurate and reliable predictive tool for assessing the anticipated response of TNBC patients to NACT (Fig. 3D, E). Finally, we compared the ENS with five existing NACT prediction scores and models. The ENS demonstrated significantly superior predictive performance compared to the others (Fig. 3F-H), further validating the effectiveness of the ENS in predicting responses to NACT. Moreover, DCA confirmed that the ENS provided greater net clinical benefit across wide threshold ranges, and PR analysis showed higher PRAUCs than most competing models, underscoring its robustness under imbalanced outcomes (Fig. S4). These results support the clinical application value of the ENS, suggesting that it has the potential to become an effective tool to help doctors develop more personalized and accurate diagnostic and treatment plans.

Fig. 3.


Evaluation and performance comparison of the ENS. (A, B) Univariate and multivariate logistic regression analyses suggested the independent predictive ability of the risk score for the response to NACT. (C) Nomogram for predicting the response to TNBC NACT. (D) DCA curve demonstrates the clinical utility of the nomogram in predicting the response to NACT. (E) Calibration curve illustrates the accuracy of the nomogram in predicting the response to NACT for TNBC. (F–H) Performance comparison of the ENS with five existing NACT prediction scores and models across the GSE25066, GSE20194, and GSE20271.

Correlation exploration of risk score with clinical features

To better develop personalized clinical treatment strategies, we further explored the correlation between risk scores and NACT risk stratification with clinical prognosis. The risk score cutoff (0.715) was determined using ROC analysis, providing the optimal balance between sensitivity and specificity. Clinically, this threshold distinguishes patients at higher risk of residual disease and recurrence, indicating those who may require treatment strategies beyond standard NACT. Univariate and multivariate Cox regression analysis confirmed that the risk score is an independent and effective prognostic factor (Fig. 4A, B). The results of the survival curves further showed that patients in the high-risk group had a significantly poorer prognosis (Fig. 4C), and similar results were also obtained in subgroups stratified according to clinical features (Fig. S5). The survival status plot and risk curve suggest that as the risk score increases, the probability of patients developing distant metastases also increases (Fig. 4D, E). Additionally, we explored the correlation between the risk score and five clinical features. A significant correlation was observed between a higher T stage and a higher risk score (p = 0.041), while no significant statistical differences were found for other clinical features (Fig. 4F-J). In summary, the risk score can be deemed as an independent factor closely tied to prognosis, where a high risk is associated with a worse outcome. Developing personalized treatment strategies for these patients in the context of NACT may improve their prognosis.

Fig. 4.


Clinical correlation analysis of the ENS prediction values. (A), (B) Univariate and multivariate Cox regression analysis validates the risk score as an independent prognostic factor. (C) Survival curves comparing prognosis between high-risk and low-risk groups. (D) Risk curve based on risk score. (E) Survival status plot of patients in the GSE25066. ‘Event’ indicates the occurrence of observed distant metastases, while ‘Censored’ signifies that no distant metastases were observed by the end of the study. (F-J) Correlation between the risk score and various clinical features, including age, AJCC stage, histological grade, nodal status (N), and tumor size (T).

Biological functional differences

To explore the biological function differences between different risk groups, differential gene expression analysis using the limma package identified 97 DEGs, with 34 upregulated genes in the high-risk group and 63 upregulated genes in the low-risk group (Fig. 5A). We further conducted KEGG pathway and GO function enrichment analyses to explore the biological significance of these genes. KEGG pathway enrichment analysis revealed significant enrichment in pathways closely related to cell cycle control and cancer development, such as cell cycle and proteoglycans in cancer (Fig. 5B). These differential genes were significantly enriched in functions related to the extracellular matrix, collagen composition, and growth factor binding, emphasizing their potential biological importance in TNBC (Fig. 5C). These enriched pathways and functions provide strong evidence for further elucidating the biological basis of TNBC treatment responses. To delve deeper into the association between the identified differential genes and cancer biological pathways, we used the GSVA method to analyze the enrichment status of tumor hallmark pathways. The results showed that genes in the high-risk group were significantly enriched in pathways closely related to tumor aggressiveness, metastatic ability, and drug resistance, such as angiogenesis, apical surface, coagulation, and epithelial-mesenchymal transition. Additionally, the high-risk group also exhibited significant enrichment in inflammation-related pathways such as IL-6/JAK/STAT3 signaling, inflammatory response, and TNFα signaling via NF-κB. In contrast, genes in the low-risk group were mainly associated with cell cycle-related pathways like E2F targets, G2M checkpoint, and MYC targets V1/V2 (Fig. 5D). 
By systematically analyzing the differential expression genes of different risk groups, we not only identified key biological pathways related to tumor growth and metastasis but also revealed that potential inflammatory response mechanisms are closely related to the risk stratification of TNBC patients. These findings reinforce the importance of risk scoring in identifying potential therapeutic targets and predicting treatment outcomes, providing valuable molecular information for subsequent clinical decision-making and personalized treatment.

Fig. 5.


Biological functional differences between TNBC risk groups. (A) Volcano plot of DEGs between high-risk and low-risk groups identified by limma package. (B) KEGG pathway enrichment analysis highlighting significant pathways associated with cancer development and cell cycle. (C) GO function enrichment analysis emphasizing extracellular matrix. (D) Heatmap of GSVA enrichment status for tumor hallmark pathways, contrasting high-risk and low-risk groups with pathways related to tumor aggressiveness, metastatic ability, drug resistance, and inflammation.

TME features in risk group

We examined the association between risk scores and TME features, aiming to uncover key TME factors to guide personalized treatment. We used six different bioinformatics algorithms to comprehensively analyze the associations between various components of the TME and the risk score. The analysis revealed that the risk score is significantly positively correlated (r > 0.3, P < 0.05) with cancer-associated fibroblasts (CAFs), endothelial cells, and NK cells, and significantly negatively correlated (r < -0.3, P < 0.05) with CD4+ Th1/Th2 T cells and follicular helper T cells (Fig. 6A). Using the GSVA method, we conducted enrichment analysis on 29 immune function features and found significant enrichment in the high-risk group for 10 immune cell types (B cells, DCs, iDCs, macrophages, mast cells, neutrophils, T helper cells, Th1 cells, TIL, and Treg) and 6 immune functions (CCR, checkpoint, cytolytic activity, T cell co-inhibition, APC co-stimulation, and Type II IFN response) (Fig. 6B). These results suggest a more active immune function in high-risk patients. Applying the ESTIMATE algorithm, our analysis showed that the high-risk group had significantly higher immune scores, stromal scores, and overall ESTIMATE scores (Figs. 6C-E), further confirming the richness of immune and stromal components in the high-risk TME. Additionally, tumor purity was significantly lower in the high-risk group (Fig. 6F), which could be a key factor in the insensitivity of these samples to chemotherapy drugs. In further analysis of risk score stratification, we noted significant expression differences in fifteen immune checkpoint-related genes between the risk groups: PDCD1, CTLA4, CD40, CD40LG, CD48, CD68, CSF1R, HAVCR1, KIR2DL3, KIR3DL1, LAIR1, NECTIN2, TGFBI, TNFRSF25, and VTCN1 were all significantly upregulated in the high-risk group (Fig. 6G).
These results suggest that high-risk patients may be more responsive to therapy with immune checkpoint inhibitors (ICIs), a potential therapeutic advantage for this group. In summary, our multi-dimensional analysis characterizes the TME of TNBC prior to NACT and shows significant activity of specific immune cells and functions in the high-risk group. This active state may indicate sensitivity to ICI therapy in high-risk patients, providing biological insight and guidance for future treatment options.
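
The inverse relationship between the ESTIMATE score and tumor purity noted above follows from how purity is derived from the score. The sketch below illustrates this in Python (the analysis itself relied on R packages). It assumes the Affymetrix-calibrated cosine relation published with the ESTIMATE method47, purity = cos(0.6049872018 + 0.0001467884 × ESTIMATE score); the function name and the example scores are hypothetical.

```python
import math

# Coefficients assumed from the ESTIMATE publication (Affymetrix platform only).
INTERCEPT = 0.6049872018
SLOPE = 0.0001467884


def estimate_tumor_purity(estimate_score: float) -> float:
    """Map an ESTIMATE score to a tumor purity fraction via the cosine relation."""
    return math.cos(INTERCEPT + SLOPE * estimate_score)


# Hypothetical ESTIMATE scores for one low-risk and one high-risk sample.
for label, score in [("low-risk", -500.0), ("high-risk", 2500.0)]:
    purity = estimate_tumor_purity(score)
    print(f"{label}: ESTIMATE score {score:+.0f} -> purity {purity:.3f}")
```

Under this assumption, purity decreases monotonically over the typical range of scores, so a group with higher ESTIMATE scores will show lower estimated purity by construction.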

Fig. 6.

Patterns of immune cell infiltration in TNBC with varying risk profiles. (A) Correlation analysis of TME components and risk score across multiple bioinformatics platforms. (B) GSVA enrichment of immune functions in high-risk versus low-risk groups. (C–F) Differences in immune scores, stromal scores, ESTIMATE scores, and tumor purity scores between risk groups. (G) Differential expression of immune checkpoint genes between risk groups. (*P < 0.05, **P < 0.01, ***P < 0.001).

Potential drug screening for high-risk group

To explore potentially effective drugs for high-risk TNBC patients, we used the oncoPredict algorithm to estimate IC50 values for 198 drugs in the GDSC V2 database. Pearson correlation analysis against the risk score identified 77 drugs (|r| > 0.2, P < 0.05; Fig. 7A). Among these, 11 drugs showed a significant negative correlation between IC50 and risk score; they primarily target the PI3K/MTOR and MAPK/ERK signaling pathways. The remaining 66 drugs showed a positive correlation between IC50 and risk score, with mechanisms of action mainly involving DNA replication and apoptosis (Fig. 7B). Notably, the high-risk group showed low sensitivity to several traditional breast cancer chemotherapy drugs, including platinum compounds, taxanes, and cyclophosphamide (Figs. 7C–E). Additionally, analysis using the Cmap database identified 101 small molecule inhibitors to which the high-risk group was sensitive (FDR < 0.05 and normalized score > 1; Table S4); a matrix plot displays the top 50 small molecule inhibitors and their associated pathways, including inhibitors targeting MTOR, MEK, PI3K, and CDK (Fig. 7F). By combining the oncoPredict algorithm with the CTRP V2 database, we calculated AUC values for 481 drugs in the GSE25066 samples and identified 56 drugs whose AUC values were significantly negatively correlated with the risk score (r < -0.2, P < 0.05; Table S5), indicating sensitivity in high-risk samples. Further analysis identified three small molecule inhibitors with potential benefit for high-risk TNBC patients (Fig. 7G). To explore their clinical relevance, we compared sensitivity to these inhibitors between the pCR and RD groups. The results show a trend toward sensitivity in the RD group (Fig. S6). However, no statistically significant differences were observed, possibly because these inhibitors were selected for their relevance to the high-risk group defined by risk stratification.
KEGG pathway enrichment analysis indicated that the targets of these inhibitors are primarily concentrated in the PI3K-Akt and MAPK signaling pathways (Fig. 7H). In summary, this study provides potential therapeutic targets for personalized drug selection in high-risk TNBC patients. The risk score-related differences in drug sensitivity that we identified, especially for small molecule inhibitors of key signaling pathways such as PI3K/AKT/MTOR and MAPK/ERK, offer directions for future clinical applications and therapeutic intervention research.
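
The correlation screen used for the GDSC and CTRP analyses can be sketched as follows. This is a minimal Python illustration, not the R workflow used in the study. It assumes that per-sample risk scores and predicted IC50 values are already available (for example, from oncoPredict), uses made-up toy data with hypothetical drug names, and substitutes a permutation test for the parametric P value of the Pearson correlation.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def permutation_p(x, y, n_perm=1000, seed=0):
    """Two-sided permutation P value for r (a stand-in for the usual t-test)."""
    rng = random.Random(seed)
    observed = abs(pearson_r(x, y))
    shuffled = list(y)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson_r(x, shuffled)) >= observed:
            count += 1
    return (count + 1) / (n_perm + 1)


def screen_drugs(risk, ic50_by_drug, r_cut=0.2, alpha=0.05):
    """Keep drugs with |r| > r_cut and P < alpha, labeled by direction."""
    kept = {}
    for drug, ic50 in ic50_by_drug.items():
        r = pearson_r(risk, ic50)
        p = permutation_p(risk, ic50)
        if abs(r) > r_cut and p < alpha:
            # Negative r: IC50 falls as risk rises, so high-risk samples look more sensitive.
            kept[drug] = (round(r, 3), p, "sensitive" if r < 0 else "resistant")
    return kept


# Toy data: 40 hypothetical samples with evenly spaced risk scores.
rng = random.Random(42)
risk = [i / 39 for i in range(40)]
ic50_by_drug = {
    "drug_A": [5.0 - 2.0 * s + rng.gauss(0, 0.3) for s in risk],    # IC50 falls with risk
    "drug_B": [3.0 + 2.0 * s + rng.gauss(0, 0.3) for s in risk],    # IC50 rises with risk
    "drug_C": [4.0 + (0.5 if i % 2 else -0.5) for i in range(40)],  # unrelated to risk
}
selected = screen_drugs(risk, ic50_by_drug)
for drug, (r, p, label) in selected.items():
    print(drug, r, p, label)
```

In this toy example, a drug labeled "sensitive" corresponds to the 11 GDSC drugs with a negative IC50 correlation, and a drug labeled "resistant" corresponds to the 66 drugs with a positive correlation.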

Fig. 7.

Drug sensitivity and pathway analysis in high-risk TNBC. (A) Pearson correlation of IC50 values for drugs in the GDSC database with risk score. (B) Bar chart of drug count versus sensitivity type across various pathways. (C–E) Boxplots comparing the sensitivity of traditional chemotherapy agents in high-risk versus low-risk groups. (F) Matrix plot of the top 50 small molecule inhibitors and their target pathways from the Cmap database. (G) Venn diagram displaying the overlap of drugs sensitive to high-risk TNBC between the Cmap and CTRP databases. (H) KEGG pathway enrichment analysis for targets of the 3 noteworthy small molecule inhibitors. (*P < 0.05, **P < 0.01, ***P < 0.001).

Discussion

This study investigated the relationship between NACT response and DRS in TNBC and identified key genes associated with it. DRS-related signatures differed significantly between TNBC patients with pCR and RD and were more pronounced in the pCR group. From these signatures, we identified eight key genes significantly related to NACT response. Functional analysis indicated their involvement in cell cycle regulation, DNA replication and repair, and chromosomal organization, processes closely linked to the mechanisms of chemotherapy, thus providing a biological basis for their association with treatment response. Nevertheless, the underlying mechanisms require further investigation. By constructing and evaluating an ENS, we demonstrated its high accuracy in predicting response to NACT, which may support clinical decision-making and help physicians develop more personalized treatment plans. In terms of drug sensitivity, high-risk TNBC patients showed low sensitivity to traditional chemotherapy drugs but enhanced sensitivity to small molecule inhibitors targeting the PI3K/MTOR and MAPK/ERK signaling pathways, suggesting potential new therapeutic targets for these patients. Additionally, our study revealed a close relationship between the risk score and the TME, and suggested that patients in the high-risk group may be sensitive to ICI treatment. Overall, our research provides molecular insights into TNBC NACT and guidance for personalized treatment strategies for high-risk patients, with the potential to improve their treatment outcomes and prognosis.

The study focused on the role of DRS in the NACT of TNBC. The DNA repair pathway was significantly enriched in the pCR group, which responded favorably to NACT, consistent with previous studies50,51. This enrichment may be due to DNA-damaging agents exacerbating DRS, thereby increasing genomic instability and DNA damage and leading to the death of tumor cells52. Through ssGSEA, we further observed that DRS-related signatures were more pronounced in the pCR group, highlighting the importance of DRS in TNBC treatment. DRS plays a complex dual role in cancer treatment: on one hand, increased DRS can enhance tumor sensitivity to anticancer treatments; on the other hand, inhibition of DRS might lead to resistance of tumor cells to anticancer drugs53. In summary, our study provides a new perspective on the molecular mechanisms of NACT in TNBC, particularly the role of DRS in TNBC treatment. These findings could contribute to the development of more effective treatment strategies, especially targeted therapies addressing DNA damage and repair pathways.

The increasing application of machine learning in biomedical fields has highlighted the advantages of ensemble learning, particularly stacking methods, in predicting various disease classifications54,55. However, many existing studies rely on single-model approaches, such as logistic regression or random forests, which often fail to capture the complex biological heterogeneity of TNBC, limiting their generalizability. In contrast, our study employs a more flexible ensemble method, combining multiple base models (NNET, RF, BN, GBM) through a stacking strategy. Stacking methods provide a comprehensive and robust predictive framework by integrating predictions from multiple base models and using a secondary model56,57, surpassing traditional ensemble learning strategies like bagging, boosting, and voting58. This approach enhances the model’s accuracy, robustness, and generalizability across different datasets, as demonstrated by the ENS’s AUC values of 0.922, 0.886, and 0.858 across independent datasets, significantly outperforming traditional single-model methods. In terms of gene features, whereas other studies focus on biological features such as immune response or cell proliferation, our model incorporates DRS as a key biological feature, providing a new perspective on treatment response. Clinically, the ENS enables accurate patient stratification, particularly the identification of high-risk patients who are less likely to respond to standard NACT. This capability supports the development of personalized treatment plans, including exploring alternative therapies or optimizing drug selection, to improve patient outcomes. Furthermore, a nomogram based on the ENS was developed to predict responses to TNBC NACT, and the ENS-derived risk score was identified as an independent prognostic indicator, with high-risk patients exhibiting significantly poorer outcomes than their low-risk counterparts.
These findings establish a strong foundation for tailoring treatment strategies based on risk stratification, particularly for improving the prognosis of high-risk TNBC patients. In summary, this study successfully applied stacking techniques to construct the ENS based on DRS-related signatures. This model not only demonstrated exceptional predictive performance across multiple datasets but also showed its potential application value in clinical decision-making. Figure 8 illustrates the ideal future workflow where the ENS can help optimize treatment strategies by stratifying TNBC patients into risk groups. In clinical practice, patients identified as low-risk would continue to receive standard chemotherapy, while those classified as high-risk could be directed toward alternative options such as chemotherapy combined with targeted therapy, chemotherapy combined with immunotherapy, or, in selected cases, direct surgery. This integration highlights how the risk score can guide individualized treatment intensity and avoid ineffective therapies.
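
To make the stacking strategy concrete, the sketch below shows its core mechanics in plain Python: base learners produce out-of-fold predictions, and a secondary (meta) model is trained on those predictions. It is an illustration under stated assumptions, not the ENS itself. The two base learners (a nearest-centroid scorer and a single-feature stump) are simple stand-ins for NNET, RF, BN, and GBM, the meta-model is assumed here to be a logistic regression, all function names are hypothetical, and the data are synthetic.

```python
import math
import random


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def class_mean(X, y, label):
    """Per-feature mean of the rows belonging to one class."""
    rows = [x for x, t in zip(X, y) if t == label]
    return [sum(col) / len(rows) for col in zip(*rows)]


def fit_centroid(X, y):
    """Stand-in base learner 1: score by relative distance to class centroids."""
    c0, c1 = class_mean(X, y, 0), class_mean(X, y, 1)

    def predict(x):
        d0, d1 = math.dist(x, c0), math.dist(x, c1)
        return d0 / (d0 + d1 + 1e-12)  # nearer to the class-1 centroid -> higher score
    return predict


def fit_stump(X, y, feature=0):
    """Stand-in base learner 2: soft threshold on a single feature."""
    m0, m1 = class_mean(X, y, 0)[feature], class_mean(X, y, 1)[feature]
    mid, sign = (m0 + m1) / 2, (1.0 if m1 >= m0 else -1.0)
    return lambda x: sigmoid(sign * (x[feature] - mid))


def fit_logistic(Z, y, lr=0.5, epochs=500):
    """Meta-model: logistic regression fitted by batch gradient descent."""
    w, b, n = [0.0] * len(Z[0]), 0.0, len(Z)
    for _ in range(epochs):
        errs = [sigmoid(sum(wj * zj for wj, zj in zip(w, z)) + b) - t
                for z, t in zip(Z, y)]
        w = [wj - lr * sum(e * z[j] for e, z in zip(errs, Z)) / n
             for j, wj in enumerate(w)]
        b -= lr * sum(errs) / n
    return lambda z: sigmoid(sum(wj * zj for wj, zj in zip(w, z)) + b)


def fit_stacking(X, y, learners, k=5):
    """Collect out-of-fold base predictions, then fit the meta-model on them."""
    n = len(X)
    Z = [[0.0] * len(learners) for _ in range(n)]
    for fold in range(k):
        held_out = set(range(fold, n, k))
        X_tr = [X[i] for i in range(n) if i not in held_out]
        y_tr = [y[i] for i in range(n) if i not in held_out]
        for j, fit in enumerate(learners):
            model = fit(X_tr, y_tr)
            for i in held_out:
                Z[i][j] = model(X[i])  # prediction from a model that never saw sample i
    meta = fit_logistic(Z, y)
    base = [fit(X, y) for fit in learners]  # refit base learners on all data
    return lambda x: meta([m(x) for m in base])


# Synthetic two-feature data: class 1 is shifted away from class 0.
rng = random.Random(0)
y = [i % 2 for i in range(100)]
X = [[rng.gauss(4.0 * t, 1.0), rng.gauss(4.0 * t, 1.0)] for t in y]
ens = fit_stacking(X, y, [fit_centroid, fit_stump])
accuracy = sum((ens(x) >= 0.5) == bool(t) for x, t in zip(X, y)) / len(y)
print(f"training accuracy: {accuracy:.2f}")
```

Training the meta-model on out-of-fold rather than in-sample predictions is the step that distinguishes stacking from simple averaging, because it prevents the meta-model from rewarding base learners that merely memorize the training data.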

Fig. 8.

Clinical workflow for stratifying TNBC patients using the ENS.

Further analysis was conducted on the features of the TME in different risk groups. The TME plays a crucial role in tumor progression, contributing to tumor proliferation and metastasis, suppression of the immune system, and drug resistance59. To examine the role of the immune microenvironment in TNBC treatment, we applied six bioinformatics algorithms to analyze the association between immune microenvironment components and the risk score in TNBC patients. Our analysis revealed that a high risk score is significantly positively correlated with the infiltration of various TME cell types, especially CAFs. The enrichment of endothelial cell infiltration and the IL6/JAK/STAT3 signaling pathway in the high-risk group may be the main source of production and activation of CAFs60. CAFs not only promote tumor invasion and metastasis but also suppress anti-tumor immune responses through interactions with immune cells and mediate chemotherapy drug resistance61. This mechanism may be one of the key reasons for the high-risk group’s insensitivity to chemotherapy drugs and higher risk of recurrence and metastasis. Additionally, through GSVA and ESTIMATE analysis of immune functions, we further confirmed the richness of immune and stromal cells in the TME of the high-risk group. Current research in TNBC explores various drugs that target CAFs to reshape the TME, thereby enhancing anti-tumor immunity62,63. Our findings provide further evidence for therapeutic strategies targeting CAFs, emphasizing the potential value of reshaping the TME in TNBC treatment.

With the rapid advancement of cancer immunotherapy, particularly the application of ICIs in solid tumor treatment, the role of such therapies in TNBC NACT is increasingly emphasized. Studies have shown that the addition of PD-1/PD-L1 inhibitors can significantly improve the response of TNBC patients to NACT64,65. In our study, we observed that crucial immune checkpoint genes such as PDCD1 and CTLA4 are significantly overexpressed in high-risk TNBC patients. This finding suggests that patients in the high-risk group may be more likely to benefit from ICI treatment. These results provide evidence for personalized strategies in TNBC NACT, especially when considering immune checkpoint inhibitors as a treatment approach. Applying this strategy could improve the clinical outcomes and prognosis of high-risk TNBC patients.

We screened candidate drugs for high-risk TNBC patients by integrating multiple drug databases. Analysis of drug sensitivity data from the GDSC database showed that, in the high-risk group, the drugs to which patients were sensitive primarily target the PI3K/MTOR and ERK/MAPK signaling pathways. In contrast, conventional chemotherapy drugs, such as platinum compounds, taxanes, and cyclophosphamide, displayed lower sensitivity in these patients, indicating potential resistance to standard treatments. Moreover, analysis of the Cmap database showed that the high-risk group was particularly sensitive to inhibitors targeting the MTOR, MEK, and PI3K pathways. Combining the Cmap and CTRP databases, we identified three small molecule drugs to which high-risk TNBC patients were sensitive: TG-101348, thalidomide, and erlotinib. The targets of these drugs are mainly focused on the PI3K/AKT and MAPK signaling pathways. The interaction and crosstalk between these two pathways are key mechanisms of tumor drug resistance66. These findings underscore the importance of developing more targeted therapies for high-risk TNBC patients to overcome resistance to traditional treatments. Overall, this study provides biological and molecular bases for developing new treatment strategies and optimizing existing ones, which is crucial for improving treatment outcomes and survival rates in high-risk TNBC patients. Future research should validate the safety and efficacy of these targeted drugs in clinical applications and explore their optimal combination in comprehensive treatment regimens to maximize therapeutic effects.

There are several limitations to this study. First, DRS-related signatures show promise as biomarkers for predicting chemotherapy response in TNBC, but their application may be limited by the heterogeneity of TNBC subtypes. Variability in DRS expression across subtypes could affect the model’s generalizability. Future research should focus on refined subtype classification to account for molecular and genetic heterogeneity, enabling more personalized use of DRS signatures. Additionally, the data used come primarily from the public GEO database and may be influenced by variations between datasets within the database, as well as by experimental operations and equipment. While we used three independent datasets to validate the model, the relatively small sample sizes and lack of diverse geographical and clinical representation may limit the generalizability of the findings. This highlights the need for more heterogeneous datasets to ensure the robustness of the model in broader patient populations. Furthermore, as a retrospective study, there is a risk of selection bias, as patient inclusion is based on available datasets rather than a controlled prospective design, potentially leading to an overrepresentation of specific patient subgroups. To mitigate this, future prospective multi-center studies with diverse populations are needed. In terms of bioinformatics analysis, each method we applied carries its own assumptions and limitations, which might affect the interpretation and clinical application of the results. Lastly, while our model demonstrated good predictive performance across three independent datasets, its application in real-world clinical settings requires further validation through external datasets and prospective, multi-center studies.
Future research should focus on incorporating larger, multi-ethnic cohorts, refining TNBC subtype classification, and integrating additional omics data to enhance model precision and clinical relevance.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary Material 1 (77.6KB, xlsx)
Supplementary Material 2 (59.1MB, docx)

Acknowledgements

We would like to express gratitude for the data support from the GEO (http://www.ncbi.nlm.nih.gov/geo/), GDSC (https://www.cancerrxgene.org/), Cmap (https://clue.io/query), and CTRP (https://portals.broadinstitute.org/ctrp.v2.1/) databases, as well as the selfless contributions of the broader community of software package developers.

Abbreviations

TNBC: Triple-negative breast cancer
DRS: DNA replication stress
PPI: Protein-protein interaction

Author contributions

MSH, FFC, and JWZ conceived and designed the study. MSH conducted the entire data analysis for the manuscript. YYC, MTC, and ZWT were responsible for the data collection. QFL and JL provided technical support. JWZ and LW offered financial support. The first draft of the manuscript was written by MSH and QFL, and all authors reviewed and edited the manuscript.

Funding

This work was supported by the Fundamental Research Funds for the Central Universities (Grant No. 2042023kf0026) and the Beijing Life Oasis Public Service Center (Grant No. cphcf-2022-216).

Data availability

Data supporting the findings of this study are available for download from the databases specified in the methods section. The code for developing the ENS is available on the GitHub website (https://github.com/Noobcodeer467/Ensemble-Model).

Declarations

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Contributor Information

Fangfang Chen, Email: chenfangfang@znhosptial.cn.

Jingwei Zhang, Email: zjwzhang68@whu.edu.cn.

References

  • 1. Nierengarten, M. B. Global cancer statistics 2022. Cancer 130, 2. 10.1002/cncr.35444 (2024).
  • 2. Perou, C. M. et al. Molecular portraits of human breast tumours. Nature 406, 747–752. 10.1038/35021093 (2000).
  • 3. Lin, N. U. et al. Clinicopathologic features, patterns of recurrence, and survival among women with triple-negative breast cancer in the National Comprehensive Cancer Network. Cancer 118, 5463–5472. 10.1002/cncr.27581 (2012).
  • 4. Dent, R. et al. Triple-negative breast cancer: Clinical features and patterns of recurrence. Clin. Cancer Res. 13, 4429–4434. 10.1158/1078-0432.CCR-06-3045 (2007).
  • 5. Tamirisa, N. & Hunt, K. K. Neoadjuvant chemotherapy, endocrine therapy, and targeted therapy for breast cancer: ASCO guideline. Ann. Surg. Oncol. 29, 1489–1492. 10.1245/s10434-021-11223-3 (2022).
  • 6. Cortazar, P. et al. Pathological complete response and long-term clinical benefit in breast cancer: The CTNeoBC pooled analysis. Lancet 384, 164–172. 10.1016/S0140-6736(13)62422-8 (2014).
  • 7. Leon-Ferre, R. A. & Goetz, M. P. Advances in systemic therapies for triple negative breast cancer. BMJ 381, e071674. 10.1136/bmj-2022-071674 (2023).
  • 8. Garufi, G. et al. Updated neoadjuvant treatment landscape for early triple negative breast cancer: Immunotherapy, potential predictive biomarkers, and novel agents. Cancers (Basel) 14. 10.3390/cancers14174064 (2022).
  • 9. Gamucci, T. et al. Neoadjuvant chemotherapy in triple-negative breast cancer: A multicentric retrospective observational study in real-life setting. J. Cell. Physiol. 233, 2313–2323. 10.1002/jcp.26103 (2018).
  • 10. Macheret, M. & Halazonetis, T. D. DNA replication stress as a hallmark of cancer. Annu. Rev. Pathol. 10, 425–448. 10.1146/annurev-pathol-012414-040424 (2015).
  • 11. Zeman, M. K. & Cimprich, K. A. Causes and consequences of replication stress. Nat. Cell. Biol. 16, 2–9. 10.1038/ncb2897 (2014).
  • 12. Cybulla, E. & Vindigni, A. Leveraging the replication stress response to optimize cancer therapy. Nat. Rev. Cancer 23, 6–24. 10.1038/s41568-022-00518-6 (2023).
  • 13. da Costa, A., Chowdhury, D., Shapiro, G. I., D’Andrea, A. D. & Konstantinopoulos, P. A. Targeting replication stress in cancer therapy. Nat. Rev. Drug Discov. 22, 38–58. 10.1038/s41573-022-00558-5 (2023).
  • 14. Makhale, A., Nanayakkara, D., Raninga, P., Khanna, K. K. & Kalimutho, M. CX-5461 enhances the efficacy of APR-246 via induction of DNA damage and replication stress in triple-negative breast cancer. Int. J. Mol. Sci. 22. 10.3390/ijms22115782 (2021).
  • 15. Jegadesan, N. K. & Branzei, D. DDX11 loss causes replication stress and pharmacologically exploitable DNA repair defects. Proc. Natl. Acad. Sci. U. S. A. 118. 10.1073/pnas.2024258118 (2021).
  • 16. Hatzis, C. et al. A genomic predictor of response and survival following taxane-anthracycline chemotherapy for invasive breast cancer. JAMA 305, 1873–1881. 10.1001/jama.2011.593 (2011).
  • 17. Shi, L. et al. The microarray quality control (MAQC)-II study of common practices for the development and validation of microarray-based predictive models. Nat. Biotechnol. 28, 827–838. 10.1038/nbt.1665 (2010).
  • 18. Tabchy, A. et al. Evaluation of a 30-gene paclitaxel, fluorouracil, doxorubicin, and cyclophosphamide chemotherapy response predictor in a multicenter randomized trial in breast cancer. Clin. Cancer Res. 16, 5351–5361. 10.1158/1078-0432.CCR-10-1265 (2010).
  • 19. Gautier, L., Cope, L., Bolstad, B. M. & Irizarry, R. A. affy–analysis of Affymetrix GeneChip data at the probe level. Bioinformatics 20, 307–315. 10.1093/bioinformatics/btg405 (2004).
  • 20. Dreyer, S. B. et al. Targeting DNA damage response and replication stress in pancreatic cancer. Gastroenterology 160, 362–377. 10.1053/j.gastro.2020.09.043 (2021).
  • 21. Huang, R. H. et al. A machine learning framework develops a DNA replication stress model for predicting clinical outcomes and therapeutic vulnerability in primary prostate cancer. J. Transl. Med. 21, 20. 10.1186/s12967-023-03872-7 (2023).
  • 22. Kursa, M. B. & Rudnicki, W. R. Feature selection with the Boruta package. J. Stat. Softw. 36, 1–13. 10.18637/jss.v036.i11 (2010).
  • 23. Kuhn, M. Building predictive models in R using the caret package. J. Stat. Softw. 28, 1–26. 10.18637/jss.v028.i05 (2008).
  • 24. Ripley, B. D. Pattern Recognition and Neural Networks (Cambridge University Press, 2007).
  • 25. Wright, M. N. & Ziegler, A. Ranger: A fast implementation of random forests for high dimensional data in C++ and R. J. Stat. Softw. 77. 10.18637/jss.v077.i01 (2017).
  • 26. Chen, T. & Guestrin, C. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 785–794.
  • 27. Rish, I. In IJCAI 2001 Workshop on Empirical Methods in Artificial Intelligence. 41–46.
  • 28. Friedman, J. H. Greedy function approximation: A gradient boosting machine. Annals Stat. 29 (5), 1189–1232 (2001).
  • 29. Cortes, C. & Vapnik, V. Support-vector networks. Mach. Learn. 20, 273–297. 10.1007/bf00994018 (1995).
  • 30. Nelder, J. A. & Wedderburn, R. W. Generalized linear models. J. R. Stat. Soc. Ser. A-Gen. 135, 370–. 10.2307/2344614 (1972).
  • 31. Oshi, M. et al. A novel three-gene score as a predictive biomarker for pathologically complete response after neoadjuvant chemotherapy in triple-negative breast cancer. Cancers (Basel) 13. 10.3390/cancers13102401 (2021).
  • 32. Zuo, K. et al. qRT-PCR-based DNA homologous recombination-associated 4-gene score predicts pathologic complete response to platinum-based neoadjuvant chemotherapy in triple-negative breast cancer. Breast Cancer Res. Treat. 191, 335–344. 10.1007/s10549-021-06442-x (2022).
  • 33. Han, Y., Wang, J. & Xu, B. Novel biomarkers and prediction model for the pathological complete response to neoadjuvant treatment of triple-negative breast cancer. J. Cancer 12, 936–945. 10.7150/jca.52439 (2021).
  • 34. Fournier, M. V. et al. A predictor of pathological complete response to neoadjuvant chemotherapy stratifies triple negative breast cancer patients with high risk of recurrence. Sci. Rep. 9, 14863. 10.1038/s41598-019-51335-1 (2019).
  • 35. Park, S. & Yi, G. Development of gene expression-based random forest model for predicting neoadjuvant chemotherapy response in triple-negative breast cancer. Cancers (Basel) 14. 10.3390/cancers14040881 (2022).
  • 36. Ashburner, M. et al. Gene ontology: Tool for the unification of biology. The gene ontology consortium. Nat. Genet. 25, 25–29. 10.1038/75556 (2000).
  • 37. Kanehisa, M. & Goto, S. KEGG: Kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 28, 27–30. 10.1093/nar/28.1.27 (2000).
  • 38. Kanehisa, M., Furumichi, M., Sato, Y., Matsuura, Y. & Ishiguro-Watanabe, M. KEGG: Biological systems database as a model of the real world. Nucleic Acids Res. 53, D672–D677. 10.1093/nar/gkae909 (2025).
  • 39. Wu, T. et al. clusterProfiler 4.0: A universal enrichment tool for interpreting omics data. Innov. (Camb) 2, 100141. 10.1016/j.xinn.2021.100141 (2021).
  • 40. Hanzelmann, S., Castelo, R. & Guinney, J. GSVA: Gene set variation analysis for microarray and RNA-Seq data. BMC Bioinform. 14. 10.1186/1471-2105-14-7 (2013).
  • 41. Aran, D., Hu, Z. & Butte, A. J. xCell: Digitally portraying the tissue cellular heterogeneity landscape. Genome Biol. 18, 220. 10.1186/s13059-017-1349-1 (2017).
  • 42. Li, T. et al. A web server for comprehensive analysis of tumor-infiltrating immune cells. Cancer Res. 77, e108–e110. 10.1158/0008-5472.CAN-17-0307 (2017).
  • 43. Plattner, C., Finotello, F. & Rieder, D. Deconvoluting tumor-infiltrating immune cells from RNA-seq data using quanTIseq. Methods Enzymol. 636, 261–285. 10.1016/bs.mie.2019.05.056 (2020).
  • 44. Becht, E. et al. Estimating the population abundance of tissue-infiltrating immune and stromal cell populations using gene expression. Genome Biol. 17, 218. 10.1186/s13059-016-1070-5 (2016).
  • 45. Racle, J. & Gfeller, D. EPIC: A tool to estimate the proportions of different cell types from bulk gene expression data. Methods Mol. Biol. (Clifton N. J.) 2120, 233–248. 10.1007/978-1-0716-0327-7_17 (2020).
  • 46. Newman, A. M. et al. Determining cell type abundance and expression from bulk tissues with digital cytometry. Nat. Biotechnol. 37, 773–782. 10.1038/s41587-019-0114-2 (2019).
  • 47. Yoshihara, K. et al. Inferring tumour purity and stromal and immune cell admixture from expression data. Nat. Commun. 4, 2612. 10.1038/ncomms3612 (2013).
  • 48. Maeser, D., Gruener, R. F. & Huang, R. S. oncoPredict: an R package for predicting in vivo or cancer patient drug response and biomarkers from cell line screening data. Brief. Bioinform. 22. 10.1093/bib/bbab260 (2021).
  • 49. Ginestet, C. ggplot2: elegant graphics for data analysis. J. R. Stat. Soc. Ser. A-Stat. Soc. 174, 245–245. 10.1111/j.1467-985X.2010.00676_9.x (2011).
  • 50. Anand, K. et al. Targeting mTOR and DNA repair pathways in residual triple negative breast cancer post neoadjuvant chemotherapy. Sci. Rep. 11, 82. 10.1038/s41598-020-80081-y (2021).
  • 51. Tang, X. et al. Integration of multiomics data shows down regulation of mismatch repair and tubulin pathways in triple-negative chemotherapy-resistant breast tumors. Breast Cancer Res. 25. 10.1186/s13058-023-01656-x (2023).
  • 52. Saxena, S. & Zou, L. Hallmarks of DNA replication stress. Mol. Cell 82, 2298–2314. 10.1016/j.molcel.2022.05.004 (2022).
  • 53. Murai, J., Thomas, A., Miettinen, M. & Pommier, Y. Schlafen 11 (SLFN11), a restriction factor for replicative stress induced by DNA-targeting anti-cancer therapies. Pharmacol. Ther. 201, 94–102. 10.1016/j.pharmthera.2019.05.009 (2019).
  • 54. Greener, J. G., Kandathil, S. M., Moffat, L. & Jones, D. T. A guide to machine learning for biologists. Nat. Rev. Mol. Cell. Biol. 23, 40–55. 10.1038/s41580-021-00407-0 (2022).
  • 55. Xiong, Y., Ye, M. & Wu, C. Cancer classification with a cost-sensitive Naive Bayes stacking ensemble. Comput. Math. Methods Med. 2021, 5556992. 10.1155/2021/5556992 (2021).
  • 56. Wolpert, D. H. Stacked generalization. Neural Netw. 5, 241–259. 10.1016/s0893-6080(05)80023-1 (1992).
  • 57. Onozato, Y. et al. Predicting pathological highly invasive lung cancer from preoperative [(18)F]FDG PET/CT with multiple machine learning models. Eur. J. Nucl. Med. Mol. Imaging 50, 715–726. 10.1007/s00259-022-06038-7 (2023).
  • 58. Mahajan, P., Uddin, S., Hajati, F. & Moni, M. A. Ensemble learning for disease prediction: A review. Healthc. (Basel) 11. 10.3390/healthcare11121808 (2023).
  • 59. Deepak, K. G. K. et al. Tumor microenvironment: Challenges and opportunities in targeting metastasis of triple negative breast cancer. Pharmacol. Res. 153, 104683. 10.1016/j.phrs.2020.104683 (2020).
  • 60. Zhao, Z., Li, T., Sun, L., Yuan, Y. & Zhu, Y. Potential mechanisms of cancer-associated fibroblasts in therapeutic resistance. Biomed. Pharmacother. 166, 115425. 10.1016/j.biopha.2023.115425 (2023).
  • 61. Mao, X. et al. Crosstalk between cancer-associated fibroblasts and immune cells in the tumor microenvironment: New findings and future perspectives. Mol. Cancer 20, 131. 10.1186/s12943-021-01428-1 (2021).
  • 62. Wu, Y. et al. FGFR blockade boosts T cell infiltration into triple-negative breast cancer by regulating cancer-associated fibroblasts. Theranostics 12, 4564–4580. 10.7150/thno.68972 (2022).
  • 63. Zhang, Y. et al. Co-delivery nanomicelles for potentiating TNBC immunotherapy by synergetically reshaping CAFs-mediated tumor stroma and reprogramming immunosuppressive microenvironment. Int. J. Nanomed. 18, 4329–4346. 10.2147/IJN.S418100 (2023).
  • 64. Fasching, P. A. et al. Pembrolizumab in combination with nab-paclitaxel for the treatment of patients with early-stage triple-negative breast cancer: A single-arm phase II trial (NeoImmunoboost, AGO-B-041). Eur. J. Cancer 184, 1–9. 10.1016/j.ejca.2023.01.001 (2023).
  • 65. Ademuyiwa, F. O. et al. A randomized phase 2 study of neoadjuvant carboplatin and paclitaxel with or without atezolizumab in triple negative breast cancer (TNBC) - NCI 10013. NPJ Breast Cancer 8, 134. 10.1038/s41523-022-00500-3 (2022).
  • 66. Mendoza, M. C., Er, E. E. & Blenis, J. The Ras-ERK and PI3K-mTOR pathways: Cross-talk and compensation. Trends Biochem. Sci. 36, 320–328. 10.1016/j.tibs.2011.03.006 (2011).

Supplementary Materials

Supplementary Material 1 (77.6KB, xlsx)
Supplementary Material 2 (59.1MB, docx)

Data Availability Statement

The data supporting the findings of this study are available for download from the databases specified in the Methods section. The code used to develop the ENS is available on GitHub (https://github.com/Noobcodeer467/Ensemble-Model).


Articles from Scientific Reports are provided here courtesy of Nature Publishing Group
