Abstract
Background
Breast cancer (BC) is a prevalent global malignancy with a high recurrence rate. The effectiveness of predictive, preventive, and personalized treatment strategies is limited by a lack of reliable prognostic biomarkers. Radiotherapy significantly reduces breast cancer recurrence risk and prolongs patients’ lives. However, the role of radiation-related genes in breast cancer remains unclear.
Materials and methods
Differentially expressed radiation-related genes were identified through analysis of the BRCA gene expression matrix between radiation and non-radiation groups. Multi-omics investigation, including bulk and single-cell RNA sequencing, was conducted to explore these genes in breast cancer. A risk model was developed using random forest, stepAIC, and LASSO Cox regression analyses to predict prognosis, immune cell infiltration, immunotherapy response, and targeted drug sensitivity based on radiation-related gene expression profiles. Functional differences were assessed via Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway and Gene Ontology (GO) enrichment analyses.
Results
We identified 133 radiation-related differentially expressed genes (DEGs), with 26 hub genes selected via LASSO and random forest models. Single-cell analysis revealed enrichment of radiation-related scores primarily in malignant cells. The radiation-related risk model, validated in the METABRIC dataset and an independent prognostic indicator in the TCGA-BRCA cohort, showed that low-risk patients had higher overall survival rates than high-risk patients. Risk scores correlated with immune infiltration, and low-risk patients exhibited greater immunotherapy response based on immune checkpoint gene expression. Drug sensitivity to gemcitabine, lapatinib, methotrexate, and doxorubicin varied across risk groups.
Conclusion
In summary, we developed a robust and efficient risk model to predict prognosis, tumor microenvironment (TME) features, and responses to immunotherapy and targeted drugs in BRCA. This model may offer new insights into individualized, precise treatment strategies. To facilitate clinical application, we have developed an R package and an Excel-based calculator that enable clinicians to calculate patient risk scores using the 8-gene signature. These tools, along with detailed usage instructions, are freely available in the supplementary materials and GitHub repository.
Supplementary Information
The online version contains supplementary material available at 10.1007/s12672-025-03549-1.
Keywords: Breast cancer, Radiation, Prognostic signature, Machine learning, Bioinformatics
Introduction
As of 2022, breast cancer (BC) is the leading cancer affecting women’s health worldwide [1]. With various treatments, such as radical surgery, chemotherapy, radiation therapy, and targeted therapy, many patients with BC have a better prognosis than those with other solid tumors; however, some BC patients continue to experience unfavorable outcomes because of tumor heterogeneity [2]. Therefore, new biomarkers are still required to identify these patients.
Radiotherapy is an essential treatment for breast cancer. It is a vital palliative treatment for patients with inoperable, locally advanced, or metastatic breast cancer, reducing recurrence and increasing survival time [3–5]. However, growing evidence indicates that breast cancer patients respond variably to radiation therapy, which significantly affects clinical outcomes and quality of life. Breast cancer can be divided into different types according to its molecular features, histopathological manifestations, and clinical outcomes [6, 7], but these types cannot fully capture the clinical heterogeneity of the disease. Thus, addressing variability in radiotherapy effectiveness and preventing overtreatment or undertreatment remains a critical challenge. In addition, increasing evidence shows that early diagnosis and improved treatment significantly reduce the mortality of breast cancer patients, and that post-RT treatment can effectively improve clinical outcomes [8]. Moreover, RT-induced symptoms in cancer survivors deserve attention because they may affect quality of life (QOL) [9]. Reliable and detailed clinical biomarkers for predicting breast cancer outcomes after radiotherapy are lacking, and the pathways and genes related to breast cancer radiotherapy remain unclear. Therefore, new biomarkers with high sensitivity and specificity are urgently needed to guide treatment strategies for breast cancer.
In this study, the TCGA, METABRIC, and GEO databases were used to analyze the relationship of radiation-related genes to prognosis, immune infiltration, and immunotherapy response in BRCA. Radiation-related DEGs were identified, and their expression levels, genetic alterations, and associated clinical factors in BRCA were analyzed. Radiation-related hub genes were then identified via machine learning methods, and their predictive value was assessed by ROC analysis. These hub genes were used to construct a prognostic signature, which showed satisfactory performance in predicting overall survival (OS). The results were validated in the METABRIC dataset. Additionally, for broader validation, we performed supplementary analyses using data from the International Cancer Genome Consortium (ICGC) breast cancer cohorts, which include multi-ethnic samples from Asian, European, and North American populations. Low-risk patients had significantly longer OS than high-risk patients. The mechanisms underlying the prognostic signature were investigated at the bulk RNA-seq, genome, and scRNA-seq levels, indicating close associations between the signature and the prognosis and immune status of BRCA patients. Immunotherapy response and sensitivity to first-line drugs in the different risk subgroups were further investigated. This research aims to help predict patient outcomes and to inform treatment approaches.
Materials and methods
Data collection
We acquired clinical and RNA-Seq (HTSeq-FPKM) data for BRCA patients from UCSC Xena (http://xena.ucsc.edu/) [10], obtaining data for 1050 breast cancer (BC) patients in total. The METABRIC [11] dataset (n = 1980 samples), including the corresponding patients’ survival and gene expression profiles, was obtained from the cBioPortal website (https://www.cbioportal.org/) for external validation. To further validate our findings across diverse populations, we obtained additional breast cancer cohorts from the International Cancer Genome Consortium (ICGC) database, including the BRCA-EU (n = 560, European), BRCA-KR (n = 50, Korean), and BRCA-US (n = 1098, United States) datasets. These multi-ethnic cohorts provided validation across different genetic backgrounds and healthcare systems. Ensembl gene IDs were converted to gene symbols to ensure uniformity. Genes expressed in fewer than 50% of the samples were excluded. The scRNA-seq dataset GSE176078 [12] for human breast cancer was obtained from the Gene Expression Omnibus (GEO) database (http://www.ncbi.nlm.nih.gov/geo/).
Differentially expressed gene (DEG) analysis and functional analysis
In the TCGA dataset, gene expression was compared between radiation and non-radiation cases using the R package “limma” [13] to identify radiation-related genes. FDR-adjusted P < 0.05 and |log2 fold change (FC)| > 0.2 were used as the cut-off values. GO and KEGG enrichment analyses of these radiation-related DEGs were then performed using the Metascape [14] website, with an adjusted P value below 0.05 considered statistically significant. Furthermore, the R package “maftools” [15] was used to create waterfall plots of somatic mutations in radiation-related genes in BRCA. The STRING database was then used to build a protein–protein interaction (PPI) network for the radiation-related DEGs.
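The cut-off rule described above can be illustrated with a minimal sketch. It assumes the differential expression results are already available as (gene, log2FC, FDR) rows; the gene names and values are hypothetical, and the function `select_degs` is an illustrative helper rather than part of limma or of the authors' pipeline.

```python
# Hypothetical limma-style results: (gene, log2 fold change, FDR-adjusted P value).
# Gene names and numbers are illustrative only, not taken from the study.
results = [
    ("GENE_A", 0.85, 0.001),
    ("GENE_B", -0.31, 0.020),
    ("GENE_C", 0.10, 0.001),   # fails the fold-change cut-off
    ("GENE_D", -0.60, 0.200),  # fails the FDR cut-off
]

FDR_CUTOFF = 0.05     # FDR-adjusted P < 0.05
LOG2FC_CUTOFF = 0.2   # |log2FC| > 0.2


def select_degs(rows, fdr_cutoff=FDR_CUTOFF, log2fc_cutoff=LOG2FC_CUTOFF):
    """Keep genes passing both cut-offs and label their direction of change."""
    degs = {}
    for gene, log2fc, fdr in rows:
        if fdr < fdr_cutoff and abs(log2fc) > log2fc_cutoff:
            degs[gene] = "up" if log2fc > 0 else "down"
    return degs


degs = select_degs(results)
print(degs)
```

Both conditions must hold at once, so a gene with a very small FDR but a negligible fold change (GENE_C here) is not retained.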
scRNA-seq data analysis
Using the “Seurat” [16] R package, the 10× scRNA-seq data were converted into a Seurat object. Clusters with fewer than three cells, cells with fewer than fifty detected genes, and cells in which more than 5% of expression came from mitochondrial genes were removed. The top 1500 most variable genes were used for principal component analysis (PCA). The top 15 principal components (PCs) were used for cell clustering with the “FindNeighbors” and “FindClusters” functions. The “FindAllMarkers” function was used to find marker genes of the cell clusters based on the thresholds |log2FC| > 1 and FDR < 0.01. Clusters were annotated with the “CellMarker 2.0” [17] database to identify cell types. The activity of a given gene set in each cell was measured using the “ssGSEA” tool included in the Seurat package.
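The per-cell quality-control thresholds above can be sketched in a few lines. This is not Seurat code: it assumes each cell is a plain mapping from gene symbol to count, and that mitochondrial genes carry the human `MT-` prefix; the function name `passes_qc` and the toy cells are hypothetical.

```python
def passes_qc(counts, min_genes=50, max_mito_pct=5.0):
    """Return True if a cell has at least `min_genes` detected genes and at
    most `max_mito_pct` percent of its counts from mitochondrial genes.

    counts: dict mapping gene symbol -> count for one cell.
    Assumption: mitochondrial genes are identified by the 'MT-' prefix.
    """
    total = sum(counts.values())
    if total == 0:
        return False
    detected = sum(1 for c in counts.values() if c > 0)
    mito = sum(c for gene, c in counts.items() if gene.startswith("MT-"))
    mito_pct = 100.0 * mito / total
    return detected >= min_genes and mito_pct <= max_mito_pct


# Toy cells (illustrative values only).
good_cell = {f"GENE{i}": 1 for i in range(60)}
good_cell["MT-CO1"] = 2          # 2 of 62 counts are mitochondrial (about 3.2%)
high_mito_cell = {f"GENE{i}": 1 for i in range(60)}
high_mito_cell["MT-CO1"] = 10    # 10 of 70 counts are mitochondrial (about 14.3%)
sparse_cell = {f"GENE{i}": 3 for i in range(10)}  # only 10 genes detected

cells = {"cell_1": good_cell, "cell_2": high_mito_cell, "cell_3": sparse_cell}
kept = [name for name, counts in cells.items() if passes_qc(counts)]
print(kept)
```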
Radiation related characteristic genes in BRCA
To identify signature genes, we used three machine learning methods: logistic regression, random forest (RF), and the least absolute shrinkage and selection operator (LASSO). The random forest approach was implemented using the R package “randomForest”, with 500 trees (ntree = 500) and mean decrease in accuracy as the variable importance metric. LASSO logistic regression was performed using the R package “glmnet” [18]; the optimal lambda was taken as the value that minimised the partial likelihood deviance (lambda.min) under 5-fold cross-validation. Genes shared by the three classification models were chosen for further investigation. Predictive power was assessed using receiver operating characteristic (ROC) curve analysis and quantified by the area under the curve (AUC).
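Two steps of this procedure lend themselves to a small illustration: intersecting the genes selected by each method, and computing the AUC. The sketch below uses hypothetical gene lists and scores, and computes the AUC through its rank interpretation (the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one, with ties counted as one half). It is not the glmnet or randomForest workflow itself.

```python
def shared_genes(*gene_lists):
    """Return the genes selected by every method, sorted for reproducibility."""
    common = set(gene_lists[0])
    for genes in gene_lists[1:]:
        common &= set(genes)
    return sorted(common)


def roc_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: 1 for the positive class, 0 for the negative class.
    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical selections from two methods (placeholder gene names).
lasso_hits = ["G1", "G2", "G3", "G5"]
rf_hits = ["G2", "G3", "G4", "G5"]
hub = shared_genes(lasso_hits, rf_hits)

# Hypothetical predicted scores and true group labels.
auc_value = roc_auc([0.9, 0.8, 0.35, 0.6, 0.4, 0.2], [1, 1, 1, 0, 0, 0])
print(hub, round(auc_value, 3))
```

The pairwise loop is quadratic in sample size, which is acceptable for illustration; a rank-sum formulation would scale better on large cohorts.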
Development and verification of the prognostic signatures associated with radiation
The stepwise Akaike information criterion (stepAIC) approach was applied to the shared genes to obtain OS-related genes. Each patient’s risk score was computed from the normalised expression levels of the candidate genes (Expi) and the corresponding regression coefficients (Coei) using the formula below:
$$\text{Risk score} = \sum_{i=1}^{n} \mathrm{Coe}_i \times \mathrm{Exp}_i$$
The median risk score was used as the cutoff to categorise BRCA patients into high-risk and low-risk groups. To assess model stability, we additionally performed internal validation using bootstrap resampling (1000 iterations) to evaluate the robustness of the risk stratification. Furthermore, we evaluated alternative cutoff strategies, including tertile and quartile divisions, to confirm that the median cutoff provided optimal discrimination between risk groups. The predictive efficacy of the new gene signature was then evaluated by Kaplan-Meier and ROC curve analyses using the “survminer”, “survival”, and “survivalROC” R packages.
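For readers unfamiliar with the Kaplan-Meier estimator mentioned above, the following is a minimal sketch of the product-limit calculation. It is written from the textbook definition and is not the implementation in the “survival” or “survminer” packages; the follow-up times and event indicators are hypothetical.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of the survival function.

    times:  follow-up time for each patient
    events: 1 if death was observed at that time, 0 if censored
    Returns a list of (time, survival probability) at each distinct event time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / n_at_risk
            curve.append((t, survival))
        # Both deaths and censored patients leave the risk set after time t.
        n_at_risk -= leaving
    return curve


# Hypothetical cohort of five patients.
curve = kaplan_meier(times=[5, 8, 12, 12, 20], events=[1, 0, 1, 1, 0])
print(curve)
```

Censored patients reduce the number at risk without lowering the survival estimate, which is why the curve only steps down at observed event times.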
Immunogenomic landscape assessment
CIBERSORT [19] was run in R to estimate the relative proportions of 22 immune cell types. To evaluate the potential clinical effectiveness of immunotherapy in the different risk groups, we also compared the expression of immune checkpoint and HLA-related genes between groups. Hypoxia scores were obtained from cBioPortal [20].
Gene mutation analysis
A waterfall plot illustrating the distribution of genes with a high frequency of somatic mutations in BC patients was created using the “maftools” R package. Copy number variation (CNV) data were obtained from TCGA, and the “gistic2” module on the GenePattern [21] website was used to analyse patients in the different risk groups. The output was visualised using the ChromPlot function included in the “maftools” R package. In addition, each sample’s tumor mutation burden (TMB) was computed to examine the relationship between TMB and risk score.
Statistical analyses
R (v4.3.1) was used for all statistical analyses. The Wilcoxon test was used for comparisons between two groups, and the Kruskal–Wallis test for comparisons among multiple groups (* p < 0.05; ** p < 0.01; *** p < 0.001; **** p < 0.0001). For survival analysis, the Kaplan-Meier method and the log-rank test were used. Statistical significance was defined as a P value < 0.05.
Results
Differentially expressed genes between the radiation and non-radiation groups
First, we identified a total of 133 radiation-related DEGs between the radiation and non-radiation groups in the TCGA database (Fig. 1A, Supplementary Table S1). The STRING database and Cytoscape were then used to create a protein-protein interaction (PPI) network (Fig. 1B). An analysis of the molecular alteration landscape of radiation-related DEGs in BRCA showed that missense mutation was the most common variant type (Fig. 1C). The most frequently mutated genes were COL14A1 and COL6A3. According to the analysis of CNV frequency, the top 20 mutated radiation-related DEGs showed significant CNV alterations (Fig. 1D). To investigate the regulatory mechanisms of the DEGs, GO and KEGG enrichment analyses were conducted using the Metascape website. The most enriched terms were humoral immune response, collagen trimer, cellular response to cytokine stimulus, collagen-containing extracellular matrix, and regulation of hormone levels (Fig. 1E-F).
Fig. 1.
Analysis of differential expression. (A) Volcano plot of the differential analysis between the radiation and non-radiation groups. Red dots indicate high expression, whereas blue dots indicate low expression. (B) PPI network of radiation-related DEGs. (C) An oncoplot of the top 20 radiation-related genes in the TCGA-BRCA cohort. (D) Frequencies of CNV gain, loss, and non-CNV among the top 20 radiation-related genes. (E-F) KEGG pathway and GO term enrichment analyses of the DEGs. Each color represents a different term or pathway. KEGG, Kyoto Encyclopedia of Genes and Genomes; GO, Gene Ontology
Screening for candidate diagnostic biomarkers for BRCA using machine learning algorithms
We screened for candidate radiation-related genes and created a BRCA signature using LASSO regression and random forest algorithms based on the 133 radiation-related DEGs. The LASSO regression technique identified 35 candidate genes that had a significant impact on the diagnosis of BRCA patients (Fig. 2A, Table S2). To further refine the diagnostic biomarkers, the random forest (RF) algorithm identified 34 candidate genes based on variable importance (Fig. 2B, Table S2). Intersecting these gene sets in a Venn diagram yielded 26 robust core biomarkers (Fig. 2C, Table S2). A BRCA diagnostic nomogram was created using the “rms” R package. The nomogram achieved an AUC of 0.712 [0.679–0.745] in the TCGA-BRCA cohort (Fig. 2D). Furthermore, decision curve analysis (DCA) showed that the nomogram provided a greater clinical net benefit than the alternative strategies (Fig. 2E). Radiation samples had a significantly higher risk score than non-radiation samples (Fig. 2F). These findings support the predictive capability of the diagnostic nomogram.
Fig. 2.
Machine-learning identification of radiation-related diagnostic markers in BRCA. (A) Screening of diagnostic markers with the LASSO logistic regression technique. (B) Screening of biomarkers with the RF algorithm. (C) A Venn diagram showing the intersection of the diagnostic markers supplied by the three approaches. (D) The ability of the diagnostic markers to discriminate radiation from non-radiation samples was examined using the ROC curve and assessed with the AUC value. (E) Decision curve analysis of the risk prediction nomogram for radiation in TCGA-BRCA. (F) The risk score distribution of radiation and non-radiation samples in TCGA-BRCA, with radiation samples having a significantly higher risk score than non-radiation samples
Radiation related genes in single-cell transcriptome
We analysed scRNA-seq data from patients with breast cancer, comprising 85,692 cells. Figure 3A-B shows the 11 major clusters into which we annotated the cells using marker genes for various cell types: B cells, CD4Tconv cells, CD8Tex cells, dendritic cells, endothelial cells, fibroblast cells, malignant cells, monocyte/macrophage cells, plasma cells, SMC cells, and Tprolif cells. The interaction weight/strength and number of interactions for each cell type are shown in Fig. 3C. To measure the activity of radiation-related genes, the “ssGSEA” function in the Seurat package was used to score their expression in every cell type (Fig. 3D). Among the 11 cell types, malignant cells and fibroblasts had significantly greater radiation-related activity (Fig. 3E).
Fig. 3.
Radiation related diagnostic biomarkers in the single-cell transcriptome. (A) A UMAP plot representing the 35 clusters across 85,692 cells from breast cancer samples. (B) Cell types identified by marker genes. (C) Analysis of the number of interactions and interaction strength among different cell types in the BRCA sample. (D) The radiation enrichment score (activity) in each cell. (E) The distribution of radiation score in different cell types
Construction and validation of a radiation-related prognostic model
To simplify the model and minimise the number of genes, stepAIC analysis was performed, yielding a final set of eight radiation-related genes with coefficients for the prognostic model. The 8-gene prognostic model is defined as: risk score = (0.1083) * PDCD4 + (0.0540) * CEACAM6 + (-0.0943) * CA2 + (-0.1217) * SLC16A6 + (0.1163) * PAPSS2 + (-0.0757) * APOD + (0.0873) * DCD + (-0.1852) * NPNT. Based on the median risk score, BRCA patients were stratified into a high-risk (HR) group (n = 525) and a low-risk (LR) group (n = 525). In both the TCGA-BRCA (Fig. 4A, p = 0.00057) and METABRIC cohorts (Fig. 4B, p < 0.0001), survival analysis showed that patients in the low-risk group had better overall survival (OS) than those in the high-risk group. To further validate our findings, we tested the prognostic model in the ICGC cohorts. The signature maintained its prognostic value across diverse populations: BRCA-EU (HR = 2.31, 95% CI: 1.68–3.17, p < 0.001), BRCA-KR (HR = 2.05, 95% CI: 1.12–3.76, p = 0.020), and BRCA-US (HR = 2.18, 95% CI: 1.74–2.73, p < 0.001), demonstrating robust performance across different ethnic groups. In addition, Fig. 4C-D displays the survival status and risk score distributions for the TCGA-BRCA and METABRIC cohorts, respectively. These plots are primarily intended to show the overall distribution and the relationship between the continuous risk score, the median cutoff, and individual patient outcomes. To examine how clearly the two groups separate, we performed density plot analysis, which confirmed a bimodal distribution of risk scores and supported the use of the median cutoff for stratification (Supplementary Figure S1). These findings confirm the radiation-related prognostic model’s strong performance in predicting breast cancer patients’ prognoses across several datasets.
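The 8-gene formula and the median split can be sketched as follows. The coefficients are those reported in the text; everything else is an assumption made for illustration: expression values are taken to be already normalised upstream, the four patients are hypothetical, and scores strictly above the median are assigned to the high-risk group (the handling of ties is not specified in the source). This is not the authors' R package or Excel calculator.

```python
from statistics import median

# Coefficients as reported for the 8-gene prognostic model.
COEFFICIENTS = {
    "PDCD4": 0.1083,
    "CEACAM6": 0.0540,
    "CA2": -0.0943,
    "SLC16A6": -0.1217,
    "PAPSS2": 0.1163,
    "APOD": -0.0757,
    "DCD": 0.0873,
    "NPNT": -0.1852,
}


def risk_score(expression):
    """Sum of coefficient * normalised expression over the eight genes."""
    missing = [g for g in COEFFICIENTS if g not in expression]
    if missing:
        raise KeyError(f"missing expression values for: {missing}")
    return sum(coef * expression[g] for g, coef in COEFFICIENTS.items())


def stratify(scores):
    """Median split. Assumption: scores strictly above the median are 'high'."""
    cutoff = median(scores.values())
    return {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}


def profile(**overrides):
    """Hypothetical expression profile: zero everywhere except the overrides."""
    values = {gene: 0.0 for gene in COEFFICIENTS}
    values.update(overrides)
    return values


# Hypothetical patients (illustrative expression values only).
patients = {
    "P1": profile(PDCD4=2.0),
    "P2": profile(NPNT=2.0),
    "P3": profile(),
    "P4": profile(PAPSS2=1.0),
}
scores = {pid: risk_score(expr) for pid, expr in patients.items()}
groups = stratify(scores)
print(scores, groups)
```

Genes with positive coefficients (for example PDCD4 and PAPSS2) raise the score as their expression increases, while genes with negative coefficients (for example NPNT) lower it.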
Univariate and multivariate Cox analyses showed that, compared with other typical clinical features, the risk score was an independent prognostic indicator for BRCA patients (Fig. 4E-F). In the TCGA-BRCA cohort, both analyses revealed that the risk score was a prognostic factor independent of age, stage, TNM.T, and TNM.N. Bootstrap validation (1000 iterations) confirmed the stability of the hazard ratios: HR = 2.85 (95% CI: 2.31–3.52) in univariate analysis and HR = 2.47 (95% CI: 1.98–3.08) in multivariate analysis, with all iterations showing significant p-values (< 0.05).
Fig. 4.
Construction of a radiation-related prognostic signature for BRCA patients. (A-B) OS in the low- and high-risk patients in (A) TCGA-BRCA and (B) METABRIC. (C-D) Distribution of risk score according to survival status and time in (C) TCGA-BRCA and (D) METABRIC. (These plots are primarily intended to show the overall distribution and the relationship between the continuous risk score, the median cutoff, and individual patient outcomes.) (E) Univariate analysis for the clinicopathologic characteristics and risk score in TCGA-BRCA. (F) Multivariate analysis for clinicopathologic characteristics and risk score in TCGA-BRCA
The correlation of immune microenvironment with the radiation-related prognostic model
We next examined differences in other key characteristics between the two risk groups. Using the CIBERSORT method, we estimated immune cell infiltration in each sample to evaluate the immune infiltration status of breast cancer samples (Fig. 5A). The high-risk group had more resting mast cells and M2 macrophages, whereas the low-risk group had more naïve B cells, CD8 T cells, gamma delta T cells, and activated NK cells. Additionally, the eight genes in the prognostic model correlated strongly with tumour-infiltrating immune cells (Fig. 5B). Next, using single-cell RNA transcriptome data (GSE176078), we investigated the distribution of the model genes across cell types in patients with breast cancer. According to the dot plot, monocyte/macrophage cells had high levels of PAPSS2, fibroblast cells had high levels of APOD, and malignant cells had high levels of CEACAM6 and CA2 (Fig. 5C). Immune checkpoint inhibitors are antitumor immunotherapies that are increasingly common in clinical settings. Differences in immune checkpoint gene expression between the high- and low-risk groups may lead to different susceptibilities to immune checkpoint inhibitors. As seen in Fig. 5D, patients with lower risk scores had considerably higher expression levels of PDCD1, TMIGD2, ADORA2A, CD200, TNFRSF14, and BTLA. The predicted IC50 values for doxorubicin, methotrexate, gemcitabine, and lapatinib were much lower in the low-risk group, as seen in Fig. 5E. This suggests that the low-risk group may be more susceptible to these medications. Additionally, high-risk patients exhibited a higher hypoxia score, according to the analysis of hypoxia-responsive gene expression (Fig. 5F). This finding suggests that immunotherapy may not be beneficial for high-risk patients with breast cancer.
Fig. 5.
Immune landscape analysis of low- and high-risk patients. (A) Boxplot of 22 infiltrating immune cell types calculated by CIBERSORT. (B) The association between TME-infiltrating cells and the genes in the radiation-related model. (C) Bubble plot of the average and percent expression of model genes in different cell subtypes. (D) Box plot of expression levels of immune checkpoint-associated genes. (E) Box plots of estimated IC50 for several chemotherapeutic agents in the high- and low-risk groups. (F) Violin plot of significantly increased hypoxia score in high-risk patients. * p < 0.05; ** p < 0.01; *** p < 0.001
Changes in the genome of people with low or high-risk scores
High-risk patients had a higher non-synonymous tumour mutation burden (TMB) in the protein-coding regions of their genomes (Fig. 6A). The top 20 most frequently mutated genes in both risk groups are shown in Fig. 6B. Notably, TP53 (high/low risk, 44%/24%) and PIK3CA (high/low risk, 27%/38%) showed opposite frequency patterns (Fig. 6C). CNV analysis revealed that copy number variation (CNV) patterns differed significantly between the low- and high-risk groups (Fig. 6D), while the high-risk group had a considerably greater fraction of genome altered (FGA) (Fig. 6E). Mutations enriched in the DNA-binding region of the corresponding protein might contribute substantially to reduced tumour suppression and poorer patient survival.
Fig. 6.
Mutation landscape between radiation-related prognostic subgroups. (A) Comparison of tumor mutation burden (TMB). (B) Waterfall plot of somatic mutation characteristics in the HR and LR score groups. (C) Comparison of different mutation sites of TP53 and PIK3CA. (D) Patterns of copy number variation (CNV) in different risk cohorts. (E) Difference analysis of fraction genome altered (FGA) in different risk score groups. *p < 0.05; ***p < 0.001; ****p < 0.0001
Biologic functions underlying the radiation prognostic model
To investigate the biological mechanisms that distinguish the high-risk and low-risk groups, we identified 186 downregulated genes and 20 upregulated genes (Fig. 7A). A protein-protein interaction (PPI) network was built using the Metascape database, and the MCODE plug-in, which detects key modules in a PPI network, identified three modules (Fig. 7B-C). According to gene ontology analysis, the genes differentially expressed between risk groups were mostly enriched in immune response, extracellular matrix, regulation of hormone levels, regulation of secretion, positive regulation of programmed cell death, and inflammatory response (Fig. 7D). GSEA revealed that immune and drug metabolism-related pathways were substantially enriched in low-risk patients (Fig. 7E), indicating a relationship between tumour immunotherapy and the low-risk group.
Fig. 7.
Biologic functions underlying the radiation-related prognostic model. (A) Volcano plot showed DEGs (FDR < 0.05 and |log2FC|>0.5) between high risk and low-risk group. (B) PPI network of differentially expressed genes between high risk and low-risk group based on the Metascape website. (C) The MCODE subnetworks identified by the Molecular Complex Detection (MCODE) algorithm. (D) The GO terms enrichment analysis of differentially expressed genes. (E) GSEA analysis of KEGG pathways in low-risk group. GO, Gene Ontology; KEGG, Kyoto Encyclopedia of Genes and Genomes
Discussion
The response of breast cancer patients to radiotherapy is heterogeneous [22, 23], which significantly affects clinical efficacy and quality of life. Consequently, there is a pressing need to identify radiation-associated biomarkers for early diagnosis and prognosis in breast cancer. In addition, radiotherapy-related biomarkers could enable BC patients to receive more personalized targeted immunotherapy. Comparable advances in personalized radiotherapy have not yet been applied clinically. Hence, researchers are increasingly looking for new diagnostic biomarkers and studying the components of immune cell infiltration that may improve outcomes for patients with breast cancer. To find useful diagnostic biomarkers for breast cancer, we conducted an integrated analysis of TCGA transcriptome and clinical data.
Our study presents a comprehensive analysis of radiation-related genes in BRCA, revealing a distinct molecular signature that may have implications for both diagnosis and prognosis. Through differential gene expression analysis, we identified 133 radiation-related DEGs that were significantly dysregulated between the radiation and non-radiation groups of BRCA patients. These genes influence numerous biological pathways, including the humoral immune response and hormone regulation, both of which are critical for immune regulation and drug metabolism. Studies have linked autoantibodies to enhanced humoral immunity in the tumour microenvironment, which may explain why these patients have good outcomes. Patients with breast cancer also exhibit subtype-specific autoantibody repertoires; luminal breast cancer is characterized by autoantibodies to spliceosomes and glycolysis-related proteins, whereas mesenchymal/basal-like breast cancer is frequently characterized by autoantibodies to BRCA1, TP53, and cytokeratin-5/6/14. These results imply that patients with breast cancer produce autoantibodies against the proteins found in their cancer cells. Therefore, an enhanced humoral immune response may be one of the main characteristics of breast cancer [23–25]. These findings suggest that radiation-related DEGs could interact with cellular pathways to modulate disease progression, possibly through effects on immune cell populations and their signaling.
Our study presents a robust 8-gene radiation-related prognostic signature (PAPSS2, APOD, DCD, PDCD4, CEACAM6, CA2, SLC16A6, and NPNT) that effectively stratifies breast cancer patients into high- and low-risk groups, with significant implications for prognosis and treatment response. The signature, derived through integrated analyses of TCGA, METABRIC, and ICGC datasets, highlights key biological pathways, including immune response and extracellular matrix regulation, which are critical to breast cancer progression and radiotherapy response. For instance, PAPSS2 and CEACAM6, which are associated with tumor proliferation and invasion [26, 27], were enriched in malignant cells, aligning with findings that ubiquitination modifications, such as those regulating tumor progression, play a significant role in breast cancer pathogenesis [28]. Similarly, the prognostic relevance of our signature is supported by studies identifying other biomarkers, such as CENPN, which influences immune infiltration and tumor progression in breast cancer [29], and DBNDD1, which is associated with immune-related prognostic markers in invasive breast cancer [30]. The low-risk group in our study exhibited higher infiltration of antitumor immune cells (e.g., CD8 T cells, gamma delta T cells) and greater sensitivity to immune checkpoint inhibitors and chemotherapeutic agents like doxorubicin and gemcitabine, suggesting potential benefits from immunotherapy. This is consistent with the diagnostic potential of miRNA panels, which have been shown to improve early detection and risk stratification in breast cancer [31]. Conversely, the high-risk group’s association with immunosuppressive M2 macrophages and higher hypoxia scores may explain their poorer prognosis and reduced immunotherapy response. 
These findings underscore the signature’s potential to guide personalized radiotherapy and immunotherapy strategies, supported by user-friendly tools like our R package and Excel calculator for clinical risk stratification.
Beyond statistical validation, the precise biological mechanisms underpinning the prognostic value of our 8-gene signature require further elucidation. While we have discussed the known functions of these genes based on existing literature, direct experimental validation is lacking. Future in vitro and in vivo studies are essential to functionally characterize the roles of these specific genes (e.g., CEACAM6, CA2, SLC16A6) in radiation response, tumor progression, and modulation of the immune microenvironment (such as the observed enrichment of M2 macrophages in high-risk patients) within breast cancer. Investigating the interplay between our signature, key mutation patterns (like TP53 and PIK3CA), and specific breast cancer subtypes will also deepen our understanding. Furthermore, translating these findings into clinical practice, potentially through user-friendly tools like a web-based calculator as suggested, represents an important future goal, though this would necessitate further rigorous validation and potentially regulatory considerations. Integrating our findings with the rapidly evolving landscape of breast cancer research, including recent relevant studies, will also be crucial for contextualizing the signature’s potential impact. To facilitate immediate clinical translation, we have developed an R package (“BRCARadiationSignature”) and a standalone Excel-based calculator that allow clinicians to input patient gene expression values and receive risk stratification results. These tools are provided with comprehensive documentation and validation examples to ensure proper implementation in clinical settings.
Thorough molecular characterization of tumor heterogeneity is a useful tool for risk classification of patients. According to our findings, low-risk patients have higher numbers of CD8 T cells and gamma delta T cells, are predicted to respond better to immunotherapy based on immune checkpoint-related gene expression, and are more sensitive to doxorubicin, methotrexate, gemcitabine, and lapatinib. These findings imply that immunotherapy may be advantageous for BC patients with low radiation scores. M2 macrophage infiltration (an immunosuppressive subtype) and hypoxia scores were also markedly higher in high-risk patients. The poor prognosis of high-risk patients might be attributed to these tumor-promoting factors.
An advantage of our radiation-related prognostic model is that it captures heterogeneity along multiple dimensions, including prognosis, clinicopathological features, cancer hallmarks, genomic alterations, TIME patterns, and immunotherapeutic responses, the last of which is of particular clinical interest. The development of user-friendly clinical tools (R package and Excel calculator) enhances the practical utility of our findings, allowing for straightforward implementation in clinical decision-making. Some limitations remain. First, further research with prospective, multi-center cohorts is needed to confirm our findings, as retrospective recruitment may introduce bias. Second, bioinformatic analysis alone cannot fully explain the underlying molecular pathways; experimental validation is essential.
While our study presents a robust prognostic signature validated in the large, independent METABRIC cohort, we acknowledge several limitations inherent in our approach. The study relies on retrospective data from public databases (TCGA, METABRIC, GEO), which may contain unavoidable biases or data incompleteness, such as missing clinical variables for certain patients. Although we have attempted to address this limitation by including additional validation in ICGC cohorts from multiple countries and ethnicities, the retrospective nature of all datasets remains a constraint. Missing data were handled using multiple imputation methods where appropriate, and sensitivity analyses confirmed that our findings were robust to different missing data assumptions. To control for multiple testing in the initial identification of differentially expressed genes, we used FDR-adjusted p-values throughout our differential expression analyses. Furthermore, while the extensive external validation addresses model applicability beyond the training data, internal validation within the discovery cohort is also needed to assess model stability; our bootstrap validation results support the stability of the model. Crucially, additional validation in prospective, multi-center cohorts encompassing diverse ethnicities and geographical locations is warranted to confirm the signature’s global clinical utility and performance across varied patient populations, addressing potential limitations in the demographic scope of the current validation datasets.
Our supplementary validation in ICGC cohorts (European, Korean, and US populations) represents an important step toward this goal, though prospective validation remains a priority for future work.
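As an illustration of the FDR adjustment mentioned among the limitations, the following minimal Python sketch computes adjusted p-values with the Benjamini-Hochberg step-up procedure. The text does not name the specific FDR method, so Benjamini-Hochberg is an assumption here, and the function name `benjamini_hochberg` is hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Assumption: BH is the FDR procedure; the source only says 'FDR-adjusted'.
    """
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity so that an
    # adjusted value never exceeds that of a larger raw p-value.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

A gene would then be called differentially expressed when its adjusted p-value falls below the chosen FDR threshold, typically combined with a fold-change criterion.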
Supplementary Information
Below is the link to the electronic supplementary material.
Author contributions
Zhang Hao and HongHua Lin contributed equally to the study’s conception and design. Enyi Qiu and Wenqi Jin assisted with data analysis and manuscript drafting. Shi Dong supervised the research and provided critical revisions. All authors have given consent to publish.
Funding
This study did not receive any funding in any form.
Data availability
The datasets generated and/or analyzed during the current study are available in the Mendeley Data repository, [https://doi.org/10.17632/mpkbbc367j.1].
Declarations
Ethics approval and consent to participate
This study was conducted in accordance with the ethical standards set by the Institutional Review Board (IRB) of Wenzhou Medical University and was approved by the ethics committee of Wenzhou Central Hospital. All participants provided informed consent prior to participation in the research. All authors consent to participate.
Consent for publication
All authors consent to publication.
Competing interests
The authors declare no competing interests.
Footnotes
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
- 1. Bray F, Laversanne M, Sung H, Ferlay J, Siegel RL, Soerjomataram I, Jemal A. Global cancer statistics 2022: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin. 2024;74(3):229–63.
- 2. Hua X, Long ZQ, Zhang YL, Wen W, Guo L, Xia W, Zhang WW, Lin HX. Prognostic value of preoperative systemic Immune-Inflammation index in breast cancer: A propensity Score-Matching study. Front Oncol. 2020;10:580.
- 3. Vaidya JS, Bulsara M, Baum M, Alvarado M, Bernstein M, Massarut S, Saunders C, Sperk E, Wenz F, Tobias JS. Intraoperative radiotherapy for breast cancer: powerful evidence to change practice. Nat Rev Clin Oncol. 2021;18(3):187–8.
- 4. He MY, Rancoule C, Rehailia-Blanchard A, Espenel S, Trone JC, Bernichon E, Guillaume E, Vallard A, Magné N. Radiotherapy in triple-negative breast cancer: current situation and upcoming strategies. Crit Rev Oncol Hematol. 2018;131:96–101.
- 5. Cheng YJ, Nie XY, Ji CC, Lin XX, Liu LJ, Chen XM, Yao H, Wu SH. Long-Term cardiovascular risk after radiotherapy in women with breast cancer. J Am Heart Assoc. 2017;6(5).
- 6. Roy M, Fowler AM, Ulaner GA, Mahajan A. Molecular classification of breast cancer. PET Clin. 2023;18(4):441–58.
- 7. Russnes HG, Lingjærde OC, Børresen-Dale AL, Caldas C. Breast cancer molecular stratification: from intrinsic subtypes to integrative clusters. Am J Pathol. 2017;187(10):2152–62.
- 8. Lee K, Kruper L, Dieli-Conwright CM, Mortimer JE. The impact of obesity on breast cancer diagnosis and treatment. Curr Oncol Rep. 2019;21(5):41.
- 9. Poleszczuk J, Luddy K, Chen L, Lee JK, Harrison LB, Czerniecki BJ, Soliman H, Enderling H. Neoadjuvant radiotherapy of early-stage breast cancer and long-term disease-free survival. Breast Cancer Res. 2017;19(1):75.
- 10. Wang S, Xiong Y, Zhao L, Gu K, Li Y, Zhao F, Li J, Wang M, Wang H, Tao Z, et al. UCSCXenaShiny: an R/CRAN package for interactive analysis of UCSC Xena data. Bioinformatics. 2022;38(2):527–9.
- 11. Banu A, Ahmed R, Musleh S, Shah Z, Househ M, Alam T. Predicting overall survival in METABRIC cohort using machine learning. Stud Health Technol Inf. 2023;305:632–5.
- 12. Wu SZ, Al-Eryani G, Roden DL, Junankar S, Harvey K, Andersson A, Thennavan A, Wang C, Torpy JR, Bartonicek N, et al. A single-cell and spatially resolved atlas of human breast cancers. Nat Genet. 2021;53(9):1334–47.
- 13. Ritchie ME, Phipson B, Wu D, Hu Y, Law CW, Shi W, Smyth GK. Limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 2015;43(7):e47.
- 14. Zhou Y, Zhou B, Pache L, Chang M, Khodabakhshi AH, Tanaseichuk O, Benner C, Chanda SK. Metascape provides a biologist-oriented resource for the analysis of systems-level datasets. Nat Commun. 2019;10(1):1523.
- 15. Mayakonda A, Lin DC, Assenov Y, Plass C, Koeffler HP. Maftools: efficient and comprehensive analysis of somatic variants in cancer. Genome Res. 2018;28(11):1747–56.
- 16. Slovin S, Carissimo A, Panariello F, Grimaldi A, Bouché V, Gambardella G, Cacchiarelli D. Single-Cell RNA sequencing analysis: A Step-by-Step overview. Methods Mol Biol. 2021;2284:343–65.
- 17. Hu C, Li T, Xu Y, Zhang X, Li F, Bai J, Chen J, Jiang W, Yang K, Ou Q, et al. CellMarker 2.0: an updated database of manually curated cell markers in human/mouse and web tools based on scRNA-seq data. Nucleic Acids Res. 2023;51(D1):D870–6.
- 18. Engebretsen S, Bohlin J. Statistical predictions with Glmnet. Clin Epigenetics. 2019;11(1):123.
- 19. Chen B, Khodadoust MS, Liu CL, Newman AM, Alizadeh AA. Profiling tumor infiltrating immune cells with CIBERSORT. Methods Mol Biol. 2018;1711:243–59.
- 20. Gao J, Aksoy BA, Dogrusoz U, Dresdner G, Gross B, Sumer SO, Sun Y, Jacobsen A, Sinha R, Larsson E, et al. Integrative analysis of complex cancer genomics and clinical profiles using the cBioPortal. Sci Signal. 2013;6(269):pl1.
- 21. Kuehn H, Liberzon A, Reich M, Mesirov JP. Using GenePattern for gene expression analysis. Curr Protoc Bioinformatics. 2008;Chap. 7:7.12.11–17.12.39.
- 22. Fragomeni SM, Sciallis A, Jeruss JS. Molecular subtypes and Local-Regional control of breast cancer. Surg Oncol Clin N Am. 2018;27(1):95–120.
- 23. Caputo R, Cianniello D, Giordano A, Piezzo M, Riemma M, Trovò M, Berretta M, De Laurentiis M. Gene expression assay in the management of early breast cancer. Curr Med Chem. 2020;27(17):2826–39.
- 24. Ladd JJ, Chao T, Johnson MM, Qiu J, Chin A, Israel R, Pitteri SJ, Mao J, Wu M, Amon LM, et al. Autoantibody signatures involving Glycolysis and splicesome proteins precede a diagnosis of breast cancer among postmenopausal women. Cancer Res. 2013;73(5):1502–13.
- 25. Katayama H, Boldt C, Ladd JJ, Johnson MM, Chao T, Capello M, Suo J, Mao J, Manson JE, Prentice R, et al. An autoimmune response signature associated with the development of Triple-Negative breast cancer reflects disease pathogenesis. Cancer Res. 2015;75(16):3246–54.
- 26. Zhang Y, Zou X, Qian W, Weng X, Zhang L, Zhang L, Wang S, Cao X, Ma L, Wei G, et al. Enhanced PAPSS2/VCAN sulfation axis is essential for Snail-mediated breast cancer cell migration and metastasis. Cell Death Differ. 2019;26(3):565–79.
- 27. Wu G, Wang D, Xiong F, Wang Q, Liu W, Chen J, Chen Y. The emerging roles of CEACAM6 in human cancer (Review). Int J Oncol. 2024;64(3).
- 28. Li Y, Wang Y, Jing Y, Zhu Y, Huang X, Wang J, Dilraba E, Guo C. Visualization analysis of breast cancer-related ubiquitination modifications over the past two decades. Discov Oncol. 2025;16(1):431.
- 29. Jing Y, Wang Y, Li Y, Huang X, Wang J, Yelihamu D, Guo C. Diagnostics and immunological function of CENPN in human tumors: from pan-cancer analysis to validation in breast cancer. Transl Cancer Res. 2025;14(2):881–906.
- 30. Huang X, Wang Y, Wang J, Jing Y, Dilraba E, Li Y, Guo C. Association of DBNDD1 with prognostic and immune biomarkers in invasive breast cancer. Discov Oncol. 2025;16(1):218.
- 31. Jing Y, Huang X, Wang Y, Wang J, Li Y, Yelihamu D, Guo C. Diagnostic value of 5 MiRNAs combined detection for breast cancer. Front Genet. 2024;15:1482927.