Identification of NK cell marker genes based on single-cell sequencing to establish a prognostic signature in breast cancer

Jiawei Zhang; Donghui Wang; Xiangmei He; Lan Hou; Juliang Zhang

doi:10.1007/s12672-025-03647-0

2025 Dec 19;17:133. doi: 10.1007/s12672-025-03647-0

PMCID: PMC12830521 PMID: 41417435

Abstract

Background

Tumor-infiltrating natural killer (NK) cells are pivotal in modulating tumor progression, either by promoting or inhibiting neoplastic development. Nevertheless, the implications of NK cells in breast carcinoma remain inadequately understood. This investigation aimed to delineate the impact of NK cells on both the prognosis and the immune infiltration landscape in breast cancer.

Methods

NK cell marker genes were identified using single-cell sequencing data from breast cancer available in the Gene Expression Omnibus (GEO) database. A prognostic model was constructed based on data from The Cancer Genome Atlas (TCGA) and subsequently validated with the GEO dataset. Disparities in immune cell infiltration between low-risk and high-risk cohorts, as stratified by the prognostic model, were examined. Additionally, genes differentially expressed between these cohorts were subjected to enrichment analysis.

Results

A total of 29 NK cell marker genes were identified through single-cell sequencing, and a prognostic model was subsequently developed using machine learning techniques based on the TCGA data. This model demonstrated robust predictive performance when applied to both TCGA and GEO datasets. Notably, a significant difference in immune infiltration was observed between the low-risk and high-risk groups. The findings were further validated through enrichment analysis.

Conclusions

In summary, we constructed a prognostic signature characterized by strong predictive performance, which has elucidated the critical role of NK cells in the pathogenesis of breast cancer. Furthermore, this model offers a predictive index and identifies a novel therapeutic target for the advancement of immunotherapeutic strategies in the clinical management of breast cancer patients.

Supplementary Information

The online version contains supplementary material available at 10.1007/s12672-025-03647-0.

Keywords: Breast cancer, Natural killer cell, Cellular infiltration, Prognosis, Risk score, Tumor microenvironment, Cancer heterogeneity

Introduction

Breast cancer is the most prevalent malignancy among women globally, accounting for an estimated 685,000 deaths in 2020 [1]. It can develop in women of any age after puberty and, although rare, can also affect men [1]. The prognosis of breast cancer patients is influenced by numerous factors, including age, tumor stage, general health status, and the tumor’s sensitivity to therapy [2–4]. In the United States, a surveillance program monitors 5-year relative survival rates for breast cancer, which currently stand at 99% for localized disease [5]. Therapeutic interventions for breast cancer may include surgical resection, radiotherapy, endocrine therapy, and chemotherapy [6–8]. The choice of therapy depends on the stage and histological type of the neoplasm, the patient’s overall health, and their personal treatment preferences [6–8]. Prognostication is likewise multifaceted, depending on the stage and histological type of the neoplasm, the patient’s age and general health, and the efficacy of the therapeutic regimen [9, 10]. The etiopathogenesis and progression of breast cancer involve intricate interactions between neoplastic cells and the surrounding immune microenvironment. The tumor microenvironment (TME), along with conventional prognostic indicators, has emerged as a critical factor in cancer progression and therapeutic response [11]. Current studies emphasize the need for its pathological evaluation in breast cancer management. TME-targeted therapies could potentially improve prognosis and treatment outcomes [12]. However, the clinical effectiveness of these therapeutic approaches requires further investigation and validation through clinical trials.

NK cells, a critical component of lymphoid cells, play a pivotal role in the innate immune defense. They have the inherent ability to identify and eliminate neoplastic cells, and their prevalence within the tumor microenvironment is positively correlated with patient prognosis [13]. A study examining DNA damage-repair-related genes in breast cancer patients revealed that tumors with a lower risk profile were typified by an increased infiltration of activated NK cells, CD8⁺ T lymphocytes, follicular helper T (Tfh) cells, macrophages and resting dendritic cells (DCs). These immunological infiltration patterns were concomitant with distinct metabolic signatures, diminished DNA replication, and augmented antineoplastic immunity [14]. In the context of HER2-positive breast cancer, a tri-gene prognostic model demonstrated more robust immune cell infiltration in the cohort with lower risk scores, encompassing tumor-infiltrating lymphocytes (TILs), CD8⁺ cytotoxic T lymphocytes, NK cells, dendritic cells, among other immune constituents. Furthermore, the expression level of immune checkpoint molecules, such as Programmed Death-Ligand 1 (PD-L1), Lymphocyte-activation gene 3 (LAG-3), Cytotoxic T-Lymphocyte Associated protein 4 (CTLA-4), T-cell Immunoglobulin and Mucin-domain containing-3 (TIM-3), and T cell immunoreceptor with Ig and ITIM domains (TIGIT), was markedly decreased in the high-risk group [15]. An investigation into the influence of lactate metabolism genes on the immune cell infiltrate and breast cancer prognosis identified a significant association between NK cell activation and lactate metabolism [16]. The composite prognostic model exhibited enhanced predictive accuracy over individual models, as evidenced by the area under the receiver operating characteristic curve (AUC) across three patient cohorts [16]. In summary, the infiltration of NK cells within breast cancer tissue correlates with patient prognosis. These findings could potentially facilitate the advancement of innovative therapeutic approaches in the management of breast cancer.

In this study, we used breast cancer scRNA-seq data to build an NK-related risk model that predicts the survival outcomes of breast carcinoma patients. The validity of this prognostic model was assessed using a GEO dataset of breast cancer patients, and the derived risk score was established as an independent prognostic indicator for patient outcomes. Furthermore, we investigated the relationship between this model and the immune cell components of the tumor microenvironment, along with sensitivity to therapeutic drugs. The full analysis workflow is shown in Supplementary Fig. 1.

Methods

Data sources

The single-cell transcriptomic data were acquired from dataset GSE235168 (n = 25) in the Gene Expression Omnibus (GEO) database, a single-cell dataset of breast cancer tumors comprising 25 samples of the ER+ and triple-negative subtypes. Bulk RNA-sequencing transcriptomic profiles and the corresponding clinical metadata of patients diagnosed with breast cancer were obtained from The Cancer Genome Atlas (TCGA) database (https://portal.gdc.cancer.gov/); this cohort comprised 1,475 samples (371 adjacent normal tissues and 1,104 breast cancer samples). The drug response records for these breast cancer specimens were also retrieved from the TCGA database. Additionally, the GSE20685 dataset (n = 327) was obtained from GEO. To further verify the stability of the model, we used the METABRIC (Molecular Taxonomy of Breast Cancer International Consortium) database, a dataset containing multiomics data for 2,000 cases of breast cancer.

Data preprocessing and standardization

For the microarray data, probes were annotated with gene symbols based on the GPL570 platform. Probes corresponding to multiple genes were excluded. For genes with multiple probes, the average expression was calculated, and the original GEO data were normalized using the “normalizeBetweenArrays” function of the “limma” R package. The GSE20685 dataset included a total of 327 BRCA samples. For the bulk RNA-seq data (referred to as the TCGA dataset), BRCA samples with survival information and a survival time greater than zero were retained. Ensembl IDs were converted to gene symbols, and protein-coding genes were included. The TCGA dataset includes a total of 371 BRCA samples.
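The probe-to-gene step described above can be illustrated with a minimal Python sketch. It is not the authors' R code: the function name `collapse_probes`, the dictionary-based data layout, and the toy probe IDs are assumptions made for illustration only. The sketch drops probes that map to more than one gene and averages the remaining probes of each gene, sample by sample.

```python
from collections import defaultdict
from statistics import mean


def collapse_probes(probe_expr, probe_to_genes):
    """Collapse probe-level expression to gene-level expression.

    probe_expr: dict probe_id -> list of expression values (one per sample)
    probe_to_genes: dict probe_id -> list of gene symbols the probe maps to
    """
    by_gene = defaultdict(list)
    for probe, values in probe_expr.items():
        genes = probe_to_genes.get(probe, [])
        # Skip unannotated probes and probes that map to several genes.
        if len(genes) != 1:
            continue
        by_gene[genes[0]].append(values)
    # Average the probes of each gene, sample by sample.
    return {
        gene: [mean(column) for column in zip(*rows)]
        for gene, rows in by_gene.items()
    }


# Hypothetical toy input: three probes measured in two samples.
expr = {"p1": [2.0, 4.0], "p2": [4.0, 8.0], "p3": [1.0, 1.0]}
annot = {"p1": ["TCF7"], "p2": ["TCF7"], "p3": ["CCND3", "BTG1"]}
gene_expr = collapse_probes(expr, annot)
```

In this toy input, probes `p1` and `p2` are averaged into a single TCF7 profile, while `p3` is discarded because it maps to two genes.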

Identification of NK cell-related genes

Quality control and filtering of the scRNA-seq data were performed using the “Seurat” package in R. Cells were retained according to the following criteria: nCount_RNA > 1000, nFeature_RNA > 100, and percent.mt < 20. The “NormalizeData” function was used to normalize the raw Unique Molecular Identifier (UMI) counts with a scale factor of 10,000, and highly variable genes were identified using the Variance Stabilizing Transformation (VST) method. Subsequently, Principal Component Analysis (PCA) was performed with the “RunPCA” function, and the top 20 Principal Components (PCs) were selected according to the ElbowPlot. Dimensionality reduction was then carried out with the “RunUMAP” function. The “FindNeighbors” and “FindClusters” functions were used for unsupervised cell clustering at a resolution of 0.6, and the resulting clusters were visualized with Uniform Manifold Approximation and Projection (UMAP) plots. To ensure accurate NK cell annotation, we employed the standardized automatic annotation workflow of the SingleR package and performed manual validation using well-established marker genes from the literature, establishing a dual annotation strategy [17, 18]. Finally, to improve the accuracy of the screening for NK cell-related genes, we conducted a differential expression analysis among the various cell types using the FindAllMarkers function (min.pct = 0.25, logfc.threshold = 1). We then excluded from the NK cell results any gene that was also differentially expressed in another cell type.
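The cell-level filter, the normalization, and the marker-exclusion rule described above can be sketched in plain Python. This is an illustration under stated assumptions, not the Seurat implementation: the function names, the per-cell dictionary layout, and the example gene names are hypothetical. The normalization follows Seurat's documented log-normalization, which divides each count by the cell total, multiplies by the scale factor, and applies log1p.

```python
import math


def passes_qc(counts, mito_genes, min_counts=1000, min_features=100, max_pct_mt=20.0):
    """Apply the three per-cell thresholds quoted in the text.

    counts: dict gene -> raw UMI count for one cell
    mito_genes: set of mitochondrial gene names
    """
    n_count = sum(counts.values())                        # nCount_RNA
    n_feature = sum(1 for c in counts.values() if c > 0)  # nFeature_RNA
    if n_count == 0:
        return False
    mito = sum(c for g, c in counts.items() if g in mito_genes)
    pct_mt = 100.0 * mito / n_count                       # percent.mt
    return n_count > min_counts and n_feature > min_features and pct_mt < max_pct_mt


def log_normalize(counts, scale_factor=10_000):
    """Scale one cell to `scale_factor` total counts, then take log1p.

    Intended for cells that passed QC, so the total is never zero.
    """
    total = sum(counts.values())
    return {g: math.log1p(c / total * scale_factor) for g, c in counts.items()}


def nk_specific(markers_by_cell_type, target="NK"):
    """Keep markers of `target` that are not markers of any other cell type."""
    others = set()
    for cell_type, genes in markers_by_cell_type.items():
        if cell_type != target:
            others |= set(genes)
    return sorted(set(markers_by_cell_type[target]) - others)


# Hypothetical cell: 149 nuclear genes plus one mitochondrial gene, 10 UMIs each.
cell = {f"G{i}": 10 for i in range(149)}
cell["MT-CO1"] = 10
keep = passes_qc(cell, {"MT-CO1"})

# Hypothetical marker lists; CCL5 is shared, so it is dropped from the NK set.
nk_only = nk_specific({"NK": ["NKG7", "GNLY", "CCL5"], "T": ["CCL5", "CD3E"]})
```

The set difference in `nk_specific` mirrors the final exclusion step: a gene is kept only if it is a marker of the NK cluster and of no other cluster.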

Construction of a prognostic risk scoring model according to NK markers

Using TCGA-BRCA data, we first screened the NK cell marker genes identified from the scRNA-seq data. This screening used univariate Cox proportional hazards regression, implemented in the “survival” R package, to identify NK-related marker genes associated with overall survival (OS) for downstream machine learning. Subsequently, ten widely used machine learning algorithms were applied to predict the prognostic risk associated with breast cancer: generalised boosted regression modelling (GBM), Ridge, stepwise Cox (StepCox), Lasso, random survival forest (RSF), elastic network (Enet), supervised principal components (SuperPC), CoxBoost, partial least squares regression for Cox (plsRcox), and survival support vector machine (survival-SVM), all in R. The performance of each model was assessed using the area under the receiver operating characteristic curve (AUC) and Harrell’s concordance index (C-index). Prognostic models were constructed with each of the ten methods, and the best model was selected based on the C-index and AUC values. Risk scores were then calculated using the ML.Dev.Prog.Sig function from the Mime1 R package. Breast cancer specimens were dichotomized into high- and low-risk groups at the median risk score derived from the risk model. The prognostic efficacy of the optimal machine learning model was further validated using Kaplan-Meier (K-M) survival analysis and ROC curves.
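To make the two evaluation steps above concrete, the following Python sketch computes Harrell's C-index for right-censored data and splits patients at the median risk score. It is a simplified illustration, not the Mime1 or survival package code. The function names and toy values are hypothetical, pairs with tied survival times are ignored, and assigning scores strictly above the median to the high-risk group is an assumption, since the text does not state how ties at the median are handled.

```python
from statistics import median


def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored survival data.

    A pair (i, j) is comparable when subject i has the shorter time and an
    observed event (events[i] == 1). The pair is concordant when subject i
    also has the higher risk score; tied scores count as half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if times[i] < times[j] and events[i] == 1:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def split_by_median(risk_scores):
    """Label each score 'high' if it is strictly above the median, else 'low'."""
    cutoff = median(risk_scores)
    return ["high" if s > cutoff else "low" for s in risk_scores]


# Hypothetical toy cohort: follow-up times, event flags (1 = death), risk scores.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
scores = [0.9, 0.1, 0.4, 0.7]
c_index = harrell_c_index(times, events, scores)
groups = split_by_median(scores)
```

A C-index of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which is the scale on which the ten models are compared.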

Validation and evaluation of model efficiency

To assess the reliability and accuracy of the model in risk stratification, a cohort of 327 breast cancer specimens from the GEO database was used for validation. The “survivalROC” package in R was used to generate time-dependent ROC curves and to calculate the AUC for 1-, 3-, and 5-year survival of patients with breast cancer. Differences in prognosis between the high-risk and low-risk cohorts were assessed by K-M analysis using the “survminer” package in R.
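For readers less familiar with K-M analysis, the estimator behind the survival curves can be sketched in a few lines of Python. This is a minimal illustration of the product-limit formula, not the survminer code. The function name and the toy follow-up data are hypothetical.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one death.

    times: follow-up time per patient
    events: 1 if death was observed, 0 if the patient was censored
    """
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        # Patients still under observation just before time t.
        at_risk = sum(1 for x in times if x >= t)
        deaths = sum(1 for x, e in zip(times, events) if x == t and e == 1)
        if deaths:
            # Product-limit update: multiply by the fraction surviving time t.
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
    return curve


# Hypothetical toy group: five patients, two of them censored.
curve = kaplan_meier([1, 2, 3, 4, 5], [1, 0, 1, 1, 0])
```

Running the function separately on the high-risk and low-risk groups yields the two curves that a K-M plot compares.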

Nomogram of the prediction model

Cox regression analysis was used to evaluate whether the risk score was an independent prognostic factor. A clinical nomogram was established by integrating the risk score with clinical phenotypes to predict the OS of breast cancer patients at 1 and 2 years. The “survival” R package was used to build the nomogram from the relevant clinical parameters. The Decision Curve Analysis (DCA) function in the “ggDCA” R package was used to evaluate the clinical utility of the nomogram.

Assessment of tumor-infiltrating immune cells and microenvironment

Enrichment scores for twenty-one immune cell types in the high-risk and low-risk cohorts were calculated by Single Sample Gene Set Enrichment Analysis (ssGSEA). The expression levels of immune checkpoint molecules and the TME scores were compared between the high-risk and low-risk cohorts. In addition, to elucidate the functional biological differences between the two risk-defined groups, Gene Set Variation Analysis (GSVA) was employed.

Identification of gene mutation and drug response prediction

The R package “maftools” was employed to ascertain the frequency and distribution of genetic alterations within these two risk cohorts. Furthermore, drug response data in TCGA for the samples were retrieved. A quantitative analysis was performed to assess the sensitivity of responses to commonly used chemotherapeutic agents in both high-risk and low-risk patient stratifications. The disparities in pharmacological sensitivity between the two distinct cohorts were determined by the Wilcoxon signed-rank test.

Statistical analysis

K-M survival curves were compared using the log-rank test. The accuracy of the risk score and the nomogram was assessed with ROC curves generated by the “pROC” and “survivalROC” packages in R. All data processing and statistical analyses were performed using R software, version 4.2.1. The Spearman rank correlation coefficient, along with its associated p-value, was used to assess the correlation between risk scores and other variables under investigation. The Wilcoxon rank-sum test was used to compare expression across subgroups. All statistical tests were two-sided, with a significance threshold of p < 0.05.
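Both rank-based statistics named above rest on the same ranking step, as the following Python sketch shows. It is an illustration only, with hypothetical function names. It computes the Spearman coefficient as the Pearson correlation of average ranks and the Mann-Whitney U statistic, which is equivalent to the Wilcoxon rank-sum statistic. It does not compute p-values, which in the study come from R.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation computed on average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


def rank_sum_u(group_a, group_b):
    """Mann-Whitney U statistic for group_a from the pooled ranks."""
    ranks = average_ranks(list(group_a) + list(group_b))
    rank_sum_a = sum(ranks[: len(group_a)])
    return rank_sum_a - len(group_a) * (len(group_a) + 1) / 2


# Hypothetical toy values.
rho = spearman_rho([1, 2, 3, 4], [10, 20, 30, 40])
u_stat = rank_sum_u([3, 4], [1, 2])
```

Because only ranks enter the calculation, both statistics are insensitive to the skewed distributions typical of expression data and risk scores.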

Results

Identification of NK cell marker genes

Quality control parameters were applied to the single-cell transcriptomic data derived from a cohort of five individuals diagnosed with breast cancer, as depicted in Fig. 1A. After filtering, 21,419 genes remained. From this set, the 2,000 most highly variable genes were selected for further analysis (Fig. 1B) and used as input to Principal Component Analysis (PCA) to reduce dimensionality (Fig. 1C-D). The cells were then clustered into 11 distinct clusters (Fig. 1E). After annotating the clusters, the cells of cluster 3 were classified as NK cells. Ultimately, we identified 29 NK cell-related genes (Fig. 1F and Table S1). To investigate the roles of these 29 genes, we performed a functional enrichment analysis, which showed that NK cell-related genes are primarily associated with the Wnt and JAK signaling pathways (Fig. 1G). Additionally, to examine the interactions between NK cells and other cells, we analyzed the communication networks among immune cells. This analysis indicated that NK cells communicate with various immune cells, including T cells and B cells (Fig. 1H).

Fig. 1 — Identification of NK cell marker genes by single-cell sequencing samples. (A) Violin plots of quality control standards. (B) Detection of the top 2,000 highly variable genes. (C) Scatter plots of the percentage of mitochondrial genes and gene counts in the sum of all gene expression levels in each cell. (D) PCA plot of 5 samples with different colors. (E) UMAP plot colored by various clusters (left) and cell types (right). (F) NK cell-related genes. (G) KEGG enrichment analysis of NK cell-related genes. (H) Cell communication analysis

Establishment of an NK related prognostic model for breast cancer

Utilizing the NK cell marker genes derived from single-cell data analysis, a risk assessment model pertinent to breast cancer was developed and evaluated. In our preliminary analysis, we identified 29 NK-related genes, of which 27 were detected in the TCGA-BRCA dataset. After prognostic evaluation of these genes, univariate Cox regression analysis revealed that five genes (TCF7, CCND3, BTG1, ZFP36L2, and RPSA) had significant prognostic impact (Fig. 2A and Supplementary Table 1).

Fig. 2 — Construction of a risk-scoring model for breast cancer by machine learning. (A) Five NK-related genes that affect breast cancer prognosis; (B) AUC and C-index model efficacy of predictive models constructed by different algorithms in different data sets; (C) the relationship between NK prognosis model and clinical parameters of breast cancer; (D) the relationship between NK prognosis model and breast cancer TNM analysis

Ten widely used machine learning algorithms were applied to construct prognostic models based on TCGA-BRCA data, and their predictive performance was evaluated using AUC and C-index across two datasets. The RSF algorithm demonstrated superior performance, achieving an AUC of 0.8 in the TCGA dataset and a C-index of 0.8, and was therefore selected as the optimal model for further analysis (Fig. 2B). To validate the correlation between the prognostic model and the clinical progression of breast cancer patients, we compared the model with TNM staging and tumor characteristics. The analysis revealed that the NK prognostic model is associated with T stage, with increased expression observed in the T3 and T4 stages (Fig. 2C-D). Finally, the prognostic efficacy of the model was assessed and validated on several datasets using K-M survival analysis and time-dependent ROC curves, with an AUC of 0.98 (0.931–0.993) for 5-year survival in TCGA (Fig. 3A-B). Notably, in the GEO dataset, although the survival analysis did not reach statistical significance (P = 0.05), patients in the high-risk group tended to have poorer survival outcomes (Fig. 3C-D). To further assess the stability of the model, the METABRIC dataset was used for validation. The NK-related prediction model effectively differentiated breast cancer prognosis in this cohort, and time-dependent AUC analysis showed that the model’s prediction of five-year survival achieved an AUC of 0.82 (0.731–0.842) (Fig. 3E-F).

Fig. 3 — Evaluation of the model in breast cancer cohorts. (A-B) Predictive performance of the NK prediction model for breast cancer prognosis in the TCGA database. (C-D) Predictive performance of the model for breast cancer prognosis in the GSE20685 dataset. (E-F) Predictive performance of the model for breast cancer prognosis in the METABRIC dataset

Nomogram of the prognostic risk model

Univariate and multivariate Cox regression analyses were conducted to identify independent prognostic factors. The prognostic model score and age were associated with breast cancer prognosis (Fig. 4A-B). Using these clinical parameters, along with risk groups and risk scores, we constructed a nomogram to predict OS. The one-year and two-year OS probabilities for each patient were then estimated from the sum of the points assigned to each parameter (Fig. 4C). A higher total was associated with a poorer survival prognosis. The clinical utility of the nomogram was evaluated by DCA (Fig. 4D). Furthermore, the AUCs for one-, three-, and five-year OS predictions in the time-dependent ROC curves were 0.834, 0.954, and 0.981, respectively, suggesting that the predictive model performs robustly (Fig. 4E). Additionally, we used a GEO dataset for external validation of the model, which demonstrated significant efficacy in predicting tumor prognosis. See Supplementary Fig. 2 for details.

Fig. 4 — Nomogram developed for the verification of NK marker genes prognostic signature. (A-B) Univariate (A) and multivariate (B) Cox regression analysis of the NK marker gene prognostic signature in breast cancer patients to determine independent risk variables. (C) Nomogram of the risk score and clinical information. (D) DCA curves of the nomogram prediction. (E) Time-dependent ROC curves of overall survival for nomogram model. DCA, decision curve analysis. ROC, receiver operating characteristic

Characteristics of NK cell marker genes in the prognostic risk model

To characterize the NK cell-related marker genes incorporated in this model, we compared the expression patterns of the five most significantly differentially expressed genes (DEGs) (Fig. 5A). All other genes also significantly distinguished tumor tissue from normal tissue. Subsequently, we analyzed the differential expression of these genes between the high-risk and low-risk cohorts defined by the risk scoring model (Fig. 5B). The expression levels of all genes differed significantly between the high-risk and low-risk cohorts. Furthermore, gene mutations and copy number variations were analyzed across all samples, as well as within the high- and low-risk cohorts specifically (Fig. 5C-F).

Fig. 5 — Molecular characteristics of the high- and low-risk groups. (A) NK-related DEGs between tumor and normal tissues. (B) NK-related DEGs between the high- and low-risk groups. (C) Mutation frequencies in the high- and low-risk groups. (D) CNV profile in all breast cancer patients. (E) CNV profile in the high-risk scores of breast cancer patients. (F) CNV profile in the low-risk scores of breast cancer patients. DEGs, differentially expressed genes; TMB, tumor mutational burden; CNV, copy number variation. *p < 0.05, **p < 0.01, ***p < 0.001, ****p < 0.0001, measured by unpaired t test

Evaluation of tumor-infiltrating immune cells utilizing the risk stratification

The risk model established in this study was based on the infiltration of NK cells, so we further investigated the association between the prognostic features of breast cancer and the features of tumor-infiltrating immune cells. The distribution of various immune cell types across the two risk groups was therefore evaluated. Figure 6A shows significant differences in various T cell types between the high-risk and low-risk cohorts. A comparative analysis of the quartile distribution of tumor microenvironment (TME) scores revealed that the higher-risk cohort exhibited elevated stromal, immune, and ESTIMATE scores, accompanied by a reduction in tumor purity, as shown in Fig. 6B. The upregulation of immune checkpoint gene expression could potentially inhibit the activation of specific immune cell populations, thereby facilitating immune escape in oncogenic processes. The transcriptional landscape of immune checkpoint molecules serves as a pivotal prognostic indicator for evaluating the therapeutic efficacy of immune checkpoint blockade therapies. A comparative expression analysis of immune checkpoints highlighted significant differences between the two risk groups. The expression levels of 50 immune checkpoints differed significantly between the high-risk and low-risk cohorts, with genes such as CD47, TMIGD2, HLA-DOB, and CD40LG having p-values less than 0.0001 (Fig. 6C). A subsequent GSVA of these cohorts in TCGA indicated that the genes distinguishing the high-risk and low-risk groups were predominantly involved in the GOMF INTERLEUKIN 1 RECEPTOR ACTIVITY, SPERM EJACULATION, and NEGATIVE REGULATION OF MONONUCLEAR CELL MIGRATION gene sets (Fig. 6D).

Fig. 6 — TCGA tumor microenvironment and immune characteristics between the high- and low-risk groups. (A) Multiple immune cell types, including T cells and DCs, differ between the high- and low-risk groups. (B) In groups with elevated NK expression, the stromal score, immune score, ESTIMATE score, and tumor purity are all higher compared to those with lower expression. (C) Almost all immune checkpoints were reduced in the NK model high expression grouping. (D) Multiple tumor-related pathways differ between the two groups. *p < 0.05, **p < 0.01, ***p < 0.001, ****p < 0.0001, measured by unpaired t test

Tumor mutation landscape and drug responses in the risk model

We quantified the prevalence and distribution of gene mutations across the examined cohorts, as illustrated in Fig. 7A. The waterfall plots showed that TP53 (33%), PIK3CA (31%), TTN (18%), CDH1 (14%), and GATA3 (13%) were the five most frequently mutated genes within the prognostic risk model. Analysis of the tumor mutational burden (TMB) showed that TMB scores in the low-risk cohort were slightly higher than those in the high-risk cohort (Fig. 7B), although this difference did not reach statistical significance (p = 0.0712). Subsequently, patient specimens were stratified into two cohorts with elevated and reduced TMB values. Kaplan-Meier survival curves indicated no significant survival difference between these two cohorts (Fig. 7C). However, when the TMB score was combined with the risk score for stratification, a more marked difference (p < 0.0001) was observed between individuals in the high-risk and low-risk categories (Fig. 7D). These observations suggest that risk classification may aid in differentiating specific gene mutations and quantifying TMB scores. Furthermore, gene mutation signatures are recognized as predictive biomarkers of pharmacological responsiveness in oncology. We therefore quantified and compared drug sensitivity between the high-risk and low-risk cohorts. The results in Fig. 7E-F suggest that the commonly used chemotherapeutic agents and targeted therapeutic drugs examined in this study could potentially serve as viable therapeutic strategies for patients with different risk scores.

Fig. 7 — Evaluation of TMB scores and drug sensitivity analysis. (A) Waterfall plot of mutation frequencies in high- and low-risk groups. (B) TMB status in high- and low-risk groups. (C) Kaplan-Meier survival curves of patients among high and low TMB score groups. (D) Kaplan-Meier survival curves for patients by TMB status in the high- and low-risk groups. (E, F) Sensitivity of all (E) or top 10 (F) significant drugs in the high- and low-risk score groups. TMB, tumor mutational burden. *p < 0.05, **p < 0.01, ***p < 0.001, ****p < 0.0001, measured by unpaired t test

Discussion

Breast cancer, one of the most prevalent malignant tumors, is characterized by high heterogeneity and increased rates of recurrence and metastasis [17]. Despite significant advancements in comprehensive sequence therapy improving the quality of life for patients, breast cancer remains a substantial threat to human health [18]. The utilization of immunotherapy and targeted therapy for immune factors has seen a rise in the treatment of breast cancer. Studies have shown that the immune environment, specifically the presence of immune cells within the tumor, can impact the advancement of the disease and is linked to prognosis and response to treatment [19–23]. Consequently, comprehending the relationship between immune cell infiltration and tumor occurrence and development in breast cancer is crucial for the design of innovative diagnostic and therapeutic methods. Current findings from whole-genome transcriptomics research on cancer suggest that immune-related genes may serve as predictors of patient survival outcomes or responsiveness to specific immunotherapies [24].

Our findings align with recent research on NK cells in breast cancer prognosis. Zhang et al. demonstrated that peripheral blood NK cell counts serve as independent predictors of neoadjuvant chemotherapy response, constructing nomograms with C-indices of 0.786–0.877 [25]. Their approach focused on peripheral blood enumeration, whereas our study leverages single-cell RNA sequencing to identify 29 NK cell-specific marker genes from tumor tissue. The foundational work by Ascierto et al. established NK cell molecular signatures in breast cancer, showing that NK activation genes correlate with relapse-free survival through microarray analysis [26]. Our study advances this by employing single-cell sequencing and achieving superior predictive performance with an AUC of 0.98. Additionally, therapeutic innovations such as Fc-engineered cathepsin D antibodies that enhance NK cell-mediated cytotoxicity and research on S100A9’s role in ameliorating NK cell dysfunction in ER+ breast cancer highlight the clinical potential of NK cell-targeted interventions [27, 28]. While these studies focus on functional enhancement, peripheral enumeration, or specific regulatory mechanisms, our comprehensive genomic approach provides a molecular blueprint that could guide both prognostic assessment and therapeutic targeting in breast cancer patients.

With the spread of genome sequencing technology, there are increasingly many ways to build a tumor prediction model from a given class of genes, and this approach has also made progress in other tumors [29–31]. The prognostic significance of NK cell infiltration in breast cancer can be elucidated by examining the correlation between NK cell levels and the expression of specific genes or proteins. Research indicates that the expression of certain genes or proteins within breast cancer tissues is associated with the degree of NK cell infiltration, which, in turn, has prognostic implications for patients. For example, a recent study demonstrated that overexpression of Kinesin Family Member 2C (KIF2C) in breast cancer tissues is positively correlated with immune cell infiltration, including NK cells [32]. Conversely, Centromere Protein N (CENPN) expression has been shown to suppress NK cell activity in breast cancer, with high levels of CENPN expression correlating with tumor immunosuppression [33]. Furthermore, Disconnected Interacting Protein 2 Homolog B (DIP2B) was found to negatively correlate with the infiltration levels of key immune cells, including activated NK cells [34]. The assessment of NK cell infiltration holds promise as a biomarker for breast cancer prognosis [35]. In addition, NK cells exert a significant anti-tumor effect: they can directly kill tumor cells or enhance efficacy through synergy with other immunotherapies. Because the detection of NK marker genes reflects the number of NK cells in the breast cancer tumor microenvironment, predicting treatment efficacy from these marker genes is highly feasible.

The enrichment of RNA degradation genes in NK cell markers highlights the importance of post-transcriptional control for rapid cytotoxic responses. Components of the RNA exosome and cofactors regulate the stability and translation of key effectors like granzyme B and perforin [36]. At rest, NK cells maintain “translational readiness” through active mRNA decay, then upon activation reduce degradation to boost protein synthesis [37, 38]. These mechanisms not only mark NK activation states in tumors but, when dysregulated, contribute to oncogenesis and therapy resistance.

The enrichment of HPV infection pathway genes among NK cell markers highlights the antiviral surveillance role of NK cells and their potential dysfunction in virus-associated cancers. NK cells recognize and eliminate HPV-infected cells, but HPV can evade NK cell immunity by downregulating activating receptors (e.g., NKp30, NKp46) and secreting immunosuppressive factors such as TGF-β [39, 40]. The presence of HPV pathway genes suggests that some NK cells may retain antiviral signatures or represent specialized subsets. Overall, these findings underscore the dual role of NK cells in antiviral and antitumor immunity, and suggest that viral infections may modulate NK cell function in cancer.
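As a point of reference for how pathway enrichment of a marker gene list is commonly scored, the following is a minimal sketch of an over-representation test based on the hypergeometric distribution. It is an illustration only and not necessarily the procedure used in this study; the function name and all counts in the example are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(n_universe, n_pathway, n_selected, n_overlap):
    """One-sided over-representation p-value, P(X >= n_overlap).

    n_universe: number of background genes
    n_pathway:  background genes annotated to the pathway
    n_selected: size of the gene list being tested (e.g. marker genes)
    n_overlap:  genes in the list that belong to the pathway
    """
    upper = min(n_pathway, n_selected)
    total = comb(n_universe, n_selected)
    # Sum the hypergeometric probabilities of every outcome at least as
    # extreme as the observed overlap.
    tail = sum(
        comb(n_pathway, k) * comb(n_universe - n_pathway, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Hypothetical counts: 29 marker genes, 3 of which fall in a pathway of
    # 80 genes, against a background of 20,000 genes.
    p = hypergeom_enrichment_p(20000, 80, 29, 3)
    print(f"over-representation p-value: {p:.3g}")
```

In practice the p-values of all tested pathways would additionally be corrected for multiple testing, which this sketch omits.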

The current study used breast cancer scRNA-seq data and machine learning techniques to create an NK cell-related gene signature that predicts the prognosis of breast cancer. The prognostic model was used to determine the risk score of each patient in the TCGA cohort, and patients were divided into high- and low-risk subgroups according to the median risk score. After the model was built, Kaplan-Meier survival analysis and ROC curves were used to verify the prognostic signature in the TCGA cohort and the GSE20685 cohort. A nomogram was then created to predict patient outcomes based on gender, age, tumor stage, risk group, and risk score, and was used to forecast the survival of breast cancer patients at 1 and 2 years. The DCA and ROC curves demonstrated that this model could reliably assess the survival rate of breast cancer patients. Elevated risk scores were associated with poorer outcomes. Although a prognostic prediction model for breast cancer has been established, translating it into a clinical tool remains a lengthy process. For clinical testing to be feasible, the relevant marker genes must be measured in breast cancer patients and the model formula applied to calculate each patient’s score and, ultimately, to predict prognosis. This process could be simplified by developing a prediction application or website, which is the next step in our ongoing research.
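The risk scoring, median stratification, and Kaplan-Meier steps described above can be sketched as follows. This is a minimal illustration under stated assumptions: the score is taken to be a linear combination of gene expression values, patients strictly above the cohort median are labeled high-risk, and the gene names, coefficients, and patient data are hypothetical placeholders rather than the fitted model of this study.

```python
from statistics import median

# Hypothetical coefficients for illustration only; the genes and weights of
# the actual signature are not reproduced here.
COEFFICIENTS = {"GENE_A": 0.42, "GENE_B": -0.31, "GENE_C": 0.18}


def risk_score(expression, coefficients=COEFFICIENTS):
    """Linear predictor: sum of coefficient * expression over model genes."""
    return sum(coef * expression[gene] for gene, coef in coefficients.items())


def stratify_by_median(scores):
    """Label a patient 'high' if the score exceeds the cohort median, else 'low'."""
    cutoff = median(scores.values())
    return {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}


def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time.

    times:  follow-up durations
    events: 1 if death was observed at that time, 0 if censored
    """
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        at_risk = sum(1 for x in times if x >= t)
        deaths = sum(1 for x, e in zip(times, events) if x == t and e == 1)
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
    return curve


if __name__ == "__main__":
    cohort = {
        "P1": {"GENE_A": 2.0, "GENE_B": 1.0, "GENE_C": 0.5},
        "P2": {"GENE_A": 0.5, "GENE_B": 2.0, "GENE_C": 1.0},
        "P3": {"GENE_A": 3.0, "GENE_B": 0.5, "GENE_C": 2.0},
        "P4": {"GENE_A": 1.0, "GENE_B": 1.5, "GENE_C": 0.0},
    }
    scores = {pid: risk_score(expr) for pid, expr in cohort.items()}
    print(stratify_by_median(scores))
    print(kaplan_meier([5, 8, 8, 12], [1, 1, 0, 1]))
```

A Kaplan-Meier curve would be computed separately for the high- and low-risk groups and the two curves compared, typically with a log-rank test, which is not shown here.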

The relationship between the TME and immune cell infiltration is a critical area of research, particularly in the context of cancer prognosis and the efficacy of immunotherapy. Recent studies have focused on various aspects of this relationship, including the development of scoring models to predict patient outcomes, the role of specific genes and pathways in modulating immune infiltration, and the potential for these factors to serve as biomarkers or therapeutic targets [41–44]. A study on breast cancer explored the relationship between the TME and intratumoral lymphatic endothelial cell (iLEC) infiltration, which reflects lymphangiogenesis. The research hypothesized that iLECs could serve as an indicator of lymphangiogenesis and a predictor of metastatic potential and overall survival. The findings highlighted the complex interactions within the TME, including the balance between pro- and anti-cancer gene sets and the role of immune responses in counterbalancing lymphangiogenesis [45]. Thus, different levels and types of immune cell infiltration in the TME can significantly influence patient prognosis.

The tumor immune microenvironment is a system in which different cells influence and regulate one another. In our immune infiltration analysis, macrophages were distributed differently between the risk groups defined by the NK cell-related model, which may suggest a regulatory relationship between NK cells and macrophages. This provides a promising direction for future exploration of the mechanisms underlying the NK cell model.

The relationship between TMB and immune infiltration was further explored. The relationship between TMB and cancer prognosis has been a focal point of recent research, with studies exploring its predictive value for immunotherapy response and overall survival across various cancer types [46–48]. TMB, defined as the total number of mutations per coding area of a tumor genome, is considered a biomarker for the effectiveness of immunotherapy, particularly immune checkpoint inhibitors [46]. High TMB is often associated with better responses to immunotherapy due to the increased likelihood of forming neoantigens that the immune system can recognize and target. In our study, the difference between the two TMB groups with respect to the prediction model did not reach statistical significance (P = 0.0712); this may reflect a biological trend that could reach significance with a larger sample size.

In recent years, targeted therapy has also been used successfully to treat aggressive malignancies such as lung, bladder, liver, and gastric cancer [49–52]. To identify treatments relevant to the model genes, we sought drug response information in the TCGA database. Drugs whose response differed significantly between the high- and low-risk groups are shown in Fig. 7E. Several drugs, including Tozasertib, Mirin, Navitoclax, and Alisertib, are potential therapeutic candidates for breast cancer patients, especially those in the high-risk group.
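The TMB definition given above (mutations per unit of coding sequence) is usually reported as mutations per megabase. The following minimal sketch shows that arithmetic. The default coding region size of 38 Mb is an assumption that is often used as an approximation for whole-exome data and is not taken from this study; the function name and example count are hypothetical.

```python
def tumor_mutation_burden(n_mutations, coding_region_mb=38.0):
    """Return TMB as mutations per megabase of sequenced coding region.

    n_mutations:      number of somatic mutations counted in the sample
    coding_region_mb: size of the covered coding region in megabases
                      (38.0 is an assumed exome size, not a study value)
    """
    if coding_region_mb <= 0:
        raise ValueError("coding_region_mb must be positive")
    return n_mutations / coding_region_mb


if __name__ == "__main__":
    # Hypothetical sample with 76 somatic mutations.
    print(tumor_mutation_burden(76))
```

Samples can then be split into high- and low-TMB groups at a chosen cutoff, such as the cohort median, before comparison with the risk groups.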

Several limitations were identified in this study. First, the sample size of breast cancer patients was relatively limited, indicating a need for additional large-scale prospective cohort studies to validate these findings. Second, although both the TCGA and GEO datasets comprise breast cancer samples, differences in the characteristics of individual patients (therapy, race, age, etc.) may introduce population bias into the model, and verification in a wider population is needed. Third, the analyses are based on retrospective data, which may be influenced by variations in time, location, and clinical decision making; a prospective cohort study is necessary to confirm the accuracy of the prognostic model. Furthermore, two prognostic models were constructed in the course of this study. Because they differ in the genes and algorithms included, their performance differs: the earlier model, based on NK cell genes alone, showed better predictive performance in BRCA. The two models will be compared further using larger and broader datasets. Lastly, further experimental investigations are essential to elucidate the mechanisms by which individual genes associated with the risk score contribute to the pathogenesis of breast cancer.

Conclusion

In this study, we constructed a predictive model based on NK cell-associated genes using machine learning approaches. Furthermore, we explored the correlation between these genetic markers and the immune microenvironment, immunotherapy, and drug sensitivity. The findings of this study could support the development of novel therapeutic strategies for breast cancer and help improve patient prognosis.

Supplementary Information

Below is the link to the electronic supplementary material.

12672_2025_3647_MOESM1_ESM.jpg (160.9 KB, jpg)

Supplementary Material 1. Supplementary Figure 1. Flowchart showing the schematic design of the present study.

12672_2025_3647_MOESM2_ESM.tif (1.6 MB, tif)

Supplementary Material 2. Supplementary Figure 2. Evaluating model effectiveness based on multiple machine learning algorithms.

12672_2025_3647_MOESM3_ESM.csv (3.2 KB, csv)

Supplementary Material 3. Supplementary Table 1. Univariate Cox analysis of 27 NK cell-related genes.

Author contributions

Jiawei Zhang: Contributed to the study’s conceptualization and design, conducted data analysis, and wrote the initial draft of the manuscript. Donghui Wang: Assisted in data collection and interpretation, and contributed to the writing and revision of the manuscript. Lan Hou and Juliang Zhang: Oversaw the entire research project, coordinated the contributions of all authors, and finalized the manuscript for submission.

Funding

Not applicable.

Data availability

The datasets generated and/or analysed during the current study are available in the GEO repository: GSE20685 [https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE20685] and GSE235168 [https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE235168], and in the TCGA repository [https://www.cancer.gov/ccg/research/genome-sequencing/tcga].

Declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

All authors agree to publish this article.

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Jiawei Zhang and Donghui Wang have contributed equally to this work.

Contributor Information

Lan Hou, Email: xyhoulan@126.com.

Juliang Zhang, Email: vascularzhang@163.com.

References

1. World Health Organization. Breast cancer. 2023. https://www.who.int/news-room/fact-sheets/detail/breast-cancer.
2. Nardin S, Mora E, Varughese FM, D’Avanzo F, Vachanaram AR, Rossi V, Saggia C, Rubinelli S, Gennari A. Breast cancer survivorship, quality of life, and late toxicities. Front Oncol. 2020;10:864.
3. Fujimoto RHP, Koifman RJ, Silva IFD. Survival rates of breast cancer and predictive factors: a hospital-based study from Western Amazon area in Brazil. Cien Saude Colet. 2019;24(1):261–73.
4. Sung H, Ferlay J, Siegel RL, Laversanne M, Soerjomataram I, Jemal A, Bray F. Global cancer statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin. 2021;71(3):209–49.
5. American Cancer Society. Survival Rates for Breast Cancer. https://www.cancer.org/cancer/types/breast-cancer/understanding-a-breast-cancer-diagnosis/breast-cancer-survival-rates.html.
6. Waks AG, Winer EP. Breast cancer treatment: A review. JAMA. 2019;321(3):288–300.
7. Harbeck N, Gnant M. Breast cancer. Lancet. 2017;389(10074):1134–50.
8. Harbeck N, Penault-Llorca F, Cortes J, Gnant M, Houssami N, Poortmans P, Ruddy K, Tsang J, Cardoso F. Breast cancer. Nat Rev Dis Primers. 2019;5(1):66.
9. Lukasiewicz S, Czeczelewski M, Forma A, Baj J, Sitarz R, Stanislawek A. Breast cancer-epidemiology, risk factors, classification, prognostic markers, and current treatment strategies-an updated review. Cancers (Basel). 2021;13(17).
10. Smolarz B, Nowak AZ, Romanowicz H. Breast cancer-epidemiology, classification, pathogenesis and treatment (Review of Literature). Cancers (Basel). 2022;14(10).
11. Li JJ, Tsang JY, Tse GM. Tumor microenvironment in breast cancer-updates on therapeutic implications and pathologic assessment. Cancers (Basel). 2021;13(16).
12. Lejeune M, Reverte L, Sauras E, Gallardo N, Bosch R, Roso A, Petit A, Peg V, Riu F, Garcia-Fontgivell J, et al. Prognostic implications of the residual tumor microenvironment after neoadjuvant chemotherapy in triple-negative breast cancer patients without pathological complete response. Cancers (Basel). 2023;15(3).
13. Campos-Mora M, Jacot W, Garcin G, Depondt ML, Constantinides M, Alexia C, Villalba M. NK cells in peripheral blood carry trogocytosed tumor antigens from solid cancer cells. Front Immunol. 2023;14:1199594.
14. Li C, Yu S, Chen J, Hou Q, Wang S, Qian C, Yin S. Risk stratification based on DNA damage-repair-related signature reflects the microenvironmental feature, metabolic status and therapeutic response of breast cancer. Front Immunol. 2023;14:1127982.
15. Lin J, Zhao A, Fu D. Evaluating the tumor immune profile based on a three-gene prognostic risk model in HER2 positive breast cancer. Sci Rep. 2022;12(1):9311.
16. Lu N, Guan X, Bao W, Fan Z, Zhang J. Breast cancer combined prognostic model based on lactate metabolism genes. Med (Baltim). 2022;101(51):e32485.
17. So JY, Yang HH, Park WY, Skrypek N, Ishii H, Chen JM, Lee MP, Yang L. DNA methyltransferase 3B-Mediated intratumoral heterogeneity and therapeutic targeting in breast cancer recurrence and metastasis. Mol Cancer Res. 2022;20(11):1674–85.
18. Otto LD, Russart KLG, Kulkarni P, McTigue DM, Ferris CF, Pyter LM. Paclitaxel chemotherapy elicits widespread brain anisotropy changes in a comprehensive mouse model of breast cancer survivorship: evidence from in vivo diffusion weighted imaging. Front Oncol. 2022;12:798704.
19. Zhu S, Wu Y, Song B, Yi M, Yan Y, Mei Q, Wu K. Recent advances in targeted strategies for triple-negative breast cancer. J Hematol Oncol. 2023;16(1):100.
20. de la Harpe A, Beukes N, Frost C. Mitochondrial calcium overload contributes to cannabinoid-induced paraptosis in hormone-responsive breast cancer cells. Cell Prolif. 2024;57(10):e13650.
21. Wong RS, Ong RJ, Lim JS. Immune checkpoint inhibitors in breast cancer: development, mechanisms of resistance and potential management strategies. Cancer Drug Resist. 2023;6(4):768–87.
22. Xie J, Deng X, Xie Y, Zhu H, Liu P, Deng W, Ning L, Tang Y, Sun Y, Tang H, et al. Multi-omics analysis of Disulfidptosis regulators and therapeutic potential reveals glycogen synthase 1 as a Disulfidptosis triggering target for triple-negative breast cancer. MedComm (2020). 2024;5(3):e502.
23. Zou Y, Yang A, Chen B, Deng X, Xie J, Dai D, Zhang J, Tang H, Wu T, Zhou Z, et al. crVDAC3 alleviates ferroptosis by impeding HSPB1 ubiquitination and confers trastuzumab Deruxtecan resistance in HER2-low breast cancer. Drug Resist Updat. 2024;77:101126.
24. Garcia-Martinez E, Gil GL, Benito AC, Gonzalez-Billalabeitia E, Conesa MA, Garcia Garcia T, Garcia-Garre E, Vicente V, Ayala de la Pena F. Tumor-infiltrating immune cell profiles and their change after neoadjuvant chemotherapy predict response and prognosis of breast cancer. Breast Cancer Res. 2014;16(6):488.
25. Hou Q, Li C, Chong Y, Yin H, Guo Y, Yang L, Li T, Yin S. Comprehensive single-cell and bulk transcriptomic analyses to develop an NK cell-derived gene signature for prognostic assessment and precision medicine in breast cancer. Front Immunol. 2024;15:1460607.
26. Ascierto ML, Idowu MO, Zhao Y, Khalak H, Payne KK, Wang XY, Dumur CI, Bedognetti D, Tomei S, Ascierto PA, et al. Molecular signatures mostly associated with NK cells are predictive of relapse free survival in breast cancer patients. J Transl Med. 2013;11:145.
27. Liu Z, Ding M, Qiu P, Pan K, Guo Q. Natural killer cell-related prognostic risk model predicts prognosis and treatment outcomes in triple-negative breast cancer. Front Immunol. 2023;14:1200282.
28. Fang Y, Zheng R, Xiao Y, Zhang Q, Liu J, Wu J. Machine learning-based diagnostic and prognostic models for breast cancer: a new frontier on the clinical application of natural killer cell-related gene signatures in precision medicine. Front Immunol. 2025;16:1581982.
29. Xia WT, Qiu WR, Yu WK, Xu ZC, Zhang SH. Identifying TME signatures for cervical cancer prognosis based on GEO and TCGA databases. Heliyon. 2023;9(4):e15096.
30. Yu Y, Ouyang W, Huang Y, Huang H, Wang Z, Jia X, Huang Z, Lin R, Zhu Y, Yalikun Y, et al. Artificial intelligence-based multi-modal multi-tasks analysis reveals tumor molecular heterogeneity, predicts preoperative lymph node metastasis and prognosis in papillary thyroid carcinoma: a retrospective study. Int J Surg. 2025;111(1):839–56.
31. Zhang Y, Song J, Zhao Z, Yang M, Chen M, Liu C, Ji J, Zhu D. Single-cell transcriptome analysis reveals tumor immune microenvironment heterogenicity and granulocytes enrichment in colorectal cancer liver metastases. Cancer Lett. 2020;470:84–94.
32. Liu S, Ye Z, Xue VW, Sun Q, Li H, Lu D. KIF2C is a prognostic biomarker associated with immune cell infiltration in breast cancer. BMC Cancer. 2023;23(1):307.
33. Gui Z, Tian Y, Yu T, Liu S, Liu C, Zhang L. Clinical implications and immune features of CENPN in breast cancer. BMC Cancer. 2023;23(1):851.
34. Song C, Shang F, Tu W, Liu X. Integrated Pancancer analysis reveals the oncogene characteristics and prognostic value of DIP2B in breast cancer. BMC Cancer. 2023;23(1):296.
35. Yu Z, Cheng L, Liu X, Zhang L, Cao H. Increased expression of INHBA is correlated with poor prognosis and high immune infiltrating level in breast cancer. Front Bioinform. 2022;2:729902.
36. Kim TD, Park JY, Choi I. Post-transcriptional regulation of NK cell activation. Immune Netw. 2009;9(4):115–21.
37. Salcedo TW, Azzoni L, Wolf SF, Perussia B. Modulation of Perforin and granzyme messenger RNA expression in human natural killer cells. J Immunol. 1993;151(5):2511–20.
38. Fehniger TA, Cai SF, Cao X, Bredemeyer AJ, Presti RM, French AR, Ley TJ. Acquisition of murine NK cell cytotoxicity requires the translation of a pre-existing pool of granzyme B and Perforin mRNAs. Immunity. 2007;26(6):798–811.
39. Wu X, Xiao Y, Guo D, Zhang Z, Liu M. Reduced NK cell cytotoxicity by Papillomatosis-Derived TGF-beta contributing to Low-Risk HPV persistence in JORRP patients. Front Immunol. 2022;13:849493.
40. Wang W, Xi Y, Li S, Liu X, Wang G, Wang H, Pei M, Zhang J, Gui J, Ni X. Restricted recruitment of NK cells with impaired function is caused by HPV-Driven immunosuppressive microenvironment of papillomas in aggressive Juvenile-Onset recurrent respiratory papillomatosis patients. J Virol. 2022;96(19):e0094622.
41. Huang H, Cai X, Lin J, Wu Q, Zhang K, Lin Y, Liu B, Lin J. A novel five-gene metabolism-related risk signature for predicting prognosis and immune infiltration in endometrial cancer: A TCGA data mining. Comput Biol Med. 2023;155:106632.
42. Li W, Wang Q, Lu J, Zhao B, Geng Y, Wu X, Chen X. Machine learning-based prognostic modeling of lysosome-related genes for predicting prognosis and immune status of patients with hepatocellular carcinoma. Front Immunol. 2023;14:1169256.
43. Zhang Z, Zhang J, Duan Y, Li X, Pan J, Wang G, Shen B. Identification of B cell marker genes based on single-cell sequencing to Establish a prognostic model and identify immune infiltration in osteosarcoma. Front Immunol. 2022;13:1026701.
44. Li S, Li Z, Wang X, Zhong J, Yu D, Chen H, Ma W, Liu L, Ye M, Shen R, et al. HK3 stimulates immune cell infiltration to promote glioma deterioration. Cancer Cell Int. 2023;23(1):227.
45. Wu R, Sarkar J, Tokumaru Y, Takabe Y, Oshi M, Asaoka M, Yan L, Ishikawa T, Takabe K. Intratumoral lymphatic endothelial cell infiltration reflecting lymphangiogenesis is counterbalanced by immune responses and better cancer biology in the breast cancer tumor microenvironment. Am J Cancer Res. 2022;12(2):504–20.
46. Su X, Jin K, Guo Q, Xu Z, Liu Z, Zeng H, Wang Y, Zhu Y, Xu L, Wang Z, et al. Integrative score based on CDK6, PD-L1 and TMB predicts response to platinum-based chemotherapy and PD-1/PD-L1 Blockade in muscle-invasive bladder cancer. Br J Cancer. 2024.
47. Dong R, Chen S, Lu F, Zheng N, Peng G, Li Y, Yang P, Wen H, Qiu Q, Wang Y, et al. Models for predicting response to immunotherapy and prognosis in patients with gastric cancer: DNA damage response genes. Biomed Res Int. 2022;2022:4909544.
48. Chen Y, Deng Q, Chen H, Yang J, Chen Z, Li J, Fu Z. Cancer-associated fibroblast-related prognostic signature predicts prognosis and immunotherapy response in pancreatic adenocarcinoma based on single-cell and bulk RNA-sequencing. Sci Rep. 2023;13(1):16408.
49. Wu C, Zhao J, Wang X, Wang Y, Zhang W, Zhu G. A novel pyroptosis related genes signature for predicting prognosis and estimating tumor immune microenvironment in lung adenocarcinoma. Transl Cancer Res. 2022;11(8):2647–59.
50. Zhao J, Wu C, Wang Y, Li M, Jiang Y, Luo Y. Identification of a pyroptosis related gene signature for predicting prognosis and estimating tumor immune microenvironment in bladder cancer. Transl Cancer Res. 2022;11(7):1865–79.
51. Ye J, Tian W, Zheng B, Zeng T. Identification of cancer-associated fibroblasts signature for predicting the prognosis and immunotherapy response in hepatocellular carcinoma. Med (Baltim). 2023;102(45):e35938.
52. Wei C, Li M, Lin S, Xiao J. Characterization of tumor mutation Burden-Based gene signature and molecular subtypes to assist precision treatment in gastric cancer. Biomed Res Int. 2022;2022:4006507. [Retracted]


[CR43] 43.Zhang Z, Zhang J, Duan Y, Li X, Pan J, Wang G, Shen B. Identification of B cell marker genes based on single-cell sequencing to Establish a prognostic model and identify immune infiltration in osteosarcoma. Front Immunol. 2022;13:1026701. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR44] 44.Li S, Li Z, Wang X, Zhong J, Yu D, Chen H, Ma W, Liu L, Ye M, Shen R, et al. HK3 stimulates immune cell infiltration to promote glioma deterioration. Cancer Cell Int. 2023;23(1):227. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR45] 45.Wu R, Sarkar J, Tokumaru Y, Takabe Y, Oshi M, Asaoka M, Yan L, Ishikawa T, Takabe K. Intratumoral lymphatic endothelial cell infiltration reflecting lymphangiogenesis is counterbalanced by immune responses and better cancer biology in the breast cancer tumor microenvironment. Am J Cancer Res. 2022;12(2):504–20. [PMC free article] [PubMed] [Google Scholar]

[CR46] 46.Su X, Jin K, Guo Q, Xu Z, Liu Z, Zeng H, Wang Y, Zhu Y, Xu L, Wang Z, et al. Integrative score based on CDK6, PD-L1 and TMB predicts response to platinum-based chemotherapy and PD-1/PD-L1 Blockade in muscle-invasive bladder cancer. Br J Cancer. 2024. [DOI] [PMC free article] [PubMed]

[CR47] 47.Dong R, Chen S, Lu F, Zheng N, Peng G, Li Y, Yang P, Wen H, Qiu Q, Wang Y, et al. Models for predicting response to immunotherapy and prognosis in patients with gastric cancer: DNA damage response genes. Biomed Res Int. 2022;2022:4909544. [DOI] [PMC free article] [PubMed]

[CR48] 48.Chen Y, Deng Q, Chen H, Yang J, Chen Z, Li J, Fu Z. Cancer-associated fibroblast-related prognostic signature predicts prognosis and immunotherapy response in pancreatic adenocarcinoma based on single-cell and bulk RNA-sequencing. Sci Rep. 2023;13(1):16408. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR49] 49.Wu C, Zhao J, Wang X, Wang Y, Zhang W, Zhu G. A novel pyroptosis related genes signature for predicting prognosis and estimating tumor immune microenvironment in lung adenocarcinoma. Transl Cancer Res. 2022;11(8):2647–59. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR50] 50.Zhao J, Wu C, Wang Y, Li M, Jiang Y, Luo Y. Identification of a pyroptosis related gene signature for predicting prognosis and estimating tumor immune microenvironment in bladder cancer. Transl Cancer Res. 2022;11(7):1865–79. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR51] 51.Ye J, Tian W, Zheng B, Zeng T. Identification of cancer-associated fibroblasts signature for predicting the prognosis and immunotherapy response in hepatocellular carcinoma. Med (Baltim). 2023;102(45):e35938. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR52] 52.Wei C, Li M, Lin S, Xiao J. Characterization of tumor mutation Burden-Based gene signature and molecular subtypes to assist precision treatment in gastric cancer. Biomed Res Int. 2022;2022:4006507. [DOI] [PMC free article] [PubMed] [Google Scholar] [Retracted]
