Abstract
Triple-negative breast cancer (TNBC) is notorious for its rapid progression, tendency to metastasize, high recurrence rates, dismal outcomes, and limited treatment options, underscoring the urgent need to uncover new biomarkers and molecular pathways to enhance diagnosis, prognosis, and therapeutic strategies. Metabolic reprogramming plays a role throughout the life cycle of cancer, evolving and adapting as the disease progresses. In this study, we aimed to identify genes associated with metabolic reprogramming in TNBC that could serve as biomarkers of this cancer. TNBC datasets retrieved from the Gene Expression Omnibus were used to pinpoint genes with altered expression linked to tumor metabolic reprogramming. Key genes were screened with machine learning algorithms and then externally verified using the TNBC dataset from The Cancer Genome Atlas database. Finally, immunohistochemistry was used to confirm the differential expression and trends of these key genes in clinical tissue. Our analysis identified four genes (CLEC7A, IRS1, RSPO3, and ALB) that are closely correlated with the metabolic reprogramming characteristics of cancer and could be regarded as novel biomarkers for TNBC. This opens a new avenue for further investigation into the mechanisms of metabolic reprogramming in TNBC and new treatment strategies.
Supplementary Information
The online version contains supplementary material available at 10.1007/s10238-025-01870-1.
Keywords: Triple-negative breast cancer, Metabolic reprogramming, Machine learning, Biomarkers, Immune infiltration, Immunohistochemistry
Introduction
Breast cancer continues to pose a substantial global health burden and remains the foremost cause of cancer-related illness and death among women worldwide. This disease is marked by considerable diversity, shaped by the intricate interplay between genetic susceptibility and environmental influences [1]. Within the spectrum of breast cancer, triple-negative breast cancer (TNBC) accounts for roughly 15–20% of cases, lacks expression of estrogen receptor, progesterone receptor, and human epidermal growth factor receptor-2 (HER-2), and is linked to the most unfavorable prognosis [2, 3]. Known for its aggressive nature, TNBC often leads to early relapse, swift spread to distant organs, and fewer treatment options compared to hormone receptor-positive or HER-2-driven subtypes [4]. At present, TNBC treatment still relies on conventional approaches such as chemotherapy, surgical resection, and radiation therapy. Tumor immunotherapy and targeted therapy have improved in recent years and offer an alternative for TNBC, but the development of targeted therapies remains largely a matter of trial and error because specific biomarkers are lacking. This gap highlights the pressing need to uncover the molecular mechanisms driving TNBC development [5]. Consequently, research efforts must prioritize the discovery of novel biomarkers and the development of targeted treatments, advancing the fields of precision and personalized medicine. Recent findings have emphasized metabolic reprogramming as a pivotal feature of TNBC progression, allowing cancer cells to alter their energy metabolism to fuel unchecked growth, avoid cell death, and facilitate metastasis [6]. Thus, investigating the molecular and cellular mechanisms of metabolic reprogramming in TNBC offers valuable insights into its origins and progression.
The fusion of bioinformatics and machine learning has revolutionized cancer investigation, enabling comprehensive analysis of extensive multi-omics datasets. These computational tools have accelerated the identification of disease-specific biomarkers, therapeutic targets, and predictive models.
This study utilized transcriptomic datasets and systems biology approaches to pinpoint metabolic reprogramming-associated genes with diagnostic, prognostic, and therapeutic relevance in TNBC. Our methodology integrates differential gene expression analysis, robust rank aggregation (RRA), machine learning techniques, and Friends analysis to highlight critical genes that may serve as innovative targets for precision oncology.
Materials and methods
Acquisition and analysis of data
We selected two TNBC datasets from the Gene Expression Omnibus (GEO) database: GSE38959 (based on GPL4133, including 13 normal tissues and 30 TNBC tissues) and GSE53752 (based on GPL7264, including 25 normal tissues and 51 TNBC tissues) [7, 8]. Both datasets were normalized and analyzed in R software (version 4.4.2) using the “limma” and “ggplot2” packages, and the results were visualized with box plots and principal component analysis (PCA) plots.
Screening of DEGs
Using thresholds of |log2 fold change (FC)| > 1 and p < 0.05, differentially expressed genes (DEGs) in the GSE38959 and GSE53752 datasets were identified with the “limma” package in R software (version 4.4.2). The DEGs were visualized as heat maps and volcano plots generated with the “pheatmap” and “ggplot2” packages, respectively.
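The thresholding rule above can be illustrated with a minimal sketch. It assumes that a log2 fold change and a p value have already been estimated for each gene (for example by limma); the `GeneStat` record, the `classify_deg` helper, and the toy values are hypothetical and are not taken from GSE38959 or GSE53752.

```python
from typing import NamedTuple


class GeneStat(NamedTuple):
    """Per-gene summary statistics (hypothetical container for illustration)."""
    symbol: str
    log2_fc: float
    p_value: float


def classify_deg(stat, fc_cutoff=1.0, p_cutoff=0.05):
    """Label a gene 'up', 'down', or 'ns' using |log2 FC| > 1 and p < 0.05."""
    if stat.p_value < p_cutoff and abs(stat.log2_fc) > fc_cutoff:
        return "up" if stat.log2_fc > 0 else "down"
    return "ns"  # not significant under the stated thresholds


# Toy values for illustration only.
toy_stats = [
    GeneStat("GENE_A", 2.3, 0.001),   # large positive change, significant
    GeneStat("GENE_B", -1.7, 0.020),  # large negative change, significant
    GeneStat("GENE_C", 0.4, 0.001),   # significant but below the FC cutoff
    GeneStat("GENE_D", 3.0, 0.200),   # large change but not significant
]
labels = {s.symbol: classify_deg(s) for s in toy_stats}
print(labels)
```

Both conditions must hold at once, so a gene with a large fold change but a non-significant p value is not counted as a DEG.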
Biological function and pathway enrichment analysis
To analyze gene pathways comprehensively, we performed Gene Set Enrichment Analysis (GSEA) [9], Gene Ontology (GO) [10], and Kyoto Encyclopedia of Genes and Genomes (KEGG) [11] pathway enrichment analyses using the “clusterProfiler” package in R software (version 4.4.2). The reference gene set, “c2.cp.Reactome.v7.0.symbols.gmt”, was obtained from the Molecular Signatures Database (MSigDB). The GO analysis covered three categories: biological processes (BP), cellular components (CC), and molecular functions (MF). A false discovery rate (FDR) below 0.05 was considered statistically significant [12].
Screening of cDEGs
The “VennDiagram” tool in R software (version 4.4.2) generated a Venn diagram [13] to identify common DEGs (cDEGs) linked to metabolic reprogramming in datasets GSE38959 and GSE53752. The metabolic reprogramming gene list was downloaded from the GeneCards database.
RRA analysis
To screen and evaluate the top cDEGs from datasets GSE38959 and GSE53752, we performed an RRA analysis [14]. This analysis was conducted utilizing the “RobustRankAggreg” and “ggplot2” packages of R software (version 4.4.2).
Machine learning
GSE32641 (based on GPL887, including 7 normal tissues and 52 TNBC tissues) from GEO datasets was set as an external validation set for machine learning. The Least Absolute Shrinkage and Selection Operator (LASSO) [15] is a logistic regression technique that employs an L1-penalty (lambda) to zero out coefficients of less relevant variables, effectively sifting through the data to pinpoint the most significant predictors and build an optimal classification model. On the other hand, Support Vector Machine-Recursive Feature Elimination (SVM-RFE) [16] is a supervised machine learning approach designed to identify key genes by iteratively removing the least important feature vectors generated by the SVM algorithm. Meanwhile, Random Forest (RF) [17] is a machine learning method rooted in decision trees, which assesses the importance of variables by assigning scores to each, thereby highlighting their relative contribution to the model. The three machine learning algorithms were analyzed utilizing the “glmnet”, “e1071”, and “randomForest” packages in R software (version 4.4.2), respectively. Finally, a Venn diagram was constructed to identify the intersection of hub genes obtained through these three machine-learning algorithms via the “VennDiagram” package in R software (version 4.4.2).
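Two ideas from this paragraph can be sketched in a few lines: how an L1 penalty sets small coefficients exactly to zero, and how the outputs of several selectors are intersected. The sketch assumes the soft-thresholding operator that underlies coordinate-descent LASSO solvers for standardized predictors; it does not call glmnet, e1071, or randomForest, and all gene names and coefficients are hypothetical.

```python
import math


def soft_threshold(z, lam):
    """Shrink z toward zero by lam; any |z| <= lam becomes exactly zero."""
    return math.copysign(max(abs(z) - lam, 0.0), z)


def intersect_selected(*gene_sets):
    """Return the sorted genes that appear in every supplied selection."""
    if not gene_sets:
        return []
    common = set(gene_sets[0])
    for genes in gene_sets[1:]:
        common &= set(genes)
    return sorted(common)


# Hypothetical unpenalized coefficients; lam = 0.5 is an assumed penalty.
raw_coefficients = {"GENE_1": 1.2, "GENE_2": -0.9, "GENE_3": 0.3, "GENE_5": 0.8}
lasso_hits = {g for g, b in raw_coefficients.items()
              if soft_threshold(b, 0.5) != 0.0}

# Hypothetical selections standing in for SVM-RFE and Random Forest output.
svm_rfe_hits = {"GENE_1", "GENE_2", "GENE_4", "GENE_5"}
rf_hits = {"GENE_1", "GENE_5", "GENE_6"}

hub = intersect_selected(lasso_hits, svm_rfe_hits, rf_hits)
print(hub)
```

Requiring agreement among all three selectors is conservative: a gene favored by only one algorithm is dropped, which trades sensitivity for robustness.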
Friends analysis
To identify key genes among the hub genes, we performed Friends analysis [18] using the “org.Hs.eg.db” and “GOSemSim” packages in R software (version 4.4.2) and visualized the results with the “ggplot2” package.
Expression and clinical values evaluation
Normal control and TNBC samples were extracted from The Cancer Genome Atlas (TCGA), and samples without matching clinical information were discarded. To explore the expression patterns of the key genes in the TCGA-TNBC cohort, we first verified their expression among the TCGA-TNBC DEGs by generating paired-sample plots, co-expression heatmaps, clinical expression heatmaps, chord plots, and circos plots with the “ggplot2” and “circlize” packages of R software (version 4.4.2). To assess the clinical diagnostic significance of the key genes, we used the “ggplot2” and “pROC” packages in R software (version 4.4.2) to compute receiver operating characteristic (ROC) curves and derive the area under the curve (AUC). Genes with an AUC between 0.5 and 1 were deemed informative for diagnosing TNBC; the closer the AUC is to 1, the better the diagnostic performance of the gene. Furthermore, to evaluate the prognostic value of the key genes in TNBC, we examined the association between these genes and overall survival (OS). Using the “survival” and “survminer” packages in R, we plotted Kaplan–Meier (K–M) curves and reported the hazard ratio (HR) and confidence interval (CI); p < 0.05 was defined as statistically significant.
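As an illustration of what the AUC measures, the sketch below computes it directly as the probability that a randomly chosen tumor sample has a higher expression value than a randomly chosen normal sample, with ties counted as one half. This rank-based definition is equivalent to the area under the empirical ROC curve. The `auc_from_scores` helper and the expression values are hypothetical; the pROC package is not used here.

```python
def auc_from_scores(tumor_scores, normal_scores):
    """AUC as P(tumor score > normal score), counting ties as 0.5."""
    wins = 0.0
    for t in tumor_scores:
        for n in normal_scores:
            if t > n:
                wins += 1.0
            elif t == n:
                wins += 0.5
    return wins / (len(tumor_scores) * len(normal_scores))


# Toy expression values for illustration only.
tumor = [5.1, 6.3, 4.8, 7.0]
normal = [3.2, 4.9, 4.0, 5.0]

auc_up = auc_from_scores(tumor, normal)
# For a gene that is lower in tumors, negating the scores flips the direction.
auc_flipped = auc_from_scores([-x for x in tumor], [-x for x in normal])
print(auc_up, auc_flipped)
```

An AUC of 0.5 corresponds to chance-level discrimination, which is why values must be judged by their distance from 0.5 rather than by merely falling inside the 0.5 to 1 range.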
Immune infiltration analysis evaluation
We applied the CIBERSORT [19] algorithm to the TCGA-TNBC cohort using the “CIBERSORT” and “ggplot2” packages in R software (version 4.4.2) to assess the association between the key genes and 22 immune cell types. These associations were visualized with stacked bar charts and box plots.
TF and ceRNA regulatory network construction
Transcription factors (TFs) targeting the key genes were assessed on the basis of at least 3 TF databases, such as hTFtarget, ENCODE, CHIP_atlas, and KnockTF. The TF-DNA interaction network was constructed and visualized with Cytoscape (version 3.7.1). MicroRNAs (miRNAs) targeting the key genes were evaluated using 3 databases: miRWalk, TargetScan, and miRDB. The competing endogenous RNA (ceRNA) network, formed from the associations between circular RNAs (circRNAs) and miRNAs, was assembled using CircBase and visualized as a Sankey plot with the “ggplot2” and “VennDiagram” packages of R software (version 4.4.2).
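One way to read the "at least 3 databases" criterion is as a simple voting rule: a regulator is retained only if enough independent resources report it for the same target gene. The sketch below shows that rule under this assumption; the database labels, TF names, and the `tfs_supported_by` helper are hypothetical and do not reproduce the actual query results.

```python
from collections import Counter


def tfs_supported_by(db_hits, min_databases=3):
    """Return TFs reported by at least min_databases of the supplied sources."""
    # set(hits) ensures each database contributes at most one vote per TF.
    counts = Counter(tf for hits in db_hits.values() for tf in set(hits))
    return sorted(tf for tf, n in counts.items() if n >= min_databases)


# Toy predictions for a single target gene; illustration only.
toy_hits = {
    "db_one": ["TF_A", "TF_B", "TF_C"],
    "db_two": ["TF_A", "TF_C"],
    "db_three": ["TF_A", "TF_B", "TF_C"],
    "db_four": ["TF_A", "TF_D"],
}
consensus_tfs = tfs_supported_by(toy_hits)
print(consensus_tfs)
```

The same rule applies unchanged to the miRNA predictions by substituting miRNA identifiers for TF names.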
Immunohistochemical (IHC) staining
Sample information
This research was authorized by Xuzhou Central Hospital’s Ethics Committee (XZXY-LK-20240926-0150). We used archived paraffin-embedded specimens from the pathology department, matching TNBC tissue with paraneoplastic normal breast tissue (spacing > 2 cm). Inclusion criteria were a postoperative diagnosis confirmed by histopathology, completion of standard HE staining with diagnostic review, and complete clinical information.
The case was a 47-year-old female patient with a preoperative Karnofsky performance status score of 100, a regular preoperative menstrual cycle (last menstruation 1 month before surgery), no family history of breast cancer or underlying disease, and no neoadjuvant therapy. The tumor was located in the upper outer quadrant of the right breast, and a modified radical mastectomy with sentinel lymph node biopsy and axillary dissection was performed. Postoperative pathology showed invasive ductal carcinoma of histologic grade III (Nottingham grading system) with a primary focus of 2.0 × 1.8 × 1.5 cm and no metastasis in any of the 15 axillary lymph nodes (0/15). Immunophenotyping was negative for ER, PR, and HER-2, with a Ki67 proliferation index of approximately 60%; CK5/6, EGFR, GATA3, E-CAD, and membranous P120 were positive, and P63 was negative.
Experimental reagents and instruments
Albumin monoclonal antibody (Cat No. 66051-1-Ig) and RSPO3 monoclonal antibody (Cat No. 66314-1-Ig) were purchased from Proteintech; IRS1 mouse monoclonal antibody and CLEC7A rabbit polyclonal antibody were purchased from Affinity Biosciences. Horseradish peroxidase-linked secondary antibodies, goat anti-mouse IgG (Cat No. ab205719) and goat anti-rabbit IgG (Cat No. ab205718), were obtained from Abcam (Shanghai, China). Other reagents, including sodium citrate antigen retrieval solution, endogenous peroxidase blocking buffer, hematoxylin staining solution, PBS buffer, and the DAB horseradish peroxidase chromogenic kit, were sourced from Beyotime Biotechnology (Shanghai, China). All sundry supplies were sourced from Jiangsu Shitai Experimental Equipment Co., Ltd. The following equipment was provided by the Department of Pathology at Xuzhou Central Hospital: the PBM-A model tissue embedding freezing station, a LEICA RM2235 rotary microtome, the BenchMark ULTRA PLUS automated slide stainer, a PHY-IIIS2-93 tissue floating and drying instrument, Fuyilian refrigerators (maintaining temperatures of −4 °C and −20 °C), an Olympus IX73 inverted microscope, and various precision pipetting devices.
Immunohistochemical staining process [20]
After HE staining was completed, 4-μm sections were cut from the paraffin blocks of the selected tumor and peritumoral normal tissues. The sections were baked at 68 °C for 1 h and then dewaxed and hydrated (xylene for 10 min × 2 times → anhydrous ethanol for 5 min × 2 times → graded ethanol for 5 min each at 95%, 90%, 85%, 80%, and 75%). Subsequent steps comprised antigen retrieval, blocking, primary antibody incubation (according to the reagent instructions), secondary antibody incubation, DAB staining, and hematoxylin counterstaining, followed by dehydration and mounting.
Interpretation and quantification
Two pathologists independently evaluated IHC results. Images were captured at × 40 magnification using an inverted light microscope, with three representative fields per slide. Quantitative analysis was performed using ImageJ software, visualized as bar graphs, with statistical significance at p < 0.05.
Statistical analysis
Statistical analyses were performed using GraphPad Prism 8.0 (GraphPad Software, CA, USA). Differences in key gene expression were evaluated using unpaired t-tests, while relationships between gene expression and clinical parameters were examined with Pearson correlation tests. ROC curves were plotted to gauge diagnostic accuracy, with AUC values and 95% confidence intervals serving as performance metrics. Survival outcomes were visualized with Kaplan–Meier curves, and group comparisons were made using log-rank tests. Statistical significance was defined as p < 0.05 for all tests.
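For readers unfamiliar with the Kaplan–Meier estimator mentioned above, the following minimal sketch shows the product-limit calculation: at each observed death time, the running survival probability is multiplied by the fraction of at-risk patients who survive that time, and censored patients leave the risk set without counting as deaths. The `kaplan_meier` helper and the follow-up data are hypothetical; the R “survival” package and GraphPad Prism are not used here.

```python
def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each distinct death time.

    times:  follow-up durations
    events: 1 if death was observed at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Gather every subject whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed  # deaths and censored subjects both leave the risk set
    return curve


# Toy follow-up data (months); illustration only.
toy_times = [5, 8, 8, 12, 15, 20]
toy_events = [1, 1, 0, 1, 0, 1]
km_curve = kaplan_meier(toy_times, toy_events)
print(km_curve)
```

Comparing two such curves (for example, high- versus low-expression groups) is what the log-rank test formalizes.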
Results
Screening of DEGs
Comparative gene expression analysis between two mRNA datasets was performed using R’s Limma package. Data from both datasets were normalized by employing the normalizeBetweenArrays() function from the limma package, using the “quantile” method with its default parameters to minimize technical biases and ensure comparability. Following this preprocessing, the analytical pipeline identified DEGs with results visualized through box plots, PCA plots, volcano plots, and heatmaps (Fig. 1A–H). The two datasets had distinct expression patterns. GSE38959 had 3409 up-regulated and 5072 down-regulated genes. GSE53752 had 2388 up-regulated and 2356 down-regulated genes. DEGs distribution differences suggest biological variations in samples, highlighting the importance of context-specific interpretation in transcriptomic studies. All DEGs met strict biological relevance criteria (|log2 FC|> 1 and p < 0.05).
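The idea behind quantile normalization can be shown with a small sketch: each sample's values are replaced by the mean of the values that share the same rank across all samples, so every sample ends up with an identical distribution. This is a simplified illustration that ignores tied values and missing data, both of which limma's normalizeBetweenArrays() handles; the `quantile_normalize` helper and the toy intensities are hypothetical.

```python
def quantile_normalize(columns):
    """Quantile-normalize equal-length samples (one list per sample).

    Simplified sketch: tied values are not averaged.
    """
    n = len(columns[0])
    sorted_cols = [sorted(col) for col in columns]
    # Mean of the r-th smallest value across all samples.
    rank_means = [sum(sc[r] for sc in sorted_cols) / len(columns)
                  for r in range(n)]
    normalized = []
    for col in columns:
        order = sorted(range(n), key=lambda i: col[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = rank_means[rank]  # each gene keeps its rank, gets the shared value
        normalized.append(out)
    return normalized


# Toy intensities for three genes in two samples; illustration only.
sample_a = [5.0, 2.0, 3.0]
sample_b = [4.0, 1.0, 6.0]
norm_a, norm_b = quantile_normalize([sample_a, sample_b])
print(norm_a, norm_b)
```

After normalization both samples contain the same set of values, which is why the box plots of normalized arrays line up.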
Fig. 1.
A, B Diagrams showing the data quality from GSE38959 and the differences between groups. Boxplot (left) of GSE38959. PCA (right) dimensionality reduction analysis chart of GSE38959. C, D Diagrams showing the quality of the data from GSE53752 and the differences between groups. Boxplot (left) of GSE53752. PCA (right) dimensionality reduction analysis chart of GSE53752. E, F Volcano maps of GSE38959 (left) and GSE53752 (right). G, H Heatmaps of GSE38959 (left) and GSE53752 (right)
GSEA analysis of DEGs in GSE38959 and GSE53752 and cDEGs identification via RRA analysis
GSEA of the GSE38959 and GSE53752 datasets identified DEGs linked to telomere maintenance, PI3K-AKT and MAPK signaling, TP53 regulation, cell cycle dynamics, endocytosis, immune activation, and extracellular matrix reorganization. Intersecting these DEGs with metabolic reprogramming genes yielded cDEGs, categorized as up-regulated or down-regulated. RRA analysis highlighted 106 up-regulated and 90 down-regulated cDEGs, and the 24 most significant cDEGs were CLEC7A, IRS1, RSPO3, ALB, OXTR, WIF1, TP53, SCGB2A2, BIRC5, S100P, SCGB1D2, NPY2R, CXCL11, NEK2, EDN3, PTN, SOX11, GJB2, PIP, SCUBE2, SCGB1D1, FIGF, MUCL1, and CEP55. KEGG and GO analyses revealed that up-regulated cDEGs were tied to chromosome segregation and cytokine signaling, while down-regulated cDEGs were associated with steroid hormone receptors and Wnt-protein interactions. The PI3K-Akt and MAPK pathways emerged as key regulators across datasets.
Machine learning and key genes identification
LASSO regression was employed as the initial step to filter potential cDEGs, yielding 14 candidates (Fig. 2A, B). The SVM-RFE algorithm selected 16 genes with clear discriminative characteristics (Fig. 2D). The RF algorithm was used for further refinement, highlighting 11 genes based on their variable importance scores (Fig. 2C, F). A Venn diagram-based intersection analysis identified 7 hub genes shared by all three methods, including IRS1, CLEC7A, ALB, RSPO3, PCGF6, and SLC25A19 (Fig. 2E). Finally, Friends analysis assessed the functional relatedness of the hub genes, ultimately prioritizing IRS1, CLEC7A, ALB, and RSPO3 as key genes (Fig. 2G). This layered strategy progressively narrowed the gene pool, and requiring agreement across multiple algorithms improved the reliability of the selection.
Fig. 2.
A, B LASSO regression algorithm. D SVM-RFE algorithm. C, F RF algorithm. E A Venn diagram shows the genes common to the three algorithms. G The raincloud plot displays the correlations of the four key genes in the Friends analysis
Expression values of CLEC7A, IRS1, RSPO3 and ALB
Based on the TCGA database, we screened the TNBC tumor samples and paraneoplastic normal tissue samples, and after excluding samples with incomplete clinical information, we finally identified 113 TNBC tumor samples and paired 113 normal tissue samples. Subsequently, we downloaded the gene expression matrix and clinical data such as age, gender, menopausal status, pathological type and stage, survival status, and survival time, and used the R language to complete data cleaning and analysis.
The study assessed the expression levels of four key genes in the TCGA dataset. Paired scatter plots showed that CLEC7A expression was significantly up-regulated in TNBC tumor tissues (Fig. 3D), while the expression levels of IRS1, RSPO3, and ALB were significantly reduced compared with normal tissues (Fig. 3A–C).
Fig. 3.
A–D The paired-sample plots illustrate the expression levels of ALB, IRS1, RSPO3, and CLEC7A in TNBC tissues versus matched normal tissues from the TCGA dataset, with statistical significance denoted by *p < 0.05; **p < 0.01, and ***p < 0.001. E A co-expression heatmap, generated using TCGA data, visualizes the relationship between ALB, IRS1, RSPO3, and CLEC7A genes in TNBC. F A chord diagram, also based on TCGA data, depicts the gene expression patterns of ALB, IRS1, RSPO3, and CLEC7A in TNBC. G A circos plot highlights the chromosomal locations of ALB, IRS1, RSPO3, and CLEC7A. H A clinical heatmap demonstrates the correlation between the expression of these genes and key pathological features, including T stage, N-stage, M stage, and age
A co-expression heatmap of the four genes (Fig. 3E) showed that CLEC7A was positively correlated with RSPO3 and negatively correlated with ALB and IRS1, and the chord plot further illustrated the strength of these associations (Fig. 3F). The circos plot showed the chromosomal locations of the key genes (Fig. 3G), while the clinical heatmap visualized the association between the expression of CLEC7A, RSPO3, IRS1, and ALB and clinical features such as pathological T stage, N stage, M stage, and age (Fig. 3H).
Clinical values of CLEC7A, IRS1, RSPO3 and ALB
To investigate the utility of CLEC7A, IRS1, RSPO3, and ALB expression in the diagnosis of TNBC, we plotted ROC curves for each gene in the TCGA-TNBC cohort (n = 116). The AUC values were 0.857, 0.910, 0.912, and 0.867, respectively; all exceeded 0.6, suggesting that each gene has diagnostic value.
To examine the role of these four genes in prognostic prediction, we analyzed OS in the R environment and constructed K–M survival curves, dividing patients into high- and low-expression groups using optimized cutoff values and examining the hazard ratios and 95% confidence intervals. All four genes were significantly associated with OS (p < 0.0001). In summary, high expression of CLEC7A in TNBC was associated with worse OS, whereas low expression of IRS1, RSPO3, and ALB predicted shorter survival in TNBC patients.
The relationship between CLEC7A, IRS1, RSPO3 and ALB and immune cells
Immunotherapy is gradually becoming a key component in the treatment of triple-negative breast cancer [21]. Numerous studies have confirmed that patients with higher levels of tumor-infiltrating lymphocytes usually have a superior clinical prognosis and prolonged survival [22]. We analyzed the potential of the key genes to predict immunotherapy response in the TCGA-TNBC cohort. Stacked bar charts (Fig. 4A–D) and box plots (Fig. 4E–H) show the proportional distributions of 22 immune cell types in the high- and low-expression groups for CLEC7A, IRS1, RSPO3, and ALB, respectively. CLEC7A correlated positively with M1-type macrophages and negatively with plasma cells and M2-type macrophages. IRS1 had a significant positive correlation with resting memory CD4+ T cells. RSPO3 correlated positively with monocytes, activated memory CD4+ T cells, and naive B cells, and negatively with follicular helper T cells and activated natural killer cells. ALB correlated positively with CD8+ T cells and monocytes and negatively with M2-type macrophages. These results suggest that CLEC7A may be a potential immunotherapeutic target through its influence on the direction of macrophage polarization (promoting M1 polarization and inhibiting M2 polarization), and its high expression may indicate a better response to immunotherapy. The long-term inactivation of resting memory CD4+ T cells in the immunosuppressive microenvironment may lead to tumor immune escape, and high expression of the IRS1 gene may imply that T cell function in the tumor microenvironment is depleted or suppressed, which in turn may affect the clinical effect of immunotherapy. The function of the RSPO3 gene may be microenvironment-specific, and the Wnt signaling pathway regulated by the gene may play an essential role in the balance between T cell activation and natural killer cell inhibition.
This balance needs to be explored in depth. CD8+ T cells are key effector cells in the anti-tumor immune response, and increased CD8+ T cell infiltration is well established to correlate with improved patient prognosis and enhanced immunotherapy response. The ALB gene may indirectly promote CD8+ T cell infiltration into tumors through pathways such as influencing the metabolic microenvironment or modulating the secretion of inflammatory cytokines.
Fig. 4.
A–D Stacked bar charts illustrate the proportions of 22 infiltrating immune cell types in the high- and low-expression groups of CLEC7A, IRS1, RSPO3, and ALB within TNBC tumors. E–H Box plots depict the correlation between CLEC7A, IRS1, RSPO3, and ALB expression and the infiltration of the 22 immune cell types. Note *p < 0.05; **p < 0.01; ***p < 0.001, with p < 0.05 considered statistically significant
These findings highlight the immunomodulatory role of these biomarkers, offering the potential for patient stratification in personalized immunotherapy and revealing intricate gene-immune interactions in TNBC microenvironments.
TF and ceRNA regulatory networks construction of CLEC7A, IRS1, RSPO3, and ALB
By integrating data from the hTFtarget, CHIP_atlas, and KnockTF repositories, we pinpointed two key transcriptional regulators, FOXA1 and SP1, that specifically influence the ALB gene. Applying the same analytical framework across these databases, we also identified BCL6 as the upstream regulator for CLEC7A and SNAI2 as the primary transcription factor governing RSPO3 expression. Cross-referencing the KnockTF, CHIP_atlas, and ENCODE resources uncovered 10 shared upstream regulators for IRS1, namely NR2F2, TBL1XR1, MYC, ZNF143, FOXA1, SP1, TEAD4, USF1, ETS1, and E2F6. On this basis, we mapped regulatory networks that detail the TF-DNA interactions among CLEC7A, IRS1, RSPO3, ALB, and the 12 identified transcription factors.
In the next phase of our study, we used the miRWalk, TargetScan, and miRDB platforms to predict miRNA regulators for each target gene, resulting in detailed miRNA-mRNA interaction profiles. The analysis highlighted varying degrees of regulatory complexity: CLEC7A was linked to 15 distinct miRNAs (hsa-miR-30c-2-3p, hsa-miR-30b-3p, hsa-miR-149-3p, hsa-miR-377-5p, hsa-miR-4284, hsa-miR-3689a-3p, hsa-miR-4728-5p, hsa-miR-6086, hsa-miR-6779-5p, hsa-miR-6780a-5p, hsa-miR-6785-5p, hsa-miR-6788-5p, hsa-miR-1273h-5p, hsa-miR-3689b-3p, hsa-miR-689c), IRS1 to 8 (hsa-miR-492, hsa-miR-148a-3p, hsa-miR-1277-5p, hsa-miR-4789-3p, hsa-miR-5011-5p, hsa-miR-892c-3p, hsa-miR-6867-5p, hsa-miR-8485), and RSPO3 to 6 (hsa-miR-107, hsa-miR-216a-3p, hsa-miR-15b-5p, hsa-miR-27b-3p, hsa-miR-195-5p, hsa-miR-613), while ALB exhibited only one miRNA interaction (hsa-miR-492). These insights facilitated the construction of a ceRNA network, visually represented as a Sankey diagram. Validation of circRNA components using CircBase revealed diverse regulatory capacities: CLEC7A-associated networks included 1 validated circRNA (hsa-circ-0041948), IRS1 networks featured 3 (hsa-circ-0069969, hsa-circ-0041949, hsa-circ-0041947), RSPO3 networks contained 4 (hsa-circ-0069968, hsa-circ-0130607, hsa-circ-0077801, hsa-circ-0069970), and ALB networks contained 5 circRNA components (hsa-circ-0069967, hsa-circ-0130606, hsa-circ-0107702, hsa-circ-0003625, hsa-circ-0041946). Together, the TF-DNA network, miRNA-mRNA interaction network, and ceRNA network analyses outline how CLEC7A, IRS1, RSPO3, and ALB may be regulated at the transcriptional and post-transcriptional levels.
These regulatory hotspots are closely associated with important pathways such as glycolysis, insulin signaling, and lipid metabolism, and the analysis of the experimental data shows remarkable consistency with the metabolic reprogramming phenomenon, which lays the foundation for subsequent functional validation experiments, such as testing metabolic phenotypes by knocking down TFs or miRNAs.
IHC analysis result
IHC was used with the aim of comparing the expression levels of key genes in TNBC tumor tissues with those in normal breast tissues, revealing that CLEC7A, which is mainly localized in the cytoplasm and cell membranes, showed significantly enhanced positive expression in TNBC tumors. In contrast, IRS1, RSPO3, and ALB showed down-regulated expression in tumor tissues. Specifically, IRS1 was mainly distributed in the nucleus and cytoplasm, RSPO3 was located in the cell membrane, and ALB was primarily expressed in the cytoplasm and cell membrane, with particular staining on the cell membrane. The experimental results corroborated the conclusions of this study, and the specific results are shown in Fig. 5.
Fig. 5.
Immunohistochemistry (IHC) findings from the TNBC samples, imaged at 40× magnification under an inverted microscope. The analysis reveals that CLEC7A is up-regulated in the tumor tissue, whereas IRS1, RSPO3, and ALB show down-regulated expression. Note * signifies p < 0.05; ** signifies p < 0.01, and *** signifies p < 0.001, with p < 0.05 indicating statistical significance
Discussion
TNBC is widely recognized for its rapid progression and less favorable prognosis relative to other breast cancer subtypes [23]. Emerging evidence highlights metabolic reprogramming as a critical driver of TNBC progression, enabling the tumor to meet its bioenergetic and biosynthetic needs, maintain redox balance, and fuel oncogenic signaling, proliferation, and metastasis [24]. Advances in computational technologies, including machine learning and network pharmacology, have revolutionized the identification of disease biomarkers and therapeutic targets [25]. Leveraging transcriptomic and clinical data from the GEO and TCGA databases, we employed machine learning algorithms alongside Friends analysis to pinpoint four key metabolic reprogramming genes (CLEC7A, IRS1, RSPO3, and ALB), which were subsequently validated in additional datasets. We assessed their expression patterns, clinical relevance, and potential regulatory mechanisms through comprehensive analyses.
CLEC7A, a pattern recognition receptor involved in innate immunity [26], has recently been implicated in immune homeostasis and cancer [27]. However, its role in metabolic reprogramming within tumors remains underexplored [28]. Our findings reveal elevated CLEC7A expression in TNBC tissues, suggesting its potential as both a diagnostic marker and a prognostic indicator. IRS1, a central mediator of insulin [29] and growth factor receptor signaling [30], has been linked to breast cancer progression [31]. We observed reduced IRS1 expression in TNBC, which is consistent with prior studies and correlates with higher proliferation rates in dedifferentiated tumors. RSPO3, a secreted protein with oncogenic roles in various cancers [32, 33], has been extensively studied in colorectal cancer [34] but remains poorly understood in TNBC. Eline J et al. [35] found that RSPO3 copy number amplifications occur in about 25% of breast cancer patients with worse outcomes, highlighting its potential as a therapeutic target. Meanwhile, Caitlin B. Conboy et al. [36] linked RSPO2 expression to a distinct subset of TNBC. However, the role that RSPO3 plays in TNBC remains unexplored. Our analysis showed down-regulated RSPO3 expression in TNBC tissues, positioning it as a potential indicator for diagnosis and prognosis as well as a therapeutic target. ALB, encoding albumin, plays multifaceted roles in plasma homeostasis, nutrient transport, and antioxidant defense [37]. It is linked to metabolic changes in hepatocellular carcinoma and is inversely associated with colorectal cancer risk [38]. Jia et al. [39] emphasize its predictive value in endometrial cancer, while Chen et al. [40] note its role in immune activation and its reduced levels in breast tumors, indicating therapeutic potential. Although its implications in breast cancer remain unclear, our study confirms the low expression of ALB in TNBC, supporting its utility as a diagnostic and prognostic marker.
Our integrative approach identified CLEC7A, IRS1, RSPO3, and ALB as key metabolic reprogramming genes in TNBC and validated them as novel diagnostic and prognostic biomarkers. Their involvement in immune regulation, insulin signaling, and metabolic homeostasis aligns with established pathways driving TNBC's aggressive behavior. However, our study is not without limitations. Variations across databases may introduce bias, and the functional roles of these genes in invasion and metastasis remain to be explored. Additionally, in vitro and in vivo experiments are needed to validate our findings and enhance their reliability. Despite these limitations, our work provides insights into the functions of these genes in TNBC and lays a foundation for precision diagnostics and targeted therapies. Future research should focus on experimental validation to elucidate their mechanisms in tumor progression and therapeutic resistance, ultimately advancing TNBC treatment strategies.
Conclusion
To identify promising targets for TNBC, this study integrated RRA, machine learning algorithms, and Friends analysis. Initial filtering identified several genes (CLEC7A, IRS1, RSPO3, and ALB), which were then subjected to IHC for preliminary validation and may serve as novel biomarkers for the diagnosis and prognosis of TNBC, with potential therapeutic applications. Although further in vitro and in vivo studies are needed to unravel the regulatory mechanisms of CLEC7A, IRS1, RSPO3, and ALB expression, their strong diagnostic and prognostic properties, as well as their associated biological pathways, make these genes promising candidates for precision oncology approaches. In the future, it will be crucial to explore their mechanisms of action in depth.
Acknowledgements
I am deeply grateful for the invaluable guidance and assistance from my mentor, colleagues, and fellow staff in the Pathology Department of Xuzhou Central Hospital throughout the research and manuscript preparation. I would also like to thank Xuzhou Central Hospital and the Xuzhou Science and Technology Bureau for their financial support.
Author contributions
H.K. and L.S. contributed to the conceptualization and methodology. H.K., M.P., and L.S. handled software, validation, data curation, and visualization. H.K. led the formal analysis, investigation, and original draft writing. M.P. and H.K. reviewed and edited the manuscript. L.S. supervised and managed the project, securing funding. All authors approved the final manuscript.
Funding
The study received financial support from the Xuzhou Science and Technology Bureau and Xuzhou Central Hospital, Grant Number KC23186.
Data availability
The data that support the findings of this study are available in PubMed at https://pubmed.ncbi.nlm.nih.gov/, reference numbers [7, 8]. These data were derived from the following resources available in the public domain: GSE38959 (https://pubmed.ncbi.nlm.nih.gov/23254957/), GSE53752 (https://pubmed.ncbi.nlm.nih.gov/23049873/), and GSE32541 (https://pubmed.ncbi.nlm.nih.gov/22553414/).
Declarations
Competing interests
The authors declare that they have no competing interests.
Ethical approval
All bioinformatics data incorporated in this study were sourced from publicly accessible databases. There are no moral, legal, or other relevant conflicts of interest. Ethical approval was obtained for the patients included in these databases. The experimental data do not disclose patients' personal information and do not harm the patients' interests. This study received approval from the Biomedical Research Ethics Review Committee of Xuzhou Central Hospital (Version: V1.0; Approval Date: 2024-01-01; Ethics Number: XZXY-LK-20240926-0150). This study complied with the Declaration of Helsinki and all applicable ethical guidelines.
Consent to participate
Informed consent was obtained from all subjects involved in the study.
Consent for publication
Not applicable.
References
- 1. Fu M, Peng Z, Wu M, et al. Current and future burden of breast cancer in Asia: a GLOBOCAN data analysis for 2022 and 2050. Breast. 2025;79:103835. 10.1016/j.breast.2024.103835.
- 2. Zhao S, Zuo WJ, Shao ZM, et al. Molecular subtypes and precision treatment of triple-negative breast cancer. Ann Transl Med. 2020;8(7):499. 10.21037/atm.2020.03.194.
- 3. Derakhshan F, Reis-Filho JS. Pathogenesis of triple-negative breast cancer. Annu Rev Pathol. 2022;17:181–204. 10.1146/annurev-pathol-042420-093238.
- 4. Li Y, Zhang H, Merkher Y, et al. Recent advances in therapeutic strategies for triple-negative breast cancer. J Hematol Oncol. 2022;15(1):121. 10.1186/s13045-022-01341-0.
- 5. Foldi J, Geyer CE Jr. Precision medicine for metastatic TNBC: the FUTURE is now. Cell Res. 2023;33(7):491–2. 10.1038/s41422-023-00815-1.
- 6. Liu D, Wang Y, Li X, et al. Participation of protein metabolism in cancer progression and its potential targeting for the management of cancer. Amino Acids. 2023;55(10):1223–46. 10.1007/s00726-023-03316-y.
- 7. Komatsu M, Yoshimaru T, Matsuo T, et al. Molecular features of triple negative breast cancer cells by genome-wide gene expression profiling analysis. Int J Oncol. 2013;42(2):478–506. 10.3892/ijo.2012.1744.
- 8. Kuo WH, Chang YY, Lai LC, et al. Molecular characteristics and metastasis predictor genes of triple-negative breast cancer: a clinical study of triple-negative breast carcinomas. PLoS ONE. 2012;7(9):e45831. 10.1371/journal.pone.0045831.
- 9. Subramanian A, Tamayo P, Mootha VK, et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci U S A. 2005;102(43):15545–50. 10.1073/pnas.0506580102.
- 10. Blake JA, Dolan M, Drabkin H, et al. Gene Ontology annotations and resources. Nucleic Acids Res. 2013;41(Database issue):D530–5. 10.1093/nar/gks1050.
- 11. Kanehisa M, Sato Y, Kawashima M, et al. KEGG as a reference resource for gene and protein annotation. Nucleic Acids Res. 2016;44(D1):D457–62. 10.1093/nar/gkv1070.
- 12. Yu G, Wang LG, Han Y, et al. clusterProfiler: an R package for comparing biological themes among gene clusters. OMICS. 2012;16(5):284–7. 10.1089/omi.2011.0118.
- 13. Chen H, Boutros PC. VennDiagram: a package for the generation of highly-customizable Venn and Euler diagrams in R. BMC Bioinform. 2011;12:35. 10.1186/1471-2105-12-35.
- 14. Lawrence KD, Arthur JL. Robust nonlinear regression. In: Robust regression. Routledge; 2019. pp. 59–86.
- 15. Tibshirani R. Regression shrinkage and selection via the lasso. J R Stat Soc Series B Stat Methodol. 1996;58(1):267–88. 10.1111/j.2517-6161.1996.tb02080.x.
- 16. Sanz H, Valim C, Vegas E, et al. SVM-RFE: selection and visualization of the most relevant features through non-linear kernels. BMC Bioinform. 2018;19:1–18. 10.1186/s12859-018-2451-4.
- 17. Breiman L. Random forests. Mach Learn. 2001;45:5–32. 10.1023/A:1010933404324.
- 18. Duan Y, Ni S, Zhao K, et al. Immune cell infiltration and the genes associated with ligamentum flavum hypertrophy: identification and validation. Front Cell Dev Biol. 2022;10:914781. 10.3389/fcell.2022.914781.
- 19. Newman AM, Liu CL, Green MR, et al. Robust enumeration of cell subsets from tissue expression profiles. Nat Methods. 2015;12(5):453–7. 10.1038/nmeth.3337.
- 20. Hussaini HM, Seo B, Rich AM. Immunohistochemistry and Immunofluorescence. Methods Mol Biol. 2023;2588:439–50. 10.1007/978-1-0716-2780-8_26.
- 21. Shi Y, Guo Z, Wang Q, et al. Prognostic value of tumor-infiltrating lymphocyte subtypes and microorganisms in triple-negative breast cancer. J Cancer Res Ther. 2024;20(7):1983–90. 10.4103/jcrt.jcrt_41_24.
- 22. Xiong W, Li C, Wan B, et al. N6-methyladenosine regulator-mediated immune patterns and tumor microenvironment infiltration characterization in glioblastoma. Front Immunol. 2022;13:819080. 10.3389/fimmu.2022.819080.
- 23. Leon-Ferre RA, Goetz MP. Advances in systemic therapies for triple negative breast cancer. BMJ. 2023;381:e071674. 10.1136/bmj-2022-071674.
- 24. Wang Z, Jiang Q, Dong C. Metabolic reprogramming in triple-negative breast cancer. Cancer Biol Med. 2020;17(1):44–59. 10.20892/j.issn.2095-3941.2019.0210.
- 25. Swanson K, Wu E, Zhang A, et al. From patterns to patients: advances in clinical machine learning for cancer diagnosis, prognosis, and treatment. Cell. 2023;186(8):1772–91. 10.1016/j.cell.2023.01.035.
- 26. Patel SJ, Sanjana NE, Kishton RJ, et al. Identification of essential genes for cancer immunotherapy. Nature. 2017;548(7669):537–42. 10.1038/nature23477.
- 27. Dambuza IM, Brown GD. C-type lectins in immunity: recent developments. Curr Opin Immunol. 2015;32:21–7. 10.1016/j.coi.2014.12.002.
- 28. Kalia N, Singh J, Kaur M. The role of dectin-1 in health and disease. Immunobiology. 2021;226(2):152071. 10.1016/j.imbio.2021.152071.
- 29. Sun XJ, Rothenberg P, Kahn CR, et al. Structure of the insulin receptor substrate IRS-1 defines a unique signal transduction protein. Nature. 1991;352(6330):73–7. 10.1038/352073a0.
- 30. Duggan C, Baumgartner RN, Baumgartner KB, et al. Genetic variation in TNFα, PPARγ, and IRS-1 genes, and their association with breast-cancer survival in the HEAL cohort. Breast Cancer Res Treat. 2018;168(2):567–76. 10.1007/s10549-017-4621-x.
- 31. Schnarr B, Strunz K, Ohsam J, et al. Down-regulation of insulin-like growth factor-I receptor and insulin receptor substrate-1 expression in advanced human breast cancer. Int J Cancer. 2000;89(6):506–13. 10.1002/1097-0215(20001120)89:6<506::aid-ijc7>3.0.co;2-f.
- 32. Liu TT, Shi X, Hu HW, et al. Endothelial cell-derived RSPO3 activates Gαi1/3-Erk signaling and protects neurons from ischemia/reperfusion injury. Cell Death Dis. 2023;14(10):654. 10.1038/s41419-023-06176-2.
- 33. Gu H, Tu H, Liu L, et al. RSPO3 is a marker candidate for predicting tumor aggressiveness in ovarian cancer. Ann Transl Med. 2020;8(21):1351. 10.21037/atm-20-3731.
- 34. Ter Steege EJ, Doornbos LW, Haughton PD, et al. R-spondin-3 promotes proliferation and invasion of breast cancer cells independently of Wnt signaling. Cancer Lett. 2023;568:216301. 10.1016/j.canlet.2023.216301.
- 35. Ter Steege EJ, Boer M, Timmer NC, et al. R-spondin-3 is an oncogenic driver of poorly differentiated invasive breast cancer. J Pathol. 2022;258(3):289–99. 10.1002/path.5999.
- 36. Conboy CB, Vélez-Reyes GL, Rathe SK, et al. R-spondins 2 and 3 are overexpressed in a subset of human colon and breast cancers. DNA Cell Biol. 2021;40(1):70–9. 10.1089/dna.2020.5585.
- 37. Taverna M, Marie AL, Mira JP, et al. Specific antioxidant properties of human serum albumin. Ann Intensive Care. 2013;3(1):4. 10.1186/2110-5820-3-4.
- 38. Wheeler DA, Roberts LR. Comprehensive and integrative genomic characterization of hepatocellular carcinoma. Cell. 2017;169(7):1327. 10.1016/j.cell.2017.05.046.
- 39. Lei J, Wang Y, Guo X, et al. Low preoperative serum ALB level is independently associated with poor overall survival in endometrial cancer patients. Future Oncol. 2020;16(8):307–16. 10.2217/fon-2019-0732.
- 40. Chen L, Wei W, Sun J, et al. Cordycepin enhances anti-tumor immunity in breast cancer by enhanceing ALB expression. Heliyon. 2024;10(9):e29903. 10.1016/j.heliyon.2024.e29903.