Discover Oncology. 2025 Nov 3;16:2004. doi: 10.1007/s12672-025-03850-z

Integrated analysis of single-cell and bulk RNA sequencing identifies GSTM4 as a tumor suppressor and prognostic biomarker in breast cancer

Zhen Yu 1, Weikang Yun 2, Qian Zhang 1, Xianglan Li 1, Chunbo Zhao 1, Xu Liu 1
PMCID: PMC12583255  PMID: 41182420

Abstract

Background

Breast cancer (BC) continues to be a predominant cause of cancer-related deaths among women globally, highlighting the complexity of the disease, its propensity for metastasis, and resistance to conventional therapies. The discovery of clinical biomarkers and therapeutic targets is crucial for advancing BC treatment strategies.

Methods

We conducted a Transcriptome-Wide Association Study (TWAS) and Weighted Gene Co-expression Network Analysis (WGCNA) to pinpoint genes associated with BC, with GSTM4 emerging as a candidate tumor suppressor. To further elucidate GSTM4’s role, we analyzed its mRNA and protein expression using multiple databases. The prognostic significance of GSTM4 was evaluated using KM-Plotter tools, while its epigenetic profile was examined through GSCA. Protein–Protein Interaction (PPI) networks were constructed with GeneMANIA and STRING to explore GSTM4’s functional interactions. The ssGSEA algorithm and TIMER were employed to link GSTM4 with the immune context of BC. Single-cell RNA sequencing from GSE148673 was analyzed to investigate GSTM4’s influence on the tumor immune microenvironment. In vitro experiments were designed to assess GSTM4’s impact on BC cell behavior.

Results

GSTM4 expression is diminished in BC tissues, and higher expression is linked to better patient outcomes. Epigenetic studies indicate that GSTM4 silencing may partly result from promoter deletion mutations. The PPI network and enrichment analysis are consistent with GSTM4’s multifaceted role in glutathione metabolism, detoxification, antioxidant defense, and oxidoreductase reactions. GSTM4 expression shows a negative correlation with immune checkpoint gene expression and is associated with an enhanced antitumor immune response in BC as well as increased sensitivity to anticancer drugs. Single-cell RNA sequencing analysis suggests an association of GSTM4 with the prevention of epithelial transformation, the mitigation of BC cell malignancy, and the promotion of tumor microenvironment interactions. In vitro studies confirm GSTM4’s inhibitory effects on BC cell proliferation, invasion, and stemness.

Conclusion

Integrated analyses identify GSTM4 as a potential tumor suppressor and prognostic biomarker in BC, suggesting its promise for advancing therapies.

Supplementary Information

The online version contains supplementary material available at 10.1007/s12672-025-03850-z.

Keywords: GSTM4, Breast cancer, Tumor suppressor, scRNA-seq, Prognostic biomarker

Introduction

Breast cancer (BC) has emerged as the most common cancer among women worldwide, with a significant increase in both incidence and mortality rates. In the United States alone, an estimated 297,790 new BC cases were diagnosed and 43,170 patients died of BC in 2023, highlighting its prevalence and severity [1]. In recent years, with the continuous improvement of diagnosis and therapy, the treatment of BC has made great progress. However, the heterogeneity of BC presents a formidable clinical challenge, as patients with similar clinical stages and pathological grades can have vastly different outcomes [2]. Meanwhile, the prognosis of patients with advanced and treatment-resistant BC is still unsatisfactory. Thus, the identification of reliable tumor markers and novel therapeutic targets to halt BC progression is of great significance.

Glutathione S-transferase Mu 4 (GSTM4) belongs to the µ subfamily of the glutathione S-transferases (GSTs), a family of enzymes that specialize in detoxification. These enzymes catalyze the conjugation of glutathione to various agents, including carcinogens, drugs, toxic materials, and oxidative stress byproducts, thereby diminishing their potential to cause cellular damage [3]. Dysfunction of GSTM4 is linked to a spectrum of human diseases. Research indicates that GSTM4 plays a role in the production of maresin conjugate in tissue regeneration 1 (MCTR1), potentially providing hepatoprotection and anti-inflammatory benefits in nonalcoholic fatty liver disease [4]. GSTM4 has been shown to mitigate the progression of endometriosis [5]. Additionally, it is involved in tumorigenesis and acts as a tumor suppressor in hepatoma and pancreatic cancer [3, 6]. GSTM4 is located on chromosome 1p13.3, close to the protective allele rs17024629. The rs17024629 variant is significantly correlated with elevated expression of GSTM4, which lies 51 kb downstream of the SNP [7]. This suggests that GSTM4 may play an anti-tumor role in BC. However, the precise expression patterns and biological functions of GSTM4 in BC are not yet fully elucidated.

In this research, we utilized Transcriptome-Wide Association Study (TWAS) and Weighted Gene Coexpression Network Analysis (WGCNA) to pinpoint novel BC risk genes, identifying GSTM4 as a low-risk gene with tumor suppressor potential. Subsequent analyses of GSTM4’s expression, prognostic significance, epigenetic regulation, protein–protein interactions, biological functions, immune context, and drug sensitivity in BC, based on RNA-seq and scRNA-seq data from multiple BC databases, further supported this finding. Validation through western blot in our BC samples, along with in vitro assays, confirmed GSTM4’s role in inhibiting BC cell proliferation, migration, invasion, and stemness. Therefore, by integrating bioinformatic discovery with mechanistic exploration, immune microenvironment analysis, and experimental assessment, this multi-omics study defines the tumor-suppressive function of GSTM4 and proposes its potential as a prognostic biomarker and therapeutic target in breast cancer.

Materials and methods

Datasets

The RNA expression data and clinical information of 1098 BRCA samples and 113 adjacent control tissues were retrieved from The Cancer Genome Atlas (TCGA) database (https://www.cancer.gov/tcga) and converted to transcripts per million (TPM) for further analysis [8]. For survival analysis, samples with a survival time of 0 days were removed, leaving 1076 BRCA samples.
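The preprocessing described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes the starting point is raw read counts plus gene lengths in kilobases (the source does not state the input format), and the function names and the `os_days` field are hypothetical.

```python
def counts_to_tpm(counts, lengths_kb):
    """Convert raw read counts for one sample to TPM.

    counts:     dict mapping gene -> read count (assumed input format)
    lengths_kb: dict mapping gene -> gene length in kilobases
    """
    # Step 1: reads per kilobase (length normalisation).
    rpk = {gene: counts[gene] / lengths_kb[gene] for gene in counts}
    total = sum(rpk.values())
    if total == 0:
        raise ValueError("sample has no mapped reads")
    # Step 2: rescale so that the values of one sample sum to one million.
    return {gene: value / total * 1e6 for gene, value in rpk.items()}


def drop_zero_survival(samples):
    """Keep only samples whose survival time is greater than 0 days."""
    return [s for s in samples if s["os_days"] > 0]


if __name__ == "__main__":
    tpm = counts_to_tpm({"GSTM4": 10, "GGT7": 20}, {"GSTM4": 1.0, "GGT7": 2.0})
    print(tpm)
    cohort = [{"id": "s1", "os_days": 0}, {"id": "s2", "os_days": 730}]
    print(drop_zero_survival(cohort))
```

Because TPM values sum to one million within each sample, they are comparable across samples in a way that raw counts are not.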

Mendelian randomization data: (1) GWAS data of breast cancer were obtained from the BCAC Consortium, which included BC cases and 105,974 controls; (2) SNP weights from BC tissues and normal breast tissues were retrieved from TWAS FUSION (http://gusevlab.org/projects/fusion/#reference-functional-data).

Transcriptome-wide association study (TWAS)

TWAS was performed using FUSION software with default settings, as previously reported [9]. Colocalization analysis complements TWAS by testing whether the predicted expression of a gene and the trait share a common causal variant. The coloc R package was used for colocalization analysis with a 0.5 Mb window, assessing whether the GWAS and TWAS associations share a causal SNP for all genes that reached transcriptome-wide significance (p < 0.05) and an effect size cutoff of |log₂FC| > 1. Additionally, FUSION was used to estimate the posterior probability (PP) of shared causal SNPs between GWAS and TWAS associations.

Constructing co-expression networks

Differentially expressed genes (DEGs) were first identified using a significance threshold of |log₂FC| >1 with an adjusted p-value (adj.p) < 0.05. Subsequently, we constructed the gene co-expression network using the “WGCNA” package in R software, following the protocol established in previous studies [10]. The analysis employed the following key parameters: (1) a soft thresholding power of 6 was selected based on scale-free topology criterion analysis to ensure the network exhibited scale-free properties; (2) module detection was performed using the dynamic tree cut algorithm with a minimum module size of 30 genes to maintain biological relevance; (3) gene co-expression relationships were quantified using the Topological Overlap Matrix (TOM), which served as the basis for module partitioning; and (4) all correlation analyses were calculated using Spearman’s correlation method, and correlations with p-values < 0.05 were considered statistically significant. These well-defined parameters and standardized procedures were implemented to ensure the reproducibility and reliability of our results.
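To make parameters (1) and (3) concrete, the sketch below shows how a soft-thresholding power turns a correlation matrix into a weighted adjacency matrix, and how the topological overlap is derived from it. It is an illustrative pure-Python version that assumes an unsigned network (the source does not state the network type); the study itself used the WGCNA R package, and the function names here are hypothetical.

```python
def soft_threshold_adjacency(corr, power=6):
    """Unsigned weighted adjacency: a_ij = |cor_ij| ** power, with a zero diagonal."""
    n = len(corr)
    return [[0.0 if i == j else abs(corr[i][j]) ** power for j in range(n)]
            for i in range(n)]


def topological_overlap(adj):
    """Topological Overlap Matrix for an adjacency matrix with a zero diagonal.

    TOM_ij = (sum_u a_iu * a_uj + a_ij) / (min(k_i, k_j) + 1 - a_ij)
    where k_i is the connectivity (row sum) of gene i.
    """
    n = len(adj)
    connectivity = [sum(row) for row in adj]
    tom = [[1.0] * n for _ in range(n)]  # each gene fully overlaps with itself
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            # Shared neighbours; u == i and u == j contribute zero (zero diagonal).
            shared = sum(adj[i][u] * adj[u][j] for u in range(n))
            denom = min(connectivity[i], connectivity[j]) + 1.0 - adj[i][j]
            tom[i][j] = (shared + adj[i][j]) / denom
    return tom


if __name__ == "__main__":
    # Toy correlation matrix for three genes (made-up values).
    corr = [[1.0, 0.9, 0.8],
            [0.9, 1.0, 0.7],
            [0.8, 0.7, 1.0]]
    adj = soft_threshold_adjacency(corr, power=6)
    for row in topological_overlap(adj):
        print([round(v, 4) for v in row])
```

Raising correlations to a power suppresses weak correlations far more than strong ones, which is what pushes the network toward scale-free topology; modules are then cut from a tree built on 1 − TOM.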

Prognosis analysis

Univariate Cox regression analysis was performed, and the results, including p-values, were displayed as forest plots using the “forestplot” R package. Survival analysis of target genes in the TCGA dataset covered overall survival (OS), disease-specific survival (DSS), and progression-free interval (PFI), using Kaplan-Meier (KM) survival analysis assessed by the log-rank test. The KM-Plotter database (https://kmplot.com/analysis/) was used to further verify the relationship between GSTM4 and distant metastasis-free survival (DMFS) and relapse-free survival (RFS) in BC.
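As a reminder of what a KM curve represents, the following sketch computes the Kaplan-Meier product-limit estimate from follow-up times and event indicators. It is a generic illustration with made-up data, not the analysis code used in the study, and it omits the log-rank test.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate.

    times:  follow-up time for each patient
    events: 1 if the event (e.g. death) was observed, 0 if censored
    Returns a list of (time, survival_probability) at each event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all patients who share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            # Multiply by the conditional probability of surviving past t.
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        # Both deaths and censored patients leave the risk set after t.
        at_risk -= leaving
    return curve


if __name__ == "__main__":
    # Four hypothetical patients; the second one is censored at t = 2.
    for t, s in kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1]):
        print(t, round(s, 3))
```

Censored patients never lower the curve directly; they only shrink the risk set, which makes later events weigh more.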

UALCAN database

The UALCAN database (https://ualcan.path.uab.edu/analysis.html) was used to compare GSTM4 protein expression between BC tissues and normal mammary gland tissue.

Human protein atlas (HPA) database

Immunohistochemistry staining images of GSTM4 were obtained from the Human Protein Atlas (HPA) database (https://www.proteinatlas.org/), specifically using antibodies HPA048652 and HPA055973. The selected representative images correspond to immunohistochemical staining in ductal carcinoma tissues.

GeneMANIA and STRING database

The GeneMANIA database (http://genemania.org/) was used to explore and visualize the top 20 proteins interacting with GSTM4 proteins. The analysis was performed using the database’s default threshold settings. To construct a Protein–Protein Interaction (PPI) network, the STRING database (https://string-db.org/) was employed, with an interaction score threshold set to > 0.4 to define meaningful interactions.

Gene set enrichment analysis (GSEA)

The TCGA-BRCA dataset was divided into high risk and low risk groups according to the GSTM4 risk score. We then performed gene set enrichment analysis (GSEA) to explore the pathway of significant enrichment in the high- and low-risk groups of GSTM4. The initial phase involved a thorough differential analysis comparing the gene expression profiles across these two groups, followed by a ranking of genes based on their log2-fold change in expression. Subsequently, we employed GSEA (version 3.0) to scrutinize the signal transduction pathways that are associated with the GSTM4 high- and low-risk score groups. Pathways that emerged as significantly enriched in either group were subjected to stringent statistical thresholds, with a normalized enrichment score (NES) of at least 1 and a p-value < 0.05 and FDR < 0.25. The molecular signature dataset used for the KEGG pathway was “c2.cp.KEGG.v7.4.symbols.gmt”. One thousand permutations were used in the GSEA with gene set permutations.
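The core statistic behind GSEA, the running-sum enrichment score over a ranked gene list, can be sketched as below. This is a simplified illustration of the weighted Kolmogorov-Smirnov-like statistic only; it does not compute the NES, permutation p-values, or FDR reported by the GSEA software, and the gene names in the usage example are placeholders.

```python
def enrichment_score(ranked_genes, scores, gene_set, weight=1.0):
    """Running-sum enrichment score for one gene set.

    ranked_genes: genes ordered by descending ranking metric (e.g. log2 fold change)
    scores:       ranking metric for each gene, in the same order
    gene_set:     set of genes belonging to the pathway
    """
    in_set = [gene in gene_set for gene in ranked_genes]
    n_miss = in_set.count(False)
    hit_total = sum(abs(s) ** weight for s, hit in zip(scores, in_set) if hit)
    if hit_total == 0 or n_miss == 0:
        raise ValueError("gene set must contain some, but not all, ranked genes")

    running = 0.0
    best = 0.0
    for s, hit in zip(scores, in_set):
        if hit:
            running += abs(s) ** weight / hit_total  # step up, weighted by the metric
        else:
            running -= 1.0 / n_miss                  # step down by a constant
        if abs(running) > abs(best):
            best = running                           # maximum deviation from zero
    return best


if __name__ == "__main__":
    genes = ["g1", "g2", "g3", "g4"]
    log2fc = [2.0, 1.0, -1.0, -2.0]
    print(enrichment_score(genes, log2fc, {"g1", "g2"}))   # set at the top of the list
    print(enrichment_score(genes, log2fc, {"g3", "g4"}))   # set at the bottom of the list
```

A positive score indicates that the set is concentrated at the top of the ranking, and a negative score that it is concentrated at the bottom.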

GSTM4 gene mutations analysis

The Gene Set Cancer Analysis (GSCA) database (http://bioinfo.life.hust.edu.cn/GSCA/#/) was used to investigate the main factors affecting the aberrantly low expression of GSTM4 in BC and the relationships among GSTM4 mRNA expression, copy number variation, and mutations in TCGA-BRCA samples.

Prediction of Immunogenomic landscape

Single-sample gene set enrichment analysis (ssGSEA), implemented in the R package “GSVA”, was used to analyze the correlation between GSTM4 expression and the infiltration of immune cell types. The TIMER database (https://cistrome.shinyapps.io/timer/) was used to explore the correlation between GSTM4 expression and tumor-infiltrating immune cells.
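Correlating GSTM4 expression with a per-sample infiltration score reduces to a rank correlation between two vectors. The sketch below implements Spearman's coefficient, on the assumption that the Spearman method stated for the co-expression analysis also applies here; the helper names and the toy values are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda idx: values[idx])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    gstm4_tpm = [1.2, 3.4, 2.2, 5.0, 4.1]          # toy expression values
    nk_score = [0.10, 0.32, 0.18, 0.45, 0.30]      # toy infiltration scores
    print(round(spearman(gstm4_tpm, nk_score), 3))
```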

Single-cell RNA (scRNA) analysis

The single-cell sequencing data of six breast cancer samples were obtained from the GEO database (GSE148673) [11]. Raw data were processed using the Seurat R package in RStudio, where cells with fewer than 200 genes, more than 4000 genes, or mitochondrial gene content exceeding 15% were filtered out. Batch effects were corrected using the Harmony R package, and the top 3000 highly variable genes were identified using the `FindVariableFeatures` function. Cell subpopulations were manually annotated based on marker gene expression, and UMAP was employed for dimensionality reduction and visualization. To analyze intercellular communication, the CellChat package was utilized, with ligand-receptor interactions inferred based on a curated database of known interactions. Signaling pathways were quantified by calculating communication probabilities and network centrality measures (e.g., outgoing, incoming, and overall communication strength) for each cell type. Additionally, pathway enrichment analysis was performed using the clusterProfiler package to identify significantly enriched pathways in the context of intercellular communication. Statistical significance was assessed through permutation testing (n = 1000 permutations) to ensure robustness, and results were visualized using heatmaps, circle plots, and network diagrams to enhance interpretability and reproducibility.
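The cell-level quality-control thresholds above amount to a simple filter. The sketch below expresses them in plain Python for clarity; the study applied them in Seurat (R), and the field names `n_genes` and `pct_mito` are hypothetical.

```python
def passes_qc(n_genes, pct_mito, min_genes=200, max_genes=4000, max_pct_mito=15.0):
    """Return True if a cell survives the QC thresholds described in the text.

    Cells are removed if they express fewer than min_genes genes, more than
    max_genes genes, or have mitochondrial content above max_pct_mito percent.
    """
    return min_genes <= n_genes <= max_genes and pct_mito <= max_pct_mito


def filter_cells(cells):
    """Keep the cells (dicts with 'n_genes' and 'pct_mito') that pass QC."""
    return [c for c in cells if passes_qc(c["n_genes"], c["pct_mito"])]


if __name__ == "__main__":
    cells = [
        {"barcode": "c1", "n_genes": 150, "pct_mito": 3.0},    # too few genes
        {"barcode": "c2", "n_genes": 2500, "pct_mito": 8.0},   # kept
        {"barcode": "c3", "n_genes": 5200, "pct_mito": 4.0},   # possible doublet
        {"barcode": "c4", "n_genes": 1800, "pct_mito": 22.0},  # high mitochondrial content
    ]
    print([c["barcode"] for c in filter_cells(cells)])
```

The lower gene bound removes empty droplets and debris, the upper bound removes likely doublets, and the mitochondrial cap removes stressed or dying cells.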

Human breast cancer specimen and cells

Human breast cancer specimens and adjacent normal breast specimens were collected during surgery at the Harbin Medical University Cancer Hospital from 12 patients diagnosed with BC who had not received neoadjuvant therapy. The tissue specimens, snap-frozen in liquid nitrogen and stored at −80 °C, were used for protein extraction and subsequent validation by Western blotting. Written informed consent for the use of clinical specimens and data in this study was obtained from each patient. This study was approved by the ethics committee of the Harbin Medical University Cancer Hospital.

All cells used in this study were purchased from the Cell Bank of the Chinese Academy of Sciences (Shanghai, China). MCF-7 cells were cultured in MEM with 10% FBS. MDA-MB-231 cells were maintained in L-15 with 10% FBS.

Cell transfection

The lentiviral vectors for GSTM4 overexpression and knockdown were designed and constructed by GenePharma (Shanghai, China). Viral packaging was performed in the HEK293T cell line by co-transfection with a three-plasmid system. Polybrene was added during transduction at a final concentration of 8 µg/mL to enhance infection efficiency. At 24 h post-transfection, puromycin selection was applied for 7 days (MDA-MB-231 cells: 2 µg/mL; MCF-7 cells: 1.5 µg/mL). Detailed experimental parameters included 8 µg of overexpression vector, 10 nM shRNA, and Lipofectamine 2000 as the transfection reagent. Finally, knockdown and overexpression efficiency was validated by quantitative real-time PCR. The shRNA sequences targeting GSTM4 are listed as follows:

GSTM4#sh1: GAAACTGAAGCCAGAATACTT

GSTM4#sh2: GCCCTGACTTTGAGAAACTGA

GSTM4#sh3: TATGGACGTCTCCAATCAGCT

RNA extraction and qPCR analysis

Total RNA was extracted from cultured cells with TRIzol reagent (Takara, Osaka, Japan). Subsequently, reverse transcription was performed using 1 µg of total RNA with the PrimeScript RT Reagent Kit with gDNA Eraser (Takara, Kyoto, Japan) to generate cDNA. Quantitative real-time PCR was then carried out using SYBR Premix ExTaq™ II (Takara) on the Applied Biosystems 7500 Fast Real-Time PCR System (ABI7500), following the manufacturer’s instructions. The relative RNA expression levels were quantified using the 2^(−ΔΔCt) method. β-Actin was used as the reference gene with the following primers:

F: ATGGCCACGGCTGCTTCCAGC (5ʹ → 3ʹ)

R: CATGGTGGTGCCGCCAGACAG (5ʹ → 3ʹ)

The primers for the target gene in qPCR reactions are listed as follows:

F: AGAGGAGAAGATTCGTGTGGA (5ʹ → 3ʹ)

R: TGCTGCATCATTGTAGGAAGTT (5ʹ → 3ʹ)
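The 2^(−ΔΔCt) quantification mentioned above can be written out as a short calculation. The sketch below is illustrative: the Ct values are invented, and it assumes roughly 100% amplification efficiency for both primer pairs, which is the standard assumption of this method.

```python
def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_control, ct_ref_control):
    """Relative expression by the 2^(-ddCt) method.

    dCt  = Ct(target) - Ct(reference gene), computed per condition
    ddCt = dCt(sample) - dCt(control)
    """
    delta_sample = ct_target_sample - ct_ref_sample
    delta_control = ct_target_control - ct_ref_control
    delta_delta = delta_sample - delta_control
    return 2.0 ** (-delta_delta)


if __name__ == "__main__":
    # Hypothetical Ct values: GSTM4 vs. beta-actin in knockdown and control cells.
    fold = relative_expression(ct_target_sample=26.0, ct_ref_sample=18.0,
                               ct_target_control=24.0, ct_ref_control=18.0)
    print(fold)  # a value below 1 means lower expression than in the control
```

Each unit increase in ΔΔCt corresponds to a halving of the relative transcript level, because one extra PCR cycle doubles the product.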

Cell proliferation assays

For the colony formation assay, the designated cells (1 × 10³ cells per well) were seeded into six-well plates and allowed to proliferate for a period of 2 weeks. The cultured cells were then fixed using formaldehyde and stained with crystal violet. The FluorChem M system was employed to capture images, and ImageJ software was utilized to quantify the number of colonies formed.

The Cell Counting Kit-8 (CCK-8) assay was conducted as follows: cells were seeded at a density of 1 × 10⁵ per well into 96-well plates and maintained in a cell incubator. Measurements were performed at 0, 24, 48, and 72 h. At each time point, 10 µl of CCK-8 reagent (Beyotime, China) was added, and the cells were incubated for an additional 2 h. The cells’ proliferative capacity was assessed by measuring the absorbance at 450 nm using a BioTek Synergy H1 multi-mode microplate reader (Model: ELx800). Statistical analysis was performed using GraphPad Prism 9.0 software. Cell viability data are presented as mean ± standard deviation, and between-group comparisons were conducted using two-way ANOVA followed by Bonferroni’s multiple comparison test.

The 5-ethynyl-2′-deoxyuridine (EdU) staining assay was carried out according to the manufacturer’s protocol. Briefly, cells were seeded into 96-well plates at a density of 1 × 10⁴ cells per well and cultured until they reached approximately 60–70% confluence. The cells were then incubated with an EdU solution (Beyotime, China) for 2 h. Subsequently, nuclear staining was performed using DAPI staining solution (Beyotime, Catalog No.: C0078S) at a working concentration of 10 µM. Images were acquired using a Leica DMi8 inverted fluorescence microscope, with five randomly selected non-overlapping fields captured per sample for analysis. Cell counting was performed using a Bio-Rad TC20 automated cell counter and its accompanying software to ensure quantitative objectivity and reproducibility.

Cell migration and invasion assays

Wound healing assays were employed to assess the migratory potential of the cells. Briefly, the specified cells were seeded onto six-well plates and incubated at 37 °C. Once the cells reached confluence, artificial wounds were introduced into the cell monolayer, and wound closure was monitored at 0, 24, and 48 h. Microscopic images were captured, and the migration rate was quantified using the following formula: wound closure (%) = (initial wound width − final wound width at 24–48 h)/initial wound width × 100%.
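A worked version of the wound-closure formula, with made-up widths in micrometres, is sketched below. The function name and the example measurements are illustrative only.

```python
def wound_closure_percent(initial_width, final_width):
    """Wound closure (%) = (initial width - final width) / initial width * 100."""
    if initial_width <= 0:
        raise ValueError("initial wound width must be positive")
    return (initial_width - final_width) / initial_width * 100.0


if __name__ == "__main__":
    # Hypothetical widths (um) measured at 0 h, 24 h, and 48 h.
    width_0h, width_24h, width_48h = 800.0, 500.0, 200.0
    print(wound_closure_percent(width_0h, width_24h))
    print(wound_closure_percent(width_0h, width_48h))
```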

The Matrigel-coated transwell assay was conducted to evaluate the invasive capacity of the cells. The indicated cells were resuspended in serum-free medium at a density of 5 × 10⁴ cells per well (for 24-well plates), and 200 µl of the cell suspension was added to the upper chambers of transwells (8.0 μm, Corning, United States). The upper chambers had been pre-coated with Corning Matrigel matrix diluted 1:8 in serum-free medium. The chambers were then placed into 24-well plates containing 600 µl of medium supplemented with 10% FBS and incubated at 37 °C for 24 h. After incubation, non-invading cells on the upper surface were removed, and the invading cells on the lower surface were fixed with 4% paraformaldehyde and stained with 1% crystal violet. Images were acquired using a Nikon ECLIPSE Ts2 inverted microscope, with systematic random sampling used to capture five non-overlapping fields per sample. ImageJ software was used to count the invaded cells.

Sphere-formation assay

The tumorsphere-formation assay was employed to investigate the stemness and self-renewal capabilities of the cells. Cells were seeded into ultra-low-attachment 24-well plates (Corning) at a density of 1 × 10³ cells per well, using a stem cell medium composed of DMEM/F-12 (Gibco, United States), supplemented with 1 × B27 (Invitrogen), 20 ng/mL epidermal growth factor (EGF, Invitrogen), 20 ng/mL basic fibroblast growth factor (bFGF, Invitrogen), and 2 mM L-glutamine (Invitrogen). The cells were then cultured in a cell incubator for 2 weeks. Subsequently, image acquisition was performed using a Nikon ECLIPSE Ts2 inverted microscope, employing a systematic random sampling method to capture ten non-overlapping fields per sample. Cell aggregates with a diameter ≥ 50 μm were defined as valid spheroids. The spheroid formation efficiency (%) was calculated as (number of valid spheroids formed/initial number of cells seeded) × 100%. The number of tumorspheres was quantified, and images were captured to document the results.
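The spheroid formation efficiency defined above combines a size threshold with a ratio. A minimal sketch, using invented diameter measurements and a hypothetical function name, is:

```python
def sphere_formation_efficiency(diameters_um, cells_seeded=1000, min_diameter_um=50.0):
    """Efficiency (%) = valid spheroids / cells seeded * 100.

    Only aggregates with a diameter >= min_diameter_um count as valid spheroids.
    """
    if cells_seeded <= 0:
        raise ValueError("cells_seeded must be positive")
    valid = sum(1 for d in diameters_um if d >= min_diameter_um)
    return valid / cells_seeded * 100.0


if __name__ == "__main__":
    # Hypothetical aggregate diameters (um) measured in one well seeded with 1000 cells.
    measured = [30.0, 50.0, 75.0, 120.0, 48.5]
    print(sphere_formation_efficiency(measured))
```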

Western blotting assay

Western blotting was employed to analyze protein expression within cells and tissues. In brief, proteins were extracted from both cultured cells and tissue samples using RIPA lysis buffer (Beyotime, China) supplemented with phosphatase and protease inhibitors. Equal amounts of protein (50 µg/lane for clinical samples and 20 µg/lane for cell samples) were separated via 10% sodium dodecyl sulfate-polyacrylamide gel electrophoresis (SDS-PAGE) and transferred onto polyvinylidene fluoride (PVDF) membranes with a pore size of 0.22 μm. After blocking with 5% non-fat milk, the membranes were incubated overnight with specific primary antibodies diluted at 1:1000. Following primary antibody incubation, the membranes were probed with HUABIO goat anti-rabbit/mouse IgG secondary antibody (Catalog No.: HA1001) diluted at 1:50,000. Protein signals were visualized using the Tanon 5200 chemiluminescence imaging system via the ECL chemiluminescence method. The primary antibodies used are listed as follows:

GSTM4 (#16766-1-AP, Proteintech), β-actin (#66009-1-Ig, Proteintech).

Statistical analysis

Data were presented as the mean ± standard deviation (SD), derived from a minimum of three independent experiments. To evaluate the distribution of all datasets, a normality test was applied. Differences between two groups were assessed using Student’s t-tests, while one-way analysis of variance (ANOVA) was employed to discern intergroup disparities. All statistical analyses were performed using GraphPad Prism version 9.0, and P-values less than 0.05 were deemed statistically significant (*p < 0.05, **p < 0.01, ***p < 0.001, ****p < 0.0001, ns: p > 0.05).

Results

Identification of risk related genes in BC

The workflow of data preparation, processing, analysis, and validation is shown in Fig. 1. As described in the Materials and methods section, we first conducted a Transcriptome-Wide Association Study (TWAS) in BC datasets to identify genes associated with BC risk. Table S1 lists the 1168 genes significantly associated with BC, including 555 high-risk genes (TWAS Z score > 0) and 613 low-risk genes (TWAS Z score < 0). Additionally, we performed WGCNA on the TCGA-BRCA dataset to screen for risk-related genes in BC. The co-expression analysis divided the genes from both BC and control tissues into 16 modules (Fig. 2A and Fig. S1A, B). Correlation analysis revealed that the blue, saddlebrown, lightgreen, black, and greenyellow modules were significantly associated with BC (Fig. S1C), encompassing a total of 2086 genes for further analysis. We then intersected these 2086 WGCNA-derived genes with the 1168 TWAS-associated genes, obtaining 109 common genes (Fig. 2B). A heatmap of these 109 genes in tumor and normal tissues showed that 100 of them were significantly differentially expressed (Fig. S2). Finally, univariate Cox analysis of these 100 DEGs revealed that 10 genes were significantly associated with BC prognosis: eight low-risk genes and two high-risk genes. Notably, GSTM4, GGT7, and FBXO6 were identified as low-risk genes in both the Cox and TWAS analyses, suggesting better survival outcomes for BC patients (Table 1 and Fig. 2C). Specifically, GSTM4 and GGT7 were down-regulated in BC tissues, whereas FBXO6 was up-regulated. Survival curves from the TCGA cohort demonstrated that lower GGT7 and GSTM4 expression conferred poorer OS, DSS, and PFI in BC (Fig. 2D–F and Fig. S3A–C).
Kaplan-Meier survival analysis in the KM-Plotter database demonstrated that BC patients with low GSTM4 expression exhibited worse overall survival (OS), distant metastasis-free survival (DMFS), and relapse-free survival (RFS) (Fig. S3D–F). Although Cox regression analysis suggested that GGT7 might have prognostic value, the Kaplan-Meier analysis did not show statistical significance. Previous studies have reported that GSTM4 is located at the 1p13.3 locus, which has been linked to an increased risk of developing BC [7]. However, the precise biological function of GSTM4 in BC remains unknown. Given the significant correlation between GSTM4 expression and prognosis in breast cancer patients identified in this study, systematically elucidating the molecular regulatory mechanisms of GSTM4 in breast cancer development should provide a theoretical foundation for developing novel targeted therapeutic strategies. In summary, our integrated analysis identified and validated GSTM4 as a key low-risk prognostic gene in breast cancer, underscoring its clinical relevance and the need for further functional characterization. We next sought to characterize its expression pattern in breast cancer tissues.

Fig. 1. Workflow of data preparation, processing, analysis, and validation

Fig. 2. A The cluster dendrogram showed that the genes were divided into 16 co-expression modules according to their expression similarity; B The Venn diagram revealed that a total of 109 genes were identified following the intersection of 2086 genes from the WGCNA analysis and 1168 genes from the TWAS analysis; C The univariate analysis identified 10 genes significantly associated with BC prognosis, comprising 8 low-risk genes and 2 high-risk genes; D–F Survival analysis was conducted to evaluate the impact of GSTM4 expression levels (high or low) on OS, DSS, and PFI in BC patients from the TCGA cohort

Table 1. Results of TWAS analysis of prognostic genes

SNP weight set                      Chr   Gene       TWAS.Z      TWAS.p
GTExv8.ALL.Breast_Mammary_Tissue    7     GTF2IRD2   4.22276     2.41E−05
GTExv8.ALL.Breast_Mammary_Tissue    1     GSTM4      −3.69003    2.24E−04
GTExv8.ALL.Breast_Mammary_Tissue    14    NUDT14     3.6164      2.99E−04
GTExv8.ALL.Breast_Mammary_Tissue    11    C11orf1    3.267       1.09E−03
GTExv8.ALL.Breast_Mammary_Tissue    20    GGT7       −2.9406     3.28E−03
GTExv8.ALL.Breast_Mammary_Tissue    9     SPATA6L    2.741951    6.11E−03
GTExv8.ALL.Breast_Mammary_Tissue    1     FBXO6      −2.54718    1.09E−02
GTExv8.ALL.Breast_Mammary_Tissue    8     THEM6      −2.45598    1.41E−02
GTExv8.ALL.Breast_Mammary_Tissue    8     ADHFE1     2.31142     2.08E−02
GTExv8.ALL.Breast_Mammary_Tissue    6     PTCHD4     −2.21907    2.65E−02

GSTM4 and GGT7 (shown in bold in the original table) were significantly associated in the TWAS, down-regulated in breast cancer tissues, and further confirmed as low-risk genes associated with a favorable prognosis by survival analysis
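The relationship between the TWAS.Z and TWAS.p columns, and the high-risk/low-risk labelling by the sign of Z used in the text, can be illustrated as follows. The sketch assumes that TWAS.p is a two-sided p-value from a standard normal distribution, which is consistent with the values in Table 1; the function names are hypothetical.

```python
import math


def two_sided_p(z):
    """Two-sided p-value for a standard normal Z score: 2 * (1 - Phi(|z|))."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def risk_direction(z):
    """Label a gene by the sign of its TWAS Z score, as in the text."""
    if z > 0:
        return "high-risk"
    if z < 0:
        return "low-risk"
    return "neutral"


if __name__ == "__main__":
    # Z scores taken from Table 1.
    table = {"GTF2IRD2": 4.22276, "GSTM4": -3.69003, "GGT7": -2.9406}
    for gene, z in table.items():
        print(gene, risk_direction(z), f"{two_sided_p(z):.2e}")
```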

GSTM4 expression in human BC

Our analysis of the TCGA-BRCA dataset showed a marked down-regulation of GSTM4 mRNA in BC compared with normal tissues (Fig. 3A), with a negative correlation with pathological stage, lymph node involvement, and metastasis (Fig. 3B, D, F). Notably, all BC subtypes exhibited decreased GSTM4 levels, particularly triple-negative BC (TNBC), which is associated with poor prognosis (Fig. 3C). Additionally, Fig. 3E suggests that BC samples with TP53 mutations tended to have relatively low GSTM4 levels, hinting at a potential role in TP53-related anti-tumor functions. We also investigated GSTM4 protein expression in BC using the UALCAN-CPTAC dataset. Across clinical stages and subtypes of BC, GSTM4 protein expression was consistently low and inversely related to activation of the mTOR pathway and the p53/Rb pathway in tumor samples (Fig. 3G–K). Finally, western blotting was performed on twelve matched pairs of human BC and normal tissues to evaluate GSTM4 protein levels. A notable reduction in GSTM4 protein was observed in BC tissues relative to their normal counterparts (Fig. 3L). Immunohistochemical analysis from the HPA database corroborated these findings, showing diminished GSTM4 protein in tumors (Fig. 3M). Collectively, these results underscore GSTM4’s down-regulation in BC and its correlation with worse prognosis. Given its consistent downregulation at both the RNA and protein levels, we next explored the potential regulatory mechanisms responsible for GSTM4 suppression in BC.

Fig. 3. A–F The mRNA expression of GSTM4 in BC was analyzed based on sample types (A), individual cancer stages (B), BC subclasses (C), nodal metastasis status (D), TP53 mutation status (E) and distant metastasis status (F) using data obtained from the TCGA database; G–K The protein expression of GSTM4 in BC was analyzed based on sample types (G), individual cancer stages (H), BC subclasses (I), mTOR pathway status (J) and p53/Rb-related pathway status (K) using data obtained from the CPTAC samples; L Western blot images showing the protein level of GSTM4 in 12 pairs of BC tissues and their paired normal tissues; M The expression of GSTM4 in BC tissue and normal breast tissue was measured by IHC staining from the HPA database. Statistical significance is denoted by *p < 0.05, **p < 0.01, ***p < 0.001

Regulation of GSTM4 expression in BC

Epigenetic alterations are pivotal in the development and progression of cancer, including BC, as well as in the development of resistance to treatment. To investigate the mechanisms behind the aberrant expression of GSTM4 mRNA, we analyzed the correlation between gene copy number and mRNA expression. Our findings from the GSCA database revealed that deletions are the predominant form of GSTM4 alteration (Fig. S4A), and that there is a significant positive correlation between GSTM4 copy number and mRNA expression levels (Fig. S4B). Notably, patients with GSTM4 deletions exhibited poorer overall survival (OS) (Fig. S4C) and disease-specific survival (DSS) (Fig. S4D). In summary, our findings point to genomic dysregulation, particularly deletion of GSTM4, as a contributor to the downregulation of GSTM4 expression and to its impact on the prognosis of breast cancer patients. Having identified genomic deletion as a key mechanism for GSTM4 downregulation, we proceeded to investigate its biological functions through protein interaction networks and pathway analysis.

PPI network of GSTM4 in cancers and enrichment analysis

A functional network was constructed using the GeneMANIA database to explore the potential interactome with the GSTM4 protein as the hub (Fig. S5A) and to predict its biological functions (Fig. S5B). Additionally, a PPI network in BC was built using the STRING database. The potential interacting proteins of GSTM4 and their associated biological functions are shown in Fig. S5C and D, respectively. These results revealed that GSTM4 primarily participates in glutathione metabolism, xenobiotic metabolism, antioxidant activities, and oxidoreductase activities, all of which play crucial roles in tumor metabolism and drug susceptibility. Finally, KEGG enrichment analysis was performed on the TCGA-BRCA dataset to investigate the association between GSTM4 expression levels (high or low) and metabolic pathways. High expression of GSTM4 was mainly linked to glutathione metabolism, arachidonic acid metabolism, oxidative phosphorylation, fatty acid metabolism, and arginine and proline metabolism, whereas low expression of GSTM4 was associated with homologous recombination, cell cycle regulation, ubiquitin-mediated proteolysis pathway activation, bladder cancer development, and basal transcription factor activity (Fig. S5E, F). Importantly, all these pathways are involved in tumor metabolic reprogramming. Taken together, these findings suggest that GSTM4 may exert its tumor-suppressive role by influencing tumor metabolic reprogramming. The enrichment of GSTM4 in metabolic and detoxification pathways led us to hypothesize that it might also influence the tumor immune microenvironment, given the known connections between metabolism and immune function in cancer.

GSTM4 expression and immune correlation

To better understand the interplay between GSTM4 expression and the tumor immune microenvironment in BC, we first evaluated the expression profiles of GSTM4 alongside immune checkpoint genes within the TCGA-BRCA dataset. As illustrated in Fig. 4A, tumors with diminished GSTM4 expression were characterized by elevated levels of CTLA4, LAG3, PDCD1LG2 and TIGIT, but depressed levels of SIGLEC15. These results suggest an immunosuppressive landscape in tumors with low GSTM4 expression and a complex role for GSTM4 in modulating the immune microenvironment of BC. Subsequently, we employed the ssGSEA algorithm to quantify the infiltration fraction of immune cells within the tumor microenvironment of TCGA-BRCA and analyzed its correlation with GSTM4 expression. As shown in Fig. 4B, GSTM4 expression was positively correlated with the infiltration fractions of NK cells, CD8+ T cells, and Th17 cells (immune-promoting cell populations), and negatively correlated with those of macrophages and Treg cells (immunosuppressive cell populations). Furthermore, we conducted co-expression analyses across 33 tumor types to investigate relationships between GSTM4 expression and various immune-related genes, including immunostimulators (Fig. S6A), immunoinhibitors (Fig. S6B), chemokines (Fig. S6C), and receptors (Fig. S6D). GSTM4 expression was significantly positively correlated with the expression levels of multiple immune-related molecules, including key receptors (such as TNFRSF13, TNFRSF14, CCR10, and CX3CR1) and ligands (e.g., CCL14, CXCL14, and PVRL2). These associations suggest that GSTM4 may be involved in regulating the expression of such immune molecules or may be co-regulated with them by a common upstream signaling pathway. GSTM4 is also likely implicated in broader biological processes, including immune response, inflammatory reaction, cell adhesion, and signal transduction. In summary, high expression of GSTM4 may play an active role in immune regulation by promoting the recruitment and activation of immune cells and by enhancing intercellular communication. Collectively, these results highlight close links between GSTM4 expression and immune functions mediated by diverse cytokines and other immune-related components. To gain deeper insights into how GSTM4 modulates the tumor microenvironment at cellular resolution, we turned to single-cell RNA sequencing analysis.
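Comparisons such as the one in Fig. 4A typically start by stratifying samples into low and high expression groups. The sketch below assumes a median split, which may differ from the cutoff actually used; the function name and all expression values are hypothetical toy data.

```python
from statistics import median


def split_by_median(stratifier, marker):
    """Split marker values into (low, high) groups by the stratifier's median.

    Samples with a stratifier value equal to the median go to the low group.
    """
    cutoff = median(stratifier)
    low = [m for s, m in zip(stratifier, marker) if s <= cutoff]
    high = [m for s, m in zip(stratifier, marker) if s > cutoff]
    return low, high


# Hypothetical toy values for six samples (not TCGA-BRCA data).
gstm4 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ctla4 = [5.5, 5.0, 4.8, 3.1, 2.9, 2.5]

low, high = split_by_median(gstm4, ctla4)
median_low, median_high = median(low), median(high)
```

In this toy example the checkpoint marker is higher in the low-GSTM4 group. A real analysis would add a statistical test between the two groups, for example a Wilcoxon rank-sum test.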

Fig. 4.

A The association between GSTM4 expression and the expression of immune checkpoint markers in TCGA BRCA samples was investigated; B the infiltration fraction of immune cells in the TCGA-BRCA microenvironment was quantified using the ssGSEA algorithm, and its correlation with GSTM4 expression was analyzed; Statistical significance is denoted by *p < 0.05, **p < 0.01, ***p < 0.001

GSTM4 at the single-cell level in BC

To gain a deeper understanding of the relationship between GSTM4 and the tumor microenvironment in BC at the single-cell level, we conducted scRNA-Seq analysis using data from the GEO database (GSE148673). Data quality control was performed according to the criteria outlined in the “Materials and Methods” section, yielding a total of 4581 cells. These cells were subsequently classified into 23 clusters based on their marker genes (Fig. 5A, B). We then assessed GSTM4 expression across different subpopulations. The results presented in Fig. 5C show that GSTM4 is highly expressed in epithelial cells and macrophages, while showing variable but generally lower expression across most subgroups of cancer cells, with the exception of the PRSS1+ subpopulation. Following this, we reclustered the epithelial cells and macrophages and, after dimensionality reduction, identified six distinct epithelial cell subsets (Fig. S7A) and five distinct macrophage subsets (Fig. S7C). The six epithelial cell subsets were annotated based on marker genes (Fig. 5D). We then examined GSTM4 expression within these six subgroups of epithelial cells and observed significantly higher expression specifically within RPL31+ epithelial cells (Fig. 5E). Additionally, GSEA based on hallmark (Fig. 5F), KEGG (Fig. 5G) and GO-BP (Fig. 5H) gene sets revealed that tumor-related signaling pathways such as mTORC1, hypoxia, EMT, KRAS, VEGF, cell locomotion and intrinsic apoptotic inhibition were suppressed in RPL31+ epithelial cells with high GSTM4 expression, whereas activation of the TP53 pathway was observed in these cells. Collectively, these findings are consistent with GSTM4 acting as a tumor suppressor within the BC microenvironment, as its expression is associated with a reduced malignant phenotype in epithelial cells.

Fig. 5.

A Expression levels of specific marker genes were assessed in each cell subtype; B Single cell map showing twenty-three cell subclusters; C Violin plots illustrating the expression level variation of GSTM4 across different cell subgroups; D Each subpopulation was annotated based on marker genes; E The expression profile of GSTM4 was examined across all six cell clusters; F–H GSEA analyses were performed to investigate pathway activation/inhibition in RPL31+ epithelial cells using hallmark gene sets (F), KEGG gene sets (G), and GO-BP gene sets (H)

To elucidate the impact of GSTM4 expression on cancer cells, a dimensionality reduction approach was employed, resulting in the stratification of the cancer cells into five distinct subpopulations (Fig. S7B). These subpopulations were characterized by their marker genes and are depicted in Fig. 6A. GSTM4 expression was assessed across these cell clusters, with Fig. 6B revealing a notable overexpression in the PRSS1+ cell cluster. GSEA was conducted using hallmark gene sets (Fig. 6C), KEGG pathways (Fig. 6D), and GO-BP (Fig. 6E), focusing on the PRSS1+ cell cluster. This analysis uncovered the activation of cancer-inhibitory pathways, including oxidative phosphorylation, apoptosis, and the P53 signaling cascade in PRSS1+ cell cluster. Conversely, pathways associated with MYC target gene expression, hypoxia response, DNA repair, cell proliferation, and stem cell differentiation were found to be attenuated within the PRSS1+ cell cluster.
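The GSEA results reported in this section rest on a running-sum enrichment score. The sketch below follows the weighted Kolmogorov-Smirnov-like statistic of the standard GSEA method and is only an illustration, not the pipeline used here; the gene names and ranking scores are hypothetical, and no permutation-based significance or normalization is computed.

```python
def enrichment_score(ranked_genes, scores, gene_set, weight=1.0):
    """Running-sum enrichment score for one gene set.

    ranked_genes and scores must be sorted together, from the most
    positive ranking score to the most negative.
    """
    in_set = [gene in gene_set for gene in ranked_genes]
    n_hit = sum(in_set)
    n_miss = len(ranked_genes) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene set must overlap the ranked list only partially")
    # Normalizer so that the hit increments sum to 1.
    norm = sum(abs(s) ** weight for s, hit in zip(scores, in_set) if hit)
    if norm == 0:
        raise ValueError("all gene-set members have a zero ranking score")
    running = 0.0
    best = 0.0
    for s, hit in zip(scores, in_set):
        if hit:
            running += abs(s) ** weight / norm  # step up at a set member
        else:
            running -= 1.0 / n_miss             # step down otherwise
        if abs(running) > abs(best):
            best = running                      # keep the maximum deviation
    return best


# Hypothetical ranked list (e.g. genes ordered by differential expression).
genes = ["g1", "g2", "g3", "g4", "g5", "g6", "g7", "g8"]
stats = [2.0, 1.5, 1.0, 0.5, -0.5, -1.0, -1.5, -2.0]

es_top = enrichment_score(genes, stats, {"g1", "g2"})     # set at the top
es_bottom = enrichment_score(genes, stats, {"g7", "g8"})  # set at the bottom
```

A positive score corresponds to a gene set concentrated at the top of the ranking (described in the text as activated) and a negative score to one concentrated at the bottom (described as suppressed or attenuated).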

Fig. 6.

A Each subpopulation was annotated based on marker genes; B The expression profile of GSTM4 was examined across all five cell clusters; C–E GSEA analyses were performed to investigate pathway activation/inhibition in PRSS1+ cancer cells using hallmark gene sets (C), KEGG gene sets (D), and GO-BP gene sets (E)

In parallel, macrophages were subjected to a similar clustering strategy, yielding five distinct cell clusters classified by their marker genes, as detailed in Fig. 7A and Fig. S7C. The immunosuppressive TREM2+ macrophage subpopulation exhibited a significant upregulation of GSTM4, as shown in Fig. 7B. GSEA using hallmark gene sets, KEGG pathways, and GO-BP was also conducted on the TREM2+ macrophages, with the findings presented in Fig. 7C–E. These analyses suggest that GSTM4 is associated with a diverse array of cellular biological processes in macrophages and may thereby influence the immunological microenvironment of BC through the functional alteration of macrophages.

Fig. 7.

A Each subpopulation was annotated based on marker genes; B The expression of GSTM4 in macrophage subsets; C–E GSEA analyses were performed to investigate pathway activation/inhibition in TREM2+ macrophages using hallmark gene sets (C), KEGG gene sets (D), and GO-BP gene sets (E); F, G Cellular communication networks were inferred by calculating the likelihood of communication among all cell subtypes

Finally, we conducted a CellChat analysis to elucidate cell-to-cell signaling networks between cancer cells and immune cells within the tumor microenvironment (TME), as depicted in Fig. 7F, G. Using ligand-receptor interactions, we examined the specific cancer cell types and their corresponding interacting partners, which are detailed in Fig. S9. AGR2+ cancer cells showed strong interactions with CD4+ T cells, CD8+ T cells, FABP5+ macrophages, FCER1A+ macrophages, MMP9+ macrophages, Monocytes1, SEPP1+ macrophages, and TREM2+ macrophages. These interactions may significantly impact tumor immune evasion and the immune response. Additionally, AGR2+ cancer cells engaged in communication with non-immune cells, such as APOD+ epithelial cells and HMGB2+ epithelial cells, potentially influencing cell-to-cell communication within the TME (Fig. S8A, B). GSTP1+ cancer cells were found to interact with a diverse array of cell types, including AGR2+ cancer cells, APOD+ epithelial cells, B cells, CD4+ T cells, CD8+ T cells, CXCL10+ epithelial cells, FABP5+ macrophages, FCER1A+ macrophages, HMGB2+ epithelial cells, and MMP9+ macrophages. These interactions are likely to affect tumor growth and immune surveillance (Fig. S8C, D). KLF2+ cancer cells interacted with AGR2+ cancer cells, CD4+ T cells, FABP5+ macrophages, FCER1A+ macrophages, GSTP1+ cancer cells, JUND+ epithelial cells, KLF2+ cancer cells themselves, NDRG1+ cancer cells, SEPP1+ macrophages, SOD3+ epithelial cells, and TREM2+ macrophages. These connections may be integral to tumor cell proliferation, differentiation, and immune evasion (Fig. S8E, F). NDRG1+ cancer cells interacted with CD4+ T cells, CD8+ T cells, FABP5+ macrophages, FCER1A+ macrophages, MMP9+ macrophages, Monocytes1, SEPP1+ macrophages, and TREM2+ macrophages, suggesting a role in tumor immune evasion and immune response (Fig. S8G, H). PRSS1+ cancer cells were observed to interact with AGR2+ cancer cells, CD4+ T cells, CD8+ T cells, FABP5+ macrophages, FCER1A+ macrophages, MMP9+ macrophages, Monocytes1, SEPP1+ macrophages, and TREM2+ macrophages, potentially influencing the tumor immune microenvironment and immune cell function (Fig. S8I, J). Through a comprehensive analysis of the tumor cells and the cell subsets they interact with, we identified that AGR2+ and PRSS1+ cancer cells may be particularly active in engaging with immune cells within the immune microenvironment, suggesting a higher propensity to induce anti-tumor immune responses. Our findings also showed that GSTM4 was highly expressed in PRSS1+ cancer cells but less so in AGR2+ cancer cells, suggesting that GSTM4 may play a role in tumor-immune cell interactions. However, the precise functions of GSTM4 within the TME of BC warrant further investigation. Overall, our single-cell transcriptomic analysis indicates that GSTM4 expression is enriched in specific epithelial and macrophage subpopulations, is associated with the suppression of oncogenic pathways, and is linked to intercellular communication within the tumor microenvironment, suggesting a multifaceted role as a tumor suppressor in breast cancer. To complement these computational findings and establish direct functional evidence, we conducted in vitro experiments to validate the tumor-suppressive effects of GSTM4 observed in our bioinformatic analyses.
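The idea of scoring communication between a sender and a receiver population can be illustrated with a deliberately simple ligand-receptor score: the product of the mean ligand expression in the sender and the mean receptor expression in the receiver. This is a simplified stand-in and not CellChat's probabilistic model; the ligand and receptor names, the "B cell" values, and all expression numbers are hypothetical.

```python
from statistics import mean


def ligand_receptor_score(expression, sender, receiver, ligand, receptor):
    """Product of mean ligand expression (sender) and mean receptor
    expression (receiver); larger values suggest a stronger candidate link."""
    ligand_level = mean(expression[sender][ligand])
    receptor_level = mean(expression[receiver][receptor])
    return ligand_level * receptor_level


# Hypothetical per-cell expression values, keyed by cell group and gene.
expression = {
    "PRSS1+ cancer": {"LIGAND_X": [2.0, 4.0]},
    "TREM2+ macrophage": {"RECEPTOR_Y": [1.0, 3.0]},
    "B cell": {"RECEPTOR_Y": [0.0, 1.0]},
}

score_macrophage = ligand_receptor_score(
    expression, "PRSS1+ cancer", "TREM2+ macrophage", "LIGAND_X", "RECEPTOR_Y"
)
score_b_cell = ligand_receptor_score(
    expression, "PRSS1+ cancer", "B cell", "LIGAND_X", "RECEPTOR_Y"
)
```

Ranking receiver populations by such a score gives a rough picture of which partners a cancer cell subset is most likely to signal to. Dedicated tools additionally account for multi-subunit receptors, cofactors, and statistical significance.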

Determination of GSTM4 as a novel suppressor in BC

Finally, in vitro studies were conducted to elucidate the biological role of GSTM4 in BC cells. We established an MDA-MB-231 cell line overexpressing GSTM4 (MDA-MB-231GSTM4) and two MCF-7 cell lines with GSTM4 knockdown (MCF-7GSTM4#sh1 and MCF-7GSTM4#sh2), as shown in Fig. 8A. The effects of GSTM4 on the proliferation of BC cells were evaluated using CCK-8 (Fig. 8B) and EdU staining assays (Fig. 8C). Our findings indicated a significant reduction in the proliferation rate of MDA-MB-231 cells following GSTM4 overexpression, while the converse effect was observed in MCF-7 cells in which GSTM4 was knocked down. Additionally, cell scratch (Fig. 8D) and invasion assays (Fig. 8E) demonstrated that GSTM4 overexpression in MDA-MB-231 cells led to a notable inhibition of cell migration and invasion. Stem cell spheroid formation assays further revealed a decrease in stemness characteristics in the GSTM4-overexpressing MDA-MB-231 cells. In contrast, the MCF-7 cells with GSTM4 knockdown exhibited an increase in stemness, as shown in Fig. 8F. These results collectively suggest that GSTM4 plays a pivotal role in modulating the proliferative, migratory, and invasive properties of BC cells, with potential implications for cancer stemness and therapeutic strategies.
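Scratch assays such as the one in Fig. 8D are often summarized as the percentage of the initial wound area that has closed at a later time point. The sketch below assumes this common definition, which may differ from the quantification used here; the function name and the area values are hypothetical.

```python
def wound_closure_percent(area_start, area_later):
    """Percentage of the initial wound area closed at a later time point."""
    if area_start <= 0:
        raise ValueError("initial wound area must be positive")
    return (area_start - area_later) / area_start * 100.0


# Hypothetical wound areas (arbitrary units) at 0 h and at a later time point.
control_closure = wound_closure_percent(1000.0, 400.0)
overexpression_closure = wound_closure_percent(1000.0, 700.0)
```

In this toy example the control wound has closed by about 60% and the overexpression wound by about 30%, which is the direction expected if overexpression slows migration.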

Fig. 8.

A The MDA-MB-231 cell line was utilized to generate a stable GSTM4 overexpressing derivative (MDA-MB-231GSTM4). To knock down GSTM4 expression in MCF-7 cells, two distinct small interfering RNA (siRNA) oligonucleotides targeting GSTM4 were introduced, named MCF-7sh-GSTM4#1 and MCF-7sh-GSTM4#2. The efficacy of the transfection process in the respective cell lines was subsequently assessed using western blot. B, C The CCK-8 (B) and EdU (5-ethynyl-2′-deoxyuridine) staining (C) assays were conducted to evaluate the impact of GSTM4 overexpression or knockdown on the proliferative capacity of BC cells. D, E The cell scratch (D) and invasion (E) assays were employed to investigate the influence of GSTM4 on the migratory and invasive properties of BC cells. F Representative images of tumor spheres derived from MDA-MB-231NC and MDA-MB-231GSTM4, as well as MCF-7NC and MCF-7shGSTM4, are presented, illustrating the morphological characteristics of spheroids cultured on ultra-low attachment plates. *p < 0.05, **p < 0.01, ***p < 0.001. The data represent at least three independent experiments

Discussion

Our multi-omics investigation, integrating bulk transcriptomics, genomics, scRNA-seq, and epigenetic analyses, identifies GSTM4 as a pivotal tumor suppressor and a promising prognostic biomarker in BC. This study not only delineates its fundamental biological roles but also highlights its significant correlations with the tumor immune microenvironment, pointing to novel therapeutic avenues.

We initially identified GSTM4 as a low-risk gene associated with better patient survival. Analyses of the TCGA-BRCA dataset, UALCAN-CPTAC, and the HPA database consistently confirmed that GSTM4 expression is significantly reduced in BC tissues compared to normal counterparts, and that its low expression correlates with poor prognosis. Notably, this downregulation was most pronounced in the aggressive HER2-positive and TNBC subtypes. Given that TNBC is linked to a heightened risk of early recurrence and an unfavorable prognosis, and given its suboptimal response rate (5%–10%) to conventional chemotherapy in advanced stages, the discovery of new biomarkers is of paramount importance [12]. The gene’s location on chromosome 1p13.3, 51 kb downstream of the protective allele of SNP rs17024629, which is linked to decreased BC susceptibility, provides further genetic support for its protective role [7].

The genomic mechanism underlying GSTM4 silencing in BC was predominantly attributed to copy-number deletion, which correlated with low expression and worse survival. Functionally, GSEA of bulk transcriptomic data suggested GSTM4’s central involvement in critical pathways, including glutathione metabolism, xenobiotic detoxification, and antioxidant defense [4]. At a higher resolution, scRNA-seq GSEA indicated that in epithelial cells, high GSTM4 expression was inversely correlated with oncogenic pathways like mTORC1 signaling, hypoxia, EMT, and KRAS signaling, while being positively linked to TP53 pathway activation. Similarly, in cancer cells, GSTM4 expression was associated with the promotion of oxidative phosphorylation and apoptosis while correlating with the dampening of pathways related to MYC targets, DNA repair, and stem cell differentiation. These findings collectively support the model that GSTM4 acts as a versatile tumor suppressor, potentially modulating core metabolic and signaling processes to inhibit malignancy.

Beyond cell-intrinsic effects, GSTM4 expression showed significant correlations with the composition of the tumor immune microenvironment (TME). We identified a significant negative correlation between GSTM4 expression and key immune checkpoint genes (e.g., CTLA4, LAG3, PDCD1LG2, TIGIT), which is consistent with the idea that its loss may contribute to an immunosuppressive TME. Supporting this, ssGSEA showed that high GSTM4 expression was associated with increased infiltration of anti-tumor immune effectors like CD8+ T cells and NK cells, and reduced infiltration of pro-tumorigenic Treg and Th2 cells. CellChat analysis at the scRNA level pinpointed that cancer cells with high GSTM4 expression were the most interactive with CD8+ T cells. This compelling set of correlations suggests that GSTM4 may enhance immune surveillance, providing a rationale for its association with a favorable prognosis. The recognition of BC as an immunogenic entity, particularly with the advent of immune checkpoint inhibitors like Atezolizumab and Pembrolizumab for TNBC, underscores the clinical relevance of understanding immune-modulatory factors like GSTM4 [13, 14].

These bioinformatic insights were explored in vitro through an experimental strategy employing two distinct models—the triple-negative MDA-MB-231 cell line with low GSTM4 expression and the Luminal MCF-7 cell line exhibiting high GSTM4 expression, as identified in the Cancer Cell Line Encyclopedia (CCLE) database [15]. Given that these cell lines represent breast cancer subtypes with divergent malignant clinical characteristics, we performed GSTM4 overexpression in the low-expression model and knockdown in the high-expression model to investigate its potential functional roles across malignant contexts. This approach showed that GSTM4 overexpression suppressed proliferative, migratory, invasive, and stem-like properties of cancer cells, while its knockdown enhanced these phenotypes. Although this strategy helped elucidate core functions of GSTM4 across subtypes, it may not fully exclude potential influences from intrinsic molecular differences.

It is also noteworthy that other genes, including GGT7 and FBXO6, consistently demonstrated low-risk prognostic characteristics in our analyses. As a member of the gamma-glutamyltransferase family, GGT7 has been suggested to be involved in antioxidant metabolism and tumorigenesis [16], whereas FBXO6, as a component of the E3 ubiquitin ligase complex, may influence tumor progression through regulating protein degradation processes [17]. These findings imply that multiple low-risk genes may collectively contribute to protective biological processes in breast cancer, warranting further systematic investigation into their mechanisms in future studies.

This study has several limitations. First, the analytical findings have not yet been validated in large clinical cohorts. Second, the specific regulatory mechanisms through which GSTM4 influences immune cell recruitment or related signaling pathways remain to be directly elucidated, for example through mechanistic studies using in vitro or in vivo immune co-culture models. Furthermore, although using two cell lines with distinct malignant phenotypes, MDA-MB-231 for overexpression and MCF-7 for knockdown, helped investigate the common functions of GSTM4 across different cellular contexts, this design does not fully rule out potential influences of intrinsic molecular subtype differences in breast cancer on the observed phenotypic outcomes.

Conclusion

Collectively, our research has elucidated the expression profile and biological role of GSTM4 in BC. This deeper comprehension of GSTM4 offers a promising new therapeutic option for the treatment of BC.

Supplementary Information

Below is the link to the electronic supplementary material.

12672_2025_3850_MOESM1_ESM.tif (18.5MB, tif)

Supplementary Material 1. Figure S1. (A-B) Selection of optimal thresholds. The threshold is 6; (C) Heat maps of correlation between co-expression modules and BC showed that “blue”, “saddle brown”, “light green”, “black” and “green yellow” modules were significantly correlated with BC

12672_2025_3850_MOESM2_ESM.tif (29MB, tif)

Supplementary Material 2. Figure S2. (A) Heatmap displayed DEGs between tumor and normal tissues in BC.

12672_2025_3850_MOESM3_ESM.tif (13.5MB, tif)

Supplementary Material 3. Figure S3. (A-C) Survival analysis was conducted to evaluate the impact of GGT7 expression levels (high or low) on overall survival (OS), disease-specific survival (DSS), and progression-free interval (PFI) in BC patients from the TCGA cohort; (D-F) Survival analysis was conducted to evaluate the impact of GSTM4 expression levels (high or low) on OS, distant metastasis free survival (DMFS), and relapse-free survival (RFS) in BC patients from the KM-Plotter database.

12672_2025_3850_MOESM4_ESM.tif (7.9MB, tif)

Supplementary Material 4. Figure S4. (A-B) Data from GSCA platform revealed that deletion mutation of GSTM4 was the predominant form of genetic alteration (A), and there was a significant positive correlation between the expression of GSTM4 and its gene copy number in BRCA samples (B); (C-D) Survival analysis from GSCA platform showed that BC patients with GSTM4 deletion hold poor OS (C) and disease-specific survival (DSS) (D).

12672_2025_3850_MOESM5_ESM.tif (28.4MB, tif)

Supplementary Material 5. Figure S5. (A-D) Data from the GeneMANIA and STRING databases revealed the predicted interaction partners of GSTM4 (A, C), along with their potential functions (B, D). (E-F) The TCGA-BRCA samples exhibited a significant enrichment of the Top5 KEGG pathway in the subgroups characterized by high (E) or low (F) expression of GSTM4.

12672_2025_3850_MOESM6_ESM.tif (19.2MB, tif)

Supplementary Material 6. Figure S6. (A-D) Data from TISIDB revealed Spearman correlations between GSTM4 expression and immunostimulators (A), immunoinhibitors (B), as well as chemokines and receptors across various human cancers (C-D).

12672_2025_3850_MOESM7_ESM.tif (19.1MB, tif)

Supplementary Material 7. Figure S7. (A) After dimensionality reduction and cluster analysis, the epithelial cells were classified into six distinct subpopulations. The gene expression profiles of these clusters are presented in a heatmap; (B) After dimensionality reduction and cluster analysis, the cancer cells were classified into five distinct subpopulations. The gene expression profiles of these clusters are presented in a heatmap; (C) After dimensionality reduction and cluster analysis, the macrophages were classified into five distinct subpopulations. The gene expression profiles of these clusters are presented in a heatmap.

12672_2025_3850_MOESM8_ESM.tif (28.1MB, tif)

Supplementary Material 8. Figure S8. (A-B) The bubble map showed the ligand-receptor relationship between AGR2+ cancer cells and other cell subpopulations; (C-D) The bubble map showed the ligand-receptor relationship between GSTP1+ cancer cells and other cell subpopulations; (E-F) The bubble map showed the ligand-receptor relationship between KLF2+ cancer cells and other cell subpopulations; (G-H) The bubble map showed the ligand-receptor relationship between NDRG1+ cancer cells and other cell subpopulations; (I-J) The bubble map showed the ligand-receptor relationship between PRSS1+ cancer cells and other cell subpopulations.

Supplementary Material 9. (283.6KB, xlsx)
Supplementary Material 10. (623.3KB, xlsx)
Supplementary Material 14. (255.9KB, docx)

Author contributions

Zhen Yu: Manuscript drafting, and revision in response to reviewers; Weikang Yun: Bioinformatics tool development/validation, data curation/visualization; Qian Zhang: Statistical analysis/method optimization; Xianglan Li*: Study supervision, funding, manuscript revision; Chunbo Zhao*: Research coordination, final approval; Xu Liu*: Experiment design, data analysis, manuscript drafting. (*Co-corresponding authors)

Funding

This work was supported by National Natural Science Foundation of China (82303432), China Postdoctoral Science Foundation (Grant Number: 2020M681116), HMU Cancer Hospital Haiyan Fund Outstanding Youth Project (JJYQ2024-05) and the CSCO-Hengrui Cancer Research Fund project (Y-HR2022QN-0374).

Data availability

The datasets analyzed in this study are publicly available in the following repositories: The Cancer Genome Atlas (TCGA; https://www.cancer.gov/tcga), TWAS FUSION (http://gusevlab.org/projects/fusion/#reference-functional-data), the Gene Expression Omnibus (GEO) under accession number GSE148673 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE148673), KM-Plotter (https://kmplot.com/analysis/), UALCAN (https://ualcan.path.uab.edu/analysis.html), the Human Protein Atlas (HPA; https://www.proteinatlas.org/), GeneMANIA (http://genemania.org/), STRING (https://string-db.org/), Gene Set Cancer Analysis (GSCA; http://bioinfo.life.hust.edu.cn/GSCA/#/), and TIMER (https://cistrome.shinyapps.io/timer/). All other data generated or analyzed are included in the article and its supplementary materials, and further details are available from the corresponding authors upon reasonable request.

Declarations

Ethics approval and consent to participate

This study was conducted in accordance with the Declaration of Helsinki principles. It was approved by the Research Ethics Committee of Harbin Medical University.

Consent for publication

Not applicable. The manuscript contains no personally identifiable information about any participant.

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Change history

11/20/2025

This article has been updated to amend the missing funding acknowledgement

Contributor Information

Xianglan Li, Email: 1243903966@qq.com.

Chunbo Zhao, Email: chunbozhao@hrbmu.edu.cn.

Xu Liu, Email: Liuxu01@hrbmu.edu.cn.

References

  • 1. Siegel RL, Miller KD, Wagle NS, Jemal A. Cancer statistics, 2023. CA Cancer J Clin. 2023;73:17–48. 10.3322/caac.21763.
  • 2. Wagner J, et al. A Single-Cell atlas of the tumor and immune ecosystem of human breast cancer. Cell. 2019;177:1330–45. 10.1016/j.cell.2019.03.005. e1318.
  • 3. Zhang Z, Sun W, Zeng Z, Lu Y. Identification of significant prognostic risk markers for pancreatic ductal adenocarcinoma: a bioinformatic analysis. Acta Biochim Pol. 2022;69:327–33. 10.18388/abp.2020_5758.
  • 4. Yang MH, et al. Reversal of High-Fat Diet-Induced Non-Alcoholic fatty liver disease by Metformin combined with PGG, an inducer of Glycine N-Methyltransferase. Int J Mol Sci. 2022. 10.3390/ijms231710072.
  • 5. Liu W, et al. 6-(7-Nitro-2,1,3-benzoxadiazol-4-ylthio) hexanol inhibits proliferation and induces apoptosis of endometriosis by regulating glutathione S-Transferase mu class 4. Reprod Sci. 2023;30:2945–61. 10.1007/s43032-023-01207-x.
  • 6. Marugame Y, et al. Sesame lignans upregulate glutathione S-transferase expression and downregulate microRNA-669c-3p. Biosci Microbiota Food Health. 2022;41:66–72. 10.12938/bmfh.2021-067.
  • 7. Adedokun B, et al. Cross-ancestry GWAS meta-analysis identifies six breast cancer loci in African and European ancestry women. Nat Commun. 2021;12:4198. 10.1038/s41467-021-24327-x.
  • 8. Colaprico A, et al. TCGAbiolinks: an R/Bioconductor package for integrative analysis of TCGA data. Nucleic Acids Res. 2016;44:e71. 10.1093/nar/gkv1507.
  • 9. Liu X, et al. Identification of multiple novel susceptibility genes associated with autoimmune thyroid disease. Front Immunol. 2023;14:1161311. 10.3389/fimmu.2023.1161311.
  • 10. Zhao Z, et al. Analysis and experimental validation of rheumatoid arthritis innate immunity gene CYFIP2 and Pan-Cancer. Front Immunol. 2022;13:954848. 10.3389/fimmu.2022.954848.
  • 11. Gao R, et al. Delineating copy number and clonal substructure in human tumors from single-cell transcriptomes. Nat Biotechnol. 2021;39:599–608. 10.1038/s41587-020-00795-2.
  • 12. Liu Y, et al. Subtyping-based platform guides precision medicine for heavily pretreated metastatic triple-negative breast cancer: the FUTURE phase II umbrella clinical trial. Cell Res. 2023;33:389–402. 10.1038/s41422-023-00795-2.
  • 13. Franzoi MA, Romano E, Piccart M. Immunotherapy for early breast cancer: too soon, too superficial, or just right? Ann Oncol. 2021;32:323–36. 10.1016/j.annonc.2020.11.022.
  • 14. Badve SS, et al. Determining PD-L1 status in patients with Triple-Negative breast cancer: lessons learned from IMpassion130. J Natl Cancer Inst. 2022;114:664–75. 10.1093/jnci/djab121.
  • 15. Li H, et al. The landscape of cancer cell line metabolism. Nat Med. 2019;25:850–60. 10.1038/s41591-019-0404-8.
  • 16. Wang X, et al. Gamma-glutamyltransferase 7 suppresses gastric cancer by cooperating with RAB7 to induce mitophagy. Oncogene. 2022;41:3485–97. 10.1038/s41388-022-02339-1.
  • 17. Ji M, et al. FBXO6-mediated RNASET2 ubiquitination and degradation governs the development of ovarian cancer. Cell Death Dis. 2021;12:317. 10.1038/s41419-021-03580-4.

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

12672_2025_3850_MOESM1_ESM.tif (18.5MB, tif)

Supplementary Material 1. Figure S1. (A-B) Selection of optimal thresholds. The threshold is 6; (C) Heat maps of correlation between co-expression modules and BC showed that “blue”, “saddle brown”, “light green”, “black” and “green yellow” modules were significantly correlated with BC

12672_2025_3850_MOESM2_ESM.tif (29MB, tif)

Supplementary Material 2. Figure S2. (A) Heatmap displayed DEGs between tumor and normal tissues in BC.

12672_2025_3850_MOESM3_ESM.tif (13.5MB, tif)

Supplementary Material 3. Figure S3. (A-C) Survival analysis was conducted to evaluate the impact of GGT7 expression levels (high or low) on overall survival (OS), disease-specific survival (DSS), and progression-free interval (PFI) in BC patients from the TCGA cohort; (D-F) Survival analysis was conducted to evaluate the impact of GSTM4 expression levels (high or low) on OS, distant metastasis free survival (DMFS), and relapse-free survival (RFS) in BC patients from the KM-Plotter database.

12672_2025_3850_MOESM4_ESM.tif (7.9MB, tif)

Supplementary Material 4. Figure S4. (A-B) Data from GSCA platform revealed that deletion mutation of GSTM4 was the predominant form of genetic alteration (A), and there was a significant positive correlation between the expression of GSTM4 and its gene copy number in BRCA samples (B); (C-D) Survival analysis from GSCA platform showed that BC patients with GSTM4 deletion hold poor OS (C) and disease-specific survival (DSS) (D).

12672_2025_3850_MOESM5_ESM.tif (28.4MB, tif)

Supplementary Material 5. Figure S5. (A-D) Data from the GeneMANIA and STRING databases revealed the predicted interaction partners of GSTM4 (A, C), along with their potential functions (B, D). (E-F) The TCGA-BRCA samples exhibited a significant enrichment of the Top5 KEGG pathway in the subgroups characterized by high (E) or low (F) expression of GSTM4.

12672_2025_3850_MOESM6_ESM.tif (19.2MB, tif)

Supplementary Material 6. Figure S6. (A-D) Data from TISIDB revealed Spearman correlations between GSTM4 expression and immunostimulators (A), immunoinhibitors (B), as well as chemokines and receptors across various human cancers (C-D).

12672_2025_3850_MOESM7_ESM.tif (19.1MB, tif)

Supplementary Material 7. Figure S7. (A) After dimensionality reduction and cluster analysis, the epithelial cells were classified into six distinct subpopulations. The gene expression profiles of these clusters are presented in a heatmap; (B) After dimensionality reduction and cluster analysis, the cancer cells were classified into five distinct subpopulations. The gene expression profiles of these clusters are presented in a heatmap; (C) After dimensionality reduction and cluster analysis, the macrophages were classified into five distinct subpopulations. The gene expression profiles of these clusters are presented in a heatmap.

12672_2025_3850_MOESM8_ESM.tif (28.1MB, tif)

Supplementary Material 8. Figure S8. (A-B) The bubble map showed the ligand-receptor relationship between AGR2+ cancer cells and other cell subpopulations; (C-D) The bubble map showed the ligand-receptor relationship between GSTP1+ cancer cells and other cell subpopulations; (E-F) The bubble map showed the ligand-receptor relationship between KLF2+ cancer cells and other cell subpopulations; (G-H) The bubble map showed the ligand-receptor relationship between NDRG1+ cancer cells and other cell subpopulations; (I-J) The bubble map showed the ligand-receptor relationship between PRSS1+ cancer cells and other cell subpopulations.

Supplementary Material 9. (283.6KB, xlsx)
Supplementary Material 10. (623.3KB, xlsx)
Supplementary Material 14. (255.9KB, docx)

Data Availability Statement

The datasets analyzed in this study are publicly available in the following repositories: The Cancer Genome Atlas (TCGA; https://www.cancer.gov/tcga), TWAS FUSION (http://gusevlab.org/projects/fusion/#reference-functional-data), the Gene Expression Omnibus (GEO) under accession number GSE148673 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE148673), KM-Plotter (https://kmplot.com/analysis/), UALCAN (https://ualcan.path.uab.edu/analysis.html), the Human Protein Atlas (HPA; https://www.proteinatlas.org/), GeneMANIA (http://genemania.org/), STRING (https://string-db.org/), Gene Set Cancer Analysis (GSCA; http://bioinfo.life.hust.edu.cn/GSCA/#/), and TIMER (https://cistrome.shinyapps.io/timer/). All other data generated or analyzed are included in the article and its supplementary materials, and further details are available from the corresponding authors upon reasonable request.


Articles from Discover Oncology are provided here courtesy of Springer
