Human Genomics. 2026 Jan 25;20:37. doi: 10.1186/s40246-026-00918-x

Exploration of neutrophil-associated genes in the prognosis of bladder urothelial carcinoma based on a machine learning and multi-omics data integration framework

Muya Ran 2,#, Xiaoming Chen 4,#, Guancheng Xiao 1,3, RuoHui Huang 1,3, Wei Xia 1,3, QingMing Zeng 1,3, Gang Xu 1,3, Bo Jiang 1,3
PMCID: PMC12914921  PMID: 41582171

Abstract

Background

Bladder urothelial carcinoma (BLCA) is a prevalent malignancy. The poor performance of existing therapeutic approaches in the advanced stages of BLCA underscores the critical need for more sensitive and precise biomarkers to improve patient survival and prognosis.

Methods

This study used single-cell RNA sequencing (scRNA-seq) data from BLCA and control groups and applied the high-dimensional Weighted Gene Co-expression Network Analysis (hdWGCNA) algorithm to identify neutrophil-associated genes. These genes were intersected with differentially expressed genes (DEGs) from bulk RNA-seq data and then subjected to univariate Cox regression analysis. BLCA subtypes were subsequently identified using a framework that combines a denoising autoencoder (DAE) with joint deep semi-nonnegative matrix factorization (JDSNMF). Various machine learning ensemble algorithms were then used to screen prognostic genes and construct a BLCA risk model.

Results

We identified several reliable BLCA subtypes with significant differences in enriched pathways and immune landscapes. Based on the risk model, the high- and low-risk groups showed significant differences in the expression patterns and BLCA-related associations of prognostic genes, as well as in immune cell correlations and drug sensitivity. Furthermore, the prognostic genes in the constructed risk model also demonstrated significant value in pan-cancer analysis.

Conclusion

This study reveals the critical role of neutrophils in the occurrence and progression of BLCA through multi-omics data and bioinformatics analyses, and constructs a risk model with potential clinical applications. Our research provides new insights for precise stratification and personalized treatment of BLCA, promising to improve the clinical prognosis. The source code for the proposed framework is available at https://gitee.com/guancheng-xiao/blca/tree/master/.

Keywords: Bladder urothelial carcinoma, Neutrophils, Biomarkers, Risk model, scRNA-seq

Introduction

BLCA is a common malignancy that originates from the urothelium, the cell layer lining the inner surface of the bladder [1, 2]. Based on the depth of tumor infiltration, BLCA is classified as non-muscle-invasive bladder cancer (NMIBC) or muscle-invasive bladder cancer (MIBC) [3]. NMIBC is characterized by a high recurrence rate and can progress to MIBC. Treatment modalities for BLCA include surgery, chemotherapy, radiotherapy, and immunotherapy. However, because of high rates of recurrence, metastasis, and drug resistance, the efficacy of these treatments in advanced BLCA remains suboptimal [4, 5]. For immunotherapy in particular, poor outcomes are largely attributed to the lack of sensitive and precise predictive biomarkers. Therefore, identifying key biomarkers involved in the occurrence and progression of BLCA and accurately stratifying patients could help improve survival and prognosis.

Neutrophils, the most abundant type of white blood cell in the bloodstream, play a critical role in the human immune system, and studies have closely linked them to BLCA. First, neutrophils play a complex role in the BLCA tumor microenvironment: by secreting cytokines, chemokines, and enzymes, they can influence tumor growth, angiogenesis, invasion, and metastasis. For example, Jing et al. revealed that tumor–neutrophil crosstalk coordinates the BLCA microenvironment; specifically, they found that targeting neutrophils or hepatocyte growth factor receptor signaling in combination with immune checkpoint blockade inhibited BLCA progression and enhanced the anti-tumor effect of CD8+ T cells in mice [6]. Second, the blood neutrophil-to-lymphocyte ratio (NLR) is considered an important prognostic indicator. A high NLR is typically associated with poor prognosis in bladder cancer patients, possibly because it reflects an inflammatory, immunosuppressive state that promotes tumor growth and dissemination [7]. Third, neutrophils influence the response to bladder cancer treatment: higher tumor-associated neutrophil (TAN) infiltration predicts poorer outcomes in urothelial bladder cancer patients receiving immune checkpoint blockade [8]. Finally, neutrophil accumulation at the tumor site is often accompanied by a pro-inflammatory response that may further drive tumor growth and progression. For instance, Mandelli et al. confirmed that pro-inflammatory CK recruits TANs in basal-type MIBC [9].

Previous studies have identified reliable BLCA biomarkers from multiple perspectives using various bioinformatics methods and experimental validation. For example, Wang et al. used gene co-expression network analysis, GSEA, and immunohistochemical staining to show that COL10A1 has prognostic value in urothelial bladder cancer and is associated with tumor-infiltrating immune cells [10]. Qiu et al. applied various bioinformatics methods to BLCA transcriptome data from The Cancer Genome Atlas (TCGA) and Gene Expression Omnibus (GEO) databases to explore the role of PMEPA1 in predicting BLCA progression, prognosis, and molecular subtypes [11]. Lü et al. identified MAP1A as a high-risk factor for BLCA through interactive analysis of gene expression profiles and further validated its association with early diagnosis, prognosis, immunotherapy, and the competing endogenous RNA (ceRNA) network of BLCA [12]. Wang et al. identified hub genes in BLCA progression using weighted gene co-expression and protein–protein interaction network analyses and validated their significant differential expression between BLCA and control groups [13]. Dong et al. identified BLCA subtypes with non-negative matrix factorization (NMF) and constructed a risk model with Lasso-Cox regression, both based on differentially expressed genes (DEGs) between BLCA and control groups [14]. Although these studies have provided valuable insights, most existing BLCA signatures and subtype classifications rely primarily on bulk gene expression data and do not explicitly incorporate cell-type–specific biological information or multi-omics integration.

On the other hand, deep learning and matrix factorization algorithms have been widely applied to disease classification, key biomarker identification, and subtype identification with strong results. For instance, Zhu et al. proposed a model based on an autoencoder and a graph convolutional neural network to explore associations between lncRNAs and cancer metastasis events, and validated the results through statistical analysis and literature evidence [15]. Vanmathi et al. proposed a framework that predicts breast cancer prognosis with an integrated serial cascade attention network and an improved variational autoencoder, outperforming baseline algorithms [16]. Lu et al. proposed an improved genomic non-negative matrix factorization (GNMF) algorithm that adds a pilot hierarchical clustering procedure to determine the number of clusters, multiple random starts, and a new stopping criterion for the core GNMF, and used it to identify genomic subgroups of non-small cell lung cancer, colorectal cancer, and malignant melanoma [17]. Graph-regularized NMF captures the underlying geometric structure of the data space through a graph regularization term; along these lines, Zhu et al. combined ℓ2,1-norm NMF with spectral clustering to propose robust manifold NMF, which performed better in cancer gene clustering [18]. However, deep learning–based multi-omics integration frameworks that explicitly incorporate immune cell–specific biological signals have rarely been applied to BLCA subtype identification.

In this study, we address these gaps with an integrative, biologically informed framework that differs from existing BLCA signature and subtype studies in several key aspects. We first used scRNA-seq data and the hdWGCNA algorithm to identify neutrophil-associated gene modules, anchoring the analysis in cell-type–specific immune biology. These neutrophil-related genes were then integrated with bulk RNA-seq and DNA methylation data to capture complementary molecular layers. Next, we identified reliable BLCA subtypes with a framework that combines a denoising autoencoder (DAE) and joint deep semi-non-negative matrix factorization (JDSNMF), enabling robust latent feature learning and subtype discovery. Unlike existing BLCA signatures derived from single-omics or bulk-level data, our approach integrates single-cell–derived immune features with multi-omics data through deep learning–based representation learning. Furthermore, we constructed a prognostic risk model using ensemble machine learning methods and examined the BLCA-related associations of the prognostic genes, their correlations with immune cells, and differences in drug sensitivity between high- and low-risk groups. Finally, we performed exploratory pan-cancer analyses to assess the potential broader relevance of the identified genes.

Methods

Data acquisition and preprocessing

The scRNA-seq data used in this study were obtained from the GEO database, specifically from three datasets: GSE129845 [19], GSE135337 [20], and GSE267718. GSE129845 includes 3 BLCA samples; GSE135337 contains 7 primary BLCA samples and 1 adjacent normal tissue sample; and GSE267718 contributed 8 BLCA tumor samples. Detailed information on these datasets is provided in Supplementary Table S1.

Gene expression data (FPKM) and clinical data for bladder cancer patients in the TCGA-BLCA cohort were downloaded from the TCGA database using the R package “easyTCGA”. The TCGA-BLCA cohort comprises 19 normal tissues and 412 bladder cancer tissues. The GSE13507 dataset, containing 165 bladder cancer tissue samples, was obtained from the GEO database for external validation of the prognostic model and was normalized with the normalizeBetweenArrays() function from the limma package. DNA methylation data (Illumina 450K array; 21 normal and 416 bladder cancer tissues) and copy number variation (CNV) data (416 bladder cancer tissues) for the TCGA-BLCA cohort were downloaded from the UCSC Xena database. During preprocessing of the DNA methylation data, CpG sites on sex chromosomes and those missing in more than 70% of samples were removed, and only CpG sites located between 2 kb upstream and 0.5 kb downstream of transcription start sites (TSSs) were retained. Tumor mutation burden (TMB) data for the TCGA-BLCA cohort were obtained using the R package “TCGAbiolinks”. In addition, gene expression matrices and corresponding clinical survival data for 33 cancer types were retrieved from the UCSC Xena database.
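The three CpG filtering rules above can be expressed as a single boolean mask. The following is a minimal NumPy sketch, not the authors' pipeline; the input layout (a beta-value matrix with NaN marking missing values, a chromosome label per CpG, and a signed distance to the nearest TSS where negative means upstream) and the function name filter_cpgs are assumptions for illustration.

```python
import numpy as np

def filter_cpgs(beta, chrom, tss_dist, max_missing=0.70,
                upstream=2000, downstream=500):
    """Hypothetical helper: boolean mask of CpG sites to keep.

    beta     : (n_cpg, n_samples) beta values, NaN = missing
    chrom    : chromosome label per CpG, e.g. "chr1", "chrX"
    tss_dist : signed distance (bp) to the nearest TSS; negative = upstream
               (assumed sign convention)
    """
    chrom = np.asarray(chrom)
    tss_dist = np.asarray(tss_dist)
    autosomal = ~np.isin(chrom, ["chrX", "chrY"])            # drop sex chromosomes
    well_covered = np.isnan(beta).mean(axis=1) <= max_missing  # drop sites missing in >70% of samples
    near_tss = (tss_dist >= -upstream) & (tss_dist <= downstream)  # 2 kb up to 0.5 kb down
    return autosomal & well_covered & near_tss

# Toy example: four CpG sites measured in four samples.
beta = np.array([[0.1, 0.2, np.nan, 0.4],      # kept
                 [np.nan, np.nan, np.nan, 0.5],  # 75% missing -> removed
                 [0.3, 0.3, 0.3, 0.3],          # on chrX -> removed
                 [0.6, 0.7, 0.8, 0.9]])         # 800 bp downstream -> removed
keep = filter_cpgs(beta, ["chr1", "chr2", "chrX", "chr3"], [-1500, 100, 0, 800])
```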

scRNA-seq data processing and cell type identification

scRNA-seq data were analyzed in R using the Seurat [21], DoubletFinder [22], harmony [23], and SingleR [24] packages. The three scRNA-seq datasets were loaded and merged into a single Seurat object. After the metadata were updated with sample identifiers, violin plots were used to inspect the number of detected features, total counts, and mitochondrial gene percentage per cell. During quality control, cells with fewer than 100 or more than 7,500 features, more than 10% mitochondrial gene content, or more than 30,000 counts were filtered out. Putative doublets were then identified and removed with the DoubletFinder package. Briefly, parameter sweeps were performed with paramSweep_v3 using the first 20 principal components, and the optimal pK value was selected by maximizing the BCmetric. Artificial doublets were generated with pN set to 0.25, and the expected number of doublets was adjusted for the homotypic doublet proportion estimated by modelHomotypic from the clustering annotations. Predicted doublets were excluded from downstream analyses.
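The quality-control thresholds and the homotypic adjustment of the expected doublet count can be sketched as follows. This is an illustrative Python version, not Seurat or DoubletFinder code; the helper names are hypothetical, and the homotypic proportion is computed as the sum of squared cluster frequencies, which is the idea behind modelHomotypic.

```python
import numpy as np

def qc_mask(n_features, n_counts, pct_mito,
            min_features=100, max_features=7500,
            max_mito=10.0, max_counts=30000):
    """Hypothetical helper: keep cells with 100-7,500 features,
    <=10% mitochondrial reads and <=30,000 counts."""
    n_features = np.asarray(n_features)
    n_counts = np.asarray(n_counts)
    pct_mito = np.asarray(pct_mito)
    return ((n_features >= min_features) & (n_features <= max_features)
            & (pct_mito <= max_mito) & (n_counts <= max_counts))

def adjusted_doublet_count(n_exp, cluster_labels):
    """Shrink an expected doublet count by the homotypic proportion.

    Homotypic proportion = sum of squared cluster frequencies, i.e. the
    chance that both cells of a random doublet come from the same cluster
    (such doublets are hard to detect, so they are discounted).
    """
    _, counts = np.unique(cluster_labels, return_counts=True)
    freqs = counts / counts.sum()
    homotypic = float(np.sum(freqs ** 2))
    return int(round(n_exp * (1.0 - homotypic))), homotypic

# Four toy cells: too few features, pass, too many features, too much mito.
mask = qc_mask([50, 2000, 8000, 3000], [1000, 5000, 40000, 25000],
               [2.0, 5.0, 3.0, 12.0])
n_adj, homo = adjusted_doublet_count(100, ["a"] * 50 + ["b"] * 50)
```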

The filtered data were then normalized using the LogNormalize method with a scale factor of 10,000. Highly variable features were identified, and principal component analysis (PCA) was performed for dimensionality reduction. Batch effects were corrected using Harmony, with sample origin as the grouping variable. A nearest-neighbor graph was then constructed, and cells were clustered at a resolution of 0.5. t-Distributed Stochastic Neighbor Embedding (t-SNE) plots based on the PCA and Harmony embeddings were used to visualize the cell clusters. Cell types were annotated using the SingleR package with the Human Primary Cell Atlas as the reference; annotation results were visualized as heatmaps, and cell-type labels were assigned to clusters in the metadata. Neutrophils were defined as clusters annotated as neutrophils by SingleR and further supported by high expression of the canonical neutrophil marker genes CSF3R, FCGR3B, and CXCR2.
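LogNormalize divides each gene's count by the cell's total count, multiplies by the scale factor, and applies a natural-log transform with a pseudocount of 1. A minimal NumPy sketch of that transformation, assuming a genes × cells count matrix (the function name is hypothetical), is:

```python
import numpy as np

def log_normalize(counts, scale_factor=10_000):
    """Seurat-style LogNormalize: ln(1 + count / cell_total * scale_factor).

    counts : (n_genes, n_cells) raw UMI count matrix.
    """
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=0, keepdims=True)   # library size per cell
    totals[totals == 0] = 1.0                    # avoid division by zero for empty cells
    return np.log1p(counts / totals * scale_factor)

counts = np.array([[1, 0],
                   [3, 5]])
normed = log_normalize(counts)   # cell 1 total = 4, cell 2 total = 5
```

Normalizing by library size first removes differences in sequencing depth between cells, and the log transform compresses the dynamic range before PCA.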

hdWGCNA analysis of neutrophils in BLCA

Co-expression network analysis was used to investigate the functional role of neutrophil-related gene modules in BLCA. Data were loaded and preprocessed using the WGCNA and hdWGCNA [25] packages together with Seurat. Neutrophil populations were extracted and subjected to normalization and dimensionality reduction. Highly variable genes were selected, metacells were constructed by grouping cells according to cell type and sample, and PCA was applied for dimensionality reduction. After the expression matrix was normalized, a soft-thresholding power was chosen for network construction, and the co-expression network was built and partitioned into modules. Module eigengenes (MEs) were calculated for each module, and gene importance was assessed by eigengene-based connectivity (kME). Finally, differential expression analysis was performed, kME heatmaps of module genes were plotted, and hub genes were identified.
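The core quantities of the co-expression analysis can be illustrated in a few lines: a soft-thresholded adjacency, the module eigengene (first principal component of the standardized module expression), and kME (correlation of each gene with the eigengene). The sketch below is a simplified stand-in, not hdWGCNA itself; it assumes a signed network (hdWGCNA's usual setting) and uses the power of 8 reported in the Results, and all function names are hypothetical.

```python
import numpy as np

def signed_adjacency(expr, beta=8):
    """Signed adjacency ((1 + cor) / 2) ** beta.

    expr : (n_samples, n_genes); in hdWGCNA the samples would be metacells.
    """
    corr = np.corrcoef(expr, rowvar=False)
    return ((1.0 + corr) / 2.0) ** beta

def module_eigengene(expr_module):
    """First principal component of the standardized module expression."""
    z = (expr_module - expr_module.mean(axis=0)) / expr_module.std(axis=0)
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    me = u[:, 0] * s[0]
    # Orient the eigengene so it correlates positively with mean module expression.
    if np.corrcoef(me, z.mean(axis=1))[0, 1] < 0:
        me = -me
    return me

def kme(expr, me):
    """Eigengene-based connectivity: correlation of each gene with the eigengene."""
    return np.array([np.corrcoef(expr[:, j], me)[0, 1]
                     for j in range(expr.shape[1])])

# Three genes driven by one shared signal plus small noise form a tight module.
rng = np.random.default_rng(0)
signal = rng.normal(size=200)
expr = signal[:, None] + 0.1 * rng.normal(size=(200, 3))
adj = signed_adjacency(expr)
me = module_eigengene(expr)
k = kme(expr, me)
```

Genes with the highest kME are the module's hub genes, which is how the top-ranked genes per module are chosen.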

Intercellular communication and transcription factor analysis

Intercellular communication was analyzed with the CellChat package [26]. The expression matrix and cell-type labels were extracted to create a CellChat object, and the CellChatDB.human database was selected as the ligand–receptor reference, with its interaction categories displayed. To reduce computational cost, the expression data were subset to signaling genes; overexpressed genes and ligand–receptor interactions were then identified and projected onto the human protein–protein interaction (PPI) network. Communication probabilities were calculated to infer the intercellular communication network, filtering out communications involving cell groups with too few cells. The inferred communications were extracted as a data frame, cell–cell communication was inferred at the signaling pathway level, and the aggregated cell–cell communication network was computed and visualized in several ways.
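To give a feel for how a communication probability is derived from expression, the sketch below scores one ligand–receptor pair between cell groups with a Hill-type saturation term, L·R / (Kh + L·R), using the group mean of ligand expression in the sender and receptor expression in the receiver. This is a deliberately simplified assumption, not CellChat's model: CellChat additionally uses a more robust group average, agonists/antagonists, co-receptors, group-size weighting, and a permutation test. The value Kh = 0.5 and the function name are assumptions.

```python
import numpy as np

def lr_probability(expr, groups, ligand_idx, receptor_idx, kh=0.5):
    """Simplified ligand-receptor communication score between cell groups.

    expr         : (n_genes, n_cells) normalized expression
    groups       : cell-group label per cell
    ligand_idx   : row index of the ligand gene
    receptor_idx : row index of the receptor gene
    Returns (group_names, P) where P[i, j] scores signaling from group i
    (sender) to group j (receiver).
    """
    groups = np.asarray(groups)
    names = np.unique(groups)
    lig = np.array([expr[ligand_idx, groups == g].mean() for g in names])
    rec = np.array([expr[receptor_idx, groups == g].mean() for g in names])
    lr = np.outer(lig, rec)            # sender ligand x receiver receptor
    return names, lr / (kh + lr)       # Hill-type saturation in [0, 1)

# Toy data: neutrophils express the ligand (row 0), T cells the receptor (row 1).
expr = np.array([[2.0, 2.0, 0.0, 0.0],
                 [0.0, 0.0, 1.0, 1.0]])
names, P = lr_probability(expr, ["Neu", "Neu", "T", "T"], 0, 1)
```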

In addition, transcription factor (TF) activity in the scRNA-seq data was inferred using the DoRothEA package. Human TF regulons from the DoRothEA database were used, retaining only interactions with confidence levels A, B, or C. VIPER scores were calculated to quantify TF activity in neutrophils. The 20 most variable TFs were selected, and heatmaps of their mean VIPER scores across cell populations were generated with the pheatmap package.
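The selection of the most variable TFs and the per-population averaging that feeds the heatmap can be sketched as below. It assumes, since the text does not specify, that variability is measured as the variance of each TF's activity score across cells; the function name is hypothetical.

```python
import numpy as np

def top_variable_tfs(scores, tf_names, groups, n_top=20):
    """Pick the most variable TFs and average their activity per cell population.

    scores   : (n_tfs, n_cells) TF activity scores (e.g. VIPER scores)
    tf_names : TF name per row
    groups   : cell-population label per cell
    Returns (selected_tf_names, group_names, means) where means has shape
    (n_selected, n_groups), ready to pass to a heatmap function.
    """
    scores = np.asarray(scores, dtype=float)
    groups = np.asarray(groups)
    order = np.argsort(scores.var(axis=1))[::-1][:n_top]   # highest variance first
    group_names = np.unique(groups)
    means = np.column_stack([scores[order][:, groups == g].mean(axis=1)
                             for g in group_names])
    return [tf_names[i] for i in order], group_names, means

scores = [[0, 0, 0, 0],      # TF "A": constant
          [1, -1, 1, -1],    # TF "B": variance 1
          [5, 5, -5, -5]]    # TF "C": variance 25
sel, grp, means = top_variable_tfs(scores, ["A", "B", "C"],
                                   ["x", "x", "y", "y"], n_top=2)
```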

Differential and mutation analysis

DEGs between normal and bladder cancer tissues in the TCGA-BLCA cohort were identified with limma, using thresholds of |logFC| > 0.5 and false discovery rate (FDR) < 0.05. The “ChAMP” package was used for normalization and differential analysis of the cohort’s 450K DNA methylation array data; CpG sites with adjusted p < 0.001 and |logFC| > 0.4 were considered differentially methylated. The neutrophil-related genes identified from the single-cell data were intersected with the DEGs, and the intersecting genes were used for further analysis.
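The DEG thresholds and the intersection with neutrophil module genes reduce to a filter followed by a set intersection. The sketch below assumes the differential expression results have been exported as (gene, logFC, FDR) rows, for example from a limma results table; the function names and toy genes are hypothetical.

```python
def select_degs(results, lfc_cutoff=0.5, fdr_cutoff=0.05):
    """results: iterable of (gene, logFC, FDR) rows. Returns the DEG set."""
    return {gene for gene, lfc, fdr in results
            if abs(lfc) > lfc_cutoff and fdr < fdr_cutoff}

def candidate_genes(results, neutrophil_module_genes):
    """Intersect bulk DEGs with neutrophil-associated genes from hdWGCNA modules."""
    return sorted(select_degs(results) & set(neutrophil_module_genes))

results = [("G1", 1.2, 0.01),    # DEG
           ("G2", -0.7, 0.03),   # DEG (down-regulated)
           ("G3", 0.3, 0.001),   # fold change too small
           ("G4", 2.0, 0.2)]     # not significant
genes = candidate_genes(results, ["G1", "G3", "G5"])
```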

Candidate prognostic genes were screened using Kaplan–Meier (KM) analysis and univariate Cox regression. KM analysis was performed on the intersecting genes with a threshold of p < 0.05, and genes with p < 0.05 in univariate Cox regression were considered candidate prognostic genes. The “igraph” package was used to plot correlation networks of the candidate prognostic genes, and the “maftools” package was used to generate their mutation frequency and waterfall plots.
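The text does not name the test behind the KM p-values; the log-rank test is the usual choice for comparing two KM curves, so the sketch below assumes it. It compares observed and expected events in one group at every event time and converts the statistic to a p-value using the chi-square distribution with 1 degree of freedom (upper tail = erfc(sqrt(chi2 / 2))). The function name is hypothetical.

```python
import math

def logrank_test(time_a, event_a, time_b, event_b):
    """Two-group log-rank test. event_* is 1 for an observed event, 0 if censored.

    Returns (chi2 statistic, p-value) with 1 degree of freedom.
    """
    data = [(t, e, 0) for t, e in zip(time_a, event_a)] + \
           [(t, e, 1) for t, e in zip(time_b, event_b)]
    event_times = sorted({t for t, e, _ in data if e == 1})
    obs_minus_exp = 0.0
    variance = 0.0
    for t in event_times:
        at_risk = [g for tt, _, g in data if tt >= t]
        n = len(at_risk)
        n_a = sum(1 for g in at_risk if g == 0)
        d = sum(1 for tt, e, _ in data if tt == t and e == 1)
        d_a = sum(1 for tt, e, g in data if tt == t and e == 1 and g == 0)
        obs_minus_exp += d_a - d * n_a / n          # observed minus expected in group A
        if n > 1:                                   # hypergeometric variance term
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    chi2 = obs_minus_exp ** 2 / variance if variance > 0 else 0.0
    p_value = math.erfc(math.sqrt(chi2 / 2))       # upper tail of chi-square, 1 df
    return chi2, p_value

# Group A: all deaths early; group B: all deaths late.
chi2_sep, p_sep = logrank_test(list(range(1, 11)), [1] * 10,
                               list(range(11, 21)), [1] * 10)
```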

BLCA subtype identification method

To identify reliable molecular subtypes of BLCA, we propose a Multi-omics Data Integration framework based on a Denoising autoencoder and Joint Deep Semi-nonnegative matrix factorization (MDIDJSD), which integrates BLCA gene expression and DNA methylation data. The framework consists of two consecutive components: (i) a DAE that learns a low-dimensional latent representation of the integrated multi-omics data, and (ii) a JDSNMF model applied to the learned latent space to identify subtype-specific structure.

Let the feature matrices of the gene expression and DNA methylation data be denoted as $\mathbf{X}^{(1)} \in \mathbb{R}^{n \times p_1}$ and $\mathbf{X}^{(2)} \in \mathbb{R}^{n \times p_2}$, where $n$ is the number of samples and $p_1$ and $p_2$ are the numbers of genes and DNA methylation sites, respectively. The two modalities are concatenated to form the integrated input matrix $\mathbf{X} = [\mathbf{X}^{(1)}, \mathbf{X}^{(2)}] \in \mathbb{R}^{n \times (p_1 + p_2)}$. The encoder of the DAE is defined as a stack of linear transformations, each followed by a ReLU activation function:

$$\mathbf{h}^{(1)} = \mathrm{ReLU}\left(\mathbf{W}_{e}^{(1)} \mathbf{x} + \mathbf{b}_{e}^{(1)}\right), \qquad \mathbf{h}^{(l)} = \mathrm{ReLU}\left(\mathbf{W}_{e}^{(l)} \mathbf{h}^{(l-1)} + \mathbf{b}_{e}^{(l)}\right), \quad l = 2, \dots, L, \qquad \mathbf{z} = \mathbf{h}^{(L)} \tag{1}$$

where $\mathbf{x} \in \mathbb{R}^{p_1 + p_2}$ denotes the input feature vector of a single sample (a row of $\mathbf{X}$), $\mathbf{h}^{(1)}, \dots, \mathbf{h}^{(L-1)}$ are the hidden-layer activations, $\mathbf{z} \in \mathbb{R}^{d}$ is the latent representation, and $\{\mathbf{W}_{e}^{(l)}, \mathbf{b}_{e}^{(l)}\}_{l=1}^{L}$ are the learnable weights and biases. The decoder reconstructs the input from the latent representation:

$$\hat{\mathbf{x}} = f_{\mathrm{dec}}(\mathbf{z}) \tag{2}$$

where $\hat{\mathbf{x}}$ is the reconstructed input. After training, the encoder outputs $\mathbf{z}$ of all samples form the latent matrix $\mathbf{Z} \in \mathbb{R}^{n \times d}$, which is subsequently used as the input to the JDSNMF model. The goal of JDSNMF is to further decompose the DAE-learned latent matrix $\mathbf{Z}$ into subtype-relevant components through a multi-layer factorization structure. For each view $v$ (here corresponding to the integrated latent representation, i.e., $\mathbf{Z}^{(v)} = \mathbf{Z}$), the optimization problem is defined as:

$$\min_{\mathbf{W},\,\{\mathbf{U}_{i}^{(v)}\},\,\mathbf{H}_{m}^{(v)}} \left\| \mathbf{Z}^{(v)} - \mathbf{W}\, g\!\left(\mathbf{U}_{1}^{(v)}\, g\!\left(\mathbf{U}_{2}^{(v)} \cdots g\!\left(\mathbf{U}_{m}^{(v)} \mathbf{H}_{m}^{(v)}\right)\right)\right) \right\|_{F}^{2} + \lambda \left\| \mathbf{R}^{(v)} \right\|_{F}^{2} \tag{3}$$

s.t. $\mathbf{W} \ge 0, \; \mathbf{H}_{i}^{(v)} \ge 0, \; i = 1, \dots, m$

$\mathbf{Z}^{(v)}$ denotes the latent representation matrix obtained from the DAE. $\mathbf{W} \in \mathbb{R}_{+}^{n \times k}$ denotes the shared basis matrix representing sample–subtype associations, where $k$ is the number of subtypes. $\mathbf{U}_{1}^{(v)}, \dots, \mathbf{U}_{m}^{(v)}$ denote the layer-specific embedding matrices of the deep factorization. $\mathbf{H}_{m}^{(v)}$ denotes the final non-negative latent coefficient matrix encoding subtype-specific features, and $\mathbf{H}_{i}^{(v)}$ for $i = 1, \dots, m-1$ are the intermediate non-negative hidden matrices; in particular, $\mathbf{H}_{m-1}^{(v)}$ denotes the output of the penultimate factorization layer. $\mathbf{C}$ is a constant initialization matrix (e.g., an identity or uniform matrix) used to stabilize the deep factorization. $g(\cdot)$ denotes an element-wise non-linear activation function (ReLU in this study), introduced to enhance the representational capacity of the factorization model. $\mathbf{R}^{(v)}$ denotes a regularization matrix that enforces smoothness and numerical stability of the subtype representations, and $\lambda$ is a regularization parameter controlling the contribution of the penalty term. The nested term $g(\mathbf{U}_{1}^{(v)} g(\mathbf{U}_{2}^{(v)} \cdots g(\mathbf{U}_{m}^{(v)} \mathbf{H}_{m}^{(v)})))$ represents the layer-wise deep factorization, in which each embedding matrix is followed by a non-linear activation.

The optimization problem is solved with an alternating minimization strategy in which $\mathbf{W}$, $\{\mathbf{U}_{i}^{(v)}\}$, and $\mathbf{H}_{m}^{(v)}$ are updated iteratively while the remaining variables are held fixed. The algorithm is considered to have converged when either the relative change in the objective function value between two consecutive iterations falls below a tolerance $\varepsilon$ or a predefined maximum number of iterations is reached. Conceptually, the DAE first learns a compact, noise-reduced latent representation of the integrated multi-omics data. JDSNMF then factorizes this latent representation into a shared basis matrix and multiple layer-wise latent components. The optimization objective minimizes the reconstruction error of the latent space while enforcing non-negativity and structural regularization, thereby extracting biologically meaningful, subtype-specific patterns. By combining representation learning with structured matrix factorization, this two-stage framework enables robust identification of BLCA molecular subtypes.
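The alternating-minimization loop, the relative-change stopping rule, and the final sample-to-subtype assignment can be illustrated with a deliberately simplified stand-in. The sketch below is not JDSNMF: it assumes a single factorization layer with no activation, replaces the regularization matrix with a ridge penalty on H, and uses standard multiplicative updates for non-negative factorization, minimizing ||Z − WH||²_F + λ||H||²_F. It also assumes a non-negative latent matrix (for example, one produced by ReLU units) and assigns each sample to the subtype with the largest entry in its row of W; the function name nmf_subtypes is hypothetical.

```python
import numpy as np

def nmf_subtypes(Z, k, lam=0.1, max_iter=500, tol=1e-6, seed=0):
    """Simplified single-layer stand-in for JDSNMF (not the authors' model).

    Z : (n_samples, d) non-negative latent matrix. Factorizes Z ~ W @ H with
    W (n x k) >= 0 and H (k x d) >= 0 by alternating multiplicative updates.
    """
    rng = np.random.default_rng(seed)
    n, d = Z.shape
    W = rng.random((n, k)) + 1e-3
    H = rng.random((k, d)) + 1e-3
    eps = 1e-12                      # guards against division by zero
    objectives = []
    for _ in range(max_iter):
        # Update H with W fixed (ridge-penalized multiplicative rule).
        H *= (W.T @ Z) / (W.T @ W @ H + lam * H + eps)
        # Update W with H fixed.
        W *= (Z @ H.T) / (W @ H @ H.T + eps)
        obj = np.linalg.norm(Z - W @ H) ** 2 + lam * np.linalg.norm(H) ** 2
        # Converged when the relative change in the objective drops below tol.
        converged = bool(objectives) and abs(objectives[-1] - obj) <= tol * objectives[-1]
        objectives.append(obj)
        if converged:
            break
    labels = W.argmax(axis=1)        # subtype = strongest sample-subtype association
    return W, H, labels, objectives

# Toy latent matrix with two planted sample groups.
rng = np.random.default_rng(1)
Z = rng.random((40, 12)) * 0.1
Z[:20, :6] += 1.0
Z[20:, 6:] += 1.0
W, H, labels, obj = nmf_subtypes(Z, k=2)
```

Multiplicative updates keep W and H non-negative automatically and never increase this objective, which makes the stopping rule on the relative change well behaved.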

Enrichment analysis

Metascape was used to identify biological functions and signaling pathways of the intersecting genes that may be involved in bladder cancer progression. Gene Set Variation Analysis (GSVA) based on the c2.cp.kegg.v7.5.1.symbols.gmt gene set was used to identify differential pathways among the neutrophil-related bladder cancer subtypes.
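Over-representation tools such as Metascape typically score a pathway with a hypergeometric test: how surprising is it to see at least the observed number of query genes in the pathway, given the background size? The sketch below computes that upper-tail probability exactly with integer binomial coefficients. It illustrates the statistic only and is not Metascape's implementation; the function name is hypothetical.

```python
from math import comb

def hypergeom_enrichment_p(overlap, set_size, query_size, universe_size):
    """P(X >= overlap) for X ~ Hypergeometric(universe_size, set_size, query_size).

    overlap       : query genes that fall in the pathway
    set_size      : genes annotated to the pathway (within the universe)
    query_size    : genes in the query list (e.g. the intersecting genes)
    universe_size : background genes
    """
    upper = min(set_size, query_size)
    total = comb(universe_size, query_size)
    hits = sum(comb(set_size, k) * comb(universe_size - set_size, query_size - k)
               for k in range(overlap, upper + 1))
    return hits / total

p = hypergeom_enrichment_p(overlap=2, set_size=2, query_size=2, universe_size=4)
```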

Prognostic model construction

To identify prognostic neutrophil-related genes in bladder cancer, we integrated 10 machine learning algorithms (survival-SVM, CoxBoost, Ridge, Lasso, StepCox, RSF, GBM, Enet, plsRcox, and SuperPC) into 101 algorithm combinations, which were applied to the candidate prognostic genes identified by univariate Cox regression. The concordance index (C-index) of each resulting prognostic model was calculated in the TCGA-BLCA cohort and GSE13507, and the combination with the highest average C-index across the two datasets was considered optimal. Risk scores for patients in both datasets were then calculated from the model’s linear predictor, and KM analysis was used to compare survival between high- and low-risk groups. The R package “timeROC” was used to plot 1-, 3-, and 5-year receiver operating characteristic (ROC) curves to evaluate the predictive accuracy of the prognostic model. To support clinical application, a nomogram was developed based on the genes and risk groups in the prognostic model, and calibration curves and ROC analysis were used to assess its performance in predicting 1-, 3-, and 5-year survival of bladder cancer patients.
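The model-selection criterion above is the C-index. The text does not state which estimator was used; Harrell's C is the most common, so the sketch below assumes it. A pair of patients is comparable when the one with the shorter follow-up had an observed event, and concordant when that patient also has the higher risk score. The function name is hypothetical.

```python
def harrell_c_index(time, event, risk):
    """Harrell's concordance index for right-censored survival data.

    A pair (i, j) is comparable when time[i] < time[j] and patient i had an
    observed event (event[i] == 1). It counts as concordant when risk[i] >
    risk[j]; tied risk scores count as 0.5.
    """
    concordant = 0.0
    comparable = 0
    n = len(time)
    for i in range(n):
        for j in range(n):
            if time[i] < time[j] and event[i] == 1:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / comparable if comparable else float("nan")

times, events = [1, 2, 3, 4], [1, 1, 1, 1]
c_perfect = harrell_c_index(times, events, [4, 3, 2, 1])   # earliest death, highest risk
```

A C-index of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is why averaging it across the training and validation cohorts is a natural selection criterion.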

Immune infiltration analysis

The R package “estimate” was used to calculate the ESTIMATEScore, ImmuneScore, StromalScore, and TumorPurity of patients in the TCGA-BLCA cohort. Immune cell and immune function scores were computed by single-sample Gene Set Enrichment Analysis (ssGSEA) and compared between risk groups using the Wilcoxon rank-sum test (wilcox.test). Spearman correlation was used to assess the relationships between prognostic genes and immune cells and immune functions. A p-value < 0.05 was considered statistically significant.
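ssGSEA scores one gene set in one sample by ranking genes from highest to lowest expression and summing, over the ranked list, the difference between a rank-weighted cumulative distribution of gene-set "hits" and the cumulative distribution of "misses". The sketch below follows that idea with the weight exponent 0.25 commonly used for ssGSEA. It is a simplified illustration: packages such as GSVA may handle ties and apply additional score normalization. The function name is hypothetical.

```python
import numpy as np

def ssgsea_score(expression, genes, gene_set, alpha=0.25):
    """Simplified single-sample GSEA enrichment score.

    expression : 1-D expression vector for one sample
    genes      : gene names aligned with `expression`
    gene_set   : collection of genes defining the signature
    """
    expression = np.asarray(expression, dtype=float)
    order = np.argsort(-expression)                  # rank genes high -> low
    n = len(expression)
    ranks = np.arange(n, 0, -1, dtype=float)         # top gene gets rank n
    in_set = np.array([genes[i] in gene_set for i in order])
    if in_set.all() or not in_set.any():
        raise ValueError("gene set must be a non-empty proper subset of genes")
    hit_w = np.where(in_set, ranks ** alpha, 0.0)
    p_hit = np.cumsum(hit_w) / hit_w.sum()           # weighted CDF of hits
    p_miss = np.cumsum(~in_set) / (~in_set).sum()    # CDF of misses
    return float(np.sum(p_hit - p_miss))

genes = ["g1", "g2", "g3", "g4", "g5", "g6"]
sample = [6.0, 5.0, 4.0, 3.0, 2.0, 1.0]
score_top = ssgsea_score(sample, genes, {"g1", "g2"})      # highly expressed set
score_bottom = ssgsea_score(sample, genes, {"g5", "g6"})   # lowly expressed set
```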

TMB and drug sensitivity analysis

The R package “maftools” was used to display the somatic mutation frequency and distribution of mutated genes in high- and low-risk bladder cancer patients. TMB was calculated from multiple variant types, including non-synonymous SNVs and indels. Patients were divided into high- and low-TMB groups at the median TMB, and KM analysis was performed to evaluate the association between TMB group and survival. To evaluate the combined prognostic impact of TMB and the risk score, the surv_cutpoint function was used to determine the optimal TMB cutoff based on survival outcomes, classifying patients into low-TMB (L-TMB) and high-TMB (H-TMB) groups; the risk score was likewise divided into low-risk (L-Risk score) and high-risk (H-Risk score) groups. Combining TMB status with the risk score yielded four subgroups (L-TMB + L-Risk score, L-TMB + H-Risk score, H-TMB + L-Risk score, and H-TMB + H-Risk score), which were used for subsequent survival analyses. To assess the value of the risk model in the clinical treatment of bladder cancer, the association between risk scores and the efficacy of common chemotherapy drugs was explored: the R package “oncoPredict” was used to predict the half-maximal inhibitory concentration (IC50) of each drug, and the Wilcoxon rank-sum test (wilcox.test) was used to compare IC50 values between high- and low-risk groups. A p-value < 0.05 was considered statistically significant.
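TMB is usually expressed as the number of qualifying somatic variants per megabase of sequenced territory. The sketch below counts non-synonymous variants per sample from MAF-style (sample, Variant_Classification) rows and then forms the four TMB/risk subgroups. The variant-class list, the 38 Mb capture size, and the function names are illustrative assumptions, not settings reported by the authors; the cutoffs are passed in (for example the median TMB, or an optimal cutpoint).

```python
import numpy as np

# Variant classes commonly treated as non-synonymous (assumed list; adjust to
# the MAF annotation actually used).
NONSYN = {"Missense_Mutation", "Nonsense_Mutation", "Frame_Shift_Del",
          "Frame_Shift_Ins", "In_Frame_Del", "In_Frame_Ins",
          "Splice_Site", "Nonstop_Mutation", "Translation_Start_Site"}

def tmb_per_mb(variants, capture_size_mb=38.0):
    """variants: iterable of (sample_id, variant_classification) MAF rows.
    Returns {sample_id: non-synonymous variants per megabase}."""
    counts = {}
    for sample, vclass in variants:
        counts.setdefault(sample, 0)          # samples with only silent variants -> 0
        if vclass in NONSYN:
            counts[sample] += 1
    return {s: c / capture_size_mb for s, c in counts.items()}

def tmb_risk_groups(tmb, risk, tmb_cutoff, risk_cutoff):
    """Combine TMB and risk-score strata into labels like 'H-TMB + L-Risk score'."""
    tmb_lab = np.where(np.asarray(tmb) > tmb_cutoff, "H-TMB", "L-TMB")
    risk_lab = np.where(np.asarray(risk) > risk_cutoff, "H-Risk score", "L-Risk score")
    return [f"{t} + {r}" for t, r in zip(tmb_lab, risk_lab)]

variants = [("S1", "Missense_Mutation"), ("S1", "Silent"),
            ("S2", "Frame_Shift_Del"), ("S2", "Nonsense_Mutation")]
tmb = tmb_per_mb(variants, capture_size_mb=2.0)
groups = tmb_risk_groups([tmb["S1"], tmb["S2"]], [0.2, 0.9],
                         tmb_cutoff=0.75, risk_cutoff=0.5)
```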

Pan-cancer analysis

Expression landscape of prognostic genes in different tumor types: In each of the 33 tumor types, the Wilcoxon rank-sum test was used to compare prognostic gene expression between tumor and normal tissues, and the differences were displayed as log2 fold changes (log2FC) in heatmaps. The R package “corrplot” was used to visualize correlations among the prognostic genes to explore possible co-expression patterns.

Clinical significance of prognostic genes in different tumor types: Patients were divided into high- and low-expression groups based on the expression level of each prognostic gene. The R package “survival” was used to analyze survival differences between the high- and low-expression groups, and KM curves were plotted. In addition, univariate Cox regression was used to calculate the hazard ratio of each prognostic gene in the 33 cancer types. The significance threshold was set at p < 0.05.

Association between prognostic genes and the immune microenvironment in different tumor types: Thorsson et al. identified six immune subtypes across TCGA tumor types based on immune signatures: wound healing (C1), IFN-γ dominant (C2), inflammatory (C3), lymphocyte depleted (C4), immunologically quiet (C5), and TGF-β dominant (C6) [28]. These immune subtypes reflect distinct tumor characteristics that are closely related to patient prognosis and treatment. The Kruskal–Wallis test (kruskal.test) was used to compare prognostic gene expression among the six immune subtypes. The R package “estimate” was used to calculate immune cell infiltration, stromal content, stromal–immune scores, and tumor purity for the 33 cancer types, and correlations between prognostic gene expression and ESTIMATEScore, ImmuneScore, StromalScore, and TumorPurity were calculated and visualized with the R package “corrplot”.

Association between prognostic genes and stemness indices in different tumor types: Stemness analysis was performed on TCGA tumor samples to obtain two stemness indices, the RNA stemness score (RNAss) and the DNA stemness score (DNAss), and Spearman correlation analysis was conducted between prognostic gene expression and each index.
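Spearman correlation, used here and in the immune analyses, is the Pearson correlation of the ranks, with tied values sharing their average rank. Because it depends only on ordering, it captures monotonic but non-linear relationships between gene expression and stemness indices. A minimal sketch (function names hypothetical):

```python
import numpy as np

def average_ranks(x):
    """1-based ranks; tied values share the average of their positions."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    ranks = np.empty(len(x))
    i = 0
    while i < len(x):
        j = i
        while j + 1 < len(x) and x[order[j + 1]] == x[order[i]]:
            j += 1                                   # extend the run of ties
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0  # average 1-based position
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman correlation = Pearson correlation of the tie-averaged ranks."""
    return float(np.corrcoef(average_ranks(x), average_ranks(y))[0, 1])

rho_monotone = spearman_rho([1, 2, 3, 4], [1, 4, 9, 16])   # non-linear but monotone
```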

RT-qPCR and western blot analysis methods

In this study, reverse transcription quantitative PCR (RT-qPCR) was performed to validate the mRNA expression levels of the key prognostic genes in a normal control group and a BLCA group. The control group used the immortalized normal urothelial cell line SV-HUC-1, and the BLCA group used the bladder cancer cell line HT-1376 (sourced from the National Infrastructure of Cell Line Resource in Beijing, China). RNA was extracted from the cells and reverse-transcribed into cDNA, which was then amplified by qPCR; the resulting data were processed, analyzed, and plotted as bar charts. Primer sequences and the qPCR reaction system are provided in the supplementary file mRNA.xlsx.
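The text does not state how relative mRNA expression was computed from the qPCR data. A widely used option is the Livak 2^(−ΔΔCt) method, sketched here purely as an assumption: the target gene's Ct is normalized to a reference (housekeeping) gene in each group, and the BLCA group is then compared with the control group. The function name and the choice of reference gene are hypothetical.

```python
import statistics

def relative_expression(ct_target_case, ct_ref_case, ct_target_ctrl, ct_ref_ctrl):
    """Assumed Livak 2^(-ddCt) fold change of a target gene, case vs. control.

    Each argument is a list of replicate Ct values; the reference gene
    normalizes for differences in input cDNA amount.
    """
    d_ct_case = statistics.mean(ct_target_case) - statistics.mean(ct_ref_case)
    d_ct_ctrl = statistics.mean(ct_target_ctrl) - statistics.mean(ct_ref_ctrl)
    dd_ct = d_ct_case - d_ct_ctrl
    return 2.0 ** (-dd_ct)

# Target reaches threshold 2 cycles earlier in the case group -> ~4-fold higher.
fold = relative_expression([24.0], [18.0], [26.0], [18.0])
```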

Cells of the bladder urothelial carcinoma cell line T24 and the normal urothelial cell line SV-HUC-1 were lysed in 1 mL of cell lysis solution. Protein samples were mixed with 10% SDS gel loading buffer at 4 °C for 4 min and boiled at 100 °C for 10 min. The proteins were then separated by electrophoresis and transferred to nitrocellulose membranes. After blocking overnight with 5% skim milk at 4 °C, the membranes were incubated with mouse anti-SLC9A3R1 (ab9526, 1:1000), rabbit anti-TOR1AIP2 (ab317701, 1:1000), anti-SLC2A3 (ab314193, 1:1000), anti-MAFG (ab154318, 1:1000), anti-TCIRG1 (83351-6-RR, 1:1000), anti-DEDD2 (14574-1-AP, 1:1000), anti-GAPDH (ab9485, 1:1000), and anti-β-Tubulin (ab15568, 1:1000) antibodies, followed by secondary antibodies at 37 °C. GAPDH and β-Tubulin served as internal controls. The antibodies were purchased from Abcam (Abcam plc, Cambridge, UK) and Proteintech (Proteintech, USA). Protein bands were detected with an ImageQuant™ LAS 4000 (GE Healthcare, Piscataway, NJ, USA) and quantified using ImageJ software.

Statistical analyses

Differences between two groups were assessed using the Wilcoxon rank-sum test, and differences among multiple groups using the Kruskal–Wallis test. For survival analyses, p-values from univariate Cox proportional hazards regression were adjusted for multiple testing using the Benjamini–Hochberg (BH) method. Likewise, p-values from the ssGSEA and drug sensitivity analyses were corrected with the BH procedure. In the pan-cancer analyses of the six prognostic genes (differential expression, KM analysis, univariate Cox regression, ESTIMATE scores, RNAss, and DNAss), multiple testing correction was consistently performed using the BH method. These results are presented in Supplementary Tables S2–S9. A p-value < 0.05 was considered statistically significant.
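The BH adjustment multiplies each p-value by m / rank (m = number of tests, rank in ascending order) and then enforces monotonicity by taking a running minimum from the largest p-value downward, capped at 1. The following sketch mirrors what R's p.adjust(p, method = "BH") returns; the function name is hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)  # largest first
    adjusted = [0.0] * m
    running_min = 1.0
    for pos, i in enumerate(order):
        rank = m - pos                         # 1-based rank in ascending order
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

adj = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
```

In this example, 0.04 and 0.03 both become about 0.053, so neither stays below 0.05 after correction, whereas 0.01 (adjusted to 0.04) does.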

Results

Basic analysis results of scRNA-seq data

Figure 1 presents the technical roadmap of this study. We first obtained scRNA-seq data for BLCA and control samples from three datasets (GSE129845, GSE135337, and GSE267718). After quality control of each dataset, cells from the three sources were merged (Supplementary Figure S2), and further filtering retained 83,370 cells. Batch effects were then removed using the R package “harmony”; Fig. 2A and B show the distribution of cells in two-dimensional space before and after batch effect removal. After cell annotation, nine cell types were observed in the t-SNE plot: smooth muscle cells, epithelial cells, monocytes, endothelial cells, T cells, tissue stem cells, neutrophils, B cells, and neurons (Fig. 2C). The distribution of neutrophils in the control and BLCA groups is shown separately in Fig. 2D. We then calculated the proportions of the different cell types in BLCA and control samples (Fig. 2E, F). Cell type distribution varied considerably across individual samples (Fig. 2E). In the aggregate comparison, the cellular composition of BLCA samples shifted notably relative to the control group, with a lower relative proportion of smooth muscle cells and higher proportions of other immune and stromal cell types (Fig. 2F and Figure S1).

Fig. 1.


Technical roadmap for this paper

Fig. 2.


Landscape of the single-cell transcriptome in BLCA and control samples. A, B t-SNE visualizations of single-cell transcriptomic data before (A) and after (B) batch effect correction using the Harmony algorithm. Different colors represent distinct samples, demonstrating the effective integration of datasets from different sources. C The t-SNE projection of all 83,370 cells, annotated with nine major cell types. Each color corresponds to a specific cell type. D t-SNE plots specifically highlighting the distribution of neutrophils in the Control group (left) and BLCA group (right). E Stacked bar chart showing the relative proportion of cell types across individual samples. F Bar chart comparing the cellular composition between the aggregated Control and BLCA groups

Identification results of neutrophil-related modules in BLCA

To explore neutrophil-related gene modules and their roles in BLCA development, we performed hdWGCNA. With the soft-thresholding power set to 8 (Fig. 3A), eight co-expression modules were identified. The hierarchical clustering of genes based on their expression patterns is shown as a dendrogram, with module assignments indicated below the tree (Fig. 3B); the grey module contains genes that were not assigned to any co-expression module. We then visualized the top 10 hub genes in each of the eight modules (Fig. 3C) and plotted the correlations between module eigengenes (Fig. 3D). Modules 1, 2, and 4 were most strongly associated with neutrophils (Fig. 3E, F). Finally, we constructed interaction networks of the hub genes in each of these three modules (Fig. 3G–I).
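
The soft-threshold choice in Fig. 3A follows the usual WGCNA logic: gene-gene correlations are raised to a power β so that the network approximately follows a scale-free degree distribution, and the smallest β with an acceptable fit is kept. The sketch below illustrates that calculation in NumPy on a plain expression matrix. It is not the hdWGCNA implementation (which works on metacells and has its own defaults); the signed adjacency ((1 + r)/2)^β, the 10 connectivity bins, and the 0.8 cutoff are assumptions chosen for illustration.

```python
import numpy as np


def scale_free_fit(expr, powers, signed=True, n_bins=10):
    """Scale-free topology fit for candidate soft-thresholding powers.

    expr: samples (or metacells) x genes matrix.
    Returns a list of (power, signed_r2, mean_connectivity) tuples.
    """
    corr = np.corrcoef(np.asarray(expr, dtype=float), rowvar=False)
    base = (1.0 + corr) / 2.0 if signed else np.abs(corr)
    np.fill_diagonal(base, 0.0)                    # drop self-connections
    results = []
    for beta in powers:
        adj = base ** beta                         # soft-thresholded adjacency
        k = adj.sum(axis=1)                        # connectivity of each gene
        counts, edges = np.histogram(k, bins=n_bins)
        mids = (edges[:-1] + edges[1:]) / 2.0
        keep = (counts > 0) & (mids > 0)
        log_k = np.log10(mids[keep])
        log_pk = np.log10(counts[keep] / counts.sum())
        slope, intercept = np.polyfit(log_k, log_pk, 1)   # log p(k) vs log k
        resid = log_pk - (slope * log_k + intercept)
        ss_tot = np.sum((log_pk - log_pk.mean()) ** 2)
        r2 = 1.0 - np.sum(resid ** 2) / ss_tot if ss_tot > 0 else 0.0
        # WGCNA-style signed R^2: positive only when p(k) decreases with k
        results.append((beta, -np.sign(slope) * r2, float(k.mean())))
    return results


def pick_power(results, r2_cutoff=0.8):
    """Smallest power whose signed R^2 reaches the cutoff (None if none does)."""
    for beta, r2, _ in results:
        if r2 >= r2_cutoff:
            return beta
    return None
```

Mean connectivity necessarily falls as β grows, so the choice trades scale-free fit against keeping enough connectivity for module detection; the 0.8 cutoff is a convention rather than a rule.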

Fig. 3.

Identification of neutrophil-specific gene modules in BLCA using the hdWGCNA algorithm. A Analysis of network topology for various soft-thresholding powers. The left panel displays the scale-free fit index, and the right panel shows the mean connectivity. The red line indicates the selected soft-threshold power. B Gene dendrogram obtained by hierarchical clustering of expression patterns. The colored row underneath the tree indicates the module assignment for each gene. C Bar plots displaying the top 10 hub genes ranked by their intramodular connectivity within neutrophils. The colors of the bars correspond to the distinct co-expression modules identified in B. D Eigengene adjacency heatmap showing the correlation between different gene modules. Red represents high positive correlation, while blue represents low correlation. E Dot plot visualizing the expression of module-specific signatures in neutrophils. The size of the dot represents the percentage of cells expressing the genes, and the color intensity indicates the average expression level. F Violin plots showing the distribution of module eigengene scores for the top 25 hub genes in the three modules most strongly associated with neutrophils. G–I Circular interaction networks of the top hub genes for the three selected neutrophil-associated modules. Nodes represent genes, and the edges indicate co-expression relationships

Results of cell communication and TF analysis

Cell communication analysis identified communication pathways between neutrophils and other cell populations. We first quantified the number and strength of intercellular interactions among the different cell types (Fig. 4A, B). We then evaluated the significant ligand-receptor pairs with neutrophils acting as senders (ligand-expressing) and as receivers (receptor-expressing) (Fig. 4C, D), as well as their dominant outgoing and incoming signaling pathways (Fig. 4E, F), which included Annexin, Calcitonin Receptor (CALCR), Chemokine (C-X-C motif) Ligand (CXCL), Macrophage Migration Inhibitory Factor (MIF), and Vascular Endothelial Growth Factor (VEGF). The expression of key genes in these pathways across cell types is shown in Fig. 4G–K; their roles in BLCA development are examined in the Discussion.
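
CellChat scores each ligand-receptor pair with a mass-action style probability and assesses significance by permuting cell-group labels. The following is a deliberately simplified sketch of that idea, not CellChat's implementation. It assumes expression already scaled to [0, 1], plain group means instead of the trimean-based averaging CellChat uses by default, one ligand and one receptor per pair, a half-saturation constant K_h = 0.5 chosen here, and no agonist, antagonist, or co-receptor terms. The function names are hypothetical.

```python
import numpy as np


def comm_prob(lig, rec, labels, sender, receiver, kh=0.5):
    """Hill-type probability that `sender` cells signal to `receiver` cells.

    lig, rec: per-cell expression of one ligand and its receptor, scaled to [0, 1].
    labels: per-cell group labels (NumPy array).
    """
    l_mean = lig[labels == sender].mean()     # average ligand level in senders
    r_mean = rec[labels == receiver].mean()   # average receptor level in receivers
    lr = l_mean * r_mean
    return lr / (kh + lr)


def permutation_pvalue(lig, rec, labels, sender, receiver, n_perm=1000, seed=0):
    """Compare the observed probability with a null built by shuffling group labels."""
    rng = np.random.default_rng(seed)
    observed = comm_prob(lig, rec, labels, sender, receiver)
    null = np.empty(n_perm)
    for b in range(n_perm):
        shuffled = rng.permutation(labels)    # breaks the cell-to-group assignment
        null[b] = comm_prob(lig, rec, shuffled, sender, receiver)
    # add-one correction keeps the p-value strictly positive
    p_value = (1 + np.sum(null >= observed)) / (n_perm + 1)
    return observed, p_value
```

Shuffling labels keeps group sizes fixed, so a small p-value means the ligand and receptor are concentrated in the specific sender and receiver groups rather than spread across all cells.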

Fig. 4.

Intercellular communication networks between neutrophils and the BLCA tumor microenvironment inferred by CellChat. A, B Circular chord diagrams illustrating the number of interactions (A) and interaction strength (B) among nine cell types. The width of the connecting lines is proportional to the number or strength of ligand-receptor interactions between cell pairs. C, D Bubble plots identifying significant ligand-receptor pairs where neutrophils act as the sender (source) targeting other cells (C), and where neutrophils act as the receiver from other cells (D). The dot size represents the significance (p-value), and the color gradient indicates the communication probability. E, F Heatmaps displaying the dominant signaling patterns across cell groups. E shows the outgoing signaling patterns, while F shows the incoming signaling patterns. Darker colors indicate a higher contribution (relative strength) of a cell type to a specific signaling pathway. G–K Violin plots validating the expression levels of key signaling genes involved in the ANNEXIN (G), CALCR (H), CXCL (I), MIF (J), and VEGF (K) pathways across different cell types. The Y-axis represents the log-transformed gene expression level, and colors correspond to different cell types

Additionally, to assess changes in TF activity between the BLCA and control groups, we inferred TF activity (VIPER scores) using DoRothEA regulons (Fig. 5A). Neutrophils in BLCA tissues showed decreased TF activity, which may reflect the combined effects of multiple pathways. Functional enrichment of the differential TFs (Fig. 5B–D) highlighted miRNA regulation, cell fate determination, transcription regulatory complexes, RNA polymerase II-associated processes, transcriptional repressor complexes, and DNA-binding TF activity. In the tumor microenvironment, dysregulation of these pathways may lead to neutrophil dysfunction, impairing their antitumor capability and promoting tumor growth and immune evasion.
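
DoRothEA provides regulons (TF-target pairs with a mode of regulation), and VIPER turns target expression into an activity score. As a simplified illustration of regulon-based activity inference, the sketch below scores each TF as the mean of its targets' z-scores, signed by the mode of regulation. This weighted-mean scheme is an assumption for illustration, not VIPER's analytic rank-based enrichment, and the toy regulon is hypothetical.

```python
import numpy as np


def tf_activity(expr, genes, regulon):
    """Signed mean target z-score per TF and sample.

    expr: samples x genes matrix; genes: column names of expr.
    regulon: dict TF -> list of (target, mode) with mode +1 (activation) or -1 (repression).
    Returns dict TF -> per-sample activity; TFs with no measured targets are skipped.
    """
    expr = np.asarray(expr, dtype=float)
    sd = expr.std(axis=0)
    sd[sd == 0] = 1.0                              # constant genes contribute z = 0
    z = (expr - expr.mean(axis=0)) / sd            # z-score each gene across samples
    col = {g: i for i, g in enumerate(genes)}
    scores = {}
    for tf, targets in regulon.items():
        pairs = [(col[t], mode) for t, mode in targets if t in col]
        if not pairs:
            continue
        idx = np.array([i for i, _ in pairs])
        mode = np.array([m for _, m in pairs], dtype=float)
        # activated targets count positively, repressed targets negatively
        scores[tf] = z[:, idx] @ mode / len(pairs)
    return scores
```

A TF whose activated targets rise and repressed targets fall across samples therefore receives increasing scores, which is the pattern the heatmap in Fig. 5A summarizes.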

Fig. 5.

TF activity estimation and functional enrichment analysis in BLCA neutrophils. A Heatmap displaying the inferred activity of the top variable TFs in neutrophils from Normal and BLCA samples. The activity scores were calculated using DoRothEA regulons. Red indicates high TF activity, while blue indicates low activity. B Bar graph showing the top enriched ontology terms and pathways for the differential TFs identified by Metascape. The length of the bar represents the statistical significance (-log10 p-value). C Dot plot illustrating the GO and KEGG pathway enrichment results. The size of the dot corresponds to the gene count, and the color gradient (from blue to red) represents the adjusted p-value. D Network visualization of the enriched terms generated by Metascape. Each node represents an enriched term, and nodes are colored by their cluster identity. Edges link terms with a similarity score > 0.3

Prognostic gene selection in BLCA

We first performed differential analyses of RNA-seq and DNA methylation data between the control and BLCA groups in the TCGA-BLCA cohort (Fig. 6A, B). For RNA-seq data, we retained 1098 DEGs with |logFC| > 0.5 and padj < 0.05. For DNA methylation data, we identified 7008 differentially methylated sites using |logFC| > 0.4 and padj < 0.001. Intersecting the genes in the neutrophil-associated modules identified above with the DEGs yielded 130 genes (Fig. 6C). Metascape enrichment analysis of these genes identified multiple pathways associated with BLCA (Fig. 6D), including response to hypoxia and negative regulation of intracellular signal transduction, which are known to play important roles in tumor growth and drug resistance. Hypoxia responses may enhance the invasiveness and drug resistance of bladder cancer cells under low-oxygen conditions, whereas negative regulation of intracellular signal transduction could influence cancer cell proliferation, differentiation, and survival, potentially restraining cancer progression. These pathways are considered further in the Discussion.
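
The DEG filter and the intersection with neutrophil module genes amount to a threshold on adjusted p-values and |logFC|, followed by a set intersection. A minimal sketch, assuming the padj values are Benjamini-Hochberg adjusted (the section does not name the differential expression tool or adjustment method) and using hypothetical gene names:

```python
import numpy as np


def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values (same convention as R's p.adjust 'BH')."""
    p = np.asarray(pvals, dtype=float)
    n = p.size
    order = np.argsort(p)
    scaled = p[order] * n / np.arange(1, n + 1)
    # enforce monotonicity, working down from the largest p-value
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.minimum(scaled, 1.0)
    return adjusted


def select_degs(genes, logfc, pvals, lfc_cut=0.5, padj_cut=0.05):
    """Genes passing |logFC| > lfc_cut and BH-adjusted p < padj_cut."""
    padj = bh_adjust(pvals)
    keep = (np.abs(np.asarray(logfc, dtype=float)) > lfc_cut) & (padj < padj_cut)
    return {g for g, k in zip(genes, keep) if k}


# Hypothetical example: intersect DEGs with neutrophil-module genes
degs = select_degs(["A", "B", "C", "D"], [1.0, -0.8, 0.2, 2.0], [0.01, 0.04, 0.03, 0.5])
module_genes = {"A", "C", "E"}
shared = degs & module_genes
```

Gene B illustrates why the adjustment matters: its raw p-value (0.04) passes 0.05, but its adjusted value (about 0.053) does not.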

Fig. 6.

Identification and genetic characterization of neutrophil-related prognostic genes in BLCA. A, B Volcano plots illustrating the differential analysis results for (A) gene expression and (B) DNA methylation in the TCGA-BLCA cohort. Red dots indicate significantly upregulated genes/sites, blue dots indicate downregulated ones, and grey dots represent non-significant changes. The thresholds for significance were |logFC| > 0.5 for RNA-seq and |logFC| > 0.4 for methylation, with adjusted p-values < 0.05 and < 0.001, respectively. C Venn diagram showing the 130 genes shared between neutrophil-associated module genes and DEGs. D Network visualization of significantly enriched pathways for the intersecting genes identified by Metascape. E Circos plot summarizing the prognostic value and interactions of the candidate genes. The outer ring represents the gene expression pattern (Red: Upregulated; Blue: Downregulated). The inner ring indicates the prognostic risk type based on univariate Cox regression (Yellow: Risk factor; Purple: Favorable factor). The connecting lines in the center depict the correlation between gene expression levels. F Bar chart showing the CNV frequency of prognostic genes. Red bars represent copy number amplification (Gain), and blue bars represent deletion (Loss). G Summary of the somatic mutation profile for the prognostic genes, including variant classification, variant type, SNV class, and the top 10 mutated genes. H Waterfall plot depicting the somatic mutation landscape of the 25 prognostic genes in BLCA patients. Each column represents a patient, and different colors indicate distinct mutation types

Figure 6E depicts the interaction and expression-correlation network of genes significantly associated with the prognosis of BLCA patients. Univariate Cox regression (Figure S2B) and KM analyses of the intersecting genes identified 25 genes significantly associated with OS (p < 0.05), which formed the basis of the subsequent prognostic model. We further evaluated the expression differences of these genes between the BLCA and control groups (Figure S2A); most of them showed frequent copy number gains, which may contribute to their overexpression (Fig. 6F). Finally, we summarized and ranked the mutation frequencies of these 25 genes using the maftools package (Fig. 6G, H); FLNA showed the highest mutation frequency.
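
Univariate Cox screening fits one proportional hazards model per gene and keeps genes whose Wald p-value is below 0.05. The NumPy sketch below fits a single-covariate Cox model by Newton-Raphson on the partial likelihood, with Breslow handling of tied event times. It is an illustrative implementation written for this note, not the code used in the study.

```python
import math
import numpy as np


def _score_and_information(beta, x, time, event):
    """Score and observed information of the Cox partial likelihood (one covariate)."""
    w = np.exp(beta * x)
    score, info = 0.0, 0.0
    for i in np.flatnonzero(event):
        at_risk = time >= time[i]             # risk set at this event time
        s0 = w[at_risk].sum()
        s1 = (w[at_risk] * x[at_risk]).sum()
        s2 = (w[at_risk] * x[at_risk] ** 2).sum()
        mean_x = s1 / s0
        score += x[i] - mean_x
        info += s2 / s0 - mean_x ** 2
    return score, info


def univariate_cox(x, time, event, max_iter=50, tol=1e-8):
    """Return (beta, hazard_ratio, two-sided Wald p-value) for a single covariate."""
    x = np.asarray(x, dtype=float)
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    beta = 0.0
    for _ in range(max_iter):
        score, info = _score_and_information(beta, x, time, event)
        step = score / info                   # Newton-Raphson update
        beta += step
        if abs(step) < tol:
            break
    _, info = _score_and_information(beta, x, time, event)   # information at the MLE
    se = 1.0 / math.sqrt(info)
    z = beta / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return beta, math.exp(beta), p_value
```

Screening then loops over genes and keeps those with p < 0.05; a hazard ratio above 1 marks a risk factor and below 1 a favorable factor, matching the yellow/purple coding in Fig. 6E.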

Identification of BLCA subtypes and their immune landscape

This study proposes the MDIDJSD framework, which integrates BLCA transcriptomic and methylation data by combining the DAE and JDSNMF algorithms. Specifically, the normalized data from both sources are merged, and latent features are first extracted from the bottleneck layer of the DAE. JDSNMF then further decomposes the latent feature matrix, and agglomerative clustering is applied to the resulting base matrix to obtain the final cluster assignments. We set the number of clusters between 2 and 4 (Figure S3A–C), introduced multiple clustering algorithms as baselines, and evaluated performance using three clustering metrics: the silhouette score, the Calinski-Harabasz score, and the Davies-Bouldin score. Based on these three metrics, we identified two BLCA subtypes (Fig. 7A and Supplementary Table S1). GSVA of the two subtypes revealed that subtype A was highly active in multiple signaling, immune response, and cell–cell interaction pathways, suggesting potentially stronger signaling and immune response capabilities (Fig. 7B). In contrast, these pathways showed lower activity in subtype B. We further compared the immune landscape of the two subtypes: most immune checkpoint genes and all HLA genes were differentially expressed between subtypes A and B (Fig. 7C, D). ssGSEA was used to assess differences in immune cell infiltration abundance and immune function scores between the subtypes (Fig. 7E); all immune cells and functions except Type II IFN Response differed significantly.
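
JDSNMF builds on semi-nonnegative matrix factorization, in which a data matrix X is approximated as F Gᵀ with only G constrained to be non-negative, so mixed-sign inputs such as autoencoder features can still be factorized. The sketch below shows a single-layer semi-NMF with the multiplicative updates of Ding, Li and Jordan; the joint, multi-layer (deep) variant used in MDIDJSD stacks several such factorizations and shares factors across omics, which is not reproduced here. Treating X as features x samples is an assumption, and assigning samples by the largest entry of G is a simplification: the text applies agglomerative clustering to the base matrix instead.

```python
import numpy as np


def _pos(a):
    return (np.abs(a) + a) / 2.0


def _neg(a):
    return (np.abs(a) - a) / 2.0


def semi_nmf(X, k, n_iter=200, seed=0, eps=1e-9):
    """Single-layer semi-NMF: X (features x samples) ~ F @ G.T with G >= 0.

    Returns the basis F, non-negative loadings G, the residual history, and
    a simple argmax cluster label per sample.
    """
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    G = rng.random((X.shape[1], k)) + 0.1        # strictly positive start
    errors = []
    for _ in range(n_iter):
        F = X @ G @ np.linalg.pinv(G.T @ G)      # least-squares basis for fixed G
        XtF = X.T @ F
        FtF = F.T @ F
        numerator = _pos(XtF) + G @ _neg(FtF)
        denominator = _neg(XtF) + G @ _pos(FtF)
        G *= np.sqrt(numerator / (denominator + eps))   # keeps G non-negative
        errors.append(float(np.linalg.norm(X - F @ G.T) ** 2))
    labels = G.argmax(axis=1)
    return F, G, errors, labels
```

Because F is solved exactly for fixed G and the G update does not increase the residual for fixed F, the recorded reconstruction error is non-increasing across iterations.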

Fig. 7.

Identification of BLCA molecular subtypes and characterization of their immune landscape. A PCA plot visualizing the distribution of BLCA samples in the two-dimensional space defined by the MDIDJSD-derived latent features. Samples are color-coded by their subtype assignment (Subtype A vs. Subtype B). B Heatmap illustrating the GSVA results for the two identified subtypes. The rows represent KEGG pathways, and the color gradient from blue to red indicates the relative pathway activity. C, D Box plots comparing the expression levels of (C) immune checkpoint-related genes and (D) HLA family genes between Subtype A (red) and Subtype B (blue). Statistical significance was determined using the Wilcoxon test (* p < 0.05, ** p < 0.01, *** p < 0.001). E Box plots displaying the differences in immune cell infiltration abundance and immune function scores between the two subtypes, calculated using the ssGSEA algorithm. Subtype A is represented in red and Subtype B in blue

To further evaluate the performance of the MDIDJSD framework, we compared it with K-means, spectral clustering, agglomerative clustering, and Gaussian mixture clustering applied to gene expression data, DNA methylation data, and their merged representation (Tables S10–S21). MDIDJSD consistently achieved better clustering performance than the competing methods for 2 to 4 clusters. In addition, ablation experiments were conducted to evaluate the contribution of each module of the framework: we compared the clustering performance obtained on the merged data using DAE alone or JDSNMF alone with three clusters (Tables S19–S20). The combined DAE–JDSNMF framework consistently outperformed either component used in isolation, highlighting its advantage for multi-omics integration.

Furthermore, we assessed the stability of the MDIDJSD-derived subtypes using a resampling strategy: in each of 100 iterations, 80% of the samples were randomly drawn without replacement and the complete MDIDJSD pipeline was rerun. Clustering stability was quantified using the proportion of ambiguous clustering (PAC) computed from the resulting consensus matrix and the Adjusted Rand Index (ARI) between the subsample-derived labels and the full-sample labels (Table S21). The consensus matrix exhibited a clear block-diagonal structure, and MDIDJSD achieved a low PAC value and a high average ARI, indicating robust and reproducible subtype assignments. In contrast, the baseline clustering methods showed higher ambiguity and lower label consistency under the same resampling scheme. These results indicate that MDIDJSD not only outperforms the baseline clustering approaches in clustering quality but also yields stable and reliable BLCA subtypes.
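
The stability statistics can be made concrete. The consensus matrix records, for each pair of samples, how often they were assigned to the same cluster among the runs in which both were sampled; PAC is the fraction of sample pairs with intermediate consensus; and ARI measures chance-corrected agreement between two labelings. A NumPy sketch, assuming the conventional PAC window (0.1, 0.9), which the section does not specify:

```python
from math import comb

import numpy as np


def adjusted_rand_index(a, b):
    """Chance-corrected agreement between two labelings of the same samples."""
    _, ai = np.unique(np.asarray(a), return_inverse=True)
    _, bi = np.unique(np.asarray(b), return_inverse=True)
    table = np.zeros((ai.max() + 1, bi.max() + 1), dtype=int)
    np.add.at(table, (ai, bi), 1)                 # contingency table
    pairs_cells = sum(comb(int(v), 2) for v in table.ravel())
    pairs_rows = sum(comb(int(v), 2) for v in table.sum(axis=1))
    pairs_cols = sum(comb(int(v), 2) for v in table.sum(axis=0))
    expected = pairs_rows * pairs_cols / comb(len(ai), 2)
    maximum = (pairs_rows + pairs_cols) / 2
    if maximum == expected:                       # degenerate case, e.g. one cluster each
        return 1.0
    return (pairs_cells - expected) / (maximum - expected)


def consensus_matrix(n_samples, runs):
    """runs: list of (sample_indices, labels) from each resampling iteration."""
    together = np.zeros((n_samples, n_samples))
    sampled = np.zeros((n_samples, n_samples))
    for idx, lab in runs:
        idx, lab = np.asarray(idx), np.asarray(lab)
        same = (lab[:, None] == lab[None, :]).astype(float)
        together[np.ix_(idx, idx)] += same
        sampled[np.ix_(idx, idx)] += 1.0
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(sampled > 0, together / sampled, np.nan)


def pac(cm, lower=0.1, upper=0.9):
    """Fraction of sample pairs whose consensus falls strictly inside (lower, upper)."""
    vals = cm[np.triu_indices_from(cm, k=1)]
    vals = vals[~np.isnan(vals)]
    return float(np.mean((vals > lower) & (vals < upper)))
```

A perfectly stable method produces a consensus matrix of zeros and ones (PAC = 0), which is what a clean block-diagonal structure looks like; ARI is invariant to how cluster labels are numbered.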

Risk model construction and immune landscape of BLCA

Based on the expression profiles of the 25 prognostic genes, we developed a six-gene signature using machine learning. Specifically, we evaluated 101 combinations of machine learning algorithms by computing the C-index in both the TCGA-BLCA and GSE13507 datasets (Fig. 8A). The optimal model, combining Lasso and StepCox [both], achieved the highest average C-index (0.6503) across the two cohorts. The resulting risk model comprised six genes: TOR1AIP2, TCIRG1, SLC2A3, SLC9A3R1, MAFG, and DEDD2. When all BLCA samples were stratified into high- and low-expression groups based on the median expression of each of these genes, survival differed significantly between the groups for all six genes (Figure S2C–H).
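
Model selection in Fig. 8A ranks algorithm combinations by the concordance index. For reference, a direct O(n²) sketch of Harrell's C for right-censored data (ties in predicted risk count as half-concordant; pairs with tied survival times are skipped for simplicity). It illustrates the metric and is not the specific implementation used in the study.

```python
import numpy as np


def harrell_c_index(time, event, risk):
    """Fraction of comparable pairs in which the higher-risk patient fails first."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    risk = np.asarray(risk, dtype=float)
    concordant, comparable = 0.0, 0
    for i in range(len(time)):
        if not event[i]:
            continue          # a pair is usable only if the earlier time is an observed event
        for j in range(len(time)):
            if time[i] < time[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / comparable
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking; the average C-index reported above would be the mean of this quantity over the TCGA-BLCA and GSE13507 cohorts.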

Fig. 8.

Construction and validation of a machine learning-based prognostic risk model and clinical nomogram. A Heatmap displaying the C-indices of prognostic models constructed using 101 combinations of machine learning algorithms. The models were evaluated in both the TCGA-BLCA (training) and GSE13507 (testing) cohorts. The color gradient represents the C-index value, with the Lasso + StepCox [both] combination achieving the highest average performance. B, D Kaplan–Meier survival curves for the (B) TCGA-BLCA and (D) GSE13507 cohorts. Patients were stratified into high-risk (Red) and low-risk (Blue) groups based on the median risk score. P-values were calculated using the log-rank test. C, E Time-dependent ROC curves evaluating the predictive accuracy of the risk model for 1-, 3-, and 5-year overall survival (OS) in the (C) TCGA-BLCA and (E) GSE13507 cohorts. Different colors correspond to different time points (Blue: 1-year; Red: 3-year; Yellow: 5-year). F A prognostic nomogram integrating the expression levels of six key genes (TOR1AIP2, TCIRG1, SLC2A3, SLC9A3R1, MAFG, DEDD2) and the risk group status to predict 1-, 3-, and 5-year survival probability. G Calibration curves assessing the agreement between nomogram-predicted (X-axis) and observed (Y-axis) survival probabilities. The gray dashed diagonal line represents the ideal prediction, while the colored solid lines represent the actual performance at 1, 3, and 5 years. H ROC curves showing the discrimination ability of the nomogram for predicting 1-, 3-, and 5-year survival

The risk model assigns a risk score to each BLCA sample, and samples were divided into high- and low-risk groups based on the median score. Survival outcomes differed significantly between these two groups in both the TCGA-BLCA training set (Fig. 8B, C) and the GSE13507 testing set (Fig. 8D, E). In the training set, the areas under the ROC curve (AUCs) for predicting 1-, 3-, and 5-year survival were 0.701, 0.701, and 0.706, respectively; in the testing set, they were 0.742, 0.691, and 0.633, respectively. Finally, a nomogram for BLCA was constructed based on the expression levels of the six genes and the risk score (Fig. 8F, G), which achieved AUCs of 0.698, 0.697, and 0.700 for predicting 1-, 3-, and 5-year survival, respectively (Fig. 8H).
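
A Cox-based signature is typically scored as a linear predictor, risk = Σ coefᵢ × exprᵢ, split at the cohort median, and the resulting groups compared with the log-rank test. The sketch assumes that form; the six coefficients of the study's model are not given in this section, so any values passed in are hypothetical.

```python
import math
import numpy as np


def risk_scores(expr, coefs):
    """Linear predictor: expr is samples x genes, coefs are per-gene Cox coefficients."""
    return np.asarray(expr, dtype=float) @ np.asarray(coefs, dtype=float)


def median_split(scores):
    """'high' if a score exceeds the cohort median, otherwise 'low'."""
    scores = np.asarray(scores, dtype=float)
    return np.where(scores > np.median(scores), "high", "low")


def logrank_test(time, event, in_group):
    """Two-group log-rank test; returns (chi-square statistic, p-value with 1 df)."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    in_group = np.asarray(in_group, dtype=bool)
    observed = expected = variance = 0.0
    for t in np.unique(time[event]):          # distinct event times
        at_risk = time >= t
        n = at_risk.sum()
        n1 = (at_risk & in_group).sum()
        died = (time == t) & event
        d = died.sum()
        observed += (died & in_group).sum()
        expected += d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    chi2 = (observed - expected) ** 2 / variance
    return chi2, math.erfc(math.sqrt(chi2 / 2.0))   # chi-square(1) upper tail
```

Usage: `groups = median_split(risk_scores(expr, coefs))`, then `logrank_test(time, event, groups == "high")`. Applying the training-set coefficients unchanged to GSE13507 is what makes that cohort an external test.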

We further evaluated the immune landscape of the high- and low-risk groups. ESTIMATE analysis revealed significant differences between the groups in immune score, stromal score, ESTIMATE score, and tumor purity (Fig. 9A–D). We then compared immune cell infiltration abundance and immune function scores between the two groups (Fig. 9E) and examined the correlations between the six prognostic genes and immune cells/functions (Fig. 9F–K).
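
The immune cell and immune function scores in Fig. 9E are ssGSEA enrichment scores. The sketch below computes an unnormalized single-sample score in the spirit of the original ssGSEA method: genes are ranked by expression within the sample, and the score sums the difference between a rank-weighted running sum over gene-set members (weight exponent α = 0.25) and a uniform running sum over non-members. Packages such as GSVA additionally normalize scores across samples, which is omitted here; gene names are hypothetical.

```python
import numpy as np


def ssgsea_score(expr, genes, gene_set, alpha=0.25):
    """Unnormalised single-sample enrichment score for one sample.

    expr: expression values of one sample; genes: matching gene names;
    gene_set: set of member gene names (must contain some but not all genes).
    """
    expr = np.asarray(expr, dtype=float)
    n = expr.size
    order = np.argsort(-expr, kind="stable")           # most highly expressed first
    in_set = np.array([genes[i] in gene_set for i in order])
    weights = (n - np.arange(n)) ** alpha               # rank weight: top gene gets n
    hit = np.where(in_set, weights, 0.0)
    p_hit = np.cumsum(hit) / hit.sum()                  # weighted running sum over members
    p_miss = np.cumsum(~in_set) / np.sum(~in_set)       # uniform running sum over non-members
    return float(np.sum(p_hit - p_miss))
```

A gene set whose members sit near the top of a sample's ranking scores positive, and one concentrated at the bottom scores negative; comparing these per-sample scores between risk groups gives the box plots in Fig. 9E.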

Fig. 9.

Characterization of the immune microenvironment landscape in high- and low-risk BLCA patients. A–D Violin plots comparing the (A) ImmuneScore, (B) StromalScore, (C) ESTIMATEScore, and (D) TumorPurity between the high-risk (red) and low-risk (blue) groups calculated by the ESTIMATE algorithm. Statistical significance was determined using the Wilcoxon test. E Box plots showing the differences in the abundance of 29 immune-related gene sets (immune cells and functions) between the two risk groups derived from ssGSEA analysis. Red and blue boxes represent the high- and low-risk groups, respectively. Statistical significance is marked by asterisks. F–K Lollipop charts visualizing the Spearman correlation between the infiltration levels of immune cells and the expression of the six prognostic genes: (F) DEDD2, (G) MAFG, (H) SLC2A3, (I) SLC9A3R1, (J) TCIRG1, and (K) TOR1AIP2. The horizontal axis represents the correlation coefficient (r). The size of the dots corresponds to the absolute correlation value, and the color gradient represents the p-value significance. * p < 0.05, ** p < 0.01, *** p < 0.001

TMB landscape in high- and low-risk groups

We performed TMB analysis on samples from the high- and low-risk groups to explore their TMB levels and the relationship between mutated genes and prognosis. BLCA patients in the high-risk group exhibited a higher mutation frequency than those in the low-risk group (Fig. 10A, B). Groups defined by TMB also showed significant differences in prognosis (Fig. 10C), and combining TMB status with the risk group revealed significant prognostic differences among the four resulting subgroups (Fig. 10D). Finally, to identify drugs potentially suitable for stratified treatment of BLCA, we identified drugs with significantly different predicted sensitivities between the high- and low-risk groups (Fig. 10E–K). Their potential therapeutic implications are examined in the Discussion.
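
TMB is usually reported as the number of non-synonymous somatic mutations per megabase of sequenced territory. A minimal sketch, assuming MAF-style variant classifications and an exome size of 38 Mb, a commonly used whole-exome estimate; the capture size used in the study is not stated, so treat it as a placeholder.

```python
from collections import Counter

# MAF variant classifications counted as non-synonymous (an assumed, commonly used set)
NONSYNONYMOUS = {
    "Missense_Mutation", "Nonsense_Mutation", "Frame_Shift_Del", "Frame_Shift_Ins",
    "In_Frame_Del", "In_Frame_Ins", "Splice_Site", "Translation_Start_Site",
    "Nonstop_Mutation",
}


def tmb_per_sample(maf_records, samples, exome_mb=38.0):
    """maf_records: iterable of (sample_barcode, variant_classification) pairs.

    Returns mutations per megabase for every sample in `samples`.
    """
    counts = Counter(s for s, vc in maf_records if vc in NONSYNONYMOUS)
    # samples with no qualifying mutation get TMB 0 rather than being dropped
    return {s: counts.get(s, 0) / exome_mb for s in samples}
```

High- and low-TMB groups (Fig. 10C) can then be formed by a cutoff such as the median TMB, and crossed with the risk groups to obtain the four subgroups in Fig. 10D.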

Fig. 10.

Genetic mutation landscape, TMB prognostic analysis, and drug sensitivity prediction. A, B Waterfall plots visualizing the somatic mutation landscape of the top 20 driver genes in the (A) high-risk and (B) low-risk groups. Each column represents an individual patient, and different colors indicate specific mutation types as shown in the legend. The top bar plot indicates the TMB of each sample. C KM survival curves comparing OS between patients with High-TMB (H-TMB) and Low-TMB (L-TMB). Red and blue lines represent the High and Low TMB groups, respectively. D Stratified Kaplan–Meier survival analysis combining TMB status and risk scores. Patients are divided into four subgroups: High-TMB + High-Risk, High-TMB + Low-Risk, Low-TMB + High-Risk, and Low-TMB + Low-Risk, distinguished by different colors. E–K Box plots showing the predicted IC50 values of chemotherapeutic agents that exhibit significant differences between the two groups. The high-risk group is shown in red and the low-risk group in blue. A lower IC50 value indicates higher drug sensitivity

Expression patterns of prognostic genes in pan-cancer

To explore the potential roles of the prognostic genes in other cancers, we evaluated the expression of the six genes across cancer types (Fig. 11A–F). Their expression differed significantly between tumor and normal tissues in breast cancer (BRCA), cholangiocarcinoma (CHOL), glioblastoma (GBM), head and neck squamous cell carcinoma (HNSC), and uterine corpus endometrial carcinoma (UCEC). Further analysis revealed that all six prognostic genes were highly expressed in cholangiocarcinoma (Fig. 11G). In BLCA, most of the prognostic genes were correlated with one another, although the correlations were modest, the highest being between SLC2A3 and MAFG (corr = 0.26) (Fig. 11H).

Fig. 11.

Pan-cancer expression landscape of the six prognostic genes. A–F Box plots comparing the mRNA expression levels of the six prognostic genes [(A) DEDD2, (B) MAFG, (C) SLC2A3, (D) SLC9A3R1, (E) TCIRG1, and (F) TOR1AIP2] between tumor tissues and adjacent normal tissues across 18 cancer types. Red boxes represent tumor samples, and blue boxes represent normal control samples. G Heatmap summarizing the differential expression profiles (log2 Fold Change) of the six genes across 18 cancer types. Red indicates upregulated expression in tumors, while blue indicates downregulated expression. H Correlation heatmap displaying the pairwise co-expression relationships among the six prognostic genes in BLCA. The color gradient and numbers represent the correlation coefficients (Red: positive correlation; Blue: negative correlation). (* p < 0.05, ** p < 0.01, *** p < 0.001)

Subsequently, the relationship between the six prognostic genes and overall survival across cancers was analyzed using KM analysis; Figure 12A–K presents the results with p < 0.1. For most of the prognostic genes, OS differed significantly between the high- and low-expression groups in cancers such as BLCA, LGG, LIHC, and STAD. These results suggest that the prognostic genes exhibit heterogeneous expression patterns and context-dependent prognostic associations across cancer types. Although the effect sizes vary and are modest in several cancers, these observations suggest that the BLCA-derived prognostic genes may have relevance beyond bladder cancer and warrant further investigation. The univariate Cox regression analysis in Fig. 12L summarizes the relationship between the prognostic genes and prognosis across cancers more clearly.

Fig. 12.

Pan-cancer survival analysis and prognostic significance of the six candidate genes. A–K KM overall survival curves for the prognostic genes (including DEDD2, MAFG, SLC2A3, SLC9A3R1, and TCIRG1) in representative cancer types (e.g., BLCA, LGG, LIHC, STAD, KIRC, MESO, and UVM). Patients were stratified into high-expression (Red) and low-expression (Blue) groups based on the median gene expression levels. P-values were calculated using the log-rank test. L Forest plot summarizing the univariate Cox regression analysis of the six prognostic genes across 33 cancer types. The x-axis displays the Hazard Ratio (HR) on a logarithmic scale. The squares represent the estimated HR, and the horizontal lines indicate the 95% confidence intervals.

Furthermore, we explored the correlation between the prognostic genes and tumor stemness, measured by the RNA expression-based (RNAss) and DNA methylation-based (DNAss) stemness indices (Fig. 13A, B). TCIRG1 and SLC2A3 were negatively correlated with RNAss in most cancers.
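
The stemness correlations in Fig. 13A, B are Spearman correlations, that is, Pearson correlations computed on ranks, with tied values given their average rank. A small NumPy sketch of that definition (illustrative, not the study's code):

```python
import numpy as np


def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    v = np.asarray(values, dtype=float)
    order = np.argsort(v, kind="mergesort")
    sorted_v = v[order]
    ranks = np.empty(v.size)
    i = 0
    while i < v.size:
        j = i
        while j + 1 < v.size and sorted_v[j + 1] == sorted_v[i]:
            j += 1                                   # extend the run of ties
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of average ranks."""
    return float(np.corrcoef(average_ranks(x), average_ranks(y))[0, 1])
```

Because only ranks matter, any monotone relationship between gene expression and a stemness index yields |rho| = 1, even when the relationship is strongly nonlinear.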

Fig. 13.

Pan-cancer associations of prognostic genes with tumor stemness and the immune microenvironment. A, B Heatmaps illustrating the Spearman correlation between the expression of the six prognostic genes and tumor stemness indices: (A) RNAss and (B) DNAss across 33 cancer types. The color gradient represents the correlation coefficient, where red indicates a positive correlation and blue indicates a negative correlation. C–F Heatmaps displaying the correlations between gene expression and tumor microenvironment scores calculated by the ESTIMATE algorithm: (C) ESTIMATEScore, (D) ImmuneScore, (E) StromalScore, and (F) TumorPurity. Red represents a positive correlation, while blue represents a negative correlation. G Distribution of prognostic gene expression levels across six pan-cancer immune subtypes (C1: Wound healing; C2: IFN-γ dominant; C3: Inflammatory; C4: Lymphocyte depleted; C5: Immunologically quiet; C6: TGF-β dominant). Statistical significance was determined using the Kruskal–Wallis test (*** p < 0.001)

Lastly, we investigated the relationship between the six prognostic genes and the immune microenvironment across cancers. The ESTIMATE algorithm was used to compute ESTIMATE scores, immune scores, stromal scores, and tumor purity, which were then correlated with the expression of the prognostic genes in each cancer type (Fig. 13C–F). TCIRG1 and SLC2A3 were positively correlated with ESTIMATE, immune, and stromal scores in most cancers and negatively correlated with tumor purity. In addition, we compared the expression of the prognostic genes across the six pan-cancer immune subtypes (wound healing, IFN-γ dominant, inflammatory, lymphocyte depleted, immunologically quiet, and TGF-β dominant) (Fig. 13G); the expression of all six genes differed significantly among these subtypes.
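
The opposite signs for the ESTIMATE score and tumor purity follow from how ESTIMATE derives purity: it is computed from the ESTIMATE score by the published formula purity = cos(0.6049872018 + 0.0001467884 × ESTIMATE score), which decreases monotonically over the range of observed scores. Because Spearman correlation is invariant to monotone transformations, a gene's rank correlation with purity is then the negative of its correlation with the ESTIMATE score (barring ties). A one-function sketch:

```python
import math


def estimate_tumor_purity(estimate_score):
    """Tumour purity from an ESTIMATE score, using the ESTIMATE authors' formula."""
    return math.cos(0.6049872018 + 0.0001467884 * estimate_score)
```

For example, an ESTIMATE score of 0 corresponds to a purity of about 0.82, and higher immune plus stromal content lowers the estimate.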

Experimental validation of prognostic genes

To validate the expression patterns of the prognostic genes in the control and BLCA groups, we performed RT-qPCR. TOR1AIP2, TCIRG1, SLC9A3R1, and DEDD2 were significantly upregulated in the BLCA group, whereas SLC2A3 and MAFG were higher in the control group (Fig. 14A–F). To confirm these findings at the protein level, we further assessed protein expression by Western blotting (Fig. 15). GAPDH or β-Tubulin was used as the loading control, chosen according to the molecular weight of each target protein, to ensure accurate quantification. As shown in Fig. 15A, B, the protein levels of TOR1AIP2, TCIRG1, SLC9A3R1, and DEDD2 were significantly higher in BLCA tissues, whereas SLC2A3 was downregulated. These trends were largely concordant with the RT-qPCR mRNA results and the bioinformatics analysis (Figure S1A). In contrast, although MAFG was differentially expressed at the mRNA level, its protein level did not differ significantly between the groups.
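
Relative mRNA levels from RT-qPCR are commonly computed with the Livak 2^-ΔΔCt method: each sample's target Ct is normalized to a reference gene, and the result is expressed relative to the control group. The section does not state the quantification method or reference gene, so the sketch below assumes 2^-ΔΔCt with the mean control ΔCt as calibrator.

```python
import numpy as np


def relative_expression(ct_target, ct_ref, ct_target_ctrl, ct_ref_ctrl):
    """2^-ddCt fold change of a target gene, calibrated to the control group.

    ct_target / ct_ref: Ct values of the target and reference gene per sample;
    *_ctrl: the same for control samples (their mean dCt is the calibrator).
    """
    d_ct = np.asarray(ct_target, dtype=float) - np.asarray(ct_ref, dtype=float)
    d_ct_ctrl = np.mean(np.asarray(ct_target_ctrl, dtype=float)
                        - np.asarray(ct_ref_ctrl, dtype=float))
    return 2.0 ** -(d_ct - d_ct_ctrl)
```

A target that crosses threshold one cycle earlier (relative to the reference gene) than in controls yields a fold change of 2, assuming roughly 100% amplification efficiency, which is the method's key assumption.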

Fig. 14.

Experimental results of RT-qPCR. A–F Expression differences of TOR1AIP2, TCIRG1, SLC2A3, SLC9A3R1, MAFG, and DEDD2, respectively, between the control and BLCA groups

Fig. 15.

Experimental validation of prognostic genes by Western Blot. A Representative Western Blot images. B Relative protein expression levels of TOR1AIP2, TCIRG1, SLC2A3, SLC9A3R1, MAFG, and DEDD2 in the control and BLCA groups

Discussion

This study comprehensively investigated the role of neutrophils in the occurrence and progression of BLCA by integrating multi-omics data and machine learning methods. Firstly, we used scRNA-seq data and the hdWGCNA algorithm to identify neutrophil-associated gene modules. By exploring the communication between neutrophils and other cell populations, we identified several significant pathways, including Annexin, CALCR, CXCL, MIF, and VEGF. These findings are consistent with recent studies highlighting the pivotal role of neutrophils in remodeling the BLCA microenvironment. For instance, Jing et al. recently demonstrated that tumor-neutrophil crosstalk, particularly through cytokine signaling, orchestrates the immunosuppressive microenvironment in bladder cancer and limits the efficacy of CD8+ T cells [6]. Our identification of the MIF signaling pathway also aligns with previous findings by Otterbein et al., who reported that MIF promotes bladder cancer progression by enhancing cell proliferation and angiogenesis [27]. The enrichment of VEGF signaling in our neutrophil clusters further supports the notion that tumor-associated neutrophils (TANs) are a key source of angiogenic factors driving tumor vascularization under hypoxic conditions. Furthermore, several genes involved in these pathways (ADM, CXCL1, VEGFA, and KDR) have been reported to be associated with BLCA progression. RNA interference targeting adrenomedullin (ADM) can induce apoptosis in BLCA cells and inhibit their growth [28]. Yu-Chieh Tsai et al. found that selective inhibition of HDAC6 could enhance BLCA radiosensitivity and attenuate radiation-induced CXCL1 signaling [29]. BLCA patients with higher VEGFA expression showed a trend toward shorter cancer-specific survival [30]. Liu et al. identified VEGF and KDR as important markers of BLCA, and their co-expression suggests autocrine VEGF signaling in tumor cells [31].

Secondly, this study explored differences in TF activity between the BLCA and control groups and identified the pathways enriched among the top TFs. Several of these pathways (miRNA regulation, transcriptional repressor complexes, and stem cell differentiation) have been linked to BLCA progression. Fan et al. identified multiple potential miRNA-mRNA regulatory mechanisms in BLCA [32]. Chen et al. discovered a mechanism by which BMI1 activates P-glycoprotein through transcriptional repression of miR-3682-3p, enhancing BLCA cell chemoresistance [33]. Seyung S. Chung et al. found that BLCA cells in co-culture could induce differentiation of human stem cells into urothelial cells via paracrine FGF10 signaling [34].

Thirdly, using the MDIDJSD framework, which combines the DAE and JDSNMF algorithms, we identified reliable BLCA subtypes. These subtypes exhibited significant differences in enriched pathways and immune landscapes, reflecting the complex biological characteristics of BLCA. In parallel, we obtained DEGs between the control and BLCA groups from the TCGA-BLCA cohort, intersected them with the genes in the neutrophil-related modules, and performed univariate Cox regression analysis on the intersecting genes. Metascape analysis identified multiple pathways involving the 25 prognosis-related genes. Pathways such as response to hypoxia, the nuclear receptors meta-pathway, and the apoptotic signaling pathway have been closely linked to BLCA. Su et al. found that hypoxia-induced circELP3 contributes to BLCA progression [35]. Shen et al. reported that lymphotoxin β receptor signaling may promote BLCA through the NF-κB pathway [36]. With respect to apoptotic signaling, loperamide exhibits antitumor activity in various cancers, and Wu et al. found that it induces protective autophagy and apoptosis in BLCA through the ROS/JNK signaling pathway [37].

Fourthly, we constructed a BLCA risk model based on the prognostic genes and analyzed the resulting high- and low-risk groups in depth. The risk model consisted of six genes (TOR1AIP2, TCIRG1, SLC2A3, SLC9A3R1, MAFG, and DEDD2). Of these, TCIRG1 and SLC2A3 have previously been linked to BLCA prognosis. Liu et al. analyzed the ferroptosis pathway in BLCA through multi-omics analysis and identified TCIRG1 as a prognostic marker [38], and Ma et al. identified SLC2A3 as a hub gene and revealed the regulatory effect of SKAP1 on BLCA cell proliferation and apoptosis [39]. To evaluate whether the model provides independent prognostic information, we performed univariate and multivariate Cox regression analyses. In univariate analysis, the risk score was significantly associated with overall survival (HR = 3.611, 95% CI: 2.471–5.276, P < 0.001). Importantly, in multivariate analysis the risk score remained an independent prognostic factor after adjustment for other clinicopathological features, including tumor stage and TNM classification (HR = 3.589, 95% CI: 2.364–5.448, P < 0.001) (Figure S4). Additionally, drug sensitivity differed significantly between the high- and low-risk groups for several agents (KU-55933, RO-3306, and AURKA). Zhang et al. showed that KU-55933 inhibits ATM phosphorylation upon irradiation and sensitizes radioresistant bladder cancer cells with DAB2IP gene deficiency, suggesting a potential role in radiotherapy for invasive bladder cancers with this defect [40]. AURKA expression differs significantly between tumor and control tissues in more than ten cancer types and has been significantly associated with resistance to NU7441 [41]. Heo et al. found that combined treatment with RO-3306 and apigenin significantly inhibited tumor growth in an orthotopic BLCA xenograft model [42]. In addition, MAFG, a small Maf TF, has been implicated in the regulation of oxidative stress responses; Vera-Puente et al. found that MAFG overexpression confers resistance to cisplatin by modulating the antioxidant response pathway in lung cancer cells [43]. This is in line with our observation that high-risk patients with high MAFG expression show differential sensitivity to chemotherapy and distinct hypoxic response patterns. Taken together, these findings suggest that the signature genes may be more than statistical markers and could contribute to neutrophil adaptation to metabolic and oxidative stress in BLCA, a possibility that warrants experimental validation.
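The reported hazard ratios and 95% confidence intervals can be checked for internal consistency. Assuming Wald-type intervals of the form exp(β ± 1.96·SE) on the log-hazard scale, the geometric midpoint of each CI should recover the HR, and the CI width gives the standard error of log HR. The Python sketch below applies this to the univariate (HR = 3.611, 95% CI 2.471–5.276) and multivariate (HR = 3.589, 95% CI 2.364–5.448) estimates quoted above. The Wald-interval assumption and the helper name `wald_from_ci` are ours, not the authors'.

```python
import math

# 97.5th percentile of the standard normal distribution (two-sided 95% CI).
Z975 = 1.959963984540054


def wald_from_ci(hr, lower, upper):
    """Back-calculate SE(log HR), the CI's geometric midpoint, z and two-sided p.

    ASSUMPTION: the 95% CI is a Wald interval exp(beta +/- Z975 * SE), beta = log(HR).
    """
    log_lo, log_hi = math.log(lower), math.log(upper)
    se = (log_hi - log_lo) / (2 * Z975)       # half-width on the log scale / Z975
    hr_mid = math.exp((log_lo + log_hi) / 2)  # should reproduce the reported HR
    z = math.log(hr) / se                     # Wald statistic for H0: HR = 1
    p = math.erfc(abs(z) / math.sqrt(2))      # two-sided normal p-value
    return se, hr_mid, z, p


reported = {
    "univariate": (3.611, 2.471, 5.276),
    "multivariate": (3.589, 2.364, 5.448),
}
for label, (hr, lo, hi) in reported.items():
    se, hr_mid, z, p = wald_from_ci(hr, lo, hi)
    print(f"{label}: SE(log HR) = {se:.3f}, CI midpoint HR = {hr_mid:.3f}, z = {z:.2f}, p = {p:.1e}")
```

For both models, the CI midpoint reproduces the reported HR to within rounding. The implied two-sided Wald p-values are far below 0.001 (z of roughly 6.6 and 6.0), which is consistent with P < 0.001 in the text.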

Lastly, we evaluated the prognostic genes of the risk model in a pan-cancer analysis. Although these BLCA-derived genes showed differential expression and survival associations in multiple tumor types, the observed effect sizes were generally modest and varied substantially across cancers. These results should therefore be interpreted as exploratory and hypothesis-generating rather than as definitive evidence of universal prognostic value. Their main role is to support the biological plausibility of the BLCA-derived gene signature and to highlight potential directions for future studies in other cancer contexts. We also validated the expression trends of the prognostic genes between the control and BLCA groups.

This study has several limitations. First, the data were obtained mainly from public databases, and validation in larger cohorts is needed. Second, although we applied multiple bioinformatics methods, methodological differences among them may affect the results. Future research should incorporate more clinical samples and experimental data to validate and extend our findings.

Conclusion

In summary, this study revealed the critical role of neutrophils in the occurrence and development of BLCA through multi-omics data and bioinformatics analysis, and constructed a risk model with clinical application potential. These findings provide new insights for precise stratification and individualized treatment of BLCA, with the potential to improve clinical outcomes for BLCA patients. In addition, exploratory pan-cancer analyses suggest that the identified prognostic genes may exhibit context-dependent relevance in other tumor types; however, the primary strength of this study lies in the robust identification of BLCA subtypes and prognostic signatures. Further validation in independent cohorts and functional studies will be required to clarify the broader applicability of these findings beyond bladder cancer.

Author contributions

Xiaoming Chen: Participated in data analysis and interpretation and assisted in drafting and revising the manuscript. Guancheng Xiao: Project leader; supervised the overall progress of the research and provided technical support and expert guidance. RuoHui Huang: Participated in the experimental procedures and assisted in data collection and organization. Wei Xia: Responsible for handling and managing experimental samples; participated in data collection and analysis. QingMing Zeng: Assisted in experimental design and data analysis and provided technical support. Gang Xu: Provided experimental equipment and material support and participated in the discussion and analysis of research results. Bo Jiang: Participated in the overall planning and management of the project and provided professional guidance and support. All authors participated in various stages of this study, including experimental design, data collection, data analysis, and manuscript writing, and reviewed and approved the final manuscript for publication.

Funding

Not applicable.

Data availability

The data used in this paper came from the TCGA database (https://portal.gdc.cancer.gov/) and the GEO database (https://www.ncbi.nlm.nih.gov/geo/).

Declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Footnotes

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Muya Ran and Xiaoming Chen are joint first authors.

References

1. Li S, et al. Blood-based liquid biopsy: insights into early detection, prediction, and treatment monitoring of bladder cancer. Cell Mol Biol Lett. 2023;28(1):28.
2. Siegel RL, et al. Cancer statistics, 2022. CA Cancer J Clin. 2022;72(1):7–33.
3. Fang D, Kitamura H. Cancer stem cells and epithelial-mesenchymal transition in urothelial carcinoma: possible pathways and potential therapeutic approaches. Int J Urol. 2018;25(1):7–17.
4. Babjuk M, et al. European Association of Urology Guidelines on Non-muscle-invasive Bladder Cancer (Ta, T1, and Carcinoma in Situ). Eur Urol. 2022;81(1):75–94.
5. Wołącewicz M, et al. Immunotherapy in bladder cancer: current methods and future perspectives. Cancers (Basel). 2020. doi:10.3390/cancers12051181.
6. Jing W, et al. Tumor-neutrophil cross talk orchestrates the tumor microenvironment to determine the bladder cancer progression. Proc Natl Acad Sci U S A. 2024;121(20):e2312855121.
7. Chen H, et al. The clinicopathological and prognostic value of NLR, PLR and MLR in non-muscular invasive bladder cancer. Arch Esp Urol. 2022;75(5):467–71.
8. Ouyang Y, et al. Tumor-associated neutrophils suppress CD8(+) T cell immunity in urothelial bladder carcinoma through the COX-2/PGE2/IDO1 axis. Br J Cancer. 2024;130(5):880–91.
9. Mandelli GE, et al. Tumor infiltrating neutrophils are enriched in basal-type urothelial bladder cancer. Cells. 2020. doi:10.3390/cells9020291.
10. Wang X, et al. Prognostic value of COL10A1 and its correlation with tumor-infiltrating immune cells in urothelial bladder cancer: a comprehensive study based on bioinformatics and clinical analysis validation. Front Immunol. 2023;14:955949.
11. Qiu D, et al. PMEPA1 is a prognostic biomarker that correlates with cell malignancy and the tumor microenvironment in bladder cancer. Front Immunol. 2021;12:705086.
12. Lyu X, et al. Identification of immuno-infiltrating MAP1A as a prognosis-related biomarker for bladder cancer and its ceRNA network construction. Front Oncol. 2022;12:1016542.
13. Wang H, et al. Identification and preliminary analysis of hub genes associated with bladder cancer progression by comprehensive bioinformatics analysis. Sci Rep. 2024;14(1):2782.
14. Dong Y, et al. Prognostic model development and molecular subtypes identification in bladder urothelial cancer by oxidative stress signatures. Aging (Albany NY). 2024;16(3):2591–616.
15. Zhu Y, et al. Predicting latent lncRNA and cancer metastatic event associations via variational graph auto-encoder. Methods. 2023;211:1–9.
16. Vanmathi P, Jose D. An ensemble-based serial cascaded attention network and improved variational auto encoder for breast cancer prognosis prediction using data. Comput Methods Biomech Biomed Engin. 2024;27(1):98–115.
17. Lu X, et al. An algorithm for classifying tumors based on genomic aberrations and selecting representative tumor models. BMC Med Genomics. 2010;3:23.
18. Zhu R, et al. A robust manifold graph regularized nonnegative matrix factorization algorithm for cancer gene clustering. Molecules. 2017. doi:10.3390/molecules22122131.
19. Yu Z, et al. Single-cell transcriptomic map of the human and mouse bladders. J Am Soc Nephrol. 2019;30(11):2159–76.
20. Lai H, et al. Single-cell RNA sequencing reveals the epithelial cell heterogeneity and invasive subpopulation in human bladder cancer. Int J Cancer. 2021;149(12):2099–115.
21. Hao Y, et al. Dictionary learning for integrative, multimodal and scalable single-cell analysis. Nat Biotechnol. 2024;42(2):293–304.
22. Stoeckius M, et al. Cell Hashing with barcoded antibodies enables multiplexing and doublet detection for single cell genomics. Genome Biol. 2018;19(1):224.
23. Korsunsky I, et al. Fast, sensitive and accurate integration of single-cell data with Harmony. Nat Methods. 2019;16(12):1289–96.
24. Aran D, et al. Reference-based analysis of lung single-cell sequencing reveals a transitional profibrotic macrophage. Nat Immunol. 2019;20(2):163–72.
25. Morabito S, et al. hdWGCNA identifies co-expression networks in high-dimensional transcriptomics data. Cell Rep Methods. 2023;3(6):100498.
26. Jin S, et al. Inference and analysis of cell-cell communication using CellChat. Nat Commun. 2021;12(1):1088.
27. Choudhary S, et al. Macrophage migratory inhibitory factor promotes bladder cancer progression via increasing proliferation and angiogenesis. Carcinogenesis. 2013;34(12):2891–9.
28. Liu AG, et al. RNA interference targeting adrenomedullin induces apoptosis and reduces the growth of human bladder urothelial cell carcinoma. Med Oncol. 2013;30(3):616.
29. Tsai YC, et al. Selective inhibition of HDAC6 promotes bladder cancer radiosensitization and mitigates the radiation-induced CXCL1 signalling. Br J Cancer. 2023;128(9):1753–64.
30. Zaravinos A, et al. Role of the angiogenic components, VEGFA, FGF2, OPN and RHOC, in urothelial cell carcinoma of the urinary bladder. Oncol Rep. 2012;28(4):1159–66.
31. Liu L, et al. Expression of vascular endothelial growth factor, receptor KDR and p53 protein in transitional cell carcinoma of the bladder. Urol Int. 2008;81(1):72–6.
32. Fan X, et al. Global analysis of miRNA-mRNA regulation pair in bladder cancer. World J Surg Oncol. 2022;20(1):66.
33. Chen MK, et al. BMI1 activates P-glycoprotein via transcription repression of miR-3682-3p and enhances chemoresistance of bladder cancer cell. Aging (Albany NY). 2021;13(14):18310–30.
34. Chung SS, Koh CJ. Bladder cancer cell in co-culture induces human stem cell differentiation to urothelial cells through paracrine FGF10 signaling. In Vitro Cell Dev Biol Anim. 2013;49(10):746–51.
35. Su Y, et al. Hypoxia-elevated circELP3 contributes to bladder cancer progression and cisplatin resistance. Int J Biol Sci. 2019;15(2):441–52.
36. Shen M, et al. Lymphotoxin β receptor activation promotes bladder cancer in a nuclear factor-κB-dependent manner. Mol Med Rep. 2015;11(2):783–90.
37. Wu J, et al. Loperamide induces protective autophagy and apoptosis through the ROS/JNK signaling pathway in bladder cancer. Biochem Pharmacol. 2023;218:115870.
38. Liu X, et al. Generalized machine learning based on multi-omics data to profile the effect of ferroptosis pathway on prognosis and immunotherapy response in patients with bladder cancer. Environ Toxicol. 2024;39(2):680–94.
39. Ma L, et al. Crosstalk between mesenchymal stem cells and cancer stem cells reveals a novel stemness-related signature to predict prognosis and immunotherapy responses for bladder cancer patients. Int J Mol Sci. 2023. doi:10.3390/ijms24054760.
40. Zhang T, et al. The ATM inhibitor KU55933 sensitizes radioresistant bladder cancer cells with DAB2IP gene defect. Int J Radiat Biol. 2015;91(4):368–78.
41. Miralaei N, et al. Integrated pan-cancer of AURKA expression and drug sensitivity analysis reveals increased expression of AURKA is responsible for drug resistance. Cancer Med. 2021;10(18):6428–41.
42. Heo J, et al. The CDK1/TFCP2L1/ID2 cascade offers a novel combination therapy strategy in a preclinical model of bladder cancer. Exp Mol Med. 2022;54(6):801–11.
43. Vera-Puente O, et al. MAFG is a potential therapeutic target to restore chemosensitivity in cisplatin-resistant cancer cells by increasing reactive oxygen species. Transl Res. 2018;200:1–17.

Articles from Human Genomics are provided here courtesy of BMC