British Journal of Cancer. 2025 Feb 3;132(6):543–557. doi: 10.1038/s41416-025-02947-0

Elevated serum levels of GPX4, NDUFS4, PRDX5, and TXNRD2 as predictive biomarkers for castration resistance in prostate cancer patients: an exploratory study

Rong Wang 1,2,#, Shaopeng Wang 1,#, Yuanyuan Mi 3, Tianyi Huang 4, Jun Wang 3, Jiang Ni 3, Jian Wang 3, Jian Yin 5, Menglu Li 1,6, Xuebin Ran 7,8, Shuangyi Fan 7, Qiaoyang Sun 9, Soo Yong Tan 7, H Phillip Koeffler 8,10, Lingwen Ding 7,8,11, Yong Q Chen 1,2, Ninghan Feng 1,2,6
PMCID: PMC11920399  PMID: 39900986

Abstract

Background

Prostate cancer (PCa) is a heterogeneous disease affecting over 14% of the male population worldwide. Although patients often respond positively to initial treatments within the first 2–3 years, many eventually develop a more lethal form of the disease known as castration-resistant PCa (CRPC). At present, no biomarkers that predict the onset of CRPC are available. This study aims to provide insights into the diagnosis and prediction of CRPC emergence.

Methods

Protein expression dynamics were analysed in drug (androgen receptor inhibitor)-tolerant persister (DTP) and drug withdrawal cells using proteomics to identify potential biomarkers. These biomarkers were subsequently validated using a mouse model, 180-paired carcinoma/benign tissues, and 482 serum samples. Five machine learning algorithms were employed to build clinical prediction models, wherein the SHapley Additive exPlanation (SHAP) framework was used to interpret the best-performing model. Moreover, three regression models were developed to determine the Time from initial PCa diagnosis to CRPC development (TPC) in patients.

Results

We identified that the protein expression levels of GPX4, NDUFS4, PRDX5, and TXNRD2 were significantly upregulated in PCa patients, particularly in those with CRPC. Among the tested machine learning models, the random forest and extreme gradient boosting models performed best on tissue and serum cohorts, achieving AUCs of 0.958 and 0.988, respectively. In addition, a significant inverse correlation was observed between TPC and serum levels of these four biomarkers. This correlation was formulated in three regression models, which achieved the smallest mean absolute error of 1.903 on independent datasets for predicting CRPC emergence.

Conclusion

Our study provides new insights into the role of DTP cells in CRPC development. The quad protein panel identified in our study, along with the post hoc and intrinsically explainable prediction models, may serve as a convenient and real-time prognostic tool, addressing the current lack of clinical biomarkers for CRPC.

Subject terms: Prostate cancer, Protein-protein interaction networks

Introduction

Androgen deprivation therapy (ADT) is a standard clinical practice in managing the progression of prostate cancer (PCa) [1]. Although patients initially respond well to the treatment within the first 2–3 years, many eventually develop castration-resistant PCa (CRPC) [2]. Continuing ADT at this stage results in suboptimal or even poor outcomes, underscoring the need to promptly identify patients at high risk of developing CRPC and transition them to an alternative therapeutic strategy [3, 4]. At present, monitoring the progression of CRPC still relies primarily on plasma prostate-specific antigen (PSA) levels. However, as PSA is a general biomarker for PCa, it lacks specificity for CRPC progression. In addition, as in the diagnosis of PCa, PSA testing has a high false-positive rate and may cause unnecessary anxiety among false-positive patients [5, 6]. Therefore, developing new biomarkers for monitoring CRPC is crucial. This would enable physicians to promptly identify patients at risk of not responding to ADT, adjust their treatment regimen, and potentially improve overall survival.

Researchers have previously discovered that cancer cells can enter a drug-tolerant persister (DTP) state, which is marked by enhanced drug resistance without genetic mutations and the ability to revert to their original phenotype after drug withdrawal [7]. DTP cells have now been identified as one of the primary causes of cancer cell resistance to chemotherapeutic drugs [8, 9]. We have recently identified androgen receptor (AR) inhibitor-resistant prostate DTP cells, which play a crucial role in CRPC development [10]. In the current study, we further examined the dynamics of protein expression in parental PCa cells, DTP cells, and cells after withdrawal of the AR inhibitor via label-free proteomics analyses. Among the 1535 differentially expressed proteins (DEPs) identified, we observed that the levels of certain proteins remained unchanged despite phenotypic reversal. Notably, we identified a set of four proteins (hereafter, the quad proteins), namely glutathione peroxidase 4 (GPX4), NADH:ubiquinone oxidoreductase subunit S4 (NDUFS4), peroxiredoxin-5 (PRDX5), and thioredoxin reductase 2 (TXNRD2), that were consistently elevated and correlated with the emergence of CRPC. These findings were clinically validated using 180-paired carcinoma/benign tissue samples and 482 multi-class serum samples, highlighting the potential use of these proteins as biomarkers for CRPC.

We then applied machine learning (ML) techniques to further examine the predictive value of these proteins. ML has been widely used in developing diagnostic and prognostic models for various diseases, including PCa [11–14]. Here, we used five ML algorithms to analyse tissue cohorts expressing the identified quad proteins (GPX4-NDUFS4-PRDX5-TXNRD2). The analysis led to the development of a Random Forest (RF) model that effectively distinguished between paired tumour and benign tissue samples, achieving an area under the receiver-operating-characteristic (ROC) curve (AUC) value of 0.958 in the validation set. Furthermore, in the serum cohort, an extreme gradient boosting (XGBoost) model based on serum quad protein expression showed AUC values of 0.988 and 0.917 in the validation and independent sets, respectively. To make the prediction models easier for clinicians to interpret and integrate into clinical practice, we used the SHapley Additive exPlanation (SHAP) framework [15, 16] to interpret our ML models. This approach provides a unified post hoc method to interpret various types of ML models and visualise feature contributions to the final output. Moreover, to estimate the Time from initial PCa diagnosis to the emergence of CRPC (TPC), we developed three regression models with self-explanatory attributes, facilitating real-time clinical diagnosis. We validated that the four serum protein levels are closely correlated with TPC among patients, achieving the best mean absolute error (MAE) values of 3.583 in the training set and 1.903 in the independent set. Taken together, these findings underscore the potential of the identified quad proteins as biomarkers for predicting CRPC development using clinical samples and patient diagnostic data.

Materials and methods

Cell culture

Human PCa LNCaP (ATCC CRL-1740) and mouse PCa (Hi-Myc transgenic) MyC-CaP (ATCC CRL-3255) cell lines were grown in Roswell Park Memorial Institute (RPMI)-1640 medium (Thermo Fisher Scientific, CA, USA) containing 10% foetal bovine serum (FBS; Thermo Fisher Scientific) and 1% streptomycin–penicillin at 37 °C with 5% CO2. Human normal prostate epithelial WPMY-1 (ATCC CRL-2854) and human PCa epithelial VCaP (ATCC CRL-2876) cell lines were grown in Dulbecco’s Modified Eagle Medium (DMEM) with 4.5 g/L d-Glucose (Thermo Fisher Scientific, CA, USA) containing 10% FBS (Thermo Fisher Scientific) and 1% streptomycin–penicillin, at 37 °C with 5% CO2. The cell lines were authenticated by short tandem repeat analysis, and mycoplasma contamination was evaluated using the PCR Mycoplasma Detection Kit (Takara, Otsu, Japan). The cells were treated with EPI-001 (EPI, Selleck, S7955, TX, US, dissolved in DMSO) or enzalutamide (ENZ, MCE, HY-70002, NJ, US, dissolved in DMSO) at various concentrations and for various durations to obtain drug-tolerant persister cells as described by Sharma et al. [7].

Generation of DTP and drug-tolerant expanded persister (DTEP)

Drug-sensitive cells were treated with the AR inhibitors EPI and ENZ at concentrations 100 times the established IC50 values, with three rounds of treatment with each drug and each treatment round lasting 48 h. Viable cells that remained attached to the plates at the end of the third round (9 days) were considered DTP cells and collected for further analysis. DTP cells eventually resumed normal proliferation under continuous exposure to drugs and became DTEP cells after 33 days of treatment. These cells can be propagated indefinitely in the presence of drugs. The normal FBS culture medium was used to form DTP and DTEP cells. After withdrawal of the AR inhibitors EPI/ENZ, the DTP/DTEP cells were passaged every 6 days, photographed under an inverted microscope, and collected for follow-up studies.

Viability assay

Cell viability was assessed using Cell Counting Kit-8 (CCK-8; MCE, HY-K0301, NJ, US) according to the manufacturer’s instructions. In brief, cells were seeded at 4000 cells/200 µL/well into 96-well plates and incubated overnight, and the medium was then replaced with fresh medium containing various inhibitors. Following treatment, 10 µL CCK-8 solution was added, and cells were incubated for 4 h at 37 °C. The optical density (OD) value was measured at 450 nm using a microplate spectrophotometer (Thermo Fisher Scientific). Three independent experiments were performed, each in triplicate.

For gene knockdown experiments, cells were transfected with small-interfering RNA (siRNA) and treated with/without AR inhibitors. Lactate dehydrogenase (LDH) release assay was performed after 48-h treatment using a 2-p-iodophenyl-3-nitrophenyl tetrazolium chloride/diaphorase-based kit (C0017, Beyotime, Jiangsu, CN), according to the manufacturer’s instructions.

Western blotting

Cells were treated as described in the abovementioned procedure and then lysed in sample buffer (2% SDS, 10% glycerol, 10% β-mercaptoethanol, bromophenol blue, and Tris-HCl, pH = 6.8) at 95 °C for 10 min. The lysates were fractionated on SDS-PAGE gels and transferred to PVDF membranes (Millipore, IPVH00010, NH, US). The blots were probed with specific antibodies, followed by secondary antibodies, and then detected using enhanced chemiluminescence (Sigma, WBULS0500, MO, US). GPX4 (67763-1-Ig; 1:2000), NDUFS4 (15849-1-AP; 1:1000), PRDX5 (17724-1-AP; 1:10,000), TXNRD2 (16360-1-AP; 1:2000), GAPDH (60004-1-Ig; 1:50,000), β-TUBULIN (66240-1-Ig; 1:50,000), and β-ACTIN (66009-1-Ig; 1:50,000) antibodies were purchased from Proteintech Group (IL, US). Secondary antibodies were conjugated with HRP (Proteintech Group; SA00001-2, SA00001-1; 1:10,000).

siRNA-mediated gene knockdown

hGPX4 siRNA (si-hGPX4; 5′-GUGGAUGAAGAUCCAACCCAATTUUGGGUUGGAUC UUCAUCCACTT-3′), hNDUFS4 siRNA (si-hNDUFS4; 5′-CGCAAUAACAUGCAGUCUGGA TTUCCAGACUGCAUGUUAUUGCGTT-3′), hPRDX5 siRNA (si-hPRDX5; 5′-GGUGGCCUG UCUGAGUGUUTTAACACUCAGACAGGCCACCTT-3′), hTXNRD2 siRNA (si-hTXNRD2; 5′- GAAUAUGGAAUCACAAGUGAUTTAUCACUUGUGAUUCCAUAUUCTT-3′), mGPX4 siRNA (si-mGPX4; 5′-AGUUUGACAUGUACAGCAAGATTUCUUGCUGUACAUGUCAAA CUTT-3′), mNDUFS4 siRNA (si-mNDUFS4; 5′-GUCUGGAGUAAAUAACACAAATT UUUGUGUUAUUUACUCCAGACTT-3′), mPRDX5 siRNA (si-mPRDX5; 5′-CAUUUAC ACCUGGCUGUUCUATTUAGAACAGCCAGGUGUAAAUGTT-3′), and mTXNRD2 siRNA (si-mTXNRD2; 5′-GCAGCAGAGCUUUGAUCUCUUTTAAGAGAUCAAAGCUCU GCUGCTT-3′), were prepared by Jiangsu Saisofi Biotechnology Co., Ltd (Wuxi, China), and negative siRNA (si-neg; 5′-UUCUCCGAACGUGUCACGUTTACGUGACACGUUCGGAGAA TT-3′), hGAPDH siRNA (si-hpos; 5′-UGACCUCAACUACAUGGUUTTAACCAUGUAGUUGAGGUCATT-3′), and mGAPDH siRNA (si-mpos; 5′-AGAAUGGGAAGCUUGUCAUCAATTUUGAUGACAAGC UUCCCAUUCUTT-3′) were used as negative and positive controls. The siRNAs were transfected into LNCaP, VCaP, and MyC-CaP cells using Polyplus-transfection (jetPRIME, NY, USA), according to the manufacturer’s instructions. Successful knockdown was verified by western blotting.

Immunohistochemistry (IHC)

PCa tissue samples were collected from the Affiliated Hospital and Medical Center of Jiangnan University according to institutional guidelines. Tissue paraffin blocks were sectioned, and stained with antibodies specific to GPX4 (67763-1-Ig; 1:1000; Proteintech Group), NDUFS4 (ab137064; 1:50; Abcam), PRDX5 (67599-1-Ig; 1:200; Proteintech Group) (17724-1-AP; 1:200; Proteintech Group), TXNRD2 (ab180493; 1:50; Abcam), and AR (22089-1-AP; 1:20; Proteintech Group). This was followed by scanning with a Pannoramic Scanner (3DHISTECH, Budapest, Hungary).

Enzyme-linked immunosorbent assay (ELISA)

ELISA kits for GPX4 (ml060706-2), NDUFS4 (ml507645-2), PRDX5 (ml606701-2), and TXNRD2 (ml560379-2) were used to detect the proteins in serum samples according to the protocols provided by the manufacturers (Shanghai Enzyme-linked Biotechnology Co., Ltd).

Proteomics study

LNCaP, epiDTP, epiR5, epiR10, epiR20, enzDTP, enzR5, enzR10, and enzR20 cells were used for label-free quantitative proteomics analysis. Samples were lysed with 8 M urea (pH = 8.0), and the protein concentration was quantified using a BCA kit (Beyotime, P0012, Shanghai, China). Proteins were reduced with dithiothreitol (DTT) and then alkylated with iodoacetamide (IAM) in the dark. Sequencing-grade trypsin (Promega, WI, USA) was added for overnight digestion. Peptides were desalted, reconstituted in 0.5 M tetraethyl-ammonium bromide (TEAB), and processed using the Pierce™ quantitative colorimetric peptide assay kit, according to the manufacturer’s instructions (Thermo Scientific, CA, USA). Global peptides were resuspended in 2% acetonitrile (ACN) and 0.1% formic acid (FA) solution and then analysed using an EASY-nLC 1200 system (Thermo Scientific, CA, USA) coupled to a high-resolution Orbitrap Fusion Lumos mass spectrometer (Thermo Scientific). Peptides were first separated using an RSLC C18 column (1.9 µm × 100 µm × 20 cm) packed in-house, and then selected for tandem mass spectrometry (MS/MS) with the normalised collision energy (NCE) set at 28%. The fragments were detected in the Orbitrap at a resolution of 17,500. A data-dependent procedure alternated between one MS scan and 20 MS/MS scans, with 15.0 s dynamic exclusion. Automatic gain control (AGC) was set at 5E4. The fixed first mass was set at 100 m/z.

The resulting MS/MS data were analysed using MaxQuant with the integrated Andromeda search engine (version 1.4.1.2). Tandem mass spectra were searched against a SwissProt human database concatenated with a reverse decoy database. Trypsin/P was used as the cleavage enzyme, allowing up to two missed cleavages. For proteomic analysis, the first search range was set to 5 ppm for precursor ions, and the main search range was set to 5 ppm and 0.02 Da for fragment ions. Cysteine carbamidomethylation was defined as the fixed modification, and the oxidation of methionine was defined as the variable modification. The quantification method used was label-free quantification (LFQ); the false discovery rate (FDR) was adjusted to <1%, and the minimum score for modified peptides was >40.

DEPs between compared groups were defined as proteins meeting both of the following criteria: |log2 FC| > 1 and P < 0.05. The heatmap of expression profiles was drawn using the “pheatmap” R package (https://www.r-project.org).
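
The DEP criterion above can be illustrated with a minimal Python sketch. The function name, the protein names and the intensity and P values are hypothetical and are not taken from the study's datasets; the sketch only applies the stated thresholds to a fold change computed from two group means.

```python
import math

def classify_dep(mean_treated, mean_control, p_value, fc_cutoff=1.0, p_cutoff=0.05):
    """Label a protein 'up', 'down' or 'ns' using |log2 FC| > 1 and P < 0.05."""
    log2_fc = math.log2(mean_treated / mean_control)
    if p_value < p_cutoff and abs(log2_fc) > fc_cutoff:
        return "up" if log2_fc > 0 else "down"
    return "ns"

# Hypothetical intensities (treated, control) and P values, for illustration only.
example = {
    "protein_A": (4000.0, 1000.0, 0.001),  # log2 FC = 2, significant
    "protein_B": (1000.0, 4000.0, 0.010),  # log2 FC = -2, significant
    "protein_C": (1500.0, 1000.0, 0.001),  # fold change below the cutoff
    "protein_D": (4000.0, 1000.0, 0.200),  # large change but P >= 0.05
}
labels = {name: classify_dep(*values) for name, values in example.items()}
```

Both conditions must hold, so a protein with a large fold change but a non-significant P value is not counted as a DEP.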

Pathway enrichment and Gene set enrichment analysis (GSEA)

Enriched pathways were identified in the Kyoto Encyclopedia of Genes and Genomes (KEGG) database (https://www.genome.jp/kegg/) using a two-tailed Fisher’s exact test of DEP enrichment against all quantified proteins. Pathways with a corrected FDR ≤ 0.05 were considered significant and classified into hierarchical categories. Furthermore, proteins in selected pathways were visualised as a heatmap using the “pheatmap” R package (https://www.r-project.org/).
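
As a rough illustration of the enrichment test, the sketch below computes a two-tailed Fisher's exact P value for a single pathway from a 2x2 table. The table layout (DEPs versus other quantified proteins, inside versus outside the pathway) and the counts are assumptions made for the example, and the multiple-testing correction that yields the FDR is not shown.

```python
from math import comb

def fisher_exact_two_tailed(a, b, c, d):
    """Two-tailed Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    a: DEPs in the pathway            b: DEPs outside the pathway
    c: other proteins in the pathway  d: other proteins outside the pathway
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(k):
        # Hypergeometric probability of observing k in the top-left cell.
        return comb(row1, k) * comb(row2, col1 - k) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum over all tables that are at least as unlikely as the observed one.
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))

# Hypothetical counts, for illustration only.
p_value = fisher_exact_two_tailed(8, 2, 1, 5)
```

The small relative tolerance in the comparison guards against floating-point ties between tables with the same probability.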

GSEA (http://www.broad.mit.edu/gsea) was used to generate a ranked list of proteins using the Signal2Noise ranking metric (descending order) with the comparison data of DTP and NC. The top-ranked GSEA pathways are shown in the histogram.

Unsupervised hierarchical clustering analysis

The Mfuzz method was used to identify changes in protein abundance across consecutive samplings (e.g., treatment durations or drug concentrations). Mfuzz uses the fuzzy c-means clustering algorithm, which reduces the interference of noise with the clustering results. The number of clusters k was set to 8, and the degree of cluster fuzziness m was set to 2. Proteins in the same cluster had similar expression transition trends.
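
Mfuzz is an R package; the Python sketch below is not its implementation and only illustrates the standard fuzzy c-means membership formula with m = 2. The two cluster centres and the expression profiles are hypothetical, and the iterative update of the centres is omitted.

```python
import math

def fuzzy_memberships(point, centres, m=2.0):
    """Fuzzy c-means membership of one expression profile in each cluster centre."""
    dists = [math.dist(point, c) for c in centres]
    # A profile lying exactly on a centre belongs fully to that cluster.
    if any(d == 0.0 for d in dists):
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_k) ** exponent for d_k in dists) for d_i in dists]

# Hypothetical standardised profiles over (NC, DTP, R20), with two centres only.
centres = [(-1.0, 1.0, 1.0), (-1.0, 1.0, -1.0)]  # "stays up" vs "reverts"
stays_up = fuzzy_memberships((-1.0, 1.0, 0.8), centres)
ambiguous = fuzzy_memberships((-1.0, 1.0, 0.0), centres)
```

Memberships sum to 1 for each protein, so a profile midway between two centres is shared equally rather than forced into one cluster, which is how the method limits the influence of noisy profiles.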

Animal study

Hi-MYC transgenic PCa mice (gifted by the George V. Thomas laboratory) [17] were used in animal experiments, and all experimental protocols were approved by the Animal Ethics Committee of Jiangnan University, China (JN. No20190630t1360101[191]). To model the temporal development of castration resistance, 4-month-old mice were randomly assigned to control (NC) and ENZ (intragastric administration, 10 mg/kg) groups, with drug administration every 3 days. Mice were euthanized at monthly intervals, and their prostates (including anterior lobes, dorsal lateral lobes, and ventral lobes) were dissected, photographed, and weighed. After an initial decrease in prostate weight, a subsequent regain in prostate weight was considered to indicate the emergence of castration resistance [18, 19]. To detect the expression of the quad proteins, the prostate tissues were subjected to western blotting and histopathological analysis.

Clinical study

Patient tissue and peripheral blood samples were collected from the Affiliated Hospital and Medical Center of Jiangnan University. All patients provided signed informed consent. The study was approved and authorised by the Ethics Committee of the Affiliated Hospital of Jiangnan University (Approval document number: LS2023099). Patient information is listed in Tables S3 and S4. An independent set comprising 11 CRPC patients, with their serum quad biomarker levels and TPC attributes, was subsequently constructed for external validation.

Model derivation and comparison

Data from 180-paired carcinoma/benign tissue samples and 362 serum samples (120 BPH, 196 LPCa, and 46 CRPC) were divided into a training set (80%) and a validation set (20%) to guard against model over-fitting. Furthermore, a tissue cohort from the study by Grasso et al. [20] with 119 samples (28 BPH, 59 LPCa and 32 mCRPC) was used as an external test set to further validate the ability of the quad proteins to discriminate normal prostate tissues from those at different disease or cancer stages. In addition, an independent set (including 11 CRPC serum samples) was used to test the generalisation ability of the best model derived from the serum cohort.
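
For the paired tissue cohort, an 80/20 split can be sketched as follows. Keeping the carcinoma and benign samples of one patient on the same side of the split is an assumption about how the pairs were handled, and the function name and random seed are hypothetical.

```python
import random

def paired_split(n_pairs, train_fraction=0.8, seed=0):
    """Split pair indices so both members of a pair land in the same set."""
    pair_ids = list(range(n_pairs))
    random.Random(seed).shuffle(pair_ids)
    n_train = round(n_pairs * train_fraction)
    return sorted(pair_ids[:n_train]), sorted(pair_ids[n_train:])

train_pairs, valid_pairs = paired_split(180)

# Each pair contributes one carcinoma and one benign sample to its set.
train_samples = [(p, tissue) for p in train_pairs for tissue in ("carcinoma", "benign")]
valid_samples = [(p, tissue) for p in valid_pairs for tissue in ("carcinoma", "benign")]
```

Splitting by pair rather than by sample prevents tissue from one patient appearing in both the training and validation sets, which would otherwise inflate the validation performance.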

The quad biomarkers were used to build prediction models on the abovementioned data. Five ML algorithms implemented in Python, namely Support Vector Classification (SVC), Adaptive Boosting (AdaBoost), Random Forest (RF), Gradient Boosting Decision Tree (GBDT), and eXtreme Gradient Boosting (XGBoost), were used to distinguish different sample classes in our cohorts. The final hyper-parameters of the prediction models were selected via manual fine-tuning.

Several widely accepted evaluation metrics including AUC yielded from the ROC curve and total accuracy (ACC) were applied to evaluate the generalisation abilities of the prediction models together with sensitivity (SEN) and specificity (SPE) for binary classification models.
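
The metrics named above can be written out for the binary case in a few lines of plain Python. The labels, scores and the 0.5 decision threshold are hypothetical; the AUC is computed through its rank interpretation (the probability that a positive sample scores higher than a negative one), which equals the area under the ROC curve.

```python
def binary_metrics(y_true, y_pred):
    """Return (accuracy, sensitivity, specificity) for 0/1 labels."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return (tp + tn) / len(y_true), tp / (tp + fn), tn / (tn + fp)

def roc_auc(y_true, scores):
    """AUC as the chance that a positive outscores a negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical labels (1 = tumour) and model scores, for illustration only.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]
acc, sen, spe = binary_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
```

Unlike ACC, SEN and SPE, the AUC does not depend on a decision threshold, which is why the two kinds of metric are reported together.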

Model explanation

Interpretation of the ML model would help us understand the prediction processes and results in the complex models, which is an essential requirement of clinicians to explain the model output in clinical practice.

As a post hoc explainable ML framework derived from game theory, SHAP provides both global and local explanations of a prediction model. The global explanation offers a feature-level association between sample classes and input features to quantitatively evaluate the importance of each feature in model prediction. The local explanation visualises how a specific prediction is reached for an individual patient sample.
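
The game-theoretic quantity behind SHAP, the Shapley value, can be computed exactly when there are only four features. The sketch below does not use the SHAP library, which relies on model-specific algorithms and approximations; it enumerates every feature ordering instead. The toy scoring function, the patient values and the all-zero baseline are hypothetical and are not the models from this study.

```python
from itertools import permutations
from math import factorial

def shapley_values(model, x, baseline):
    """Exact Shapley values of `model` at `x`; absent features take `baseline`."""
    n = len(x)
    phi = [0.0] * n
    for order in permutations(range(n)):
        current = list(baseline)
        previous = model(current)
        for i in order:
            current[i] = x[i]            # add feature i to the coalition
            value = model(current)
            phi[i] += value - previous   # marginal contribution of feature i
            previous = value
    return [p / factorial(n) for p in phi]

# Hypothetical score with one interaction term; not a model from the study.
def toy_model(features):
    gpx4, ndufs4, prdx5, txnrd2 = features
    return 0.5 * gpx4 + 0.2 * ndufs4 + 0.1 * prdx5 + 0.3 * gpx4 * txnrd2

x = [2.0, 1.0, 3.0, 1.0]         # hypothetical patient
baseline = [0.0, 0.0, 0.0, 0.0]  # hypothetical reference sample
phi = shapley_values(toy_model, x, baseline)
```

The per-patient values in `phi` correspond to a local explanation and always sum to the difference between the patient's score and the baseline score; averaging their absolute values over many patients gives a global feature ranking.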

Correlation analysis

Biomarker levels, patient age, TPSA, GS and the Time from initial PCa diagnosis to the emergence of CRPC (TPC) data were log2 transformed. Spearman or Pearson correlation analysis was performed, and the fitted regression line and R2 value were obtained.
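
A minimal sketch of the two correlation coefficients on log2-transformed data is given below. The serum and TPC values are hypothetical and chosen to show a perfect inverse relationship; Spearman's coefficient is computed as the Pearson coefficient of the ranks, with tied values sharing their average rank.

```python
import math

def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """Average ranks (1-based), so tied values share the same rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out

def spearman(x, y):
    return pearson(ranks(x), ranks(y))

# Hypothetical serum levels and TPC values, for illustration only.
serum = [2.0, 4.0, 8.0, 16.0, 32.0]
tpc = [64.0, 32.0, 16.0, 8.0, 4.0]
log_serum = [math.log2(v) for v in serum]
log_tpc = [math.log2(v) for v in tpc]
r = pearson(log_serum, log_tpc)
rho = spearman(log_serum, log_tpc)
r_squared = r ** 2
```

For a straight-line fit with a single predictor, R2 is the square of the Pearson coefficient, so the sign of the relationship has to be read from r or from the slope.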

The Cox proportional hazards regression model was used to estimate the hazard ratio (HR) and 95% confidence interval (CI) for each potential risk factor, with visualisation through forest plots. Stepwise multivariate Cox regression analysis used a type I error threshold of 0.1 for variable inclusion and exclusion.

A multi-linear regression model, which is intrinsically interpretable, was fitted with log2 serum biomarker values as dependent variables and log2 TPC values as the independent variable to quantitatively evaluate their correlation.
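
The sketch below fits a multi-linear model by ordinary least squares and scores it with the mean absolute error (MAE). It regresses log2 TPC on the four log2 serum levels, which is the orientation needed to estimate TPC for a new patient; that orientation, the data, the coefficients and the function names are all assumptions made for illustration and are not the fitted models of this study.

```python
def fit_linear(X, y):
    """Ordinary least squares via the normal equations; returns [intercept, *slopes]."""
    rows = [[1.0, *r] for r in X]  # prepend the intercept column
    p = len(rows[0])
    # Build the augmented system (A^T A | A^T y).
    aug = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
           + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(p)]
    for col in range(p):  # Gauss-Jordan elimination with partial pivoting
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        aug[col] = [v / aug[col][col] for v in aug[col]]
        for r in range(p):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]

def predict(coef, X):
    return [coef[0] + sum(c * v for c, v in zip(coef[1:], r)) for r in X]

def mean_absolute_error(y_true, y_pred):
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)

# Hypothetical log2 serum levels (GPX4, NDUFS4, PRDX5, TXNRD2), for illustration only.
X = [[1, 2, 0, 1], [2, 1, 1, 0], [0, 1, 2, 2], [3, 0, 1, 1],
     [1, 1, 1, 3], [2, 3, 0, 2], [0, 2, 3, 1]]
true_coef = [6.0, -0.5, -0.25, -0.75, -1.0]  # negative slopes: inverse relation
y = predict(true_coef, X)                    # noiseless hypothetical log2 TPC
coef = fit_linear(X, y)
mae = mean_absolute_error(y, predict(coef, X))
```

The fitted coefficients can be read directly as the change in log2 TPC per unit change in each log2 biomarker level, which is what makes such a model intrinsically interpretable.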

Statistical analysis

Student’s t test was used to compare the means of two groups. One-way ANOVA was performed to compare the means of three or more groups (GraphPad, CA, USA). Tukey’s test was used to perform multiple comparisons (IBM SPSS, NY, USA). Data are presented as mean ± s.d. of biological replicates. P < 0.05 was considered significant in all tests.

Results

Heightened antioxidative state in DTP cells

To gain insights into the expression dynamics taking place in PCa cells before AR inhibitor treatment, in the DTP state, and after phenotypic reversal, global proteome profiling of NC (0 days), DTP (9 days of drug treatment), and R20 (20 days after drug withdrawal) cells was conducted. The analysis identified 764 (362 up and 402 down) and 771 (385 up and 386 down) DEPs in enzalutamide (ENZ)-resistant and EPI-001 (EPI)-resistant DTP cells, respectively. Comparisons of other DTP groups are shown in Fig. S1A and Datasets S1 and S2 (ProteomeXchange PXD032983). Gene set enrichment analysis against the “Hallmark” gene sets derived from the MSigDB [21], together with KEGG enrichment analysis, revealed significant enrichment of metabolism-related pathways and a heightened antioxidative state in DTP cells (Figs. 1a and S1B, C and Datasets S3 and S4).

Fig. 1. Protein expression dynamics in DTP cells.

a Enrichment analysis of MSigDB gene sets (https://www.gsea-msigdb.org/gsea/msigdb) was conducted to compare untreated LNCaP (NC) with enzalutamide (ENZ)-resistant DTP or EPI-001 (EPI)-resistant DTP cells. Antioxidative signalling pathways were enriched in DTP cells. b Principal component analysis (PCA) was performed to analyse the transition of cells from untreated LNCaP (NC) to DTP cells at 5, 10 and 20 days after drug withdrawal. The results suggested a gradual protein expression reversal. c The protein expression was clustered into eight groups. Clusters I and IV showed expression reversal at 5 days after drug withdrawal (R5), Clusters II and V showed expression reversal at 10 days after drug withdrawal (R10), and Clusters III and VI showed expression reversal at 20 days after drug withdrawal (R20). Cluster VII showed proteins that remained upregulated, and Cluster VIII showed proteins that stayed downregulated after the phenotypic reversal of cells. d Protein–protein interaction (PPI) analysis indicated a tight connection with metabolic networks. e Mitochondrial proteins PRDX5, TXNRD2, GPX4, and NDUFS4 were identified as regulators of redox metabolic pathways.

To determine whether phenotypic reversal was accompanied by protein expression reversal, principal component analysis was performed. The expression dynamics analysed by the dynamic clustering method were found to be largely in accordance with the phenotype (Fig. 1b). Detailed examination of the expression patterns, however, revealed eight different expression clusters, and the levels of some proteins were not reversed despite the phenotypic reversal of drug resistance (Fig. 1c). Proteins in Cluster VII (Table S1) remained elevated, whereas proteins in Cluster VIII (Table S2) stayed depressed (Fig. 1c). Protein–protein interaction mapping of Clusters VII and VIII showed a tight connection with the metabolic network (Fig. 1d). Thus, some protein expression patterns in DTP cells were not reversible even after the cells converted back to a drug-sensitive state. This suggested that these proteins have essential biological functions in maintaining the drug-resistant state of DTP cells.

Of the proteins in Cluster VII, peroxiredoxin-5 (PRDX5) is a thiol-specific peroxidase that catalyzes the reduction of hydrogen peroxide and organic hydroperoxides to water and alcohol, respectively. Glutathione peroxidase 4 (GPX4) is an essential antioxidant peroxidase that detoxifies lipid hydroperoxide. Furthermore, thioredoxin reductase 2 (TXNRD2) plays a role in redox homoeostasis. NADH dehydrogenase [ubiquinone] iron-sulphur protein 4 (NDUFS4) is a subunit of complex I. All these proteins function as mitochondrial enzymes regulating the redox metabolic pathways (Fig. 1e).

Increased expression of quad proteins in DTP and DTEP cells

Measurement of the antioxidative proteins GPX4, NDUFS4, PRDX5 and TXNRD2 by Western blotting in LNCaP, DTP, and reversed cells confirmed the proteomics results (Fig. 2a, b). Similar results were observed in human VCaP and mouse MyC-CaP PCa cells (Fig. S2A). These proteins therefore remained highly expressed once cells had become resistant, even after the AR inhibitors were withdrawn. Silencing of the PRDX5/TXNRD2/NDUFS4 genes appeared to have additive effects with AR inhibitors on cell proliferation (Figs. 2d and S2B, C). Knockdown of individual genes alone also had significant suppressive effects on some cells (Figs. 2d and S2C). We next examined cell survival after combined knockdown of the quad proteins. In LNCaP, VCaP and MyC-CaP cell lines, combined knockdown significantly reduced the cell survival rate, and more pronounced additive effects on cell proliferation were observed with AR inhibitors (Fig. 2e). Interestingly, when DTP cells were continuously exposed to drugs to form drug-tolerant expanded persister (DTEP) cells, the levels of these proteins increased further (Fig. 2c). These results provide some support for our hypothesis that the quad proteins could serve as diagnostic targets, pending further validation.

Fig. 2. Functional validation of antioxidative proteins in DTP cells.

a AR inhibitors EPI/ENZ were withdrawn from cultures, and DTP cells were passaged every 6 days, and photographed under an inverted microscope. Scale bars = 100 µm. b The protein levels of PRDX5, TXNRD2, GPX4, and NDUFS4 were measured via Western blotting (WB) in DTP and withdrawal R5, R10 and R20 cells (left panel). This confirmed the proteomics results. β-ACTIN was used as a loading control. Quantitative results of relative protein expression are shown in the right panel. One-way ANOVA with the Tukey’s test was performed with *P < 0.05, indicating statistical significance. Error bars represent the mean ± s.d. of biological triplicates. c In DTP cells with continuous drug exposure (DTEP cells) for 33 days, the protein levels of PRDX5, TXNRD2, GPX4, and NDUFS4 further increased. GAPDH was used as a loading control. One-way ANOVA with the Tukey’s test was performed with *P < 0.05, indicating statistical significance. Error bars represent the mean ± s.d. of biological triplicates. d LNCaP cells were treated with negative control siRNA (siCTL), si-PRDX5, si-TXNRD2, si-GPX4 and si-NDUFS4 alone or in combination with EPI and ENZ. Cell survival and death rates were measured and presented in histograms. Student’s t test was performed with *P < 0.05, indicating statistical significance. Error bars represent the mean ± s.d. of biological triplicates. e LNCaP, VCaP and MyC-CaP cells were treated with negative control siRNA (siCTL), si-PRDX5/TXNRD2/GPX4/NDUFS4 (siALL) alone or in combination with EPI and ENZ. Cell survival rates were measured and presented in histograms. Student’s t test was performed with *P < 0.05, indicating statistical significance. Error bars represent the mean ± s.d. of biological triplicates.

Increased expression of quad proteins in CRPC mice and patients

To explore the potential use of quad proteins (GPX4, NDUFS4, PRDX5, TXNRD2) as biomarkers, their expression levels in PCa and adjacent tissues were examined by immunohistochemistry among 180 patients (180-paired carcinoma/benign samples) who underwent laparoscopic radical prostatectomy (Fig. 3a). We found that protein expression levels were significantly increased in carcinoma cells (Fig. 3b). The abovementioned 180-paired samples comprised a tissue cohort for ML modelling.

Fig. 3. Increased expression of quad proteins in PCa patients.

a Representative images of PRDX5, TXNRD2, GPX4, NDUFS4, and AR immunohistochemistry in PCa tissues obtained from patients who underwent laparoscopic radical prostatectomy. Scale bars indicate 50 µm. b The box and whisker plot displays the IHC score of 180-paired carcinoma/benign tissues. The box spans the lower to upper quartile values and so represents the middle 50% of the dataset. The median value is displayed inside the box. The maximum and minimum values are depicted with vertical lines connecting the points to the centre box. Individual values are shown as dots. Paired-sample t test showed statistical significance with ****P < 0.0001. c ELISA was employed to determine the level of PRDX5, TXNRD2, GPX4, and NDUFS4 in serum samples obtained from 120 patients with non-prostate related diseases (NPRD), 120 with benign prostatic hyperplasia (BPH), 190 with localised PCa (LPCa), and 28 with CRPC. The results are shown in the box and whisker plot. One-way ANOVA indicated statistical significance with P < 0.05. The data are expressed as mean ± s.d.

To validate the expression pattern of quad proteins in mouse models, we constructed a CRPC mouse model by long-term administration of ENZ to Hi-MYC mice according to our reported method [19]. The prostate tissues of PCa and CRPC mice aged 5, 6, 7 and 8 months were excised to measure the expression levels of quad proteins. The results showed that quad protein expression was relatively low in 5- and 6-month-old mice, which were still at the treatment stage. In contrast, by 7 and 8 months the mice had developed resistance to ENZ and castration resistance, and the expression levels of quad proteins were significantly increased (Fig. S2D). Subsequently, IHC staining confirmed that quad protein expression in the prostate tissues of 8-month-old mice at the castration-resistance stage was higher than that in 6-month-old mice at the treatment stage (Fig. S2E). These results further confirmed the expression pattern of quad proteins as potential biomarkers at the animal level.

We next asked whether these proteins were also detectable in blood samples. Accordingly, serum from 120 patients with non-prostate-related diseases (NPRD), 120 with benign prostatic hyperplasia (BPH), 190 with localised PCa (LPCa), and 28 with CRPC was quantified by ELISA (Table S3). The total numbers of samples collected from these groups were 120, 120, 196 and 46, respectively. The results indicated that the levels of these four proteins were not statistically different between NPRD and BPH patients, whereas the levels of GPX4, NDUFS4, and PRDX5, but not TXNRD2, were significantly higher in LPCa than in NPRD and BPH patients, with the highest levels observed in CRPC patients (Fig. 3c). Levels above the median were considered over-expression, and those below the median were considered under-expression. The mean, lower quartile (Q1), upper quartile (Q3), interquartile range (IQR), and variance of protein expression in tissue and serum samples are shown in Tables S5 and S6. Based on these results, the quad proteins (GPX4, NDUFS4, PRDX5, and TXNRD2) may be used as blood-based biomarkers during clinical diagnosis. The serum samples from BPH, LPCa and CRPC patients comprised a cohort that was further used for ML modelling.

Biomarker analysis through ML approaches

According to Fig. 3c, one-way ANOVA showed no significant difference in TXNRD2 between LPCa and BPH samples. To further evaluate the discriminatory ability of the quad biomarkers and of each single biomarker, a well-developed ML modelling procedure was employed [22, 23]. In the tissue cohort, the 180 paired samples were divided by a paired random split into training and validation sets at a ratio of 8:2. Tenfold cross-validation was applied to single biomarkers and to the quad biomarkers, and their discriminatory abilities were evaluated by the ROC curves and AUC values shown in Figs. S3–S7. In general, the five types of models achieved better AUC values with the quad biomarkers than with single biomarkers. Thus, the five prediction models for the quad biomarkers were further applied to the validation set, in which the RF model obtained the best AUC of 0.958 (Fig. 4a) for distinguishing benign from tumour tissue samples. The other evaluation metrics, together with the AUC values, are shown in Dataset S5.
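To make the AUC values quoted above easier to read, the sketch below computes an AUC directly from its rank interpretation: the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one. The function name and the toy scores are hypothetical; this is not the authors' evaluation code, which is not given in the text.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of positive/negative pairs ranked correctly.

    Ties count as half a win (the Mann-Whitney U formulation).
    labels: 1 for the positive class (e.g. tumour), 0 for the negative class.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical model scores, not study data.
toy_scores = [0.9, 0.8, 0.4, 0.35, 0.7, 0.1]
toy_labels = [1, 1, 1, 0, 0, 0]
toy_auc = roc_auc(toy_scores, toy_labels)  # 8 of 9 pairs ranked correctly
```

An AUC of 0.958 therefore means that, in roughly 96 of 100 tumour/benign pairs, the model scores the tumour sample higher.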

Fig. 4. Prediction performances of ML models on the discrimination of clinical tissue and serum samples.

a ROC curves of the five ML models using quad biomarkers in the validation set of the tissue cohort. The RF model obtained the best AUC value of 0.958. b ROC curves of the five ML models using quad biomarkers in the validation set of the serum cohort. The XGBoost model obtained the best AUC value of 0.988. c ROC curve of XGBoost model in the independent set comprising 11 CRPC serum samples with an AUC value of 0.917.

To further validate the discrimination ability of a single biomarker and quad biomarkers, we used another tissue cohort with 119 patient samples from the study by Grasso et al. containing 28 normal, 59 LPCa, and 32 mCRPC tissues as an outer testing set [20]. The modelling procedure was similar to that of our tissue cohort with RF algorithm. The tenfold cross-validation results of the RF model in the training set with AUC values are listed in Fig. S8. Similarly, the model for quad biomarkers achieved the best AUC value compared with that for single ones. Therefore, the model for quad biomarkers was used to predict the samples in the validation set with an AUC value of 0.904. The performance metrics are shown in Dataset S5.

In the serum cohort, 120 BPH, 196 LPCa and 46 CRPC samples (Table S3) were strategically split at a ratio of 8:2 to obtain the training and validation sets. The ROC curves of the models evaluating single and quad biomarkers with tenfold cross-validation are shown in Figs. S9–S13. For the validation set, the XGBoost model showed the best AUC of 0.988 (Fig. 4b). Because CRPC samples made up a small proportion of the serum cohort (~12.7%), it was necessary to further validate the generalisability of the XGBoost model for accurately recognising CRPC samples. Thus, an independent set of 11 CRPC serum samples (Table S4) was used. The ROC curve for this independent serum set is shown in Fig. 4c; the AUC value was 0.917, which indicated good generalisability of our XGBoost model in distinguishing CRPC serum samples. The AUC and ACC values of the five models are shown in Dataset S6.
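The class imbalance and the 8:2 split can be sketched as follows. The sketch assumes that the split preserved class proportions (a stratified split); the text does not state this explicitly, and the function name, seed and rounding rule are hypothetical.

```python
import random


def stratified_split(labels, train_fraction=0.8, seed=0):
    """Return (train_idx, val_idx) with each class split at train_fraction."""
    rng = random.Random(seed)
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    train_idx, val_idx = [], []
    for y in sorted(by_class):
        idx = by_class[y][:]
        rng.shuffle(idx)                      # random assignment within a class
        n_train = round(len(idx) * train_fraction)
        train_idx.extend(idx[:n_train])
        val_idx.extend(idx[n_train:])
    return train_idx, val_idx


# Class sizes reported for the serum cohort.
cohort_labels = ["BPH"] * 120 + ["LPCa"] * 196 + ["CRPC"] * 46
train_idx, val_idx = stratified_split(cohort_labels)
crpc_share = 46 / len(cohort_labels)          # about 0.127, i.e. ~12.7%
n_crpc_val = sum(1 for i in val_idx if cohort_labels[i] == "CRPC")
```

Under these assumptions only nine CRPC samples land in the validation set, which illustrates why an additional independent CRPC set is a useful check.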

Model explanation

It is easier for clinicians to accept prediction outputs if they are interpretable and directly associated with clinical biochemical indices. Thus, the SHAP framework was used to interpret the output of the best XGBoost model derived from the serum cohort, in which the global explanation of each feature on how to make the prediction output was computed and visualised.

As shown in Fig. 5a, the importance of each biomarker for predicting the “CRPC” class was ranked by its mean absolute SHAP value in descending order. In Fig. 5b, the SHAP values of the samples are plotted in ascending order along the X axis, with each sample coloured from red to blue to indicate a high to low biomarker value. In addition, the dependence scatter plots help to interpret how feature values affect the prediction outputs of the model. As shown in Fig. 5c–f, patient samples (represented as dots) with SHAP values higher than zero (above the red lines) highlight one or more value ranges in which the model output was pushed toward the CRPC serum class.

Fig. 5. Global SHAP output plot of the best XGBoost in the serum cohort for CRPC class.

a SHAP bar plot. The contributions of the biomarkers to model prediction are summarised by their mean absolute SHAP values in descending order. b SHAP dot plot. Each dot represents a biomarker value and the corresponding SHAP value for one patient sample. The actual biomarker values are coloured from red to blue, from high to low. A larger SHAP value for a specific biomarker means a positive push toward the given class. For example, lower TXNRD2 values increased the model output for the CRPC class. c–f SHAP dependence scatter plots. These plots facilitate understanding of the effect of a single biomarker on the model output. The dot regions with SHAP values larger than zero push the model output toward the CRPC class. For instance, GPX4 values higher than 25.146 and less than 62.651 push the model decision toward the CRPC class. The X axis represents the biomarker values and the Y axis the corresponding SHAP values.

In particular, GPX4 expression between 25.146 and 62.651 (dots with SHAP values larger than zero) pushed the model outputs toward the “CRPC” class. Similarly, patients with PRDX5 expression less than 0.277, between 0.576 and 0.754, or higher than 0.880 were more likely to fall into the “CRPC” class, which showed a more complex pattern for PRDX5 than for GPX4. For TXNRD2 and NDUFS4, expression values less than 3.224 and higher than 0.863, respectively (with a few exceptions), corresponded to positive SHAP values, indicating their association with the “CRPC” class. For the BPH and LPCa classes, the global SHAP and dependence scatter plots of the XGBoost model showing feature-level contributions are provided in Figs. S14 and S15.

The local explanation provided information about how a specific prediction was made from the serum expression of the quad biomarkers. Fig. 6a–c shows the prediction for one BPH patient from the input feature values. The sample was assigned to the BPH class with a probability of 94.6% (Fig. 6a), with probabilities of 5.1% (Fig. 6b) and 0.3% (Fig. 6c) for the LPCa and CRPC classes, respectively. As shown, the GPX4, NDUFS4 and PRDX5 values pushed (red bars) the model prediction toward the correct class for the BPH sample, while decreasing the probabilities (blue bars) of the other two classes to different degrees. For example, GPX4 with a value of 1.617 increased the probability by +0.78 for the BPH prediction but decreased it by −0.59 and −0.07 for the other two classes. However, TXNRD2 increased the probability for the LPCa and CRPC classes.

Fig. 6. Local SHAP output plot of the best XGBoost in the serum cohort.

a–i Waterfall plots of each biomarker pushing the predicted class probability from the base value to the output value for a BPH (a–c), an LPCa (d–f) and a CRPC serum sample (g–i), respectively. The first column (a, d, g) shows the predicted probabilities of one serum sample for the BPH class, the second column (b, e, h) those for the LPCa class and the third column (c, f, i) those for the CRPC class. For example, the BPH serum sample had a probability of 94.6% for the BPH class, together with probabilities of 5.1% and 0.3% for the LPCa and CRPC classes, respectively.

Some more complex patterns were shown on an LPCa sample (Fig. 6d–f) and a CRPC sample (Fig. 6g–i). As shown in Fig. 6g–i, the values of GPX4, NDUFS4 and PRDX5 pushed the sample toward CRPC class with a probability of 95.1%; in contrast, the biomarkers pushed the sample toward LPCa and BPH class with probabilities of 2.7% and 2.2%, respectively. One interesting finding was that PRDX5 showed increased probabilities for all three classes, whereas NDUFS4, GPX4 or TXNRD2 decreased the probabilities for LPCa and BPH classes (Fig. 6g, h). These results indicate that the combination of quad biomarkers is necessary to accurately distinguish the serum samples between three classes.
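The additive logic behind the waterfall plots (base value plus per-biomarker contributions equals the model output) can be illustrated with exact Shapley values on a toy model. This sketch enumerates all feature subsets and replaces absent features with a baseline value; the toy scoring function, the baseline and the sample values are hypothetical, and the SHAP library applied to XGBoost uses a tree-specific algorithm rather than this brute-force enumeration.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one sample by enumerating feature subsets.

    Features outside a subset are replaced by their baseline value, which is
    one simple way to define a 'missing' feature (an assumption of this sketch).
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                with_i = [x[j] if (j in subset or j == i) else baseline[j]
                          for j in range(n)]
                without_i = [x[j] if j in subset else baseline[j]
                             for j in range(n)]
                phi[i] += weight * (predict(with_i) - predict(without_i))
    return phi


def toy_model(features):
    # Hypothetical score with an interaction term; not the study's model.
    g, n, p, t = features
    return 0.5 * g + 0.2 * n + 0.1 * p * t


sample = [2.0, 1.0, 3.0, 4.0]      # hypothetical biomarker values
reference = [0.0, 0.0, 0.0, 0.0]   # hypothetical baseline
phi = shapley_values(toy_model, sample, reference)
base_value = toy_model(reference)
output_value = toy_model(sample)
# Additivity: base_value + sum(phi) equals output_value (up to rounding).
```

In this toy case the interaction term is shared equally between the two interacting features, which mirrors how a single biomarker can raise or lower the output depending on the values of the others.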

Quad proteins as biomarkers independent of age, TPSA and Gleason scores

The relationships between age, total prostate-specific antigen (TPSA), Gleason score (GS) and the expressions of the quad biomarkers in LPCa patients were analysed. We found that there was a poor correlation between the quad biomarkers and age, TPSA, or GS (Fig. S16). Similarly, there was no significant correlation between the quad biomarkers and age, TPSA, or GS among CRPC patients (Fig. S16). Thus, these proteins may be used as biomarkers independent of age, TPSA and Gleason scores.

Correlation of quad biomarkers with CRPC development

Spearman correlation analysis in paired tissue samples indicated that all four proteins positively correlated with PCa development (Fig. 7a). These results suggested that GPX4, NDUFS4, PRDX5 and TXNRD2 had a high diagnostic power for PCa tissues. Pearson correlation analysis showed that all four proteins positively correlated with tumour progression in serum samples (Fig. 7b). We explored the relationship between biomarker expression and the Time from initial PCa diagnosis to the emergence of CRPC (TPC), and found that the expression of these four proteins (GPX4, NDUFS4, PRDX5 and TXNRD2) was inversely correlated with TPC (Fig. 7c, log-rank P < 0.001). Multivariate Cox regression analysis revealed that GPX4 [HR: 19.664, 95% CI range 5.698–67.859, P < 0.001], PRDX5 [HR: 17.564, 95% CI range 5.071–60.827, P < 0.001], TXNRD2 [HR: 19.664, 95% CI range 5.698–67.859, P < 0.001], and NDUFS4 [HR: 10.107, 95% CI range 4.070–25.097, P < 0.001] were independent risk factors for TPC. However, age [HR: 0.952, 95% CI range 0.891–1.108, P = 0.152] and TPSA [HR: 1.003, 95% CI range 0.998–1.008, P = 0.31] were not significant factors, as shown in the forest plot (Fig. 7d).
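For readers less familiar with hazard ratios, the sketch below shows how an HR and its 95% CI relate on the log scale. It assumes the intervals are Wald intervals of the form exp(β ± 1.96·SE), which the text does not state; the function name is hypothetical, and the numbers are the reported values for three of the biomarkers.

```python
from math import log, sqrt

# Hazard ratios and 95% CI bounds as reported in the text: (HR, lower, upper).
reported = {
    "GPX4": (19.664, 5.698, 67.859),
    "PRDX5": (17.564, 5.071, 60.827),
    "NDUFS4": (10.107, 4.070, 25.097),
}


def wald_summary(hr, lower, upper, z=1.96):
    """Recover log-scale quantities from an HR and its CI (Wald interval assumed)."""
    beta = log(hr)                              # Cox regression coefficient
    se = (log(upper) - log(lower)) / (2 * z)    # standard error of beta
    midpoint_hr = sqrt(lower * upper)           # exp of the log-scale midpoint
    return beta, se, midpoint_hr


summaries = {name: wald_summary(*vals) for name, vals in reported.items()}
```

Under this assumption the HR is the geometric mean of its CI bounds, so the wide intervals reported here correspond to large standard errors on the log scale.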

Fig. 7. Predictive value of biomarkers for TPC.

a Expression correlation analysis of four biomarkers in paired patient tissue samples. C_PRDX5: The expression of PRDX5 in carcinoma samples, C_TXNRD2: The expression of TXNRD2 in carcinoma samples, C_GPX4: The expression of GPX4 in carcinoma samples, C_NDUFS4: The expression of NDUFS4 in carcinoma samples, B_PRDX5: The expression of PRDX5 in benign samples, B_TXNRD2: The expression of TXNRD2 in benign samples, B_GPX4: The expression of GPX4 in benign samples, B_NDUFS4: The expression of NDUFS4 in benign samples. Purple represents a positive correlation and green represents a negative correlation. b Expression correlation analysis of four biomarkers in serum samples. c The four proteins were divided into two groups, high expression and low expression, according to their median expression in CRPC samples (P < 0.001). TPC means the time from initial PCa diagnosis to the emergence of CRPC. d Cox proportional hazard regression model showed the influence of multivariate factors (age, TPSA and four biomarkers) on hazard ratio (HR). Stepwise multivariate Cox regression analysis with inclusion and exclusion criteria of type I error = 0.1. HR > 1 means a risk factor, HR = 1 means no influence and HR < 1 means a protective factor.

We then built some single-variable linear regression models with TPC values as an independent variable and expressions of each biomarker as dependent variables (Fig. 8a). The results indicated TPC values were negatively correlated with each biomarker. Furthermore, a combined multi-linear regression model was generated between TPC and quad biomarkers:

$$y = 0.232731\,\log_2 P + 0.261863\,\log_2 T - 2.135358\,\log_2 G + 0.006589\,\log_2 N + 14.29377$$
$$\mathrm{TPC}_{\mathrm{pred}} = 2^{y}$$

where P, T, G and N denote the protein expression levels of PRDX5, TXNRD2, GPX4 and NDUFS4, respectively. The R² value of our model was 0.800, which indicated that ~80% of the variance in TPC can be explained by the levels of the quad proteins in the patient’s serum.
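The combined model can be evaluated with a short sketch. The coefficients and intercept are those printed above; the operator in front of the GPX4 term is not legible in this copy of the equation and is assumed here to be a minus sign. The function name and the input values are hypothetical.

```python
from math import log2

# Coefficients as printed in the text. The operator before the GPX4 term is
# missing in this copy of the equation; a minus sign is ASSUMED here.
COEF = {"P": 0.232731, "T": 0.261863, "G": -2.135358, "N": 0.006589}
INTERCEPT = 14.29377


def predict_tpc(prdx5, txnrd2, gpx4, ndufs4):
    """Return TPCpred = 2**y for strictly positive serum levels."""
    levels = {"P": prdx5, "T": txnrd2, "G": gpx4, "N": ndufs4}
    if any(v <= 0 for v in levels.values()):
        raise ValueError("log2 requires strictly positive levels")
    y = INTERCEPT + sum(COEF[k] * log2(v) for k, v in levels.items())
    return 2 ** y


# Hypothetical inputs chosen so that every log2 term is easy to follow.
tpc_at_unit_levels = predict_tpc(1.0, 1.0, 1.0, 1.0)   # all log2 terms are 0
tpc_gpx4_doubled = predict_tpc(1.0, 1.0, 2.0, 1.0)     # only the GPX4 term changes
```

Because the model is linear in log2 units, doubling one biomarker multiplies the predicted TPC by 2 raised to that biomarker's coefficient. Under the assumed sign, doubling GPX4 shortens the predicted TPC by a factor of about 4.4.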

Fig. 8. Association of biomarkers with TPC.

a Correlation between PRDX5, TXNRD2, GPX4, or NDUFS4 and TPC. The R² value represents the quality of the correlation. b Quad biomarker levels and TPC were used to construct a multi-variable linear regression model. The y values were obtained and the predicted TPC (TPCpred) was calculated (TPCpred = 2^y). R² = 0.800. Comparisons between real TPC (TPCreal) and TPCpred were made using the abovementioned formula. Mean absolute error = 3.583. c Comparison between TPCreal and TPCpred using an independent set of patients. Mean absolute error = 1.973.

Subsequently, the predicted TPC (TPCpred) values in 27 CRPC samples were compared with their ground truth TPC (TPCreal) values (Fig. 8b and Table S3), with a mean absolute error of 3.583. These 27 samples were derived from the 46 CRPC samples by deleting 1 sample with a missing TPC value and averaging the expression levels of the quad biomarkers belonging to the same patient. The predictive power was further corroborated with an independent set of 11 CRPC patients (Table S4), with a mean absolute error of 1.973 (Fig. 8c). These results demonstrated the accuracy of our models and suggested that the quad proteins can be used as diagnostic indices to predict the time to emergence of CRPC.

In addition, the Ridge and least absolute shrinkage and selection operator (Lasso) algorithms were used to build regression models for TPC prediction, following a process similar to that used for the multi-linear regression model. As shown in Fig. S17, the R² values of the two models in the training set were 0.772 and 0.789, respectively. In the independent set, the Lasso model achieved a mean absolute error of 1.903, which was smaller than that of the multi-linear or Ridge model (Dataset S7).
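The difference between the three regression variants can be seen in the one-feature case, where each slope has a closed form. The sketch below is illustrative only: the penalty scaling in the docstring is one common convention and may differ from that of the software used in the study, and the data and function name are hypothetical.

```python
def ols_ridge_lasso_1d(x, y, lam):
    """Closed-form slopes for one centred feature and a centred response.

    OLS   minimises 0.5 * sum((y - b*x)**2)
    Ridge adds 0.5 * lam * b**2   -> b = Sxy / (Sxx + lam)
    Lasso adds lam * abs(b)       -> b = soft_threshold(Sxy, lam) / Sxx
    """
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    xc = [v - mx for v in x]
    yc = [v - my for v in y]
    sxx = sum(v * v for v in xc)
    sxy = sum(a * b for a, b in zip(xc, yc))
    if sxx == 0:
        raise ValueError("feature has zero variance")
    ols = sxy / sxx
    ridge = sxy / (sxx + lam)                 # shrinks toward zero, never reaches it
    shrunk = max(abs(sxy) - lam, 0.0)         # soft-thresholding step
    lasso = (shrunk if sxy >= 0 else -shrunk) / sxx
    return ols, ridge, lasso


# Hypothetical data: log2 biomarker level (x) against log2 TPC (y).
x_demo = [1.0, 2.0, 3.0, 4.0, 5.0]
y_demo = [4.1, 3.9, 3.2, 2.8, 2.0]
ols_b, ridge_b, lasso_b = ols_ridge_lasso_1d(x_demo, y_demo, lam=2.0)
_, _, lasso_zero = ols_ridge_lasso_1d(x_demo, y_demo, lam=100.0)
```

Ridge shrinks every coefficient smoothly, whereas Lasso can set a weak coefficient exactly to zero once the penalty is large enough, which is why Lasso is often described as performing variable selection.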

Discussion

Serum PSA [24–26] and tissue Gleason grading [27] serve as standard tools for diagnosing PCa. PSA, a commonly employed pre-diagnostic marker, has a notable limitation with a high false-positive rate [28], as it can be elevated in conditions such as PCa, BPH, or prostatitis. Although tissue biopsy [29] and Gleason grading contribute to accurate PCa diagnosis, the invasiveness of these tests precludes their routine use as screening tools for monitoring disease progression. Furthermore, their ability to predict individual patient outcomes is hindered by the impact of inter-observer variability. Various tissue-based biomarkers, such as ConfirmMDx, Oncotype Dx, Prolaris, Decipher and ProMark [30–32], as well as blood- or urine-based markers, such as PHI, 4Kscore, SelectMDx and MiPS [33–35], have been developed. However, specific diagnostic tools for CRPC are lacking [36, 37]. In this study, we identified four proteins significantly elevated in PCa, especially in CRPC but not in BPH. These proteins can serve as valuable biomarkers in conjunction with PSA, potentially reducing unnecessary prostate biopsies. PSA plays a pivotal role as a diagnostic marker, with its consecutive rise after ADT indicating the emergence of CRPC [38]. However, PSA lacks predictive capabilities concerning TPC or the survival of CRPC patients. Our quad-biomarker panel (including GPX4, NDUFS4, PRDX5 and TXNRD2) addresses these limitations and holds promise for providing more accurate prognostic information throughout the continuum of PCa progression, potentially leading to improved patient outcomes.

It has long been proposed that inflammation is associated with prostate proliferative inflammatory atrophy and may drive prostate carcinogenesis via oxidative stress [39]. However, we uncovered quad proteins—PRDX5, TXNRD2, NDUFS4 and GPX4—in AR inhibitor-resistant DTP cells, and a heightened antioxidative state induced by AR inhibitor resistance may differ from that induced by inflammation.

The quad biomarkers are localised in the mitochondria or cytoplasm. Accordingly, it was of interest to determine how these proteins were released into serum and whether they were released by blood cells or tumour cells. Regardless, bloodborne proteins have advantages over tissue-based ones deployed as biomarkers in clinical practice. With regard to GPX4, three previously published articles [40–42] have reported that GPX4 can be detected in human serum via ELISA. Therefore, GPX4 is considered a secreted protein with high confidence. As for PRDX5, it has been reported [43] that it is a secreted protein and is highly expressed in human serum. Furthermore, although the secretory property of TXNRD2 has not been revealed, TXNRD1 has been reported as a secreted protein in two studies [44, 45]. It is also known that TXNRD1 and TXNRD2 belong to the same family with a sequence identity of 92% [they belong to the class-I pyridine nucleotide-disulphide oxidoreductase family, and are members of the thioredoxin (Trx) system]. In addition, they are all selenoproteins [46]. Therefore, it is reasonable to speculate that TXNRD2 is also a secreted protein. In addition, by SecretomeP 2.0 (https://services.healthtech.dtu.dk/services/SecretomeP-2.0/) [47] analysis, the protein sequence of NDUFS4 had an NN-score of 0.870, which was much larger than the recommended threshold of 0.6 for mammalian secretory sequences. Therefore, NDUFS4 was predicted to be a secreted protein. The estimated scores of the quad proteins by SecretomeP 2.0 are listed in Table S7.

Our analysis showed that the quad biomarkers were independent of age, PSA and Gleason score in PCa. This was not surprising given their correlation with TPC: CRPC was not age-dependent and was not associated with the PSA level or Gleason score at the initial PCa diagnosis.

ML algorithms are powerful statistical methods that can handle complex, non-linear associations between input features and target variables. With fine-tuned hyper-parameters, a prediction model can learn sophisticated mappings from clinical data that can then be applied in clinical diagnosis and prediction. Among the five models, RF and XGBoost obtained the best AUC values in the tissue and serum cohorts, respectively. Compared with widely used algorithms including K-Nearest Neighbours, Decision Trees, Naive Bayesian and Support Vector Machine, RF is an ensemble algorithm that constructs a statistical model from hundreds of decision trees, each fitted to part of the original dataset by a bootstrap sampling strategy, thereby capturing the distributions of the dataset features and promoting the generalisability of the built models [22, 23]. The RF algorithm has been successfully applied in proteomics and biomarker mining [48, 49]. XGBoost combines a forward stepwise algorithm with an additive model, using a series of base learners to minimise the residual error by gradient boosting; owing to its powerful ability to fit sophisticated problems, it has been used in several clinical prediction fields [50–52].

In this study, with the quad biomarkers as input features, the best RF model obtained AUC values of 0.958 and 0.904 on the validation set of our tissue cohort and the outer testing set, respectively; the best XGBoost model obtained AUC values of 0.988 and 0.917 on the validation and independent sets of the serum cohort, respectively. In addition, we found that the performance metrics of models built on the quad biomarkers were generally better than those built on single biomarkers, with an improvement of ≥10%. The improvement arose largely because the combined expression information of the quad proteins yielded better generalisability. Thus, these results indicated that the best models are promising tools for classifying tissue and serum samples into different PCa development stages.

For complex ML models such as XGBoost, the mechanism through which the model makes predictions is opaque to model users, which is an obstacle to the adoption of these models by clinicians in clinical practice and prognosis. As an advantage of our study, the SHAP framework was used as a post hoc explainable method to explain the prediction models. The SHAP framework provided feature importance and dependence scatter plots for global explanation, as well as the prediction process on individual samples for local explanation, which would facilitate the use of our clinical models derived from the quad biomarkers.

As intrinsic explainable methods, the models from multi-linear, Ridge and Lasso regression algorithms were used to predict the TPC among patients. The fit parameters corresponding to each biomarker indicated their impact on final decisions. Predicted errors of our models were measured by mean absolute error (MAE) and mean squared error (MSE). As per our understanding, compared with MSE, MAE is a better quantitative index to measure the generalisability of our linear model. The acquired R2, MAE and MSE values on training and independent sets are shown in Dataset S7. In particular, these results indicated that the model could accurately estimate the TPC among patients based on serum expressions of the quad biomarkers, further validating the ability of quad biomarkers to make clinical and prognostic predictions.
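The two error measures can be compared with a minimal sketch. One common reason to prefer MAE, offered here as a general observation rather than the authors' stated rationale, is that it stays in the same units as TPC and is less dominated by a single large error than MSE. All values below are hypothetical.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between paired values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mean_squared_error(actual, predicted):
    """Average squared difference between paired values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical TPC values, not study data.
tpc_real = [10.0, 14.0, 20.0, 8.0]
tpc_pred_close = [11.0, 13.0, 22.0, 9.0]      # errors: 1, 1, 2, 1
tpc_pred_outlier = [11.0, 13.0, 22.0, 18.0]   # errors: 1, 1, 2, 10

mae_close = mean_absolute_error(tpc_real, tpc_pred_close)        # 1.25
mse_close = mean_squared_error(tpc_real, tpc_pred_close)         # 1.75
mae_outlier = mean_absolute_error(tpc_real, tpc_pred_outlier)    # 3.5
mse_outlier = mean_squared_error(tpc_real, tpc_pred_outlier)     # 26.5
```

A single prediction that is off by 10 raises the MAE by a factor of about 3 but the MSE by a factor of about 15, so MSE mostly reflects the worst prediction in a small set.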

The current study has several limitations that should be addressed in further studies. First, only the upregulated proteins (in Cluster VII) in DTP cells were studied. However, the downregulated proteins (Cluster VIII) may also be of great research value, and further exploration should be conducted to validate their functionality. Second, multi-centre cohorts including heterogeneous populations would be better suited to validating the functionality of the quad biomarkers. However, given that this was an exploratory study, we presented cellular, clinical and statistical results to validate their prognostic and diagnostic abilities, which provide a concrete basis for further assessment studies in the future. Third, building an ML model with generalisation power requires “big data”, but no standard criterion has been proposed for computing an adequate sample size for model development. Nevertheless, a well-designed cross-validation process together with confirmation on validation and independent sets provides adequate evidence to test the generalisation power of the models. Finally, data imbalance occurred in the serum cohort, which contained a smaller number of CRPC samples; this raises concerns of bias in prediction. This is a common phenomenon, as the number of CRPC patients was far lower than that of other PCa patients. Thus, an independent set of another 11 CRPC patients was used to validate the XGBoost model. The final AUC value of 0.917 showed good generalisability in the prediction of CRPC patients.

In this exploratory study, we have shown clear correlations between the expression patterns of the quad proteins and CRPC progression among clinical patients. To further validate their functional correlations, a series of well-designed fundamental studies is needed, covering the following aspects. First, the specific molecular mechanisms and vital functional pathways in which the quad proteins participate, and which in turn influence the progression or metastasis of CRPC, should be explored. Second, innovative small molecules or drugs that can be used in targeted therapy to slow the progression or metastasis of CRPC should be screened, and their pharmacokinetics and molecular mechanisms should be elucidated in vivo. Third, the possible functional relationships between the quad proteins and gut microbiota, and how their interactions influence the initiation, development, metastasis, and therapeutic efficacy of CRPC, should be investigated.

Conclusion

In conclusion, to the best of our knowledge, this was the first exploratory study to investigate and validate the role of PRDX5, TXNRD2, GPX4 and NDUFS4 in clinical CRPC prognosis combining cell proteomics, bioinformatics analysis, clinical tissue and serum sample analysis, and explainable ML techniques. From experimental and statistical viewpoints, the quad proteins represented good clinical diagnosis potential for classifying between different stages of PCa and predicting TPC. Thus, they may potentially serve not only as biomarkers but also as therapeutic targets in clinical practice.

Supplementary information

Supplementary Dataset 1 (87.2KB, xlsx)
Supplementary Dataset 3 (957.8KB, pdf)
Supplementary Dataset 4 (952.9KB, pdf)
Supplementary Dataset 5 (12.1KB, xlsx)
Supplementary Dataset 6 (10.5KB, xlsx)
Supplementary Dataset 7 (10.6KB, xlsx)

Author contributions

RW, SPW, NHF, YQC and LWD conceptualised the study, designed and performed experiments, developed and optimised methodology, curated data, analysed data and visualised data. YYM, TYH, JW and JN developed and optimised methodology, designed and performed experiments, curated data and visualised data. JY, MML and XBR developed and optimised methodology, analysed data and visualised data. SYF, QYS, SYT and HPK developed and optimised methodology. RW and SPW drafted the manuscript. SYT, HPK, RW, SPW, YQC and JW edited the manuscript. YQC, LWD, RW, MLL and NHF provided financial support. All authors read and approved the final manuscript.

Funding

This study was supported by the National Natural Science Foundation of China (No. 82403686 & 82370777 & 82302654 & 31771539); by the Basic Research Program of Jiangsu (No. BK20241620 & BK20230188); by the Singapore Ministry of Health’s National Medical Research Council (NMRC) Open Fund—Individual Research Grant (OF-IRG; to L-W Ding; MOH-OFIRG21nov-0007); by the China Postdoctoral Science Foundation (No. 2024M751160); by the Jiangsu Funding Program for Excellent Postdoctoral Talent (No. 2024ZB069); and by Medical Research Program of Affiliated Hospital of Jiangnan University (No. YJY202306).

Data availability

All data supporting the findings of this study are available with the article, or from the corresponding author upon reasonable request. Data are also available via ProteomeXchange with the identifier PXD032983.

Competing interests

The authors declare no competing interests.

Ethics approval and consent to participate

Patient tissue and peripheral blood samples were collected from the Affiliated Hospital of Jiangnan University. Informed consent was obtained from all participants included in this study according to ethical committee regulations (the Affiliated Hospital of Jiangnan University, approval document number: LS202128).

Consent for publication

All authors have agreed to publish this manuscript.

Footnotes

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

These authors contributed equally: Rong Wang, Shaopeng Wang.

Contributor Information

Lingwen Ding, Email: patdl@nus.edu.sg.

Yong Q. Chen, Email: yqchen@jiangnan.edu.cn

Ninghan Feng, Email: n.feng@njmu.edu.cn.

Supplementary information

The online version contains supplementary material available at 10.1038/s41416-025-02947-0.

References

  • 1.Huggins C, Hodeges CV. Studies on prostatic cancer. I. The effect of castration, of estrogen and androgen injection on serum phosphatases in metastatic carcinoma of the prostate. Cancer Res. 1941;1:293–7. [DOI] [PubMed] [Google Scholar]
  • 2.Cheng Q, Butler W, Zhou Y, Zhang H, Tang L, Perkinson K, et al. Pre-existing castration-resistant prostate cancer-like cells in primary prostate cancer promote resistance to hormonal therapy. Eur Urol. 2022;81:446–55. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 3.Pienta KJ, Bradley D. Mechanisms underlying the development of androgen-independent prostate cancer. Clin Cancer Res. 2006;12:1665–71. [DOI] [PubMed] [Google Scholar]
  • 4.Wu MJ, Chen CJ, Lin TY, Liu YY, Tseng LL, Cheng ML, et al. Targeting KDM4B that coactivates c-Myc-regulated metabolism to suppress tumor growth in castration-resistant prostate cancer. Theranostics. 2021;11:7779–96. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 5.Chowdhury S, Bjartell A, Agarwal N, Chung BH, Given RW, Pereira de Santana Gomes AJ, et al. Deep, rapid, and durable prostate-specific antigen decline with apalutamide plus androgen deprivation therapy is associated with longer survival and improved clinical outcomes in TITAN patients with metastatic castration-sensitive prostate cancer. Ann Oncol. 2023;34:477–85. [DOI] [PubMed] [Google Scholar]
  • 6.Ploussard G, Rozet F, Roubaud G, Stanbury T, Sargos P, Roupret M. Chromogranin A: a useful biomarker in castration-resistant prostate cancer. World J Urol. 2023;41:361–9. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 7.Sharma SV, Lee DY, Li B, Quinlan MP, Takahashi F, Maheswaran S, et al. A chromatin-mediated reversible drug-tolerant state in cancer cell subpopulations. Cell. 2010;141:69–80. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 8.Haven B, Heilig E, Donham C, Settles M, Vasilevsky N, Owen K. Registered report: a chromatin-mediated reversible drug-tolerant state in cancer cell subpopulations. eLife. 2016;5:69–80. [DOI] [PMC free article] [PubMed]
  • 9.Boumahdi S, de Sauvage FJ. The great escape: tumour cell plasticity in resistance to targeted therapy. Nat Rev Drug Discov. 2020;19:39–56. [DOI] [PubMed] [Google Scholar]
  • 10.Wang R, Mi Y, Ni J, Wang Y, Ding L, Ran X, et al. Identification of PRDX5 as a target for the treatment of castration-resistant prostate cancer. Adv Sci. 2023. 10.1002/advs.202304939. [DOI] [PMC free article] [PubMed]
  • 11.Fei X, Du X, Wang J, Liu J, Gong Y, Zhao Z, et al. Precise diagnosis and risk stratification of prostate cancer by comprehensive serum metabolic fingerprints: a prediction model study. Int J Surg. 2024. 10.1097/js9.0000000000001033. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 12.Kour B, Shukla N, Bhargava H, Sharma D, Sharma A, Singh A, et al. Identification of plausible candidates in prostate cancer using integrated machine learning approaches. Curr Genomics. 2023;24:287–306. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 13.Gabriele C, Aracri F, Prestagiacomo LE, Rota MA, Alba S, Tradigo G, et al. Development of a predictive model to distinguish prostate cancer from benign prostatic hyperplasia by integrating serum glycoproteomics and clinical variables. Clin Proteom. 2023;20:52. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 14.Badmos S, Noriega-Landa E, Holbrook KL, Quaye GE, Su X, Gao Q, et al. Urinary volatile organic compounds in prostate cancer biopsy pathologic risk stratification using logistic regression and multivariate analysis models. Am J Cancer Res. 2024;14:192–209. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 15.Lundberg S, Lee S-I. A unified approach to interpreting model predictions. In: 31st Conference on Neural Information Processing Systems (NIPS 2017). Long Beach, CA, USA; 2017. 10.48550/arXiv1705.07874.
  • 16.Lundberg SM, Nair B, Vavilala MS, Horibe M, Eisses MJ, Adams T, et al. Explainable machine-learning predictions for the prevention of hypoxaemia during surgery. Nat Biomed Eng. 2018;2:749–60.
  • 17.Ellwood-Yen K, Graeber TG, Wongvipat J, Iruela-Arispe ML, Zhang J, Matusik R, et al. Myc-driven murine prostate cancer shares molecular features with human prostate tumors. Cancer Cell. 2003;4:223–38.
  • 18.Wang R, Wen P, Yang G, Feng Y, Mi Y, Wang X, et al. N-glycosylation of GDF15 abolishes its inhibitory effect on EGFR in AR inhibitor-resistant prostate cancer cells. Cell Death Dis. 2022;13:626.
  • 19.Wang R, Mi Y, Ni J, Wang Y, Ding L, Ran X, et al. Identification of PRDX5 as a target for the treatment of castration-resistant prostate cancer. Adv Sci. 2024;11:e2304939.
  • 20.Grasso CS, Wu YM, Robinson DR, Cao X, Dhanasekaran SM, Khan AP, et al. The mutational landscape of lethal castration-resistant prostate cancer. Nature. 2012;487:239–43.
  • 21.Liberzon A, Birger C, Thorvaldsdóttir H, Ghandi M, Mesirov JP, Tamayo P, et al. The Molecular Signatures Database (MSigDB) hallmark gene set collection. Cell Syst. 2015;1:417–25.
  • 22.Breiman L. Random forests. Mach Learn. 2001;45:5–32.
  • 23.Ho TK. Random decision forests. In: Proceedings of 3rd International Conference on Document Analysis and Recognition. Montreal, QC, Canada: IEEE; 1995, p. 278–82. 10.1109/ICDAR.1995.598994.
  • 24.Kattan MW, Eastham JA, Stapleton AM, Wheeler TM, Scardino PT. A preoperative nomogram for disease recurrence following radical prostatectomy for prostate cancer. J Natl Cancer Inst. 1998;90:766–71.
  • 25.Stephenson AJ, Scardino PT, Eastham JA, Bianco FJ Jr, Dotan ZA, Fearn PA, et al. Preoperative nomogram predicting the 10-year probability of prostate cancer recurrence after radical prostatectomy. J Natl Cancer Inst. 2006;98:715–7.
  • 26.Cooperberg MR, Broering JM, Carroll PR. Risk assessment for prostate cancer metastasis and mortality at the time of diagnosis. J Natl Cancer Inst. 2009;101:878–87.
  • 27.Gleason DF. Classification of prostatic carcinomas. Cancer Chemother Rep. 1966;50:125–8.
  • 28.Draisma G, Etzioni R, Tsodikov A, Mariotto A, Wever E, Gulati R, et al. Lead time and overdiagnosis in prostate-specific antigen screening: importance of methods and context. J Natl Cancer Inst. 2009;101:374–83.
  • 29.Rebello RJ, Oing C, Knudsen KE, Loeb S, Johnson DC, Reiter RE, et al. Prostate cancer. Nat Rev Dis Prim. 2021;7:9.
  • 30.Eggener SE, Rumble RB, Armstrong AJ, Morgan TM, Crispino T, Cornford P, et al. Molecular biomarkers in localized prostate cancer: ASCO guideline. J Clin Oncol. 2020;38:1474–94.
  • 31.Lokeshwar SD, Syed JS, Segal D, Rahman SN, Sprenkle PC. Optimal use of tumor-based molecular assays for localized prostate cancer. Curr Oncol Rep. 2022;24:249–56.
  • 32.Olleik G, Kassouf W, Aprikian A, Hu J, Vanhuyse M, Cury F, et al. Evaluation of new tests and interventions for prostate cancer management: a systematic review. J Natl Compr Cancer Netw. 2018;16:1340–51.
  • 33.Bronimann S, Pradere B, Karakiewicz P, Abufaraj M, Briganti A, Shariat SF. An overview of current and emerging diagnostic, staging and prognostic markers for prostate cancer. Expert Rev Mol Diagn. 2020;20:841–50.
  • 34.Visser WCH, de Jong H, Melchers WJG, Mulders PFA, Schalken JA. Commercialized blood-, urinary- and tissue-based biomarker tests for prostate cancer diagnosis and prognosis. Cancers. 2020;12:3790.
  • 35.Dong M, Lih TM, Chen SY, Cho KC, Eguez RV, Hoti N, et al. Urinary glycoproteins associated with aggressive prostate cancer. Theranostics. 2020;10:11892–907.
  • 36.Zhou L, Song Z, Hu J, Liu L, Hou Y, Zhang X, et al. ACSS3 represses prostate cancer progression through downregulating lipid droplet-associated protein PLIN3. Theranostics. 2021;11:841–60.
  • 37.Han Q, Xie QR, Li F, Cheng Y, Wu T, Zhang Y, et al. Targeted inhibition of SIRT6 via engineered exosomes impairs tumorigenesis and metastasis in prostate cancer. Theranostics. 2021;11:6526–41.
  • 38.Llop E, Ferrer-Batalle M, Barrabes S, Guerrero PE, Ramirez M, Saldova R, et al. Improvement of prostate cancer diagnosis by detecting PSA glycosylation-specific changes. Theranostics. 2016;6:1190–204.
  • 39.Sfanos KS, Yegnasubramanian S, Nelson WG, De Marzo AM. The inflammatory microenvironment and microbiome in prostate cancer development. Nat Rev Urol. 2018;15:11–24.
  • 40.Su X, Wang Z, Li J, Gao S, Fan Y, Wang K. Hypermethylation of the glutathione peroxidase 4 gene promoter is associated with the occurrence of immune tolerance phase in chronic hepatitis B. Virol J. 2024;21:72.
  • 41.Liu J, An W, Zhao Q, Liu Z, Jiang Y, Li H, et al. Hyperbaric oxygen enhances X-ray induced ferroptosis in oral squamous cell carcinoma cells. Oral Dis. 2024;30:116–27.
  • 42.Abbas SF, Abdulkadim H, Hadi NR. Assessing the cardioprotective effect of necrosulfonamide in doxorubicin-induced cardiotoxicity in mice. J Med Life. 2023;16:1468–73.
  • 43.Uzawa A, Kawaguchi N, Kanai T, Himuro K, Oda F, Kuwabara S. Increased serum peroxiredoxin 5 levels in myasthenia gravis. J Neuroimmunol. 2015;287:16–18.
  • 44.Mahmoudian E, Khalilnezhad A, Gharagozli K, Amani D. Thioredoxin-1, redox factor-1 and thioredoxin-interacting protein, mRNAs are differentially expressed in multiple sclerosis patients exposed and non-exposed to interferon and immunosuppressive treatments. Gene. 2017;634:29–36.
  • 45.Lin W, Tang Y, Zhang M, Liang B, Wang M, Zha L, et al. Integrated bioinformatic analysis reveals TXNRD1 as a novel biomarker and potential therapeutic target in idiopathic pulmonary arterial hypertension. Front Med. 2022;9:894584.
  • 46.Meplan C, Rohrmann S, Steinbrecher A, Schomburg L, Jansen E, Linseisen J, et al. Polymorphisms in thioredoxin reductase and selenoprotein K genes and selenium status modulate risk of prostate cancer. PLoS ONE. 2012;7:e48709.
  • 47.Bendtsen JD, Jensen LJ, Blom N, Von Heijne G, Brunak S. Feature-based prediction of non-classical and leaderless protein secretion. Protein Eng Des Sel. 2004;17:349–56.
  • 48.Gunther EC, Stone DJ, Gerwien RW, Bento P, Heyes MP. Prediction of clinical drug efficacy by classification of drug-induced genomic expression profiles in vitro. Proc Natl Acad Sci USA. 2003;100:9608–13.
  • 49.Tong W, Xie Q, Hong H, Shi L, Fang H, Perkins R, et al. Using decision forest to classify prostate cancer samples on the basis of SELDI-TOF MS data: assessing chance correlation and prediction confidence. Environ Health Perspect. 2004;112:1622–7.
  • 50.Ferrigno I, Verzellesi L, Ottone M, Bonacini M, Rossi A, Besutti G, et al. CCL18, CHI3L1, ANG2, IL-6 systemic levels are associated with the extent of lung damage and radiomic features in SARS-CoV-2 infection. Inflammation Res. 2024. 10.1007/s00011-024-01852-1.
  • 51.Wu P, Zhang C, Tang X, Li D, Zhang G, Zi X, et al. Pan-cancer characterization of cell-free immune-related miRNA identified as a robust biomarker for cancer diagnosis. Mol Cancer. 2024;23:31.
  • 52.Zhang Y, Ma Y, Wang J, Guan Q, Yu B. Construction and validation of a clinical prediction model for deep vein thrombosis in patients with digestive system tumors based on a machine learning. Am J Cancer Res. 2024;14:155–68.

Associated Data

Supplementary Materials

Supplementary Dataset 1 (87.2KB, xlsx)
Supplementary Dataset 3 (957.8KB, pdf)
Supplementary Dataset 4 (952.9KB, pdf)
Supplementary Dataset 5 (12.1KB, xlsx)
Supplementary Dataset 6 (10.5KB, xlsx)
Supplementary Dataset 7 (10.6KB, xlsx)

Data Availability Statement

All data supporting the findings of this study are available with the article, or from the corresponding author upon reasonable request. Data are also available via ProteomeXchange with the identifier PXD032983.


Articles from British Journal of Cancer are provided here courtesy of Cancer Research UK