Communications Medicine. 2025 Mar 4;5:61. doi: 10.1038/s43856-025-00783-0

5-hydroxymethylcytosine sequencing of plasma cell-free DNA identifies epigenomic features in prostate cancer patients receiving androgen deprivation therapies

Qianxia Li 1,2,#, Chiang-Ching Huang 3,#, Shane Huang 4, Yijun Tian 1, Jinyong Huang 1, Amirreza Bitaraf 1, Xiaowei Dong 3, Marja T Nevalainen 5, Manishkumar Patel 1, Jodie Wong 1, Jingsong Zhang 6, Brandon J Manley 6, Jong Y Park 7, Manish Kohli 8, Elizabeth M Gore 9, Deepak Kilari 10, Liang Wang 1
PMCID: PMC11880319  PMID: 40038525

Abstract

Background

We evaluated whether 5hmC signatures in cell-free DNA (cfDNA) are associated with treatment failure to androgen-deprivation therapies (ADT) among men with hormone-naive prostate cancer.

Methods

We collected a total of 139 serial plasma samples from 55 prostate cancer patients receiving ADT at 3 time points: baseline (before initiating ADT, n = 55); 3 months after initiating ADT (n = 55); and either disease progression within 24 months (n = 14) or 24 months if no progression was detected (n = 15). We used selective chemical labeling sequencing to quantify 5hmC abundance across the genome and Kaplan–Meier analysis to assess survival association.

Results

Here we show a significant 5hmC difference in 1642 of 23433 genes between patients with and without progression (false discovery rate [FDR] < 0.1) in baseline plasma samples. Patients with progression demonstrate significant 5hmC enrichments in multiple hallmark gene sets, with androgen responses as the top enriched gene-set (FDR = 1.19E−13). We further show a significant association between high activity scores in these gene sets and poor progression-free survival (P < 0.05), even after adjusting for circulating tumor DNA fraction and prostate-specific antigen values. Additionally, our longitudinal analysis shows that the high activity score is significantly reduced after 3 months of initiating ADT (P = 0.0004) but returns to higher levels when the disease progresses (P = 0.0317).

Conclusions

5hmC-based activity scores from gene-sets involved in AR responses show great potential in assessing treatment resistance, monitoring disease progression, and identifying patients who would benefit from upfront treatment intensification. However, further studies are needed to validate these findings.

Subject terms: Predictive markers, Tumour biomarkers, Prostate

Plain Language Summary

Cancer cells release a molecule called DNA into the blood. We investigated whether the structure of the DNA that is released could predict whether people with advanced prostate cancer would respond well to androgen deprivation therapy (ADT), a type of treatment that reduces levels of the hormone testosterone. We collected blood samples from people before, during, and after treatment. We found differences in the structure of the DNA released into the blood between people who responded in different ways to treatment. Evaluating the structure of the DNA released prior to and during treatment could be used to determine the best treatment to use and whether people with prostate cancer are responding well to treatment.


Li, Huang, et al. examine 5-Hydroxymethylcytosine (5hmC) profiles in cell-free DNA from plasma taken from hormone-naïve prostate cancer patients. High 5hmC signals in the androgen-related pathways are associated with poor progression-free survival in the patients receiving androgen deprivation therapy (ADT) and its combination therapies.

Background

Prostate cancer is the most common cancer among men and the second leading cause of cancer-related deaths1. Prostate cancer cells rely on circulating androgens to activate endogenous androgen receptor (AR)2. Suppression of testicular androgens by castration (medical or surgical) is the mainstay of treatment for metastatic disease3. Androgen deprivation therapies (ADT) achieve remission among 80% to 90% of men with advanced prostate cancer, as measured by a decrease in serum prostate-specific antigen (PSA), with an average progression-free interval of 12 to 33 months4. Despite the efficacy of ADT, progression from a hormone-sensitive state to a castration-resistant state is inevitable5. To address this issue, the therapeutic strategy has evolved to adding other systemic agents. The combination of ADT with docetaxel or with new androgen receptor signaling inhibitors (ARSI) has shown substantial benefits6. Because docetaxel and ARSI have different mechanisms of action on androgen signaling and prostate cancer cells, the combination may enhance the treatment effect7–9. However, patients with similar clinical pathological factors may respond differently, suggesting phenotypic heterogeneity and the potential role of genetic background in the treatment response. To offer more effective treatment, it will be essential to understand which patient group would truly benefit from this therapy. The development of biomarkers will facilitate the selection of the most appropriate treatment.

Currently, tissue biopsies have demonstrated a limited role outside of histologic diagnosis for treatment decisions and are impractical to perform routinely in clinical practice because of the bone-predominant metastasis of prostate cancer. To address this challenge, recent research focus has turned to the development of minimally invasive biomarkers from bodily fluids10. The blood serological biomarker PSA is commonly used in screening for and diagnosing prostate cancer, as well as for monitoring treatment response11. However, PSA does not predict systemic treatment response. In the search for more predictive biomarkers, cell-free DNA (cfDNA) circulating in blood has attracted much attention. The minimally invasive detection of somatic variations using blood samples offers substantial advantages over tissue biopsy because it can detect the entire genetic makeup from tumor tissues. The easy accessibility of blood makes it an ideal sample source for real-time and dynamic monitoring of the treatment response and disease progression12.

DNA methylation at specific genomic regions is characteristic of human cancers and has been used as a specific biomarker for cancer detection and clinical outcome prediction13. In addition to commonly reported 5-methylcytosines in the genome, 5-hydroxymethylcytosines (5hmC) are also abundant. Recent genome-wide sequencing maps of 5hmC in various mammalian cells and tissues support its regulatory role for gene expression14. The 5hmC is enriched at transcriptionally active regions (such as gene bodies) and may represent dynamically activated transcription rather than constitutively expressed housekeeping genes15. Therefore, 5hmC has emerged as a new class of cancer epigenomic biomarkers, with a recent study showing cfDNA 5hmC as a prognostic factor in metastatic prostate cancer16.

So far, however, little is known about the potency and reliability of the cell-free 5hmC as a predictive biomarker of treatment response for metastatic hormone-sensitive prostate cancer under standard-of-care ADT-related combination therapies. To address this question, we employed the 5hmC-Seal17, a highly sensitive and selective chemical labeling-based sequencing technology, to profile the spectrum of 5hmC among the hormone-sensitive patients with 3 serial time points of blood collection from before ADT to 3 months after ADT and at the time of progression. Our results show that 5hmC-based epigenomic features are associated with treatment response and classify early resistance to ADT-related combination therapies.

Patients and methods

Patient cohort

A total of 55 prostate cancer patients were enrolled in the prospective cohort study. Among those, 30 patients were recruited from the Medical College of Wisconsin, and an additional 25 were recruited from the Veterans Affairs Medical Center in Milwaukee, Wisconsin. This study was approved by the Institutional Review Boards of the Veterans Affairs Medical Center, the Medical College of Wisconsin (PRO000023842), and the Moffitt Cancer Center (MCC20351). Written informed consent was obtained from all participants before enrollment. All patients were diagnosed with advanced prostate cancer and met the NCCN treatment guidelines for ADT-based treatments. Blood was collected before initiating ADT (baseline), at 3 months after initiating ADT, and at 24 months after ADT or at disease progression within the 2-year follow-up. Progression was defined as either radiological (defined by the criteria of the Prostate Cancer Working Group 2)18 or clinical (defined as worsening disease-related symptoms necessitating a change in anti-cancer therapy and/or deterioration in ECOG performance status ≥2 levels)19. The study design and workflow are shown in Fig. 1.

Fig. 1. Study design and blood collection.

A total of 55 patients who met the NCCN guidelines for ADT were enrolled. Blood was collected at baseline (n = 55), 3-month (n = 55), disease progression (n = 14), and 24-month if no progression (n = 15).

Blood processing

Plasma was prepared within 2 h after blood draw using EDTA tubes. Collected blood samples were first centrifuged at 1000 g for 10 min at room temperature to separate plasma. The platelet-rich plasma was immediately centrifuged again at 5000 g for an additional 10 min to collect platelet-poor plasma, which was then stored at −80 °C. cfDNA was extracted from 1 ml of platelet-poor plasma using a QIAamp DNA Blood Mini Kit (Qiagen). If 1 ml of plasma did not yield 5 ng of cfDNA, an additional 1 ml of plasma was used for the cfDNA extraction. The final DNA eluent (50 μl) was quantified by a Qubit 2.0 Fluorometer (Life Technology) and stored at −80 °C until used. Additionally, cfDNA from plasma samples and gDNA from peripheral blood mononuclear cells (PBMCs) were collected from 8 healthy subjects.

Spike-in controls for 5hmC enrichment

The 5hmC-Seal method has been previously published17,20. In brief, the spike-in controls were generated by polymerase chain reaction (PCR) amplification of lambda DNA using a cocktail of dATP/dGTP/dTTP and one of the following: dCTP, dmCTP, or 10% dhmCTP (Zymo)/90% dCTP. Primer sequences are as follows: dCTP FW-5′-CGTTTCCGTTCTTCTTCGTC-3′, RV-5′-TACTCGCACCGAAAATGTCA-3′; dmCTP FW-5′-GTGGCGGGTTATGATGAACT-3′, RV-5′-CATAAAATGCGGGGATTCAC-3′; 10% dhmCTP/90% dCTP FW-5′-TGAAAACGAAAGGGGATACG-3′, RV-5′-GTCCAGCTGGGAGTCGATAC-3′. The spike-in probes were used to monitor the robustness and sensitivity of the 5hmC-Seal. In total, 0.2 million copies of 5hmC-containing and 5hmC-free spike-ins were first mixed and then added to the cfDNA samples before library preparation. The input control samples omitted the 5hmC pull-down step to generate non-5hmC enrichment libraries. After sequencing, spike-in reads were called, and enrichment ratios were calculated.
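The text does not spell out how the enrichment ratios were computed. One plausible reading, sketched below purely as an assumption, is a fold enrichment: the ratio of 5hmC spike-in reads to 5hmC-free spike-in reads in the pull-down library, divided by the same ratio in the input (no pull-down) library. The function name and the read counts are hypothetical.

```python
def fold_enrichment(pulldown_5hmc: int, pulldown_ctrl: int,
                    input_5hmc: int, input_ctrl: int) -> float:
    """Fold enrichment of the 5hmC spike-in after pull-down.

    Assumed definition (not taken from the paper):
        (5hmC reads / control reads in pull-down library)
      / (5hmC reads / control reads in input library)
    """
    counts = (pulldown_5hmc, pulldown_ctrl, input_5hmc, input_ctrl)
    if min(counts) <= 0:
        # A zero count makes the ratio undefined; a real pipeline
        # might add a pseudocount instead of failing.
        raise ValueError("all spike-in read counts must be positive")
    return (pulldown_5hmc / pulldown_ctrl) / (input_5hmc / input_ctrl)


# Hypothetical read counts for one sample
ratio = fold_enrichment(pulldown_5hmc=5000, pulldown_ctrl=20,
                        input_5hmc=100, input_ctrl=50)
print(f"fold enrichment: {ratio:.1f}")
```

Normalizing against the input library guards against differences in how many copies of each spike-in actually entered the reaction.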

5hmC library construction and high-throughput sequencing

5hmC libraries were constructed as described previously20. Briefly, the 5–10 ng cfDNA was ligated with sequencing adaptors. The ligated DNA was incubated in a 25 μl reaction solution containing HEPES buffer (50 mM, pH 8.0); MgCl2 (25 mM), N3-UDP-Glc (100 mM, Active Motif); and 12.5 U T4 phage β-glucosyltransferase (βGT, Thermo Fisher Scientific) for 1 h at 37 °C. Then, 2.5 µl of DBCO-PEG4-biotin (20 mM stock in dimethyl sulfoxide, Click Chemistry Tools) was added to the reaction mixture and incubated for 2 h at 37 °C. Subsequently, the DNA Clean & Concentrator™-5 (ZYMO Research) was used to purify the DNA. Thereafter, the purified DNA was incubated with C1 streptavidin beads (5 µl, Life Technologies) for 15 min at room temperature. The beads subsequently underwent 8 five-minute washes with buffer (5 mM Tris pH 7.5, 0.5 mM EDTA, 1 M NaCl, and 0.1% Tween 20). All binding and washing were done at room temperature with gentle rotation. Beads were then resuspended in water and amplified with 12 to 15 (cfDNA) or 9 (whole blood genomic DNA) cycles of PCR amplification using a KAPA Library Amplification Kit (Kapa Biosystems). A separate non-enrichment control library from cfDNA was also made by direct PCR amplification from ligated DNA without labeling and capture. The amplified products were purified using AMPure XP beads and used for a high-throughput 75-cycle single-end sequencing on the Illumina NextSeq 500 platform.

Data processing and normalization

Pre-alignment quality control was performed for the raw sequenced reads using fastp, v0.20.121 with the default settings. The raw 5hmC-Seal data were first cleaned for adapter sequences, followed by aligning to the human genome (hg19) using Bowtie-2, v2.4.222. SAMtools, v1.11 command lines were used to convert the file format from SAM to BAM, followed by sorting, indexing, and duplicate-read removal23. The alignment excluded the ENCODE blacklisted regions24. Enrichment regions were identified by model-based analysis of ChIP-Seq (MACS)25. FeatureCounts from the Subread package (Release 2.0.3) was used to call read counts for each gene26. The Human Release 19 comprehensive gene annotation (GRCh37.p13) was used as a reference. The read counts for either 1000 bp bins or individual genes were extracted using bedtools27 and further normalized using DESeq228, which performs the variance-stabilizing transformation to correct for sequencing depth and library size.

Circulating tumor DNA–fraction estimation

The circulating tumor DNA (ctDNA) fraction calculation was previously published29. In brief, the mapped reads were first binned into 1 Mb genomic segments. The read count ratio was then calculated using the cfDNA read count in each segment divided by the average read count of 8 unrelated healthy male donors. After removing centromeres and other repeat-rich regions, the log2 ratios in genomic bins were subjected to segmentation analysis. ctDNA fraction was estimated by using the log2 ratio of the most significantly deleted fragment (>20 Mb) across all segments in the genome. Specifically, the ctDNA fraction was calculated as $1 - 2^{\text{min.LR}}$, in which min.LR is the smallest log2 ratio across the entire segmented genome after excluding segments either smaller than 20 Mb or on chromosome Y.
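The final step of this estimate can be sketched in a few lines, assuming the formula reads 1 − 2^min.LR and that segmentation has already produced a list of segments with log2 ratios. The `Segment` type, the chromosome naming, and the clamp to zero for samples with no deleted segment are illustrative assumptions, not details from the published algorithm.

```python
from typing import Iterable, NamedTuple


class Segment(NamedTuple):
    chrom: str         # e.g. "chr8" (naming convention is an assumption)
    start: int         # segment start, bp
    end: int           # segment end, bp
    log2_ratio: float  # log2(cfDNA read count / healthy-donor average)


def ctdna_fraction(segments: Iterable[Segment],
                   min_size: int = 20_000_000) -> float:
    """Estimate ctDNA fraction as 1 - 2**min.LR.

    min.LR is the smallest log2 ratio among segments that are at least
    `min_size` bp long and not on chromosome Y.
    """
    eligible = [
        s.log2_ratio
        for s in segments
        if s.chrom not in ("chrY", "Y") and (s.end - s.start) >= min_size
    ]
    if not eligible:
        raise ValueError("no segment passes the size/chromosome filters")
    min_lr = min(eligible)
    # Assumption: report 0 rather than a negative fraction when the
    # lowest eligible segment is not a deletion (min.LR > 0).
    return max(0.0, 1.0 - 2.0 ** min_lr)


# Hypothetical segmentation output
segs = [
    Segment("chr1", 0, 50_000_000, 0.1),
    Segment("chr8", 0, 30_000_000, -1.0),   # large deletion: used
    Segment("chr13", 0, 5_000_000, -2.0),   # too small: ignored
    Segment("chrY", 0, 25_000_000, -3.0),   # chromosome Y: ignored
]
print(ctdna_fraction(segs))
```

With a minimum log2 ratio of −1 the read depth in the deleted segment is half the reference depth, which this formula maps to a ctDNA fraction of 0.5.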

Calculation of gene-set activity scores

Gene lists of selected gene sets were downloaded from three data sources. The hallmark androgen response gene set was derived from the Gene Set Enrichment Analysis website. Lists of coexpressed genes with AR, FOXA1, and GRHL2 as well as AR targets by ChIP-seq were downloaded from the Enrichr gene set library30. The list of androgen signaling genes was copied from the publication31. To calculate the gene-set activity score, we transformed read counts into a log2 ratio of read counts in each individual gene between each patient and a pooled healthy control. We then identified the mean value of the log2 ratios in the gene set and further multiplied the mean value by 100. The formula is the following:

$$\text{ActivityScore} = \frac{100}{n}\sum_{i=1}^{n}\left[\log_2\left(RC_i^{P}\right) - \log_2\left(RC_i^{C}\right)\right]$$

In which $RC_i^{P}$ is the normalized read count of gene i for a patient, $RC_i^{C}$ is the average normalized read count of gene i among the 7 healthy controls, and n is the total number of genes in a gene set.
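The activity score defined above is the mean per-gene log2 ratio (patient versus pooled control) scaled by 100. A minimal sketch follows; the function name, the dictionary-based inputs, and the read counts are hypothetical, and genes missing from either input are skipped as an illustrative choice.

```python
import math
from typing import Iterable, Mapping


def activity_score(patient: Mapping[str, float],
                   control: Mapping[str, float],
                   gene_set: Iterable[str]) -> float:
    """100 * mean over genes of log2(RC_patient) - log2(RC_control).

    `patient` and `control` map gene symbols to normalized read counts;
    `control` holds the average across healthy controls. Counts are
    assumed to be strictly positive.
    """
    log_ratios = [
        math.log2(patient[g]) - math.log2(control[g])
        for g in gene_set
        if g in patient and g in control  # skip genes absent from either input
    ]
    if not log_ratios:
        raise ValueError("no gene in the gene set has counts in both inputs")
    return 100.0 * sum(log_ratios) / len(log_ratios)


# Hypothetical normalized counts for two androgen-responsive genes
patient_counts = {"KLK3": 200.0, "TMPRSS2": 100.0}
control_counts = {"KLK3": 100.0, "TMPRSS2": 100.0}
print(activity_score(patient_counts, control_counts, ["KLK3", "TMPRSS2"]))
```

Here the per-gene log2 ratios are 1 and 0, so the score is 50. A score of 0 means the gene set carries, on average, the same 5hmC signal as the healthy controls.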

Statistical analyses

DESeq2 was used to identify the differentially methylated genes between progressed and non-progressed patients and among 3 epigenetic groups. To correct for multiple testing, a false discovery rate (FDR) was used, with an FDR of less than 0.1 as the cutoff. Student’s t-tests were used to compare the activity score, ctDNA fractions, and PSA values among different groups of patients. Kaplan–Meier analysis was used to identify epigenomic features that were associated with progression-free survival (PFS). Analysis of covariance was applied to adjust for the effect of ctDNA fraction and clinical covariates. For differences in gene-set activity scores, ctDNA fraction, and PSA at baseline, a 2-sided P value less than 0.05 was considered significant.
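The FDR procedure is not named in the text. Assuming it is the Benjamini–Hochberg adjustment, which DESeq2 applies by default, the thresholding step can be sketched as follows. The P values are hypothetical.

```python
from typing import List, Sequence


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending P
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical per-gene P values
pvals = [0.01, 0.04, 0.03, 0.5]
fdr = benjamini_hochberg(pvals)
significant = [q < 0.1 for q in fdr]  # FDR < 0.1 cutoff used in this study
print(fdr, significant)
```

A 0.1 cutoff means that roughly one in ten genes called differentially methylated is expected to be a false positive.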

Results

Clinical characteristics and plasma collection

In the 55 hormone-naive prostate cancer patients, the median age was 69 years (range 49–94). Based on self-reported race, 44 patients identified as White, 9 as African American, and 2 did not specify their race. At enrollment, 44 of the 55 patients showed bone or soft tissue metastasis, and the remaining 11 presented with non-metastatic castrate status (biochemical recurrence). Among the patients with metastasis, 18 had high-volume disease, and 26 had low-volume disease (CHAARTED criteria). During the 24-month follow-up, 20 patients showed disease progression (defined as early resistance to ADT in this study), and 35 had no sign of disease progression. More clinical characteristics of this cohort are summarized in Table 1 (Supplementary Data 1 for detail). Plasma samples from baseline (before ADT) and 3 months after ADT were collected from all 55 patients. Additional plasma samples were collected either at the time of progression from 14 of 20 progressed patients or at 24 months after ADT from 15 of 35 non-progressed patients.

Table 1. Clinical characteristics of patients

Characteristic                                          Value
Total number                                            55
Age (years)
  Median                                                69
  Range                                                 49–94
Race
  White                                                 44
  African American                                      9
  Others                                                2
Gleason score at initial diagnosis
  6                                                     2
  7                                                     21
  ≥8                                                    30
  Missing                                               2
Baseline prostate-specific antigen (ng/ml)
  Median                                                21.5
  Range                                                 0.29–3275
Metastases to bone or soft tissues
  Yes                                                   44
  No                                                    11
Metastatic volume
  High                                                  18
  Low                                                   26
  No                                                    11
Radical prostatectomy (RP) and radiotherapy (RT)
  RP only                                               7
  RT only                                               8
  RP + RT                                               12
  Neither RP nor RT                                     28
Progression within 24-month follow-up
  Yes                                                   20
  No                                                    35
Time from ADT start to ADT failure (months)
  Median                                                12.53
  Range                                                 5.6–23.2

5hmC sequencing in cfDNA shows high enrichment efficiency and specificity

To determine 5hmC enrichment efficiency and specificity, we spiked in a pool of 180 bp amplicons containing either C, 5mC, or 5hmC into cfDNA during library preparation. We first performed PCR analysis in 5hmC-enriched libraries and observed amplicons in 5hmC-containing DNA only (Supplementary Fig. 1a). This result was confirmed in the final sequencing libraries, which showed over 100-fold enrichment in reads mapping to 5hmC spike-in DNA (Supplementary Fig. 1b, c). We also examined the duplication rate in these 5hmC libraries. With a median read count of 18.3 (6.03–42.43) million/sample, the sequencing libraries showed 98% (95%-99%) mappable reads and 75% (72%-78%) unique (nonduplicate) reads (Supplementary Data 2).

5hmC profiles in cfDNA are different from PBMC-derived gDNA

To evaluate the distribution of 5hmC in cfDNA, we first performed enrichment analysis using 1 kb window bins across the genome. Peak detection in healthy control cfDNAs showed that 66.7% of 5hmC-enriched regions were in the gene bodies, including introns (47%) and CDS (19.7%), whereas only 6.1% of enriched regions were found in 5′ UTRs (Supplementary Fig. 2a). We then compared gene body read counts between cfDNA from healthy controls and peripheral blood gDNA. From a total of 23433 genes with a median raw read count greater than or equal to 8, we observed 11688 differentially methylated genes (DMGs), including 5819 hypermethylated genes and 5869 hypomethylated genes in cfDNA (FDR < 0.1, Supplementary Fig. 2b). Enrichment analysis showed that the hypermethylated DMGs were enriched in apoptotic process and immune response (Supplementary Fig. 2c), whereas hypomethylated genes were enriched in cell structure, cell adhesion, and neuron protection (Supplementary Fig. 2d).

Baseline plasma cfDNAs show diverse 5hmC epigenomic profiles

To test 5hmC differences in baseline cfDNAs, we compared the read count in each individual gene between patients with disease progression and those without disease progression during the 24-month follow-up. Among the 23433 genes tested, we identified 1642 DMGs (FDR < 0.1), including 1008 hypermethylated and 634 hypomethylated genes in the patients with progression (Supplementary Data 3). We further performed enrichment analysis using these DMGs in hallmark gene sets to evaluate molecular mechanisms underlying the differential methylation. This analysis showed significant enrichment in multiple gene sets with androgen response (FDR = 1.19E−13) and estrogen response early (FDR = 5.50E−13) as the top 2 gene sets in hypermethylated genes, and complement (FDR = 1.13E−03) and inflammatory response (FDR = 2.93E−03) as the top 2 gene sets in hypomethylated genes (Fig. 2a, b). Interestingly, these DMGs revealed significant epigenomic heterogeneity within both clinical groups of patients. Hierarchical clustering analysis showed that a fraction of the patients with progression demonstrated unique epigenomic features that separated them from all other patients, although most patients with progression tended to cluster with non-progressed patients (Fig. 2c). Specifically, although the 55 patients were clinically classified into 2 groups (progression and no progression), they were epigenetically classified into three groups by unsupervised analysis. EpiGroup 1 included patients with disease progression (n = 6) who showed distinct epigenomic features. EpiGroup 2 featured highly similar epigenomic profiles between the subgroups of progressed (n = 14) and non-progressed patients (n = 17). EpiGroup 3 was a unique cluster of patients without disease progression (n = 18). The principal component analysis also showed a clear separation of the 3 EpiGroups (Fig. 2d).

Fig. 2. Gene-set enrichment and epigenomic heterogeneity.

a Gene-set enrichment in hypermethylated genes of baseline samples when comparing progressed with non-progressed patients. b Gene set enrichment in hypomethylated genes of baseline samples when comparing progressed with non-progressed patients. c Separation of differentially methylated genes into three epigenomic groups among both progressed and non-progressed patients. d Principal component analysis separating three epigenomic clusters.

Patients with different epigenomic features show 5hmC hypermethylation in unique signaling pathways

To evaluate molecular mechanisms of the epigenomic heterogeneity, we compared the 5hmC profiles using the 23433 genes among 3 EpiGroups. This comparison identified 13220 (EpiGroups 1 vs. 2), 5046 (EpiGroups 1 vs. 3), and 12603 (EpiGroups 2 vs. 3) DMGs (Supplementary Data 4). Enrichment analysis showed significant differences in a wide variety of signaling and regulatory pathways. Notably, when compared with either EpiGroup 2 or EpiGroup 3, EpiGroup 1 consistently showed that the androgen response was the most significantly hypermethylated gene set (FDR ≤ 5.63E−10), while immune responses (complement, allograft rejection, and inflammation) were among the top hypomethylated gene sets (FDR ≤ 1.04E−4) (Supplementary Fig. 3a–d, Supplementary Data 5). Of note, these hypermethylated and hypomethylated gene sets followed a trend similar to that of the significant gene sets identified when comparing the 2 clinical groups (Fig. 2a, b). Clearly, EpiGroup 1 demonstrated unique epigenomic features that could separate the patients with progression from those without progression. This unique subgroup is characterized by 5hmC hypermethylation (activation) in the androgen-related gene sets and 5hmC hypomethylation (inactivation) in immune responses. When comparing EpiGroup 2 with EpiGroup 3, we observed significant enrichment in hallmark P53 pathway and mitotic spindle gene sets (FDR ≤ 2.13E−6) with increased methylation in EpiGroup 2 (Supplementary Data 5).

Gene-set activity scores of baseline samples classify patients with different clinical outcomes

To quantify the active status of these gene sets, we developed a gene-set activity scoring algorithm using a mean value of log2 ratios among all genes of selected gene sets. We first calculated the activity score using 97 genes that overlapped with the hallmark androgen response gene set. The average activity scores were statistically different (P = 5.43E−05) between patients with progression (score = 6.62 ± 11.65) and those without progression (score = −3.23 ± 4.90) (Fig. 3a). Subgroup analysis showed that the difference was driven by EpiGroup 1. Specifically, the average activity scores were 22.27 in EpiGroup 1, −0.84 in EpiGroup 2, and −4.90 in EpiGroup 3. The activity score was significantly higher in EpiGroup 1 than in EpiGroup 2 (P = 4.73E−14) and EpiGroup 3 (P = 1.73E−08) (Fig. 3b). Clearly, the patients in EpiGroup 1 had the highest activity scores, which distinguished them from the remaining patients regardless of progression status (Fig. 3c). Meanwhile, to evaluate whether the activity score was predictive of disease outcome, we performed a Kaplan–Meier analysis and observed a significant association of the activity score with PFS (P = 0.0006, HR = 5.35, Fig. 3d). Specifically, 57.14% (16 of 28) of patients with high activity scores showed disease progression at 24 months. In the same follow-up period, however, only 14.81% (4 of 27) of patients with low activity scores had disease progression. Subgroup analysis also showed the survival differences among the 3 EpiGroups (P < 0.0001, Fig. 3e).

Fig. 3. Activity scores in androgen signaling gene sets separate patients with different clinical outcomes.

a The activity score of the hallmark androgen response gene set is significantly higher among patients with progression (Y) than without progression (N). b Activity score in EpiGroup 1 is significantly higher than in EpiGroup 2 and EpiGroup 3. c Waterfall plot shows a clear-cut activity score difference in EpiGroup 1 when compared with other EpiGroups. d High activity score is associated with poor PFS. e Three EpiGroups show different PFS. The bottom and top of the box in the box plots (a, b) are the first and the third quartile, respectively.

In addition to the hallmark androgen response gene set, we also tested other androgen-related gene sets, including AR target genes (ChIP-seq), GRHL2-coexpressed genes, FOXA1-coexpressed genes, AR-coexpressed genes30 and AR signaling pathway31. The overall distribution of these activity scores in the selected gene sets, along with their linkage to EpiGroups, disease progression, and metastatic status, is shown in Fig. 4a. To test the activity differences, we first compared 2 groups of patients with different clinical outcomes in baseline samples. This comparison showed significantly higher activity scores among patients with progression than those without progression (Fig. 4b). We then compared 3 groups of patients with different epigenomic features. This analysis demonstrated consistently higher activity scores among the patients in EpiGroup 1 than among those in EpiGroups 2 and 3 (Fig. 4c). In addition to the hallmark androgen response gene set (Fig. 3b), 4 of the other 5 androgen-related gene sets (GRHL2-, FOXA1- and AR-coexpressed gene sets, and androgen signaling gene set) also demonstrated higher activity scores in EpiGroup 2 than in EpiGroup 3 (Fig. 4c). Kaplan–Meier analysis showed poor PFS among patients with high activity scores (Fig. 4d).

Fig. 4. Activity scores among patients with different clinical outcomes or with different epigenetic statuses.

a Heatmap showing the distribution of activity scores in androgen response gene sets, disease progression, and metastatic status. b Significant activity score differences between progressed (Y) and non-progressed (N) patients. c Significant activity score differences among three EpiGroups (EG1, EG2 and EG3). d Association of high activity scores with poor PFS. The bottom and top of the box in the box plots (b, c) are the first and the third quartile, respectively.

Gene set activity scores are highly correlated with ctDNA fraction and PSA level

ctDNA fraction has been shown to have a significant association with poor clinical outcomes in various cancers including advanced prostate cancer32. Because the whole genome 5hmC sequencing data can be used for estimating ctDNA fraction20, we calculated the ctDNA fraction using a previously published algorithm29. We then performed linear correlation analysis between the ctDNA fraction and AR-related signaling activity scores. This analysis showed a significant correlation (R² = 0.406) (Fig. 5a). The EpiGroup 1 samples had the highest ctDNA fraction and highest activity score. When excluding EpiGroup 1, however, such an association no longer existed (R² = 0.0071, Fig. 5b). We also performed the correlation analysis between the baseline PSA and AR-related activity scores and observed a strong association (R² = 0.296). Again, EpiGroup 1 samples appeared to drive the association (Fig. 5c). After excluding EpiGroup 1, the significant association disappeared (R² = 0.0796, Fig. 5d). The heatmap of ctDNA fraction and PSA level also supported their significant association with the gene-set activity score (Fig. 4a). Given that the activity score, ctDNA fraction, and PSA levels are all linked to disease progression, we further examined their relationship with metastatic volume. This analysis demonstrated a consistent decrease in activity scores, ctDNA fractions, and PSA levels from high volume to low volume to non-metastatic disease (Fig. 5e–g).
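The pattern described here, where a small subgroup drives an overall correlation that vanishes once the subgroup is removed, can be illustrated with a short sketch. The data points below are hypothetical and chosen only to show the effect; they are not values from this cohort.

```python
from typing import Sequence


def r_squared(x: Sequence[float], y: Sequence[float]) -> float:
    """Coefficient of determination of a simple linear regression of y on x
    (equal to the squared Pearson correlation)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy ** 2 / (sxx * syy)


# Hypothetical (ctDNA fraction, activity score) pairs.
# "rest": low ctDNA fraction, scores scattered around zero.
rest_x = [0.01, 0.02, 0.03, 0.04]
rest_y = [1.0, -1.0, -1.0, 1.0]
# "outliers": a small subgroup with high ctDNA fraction and high score.
out_x = [0.30, 0.40]
out_y = [20.0, 25.0]

r2_all = r_squared(rest_x + out_x, rest_y + out_y)
r2_rest = r_squared(rest_x, rest_y)
print(f"all samples: R^2 = {r2_all:.3f}; subgroup excluded: R^2 = {r2_rest:.3f}")
```

A high R² in the full set combined with a near-zero R² after exclusion indicates that the correlation reflects the difference between subgroups rather than a graded relationship within them.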

Fig. 5. Association and dynamics of activity score, ctDNA fraction, and PSA level.

a, b ctDNA fraction shows a strong association with gene-set activity score. EpiGroup 1 clearly drives the association. c, d PSA level shows a strong association with gene-set activity score. EpiGroup 1 drives the association. e–g Metastatic volume is positively associated with gene set activity score (e), ctDNA fraction (f), and PSA level (g). h–j Dynamic changes of gene set activity score (h), ctDNA fraction (i), and PSA level (j). The activity score is the only molecular event that consistently decreases by 3-month treatment and consistently increases upon disease progression. The bottom and top of the box in the box plots (e–g) are the first and the third quartile, respectively.

Baseline gene set activity score is associated with early treatment failure independent of clinical factors and androgen receptor amplification

Because multiple clinical factors have shown a potential association with disease progression, we tested whether these clinical factors had any effect on the discriminative performance of the gene set activity scores. We first tested the Gleason score at diagnosis, age at enrollment, metastatic volume, and baseline PSA for their potential association with disease progression among the 55 patients. Among all the clinical factors tested, only the PSA level at enrollment showed a significant association with disease progression (P = 2.35E−05) (Supplementary Fig. 4a). We also tested the difference in baseline ctDNA fractions between progressed and non-progressed patients. This test revealed a significantly higher ctDNA fraction in progressed patients (P = 0.0035) (Supplementary Fig. 4b). To evaluate the effect of these factors on the association of activity scores with clinical outcomes, we performed an analysis of covariance using the baseline PSA level and ctDNA fraction as covariates. This analysis showed that baseline activity scores of patients with disease progression remained significantly higher than those without progression in 4 of 6 gene sets (Supplementary Data 6). To further evaluate the effect of the patients in EpiGroup 1 on disease progression, we excluded this subgroup and performed subgroup survival analysis using the remaining EpiGroup 2 and EpiGroup 3 patients. The activity scores in the subgroup analysis remained significantly associated with disease progression (Supplementary Fig. 4c).

Additionally, we evaluated the 5hmC level of each gene around the AR locus. We plotted the mean values of read count log2 ratios in the genes flanking AR, and we observed a peak log2 ratio in EpiGroup 1 at the AR gene but not in EpiGroup 2 and EpiGroup 3. However, flanking gene loci did not show a significantly high read count in EpiGroup 1 (Supplementary Fig. 5a). Because the 5hmC capture assay is highly specific, the high level of 5hmC at AR but not at the nearby gene loci indicated specific activation of AR expression. To further evaluate the copy number at the AR locus, we performed low-pass whole genome sequencing in 10 available plasma samples, including 5 samples from EpiGroup 1 and 5 samples from EpiGroup 2, in addition to 4 healthy controls. Copy number analysis using the whole genome sequencing data showed no focal amplification at the AR locus in any of these samples (Supplementary Fig. 5b).

Dynamics of gene-set activity scores reflect treatment response and serve as surrogates to monitor disease progression

To evaluate whether 5hmC profile changes reflect treatment outcomes, we compared the dynamics of all 23433 genes at different blood collection times for the three groups of patients with different epigenomic features. Although patients in EpiGroup 2 did not show a significant difference between time points, we did observe significant changes when comparing 3-month data points to baselines in EpiGroup 1 and EpiGroup 3. Specifically, 3-month treatment caused significant 5hmC changes in 4315 genes in EpiGroup 1, including 2627 with reduced and 1688 with increased methylation, and in 4112 genes in EpiGroup 3, including 1803 with reduced and 2309 with increased methylation (FDR < 0.1, Supplementary Data 7). Interestingly, enrichment analysis in EpiGroup 1 showed the most significant methylation reduction in gene sets of androgen and estrogen responses, while the most significant methylation increases were in the gene sets of immune responses (Supplementary Data 8). Similarly, enrichment analysis in EpiGroup 3 showed that the 3-month treatment induced significant methylation increases in gene sets of the mitotic spindle and p53 pathway (Supplementary Data 9), although we did not observe significant methylation reduction in any hallmark gene sets.
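The FDR < 0.1 threshold and the split into reduced and increased genes can be sketched as follows. The sketch assumes Benjamini-Hochberg adjustment of per-gene P values and a signed log2 fold change per gene (3-month versus baseline); this section does not state the adjustment procedure, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest P value to the smallest so that each adjusted
    # value is capped by the adjusted values of all larger P values.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


def split_by_direction(
    genes: Sequence[str],
    log2_fold_changes: Sequence[float],
    p_values: Sequence[float],
    fdr: float = 0.1,
) -> Tuple[List[str], List[str]]:
    """Return (reduced, increased) gene lists among genes passing the FDR cutoff."""
    adj = benjamini_hochberg(p_values)
    reduced = [g for g, lfc, q in zip(genes, log2_fold_changes, adj)
               if q < fdr and lfc < 0]
    increased = [g for g, lfc, q in zip(genes, log2_fold_changes, adj)
                 if q < fdr and lfc > 0]
    return reduced, increased
```

The two returned lists together make up the set of significantly changed genes, which matches the way the counts are reported above: 2627 + 1688 = 4315 in EpiGroup 1 and 1803 + 2309 = 4112 in EpiGroup 3.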

We also compared the dynamics of gene-set activity scores at 3 blood collection times. In EpiGroup 1, all six patients had blood collected at baseline and 3-month treatment, with four patients completing all three blood collections. When compared to baseline, 3-month treatment significantly reduced the gene-set activity score (P = 0.0004). Importantly, the reduced activity score then increased significantly when the disease progressed (P = 0.0317) (Fig. 5h). However, these dynamic changes were not evident in EpiGroup 2 patients with disease progression (Supplementary Fig. 6a). Patients without disease progression, whether in EpiGroup 2 or EpiGroup 3, did not show significant changes (Supplementary Fig. 6b, c). Given that changes in ctDNA fraction and PSA levels are key indicators of disease progression, we also evaluated their dynamics among EpiGroup 1 patients at all 3 data points. Despite a significant decrease after 3 months of treatment, neither ctDNA fractions nor PSA levels showed a significant increase at disease progression (Fig. 5i, j), suggesting that the gene-set activity score is more sensitive than either the ctDNA fraction or PSA in detecting disease progression.
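A paired comparison of activity scores between two collection times can be sketched as below. This section does not name the test behind the reported P values, so the sketch substitutes an exact paired sign-flip permutation test as a simple stand-in; the function name and inputs are hypothetical. With six paired samples, the smallest two-sided P value this exact test can return is 2/64 (about 0.031), so it is not expected to reproduce the values reported above.

```python
from itertools import product
from typing import Sequence


def paired_sign_flip_p(before: Sequence[float], after: Sequence[float]) -> float:
    """Exact two-sided sign-flip permutation P value for paired samples.

    Under the null hypothesis of no change, each within-patient difference
    is equally likely to be positive or negative, so every sign pattern of
    the observed differences is enumerated.
    """
    if len(before) != len(after):
        raise ValueError("before and after must have the same length")
    diffs = [a - b for a, b in zip(after, before)]
    observed = abs(sum(diffs))
    hits = 0
    total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        stat = abs(sum(s * d for s, d in zip(signs, diffs)))
        # Small tolerance guards against floating-point ties.
        if stat >= observed - 1e-12:
            hits += 1
    return hits / total
```

Enumeration grows as 2^n, which is trivial for cohorts of this size but would call for random sampling of sign patterns in larger cohorts.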

Discussion

Currently, the standard of care for newly diagnosed metastatic castrate-sensitive prostate cancer is to combine a luteinizing hormone–releasing hormone (LHRH) analog and ARSI with or without docetaxel33. Although there are no head-to-head comparisons, multiple published meta-analyses did not show a statistically significant improvement in overall survival from adding docetaxel to the LHRH analog and ARSI (triplet therapy) compared with the LHRH analog plus ARSI (doublet therapy). Post hoc analysis of the ARASENS trial indicates that the overall survival benefits are mainly in high-volume rather than low-volume metastatic castrate-sensitive prostate cancer34,35. Retrospective biomarker studies have reported prognostic biomarkers such as mutations in TP53 or RB36. However, these biomarkers have not been validated in randomized prospective studies. Our study demonstrated that hormone-sensitive prostate cancer patients can be classified into at least three epigenomic subgroups that differ in clinical outcomes. The subgroup with stronger androgen signaling signals had a shorter time to castration resistance. Further validation of this finding will facilitate the discovery of more effective biomarkers in future clinical trials.

One interesting finding of this study was the identification of significant 5hmC enrichment of androgen response gene sets among patients with shorter time to progression on ADT-based therapies. Importantly, this enrichment was present in pre-treatment baseline samples, suggesting potential use for identifying patients who would benefit from upfront treatment intensification. In fact, all patients in this subgroup received doublet therapy (4 cases for ADT + docetaxel; 2 cases for ADT + ARSI). Although the initial treatment seemed effective (as measured by monitoring PSA levels), this group of patients showed rapid disease progression within 1 year (median PFS = 11.73 months). This result suggests inherent resistance to the doublet therapy in this group of patients, who featured a high 5hmC signal in androgen response genes. Furthermore, the activation signal was not limited to the hallmark androgen response gene set. We also observed activation signals from the coexpressed gene sets of AR, FOXA1, and GRHL2. Meanwhile, the gene set involved in AR-binding targets also showed activation in this subgroup of patients. This observation is consistent with a recent study showing that 5hmC marks the activation of major driver genes in advanced prostate cancer, not only at the gene bodies but also at downstream target binding sites16. These results suggest that the activity scores may reflect driver gene activation.

The 5hmC epigenomic changes are highly tissue-specific and are more frequently found in genes that drive tissue differentiation and among tissue-specific transcription factors37,38. Because prostate cancer cells are highly dependent on androgens for growth and survival, the androgen signaling pathway plays a crucial role in the development and progression of prostate cancer39. It is therefore reasonable to speculate that the high level of 5hmC in androgen signaling gene sets is of prostate cancer origin. This speculation is supported by the observation that activity scores in androgen signaling gene sets were positively associated with ctDNA fraction. However, this significant association seems to be driven by a few patients with high ctDNA fraction. The lack of association in most patients with low ctDNA fraction could be attributable to inaccurate estimation of ctDNA fraction. Because the method for ctDNA fraction calculation does not have a reference baseline established in a healthy control population, ctDNA fraction may be overestimated in some patients with low tumor burden, potentially leading to poor correlations between ctDNA fraction and gene set activity scores in most patients. Nevertheless, the gene-set activity scores seem to effectively separate patients with disease progression from those without.

This study also found that the 5hmC-based dynamic changes in these gene sets may be used to monitor treatment response and disease progression. Our study shows that high activity scores are significantly reduced after 3 months of treatment in the subgroup of patients with activation in androgen response gene sets. However, this score reduction does not last long before the scores return to higher levels at disease progression. In addition to androgen signaling, we also observed increased activity scores of p53 and mitotic spindle pathways in another subgroup of patients during the treatment, suggesting activation of the p53-mediated stress response and cell cycle regulation. These epigenomic changes may reflect the therapeutic effects and influence treatment response and disease progression. Meanwhile, the treatment also induced significant changes in gene sets related to immune response pathways in a subgroup of patients. Overall, the dynamics of the activity scores during treatment coincide with treatment response and progression. This study demonstrates potential applications of 5hmC-based cfDNA enrichment analysis in assessing treatment response and monitoring disease progression among patients with hormone-sensitive prostate cancer, a state in which no validated biomarkers are currently available for clinical application.

Although the evidence from this study is promising, the current study also has some limitations. First, the study cohort was gathered between 2015 and 2019, a period when standard care was shifting from ADT to doublet or triplet therapies. Consequently, the cohort exhibits considerable treatment heterogeneity, with patients receiving a diverse range of treatments alongside ADT. Given that ADT combination therapies demonstrate survival benefits, a treatment-specific analysis is likely to yield more definitive results. Second, the study cohort is relatively small, which does not allow more detailed analysis by further stratifying patients based on their treatment strategies and ethnic groups. Third, this initial discovery was from one cohort only. The lack of an independent validation cohort may lead to cohort-dependent biases. Nevertheless, the activity scoring algorithm developed in this study has great potential in assessing treatment response, monitoring disease progression, and identifying patients who would benefit from upfront treatment intensification. However, further studies utilizing highly sensitive ctDNA quantification assays in homogeneous patient cohorts are required to validate the findings of the study.

Reporting Summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Supplementary information

Supplementary Information (646.8KB, pdf)
43856_2025_783_MOESM2_ESM.pdf (335.9KB, pdf)

Description of Additional Supplementary Files

Supplementary-Data 1-10 (6.5MB, xlsx)
Reporting Summary (1.2MB, pdf)

Acknowledgements

This work is funded by the NIH (R01CA212097 and R01CA250018 to L.W. and 1R01CA262570 to M.T.N.) and by Wisconsin Breast/Prostate Cancer Showhouse (to L.W. and D.K.). We thank the Sequencing Core at the Medical College of Wisconsin for sequencing consultation and support. Editorial assistance was provided by Gerard Hebert of Moffitt Cancer Center’s Office of Scientific Publishing; no compensation was given beyond his regular salary.

Abbreviations

5hmC

5-hydroxymethylcytosines

ADT

androgen-deprivation therapy

AR

androgen receptor

ARSI

androgen-receptor signaling inhibitor

CDS

coding DNA sequence

cfDNA

cell-free DNA

ChIP

chromatin immunoprecipitation

DMGs

differentially methylated genes

ECOG

Eastern Cooperative Oncology Group

FDR

false discovery rate

FOXA1

forkhead box A1

GRHL2

grainyhead-like transcription factor 2

LHRH

luteinizing hormone–releasing hormone

PSA

prostate-specific antigen

Author contributions

Conception and design: L.W., D.K., C.C.H. Development of methodology: Q.X.L., L.W., C.C.H. Acquisition of data: Q.X.L., L.W., D.K. Analysis and interpretation of data: Q.X.L., S.H., Y.J.T., J.Y.H., A.B., M.K., M.P., J.W. Writing, review, and/or revision of the manuscript: Q.X.L., M.L., J.S.Z., J.Y.P., M.T.N., B.M., M.K., C.C.H., L.W. Administrative, technical, or material support: D.K., E.M.G.

Peer review

Peer review information

Communications Medicine thanks the anonymous reviewers for their contribution to the peer review of this work.

Data availability

The dataset used during the current study is available from the Sequence Read Archive under accession No. PRJNA1090140. Supplementary Data 1–9 include detailed clinical characteristics of patients, the sequencing data quality matrix, and statistical and pathway analyses of differentially methylated genes. The source data for Figs. 3–5 are in Supplementary Data 10.

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

These authors contributed equally: Qianxia Li, Chiang-Ching Huang.

Contributor Information

Deepak Kilari, Email: dkilari@mcw.edu.

Liang Wang, Email: liang.wang@moffitt.org.

Supplementary information

The online version contains supplementary material available at 10.1038/s43856-025-00783-0.

References

  • 1. Bray, F. et al. Global cancer statistics 2018: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J. Clin. 68, 394–424 (2018).
  • 2. Wyatt, A. W. et al. Genomic alterations in cell-free DNA and enzalutamide resistance in castration-resistant prostate cancer. JAMA Oncol. 2, 1598–1606 (2016).
  • 3. Maximum androgen blockade in advanced prostate cancer: an overview of the randomised trials. Prostate Cancer Trialists’ Collaborative Group. Lancet 355, 1491–1498 (2000).
  • 4. Denis, L. & Murphy, G. P. Overview of phase III trials on combined androgen treatment in patients with metastatic prostate cancer. Cancer 72, 3888–3895 (1993).
  • 5. Sharifi, N., Gulley, J. L. & Dahut, W. L. An update on androgen deprivation therapy for prostate cancer. Endocr. Relat. Cancer 17, R305–R315 (2010).
  • 6. Corsini, C. et al. Survival trend in individuals with De Novo metastatic prostate cancer after the introduction of doublet therapy. JAMA Netw. Open 6, e2336604 (2023).
  • 7. Crawford, E. D. et al. Androgen receptor targeted treatments of prostate cancer: 35 years of progress with antiandrogens. J. Urol. 200, 956–966 (2018).
  • 8. Kraus, L. A. et al. The mechanism of action of docetaxel (Taxotere) in xenograft models is not limited to bcl-2 phosphorylation. Invest. N. Drugs 21, 259–268 (2003).
  • 9. Mistry, S. J. & Oh, W. K. New paradigms in microtubule-mediated endocrine signaling in prostate cancer. Mol. Cancer Ther. 12, 555–566 (2013).
  • 10. Alix-Panabieres, C. & Pantel, K. Liquid biopsy: from discovery to clinical application. Cancer Discov. 11, 858–873 (2021).
  • 11. Attard, G. et al. Prostate cancer. Lancet 387, 70–82 (2016).
  • 12. Lone, S. N. et al. Liquid biopsy: a step closer to transform diagnosis, prognosis and future of cancer treatments. Mol. Cancer 21, 79 (2022).
  • 13. Nishiyama, A. & Nakanishi, M. Navigating the DNA methylation landscape of cancer. Trends Genet. 37, 1012–1027 (2021).
  • 14. Shi, D. Q., Ali, I., Tang, J. & Yang, W. C. New insights into 5hmC DNA modification: generation, distribution and function. Front. Genet. 8, 100 (2017).
  • 15. Xu, Y. et al. Genome-wide regulation of 5hmC, 5mC, and gene expression by Tet1 hydroxylase in mouse embryonic stem cells. Mol. Cell 42, 451–464 (2011).
  • 16. Sjostrom, M. et al. The 5-hydroxymethylcytosine landscape of prostate cancer. Cancer Res. 82, 3888–3902 (2022).
  • 17. Song, C. X. et al. Selective chemical labeling reveals the genome-wide distribution of 5-hydroxymethylcytosine. Nat. Biotechnol. 29, 68–72 (2011).
  • 18. Scher, H. I. et al. Design and end points of clinical trials for patients with progressive prostate cancer and castrate levels of testosterone: recommendations of the Prostate Cancer Clinical Trials Working Group. J. Clin. Oncol. 26, 1148–1159 (2008).
  • 19. Azad, A. A. et al. A retrospective, Canadian multi-center study examining the impact of prior response to abiraterone acetate on efficacy of docetaxel in metastatic castration-resistant prostate cancer. Prostate 74, 1544–1550 (2014).
  • 20. Song, C. X. et al. 5-Hydroxymethylcytosine signatures in cell-free DNA provide information about tumor types and stages. Cell Res. 27, 1231–1242 (2017).
  • 21. Chen, S., Zhou, Y., Chen, Y. & Gu, J. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics 34, i884–i890 (2018).
  • 22. Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012).
  • 23. Li, H. et al. The sequence alignment/map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).
  • 24. Amemiya, H. M., Kundaje, A. & Boyle, A. P. The ENCODE Blacklist: identification of problematic regions of the genome. Sci. Rep. 9, 9354 (2019).
  • 25. Zhang, Y. et al. Model-based analysis of ChIP-Seq (MACS). Genome Biol. 9, R137 (2008).
  • 26. Liao, Y., Smyth, G. K. & Shi, W. featureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics 30, 923–930 (2014).
  • 27. Patwardhan, M. N., Wenger, C. D., Davis, E. S. & Phanstiel, D. H. Bedtoolsr: An R package for genomic data analysis and manipulation. J. Open Source Softw. 4, 1742 (2019).
  • 28. Love, M. I., Huber, W. & Anders, S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15, 550 (2014).
  • 29. Du, M. et al. Plasma cell-free DNA-based predictors of response to abiraterone acetate/prednisone and prognostic factors in metastatic castration-resistant prostate cancer. Prostate Cancer Prostatic Dis. 23, 705–713 (2020).
  • 30. Xie, Z. et al. Gene set knowledge discovery with Enrichr. Curr. Protoc. 1, e90 (2021).
  • 31. Beltran, H. et al. Divergent clonal evolution of castration-resistant neuroendocrine prostate cancer. Nat. Med. 22, 298–305 (2016).
  • 32. Fonseca, N. M. et al. Prediction of plasma ctDNA fraction and prognostic implications of liquid biopsy in advanced prostate cancer. Nat. Commun. 15, 1828 (2024).
  • 33. Sanmamed, N. et al. Docetaxel Provides Oncological Benefits in the Era of New-Generation Androgen Receptor Inhibitors - or Is There a Crowd? Clin. Genitourin. Cancer 22, 56–66 (2024).
  • 34. Fizazi, K. et al. Abiraterone plus prednisone added to androgen deprivation therapy and docetaxel in de novo metastatic castration-sensitive prostate cancer (PEACE-1): a multicentre, open-label, randomised, phase 3 study with a 2 x 2 factorial design. Lancet 399, 1695–1707 (2022).
  • 35. Hussain, M. et al. Darolutamide plus androgen-deprivation therapy and docetaxel in metastatic hormone-sensitive prostate cancer by disease volume and risk subgroups in the phase III ARASENS Trial. J. Clin. Oncol. 41, 3595–3607 (2023).
  • 36. Kohli, M. et al. Clinical and genomic insights into circulating tumor DNA-based alterations across the spectrum of metastatic hormone-sensitive and castrate-resistant prostate cancer. EBioMedicine 54, 102728 (2020).
  • 37. Cui, X. L. et al. A human tissue map of 5-hydroxymethylcytosines exhibits tissue specificity through gene and enhancer modulation. Nat. Commun. 11, 6161 (2020).
  • 38. He, B. et al. Tissue-specific 5-hydroxymethylcytosine landscape of the human genome. Nat. Commun. 12, 4249 (2021).
  • 39. Mehralivand, S. et al. New advances of the androgen receptor in prostate cancer: report from the 1st International Androgen Receptor Symposium. J. Transl. Med. 22, 71 (2024).

Articles from Communications Medicine are provided here courtesy of Nature Publishing Group