Abstract
Background
Prognostic assessment plays a crucial role in guiding clinical management and treatment decisions for gastric cancer patients. The enrichment characteristics of 5-hydroxymethylcytosine (5hmC) in circulating cell-free DNA (cfDNA) have emerged as potential prognostic epigenetic markers.
Methods
Using 5hmC-Seal combined with next-generation sequencing (NGS), we profiled the genome-wide distribution of 5hmC in plasma cfDNA samples from 51 gastric cancer patients. Prognostic biomarkers were selected via random survival forest and Cox proportional hazards models, and a prognostic model was subsequently constructed.
Results
Seven prognostic biomarker genes were identified, and the 7-gene prognostic model demonstrated a concordance index (C-index) of 0.892 (95% CI = 0.786–0.998). Patients in the high-risk group had significantly worse overall survival (OS) than those in the low-risk group (log-rank P = 0.00012). When the cfDNA 5hmC risk score was integrated with traditional clinical characteristics, the C-index increased from 0.819 (95% CI = 0.727–0.911) to 0.904 (95% CI = 0.853–0.955). Multivariate analysis adjusted for age, TNM stage, and chemotherapy confirmed that a high cfDNA 5hmC risk score was an independent predictor of poor OS (hazard ratio [HR] = 27.47, 95% CI = 3.28–230.25).
Conclusion
cfDNA 5hmC serves as an effective prognostic biomarker with high predictive value for long-term survival in postoperative gastric cancer patients.
1. Introduction
Gastric cancer (GC) is the fifth most diagnosed malignancy worldwide, with nearly 1 million estimated new cases annually [1]. Despite advances in surgical techniques and chemoimmunotherapy, the global 5-year survival rate remains only 20–40% [2]. Prognostic assessment plays a crucial role in guiding clinical management and therapeutic strategies for GC patients. Circulating cell-free DNA (cfDNA), a non-invasive diagnostic and prognostic biomarker, has emerged as a powerful tool to overcome tumor heterogeneity. Its utility in cancer detection and prognosis has been demonstrated through the quantification of tumor-derived single-nucleotide variants (SNVs), copy number alterations (CNAs), and epigenetic aberrations [3–6].
Stable and homogeneous DNA epigenetic alterations are recognized as promising targets for biomarker development. 5-Hydroxymethylcytosine (5hmC), an intermediate product of DNA demethylation, is generated through the oxidation of 5-methylcytosine (5mC) by ten-eleven translocation (TET) enzymes [7]. This epigenetic modification is predominantly enriched in functionally active genomic regions, including enhancers, promoters, and gene bodies, where its levels exhibit a positive correlation with transcriptional activity [8]. Emerging evidence indicates that dynamic 5hmC alterations are associated with cancer initiation, progression, metastasis, and prognosis across multiple malignancies, including gastric cancer [9–13]. Notably, 5hmC profiling has recently gained attention as a novel prognostic epigenetic marker in oncology research.
To detect 5hmC enrichment in cfDNA, the 5hmC-Seal method coupled with next-generation sequencing (NGS) has been widely utilized in cancer research. This approach enables sensitive detection of trace 5hmC modifications and has demonstrated potential as both a diagnostic and prognostic marker across multiple cancer types [14–16]. In our previous study, we reported a gastric cancer diagnostic biomarker model based on the cfDNA 5hmC signatures of 50 gastric cancer patients and 50 matched non-cancer controls, and the biomarker model showed great diagnostic value [17]. Since biomarkers can not only signal disease occurrence but are also used to predict disease prognosis, this study conducted regular follow-up on these 50 gastric cancer patients plus one additional gastric cancer patient who was excluded from the diagnostic analysis due to incomplete pairing (totaling 51 gastric cancer patients). Based on the cfDNA 5hmC sequencing data, clinical characteristics, and follow-up data of these 51 gastric cancer patients, this study screened potential prognostic biomarkers for gastric cancer to construct a prognosis predictive model. Furthermore, it analyzed whether incorporating this predictive factor enhances the accuracy of prognostic predictions when combined with typical clinical characteristics.
2. Materials and methods
2.1 Patients and Follow-up
A total of 51 patients with histologically diagnosed GC who underwent radical gastrectomy at the Department of Gastric and Colorectal Surgery in the First Hospital of Jilin University (Changchun, China) from 1 January 2018 to 1 December 2021 were recruited in this cohort study. These patients were drawn from a gastric cancer patient cohort established on 1 January 2018. From 1 June 2023 to 30 April 2024, we retrospectively reviewed this patient cohort database to obtain their general demographic, clinicopathological characteristics, and follow-up data.
Included patients had a first-time diagnosis of GC and underwent tumor resection; the diagnosis was confirmed by histopathological examination of tumor tissue, and peripheral blood was collected from all subjects before tumor resection and before any radiotherapy, chemotherapy, or immunotherapy. All patients were accounted for at the first follow-up. The exclusion criteria were as follows: ① patients with any other tumors, ② patients with distant metastasis. Clinical, pathological, and treatment data of GC patients were obtained from medical records.
Follow-up was conducted at 3 months, 6 months, and 12 months, and annually thereafter until death or the end of follow-up. Information on general status and postoperative chemotherapy was collected at each follow-up. If a patient had died, the date of death and potential cause were recorded. Survival time was defined as the duration from the date of surgery to the date of death or the date of the last successful interview. If a patient was lost to follow-up, survival time was defined as the duration from the date of surgery to the date of the last successful interview. All patients were followed up for a minimum of 30 months.
This study was reviewed and approved by the Ethics Committee at The First Hospital of Jilin University (NO. 19K042-001). Written informed consent was obtained from each participant, and biospecimens were collected as approved by the Institutional Review Boards (IRBs) responsible at the First Hospital of Jilin University.
2.2 Sample preparation and 5hmC-Seal Sequencing
Approximately 10 ml of peripheral blood was collected from each participant in PAXgene Blood ccfDNA Tubes according to the manufacturer’s protocol (Qiagen, Hilden, Germany). Plasma was separated from the whole blood samples by centrifuging twice. cfDNA was isolated from the clean plasma using the QIAamp Circulating Nucleic Acid Kit (Qiagen, Germantown, MD, USA) following the manufacturer’s protocol.
The cfDNA was prepared and ligated with Illumina-compatible adaptors and purified on a Micro Bio-Spin 30 Column (Bio-Rad, Hercules, CA, USA). The cfDNA concentration of each library was measured with a Qubit fluorometer (Life Technologies, Carlsbad, CA, USA), and library quality was checked with the Bioanalyzer dsDNA High Sensitivity assay (Agilent Technologies, Santa Clara, CA, USA). Samples that passed the quality test proceeded to sequencing, which was performed on the Nova6000 platform. The raw 5hmC-Seal data were trimmed using Trimmomatic (version 0.36) and then aligned to the human genome reference (hg19) using Bowtie2 (version 2.2.6). DeepTools (version 3.5.1) was used to process the resulting sequencing files and calculate the 5hmC profiles. More details about the cfDNA quality test, 5hmC-Seal library preparation and sequencing, and the data processing pipelines are described in a previous publication [17]. The 5hmC-Seal count data were normalized by correcting for sequencing depth and library size using DESeq2 (version 1.12.3).
The 5hmC read count per million mapped reads of the gene body (from TSS to TTS) was calculated as the 5hmC level of each gene. The 5hmC modification of the gene body region was used as the marker region of each gene, as supported by previously published results [17–19]. The online NIH/DAVID tool (https://david.ncifcrf.gov/tools) and Wei Sheng Xin (https://www.bioinformatics.com.cn/) were used to analyze functional enrichment in the Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) databases. The P values of the enriched terms were corrected by the Holm-Bonferroni method.
2.3 Statistical methods
Categorical variables are presented as n (%). Normally distributed continuous variables are presented as mean ± standard deviation (SD), and non-normally distributed continuous variables as median [interquartile range (IQR)].
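The Holm-Bonferroni correction applied to the enrichment P values can be illustrated with a short sketch. This is not the authors' pipeline (the correction was presumably applied by the online tools); the function name `holm_bonferroni` and the example P values are hypothetical and serve only to show the step-down procedure.

```python
def holm_bonferroni(p_values):
    """Return Holm-adjusted p-values in the same order as the input."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p-value (rank = k - 1) is multiplied by (m - k + 1).
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity: adjusted values never decrease along the ordering.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Hypothetical raw p-values for four enriched terms (illustration only).
example_p = [0.010, 0.040, 0.030, 0.005]
adjusted_p = holm_bonferroni(example_p)
print(adjusted_p)
```

Compared with the plain Bonferroni correction, which multiplies every P value by the number of tests, the step-down multipliers are smaller for all but the smallest P value, so the procedure is never less powerful while still controlling the family-wise error rate.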
Univariate Cox regression analyses were performed gene by gene to select prognosis-associated marker genes with a P-value < 0.05. A random survival forest (RSF) model was then built with the R randomSurvivalForest package to select the variables most strongly associated with time to death. Bootstrapping and random node splitting were used to grow an ensemble of binary survival trees. We used 1000 trees, with the square root of the number of predictors sampled at each split. Each bootstrap sample leaves out some patients: approximately 37% of the dataset, the out-of-bag (OOB) instances, was excluded from each tree’s training set and used for validation. The average of the OOB performance measures was then used to evaluate the predictive performance of the entire ensemble.
Variable importance (VIMP) was obtained by measuring the decrease in prediction accuracy on the OOB data, which were not used to build the corresponding trees. At each iteration, the variable with the lowest importance score was removed, the model was rebuilt using the remaining variables, and its prediction error rate was recorded. The process was repeated until all variables were removed. The variables in the model with the smallest prediction error were then selected as potential markers.
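The figure of roughly 37% out-of-bag instances follows from sampling with replacement: a given patient is missed by all n draws of a bootstrap sample with probability (1 − 1/n)^n, which approaches 1/e ≈ 0.368 as n grows. The sketch below checks this for a cohort of 51 patients and 1000 trees; the function names and the random seed are illustrative assumptions, not part of the original analysis.

```python
import math
import random


def expected_oob_fraction(n_samples):
    """Probability that one sample is absent from a bootstrap sample of size n."""
    return (1.0 - 1.0 / n_samples) ** n_samples


def simulated_oob_fraction(n_samples, n_trees, seed=0):
    """Average out-of-bag fraction over n_trees simulated bootstrap samples."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_trees):
        # Draw n_samples indices with replacement; the set keeps the distinct ones.
        in_bag = {rng.randrange(n_samples) for _ in range(n_samples)}
        total += (n_samples - len(in_bag)) / n_samples
    return total / n_trees


n_patients = 51   # cohort size reported in the study
n_trees = 1000    # number of trees reported in the study
analytic = expected_oob_fraction(n_patients)
simulated = simulated_oob_fraction(n_patients, n_trees)
limit = math.exp(-1)  # large-sample limit, about 0.368
print(round(analytic, 3), round(simulated, 3), round(limit, 3))
```

For a cohort of this size the exact value is slightly below the limiting 1/e, so each tree is validated on about 18 to 19 patients on average.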
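The backward elimination loop described above can be sketched as follows. The sketch is generic: `fit_and_score` is a hypothetical callback standing in for refitting the RSF and returning its OOB error rate and VIMP values, and the toy scoring function and gene labels (`g1` to `g4`) are invented purely so that the example runs.

```python
def backward_eliminate(variables, fit_and_score):
    """Drop the least important variable at each step; keep the lowest-error set.

    fit_and_score(vars) must return (error_rate, {variable: importance}).
    """
    remaining = list(variables)
    best_error, best_set = float("inf"), []
    while remaining:
        error, importance = fit_and_score(remaining)
        if error < best_error:
            best_error, best_set = error, list(remaining)
        # Remove the variable with the lowest importance score and refit.
        weakest = min(remaining, key=lambda v: importance[v])
        remaining.remove(weakest)
    return best_set, best_error


# Toy stand-in for an RSF fit (hypothetical): g1 and g2 carry signal, g3 and g4 are noise.
TOY_IMPORTANCE = {"g1": 0.9, "g2": 0.6, "g3": 0.05, "g4": 0.01}


def toy_fit_and_score(variables):
    signal = sum(TOY_IMPORTANCE[v] for v in variables if v in ("g1", "g2"))
    noise = sum(1 for v in variables if v in ("g3", "g4"))
    error = 0.5 - 0.1 * signal + 0.05 * noise
    return error, {v: TOY_IMPORTANCE[v] for v in variables}


selected, selected_error = backward_eliminate(["g1", "g2", "g3", "g4"], toy_fit_and_score)
print(selected, round(selected_error, 2))
```

In a real analysis the callback would refit the forest on the remaining variables at every iteration, which is why the procedure is computationally heavier than a single VIMP ranking.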
To evaluate how each analyte selected by RSF was linked to time to death, we used a stepwise Cox proportional hazards regression model, fitted with the R survival package, to construct the final predictive model from the selected variables. Risk scores for the final prognostic model were computed according to the “risk” type of the coxph function in the R survival package, using the formula: risk score = β1 × gene1 + β2 × gene2 + … + βp × genep, where βk is the coefficient of the kth marker gene in the final multivariable Cox proportional hazards model, genek is the normalized 5hmC level of the kth marker gene, and p is the number of marker genes. Seven genes were analyzed using a multivariate Cox proportional hazards model. Kaplan-Meier curves with the log-rank test were used to compare OS between risk-score groups.
Model discrimination was measured by Harrell’s C-index and the time-dependent area under the curve (tdAUC), which corresponds to the proportion of random pairs of cases, one alive and one dead at a specified time point, for which the model correctly orders the probability of survival after weighting for censoring [20], using the R pec package. Model calibration was assessed with a calibration plot using the R rms package. A combined model including the cfDNA 5hmC risk score and clinical characteristics was further constructed to evaluate the incremental value of the cfDNA 5hmC markers. This incremental value was assessed using both category-free net reclassification improvement (NRI) and integrated discrimination improvement (IDI) with the R survival package. All statistical analyses were performed with R version 3.5.2 (http://www.r-project.org/). This study complies with the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) guideline statement [21].
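The risk-score formula is a weighted sum of normalized 5hmC levels, followed by a split at the cohort median. A minimal sketch is given below. The gene labels, coefficients, and 5hmC levels are hypothetical placeholders rather than values from the fitted model, and assigning patients whose score equals the median to the high-risk group is an assumption of this sketch.

```python
from statistics import median


def risk_score(coefficients, levels):
    """Linear predictor: sum over marker genes of beta_k * gene_k."""
    return sum(beta * levels[gene] for gene, beta in coefficients.items())


def split_by_median(scores):
    """Label each patient 'high' or 'low' relative to the cohort median score."""
    cutoff = median(scores.values())
    # Assumption: a score equal to the median is assigned to the high-risk group.
    groups = {pid: ("high" if s >= cutoff else "low") for pid, s in scores.items()}
    return groups, cutoff


# Hypothetical Cox coefficients for three placeholder marker genes.
coefficients = {"gene_a": 0.8, "gene_b": -0.5, "gene_c": 1.2}

# Hypothetical normalized 5hmC levels for three patients.
patients = {
    "P1": {"gene_a": 1.0, "gene_b": 2.0, "gene_c": 0.5},
    "P2": {"gene_a": 2.0, "gene_b": 1.0, "gene_c": 1.0},
    "P3": {"gene_a": 0.5, "gene_b": 0.5, "gene_c": 0.5},
}

scores = {pid: risk_score(coefficients, lv) for pid, lv in patients.items()}
groups, cutoff = split_by_median(scores)
print(scores, groups, cutoff)
```

A positive coefficient means that a higher 5hmC level of that gene raises the score and therefore the estimated hazard, while a negative coefficient lowers it.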
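To make the discrimination measure concrete, the sketch below computes Harrell's C-index directly from its definition: among all comparable patient pairs, the fraction in which the patient who died earlier also had the higher risk score. This is a simplified illustration, not the pec implementation used in the study; tied survival times are simply not counted as comparable, and the five example patients are hypothetical.

```python
def harrell_c_index(times, events, risk_scores):
    """Harrell's C-index for right-censored data (higher score = higher risk).

    A pair (i, j) is comparable when patient i had an observed event
    (events[i] == 1) and a strictly shorter time than patient j.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue  # a censored patient cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied scores count as half-concordant
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical follow-up times (months), event indicators (1 = death), and risk scores.
times = [8, 12, 30, 36, 40]
events = [1, 1, 0, 1, 0]
risks = [3.1, 2.0, 0.4, 2.5, 0.2]

c_index = harrell_c_index(times, events, risks)
print(c_index)  # 7 concordant pairs out of 8 comparable pairs
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which is the scale on which the C-indices reported in this study should be read.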
3. Results
3.1 Patient characteristics
This study enrolled 51 GC patients who underwent curative tumor resection and completed initial postoperative follow-up (Table 1).
Table 1. Demographics and clinical characteristics of the study population.
| Characteristics | Group | n (%) |
|---|---|---|
| Sex | Male | 35 (68.6) |
| | Female | 16 (31.4) |
| Depth of invasion | T1 | 9 (17.6) |
| | T2 | 6 (11.8) |
| | T3 | 21 (41.2) |
| | T4 | 15 (29.4) |
| Lymph metastasis | N0 | 17 (33.3) |
| | N1-N3 | 34 (66.7) |
| TNM stage | Stage I | 11 (21.6) |
| | Stage II | 15 (29.4) |
| | Stage III | 25 (49.0) |
| Tumor size | <0.5 cm | 34 (66.7) |
| | ≥0.5 cm | 17 (33.3) |
| Vascular invasion | Positive | 36 (70.6) |
| | Negative | 15 (29.4) |
| Neural invasion | Positive | 27 (52.9) |
| | Negative | 24 (47.1) |
| Lauren classification | Intestinal | 18 (35.3) |
| | Diffuse | 12 (23.5) |
| | Mixed | 21 (41.2) |
| Chemotherapy | Yes | 23 (45.1) |
| | No | 28 (54.9) |
The cohort had a mean age of 60.3 ± 7.3 years (range: 41–77 years), and 68.6% (n = 35) of participants were male. Pathological assessment revealed vascular invasion in 70.6% of cases and neural invasion in 52.9%. Adjuvant chemotherapy was administered to 45.1% of patients. According to the American Joint Committee on Cancer/Union for International Cancer Control (AJCC/UICC) 8th Edition Gastric Cancer Staging System, 21.6% (n = 11) of patients were classified as TNM stage I. During the follow-up period (ended on April 30, 2024), 15 patients (29.4%) died.
3.2 Identification and functional annotation of prognosis-associated genes
We mapped 5hmC-modified gene bodies to the adult gastric genome (as defined in ENCODE GENCODE, http://genome.ucsc.edu/ENCODE/) and identified 19,100 annotated genes. Univariate Cox regression analysis revealed 445 candidate genes significantly associated with overall survival (P < 0.05). To investigate the biological relevance of these prognosis-linked hydroxymethylation changes, we performed functional enrichment analysis using the Kyoto Encyclopedia of Genes and Genomes (KEGG) and Gene Ontology (GO) databases. The KEGG analysis showed that these genes were mainly enriched in tumor-related protein processing in the endoplasmic reticulum, the cAMP signaling pathway, ATP-dependent chromatin remodeling, and the p53 signaling pathway (Fig 1a). GO analysis highlighted terms associated with nucleic acid transport, the transmembrane transporter complex, and phosphoprotein binding (Fig 1b). More details of the enriched terms are listed in Supplemental table 1 (KEGG) and Supplemental table 2 (GO).
Fig 1. Functional enrichment analysis of prognosis-associated genes in GC patients (a. Top 30 enriched KEGG pathways. b. Top 10 enriched GO terms of each function module. BP, biological process; CC, cellular component; MF, molecular function).
3.3 Prognostic marker genes selection and model construction
The random survival forest (RSF) model was applied to refine prognostic predictors. During model training, prediction error rates stabilized at low values after constructing 200 survival trees (Fig 2a). Variable importance (VIMP) scores were calculated upon completion of 1,000 survival trees, with higher VIMP indicating greater prognostic significance. Thirteen genes with VIMP > 0.3 were prioritized as candidate biomarkers (Fig 2b). Through stepwise multivariate Cox regression analysis, we refined these to seven core prognostic genes: RPH3A, ADIPOR1, ATP8A2, P2RX3, ATP1A2, OTOP1, and ABCD4. Multivariate Cox analysis confirmed the independent prognostic value of these seven genes (Fig 2c), with the resulting model demonstrating strong predictive accuracy (C-index = 0.892, 95% CI: 0.786–0.998). Using weighted 5hmC levels of the seven-gene signature, we calculated individualized risk scores. Patients were stratified into high-risk (n = 26) and low-risk (n = 25) groups based on the cohort median risk score (1.28). The high-risk group exhibited significantly worse overall survival compared to the low-risk group (log-rank P = 0.0001; Fig 2d).
Fig 2. Marker gene selection and prediction model construction (a. Error rate curve of marker selection by RSF. b. Variable importance plot of genes with importance value greater than 0.5. c. Multivariate Cox regression analysis of overall survival in GC patients. d. Kaplan-Meier plot of the high and low risk-score groups, separated using the median risk score (1.28). RSF, random survival forest; HR, hazard ratio; CI, confidence interval).
Time-dependent receiver operating characteristic (ROC) curve analysis revealed strong prognostic performance of the risk score across multiple survival endpoints. The area under the curve (AUC) values for 1-, 3-, and 5-year overall survival predictions were 0.972, 0.891, and 0.912, respectively (Fig 3a). The time-dependent AUC (tdAUC) trajectory further demonstrated sustained predictive accuracy, maintaining values >0.8 throughout the 5-year observation period (Fig 3b). Risk score distribution patterns provided additional clinical insights. A combined visualization of risk score dynamics and 5hmC profile clustering showed that patients with risk scores above the cohort median (1.28) exhibited significantly shorter survival times, and that two early mortality cases (8-month survival) had exceptionally elevated risk scores compared to the remainder of the cohort (Fig 3c).
Fig 3. Risk scores calculated from the 5hmC levels of the seven biomarker genes are correlated with the prognosis of GC patients (a. Time-dependent receiver operating characteristic (ROC) curves at 1 year, 3 years and 5 years. b. Time-dependent area under the curve (tdAUC). c. Linkage map of risk-score distribution and biomarker gene clusters).
To evaluate the additive prognostic value of the 5hmC-derived risk score, we developed two Cox regression models. Model 1 included baseline predictors (age, TNM stage, chemotherapy) selected through stepwise regression from clinical variables (sex, age, TNM stage, neural invasion, vascular invasion, Lauren classification, tumor size). Model 2 comprised the Model 1 predictors plus the risk score. The C-index improved from 0.819 (95% CI: 0.727–0.911) for Model 1 to 0.904 (95% CI: 0.853–0.955) for Model 2. Multivariate analysis confirmed the risk score as an independent prognostic factor (HR = 27.47, 95% CI: 3.28–230.25; P < 0.001) after adjusting for age, TNM stage, and chemotherapy (Table 2). Model comparison using reclassification metrics demonstrated significant improvement with risk score inclusion: net reclassification improvement (NRI) = 0.49 (95% CI: 0.12–0.76) and integrated discrimination improvement (IDI) = 0.25 (95% CI: 0.06–0.45).
Table 2. Multivariate prognostic models in patients with gastric cancer.
| Model | Variable | HR (95% CI) | C-index (95% CI) |
|---|---|---|---|
| Model 1 | TNM stage (III vs I-II) | 13.71 (2.97-63.32) | 0.819 (0.727-0.911) |
| | Chemotherapy (yes) | 0.28 (0.08-0.96) | |
| | Age (≥60) | 1.68 (0.53-5.33) | |
| Model 2 | Risk score (high) | 27.47 (3.28-230.25) | 0.904 (0.853-0.955) |
| | TNM stage (III vs I-II) | 23.11 (4.24-126.03) | |
| | Chemotherapy (yes) | 0.26 (0.08-0.87) | |
| | Age (≥60) | 3.79 (1.07-12.43) | |
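
The wide confidence interval of the risk-score hazard ratio in Table 2 is easier to read on the log scale, where a Cox coefficient β = ln(HR) has an approximately symmetric interval β ± 1.96 × SE. The sketch below recovers β and its standard error from the reported HR and 95% CI. It assumes a Wald-type interval with z = 1.96, which the text does not state explicitly, and the function name is hypothetical.

```python
import math


def log_hr_and_se(hr, lower, upper, z=1.96):
    """Recover the Cox coefficient and its standard error from an HR and its CI.

    Assumes a Wald interval: CI = exp(beta - z * se) to exp(beta + z * se).
    """
    beta = math.log(hr)
    se = (math.log(upper) - math.log(lower)) / (2.0 * z)
    return beta, se


# Risk score (high) in Model 2, as reported in Table 2.
beta, se = log_hr_and_se(27.47, 3.28, 230.25)

# Rebuild the interval from beta and se to check internal consistency.
rebuilt_lower = math.exp(beta - 1.96 * se)
rebuilt_upper = math.exp(beta + 1.96 * se)
print(round(beta, 3), round(se, 3), round(rebuilt_lower, 2), round(rebuilt_upper, 2))
```

The rebuilt bounds agree closely with the reported interval, and the standard error of about 1.08 on the log scale reflects the limited number of events (15 deaths) rather than an inconsistency in the estimate.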
The calibration curve for Model 2 exhibited excellent agreement between predicted and observed 5-year survival probabilities (Fig 4a). We further developed a clinical nomogram integrating the four predictors of Model 2: age, TNM stage, chemotherapy status, and risk score (Fig 4b). Each variable was assigned points based on its HR value. By summing the points for each variable and locating the total on the scale, the probabilities of 1-, 3-, and 5-year OS can be obtained. In the OS nomogram, the risk score contributed the most to the survival outcome.
Fig 4. Calibration curve and nomogram of the prediction model combining the risk score with three clinical characteristics (a. Calibration curve of Model 2. b. Nomogram of Model 2 for predicting 1-, 3- and 5-year survival in patients with GC).
4. Discussion
The distinct genomic localization of 5hmC within transcriptionally active gene bodies, coupled with its strong correlation to transcriptional output, renders it a more robust epigenetic biomarker of disease progression than 5-methylcytosine (5mC) [22]. In this prospective cohort study of GC patients, we performed genome-wide 5hmC profiling of plasma-derived cfDNA to assess its prognostic utility. To our knowledge, this represents the first systematic evidence establishing genome-wide cfDNA 5hmC signatures as independent prognostic markers in GC. Through univariate Cox regression screening of 19,100 annotated genes, we identified 445 genes with differential hydroxymethylation significantly associated with clinical outcomes (P < 0.05). KEGG and GO functional enrichment analysis revealed that these prognosis-linked genes were overrepresented in the KEGG pathways ‘tumor-related protein processing in the endoplasmic reticulum’, ‘cAMP signaling pathway’, and ‘p53 signaling pathway’ and in the GO terms ‘transmembrane transporter complex’ and ‘phosphoprotein binding’. These pathways are mechanistically linked to tumor progression through regulation of protein homeostasis, stress response signaling, and cellular transport dynamics, all key hallmarks of cancer pathogenesis.
The RSF model is a non-parametric, nonlinear ensemble learning technique that extends the conventional random forest methodology to accommodate survival data [23]. It is particularly suited to censored survival data, as it adapts the Gini impurity criterion by incorporating log-rank statistics to optimize node splits, thereby maximizing the divergence between Kaplan-Meier survival curves after each split [24]. RSF constructs multiple decision trees through a process of randomization and aggregation, culminating in a robust prediction model. Notably, the RSF approach does not rely on p-values or on assumptions such as proportional hazards or linearity, and it reduces computational time by replacing cross-validation with out-of-bag estimation [23]. RSF has been reported to offer enhanced accuracy compared to alternative methodologies and has provided valuable insights into the temporal variability of variable significance [25]. By applying RSF in conjunction with a stepwise Cox model, we identified a panel of seven marker genes significantly associated with the prognosis of GC patients, combining the high predictive accuracy of the RSF model with the interpretative clarity of the Cox model. Notably, several of these biomarker genes have been previously implicated in GC, including ADIPOR1, which mediates increased AMP-activated protein kinase and PPARα ligand activity, thereby negatively regulating cancer cell progression [26]. The expression level of ADIPOR1 has also been reported to be associated with the development and progression of GC [27]. Additionally, some genes in the panel have reported roles in other cancers but not in gastric cancer. For example, ATP8A2, which encodes ATPase Phospholipid Transporting 8A2, belongs to the P4-ATPase family. This gene family actively flips phosphatidylserine and phosphatidylethanolamine from the exoplasmic to the cytoplasmic leaflet of cell membranes to generate and maintain phospholipid asymmetry [28]. ATP8A2 has been associated with a better prognosis in lung cancer [29,30].
The risk score of the final predictive model integrating RSF and stepwise Cox analysis demonstrated robust performance in predicting GC prognosis, achieving a C-index of 0.892, significantly superior to traditional TNM stage [31]. This enhanced efficacy is particularly noteworthy given that pathological TNM staging, while recognized as the gold standard for long-term survival prognostication in GC, can only be determined postoperatively [32]. Patients stratified into the high-risk group exhibited significantly worse OS compared to the low-risk group (log-rank P = 0.00012). Importantly, the high risk score remained independently associated with poor prognosis in multivariate analyses adjusted for TNM stage, age, and chemotherapy status, suggesting that 5hmC modification patterns in genes carry critical prognostic implications for GC. This finding aligns with prior studies demonstrating the prognostic value of 5hmC biomarkers in cfDNA across multiple cancer types [14,33–36]. The clinical relevance of such prognostic stratification is further underscored by evidence from other malignancies. For instance, treatment strategy optimization guided by prognostic biomarkers has improved survival outcomes in colorectal cancer [37]. As a treatment-agnostic biomarker capable of predicting OS regardless of therapeutic interventions [37], 5hmC-based risk stratification could inform clinical decision-making by identifying patients who may benefit from intensified surveillance or personalized adjuvant therapies.
Our analysis further revealed that integrating the cfDNA 5hmC epigenetic signature with conventional clinical prognostic parameters significantly improved prognostic discrimination accuracy. This synergistic combination demonstrated dual clinical utility: 1) as an autonomous prognostic biomarker independent of existing staging systems, and 2) as a complementary tool enhancing current risk stratification frameworks for GC. Notably, the composite model retained strong predictive capacity for long-term postoperative survival (C-index = 0.904; 95% CI 0.853–0.955) even after adjusting for therapeutic interventions, establishing 5hmC-based stratification as a treatment-agnostic prognostic determinant.
While this study provides novel insights into 5hmC-based prognostic stratification, several limitations should be acknowledged. The small sample size did not allow validation of the marker panel in independent samples. As the 5hmC prognostic signature was developed from a single-site cohort, its validation in a broader population is warranted to ensure its applicability across varied clinical settings.
Acknowledgments
We are grateful to all the sample donors.
Data Availability
The raw and processed 5hmC-Seal data used for model development in the current study are publicly available from the NCBI GEO database under accession number GSE246110 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE246110). All other relevant data for the current study are available within the paper and its Supporting Information files.
Funding Statement
This research was supported by the Jilin Provincial Postdoctoral Science Foundation (Grant 820241147428), received by Yingli Fu; the National Natural Science Foundation of China (Grant 82373664), received by Jing Jiang; and the International Science and Technology Cooperation Programme (Grant 20240402015 GH), received by Xueyuan Cao. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
References
- 1. Bray F, Laversanne M, Sung H, Ferlay J, Siegel RL, Soerjomataram I, et al. Global cancer statistics 2022: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin. 2024;74(3):229–63. doi: 10.3322/caac.21834
- 2. Allemani C, Matsuda T, Di Carlo V, Harewood R, Matz M, Nikšić M, et al. Global surveillance of trends in cancer survival 2000-14 (CONCORD-3): analysis of individual records for 37 513 025 patients diagnosed with one of 18 cancers from 322 population-based registries in 71 countries. Lancet. 2018;391(10125):1023–75. doi: 10.1016/S0140-6736(17)33326-3
- 3. Luo H, Zhao Q, Wei W, Zheng L, Yi S, Li G, et al. Circulating tumor DNA methylation profiles enable early diagnosis, prognosis prediction, and screening for colorectal cancer. Sci Transl Med. 2020;12(524):eaax7533. doi: 10.1126/scitranslmed.aax7533
- 4. Liang L, Zhang Y, Li C, Liao Y, Wang G, Xu J, et al. Plasma cfDNA methylation markers for the detection and prognosis of ovarian cancer. EBioMedicine. 2022;83:104222. doi: 10.1016/j.ebiom.2022.104222
- 5. Wang Y, Yang L, Bao H, Fan X, Xia F, Wan J, et al. Utility of ctDNA in predicting response to neoadjuvant chemoradiotherapy and prognosis assessment in locally advanced rectal cancer: A prospective cohort study. PLoS Med. 2021;18(8):e1003741. doi: 10.1371/journal.pmed.1003741
- 6. Pessoa LS, Heringer M, Ferrer VP. ctDNA as a cancer biomarker: A broad overview. Crit Rev Oncol Hematol. 2020;155:103109. doi: 10.1016/j.critrevonc.2020.103109
- 7. Wu X, Zhang Y. TET-mediated active DNA demethylation: mechanism, function and beyond. Nat Rev Genet. 2017;18(9):517–34. doi: 10.1038/nrg.2017.33
- 8. Zeng C, Stroup EK, Zhang Z, Chiu BC-H, Zhang W. Towards precision medicine: advances in 5-hydroxymethylcytosine cancer biomarker discovery in liquid biopsy. Cancer Commun (Lond). 2019;39(1):12. doi: 10.1186/s40880-019-0356-x
- 9. Applebaum MA, Barr EK, Karpus J, Nie J, Zhang Z, Armstrong AE, et al. 5-Hydroxymethylcytosine Profiles Are Prognostic of Outcome in Neuroblastoma and Reveal Transcriptional Networks That Correlate With Tumor Phenotype. JCO Precis Oncol. 2019;3:PO.18.00402. doi: 10.1200/PO.18.00402
- 10. Tucker DW, Getchell CR, McCarthy ET, Ohman AW, Sasamoto N, Xu S, et al. Epigenetic Reprogramming Strategies to Reverse Global Loss of 5-Hydroxymethylcytosine, a Prognostic Factor for Poor Survival in High-grade Serous Ovarian Cancer. Clin Cancer Res. 2018;24(6):1389–401. doi: 10.1158/1078-0432.CCR-17-1958
- 11. Wang Z, Du M, Yuan Q, Guo Y, Hutchinson JN, Su L, et al. Epigenomic analysis of 5-hydroxymethylcytosine (5hmC) reveals novel DNA methylation markers for lung cancers. Neoplasia. 2020;22(3):154–61. doi: 10.1016/j.neo.2020.01.001
- 12. Liu J, Jiang J, Mo J, Liu D, Cao D, Wang H, et al. Global DNA 5-Hydroxymethylcytosine and 5-Formylcytosine Contents Are Decreased in the Early Stage of Hepatocellular Carcinoma. Hepatology. 2019;69(1):196–208. doi: 10.1002/hep.30146
- 13. Fu Y-L, Wu Y-H, Cao D-H, Jia Z-F, Shen A, Jiang J, et al. Increased 5-hydroxymethylcytosine is a favorable prognostic factor of Helicobacter pylori-negative gastric cancer patients. World J Gastrointest Oncol. 2022;14(7):1295–306. doi: 10.4251/wjgo.v14.i7.1295
- 14. Chiu BC-H, Zhang Z, You Q, Zeng C, Stepniak E, Bracci PM, et al. Prognostic implications of 5-hydroxymethylcytosines from circulating cell-free DNA in diffuse large B-cell lymphoma. Blood Adv. 2019;3(19):2790–9. doi: 10.1182/bloodadvances.2019000175
- 15. Guler GD, Ning Y, Ku C-J, Phillips T, McCarthy E, Ellison CK, et al. Detection of early stage pancreatic cancer using 5-hydroxymethylcytosine signatures in circulating cell free DNA. Nat Commun. 2020;11(1):5270. doi: 10.1038/s41467-020-18965-w
- 16. Xiao Z, Wu W, Wu C, Li M, Sun F, Zheng L, et al. 5-Hydroxymethylcytosine signature in circulating cell-free DNA as a potential diagnostic factor for early-stage colorectal cancer and precancerous adenoma. Mol Oncol. 2021;15(1):138–50. doi: 10.1002/1878-0261.12833
- 17. Fu Y, Jiang J, Wu Y, Cao D, Jia Z, Zhang Y, et al. Genome-wide 5-hydroxymethylcytosines in circulating cell-free DNA as noninvasive diagnostic markers for gastric cancer. Gastric Cancer. 2024;27(4):735–46. doi: 10.1007/s10120-024-01493-7
- 18. Zhang J, Han X, Gao C, Xing Y, Qi Z, Liu R, et al. 5-Hydroxymethylome in Circulating Cell-free DNA as A Potential Biomarker for Non-small-cell Lung Cancer. Genomics Proteomics Bioinformatics. 2018;16(3):187–99. doi: 10.1016/j.gpb.2018.06.002
- 19. Gross JA, Pacis A, Chen GG, Drupals M, Lutz P-E, Barreiro LB, et al. Gene-body 5-hydroxymethylation is associated with gene expression changes in the prefrontal cortex of depressed individuals. Transl Psychiatry. 2017;7(5):e1119. doi: 10.1038/tp.2017.93
- 20. Kamarudin AN, Cox T, Kolamunnage-Dona R. Time-dependent ROC curve analysis in medical research: current methods and applications. BMC Med Res Methodol. 2017;17(1):53. doi: 10.1186/s12874-017-0332-6
- 21. Collins GS, Reitsma JB, Altman DG, Moons KGM. Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): the TRIPOD statement. BMJ. 2015;350:g7594. doi: 10.1136/bmj.g7594
- 22. He B, Zhang C, Zhang X, Fan Y, Zeng H, Liu J, et al. Tissue-specific 5-hydroxymethylcytosine landscape of the human genome. Nat Commun. 2021;12(1):4249. doi: 10.1038/s41467-021-24425-w
- 23. Wang Y, Deng Y, Tan Y, Zhou M, Jiang Y, Liu B. A comparison of random survival forest and Cox regression for prediction of mortality in patients with hemorrhagic stroke. BMC Med Inform Decis Mak. 2023;23(1):215. doi: 10.1186/s12911-023-02293-2
- 24. Mogensen UB, Ishwaran H, Gerds TA. Evaluating Random Forests for Survival Analysis using Prediction Error Curves. J Stat Softw. 2012;50(11):1–23. doi: 10.18637/jss.v050.i11
- 25. Rahman SA, Maynard N, Trudgill N, Crosby T, Park M, Wahedally H, et al. Prediction of long-term survival after gastrectomy using random survival forests. Br J Surg. 2021;108(11):1341–50. doi: 10.1093/bjs/znab237
- 26.Ishikawa M, Kitayama J, Yamauchi T, Kadowaki T, Maki T, Miyato H, et al. Adiponectin inhibits the growth and peritoneal metastasis of gastric cancer through its specific membrane receptors AdipoR1 and AdipoR2. Cancer Sci. 2007;98(7):1120–7. doi: 10.1111/j.1349-7006.2007.00486.x [DOI] [PMC free article] [PubMed] [Google Scholar]
- 27.Kordafshari M, Nourian M, Mehrvar N, Jalaeikhoo H, Etemadi A, Khoshdel AR, et al. Expression of AdipoR1 and AdipoR2 and Serum Level of Adiponectin in Gastric Cancer. Gastrointest Tumors. 2020;7(4):103–9. doi: 10.1159/000510342 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 28.Choi H, Andersen JP, Molday RS. Expression and functional characterization of missense mutations in ATP8A2 linked to severe neurological disorders. Hum Mutat. 2019;40(12):2353–64. doi: 10.1002/humu.23889 [DOI] [PubMed] [Google Scholar]
- 29.Wang X, Shi D, Zhao D, Hu D. Aberrant Methylation and Differential Expression of SLC2A1, TNS4, GAPDH, ATP8A2, and CASZ1 Are Associated with the Prognosis of Lung Adenocarcinoma. Biomed Res Int. 2020;2020:1807089. doi: 10.1155/2020/1807089 [DOI] [PMC free article] [PubMed] [Google Scholar] [Retracted]
- 30.Zhang X, Dong W, Zhang J, Liu W, Yin J, Shi D, et al. A Novel Mitochondrial-Related Nuclear Gene Signature Predicts Overall Survival of Lung Adenocarcinoma Patients. Front Cell Dev Biol. 2021;9:740487. doi: 10.3389/fcell.2021.740487 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 31.Zhang J, Ding Y, Wang W, Lu Y, Wang H, Wang H, et al. Combining the Fibrinogen/Albumin Ratio and Systemic Inflammation Response Index Predicts Survival in Resectable Gastric Cancer. Gastroenterol Res Pract. 2020;2020:3207345. doi: 10.1155/2020/3207345 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 32.Ji X, Bu Z-D, Yan Y, Li Z-Y, Wu A-W, Zhang L-H, et al. The 8th edition of the American Joint Committee on Cancer tumor-node-metastasis staging system for gastric cancer is superior to the 7th edition: results from a Chinese mono-institutional study of 1663 patients. Gastric Cancer. 2018;21(4):643–52. doi: 10.1007/s10120-017-0779-5 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 33.Weigert M, Cui X-L, West-Szymanski D, Yu X, Bilecz AJ, Zhang Z, et al. 5-Hydroxymethylcytosine signals in serum are a predictor of chemoresistance in high-grade serous ovarian cancer. Gynecol Oncol. 2024;182:82–90. doi: 10.1016/j.ygyno.2024.01.001 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 34.Li S, Wang Y, Wen C, Zhu M, Wang M, Cao G. Integrative Analysis of 5-Hydroxymethylcytosine and Transcriptional Profiling Identified 5hmC-Modified lncRNA Panel as Non-Invasive Biomarkers for Diagnosis and Prognosis of Pancreatic Cancer. Front Cell Dev Biol. 2022;10:845641. doi: 10.3389/fcell.2022.845641 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 35.Shao J, Olsen RJ, Kasparian S, He C, Bernicker EH, Li Z. Cell-Free DNA 5-Hydroxymethylcytosine Signatures for Lung Cancer Prognosis. Cells. 2024;13(4):298. doi: 10.3390/cells13040298 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 36.Cai Z, Zhang J, He Y, Xia L, Dong X, Chen G, et al. Liquid biopsy by combining 5-hydroxymethylcytosine signatures of plasma cell-free DNA and protein biomarkers for diagnosis and prognosis of hepatocellular carcinoma. ESMO Open. 2021;6(1):100021. doi: 10.1016/j.esmoop.2020.100021 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 37.Kerr DJ, Yang L. Personalising cancer medicine with prognostic markers. EBioMedicine. 2021;72:103577. doi: 10.1016/j.ebiom.2021.103577 [DOI] [PMC free article] [PubMed] [Google Scholar]
Data Availability Statement
The raw and processed 5hmC-Seal data used for model development in the current study are publicly available from the NCBI GEO database under accession number GSE246110 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE246110). All other relevant data are available within the paper and its Supporting Information files.




