Author manuscript; available in PMC: 2025 Feb 1.
Published in final edited form as: Clin Cancer Res. 2024 Aug 1;30(15):3337–3348. doi: 10.1158/1078-0432.CCR-24-0199

A 5-hydroxymethylcytosine-based non-invasive model for early detection of colorectal carcinomas and advanced adenomas: the METHOD-2 study

Wenju Chang 1,2,3, Zhou Zhang 4, Baoqing Jia 5, Kefeng Ding 6,7, Zhizhong Pan 8, Guoqiang Su 9, Wei Zhang 10, Tianyu Liu 1,2, Yunshi Zhong 1,11, Guodong He 1,2, Li Ren 1,2, Ye Wei 1,2, Dongdong Li 12, Xiaolong Cui 4,13, Jun Yang 14, Yixiang Shi 14, Marc Bissonnette 15, Chuan He 13,16, Wei Zhang 4,17, Jia Fan 18, Jianmin Xu 1,2
PMCID: PMC11490261  NIHMSID: NIHMS2000251  PMID: 38814264

Abstract

Purpose:

Detection of colorectal carcinomas (CRC) at a stage when more treatment options are available is associated with better outcomes. This prospective case-control study assessed 5-hydroxymethylcytosine (5hmC) biomarkers in circulating cell-free DNA (cfDNA) for early detection of CRC and advanced adenomas (AA).

Experimental Design:

Plasma cfDNA samples were collected from 2,576 participants in the multi-center METHOD-2 study (NCT03676075), comprising patients with newly diagnosed CRC (n=1,074), AA (n=356), other solid tumors (n=80), and non-CRC/AA controls (n=1,066), followed by genome-wide 5hmC profiling using the 5hmC-Seal technique and next-generation sequencing (NGS). A weighted diagnostic model for CRC (stage I-III) and AA was developed using elastic net regularization in a discovery set and validated in independent samples.

Results:

Distribution of 5hmC in cfDNA reflected gene regulatory relevance and tissue of origin. After confirmation in internal validation, a 96-gene model achieved an area under the curve (AUC) of 90.7% for distinguishing stage I-III CRC from controls in 321 external validation samples from multiple centers, regardless of primary tumor location or mutation status. The model also showed cancer-type specificity and a high capacity for distinguishing AA from controls, with an AUC of 78.6%. Functionally, differential 5hmC features associated with CRC and AA showed relevance to CRC biology, including pathways such as calcium and MAPK signaling.

Conclusions:

Genome-wide mapping of 5hmC in cfDNA shows promise as a highly sensitive and specific non-invasive blood test that could be integrated into screening programs to improve early detection of CRC and high-risk AA.

Keywords: colorectal cancer, advanced adenoma, early detection, biomarker, 5-hydroxymethylcytosine, cell-free DNA

BACKGROUND

Colorectal carcinoma (CRC) is the third most common cancer worldwide, with an estimated 1,881,000 new cases, and the second leading cause of cancer-related mortality, with an estimated 916,000 deaths, according to the Global Cancer Statistics 2020 (1). The most important predictor of patient survival is stage at diagnosis: in the United States, 5-year relative survival is 90% for patients diagnosed with localized CRC, in contrast to 14% for patients diagnosed with distant-stage disease (2), and a similar trend has been observed in China (3). Of particular concern is the steady increase in CRC incidence in China, which rose from 100,000 to 560,000 new cases between 1990 and 2020 (1,4,5) and accounted for approximately 30% of global CRC cases in 2020. Colonoscopy remains the “gold standard” for screening for CRC or high-risk advanced adenoma (AA) and has notably improved patient survival in countries that implemented it for mass screening (6,7). However, the invasive nature of endoscopy may cause patient discomfort and can sometimes lead to more severe complications, including bleeding, cardiopulmonary-, sedation-, and infection-related adverse events, and perforation (8,9), which could reduce patient adherence to screening colonoscopy (10–13). Several alternative approaches have been developed to complement screening colonoscopy with the potential to enhance patient adherence, particularly those using blood (e.g., SEPT9 promoter methylation) (14) or stool samples (e.g., fecal immunochemical test [FIT], Cologuard) (15,16). In addition, some countries, including China, initially employ risk assessment questionnaires and prioritize non-invasive methods (e.g., FIT) to identify individuals who may benefit from further evaluation with endoscopy, rather than using colonoscopy as the first-line screening tool (17).
However, previous research has shown that adherence to stool-based tests in real-world settings is substantially lower than the perfect compliance projected by many theoretical models, hampering full realization of the screening benefits of these approaches (18). In contrast, evidence suggests that blood-based tests for early detection or screening are likely to achieve higher patient adherence, which could significantly enhance the effectiveness of screening programs and ultimately improve clinical outcomes (19,20). Additionally, conventional serological tumor markers such as carcinoembryonic antigen (CEA) and carbohydrate antigen (CA) 19–9 are inadequate for detecting CRC or AA because of their poor sensitivity and specificity (21). Therefore, a novel blood test that may improve patient adherence and offers high accuracy for the screening or early detection of colorectal neoplasia, at stages when more treatment options are available, is urgently needed to overcome one of the critical barriers to improving patient survival.

Epigenetic alterations that occur during carcinogenesis, such as aberrant cytosine modifications in tumors or in the surrounding microenvironment due to field cancerization (22), may be exploited as novel molecular targets for early detection. Notably, circulating cell-free DNA (cfDNA) in plasma has been shown to reflect the epigenetic landscapes of tumors and their microenvironment, offering an ideal source for cancer biomarker discovery that may address the challenge that intra-tumoral heterogeneity poses for tissue pathology-based approaches (23). In the human genome, 5-hydroxymethylcytosine (5hmC) is an abundant epigenetic modification generated in an active demethylation process through oxidation of 5-methylcytosine (5mC) by the ten-eleven translocation (TET) enzymes (24). The 5hmC modifications in promoters, gene bodies, and gene regulatory elements (e.g., enhancers) faithfully reflect gene expression activation and tissue specificity (25), and have been implicated in various cancers (26).

Because of constraints of conventional techniques (e.g., bisulfite conversion), most previous epigenetic studies did not distinguish 5hmC from the more abundant 5mC, or simply interpreted all modified cytosines as 5mC (27). In addition, the often-limited amount of cfDNA in patient biospecimens (e.g., a few nanograms from 5–10 mL of plasma) requires a technique that is feasible in a clinical setting. To address these technical gaps, 5hmC-Seal (28,29), a highly sensitive and selective chemical labeling technique, was developed and optimized for genome-wide 5hmC profiling from limited input DNA. Using 5hmC-Seal, recent studies from our team and other groups have identified 5hmC in cfDNA with diagnostic and/or prognostic value for a variety of cancers and chronic diseases (29–37). Although independent studies have highlighted the potential of 5hmC in cfDNA as an informative diagnostic/prognostic marker for CRC (38,39), they have several limitations, including small sample sizes, selection bias, lack of comparison with existing biomarkers, and, most importantly, failure to address detection of high-risk individuals with AA. These gaps warranted the current investigation, which exploited our biospecimen resources from a multi-center clinical trial.

Specifically, in this prospective study (METHOD-2), we employed the 5hmC-Seal technique in cfDNA samples derived from study participants including patients with CRC, AA, and other solid tumors, as well as non-CRC/AA controls from eight medical centers across China (Figure 1). We developed a 5hmC-based machine-learning model for distinguishing CRC (stage I-III) and AA from controls and validated the model in independent samples, with the primary goal of providing a highly sensitive and specific, non-invasive blood test for early detection of CRC and high-risk AA.

Figure 1. The METHOD-2 study for training and validation of a non-invasive 5hmC-based blood test for early detection of CRC and AA.


A. Overview of the METHOD-2 study design. This multi-center, prospective, case-control study enrolled 2,852 participants from eight medical centers across China. Criteria-based screening excluded 165 individuals, and an additional 56 individuals were removed due to hemolysis or experimental failure. Subsequently, 2,628 samples were subjected to 5hmC-Seal profiling. After QC, 52 samples were removed due to low-quality data, resulting in 2,576 samples for analysis, including patients with CRC, AA, and other cancers, as well as non-CRC/AA controls. The primary goal is to train and validate a non-invasive epigenetic test for early detection of CRC and AA. B. Overview of the technical and analytical pipeline of the METHOD-2 study, covering collection of plasma samples, preparation of cfDNA, 5hmC-Seal library construction and sequencing, feature selection and modeling using a machine-learning approach, and validation of model performance. QC, quality control; CRC, colorectal carcinoma; AA, advanced adenoma; CTRL, non-CRC/AA control; wd-score, weighted diagnostic score; cfDNA, cell-free DNA; NGS, next-generation sequencing.

MATERIAL AND METHODS

Study population – the METHOD-2 study

A total of 2,852 adult individuals (≥ 18 years) were enrolled in METHOD-2, a prospective, multicenter, case-control, observational study (NCT03676075, Figure 1A, Appendix 1). Of these, 165 were excluded based on the inclusion and exclusion criteria, and an additional 56 were excluded due to hemolysis or experimental failures, leaving 2,628 samples for 5hmC-Seal profiling. All participants were first invited to take a cancer risk assessment using an established Clinical Cancer Risk Score System, as previously reported (40). Demographic and clinicopathological data at baseline were retrieved from medical records following the established study protocol of each medical center (Appendix 1). We conducted central pathological review of all surgically removed tumors from CRC patients. CRC stages were determined according to the 8th Edition of the American Joint Committee on Cancer (AJCC) Staging System Manual (41). AA was defined by villous features (>25%), size ≥ 1.0 cm, or high-grade dysplasia. We collected additional details such as primary tumor location (e.g., proximal vs. distal colon, rectum), tumor size, RAS mutation, BRAF mutation, and CEA levels measured at the time of blood collection. Non-CRC/AA controls were recruited from individuals with normal routine physical examination results (e.g., B-ultrasound, liver and kidney function, and other routine blood tests) and a negative colonoscopy within 90 days and/or negative histological examination findings prior to sample collection. Specifically, enrolled controls were individuals who: (1) had never been diagnosed with cancer; (2) had consented to receive and complete an investigation questionnaire; (3) were able and willing to undergo a screening colonoscopy within 90 days of enrollment and had a negative result; and (4) were able and willing to provide plasma samples.
The exclusion criteria were set to exclude individuals who had: (1) a history of colorectal neoplasia, digestive cancer, or inflammatory bowel disease; (2) undergone colorectal resection for any reason other than sigmoid diverticula; or (3) overt rectal bleeding within the previous 30 days. This study was conducted in accordance with ethical guidelines as outlined by the Declaration of Helsinki. Written informed consent was obtained from each participant before the study, which was approved by the Institutional Review Board (approval number: B2019–041R) of Zhongshan Hospital, Fudan University, Shanghai, China.

Preparation of biospecimens, construction of 5hmC-Seal libraries, and sequencing

Peripheral blood was collected at the time of diagnosis (prior to tumor resection) for patients and during physical examination for controls, stored at 4°C, and processed into plasma at the laboratory within 2 hours. We assessed DNA concentration using the Qubit High Sensitivity dsDNA Assay (Invitrogen, MA, USA). The processed plasma samples (~5 mL/individual) were then immediately stored at −80°C, following our published protocol (Figure 1B) (29), before shipping to Shanghai Epican Genetech Co., Ltd. (Shanghai, China) for 5hmC-Seal profiling. Circulating cfDNA (10–50 ng/individual) was extracted from plasma using the QIAamp Circulating Nucleic Acid Kit (Qiagen, Germany) according to the manufacturer's instructions. Our study coordinator maintained records of sample collection, processing, and shipment, as well as other experiment-related information (e.g., technician).

The cfDNA samples were randomized for construction of 5hmC-Seal libraries (Supplementary Methods, Supplementary Figure 1), followed by next-generation sequencing (NGS). Briefly, Illumina-compatible adaptors were attached to the cfDNA samples, which were then checked for quality and concentration. Next, T4 bacteriophage β-glucosyltransferase was used to transfer an engineered glucose moiety containing an azide group onto the hydroxyl group of 5hmC across the human genome. The azide group was then chemically modified with biotin, and streptavidin beads were used for affinity enrichment of 5hmC-containing DNA fragments. Subsequently, routine amplification and paired-end sequencing (PE50) were conducted on the Illumina NGS platform (Illumina, CA, USA) at Shanghai Epican Genetech Co., Ltd. (Shanghai, China). On average, ~30 million uniquely mapped reads were obtained per sample. The technical robustness of the 5hmC-Seal approach has been systematically demonstrated in our previous studies (28–30,42).

Bioinformatic processing of the 5hmC-Seal data

Adapter sequences were removed from raw sequencing reads using Trimmomatic (RRID:SCR_011848). Low-quality bases at the 5′ and 3′ ends were trimmed based on Phred scores, following our previous publication (29). The sequencing reads were aligned to the human genome reference (hg19) using Bowtie2 (RRID:SCR_016368) in end-to-end alignment mode. Read pairs were required to be concordantly aligned, with fragment length ≤ 500 bp, on average ≤ 1 ambiguous base, and up to four mismatched bases per 100 bp. Alignments with a mapping quality score ≥ 10 were counted for overlap with GENCODE (RRID:SCR_014966) gene-body annotations using featureCounts (RRID:SCR_012919), without strand information. To pass quality control (QC), a sample was required to have at least 100,000 read counts uniquely mapped to the ~20,000 known protein-coding gene bodies.
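The read-level filters and the per-sample QC rule above can be illustrated with a minimal sketch. It assumes that alignments have already been parsed into simple records; the `ReadPair` record and the `count_gene_bodies`/`passes_qc` helpers are hypothetical and do not reproduce Bowtie2 or featureCounts (for example, reads overlapping several genes are not handled).

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional

# Thresholds taken from the text above.
MIN_MAPQ = 10                   # mapping quality score cutoff
MAX_FRAGMENT_BP = 500           # maximum concordant fragment length
MIN_GENE_BODY_COUNTS = 100_000  # per-sample QC threshold


@dataclass
class ReadPair:
    """Hypothetical, pre-parsed summary of one aligned read pair."""
    mapq: int
    fragment_length: int
    concordant: bool
    gene_id: Optional[str]  # overlapping protein-coding gene body, or None


def count_gene_bodies(pairs: Iterable[ReadPair]) -> Dict[str, int]:
    """Count read pairs per gene body after applying the alignment filters."""
    counts: Dict[str, int] = {}
    for p in pairs:
        if not p.concordant or p.fragment_length > MAX_FRAGMENT_BP:
            continue
        if p.mapq < MIN_MAPQ or p.gene_id is None:
            continue
        counts[p.gene_id] = counts.get(p.gene_id, 0) + 1
    return counts


def passes_qc(counts: Dict[str, int], threshold: int = MIN_GENE_BODY_COUNTS) -> bool:
    """A sample passes QC if its total gene-body count reaches the threshold."""
    return sum(counts.values()) >= threshold
```

In practice the per-gene counts come from featureCounts; the sketch only makes the order of the filters and the QC decision explicit.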

The same procedure was performed for the promoter regions (3 kb upstream of TSS [transcription start site]), and colon-derived enhancer markers (i.e., peak files for H3K4me1 and H3K27ac derived from colonic mucosa, rectal mucosa, and sigmoid colon with overlapping peaks merged) from the Roadmap Epigenomics Project (43). The raw read counts were then normalized using DESeq2 (RRID:SCR_000154), which performs an internal normalization that corrects for library size. In order to control unwanted sources of variation in high-throughput experiments, the ComBat (RRID:SCR_010974) tool was used to minimize batch effects.
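DESeq2's default library-size correction uses the median-of-ratios method. The plain-Python sketch below re-implements that idea for illustration only; it is not the DESeq2 code, the function names are hypothetical, and ComBat batch correction is not shown. As in the median-of-ratios approach, genes with a zero count in any sample are excluded from the reference.

```python
import math
from statistics import median
from typing import List


def size_factors(counts: List[List[float]]) -> List[float]:
    """Median-of-ratios size factors; counts[s][g] is the raw count of gene g in sample s."""
    n_genes = len(counts[0])
    # Per-gene log geometric mean across samples; genes with any zero count are skipped.
    log_geo_means = []
    for g in range(n_genes):
        values = [sample[g] for sample in counts]
        if all(v > 0 for v in values):
            log_geo_means.append(sum(math.log(v) for v in values) / len(values))
        else:
            log_geo_means.append(None)
    factors = []
    for sample in counts:
        # Log ratio of this sample's count to the gene's geometric mean, per usable gene.
        log_ratios = [math.log(sample[g]) - lg
                      for g, lg in enumerate(log_geo_means) if lg is not None]
        factors.append(math.exp(median(log_ratios)))
    return factors


def normalize(counts: List[List[float]]) -> List[List[float]]:
    """Divide each sample's counts by its size factor."""
    sf = size_factors(counts)
    return [[c / f for c in sample] for sample, f in zip(counts, sf)]
```

A sample sequenced twice as deeply receives a size factor twice as large, so its normalized counts line up with those of the shallower sample.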

Study design and sample grouping

After 5hmC-Seal profiling of cfDNA samples from 2,628 participants, 2,576 samples passed QC and were included in the downstream analysis, including newly diagnosed CRC (n=1,074), AA (n=356), and non-CRC/AA controls (n=1,066) recruited from eight medical centers across China from January 2019 to January 2020 (Supplementary Tables 1 and 2). Additionally, 80 cancer patients were recruited from Zhongshan Hospital, Shanghai, China, to assess model specificity, including patients with esophageal carcinoma (EC), hepatocellular carcinoma (HCC), lung adenocarcinoma (LUAD), and stomach adenocarcinoma (STAD) (Supplementary Table 3). Figure 2 presents the overall study design for model development. Participants recruited from four hospitals in Shanghai and the neighboring Zhejiang Province were combined as a discovery set (664 stage I-III CRC, 246 AA, and 960 controls), which was randomly divided into a training set (CRC, n=445; AA, n=167; controls, n=642) and an internal validation set (CRC, n=219; AA, n=79; controls, n=318). The additional 144 CRC patients from Shanghai and Zhejiang Province with stage IV disease or unknown pathological stage were set aside. Participants recruited at four centers in Northern and Southern China formed an independent external validation set (CRC, n=266; AA, n=110; controls, n=106), with the aim of evaluating model performance in samples collected outside the primary sites, given potential heterogeneity in geographic and socioeconomic background. Clinical data were extracted from the electronic medical records of each participating hospital. Detailed data collection, management, and storage methodologies are presented in the study protocol (Appendix 1).
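The random division of the discovery set into training and internal validation subsets can be illustrated with a stratified split. The sketch below assumes a two-thirds training fraction within each diagnostic group; the source does not state whether the split was stratified, and the fraction, seed, and `stratified_split` helper are illustrative assumptions.

```python
import random
from typing import Dict, List, Sequence, Tuple


def stratified_split(labels: Sequence[str], train_fraction: float = 2 / 3,
                     seed: int = 0) -> Tuple[List[int], List[int]]:
    """Randomly split sample indices into training/validation within each label group."""
    rng = random.Random(seed)
    groups: Dict[str, List[int]] = {}
    for idx, label in enumerate(labels):
        groups.setdefault(label, []).append(idx)
    train: List[int] = []
    valid: List[int] = []
    for indices in groups.values():
        shuffled = indices[:]
        rng.shuffle(shuffled)
        k = round(len(shuffled) * train_fraction)  # training share of this group
        train.extend(shuffled[:k])
        valid.extend(shuffled[k:])
    return sorted(train), sorted(valid)
```

Applied to the discovery set (664 CRC, 246 AA, 960 controls), this rule assigns 443 CRC, 164 AA, and 640 controls to training, close to but not identical with the 445/167/642 reported above.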

Figure 2. Study design and participant disposition for model development.


In total, 2,576 cfDNA samples passing QC are grouped into a training set, an internal validation set, and an external validation set. The training set and the internal validation set comprise patients with CRC (stage I-III), AA, and controls from Shanghai and the neighboring Zhejiang Province in Eastern China. Study participants from other medical centers in Northern and Southern China are grouped into an external validation set. Patients with stage IV tumors are tested separately to evaluate model performance. Cancer specificity of the model is evaluated in a set of patients with non-CRC cancers (n=80). QC, quality control; CRC, colorectal carcinoma; AA, advanced adenoma; CTRL, non-CRC/AA control; EC, esophageal carcinoma; HCC, hepatocellular carcinoma; LUAD, lung adenocarcinoma; STAD, stomach adenocarcinoma.

Exploring gene regulatory elements and tissue of origin

The 5hmC-Seal data obtained from CRC, AA, and controls were plotted across various genomic features, including gene bodies, TSS, exons, and colon-derived enhancer markers. The Pearson’s correlation between gene bodies and promoters was calculated for each individual, and a null distribution was simulated by bootstrapping (N=10,000 times). Peaks showing differential modification were identified by DESeq2 (RRID:SCR_000154), and summarized by varying fold-change thresholds. Taking advantage of our published 5hmC Human Tissue map (25,44), we also examined the ranking of top differentially modified gene bodies (e.g., CRC/AA vs. controls) across 18 normal tissues to explore tissue of origin. Briefly, differentially modified gene bodies (p<0.01) in the training set were systematically ranked according to the occurrence across 18 tissue-specific 5hmC profiles (Supplementary Methods). Ranking in different tissues for each differential gene body was summarized and plotted in a heatmap.
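A minimal sketch of the promoter versus gene-body correlation test is shown below. The text states only that the null distribution was simulated by bootstrapping with 10,000 iterations; here, as a stand-in, the null is generated by randomly re-pairing gene-body values with promoters within one individual (a permutation scheme), and a one-sided empirical p-value is computed with the usual +1 correction. All names are hypothetical.

```python
import math
import random
from typing import Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_vs_null(promoter: Sequence[float], gene_body: Sequence[float],
                        n_iter: int = 10_000, seed: int = 0) -> Tuple[float, float]:
    """Observed promoter/gene-body correlation and a one-sided empirical p-value.

    The null re-pairs gene-body values with promoters at random (an assumption;
    the original resampling scheme is not specified).
    """
    rng = random.Random(seed)
    observed = pearson(promoter, gene_body)
    shuffled = list(gene_body)
    at_least_as_high = 0
    for _ in range(n_iter):
        rng.shuffle(shuffled)
        if pearson(promoter, shuffled) >= observed:
            at_least_as_high += 1
    # The +1 correction keeps the empirical p-value strictly positive.
    return observed, (at_least_as_high + 1) / (n_iter + 1)
```

With 10,000 iterations the smallest attainable p-value is about 1e-4, which is enough to support the empirical p<0.001 reported for Figure 3C.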

Development of a machine-learning model to detect CRC and AA

We applied a two-step procedure to develop a weighted diagnostic score (wd-score) for distinguishing CRC (stage I-III)/AA from controls (Figure 1B, Figure 2). In step 1, a multivariable logistic regression model, controlling for age and sex, selected the most informative 5hmC features (i.e., gene bodies) in the training set (p<0.01). In step 2, elastic net regularization, implemented with glmnet (RRID:SCR_015505), was used to fit a multivariable logistic regression model for distinguishing CRC/AA from controls. Parameters were tuned by 5-fold cross-validation over a grid of values for α (0.05 to 1.0, controlling the relative proportion of the Ridge and Lasso penalties) and λ (10⁻⁵ to 1.0, controlling the overall strength of the penalty). This selection process was repeated 100 times. Gene bodies with an importance value > 30 that were selected in at least 80% of the iterations were retained to fit the final logistic model. The wd-score was calculated as:

$$\text{wd-score} = \frac{1}{1 + e^{-\left(\beta_0 + \sum_k \beta_k G_k\right)}}$$

where β0 is the intercept, and for the kth gene, βk is the final logistic regression coefficient and Gk is the normalized 5hmC level. The wd-score was scaled between 0 and 1. The area under the curve (AUC) and 95% confidence intervals (CI) were computed to assess model performance. Sensitivity and specificity were computed using the wd-score cutoff ensuring a minimum of 90% specificity in the training set.
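The wd-score definition and the 90%-specificity cutoff rule can be sketched as follows. This is an illustrative sketch, not the authors' code: coefficients and 5hmC levels are plain lists, the function names are hypothetical, and a case is assumed to be called positive when its score is at or above the cutoff.

```python
import math
from typing import Sequence


def wd_score(intercept: float, coefs: Sequence[float], levels: Sequence[float]) -> float:
    """Logistic wd-score: 1 / (1 + exp(-(beta0 + sum_k beta_k * G_k)))."""
    linear = intercept + sum(b * g for b, g in zip(coefs, levels))
    return 1.0 / (1.0 + math.exp(-linear))


def specificity(control_scores: Sequence[float], cutoff: float) -> float:
    """Fraction of controls called negative (score below the cutoff)."""
    return sum(s < cutoff for s in control_scores) / len(control_scores)


def sensitivity(case_scores: Sequence[float], cutoff: float) -> float:
    """Fraction of cases called positive (score at or above the cutoff)."""
    return sum(s >= cutoff for s in case_scores) / len(case_scores)


def cutoff_for_specificity(control_scores: Sequence[float], target: float = 0.90) -> float:
    """Lowest observed control score that still gives at least `target` specificity."""
    for candidate in sorted(set(control_scores)):
        if specificity(control_scores, candidate) >= target:
            return candidate
    # Fallback: a cutoff just above every control score (100% specificity).
    return max(control_scores) + 1e-9
```

A rule of this kind, applied once to the training-set controls, yields a single fixed cutoff (the Results report 0.556) that is then carried unchanged into the validation sets.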

Evaluation of model performance and clinical variables

Multivariable logistic regression models were used to assess whether the wd-scores were independent from various clinicopathological variables in the validation sets. The performance of the wd-scores was also compared with CEA (cutoff: ≥5 ng/mL) at the time of blood collection in those samples with available data. To demonstrate cancer-type specificity, we computed the wd-scores in a set of 80 patients with solid tumors, including EC, HCC, LUAD, or STAD (Supplementary Table 3). We also evaluated the differences in wd-scores by primary tumor location (colon or rectum), and by the presence of mutations in specific genes (i.e., KRAS, NRAS, BRAF, and PIK3CA) compared to patients with wild-type alleles.

Exploration of functional insights

We used clusterProfiler (RRID:SCR_016884) to explore the functional relevance of the differentially modified genes (gene bodies) through enrichment of canonical pathways maintained in the Kyoto Encyclopedia of Genes and Genomes database (KEGG, RRID:SCR_012773). Pathways with at least ten genes and a 5% false discovery rate (FDR) were considered significant. To understand the system-level functionality of the differentially modified genes, we used CEMiTool (45) to reveal modules of 5hmC co-modification networks. Differential genes with an absolute coefficient > 0.6 in the logistic regression model distinguishing CRC (stage I-III)/AA from controls in the training set were used for the co-modification network analysis. In addition, we searched protein-protein interactions (PPI) in the Human Reference Interactome (HuRI) (46) map to identify potential hub genes among the differentially modified gene bodies. For each identified module (i.e., a group of co-modified 5hmC-containing genes), the Normalized Enrichment Score (NES) was calculated (47). Specifically, Gene Set Enrichment Analysis (GSEA) was performed using the component genes of each module as the reference gene set and the median z-score values of each group (i.e., CRC/AA and controls) as the rank (47). For each NES, the Benjamini-Hochberg adjusted p-value (48) was calculated, and an adjusted p-value (FDR) < 0.05 was considered significant.
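The Benjamini-Hochberg adjustment (48) applied to the module NES p-values is a standard step-up procedure. A plain-Python sketch is given below for illustration; the function name is hypothetical, and within the R environment used here R's `p.adjust(method = "BH")` would be the usual tool.

```python
from typing import List, Sequence


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (FDR), returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity of adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

A module would then be called significant when its adjusted value falls below 0.05.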

In addition, using the RNA-seq data on colon cancer from The Cancer Genome Atlas (TCGA) Project (tumors, n=469; normal tissues, n=41), we investigated whether the final model components based on 5hmC exhibited gene dysregulation in TCGA data (49).

Statistical analysis

All statistical analyses were performed under the R Statistical Computing Environment (v4.1.2) (50). When testing the relationship between wd-scores and clinicopathological variables, continuous variables were analyzed using the two-sided Student’s t-tests, and categorical variables were analyzed using the Chi-square test. The limma R package (RRID:SCR_010943) was used to detect differentially expressed genes between TCGA colon cancer tumors and normal tissues. A p-value < 0.05 was considered statistically significant. When comparing histone peaks co-localized with model genes between CRC/AA and controls, a 5% FDR cutoff was used. To assess the discriminative performance of the wd-scores, a receiver operating characteristic (ROC) curve was generated, and an AUC was computed with 95% CIs.
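The AUC equals the probability that a randomly chosen case receives a higher wd-score than a randomly chosen control (the Mann-Whitney formulation, with ties counted as one half). The sketch below computes it directly and, as an assumption, attaches a percentile bootstrap 95% CI that resamples cases and controls separately. The text does not state which CI method was used (DeLong's method is a common alternative), so the sketch is illustrative only and the function names are hypothetical.

```python
import random
from typing import Sequence, Tuple


def auc(case_scores: Sequence[float], control_scores: Sequence[float]) -> float:
    """Mann-Whitney AUC: P(case score > control score), ties counted as 0.5."""
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


def bootstrap_auc_ci(case_scores: Sequence[float], control_scores: Sequence[float],
                     n_boot: int = 2000, alpha: float = 0.05,
                     seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap CI, resampling cases and controls separately."""
    rng = random.Random(seed)
    stats = []
    for _ in range(n_boot):
        cases = rng.choices(case_scores, k=len(case_scores))
        controls = rng.choices(control_scores, k=len(control_scores))
        stats.append(auc(cases, controls))
    stats.sort()
    lo_idx = int(alpha / 2 * n_boot)
    hi_idx = min(n_boot - 1, int((1 - alpha / 2) * n_boot))
    return stats[lo_idx], stats[hi_idx]
```

The double loop is quadratic in sample size; a rank-based computation would be preferable for thousands of samples, but the direct form makes the definition explicit.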

Data availability

The raw 5hmC-Seal data have been deposited in the National Omics Data Encyclopedia (NODE) database of the Chinese Academy of Sciences and can be downloaded at this link: https://www.biosino.org/node/project/detail/OEP003666. Additional data generated in this study, including the data underlying the graphs and figures, are available upon formal request to the corresponding authors.

RESULTS

Genome-wide 5hmC profiles in cfDNA reveal gene regulatory relevance

Overall, principal component analysis (PCA) showed no systematic bias of genome-wide 5hmC (~20,000 gene bodies) across groups (CRC, AA, controls) or medical centers (Figure 3A, B). In all study participants, the correlation of 5hmC modification levels between promoters and gene bodies was higher than expected under a simulated null distribution (Figure 3C), supporting a non-random distribution of these epigenetic modifications in genic regions. Genome-wide 5hmC modifications were enriched in genic regions, particularly in gene bodies relative to their flanking regions (Figure 3D), as well as around exons and colon-derived enhancer markers (H3K4me1, H3K27ac) (Figure 3E), consistent with previous studies (29–31).

Figure 3. Genome-wide 5hmC profiles in cfDNA reveal gene regulatory relevance and tissue of origin.


A-B. Principal component analysis (PCA) shows no systematic bias of genome-wide 5hmC (~20,000 gene bodies) across patient groups (A) or medical centers (B). C. The 5hmC modification levels show a higher correlation (red dashed line) between promoter regions and gene bodies than a simulated null distribution obtained by bootstrapping (empirical p<0.001). D-E. Summarized 5hmC distributions in CRC, AA, and controls at various genomic features. “A”, splicing acceptor; “D”, splicing donor; TSS, transcription start site; CRC, colorectal carcinoma; AA, advanced adenoma; CTRL, non-CRC/AA control.

Differentially modified signatures in CRC and AA

In the training set, 1,391 gene bodies showed a trend of differential modification (p<0.01) between CRC (stage I-III)/AA and controls after controlling for age and sex (Supplementary Figure 2A, Supplementary Table 4). These gene bodies served as candidate markers for subsequent feature selection in the modeling stage. Ranking analysis revealed that the top-ranking gene bodies closely matched colon-derived 5hmC profiles. Specifically, based on our 5hmC Human Tissue map (25), the top-ranking gene bodies most frequently ranked in the first position for colon-specific 5hmC profiles when compared across tissue types (Supplementary Figure 2B), indicating that the 5hmC signatures associated with CRC/AA may reflect tissue of origin.
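The tissue-ranking summary can be sketched as a rank tally: for each differential gene body, tissues are ordered by their 5hmC level, and the number of genes placing each tissue at each rank is counted, giving a table of the kind shown in a heatmap. The detailed procedure is in the Supplementary Methods and is not reproduced here; the input layout, the treatment of missing genes as 0.0, and the `tissue_rank_table` helper are assumptions.

```python
from collections import Counter
from typing import Dict, Iterable


def tissue_rank_table(tissue_levels: Dict[str, Dict[str, float]],
                      genes: Iterable[str]) -> Dict[str, Counter]:
    """For each tissue, count how often it occupies each rank (1 = highest 5hmC).

    tissue_levels[tissue][gene] is a (hypothetical) tissue-level 5hmC value;
    missing genes are treated as 0.0. Ties keep the input order of tissues.
    """
    tissues = list(tissue_levels)
    table = {t: Counter() for t in tissues}
    for gene in genes:
        ordered = sorted(tissues, key=lambda t: tissue_levels[t].get(gene, 0.0),
                         reverse=True)
        for rank, tissue in enumerate(ordered, start=1):
            table[tissue][rank] += 1
    return table
```

Under this sketch, a colon-derived signature would show up as the colon row having the largest count at rank 1.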

Performance of a 5hmC-based machine-learning model for detecting CRC/AA

Elastic net regularization of the 1,391 candidate gene bodies through logistic regression yielded 96 highly consistent features as the final model components for detecting CRC (stage I-III)/AA (Supplementary Figure 2C). Notably, 48.9% (303/619) of colon-derived H3K4me1 peaks and 48.0% (324/475) of H3K27ac peaks co-localized with the 96 model genes also showed differential modification between CRC (stage I-III)/AA and controls in the training set at 5% FDR (Supplementary Figure 3). This co-localization pattern suggested a potential mechanism involving cis-regulatory elements for the 5hmC-based model genes. The modest discriminatory power of the individual model genes, with AUCs ranging from 52.0% to 65.2% (Supplementary Figure 4), underscored the need for an integrated, weighted model, i.e., the proposed wd-scores, for distinguishing CRC/AA from controls.
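The retention rule described in the Methods (importance value > 30 within an iteration, and selection in at least 80% of the 100 iterations) can be sketched as a simple frequency filter. The sketch assumes per-iteration importance values are already available from the fitted models; how importance was scaled in the original analysis is not stated, and the helper names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set


def selected_in_iteration(importance: Dict[str, float],
                          min_importance: float = 30.0) -> Set[str]:
    """Gene bodies whose importance exceeds the threshold in one iteration."""
    return {gene for gene, value in importance.items() if value > min_importance}


def stable_features(iterations: Iterable[Dict[str, float]], min_fraction: float = 0.8,
                    min_importance: float = 30.0) -> List[str]:
    """Gene bodies selected in at least `min_fraction` of the iterations."""
    selections = [selected_in_iteration(imp, min_importance) for imp in iterations]
    freq = Counter(gene for sel in selections for gene in sel)
    n = len(selections)
    return sorted(gene for gene, count in freq.items() if count / n >= min_fraction)
```

Requiring repeated selection across resampled fits favors features that are robust to the particular cross-validation split, which is the intent of the "highly consistent features" wording.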

In the training set, the wd-scores effectively distinguished stage I-III CRC or AA from controls (AUC = 94.6%; 95% CI: 93.4–95.7%) (Figure 4A). Similar performance was achieved in the internal (AUC = 92.6%; 95% CI: 90.7–94.5%) and external validation sets (AUC = 87.2%; 95% CI: 83.5–90.9%) (Figure 4A). The wd-scores performed comparably in differentiating stage I-III CRC from controls and AA from controls, with slightly better results for CRC (Figure 4B, C). Specifically, for stage I-III CRC, AUCs were 95.6% in the training set, 94.3% in the internal validation set, and 90.7% in the external validation set (Figure 4B), while for AA, AUCs were 91.8%, 87.9%, and 78.6%, respectively (Figure 4C).

Figure 4. Performance of the 5hmC-based model for detecting CRC and AA.


A-C. The AUCs and 95% CIs of the wd-scores in the training, internal validation, and external validation sets for distinguishing: CRC (stage I-III) and AA from controls (A); CRC (stage I-III) from controls (B); and AA from controls (C). D. The boxplots show the distributions of wd-scores by diagnosis in the training and both validation sets. CRC, colorectal carcinoma; AA, advanced adenoma; CTRL, non-CRC/AA control; AUC, area under the curve; CI, confidence interval; wd-score, weighted diagnostic score.

Notably, the wd-scores increased progressively from controls to AA to CRC, with CRC patients assigned significantly higher scores than AA patients or controls in all sets (Student’s t-test; p<0.001, Figure 4D). At a cutoff of 0.556, the wd-scores distinguished CRC (stage I-III)/AA from controls with high specificity (90.0% in the training set, 87.4% in the internal validation set, and 83.0% in the external validation set) and sensitivity (83.3%, 80.0%, and 71.3%, respectively). Additionally, the wd-scores remained significantly associated with diagnosis in multivariable logistic regression (p<0.001) after adjusting for age, sex, and primary tumor location (colon or rectum) (Supplementary Figure 5A, B). Moreover, the discriminatory capacity of the wd-scores was independent of RAS, BRAF, and PIK3CA mutation status (Supplementary Figure 5C–F), suggesting that this blood-based 5hmC test holds promise for patients with mutationally heterogeneous tumors.

Additionally, although it was not our primary focus, we assessed the model’s performance in differentiating stage I-III CRC from AA. The model achieved an AUC of 71.2% (95% CI: 64.7–77.6%) in the internal validation set and 70.1% (95% CI: 64.2–76.1%) in the external validation set (Supplementary Figure 6A). Performance was better for later stages (II-III) than for stage I versus AA in all sample sets (Supplementary Figure 6B).

Comparison with CEA and cancer-type specificity

The wd-scores consistently showed higher accuracy than CEA across all tumor stages; CEA showed AUCs of 78.0% in training, 77.1% in internal validation, and 73.2% in external validation (Figure 5A). Notably, the 5hmC-based wd-scores detected stage I-III CRC that would have been missed by CEA alone. For instance, in the internal validation set, of the 236 patients with stage I-III CRC, 145 (61.4%) would be misclassified as “CEA−” (i.e., CEA below the conventional cutoff of 5 ng/mL). In contrast, the wd-scores correctly identified 124 (57.7%) of these patients as “wd-score+”, significantly improving sensitivity over CEA alone (Figure 5B). Similar performance was observed in the external validation samples (Supplementary Figure 7). Moreover, the wd-scores outperformed CEA in all patient subgroups with CRC (stage I-III)/AA (Figure 5C). Specifically, at a specificity of 90%, the wd-scores achieved sensitivities of 59.5% for AA, 73.5% for stage I CRC, and 85.3% for stage II/III CRC in the internal validation set, compared with only 12.8%, 29.4%, and 47.2%, respectively, for CEA (Figure 5D). In the external validation set, despite lower sensitivities of 34.5% for AA, 55.8% for stage I CRC, and 68.0% for stage II/III CRC, the wd-scores continued to outperform CEA, which showed 28.8%, 25.0%, and 53.8% sensitivity in the respective categories (Figure 5D). Additionally, the wd-scores demonstrated cancer-type specificity, with a significantly higher detection rate for AA and stage I-IV CRC (external validation) than for EC, HCC, LUAD, and STAD (Supplementary Figure 8).
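The “CEA−” versus “wd-score+” comparison amounts to a simple cross-tabulation of two binary calls. The sketch below assumes, for illustration, that the 0.556 wd-score cutoff reported above defines “wd-score+”; each patient is represented as a hypothetical (CEA in ng/mL, wd-score) pair, and the helper name is hypothetical.

```python
from typing import Sequence, Tuple

CEA_CUTOFF_NG_ML = 5.0  # conventional CEA cutoff used in the text
WD_CUTOFF = 0.556       # wd-score cutoff reported in the Results (assumed here)


def cea_negative_rescue(patients: Sequence[Tuple[float, float]],
                        cea_cutoff: float = CEA_CUTOFF_NG_ML,
                        wd_cutoff: float = WD_CUTOFF) -> Tuple[int, int]:
    """Return (#CEA-negative patients, #of those called positive by the wd-score).

    Each patient is a hypothetical (cea_ng_ml, wd_score) pair.
    """
    cea_negative = [p for p in patients if p[0] < cea_cutoff]
    rescued = [p for p in cea_negative if p[1] >= wd_cutoff]
    return len(cea_negative), len(rescued)
```

Dividing the second count by the first gives the fraction of CEA-missed cancers recovered by the 5hmC test.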

Figure 5. Comparison of model performance with CEA and evaluation of cancer specificity.

Figure 5.

A. The AUCs and 95% CIs for CEA in the training, internal validation, and external validation sets are shown. B. The wd-scores can detect stage I-III CRC patients who would be misclassified by CEA at the time of blood collection. “CEA+” denotes patients who would be detected by CEA (cutoff: ≥5 ng/mL), and “CEA−” denotes those who would be missed by CEA. C-D. The wd-scores outperform CEA in terms of AUC (C) and sensitivity (D) in all sample sets, regardless of tumor stage. The 95% CIs are presented as error bars. CRC, colorectal carcinoma; AA, advanced adenoma; CTRL, non-CRC/AA control; AUC, area under the curve; CI, confidence interval; CEA, carcinoembryonic antigen; wd-score, weighted diagnostic score.

Evaluation of the early detection model in late-stage CRC

Despite being trained on CRC (stage I-III) and AA, the wd-scores performed comparably in patients with late-stage tumors. In the internal validation set, the wd-scores detected 106 of 138 (76.8%) patients with stage IV CRC, with an AUC of 88.7%. Similarly, in the external validation set of 23 stage IV tumors, the wd-scores achieved 70.0% sensitivity and an AUC of 89.5% (Supplementary Figure 9).

Functional relevance of the 5hmC signatures for CRC and AA

Functional annotation of the 5hmC signatures for CRC and AA (i.e., the 1,391 differential gene bodies identified in the training set) suggested enrichment of cancer-relevant KEGG pathways (Figure 6A), including neuroactive ligand-receptor interaction and several signaling pathways, such as calcium, MAPK, and cAMP signaling (Supplementary Table 5). The co-modification analysis identified 6 interaction/co-modification modules involving the differential gene bodies for CRC and AA (i.e., the 628 of the 1,391 differential gene bodies with an absolute coefficient > 0.6) (Figure 6B, C; Supplementary Figure 10). Interestingly, many of the differentially modified genes in these modules have been linked to the etiopathogenesis or metastasis of CRC. For example, GJA1 (encoding gap junction alpha-1), from the most significant module (M2) (Figure 6C, D), was one of the final components of our 5hmC-based model for CRC and AA; GJA1 has been reported as a prognostic biomarker that correlates with immune infiltrates in CRC (51). Another marker gene, CACNA1A (encoding calcium voltage-gated channel subunit alpha1 A), is a member of both the enriched calcium signaling pathway and M2, and has been implicated as a potential cancer therapeutic target (52).

Figure 6. Functional implications of the differentially modified 5hmC signatures associated with CRC and AA.

Figure 6.

A. Shown are enriched KEGG pathways (FDR<0.05) of the 1,391 differentially modified 5hmC signatures associated with CRC (stage I-III) and AA (p<0.01) in the training set. “GeneRatio” is the ratio between the number of differential genes and the total genes in a particular pathway. B. The co-modification analysis detects 6 interaction/co-modification modules involving 628 differential 5hmC signatures (absolute value of coefficient > 0.6) associated with CRC (stage I-III) and AA. C. Shown is an example of the protein-protein interaction (PPI) network constructed for the detected module (M2) from the co-modification analysis. D. Shown is an example of the distribution of 5hmC reads within the gene body region of GJA1, a hub gene in the M2 module, in a random subset of CRC (stage I-III) and non-CRC/AA control samples. CRC, colorectal carcinoma; AA, advanced adenoma; CTRL, non-CRC/AA control; FDR, false discovery rate; KEGG, Kyoto Encyclopedia of Genes and Genomes database; NES, normalized enrichment score.
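Figure 6A keeps KEGG pathways at FDR<0.05, and the reference list cites the Benjamini-Hochberg procedure (48). Assuming an over-representation analysis (the enrichment test itself is not named in this section), each pathway's p-value can come from a one-sided hypergeometric test of the overlap between the differential genes and the pathway, after which the p-values are BH-adjusted across pathways. The sketch below shows both steps; the helper names and parameters are illustrative, not the authors' code.

```python
from math import comb
from typing import List, Sequence


def hypergeom_sf(k: int, M: int, n: int, N: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(M background genes, n in pathway, N drawn).

    Here N is the number of differential genes and k their overlap with the pathway.
    """
    upper = min(n, N)
    hits = sum(comb(n, i) * comb(M - n, N - i) for i in range(max(k, 0), upper + 1))
    return hits / comb(M, N)


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """BH-adjusted p-values (FDR q-values), returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the q-values stay monotone in rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

A pathway would then be retained when its adjusted value from `benjamini_hochberg` is below 0.05, mirroring the FDR<0.05 filter in the figure.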

In addition, for 92 of the 96 final model component genes, normalized RNA-seq data for 469 primary tumors and 41 normal tissues were extracted from the TCGA colon cancer dataset. Of these 92 genes, 64 (69.6%) showed a trend of differential expression (p<0.05) between tumors and normal tissues (Supplementary Figure 11), suggesting that these cfDNA 5hmC markers are relevant to tumor transcriptional dysregulation in the independent TCGA samples.
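The TCGA comparison counts how many of the 92 model genes reach p<0.05 between 469 tumors and 41 normal tissues. The statistical test is not named in this section; the sketch below assumes a two-sided Mann-Whitney U test with a normal approximation and tie-corrected variance, applied gene by gene. `mann_whitney_p` and `fraction_significant` are hypothetical helpers, not the authors' code.

```python
import math
from typing import Sequence


def mann_whitney_p(x: Sequence[float], y: Sequence[float]) -> float:
    """Two-sided Mann-Whitney U test p-value via the normal approximation.

    Tied values get average ranks and the variance is tie-corrected;
    no continuity correction is applied. Both samples must be non-empty.
    """
    n1, n2 = len(x), len(y)
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    tie_term = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied block
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    r1 = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u1 = r1 - n1 * (n1 + 1) / 2
    n = n1 + n2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return 1.0
    z = (u1 - n1 * n2 / 2) / math.sqrt(var_u)
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))


def fraction_significant(pvalues: Sequence[float], alpha: float = 0.05) -> float:
    """Share of genes whose unadjusted p-value falls below alpha."""
    return sum(p < alpha for p in pvalues) / len(pvalues)
```

With 64 of 92 per-gene p-values below 0.05, `fraction_significant` returns 64/92 ≈ 0.696, which corresponds to the 69.6% reported. These p-values are unadjusted, consistent with the text's description of a "trend".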

DISCUSSION

Although screening colonoscopy is widely used for detecting colonic adenomas and cancers and has been recommended for all adults aged 45 to 75 years by the US Preventive Services Task Force and the Chinese Society of Clinical Oncology, patient adherence has been poor (e.g., ~40% not compliant in the US) (6,53,54). Poor adherence has been attributed to the high costs of equipment and procedures, the invasiveness, associated pain, and potential complications of the procedure, and the significant time commitment required (11–13). Studies have shown that the unpleasantness and discomfort associated with colonoscopy and bowel preparation negatively impact adherence (10,55,56). As a result, individuals often favor less invasive screening methods, such as fecal occult blood tests, despite their lower sensitivity and specificity, particularly for local and regional CRC or AA (56,57). Compared with fecal tests (e.g., Cologuard) and colonoscopy, blood-based tests are likely to be preferred by patients, as suggested by a German study (19). However, blood tests such as plasma SEPT9 methylation, with an overall sensitivity of 48% for CRC screening, have shown limited efficacy in detecting AA, with only 21% sensitivity and 79% specificity in a recent study (58). Therefore, we sought to develop a non-invasive, more patient-friendly screening approach with high sensitivity and specificity that could enhance current CRC and AA screening programs, with the aim of improving patient adherence and early disease detection.

Notably, the wd-scores we developed showed higher sensitivity than CEA, a widely used protein cancer biomarker, effectively detecting high-risk individuals with AA and early-stage CRC. Although our model was not directly trained on stage IV CRC, it showed reasonable sensitivity for these patients while maintaining cancer-type specificity across various common cancers. Of note, a recent study explored the use of 5hmC in cfDNA for CRC detection but was limited by selection bias: all samples, including controls, came from individuals undergoing colonoscopy, skewing the cohort towards high-risk individuals (39). In contrast, our study achieved an improved AUC with a larger sample size that included both non-CRC/AA controls and high-risk individuals with AA. Furthermore, our comparison with an existing biomarker, CEA, strengthens the clinical relevance of our method by showing that it detects CRC cases that CEA alone would miss, and our findings suggest that combining the 5hmC-based model with CEA could further improve detection performance. Thus, the cfDNA 5hmC markers outlined in this study could enhance early CRC detection while tumors remain treatable and would also benefit patients who would be misclassified by CEA alone. In addition, our model not only showed sensitivity for CRC detection comparable to that of stool-based approaches (e.g., 73.8% for FIT, 92.3% for Cologuard) (15) but also showed potential to outperform the blood-based SEPT9 test (58,59). Therefore, the current study presents a robust and comprehensive screening approach for CRC and AA that outperforms CEA and, as a blood test, may broaden the applicability of early cancer detection by enhancing patient adherence.

Importantly, although the current METHOD-2 study was not designed to investigate molecular mechanisms, the capability of 5hmC-Seal to carry out whole-genome epigenetic analysis offers a significant advantage over, for example, the stool-based FIT or the SEPT9 test for improving our understanding of CRC biology. Interestingly, most of the final model genes showed differential expression in the independent TCGA colon cancer dataset, suggesting that our signatures are relevant to CRC biology; this is further supported by the enrichment of relevant cancer pathways, such as calcium and MAPK signaling, among the differentially modified genes associated with CRC/AA in our study.

In addition, 5hmC-Seal profiling of cfDNA holds promise not only for improving adherence to CRC screening compared with stool-based tests or colonoscopy but also as a feasible tool for future monitoring of tumor recurrence and therapy response, given the clinical convenience of a blood-based test that requires only a limited amount of plasma (e.g., <5 mL). Technically, 5hmC-Seal, which involves low-coverage sequencing of enriched 5hmC-containing cfDNA fragments (29,42), is potentially cost-effective and is optimized for minimal DNA input (nanogram level) (29,38); it also mitigates the issue of intra-tumoral heterogeneity that is common in mutation-based research (60). Given the low yield and highly fragmented nature of cfDNA, the chemical-labeling-based 5hmC-Seal technique offers the sensitivity and technical robustness required for blood-based tests. We anticipate that patients will benefit substantially from this highly sensitive and specific non-invasive assay as part of routine screening programs or disease monitoring in the future.

We acknowledge several limitations of the current study. First, there are age differences between controls and CRC patients, which we addressed by adjusting for age as a covariate in our multivariable logistic model during the initial discovery phase. This adjustment aimed to minimize the impact of age, and our findings suggested that the age difference did not significantly affect the wd-score’s ability to differentiate between controls and CRC patients. Second, potential selection bias might exist despite controlling for key clinical variables. Although our study was conducted across various regions in China, validation in diverse global populations would enhance the generalizability of our findings (Supplementary Table 6). Third, a direct comparison of our method with screening colonoscopy and other non-invasive alternatives, such as the SEPT9 promoter methylation test (14), PanSeer (61), or stool-based screening (e.g., FIT, Cologuard) (15,16), is needed in future longitudinal prospective studies to establish its clinical utility for early detection of CRC and AA, as well as its cost-efficiency and effect on patient adherence. Finally, future prospective studies could help address the potential overestimation of accuracy that might arise from our initial case-control design.

In conclusion, we developed a highly sensitive and specific non-invasive epigenetic test for early detection of CRC and AA, exploiting our clinical resources and the established 5hmC-Seal technique in cfDNA. This approach could significantly improve screening adherence and ultimately clinical outcomes, through early detection of patients with surgically resectable CRC tumors or high-risk AA.


Translational relevance.

The escalating incidence of colorectal carcinoma (CRC) worldwide, and the pronounced survival advantage of early-stage diagnosis highlight an urgent need for improved non-invasive screening methods. Our study addresses this gap by exploiting the diagnostic potential of 5-hydroxymethylcytosines (5hmC) in circulating cell-free DNA (cfDNA), a novel epigenetic marker reflecting gene expression and tissue specificity. Overcoming limitations of traditional serological markers and invasive endoscopy as well as challenges of stool-based tests, our 5hmC-based model, trained and validated in a large, multi-center cohort, offers a sensitive and specific blood test for early CRC detection and identification of high-risk advanced adenoma (AA). This approach not only promises to elevate screening effectiveness but also aims to increase patient adherence, potentially transforming early cancer detection and improving survival outcomes. Our study stands at the forefront of a shift towards more patient-friendly and accurate screening strategies, thus marking a significant advancement in CRC diagnostics and patient care.

Acknowledgments

This study was partially supported by funds from the following sources: the National Natural Science Foundation of China (82072653, 81602035) to WC and JX; Clinical Research Plan of SHDC (SHDC2020CR5006, SHDC2020CR3037B) to JX and LR; Shanghai Science and Technology Committee Project (19511121300) to JX and LR; Fujian Provincial Health Commission Project (2021GGB032) to WC; Fujian Science and Technology Committee Project (2023J06057) to WC; Xiamen Science and Technology Agency Program (3502Z20224ZD1067) to WC; and the US National Institutes of Health (U01CA217078) to WZ (Northwestern) and MB. The authors also would like to thank Odyssey (Beijing) Medical Research Inc. for providing data and safety monitoring service. The cost of the monitoring service was covered by two funds (SHDC2020CR3037B and 19511121300).

List of Abbreviations:

5hmC, 5-hydroxymethylcytosine
5mC, 5-methylcytosine
AA, advanced adenoma
AUC, area under the curve
CEA, carcinoembryonic antigen
CRC, colorectal carcinoma
CTRL, non-CRC/AA control
cfDNA, cell-free DNA
CI, confidence interval
EC, esophageal carcinoma
FDR, false discovery rate
FIT, fecal immunochemical test
HCC, hepatocellular carcinoma
KEGG, Kyoto Encyclopedia of Genes and Genomes database
LUAD, lung adenocarcinoma
NES, normalized enrichment score
NGS, next-generation sequencing
PCA, principal components analysis
PPI, protein-protein interaction
QC, quality control
STAD, stomach adenocarcinoma
TCGA, The Cancer Genome Atlas
TET, ten-eleven translocation enzyme
wd-score, weighted diagnostic score

Footnotes

Declaration of interests

Jun Yang and Yixiang Shi are employees of Bionova (Shanghai) MedTech. The 5hmC-Seal technology was invented by Chuan He and was licensed by Shanghai Epican for clinical applications from the University of Chicago. Dongdong Li was an employee of Shanghai Epican. Wei Zhang (Northwestern) was a consultant to Shanghai Epican. No other disclosures were reported.

REFERENCES

1. Sung H, Ferlay J, Siegel RL, Laversanne M, Soerjomataram I, Jemal A, et al. Global cancer statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin 2021.
2. Siegel RL, Miller KD, Goding Sauer A, Fedewa SA, Butterly LF, Anderson JC, et al. Colorectal cancer statistics, 2020. CA Cancer J Clin 2020;70(3):145–64 doi 10.3322/caac.21601.
3. Wang R, Lian J, Wang X, Pang X, Xu B, Tang S, et al. Survival rate of colorectal cancer in China: A systematic review and meta-analysis. Front Oncol 2023;13:1033154 doi 10.3389/fonc.2023.1033154.
4. Chen W, Zheng R, Baade PD, Zhang S, Zeng H, Bray F, et al. Cancer statistics in China, 2015. CA Cancer J Clin 2016;66(2):115–32 doi 10.3322/caac.21338.
5. Zhang L, Cao F, Zhang G, Shi L, Chen S, Zhang Z, et al. Trends in and Predictions of Colorectal Cancer Incidence and Mortality in China From 1990 to 2025. Front Oncol 2019;9:98 doi 10.3389/fonc.2019.00098.
6. US Preventive Services Task Force, Davidson KW, Barry MJ, Mangione CM, Cabana M, Caughey AB, et al. Screening for Colorectal Cancer: US Preventive Services Task Force Recommendation Statement. JAMA 2021;325(19):1965–77 doi 10.1001/jama.2021.6238.
7. Maes-Carballo M, Garcia-Garcia M, Martin-Diaz M, Estrada-Lopez CR, Iglesias-Alvarez A, Filigrana-Valle CM, et al. A comprehensive systematic review of colorectal cancer screening clinical practices guidelines and consensus statements. Br J Cancer 2023;128(6):946–57 doi 10.1038/s41416-022-02070-4.
8. ASGE Standards of Practice Committee, Ben-Menachem T, Decker GA, Early DS, Evans J, Fanelli RD, et al. Adverse events of upper GI endoscopy. Gastrointest Endosc 2012;76(4):707–18 doi 10.1016/j.gie.2012.03.252.
9. Veitch AM, Vanbiervliet G, Gershlick AH, Boustiere C, Baglin TP, Smith LA, et al. Endoscopy in patients on antiplatelet or anticoagulant therapy, including direct oral anticoagulants: British Society of Gastroenterology (BSG) and European Society of Gastrointestinal Endoscopy (ESGE) guidelines. Endoscopy 2016;48(4):385–402 doi 10.1055/s-0042-102652.
10. Harewood GC, Wiersema MJ, Melton LJ 3rd. A prospective, controlled assessment of factors influencing acceptance of screening colonoscopy. Am J Gastroenterol 2002;97(12):3186–94 doi 10.1111/j.1572-0241.2002.07129.x.
11. Anderson JC, Fortinsky RH, Kleppinger A, Merz-Beyus AB, Huntington CG 3rd, Lagarde S. Predictors of compliance with free endoscopic colorectal cancer screening in uninsured adults. J Gen Intern Med 2011;26(8):875–80 doi 10.1007/s11606-011-1716-7.
12. Inadomi JM, Vijan S, Janz NK, Fagerlin A, Thomas JP, Lin YV, et al. Adherence to colorectal cancer screening: a randomized clinical trial of competing strategies. Arch Intern Med 2012;172(7):575–82 doi 10.1001/archinternmed.2012.332.
13. Garborg K, Holme O, Loberg M, Kalager M, Adami HO, Bretthauer M. Current status of screening for colorectal cancer. Ann Oncol 2013;24(8):1963–72 doi 10.1093/annonc/mdt157.
14. Grutzmann R, Molnar B, Pilarsky C, Habermann JK, Schlag PM, Saeger HD, et al. Sensitive detection of colorectal cancer in peripheral blood by septin 9 DNA methylation assay. PLoS One 2008;3(11):e3759.
15. Imperiale TF, Ransohoff DF, Itzkowitz SH, Levin TR, Lavin P, Lidgard GP, et al. Multitarget stool DNA testing for colorectal-cancer screening. N Engl J Med 2014;370(14):1287–97 doi 10.1056/NEJMoa1311194.
16. Lin JS, Perdue LA, Henrikson NB, Bean SI, Blasi PR. Screening for Colorectal Cancer: Updated Evidence Report and Systematic Review for the US Preventive Services Task Force. JAMA 2021;325(19):1978–98 doi 10.1001/jama.2021.4417.
17. Expert Group on Early Diagnosis and Treatment of Cancer, Chinese Society of Oncology, Chinese Medical Association. Expert consensus on the early diagnosis and treatment of colorectal cancer in China (2023 edition). Zhonghua Yi Xue Za Zhi 2023;103(48):3896–908 doi 10.3760/cma.j.cn112137-20230804-00164.
18. Fisher DA, Princic N, Miller-Wilson LA, Wilson K, DeYoung K, Ozbay AB, et al. Adherence to fecal immunochemical test screening among adults at average risk for colorectal cancer. Int J Colorectal Dis 2022;37(3):719–21 doi 10.1007/s00384-021-04055-w.
19. Adler A, Geiger S, Keil A, Bias H, Schatz P, deVos T, et al. Improving compliance to colorectal cancer screening using blood and stool based tests in patients refusing screening colonoscopy in Germany. BMC Gastroenterol 2014;14:183 doi 10.1186/1471-230X-14-183.
20. Broc G, Denis B, Fassier JB, Gendre I, Perrin P, Quintard B. Decision-making in fecal occult blood test compliance: A quali-quantitative study investigating motivational processes. Prev Med 2017;105:58–65 doi 10.1016/j.ypmed.2017.08.023.
21. Thomas DS, Fourkala EO, Apostolidou S, Gunu R, Ryan A, Jacobs I, et al. Evaluation of serum CEA, CYFRA21–1 and CA125 for the early detection of colorectal cancer using longitudinal preclinical samples. Br J Cancer 2015;113(2):268–74.
22. Ushijima T. Epigenetic field for cancerization. J Biochem Mol Biol 2007;40(2):142–50 doi 10.5483/bmbrep.2007.40.2.142.
23. Gao Q, Zeng Q, Wang Z, Li C, Xu Y, Cui P, et al. Circulating cell-free DNA for cancer early detection. Innovation (Camb) 2022;3(4):100259 doi 10.1016/j.xinn.2022.100259.
24. Branco MR, Ficz G, Reik W. Uncovering the role of 5-hydroxymethylcytosine in the epigenome. Nat Rev Genet 2011;13(1):7–13 doi 10.1038/nrg3080.
25. Cui XL, Nie J, Ku J, Dougherty U, West-Szymanski DC, Collin F, et al. A human tissue map of 5-hydroxymethylcytosines exhibits tissue specificity through gene and enhancer modulation. Nat Commun 2020;11(1):6161 doi 10.1038/s41467-020-20001-w.
26. Mariani CJ, Madzo J, Moen EL, Yesilkanal A, Godley LA. Alterations of 5-hydroxymethylcytosine in human cancers. Cancers (Basel) 2013;5(3):786–814 doi 10.3390/cancers5030786.
27. Skvortsova K, Zotenko E, Luu PL, Gould CM, Nair SS, Clark SJ, et al. Comprehensive evaluation of genome-wide 5-hydroxymethylcytosine profiling approaches in human DNA. Epigenetics Chromatin 2017;10:16 doi 10.1186/s13072-017-0123-7.
28. Song CX, Szulwach KE, Fu Y, Dai Q, Yi C, Li X, et al. Selective chemical labeling reveals the genome-wide distribution of 5-hydroxymethylcytosine. Nat Biotechnol 2011;29(1):68–72 doi 10.1038/nbt.1732.
29. Li W, Zhang X, Lu X, You L, Song Y, Luo Z, et al. 5-Hydroxymethylcytosine signatures in circulating cell-free DNA as diagnostic biomarkers for human cancers. Cell Res 2017;27(10):1243–57 doi 10.1038/cr.2017.121.
30. Cai J, Chen L, Zhang Z, Zhang X, Lu X, Liu W, et al. Genome-wide mapping of 5-hydroxymethylcytosines in circulating cell-free DNA as a non-invasive approach for early detection of hepatocellular carcinoma. Gut 2019.
31. Cai J, Zeng C, Hua W, Qi Z, Song Y, Lu X, et al. An integrative analysis of genome-wide 5-hydroxymethylcytosines in circulating cell-free DNA detects noninvasive diagnostic markers for gliomas. Neurooncol Adv 2021;3(1):vdab049 doi 10.1093/noajnl/vdab049.
32. Chiu BC, Chen C, You Q, Chiu R, Venkataraman G, Zeng C, et al. Alterations of 5-hydroxymethylation in circulating cell-free DNA reflect molecular distinctions of subtypes of non-Hodgkin lymphoma. NPJ Genom Med 2021;6(1):11.
33. Chiu BC, Zhang Z, Derman BA, Karpus J, Luo L, Zhang S, et al. Genome-wide profiling of 5-hydroxymethylcytosines in circulating cell-free DNA reveals population-specific pathways in the development of multiple myeloma. J Hematol Oncol 2022;15(1):106 doi 10.1186/s13045-022-01327-y.
34. Shao J, Wang S, West-Szymanski D, Karpus J, Shah S, Ganguly S, et al. Cell-free DNA 5-hydroxymethylcytosine is an emerging marker of acute myeloid leukemia. Sci Rep 2022;12(1):12410 doi 10.1038/s41598-022-16685-3.
35. Beadell AV, Zhang Z, Capuano AW, Bennett DA, He C, Zhang W, et al. Genome-Wide Mapping Implicates 5-Hydroxymethylcytosines in Diabetes Mellitus and Alzheimer’s Disease. J Alzheimers Dis 2023;93(3):1135–51 doi 10.3233/JAD-221113.
36. Ren Y, Zhang Z, She Y, He Y, Li D, Shi Y, et al. A Highly Sensitive and Specific Non-Invasive Test through Genome-Wide 5-Hydroxymethylation Mapping for Early Detection of Lung Cancer. Small Methods 2023:e2300747 doi 10.1002/smtd.202300747.
37. Zeng C, Song X, Zhang Z, Cai Q, Cai J, Horbinski C, et al. Dissection of transcriptomic and epigenetic heterogeneity of grade 4 gliomas: implications for prognosis. Acta Neuropathol Commun 2023;11(1):133 doi 10.1186/s40478-023-01619-5.
38. Song CX, Yin S, Ma L, Wheeler A, Chen Y, Zhang Y, et al. 5-Hydroxymethylcytosine signatures in cell-free DNA provide information about tumor types and stages. Cell Res 2017;27(10):1231–42 doi 10.1038/cr.2017.106.
39. Walker NJ, Rashid M, Yu SR, Bignell H, Lumby CK, Livi CM, et al. Hydroxymethylation profile of cell-free DNA is a biomarker for early colorectal cancer. Sci Rep 2022;12(1) doi 10.1038/s41598-022-20975-1.
40. Chen H, Li N, Ren J, Feng X, Lyu Z, Wei L, et al. Participation and yield of a population-based colorectal cancer screening programme in China. Gut 2019;68(8):1450–7 doi 10.1136/gutjnl-2018-317124.
41. Amin M, Edge S, Greene F, Byrd D, Brookland R, Washington M, et al. AJCC Cancer Staging Manual. 8th ed. New York: Springer; 2017. p. 252–54.
42. Han D, Lu X, Shih AH, Nie J, You Q, Xu MM, et al. A Highly Sensitive and Robust Method for Genome-wide 5hmC Profiling of Rare Cell Populations. Mol Cell 2016;63(4):711–9 doi 10.1016/j.molcel.2016.06.028.
43. Kundaje A, Meuleman W, Ernst J, Bilenky M, Yen A, Heravi-Moussavi A, et al. Integrative analysis of 111 reference human epigenomes. Nature 2015;518(7539):317–30.
44. Cai Q, Zhang Z, Cui X, Zeng C, Cai J, Cai J, et al. PETCH-DB: a Portal for Exploring Tissue-specific and Complex disease-associated 5-Hydroxymethylcytosines. Database (Oxford) 2023;2023 doi 10.1093/database/baad042.
45. Russo PST, Ferreira GR, Cardozo LE, Burger MC, Arias-Carrasco R, Maruyama SR, et al. CEMiTool: a Bioconductor package for performing comprehensive modular co-expression analyses. BMC Bioinformatics 2018;19(1):56 doi 10.1186/s12859-018-2053-1.
46. Luck K, Kim DK, Lambourne L, Spirohn K, Begg BE, Bian W, et al. A reference map of the human binary protein interactome. Nature 2020;580(7803):402–8 doi 10.1038/s41586-020-2188-x.
47. Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci U S A 2005;102(43):15545–50 doi 10.1073/pnas.0506580102.
48. Benjamini Y, Hochberg Y. Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing. Journal of the Royal Statistical Society Series B (Methodological) 1995;57(1):289–300.
49. Weinstein JN, Collisson EA, Mills GB, Shaw KR, Ozenberger BA, Ellrott K, et al. The Cancer Genome Atlas Pan-Cancer analysis project. Nat Genet 2013;45(10):1113–20 doi 10.1038/ng.2764.
50. R Consortium. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing; 2013.
51. Hu W, Li S, Zhang S, Xie B, Zheng M, Sun J, et al. GJA1 is a Prognostic Biomarker and Correlated with Immune Infiltrates in Colorectal Cancer. Cancer Manag Res 2020;12:11649–61.
52. Phan NN, Wang CY, Chen CF, Sun Z, Lai MD, Lin YC. Voltage-gated calcium channels: Novel targets for cancer therapy. Oncol Lett 2017;14(2):2059–74 doi 10.3892/ol.2017.6457.
53. Kahi CJ, Boland CR, Dominitz JA, Giardiello FM, Johnson DA, Kaltenbach T, et al. Colonoscopy Surveillance After Colorectal Cancer Resection: Recommendations of the US Multi-Society Task Force on Colorectal Cancer. Gastroenterology 2016;150(3):758–68.e11 doi 10.1053/j.gastro.2016.01.001.
54. Loeve F, Brown ML, Boer R, van Ballegooijen M, van Oortmarssen GJ, Habbema JD. Endoscopic colorectal cancer screening: a cost-saving analysis. J Natl Cancer Inst 2000;92(7):557–63 doi 10.1093/jnci/92.7.557.
55. Myers RE, Ross EA, Wolf TA, Balshem A, Jepson C, Millner L. Behavioral interventions to increase adherence in colorectal cancer screening. Med Care 1991;29(10):1039–50 doi 10.1097/00005650-199110000-00009.
56. Bujanda L, Sarasqueta C, Zubiaurre L, Cosme A, Munoz C, Sanchez A, et al. Low adherence to colonoscopy in the screening of first-degree relatives of patients with colorectal cancer. Gut 2007;56(12):1714–8 doi 10.1136/gut.2007.120709.
57. Allison JE, Tekawa IS, Ransom LJ, Adrain AL. Improving the fecal occult-blood test. N Engl J Med 1996;334(24):1607–8 doi 10.1056/NEJM199606133342414.
58. Luo H, Zhao Q, Wei W, Zheng L, Yi S, Li G, et al. Circulating tumor DNA methylation profiles enable early diagnosis, prognosis prediction, and screening for colorectal cancer. Sci Transl Med 2020;12(524) doi 10.1126/scitranslmed.aax7533.
59. Church TR, Wandell M, Lofton-Day C, Mongin SJ, Burger M, Payne SR, et al. Prospective evaluation of methylated SEPT9 in plasma for detection of asymptomatic colorectal cancer. Gut 2014;63(2):317–25 doi 10.1136/gutjnl-2012-304149.
60. Merker JD, Oxnard GR, Compton C, Diehn M, Hurley P, Lazar AJ, et al. Circulating Tumor DNA Analysis in Patients With Cancer: American Society of Clinical Oncology and College of American Pathologists Joint Review. J Clin Oncol 2018;36(16):1631–41 doi 10.1200/JCO.2017.76.8671.
61. Chen X, Gole J, Gore A, He Q, Lu M, Min J, et al. Non-invasive early detection of cancer four years before conventional diagnosis using a blood test. Nat Commun 2020;11(1):3475 doi 10.1038/s41467-020-17316-z.


Data Availability Statement

The raw 5hmC-Seal data have been deposited in the National Omics Data Encyclopedia (NODE) database of the Chinese Academy of Sciences and can be downloaded at this link: https://www.biosino.org/node/project/detail/OEP003666. Additional data generated in this study, including the data underlying the graphs and figures, are available upon formal request to the corresponding authors.
