Abstract
Background
Cell-free DNA (cfDNA) is a promising source of biomarkers for early cancer detection. It carries tumor-driven methylation and fragmentation features that have achieved good diagnostic efficacy across various cancers. However, no study has yet examined both feature types together for esophageal cancer diagnosis.
Methods
In this study, we analyzed the cfDNA methylation and fragmentation markers for accurate esophageal cancer detection. Using cfMeDIP-seq, we profiled 145 plasma samples from healthy controls and esophageal cancer patients. We used multiple algorithms to identify cfDNA methylation markers and fragmentation markers to evaluate the efficacy of early esophageal cancer detection.
Results
Finally, we identified 25 cfDNA methylation and fragmentation markers and constructed a machine-learning model, which achieved a sensitivity of 99% and specificity of 97.82% in an independent cohort. These results indicate that methylation and fragmentomics biomarkers based on cfMeDIP-seq can accurately distinguish esophageal cancer patients from non-tumor controls.
Conclusion
Our cfMeDIP-seq-based study highlights the efficacy of cfDNA methylation and fragmentomics markers in diagnosing esophageal cancer and provides a direction for subsequent research.
Supplementary Information
The online version contains supplementary material available at 10.1186/s12885-025-15150-4.
Keywords: Cell-free DNA, cfMeDIP-seq, Cancer early detection, Esophageal cancer, Methylation, Fragmentomics
Introduction
Esophageal cancer (EC) is a highly invasive malignant tumor originating from the esophageal epithelium [1]. Globally, esophageal cancer ranks seventh in incidence and sixth in mortality. In China, EC remains a major public health concern, with both incidence and mortality rates in the top six [2]. Although the incidence rate of esophageal cancer has been decreasing over the years, it still poses a serious threat to human health due to its extremely low survival rate [3]. The low 5-year survival rate of esophageal cancer is mainly attributed to delayed diagnosis and limited therapeutic options [4], which underscores the importance of early screening and timely diagnosis in reducing mortality. Current methods for the early detection of esophageal cancer mainly involve endoscopy, serum biomarkers, and artificial intelligence (AI), each of which has limitations [5]. Endoscopy is uncomfortable for patients and is associated with a risk of complications [6]. Serum biomarkers, such as CEA, Cyfra21-1, p53 antibody, SCC-Ag, and VEGF-C, have demonstrated high specificity but exhibit variable and generally low sensitivity in esophageal cancer detection [7]. Although AI-based methods show promise in reducing diagnostic variability and improving accuracy [8], these technologies remain investigational and lack widespread clinical validation. Consequently, there is an urgent need to develop more accurate, minimally invasive, and clinically applicable biomarkers to improve the early detection of esophageal cancer.
Cell-free DNA (cfDNA) refers to extracellular free DNA present in various body fluids, such as blood, cerebrospinal fluid, urine, pleural fluid, and amniotic fluid, and is mainly released through cell death and active secretion mechanisms [9]. In recent years, cfDNA has attracted widespread attention as a potential biomarker, and several studies have demonstrated that its concentration in tumor patients is higher than that in healthy individuals [10–12]. cfDNA consists of highly fragmented double-stranded DNA. Its fragmentation is not random but is related to nucleosome structure and the cell of origin, resulting in distinct fragmentation profiles [13]. DNA methylation, an important epigenetic modification, plays a crucial role in cell development, gene expression, and genome stability [14]. cfDNA methylation has been demonstrated as a potential biomarker for early cancer detection, and methylation alterations of cfDNA have been detected across various tumor types [15–17].
The bisulfite conversion method is the gold standard for DNA methylation studies and the most widely utilized approach in epigenetic studies [18]. However, this method has several limitations: it is costly and time-consuming, and it requires substantial quantities of cfDNA from plasma [19]. Cell-free methylated DNA immunoprecipitation followed by high-throughput sequencing (cfMeDIP-seq) is a new technique that addresses these limitations by using an anti-5mC antibody to enrich methylated cfDNA fragments [20]. To date, there have been no reports of comprehensive methylation profiling and fragmentomic analysis in esophageal cancer using cfMeDIP-seq. In this study, we analyzed cfDNA methylation profiles, cfDNA length, end motifs, and breakpoint motifs in a total of 145 plasma samples from EC patients and healthy controls using the cfMeDIP-seq method. Through this approach, we identified new methylation and fragmentomics markers and demonstrated that cfMeDIP-seq-based cfDNA methylation and fragmentomics markers have the potential to detect early esophageal cancer.
Materials and methods
Patient recruitment
We enrolled 36 esophageal cancer patients at Shanxi Cancer Hospital and 110 healthy controls who underwent physical examination during the same period. All patients included in the final analysis had histologically confirmed esophageal squamous cell carcinoma (ESCC). One patient was excluded due to lack of definitive histological confirmation, resulting in a final cohort of 35 patients. Peripheral blood and clinical data of all ESCC patients, including age, gender, and stage, were collected before treatment. Healthy controls were enrolled during routine physical examinations, and participants with a history of any malignancy, severe chronic disease, long-term smoking or alcohol abuse, or recent medication use were excluded. The study was approved by the Institutional Research Ethics Committee of the Shanxi Cancer Hospital (KY2023042), and all enrolled participants provided written informed consent.
Sample collection, cfDNA extraction, and cfMeDIP-seq library construction
Peripheral blood (8 mL) was collected in EDTA anticoagulant tubes, stored at 4 °C, and processed within 4 h. Whole blood was centrifuged at 1,600 g for 10 min, and the supernatant was transferred to a 2 mL EP tube and centrifuged at 16,000 g for 10 min at 4 °C. The resulting supernatant was transferred to a new 2 mL EP tube and stored at −80 °C until DNA extraction. cfDNA was extracted from 4 mL plasma using the Qiagen Circulating Nucleic Acids Kit (Qiagen, Cat# 55114) according to the manufacturer’s protocol, with a final elution volume of 55 µL. cfDNA concentration was quantified with the Qubit dsDNA High Sensitivity Assay Kit (Thermo Fisher, Cat# Q33231). The cfMeDIP-seq procedure followed previously published protocols [21, 22]. Specifically, 100 ng of cfDNA was used for cfMeDIP library preparation. When the cfDNA quantity was less than 100 ng, λDNA was added to bring the total input mass to 100 ng. The libraries were prepared using the MagMeDIP kit (Diagenode, C02010021) according to the manufacturer’s protocol, followed by purification with AMPure XP beads. Finally, 150 bp paired-end sequencing was performed on an Illumina NovaSeq 6000 platform.
Quality control and peak calling
Raw sequencing reads were first quality controlled using FastQC (v0.11.7) and MultiQC (v1.9). Low-quality reads and adapter sequences were subsequently trimmed using Trim Galore (v0.6.3). The processed reads were then aligned to the hg19 reference genome using Bowtie2 (v2.3.4.1), and PCR duplicate reads were removed using Samtools (v1.9). Peak calling was performed using MACS2 (v2.1.1) with default parameters. To generate a consensus methylation profile, peak intervals from all samples were combined, and the overlapping peak regions were merged using BEDTools (v2.27.1). The resulting merged BED file was processed using the R reshape2 package to construct a comprehensive methylation matrix, representing the intersecting peak intervals across all samples.
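The consensus step described above (pooling peak intervals and merging overlaps) can be illustrated with a minimal sketch. This is not the authors' pipeline, which used BEDTools. The function name `merge_peaks` and the example coordinates are hypothetical, and the sketch assumes BED-style 0-based half-open intervals and that touching intervals are merged along with overlapping ones.

```python
from collections import defaultdict


def merge_peaks(peaks):
    """Merge overlapping or touching intervals per chromosome.

    peaks: iterable of (chrom, start, end) tuples, BED-style half-open.
    Returns a sorted list of merged (chrom, start, end) tuples.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in peaks:
        by_chrom[chrom].append((start, end))

    merged = []
    for chrom in sorted(by_chrom):
        intervals = sorted(by_chrom[chrom])
        cur_start, cur_end = intervals[0]
        for start, end in intervals[1:]:
            if start <= cur_end:
                # Overlaps or touches the current block: extend it.
                cur_end = max(cur_end, end)
            else:
                merged.append((chrom, cur_start, cur_end))
                cur_start, cur_end = start, end
        merged.append((chrom, cur_start, cur_end))
    return merged


# Hypothetical peaks pooled from several samples.
pooled = [("chr1", 100, 200), ("chr1", 150, 300),
          ("chr1", 500, 600), ("chr2", 10, 50)]
consensus = merge_peaks(pooled)
```

Each merged interval then becomes one row of the methylation matrix, with one column per sample.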
Methylation matrices normalization
To screen differentially methylated regions (DMRs) of esophageal cancer, we normalized the methylation matrices derived from tumor and normal plasma samples. First, we screened out suboptimal samples and peak sites based on multiple quality metrics. Since most peaks were enriched in only a few samples, we removed peak intervals that were enriched in fewer than 10% of samples to reduce their impact on subsequent data normalization. We also removed peaks located on the sex chromosomes and in the mitochondrial genome to remove the influence of sex and mitochondrial genes. Finally, we normalized the peak data using the voom function of the limma package.
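The two peak filters described above can be sketched as follows. This is an illustrative sketch only: the function name `filter_peaks`, the chromosome labels `chrX`, `chrY`, and `chrM`, and the rule that a peak counts as "enriched" in a sample when its count is greater than zero are all assumptions, since the text does not specify them.

```python
# Assumed UCSC-style names for sex chromosomes and the mitochondrial genome.
EXCLUDED_CHROMS = {"chrX", "chrY", "chrM"}


def filter_peaks(matrix, min_fraction=0.10):
    """Drop peaks on excluded chromosomes or present in too few samples.

    matrix: dict mapping (chrom, start, end) -> list of per-sample counts.
    A peak is kept when the fraction of samples with a non-zero count
    is at least min_fraction.
    """
    kept = {}
    for region, counts in matrix.items():
        if region[0] in EXCLUDED_CHROMS:
            continue
        fraction = sum(1 for c in counts if c > 0) / len(counts)
        if fraction >= min_fraction:
            kept[region] = counts
    return kept


# Hypothetical matrix with 20 samples per peak.
matrix = {
    ("chr1", 100, 300): [5] * 10 + [0] * 10,  # present in 50% of samples
    ("chr1", 500, 600): [3] + [0] * 19,       # present in 5% of samples
    ("chrX", 10, 50): [4] * 20,               # sex chromosome
}
filtered = filter_peaks(matrix)
```

Only the first peak survives in this toy example; the surviving rows would then be passed to normalization.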
Propensity score matching (PSM) and DMR analysis
Prior to analysis, we used the caret package to randomly assign 115 samples (80%) to the discovery set and 30 samples (20%) to the validation set. To minimize the impact of confounding factors, we employed the hold-out method to randomly select 80% of the samples from the healthy controls and the cancer patients, respectively, as a training set for PSM and differential analysis, while reserving the remaining samples as a test set. This process was repeated 100 times to mitigate the influence of outlier samples and imbalanced factors such as age and gender. PSM is a statistical method designed to reduce bias from confounding factors. Its fundamental principle is to use a propensity score to find, for each sample in the treatment group, one or more identical or similar control samples, thereby reducing interference from confounders. To reduce potential confounding by variables such as age and gender, we used the R package MatchIt to match the tumor and control groups at a 2:1 ratio on propensity scores calculated by a logistic regression model. The resulting propensity scores were then included in limma as covariates. In each iteration, we used the thresholds |logFC| > 3 and adj.P.Val < 0.0001 to screen for differentially methylated loci. The differentially methylated loci from all 100 iterations were merged, and those appearing more than 90 times were selected as DMRs.
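The final selection rule (per-iteration thresholds followed by a recurrence filter across 100 iterations) can be sketched as below. The matching and limma steps are not reproduced here. The function names, the input layout, and the toy values are hypothetical; only the thresholds come from the text.

```python
from collections import Counter


def significant_regions(results, lfc_cutoff=3.0, padj_cutoff=1e-4):
    """Regions passing |logFC| > lfc_cutoff and adj.P.Val < padj_cutoff.

    results: dict mapping region -> (logFC, adjusted p-value) for one iteration.
    """
    return {region for region, (lfc, padj) in results.items()
            if abs(lfc) > lfc_cutoff and padj < padj_cutoff}


def recurrent_regions(iterations, min_count=90):
    """Keep regions that are significant in more than min_count iterations."""
    counts = Counter()
    for results in iterations:
        counts.update(significant_regions(results))
    return sorted(region for region, n in counts.items() if n > min_count)


# Toy input: region A passes in all 100 iterations, B in exactly 90,
# and C never passes the adjusted p-value threshold.
iterations = [
    {
        "A": (3.5, 1e-6),
        "B": (4.0, 1e-5) if i < 90 else (1.0, 0.2),
        "C": (-3.2, 0.01),
    }
    for i in range(100)
]
dmrs = recurrent_regions(iterations)
```

Region B is excluded in this example because "more than 90 times" is read here as a strict inequality.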
cfDNA fragment length analysis
Fragment lengths were analyzed using the bamToBed software. We first extracted the read pairs in which both reads mapped to the same chromosome, and then calculated the length of each cfDNA fragment from the fragment coordinates. Because fragments longer than 500 bp are largely lost during second-generation sequencing, we removed fragments longer than 500 bp. To characterize the fragment length distribution, we calculated the percentage of cfDNA fragments at each length. Subsequently, we identified the peaks and valleys of the fragment length distribution profile, and the proportions of fragments between each peak and valley were calculated and used as quantitative features representing the cfDNA fragment length characteristics.
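As a rough illustration of the fragment length features, the sketch below computes the per-length proportions after discarding fragments longer than 500 bp, and then sums the proportions within an interval. The function names and the example lengths are hypothetical, and the interval boundaries would in practice come from the observed peaks and valleys rather than from the fixed values used here.

```python
from collections import Counter


def length_proportions(lengths, max_len=500):
    """Proportion of fragments at each length, ignoring lengths > max_len."""
    kept = [n for n in lengths if 0 < n <= max_len]
    counts = Counter(kept)
    total = len(kept)
    return {n: c / total for n, c in counts.items()}


def proportion_in_range(props, low, high):
    """Summed proportion of fragments with low <= length < high."""
    return sum(p for n, p in props.items() if low <= n < high)


# Hypothetical fragment lengths (bp); the 650 bp fragment is discarded.
lengths = [45, 55, 120, 167, 167, 170, 320, 650]
props = length_proportions(lengths)
short_fraction = proportion_in_range(props, 1, 60)  # fragments below 60 bp
```

One such proportion per interval, computed for each sample, gives the length feature vector.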
Analysis of cfDNA fragment end motifs and breakpoint motifs
Fragment end motifs, defined as the nucleotide sequences at both the 3’ and 5’ termini of cfDNA fragments, can be subdivided into end motifs (EDM) and breakpoint motifs (BPM): an EDM extends from the breakpoint position towards the interior of the cfDNA fragment, whereas a BPM extends from the breakpoint position towards the fragment ends. In this study, the bamToBed software was used to extract the start and end coordinates of each sequenced fragment. Subsequently, we extracted the genomic coordinates of the 6 bp sequences for both EDM and BPM, and then extracted the nucleotide sequences corresponding to these genomic coordinates using the Samtools software. Finally, we quantified the relative abundance of each motif as its percentage across all fragments.
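The motif extraction can be sketched on an in-memory reference string. This is a sketch under stated assumptions, not the authors' procedure, which fetched sequences with Samtools. It assumes that motifs are read 5' to 3' at the 5' end of each strand, that the 6 bp EDM is the first six bases inside the fragment, and that the 6 bp BPM spans the breakpoint with three bases outside and three bases inside the fragment. The function names and the reference sequence are hypothetical.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def fragment_motifs(ref, start, end, k=6):
    """Return (end_motifs, breakpoint_motifs) for the fragment ref[start:end].

    Each list holds the motif at the 5' end of the forward strand and the
    motif at the 5' end of the reverse strand (reverse-complemented).
    """
    half = k // 2
    edm = [ref[start:start + k], revcomp(ref[end - k:end])]
    bpm = []
    # Breakpoint motifs need flanking reference sequence on both sides.
    if start - half >= 0 and end + half <= len(ref):
        bpm = [ref[start - half:start + half],
               revcomp(ref[end - half:end + half])]
    return edm, bpm


def motif_frequencies(motifs, k=6):
    """Relative frequency of each unambiguous k-mer motif."""
    counts = Counter(m for m in motifs
                     if len(m) == k and set(m) <= set("ACGT"))
    total = sum(counts.values())
    return {m: c / total for m, c in counts.items()}


# Hypothetical 28 bp reference and one fragment covering positions 4..23.
ref = "ACGTACGTAAGGCCTTAACCGGTTACGT"
edm, bpm = fragment_motifs(ref, 4, 24)
freqs = motif_frequencies(["ACGTAA", "ACGTAA", "AACCGG", "NNNNNN"])
```

Pooling the motifs from all fragments of a sample and passing them to `motif_frequencies` yields one frequency per motif, which is the kind of feature the text describes.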
Machine learning algorithm feature screening and model construction
cfMeDIP-seq data are high-dimensional, sparse, and noisy, which poses significant challenges for traditional regression methods that are prone to overfitting on such complex datasets. LASSO (Least Absolute Shrinkage and Selection Operator) is a variable selection method proposed in 1996 that removes unimportant variables by penalizing the magnitude of the coefficients. Compared with traditional regression methods, LASSO regression can better select the features that are most closely related to the disease. Random forest is an ensemble learning method that aggregates predictions from multiple decision trees. Because the features and samples used for each tree are randomly selected, the model is not easily affected by individual outliers, which improves predictive accuracy and robustness. In this study, we performed LASSO for variable selection and random forest for model construction using the glmnet and randomForest packages, respectively.
Statistical analysis
R (version 4.1.1) and RStudio were used for statistical analysis. The Mann–Whitney U test was used to compare esophageal cancer and control samples, and the Kruskal–Wallis test was used to compare continuous variables across multiple groups. The ROC curves were plotted using the pROC package, the optimal cutoff value was determined using the Youden index, and the differences in AUC between models were analyzed using the DeLong test. P < 0.05 was considered significant.
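A cutoff of this kind is commonly chosen by maximizing Youden's J statistic, J = sensitivity + specificity − 1. The sketch below illustrates the idea in plain Python and is not the pROC implementation. It assumes that a sample is called positive when its score is at or above the cutoff and that candidate cutoffs are the observed scores. The function name and the toy data are hypothetical.

```python
def youden_cutoff(scores, labels):
    """Return (J, cutoff, sensitivity, specificity) maximizing Youden's J.

    scores: model risk scores; labels: 1 = tumor, 0 = control.
    Ties in J are resolved in favor of the lowest cutoff.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best = None
    for cutoff in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= cutoff)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < cutoff)
        sens = tp / n_pos
        spec = tn / n_neg
        j = sens + spec - 1
        if best is None or j > best[0]:
            best = (j, cutoff, sens, spec)
    return best


# Hypothetical risk scores for four controls (0) and four tumors (1).
scores = [0.1, 0.2, 0.3, 0.35, 0.6, 0.7, 0.8, 0.9]
labels = [0, 0, 0, 1, 0, 1, 1, 1]
j, cutoff, sens, spec = youden_cutoff(scores, labels)
```

In this toy example two cutoffs give the same J, which shows why a tie-breaking rule has to be stated when reporting sensitivity and specificity at the "optimal" cutoff.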
Results
Patient information and cfDNA concentration distribution
To investigate the cfDNA methylation and fragmentation patterns in esophageal cancer patients, we collected plasma samples from 110 non-tumor controls and 35 esophageal cancer patients. Clinical information is summarized in Table 1, and detailed clinical information for all participants is provided in Supplementary Table S1. First, we analyzed the distribution of cfDNA concentration across groups and found that it was significantly higher in the tumor group (Fig. 1A). However, the cfDNA concentration did not differ significantly across tumor stages (Fig. 1B). Because age is a risk factor for tumors, we grouped patients by median age but found no significant difference in cfDNA concentration between age groups (Fig. 1C). The incidence of esophageal cancer in men is two times higher than that in women, but there was also no significant difference in cfDNA concentration between genders (Fig. 1D), suggesting that the cfDNA concentration in esophageal cancer is not related to age or gender.
Table 1.
Demographic and clinicopathological features of the datasets
| Characteristic | Discovery set: Normal, n = 88 | Discovery set: Tumor, n = 27 | Validation set: Normal, n = 22 | Validation set: Tumor, n = 8 |
|---|---|---|---|---|
| Age (years), mean (SD) | 47.74 (10.92) | 64.19 (7.54) | 50.50 (11.47) | 62.75 (5.09) |
| (Missing, n) | 4 | 0 | 2 | 0 |
| Gender, n/N (%) | | | | |
| Female | 45 / 85 (53%) | 9 / 27 (33%) | 10 / 21 (48%) | 4 / 8 (50%) |
| Male | 40 / 85 (47%) | 18 / 27 (67%) | 11 / 21 (52%) | 4 / 8 (50%) |
| (Missing, n) | 3 | 0 | 1 | 0 |
| cfDNA concentration (ng/mL), mean (SD) | 3.39 (2.74) | 11.39 (5.17) | 2.59 (2.72) | 15.87 (8.20) |
| (Missing, n) | 19 | 8 | 4 | 2 |
| Stage, n/N (%) | | | | |
| Stage I | | 1 / 27 (3.7%) | | 2 / 8 (25%) |
| Stage II | | 15 / 27 (55.6%) | | 3 / 8 (37.5%) |
| Stage III | | 11 / 27 (40.7%) | | 3 / 8 (37.5%) |
| Reads number (millions), mean (SD) | 30.91 (5.45) | 25.05 (6.70) | 32.64 (7.26) | 27.38 (6.54) |
| Histological subtype, n/N (%) | | | | |
| Squamous cell carcinoma | | 27 / 27 (100%) | | 8 / 8 (100%) |
| Anatomic location, n/N (%) | | | | |
| Lower third | | 4 / 26 (15.3%) | | 3 / 8 (37.5%) |
| Middle third | | 20 / 26 (77%) | | 5 / 8 (62.5%) |
| Upper third | | 2 / 26 (7.7%) | | |
| (Missing, n) | | 1 | | 0 |
Data are shown as mean (SD), n/N (%), or n
Fig. 1.
Distribution of cfDNA concentration among different groups. (A) Difference in cfDNA concentration between the tumor group and the normal control group. (B) Difference in cfDNA concentration between normal control and different tumor subgroups. (C) Difference in cfDNA concentration between different age groups. (D) Difference in cfDNA concentration between different genders. The y-axis shows cfDNA concentration, the median is shown as the line in the box plots, the box ranges from the 25th to the 75th percentile, and the p-value was determined by the Mann-Whitney U test
Identification of methylation markers for esophageal cancer
To identify methylation markers for esophageal cancer, we first analyzed DMRs between the tumor and non-tumor groups. Because age and gender distributions were imbalanced between the tumor and non-tumor groups, we used propensity score matching to remove the effects of age and gender. We obtained 34 DMRs, of which 27 were hypermethylated and 7 were hypomethylated (Fig. 2A). Most of the hypermethylated DMRs were located in distal intergenic regions (77.78%), and less than 10% were located in promoter regions, whereas 14.29% of the hypomethylated DMRs were located in distal intergenic regions and 14.29% in promoter regions (Fig. 2B). The distribution of these DMRs relative to CpG islands also differed markedly: most of the hypermethylated DMRs were located in the CpG inter region (75%) and none in the CpG island region, whereas 7.69% of the hypomethylated DMRs were located in the CpG inter region and 46.15% in the CpG island region (Fig. 2C).
Fig. 2.
Screening of specific methylation markers for esophageal cancer. (A) Heatmap of 34 DMRs identified in plasma cfDNA of 100 normal controls and 35 esophageal cancer patients; (B) Percentage distribution of hypermethylated and hypomethylated DMRs in genomic locations; (C) Percentage distribution of gene types associated with hypermethylated and hypomethylated DMRs; (D) GO enrichment analysis of genes associated with hypermethylation regions. (E) GO enrichment analysis of genes associated with hypomethylation regions. For the analysis results, the size of the dots represents the number of differential genes in the pathway, the color of the dots represents the p-value, the horizontal coordinate is the percentage of differential genes in the pathway, and the vertical coordinate is the different KEGG pathways
To analyze the functions of the genes associated with these DMRs, we performed enrichment analyses of the hypermethylated and hypomethylated regions separately. The results showed that the molecular functions of genes associated with hypermethylated DMRs were enriched in tau protein- and exonuclease-related pathways (Fig. 2D), whereas genes associated with hypomethylated DMRs were enriched in nucleic- and nuclease-related pathways (Fig. 2E). These findings suggest that the DMRs may contribute to esophageal cancer pathogenesis through tau-, nucleic-, and nuclease-related pathways.
Construction and validation of the esophageal cancer early screening model
To screen the hub DMRs, we employed LASSO regression with 10-fold cross-validation to analyze the 34 identified DMRs. Using the minimum λ-value, we finally obtained 13 methylation markers (Fig. 3A). Subsequently, we developed a random forest-based model for early esophageal cancer detection using these 13 DMRs. The results showed that the model could accurately discriminate esophageal cancer patients from non-tumor controls with an AUC of 0.9876 (0.984–0.9913), sensitivity of 96.2%, and specificity of 92.94% in the test set, and an AUC of 0.9133 (0.8988–0.9278), sensitivity of 82.37% and specificity of 95.5% in the validation set (Fig. 3B).
Fig. 3.
Model construction for esophageal cancer detection. (A) Screening of non-zero coefficient DMRs by minimum λ value using ten-fold cross-validation in LASSO; (B) ROC curves of esophageal cancer models to assess the predictive ability of the models to discriminate between tumors and normal controls; (C) Model-predicted risk score distribution in different clinical groups; (D) Model-predicted risk score distribution in different stage periods; (E) Efficacy of the model in distinguishing between patients with different stages. (F) Efficacy of a cfDNA methylation-based model with cfDNA concentration and various clinical traits to distinguish non-tumor controls from esophageal cancer.
Since there were differences in gender and age between samples, to verify the independence of the model, we performed a clinical subgroup analysis to demonstrate that the performance of the model was not affected by clinical variables. We grouped age and cfDNA concentration by median and counted differences in model risk scores between groups. The results showed that there were no significant differences in modeled risk values within any of the age, sex, and cfDNA concentration groups (Fig. 3C), and that risk did not change with increasing staging (Fig. 3D). Next, we analyzed the efficacy of the model to distinguish non-tumor controls from patients with different stages. The results showed that the AUC of the model to detect early-stage and late-stage patients was 0.9881 and 0.9619, respectively, and there was no significant difference in the predictive efficacy of the model between the staging periods (Fig. 3E). Therefore, the model was independent of clinical traits. Finally, we compared the efficacy of the cfDNA methylation-based model with cfDNA concentration and various clinical traits to distinguish non-tumor controls from esophageal cancer. The results showed that the cfDNA methylation-based model achieved the highest accuracy with an AUC of 0.9703, with a sensitivity of 88.54% and a specificity of 91.08%. cfDNA concentration also accurately differentiated non-tumor control from esophageal cancer samples, with an AUC value of 0.9602 (Fig. 3F). These results demonstrate that our methylation model maintains consistent performance across clinical variables.
Methylated cfDNA fragment length characterization
Previous studies showed that cfDNA fragments from tumor samples are shorter than those from non-tumor control samples [23, 24]. To characterize the fragment length profile of methylated cfDNA in esophageal cancer, we analyzed the cfDNA length distribution of each patient. The results showed a significant difference in cfDNA fragment length distribution between tumor and non-tumor samples (Fig. 4A), with non-tumor controls exhibiting a higher proportion of short fragments (Fig. 4B). We then calculated the proportion of fragments between each peak and valley of the cfDNA length distribution and analyzed the differences between tumor and non-tumor groups in the training set. The proportions of cfDNA fragment lengths differed significantly between the tumor and non-tumor groups, especially for ultra-short fragments (< 60 bp), whose proportion was significantly higher in non-tumor samples than in tumor samples (Fig. 4C) and which achieved an AUC of 0.941 for distinguishing tumor from non-tumor samples (Fig. 4D). These findings demonstrate that cfDNA fragment length proportions represent a robust biomarker for differentiating esophageal cancer patients from non-tumor controls.
Fig. 4.
Analysis of cfDNA fragment length characteristics among normal and tumor groups. (A) Demonstration of cfDNA length distribution between normal and tumor groups; (B) Further zoom-in analysis of the frequency of cfDNA fragment distribution below 100 bp; (C) Comparison of the proportion of cfDNA fragments below 60 bp between normal and tumor groups; (D) ROC curves based on the proportion of short cfDNA fragments for evaluating the ability of the model to distinguish between normal and tumor groups
Characterization of end motifs and breakpoint motifs of methylated cfDNA fragments
Previous studies showed that cfDNA fragment end motifs and breakpoint motifs exhibit tumor-specific patterns across various cancer types [25]. To investigate their utility in esophageal cancer detection, we analyzed the differences between tumor and non-tumor groups in the training set and selected the top 50 end motifs and breakpoint motifs for further analysis. The results showed that cfDNA fragment breakpoint motifs were significantly different between tumor and non-tumor groups. The majority of the top 50 breakpoint motifs showed higher frequencies in tumors (Fig. 5A), and only half of the breakpoint motifs beginning with T showed lower frequencies in the tumor group. Among the breakpoint motifs, the GCTCAC motif differed significantly between the non-tumor and tumor groups, with a much lower percentage in the non-tumor group than in the tumor group (Fig. 5B), and achieved an AUC of 0.9208 for distinguishing tumor from non-tumor groups (Fig. 5C). As with the breakpoint motifs, the end motifs also differed significantly between the tumor and non-tumor groups, with the majority of the top 50 end motifs showing higher frequencies in tumors (Figure S1A), especially the CGATCT motif (Figure S1B), which achieved an AUC of 0.9164 for distinguishing tumor from non-tumor groups (Figure S1C). Therefore, cfDNA fragment end motifs and breakpoint motifs are significantly different between tumor and non-tumor groups, indicating their potential as markers for esophageal cancer detection.
Fig. 5.
Analysis of fragment motif characteristics, fragmentomics models, and combined model among normal and tumor groups. (A) Heatmap of top-50 BPM identified in normal controls and esophageal cancer patients. (B) Comparison of the frequency of specific motifs (motif GCTCAC) in the normal and tumor groups, with a p-value of 3e-14 by Wilcoxon test, indicating a significant difference between the two groups. (C) ROC curves were used to assess the performance of diagnostic models based on GCTCAC motifs. (D) ROC curves to assess the performance of Fragmentomics models in the test set and validation set. (E) Predictive performance of different models in distinguishing esophageal cancer from non-tumor controls
To evaluate the diagnostic performance of cfDNA fragmentation features compared to methylation features in esophageal cancer detection, we extracted the top 50 features from each fragmentomic category: fragment length, end motifs, and breakpoint motifs, yielding a total of 150 features. These 150 features were then filtered using the LASSO algorithm, and finally, 18 cfDNA fragmentomic features were obtained, and random forest modeling was performed using these 18 features. The fragmentomic-based model achieved an AUC of 0.987 (95% CI: 0.9836–0.9904) in the test set, with a sensitivity of 95.6% and a specificity of 92.94%. In the validation set, the model achieved an AUC of 0.9973 (95% CI: 0.9963–0.9982), with a sensitivity of 98.5% and a specificity of 96.95% (Fig. 5D).
Subsequently, we analyzed the performance of cfDNA fragmentation profiles combined with cfDNA methylation profiles in esophageal cancer diagnosis. First, we extracted 34 cfDNA methylation features, 50 cfDNA fragment length features, 50 end motif features, and 50 breakpoint motif features. Then, we used the LASSO algorithm to select the optimal features and identified 25 features, comprising 9 methylation features, 2 cfDNA fragment length features, 6 end motif features, and 8 breakpoint motif features. These features also demonstrated excellent capability in distinguishing esophageal cancer from non-tumor controls, with AUC values greater than 0.9 for the breakpoint motif features GGCTCA, TGTTAG, and TTTAGT (Supplementary Table S2). The random forest model built on these 25 combined features showed the best performance, achieving an AUC of 0.9924 (0.9909–0.9938) with a sensitivity of 98.23% and a specificity of 93.77% for distinguishing esophageal cancer from non-cancer controls. Notably, the integrated model significantly outperformed the models using methylation or fragmentomic features alone (Fig. 5E). Due to the imbalance in terms of sample size, age, and other clinical information, it is crucial to assess model performance using metrics that are robust to class imbalance. Therefore, we analyzed the Matthews correlation coefficient (MCC), F1-score, and balanced accuracy on both the test and validation sets. The results of these metrics are shown in Supplementary Table S3. The MCC, F1-score, and balanced accuracy remained high when class imbalance was taken into account. This indicates that the performance of our model is not an artefact of the dataset distribution but reflects its predictive power.
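For reference, the three imbalance-aware metrics can be computed from a confusion matrix as in the sketch below. The function name is hypothetical and the counts are invented for illustration; they are not the study's results, which are reported in Supplementary Table S3.

```python
import math


def imbalance_metrics(tp, fp, tn, fn):
    """Return MCC, F1-score, and balanced accuracy from confusion counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    balanced_accuracy = (sensitivity + specificity) / 2
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"mcc": mcc, "f1": f1, "balanced_accuracy": balanced_accuracy}


# Invented counts: 20 tumors (18 detected) and 80 controls (5 false alarms).
metrics = imbalance_metrics(tp=18, fp=5, tn=75, fn=2)
```

Unlike raw accuracy, all three values drop noticeably when the minority class is poorly classified, which is why they are preferred for imbalanced cohorts.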
Discussion
In this study, we analyzed the methylation profiles, cfDNA length, end motifs, and breakpoint motifs of plasma cfDNA using the cfMeDIP-seq method, and we identified 25 cfDNA methylation markers and fragmentomics markers. By integrating these markers, we constructed a random forest model, which achieved an AUC of 0.9924 (0.9909–0.9938), demonstrating both high sensitivity and specificity for esophageal cancer detection.
Liquid biopsy has shown good potential in early cancer detection and precancerous lesion identification, offering a minimally invasive and accurate alternative to traditional diagnostic methods [26]. cfDNA is currently the most promising source of biomarkers in liquid biopsy, as it carries not only genetic and epigenetic information but also fragmentomic information reflecting cellular origins and release mechanisms [27]. Furthermore, the fragmentomic patterns of cfDNA have been demonstrated to provide additional discriminative power for tumor detection [28]. DNA methylation is an epigenetic alteration that is cancer-specific and can identify the tissue of origin of tumors. Therefore, methylation patterns can provide high predictive accuracy for the detection of cancer [29]. The cfMeDIP method captures methylated fragments with an anti-5mC antibody and tends to enrich for low CpG-density regions (< 5 CpG/100 bp), which account for 95% of the genome [30]. cfMeDIP therefore has high genome coverage and provides more information on genomic methylation variation.
cfDNA concentration serves as a dynamic indicator of physiological and pathological states and is significantly elevated in various disease conditions [31]. In this study, the cfDNA concentration of esophageal cancer patients was significantly higher than that of healthy subjects. Notably, we observed a tendency towards elevated cfDNA concentrations even in early-stage esophageal cancer cases. However, cfDNA concentration was independent of age and gender.
cfDNA fragmentomics reflects physiological and pathological processes and has become a promising method for multi-cancer early detection [25]. cfDNA is a kind of highly fragmented DNA, with a primary peak fragment length of 167 bp and a secondary peak fragment length of 147 bp [32]. The fragmentation of cfDNA is not a random process but is intricately associated with nucleosomes, DNA nucleases, and cellular origins. In plasma, cfDNA binds to histones to form nucleosomes, where the core DNA region spans 147 bp and the linker region consists of 20 bp [33]. DNase1L3 and DFFB have been identified as crucial nucleases for the generation of cfDNA in vivo [34], with the capacity to cleave cfDNA at the nucleosome linker regions, thereby producing 167 bp and 147 bp cfDNA fragments, which correspond precisely to the size profile of plasma cfDNA. Nucleosomes protect cfDNA from degradation, whereas regions such as transcription factor binding sites (TFBS) and chromatin open regions exhibit higher degradation due to the absence of histone protection, resulting in shorter cfDNA fragments. Notably, tumor-derived cfDNA fragments are typically shorter than those from normal cells, a feature that has become a focal point in early cancer detection and prognosis monitoring. For instance, Dimitrios Mathios et al. [35] developed a lung cancer detection model based on cfDNA fragment size distribution and frequency, achieving AUCs of 0.78, 0.95, 0.94, and 0.95 for distinguishing Stage I–IV lung cancer patients from non-cancer controls, respectively. In this study, differential fragment length analysis revealed that the proportion of ultra-short fragments could accurately discriminate esophageal cancer from non-tumor controls, with an AUC of 0.941. However, contrary to the study based on whole-genome sequencing (WGS) [36], tumor patients exhibited significantly lower proportions of short fragments than healthy controls, and this may be related to methylation modifications. 
Previous studies have demonstrated that methylation influences cfDNA fragmentation via chromatin accessibility and nuclease activity [37, 38]. Global hypomethylation and promoter hypermethylation are hallmarks of many cancers, with hypomethylation altering chromatin openness. For example, Wang et al. reported that hypomethylated regions in breast cancer tend to yield shorter cfDNA fragments detectable via advanced methylation profiling [39]. Furthermore, nucleases preferentially cleave unmethylated DNA, leading to excessive degradation of tumor-derived cfDNA and reduced capture efficiency by 5-methylcytosine antibodies. Because cfMeDIP-seq relies on this antibody capture, these effects may ultimately lead to a lower proportion of ultra-short fragments in cancer patients than in controls.
cfDNA end motifs have been shown to be associated with nuclease activity. Different nucleases generate distinct end motifs: DFFB preferentially produces fragments with A ends, DNASE1L3 fragments with C ends, and DNASE1 fragments with T ends [32]. These end motifs vary significantly across pathological states and enable early tumor detection. For instance, Wang et al. [40] constructed a stacked model for lung cancer diagnosis using five cfDNA features, namely Fragment Size Coverage (FSC), Fragment Size Distribution (FSD), EnD Motif (EDM), BreakPoint Motif (BPM), and Copy Number Variation (CNV), achieving an AUC of 0.985. Notably, EDM and BPM alone discriminated lung cancer from controls with AUCs of 0.953 and 0.973, respectively. Similarly, Liu et al. [41] calculated motif-normalized Shannon entropy for circulating mitochondrial DNA, achieving AUCs >0.9322 for distinguishing each tumor type from non-tumor controls. In our study, both BPM and EDM accurately distinguished esophageal cancer from controls, with single-feature AUCs of 0.9208 (BPM motif: GCTCAC) and 0.9164 (EDM motif: CGATCT).
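As a rough illustration of how end-motif features of this kind can be derived, the sketch below counts the first 6 bases at the 5' end of each fragment and computes a Shannon entropy normalized by the maximum possible entropy over all 4^k motifs. The normalization is an assumption, since the exact formula used in [41] is not given here, and the function names and toy sequences are hypothetical. BPM features additionally require reference bases flanking the breakpoint and are not covered.

```python
import math
from collections import Counter
from typing import Dict, Iterable

VALID_BASES = set("ACGT")


def end_motif_frequencies(fragments: Iterable[str], k: int = 6) -> Dict[str, float]:
    """Frequency of each k-mer found at the 5' end of the given fragments.

    Fragments shorter than k, or with non-ACGT bases in the first k
    positions, are skipped.
    """
    counts: Counter = Counter()
    for seq in fragments:
        motif = seq[:k].upper()
        if len(motif) == k and set(motif) <= VALID_BASES:
            counts[motif] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no usable fragment ends")
    return {motif: n / total for motif, n in counts.items()}


def normalized_entropy(freqs: Dict[str, float], k: int = 6) -> float:
    """Shannon entropy of motif frequencies divided by log(4**k).

    Assumed normalization: 0 means a single motif, 1 means all 4**k
    motifs are equally frequent.
    """
    h = -sum(p * math.log(p) for p in freqs.values() if p > 0)
    return h / math.log(4 ** k)


# Toy fragment sequences, invented for illustration only.
toy_fragments = ["CGATCTAAGG", "CGATCTTTAC", "GCTCACGGTA", "TTNAGCAAAA"]
toy_freqs = end_motif_frequencies(toy_fragments)
toy_entropy = normalized_entropy(toy_freqs)
print(toy_freqs, toy_entropy)
```

The frequency of a single motif (for example CGATCT) can then serve as a per-sample score, in the same way as the single-feature AUCs reported above.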
cfDNA fragmentomics is an emerging tool for cancer detection [25], and various combinations of fragmentomics features have been extensively studied. For instance, the DELFI algorithm developed by Cristiano et al. [23] evaluated genome-wide cfDNA fragmentation patterns, achieving an AUC of 0.94 for multi-cancer detection. Similarly, Bao et al. [42] analyzed five cfDNA features (FSC, FSD, 6-bp EDM, 6-bp BPM, and CNV) in 381 patients with primary liver cancer (PLC), 298 with colorectal cancer (CRC), 292 with lung adenocarcinoma (LUAD), and 243 healthy volunteers without cancer, and developed a stacked ensemble model with sensitivities of 93.2–97.1% and specificities of 85.9–98.2% in independent cohorts. In our study, a model incorporating cfDNA fragment length, EDM, and BPM achieved an AUC of 0.9913 for esophageal cancer detection, outperforming single-feature approaches in both sensitivity and specificity.
DNA methylation alterations can reflect the cellular accumulation of disease risk factors, and aberrant methylation status is closely associated with the development of a variety of malignant tumors [43]. cfDNA methylation has been identified as a promising marker for early cancer detection and has shown accurate and reliable results in a variety of tumors. For instance, Luo et al. developed an early hepatocellular carcinoma (HCC) screening model based on 2321 differentially methylated blocks (DMBs) using the bisulfite conversion method. This model achieved 84% sensitivity and 96% specificity in an independent validation cohort, outperforming serum alpha-fetoprotein (AFP) in distinguishing HCC patients from controls [44]. Zhao et al. identified 149 methylation features from cfDNA using the bisulfite conversion method and developed a machine-learning model for colorectal cancer screening, which showed superior sensitivity compared to conventional biomarkers, including CEA, CRP, and CA19-9 [45]. In this study, we constructed an early screening model for esophageal cancer using methylation profiles, which achieved a sensitivity of 82.37% and a specificity of 95.5%, indicating high diagnostic accuracy. Notably, the random forest model integrating both methylation and fragmentomics features outperformed models using either methylation or fragmentomics features alone. Within the combined model, fragmentomics features played a more important role than methylation features, indicating that they should not be overlooked in clinical early screening of esophageal cancer.
This study has several limitations. First, significant regional disparities exist in the incidence of esophageal cancer. For instance, Central China exhibits the highest incidence and mortality rates, and the incidence and mortality rates in rural areas are approximately twice those in urban regions. Our analyses of cfDNA methylation status were based on esophageal cancer patients from a single region, who are likely to share similar etiological factors, so a multicenter study should be carried out at a later stage to validate our results. Second, we only collected samples from healthy controls and esophageal cancer patients, and the two groups were imbalanced in sample number and age. The lack of samples from patients with non-tumor esophageal diseases makes it impossible to evaluate the applicability of the current model to such patients, which may affect its practical value in real clinical scenarios. Subsequent studies need to expand the sample size and include more diverse clinical samples for validation. Finally, the biological significance of most of the 25 markers we screened remains unclear; very few relevant studies exist, and their role in esophageal cancer development needs further investigation.
Conclusion
In summary, we validated the performance of plasma cfDNA methylation profiling and fragmentomics profiling in the early detection of esophageal cancer based on the cfMeDIP-seq method. This study not only demonstrated the promise of cfDNA methylation markers and fragmentomics markers in the early detection of esophageal cancer but also provided a valuable direction for subsequent studies.
Supplementary Information
Additional file 1: Figure S1. Analysis of end motif characteristics among normal and esophageal cancer groups. (A) Heatmap of top-50 EDM identified in normal controls and esophageal cancer patients. (B) Comparison of the frequencies of motif CGATCT in the normal and tumor groups. (C) ROC curves were used to assess the performance of the diagnostic model based on CGATCT motifs.
Additional file 2: Table S1. Information of all participants in this study. Table S2. The AUC, sensitivity and specificity of the 25 markers. Table S3. The MCC, F1-score, Balanced Accuracy, and other metrics of the Methylation Model, Fragmentomics Model, and Combined Model.
Acknowledgements
We thank Shanxi Provincial Cancer Hospital for providing clinical samples. We thank the members of the technical-assistance team at the Institute of Health and Medical Technology, Hefei Institutes of Physical Science, Chinese Academy of Sciences.
Abbreviations
- DMR: Differentially methylated region
- cfDNA: Cell-free DNA
- cfMeDIP-seq: Cell-free methylated DNA immunoprecipitation and high-throughput sequencing
- AUC: Area under the receiver operating characteristic curve
- ROC: Receiver operating characteristic
- GO: Gene Ontology
- LASSO: Least Absolute Shrinkage and Selection Operator
Authors’ contributions
W.Q. and N wrote the main manuscript text, and W.Z. and L.W. prepared figures. All authors reviewed the manuscript.
Funding
Global Selection Program Funding in 2024, LX2024006, Institute of Health and Medicine, Hefei Comprehensive National Science Center.
Data availability
The raw sequencing data of this study are available from the corresponding author upon reasonable request. The code is available on GitHub (https://github.com/qijian5503/cfmedip_seq_for_EC).
Declarations
Ethics approval and consent to participate
All patients gave written informed consent, and the project complied with the Declaration of Helsinki. Approval of the research protocol by an Institutional Review Board: The Institutional Research Ethics Committee of the Shanxi Cancer Hospital approved this study.
Consent for publication
Not applicable.
Competing interests
The authors declare no competing interests.
Footnotes
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Contributor Information
Jian Qi, Email: qijian@mail.ustc.edu.cn.
Jinfu Nie, Email: jinfunie@163.com.
References
- 1. Bray F, Laversanne M, Sung H, Ferlay J, Siegel RL, Soerjomataram I, et al. Global cancer statistics 2022: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin. 2024;74:229–63. 10.3322/caac.21834.
- 2. Xia C, Dong X, Li H, Cao M, Sun D, He S, et al. Cancer statistics in China and United States, 2022: profiles, trends, and determinants. Chin Med J. 2022;135:584. 10.1097/CM9.0000000000002108.
- 3. Siegel RL, Kratzer TB, Giaquinto AN, Sung H, Jemal A. Cancer statistics, 2025. CA Cancer J Clin. 2025;75:10–45. 10.3322/caac.21871.
- 4. Joseph A, Raja S, Kamath S, Jang S, Allende D, McNamara M, et al. Esophageal adenocarcinoma: a dire need for early detection and treatment. Cleve Clin J Med. 2022;89:269–79. 10.3949/ccjm.89a.21053.
- 5. Qu H-T, Li Q, Hao L, Ni Y-J, Luan W-Y, Yang Z, et al. Esophageal cancer screening, early detection and treatment: current insights and future directions. World J Gastrointest Oncol. 2024;16:1180–91. 10.4251/wjgo.v16.i4.1180.
- 6. Weiner BC. Complications of routine diagnostic upper endoscopy. Gastrointest Endosc. 1987;33:53. 10.1016/S0016-5107(87)71496-5.
- 7. Zhang J, Zhu Z, Liu Y, Jin X, Xu Z, Yu Q, et al. Diagnostic value of multiple tumor markers for patients with esophageal carcinoma. PLoS ONE. 2015;10:e0116951. 10.1371/journal.pone.0116951.
- 8. Mori Y, Kudo S, Mohmed HEN, Misawa M, Ogata N, Itoh H, et al. Artificial intelligence and upper gastrointestinal endoscopy: current status and future perspective. Dig Endosc. 2019;31:378–88. 10.1111/den.13317.
- 9. Sharma M, Verma RK, Kumar S, Kumar V. Computational challenges in detection of cancer using cell-free DNA methylation. Comput Struct Biotechnol J. 2022;20:26–39. 10.1016/j.csbj.2021.12.001.
- 10. Abe T, Nakashima C, Sato A, Harada Y, Sueoka E, Kimura S, et al. Origin of circulating free DNA in patients with lung cancer. PLoS ONE. 2020;15:e0235611. 10.1371/journal.pone.0235611.
- 11. Husain H, Melnikova VO, Kosco K, Woodward B, More S, Pingle SC, et al. Monitoring daily dynamics of early tumor response to targeted therapy by detecting circulating tumor DNA in urine. Clin Cancer Res. 2017;23:4716–23. 10.1158/1078-0432.CCR-17-0454.
- 12. Ikoma D, Ichikawa D, Ueda Y, Tani N, Tomita H, Sai S, et al. Circulating tumor cells and aberrant methylation as tumor markers in patients with esophageal cancer. Anticancer Res. 2007;27:535–9.
- 13. Ding SC, Chan RWY, Peng W, Huang L, Zhou Z, Hu X, et al. Jagged ends on multinucleosomal cell-free DNA serve as a biomarker for nuclease activity and systemic lupus erythematosus. Clin Chem. 2022;68:917–26. 10.1093/clinchem/hvac050.
- 14. van Eijk KR, de Jong S, Boks MP, Langeveld T, Colas F, Veldink JH, et al. Genetic analysis of DNA methylation and gene expression levels in whole blood of healthy human subjects. BMC Genomics. 2012;13:636. 10.1186/1471-2164-13-636.
- 15. Liu MC, Oxnard GR, Klein EA, Swanton C, Seiden MV. Sensitive and specific multi-cancer detection and localization using methylation signatures in cell-free DNA. Ann Oncol. 2020;31:745–59. 10.1016/j.annonc.2020.02.011.
- 16. Deng Z, Ji Y, Han B, Tan Z, Ren Y, Gao J, et al. Early detection of hepatocellular carcinoma via no end-repair enzymatic methylation sequencing of cell-free DNA and pre-trained neural network. Genome Med. 2023;15:93. 10.1186/s13073-023-01238-8.
- 17. Wang T, Li P, Qi Q, Zhang S, Xie Y, Wang J, et al. A multiplex blood-based assay targeting DNA methylation in PBMCs enables early detection of breast cancer. Nat Commun. 2023;14:4724. 10.1038/s41467-023-40389-5.
- 18. Leontiou CA, Hadjidaniel MD, Mina P, Antoniou P, Ioannides M, Patsalis PC. Bisulfite conversion of DNA: performance comparison of different kits and methylation quantitation of epigenetic biomarkers that have the potential to be used in non-invasive prenatal testing. PLoS ONE. 2015;10:e0135058. 10.1371/journal.pone.0135058.
- 19. Fisher T, Ford CE, Warton K. Recovery efficiency of cell-free DNA after bisulfite conversion. Clin Chem. 2022;68:1219–20. 10.1093/clinchem/hvac107.
- 20. Nuzzo PV, Berchuck JE, Korthauer K, Spisak S, Nassar AH, Abou Alaiwi S, et al. Detection of renal cell carcinoma using plasma and urine cell-free DNA methylomes. Nat Med. 2020;26:1041–3. 10.1038/s41591-020-0933-1.
- 21. Shen SY, Singhania R, Fehringer G, Chakravarthy A, Roehrl MHA, Chadwick D, et al. Sensitive tumour detection and classification using plasma cell-free DNA methylomes. Nature. 2018;563:579–83. 10.1038/s41586-018-0703-0.
- 22. Qi J, Hong B, Wang S, Wang J, Fang J, Sun R, et al. Plasma cell-free DNA methylome-based liquid biopsy for accurate gastric cancer detection. Cancer Sci. n/a n/a. 10.1111/cas.16284.
- 23. Cristiano S, Leal A, Phallen J, Fiksel J, Adleff V, Bruhm DC, et al. Genome-wide cell-free DNA fragmentation in patients with cancer. Nature. 2019;570:385–9. 10.1038/s41586-019-1272-6.
- 24. Underhill HR, Kitzman JO, Hellwig S, Welker NC, Daza R, Baker DN, et al. Fragment length of circulating tumor DNA. PLoS Genet. 2016;12:e1006162. 10.1371/journal.pgen.1006162.
- 25. Chiu RWK, Heitzer E, Lo YMD, Mouliere F, Tsui DWY. Cell-free DNA fragmentomics: the new omics on the block. Clin Chem. 2020;66:1480–4. 10.1093/clinchem/hvaa258.
- 26. Gao Q, Zeng Q, Wang Z, Li C, Xu Y, Cui P, et al. Circulating cell-free DNA for cancer early detection. Innov (Camb). 2022;3:100259. 10.1016/j.xinn.2022.100259.
- 27. Tan WY, Nagabhyrava S, Ang-Olson O, Das P, Ladel L, Sailo B, et al. Translation of epigenetics in cell-free DNA liquid biopsy technology and precision oncology. Curr Issues Mol Biol. 2024;46:6533–65. 10.3390/cimb46070390.
- 28. Qi T, Pan M, Shi H, Wang L, Bai Y, Ge Q. Cell-free DNA fragmentomics: the novel promising biomarker. Int J Mol Sci. 2023;24:1503. 10.3390/ijms24021503.
- 29. Klutstein M, Nejman D, Greenfield R, Cedar H. DNA methylation in cancer and aging. Cancer Res. 2016;76:3446–50. 10.1158/0008-5472.CAN-15-3278.
- 30. Nair SS, Coolen MW, Stirzaker C, Song JZ, Statham AL, Strbenac D, et al. Comparison of methyl-DNA immunoprecipitation (MeDIP) and methyl-CpG binding domain (MBD) protein capture for genome-wide DNA methylation analysis reveal CpG sequence coverage bias. Epigenetics. 2011;6:34–44. 10.4161/epi.6.1.13313.
- 31. Stejskal P, Goodarzi H, Srovnal J, Hajdúch M, van 't Veer LJ, Magbanua MJM. Circulating tumor nucleic acids: biology, release mechanisms, and clinical relevance. Mol Cancer. 2023;22:15. 10.1186/s12943-022-01710-w.
- 32. Han DSC, Ni M, Chan RWY, Chan VWH, Lui KO, Chiu RWK, et al. The biology of cell-free DNA fragmentation and the roles of DNASE1, DNASE1L3, and DFFB. Am J Hum Genet. 2020;106:202–14. 10.1016/j.ajhg.2020.01.008.
- 33. Struhl K, Segal E. Determinants of nucleosome positioning. Nat Struct Mol Biol. 2013;20:267–73. 10.1038/nsmb.2506.
- 34. Han DSC, Lo YMD. The nexus of cfDNA and nuclease biology. Trends Genet. 2021;37:758–70. 10.1016/j.tig.2021.04.005.
- 35. Mathios D. Detection and characterization of lung cancer using cell-free DNA fragmentomes.:14. 10.1038/s41467-021-24994-w.
- 36. Ganesamoorthy D, Robertson AJ, Chen W, Hall MB, Cao MD, Ferguson K, et al. Whole genome deep sequencing analysis of cell-free DNA in samples with low tumour content. BMC Cancer. 2022;22:85. 10.1186/s12885-021-09160-1.
- 37. Wan JCM, Massie C, Garcia-Corbacho J, Mouliere F, Brenton JD, Caldas C, et al. Liquid biopsies come of age: towards implementation of circulating tumour DNA. Nat Rev Cancer. 2017;17:223–38. 10.1038/nrc.2017.7.
- 38. Zhu G, Jiang P, Li X, Peng W, Choy LYL, Yu SCY, et al. Methylation-associated nucleosomal patterns of cell-free DNA in cancer patients and pregnant women. Clin Chem. 2024;70:1355–65. 10.1093/clinchem/hvae118.
- 39. Wang J, Niu Y, Yang M, Shu L, Wang H, Wu X, et al. Altered cfDNA fragmentation profile in hypomethylated regions as diagnostic markers in breast cancer. Epigenetics Chromatin. 2023;16:33. 10.1186/s13072-023-00508-4.
- 40. Wang S, Meng F, Li M, Bao H, Chen X, Zhu M, et al. Multidimensional cell-free DNA fragmentomic assay for detection of early-stage lung cancer. Am J Respir Crit Care Med. 2023;207:1203–13. 10.1164/rccm.202109-2019OC.
- 41. Liu Y, Peng F, Wang S, Jiao H, Dang M, Zhou K, et al. Aberrant fragmentomic features of circulating cell-free mitochondrial DNA as novel biomarkers for multi-cancer detection. EMBO Mol Med. 2024;16:3169–83. 10.1038/s44321-024-00163-6.
- 42. Bao H, Wang Z, Ma X, Guo W, Zhang X, Tang W, et al. Letter to the editor: an ultra-sensitive assay using cell-free DNA fragmentomics for multi-cancer early detection. Mol Cancer. 2022;21:129. 10.1186/s12943-022-01594-w.
- 43. Liu C, Tang H, Hu N, Li T. Methylomics and cancer: the current state of methylation profiling and marker development for clinical care. Cancer Cell Int. 2023;23:242. 10.1186/s12935-023-03074-7.
- 44. Luo B, Ma F, Liu H, Hu J, Rao L, Liu C, et al. Cell-free DNA methylation markers for differential diagnosis of hepatocellular carcinoma. BMC Med. 2022;20:8. 10.1186/s12916-021-02201-3.
- 45. Zhao F, Bai P, Xu J, Li Z, Muhammad S, Li D, et al. Efficacy of cell-free DNA methylation-based blood test for colorectal cancer screening in high-risk population: a prospective cohort study. Mol Cancer. 2023;22:157. 10.1186/s12943-023-01866-z.