ABSTRACT
Leukocyte cell proportion changes affect the detection of cancer-associated aberrant DNA methylation alterations in peripheral blood samples. We aimed to detect cellular DNA methylation changes in ovarian cancer (OVC) blood samples while avoiding these cell-composition effects. Based on the within-sample relative methylation orderings (RMOs) of CpG loci in leukocyte subtypes, we developed the Ref-RMO method to detect aberrant methylation alterations from OVC blood samples. Stable CpG pairs with consistent RMOs in different leukocyte subtypes were determined, more than 99% of which retained their RMO patterns in peripheral whole blood (PWB) in independent datasets. Based on the stable CpG pairs, significantly reversed CpG pairs were detected from OVC PWB samples, and these were related to clinical information such as age, subtype, grade, stage, or CA125 level. In total, 439 CpG loci were found to be significantly differentially methylated between OVC and healthy blood samples. They were mainly enriched in KEGG pathways such as cytokine-cytokine receptor interaction, apoptosis, and proteoglycans in cancer, and in immune-associated Gene Ontology terms. STRING analysis showed that they tended to have functional interactions with cancer-associated genes recorded in the COSMIC database. The proposed RMO-based method can thus identify leukocyte cellular differential DNA methylation in OVC PWB samples, yielding cancer-associated aberrant signals that are robust against cell-composition effects.
KEYWORDS: Cancer early detection, ovarian cancer, peripheral whole blood, cell-composition effect, relative methylation ordering
Introduction
Ovarian cancer (OVC) is the most common cancer in female genital organs. Most OVC patients are diagnosed with an advanced disease that has already metastasized to distant organs, for which 5-year survival is 29% [1]. For women at high risk of OVC, a possible screening approach is testing using transvaginal ultrasound and tumour marker CA125. However, this strategy has not proven effective in reducing OVC mortality [1]. Therefore, molecular biomarkers that could help to detect early OVC are urgently needed.
Peripheral whole blood (PWB) has been considered as a promising surrogate for solid tissue to investigate disease-associated biomarkers [2]. Many molecular alterations have been identified from global gene expression profiles in PWB of patients with various cancers including OVC [3,4]. However, PWB comprises a heterogeneous population of leukocytes, whose relative proportions may shift under disease states [5,6]. Such cell-composition effects could introduce additional signals, which could influence and even disguise the disease-associated signals in leukocytes [6,7]. Therefore, a computational tool is needed to detect leukocyte-specific molecular alterations from PWB samples, taking the influence of leukocyte subtype proportion changes on the overall signals of the PWB sample into consideration.
Some researchers have already developed methods to avoid cell-composition effects of PWB. Depending on whether they use cell-type-specific reference profiles, these methods can be classified as either reference-free or reference-based [6,8,9]. For example, surrogate variable analysis (SVA), which is a reference-free method, can identify the potential confounding factors as surrogate variables and adjust for them. The reference-based method developed by Houseman et al. estimates and adjusts for the proportion of each leukocyte subtype in blood samples using a deconvolution algorithm based on the profiles of the purified leukocyte subtypes [6]. However, the absolute measurement values used by these methods are sensitive to the so-called batch effects in microarray experiments [10]. It has been reported that the within-sample relative expression orderings (REOs) of genes are insensitive to the systematic biases in microarray measurements, invariant to monotonic data normalization, and robust against inter-individual biological variation of gene expression levels [11]. We have developed a method, named Ref-REO, to detect leukocyte-specific expression alterations from mixed-cell blood samples of patients by analysing the disrupted patterns of the pre-determined REOs of genes that are consistent in healthy purified leukocyte subtypes [12]. This method has proven to be an effective way to detect leukocyte cellular gene expression alterations from PWB samples in diseases such as Alzheimer’s disease [13].
DNA methylation (DNAm) is a core epigenetic component, which is known to be associated with different forms of cancer. DNAm aberrations in PWB have also been reported to serve as cancer biomarkers [14]. Considering the above-mentioned cell-composition effects in blood samples, the objectives of our study are: (1) to assess the stability of within-sample relative methylation orderings (RMOs) of CpG loci in healthy human leukocytes; (2) to develop the Ref-RMO method for the identification of significantly disrupted methylated CpG loci from cancer blood samples; (3) to apply the Ref-RMO method to OVC PWB samples. The findings showed that significantly altered methylations in OVC blood samples detected by the Ref-RMO method were related to important clinical outcomes, such as subtype, grade, or CA125 level, and might be different from the inflammation-related diseases in involved biological pathways. The development of the method to discover the disease-associated methylation alterations in this paper is promising to improve early diagnosis of OVC patients and increase their survival.
Material and methods
Data sources and data preprocessing
All the DNAm profiles analysed in this study were downloaded from the Gene Expression Omnibus database (GEO, http://www.ncbi.nlm.nih.gov/geo/, Table 1). All DNAm profiles were generated by the Illumina Infinium Human Methylation 27k (K27) or Methylation 450k BeadChip (K450).
Table 1.
DNA methylation profiles analysed in this study
| Type | Sample number | Data source | Platform | Ref |
|---|---|---|---|---|
| Healthy leukocyte | Total: 73. Monocytes: 5, granulocytes: 4, neutrophils: 4, B cells: 5, NK cells (Pan NKR cells, CD16+ NK cells, CD16− NK cells, CD8+ NK cells and CD8− NK cells): 12, T cells (CD4+ T cells, CD8+ T cells, NK T cells, Pan T cells and Tregs): 16, PWB: 27 | GSE39981 | K27 | [5] |
| Healthy leukocyte | Total: 60. CD4+ T cells: 6, CD8+ T cells: 6, CD56+ NK cells: 6, CD19+ B cells: 6, CD14+ monocytes: 6, neutrophils: 6, eosinophils: 6, whole blood: 6, PBMC: 6 | GSE35069 | K450 | [15] |
| OVC PWB | Total: 405. Normal: 274, pre-treatment OVC: 131 | GSE19711 | K27 | [16] |
| Healthy PWB | Total: 92 | GSE30229 | K27 | [9] |
| Rheumatoid arthritis PWB | Total: 674. Normal: 325, rheumatoid arthritis: 349 | GSE42861 | K450 | [18] |
For each CpG locus, the methylation level, denoted as a beta-value (β), was calculated by Eq. (1):
$$\beta = \frac{M}{M + U + 100} \qquad (1)$$
where M and U represent the methylated and unmethylated signal intensity of this CpG locus reported by BeadChip, respectively. The 25,978 CpG loci measured on both platforms were analysed as background CpG loci. All CpG loci were within the proximal promoter regions of the transcription start sites of 14,113 genes. The original platform annotation file for K27 was used to map each CpG locus to gene ID.
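As a minimal sketch of the beta-value calculation, the snippet below assumes the conventional Illumina form with a constant offset in the denominator. The function name `beta_value`, the default offset of 100, and the clipping of negative intensities to zero are illustrative assumptions rather than details stated in this paper.

```python
def beta_value(m, u, offset=100.0):
    """Return the beta-value for methylated (m) and unmethylated (u) intensities.

    The offset (assumed default: 100) stabilises the ratio when both
    intensities are low; negative intensities are clipped to zero (assumption).
    """
    m = max(m, 0.0)
    u = max(u, 0.0)
    return m / (m + u + offset)


# A locus with M = 300 and U = 600 gives 300 / (300 + 600 + 100).
example = beta_value(300, 600)
```

The result always lies in the interval from 0 (unmethylated) to just below 1 (fully methylated), which is what makes within-sample comparisons of two loci meaningful.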
Detecting disease-associated DNAm alterations in leukocytes from PWB
The Ref-RMO method was developed to detect leukocyte cellular disease-associated DNAm alterations from PWB based on the RMOs of CpG loci within individual samples. It consists of the following steps.
First, reference CpG pairs were decided. For all n(n-1)/2 CpG pairs that could be obtained from the n background CpG loci, if the methylation levels of any two CpG loci, i and j, satisfied that their RMO βi>βj were stable and consistent in all samples of leukocyte subtypes, then the CpG pair (CpGi, CpGj) was extracted as a reference CpG pair.
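The selection of reference CpG pairs can be sketched as follows. This is a brute-force illustration under stated assumptions: each sample is a list of beta-values indexed by CpG locus, and the function name `stable_pairs` is hypothetical. With the 25,978 background loci there are about 337 million candidate pairs, so a practical implementation would need a vectorised or chunked approach.

```python
from itertools import combinations


def stable_pairs(profiles):
    """Return (i, j) pairs with beta_i > beta_j in every leukocyte sample.

    profiles: list of samples, each a list of beta-values indexed by locus.
    Each pair is oriented so that the first locus is the more methylated one.
    """
    n_loci = len(profiles[0])
    pairs = []
    for i, j in combinations(range(n_loci), 2):
        if all(s[i] > s[j] for s in profiles):
            pairs.append((i, j))
        elif all(s[j] > s[i] for s in profiles):
            pairs.append((j, i))
    return pairs


# Three toy leukocyte samples over three loci.
toy = [[0.9, 0.1, 0.5], [0.8, 0.2, 0.6], [0.7, 0.3, 0.2]]
reference = stable_pairs(toy)
```

In the toy data, locus 0 is above loci 1 and 2 in every sample, whereas the ordering of loci 1 and 2 flips in the third sample, so that pair is not retained.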
Second, reversed CpG pairs were detected. In a dataset to be analysed, the RMOs of the reference CpG pairs were first evaluated in healthy PWB samples. Only those reference CpG pairs keeping their RMOs in more than 90% of the healthy samples in the analysed dataset were retained for analysis. Then, for each pair, the numbers of healthy samples with RMO βi>βj and βi<βj were calculated and denoted by n1 and n2, and the numbers of disease samples with RMO βi>βj and βi<βj were calculated and denoted by m1 and m2, respectively. Finally, according to n1, n2, m1 and m2, Fisher’s exact test was used to assess whether the RMO of a reference CpG pair in healthy samples was significantly reversed in disease samples. The false discovery rate was controlled at 5% by the Benjamini and Hochberg method [17]. An adjusted p-value less than 0.05 was considered statistically significant and the corresponding CpG pairs were defined as reversed CpG pairs.
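The detection of reversed CpG pairs can be sketched with the standard library alone. Several choices here are assumptions, since the text does not specify them: the Fisher's exact test is taken to be two-sided, a direction check requires the reversal to be more frequent in disease than in healthy samples, and all function names are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    support = range(max(0, col1 - row2), min(row1, col1) + 1)
    # Sum over all tables that are no more probable than the observed one.
    return min(1.0, sum(p for p in map(prob, support) if p <= p_obs * (1 + 1e-9)))


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda k: pvalues[k])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def find_reversed_pairs(reference_pairs, healthy, disease, keep_rate=0.9, alpha=0.05):
    """Return reference pairs (i, j) whose RMO beta_i > beta_j is reversed in disease."""
    candidates, pvalues = [], []
    for i, j in reference_pairs:
        n1 = sum(s[i] > s[j] for s in healthy)
        n2 = sum(s[i] < s[j] for s in healthy)
        if n1 <= keep_rate * len(healthy):
            continue  # RMO not kept in more than 90% of the healthy samples
        m1 = sum(s[i] > s[j] for s in disease)
        m2 = sum(s[i] < s[j] for s in disease)
        # Assumed direction check: reversal is more frequent in disease samples.
        more_reversed = m2 * len(healthy) > n2 * len(disease)
        candidates.append(((i, j), more_reversed))
        pvalues.append(fisher_exact_two_sided(n1, n2, m1, m2))
    adjusted = benjamini_hochberg(pvalues)
    return [pair for (pair, more), q in zip(candidates, adjusted) if more and q < alpha]
```

For example, with 20 healthy samples in which locus 0 is always above locus 1 and 20 disease samples in which the order is always flipped, the pair (0, 1) is reported, while a pair whose ordering is identical in both groups is not.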
Last, significant differentially methylated CpG loci (DML) were detected. For each CpG locus i included in the reversed CpG pairs, let n1 and n2 denote the numbers of reference CpG pairs with RMOs βi>βj and βj>βi, and m1 and m2 denote the numbers of reversed CpG pairs with RMOs βi>βj and βj>βi, respectively, where j could be any locus except i. Fisher’s exact test was employed to test whether the RMOs of the CpG locus i and its partners were significantly different between the reference and reversed CpG pairs. As the reference and reversed CpG pairs were defined from healthy and disease samples, respectively, this CpG locus i was considered significantly differentially methylated if the corresponding adjusted Fisher’s exact test p-value was smaller than 0.05 [17]. If n1/n2 < m1/m2, we defined this CpG locus as hypermethylated, and otherwise as hypomethylated. Notably, CpG loci with no DNAm changes in the disease state could also be identified as significant DML, because they could frequently be the partners of true significant DML. To remove such CpG loci, the following filtering steps were performed. First, the most significant differentially methylated CpG locus was selected according to the adjusted p-value. Then, the CpG pairs containing this locus were removed from the list of reversed CpG pairs, and significant DML were re-detected. This procedure was repeated until no significant DML could be detected. All the loci selected in this way as the most significant DML were considered true DML.
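The iterative DML filtering can be sketched as below. This is an illustration under assumptions, not the authors' implementation: reference pairs are given as (higher, lower) in healthy samples, reversed pairs are given as (higher, lower) in their disease-state orientation, the Fisher's exact test is two-sided, and the function names are hypothetical.

```python
from collections import Counter
from math import comb


def fisher_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)
    probs = [comb(row1, x) * comb(row2, col1 - x) / denom
             for x in range(max(0, col1 - row2), min(row1, col1) + 1)]
    p_obs = comb(row1, a) * comb(row2, col1 - a) / denom
    return min(1.0, sum(p for p in probs if p <= p_obs * (1 + 1e-9)))


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda k: pvalues[k])
    adjusted, running_min = [0.0] * m, 1.0
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def detect_dml(reference_pairs, reversed_pairs, alpha=0.05):
    """Return [(locus, 'hyper' | 'hypo')] selected one locus per iteration."""
    ref_high = Counter(h for h, _ in reference_pairs)  # n1 per locus
    ref_low = Counter(l for _, l in reference_pairs)   # n2 per locus
    remaining = list(reversed_pairs)
    dml = []
    while remaining:
        rev_high = Counter(h for h, _ in remaining)    # m1 per locus
        rev_low = Counter(l for _, l in remaining)     # m2 per locus
        loci = sorted(set(rev_high) | set(rev_low))
        pvals = [fisher_two_sided(ref_high[c], ref_low[c], rev_high[c], rev_low[c])
                 for c in loci]
        adj = bh_adjust(pvals)
        best = min(range(len(loci)), key=lambda k: adj[k])
        if adj[best] >= alpha:
            break  # no significant locus is left
        locus = loci[best]
        n1, n2, m1, m2 = ref_high[locus], ref_low[locus], rev_high[locus], rev_low[locus]
        # n1/n2 < m1/m2, cross-multiplied to avoid division by zero.
        dml.append((locus, "hyper" if n1 * m2 < m1 * n2 else "hypo"))
        # Drop every reversed pair that contains the selected locus.
        remaining = [pair for pair in remaining if locus not in pair]
    return dml
```

Removing the pairs of the selected locus before re-testing is what prevents unchanged partner loci from being reported: once the truly altered locus is taken out, its partners no longer appear in any reversed pair.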
Evaluating the association of reversed CpG pairs with clinical characteristics
The association of reversed CpG pairs with a clinical characteristic was evaluated by the following procedure. Suppose there are n reversed CpG pairs. First, the disease samples were classified into two groups according to the RMO pattern (βi>βj or βi<βj) of each reversed CpG pair. Then, the difference in a clinical characteristic between the two groups of disease samples classified by each reversed CpG pair was evaluated by the Wilcoxon rank-sum test. A reversed CpG pair was considered significantly associated with a clinical characteristic if the p-value was smaller than 0.05. Let k denote the number of such significantly associated pairs. To test whether k significantly associated pairs could be observed by random chance, a random experiment was performed as follows: (1) For each reversed CpG pair, randomly divide the disease samples into two groups with the same sizes as the two groups classified by the RMO of that pair. (2) Calculate the number of pairs showing a significant association with the clinical characteristic based on the random groups, denoted as m. (3) Repeat steps 1 and 2 1000 times. The probability of observing k out of n reversed CpG pairs significantly associated with a clinical characteristic of the disease samples by chance was calculated as p = t/1000, where t is the number of times that m ≥ k. If this p-value was smaller than 0.05, we considered that observing k reversed CpG pairs significantly associated with the clinical characteristic was unlikely to have happened by chance.
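The random experiment described above can be sketched as follows. The rank-sum test here uses a normal approximation with midranks and no tie correction of the variance, which is a simplifying assumption; the representation of each reversed pair as a list of booleans and all function names are likewise hypothetical.

```python
import random
from math import erfc, sqrt


def rank_sum_pvalue(x, y):
    """Two-sided Wilcoxon rank-sum p-value (normal approximation, midranks)."""
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        return 1.0
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    w, start = 0.0, 0
    while start < len(pooled):
        end = start
        while end < len(pooled) and pooled[end][0] == pooled[start][0]:
            end += 1
        midrank = (start + 1 + end) / 2  # average of ranks start+1 .. end
        w += midrank * sum(1 for _, g in pooled[start:end] if g == 0)
        start = end
    mean = n1 * (n1 + n2 + 1) / 2
    sd = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return erfc(abs(w - mean) / sd / sqrt(2))


def count_significant(masks, values, alpha=0.05):
    """Count pairs whose two sample groups differ in the clinical values."""
    count = 0
    for mask in masks:
        group1 = [v for v, flag in zip(values, mask) if flag]
        group2 = [v for v, flag in zip(values, mask) if not flag]
        if rank_sum_pvalue(group1, group2) < alpha:
            count += 1
    return count


def random_experiment(masks, values, n_perm=1000, alpha=0.05, seed=0):
    """Return (k, p) with p = t / n_perm, where t counts permutations with m >= k.

    masks: one list of booleans per reversed pair (True = reversed RMO).
    Shuffling a mask keeps the two group sizes of that pair unchanged.
    """
    rng = random.Random(seed)
    k = count_significant(masks, values, alpha)
    t = 0
    for _ in range(n_perm):
        shuffled = [rng.sample(mask, len(mask)) for mask in masks]
        if count_significant(shuffled, values, alpha) >= k:
            t += 1
    return k, t / n_perm
```

Here `values` would hold one clinical measurement (for example, the CA125 level) per disease sample, in the same sample order as the masks.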
Predicting disease samples based on RMOs of reversed CpG pairs
A five-fold cross-validation procedure was conducted to evaluate the predictive performance of the reversed CpG pairs identified for a disease. First, the individuals in a dataset were partitioned into five random groups of near-equal size, with the randomization performed separately within the disease and normal sub-populations. Among them, one group was labelled as the validation set and the other four were labelled as the training set. Second, reversed CpG pairs were sorted from the largest to the smallest according to their differences in the probability of βi>βj between normal and disease samples (Δp). For a reversed CpG pair (i, j), Δp can be calculated by Eq. (2):
$$\Delta p = \frac{a}{N_1} - \frac{b}{N_2} \qquad (2)$$
where N1 and N2 denote the numbers of normal and disease samples, and a and b denote the numbers of normal and disease samples having the RMO pattern βi>βj in the training set, respectively. The top k CpG pairs with the biggest Δp were selected to predict the disease samples in the validation set in turn. The area under the receiver operating characteristic curve (AUC) was calculated for each cross-validation fold. The cross-validation procedure was replicated 100 times, with a new randomization performed for each replicate. The predictive performance of the reversed CpG pairs was finally evaluated by the average AUC across all 100 replications.
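The ranking and evaluation step can be sketched as below for a single training/validation split (the five-fold partitioning and the 100 replicates are omitted). The text does not state how the top k pairs are combined into a prediction, so the scoring rule used here, the fraction of top pairs that are reversed in a sample, is an assumption, as are all function names.

```python
def rank_pairs_by_delta(pairs, normal, disease):
    """Sort pairs (i, j) by a/N1 - b/N2, largest first.

    a and b are the numbers of normal and disease training samples
    with beta_i > beta_j; N1 and N2 are the group sizes.
    """
    def delta(pair):
        i, j = pair
        a = sum(s[i] > s[j] for s in normal)
        b = sum(s[i] > s[j] for s in disease)
        return a / len(normal) - b / len(disease)

    return sorted(pairs, key=delta, reverse=True)


def reversal_score(sample, top_pairs):
    """Assumed scoring rule: fraction of the top pairs reversed in the sample."""
    return sum(sample[i] < sample[j] for i, j in top_pairs) / len(top_pairs)


def auc(scores_disease, scores_normal):
    """AUC as the probability that a disease score exceeds a normal score."""
    wins = sum((d > n) + 0.5 * (d == n)
               for d in scores_disease for n in scores_normal)
    return wins / (len(scores_disease) * len(scores_normal))


# Toy data: pair (0, 1) is reversed in all disease samples, pair (2, 3) in half.
normal = [[0.8, 0.2, 0.6, 0.4]] * 4
disease = [[0.3, 0.7, 0.4, 0.6], [0.3, 0.7, 0.4, 0.6],
           [0.3, 0.7, 0.6, 0.4], [0.3, 0.7, 0.6, 0.4]]
ranked = rank_pairs_by_delta([(2, 3), (0, 1)], normal, disease)
top = ranked[:1]
result = auc([reversal_score(s, top) for s in disease],
             [reversal_score(s, top) for s in normal])
```

In a full evaluation, the ranking would be computed on the training folds only and the scores on the held-out fold, so that the AUC is not inflated by selecting pairs on the samples being predicted.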
Surrogate variable analysis and reference-based method
SVA [8] and the reference-based method developed by Houseman et al. [6] were performed with the default parameter settings. Here we used SVA to identify the potential confounding factors in the DNAm data as surrogate variables (SVs), implemented using the SVA package in R. For the reference-based method, consistent with Liu [18] and Koestler et al. [19], the 500 CpG loci with the most variable methylation levels among the examined leukocyte subtypes were used as the reference to estimate the proportions of leukocyte subtypes in PWB samples. Briefly, a total of five leukocyte subtypes, including B cells, granulocytes, monocytes, NK cells and T cells, were included in the estimation, and their methylation levels were obtained from GSE39981. The reference-based method was implemented using software provided by Houseman et al. [6].
Functional enrichment analysis
The functional annotation tool DAVID (https://david.ncifcrf.gov/, version 6.8) was used to perform the enrichment analysis for genes involved in DML which were referred to as differentially methylated genes [20]. All of the measured genes annotated in the KEGG or Gene Ontology database were used as the background genes.
STRING and COSMIC database
The STRING database (https://string-db.org/, Version: 10) recording 82,160 known and predicted protein–protein interactions was adopted to evaluate the interaction relationship between differentially methylated genes [21]. The COSMIC database (https://cancer.sanger.ac.uk/cosmic), the world’s largest and most comprehensive resource for exploring the impact of somatic mutations in human cancer, was used to collect the cancer-associated genes [22].
Results
Stable relative methylation orderings of CpG loci within purified healthy leukocyte subtypes
In a previous work, we demonstrated that REOs are highly stable in various types of healthy leukocytes [12]. To evaluate whether the RMOs of CpG pairs are stable in healthy leukocytes, DNA methylation profiles of various types of healthy purified leukocytes from two different datasets were collected (Table 1), and pairwise comparisons were performed for all CpG loci in each dataset. A CpG pair with the same RMO in 100% of the healthy leukocyte subtypes was considered as a stable CpG pair. For the 14 leukocyte subtypes in GSE39981, 207,747,039 stable CpG pairs were identified. For the eight leukocyte subtypes in GSE35069, 232,401,494 stable pairs were identified. The two lists shared 172,516,852 stable pairs with consistent RMOs in all leukocyte subtypes in both datasets, which was unlikely to occur by chance (binomial distribution test, p-value<2.2 × 10⁻¹⁶).
Then, we evaluated whether the cell composition of leukocyte subtypes differs between individual PWB samples. The deconvolution analysis of dataset GSE30229 (92 healthy PWB profiles) showed that the estimated proportion of each leukocyte subtype in healthy PWB fluctuated greatly (Figure 1). For example, the proportions varied from 32.4% to 75.2% for granulocytes. A similar phenomenon was observed in dataset GSE42861. Our previous work suggested that changes in cell proportions alone could not affect the stable REOs in all leukocyte subtypes in PWB samples [12]. To evaluate whether cell-composition effects of PWB could influence the methylation signals intrinsic to purified leukocyte subtypes, the RMOs of the 172,516,852 stable pairs were first examined in PWB or PBMC samples collected in the same datasets containing the purified leukocytes. Among these pairs, 99.82% and 99.90% retained the same RMOs in all of the 27 healthy PWB DNAm profiles examined in dataset GSE39981, and in all six healthy PWB and six healthy PBMC DNAm profiles examined in dataset GSE35069, respectively. Then, the common stable pairs with consistent RMOs in all samples including leukocytes, PWB, and PBMCs in both GSE39981 and GSE35069, which involved 172,039,962 CpG pairs, were further evaluated in healthy PWB samples examined in the independent datasets GSE30229 and GSE42861. We found that 99.72% and 99.76% of the common stable CpG pairs had consistent RMOs in more than 90% of the PWB samples in these two datasets, respectively.
Figure 1.

Proportions of the five main leukocyte subtypes in normal PWB samples estimated by deconvolution method.
The above results indicated that the stability of RMOs detected from purified leukocyte subtypes could be maintained in mixed-cell blood samples measured by different platforms and in different datasets despite cell-composition effects.
Reverse relative methylation orderings of CpG loci in PWB of ovarian cancer
Taking the 172,039,962 stable CpG pairs identified based on purified leukocyte subtypes as reference, the healthy and OVC PWB DNAm profiles examined in GSE19711 were evaluated. The result showed that 99.67% of the reference pairs had consistent RMOs in more than 90% of the healthy DNAm profiles. This further validated that the CpG pairs with stable RMOs in purified leukocyte subtypes could be stable in healthy PWB. For OVC PWB DNAm profiles, the RMOs of 6,122 reference pairs were significantly reversed, with adjusted Fisher’s exact test p-values smaller than 0.05.
Although a relatively small number of reversed CpG pairs was observed, they might reflect true disease-associated DNAm alterations in PWB. First, in the random experiment performed by randomly permuting the sample labels in GSE19711 1000 times, no significantly reversed CpG pairs were observed (p-value<0.001). Second, the reversed CpG pairs in OVC were associated with age, subtype, grade, stage, or CA125 level. Among the 6,122 reversed pairs, 3,309 were significantly associated with at least one of age, subtype, grade, stage, or CA125 level (Figure 2(a)). For example, 1,509 of the 6,122 reversed pairs were significantly associated with CA125 level, which was unlikely to occur by chance: on average only 301.24 pairs were observed to be associated with CA125 level in random experiments (p-value<0.001, see Material and methods). Further, the median value of CA125 was significantly higher in OVC samples with reversed RMOs (749.10 U/mL) than in OVC samples without reversed RMOs (245.25 U/mL) of the 1,509 pairs (p-value < 2.2 × 10⁻¹⁶, Fisher’s exact test).
Figure 2.

The association of altered RMOs in PWB with OVC. (a) Reversed CpG pairs are significantly associated with OVC clinical characteristics. (b) Functional pathways and GO terms enriched with differentially methylated genes detected by Ref-RMO.
Notably, the reversed CpG pairs could be used as biomarkers to predict OVC based solely on their RMOs. For simplicity, only the top k reversed CpG pairs with the biggest difference in the probability of βi>βj between normal and OVC samples were selected for analysis. The parameter k was set from 3 to 200. Five-fold cross-validation was employed to evaluate the predictive performance of the top k CpG pairs. As shown in Figure S1, the average AUC varied slightly as the number of top reversed CpG pairs increased, with the biggest average AUC of 0.773 ± 0.056 (k = 112). The average AUC approached 0.77 even when only a very small number of top CpG pairs was used. For example, when the top 11 CpG pairs were selected to classify OVC and healthy PWB samples, the mean and standard deviation of the AUC were 0.759 ± 0.063, which was comparable with the reported performance of AUC = 0.8 (95% CI: 0.74–0.87) [23].
These results suggested that the reversed CpG pairs detected from PWB contained leukocyte cellular OVC-associated DNAm alterations.
Differentially methylated CpG loci in ovarian cancer PWB
The 6,122 reversed CpG pairs involved 3,311 loci. Among them, 439 loci were detected as OVC DML in PWB by Ref-RMO (Fisher’s exact test, FDR<5%). The 439 DML and the remaining 2,872 loci not detected as DML showed a significant difference in the average methylation levels between OVC and normal samples (0.0223 ± 0.0200 vs 0.0123 ± 0.0159, p-value<2.2 × 10⁻¹⁶, Wilcoxon rank-sum test). The 439 DML mapped to 425 genes (supplementary Table S1), which may have close associations with leukocyte subtypes. For example, the CpG locus cg2082571 was involved in 290 reversed CpG pairs and is located in the promoter of the gene CLEC4A, which encodes the dendritic cell immunoreceptor protein. cg23959705, involved in 93 reversed CpG pairs, mapped to the gene TNFRSF9, which encodes a member of the TNF-receptor superfamily and plays an important role in the development of T cells and proliferation in peripheral monocytes [24,25]. cg3001305, involved in 85 reversed CpG pairs, mapped to the gene STAT5AM, which correlates with CD8+ T cell homoeostasis [26].
Functional enrichment analysis was conducted for the 425 differentially methylated genes by DAVID. Significant KEGG pathways included cytokine-cytokine receptor interaction, apoptosis, proteoglycans in cancer, neurotrophin signalling pathway, and HIF-1 signalling pathway (Figure 2(b)). Significant Gene Ontology terms covered a variety of biological processes, including immune-associated terms (Figure 2(b)). The significantly enriched pathways or functional categories could be closely associated with cancer development and progression. For example, proteoglycans have been reported to play an important role in regulating tumour cell growth, adhesion, survival, metastasis, and angiogenesis [27].
The protein–protein interaction information involving the 425 genes was retrieved from the STRING database. The results showed that there were 737 interactions, more than the expected 620 interactions (protein–protein interaction enrichment p-value = 2.79 × 10⁻⁶). Furthermore, the 425 differentially methylated genes interacted with 232 cancer genes recorded in the COSMIC database, which was unlikely to occur by random chance: when 425 genes were randomly selected from the background as differentially methylated genes, the mean number of interacting cancer genes was 111.1860 ± 9.1975 (p-value<0.001 in 1000 random experiments). The interacting cancer genes were mainly involved in the PI3K-Akt signalling pathway, T cell receptor signalling pathway, and B cell receptor signalling pathway.
The above results suggested that DNAm alterations detected from PWB by the Ref-RMO method could be closely associated with OVC.
Comparing the performance of Ref-RMO with surrogate variable analysis and reference-based deconvolution algorithm
For comparison, disease-associated DNAm alterations in PWB of OVC patients in dataset GSE19711 were also detected using SVA and the reference-based deconvolution algorithm, respectively.
SVA identifies potential confounding factors, such as the cell-composition effect, as surrogate variables and adjusts for them. However, the results showed that no significant surrogate variables were identified, suggesting that SVA might detect leukocyte proportion shifts as disease-associated alterations and might have difficulty avoiding cell-composition effects. Indeed, without adjustment for the cell-composition effect, 4,762 DML were detected by SVA, all of which were also detected by a t-test simply comparing OVC and normal samples. Therefore, we did not compare these 4,762 DML with the 439 DML detected by Ref-RMO.
Based on the DNAm profiles of purified leukocyte subtypes examined in GSE39981, the reference-based deconvolution method identified only three significant DML at an FDR of 5%. When controlling the FDR at 20%, 153 significant DML were detected, 11 of which were also identified by Ref-RMO, which was unlikely to be observed by chance (hypergeometric distribution test, p-value = 0.02). Notably, the reference-based deconvolution method tended to identify those CpG loci with higher methylation levels in healthy PWB samples (Wilcoxon rank-sum test, p-value = 6.0079 × 10⁻²¹, Figure 3(a)) and with lower variations of methylation levels in healthy leukocyte subtypes (Wilcoxon rank-sum test, p-value = 1.5 × 10⁻³, Figure 3(b)). CpG loci with lower methylation variations in leukocyte subtypes might not be easily affected by the proportion shifts in PWB, and they were small in number, which may explain the low efficiency of the deconvolution method in detecting disease-associated methylation alterations from PWB.
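The overlap test used above can be sketched as an upper-tail hypergeometric probability. The function name and argument names are hypothetical, and the background size used for the reported p-value is not stated in the text, so it is left as a parameter rather than filled in.

```python
from math import comb


def hypergeom_upper_tail(population, marked, drawn, overlap):
    """Return P(X >= overlap) for X ~ Hypergeometric(population, marked, drawn).

    population: number of background loci (not stated in the text)
    marked:     size of the first DML list
    drawn:      size of the second DML list
    overlap:    number of loci shared by the two lists
    """
    upper = min(marked, drawn)
    total = sum(comb(marked, x) * comb(population - marked, drawn - x)
                for x in range(overlap, upper + 1))
    return total / comb(population, drawn)
```

For the comparison above, the call would take the form `hypergeom_upper_tail(N, 439, 153, 11)`, where `N` is the number of loci eligible to be called DML by both methods; the resulting p-value depends strongly on that choice.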
Figure 3.

Comparison between Ref-RMO and the deconvolution method. (a) The mean DNAm levels of the detected DML in normal PWB samples examined in GSE30229. (b) The variance of the DNAm levels of the detected DML in purified leukocyte subtypes examined in GSE39981.
Comparing the DNAm alterations between ovarian cancer and rheumatoid arthritis
Chronic inflammation is associated with the development of malignancies and shows changes in cell proportions similar to those of cancer [28]. Due to a lack of public data, only rheumatoid arthritis (RA) was analysed in this study. The Ref-RMO method detected 893 DML from the 349 RA PWB samples in GSE42861. They shared 74 with the OVC DML, which was unlikely to be observed by chance (hypergeometric distribution test, p-value<2.2 × 10⁻¹⁶). Of the 74 overlapping DML, 64 had the same DNAm alteration directions (both hypermethylation or both hypomethylation) (Table 2). For example, the CpG locus cg20070090, which was detected as a hypomethylated DML in RA (p-value = 1.02 × 10⁻²) and in OVC (p-value = 5.49 × 10⁻⁸), mapped to the gene S100A8 (calprotectin). As a neutrophil activation marker, calprotectin has been reported to show a high level in RA patients compared to healthy subjects [29], and its median concentration has also been reported to be elevated in OVC plasma [30]. Among the 14 DML having opposite DNAm alteration directions, 13 DML were hypermethylated in OVC and hypomethylated in RA compared to healthy samples. When the genes mapped by the 14 DML were analysed together with their interaction partners in STRING, several pathways were found to be enriched by DAVID, such as the chemokine signalling pathway (p-value = 5.90 × 10⁻⁴⁹), proteasome (p-value = 8.40 × 10⁻³⁶), cytokine-cytokine receptor interaction (p-value = 2.50 × 10⁻²⁵), and neuroactive ligand-receptor interaction (p-value = 1.20 × 10⁻²³) (all hypergeometric distribution tests). These pathways have been reported to play important roles in both inflammatory diseases and cancer [31–34]. However, their roles might differ between inflammatory disease and cancer. For example, the CpG locus cg10964421 is located in the promoter region of the gene TNFRSF10D, which is a key regulator of the inflammatory response. TNFRSF10D has been observed to be over-expressed in inflammatory diseases such as Helicobacter pylori infection [35], while being hypermethylated in many types of cancer including OVC [36,37].
Table 2.
DML with opposite DNAm alteration directions between OVC and rheumatoid arthritis
| CpG Loci ID | Gene Symbol | OVC: Methylated* | OVC: p-value | Rheumatoid arthritis: Methylated* | Rheumatoid arthritis: p-value |
|---|---|---|---|---|---|
| cg02523617 | HIP2 | 1 | 2.20 × 10⁻¹⁶ | −1 | 2.20 × 10⁻¹⁶ |
| cg04826883 | CA12 | 1 | 5.08 × 10⁻¹⁰ | −1 | 2.15 × 10⁻³ |
| cg05889321 | ALDH18A1 | 1 | 5.57 × 10⁻³ | −1 | 2.20 × 10⁻¹⁶ |
| cg07133445 | SOLH | 1 | 2.20 × 10⁻¹⁶ | −1 | 2.66 × 10⁻³ |
| cg07310661 | CUL3 | 1 | 2.20 × 10⁻¹⁶ | −1 | 2.2 × 10⁻¹⁶ |
| cg07799947 | NPAS2 | 1 | 1.24 × 10⁻⁴ | −1 | 1.34 × 10⁻⁸ |
| cg08028004 | RANBP2 | 1 | 1.32 × 10⁻⁴ | −1 | 8.15 × 10⁻⁵ |
| cg08785155 | ATP1B1 | −1 | 1.24 × 10⁻⁴ | 1 | 2.0112 × 10⁻² |
| cg10964421 | TNFRSF10D | 1 | 2.20 × 10⁻¹⁶ | −1 | 4.44 × 10⁻¹¹ |
| cg11970458 | PYCARD | 1 | 2.20 × 10⁻¹⁶ | −1 | 1.69 × 10⁻⁵ |
| cg13144783 | CCR1 | 1 | 2.20 × 10⁻¹⁶ | −1 | 2.2 × 10⁻¹⁶ |
| cg17705056 | PQLC1 | 1 | 2.20 × 10⁻¹⁶ | −1 | 1.33 × 10⁻² |
| cg26229607 | ARHGAP29 | 1 | 1.12 × 10⁻⁵ | −1 | 1.03 × 10⁻² |
| cg26704579 | CYP1A1 | 1 | 2.20 × 10⁻¹⁶ | −1 | 5.74 × 10⁻⁶ |
*1 represents hypermethylated; −1 represents hypomethylated.
Discussion
Improved technology has driven the identification of promising blood-borne biopsy biomarkers. However, owing to cell-composition effects in blood, discovering leukocyte cellular DNAm alterations that are related to cancer remains challenging. In this study, we developed an RMO-based method to detect OVC-associated DNAm alterations in PWB. We first showed that stable CpG pairs widely existed in healthy human leukocytes. This was an inherent feature of DNAm in healthy human leukocytes, and it was also the basis for the detection of aberrant DNAm alterations in the Ref-RMO method. Theoretically, CpG pairs with stable and consistent RMOs in purified leukocyte subtypes could be stable in DNAm profiles of PWB regardless of leukocyte cell proportion changes when no DNAm alterations occur in leukocyte subtypes: because the methylation level measured in PWB is approximately a proportion-weighted average of the levels in the constituent subtypes, an ordering that holds in every subtype is preserved under any mixing proportions. We observed that almost all CpG pairs which had stable and consistent RMOs in purified leukocyte subtypes were also stable in healthy DNAm profiles of PWB in independent datasets in this study. Therefore, those CpG pairs with significantly reversed RMOs in cancer PWB samples should include the CpG loci with aberrant DNAm alterations in certain leukocyte subtypes.
In this study, we revealed that reversed CpG pairs were significantly associated with clinical characteristics by conducting random experiments, taking the CA125 level as an example. We note that the number of reversed CpG pairs showing a significant association with OVC stage was not much bigger than that observed in random experiments (389 vs 298.4, Figure 2(a)). Therefore, we additionally analysed the correlation between stage and the number of reversed CpG pairs observed in each stage of OVC samples. The result showed that there was no significant correlation between them (r = 0.1109, p = 0.2127). For stage, another interesting analysis may be to apply Ref-RMO to OVC samples of different stages separately. By comparing the OVC samples of different stages with the normal controls, no significantly reversed CpG pairs were detected in stages 2 and 4 by Ref-RMO. This may be due to the limited number of individuals (12 and 11 in stages 2 and 4, respectively). When the sample size increased, more reversed CpG pairs could be detected: two for stage 1 (n = 48) and 483 for stage 3 (n = 57). Similar results were observed for OVC samples with different grades (data not shown). These results suggested that Ref-RMO might capture DNAm alterations that occur in early OVC.
On the other hand, we also showed that reversed CpG pairs had the potential to distinguish disease samples from normal controls. In addition to OVC, RA data were also analysed, as depicted in supplementary Figure S1. By ordering reference CpG pairs according to their differences in the probability of βi>βj between normal and RA samples, which is a more general criterion, the average AUC calculated by five-fold cross-validation ranged from 0.666 ± 0.045 (k = 3) to 0.803 ± 0.072 (k = 200). Considering that reference CpG pairs were required to keep their RMOs in more than 90% of the healthy samples, a simpler way was to order them according to their reversal rates in RA samples, which produced similar results, with the average AUC ranging from 0.689 ± 0.071 (k = 3) to 0.826 ± 0.062 (k = 200). Nevertheless, an effective ordering of reference CpG pairs should consider more details of the data under study.
Although Ref-RMO is more effective and simpler than traditional SVA and reference-based methods for detecting disease-related DNAm changes, some limitations require caution. First, because leukocyte subtypes are unevenly distributed in PWB, Ref-RMO has difficulty detecting DNAm changes that arise in subtypes present at low proportions. Second, shifts in leukocyte subtype proportions under the disease state could influence the detection of DNAm alterations, as the proportion changes could strengthen or weaken the differential signals that come from leukocyte subtypes. Last, DNAm profiles of more leukocyte subtypes are needed as references to increase the credibility of the detected disease-associated alterations.
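The first limitation can be illustrated with a toy calculation. The sketch below assumes that the PWB methylation level of a CpG is the proportion-weighted average of its levels in the leukocyte subtypes; the linear mixture, the subtype names, the proportions, and the beta values are all hypothetical. Under these assumptions, the same methylation loss flips the whole-blood ordering when it occurs in an abundant subtype but not when it occurs in a rare one.

```python
def mixture_beta(proportions, subtype_betas):
    """Whole-blood beta as the proportion-weighted average of subtype betas (assumed model)."""
    return sum(proportions[c] * subtype_betas[c] for c in proportions)

# Hypothetical leukocyte proportions in PWB (sum to 1).
proportions = {"granulocyte": 0.6, "lymphocyte": 0.3, "monocyte": 0.1}

# CpG j is unchanged; CpG i is normally more methylated than CpG j in every subtype.
beta_j = {"granulocyte": 0.5, "lymphocyte": 0.5, "monocyte": 0.5}
beta_i_rare_change = {"granulocyte": 0.8, "lymphocyte": 0.8, "monocyte": 0.1}
beta_i_abundant_change = {"granulocyte": 0.1, "lymphocyte": 0.8, "monocyte": 0.8}

pwb_j = mixture_beta(proportions, beta_j)
reversed_if_rare = mixture_beta(proportions, beta_i_rare_change) < pwb_j
reversed_if_abundant = mixture_beta(proportions, beta_i_abundant_change) < pwb_j
print(reversed_if_rare, reversed_if_abundant)
```

The same function can be used to explore the second limitation by changing the values in `proportions` while holding the subtype betas fixed.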
Inflammation is closely associated with cancer [38]. In this study, we compared the DML detected from OVC and RA. We showed that some DNAm alterations were shared by the two diseases (Table 2), while disease-specific DNAm alterations were also observed. The HIF-1 signalling pathway recorded in KEGG, which plays an important role in inflammatory disease and cancer [39], was significantly enriched with genes represented by both RA and OVC DML. As shown in Figure 4, RA differentially methylated genes could enter HIF-1 signalling through the NF-κB signalling subpathway, while OVC differentially methylated genes may enter it through the mTOR/PI3K-Akt signalling subpathway [40,41]. However, this needs further validation.
Figure 4. KEGG map of the HIF-1 signalling pathway. Items coloured red and yellow indicate, respectively, differentially methylated genes detected for OVC and RA as compared to normal; blue indicates differentially methylated genes detected for both diseases. The figure was coloured based on KEGG pathway hsa04066 (http://www.genome.jp/kegg/pathway/hsa/hsa04066.html).
Funding Statement
This work was supported in part by the National Natural Science Foundation of China (Grant Nos. 61961002, 81903186, 81501829, and 81501215), the Open Project of Key Laboratory of Prevention and Treatment of Cardiovascular and Cerebrovascular Diseases, Ministry of Education (Grant No. XN201914), and the Doctoral Fund of Gannan Medical University (Grant Nos. QD201827 and QD201828).
Author Contributions
GNH and HDL conceived the idea and conceptualized the study. HDL conducted the bioinformatics analysis and interpreted the results. FLJ, NL, ZHC, HC and YG collected and pre-processed the data. YHD generated the figures and tables. HDL and GNH wrote the paper and supervised the whole study process. HDL, YG and GNH revised the manuscript. All authors have read and approved the final version of the manuscript.
Disclosure Statement
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Data Availability Statement
The datasets for this study can be found in the Gene Expression Omnibus repository [https://www.ncbi.nlm.nih.gov/geo/].
Supplementary material
Supplemental data for this article can be accessed here.
References
- [1]. American Cancer Society. Cancer Facts & Figures 2020.
- [2]. Cheung AH, Chow C, To KF. Latest development of liquid biopsy. J Thorac Dis. 2018;10(Suppl 14):S1645–S1651.
- [3]. Hausler SF, Keller A, Chandran PA, et al. Whole blood-derived miRNA profiles as potential new tools for ovarian cancer screening. Br J Cancer. 2010;103(5):693–700.
- [4]. Wang L, Ni S, Du Z, et al. A six-CpG-based methylation markers for the diagnosis of ovarian cancer in blood. J Cell Biochem. 2020;121(2):1409–1419.
- [5]. Accomando WP, Wiencke JK, Houseman EA, et al. Decreased NK cells in patients with head and neck cancer determined in archival DNA. Clin Cancer Res. 2012;18(22):6147–6154.
- [6]. Houseman EA, Accomando WP, Koestler DC, et al. DNA methylation arrays as surrogate measures of cell mixture distribution. BMC Bioinformatics. 2012;13:86.
- [7]. Li H, Zheng T, Chen B, et al. Similar blood-borne DNA methylation alterations in cancer and inflammatory diseases determined by subpopulation shifts in peripheral leukocytes. Br J Cancer. 2014;111(3):525–531.
- [8]. Chakraborty S, Datta S, Datta S. Surrogate variable analysis using partial least squares (SVA-PLS) in gene expression studies. Bioinformatics. 2012;28(6):799–806.
- [9]. Langevin SM, Houseman EA, Accomando WP, et al. Leukocyte-adjusted epigenome-wide association studies of blood from solid tumor patients. Epigenetics. 2014;9(6):884–895.
- [10]. Leek JT, Scharpf RB, Bravo HC, et al. Tackling the widespread and critical impact of batch effects in high-throughput data. Nat Rev Genet. 2010;11(10):733–739.
- [11]. Shi P, Ray S, Zhu Q, et al. Top scoring pairs for feature selection in machine learning and applications to cancer outcome prediction. BMC Bioinformatics. 2011;12:375.
- [12]. Hong G, Li H, Li M, et al. A simple way to detect disease-associated cellular molecular alterations from mixed-cell blood samples. Brief Bioinform. 2018;19(4):613–621.
- [13]. Li H, Hong G, Lin M, et al. Identification of molecular alterations in leukocytes from gene expression profiles of peripheral whole blood of Alzheimer’s disease. Sci Rep. 2017;7(1):14027.
- [14]. Shiao SPK, Xiao H, Dong L, et al. Genome wide DNA differential methylation regions in colorectal cancer patients in relation to blood related family members, obese and non-obese controls - a preliminary report. Oncotarget. 2018;9(39):25557–25571.
- [15]. Reinius LE, Acevedo N, Joerink M, et al. Differential DNA methylation in purified human blood cells: implications for cell lineage and studies on disease susceptibility. PLoS One. 2012;7(7):e41361.
- [16]. Teschendorff AE, Menon U, Gentry-Maharaj A, et al. Age-dependent DNA methylation of genes that are suppressed in stem cells is a hallmark of cancer. Genome Res. 2010;20(4):440–446.
- [17]. Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J Roy Stat Soc B. 1995;57:289–300.
- [18]. Liu Y, Aryee MJ, Padyukov L, et al. Epigenome-wide association data implicate DNA methylation as an intermediary of genetic risk in rheumatoid arthritis. Nat Biotechnol. 2013;31(2):142–147.
- [19]. Koestler DC, Christensen B, Karagas MR, et al. Blood-based profiles of DNA methylation predict the underlying distribution of cell types: a validation analysis. Epigenetics. 2013;8(8):816–826.
- [20]. Huang da W, Sherman BT, Lempicki RA. Bioinformatics enrichment tools: paths toward the comprehensive functional analysis of large gene lists. Nucleic Acids Res. 2009;37(1):1–13.
- [21]. Szklarczyk D, Morris JH, Cook H, et al. The STRING database in 2017: quality-controlled protein-protein association networks, made broadly accessible. Nucleic Acids Res. 2017;45(D1):D362–D368.
- [22]. Forbes SA, Beare D, Boutselakis H, et al. COSMIC: somatic cancer genetics at high-resolution. Nucleic Acids Res. 2017;45(D1):D777–D783.
- [23]. Teschendorff AE, Menon U, Gentry-Maharaj A, et al. An epigenetic signature in peripheral blood predicts active ovarian cancer. PLoS One. 2009;4(12):e8274.
- [24]. Ju SW, Ju SG, Wang FM, et al. A functional anti-human 4-1BB ligand monoclonal antibody that enhances proliferation of monocytes by reverse signaling of 4-1BBL. Hybrid Hybridomics. 2003;22(5):333–338.
- [25]. Schwarz H, Tuckwell J, Lotz M. A receptor induced by lymphocyte activation (ILA): a new member of the human nerve-growth-factor/tumor-necrosis-factor receptor family. Gene. 1993;134(2):295–298.
- [26]. Gatzka M, Piekorz R, Moriggl R, et al. A role for STAT5A/B in protection of peripheral T-lymphocytes from postactivation apoptosis: insights from gene expression profiling. Cytokine. 2006;34(3–4):143–154.
- [27]. Theocharis AD, Skandalis SS, Neill T, et al. Insights into the key roles of proteoglycans in breast cancer biology and translational medicine. Biochim Biophys Acta. 2015;1855(2):276–300.
- [28]. Shacter E, Weitzman SA. Chronic inflammation and cancer. Oncology (Williston Park). 2002;16(2):217–226, 229; discussion 230–232.
- [29]. Bach M, Moon J, Moore R, et al. A neutrophil activation biomarker panel in prognosis and monitoring of patients with rheumatoid arthritis. Arthritis Rheumatol. 2020;72(1):47–56.
- [30]. Odegaard E, Davidson B, Elgaaen BV, et al. Circulating calprotectin in ovarian carcinomas and borderline tumors of the ovary. Am J Obstet Gynecol. 2008;198(4):418.e1–7.
- [31]. Mollica Poeta V, Massara M, Capucetti A, et al. Chemokines and chemokine receptors: new targets for cancer immunotherapy. Front Immunol. 2019;10:379.
- [32]. Pajonk F, McBride WH. The proteasome in cancer biology and treatment. Radiat Res. 2001;156(5 Pt 1):447–459.
- [33]. Qureshi N, Vogel SN, Van Way C 3rd, et al. The proteasome: a central regulator of inflammation and macrophage function. Immunol Res. 2005;31(3):243–260.
- [34]. Zhang L, Yu M, Deng J, et al. Chemokine signaling pathway involved in CCL2 expression in patients with rheumatoid arthritis. Yonsei Med J. 2015;56(4):1134–1142.
- [35]. Neu B, Rad R, Reindl W, et al. Expression of tumor necrosis factor-alpha-related apoptosis-inducing ligand and its proapoptotic receptors is down-regulated during gastric infection with virulent cagA+/vacAs1+ Helicobacter pylori strains. J Infect Dis. 2005;191(4):571–578.
- [36]. Shivapurkar N, Toyooka S, Toyooka KO, et al. Aberrant methylation of trail decoy receptor genes is frequent in multiple tumor types. Int J Cancer. 2004;109(5):786–792.
- [37]. Venza M, Visalli M, Catalano T, et al. Impact of DNA methyltransferases on the epigenetic regulation of tumor necrosis factor-related apoptosis-inducing ligand (TRAIL) receptor expression in malignant melanoma. Biochem Biophys Res Commun. 2013;441(4):743–750.
- [38]. Murata M. Inflammation and cancer. Environ Health Prev Med. 2018;23(1):50.
- [39]. Jung YJ, Isaacs JS, Lee S, et al. IL-1beta-mediated up-regulation of HIF-1alpha via an NFkappaB/COX-2 pathway identifies HIF-1 as a critical link between inflammation and oncogenesis. FASEB J. 2003;17(14):2115–2117.
- [40]. Horak P, Crawford AR, Vadysirisack DD, et al. Negative feedback control of HIF-1 through REDD1-regulated ROS suppresses tumorigenesis. Proc Natl Acad Sci U S A. 2010;107(10):4675–4680.
- [41]. Trebec-Reynolds DP, Voronov I, Heersche JN, et al. VEGF-A expression in osteoclasts is regulated by NF-kappaB induction of HIF-1alpha. J Cell Biochem. 2010;110(2):343–351.
