JNCI Journal of the National Cancer Institute
Editorial
2013 Apr 10;105(10):678–680. doi: 10.1093/jnci/djt090

Searching for Blood DNA Methylation Markers of Breast Cancer Risk and Early Detection

Montserrat García-Closas, Mitchell H Gail, Karl T Kelsey, Regina G Ziegler
PMCID: PMC3653825  PMID: 23578855

In this issue of the Journal, Xu et al. (1) used prospectively collected blood samples to determine whether epigenome-wide methylation profiles in blood DNA differ between women who subsequently develop breast cancer and those who do not. Potentially, such a profile could predict the risk of developing breast cancer or detect this disease days to several years before it appears clinically. Existing breast cancer risk prediction models have limited discriminatory accuracy for this common disease (2), and improved methods for early detection are also needed (3,4). In this context, the findings of Xu et al. are promising and exemplify the potential value of epigenome-wide association studies (5,6). However, several important considerations temper the initial enthusiasm and highlight challenges for future studies.

In the Sister Study, a cohort of women with a biologic sister with breast cancer, DNA methylation at 27578 CpG sites was compared in blood samples that were stored at study entry from 298 women who developed breast cancer during follow-up and from a random sample of 612 women who remained cancer-free. This study is possibly the first cancer study to use prospectively collected blood and an epigenome-wide assay platform with single nucleotide resolution. Several epigenome-wide association studies have reported methylation profile differences in blood DNA from cancer case subjects and control subjects but relied on blood samples collected after cancer diagnosis and treatment (7). This retrospective design can induce bias if methylation levels are affected by disease progression or treatment.

As the authors discuss, a major limitation of the Xu et al. report is the lack of replication in independent study populations. In a small replication sample consisting of the 25 case subjects and 56 non-case subjects who had been excluded from the original analysis on the basis of ethnicity, the authors found promising indications of discriminatory accuracy for the five most statistically significant CpGs. However, confirming the promise of epigenome-wide association studies will require independent studies of methylation markers that use prospective epidemiological designs and larger sample sizes.

The mean time from baseline blood draw to diagnosis among the breast cancer case subjects was only 1.3 years. Thus, the data cannot tell us whether epigenetic changes can predict risk years into the future or are, instead, a response to incipient disease. For 72% of the 250 differentially methylated CpG sites, mean methylation values for the case subjects whose blood was collected more than 1 year before diagnosis were intermediate between those for the non-case subjects and those for the case subjects whose blood was collected within a year before diagnosis. Across all CpGs on the array, by contrast, methylation values for the case subjects with blood collected more than 1 year before diagnosis were intermediate at a statistically significantly smaller proportion of CpGs (34%). These findings suggest that the epigenetic changes are early markers of disease. However, cohort studies with longer follow-up and serial blood collections are needed to estimate lead times, clarify the biology, and apply appropriate methods for evaluating predictive value (8).
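
The "intermediate" comparison above can be made concrete. The Python sketch below counts the share of CpG sites at which the mean for case subjects bled more than 1 year before diagnosis ("early") falls between the non-case mean and the mean for case subjects bled within a year of diagnosis ("late"). It is an illustration under our own assumptions, not the authors' analysis: the function names and the per-site means are hypothetical. It also shows by simulation that if the three group means had no systematic ordering, about one-third of sites would be "intermediate" by chance alone, which is one possible yardstick for reading the 34% and 72% figures.

```python
import random


def is_intermediate(value, bound_a, bound_b):
    """True if value lies between bound_a and bound_b (inclusive), in either order."""
    return min(bound_a, bound_b) <= value <= max(bound_a, bound_b)


def fraction_intermediate(noncase_means, early_case_means, late_case_means):
    """Fraction of CpG sites at which the "early" case mean lies between the
    non-case mean and the "late" case mean (hypothetical helper)."""
    n = len(noncase_means)
    if n == 0 or not (len(early_case_means) == len(late_case_means) == n):
        raise ValueError("need equal-length, non-empty lists of per-CpG means")
    hits = sum(
        is_intermediate(early, noncase, late)
        for noncase, early, late in zip(noncase_means, early_case_means, late_case_means)
    )
    return hits / n


# Hypothetical per-CpG mean beta values for four sites (not Sister Study data).
noncase = [0.50, 0.30, 0.80, 0.10]
early = [0.505, 0.29, 0.805, 0.12]
late = [0.51, 0.31, 0.81, 0.11]
print(fraction_intermediate(noncase, early, late))  # 0.5 (sites 1 and 3)

# With no true ordering, each of the three group means is equally likely to be
# the middle one, so roughly one-third of sites are "intermediate" by chance.
rng = random.Random(2013)
sims = [[rng.random() for _ in range(3)] for _ in range(30000)]
print(fraction_intermediate(*zip(*sims)))  # roughly 0.33
```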

By design, all women in the Sister Study cohort had a family history of breast cancer. Therefore, the results from this study may not generalize to populations with lower levels of genetic risk. In particular, as the authors discuss, the risk discrimination estimates in this study might not be comparable with estimates in the general population, and risk models that incorporate family history, such as the Gail model, would be expected to have diminished discriminatory accuracy in this cohort, in which every woman has at least one affected first-degree relative.

Xu et al. assessed whether there was differential methylation at each of 27578 CpG sites across the epigenome using case-cohort proportional hazards regression, adjusted only for age and laboratory parameters. After correction for multiple testing using a false discovery rate threshold of 0.05, Xu et al. identified 250 CpGs associated with breast cancer. The distribution of P values (Supplementary Figure 1A, available online) shows an excess of small P values relative to the uniform distribution expected if no CpG were associated with breast cancer, indicating that many CpGs were differentially methylated. This broad signal could be a pervasive response to early, not-yet-diagnosed disease. It is also possible that the signal reflects bias or the influence on methylation of known or as-yet-undiscovered breast cancer risk factors, which were not controlled for in the proportional hazards analyses.
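
The editorial does not say which false discovery rate procedure Xu et al. applied. As an assumption, the Python sketch below uses the Benjamini-Hochberg step-up procedure, the most widely used way to control the false discovery rate, to show how a threshold of 0.05 turns a list of per-CpG P values into a set of "significant" sites. The P values are hypothetical.

```python
def benjamini_hochberg(p_values, fdr=0.05):
    """Indices of hypotheses rejected by the Benjamini-Hochberg step-up procedure.

    Sort the m P values; find the largest rank k with p_(k) <= (k / m) * fdr;
    reject the k hypotheses with the smallest P values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, smallest P first
    k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            k = rank  # keep the largest qualifying rank (step-up)
    return sorted(order[:k])


# Ten hypothetical P values, as if from ten CpG-level tests.
p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205, 0.212, 0.216]
print(benjamini_hochberg(p))  # [0, 1]
```

With 27578 tests the same rule applies, but the per-rank threshold (k / m) * 0.05 starts very small, so only sites with extreme P values pass. Because the procedure is step-up, a P value that misses its own threshold is still rejected if a larger-ranked P value meets its threshold.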

To evaluate how well the markers distinguished between case subjects and non-case subjects, Xu et al. used repeated internal cross-validation (9) to obtain unbiased estimates of the area under the receiver operating characteristic curve (AUC), which measures discriminatory accuracy. In each repetition, they used two-thirds of the data as training data, both to reselect promising CpG sites and to fit classifiers, and then used the remaining third to estimate AUC (Supplementary Methods, available online). This entire process was repeated 500 times to obtain stable estimates. Because CpGs were reselected and new classifiers developed in each repetition using only the training data, the held-out data played no part in model building, and AUC was not overestimated (10).
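
The key point of this design is that CpG selection happens inside each split. The Python sketch below illustrates that idea under stated assumptions; it is not the classifier Xu et al. used, which the editorial does not describe. Assumptions: features are ranked by a Welch-type t statistic, the classifier is a simple compound-covariate score (a sum of the selected features weighted by their training t statistics), splits are stratified by case status, and `top_k` is an arbitrary illustrative choice. The two-thirds training fraction and the default of 500 repetitions come from the text. AUC is computed as the probability that a randomly chosen case subject scores higher than a randomly chosen non-case subject.

```python
import random
import statistics


def auc(scores, labels):
    """Empirical AUC: chance a random case outscores a random non-case (ties count 1/2)."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    noncases = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if c > n else 0.5 if c == n else 0.0 for c in cases for n in noncases)
    return wins / (len(cases) * len(noncases))


def t_statistic(values, labels):
    """Welch-type t statistic for cases (label 1) versus non-cases (label 0)."""
    a = [v for v, y in zip(values, labels) if y == 1]
    b = [v for v, y in zip(values, labels) if y == 0]
    se = (statistics.variance(a) / len(a) + statistics.variance(b) / len(b)) ** 0.5
    return (statistics.mean(a) - statistics.mean(b)) / se if se > 0 else 0.0


def stratified_split(labels, train_fraction, rng):
    """Randomly assign train_fraction of each class to training, the rest to testing."""
    train, test = [], []
    for cls in (0, 1):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        cut = round(len(idx) * train_fraction)
        train += idx[:cut]
        test += idx[cut:]
    return train, test


def repeated_split_auc(X, y, top_k=5, n_repeats=500, train_fraction=2 / 3, seed=0):
    """Mean held-out AUC over repeated splits, reselecting features in every split."""
    rng = random.Random(seed)
    n_features = len(X[0])
    aucs = []
    for _ in range(n_repeats):
        train, test = stratified_split(y, train_fraction, rng)
        y_train = [y[i] for i in train]
        # 1. Rank every feature using the training subjects only.
        t = [t_statistic([X[i][j] for i in train], y_train) for j in range(n_features)]
        chosen = sorted(range(n_features), key=lambda j: abs(t[j]), reverse=True)[:top_k]
        # 2. Classifier: compound-covariate score, weights = training t statistics.
        # 3. Score only the held-out subjects and compute their AUC.
        scores = [sum(t[j] * X[i][j] for j in chosen) for i in test]
        aucs.append(auc(scores, [y[i] for i in test]))
    return statistics.mean(aucs)


# Toy data (hypothetical): 30 cases, 30 non-cases, 50 pure-noise "CpGs".
rng = random.Random(1)
y = [1] * 30 + [0] * 30
X = [[rng.gauss(0, 1) for _ in range(50)] for _ in y]
print(repeated_split_auc(X, y, n_repeats=100))  # near 0.5: there is no real signal
```

If the five CpGs were instead chosen once on all 60 subjects and only the score refit in each split, the same noise data would typically yield an AUC well above 0.5, which is the optimism that reference (10) warns against.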

In this study, the AUC estimated for methylation markers (65.8%) was larger than that for the Gail model (56.0%) or for nine highly ranked single nucleotide polymorphisms from genome-wide association studies of breast cancer (58.8%). When Gail model predictors or single nucleotide polymorphisms were added to the methylation markers, the AUC improved only marginally (<1.0%). However, it would have been of interest to examine the improvement in AUC when methylation markers were added to the Gail and single nucleotide polymorphism models. Adding predictors to a model generally yields diminishing improvements in AUC. Furthermore, if the methylation markers are detecting early disease and the Gail and genetic models are identifying risk factors, then the AUCs of these models may not be directly comparable.
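
Why the direction of addition matters can be seen in an idealized model. The Python sketch below assumes, purely for illustration, that each score is normally distributed with equal variance in case subjects and non-case subjects (the binormal model), that the two scores are statistically independent, and that they are combined by the optimal linear rule. Under those assumptions, AUC = Φ(d/√2), where d is the case/non-case mean difference in standard deviation units, and the separations of the two scores add in quadrature. Plugging in the two reported AUCs is not a reanalysis of the Sister Study data; it only shows that the same combined model looks like a small gain over the stronger marker and a large gain over the weaker one.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def auc_from_separation(d):
    """Binormal, equal-variance AUC when case and non-case means differ by d SDs."""
    return STD_NORMAL.cdf(d / sqrt(2))


def separation_from_auc(auc):
    """Inverse of auc_from_separation (requires 0 < auc < 1)."""
    return sqrt(2) * STD_NORMAL.inv_cdf(auc)


def combined_auc(auc_a, auc_b):
    """AUC of the optimal linear combination of two independent binormal scores."""
    d_a, d_b = separation_from_auc(auc_a), separation_from_auc(auc_b)
    return auc_from_separation(sqrt(d_a ** 2 + d_b ** 2))


methylation_auc, gail_auc = 0.658, 0.560  # reported AUCs, used only as inputs
both = combined_auc(methylation_auc, gail_auc)
print(f"adding Gail to methylation: +{both - methylation_auc:.3f}")
print(f"adding methylation to Gail: +{both - gail_auc:.3f}")
```

Because the separations combine as sqrt(d_a² + d_b²), a weak score adds little to a strong one, which is one way to see the diminishing improvement described above. Correlated scores, unequal variances, or markers that detect prevalent disease rather than risk would all break these assumptions.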

Ideally, all three models should be tested and compared for both short-term and long-term discriminatory accuracy in independent study populations. Internal validation, as was conducted for the methylation markers, is never as rigorous a test as external validation (11). However, Xu et al. did not present an explicit risk model or classifier based on the promising CpGs, and such a model is required for external validation in independent studies.

The 27K methylation array used in this study targets promoter regions of translated genes and has limited coverage of the methylome. Much denser arrays have been developed that interrogate methylation sites across the promoter and in the gene body, in noncoding RNA and intergenic regions, and outside of CpG islands (12). Thus future studies can be more comprehensive. Improved technologies, including bisulfite sequencing, will allow better characterization of the true nature and behavior of DNA methylation patterns identified as being predictive of disease. These patterns, precisely described and perhaps integrated with companion histone marks (13), will almost certainly advance our understanding of the biology.

In population studies, the extent of methylation at a specific CpG site in blood DNA does not vary substantially, partly because methylation status is averaged over a number of cells, a variety of leukocyte cell types, and many individuals. Indeed, for the 250 promising CpGs identified in this study, the mean β values used to quantify methylation generally differed between case subjects and non-case subjects by only about 1%. To find reliable differences between case subjects and non-case subjects, laboratory measurement variation must not overwhelm between-subject variation. Intraclass correlation coefficients estimate the proportion of total variation that is attributable to between-subject variation (14). However, the intraclass correlation coefficients presented by Xu et al. for methylation markers are likely to be inflated, because they were based on quality control samples created in the laboratory to range in methylation levels from 0 to 100% at each CpG site; this artificially wide range greatly increases the between-sample variation. Intraclass correlation coefficients derived from the population-based between-subject variance observed for each CpG site in Sister Study subjects would have been more relevant.
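
The inflation argument can be checked with a small calculation. The Python sketch below computes a one-way random-effects intraclass correlation, ICC(1) = (MSB − MSW) / (MSB + (k − 1) MSW), from replicate measurements; this is one common form of the coefficient and is assumed here, since reference (14) may partition variance differently. Both hypothetical data sets have identical laboratory error (duplicates differing by 0.02), but in the first the samples span β values from 0 to 1, like laboratory quality control samples, while in the second they cluster near 0.5, as β values often do across subjects at a single CpG.

```python
from statistics import mean


def icc_oneway(replicates):
    """ICC(1) from a one-way random-effects ANOVA.

    `replicates` holds one list of repeated measurements per sample, all of the
    same length k >= 2.
    """
    n = len(replicates)
    k = len(replicates[0])
    if n < 2 or k < 2 or any(len(r) != k for r in replicates):
        raise ValueError("need >= 2 samples, each with the same number (>= 2) of replicates")
    grand = mean(v for r in replicates for v in r)
    sample_means = [mean(r) for r in replicates]
    # Between-sample and within-sample (measurement error) mean squares.
    ms_between = k * sum((m - grand) ** 2 for m in sample_means) / (n - 1)
    ms_within = sum((v - m) ** 2 for r, m in zip(replicates, sample_means) for v in r) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Hypothetical duplicate beta values; measurement error is the same in both sets.
qc_like = [[0.00, 0.02], [0.25, 0.27], [0.50, 0.52], [0.75, 0.77], [1.00, 0.98]]
population_like = [[0.50, 0.52], [0.51, 0.49], [0.52, 0.50], [0.49, 0.51], [0.50, 0.48]]
print(icc_oneway(qc_like))          # close to 1
print(icc_oneway(population_like))  # far lower; the estimate can even be negative
```

The same assay therefore looks highly reproducible or nearly uninformative depending on the spread of the samples used, which is why the between-subject spread in the study population is the relevant denominator.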

The tissue specificity of DNA methylation is well known. Recently, in case-control studies of ovarian, bladder, and head and neck cancers that assessed epigenome-wide methylation, shifts in leukocyte subpopulations, indicated by shifts in methylation markers (15), explained most of the methylation differences between case subjects and control subjects in retrospectively collected blood DNA (7). However, in the Sister Study, methylation markers that best differentiate the common leukocyte subpopulations did not differ between the case subjects and non-case subjects. This finding is reassuring and important because it suggests that prospectively collected blood DNA might yield novel information about methylation predictors of subsequent breast cancer and potentially other solid tumors.
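
Why a shift in leukocyte composition alone can mimic differential methylation is easiest to see with two cell types. The Python sketch below is a simplified two-component analogue of the idea behind reference (15), not the method of Houseman et al., which handles several cell types with a constrained projection. It assumes that the observed β value at each CpG is a weighted average w·A + (1 − w)·B of two reference profiles, and it estimates the fraction w by least squares clipped to [0, 1]. The reference values and helper names are hypothetical.

```python
def estimate_fraction(observed, ref_a, ref_b):
    """Least-squares estimate, clipped to [0, 1], of the fraction w of cell type A,
    assuming observed ~ w * ref_a + (1 - w) * ref_b at each CpG."""
    num = sum((o - b) * (a - b) for o, a, b in zip(observed, ref_a, ref_b))
    den = sum((a - b) ** 2 for a, b in zip(ref_a, ref_b))
    if den == 0:
        raise ValueError("reference profiles must differ at some CpG")
    return min(1.0, max(0.0, num / den))


def mix(w, ref_a, ref_b):
    """Expected beta values for a sample containing fraction w of cell type A."""
    return [w * a + (1 - w) * b for a, b in zip(ref_a, ref_b)]


# Hypothetical reference beta values for two leukocyte types at four CpGs.
type_a = [0.9, 0.1, 0.8, 0.2]
type_b = [0.1, 0.9, 0.3, 0.6]
controls = mix(0.30, type_a, type_b)
cases = mix(0.35, type_a, type_b)  # only the cell-type proportion differs
print([round(c - n, 3) for c, n in zip(cases, controls)])  # [0.04, -0.04, 0.025, -0.02]
print(estimate_fraction(cases, type_a, type_b))  # about 0.35
```

A 5-point shift in cell-type proportion produces "differential methylation" at every CpG whose reference values differ between the two cell types, with no change inside any cell. Finding no case/non-case difference at such cell-type marker CpGs, as in the Sister Study, argues against this explanation.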

The observed associations are intriguing because the biological mechanism to explain them remains unclear. Because the predictive methylation profile occurs in blood DNA, the next step is to determine if this profile represents a leukocyte-specific response or a process that has occurred across most tissues. As the authors acknowledge, these alternatives cannot be distinguished with these data, but the fact that differential methylation at many of the informative CpG sites was more pronounced among the women with breast cancer diagnosed within a year after blood collection makes it tempting to speculate that the findings reflect an emerging antitumor immune response. In this construct, normally rare immune cells become activated by the tumor and proliferate in an attempt to control the malignant process. Their distinctive clonal DNA methylation profiles become detectable in blood DNA. Other interpretations are possible; only with additional epidemiologic studies, and ultimately functional studies, can we truly put these findings in perspective.

These are early, exciting times for studies of blood DNA methylation markers and risk of breast cancer and other solid tumors. The study by Xu et al. illustrates the complexities of such work but also provides encouraging results for replication and extension. Prospective studies using the next generation of genome-wide methylation platforms and complementary epigenetic assays, collaborative efforts across studies to reach adequate statistical power and solid replication, further development of statistical methods, and improved understanding of the biology of DNA methylation and epigenetic control should advance our knowledge in this promising area of research.

Funding

This work was supported by the Intramural Research Program of the Division of Cancer Epidemiology and Genetics, National Cancer Institute (MHG, RGZ) and Breakthrough Breast Cancer, United Kingdom (MGC).

The authors have no conflicts of interest to disclose. The funders did not have a role in the writing of the manuscript and the decision to submit the manuscript for publication. We wish to acknowledge useful discussions with Shira G. Ziegler, McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, Maryland.

References

1. Xu Z, Bolick SCE, DeRoo LA, Weinberg CR, Sandler DP, Taylor JA. Epigenome-wide association study of breast cancer using prospectively collected sister study samples. J Natl Cancer Inst. 2013;105(10):694–700.
2. Rockhill B, Spiegelman D, Byrne C, Hunter DJ, Colditz GA. Validation of the Gail et al. model of breast cancer risk prediction and implications for chemoprevention. J Natl Cancer Inst. 2001;93(5):358–366.
3. Nelson HD, Tyne K, Naik A, Bougatsos C, Chan BK, Humphrey L; U.S. Preventive Services Task Force. Screening for breast cancer: an update for the U.S. Preventive Services Task Force. Ann Intern Med. 2009;151(10):727–737, W237–W242.
4. Gøtzsche PC, Nielsen M. Screening for breast cancer with mammography. Cochrane Database Syst Rev. 2011 Jan 19;(1):CD001877.
5. Feinberg AP. Genome-scale approaches to the epigenetics of common human disease. Virchows Arch. 2010;456(1):13–21.
6. Rakyan VK, Down TA, Balding DJ, Beck S. Epigenome-wide association studies for common human diseases. Nat Rev Genet. 2011;12(8):529–541.
7. Koestler DC, Marsit CJ, Christensen BC, et al. Peripheral blood immune cell methylation profiles are associated with nonhematopoietic cancers. Cancer Epidemiol Biomarkers Prev. 2012;21(8):1293–1302.
8. Pepe MS, Etzioni R, Feng Z, et al. Phases of biomarker development for early detection of cancer. J Natl Cancer Inst. 2001;93(14):1054–1061.
9. Wessels LF, Reinders MJ, Hart AA, et al. A protocol for building and evaluating predictors of disease state based on microarray data. Bioinformatics. 2005;21(19):3755–3762.
10. Dupuy A, Simon RM. Critical review of published microarray studies for cancer outcome and guidelines on statistical analysis and reporting. J Natl Cancer Inst. 2007;99(2):147–157.
11. Gail MH, Pfeiffer RM. On criteria for evaluating models of absolute risk. Biostatistics. 2005;6(2):227–239.
12. Sandoval J, Heyn H, Moran S, et al. Validation of a DNA methylation microarray for 450,000 CpG sites in the human genome. Epigenetics. 2011;6(6):692–702.
13. Bernstein BE, Meissner A, Lander ES. The mammalian epigenome. Cell. 2007;128(4):669–681.
14. Gail MH, Fears TR, Hoover RN, et al. Reproducibility studies and interlaboratory concordance for assays of serum hormone levels: estrone, estradiol, estrone sulfate, and progesterone. Cancer Epidemiol Biomarkers Prev. 1996;5(10):835–844.
15. Houseman EA, Accomando WP, Koestler DC, et al. DNA methylation arrays as surrogate measures of cell mixture distribution. BMC Bioinformatics. 2012 May 8;13:86.
