Epigenetic changes underlie developmental and age-related biology. Promising epidemiologic research implicates epigenetics in disease risk and progression, and suggests epigenetic status depends on environmental risks as well as genetic predisposition. Epigenetics may represent a mechanistic link between environmental exposures, or genetics, and many common diseases, or may simply provide a quantitative biomarker of exposure or disease for areas of epidemiology currently lacking such measures. This great promise is balanced by issues related to study design, measurement tools, statistical methods, and biological interpretation that must be given careful consideration in an epidemiologic setting. This article describes the promises and challenges for epigenetic epidemiology, and suggests directions to advance this emerging area of molecular epidemiology.
Keywords: epigenetics, epidemiology, environment, disease, DNA methylation
Epidemiology is the study of the distribution of disease in populations and the causes, consequences, prevention, and treatment strategies for those diseases. Epidemiologic studies complement mechanistic toxicologic studies, where associations observed in populations can be tested under controlled conditions in the laboratory, and conversely toxicology findings can inform epidemiology study designs. Collaborative studies between epidemiologists and toxicologists are one of the strongest strategies to efficiently produce science relevant to the public’s health.
Epigenetics is formally defined as heritable changes in gene expression that occur without changes to the underlying DNA sequence. Types of epigenetic regulators include DNA methylation, histone modifications, microRNA, and prions. The majority of epigenetic epidemiology studies have focused on DNA methylation due to its relative stability with storage and the multitude of technical platforms available for analysis. Epidemiologists have observed changes in late-life health status associated with early life environmental exposures, an observation termed the Developmental Origins of Health and Disease (DOHaD) hypothesis [Dolinoy et al. 2007] or the Barker hypothesis [Barker 2004]. Some of these lasting effects of exposures long since flushed from the body may be due to persistent epigenetic modifications. Associations between prenatal conditions, epigenetic change, and adverse later life health outcomes have been observed following such extreme events as the Dutch Hunger Winter of 1944–1945 [Heijmans et al. 2008]. This research has inspired a new wave of epigenetic epidemiology studies.
Epigenetics has been hailed as a missing mechanistic link between environmental exposures or genetics and many common diseases [Cortessis et al. 2012]. The evidence for these associations has largely been provided by promising epidemiologic research [Foley et al. 2009]. These epigenetic epidemiology studies often struggle with common issues related to design, methods, and biology that impair their ability to make causal inferences regarding the role of epigenetics in disease. This article focuses on the promises and challenges for epigenetic epidemiology, and suggests future approaches to move forward this nascent branch of molecular epidemiology.
Epigenetic mechanisms shape development, aging, and disease
Epigenetic reprogramming is a key element of normal biological development and aging as well as many common diseases. Uncovering the locations and timing of epigenetic changes in development and disease holds the promise to identify time points sensitive to change and potential disease interventions.
During mammalian development, two major waves of epigenetic reprogramming take place [Reik et al. 2001]. The first occurs during gametogenesis and the second in the preimplantation embryo. After fertilization, the paternal genome is actively demethylated and the maternal genome is passively demethylated with DNA replication. Remethylation occurs several days later during implantation, and is subsequently maintained through cellular replication.
Similar to its role in normal development, epigenetics may be a feature of the aging process. Twin studies show that epigenetic differences within twin pairs grow across the lifespan, potentially as a result of accumulated environmental exposures and disease, or as part of aging biology [Fraga et al. 2005; Javierre et al. 2010; Poulsen et al. 2007; Ribel-Madsen et al. 2012]. Further, among epigenomic sites that vary between individuals in a population, particular regions show changes in methylation within the same person over time [Bjornsson et al. 2008; Feinberg et al. 2010]. Disease-related epigenetic mechanisms may be behind middle to late life functional declines [Feinberg et al. 2010; Heyn et al. 2013]. Indeed, epigenetic modifications play a key role in many forms of cancer [Feinberg and Tycko 2004], as well as Alzheimer’s disease [Bakulski et al. 2012] and other neurodegenerative diseases [Coppede 2013].
Epigenetic marks are influenced by environment and genes
Given the critical roles of epigenetics in development and disease, it follows that identifying and characterizing modulators of epigenetics could lead to methods for disease prevention and intervention. Animal toxicological models provide crucial proof-of-principle studies showing epigenetic susceptibility to environmental conditions. Researchers exploit visible phenotypes that are driven by gene expression and epigenetic state, including the kinky tail AxinFu mouse model. Another highly studied murine epigenetic biosensor is the viable yellow agouti (Avy) allele, which features a retrotransposable element that regulates gene expression when it is inverted and inserted upstream of the gene [Dolinoy and Jirtle 2008]. DNA methylation levels at these alleles are stochastic, but depend on environmental and nutritional conditions in utero and in early life. Agouti dams fed a high methyl-donor diet produced offspring with a shift in coat color distribution toward brown, driven by increased DNA methylation at the Avy allele [Waterland and Jirtle 2003]. Coat color and DNA methylation shifts were also observed with diet (genistein) [Dolinoy et al. 2006] and environmental exposures, such as bisphenol-A [Anderson et al. 2012] and ethanol [Kaminen-Ahola et al. 2010]. Notably, rat maternal licking and grooming behavior influences offspring stress response and hippocampal DNA methylation at the glucocorticoid receptor promoter [Weaver et al. 2004]. Controlled animal and cell line experiments may be useful to identify dose-response causal relationships and demonstrate that many chemicals and behavioral conditions may be broad epigenetic regulators, potentially in humans.
Toxicology studies inspired emerging environmental epigenetic epidemiology association studies [Bollati and Baccarelli 2010], which have the ability to test these associations in human populations. Paired toxicology and epidemiology research is needed to address shortcomings of either individually, such as generalizability (species or population specificity), dosage range, and causality. In most environmental cases, human exposure levels are orders of magnitude below laboratory toxicologic dosing and epidemiologic research is needed to determine real world risk. Examples of population-based and experimental research on overlapping exposures are listed in Table 1. Specifically, global and gene-specific DNA methylation associations have been observed with diet [Fenech 2001a, b; Fenech and Ferguson 2001], lifestyle and demographic characteristics like maternal smoking [Joubert et al. 2012], and environmental toxicants [Baccarelli and Bollati 2009; Sutherland and Costa 2003]. In addition to DNA methylation, environmental factors, including metals, also influence histone modifications [Arita et al. 2012; Cantone et al. 2011; Chervona et al. 2012]. Broad epigenetic change does not appear to be specific to a particular class of chemical exposures. Future, replicated genome-wide environmental epigenetic studies will show whether particular chemicals map to corresponding sensitive genomic regions.
Table 1.
Broad environmental DNA methylation regulators and references, by higher order classifications of toxicants.
Inherited genes are another regulator of epigenetics, with several reports describing gene-DNA methylation associations [Bell et al. 2010; Bell et al. 2011; Bjornsson et al. 2004; Liu et al. 2013]. However, the particular locations of gene-epigenotype correspondence have not been well characterized, nor are the mechanisms understood. A recent study observed that single nucleotide polymorphisms (SNPs) influence surrounding DNA methylation, but the size of the methylation region affected, as well as the size of the genetic signal, is inconsistent [Liu et al. 2013]. Combined analysis of genetic and epigenetic data can illuminate relationships (passive and active) between DNA methylation and gene expression [Gutierrez-Arcelus et al. 2013]. Study design options such as Mendelian randomization [Relton and Davey Smith 2012] and potential statistical strategies including mediation analysis [Liu et al. 2013] that may facilitate research in this area will be discussed in the subsequent sections. Further study is needed to understand the gene-epigene spatial relationships; the relative impact of inherited genes versus environment, age, and random noise on epigenetic marks throughout the genome remains unclear.
Roles for epigenetics in epidemiology
One of the great allures of epigenetics is the potential as a biological mechanism between genetics or environment and disease. We discuss this potential in detail below. It is critical to not only consider the option of epigenetics as a direct mechanism to disease, but also an indirect mechanism - a biomarker of exposure or disease that can be useful to epidemiology even if not mechanistically relevant. These various potential pathways for epigenetic epidemiology are presented in Figure 1.
Figure 1.
Theoretical relationships between epigenetics, disease, exposure and genotype.
Direct effects on disease risk (mediation)
Epigenetic modifications can directly cause disease (Figure 1, 1a). Imprinting disorders, such as Beckwith-Wiedemann syndrome (BWS), are classic examples where disease is caused by aberrant DNA methylation. Most genes in offspring are expressed from both the maternal and paternal copies of chromosomes, but imprinted genes are expressed from only one copy, determined by the parent of origin. Thus, changes to the parental balance in epigenetic control of expression at these imprinted sites can interrupt normal biology. BWS is caused by changes in methylation at the imprinting control regions for the genes H19, IGF2, CDKN1C, and KCNQ1, which lead to changes in their gene expression and the pediatric overgrowth associated with BWS [Weksberg et al. 2010].
There are also several examples where diseases are caused by genetic mutations that act through epigenetic mechanisms, such as Rett Syndrome (Figure 1, 1b). This predominantly female neurodevelopmental disorder is caused by spontaneous mutations in the methyl-CpG-binding protein 2 (MECP2) gene on the X-chromosome [Amir et al. 1999]. MECP2 is important for recognizing epigenetic modifications controlling gene expression, and mutated MECP2 alters the expression of other genes that are normally regulated by epigenetics. In another example, a particular serotonin receptor (5HT2A) genotype at the T102C position adds 2 CpG sites to the DNA sequence. This variant increases methylation and decreases expression of 5HT2A [Polesskaya et al. 2006], which is involved in psychiatric phenotypes such as schizophrenia [Polesskaya and Sokolov 2002; Serretti et al. 2007]. Similarly in colorectal cancer, the genetic loss of a CpG site through a C-T mutation in the methyltransferase MGMT gene disrupts gene regulation and leads to disease [Ogino et al. 2007]. In a final example, a SNP in the COMT gene introduces a new CpG site and the methylation level of the variant allele is correlated with lifetime stress level and working memory [Ursini et al. 2011]. In these cases, the locations of epigenetic change are restricted to the area of genetic change. Here, the negative effects of a particular genotype are mediated through epigenetics.
In the same way, epigenetics can be the mechanism linking environmental exposures and disease (Figure 1, 1c). Many toxicants are known to create oxidative stress in cells. Oxidative DNA damage from the environment interferes with DNA methyltransferase action on DNA [Valinluck et al. 2004], causing aberrant DNA methylation [Turker and Bestor 1997] that can be associated with disease. Toxicants can also directly interact with DNA methyltransferases or one-carbon metabolism enzymes, influencing the global epigenetic state of cells. In the same vein, dietary methyl group and co-factor availability influence global epigenetics. Often, environmental effects on the DNA methylome are global and diffuse, differing from the sequence specificity in the genetic example. Future genome-wide studies may reveal that certain chromatin structural arrangements are more sensitive to environmental influence, so there may be more regional specificity to environmental epigenetic change. Patterns in environmentally induced epigenetic change may also arise from stochastic change followed by cellular proliferation and local selection. Perhaps for some exposures there is a natural regulation of response that yields sequence-specific epigenetic sensitivity, but exposure-related epigenetic signatures have not yet been defined. Epigenetics as a mediator of the environment-disease relationship is particularly appealing for scientists who have noticed that exposures in early life can lead to late life disease [Barker 2004]. The lingering toxic effects of non-persistent environmental factors could be maintained in epigenetic marks that are relevant as the body ages.
Modification of disease risk
Epigenetics may also influence the relationships between exposure and disease (Figure 1, 2a), or genotype and disease (Figure 1, 2b) via effect modification. For example, some toxicants impact disease risk via DNA damage and the location of DNA damage (in genes, etc.) dictates disease severity. The surrounding epigenetic state can control the accessibility of DNA, and thus modify the impact of a toxicant on disease risk by titrating the susceptibility of genomic locations to exposure-induced damage [Ha et al. 2002; Jones et al. 2002; Meng et al. 2005; Smith et al. 1998]. Exposure to air pollution increases risk of cardiovascular disease and preliminary epidemiologic research shows susceptibility to elevated intermediate cardiovascular biomarkers following exposure to particles and nitrogen dioxide may in part be due to differences in methylation at the TLR2 gene [Bind et al. 2012]. Effect modification may also occur when a genetic mutation leads to disease, but disease severity and expression of the mutated protein depends on the epigenetic status of the gene. For example, Angelman syndrome is a developmental disorder caused by a genetic mutation in the ubiquitin ligase gene, UBE3A, an imprinted gene where the maternal allele is unmethylated and expressed in the brain. Because the paternal allele is methylated and unexpressed, mutations in the paternal UBE3A gene do not lead to disease while alterations in the maternally expressed sequence do result in Angelman syndrome. Thus, epigenetic context modifies the impact of the mutation. This is a challenging scenario for traditional genetic epidemiology research because the genetic effects would be masked when they are averaged over different epigenetic contexts. Genetic mutations that similarly depend on the parent-of-origin for disease risk have been observed in diseases such as autism [Arking et al. 2008; Fradin et al. 2010], bipolar disorder [Stine et al. 1995], and multiple sclerosis [Ebers et al. 2004], although specific epigenetic mechanisms to explain these parental origin effects have not yet been described.
Biomarkers of disease and environment
Alternatively, there may be specific disease and exposure examples where epigenetic factors are not involved mechanistically, but can serve as useful biomarkers of disease (Figure 1, 3) or of a particular exposure (Figure 1, 4). For example, epigenetic biomarkers are effective tests of disease status or of disease subtype when targeting treatment. In the context of cancer, epigenetic tests are involved in disease prediction, diagnosis, and prognosis [Esteller 2008]. The original example of the utility of epigenetics in personalized cancer treatment was associated with glioblastoma tumors. The gene encoding the DNA repair enzyme O6-methylguanine-DNA methyltransferase (MGMT) was hypermethylated in tumors that responded to alkylating agents, resulting in extended survival [Esteller et al. 2000]. Thus, patients with hypomethylated MGMT tumors would not be targeted for alkylating agent treatment. In addition, hypermethylation of the gene encoding the detoxification enzyme glutathione S-transferase P1 (GSTP1) is a consistent finding in prostate cancer. Prostate cancer is a disease for which a blood-based screening test (prostate-specific antigen, PSA) with high sensitivity is available. Combining the PSA test with an epigenetic biomarker test could increase diagnostic specificity by reducing the number of false positives [Sunami et al. 2009].
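The specificity argument can be made concrete with a small calculation. The sketch below assumes a serial strategy in which a patient is called positive only if both tests are positive, and it assumes the two tests err independently given true disease status. The function name and all input values are hypothetical and are not estimates for PSA or for GSTP1 methylation.

```python
def serial_test(se1, sp1, se2, sp2):
    """Sensitivity and specificity when a positive call requires BOTH tests positive.

    Assumes the two tests are conditionally independent given disease status
    (an illustrative assumption, not a property of any real assay).
    """
    # A diseased person is detected only if both tests detect them.
    sensitivity = se1 * se2
    # A healthy person is a false positive only if both tests are falsely positive.
    specificity = 1 - (1 - sp1) * (1 - sp2)
    return sensitivity, specificity


# Hypothetical values: a sensitive but unspecific first test, then a second marker.
se, sp = serial_test(se1=0.90, sp1=0.50, se2=0.80, sp2=0.90)
print(f"combined sensitivity = {se:.2f}, combined specificity = {sp:.2f}")
```

Under these assumptions specificity rises while sensitivity falls, which is the trade-off a combined test would need to justify.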
Secondly, epigenetic marks may represent “tombstones” of previous environmental exposures that have been flushed from the body. For example, in two separate birth cohorts, researchers found matching DNA methylation sites in newborn cord blood that correlated to maternal smoking behavior in pregnancy [Joubert et al. 2012]. Methylation status at these sites may be a biomarker of in utero smoke exposure and appear to persist in children 2–5 years of age (Ladd-Acosta 2013 unpublished). For other exposures that are more challenging to retrospectively assess than maternal smoking, epidemiology studies may greatly benefit from a validated epigenetic biomarker, particularly one that can be measured several years after the exposure.
Finally, epigenetic marks may be used to identify disease- or exposure-susceptible populations or individuals. In cases where epigenetics is not a driver of disease, it may still be useful as a biomarker of disease, exposure, or of susceptible populations.
Epigenetics as potential mechanism for transgenerational effects
Preliminary results suggest epigenetic changes may be transgenerational, in which case the promises of epigenetic epidemiology are even more far-reaching. In principle, transgenerational epigenetic inheritance has been demonstrated in Arabidopsis thaliana [Schmitz et al. 2011], Caenorhabditis elegans [Greer et al. 2011] and the AxinFu mouse model [Rakyan et al. 2003]. Environmentally induced changes in epigenetics may be passed through the germline. Rat in utero exposure to the fungicide vinclozolin produced male offspring with impaired spermatogenesis that persisted to the F4 generation and was correlated with germ line DNA methylation change [Anway et al. 2005]. In humans, early life exposures are known to influence later life health outcomes; for example, low birth weight predicts risk of obesity, type 2 diabetes, and cardiovascular disease [Hales and Barker 1992; McMillen and Robinson 2005]. Women pregnant during the Dutch Hunger Winter of World War II had female offspring with altered lipid profiles in adulthood [Lumey et al. 2009], which was accompanied by persistent epigenetic change [Heijmans et al. 2008; Tobi et al. 2009]. In humans, a careful distinction must be made between the effects of in utero or early life exposures on later life health outcomes, potentially due to epigenetic programming, for which the evidence is mounting, and epigenetic inheritance over multiple generations, for which the evidence is much more preliminary [Gluckman et al. 2007; Schmidt 2013]. Rather than in DNA and chromatin, transgenerational inheritance may be linked to RNA factors such as piRNA and microRNA in gametes [Daxinger and Whitelaw 2012]. There is support for transgenerational epigenetic effects in many eukaryote models, but the molecular basis of the inheritance is currently poorly characterized.
Challenges and directions
The outlook surrounding the field of epigenetic epidemiology is very hopeful for the promises detailed above. Practicing researchers, however, have repeatedly described several challenges facing epigenetic epidemiology on the ground [Heijmans and Mill 2012]. In this section, we describe the universal research issues and propose potential approaches for addressing them and moving the field forward.
Catalogue of information
There is a fundamental need to address foundational questions of epigenetic epidemiology prior to more nuanced research questions related to exposures or disease. First, we need to understand which level of the epigenome to examine for changes (DNA methylation, histones, microRNA, etc.). This may depend on the stability of the epigenetic mark over time or in storage, or the sensitivity of the mark to changes in the external or internal environment. Second, there are millions of epigenetic marks in any given cell. We need to know where in the epigenome to look for differences. For example, in DNA methylation studies, there has been an evolution of focus from CpG islands to CpG island shores and a focus on parts of the epigenome with intra- and inter-individual variability [Irizarry et al. 2009]. Epidemiology studies patterns in groups of people and this depends on an understanding of inter-individual differences in the epigenome. We need to characterize normal variability within and between people, in order to comment on what is altered [Bock et al. 2008]. This requires cross-tissue and cross-population measurements to establish variable methylation and catalogue the locations, somewhat analogous to the Haplotype Mapping project for common genetic polymorphisms.
Similarly, we need to understand the spatial relationships in the epigenome. Nearby CpG sites are often correlated in their methylation state. It will be important to understand which sites are methylated individually and which are methylated coordinately. Better knowledge of the spatial structure of the epigenome will help target our discovery searches. Next, we need to understand the temporal changes in the epigenome. These may reflect long-term change with aging, or they may occur on a much shorter time scale. For example, histone acetylation at clock-controlled genes is responsive to circadian rhythm [Bellet and Sassone-Corsi 2010]. Epigenetic state is dynamic and the timing of sample collection is a critical element.
Epigenome-scale public databases are a potential approach to share foundational epigenetic information. The NIH Roadmap Epigenomics Mapping Consortium (http://www.roadmapepigenomics.org/) was launched to provide a normal reference for ex vivo DNA methylation, chromatin assembly, histone modifications, and small RNA epigenomes from various tissues. This database is already providing comprehensive information on different epigenomic levels, variation across the epigenome, and tissue variation. This ambitious undertaking will inform development of future epigenome-wide technologies and the design of new studies. Unfortunately, documenting inter-individual “normal” variation is outside the scope of this project and additional epidemiologic studies will inform on this point.
Study design
Epigenetic epidemiology requires unique study design considerations. The study timing, sample types, and scale of epigenetic epidemiology are very different than for genetic epidemiology [Foley et al. 2009]. For example, genome-wide association studies (GWAS) often bank cells from participants in culture, which represents a virtually unlimited source of DNA for follow-up studies. However, it is well documented that cell culture conditions impact epigenetic state, so epigenetic epidemiology studies require sufficient primary samples, often with specific collection protocols to preserve the epigenetic mark of interest. In addition, genetics represent a fixed exposure throughout the lifetime; we can assume genetic effects temporally occur prior to disease onset regardless of when the biological sample was taken relative to exposure or disease. Epigenetics, in contrast, are a dynamic exposure that can vary over short intervals. If samples are collected at disease diagnosis, we cannot make assumptions about the role of epigenetics as a driver of disease or a late-stage passenger of the condition. Studies must be very cautious of reverse causation in epigenetic research.
Many of the epigenetic epidemiology study design challenges can be overcome with planning, longitudinal designs, and clearly specified a priori hypotheses and models. Prospective data collection and repeated longitudinal epigenetic measures will help to address questions about temporality, particularly in studies that begin at conception or in early in utero development. At-risk studies, which examine participants earlier in the disease process during subclinical or disease risk states, will be particularly important. By working with different age windows, we will be better able to understand the directions of the exposure, epigenetic, and disease relationships. Further, statistical approaches that apply causal inference concepts, such as Mendelian randomization [Relton and Davey Smith 2012], can protect against over-interpretation of results in the face of potential reverse causality, particularly in cross-sectional and case-control studies. In general, careful a priori attention to which role of epigenetic marks is being investigated (causal, a modifier, or a biomarker) should be taken at the design stage.
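As a rough illustration of the Mendelian randomization idea, the sketch below computes a ratio (Wald) estimate: the genotype-outcome association divided by the genotype-methylation association. It assumes the genetic variant influences the outcome only through methylation, an assumption that must be defended in any real study. The function names and toy data are hypothetical.

```python
def slope(x, y):
    """Simple linear regression slope of y on x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def wald_ratio(genotype, methylation, outcome):
    """Ratio estimate of the effect of methylation on the outcome.

    Uses genotype as an instrument: (genotype -> outcome) / (genotype -> methylation).
    """
    gene_to_methylation = slope(genotype, methylation)
    if gene_to_methylation == 0:
        raise ValueError("genotype is not associated with methylation")
    return slope(genotype, outcome) / gene_to_methylation


# Toy data (hypothetical): allele counts, methylation proportion, outcome score.
genotype = [0, 0, 1, 1, 2, 2]
methylation = [0.20, 0.22, 0.30, 0.32, 0.40, 0.42]
outcome = [1.0, 1.1, 1.5, 1.6, 2.0, 2.1]
print(round(wald_ratio(genotype, methylation, outcome), 3))
```

Because genotype is fixed at conception, an estimate of this kind is not distorted by disease altering methylation after onset, which is why the design is attractive when reverse causation is a concern.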
Sample availability
Related to the study design issues is the challenge of sample availability. The first point related to sampling is that collection timing matters. Etiologic studies require samples collected prior to disease. New prospective collections of large numbers of samples necessitate time and considerable funds. It is prudent, though not always possible, to identify appropriate archived samples. Second, the type of sample, whether fresh or frozen, influences the types of epigenetic analyses that are possible. DNA methylation is considered stable in frozen samples, but collection for histone modifications and microRNA requires additional considerations. Immortalized cell culture derived samples are not appropriate for epigenetic profiling because immortalization processes inherently alter the epigenome [Farwell et al. 2000; Grafodatskaya et al. 2010; Kulaeva et al. 2003], though primary cell culture samples have been used effectively in epigenetic toxicology studies [Rager et al. 2013]. In addition, genetic, RNA, protein, metabolite, and epigenetic data are needed on common samples to answer complex biological questions. Each of these analytes has specific necessary storage conditions [Deng et al. 2004; Masson et al. 2010; Shechter et al. 2007]. Third, most tissue samples are heterogeneous cell populations. To examine individual cell type signals, most physical cell sorting methods require fresh samples [Hawley and Hawley 2011]. As discussed further in the biological interpretation and statistical approaches sections, cell type composition of a mixed sample is an important consideration for interpretation and analytic approach.
The fourth issue with respect to sampling is the paradigm of target tissue and surrogate tissues. There is still debate about the utility of surrogate tissues, such as peripheral blood, compared to the primary tissue for a specific disease. However, some primary tissues, such as brain, are simply unavailable for sampling during life. Thus, we may be limited to mixed or indirect signals of “true” effects. There is a growing body of literature to suggest blood or lymphocyte epigenetic profiles may have relevance to tissues of interest. Inter-individual differences in gene expression [Sullivan et al. 2006] and, in a limited set, DNA methylation [Davies et al. 2012] were consistent across blood and brain tissues; however, the utility of blood DNA methylation for non-blood related disorders is still an open question. Gene expression and epigenetic findings from blood have been reported in colorectal neoplasia [Cui et al. 2003], schizophrenia and bipolar disorder [Kuratomi et al. 2008; Tsuang et al. 2005], Alzheimer’s disease [Wang et al. 2008], fragile X syndrome [Darnell et al. 2001; Nishimura et al. 2007], and autism [Gregg et al. 2008; Hu et al. 2006; Nakamura et al. 2008; Nguyen et al. 2010; Wong et al. 2013], among others. Thus, some biological disease differences may span non-target tissues and be detectable via epidemiologic studies. Complementary study designs, where epigenetic associations are examined in both post mortem disease tissues and in vivo peripheral tissues, will be very powerful.
It is in our best fiscal and time management interest to work with established studies, yet existing cohorts often have limited availability of appropriate biosamples at appropriate collection windows.
Measurement tools
Thus far, epigenetic epidemiology has largely been synonymous with the molecular epidemiology of DNA methylation. These studies measure the percentage of methylated cytosine residues within CpG dinucleotides. It is possible to measure the methylation levels at individual CpG sites, but as a first pass, studies often measure a proxy of the global methylation level of a sample across the epigenome. Global or total methylation methods include high-performance liquid chromatography (HPLC) [Ehrlich et al. 1982], methylation-specific in situ antibody fluorescence [Miller et al. 1974], repetitive element (such as long interspersed nucleotide element-1, LINE-1) bisulfite sequencing [Yang et al. 2004], and LUminometric Methylation Assay (LUMA) of CCGG sequences [Karimi et al. 2006]. Gene or region-specific DNA methylation is measured by methylation-sensitive restriction enzymes [Singer-Sam et al. 1990], bisulfite methylation-specific PCR [Herman et al. 1996], combined bisulfite restriction analysis (COBRA) [Xiong and Laird 1997], MethyLight fluorescence-based real time PCR [Eads et al. 2000], base-specific cleavage with mass spectrometry [Ehrich et al. 2005], and bisulfite pyrosequencing [Tost and Gut 2007], among other methods. Assays have progressed to determine genome-scale DNA methylation coverage including antibody-mediated methylated DNA immunoprecipitation (MeDIP) with oligonucleotide array [Weber et al. 2005], restriction enzyme (McrBC) digestion with tiling array (CHARM) [Irizarry et al. 2008], whole-genome bisulfite sequencing [Cokus et al. 2008], array capture bisulfite sequencing [Hodges et al. 2009], Illumina Infinium 450k array of bisulfite converted DNA [Bibikova et al. 2011], and reduced-representation bisulfite sequencing (RRBS) [Gu et al. 2011]. Each of these methods features a trade-off between feasibility, cost, input requirements, tissue collection, genome coverage, and measurement precision. Appropriate epigenomic measurement tools will vary by study. Comparisons across platforms must deal with differences in genomic coverage and quantitation, which is sometimes difficult for intermediate levels of methylation.
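To make the measured quantity concrete, the sketch below computes percent methylation at a CpG site as the methylated signal divided by the total signal, and averages across sites for a region-level summary. The function names, the optional `offset` stabilizing constant, and the example counts are illustrative assumptions; individual platforms define and normalize their signals differently.

```python
def percent_methylation(methylated, unmethylated, offset=0.0):
    """Percent methylation at one CpG site.

    methylated / unmethylated may be read counts or probe intensities.
    offset is an optional hypothetical constant that stabilizes low-signal sites.
    """
    total = methylated + unmethylated + offset
    if total <= 0:
        raise ValueError("no signal at this site")
    return 100.0 * methylated / total


def mean_percent_methylation(sites):
    """Average percent methylation over (methylated, unmethylated) pairs."""
    values = [percent_methylation(m, u) for m, u in sites]
    return sum(values) / len(values)


# Hypothetical read counts at three CpG sites in one region.
region = [(30, 10), (10, 30), (20, 20)]
print([percent_methylation(m, u) for m, u in region])
print(mean_percent_methylation(region))
```

A region-level average of this kind discards site-to-site variation, which is one reason global and site-specific measures from different platforms are hard to compare directly.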
The technical approaches available to epigenetic researchers are already diverse and they are consistently producing higher quality data at lower costs. As a result, researchers are using multiple platforms with thousands or millions of observations. The data generated across platforms are often not easily comparable and use different genome builds, restricting our ability to make measurements over time or perform pooled or meta-analyses, while keeping up with technological developments. Greater efforts will be needed to bring cohesion to our epigenomics findings, perhaps using informatics to make measurements backward and forward compatible with new technologies.
Statistical analysis and integration
The field of statistical epigenomics is only now developing. Considerations span measurement error issues and signal-to-noise enhancement, estimation and testing of particular association and causal models, and integration of epigenetic measures with other “omics” level data.
First, high-throughput laboratory epigenetic measurements, such as those from array-based and sequencing protocols, are subject to batch effects, in which signals vary as a function of environmental conditions, personnel, and reagent group [Leek et al. 2010]. Epigenetic technology development sometimes outpaces analytics development, and many of the platforms do not yet have standard pipelines to adjust for batch effects. However, some of these batch concerns can be addressed at the design stage by randomizing sample placement across batches. Also, there is not yet uniformity in data quality assurance/quality control measures or normalization procedures [Marabita et al. 2013; Wang et al. 2012], although lessons from expression array analyses and other fields are being applied.
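Randomized sample placement can be sketched in a few lines. The example below is one possible scheme, not a standard pipeline: it shuffles samples within each disease-status group and deals them across batches so that cases and controls are balanced on every batch. The function name `assign_batches`, the sample identifiers, and the choice of 4 batches are all illustrative assumptions.

```python
import random
from collections import Counter


def assign_batches(sample_ids, case_status, n_batches, seed=0):
    """Randomly assign samples to batches, balancing disease status."""
    rng = random.Random(seed)  # fixed seed so the layout is reproducible
    assignment = {}
    offset = 0
    for status in sorted(set(case_status.values())):
        group = [s for s in sample_ids if case_status[s] == status]
        rng.shuffle(group)  # random order within each status group
        for i, sample in enumerate(group):
            # Deal samples round-robin so batch sizes differ by at most one.
            assignment[sample] = (i + offset) % n_batches
        offset += len(group)
    return assignment


# Hypothetical study: 12 cases and 12 controls placed on 4 plates.
sample_ids = [f"S{i:02d}" for i in range(24)]
case_status = {s: ("case" if i < 12 else "control")
               for i, s in enumerate(sample_ids)}

batches = assign_batches(sample_ids, case_status, n_batches=4)
layout = Counter((batches[s], case_status[s]) for s in sample_ids)
```

With this design, batch is not associated with case status, so any residual batch signal adds noise rather than bias to the case-control comparison. The same idea extends to other key covariates, such as exposure group or collection site.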
DNA methylation and histone modifications are major drivers of tissue and cell differentiation, and thus particular cell types have specific epigenomic profiles [Irizarry et al. 2009; Jones and Taylor 1980]. As mentioned earlier, epidemiologic samples are often complex mixtures of different cell types, and thus associations with disease may be confounded by cell type distribution if those distributions are a surrogate of the disease cause. This can be handled via familiar approaches to confounding, such as regression adjustment or stratification, if cell composition is known or cell types can be isolated. For example, whole blood DNA represents fractions of DNA from granulocytes, monocytes, T-cells, B-cells, etc. Conducting laboratory analyses stratified by cell type can solve this (and may be most relevant for questions where the effects of interest are isolated to particular cell types; see the Biologically interpreting the results section below), although this is not feasible or cost-effective for most epidemiological studies. Recently, investigators have addressed this by using cell type-specific methylation signatures to predict cell type proportions from mixed tissue samples such as blood, and then using these predictions for adjustment in disease association analyses [Houseman et al. 2012].
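The regression adjustment described above can be illustrated with a small, noise-free toy example. In this sketch, methylation at a hypothetical CpG site depends only on the granulocyte proportion of the sample, and cases happen to have more granulocytes than controls. All values are invented for illustration, and the small `ols` helper is a plain normal-equations solver written for this example rather than a call into any statistics library.

```python
def ols(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    p = len(X[0])
    # Build the augmented matrix [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(p)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]


# Hypothetical data: 4 controls followed by 4 cases.
disease = [0, 0, 0, 0, 1, 1, 1, 1]
granulocytes = [0.50, 0.55, 0.60, 0.65, 0.60, 0.65, 0.70, 0.75]
# Methylation is driven by cell composition only, not by disease.
methylation = [0.2 + 0.5 * g for g in granulocytes]

# Model 1: methylation ~ disease (cell composition ignored).
unadjusted = ols([[1.0, d] for d in disease], methylation)

# Model 2: methylation ~ disease + granulocyte proportion.
adjusted = ols([[1.0, d, g] for d, g in zip(disease, granulocytes)],
               methylation)
```

Here `unadjusted[1]` shows an apparent case-control difference of about 0.05, while `adjusted[1]` is essentially zero once granulocyte proportion is in the model. In practice the proportions would be measured directly or predicted from reference methylation signatures, and real data would of course include noise.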
Beyond these technical issues of batch, normalization, and protection from confounding, the field is still settling on the most appropriate analytic approach to detect disease associations within the epigenome. Approaches include single-CpG statistical tests and region-based tests such as “bump hunting,” which borrows information from neighboring sites to smooth over findings and exploit correlation in methylation signals [Jaffe et al. 2012], and Aclust, which detects clusters of adjacent sites [Sofer et al. 2013]. Gold-standard statistical tests at the epigenome level have not been fully defined, and it is currently unclear what effect sizes can reasonably be expected in epidemiologic settings. Calculations of sample size and study power in epigenetic epidemiology are difficult, but requirements may be lower than those for GWAS, given the smaller number of features on many epigenome platforms and the quantitative nature of epigenetic signals compared to categorical genotypes. Ultimately, replication across studies should drive long-term interpretation.
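To give a feel for how region-based approaches borrow information from neighboring sites, the following is a deliberately simplified sketch: it smooths per-site effect estimates with a moving average and reports runs of adjacent sites whose smoothed effect exceeds a cutoff. It is not the published bump hunting or Aclust algorithm, which also account for genomic distance, correlation, and permutation-based significance. The effect estimates, window size, and cutoff are hypothetical.

```python
def smooth(values, window=3):
    """Centered moving average; edge sites use the neighbors available."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        neighbors = values[max(0, i - half): i + half + 1]
        smoothed.append(sum(neighbors) / len(neighbors))
    return smoothed


def find_regions(values, cutoff, min_sites=2):
    """Return (start, end) index pairs of runs where |value| >= cutoff.

    Assumes cutoff > 0, so the trailing sentinel closes any open run.
    """
    regions, start = [], None
    for i, v in enumerate(values + [0.0]):
        if abs(v) >= cutoff and start is None:
            start = i
        elif abs(v) < cutoff and start is not None:
            if i - start >= min_sites:
                regions.append((start, i - 1))
            start = None
    return regions


# Hypothetical case-control methylation differences at 12 adjacent CpGs.
# Sites 3-6 form a consistent region; site 9 is an isolated spike.
effects = [0.00, 0.01, -0.01, 0.06, 0.08, 0.07, 0.05,
           0.00, 0.01, 0.09, 0.00, -0.01]

smoothed = smooth(effects, window=3)
regions = find_regions(smoothed, cutoff=0.035)
```

Smoothing pulls the isolated spike at site 9 below the cutoff while the four consistently shifted sites remain above it, which is the intuition behind treating regions rather than single CpGs as the unit of inference.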
Concerns that associations may reflect a consequence of the disease rather than an indicator of etiology can be addressed by carefully considering the purpose of the epigenetic analyses (see the Study design section above) and by clear causal modeling, such as Mendelian randomization, in situations where genotype may be an appropriate instrumental variable [Liu et al. 2013; Smith and Ebrahim 2004].
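The logic of using genotype as an instrumental variable can be sketched with the simple ratio (Wald) estimator, one of several possible Mendelian randomization estimators. The toy data below are constructed, not observed: a genotype shifts methylation, methylation has a true effect of 2.0 on the outcome, and an unmeasured confounder affects both. The estimate is only valid under the usual instrumental variable assumptions, namely that the genotype affects methylation, is independent of the confounders, and affects the outcome only through methylation.

```python
def slope(x, y):
    """Slope from a simple linear regression of y on x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def wald_ratio(genotype, exposure, outcome):
    """Ratio estimator: (genotype-outcome slope) / (genotype-exposure slope)."""
    return slope(genotype, outcome) / slope(genotype, exposure)


# Constructed toy data: genotype (0, 1, 2 copies of an allele) is balanced
# against an unmeasured confounder, so the two are uncorrelated.
genotype = [g for g in (0, 1, 2) for _ in (-1.0, 0.0, 1.0)]
confounder = [u for _ in (0, 1, 2) for u in (-1.0, 0.0, 1.0)]

# Methylation depends on genotype and the confounder; the outcome depends
# on methylation (true effect 2.0) and on the same confounder.
methylation = [0.1 * g + u for g, u in zip(genotype, confounder)]
outcome = [2.0 * m + 3.0 * u for m, u in zip(methylation, confounder)]

naive_estimate = slope(methylation, outcome)            # confounded
causal_estimate = wald_ratio(genotype, methylation, outcome)
```

In this construction the naive regression of outcome on methylation is inflated by the confounder, while the ratio estimator recovers the true effect of 2.0. With real data the genotype-methylation association is often weak, so precision and instrument strength need to be assessed as well.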
Epigenomic measurements also need to be integrated with data on the genome and the environment, which are key regulators of the epigenome, and common measures to ensure that our tests are “statistically and biologically sound” are important in this regard [Heijmans and Mill 2012].
Biologically interpreting the results
While epigenetic modifications may be simply biomarkers of exposure or disease, the great hope is that epigenetic research may provide direct mechanistic insight regarding exposure impacts on the body, and potentially on disease. Since epigenetic mechanisms contribute to gene regulation, one functional assessment of epigenetic modification is the correlation with gene expression. However, this correlation is not always apparent when, for example, DNA methylation occurs at non-promoter or non-genic sites, such that the DNA targeted for regulation or expression is not clear. Examining the correlation between methylation and expression at the level of single CpGs and single genes, versus at the regional level for either, may provide different insights. We need to better understand the functional influence of altered marks, individually and collectively. Further, most analyses of DNA methylation and RNA gene expression treat both as linear predictors or outcomes. Non-linear relationships may more accurately describe the functional implications of DNA methylation.
Mechanistic work must rely on data from specific target tissues, and at specific developmental or lifespan windows, which, as mentioned earlier, is a particular challenge in large epidemiologic settings. Our partnerships with basic scientists will be critical to characterizing biological relationships and determining biological plausibility of our population-based observations. Mechanistic studies are needed to better understand the implications of epidemiologic associations, leading to therapeutic interventions and treatments.
Public relations and communications
High expectations of the Human Genome Project were deflated by the complexity of the relationships between genetics and most diseases. The public experienced disappointment in human genetics when few clearly disease-causing genetic variants were identified. Epigenetics has experienced considerable hype within the scientific community, which may extend to the public, and could suffer the same fate if expectations are set unreasonably high. Thus far, the observed effect sizes, outside of foundational tissue and cancer studies, have been relatively small. It is important to recognize the promise, but also the limitations, of epigenetic studies in humans, including the limitations on available tissue types, cell heterogeneity, and developmental windows. We will need to interpret results with appropriate caution and be clear and deliberate in our communication with the media.
Recent exciting advances in measurement technology and analytic techniques offer greater epigenetic epidemiology opportunities than ever before. The field has the potential to address questions about mechanisms of basic biology, development, disease, environmental exposures, nutrition, and aging, and may open up new measurement tools for biomarker-based analyses of exposures or disease in epidemiology. Careful designs of new studies, clever leveraging of existing studies, and rigorous statistical and causal modeling will be necessary to overcome challenges inherent to the field related to temporality and biological interpretation. Like other areas of molecular epidemiology, epigenetic epidemiology can provide great insight into the biological changes associated with exposures and disease and may indeed bridge the genetic and environmental epidemiology fields.
