Abstract
The Illumina Infinium MethylationEPIC provides an efficient platform for profiling DNA methylation in humans at over 850,000 CpGs. Model organisms such as mice do not currently benefit from an equivalent array. Here we used this array to measure DNA methylation in mice. We defined probes targeting conserved regions, performed differential methylation analysis, and compared the array-based assay with affinity-based sequencing of methyl-CpGs (MBD-seq) and reduced representation bisulfite sequencing. Mouse samples consisted of 11 liver DNA samples from two strains, C57BL/6J (B6) and DBA/2J (D2), that varied widely in age. Linear regression was applied to detect differential methylation. In total, 13,665 probes (1.6% of total probes) aligned to conserved CpGs. Beta-values (β-values) for these probes showed a distribution similar to that in humans. Overall, there was high concordance in methylation signal between the EPIC array and MBD-seq (Pearson correlation r = 0.70, p-value < 0.0001). However, the EPIC probes had higher quantitative sensitivity at CpGs that are hypo- (β-value < 0.3) or hypermethylated (β-value > 0.7). In terms of differential methylation, no EPIC probe detected a significant difference between age groups at a Benjamini-Hochberg false discovery rate (FDR) threshold of 10%, and the MBD-seq performed better at detecting age-dependent change in methylation. However, the most significant probe for age (cg13269407; uncorrected p-value = 1.8 × 10⁻⁵) is one of the clock CpGs used to estimate the human epigenetic age. For strain, 219 EPIC probes detected significant differential methylation (FDR cutoff 10%), with ~80% of these CpGs associated with higher methylation in D2. This higher methylation profile in D2 compared to B6 was also replicated by the MBD-seq data. To summarize, we found only a small subset of EPIC probes that target conserved sites. However, for this small subset the array provides a reliable assay of DNA methylation and can be effectively used to measure differential methylation in mice.
Introduction
There has been a surge in large-scale epigenetic studies in recent years. In particular, epigenome-wide association studies (EWAS) of DNA methylation have shown associations with physiological traits [1, 2], diseases [3–5], environmental exposures [6, 7], aging [8], and even socioeconomic [9] and emotional experiences [10]. The development of robust and reliable methylation microarrays has been an important driving force. In particular, the Illumina Human Methylation BeadChips have made it both convenient and cost-effective to incorporate an epigenetic arm into large epidemiological studies [11, 12]. The latest version, the Illumina Infinium MethylationEPIC BeadChip (EPIC), provides an efficient high-throughput platform to quantify methylation at 866,836 CpG sites on the human genome [13, 14]. A remarkable biological insight that has emerged from these array-based studies is the definition of the methylation-based “epigenetic clock,” a biomarker of human age and aging that is defined using specific probes represented on these arrays [8].
Currently there is no equivalent microarray platform for model organisms, and work in experimental species has largely relied on high-throughput sequencing. For instance, while the human DNA methylation age can be calculated from a few hundred probes on the Illumina BeadChips, a similar effort in mice required more extensive sequencing of the mouse methylome [15]. However, CpG islands (CGIs) are largely conserved between mice and humans, and the two species share similar numbers of CGIs and similar proportions of CGIs in promoter regions of genes [16]. Considering that these CpGs and CGIs are highly conserved in gene regulatory regions, it is plausible that probes on the human microarrays that target these sites may have some application in research using rodent models. This was previously evaluated for the two older versions of the Illumina HumanMethylation BeadChips [17]. A more recent study has also evaluated the EPIC array for conserved probes [18]. These studies have shown that a subset of the probes target highly conserved sites and can be used to measure DNA methylation in mice and possibly other mammalian species.
In the present work, we extend the conservation analysis of the EPIC platform by applying a quantitative approach to evaluate the capacity of these probes to detect differential methylation in mice. We begin by defining the conserved probes and the key features of the corresponding CpG sites in the context of the larger mouse and human genomes. We also compare the methylation signal detected by the conserved probes with affinity-based methyl-CpG enriched DNA sequencing (MBD-seq) data from the same samples and evaluate whether the conserved probes are informative of age and strain differences in mice. Additionally, we perform a comparison with publicly available mouse CpG methylation data generated by reduced representation bisulfite sequencing (RRBS).
Materials and methods
Defining conserved EPIC probes
Sequences for the 866,836 CpG probes were obtained from Illumina (http://www.illumina.com/). The probe sequences were aligned to the mouse genome (mm10) using bowtie2 (version 2.2.6) with default parameters. A total of 34,981 probes aligned to the mouse genome with varying alignment quality. Conserved probes were then defined based on the quality of alignment. For this, we filtered out all sequences with a mapping quality (MAPQ) of less than 60 (15,717 excluded) and those that contained more than two non-matching base pairs (1,092 excluded). To retain only the high-quality probes, we further filtered probes based on confidence in the DNA methylation signal, and 4,507 probes with detection p-values > 0.0001 were removed. This generated a list of 13,665 high-quality probes that target conserved sequences and provide reliable methylation assays in mice (listed in S1 Data). CpG island annotations [19] for the respective genomes were downloaded from the UCSC Genome Browser (http://genome.ucsc.edu), and the distribution of conserved probes and positions of CGIs were plotted on the human (GRCh37) and mouse (mm10) genomes using CIRCOS [20].
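The filtering cascade described above can be sketched in a few lines of Python. This is only an illustration, not the pipeline used in the study: the `Alignment` record and its field names are hypothetical, and only the three thresholds and the reported probe counts are taken from the text.

```python
from dataclasses import dataclass
from typing import List

# Thresholds taken from the text; the record layout below is hypothetical.
MIN_MAPQ = 60
MAX_MISMATCHES = 2
MAX_DETECTION_P = 0.0001


@dataclass
class Alignment:
    probe_id: str
    mapq: int            # mapping quality reported by the aligner
    mismatches: int      # non-matching base pairs in the alignment
    detection_p: float   # detection p-value of the methylation signal


def is_conserved(aln: Alignment) -> bool:
    """Return True if a probe alignment passes all three filters."""
    return (
        aln.mapq >= MIN_MAPQ
        and aln.mismatches <= MAX_MISMATCHES
        and aln.detection_p <= MAX_DETECTION_P
    )


def remaining_after_filters(aligned: int, excluded: List[int]) -> int:
    """Subtract the probes excluded at each successive filtering step."""
    return aligned - sum(excluded)


# Counts reported in the text: 34,981 aligned probes, then three exclusions.
N_CONSERVED = remaining_after_filters(34_981, [15_717, 1_092, 4_507])
```

With the reported exclusions, `N_CONSERVED` evaluates to 13,665, which matches the size of the conserved set.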
For conserved sequences, there is high correspondence in functional and genomic features between mouse and human genomes and we referred to the human probe annotations provided by Illumina to define the location of conserved probes with respect to gene features and CpG context (i.e., islands, shores, shelves) (S1 Data). To evaluate if the conserved set is enriched in specific features relative to the full background set, we performed a hypergeometric test using the phyper function in R.
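As an illustration of the enrichment test, the sketch below computes an upper-tail hypergeometric probability in plain Python. It is meant as an analogue of an upper-tail `phyper` call in R, not a reproduction of the authors' script, whose exact arguments are not given in the text; the function names are hypothetical. The counts are the island and shore rows of Table 2. The result is kept on the log scale because the island p-value is far too small for a double-precision float.

```python
import math


def log_choose(n: int, k: int) -> float:
    """Natural log of the binomial coefficient C(n, k)."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def hypergeom_upper_tail_log(k: int, total: int, hits: int, draws: int) -> float:
    """log P(X >= k) when `draws` probes are sampled without replacement
    from `total` probes, of which `hits` carry the feature of interest."""
    lo = max(k, 0, draws - (total - hits))
    hi = min(hits, draws)
    if lo > hi:
        return float("-inf")
    log_denom = log_choose(total, draws)
    terms = [
        log_choose(hits, i) + log_choose(total - hits, draws - i) - log_denom
        for i in range(lo, hi + 1)
    ]
    peak = max(terms)
    # log-sum-exp keeps very small tail probabilities from underflowing to 0
    return peak + math.log(sum(math.exp(t - peak) for t in terms))


# Table 2 counts: full set of 866,836 probes, conserved set of 13,665 probes.
TOTAL, CONSERVED = 866_836, 13_665
log_p_islands = hypergeom_upper_tail_log(6_270, TOTAL, 161_598, CONSERVED)
log_p_shores = hypergeom_upper_tail_log(2_267, TOTAL, 154_735, CONSERVED)
```

Under these counts the island tail probability falls well below 1.0E-15, while the shore probability is close to 1, in line with the "ns" entry in Table 2.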
Animals and sample preparation
Tissue samples were derived from mice that were part of an aging cohort maintained at the University of Tennessee Health Science Center (PI: Robert W. Williams). Details on animal rearing and sample collection are described in Mozhui and Pandey 2017 [21]. All animal procedures were approved by the Institutional Animal Care and Use Committee (IACUC) at the University of Tennessee Health Science Center.
Liver tissues were collected from mice aged at ~4 months (mos; young), ~12 mos (mid), and ~24 mos (old). The mice were of two different strains—C57BL/6J (B6) and DBA/2J (D2)—and as the colony was set up to study aging in females, the majority of the mice in this study are females (Table 1). Mice were euthanized by intraperitoneal injection of Avertin (250 to 500 mg/kg of a 20 mg/ml solution), followed by cardiac puncture and exsanguination. All sample collection procedures were done on the same day within a 3-hour timeframe. Liver samples were snap-frozen and stored at -80°C until use.
Table 1. Sample details and average methylation signal intensity.
 | | | | | Full set (850K) | | Conserved set (13665) | |
---|---|---|---|---|---|---|---|---|
Sample | Age | Age (months) | Strain1 | Sex | Mean | Median | Mean | Median |
Mouse1 | young | 4 | D2 | F | 505 | 394 | 3206 | 1898 |
Mouse2 | young | 4 | D2 | F | 926 | 524 | 10989 | 10278 |
Mouse7 | young | 4 | B6 | F | 877 | 538 | 9866 | 8702 |
Mouse8 | young | 4 | B6 | F | 766 | 397 | 10386 | 9975 |
Mouse3 | mid | 12 | D2 | F | 852 | 483 | 10615 | 9880 |
Mouse4 | mid | 12 | D2 | F | 818 | 430 | 10866 | 10542 |
Mouse5 | mid | 12 | D2 | M | 845 | 456 | 11545 | 10982 |
Mouse9 | mid | 12 | B6 | M | 852 | 444 | 11433 | 11187 |
Mouse6 | old | 24 | D2 | F | 737 | 379 | 10206 | 9611 |
Mouse10 | old | 24 | B6 | F | 845 | 448 | 10767 | 10436 |
Mouse11 | old | 24 | B6 | F | 886 | 490 | 11302 | 10741 |
Human1 | | | | | 7568 | 7218 | 8710 | 8616 |
Human2 | | | | | 10668 | 10288 | 11761 | 11599 |
1 D2: DBA/2J; B6: C57BL/6J
DNA was purified from the liver tissue using the Qiagen AllPrep kit (http://www.qiagen.com) on the QIAcube system. Nucleic acid quality was checked using a NanoDrop spectrophotometer (http://www.nanodrop.com). As a reference, we also included two human DNA samples derived from the buffy coats of two individuals.
DNA methylation microarray and data processing
DNA methylation assays were performed as per the standard manufacturer’s protocol (http://www.illumina.com/). In brief, 500 ng of DNA extracted from the mouse liver was treated with sodium bisulfite to convert unmethylated cytosine to uracil, whereas 5-methylcytosine remains unreactive to sodium bisulfite. The DNA was then hybridized to the EPIC BeadChip. After washing off unhybridized DNA, a single base extension was recorded to calculate the methylation level at the CpG probe site. DNA methylation assays were performed at the Genomic Services Lab at the HudsonAlpha Institute for Biotechnology (http://hudsonalpha.org). Raw intensity data files (idat files) for both mouse and human samples were processed using the R package minfi [22]. The full mouse dataset is available from the NCBI Gene Expression Omnibus (GEO accession ID GSE110600).
The intensity and β-values were used to evaluate the performance of the EPIC probes in mice and humans. Comparisons were based on the full set of 850K probes and the conserved set of 13,665 probes. We also used the β-values and signal intensity scores for the 13,665 probes to perform hierarchical clustering and principal component analysis for the mouse samples. From initial quality checks, we identified one outlier mouse sample (S1 Fig) that had lower intensity and higher detection p-value compared to the other mouse samples. This sample was excluded from the statistical tests.
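For readers unfamiliar with the two quantities, the sketch below shows the conventional way a β-value and a total signal intensity are derived from the methylated and unmethylated channel intensities of a probe. The function names are illustrative, and the stabilizing offset of 100 is an assumption taken from the commonly used Illumina convention; the text does not state which offset, if any, was applied in this study.

```python
def beta_value(methylated: float, unmethylated: float, offset: float = 100.0) -> float:
    """Beta = M / (M + U + offset), a methylation estimate bounded by 0 and 1.

    The offset (assumed here, not stated in the text) stabilizes the ratio
    when both channel intensities are low.
    """
    return methylated / (methylated + unmethylated + offset)


def total_intensity(methylated: float, unmethylated: float) -> float:
    """Total signal intensity of a probe: the sum of both channels."""
    return methylated + unmethylated
```

A probe with weak signal in both channels has a low total intensity and a β-value that carries little information, which is why intensity and detection p-value are useful quality checks alongside β.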
Comparison with high-throughput sequencing data
The mouse samples we report here were previously assayed for DNA methylation using MBD-seq [21]. This is an affinity-based enrichment of methylated CpGs using the methyl binding domain (MBD) of methyl-CpG-binding protein 2, followed by high-throughput sequencing (MBD-seq) [23–25]. Sequencing was performed on Life Technologies’ Ion Proton platform. Data have been deposited in the NCBI’s Gene Expression Omnibus (https://www.ncbi.nlm.nih.gov/geo/; GEO accession ID GSE95361) and Sequence Read Archive (https://www.ncbi.nlm.nih.gov/sra/; SRA accession ID SRP100703). To compare the methylation signal detected by the conserved EPIC probes, we extracted MBD-seq reads at the corresponding sites. MBD-seq does not provide single-base resolution, as the resolution is limited to the fragment size, in this case ~300 bp. However, since methylation levels at neighboring CpGs are largely correlated [26], we derived quantitative data from the number of read fragments that map to a CpG region. For the sites in the mouse genome targeted by the conserved EPIC probes, we expanded the window to 300 bp bins and extracted the MBD-seq fragment counts. The CpG density-normalized methylation level was then quantified using the MEDIPS R package [27]. We then used Pearson’s correlation to compare the EPIC β-values and the relative methylation score (rms, i.e., the CpG density-normalized methylation) detected by MBD-seq [28].
For additional comparison, we used publicly available mouse RRBS data (GEO accession ID GSE93957, sample GSM2465617 at http://www.ncbi.nlm.nih.gov/geo/). These data were generated from liver tissue of mouse strain C57BL/6-BABR, and alignment was to the GRCm38/mm10 mouse genome build [15]. By matching genome coordinates, we identified CpGs that were interrogated by both the conserved EPIC probes and the RRBS. From the RRBS data, we used the methylation percentage (computed from the methylated and unmethylated read counts) to correlate with the β-values.
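The two computational steps in this comparison, building a 300 bp window around each probe-targeted CpG and correlating paired methylation values, can be sketched as follows. This is a minimal illustration with hypothetical function names; in particular, centering the window on the CpG is an assumption, since the text does not say how the 300 bp bins were positioned.

```python
import math
from typing import Sequence, Tuple


def window_around(position: int, width: int = 300) -> Tuple[int, int]:
    """Return (start, end) of a window of `width` bp centered on a CpG.

    Centering is an assumption made for illustration only.
    """
    half = width // 2
    return max(position - half, 0), position + half


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two paired sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)
```

In this setting `x` would hold the per-probe average β-values and `y` the rms values of the matching windows.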
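The coordinate matching can be sketched as a join on (chromosome, position) keys. The data layout and function names below are hypothetical, and the methylation fraction is assumed to be methylated reads divided by total reads at the CpG, which is the usual definition for bisulfite sequencing but is not spelled out in the text.

```python
from typing import Dict, List, Optional, Tuple

Site = Tuple[str, int]  # (chromosome, position on mm10)


def methylation_fraction(methylated: int, unmethylated: int) -> Optional[float]:
    """Fraction of reads that are methylated; None when the CpG has no reads."""
    total = methylated + unmethylated
    return methylated / total if total > 0 else None


def match_sites(
    epic_beta: Dict[Site, float],
    rrbs_counts: Dict[Site, Tuple[int, int]],
) -> List[Tuple[Site, float, float]]:
    """Pair the EPIC beta-value with the RRBS fraction at shared coordinates."""
    pairs = []
    for site in sorted(epic_beta.keys() & rrbs_counts.keys()):
        fraction = methylation_fraction(*rrbs_counts[site])
        if fraction is not None:
            pairs.append((site, epic_beta[site], fraction))
    return pairs
```

Note that a CpG covered by only a handful of reads can take very few distinct fraction values, so low-coverage sites tend to pile up at exactly 0 or 1.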
Analysis of differential methylation
Statistical analyses were done in R (https://www.r-project.org/) and JMP Statistics (JMP Pro 12). Mice were grouped into three age categories (young, mid, and old; additional sample details are in Table 1). To evaluate differential methylation detected by the 13,665 conserved probes, we applied a regression model with age, strain and sex as predictors (~ageGroups + strain + sex) for each probe using the R glm function and type III ANOVA to calculate test statistics (equations are provided in S1 Data). For the MBD-seq reads, we performed differential methylation analysis of the read counts using the edgeR R package [29]. The same model was applied (~ageGroups + strain + sex), and equations are provided in S1 Data. We then cross-compared differential methylation detected by the two methods. Treating the EPIC data as a discovery set, we applied the Benjamini-Hochberg (BH) procedure to control the false discovery rate (FDR) [30, 31]. We then defined differentially methylated CpGs (DMCpGs) and evaluated the corresponding region in the MBD-seq data to test replication at a lenient uncorrected p-value threshold of 0.05. Likewise, in the reverse comparison, we applied an FDR threshold to identify differentially methylated regions (DMRs) in the MBD-seq data, and tested replication of the corresponding CpG at an uncorrected p-value threshold of 0.05.
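The Benjamini-Hochberg step can be illustrated with a short, self-contained Python function. This is a generic sketch of the standard step-up adjustment, not the authors' R code, and the function name is hypothetical.

```python
from typing import List, Sequence


def bh_adjust(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so each adjusted value is the
    # minimum of p * n / rank over all ranks at or above the current one.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted
```

A probe is called significant at an FDR of 10% when its adjusted p-value is at most 0.10. As a rough guide to scale, with 13,665 tests the rank-1 term for an uncorrected p-value of 1.8 × 10⁻⁵ is 1.8 × 10⁻⁵ × 13,665 ≈ 0.25; the adjusted value also depends on the remaining p-values, so this is an upper bound rather than the adjusted value itself.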
Results
Conserved Infinium MethylationEPIC probes
The human EPIC array contains 866,836 50-mer probes. Out of these, we defined a total of 13,665 probes that align to conserved sites in the mouse genome and provide high quality methylation signal (details on mapping quality scores and methylation signal confidence are provided in S1 Data). In the full set of EPIC probes, 71% are located within annotated gene features or within 200–1,500 bp upstream of transcription start sites (TSS). Compared to this background set, a higher percent of the conserved probes (88%; 11,972 probes) target such functionally annotated regions. Probes that target CpGs located in exons, 5’ UTR, and within 200 bp upstream of TSS (TSS200) are highly overrepresented among the conserved set (Table 2). This is expected, since sequences in these functional regions are conserved across species. The upstream regulatory regions and the first exon harbor a large percent of CGIs, and compared to the background set, there is close to a 2.5-fold higher enrichment in CGIs among the conserved probes (Table 2). In contrast, there is no enrichment in probes that target CpGs that are between 200–1,500 bp upstream of TSS (TSS1500), gene body (mostly intronic), 3’ UTRs, and non-genic regions. Locations of the conserved probes and CGI densities in the human and mouse genomes are shown in Fig 1.
Table 2. Genomic features of CpGs and enrichment in conserved sites.
 | Full set (850K) | | Conserved set (13665) | | |
---|---|---|---|---|---|
Feature | Counts | Percent Total | Counts | Percent Total | Enrichment p3 |
Gene features1 | |||||
TSS1500 | 107193 | 12 | 1195 | 9 | ns |
TSS200 | 65152 | 8 | 1940 | 14 | <1.0E-15 |
5'UTR | 73070 | 8 | 1269 | 9 | 1.8E-04 |
1stExon | 26433 | 3 | 2028 | 15 | <1.0E-15 |
Exon | 5680 | 1 | 282 | 2 | <1.0E-15 |
3'UTR | 21594 | 2 | 340 | 2 | ns |
Body | 318165 | 37 | 4918 | 36 | ns |
Non-Genic | 249549 | 29 | 1693 | 12 | ns |
CpG islands and flanking regions2 | |||||
Islands | 161598 | 19 | 6270 | 46 | <1.0E-15 |
Shores | 154735 | 18 | 2267 | 17 | ns |
Shelves | 61811 | 7 | 664 | 5 | ns |
Open Sea | 488692 | 56 | 4464 | 33 | ns |
1 CpG position relative to gene features based on annotations from Illumina (UCSC_RefGene_Group). TSS200 and TSS1500 are CpGs within 200 bp or 200–1,500 bp upstream of transcription start sites, respectively; non-genic are CpGs with no annotated gene features.
2 Shores = 0–2 kb from islands; shelves = 2–4 kb from islands
3 Enrichment of gene features and CpG regions in the conserved set compared to the full set based on hypergeometric test
Comparison of probe performance in mouse and human samples
We used data generated from two human samples as a reference. Using the full set of 850K probes, the mouse samples showed low overall signal intensity (Fig 2A). The mean signal intensity for the two human samples was 9,118 ± 2,192 (Table 1). For the mouse samples, the mean signal intensity was 810 ± 114 (Table 1). The β-value distribution also showed poor performance in mice, with a single peak at a β-value of ~0.4 that indicates probe failure. In contrast, the methylation β-values in human samples showed the expected bimodal distribution that characterizes the Illumina methylation arrays (Fig 2B) [13, 14].
The EPIC BeadChip clearly performed poorly in mice when we considered the full set of probes. However, when we considered only the 13,665 conserved probes, the methylation signal became comparable between the mouse and human samples. Excluding Mouse1, which failed the initial QC and had very low signal intensity (this sample was excluded from differential methylation analysis), the mean signal intensity for the mouse samples ranged from 9,866 to 11,545 (Table 1). Mean signal intensities for the two human samples were 8,710 and 11,761 (Table 1). The bimodal β distribution was also observed for this set of conserved probes in mouse samples (Fig 2C and 2D).
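The summary statistics quoted above can be recomputed directly from the per-sample means in the full-set columns of Table 1. The short Python check below uses the sample standard deviation; the variable and function names are illustrative.

```python
from statistics import mean, stdev
from typing import Sequence, Tuple

# Per-sample mean intensities for the full probe set, copied from Table 1.
MOUSE_FULL = [505, 926, 877, 766, 852, 818, 845, 852, 737, 845, 886]
HUMAN_FULL = [7568, 10668]


def mean_sd(values: Sequence[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation of per-sample intensities."""
    return mean(values), stdev(values)


mouse_mean, mouse_sd = mean_sd(MOUSE_FULL)
human_mean, human_sd = mean_sd(HUMAN_FULL)
```

Rounded to whole numbers, these give 810 ± 114 for the eleven mouse samples and 9,118 ± 2,192 for the two human samples.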
Comparison with MBD-seq and RRBS
To determine if we could find a concordant methylation signal, we compared the microarray β-values with the CpG density-normalized rms derived from MBD-seq data (average β-values and rms are provided in S1 Data). As in the case of the EPIC, the MBD-seq data also showed a bimodal distribution. Overall, there was concordance between the two technologies, and the β-values and rms were significantly correlated (Pearson’s correlation of 0.70, p < 0.0001; Fig 3A). However, several CpGs showed discrepant signal between the two technologies (i.e., CpGs with low β-values associated with high rms and vice versa). To assess the number of probes associated with concordant or discordant methylation levels, we grouped the CpGs into three categories based on β-values (hypomethylated for β < 0.3, hemimethylated for 0.3 ≤ β ≤ 0.7, and hypermethylated for β > 0.7) and examined the corresponding rms values. Given the high representation of islands and CpGs in 5’ regions of genes, which generally remain hypomethylated [16, 32], the majority of the conserved probes fell into the hypomethylated category (Table 3). For the hypomethylated probes, 82% of the corresponding CpG regions also had rms < 0.3 (Table 3). For many of the CpG regions that correspond to the hypomethylated probes, the rms was close to 0, which indicates poor coverage by MBD-seq. For hemimethylated probes, 58% of the corresponding regions had 0.3 ≤ rms ≤ 0.7 and 31% had rms < 0.3. For hypermethylated probes, only 40% of corresponding regions were associated with rms > 0.7, and 54% had 0.3 ≤ rms ≤ 0.7. The corresponding CpG regions for this hypermethylated set tended to have rms close to 0.75. This clustered rms distribution for CpG regions at the lower and upper levels of methylation indicates that the MBD-seq has lower quantitative sensitivity at these regions.
Table 3. Counts of Illumina human MethylationEPIC probes by β-value and concordance with MBD-seq at corresponding CpG regions.
Counts of CpG regions by rms value2 | ||||
---|---|---|---|---|
CpG Category1 | Probe counts1 | rms < 0.3 | 0.3 ≤ rms ≤ 0.7 | rms > 0.7 |
Hypo (β < 0.3) | 7548 | 6198 | 1000 | 350 |
Hemi (0.3 ≤ β ≤ 0.7) | 3159 | 973 | 1827 | 359 |
Hyper(β > 0.7) | 2956 | 171 | 1599 | 1186 |
1Conserved probes on the HumanMethylationEPIC arrays were grouped by β-value. These are counts in each category.
2For each category of probes, the corresponding CpG regions were counted and grouped by CpG density-normalized relative methylation score (rms) to determine concordance between the array and MBD-seq.
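The concordance percentages quoted in the text follow from the rows of Table 3. The sketch below applies the same three-way binning (below 0.3, 0.3 to 0.7 inclusive, above 0.7) and converts each row of counts into rounded percentages; the names are illustrative.

```python
from typing import Dict, Tuple

# Rows of Table 3: counts of CpG regions with rms < 0.3, 0.3-0.7, > 0.7.
TABLE3: Dict[str, Tuple[int, int, int]] = {
    "hypo": (6198, 1000, 350),
    "hemi": (973, 1827, 359),
    "hyper": (171, 1599, 1186),
}


def categorize(value: float, low: float = 0.3, high: float = 0.7) -> str:
    """Assign a beta-value or rms to the hypo, hemi, or hyper category."""
    if value < low:
        return "hypo"
    if value > high:
        return "hyper"
    return "hemi"


def row_percentages(counts: Tuple[int, ...]) -> Tuple[int, ...]:
    """Each count as a rounded percentage of its row total."""
    total = sum(counts)
    return tuple(round(100 * c / total) for c in counts)


PERCENT = {category: row_percentages(counts) for category, counts in TABLE3.items()}
```

The diagonal entries of `PERCENT` (82, 58, and 40) are the concordant fractions for the hypo-, hemi-, and hypermethylated probes, respectively.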
For comparison with a bisulfite-based assay, we obtained RRBS data for mouse liver. We found only 2,548 CpGs in the RRBS data that were also interrogated by the conserved EPIC probes. Using this smaller subset, we compared the array-based β-values with the RRBS-based methylation percent (here represented as fraction methylated; Fig 3B). Overall, there is significant correlation between the two datasets (R = 0.78, p < 0.0001), and this is particularly true for CpGs with RRBS methylation percent strictly between 0 and 100. However, comparison with the RRBS was limited by the poor read coverage for several of the CpGs, which resulted in either 0% or 100% methylation values. For the CpGs that clustered at either 0% or 100%, the average total read count was less than 7. For these 2,548 CpGs, we found a better linear correlation between the β-values and the MBD-seq (R = 0.84, p < 0.0001; Fig 3C). Overall, the significant correlations with both the MBD-seq and RRBS show that the conserved EPIC probes provide a reliable quantification of methylation in mice for the majority of the CpGs. Furthermore, for CpGs that are hypomethylated or hypermethylated, the EPIC technology may have an advantage and provide higher quantitative sensitivity compared to the MBD-seq.
Differential methylation analysis
We applied linear regression to examine differential methylation by age group and strain, and cross-referenced the DMCpGs detected by the EPIC array with DMRs detected by MBD-seq. For the effect of age, no conserved EPIC probe passed a 10% FDR threshold (full results and p-values are provided in S1 Data). However, we note that the probe that detected the most significant effect of age, cg13269407, is among the 353 CpGs that are used to estimate the human epigenetic age [8]. This CpG is hemimethylated (average β-value of 0.55) and associated with a ~2.4-fold decline in methylation between young and old age (uncorrected p-value = 1.8 × 10⁻⁵). In the MBD-seq, the corresponding region is classified as hypomethylated with rms = 0 for most of the samples, and no reliable statistical test could be carried out for this region due to the small number of mapped reads. We then performed a reverse comparison to identify age-dependent DMRs (age-DMRs) in the MBD-seq data and evaluated replication by the EPIC probes. At the same FDR threshold of 10%, the MBD-seq detected seven age-DMRs. These strong age-DMRs have rms between 0.3 and 0.7 and are associated with an increase in methylation with age. Most occur in CGIs that have been reported previously [21]. Of these seven age-DMRs, six of the corresponding EPIC probes showed the same age-dependent increase in methylation; five met the nominal p-value cutoff of 0.05 and one (cg06945399) was marginal at p = 0.057 (Table 4).
Table 4. Age-dependent differentially methylated CpGs/regions detected by conserved Illumina human MethylationEPIC probes and by MBD-seq.
EPIC1 | MBD-seq1 | ||||||
---|---|---|---|---|---|---|---|
ProbeID | Gene2 | Region2 | Position (mm10) 3 | Coef. | Age (P) | logFC | Age (P) |
cg08949408 | C1QL3 | Body; Island | chr2:13.01 | 0.32 | 0.001 | 3.3 | 1.3E-10 |
cg10444382 | RFX4 | Body; Island | chr10:84.76 | 0.24 | 9.4E-04 | 2.9 | 2.5E-08 |
cg22384902 | LRRC4; SND1 | TSS1500; Island | chr6:28.83 | 0.22 | 0.009 | 2.0 | 2.0E-06 |
cg06945399 | LRRC4; SND1 | TSS200; Island | chr6:28.83 | 0.18 | 0.057 | 1.5 | 2.2E-05 |
cg23398076 | MEIS1 | Body; Shelf | chr11:19.02 | 0.13 | 0.007 | 1.5 | 2.4E-05 |
cg05393688 | TSC22D1 | Body; Shore | chr14:76.51 | 0.17 | 0.005 | 1.5 | 2.8E-05 |
cg20563498 | USP35 | Body; Shelf | chr7:97.32 | -0.02 | 0.27 | 1.1 | 3.2E-05 |
1These are age-dependent differentially methylated CpG regions discovered in the MBD-seq at an FDR of 10%; replicated for the corresponding CpG in the EPIC microarray at an uncorrected p-value cutoff of 0.05. Coef. is the linear regression coefficient (i.e., change in methylation β-value from young to old). LogFC is log2 fold change in methylation from young to old.
2CpG location in relation to gene features and CpG region based on probe annotations for the human methylation microarray; gene feature annotations are the same for the corresponding regions in the mouse genome.
3Chromosome and Megabase coordinate based on mm10 mouse reference genome
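The replication check for Table 4 can be expressed as a simple filter over the EPIC columns. The sketch below copies the coefficients and p-values from the table and counts the probes that show an age-dependent increase, with and without the nominal cutoff; the function name is hypothetical, and a strict inequality (p < 0.05) is assumed.

```python
from typing import List, Tuple

# (ProbeID, EPIC coefficient, EPIC age p-value) copied from Table 4.
TABLE4: List[Tuple[str, float, float]] = [
    ("cg08949408", 0.32, 0.001),
    ("cg10444382", 0.24, 9.4e-04),
    ("cg22384902", 0.22, 0.009),
    ("cg06945399", 0.18, 0.057),
    ("cg23398076", 0.13, 0.007),
    ("cg05393688", 0.17, 0.005),
    ("cg20563498", -0.02, 0.27),
]


def replicated(rows: List[Tuple[str, float, float]], alpha: float = 0.05) -> List[str]:
    """Probes with an increase in methylation with age and p below alpha."""
    return [probe for probe, coef, p in rows if coef > 0 and p < alpha]


same_direction = [probe for probe, coef, _ in TABLE4 if coef > 0]
nominal = replicated(TABLE4)
```

Six of the seven probes have a positive coefficient; under a strict cutoff of 0.05, five of them qualify, and cg06945399 (p = 0.057) falls just above it.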
For the strain effect, 219 conserved EPIC probes detected a significant difference in methylation between B6 and D2 at an FDR threshold of 10% (strain-DMCpGs). Close to 80% of these CpGs (175 out of 219) are associated with higher methylation in D2 relative to B6. In the MBD-seq data, only 29 of the 219 corresponding regions replicated the strain effect at an uncorrected p-value cutoff of 0.05 (Table 5). Of these, 9 were associated with higher methylation in B6, and 20 were associated with higher methylation in D2. In the reverse comparison, we identified only 37 strain-dependent DMRs (strain-DMRs) in the MBD-seq data at an FDR cutoff of 10%. Consistent with the EPIC data, the majority of these regions (21 of the 37) showed higher methylation in D2 relative to B6. Of these, 16 strain differences were replicated at the corresponding CpG in the EPIC data (6 with higher methylation in B6 and 10 with higher methylation in D2) (Table 5).
Table 5. Strain-dependent differentially methylated CpGs/regions detected by both Illumina human MethylationEPIC probes and by MBD-seq.
EPIC1 | MBD-seq1 | ||||||
---|---|---|---|---|---|---|---|
ProbeID | Gene3 | Region3 | Position (mm10)4 | Coef.2 | Strain (P) | logFC2 | Strain (P) |
Differentially methylated CpGs detected by EPIC probe at FDR 10%; replicated by MBD-seq | |||||||
cg21064315 | SZT2 | 3'UTR; Shore | chr4:118.36 | -0.82 | 5.5E-09 | -2.0 | 1.7E-04 |
cg14945867 | CNIH | 1stExon; Island | chr14:46.79 | 0.27 | 1.3E-06 | 6.0 | 1.2E-09 |
cg04546815 | KANK4 | Body | chr4:98.78 | 0.34 | 1.6E-06 | 1.7 | 4.4E-04 |
cg10277781 | CNIH | 1stExon; Island | chr14:46.79 | 0.35 | 1.9E-06 | 6.0 | 1.2E-09 |
cg00049718 | CSDE1 | 5'UTR | chr3:103.02 | 0.40 | 2.6E-06 | 6.8 | 9.8E-15 |
cg07211292 | C20orf160 | 3'UTR; Island | chr2:153.08 | -0.46 | 5.0E-06 | -1.5 | 1.4E-05 |
cg24255125 | GRIK4 | Body; Island | chr9:42.52 | -0.36 | 7.8E-06 | -3.0 | 5.9E-09 |
cg03517030 | MTCH2 | 1stExon; Island | chr2:90.85 | 0.35 | 1.6E-05 | 6.7 | 2.3E-14 |
cg05781968 | WNT5A | Body; Island | chr14:28.51 | 0.31 | 4.4E-05 | 2.3 | 1.0E-05 |
cg04154281 | UBTF | Body; Shore | chr11:102.31 | 0.17 | 6.5E-05 | 0.7 | 0.03 |
cg06861375 | ZNF697 | Body; Island | chr3:98.43 | 0.36 | 6.7E-05 | 4.5 | 2.8E-04 |
cg24959134 | - | - | chr10:92.44 | -0.33 | 9.4E-05 | -2.4 | 0.01 |
cg06552810 | - | - | chr2:106.19 | 0.26 | 1.1E-04 | 2.9 | 0.002 |
cg01663821 | - | Shore | chr3:98.94 | 0.19 | 1.3E-04 | 0.9 | 0.02 |
cg00597112 | - | - | chr11:109.01 | 0.21 | 1.4E-04 | 0.5 | 0.002 |
cg26857408 | UBTF | Body; Shore | chr11:102.31 | 0.24 | 2.1E-04 | 0.7 | 0.03 |
cg15172734 | SLMAP | 5'UTR; Shore | chr14:26.53 | -0.11 | 3.4E-04 | -2.5 | 0.01 |
cg09990537 | WNT5A | 5'UTR; Shore | chr14:28.51 | 0.17 | 3.4E-04 | 1.0 | 0.004 |
cg12849734 | - | - | chr2:157.71 | 0.14 | 4.4E-04 | 0.9 | 0.01 |
cg21746387 | NDUFA4L2 | TSS1500; Shore | chr10:127.51 | -0.17 | 5.5E-04 | -3.4 | 0.001 |
cg11382417 | - | - | chr2:96.32 | -0.21 | 6.0E-04 | -4.7 | 1.3E-07 |
cg02865068 | - | Shore | chr2:105.66 | 0.11 | 9.7E-04 | 2.9 | 0.04 |
cg14275842 | CHRNE | Body; Island | chr11:70.62 | 0.18 | 0.001 | 1.0 | 0.005 |
cg02159996 | GABRR1 | 5'UTR | chr4:33.13 | 0.13 | 0.001 | 1.2 | 2.5E-04 |
cg00920372 | - | - | chr19:45.33 | -0.08 | 0.001 | -1.5 | 8.8E-04 |
cg03422015 | ERC1 | Body | chr6:119.69 | 0.04 | 0.001 | 1.1 | 0.02 |
cg04340318 | - | - | chr4:86.04 | 0.16 | 0.001 | 2.3 | 0.001 |
cg14465355 | DYNC1H1 | Body; Shore | chr12:110.64 | 0.06 | 0.001 | 0.6 | 0.02 |
cg15002641 | SOX13 | Body | chr1:133.39 | -0.10 | 0.001 | -1.0 | 0.02 |
Differentially methylated regions detected by MBD-seq at FDR 10%; replicated by EPIC | |||||||
cg05362127 | WNT5A | TSS200; Island | chr14:28.51 | 0.33 | 0.002 | 2.3 | 9.4E-06 |
cg24142850 | - | - | chr8:92.55 | -0.09 | 0.005 | -2.9 | 9.4E-05 |
cg15585318 | WNT5A | Body; Island | chr14:28.51 | 0.22 | 0.006 | 1.8 | 2.1E-06 |
cg09595163 | WNT5A | Body; Island | chr14:28.51 | 0.18 | 0.006 | 2.3 | 1.2E-05 |
cg13868216 | BAIAP2L2 | Body; Island | chr15:79.26 | 0.11 | 0.01 | 1.6 | 1.8E-04 |
cg09972454 | PDXDC1 | Body; Shore | chr4:147.94 | -0.06 | 0.01 | -2.9 | 1.5E-06 |
cg18120446 | - | Island | chr5:41.75 | 0.01 | 0.02 | -2.2 | 2.7E-08 |
1These are strain-dependent differentially methylated CpGs (EPIC microarray) and CpG regions (MBD-seq) based on a false discovery rate (FDR) cutoff of 10% and replication at an uncorrected p-value threshold of 0.05.
2Coef. is the linear regression coefficient (i.e., difference in methylation relative to C57BL/6J; negative is lower methylation in DBA/2J; and positive is higher methylation in DBA/2J compared to C57BL/6J). LogFC is log2 fold difference in methylation (i.e., difference in methylation relative to C57BL/6J; negative is lower methylation in DBA/2J; and positive is higher methylation in DBA/2J compared to C57BL/6J).
3CpG location in relation to gene features and CpG region based on probe annotations for the human methylation microarray. For most conserved regions, mouse annotations are analogous to humans.
4Chromosome and Megabase coordinate based on mm10 mouse reference genome
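The direction tallies for the first block of Table 5 (strain-DMCpGs detected by EPIC and replicated by MBD-seq) can be verified from the signs of the two effect columns. The sketch below copies the 29 coefficient and logFC pairs in table order; a positive value indicates higher methylation in D2. The function name is illustrative.

```python
from typing import List, Tuple

# (EPIC coefficient, MBD-seq logFC) for the 29 rows of the first block of Table 5.
TABLE5_EPIC_DISCOVERY: List[Tuple[float, float]] = [
    (-0.82, -2.0), (0.27, 6.0), (0.34, 1.7), (0.35, 6.0), (0.40, 6.8),
    (-0.46, -1.5), (-0.36, -3.0), (0.35, 6.7), (0.31, 2.3), (0.17, 0.7),
    (0.36, 4.5), (-0.33, -2.4), (0.26, 2.9), (0.19, 0.9), (0.21, 0.5),
    (0.24, 0.7), (-0.11, -2.5), (0.17, 1.0), (0.14, 0.9), (-0.17, -3.4),
    (-0.21, -4.7), (0.11, 2.9), (0.18, 1.0), (0.13, 1.2), (-0.08, -1.5),
    (0.04, 1.1), (0.16, 2.3), (0.06, 0.6), (-0.10, -1.0),
]


def direction_counts(rows: List[Tuple[float, float]]) -> Tuple[int, int, int]:
    """Return (higher in B6, higher in D2, rows where both assays agree in sign)."""
    higher_b6 = sum(1 for coef, _ in rows if coef < 0)
    higher_d2 = sum(1 for coef, _ in rows if coef > 0)
    same_sign = sum(1 for coef, logfc in rows if coef * logfc > 0)
    return higher_b6, higher_d2, same_sign


HIGHER_B6, HIGHER_D2, SAME_SIGN = direction_counts(TABLE5_EPIC_DISCOVERY)
PERCENT_D2_ALL = round(100 * 175 / 219)  # share of all 219 strain-DMCpGs
```

This gives 9 regions with higher methylation in B6 and 20 with higher methylation in D2, with the EPIC and MBD-seq effects agreeing in sign for all 29 rows; 175 of 219 corresponds to roughly 80%.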
Discussion
We used the recently released Illumina EPIC microarray to assay DNA methylation at conserved CpGs in the mouse genome. We evaluated both the qualitative features and the quantitative performance of the array and compared it with MBD-seq data generated from the same mouse DNA samples. Such a cross-species approach has previously been used to examine gene expression and perform comparative genomics studies [33–36]. The two older versions of the Illumina methylation microarrays, the Infinium HumanMethylation 27K (HM27) and HumanMethylation 450K (HM450), have been carefully evaluated for use in mice [17]. The number of probes that map to the mouse genome can vary somewhat depending on the alignment algorithm. In the work by Wong et al. [17], alignment to the bisulfite-converted mouse genome resulted in the highest number of conserved probes. Using a stringent parameter of 100% sequence identity to the bisulfite genome, Wong et al. identified a total of 1,308 (4.7% of total) uniquely aligned probes in the 27K array, and 13,715 (2.8% of total) uniquely aligned probes in the 450K array that can be used to interrogate conserved CpGs in the mouse. In our present work, we performed alignment in non-bisulfite space. While we required unique alignment, we tolerated up to two non-matching base pairs and added detection confidence as another parameter to identify probes that can be used for reliable quantitative assays. With these parameters, we identified 13,665 probes in the EPIC array (1.6% of total probes) that aligned uniquely to the mouse genome and were associated with high confidence in signal detection. While alignment to the bisulfite-converted genome may have yielded a higher number of probes for measuring DNA methylation in mouse, the degenerate nature of bisulfite-converted sequence could also capture probes with off-target alignments. Indeed, a recent study did find a higher number, with uniquely aligned probes ranging from 4,984 to 19,420 depending on the mapping stringency [18]. When we compared our list of probes to the 19,420 mouse EPIC probes reported by Needhamsen et al. [18], we found a high overlap of 77%. For our purposes, the probes we have identified here provide a representative subset with high confidence in sequence specificity and conservation in mouse, and we have used these to assess quantitative performance in mouse samples and utility in detecting methylation variation.
In the set of 13,665 conserved probes, 9,429 (69%) were CpG loci carried over from the HM450 array, and 7,483 of these were also in the list of conserved HM450 probes reported by Wong et al. [17]. Only 4,234 of the 13,665 probes (31%) were new content unique to the EPIC array (i.e., not ported over from the HM450). A similar proportion of probes in the set reported by Needhamsen et al. was also ported over from the HM450 (13,005 out of 19,420) [18]. The low proportion of EPIC-specific probes in the conserved set is likely due to the design of the EPIC array. In the case of the HM450, the emphasis was on CGIs and flanking regions (i.e., shores and shelves) [11, 37]. These CGIs generally overlap proximal regulatory sites and are highly conserved across mammalian species, with humans and mice having a very similar complement of CGIs [16, 32, 38]. In contrast to the HM450, the emphasis of the EPIC array was on enhancers and CpGs outside of islands, and these are sequences that generally have lower conservation across mammalian species [13, 39]. Based on Illumina probe annotations, only 22% of the newly added content unique to the EPIC covers CGI-associated regions. Out of the 4,234 conserved probes we identified that are unique to EPIC, 1,767 target CGI-associated regions and 1,573 target 5’ regions such as TSS, 5’ UTR and exon 1. This is consistent with the overall higher enrichment in proximal gene regulatory sites among the conserved probes. Our observations show that despite the higher probe content in the EPIC compared to the older HM450, the number of probes with utility in cross-species studies is not proportionally increased.
In terms of quantitative variation in methylation, CGIs and promoter region CpGs show significant population variation [40]. However, compared to intergenic CpGs, the extent of inter-individual variability in methylation is reported to be much lower in these conserved sites [41, 42]. Hence, an obvious limitation in using the conserved EPIC probes is that we attain only a narrow perspective of the mouse methylome and we may be sampling the portion of CpGs that shows the least quantitative variability in a population. Nonetheless, CpGs in regulatory regions and CGIs play crucial roles in development and cell differentiation, and are implicated in tumor development and aging [16, 32, 38, 43, 44]. While narrow in perspective, the conserved probes likely represent a subset of CpGs with high functional relevance and application in cross-species study of DNA methylation.
We compared the methylation signal detected by the EPIC probes to two datasets generated by DNA sequencing: MBD-seq data measured in the same samples as the EPIC, and a publicly available mouse liver RRBS dataset. While the sequencing approach theoretically provides more comprehensive coverage than microarrays, both RRBS and MBD-seq come with their own characteristic biases and have limitations in the types of CpGs that are most effectively interrogated [26, 45, 46]. The RRBS data we used in this comparison had poor read coverage for a large proportion of CpGs. Nonetheless, we observed an overall significant correlation between the β-values and the methylation percent measured by RRBS. Unlike the bisulfite-based RRBS and EPIC, the MBD-seq relies on affinity capture of DNA fragments by the methyl-CpG binding domain protein [23–25]. The methylation level is indirectly estimated from the counts of sequenced reads that map to a region, and the MBD-seq provides information on the methylation level of correlated CpGs in that region rather than of a single CpG [47–49]. We found a stronger concordance between the EPIC and MBD-seq, likely because these were generated from matched samples. However, for CpGs that are hypomethylated or hypermethylated, the rms for the corresponding regions showed a more clustered distribution, indicating that MBD-seq has limited quantitative sensitivity and limited capacity to discern quantitative variation at such CpG regions. Our observations agree with a previous study that compared HM450 and MBD-seq data generated using the same commercial kit we used [50].
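The platform comparison above rests on two simple operations: a Pearson correlation between paired methylation measurements, and a grouping of CpGs by β-value into hypomethylated (β < 0.3), intermediate, and hypermethylated (β > 0.7) classes. The sketch below illustrates both using only the Python standard library. The function names are hypothetical and it is not the analysis code used here.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def methylation_class(beta, low=0.3, high=0.7):
    """Classify a beta-value using the hypo/hyper cutoffs quoted in the text."""
    if beta < low:
        return "hypo"
    if beta > high:
        return "hyper"
    return "intermediate"


def correlation_by_class(betas, region_scores):
    """Pearson r between beta-values and region scores within each class.

    Classes with fewer than two CpGs, or with no variation, are reported as None.
    """
    groups = {"hypo": ([], []), "intermediate": ([], []), "hyper": ([], [])}
    for beta, score in zip(betas, region_scores):
        xs, ys = groups[methylation_class(beta)]
        xs.append(beta)
        ys.append(score)
    result = {}
    for name, (xs, ys) in groups.items():
        try:
            result[name] = pearson_r(xs, ys)
        except ValueError:
            result[name] = None
    return result
```

A weaker within-class correlation for the "hypo" and "hyper" groups than for the "intermediate" group would be one way to express the clustered rms distribution described above.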
For a direct comparison between the EPIC probes and MBD-seq, we applied the same regression model and crosschecked the DMCpGs and DMRs detected by the two technologies. While we expected a higher quantitative sensitivity for the EPIC probes, the EPIC probes did not detect significant differential methylation between age groups at an FDR threshold of 10%. However, the most significant probe, cg13269407, is one of the 353 clock CpGs that are used to estimate the human DNA methylation age [8]. Consistent with the negative correlation with age in humans, this age-informative CpG was associated with a ~2.4-fold reduction in methylation in the old mice relative to the young mice. Aside from cg13269407, only 10 other human clock CpG probes were in the conserved set, and none of these were associated with age in mice. Overall, the effect of age was weak when we considered individual CpGs. When we examined the corresponding CpG regions, the MBD-seq was more effective at detecting age-dependent methylation. At an FDR cutoff of 10%, we identified seven CpG regions that are classified as age-DMRs. These age-DMRs have been previously reported and show increases in methylation with age in mice [21]. For these age-DMRs identified by MBD-seq, we then checked whether the EPIC probes could verify the age effect. For this cross-verification, we used a less stringent statistical threshold of 0.05 for uncorrected p-values and found that six of the targeted CpGs were also associated with significant age-dependent increases in β-values. Our observations suggest that age-dependent changes in methylation at these conserved sites may be more pronounced if we consider the correlated change of neighboring CpGs rather than the methylation status of a single CpG. Despite the low overall quantitative sensitivity, the MBD-seq provides a complementary approach that may perform better for detecting methylation changes in regions harboring multiple correlated CpGs.
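The FDR thresholds used above refer to the Benjamini-Hochberg procedure [30]. As a minimal sketch of how a list of uncorrected p-values is converted to adjusted values and compared against a 10% cutoff, the following pure-Python functions may help. The function names are hypothetical and this is not the code used for the analysis.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down to the smallest so that the
    # adjusted values never decrease as the raw p-values increase.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def significant_at_fdr(p_values, fdr=0.10):
    """Flag each test whose adjusted p-value is at or below the FDR cutoff."""
    return [q <= fdr for q in benjamini_hochberg(p_values)]


def rank_one_term(p_value, n_tests):
    """Upper bound on the adjusted p-value of the top-ranked test (p * m)."""
    return min(1.0, p_value * n_tests)
```

For the top-ranked probe, the adjusted value is bounded above by p × m. Assuming all 13,665 conserved probes were tested, `rank_one_term(1.8e-5, 13665)` gives about 0.25, which is above the 0.10 cutoff. The adjusted value could only be lower if the next-ranked probes had sufficiently small p-values, and the reported absence of significant age probes indicates that they did not.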
DNA methylation can vary substantially between mouse strains, and a large fraction of this is likely due to underlying sequence differences between strains [21, 51, 52]. Strain variation in methylation has been shown to associate with complex phenotypes in mice such as insulin resistance, adiposity, and blood cell counts [53]. In our analysis, we detected 219 CpGs (i.e., 1.6% of the 13,665 interrogated CpGs) with a significant difference between strains at an FDR cutoff of 10%. A large majority (175 out of 219 CpGs) was associated with higher methylation in D2 compared to B6. While the overall lower methylation in B6 is intriguing, such variation between strains must be cautiously interpreted. It is well known that SNPs in probe sequences can have a strong confounding effect. This is particularly pernicious for mouse-specific microarrays in which probe sequences are usually based on the B6 mouse reference, and as a result, there is more efficient hybridization for B6-derived samples, which results in a positive bias for this canonical mouse strain [54–56]. In the present work, since the EPIC array is based on the human sequence, we do not expect a systematic bias for one strain over the other. For replication, we referred to the MBD-seq data, and only 29 out of the 219 corresponding CpG regions had consistent differential methylation between B6 and D2 in the MBD-seq.
Unlike the human array, which should not favor one mouse strain over another, the MBD-seq data are more vulnerable to technical artifacts caused by sequence differences. As is the general practice, we aligned the MBD-seq reads to the mouse reference genome. This means the alignment will be more efficient for sequences from B6, while sequences from D2 will have more mismatches. Since methylation is quantified from the relative number of aligned reads, this may result in a systematic negative bias for D2, and regions with sequence differences will tend to show lower methylation due to poorer alignment. As a result, a higher fraction of strain-DMRs will have lower methylation in D2 compared to B6 [21]. If these conserved CpGs do have higher methylation in D2 compared to B6, then the negative bias will lessen the quantitative difference between the strains. This may explain why the effect of strain is less pronounced in the MBD-seq data. In the MBD-seq, there were only 37 DMRs between B6 and D2 at an FDR threshold of 10%, and the EPIC probes replicated 16 of these. Out of the 37 strain-DMRs, the majority (21 of the 37) were associated with higher methylation in D2. Both the EPIC and MBD-seq therefore show an overall lower methylation profile in B6 compared to D2 that warrants further investigation and verification. Such strain differences in overall methylation have been previously reported for A/J and WSB/EiJ, with the A/J strain exhibiting higher methylation of CGIs in normal liver tissue compared to WSB/EiJ. This difference in the methylome was suggested to contribute to the differential susceptibility to nonalcoholic fatty liver disease that characterizes the two strains [51]. In the case of B6 and D2, the two strains are highly divergent in a number of complex phenotypes ranging from behavioral and physiological to aging traits.
The panel of recombinant inbred progeny derived from B6 and D2 (the BXD panel) has been used extensively in genetic research [57–61]. If there is indeed a distinct profile in DNA methylation between B6 and D2, then it will be of interest to evaluate if it segregates in the BXDs and how the methylome contributes to some of the phenotypic differences. The BXD panel could be an extremely rich and as yet untapped resource for methylome-wide analysis of complex traits that can then be integrated with the extensive systems genetics work that has already been done with this mouse family [62, 63]. No doubt, large-scale analysis of genome-wide DNA methylation in mouse genetic reference panels will be greatly accelerated with the development of a mouse version of the Infinium methylation arrays. And as is the case with other types of arrays, it will be crucial that the probes are designed against a more diverse panel of strains so that investigators can derive a more unbiased readout of methylation [64].
To conclude, we have catalogued a small subset of EPIC probes that target conserved CpGs in the mouse genome and that provide reliable quantification of DNA methylation in mouse samples. While detection of age-dependent methylation was weaker for the EPIC probes compared to MBD-seq, we have identified significant strain variation in methylation at the conserved CpGs. Our results indicate lower methylation for B6 compared to D2 at sites that have a significant strain effect. It is unclear how much of the strain variation results from underlying sequence differences between B6 and D2, and this strain-specific profile needs to be further evaluated and verified.
Supporting information
Acknowledgments
We are very grateful to Robert W. Williams for providing advice and access to the aging colony.
Data Availability
The full mouse data is available from NCBI NIH Gene Expression Omnibus (GEO accession ID GSE110600).
Funding Statement
This work was supported by University of Tennessee Faculty Award UTCOM-2013KM and National Institute on Aging grant number R01AG043930.
References
- 1. Laffita-Mesa JM, Bauer PO, Kouri V, Pena Serrano L, Roskams J, Almaguer Gotay D, et al. Epigenetics DNA methylation in the core ataxin-2 gene promoter: novel physiological and pathological implications. Hum Genet. 2012;131(4):625–38. doi: 10.1007/s00439-011-1101-y.
- 2. Simeone P, Alberti S. Epigenetic heredity of human height. Physiol Rep. 2014;2(6). doi: 10.14814/phy2.12047; PubMed Central PMCID: PMC4208652.
- 3. Egger G, Liang G, Aparicio A, Jones PA. Epigenetics in human disease and prospects for epigenetic therapy. Nature. 2004;429(6990):457–63. doi: 10.1038/nature02625.
- 4. Portela A, Esteller M. Epigenetic modifications and human disease. Nature biotechnology. 2010;28(10):1057–68. doi: 10.1038/nbt.1685.
- 5. Toperoff G, Aran D, Kark JD, Rosenberg M, Dubnikov T, Nissan B, et al. Genome-wide survey reveals predisposing diabetes type 2-related DNA methylation variations in human peripheral blood. Human molecular genetics. 2012;21(2):371–83. doi: 10.1093/hmg/ddr472; PubMed Central PMCID: PMC3276288.
- 6. Breton CV, Byun HM, Wenten M, Pan F, Yang A, Gilliland FD. Prenatal tobacco smoke exposure affects global and gene-specific DNA methylation. Am J Respir Crit Care Med. 2009;180(5):462–7. doi: 10.1164/rccm.200901-0135OC; PubMed Central PMCID: PMC2742762.
- 7. Baccarelli A, Wright RO, Bollati V, Tarantini L, Litonjua AA, Suh HH, et al. Rapid DNA methylation changes after exposure to traffic particles. Am J Respir Crit Care Med. 2009;179(7):572–8. doi: 10.1164/rccm.200807-1097OC; PubMed Central PMCID: PMC2720123.
- 8. Horvath S. DNA methylation age of human tissues and cell types. Genome biology. 2013;14(10):R115. doi: 10.1186/gb-2013-14-10-r115; PubMed Central PMCID: PMC4015143.
- 9. Swartz JR, Hariri AR, Williamson DE. An epigenetic mechanism links socioeconomic status to changes in depression-related brain function in high-risk adolescents. Mol Psychiatry. 2017;22(2):209–14. doi: 10.1038/mp.2016.82; PubMed Central PMCID: PMC5122474.
- 10. Roth TL, Lubin FD, Funk AJ, Sweatt JD. Lasting epigenetic influence of early-life adversity on the BDNF gene. Biol Psychiatry. 2009;65(9):760–9. doi: 10.1016/j.biopsych.2008.11.028; PubMed Central PMCID: PMC3056389.
- 11. Bibikova M, Barnes B, Tsan C, Ho V, Klotzle B, Le JM, et al. High density DNA methylation array with single CpG site resolution. Genomics. 2011;98(4):288–95. doi: 10.1016/j.ygeno.2011.07.007.
- 12. Bibikova M, Le J, Barnes B, Saedinia-Melnyk S, Zhou LX, Shen R, et al. Genome-wide DNA methylation profiling using Infinium (R) assay. Epigenomics. 2009;1(1):177–200. doi: 10.2217/epi.09.14; PubMed PMID: WOS:000278041000021.
- 13. Moran S, Arribas C, Esteller M. Validation of a DNA methylation microarray for 850,000 CpG sites of the human genome enriched in enhancer sequences. Epigenomics. 2016;8(3):389–99. doi: 10.2217/epi.15.114; PubMed Central PMCID: PMC4864062.
- 14. Pidsley R, Zotenko E, Peters TJ, Lawrence MG, Risbridger GP, Molloy P, et al. Critical evaluation of the Illumina MethylationEPIC BeadChip microarray for whole-genome DNA methylation profiling. Genome biology. 2016;17(1):208. doi: 10.1186/s13059-016-1066-1; PubMed Central PMCID: PMC5055731.
- 15. Stubbs TM, Bonder MJ, Stark AK, Krueger F, Team BIAC, von Meyenn F, et al. Multi-tissue DNA methylation age predictor in mouse. Genome biology. 2017;18(1):68. doi: 10.1186/s13059-017-1203-5; PubMed Central PMCID: PMC5389178.
- 16. Deaton AM, Bird A. CpG islands and the regulation of transcription. Genes & development. 2011;25(10):1010–22. doi: 10.1101/gad.2037511; PubMed Central PMCID: PMC3093116.
- 17. Wong NC, Ng J, Hall NE, Lunke S, Salmanidis M, Brumatti G, et al. Exploring the utility of human DNA methylation arrays for profiling mouse genomic DNA. Genomics. 2013;102(1):38–46. doi: 10.1016/j.ygeno.2013.04.014.
- 18. Needhamsen M, Ewing E, Lund H, Gomez-Cabrero D, Harris RA, Kular L, et al. Usability of human Infinium MethylationEPIC BeadChip for mouse DNA methylation studies. BMC bioinformatics. 2017;18(1):486. doi: 10.1186/s12859-017-1870-y; PubMed Central PMCID: PMC5688710.
- 19. Gardiner-Garden M, Frommer M. CpG islands in vertebrate genomes. J Mol Biol. 1987;196(2):261–82.
- 20. Krzywinski M, Schein J, Birol I, Connors J, Gascoyne R, Horsman D, et al. Circos: an information aesthetic for comparative genomics. Genome research. 2009;19(9):1639–45. doi: 10.1101/gr.092759.109; PubMed Central PMCID: PMC2752132.
- 21. Mozhui K, Pandey AK. Conserved effect of aging on DNA methylation and association with EZH2 polycomb protein in mice and humans. Mechanisms of ageing and development. 2017;162:27–37. doi: 10.1016/j.mad.2017.02.006.
- 22. Aryee MJ, Jaffe AE, Corrada-Bravo H, Ladd-Acosta C, Feinberg AP, Hansen KD, et al. Minfi: a flexible and comprehensive Bioconductor package for the analysis of Infinium DNA methylation microarrays. Bioinformatics. 2014;30(10):1363–9. doi: 10.1093/bioinformatics/btu049; PubMed Central PMCID: PMC4016708.
- 23. De Meyer T, Mampaey E, Vlemmix M, Denil S, Trooskens G, Renard JP, et al. Quality evaluation of methyl binding domain based kits for enrichment DNA-methylation sequencing. PloS one. 2013;8(3):e59068. doi: 10.1371/journal.pone.0059068; PubMed Central PMCID: PMC3598902.
- 24. Aberg KA, Xie L, Chan RF, Zhao M, Pandey AK, Kumar G, et al. Evaluation of Methyl-Binding Domain Based Enrichment Approaches Revisited. PloS one. 2015;10(7):e0132205. doi: 10.1371/journal.pone.0132205; PubMed Central PMCID: PMC4503759.
- 25. Aberg KA, McClay JL, Nerella S, Xie LY, Clark SL, Hudson AD, et al. MBD-seq as a cost-effective approach for methylome-wide association studies: demonstration in 1500 case-control samples. Epigenomics. 2012;4(6):605–21. doi: 10.2217/epi.12.59; PubMed Central PMCID: PMC3923085.
- 26. Bock C, Tomazou EM, Brinkman AB, Muller F, Simmer F, Gu H, et al. Quantitative comparison of genome-wide DNA methylation mapping technologies. Nature biotechnology. 2010;28(10):1106–14. doi: 10.1038/nbt.1681; PubMed Central PMCID: PMC3066564.
- 27. Lienhard M, Grimm C, Morkel M, Herwig R, Chavez L. MEDIPS: genome-wide differential coverage analysis of sequencing data derived from DNA enrichment experiments. Bioinformatics. 2014;30(2):284–6. doi: 10.1093/bioinformatics/btt650; PubMed Central PMCID: PMC3892689.
- 28. Chavez L, Jozefczuk J, Grimm C, Dietrich J, Timmermann B, Lehrach H, et al. Computational analysis of genome-wide DNA methylation during the differentiation of human embryonic stem cells along the endodermal lineage. Genome research. 2010;20(10):1441–50. doi: 10.1101/gr.110114.110; PubMed Central PMCID: PMC2945193.
- 29. Robinson MD, McCarthy DJ, Smyth GK. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics. 2010;26(1):139–40. doi: 10.1093/bioinformatics/btp616; PubMed Central PMCID: PMC2796818.
- 30. Benjamini Y, Hochberg Y. Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing. Journal of the Royal Statistical Society Series B (Methodological). 1995;57(1):289–300.
- 31. Benjamini Y, Drai D, Elmer G, Kafkafi N, Golani I. Controlling the false discovery rate in behavior genetics research. Behav Brain Res. 2001;125(1–2):279–84.
- 32. Illingworth RS, Gruenewald-Schneider U, Webb S, Kerr AR, James KD, Turner DJ, et al. Orphan CpG islands identify numerous conserved promoters in the mammalian genome. PLoS genetics. 2010;6(9):e1001134. doi: 10.1371/journal.pgen.1001134; PubMed Central PMCID: PMC2944787.
- 33. Adjaye J, Herwig R, Herrmann D, Wruck W, Benkahla A, Brink TC, et al. Cross-species hybridisation of human and bovine orthologous genes on high density cDNA microarrays. BMC Genomics. 2004;5:83. doi: 10.1186/1471-2164-5-83; PubMed Central PMCID: PMC535340.
- 34. Bar-Or C, Czosnek H, Koltai H. Cross-species microarray hybridizations: a developing tool for studying species diversity. Trends Genet. 2007;23(4):200–7. doi: 10.1016/j.tig.2007.02.003.
- 35. Oshlack A, Chabot AE, Smyth GK, Gilad Y. Using DNA microarrays to study gene expression in closely related species. Bioinformatics. 2007;23(10):1235–42. doi: 10.1093/bioinformatics/btm111.
- 36. Lu Y, Huggins P, Bar-Joseph Z. Cross species analysis of microarray expression data. Bioinformatics. 2009;25(12):1476–83. doi: 10.1093/bioinformatics/btp247; PubMed Central PMCID: PMC2732912.
- 37. Sandoval J, Heyn H, Moran S, Serra-Musach J, Pujana MA, Bibikova M, et al. Validation of a DNA methylation microarray for 450,000 CpG sites in the human genome. Epigenetics: official journal of the DNA Methylation Society. 2011;6(6):692–702.
- 38. Illingworth RS, Bird AP. CpG islands – 'a rough guide'. FEBS letters. 2009;583(11):1713–20. doi: 10.1016/j.febslet.2009.04.012.
- 39. Villar D, Berthelot C, Aldridge S, Rayner TF, Lukk M, Pignatelli M, et al. Enhancer evolution across 20 mammalian species. Cell. 2015;160(3):554–66. doi: 10.1016/j.cell.2015.01.006; PubMed Central PMCID: PMC4313353.
- 40. Heyn H, Moran S, Hernando-Herraez I, Sayols S, Gomez A, Sandoval J, et al. DNA methylation contributes to natural human variation. Genome research. 2013;23(9):1363–72. doi: 10.1101/gr.154187.112; PubMed Central PMCID: PMC3759714.
- 41. Wagner JR, Busche S, Ge B, Kwan T, Pastinen T, Blanchette M. The relationship between DNA methylation, genetic and expression inter-individual variation in untransformed human fibroblasts. Genome biology. 2014;15(2):R37. doi: 10.1186/gb-2014-15-2-r37; PubMed Central PMCID: PMC4053980.
- 42. Gervin K, Hammero M, Akselsen HE, Moe R, Nygard H, Brandt I, et al. Extensive variation and low heritability of DNA methylation identified in a twin study. Genome research. 2011;21(11):1813–21. doi: 10.1101/gr.119685.110; PubMed Central PMCID: PMC3205566.
- 43. Teschendorff AE, Menon U, Gentry-Maharaj A, Ramus SJ, Weisenberger DJ, Shen H, et al. Age-dependent DNA methylation of genes that are suppressed in stem cells is a hallmark of cancer. Genome research. 2010;20(4):440–6. doi: 10.1101/gr.103606.109; PubMed Central PMCID: PMC2847747.
- 44. Heyn H, Li N, Ferreira HJ, Moran S, Pisano DG, Gomez A, et al. Distinct DNA methylomes of newborns and centenarians. Proceedings of the National Academy of Sciences of the United States of America. 2012;109(26):10522–7. doi: 10.1073/pnas.1120658109; PubMed Central PMCID: PMC3387108.
- 45. Robinson MD, Statham AL, Speed TP, Clark SJ. Protocol matters: which methylome are you actually studying? Epigenomics. 2010;2(4):587–98. doi: 10.2217/epi.10.36; PubMed Central PMCID: PMC3090160.
- 46. Harris RA, Wang T, Coarfa C, Nagarajan RP, Hong C, Downey SL, et al. Comparison of sequencing-based methods to profile DNA methylation and identification of monoallelic epigenetic modifications. Nature biotechnology. 2010;28(10):1097–105. doi: 10.1038/nbt.1682; PubMed Central PMCID: PMC2955169.
- 47. Zhang W, Spector TD, Deloukas P, Bell JT, Engelhardt BE. Predicting genome-wide DNA methylation using methylation marks, genomic position, and DNA regulatory elements. Genome biology. 2015;16:14. doi: 10.1186/s13059-015-0581-9; PubMed Central PMCID: PMC4389802.
- 48. Lovkvist C, Dodd IB, Sneppen K, Haerter JO. DNA methylation in human epigenomes depends on local topology of CpG sites. Nucleic acids research. 2016;44(11):5123–32. doi: 10.1093/nar/gkw124.
- 49. Li Y, Zhu J, Tian G, Li N, Li Q, Ye M, et al. The DNA methylome of human peripheral blood mononuclear cells. PLoS biology. 2010;8(11):e1000533. doi: 10.1371/journal.pbio.1000533; PubMed Central PMCID: PMC2976721.
- 50. De Meyer T, Bady P, Trooskens G, Kurscheid S, Bloch J, Kros JM, et al. Genome-wide DNA methylation detection by MethylCap-seq and Infinium HumanMethylation450 BeadChips: an independent large-scale comparison. Scientific reports. 2015;5:15375. doi: 10.1038/srep15375; PubMed Central PMCID: PMC4612737.
- 51. Tryndyak VP, Han T, Fuscoe JC, Ross SA, Beland FA, Pogribny IP. Status of hepatic DNA methylome predetermines and modulates the severity of non-alcoholic fatty liver injury in mice. BMC genomics. 2016;17:298. doi: 10.1186/s12864-016-2617-2; PubMed Central PMCID: PMC4840954.
- 52. Orozco LD, Rubbi L, Martin LJ, Fang F, Hormozdiari F, Che N, et al. Intergenerational genomic DNA methylation patterns in mouse hybrid strains. Genome biology. 2014;15(5):R68. doi: 10.1186/gb-2014-15-5-r68; PubMed Central PMCID: PMC4076608.
- 53. Orozco LD, Morselli M, Rubbi L, Guo W, Go J, Shi H, et al. Epigenome-wide association of liver methylation patterns and complex metabolic traits in mice. Cell metabolism. 2015;21(6):905–17. doi: 10.1016/j.cmet.2015.04.025; PubMed Central PMCID: PMC4454894.
- 54. Walter NA, McWeeney SK, Peters ST, Belknap JK, Hitzemann R, Buck KJ. SNPs matter: impact on detection of differential expression. Nature methods. 2007;4(9):679–80. doi: 10.1038/nmeth0907-679; PubMed Central PMCID: PMC3410665.
- 55. Ciobanu DC, Lu L, Mozhui K, Wang X, Jagalur M, Morris JA, et al. Detection, validation, and downstream analysis of allelic variation in gene expression. Genetics. 2010;184(1):119–28. doi: 10.1534/genetics.109.107474; PubMed Central PMCID: PMC2802080.
- 56. Bottomly D, Walter NA, Hunter JE, Darakjian P, Kawane S, Buck KJ, et al. Evaluating gene expression in C57BL/6J and DBA/2J mouse striatum using RNA-Seq and microarrays. PloS one. 2011;6(3):e17820. doi: 10.1371/journal.pone.0017820; PubMed Central PMCID: PMC3063777.
- 57. Sloane LB, Stout JT, Vandenbergh DJ, Vogler GP, Gerhard GS, McClearn GE. Quantitative trait loci analysis of tail tendon break time in mice of C57BL/6J and DBA/2J lineage. The journals of gerontology Series A, Biological sciences and medical sciences. 2011;66(2):170–8. doi: 10.1093/gerona/glq169; PubMed Central PMCID: PMC3021371.
- 58. Peirce JL, Lu L, Gu J, Silver LM, Williams RW. A new set of BXD recombinant inbred lines from advanced intercross populations in mice. BMC genetics. 2004;5:7. doi: 10.1186/1471-2156-5-7; PubMed Central PMCID: PMC420238.
- 59. Mozhui K, Ciobanu DC, Schikorski T, Wang X, Lu L, Williams RW. Dissection of a QTL hotspot on mouse distal chromosome 1 that modulates neurobehavioral phenotypes and gene expression. PLoS genetics. 2008;4(11):e1000260. doi: 10.1371/journal.pgen.1000260; PubMed Central PMCID: PMC2577893.
- 60. Lang DH, Gerhard GS, Griffith JW, Vogler GP, Vandenbergh DJ, Blizard DA, et al. Quantitative trait loci (QTL) analysis of longevity in C57BL/6J by DBA/2J (BXD) recombinant inbred mice. Aging clinical and experimental research. 2010;22(1):8–19.
- 61. de Haan G, Van Zant G. Dynamic changes in mouse hematopoietic stem cell numbers during aging. Blood. 1999;93(10):3294–301.
- 62. Wang X, Pandey AK, Mulligan MK, Williams EG, Mozhui K, Li Z, et al. Joint mouse-human phenome-wide association to test gene function and disease risk. Nature communications. 2016;7:10464. doi: 10.1038/ncomms10464; PubMed Central PMCID: PMC4740880.
- 63. Mulligan MK, Mozhui K, Prins P, Williams RW. GeneNetwork: A Toolbox for Systems Genetics. Methods in molecular biology. 2017;1488:75–120. doi: 10.1007/978-1-4939-6427-7_4.
- 64. Morgan AP, Fu CP, Kao CY, Welsh CE, Didion JP, Yadgary L, et al. The Mouse Universal Genotyping Array: From Substrains to Subspecies. G3. 2015;6(2):263–79. doi: 10.1534/g3.115.022087; PubMed Central PMCID: PMC4751547.