Abstract
The Infinium BeadChip is the most widely used DNA methylome assay technology for population-scale epigenome profiling. However, the standard workflow requires over 200 ng of input DNA, hindering its application to small cell-number samples, such as primordial germ cells. We developed experimental and analysis workflows to extend this technology to suboptimal input DNA conditions, including ultra-low input down to single cells. DNA preamplification significantly enhanced detection rates to over 50% in five-cell samples and ∼25% in single cells. Enzymatic conversion also substantially improved data quality. Computationally, we developed a method to model the background signal's influence on the DNA methylation level readings. The modified detection P-value calculation achieved higher sensitivities for low-input datasets and was validated in over 100 000 diverse public methylome profiles. We employed the optimized workflow to query the demethylation dynamics in mouse primordial germ cells, which are available only at low cell numbers. Our data revealed nuanced chromatin states, sex disparities, and the role of DNA methylation in transposable element regulation during germ cell development. Collectively, we present comprehensive experimental and computational solutions to extend this widely used methylation assay technology to applications with limited DNA.
Introduction
DNA modifications, including 5-methylcytosines (5mCs) and 5-hydroxymethylcytosines (5hmCs), are canonical forms of epigenetic modification in human and other mammalian genomes. DNA methylation is found mainly in CpG dinucleotide contexts, where it is extensively implicated in gene transcriptional regulation, cell identity maintenance, organismal development, aging, and diseases (1). Infinium DNA methylation BeadChips are among the most popular genome-wide methylation assays in humans and other species due to the ease of experiment and data analysis (2). These arrays have been the primary data generation workhorse for large data consortia such as The Cancer Genome Atlas (TCGA), with public methylome profiles of over 80 000 HM450 samples (3) and a similar number of EPIC array methylome profiles deposited to Gene Expression Omnibus (GEO). While the adoption of sequencing-based methods is catching up in case and mechanistic studies, the Infinium technology remains the most used assay platform for population-level studies such as meQTL studies (4,5), epigenetic risk scores (6,7), and other epigenome-wide association studies (8,9). This is, in part, due to the necessity in population studies to cover a large number of samples with nuanced variation in methylation levels and to dissect multiple cohort covariates such as sex, age, genetic background, and tissue type. In addition to being a powerful and popular tool for biological discovery, the technology has recently enabled rapid clinical application development (10). This platform has found wide success in cancer diagnosis (11), cell-free liquid biopsy (12), and forensics (13). Recently, the Infinium BeadChips have also been used to generate the largest DNA methylome atlas across different mammalian species (14–16).
Despite these successes, a significant drawback of this technology is that the standard processing protocol requires over 200 ng of input DNA (2). This requirement constrains scientific and clinical applications with limited DNA availability. For example, as few as 25 primordial germ cells (PGCs) can be found in the early mouse embryo (17). Serum-derived, tumor-originated cell-free DNA (cfDNA) in cancer patients (18) holds value in non-invasive early cancer diagnosis (19), but as little as 5 ng/ml cfDNA in healthy subjects and 30 ng/ml in cancer patients (20) may be available. Also, DNA obtained from crime scene traces is often in the picogram to nanogram range (21). The DNA amounts obtained from these sources are much lower than the Infinium array standard protocol requires.
In addition to circumstances where the input DNA quantity is limited, it may be of interest to study a complex tissue to dissect cell-to-cell heterogeneity purposefully, even when there is no shortage of total DNA in the tissue. DNA methylation encodes distinctive cell identity fingerprints, which can be used to infer cellular phenotypes (1), trace the state of the DNA-releasing cells (22), and infer cell proportions (23). By performing DNA methylation analysis on laser-capture microdissected specimens, one can compare methylomes at different locations in tumors (24,25) or select specific cell types from the brain for analysis (26). In most cases, laser-capture microdissected tissues are limited in quantity and involve pre-assay whole genome amplification, as is done with SNP arrays (27).
The extreme of increasing the cell resolution at the cost of working with small DNA amounts is epitomized by the rapid development of single-cell DNA methylome assay technologies (28). Each human or mouse cell carries 5–7 pg of DNA. In the past decade, most technologies have been based on pre-amplifying deaminated DNA (29–31) using random priming or single-strand adapter ligation before feeding amplified DNA to high-throughput sequencing. In addition, multiple enzymatic cytosine conversion methods have been developed to replace sodium bisulfite conversion to better preserve genomic DNA during library preparation (32). Inspired by these single-cell methods, we posit that similar preamplification methods and enzymatic conversion can also be used with Infinium arrays.
Besides the experimental challenges, current computational methods are not fully optimized for low-input DNA data. Current signal preprocessing practices may lead to low probe signal detection rates and probe over-masking. Signal detection has traditionally been determined by comparing probes’ signal intensities with negative control probes (33) or readings in the Infinium-I out-of-band channel (34). A conservative threshold of the detection P-value is then applied to mask low-intensity probes. When the foreground and background signal intensities are separable, as in high-input data, using a conservative threshold does not harm detection sensitivity. For limited DNA input, however, this approach leads to a significant loss of true biological signals. A method that maximally preserves biological signals while removing pure artifacts remains an unmet need.
Here, we systematically developed and evaluated experimental and computational methods to improve array sensitivity at low-input ranges and single cells. Our evaluation encompasses previously attempted adaptations, including using different bisulfite conversion elution (35), using Formalin-Fixed Paraffin-Embedded (FFPE) restoration (36), and combining bisulfite conversion with DNA extraction (37), as well as methods that were never previously used with Infinium arrays, such as enzymatic conversion and different preamplification strategies. We developed a new signal detection framework to address the computational challenge of processing data from limited DNA. We showed that this new method significantly improved array detection rates while effectively masking probes whose readings are dominated by background signals. We also showed that the Infinium BeadChip is compatible with low-input samples down to single cells. Finally, we present end-to-end solutions to enable this technology for low-input and single-cell samples.
Materials and methods
Cell cultures
NIH3T3 (ATCC, CRL-1658) was obtained from American Type Culture Collection (ATCC) and cultured in DMEM (ATCC, 30-2002) containing 10% Calf Bovine Serum (ATCC, 30-2030) and 1% penicillin/streptomycin (Gibco, 15140122). B16-F0 (ATCC, CRL-6322) was obtained from ATCC and cultured in DMEM (ATCC, 30-2002) containing 10% Fetal Bovine Serum (Gibco, 45000-736) and 1% penicillin/streptomycin (Gibco, 15140122). All cells were maintained in a 37°C incubator with 5% CO₂ and cultured in a 75 cm² culture flask (Fisher, BD353136).
Cell flow sorting
Cell pellets of 5 × 10⁶ NIH3T3 and B16-F0 cells were resuspended in 50 μl of 0.1 μg/1 ml of 4′,6-diamidino-2-phenylindole (DAPI) (Sigma-Aldrich, D9542-5MG) in 1 ml of phosphate-buffered saline (PBS) (Life Technology, 10010023). Cells were filtered through a Falcon Cell Strainer Snap Cap (Falcon, 352235). DAPI-negative cells (1, 2, 5, 10 and 100 cells) from NIH3T3 and B16-F0 were sorted and collected into 96-well plates pre-loaded with 10 μl of 1× M-Digestion Buffer (Zymo Research, D5020-9) using a BD FACSAria™ Fusion cell sorter (BD Biosciences) with a 100 μm nozzle.
Mouse primordial germ cells
Gonads from embryonic Oct4-GFP transgenic mice (B6;129S4-Pou5f1tm2Jae/J; Jackson Laboratory, strain #008214, RRID:IMSR_JAX:008214) were harvested at embryonic days E11.5, E12.5, E13.5, and E14.5 (38). Gonads were dissected in calcium- and magnesium-free PBS (Gibco) and transferred into 500 μl of 0.25% Trypsin–EDTA (Gibco). Subsequently, the preparation of embryonic germ cells was carried out following the method previously described (39). For bisulfite mutagenesis, PGCs were snap-frozen for storage at –80°C until further processing.
DNA extraction and bisulfite conversion
NIH3T3 and B16-F0 cells were harvested by centrifugation at 100g for 5 min at room temperature and washed twice using PBS (Gibco, 10010023). The DNeasy Blood and Tissue Kit (Qiagen, 69504) was used to extract genomic DNA from NIH3T3 and B16, according to the manufacturer's protocol. DNA samples were quantified using Qubit 4.0 Fluorometer (Invitrogen) using the dsDNA HS Assay Kit (Invitrogen, Q33231). Bisulfite conversion was performed using three kits. DNA bisulfite conversion using EZ DNA Methylation Kit (Zymo Research, D5001) was performed according to the manufacturer's instructions with the specified modifications for Illumina Infinium Methylation Assay. DNA bisulfite conversion using EZ DNA Methylation-Gold Kit (Zymo Research, D5005) and EZ DNA Methylation-Direct Kit (Zymo Research, D5020) was performed according to the manufacturer's protocol. Cell lysis and bisulfite conversion from sorted cells and PGCs were performed with EZ DNA Methylation-Direct Kit according to the manufacturer's instructions.
DNA restoration
After bisulfite conversion, the bisulfite-converted DNA was eluted, resulting in an 8 μl volume. The Infinium HD FFPE DNA restoration kit (Illumina, WG-321-1002) was then used according to the manufacturer's instructions. Following a 1-min incubation, the elution of the DNA was carried out using autoclaved ultrapure water for a 10 μl elution volume. The eluted DNA was stored at –20°C before undergoing Infinium array processing. Intermediate DNA purifications were performed using the DNA Clean and Concentrator-25 Kit (Zymo Research, D4064).
Cytosine conversion elution size optimization
The Illumina Infinium Mouse Methylation BeadChip assays were conducted according to the manufacturer's specifications with slight modifications (Supplementary Tables S1A and S1B). The original protocol specifies using 4 μl taken from 12 to 22 μl of eluted bisulfite-converted DNA (BCD), mixed with 4 μl of 0.1 N sodium hydroxide (NaOH) for amplification and BeadChip reaction (Infinium HD Assay Methylation Protocol Guide 15019519 v01). However, commercial bisulfite conversion kits typically produce over 10 μl of eluate when purifying the converted DNA, so only part of the eluted DNA (4 μl) is used for the BeadChip assay. Previous studies have adjusted the elution size or added concentration steps to minimize DNA loss. For example, one option is to mix 7 μl of eluted DNA with 1 μl of 0.4 N NaOH (40–42). DNA input can thus be maximized by increasing the NaOH concentration in the denaturation step of the Infinium array. To preserve more input DNA, we compared four alternative combinations of elution buffer, input DNA volume, and NaOH concentration and volume.
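The volumes quoted above can be checked with simple arithmetic. The following minimal sketch computes the final NaOH normality and the share of the denaturation mix occupied by DNA for the standard mix (4 μl DNA plus 4 μl 0.1 N NaOH) and for the concentrated-NaOH option (7 μl DNA plus 1 μl 0.4 N NaOH). The function name `denaturation_mix` is an illustrative choice and is not part of any protocol or software package.

```python
def denaturation_mix(dna_ul, naoh_ul, naoh_normality):
    """Return (total volume in ul, final NaOH normality, DNA volume fraction)."""
    total_ul = dna_ul + naoh_ul
    # Dilution: final normality = stock normality * stock volume / total volume.
    final_normality = naoh_normality * naoh_ul / total_ul
    return total_ul, final_normality, dna_ul / total_ul


# Standard protocol mix: 4 ul DNA + 4 ul 0.1 N NaOH.
standard = denaturation_mix(dna_ul=4, naoh_ul=4, naoh_normality=0.1)
# Concentrated-NaOH option: 7 ul DNA + 1 ul 0.4 N NaOH.
concentrated = denaturation_mix(dna_ul=7, naoh_ul=1, naoh_normality=0.4)

for label, mix in (("standard", standard), ("concentrated", concentrated)):
    total_ul, normality, dna_fraction = mix
    print(f"{label}: {total_ul} ul total, {normality:.3f} N NaOH, "
          f"{dna_fraction:.1%} DNA by volume")
```

Both mixes reach the same final NaOH normality (0.05 N) in the same 8 μl volume, while the DNA share of the mix rises from 50% to 87.5%.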
Enzymatic methyl (EM)-array sample preparation
Libraries were prepared using the NEBNext Enzymatic Methyl-seq (NEB, E7120S) kit, following the manufacturer's instructions. 50, 5, 2, or 0.5 ng of 5mC adaptor-ligated NIH3T3 DNA was used as input. HiFi HotStart Uracil + Ready Mix (KAPA Biosystems, KK2801) was used to amplify the libraries following conversion before purification over SPRI beads (0.8× left-sided) and elution in nuclease-free water to yield final libraries. Libraries were then quantified by Qubit HS (Invitrogen, Q32851) and quality checked on an Agilent Bioanalyzer 2100 before sequencing on an Illumina MiSeq instrument to confirm conversion efficiencies. Details of each input's elution size and library amplification cycles are listed in Supplementary Table S1C.
ELBAR detection P-value calculation
In the standard Infinium BeadChip usage, >200 ng DNA is profiled, and probe detection calling is employed to filter out probes whose signals are subject to substantial background influence. However, this practice causes significant biological signal loss in low-input datasets, where users often seek to retain as much biological signal as possible and can tolerate some background influence. To meet this need, we developed ELBAR (Eliminating BAckground-dominated Reading) to exclude/mask only probes that lack biological signal and are entirely dominated by background signal. The ELBAR method is based on the observation that the range of beta values depends on the probes’ total signal intensities. It includes the following steps. First, we define the total signal intensity as the sum of the signals of the methylated (M) and unmethylated (U) alleles. Pooling in-band and out-of-band signals, ELBAR bins probes by log2-transformed M + U signal intensity. Next, ELBAR calculates the upper and lower bounds of the beta values in each bin (using the 95% and 5% quantiles to accommodate outliers); these bounds, traced as M + U varies, define the beta value envelope. Third, we define the background signal by looking for the first bin whose beta value envelope deviates from that of the bin with the smallest M + U. Lastly, the maximum M and U signals of these background probes are treated as the true background signal to compute detection P-values.
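The steps above can be sketched in Python as follows. This is a simplified illustration and not the SeSAMe implementation: the bin count, the envelope deviation threshold `delta`, the minimum bin size, the empirical-distribution P-value, and the simulated intensities are all assumptions introduced here for demonstration.

```python
import math
import random
from bisect import bisect_right


def quantile(sorted_vals, q):
    """Linear-interpolation quantile of an already sorted list."""
    pos = q * (len(sorted_vals) - 1)
    low = int(math.floor(pos))
    high = min(low + 1, len(sorted_vals) - 1)
    return sorted_vals[low] + (pos - low) * (sorted_vals[high] - sorted_vals[low])


def elbar_like_background(m, u, n_bins=20, delta=0.2, min_probes=10):
    """Collect max(M, U) of probes in the low-intensity bins whose beta value
    envelope matches that of the lowest-intensity bin.
    n_bins, delta and min_probes are hypothetical tuning parameters."""
    log_total = [math.log2(max(mi + ui, 1)) for mi, ui in zip(m, u)]
    lo, hi = min(log_total), max(log_total)
    width = (hi - lo) / n_bins or 1.0
    bins = [[] for _ in range(n_bins)]
    for i, value in enumerate(log_total):
        bins[min(int((value - lo) / width), n_bins - 1)].append(i)

    reference, background = None, []
    for members in bins:  # bins are ordered by increasing M + U
        if len(members) < min_probes:
            continue  # too few probes for a stable envelope
        betas = sorted(m[i] / max(m[i] + u[i], 1) for i in members)
        envelope = (quantile(betas, 0.05), quantile(betas, 0.95))
        if reference is None:
            reference = envelope  # envelope of the smallest-M+U bin
        elif reference[0] - envelope[0] > delta or envelope[1] - reference[1] > delta:
            break  # first bin whose envelope deviates: biological signal begins
        background.extend(max(m[i], u[i]) for i in members)
    return sorted(background)


def detection_pvalues(m, u, background):
    """Assumed P-value: 1 - empirical CDF of the background at max(M, U)."""
    if not background:
        raise ValueError("no background probes identified")
    n = len(background)
    return [1 - bisect_right(background, max(mi, ui)) / n for mi, ui in zip(m, u)]


# Simulated array: 300 background-only probes, then 700 probes with real signal.
rng = random.Random(0)
m, u = [], []
for _ in range(300):
    m.append(rng.gauss(100, 10))
    u.append(rng.gauss(100, 10))
for _ in range(700):
    signal = rng.uniform(2000, 8000)
    bg_m, bg_u = rng.gauss(100, 10), rng.gauss(100, 10)
    if rng.random() < 0.5:  # fully methylated locus
        m.append(signal + bg_m)
        u.append(bg_u)
    else:  # fully unmethylated locus
        m.append(bg_m)
        u.append(signal + bg_u)

background = elbar_like_background(m, u)
pvals = detection_pvalues(m, u, background)
print(sum(p < 0.05 for p in pvals[:300]), "of 300 background-only probes pass")
print(sum(p < 0.05 for p in pvals[300:]), "of 700 signal probes pass")
```

In this toy setting, the background-only probes form a narrow envelope around a beta value of 0.5, and the envelope widens abruptly once probes with real signal enter the bins, which is the transition the third step looks for.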
Public datasets
The mouse MM285 datasets were downloaded from GEO under GSE18441 (15,43). GEO accessions of the other EPIC and HM450 datasets are provided in Supplementary Table S2A. The BS-seq dataset for four PGC samples (E10.5, E11.5, one male E12.5, and one female E12.5) was downloaded from GEO under GSE76971 (44). TCGA testicular seminoma datasets (45) were downloaded from Genomic Data Commons (46).
Infinium BeadChip data preprocessing and analysis
The Illumina Infinium BeadChip technology is based on sodium bisulfite conversion of DNA, with single-base-resolution genotyping of targeted CpG sites via probes on a microarray. Probes are designed to match specific 50-base regions of bisulfite-converted genomic DNA, with a CpG site at the probe's 3′ end (47). Upon hybridizing with bisulfite-converted DNA, the probe undergoes a single-base extension that incorporates a fluorescently labeled ddNTP at the 3′ CpG site, which distinguishes the C/T difference resulting from bisulfite conversion. The fluorescent signals provide insight into the methylation status (methylated or unmethylated) of specific cytosine residues in the DNA sample. Signal intensity refers to the strength of the fluorescent signal emitted by the hybridized probes on the BeadChip. The signal intensity is directly related to the quantity of target molecules bound to specific probes. Probe success rates represent the proportion of probes that successfully detect their targeted CpGs.
The IDAT files generated, along with all public datasets utilized, were processed using the SeSAMe R package. This encompassed preprocessing, quality control, and analysis, adhering to the established preprocessing workflow (43). The probe detection P-value was computed using the pOOBAH algorithm, which leverages the fluorescence of out-of-band (OOB) probes. Subsequently, normalization was performed using noob, which applies a normal-exponential deconvolution of fluorescent intensities based on the OOB probes. Additionally, a dye bias correction was applied using the dyeBiasNL function. Infinium Methylation BeadChip manifest and annotation data, which include gene, chromatin state, sequence context, the ‘PGCMeth’ probe list (designed to target CpGs highly methylated in E13.5 PGCs) (48), and other functional annotations, were obtained from http://zwdzwd.github.io/InfiniumAnnotation. Metagene plots were generated using the KYCG_plotMeta function in the SeSAMe package. To calculate the F1 score of each sample against the 250-ng control (205243950081_R01C01), beta values greater than 0.5 were set to one and all other beta values were set to zero. The F1 score was then calculated by treating one as true and zero as false and comparing the target sample with the 250-ng control, i.e. F1 = 2TP / (2TP + FN + FP), where TP is the true positive count, FP is the false positive count, and FN is the false negative count.
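The F1 calculation described above can be illustrated with a minimal Python sketch. The function name `f1_score`, the toy beta values, and the handling of masked probes (readings set to `None` are skipped when missing in either dataset) are assumptions made for illustration; the source does not specify how masked probes enter the comparison.

```python
import math


def f1_score(sample_betas, reference_betas, cutoff=0.5):
    """F1 = 2TP / (2TP + FN + FP), with the reference treated as ground truth.

    Beta values above `cutoff` are called methylated (true), others false.
    """
    tp = fp = fn = 0
    for sample, reference in zip(sample_betas, reference_betas):
        if sample is None or reference is None:
            continue  # assumption: skip probes masked in either dataset
        sample_call, reference_call = sample > cutoff, reference > cutoff
        if sample_call and reference_call:
            tp += 1
        elif sample_call:
            fp += 1
        elif reference_call:
            fn += 1
    denominator = 2 * tp + fp + fn
    # Undefined when neither dataset has any methylated call.
    return 2 * tp / denominator if denominator else math.nan


# Toy example: five probes, the last one masked in the low-input sample.
low_input = [0.9, 0.1, 0.8, 0.2, None]
control_250ng = [0.95, 0.05, 0.3, 0.7, 0.9]
score = f1_score(low_input, control_250ng)
print(score)
```

In this toy example, the four usable probes give one true positive, one false positive, and one false negative, so F1 = 2 / (2 + 1 + 1) = 0.5.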
Results
Characterizing the suboptimal DNA signatures in public Infinium datasets
Suboptimal DNA quality and quantity impact Infinium methylation data through the manifestation of lower signal intensity and probe success rate. We first studied the probe success rate of over 100 000 public Infinium datasets deposited to GEO to identify the determinants of Infinium array performance (Figure 1A). Comparing DNA sources, we observed that bone, buccal, plasma cfDNA, esophagus, and saliva often yield data with suboptimal probe detection success. Further dataset stratification by sample preservation reveals that FFPE samples are significantly lower in probe success rate than non-FFPE samples. The lower signal intensity leads to skewing of beta values, which represent the methylation level at a specific site, towards an intermediate reading at 0.5 due to stronger relative influences of the signal background (Figure 1B–D) (34). This asymptotic convergence to 0.5, defined by the upper and lower bounds of beta values, forms a beta value envelope. This increase in the background signal influence is a continuous spectrum instead of a dichotomy of detection success vs. failure. When the input DNA is of high quality and quantity, most probes have stable and clustered signal intensities, leading to a bimodal beta value distribution. But in suboptimal datasets such as from FFPE and cfDNA samples, more probes carry lower signal intensities, leading to beta values approaching 0.5 (Figure 1C). This drop in sample quality in FFPE samples is likely intertwined with low DNA input amount, as supported by a similar transition of intermediate methylation readings in the low input data (Figure 1D and Supplementary Figure S1A). We also found that FFPE and cfDNA samples lose detection at different genomic regions. cfDNA, saliva, and buccal cells preferentially lose detection at GC-rich promoter sites such as TssA and TssBiv, while FFPE samples are less biased across genomic territories (Figure 1E). 
The full names of the chromatin states are available in Supplementary Figure S1B.
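The convergence of beta values toward 0.5 at low signal intensity can be reproduced with a simple additive-background model. The sketch below assumes that the same background intensity is added to both the methylated and unmethylated channels; this model, the function name `observed_beta`, and the intensity values are illustrative assumptions rather than the fitted behavior of any specific array.

```python
def observed_beta(true_beta, signal, background):
    """Beta value read from a probe whose true signal is split between the
    methylated and unmethylated channels, each with additive background."""
    methylated = true_beta * signal + background
    unmethylated = (1.0 - true_beta) * signal + background
    return methylated / (methylated + unmethylated)


BACKGROUND = 100.0  # hypothetical per-channel background intensity

# Envelope bounds: readings of fully unmethylated (0) and fully methylated (1)
# loci as the true signal intensity decreases.
envelope = []
for signal in (10000.0, 1000.0, 100.0, 10.0, 0.0):
    lower = observed_beta(0.0, signal, BACKGROUND)
    upper = observed_beta(1.0, signal, BACKGROUND)
    envelope.append((signal, lower, upper))
    print(f"signal={signal:>7.0f}  lower={lower:.3f}  upper={upper:.3f}")
```

Under this model the upper and lower bounds narrow continuously toward 0.5 as the true signal shrinks relative to the background, which matches the description of a continuous spectrum rather than a sharp success/failure boundary.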
Infinium BeadChip workflows for low DNA input and single cells
To improve signal detection in low-input experiments, we developed 13 non-standard workflows using: (i) preamplification of the genomic DNA (see Supplementary Methods); and (ii) enzymatic cytosine conversion methods, besides other protocol adjustments that preserve DNA load (see Figure 2A, Supplementary Tables S1A–S1C). The probe success rates and F1 scores are presented in Figure 2B–D for three workflows, each representing distinct characteristics: the unmodified workflow (Workflow A), the workflow with maximized elution size (Workflow C), and the most optimized workflow with preamplification (Workflow J). Additionally, Figure 2C and D include a workflow based on enzymatic base conversion, labeled as Workflow M. Supplementary Figures S2B–S2D show probe success rates, F1 scores, and Spearman correlations for all 13 workflows, including the four mentioned above. We found that the standard Illumina workflow (Workflow A) can detect signals on 70% of array probes with 2 ng of DNA without modification, consistent with our prior characterization of the EPICv2 BeadChip (49). In the sub-2-ng range, the detection success rate drops rapidly for Workflows A and B (Figure 2B, C, and Supplementary Figure S2B). Enzymatic base conversion (Workflow M) maximizes signal detection (84%) in 0.5 ng-input experiments, followed by a whole-genome preamplification-based method (Workflow J) and one with elution size modification alone (Workflow C). Due to the allelic nature of DNA, we use the F1 score (Materials and methods), which binarizes beta values for comparison to the reference sample, rather than correlation. In the five-cell experiments, Workflow J (Figure 2B and D, and Supplementary Figure S2C) consistently reaches over 70% in signal detection (pOOBAH < 0.05) and close to 90% in the F1 score.
Notably, Workflow J detects around 25% of probes in single cells with an F1 score >70%, suggesting that the detected data are biologically informative despite a higher rate of detection failure. In contrast, the standard workflow is consistently under 50% in the F1 score, suggesting more biologically misleading readings.
For the 0.5 and 2 ng input ranges, Workflow J also improved the data quality, as indicated by the higher correlation coefficient with a 250-ng dataset (Figure 2E and F, and Supplementary Figure S2D). The improvement is most evident in the recovery of intermediate DNA methylation, which is only observed in Workflow J at 0.5 ng (Figure 2F), suggesting that preamplification helps samples ranging from fewer than ten cells to those with over 1 ng of input (Figure 2E).
Previous studies have shown that FFPE restoration kits may improve signal detection on DNA of suboptimal quality (36). However, the FFPE restoration kit did not significantly improve the array performance with 50 ng and 0.5 ng of non-FFPE DNA input (Workflows D and E, Supplementary Figure S2E). We also did not observe substantial performance differences among the three bisulfite conversion kits (Supplementary Figure S2F). The EZ DNA Methylation-Direct Kit produced a slightly better probe detection rate, likely because single-step bisulfite conversion and purification minimizes DNA loss.
We also compared preamplification based on random hexamers (N6) and random hexamers with a T7 primer at the 5′-end (N6-T7) for whole genome amplification (Supplementary Figures S2F and S2G) (50). Workflow J showed the highest Spearman's rho among the workflows with preamplification (Workflows F–I). The T7 sequence in N6-T7 serves as a second primer, allowing further PCR amplification. However, additional PCR cycles did not improve array performance (Workflows K and L, Supplementary Table S1D, Supplementary Figures S2B and S2D). Moreover, the array with four Klenow amplification cycles (Workflow H) did not outperform the array with two cycles. Given these findings, we followed the N6 amplification strategy for the subsequent analysis.
Optimized workflow resolves intercellular heterogeneity while maintaining cell line identity
DNA methylomes profiled from a small number of cells often reveal cell-to-cell heterogeneity. We next tested whether our low-input method can uncover this heterogeneity and whether the cell population averages reduce to measurements from high DNA input. We merged the single-cell methylomes and the five-cell methylomes, respectively, and compared the combined data with the 250-ng methylome (Figure 3A). We found that the merged methylome reinstated the intermediate methylation measurements that are otherwise missing from individual low-input experiments. As DNA methylation readouts are allelic (taking only 0%, 50%, and 100% in diploid cells), we expect a reduction of non-allelic fractions as the cell population becomes less heterogeneous. Focusing on CpGs showing intermediate methylation (0.3–0.7) in the 250-ng data, we observed a gradual dichotomization of methylation levels approaching 0% and 100% in the 10-cell (n = 4) datasets and more so in the five-cell (n = 5) datasets (Figure 3B). This polarization is likely due to a higher genomic DNA homogeneity from reduced cell numbers. The lingering non-allelic methylation fractions are likely due to amplification bias and residual cell-to-cell heterogeneity. Given the same input cell number, preamplification (Workflow J) retained more intermediate readings (Figure 3C). The standard workflow (Workflow A) became nearly fully dichotomized at ten cells and struggled globally in the five-cell experiment (Figure 3C).
Despite the resolution of intermediate methylation levels, small-cell-number data largely retain global and focal methylation differences intrinsic to the cell type identity (Supplementary Figure S3A). All datasets from five to ten cells cluster with the 250 ng datasets of the same cell line on a t-distributed stochastic neighbor embedding (tSNE) projection (Figure 3D). Most single-cell samples are also grouped accordingly despite the erroneous placement of two single-cell B16 samples, likely due to lack of biological signal, as suggested by their ultra-low probe detection rates (0.13) compared to other samples of similar input DNA amounts. The five-cell datasets are clearly separated in a metagene plot that suggests a higher global methylation for the B16 cells at all input ranges (Supplementary Figure S3B). Finally, the differentially methylated CpGs between the two cell lines are largely preserved in five-cell methylomes (Figure 3E). Random discrepant methylation did occur more frequently at CpGs intermediately methylated in the 250 ng datasets for the two cell lines, respectively (Figure 3E), enriching for bivalent chromatin (Supplementary Figures S3C and S3D). Collectively, these data suggest that the Infinium arrays can robustly profile five-cell samples. While the Infinium arrays can profile single cells, their performance is unstable (Figure 3D).
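The effect of merging low-input methylomes, as described in this section, can be illustrated with a minimal sketch. It assumes that merging means averaging per-CpG beta values across samples while skipping masked readings; the source does not state the exact merging procedure, and the function name `merge_methylomes` and the toy readings are hypothetical.

```python
def merge_methylomes(samples):
    """Average beta values per CpG across samples.

    `samples` is a list of equally long lists of beta values; masked
    readings are given as None and skipped (assumed merging rule).
    """
    merged = []
    for readings in zip(*samples):
        kept = [value for value in readings if value is not None]
        merged.append(sum(kept) / len(kept) if kept else None)
    return merged


# Four toy single-cell methylomes over four CpGs; readings are dichotomous.
single_cells = [
    [1.0, 0.0, 1.0, None],
    [1.0, 0.0, 0.0, None],
    [1.0, None, 1.0, None],
    [1.0, 0.0, 0.0, 1.0],
]
merged = merge_methylomes(single_cells)
print(merged)
```

The third CpG is read as either 0 or 1 in each cell but averages to 0.5 after merging, which mirrors how intermediate methylation reappears when heterogeneous low-input profiles are combined.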
GC-rich and high copy number regions retain detection in low input datasets
We next explored which genomic regions are most susceptible to detection loss in high and low input experiments. We first observed that signal intensities intrinsically depend on the probe sequences. We studied the within-sample intensity Z-score of HM450 autosomal probes across 749 normal samples from the TCGA cohort (Methods) (Figure 4A, Supplementary Figure S4A). The violin plot displays the complete set of Z-scores for probes in the cohort. Data points corresponding to individual samples were smoothed and provided a comprehensive view of the overall distribution pattern. The Z-score distribution for different probes shows clear probe dependence. Probes at the high and low signal intensity extremes have little overlap, suggesting a strong sequence dependence. Probes targeting GC-rich regions (as indicated by the number of ‘C’s in the probe sequences since ‘G’s are replaced by ‘A’s to pair with ‘T’s from bisulfite conversion) are associated with higher signal intensities (Figure 4B). This is supported by an enrichment of the detected probes in CpG islands (Figure 4C, D, and Supplementary Figure S4B), gene promoters, transcription factor binding sites, e.g. TFAP2C, and promoter-associated histone modifications, e.g. H3K64ac (Supplementary Figures S4C and D).
Consistently, the probes that fail detection in 250-ng and five-cell samples are significantly enriched in low-CpG density regions (Figure 4C). PMD solo-WCGWs (CpGs flanked by A/Ts and with no other CpGs within a 70-bp neighborhood at partially methylated domains) (51) are observed to lose the most signal detection, consistent with their CpG-sparse nature. Probes targeting non-CG cytosine methylation also tend to lose detection in low-cell-number samples. Interestingly, mitochondrial CpG probes, transposable element (TE) CpGs, and other multi-mapping probes have the least probe detection loss in low-input datasets. The mitochondrial genome showed nearly 100% probe detection success in single- and two-cell experiments. This is likely due to the high copy number of mitochondrial genomes per cell (52). Similarly, other high copy number repetitive elements, such as the Satellite, B1 elements, and other SINE1/7SL elements, also show high probe success rates (Figure 4E and F). These results suggest that the multi-mapping probes may be used as a TE profiling tool for low-input samples. We also observed more prevalent DNA methylation heterogeneity within quiescent chromatin. CpGs displaying lower methylation levels in bulk tissues but higher levels in individual five-cell samples (Figure 4G, right panel), or vice versa (Supplementary Figure S4E, right panel), are both enriched in quiescent chromatin regions. In contrast, promoters (Tss) and gene bodies (Tx) showed consistently low (Figure 4G, left panel) and high methylation levels (Supplementary Figure S4E, left panel), respectively, in both the bulk tissue and the five-cell samples.
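The within-sample intensity Z-score used earlier in this section can be sketched as follows. The sketch standardizes each probe's total intensity against the mean and standard deviation of all probes in the same sample, so that probes can be compared across samples with different overall brightness. Whether the original analysis used raw or log-transformed intensities is not stated; raw intensities, the function name `within_sample_zscores`, and the toy values are assumptions.

```python
from statistics import mean, pstdev


def within_sample_zscores(intensities):
    """Standardize probe intensities within one sample (population SD)."""
    center = mean(intensities)
    spread = pstdev(intensities)
    if spread == 0:
        raise ValueError("all probes have identical intensity")
    return [(value - center) / spread for value in intensities]


# Toy data: the same three probes measured in two samples, the second sample
# being half as bright overall.
bright_sample = [1000.0, 2000.0, 3000.0]
dim_sample = [500.0, 1000.0, 1500.0]

z_bright = within_sample_zscores(bright_sample)
z_dim = within_sample_zscores(dim_sample)
print(z_bright)
print(z_dim)
```

Because the Z-score removes each sample's overall scale, the two toy samples yield the same per-probe Z-scores, which is what allows a probe's intensity behavior to be summarized across a cohort.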
ELBAR preserves more signal detection for low-input datasets
The conventional detection P-value calculation aims at preventing false discovery in high-input datasets, where probes with suboptimal signal intensity are rare and a relatively clear decision boundary can be found. In low-input samples, more probes carry lower signal intensity and can overlap with measurements purely dominated by background signals (Supplementary Figure S5A). Applying the same detection P-value threshold may lead to a significant loss of biologically useful readings. To better balance sensitivity for low-input datasets, we developed the ELBAR algorithm, based on the observation that probes dominated by background signal alone are always associated with intermediate methylation readings (Figure 5A). In brief, ELBAR looks for low-signal probes with intermediate methylation to model the background signal (Methods). Doing so can effectively remove background-induced artificial readings while minimally removing probes with biological signals (Figure 5B). In the cell line experiments, probes that survive ELBAR masking maintain a bimodal distribution of beta values, as biologically expected. In contrast, pOOBAH, a prior method designed for high-input datasets, masked probes more aggressively. The probes surviving pOOBAH masking are slightly asymmetric in the beta-value envelope and show a small number of background-dominated beta values (Figure 5B). ELBAR effectively masked these beta values, which are associated with low signal intensity and artifactually fixed around 0.5.
Testing ELBAR on public EPIC, HM450 (Figure 5C, Supplementary Table S2A), and MM285 (Figure 5D) datasets, we found that it could rescue a significant number of probes compared to pOOBAH. Interestingly, experiments with array-wide failure remain low in detection rates, suggesting that ELBAR can discriminate probe-level failure from array-wide failure. The probes rescued by ELBAR from pOOBAH show biological relevance, evidenced by higher correlation with the 250 ng datasets (Supplementary Figure S5B). Of note, ELBAR combines negative control probes for background calibration and only considers intermediate methylation readings from low-intensity probes. Hence, its masking would not be influenced by true biological methylation. For example, we validated ELBAR’s performance in samples with globally high, intermediate, and low methylation levels (Supplementary Figures S5C–E), including testicular seminoma tissues (Supplementary Figure S5F). ELBAR improves detection across wide input ranges (Figure 5E, P-value = 2.7E-8, t-test of the method sensitivity difference in multiple linear regression) without harming accuracy (Figure 5F, no statistically significant difference in accuracy between methods).
To further validate ELBAR’s performance in FFPE samples, we compared ELBAR, pOOBAH and minfi's detectionP function in a previous study of 53 melanoma FFPE tissue samples (53). We used the five samples with the best detection P-values to derive a putative ground truth methylation profile and evaluated the measurement deviation in probe sets stratified by the masking status under the three methods (Figure 5G). Probes that survive all three masking methods have the lowest methylation level deviation, as expected, followed by the probes masked only by pOOBAH. These probes are the greatest in number compared to other probe masking groups, suggesting that pOOBAH may have caused a significant loss of biological signal in this dataset. Overall, ELBAR-masked probes are associated with higher measurement deviation from the ground truth (dark red in Figure 5G), even though they may survive the masking by the other two methods. Collectively, these results point to an advantage of using ELBAR for detection calling over pOOBAH and detectionP in FFPE samples.
Low-input BeadChip data captures the demethylation dynamics in primordial germ cells
PGCs are typically present in low numbers, hindering their analysis by the standard Infinium array processing workflow (54–56). In mammals, PGCs undergo genome-wide epigenetic reprogramming, including global DNA methylation loss, as they migrate from the epiblast to the bipotential gonads (57). This corresponds to embryonic day (E) 7.5 to E14.5 of development in the mouse. We applied our optimized method to study the methylation of mouse gonadal PGCs collected at E11.5 to E14.5 (Figure 6A). For each time point, PGCs from a pair of embryonic gonads were FACS sorted (Methods), and the aliquoted volume varied from 0.25 μl to 9 μl. We employed workflow J, with pre-amplified DNA amounts ranging from ∼1 ng to 13 ng (Supplementary Table S2B). For comparison, we included methylome profiles of mouse liver, lung, ovary, and testes tissues in our analysis (Methods, Supplementary Table S2C).
Consistent with prior knowledge, PGCs exhibit the lowest methylation level relative to somatic tissues and adult gonads across major chromatin states (Figure 6B and C). Consistent with the probe design rationale, ‘PGCMeth’ probes showed resistance to methylation loss in E11.5 and E13.5 PGCs. The genome-wide distribution of PGC methylation loss is largely uncoupled from the methylation states observed in non-PGC tissues. For example, the genomic regions with active gene transcription (ChromHMM states ‘Tx’ and ‘TxWk’) were associated with the highest methylation in non-PGC tissues (Figure 6B). In PGCs, however, gene bodies are less methylated than heterochromatin and transcriptionally quiescent regions (Figure 6B and C). We observed a partial methylation loss in E11.5 PGCs. The demethylation process culminated in E13.5 PGCs. Male PGCs initiated re-methylation as early as E14.5, while female PGCs remained as unmethylated at E14.5 as at E13.5. This is consistent with de novo methylation occurring in the female germline postnatally as oocytes are recruited for growth during each reproductive cycle (58). This sex disparity in methylation rebound is evident in imprinted gene-associated differentially methylated regions (DMRs) and gene bodies (Figure 6D and Supplementary Figure S6A).
The arrays allow for detailed analysis of the timing of DNA methylation change across genomic elements. For example, the retained DNA methylation at heterochromatin is enriched at TRIM28 binding sites (Figure 6E and Supplementary Figure S6B). TRIM28 regulates the transcription of TEs, particularly endogenous retroviruses (ERVs) (59). Specific ERV elements are known to retain DNA methylation during PGC development in humans (60) and mice (61). These observations suggest a critical role for DNA methylation in TE suppression for maintaining germ cell genome integrity and intergenerational epigenetic inheritance.
Interestingly, PGC residual methylation is also enriched for the binding of the DNA methylation reader proteins MECP2 and MBD1, supporting their reported roles in DNA methylation-mediated TE regulation (62). Among TE DNA families, LTR elements were more resistant to PGC demethylation than SINEs and LINEs, consistent with their functions as transcriptional promoters (63). The role of methylation retention in TE regulation is further supported by the higher methylation retention in evolutionarily younger TE subfamilies than older TE families (64) (Figure 6F). Intracisternal A Particle (IAP) elements were previously highlighted to exhibit extensive methylation retention in PGCs (65). We identified a heterogeneous pattern of their demethylation dynamics in PGCs. IAPLTR2, IAPEY and IAPA are among the most resistant families, while IAPEY5, IAPLTR4 and IAP5 are the least methylated.
In addition to TEs, imprinted gene DMRs and germ cell- and placenta-specific hypomethylated sites show later DNA methylation loss compared to the rest of the genome. This is due to active DNA demethylation pathways, mediated by TET proteins, that are required for methylation erasure at imprinting control regions (39,66). CGs flanked by A/Ts are more susceptible to aging-associated DNA demethylation (51). We did not observe this sequence context preference in PGC development (Supplementary Figure S6C), suggesting a distinct, TET-mediated demethylation mechanism.
Discussion
Despite the successful employment of Infinium BeadChips in population-scale DNA methylome studies, their potential for ‘difficult’ DNA, i.e. when the input is limited in quality, quantity, or both, has not been fully explored or optimized. This restricts the scope of Infinium array usage for cell-free DNA, microdissected tissues, and other samples of limited availability. Here, we presented experimental and computational resources to enable array usage in these suboptimal settings, especially with low-input DNA. Experimentally, we explored the array's compatibility with random priming-based whole genome preamplification and enzymatic base conversion by TET/APOBEC3A. Our data suggest that combining preamplification with enzymatic base conversion by TET/APOBEC3A, using the NEBNext Enzymatic Methyl-seq kit (Workflow J), improves array performance on low-input DNA. Computationally, we developed ELBAR for preserving biological signals from suboptimal input datasets. ELBAR excludes only probes dominated by background noise.
In addition, we comprehensively characterized the biological and technical determinants of array performance from public datasets. From surveying 100 000+ datasets and using probe detection rate as the main performance metric, we found that cell-free DNA, saliva, bone and FFPE samples tend to have worse detection rates than cultured cells and primary and fresh frozen tissues. FFPE samples and samples of simply lower input show different genomic distributions of signal detection. Plasma cfDNA tends to cause detection loss at CpG-sparse, GC-low genomic regions and preserved detection at CpG-rich regions such as bivalent chromatin. In contrast, FFPE-induced detection loss is less biased across genomic regions. This low-input sample bias can be attributed to the weaker intrinsic signal from GC-low probes (Figure 4C and D). As cfDNA localization is known to be linked to nucleosome footprints and can inform cell of origin (67), the array signal intensity bias may serve as an unconventional source of cell-of-origin signal to complement the methylation signal that the array data already carry.
Although Illumina recommends a minimum of 200 ng of DNA for current Infinium BeadChips, there has been research exploring lower input amounts. For example, reproducible results were achieved with over 125 ng of DNA from peripheral blood (68), and an input of 16 ng exhibited a high correlation with 500 ng input, albeit with lower reproducibility. Another study identified 75 ng as the minimum requirement (69), and a few other studies suggested that 10 ng is acceptable (35,70,71). The fact that the standard protocol works at these sub-optimal input ranges is likely due to the isothermal amplification. However, the true lower input limit of the technology remains underexplored, and the extent to which precision and sensitivity are compromised with decreased DNA input remains incompletely resolved. Our work shows that Infinium arrays are reasonably compatible with picogram-range input DNA or single-digit cell numbers. Preamplification using Klenow fragments and enzymatic conversion further magnify probe signal intensities, leading to reproducible profiling of five-cell methylomes. Interestingly, additional adapter-based PCR amplification did not lead to a further increase in probe detection or accuracy (Supplementary Table S1D), likely due to loss of library complexity from amplification bias (72).
The Infinium technology is currently not cost-competitive for projects devoted solely to large-scale single-cell methylome profiling, owing to its incompatibility with sample multiplexing (73). The focus of this work is to explore the capacity of Infinium technology to profile samples of variable quantities and to assess whether the resulting data are comparable. This is most relevant to applications where DNA can be of high or limited quantities, such as microdissected tissues (24) and cell-free DNA. In fact, techniques shared with sequencing-based methods, such as preamplification using Klenow or other polymerases (27), would benefit both high- and low-input samples (e.g. the 50-ng samples in Figure 2C and D). We showed that low-input Infinium BeadChip data are comparable to high-input data and are biologically relevant. They reflect the allelic nature of the DNA methylation signal at low DNA input: intermediate methylation levels were resolved to high and low methylation readings. These binary readings reflect cell-to-cell heterogeneity and, when merged, their population averages recapitulated bulk input measurements. Our single-cell array data reached 20% detection, similar to previous deep single-cell WGBS datasets (30).
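The relationship between binary low-input readings and bulk measurements can be illustrated with a toy simulation. The sketch below is not the analysis performed in this study; it assumes, purely for illustration, that each cell contributes one captured allele that is methylated with probability equal to the bulk methylation level. The function names and parameter values are hypothetical.

```python
import random


def simulate_single_cell_readings(bulk_level, n_cells, seed=0):
    """Draw one binary methylation reading (0 or 1) per cell.

    Assumption: at very low input each cell contributes a single captured
    allele, methylated with probability equal to the bulk methylation level.
    """
    rng = random.Random(seed)
    return [1 if rng.random() < bulk_level else 0 for _ in range(n_cells)]


def merged_level(readings):
    """Population average of the binary single-cell readings."""
    return sum(readings) / len(readings)


# An intermediate bulk level (0.5) yields only 0/1 readings in single cells,
# while the merged average moves back toward the bulk value.
readings = simulate_single_cell_readings(bulk_level=0.5, n_cells=2000, seed=42)
print(sorted(set(readings)), merged_level(readings))
```

Under this simplified model, no individual reading is intermediate, yet the merged average approaches the bulk level as more cells are pooled.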
Previous data analysis workflows relied on a single threshold for determining signal detection success (74). While this is a viable assumption for high-input data, it does not always hold for low-input datasets. In low-input datasets, biological signals overlap more with background signals in signal intensity, particularly for probes with intrinsically low foreground signals. Our analysis showed that this intrinsic propensity is linked to the number of Cs in the probe sequences, likely reflecting a GC content bias, as the probes are designed to be G-less to pair with converted genomic DNA. The stronger overlap of biological with background signal not only obfuscates detection discrimination but also biases the beta value calculation towards 0.5 due to the incomplete subtraction of signal background from the observed compound signal. Users should consider this major tradeoff of measurement precision for sensitivity.
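The pull of beta values towards 0.5 can be seen with simple arithmetic. The sketch below assumes, for illustration only, that the beta value is computed as M/(M+U) from the methylated (M) and unmethylated (U) intensities and that an equal residual background is left in both channels; the intensity values are arbitrary and the function names are hypothetical.

```python
def beta_value(m, u):
    """Beta value from methylated (m) and unmethylated (u) signal intensities."""
    return m / (m + u)


def observed_beta(true_beta, signal, background):
    """Beta value read out when residual background remains in both channels.

    true_beta  -- methylation fraction of the underlying biological signal
    signal     -- total biological intensity (M + U), arbitrary units
    background -- un-subtracted background added equally to M and U (assumption)
    """
    m = true_beta * signal + background
    u = (1.0 - true_beta) * signal + background
    return beta_value(m, u)


# The same true beta (0.9) drifts towards 0.5 as the biological signal
# shrinks relative to a fixed residual background of 200 units.
for signal in (10000, 1000, 100):
    print(signal, round(observed_beta(0.9, signal, 200), 3))
```

With a fixed residual background, the readout for a truly highly methylated site falls from close to 0.9 at high signal to just under 0.6 when the signal is weaker than the background, which is why low-intensity probes with intermediate readings deserve particular scrutiny.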
Compared to pOOBAH and other detection calling methods designed to minimize false discovery in high-input datasets, ELBAR seeks to mask only probes fully dominated by signal background, leaving probes with true biological signals visible in downstream analysis. However, we caution that probe readings surviving ELBAR detection may be influenced by background signals to varying degrees, leading to unstable quantitative accuracy. Beyond their traditional use for probe masking, the detection P-values can serve as measures of background influence. For high-input samples, pOOBAH and ELBAR perform similarly (Figure 5E and F), suggesting that one may use ELBAR for data from all input settings.
Despite the reduced probe detection in low-input datasets, probes that target multi-copy DNA, such as mitochondria and repetitive elements (e.g. the B1 elements and satellite sequences), retain high signal intensities. In the low-input datasets, we observed that these probes measure aggregated signals from multiple genomic loci, enabling an unconventional use of the methylation BeadChips as a tool to study the global epigenetic regulation of multi-copy TEs. In our work, we applied our low-input protocol to profile mouse PGCs. We validated the dynamics of global methylation erasure in PGCs, a sex disparity in remethylation, and the demethylation resistance at TRIM28 binding sites, which are known to escape germ cell epigenetic remodeling (60). These multi-mapping probes also revealed that evolutionarily younger LTR repeat families retained more methylation than other repeat families. Such methylation retention can protect germ-line genome integrity from TE transcriptional mobilization.
Conclusion
We presented experimental and computational solutions for applying Infinium BeadChips to low-input and single-cell samples. Based on whole-genome preamplification and enzymatic base conversion, our new methods revealed a previously underappreciated low-input potential of this popular methylation profiling assay. We demonstrated the power of these methods by applying them to uncover detailed demethylation dynamics of murine primordial germ cell development.
Acknowledgements
We thank the Center for Applied Genomics Genotyping Core at the Children's Hospital of Philadelphia for their help with array processing.
Contributor Information
Sol Moe Lee, Center for Computational and Genomic Medicine, The Children's Hospital of Philadelphia, PA 19104, USA.
Christian E Loo, Graduate Group in Biochemistry and Biophysics, University of Pennsylvania, Philadelphia, PA 19104, USA.
Rexxi D Prasasya, Department of Cell and Developmental Biology, Epigenetics Institute, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA 19104, USA.
Marisa S Bartolomei, Department of Cell and Developmental Biology, Epigenetics Institute, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA 19104, USA.
Rahul M Kohli, Department of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA.
Wanding Zhou, Center for Computational and Genomic Medicine, The Children's Hospital of Philadelphia, PA 19104, USA; Department of Pathology and Laboratory Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA.
Data availability
All BeadChip data produced in this study are available through GEO (accession: GSE239290). ELBAR and other informatics for low-input methylation BeadChip data are implemented in SeSAMe (version 1.18.4+), available through Bioconductor (https://doi.org/doi:10.18129/B9.bioc.sesame). ELBAR can also be used in the openSesame workflow with the "I" code specified in the prep= argument (see SeSAMe vignette).
Supplementary data
Supplementary Data are available at NAR Online.
Funding
National Institutes of Health [R35-GM146978 to W.Z., R01-HG010646 to R.M.K., F31-HG012892 to C.E.L., GM146388 to M.S.B., R.M.K., F32-HD101230 to R.D.P.]; W.Z.’s startup fund at Children's Hospital of Philadelphia and research sponsorship from FOXO Bioscience. Funding for open access charge: NIH [R35-GM146978].
Conflict of interest statement. W.Z. received Infinium BeadChips from Illumina Inc.
References
- 1. Greenberg M.V.C., Bourc’his D The diverse roles of DNA methylation in mammalian development and disease. Nat. Rev. Mol. Cell Biol. 2019; 20:590–607. [DOI] [PubMed] [Google Scholar]
- 2. Bibikova M., Barnes B., Tsan C., Ho V., Klotzle B., Le J.M., Delano D., Zhang L., Schroth G.P., Gunderson K.L. et al. High density DNA methylation array with single CpG site resolution. Genomics. 2011; 98:288–295. [DOI] [PubMed] [Google Scholar]
- 3. Maden S.K., Thompson R.F., Hansen K.D., Nellore A. Human methylome variation across Infinium 450K data on the Gene Expression Omnibus. NAR Genom. Bioinform. 2021; 3:lqab025. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 4. Hawe J.S., Wilson R., Schmid K.T., Zhou L., Lakshmanan L.N., Lehne B.C., Kuhnel B., Scott W.R., Wielscher M., Yew Y.W. et al. Genetic variation influencing DNA methylation provides insights into molecular mechanisms regulating genomic function. Nat. Genet. 2022; 54:18–29. [DOI] [PubMed] [Google Scholar]
- 5. Min J.L., Hemani G., Hannon E., Dekkers K.F., Castillo-Fernandez J., Luijk R., Carnero-Montoro E., Lawson D.J., Burrows K., Suderman M. et al. Genomic and phenotypic insights from an atlas of genetic effects on DNA methylation. Nat. Genet. 2021; 53:1311–1321. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 6. Thompson M., Hill B.L., Rakocz N., Chiang J.N., Geschwind D., Sankararaman S., Hofer I., Cannesson M., Zaitlen N., Halperin E. Methylation risk scores are associated with a collection of phenotypes within electronic health record systems. NPJ Genom Med. 2022; 7:50. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 7. Aref-Eshghi E., Kerkhof J., Pedro V.P., Groupe D.I.F., Barat-Houari M., Ruiz-Pallares N., Andrau J.C., Lacombe D., Van-Gils J., Fergelot P. et al. Evaluation of DNA methylation episignatures for diagnosis and phenotype correlations in 42 mendelian neurodevelopmental disorders. Am. J. Hum. Genet. 2020; 106:356–370. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 8. Li M., Zou D., Li Z., Gao R., Sang J., Zhang Y., Li R., Xia L., Zhang T., Niu G. et al. EWAS Atlas: a curated knowledgebase of epigenome-wide association studies. Nucleic. Acids. Res. 2019; 47:D983–D988. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 9. Battram T., Yousefi P., Crawford G., Prince C., Sheikhali Babaei M., Sharp G., Hatcher C., Vega-Salas M.J., Khodabakhsh S., Whitehurst O. et al. The EWAS Catalog: a database of epigenome-wide association studies. Wellcome Open Res. 2022; 7:41. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 10. Berdasco M., Esteller M. Clinical epigenetics: seizing opportunities for translation. Nat. Rev. Genet. 2019; 20:109–127. [DOI] [PubMed] [Google Scholar]
- 11. Capper D., Jones D.T.W., Sill M., Hovestadt V., Schrimpf D., Sturm D., Koelsche C., Sahm F., Chavez L., Reuss D.E. et al. DNA methylation-based classification of central nervous system tumours. Nature. 2018; 555:469–474. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 12. Li H.T., Xu L., Weisenberger D.J., Li M., Zhou W., Peng C.C., Stachelek K., Cobrinik D., Liang G., Berry J.L. Characterizing DNA methylation signatures of retinoblastoma using aqueous humor liquid biopsy. Nat. Commun. 2022; 13:5523. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 13. Mannens M., Lombardi M.P., Alders M., Henneman P., Bliek J. Further introduction of DNA methylation (DNAm) arrays in regular diagnostics. Front. Genet. 2022; 13:831452. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 14. Haghani A., Li C.Z., Robeck T.R., Zhang J., Lu A.T., Ablaeva J., Acosta-Rodriguez V.A., Adams D.M., Alagaili A.N., Almunia J. et al. DNA methylation networks underlying mammalian traits. Science. 2023; 381:eabq5693. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 15. Ding W., Kaur D., Horvath S., Zhou W. Comparative epigenome analysis using Infinium DNA methylation BeadChips. Brief. Bioinform. 2023; 24:bbac617. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 16. Arneson A., Haghani A., Thompson M.J., Pellegrini M., Kwon S.B., Vu H., Maciejewski E., Yao M., Li C.Z., Lu A.T. et al. A mammalian methylation array for profiling methylation levels at conserved sequences. Nat. Commun. 2022; 13:783. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 17. Moratilla A., Sainz de la Maza D., Cadenas Martin M., Lopez-Iglesias P., Gonzalez-Peramato P., De Miguel M.P. Inhibition of PKCepsilon induces primordial germ cell reprogramming into pluripotency by HIF1&2 upregulation and histone acetylation. Am. J. Stem Cells. 2021; 10:1–17. [PMC free article] [PubMed] [Google Scholar]
- 18. Vasioukhin V., Anker P., Maurice P., Lyautey J., Lederrey C., Stroun M. Point mutations of the N-ras gene in the blood plasma DNA of patients with myelodysplastic syndrome or acute myelogenous leukaemia. Br. J. Haematol. 1994; 86:774–779. [DOI] [PubMed] [Google Scholar]
- 19. Laird P.W. The power and the promise of DNA methylation markers. Nat. Rev. Cancer. 2003; 3:253–266. [DOI] [PubMed] [Google Scholar]
- 20. Lenaerts L., Tuveri S., Jatsenko T., Amant F., Vermeesch J.R. Detection of incipient tumours by screening of circulating plasma DNA: hype or hope?. Acta Clin. Belg. 2020; 75:9–18. [DOI] [PubMed] [Google Scholar]
- 21. Vidaki A., Kayser M. From forensic epigenetics to forensic epigenomics: broadening DNA investigative intelligence. Genome Biol. 2017; 18:238. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 22. Moss J., Magenheim J., Neiman D., Zemmour H., Loyfer N., Korach A., Samet Y., Maoz M., Druid H., Arner P. et al. Comprehensive human cell-type methylation atlas reveals origins of circulating cell-free DNA in health and disease. Nat. Commun. 2018; 9:5068. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 23. Houseman E.A., Molitor J., Marsit C.J. Reference-free cell mixture adjustments in analysis of DNA methylation data. Bioinformatics. 2014; 30:1431–1439. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 24. Gagnon J.F., Sanschagrin F., Jacob S., Tremblay A.A., Provencher L., Robert J., Morin C., Diorio C. Quantitative DNA methylation analysis of laser capture microdissected formalin-fixed and paraffin-embedded tissues. Exp. Mol. Pathol. 2010; 88:184–189. [DOI] [PubMed] [Google Scholar]
- 25. Siegmund K.D., Marjoram P., Woo Y.J., Tavare S., Shibata D Inferring clonal expansion and cancer stem cell dynamics from DNA methylation patterns in colorectal cancers. Proc. Natl. Acad. Sci. U.S.A. 2009; 106:4828–4833. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 26. Hernandez H.G., Sandoval-Hernandez A.G., Garrido-Gil P., Labandeira-Garcia J.L., Zelaya M.V., Bayon G.F., Fernandez A.F., Fraga M.F., Arboleda G., Arboleda H. Alzheimer's disease DNA methylome of pyramidal layers in frontal cortex: laser-assisted microdissection study. Epigenomics. 2018; 10:1365–1382. [DOI] [PubMed] [Google Scholar]
- 27. Aaltonen K.E., Ebbesson A., Wigerup C., Hedenfalk I. Laser capture microdissection (LCM) and whole genome amplification (WGA) of DNA from normal breast tissue — optimization for genome wide array analyses. BMC Res. Notes. 2011; 4:69. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 28. Ahn J., Heo S., Lee J., Bang D Introduction to single-cell DNA methylation profiling methods. Biomolecules. 2021; 11:1013. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 29. Iqbal W., Zhou W. Computational methods for single-cell DNA methylome analysis. Genomics Proteomics Bioinformatics. 2023; 21:48–66. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 30. Smallwood S.A., Lee H.J., Angermueller C., Krueger F., Saadeh H., Peat J., Andrews S.R., Stegle O., Reik W., Kelsey G. Single-cell genome-wide bisulfite sequencing for assessing epigenetic heterogeneity. Nat. Methods. 2014; 11:817–820. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 31. Hui T., Cao Q., Wegrzyn-Woltosz J., O’Neill K., Hammond C.A., Knapp D., Laks E., Moksa M., Aparicio S., Eaves C.J. et al. High-resolution single-cell DNA methylation measurements reveal epigenetically distinct hematopoietic stem cell subpopulations. Stem Cell Rep. 2018; 11:578–592. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 32. Wang T., Loo C.E., Kohli R.M. Enzymatic approaches for profiling cytosine methylation and hydroxymethylation. Mol Metab. 2022; 57:101314. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 33. Aryee M.J., Jaffe A.E., Corrada-Bravo H., Ladd-Acosta C., Feinberg A.P., Hansen K.D., Irizarry R.A. Minfi: a flexible and comprehensive bioconductor package for the analysis of Infinium DNA methylation microarrays. Bioinformatics. 2014; 30:1363–1369. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 34. Zhou W., Triche T.J. Jr, Laird P.W., Shen H SeSAMe: reducing artifactual detection of DNA methylation by Infinium BeadChips in genomic deletions. Nucleic Acids Res. 2018; 46:e123. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 35. Whalley C., Payne K., Domingo E., Blake A., Richman S., Brooks J., Batis N., Spruce R., Consortium S.C., Mehanna H. et al. Ultra-low DNA input into whole genome methylation assays and detection of oncogenic methylation and copy number variants in circulating tumour DNA. Epigenomes. 2021; 5:6. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 36. Moran S., Vizoso M., Martinez-Cardus A., Gomez A., Matias-Guiu X., Chiavenna S.M., Fernandez A.G., Esteller M. Validation of DNA methylation profiling in formalin-fixed paraffin-embedded samples using the Infinium HumanMethylation450 Microarray. Epigenetics. 2014; 9:829–833. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 37. Ghantous A., Saffery R., Cros M.P., Ponsonby A.L., Hirschfeld S., Kasten C., Dwyer T., Herceg Z., Hernandez-Vargas H. Optimized DNA extraction from neonatal dried blood spots: application in methylome profiling. BMC Biotechnol. 2014; 14:60. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 38. Lengner C.J., Camargo F.D., Hochedlinger K., Welstead G.G., Zaidi S., Gokhale S., Scholer H.R., Tomilin A., Jaenisch R. Oct4 expression is not required for mouse somatic stem cell self-renewal. Cell Stem Cell. 2007; 1:403–415. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 39. Prasasya R.D., Caldwell B.A., Liu Z., Wu S., Leu N.A., Fowler J.M., Cincotta S.A., Laird D.J., Kohli R.M., Bartolomei M.S. TET1 Catalytic activity is required for reprogramming of imprinting control regions and patterning of sperm-specific hypomethylated regions. 2023; bioRxiv doi:21 February 2023, preprint: not peer reviewed 10.1101/2023.02.21.529426. [DOI] [PMC free article] [PubMed]
- 40. Gross J.A., Lefebvre F., Lutz P.E., Bacot F., Vincent D., Bourque G., Turecki G. Variations in 5-methylcytosine and 5-hydroxymethylcytosine among human brain, blood, and saliva using oxBS and the Infinium MethylationEPIC array. Biol. Methods Protoc. 2016; 1:1–8. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 41. Solomon O., Macisaac J.L., Tindula G., Kobor M.S., Eskenazi B., Holland N. 5-Hydroxymethylcytosine in cord blood and associations of DNA methylation with sex in newborns. Mutagenesis. 2019; 34:315–322. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 42. Grit J.L., Johnson B.K., Dischinger P.S., C J.E., Adams M., Campbell S., Pollard K., Pratilas C.A., Triche T.J. Jr, Graveel C.R et al. Distinctive epigenomic alterations in NF1-deficient cutaneous and plexiform neurofibromas drive differential MKK/p38 signaling. Epigenetics Chromatin. 2021; 14:7. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 43. Zhou W., Hinoue T., Barnes B., Mitchell O., Iqbal W., Lee S.M., Foy K.K., Lee K.H., Moyer E.J., VanderArk A. et al. DNA methylation dynamics and dysregulation delineated by high-throughput profiling in the mouse. Cell Genom. 2022; 2:100144–100153. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 44. Hill P.W.S., Leitch H.G., Requena C.E., Sun Z., Amouroux R., Roman-Trufero M., Borkowska M., Terragni J., Vaisvila R., Linnett S. et al. Epigenetic reprogramming enables the transition from primordial germ cell to gonocyte. Nature. 2018; 555:392–396. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 45. Shen H., Shih J., Hollern D.P., Wang L., Bowlby R., Tickoo S.K., Thorsson V., Mungall A.J., Newton Y., Hegde A.M. et al. Integrated molecular characterization of testicular germ cell tumors. Cell Rep. 2018; 23:3392–3406. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 46. Gao G.F., Parker J.S., Reynolds S.M., Silva T.C., Wang L.B., Zhou W., Akbani R., Bailey M., Balu S., Berman B.P. et al. Before and after: comparison of legacy and harmonized TCGA Genomic Data Commons' Data. Cell Syst. 2019; 9:24–34. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 47. Bibikova M., Le J., Barnes B., Saedinia-Melnyk S., Zhou L., Shen R., Gunderson K.L. Genome-wide DNA methylation profiling using Infinium(R) assay. Epigenomics. 2009; 1:177–200. [DOI] [PubMed] [Google Scholar]
- 48. Wang L., Zhang J., Duan J., Gao X., Zhu W., Lu X., Yang L., Zhang J., Li G., Ci W. et al. Programming and inheritance of parental DNA methylomes in mammals. Cell. 2014; 157:979–991. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 49. Kaur D., Lee S.M., Goldberg D., Spix N.J., Hinoue T., Li H.-T., Dwaraka V.B., Smith R., Shen H., Liang G. Comprehensive evaluation of the Infinium human MethylationEPIC v2 BeadChip. Epigenetics Commun. 2023; 3:6. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 50. Wong K.K., Stillwell L.C., Dockery C.A., Saffer J.D. Use of tagged random hexamer amplification (TRHA) to clone and sequence minute quantities of DNA–application to a 180 kb plasmid isolated from Sphingomonas F199. Nucleic Acids Res. 1996; 24:3778–3783. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 51. Zhou W., Dinh H.Q., Ramjan Z., Weisenberger D.J., Nicolet C.M., Shen H., Laird P.W., Berman B.P. DNA methylation loss in late-replicating domains is linked to mitotic cell division. Nat. Genet. 2018; 50:591–602. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 52. Rooney J.P., Ryde I.T., Sanders L.H., Howlett E.H., Colton M.D., Germ K.E., Mayer G.D., Greenamyre J.T., Meyer J.N. PCR based determination of mitochondrial DNA copy number in multiple species. Methods Mol. Biol. 2015; 1241:23–38. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 53. Pradhan D., Jour G., Milton D., Vasudevaraja V., Tetzlaff M.T., Nagarajan P., Curry J.L., Ivan D., Long L., Ding Y. et al. Aberrant DNA methylation predicts melanoma-specific survival in patients with acral melanoma. Cancers (Basel). 2019; 11:2031. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 54. Leitch H.G., Nichols J., Humphreys P., Mulas C., Martello G., Lee C., Jones K., Surani M.A., Smith A. Rebuilding pluripotency from primordial germ cells. Stem Cell Rep. 2013; 1:66–78. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 55. Teng H., Sui X., Zhou C., Shen C., Yang Y., Zhang P., Guo X., Huo R. Fatty acid degradation plays an essential role in proliferation of mouse female primordial germ cells via the p53-dependent cell cycle regulation. Cell Cycle. 2016; 15:425–431. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 56. Saitou M., Yamaji M. Primordial germ cells in mice. Cold Spring Harb. Perspect. Biol. 2012; 4:a008375. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 57. Kurimoto K., Saitou M. Germ cell reprogramming. Curr. Top. Dev. Biol. 2019; 135:91–125. [DOI] [PubMed] [Google Scholar]
- 58. Gkountela S., Zhang K.X., Shafiq T.A., Liao W.W., Hargan-Calvopina J., Chen P.Y., Clark A.T. DNA demethylation dynamics in the Human prenatal germline. Cell. 2015; 161:1425–1436. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 59. Rowe H.M., Jakobsson J., Mesnard D., Rougemont J., Reynard S., Aktas T., Maillard P.V., Layard-Liesching H., Verp S., Marquis J. et al. KAP1 controls endogenous retroviruses in embryonic stem cells. Nature. 2010; 463:237–240. [DOI] [PubMed] [Google Scholar]
- 60. Tang W.W., Dietmann S., Irie N., Leitch H.G., Floros V.I., Bradshaw C.R., Hackett J.A., Chinnery P.F., Surani M.A. A unique gene regulatory network resets the Human germline epigenome for development. Cell. 2015; 161:1453–1467. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 61. Lane N., Dean W., Erhardt S., Hajkova P., Surani A., Walter J., Reik W. Resistance of IAPs to methylation reprogramming may provide a mechanism for epigenetic inheritance in the mouse. Genesis. 2003; 35:88–93. [DOI] [PubMed] [Google Scholar]
- 62. Yu F., Zingler N., Schumann G., Stratling W.H. Methyl-CpG-binding protein 2 represses LINE-1 expression and retrotransposition but not Alu transcription. Nucleic Acids Res. 2001; 29:4493–4501. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 63. Klaver B., Berkhout B. Comparison of 5' and 3' long terminal repeat promoter function in human immunodeficiency virus. J. Virol. 1994; 68:3830–3840. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 64. Zhou W., Liang G., Molloy P.L., Jones P.A. DNA methylation enables transposable element-driven genome expansion. Proc. Natl. Acad. Sci. U.S.A. 2020; 117:19359–19366. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 65. Seisenberger S., Andrews S., Krueger F., Arand J., Walter J., Santos F., Popp C., Thienpont B., Dean W., Reik W. The dynamics of genome-wide DNA methylation reprogramming in mouse primordial germ cells. Mol. Cell. 2012; 48:849–862. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 66. Yamaguchi S., Hong K., Liu R., Inoue A., Shen L., Zhang K., Zhang Y. Dynamics of 5-methylcytosine and 5-hydroxymethylcytosine during germ cell reprogramming. Cell Res. 2013; 23:329–339. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 67. Snyder M.W., Kircher M., Hill A.J., Daza R.M., Shendure J. Cell-free DNA comprises an In vivo nucleosome footprint that informs its tissues-of-origin. Cell. 2016; 164:57–68. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 68. Christiansen S.N., Andersen J.D., Kampmann M.L., Liu J., Andersen M.M., Tfelt-Hansen J., Morling N. Reproducibility of the Infinium methylationEPIC BeadChip assay using low DNA amounts. Epigenetics. 2022; 17:1636–1645.
- 69. Abbasi M., Smith A.D., Swaminathan H., Sangngern P., Douglas A., Horsager A., Carrell D.T., Uren P.J. Establishing a stable, repeatable platform for measuring changes in sperm DNA methylation. Clin Epigenetics. 2018; 10:119.
- 70. Watkins S.H., Ho K., Testa C., Falk L., Soule P., Nguyen L.V., FitzGibbon S., Slack C., Chen J.T., Davey Smith G. et al. The impact of low input DNA on the reliability of DNA methylation as measured by the Illumina Infinium MethylationEPIC BeadChip. Epigenetics. 2022; 17:2366–2376.
- 71. Hovestadt V., Remke M., Kool M., Pietsch T., Northcott P.A., Fischer R., Cavalli F.M., Ramaswamy V., Zapatka M., Reifenberger G. et al. Robust molecular subgrouping and copy-number profiling of medulloblastoma from small amounts of archival tumour material using high-density DNA methylation arrays. Acta Neuropathol. 2013; 125:913–916.
- 72. Polz M.F., Cavanaugh C.M. Bias in template-to-product ratios in multitemplate PCR. Appl. Environ. Microb. 1998; 64:3724–3730.
- 73. Mulqueen R.M., Pokholok D., Norberg S.J., Torkenczy K.A., Fields A.J., Sun D., Sinnamon J.R., Shendure J., Trapnell C., O’Roak B.J. et al. Highly scalable generation of DNA methylation profiles in single cells. Nat. Biotechnol. 2018; 36:428–431.
- 74. Lehne B., Drong A.W., Loh M., Zhang W., Scott W.R., Tan S.T., Afzal U., Scott J., Jarvelin M.R., Elliott P. et al. A coherent approach for analysis of the Illumina HumanMethylation450 BeadChip improves data quality and performance in epigenome-wide association studies. Genome Biol. 2015; 16:37.
- 75. van der Velde A., Fan K., Tsuji J., Moore J.E., Purcaro M.J., Pratt H.E., Weng Z. Annotation of chromatin states in 66 complete mouse epigenomes during development. Commun. Biol. 2021; 4:239.
Data Availability Statement
All BeadChip data produced in this study are available through GEO (accession: GSE239290). ELBAR and other informatics for low-input methylation BeadChip data are implemented in SeSAMe (version 1.18.4+), available through Bioconductor (https://doi.org/doi:10.18129/B9.bioc.sesame). ELBAR can also be used in the openSesame workflow by specifying the "I" code in the prep= argument (see the SeSAMe vignette).