Proceedings of the National Academy of Sciences of the United States of America. 2010 Jun 23;107(28):12587–12592. doi: 10.1073/pnas.1005173107

High-throughput method for analyzing methylation of CpGs in targeted genomic regions

Shivani Nautiyal a, Victoria E H Carlton a, Yontao Lu a, James S Ireland a, Diane Flaucher a, Martin Moorhead a, Joe W Gray b, Paul Spellman b, Michael Mindrinos c, Paul Berg d, Malek Faham a,1
PMCID: PMC2906552  PMID: 20616066

Abstract

A unique microarray-based method for determining the extent of DNA methylation has been developed. It relies on a selective enrichment of the regions to be assayed by target amplification by capture and ligation (mTACL). The assay is quantitatively accurate, relatively precise, and lends itself to high-throughput determination using nanogram amounts of DNA. The measurements using mTACL are highly reproducible and in excellent agreement with those obtained by sequencing (r = 0.94). In the present work, the methylation status of >145,000 CpGs from 5,472 promoters in 221 samples was measured. The methylation levels of nearby CpGs are correlated, but the correlation falls off dramatically over several hundred base pairs. In some instances, nearby CpGs have very different levels of methylation. Comparison of normal and tumor samples indicates that in tumors, the promoter regions of genes involved in differentiation and signaling are preferentially hypermethylated, whereas those of housekeeping genes remain hypomethylated. mTACL is a platform for profiling the state of methylation of a large number of CpGs in many samples in a cost-effective fashion, and is capable of scaling to much larger numbers of CpGs than those collected here.

Keywords: array, technology, tumor


In mammals, methylation of cytosines in CpG dinucleotides plays an essential role in normal development; mice that are deficient in DNMT1, the maintenance methyltransferase, die early in development (1). Methylation of promoters is generally associated with gene silencing and is essential for processes such as cell differentiation, imprinting, and X inactivation. Aberrant methylation has been noted in a variety of diseases, including rare imprinting defects and genomic instability syndromes, and in almost all forms of cancer studied to date (2–5).

Despite the importance of DNA methylation, high-resolution analyses of genome-wide methylation patterns have been hampered by a lack of technologies that are quantitative, high-throughput, cost-effective, and both scalable and flexible with respect to coverage. Several existing technologies rely on the use of methylation-sensitive and/or methylation-specific endonucleases or the use of antibodies or other proteins that bind 5-methylcytosine (5mC) to enrich for regions of DNA with a high density of 5mCs. These technologies are generally low resolution and provide an indirect readout of the methylation status relative to the greater-resolution approaches discussed herein (6–12).

Another group of technologies relies on the fact that treatment of DNA with sodium bisulfite causes deamination of unmethylated cytosines (C) to uracils, whereas 5mCs remain unchanged. These differences in reactivity of C's and 5mCs to bisulfite can be distinguished by genotyping or sequencing methodologies. Since the method based on bisulfite treatment followed by DNA sequencing was first reported by Frommer et al. in 1992 (13), it has become the gold standard despite continuing technical challenges. Bisulfite treatment degrades DNA and reduces sequence uniqueness (creating DNA with essentially three bases rather than four), thereby making target amplification (required for most assays) difficult. Recently, whole-genome approaches using next-generation sequencing have been developed. These approaches are capable of obtaining comprehensive assessment on a small number of samples (14, 15). For the purposes of biomarker discovery, cost-effective, high-throughput technologies focused on studying large numbers of samples for a large number of regions are required. To that end, methods that employ complexity reduction before sequencing have recently been described (16, 17). These methods can potentially allow the assessment of many samples by focusing the sequencing on specific segments, but the scalability of these approaches is yet to be demonstrated. Similarly, some array-based genotyping technologies capable of high sample throughput have been adapted for use with bisulfite-converted DNA (18). The number of assayed CpGs, however, has been modest, with ∼27,000 CpGs being the largest panel size tested thus far (19).

We describe here a unique technology to assess methylation by target amplification by capture and ligation (mTACL), based on targeted capture of specific regions of interest and interrogation of the methylation level of the CpGs that they contain. The readout platform we used relies on microarrays; sequencing can also be used if that is deemed to be more cost-effective. Using the mTACL approach, the regions to be analyzed are first captured and ligated to common primers, reacted with bisulfite, amplified, and then analyzed by hybridization of the product to a microarray. Our high-throughput approach allows analysis of hundreds of thousands of CpGs from many samples in parallel.

mTACL's main advantage is that it captures the DNA fragment in an efficient and specific manner. The efficiency is due to the hybridization of the genomic sequence to a long probe sequence (>70 bp) and to the fact that the ligation is efficient and does not compete with other processes. The efficiency of the method translates into a low DNA requirement (<200 ng genomic DNA). Specificity is generated through the requirement of two specific ligation events, one at each end of the captured molecule. An additional aspect of the specificity is that the bisulfite conversion occurs only after the complexity of the target has been reduced to 0.1% of the genome. Thus, problems associated with the sequence degeneracy that is introduced by bisulfite conversion of the entire genome are greatly decreased because only the relevant regions are interrogated.

Using mTACL, we have quantitatively assessed the methylation of 145,148 CpGs from the promoters of 5,472 genes in 221 samples. The high accuracy and precision of mTACL are demonstrated by comparison with direct sequencing of DNA treated with bisulfite, by assaying DNA with known methylation levels, and by repeatability of the measurements. Using this assay, we have examined the DNA sequences surrounding the interrogated CpGs and sought to discern any genomic sequences that might influence the level of methylation. The methylation difference between normal and tumor genomes has also been investigated. Finally, we sought to identify those gene classes whose CpGs are differentially methylated in normal and tumor samples.

Results

Assay Overview.

The basic mTACL procedure relies on a step in which preselected regions of the genome (the targets) are captured to reduce the complexity of the genomic DNA to be analyzed (Fig. 1). The capture is achieved using segments of DNA complementary to the targets, except that all of the thymidines (T) have been substituted with uridines (U). These “dU capture probes” (dU probes) contain sequences complementary to the targets to be analyzed, flanked by two common regions shared by all dU probes. The common regions match PCR primers used later for amplification. For the current mTACL panel, the dU probes were designed for fragments that would result from the digestion of genomic DNA with MspI and HpyF3I endonucleases, though any other combination of endonucleases could be used. We denatured 200 ng of digested genomic DNA in the presence of ∼19,250 dU probes and adapter oligonucleotides (oligos) complementary to the common regions in the dU probes. All of the C's in the adapter oligonucleotides are substituted with 5mC. A “touchdown-annealing” protocol was used to hybridize the genomic DNA with the dU probes and the adapter oligos, and the adapter oligos were then ligated to the ends of the genomic DNA. After ligation, the original dU probes were removed by digestion with uracil DNA-glycosylase, leaving only the target genomic DNA with the common primers ligated to its ends. Next, the target DNA was treated with bisulfite to convert unmethylated cytosines to uracils. The bisulfite-treated DNA was then amplified using primers homologous to the regions common to all dU probes and hybridized to a microarray. Because the target sequences that are amplified and hybridized on the array represent only 0.1% of the genome, the potential for cross-hybridization is greatly reduced. Because the adapter sequences contain 5mC and are therefore not converted by bisulfite, the amplification primers are highly specific to the adapter sequences.
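The sequence-level effect of the steps described above can be mimicked in silico. The following is a minimal sketch and not the authors' software: the function names and the convention of recording methylated cytosines as a set of 0-based positions are assumptions made purely for illustration.

```python
from typing import Set


def make_du_probe(target: str) -> str:
    """Return the dU form of a probe strand: every T replaced by U."""
    return target.upper().replace("T", "U")


def bisulfite_convert(strand: str, methylated: Set[int]) -> str:
    """Simulate bisulfite treatment of one DNA strand.

    Unmethylated C is deaminated to U. A C whose 0-based index is in
    `methylated` (a hypothetical bookkeeping convention) stays C.
    """
    return "".join(
        "U" if base == "C" and i not in methylated else base
        for i, base in enumerate(strand.upper())
    )


def read_after_pcr(converted: str) -> str:
    """During amplification, U is copied as T."""
    return converted.replace("U", "T")


target = "ACGTTCGA"                         # toy target, CpGs at indices 1 and 5
converted = bisulfite_convert(target, {1})  # only the first CpG is methylated
print(make_du_probe(target))      # ACGUUCGA
print(converted)                  # ACGTTUGA
print(read_after_pcr(converted))  # ACGTTTGA
```

An adapter in which every C is 5mC corresponds to passing all of its C positions as methylated, so the function returns the adapter unchanged; this mirrors why the amplification primers still match the adapters after conversion.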

Fig. 1.


Assay overview. (A) dU probes are tools for the targeted capture of genomic loci. An individual probe consists of a double-stranded DNA molecule in which all of the T's in the sequence have been substituted with dU. For simplicity, the figure shows only the strand that is used for capture. The central part of the probe, shown in magenta, contains a targeting region that corresponds exactly to a genomic locus of interest. At the ends of the molecule there are two distinct common regions, shown in green and yellow; these direct the ligation of adapters to the genomic target. Within a panel of dU probes, the target region varies, but the common regions are the same. (B) Scheme for determining methylation using the dU probes and bisulfite treatment. The genomic DNA is digested and combined with a dU probe panel and adapter primers that correspond in sequence to common regions 1 and 2, shown in peach and cyan, respectively. Note that the adapter oligos are methylated at all C residues, and the adapter corresponding to common region 2 is phosphorylated. For ease of visualization, a line has been drawn between the common region and the targeting regions of the dU probe. The targeting region of each dU probe contains a sequence that corresponds exactly to the restriction fragment of interest. The mixture is denatured and allowed to anneal. In some cases, structures will form where the adapter oligos and target hybridize perfectly (with no gaps between junctions) to the dU probe. In such cases, the adapters can be ligated onto the target and thus captured. Only the forward strand of the genomic DNA target is captured, because the adapters correspond to the sequence from the forward strand of the dU probe. Once ligation has taken place, the sample is incubated with uracil DNA glycosylase followed by heat treatment to destroy the dU probe. The captured molecules are treated with bisulfite and amplified by PCR. 
The amplified material is fragmented, labeled, and hybridized to a microarray that detects whether C residues at CpG positions were converted to U or remained C.

The dU probes (70–350 bp in size) were designed to capture sequences from ∼19,250 regions (comprising ∼3 Mb in total). These regions were chosen to span the transcription start sites (1.5 kb upstream and downstream) of 5,472 genes potentially relevant in tumorigenesis (see SI Text for gene selection criteria). Some characteristics of the panel are shown in Fig. S1. The designed dU probes contained approximately 170,000 CpGs.

The microarray contains 21-mer probes that span across all of the CpGs in the target DNA. Each probe has two versions, one where the CpG(s) spanned by the probe are methylated, and one corresponding to the sequence in which CpG(s) spanned by the probe are unmethylated; this pair is collectively referred to as a probe set. The relative signal of the two probes is used to infer the extent of CpG methylation. Each CpG in the target sequence is assayed by at least three different probe sets.

Methylation Estimation.

To translate hybridization signals into methylation values, we analyzed artificial samples with known levels of methylation (0%, 10%, 25%, 50%, 75%, and 100%). Logistic regression was used to fit a model relating relative probe signal to percentage methylation for each probe set, and goodness of fit was assessed with r2. The level of methylation for each CpG was determined by using an r2-weighted average of the values for each of the probe sets that covered a CpG. We also used r2 values to limit the panel to CpGs for which we had high confidence in the estimate of methylation levels. After eliminating underperforming CpGs, ∼145,000 CpGs could be reliably assayed. The algorithm is described in more detail in SI Text. We performed 221 mTACL assays on 194 unique samples. A list of the samples used in all of the mTACL assays is shown in Table S1.
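The combination step described above can be sketched as follows. This is an illustration under stated assumptions, not the published algorithm (which is in SI Text): the logistic form with intercept `a` and slope `b`, the function names, and the example numbers are all hypothetical.

```python
import math


def methylation_from_signal(rel_signal, a, b):
    """Logistic curve mapping a relative probe signal to a methylation fraction.

    `a` and `b` stand for per-probe-set parameters that would be fitted on the
    calibration samples (0%, 10%, 25%, 50%, 75%, and 100% methylated).
    """
    return 1.0 / (1.0 + math.exp(-(a + b * rel_signal)))


def r2_weighted_methylation(estimates, r2_values):
    """Combine the per-probe-set estimates for one CpG, weighting each by its r2."""
    total = sum(r2_values)
    if total == 0:
        return None  # no probe set carries any weight for this CpG
    return sum(m * w for m, w in zip(estimates, r2_values)) / total


# Three probe sets covering the same CpG (illustrative values only).
estimates = [0.20, 0.30, 0.40]
r2_values = [0.9, 0.6, 0.3]
print(round(r2_weighted_methylation(estimates, r2_values), 3))  # 0.267
```

Probe sets whose calibration fit was poor contribute little to the final value, which is the intent of weighting by r2.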

Evaluation of mTACL Performance.

CpG methylation values using the mTACL procedure were validated by their high repeatability, by the ability to distinguish different levels of methylation in artificial samples, and by their concordance with those obtained by sequencing. Additional validation was obtained by measurements of CpG methylation in the X chromosome (Fig. S2).

Repeatability.

The reproducibility of the assay was determined by comparing the results from replicates of the same DNA, each processed separately. Thus, the repeatability reflects the precision of the entire assay: target capture, bisulfite conversion, amplification, labeling, and hybridization to the microarrays. Nineteen samples in the study were run in duplicate, and four samples were run in triplicate. The average Pearson correlation coefficient (r) of autosomal CpGs in replicates was 0.987 ± 0.006.
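For readers who want to reproduce this kind of repeatability measure on their own data, the Pearson coefficient between two replicate vectors of CpG methylation values can be computed as in the minimal sketch below. The function name and the toy values are illustrative assumptions.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of methylation values."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Methylation fractions of the same five CpGs in two replicate assays (toy data).
replicate_1 = [0.05, 0.10, 0.50, 0.80, 0.95]
replicate_2 = [0.07, 0.12, 0.45, 0.85, 0.90]
print(round(pearson_r(replicate_1, replicate_2), 3))
```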

Ability to Distinguish Different Methylation Levels Using ROC Analysis.

The ability to distinguish different levels of CpG methylation was tested by determining the frequency of false positive vs. true positive methylation-status calls using artificial samples of known 5mCpG content. For example, the extent of CpG methylation in DNA lacking any methylation was compared with DNA containing 25% CpG methylation.

Ideally, by comparing the array probes complementary to the methylated and unmethylated sequences, a relative signal threshold can be determined that differentiates CpGs in the 0% and 25% methylated samples with no misassignment. In reality, different choices of the relative signal threshold result in a tradeoff between sensitivity (how completely the 25% CpG methylated samples are assigned to a high methylation state) and specificity (how frequently the 0% CpG methylated samples are incorrectly assigned a high methylation state). The tradeoff can be depicted in a receiver operating characteristic (ROC) plot, which depicts a true positive rate vs. a false positive rate; each point on the curve corresponds to a different choice of relative signal threshold. ROC plots for mTACL (Fig. S3) show that the assay has excellent ability to discriminate between different methylation states.
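The threshold sweep behind an ROC plot can be sketched as follows. This is a generic illustration, not the procedure in SI Text; the function names and the toy signal values are assumptions.

```python
def roc_points(negatives, positives):
    """Return (false positive rate, true positive rate) pairs, one per threshold.

    negatives: relative signals of CpGs in the 0% methylated sample
    positives: relative signals of CpGs in the 25% methylated sample
    A CpG is called "methylated" when its signal is >= the threshold.
    """
    thresholds = sorted(set(negatives) | set(positives), reverse=True)
    points = [(0.0, 0.0)]  # threshold above every signal: nothing is called
    for t in thresholds:
        fpr = sum(s >= t for s in negatives) / len(negatives)
        tpr = sum(s >= t for s in positives) / len(positives)
        points.append((fpr, tpr))
    return points


def area_under_curve(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


negatives = [0.05, 0.10, 0.20, 0.30]  # toy relative signals, 0% sample
positives = [0.25, 0.40, 0.50, 0.60]  # toy relative signals, 25% sample
print(area_under_curve(roc_points(negatives, positives)))  # 0.9375
```

An area of 1.0 would mean that some threshold separates the two samples with no misassignment; 0.5 would mean no discrimination.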

Comparison with Sequencing.

Bisulfite conversion followed by high-coverage sequencing is generally regarded as the gold standard in assessing CpG methylation levels (13). Hence, we assessed the accuracy of mTACL results by comparing them with those obtained by sequencing of the bisulfite-treated DNA using a 454 Life Sciences sequencer. Details of the 454 sequencing are described in SI Text.

To capture DNA for sequencing, we used 383 dU probes (a subset of the mTACL panel) rather than the full ∼19,250-probe panel to ensure sequence coverage sufficiently high to quantitatively measure methylation levels. The 454 sequencing results were compared with the mTACL data generated with the full dU probe panel. Two DNA samples were tested (both from Coriell cell lines, one from a HapMap sample, and one from an individual with an X-chromosome aneuploidy), and the analysis was limited to autosomal CpGs for which there was high sequence coverage (≥100 reads). The results are in high agreement (Fig. 2). The two mTACL repeats with NA06991 DNA had Pearson correlations (r) with the sequencing data of 0.934 and 0.916; the corresponding r values for the two mTACL measurements of NA06061 DNA were 0.964 and 0.958.

Fig. 2.


Comparison of the state of methylation between mTACL and 454 sequencing. Each point represents data from a single CpG. The y axis shows the level of methylation of a particular CpG as determined by 454 sequencing of bisulfite-treated DNA, and the x axis shows the methylation level determined by mTACL.

Correlation of Methylation Levels over Distance.

Often, measurements of methylation are restricted to one or a few CpGs in a promoter, presumably on the supposition that nearby CpGs share methylation status (18, 20, 21). Our data from a large number of CpGs in many samples offer an opportunity to determine the extent to which CpG methylation status is shared as a function of the distance between CpGs. Accordingly, the methylation levels of all possible pairs of CpGs within 5 kb of each other were compared for each of the different samples. We found that there was a high correlation (r) in the methylation state for neighboring CpGs (<50 bp), but this correlation dropped quickly with distance (Fig. 3 shows data averaged for each of the four DNA types). The correlation values as a function of distance between CpGs were fit to an exponential function, and a correlation length (the distance over which the correlation drops by a factor of e) was determined for each of the DNA types. Interestingly, the correlation length was similar for the different DNA types, but that for the normal cell lines was consistently lower. Inspection of the cell line data on a sample-by-sample basis revealed that the lower correlation is not due to a few cell lines with a low correlation.

Fig. 3.


Correlation of methylation level with distance between CpGs. The x axis shows the distance between two CpGs, and the y axis shows the correlation (r) in methylation level. The four curves shown are for 72 normal samples (purple), 37 tumor samples (green), 47 normal cell lines (blue), and 61 tumor cell lines (red). Each curve was fit to an exponential function, and the following fits were obtained. For tumor samples: average methylation = 0.89·e^(−0.0008·distance) with r2 = 0.999; for normal samples: average methylation = 0.94·e^(−0.0009·distance) with r2 = 0.989; for normal cell lines: average methylation = 0.85·e^(−0.001·distance) with r2 = 0.997; for tumor cell lines: average methylation = 0.86·e^(−0.0009·distance) with r2 = 0.998. From these fits we calculated correlation lengths (the distance over which the correlation drops by a factor of e) of 1,190, 1,060, 856, and 1,100 bp for tumor samples, normal samples, normal cell lines, and tumor cell lines, respectively.
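For a fit of the form A·e^(−k·distance), the correlation length is simply 1/k. The sketch below fits such a curve by linear least squares on the logarithm of the correlation; this fitting method is an assumption chosen for simplicity, as the legend does not state how the published fits were obtained, and the function names are illustrative.

```python
import math


def fit_exponential(distances, correlations):
    """Fit r = A * exp(-k * d) by ordinary least squares on ln(r).

    Returns (A, k). All correlations must be positive.
    """
    n = len(distances)
    logs = [math.log(r) for r in correlations]
    mean_d = sum(distances) / n
    mean_l = sum(logs) / n
    sxx = sum((d - mean_d) ** 2 for d in distances)
    sxy = sum((d - mean_d) * (l - mean_l) for d, l in zip(distances, logs))
    slope = sxy / sxx
    amplitude = math.exp(mean_l - slope * mean_d)
    return amplitude, -slope


def correlation_length(k):
    """Distance over which the fitted correlation falls by a factor of e."""
    return 1.0 / k


# Synthetic points generated from the rounded tumor-sample fit in the legend.
distances = [0, 500, 1000, 2000, 4000]
correlations = [0.89 * math.exp(-0.0008 * d) for d in distances]
amplitude, k = fit_exponential(distances, correlations)
print(round(amplitude, 2), round(k, 4), round(correlation_length(k)))  # 0.89 0.0008 1250
```

The rounded coefficient 0.0008 gives 1,250 bp rather than the reported 1,190 bp, presumably because the coefficients printed in the legend are rounded while the reported lengths were computed from the unrounded fits.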

When the correlation analysis was limited to CpGs outside of CpG islands, a similar pattern was seen, though with far fewer data points.

We often found cases in which nearby CpGs had very different methylation values—a phenomenon we term methylation discontinuity. One such example is shown in Fig. 4, where CpGs that are less than 20 base pairs apart have dramatic differences in methylation; two mTACL measurements are shown as green diamonds. The methylation discontinuity in these cases was confirmed by 454 sequencing (purple diamonds in Fig. 4).

Fig. 4.


Methylation levels can change dramatically over small distances. The x axis indicates distance from the first CpG in the region of interest, and the y axis the estimated methylation levels. The green diamonds represent the two mTACL repeats of NA06991; purple diamonds show the methylation values obtained from 454 sequencing data.

We examined the proportion of CpGs that show this discontinuity using criteria defined in Methods, and found that discontinuity is uncommon but not rare; in normal samples, an average of 1% of nearby CpGs showed discontinuity in at least one sample. Significantly more CpGs showed discontinuity in the other DNA types (P values all <1e−5): 1.6% in tumors, 2.9% for normal cell lines, and 3.6% for tumor-derived cell lines. Because most of the CpGs that are in close proximity are in segments of high CpG density, the average data described mainly pertain to these CpG islands. The 5% of CpG pairs that were outside of CpG islands had a higher rate of discontinuity in methylation (6.5%) than CpG pairs that were within islands (2.0%). When one of the CpGs was in a segment of high CpG density and the other CpG was outside it, the rate of discontinuity was 7.5%.

The biological significance of the methylation discontinuity remains to be elucidated, and further studies using high-resolution techniques such as mTACL are warranted.

CpGs Differentially Methylated in Tumor and Normal Samples.

It is well established that there are widespread differences in CpG methylation between normal and tumor cell DNA (4, 22). We determined the extent of those differences by examining the levels of methylation in fresh colon samples from 29 matched normal/tumor pairs. To determine the significance of any differences between normal and tumor colon CpGs, we used a t test assuming unequal variance. We observed significant differences among a substantial number of CpGs. Over half the CpGs reached a threshold of P < 0.05, whereas only 5% would be expected to do so by chance. One gene, RUNX3, was highlighted in our data (Fig. 5) as it contained 7 of the 10 most significantly differentially methylated CpGs. RUNX3 has been reported to be hypermethylated in tumors (23), but our analysis showed that the differences in methylation were limited to a portion of the promoter for the p44 isoform (the p46 isoform promoter is methylated in both normal and tumor colon samples) and that the degree of differential methylation is CpG specific.
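A t test assuming unequal variance is commonly known as Welch's t test. The sketch below computes its test statistic and the Welch–Satterthwaite degrees of freedom for one CpG; the function name and the toy values are assumptions. Converting the statistic to a P value requires the cumulative t distribution, which is left to a statistics library.

```python
import math
from statistics import mean, variance


def welch_t(tumor, normal):
    """Welch's t statistic and degrees of freedom for two groups of values."""
    var_t = variance(tumor) / len(tumor)    # squared standard error, tumor
    var_n = variance(normal) / len(normal)  # squared standard error, normal
    t_stat = (mean(tumor) - mean(normal)) / math.sqrt(var_t + var_n)
    df = (var_t + var_n) ** 2 / (
        var_t ** 2 / (len(tumor) - 1) + var_n ** 2 / (len(normal) - 1)
    )
    return t_stat, df


# Methylation fractions of one CpG in three tumor and three normal samples (toy data).
tumor_values = [0.6, 0.7, 0.8]
normal_values = [0.1, 0.2, 0.3]
t_stat, df = welch_t(tumor_values, normal_values)
print(round(t_stat, 3), round(df, 1))  # 6.124 4.0
```

A positive statistic here indicates higher methylation in the tumor group; in practice the test would be run once per CpG across the 29 pairs.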

Fig. 5.


Differential tumor and normal methylation in RUNX3. Heat map of the levels of methylation in tumor and normal samples for RUNX3 (green is low methylation, red is high). CpGs are shown sorted by position along the y axis, and samples are shown along the x axis. CpGs for two promoters of RUNX3 are displayed.

Sequence Parameters Affecting Methylation Levels.

We also sought to determine if there is a relationship between the level of CpG methylation and the surrounding sequences using a logistic regression model (SI Text). The two most important variables were the local CpG density and the overall density of CpGs in the promoter—that is, whether the promoters are in the high or low CpG category as defined by Saxonov et al. (24). [Promoters are classified as high CpG or low CpG based on the observed vs. expected CpG density in the 3 kb surrounding the transcription start site (TSS).] Our findings show that the average level of methylation in promoters with high CpG content is significantly lower than in promoters with low CpG content (Fig. S4A). Methylation levels also vary dramatically with local CpG density (defined as 100 bp on each side of the CpG); in both normal and tumor samples, the levels of methylation decreased with higher CpG density (Fig. S4B).
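The two sequence variables can be computed directly from a promoter sequence, as in the sketch below. Two points are assumptions: the observed/expected ratio uses the widely used formula (number of CpGs × length) / (number of C × number of G), which may differ from the exact definition in ref. 24, and the local count includes the central CpG itself. Function names are illustrative.

```python
def count_cpg(seq):
    """Number of CpG dinucleotides (a C immediately followed by a G)."""
    return seq.upper().count("CG")


def local_cpg_density(seq, cpg_index, flank=100):
    """CpGs within `flank` bp on each side of the CpG whose C is at cpg_index.

    The central CpG itself is included in the count (an assumption).
    """
    start = max(0, cpg_index - flank)
    end = min(len(seq), cpg_index + 2 + flank)
    return count_cpg(seq[start:end])


def observed_over_expected(seq):
    """Observed/expected CpG ratio: (#CpG * length) / (#C * #G)."""
    s = seq.upper()
    n_c, n_g = s.count("C"), s.count("G")
    if n_c == 0 or n_g == 0:
        return 0.0
    return count_cpg(s) * len(s) / (n_c * n_g)


seq = "ACGCGTTACG"  # toy sequence with CpGs at indices 1, 3, and 8
print(count_cpg(seq))                        # 3
print(local_cpg_density(seq, 3, flank=2))    # 2
print(round(observed_over_expected(seq), 2))  # 3.33
```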

Gene Function and Differential Methylation Between Tumors and Normal Tissue.

We investigated whether the promoters of certain classes of genes are more likely than others to be hypermethylated in tumor compared with normal tissue. With fresh colon samples (29 matched normal/tumor pairs), we identified the top 200 hypermethylated promoters as described in Methods.

These 200 genes were classified according to their gene ontology (GO) category; GO provides a standardized nomenclature to describe the molecular function, biological process, and cellular location of their gene products (24). If hypermethylated genes were randomly distributed among the different GO categories, then the proportion of hypermethylated genes in the different GO categories would be the same. However, we sought GO categories that have either above or below the expected number of hypermethylated genes. Using a hypergeometric test (described in Methods), we observed that hypermethylated genes were not randomly distributed among GO categories. Many GO categories are highly enriched, and others are underrepresented (the top 20 GO categories are shown in Table 1; a complete list with P values below 0.001 is presented in Table S2).
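The enrichment question can be phrased as a hypergeometric tail probability: given N genes on the panel, K of which belong to a GO category, how likely is it that at least k of the n hypermethylated genes fall in that category by chance? The sketch below is a minimal stand-alone illustration of that calculation, not the GOstats implementation; the category size and observed count in the example are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(N genes, K in category, n drawn)."""
    total = comb(N, n)
    upper = min(K, n)
    favorable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favorable / total


# 5,472 genes on the panel and 200 hypermethylated genes come from the text;
# the category size (300) and the observed overlap (30) are made-up values.
# By chance one would expect about 200 * 300 / 5472, i.e. roughly 11 genes.
p_over = hypergeom_upper_tail(30, 5472, 300, 200)
print(p_over)
```

Testing for underrepresentation uses the opposite tail, P(X <= k), which equals 1 − `hypergeom_upper_tail(k + 1, N, K, n)`.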

Table 1.

Top 20 Gene Ontology categories significantly over- or underenriched for hypermethylated genes in colon tumors

GO ID | GO type | GO category | P value, colon | Odds ratio, colon | P value, breast | Odds ratio, breast
Categories overenriched for hypermethylated genes
GO:0004888 | MF | Transmembrane receptor activity | 4E-16 | 5.0 | 2E-04 | 2.2
GO:0016021 | CC | Integral to membrane | 3E-15 | 4.2 | 2E-03 | 1.8
GO:0031224 | CC | Intrinsic to membrane | 3E-15 | 4.2 | 2E-03 | 1.8
GO:0005887 | CC | Integral to plasma membrane | 5E-15 | 4.6 | 2E-03 | 1.9
GO:0031226 | CC | Intrinsic to plasma membrane | 8E-15 | 4.5 | 3E-03 | 1.9
GO:0004872 | MF | Receptor activity | 1E-14 | 4.3 | 1E-04 | 2.1
GO:0044459 | CC | Plasma membrane part | 2E-14 | 4.1 | 2E-03 | 1.8
GO:0004930 | MF | G protein coupled receptor activity | 3E-14 | 6.2 | |
GO:0044425 | CC | Membrane part | 3E-12 | 3.5 | 4E-03 | 1.7
GO:0007268 | BP | Synaptic transmission | 3E-11 | 7.3 | |
GO:0019226 | BP | Transmission of nerve impulse | 4E-11 | 6.8 | |
GO:0007186 | BP | G protein coupled receptor protein signaling pathway | 3E-10 | 3.9 | |
GO:0005886 | CC | Plasma membrane | 4E-10 | 3.1 | |
GO:0007187 | BP | G protein signaling, coupled to cyclic nucleotide second messenger | 1E-09 | 7.4 | |
GO:0032501 | BP | Multicellular organismal process | 1E-09 | 2.9 | 1E-06 | 2.3
Categories underenriched for hypermethylated genes
GO:0005622 | CC | Intracellular | 6E-13 | 0.3 | 3E-07 | 0.4
GO:0044424 | CC | Intracellular part | 2E-11 | 0.3 | 1E-06 | 0.4
GO:0044238 | BP | Primary metabolic process | 5E-11 | 0.3 | 1E-03 | 0.6
GO:0008152 | BP | Metabolic process | 9E-10 | 0.3 | 2E-03 | 0.6
GO:0043170 | BP | Macromolecule metabolic process | 1E-09 | 0.3 | 2E-03 | 0.6

BP, biological process; MF, molecular function; CC, cellular component.

Because tumors may have a different cellular composition than normal tissues (changes in angiogenesis, immune cells, etc.), these enriched GO categories may reflect changes in the cellular composition rather than differences between tumor and normal colon cells. To distinguish between these two possibilities, we assessed methylation differences in cell lines. In this data set, we compared 56 breast cancer cell lines to eight normal breast cell lines. Using the same methodology, we identified GO categories that are over- or underenriched for hypermethylated genes in the breast cancer cell lines. Despite the differences in tissue and sample type (colon vs. breast and fresh vs. cell line), many of the significant GO categories that are enriched in the colon tumors were also significantly enriched in the breast cancer cell lines. Moreover, the representation in various GO categories is in the same direction as in the colon cancers (Table 1 and Table S2). Specifically, of the 20 most significant GO categories in colon cancers, 14 were also significantly represented in breast cancer cell lines (at P < 5 × 10^−3). Generally, hypermethylated genes are enriched in GO categories whose genes are present in cellular membranes and/or are involved in cell–cell interactions and functions involving the extracellular matrix. GO categories associated with basic intracellular processes (i.e., housekeeping genes) tended to have fewer than expected hypermethylated genes.

Promoters with high or low CpG content are not randomly distributed among the various GO categories (25). It was important to evaluate whether our findings regarding enrichment of hypermethylated genes in certain GO categories are due to this confounding factor. Therefore, the same GO analysis was repeated with only the high CpG genes. Essentially the same significant GO categories as those shown in Table 1 were obtained with this restricted gene set, confirming that the findings are not due to this confounding factor. Similar results were seen when the analysis was repeated using only those CpGs within the 500 bp surrounding the TSS.

Discussion

Studies of methylation and other epigenetic phenomena hold enormous promise in expanding our knowledge of normal development and disease but are hampered by technological challenges. Numerous tools exist to facilitate high-density genetic studies; for instance, commercial products exist to examine the state of hundreds of thousands of SNPs and to evaluate the copy number variation throughout the genome. Additional tools are needed to study which epigenetic changes, particularly variations in CpG methylation, influence gene expression, and how they do so.

To that end, we have developed a unique assay, mTACL, which can measure the state of methylation of large numbers of CpGs in genomic DNA. The current assay interrogates >145,000 CpGs in 5,472 genes. As currently practiced, the mTACL method is capable of high CpG coverage and high throughput. Both ROC curve analysis and comparison with sequencing following bisulfite treatment indicate that the precision and accuracy of the assay are sufficiently robust. The measured concordance between array and sequencing readouts is lower than that for replicate assays with arrays as a readout. The decreased concordance is presumably partly due to systematic bias in the array detection (and potentially in the sequencing readout). This bias may be due to cross-hybridization or to imperfection of the model that predicts methylation from the relative signal strength of the methylated and unmethylated probes.

One of the challenges of the mTACL methodology lies in the construction of dU probe panels. The current panel was manufactured by performing >19,000 single-plex PCR reactions, which were subsequently pooled. Thus, the procedure is labor intensive and costly, making it impractical for construction of larger and custom panels. This barrier can be overcome by using parallel oligo synthesis technologies to produce virtually unlimited numbers of oligos of lengths up to 100 nucleotides as a pool, which is subsequently amplified (26). Having such shorter oligos designed to be complementary to the two ends of the DNA segment would enable much longer segments of genomic DNA to be captured. We have made dU probe panels in this format with individual probe lengths of 100 bp and shown that these can be used to capture targets 70–350 bases long. No significant difference in performance was observed when comparing the full-length probes to the 100-mer probes in the mTACL assay (Fig. S5). Oligo pools therefore represent an inexpensive means for constructing large and custom dU probe panels and greatly improve the flexibility of the assay with respect to coverage. Using pools of such 100-mers should facilitate the scale-up of the number of CpGs assessed in one assay by an order of magnitude.

The number of samples and breadth of coverage that could be achieved with mTACL permit several generalizations regarding methylation. First, though methylation levels of nearby CpGs tend to be highly correlated, this correlation drops dramatically over a few hundred base pairs (the scale of a promoter). Moreover, in some instances, the methylation of adjacent CpGs is decidedly disparate. Thus, methods with low resolution or those testing only a few CpGs per promoter may provide an incomplete picture of the methylation state of a promoter. Second, although there is no striking sequence feature that correlates with whether a CpG is methylated, the density of CpGs has a strong influence. Third, the number of genes that are differentially methylated between tumors and normal tissues greatly exceeds random expectation. In particular, genes generally associated with cell–cell communication and regulation tend to be hypermethylated in tumors, whereas the methylation state of CpGs in genes involved in basic metabolic processes is more likely to remain unchanged. Inasmuch as increased methylation tends to silence gene expression, methylation of genes involved in differentiation and signaling may contribute to the maintenance of the tumor state. More detailed and in-depth comparisons of the levels of CpG methylation in normal and tumor cells are needed to determine whether the differences are associated with or causative of the tumor state.

Though the present work focused on CpGs in nominal promoter regions, there is considerable evidence that other regions are capable of influencing the temporal and tissue specificity of gene expression (27). Because these regions are better defined, they provide suitable targets for determining if CpG methylation plays a role.

We note that two groups have reported different implementations of padlock probes for targeted capture of bisulfite-treated DNA for methylation analysis (16, 17). Our assay has several significant advantages over these methodologies. First, because the TACL procedure captures the DNA before bisulfite conversion, we are not limited by considerations of the methylation status in our capture probe design. Second, the circularization step required by padlock probes reduces capture efficiency, particularly as the size of the captured sequence gets larger. As a result, our method is more efficient (reflected in a lower DNA requirement) and more flexible while maintaining the high level of specificity provided by two precise ligation events. In addition, dU probe generation is much simpler than generation of a padlock probe, as it requires only a PCR step. Using oligo pools, no enzymatic steps are required after the PCR, making probe production very scalable. As such, we believe that mTACL is an efficient, versatile, and scalable technology for targeted capture and assessment of DNA methylation levels.

Methods

Please see SI Text for details of the methods, including gene selection, dU probe panel construction, the methylation assay, generation of the calibration samples, the methylation level calling algorithm, the samples used, the generation of the ROC curves, the generation of 454 data, and the search for association between methylation level and surrounding sequence characteristics.

Gene Ontology Analysis of Genes Hypermethylated in Tumors.

To identify genes hypermethylated in tumors, we defined the methylation level of a gene as the median methylation value of all CpGs in the gene. We compared these values in both sample sets (matched normal/tumor colon pairs and breast tumor cell lines vs. normal breast cell lines) with a t test and looked at average methylation levels for each sample type. We first identified genes that showed at least a 2-fold increase in methylation levels in tumors, and then picked the 200 most significantly hypermethylated genes in each comparison. We used the GOstats package in R to perform hypergeometric testing to find GO categories in which hypermethylated genes were under- or overrepresented.
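
The selection and enrichment steps described above can be illustrated with a minimal Python sketch. It is not the GOstats implementation used in this work: the function names, the dictionary layout, and the decision to skip genes whose mean normal level is zero are assumptions made for illustration, and the t test used to rank genes by significance is omitted.

```python
from math import comb
from statistics import median


def gene_level(cpg_values):
    """Gene-level methylation: median over all CpGs assigned to the gene."""
    return median(cpg_values)


def hypermethylated(normal, tumor, min_fold=2.0):
    """Genes whose mean tumor level is at least min_fold times the mean normal level.

    normal, tumor: dict mapping gene -> list of gene-level values (one per sample).
    Genes with a mean normal level of zero are skipped (an assumption of this sketch).
    """
    hits = []
    for gene in normal:
        n_mean = sum(normal[gene]) / len(normal[gene])
        t_mean = sum(tumor[gene]) / len(tumor[gene])
        if n_mean > 0 and t_mean / n_mean >= min_fold:
            hits.append(gene)
    return hits


def hypergeom_upper_tail(k, K, n, N):
    """P(X >= k) for a hypergeometric draw.

    N: genes assayed, K: assayed genes in the GO category,
    n: selected (hypermethylated) genes, k: selected genes in the category.
    """
    upper = min(K, n)
    favorable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favorable / comb(N, n)
```

A small upper-tail probability suggests that the category is overrepresented among the selected genes; the corresponding lower tail, P(X <= k), would be used to test for underrepresentation.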

Methylation Correlation.

Methylation correlations were individually calculated for each sample and then averaged. Specifically, we found all pairs of CpGs within 5 kb of each other, sorted the pairs by distance, and then divided the data into 25 bins, each containing the same number of CpG pairs. For each bin, we found the median distance between CpG pairs and the correlation (Pearson) of CpG methylation levels for the two CpGs in a pair. For each sample, we determined what percentage of close CpG pairs (separated by 100 bp or less) have discontinuous methylation levels (absolute difference in methylation level >50%, or absolute difference in methylation level >30% and fold change in methylation >3×).
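
The binning and discontinuity criteria above can be sketched for a single sample as follows. This is an illustrative reconstruction rather than the code used in this work: the function names are hypothetical, methylation levels are assumed to be fractions between 0 and 1 (so 50% is 0.5), CpGs are assumed to lie on one chromosome, and a pair whose lower level is zero is treated as having an infinite fold change.

```python
from math import sqrt
from statistics import median


def pearson(xs, ys):
    """Pearson correlation; returns NaN if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / sqrt(sxx * syy)


def binned_correlation(positions, levels, max_dist=5000, n_bins=25):
    """Return (median distance, Pearson r) per bin for one sample.

    positions: CpG coordinates on one chromosome; levels: methylation per CpG.
    Pairs within max_dist are sorted by distance and split into n_bins
    bins of (nearly) equal size; empty bins are skipped.
    """
    pairs = []
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            d = abs(positions[j] - positions[i])
            if d <= max_dist:
                pairs.append((d, levels[i], levels[j]))
    pairs.sort(key=lambda p: p[0])
    result = []
    for b in range(n_bins):
        chunk = pairs[b * len(pairs) // n_bins:(b + 1) * len(pairs) // n_bins]
        if chunk:
            dists, first, second = zip(*chunk)
            result.append((median(dists), pearson(first, second)))
    return result


def is_discontinuous(a, b):
    """True if |a - b| > 0.5, or |a - b| > 0.3 with fold change > 3."""
    diff = abs(a - b)
    if diff > 0.5:
        return True
    lo, hi = min(a, b), max(a, b)
    fold = float("inf") if lo == 0 else hi / lo
    return diff > 0.3 and fold > 3
```

Running `binned_correlation` on each sample and averaging the per-bin correlations across samples gives the kind of distance profile described above; applying `is_discontinuous` to pairs separated by 100 bp or less gives the per-sample percentage of discontinuous pairs.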

Supplementary Material

Supporting Information

Acknowledgments

The work of J.W.G. and P.S. was supported by the Director, Office of Science, Office of Biological and Environmental Research, of the U.S. Department of Energy under Contract no. DE-AC02-05CH11231; by the Department of the Army, award W81XWH-07-1-0663 (the U.S. Army Medical Research Acquisition Activity, Fort Detrick, MD is the awarding and administering acquisition office); and by the National Institutes of Health, National Cancer Institute grants P50 CA 58207 and U54 CA 112970 to J.W.G.

Footnotes

Conflict of interest statement: Most of the work described in the manuscript was done at Affymetrix Laboratories, the research division of Affymetrix. Several of the authors are Affymetrix employees (S.N., V.E.H.C., Y.L., and D.F.), and one is an active member of the Affymetrix, Inc. Board of Directors (P.B.). The work may result in a commercial product in the future.

This article is a PNAS Direct Submission.

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1005173107/-/DCSupplemental.

References

  • 1. Li E, Bestor TH, Jaenisch R. Targeted mutation of the DNA methyltransferase gene results in embryonic lethality. Cell. 1992;69:915–926. doi: 10.1016/0092-8674(92)90611-f.
  • 2. Ideraabdullah FY, Vigneau S, Bartolomei MS. Genomic imprinting mechanisms in mammals. Mutat Res. 2008;647:77–85. doi: 10.1016/j.mrfmmm.2008.08.008.
  • 3. Tost J. DNA methylation: An introduction to the biology and the disease-associated changes of a promising biomarker. Methods Mol Biol. 2009;507:3–20. doi: 10.1007/978-1-59745-522-0_1.
  • 4. Lopez J, Percharde M, Coley HM, Webb A, Crook T. The context and potential of epigenetics in oncology. Br J Cancer. 2009;100:571–577. doi: 10.1038/sj.bjc.6604930.
  • 5. Iacobuzio-Donahue CA. Epigenetic changes in cancer. Annu Rev Pathol. 2009;4:229–249. doi: 10.1146/annurev.pathol.3.121806.151442.
  • 6. Huang TH, Perry MR, Laux DE. Methylation profiling of CpG islands in human breast cancer cells. Hum Mol Genet. 1999;8:459–470. doi: 10.1093/hmg/8.3.459.
  • 7. Lippman Z, Gendrel AV, Colot V, Martienssen R. Profiling DNA methylation patterns using genomic tiling microarrays. Nat Methods. 2005;2:219–224. doi: 10.1038/nmeth0305-219.
  • 8. Waalwijk C, Flavell RA. MspI, an isoschizomer of hpaII which cleaves both unmethylated and methylated hpaII sites. Nucleic Acids Res. 1978;5:3231–3236. doi: 10.1093/nar/5.9.3231.
  • 9. Rauch T, Pfeifer GP. Methylated-CpG island recovery assay: A new technique for the rapid detection of methylated-CpG islands in cancer. Lab Invest. 2005;85:1172–1180. doi: 10.1038/labinvest.3700311.
  • 10. Yan PS, Potter D, Deatherage DE, Huang TH, Lin S. Differential methylation hybridization: Profiling DNA methylation with a high-density CpG island microarray. Methods Mol Biol. 2009;507:89–106. doi: 10.1007/978-1-59745-522-0_8.
  • 11. Weber M, et al. Distribution, silencing potential and evolutionary impact of promoter DNA methylation in the human genome. Nat Genet. 2007;39:457–466. doi: 10.1038/ng1990.
  • 12. Mohn F, Weber M, Schübeler D, Roloff TC. Methylated DNA immunoprecipitation (MeDIP). Methods Mol Biol. 2009;507:55–64. doi: 10.1007/978-1-59745-522-0_5.
  • 13. Frommer M, et al. A genomic sequencing protocol that yields a positive display of 5-methylcytosine residues in individual DNA strands. Proc Natl Acad Sci USA. 1992;89:1827–1831. doi: 10.1073/pnas.89.5.1827.
  • 14. Cokus SJ, et al. Shotgun bisulphite sequencing of the Arabidopsis genome reveals DNA methylation patterning. Nature. 2008;452:215–219. doi: 10.1038/nature06745.
  • 15. Meissner A, et al. Genome-scale DNA methylation maps of pluripotent and differentiated cells. Nature. 2008;454:766–770. doi: 10.1038/nature07107.
  • 16. Ball MP, et al. Targeted and genome-scale strategies reveal gene-body methylation signatures in human cells. Nat Biotechnol. 2009;27:361–368. doi: 10.1038/nbt.1533.
  • 17. Deng J, et al. Targeted bisulfite sequencing reveals changes in DNA methylation associated with nuclear reprogramming. Nat Biotechnol. 2009;27:353–360. doi: 10.1038/nbt.1530.
  • 18. Bibikova M, et al. High-throughput DNA methylation profiling using universal bead arrays. Genome Res. 2006;16:383–393. doi: 10.1101/gr.4410706.
  • 19. Bibikova M, et al. Genome-wide DNA methylation profiling using Infinium® assay. Epigenomics. 2009;1:177–200. doi: 10.2217/epi.09.14.
  • 20. Nosho K, et al. Comprehensive biostatistical analysis of CpG island methylator phenotype in colorectal cancer using a large population-based sample. PLoS ONE. 2008;3:e3698. doi: 10.1371/journal.pone.0003698.
  • 21. Eads CA, et al. MethyLight: A high-throughput assay to measure DNA methylation. Nucleic Acids Res. 2000;28:E32. doi: 10.1093/nar/28.8.e32.
  • 22. Shelton BP, Misso NL, Shaw OM, Arthaningtyas E, Bhoola KD. Epigenetic regulation of human epithelial cell cancers. Curr Opin Mol Ther. 2008;10:568–578.
  • 23. Subramaniam MM, et al. RUNX3 inactivation in colorectal polyps arising through different pathways of colonic carcinogenesis. Am J Gastroenterol. 2009;104:426–436. doi: 10.1038/ajg.2008.141.
  • 24. Ashburner M, et al. The Gene Ontology Consortium. Gene ontology: Tool for the unification of biology. Nat Genet. 2000;25:25–29. doi: 10.1038/75556.
  • 25. Saxonov S, Berg P, Brutlag DL. A genome-wide analysis of CpG dinucleotides in the human genome distinguishes two distinct classes of promoters. Proc Natl Acad Sci USA. 2006;103:1412–1417. doi: 10.1073/pnas.0510310103.
  • 26. Tian J, et al. Accurate multiplex gene synthesis from programmable DNA microchips. Nature. 2004;432:1050–1054. doi: 10.1038/nature03151.
  • 27. Doi A, et al. Differential methylation of tissue- and cancer-specific CpG island shores distinguishes human induced pluripotent stem cells, embryonic stem cells and fibroblasts. Nat Genet. 2009;41:1350–1353. doi: 10.1038/ng.471.

Supplementary Materials

Supporting Information
1005173107_st01.xls (44.5KB, xls)
1005173107_st02.docx (24.5KB, docx)