Frontiers in Genetics. 2012 May 4;3:73. doi: 10.3389/fgene.2012.00073

On the Analysis of the Illumina 450k Array Data: Probes Ambiguously Mapped to the Human Genome

Xu Zhang 1, Wenbo Mu 2, Wei Zhang 3,4,5,*
PMCID: PMC3343275  PMID: 22586432

The newly developed Illumina HumanMethylation450 BeadChip (450K array; Illumina, Inc., San Diego, CA, USA) allows unprecedented genome-wide profiling of DNA methylation at >450,000 CpG and non-CpG methylation sites (Sandoval et al., 2011). Using the 450K array, Philibert et al. (2012) examined the relationship between recent alcohol intake and genome-wide methylation patterns in lymphoblast DNA samples derived from 165 female subjects participating in the Iowa Adoption Studies. Their interesting paper demonstrated that the 450K array could be a useful tool for ongoing and newly designed epigenome projects. However, given the unique design of the platform (detailed annotations for the 450K array, including probe sequences, are available at http://www.illumina.com/), some caution is warranted when analyzing 450K array data, beyond the general challenges of analyzing whole-genome DNA methylation data (Laird, 2010). In particular, we found that a substantial proportion of the >450,000 DNA methylation probes on the 450K array do not align to unique, unambiguous loci in the human genome (Moen et al., 2012). In total, ∼140,000 methylation probes mapped ambiguously to multiple locations in the human genome (hg19) when up to two mismatches in the probe sequence were allowed, using Bowtie (v2.0.0 beta2; Langmead et al., 2009; Langmead and Salzberg, 2012). Briefly, Bowtie is an ultrafast, memory-efficient short-read aligner that indexes the genome with an extended Burrows–Wheeler technique and implements a novel quality-aware backtracking algorithm that permits mismatches (Langmead et al., 2009; Langmead and Salzberg, 2012). Other alignment algorithms, e.g., BLAT (Kent, 2002) and MAQ (Li et al., 2008), would provide similar estimates (unpublished data). In comparison, only ∼1,000 methylation probes were found to map ambiguously to the human genome (hg18) on the earlier 27K Illumina Human Methylation array (27K array; Bell et al., 2011).
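The ambiguity criterion described above (a probe counts as ambiguous when its sequence matches more than one genomic location with up to two mismatches) can be illustrated with a minimal Python sketch. It is a brute-force scan over a toy reference, not Bowtie's Burrows–Wheeler index and not the pipeline behind the estimates reported here. The function names (`reverse_complement`, `find_hits`, `flag_ambiguous`) and any sequences passed to them are assumptions made purely for illustration.

```python
from typing import Dict, List, Tuple

# Translation table for complementing an upper-case DNA string.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def hamming_within(a: str, b: str, max_mismatches: int) -> bool:
    """True if equal-length strings a and b differ at no more than max_mismatches positions."""
    mismatches = 0
    for x, y in zip(a, b):
        if x != y:
            mismatches += 1
            if mismatches > max_mismatches:
                return False
    return True


def find_hits(probe: str, reference: str, max_mismatches: int = 2) -> List[Tuple[int, str]]:
    """List (0-based start, strand) for every reference window matching the probe.

    A window is a hit if it matches the probe ("+") or, failing that, the
    probe's reverse complement ("-") with at most max_mismatches mismatches.
    No gaps are considered in this toy version.
    """
    hits = []
    n = len(probe)
    rc = reverse_complement(probe)
    for start in range(len(reference) - n + 1):
        window = reference[start:start + n]
        if hamming_within(probe, window, max_mismatches):
            hits.append((start, "+"))
        elif hamming_within(rc, window, max_mismatches):
            hits.append((start, "-"))
    return hits


def flag_ambiguous(probes: Dict[str, str], reference: str,
                   max_mismatches: int = 2) -> Dict[str, bool]:
    """Map each probe ID to True when the probe has more than one hit."""
    return {
        probe_id: len(find_hits(seq, reference, max_mismatches)) > 1
        for probe_id, seq in probes.items()
    }
```

For real probe sequences and a full hg19 reference, an indexed aligner such as Bowtie remains the practical choice; the sketch only makes the counting rule explicit.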
Because the much more comprehensive 450K array covers not only promoters but also gene bodies, untranslated regions (UTRs), and “open sea” methylation sites, ambiguous alignment particularly needs to be taken into account when analyzing data from this new platform. Notably, 20 of the 90 top-ranking CpG methylation probes reported by Philibert et al. (2012) (e.g., cg24023553 in their Table 2; cg00004209 in their Table 3; cg24675557 in their Table 5) mapped to ambiguous loci in the current human reference (hg19) using Bowtie (Langmead et al., 2009; Langmead and Salzberg, 2012). Because ambiguous alignment may make the measured DNA methylation level at a given site unreliable, accounting for this platform-specific problem may not only facilitate data analysis (e.g., by easing the multiple-testing burden once the affected probes are removed) but also aid interpretation by focusing on more reliable biological signals. In addition, other factors that also affect other platforms such as the 27K array (e.g., polymorphisms in the target sequences, potential batch effects; Bell et al., 2011; Fraser et al., 2012) may need to be considered in the analysis of these data.
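The point about multiple testing can be made concrete with a small Python sketch. It assumes a Bonferroni correction purely as a simple stand-in, since no particular correction procedure is prescribed here. The helper names and all p-values are hypothetical; only the probe ID cg24023553 comes from the text above, where it is listed as ambiguously mapped.

```python
from typing import Dict, List, Set


def bonferroni_threshold(n_tests: int, alpha: float = 0.05) -> float:
    """Per-test significance threshold under a Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def drop_ambiguous(p_values: Dict[str, float], ambiguous_ids: Set[str]) -> Dict[str, float]:
    """Return the p-values of probes that are not flagged as ambiguously mapped."""
    return {pid: p for pid, p in p_values.items() if pid not in ambiguous_ids}


def significant_probes(p_values: Dict[str, float], alpha: float = 0.05) -> List[str]:
    """Sorted IDs of probes passing the Bonferroni threshold for this probe set."""
    if not p_values:
        return []
    threshold = bonferroni_threshold(len(p_values), alpha)
    return sorted(pid for pid, p in p_values.items() if p <= threshold)


# Hypothetical p-values for illustration only; they are not taken from any study.
toy_p_values = {
    "cg24023553": 0.004,   # named in the text as ambiguously mapped
    "toy_probe_A": 0.015,  # made-up probe ID
    "toy_probe_B": 0.2,    # made-up probe ID
    "toy_probe_C": 0.5,    # made-up probe ID
}
ambiguous = {"cg24023553"}

before = significant_probes(toy_p_values)                            # 4 tests
after = significant_probes(drop_ambiguous(toy_p_values, ambiguous))  # 3 tests
```

In this toy set, the ambiguous probe is the only one passing the threshold before filtering, while a uniquely mapped probe passes once the test count drops. At the scale quoted above, removing ∼140,000 of roughly 450,000 probes would relax a Bonferroni threshold at alpha = 0.05 from about 1.1 × 10^-7 to about 1.6 × 10^-7; these figures follow from the approximate counts in the text and are indicative only.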

Acknowledgments

This work was supported, in part, by grant R21HG006367 (to Wei Zhang) from the NHGRI/NIH.

References

  1. Bell J. T., Pai A. A., Pickrell J. K., Gaffney D. J., Pique-Regi R., Degner J. F., Gilad Y., Pritchard J. K. (2011). DNA methylation patterns associate with genetic and gene expression variation in HapMap cell lines. Genome Biol. 12, R10. doi: 10.1186/gb-2011-12-6-405
  2. Fraser H. B., Lam L. L., Neumann S. M., Kobor M. S. (2012). Population-specificity of human DNA methylation. Genome Biol. 13, R8. doi: 10.1186/gb-2012-13-2-r8
  3. Kent W. J. (2002). BLAT – the BLAST-like alignment tool. Genome Res. 12, 656–664. doi: 10.1101/gr.229102
  4. Laird P. W. (2010). Principles and challenges of genomewide DNA methylation analysis. Nat. Rev. Genet. 11, 191–203. doi: 10.1038/ni0310-191
  5. Langmead B., Salzberg S. L. (2012). Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359. doi: 10.1038/nmeth.1923
  6. Langmead B., Trapnell C., Pop M., Salzberg S. L. (2009). Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 10, R25. doi: 10.1186/gb-2009-10-3-r25
  7. Li H., Ruan J., Durbin R. (2008). Mapping short DNA sequencing reads and calling variants using mapping quality scores. Genome Res. 18, 1851–1858. doi: 10.1101/gr.078212.108
  8. Moen L. E., Mu W., Delaney S., Wing C., McQuade J., Godley L. A., Dolan M. E., Zhang W. (2012). Differences in DNA methylation between the African and European HapMap populations. Proc. Am. Assoc. Cancer Res. 5010 [Abstract]
  9. Philibert R. A., Plume J. M., Gibbons F. X., Brody G. H., Beach S. R. (2012). The impact of recent alcohol use on genome wide DNA methylation signatures. Front. Genet. 3:54. doi: 10.3389/fgene.2012.00054
  10. Sandoval J., Heyn H., Moran S., Serra-Musach J., Pujana M. A., Bibikova M., Esteller M. (2011). Validation of a DNA methylation microarray for 450,000 CpG sites in the human genome. Epigenetics 6, 692–702. doi: 10.4161/epi.6.6.16196

