To the Editor: The majority of the significant sex-associated DNA-methylation sites at autosomal CpG loci reported by Numata et al.1 do not reflect a true biological phenomenon. Rather, the conclusions in this paper reflect a technical artifact created by the presence of cross-reactive autosomal probes hybridizing to both autosomal and sex chromosomes.
Numata et al.1 used the Illumina Infinium HumanMethylation27K microarray to assess genome-wide DNA methylation. This microarray uses 50 nt probes to target 27,578 CpG sites covering ∼13,000 genes. To distinguish methylated from unmethylated alleles, DNA is treated with sodium bisulfite, which converts unmethylated cytosines to uracil; subsequent PCR amplification then replaces uracil with thymidine. In contrast, methylated cytosines remain cytosines. Two probes are designed for each CpG site: one for the methylated allele (cytosine) and one for the unmethylated allele (thymidine).
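The conversion chemistry described above can be illustrated with a minimal Python sketch. The function name, the toy sequence, and the use of 0-based positions are our assumptions for illustration only; this is not part of the microarray protocol or of any published pipeline.

```python
def bisulfite_convert(seq, methylated_positions):
    """Return the sequence read after bisulfite treatment and amplification.

    Unmethylated C is read as T (via uracil); methylated C stays C.
    `methylated_positions` holds 0-based indices of methylated cytosines.
    """
    converted = []
    for index, base in enumerate(seq.upper()):
        if base == "C" and index not in methylated_positions:
            converted.append("T")  # unmethylated cytosine -> uracil -> T
        else:
            converted.append(base)  # methylated C and all other bases unchanged
    return "".join(converted)


# Toy sequence (hypothetical): cytosines at indices 1, 4, and 5;
# the CpG cytosines are at indices 1 and 5.
toy = "ACGTCCGA"
fully_methylated_cpgs = bisulfite_convert(toy, {1, 5})
unmethylated = bisulfite_convert(toy, set())
print(fully_methylated_cpgs)
print(unmethylated)
```

After conversion, the two alleles differ only at the cytosine positions, which is the difference that the paired methylated and unmethylated probes are designed to detect.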
On the Illumina Infinium HumanMethylation27K microarray, there is a subset of probes that target autosomal loci but cross-react with genomic regions on the sex chromosomes. Because one of the X chromosomes in females is heavily methylated as a result of X inactivation,2 autosomal CpG loci targeted by probes that overlap these heavily methylated loci create spurious signals and therefore appear more methylated in females than in males. On the other hand, autosomal probes cross-hybridizing to unmethylated X chromosome loci that escape X inactivation show lower methylation in females than in males. Likewise for the Y chromosome, probes that cross-hybridize also shift the DNA-methylation level of the originally targeted autosomal CpG loci to produce a spurious increase in the methylation signals in males compared to females.
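The direction of these spurious shifts can be reproduced with a toy signal-mixing model. The sketch below is a simplification under assumptions of our own: every genomic copy that a probe binds contributes equally to the signal, the methylation level is taken as the plain ratio methylated / (methylated + unmethylated), and the autosomal methylation value of 0.1 is hypothetical.

```python
def observed_methylation(autosomal_meth, x_meth_copies, x_unmeth_copies,
                         autosomal_copies=2):
    """Apparent methylation when a probe also binds sex-chromosome copies.

    Assumes equal hybridization per copy (an illustrative simplification).
    """
    methylated = autosomal_copies * autosomal_meth + x_meth_copies
    unmethylated = autosomal_copies * (1 - autosomal_meth) + x_unmeth_copies
    return methylated / (methylated + unmethylated)


true_level = 0.1  # hypothetical methylation of the autosomal target, same in both sexes

# Cross-hybridization to an X locus subject to X inactivation:
# females carry one methylated (inactive) and one unmethylated (active) copy,
# males carry a single unmethylated copy.
female_inactivated = observed_methylation(true_level, 1, 1)
male_inactivated = observed_methylation(true_level, 0, 1)

# Cross-hybridization to an X locus that escapes inactivation:
# both female copies are unmethylated, males carry one unmethylated copy.
female_escape = observed_methylation(true_level, 0, 2)
male_escape = observed_methylation(true_level, 0, 1)

print(female_inactivated, male_inactivated)  # female appears more methylated
print(female_escape, male_escape)            # female appears less methylated
```

Under these assumptions, an autosomal site with identical methylation in both sexes appears more methylated in females in the first case and less methylated in females in the second, matching the two patterns described above.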
We can identify cross-reactive probes on the 27K microarray as those with high-identity matches to nontargeted loci by first mapping probe sequences against the in silico sodium-bisulfite-converted reference genome (hg18) by using BLAT.3 In addition, the end nucleotide of the probes and the nontargeted loci are required to be the same for cross-hybridization to occur because array signals are derived from single-base extension of fluorescently tagged nucleotides at one end of the probes that correspond to the targeted CpGs. In Table 1, we have appended the potential cross-hybridizing targets of probes corresponding to the top ten autosomal genes described by Numata et al.1 as having the most significant sex differences. Using the same microarray platform, Liu et al. and Adkins et al.4–6 also reported the same overlapping set of autosomal sex-associated DNA-methylation sites, which we have found to be the result of a technical artifact.3 The claim by Numata et al.1 that 5% of autosomal loci (or 1,333 CpGs) have significant differential methylation associated with sex is likely to be an overestimate because of the presence of autosomal probes cross-hybridizing to the sex chromosomes. The full list of CpG sites proposed to have significant sex differences was not published, so it could not be evaluated. Of course, this does not exclude the possibility that there are indeed true autosomal sex-associated sites of DNA methylation in humans because two of the top ten autosomal sex-associated CpG sites reported by Numata et al.1 are not targeted by cross-reactive probes. Notably, several other studies have observed true autosomal sex-associated DNA methylation by using targeted molecular approaches.7,8
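The two criteria above (a high-identity match to a nontargeted locus, and agreement at the extension end of the probe) can be expressed as a simple filter over alignment hits. The sketch below is illustrative only: the `Hit` record, its field names, and the threshold of 40 matching bases are hypothetical and are not the BLAT output format or the cutoffs used in our previous paper.3

```python
from dataclasses import dataclass

PROBE_LENGTH = 50  # 27K probes are 50 nt long


@dataclass
class Hit:
    """One alignment of a probe to the bisulfite-converted genome (hypothetical record)."""
    chrom: str
    aligned_length: int        # number of probe bases included in the alignment
    mismatches: int
    end_base_matches: bool     # probe end nucleotide aligns to the same base at this locus
    is_intended_target: bool


def is_cross_reactive_hit(hit, min_matching_bases=40):
    """Apply the two criteria; the threshold is an illustrative assumption."""
    if hit.is_intended_target:
        return False
    if not hit.end_base_matches:
        return False  # single-base extension cannot occur at this locus
    return hit.aligned_length - hit.mismatches >= min_matching_bases


def cross_reactive_chromosomes(hits, min_matching_bases=40):
    """Chromosomes carrying at least one cross-reactive hit for a probe."""
    return sorted({h.chrom for h in hits
                   if is_cross_reactive_hit(h, min_matching_bases)})


# Hypothetical hits for a single autosomal probe.
example_hits = [
    Hit("9", 50, 0, True, True),    # the intended autosomal target
    Hit("X", 50, 0, True, False),   # perfect match on X, extension end agrees
    Hit("3", 30, 5, True, False),   # too short to qualify
    Hit("Y", 48, 1, False, False),  # long match, but extension end differs
]
print(cross_reactive_chromosomes(example_hits))
```

In this toy example only the X-chromosome hit satisfies both criteria, so the probe would be flagged as cross-reactive with the X chromosome.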
Table 1.
CpG Locus | Chromosome | Gene | p Value (−log) | Difference | Cross-hybridizing Target: Chromosome | Cross-hybridizing Target: Length (bp) | Cross-hybridizing Target: Mismatch (bp) |
---|---|---|---|---|---|---|---|
cg15915418 | 9 | TLE1 | 64.9 | female > male | X | 50 | 0 |
cg07711515 | 9 | BAG1 | 60.8 | female < male | X | 50 | 0 |
cg27063525 | 6 | C6orf68 | 46.7 | female > male | X | 50 | 0 |
cg11673803 | 10 | GLUD1 | 44.3 | female > male | X | 50 | 0 |
cg21243096 | 1 | POU3F1 | 44.2 | female > male | X | 50 | 4 |
cg04455759 | 11 | SDHD | 28.4 | female < male | X | 45 | 2 |
cg08284151 | 12 | DPPA3 | 20.1 | female > male | X | 50 | 2 |
cg05924191 | 15 | FLJ20582 | 17.8 | female > male | X | 50 | 2 |
cg23758485 | 16 | SMPD3 | 17.4 | female > male | none | none | none |
cg07494248 | 2 | HSPD1 | 16.4 | female > male | none | none | none |
The first five columns are from Table 1 of Numata et al.1 Note that the Illumina Infinium 27K microarray probes are 50 bases long, and, therefore, high sequence identity of the probe sequences to unintended targets would suggest cross-reactivity. For detailed methods of identifying cross-hybridizing targets, please refer to our previous paper.3
The recognition of falsely discovered autosomal sex-associated DNA-methylation sites in our laboratory led us to perform a series of bioinformatic analyses to identify other potential cross-hybridizing targets. We found 6%–10% of the 27,578 probes in the Illumina Infinium 27K microarray to be cross-reactive and to thereby potentially generate false positives and reduce the power of downstream analyses.3 For example, of the top 100 most significant CpG loci reported by Numata et al.1 to be associated with developmental stages, age, expression, cis-mQTLs (methylation quantitative trait loci) or trans-mQTLs, we found a substantial overlap between cis- and trans-mQTLs and CpG loci targeted by cross-reactive probes on the basis of our previously published list of cross-reactive probes3 (Table 2). We also observed significantly higher proportions of cross-reactive probes in cis- and trans-mQTLs (23% and 28%, respectively)1 than in the entire array (6%–10%).3 This raises the possibility that the cross-hybridizing targets might overlap underlying SNPs and might thereby create spurious signals for which intensities depend on SNP genotypes. The fact that cross-reactive probes have multiple targets, i.e., an increased chance of hybridizing to a SNP variant rather than a single unique target, could explain the observed enrichments.
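To give a sense of scale for this enrichment, one can ask how often 23 or more (or 28 or more) cross-reactive probes would appear among 100 probes if each were drawn at the array-wide rate. The sketch below uses a one-sided binomial tail with the upper bound of that rate (10%), which is the conservative choice. This is an illustrative calculation of our own choosing, not necessarily the test underlying the statement above, and it treats the top 100 loci as independent draws, which is an approximation.

```python
from math import comb


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))


ARRAY_RATE = 0.10   # upper bound of the 6%-10% array-wide estimate
TOP_N = 100

p_cis = binomial_upper_tail(23, TOP_N, ARRAY_RATE)    # 23 of the top 100 cis-mQTLs
p_trans = binomial_upper_tail(28, TOP_N, ARRAY_RATE)  # 28 of the top 100 trans-mQTLs

print(f"cis-mQTLs:   P(X >= 23) = {p_cis:.2e}")
print(f"trans-mQTLs: P(X >= 28) = {p_trans:.2e}")
```

Both tail probabilities fall well below 0.001 under these assumptions, so the excess of cross-reactive probes among the top mQTLs is unlikely to be a chance fluctuation around the array-wide rate.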
Table 2.
CpG Set | Proportion of CpGs Targeted by Cross-reactive Probes | Proportion of CpGs That Are Polymorphic CpGs (SNPs) |
---|---|---|
From Numata et al.1 | | |
Top 100 CpGs showing significant methylation changes during the fetal period | 5% | 4% |
Top 100 CpGs showing significant methylation changes during childhood (0–10 years) | 3% | 3% |
Top 100 CpGs showing significant methylation changes over the age of 10 years | 10% | 0% |
48 CpGs showing significant age-related changes in cancer-related genes | 6% | 4% |
Top 100 negative correlations between gene expression and CpG methylation | 3% | 4% |
Top 100 significant cis-mQTLs | 23% | 29% |
Top 100 significant trans-mQTLs | 28% | 14% |
Top 100 mQTLs in African American subjects | 9% | 9% |
Top 100 mQTLs in subjects of European descent (a) | 11% | 8% |
From Chen et al.3 | | |
All 27,578 CpGs in the 27K microarray | 6%–10% | 3% |
The top 100 significant CpGs are taken from the supplementary tables of Numata et al.1 Their probe IDs were cross-matched to the published lists of cross-reactive probes3 and polymorphic CpGs (see Web Resources). The following abbreviation is used: mQTLs, methylation quantitative trait loci.
(a) These individuals were mentioned to be Caucasian in Numata et al.1
Another potential source of error in generating data from the 27K microarray arises from probes targeting polymorphic CpGs (i.e., SNPs at either cytosine or guanine).3 A total of 907 (3%) CpG sites overlapping SNPs (dbSNP build 132) are targeted by this microarray (see Web Resources). Notably, a large proportion (29%) of the top 100 most significant cis-mQTLs reported by Numata et al.1 are linked to probes targeting loci that are polymorphic CpGs (i.e., SNPs overlapping CpG sites) (Table 2). In these cases, the methylation changes are likely to be a reflection of the underlying polymorphism. That is, the methylation level reflects alternate haplotypes of in-cis-associated SNPs so that non-CG variants of polymorphic CpGs would be detected as unmethylated loci.3 Further, the proportion of polymorphic CpGs is 2–4× higher in the top 100 most significant trans-mQTLs (14%), mQTLs in African Americans (9%), and mQTLs in individuals of European descent (mentioned as Caucasian in Numata et al.1) (8%) than in the entire array (3%) (Table 2). This further demonstrates the effect of polymorphisms at CpG loci on the evaluation of DNA methylation, suggesting that any quantitative association (e.g., between methylation and gene expression) involving these CpG loci could be greatly perturbed by the underlying genotypes of the population studied.
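The fold differences quoted above follow directly from the counts given in this paragraph and in Table 2. The short calculation below reproduces them, using 907 of 27,578 CpGs as the array-wide baseline; the variable names are ours.

```python
ARRAY_POLYMORPHIC_CPGS = 907
ARRAY_TOTAL_CPGS = 27578
baseline = ARRAY_POLYMORPHIC_CPGS / ARRAY_TOTAL_CPGS  # about 3.3% of the array

# Percentage of polymorphic CpGs among the top 100 loci in each set (Table 2).
top100_polymorphic_pct = {
    "cis-mQTLs": 29,
    "trans-mQTLs": 14,
    "mQTLs, African American subjects": 9,
    "mQTLs, subjects of European descent": 8,
}

fold_enrichment = {
    name: (pct / 100) / baseline
    for name, pct in top100_polymorphic_pct.items()
}

for name, fold in fold_enrichment.items():
    print(f"{name}: {fold:.1f}x the array-wide proportion")
```

With the unrounded baseline, the last three sets fall between roughly two- and fourfold above the array-wide proportion, and the cis-mQTL set is enriched by more than eightfold.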
The existence of cross-reactive probes and polymorphic CpGs in the Illumina Infinium 27K microarray reflects the human genome’s natural diversity, which results from homologous and repetitive sequences and SNPs. Therefore, investigators should exercise caution when significant associations are found at CpG sites that are either polymorphic or targeted by cross-reactive probes. Biological interpretation requires validation of the detected methylation by other approaches such as sodium-bisulfite pyrosequencing. That Numata et al.1 and three other studies4–6 reported parallel findings previously shown to be the result of cross-reactive probes3 is concerning. False discovery has the potential to be used for inappropriately generating hypotheses or inferring biological significance. Considering that the Illumina platform is one of the most widely used DNA-methylation microarrays, we hope this letter will serve as a cautionary note for researchers who use Illumina Infinium DNA-methylation microarrays.
Web Resources
The URL for data presented herein is as follows:
List of cross-reactive probes and polymorphic CpGs in the Illumina 27K microarray, http://www.sickkids.ca/Research/Weksberg-Lab/Publications/index.html
References
- 1. Numata S., Ye T., Hyde T.M., Guitart-Navarro X., Tao R., Wininger M., Colantuoni C., Weinberger D.R., Kleinman J.E., Lipska B.K. DNA methylation signatures in development and aging of the human prefrontal cortex. Am. J. Hum. Genet. 2012;90:260–272. doi: 10.1016/j.ajhg.2011.12.020.
- 2. Cotton A.M., Lam L., Affleck J.G., Wilson I.M., Peñaherrera M.S., McFadden D.E., Kobor M.S., Lam W.L., Robinson W.P., Brown C.J. Chromosome-wide DNA methylation analysis predicts human tissue-specific X inactivation. Hum. Genet. 2011;130:187–201. doi: 10.1007/s00439-011-1007-8.
- 3. Chen Y.A., Choufani S., Ferreira J.C., Grafodatskaya D., Butcher D.T., Weksberg R. Sequence overlap between autosomal and sex-linked probes on the Illumina HumanMethylation27 microarray. Genomics. 2011;97:214–222. doi: 10.1016/j.ygeno.2010.12.004.
- 4. Liu J., Morgan M., Hutchison K., Calhoun V.D. A study of the influence of sex on genome wide methylation. PLoS ONE. 2010;5:e10028. doi: 10.1371/journal.pone.0010028.
- 5. Adkins R.M., Thomas F., Tylavsky F.A., Krushkal J. Parental ages and levels of DNA methylation in the newborn are correlated. BMC Med. Genet. 2011;12:47. doi: 10.1186/1471-2350-12-47.
- 6. Adkins R.M., Krushkal J., Tylavsky F.A., Thomas F. Racial differences in gene-specific DNA methylation levels are present at birth. Birth Defects Res. A Clin. Mol. Teratol. 2011;91:728–736. doi: 10.1002/bdra.20770.
- 7. Sarter B., Long T.I., Tsong W.H., Koh W.P., Yu M.C., Laird P.W. Sex differential in methylation patterns of selected genes in Singapore Chinese. Hum. Genet. 2005;117:402–403. doi: 10.1007/s00439-005-1317-9.
- 8. El-Maarri O., Becker T., Junen J., Manzoor S.S., Diaz-Lacava A., Schwaab R., Wienker T., Oldenburg J. Gender specific differences in levels of DNA methylation at selected loci from human total blood: A tendency toward higher methylation levels in males. Hum. Genet. 2007;122:505–514. doi: 10.1007/s00439-007-0430-3.