Abstract
Meiotic recombination events cluster into narrow segments of the genome, defined as hotspots. Here, we demonstrate that the Prdm9 gene is a major determinant of hotspot specification. First, two mouse strains that differ in hotspot usage are polymorphic for the zinc finger DNA-binding array of PRDM9. Second, the human consensus PRDM9 allele is predicted to recognize the 13-mer motif enriched at human hotspots, and this DNA-binding specificity is verified by in vitro studies. Third, allelic variants of PRDM9 zinc fingers are significantly associated with variability in genome-wide hotspot usage among humans. Our results provide a molecular basis for the distribution of meiotic recombination in mammals, in which the binding of PRDM9 to specific DNA sequences targets the initiation of recombination to specific locations in the genome.
Meiosis is a specialized cell cycle, essential for sexual reproduction, in which diploid cells give rise to haploid gametes. The halving of genome content during meiosis results from two successive divisions. During the first, the reductional division, which is unique to meiotic cells, homologous chromosomes segregate. This segregation requires the establishment of connections between homologs that are mediated in most species by reciprocal recombination events known as crossovers (COs) (1). COs also increase genome diversity, thereby improving the efficacy of natural selection (2). The molecular process of CO formation involves a highly regulated pathway in which programmed DNA double-strand breaks (DSBs) are induced and then repaired using the homolog (3). In the yeasts Saccharomyces cerevisiae and Schizosaccharomyces pombe, initiation sites have been mapped by direct molecular detection of DSBs. These studies have shown that DSBs are not randomly distributed along chromosomes but occur in specific regions of the genome, according to rules that are still poorly understood (4). A common chromatin feature, trimethylation of lysine 4 of histone H3 (H3K4me3), marks initiation sites in both yeast and mouse (5, 6).
In mammals, the locations of initiation sites are in most cases deduced from mapping CO events. COs can be mapped at high resolution by pedigree analysis, by detection of recombinant molecules in gametes, or by analysis of linkage disequilibrium (LD) (7, 8). In humans, these approaches have shown that most COs are clustered in narrow regions (1–2 kb), called hotspots, that are predicted to be preferred initiation sites (9). On the basis of LD patterns, over 30,000 hotspots have been identified in the human genome, spaced on average every 50–100 kb, often outside genes, and with highly variable levels of activity (10, 11). In addition, some hotspots show inter-individual variation in activity, as shown by sperm typing studies (7) or pedigree analysis (12).
LD-based hotspots were found to be highly enriched for a degenerate 13-mer motif (13). Moreover, in sperm typing studies, single nucleotide polymorphisms (SNPs) within this 13-bp motif were found to be associated in cis with variation in hotspot activity (14, 15). Genome-wide, the motif is estimated to play a role in approximately 40% of hotspots and has been proposed to be involved in specifying initiation or in other aspects of recombination activity (13). In mice, based on the analysis of a 25-Mb interval on chromosome (chr.) 1 (16) and of several individual regions (17), initiation of meiotic recombination also appears to be clustered in small intervals. Recently, by comparing recombination activity between different mouse strains, a genetic locus controlling the distribution of recombination in the genome was identified (18, 19); this locus potentially contributes, either directly or indirectly, to the specification of initiation sites. Specifically, the genetic background at this locus (the wm7 haplotype from M. musculus molossinus or the b haplotype from M. m. domesticus strains C57BL/10 or C57BL/6) was found to affect recombination activity measured both chromosome-wide and at two individual hotspots (Psmb9 on chr. 17 and Hlx1 on chr. 1). This locus, named Dsbc1, was mapped to a region between 10.1 and 16.8 Mb on mouse chr. 17 (18).
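To make "enrichment for a degenerate 13-mer" more concrete, the sketch below scans both strands of a DNA sequence for windows that match the specified (non-N) positions of a degenerate consensus, with an optional mismatch allowance. It assumes the consensus CCNCCNTNNCCNC reported for human LD hotspots in (13). The function names and the toy sequence are hypothetical, and this is plain pattern matching, not the enrichment statistics used in (13).

```python
# Degenerate 13-mer associated with human LD hotspots (13); "N" = any base.
HOTSPOT_13MER = "CCNCCNTNNCCNC"

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def n_matched(window: str, consensus: str) -> int:
    """Count the specified (non-N) consensus positions that the window matches."""
    return sum(1 for w, c in zip(window, consensus) if c != "N" and w == c)


def scan(seq: str, consensus: str, max_mismatch: int = 0):
    """Yield (start, strand, matched) for every window, on either strand, that
    misses at most `max_mismatch` specified positions of the consensus.

    `start` is always the 0-based forward-strand coordinate of the window.
    """
    k = len(consensus)
    specified = sum(1 for c in consensus if c != "N")
    for strand, s in (("+", seq), ("-", revcomp(seq))):
        for i in range(len(s) - k + 1):
            m = n_matched(s[i:i + k], consensus)
            if specified - m <= max_mismatch:
                # Convert a minus-strand offset back to forward-strand coordinates.
                start = i if strand == "+" else len(s) - k - i
                yield start, strand, m


toy = "AAAA" + "CCACCGTAACCTC" + "AAAA"  # hypothetical sequence
print(list(scan(toy, HOTSPOT_13MER)))  # [(4, '+', 8)]
```

Only 8 of the 13 positions are specified, so even an exact match carries limited information; raising `max_mismatch` trades specificity for sensitivity.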
Prdm9, a candidate gene
Through additional crosses, we refined the Dsbc1 locus to the 12.2–16.8 Mb region of mouse chr. 17 (supporting online text). This region contains the Prdm9 gene, which encodes a protein with a SET methyltransferase domain and a tandem array of twelve C2H2 zinc fingers. PRDM9 has been shown to trimethylate H3K4 and is expressed specifically in germ cells during meiotic prophase (20). Strains with distinct Dsbc1 alleles (wm7 or b) have different levels of H3K4me3 at the two recombination hotspots, Psmb9 and Hlx1; specifically, high levels of H3K4me3 correlated with high recombination activity at these hotspots (6). Prdm9 is the only reported gene encoding a histone methyltransferase in the Dsbc1 region and thus is a strong candidate for the effect of Dsbc1. On this basis, we reasoned that the zinc fingers of PRDM9 could confer DNA-binding specificity and thus target its activity to specific sites in the genome. Under this hypothesis, altering the zinc fingers should change the sites targeted by PRDM9.
Distinct predicted DNA sequence specificities for two mouse PRDM9 zinc finger variants
We therefore determined and compared the cDNA sequences of Prdm9 from M. m. molossinus (wm7) and M. m. domesticus (b) (Fig. 1A, fig. S1). These two Prdm9 alleles are highly polymorphic (24 changes over 847 residues), and all but one of the changes are located in the zinc finger array. This array, encoded within a single exon, has a minisatellite-like genomic structure: each 28-amino-acid zinc finger is encoded by an 84-bp unit, and these units are repeated in tandem with almost perfect homology at both the DNA and protein levels (fig. S1). Within a given allele, the differences between repeats are restricted to seven positions, five of which encode amino acids at positions −1, 3 and 6 of the zinc finger alpha helix; these residues are predicted to contact the DNA and are known to be involved in DNA sequence specificity (21, 22). When the two Prdm9 alleles (wm7 and b) are compared, most polymorphisms (21 of the 23 in the array) lie at residues −1, 3 and 6 of the zinc fingers (Fig. 1A, fig. S1). The wm7 allele also lacks one zinc finger relative to the b allele. Sequencing the Prdm9 zinc finger array from M. m. castaneus showed it to be identical to that of wm7. This is consistent with the genetic origin of M. m. molossinus, known to be derived in part from M. m. castaneus, and with the observation that the two hotspots, Psmb9 and Hlx1, are active at similar levels in the presence of Dsbc1 alleles from either M. m. castaneus or M. m. molossinus ((18) and supporting online text). Using the Zinc Finger Database (ZiFDB, http://bindr.gdcb.iastate.edu:8080/ZiFDB) (23), we predict that these two PRDM9 proteins preferentially recognize distinct DNA motifs (Fig. 1B). Because some zinc fingers have low predicted specificity, and because the several zinc fingers of a protein can contribute to DNA recognition in multiple combinations, PRDM9 is expected to recognize a large number of sites in the genome.
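The link between helix positions −1, 3 and 6 and a predicted DNA motif can be sketched as follows. This is a minimal illustration, not the ZiFDB predictor used above: the contact table `TOY_CONTACTS` is a hypothetical, drastically reduced lookup whose values were chosen so that the well-studied Zif268 protein reproduces its known site GCGTGGGCG. The sketch encodes the two geometric rules of the canonical docking model (21, 22): the array binds antiparallel, so the N-terminal finger sits on the 3'-most triplet, and within each triplet positions 6, 3 and −1 face the 5', middle and 3' bases.

```python
# Toy contact table: helix position -> residue -> preferred base.
# Hypothetical values for illustration only; NOT a validated recognition code.
TOY_CONTACTS = {
    -1: {"R": "G", "Q": "A", "D": "C"},
    3: {"H": "G", "N": "A", "E": "C"},
    6: {"R": "G", "T": "T"},
}


def predict_site(fingers):
    """Predict a binding site (5'->3') from (res_-1, res_3, res_6) tuples
    listed N- to C-terminal.

    The N-terminal finger binds the 3'-most triplet. Within a triplet,
    helix position 6 faces the 5' base, position 3 the middle base and
    position -1 the 3' base. Residues absent from the table give "N".
    """
    site = []
    for r_m1, r3, r6 in reversed(fingers):
        site.append(
            TOY_CONTACTS[6].get(r6, "N")
            + TOY_CONTACTS[3].get(r3, "N")
            + TOY_CONTACTS[-1].get(r_m1, "N")
        )
    return "".join(site)


# Zif268 fingers 1-3 (helices RSDELTR, RSDHLTT, RSDERKR) as (-1, 3, 6) residues.
zif268 = [("R", "E", "R"), ("R", "H", "T"), ("R", "E", "R")]
print(predict_site(zif268))  # GCGTGGGCG
```

Real predictors also weigh context between neighboring fingers and contacts outside these three positions, which this one-residue-one-base table ignores; that simplification is one reason such predictions have limited power.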
For these reasons, and also because DNA recognition by some zinc fingers is poorly predictable (24), the predicted motif has limited power to identify PRDM9 binding sites. Nonetheless, it is noteworthy that sequences matching 8 and 9, respectively, of the 13 highest-scoring bases of the predicted PRDM9wm7 recognition motif are found near the centers of the Psmb9 and Hlx1 hotspots (fig. S2).
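A back-of-the-envelope calculation shows why partial matches of this kind cannot, on their own, pinpoint binding sites. The sketch assumes a random genome with independent, equiprobable bases (a strong simplification) and an approximate mouse genome size of 2.7 Gb (an assumption, used only for scale). It computes the expected number of windows on both strands that match at least `min_match` of 13 specified positions, using the binomial tail.

```python
from math import comb


def p_window_match(n_specified: int, min_match: int, p_base: float = 0.25) -> float:
    """Probability that a random window matches at least `min_match` of
    `n_specified` positions, each matching independently with p_base."""
    return sum(
        comb(n_specified, j) * p_base**j * (1 - p_base) ** (n_specified - j)
        for j in range(min_match, n_specified + 1)
    )


def expected_sites(genome_bp: float, n_specified: int, min_match: int) -> float:
    """Expected number of such windows on both strands of a random genome."""
    return 2 * genome_bp * p_window_match(n_specified, min_match)


GENOME_BP = 2.7e9  # assumed approximate mouse genome size, illustration only
for m in (13, 9, 8):
    print(m, f"{expected_sites(GENOME_BP, 13, m):.3g}")
```

Under these assumptions, full 13/13 matches are expected only about 80 times, whereas windows matching at least 9 or at least 8 of the 13 positions number roughly 5 × 10^6 and 3 × 10^7, respectively. A partial match near a hotspot center is therefore suggestive rather than diagnostic.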
Variability in human PRDM9 zinc fingers
In humans, the degenerate 13-mer motif was proposed to be a potential zinc finger binding site, given its apparent 3-bp periodicity (13). We therefore analyzed the predicted binding specificity of the zinc finger region of the human PRDM9 protein. The human PRDM9 protein referenced in databases (Genome Reference Consortium GRCh37 assembly, Ensembl release 56) contains 13 zinc fingers, with a tandem repeat structure similar to that observed in mice, in which repeats are nearly identical except at positions −1, 3 and 6 of the zinc finger alpha helices (fig. S3A). Strikingly, a group of five zinc fingers had a predicted affinity for a sequence that matches the 13-mer hotspot motif (Fig. 2A). This finding suggested to us that the role of Prdm9 in specifying hotspot localization might be conserved from mouse to human. If so, allelic variation in the zinc finger array might be associated with differences in hotspot usage among humans. To test these predictions, we analyzed Prdm9 polymorphism by sequencing individual cDNAs from a testis library derived from a pool of 39 individuals, and by genotyping the zinc finger array by MVR-PCR (25) in individuals of European ancestry: the CEPH (Centre d’Etude du Polymorphisme Humain) resources and the Hutterites, a founder population currently living in North America (Fig. 2B, fig. S3B, fig. S4). We found a large number of alleles, differing both in the number of repeats and in their identity. In the CEPH families, six alleles were found among 105 unrelated individuals, with the major allele (allele A) occurring at a frequency of 90%. Except for one amino acid change in the 6th zinc finger, allele A is identical to the genome reference allele (allele B), which occurs at a frequency of 5%.
Among the other alleles (named C, D, E and K), the first five zinc fingers of PRDM9 show little variability, whereas the zinc fingers corresponding to fingers 8 to 11 of allele A are highly variable, with amino acid changes at the positions involved in DNA contact (fig. S5). In humans, variability thus appears to be concentrated on one side of the zinc finger array, in the region involved in recognition of the 13-mer motif by allele A.
Association of human PRDM9 zinc finger variants with hotspot usage
In the Hutterite sample (25), three Prdm9 alleles, A, B and I, were present at frequencies of 94%, 4% and 2%, respectively. Given the amino acid changes in its zinc finger array, the I allele is not expected to recognize the 13-mer motif (Fig. 2A). The presence of these variants allowed us to test the functional relationship between Prdm9 alleles, their predicted binding specificity and hotspot usage, taking advantage of well-localized CO events in Hutterite families. Variation among Hutterite parents in genome-wide “hotspot usage” (i.e., the fraction of COs that occurred in recombination hotspots inferred from LD data) was previously found to be significant and heritable (h2 = 0.22 (12)). To increase our sample size, we typed an additional 188 Hutterite parents, among whom we found 6 AI and 10 AB genotypes. Among these, we were able to call crossover events in transmissions from an additional 2 AB individuals and 3 AI individuals and their 5 AA partners (i.e., the subset of parents for whom genotyping information was available for two or more children). To assess the impact of variation in the zinc finger array of Prdm9 on hotspot usage in the Hutterites, we regressed the maximum likelihood estimate of hotspot usage for each parent on his or her genotype (Fig. 3A). Both AB and AI heterozygotes differed significantly from AA homozygotes in their use of LD-based recombination hotspots (pAB = 0.033, pAI = 9.3×10−12). The AI heterozygotes had significantly lower hotspot usage in both males and females (pAI = 1.6×10−8 and pAI = 0.0032, with nAI = 7 and nAI = 2, respectively), whereas the AB result was significant only in females (pAB = 0.020, nAB = 9) but was in a consistent direction in males (pAB = 0.40, nAB = 9). This result was robust to the relatedness among Hutterite individuals and remained significant when the phenotypes were quantile-normalized (25).
Moreover, variation at the zinc finger array of Prdm9 alone explained 18% of the population variance in hotspot usage among Hutterite individuals (25); the true proportion is likely to be even higher, given that the phenotype is measured with considerable error.
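The regression of hotspot usage on genotype, and the "fraction of population variance explained", can be illustrated with a minimal sketch. It assumes ordinary least squares with a single categorical genotype factor and AA as the baseline; under that assumption the fitted values are simply the genotype means, each genotype effect is a difference of means, and the variance explained (R²) is the between-genotype sum of squares over the total. It ignores relatedness, sex and per-parent measurement error, all of which the actual analysis (25) addresses, and the input values below are hypothetical, not Hutterite data.

```python
from statistics import fmean


def genotype_regression(phenotype, genotype, baseline="AA"):
    """Least-squares regression of a phenotype on one categorical genotype factor.

    With a single factor the fitted value for each individual is its
    genotype mean, so each effect is (group mean - baseline mean) and
    R^2 = SS_between / SS_total. The phenotype must not be constant.
    """
    groups = {}
    for y, g in zip(phenotype, genotype):
        groups.setdefault(g, []).append(y)
    means = {g: fmean(ys) for g, ys in groups.items()}
    grand = fmean(phenotype)
    ss_total = sum((y - grand) ** 2 for y in phenotype)
    ss_between = sum(len(ys) * (means[g] - grand) ** 2 for g, ys in groups.items())
    effects = {g: means[g] - means[baseline] for g in means if g != baseline}
    return effects, ss_between / ss_total


# Hypothetical values, for illustration only (not the Hutterite data).
usage = [0.62, 0.55, 0.66, 0.58, 0.70, 0.64, 0.20, 0.15]
geno = ["AA", "AA", "AA", "AA", "AB", "AB", "AI", "AI"]
effects, r2 = genotype_regression(usage, geno)
print(effects, round(r2, 2))
```

Because each parent's phenotype is itself a noisy estimate, part of the total sum of squares is measurement error, which is why an R² computed this way tends to understate the share of true variance attributable to genotype.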
Because individuals differ greatly in the precision with which their phenotype is estimated, owing to differences in the number of well-localized CO events (Fig. 3A), we considered whether this measurement error could affect our conclusions. To this end, we calculated the likelihood surface of the hotspot usage phenotype for each genotype, in females and males (Fig. 3B and 3C). A likelihood ratio test comparing a model in which hotspot usage does not depend on genotype with one in which it does was highly significant in both males and females (p = 0.0014 in females, p < 10−5 in males, as assessed by permutation (25)). Notably, the AI genotype is associated with a roughly three-fold (~70%) drop in the usage of LD-based hotspots (the maximum likelihood estimates fall from 60% to 18% in the joint analysis of males and females; see fig. S6). This large difference in LD-based hotspot usage between AA and AI individuals suggests that the I allele activates a set of hotspots that have not left a footprint on genetic diversity, either because they are too recent or too weak. The interpretation of this difference depends on how many crossovers are specified by the A allele in AA individuals. As a first approximation, the 13-mer motif has been predicted to be causal at 40% of LD hotspots (13), so that, all else being equal, 40% of crossovers placed in LD hotspots might depend on the A allele. That the estimated difference between genotypes is far larger (~70%) suggests that the binding specificity of PRDM9 explains more than 40% of LD-based hotspot activity in the current population. In any case, the strong decrease observed in AI heterozygotes suggests that the I allele outcompetes the A allele in determining crossovers in LD-based hotspots, for example because it recognizes a greater number of sites or binds with higher affinity.
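The likelihood surface and permutation-based likelihood ratio test can be sketched under a deliberately simplified model. The sketch assumes that each parent contributes k well-localized COs inside LD hotspots out of n scored COs (n > 0), so that hotspot usage θ is a binomial proportion shared by all parents of a genotype. The actual model (25) accounts for the uncertainty in where each CO falls relative to hotspots, which this ignores. The function names and the example counts are hypothetical.

```python
import math
import random


def binom_loglik(k: int, n: int, theta: float) -> float:
    """Binomial log-likelihood of k hotspot COs out of n, dropping the constant."""
    ll = 0.0
    if k:
        ll += k * math.log(theta) if theta > 0 else -math.inf
    if n - k:
        ll += (n - k) * math.log(1 - theta) if theta < 1 else -math.inf
    return ll


def loglik_surface(parents, grid):
    """Log-likelihood of a shared usage theta, per grid value, for (k, n) pairs."""
    return [sum(binom_loglik(k, n, t) for k, n in parents) for t in grid]


def max_loglik(parents) -> float:
    """Maximized log-likelihood for one shared theta (MLE = pooled K / N)."""
    total_k = sum(k for k, _ in parents)
    total_n = sum(n for _, n in parents)
    return binom_loglik(total_k, total_n, total_k / total_n)


def lrt_stat(parents, genotypes) -> float:
    """2 x log-likelihood ratio: genotype-specific theta vs one shared theta."""
    groups = {}
    for p, g in zip(parents, genotypes):
        groups.setdefault(g, []).append(p)
    alt = sum(max_loglik(ps) for ps in groups.values())
    return 2.0 * (alt - max_loglik(parents))


def permutation_p(parents, genotypes, n_perm=10_000, seed=1):
    """Permutation p-value: shuffle genotype labels across parents."""
    rng = random.Random(seed)
    observed = lrt_stat(parents, genotypes)
    labels = list(genotypes)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        if lrt_stat(parents, labels) >= observed - 1e-9:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical (hotspot COs, scored COs) per parent, for illustration only.
parents = [(6, 10), (7, 10), (5, 8), (2, 10), (1, 9)]
genotypes = ["AA", "AA", "AA", "AI", "AI"]
print(permutation_p(parents, genotypes, n_perm=2000))
```

For scale, a back-of-the-envelope reading of the argument above: if the AI genotype removed only the ~40% of LD-hotspot crossovers attributed to the motif, usage would fall at most from 60% to about 36% (0.60 × 0.60), not to the observed 18%.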
The small but significant increase in LD-based hotspot usage in AB compared to AA individuals suggests that the sequences recognized by A and B are slightly different. This might be explained by the single amino acid difference (serine to threonine) between these two alleles (Fig. 2A), at a zinc finger residue potentially involved in interaction with the DNA.
Furthermore, although hotspot usage was not significantly correlated with genetic map length across individuals (12), AB heterozygotes showed a significantly longer genetic map in the combined sample of both sexes (pAB = 0.014); again, this effect remained when the phenotype was quantile-normalized. In contrast, there was no detectable effect of the AI genotype on map length (pAI = 0.37) (25).
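Quantile normalization of a phenotype, used above as a robustness check, replaces each value by the standard-normal quantile of its rank, so that a few extreme individuals cannot drive an association. A minimal sketch follows; the offset (r − 0.5)/n is one common convention and is an assumption here, since the exact transform used in (25) is not stated in this section. Tied values receive their average rank.

```python
from statistics import NormalDist


def quantile_normalize(values):
    """Rank-based inverse normal transform: value -> N(0, 1) quantile of its rank.

    Ranks are 1-based; tied values share their average rank. The rank r is
    mapped to the probability (r - 0.5) / n, which always lies strictly in (0, 1).
    """
    n = len(values)
    order = sorted(range(n), key=lambda idx: values[idx])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # average 1-based rank of the tie block
        for t in range(i, j + 1):
            ranks[order[t]] = mean_rank
        i = j + 1
    nd = NormalDist()
    return [nd.inv_cdf((r - 0.5) / n) for r in ranks]


print(quantile_normalize([0.62, 0.18, 0.55, 0.55]))
```

The transformed values would then replace the raw phenotype in the same genotype regression; because only ranks survive, a significant result after transformation cannot be attributed to a few outlying estimates.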
Direct PRDM9 binding to hotspot motifs
Together, these results provide direct evidence that Prdm9 is involved in hotspot specification and in controlling the distribution of recombination events in the human genome. To test whether this effect is mediated through binding of PRDM9 at hotspots, we directly assayed the interaction of the PRDM9A and PRDM9I proteins with their predicted recognition motifs. In southwestern assays, PRDM9A protein (labeled ZA) bound with high affinity to a DNA fragment containing the 13-mer hotspot motif (HM), and with low affinity both to the same fragment carrying mutations at the most conserved positions of the motif (HM*) and to a DNA fragment containing the predicted motif of the PRDM9I protein (IM) (Fig. 4A, 4B, 4C). Conversely, binding of PRDM9I (ZI) was specific for the predicted I motif (Fig. 4B). These results were independently confirmed by band-shift assays, which showed the greater affinity of PRDM9A for the 13-mer hotspot motif than for its mutated form or for the predicted I motif, as well as the greater affinity of PRDM9I for the predicted I motif than for the 13-mer hotspot motif (Fig. 4D).
In summary, our observations reveal an entirely unexpected feature of the initiation of meiotic recombination: a role for Prdm9 in specifying initiation sites in mammals, through the direct binding of PRDM9 to specific sequences in the genome and the promotion of DSB formation in the vicinity of its binding sites. Using a different strategy, Myers et al. (Science this issue) predicted the preferential binding of human PRDM9 to the 13-mer hotspot motif and thus proposed that PRDM9 is involved in hotspot localization in humans. The precise mechanism of action of Prdm9 is not known. Its histone methyltransferase activity likely plays an important role by promoting enrichment of H3K4me3 on nucleosomes adjacent to PRDM9 binding sites, as observed at two mouse hotspots (6). In turn, this chromatin modification, or downstream signals, might be recognized by a component of the recombination initiation machinery, allowing the recruitment of SPO11, which catalyzes meiotic DSB formation. Interestingly, in S. cerevisiae, enrichment of H3K4me3 has also been observed at initiation sites (5). In that case, this histone modification depends on the histone methyltransferase Set1, which lacks a DNA-binding domain and is probably recruited by an alternative mechanism. In mouse and human, PRDM9 seems to control the activity of a large fraction of hotspots. Indeed, the presence of different Prdm9 alleles leads to major changes in crossover distribution on several chromosomes in mouse (18, 19), and to substantial changes in hotspot usage in human (Fig. 3). Analysis of Prdm9−/− mice has shown that Prdm9 is essential for progression through meiotic prophase (20). Based on cytological analysis, DSBs were detected in Prdm9−/− spermatocytes, suggesting that Prdm9 might not be absolutely required for DSB formation. It is therefore possible that, in wild-type cells, some DSBs occur at sites not bound by PRDM9.
Prdm9 has also been shown to be involved in hybrid sterility in M. musculus. This phenotype depends on polymorphisms in the zinc finger array of PRDM9 and on several independently segregating genes (26). In sterile hybrids, a defect is observed during meiotic prophase, after the stage of DSB formation, which may indicate an additional role for PRDM9, for instance in the regulation of gene expression, presumably involving a limited number of genes. Indeed, PRDM9 is not expected to be a master transcriptional regulator, given the rapid evolution of its DNA-binding specificity among metazoans (27).
The features of the PRDM9 protein described above have major implications for hotspot variability and genome evolution. The minisatellite structure of the Prdm9 zinc-finger-encoding region confers a strong potential to generate variability by recombination or replication slippage within the array. In particular, a single amino acid change within the zinc fingers could produce a PRDM9 variant with novel DNA-binding specificity and thus potentially create a new family of hotspots genome-wide. The introduction of new hotspots may counteract the loss of individual hotspots due to biased gene conversion upon DSB repair (which acts against the initiating allele), so changes in the Prdm9 gene offer a mechanistic solution to the “recombination hotspot paradox” (28). Indeed, rapid evolution of both the PRDM9 protein and the hotspot motif has been shown by Myers et al. (Science this issue). Further, the zinc fingers of PRDM9 are evolving under positive selection and concerted evolution across many metazoan species, specifically at positions that define their DNA-binding specificity (27). Regardless of the precise selective pressures acting on this gene, the properties of PRDM9 uncovered here, together with features of DSB repair, provide an interpretation for the divergence of fine-scale genetic maps between closely related species and even among individuals within species (19, 29, 30).
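How a minisatellite-like array gains or loses whole units through recombination can be shown with a toy model of unequal exchange between misaligned tandem repeats. This is an illustration of one mechanism named above, not an analysis from this study: each hypothetical label such as "ZF1" stands for one 84-bp finger-coding unit, and replication slippage and point mutations are not modeled.

```python
def unequal_crossover(array_a, array_b, cut_a, cut_b):
    """Toy unequal exchange between two tandem repeat arrays.

    The products join array_a[:cut_a] to array_b[cut_b:] and
    array_b[:cut_b] to array_a[cut_a:]. When cut_a != cut_b the repeats
    are misaligned, so repeat number shifts between the two products;
    the total number of units is conserved.
    """
    return array_a[:cut_a] + array_b[cut_b:], array_b[:cut_b] + array_a[cut_a:]


# Hypothetical repeat labels: each "ZFn" stands for one finger-coding unit.
allele = ["ZF1", "ZF2", "ZF3", "ZF4", "ZF5"]
longer, shorter = unequal_crossover(allele, allele, 3, 2)
print(len(longer), len(shorter))  # 6 4
```

A one-unit misalignment yields one product with an extra finger and one missing a finger, the kind of length difference seen between the wm7 and b alleles; when the two arrays carry different unit variants, such exchanges can also create new combinations of fingers and hence new predicted binding specificities.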
Acknowledgments
We thank all members of our laboratories for discussions, J. Pritchard for comments on an earlier version of the manuscript, R. Hernandez and E. Leffler for their help with bioinformatics, and E. Leffler for generating fig. S7. We thank E. Brun for technical assistance on PRDM9 in vitro assays, and D. Haddou and F. Arnal for mouse facility service. This study was supported by a grant from the Centre National de la Recherche Scientifique (CNRS); the Association pour la Recherche sur le Cancer (ARC 3939); the Fondation Jérôme Lejeune; and the Agence Nationale de la Recherche (ANR-06-BLAN-0160-01 and ANR-09-BLAN-0269-01) to BdM. GC was supported by a Sloan Foundation Fellowship. CG was supported by a grant from Electricité de France. This research was further supported by NIH grants HD21244 and HL085197 to CO, and by NIH grant GM83098, ARRA supplement 03S1, and a Howard Hughes Medical Institute Early Career Scientist Award to MP. Sequences generated for this study are deposited in GenBank, accessions GU216222, GU216223, GU216224, GU216225, GU216226, GU216227, GU216228, GU216229 and GU216230.
References
- 1. Petronczki M, Siomos MF, Nasmyth K. Cell. 2003;112:423. doi: 10.1016/s0092-8674(03)00083-7.
- 2. Coop G, Przeworski M. Nat Rev Genet. 2007;8:23. doi: 10.1038/nrg1947.
- 3. Hunter N. In: Molecular genetics of recombination. Aguilera A, Rothstein R, editors. Springer-Verlag; Berlin Heidelberg: 2007. pp. 381–442.
- 4. Keeney S. In: Genome Dynamics and Stability. Springer H, editor. Vol. 2. 2008. pp. 81–124.
- 5. Borde V, et al. Embo J. 2009;28:99. doi: 10.1038/emboj.2008.257.
- 6. Buard J, Barthes P, Grey C, de Massy B. Embo J. 2009;28:2616. doi: 10.1038/emboj.2009.207.
- 7. Arnheim N, Calabrese P, Tiemann-Boege I. Annu Rev Genet. 2007;41:369. doi: 10.1146/annurev.genet.41.110306.130301.
- 8. Buard J, de Massy B. Trends Genet. 2007;23:301. doi: 10.1016/j.tig.2007.03.014.
- 9. Jeffreys AJ, Kauppi L, Neumann R. Nat Genet. 2001;29:217. doi: 10.1038/ng1001-217.
- 10. McVean GA, et al. Science. 2004;304:581. doi: 10.1126/science.1092500.
- 11. Myers S, Bottolo L, Freeman C, McVean G, Donnelly P. Science. 2005;310:321. doi: 10.1126/science.1117196.
- 12. Coop G, Wen X, Ober C, Pritchard JK, Przeworski M. Science. 2008;319:1395. doi: 10.1126/science.1151851.
- 13. Myers S, Freeman C, Auton A, Donnelly P, McVean G. Nat Genet. 2008;40:1124. doi: 10.1038/ng.213.
- 14. Jeffreys AJ, Neumann R. Nat Genet. 2002;31:267. doi: 10.1038/ng910.
- 15. Jeffreys AJ, Neumann R. Hum Mol Genet. 2005;14:2277. doi: 10.1093/hmg/ddi232.
- 16. Paigen K, et al. PLoS Genet. 2008;4:e1000119. doi: 10.1371/journal.pgen.1000119.
- 17. de Massy B. Trends Genet. 2003;19:514. doi: 10.1016/S0168-9525(03)00201-4.
- 18. Grey C, Baudat F, de Massy B. PLoS Biol. 2009;7:e35. doi: 10.1371/journal.pbio.1000035.
- 19. Parvanov ED, Ng SH, Petkov PM, Paigen K. PLoS Biol. 2009;7:e36. doi: 10.1371/journal.pbio.1000036.
- 20. Hayashi K, Yoshida K, Matsui Y. Nature. 2005;438:374. doi: 10.1038/nature04112.
- 21. Pabo CO, Peisach E, Grant RA. Annu Rev Biochem. 2001;70:313. doi: 10.1146/annurev.biochem.70.1.313.
- 22. Wolfe SA, Grant RA, Elrod-Erickson M, Pabo CO. Structure. 2001;9:717. doi: 10.1016/s0969-2126(01)00632-3.
- 23. Fu F, et al. Nucleic Acids Res. 2009;37:D279. doi: 10.1093/nar/gkn606.
- 24. Ramirez CL, et al. Nat Methods. 2008;5:374. doi: 10.1038/nmeth0508-374.
- 25. Materials and methods are available as supporting material on Science Online.
- 26. Mihola O, Trachtulec Z, Vlcek C, Schimenti JC, Forejt J. Science. 2008. doi: 10.1126/science.1163601.
- 27. Oliver PL, et al. PLoS Genet. 2009;5:e1000753. doi: 10.1371/journal.pgen.1000753.
- 28. Boulton A, Myers RS, Redfield RJ. Proc Natl Acad Sci USA. 1997;94:8058. doi: 10.1073/pnas.94.15.8058.
- 29. Ptak SE, et al. Nat Genet. 2005;37:429. doi: 10.1038/ng1529.
- 30. Winckler W, et al. Science. 2005;308:107. doi: 10.1126/science.1105322.