Proceedings of the National Academy of Sciences of the United States of America. 2012 Dec 24;110(2):600–605. doi: 10.1073/pnas.1220813110

Recombination regulator PRDM9 influences the instability of its own coding sequence in humans

Alec J Jeffreys 1,1, Victoria E Cotton 1, Rita Neumann 1, Kwan-Wood Gabriel Lam 1,2
PMCID: PMC3545772  PMID: 23267059

Abstract

PRDM9 plays a key role in specifying meiotic recombination hotspot locations in humans and mice via recognition of hotspot sequence motifs by a variable tandem-repeat zinc finger domain in the protein. We now explore germ-line instability of this domain in humans. We show that repeat turnover is driven by mitotic and meiotic mutation pathways, the latter frequently resulting in substantial remodeling of zinc fingers. Turnover dynamics predict frequent allele switches in populations with correspondingly fast changes of the recombination landscape, fully consistent with the known rapid evolution of hotspot locations. We found variation in meiotic instability between men that correlated with PRDM9 status. One particular “destabilizer” variant caused hyperinstability not only of itself but also of otherwise-stable alleles in heterozygotes. PRDM9 protein thus appears to regulate the instability of its own coding sequence. However, destabilizer variants are strongly self-limiting in populations and probably have little impact on the evolution of the recombination landscape.

Keywords: conversion, drive, minisatellite, polarity, sperm


Meiotic recombination plays a crucial role in many organisms in ensuring correct chromosome disjunction at meiosis and in increasing population diversity by reshuffling haplotypes. Recombination events in the human and mouse genomes are largely clustered into narrow hotspots, many of whose locations are specified by the protein PR domain-containing 9 (PRDM9) (1–10). PRDM9 is a meiosis-specific histone methyltransferase with a tandem-repeat zinc finger (ZnF) domain encoded by a minisatellite-like sequence (11). The ZnF domain is polymorphic in repeat number and type, and appears to be directly responsible for activating hotspots by binding to hotspot-associated sequence motifs in both humans and mice. Different ZnF variants specify different sets of meiotic recombination hotspots, apparently through recognition of different hotspot sequence motifs (4, 6–9), and in mice, Prdm9 serves to locate hotspots away from functional genomic elements (10). PRDM9 variation also influences some aspects of genome instability in humans, including rearrangements underlying two genomic disorders as well as mutation at hypervariable minisatellites (4).

Hotspots evolve rapidly, as shown by the totally different fine-scale recombination landscapes of humans and chimpanzees (12). Turnover might be driven by the tendency of hotspots to self-destruct through the systematic overtransmission of variants within hotspots that downregulate recombination initiation (3, 13–15), leading to hotspot depletion and consequent selection in favor of PRDM9 variants that activate new sets of hotspots. Alternatively, rapid hotspot evolution might simply be driven by rapid turnover in the PRDM9 ZnF array. Little is known about the dynamics or processes of array mutation, although levels of PRDM9 variation in humans (1, 2, 4, 7) suggest low rates of copy number change. Patterns of PRDM9 evolution suggest some contribution of recombination to array diversification, although its nature and strength are uncertain (16). We have therefore characterized de novo germ-line mutation processes in the PRDM9 ZnF coding sequence to investigate mutational dynamics, to see whether meiotic recombination plays a role in array diversification, and to determine whether this protein might influence the instability of its own coding sequence.

Results and Discussion

Detecting de Novo Mutations.

PRDM9 ZnF variability in European and African populations (4, 7) suggests a rate of copy number change of ∼0.3–20 × 10⁻⁵ per generation (Materials and Methods), too low to allow mutants to be detected by small-pool PCR approaches (17). We therefore developed size-enrichment methods (18) to detect rare de novo mutant DNA molecules showing altered repeat number in the ZnF coding array. Genomic DNA was digested with restriction enzymes to release the array (Fig. 1A), size fractionated by electrophoresis, and mutant-enriched fractions analyzed by small-pool PCR to identify de novo size-validated mutant molecules (Fig. 1B and Fig. S1). Blood DNA from a man homozygous for the most common PRDM9 allele (allele A) showed low-frequency mutations, at 2.6 × 10⁻⁵ per haploid genome, and with a spectrum dominated by small deletions. All blood mutants could be derived by simple mitotic exchange between misaligned PRDM9 A alleles or sister chromatids (Fig. 1C), in some cases creating hybrid repeats not yet seen in populations (Fig. S2). In contrast, sperm DNA showed additional length-gain mutations, some simple and some more complex with segmental triplications or more intricate rearrangements that could not be explained by a single exchange between misaligned alleles. Similar complex events, on occasion nearly doubling the number of repeats, were seen in sperm from four additional men analyzed for gain and loss mutations as well as in 10 men tested for gain mutations only (894 mutants sequenced in total) (Fig. 1C, and Figs. S3 and S4). Most complex events (95%) involved gains rather than losses of repeats, and were mostly unique, with only seven instances of the same structure being seen twice among 300 different complex mutants. In contrast, 39% of simple rearrangements were recurrent, being seen repeatedly in the same man and in different men carrying the same allele (Fig. 2 and Fig. S4).
The sperm specificity of gain mutations and the scarcity of recurrent complex events suggest that gains are largely generated at meiosis.

Fig. 1.


De novo mutation at PRDM9. (A) Restriction sites used for size-enriching ZnF repeat array mutants, plus primers (blue arrows) for single molecule amplification of the array (boxes). Using HpaI and PvuII ensured that any residual partial digest products were much larger than array mutants. (B) Examples of mutants detected by small-pool PCR and Southern blot hybridization of sperm DNA fractions size-enriched from a man heterozygous for 13- and 14-repeat PRDM9 alleles. The expected size range of mutants in each fraction is shown by red bars, and the ladders L are corresponding amplicons from all known allele lengths (8–18 repeats) (4, 7). (C) Mutation dynamics in blood and sperm. Mutation spectra for simple (black) and complex (red) rearrangements are shown on the left, with progenitor allele length(s) shaded in green. PRDM9 genotypes and overall mutation frequencies per haploid genome are given in each panel, with the frequency for man 16 corrected for ∼32% of mutants not detectable because of the presence of progenitor alleles of different length. Representative examples of mutant structures are shown on the right, with different repeat types (defined in Fig. S2) color coded as indicated. Deletions are indicated by Δ, and reduplicated regions by black bars. Segments in recombinant mutants (blue) matching different alleles are underlined in blue or pink.

Fig. 2.


Distribution of rearrangements along the PRDM9 ZnF coding repeat array. (A) Distribution of rearrangement midpoints as a proportion of progenitor array length. Data were pooled over all sequenced mutants and binned into 0.05 intervals. Only the complex gains showed evidence for polarity, with 65% (193/298) of midpoints located 3′ to the center of the unstable region of the array (χ² test, P < 0.0001). (B) Unequal exchange activity per base pair in each interval of perfect sequence identity (IPSI) shared by misaligned PRDM9 alleles. Pairs of A or C repeat arrays are shown misaligned by one repeat, with sequence mismatches marked by vertical lines. All simple ±1 repeat mutants derived from A or C alleles (Fig. S4) were used to estimate the exchange activity in each IPSI relative to all other IPSIs, allowing data to be pooled from different men. Thus, if 10% of mutants mapped to an IPSI 50 bp long, then the activity of this IPSI per base pair, relative to all other IPSIs for a given allele misalignment, was 0.1/50 = 0.002. If exchanges are randomly distributed along an array irrespective of IPSI length, then all IPSIs should show the same activity. Gain and deletion activities are shown separately, with mutants identical to known alleles indicated above and below the histograms. Following IPSI binning, the best-fit relationship between IPSI length i and the relative frequency of exchange f was f = 6 × 10⁻⁵ × i^1.72 (Pearson's r = 0.989). This relationship was used to estimate the expected exchange activity in each IPSI (horizontal red lines). There was no significant correlation between IPSI location and the observed vs. expected exchange activity (Pearson's r = 0.175, P = 0.206), and thus no evidence that these simple exchanges are clustered toward one end of the repeat array.
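The arithmetic in panel B can be illustrated with a short Python sketch. It is not the authors' code: the function names are hypothetical, and the only quantities taken from the caption are the worked example (10% of mutants in a 50-bp IPSI) and the fitted coefficients 6 × 10⁻⁵ and 1.72. How the fitted f was converted into the expected activity lines is not stated, so the sketch stops at f itself.

```python
def relative_activity_per_bp(fraction_of_mutants, ipsi_length_bp):
    """Share of simple +/-1 repeat mutants mapping to an IPSI, per base pair."""
    if ipsi_length_bp <= 0:
        raise ValueError("IPSI length must be positive")
    return fraction_of_mutants / ipsi_length_bp


def fitted_relative_frequency(ipsi_length_bp, coefficient=6e-5, exponent=1.72):
    """Power-law fit quoted in the caption: f = coefficient * i ** exponent."""
    return coefficient * ipsi_length_bp ** exponent


if __name__ == "__main__":
    # Worked example from the caption: 10% of mutants in a 50-bp IPSI.
    print(relative_activity_per_bp(0.10, 50))
    # An exponent above 1 means that doubling IPSI length more than doubles f.
    print(fitted_relative_frequency(100) / fitted_relative_frequency(50))
```

Under random exchange the per-base-pair activity would be the same for every IPSI. The fitted exponent of 1.72 instead implies that longer IPSIs attract disproportionately many exchanges.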

Recombinant Mutants.

Evidence for the involvement of meiotic recombination came from the detection of apparently recombinant structures in at least 10% of sperm mutants recovered from PRDM9 heterozygotes (Fig. 1C and Fig. S4). Some recombinants arose by the gene-conversion-like insertion of repeat blocks from one allele into the other. Others apparently consisted of the beginning of one allele fused to the end of the other allele. However, given the similarities in progenitor allele structures, such mutants can equally well be explained by local interallelic conversion, as can many or all of the apparently nonrecombinant mutants. We conclude that meiotic recombination plays a substantial role in PRDM9 instability in the male germ line.

Nonpolar Mutation.

The PRDM9 ZnF array shows polarized variability, with limited diversity at the 5′ end (2, 4) (Fig. 3), but we find no evidence for a corresponding polarity in mutation (Fig. 2). Simple rearrangements showed exchanges distributed largely at random along the repeat array, but with exchange points preferentially clustered into the longest intervals of perfect sequence identity (IPSIs) shared by misaligned alleles, as also seen for unequal exchanges between γ-globin genes (21). Complex rearrangements were somewhat displaced toward the 3′ end of the array, but this could reflect this diverse region being more prone to complex versus simple repeat turnover, rather than polarity in the mutation process itself. There is therefore no evidence that PRDM9 repeat instability is driven by a flanking recombination hotspot as seen at some highly unstable minisatellites (22, 23). This is consistent with the complete linkage disequilibrium (LD) seen between SNP markers flanking the repeat array in HapMap populations (24), although the presence of a very weak hotspot that has not left its mark on LD (25) cannot be excluded. If initiating double-strand breaks can be detected in testis DNA (9), then it might be possible to test directly whether repeat instability arises because of a nearby hotspot or from initiation within the array.

Fig. 3.


Mutation frequency variation at PRDM9 in sperm. (A) Variation between men in the frequency of gains of two or more repeat units relative to the progenitor allele, with men ranked in ascending order of instability and with upper 95% confidence intervals indicated by bars. PRDM9 alleles in each man are shown on the left, with C and C-type (Ct) alleles (7) shaded in blue. The allelic origin of nonrecombinant mutants in heterozygotes is shown in black and red for the first and second allele, respectively. Gray, mutants not sequenced. (B) Structures of progenitor PRDM9 alleles in these men. In silico predicted DNA-binding motifs (19) are underneath, with matches to the PRDM9 A motif CCNCCNTNNCCNC (20) highlighted in red and with the number of bases matching this motif indicated on the right. The repeat difference between alleles A and D and its match with allele C are indicated by lines. (C) Spectra of mutants, relative to the progenitor allele (green), derived from the A allele in A/A and A/D men, compared with the D allele in the A/D man. Undetectable mutant classes occluded by a progenitor allele are marked by crosses.

Variation in Instability.

PRDM9 instability varied considerably between men (Fig. 1C). To test whether this is caused by PRDM9 itself, we analyzed mutations resulting in the gain of at least two repeats because such gains appear to be exclusively meiotic in origin and compared these across 16 men containing 10 different PRDM9 alleles (Fig. 3A and Fig. S3). Mutation frequency correlated well with PRDM9 status, with all three men carrying a D allele showing the highest instability (40-fold higher than in the A/A man who showed the lowest activity), and with intermediate instability (mean, sixfold enhancement) in all men carrying C or C-type (Ct) alleles that are predicted to recognize the same DNA-binding motif (7) (Fig. 3B). Rank-order analysis showed that these associations of the D variant with high instability and Ct variants with intermediate instability are significant (Mann-Whitney test, P = 0.0036 and 0.0016, respectively), indicating that PRDM9 influences its own instability. These associations might reflect intrinsic differences in array instability mediated by local DNA sequence characteristics. However, allele D is almost identical to allele A, differing by only one additional repeat present at the same position in the C allele (Fig. 3B), and thus contains no obvious unique sequence features that could account for hyperinstability. More important, half of the mutants in A/D heterozygotes were apparently derived from allele A (Fig. 3A), with a spectrum similar to those derived from allele D and with instability greatly enhanced over that seen in an A/A homozygote (Fig. 3C). It therefore appears that allele D destabilizes both alleles in an A/D heterozygote, strongly suggesting that hyperinstability is mediated by specific binding of the D variant of the PRDM9 protein to the ZnF coding array or flanking DNA. The predicted DNA-binding motif for D is indeed unique to this variant (Fig. 3B). 
Although no matches to this motif were seen in or near the PRDM9 array that were any stronger than corresponding matches to the A motif, this probably reflects major uncertainties in motif prediction in silico (4, 6). Likewise, Ct variants also appeared to enhance instability of non-Ct alleles in heterozygotes (Fig. 3A), although this conclusion is limited by restricted mutant numbers. The only exception was an A/C heterozygote (man 13) with an unusually high level of instability, in whom only 9% of gain mutants were derived from allele A. This distortion is significant (χ² test, P = 0.00015) and suggests the possible existence of a factor that promotes instability in cis on the specific C haplotype carried by this man.

Instability and Population Variability.

We next used simulations to test whether the sperm mutation parameters were compatible with levels of PRDM9 variability seen in human populations (4, 7) (Fig. 4). We ignored base mutations because the predicted rate of de novo replacement substitution (26, 27) in the ZnF coding sequence (5 × 10⁻⁶/generation) is considerably lower than the rate of repeat turnover (5 × 10⁻⁵/generation for PRDM9 A/A individuals); this dominant mode of PRDM9 evolution by reshuffling preexisting repeats is compatible with contemporary patterns of PRDM9 diversity, where few alleles carry private base substitutions. Population simulations, using effective population sizes of 10,000 broadly applicable to much of human evolution over the past 3 million years (Myr) (28), were characterized by the sequential appearance of very common alleles interspersed with periods of greater diversity. Variability was similar to that seen in European and African populations (4), indicating that mutation/drift alone can account for current PRDM9 diversity, without any need to invoke diversifying selection despite clear evidence for strong selection over evolutionary time at DNA contact residues in PRDM9 (16, 29). Variability in Europeans is at the lower limit of the range seen in simulations, which showed such low variability only 5% of the time. This apparent discrepancy could be explained by demographic factors, lower mutation rates in females, or selection against some PRDM9 alleles (for example, selection against rearrangements in the 5′ region of the ZnF array, creating polarity of diversity without polar mutation). Periods of low diversity were typically 0.1–0.9 Myr long (mean 0.3 Myr, or somewhat longer if female mutation rates are low), interspersed with periods of higher diversity of similar duration.
Because most changes in the PRDM9 ZnF array result in the activation of different sets of recombination hotspots (4, 7), this implies that the human recombination landscape genome-wide has undergone repeated switching on a similar timescale. This is compatible with the complete shift in location of historical recombination hotspots seen between humans and chimpanzees (12).

Fig. 4.


PRDM9 evolution and the effects of destabilizer variants. Sperm mutation parameters were used to simulate populations initially fixed for the common A variant, as described in Materials and Methods. One in 10 new mutants was assumed to encode a destabilizer, equivalent to the D variant, with a 15-fold higher level of instability in carriers compared with men lacking the destabilizer. Populations were allowed to evolve at constant size without selection for 3 Myr. (A) A typical evolutionary trajectory showing the frequencies of the most common PRDM9 alleles, with each allele colored differently. For clarity, only alleles attaining a population frequency >20% are shown. Periods dominated by a high-frequency (>50%) allele are marked by bars at the top, and contemporary A allele frequencies (4, 7) in Europeans (Eur) and Africans (Afr) are indicated by dotted lines. (B) Frequency of destabilizer (D-type) alleles. (C) Number of different alleles in the population. (D) Heterozygosity, with current European and African heterozygosities (4, 7) indicated.

PRDM9 also seems to be mutating fast enough to overcome the tendency of hotspots to be eliminated by drive in favor of hotspot variants that suppress recombination (13–15). Thus, an average hotspot with crossover frequency of 0.05% will survive for typically 0.24 Myr even if it initially contains a suppressor at 50% population frequency (30), and most hotspots will initially be devoid of suppressor mutations, extending their time to extinction well beyond the point that their activating PRDM9 variant will have been replaced by a new variant in the population. A more formal analysis of this apparent resolution of the “hotspot paradox” (13) is beyond the scope of this article.

Properties of Destabilizer Variants.

The appearance of destabilizer variants such as PRDM9 D might be expected to lead to bursts of PRDM9 instability and thus abrupt shifts in the recombination landscape. However, their instability prevents them from reaching significant population frequencies, and even when they do arise, they rapidly disappear from a population through mutation (Fig. 4B). Their impact on diversity is therefore modest, with only transient increases in numbers of different alleles (Fig. 4C) and virtually no effect on heterozygosity (Fig. 4D) or on periods of low diversity. Their overall influence on population mutation rate is correspondingly low in simulations, with destabilizer variants causing only a 1.18-fold increase in mean mutation rate. Testing hypothetical destabilizer variants 10-fold more active than PRDM9 D, or fivefold less active (equivalent to Ct variants), showed similar effects on population mutation rate, with increases of only 1.23- and 1.14-fold, respectively. The existence of PRDM9 variants capable of destabilizing their coding sequence is therefore highly self-limiting, with strong destabilizers predicted to be correspondingly rare. This is consistent with the rarity of the PRDM9 D variant [only seen in Europeans (2, 4), frequency 1%] and with the limited range of instability associated with different PRDM9 alleles (Fig. 3A), far smaller than the major shifts in recombination hotspot activity caused by changes in PRDM9 (4, 7). An instability effect might also explain why very long and potentially unstable ZnF arrays arise in sperm but have not yet been seen in populations.

Conclusion.

PRDM9 provides another example of a genomic system, along with recombination hotspots, two genomic rearrangements, and three highly unstable minisatellites (4, 7), whose dynamics are regulated by PRDM9. The properties of the pathways that drive repeat instability in the ZnF coding sequence are strikingly similar to processes seen at hypervariable minisatellites (17, 18, 31, 32), including the major involvement of meiotic gene conversion, the frequently complex remodeling of alleles, and a bias toward gains of repeats, most noticeable in men carrying the D allele, in which 75% of mutations result in gains (Fig. 1). The autoregulation of PRDM9 is unusual, however, in its ability to eliminate any protein variants that strongly stimulate instability, resulting in the curious phenomenon of diversity being largely driven by the least, not most, unstable variants in a population.

Materials and Methods

DNA Fractionation.

Semen and blood samples were collected with approval from the Leicestershire Health Authority Research Ethics Committee and with informed consent. DNA samples were prepared and subsequently manipulated under conditions designed to minimize the risk of contamination (17). Aliquots of 25 μg DNA were digested with 240 U HpaI plus 240 U PvuII-HF (New England BioLabs) in 500 μL NEB4 buffer at 37 °C for 2.5 h. Digested DNA was recovered by ethanol precipitation, dissolved in 5 mM Tris-HCl, pH 7.5, and 15 μg DNA was loaded into a 2.5-cm-wide slot in a 40-cm-long 0.9% (wt/vol) SeaKem HGT (Lonza) agarose gel in 0.5 × tris-borate EDTA buffer containing 0.5 μg/mL ethidium bromide. Adjacent size markers were 4 μg λ DNA × HindIII plus 2 μg ϕX174 DNA × HaeIII, together with 2 μg of ladder ML (Table S1) consisting of PCR products matched in size and GC content to HpaI–PvuII fragments encoding PRDM9 ZnF arrays with 11–17 repeats. DNA was electrophoresed at 180 V for 15 h, then ladder ML was used as a guide to collect 15 size fractions encompassing the progenitor allele(s) plus mutants containing 6–26 repeats. Further details of fractionation are given elsewhere (33). For analysis of gain mutants only, eight fractions were recovered spanning the progenitor allele up to 26-repeat mutants. DNA was recovered from each gel slice into 50 μL of 5 mM Tris⋅HCl, pH 7.5, using a Zymoclean Gel DNA Recovery Kit (Zymo Research) according to the manufacturer’s instructions. DNA yields of 80–90% were routinely obtained.

Analysis of Size-Fractionated DNA.

Fractions were sized by agarose gel electrophoresis against markers λ DNA × HindIII, ϕX174 DNA × HaeIII, and ladder ML. Progenitor PRDM9 alleles were assayed across fractions by PCR amplification with primers PN1.1F and PN2.4aR (Table S1), followed by agarose gel electrophoresis and staining with ethidium bromide. In general, >99.7% of progenitor molecules were excluded from nonprogenitor fractions. Control genomic HpaI–PvuII DNA fragments matched in size and GC content to mutants gaining or losing a single repeat were similarly assayed using PCR primers shown in Table S1 to estimate the proportion of such mutants lost in progenitor fractions. Such losses were typically 12%. Single DNA molecule PCR efficiency was estimated by Poisson analysis of limiting dilutions of pooled fractions, amplifying PRDM9 molecules by nested long PCR, as described previously (33), with PN1.0F and PN2.4bR (Table S1) followed by PN1.1F and PN2.4aR. Analysis of five men gave a consistent single molecule efficiency of 62 ± 5% (one amplifiable molecule of PRDM9 per 4.8 pg genomic DNA).
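The Poisson analysis of limiting dilutions can be sketched as follows. This is an illustration under stated assumptions rather than the published procedure. The zero-class estimator m = −ln(fraction of negative reactions) is the standard Poisson result. The haploid genome mass of 3.0 pg is an assumption, chosen because it reconciles the reported 62% efficiency with one amplifiable molecule per 4.8 pg. All function names are hypothetical.

```python
import math

HAPLOID_GENOME_PG = 3.0  # assumed mass of one haploid human genome (pg)


def mean_molecules_per_reaction(negative_reactions, total_reactions):
    """Poisson zero-class estimate of amplifiable molecules per reaction."""
    if not 0 < negative_reactions <= total_reactions:
        raise ValueError("need at least one negative reaction")
    return -math.log(negative_reactions / total_reactions)


def single_molecule_efficiency(mean_molecules, input_pg_per_reaction):
    """Amplifiable molecules recovered per haploid genome loaded."""
    genomes_loaded = input_pg_per_reaction / HAPLOID_GENOME_PG
    return mean_molecules / genomes_loaded


if __name__ == "__main__":
    # One amplifiable molecule per 4.8 pg corresponds to about 62% efficiency.
    print(single_molecule_efficiency(1.0, 4.8))
```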

Detection of PRDM9 Mutants.

Multiple aliquots of each nonprogenitor DNA size fraction, each containing at most 100 remaining amplifiable progenitor molecules, were amplified by long PCR in 8-μL reactions using primers PN1.0F and PN2.4bR reserved for single DNA molecule amplification for 26 cycles at 96 °C for 20 s, 57 °C for 30 s, and 65 °C for 4 min. A full survey of mutants in a man typically involved 400–800 PCR reactions. Half of each reaction was electrophoresed on a 20-cm-long 0.9% (wt/vol) SeaKem HGT agarose gel alongside a mixture of equivalent PCR products amplified from a mixture of genomic DNAs containing all known PRDM9 ZnF array lengths (8–18 repeats) (4); the mixture was adjusted to give an amount of PCR product per allele equivalent to the yield expected from a single mutant DNA molecule. DNA was transferred to an Amersham Hybond-NX membrane (GE Healthcare) and hybridized with 32P-labeled PRDM9 ZnF array probe. Each mutant detected was sized and compared with the DNA size range of the fraction within which it was found. Any candidate mutant showing a size discrepancy with its fraction was discarded as a likely PCR artifact. On average, only 12% of candidate mutants failed this stringent size-validation test; such artifacts were usually completely different in size from the fractions in which they were detected (Fig. S1). A final inventory per size fraction of PCR reactions positive or negative for a given size-validated class of mutant was then used to correct for PCR reactions containing more than one mutant molecule of a given size class. These Poisson-corrected numbers were summed across fractions, corrected if necessary for mutants lost in progenitor fractions (see Analysis of Size-Fractionated DNA), and used to estimate mutation frequencies for each size class of mutant.
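A minimal sketch of the correction steps described above is given below, in Python. The Poisson correction for reactions holding more than one mutant of a size class uses the standard zero-class estimator. Two parts are assumptions, since the text does not spell out the arithmetic: dividing by (1 − loss) to correct for mutants lost in progenitor fractions, and dividing by the number of amplifiable molecules screened to obtain a frequency. Function and parameter names are hypothetical.

```python
import math


def poisson_corrected_count(positive_reactions, total_reactions):
    """Estimated mutant molecules, allowing for >1 mutant per positive PCR."""
    negative = total_reactions - positive_reactions
    if negative <= 0:
        raise ValueError("all reactions positive: dilute further")
    return -total_reactions * math.log(negative / total_reactions)


def mutation_frequency(reactions_per_fraction, molecules_screened,
                       loss_in_progenitor=0.0):
    """Sum corrected counts over fractions and convert to a frequency.

    reactions_per_fraction: iterable of (positive, total) PCR counts for
    one size class of mutant, one pair per size fraction.
    loss_in_progenitor: assumed proportion of such mutants lost in
    progenitor fractions (applied here as a simple 1 / (1 - loss) factor).
    """
    total = sum(poisson_corrected_count(p, n) for p, n in reactions_per_fraction)
    return total / (1.0 - loss_in_progenitor) / molecules_screened
```

For instance, 10 positive reactions out of 100 correspond to slightly more than 10 mutant molecules, because a few positive reactions are expected to contain two.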

Mutant Purification and Sequencing.

Mutant-positive PCR reactions were diluted threefold with water and 1-μL aliquots reamplified in 10-μL reactions with primers PN1.0F and PN2.4bR for six cycles. Three amplicons from the PL ladder (Table S1), corresponding in size to the mutant plus ±1 repeat amplicons (0.5 μg DNA total), were added to each reamplified mutant and the DNA electrophoresed on a 40-cm-long 0.9% agarose gel at 130 V for 15 h in the presence of ethidium bromide. PL amplicons were visualized on a Dark Reader transilluminator (Clare Chemical Research) and a gel slice spanning the central mutant control amplicon was excised, washed with 5 mM Tris-HCl, pH 7.5, crushed, and freeze/thawed twice with centrifugation. A total of 4 μL gel exudate per mutant was reamplified in a 20-μL reaction with the nested primers PN1.1F and PN2.4aR for 23 cycles at 96 °C for 20 s, 57 °C for 30 s, and 65 °C for 4 min. PCR products were electrophoresed as described previously, and mutants recovered from gel slices using a Zymoclean Kit. Each mutant was sequenced with oligonucleotides PN1.2F and PN2.4aR (Table S1) using BigDye Terminator v3.1 Cycle Sequencing on a 3730 DNA Analyzer (Applied Biosystems). Modification of the sequencing protocol, reducing DNA inputs to 5–15 ng and increasing sequencing cycles to 33, allowed mutants up to 20 repeats long to be fully sequenced. Some sequencing reads showed mixed traces indicating the presence of two or more different mutants of the same length. Such mixed mutants were expected given the abundance of mutants, particularly in fractions close in size to the progenitor. On average, 15% of sequenced mutants showed mixtures and were not included in subsequent analyses. The structures of mutants longer than 20 repeats were solved by reamplifying 0.4 ng mutant DNA with 0.2 μM PN2.4aR plus 10 nM PNMtag and 0.2 μM 21F (Table S1) for 16 cycles at 96 °C for 20 s, 58 °C for 30 s, and 65 °C for 4 min. 
PNMtag primes off all PRDM9 ZnF repeats and, together with its driver primer 21F, generates a uniform ladder of PCR products extending from each repeat to the 3′ end of the array. Appropriate rung(s) on the ladder were separated by agarose gel electrophoresis, purified, reamplified with 21F and PN2.4aR, and sequenced with 21F to generate internal sequence reads of known location that, together with end reads, allowed even the largest mutant (25 repeats) to be fully sequenced.

Sequence Fidelity and Testing for Contamination.

A total of 895 PRDM9 mutants were fully sequenced, corresponding to 1,115,352 bp ZnF repeat DNA (13,278 ZnF repeats). Almost all ZnF sequences matched repeats present in one or both progenitor alleles, or could be derived from progenitor repeats by simple exchange to create hybrid repeats not yet seen in PRDM9 alleles (45 different hybrids in total, Fig. S2), indicating high fidelity of single molecule sequencing. Only three mutant repeats showed an abnormal sequence; two showed a base change G > A or G > C at invariant positions 11 and 58, respectively, in the ZnF repeats, whereas one showed a microdeletion of CA at invariant positions 26–27 (Fig. S2). Such apparently complete base switches have been seen before in single molecule sequences (21) and were rejected as PCR artifacts probably derived from damaged DNA molecules. Mutant screening was also potentially vulnerable to contamination with genomic DNA from other individuals. Indeed, 14 different sperm mutants (83 mutants in total including recurrences) were identical to known alleles described in populations (4, 7). However, 13 (82 with recurrences) could be derived from progenitor alleles by simple exchange within long (>31 bp) IPSIs shared by misaligned alleles, and fell fully within the distribution of other mutants generated by simple exchange (Fig. 2B). These mutants were therefore expected and were treated as genuine. One mutant was identical to the common PRDM9 allele A but was recovered from man 7 homozygous for PRDM9 allele C. Although allele A could in theory be generated from allele C, the rearrangement would have to be exceedingly complex; as a result, this single mutant molecule was rejected as a contaminant.

Estimation of Mutation Rates From Population Diversity Data.

Individuals of North European or African descent showed PRDM9 ZnF allele length heterozygosities of 11% (17/156) and 57% (42/74), respectively (4, 7). Mutation rates were estimated from heterozygosity assuming selective neutrality and with populations of effective size 10,000 at mutation/drift equilibrium. We used both the infinite allele model, which will underestimate mutation rates, and a strict stepwise mutation model whereby repeat arrays only mutate by ±1 repeat changes, which will overestimate instability. The infinite allele and stepwise mutation models predicted instability rates of 0.3 × 10⁻⁵ and 1.5 × 10⁻⁵ per generation, respectively, in Europeans, and 3.3 × 10⁻⁵ and 20 × 10⁻⁵ in Africans.
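The infinite allele estimates can be reproduced with the standard equilibrium relationship H = θ/(1 + θ), where θ = 4Nₑμ. The text does not give the formula used, so applying this expression is an assumption, although it does return the reported values of about 0.3 × 10⁻⁵ and 3.3 × 10⁻⁵. The stepwise-model calculation is not specified in enough detail to reconstruct and is not attempted here. The function name is hypothetical.

```python
def iam_mutation_rate(heterozygosity, effective_size):
    """Mutation rate per generation under the infinite allele model.

    Inverts H = theta / (1 + theta) with theta = 4 * Ne * mu.
    """
    if not 0.0 <= heterozygosity < 1.0:
        raise ValueError("heterozygosity must lie in [0, 1)")
    theta = heterozygosity / (1.0 - heterozygosity)
    return theta / (4.0 * effective_size)


if __name__ == "__main__":
    for label, het in (("European", 17 / 156), ("African", 42 / 74)):
        print(label, iam_mutation_rate(het, 10_000))
```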

Population Simulations.

Simulations of randomly mating diploid populations, initially fixed for the common PRDM9 A allele, used a constant effective population size of 10,000, a generation time of 20 y, and mutation frequencies per gamete of 5.0 × 10⁻⁵ for individuals lacking a D-type destabilizer allele and 73.5 × 10⁻⁵ for people with a destabilizer, as seen in PRDM9 A/A and A/D men, respectively. Mutation frequencies in males and females were assumed to be equal. Each mutant was tested to determine whether it was a novel allele, based on the probability P that a mutation generates a new allele not already present in the population (observed P = 0.71 and 0.92 for A/A and A/D men, respectively; Fig. S4). One mutation in 10 was assumed to create a destabilizer allele irrespective of the allele of origin, based on the observation that one destabilizer was found among 10 different alleles surveyed. Fifteen different simulations were each continued for 3 Myr, with all allele frequencies being recorded every 4,000 y. There is of course considerable uncertainty about the likelihood that a mutation will create a destabilizer allele, although simulations varying this likelihood over the range 0.01–0.3 showed that destabilizers were always ephemeral, with modest impact on diversity.
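The simulation scheme can be sketched in Python as a Wright-Fisher model. This is a simplified illustration, not the authors' program. The mutation frequencies, novelty probabilities, and the one-in-10 destabilizer rate are taken from the text. Everything else is an assumption, including the treatment of non-novel mutants as copies of an allele already in the population and the omission of separate sexes. All names are hypothetical.

```python
import random
from collections import Counter

MU_NORMAL = 5.0e-5      # per gamete, parent lacks a destabilizer (from text)
MU_DESTAB = 73.5e-5     # per gamete, parent carries a destabilizer (from text)
P_NOVEL_NORMAL = 0.71   # chance a mutation yields an allele new to the population
P_NOVEL_DESTAB = 0.92
P_NEW_IS_DESTAB = 0.1   # one new allele in 10 assumed to be a destabilizer


def simulate(n_individuals, generations, mu_normal=MU_NORMAL,
             mu_destab=MU_DESTAB, seed=0):
    """Wright-Fisher sketch; returns (population, destabilizer flags)."""
    rng = random.Random(seed)
    is_destab = {0: False}            # allele 0 stands for PRDM9 A
    next_id = 1
    pop = [(0, 0)] * n_individuals    # population initially fixed for A
    for _ in range(generations):
        new_pop = []
        for _ in range(n_individuals):
            child = []
            for _ in range(2):        # two gametes, each from a random parent
                parent = pop[rng.randrange(n_individuals)]
                allele = parent[rng.randrange(2)]
                carrier = is_destab[parent[0]] or is_destab[parent[1]]
                if rng.random() < (mu_destab if carrier else mu_normal):
                    p_novel = P_NOVEL_DESTAB if carrier else P_NOVEL_NORMAL
                    if rng.random() < p_novel:
                        allele = next_id
                        is_destab[allele] = rng.random() < P_NEW_IS_DESTAB
                        next_id += 1
                    else:
                        # Assumption: a non-novel mutant copies an allele
                        # already present in the population.
                        donor = pop[rng.randrange(n_individuals)]
                        allele = donor[rng.randrange(2)]
                child.append(allele)
            new_pop.append(tuple(child))
        pop = new_pop
    return pop, is_destab


def heterozygosity(pop):
    """Expected heterozygosity, 1 - sum(p_i^2), over all gene copies."""
    counts = Counter(allele for pair in pop for allele in pair)
    total = 2 * len(pop)
    return 1.0 - sum((c / total) ** 2 for c in counts.values())
```

With a 20-y generation time, 3 Myr corresponds to 150,000 generations and the 4,000-y recording interval to every 200 generations. A run at the full population size of 10,000 would be slow in pure Python, so this sketch is best tried on smaller populations or shorter runs.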

Acknowledgments

We thank J. Blower and volunteers for providing semen samples, colleagues for helpful discussions, and the Medical Research Council, the Royal Society, and the Louis-Jeantet Foundation for funding support.

Footnotes

The authors declare no conflict of interest.

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1220813110/-/DCSupplemental.

References

  • 1. Parvanov ED, Petkov PM, Paigen K. Prdm9 controls activation of mammalian recombination hotspots. Science. 2010;327(5967):835. doi: 10.1126/science.1181495.
  • 2. Baudat F, et al. PRDM9 is a major determinant of meiotic recombination hotspots in humans and mice. Science. 2010;327(5967):836–840. doi: 10.1126/science.1183439.
  • 3. Myers S, et al. Drive against hotspot motifs in primates implicates the PRDM9 gene in meiotic recombination. Science. 2010;327(5967):876–879. doi: 10.1126/science.1182363.
  • 4. Berg IL, et al. PRDM9 variation strongly influences recombination hot-spot activity and meiotic instability in humans. Nat Genet. 2010;42(10):859–863. doi: 10.1038/ng.658.
  • 5. Kong A, et al. Fine-scale recombination rate differences between sexes, populations and individuals. Nature. 2010;467(7319):1099–1103. doi: 10.1038/nature09525.
  • 6. Grey C, et al. Mouse PRDM9 DNA-binding specificity determines sites of histone H3 lysine 4 trimethylation for initiation of meiotic recombination. PLoS Biol. 2011;9(10):e1001176. doi: 10.1371/journal.pbio.1001176.
  • 7. Berg IL, et al. Variants of the protein PRDM9 differentially regulate a set of human meiotic recombination hotspots highly active in African populations. Proc Natl Acad Sci USA. 2011;108(30):12378–12383. doi: 10.1073/pnas.1109531108.
  • 8. Hinch AG, et al. The landscape of recombination in African Americans. Nature. 2011;476(7359):170–175. doi: 10.1038/nature10336.
  • 9. Smagulova F, et al. Genome-wide analysis reveals novel molecular features of mouse recombination hotspots. Nature. 2011;472(7343):375–378. doi: 10.1038/nature09869.
  • 10. Brick K, Smagulova F, Khil P, Camerini-Otero RD, Petukhova GV. Genetic recombination is directed away from functional genomic elements in mice. Nature. 2012;485(7400):642–645. doi: 10.1038/nature11089.
  • 11. Hayashi K, Yoshida K, Matsui Y. A histone H3 methyltransferase controls epigenetic events required for meiotic prophase. Nature. 2005;438(7066):374–378. doi: 10.1038/nature04112.
  • 12. Auton A, et al. A fine-scale chimpanzee genetic map from population sequencing. Science. 2012;336(6078):193–198. doi: 10.1126/science.1216872.
  • 13. Boulton A, Myers RS, Redfield RJ. The hotspot conversion paradox and the evolution of meiotic recombination. Proc Natl Acad Sci USA. 1997;94(15):8058–8063. doi: 10.1073/pnas.94.15.8058.
  • 14. Jeffreys AJ, Neumann R. Reciprocal crossover asymmetry and meiotic drive in a human recombination hot spot. Nat Genet. 2002;31(3):267–271. doi: 10.1038/ng910.
  • 15. Coop G, Myers SR. Live hot, die young: transmission distortion in recombination hotspots. PLoS Genet. 2007;3(3):e35. doi: 10.1371/journal.pgen.0030035.
  • 16. Oliver PL, et al. Accelerated evolution of the Prdm9 speciation gene across diverse metazoan taxa. PLoS Genet. 2009;5(12):e1000753. doi: 10.1371/journal.pgen.1000753.
  • 17. Jeffreys AJ, et al. Complex gene conversion events in germline mutation at human minisatellites. Nat Genet. 1994;6(2):136–145. doi: 10.1038/ng0294-136.
  • 18. Jeffreys AJ, Neumann R. Somatic mutation processes at a human minisatellite. Hum Mol Genet. 1997;6(1):129–136. doi: 10.1093/hmg/6.1.129.
  • 19. Persikov AV, Osada R, Singh M. Predicting DNA recognition by Cys2His2 zinc finger proteins. Bioinformatics. 2009;25(1):22–29. doi: 10.1093/bioinformatics/btn580.
  • 20. Myers S, Freeman C, Auton A, Donnelly P, McVean G. A common sequence motif associated with recombination hot spots and genome instability in humans. Nat Genet. 2008;40(9):1124–1129. doi: 10.1038/ng.213.
  • 21. Neumann R, Lawson VE, Jeffreys AJ. Dynamics and processes of copy number instability in human γ-globin genes. Proc Natl Acad Sci USA. 2010;107(18):8304–8309. doi: 10.1073/pnas.1003634107.
  • 22. Jeffreys AJ, Murray J, Neumann R. High-resolution mapping of crossovers in human sperm defines a minisatellite-associated recombination hotspot. Mol Cell. 1998;2(2):267–273. doi: 10.1016/s1097-2765(00)80138-0.
  • 23. Buard J, Shone AC, Jeffreys AJ. Meiotic recombination and flanking marker exchange at the highly unstable human minisatellite CEB1 (D2S90). Am J Hum Genet. 2000;67(2):333–344. doi: 10.1086/303015.
  • 24. Frazer KA, et al.; International HapMap Consortium. A second generation human haplotype map of over 3.1 million SNPs. Nature. 2007;449(7164):851–861. doi: 10.1038/nature06258.
  • 25. Jeffreys AJ, Neumann R, Panayi M, Myers S, Donnelly P. Human recombination hot spots hidden in regions of strong marker association. Nat Genet. 2005;37(6):601–606. doi: 10.1038/ng1565.
  • 26. Roach JC, et al. Analysis of genetic inheritance in a family quartet by whole-genome sequencing. Science. 2010;328(5978):636–639. doi: 10.1126/science.1186802.
  • 27. Conrad DF, et al.; 1000 Genomes Project. Variation in genome-wide mutation rates within and between human families. Nat Genet. 2011;43(7):712–714. doi: 10.1038/ng.862.
  • 28. Li H, Durbin R. Inference of human population history from individual whole-genome sequences. Nature. 2011;475(7357):493–496. doi: 10.1038/nature10231.
  • 29. Thomas JH, Emerson RO, Shendure J. Extraordinary molecular evolution in the PRDM9 fertility gene. PLoS ONE. 2009;4(12):e8505. doi: 10.1371/journal.pone.0008505.
  • 30. Jeffreys AJ, Neumann R. Factors influencing recombination frequency and distribution in a human meiotic crossover hotspot. Hum Mol Genet. 2005;14(15):2277–2287. doi: 10.1093/hmg/ddi232.
  • 31. Buard J, Vergnaud G. Complex recombination events at the hypermutable minisatellite CEB1 (D2S90). EMBO J. 1994;13(13):3203–3210. doi: 10.1002/j.1460-2075.1994.tb06619.x.
  • 32. Tamaki K, May CA, Dubrova YE, Jeffreys AJ. Extremely complex repeat shuffling during germline mutation at human minisatellite B6.7. Hum Mol Genet. 1999;8(5):879–888. doi: 10.1093/hmg/8.5.879.
  • 33. Holloway K, Lawson VE, Jeffreys AJ. Allelic recombination and de novo deletions in sperm in the human β-globin gene region. Hum Mol Genet. 2006;15(7):1099–1111. doi: 10.1093/hmg/ddl025.
