Author manuscript; available in PMC: 2015 Jan 8.
Published in final edited form as: Nat Commun. 2014 Jul 8;5:4370. doi: 10.1038/ncomms5370

Primate evolution of the recombination regulator PRDM9

Jerrod J Schwartz 1, David J Roach 1, James H Thomas 1, Jay Shendure 1,*
PMCID: PMC4110516  NIHMSID: NIHMS604606  PMID: 25001002

Abstract

The PRDM9 gene encodes a protein with a highly variable tandem-repeat zinc finger (ZF) DNA-binding domain that plays a key role in determining sequence-specific hotspots of meiotic recombination genome-wide. Here we survey the diversity of the PRDM9 ZF domain by sequencing this region in 64 primates from 18 species, revealing 68 unique alleles across all groups. We report ubiquitous positive selection at nucleotide positions corresponding to DNA contact residues and the expansion of ZFs within clades, which confirms the rapid evolution of the ZF domain throughout the primate lineage. Alignment of Neandertal and Denisovan sequences suggests that PRDM9 in archaic hominins was closely related to present-day human alleles that are rare and specific to African populations. In the context of its role in reproduction, our results are consistent with variation in PRDM9 contributing to speciation events in primates.

INTRODUCTION

Genes that cause hybrid sterility between species have the potential to reveal important insights into reproduction and the evolutionary mechanisms that drive speciation1. PRDM9, a gene involved in meiotic recombination that was identified as a hybrid sterility gene in mammals1–4, encodes a protein with a KRAB domain, a PR/SET histone H3(K4) trimethyltransferase domain, and a DNA-binding domain consisting of a variably sized tandem-repeat array of C2H2 zinc fingers (ZFs)2 (Figure 1). PRDM9 is active during meiosis I and regulates recombination events by marking DNA for double-strand breaks at locations targeted by the ZF domain5,6. High levels of genetic diversity in the ZF domain have been observed within and between species7, and this diversity has been suggested as a source of variation in recombination hotspot motifs between taxa2. The observation that variation in the PRDM9 gene may lead to sterility in human males8 and hybrid sterility in mice1 has fueled speculation that the gene can cause post-zygotic reproductive isolation in otherwise closely related individuals, ultimately leading to speciation events7.

Figure 1. Schematic representation of the PRDM9 gene.

The C2H2 zinc finger domain, the subject of this study, consists of a variably sized array of zinc finger repeats, which are represented as differentially colored boxes (allele Pan.t-1 diagrammed here). Each array possesses many distinct ZFs, with some repeating at multiple positions throughout the array. The nucleotide sequence of four internal fingers is shown, and identical codons from different fingers are represented as dots. The high sequence similarity between different fingers demonstrates the repetitive nature of the ZF array. The nucleotide positions that code for the protein’s DNA contact loci are colored in red.

To further investigate the potential role of PRDM9 in primate evolution, we sequenced the ZF array in a diverse set of primates and re-analyzed reads corresponding to this locus in published genomes from Neandertal and Denisovan individuals. Rapid evolution of the PRDM9 gene was seen across all primates, presumably resulting in distinct recombination landscapes for each species, and novel zinc fingers were found in the ancient hominins. These findings confirm that PRDM9 diversity is found throughout the primate lineage, and provide further support to the idea that PRDM9 plays a role in primate speciation.

RESULTS

Sequencing of the PRDM9 zinc finger array

We surveyed the genetic diversity contained within the hypervariable ZF domain of PRDM9 in 64 individuals from the following genera: Pan, Homo, Gorilla, Pongo, Hylobates, Symphalangus, Nasalis, Papio, Macaca, Simia, and Callithrix (Supplementary Tables 1 and 2). Because of the length and highly repetitive nature of the PRDM9 ZF array, we used bacterial cloning, nested Sanger sequencing, and manual curation of reads into full-length assemblies of individual alleles (we use the term “allele” here to reference the full-length, nucleotide-level sequences of ZF arrays, and consider alleles to be unique if they differ at the nucleotide level). Prior to this study, two other groups characterized 21 non-hominid zinc fingers across 25 alleles within the Pan genus9,10. Here, we report an additional 148 zinc fingers from 40 previously uncharacterized alleles across 11 primate genera (Supplementary Data 1).

Identical protein sequences for single zinc fingers were shared between individuals from different genera in only seven instances (Supplementary Fig. 1). There were three instances of zinc finger arrays identical at the protein level between individuals from different species (Gor.g-4 in Gorilla beringei and Gorilla gorilla; Hyl.p-1 in Hylobates pileatus and Hylobates gabriellae; Pan.p-1 in Pan paniscus and Pan.t-6 in Pan troglodytes), two of which (the Gorilla and Hylobates pairs) were also identical at the nucleotide level, i.e. identical alleles as per our terminology (Supplementary Table 3).

Evidence for positive selection

The DNA binding specificity of PRDM9 is determined by the residues at positions −1, 2, 3, and 6 of each ZF, and these positions show strong signals of positive selection in humans11 and chimpanzees10. To explore whether positive selection is acting on these residues in other primate lineages, we performed a pairwise codon alignment of ZFs within each genus and generated Bayes-Empirical-Bayes dN/dS estimates. We found overwhelming evidence for positive selection at positions −1, 3, and 6 across all genera, and at position 2 in some genera. There was also evidence for positive selection at positions not in contact with DNA (Figure 2). In addition to the zinc finger diversity, the size of the arrays was highly variable, ranging from 6 to 19 fingers across all species (Figure 3). In mice, it has been shown that PRDM9 arrays differing in size by a single finger can lead to hybrid sterility1, but given the highly heterogeneous size of ZF arrays within the species examined here, this seems unlikely to generalize.

Figure 2. Evidence for positive selection across PRDM9 zinc fingers in primates.

Bayes-Empirical-Bayes dN/dS estimates at zinc finger positions for which there exists evidence of positive selection are shown. Almost all genera showed strong positive selection at DNA contact positions (–1, 3, and/or 6) with the Bonferroni correction method for multiple testing.

Figure 3. Diversity in PRDM9 zinc finger array size.

The size of each circle is proportional to the abundance of alleles of a given array size within the labeled taxonomic grouping. The “n” value is the number of alleles sampled for each group. Only those individuals that had both alleles characterized were included.

Zinc finger binding predictions

We generated predicted binding motifs for the 15 most common chimpanzee alleles (Pan.t-1 to Pan.t-15, each seen in at least two individuals) and all of the other primate alleles12 (Supplementary Fig. 2). The allele repertoire for each species is predicted to bind distinct motifs with little to no overlap between species. Although there is substantial binding site diversity within the most frequent chimpanzee alleles, we found that there is a short common motif shared by many of them (AATTnnAnTCnTCC). We investigated whether this motif has undergone any substantial depletion specifically within the chimpanzee lineage, but found it to be equally prevalent in the human, chimpanzee, and gorilla genomes13 (Supplementary Table 4). However, it should be emphasized that there is considerable uncertainty in computational motif prediction for large ZF arrays, and performing experimental binding assays may be critical for defining the actual motifs and possible recombination hotspots specified by these alleles.
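As a rough illustration of how a degenerate motif such as AATTnnAnTCnTCC can be counted in a sequence, the following Python sketch converts the motif into a regular expression and counts occurrences on both strands. It is not the FIMO-based scan cited above: the function names and the toy sequence are assumptions made for illustration, and a real genome-wide comparison would also need to account for genome size and base composition.

```python
import re

# Hypothetical helper names; 'n' is treated as "any base" (IUPAC N).
def motif_to_regex(motif: str) -> re.Pattern:
    parts = ["[ACGT]" if ch in "nN" else ch.upper() for ch in motif]
    # The lookahead lets overlapping occurrences be counted separately.
    return re.compile("(?=(" + "".join(parts) + "))")

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    return seq.upper().translate(_COMPLEMENT)[::-1]

def count_motif(seq: str, motif: str) -> int:
    """Count motif occurrences on the forward and reverse strands."""
    pattern = motif_to_regex(motif)
    forward = len(pattern.findall(seq.upper()))
    reverse = len(pattern.findall(reverse_complement(seq)))
    return forward + reverse

# Toy sequence (invented) containing one forward-strand occurrence.
toy_sequence = "GGAATTCGAGTCATCCTT"
n_hits = count_motif(toy_sequence, "AATTnnAnTCnTCC")
```

Dividing such counts by the number of scanned bases gives a per-genome density that can be compared between species.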

Evolutionary dynamics of primate PRDM9 zinc finger arrays

The fact that PRDM9 fingers display greater sequence identity within species than between species is consistent with observations in other tandem satellite families, and is evidence that the PRDM9 ZF locus is undergoing concerted evolution14. To further investigate the evolutionary dynamics of the PRDM9 ZF domain, we constructed a maximum-likelihood tree using an alignment of all primate ZF DNA sequences with codons encoding contact residues masked because of their extreme diversity (Figure 4). Statistical support is low for many branches, but we nonetheless observe clear structure within parts of the phylogeny. Remarkably, all of the 5′-most fingers in the ZF array form a well-defined, strongly supported clade, suggesting that an ancestral finger has retained this position throughout the primate lineage. At the 3′-most position, a distinct ancestral finger appears to have retained its position in Old World monkeys, gibbons, and Sumatran orangutans, but has been displaced by internal fingers in other species (Figure 4 and Supplementary Fig. 1). The dynamics at internal finger positions are clearly much more complex, but some interesting patterns can be discerned. For example, the zinc fingers in New World monkeys, especially Callithrix jacchus, cluster very tightly together on the tree, suggesting that a single ancestral ZF expanded to all but the 5′-most position of the ZF array along this lineage. The patterns and extent of diversity we observe at PRDM9 are consistent with the germline instability of the ZF domain shown by Jeffreys et al.15, although it remains unclear whether the apparently higher constraints on the 5′-most and 3′-most ZF positions are a consequence of selection or mutational mechanism.
Overall, the remarkable diversity observed in the ZF domain across 11 primate genera suggests that PRDM9 may activate recombination hotspots that are largely unique to each primate species, consistent with the lack of conservation in hotspot usage between chimpanzees and humans9,16.

Figure 4. Phylogenetic tree demonstrating evolutionary relationships between zinc fingers.

Maximum-likelihood reconstruction of the zinc finger phylogeny, colored by genus. Each finger is represented with a colored bar, and branches with strong approximate likelihood ratio test (aLRT) values (greater than 0.8) are marked with circles. The 5′-most fingers from all genera form a clear clade with an aLRT value of 1. The 3′-most “stop” fingers are also closely related (aLRT value 0.86) in several distantly related genera. In contrast, intra-species similarity can be seen throughout the rest of the tree, where fingers from the same or closely related genera form clusters, consistent with a concerted evolution mechanism. This is most clearly exhibited in the New World monkeys, where nearly all internal fingers are very closely related to one another, especially in Callithrix jacchus (brown bars).

Chimpanzee PRDM9 diversity and population structure

To explore the diversity of the ZF locus within a single species, we combined the results from two previous chimpanzee-sequencing studies9,10 with the large cohort in our study. The resulting dataset consisted of 79 individuals with 142 chromosomes characterized (16 individuals were missing data for one allele) (Supplementary Table 2). In total, we documented 34 alleles comprised of varying combinations of 23 ZFs (Figure 5). Of the 63 individuals with data for both chromosomes, 67% (42/63) were heterozygous, and there were 45 different genotypes with only 11 genotypes present in multiple individuals. To test for Hardy-Weinberg equilibrium, we organized the alleles into three distinct groups according to 5′ structural similarity (Figure 5). Interestingly, the population is not at equilibrium when analyzed in this way, with group “B” allele homozygotes at a higher prevalence than expected (Chi-squared p-value < 0.001). Consistent with population stratification as an underlying explanation, in the Groeneveld et al. and Auton et al. studies, all individuals with a “B” group allele belonged to chimpanzee subspecies Pan troglodytes troglodytes or Pan troglodytes schweinfurthii (subspecies data was unavailable for individuals sequenced in our study). These two subspecies form a monophyletic clade within Pan troglodytes17, suggesting that some PRDM9 alleles possibly arose within certain chimpanzee lineages and providing further support that the gene is undergoing concerted evolution in primates.
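The Hardy-Weinberg comparison described above can be sketched in Python as a chi-squared goodness-of-fit test on genotype counts over allele groups. This is a minimal sketch under stated assumptions, not the authors' actual procedure: the function name and the genotype counts in the usage lines are invented for illustration, and only the test statistic and degrees of freedom are returned (a p-value would require a chi-squared distribution function from a statistics library).

```python
from collections import Counter
from itertools import combinations_with_replacement

def hwe_chi_square(genotype_counts):
    """Chi-squared statistic and degrees of freedom for a Hardy-Weinberg test.

    genotype_counts maps an unordered pair of allele-group labels,
    e.g. ("A", "B"), to the number of individuals with that genotype.
    """
    counts = Counter()
    for (a, b), n in genotype_counts.items():
        counts[tuple(sorted((a, b)))] += n
    n_individuals = sum(counts.values())

    # Allele-group frequencies estimated from the observed genotypes.
    allele_counts = Counter()
    for (a, b), n in counts.items():
        allele_counts[a] += n
        allele_counts[b] += n
    freqs = {a: c / (2 * n_individuals) for a, c in allele_counts.items()}

    groups = sorted(freqs)
    chi2 = 0.0
    for a, b in combinations_with_replacement(groups, 2):
        # Expected proportion: p^2 for homozygotes, 2pq for heterozygotes.
        proportion = freqs[a] * freqs[b] * (1 if a == b else 2)
        expected = proportion * n_individuals
        if expected > 0:
            chi2 += (counts.get((a, b), 0) - expected) ** 2 / expected

    k = len(groups)
    dof = k * (k - 1) // 2  # genotype classes - 1 - (k - 1) estimated frequencies
    return chi2, dof

# Hypothetical counts for three allele groups (NOT the study's data).
example = {("A", "A"): 20, ("A", "B"): 10, ("B", "B"): 12,
           ("A", "C"): 11, ("B", "C"): 5, ("C", "C"): 5}
statistic, degrees_of_freedom = hwe_chi_square(example)
```

An excess of homozygotes for one group inflates the statistic, which is the pattern expected when samples are drawn from stratified subpopulations.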

Figure 5. PRDM9 zinc finger array diversity in chimpanzees.

Shown is a schematic representation of the zinc finger arrays of the 34 chimpanzee alleles described across three datasets. Each colored box represents a different finger, and asterisks denote fingers with synonymous differences. The grey boxes represent fingers that were not sequenced but are assumed to be ZF “L”. The variability in ZF content and array size can be clearly seen in the different alleles. Although the rapid evolution of the ZF domain structure makes it difficult to precisely detail ancestral relationships between alleles, the 5′ end of the arrays displays structural patterns that enable alleles to be grouped for Hardy-Weinberg analysis.

We sequenced too few primates to perform comparable analyses in other taxa, but in a separate study conducted in humans18, 25 unique PRDM9 alleles were described in a cohort of 124 individuals (248 chromosomes) of both European and African descent. There were 36 distinct genotypes and 51% (63/124) of individuals were heterozygous, and the most common allele in humans was at a much higher prevalence than the most common chimpanzee allele (68.5% vs. 18.3%) (Supplementary Fig. 3 and Supplementary Table 5). Some human alleles are only present in either the European or African lineage, demonstrating PRDM9 population stratification in humans as well (Supplementary Table 5). We anticipate that more extensive sequencing of other primate genera will continue to reveal PRDM9 diversity as a reflection of the underlying population structure within each species.

Polymorphism information content (PIC) values are a useful measure of the diversity present at a given locus19. We calculated12,20 the PIC value for our chimpanzee cohort to be 0.9, while in humans it was only 0.51. For reference, one of the most diverse genes in humans21, HLA-B, was shown in one study of 234 individuals (436 chromosomes) to have a PIC value of 0.9522. That PRDM9 is so much more diverse in chimpanzees, approaching the level of diversity seen in the most diverse human gene, is perhaps unsurprising given that Pan troglodytes is known to be more genetically diverse than humans, but whether this increased diversity at the PRDM9 ZF locus is a reflection of population history or biological constraint is difficult to assess17.
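For readers unfamiliar with PIC, the sketch below computes it from allele frequencies using the widely used form PIC = 1 − Σ p_i² − Σ_{i<j} 2 p_i² p_j². Whether the values reported here were obtained with exactly this formula is an assumption, and the function name is illustrative.

```python
from itertools import combinations

def polymorphism_information_content(allele_counts):
    """PIC from allele counts or frequencies (normalized internally)."""
    total = sum(allele_counts)
    p = [c / total for c in allele_counts]
    # Probability that two randomly drawn alleles are identical.
    homozygosity = sum(x * x for x in p)
    # Correction for matings that are uninformative despite heterozygosity.
    uninformative = sum(2 * (a * a) * (b * b) for a, b in combinations(p, 2))
    return 1 - homozygosity - uninformative

# Two equally frequent alleles give the classic value of 0.375.
two_allele_pic = polymorphism_information_content([0.5, 0.5])
```

PIC approaches 1 as the number of alleles grows and their frequencies become more even, so a locus dominated by one common allele scores much lower than a locus with many alleles at similar frequencies.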

Characterization of PRDM9 zinc fingers in ancient hominins

To explore the more recent evolution of PRDM9, we mapped the raw sequence reads from the high coverage Neandertal23 and Denisovan24 genome projects to a library of all known primate adjacent finger pairings. Although Neandertals, Denisovans, and modern humans diverged between 381,000 and 473,000 years ago14, we found that they share both PRDM9 ZF sequences and ZF linkages (Figure 6 and Supplementary Figs. 4 and 5). The Denisovans, however, have two ZFs with synonymous changes that appear to be unique to their lineage, as they have not been observed in humans. Furthermore, two adjacent finger pairings that are rare and specific to modern African populations18 were observed in the Denisovan and Neandertal shotgun data: D-R and D-S (finger-finger nomenclature adopted from Berg et al.18; minor allele frequencies within African populations are 0.025 and 0.008, respectively).

Figure 6. PRDM9 zinc finger profile of early hominins.

Raw reads from the Denisovan and Neandertal genome projects were mapped against primate PRDM9 zinc finger linkages. A) For three of the human linkages, we identified positions that had changed along the Denisovan lineage (marked with an arrow), which we denote as new fingers B* and I* (synonymous to human fingers B and I, respectively). Reads encompassed by a red box sufficiently span the exclusively identifying positions of both fingers to identify a finger-finger linkage. Note that not all mapped reads are shown. B) Two rare human linkages, D-R and D-S, were only identified in the raw Denisovan and Neandertal datasets, respectively. C) Codon alignments for the two new Denisovan fingers. For the B* finger (top), a synonymous change (G->A) at position 13 is unique to the Denisovan lineage, whereas both humans and chimpanzees have the ancestral base (G). For the I* finger (bottom), the Denisovan finger contains a derived synonymous change at position -4 (C->T).

Diversity of PRDM9 outside of the zinc finger locus

Recent work15 suggests that PRDM9 may actually influence the instability of its own coding sequence in humans, and that rapid remodeling of alleles predicts fast changes in hotspot usage. To determine if the region of genomic instability is restricted to PRDM9’s ZF region or if it extends into surrounding genomic sequence, we used long-range PCR and massively parallel sequencing to explore nearby variation in a subset of primates. We found that the region immediately flanking the zinc finger region (3.2 kb on the 5′-end and 1.4 kb on the 3′-end) contained approximately the expected number of homozygous differences relative to the human genome17 (Supplementary Fig. 6, Supplementary Table 6, and Supplementary Data 2–20). Variation in these immediately flanking regions enables accurate reconstruction of a species-level phylogeny for all great apes (Supplementary Fig. 7), consistent with the markedly elevated genetic diversity being restricted to the ZF array alone and the absence of any deep coalescence as is the case at other highly polymorphic loci such as the MHC locus.

DISCUSSION

In summary, we report the first large-scale survey of the genetic diversity of PRDM9 across the primate lineage. The remarkable amount of genetic diversity present between otherwise closely related species demonstrates that this gene is rapidly evolving in all or nearly all primates. Furthermore, the high levels of positive selection at the DNA contact positions and extensive structural variation within the ZF array suggest that these alleles are likely functional and active in specifying hotspot locations. Taken together, our data are consistent with the idea that variation in PRDM9 can lead to differential hotspot usage, which may result in hybrid sterility and contribute to speciation in the primate lineage. High-throughput functional testing of these different PRDM9 alleles is needed to identify their cognate DNA binding sites, and phased genome sequencing of corresponding primate genomes may facilitate mapping of recombination hotspots. It is likely that we have only begun to sample the extent of the genetic diversity present in the PRDM9 gene in primates, and that continued exploration of its functional consequences will yield further insight into mechanisms that drive evolution and speciation events.

METHODS

Amplification and sequencing of the zinc finger PRDM9 array

Primate genomic DNA was obtained from the Coriell Cell Repositories or Evan Eichler’s Lab at the University of Washington (Supplementary Table 1). In the preliminary phase of this study, we tested multiple primer pairs in combination with a high-fidelity, long-processivity DNA polymerase on four chimpanzee individuals (Supplementary Table 7: “Pilot set of primers”). The aim of this phase was to identify an optimal set of primer sequences, PCR conditions, and size selection procedures that maximized product while minimizing chimera formation and non-specific amplification of the PRDM9 homolog PRDM7.

PCR and sequencing primers were designed using Primer3 to selectively amplify the zinc finger region of PRDM9 based on the chimpanzee reference genome (chr5: 91,796,444-91,798,315, CSAC 2.1.4/panTro4, Supplementary Table 7). We identified a subset of optimal primers that included >50 bp of unique, non-repetitive flanking sequence around the ZF array (Supplementary Table 7: ZF_forward, ZF_reverse, Mac_ZF_f, Mac_ZF_r) to use with the entire primate sample set. Real-time PCR was performed on 10 ng of genomic DNA using Kapa HiFi HotStart ReadyMix with the following thermal cycling protocol: 95°C for 180 sec, repeat [98°C for 20 sec, 65°C for 15 sec, 72°C for 80 sec]. Each reaction was stopped prior to it leaving the exponential amplification stage to minimize PCR artifacts (e.g. chimeras) that tend to amplify preferentially in the post-exponential amplification phase. Products were size-separated and size-selected using polyacrylamide gel electrophoresis (PAGE, example gel image in Supplementary Fig. 8). Chimeras, PRDM7, and primer-primer dimers were typically seen as fainter ladder-like bands beneath the main band when analyzed via PAGE. To improve our specificity for sequencing PRDM9, the largest and brightest bands corresponding to full-length amplification were size-selected from the gel. This procedure also allowed us to isolate and sequence arrays that differed in size by as little as one finger (84 bp).

Size-selected products were cloned into pUC19 (InFusion, Invitrogen), transformed into E. coli NovaBlue competent cells (EMD Millipore), and 4–24 single colonies for each PCR product were picked by hand for amplification by TempliPhi (GE Healthcare Life Sciences). Selecting multiple clones minimized the false-positive rate of any rare chimeric product that had the same size as a true allele. Without additional purification, clonally amplified DNA was Sanger sequenced at GENEWIZ using four different sequencing primers giving 900–1000 bp of sequence in the forward and reverse directions (Supplementary Table 7: Sanger_F, Sanger_R, M13F, M13R). In total for the 64 primates studied, we sequenced 723 unique clones each with >= 2 different primer combinations in 2,270 independent Sanger sequencing reactions (not all clone:primer combinations were attempted). See Supplementary Table 2 for the number of clones, primer pairs, and reads analyzed for each sample.

Forward and reverse reads that had sufficient overlap (typically >=2 unique fingers) were used to stitch together a consensus sequence for each clone (Supplementary Table 3 and Supplementary Data 1). To be considered a valid sequence, at least two clones had to have identical sequence supporting a given consensus sequence. If only one band was present on PAGE and all clones generated from the isolated DNA had identical sequence, the individual was considered homozygous. Any aberrant sequences that differed from the canonical finger structure were discarded, and when necessary, samples were re-amplified and sequenced. All sequence data was visualized in Sequencher (Gene Codes Corporation).

Identification of new PRDM9 fingers in Neandertal and Denisova shotgun data

The raw sequence data for the Altai Neandertal and Denisova genomes was downloaded from the Max Planck Institute for Evolutionary Anthropology. First, we mapped all reads using mrsFAST25 to a reference library comprised of all known primate PRDM9 nucleotide sequences allowing for an edit distance of up to 3. This revealed that the vast majority of mapped reads were either perfect matches or had a single base change with respect to a human PRDM9 sequence (Supplementary Fig. 4). Given the highly repetitive nature of the zinc finger region and the short-read nature of the data, many reads did not exclusively identify a linkage or even a finger. However, there were enough longer reads that spanned through an exclusively identifying region of two fingers to prove informative.

We refer to a “linkage” as a unique combination of any two primate zinc finger sequences. A read identifies the presence of a linkage if it spans both the junction and the exclusively identifying variants in both fingers. We wanted to identify two things: 1) the presence of known primate linkages, and 2) the presence of novel fingers/linkages. First, we generated a reference consisting of all known 168 bp primate PRDM9 zinc finger linkages (i.e., Homo A–B, B–C, C–D, etc.). For each linkage, we identified all the reads that mapped to it exclusively and perfectly without error. These reads were then removed from the mapped position file so that they would not inadvertently indicate the presence of variants in other linkages.

By repeating this process for every linkage, the list of reads was effectively divided into two groups: those that map perfectly and exclusively to a linkage and those that map to a linkage but are either not exclusive or differ at one or more positions (we call this group the uncertain origin group (UOG)). To identify novel fingers or linkages, one by one we merged each linkage’s exclusive and perfect reads and all of the UOG reads and re-mapped them together. This effectively gives all the UOG reads a chance to vote for their presence in each linkage, while allowing the perfect and exclusive reads to vote only for their perfect linkages. After repeating this process for all known linkages, novel fingers were identified through manual curation of the resulting pileups (Figure 6a).
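The split into exclusive-and-perfect reads versus the UOG can be illustrated with a simplified Python sketch. It substitutes exact substring matching for the mrsFAST alignment used in the study, and it places every read that is not a perfect, exclusive match (including reads that match nothing) in the UOG; the function name and toy sequences are assumptions made for illustration.

```python
def partition_reads(reads, linkages):
    """Split reads into per-linkage exclusive perfect matches and a UOG list.

    linkages maps a linkage name (e.g. "A-B") to its reference sequence.
    A read counts as a perfect match if it is an exact substring of the
    reference, and as exclusive if this holds for exactly one linkage.
    """
    exclusive = {name: [] for name in linkages}
    uncertain_origin = []
    for read in reads:
        hits = [name for name, seq in linkages.items() if read in seq]
        if len(hits) == 1:
            exclusive[hits[0]].append(read)
        else:
            # Ambiguous (several linkages) or imperfect (no exact match).
            uncertain_origin.append(read)
    return exclusive, uncertain_origin

# Toy references (invented); real linkages are 168 bp finger pairs.
toy_linkages = {"A-B": "AAAACCCCGGGG", "A-C": "AAAACCCCTTTT"}
toy_reads = ["CCCCGG", "AAAACC", "CCCCTT", "ACGTAC"]
exclusive_reads, uog_reads = partition_reads(toy_reads, toy_linkages)
```

In this toy case the read "AAAACC" lies entirely within the shared finger and therefore cannot distinguish the two linkages, which mirrors why short reads in a repetitive array are often uninformative.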

We used Monte Carlo simulations to identify the minimum possible set of PRDM9 linkages that still accurately represented the available data. We started by mapping all reads to the collection of all possible linkages (including all possible linkages involving the newly discovered fingers B* and I*) and counting the number of perfectly mapped reads (Supplementary Fig. 5). Next, we randomly deleted one of the linkages in the reference library and re-mapped the reads. If the number of perfectly mapped reads remained unchanged, the process was repeated: another linkage was randomly deleted from the reference and the reads were re-mapped again. If there were fewer mapped reads after deleting a linkage, the linkage was returned to the library and the process was repeated. This cycle was repeated until a reduced set of core linkages was obtained, identified at the point whereby deleting any of them would result in additional reads that do not perfectly map. Due to the nature of this random walk through linkage space, not all linkages present in the final set are necessarily unique. Some may instead represent one out of a few equally possible alternatives. To identify these uncertain linkages, we repeated the entire simulation at least 10 times and kept track of which linkages appeared at the end of every simulation. Such linkages that appeared at the end of every trial were deemed to be necessary; that is, the data cannot be reconciled unless they are present.
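The reduction procedure can be sketched in Python as follows. This is a simplified illustration, not the authors' code: exact substring matching stands in for read mapping, the function names are hypothetical, and each trial visits the linkages once in random order. A single pass suffices in this simplified setting because a linkage that cannot be deleted from a larger reference set also cannot be deleted from any smaller set that still contains it.

```python
import random

def perfect_read_count(reads, linkages):
    """Number of reads that match at least one linkage exactly."""
    return sum(any(read in seq for seq in linkages.values()) for read in reads)

def reduce_linkages(reads, linkages, rng):
    """One trial: delete linkages in random order, restoring any deletion
    that lowers the number of perfectly mapped reads."""
    kept = dict(linkages)
    baseline = perfect_read_count(reads, kept)
    order = list(kept)
    rng.shuffle(order)
    for name in order:
        seq = kept.pop(name)
        if perfect_read_count(reads, kept) < baseline:
            kept[name] = seq  # deletion lost reads, so the linkage is restored
    return set(kept)

def necessary_linkages(reads, linkages, trials=10, seed=0):
    """Linkages that survive every trial, i.e. that the data require."""
    rng = random.Random(seed)
    runs = [reduce_linkages(reads, linkages, rng) for _ in range(trials)]
    return set.intersection(*runs)

# Toy data (invented): "AAAA" is explained by either L1 or L2, "TTTTCC" only by L3.
toy_linkages = {"L1": "AAAACCCC", "L2": "AAAAGGGG", "L3": "TTTTCCCC"}
toy_reads = ["AAAA", "TTTTCC"]
required = necessary_linkages(toy_reads, toy_linkages)
```

Each trial here ends with L3 plus one of L1 or L2; only L3 is guaranteed to appear in every trial, which corresponds to the distinction drawn above between necessary linkages and equally possible alternatives.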

Amplification and sequencing of the PRDM9 flanking genomic sequence

Real-time PCR was performed as described above using Kapa HiFi HotStart ReadyMix as directed by the manufacturer (Supplementary Table 7: 5.8kb_forward/reverse, 5.8kb_f_alt/r_alt), depending on the desired target, and the following thermal cycling protocol: 95°C for 180 sec, repeat [98°C for 20 sec, 65°C for 15 sec, 72°C for 180 sec]. Samples were loaded on a 1% agarose gel and run for 1 hour at 100 V. Bands of the desired size were size selected and purified (Supplementary Fig. 9). A second, four-fold larger volume PCR using the same conditions was then run using the size-selected sample. The PCR products were pooled for each individual and purified using Agencourt AMPure magnetic beads, following the manufacturer’s protocol. Samples were eluted in 50 μl and loaded directly into the Covaris Adaptive Focused Acoustics machine. Standard shotgun libraries were then prepared for paired-end (2 × 250 bp) Illumina MiSeq sequencing (sequencing primers in Supplementary Table 7: MiSeq_p7 and MiSeq_p5).

Tests for positive selection

To detect sites under positive selection, we performed a multiple sequence alignment of all zinc finger DNA sequences for each genus (Clustal Omega)26. The phylogenetic guide trees generated were then used as input to test for positive selection using the codeml package of PAML v4.727 as previously described, with the Bonferroni correction for multiple testing.

ZF array binding site prediction

We used the methods described by Persikov et al.12 to generate predicted binding motifs for the 15 most frequent chimpanzee alleles (Pan.t-1 to Pan.t-15, each seen in at least two individuals) and all of the other primate alleles. Briefly, each zinc finger array was translated into amino acid sequence and divided into sequential subarrays comprised of three adjacent fingers. The predicted binding interaction for each subarray was calculated against all possible 10 bp DNA sequences using a support vector machine12. The top 250 DNA sequences with the highest predicted binding potential for each subarray were then used to generate a nucleotide position weight matrix (PWM), and PWMs from adjacent subarrays were merged assuming equal weight at overlapping positions. Sequence logo plots were then generated using ggPlot and seqLogo.
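The PWM construction and merging steps can be illustrated with a short Python sketch. It covers only the matrix arithmetic, not the support vector machine scoring. The 3 bp offset between adjacent three-finger subarrays is an assumption (based on each C2H2 finger typically contacting about 3 bp) and is not stated in the text; the function names and toy sequences are likewise illustrative.

```python
BASES = "ACGT"

def build_pwm(sequences):
    """Column-wise base frequencies for equal-length DNA sequences."""
    length = len(sequences[0])
    pwm = []
    for i in range(length):
        column = [seq[i] for seq in sequences]
        pwm.append([column.count(base) / len(sequences) for base in BASES])
    return pwm

def merge_pwms(pwms, offset=3):
    """Merge equal-length PWMs from adjacent subarrays, averaging overlaps.

    offset is the assumed shift (in bp) between consecutive subarrays.
    """
    total_length = offset * (len(pwms) - 1) + len(pwms[0])
    sums = [[0.0] * len(BASES) for _ in range(total_length)]
    coverage = [0] * total_length
    for k, pwm in enumerate(pwms):
        for i, column in enumerate(pwm):
            position = k * offset + i
            coverage[position] += 1
            for b, value in enumerate(column):
                sums[position][b] += value
    # Equal weight at overlapping positions: divide by the number of PWMs covering each.
    return [[value / coverage[pos] for value in sums[pos]]
            for pos in range(total_length)]

# Toy example (invented sequences, 4 bp instead of 10 bp for brevity).
first = build_pwm(["AAAA"])
second = build_pwm(["CCCC"])
merged = merge_pwms([first, second], offset=3)
```

With real data, each input list would hold the top-scoring 10 bp sequences for one subarray, and the merged matrix would span the predicted binding site of the whole array.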

Generation of the PRDM9 zinc finger phylogeny

To make a comprehensive dataset of known primate PRDM9 zinc finger sequences, we combined the ZFs sequenced in our study with those from Berg et al., Auton et al., and Groeneveld et al. Next, we removed the hypervariable nucleotide positions encoding DNA contact loci from the ZF sequences to limit noise during phylogenetic reconstruction. A multiple sequence alignment was then created (Clustal Omega)26 and used to generate a maximum-likelihood tree with PhyML 3.028. Approximate likelihood ratio test values above 0.8 are considered confident28, and we chose this as a threshold for marking branches with “strong” support. FigTree was used for generating a graphical view of the phylogeny29.

Acknowledgments

We thank E. Eichler, C. Baker, P. Sudmant, M. Dennis, O. Ryder, the Southwest Foundation for Biomedical Research, the Human Genome Diversity Project, and W. Swanson for providing primate DNA samples, and J. Kitzman, M. Kircher and other members of the Shendure Lab for helpful discussions. J.J.S. was funded by a Helen Hay Whitney Foundation postdoctoral fellowship and D.J.R. was funded through the Howard Hughes Medical Institute Medical Research Fellows Program. Our work was supported by grant HG006283 from the National Human Genome Research Institute (NHGRI) to J.S.

Footnotes

AUTHOR CONTRIBUTIONS

J.J.S., J.H.T. and J.S. designed the study; J.J.S. and D.J.R. performed the experiments; and J.J.S., D.J.R., and J.S. analyzed the data and wrote the manuscript.

COMPETING FINANCIAL INTERESTS

The authors declare no competing financial interests.

ACCESSION CODES

The DNA sequences generated in this study have been deposited in the GenBank nucleotide database under the accession codes KJ916126 to KJ916191.

References

  • 1. Mihola O, Trachtulec Z, Vlcek C, Schimenti JC, Forejt J. A mouse speciation gene encodes a meiotic histone H3 methyltransferase. Science. 2009;323:373–5. doi: 10.1126/science.1163601.
  • 2. Baudat F, et al. PRDM9 is a major determinant of meiotic recombination hotspots in humans and mice. Science. 2010;327:836–40. doi: 10.1126/science.1183439.
  • 3. Myers S, et al. Drive against hotspot motifs in primates implicates the PRDM9 gene in meiotic recombination. Science. 2010;327:876–9. doi: 10.1126/science.1182363.
  • 4. Parvanov ED, Petkov PM, Paigen K. Prdm9 controls activation of mammalian recombination hotspots. Science. 2010;327:835. doi: 10.1126/science.1181495.
  • 5. Hayashi K, Yoshida K, Matsui Y. A histone H3 methyltransferase controls epigenetic events required for meiotic prophase. Nature. 2005;438:374–8. doi: 10.1038/nature04112.
  • 6. Grey C, et al. Mouse PRDM9 DNA-binding specificity determines sites of histone H3 lysine 4 trimethylation for initiation of meiotic recombination. PLoS Biol. 2011;9:e1001176. doi: 10.1371/journal.pbio.1001176.
  • 7. Oliver PL, et al. Accelerated evolution of the Prdm9 speciation gene across diverse metazoan taxa. PLoS Genet. 2009;5:e1000753. doi: 10.1371/journal.pgen.1000753.
  • 8. Irie S, et al. Single-nucleotide polymorphisms of the PRDM9 (MEISETZ) gene in patients with nonobstructive azoospermia. J Androl. 2009;30:426–31. doi: 10.2164/jandrol.108.006262.
  • 9. Auton A, et al. A fine-scale chimpanzee genetic map from population sequencing. Science. 2012;336:193–8. doi: 10.1126/science.1216872.
  • 10. Groeneveld LF, Atencia R, Garriga RM, Vigilant L. High diversity at PRDM9 in chimpanzees and bonobos. PLoS One. 2012;7:e39064. doi: 10.1371/journal.pone.0039064.
  • 11. Thomas JH, Emerson RO, Shendure J. Extraordinary molecular evolution in the PRDM9 fertility gene. PLoS One. 2009;4:e8505. doi: 10.1371/journal.pone.0008505.
  • 12. Persikov AV, Osada R, Singh M. Predicting DNA recognition by Cys2His2 zinc finger proteins. Bioinformatics. 2009;25:22–9. doi: 10.1093/bioinformatics/btn580.
  • 13. Grant CE, Bailey TL, Noble WS. FIMO: scanning for occurrences of a given motif. Bioinformatics. 2011;27:1017–8. doi: 10.1093/bioinformatics/btr064.
  • 14. Rudd MK, Wray GA, Willard HF. The evolutionary dynamics of alpha-satellite. Genome Res. 2006;16:88–96. doi: 10.1101/gr.3810906.
  • 15. Jeffreys AJ, Cotton VE, Neumann R, Lam KW. Recombination regulator PRDM9 influences the instability of its own coding sequence in humans. Proc Natl Acad Sci U S A. 2013;110:600–5. doi: 10.1073/pnas.1220813110.
  • 16. Ptak SE, et al. Fine-scale recombination patterns differ between chimpanzees and humans. Nat Genet. 2005;37:429–34. doi: 10.1038/ng1529.
  • 17. Prado-Martinez J, et al. Great ape genetic diversity and population history. Nature. 2013;499:471–5. doi: 10.1038/nature12228.
  • 18. Berg IL, et al. PRDM9 variation strongly influences recombination hot-spot activity and meiotic instability in humans. Nat Genet. 2010;42:859–63. doi: 10.1038/ng.658.
  • 19. Botstein D, White RL, Skolnick M, Davis RW. Construction of a genetic linkage map in man using restriction fragment length polymorphisms. Am J Hum Genet. 1980;32:314–31.
  • 20. Nagy S, et al. PICcalc: an online program to calculate polymorphic information content for molecular genetic studies. Biochem Genet. 2012;50:670–2. doi: 10.1007/s10528-012-9509-1.
  • 21. Horton R, et al. Gene map of the extended human MHC. Nat Rev Genet. 2004;5:889–99. doi: 10.1038/nrg1489.
  • 22. Zuniga J, et al. HLA class I and class II conserved extended haplotypes and their fragments or blocks in Mexicans: implications for the study of genetic diversity in admixed populations. PLoS One. 2013;8:e74442. doi: 10.1371/journal.pone.0074442.
  • 23. Prufer K, et al. The complete genome sequence of a Neanderthal from the Altai Mountains. Nature. 2014;505:43–9. doi: 10.1038/nature12886.
  • 24. Meyer M, et al. A high-coverage genome sequence from an archaic Denisovan individual. Science. 2012;338:222–6. doi: 10.1126/science.1224344.
  • 25. Hach F, et al. mrsFAST: a cache-oblivious algorithm for short-read mapping. Nat Methods. 2010;7:576–7. doi: 10.1038/nmeth0810-576.
  • 26. Sievers F, Higgins DG. Clustal Omega, accurate alignment of very large numbers of sequences. Methods Mol Biol. 2014;1079:105–16. doi: 10.1007/978-1-62703-646-7_6.
  • 27. Yang Z. PAML 4: phylogenetic analysis by maximum likelihood. Mol Biol Evol. 2007;24:1586–91. doi: 10.1093/molbev/msm088.
  • 28. Guindon S, et al. New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0. Syst Biol. 2010;59:307–21. doi: 10.1093/sysbio/syq010.
  • 29. Bazinet AL, Zwickl DJ, Cummings MP. A Gateway for Phylogenetic Analysis Powered by Grid Computing Featuring GARLI 2.0. Syst Biol. 2014. doi: 10.1093/sysbio/syu031.
