PLOS Biology. 2025 Jan 6;23(1):e3002950. doi: 10.1371/journal.pbio.3002950

PRDM9 drives the location and rapid evolution of recombination hotspots in salmonid fish

Marie Raynaud 1,*,#, Paola Sanna 2,#, Julien Joseph 3, Julie Clément 4, Yukiko Imai 5, Jean-Jacques Lareyre 6, Audrey Laurent 6, Nicolas Galtier 1, Frédéric Baudat 2, Laurent Duret 3,‡,*, Pierre-Alexandre Gagnaire 1,‡,*, Bernard de Massy 2,‡,*
Editor: Nick H Barton 7
PMCID: PMC11703093  PMID: 39761307

Abstract

In many eukaryotes, meiotic recombination occurs preferentially at discrete sites, called recombination hotspots. In various lineages, recombination hotspots are located in regions with promoter-like features and are evolutionarily stable. Conversely, in some mammals, hotspots are specified by PRDM9, which targets recombination away from promoters. Paradoxically, PRDM9 induces the self-destruction of its targets, which triggers an ultra-fast evolution of mammalian hotspots. PRDM9 is ancestral to all animals, suggesting a critical importance for the meiotic program, but has been lost in many lineages with surprisingly little effect on meiosis success. However, it is unclear whether the function of PRDM9 described in mammals is shared by other species. To investigate this, we analyzed the recombination landscape of several salmonids, whose genomes harbor one full-length PRDM9 and several truncated paralogs. We identified recombination initiation sites in Oncorhynchus mykiss by mapping meiotic DNA double-strand breaks (DSBs). We found that DSBs clustered at hotspots that were positioned away from promoters, enriched for H3K4me3 and H3K36me3, and whose location depended on the genotype of full-length Prdm9. We observed a high level of polymorphism in the zinc finger domain of full-length Prdm9, indicating diversification driven by positive selection. Moreover, population-scaled recombination maps in O. mykiss, Oncorhynchus kisutch and Salmo salar revealed a rapid turnover of recombination hotspots caused by PRDM9 target motif erosion. Our results imply that PRDM9 function is conserved across vertebrates and that the peculiar evolutionary runaway caused by PRDM9 has been active for several hundred million years.


PRDM9 is a DNA-binding protein that helps determine the location of recombination hotspots in many mammals. This study of several species of salmonid fish reveals that PRDM9 function is conserved across vertebrates and that the peculiar evolutionary runaway caused by PRDM9 has been active for several hundred million years.

Introduction

Meiotic recombination (i.e., the exchange of genetic material between homologous chromosomes during meiosis) is highly conserved in a wide range of sexually reproducing eukaryotes, including plants, fungi, and animals [1]. This process is initiated by the programmed formation of DNA double-strand breaks (DSBs), followed by their repair using the homologous chromosome as template. Recombination events can lead to the reciprocal exchange of flanking regions (crossovers, COs) or proceed without reciprocal exchange (non-crossovers, NCOs). COs are essential for the proper segregation of homologous chromosomes [2]. Failure to form COs can lead to aneuploid reproductive cells or to defects in meiotic progression and sterility [3]. Meiotic recombination also plays an important evolutionary role. It increases genetic diversity by creating novel allele combinations [4,5] that facilitate adaptation and the removal of deleterious mutations from natural populations [6–8].

Intriguingly, the CO rate varies not only among species, populations, sexes, and individuals, but also along the genome [9–11]. Broad-scale patterns of variation within chromosomes (at the megabase scale) have been observed in some species: low recombination rate near centromeres and high recombination rate in telomere-proximal regions [12]. At a finer scale (kilobases), CO rate across the genome ranges from nearly uniform (e.g., flies, worms, and honeybees) [13–15] to highly heterogeneous (e.g., yeast, plants, and vertebrates). In such non-uniform recombination landscapes, most recombination events are typically concentrated within short intervals of about 2 kb, called recombination hotspots [16,17]. Studies on the evolutionary dynamics of recombination hotspots have identified 2 alternative mechanisms for controlling hotspot localization. In many eukaryotes (e.g., Arabidopsis, budding yeast, swordtail fish, birds, and canids), hotspots tend to be located near accessible chromatin regions enriched for H3K4me3, including promoters and transcription start sites (TSSs) [18–25]. Elevated recombination rates are also observed at transcription end sites (TESs) in plants and birds [25,26]. In dogs and birds, recombination hotspots are particularly associated with TSSs that are located within CpG islands (CGIs) [18,25,27]. Hotspot location is conserved over large evolutionary timescales in birds and yeasts [22,23,25], likely because promoters are evolutionarily stable. However, the generality of this conclusion remains to be evaluated [28]. On the other hand, mammalian species, including primates, mice, and cattle, show a drastically different pattern. Their recombination hotspots tend to occur independently of open chromatin regions [29–32], and their positions evolve rapidly between closely related species and even populations [29,30,33–35].
The genomic location of mammalian hotspots is controlled by the PRDM9 protein [32,36,37], which has 4 canonical domains (KRAB, SSXRD, PR/SET, and zinc finger, ZF), among which the C2H2 ZF domain binds to a specific DNA motif. After binding to this motif, PRDM9 trimethylates H3K4 and H3K36 on adjacent nucleosomes through its SET domain. Then, the proteins required for DSB formation are recruited to PRDM9 binding sites. The resulting DSBs are repaired by homologous recombination, leading to COs and NCOs [38]. Two striking evolutionary properties of PRDM9 have been identified. First, PRDM9 triggers the erosion of its binding sites, due to biased gene conversion during DSB repair [32,39,40]. Second, its ZF array presents a very high diversity [41–46] resulting from rapid evolution driven by a Red Queen dynamic in which positive selection favors the formation of new ZF arrays that recognize new binding motifs [39,40,47–50]. This is the direct consequence of PRDM9 binding site erosion, which decreases the efficiency of inter-homolog DSB repair and thus leads to lower fitness [47,50–53]. As a result, in Mus musculus, strains carrying different PRDM9 alleles generally share only 1% to 3% of DSB hotspots [34], and hotspot locations hardly overlap between humans and chimpanzees [29]. Thus, PRDM9-dependent and -independent hotspots differ in both genomic location and evolutionary lifespan. This raises the question of why and how the genetically unstable mechanism of PRDM9-directed recombination has evolved [19,54].

Understanding the function and evolutionary dynamics of PRDM9 in mammals has been a major breakthrough [29,32,36,42,55]. Phylogenetic studies of the Prdm9 gene have revealed the presence of a full-length copy in many metazoans and also repeated partial or complete losses [19,27,54,56]. This is surprising for a gene that, in mammals, controls such a crucial mechanism. Among vertebrates, fine-scale recombination maps from species lacking Prdm9 (e.g., birds and dogs) or harboring a truncated KRAB-less Prdm9 (e.g., swordtail fish or three-spined stickleback) revealed that their recombination hotspots are enriched at CGI-associated promoters [18,19,22,25,27,57], as observed in Prdm9 knockout mice or rats [30,58]. In snakes, which carry a full-length Prdm9 copy, the predicted binding sites of PRDM9 alleles are associated with increased recombination rates, which suggests that the sites of recombination are at least in part specified by the DNA binding property of PRDM9, similarly to mammals [59]. Whether PRDM9-mediated epigenetic modifications are functional in snakes is not known. However, snake genomes also show an enrichment of recombination at promoter-like features [59,60] that appears to be Prdm9 independent [59]. Interestingly, all vertebrate species with a full-length PRDM9 show evidence of rapid evolution in its DNA-binding domain, as predicted by the Red Queen model [19]. Furthermore, ZCWPW1, which binds H3K4me3 and H3K36me3, appears to co-evolve with PRDM9 in vertebrates [56]. All these observations suggest that the function of PRDM9, as described in mammals, might be ancestral to all vertebrates, and that the partial or complete loss of Prdm9 leads to a reversion to the default mechanism of locating hotspots at gene promoters. However, it should be noted that, with the exception of mammals, current knowledge of PRDM9 function relies only on indirect evidence.
Furthermore, with the exception of mammals and snakes, fine-scale recombination landscapes have only been studied in animals lacking a functional PRDM9 (e.g., fruit flies [13], birds [22,25], three-spined stickleback [57,61], swordtail fish [19], lizards [62], and honeybees [15]). Thus, how PRDM9 function evolved, and in particular whether PRDM9 was ancestrally involved in regulating recombination hotspots or acquired this function more recently, remains to be explored. To address this question, we need to characterize the recombination landscapes in other nonmammalian taxa that harbor PRDM9 and determine whether their characteristics and dynamics are similar to those described in mammals.

To this aim, we investigated the putative function of PRDM9 in salmonids, a diverse family of teleost fishes in which a full-length Prdm9 has been found [19,56]. Genes that have been shown to co-evolve with Prdm9 (Zcwpw1, Zcwpw2, Tex15, and Fbxo47) are all present in salmonids [56]. Thus, the phylogenetic position of salmonids is ideal for testing the hypothesis of an ancestral PRDM9 role in regulating meiotic recombination in vertebrates. We used the large amount of genomic resources available in salmonids and also generated new data to test the role of PRDM9 in driving the location of recombination events in salmonids. Specifically, if the role of PRDM9 in salmonids were the same as in mammals, we would expect (i) the presence of recombination hotspots that are (ii) located away from promoters, (iii) enriched for H3K4me3 and H3K36me3, (iv) rapidly changing in location between closely related species and populations, and (v) associated with high diversity of the PRDM9 ZF domain. Importantly, salmonids have undergone 2 rounds of whole genome duplication (WGD) [63–65], offering the opportunity to investigate the impact of gene duplication (GD) on Prdm9 evolutionary dynamics.

To test these hypotheses, we first analyzed the functional conservation of the many duplicated Prdm9 copies across the phylogeny of salmonids. We then characterized the functional Prdm9 allelic diversity in Atlantic salmon and rainbow trout to assess the evolutionary dynamics of the ZF array. We also determined the meiotic DSB landscape in rainbow trout using chromatin immunoprecipitation (ChIP) of the recombinase DMC1 followed by sequencing, and compared it with the genomic landscapes of the H3K4me3 and H3K36me3 modifications. Lastly, we reconstructed linkage disequilibrium (LD)-based recombination landscapes in 5 populations from 3 different salmonid species to identify hotspots, test their association with genomic features, and measure their evolutionary stability. Our results provide a body of evidence supporting a role for PRDM9 as a determinant of recombination hotspots in salmonids.

Results

Duplication history and differential retention of Prdm9 paralogs in salmonids

The analysis of the genomes of 12 salmonid species and of northern pike and sea bass (used as outgroups) revealed multiple paralogous copies of the Prdm9 gene. These paralogs partly resulted from 2 rounds of WGD: the teleost-specific WGD that occurred approximately 320 Mya (referred to as Ts3R) [63,65] and a more recent WGD in the common ancestor of salmonids at approximately 90 Mya, after their divergence from pikes (referred to as Ss4R) [64]. Taking advantage of the known pairs of ohnologous chromosomes resulting from WGD in salmonids [66–68], we reconstructed the duplication history of Prdm9 paralogs by combining chromosome location information and phylogenetic inference. The number of Prdm9 paralogs detected per genome ranged from 6 copies in rainbow trout (Oncorhynchus mykiss), huchen (Hucho hucho), and European grayling (Thymallus thymallus), to 14 in lake whitefish (Coregonus clupeaformis). Conversely, we found only 3 copies in northern pike (Esox lucius). These paralogs clustered into 2 main groups that were previously identified as Prdm9α and Prdm9β and originated from the Ts3R WGD [19]. We found 2 additional subgroups among the Prdm9β copies (referred to as β1 and β2) that were conserved in the 12 salmonid species, but only 1 β copy in the outgroups (S1 Fig). The β paralogs contained a complete SET domain (but with mutations at the catalytic tyrosine residues) and a conserved ZF domain, but all lacked the KRAB and SSXRD domains, as previously described [19] (S1 Fig). The α sequences clustered into 2 well-supported groups of paralogs (named α1 and α2) that could each be subdivided into 2 groups of duplicated copies (designated as α1.1/α1.2 and α2.1/α2.2; Fig 1A). We found the sequence pairs β1/β2, α1.1/α1.2, and α2.1/α2.2 in 3 Ss4R ohnologous pairs, suggesting that they originated from the salmonid-specific WGD.
We observed an additional subdivision within the α1 group, with pairs of copies duplicated in tandem present in each pair of ohnologs (i.e., α1.1 a and b and α1.2 a and b; Fig 1A and S1 Table). These duplicated copies are found in almost all species, often having the same orientation (S2 Fig). Although no phylogenetic signal was associated with the a and b copies, probably due to ectopic recombination and gene conversion, these copies are likely to represent a segmental duplication (SD) that preceded the Ss4R WGD. Thus, at least 2 Prdm9 duplication events (i.e., one leading to α1/α2 and the other to α1.a/α1.b copies) occurred in addition to the WGD-linked duplications. To summarize, our results indicate that Prdm9α and β copies originated from the Ts3R WGD. After the divergence of the Esociformes (pike) and Salmoniformes lineages approximately 115 Mya, the α copy was duplicated onto another chromosome, generating α1 and α2 copies. The α1 copy was subsequently duplicated in tandem, producing α1.a and α1.b copies on the same chromosome. Lastly, all these copies were duplicated on ohnologous chromosome pairs following the Ss4R WGD. This consensus evolutionary history was accompanied by gene conversion events and lineage-specific duplications and losses that were not fully identified in our analysis (Fig 1B). Most of these gene copies only contained a subset of the 10 expected exons and/or showed signatures of pseudogenization (stop codons, frameshifts), but we also identified some complete Prdm9 genes, encoding the 4 canonical domains, with conserved catalytic tyrosines in the SET domain and without evidence of pseudogenization (Fig 1). In the α1 clade, we detected on average 3.8 paralogs per genome, but each species retained only 1 full-length copy (corresponding to the α1.a.1 paralog in Thymallus, Oncorhynchus, and Salvelinus, and to the α1.a.2 paralog in the 2 Salmo species), except in C. clupeaformis, where both α1.1 and α1.2 are full length.
Conversely, in the α2 clade, we detected a full-length copy in only 2 species (α2.2 in O. mykiss and in Salvelinus namaycush). Therefore, our results support the differential retention of functional Prdm9α1 paralogs between salmonid lineages following the Ss4R WGD.

Fig 1. Prdm9 duplication history in salmonids.

Fig 1

(A) Phylogenetic tree of Prdm9α paralogs in 12 salmonids and northern pike (Esox lucius) as outgroup species. Prdm9β is shown in S1 Fig. The phylogenetic tree was computed on the concatenated 6 exons of the 3 canonical PRDM9 domains KRAB, SSXRD, and SET, with 1,000 bootstrap replicates (values shown). The columns, from left to right, indicate the (i) species name; (ii) annotated paralog copy (in bold: full-length copy without pseudogenization); (iii) Prdm9 copy status. Prdm9α clusters into 2 main groups (α1 and α2), each divided into 2 subgroups (α1.1/α1.2 and α2.1/α2.2). The scale bar is in units of substitutions per site. The right panel shows the coding potential of each paralog, and indicates the presence of frame-shifting mutations or stop codons, and of substitutions in the catalytic tyrosines of the SET domain (Y276, Y341, and Y357). Canonical (full length) Prdm9 proteins contain 4 key domains: KRAB (encoded by 2 exons), SSXRD (encoded by 1 exon), SET (encoded by 3 exons), and the ZF array (encoded by 1 exon). Complete exons are shown in blue. Missing or truncated exons are shown in pink. Other regions of the protein (upstream of the KRAB domain, and between KRAB and SSXRD) are encoded by additional exons (not shown here), which are not conserved between α1 and α2 clades. Paralogs were classified as “canonical PRDM9” if they contained all exons encoding the 4 key domains, without any frameshift/non-sense mutation (at least up to the first ZF) [NB: some sequences contain frameshifts or non-sense mutations in the ZF array. This leads to a shortened ZF array, but does not necessarily impair the function of PRDM9]. Paralogs were classified as “likely non-functional” if they contained frameshifts or non-sense mutations, or if they missed at least 1 SET exon. Other cases were classified as “truncated.” The 3 last α copies, belonging to O. kisutch, O. tshawytscha, and O. gorbuscha, have lost the 3 domains KRAB, SSXRD, and SET, but have kept their ZF exons, and were therefore added below the phylogenetic tree. The last column indicates the sequence indexes referring to the S1 Table with additional information on the corresponding copy. (B) Consensus history of Prdm9 duplication events in salmonids. After the teleost-specific WGD (Ts3R WGD), the chromosomes of the common ancestor of teleosts were duplicated. Two ohnolog chromosomes arose from the one carrying the ancestral Prdm9 locus: one carrying the Prdm9α copy and the other the Prdm9β copy. GD of the α paralog (referred to as α1) led to the appearance of a new α copy (α2) on another chromosome. The α1 copy (becoming α1.a) then underwent an SD, generating an α1.b copy in tandem on the same chromosome. By this time, the β paralog had lost the KRAB and SSXRD domains. Lastly, the 4 copies were duplicated during the salmonid-specific Ss4R WGD, with the newly formed paralogs (annotated α1.a.2, α1.b.2, α2.2, β2) on ohnolog chromosomes. One full-length copy was retained in each species. The Salmo genus (S. trutta and S. salar) retained the α1.2 copy, whereas all other salmonids retained the α1.1 copy. A second full-length PRDM9 was also retained in C. clupeaformis (α1.2), O. mykiss (α2.2), and S. namaycush (α2.2). Ohnolog chromosomes are represented with similar color shades (i.e., blue, red, and green) and the Prdm9 locus in yellow. This global picture of the duplication events in the salmonid history does not show other independent lineage-specific duplication events and losses. The data and code underlying this figure can be found in https://doi.org/10.5281/zenodo.11083953. GD, gene duplication; WGD, whole genome duplication; ZF, zinc finger.
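The three-way paralog classification in the Fig 1 legend is effectively a small decision procedure. The Python sketch below encodes it under our own assumptions: the exon labels (KRAB1, KRAB2, SSXRD, SET1 to SET3, ZF) and the `Paralog` record are hypothetical, and for non-canonical copies we read "frameshifts or non-sense mutations" as occurring anywhere in the coding sequence. It is an illustration of the stated rules, not the authors' code.

```python
from dataclasses import dataclass

# Hypothetical exon labels for the 4 key domains listed in the Fig 1 legend:
# KRAB (2 exons), SSXRD (1 exon), SET (3 exons), ZF array (1 exon).
KEY_EXONS = frozenset({"KRAB1", "KRAB2", "SSXRD", "SET1", "SET2", "SET3", "ZF"})
SET_EXONS = frozenset({"SET1", "SET2", "SET3"})


@dataclass(frozen=True)
class Paralog:
    name: str
    complete_exons: frozenset          # subset of KEY_EXONS found complete
    disruption_before_first_zf: bool   # frameshift/nonsense upstream of the first ZF
    disruption_in_zf_array: bool       # frameshift/nonsense inside the ZF array


def classify(p: Paralog) -> str:
    """Apply the canonical / likely non-functional / truncated rules."""
    if KEY_EXONS <= p.complete_exons and not p.disruption_before_first_zf:
        # Disruptions inside the ZF array only shorten it, so they are tolerated.
        return "canonical PRDM9"
    has_disruption = p.disruption_before_first_zf or p.disruption_in_zf_array
    missing_set_exon = not SET_EXONS <= p.complete_exons
    if has_disruption or missing_set_exon:
        return "likely non-functional"
    return "truncated"


# A complete copy whose only frameshift lies inside the ZF array.
print(classify(Paralog("copy_A", KEY_EXONS, False, True)))  # canonical PRDM9
```

The canonical test is evaluated first, so a ZF-array disruption in an otherwise complete copy does not demote it, in line with the NB in the legend.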

High PRDM9 ZF array diversity in O. mykiss and S. salar

We analyzed the allelic diversity of the ZF array of the complete PRDM9α copy found in the Atlantic salmon S. salar (α1.a.2) and the rainbow trout O. mykiss (α1.a.1) (Fig 1). We identified 11 PRDM9 ZF alleles in 26 S. salar individuals and 7 alleles in 23 O. mykiss individuals (Fig 2A). The major allele had a frequency of 40% in S. salar and 35% in O. mykiss, and the 4 most frequent alleles had a cumulative frequency >80% in both species (Fig 2B). S. salar and O. mykiss alleles contained 5 to 10 and 7 to 15 ZFs, respectively. In both species, the last ZF of the arrays was probably not functional, because it lacked the conserved histidine involved in the interaction with a zinc ion required to stabilize the finger array (S3 Fig). As seen in other species [38], the 4 positions in contact with DNA (position −1, 2, 3, and 6 of the alpha helix) were highly variable among ZF units (Fig 2C). We characterized the proportion of total amino acid diversity at these DNA-binding residues among all different ZF units identified in each species following [19]. This proportion, which is sensitive to the rapid evolution at DNA-binding sites and to the homogenization at other amino acid positions due to concerted evolution between repeats within the array, was 0.49 in S. salar and 0.55 in O. mykiss (Fig 2C). These values were within the range reported for full-length PRDM9α in vertebrates [19]. The observed high level of allelic diversity and the pattern of amino acid diversity within the ZFs were consistent with the rapid and concerted evolution of the ZF array of the full-length Prdm9 gene that characterizes PRDM9 copies involved in specifying meiotic recombination sites [19,54].
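The ratio r of amino acid diversity at DNA-binding residues can be illustrated with a short Python sketch. It assumes the unique ZF units are already aligned to a common length and measures per-position diversity as Simpson-type diversity (1 minus the sum of squared residue frequencies); the measure used in [19] may differ. The function names are hypothetical, and the 0-based indices corresponding to helix positions −1, 2, 3, and 6 must be supplied by the caller.

```python
from collections import Counter


def position_diversity(column):
    """Simpson-type diversity (1 - sum of squared frequencies) of one column."""
    counts = Counter(column)
    n = len(column)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def dna_binding_diversity_ratio(unique_zfs, binding_positions):
    """Share of the total per-position diversity found at DNA-binding positions.

    unique_zfs: equal-length amino acid strings, one per unique ZF unit.
    binding_positions: 0-based indices of the DNA-contacting residues.
    """
    length = len(unique_zfs[0])
    if any(len(zf) != length for zf in unique_zfs):
        raise ValueError("ZF units must be aligned to the same length")
    diversity = [position_diversity([zf[i] for zf in unique_zfs])
                 for i in range(length)]
    total = sum(diversity)
    if total == 0:
        raise ValueError("no amino acid diversity among the ZF units")
    return sum(diversity[i] for i in binding_positions) / total
```

A value close to 1 means that variation among repeats is confined to the DNA-contacting residues, the pattern expected under rapid evolution of binding specificity combined with homogenization of the other positions.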

Fig 2. Zinc finger allelic diversity of full-length PRDM9 in S. salar and O. mykiss.

Fig 2

(A) Structure of the identified PRDM9 alleles in S. salar PRDM9 α1.a.2 and O. mykiss PRDM9 α1.a.1. Colored boxes represent unique ZFs, characterized by the 3 amino acids in contact with DNA (3-letter code). Additional variations relative to the reference sequence are indicated in brackets. The complete ZF amino acid sequences are shown in S3 Fig. (B) Frequencies of the alleles displayed in panel A among the 26 S. salar and 23 O. mykiss individuals in which Prdm9 was genotyped. (C) Distribution of amino acid diversity among all unique ZFs found in the alleles shown in panel A, following a previously described methodology [19]. The amino acid diversity is plotted as a function of the amino acid position in the ZF array, from position 1 to position 28 (first and last residues) of a ZF. The ratio of amino acid diversity at the DNA-binding residues of the ZF array (−1, 2, 3, and 6), indicated as r, is shown in the upper box of each panel. The data underlying this figure can be found in https://doi.org/10.5281/zenodo.11083953 and in S7 Table. ZF, zinc finger.

In addition to the full-length α1 copy, we observed that the α2.2 paralog is also strongly expressed in testes, both in Oncorhynchus and in Salmo genera. This paralog is full length in O. mykiss and S. namaycush, but in all other salmonids the KRAB domain of α2.2 is missing or pseudogenized (Figs 1 and S4). This phylogenetic pattern implies that α2.2 lost its KRAB domain several times independently, in different lineages. In S. salar, the allelic diversity of the ZF array in the truncated Prdm9α2.2 was very low: in the 20 individuals analyzed, we observed a single allele, whose array had 5 ZF units (S3 and S5A Figs). This is consistent with the hypothesis that KRAB-less PRDM9 homologs lost the capacity to trigger recombination hotspots, and therefore are no longer subject to the Red Queen dynamics [19]. The proportion of amino acid diversity at DNA-binding residues is relatively high among the 5 ZFs of this unique PRDM9 α2.2 allele (r = 0.471). The persistence of this signature of positive selection suggests that the functional shift associated with the loss of the KRAB domain is relatively recent. In O. mykiss, where PRDM9 α2.2 is full length, we identified 5 α2.2 alleles in 20 individuals, with 6 to 12 ZFs (S5A and S5B Fig). Some ZFs lost 1 amino acid, with unknown consequence on their DNA binding capacity (S3 Fig). In O. mykiss, the proportion of amino acid diversity at DNA-binding residues was lower in PRDM9 α2.2 (r = 0.367, S5C Fig) than in PRDM9 α1.a.1 (r = 0.552, Fig 2C). This observation, together with the relatively limited allelic diversity, suggests that O. mykiss PRDM9 α2.2 is no longer subject to the Red Queen dynamics, and hence that it has lost its function of directing recombination, like the KRAB-less α2.2 paralogs in other salmonids.

PRDM9 specifies meiotic DSB hotspots in O. mykiss

To directly assess whether the full-length PRDM9α copy (hereafter PRDM9 unless otherwise specified) determines the localization of DSB hotspots, we investigated the genome-wide distribution of DMC1-bound ssDNA in O. mykiss testes by DMC1-SSDS (Fig 3A). DMC1 is a meiosis-specific recombinase that binds to ssDNA 3′ tails resulting from DSB resection. Therefore, meiotic DSB hotspots can be mapped by identifying fragment-enriched regions (i.e., peaks) in DMC1-SSDS data [30,33,69]. We detected between 209 and 1,924 peaks in each of the 3 rainbow trout individuals analyzed by DMC1-SSDS (616 peaks in TAC-1, 209 in TAC-3, and 1,924 in RT-52). Differences in peak number may result from inter-sample differences in cell composition related to the testis developmental stage (see S1 Methods). In all 3 individuals, the DMC1-SSDS signal at DSB hotspots displayed a characteristic asymmetric pattern in which forward and reverse strand reads were shifted toward the left and the right of the hotspot center, respectively. This confirmed that the DMC1-SSDS peaks detected in rainbow trout were genuine meiotic DSB hotspots [30] (S6A Fig). The average width of DMC1-SSDS peaks was 1.5 to 2.5 kb, which is similar to what has been described in mice and humans [30,33]. The DSB hotspot density increased towards the chromosome ends, indicating that the U-shaped distribution of COs classically observed in male salmonids [70] results, at least in part, from a mechanism controlling DSB formation (S7A Fig).
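The strand asymmetry used above to validate DMC1-SSDS peaks can be expressed as a simple check. The Python sketch below is a hypothetical, simplified criterion, not the analysis behind S6A Fig (which compares strand-specific signal profiles): it takes the mapped position and strand of each read in a peak and asks whether forward reads lie, on median, left of the peak center and reverse reads right of it.

```python
from statistics import median


def ssds_strand_offsets(read_positions, strands, center):
    """Median offsets (bp) of forward and reverse reads from the peak center."""
    fwd = [pos - center for pos, s in zip(read_positions, strands) if s == "+"]
    rev = [pos - center for pos, s in zip(read_positions, strands) if s == "-"]
    if not fwd or not rev:
        return None  # asymmetry cannot be assessed without both strands
    return median(fwd), median(rev)


def looks_like_dsb_hotspot(read_positions, strands, center):
    """Hypothetical check: forward reads left of center, reverse reads right of it."""
    offsets = ssds_strand_offsets(read_positions, strands, center)
    if offsets is None:
        return False
    fwd_offset, rev_offset = offsets
    return fwd_offset < 0 < rev_offset
```

Peaks where both strands pile up on the same side, or symmetrically on top of each other, fail this check, which is the kind of pattern used later to flag possibly artifactual shared peaks.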

Fig 3. Meiotic DSB hotspots are specified by full length PRDM9 in O. mykiss.

Fig 3

(A) DSB hotspots detected by DMC1-SSDS (DMC1), H3K4me3 and H3K36me3 in selected regions of the O. mykiss genome in testes from 2 or 3 (DMC1) individuals. (B) Average profile of H3K4me3 (red) and H3K36me3 (blue) ChIP-seq signal in TAC-1 (Prdm9 alleles 1/5) and TAC-3 (Prdm9 alleles 2/6) testes, at DSB hotspots detected in TAC-1 (Prdm9 alleles 1/5), TAC-3 (Prdm9 alleles 2/6), and RT-52 (Prdm9 alleles 1/2). (C) On top, the PRDM9 allele 1 (E = 5.1e-37) and allele 2 motifs (E = 1.2e-63) discovered in allele 1 (n = 300) and allele 2 DSB sites (n = 254) are shown. Below, the plots depict the distribution of hits for the PRDM9 allele 1 (left) and allele 2 (right) motifs at allele 1 and allele 2 DSB sites from the center of the sequence up to 2.5 kb of distance. The signal is smoothed by weighted moving average, and hits were calculated in a 250 bp sliding window. (D) Violin plot showing the distribution of DSB hotspots from TAC-1 (magenta), TAC-3 (green), and RT-52 (blue) relative to the TSS from RefSeq annotated genes. The data and code underlying this figure can be found in https://doi.org/10.5281/zenodo.11083953 and https://zenodo.org/records/14198863. ChIP, chromatin immunoprecipitation; DSB, double-strand break; TSS, transcription start site.
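The motif-hit profiles in Fig 3C combine a 250 bp sliding-window count with a weighted moving average. The Python sketch below shows one way to do this; the 50 bp step and the triangular weights (1, 2, 3, 2, 1) are our assumptions, since the legend does not state them, and the function names are hypothetical. Offsets are signed distances (bp) of motif hits from the center of each DSB site.

```python
def sliding_window_counts(offsets, half_span=2500, window=250, step=50):
    """Count motif hits in windows of `window` bp centred every `step` bp."""
    centers = list(range(-half_span, half_span + 1, step))
    half_w = window / 2
    counts = [sum(1 for x in offsets if c - half_w <= x < c + half_w)
              for c in centers]
    return centers, counts


def weighted_moving_average(values, weights=(1, 2, 3, 2, 1)):
    """Smooth with symmetric weights, renormalising where the kernel is cut off."""
    k = len(weights) // 2
    smoothed = []
    for i in range(len(values)):
        num = den = 0.0
        for j, w in enumerate(weights):
            idx = i + j - k
            if 0 <= idx < len(values):
                num += w * values[idx]
                den += w
        smoothed.append(num / den)
    return smoothed


centers, counts = sliding_window_counts([0, 10, -10, 2000])
profile = weighted_moving_average(counts)
```

A PRDM9-specified hotspot set would be expected to show a peak of the matching allele's motif hits near offset 0 and a flat profile for the other allele's motif.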

Then, we tested whether the DSB hotspot formation was PRDM9-dependent by assessing the hotspot association with (i) specific Prdm9 alleles; and (ii) sites enriched for both H3K4me3 and H3K36me3 due to PRDM9 methyltransferase activity [38,71]. The 3 individuals analyzed (only TAC-1 and TAC-3 for histone modifications) carried a functional Prdm9 (i.e., Prdm9α1.a.1) with different genotypes. TAC-1 (Prdm9 alleles 1/5) and TAC-3 (Prdm9 alleles 2/6) did not share any Prdm9 allele, whereas RT-52 (Prdm9 alleles 1/2) shared 1 allele with each of them. In line with the hypothesis that PRDM9 specifies DSB hotspots, some DMC1-SSDS peaks were common to RT-52 and either TAC-1 or TAC-3 (see Fig 3A for examples). Specifically, the overlap between TAC-1 and RT-52 DSB hotspots (167 of the 616 TAC-1 hotspots, 27%), and between TAC-3 and RT-52 DSB hotspots (42 of the 209 TAC-3 hotspots, 20%) was substantial, whereas only 2 hotspots were shared by all 3 individuals (S6B Fig). The 55 DMC1-SSDS peaks shared by TAC-1 and TAC-3 may be artifactual because their forward and reverse strand enrichment distribution did not follow the typical asymmetric pattern of DSB hotspots, in contrast to the overlapping hotspots between TAC-1 and RT-52 and between TAC-3 and RT-52 (S7B Fig). The histone modifications H3K4me3 and H3K36me3 usually do not colocalize at the same loci because H3K4me3 is enriched at promoters and other genomic functional elements, whereas H3K36me3 is enriched within gene bodies. Indeed, at the peaks of H3K4me3 detected in brain tissue, where Prdm9 is not expressed, no H3K36me3 enrichment was detected (S8A Fig). However, at the DSB hotspots mapped in TAC-1, an enrichment for H3K4me3 and H3K36me3 was detected in testis chromatin from TAC-1 but not from TAC-3 (Fig 3B, left panels) and reciprocally for the DSB hotspots mapped in TAC-3 (Fig 3B, central panels, S8B and S8C Fig).
These observations are consistent with the PRDM9-dependent deposition of these histone modifications, as TAC-1 and TAC-3 carry distinct Prdm9 alleles. At the hotspots mapped in RT-52, an enrichment for H3K4me3 and H3K36me3 was detected in testis chromatin from TAC-1 or from TAC-3 (Fig 3B, right panels, S8B and S8C Fig), which is consistent with the presence of common Prdm9 alleles between RT-52 and TAC-1 and between RT-52 and TAC-3. In addition, the RT-52 hotspots overlapping with TAC-1 are expected to be distinct from those overlapping with TAC-3, and specified by Prdm9 allele 1 and allele 2, respectively. Indeed, the majority of RT-52 DSB hotspots were enriched for H3K4me3 in testis chromatin from either TAC-1 or TAC-3, but not both (S9B Fig). A similar effect could not be established for H3K36me3 because of the high level of PRDM9-independent H3K36me3 at a fraction of the sites (S9A and S9B Fig).
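The shared-hotspot fractions reported above (for instance 167 of 616 TAC-1 hotspots, about 27%) come down to counting intervals in one set that overlap any interval in another. The Python sketch below is a minimal version under stated assumptions: intervals are half-open `(start, end)` pairs on a single chromosome, reference intervals do not overlap each other, and the function names are hypothetical rather than taken from the authors' pipeline.

```python
from bisect import bisect_left


def count_overlapping(query, reference):
    """Number of query intervals overlapping at least one reference interval.

    Intervals are half-open (start, end) pairs on one chromosome; reference
    intervals are assumed not to overlap each other.
    """
    ref = sorted(reference)
    starts = [s for s, _ in ref]
    n = 0
    for qs, qe in query:
        i = bisect_left(starts, qe)  # number of reference intervals starting before qe
        # With sorted, non-overlapping references, ref[i - 1] has the largest
        # end among those candidates, so it alone decides whether one reaches past qs.
        if i > 0 and ref[i - 1][1] > qs:
            n += 1
    return n


def overlap_fraction(query, reference):
    """Fraction of query intervals that overlap the reference set."""
    return count_overlapping(query, reference) / len(query)
```

For genome-wide data, the same function would be applied chromosome by chromosome. Binary search keeps the cost at O((n + m) log m) instead of comparing every pair of peaks.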

Population genomic landscapes of recombination

The DMC1-SSDS approach allows DSB distribution to be analyzed in individual males, but is therefore restricted to one sex and does not provide information on the outcome of recombination events (CO or NCO). To get a more general picture of the genome-wide recombination landscapes and their evolution, we computed LD-based genetic maps in 3 salmonid species: coho salmon (O. kisutch), rainbow trout (O. mykiss), and Atlantic salmon (S. salar). In S. salar, we analyzed 3 populations: North Sea (NS), Barents Sea (BS), and Gaspesie Peninsula (GP). For comparison, we also reconstructed the LD-based recombination map of European sea bass (Dicentrarchus labrax), which carries the KRAB-less Prdm9β gene but lacks a full-length Prdm9α.

The population-scaled recombination landscapes showed consistent broad-scale characteristics between O. kisutch, O. mykiss, and the 3 S. salar populations. The genome-wide population recombination rate ranged from 0.0032 (in units of ρ = 4Ner per bp, where Ne is the effective population size and r the per-generation recombination rate) in O. kisutch to 0.012 in O. mykiss, with intermediate values in S. salar populations (Table 1). At the intra-chromosomal level, recombination landscapes smoothed over 100 kb windows showed a general increase towards the chromosome ends, up to 6-fold in S. salar (S10 Fig). This U-shaped pattern mirrored the chromosomal distribution of DSB hotspots in male rainbow trout (S7A Fig).
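LD-based maps give ρ per bp for each interval between adjacent SNPs, and the broad-scale landscapes average these values over 100 kb windows. The Python sketch below assumes a length-weighted mean within each window, which is one reasonable choice but not necessarily the smoothing used for S10 Fig; the function name and input layout are hypothetical.

```python
def smooth_rho(intervals, window=100_000, chrom_length=None):
    """Length-weighted mean rho per bp in consecutive fixed-size windows.

    intervals: (start, end, rho_per_bp) tuples for SNP intervals on one
    chromosome, half-open. Windows with no data are returned as None.
    """
    if chrom_length is None:
        chrom_length = max(end for _, end, _ in intervals)
    n_windows = -(-chrom_length // window)  # ceiling division
    weighted = [0.0] * n_windows
    covered = [0] * n_windows
    for start, end, rho in intervals:
        w = start // window
        # Spread each SNP interval over every window it overlaps.
        while w < n_windows and w * window < end:
            lo = max(start, w * window)
            hi = min(end, (w + 1) * window)
            if hi > lo:
                weighted[w] += rho * (hi - lo)
                covered[w] += hi - lo
            w += 1
    return [weighted[i] / covered[i] if covered[i] else None
            for i in range(n_windows)]
```

Comparing the values of the terminal windows with those of central windows then gives the kind of fold increase towards chromosome ends described above.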

Table 1. Summary of fine-scale recombination rate variations in 2 kb windows, hotspot detection, effective population size (Ne), recombination rate obtained from pedigree-based sex-averaged genetic maps [67,75–77], and mutation-to-recombination rate ratio (μ/r) for populations of O. kisutch, O. mykiss, S. salar (only the NS population is shown), and D. labrax.

Ne ranges were estimated based on the mean nucleotide diversity measured in population resequencing datasets and mutation rates reported in fish and human (see Methods for details). μ/r ratio ranges were calculated using r obtained from pedigree-based genetic maps.

Statistic | O. kisutch | O. mykiss | S. salar (NS) | D. labrax
Genome-wide population recombination rate, ρ (per bp) | 0.0032 | 0.012 | 0.0085 | 0.039
Cumulative amount of recombination in the 20% most recombining regions | 90.1% | 89.1% | 98.1% | 84.6%
Number of hotspots | 22,948 | 21,145 | 17,064 | 7,897
Fraction of recombination in hotspots | 36.7% | 19.3% | 18.3% | 26.5%
Fraction of the genome occupied by hotspots | 2.7% | 2.1% | 1.3% | 1.9%
Hotspot density (per Mb) | 13.6 | 10.8 | 6.8 | 9.6
Ne | [28,220–141,099] | [80,302–401,512] | [18,051–90,254] | [22,600–113,000]
r (cM/Mb) | 2.24 | 1.31 | 1.99 | 2.77
μ/r | [0.09–0.45] | [0.15–0.76] | [0.1–0.5] | [0.07–0.36]
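The Ne and μ/r columns rest on 2 standard relations: the neutral expectation π = 4Neμ, and the conversion of a genetic map rate in cM/Mb to a per-bp rate (1 cM = 0.01 Morgan and 1 Mb = 10^6 bp, so 1 cM/Mb = 10^-8 per bp). The sketch below shows that arithmetic. The mutation rate in the example is an illustrative value only; the rates actually used are given in Methods.

```python
def r_per_bp(cm_per_mb):
    """Convert a map rate in cM/Mb to a per-bp, per-generation rate.

    1 cM = 0.01 Morgan and 1 Mb = 1e6 bp, so 1 cM/Mb = 1e-8 per bp.
    """
    return cm_per_mb * 1e-8


def ne_from_diversity(pi, mu):
    """Neutral expectation pi = 4 * Ne * mu, solved for Ne."""
    return pi / (4 * mu)


def mu_over_r(mu, cm_per_mb):
    """Mutation-to-recombination rate ratio, with r taken from a genetic map."""
    return mu / r_per_bp(cm_per_mb)


# Illustrative mutation rate only (not a value stated in this section).
mu = 1e-8
print(round(mu_over_r(mu, 2.24), 2))  # O. kisutch map rate from Table 1 -> 0.45
```

Ne scales with 1/μ and μ/r scales with μ. A range of mutation rates spanning a factor k therefore gives Ne and μ/r ranges that each span the same factor k. The roughly 5-fold ranges in Table 1 are consistent with this.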

The fine-scale analysis of the genomic landscapes also revealed highly heterogeneous recombination rates among 2 kb windows (Table 1 and S11 Fig). In each population, local recombination rates varied over several orders of magnitude (S2 Table). On average, 90% of the total recombination was concentrated in 20% of the genome. This is a stronger concentration than observed in humans and chimpanzees [29,72] and slightly stronger than what we observed in sea bass (Tables 1 and S2 and S12 Fig). This heterogeneity was largely driven by the presence of recombination hotspots. Based on the raw LD maps reconstructed at each SNP interval, we confirmed that most (>80% on average) salmonid hotspots were <2 kb in size (S13 Fig and S2 Table). We therefore performed the rest of our analysis using the hotspots called within 2 kb windows. The total number of called hotspots per species ranged from 17,064 in S. salar to 22,948 in O. kisutch, with hotspot densities similar to those in sea bass and also in humans, mice, and snakes [60,72,73]. The proportion of total recombination accumulated in hotspots ranged from 17% in S. salar to 36% in O. kisutch, although hotspots occupied less than 3% of the genome (Table 1).
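The 2 concentration statistics used above can be computed from per-window rates: the share of total recombination carried by the 20% most recombining windows, and the share falling in windows called as hotspots. A minimal sketch, assuming equal-sized windows (so that a window's rate is proportional to its amount of recombination); function names and toy values are hypothetical.

```python
def top_fraction_share(window_rates, top=0.2):
    """Share of total recombination carried by the `top` fraction of windows
    with the highest rates (assumes equal-sized windows, e.g. 2 kb)."""
    ranked = sorted(window_rates, reverse=True)
    k = max(1, round(len(ranked) * top))
    return sum(ranked[:k]) / sum(ranked)


def hotspot_share(window_rates, is_hotspot):
    """Share of total recombination falling in windows flagged as hotspots."""
    in_hotspots = sum(r for r, h in zip(window_rates, is_hotspot) if h)
    return in_hotspots / sum(window_rates)


rates = [10, 5, 1, 1, 1, 1, 0.5, 0.5, 0, 0]
flags = [True] + [False] * 9
print(top_fraction_share(rates))    # -> 0.75
print(hotspot_share(rates, flags))  # -> 0.5
```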

We then compared the LD-based recombination landscape of O. mykiss with the location of DSB hotspots mapped by DMC1-SSDS (pooling peaks from the 3 samples). We found that 6.7% of DMC1-SSDS peaks overlapped with LD-based hotspots, which is more than expected by chance (S14A and S14B Fig). This weak overlap was comparable with that observed in Mus musculus castaneus, where 12% of DSB hotspots overlap with LD-based hotspots [74]. We also found that population recombination rates were significantly higher at these shared peaks than at non-shared LD-based or DSB hotspots and in the rest of the background landscape (Kruskal–Wallis test, p < 0.05; Wilcoxon post hoc tests, p < 0.05; S14C Fig).
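The overlap fraction and its comparison with chance can be sketched as follows. The null model actually used is described with S14 Fig. The circular shift here is a common alternative, chosen only for illustration: all peaks are moved by the same random offset, which preserves their spacing. Function names are hypothetical, and the naive scan would be replaced by an interval tree or a tool such as bedtools for genome-scale data.

```python
import random


def overlaps(a, b):
    """True if half-open intervals a = (start, end) and b = (start, end) overlap."""
    return a[0] < b[1] and b[0] < a[1]


def fraction_overlapping(peaks, hotspots):
    """Fraction of peaks overlapping at least one hotspot (naive O(n*m) scan)."""
    hits = sum(any(overlaps(p, h) for h in hotspots) for p in peaks)
    return hits / len(peaks)


def circular_shift_null(peaks, hotspots, chrom_length, n_perm=100, seed=1):
    """Null overlap fractions from shifting all peaks by one random offset
    around a circularized chromosome. Shifted peaks may overhang the
    chromosome end (a simplification)."""
    rng = random.Random(seed)
    null = []
    for _ in range(n_perm):
        shift = rng.randrange(chrom_length)
        moved = []
        for start, end in peaks:
            new_start = (start + shift) % chrom_length
            moved.append((new_start, new_start + (end - start)))
        null.append(fraction_overlapping(moved, hotspots))
    return null


peaks = [(0, 10), (100, 110)]
hotspots = [(5, 15)]
observed = fraction_overlapping(peaks, hotspots)  # -> 0.5
null = circular_shift_null(peaks, hotspots, chrom_length=1_000)
empirical_p = (1 + sum(x >= observed for x in null)) / (1 + len(null))
```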

Recombination hotspots are located away from TSSs

In species that lack full-length PRDM9, recombination hotspots are expected to be located in open-chromatin regions, such as unmethylated CGI-associated promoters and/or constitutive H3K4me3 sites [18,19,22,25,27]. This differs from species such as mice, where PRDM9 targets regions away from these genomic elements [30]. To test whether PRDM9α plays a similar role in salmonids, we first examined how DSB hotspots were distributed relative to TSSs in rainbow trout. The percentage of DSB hotspots overlapping with TSSs was either not different from or lower than expected by chance (4.5% and 5.3% versus 7.6% for TSSs of coding and non-coding genes; S3 Table and S6C Fig). Moreover, the vast majority of DSB hotspots mapped several kb or more away from the closest TSS (Fig 3D). Therefore, DSB hotspots, at least those strong enough to be detected by our DMC1-SSDS assay, did not localize at TSSs.
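Distances such as those in Fig 3D can be obtained by binary search over sorted TSS coordinates. A minimal, strand-agnostic sketch, assuming hotspot centers and TSS positions on the same chromosome; the function name and toy coordinates are hypothetical.

```python
from bisect import bisect_left


def nearest_tss_distance(position, tss_sorted):
    """Absolute distance (bp) from `position` (e.g. a hotspot center) to the
    closest TSS on the same chromosome. `tss_sorted` must be sorted ascending
    and non-empty; strand is ignored."""
    i = bisect_left(tss_sorted, position)
    candidates = []
    if i < len(tss_sorted):
        candidates.append(tss_sorted[i] - position)  # closest TSS at a higher coordinate
    if i > 0:
        candidates.append(position - tss_sorted[i - 1])  # closest TSS at a lower coordinate
    return min(candidates)


tss = [1_000, 5_000, 20_000]
hotspot_centers = [4_200, 12_000, 25_000]
distances = [nearest_tss_distance(c, tss) for c in hotspot_centers]
# distances == [800, 7000, 5000]
```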

We then examined how population recombination rates were distributed relative to TSSs that did or did not overlap with CGIs, comparing the 3 salmonid species with sea bass, which only has a truncated PRDM9β protein. The criteria classically used to predict CGIs in mammals and birds are not appropriate for teleost fish, where CGIs are CpG-rich but have a low GC content [78,79]. We could nevertheless predict TSS-associated CGIs in fish genomes based simply on their CpG content (see S1 Analysis). Sea bass (truncated PRDM9β) showed a high level of recombination at promoter regions, with a strong 3-fold enrichment of recombination at TSSs associated with CGIs (Fig 4A), as reported in birds [25]. Conversely, in salmonid species (full-length PRDM9), recombination rate varied little between TSSs and their flanking regions (at most 1.2-fold enrichment). In particular, recombination rate tended to be lower at CGI-associated TSSs than at other TSSs (Fig 4A and 4B). Moreover, hotspots overlapping with TSSs represented <5% of all hotspots in the 3 salmonid populations, versus up to 21% in sea bass (Fig 4C). The analysis of other genomic features showed little variation in recombination rate and hotspot density: levels in genes, introns, exons, TEs, and CGIs were similar to those in intergenic regions (Fig 4B and 4C). We observed only a very small increase in recombination rate at TSSs that did not overlap with CGIs and at TESs in O. kisutch and O. mykiss. Therefore, our results indicate that, as already shown in primates and in the mouse, salmonid recombination events do not concentrate at promoter-like features overlapping with CGIs.

Fig 4. Recombination rates at genomic features.


The recombination rates at different genomic features are shown for O. kisutch, O. mykiss, and S. salar (NS population), and compared to those of sea bass (D. labrax) that lacks a full-length PRDM9 copy. (A) Fold recombination rates (scaled to the average recombination rate at 50 kb from the nearest feature) according to the distance to the nearest TSS (overlapping or not with a CGI). (B) Fold recombination rates (scaled to the average recombination rates in intergenic regions) at the indicated genomic features. (C) Hotspot density at the indicated genomic features. TSS in and out CGI are shown in purple and blue, respectively. The data and codes underlying this figure can be found in https://doi.org/10.5281/zenodo.11083953. NS, North Sea; TSS, transcription start site.
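Panel A expresses recombination as a fold change relative to the mean rate 50 kb from the nearest feature. A minimal sketch of that normalization follows. It assumes that absolute distances from each window to its nearest TSS are already computed. The 1 kb distance bin and the function name are hypothetical choices, not the binning used for the figure.

```python
from collections import defaultdict
from statistics import mean


def fold_profile(distances, rates, bin_size=1_000, ref_distance=50_000):
    """Mean rho per distance bin, divided by the mean rho in the bin that
    contains `ref_distance` (50 kb from the nearest feature)."""
    bins = defaultdict(list)
    for d, rho in zip(distances, rates):
        bins[d // bin_size].append(rho)
    ref_bin = ref_distance // bin_size
    if ref_bin not in bins:
        raise ValueError("no windows fall at the reference distance")
    baseline = mean(bins[ref_bin])
    return {b * bin_size: mean(v) / baseline for b, v in sorted(bins.items())}


profile = fold_profile([0, 500, 50_000, 50_200], [3, 3, 1, 1])
# profile == {0: 3.0, 50000: 1.0}
```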

We also examined other genomic correlates and features that might influence population recombination rate variation at different levels of resolution. The local effective population size (Ne) affects both nucleotide diversity and population recombination rate. As expected from this joint effect, SNP density was positively correlated with the average ρ at the 100 kb scale, although this trend was not significant in O. mykiss (S15A Fig). More locally, we also observed an increase in SNP density in the 10 kb surrounding recombination hotspots (S16A Fig). These positive relationships could be amplified by 2 processes: a direct mutagenic effect of recombination during DSB repair, and a more pronounced erosion of neutral diversity in low-recombining regions due to linked selection [29,33,80–82]. However, we cannot exclude the possibility that the accuracy of recombination rate estimates depends on SNP density [83], leading to possible confounding effects.
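The correlation statistic behind S15A Fig is not specified in this section. As an assumption, the sketch below uses Spearman's rank correlation, the statistic reported for S21 Fig, implemented with average ranks for ties. It could relate SNP density and mean ρ across 100 kb windows. Function names are hypothetical, and the inputs must not be constant, because the correlation is undefined when a variance is zero.

```python
def ranks(values):
    """1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman's rho = Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


snp_density = [1.2, 3.4, 2.2, 5.0]   # toy values per 100 kb window
mean_rho = [0.001, 0.004, 0.002, 0.009]
print(spearman(snp_density, mean_rho))  # -> 1.0
```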

In mammals, GC-biased gene conversion causes an increase in GC content at recombination hotspots [33,40,84,85]. Conversely, in 4 of the 5 salmonid populations analyzed (all except GP), GC content tended to decrease close to hotspots (S16B Fig). At a larger scale (i.e., 100 kb), we observed significant positive correlations between GC content and recombination rates (S15B Fig). However, these correlations were very weak, suggesting that GC-biased gene conversion has a much smaller impact in salmonids than in mammals and birds [85].

The salmonid genomes contain a high density of TEs (covering approximately 50% of the genome), among which Tc1-mariner is the most abundant superfamily (>10% of TEs) [66]. It is not known whether Tc1-mariner transposons influence the estimation of recombination rates. Our TE analysis showed that interspersed repeats covered between 47.37% (O. kisutch) and 52.26% (S. salar) of the genome, and that 12.48% to 14.7% of the genome was occupied by Tc1-mariner elements (S4 Table). TEs and intergenic regions showed similar average recombination rates and hotspot density (Fig 4B and 4C). At the larger scale, recombination rates tended to increase slightly with TE density, except in O. mykiss, where we observed the opposite relationship (S15C Fig). TE superfamily had no strong effect (S17 Fig). Overall, TEs, including Tc1-mariner elements, did not show extreme recombination values that might have biased our recombination rate estimates.

Lastly, residual tetrasomy resulting from the salmonid WGD event approximately 90 Mya (Ss4R) [64,66] persists in several chromosome regions characterized by increased genomic similarity between ohnologs. This could also affect the inference of LD-based recombination rates. Such regions have been identified in O. kisutch, O. mykiss, and S. salar [66–68]. We attempted to filter out non-diploid allelic variation from chromosomes showing residual tetrasomy. We also controlled for its effect by comparing the recombination patterns of these chromosomes with those of fully re-diploidized chromosomes. Overall, the mean recombination rate was less than 2-fold higher in chromosomes containing tetraploid regions (S18A Fig). This difference was mostly explained by a local increase towards the ends of chromosomes with residual tetraploidy, an effect that was especially pronounced in O. mykiss (S18B Fig). Nevertheless, recombination rates varied similarly as a function of distance to the nearest promoter-like feature in the 2 chromosome sets, and rate variations were similar among genomic features (S19 Fig). Overall, chromosomes containing regions with residual tetraploidy and re-diploidized chromosomes showed similar recombination patterns.

Rapid evolution of recombination landscapes

Another key feature of the mammalian system is the rapid evolution of PRDM9-directed recombination landscapes due to self-induced erosion of its binding DNA motif and rapid PRDM9 ZF evolution [29,32,34]. To determine whether this feature was present also in salmonids, we compared the location of recombination hotspots in the 2 Oncorhynchus species and in 2 geographical lineages and 2 closely related populations of S. salar. We estimated that only 6.2% of hotspots (n = 1,298) were shared by O. kisutch and O. mykiss, which diverged from their common ancestor about 16 Myr ago [86]. Although this value was significantly higher than expected by chance (S20A Fig), there was almost no increase in recombination rate at the orthologous positions of hotspots in the 2 species (Fig 5A). Similarly, the 2 genetically differentiated lineages of S. salar only shared 10.3% (GP versus BS, FST = 0.26, n = 1,793) and 11.2% (GP versus NS, FST = 0.28, n = 1,671) of their hotspots, with a weak recombination rate increase at the alternate lineage hotspots (Figs 5B, S20B and S20C). Conversely, the 2 closely related BS and NS S. salar populations (FST = 0.02) shared 26.3% of their hotspots (n = 4,421), which was much more than expected by chance (Figs 5C and S20D). In addition, recombination rate at NS hotspots in the BS population showed a 5-fold increase (and reciprocally), reflecting high correlation between BS and NS recombination landscapes (Spearman’s rank coefficient >0.7, p-value <0.05; S21 Fig). Overall, these analyses revealed a rapid evolution of hotspot localization between species and also between geographical lineages of the same species. Only closely related populations shared a substantial fraction of their hotspots. This overlap probably reflects their similar genetic background (low FST), and in particular, the fact that they may share similar sets of Prdm9 alleles recognizing common binding DNA motifs.
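As stated in the Fig 5 legend, the percentages of shared hotspots use the taxon with fewer hotspots as the denominator. In the minimal sketch below, each hotspot is represented as a hypothetical (chromosome, 2 kb window index) key already projected onto a common coordinate system. For the between-species comparison, this requires an orthology or liftover step that is not shown.

```python
def shared_percentage(hotspots_a, hotspots_b):
    """Percentage of hotspots shared by two taxa, using the taxon with fewer
    hotspots as the denominator.

    Each hotspot is a (chromosome, window_index) key for a 2 kb window,
    assumed to be expressed in a common coordinate system.
    """
    a, b = set(hotspots_a), set(hotspots_b)
    shared = a & b
    return 100 * len(shared) / min(len(a), len(b))


pop_a = {("1", 1), ("1", 2), ("1", 3), ("2", 7)}
pop_b = {("1", 2), ("2", 8), ("3", 1)}
print(round(shared_percentage(pop_a, pop_b), 1))  # -> 33.3
```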

Fig 5. Recombination hotspots shared between populations and motif enrichment.


In panels (A–C), the Venn diagrams (left) show the percentages of recombination hotspots shared between pairs of taxa, and the graphs (middle and right) show the recombination rates around hotspots and at orthologous loci in the 2 taxa, for the 2 Oncorhynchus species (A), the American (GP population) and European (BS and NS populations) S. salar lineages (B), and between the 2 closely related European S. salar populations (BS and NS) (C). The percentage of shared hotspots was calculated using the number of hotspots in the population with fewer hotspots as the denominator. (D) Motif found enriched in the hotspots identified in the European populations of S. salar (BS and NS). The Venn diagram shows the percentages of population-specific and shared hotspots where the motif was found. (E) Mean recombination rate at shared hotspots (between the BS and NS populations) that harbor (n = 936 hotspots) or not (n = 3,485 hotspots) the detected motif. The recombination rate was significantly higher at hotspots with the motif (Student’s t test p-value <0.05). (F) Motif erosion in the European S. salar populations. The vertical line represents the observed difference in the occurrence of the motif in panel D between the American and European lineages. The null distribution (in gray) shows the difference for 100 random permutations of the motif. The data and codes underlying this figure can be found in https://doi.org/10.5281/zenodo.11083953. BS, Barents Sea; GP, Gaspesie Peninsula; NS, North Sea.

Motifs enriched at hotspots show signs of erosion

A hallmark of the PRDM9-dependent hotspots identified in mammals is the presence of DNA motifs, a consequence of the sequence specificity of the PRDM9 ZF domain [32]. Therefore, we investigated the presence of PRDM9 allele-specific DNA motifs enriched at hotspots in salmonids. We first searched for potential PRDM9 binding motifs in rainbow trout, focusing on RT-52 (Prdm9¹/²) DSB-based hotspots. The Prdm9¹ allele is present in both RT-52 and TAC-1 (Prdm9¹/⁵). We therefore defined a subset of RT-52 DSB hotspots presumably specified by PRDM9¹, based on their overlap with H3K4me3/H3K36me3 peaks in TAC-1 (n = 300). Similarly, we defined a subset of DSB hotspots enriched in putative targets of the PRDM9² allele, which is also present in TAC-3 (Prdm9²/⁶) (n = 254). We identified 2 consensus motifs: one strongly enriched in PRDM9¹ DSB hotspots and the other in PRDM9² DSB hotspots (Fig 3C). Consistent with the Prdm9 genotypes of the 3 rainbow trout samples, both motifs were enriched at RT-52 DSB hotspots. The PRDM9¹ motif was also enriched in TAC-1 DSB hotspots and the PRDM9² motif in TAC-3 DSB hotspots (S22 Fig). Moreover, the PRDM9¹ motif was co-centered with DSB hotspots only in RT-52 (Fisher's test, p = 8.5 × 10⁻¹⁹⁶) and TAC-1 (p = 7.7 × 10⁻²⁷). The PRDM9² motif was co-centered with DSB hotspots in RT-52 (p = 3.3 × 10⁻⁹⁷) and TAC-3 (p = 1.7 × 10⁻⁵) (S23 Fig). These 2 consensus motifs were also significantly enriched at LD-based hotspots (S22 Fig). In particular, the motif targeted by PRDM9¹ was enriched at the center of LD-based hotspots (S23 Fig). This suggests that this allele (or closely related alleles that recognize similar DNA sequences) has been quite frequent during the recent history of the wild population under study.
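The co-centering results above are reported as Fisher's tests, but this section does not give the 2 × 2 layout. The sketch below assumes a hypothetical table of motif present/absent × hotspot center/flanking control. It computes a one-sided Fisher's exact p-value directly from the hypergeometric distribution, using only the standard library. scipy.stats.fisher_exact with alternative="greater" returns the same one-sided p-value.

```python
from math import comb


def fisher_greater(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns P(X >= a), where X is the top-left cell under the hypergeometric
    null with all margins fixed (i.e. enrichment in the top-left cell).
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)
    numer = 0
    for x in range(a, min(row1, col1) + 1):
        numer += comb(row1, x) * comb(n - row1, col1 - x)
    return numer / denom


# Hypothetical counts: motif present/absent at hotspot centers (row 1)
# versus flanking control windows (row 2).
p = fisher_greater(40, 260, 10, 290)
```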

As PRDM9-binding DNA motifs are allele specific, sharing of Prdm9 alleles between populations should lead to shared motif enrichment at shared LD-based hotspots. Therefore, we looked for enrichment of potential 10 to 20 bp motifs in the population-specific and shared hotspots of the 3 S. salar populations. Of note, as LD-based hotspots reflect the population-scaled recombination rate, they may result from the activity of multiple PRDM9 variants, which can hinder the discovery of targeted motifs. Nevertheless, after filtering candidate motifs (S24A Fig), we found a motif present in 12% of the hotspots of the NS population, 8.9% of those of the BS population, and 15.6% of their shared hotspots (Fig 5D). Overall, recombination rates at hotspots overlapping with this 12 bp motif were significantly higher than at other hotspots (Student's t test, p < 0.05; Figs 5E, S24B and S24C). This suggests that the detected motif is targeted by a frequent PRDM9 variant shared by the 2 closely related NS and BS populations, possibly originating from their common ancestral variation.

PRDM9-associated hotspot motifs undergo erosion in mammals due to biased gene conversion [32,39,40]. Therefore, we tested whether the identified 12 bp motif showed signs of erosion in European S. salar populations. We compared motif counts in the available long-read genome assemblies of 7 European and 5 North American Atlantic salmon genomes (accession numbers in S1 Methods). The mean number of motifs was 2.97% lower in the European genomes (mean Europe = 3,230 versus mean North America = 3,329). This level of erosion was significant and was not explained by differences in assembly size. To establish this, we compared counts on collinear blocks and built a null distribution from 100 random permutations of the motif (Fig 5F). Therefore, the enriched motif shared by the NS and BS populations was partially eroded in the European lineage, as predicted by the Red Queen model of PRDM9 evolution.
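The ingredients of this erosion test are a motif count on both strands, the percentage reduction between lineages, and a set of permuted motifs as a null. They can be sketched as follows. The 12 bp motif itself is not given in this section, so any motif string passed to these functions is a placeholder. The collinear-block extraction used in the study is not shown, and all function names are hypothetical.

```python
import random

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def count_motif(seq, motif):
    """Count exact, possibly overlapping, motif matches on both strands.
    A reverse-palindromic motif is counted once per site."""
    rc = revcomp(motif)
    k = len(motif)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] in (motif, rc))


def percent_reduction(mean_derived, mean_reference):
    """Percentage decrease of a mean count relative to a reference lineage."""
    return 100 * (mean_reference - mean_derived) / mean_reference


def shuffled_motifs(motif, n=100, seed=1):
    """Composition-preserving random permutations of the motif (null set)."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        letters = list(motif)
        rng.shuffle(letters)
        out.append("".join(letters))
    return out


# Mean motif counts reported above for European vs North American genomes.
print(round(percent_reduction(3_230, 3_329), 2))  # -> 2.97
```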

Discussion

To determine whether the PRDM9 functions characterized in humans and mice are shared by other animal clades or whether they correspond to derived traits, we investigated the evolution and function of full-length Prdm9 in salmonids using phylogenetic, molecular, and population genomic approaches. These analyses allowed us to determine the evolutionary history of Prdm9 GD and loss, the diversity of the PRDM9 ZF array, the historical sex-averaged recombination map in several populations, the locations of meiotic DSB sites in spermatocytes, their chromatin environment, and the presence of conserved motifs and their erosion. Collectively, these analyses led us to conclude that PRDM9 triggers recombination hotspot activity in salmonids through a mechanism similar to that described in mammals.

PRDM9 specifies recombination sites in salmonids

Our conclusion is based on several pieces of evidence. First, we showed in O. mykiss that DSB hotspots, detected by DMC1-SSDS, are enriched for both H3K4me3 and H3K36me3. We provide evidence that hotspot localization is determined by PRDM9 ZFs: the location of DSB hotspots and the associated H3K4me3 and H3K36me3 modifications varied according to the Prdm9 ZF alleles present in the tested individuals (Fig 3A and 3B). Consistent with this interpretation, we identified DNA motifs enriched at DSB sites. Thus, in salmonids, PRDM9 has retained its DNA-binding and methyltransferase activities and its capacity to attract the recombination machinery to its binding sites. Comparison of DSB hotspots detected by DMC1-SSDS with LD-based CO hotspots in O. mykiss showed a limited but significant overlap (S14 Fig). Note that the quantitative level of DMC1 enrichment assayed by DMC1-SSDS can be influenced by the efficiency of DSB repair. If repair efficiency varies among hotspots, the quantitative correlation between DSB and LD hotspots could therefore be reduced. We also identified a DNA motif enriched at DSB hotspots targeted by Prdm9¹. This motif was also enriched at the center of strong CO hotspots detected in the LD-based recombination map (S23 Fig). The overlap between hotspots is compatible with one or more Prdm9 alleles being shared by the individuals tested and prevalent during the history of the populations analyzed. However, the population-scaled recombination landscape in O. mykiss has been shaped by a diversity of alleles, not necessarily represented in the 3 studied individuals, so the overall hotspot overlap was low. Similar variations in recombination landscapes driven by multiple PRDM9 alleles have been described in mouse, chimpanzee, and human populations [29,35,36,74,87–89]. In the mouse, PRDM9 can suppress recombination activity at accessible chromatin regions [30].
Here, we observe that in salmonids the presence of PRDM9-dependent hotspots is associated with a lack of elevated recombination at regulatory regions (CGIs, TSSs, or TESs) (Figs 3D and 4, and S6). TSSs and CGIs do have elevated recombination rates in D. labrax, which lacks a full-length PRDM9 (Fig 4). We therefore suggest that this may reflect active suppression of, or competition between, the 2 types of hotspots, similar to what has been observed in mice [30].

Comparison of recombination landscapes between vertebrates with or without a full-length PRDM9

In the absence of PRDM9, hotspots occur in accessible chromatin regions such as promoters, enhancers, or other regulatory regions and CGIs [18,19,22,25,30,57,58]. Besides this change in distribution, differences in hotspot number have been detected between PRDM9-dependent and -independent contexts. Using DMC1-SSDS, more hotspots were detected in Prdm9KO mice and rats [30,58]. However, this should be interpreted with caution, as hotspot detection also depends on the half-life of DMC1 at DSB sites. A longer DMC1 half-life in Prdm9KO animals could also account for an increase in detected hotspots. With the LD-based approach, the number of hotspots detected is 2 to 3 times higher in the 3 salmonid species than in D. labrax (Table 1). This difference is mainly explained by their larger genome sizes (1.7 to 2.5 Gb, compared with 0.6 Gb for D. labrax). To get a broader view of the impact of PRDM9 on vertebrate fine-scale recombination landscapes, we combined our data with previously published LD-based maps. This yielded a dataset of 18 species (4 birds, 7 mammals, 1 snake, and 6 teleost fish; 10 species with a full-length PRDM9 and 8 without; S5 Table). On average, hotspot density is about 2 times higher in genomes with a full-length PRDM9 (8.6 hotspots/Mb) than in genomes without (4.4 hotspots/Mb), although this difference is not significant at the 5% level (t test p-value = 0.056). Moreover, these numbers are difficult to compare directly, because different studies used different criteria to define hotspots. To obtain a more comparable estimator of the heterogeneity of recombination landscapes, we measured the fraction of recombination events occurring in the 20% of the genome with the highest recombination rate. On average, 84% of recombination is concentrated in 20% of the genome in species with a full-length PRDM9, compared with 70% in species without (t test p-value = 0.011). Data from more species would be necessary to control for phylogenetic inertia.
However, this preliminary observation suggests that recombination is more concentrated into hotspots in species having a full-length PRDM9. Of note, the LD-based approach measures the population-scaled recombination rate, integrated over many generations, and hence is expected to reflect the historical diversity of PRDM9 alleles. It is therefore likely that in species with PRDM9, the recombination landscapes of individuals are even more heterogeneous than what can be measured by the LD-based approach.

In addition to the localization of recombination shaped by PRDM9, we detected a higher recombination activity at telomere-proximal regions when measuring DSB activity and LD, consistent with the recombination activity measured in S. salar pedigree-based linkage maps [70]. We infer that this effect is PRDM9-independent because the putative PRDM9 motifs in O. mykiss (derived from DMC1-SSDS) and S. salar (derived from LD-based hotspots) did not show such biased distribution (S25 Fig). Of note, the increase in recombination rate towards telomeres is more pronounced in the 3 salmonids (3- to 6-fold) than in D. labrax (about 2-fold), but it is of the same order as in another teleost fish, the three-spined stickleback [61], which only has a truncated KRAB-less PRDM9β (S10 Fig). We hypothesize that in salmonids, some additional factor(s) might modulate PRDM9 binding or any other step required for DSB activity along chromosomes. This telomere-proximal effect appears to be a conserved property, but of variable strength between sexes and among species, independently of the presence/absence of Prdm9 [25,90,91].

PRDM9 evolutionary instability

Similar to the pattern reported in mammals [32,92,93], we found an outstanding diversity of PRDM9 ZF alleles in O. mykiss and S. salar. We also found signatures of positive selection on ZF residues that interact with DNA, specifically in the full-length PRDM9 paralog (α1.a.1 and α1.a.2, respectively) (Fig 2). This suggests that full-length PRDM9 in salmonids could be involved in a Red Queen-like process, as documented in mammals, whereby the ZF sequence responds to a selective pressure arising from the erosion of PRDM9 binding motifs [39,40,47–50]. Consistent with this hypothesis, we found little overlap (6.2%) of LD-based hotspots between the 2 Oncorhynchus species we compared. A similar comparison performed in 3 S. salar populations revealed that the percentage of shared hotspots decreased with increasing genetic divergence (Fig 5). The 26.3% overlap in hotspot activity we detected between the 2 Norwegian populations could reflect the existence of shared Prdm9 alleles. On the other hand, the European and North American salmon populations, which belong to 2 divergent lineages, may not share the same Prdm9 alleles. Possibly as a consequence, they share only 10.3% to 11.2% of their hotspots. Such patterns of population-specific hotspots and partial overlaps have also been observed in mouse populations [35], great apes [94], and humans [95]. However, hotspot overlap is always well below the 73% of shared hotspots between zebra finch and long-tailed finch, which do not carry Prdm9 [25]. Further support for an intra-genomic Prdm9 Red Queen process in Atlantic salmon came from the detection of an enriched motif in 20% of the hotspots shared by the NS and BS populations. This motif is likely to be the target of an active Prdm9 allele in European populations. The average 3% decrease in its total copy number in European compared with North American genomes therefore indicates ongoing motif erosion.

Functional divergence of PRDM9 paralogs

Another intriguing pattern revealed by our study is the complex duplication history of the Prdm9 gene in salmonids, shaped by WGD events and by gene and/or segmental duplications (SDs). Some of these duplications led to functional innovations. Notably, the 2 major PRDM9 clades (α and β) resulted from the Ts3R WGD in the ancestor of teleost fish [19]. PRDM9β lacks the KRAB and SSXRD domains and is mutated at the catalytic residues of its SET domain [19]. The function of PRDM9β has not been characterized. However, this protein is strongly conserved across teleost fish, including salmonids that have a full-length PRDM9α (S1 Fig). This conservation implies (i) that PRDM9β is functional and (ii) that its function is not redundant with that of the canonical full-length PRDM9. Interestingly, the salmonid-specific WGD generated 2 PRDM9β paralogs (β1 and β2) that are well conserved across all salmonids (S1 Fig), which indicates that both are under purifying selection.

In contrast to the conservation of PRDM9β paralogs, the duplications of PRDM9α genes led to many copies that are truncated or show evidence of pseudogenization (Fig 1). The first GD event generated the α1 and α2 clades. All salmonids (12/12) have one full-length copy in the PRDM9α1 clade: in subclade α1.1 in some species and α1.2 in others, whereas C. clupeaformis has retained both α1.1 and α1.2. Conversely, only 2 species have retained a full-length PRDM9α2 paralog (α2.2 in O. mykiss and S. namaycush). In all other species, the KRAB domain of α2.2 is missing or pseudogenized (Fig 1). The analysis of published RNA-seq datasets showed that PRDM9α2.2 is expressed at high levels in testis. This holds both in O. mykiss (where it is full length) and in O. kisutch and S. salar (where it is truncated). The SET domain of α2.2 contains the 3 conserved tyrosine residues important for methyltransferase catalytic activity [96] (Figs 1 and S4). However, PRDM9α2.2 genes show little diversity in their ZF domain. This suggests that, unlike full-length PRDM9α1, they are probably not involved in directing recombination. These paralogs might nevertheless contribute to hotspot activity together with full-length PRDM9. For example, they may have retained putative protein interaction properties through the SSXRD domain or some zinc fingers, and may also be able to oligomerize with PRDM9, as proposed for mouse PRDM9 [97]. Alternatively, they may have no function in hotspot activity but play a role in the regulation of gene expression, as some members of the PRDM protein family do [98].

The differential retention of α1 paralogs between salmonid genera suggests that 2 functional Prdm9α1 copies have coexisted in the common ancestor to Salmo and the (Coregonus, Thymallus, Oncorhynchus, and Salvelinus) group (Fig 1). This might also be the case in primates where the pair of paralogs formed by Prdm7 and Prdm9 shares orthology with one ancestral copy in rodents [99]. It has been shown that changes in Prdm9 gene dosage affect fertility in mice [100,101], suggesting that PRDM9 protein level may be limiting in some contexts. Theoretical models also predict that the loss of fitness induced by the erosion of PRDM9 targets could be compensated by increased gene dosage [47,50]. Thus, the duplication of a Prdm9 allele might be temporarily advantageous when the amount of its target motifs starts to become too low in the genome. However, this benefit is expected to be only transient. This could explain why most (11/12) of the salmonid genomes analyzed contained a single full-length, non-pseudogenized copy of Prdm9α1. The succession of duplications and losses reported here in salmonids and previously described in mammals contributes to the apparent instability of Prdm9 at the macro-evolutionary timescale.

The reinforced PRDM9 paradox

This study uncovers a remarkable similarity in the regulation of the recombination landscape between salmonids and mammals. The main conclusion is that the function of PRDM9 in specifying recombination sites most likely existed in the common ancestor of vertebrates, and might be even older. Certainly, it is not a mammalian oddity. Impressively, the ultra-fast Red Queen-driven evolution of Prdm9 and its binding motifs has been ongoing for more than 400 My in several vertebrate lineages [93]. This implies many thousands of amino acid substitutions per site in the ZF array [93]. Our results highlight the many open questions about this remarkable system, particularly how it has been maintained over such a long period, a persistence that is now demonstrated. Prdm9 can evidently be lost, for instance in birds and canids. Its continued presence in most mammals, snakes, salmonids, and presumably many other taxa might be partly explained by the molecular mechanisms of PRDM9-dependent and PRDM9-independent recombination. The net output of these 2 processes is the same: CO formation. However, there may be differences in the kinetics or efficiency of DNA DSB formation and repair, and thus in the robustness of CO control. This is suggested by 2 observations. PRDM9 recruits ZCWPW1, a protein that facilitates DNA DSB repair [102–104]. Prdm9 has also coevolved with other genes involved in DNA DSB repair and CO formation, such as Zcwpw2, Tex15, and Fbxo47 [56]. Of note, Zcwpw1, Zcwpw2, and Tex15 are present and intact in the 3 species that contain a full-length Prdm9 (S. salar, O. mykiss, and O. kisutch), but are absent from the genome of D. labrax (S6 Table). If PRDM9 activity is linked to other molecular processes, its loss without loss of fertility may require several mutational events. Interestingly, an intermediate context, suggesting a reduction of PRDM9 activity, has been observed in the corn snake Pantherophis guttatus.
Specifically, Hoge and colleagues [59] reported elevated recombination rates at PRDM9 binding sites and promoter-like features, introducing the idea of a “tug of war” between Prdm9 and the default, Prdm9-independent, system. A recent study in mammals [105] also showed that many species with Prdm9 make substantial use of default sites, unlike humans and mice. The relative efficiency of the Prdm9-independent and Prdm9-dependent pathways presumably evolves and differs among species. When the Prdm9-independent pathway is sufficiently efficient, the conditions might be met for losing Prdm9 irreversibly. The characterization of recombination patterns and mechanisms in species with and without Prdm9 should help to understand the paradox of its peculiar evolution.

Material and methods

Ethics statement

The S. salar samples were collected by the Unité Expérimentale d'Ecologie et d'Ecotoxicologie Aquatique (U3E, INRAE, https://doi.org/10.15454/1.5573930653786494E12) with authorization from an ethics committee (number APAFIS#4025–201602051204637 v3). These samples were provided by the Biological Resource Centre Colisa (DOI: Biological Resource Centre Colisa), part of BRC4Env (DOI: https://doi.org/10.15454/TRBJTB), of the Research Infrastructure AgroBRC-RARe. The O. mykiss samples were collected in accordance with the CNRS guidelines for animal welfare and ethical authorization n° APAFIS#13616–2018021315504139 v5 issued by the local committee for ethical animal experimentation and the French ministries of research and agriculture.

Phylogenetic analysis of PRDM9 paralogs in salmonids

We investigated the presence of full-length PRDM9 in 12 species from the 3 salmonid subfamilies (Coregoninae, Thymallinae, and Salmoninae). We searched for Prdm9-related genes by homology using the full-length copy of O. kisutch (coho salmon), focusing on the 3 PRDM9 canonical domains: KRAB (encoded by 2 exons), SSXRD (1 exon), and SET (3 exons). We obtained coho salmon PRDM9 from a nearly full-length coding sequence annotated in the RefSeq database (XP_020359152.1), complemented at its 3′ end using a cDNA identified in a brain RNA-seq data set sequenced with PacBio long reads (SRR10185924.264665.1). We used this reference sequence to identify, with BLAST, Prdm9 homologs in the whole-genome assemblies of lake whitefish (Coregonus clupeaformis), European grayling (Thymallus thymallus), huchen (Hucho hucho), coho salmon (O. kisutch), rainbow trout (O. mykiss), chinook salmon (Oncorhynchus tshawytscha), chum salmon (Oncorhynchus keta), red salmon (Oncorhynchus nerka), pink salmon (Oncorhynchus gorbuscha), Atlantic salmon (S. salar), brown trout (Salmo trutta), and lake trout (Salvelinus namaycush), and also of northern pike (Esox lucius, Esocidae), a closely related outgroup, and sea bass (Dicentrarchus labrax). As we obtained multiple hits, we filtered out copies containing only one of the 6 exons. We compared candidates to all PRDM-related genes annotated in the human and mouse genomes in Ensembl to exclude non-Prdm9 homologs. We aligned the retained exons separately using Macse (v2.06) [106] to take into account potential frameshifts and stop codons. We manually examined and edited the alignments before concatenating exons of the same copy using AMAS concat [107]. Several paralogous copies of Prdm9 are expected to result from the 2 WGD events that occurred in the common ancestors of teleosts (Ts3R, ca. 320 Mya) and of salmonids (Ss4R, ca. 90 Mya), respectively [63–65].
We used the location of these paralogs on pairs of ohnologous chromosomes resulting from the most recent Ss4R duplication to trace the evolutionary history of Prdm9 duplications, retention, and losses. We built the maximum-likelihood phylogeny of the 3 canonical domains using IQ-TREE [108] based on amino acid alignments, using ultrafast bootstrap with 1,000 replicates. Lastly, to identify functional Prdm9 copies with sequence orthology to the 10 exons found in human and mouse Prdm9 [109], we predicted the structure of each gene copy, together with its 10 kb flanking regions, using Genewise (v2.4.1) [110]. We selected representative paralogous sequences across the obtained Prdm9 phylogenetic tree to perform a sequence similarity-based annotation of the copies in each species. See details in Supporting information (S1 Methods).

Analysis of PRDM9 ZF diversity in rainbow trout and Atlantic salmon

We characterized the allelic diversity of the ZF domain of Prdm9α copies in 2 species with different functional α-paralogs: Atlantic salmon (S. salar) and rainbow trout (O. mykiss). We focused on Prdm9α because a previous study showed that in teleost fish, Prdm9β copies lack the KRAB and SSXRD domains, have a slowly evolving ZF domain, and carry a presumably inactive SET domain [19].

First, to validate the presence of expressed Prdm9α copies, we inferred the expression levels of multiple Prdm9α paralogs in immature testes from the Salmo and Oncorhynchus genera, using publicly available RNA-seq data from the Sequence Read Archive (SRA) repository. Specifically, we analyzed data from 2 S. salar samples (SRR1422872 and SRR9593306), 2 O. kisutch samples (SRR8177981 and SRR2157188), and 1 O. mykiss sample (SRR5657606). In both genera, this analysis revealed high expression of 2 distinct Prdm9α paralogs that had previously been identified in the phylogenetic analysis. We then sequenced the Prdm9α paralogs α1.a.2 (full length, chromosome 5, n = 26) and α2.2 (partial, chromosome 17, n = 20) in S. salar, and the Prdm9α paralogs α1.a.1 (full length, chromosome 31, n = 23) and α2.2 (full length, chromosome 7, n = 20) in O. mykiss.

We used wild Atlantic salmon samples from Normandy (France) and rainbow trout samples from an INRAE selected strain (S7 Table). We extracted genomic DNA from fin clips stored in ethanol at −20°C, using the Qiagen DNeasy Kit following the manufacturer's instructions. We measured DNA concentration and purity with a Nanodrop-1000 Spectrophotometer (Thermo Fisher Scientific) and assessed DNA quality by agarose gel electrophoresis. We designed primers using NCBI Primer-BLAST, ensuring specificity against the reference assemblies. Primers targeted the ZF sequence encoded in the last exon of the gene and were placed in the arms flanking the array, so as to avoid amplifying paralogous loci (S8 Table). We carried out PCR reactions using 1X Phusion HF buffer, 200 μM dNTPs, 0.5 μM forward primer, 0.5 μM reverse primer, 3% DMSO, 2.5 to 10 ng template, and 0.5 units of Phusion Polymerase (NEB) (total volume: 25 μl). Cycling conditions were: initial denaturation at 98°C for 2 min followed by 35 cycles of 98°C for 10 s, 66 to 70°C for 30 s, 72°C for 90 s, and a final elongation step at 72°C for 3 min, followed by hold at 10°C, in a C1000 Cycler (Bio-Rad). We examined PCR products on agarose gels and purified them using the NucleoSpin Gel and PCR Clean-up kit (Macherey-Nagel). We performed Sanger sequencing of single-size amplicons. For heterozygous samples showing 2 alleles of different lengths, we separated the amplicons by electrophoresis, then cloned them using the TOPO Blunt Cloning Kit (Invitrogen) before sequencing. Sequencing was done by Azenta-GeneWiz (Leipzig, Germany).

We assembled forward and reverse reads and aligned them to the reference ZF array from S. salar ICSASG_v2 and O. mykiss USDA_OmykA_1.1, using SnapGene (v5.1.4.1–5.2.3). We translated contigs into amino acid sequences, which we used to categorize individual Prdm9α alleles. We annotated all ZF arrays to match the C2H2 ZF motif X7-CXXC-X12-HXXXH. We defined a new allele whenever a sequence differed from all previously identified alleles by at least one amino acid. We aligned the DNA sequences for each allele to create a consensus sequence. We then followed [56] and [19] to compare amino acid diversity at the DNA-binding residues of the ZF array (positions −1, 2, 3, and 6 of the α-helix) with diversity at each site of the ZF array. We calculated the proportion of the total amino acid diversity (r) at DNA-binding sites as the sum of diversity at DNA-binding residues over the sum of diversity at all 28 residues of the ZF unit (see details in S1 Methods).
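The ratio r can be illustrated with a short Python sketch. It assumes that each ZF repeat has already been extracted as a 28-residue string following the X7-CXXC-X12-HXXXH annotation, and it measures per-site diversity with Simpson's index (1 − Σp²); the exact diversity statistic of [56] and [19] is described in S1 Methods and may differ. The mapping of helix positions −1, 2, 3, and 6 to 0-based repeat indices 15, 18, 19, and 22 is our assumption, based on the first zinc-coordinating histidine sitting at helix position +7. The function names and the example repeat are hypothetical.

```python
from collections import Counter

# Assumption: with repeats annotated as X7-CXXC-X12-HXXXH (28 residues), the first
# zinc-coordinating His is at 0-based index 23 and at alpha-helix position +7, so
# helix positions -1, +2, +3 and +6 map to 0-based indices 15, 18, 19 and 22.
DNA_BINDING_INDICES = (15, 18, 19, 22)
REPEAT_LENGTH = 28


def site_diversity(residues):
    """Simpson-type diversity (1 - sum of squared frequencies) at one site."""
    counts = Counter(residues)
    n = len(residues)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def binding_site_diversity_ratio(repeats):
    """r = summed diversity at DNA-binding sites / summed diversity at all 28 sites."""
    if not repeats or any(len(rep) != REPEAT_LENGTH for rep in repeats):
        raise ValueError("expected a non-empty list of 28-residue ZF repeats")
    per_site = [site_diversity([rep[i] for rep in repeats]) for i in range(REPEAT_LENGTH)]
    total = sum(per_site)
    if total == 0:
        return 0.0  # identical repeats: no diversity to partition
    return sum(per_site[i] for i in DNA_BINDING_INDICES) / total


# Hypothetical example: two repeats that differ only at helix positions -1 and +6.
base = "TGEKPYVCRECGRGFSNKSHLLRHQRTH"
variant = base[:15] + "A" + base[16:22] + "K" + base[23:]
print(binding_site_diversity_ratio([base, variant]))  # 1.0
```

In real data, the repeats of all alleles would be pooled before computing per-site diversity; when variation is concentrated at the DNA-binding residues, r approaches 1.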

Identification of DSB hotspots in rainbow trout using ChIP-sequencing

We investigated the genome-wide distribution of DMC1-bound ssDNA in O. mykiss testes by ChIP followed by ssDNA enrichment (DMC1-Single Strand DNA Sequencing, DMC1-SSDS). We chose 3 rainbow trout individuals from the pool of samples previously used to characterize PRDM9 ZF diversity. We determined the stage of gonadal maturation by macroscopic (whole gonads) and histological (gonad sections) analyses, according to [111] (S26 Fig and S1 Methods). As DMC1 binds to chromatin during early meiotic prophase I, we used testes at stages III and IV from 3 individuals with different Prdm9 genotypes (TAC-1: Prdm9 alleles 1/5, stage III; TAC-3: Prdm9 alleles 2/6, stage III; and RT-52: Prdm9 alleles 1/2, stage IV). This allowed us to compare DSB hotspots between individuals that do or do not share a Prdm9 allele.

For H3K4me3 and H3K36me3 ChIP experiments, we used the protocols described in [112,113] with some adjustments and rabbit anti-H3K4me3 (Abcam, ab8580) and anti-H3K36me3 (Diagenode, Premium, C15410192) antibodies. For DMC1 ChIP, we used previously described methods [69,114] and antibodies against DMC1. These antibodies were raised by immunization of a rabbit and a guinea pig with a His-tagged recombinant zebrafish Dmc1 (see details in S1 Methods). All ChIP experiments were performed in duplicate. A list of the samples and antibodies used for the ChIP-seq experiments, the number of mapped reads and accession numbers are in S9 Table.

For H3K4me3 and H3K36me3 ChIP-seq, we generated libraries using the NEBNext Ultra II protocol for Illumina (NEB, E7645S-E7103S), with minor adjustments. For DMC1-SSDS, we generated libraries following the Illumina TruSeq protocol (Illumina, IP-202-9001DOC), with the introduction of an additional step of kinetic enrichment, as previously described [69,114]. Libraries were sequenced on a NovaSeq6000 platform (Illumina) with S4 flow cells by Novogene Europe (Cambridge, United Kingdom).

We analyzed histone modifications with the nf-core/chipseq v1.2.1 pipeline developed by [115]. Briefly, we aligned the sequencing reads of all ChIP-seq experiments to the USDA_OmykA_1.1 assembly with BWA (v0.7.17-r1188). For both H3K4me3 and H3K36me3, we normalized the signal based on read coverage and by subtracting the input. We performed peak calling with MACS2 (v2.2.7.1) for both replicates, providing an input for each sample. We assessed histone modification enrichment at DMC1 peaks, and the enrichment of H3K36me3 signal at H3K4me3 peaks in brain, using the deepTools suite [116] and the bed files produced by the AQUA-FAANG project (https://www.aqua-faang.eu/). We analyzed the DMC1-SSDS data as described in [69], with some modifications described in [117], using the hotSSDS pipeline (version 1.0). We mapped reads with the modified BWA algorithm (BWA Right Align) developed to align and recover ssDNA fragments, as described by [114]. We normalized the signal based on library size and on type 1 ssDNA fragments. We performed peak calling with MACS2 (v2.2.7.1) under relaxed thresholds for each of the 2 replicates, providing an input control. We carried out an irreproducible discovery rate (IDR) analysis to identify reproducible enriched regions, and used these peaks as DSB hotspots (see details in S1 Methods). We used the final peaks to check the distribution of type 1 ssDNA signal at DSB hotspots.

We explored the relationship between H3K4me3 and H3K36me3 signal distributions by calculating 3 correlations at the RT-52 DSB hotspots: between H3K4me3 and H3K36me3 read enrichment in the RT-52 sample; between H3K4me3 read enrichment in the TAC-1 and TAC-3 samples; and between H3K36me3 read enrichment in the TAC-1 and TAC-3 samples. Lastly, we assessed the proportion of DSB hotspots overlapping with H3K4me3 and H3K36me3 peaks.
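The proportion of DSB hotspots overlapping histone-mark peaks amounts to counting the hotspots that overlap at least one peak, as `bedtools intersect -u` would report. The following self-contained Python sketch assumes intervals are given as (chromosome, start, end) tuples in half-open BED coordinates; the function name is hypothetical.

```python
from bisect import bisect_left
from itertools import accumulate


def fraction_overlapping(hotspots, peaks):
    """Fraction of hotspots overlapping at least one peak (half-open BED intervals).

    Both arguments are iterables of (chrom, start, end) tuples.
    """
    hotspots = list(hotspots)
    if not hotspots:
        return 0.0
    by_chrom = {}
    for chrom, start, end in peaks:
        by_chrom.setdefault(chrom, []).append((start, end))
    index = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        starts = [s for s, _ in intervals]
        # Running maximum of peak ends, in order of peak start.
        running_max_end = list(accumulate((e for _, e in intervals), max))
        index[chrom] = (starts, running_max_end)
    hits = 0
    for chrom, start, end in hotspots:
        if chrom not in index:
            continue
        starts, running_max_end = index[chrom]
        k = bisect_left(starts, end)  # peaks 0..k-1 start before the hotspot ends
        if k and running_max_end[k - 1] > start:  # one of them ends after it starts
            hits += 1
    return hits / len(hotspots)
```

Sorting peaks once per chromosome and keeping a running maximum of their ends makes each hotspot query a single binary search, so the cost stays close to that of sorting the peaks.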

Reconstruction of population recombination landscapes in 3 salmonid species

Whole-genome resequencing data

To reconstruct population-based recombination landscapes, we collected high-coverage whole-genome resequencing data from 5 natural populations of 3 salmonid species from the SRA database: coho salmon (O. kisutch), rainbow trout (O. mykiss), and Atlantic salmon (S. salar). We used ~20 individuals per population as recommended [83]. We retrieved 20 genomes of the Southern British Columbia population of coho salmon [118], 22 genomes of rainbow trout from North West America [119], and 60 genomes of 3 populations of Atlantic salmon belonging to the 2 major lineages from North America and Europe [120]: 20 from the Gaspesie Peninsula in Canada (GP population hereafter), 20 from the North Sea (NS population), and 20 from the Barents Sea in Norway (BS population). Sample accession numbers and locations are in S10 Table and S27 Fig.

Variant calling

Variants and genotypes called by [118] using GATK were used for O. kisutch. We followed the same methodology for variant calling and genotyping in O. mykiss and S. salar, using the GATK best-practice pipeline (> v3.8–0, see S11 Table for the detailed versions of the programs [121,122]). First, we aligned paired-end reads to their reference genome (Okis_V1, GCF_002021735.1; Omyk_1.0, GCF_002163495.1; Ssal_v3.1, GCF_905237065.1, see S12 Table for assembly statistics) using BWA-MEM (v0.7.17, Li and Durbin, 2009; -M option), yielding an average read coverage depth per sample of 29.54×, 24.87×, and 9.97× for O. kisutch, O. mykiss, and S. salar, respectively (S4 Table). We used Picard (> v2.18.29) to mark PCR duplicates and add read groups. Then, we performed variant calling separately for each individual using HaplotypeCaller before joint genotyping with GenotypeGVCFs. In total, we analyzed 9,590,270, 39,601,311, and 27,061,466 single-nucleotide polymorphisms (SNPs) for O. kisutch, O. mykiss, and S. salar, respectively.

After genotyping, we removed variants within 5 bp of an indel with the Bcftools filter (v 1.9; Li, 2011; -g 5). We filtered low-quality SNPs with Vcftools (> v 0.1.16) [123], keeping only biallelic SNPs, and excluding genotypes with low-quality scores (--minGQ 20) and SNPs with >10% of missing genotypes (--max-missing 0.9). For the S. salar data set, we set the missingness threshold at 50% to take into account the lower sequencing coverage depth in this species. To remove the effect of poorly sequenced and duplicated regions, we kept only sites with a mean coverage depth within the 5% to 95% quantiles of that species' distribution. To further eliminate shared excesses of heterozygosity due to residual tetrasomy or contamination, we applied a Hardy–Weinberg equilibrium filter with a p-value exclusion threshold of 0.01 (--hwe 0.01). We removed singletons by applying a minor allele count (MAC) filter with Vcftools (--mac 2). For S. salar, we applied the missingness, Hardy–Weinberg, and MAC filters separately to each of the 3 populations. After these filtering steps, we retained a total of 7,205,269, 16,079,097, and 5,575,430 SNPs for O. kisutch, O. mykiss, and S. salar, respectively (S4 Table).
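The site-level filters above can be expressed as a small Python predicate. This is a sketch under simplifying assumptions: genotypes are diploid tuples of allele indices (None when missing), the per-site mean depth is precomputed, and the Hardy–Weinberg test and per-genotype quality filter are left out. It does not reproduce Vcftools internals, and the function names are hypothetical.

```python
from statistics import quantiles


def depth_bounds(mean_depths):
    """5% and 95% quantiles of the per-site mean depth distribution."""
    cuts = quantiles(mean_depths, n=20, method="inclusive")
    return cuts[0], cuts[-1]


def passes_filters(genotypes, mean_depth, depth_lo, depth_hi,
                   max_missing_frac=0.10, min_mac=2):
    """Apply biallelic, missingness, minor-allele-count and depth filters to one site.

    genotypes: diploid genotypes as (allele_index, allele_index) tuples, None if missing.
    Hardy-Weinberg and per-genotype quality filters are deliberately left out.
    """
    if not genotypes:
        return False
    called = [g for g in genotypes if g is not None]
    if (len(genotypes) - len(called)) / len(genotypes) > max_missing_frac:
        return False
    alleles = [a for g in called for a in g]
    if set(alleles) - {0, 1}:  # keep only sites with reference (0) and one alternate (1)
        return False
    alt = alleles.count(1)
    if min(alt, len(alleles) - alt) < min_mac:  # min_mac=2 removes singletons
        return False
    return depth_lo <= mean_depth <= depth_hi
```

For the S. salar data set, the same predicate would simply be called with `max_missing_frac=0.5`.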

Variant phasing and orientation

We used the read-based phasing approach in WhatsHap (> v0.18) [124] to identify phase blocks from paired-end reads that overlapped with neighboring individual heterozygous positions. This allowed us to locally resolve the physical phase of 73.45%, 76.98%, and 7.32% of variants for O. kisutch, O. mykiss, and S. salar, respectively. Then, we performed the statistical phasing of pre-phased blocks with SHAPEIT4 (> v4.2.1, [125], default settings) in each species, assuming a uniform recombination rate of 3 cM/Mb (representative of the average recombination rates in teleosts, [11]) and using the effective population size estimated from the mean nucleotide diversity of each chromosome calculated with Vcftools.
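A uniform recombination rate of 3 cM/Mb converts physical positions to genetic positions linearly (cM = position in bp × 3 / 10⁶). The sketch below builds such a map; the tab-separated three-column layout (pos, chr, cM) with a header line is our assumption about the map format passed to SHAPEIT4 and should be checked against its documentation. Function names are hypothetical.

```python
def uniform_map_rows(chrom, positions, rate_cm_per_mb=3.0):
    """Return (pos, chrom, cM) rows assuming a constant recombination rate."""
    return [(pos, chrom, pos * rate_cm_per_mb / 1e6) for pos in positions]


def write_genetic_map(path, rows):
    """Write a tab-separated map with a 'pos chr cM' header (assumed layout)."""
    with open(path, "w") as handle:
        handle.write("pos\tchr\tcM\n")
        for pos, chrom, cm in rows:
            handle.write(f"{pos}\t{chrom}\t{cm:.6f}\n")


rows = uniform_map_rows("1", [1, 1_000_000, 2_500_000])
# rows[1] == (1000000, "1", 3.0): 1 Mb at 3 cM/Mb corresponds to 3 cM
```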

We inferred ancestral allelic state probabilities for the set of retained variants of each species with the maximum-likelihood method implemented in est-sfs (v2.04, Kimura-2-parameter substitution model) [126], using 3 outgroups per species chosen among the available salmonid reference genomes (see details in S1 Methods). The method uses the ingroup allele frequencies and the allelic states of the outgroups to infer ancestral allelic state, taking into account the phylogenetic relationships between ingroup and outgroup species [127].

Estimation of linkage disequilibrium (LD)-based recombination rates

For each of the 5 population data sets (O. kisutch, O. mykiss, and S. salar GP, BS, and NS populations, S27 Fig), we estimated the population-scaled recombination rate parameter (ρ = 4Ner, where Ne is the effective population size and r the recombination rate in M/bp) using LDhelmet (v1.19) [13]. LDhelmet relies on a reversible-jump Markov chain Monte Carlo algorithm to infer the ρ value between every pair of consecutive SNPs. Variant orientation was provided using the probabilities, estimated by est-sfs, that the major and minor alleles were ancestral, and a transition matrix was computed following [13]. We ran LDhelmet 5 times independently for each population. For each chromosome, we created the haplotype configuration files with the find_confs function using the recommended window size of 50 SNPs. We created the likelihood look-up tables once for the 5 runs with the table_gen function, using the recommended grid for the population recombination rate (ρ/bp) (i.e., ρ from 0 to 10 by increments of 0.1, then from 10 to 100 by increments of 1) and the Watterson θ = 4Neμ parameter of the corresponding chromosome, obtained using μ = 10⁻⁸. We created the Padé files using 11 Padé coefficients as recommended. We ran the Markov chain Monte Carlo for 1 million iterations with a burn-in period of 100,000 and a window size of 50 SNPs, using a block penalty of 5. We checked the convergence of the 5 independent runs by comparing the estimated recombination values with Spearman's rank correlation test (Spearman's rho >0.96; S28 Fig). We averaged the 5 runs and smoothed them within 2 kb, 100 kb, and 1 Mb windows using custom Python scripts.
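Averaging runs and smoothing within windows can be sketched as follows. We assume that each run is a list of (left SNP position, right SNP position, ρ/bp) tuples covering the same SNP intervals, and we compute, for each window, the mean ρ/bp weighted by the number of base pairs of each inter-SNP interval that falls inside it. This illustrates length-weighted smoothing in general; it is not the custom scripts used in this study, and the function names are hypothetical.

```python
def average_runs(runs):
    """Average rho/bp across independent runs that share identical SNP intervals."""
    reference = [(left, right) for left, right, _ in runs[0]]
    for run in runs[1:]:
        if [(left, right) for left, right, _ in run] != reference:
            raise ValueError("all runs must cover the same SNP intervals")
    return [
        (left, right, sum(run[i][2] for run in runs) / len(runs))
        for i, (left, right) in enumerate(reference)
    ]


def window_means(intervals, window_size):
    """Length-weighted mean rho/bp in non-overlapping windows.

    intervals: (left, right, rho_per_bp) tuples, half-open [left, right).
    Returns {window_start: mean rho/bp} for every window covered by an interval.
    """
    weighted, covered = {}, {}
    for left, right, rho in intervals:
        pos = left
        while pos < right:
            w_start = pos // window_size * window_size
            chunk_end = min(right, w_start + window_size)
            weighted[w_start] = weighted.get(w_start, 0.0) + rho * (chunk_end - pos)
            covered[w_start] = covered.get(w_start, 0) + (chunk_end - pos)
            pos = chunk_end
    return {w: weighted[w] / covered[w] for w in sorted(weighted)}
```

Calling `window_means(average_runs(runs), 2000)` would give a 2 kb map; the same function with 100,000 or 1,000,000 yields the coarser scales.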

We reconstructed the fine-scale recombination landscape of the European sea bass (D. labrax) to compare recombination features in salmonids with a species that lacks a complete Prdm9 gene due to loss of the KRAB domain. We used whole-genome haplotype data obtained by phasing-by-transmission and statistical phasing [128] to infer recombination in the Atlantic sea bass population with a similar strategy, using the seabass_V1.0 genome assembly (GenBank accession number GCA_000689215.1) (S4 and S11 Tables).

We estimated Ne from the nucleotide diversity (θ) measured in our population resequencing data sets and from published mutation rates (Ne = θ/(4μ)). θ was calculated on the filtered data set (keeping singletons) with Vcftools in windows of 100 kb and corrected by the proportion of SNPs discarded after filtering, to account for the fraction of the genome not considered by Vcftools when estimating θ. We used values of μ reported in fish: stickleback (4.56 × 10⁻⁹; [129]), Atlantic herring (2.0 × 10⁻⁹; [130]), cichlid fish (3.5 × 10⁻⁹; [131]), guppy (3.44 × 10⁻⁹; [132]), and in human (1 × 10⁻⁸; [133]) to obtain a range of Ne estimates. We computed the range of μ/r ratios in each species, based on the genome-wide recombination rate (r) measured from published pedigree-based genetic maps [67,75–77].
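The conversions involved are simple arithmetic: Ne = θ/(4μ), and, once Ne is fixed, r = ρ/(4Ne) in Morgans per bp (multiplied by 10⁸ to obtain cM/Mb). The sketch below applies the mutation rates listed above to a hypothetical diversity value (θ = 0.002 per bp, not a value from this study) to show how the range of Ne estimates arises. Function names are hypothetical.

```python
MUTATION_RATES = {  # per bp per generation, as listed in the text
    "stickleback": 4.56e-9,
    "Atlantic herring": 2.0e-9,
    "cichlid": 3.5e-9,
    "guppy": 3.44e-9,
    "human": 1e-8,
}


def ne_from_theta(theta_per_bp, mu_per_bp):
    """Effective population size from Ne = theta / (4 * mu)."""
    return theta_per_bp / (4.0 * mu_per_bp)


def r_cm_per_mb(rho_per_bp, ne):
    """Convert rho = 4*Ne*r (per bp) into r in cM/Mb (1 M/bp = 1e8 cM/Mb)."""
    return rho_per_bp / (4.0 * ne) * 1e8


def mu_over_r(mu_per_bp, r_cm_per_mb_value):
    """Mutation-to-recombination ratio, with r converted from cM/Mb to M/bp."""
    return mu_per_bp / (r_cm_per_mb_value * 1e-8)


theta = 0.002  # hypothetical per-bp diversity, for illustration only
ne_values = {name: ne_from_theta(theta, mu) for name, mu in MUTATION_RATES.items()}
ne_range = (min(ne_values.values()), max(ne_values.values()))  # roughly (50_000, 250_000)
```

The lowest μ (herring) gives the largest Ne, and the human rate gives the smallest, which is why a single θ translates into a range of Ne and hence of r estimates.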

We also verified that our estimates of recombination rates were robust to limited sequencing coverage (see section "Controlling for a possible effect of limited sequencing coverage" in S1 Methods, S29 and S30 Figs).

Identification of LD-based recombination hotspots

We identified recombination hotspots from the raw recombination map inferred by LDhelmet (i.e., raw hotspots) and from the 2 kb smoothed recombination map (i.e., 2 kb hotspots) using a sliding window approach. We defined hotspots as intervals between consecutive SNPs, or 2 kb windows, with a recombination rate at least 5-fold higher than the mean recombination rate in the 50 kb flanking regions. When consecutive 2 kb windows exceeded the threshold, we retained only the window with the highest rate.
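The window-based hotspot criterion can be sketched in Python as follows. Assumptions not stated in the text: the 50 kb flanking region is taken on each side of the focal window and excludes it, windows are consecutive and start at position 0, and the background is the unweighted mean of the flanking windows. The function name is hypothetical.

```python
def call_window_hotspots(window_rates, window_size=2000, flank_bp=50_000, fold=5.0):
    """Return start positions (bp) of hotspot windows.

    window_rates: mean recombination rate of consecutive windows starting at 0.
    A window passes if its rate is >= fold x the mean of the flanking windows
    (flank_bp on each side, focal window excluded). Within a run of consecutive
    passing windows, only the window with the highest rate is kept.
    """
    n = len(window_rates)
    k = flank_bp // window_size  # flanking windows per side
    passing = [False] * n
    for i, rate in enumerate(window_rates):
        flank = window_rates[max(0, i - k):i] + window_rates[i + 1:i + 1 + k]
        if flank:
            background = sum(flank) / len(flank)
            passing[i] = background > 0 and rate >= fold * background
    hotspots = []
    i = 0
    while i < n:
        if not passing[i]:
            i += 1
            continue
        j = i
        while j + 1 < n and passing[j + 1]:
            j += 1
        best = max(range(i, j + 1), key=lambda idx: window_rates[idx])
        hotspots.append(best * window_size)
        i = j + 1
    return hotspots
```

For raw hotspots, the same logic would apply to inter-SNP intervals, whose unequal lengths would call for a length-weighted background rather than a simple mean.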

Analysis of recombination landscapes

Comparison between DMC1 and LD-based recombination maps

We compared DSB hotspots mapped by DMC1-SSDS in the 3 pooled samples with the LD-based recombination hotspots retrieved from the recombination landscapes of O. mykiss. For this, we used the NCBI Remap program to convert the genomic positions of the DSB hotspots mapped on the OmykA_1.1 assembly to the Omyk_1.0 coordinates on which we built the LD map. We compared the locations of LD-hotspots and of DSB hotspots using Bedtools intersect [134].

Recombination at genomic features

We investigated how DSB-based (O. mykiss) and LD-based (3 salmonid species and sea bass) recombination rates and hotspots were distributed relative to genomic features. We first retrieved the positions of genes, exons, and introns from the genome annotation of each species. We identified transposable element (TE) families de novo in each genome using RepeatModeler (v2.0.3; option -LTRStruct) [135], before mapping TEs and low-complexity DNA sequences with RepeatMasker (version 4.1.3, http://www.repeatmasker.org/, options -xsmall, -nolow). We deduced intergenic regions from gene and TE locations using Bedtools subtract. We defined the TSS and TES as the first and last positions of a gene, respectively. For each reference genome, we predicted CGIs with EMBOSS cpgplot (v6.6.0) [136], using the parameters -window 500 -minlen 250 -minoe 0.6 -minpc 0. The criteria classically used to predict CGIs in mammals and birds (CpG observed/expected ratio >0.6, GC content >50%) are not appropriate for teleost fish, in which CGIs are CpG-rich but have a low GC content [78,79]. Therefore, we predicted CGIs based only on their CpG content, without any constraint on their GC content. We confirmed that these criteria efficiently predicted TSS-associated CGIs, using whole-genome DNA methylation and H3K4me3 data from rainbow trout and coho salmon (see S1 Analysis).
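The CpG-only criterion can be illustrated with a simplified re-implementation of the observed/expected CpG ratio (CpG count × window length / (C count × G count)). This sketch assigns each 500 bp window's ratio to its central base and reports runs of at least 250 bp whose windows reach 0.6, with no GC-content constraint. It approximates, but is not, the EMBOSS cpgplot algorithm, and the function names are hypothetical.

```python
def cpg_obs_exp(seq):
    """Observed/expected CpG ratio: CpG count * length / (C count * G count)."""
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    return seq.count("CG") * len(seq) / (c * g)  # "CG" occurrences cannot overlap


def predict_cgis(seq, window=500, min_len=250, min_oe=0.6):
    """Half-open (start, end) runs >= min_len whose centred windows reach min_oe.

    No GC-content threshold is applied (the equivalent of -minpc 0).
    """
    seq = seq.upper()
    half = window // 2
    flags = [False] * len(seq)
    for start in range(len(seq) - window + 1):
        if cpg_obs_exp(seq[start:start + window]) >= min_oe:
            flags[start + half] = True  # score assigned to the window centre
    islands, run_start = [], None
    for pos, flag in enumerate(flags + [False]):
        if flag and run_start is None:
            run_start = pos
        elif not flag and run_start is not None:
            if pos - run_start >= min_len:
                islands.append((run_start, pos))
            run_start = None
    return islands
```

Adding a GC-content condition (for example requiring a window GC fraction above 0.5) would reproduce the mammalian-style criterion that the text argues is unsuitable for teleosts.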

We investigated the overlap of DSB hotspots with genomic features and their distance to the nearest promoter-like feature (TSS) using Bedtools. For this, we analyzed DNA DSB distribution using DMC1-SSDS read enrichment as the metric. As the coordinates of genomic features were mapped on Omyk_1.0, we converted them to the OmykA_1.1 assembly using the NCBI Remap tool.

We assessed variation in population recombination rate (2 kb scale) as a function of the distance to the nearest TSS (overlapping or not with a CGI) using the distanceToNearest function of the R package GenomicRanges [137]. We retrieved the average recombination rate at each genomic feature (i.e., genes, exons, introns, intergenic regions, TSSs within and outside CGIs, and TEs) using the subsetByOverlaps function of the same package. We compared recombination rates at genomic features in the 5 salmonid populations and in sea bass.
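Conceptually, the distance analysis pairs each 2 kb window with its closest TSS and then averages recombination rates by distance class. The following Python sketch works on one chromosome, using window midpoints and TSS coordinates as plain integers. It computes plain coordinate differences, whereas GenomicRanges' distanceToNearest operates on ranges with its own distance convention; the bin width, distance cap, and function names are illustrative assumptions.

```python
from bisect import bisect_left
from collections import defaultdict


def distance_to_nearest(positions, tss_positions):
    """Distance (bp) from each position to the closest TSS; None if there is no TSS."""
    tss = sorted(tss_positions)
    distances = []
    for pos in positions:
        i = bisect_left(tss, pos)
        candidates = []
        if i < len(tss):
            candidates.append(tss[i] - pos)  # closest TSS at or downstream of pos
        if i > 0:
            candidates.append(pos - tss[i - 1])  # closest TSS upstream of pos
        distances.append(min(candidates) if candidates else None)
    return distances


def mean_rate_by_distance(distances, rates, bin_size=1000, max_distance=50_000):
    """Average recombination rate per distance bin, keyed by bin start (bp)."""
    sums, counts = defaultdict(float), defaultdict(int)
    for dist, rate in zip(distances, rates):
        if dist is None or dist >= max_distance:
            continue
        b = dist // bin_size * bin_size
        sums[b] += rate
        counts[b] += 1
    return {b: sums[b] / counts[b] for b in sorted(sums)}
```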

Lastly, we investigated the effect of SNP density, GC content, and TEs on the population recombination rate variation and the presence of recombination hotspots. We calculated SNP density, TE density, and GC content in non-overlapping 100 kb windows and compared them with the window-averaged recombination rates using the Spearman’s rank correlation test. We assessed the association of hotspots with SNP density and GC content at the 2 kb scale.
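Spearman's rank correlation is the Pearson correlation of the ranks, with tied values given their average rank. The following dependency-free Python sketch computes the coefficient only (no p-value); in practice a statistics library would normally be used, and the function names here are hypothetical.

```python
def average_ranks(values):
    """1-based ranks, with tied values receiving the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rank correlation coefficient between two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5
```

Applied to per-window values (e.g., SNP density and mean recombination rate in 100 kb windows), the coefficient depends only on the ordering of windows, which makes it robust to the skewed distribution of recombination rates.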

Comparison of LD-based landscapes between populations and species

We assessed the correlation between the 100 kb smoothed recombination maps of each pair of the 3 S. salar populations using Spearman's rank test. We identified shared hotspots between populations as overlapping 2 kb hotspots using Bedtools intersect. To compare the recombination hotspots of the O. kisutch and O. mykiss populations, we used a reciprocal BLAST approach to identify homologous regions of the genome in these 2 species (see S1 Methods). We used random permutations to calculate the expected amount of hotspot overlap between pairs of the 3 S. salar populations and between the 2 Oncorhynchus populations. For each population, we drew random spots (as many as 2 kb hotspots) 100 times from the genome using Bedtools shuffle, after applying a genome mask that discarded regions with a nucleotide diversity below the 2.5th or above the 97.5th percentile, the 0.1% highest recombination rate values, the 10% largest gaps, and genuine hotspots, to control for diversity level, extreme values, and genome gaps. We compared each of these random sets to the hotspots of the other population to calculate the average overlap expected by chance alone.
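The permutation procedure can be sketched for a single chromosome as follows. Simplifying assumptions: random spots keep the lengths of the observed hotspots, are placed uniformly at random outside a mask that includes the population's own hotspots (mimicking bedtools shuffle with an exclusion file), and overlap is counted as the fraction of random spots touching at least one hotspot of the other population. Function names are hypothetical.

```python
import random


def overlaps_any(start, end, intervals):
    """True if the half-open interval [start, end) overlaps any (s, e) interval."""
    return any(s < end and start < e for s, e in intervals)


def shuffle_intervals(lengths, chrom_len, excluded, rng, max_tries=10_000):
    """Place intervals of the given lengths uniformly at random outside excluded regions."""
    placed = []
    for length in lengths:
        for _ in range(max_tries):
            start = rng.randrange(0, chrom_len - length + 1)
            if not overlaps_any(start, start + length, excluded):
                placed.append((start, start + length))
                break
        else:
            raise RuntimeError("could not place an interval outside the excluded regions")
    return placed


def expected_overlap(hotspots_a, hotspots_b, chrom_len, mask=(), n_perm=100, seed=1):
    """Mean fraction of randomly placed copies of A's hotspots that overlap B's hotspots."""
    if not hotspots_a:
        return 0.0
    rng = random.Random(seed)
    lengths = [end - start for start, end in hotspots_a]
    excluded = list(mask) + list(hotspots_a)  # genuine hotspots of A are masked too
    fractions = []
    for _ in range(n_perm):
        shuffled = shuffle_intervals(lengths, chrom_len, excluded, rng)
        hits = sum(overlaps_any(s, e, hotspots_b) for s, e in shuffled)
        fractions.append(hits / len(shuffled))
    return sum(fractions) / n_perm
```

The observed overlap between the two populations can then be compared with this chance expectation; a fixed seed keeps the permutations reproducible.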

Identification of DNA motifs at hotspots and motif erosion

In rainbow trout, we performed motif detection analysis at DSB hotspots using the MEME Suite [138], focusing on the RT-52 data set due to its high number of DSB hotspots (DMC1 peaks). We defined 2 subsets of allele-specific hotspots using Bedtools intersect: Allele 1 set [RT-52 DMC1 peaks (center ± 200 bp) overlapping with H3K4me3 and H3K36me3 peaks from TAC-1 (N = 300)] and Allele 2 set [RT-52 DMC1 peaks (center ± 200 bp) overlapping with H3K4me3 and H3K36me3 peaks from TAC-3 (N = 254)]. We used MEME-ChIP [139] to detect motifs. Then, we assessed motif enrichment in DSB hotspots relative to control sequences using FIMO [140] and evaluated central enrichment using CentriMo [141]. We also quantified the fold-enrichment of the detected motifs at LD-based hotspots (see S1 Methods).

In Atlantic salmon, we used STREME from the MEME Suite [142] to find motifs between 10 and 20 bp in length that were enriched at LD-based hotspot positions compared with random control sequences with a similar GC-content distribution. To obtain these controls, we randomly drew windows from the reference genome (10 times as many as there were hotspots), applying the same genome mask as in the previous section to control for diversity, high recombination rates, and genome gaps, and to exclude hotspots. To select a subset of controls matching the GC content of hotspots, we binned the hotspot GC-content distribution (bin width = 0.025) and then drew the same number of random spots for each GC-content bin. We retained the detected motifs as potential PRDM9-binding motifs if they were enriched ≥2-fold at the hotspots compared with the control sequences and were found in ≥5% of hotspots. We searched for motifs associated with the hotspots of each S. salar population and with the hotspots shared between pairs of populations.
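The GC-matched control selection and the motif retention rule can be sketched as below. We assume hotspot and candidate control sequences are already extracted as strings and compute GC content over unambiguous bases only; bins are 0.025 wide as in the text. The function names and the fixed random seed are hypothetical.

```python
import random
from collections import defaultdict


def gc_content(seq):
    """GC fraction over unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    return (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0


def gc_matched_controls(hotspot_seqs, candidate_seqs, bin_width=0.025, seed=1):
    """Draw, per GC bin, as many candidate controls as there are hotspots in that bin."""
    rng = random.Random(seed)
    pool = defaultdict(list)
    for seq in candidate_seqs:
        pool[int(gc_content(seq) / bin_width)].append(seq)
    needed = defaultdict(int)
    for seq in hotspot_seqs:
        needed[int(gc_content(seq) / bin_width)] += 1
    controls = []
    for gc_bin, n in sorted(needed.items()):
        if len(pool[gc_bin]) < n:
            raise ValueError(f"too few candidate controls in GC bin {gc_bin}")
        controls.extend(rng.sample(pool[gc_bin], n))
    return controls


def keep_motif(frac_hotspots, frac_controls, min_fold=2.0, min_frac=0.05):
    """Retention rule from the text: >= 2-fold enrichment and present in >= 5% of hotspots."""
    return frac_hotspots >= min_frac and frac_hotspots >= min_fold * frac_controls
```

Drawing a candidate pool 10 times larger than the hotspot set, as described above, leaves enough sequences in most bins for the matched draw to succeed.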

We then tested whether the candidate motifs showed signs of erosion between lineages by comparing the number of motifs present in the available long-read genome assemblies from 5 North American and 7 European Atlantic salmon genomes. To take into account potential differences in assembly size, we aligned these 12 genomes with SibeliaZ [143] and retrieved collinear blocks that represented 89.5% of the whole genome alignment. We then used FIMO [140] to count motif occurrence in the aligned fraction of each genome, using a p-value cut-off of 1.0E-7. To assess the statistical significance of motif erosion in a given lineage, we obtained a null distribution of the between-lineage difference in motif occurrence by running FIMO on 100 random permutations of the candidate motif matrix.
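The permutation test can be sketched as follows. Each null motif is obtained by shuffling the column order of the position weight matrix; the between-lineage difference in motif counts would then be recomputed for each shuffled matrix (with FIMO in this study; represented here only by precomputed values), and an empirical p-value derived from the resulting null distribution. The +1 correction in the p-value and the function names are our assumptions.

```python
import random


def permuted_motifs(pwm, n_perm=100, seed=1):
    """Return column-shuffled copies of a PWM given as a list of per-position columns."""
    rng = random.Random(seed)
    shuffled = []
    for _ in range(n_perm):
        columns = list(pwm)
        rng.shuffle(columns)
        shuffled.append(columns)
    return shuffled


def empirical_p(observed, null_values):
    """One-sided empirical p-value, P(null >= observed), with a +1 correction."""
    exceed = sum(1 for value in null_values if value >= observed)
    return (exceed + 1) / (len(null_values) + 1)
```

Shuffling columns preserves the motif's length and base composition while destroying its sequence specificity, so the null distribution captures count differences expected from composition alone.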

Supporting information

S1 Methods. Molecular, genomic, and population genetics methods.

(DOCX)

S1 Analysis. Prediction of CGI-associated TSSs in salmonids.

Fig A. Relationship between base composition and DNA methylation level in promoter regions of coho salmon (Oncorhynchus kisutch). Fig B. Relationship between DNA methylation level in the promoter regions of coho salmon and the presence of a nearby fpCGI. Fig C. Relationship between base composition and H3K4me3 in promoter regions of the rainbow trout (Oncorhynchus mykiss). Fig D. Relationship between H3K4me3 in the promoter regions of the rainbow trout and the presence of a nearby fpCGI.

(DOCX)

S1 Table. Chromosome location of the retained PRDM9 paralog copies.

The index identifies the corresponding copy in the phylogeny of the α paralog copies in Fig 1A and of the β copies in S1 Fig. We retrieved the location of the regions covering the 3 domains KRAB, SSXRD, and SET from the BLAST analysis. The start and end positions correspond to the start position of the first exon matched by BLAST and the end position of the last exon matched by BLAST.

(DOCX)

S2 Table. Fine scale variations in recombination rates and raw recombination hotspots.

Summary statistics of the variation in recombination rates smoothed in 2 kb sliding windows, and of recombination hotspots retrieved from the inter-SNP recombination landscapes. Raw hotspots were defined as consecutive inter-SNP intervals with a recombination rate 5-fold higher than the mean rate in the 50 kb flanking regions.

(DOCX)

S3 Table. Overlaps between DSB hotspots and TSS/TES regions.

Overlaps were assessed between 400-bp windows centered on DSB hotspot centers and TSS/TES regions, defined as sequences within 1 kb of the transcription start/end site. Expected overlaps were estimated as the probability that a 2 kb window overlaps a 400 bp DSB hotspot genome-wide.

(DOCX)

S4 Table. Summary statistics of the LD-landscape reconstruction pipeline.

Reference genome sizes, population sample sizes, mapping depth statistics, variant calling statistics, and effective sizes of genomic features for each population.

(DOCX)

S5 Table. Comparison of fine-scale recombination landscapes across vertebrates.

(DOCX)

S6 Table. Presence of Zcwpw1, Zcwpw2, Tex15, and Fbxo47 genes in the genomes of analyzed species.

(DOCX)

S7 Table. List of genotyped samples.

Details about the Salmo salar and Oncorhynchus mykiss samples used in this study and the corresponding Prdm9α genotypes identified.

(DOCX)

S8 Table. List of primers.

The primers used in this study to genotype the zinc finger array of Prdm9α in Salmo salar and Oncorhynchus mykiss.

(DOCX)

S9 Table. List of ChIP-seq experiments performed.

Details about the ChIP-seq experiments performed in Oncorhynchus mykiss testes. In the third column, DMC1-R1 and DMC1-GP refer to the animal used to raise the antibody (rabbit individual 1 and guinea pig, respectively).

(DOCX)

S10 Table. Sample accession number and location.

Population samples of O. kisutch, O. mykiss, and S. salar used to build the linkage disequilibrium-based recombination landscapes were selected from [118–120].

(DOCX)

S11 Table. Program versions.

Details of the program versions used at each step of the reconstruction of LD-based recombination landscapes in the 5 salmonid populations. * The D. labrax data set was taken from [128], who used a reference panel of 22 genomes fully phased-by-transmission using trio-sequencing as a learning reference for the statistical phasing of 46 additional genomes with Eagle2 v2.4. Variants were oriented using whole-genome resequencing data (>20×) from the closely related species Dicentrarchus punctatus, which was used as an outgroup.

(DOCX)

S12 Table. Assembly statistics of the reference genome of O. kisutch, O. mykiss, and S. salar that were used to map the population resequencing data for the LD-based recombination landscapes and the ChIP-Seq DMC1 peaks.

(DOCX)

S13 Table. Mean sequence identity score obtained from the blast search of the 100 kb flanking sequences of the variants in the ingroup species against the reference genome of each outgroup.

Outgroups 1, 2, and 3 for O. kisutch were O. tshawytscha, O. nerka, and O. mykiss, respectively; O. tshawytscha, O. nerka, and O. kisutch for O. mykiss; and Salmo trutta, Salvelinus alpinus, and O. mykiss for S. salar.

(DOCX)

S1 Fig. Phylogenetic distribution of PRDM9β paralogs in 12 salmonids, the northern pike (Esox lucius) and the European sea bass (Dicentrarchus labrax) as outgroup species.

The phylogenetic tree was built from the 3 concatenated exons of the SET domain, with 1,000 bootstrap replicates (values shown at nodes). To facilitate visualization, the branch of D. labrax is not drawn to scale. Columns, from left to right: (i) species; (ii) annotated paralog copy; (iii) Prdm9 copy status. The scale bar is in units of substitutions per site. The right panel shows the coding potential of each paralog and indicates the presence of substitutions at the catalytic tyrosines of the SET domain (Y276, Y341, and Y357). Canonical (full-length) PRDM9 proteins contain 4 key domains: KRAB (encoded by 2 exons), SSXRD (encoded by 1 exon), SET (encoded by 3 exons), and the ZF array (encoded by 1 exon). Complete exons are shown in blue; missing or truncated exons are shown in pink. Other regions of the protein (upstream of the KRAB domain and between KRAB and SSXRD) are encoded by additional exons (not shown here) that are not conserved between the α and β clades. All β copies have lost the KRAB and SSXRD domains and carry substitutions in at least 2 of the 3 catalytic tyrosines of the SET domain. β copies are well conserved across all species (including in the ZF array), which indicates that these truncated PRDM9 homologs are under purifying selection and hence that they are functional. The last column gives indexes referring to S1 Table, which provides additional information on the corresponding copy. The data and codes underlying this figure can be found in https://doi.org/10.5281/zenodo.11083953.

(DOCX)

pbio.3002950.s016.docx (309.4KB, docx)
S2 Fig. Chromosomal positions of the tandemly duplicated PRDM9α a and b copies.

Relative position and orientation of the a (in red) and b (in blue) tandemly duplicated copies of the PRDM9α1.1 and 1.2 paralogs for each species. The name of the chromosome or scaffold on which each copy sits is shown. α1.1 and 1.2, which occur as single copies in some species, are also shown (in gray). The data underlying this figure can be found in S1 Table.

(DOCX)

pbio.3002950.s017.docx (228.5KB, docx)
S3 Fig. Amino acid diversity in full-length and partial PRDM9 zinc fingers in S. salar and O. mykiss.

Amino acid sequences of all unique zinc fingers found in alleles identified in S. salar PRDM9α1.a.2 and α2.2, and in O. mykiss PRDM9α1.a.1 and α2.2 (Figs 2A and S5). Bold colored boxes indicate the 3 hypervariable DNA-binding residues. Cysteine (C) and histidine (H) residues involved in stabilizing the structure of the array are shown in red. Residues that are polymorphic relative to the consensus, outside the 3 amino acids in contact with DNA, are shown in blue. Synonymous variations with respect to the consensus are shaded in gray. Complementary information on the DNA sequences of all identified alleles is available in S1 Methods.

(DOCX)

pbio.3002950.s018.docx (631.7KB, docx)
S4 Fig. Graphical view of PRDM9 paralogs.

Cartoon showing the functional domains of the PRDM9 paralogs analyzed in this study. The amino acid sequences were obtained from the reference genome and analyzed using a previously described methodology [144]. The α1 copies and the O. mykiss α2.2 copy possess a complete KRAB domain, and we refer to these copies as canonical PRDM9. The S. salar α2.2 copy possesses a partial KRAB domain, and we refer to this copy as truncated PRDM9. All 4 copies carry the 3 catalytic tyrosine residues of the SET domain required for methyltransferase activity.

(DOCX)

pbio.3002950.s019.docx (245.6KB, docx)
S5 Fig. PRDM9α2.2 zinc finger allelic diversity in S. salar and O. mykiss.

(A) Structure of the PRDM9 zinc finger arrays of the alleles identified in S. salar PRDM9α2.2 and O. mykiss PRDM9α2.2. Colored boxes represent unique zinc fingers, characterized by the 3 amino acids in contact with DNA (3-letter code). Additional variations relative to a reference sequence are indicated in brackets. A white star indicates zinc fingers missing one amino acid residue (27 a.a. instead of 28). The complete zinc finger amino acid sequences are shown in S3B Fig. (B) Frequencies of the alleles displayed in panel A among the 20 S. salar and 20 O. mykiss individuals genotyped for PRDM9. (C) Distribution of amino acid diversity among all unique zinc fingers found in the alleles displayed in panel A, following a previously described methodology [19]. Amino acid diversity is plotted as a function of amino acid position in the ZF alignment, from position 1 to position 28 (first and last residues) of a ZF unit. The ratio of amino acid diversity at the DNA-binding residues of the ZF array (−1, 2, 3, and 6), indicated as r, is shown in the upper box. The data underlying this figure can be found in https://doi.org/10.5281/zenodo.11083953 and in S7 Table.

(DOCX)

pbio.3002950.s020.docx (169.7KB, docx)
S6 Fig. Meiotic DSB hotspots features in O. mykiss.

(A) Average profile of DMC1 ChIP-seq ssDNA fragment orientation (fragments per million, FPM) in TAC-1, TAC-3, and RT-52 testes at DSB hotspots detected in TAC-1, TAC-3, and RT-52. The profile from each experiment is shown (2 replicates per sample). Signal mapped to the forward strand is shown in blue and signal mapped to the reverse strand in green, as illustrated in the cartoon at the top of the panel. (B) UpSet plot showing intersections between DSB hotspots from TAC-1 (n = 616), TAC-3 (n = 209), and RT-52 (n = 1,924). (C) DMC1 ChIP-seq signal fold enrichment (scaled by the average signal in intergenic regions) at multiple genomic features. TSSs inside and outside CGIs are highlighted in purple and turquoise, respectively. The data and codes underlying this figure can be found in https://doi.org/10.5281/zenodo.11083953 and https://zenodo.org/records/14198863.

(DOCX)

pbio.3002950.s021.docx (219.7KB, docx)
S7 Fig. Distribution of DSB hotspots along chromosomes in O. mykiss.

(A) Distribution of DSB hotspots from TAC-1, TAC-3, and RT-52 along chromosomes (bins of 1/30 of chromosome length). (B) Average profile and heatmap of DMC1 ChIP-seq ssDNA fragment orientation (fragments per million, FPM) in TAC-1, TAC-3, and RT-52 testes at DSB hotspots shared by pairs of samples. Shared DMC1 peaks: TAC-1 intersecting RT-52 (n = 167), TAC-1 intersecting TAC-3 (n = 55), and RT-52 intersecting TAC-3 (n = 42). The plots show one replicate per experiment (replicate 1). Signal mapped to the forward strand is shown in blue and signal mapped to the reverse strand in green. The data and codes underlying this figure can be found in https://doi.org/10.5281/zenodo.11083953 and https://zenodo.org/records/14198863.

(DOCX)

pbio.3002950.s022.docx (565.8KB, docx)
S8 Fig. Histone modification signal at H3K4me3 peaks and at DSB hotspots.

(A) Average profile and heatmap of H3K36me3 ChIP-seq signal in TAC-1 (blue) and TAC-3 (green) testes, at H3K4me3 peaks detected in brain (Aqua-FAANG). (B) Average profile and heatmap of H3K4me3 (left) and H3K36me3 (right) ChIP-seq signal in TAC-1 testes, at DSB hotspots detected in TAC-1 (blue), TAC-3 (cyan), and RT-52 (yellow). (C) Average profile and heatmap of H3K4me3 (left) and H3K36me3 (right) ChIP-seq signal in TAC-3 testes, at DSB hotspots detected in TAC-1 (blue), TAC-3 (cyan), and RT-52 (yellow). The data underlying this figure can be found in https://doi.org/10.5281/zenodo.11083953.

(DOCX)

pbio.3002950.s023.docx (1.9MB, docx)
S9 Fig. Correlation of H3K4me3 and H3K36me3 signal at RT-52 hotspots.

(A) Scatterplots showing H3K4me3 and H3K36me3 ChIP-seq signal in TAC-1 and TAC-3 testes at RT-52 DSB hotspots. (B) Left panels: scatterplots of H3K4me3 (top) or H3K36me3 (bottom) ChIP-seq signal in TAC-1 and TAC-3 testes at RT-52 DSB hotspots. Right panels: numbers of RT-52 hotspots with H3K4me3 (top) or H3K36me3 (bottom) ChIP-seq signal below or above 1 in TAC-1 and TAC-3 (chi-square test of homogeneity). The data underlying this figure can be found in https://doi.org/10.5281/zenodo.11083953.

(DOCX)

pbio.3002950.s024.docx (149.1KB, docx)
S10 Fig. Broad scale recombination rate variations along the genome.

Recombination rates were averaged into percentiles of chromosome length and scaled by the genomic mean for (A) O. kisutch (in orange), O. mykiss (in green), and S. salar (in blue, only the NS population is shown); and (B) D. labrax (in red) and the three-spined stickleback Gasterosteus aculeatus (in black, data from [61]). The data and codes underlying this figure can be found in https://doi.org/10.5281/zenodo.11083953.

(DOCX)

pbio.3002950.s025.docx (239.6KB, docx)
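The percentile averaging used in S10 Fig (rates binned by relative position along each chromosome, then scaled by the genomic mean) can be sketched as below. This is a minimal illustration, not the authors' Zenodo script: the function name `percentile_profile`, the input layout, and the choice of the unweighted mean of all input rates as the "genomic mean" are assumptions.

```python
from statistics import mean


def percentile_profile(chromosomes, n_bins=100):
    """Average recombination rates into percentiles of chromosome length,
    pooling all chromosomes, then scale each bin by the genomic mean.

    chromosomes: dict mapping name -> (length_bp, [(position_bp, rate), ...])
    (hypothetical input layout). Returns n_bins scaled values; bins with no
    data are None.
    """
    bins = [[] for _ in range(n_bins)]
    all_rates = []
    for length, points in chromosomes.values():
        for pos, rate in points:
            # Relative position along the chromosome mapped to a percentile bin.
            idx = min(int(pos / length * n_bins), n_bins - 1)
            bins[idx].append(rate)
            all_rates.append(rate)
    # Assumption: genomic mean = unweighted mean of all input rates.
    genome_mean = mean(all_rates)
    return [mean(b) / genome_mean if b else None for b in bins]
```

Pooling positions as fractions of chromosome length lets chromosomes of different sizes contribute to the same profile, which is what makes chromosome-end effects comparable across species.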
S11 Fig. Fine-scale recombination landscapes of O. kisutch (in orange), O. mykiss (in green), and S. salar (in blue, only the NS population is shown), with recombination rates smoothed in 2 kb sliding windows.

The data and codes underlying this figure can be found in https://doi.org/10.5281/zenodo.11083953.

(DOCX)

pbio.3002950.s026.docx (269.3KB, docx)
S12 Fig. Proportion of recombination as a function of the proportion of the genome for O. kisutch (orange), O. mykiss (green), S. salar (shades of blue), and D. labrax (gold).

The data and codes underlying this figure can be found in https://doi.org/10.5281/zenodo.11083953.

(DOCX)

pbio.3002950.s027.docx (132.9KB, docx)
S13 Fig. Proportion of hotspots (in %) according to raw hotspot size.

Hotspots were defined as runs of consecutive windows of 2 adjacent SNPs in which the recombination rate is at least 5-fold higher than in the 50 kb flanking regions. The data and codes underlying this figure can be found in https://doi.org/10.5281/zenodo.11083953.

(DOCX)

pbio.3002950.s028.docx (120.5KB, docx)
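The hotspot definition in S13 Fig (SNP intervals at least 5-fold above the 50 kb flanking rate, merged when consecutive) can be illustrated with the following sketch. It is not the authors' implementation: the function name is hypothetical, and computing the background as the length-weighted mean rate of the flanking intervals (excluding the focal interval) is an assumption. The quadratic loop favors readability over speed.

```python
def call_hotspots(positions, rates, flank=50_000, fold=5.0):
    """Flag SNP intervals whose rate is >= `fold` x the flanking background,
    then merge consecutive flagged intervals into hotspots.

    positions: sorted SNP positions (bp), length n
    rates: per-bp rate of each interval (positions[i], positions[i + 1]), length n - 1
    Returns a list of (start_bp, end_bp) hotspots.
    """
    n_int = len(rates)
    flagged = []
    for i in range(n_int):
        start, end = positions[i], positions[i + 1]
        total_len = 0.0
        weighted = 0.0
        for j in range(n_int):
            if j == i:
                continue
            s, e = positions[j], positions[j + 1]
            # Overlap of interval j with the left and right flanking regions.
            for a, b in ((start - flank, start), (end, end + flank)):
                ov = min(e, b) - max(s, a)
                if ov > 0:
                    total_len += ov
                    weighted += ov * rates[j]
        background = weighted / total_len if total_len else 0.0
        flagged.append(background > 0 and rates[i] >= fold * background)

    # Merge runs of consecutive flagged intervals (they share an endpoint SNP).
    hotspots = []
    for i, is_hot in enumerate(flagged):
        if not is_hot:
            continue
        if hotspots and hotspots[-1][1] == positions[i]:
            hotspots[-1] = (hotspots[-1][0], positions[i + 1])
        else:
            hotspots.append((positions[i], positions[i + 1]))
    return hotspots
```

For example, with SNPs every 1 kb, a background rate of 1.0, and two adjacent intervals at rate 20.0 starting at 50 kb, the sketch returns a single merged hotspot spanning 50 to 52 kb.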
S14 Fig. Comparison between the LD-based recombination landscape and the ChIP-Seq DMC1 map of the rainbow trout O. mykiss.

(A) Venn diagram showing the percentage of shared peaks between the ChIP-Seq peaks of the pooled samples (in brown) and the LD-based hotspots (in green). The percentage was calculated using the number of DMC1 peaks as the denominator. (B) Random expected (blue) and observed (orange) values of shared peaks between the LD and ChIP-Seq maps. (C) Recombination rates at the syntenic locations of the ChIP-Seq peaks, in the LD hotspots, in the shared ChIP-Seq and LD windows (i.e., the 116 ChIP-Seq peaks shared with LD hotspots), and in the background landscape (i.e., the genomic windows containing neither an LD hotspot nor a ChIP-Seq peak). The data and codes underlying this figure can be found in https://doi.org/10.5281/zenodo.11083953.

(DOCX)

pbio.3002950.s029.docx (85.9KB, docx)
S15 Fig. Broad scale variation in genomic variables according to recombination rates.

(A) SNP density (per kb). (B) GC content. (C) TE density (per 100 kb). Recombination rates, SNP density, GC content, and TE density were averaged in 100 kb sliding windows. Significant Spearman's rank correlations (p < 0.05) are indicated by an asterisk. The vertical dashed line is the mean recombination rate and the horizontal dashed line is the mean of the y variable. The data and codes underlying this figure can be found in https://doi.org/10.5281/zenodo.11083953.

(DOCX)

pbio.3002950.s030.docx (637.6KB, docx)
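The window averaging and rank correlation behind S15 Fig can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: it uses non-overlapping 100 kb windows (a simplification of the sliding windows in the caption), the function names are hypothetical, and it returns Spearman's rho only, without the p-value.

```python
def window_means(positions, values, window=100_000):
    """Average values in non-overlapping windows of `window` bp,
    keyed by window index (position // window)."""
    sums, counts = {}, {}
    for pos, val in zip(positions, values):
        w = pos // window
        sums[w] = sums.get(w, 0.0) + val
        counts[w] = counts.get(w, 0) + 1
    return {w: sums[w] / counts[w] for w in sorted(sums)}


def average_ranks(values):
    """1-based ranks; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman_rho(x, y):
    """Spearman's rho = Pearson correlation of the (tie-averaged) ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

A rank-based coefficient suits these data because recombination rates are strongly right-skewed, so a few hot windows would dominate a Pearson correlation on raw values.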
S16 Fig. Genetic diversity and base composition at recombination hotspots.

(A) SNP density (per kb) and (B) GC content as a function of distance to the nearest recombination hotspot. SNP density, GC content, and recombination rates were averaged in 2 kb windows. Colored (orange, green, and blue) dashed lines show the mean of the y variable at hotspots in the corresponding populations; the black dashed line is the genomic mean (outside hotspots). Loess curves are shown for a span of 0.5. The data and codes underlying this figure can be found in https://doi.org/10.5281/zenodo.11083953.

(DOCX)

pbio.3002950.s031.docx (238.1KB, docx)
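Profiles such as those in S16 Fig (a variable averaged as a function of distance to the nearest hotspot) can be computed along the lines of this sketch. The function names, the 2 kb distance bins, and the 50 kb cutoff are hypothetical choices; the loess smoothing shown in the figure is not reproduced here.

```python
from bisect import bisect_left


def distance_to_nearest(pos, hotspot_centers):
    """Distance (bp) from pos to the closest hotspot center.
    hotspot_centers must be a non-empty, sorted list."""
    if not hotspot_centers:
        raise ValueError("need at least one hotspot")
    i = bisect_left(hotspot_centers, pos)
    candidates = []
    if i < len(hotspot_centers):
        candidates.append(hotspot_centers[i] - pos)
    if i > 0:
        candidates.append(pos - hotspot_centers[i - 1])
    return min(candidates)


def profile_by_distance(windows, hotspot_centers, bin_size=2000, max_dist=50_000):
    """Mean value per distance bin. windows: list of (center_bp, value).
    Keys of the result are the lower bound of each distance bin (bp)."""
    sums, counts = {}, {}
    for center, value in windows:
        d = distance_to_nearest(center, hotspot_centers)
        if d > max_dist:
            continue
        b = d // bin_size
        sums[b] = sums.get(b, 0.0) + value
        counts[b] = counts.get(b, 0) + 1
    return {b * bin_size: sums[b] / counts[b] for b in sorted(sums)}
```

Binary search keeps each lookup logarithmic in the number of hotspots, which matters when tens of thousands of 2 kb windows are profiled per genome.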
S17 Fig. Average recombination rates in TE families.

Tc1-mariner, a superfamily of DNA transposons, is shown. The data and codes underlying this figure can be found in https://doi.org/10.5281/zenodo.11083953.

(DOCX)

pbio.3002950.s032.docx (106.7KB, docx)
S18 Fig. Inter- and intra-chromosome variation in recombination rates in residual tetraploid chromosomes.

(A) Average recombination rate per chromosome. Gray bars indicate chromosomes without residual tetrasomic regions (2N), and yellow bars indicate chromosomes with residual tetrasomic regions (4N), in the 5 Oncorhynchus and Salmo populations. Gray dashed lines represent the average recombination rate of the 2N chromosomes, and the yellow line that of the 4N chromosomes. Recombination rates are significantly higher in 4N than in 2N chromosomes in the O. mykiss population (Student's t test, t(13.258) = −3.9404, p < 0.05) and in the S. salar BS population (t(20.99) = −2.2572, p < 0.05), but not in O. kisutch (t(25.867) = −0.88786, p > 0.05) or in the S. salar GP (t(23.519) = −0.0026857, p > 0.05) and NS (t(19.677) = −1.9741, p = 0.06258) populations. (B) Recombination rates along the genome, averaged into percentiles of chromosome length and scaled by the genomic mean. Colors as in panel A. The data and codes underlying this figure can be found in https://doi.org/10.5281/zenodo.11083953.

(DOCX)

pbio.3002950.s033.docx (464.1KB, docx)
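The non-integer degrees of freedom reported in S18 Fig (e.g., t(13.258)) are what Welch's unequal-variance form of the t test produces; the equal-variance Student test would give integer df (n1 + n2 − 2). R's `t.test()` uses the Welch form by default. Assuming, without confirmation from the caption, that this is the variant used, the statistic and its Welch-Satterthwaite df can be sketched as follows (the function name is hypothetical, and no p-value is computed):

```python
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom
    for two samples, e.g. per-chromosome rates of 2N vs 4N chromosomes."""
    va = variance(a) / len(a)  # squared standard error of mean(a)
    vb = variance(b) / len(b)  # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / (va + vb) ** 0.5
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df
```

A negative t, as in the caption, means the first sample (here, 2N chromosomes) has the lower mean.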
S19 Fig. Recombination rates at genomic features in residual tetraploid chromosomes.

(A) Fold recombination rates (scaled by the average recombination rate at 50 kb from the nearest feature) as a function of distance to the nearest promoter-like feature (i.e., a TSS overlapping or not overlapping a CGI). Recombination rates in chromosomes without residual tetraploid regions (2N) are shown as continuous lines, and those in 4N chromosomes as dashed lines. (B) Fold recombination rates (scaled by the average recombination rate in intergenic regions) in genomic features, in 2N (gray) and 4N (yellow) chromosomes. The horizontal line shows the intergenic recombination level. TSS and TES were defined as the first and last positions of genes. CGIs were mapped with EMBOSS using CpGoe > 0.6 and GC > 0. The data and codes underlying this figure can be found in https://doi.org/10.5281/zenodo.11083953.

(DOCX)

pbio.3002950.s034.docx (244.2KB, docx)
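The CpG-island criteria cited in S19 Fig combine a CpG observed/expected ratio with a GC threshold. The sketch below applies both criteria in sliding windows using the usual definition obs/exp = CpG count × window length / (C count × G count). It is not EMBOSS: the function names, the 200 bp window, and the `gc_min=0.5` default (the conventional Gardiner-Garden and Frommer value) are assumptions, and the GC threshold is left as a parameter.

```python
def cpg_stats(seq):
    """Return (GC fraction, CpG observed/expected ratio) for a DNA sequence."""
    seq = seq.upper()
    n = len(seq)
    c, g = seq.count("C"), seq.count("G")
    cpg = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    gc = (c + g) / n if n else 0.0
    oe = cpg * n / (c * g) if c and g else 0.0
    return gc, oe


def cgi_windows(seq, window=200, step=1, oe_min=0.6, gc_min=0.5):
    """Start positions of windows satisfying both CpG-island criteria."""
    hits = []
    for start in range(0, len(seq) - window + 1, step):
        gc, oe = cpg_stats(seq[start:start + window])
        if oe > oe_min and gc > gc_min:
            hits.append(start)
    return hits
```

Overlapping passing windows would then be merged into islands; that step is omitted to keep the example short.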
S20 Fig. Significance of hotspots sharing between closely related populations.

Random expectations (blue) and observed values (orange) of shared hotspots between (A) O. kisutch and O. mykiss; between S. salar populations (B) GP and BS and (C) GP and NS; and between (D) BS and NS. Shared hotspots were defined as 2 kb hotspots overlapping by at least 1 bp. The percentage shared is calculated using the number of hotspots in the species or population with fewer hotspots as the denominator. The expected distribution of shared hotspots was obtained from 1,000 pairwise comparisons of random spots. The data and codes underlying this figure can be found in https://doi.org/10.5281/zenodo.11083953.

(DOCX)

pbio.3002950.s035.docx (134KB, docx)
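The sharing test in S20 Fig (observed overlap versus overlaps among random spots, with the smaller set as denominator) can be sketched as a simple permutation procedure. Assumptions not stated in the caption: random 2 kb spots are placed uniformly on a single sequence of length `chrom_len`, each replicate draws as many spots as there are hotspots in each set, and the empirical p-value uses the common (1 + count) / (n + 1) correction. Function names are hypothetical.

```python
import random


def count_shared(a, b):
    """Number of half-open (start, end) intervals in `a` overlapping
    (by >= 1 bp) at least one interval in `b`."""
    return sum(1 for s, e in a if any(bs < e and s < be for bs, be in b))


def percent_shared(a, b):
    """Percent shared, using the set with fewer hotspots as denominator."""
    smaller, larger = (a, b) if len(a) <= len(b) else (b, a)
    return 100.0 * count_shared(smaller, larger) / len(smaller)


def random_spots(n, chrom_len, size, rng):
    starts = [rng.randrange(0, chrom_len - size + 1) for _ in range(n)]
    return [(s, s + size) for s in starts]


def permutation_null(n_a, n_b, chrom_len, n_iter=1000, size=2000, seed=1):
    """Null distribution of percent shared between two random spot sets."""
    rng = random.Random(seed)
    return [
        percent_shared(random_spots(n_a, chrom_len, size, rng),
                       random_spots(n_b, chrom_len, size, rng))
        for _ in range(n_iter)
    ]


def empirical_p(observed, null):
    """One-sided empirical p-value: P(null >= observed)."""
    return (1 + sum(v >= observed for v in null)) / (len(null) + 1)
```

Dividing by the smaller set caps the statistic at 100% regardless of how unequal the two hotspot counts are.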
S21 Fig. Pairwise comparison of 100 kb smoothed recombination maps between S. salar populations.

(A) Comparison between GP and BS populations. (B) Comparison between GP and NS populations. (C) Comparison between BS and NS populations. Spearman’s rank test p-value <0.05. Loess curves are shown for a span of 0.7. The data and codes underlying this figure can be found in https://doi.org/10.5281/zenodo.11083953.

(DOCX)

pbio.3002950.s036.docx (328KB, docx)
S22 Fig. DSB and LD hotspots are enriched in PRDM9 allele-specific motifs.

Frequency of sequences with at least one hit for PRDM9 allele 1 (left) and allele 2 (right) motifs at allele 1 and allele 2 sites, RT-52, TAC-1 and TAC-3 DSB hotspots, LD-hotspots and control sites. Fold enrichment relative to the control sites is shown on top of each column. The associated p-values indicate significant differences in fold enrichment relative to the control (Fisher exact test). “NS” indicates not significant (p > 0.05). The data and codes underlying this figure can be found in https://doi.org/10.5281/zenodo.11083953.

(DOCX)

pbio.3002950.s037.docx (119.8KB, docx)
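The fold enrichment and Fisher's exact test in S22 Fig compare the fraction of sequences with at least one motif hit in a test set against the control set. A minimal sketch follows; the caption does not say whether the test was one- or two-sided, so the one-sided (enrichment) form here is an assumption, as are the function names.

```python
from math import comb


def fold_enrichment(hits_test, n_test, hits_ctrl, n_ctrl):
    """Ratio of the fraction of test sequences with a hit to the
    fraction of control sequences with a hit."""
    return (hits_test * n_ctrl) / (hits_ctrl * n_test)


def fisher_greater(a, b, c, d):
    """One-sided Fisher's exact p-value for enrichment in the 2x2 table
    [[a, b], [c, d]]: rows = test, control; columns = with hit, without hit.
    Sums hypergeometric probabilities of tables at least as extreme as `a`."""
    row1, col1, n = a + b, a + c, a + b + c + d
    numerator = sum(comb(row1, x) * comb(n - row1, col1 - x)
                    for x in range(a, min(row1, col1) + 1))
    return numerator / comb(n, col1)  # exact integer arithmetic until here
```

For example, 300 of 1,000 hotspots with a hit versus 100 of 1,000 control sites gives `fold_enrichment(300, 1000, 100, 1000) == 3.0` and `fisher_greater(300, 700, 100, 900)` as the corresponding one-sided p-value.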
S23 Fig. DSB and LD-based hotspots are enriched in PRDM9 allele-specific motifs.

Positional distribution of hits for Prdm9 allele 1 (pink) and allele 2 (green) motifs in RT-52, TAC-1, and TAC-3 DSB hotspots, the 5,000 strongest LD hotspots, and control sites (n = 5,000). The distribution is shown from the center of the sequence over a range of ±2.5 kb for the DSB hotspots and the control sites. The LD hotspots were centered on the SNP interval showing the highest recombination rate (ρ/bp), and the distribution extends up to 7.5 kb from the refined center. The signal is smoothed by a weighted moving average, and hits were counted in 750 bp windows for the LD hotspots and in 250 bp windows for all other sequences. The statistical significance of motif enrichment, adjusted for multiple testing, is shown (one-tailed binomial test). “ns” indicates non-significant enrichment (p > 0.05). The data underlying this figure can be found in https://doi.org/10.5281/zenodo.11083953.

(DOCX)

pbio.3002950.s038.docx (583.5KB, docx)
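One plausible form of the one-tailed binomial test in S23 Fig asks whether motif hits are over-represented in the central window relative to a uniform distribution of hits across the ±2.5 kb range. The uniform null, the function names, and the Bonferroni correction are assumptions: the caption states only that p-values were adjusted for multiple tests. The binomial tail is computed in log space so that large hit counts do not overflow.

```python
from math import exp, lgamma, log


def binom_logpmf(x, n, p):
    """log P(X = x) for X ~ Binomial(n, p), with 0 < p < 1."""
    return (lgamma(n + 1) - lgamma(x + 1) - lgamma(n - x + 1)
            + x * log(p) + (n - x) * log(1 - p))


def binom_sf(k, n, p):
    """Upper-tail probability P(X >= k): the one-tailed p-value."""
    return min(1.0, sum(exp(binom_logpmf(x, n, p)) for x in range(k, n + 1)))


def central_enrichment_p(hit_positions, half_range=2500, window=250):
    """Test for an excess of hits (positions relative to the center, bp)
    within the central `window` bp, assuming hits are uniform over
    [-half_range, half_range] under the null."""
    in_range = [x for x in hit_positions if -half_range <= x <= half_range]
    k = sum(1 for x in in_range if abs(x) <= window / 2)
    p0 = window / (2 * half_range)
    return binom_sf(k, len(in_range), p0)


def bonferroni(pvalues):
    """Bonferroni adjustment, capped at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]
```

With the 250 bp window and ±2.5 kb range from the caption, the null probability of a hit falling in the central window is 250/5,000 = 0.05.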
S24 Fig. Motifs enrichment at population-specific and shared recombination hotspots in S. salar populations.

(A) Average recombination rate in motifs found enriched at hotspots. Yellow boxes show motifs present in at least 5% of hotspots and showing 2-fold enrichment relative to the control set of random spots. (B) Average recombination rate in hotspots containing the retained motifs from panel A (with the corresponding motifs shown) compared with hotspots not containing them. Significant Student's t tests are indicated (***, p < 0.05). (C) Percentage of hotspots containing the retained motifs (shown in yellow). The data and codes underlying this figure can be found in https://doi.org/10.5281/zenodo.11083953.

(DOCX)

pbio.3002950.s039.docx (279.9KB, docx)
S25 Fig. Genome wide distribution of PRDM9α motifs along chromosomes in O. mykiss.

(A) Distribution of PRDM9 allele 1 (n = 68,047) and allele 2 (n = 59,986) motifs along chromosomes in the rainbow trout genome (bins of 1/30 of chromosome length). (B) Distribution of motifs enriched in the hotspots shared between the BS and NS populations of Atlantic salmon (n = 936).

(DOCX)

pbio.3002950.s040.docx (201.6KB, docx)
S26 Fig. Histology and immunostaining of trout gonads.

(A) Hematoxylin-eosin-stained histological sections of testes from the O. mykiss samples used in this study. In TAC-1, TAC-3, and RT-52, the seminiferous tubules were filled with round cells, mostly primary spermatocytes (Sc), some spermatids (ST), and few spermatogonia (Sg), with almost no visible mature spermatozoa (Sz). For meiotic cells, the prophase I substage is indicated in brackets: leptotene (L), zygotene (Z), diplotene (D), and diakinesis (DK). Scale bars are 20 μm. (B) Immunofluorescence of SYCP3, SMC3, and DMC1 in testis sections from a stage III O. mykiss sample not used for ChIP in this study. Scale bars are 10 μm.

(DOCX)

pbio.3002950.s041.docx (2.5MB, docx)
S27 Fig. Sample location.

The 20 O. kisutch individuals were sampled in the Columbia River (in orange) [118], the 22 O. mykiss samples come from North American rivers (in green) [119], and the 60 S. salar individuals were sampled in Canada and Norway [120]. Based on population structure analysis, we subdivided the Atlantic salmon samples into 3 populations (in shades of blue): Gaspesie-Anticosti (GP), Barents Sea (BS), and North Sea (NS). The basemap shapefile used in this figure was derived from the CIA World DataBank II, accessed via the mapdata package in R.

(DOCX)

pbio.3002950.s042.docx (225.5KB, docx)
S28 Fig. Pairwise correlation between the 5 independent runs of LDhelmet.

Spearman’s rank correlation matrix for the 5 populations, p-value <0.05. The data and codes underlying this figure can be found in https://doi.org/10.5281/zenodo.11083953.

(DOCX)

pbio.3002950.s043.docx (127KB, docx)
S29 Fig. Patterns of population recombination rate variation controlled for sequencing coverage.

(A) Average population recombination rate, (B) average hotspot density, and (C) average population recombination rate in hotspots, according to average sequencing coverage. Student's t test p-values and Cohen's d are shown in A, B, and C. (D) Fold recombination rates (scaled by the average recombination rate at 50 kb from the nearest feature) according to the distance to the nearest TSS (overlapping or not with a CGI), shown in color, and to the mean depth (shown by the line type). (E) Fold recombination rates (scaled by the average recombination rates in intergenic regions) and (F) hotspot density at the indicated genomic features, according to the mean coverage shown in color. TSS and TES were defined as the first and last positions of each gene. CGIs were mapped using EMBOSS with CpGoe > 0.6 and GC > 0. “High” sequencing coverage corresponds to the half of the recombination map with the highest depth and “low” sequencing coverage to the half with the lowest depth. Only the NS population of S. salar is shown. The data and codes underlying this figure can be found in https://doi.org/10.5281/zenodo.11083953.

(DOCX)

pbio.3002950.s044.docx (314.9KB, docx)
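The coverage control in S29 Fig splits the recombination map into low- and high-depth halves and reports an effect size alongside the t test. A minimal sketch follows, assuming the split is made over windows ranked by mean depth and that Cohen's d uses the pooled standard deviation; the window-level split and the function names are assumptions.

```python
from statistics import mean, variance


def split_by_depth(windows):
    """Split (depth, value) pairs into low- and high-depth halves
    (median split on depth; an odd middle window goes to 'high')."""
    ordered = sorted(windows, key=lambda w: w[0])
    half = len(ordered) // 2
    low = [v for _, v in ordered[:half]]
    high = [v for _, v in ordered[half:]]
    return low, high


def cohens_d(a, b):
    """Cohen's d: difference in means divided by the pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / pooled_var ** 0.5
```

Reporting d next to the p-value matters here because, with many windows, even a negligible coverage effect can reach statistical significance.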
S30 Fig. Hotspot sharing between populations controlled for sequencing coverage.

Fold recombination rates around hotspots and at orthologous loci in the 2 taxa, for the 2 Oncorhynchus species (A and B), the American (GP population) and European (BS and NS populations) S. salar lineages (D and E), and between the 2 closely related European S. salar populations (BS and NS) (G and H), according to the mean coverage shown in panels. Random expectations (blue) and observed values (orange) of shared hotspots between (C) O. kisutch and O. mykiss; between S. salar populations (F) GP and BS; (I) BS and NS, according to the mean coverage shown in panels. Shared hotspots were defined as 2 kb hotspots overlapping by at least 1 bp. Percent shared is calculated using the number of hotspots in the population with fewer hotspots as the denominator. The expected distribution of shared hotspots has been obtained from 1,000 pairwise comparisons of random spots. “High” sequencing coverage corresponds to the half of the recombination map with the highest depth and “low” sequencing coverage corresponds to the half of the recombination map with the lowest depth. The data and codes underlying this figure can be found in https://doi.org/10.5281/zenodo.11083953.

(DOCX)

pbio.3002950.s045.docx (301.7KB, docx)

Acknowledgments

We thank the European project AQUA-FAANG for sharing epigenetic data of O. mykiss and S. salar prior to publication. These data are now available at ENA (PRJEB57956 and PRJEB55063). We thank the Aqua Genome project for providing published wild Atlantic salmon genome sequencing data. We particularly thank Sigbjørn Lien, Marie-Odile Baudement, and Yann Guiguen for helpful discussions, and Gareth Gillard for bioinformatic analysis of AQUA-FAANG data. We also thank Guillaume Evanno for providing wild Atlantic salmon samples, and Rajalekshmi Navaryana Sarma for help with Prdm9 genotyping. We are grateful to Ben Coop and Eric Rondeau for providing us with the re-sequenced genomes and genotype data set of coho salmon. We thank Louis Bernatchez, Eric Normandeau, and Maeva Leitwein for providing the whole-genome bisulfite sequencing methylation data of coho salmon. We also thank Thomas Brazier, Nicolas Lartillot, and Carina Mugal for helpful discussions and feedback.

Abbreviations

BS: Barents Sea
ChIP: chromatin immunoprecipitation
CO: crossover
DSB: double-strand break
GD: gene duplication
GP: Gaspesie Peninsula
IDR: irreproducible discovery rate
LD: linkage disequilibrium
MAC: minor allele count
NS: North Sea
SD: segmental duplication
SNP: single-nucleotide polymorphism
SRA: Sequence Read Archive
TE: transposable element
TES: transcription end site
TSS: transcription start site
WGD: whole genome duplication
ZF: zinc finger

Data Availability

Sequencing data from ChIP-seq and called peaks have been deposited in the Gene Expression Omnibus (GEO) under accession GSE277449, as part of BioProject PRJNA1162462. Bioinformatic scripts as well as processed data sets (VCF files, LD-based recombination maps and hotspots, PRDM9 protein sequences and multiple alignments, ChIP-Seq fragments and peak coordinates) are available on Zenodo (https://doi.org/10.5281/zenodo.11083953). Scripts for ChIP-seq data analysis are available at https://zenodo.org/records/14198856 (SSDS pipeline), https://zenodo.org/records/14198863 (SSDS-extra pipeline), and https://zenodo.org/records/3966161 (Nextflow ChIP-seq pipeline).

Funding Statement

This project was funded by CNRS (Centre national de la recherche scientifique; https://www.cnrs.fr) and by ANR (Agence nationale de la recherche; https://anr.fr) (HotRec ANR-19-CE12-0019) to LD, NG and BdM. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

  • 1.Hunter N. Meiotic Recombination: The Essence of Heredity. Cold Spring Harb Perspect Biol. 2015;7(12). doi: 10.1101/cshperspect.a016618 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 2.Zickler D, Kleckner N. Meiosis: Dances Between Homologs. Annu Rev Genet. 2023;57:1–63. doi: 10.1146/annurev-genet-061323-044915 [DOI] [PubMed] [Google Scholar]
  • 3.Nagaoka SI, Hassold TJ, Hunt PA. Human aneuploidy: mechanisms and new insights into an age-old problem. Nat Rev Genet. 2012;13(7):493–504. doi: 10.1038/nrg3245 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 4.Coop G, Przeworski M. An evolutionary view of human recombination. Nat Rev Genet. 2007;8(1):23–34. doi: 10.1038/nrg1947 [DOI] [PubMed] [Google Scholar]
  • 5.Otto SP, Lenormand T. Resolving the paradox of sex and recombination. Nat Rev Genet. 2002;3(4):252–61. doi: 10.1038/nrg761 [DOI] [PubMed] [Google Scholar]
  • 6.Charlesworth B, Morgan MT, Charlesworth D. The effect of deleterious mutations on neutral molecular variation. Genetics. 1993;134(4):1289–303. doi: 10.1093/genetics/134.4.1289 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 7.Ritz KR, Noor MAF, Singh ND. Variation in Recombination Rate: Adaptive or Not? Trends Genet. 2017;33(5):364–74. doi: 10.1016/j.tig.2017.03.003 [DOI] [PubMed] [Google Scholar]
  • 8.Smith JM, Haigh J. The hitch-hiking effect of a favourable gene. Genet Res. 1974;23(1):23–35. [PubMed] [Google Scholar]
  • 9.Pazhayam NM, Turcotte CA, Sekelsky J. Meiotic Crossover Patterning. Front Cell Dev Biol. 2021;9:681123. doi: 10.3389/fcell.2021.681123 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 10.Penalba JV, Wolf JBW. From molecules to populations: appreciating and estimating recombination rate variation. Nat Rev Genet. 2020;21(8):476–92. doi: 10.1038/s41576-020-0240-1 [DOI] [PubMed] [Google Scholar]
  • 11.Stapley J, Feulner PGD, Johnston SE, Santure AW, Smadja CM. Variation in recombination frequency and distribution across eukaryotes: patterns and processes. Philos Trans R Soc Lond B Biol Sci. 2017;372(1736). doi: 10.1098/rstb.2016.0455 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 12.Haenel Q, Laurentino TG, Roesti M, Berner D. Meta-analysis of chromosome-scale crossover rate variation in eukaryotes and its significance to evolutionary genomics. Mol Ecol. 2018;27(11):2477–97. doi: 10.1111/mec.14699 [DOI] [PubMed] [Google Scholar]
  • 13.Chan AH, Jenkins PA, Song YS. Genome-Wide Fine-Scale Recombination Rate Variation in Drosophila melanogaster. PLoS Genet. 2012;8(12):e1003090. doi: 10.1371/journal.pgen.1003090 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 14.Kaur T, Rockman MV. Crossover heterogeneity in the absence of hotspots in Caenorhabditis elegans. Genetics. 2014;196(1):137–48. doi: 10.1534/genetics.113.158857 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 15.Wallberg A, Glemin S, Webster MT. Extreme Recombination Frequencies Shape Genome Variation and Evolution in the Honeybee, Apis mellifera. PLoS Genet. 2015;11(4):e1005189. doi: 10.1371/journal.pgen.1005189 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 16.Jeffreys AJ, Holloway JK, Kauppi L, May CA, Neumann R, Slingsby MT, et al. Meiotic recombination hot spots and human DNA diversity. Philos Trans R Soc Lond B Biol Sci. 2004;359(1441):141–52. doi: 10.1098/rstb.2003.1372 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 17.McVean GA, Myers SR, Hunt S, Deloukas P, Bentley DR, Donnelly P. The fine-scale structure of recombination rate variation in the human genome. Science. 2004;304(5670):581–4. doi: 10.1126/science.1092500 [DOI] [PubMed] [Google Scholar]
  • 18.Auton A, Rui Li Y, Kidd J, Oliveira K, Nadel J, Holloway JK, et al. Genetic Recombination Is Targeted towards Gene Promoter Regions in Dogs. PLoS Genet. 2013;9(12):e1003984. doi: 10.1371/journal.pgen.1003984 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 19.Baker Z, Schumer M, Haba Y, Bashkirova L, Holland C, Rosenthal GG, et al. Repeated losses of PRDM9-directed recombination despite the conservation of PRDM9 across vertebrates. Elife. 2017;6. doi: 10.7554/eLife.24133 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 20.Choi K, Zhao X, Tock AJ, Lambing C, Underwood CJ, Hardcastle TJ, et al. Nucleosomes and DNA methylation shape meiotic DSB frequency in Arabidopsis thaliana transposons and gene regulatory regions. Genome Res. 2018;28(4):532–46. doi: 10.1101/gr.225599.117 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 21.Fowler KR, Sasaki M, Milman N, Keeney S, Smith GR. Evolutionarily diverse determinants of meiotic DNA break and recombination landscapes across the genome. Genome Res. 2014;24(10):1650–64. doi: 10.1101/gr.172122.114 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 22.Kawakami T, Mugal CF, Suh A, Nater A, Burri R, Smeds L, et al. Whole-genome patterns of linkage disequilibrium across flycatcher populations clarify the causes and consequences of fine-scale recombination rate variation in birds. Mol Ecol. 2017;26(16):4158–72. doi: 10.1111/mec.14197 [DOI] [PubMed] [Google Scholar]
  • 23. Lam I, Keeney S. Nonparadoxical evolutionary stability of the recombination initiation landscape in yeast. Science. 2015;350(6263):932–7. doi: 10.1126/science.aad0814
  • 24. Pan J, Sasaki M, Kniewel R, Murakami H, Blitzblau HG, Tischfield SE, et al. A Hierarchical Combination of Factors Shapes the Genome-wide Topography of Yeast Meiotic Recombination Initiation. Cell. 2011;144(5):719–31. doi: 10.1016/j.cell.2011.02.009
  • 25. Singhal S, Leffler EM, Sannareddy K, Turner I, Venn O, Hooper DM, et al. Stable recombination hotspots in birds. Science. 2015;350(6263):928–32. doi: 10.1126/science.aad0843
  • 26. Choi K, Henderson IR. Meiotic recombination hotspots—a comparative view. Plant J. 2015;83(1):52–61. doi: 10.1111/tpj.12870
  • 27. Axelsson E, Webster MT, Ratnakumar A, Ponting CP, Lindblad-Toh K. Death of PRDM9 coincides with stabilization of the recombination landscape in the dog genome. Genome Res. 2012;22(1):51–63. doi: 10.1101/gr.124123.111
  • 28. Dutreux F, Dutta A, Peltier E, Bibi-Triki S, Friedrich A, Llorente B, et al. Lessons from the meiotic recombination landscape of the ZMM deficient budding yeast Lachancea waltii. PLoS Genet. 2023;19(1):e1010592. doi: 10.1371/journal.pgen.1010592
  • 29. Auton A, Fledel-Alon A, Pfeifer S, Venn O, Segurel L, Street T, et al. A Fine-Scale Chimpanzee Genetic Map from Population Sequencing. Science. 2012;336(6078):193–8. doi: 10.1126/science.1216872
  • 30. Brick K, Smagulova F, Khil P, Camerini-Otero RD, Petukhova GV. Genetic recombination is directed away from functional genomic elements in mice. Nature. 2012;485(7400):642–5. doi: 10.1038/nature11089
  • 31. Coop G, Wen X, Ober C, Pritchard JK, Przeworski M. High-Resolution Mapping of Crossovers Reveals Extensive Variation in Fine-Scale Recombination Patterns Among Humans. Science. 2008;319:1395–8. doi: 10.1126/science.1151851
  • 32. Myers S, Bowden R, Tumian A, Bontrop RE, Freeman C, MacFie TS, et al. Drive against hotspot motifs in primates implicates the PRDM9 gene in meiotic recombination. Science. 2010;327(5967):876–9. doi: 10.1126/science.1182363
  • 33. Pratto F, Brick K, Khil P, Smagulova F, Petukhova GV, Camerini-Otero RD. DNA recombination. Recombination initiation maps of individual human genomes. Science. 2014;346(6211):1256442. doi: 10.1126/science.1256442
  • 34. Smagulova F, Brick K, Pu Y, Camerini-Otero RD, Petukhova GV. The evolutionary turnover of recombination hot spots contributes to speciation in mice. Genes Dev. 2016;30(3):266–80. doi: 10.1101/gad.270009.115
  • 35. Wooldridge LK, Dumont BL. Rapid Evolution of the Fine-scale Recombination Landscape in Wild House Mouse (Mus musculus) Populations. Mol Biol Evol. 2023;40(1). doi: 10.1093/molbev/msac267
  • 36. Baudat F, Buard J, Grey C, Fledel-Alon A, Ober C, Przeworski M, et al. PRDM9 is a major determinant of meiotic recombination hotspots in humans and mice. Science. 2010;327(5967):836–40. doi: 10.1126/science.1183439
  • 37. Parvanov ED, Petkov PM, Paigen K. Prdm9 controls activation of mammalian recombination hotspots. Science. 2010;327(5967):835. doi: 10.1126/science.1181495
  • 38. Grey C, Baudat F, de Massy B. PRDM9, a driver of the genetic map. PLoS Genet. 2018;14(8):e1007479. doi: 10.1371/journal.pgen.1007479
  • 39. Baker CL, Kajita S, Walker M, Saxl RL, Raghupathy N, Choi K, et al. PRDM9 Drives Evolutionary Erosion of Hotspots in Mus musculus through Haplotype-Specific Initiation of Meiotic Recombination. PLoS Genet. 2015;11(1):e1004916. doi: 10.1371/journal.pgen.1004916
  • 40. Lesecque Y, Glemin S, Lartillot N, Mouchiroud D, Duret L. The red queen model of recombination hotspots evolution in the light of archaic and modern human genomes. PLoS Genet. 2014;10(11):e1004790. doi: 10.1371/journal.pgen.1004790
  • 41. Alleva B, Brick K, Pratto F, Huang M, Camerini-Otero RD. Cataloging Human PRDM9 Allelic Variation Using Long-Read Sequencing Reveals PRDM9 Population Specificity and Two Distinct Groupings of Related Alleles. Front Cell Dev Biol. 2021;9:675286. doi: 10.3389/fcell.2021.675286
  • 42. Berg IL, Neumann R, Lam KW, Sarbajna S, Odenthal-Hesse L, May CA, et al. PRDM9 variation strongly influences recombination hot-spot activity and meiotic instability in humans. Nat Genet. 2010;42(10):859–63. doi: 10.1038/ng.658
  • 43. Buard J, Rivals E, Dunoyer de Segonzac D, Garres C, Caminade P, de Massy B, et al. Diversity of Prdm9 Zinc Finger Array in Wild Mice Unravels New Facets of the Evolutionary Turnover of this Coding Minisatellite. PLoS ONE. 2014;9(1):e85021. doi: 10.1371/journal.pone.0085021
  • 44. Damm E, Ullrich KK, Amos WB, Odenthal-Hesse L. Evolution of the recombination regulator PRDM9 in minke whales. BMC Genomics. 2022;23(1):212. doi: 10.1186/s12864-022-08305-1
  • 45. Kono H, Tamura M, Osada N, Suzuki H, Abe K, Moriwaki K, et al. Prdm9 polymorphism unveils mouse evolutionary tracks. DNA Res. 2014;21(3):315–26. doi: 10.1093/dnares/dst059
  • 46. Schwartz JJ, Roach DJ, Thomas JH, Shendure J. Primate evolution of the recombination regulator PRDM9. Nat Commun. 2014;5:4370. doi: 10.1038/ncomms5370
  • 47. Baker Z, Przeworski M, Sella G. Down the Penrose stairs, or how selection for fewer recombination hotspots maintains their existence. Elife. 2023;12. doi: 10.7554/eLife.83769
  • 48. Latrille T, Duret L, Lartillot N. The Red Queen model of recombination hot-spot evolution: a theoretical investigation. Philos Trans R Soc Lond B Biol Sci. 2017;372(1736). doi: 10.1098/rstb.2016.0463
  • 49. Ubeda F, Wilkins JF. The Red Queen theory of recombination hotspots. J Evol Biol. 2011;24(3):541–53. doi: 10.1111/j.1420-9101.2010.02187.x
  • 50. Genestier A, Duret L, Lartillot N. Bridging the gap between the evolutionary dynamics and the molecular mechanisms of meiosis: A model based exploration of the PRDM9 intra-genomic Red Queen. PLoS Genet. 2024;20(5):e1011274. doi: 10.1371/journal.pgen.1011274
  • 51. Davies B, Hatton E, Altemose N, Hussin JG, Pratto F, Zhang G, et al. Re-engineering the zinc fingers of PRDM9 reverses hybrid sterility in mice. Nature. 2016;530(7589):171–6. doi: 10.1038/nature16931
  • 52. Forejt J, Jansa P, Parvanov E. Hybrid sterility genes in mice (Mus musculus): a peculiar case of PRDM9 incompatibility. Trends Genet. 2021;37(12):1095–108. doi: 10.1016/j.tig.2021.06.008
  • 53. Gregorova S, Gergelits V, Chvatalova I, Bhattacharyya T, Valiskova B, Fotopulosova V, et al. Modulation of Prdm9-controlled meiotic chromosome asynapsis overrides hybrid sterility in mice. Elife. 2018;7. doi: 10.7554/eLife.34282
  • 54. Ponting CP. What are the genomic drivers of the rapid evolution of PRDM9? Trends Genet. 2011;27(5):165–71. doi: 10.1016/j.tig.2011.02.001
  • 55. Sandor C, Li W, Coppieters W, Druet T, Charlier C, Georges M. Genetic Variants in REC8, RNF212, and PRDM9 Influence Male Recombination in Cattle. PLoS Genet. 2012;8(7):e1002854. doi: 10.1371/journal.pgen.1002854
  • 56. Cavassim MIA, Baker Z, Hoge C, Schierup MH, Schumer M, Przeworski M. PRDM9 losses in vertebrates are coupled to those of paralogs ZCWPW1 and ZCWPW2. Proc Natl Acad Sci U S A. 2022;119(9).
  • 57. Venu V, Harjunmaa E, Dreau A, Brady S, Absher D, Kingsley DM, et al. Fine-scale contemporary recombination variation and its fitness consequences in adaptively diverging stickleback fish. Nat Ecol Evol. 2024;8(7):1337–52. doi: 10.1038/s41559-024-02434-4
  • 58. Mihola O, Landa V, Pratto F, Brick K, Kobets T, Kusari F, et al. Rat PRDM9 shapes recombination landscapes, duration of meiosis, gametogenesis, and age of fertility. BMC Biol. 2021;19(1):86. doi: 10.1186/s12915-021-01017-0
  • 59. Hoge C, de Manuel M, Mahgoub M, Okami N, Fuller Z, Banerjee S, et al. Patterns of recombination in snakes reveal a tug-of-war between PRDM9 and promoter-like features. Science. 2024;383(6685):eadj7026. doi: 10.1126/science.adj7026
  • 60. Schield DR, Pasquesi GIM, Perry BW, Adams RH, Nikolakis ZL, Westfall AK, et al. Snake Recombination Landscapes Are Concentrated in Functional Regions despite PRDM9. Mol Biol Evol. 2020;37(5):1272–94. doi: 10.1093/molbev/msaa003
  • 61. Shanfelter AF, Archambeault SL, White MA. Divergent Fine-Scale Recombination Landscapes between a Freshwater and Marine Population of Threespine Stickleback Fish. Genome Biol Evol. 2019;11(6):1573–85. doi: 10.1093/gbe/evz090
  • 62. Versoza CJ, Rivera JA, Rosenblum EB, Vital-Garcia C, Hews DK, Pfeifer SP. The recombination landscapes of spiny lizards (genus Sceloporus). G3 (Bethesda). 2022;12(2). doi: 10.1093/g3journal/jkab402
  • 63. Christoffels A, Koh EG, Chia JM, Brenner S, Aparicio S, Venkatesh B. Fugu genome analysis provides evidence for a whole-genome duplication early during the evolution of ray-finned fishes. Mol Biol Evol. 2004;21(6):1146–51. doi: 10.1093/molbev/msh114
  • 64. Macqueen DJ, Johnston IA. A well-constrained estimate for the timing of the salmonid whole genome duplication reveals major decoupling from species diversification. Proc Biol Sci. 2014;281(1778):20132881. doi: 10.1098/rspb.2013.2881
  • 65. Vandepoele K, De Vos W, Taylor JS, Meyer A, Van de Peer Y. Major events in the genome evolution of vertebrates: paranome age and size differ considerably between ray-finned fishes and land vertebrates. Proc Natl Acad Sci U S A. 2004;101(6):1638–43. doi: 10.1073/pnas.0307968100
  • 66. Lien S, Koop BF, Sandve SR, Miller JR, Kent MP, Nome T, et al. The Atlantic salmon genome provides insights into rediploidization. Nature. 2016;533(7602):200–5. doi: 10.1038/nature17164
  • 67. Pearse DE, Barson NJ, Nome T, Gao G, Campbell MA, Abadía-Cardoso A, et al. Sex-dependent dominance maintains migration supergene in rainbow trout. Nat Ecol Evol. 2019;3(12):1731–42. doi: 10.1038/s41559-019-1044-6
  • 68. Sutherland BJG, Gosselin T, Normandeau E, Lamothe M, Isabel N, Audet C, et al. Salmonid Chromosome Evolution as Revealed by a Novel Method for Comparing RADseq Linkage Maps. Genome Biol Evol. 2016;8(12):3600–17. doi: 10.1093/gbe/evw262
  • 69. Khil PP, Smagulova F, Brick KM, Camerini-Otero RD, Petukhova GV. Sensitive mapping of recombination hotspots using sequencing-based detection of ssDNA. Genome Res. 2012;22(5):957–65. doi: 10.1101/gr.130583.111
  • 70. Brekke C, Johnston SE, Knutsen TM, Berg P. Genetic architecture of individual meiotic crossover rate and distribution in Atlantic Salmon. Sci Rep. 2023;13(1):20481. doi: 10.1038/s41598-023-47208-3
  • 71. Paigen K, Petkov PM. PRDM9 and Its Role in Genetic Recombination. Trends Genet. 2018;34(4):291–300. doi: 10.1016/j.tig.2017.12.017
  • 72. Myers S, Bottolo L, Freeman C, McVean G, Donnelly P. A fine-scale map of recombination rates and hotspots across the human genome. Science. 2005;310(5746):321–4. doi: 10.1126/science.1117196
  • 73. Brunschwig H, Levi L, Ben-David E, Williams RW, Yakir B, Shifman S. Fine-scale maps of recombination rates and hotspots in the mouse genome. Genetics. 2012;191(3):757–64. doi: 10.1534/genetics.112.141036
  • 74. Booker TR, Ness RW, Keightley PD. The Recombination Landscape in Wild House Mice Inferred Using Population Genomic Data. Genetics. 2017;207(1):297–309. doi: 10.1534/genetics.117.300063
  • 75. Griot RA, Phocas F, Brard-Fudulea S, Morvezen R, Bestin A, Haffray P, et al. Genome-wide association studies for resistance to viral nervous necrosis in three populations of European sea bass (Dicentrarchus labrax) using a novel 57k SNP array DlabChip. Aquaculture. 2021;530:735930.
  • 76. Kodama M, Brieuc MS, Devlin RH, Hard JJ, Naish KA. Comparative mapping between Coho Salmon (Oncorhynchus kisutch) and three other salmonids suggests a role for chromosomal rearrangements in the retention of duplicated regions following a whole genome duplication event. G3 (Bethesda). 2014;4(9):1717–30. doi: 10.1534/g3.114.012294
  • 77. Tsai HY, Robledo D, Lowe NR, Bekaert M, Taggart JB, Bron JE, et al. Construction and Annotation of a High Density SNP Linkage Map of the Atlantic Salmon (Salmo salar) Genome. G3 (Bethesda). 2016;6(7):2173–9. doi: 10.1534/g3.116.029009
  • 78. Cross S, Kovarik P, Schmidtke J, Bird A. Non-methylated islands in fish genomes are GC-poor. Nucleic Acids Res. 1991;19(7):1469–74. doi: 10.1093/nar/19.7.1469
  • 79. Long HK, Sims D, Heger A, Blackledge NP, Kutter C, Wright ML, et al. Epigenetic conservation at gene regulatory elements revealed by non-methylated DNA profiling in seven vertebrates. Elife. 2013;2:e00348. doi: 10.7554/eLife.00348
  • 80. Corbett-Detig RB, Hartl DL, Sackton TB. Natural selection constrains neutral diversity across a wide range of species. PLoS Biol. 2015;13(4):e1002112. doi: 10.1371/journal.pbio.1002112
  • 81. Hinch R, Donnelly P, Hinch AG. Meiotic DNA breaks drive multifaceted mutagenesis in the human germ line. Science. 2023;382(6674):eadh2531. doi: 10.1126/science.adh2531
  • 82. Spencer CC. Human polymorphism around recombination hotspots. Biochem Soc Trans. 2006;34(Pt 4):535–6. doi: 10.1042/BST0340535
  • 83. Raynaud M, Gagnaire P-A, Galtier N. Performance and limitations of linkage-disequilibrium-based methods for inferring the genomic landscape of recombination and detecting hotspots: a simulation study. Peer Community J. 2023;3.
  • 84. Clement Y, Arndt PF. Meiotic recombination strongly influences GC-content evolution in short regions in the mouse genome. Mol Biol Evol. 2013;30(12):2612–8. doi: 10.1093/molbev/mst154
  • 85. Duret L, Galtier N. Biased gene conversion and the evolution of mammalian genomic landscapes. Annu Rev Genomics Hum Genet. 2009;10:285–311. doi: 10.1146/annurev-genom-082908-150001
  • 86. Crête-Lafrenière A, Weir LK, Bernatchez L. Framing the Salmonidae family phylogenetic portrait: a more complete picture from increased taxon sampling. PLoS ONE. 2012;7(10):e46662. doi: 10.1371/journal.pone.0046662
  • 87. Kong A, Thorleifsson G, Frigge ML, Masson G, Gudbjartsson DF, Villemoes R, et al. Common and low-frequency variants associated with genome-wide recombination rate. Nat Genet. 2014;46(1):11–6. doi: 10.1038/ng.2833
  • 88. Kong A, Thorleifsson G, Gudbjartsson DF, Masson G, Sigurdsson A, Jonasdottir A, et al. Fine-scale recombination rate differences between sexes, populations and individuals. Nature. 2010;467(7319):1099–103. doi: 10.1038/nature09525
  • 89. Hinch AG, Tandon A, Patterson N, Song Y, Rohland N, Palmer CD, et al. The landscape of recombination in African Americans. Nature. 2011;476(7359):170–5. doi: 10.1038/nature10336
  • 90. Campbell CL, Bherer C, Morrow BE, Boyko AR, Auton A. A Pedigree-Based Map of Recombination in the Domestic Dog Genome. G3 (Bethesda). 2016;6(11):3517–24. doi: 10.1534/g3.116.034678
  • 91. Zelkowski M, Olson MA, Wang M, Pawlowski W. Diversity and Determinants of Meiotic Recombination Landscapes. Trends Genet. 2019;35(5):359–70. doi: 10.1016/j.tig.2019.02.002
  • 92. Ahlawat S, De S, Sharma P, Sharma R, Arora R, Kataria RS, et al. Evolutionary dynamics of meiotic recombination hotspots regulator PRDM9 in bovids. Mol Genet Genomics. 2017;292(1):117–31. doi: 10.1007/s00438-016-1260-6
  • 93. Oliver PL, Goodstadt L, Bayes JJ, Birtle Z, Roach KC, Phadnis N, et al. Accelerated Evolution of the Prdm9 Speciation Gene across Diverse Metazoan Taxa. PLoS Genet. 2009;5(12):e1000753. doi: 10.1371/journal.pgen.1000753
  • 94. Stevison LS, Woerner AE, Kidd JM, Kelley JL, Veeramah KR, McManus KF, et al. The Time Scale of Recombination Rate Evolution in Great Apes. Mol Biol Evol. 2016;33(4):928–45. doi: 10.1093/molbev/msv331
  • 95. Spence JP, Song YS. Inference and analysis of population-specific fine-scale recombination maps across 26 diverse human populations. Sci Adv. 2019;5(10):eaaw9206. doi: 10.1126/sciadv.aaw9206
  • 96. Wu H, Mathioudakis N, Diagouraga B, Dong A, Dombrovski L, Baudat F, et al. Molecular Basis for the Regulation of the H3K4 Methyltransferase Activity of PRDM9. Cell Rep. 2013;5(1):13–20. doi: 10.1016/j.celrep.2013.08.035
  • 97. Schwarz T, Striedner Y, Horner A, Haase K, Kemptner J, Zeppezauer N, et al. PRDM9 forms a trimer by interactions within the zinc finger array. Life Sci Alliance. 2019;2(4). doi: 10.26508/lsa.201800291
  • 98. Hohenauer T, Moore AW. The Prdm family: expanding roles in stem cells and development. Development. 2012;139(13):2267–82. doi: 10.1242/dev.070110
  • 99. Fumasoni I, Meani N, Rambaldi D, Scafetta G, Alcalay M, Ciccarelli FD. Family expansion and gene rearrangements contributed to the functional specialization of PRDM genes in vertebrates. BMC Evol Biol. 2007;7:187. doi: 10.1186/1471-2148-7-187
  • 100. Baker CL, Petkova P, Walker M, Flachs P, Mihola O, Trachtulec Z, et al. Multimer Formation Explains Allelic Suppression of PRDM9 Recombination Hotspots. PLoS Genet. 2015;11(9):e1005512. doi: 10.1371/journal.pgen.1005512
  • 101. Flachs P, Mihola O, Simecek P, Gregorova S, Schimenti JC, Matsui Y, et al. Interallelic and intergenic incompatibilities of the prdm9 (hst1) gene in mouse hybrid sterility. PLoS Genet. 2012;8(11):e1003044. doi: 10.1371/journal.pgen.1003044
  • 102. Huang T, Yuan S, Gao L, Li M, Yu X, Zhang J, et al. The histone modification reader ZCWPW1 links histone methylation to PRDM9-induced double strand break repair. Elife. 2020;9. doi: 10.7554/eLife.53459
  • 103. Mahgoub M, Paiano J, Bruno M, Wu W, Pathuri S, Zhang X, et al. Dual histone methyl reader ZCWPW1 facilitates repair of meiotic double strand breaks in male mice. Elife. 2020;9. doi: 10.7554/eLife.53360
  • 104. Wells D, Bitoun E, Moralli D, Zhang G, Hinch A, Jankowska J, et al. ZCWPW1 is recruited to recombination hotspots by PRDM9, and is essential for meiotic double strand break repair. Elife. 2020;9. doi: 10.7554/eLife.53392
  • 105. Joseph J, Prentout D, Laverre A, Tricou T, Duret L. High prevalence of PRDM9-independent recombination hotspots in placental mammals. Proc Natl Acad Sci U S A. 2024;121(23):e2401973121. doi: 10.1073/pnas.2401973121
  • 106. Ranwez V, Douzery EJP, Cambon C, Chantret N, Delsuc F. MACSE v2: Toolkit for the Alignment of Coding Sequences Accounting for Frameshifts and Stop Codons. Mol Biol Evol. 2018;35(10):2582–4. doi: 10.1093/molbev/msy159
  • 107. Borowiec ML. AMAS: a fast tool for alignment manipulation and computing of summary statistics. PeerJ. 2016;4:e1660. doi: 10.7717/peerj.1660
  • 108. Nguyen LT, Schmidt HA, von Haeseler A, Minh BQ. IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Mol Biol Evol. 2015;32(1):268–74. doi: 10.1093/molbev/msu300
  • 109. Hayashi K, Yoshida K, Matsui Y. A histone H3 methyltransferase controls epigenetic events required for meiotic prophase. Nature. 2005;438(7066):374–8. doi: 10.1038/nature04112
  • 110. Birney E, Clamp M, Durbin R. GeneWise and Genomewise. Genome Res. 2004;14(5):988–95. doi: 10.1101/gr.1865504
  • 111. Billard R, Solari A, Escaffre AM. [Method for the quantitative analysis of spermatogenesis in teleost fish]. Ann Biol Anim Biochim Biophys. 1974;14(1):87–104.
  • 112. Diagouraga B, Clement JAJ, Duret L, Kadlec J, de Massy B, Baudat F. PRDM9 Methyltransferase Activity Is Essential for Meiotic DNA Double-Strand Break Formation at Its Binding Sites. Mol Cell. 2018;69(5):853–65 e6. doi: 10.1016/j.molcel.2018.01.033
  • 113. Tardat M, Brustel J, Kirsh O, Lefevbre C, Callanan M, Sardet C, et al. The histone H4 Lys 20 methyltransferase PR-Set7 regulates replication origins in mammalian cells. Nat Cell Biol. 2010;12(11):1086–93. doi: 10.1038/ncb2113
  • 114. Brick K, Pratto F, Sun CY, Camerini-Otero RD, Petukhova G. Analysis of Meiotic Double-Strand Break Initiation in Mammals. Methods Enzymol. 2018;601:391–418. doi: 10.1016/bs.mie.2017.11.037
  • 115. Ewels PA, Peltzer A, Fillinger S, Patel H, Alneberg J, Wilm A, et al. The nf-core framework for community-curated bioinformatics pipelines. Nat Biotechnol. 2020;38(3):276–8. doi: 10.1038/s41587-020-0439-x
  • 116. Ramírez F, Ryan DP, Grüning B, Bhardwaj V, Kilpert F, Richter AS, et al. deepTools2: a next generation web server for deep-sequencing data analysis. Nucleic Acids Res. 2016;44(W1):W160–5. doi: 10.1093/nar/gkw257
  • 117. Auffret P, de Massy B, Clement JAJ. Mapping Meiotic DNA Breaks: Two Fully-Automated Pipelines to Analyze Single-Strand DNA Sequencing Data, hotSSDS and hotSSDS-extra. Methods Mol Biol. 2024;2770:227–61. doi: 10.1007/978-1-0716-3698-5_16
  • 118. Rondeau EB, Christensen KA, Minkley DR, Leong JS, Chan MTT, Despins CA, et al. Population-size history inferences from the coho salmon (Oncorhynchus kisutch) genome. G3 (Bethesda). 2023;13(4). doi: 10.1093/g3journal/jkad033
  • 119. Gao G, Nome T, Pearse DE, Moen T, Naish KA, Thorgaard GH, et al. A New Single Nucleotide Polymorphism Database for Rainbow Trout Generated Through Whole Genome Resequencing. Front Genet. 2018;9:147. doi: 10.3389/fgene.2018.00147
  • 120. Bertolotti AC, Layer RM, Gundappa MK, Gallagher MD, Pehlivanoglu E, Nome T, et al. The structural variation landscape in 492 Atlantic salmon genomes. Nat Commun. 2020;11(1):5176. doi: 10.1038/s41467-020-18972-x
  • 121. McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 2010;20(9):1297–303. doi: 10.1101/gr.107524.110
  • 122. Van der Auwera GA, Carneiro MO, Hartl C, Poplin R, Del Angel G, Levy-Moonshine A, et al. From FastQ data to high confidence variant calls: the Genome Analysis Toolkit best practices pipeline. Curr Protoc Bioinformatics. 2013;43:11.10.1–11.10.33. doi: 10.1002/0471250953.bi1110s43
  • 123. Danecek P, Auton A, Abecasis G, Albers CA, Banks E, DePristo MA, et al. The variant call format and VCFtools. Bioinformatics. 2011;27(15):2156–8. doi: 10.1093/bioinformatics/btr330
  • 124. Martin M, Ebert P, Marschall T. Read-Based Phasing and Analysis of Phased Variants with WhatsHap. Methods Mol Biol. 2023;2590:127–38. doi: 10.1007/978-1-0716-2819-5_8
  • 125. Delaneau O, Zagury JF, Robinson MR, Marchini JL, Dermitzakis ET. Accurate, scalable and integrative haplotype estimation. Nat Commun. 2019;10(1):5436. doi: 10.1038/s41467-019-13225-y
  • 126. Keightley PD, Jackson BC. Inferring the Probability of the Derived vs. the Ancestral Allelic State at a Polymorphic Site. Genetics. 2018;209(3):897–906. doi: 10.1534/genetics.118.301120
  • 127. Crespi BJ, Teo R. Comparative phylogenetic analysis of the evolution of semelparity and life history in salmonid fishes. Evolution. 2002;56(5):1008–20. doi: 10.1111/j.0014-3820.2002.tb01412.x
  • 128. Duranton M, Allal F, Valière S, Bouchez O, Bonhomme F, Gagnaire PA. The contribution of ancient admixture to reproductive isolation between European sea bass lineages. Evol Lett. 2020;4(3):226–42. doi: 10.1002/evl3.169
  • 129. Zhang C, Reid K, Sands AF, Fraimout A, Schierup MH, Merila J. De Novo Mutation Rates in Sticklebacks. Mol Biol Evol. 2023;40(9). doi: 10.1093/molbev/msad192
  • 130. Feng C, Pettersson M, Lamichhaney S, Rubin CJ, Rafati N, Casini M, et al. Moderate nucleotide diversity in the Atlantic herring is associated with a low mutation rate. Elife. 2017;6. doi: 10.7554/eLife.23907
  • 131. Malinsky M, Svardal H, Tyers AM, Miska EA, Genner MJ, Turner GF, et al. Whole-genome sequences of Malawi cichlids reveal multiple radiations interconnected by gene flow. Nat Ecol Evol. 2018;2(12):1940–55. doi: 10.1038/s41559-018-0717-x
  • 132. Burda K, Konczal M. Validation of machine learning approach for direct mutation rate estimation. Mol Ecol Resour. 2023;23(8):1757–71. doi: 10.1111/1755-0998.13841
  • 133. Roach JC, Glusman G, Smit AF, Huff CD, Hubley R, Shannon PT, et al. Analysis of genetic inheritance in a family quartet by whole-genome sequencing. Science. 2010;328(5978):636–9. doi: 10.1126/science.1186802
  • 134. Quinlan AR, Hall IM. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 2010;26(6):841–2. doi: 10.1093/bioinformatics/btq033
  • 135. Flynn JM, Hubley R, Goubert C, Rosen J, Clark AG, Feschotte C, et al. RepeatModeler2 for automated genomic discovery of transposable element families. Proc Natl Acad Sci U S A. 2020;117(17):9451–7. doi: 10.1073/pnas.1921046117
  • 136. Larsen F, Gundersen G, Lopez R, Prydz H. CpG islands as gene markers in the human genome. Genomics. 1992;13(4):1095–107. doi: 10.1016/0888-7543(92)90024-m
  • 137. Lawrence M, Huber W, Pagès H, Aboyoun P, Carlson M, Gentleman R, et al. Software for computing and annotating genomic ranges. PLoS Comput Biol. 2013;9(8):e1003118. doi: 10.1371/journal.pcbi.1003118
  • 138. Bailey TL, Johnson J, Grant CE, Noble WS. The MEME Suite. Nucleic Acids Res. 2015;43(W1):W39–49. doi: 10.1093/nar/gkv416
  • 139. Machanick P, Bailey TL. MEME-ChIP: motif analysis of large DNA datasets. Bioinformatics. 2011;27(12):1696–7. doi: 10.1093/bioinformatics/btr189
  • 140. Grant CE, Bailey TL, Noble WS. FIMO: scanning for occurrences of a given motif. Bioinformatics. 2011;27(7):1017–8. doi: 10.1093/bioinformatics/btr064
  • 141. Bailey TL, Machanick P. Inferring direct DNA binding from ChIP-seq. Nucleic Acids Res. 2012;40(17):e128. doi: 10.1093/nar/gks433
  • 142. Bailey TL. STREME: accurate and versatile sequence motif discovery. Bioinformatics. 2021;37(18):2834–40. doi: 10.1093/bioinformatics/btab203
  • 143. Minkin I, Medvedev P. Scalable multiple whole-genome alignment and locally collinear block construction with SibeliaZ. Nat Commun. 2020;11(1):6327. doi: 10.1038/s41467-020-19777-8
  • 144. Sigrist CJ, Cerutti L, Hulo N, Gattiker A, Falquet L, Pagni M, et al. PROSITE: a documented database using patterns and profiles as motif descriptors. Brief Bioinform. 2002;3(3):265–74. doi: 10.1093/bib/3.3.265

Decision Letter 0

Roland G Roberts

26 Apr 2024

Dear Bernard,

Thank you for submitting your manuscript entitled "PRDM9 drives the location and rapid evolution of recombination hotspots in salmonids" for consideration as a Research Article by PLOS Biology. I'd like to apologise for the extraordinary delay incurred while we sought external advice.

Your Appeal has now been evaluated by the PLOS Biology editorial staff and I am writing to let you know that we would like to send your submission out for external peer review.

However, before we can send your manuscript to reviewers, we need you to complete your submission by providing the metadata that is required for full assessment. To this end, please login to Editorial Manager where you will find the paper in the 'Submissions Needing Revisions' folder on your homepage. Please click 'Revise Submission' from the Action Links and complete all additional questions in the submission questionnaire.

Once your full submission is complete, your paper will undergo a series of checks in preparation for peer review. After your manuscript has passed the checks it will be sent out for review. To provide the metadata for your submission, please log in to Editorial Manager (https://www.editorialmanager.com/pbiology) within two working days, i.e. by Apr 30 2024 11:59PM.

If your manuscript has been previously peer-reviewed at another journal, PLOS Biology is willing to work with those reviews in order to avoid re-starting the process. Submission of the previous reviews is entirely optional and our ability to use them effectively will depend on the willingness of the previous journal to confirm the content of the reports and share the reviewer identities. Please note that we reserve the right to invite additional reviewers if we consider that additional/independent reviewers are needed, although we aim to avoid this as far as possible. In our experience, working with previous reviews does save time.

If you would like us to consider previous reviewer reports, please edit your cover letter to let us know and include the name of the journal where the work was previously considered and the manuscript ID it was given. In addition, please upload a response to the reviews as a 'Prior Peer Review' file type, which should include the reports in full and a point-by-point reply detailing how you have or plan to address the reviewers' concerns.

During the process of completing your manuscript submission, you will be invited to opt-in to posting your pre-review manuscript as a bioRxiv preprint. Visit http://journals.plos.org/plosbiology/s/preprints for full details. If you consent to posting your current manuscript as a preprint, please upload a single Preprint PDF.

Feel free to email us at plosbiology@plos.org if you have any queries relating to your submission.

Kind regards,

Roli

Roland Roberts, PhD

Senior Editor

PLOS Biology

rroberts@plos.org

Decision Letter 1

Roland G Roberts

18 Jun 2024

Dear Bernard,

Thank you for your patience while your manuscript "PRDM9 drives the location and rapid evolution of recombination hotspots in salmonids" was peer-reviewed at PLOS Biology. It has now been evaluated by the PLOS Biology editors, an Academic Editor with relevant expertise, and by four independent reviewers. Please accept my apologies for the additional delay; we had some difficulties contacting the Academic Editor. In the end we chose to send the decision letter without their input, as the way forward seemed very clear, but it is possible that the AE might make some minor requests in the next round.

Reviewer #1 thinks that the conclusions are well supported, but that "the comparison between species is hard to interpret as presented" and that several analyses need to be improved. S/he wants more information in Table 1, consideration of the effects of the low read-depth of the salmon data, use of better statistical controls, and better framing of the paper with respect to the literature (e.g. findings on snakes); there is also a list of more minor concerns. Reviewer #2 is positive but has quite a few technical and presentational requests, mostly minor, but some will need some new analyses. Reviewer #3 is very positive and simply has one discussion point. Reviewer #4 is somewhat less enthusiastic about the advance, but has a long list of requests for clarification and improvement.

In light of the reviews, which you will find at the end of this email, we would like to invite you to revise the work to thoroughly address the reviewers' reports.

Given the extent of revision needed, we cannot make a decision about publication until we have seen the revised manuscript and your response to the reviewers' comments. Your revised manuscript is likely to be sent for further evaluation by all or a subset of the reviewers.

We expect to receive your revised manuscript within 3 months. Please email us (plosbiology@plos.org) if you have any questions or concerns, or would like to request an extension.

At this stage, your manuscript remains formally under active consideration at our journal; please notify us by email if you do not intend to submit a revision so that we may withdraw it.

**IMPORTANT - SUBMITTING YOUR REVISION**

Your revisions should address the specific points made by each reviewer. Please submit the following files along with your revised manuscript:

1. A 'Response to Reviewers' file - this should detail your responses to the editorial requests, present a point-by-point response to all of the reviewers' comments, and indicate the changes made to the manuscript.

*NOTE: In your point-by-point response to the reviewers, please provide the full context of each review. Do not selectively quote paragraphs or sentences to reply to. The entire set of reviewer comments should be present in full and each specific point should be responded to individually, point by point.

You should also cite any additional relevant literature that has been published since the original submission and mention any additional citations in your response.

2. In addition to a clean copy of the manuscript, please also upload a 'track-changes' version of your manuscript that specifies the edits made. This should be uploaded as a "Revised Article with Changes Highlighted" file type.

*Re-submission Checklist*

When you are ready to resubmit your revised manuscript, please refer to this re-submission checklist: https://plos.io/Biology_Checklist

To submit a revised version of your manuscript, please go to https://www.editorialmanager.com/pbiology/ and log in as an Author. Click the link labelled 'Submissions Needing Revision' where you will find your submission record.

Please make sure to read the following important policies and guidelines while preparing your revision:

*Published Peer Review*

Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out. Please see here for more details:

https://blogs.plos.org/plos/2019/05/plos-journals-now-open-for-published-peer-review/

*PLOS Data Policy*

Please note that as a condition of publication PLOS' data policy (http://journals.plos.org/plosbiology/s/data-availability) requires that you make available all data used to draw the conclusions arrived at in your manuscript. If you have not already done so, you must include any data used in your manuscript either in appropriate repositories, within the body of the manuscript, or as supporting information (N.B. this includes any numerical values that were used to generate graphs, histograms etc.). For an example see here: http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1001908#s5

*Blot and Gel Data Policy*

We require the original, uncropped and minimally adjusted images supporting all blot and gel results reported in an article's figures or Supporting Information files. We will require these files before a manuscript can be accepted so please prepare them now, if you have not already uploaded them. Please carefully read our guidelines for how to prepare and upload this data: https://journals.plos.org/plosbiology/s/figures#loc-blot-and-gel-reporting-requirements

*Protocols deposition*

To enhance the reproducibility of your results, we recommend that if applicable you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols

Thank you again for your submission to our journal. We hope that our editorial process has been constructive thus far, and we welcome your feedback at any time. Please don't hesitate to contact us if you have any questions or comments.

Sincerely,

Roli

Roland Roberts, PhD

Senior Editor

PLOS Biology

rroberts@plos.org

------------------------------------

REVIEWERS' COMMENTS:

Reviewer #1:

The authors use a combination of experimental and statistical approaches to examine the role of PRDM9 and infer fine-scale recombination rates in four fish species. They provide compelling evidence for the role of PRDM9 in directing double-strand break locations in fish that carry an intact version of the gene, and a comparison of recombination landscapes in fish with and without an intact version.

My impression is that the overall conclusions are well supported, but that the comparison between species is hard to interpret as presented. I also believe a few analyses should be revisited, not because I expect the qualitative conclusions to change, but because I think they could be refined.

Major comments:

1. As the authors appreciate (e.g., https://peercommunityjournal.org/articles/10.24072/pcjournal.254/), maps inferred from patterns of linkage disequilibrium (LD) are shaped by the ratio of recombination rates to mutation rates, the effective population size, and genome coverage, and therefore cannot be taken at face value. Unfortunately, as a result, the comparison presented in Table 1 is hard to interpret. To do so, the reader would need to know, not just the estimate of rho per bp, but the estimates of Ne (or the estimates of r). It would also be good to provide the estimated ratio of recombination to mutation for each species.

2. If I understood correctly, the salmon samples were only resequenced to an average of 10X coverage (max 13X). This feature should lead to a substantially worse LD-based map, yet to my knowledge is not discussed, let alone analyzed. I would therefore suggest that the authors add all this information to the main text, and evaluate its possible impact on their conclusions, notably for the degree of overlap in Figure 5C.

3. On a related note, there is less power to detect hotspots in regions of high recombination, so comparisons of hotspot numbers and heats across the genome need to take that into account. As one example, it is not obvious to me that hotspots are actually hotter in telomeres (lines 344-345); perhaps it is rather that only hotter hotspots are detected?

4. A number of the statistical controls could be tweaked. For example, controls for hotspots are generated without regard to the larger recombination rate/genomic context. Similarly, locations are shuffled without regard to genome gaps, minimal diversity levels etc… It may also be better to use actual genome sequences matched for GC content for the motif searching analyses. To be clear, I am not predicting that the qualitative conclusion will change, but these decisions seem somewhat anti-conservative, and should probably be revisited.

5. This final comment is a matter of opinion, but I thought the framing of the paper was somewhat odd, in making it seem as if we have no idea whether PRDM9 plays the same role outside mammals, when we have evidence that it is active in snakes (cited by the authors) and also evidence that it co-evolves with ZCWPW1 across vertebrates (Cavassim et al. 2022 PNAS; cited elsewhere). In my view, that takes nothing away from the importance of demonstrating that it also directs recombination in fish, especially as this paper presents experimental evidence. In fact, fish are particularly interesting, given the presence of every version of the PRDM9 ortholog within a single taxon. Regardless, I think the authors should report whether ZCWPW1 is present and intact across the four fish species studied here (or only the three with PRDM9), and any differences in their PRDM9 SET domain.

Additional questions/comments:

1. In the phylogenetic analysis, how do the authors convince themselves that the calls are reliable, for example that the copies of alpha 1.2 are actually different?

2. In the experimental analysis, two trout (TAC1 and TAC3) that share no PRDM9 alleles have 55 hotspots in common, whereas RT52 and TAC3, which share an allele, have only 42. The authors argue that the 55 are likely not real, but are the 42 real? For example, do they show the expected asymmetry of reads?

3. It was unclear to me why the authors chose to use the older version of the motif prediction; do they think it is more reliable? I also wondered if the second motif reported in Figure 3C (which is effectively a series of As) could simply be a nucleosome-excluding sequence.

4. The authors write that "PRDM9 can suppress the recombination activity at chromatin accessible regions (30). Here, we found that this function is conserved in salmonids" (line 562). But unless I missed something, there is no evidence provided for active suppression by PRDM9 in fish, only that rates are not elevated in such regions (e.g., Figure S8 panel C). In that regard, I think it might be worth mentioning in the main text, especially for readers who are not experimentalists, that because the assay targets DMC1, the data do not strictly reflect double-strand break frequencies but may also reflect differences in repair efficiencies.

5. Why does the species without an intact PRDM9 have much larger hotspots (Figure S15) and less of an increase of recombination rates at the telomeres (Figure S16)? Are these real effects?

6. In the SOM on p. 16, there is what struck me as a surprising claim: that GC-rich CpG islands result from GC-biased gene conversion. What is the evidence for this? How does it explain CpG islands in mammals with PRDM9, which do not recombine much in such genomic locations?

Reviewer #2:

I have reviewed the manuscript "PBIOLOGY-D-24-00615_R1" titled "PRDM9 drives the location and rapid evolution of recombination hotspots in salmonids by Marie Raynaud et al."

The authors present a comprehensive examination of the evolution and function of PRDM9 across salmonids. The findings imply that PRDM9 function and its mode of action have been conserved for at least several hundred million years. Salmonid recombination shares several features with the well-characterized mammalian models of PRDM9-mediated recombination, despite the fascinating genome duplication event in the salmonid past. For the full-length Prdm9, there was evidence for rapid evolution of the gene and positive selection on particular amino acids responsible for DNA binding. Furthermore, maps of meiotic DNA double-strand breaks (DSBs), a proxy for recombination hotspots, showed DSBs concentrated outside promoter regions and enriched for H3K4me3 and H3K36me3. Finally, population-scaled recombination maps showed a fast turnover of recombination hotspots brought on by PRDM9 target motif erosion. Overall, this is a very nice paper presenting several significant findings. It provides the first evidence of PRDM9-mediated recombination features in fishes, and an important piece in the puzzle of how conserved the features of PRDM9-mediated recombination are across much longer evolutionary timescales than previously evaluated.

I recommend publication upon minor revision.

Major:

Please clarify:

For the motif-enrichment analyses, you compared real hotspot data with a control set of random sequences of equal size and GC content, and I believe I understand how you generated these random sequences. You state (in the methods), "We retrieved a number of random windows equal to the number of hotspots with a similar GC content distribution"… For me to understand the control dataset, could you state the GC content of the real and random sequences and how "similar" the random set is to the real dataset? Can you perhaps state what the margins are (in % difference)? Also, did you check whether these "random" loci overlap any of the true hotspots? I believe this information to be crucial.

Graphical representation of the Supplementary material:

While, in general, the quality of the graphical representations was high in the main body of the manuscript, I was disappointed with the quality of some of the Supplementary Figures. Even though they display exciting findings, they miss the level of detail awarded to the main body of the text and seem to be rather poorly put together. There are several minor and not-so-minor problems that I have listed below:

In Figures S10 and S11, Legend and Title overlap, making the legend hard to read.

S15A and S16A: could the number of hotspots be brought to the same scale? I.e., for S15, ~15000 for both OKIS and OMYC, 4000 for the three Salmo salar populations, and 400 for DLAB, with the differences explained somehow? It is curious that the "outgroups" to Salmo salar have a more than 3-fold difference in number of hotspots. For S16, a maximum of 250 hotspots on the y-axis would fit all the data.

S17: "comparaison" should probably read "comparison", and "expectated" should probably read "expected."

Figure S18: comparisons between species are on different scales, i.e., O. mykiss and S. salar NS, BS and GP all have a maximum value of 4000 on the heatmap scale, whereas O. kisutch has 3000. The heatmaps are therefore hard for me to interpret, because the graphs for each variable (GC content, SNP density, etc.) are not on the same heatmap scale across species. I would find it helpful if the x-axes were on the same scale, i.e., -6 to 0 for the recombination rate. In a perfect world, the y-axes would also match, but I find it understandable that the y-axes are scaled differently. Similarly, I have a hard time interpreting whether the TE density is different in O. mykiss and O. kisutch compared to the S. salar populations, or whether the difference in shape is driven by plotting at a different scale. In the same context, it would be nice if the Spearman rho values were all on the same horizontal line.

In Figure S19, at least SNP density could be put on the same scale, and all plots could have the same height; the legend for the S. salar populations would still fit if the plots were the same height. S23 lacks the top ruler number (3000). The same is seen in Figure S25, where the frequencies of PRDM91 and PRDM9 motifs are shown on different scales. In S24, again, the n_neighbors heatmap is on different scales (0-6500 in one, 0-6000 in the other).

In Figure S26, the probability values are on different scales, from 0.000000 to 0.000100 (six decimals) in one graph and go up to 0.00025 (shown in 5 decimals) in another. The spacing and the numbers on the ruler also differ between graphs. This is again making comparisons extremely difficult to grasp visually. Please unify as much as possible. The p-value also overlaps with the X-axis, making it hard to read.

Minor:

Line 37: … in turns… Do you perhaps mean "in turn"? This phrase could also be deleted as such: "It increases genetic diversity by creating novel allele combinations (4, 5) that facilitate adaptation and the removal of deleterious mutations from natural populations (6-8)."

Line 41: "Broad-scale patterns of variation within chromosomes (megabase scale) have…"

Perhaps consider rephrasing to "…(at the megabase scale)...

Lines 50-55: what about canids? Also, the sentence starting in line 52, "In vertebrates…", is misleading, as "default" recombination hotspots associated with TSSs located within CpG islands are found only in some vertebrates. This is not generally true for all vertebrates, as you also point out later. Please rewrite this sentence to clarify this point.

Line 100: "here" may be unnecessary in this context; to me the sentence makes more sense as

"To this aim, we investigated…"

Line 574 …". This telomere-proximal effect appears to be a conserved property, but of variable strength between sexes and among species with/without Prdm9"

I think it's important to note that telomere proximal enrichment of recombination hotspots is seen in humans and has been seen in dogs and birds as well (that lack PRDM9), including wild birds.

Line 609: "… suggesting that PRDM9 may be limiting in some contexts."

Do you mean that "PRDM9 dosage" may be limiting? Please clarify

Line 977: I believe the name is Rajalekshmi Navaryana Sarma, not "Sarna."

Reviewer #3:

The work by Raynaud et al. is a straightforward investigation of the determinants of meiotic recombination in salmonids. These are a diverse family of teleost fish in which, crucially, a full-length copy of the protein PRDM9 has been found. PRDM9 controls the location of recombination hotspots in mammals, and it has recently been shown that it can also direct recombination in other vertebrates. PRDM9 has a fascinating history of losses through evolution; therefore, it is of interest to characterize the recombination landscape in other non-mammalian taxa. Raynaud et al. analyzed the evolutionary dynamics of PRDM9 across the salmonid phylogeny, identified the sites of recombination initiation in the rainbow trout, and inferred historical recombination rates from patterns of linkage disequilibrium (LD) in five populations from three different species. The authors convincingly show that PRDM9 controls recombination hotspot activity in salmonids and that it is the predominant pathway used in these fish. Interestingly, they also describe a PRDM9-independent pattern of elevated double-strand break formation and recombination in the telomere-proximal region, similar to what has been described in mammals.

Altogether, this is a very comprehensive and well-executed set of experiments that fully support the major conclusions of the paper. I recommend its publication without reservations as it provides an important piece in the puzzle of how recombination arose and is controlled in and beyond mammals.

A minor comment:

The PRDM9 binding motif is not GC-rich as it is in mammals. The authors also show that the GC content is not elevated at recombination hotspots (at a fine scale). Any thoughts on how these observations might be related, and on whether GC-biased gene conversion is less active because of the genomic context rather than because of intrinsic mechanistic differences?

Reviewer #4:

In this manuscript entitled "PRDM9 drives the location and rapid evolution of recombination hotspots in salmonids", Raynaud et al. analyzed meiotic recombination in different species of salmonids, with a focus on the impact of Prdm9 allelic background. The authors used fine-scale recombination maps based on DMC1 ChIP sequencing and estimates of population-scaled recombination rates to present descriptive data on several salmonids. This is combined with an in silico analysis of the genomic features associated with hotspots. The authors conclude that PRDM9 determines the location of recombination hotspots, leading to evolutionary signatures at those sites, such as rapid motif erosion and turnover of recombination hotspots, resembling patterns previously described in model mammalian species.

The authors provide a nice introduction with relevant background material, and the results are clearly written in well-organized sections, making them easy to interpret. Overall, the results presented are compelling and of broad interest to the meiosis community, especially to those working on non-model species. However, the conclusions are not entirely novel beyond showing the conservation of patterns already described in mammals. It may be desirable to place more emphasis on the patterns that are not shared with mammals (such as mice or humans). In relation to that, the authors might extend their interpretation of the data on truncated Prdm9 paralogs, providing additional detail on the possible implications.

Major comments

1. Throughout the work (see lines 255-265 or 340), the authors compare their results in salmonid species with recombination hotspots in mice and humans. However, the number of SSDS peaks is considerably lower than the numbers detected in mice or humans. This needs clarification. Could it be related to a limitation of the technique when used in salmonids, including the use of different antibodies for different species? Also, the number of hotspots detected from LD is much higher. How can the authors reconcile these contrasting results?

2. Related to the previous point, pertinent comparisons with patterns found in non-mammalian species (such as snakes or birds) are missing. Also, there is not any mention of studies in meiotic recombination from other fishes.

3. Limited reference is made to differences in hotspot intensity, particularly between hotspots shared among alleles and those specific to one allele (intensities are compared between SSDS and LD hotspots at line 354 and in the Discussion at line 575, but not between the studied species/populations).

4. A discussion of the genomic features characterizing PRDM9-independent hotspots is notably absent.

5. It would be interesting to point out the genomic features of PRDM9-independent hotspots. This was especially absent from the Discussion.

6. The main figures could benefit from a clearer focus to strengthen the manuscript. For example, the figures could be more homogeneous in their aesthetics; Figure 4 seems to have been produced differently from the rest in terms of font size and style.

Additional comments:

7. Is GC-biased gene conversion the only pattern not shared between salmonids and mammals/birds (Lines 405-409)?

8. Recombination in tetraploid genomic regions is discussed for the LD data. Have the authors compared the number of recombination hotspots in these regions in the SSDS-derived data? (Lines ~422-435).

9. Line 467: Authors attribute the high percentage of shared hotspots to common Prdm9 alleles, but they should also indicate the similar genetic background.

10. Since the authors have the Prdm9 sequences of the alleles, comparing the motifs predicted from these sequences with the observed motifs would be informative. The reported motifs show an unusual pattern given the previous literature; in particular, the second has a high percentage of T. Aren't motifs usually GC-rich?

11. The authors do not show examples of histological sections from testis; these would be useful to interpret meiosis progression in the samples used.

12. For the genome assemblies used, it would be informative to provide statistics on genome contiguity: how many scaffolds are there, how large are they, and are all scaffolds assigned to chromosomes?

Minor points:

* Line 72: Please, specify which Mus subspecies; there are several works describing differences between them.

* Line 178: Ts3R event (Ss4R)?

* Line 205: Please, include the reference to 'other species'.

* Lines 285-295: This section is hard to follow.

* Line 354: four out of five salmonids… which ones?

* Data supporting the statement in Lines 423-425 need to be included.

* Please explain the percentages in Lines 410 and 411 more clearly.

* Population abbreviations are not indicated until methods.

* Line 458-459: Please, include number of hotspots, not just percentages.

* Line 488: Missing reference.

* Line 560: It may be more informative to include relevant references on recombination analyses in wild mice carrying Prdm9 allelic variation.

* Line 623: it is stated that …"may be even older", but in the Introduction (Line X) the authors already cite papers indicating that PRDM9 was ancestral to all animals.

* Line 641: "a recent study in mammals…", please indicate the reference.

* Line 699: Is paralog alpha2.2 located on different chromosomes in S. salar and O. mykiss? Chr 7 and 17?

Decision Letter 2

Roland G Roberts

25 Oct 2024

Dear Dr de Massy,

Thank you for your patience while we considered your revised manuscript "PRDM9 drives the location and rapid evolution of recombination hotspots in salmonids" for publication as a Research Article at PLOS Biology. This revised version of your manuscript has been evaluated by the PLOS Biology editors, the Academic Editor and three of the original reviewers.

Based on the reviews, we are likely to accept this manuscript for publication, provided you satisfactorily address the remaining points raised by the reviewers and the following data and other policy-related requests.

IMPORTANT - please attend to the following:

a) Please make the Title slightly more explicit for our broader readership: "PRDM9 drives the location and rapid evolution of recombination hotspots in salmonid fish"

b) Please address the remaining minor concerns from reviewer #4.

c) I note that you use new fish samples for PCR and ChIP-seq ("We used wild Atlantic salmon samples from Normandy (France) and rainbow trout samples from an INRAE selected strain"), so please can you clarify whether you needed ethical approval and/or field licences for these samples?

d) Please address my Data Policy requests below; specifically, we need you to supply the numerical values underlying Figs 1A (treefile), 2BC, 3ABCD, 4ABC, 5ABCD, S1 (treefile), S5BC, S6ABC, S7AB, S8ABC, S9AB, S10AB, S11, S12, S13, S14ABD, S15ABC, S16AB, S17, S18AB, S19AB, S20ABCD, S21ABC, S22, S23, S24ABC, S25AB, S28, S29ABCDEF, S30ABCDEFGHI, either as a supplementary data file or as a permanent DOI’d deposition. I note that you already have an associated Zenodo deposition (https://doi.org/10.5281/zenodo.11083953), but please could you clarify whether this contains all of the data and code needed to recreate the Figures?

e) Please cite the location of the data clearly in all relevant main and supplementary Figure legends, e.g. “The data underlying this Figure can be found in S1 Data” or “The data underlying this Figure can be found in https://zenodo.org/records/11083953"

f) Please make any custom code available, either as a supplementary file or as part of your Zenodo deposition.

As you address these items, please take this last chance to review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the cover letter that accompanies your revised manuscript.

We expect to receive your revised manuscript within two weeks.

To submit your revision, please go to https://www.editorialmanager.com/pbiology/ and log in as an Author. Click the link labelled 'Submissions Needing Revision' to find your submission record. Your revised submission must include the following:

- a cover letter that should detail your responses to any editorial requests, if applicable, and whether changes have been made to the reference list

- a Response to Reviewers file that provides a detailed response to the reviewers' comments (if applicable, if not applicable please do not delete your existing 'Response to Reviewers' file.)

- a track-changes file indicating any changes that you have made to the manuscript.

NOTE: If Supporting Information files are included with your article, note that these are not copyedited and will be published as they are submitted. Please ensure that these files are legible and of high quality (at least 300 dpi) in an easily accessible file format. For this reason, please be aware that any references listed in an SI file will not be indexed. For more information, see our Supporting Information guidelines:

https://journals.plos.org/plosbiology/s/supporting-information

*Published Peer Review History*

Please note that you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out. Please see here for more details:

https://plos.org/published-peer-review-history/

*Press*

Should you, your institution's press office or the journal office choose to press release your paper, please ensure you have opted out of Early Article Posting on the submission form. We ask that you notify us as soon as possible if you or your institution is planning to press release the article.

*Protocols deposition*

To enhance the reproducibility of your results, we recommend that if applicable you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols

Please do not hesitate to contact me should you have any questions.

Sincerely,

Roli Roberts

Roland Roberts, PhD

Senior Editor

rroberts@plos.org

PLOS Biology

------------------------------------------------------------------------

ETHICS STATEMENT:

-- Please include the full name of the IACUC/ethics committee that reviewed and approved the animal care and use protocol/permit/project license. Please also include an approval number.

-- Please include the specific national or international regulations/guidelines to which your animal care and use protocol adhered. Please note that institutional or accreditation organization guidelines (such as AAALAC) do not meet this requirement.

------------------------------------------------------------------------

DATA POLICY:

You may be aware of the PLOS Data Policy, which requires that all data be made available without restriction: http://journals.plos.org/plosbiology/s/data-availability. For more information, please also see this editorial: http://dx.doi.org/10.1371/journal.pbio.1001797

Note that we do not require all raw data. Rather, we ask that all individual quantitative observations that underlie the data summarized in the figures and results of your paper be made available in one of the following forms:

1) Supplementary files (e.g., excel). Please ensure that all data files are uploaded as 'Supporting Information' and are invariably referred to (in the manuscript, figure legends, and the Description field when uploading your files) using the following format verbatim: S1 Data, S2 Data, etc. Multiple panels of a single or even several figures can be included as multiple sheets in one excel file that is saved using exactly the following convention: S1_Data.xlsx (using an underscore).

2) Deposition in a publicly available repository. Please also provide the accession code or a reviewer link so that we may view your data before publication.

Regardless of the method selected, please ensure that you provide the individual numerical values that underlie the summary data displayed in the following figure panels as they are essential for readers to assess your analysis and to reproduce it: Figs 1A (treefile), 2BC, 3ABCD, 4ABC, 5ABCD, S1 (treefile), S5BC, S6ABC, S7AB, S8ABC, S9AB, S10AB, S11, S12, S13, S14ABD, S15ABC, S16AB, S17, S18AB, S19AB, S20ABCD, S21ABC, S22, S23, S24ABC, S25AB, S28, S29ABCDEF, S30ABCDEFGHI. NOTE: the numerical data provided should include all replicates AND the way in which the plotted mean and errors were derived (it should not present only the mean/average values).

IMPORTANT: Please also ensure that figure legends in your manuscript include information on where the underlying data can be found, and ensure your supplemental data file/s has a legend.

Please ensure that your Data Statement in the submission system accurately describes where your data can be found.

------------------------------------------------------------------------

CODE POLICY

Per journal policy, if you have generated any custom code during the course of this investigation, please make it available without restrictions. Please ensure that the code is sufficiently well documented and reusable, and that your Data Statement in the Editorial Manager submission system accurately describes where your code can be found.

Please note that we cannot accept sole deposition of code in GitHub, as this could be changed after publication. However, you can archive this version of your publicly available GitHub code to Zenodo. Once you do this, it will generate a DOI number, which you will need to provide in the Data Accessibility Statement (you are welcome to also provide the GitHub access information). See the process for doing this here: https://docs.github.com/en/repositories/archiving-a-github-repository/referencing-and-citing-content

------------------------------------------------------------------------

DATA NOT SHOWN?

- Please note that per journal policy, we do not allow the mention of "data not shown", "personal communication", "manuscript in preparation" or other references to data that is not publicly available or contained within this manuscript. Please either remove mention of these data or provide figures presenting the results and the data underlying the figure(s).

------------------------------------------------------------------------

REVIEWERS' COMMENTS:

Reviewer #1:

The authors have responded to all my concerns and suggestions.

Reviewer #2:

[identifies herself as Dr. Linda Odenthal-Hesse]

I have reviewed the revised manuscript "PBIOLOGY-D-24-00615_R2" titled "PRDM9 drives the location and rapid evolution of recombination hotspots in salmonids" by Marie Raynaud et al.

The authors have addressed my previous concerns and recommendations with great care, and I am fully satisfied with the revisions. I also feel that the comments made by other reviewers were carefully addressed. This is a well constructed paper and I recommend publication.

Reviewer #4:

The authors have satisfactorily addressed my initial comments. The manuscript is now much clearer, with more coherent argumentation and a stronger effort to discern the differences between species. I believe this paper makes a significant contribution to the field, and with a few minor revisions, it will be ready for publication. Minor comments.

* Line 257, shouldn't the percentage be given in % rather than as decimals?

* Will the sequences of the identified prdm9 alleles be reported in FASTA format? I can see in Zenodo some processed datasets such as VCF files, LD-based recombination maps and hotspots, PRDM9 protein sequences and multiple alignments, but not the allelic variants.

* Line 426, "On average, 90% of the total recombination appeared to be concentrated in 20% of the genome, a higher rate than what was observed in human and chimpanzee (29, 72) and slightly higher than what we observed in sea bass (Table 1, S12 Fig, S2 Table). This heterogeneity was largely driven by the presence of recombination hotspot" - From the text, I understand that the differences between sea bass and other salmonids are related to the presence of recombination hotspots and PRDM9. However, how do they explain why the recombination rate is higher in sea bass than in humans and chimpanzees?

* Line 418-419, there is an inconsistency between the recombination rates in the text and in the table 1 (either 0.012 or 0.0012)

* Figures are difficult to see, likely caused by the PDF format. Specifically, Figure 1A, which contains a lot of information and species names, is hard to read. Additionally, Figure 4, panels B and C, seem confusing. The blue and purple colors of the TSS in and out CGI bars are uniquely colored, but their legend is missing in the panels. Overall, the information and main point of Figure 4 is hard to interpret.

Decision Letter 3

Roland G Roberts

21 Nov 2024

Dear Bernard,

Thank you for the submission of your revised Research Article "PRDM9 drives the location and rapid evolution of recombination hotspots in salmonid fish" for publication in PLOS Biology. On behalf of my colleagues and the Academic Editor, Nick Barton, I'm pleased to say that we can in principle accept your manuscript for publication, provided you address any remaining formatting and reporting issues. These will be detailed in an email you should receive within 2-3 business days from our colleagues in the journal operations team; no action is required from you until then. Please note that we will not be able to formally accept your manuscript and schedule it for publication until you have completed any requested changes.

Please take a minute to log into Editorial Manager at http://www.editorialmanager.com/pbiology/, click the "Update My Information" link at the top of the page, and update your user information to ensure an efficient production process.

PRESS: We frequently collaborate with press offices. If your institution or institutions have a press office, please notify them about your upcoming paper at this point, to enable them to help maximise its impact. If the press office is planning to promote your findings, we would be grateful if they could coordinate with biologypress@plos.org. If you have previously opted in to the early version process, we ask that you notify us immediately of any press plans so that we may opt out on your behalf.

We also ask that you take this opportunity to read our Embargo Policy regarding the discussion, promotion and media coverage of work that is yet to be published by PLOS. As your manuscript is not yet published, it is bound by the conditions of our Embargo Policy. Please be aware that this policy is in place both to ensure that any press coverage of your article is fully substantiated and to provide a direct link between such coverage and the published work. For full details of our Embargo Policy, please visit http://www.plos.org/about/media-inquiries/embargo-policy/.

Thank you again for choosing PLOS Biology for publication and supporting Open Access publishing. We look forward to publishing your study. 

Sincerely, 

Roli

Roland G Roberts, PhD, PhD

Senior Editor

PLOS Biology

rroberts@plos.org

Associated Data


    Supplementary Materials

    S1 Methods. Molecular, genomic, and population genetics methods.

    (DOCX)

    pbio.3002950.s001.docx (158.6KB, docx)
    S1 Analysis. Prediction of CGI-associated TSSs in salmonids.

    Fig A. Relationship between base composition and DNA methylation level in promoter regions of coho salmon (Oncorhynchus kisutch). Fig B. Relationship between DNA methylation level in the promoter regions of coho salmon and the presence of a nearby fpCGI. Fig C. Relationship between base composition and H3K4me3 in promoter regions of the rainbow trout (Oncorhynchus mykiss). Fig D. Relationship between H3K4me3 in the promoter regions of the rainbow trout and the presence of a nearby fpCGI.

    (DOCX)

    pbio.3002950.s002.docx (448.8KB, docx)
    S1 Table. Chromosome location of the retained PRDM9 paralog copies.

    The index identifies the corresponding copy in the phylogeny of the ɑ paralog copies in Fig 1A and of the β copies in S1 Fig. We retrieved the location of the regions covering the 3 domains (KRAB, SSXRD, and SET) from the BLAST analysis. The start and end positions correspond to the start of the first BLAST-matched exon and the end of the last BLAST-matched exon.

    (DOCX)

    pbio.3002950.s003.docx (22KB, docx)
    S2 Table. Fine scale variations in recombination rates and raw recombination hotspots.

    Summary statistics of the variation in recombination rates smoothed in 2 kb sliding windows, and of the recombination hotspots retrieved from the inter-SNP recombination landscapes. Raw hotspots were defined as runs of consecutive inter-SNP windows with a recombination rate at least 5-fold higher than that of the 50 kb flanking regions.

    (DOCX)

    pbio.3002950.s004.docx (13.8KB, docx)
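The raw-hotspot rule in S2 Table can be sketched as follows. This is a minimal illustration, not the authors' pipeline. It assumes a single chromosome, SNP positions in bp, and one rate per inter-SNP interval in ρ/bp. It reads "flanking regions" as the length-weighted mean rate of all other inter-SNP intervals whose midpoints lie within 50 kb. The function name `call_hotspots` is hypothetical.

```python
from typing import List, Tuple


def call_hotspots(snp_pos: List[int], rho_per_bp: List[float],
                  flank: int = 50_000, fold: float = 5.0) -> List[Tuple[int, int]]:
    """Hypothetical sketch: flag inter-SNP intervals whose rate is >= `fold`
    times the length-weighted mean rate of the other intervals within `flank`
    bp, then merge runs of consecutive flagged intervals into hotspots.

    rho_per_bp[i] is the rate of the interval [snp_pos[i], snp_pos[i + 1]).
    """
    n = len(rho_per_bp)
    assert len(snp_pos) == n + 1, "need one more SNP than intervals"
    mids = [(snp_pos[i] + snp_pos[i + 1]) / 2 for i in range(n)]
    lengths = [snp_pos[i + 1] - snp_pos[i] for i in range(n)]

    flagged = []
    for i in range(n):
        num = den = 0.0
        for j in range(n):  # O(n^2): fine for a sketch, not for a whole genome
            if j != i and abs(mids[j] - mids[i]) <= flank:
                num += rho_per_bp[j] * lengths[j]
                den += lengths[j]
        background = num / den if den > 0 else 0.0
        flagged.append(background > 0 and rho_per_bp[i] >= fold * background)

    # Merge runs of consecutive flagged intervals into (start, end) hotspots.
    hotspots = []
    i = 0
    while i < n:
        if not flagged[i]:
            i += 1
            continue
        j = i
        while j + 1 < n and flagged[j + 1]:
            j += 1
        hotspots.append((snp_pos[i], snp_pos[j + 1]))
        i = j + 1
    return hotspots
```

Consider SNPs every 1 kb on a uniform background. Raising two adjacent intervals 10-fold yields one merged hotspot that spans both. The quadratic scan is there for readability. A genome-scale version would use prefix sums over positions.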
    S3 Table. Overlaps between DSB hotspots and TSS/TES regions.

    Overlaps were assessed between 400 bp windows centered on DSB hotspot centers and TSS/TES regions, defined as sequences within 1 kb of the transcription start/end site. The expected overlap was estimated as the probability that a 2 kb window overlaps a 400 bp DSB hotspot genome-wide.

    (DOCX)

    pbio.3002950.s005.docx (13.7KB, docx)
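One simple way to approximate the expected overlap in S3 Table is shown below. It assumes fixed-length hotspots placed independently and uniformly along the genome, and it ignores chromosome ends. A fixed window of length w overlaps a randomly placed interval of length h by at least 1 bp when the interval's start falls within a span of w + h − 1 positions. The function name and the independence assumption are illustrative and do not come from the source.

```python
def expected_overlap_probability(n_hotspots: int, hotspot_len: int,
                                 window_len: int, genome_size: int) -> float:
    """Hypothetical sketch: probability that one fixed window overlaps at least
    one of n randomly placed hotspots (independent, uniform placement)."""
    # Start positions of a hotspot that produce >= 1 bp of overlap with the window.
    p_single = (window_len + hotspot_len - 1) / genome_size
    return 1.0 - (1.0 - p_single) ** n_hotspots
```

For S3 Table's geometry, call `expected_overlap_probability(n, 400, 2000, G)` with the hotspot count n and the assembly size G of the sample being analysed.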
    S4 Table. Summary statistics of the LD-landscape reconstruction pipeline.

    Information is retrieved on the reference genome sizes, population sample sizes, mapping depth statistics, variant calling statistics, and effective sizes of genomic features for each population.

    (DOCX)

    pbio.3002950.s006.docx (14.3KB, docx)
    S5 Table. Comparison of fine-scale recombination landscapes across vertebrates.

    (DOCX)

    pbio.3002950.s007.docx (45.1KB, docx)
    S6 Table. Presence of Zcwpw1, Zcwpw2, Tex15, and Fbox47 genes in the genomes of analyzed species.

    (DOCX)

    pbio.3002950.s008.docx (14.7KB, docx)
    S7 Table. List of genotyped samples.

    Details about the Salmo salar and Oncorhynchus mykiss samples used in this study and the corresponding Prdm9ɑ genotypes identified.

    (DOCX)

    pbio.3002950.s009.docx (15.3KB, docx)
    S8 Table. List of primers.

    The primers used in this study to genotype the zinc finger array of Prdm9ɑ in Salmo salar and Oncorhynchus mykiss.

    (DOCX)

    pbio.3002950.s010.docx (13.3KB, docx)
    S9 Table. List of ChIP-seq experiments performed.

    Details of the ChIP-seq experiments performed on Oncorhynchus mykiss testes. In the third column, DMC1-R1 and DMC1-GP refer to the animal used to raise the antibody: rabbit individual 1 and guinea pig, respectively.

    (DOCX)

    pbio.3002950.s011.docx (14.4KB, docx)
    S10 Table. Sample accession number and location.

    Population samples of O. kisutch, O. mykiss, and S. salar used to build the linkage disequilibrium-based recombination landscapes were selected from [118–120].

    (DOCX)

    pbio.3002950.s012.docx (32.5KB, docx)
    S11 Table. Program versions.

    Details of the program versions used at each step of the reconstruction of LD-based recombination landscapes in the 5 salmonid populations. * The D. labrax data set was taken from [128], who used a reference panel of 22 genomes fully phased-by-transmission using trio-sequencing as a learning reference for the statistical phasing of 46 additional genomes with Eagle2 v2.4. Variants were oriented using whole-genome resequencing data (>20×) from the closely related species Dicentrarchus punctatus, which was used as an outgroup.

    (DOCX)

    pbio.3002950.s013.docx (27.7KB, docx)
    S12 Table. Assembly statistics of the reference genome of O. kisutch, O. mykiss, and S. salar that were used to map the population resequencing data for the LD-based recombination landscapes and the ChIP-Seq DMC1 peaks.

    (DOCX)

    pbio.3002950.s014.docx (13.1KB, docx)
    S13 Table. Mean sequence identity score obtained from the blast search of the 100 kb flanking sequences of the variants in the ingroup species against the reference genome of each outgroup.

    Outgroups 1, 2, and 3 for O. kisutch were O. tshawytscha, O. nerka, and O. mykiss, respectively; O. tshawytscha, O. nerka, and O. kisutch for O. mykiss; and Salmo trutta, Salvelinus alpinus, and O. mykiss for S. salar.

    (DOCX)

    pbio.3002950.s015.docx (13.4KB, docx)
    S1 Fig. Phylogenetic distribution of PRDM9β paralogs in 12 salmonids, the northern pike (Esox lucius) and the European sea bass (Dicentrarchus labrax) as outgroup species.

    The phylogenetic tree was built from the 3 concatenated exons of the SET domain, with 1,000 bootstrap replicates (values shown at nodes). To facilitate visualization, the branch of D. labrax is not drawn to scale. Columns, from left to right: (i) species; (ii) annotated paralog copy; (iii) Prdm9 copy status. The scale bar is in substitutions per site. The right panel shows the coding potential of each paralog and indicates the presence of substitutions in the catalytic tyrosines of the SET domain (Y276, Y341, and Y357). Canonical (full-length) PRDM9 proteins contain 4 key domains: KRAB (encoded by 2 exons), SSXRD (encoded by 1 exon), SET (encoded by 3 exons), and the ZF array (encoded by 1 exon). Complete exons are shown in blue; missing or truncated exons are shown in pink. Other regions of the protein (upstream of the KRAB domain and between KRAB and SSXRD) are encoded by additional exons (not shown here) that are not conserved between the α and β clades. All β copies have lost the KRAB and SSXRD domains and carry substitutions in at least 2 of the 3 catalytic tyrosines of the SET domain. β copies are well conserved across all species (including in the ZF array). This indicates that these truncated PRDM9 homologs are under purifying selection and hence that they have a function. The last column gives indexes referring to S1 Table, which provides additional information on the corresponding copy. The data and codes underlying this figure can be found in https://doi.org/10.5281/zenodo.11083953.

    (DOCX)

    pbio.3002950.s016.docx (309.4KB, docx)
    S2 Fig. Chromosome position of the tandem duplicated PRDM9α a and b copies.

    Relative position and orientation of the a (red) and b (blue) tandem duplicated copies of the PRDM9α1.1 and α1.2 paralogs in each species. The name of the chromosome/scaffold on which each copy sits is shown. α1.1 and α1.2, which occur as single copies in some species, are also shown (gray). The data underlying this figure can be found in S1 Table.

    (DOCX)

    pbio.3002950.s017.docx (228.5KB, docx)
    S3 Fig. Amino acid diversity in full-length and partial PRDM9 zinc fingers in S. salar and O. mykiss.

    Amino acid sequences of all unique zinc fingers found in alleles identified in S. salar PRDM9α1.a.2 and α2.2, and in O. mykiss PRDM9α1.a.1 and α2.2 (Figs 2A and S5). Bold colored boxes indicate the 3 hypervariable DNA-binding residues. Cysteine (C) and histidine (H) residues involved in stabilizing the structure of the array are shown in red. Residues that are polymorphic relative to the consensus, outside the 3 amino acids in contact with DNA, are shown in blue. Synonymous variations relative to the consensus are shaded in gray. Complementary information on the DNA sequences of all identified alleles is available in S1 Methods.

    (DOCX)

    pbio.3002950.s018.docx (631.7KB, docx)
    S4 Fig. Graphical view of PRDM9 paralogs.

    Cartoon showing the functional domains of the PRDM9 paralogs analyzed in this study. The amino acid sequences were obtained from the reference genome and analyzed using previously described methodology [144]. The α1 copies and the O. mykiss α2.2 copy possess a complete KRAB domain, and we refer to these copies as canonical PRDM9. The S. salar α2.2 copy possesses a partial KRAB domain, and we refer to this copy as truncated PRDM9. All 4 copies carry the 3 catalytic tyrosine residues in the SET domain that are required for methyltransferase activity.

    (DOCX)

    pbio.3002950.s019.docx (245.6KB, docx)
    S5 Fig. PRDM9α2.2 zinc finger allelic diversity in S. salar and O. mykiss.

    (A) Structure of the PRDM9 zinc finger arrays of alleles identified in S. salar PRDM9α2.2 and O. mykiss PRDM9α2.2. Colored boxes represent unique zinc fingers, characterized by the 3 amino acids in contact with DNA (3-letter code). Additional variations relative to a reference sequence are indicated in brackets. A white star indicates zinc fingers missing one amino acid residue (27 a.a. instead of 28). The complete zinc finger amino acid sequences are shown in S3 Fig. (B) Frequencies of the alleles displayed in panel A among the 20 S. salar and 20 O. mykiss individuals genotyped for PRDM9. (C) Distribution of amino acid diversity among all unique zinc fingers found in the alleles displayed in panel A, following previously described methodology [19]. Amino acid diversity is plotted as a function of amino acid position in the ZF alignment, from position 1 to position 28 (the first and last residues) of a ZF unit. The ratio of amino acid diversity at the DNA-binding residues of the ZF array (−1, 2, 3, and 6), denoted r, is shown in the upper box. The data underlying this figure can be found in https://doi.org/10.5281/zenodo.11083953 and in S7 Table.

    (DOCX)

    pbio.3002950.s020.docx (169.7KB, docx)
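A per-position diversity profile like the one in S5C Fig can be sketched as below. Two definitions here are assumptions; the methodology actually followed is that of reference [19]:

- Diversity at a column is 1 − Σ p_a² over the unique ZF units, i.e., the probability that two units drawn at random differ at that position.
- r is the mean diversity at the DNA-binding columns divided by the mean diversity elsewhere.

Mapping the helix positions −1, 2, 3, and 6 onto 0-based columns of the 28-residue alignment is left to the caller. The function names are hypothetical.

```python
from collections import Counter
from typing import List, Sequence


def column_diversity(zf_units: Sequence[str]) -> List[float]:
    """Assumed measure: per-column diversity 1 - sum(p_a^2) over aligned,
    equal-length ZF units (p_a = frequency of residue a in that column)."""
    width = len(zf_units[0])
    assert all(len(z) == width for z in zf_units), "ZF units must be aligned"
    diversity = []
    for col in range(width):
        counts = Counter(z[col] for z in zf_units)
        total = sum(counts.values())
        diversity.append(1.0 - sum((c / total) ** 2 for c in counts.values()))
    return diversity


def diversity_ratio(diversity: Sequence[float], binding_cols: Sequence[int]) -> float:
    """Assumed r: mean diversity at DNA-binding columns / mean diversity elsewhere."""
    binding = set(binding_cols)
    at_binding = [diversity[c] for c in binding]
    elsewhere = [d for i, d in enumerate(diversity) if i not in binding]
    mean_elsewhere = sum(elsewhere) / len(elsewhere)
    if mean_elsewhere == 0:
        return float("inf")
    return (sum(at_binding) / len(at_binding)) / mean_elsewhere
```

A ratio well above 1 means variation concentrates at the DNA-contacting residues. That is the pattern expected if the ZF array is diversifying under positive selection on its binding specificity.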
    S6 Fig. Features of meiotic DSB hotspots in O. mykiss.

    (A) Average profile of the strand orientation of DMC1 ChIP-seq ssDNA fragments (fragments per million, FPM) in TAC-1, TAC-3, and RT-52 testes, at DSB hotspots detected in TAC-1, TAC-3, and RT-52. The profile of each experiment is shown (2 replicates per sample). Signal mapped to the forward strand is shown in blue and signal mapped to the reverse strand in green, as illustrated in the cartoon at the top of the panel. (B) UpSet plot showing intersections between DSB hotspots from TAC-1 (n = 616), TAC-3 (n = 209), and RT-52 (n = 1,924). (C) DMC1 ChIP-seq signal fold enrichment (scaled by the average signal in intergenic regions) at multiple genomic features. TSSs inside and outside CGIs are highlighted in purple and turquoise, respectively. The data and codes underlying this figure can be found in https://doi.org/10.5281/zenodo.11083953 and https://zenodo.org/records/14198863.

    (DOCX)

    pbio.3002950.s021.docx (219.7KB, docx)
    S7 Fig. Distribution of DSB hotspots along chromosomes in O. mykiss.

    (A) Distribution of DSB hotspots from TAC-1, TAC-3, and RT-52 along chromosomes (bins of 1/30 of chromosome length). (B) Average profile and heatmap of the strand orientation of DMC1 ChIP-seq ssDNA fragments (fragments per million, FPM) in TAC-1, TAC-3, and RT-52 testes, at DSB hotspots shared by pairs of samples. Shared DMC1 peaks: TAC-1 intersecting RT-52 (n = 167), TAC-1 intersecting TAC-3 (n = 55), and RT-52 intersecting TAC-3 (n = 42). The plots show one replicate for each experiment (replicate 1). Signal mapped to the forward strand is shown in blue; signal mapped to the reverse strand is shown in green. The data and codes underlying this figure can be found in https://doi.org/10.5281/zenodo.11083953 and https://zenodo.org/records/14198863.

    (DOCX)

    pbio.3002950.s022.docx (565.8KB, docx)
    S8 Fig. Histone modification signal at H3K4me3 peaks and at DSB hotspots.

    (A) Average profile and heatmap of H3K36me3 ChIP-seq signal in TAC-1 (blue) and TAC-3 (green) testes, at H3K4me3 peaks detected in brain (Aqua-FAANG). (B) Average profile and heatmap of H3K4me3 (left) and H3K36me3 (right) ChIP-seq signal in TAC-1 testes, at DSB hotspots detected in TAC-1 (blue), TAC-3 (cyan), and RT-52 (yellow). (C) Average profile and heatmap of H3K4me3 (left) and H3K36me3 (right) ChIP-seq signal in TAC-3 testes, at DSB hotspots detected in TAC-1 (blue), TAC-3 (cyan), and RT-52 (yellow). The data underlying this figure can be found in https://doi.org/10.5281/zenodo.11083953.

    (DOCX)

    pbio.3002950.s023.docx (1.9MB, docx)
    S9 Fig. Correlation of H3K4me3 and H3K36me3 signal at RT-52 hotspots.

    (A) Scatterplots showing H3K4me3 and H3K36me3 ChIP-seq signal in TAC-1 and TAC-3 testes, at RT-52 DSB hotspots. (B) Left panels, scatterplots representing H3K4me3 (top) or H3K36me3 (bottom) ChIP-seq signal in TAC-1 and TAC-3 testes, at RT-52 DSB hotspots. Right panels, numbers of RT-52 hotspots with H3K4me3 (top) or H3K36me3 (bottom) ChIP-seq signal under or above 1 in TAC-1 and TAC-3. Chi-square test of homogeneity. The data underlying this figure can be found in https://doi.org/10.5281/zenodo.11083953.

    (DOCX)

    pbio.3002950.s024.docx (149.1KB, docx)
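The chi-square test of homogeneity in S9B Fig compares TAC-1 with TAC-3. For each sample, it counts the RT-52 hotspots with signal below versus above 1. A minimal 2 × 2 version is sketched below. The row/column layout and the function name are assumptions. No Yates continuity correction is applied; some libraries apply it by default for 2 × 2 tables, which changes the p-value.

```python
import math
from typing import Sequence, Tuple


def chi2_homogeneity_2x2(table: Sequence[Sequence[int]]) -> Tuple[float, float]:
    """Pearson chi-square test of homogeneity for a 2x2 count table, e.g.
    rows = samples (TAC-1, TAC-3), columns = hotspots with signal < 1 / >= 1.
    No continuity correction. Returns (statistic, p-value) with 1 degree of freedom."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    assert all(row_totals) and all(col_totals), "margins must be non-zero"
    stat = 0.0
    for i, row in enumerate(((a, b), (c, d))):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    # With 1 degree of freedom, X = Z^2, so P(X > stat) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value
```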
    S10 Fig. Broad scale recombination rate variations along the genome.

    Recombination rates were averaged into percentiles of chromosome length and scaled by the genomic mean for (A) O. kisutch (in orange), O. mykiss (in green), and S. salar (in blue, only the NS population is shown); and (B) D. labrax (in red) and the three-spined stickleback Gasterosteus aculeatus (in black, data from [61]). The data and codes underlying this figure can be found in https://doi.org/10.5281/zenodo.11083953.

    (DOCX)

    pbio.3002950.s025.docx (239.6KB, docx)
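The "percentiles of chromosome length, scaled by the genomic mean" transformation used in S10 Fig (and S18B Fig) can be sketched for one chromosome as follows. Profiles from different chromosomes would then be averaged bin by bin. The input layout (window position, rate) and the function name are assumptions.

```python
import math
from typing import List, Sequence, Tuple


def percentile_profile(windows: Sequence[Tuple[int, float]], chrom_len: int,
                       genome_mean: float, n_bins: int = 100) -> List[float]:
    """Hypothetical sketch: average window rates into n_bins equal fractions of
    one chromosome and divide by the genome-wide mean rate.
    Empty bins are returned as NaN."""
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for pos, rate in windows:
        # Integer arithmetic avoids float rounding pushing a window into the wrong bin.
        b = min(pos * n_bins // chrom_len, n_bins - 1)
        sums[b] += rate
        counts[b] += 1
    return [(s / k) / genome_mean if k else math.nan for s, k in zip(sums, counts)]
```

Scaling by the genomic mean makes chromosomes and species with different average rates directly comparable. A value of 1 means "as recombinogenic as the genome average".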
    S11 Fig. Fine-scale recombination landscapes of O. kisutch (in orange), O. mykiss (in green), and S. salar (in blue, only the NS population is shown), with recombination rates smoothed in 2 kb sliding windows.

    The data and codes underlying this figure can be found in https://doi.org/10.5281/zenodo.11083953.

    (DOCX)

    pbio.3002950.s026.docx (269.3KB, docx)
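Smoothing inter-SNP rates into 2 kb sliding windows, as in S11 Fig and S2 Table, amounts to a length-weighted average of the intervals that each window overlaps. The 1 kb step below is an assumption, because the legends give the window size but not the step. The function name is hypothetical.

```python
from typing import List, Sequence, Tuple


def smooth_rates(snp_pos: Sequence[int], rho_per_bp: Sequence[float],
                 window: int = 2000, step: int = 1000) -> List[Tuple[int, int, float]]:
    """Hypothetical sketch: length-weighted mean rate (rho/bp) of the inter-SNP
    intervals overlapping each sliding window placed between the first and last SNP."""
    assert len(snp_pos) == len(rho_per_bp) + 1
    out = []
    w_start = snp_pos[0]
    while w_start + window <= snp_pos[-1]:
        w_end = w_start + window
        weighted, covered = 0.0, 0
        for i, rate in enumerate(rho_per_bp):
            overlap = min(w_end, snp_pos[i + 1]) - max(w_start, snp_pos[i])
            if overlap > 0:
                weighted += rate * overlap
                covered += overlap
        # Windows lie within the SNP span, so `covered` always equals `window` here.
        out.append((w_start, w_end, weighted / covered))
        w_start += step
    return out
```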
    S12 Fig. Proportion of recombination according to proportion of the genome for O. kisutch (orange), O. mykiss (green), S. salar (shades of blue), and D. labrax (gold).

    The data and codes underlying this figure can be found in https://doi.org/10.5281/zenodo.11083953.

    (DOCX)

    pbio.3002950.s027.docx (132.9KB, docx)
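The curve in S12 Fig can be summarized by asking what fraction of the genome carries a given share of all recombination. An example is the "90% of recombination in 20% of the genome" figure discussed in the review. The sketch below assumes equal-length windows, so the recombination carried by a window is proportional to its rate. The function name is hypothetical.

```python
from typing import Sequence


def genome_fraction_for(rates: Sequence[float], target: float = 0.9) -> float:
    """Hypothetical sketch: smallest fraction of equal-length windows that together
    carry `target` of the total recombination, taking the hottest windows first."""
    total = sum(rates)
    cumulative = 0.0
    for k, rate in enumerate(sorted(rates, reverse=True), start=1):
        cumulative += rate
        if cumulative >= target * total:
            return k / len(rates)
    return 1.0
```

A perfectly uniform landscape returns `target` itself (0.9 for 90%). The smaller the returned fraction, the more concentrated recombination is in hotspots.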
    S13 Fig. Proportion of hotspots (in %) according to raw hotspot size.

    Hotspots were defined as runs of consecutive inter-SNP windows (each spanning 2 adjacent SNPs) in which the recombination rate is at least 5-fold higher than in the 50 kb flanking regions. The data and codes underlying this figure can be found in https://doi.org/10.5281/zenodo.11083953.

    (DOCX)

    pbio.3002950.s028.docx (120.5KB, docx)
    S14 Fig. Comparison between the LD-based recombination landscape and the ChIP-Seq DMC1 map of the rainbow trout O. mykiss.

    (A) Venn diagram showing the percentage of peaks shared between the ChIP-Seq peaks of the pooled samples (brown) and the LD-based hotspots (green). The percentage was calculated using the number of DMC1 peaks as the denominator. (B) Random expected (blue) and observed (orange) numbers of peaks shared between the LD and ChIP-Seq maps. (C) Recombination rates at the syntenic locations of the ChIP-Seq peaks, in the LD hotspots, in the shared ChIP-Seq and LD windows (i.e., the 116 ChIP-Seq peaks shared with LD hotspots), and in the background landscape (i.e., genomic windows containing neither an LD hotspot nor a ChIP-Seq peak). The data and codes underlying this figure can be found in https://doi.org/10.5281/zenodo.11083953.

    (DOCX)

    pbio.3002950.s029.docx (85.9KB, docx)
    S15 Fig. Broad scale variation in genomic variables according to recombination rates.

    (A) SNP density (per kb). (B) GC content. (C) TE density (per 100 kb). Recombination rates, SNP density, GC content, and TE density were averaged in 100 kb sliding windows. Significant Spearman's rank correlations (p < 0.05) are indicated by an asterisk. The vertical dashed line shows the mean recombination rate and the horizontal dashed line the mean of the y variable. The data and codes underlying this figure can be found in https://doi.org/10.5281/zenodo.11083953.

    (DOCX)

    pbio.3002950.s030.docx (637.6KB, docx)
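The Spearman rank correlations tested in S15 Fig (and in S21 and S28 Figs) can be computed as the Pearson correlation of the ranks, with tied values given their average rank. The sketch below returns only ρ. A p-value would need a separate step, such as a permutation test or a t approximation. The function names are hypothetical.

```python
import math
from typing import List, Sequence


def average_ranks(x: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    ranks = [0.0] * len(x)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and x[order[j + 1]] == x[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho as the Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)
```

Rank correlation suits these windowed data because recombination rates are strongly skewed by hotspots. A few extreme windows would dominate a Pearson correlation on the raw values.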
    S16 Fig. Genetic diversity and base composition at recombination hotspots.

    (A) SNP density (per kb) and (B) GC content according to the distance to the nearest recombination hotspot. SNP density, GC content, and recombination rates were averaged in 2 kb windows. Colored (orange, green, and blue) dashed lines show the mean of the y variable at hotspots of the corresponding populations; the black dashed line is the genomic mean (outside hotspots). Loess curves are shown for a span of 0.5. The data and codes underlying this figure can be found in https://doi.org/10.5281/zenodo.11083953.

    (DOCX)

    pbio.3002950.s031.docx (238.1KB, docx)
    S17 Fig. Average recombination rates in TE families.

    Tc1-mariner, a family of LTR, is shown. The data and codes underlying this figure can be found in https://doi.org/10.5281/zenodo.11083953.

    (DOCX)

    pbio.3002950.s032.docx (106.7KB, docx)
    S18 Fig. Inter- and intra-chromosome variation in recombination rates in residual tetraploid chromosomes.

    (A) Average recombination rate per chromosome. Gray bars indicate chromosomes without residual tetrasomic regions (2N), and yellow bars indicate chromosomes with residual tetrasomic regions (4N), in the 5 Oncorhynchus and Salmo populations. The gray dashed line represents the genome-averaged recombination rate of the 2N chromosomes, and the yellow line that of the 4N chromosomes. Recombination rates are significantly higher in 4N chromosomes than in 2N chromosomes in the O. mykiss population (Student's t-test, t(13.258) = −3.9404, p < 0.05), but not in O. kisutch (Student's t-test, t(25.867) = −0.88786, p > 0.05) nor in the S. salar populations (Student's t-test, t(23.519) = −0.0026857, p > 0.05 for GP, t(20.99) = −2.2572, p < 0.05 for BS, and t(19.677) = −1.9741, p = 0.06258 for NS). (B) Recombination rates along the genome. Recombination rates were averaged into percentiles of chromosome length and scaled by the genomic mean. Colors as in panel A. The data and codes underlying this figure can be found in https://doi.org/10.5281/zenodo.11083953.

    (DOCX)

    pbio.3002950.s033.docx (464.1KB, docx)
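S18 Fig reports non-integer degrees of freedom, for example t(13.258). This suggests that the unequal-variance (Welch) form of the t-test was used, but that is an inference from the reported df, not a statement in the source. The statistic and the Welch–Satterthwaite degrees of freedom can be sketched as below. The p-value is omitted because it requires the t distribution. The function name is hypothetical.

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def welch_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom
    (a negative t means the mean of `a` is below the mean of `b`)."""
    va = variance(a) / len(a)  # squared standard error of the mean of a
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df
```

Passing the 2N chromosome means as `a` and the 4N means as `b` gives the sign convention of the legend, where a negative t indicates higher rates on 4N chromosomes.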
    S19 Fig. Recombination rates at genomic features in residual tetraploid chromosomes.

    (A) Fold recombination rates (scaled by the average recombination rate at 50 kb from the nearest feature) according to the distance to the nearest promoter-like feature (i.e., TSSs overlapping or not overlapping a CGI). Recombination rates in chromosomes without residual tetraploid regions are shown by continuous lines, and those in 4N chromosomes by dashed lines. (B) Fold recombination rates (scaled by the average recombination rates in intergenic regions) in genomic features, in 2N (gray) and 4N (yellow) chromosomes. The horizontal line shows the intergenic recombination level. TSS and TES were defined as the first and last positions of genes. CGIs were mapped with EMBOSS using CpGoe > 0.6 and GC > 0. The data and codes underlying this figure can be found in https://doi.org/10.5281/zenodo.11083953.

    (DOCX)

    pbio.3002950.s034.docx (244.2KB, docx)
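The CGI criteria in S19 Fig are a CpG observed/expected ratio and a GC threshold. The sketch below scans fixed windows using the common definition o/e = n_CpG × window / (n_C × n_G). It approximates the EMBOSS implementation but is not it. Its assumptions:

- The 100 bp window size.
- Merging passing windows into islands (and any minimum island length) is omitted.
- The GC threshold is a required argument because its value is truncated in the legend. The 0.5 used in the test is purely illustrative.

The function name is hypothetical.

```python
from typing import List


def cpg_window_starts(seq: str, *, min_gc: float, window: int = 100,
                      min_oe: float = 0.6) -> List[int]:
    """Hypothetical sketch: start positions of windows with CpG observed/expected
    > min_oe and GC fraction > min_gc, where o/e = n_CpG * window / (n_C * n_G)."""
    seq = seq.upper()
    starts = []
    for start in range(len(seq) - window + 1):
        w = seq[start:start + window]
        n_c, n_g = w.count("C"), w.count("G")
        n_cpg = w.count("CG")  # "CG" cannot overlap itself, so str.count is exact
        gc = (n_c + n_g) / window
        oe = n_cpg * window / (n_c * n_g) if n_c and n_g else 0.0
        if oe > min_oe and gc > min_gc:
            starts.append(start)
    return starts
```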
    S20 Fig. Significance of hotspots sharing between closely related populations.

    Random expectations (blue) and observed values (orange) of shared hotspots between (A) O. kisutch and O. mykiss; between S. salar populations (B) GP and BS; (C) GP and NS; and (D) BS and NS. Shared hotspots were defined as 2 kb hotspots overlapping by at least 1 bp. The percentage shared is calculated using the number of hotspots in the species/population with fewer hotspots as the denominator. The expected distribution of shared hotspots was obtained from 1,000 pairwise comparisons of random spots. The data and codes underlying this figure can be found in https://doi.org/10.5281/zenodo.11083953.

    (DOCX)

    pbio.3002950.s035.docx (134KB, docx)
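The comparison of observed and random sharing in S20 Fig (and S14B Fig) can be sketched as a permutation test. First, count the hotspots in the smaller set that overlap at least 1 bp of a hotspot in the other set. Then repeat the count with the same numbers of 2 kb intervals placed uniformly at random. This sketch assumes a single contiguous genome, a fixed seed, and an add-one empirical p-value. All function names are hypothetical.

```python
import bisect
import random
from typing import List, Sequence, Tuple

Interval = Tuple[int, int]  # half-open [start, end)


def count_shared(a: Sequence[Interval], b: Sequence[Interval]) -> int:
    """Number of intervals in `a` overlapping at least 1 bp of any interval in `b`."""
    b_sorted = sorted(b)
    starts = [s for s, _ in b_sorted]
    max_end, running = [], float("-inf")
    for _, e in b_sorted:  # prefix maximum of end coordinates
        running = max(running, e)
        max_end.append(running)
    shared = 0
    for s, e in a:
        idx = bisect.bisect_left(starts, e)  # b intervals starting before e
        if idx > 0 and max_end[idx - 1] > s:
            shared += 1
    return shared


def random_intervals(n: int, length: int, genome_size: int,
                     rng: random.Random) -> List[Interval]:
    return [(st, st + length)
            for st in (rng.randrange(genome_size - length + 1) for _ in range(n))]


def sharing_test(a: Sequence[Interval], b: Sequence[Interval], genome_size: int,
                 length: int = 2000, n_rep: int = 1000, seed: int = 1):
    """Observed shared count, percent shared (smaller set as denominator), and
    empirical p-value against random placements of the same numbers of hotspots."""
    rng = random.Random(seed)
    small, large = sorted((a, b), key=len)
    observed = count_shared(small, large)
    null = [count_shared(random_intervals(len(small), length, genome_size, rng),
                         random_intervals(len(large), length, genome_size, rng))
            for _ in range(n_rep)]
    p_value = (1 + sum(x >= observed for x in null)) / (n_rep + 1)
    return observed, 100.0 * observed / len(small), p_value
```

Counting from the smaller set keeps the percentage bounded by 100%, matching the legend's choice of denominator.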
    S21 Fig. Pairwise comparison of 100 kb smoothed recombination maps between S. salar populations.

    (A) Comparison between GP and BS populations. (B) Comparison between GP and NS populations. (C) Comparison between BS and NS populations. Spearman’s rank test p-value <0.05. Loess curves are shown for a span of 0.7. The data and codes underlying this figure can be found in https://doi.org/10.5281/zenodo.11083953.

    (DOCX)

    pbio.3002950.s036.docx (328KB, docx)
    S22 Fig. DSB and LD hotspots are enriched in PRDM9 allele-specific motifs.

    Frequency of sequences with at least one hit for PRDM9 allele 1 (left) and allele 2 (right) motifs at allele 1 and allele 2 sites, RT-52, TAC-1 and TAC-3 DSB hotspots, LD-hotspots and control sites. Fold enrichment relative to the control sites is shown on top of each column. The associated p-values indicate significant differences in fold enrichment relative to the control (Fisher exact test). “NS” indicates not significant (p > 0.05). The data and codes underlying this figure can be found in https://doi.org/10.5281/zenodo.11083953.

    (DOCX)

    pbio.3002950.s037.docx (119.8KB, docx)
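The fold enrichments and Fisher exact tests in S22 Fig compare the fraction of sequences with at least one motif hit in a hotspot set against the control set. The sketch below lays the table out as [[hotspots with hit, hotspots without hit], [controls with hit, controls without hit]]. It uses the two-sided convention that sums all tables no more probable than the observed one. The layout, the convention, and the function names are assumptions.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact test for [[a, b], [c, d]]: sums the hypergeometric
    probabilities of all tables with the same margins that are no more likely
    than the observed table."""
    r1, r2, c1 = a + b, c + d, a + c
    total = comb(r1 + r2, c1)

    def prob(x: int) -> float:
        return comb(r1, x) * comb(r2, c1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, c1 - r2), min(r1, c1)
    # Small relative tolerance so tables tied with the observed one are included.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


def fold_enrichment(a: int, b: int, c: int, d: int) -> float:
    """Fraction of test sequences with a hit divided by the fraction of controls with a hit."""
    return (a / (a + b)) / (c / (c + d))
```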
    S23 Fig. DSB and LD-based hotspots are enriched in PRDM9 allele-specific motifs.

    Positional distribution of hits for Prdm9 allele 1 (pink) and allele 2 (green) motifs in RT-52, TAC-1, and TAC-3 DSB hotspots, the strongest LD hotspots (n = 5,000), and control sites (n = 5,000). The distribution is shown from the center of each sequence, over a range of ±2.5 kb for the DSB hotspots and the control sites. The LD hotspots were centered on the SNP interval showing the highest recombination rate (ρ/bp), and the distribution extends up to 7.5 kb from this refined center. The signal is smoothed by a weighted moving average. Hits were counted in 750 bp windows for the LD hotspots and in 250 bp windows for all other sequences. The statistical significance of motif enrichment, adjusted for multiple tests, is shown (one-tailed binomial test). "ns" indicates non-significant enrichment (p > 0.05). The data underlying this figure can be found in https://doi.org/10.5281/zenodo.11083953.

    (DOCX)

    pbio.3002950.s038.docx (583.5KB, docx)
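A one-tailed binomial test of positional enrichment, as in S23 Fig, can be sketched as the upper-tail probability of observing at least k hits in a given bin out of n hits in total. The null probability p is an assumption here. One option is the bin width divided by the total span analysed, which is what uniformly distributed hits would give. The multiple-testing adjustment used in the figure is not specified and is not reproduced. The function name is hypothetical.

```python
from math import comb


def binom_upper_tail(k: int, n: int, p: float) -> float:
    """One-tailed binomial p-value P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))
```

For instance, take hypothetical counts of 40 hits in the central 250 bp bin out of 400 hits spread over ±2.5 kb. The call would then be `binom_upper_tail(40, 400, 250 / 5000)`.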
    S24 Fig. Motif enrichment at population-specific and shared recombination hotspots in S. salar populations.

    (A) Average recombination rate at motifs found enriched at hotspots. Yellow boxes show motifs that are found in at least 5% of hotspots and show a 2-fold enrichment compared to the control set of random spots. (B) Average recombination rate in hotspots containing the retained motifs from panel A (with the corresponding motifs shown) compared to hotspots not containing the retained motifs. Significant Student's t-tests are indicated (***, p < 0.05). (C) Percentage of hotspots containing the retained motifs shown in yellow. The data and codes underlying this figure can be found in https://doi.org/10.5281/zenodo.11083953.

    (DOCX)

    pbio.3002950.s039.docx (279.9KB, docx)
    S25 Fig. Genome wide distribution of PRDM9α motifs along chromosomes in O. mykiss.

    (A) Distribution of PRDM9 allele 1 (n = 68,047) and allele 2 (n = 59,986) motifs along chromosomes in the rainbow trout genome (bins of 1/30 of chromosome length). (B) Distribution of motifs enriched in the hotspots shared between the BS and NS populations of the Atlantic salmon (n = 936).

    (DOCX)

    pbio.3002950.s040.docx (201.6KB, docx)
    S26 Fig. Histology and immunostaining of trout gonads.

    (A) Hematoxylin-eosin-stained histological sections of testes from the O. mykiss samples used in this study. In TAC-1, TAC-3, and RT-52, the seminiferous tubules were filled with round cells: mostly primary spermatocytes (Sc), some spermatids (ST), few spermatogonia (Sg), and almost no visible mature spermatozoa (Sz). For meiotic cells, the substage of prophase I is indicated in brackets: leptotene (L), zygotene (Z), diplotene (D), and diakinesis (DK). Scale bars are 20 μm. (B) Immunofluorescence of SYCP3, SMC3, and DMC1 in testis sections from a stage III O. mykiss sample not used for ChIP in this study. Scale bars are 10 μm.

    (DOCX)

    pbio.3002950.s041.docx (2.5MB, docx)
    S27 Fig. Sample location.

    The 20 O. kisutch individuals were sampled in the Columbia River (orange) [118], the 22 O. mykiss samples come from North American rivers (green) [119], and the 60 S. salar individuals were sampled in Canada and Norway [120]. Based on population structure analysis, we subdivided the Atlantic salmon samples into 3 populations (shades of blue): Gaspesie-Anticosti (GP), Barents Sea (BS), and North Sea (NS). The basemap shapefile used in this figure was derived from the CIA World DataBank II, accessed via the mapdata package in R.

    (DOCX)

    pbio.3002950.s042.docx (225.5KB, docx)
    S28 Fig. Pairwise correlation between the 5 independent runs of LDhelmet.

    Spearman’s rank correlation matrices for each of the 5 populations (p < 0.05). The data and codes underlying this figure can be found in https://doi.org/10.5281/zenodo.11083953.

    (DOCX)

    pbio.3002950.s043.docx (127KB, docx)
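    The run-to-run comparison in the S28 Fig caption can be sketched with a stdlib Spearman correlation. This assumes, hypothetically, that each LDhelmet run has been summarized as a list of recombination-rate estimates over the same SNP intervals; it is an illustration of Spearman's rank correlation (Pearson correlation of average ranks), not the authors' script. Constant input vectors would make the correlation undefined and raise ZeroDivisionError.

```python
from math import sqrt


def average_ranks(values):
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def correlation_matrix(runs):
    """Pairwise Spearman correlations between runs (lists of per-interval rates)."""
    n = len(runs)
    return [[spearman(runs[i], runs[j]) for j in range(n)] for i in range(n)]
```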
    S29 Fig. Patterns of population recombination rate variation controlled for sequencing coverage.

    (A) Average population recombination rate, (B) average hotspot density, and (C) average population recombination rate in hotspots, according to average sequencing coverage. Student’s t test p-values and Cohen’s d are shown in A, B, and C. (D) Fold recombination rates (scaled by the average recombination rate at 50 kb from the nearest feature) according to the distance to the nearest TSS (overlapping or not with a CGI, shown in color) and to the mean depth (shown by line type). (E) Fold recombination rates (scaled by the average recombination rate in intergenic regions) and (F) hotspot density at the indicated genomic features, according to mean coverage (shown in color). TSS and TES were defined as the first and last positions of each gene. CGIs were mapped using EMBOSS with CpGoe > 0.6 and GC > 0. “High” sequencing coverage corresponds to the half of the recombination map with the highest depth, and “low” sequencing coverage to the half with the lowest depth. Only the NS population of S. salar is shown. The data and codes underlying this figure can be found in https://doi.org/10.5281/zenodo.11083953.

    (DOCX)

    pbio.3002950.s044.docx (314.9KB, docx)
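    The CpG-island criterion mentioned in the S29 Fig caption rests on the CpG observed/expected ratio, commonly defined as n(CG) × length / (n(C) × n(G)). The sketch below is a hedged stdlib illustration of that ratio applied to sliding windows; it is not EMBOSS itself, it does not reproduce how EMBOSS merges windows into islands, and the window size, step, and GC cutoff are caller-supplied parameters rather than values taken from the study.

```python
def cpg_obs_exp(seq: str) -> float:
    """CpG observed/expected ratio: n_CG * length / (n_C * n_G)."""
    s = seq.upper()
    n_c, n_g = s.count("C"), s.count("G")
    if n_c == 0 or n_g == 0:
        return 0.0
    # "CG" cannot overlap itself, so str.count finds every CpG dinucleotide.
    return s.count("CG") * len(s) / (n_c * n_g)


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in the sequence."""
    s = seq.upper()
    return (s.count("G") + s.count("C")) / len(s) if s else 0.0


def cgi_windows(seq, window, step, min_obs_exp, min_gc):
    """Half-open (start, end) of sliding windows passing both thresholds."""
    hits = []
    for start in range(0, len(seq) - window + 1, step):
        w = seq[start:start + window]
        if cpg_obs_exp(w) > min_obs_exp and gc_fraction(w) > min_gc:
            hits.append((start, start + window))
    return hits
```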
    S30 Fig. Hotspot sharing between populations controlled for sequencing coverage.

    Fold recombination rates around hotspots and at orthologous loci in the 2 taxa, for the 2 Oncorhynchus species (A and B), the American (GP population) and European (BS and NS populations) S. salar lineages (D and E), and the 2 closely related European S. salar populations (BS and NS) (G and H), according to the mean coverage shown in each panel. Random expectations (blue) and observed values (orange) of shared hotspots between (C) O. kisutch and O. mykiss, (F) the S. salar populations GP and BS, and (I) the S. salar populations BS and NS, according to the mean coverage shown in each panel. Shared hotspots were defined as 2 kb hotspots overlapping by at least 1 bp. The percentage shared is calculated using the number of hotspots in the population with fewer hotspots as the denominator. The expected distribution of shared hotspots was obtained from 1,000 pairwise comparisons of random spots. “High” sequencing coverage corresponds to the half of the recombination map with the highest depth, and “low” sequencing coverage to the half with the lowest depth. The data and codes underlying this figure can be found in https://doi.org/10.5281/zenodo.11083953.

    (DOCX)

    pbio.3002950.s045.docx (301.7KB, docx)
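    The hotspot-sharing statistic in the S30 Fig caption (2 kb hotspots overlapping by at least 1 bp, smaller set as denominator, 1,000 random comparisons) can be sketched as follows. This is a hypothetical illustration, not the authors' code: it assumes both hotspot sets are already expressed on a common coordinate system (for cross-species comparisons, orthologous coordinates), represents each hotspot by its chromosome and start of a half-open [start, start + 2000) window, and draws random spots uniformly on each chromosome with the same per-chromosome counts.

```python
import bisect
import random

WIDTH = 2000  # hotspots treated as fixed 2 kb windows


def count_shared(query, target, width=WIDTH):
    """Count query hotspots overlapping (>= 1 bp) any target hotspot.

    Hotspots are (chrom, start) pairs for half-open windows [start, start + width).
    """
    by_chrom = {}
    for chrom, start in target:
        by_chrom.setdefault(chrom, []).append(start)
    for starts in by_chrom.values():
        starts.sort()
    shared = 0
    for chrom, start in query:
        starts = by_chrom.get(chrom)
        if not starts:
            continue
        # Equal-width windows [a, a+w) and [b, b+w) overlap iff a - w < b < a + w.
        i = bisect.bisect_right(starts, start - width)
        if i < len(starts) and starts[i] < start + width:
            shared += 1
    return shared


def percent_shared(a, b, width=WIDTH):
    """Percent of the smaller hotspot set that overlaps the larger one."""
    small, large = (a, b) if len(a) <= len(b) else (b, a)
    if not small:
        return 0.0
    return 100.0 * count_shared(small, large, width) / len(small)


def random_spots(like, chrom_lengths, rng, width=WIDTH):
    """Uniform random windows with the same per-chromosome counts as `like`."""
    return [(chrom, rng.randrange(0, chrom_lengths[chrom] - width + 1))
            for chrom, _ in like]


def null_percent_shared(a, b, chrom_lengths, n_reps=1000, seed=0, width=WIDTH):
    """Null distribution of percent shared from n_reps random-spot comparisons."""
    rng = random.Random(seed)
    return [percent_shared(random_spots(a, chrom_lengths, rng, width),
                           random_spots(b, chrom_lengths, rng, width), width)
            for _ in range(n_reps)]
```

    The observed percent_shared(a, b) can then be compared with the null values returned by null_percent_shared, for example by the fraction of random replicates that reach the observed value.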
    Attachment

    Submitted filename: RaynaudSanna_D-24-00615R1_AnswerstoReviewers.docx

    pbio.3002950.s046.docx (273KB, docx)
    Attachment

    Submitted filename: Answerto reviewers_secondroundv3.docx

    pbio.3002950.s047.docx (16.4KB, docx)

    Data Availability Statement

    Sequencing data from ChIP-seq and called peaks have been deposited in the Gene Expression Omnibus (GEO) under accession GSE277449, as part of BioProject PRJNA1162462. Bioinformatic scripts as well as processed data sets (VCF files, LD-based recombination maps and hotspots, PRDM9 protein sequences and multiple alignments, ChIP-seq fragment and peak coordinates) are available on Zenodo (https://doi.org/10.5281/zenodo.11083953). Scripts for ChIP-seq data analysis are available at https://zenodo.org/records/14198856 (SSDS pipeline), https://zenodo.org/records/14198863 (SSDS-extra pipeline), and https://zenodo.org/records/3966161 (Nextflow ChIP-seq pipeline).

