Abstract
Meiotic recombination creates genotypic diversity within species. Recombination rates vary substantially across taxa, and the distribution of crossovers can differ significantly among populations and between sexes. Crossover locations within species have been found to vary by chromosome and by position within chromosomes, where most crossover events occur in small regions known as recombination hotspots. However, several species appear to lack hotspots despite significant crossover heterogeneity. The nematode Caenorhabditis elegans was previously found to have the least fine-scale variation in crossover distribution among organisms studied to date. It is unclear whether this pattern extends to the X chromosome given its unique compaction through the pachytene stage of meiotic prophase in hermaphrodites. We generated 798 recombinant nested near-isogenic lines (NILs) with crossovers in a 1.41 Mb region on the left arm of the X chromosome to determine if its recombination landscape is similar to that of the autosomes. We find that the fine-scale variation in crossover rate is lower than that of other model species, and is inconsistent with hotspots. The relationship of genomic features to crossover rate is dependent on scale, with GC content, histone modifications, and nucleosome occupancy being negatively associated with crossovers. We also find that the abundances of 4- to 6-bp DNA motifs significantly explain crossover density. These results are consistent with recombination occurring at unevenly distributed sites of open chromatin.
Keywords: crossovers, recombination rate variation, meiosis, C. elegans
Meiotic crossovers are required for proper segregation of chromosomes, and, in all organisms studied to date, there is considerable variation in where these crossovers tend to occur. This variation impacts the relative efficacy of selection within the genomes of sexually reproducing species (Hill and Robertson 1966; Smukowski and Noor 2011; Cutter and Payseur 2013). Most studied eukaryotic species have higher crossover rates at peripheral regions of their chromosomes compared to the centers (Akhunov et al. 2003; Barton et al. 2008; Rockman and Kruglyak 2009; Chowdhury et al. 2009; Roesti et al. 2013). At this broad scale, a number of genomic features have been found to be positively correlated with recombination rate, including nucleotide diversity (Lercher and Hurst 2002; Cutter and Payseur 2003), GC content, and gene density (Akhunov et al. 2003). It should be noted that negative relationships have also been observed for GC content (Drouaud et al. 2006) and gene density (Barnes et al. 1995). These associations are not applicable to all species, and may not necessarily remain at scales finer than several hundred kilobases. Further, the inferred causal directions differ among these factors, with recombination rate variation thought to cause variation in nucleotide diversity and GC content, via selection and biased gene conversion, but variation in gene density thought to cause variation in recombination rate, at least at the mechanistic within-meiosis level, via chromatin state.
At the kilobase level, recombination rate heterogeneity can be much more dramatic, and additional factors may affect crossover location. In most mammals, the H3K4-methyltransferase PRDM9 plays a major role in determining the genome-wide distribution of recombination hotspots, which are 1- to 2-kb regions where crossover events occur at greatly elevated rates. However, the PRDM9 zinc-finger domains have evolved rapidly in primates and rodents, which have different preferred binding sites that contribute to different hotspot landscapes within and among these species (Baudat et al. 2013). In dogs, which have lost PRDM9 function, hotspots are located preferentially near gene promoter regions, particularly those with CpG islands (Auton et al. 2013). Hotspots near transcription start sites have also been documented in several plant species (Drouaud et al. 2006; Hellsten et al. 2013; Silva-Junior and Grattapaglia 2015), birds (Singhal et al. 2015), and budding yeast (Tsai et al. 2010). The nematode Caenorhabditis elegans, however, does not appear to have such extreme fine-scale recombination rate heterogeneity (Kaur and Rockman 2014).
In C. elegans, recombination rate broadly varies according to physical position in all six of its chromosomes. Each chromosome is comprised of three large domains: a low-recombining, gene-dense center, and two high-recombining arms (Barnes et al. 1995; Rockman and Kruglyak 2009). In addition, crossovers are absent from smaller domains adjacent to the telomeres. Unlike most other species with low recombination at chromosome centers, C. elegans chromosomes are holocentric and lack defined centromeres (Albertson and Thomson 1982). Furthermore, C. elegans exhibits near complete crossover interference, such that each chromosome has only one crossover per meiosis (Meneely et al. 2002). This chromosome-level recombination rate variation, restricted number of crossovers, and low rates of outcrossing have implications for selection at linked sites within the C. elegans genome (Cutter and Payseur 2003; Rockman et al. 2010).
The C. elegans X chromosome experiences a considerably different meiotic environment than the autosomes. The X chromosome is unpaired in males and therefore does not recombine in the male germline, though males are very rare in wild populations (Barrière and Félix 2007). In both males and hermaphrodites, the X is condensed and transcriptionally silent through pachytene in meiotic prophase (Kelly et al. 2002); in fact X chromosomes that experience spermatogenesis (in both males and hermaphrodites) are silenced throughout sperm maturation and into early embryogenesis (Bean et al. 2004; Arico et al. 2011). There are several genes that uniquely impact the X chromosome during early meiosis, including xnd-1 and him-5, which have effects on double-strand break formation (Wagner et al. 2010; Meneely et al. 2012). Furthermore, the X chromosome receives comparatively fewer double-strand breaks than the autosomes, which is thought to be a consequence of its condensed state (Gao et al. 2015). Of the six C. elegans chromosomes, the X has the least differentiation in recombination rates between the arms and center (Barnes et al. 1995; Rockman and Kruglyak 2009). Whether fine-scale variation on the X chromosome is similar to that of the autosomes is unknown (Kaur and Rockman 2014).
Here, we provide the highest resolution C. elegans crossover information to date, focused on a 1.41-Mb region on the left arm of the C. elegans X chromosome. The interval covers 7.9% of the chromosome’s physical length, and has an expected genetic map length of 3.7 cM (Yook et al. 2012). We performed genetic crosses to generate a large panel of nested near-isogenic lines (NILs). In the animals whose meiosis we studied, the targeted genomic interval is heterozygous for DNA from the Hawaiian wild isolate CB4856 and from the lab strain N2, and the rest of the genome is entirely N2. CB4856 is highly diverged from N2 (Andersen et al. 2012), which enabled us to densely genotype recombinant lines. In total, we have crossover location data for 870 NILs. The data reveal that the crossover distribution in the focal region of the X chromosome exhibits a nonuniform density with low heterogeneity, similar to the pattern observed on an autosome. Chromatin modifications and DNA sequence features explain some of the residual variation observed.
Materials and Methods
Overview
We generated strains with visible marker mutations on each side of a CB4856-derived interval in the N2 background. In crosses between NIL and N2 strains carrying the visible markers in repulsion, we selected recombinants. Finally, we genotyped CB4856/N2 SNPs within the interval to localize crossovers.
Strains
We constructed the sub-NIL panel using four starting strains. QX1321 is a NIL with CB4856 genomic DNA introgressed into the N2 background. The left boundary of the introgressed region lies between X:2,552,391 and X:3,157,209 and the right boundary is between the lon-2 locus and X:4,892,211. QG7 contains the fax-1(gm83) mutation (Much et al. 2000), and was derived from strain MU1080, originally acquired from the Caenorhabditis Genetic Center (CGC) with six generations of backcrossing to N2. Fax worms fail to make omega turns. QG144 contains the lon-2(e678) mutation (Brenner 1974) and was derived by crossing strain CB3273, also originally acquired from the CGC, to N2 and selecting non-Mec recombinants. Lon worms are long and thin. QG1 is a NIL (qgIR1), where X:4,754,307–4,864,273 is of CB4856 origin in an otherwise N2 genomic background. This region includes the npr-1 locus, which has a lab-derived gain-of-function mutation in N2 that has pleiotropic effects on phenotypes and gene expression (de Bono and Bargmann 1998; McGrath et al. 2009; Andersen et al. 2014).
Cross design and experimental conditions
We thawed the starting worm strains from frozen stocks and cleaned them by bleaching (Stiernagle 2006). We maintained worms at 20° on NGM agar plates seeded with Escherichia coli OP50 for several generations before we started experiments, which all took place at 20°.
To generate strains with marker mutations linked to the NIL introgressed interval, we first crossed QG7 hermaphrodites with QG144 males to yield a fax-1lon-2 double-mutant line, QG569. QX1321 (a wild-type NIL covering the focal interval) males were crossed to QG569 hermaphrodites to eventually produce mutant NILs. We used marker segregation along with indel and SNP genotyping to construct a strain, QG596, with fax-1 and lon-2 tightly linked to CB4856 DNA in an otherwise N2 genomic background.
As Figure 1 illustrates, QG596 was crossed to QG1 to produce F3 homozygous recombinant single-mutant individuals, where recombinants possessing the fax-1 allele would also have qgIR1. The strains possessing the most CB4856 genome based on the same genotyping methods mentioned previously are the NIL parental strains QG613 [fax-1(gm83); qgIR1; qgIR3] and QG614 [lon-2(e678); qgIR4].
To create the recombinant sub-NIL panel in which we mapped crossover positions, QG613 and QG614 were crossed to QG590 [lon-2(e678); qgIR1] and QG591 [fax-1(gm83); qgIR1], respectively. At the F2 generation, wild-type L4 hermaphrodites were picked to individual plates. Recombinant NILs were established from F3 plates lacking one of the two mutant phenotypes (Lon or Fax). Plates founded by an F3 recombinant hermaphrodite that yielded only wild-type progeny were frozen and given unique strain names.
In approximately 1% of F3 plates, all animals were wild-type, indicating that the F2 parent had two recombinant chromosomes within the focal interval. These plates were discarded and did not contribute to the final total of NILs. Sub-NILs generated for subsequent analyses represent a powerful fine-mapping resource and are available from the authors.
Whole-genome sequencing of parental strains
To determine the exact breakpoints of the CB4856 introgressed regions in the parental NILs, we sequenced the QG613 and QG614 genomes by Illumina HiSeq. We generated 100 bp paired-end reads, which were mapped to the N2 reference genome using stampy (Lunter and Goodson 2011). Variant calling was performed using samtools (Li et al. 2009). We identified a G > A nonsense mutation (R128Opal) at X:3198393 (ws250) in the fax-1 gene as gm83. The sequence of QG614 did not reveal any candidate SNPs for e678 in the lon-2 locus. Visualizing the BAM file for QG614 in the UCSC Genome Browser (Kent et al. 2002; Meyer et al. 2013), we found two large regions in the lon-2 locus to which no reads mapped. Based on sequence data for reads spanning regions that mapped and failed to map, we infer two deletions: X:4740894-4749007 and X:4749218-4749987 (ws250). The first deletion comprises almost the entire first intron through most of the ninth intron, and the second deletion covers 135 bp to 905 bp upstream of the coding region. Sequencing of QG590 also yielded these deletions in lon-2.
Genomic DNA preparation and genotyping
We isolated genomic DNA from worms using a salting-out protocol (Rockman and Kruglyak 2009). Strains were genotyped at a 191-bp indel polymorphism in the interval (X:4,103,162-4,103,352) to sort them into two groups (Supplemental Material, Figure S1). The indel genotype implies that the crossover location is either to the left or to the right of the indel, and strains were then genotyped at 139 SNPs on the side of the indel inferred to carry the crossover, plus an additional five SNPs on the other side of the indel as a control for genotyping error. We genotyped equal numbers of strains with breakpoints on the left and right sides, though the numbers of crossovers observed on each side were not equal (Figure S1). Consequently, our downstream statistical analyses incorporate interval side as a covariate to control for our unrepresentative sampling of crossovers. Illumina GoldenGate genotyping was performed at the DNA Sequencing and Genomics Core Facility of the University of Utah, Salt Lake City.
Quality control of genotyping data
Scatterplots of fluorescence intensities were manually inspected using a previously created pipeline (Kaur and Rockman 2014). One probe yielded ambiguous genotypes and was excluded from analyses. The final data set includes 139 SNPs in the left interval, and 138 SNPs in the right interval. Strains that had multiple SNP genotyping failures, ambiguous calls in the region of recombination, and/or stretches of heterozygous calls were excluded from this analysis.
Statistical analyses
Statistical tests and analyses were performed in R (R Core Team 2015). The data were analyzed at two resolutions: full, which includes all regions flanked by SNP probes (File S1), and 25kb, which comprises a more uniform distribution of SNP probe-bounded regions (File S2). Both of these files contain the physical location of each genotyped SNP, the number of crossover events that occurred between each pair of consecutive SNPs for each cross, and the identity of the interval side (left or right) that carries the SNP. File S3 includes three regions that were omitted from analyses (see Results). File S4 contains an alternative 25kb data set that includes the omitted intervals.
Analysis of recombination rate variation
To determine whether the distributions of recombination events between the two parental crosses differed, we performed two-sample, two-sided Kolmogorov-Smirnov tests, as described previously (Kaur and Rockman 2014).
Modeling of discrete constant-rate domains within the two interval sides defined by GoldenGate array identity was performed using a previously designed program (Kaur and Rockman 2014). In brief, the R package cobs (Ng and Maechler 2007) was used to fit increasing numbers of constant-rate domains within the two interval halves defined by GoldenGate genotyping. Likelihood-ratio test statistics were tested against null distributions generated by simulation. As done previously, we examined up to a maximum of 40 knots, where a knot defines the boundary of a constant-rate domain.
The level of recombination rate heterogeneity was estimated by calculating the Gini coefficient, which is a measure of inequality within a cumulative frequency distribution. The Gini coefficient can range from 0 (recombination is equally likely at every location) to 1 (all recombination occurs in one location). Gini coefficients were calculated using a previously designed program (Kaur and Rockman 2014). This program was also used to simulate crossover data, with the rate in each 1-kb bin drawn from a gamma distribution. Modifying the shape parameter for the gamma distribution allowed for examination of distributions of recombination rates ranging from highly heterogeneous to nearly uniform. Within a bin, the recombination rate is constant and crossovers are randomly positioned. For a given shape parameter value, we ran 5000 simulations. Crossovers produced by the simulation were then placed within intervals defined by our genetic markers. Gini coefficients were then calculated for each set of simulated recombination observations.
Genomic correlates of recombination rates
Genome sequence data used in all analyses were obtained from the WS225 release of WormBase (Yook et al. 2012). Polymorphism data were obtained from CB4856 resequencing (Thompson et al. 2015). Chromatin modification data were obtained from modENCODE (Table S1; http://modencode.org). Annotations for other sequence information within the C. elegans genome were obtained from WormBase. We wrote an R script to identify nonoverlapping occurrences of all 4- to 8-bp dsDNA motifs within the focal genomic interval (File S5 and File S6).
We used the generalized linear modeling function in R, glm, to model crossover count per marker interval as a Poisson-distributed random variable. We incorporated an offset term to account for variation in the distance between consecutive markers. The identity of the interval side (left or right) was included as a covariate in all models to account for our crossovers sampling design (Figure S1). To determine the significance of the crossover rate variation observed, we generated a null distribution by simulation of crossover events governed only by the physical distance (in N2) between SNP markers, creating 10,000 data sets (File S7). The residual deviance from the observed data was compared to the distribution from the simulated datasets to determine a P-value. To determine the significance of the small DNA motifs, we generated a null distribution by sequence permutation (Figure S2 and File S8). We created 1000 datasets by permuting the DNA sequence within each marker interval, preserving each interval’s GC content, and we tested each motif as an explanatory variable in an analysis of each permuted dataset. For each dataset, the minimum residual deviance across all motifs was used as a summary statistic. For pairs of motifs, a similar procedure was done where all possible combinations were examined in the permuted DNA sequences. Additionally, we used MEME (Bailey et al. 2009) to search for larger DNA motifs. We allowed for any number of repetitions of a given motif within the 10 highest recombining regions at 25 kb resolution, and searched both the given strand and its reverse complement. We searched for motifs ranging from 7 to 13 nt in length and used a first-order background model that incorporated mono- and dinucleotide composition. Only motifs with E values less than 10−10 were examined.
Data availability
The authors state that all data necessary for confirming the conclusions presented in the article are represented fully within the article.
Results
We used visible markers to generate more than 1000 strains, each containing one crossover in a 1.48-Mb region on the X chromosome. As shown in Figure 1 and Figure S1, we obtained these recombinants from reciprocal crosses, which we refer to as the FaxNIL and LonNIL crosses. After genotyping the strains at an indel within the NIL interval (191-bp indel at X:4,103,162-4,103,352), we observed a striking difference between the two parental crosses. Sub-NILs that arose from the FaxNIL cross had many more recombination events to the left of the indel than the right, whereas the proportion was roughly equal for the sub-NILs arising from the LonNIL cross (Table 1).
Table 1. Crossover distribution on left or right of indel at X:4.103 Mb.
Cross | Recombined on Left | Recombined on Right |
---|---|---|
FaxNIL | 387 | 243 |
LonNIL | 244 | 241 |
P < 0.001, Fisher’s exact test.
We obtained crossover locations for 870 lines across 277 SNPs along the 1.48-Mb interval (Figure 2). The median distance between consecutive markers is approximately 4.4 kb, with a mean distance of 5.4 kb. Of the 870 lines, 56 had crossovers that occurred between one of the visible marker mutations, and either the first or last of the 277 marker SNPs, making the resulting sub-NILs either completely N2 or completely CB4856 within this interval. Another 16 lines had crossovers in regions that are unshared between the two parental crosses (left and right edges in Figure 2). Less than 9% (22/258) of the intervals between consecutive probes in the shared 1.41 Mb had no observed crossovers, and 92% of the crossover events mapped to intervals of less than 10 kb. The largest distance between observed recombination breakpoints among strains is 41.5 kb (X:4,020,535-4,062,018), of which 30.5 kb is contiguous tandem repetitive DNA in N2 (X:4,025,997-4,056,509). This is the third-longest stretch of tandem repetitive DNA (other than the rDNA repeats) in the C. elegans genome (WormBase). The crossover distribution on the left side of the region (3.28–4.10 Mb) significantly differs between the FaxNIL cross and LonNIL cross (P = 0.03), but does not differ on the right side (4.10–4.68 Mb; P = 0.26; Figure 3).
Variation in recombination rate across the interval
We assessed the minimal number of constant recombination rate domains required to explain the observed data, if such domains existed. The simplest piecewise linear regression models that could not be rejected at P = 0.05 for the observed data suggest a minimum of 23 subdomains of constant recombination rate, with 20 in the 820 kb to the left of the indel at 4.10 Mb, and three in the 590 kb to the right (based on 100,000 simulations per model).
Three regions within the focal interval harbor large length polymorphisms or repeat sequence, and these require special consideration. An interval described above includes a 30.5 kb tandem repetitive DNA tract (at least in the N2 genome) that is very highly enriched for histone modifications indicative of heterochromatin, particularly H3K9 methylation. Prior work in Drosophila found that satellite DNA organization within the genome is regulated in a similar manner to that of ribosomal DNA, where both are similarly enriched with H3K9me2 modifications. It is thought that recombination in these regions can lead to genome instability (Peng and Karpen 2006). For two other marker intervals, recent resequencing of CB4856 identified the absence of two duplications that are present in N2, both of which are more than 6 kb in length (Noble et al. 2015; Thompson et al. 2015). As each of these indels represent > 40% of the physical distance bound by the markers defining their respective bins, we decided to remove these bins, along with the bin containing the tandem repeats, from the analyses below.
We evaluated the data at two resolutions, as prior work has found that the explanatory power of genomic features depends on the scale of analysis (Cirulli et al. 2007); full incorporates all of the intervals between genotyped informative SNPs (n = 256 bins, median distance = 4.4 kb, range: 3.3–18.2 kb; median of three breakpoints per interval), and 25kb has a more uniform distribution of bin sizes, and all bins have at least three recombination events (n = 53 bins, median distance = 25.4 kb, range: 21.3–29.9 kb; median of 14 breakpoints per interval). We first tested the simplest model of crossover distribution, a single constant rate across the interval. We fitted the observed crossover data to a generalized linear model incorporating only physical distance between markers as an explanatory variable and interval side (left vs. right) as a covariate that accounts for our sampling design. The residual deviance of this simple model is significantly higher in both the 25kb and full datasets than in datasets simulated with uniform recombination (P = 0.014 and P = 0.004, respectively), indicating that additional factors contribute to the observed heterogeneity (Figure 4).
To assess recombination rate heterogeneity, we calculated a Gini coefficient for the full data set. The Gini coefficient serves as an inequality measure for the observed crossover distribution. A value of 0 indicates perfect equality (i.e., all locations have the same recombination rate), and a value of 1 indicates perfect inequality (i.e., all recombination occurs in the same location). The left and right interval halves have Gini coefficients of 0.368 and 0.332, respectively (Figure 5). These values are considerably lower than what has been found in humans and yeast, with coefficients averaging 0.8 and 0.65, respectively (Mancera et al. 2008; Kong et al. 2010; Kaur and Rockman 2014). Estimates of human and primate Gini coefficients from linkage disequilibrium (LD) data are closer to 0.7, lower than the estimates based on direct measurements of crossovers, perhaps because of the influence of population size variation on LD-based estimates (Stevison et al. 2016). LD-based estimates in Drosophila pseudoobscura found a more modest Gini coefficient of 0.50 (Heil et al. 2015).
The Gini coefficients for both sides of the studied interval are higher than the 0.28 observed for the 214 kb to the right of the center-right arm boundary on Chromosome II (Kaur and Rockman 2014). One difference between the Chromosome II and Chromosome X studies is simply a matter of scale; the Chromosome X interval examined here is approximately seven times larger than the 214-kb Chromosome II right-arm interval. Due to the number of strains analyzed over this interval, we actually have fewer crossover events in 210–220 kb windows (115 on average for the left side, and 150 on average for the right side) than the 218 obtained on Chromosome II. Upon down-sampling the Chromosome II data, and calculating Gini coefficients for 100,000 permutations, we find that our observed Gini coefficients are not significantly different than 0.28 (P > 0.14 for left and right sides).
To determine whether the Gini coefficients are compatible with the presence of hotspots, we simulated crossovers by drawing recombination rates from a gamma distribution. By altering the shape parameter of the gamma, we examined recombination rate landscapes ranging from hotspot-rich to nearly constant. For reference, a shape parameter of 1 produces an exponential distribution, and, as this parameter increases, heterogeneity in the distribution decreases. From these simulations, we calculated Gini coefficients and compared the results to our observed crossover data. For both the left and right sides of the studied genomic interval, our data do not conform to a highly heterogeneous recombination rate landscape, as their 95% confidence intervals do not include models whose most recombinant kilobase is expected to have a recombination rate ≥ 10 times the mean rate. The models that best approximate the left and right side of our interval have shape parameters of 1.70 and 3.57, respectively. These shape parameters correspond to distributions where the most recombinant kilobase is expected to have a crossover rate approximately 5.3 and 3.4 times the mean rate, respectively.
Genomic correlates of recombination rate
GC content was previously found to be associated with the shift in recombination rate across the Chromosome II center-right arm boundary region (Kaur and Rockman 2014), with higher GC content in the lower-recombination center. Within the 1.41 Mb common region among the sub-NILs, GC content was not associated with variation in recombination rate for the full dataset, but it had a significant negative effect at 25kb (Table 2). The same effect was observed with ChIP-chip data for mononucleosome occupancy (Table 2) and histone modifications, many of which are highly correlated with each other (Figure 6 and Table S2). Several other sequence-level parameters, which have been found to influence recombination rate in other organisms, had no observed effect in this region at both scales, including gene density (i.e., the number of transcription start sites per basepair; a proxy for assessing promoters), amount of intergenic DNA between nonconvergently transcribed genes (a second proxy for assessing promoters), and amount of intronic DNA (Table 2).
Table 2. Genomic correlates for recombination rate variation at 25 kb resolution.
Genomic Feature | Coefficient | Deviance Explained | P-Value |
---|---|---|---|
Poly(A) ≥ 4 | 40.43 | 17.7% | 0.0003 |
GC content | −8.25 | 14.8% | 0.0010 |
Mononucleosome occupancy | −0.27 | 8.7% | 0.0114 |
(CA) ≥ 5 | 1399.92 | 4.4% | 0.0611 |
Proportion intergenic DNA | 0.32 | 3.3% | 0.1117 |
Gene density | −444.17 | 0.7% | 0.4481 |
Proportion intronic DNA | −0.16 | 0.4% | 0.5612 |
Each correlate was examined in a univariate generalized linear model. Values for mononucleosome occupancy incorporated an additional term for the IgG binding control. Reported P-values are not corrected for multiple tests.
We next investigated more specific DNA sequence associations with recombination rate variation. Poly(A) repeats were previously found to be positively correlated with recombination rate in budding yeast (Mancera et al. 2008) and Drosophila melanogaster (Comeron et al. 2012). Poly(A) ≥ 4 exhibits a significant positive association with crossover rate at 25kb (P < 0.001). However, (CA)n repeats found to be highly associated with crossovers in those same species (Gendrel et al. 2000; Comeron et al. 2012) do not explain the variation in crossover rate we observed; at best, (CA) ≥ 5 is marginally significant (P = 0.06, Table 2).
We also performed an exhaustive scan of all DNA motifs ranging from 4 to 8 bp in length, controlling for the large number of tests by a permutation procedure. Significant motifs are described in Table 3. For the full data set, AACA explains a significant portion of the deviance in a generalized linear model, and is positively associated with crossover rate. For the 25kb data set, GGATAG significantly explains the crossover rate variation observed, and is negatively associated. Removal of deletion polymorphisms present in CB4856 from the examined sequence does not affect the significance of the relationships (Table S3).
Table 3. Small DNA motifs with significant effects on crossover rate.
Motif(s) | Resolution | Effect Direction | Expected Deviance Explained | Additional Deviance Explained | Nominal P-Value | Permutation P-Value |
---|---|---|---|---|---|---|
AACA | Full | + | 2.5% | 3.1% | 9.25 × 10−6 | 0.004 |
GGATAG | 25 kb | — | 27.3% | 8.0% | 2.25 × 10−7 | 0.040 |
Expected deviance explained is the mean percentage of deviance explained by the most explanatory motif in 1000 permutations of DNA sequence within each bin in the focal interval. Permutation P-values control for multiple tests across all motifs of a given length.
Additionally, we conducted a MEME search for longer motifs at 25 kb resolution. We examined motifs that passed the E-value < 10−10 threshold by counting occurrences within all 53 bins, and incorporating the occurrence rate vector as a variable in a generalized linear model. The most explanatory motif we found is AAAA[AT]AT[TC]AT[AT]T, which is positively associated with crossover rate (nominal P-value = 1.32 × 10−4). However, this P-value is similar to the P-value observed for poly(A) ≥ 4, which occurs once within the discovered motif, and is almost three orders of magnitude larger than the nominal P-value of GGATAG at this resolution. Lowering the E-value threshold to 10−5 did not yield a more explanatory motif.
Discussion
We have generated a large, permanent mapping resource encompassing 798 recombination events within a 1.41-Mb span on the left arm of the C. elegans X chromosome. At a median resolution of about 4.4 kb, less than 10% of intervals lack a crossover event. We observed significant heterogeneity in recombination rate across the interval, though consistent with prior findings (Kaur and Rockman 2014), there is no evidence for local recombination hotspots. Interestingly, simple DNA sequence features, including GC content and small sequence motifs, explain a significant portion of the observed heterogeneity.
One of the intervals lacking a crossover in the NIL panel contains a very large tandem repetitive region where recombination may be suppressed. In budding yeast, the repetitive rDNA array is shielded from nonallelic homologous recombination by the histone deacetylase Sir2, and is maintained in a heterochromatic state during meiosis (Gottlieb and Esposito 1989; Mieczkowski et al. 2007). ChIP-chip data for nucleosome occupancy and H3K9me2/3 in C. elegans embryos and adults show high enrichment in this repetitive region, which may support a similar function in recombination suppression.
Within this X chromosome region, we found two large deletions in the CB4856 genome relative to N2. In both cases, the genes missing in CB4856 appear to be tandem duplicates of neighboring genes. The CB4856 allele of the indel at X:3.56 Mb is present in at least three other wild-isolated strains (JU319, MY16, and AB2), and the CB4856 allele of the indel at X:4.33 Mb is present in at least three different wild-isolated strains (JU345, JU1171, and QX1211; sequence data from Noble et al. 2015). Both of these indels may have had an impact on local recombination rate due to the physical distances being markedly different between the two strains. At the finest-scale resolution we have, the distances between consecutive probes encompassing these large deletions in the N2 genome are both over 14 kb, and only one recombination event was observed in each. Assuming a uniform recombination rate using N2 interval sizes, we would have expected 6.7 and 10.0 crossovers for those regions on the left and right, respectively. The effect of indel heterozygosity on the crossover rate of nearby flanking sequence is unknown.
Interestingly, we found that the FaxNIL cross produced significantly more crossovers on the left side of the interval than did the LonNIL cross. Furthermore, the distribution of crossovers on the left differs significantly between the crosses. However, we found that the correlates of recombination rate variation are concordant between the crosses, when examining each one individually. Because the size and location of the starting CB4856 genomic fragment differs slightly in the two crosses, we checked the unshared 75 kb for candidate genes that are involved in meiosis and/or chromatin remodeling; variants in these regions could act in trans to yield different crossover distributions in the two crosses. None of the genes in the variable region have a known function in either of those two processes. Another possible explanation for this cross effect is a deleterious epistatic interaction between two or more alleles that only occurs in one direction, either on the left of the LonNIL cross or on the right of the FaxNIL cross. For example, in the former scenario, the interaction could involve a CB4856 allele on the left side of a crossover interacting with an N2 allele on its right side. The converse would apply to the latter scenario. Since we did not observe any obvious lethality in either cross, the effect may have been a slower development time, which could have been selected against based on the timing of picking F2s.
We found multiple correlated sequence-level features that significantly explain the crossover distribution we observed, though, in all but one instance, this was at ∼25 kb resolution. The present dataset is likely underpowered to detect many associations at a finer scale. Additionally, due to unique features of our focal interval and our experimental design, we are unable to fully test the relationship of nuclear envelope attachment via LEM-2 or nucleotide diversity with recombination rate variation. In the former case, LEM-2 binding to the X chromosome drops off to the right of the large tandem repetitive sequence at 4.02 Mb (Ikegami et al. 2010), which is near the indel used for preliminary genotyping. LEM-2 binding ends up being near-perfectly confounded with the interval side covariate. We see no relationship between LEM-2 and crossover location when analyzing the left and right sides separately. A similar situation exists for nucleotide diversity, where marker intervals on the right have increased diversity compared to the left (Thompson et al. 2015) and we do not observe an effect when analyzing the sides separately.
Genome sequence features influence many activities within nuclei, including transcriptional state via histone modifications and nucleosome occupancy. These features also influence overall chromatin conformation, which may affect where crossovers occur during meiosis. At the 20–30 kb scale, we found several correlated associations of genome features with recombination rate variation. Notably, GC content, a suite of histone modifications, and nucleosome occupancy are negatively associated with recombination rate, while poly(A) is positively associated with recombination rate within our focal region. These associations may share an underlying mechanism. Prior work in C. elegans found that nucleosome occupancy in both early embryos and L3 larvae correlates strongly with GC content on the X chromosome, particularly at gene promoters (Ercan et al. 2011). Furthermore, nucleosomes are depleted in core promoters that have many interspersed blocks of consecutive As or Ts (Grishkevich et al. 2011).
In mice lacking functional Prdm9, and in budding yeast, crossovers preferentially occur at promoters (Mancera et al. 2008; Brick et al. 2012). We found no significant association between recombination rate and either gene number or intergenic distance between nonconvergently transcribed genes. Furthermore, the H3K4me3 distribution in the focal interval (measured in early embryos) does not show the positive association with recombination rate that it does in mice and yeast. These results suggest that the targets for crossovers on the X chromosome are regions lacking markers of heterochromatin and nucleosomes. Importantly, there does not appear to be a necessity for histone marks associated with active transcription (e.g., H3K4me3) to mediate where crossovers occur. It is unclear whether this is a unique feature of the X chromosome or applicable to all C. elegans chromosomes.
Neither of the small motifs we discovered has been previously described as affecting recombination rate in any organism. As the only longer motif identified through MEME is less explanatory than these sequences, it is possible that the mechanisms underlying crossover distribution in C. elegans differ markedly from those in other studied species.
In summary, our findings suggest significant heterogeneity exists within the high-recombination-rate left arm of the X chromosome in spite of a lack of crossover hotspots. As in other fine-scale recombination studies, the scale at which data are analyzed can affect the significance of explanatory variables; in this case the fine-scale data we have may be underpowered to detect some associations. Despite its unique regulation during early meiotic prophase, the X chromosome has a crossover landscape similar to that of Chromosome II. The crossover distribution is shaped in part by reduced recombination in heterochromatic regions and sites bound by nucleosomes. We do not find evidence of large DNA motifs that greatly explain the variation we observe, which may be a result of a recombination rate architecture that lacks well-defined hotspots.
Supplementary Material
Acknowledgments
We thank Christina Dai for assistance in generating sub-NILs. We thank David Riccardi, Jasmine Nicodemus, and Jia Shen for assistance in the lab, and we thank David Riccardi and the New York University (NYU) Center for Genomics & Systems Biology Genome Core facility for assistance in sequencing the parental worm strains. We thank members of the Rockman lab for helpful suggestions on manuscript drafts. We thank the Caenorhabditis Genetics Center, which is funded by the National Institutes of Health (NIH) Office of Research Infrastructure Programs (P40OD010440), for strains, and WormBase for maintaining and curating databases. This work was supported by the NIH (R01GM089972).
Footnotes
Supplemental material is available online at www.g3journal.org/lookup/suppl/doi:10.1534/g3.116.028001/-/DC1
Communicating editor: M. C. Zetka
Literature Cited
- Akhunov E. D., Goodyear A. W., Geng S., Qi L.–L., Echalier B., et al. , 2003. The organization and rate of evolution of wheat genomes are correlated with recombination rates along chromosome arms. Genome Res. 13: 753–763. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Albertson D. G., Thomson J. N., 1982. The kinetochores of Caenorhabditis elegans. Chromosoma 86: 409–428. [DOI] [PubMed] [Google Scholar]
- Andersen E. C., Gerke J. P., Shapiro J. A., Crissman J. R., Ghosh R., et al. , 2012. Chromosome-scale selective sweeps shape Caenorhabditis elegans genomic diversity. Nat. Genet. 44: 285–290. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Andersen E. C., Bloom J. S., Gerke J. P., Kruglyak L., 2014. A variant in the neuropeptide receptor npr-1 is a major determinant of Caenorhabditis elegans growth and physiology. PLoS Genet. 10: e1004156. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Arico J. K., Katz D. J., van der Vlag J., Kelly W. G., 2011. Epigenetic patterns maintained in early Caenorhabditis elegans embryos can be established by gene activity in the parental germ cells. PLoS Genet. 7: e1001391. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Auton A., Rui Li Y., Kidd J., Oliveira K., Nadel J., et al. , 2013. Genetic recombination is targeted towards gene promoter regions in dogs. PLoS Genet. 9: e1003984. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Bailey T. L., Boden M., Buske F. A., Frith M., Grant C. E., et al. , 2009. MEME SUITE: tools for motif discovery and searching. Nucleic Acids Res. 37: W202–W208. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Barnes T., Kohara Y., Coulson A., Hekimi S., 1995. Meiotic recombination, noncoding DNA and genomic organization in Caenorhabditis elegans. Genetics 141: 159–179. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Barrière A., Félix M.-A., 2007. Temporal dynamics and linkage disequilibrium in natural Caenorhabditis elegans populations. Genetics 176: 999–1011. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Barton A. B., Pekosz M. R., Kurvathi R. S., Kaback D. B., 2008. Meiotic recombination at the ends of chromosomes in Saccharomyces cerevisiae. Genetics 179: 1221–1235. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Baudat F., Imai Y., de Massy B., 2013. Meiotic recombination in mammals: localization and regulation. Nat. Rev. Genet. 14: 794–806. [DOI] [PubMed] [Google Scholar]
- Bean C. J., Schaner C. E., Kelly W. G., 2004. Meiotic pairing and imprinted X chromatin assembly in Caenorhabditis elegans. Nat. Genet. 36: 100–105. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Brenner S., 1974. The genetics of Caenorhabditis elegans. Genetics 77: 71–94. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Brick K., Smagulova F., Khil P., Camerini-Otero R. D., Petukhova G. V., 2012. Genetic recombination is directed away from functional genomic elements in mice. Nature 485: 642–645. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Chowdhury R., Bois P. R. J., Feingold E., Sherman S. L., Cheung V. G., 2009. Genetic analysis of variation in human meiotic recombination. PLoS Genet. 5: e1000648. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Cirulli E. T., Kliman R. M., Noor M. A. F., 2007. Fine-scale crossover rate heterogeneity in Drosophila pseudoobscura. J. Mol. Evol. 64: 129–135. [DOI] [PubMed] [Google Scholar]
- Comeron J. M., Ratnappan R., Bailin S., 2012. The many landscapes of recombination in Drosophila melanogaster. PLoS Genet. 8: e1002905. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Cutter A. D., Payseur B. A., 2003. Selection at linked sites in the partial selfer Caenorhabditis elegans. Mol. Biol. Evol. 20: 665–673. [DOI] [PubMed] [Google Scholar]
- Cutter A. D., Payseur B. A., 2013. Genomic signatures of selection at linked sites: unifying the disparity among species. Nat. Rev. Genet. 14: 262–274. [DOI] [PMC free article] [PubMed] [Google Scholar]
- de Bono M., Bargmann C. I., 1998. Natural variation in a neuropeptide Y receptor homolog modifies social behavior and food response in C. elegans. Cell 94: 679–689. [DOI] [PubMed] [Google Scholar]
- Drouaud J., Camilleri C., Bourguignon P., Canaguier A., Bérard A., et al. , 2006. Variation in crossing-over rates across chromosome 4 of Arabidopsis thaliana reveals the presence of meiotic recombination “hot spots.” Genome Res. 16: 106–114. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Ercan S., Lubling Y., Segal E., Lieb J. D., 2011. High nucleosome occupancy is encoded at X-linked gene promoters in C. elegans. Genome Res. 21: 237–244. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Gao J., Kim H.-M., Elia A. E., Elledge S. J., Colaiácovo M. P., 2015. NatB domain-containing CRA-1 antagonizes hydrolase ACER-1 linking acetyl-CoA metabolism to the initiation of recombination during C. elegans meiosis. PLoS Genet. 11: e1005029. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Gendrel C., Boulet A., Dutreix M., 2000. (CA/GT)n microsatellites affect homologous recombination during yeast meiosis. Genes Dev. 14: 1261–1268. [PMC free article] [PubMed] [Google Scholar]
- Gottlieb S., Esposito R. E., 1989. A new role for a yeast transcriptional silencer gene, SIR2, in regulation of recombination in ribosomal DNA. Cell 56: 771–776. [DOI] [PubMed] [Google Scholar]
- Grishkevich V., Hashimshony T., Yanai I., 2011. Core promoter T-blocks correlate with gene expression levels in C. elegans. Genome Res. 21: 707–717. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Heil C. S. S., Ellison C., Dubin M., Noor M. A. F., 2015. Recombining without hotspots: a comprehensive evolutionary portrait of recombination in two closely related species of Drosophila. Genome Biol. Evol. 7: 2829–2842. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Hellsten U., Wright K. M., Jenkins J., Shu S., Yuan Y., et al. , 2013. Fine-scale variation in meiotic recombination in Mimulus inferred from population shotgun sequencing. Proc. Natl. Acad. Sci. USA 110: 19478–19482. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Hill W. G., Robertson A., 1966. The effect of linkage on limits to artificial selection. Genet. Res. 8: 269–294. [PubMed] [Google Scholar]
- Ikegami K., Egelhofer T. A., Strome S., Lieb J. D., 2010. Caenorhabditis elegans chromosome arms are anchored to the nuclear membrane via discontinuous association with LEM-2. Genome Biol. 11: R120. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Kaur T., Rockman M. V., 2014. Crossover heterogeneity in the absence of hotspots in Caenorhabditis elegans. Genetics 196: 137–148. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Kelly W. G., Schaner C. E., Dernburg A. F., Lee M., Kim S. K., et al. , 2002. X-chromosome silencing in the germline of C. elegans. Development 129: 479–492. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Kent W. J., Sugnet C. W., Furey T. S., Roskin K. M., Pringle T. H., et al. , 2002. The Human Genome Browser at UCSC. Genome Res. 12: 996–1006. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Kong A., Thorleifsson G., Gudbjartsson D. F., Masson G., Sigurdsson A., et al. , 2010. Fine-scale recombination rate differences between sexes, populations and individuals. Nature 467: 1099–1103. [DOI] [PubMed] [Google Scholar]
- Lercher M. J., Hurst L. D., 2002. Human SNP variability and mutation rate are higher in regions of high recombination. Trends Genet. 18: 337–340. [DOI] [PubMed] [Google Scholar]
- Li H., Handsaker B., Wysoker A., Fennell T., Ruan J., et al. , 2009. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25: 2078–2079. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Lunter G., Goodson M., 2011. Stampy: a statistical algorithm for sensitive and fast mapping of Illumina sequence reads. Genome Res. 21: 936–939. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Mancera E., Bourgon R., Brozzi A., Huber W., Steinmetz L. M., 2008. High-resolution mapping of meiotic crossovers and non-crossovers in yeast. Nature 454: 479–485. [DOI] [PMC free article] [PubMed] [Google Scholar]
- McGrath P. T., Rockman M. V., Zimmer M., Jang H., Macosko E. Z., et al. , 2009. Quantitative mapping of a digenic behavioral trait implicates globin variation in C. elegans sensory behaviors. Neuron 61: 692–699. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Meneely P. M., Farago A. F., Kauffman T. M., 2002. Crossover distribution and high interference for both the X chromosome and an autosome during oogenesis and spermatogenesis in Caenorhabditis elegans. Genetics 162: 1169–1177. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Meneely P. M., McGovern O. L., Heinis F. I., Yanowitz J. L., 2012. Crossover distribution and frequency are regulated by him-5 in Caenorhabditis elegans. Genetics 190: 1251–1266. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Meyer L. R., Zweig A. S., Hinrichs A. S., Karolchik D., Kuhn R. M., et al. , 2013. The UCSC Genome Browser database: extensions and updates 2013. Nucleic Acids Res. 41: D64–D69. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Mieczkowski P. A., Dominska M., Buck M. J., Lieb J. D., Petes T. D., 2007. Loss of a histone deacetylase dramatically alters the genomic distribution of Spo11p-catalyzed DNA breaks in Saccharomyces cerevisiae. Proc. Natl. Acad. Sci. USA 104: 3955–3960. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Much J. W., Slade D. J., Klampert K., Garriga G., Wightman B., 2000. The fax-1 nuclear hormone receptor regulates axon pathfinding and neurotransmitter expression. Development 127: 703–712. [DOI] [PubMed] [Google Scholar]
- Ng P., Maechler M., 2007. A fast and efficient implementation of qualitatively constrained quantile smoothing splines. Stat. Model. 7: 315–328. [Google Scholar]
- Noble L. M., Chang A. S., McNelis D., Kramer M., Yen M., et al. , 2015. Natural variation in plep-1 causes male-male copulatory behavior in C. elegans. Curr. Biol. 25: 2730–2737. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Peng J. C., Karpen G. H., 2006. H3K9 methylation and RNA interference regulate nucleolar organization and repeated DNA stability. Nat. Cell Biol. 9: 25–35. [DOI] [PMC free article] [PubMed] [Google Scholar]
- R Core Team 2015 R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. Available at: http://www.R-project.org/.
- Rockman M. V., Kruglyak L., 2009. Recombinational landscape and population genomics of Caenorhabditis elegans. PLoS Genet. 5: e1000419. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Rockman M. V., Skrovanek S. S., Kruglyak L., 2010. Selection at linked sites shapes heritable phenotypic variation in C. elegans. Science 330: 372–376. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Roesti M., Moser D., Berner D., 2013. Recombination in the threespine stickleback genome–patterns and consequences. Mol. Ecol. 22: 3014–3027. [DOI] [PubMed] [Google Scholar]
- Silva-Junior O. B., Grattapaglia D., 2015. Genome-wide patterns of recombination, linkage disequilibrium and nucleotide diversity from pooled resequencing and single nucleotide polymorphism genotyping unlock the evolutionary history of Eucalyptus grandis. New Phytol. 208: 830–845. [DOI] [PubMed] [Google Scholar]
- Singhal S., Leffler E. M., Sannareddy K., Turner I., Venn O., et al. , 2015. Stable recombination hotspots in birds. Science 350: 928–932. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Smukowski C. S., Noor M. A. F., 2011. Recombination rate variation in closely related species. Heredity 107: 496–508. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Stevison L. S., Woerner A. E., Kidd J. M., Kelley J. L., Veeramah K. R., et al. , 2016. The time-scale of recombination rate evolution in great apes. Mol. Biol. Evol. 33(4): 928–945. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Stiernagle, T., 2006 Maintenance of C. elegans (February 11, 2006). WormBook, ed. The C. elegans Research Community WormBook, /10.1895/wormbook.1.101.1, http://www.wormbook.org. 10.1895/wormbook.1.101.1 [DOI] [PMC free article] [PubMed]
- Thompson O. A., Snoek L. B., Nijveen H., Sterken M. G., Volkers R. J. M., et al. , 2015. Remarkably divergent regions punctuate the genome assembly of the Caenorhabditis elegans Hawaiian strain CB4856. Genetics 200: 975–989. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Tsai I. J., Burt A., Koufopanou V., 2010. Conservation of recombination hotspots in yeast. Proc. Natl. Acad. Sci. USA 107: 7847–7852. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Wagner C. R., Kuervers L., Baillie D. L., Yanowitz J. L., 2010. xnd-1 regulates the global recombination landscape in Caenorhabditis elegans. Nature 467: 839–843. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Yook K., Harris T. W., Bieri T., Cabunoc A., Chan J., et al. , 2012. WormBase 2012: more genomes, more data, new website. Nucleic Acids Res. 40: D735–D741. [DOI] [PMC free article] [PubMed] [Google Scholar]
Associated Data
This section collects any data citations, data availability statements, or supplementary materials included in this article.
Supplementary Materials
Data Availability Statement
The authors state that all data necessary for confirming the conclusions presented in the article are represented fully within the article.