Abstract
The mutational process varies at many levels, from within genomes to among taxa. Many mechanisms have been linked to variation in mutation, but understanding of the evolution of the mutational process is rudimentary. Physiological condition is often implicated as a source of variation in microbial mutation rate and may contribute to mutation rate variation in multicellular organisms.
Deleterious mutations are an ubiquitous source of variation in condition. We test the hypothesis that the mutational process depends on the underlying mutation load in two groups of Caenorhabditis elegans mutation accumulation (MA) lines that differ in their starting mutation loads. “First-order MA” (O1MA) lines maintained under minimal selection for ∼250 generations were divided into high-fitness and low-fitness groups and sets of “second-order MA” (O2MA) lines derived from each O1MA line were maintained for ∼150 additional generations. Genomes of 48 O2MA lines and their progenitors were sequenced. There is significant variation among O2MA lines in base-substitution rate (µbs), but no effect of initial fitness; the indel rate is greater in high-fitness O2MA lines. Overall, µbs is positively correlated with recombination and proximity to short tandem repeats and negatively correlated with 10 bp and 1 kb GC content. However, probability of mutation is sufficiently predicted by the three-nucleotide motif alone. Approximately 90% of the variance in standing nucleotide variation is explained by mutability. Total mutation rate increased in the O2MA lines, as predicted by the “drift barrier” model of mutation rate evolution. These data, combined with experimental estimates of fitness, suggest that epistasis is synergistic.
Keywords: drift barrier, mutability, mutation accumulation, short tandem repeat, synergistic epistasis
Introduction
The evolution of the mutation rate is of longstanding interest to evolutionary theorists (Fisher 1930; Sturtevant 1937; Lynch et al. 2016), and there is abundant empirical evidence that the overall rate, molecular spectrum, and phenotypic consequences of mutation—collectively, the mutational process—vary at many biological levels, from within an individual genome to among species and higher taxa (Drake et al. 1998; Conrad et al. 2011; Schrider et al. 2013; Long et al. 2016; Ness et al. 2016; Carlson et al. 2018). The mechanistic, environmental, ecological, and evolutionary factors that potentially influence variation in the mutational process are legion.
Microbiologists have appreciated for many decades that physiological stress is often associated with increased mutation rate (e.g., see fig. 6 of Ogur et al. 1960). Recently, Agrawal and his colleagues have undertaken a systematic investigation into the effects of physiological condition (∼ “stress”) on the mutational process in Drosophila melanogaster, motivated by theoretical findings that if the mutation rate is condition-dependent, the accumulation of deleterious mutations can have interesting and sometimes counterintuitive feedback effects on population mean fitness (Agrawal 2002; Shaw and Baer 2011). They manipulated physiological condition both exogenously, by manipulating food quality (Agrawal and Wang 2008) and endogenously, by allowing mutations to accumulate under relaxed selection on genomes that were initially identical except for the presence or absence of one or two mutations of large deleterious effect (Sharp and Agrawal 2012). Poorly fed females transmitted ∼30% more lethal mutations than did well-fed females (Agrawal and Wang 2008). Similarly, mutation accumulation (MA) lines beginning with a large genetic load declined in fitness more rapidly than lines with wild-type genomes, which is most simply explained by the low-fitness lines having a greater mutation rate than the high-fitness lines (Sharp and Agrawal 2012). Whole-genome sequencing revealed that the faster decline in fitness of the loaded MA lines can be attributed to an elevated rate of small deletions, plausibly as a result of different mechanisms of double-strand break repair (Sharp and Agrawal 2016). In an analogous study, Ávila et al. (2006) constructed a set of MA lines of D. melanogaster derived from a single MA line that had itself accumulated mutations under relaxed selection for 265 generations, a protocol that we call “second-order MA” (O2MA). The per-generation decline in fitness was greater in the O2MA lines than in the ancestral (“first-order MA,” O1MA) lines. That result is consistent with an increased mutation rate in the O2MA lines relative to the O1MA lines, but it is also consistent with mutational effects being greater in the O2MA lines (i.e., synergistic epistasis; Dickinson 2008).
We report here the results of a O2MA experiment in the nematode C. elegans, specifically designed to assess the relationship between the initial genomic load of spontaneous deleterious mutations and the subsequent effects on the mutational process (fig. 1). Initially, a set of 100 MA lines derived from a single, highly inbred individual of the N2 strain was allowed to accumulate mutations for ∼250 generations under minimal selection. From a subset of 67 O1MA lines assayed for absolute fitness (defined as lifetime reproductive success under noncompetitive conditions), we chose five lines with consistently high fitness and five lines with consistently low fitness, from which we established ten independent sets of 48 O2MA lines, referred to as “O2MA families,” which were allowed to accumulate mutations for an additional ∼150 generations (Matsuba et al. 2012). Upon completion of the O2MA phase, five replicate O2MA lines from each of the ten O2MA families were sequenced at ∼25X average coverage, along with nine of the ten O1MA progenitors and an additional 23 O1MA lines not included in the O2MA experiment.
Fig. 1.
Schematic depiction of the second-order MA (O2MA) experiment. A color version of this figure is available online. After ∼250 generations of MA (unfilled worms), five low-fitness O1MA lines (light blue worms), and five high-fitness O1MA lines (dark blue worms) were chosen as progenitors (“O2MA progenitor,” circled worm) for sets of 48 O2MA lines (orange worms). Each set of 48 O2MA lines derived from an O2MA progenitor is an “O2MA family.” After ∼150 additional generations (G400), five O2MA lines were randomly chosen from each O2MA family for sequencing (“ACGT”). At the same time, the O2MA progenitor of each O2MA family was sequenced, along with 23 randomly chosen O1MA lines that were not progenitors of O2MA families (gray worms).
The resulting data allow us to address several fundamental questions about the evolution of the mutational process. First, does initial fitness affect the mutational process, and if so, how? Second, how fast does genetic variation in the mutational process accumulate due to the effects of new spontaneous mutations? Third, is the mutation rate itself subject to a mutational bias, as predicted by the “drift barrier” hypothesis of mutation rate evolution? Fourth, how does the mutational process depend on underlying features of the genome, for example, local sequence context, recombination rate or base composition? Fifth, to what extent does the local mutational milieu predict standing nucleotide sequence variation?
Finally, in combination with experimental estimates of fitness, we can assess how the average mutational effect depends on the underlying genetic load. If mutational effects are greater in the low-fitness O2MA lines than in the high-fitness lines, it implies that deleterious mutations interact synergistically; the converse implies diminishing-returns epistasis.
Results
Pooled over 48 (out of 50) O2MA lines and nine of the ten O2MA progenitors (three lines yielded poor sequencing coverage), using our default pipeline we identified 1,828 base substitutions (1,481 O2MA, 347 O1MA), 361 small deletions (293 O2 MA, 68 O1MA), and 236 small insertions (196 O2MA, 40 O1MA). We sequenced the genomes of an additional 23 O1MA lines not included in the O2MA experiment, from which we identified 884 base substitutions, 142 deletions, and 93 insertions. Mutation rates for the different groups are summarized in table 1, and given for individual lines in supplementary table S1, Supplementary Material online. Individual mutations and their properties are listed in supplementary table S2, Supplementary Material online. Raw sequence data are archived in the NCBI Short Read Archive, project numbers PRJNA395568 (O2MA lines and O2MA progenitors), and PRJNA429972 (other O1MA lines).
Table 1.
Average Mutation Rates (SEM).
O2MA Progenitor (n = 9) | O1MA Other (n = 23) | O1MA All (n = 32) | O2MA, High (n = 24) | O2MA, Low (n = 24) | O2MA, All (n = 48) | |
---|---|---|---|---|---|---|
µbs (×109) | 2.26 (0.12) | 2.35 (0.11) | 2.33 (0.08) | 2.58 (0.11) | 2.49 (0.32) | 2.57 (0.11) |
µINS (×109) | 0.26 (0.04) | 0.25 (0.03) | 0.26x (0.02) | 0.35 (0.03) | 0.32 (0.04) | 0.34y (0.03) |
µDEL (×109) | 0.42 (0.05) | 0.39 (0.04) | 0.40 (0.03) | 0.59a (0.07) | 0.41b (0.04) | 0.50 (0.04) |
µINDEL (×109) | 0.68 (0.06) | 0.64 (0.05) | 0.66x (0.04) | 0.95a (0.06) | 0.73b (0.05) | 0.84y (0.05) |
µTotal (×109) | 2.94 (0.11) | 2.99 (0.13) | 2.98x,* (0.10) | 3.52 (0.13) | 3.21 (0.31) | 3.37y,* (0.16) |
µGENOME | 0.29 | 0.30 | 0.30 | 0.35 | 0.32 | 0.34 |
Note.—All mutation rates are per-site, per-generation except µGENOME. Abbreviations are: µbs, base substitution mutation rate; µINS, insertion rate; µDEL, deletion rate; µINDEL, indel rate; µTotal, total mutation rate; µGENOME, haploid genome-wide mutation rate per-generation.
superscripts are significantly different (P < 0.05) within O1MA or O2MA groups.
superscripts are significantly different (P < 0.05) between O1MA and O2MA.
P ≈ 0.06.
Averaged over all 32 O1MA lines, the per-nucleotide base-substitution mutation rate µbs = 2.33 (± 0.08) × 10−9 per generation. µbs for the nine O2MA progenitors is 2.26 ± 0.12 × 10−9 per generation). Averaged over the 48 O2MA lines, the base-substitution mutation rate over the subsequent ∼150 generations are estimated to be µbs = 2.57 (± 0.11) × 10−9 per-generation, not significantly different from the combined O1MA rate (general linear model [GLM], F1, 14.5 = 1.32, P > 0.26; see Supplementary Appendix A1.6, Supplementary Material online for details of the GLM). The short indel rate for the full set of 32 O1MA lines µINDEL = 0.66 (±0.04) × 10−9 per-site per-generation. The indel rate of the nine O2MA progenitors (µINDEL, ANC = 0.68 ± 0.06 × 10−9/generation) does not differ from that of the other 21 O1MA lines (µINDEL, OTHER = 0.64 ± 0.05 × 10−9/generation). Averaged over all 48 O2MA lines, µINDEL = 0.84 ± 0.05 × 10−9/generation, significantly greater than the combined O1MA rate (GLM, F1, 24.2 = 8.27, P < 0.01) but not significantly greater than that of the nine O2MA progenitors (GLM, F1, 45.4 = 1.61, P > 0.21).
There is a positive correlation between the number of callable sites and the estimated base-substitution mutation rate µbs (rHIGH, O2MA = 0.25, P < 0.22, rLOW, O2MA = 0.58, P < 0.002, rO1MA = 0.30, P < 0.09). In the O2MA lines, callable sites are confounded with both O2MA family and fitness. However, the correlation is driven by a few lines with atypically low coverage (average number of callable sites <60% of the genome). The indel rate is uncorrelated with callable sites (rHIGH, O2MA = 0.16, P > 0.44, rLOW, O2MA = 0.12, P > 0.58, rO1MA = 0.00, P > 0.99). We elaborate on potential artifactual and/or biological factors that may contribute to the correlation in the Extended Discussion in Supplementary Appendix A2.1, Supplementary Material online. We do not believe the results are meaningfully affected by the correlation between µbs and callable sites.
No Evidence for Fitness-Dependent Base-Substitution Mutation
There is significantly more variation in µbs among O2MA lines than expected if the base-substitution mutation rate is uniform across the full set of 48 O2MA lines (simulation P < 0.0001; supplementary fig. S1; see methods and supplementary appendix 3.1, Supplementary Material online, Supplementary Material online for details of simulations). Moreover, there is significant variation in µbs among O2MA families (LRT, chi-square = 9.27, df = 1, P < 0.003; fig. 2). Similarly, there is more variation among O1MA lines than predicted by a uniform mutation rate (simulation P < 0.015). The simplest interpretation is that some element(s) of the mutational process diverged over the course of the first ∼250 generations of MA, and that the signal of the difference(s) carried through the next ∼150 generations of O2MA.
Fig. 2.
Distribution of mutation rate among O2MA families. A color version of this figure is available online. Families derived from high-fitness O2MA progenitors are in light gray, families derived from low-fitness O2MA progenitors are in dark gray. Families 522 and 547 contain four sequenced O2MA lines, the other families contain five O2MA lines. The horizontal line denotes the mutation rate of the nine O2MA progenitors. Points shown outside the box are beyond the 1.5× interquartile range of the family whereas whiskers represent data points within that range. that the mutational process in the ten O2MA progenitors diverged by ∼250 generations of evolution under MA conditions, and the signal of the difference(s) was retained over the subsequent ∼150 generations of O2MA. (A, left) Base-substitution mutation rate (µbs). (B, right) Indel mutation rate (µINDEL).
However, there is no evidence that µbs differs consistently between the O2MA families derived from high fitness and low fitness O2MA progenitors (GLM, F1, 6.24 = 0.18, P > 0.75). Averaged over the five O2MA families in each fitness class, the base substitution mutation rate between G250 and G400 for high and low fitness lines is estimated to be µbs, HIGH = 2.58 (±0.11) × 10−9/gen, and µbs, LOW = 2.49 (±0.32) × 10−9/gen (fig. 2).
As expected from the lack of differentiation of µbs, the base-substitution spectrum neither does differ significantly between the high-fitness and low-fitness O2MA lines (supplementary fig. S2, Supplementary Material online; Monte Carlo Fisher's Exact Test, 107 replicates, P > 0.70), nor does it vary between O2MA families (MC FET, P > 0.25), between individual O2MA lines (MC FET, P > 0.08), between O2MA lines within any of the ten families (P > 0.10 or greater in all ten cases) or between the O2MA progenitors at G250 and the O2MA lines at G400 (supplementary fig. S2, Supplementary Material online; MC FET, P > 0.51).
Consistent with many previous studies (Lynch 2007), the average mutation rate from a C or G to an A or T is significantly greater than the mutation rate from A or T to C or G (µC/G→A/T = 3.03 (±0.18) × 10−9/gen; µA/T→C/G = 0.93 (±0.05) × 10−9/gen). Extrapolating from these rates, the expected base composition of the C. elegans genome at mutational equilibrium is ∼76.5% AT, greater than the actual AT fraction of ∼64.5%. Inspection of the µbs data reveals two potential outlying O2MA lines (fig. 2), one low fitness (O2MA line 508.34, µbs = 0.78 × 10−9/gen) and one high fitness (O2MA line 579.36, µbs = 4.34 × 10−9/gen). When those two lines are omitted from the analysis, the variance among O2MA lines is sufficiently explained by a single base-substitution rate (simulation P > 0.07).
However, O2MA lines derived from O2MA progenitor 508 have the lowest base-substitution rate even with the extreme line omitted [µbs = 1.49 (±1.90) × 10−9/gen with line 508.34 included, 1.68 (±0.91) × 10−9/gen) without line 508.34] and O2MA lines derived from O2MA progenitor 579 have the highest base-substitution rate even with the extreme line omitted [µbs = 3.53 (±0.31) × 10−9/gen with line 579.36 included, 3.26 (± 0.22) × 10−9/gen) without line 579.36]. The random chance that the most extreme high line comes from the family whose other four members also have the highest average mutation rate (5/48) and that the most extreme low line comes from the family whose other four members also have the lowest average mutation rate (5/47) is ∼1%. The most parsimonious explanation is that the two outlying O2MA lines are simply the most extreme manifestations of a biological process common to their respective O2MA progenitors rather than true outliers. The alternative is that the apparently extreme mutation rates are experimental artifacts, which we think is unlikely (see Extended Discussion in Supplementary Appendix A2, Supplementary Material online).
Fitness-Dependence of the Small Indel Rate
In contrast to the base-substitution mutation rate, which does not differ between O2MA lines derived from high fitness and low fitness O1MA lines, the O2MA short indel rate is significantly greater in the high fitness group (µINDEL = 0.95 ± 0.06 × 10−9/generation) than in the low fitness group (µINDEL = 0.73 ± 0.50 × 10−9/generation: GLM, F1, 44.4 = 7.84, P < 0.01). The difference is primarily due to different rates of deletions (µDEL, High = 0.59 ± 0.07 × 10−9/generation, µDEL, Low = 0.41 ± 0.04 × 10−9/generation; F1, 7.58 = 5.77, P < 0.05) rather than insertions (µINS, High = 0.35 ± 0.03 × 10−9/generation, µINS, Low = 0.32 ± 0.04 × 10−9/generation; F1, 17.5 = 0.51, P > 0.48). The variance in µINDEL among O2MA lines within each fitness group is adequately explained by a single, fitness-specific indel rate (high-fitness, simulation P > 0.12; low-fitness, simulation P > 0.2). The distribution of indel lengths is given in supplementary figure S3, Supplementary Material online.
The higher indel rate of high-fitness O2MA lines suggests that the indel rate of O1MA lines should be greater than the overall O2MA rate, given the higher fitness of the G0 ancestor. Interestingly, that is not what we observe. The overall O1MA indel rate, including all 32 O1MA lines, is significantly lower than high-fitness O2MA indel rate (µINDEL, O1MA = 0.66 ± 0.04 × 10−9/generation, µINDEL, O2MA_High = 0.95 ± 0.06 × 10−10/generation; GLM, F1, 41.7 = 15.61, P < 0.0005), but not significantly different from low fitness O2MA indel rate (µINDEL, O1MA = 0.66 ± 0.04 × 10−9/generation, µINDEL, O2MA_Low = 0.73 ± 0.05 × 10−9/generation; F1, 47.6 = 1.10, P > 0.29). The results do not change if only the nine O2MA progenitors are used to calculate the indel rate.
The O2MA insertion and deletion rates are uncorrelated (rINS, DEL = 0.021, P > 0.88, n = 48), suggestive of different factors underlying the two types of mutations. The base-substitution rate is moderately positively correlated with the insertion rate (rbs, INS = 0.32, P < 0.03, n = 48) but uncorrelated with the deletion rate (rbs, DEL = -0.016, P > 0.91, n = 48).
There are two potential evolutionary factors that can explain the difference in the deletion rate between the different O2MA fitness groups. First, some element of the mutational process may differ, for example, DNA repair. Alternatively, selection may differ in either strength or efficiency between the two treatments. In our MA protocol, differences in selection efficiency between lines can result from different frequencies of going to backup (see Materials and Methods). However, the mean effectively neutral selection coefficient (sn = 1/4Ne) is only slightly smaller in the low-fitness lines () than in the high-fitness lines (). The slightly lower efficiency of selection in the high fitness group seems unlikely to account for the ∼30% greater indel rate
Alternatively, the strength of selection itself may be different between the two fitness groups, such that some mutations that are only mildly deleterious—and thus effectively neutral—in a high-fitness line are significantly more deleterious in a low-fitness line, that is, epistasis is synergistic (= negative; Phillips 2008). In that case, a larger fraction of mutations in the low-fitness lines would exceed the threshold of effective neutrality and be removed by selection. The mutational effect analysis, as implemented in snpEff 4.1 (see Materials and Methods), reveals no significant difference in the frequency of potentially large-effect indels (20/303 among high-fitness O2MA lines, 10/186 among low-fitness lines, Fisher's Exact Test, two-tailed P > 0.70) or SNPs (22/822 vs. 15/659, Yates' chi-square, two-tailed P > 0.75). Nor does the frequency of potentially large-effect mutations differ between the high and low-fitness G250 O2MA progenitors with respect to indels (2/47 vs. 9/61, Yates’ chi-squared, two-tailed P > 0.14) or SNPs (P > 0.60). However, large-effect SNPs are marginally overrepresented in G250 O1MA lines as opposed to O2MA lines (65/1,646 vs. 37/1,481, Yates’ chi-squared, two-tailed P < 0.03), but the difference is not significant with respect to indels (38/434 vs. 30/489, Yates’ chi-squared, two-tailed P > 0.16).
Effects of mutations can also be inferred by the frequency distribution of mutations in putatively functional genomic regions such as exons, splice junctions, etc. We used snpEff to classify mutations into multiple categories—Exons, Introns, Intergenic, Gain/Loss of Start/Stop codon, and 3′/5′UTR (see supplementary table S3, Supplementary Material online). The proportion of variants affecting exons are significantly greater in G250 O1MA lines than G400 O2MA lines with respect to both indels (62/434 vs. 35/489, Yates’ chi-square, two-tailed P < 0.0006) and SNPs (443/1,646 vs. 312/1,481, Yates' chi-square, two-tailed P < 0.0002), implying greater purifying selection against new variants in the O2MA experiment—that is, epistasis is synergistic. No such difference was observed between high and low-fitness O2MA lines with respect to either indels (23/303 vs. 12/186, Yates’ chi-squared, two-tailed P > 0.74) or SNPs (167/822 vs. 145/659, Yates’ chi-squared, two-tailed P > 0.46), or between high and low fitness O2MA progenitors with respect to indels (P > 0.12) or SNPs (P > 0.6). However, experimentally derived estimates of relative fitness (Matsuba et al. 2012) combined with estimates of the average number of mutations carried by high-fitness and low-fitness O2MA lines are consistent with synergistic epistasis (see Discussion).
Mutation Is Positively Associated with Recombination
Standing nucleotide diversity is almost always positively associated with recombination rate, which could in principle be due either to the mutagenicity of recombination or to natural selection (i.e., Hill–Robertson interference). There is good reason to think that HR interference has an important causal role in this pattern (Lynch 2007). However, with the important exception of humans, most of the data are derived from inferences drawn from comparisons with a reference class of genomic sites believed to be free from selective constraints (e.g., processed pseudogenes or 4-fold degenerate sites) rather than directly from de novo mutations, and the extent to which recombination is mutagenic per se remains an open question. Several large studies of de novo mutations in humans report a significant association between recombination rate and mutation rate (Michaelson et al. 2012; Francioli et al. 2015; Carlson et al. 2018), although recombination appears to affect nucleotide diversity beyond its association with mutation. Two cohort studies in bees (Yang et al. 2015; Liu et al. 2017) found a weak relationship between recombination and mutation, but the numbers of mutations were small and the inferences somewhat circumstantial. Conversely, Ness et al. (2015) found no association between recombination rate and mutation rate in the green alga Chlamydomonas reinhardtii, although the data on recombination rate in C. reinhardtii are sparser than for humans. In C. elegans, the relative proportions of different types of variants (SNPs, small and large indels) in the standing genetic variation differ depending on the local recombination rate, which has been suggested to reflect the signature of recombination-dependent mutation (Hwang and Wang 2017).
To investigate the relationship between recombination and mutation, we determined the association between µbs estimated from the O2MA lines and recombination rate using weighted OLS regression, dividing each chromosome into the recombination rate bins reported by Rockman and Kruglyak (2009) and weighting each bin by its size in Mb. Recombination rate is nearly constant within each bin (see fig. 1 in Rockman and Kruglyak 2009). Contrary to our previous report from a different (and smaller) subset of G250 O1MA lines (Denver et al. 2009), here we find a significant positive univariate association between recombination rate and µbs (pseudor2 = 0.26, P < 0.003, supplementary fig. S4, Supplementary Material online); presumably the discrepancy between this study and the previous report is due to the greater power afforded by the much greater number of mutations included in the present study (316 vs. 1,828).
The Genomic Correlates of Mutability
Many features of the genome and epigenome influence the mutational process. The effects of some such features are well-understood and seem to be relatively general to all living organisms (e.g., short tandem repeats [STRs], G:C vs. A:T, 5′-methyl-C), whereas others remain uncertain and/or appear to be taxon-specific. To more fully characterize the features of the C. elegans genome that are associated with the mutational process, we employed a logistic regression method, loosely following the approach of Michaelson et al. (2012) and Ness et al. (2015). Since the deletion rate differs significantly between the two O2MA fitness groups, we restricted the analysis to base-substitutions. Univariate logistic regression coefficients of the features included in the full multiple regression are shown in figure 3.
Fig. 3.
(A, Left panel) Univariate logistic regression coefficients for different genomic features (dark gray bars). Overlapping light gray bars depict the null expectation from sampling variance with 1,828 randomly selected genomic sites treated as mutations. Variable abbreviations are: GC_10, 10 bp GC-content centered on the focal base; GC_1000, 1 kb GC content; mono_di, proximity to a mono or dinucleotide STR; motif_3bp, three-base motif with the mutant base at the 3′ end; Rec, local recombination rate. The method used to condense several predictors into one, for 3-base motif and mono–di STR, is described in Materials and Methods. (B, Right panel) Base-substitution mutation rate (µbs) of each 3-base motif, with mutations on the 3′ end. Motifs are grouped by the mutant base. A color version of this figure is available online.
To assess the predictive power of the mutability model, we randomly sampled half of the O2MA lines to train the model (24 lines; roughly 740 mutations) and tested the model on the remaining 24 O2MA lines. All mutant sites and 100,000 randomly chosen nonmutant sites were arbitrarily binned into 35 bins of uniform width, and the observed mutation rate for each mutability bin was plotted against expected mutability. This analysis was repeated 100 times. Of the factors initially included in the multiple regression (see Materials and Methods), the best model includes only the 64 three-base motifs as a set of predictor variables (fig. 4a). Any combination of other predictors, with or without the three-base motif, results in a poorer fit. The poorer fit presumably results from either overfitting, multicollinearity and/or the inability of the logistic regression model to accommodate nonlinear relationships between predictor variables and mutation rate, even when tuned with high penalty (λ) to drop predictor variables altogether (Lasso). Logistic regression coefficients are tabulated in supplementary table S4, Supplementary Material online.
Fig. 4.
Left panel (A) depicts the relationship between observed and predicted mutability across mutability bins (n = 35 bins). The mutability model includes only the 3-base motif. Right panel (B) depicts the relationship between observed standing nucleotide variation, measured as Watterson’s θ, and predicted mutability. Predicted mutability of wild isolates is calculated from the full complement of mutations in the MA lines (n = 1,828). A color version of this figure is available online.
It is reassuring but not surprising that the model provides a good fit to the MA data from which it was trained. Of more interest is the relationship between mutability as predicted from MA data and the genetic variation observed in nature. We obtained publicly available whole genome sequence data from 40 C. elegans wild isolates (Thompson et al. 2013) and identified SNPs using the same pipeline that we used to call putative mutations in the MA data. We identified ∼537,000 SNPs by these criteria. Sites were categorized as variable or not variable, without regard to allele frequency.
Mutability was assessed as described previously, except in this analysis the model was trained on the full set of 1,828 mutations. Figure 4b shows a plot of nucleotide diversity (quantified as θW; Watterson 1975) at noncoding sites (288,585 intron sites, and 122,272 intergenic sites) against predicted mutability. Averaged over all bins, mutability strongly predicts standing nucleotide diversity, although the variance is high at predicted high-mutability sites, presumably because the sample sizes are small.
STRs
STR loci (“microsatellites”) can mutate orders of magnitude faster than other classes of loci, and potentially contribute a large fraction of the per-generation mutational variance. We previously estimated the haploid per-genome mutation rate of dinucleotide STR loci in the full set of O1MA lines to be ∼0.12/generation (Phillips et al. 2009). That calculation accounts for variation in mutation rate among repeat motifs (e.g., AT vs. AG, etc.) and at least partly accounts for variation in mutation rate with repeat number, although there is substantial uncertainty that cannot be easily quantified. Seyfert et al. (2008) found no significant effect of repeat length (di, tri, or tetranucleotide repeat) on the rate of STR mutation in a different set of N2-strain MA lines. Denver et al. (2004) investigated the mutational process of mononucleotide repeats (= “homopolymers”) in the same MA lines as Seyfert et al. and concluded that the (haploid) per-genome mononucleotide mutation rate is ∼0.8/generation, more than twice the rate of all other classes of mutations combined. Mutational properties of STRs are summarized by repeat type in supplementary table S5 and supplementary figures S5 and S6, Supplementary Material online.
Our findings qualitatively recapitulate those of the previous studies. First, G:C mononucleotides experience indel mutations at a 10-fold greater rate than A:T mononucleotides, as observed by Denver et al. (2004). Second, indel mutation rate differs only ∼2-fold between the three A:T containing dinucleotides (AC, AG, AT) and we detected no mutations at CG dinucleotides, both as observed by Phillips et al. (2009). The ratio of mononucleotide deletions (251) to insertions (139) is not significantly different from the 16:14 ratio reported by Denver et al. (Yates' chi-square = 1.02, df = 1, P > 0.31), and the ratio of dinucleotide deletions (10) to insertions (32) is nearly identical to the 8:28 ratio observed by Phillips et al. However, there are quantitative differences between the findings of this study and the previously reported values. The mononucleotide indel rate is estimated to be 0.056 mutations per haploid genome per generation, ∼10% the rate estimated by Denver et al. (2004). The dinucleotide indel rate is estimated to be 0.003 mutations per haploid genome per generation, ∼5% of the estimate from Phillips et al. (2009).
We believe the differences between the STR mutation rates estimated in this study and those reported previously are more apparent than real, and that the higher genome-wide estimates from the previous studies are probably closer to the truth. The previous studies included only a small number of STR loci, in which extremely long repeats were greatly overrepresented. In contrast, long repeats are significantly underrepresented in our data. There is no reason to doubt that mutation rate increases with repeat number. Probably, we have missed a handful of highly mutable loci that contribute a disproportionately large fraction of mutations to the genome-wide total. However, very long mono and dinucleotide repeats represent a miniscule fraction of the genome (e.g., mono and dinucleotides > 20 repeats represent ∼0.02% of the genome), and it seems unlikely (although not impossible) that those few highly mutable loci contribute meaningfully to fitness. Thus, although the true genome-wide mutation rate may in fact approach one per haploid genome per generation, the meaningful mutation rate may be much lower.
We elaborate on potential causes of the source(s) of the discrepancies between this study and the previous studies in the Extended Discussion, Supplementary Appendix A2.3, Supplementary Material online.
Copy Number Variants
Copy number variants were called using the read-depth (RD) based method implemented in the CNV-seq software (Xie and Tammi 2009). The number of putative CNVs inferred is sensitive to the parameters of the analysis, with different sets of input parameters leading to estimates of the CNV rate that differ by an order of magnitude (see table 2). Sensible interpretation of the results requires understanding the details of the analysis, which we outline in Materials and Methods and elaborate in section 4 of the Extended Discussion, Appendix A2. There are two salient general results: 1) in no case does the mean number of putative CNVs differ between the high and low-fitness O2MA lines, but also 2) the number of putative CNVs in O2MA lines is consistently greater than in O1MA lines (supplementary table S6, Supplementary Material online). Given the uncertainty associated with the estimates, we did not attempt to confirm CNVs (e.g., with qPCR). We report the results as a cautionary note that, even with as “easy” a genome as C. elegans N2 strain MA lines, which are almost completely homozygous and have an exceptionally well-characterized reference genome, estimates of CNVs from short-read data are extremely sensitive to methodological details. We expect that the CNV problem will eventually be resolved with accurate long-read sequencing at high coverage (Chaisson et al. 2015; Tyson et al. 2018).
Table 2.
Median Copy Number Variation Rate, Per-genome/Generation (SEM).
CNV calling criteria | O1MA (n = 32) | O2MA (n = 48) | O2MA High fitness (n = 24) | O2MA Low fitness (n = 24) |
---|---|---|---|---|
1.5X—W.S. 100 | 0.08 (0.04) | 0.25 (0.11) | 0.31 (0.20) | 0.24 (0.06) |
1.5X—W.S. 200 | 0.14 (0.09) | 0.46 (0.14) | 0.44 (0.26) | 0.58 (0.12) |
1.5X—W.S. 500 | 0.28 (0.25) | 0.63 (0.16) | 0.47 (0.24) | 0.80 (0.22) |
2X—W.S. 100 | 0.02 (0.02) | 0.08 (0.05) | 0.08 (0.1) | 0.08 (0.03) |
2X—W.S. 200 | 0.03 (0.03) | 0.14 (0.09) | 0.11 (0.17) | 0.18 (0.05) |
2X—W.S. 500 | 0.05 (0.04) | 0.16 (0.16) | 0.16 (0.28) | 0.25 (0.16) |
1.5X—W.S. 1000 | 0.005 (0.004) | 0.007 (0.005) | 0.007 (0.009) | 0.007 (0.003) |
1.5X—W.S. 2000 | 0.006 (0.004) | 0.007 (0.008) | 0.007 (0.14) | 0.007 (0.004) |
1.5X—W.S. 5000 | 0.008 (0.005) | 0.007 (0.02) | 0.007 (0.03) | 0.007 (0.009) |
2X—W.S. 1000 | 0 (0.001) | 0 (0.002) | 0.007 (0.004) | 0 (0.001) |
2X—W.S. 2000 | 0 (0.002) | 0 (0.004) | 0.003 (0.008) | 0 (0.002) |
2X—W.S. 5000 | 0.004 (0.003) | 0.007 (0.006) | 0.007 (0.01) | 0 (0.006) |
1.5X—W.S. 10000 | 0 (0.0005) | 0 (0.0003) | 0 (0.0006) | 0 (0.0003) |
1.5X—W.S. 20000 | 0 (0.0002) | 0 (0.0004) | 0 (0.0004) | 0 (0.0007) |
1.5X—W.S. 50000 | 0 (0.0006) | 0 (0.0007) | 0 (0.001) | 0 (0.0009) |
2X—W.S. 10000 | 0 (0.0001) | 0 (0.0003) | 0 (0.0006) | 0 (0) |
2X—W.S. 20000 | 0 (0.0004) | 0 (0) | 0 (0) | 0 (0) |
2X—W.S. 50000 | 0 (0.0004) | 0 (0.0001) | 0 (0.0003) | 0 (0) |
Note.—Abbreviations are: 1.5X—W.S. 100, 1.5-fold change—Window size 100.
Window sizes were optimized to detect CNVs of ∼1 kb (100, 200, 500 bp windows at 1.5X and 2X-fold-change), 10 kb (1, 2, and 5 kb, 1.5X, and 2X-fold-change), and 100 kb (10, 20, 50 kb, 1.5X, and 2X-fold-change). See Supplementary Appendix A2.4, Supplementary Material online for more details.
Discussion
The Effects of Fitness
Mutation Rate
There is no evidence that the rate or molecular spectrum of base-substitution mutations in the N2 strain of C. elegans depends on the fitness of the starting genotype. In contrast, the short indel rate, especially the deletion rate, is fitness dependent, but not in the anticipated way: low-fitness genotypes have a significantly lower deletion rate than do high-fitness genotypes. This result differs from the finding in D. melanogaster that low-fitness genotypes experience significantly greater rates of small deletions than do high-fitness genotypes, apparently because flies in poor physiological condition employ a different, more error-prone mechanism of double-strand break repair than do flies in good condition (Sharp and Agrawal 2016).
Mutational Effects
In the abstract, existence of a robust system seems to imply redundancy of components, which in the context of a genetic system implies that epistasis must be synergistic on average (de Visser et al. 2003). However, empirical evidence concerning the average epistatic effect of spontaneous deleterious mutations has been inconclusive (Halligan and Keightley 2009), and whether deleterious mutations interact synergistically on average has vexed generations of evolutionary biologists.
It is certainly possible that worms with high-fitness genotypes incur more short deletions than do worms with low-fitness genotypes and/or are worse at repairing them. If we use the point estimates of the average number of mutations carried by O2MA lines derived from high-fitness and low-fitness O2MA progenitors and the average mutational decline in relative fitness, averaged over all O2MA lines (supplementary table S7, Supplementary Material online), the average mutational effect on relative fitness of the low-fitness O2MA lines is about twice that of the high fitness lines (0.48% vs. 0.25%). If we assume that only indels affect fitness, the mean effects on low-fitness and high-fitness O2MA lines are ∼2% and 1%, respectively. We further suppose that the “dark matter” represented by the (assumed) missing fraction of indels in the low-fitness lines would amplify the difference in average selective effects. These results represent some of the first direct evidence that spontaneous mutations interact synergistically, on average (contraJasmin and Lenormand 2016).
Variation in the Mutational Process
The per-generation input of genetic variance for a trait, the mutational variance, VM, ultimately governs the evolvability of the trait and is a fundamental parameter in evolutionary genetics (Lynch and Hill 1986). VM can be estimated from MA data from the relationship , where VL is the among-line component of variance and t is the number of generations of MA (Lynch and Walsh 1998). VM is commonly standardized relative to the residual variance VE, called the mutational heritability . Among O2MA families, VL for µbs = 0.196, VE = 0.316, and = 143, so ≈ 0.002/generation. To put that result in context, for a wide variety of traits in disparate taxa averages ∼0.001/generation (Houle et al. 1996), although for some traits (notably gene expression) is consistently an order of magnitude less (e.g., Rifkin et al. 2005; Landry et al. 2007). That the mutation rate is evolvable is not surprising (we know it is), but the point estimate of suggests that the mutational target for mutation rate is substantial. However, an estimate of a ratio of variances based on ten data points cannot be considered robust.
Mutability and Genetic Variation
The observed positive univariate association between local recombination rate and mutation rate in this experiment is almost surely not due to Hill–Robertson interference. The expected time to fixation/loss of a new mutation is three generations (Keightley and Caballero 1997), and the average mutation rate (CNVs notwithstanding) is about one per genome every three generations. Thus, the opportunity for H–R interference in our experiment is very low, albeit not nonexistent. However, the causal factors underlying the relationship between local recombination rate and base-substitution mutation rate remain uncertain. The cause is not as simple as GC content (G:C being more mutable than A:T), because chromosome arms, which have higher recombination rates, are AT-rich. The sufficiency of 3-bp motif as a predictor of mutability suggests that the positive association of mutation with recombination is an epiphenomenon resulting from the overrepresentation of highly mutable motifs in regions of high recombination. Most clearly, mononucleotide runs are both mutagenic and positively associated with local recombination rate.
The mutability model does a good job of predicting standing nucleotide variation in natural isolates of C. elegans: sites that are more mutable are, on average, more variable (fig. 4b). Again, this is reassuring, but not surprising; the same relationship has been observed in C. reinhardtii (Ness et al. 2016) and humans (Francioli et al. 2015). One difference between C. elegans and many other organisms (e.g., humans) is that, in C. elegans, regions of low recombination (chromosome cores) are gene-rich rather than gene-poor, so H–R interference is more important in regions of low recombination both for inherent reasons and because the opportunity for selection is greater due to the larger target for deleterious mutations. One possibility is that, because chromosome arms are gene-poor, mutagenic features (e.g., specific motifs, such as mononucleotide runs) preferentially accumulate because their background effects on linked loci are less important. Perhaps paradoxically, the AT-richness of chromosome arms may be a signature of increased mutation rate, because C:G mutates to A:T more often than the reverse.
Transition/Transversion Ratio in MA and Wild Isolates
The ratio of transitions to transversions neither does differ significantly between high fitness O2MA and low fitness O2MA lines (Ts/TvHigh = 0.72, Ts/TvLow = 0.68, t-test, P > 0.5), nor does it differ between O2MA lines and O2MA progenitors (Ts/TvO1MA = 0.70, Ts/TvO2MA = 0.74, t-test, P > 0.8). As in previous studies with C. elegans MA lines, the Ts/Tv ratio in MA lines is less than the Ts/Tv ratio in wild isolates (Denver et al. 2009, 2012). The difference could be due to purifying selection against transversions (Babbitt and Cotter 2011), or it could be that the mutational milieu in the lab environment differs from that in the wild. Ts/Tv of rare variants (<5% variant allele frequency relative to the reference genome, Ts/Tv = 1.20) is similar to that of common variants (>50% variant allele frequency, Ts/Tv = 1.14). That finding is not consistent with purifying selection against transversions, which suggests that some feature of the lab environment may cause the mutational spectrum of MA lines to differ from that in nature. An independent test of this possibility is to compare Ts/Tv between our N2-strain MA ancestor and the N2 reference genome; any variants will have arisen in the lab environment. In that comparison, Ts/Tv = 0.78, consistent with the MA Ts/Tv ratio and different from that among wild isolates. Denver et al. (2012) found a similarly lower Ts/Tv in MA lines of the congener Caenorhabditis briggsae relative to wild isolates, suggesting that the phenomenon is not restricted to C. elegans.
Evolution of the Mutation Rate
The total mutation rate, µTotal, of the O2MA lines is ∼13% greater than that of the O1MA lines (table 1). As purifying selection against new variants was likely stronger during the O2MA experiment, 13% is perhaps an underestimate. Directional change in a trait under MA conditions (“mutational bias,” ΔM) suggests that the trait is under ongoing directional selection in the opposite direction, analogous to the direction of phenotypic change upon inbreeding (Teotónio et al. 2017). This finding is consistent with the “drift barrier” hypothesis of mutation rate evolution, which posits that directional selection to reduce the mutation rate is opposed by a weak mutational bias (Lynch 2008). It is not consistent with the mutation rate being at an optimum established by a “cost of fidelity,” wherein direct selection to reduce the input of deleterious mutations is counterbalanced by indirect selection to reduce the fitness cost of genome surveillance (Kimura 1967). If the mutation rate is at an optimum imposed by countervailing components of selection, the overall fitness function will be stabilizing. Provided that the fitness function is approximately symmetrical around the optimum, the expectation is that the among-line variance in the trait will increase but that the overall trait mean will not change. We emphasize that these findings do not imply that there is no cost of fidelity, just that the mutation rate in C. elegans does not appear to be at a global optimum.
The ∼13% increase in mutation rate over the course of ∼250 generations amounts to a per-generation change ΔM ≈ 0.0005, compared with a per-generation decrease in competitive fitness of ≈0.001 in the same lines (Yeh et al. 2018). It is difficult to believe that cumulative selection on competitive fitness is only twice as strong as cumulative selection to decrease mutation rate. The most logical conclusion is that mutations that increase mutation rate have deleterious pleiotropic effects on fitness. However, this conclusion seems at odds with the failure to observe a main effect of fitness in the O2MA lines. In principle, the discrepancy could be resolved by determining the mutational correlation of mutation rate with fitness. The sample sizes necessary to answer that question in multicellular organisms are currently prohibitive, but it may be practical in a microbial system.
Conclusions
Sets of O2MA lines that differed significantly in relative fitness were allowed to accumulate mutations for ∼150 generations, to test the hypothesis that physiological stress imposed by mutation load causes the mutation rate to increase. Contrary to our hypothesis, the base substitution rate did not differ significantly between high-fitness and low-fitness lines, whereas the indel rate was greater in the high-fitness lines. Averaged over all lines, the total mutation rate between generations 250 and 400 was significantly greater than the mutation rate between generations 0 and 250, as predicted by the “drift barrier” hypothesis of mutation rate evolution. The average mutational effect on fitness (α) is approximately twice as great in the low-fitness lines, which implies that the epistatic effects of deleterious mutations are synergistic.
Materials and Methods
MA Protocol
A schematic depiction of the experimental design is presented in figure 1. Details of the O1MA protocol and fitness assays are reported in Baer et al. (2005); details of the O2MA protocol are reported in Matsuba et al. (2012) and summarized in the Extended Methods, Supplementary Appendix A1.1, Supplementary Material online.
Genome Sequencing and Estimation of Mutation Rates
Five O2MA lines from the ten O2MA families and 35 O1MA lines, including the ten O2MA progenitors, were sequenced at an average of ∼25X coverage depth. Sequencing was done using Illumina technology with 100 bp paired-end reads. Protocols for DNA extraction and construction of sequencing libraries are given in Supplementary Appendix A1.2, Supplementary Material online; details of preliminary processing of raw sequence data are given in Supplementary Appendix A1.3, Supplementary Material online. The quality of sequence from two O2MA lines and one O1MA line was poor and these samples were omitted from further analyses, leaving 48 O2MA line and nine of the ten O2MA progenitors.
Variants were called using GATK software (McKenna et al. 2010) with a minimum coverage threshold of >10×. Variants were identified as putative mutations if 1) the variant was identified as homozygous, and 2) it was present in one and only one O2MA line. Criterion (1) means that any mutations that occurred in the last few generations of O2MA that were still segregating and/or occurred during population expansion for DNA extraction were ignored. Because the O1MA progenitor was at mutation-drift equilibrium (Lynch and Hill 1986), the segregating variation is expected to be the same in the O1MA progenitor and the O2MA line, so ignoring heterozygotes results in an unbiased estimate of mutation rate. Criterion (2) reduces the probability of mistakenly identifying a variant segregating at low frequency in the expanded population of the O1MA progenitor as a new mutation. Two pairs of O1MA lines shared multiple variants and were inferred to have experienced contamination at some point during the MA phase; one line from each pair was arbitrarily omitted from subsequent analyses (see Extended Discussion, Supplementary Appendix A2.1, Supplementary Material online).
The mutation rate (per-site, per-generation) μ of each O2MA line was calculated as m/nt where m is the number of mutations, n is the number of nucleotide sites observed and t is the number of generations of MA (Denver et al. 2009). The average mutation rate of each O1MA progenitor was calculated as the unweighted mean of the O2MA lines in that family.
We applied three additional variant-calling strategies, one with a more stringent set of criteria which we refer to as the “trimmed genome” and two with more liberal criteria, which we refer to as the “lenient genome” and “STR-relaxed,” respectively. We also explored several alternative GATK filters. Details of the alternative GATK filters are given in Supplementary Appendix A1.3, Supplementary Material online; justification and details of the additional variant-calling strategies are given in Supplementary Appendix A1.4, Supplementary Material online.
Mutation Confirmation
From each of the 48 O2MA lines, we randomly chose one putative base-substitution and one putative indel mutation for confirmation by Sanger sequencing. Details of the confirmation protocol are given in Supplementary Appendix A1.5, Supplementary Material online. We confirmed 43/48 putative base-substitutions (zero false positives, five failures) and 36/48 putative indels (zero false positives, 12 failures), consistent with a false positive rate below 2.5% based on the upper 95% Poisson confidence limit. The number of failures did not differ significantly between base-substitutions and indels (Fisher's exact test, two-tailed P > 0.1).
Data Analysis
Variation among O1MA and O2MA Lines
The simplest hypothesis regarding the mutational process is that it remained constant over the course of the experiment subsequent to the divergence of the O1MA lines from the common G0 ancestor. To test the hypothesis that a uniform mutation rate sufficiently explains variation among MA lines, we simulated the evolutionary process of 48 O2MA lines with a uniform base-substitution mutation rate equal to the unweighted mean base-substitution mutation rate of the 48 O2MA lines, accounting for the number of callable sites (supplementary table S2, Supplementary Material online) and the number of generations of O2MA of each line (supplementary table S1, Supplementary Material online). At each generation, the simulated genome (100.25 Mb) of a line was assigned a number of mutations drawn from a Poisson distribution with parameter equal to the unweighted mean mutation rate. Next, each simulated line was assigned a number of generations of MA and a number of callable sites sampled with replacement from the observed distribution. Sets of 48 simulated lines were generated 100,000 times, and the observed variance in mutation rate among the 48 O2MA lines was compared with the distribution of simulated variances (supplementary fig. S1, Supplementary Material online). An analogous simulation was done for the 34 O1MA lines. Details of the simulation and code are provided in Supplementary Appendix A3.1, Supplementary Material online.
To test for effects of fitness on mutation rate and to partition the variance in mutation rate into within and among-group components, we consider the mutation rate itself as a continuously distributed dependent variable in a GLM. Details of the GLM analyses are given in Supplementary Appendix A1.6, Supplementary Material online.
It is possible that the distribution of mutation types—the mutational spectrum—differs among groups even if the overall mutation rate does not (Long et al. 2016), or if none of the type-specific mutation rates achieves statistical significance. Spectra were compared among groups by Fisher's Exact Test using Monte Carlo sampling as implemented in the FREQ procedure of SAS v. 9.4.
Mutability
Many factors potentially influence the probability that a site will mutate. Some of these factors can be unambiguously characterized from pooled genomic DNA from a population of multicellular organisms (local sequence motif, base composition at various scales, local recombination rate), whereas other factors that potentially influence heritable mutation are only relevant in the context of the germline or its embryonic precursors (e.g., chromatin state, nucleosome occupancy, expression level). To elucidate the relationship between the various genomic properties of a given site and the probability that a mutation occurs at that site, we employed a logistic regression model in which the log odds-ratio that a mutation occurs at a given site (success = mutation) is modeled as the sum of a set of linear predictor input variables (Michaelson et al. 2012; Ness et al. 2016). We initially included as independent variables the three-base motif with the focal nucleotide at the 3′ end and at the 5′ end (these need not be redundant if the probability that a sequencing read is included is not identical for the two strands), the five-base motif with the focal nucleotide in the center, local recombination rate (see Results for details), presence or absence of the focal site in the vicinity of a (mono/di)nucleotide run (± 2 bp), and the 11-base and 1001-base GC-content, centered on the focal site. For cross-validation, we trained our models using half of our O2MA lines (n = 24), including all base substitutions and 100,000 randomly chosen nonmutant sites, and tested the predictions on the remaining 24 O2MA lines. Following model selection, we trained the model using all the ancestral O1MA and O2MA mutations (n = 1,828). Including only SNPs from the O2MA lines (n = 1,481) had no qualitative effect on the results.
Logistic regression was performed using the GLMnet package in R (Friedman et al. 2010). The two model parameters are the tuning penalty λ and the ridge/lasso penalty α. As α → 0 (ridge), the model tends to shrink the coefficients of correlated predictor variables toward each other without dropping any of the predictor variables. As α → 1 (lasso), when predictor variables are correlated the model chooses one and discards the other(s). The tuning parameter λ controls the overall strength of the penalty. For all the models we tested, λ was chosen by the package’s built-in cross validation function (“lambda-min”). The fit of our models remains largely unchanged by the selection of α, with the exception of instances where α → 0 (ridge). For values of α sufficiently close to 0, we observed twice the slope (bias, or regression coefficient; expected value = 1) expected when the observed mutation rate is regressed against the predicted mutation rate. All results presented here used α = 0.05.
Models including STRs, five base motif and/or 10 bp GC content together with three-base motif fit poorly. The poor fit could be due either to overfitting, multicollinearity and/or a nonlinear relationship between a predictor variable and mutability. Moreover, models that included STRs as the only predictors fit reasonably well in terms of calibration of the model, but suffer from poorer predictive discrimination, as they lack any discrimination in the nonSTR region of the genome (∼97% of the genome). The final set of models tested include only the three-base motif with the focal nucleotide at the 3′ end, 1Kb GC content and local recombination rate. For each set of input predictor variables and α, the data set was resampled 200 times, with a different randomly chosen set of 100,000 nonmutant sites, and divided into halves, the training set and the test set. The training set consisted of half the O2MA lines, and the test set consisted of the other half. For each data partition, the model was trained and the fit to the test set was assessed by the fraction of the variance explained by the linear regression of the observed (test) value on the predicted (training) value (R2), the regression coefficient (bias, expected value = 1), and the area under the ROC curve (AUC, or c-statistic) for predictive discrimination (varies between 0 and 1). For each set of α and input variables, we generated a distribution of 200 R2 values, bias values and AUC values, and retained model(s) that maximized R2, and AUC while kept the regression coefficient to as close to one as possible. Code and example files are provided in Supplementary Appendix A3.2, Supplementary Material online. We list the performance of all the models we tested in supplementary table S8, Supplementary Material online.
GLMnet fits the following regression model utilizing every site in the training set, and resulting mutability is the probability of mutation “Pr (mutant)” in the following equation.
A mutability value of (say) 0.08 for a specific set of features implies that 92 out of 100 sites with the same features in the training set are nonmutant. The absolute value of mutability varies positively with the number of mutations in the training set and negatively with the number of nonmutant sites included during training. Mutability can only be interpreted in the relative context of the study. For example, Ness et al. (2015) included ∼6,500 mutations in a training set with 100,000 nonmutant sites, so the average mutability in that study would be ∼3.5× greater than the estimate in this study.
Although mutability potentially varies between 0 and 1, the highest mutability observed rarely exceeded 0.2, as average mutability is the fraction of mutant sites as compared with the total number of sites. During cross-validation, we trained the model on slightly under half of the total mutations (∼740 mutants; half of all O2MA mutants, and none of the O1MA mutants), and 100,000 randomly selected sites, yielding the expected value of mutability to be ∼0.007 [740/(100,000 + 740)]. Following model selection, to assess the relationship between standing genetic variation and mutability, we recalculated mutability using all 1,828 mutations. Expected mutability calculated from the full data set is ∼0.017 [1,828/(100,000 + 1,828)].
To assess the relative contribution to mutability of different genomic features, we took a slightly different approach. It is not possible to condense 64 3-base motifs into a single predictor in a standard logistic regression framework. Instead, we trained a logistic regression model using only the 3-base motif, and used the predictions made from this model as predictors in a second logistic regression, which yields a single coefficient for 3-base motif. We then selected 1,828 random sites from the genome, treated them as mutations, and repeated the process 100 times to obtain the null distribution for these coefficients.
Copy Number Variants
We called putative CNVs using a RD based approach as implemented in the CNV-Seq software (Xie and Tammi 2009). RD outliers in an MA line (O1MA or O2MA) are identified relative to its progenitor. We employed a sliding window approach, in which RD in a given window in the focal sample is compared with RD in the reference (progenitor), by taking the ratio of total reads falling in that window, normalized by the average RD for that chromosome in that sample. We used two different signal thresholds (1.5×/2×) for calling duplications or deletions, and several sets of sliding window sizes (see Supplementary Appendix 2, Supplementary Material online, section 4 for details). The minimum detectable size of a CNV is 1 kb in all cases. A putative CNV is called only when all sliding windows covering an interval meet the signal threshold. The smaller the window, the more conservative the test, because more consecutive windows must meet the signal criterion. Similarly, the greater the signal threshold, the more conservative the criterion. As with other types of variants, only putative CNVs unique to a single O2MA line are considered potential mutations. For reasons which we elaborate in the Results, we did not attempt to validate CNVs.
Predicted Mutational Effects
To broadly categorize the distribution of predicted mutational effects, we used the publicly available software snpEff 4.1 (Cingolani et al. 2012) to annotate variants with respect to a set of characteristics potentially related to fitness (http://snpeff.sourceforge.net/VCFannotationformat_v1.0.pdf; last accessed December 10, 2018.). The snpEff software uses gene lists (gtf/gff3) from the WS234 build of the C. elegans genome to assign a putative effect score of high, moderate, or low based on these characteristics, where a “high” effect variant is most likely to have a deleterious effect on fitness. Each variant can have multiple annotations due to different alleles and/or different splice variants; we included only the largest potential effect of each variant. Effects were parsed using custom AWK scripts and categorized as having low, medium, or high maximum impact.
Supplementary Material
Supplementary data are available at Molecular Biology and Evolution online.
Supplementary Material
Acknowledgments
We thank Tim Crombie, Joanna Dembek, Joanna Joyner-Matos, Gigi Ostrow, Asher Shoucair, and the many undergraduate students who worked over the years to maintain and propagate the MA lines. We thank Rob Ness and Jonathan Sebat for graciously explaining some details of the analyses in their papers, and Mike Miyamoto, Henrique Teotónio, and the anonymous reviewers for their helpful comments. This study was supported by National Science Foundation grant DEB-0717167 to C.F.B. and National Institutes of Health grants R01GM072639 to C.F.B. and D.R. Denver and R01GM107227 to C.F.B. and E.C. Andersen.
References
- Agrawal AF, Wang AD.. 2008. Increased transmission of mutations by low condition females: evidence for condition-dependent DNA repair. PLoS Biol. 6:389–395. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Agrawal AF. 2002. Genetic loads under fitness-dependent mutation rates. J Evol Biol. 156:1004–1010. [Google Scholar]
- Ávila V, Chavarrias D, Sanchez E, Manrique A, Lopez-Fanjul C, Garcia-Dorado A.. 2006. Increase of the spontaneous mutation rate in a long-term experiment with Drosophila melanogaster. Genetics 1731:267–277. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Babbitt GA, Cotter CR.. 2011. Functional conservation of nucleosome formation selectively biases presumably neutral molecular variation in yeast genomes. Genome Biol Evol. 3:15–22. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Baer CF, Shaw F, Steding C, Baumgartner M, Hawkins A, Houppert A, Mason N, Reed M, Simonelic K, Woodard W, et al. 2005. Comparative evolutionary genetics of spontaneous mutations affecting fitness in rhabditid nematodes. Proc Natl Acad Sci U S A. 10216:5785–5790. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Carlson J, Locke AE, Flickinger M, Zawistowski M, Levy S, Myers RM, Boehnke M, Kang HM, Scott LJ, Li JZ, et al. 2018. Extremely rare variants reveal patterns of germline mutation rate heterogeneity in humans. Nat Commun. 91:3753.. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Chaisson MJP, Huddleston J, Dennis MY, Sudmant PH, Malig M, Hormozdiari F, Antonacci F, Surti U, Sandstrom R, Boitano M, et al. 2015. Resolving the complexity of the human genome using single-molecule sequencing. Nature 5177536:608–U163. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Cingolani P, Platts A, Wang LL, Coon M, Nguyen T, Wang L, Land SJ, Lu XY, Ruden DM.. 2012. A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w(1118); iso-2; iso-3. Fly 62:80–92. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Conrad DF, Keebler JEM, DePristo MA, Lindsay SJ, Zhang YJ, Casals F, Idaghdour Y, Hartl CL, Torroja C, Garimella KV, et al. 2011. Variation in genome-wide mutation rates within and between human families. Nat Genet. 437:712–U137. [DOI] [PMC free article] [PubMed] [Google Scholar]
- de Visser J, Hermisson J, Wagner GP, Meyers LA, Bagheri HC, Blanchard JL, Chao L, Cheverud JM, Elena SF, Fontana W, et al. 2003. Perspective: evolution and detection of genetic robustness. Evolution 579:1959–1972. [DOI] [PubMed] [Google Scholar]
- Denver DR, Dolan PC, Wilhelm LJ, Sung W, Lucas-Lledo JI, Howe DK, Lewis SC, Okamoto K, Thomas WK, Lynch M, et al. 2009. A genome-wide view of Caenorhabditis elegans base-substitution mutation processes. Proc Natl Acad Sci U S A. 10638:16310–16314. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Denver DR, Morris K, Kewalramani A, Harris KE, Chow A, Estes S, Lynch M, Thomas WK.. 2004. Abundance, distribution, and mutation rates of homopolymeric nucleotide runs in the genome of Caenorhabditis elegans. J Mol Evol. 585:584–595. [DOI] [PubMed] [Google Scholar]
- Denver DR, Wilhelm LJ, Howe DK, Gafner K, Dolan PC, Baer CF.. 2012. Variation in base-substitution mutation in experimental and natural lineages of Caenorhabditis nematodes. Genome Biol Evol. 44:513–522. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Dickinson WJ. 2008. Synergistic fitness interactions and a high frequency of beneficial changes among mutations accumulated under relaxed selection in Saccharomyces cerevisiae. Genetics 1783:1571–1578. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Drake JW, Charlesworth B, Charlesworth D, Crow JF.. 1998. Rates of spontaneous mutation. Genetics 1484:1667–1686. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Fisher RA. 1930. The genetical theory of natural selection. Oxford: Clarendon Press. [Google Scholar]
- Francioli LC, Polak PP, Koren A, Menelaou A, Chun S, Renkens I, van Duijn CM, Swertz M, Wijmenga C, van Ommen G, et al. 2015. Genome-wide patterns and properties of de novo mutations in humans. Nat Genet. 477:822.. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Friedman J, Hastie T, Tibshirani R.. 2010. Regularization paths for generalized linear models via coordinate descent. J Stat Softw. 331:1–22. [PMC free article] [PubMed] [Google Scholar]
- Halligan DL, Keightley PD.. 2009. Spontaneous mutation accumulation studies in evolutionary genetics. Annu Rev Ecol Evol Syst. 401:151–172. [Google Scholar]
- Houle D, Morikawa B, Lynch M.. 1996. Comparing mutational variabilities. Genetics 1433:1467–1483. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Hwang HY, Wang J.. 2017. Effect of mutation mechanisms on variant composition and distribution in Caenorhabditis elegans. PLoS Comput Biol. 131: e1005369. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Jasmin JN, Lenormand T.. 2016. Accelerating mutational load is not due to synergistic epistasis or mutator alleles in mutation accumulation lines of yeast. Genetics 2022:751. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Keightley PD, Caballero A.. 1997. Genomic mutation rates for lifetime reproductive output and lifespan in Caenorhabditis elegans. Proc Natl Acad Sci U S A. 948:3823–3827. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Kimura M. 1967. On evolutionary adjustment of spontaneous mutation rates. Genet Res. 901:23–27. [Google Scholar]
- Landry CR, Lemos B, Rifkin SA, Dickinson WJ, Hartl DL.. 2007. Genetic properties influencing the evolvability of gene expression. Science 3175834:118–121. [DOI] [PubMed] [Google Scholar]
- Liu HX, Jia YX, Sun XG, Tian DC, Hurst LD, Yang SH.. 2017. Direct determination of the mutation rate in the bumblebee reveals evidence for weak recombination-associated mutation and an approximate rate constancy in insects. Mol Biol Evol. 341:119–130. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Long HG, Behringer MG, Williams E, Te R, Lynch M.. 2016. Similar mutation rates but highly diverse mutation spectra in ascomycete and basidiomycete yeasts. Genome Biol Evol. 812:3815–3821. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Lynch M, Ackerman MS, Gout JF, Long H, Sung W, Thomas WK, Foster PL.. 2016. Genetic drift, selection and the evolution of the mutation rate. Nat Rev Genet. 1711:704–714. [DOI] [PubMed] [Google Scholar]
- Lynch M, Hill WG.. 1986. Phenotypic evolution by neutral mutation. Evolution 405:915–935. [DOI] [PubMed] [Google Scholar]
- Lynch M, Walsh B.. 1998. Genetics and analysis of quantitative traits. Sunderland (MA: ): Sinauer. [Google Scholar]
- Lynch M. 2007. The origins of genome architecture. Sunderland (MA): Sinauer Associates, Inc. [Google Scholar]
- Lynch M. 2008. The cellular, developmental and population-genetic determinants of mutation-rate evolution. Genetics 1802:933–943. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Matsuba C, Lewis S, Ostrow DG, Salomon MP, Sylvestre L, Ungvari-Martin J, Baer CF.. 2012. Invariance (?) of mutational parameters for relative fitness over 400 generations of mutation accumulation in Caenorhabditis elegans. G3/Genes, Genomes, Genetics. 2:1497–1503. [DOI] [PMC free article] [PubMed] [Google Scholar]
- McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, Garimella K, Altshuler D, Gabriel S, Daly M, et al. 2010. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 209:1297–1303. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Michaelson JJ, Shi YJ, Gujral M, Zheng HC, Malhotra D, Jin X, Jian MH, Liu GM, Greer D, Bhandari A, et al. 2012. Whole-genome sequencing in autism identifies hot spots for de novo germline mutation. Cell 1517:1431–1442. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Ness RW, Kraemer SA, Colegrave N, Keightley PD.. 2016. Direct estimate of the spontaneous mutation rate uncovers the effects of drift and recombination in the Chlamydomonas reinhardtii plastid genome. Mol Biol Evol. 333:800–808. [DOI] [PubMed] [Google Scholar]
- Ness RW, Morgan AD, Vasanthakrishnan RB, Colegrave N, Keightley PD.. 2015. Extensive de novo mutation rate variation between individuals and across the genome of Chlamydomonas reinhardtii. Genome Res. 2511:1739–1749. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Ogur M, Ogur S, Stjohn R.. 1960. Temperature dependence of the spontaneous mutation rate to respiration deficiency in Saccharomyces. Genetics 452:189–194. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Phillips N, Salomon M, Custer A, Ostrow D, Baer CF.. 2009. Spontaneous mutational and standing genetic (co)variation at dinucleotide microsatellites in Caenorhabditis briggsae and Caenorhabditis elegans. Mol Biol Evol. 263:659–669. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Phillips PC. 2008. Epistasis—the essential role of gene interactions in the structure and evolution of genetic systems. Nat Rev Genet. 911:855–867. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Rifkin SA, Houle D, Kim J, White KP.. 2005. A mutation accumulation assay reveals a broad capacity for rapid evolution of gene expression. Nature 4387065:220–223. [DOI] [PubMed] [Google Scholar]
- Rockman MV, Kruglyak L.. 2009. Recombinational landscape and population genomics of Caenorhabditis elegans. PLoS Genet. 53: e1000419. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Schrider DR, Houle D, Lynch M, Hahn MW.. 2013. Rates and genomic consequences of spontaneous mutational events in Drosophila melanogaster. Genetics 1944:937–954. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Seyfert AL, Cristescu MEA, Frisse L, Schaack S, Thomas WK, Lynch M.. 2008. The rate and spectrum of microsatellite mutation in Caenorhabditis elegans and Daphnia pulex. Genetics 1784:2113–2121. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Sharp NP, Agrawal AF.. 2012. Evidence for elevated mutation rates in low-quality genotypes. Proc Natl Acad Sci U S A. 10916:6142–6146. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Sharp NP, Agrawal AF.. 2016. Low genetic quality alters key dimensions of the mutational spectrum. PLoS Biol. 143: e1002419. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Shaw FH, Baer CF.. 2011. Evolutionary consequences of fitness-dependent mutation rate in finite populations. J Evol Biol. 248:1677–1684. [DOI] [PubMed] [Google Scholar]
- Sturtevant AH. 1937. Essays on evolution. 1. On the effects of selection on mutation rate. Quart Rev Biol. 12:467–477. [Google Scholar]
- Teotónio H, Estes S, Phillips PC, Baer CF.. 2017. Experimental evolution with caenorhabditis nematodes. Genetics 2062:691–716. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Thompson O, Edgley M, Strasbourger P, Flibotte S, Ewing B, Adair R, Au V, Chaudhry I, Fernando L, Hutter H, et al. 2013. The million mutation project: a new approach to genetics in Caenorhabditis elegans. Genome Res. 2310:1749–1762. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Tyson JR, O'Neil NJ, Jain M, Olsen HE, Hieter P, Snutch TP.. 2018. MinION-based long-read sequencing and assembly extends the Caenorhabditis elegans reference genome. Genome Res. 282:266–274. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Watterson GA. 1975. Number of segregating sites in genetic models without recombination. Theor Popul Biol. 72:256–276. [DOI] [PubMed] [Google Scholar]
- Xie C, Tammi MT.. 2009. CNV-seq, a new method to detect copy number variation using high-throughput sequencing. BMC Bioinformatics. 10.10:80. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Yang SH, Wang L, Huang J, Zhang XH, Yuan Y, Chen JQ, Hurst LD, Tian DC.. 2015. Parent-progeny sequencing indicates higher mutation rates in heterozygotes. Nature 5237561:463–U187. [DOI] [PubMed] [Google Scholar]
- Yeh S-D, Saxena AS, Crombie TA, Feistel D, Johnson LM, Lam I, Lam J, Saber S, Baer CF.. 2018. The mutational decay of male-male and hermaphrodite–hermaphrodite competitive fitness in the androdioecious nematode C. elegans. Heredity 1201:1–12. [DOI] [PMC free article] [PubMed] [Google Scholar]
Associated Data
This section collects any data citations, data availability statements, or supplementary materials included in this article.