Abstract
Mutations are the ultimate source of variation used for evolutionary adaptation, while also being predominantly deleterious and a source of genetic disorders. Understanding the rate of insertion-deletion mutations (indels) is essential to understanding evolutionary processes, especially in coding regions, where such mutations can disrupt production of essential proteins. Using direct estimates of indel rates from 15 phylogenetically diverse eukaryotic and bacterial species, along with measures of standing variation in such species, we obtain results that imply an inverse relationship of mutation rate and effective population size. These results, which corroborate earlier observations on the base-substitution mutation rate, appear most compatible with the hypothesis that natural selection reduces mutation rates per effective genome to the point at which the power of random genetic drift (approximated by the inverse of effective population size) becomes overwhelming. Given the substantial differences in DNA metabolism pathways that give rise to these two types of mutations, this consistency of results raises the possibility that refinement of other molecular and cellular traits may be inversely related to species-specific levels of random genetic drift.
Keywords: insertion-deletion mutation rate, mutation-rate evolution, drift barrier, mutation accumulation
Mutations are a double-edged sword in all organisms, constituting the ultimate source of variation used for evolutionary adaptation, while also being predominantly deleterious and a source of genetic disorders. Hence, researchers have long sought the primary factors governing mutation-rate evolution. Some have argued that the mutation rate of an organism reflects a balance between the deleterious effect of mutations and physiological limitations, with further refinement of replication fidelity limiting the speed of DNA synthesis necessary for efficient daughter-cell production (Drake 1991; Sniegowski et al. 2000). However, replication fidelity can be improved without a significant decrease in doubling time (Loh et al. 2010), and prokaryotes undergo high cell-division rates and have low mutation rates (Drake 1991; Lynch 2010), suggesting that replication fidelity does not limit the rate of daughter-cell production. Furthermore, because there is no negative correlation between cell-division rate and genome size (Mira et al. 2001; Vieira-Silva et al. 2010), and the reverse may even be true in bacteria (Lynch and Marinov 2015), cell-division rates do not appear to be limited by the amount of DNA synthesized. Thus, alternative forces may govern mutation-rate evolution.
A general relationship describing mutation-rate variation was proposed by Drake et al. (1998), who suggested that the mutation rate per nucleotide site scales inversely with genome size in bacteria and unicellular eukaryotes, such that there is a constant ∼0.003 mutations per haploid genome per cell division. However, as direct estimates of mutation rates for additional organisms became available, the general relationship between genome size and mutation rate became less apparent, even when scaled to the number of cell divisions per generation in multicellular species (Lynch 2010).
In a previous analysis, we found a relationship between the base-substitution mutation rate per site per generation (ubs) multiplied by the amount of functional DNA in a genome (Ge, approximated by proteome size), and the power of random genetic drift, which is inversely proportional to the effective population size (Ne) (Sung et al. 2012a). Because mutations are generally deleterious, this finding suggested that selection operates to reduce genome-wide mutation rates by refining DNA replication fidelity and repair until further improvements are too inconsequential to overcome the power of random genetic drift (Sniegowski and Raynes 2013). This result is consistent with the drift-barrier hypothesis (DBH), which proposes that natural selection operates to improve molecular and cellular traits until the selective advantage of a beneficial mutation refining the trait is so minuscule that the probability of it being fixed is essentially the same as that for neutral mutations (Lynch 2011; Sung et al. 2012a).
While the negative correlation between ubsGe and Ne is consistent with expectations from population-genetic theory, there is a potential issue of circularity when correlating these factors, as the estimation of Ne relies indirectly on the estimation of ubs (Sung et al. 2012a). Although we presented an analysis suggesting that the correlated parameters are not likely to be the primary factor in the observed relationship (Sung et al. 2012a), and provide another one here (Supplemental Material, File S1), a more independent analysis is desirable, and, given the amount of data that has accumulated, it is time to go beyond a study that simply considers base-substitution mutations. Here, we present the rate of insertion-deletion mutation (indel) events (uid) per site per generation across eight eukaryotic and seven bacterial species, while also providing genome-wide estimates of ubs and uid from three new bacterial mutation-accumulation studies. These data continue to support a negative correlation between the genome-wide mutation rate and Ne.
The DBH postulates that genetic drift determines the limit of adaptive molecular refinement that can be achieved for any trait, including those that determine the rate of indels. Indels are a class of mutations separate from base substitutions, differing in how they originate. Indels generally arise from strand slippage or double-strand breaks, whereas base-substitution mutations originate primarily from base misincorporation or biochemical alteration. Furthermore, there are major differences in how the two mutation types are repaired. Base-substitution mutations are often reversed by enzymes such as DNA photolyases and alkyl transferases, which do not require DNA incision and synthesis (Sancar et al. 2004), or are identified by glycosylases in base-excision repair (BER) pathways, and repaired by incision and DNA-gap filling (Krokan and Bjoras 2013). On the other hand, indel mutations are not surveyed by BER, but are repaired primarily by nucleotide-excision repair (NER), which has broad substrate specificity, and is used to excise bulky lesions arising from the insertion or deletion of nucleotides (Morita et al. 2010). Although the mismatch-repair (MMR) pathway can operate on both base-substitution mutations and indels, MMR-deficient strains of Escherichia coli and Caenorhabditis elegans exhibit a significantly greater elevation of the indel mutation rate relative to that for base substitutions, providing further evidence for the differential treatment of mutation types by DNA-repair pathways (Denver et al. 2005; Lee et al. 2012). Furthermore, depending on the type of mismatch and local sequence context, the error rates of different polymerases are highly variable between indel and base-substitution mutations (McCulloch and Kunkel 2008; Kunkel 2009; Sung et al. 2015). 
In summary, because the enzymes influencing base-substitution and indel mutation rates differ (and shared enzymes differ in the spectrum of repaired premutations), a focus on the indel mutation rate provides a means of testing the validity of the DBH that is substantially independent biologically (and essentially fully independent in terms of investigator sampling) of that used to extrapolate measures of the power of random genetic drift.
Selection operates to refine DNA replication fidelity and repair when the genome-wide deleterious load confers a discernible fitness disadvantage on an organism (Kimura 1967, 1983; Lynch 2010), and the contributions of indel and base-substitution mutations to genome-wide deleterious load differ in two ways. First, the effects of base substitutions in coding regions are highly variable (Eyre-Walker and Keightley 2007), and some base substitutions may not have any effect on organismal fitness, which may create some uncertainties in quantifying the effective genome size (Ge), thereby reducing the correlation observed between ubsGe and Ne (Sung et al. 2012a). On the other hand, most indel mutations that arise in protein-coding genes will generate a frame-shift mutation, interfering with gene function, and having a direct effect on organismal fitness. Because such indels are generally deleterious, selection is then expected to fine-tune the rate at which indels arise more efficiently, and, if the DBH holds true, this should yield a close correlation between uidGe and Ne. Second, base substitutions are generally limited to single nucleotides, while indels may involve many base pairs. Although this might suggest that indels have a larger effect than base substitutions, single-base-pair indels and gene-sized indels both result in gene disruption, thus generating similar fitness effects regardless of the indel length. In fact, single-base-pair indels in coding DNA may generate malformed gene products that require degradation, which might be even more harmful than entire gene deletions. Because the number of indel events, and not the size of indels, determines the genome-wide deleterious burden, we define the parameter uid to be the number of indel mutation events per site per generation, and use this parameter to test the DBH.
Materials and Methods
To examine the effect of genetic drift on mutation-rate evolution, it is necessary to derive accurate estimates of the mutation rate and genetic diversity across phylogenetically diverse organisms. Whole-genome sequencing (WGS) has greatly improved our ability to estimate such parameters. Highly accurate measurements of ubs and uid can be obtained through WGS of mutation-accumulation (MA) lines, in which repeated single-organism bottlenecking minimizes the efficiency of selection, allowing for the accumulation of all but the most deleterious mutations (Lynch et al. 2008; Denver et al. 2009; Ossowski et al. 2010; Sung et al. 2012a, 2012b, 2015; Schrider et al. 2013). Along with data from prior MA studies, this study contains data from three new MA experiments. For each new bacterial MA species, ∼100 independent MA lines were initiated from a single founder colony. The new strains used were as follows: Agrobacterium tumefaciens str. C58, Staphylococcus epidermidis ATCC 12228, and Vibrio cholerae 2740-80.
Depending on the speed of growth, a single colony from each MA line was isolated and transferred to a fresh plate every 1–3 d over the course of the experiment. The bottlenecking process ensures that mutations accumulate in an effectively neutral fashion (Kibota and Lynch 1996). After each transfer, the original plate was retained as a backup plate at 4°. If the destination plate was contaminated, or if a single colony could not be picked, a single colony was transferred from the last 4° backup plate.
To estimate the number of generations that occurred between transfers, every 2 wk, an entire colony from five randomly selected MA lines was transferred to 1× PBS buffer. These were vortexed, serially diluted, and replated. Cell density was calculated from viable cell counts under both the growth conditions used throughout the bottleneck process and the storage conditions at 4°. The total number of generations for each MA line was calculated as the average number of cell divisions per transfer multiplied by the total number of transfers. If backup plates were used, the average number of cell divisions at 4° was used in place of the average number of cell divisions per bottleneck at standard growth temperatures.
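The generation bookkeeping described in this paragraph can be sketched in a few lines of Python. This is a minimal illustration rather than the code used in the study: it assumes that each colony grows from a single cell by binary fission without cell death, so that a colony of N viable cells represents log2(N) cell divisions, and the function names and all example numbers are hypothetical.

```python
import math


def divisions_per_transfer(viable_cells_per_colony):
    """Cell divisions needed to grow one founding cell into a colony.

    Assumption (not stated in the text): binary fission with no cell death,
    so a colony of N viable cells corresponds to log2(N) divisions.
    """
    return math.log2(viable_cells_per_colony)


def total_generations(n_standard_transfers, n_backup_transfers,
                      divisions_standard, divisions_backup):
    """Total generations for one MA line.

    Transfers made from a 4-degree backup plate use the division estimate
    obtained under backup conditions instead of the standard estimate.
    """
    return (n_standard_transfers * divisions_standard
            + n_backup_transfers * divisions_backup)


# Hypothetical example: colonies of ~3.4e7 cells at standard conditions,
# slightly larger colonies on plates that were stored at 4 degrees.
g_std = divisions_per_transfer(2 ** 25)
g_bak = divisions_per_transfer(2 ** 27)
print(total_generations(200, 10, g_std, g_bak))
```

Keeping the standard and backup transfers as separate terms makes it explicit that lines with many backup transfers accumulate a different number of generations than lines with none.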
The average numbers of cell divisions across the MA experiments were as follows (Dataset S1): A. tumefaciens, 5819; Bacillus subtilis, 5078 (Sung et al. 2015); E. coli, 4246 (Lee et al. 2012); Mesoplasma florum, 2351 (Sung et al. 2012a); S. epidermidis, 7170; and V. cholerae, 6453. The average number of generations used for reanalysis of the C. elegans MA study was 250 (Denver et al. 2009) (Dataset S2).
DNA extraction of MA lines was done using the Wizard DNA extraction kit (Promega) or lysis media (CTAB or SDS) followed by phenol/chloroform extractions to Illumina library standards. Then, 101-bp paired-end Illumina (Illumina Hi-Seq platform) sequencing was applied to randomly selected MA lines of A. tumefaciens, S. epidermidis, and V. cholerae. Each MA line was sequenced to a coverage depth of ∼100 ×, with an average library fragment size (distance between paired-end reads) of ∼175 bp. The paired-end reads for each MA line were individually mapped against the reference genome (assembly and annotation available from the National Center for Biotechnology Information, https://www.ncbi.nlm.nih.gov) using two separate alignment algorithms: BWA v0.7.4 (Li and Durbin 2009) and NOVOALIGN v2.08.02 (available at www.novocraft.com). The resulting SAM alignment files were converted to pileup format using SAMTOOLS v0.1.18 (Li et al. 2009). Using in-house Perl scripts, the alignment information was further parsed to generate forward and reverse mapping information at each site, resulting in a configuration of eight numbers for each line (A, a, C, c, G, g, T, and t), corresponding to the number of reads mapped at each genomic position in the reference sequence. A separate file was also generated to display sites that had indel calls from the two alignment algorithms. Mutation calling was performed using a consensus method (Lynch et al. 2008; Denver et al. 2009; Ossowski et al. 2010; Lee et al. 2012; Sung et al. 2012a, 2012b, 2015).
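As an illustration of the eight-number configuration, the Python sketch below tallies strand-specific base counts from the read-bases column of one SAMtools pileup line. It is not the in-house Perl code, which is not shown in the text. It assumes the standard pileup encoding, in which "." and "," denote reference matches on the forward and reverse strand and upper- and lower-case letters denote forward and reverse mismatches; the function name is hypothetical, and indel annotations are skipped here because indel calls are tracked in a separate file.

```python
def strand_counts(ref_base, read_bases):
    """Return counts in the order A, a, C, c, G, g, T, t for one site.

    Upper case = forward-strand reads, lower case = reverse-strand reads.
    """
    order = "AaCcGgTt"
    counts = dict.fromkeys(order, 0)
    ref = ref_base.upper()
    i = 0
    while i < len(read_bases):
        ch = read_bases[i]
        if ch == "^":
            # Read-start marker; the next character is a mapping quality.
            i += 2
            continue
        if ch in "+-":
            # Indel annotation: sign, length digits, then that many bases.
            j = i + 1
            while j < len(read_bases) and read_bases[j].isdigit():
                j += 1
            i = j + int(read_bases[i + 1:j])
            continue
        if ch == "." and ref in order:
            counts[ref] += 1            # forward-strand match
        elif ch == "," and ref in order:
            counts[ref.lower()] += 1    # reverse-strand match
        elif ch in order:
            counts[ch] += 1             # mismatch, case encodes strand
        # "$" (read end), "*" (deleted base) and other symbols are ignored.
        i += 1
    return [counts[k] for k in order]


print(strand_counts("A", "..,,^].Tt$-2ag.+1C,"))
```

A consensus caller could then compare such per-line vectors across all MA lines at a site; the specific thresholds used for that comparison are described in the cited methods papers and are not reproduced here.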
A random subset of base-substitution mutations called using these methods has been previously validated in E. coli and B. subtilis MA lines using fluorescent sequencing technology at the Indiana Molecular Biology Institute at Indiana University (Lee et al. 2012; Sung et al. 2015) (Dataset S3).
To verify indel mutations, we designed 38 primer sets to PCR amplify 300–500 bp regions surrounding the putative indel mutation in the B. subtilis MA lines (Dataset S4). All 29/29 short indels (< 10 bp) were directly confirmed using standard fluorescent sequencing technology. Two out of nine large indels (> 10 bp) were confirmed through sizing of the PCR product on gel electrophoresis. The remaining seven large indels did not amplify. For all cases, the indel was also confirmed to be absent in one other line without the mutation.
To calculate the base-substitution mutation rate per cell division for each line, we used the following equation:
$$u_{bs} = \frac{m}{nT}$$
where ubs is the base-substitution mutation rate (per nucleotide site per generation), m is the number of observed base substitutions, n is the number of nucleotide sites analyzed, and T is the number of generations that occurred in the mutation-accumulation study. The SE for an individual line is calculated using (Denver et al. 2004, 2009):
$$SE_{x} = \sqrt{\frac{u_{bs}}{nT}}$$
The total SE of the base-substitution mutation rate is given by the SD of the mutation rates across all lines (s) divided by the square root of the number of lines analyzed (N):
$$SE_{pooled} = \frac{s}{\sqrt{N}}$$
The same calculation was used to calculate indel mutation rate, with ubs replaced with uid.
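A short Python sketch of these rate calculations follows. It takes the per-line rate to be m/(nT), the per-line SE to be the square root of u/(nT) (the Poisson-based form attributed here to Denver et al. 2004, 2009), and the pooled SE to be the SD of per-line rates divided by the square root of the number of lines. Using the sample SD (N − 1 denominator) and the mean of per-line rates as the overall rate are assumptions of this sketch, and the example counts are hypothetical.

```python
import math
from statistics import mean, stdev


def line_rate(m, n, T):
    """Mutation rate per site per generation for one MA line."""
    return m / (n * T)


def line_se(m, n, T):
    """SE of the rate for one line, sqrt(u / (n * T))."""
    return math.sqrt(line_rate(m, n, T) / (n * T))


def pooled_rate_and_se(lines):
    """Mean rate across lines and its SE (SD of line rates / sqrt(N)).

    `lines` is a list of (m, n, T) tuples, one per MA line.
    Assumption: the sample SD (N - 1 denominator) is used.
    """
    rates = [line_rate(m, n, T) for m, n, T in lines]
    return mean(rates), stdev(rates) / math.sqrt(len(rates))


# Hypothetical lines: (events observed, sites analyzed, generations).
example = [(10, 5_000_000, 5000), (14, 5_000_000, 5000), (12, 5_000_000, 5000)]
print(pooled_rate_and_se(example))
```

The same functions apply to uid when m counts indel events rather than base substitutions.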
Data availability
Illumina DNA sequences for the MA lines used in this study are deposited under the following Bioprojects: A. tumefaciens PRJNA256312, B. subtilis PRJNA256312, M. florum PRJNA256337, S. epidermidis PRJNA256338, and V. cholerae PRJNA256339.
File S1 contains detailed descriptions of eukaryotic uid estimates, as well as calculations for Ge, Gnc, θs, πs, and phylogenetic independent contrasts for both eukaryotic and prokaryotic organisms. Figure S1 contains average depth of sequencing coverage for each MA line in A. tumefaciens, S. epidermidis, and V. cholerae. Figure S2 displays the similarity in θs when increasing the number of unique alleles analyzed. Figure S3 shows the frequency distribution of mutant calls across MA lines. Table S1 contains the calculation for the estimated limit of selection to fix antimutators. Figure S4, Figure S5, Figure S6, and Table S2 contain statistical support for the DBH. Dataset S1, Dataset S2, Dataset S3, and Dataset S4 contain single nucleotide polymorphisms and indels for prokaryotic and eukaryotic organisms generated in this study.
Results
To examine the effect of genetic drift on mutation-rate evolution, it is necessary to derive accurate estimates of the mutation rate and genetic diversity across phylogenetically diverse organisms. WGS has greatly improved our ability to estimate such parameters. Highly accurate measurements of ubs and uid can be obtained through WGS of MA lines, in which repeated single-organism bottlenecking minimizes the efficiency of selection, allowing for the accumulation of all but the most deleterious mutations (Lynch et al. 2008; Denver et al. 2009; Ossowski et al. 2010; Sung et al. 2012a, 2012b, 2015; Schrider et al. 2013).
The power of genetic drift is related to the inverse of the effective population size [1/Ne for haploids, 1/(2Ne) for diploids]. Under the assumption of neutrality, the effective population size (Ne) can be estimated from the average nucleotide heterozygosity at silent sites in natural populations (πs), or as a function of the number of segregating sites in the population (θs), both of which lead to expected values equal to 4Neubs in diploids and 2Neubs in haploids (Kimura 1983). For most organisms analyzed in this study, enough WGS data were available to allow calculation of species-specific θs values (see File S1 and Table 1). For the remaining species, we pooled large available multilocus-sequence studies to estimate πs. In all cases, we set the estimates of θs or πs equal to 4Neubs in diploids (2Neubs in haploids), and solved for Ne by factoring out ubs. Because this calculation only involves ubs, the estimate of Ne is uninfluenced by sampling error in uid, thus providing an independent trait measurement by which to test the DBH (see File S1 for further evaluation of the nonindependence issue).
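The step of solving for Ne can be made concrete with a small Python sketch. It simply rearranges the neutral expectations θs (or πs) = 2Neubs for haploids and 4Neubs for diploids; the function name is hypothetical, and the inputs in the example call are the rounded A. tumefaciens and M. florum values from Table 1, so small rounding differences from the tabulated Ne are to be expected.

```python
def effective_population_size(theta, u_bs, ploidy):
    """Solve theta = 2*Ne*u (haploid) or 4*Ne*u (diploid) for Ne.

    theta : silent-site diversity (theta_s or pi_s)
    u_bs  : base-substitution rate per site per generation
    ploidy: "haploid" or "diploid"
    """
    factor = {"haploid": 2, "diploid": 4}[ploidy]
    return theta / (factor * u_bs)


# Rounded Table 1 inputs for two bacteria (both haploid).
for label, theta, u_bs in [("Agt", 0.200, 2.92e-10), ("Mf", 0.021, 97.80e-10)]:
    ne = effective_population_size(theta, u_bs, "haploid")
    print(label, round(ne / 1e6, 2), "x 10^6")
```

Because uid never enters this function, sampling error in the indel rate cannot propagate into the Ne estimate, which is the independence property the paragraph relies on.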
Table 1. Effective genome size (Ge), indel events per site per generation (uid), base-substitution mutation rate per site per generation (ubs), θs (or πs, denoted by *) measurements for population mutation rate (Watterson 1975; Tajima 1989; Fu 1995), and estimated effective population size (Ne) for seven prokaryotic and eight eukaryotic organisms (see File S1 for details).
Species | Label | Ge (× 10⁷ Sites) | Gc + Gnc (× 10⁷ Sites) | uid (× 10⁻¹⁰ per Site per Generation) | ubs (× 10⁻¹⁰ Events per Site per Generation) | θs or πs | Ne (× 10⁶) |
---|---|---|---|---|---|---|---|
Prokaryotes | | | | | | | |
Agrobacterium tumefaciens | Agt | 0.50 | 0.57 | 0.30 | 2.92 | 0.200* | 342.47 |
Bacillus subtilis | Bs | 0.36 | 0.43 | 1.20d | 3.35d | 0.041 | 61.19 |
Escherichia coli | Ec | 0.39 | 0.46 | 0.37e | 2.00e | 0.071 | 179.60 |
Mesoplasma florum | Mf | 0.07 | 0.08 | 23.10f | 97.80f | 0.021 | 1.07 |
Pseudomonas aeruginosa | Pa | 0.59 | 0.67 | 0.14g | 0.79g | 0.033* | 210.70 |
Staphylococcus epidermidis | Se | 0.21 | 0.26 | 1.13 | 7.40 | 0.052 | 35.14 |
Vibrio cholerae | Vc | 0.34 | 0.39 | 0.18 | 1.15 | 0.110 | 478.26 |
Eukaryotes | | | | | | | |
Arabidopsis thaliana | At | 4.21 | 5.55a | 11.20h | 69.50h,p | 0.008 | 0.29 |
Caenorhabditis elegans | Ce | 2.50 | 6.37b | 6.69i | 14.50q | 0.003 | 0.54 |
Chlamydomonas reinhardtii | Cr | 3.92 | 5.51 | 0.44j | 3.80j | 0.032 | 43.31 |
Drosophila melanogaster | Dm | 2.32 | 8.86c | 4.61k | 51.65k | 0.018 | 0.86 |
Homo sapiens | Hs | 3.65 | 21.75b | 18.20l | 135.13l | 0.001 | 0.02 |
Mus musculus | Mm | 3.55 | 27.17b | 3.10m | 54.00m | 0.004* | 1.77 |
Paramecium tetraurelia | Pt | 5.68 | 7.28 | 0.04n | 0.19n | 0.008 | 101.80 |
Saccharomyces cerevisiae | Sc | 0.87 | 1.02b | 0.92o | 2.63o | 0.004 | 7.78 |
Gc + Gnc is the effective genome size when including the total amount of coding (Gc) and noncoding DNA (Gnc) that is estimated to be under purifying selection. Footnotes in uid and ubs indicate data sources (rates pooled when multiple data sources are available), and, when absent, indicate data generated in this study (see Materials and Methods).
To provide additional data for testing whether the power of genetic drift constrains the lower limit of indel mutation-rate evolution, we performed MA experiments in A. tumefaciens str. C58, S. epidermidis ATCC 12228, and V. cholerae 2740-80. Each bacterial MA experiment was initiated from multiple lines derived from a single progenitor colony, each of which was repeatedly bottlenecked to accumulate mutations for an average of 5819, 7170, and 6453 generations, respectively (see Materials and Methods; harmonic mean population sizes between transfers were 13.4 (0.1), 12.6 (0.3), and 14.9 (0.2), respectively). Then, 101-bp paired-end WGS was applied to randomly selected MA lines (47 A. tumefaciens, 22 S. epidermidis, and 46 V. cholerae MA lines, Dataset S1). The average sequencing coverage depth is greater than 20 × per site across all MA lines surveyed in these organisms (Figure S1), and greater than 50 × per site for 93.75% (150/160) of the MA lines, providing high accuracy for measurement of ubs and uid. Mutations were called and categorized for each of the three species (Dataset S3 and Dataset S4), with ubs and uid shown in Table 1.
To test the DBH, we combined ubs and uid from the three bacterial species analyzed in this study with ubs and uid from four bacterial and eight eukaryotic MA WGS studies (Table 1, Dataset S1, Dataset S2, Dataset S3, and Dataset S4), and also included the same estimates for human derived from WGS of parent-offspring trios. uid includes all indel events in each of the 15 study species (see File S1). Due to the highly repetitive DNA sequence in eukaryotic genomes, the number of large indel events (> 9 bp) in eukaryotes may be downwardly biased when using WGS methods. Therefore, our estimate of the number of large indel events also includes events identified by comparative genome hybridization arrays for organisms where data were available (Lynch et al. 2008; Lipinski et al. 2011). Large indel events account for only 15.0% of total indel events across the study bacteria (76/506, Dataset S4), suggesting that any underestimation of the number of large indel events should have only a small effect on uid.
To determine the genome-wide deleterious burden in each organism associated with indel mutations, we multiplied uid by Ge, approximating the latter by the proteome size of that organism. Plotting the logarithms of uidGe and Ne against one another yields a strong negative correlation across all of cellular life (Figure 1A, r2 = 0.89). Because the power of genetic drift is inversely proportional to Ne, this observation is consistent with the idea that selection operates to reduce mutation rates to a barrier imposed by random genetic drift. Phylogenetic nonindependence may complicate observed relationships between genomic attributes and Ne (Whitney and Garland 2010). However, the relationship between Ne and uidGe remains robust even after phylogenetic correction (Figure 2, A and B, r2 = 0.83), indicating that the correlation between Ne and uidGe reflects a true biological phenomenon across the Tree of Life.
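The uncorrected regression can be approximated directly from the values in Table 1. The Python sketch below computes an ordinary least-squares fit of log10(uidGe) on log10(Ne) for the 15 species. It is an illustration only: it uses the rounded table entries, so the resulting r2 may differ slightly from the published value, it applies no phylogenetic correction, and the function and variable names are hypothetical.

```python
import math

# (label, Ge in 1e7 sites, uid in 1e-10 events/site/generation, Ne in 1e6)
TABLE1 = [
    ("Agt", 0.50, 0.30, 342.47), ("Bs", 0.36, 1.20, 61.19),
    ("Ec", 0.39, 0.37, 179.60), ("Mf", 0.07, 23.10, 1.07),
    ("Pa", 0.59, 0.14, 210.70), ("Se", 0.21, 1.13, 35.14),
    ("Vc", 0.34, 0.18, 478.26), ("At", 4.21, 11.20, 0.29),
    ("Ce", 2.50, 6.69, 0.54), ("Cr", 3.92, 0.44, 43.31),
    ("Dm", 2.32, 4.61, 0.86), ("Hs", 3.65, 18.20, 0.02),
    ("Mm", 3.55, 3.10, 1.77), ("Pt", 5.68, 0.04, 101.80),
    ("Sc", 0.87, 0.92, 7.78),
]


def log_log_fit(rows):
    """OLS slope and r^2 of log10(uid * Ge) regressed on log10(Ne)."""
    xs = [math.log10(ne * 1e6) for _, _, _, ne in rows]
    ys = [math.log10((uid * 1e-10) * (ge * 1e7)) for _, ge, uid, _ in rows]
    n = len(rows)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx, sxy ** 2 / (sxx * syy)


slope, r2 = log_log_fit(TABLE1)
print(f"slope = {slope:.2f}, r2 = {r2:.2f}")
```

Replacing Ge with the Gc + Gnc column of Table 1 in the same function gives the corresponding fit for the total constrained genome.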
Discussion
Because the DBH makes general predictions about the pattern of molecular and cellular evolution across the Tree of Life, because our focus is on one of the central determining factors in the evolutionary process (the mutation rate), and because the patterns appear so strong, it is essential to consider the range of factors that might give rise to the observed statistical relationships, as well as alternative evolutionary hypotheses for them. We first consider three issues with respect to estimating the key parameters Ne, ubs, uid, and Ge, and then elaborate on the significance and implications of the relationship between uidGe and Ne for our understanding of molecular evolution.
First, we address the estimation of Ne, one of the most difficult issues in empirical population genetics. Because populations fluctuate in density over time, any estimate of Ne must reflect a long-term average, presumably approximating a harmonic mean, not the immediate population state. Because evolution is a long-term process, however, the mean is most relevant to the issues being examined herein. Recent selective sweeps or population bottlenecks can transiently modify levels of genetic variation at individual loci (Charlesworth 2009; Karasov et al. 2010), introducing noise into any estimates of Ne derived from limited numbers of genetic loci, but this would reduce the strength of any true underlying correlation between the rate of mutation (uidGe), and long-term Ne, i.e., would operate against our ability to detect the expected signal of the DBH.
Such effects are especially likely in asexual species, where the possibility of reduced recombination might subject many neutral nucleotide sites to the effects of selection on nearby, linked sites. Thus, to minimize sampling error, wherever possible, we have relied upon genome-wide sampling of the number of segregating sites to obtain a low-variance estimator of Neu from observations on silent sites (Watterson 1975). The utilization of an average θs across a large number of nucleotide sites and individual isolates reduces the effects of evolutionary sampling variance associated with chromosomally localized and population-specific sweeps arising within individual species (Fu and Li 1993). Using available genomic data, we calculated θs across a large number of within-species genotypic isolates, excluding nearly identical lab strains that originated from the same individual (see Materials and Methods). Although no estimates of silent-site diversity (the source of Ne estimates) are without error, estimates derived from segregating polymorphic sites across large-scale genomic data sets appear quite robust (Figure S2). Moreover, should the levels of variation sampled in our various study species reflect recent events, to which mutation-rate evolution has not had adequate time to respond (Brandvain and Wright 2016), this would only introduce noise into the relationship between effective population size and mutation rates.
Second, as we have noted earlier, there is some concern that correlations between estimates of mutation rates and Ne could, in part, be spurious artifacts resulting from the use of estimates of Ne obtained by dividing measures of standing variation at silent sites by ubs (Sung et al. 2012a). If the sampling variance of ubs is substantial enough, this could lead to a negative correlation between the observed ubs and extrapolated Ne estimates, and, if there were a sampling covariance between ubs and uid, this could carry over into the current study. In the Supplemental Material (File S1, Figure S4, Figure S5, Figure S6 and Figure S7), we provide complementary analyses to that in Sung et al. (2012a), indicating that the sampling variance of ubs from WGS-MA studies is not large enough to explain the negative correlation previously seen between ubs and Ne estimates. Because ubs and uid are measured by different methods, the sampling covariance between these two measures is expected to be negligible. We emphasize that it is the sampling variance, not the evolutionary variance, that is of concern here. The variance of the log-scaled values of ubs would have to exceed that of the log-scaled values of Ne by ∼two orders of magnitude in order to create the negative correlations that we observe (File S1). As an extreme way of looking at the situation, if silent-site variation were constant across all taxa, and the parametric values of mutation rates and Ne were obtained without error, the only explanation for the data would be a true underlying negative evolutionary covariance between the two features. In fact, there is a marginal negative correlation between estimates of πs and ubs (Figure S3, Figure S4, Figure S5, Figure S6, Figure S7, and Table S2), further bolstering the idea that ubs and uid decline evolutionarily as Ne increases.
Third, the DBH proposes that the strength of selection operating to reduce the indel mutation rate is based upon the total indel deleterious mutational load, i.e., the product of the mutational rate of appearance of indels at individual nucleotide sites (uid), and the number of sites under selective constraint in the genome (Ge, approximated by the proteome size of the organism). However, some noncoding DNA (e.g., noncoding functional RNAs, and cis-regulatory units in untranslated regions or introns) is certainly under selective constraint, with mutations at these sites increasing the deleterious mutational load. Thus, it can be argued that the estimated number of nucleotides affecting fitness (Ge) scales differently than the protein-coding region of the genome, particularly in larger eukaryotic genomes with a considerable number of noncoding sites (Halligan et al. 2004; Siepel et al. 2005; Halligan and Keightley 2006). Difficulties can arise when estimating the proportion of noncoding DNA that is under selective constraint (Gnc), as the estimated number of such sites can vary greatly depending on the model used to define noncoding DNA, and the identification of conserved noncoding DNA is highly sensitive to the available phylogeny (Siepel et al. 2005). Nevertheless, if we sum the estimated total amount of noncoding DNA under selective constraint (Gnc, see File S1) with that of coding DNA (Gc), we find that uid(Gc + Gnc) and Ne remain highly correlated (Figure 1B, r2 = 0.87), simply because the fraction of functional noncoding DNA increases with the total amount of coding DNA.
We currently adhere to the DBH as an explanation for the phylogenetic pattern of mutation-rate variation primarily because it has been difficult to reconcile the patterns with alternative hypotheses. In the introduction, we provided arguments as to why selection for replication speed appears to be unlikely to explain a negative correlation between mutation rates and population size in unicellular species, and, in multicellular species, the simultaneous deployment of hundreds to thousands of origins of replication makes such an explanation even more unlikely. Nor does a general constraint on replication fidelity explain the data.
A second potential explanation for variation in the per-generation mutation rate is that it is driven largely by variation in numbers of germline cell divisions (Ness et al. 2012), but this cannot be reconciled with the fact that the base-substitution mutation rate scales negatively with Ne in analyses entirely restricted to unicellular species (Sung et al. 2012a). In all such species, there is one cell division per generation, and yet the base-substitution mutation rate per site per cell division ranges from ∼10⁻¹¹ in Paramecium tetraurelia (Sung et al. 2012b) to ∼10⁻⁸ in M. florum (Sung et al. 2012a). Similarly, the number of indel mutational events per site per cell division differs by over two orders of magnitude across unicellular organisms (Table 1 and Figure 3), and the negative regression with Ne remains significant when confined to unicellular species (Figure 1, r2 = 0.66, P = 0.003).
A third hypothesis for mutation-rate evolution is that selection is effective enough to reduce the error rate to the point at which the physical laws of thermodynamics take over (Kimura 1967). However, it is difficult to reconcile this argument with the data now showing that mutation rates vary by three orders of magnitude, as there are no known mechanisms by which basic biophysical features (such as diffusion coefficients and stochastic molecular motion) would vary by this degree among the cytoplasms of different taxa. There is, of course, the issue of evolved differences in the biochemical features and efficiency of operation of the proteins involved in replication and repair. However, this type of variation is in the explanatory domain of the DBH. The DBH postulates that replication fidelity is typically not at the maximum possible level of refinement, but just the lowest level possible under the prevailing level of random genetic drift, which varies substantially among lineages.
That replication fidelity should decline with decreasing effective population size appears to be a unique prediction of the DBH. Although other theoretical work has been done on mutation-rate evolution, in no case is this type of scaling obviously predicted (acknowledging that this has not been a central focus of such work). For example, allowing for a role of beneficial mutations, Kimura (1967) and Leigh (1970) suggested that the long-term rate of adaptation is maximized when the genome-wide mutation rate equals the rate of population fixation of beneficial mutations. The precise predictions of this hypothesis are not entirely clear, but because mutations arise at a higher rate in large populations, and, if beneficial, fix with higher probabilities, a positive association between the mutation rate and Ne seems to be implied. A rather different model argues that populations should evolve genome-wide mutation rates equal to the average effect of a deleterious mutation (Orr 2000; Johnson and Barton 2002), which seems to imply an optimal mutation rate independent of population size (unless one wishes to postulate an association between average mutational effect and Ne, for which we are unaware of any evidence).
The DBH proposes that new alleles that reduce the genome-wide indel mutation rate (i.e., antimutators) can be promoted by selection only if they provide a significant enough advantage to offset the power of genetic drift. The average selective effect of an antimutator or mutator allele (which operate opposite to each other) can be approximated by st·∆Uid, with ∆Uid representing the change in the genome-wide indel mutation rate with respect to the population mean rate, s being the average reduction in fitness per mutation (Lynch 2010), and t being the number of generations a mutation remains associated with its mutator genetic background (Lynch 2011). ∆Uid can be approximated by the change in the indel mutation rate over the effective genome, or ∆uidGe (Lynch 2011). By setting st∆uidGe equal to the power of random genetic drift [1/Ne for haploids, 1/(2Ne) for diploids], we can acquire some sense of the average reduction in the indel mutation rate that is required for the power of selection to exceed the power of genetic drift. Using estimates of an average value of the selective coefficient (s = 0.01) (Lynch et al. 1999; Eyre-Walker and Keightley 2007), and assuming that free recombination unlinks mutation-rate modifier alleles from their background every ∼2 generations in sexually outcrossing species (t = 2) (Lynch 2010), solving st∆uidGe = 1/Ne [= 1/(2Ne) for diploids] for ∆uid suggests that the average antimutator must reduce the indel mutation rate by greater than ∼0.1–1% in most organisms (Table S1) in order to be promoted by selection. One major limitation of this kind of analysis is that values of s and t are not well known, and are likely to vary across organisms. A second and equally important caveat is that the prior analysis assumes that mutator and antimutator alleles arise with equal frequency.
Owing to the high level of refinement of the replication and repair machinery, it seems much more likely that mutations involving the components of such machinery will increase rather than decrease the mutation rate. This will push the equilibrium mutation rate to higher levels than expected (Lynch 2008), although without quantitative information on such bias, it is difficult to determine the exact position at which the mutation rate will stall.
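The drift-barrier threshold described above (solving st∆uidGe = 1/Ne for ∆uid) can be sketched numerically. The following is a minimal Python illustration rather than the procedure used to produce Table S1: the function names are hypothetical, and the example values of Ne, Ge, and uid are placeholders chosen only to show the arithmetic, not estimates from this study. The defaults s = 0.01 and t = 2 are the values stated in the text.

```python
def min_indel_rate_reduction(ne, effective_genome_sites, s=0.01, t=2.0, diploid=False):
    """Per-site change in indel rate (delta u_id) at which selection equals drift.

    Solves s * t * delta_u_id * G_e = 1/N_e for haploids,
    or 1/(2 * N_e) for diploids, as described in the text.
    """
    if min(ne, effective_genome_sites, s, t) <= 0:
        raise ValueError("all inputs must be positive")
    # Power of random genetic drift.
    drift = 1.0 / (2.0 * ne) if diploid else 1.0 / ne
    # Selective effect per unit change in the per-site rate is s * t * G_e.
    return drift / (s * t * effective_genome_sites)


def as_percent_of_rate(delta_u, u_id):
    """Express the required change as a percentage of the current indel rate."""
    return 100.0 * delta_u / u_id


# Hypothetical placeholder inputs (assumptions, not values from Table S1).
ne = 1e8      # effective population size
ge = 4e6      # effective genome size, in sites
u_id = 2e-11  # indel rate per site per generation

delta = min_indel_rate_reduction(ne, ge)
percent = as_percent_of_rate(delta, u_id)
print(f"required delta u_id = {delta:.3e} ({percent:.3f}% of u_id)")
```

Because the threshold scales as 1/(s·t·Ge·Ne), any uncertainty in s or t propagates proportionally into the required reduction, which is the first caveat noted above.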
Finally, we note that because recombination unlinks alleles from their genetic background, the capacity of selection to enhance replication fidelity is ultimately a function of the recombination rate (Kimura 1967; Lynch 2008). Thus, it may be viewed as surprising that bacteria, which do not undergo meiotic recombination, exhibit a relationship between uid and Ne similar to that in eukaryotic species engaging in periodic to regular meiosis (Figure 1, A and B). It should be noted, however, that bacterial recombination occurs through multiple mechanisms (transformation, conjugation, and/or transduction). Many bacterial species are known to naturally undergo high rates of recombination, with ratios of recombination to mutation rates frequently being comparable to those in multicellular eukaryotes (Feil and Spratt 2001; Lynch 2007; Doroghazi and Buckley 2014; Lassalle et al. 2015), so, in this sense, comparable behavior of bacterial and eukaryotic species is not unexpected.
In summary, as in our previous work on the base-substitution mutation rate (Sung et al. 2012a), the strong correlation between the genome-wide indel rate and Ne appears not to be a statistical artifact. Moreover, among various hypotheses that have been suggested for mutation-rate evolution, the DBH appears to provide the most compatible explanation for the ∼1000-fold range of variation of this trait across the Tree of Life. As noted above, the molecular mechanisms that generate and resolve base-substitution and indel mutations differ in a number of ways, and the rates of occurrence of these two types of mutations differ by one to two orders of magnitude (with uid ranging from 1.8 to 11.9% of ubs, presumably because of the elevated deleterious effects of indel mutations). Yet, despite these differences, both ubs and uid scale similarly with changes in Ne (Figure 3, r² = 0.89). Because the forces of mutation, selection, and drift apply to all biological traits, the maximum achievable level of refinement for other fundamental cellular traits may also be influenced by the drift barrier.
Acknowledgments
Support was provided by the Multidisciplinary University Research Initiative Awards W911NF-09-1-0444 from the US Army Research Office to M.L., P. Foster, H. Tang, and S. Finkel, and W911NF-14-1-0411 to M.L., P. Foster, J. McKinlay, and J. T. Lennon; by CAREER award DEB-0845851 from the National Science Foundation to V.C.; and by National Institutes of Health Awards F32-GM103164 to W.S. and R01-GM036827 to M.L. and W. K. Thomas. This material is based upon work supported by the National Science Foundation under grant nos. CNS-0521433, CNS-0723054, and ABI-1062432 to Indiana University.
Author contributions: W.S., C.F., V.C., and M.L. designed the research; W.S., M.A., M.D., and T.P. performed the research; W.S. and M.A. analyzed the data; and W.S., M.A., and M.L. wrote the paper.
Footnotes
Supplemental material is available online at www.g3journal.org/lookup/suppl/doi:10.1534/g3.116.030890/-/DC1
Communicating editor: S. I. Wright
Literature Cited
- Brandvain Y., Wright S. I., 2016. The limits of natural selection in a nonequilibrium world. Trends Genet. 32: 201–210.
- Campbell C. D., Eichler E. E., 2013. Properties and rates of germline mutations in humans. Trends Genet. 29: 575–584.
- Charlesworth B., 2009. Fundamental concepts in genetics: effective population size and patterns of molecular evolution and variation. Nat. Rev. Genet. 10: 195–205.
- Conrad D. F., Keebler J. E., DePristo M. A., Lindsay S. J., Zhang Y., et al., 2011. Variation in genome-wide mutation rates within and between human families. Nat. Genet. 43: 712–714.
- Denver D. R., Morris K., Lynch M., Thomas W. K., 2004. High mutation rate and predominance of insertions in the Caenorhabditis elegans nuclear genome. Nature 430: 679–682.
- Denver D. R., Feinberg S., Estes S., Thomas W. K., Lynch M., 2005. Mutation rates, spectra and hotspots in mismatch repair-deficient Caenorhabditis elegans. Genetics 170: 107–113.
- Denver D. R., Dolan P. C., Wilhelm L. J., Sung W., Lucas-Lledo J. I., et al., 2009. A genome-wide view of Caenorhabditis elegans base-substitution mutation processes. Proc. Natl. Acad. Sci. USA 106: 16310–16314.
- Doroghazi J. R., Buckley D. H., 2014. Intraspecies comparison of Streptomyces pratensis genomes reveals high levels of recombination and gene conservation between strains of disparate geographic origin. BMC Genomics 15: 970.
- Drake J. W., 1991. A constant rate of spontaneous mutation in DNA-based microbes. Proc. Natl. Acad. Sci. USA 88: 7160–7164.
- Drake J. W., Charlesworth B., Charlesworth D., Crow J. F., 1998. Rates of spontaneous mutation. Genetics 148: 1667–1686.
- Eyre-Walker A., Keightley P. D., 2007. The distribution of fitness effects of new mutations. Nat. Rev. Genet. 8: 610–618.
- Feil E. J., Spratt B. G., 2001. Recombination and the population structures of bacterial pathogens. Annu. Rev. Microbiol. 55: 561–590.
- Fu Y. X., 1995. Statistical properties of segregating sites. Theor. Popul. Biol. 48: 172–197.
- Fu Y. X., Li W. H., 1993. Statistical tests of neutrality of mutations. Genetics 133: 693–709.
- Garland T., Dickerman A. W., Janis C. M., Jones J. A., 1993. Phylogenetic analysis of covariance by computer-simulation. Syst. Biol. 42: 265–292.
- Halligan D. L., Keightley P. D., 2006. Ubiquitous selective constraints in the Drosophila genome revealed by a genome-wide interspecies comparison. Genome Res. 16: 875–884.
- Halligan D. L., Eyre-Walker A., Andolfatto P., Keightley P. D., 2004. Patterns of evolutionary constraints in intronic and intergenic DNA of Drosophila. Genome Res. 14: 273–279.
- Haudry A., Platts A. E., Vello E., Hoen D. R., Leclercq M., et al., 2013. An atlas of over 90,000 conserved noncoding sequences provides insight into crucifer regulatory regions. Nat. Genet. 45: 891–898.
- Johnson T., Barton N. H., 2002. The effect of deleterious alleles on adaptation in asexual populations. Genetics 162: 395–411.
- Karasov T., Messer P. W., Petrov D. A., 2010. Evidence that adaptation in Drosophila is not limited by mutation at single sites. PLoS Genet. 6: e1000924.
- Kibota T. T., Lynch M., 1996. Estimate of the genomic mutation rate deleterious to overall fitness in E. coli. Nature 381: 694–696.
- Kimura M., 1967. On the evolutionary adjustment of spontaneous mutation rates. Genet. Res. 9: 23–34.
- Kimura M., 1983. The Neutral Theory of Molecular Evolution, Cambridge University Press, Cambridge, UK.
- Kong A., Frigge M. L., Masson G., Besenbacher S., Sulem P., et al., 2012. Rate of de novo mutations and the importance of father’s age to disease risk. Nature 488: 471–475.
- Krokan H. E., Bjoras M., 2013. Base excision repair. Cold Spring Harb. Perspect. Biol. 5: a012583.
- Kunkel T. A., 2009. Evolving views of DNA replication (in)fidelity. Cold Spring Harb. Symp. Quant. Biol. 74: 91–101.
- Lassalle F., Perian S., Bataillon T., Nesme X., Duret L., et al., 2015. GC-content evolution in bacterial genomes: the biased gene conversion hypothesis expands. PLoS Genet. 11: e1004941.
- Lee H., Popodi E., Tang H., Foster P. L., 2012. Rate and molecular spectrum of spontaneous mutations in the bacterium Escherichia coli as determined by whole-genome sequencing. Proc. Natl. Acad. Sci. USA 109: E2774–E2783.
- Leigh E. G., Jr., 1970. Natural selection and mutability. Am. Nat. 104: 301–305.
- Li H., Durbin R., 2009. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25: 1754–1760.
- Li H., Handsaker B., Wysoker A., Fennell T., Ruan J., et al., 2009. The sequence alignment/map format and SAMtools. Bioinformatics 25: 2078–2079.
- Lipinski K. J., Farslow J. C., Fitzpatrick K. A., Lynch M., Katju V., et al., 2011. High spontaneous rate of gene duplication in Caenorhabditis elegans. Curr. Biol. 21: 306–310.
- Loh E., Salk J. J., Loeb L. A., 2010. Optimization of DNA polymerase mutation rates during bacterial evolution. Proc. Natl. Acad. Sci. USA 107: 1154–1159.
- Lynch M., 2007. The Origins of Genome Architecture, Sinauer Associates, Sunderland, Massachusetts.
- Lynch M., 2008. The cellular, developmental and population-genetic determinants of mutation-rate evolution. Genetics 180: 933–943.
- Lynch M., 2010. Evolution of the mutation rate. Trends Genet. 26: 345–352.
- Lynch M., 2011. The lower bound to the evolution of mutation rates. Genome Biol. Evol. 3: 1107–1118.
- Lynch M., Marinov G. K., 2015. The bioenergetic costs of a gene. Proc. Natl. Acad. Sci. USA 112: 15690–15695.
- Lynch M., Blanchard J., Houle D., Kibota T., Schultz S., et al., 1999. Spontaneous deleterious mutation. Evolution 53: 645–663.
- Lynch M., Sung W., Morris K., Coffey N., Landry C. R., et al., 2008. A genome-wide view of the spectrum of spontaneous mutations in yeast. Proc. Natl. Acad. Sci. USA 105: 9272–9277.
- Martins E. P., 2004. Compare, Version 4.6b. Computer Programs for the Statistical Analysis of Comparative Data. Department of Biology, Indiana University, Bloomington, IN. Available at: http://compare.bio.indiana.edu.
- McCulloch S. D., Kunkel T. A., 2008. The fidelity of DNA synthesis by eukaryotic replicative and translesion synthesis polymerases. Cell Res. 18: 148–161.
- Mira A., Ochman H., Moran N. A., 2001. Deletional bias and the evolution of bacterial genomes. Trends Genet. 17: 589–596.
- Morita R., Nakane S., Shimada A., Inoue M., Iino H., et al., 2010. Molecular mechanisms of the whole DNA repair system: a comparison of bacterial and eukaryotic systems. J. Nucleic Acids 2010: 179594.
- Ness R. W., Morgan A. D., Colegrave N., Keightley P. D., 2012. Estimate of the spontaneous mutation rate in Chlamydomonas reinhardtii. Genetics 192: 1447–1454.
- Ness R. W., Kraemer S. A., Colegrave N., Keightley P. D., 2015. Direct estimate of the spontaneous mutation rate uncovers the effects of drift and recombination in the Chlamydomonas reinhardtii plastid genome. Mol. Biol. Evol. 33: 800–808.
- O’Roak B. J., Deriziotis P., Lee C., Vives L., Schwartz J. J., et al., 2011. Exome sequencing in sporadic autism spectrum disorders identifies severe de novo mutations. Nat. Genet. 43: 585–589.
- O’Roak B. J., Vives L., Girirajan S., Karakoc E., Krumm N., et al., 2012. Sporadic autism exomes reveal a highly interconnected protein network of de novo mutations. Nature 485: 246–250.
- Orr H. A., 2000. The rate of adaptation in asexuals. Genetics 155: 961–968.
- Ossowski S., Schneeberger K., Lucas-Lledo J. I., Warthmann N., Clark R. M., et al., 2010. The rate and molecular spectrum of spontaneous mutations in Arabidopsis thaliana. Science 327: 92–94.
- Sancar A., Lindsey-Boltz L. A., Unsal-Kacmaz K., Linn S., 2004. Molecular mechanisms of mammalian DNA repair and the DNA damage checkpoints. Annu. Rev. Biochem. 73: 39–85.
- Schrider D. R., Houle D., Lynch M., Hahn M. W., 2013. Rates and genomic consequences of spontaneous mutational events in Drosophila melanogaster. Genetics 194: 937–954.
- Siepel A., Bejerano G., Pedersen J. S., Hinrichs A. S., Hou M., et al., 2005. Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res. 15: 1034–1050.
- Sniegowski P., Raynes Y., 2013. Mutation rates: how low can you go? Curr. Biol. 23: R147–R149.
- Sniegowski P. D., Gerrish P. J., Johnson T., Shaver A., 2000. The evolution of mutation rates: separating causes from consequences. BioEssays 22: 1057–1066.
- Sung W., Ackerman M. S., Miller S. F., Doak T. G., Lynch M., 2012a. Drift-barrier hypothesis and mutation-rate evolution. Proc. Natl. Acad. Sci. USA 109: 18488–18492.
- Sung W., Tucker A. E., Doak T. G., Choi E., Thomas W. K., et al., 2012b. Extraordinary genome stability in the ciliate Paramecium tetraurelia. Proc. Natl. Acad. Sci. USA 109: 19339–19344.
- Sung W., Ackerman M. S., Gout J. F., Miller S. F., Williams E., et al., 2015. Asymmetric context-dependent mutation patterns revealed through mutation-accumulation experiments. Mol. Biol. Evol. 32: 1672–1683.
- Tajima F., 1989. Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics 123: 585–595.
- The 1000 Genomes Project Consortium, 2015. A global reference for human genetic variation. Nature 526: 68–74.
- Uchimura A., Higuchi M., Minakuchi Y., Ohno M., Toyoda A., et al., 2015. Germline mutation rates and the long-term phenotypic effects of mutation accumulation in wild-type laboratory mice and mutator mice. Genome Res. 25: 1125–1134.
- Vieira-Silva S., Touchon M., Rocha E. P., 2010. No evidence for elemental-based streamlining of prokaryotic genomes. Trends Ecol. Evol. 25: 319–320; author reply 320–321.
- Wang H., Zhu X., 2014. De novo mutations discovered in 8 Mexican American families through whole genome sequencing. BMC Proc. 8: S24.
- Watterson G. A., 1975. On the number of segregating sites in genetical models without recombination. Theor. Popul. Biol. 7: 256–276.
- Whitney K. D., Garland T., Jr., 2010. Did genetic drift drive increases in genome complexity? PLoS Genet. 6: e1001080.
- Yang S., Wang L., Huang J., Zhang X., Yuan Y., et al., 2015. Parent-progeny sequencing indicates higher mutation rates in heterozygotes. Nature 523: 463–467.
- Zhu Y. O., Siegal M. L., Hall D. W., Petrov D. A., 2014. Precise estimates of mutation rate and spectrum in yeast. Proc. Natl. Acad. Sci. USA 111: E2310–E2318.
Data Availability Statement
Illumina DNA sequences for the MA lines used in this study are deposited under the following Bioprojects: A. tumefaciens PRJNA256312, B. subtilis PRJNA256312, M. florum PRJNA256337, S. epidermidis PRJNA256338, and V. cholerae PRJNA256339.
File S1 contains detailed descriptions of eukaryotic uid estimates, as well as calculations for Ge, Gnc, θs, πs, and phylogenetic independent contrasts for both eukaryotic and prokaryotic organisms. Figure S1 contains average depth of sequencing coverage for each MA line in A. tumefaciens, S. epidermidis, and V. cholerae. Figure S2 displays the similarity in θs when increasing the number of unique alleles analyzed. Figure S3 shows the frequency distribution of mutant calls across MA lines. Table S1 contains the calculation for the estimated limit of selection to fix antimutators. Figure S4, Figure S5, Figure S6, and Table S2 contain statistical support for the DBH. Dataset S1, Dataset S2, Dataset S3, and Dataset S4 contain single nucleotide polymorphisms and indels for prokaryotic and eukaryotic organisms generated in this study.