Abstract
Synonymous codons are not used at equal frequency throughout the genome, a phenomenon termed codon usage bias (CUB). It is often assumed that interspecific variation in the intensity of CUB is related to species differences in effective population sizes (Ne), with selection on CUB operating less efficiently in species with small Ne. Here, we specifically ask whether variation in Ne predicts differences in CUB in mammals and report two main findings. First, across 41 mammalian genomes, CUB was not correlated with two indirect proxies of Ne (body mass and generation time), even though there was statistically significant evidence of selection shaping CUB across all species. Interestingly, autosomal genes showed higher codon usage bias compared to X-linked genes, and high-recombination genes showed higher codon usage bias compared to low recombination genes, suggesting intraspecific variation in Ne predicts variation in CUB. Second, across six mammalian species with genetic estimates of Ne (human, chimpanzee, rabbit, and three mouse species: Mus musculus, M. domesticus, and M. castaneus), Ne and CUB were weakly and inconsistently correlated. At least in mammals, interspecific divergence in Ne does not strongly predict variation in CUB. One hypothesis is that each species responds to a unique distribution of selection coefficients, confounding any straightforward link between Ne and CUB.
Keywords: Codon usage bias, effective population size, selection
Introduction
In most organisms, synonymous codons are not used at equal frequencies. This phenomenon has been termed codon usage bias (CUB), and many studies support a role of natural selection in this phenomenon (Shields et al. 1988; Moriyama and Hartl 1993; Akashi et al. 1998; Comeron and Kreitman 1998; Chamary et al. 2006; Plotkin and Kudla 2011; Waldman et al. 2011; Behura et al. 2013; Kober and Pogson 2013). Proposed mechanisms influencing CUB include translational efficiency (Grantham et al. 1981; Ikemura 1985; Bulmer 1991; Carlini and Stephan 2003; Rocha 2004; Stoletzki and Eyre-Walker 2007; Parmley and Huynen 2009; Hense 2010; Ran and Higgs 2010, 2012; Sharp et al. 2010; Behura and Severson 2011; Shah and Gilchrist 2011; Qian et al. 2012; Agashe et al. 2013; Lawrie et al. 2013; Michely 2013), mRNA stability or folding (Moriyama and Powell 1998; dos Reis et al. 2004; Chamary and Hurst 2005; Chamary et al. 2006; Novoa and Ribas de Pouplana 2012; Kober and Pogson 2013; Shabalina et al. 2013), transcription factor binding (Stergachis 2013), overlap with other functional elements in the genome (Lin 2011), and/or a trade-off between rapid versus accurate translation (Yang et al. 2014).
The level of CUB varies dramatically across species (Grantham et al. 1980a,b; Sharp 1988), including insects (Vicario et al. 2007), mammals (Doherty and McInerney 2013), and plants (Ingvarsson 2008, 2010). Given the large number of codons affecting most cellular processes, the selective benefit associated with any single “preferred” codon should be small. Therefore, CUB is likely to be under weak selection (Akashi 1995; Maside et al. 2004; Cutter and Charlesworth 2006; Haddrill et al. 2010), the efficacy of which will depend on a species’ effective population size (Ne; Kimura 1983; Charlesworth 2009). Consequently, it is often assumed that interspecific variation in CUB can be attributed to interspecific variation in Ne.
Consistent with this hypothesis, Drosophila simulans has relatively high CUB compared to Drosophila melanogaster (Akashi 1996; McVean and Vieira 2001; Andolfatto et al. 2011) and D. simulans has relatively large Ne (Aquadro et al. 1988). Similarly, Drosophila pseudoobscura shows higher codon usage bias than Drosophila miranda, with evidence that the former has a larger Ne (Bachtrog 2007; Haddrill et al. 2010). Outcrossing plant species have more biased codon usage than self-fertilizing relatives (Qiu et al. 2011b), consistent with an expected reduction of Ne upon the evolution of selfing. However, humans have experienced more evolutionary constraint on codon usage compared to mice (Eory et al. 2010), in spite of their smaller Ne (Zhao 2000). Thus, it remains unclear whether variation in CUB can be attributed to differences in effective population sizes, especially in mammals.
The hypothesized link between Ne and CUB assumes that the strength of selection is both small and homogeneous across species. Multiple studies have demonstrated that weak selection shapes patterns of CUB across mammals, including humans, the mammal with the smallest known historical Ne (Urrutia and Hurst 2003; Comeron 2004; Lu and Wu 2005; Kondrashov et al. 2006; Yang and Nielsen 2008; Waldman et al. 2011; Doherty and McInerney 2013), but see (Urrutia and Hurst 2001; Duret 2002). However, theoretical and empirical studies suggest that selective coefficients may not be homogeneous across species. For example, mutations toward suboptimal codons may be multiplicatively deleterious, so that the strength of selection acting against a suboptimal codon depends on the number of suboptimal codons already present (Kondrashov et al. 2006; Charlesworth 2013). If true, then species with small Ne may harbor more suboptimal codons, but this may lead to relatively stronger selection against them (Akashi 1995; Hershberg and Petrov 2008), potentially confounding any predicted relationship between Ne and CUB.
The goal of this manuscript is to test whether Ne and CUB are correlated in mammals. After demonstrating that codon usage is affected by selection across mammals, we address this goal in two main steps. First, we quantify CUB from 41 mammalian genomes and demonstrate that the observed variation in CUB is not phylogenetically correlated with two indirect proxies for Ne (age at sexual maturity and body mass). Second, we test for the same correlation across six mammalian species (human, chimpanzee, rabbit, and three mice: Mus musculus, M. domesticus, and M. castaneus), for which genetic estimates of Ne existed in the literature, ranging from ∼10K (humans) to ∼780K (rabbits). Although Ne and CUB are phylogenetically correlated in these latter six species, the effects are modest and inconsistent. At least in mammals, therefore, differences in Ne do not seem to account for the broad interspecific variation in CUB. One hypothesis is that the distribution of selection coefficients varies across species independently of Ne, confounding any straightforward link to CUB. Direct estimates of selection coefficients using divergence data in two independent species pairs supported this hypothesis.
Materials and Methods
Genomes
In the 41-species analyses, all transcripts (exons and introns) were downloaded from Ensembl version 74 (http://www.ensembl.org). For genes with more than one transcript, we chose a transcript randomly for analyses. Across all genes, we repeated the random choice five times, then averaged across the five iterations. CUB is correlated with gene length (Eyre-Walker 1996; Moriyama and Powell 1998; Zeng and Charlesworth 2009), so we also repeated our analyses after systematically choosing the shortest transcript or the longest transcript from all genes. Results from the five random, shortest, and longest transcripts were qualitatively similar; we report the average of the five random choices for simplicity. Only transcripts with at least 100 codons were included due to uncertainty in estimating codon usage in shorter genes (Moriyama and Powell 1998; Novembre 2002).
In the six-species analyses, our aim was to compare only homologous codons. In addition to confining the analysis to one-to-one orthologs between six species, we excluded all codons that had an ambiguity or indel in any one species, as well as all codons that were 3′ to the earliest stop codon of any one species (Appendix S1). We dealt with multiple transcripts as described above. Complete genomes of M. domesticus, M. musculus, and M. castaneus were downloaded from Keane et al. (2011), then transcripts and flanking regions were computationally assembled using the coordinates of the mouse genome annotations of Ensembl version 65 (http://www.ensembl.org). Due to incomplete lineage sorting and/or hybridization, phylogenetic relationships among these three mouse species vary across the genome (White et al. 2009); we therefore added the three mouse species to the phylogeny as an unresolved trichotomy with a common ancestor 350K years ago (Geraldes 2008). One-to-one orthologs among species were identified using the phylogenetic analyses of Ensembl version 65: approximately 11,000 genes had one-to-one orthologs across all six species and a minimum of 100 codons of alignment. We translated each set of transcripts into proteins, aligned protein sequences with both PRANK (Löytynoja and Goldman 2005) and CLUSTALW (Thompson et al. 1994), and then back-translated aligned sequences to their original DNA sequences. We report results based on PRANK-aligned sequences; our conclusions did not change if we used CLUSTALW-aligned sequences.
Quantifying CUB
There are multiple methods for quantifying codon usage bias. The “effective number of codons” (ENC; Wright 1990) and variations thereof (Fuglsang 2006) quantify deviation from the null hypothesis that synonymous codons within each amino acid class are used at equal frequency. However, that null hypothesis assumes equal frequency of the four nucleotides, which could be violated if the mutational process is biased (Palidwor et al. 2010; Zhang 2012) or if base composition varies across the genome (Bernardi 1995, 2000). To control for biased mutational processes, Novembre (2002) proposed the “effective number of codons Prime” (ENCp), deriving expected codon usage from local base composition. If the four bases are equally frequent, ENCp reduces to ENC. Both ENC and ENCp theoretically range from 20 (every amino acid coded by a single codon, representing maximal bias) to 61 (each amino acid coded by each of its synonymous codons at equal frequency, representing minimal bias). We calculated ENC and ENCp using Novembre's ENCprime software, which quantifies the significance of observed versus expected codon usage via Pearson's χ2 statistics (Novembre 2002). Alternative methods, such as the “frequency of preferred codons” (Ikemura 1981) or the “codon adaptation index” (Sharp and Li 1987; Lee et al. 2010), require a priori definition of “preferred codons,” which may not be conserved across species (Hershberg and Petrov 2009; Rao 2011). Furthermore, selection may favor an overall balanced combination of preferred and unpreferred codons at the genomic scale so that genes vary in their preferred and unpreferred codons (Shah and Gilchrist 2011; Qian et al. 2012; Agashe et al. 2013; Yang et al. 2014). All results presented below were qualitatively similar whether we use ENC or ENCp; we report ENCp.
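As an illustration of the simpler of the two statistics, the following is a minimal sketch of Wright's (1990) ENC for a single coding sequence. It is not the ENCprime software used here and does not implement the ENCp correction for background base composition. The function name, the handling of sparsely used amino acids, and the toy sequences are assumptions made for the example.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order; "*" marks stop codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Synonymous codon families, excluding stop codons.
SYNONYMS = {}
for codon, aa in CODON_TABLE.items():
    if aa != "*":
        SYNONYMS.setdefault(aa, []).append(codon)


def effective_number_of_codons(cds):
    """Wright's ENC for one in-frame coding sequence (sketch, no ENCp correction)."""
    cds = cds.upper()
    counts = {}
    for i in range(0, len(cds) - len(cds) % 3, 3):
        codon = cds[i:i + 3]
        counts[codon] = counts.get(codon, 0) + 1

    # Codon homozygosity F for each amino acid, grouped by degeneracy class.
    f_by_class = {2: [], 3: [], 4: [], 6: []}
    for aa, codons in SYNONYMS.items():
        if len(codons) == 1:
            continue  # Met and Trp contribute the constant 2 below
        usage = [counts.get(c, 0) for c in codons]
        n = sum(usage)
        if n < 2:
            continue  # assumption: skip amino acids seen fewer than twice
        f = (n * sum((x / n) ** 2 for x in usage) - 1) / (n - 1)
        if f > 0:
            f_by_class[len(codons)].append(f)

    mean_f = {k: sum(v) / len(v) for k, v in f_by_class.items() if v}
    # If isoleucine is unusable, average the twofold and fourfold classes.
    if 3 not in mean_f and 2 in mean_f and 4 in mean_f:
        mean_f[3] = (mean_f[2] + mean_f[4]) / 2
    if set(mean_f) != {2, 3, 4, 6}:
        raise ValueError("sequence too short to estimate ENC")

    enc = 2 + 9 / mean_f[2] + 1 / mean_f[3] + 5 / mean_f[4] + 3 / mean_f[6]
    return min(enc, 61.0)  # ENC is capped at 61


# Toy sequences: one codon per amino acid (maximal bias) versus
# every sense codon used equally often (minimal bias).
max_bias_cds = "".join(codons[0] * 5 for codons in SYNONYMS.values())
no_bias_cds = "".join(c * 10 for codons in SYNONYMS.values() for c in codons)
print(effective_number_of_codons(max_bias_cds), effective_number_of_codons(no_bias_cds))
```

The two toy sequences recover the theoretical bounds of 20 and 61 described above.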
When estimating ENCp, we estimated the background base composition using either the 2 kb flanking sequence on each side of the gene (4 kb total) or the concatenated introns of each transcript. The latter approach controls for mutational processes specifically related to transcription; however, many genes drop out of the analysis because they did not meet our minimum requirement of having at least 1000 bp of intronic DNA. Both strategies yielded nearly identical results (Appendix S2); we report results based on flanking regions.
Testing for selection
One of the primary assumptions behind the hypothesis that CUB scales with Ne is that codon usage is shaped by selection. In addition to existing literature on the subject (see Introduction), we tested for selection using three main methods. First, we quantified the number of genes that showed ENCp significantly different than expectations built from local base composition across the 41 mammalian species (Novembre 2002).
Second, we implemented the methodology of Yang and Nielsen (2008) to specifically test whether codon usage was influenced by selection across the six mammalian species for which we had independent genetic estimates of effective population size. Under the FMutSel0 model, codon usage evolves only by mutational bias and is unaffected by selection. Under the FMutSel model, synonymous mutations can fix according to differences in fitness between synonymous codons. A likelihood ratio test (LRT) provides a formal test of whether selection affects codon usage: the test statistic, twice the difference in log-likelihoods of the two models, is compared to a χ2 distribution with degrees of freedom equal to the difference in the number of parameters estimated.
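The per-gene decision rule can be sketched as follows. The critical value of 56.94 (df = 41, P = 0.05) is the one quoted in the Results; the log-likelihood values and function names are hypothetical, and the sketch does not fit the FMutSel0 or FMutSel models themselves or apply the multiple-testing correction.

```python
# Critical value of the chi-square distribution quoted in the Results
# for df = 41 at P = 0.05.
CRITICAL_CHI2_DF41 = 56.94


def likelihood_ratio_statistic(lnl_null, lnl_alternative):
    """Twice the difference in log-likelihoods between nested models."""
    return 2.0 * (lnl_alternative - lnl_null)


def selection_supported(lnl_fmutsel0, lnl_fmutsel, critical=CRITICAL_CHI2_DF41):
    """True if the model with selection fits significantly better (uncorrected)."""
    return likelihood_ratio_statistic(lnl_fmutsel0, lnl_fmutsel) >= critical


# Hypothetical log-likelihoods for a single gene alignment.
lrt = likelihood_ratio_statistic(-5230.4, -5195.1)
print(round(lrt, 1), selection_supported(-5230.4, -5195.1))
```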
Third, we tested for selection with a novel approach that focused on resequencing data from humans, the species with the smallest Ne, and therefore the least likely to be affected by selection. We analyzed the 1000 Genomes data (Consortium TGP 2012) from the Yoruban population in order to minimize effects of known bottlenecks in non-African populations. For amino acids with redundant codons (sixfold redundant amino acids were divided into their respective fourfold and twofold redundant classes, Rocha 2004; Sun et al. 2013), we considered the most frequently used codon in the genome as “preferred” and the least frequently used codon as “unpreferred”. Inaccuracy in defining preferred/unpreferred codons in this way should only add noise to our analysis, making our conclusions below conservative. A McDonald–Kreitman framework (McDonald and Kreitman 1991) was then applied to test whether the ratio of fixed to polymorphic sites differed between unpreferred-to-preferred and preferred-to-unpreferred mutations, with polarity determined by comparison to the chimp + gorilla genomes. To generate null expectations and account for possible mutational biases that could mimic codon bias, we repeated the analysis in introns, forcing segregating sites to be a third position in an imaginary codon. Intronic “codons” from genes transcribed from the reverse strand were also reverse complemented. Any segregating sites in an intron that fell within 20 bp of an exon–intron boundary were excluded. For all codons, we also gathered the +1 and +2 positions so that we could repeat the analyses after excluding sites that could have been mis-polarized due to CpG hypermutability on either strand. Specifically, an XXTGX in chimp + gorilla to XXCGX mutation in human could be falsely polarized if two independent CpG→TpG mutations occurred in chimp and gorilla (X indicates any base with the constraint that they are the same across species; 5′ to 3′ of coding direction is shown, with the 3rd codon position in the middle of the quintet). By similar logic, XCAXX in chimp + gorilla and XCGXX in humans could arise via two independent cytosine deaminations on the other strand.
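The two quintet patterns just described can be expressed as a small filter. This is a sketch of the stated rule only; the function name and the choice to return False when the flanking bases differ between species are assumptions.

```python
def possibly_cpg_mispolarized(outgroup_quintet, human_quintet):
    """Flag a site whose polarity could be confounded by CpG hypermutability.

    Both arguments are 5-mers in coding orientation with the third codon
    position (the focal site) at index 2.
    """
    outgroup = outgroup_quintet.upper()
    human = human_quintet.upper()
    if len(outgroup) != 5 or len(human) != 5:
        raise ValueError("quintets must be exactly five bases long")

    # The four flanking bases (X) must be identical across species.
    if any(outgroup[i] != human[i] for i in (0, 1, 3, 4)):
        return False

    # XXTGX in outgroup, XXCGX in human: possible parallel CpG->TpG changes.
    if outgroup[2:4] == "TG" and human[2:4] == "CG":
        return True
    # XCAXX in outgroup, XCGXX in human: deamination on the opposite strand.
    if outgroup[1:3] == "CA" and human[1:3] == "CG":
        return True
    return False


print(possibly_cpg_mispolarized("AATGC", "AACGC"))  # same-strand pattern
print(possibly_cpg_mispolarized("ACATT", "ACGTT"))  # opposite-strand pattern
```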
Quantifying Ne
For the 41-species analyses, robust estimates of Ne do not exist for most species, so we turned to two indirect proxies of Ne – body mass and age at sexual maturity – gleaned from the literature (Appendix S3). Large and/or slowly reproducing mammals tend to have small population sizes (Ohta 1972; Damuth 1981).
For the six-species analyses, genetic estimates of Ne were taken from the literature: human (Zhao 2000), chimp (Won and Hey 2005), rabbit (Carneiro et al. 2009), and the three mouse species (Geraldes 2008; Geraldes et al. 2011). We confined our analyses to those studies for which Ne was estimated from resequencing data rather than genotyping known polymorphisms because the latter strategy suffers from ascertainment bias. Although the studies cited obviously differ in methodology, they were largely drawn from noncoding regions of the genome, to avoid assaying regions affected by selection. Estimates of Ne ranged roughly 78-fold, from ∼10K in humans to ∼780K in rabbits.
We tested the correlation between Ne (or its proxies) and ENCp using the gls procedure in the R package nlme, with a correlation structure that accounted for phylogenetic relatedness (Pagel 1999), built with the corPagel procedure in the R package ape (Paradis et al. 2004). We used the phylogenetic relationships and branch lengths inferred by Meredith et al. (2011). In the 6-species analyses, convergence was unstable; we therefore repeated each gls under all 6! = 720 unique orders in which taxa could be input into the analysis and report median statistical values (Appendix S4).
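The bookkeeping for the input-order procedure can be sketched as below. The actual model fits were performed with gls in R; here `fit_statistic` is a hypothetical stand-in for whatever function returns a test statistic for a given taxon order, and the toy statistic exists only to make the example runnable.

```python
from itertools import permutations
from statistics import median

TAXA = ["human", "chimpanzee", "rabbit",
        "M. musculus", "M. domesticus", "M. castaneus"]


def median_over_input_orders(taxa, fit_statistic):
    """Evaluate fit_statistic on every input order and return (count, median)."""
    values = [fit_statistic(order) for order in permutations(taxa)]
    return len(values), median(values)


def toy_statistic(order):
    # Placeholder only: a real analysis would refit the phylogenetic
    # model with taxa supplied in this order and return its t or P value.
    return float(order.index("rabbit"))


n_orders, median_value = median_over_input_orders(TAXA, toy_statistic)
print(n_orders, median_value)
```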
We repeated the analyses after dividing genes into groups expected to show intraspecific variation in Ne. For species whose public genomes included chromosomal compartment annotations, we tested whether ENCp differed between autosomal and X-linked genes. Assuming an equal effective sex ratio (which may not be a valid assumption, Hammer et al. 2008; Hammer 2010), the X chromosome has an Ne predicted to be three-fourths as large as each autosome. Positive selection on X-linked recessives will further reduce the effective population size of the X chromosome, due to selection at linked sites (Maynard Smith and Haigh 1974; Kaplan et al. 1989; Andolfatto and Przeworski 2001; Kousathanas et al. 2014).
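The three-fourths expectation follows from the standard population-genetic formulas for two sexes, which are not derived in this paper and are used here as an assumption: Ne for autosomes is 4NmNf/(Nm + Nf) and Ne for the X chromosome is 9NmNf/(4Nm + 2Nf), where Nm and Nf are the numbers of breeding males and females. A short sketch shows how the ratio departs from 0.75 when the sex ratio is unequal.

```python
def ne_autosome(n_males, n_females):
    """Standard two-sex effective size for autosomal loci."""
    return 4 * n_males * n_females / (n_males + n_females)


def ne_x_chromosome(n_males, n_females):
    """Standard two-sex effective size for X-linked loci."""
    return 9 * n_males * n_females / (4 * n_males + 2 * n_females)


def x_to_autosome_ratio(n_males, n_females):
    return ne_x_chromosome(n_males, n_females) / ne_autosome(n_males, n_females)


# Equal numbers of breeding males and females give the 3/4 expectation;
# a female-biased breeding population pushes the ratio upward.
equal_ratio = x_to_autosome_ratio(1000, 1000)
female_biased_ratio = x_to_autosome_ratio(500, 1500)
print(equal_ratio, female_biased_ratio)
```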
Similar to X-linked versus autosomal comparisons, genes in regions of low recombination should have reduced Ne, because they are more likely to be in physical linkage with sites under selection (Hill and Robertson 1966). Consistent with this intuition, codon usage was reduced in regions with relatively low recombination in Drosophila (Kliman and Hey 1993; Hey and Kliman 2002; Marais and Piganeau 2002). Because recombination rates are not known for all species, we considered genes within 10 Mb of the centromere boundary to be in relatively low recombination regions compared to genes within 10 Mb of the telomere boundary. This demarcation was biologically relevant in mouse (human), where the average recombination rate was 0.43 cM/Mb (4.6 cM/Mb) for centromeric regions and 0.58 cM/Mb (8.7 cM/Mb) for telomeric regions; Wilcoxon rank-sum test P = 0.015 (P = 10^−10). Mouse recombination maps were taken from Cox (2009), while human recombination maps were downloaded from UCSC Genome Browser's HapMap2 for the GRCh37 (HG19) build.
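A minimal sketch of the positional classification follows, assuming each gene is summarized by a midpoint coordinate on one chromosome arm. The function name, the labels, and the treatment of genes that fall within the window of both boundaries are assumptions, not details taken from the analysis.

```python
def classify_by_position(gene_midpoint, centromere_boundary, telomere_boundary,
                         window=10_000_000):
    """Assign a gene to a recombination class using a 10 Mb window (in bp)."""
    near_centromere = abs(gene_midpoint - centromere_boundary) <= window
    near_telomere = abs(gene_midpoint - telomere_boundary) <= window
    if near_centromere and near_telomere:
        return "ambiguous"  # assumption: short arms are left unclassified
    if near_centromere:
        return "low_recombination"
    if near_telomere:
        return "high_recombination"
    return "unclassified"


# Hypothetical arm with the centromere boundary at 3 Mb and the
# telomere boundary at 150 Mb.
for midpoint in (8_000_000, 80_000_000, 145_000_000):
    print(midpoint, classify_by_position(midpoint, 3_000_000, 150_000_000))
```

Changing `window` to 5 or 20 Mb corresponds to the alternative cutoffs considered in the Results.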
To test for heterogeneity across different classes of genes or sites, we repeated all analyses for (1) the bottom, top, and middle third of genes ranked by ENCp; (2) each amino acid group separately (e.g., Kliman 2014; Yang et al. 2014); and (3) after excluding potential exon splice enhancers. For this latter analysis, we excluded the first 15–17 bases (five codons plus 0–2 additional base pairs to preserve reading frame) from each end of every coding exon from each transcript. Such regions may be constrained to act as exon splice enhancer elements (Eyre-Walker and Bulmer 1993; Parmley and Hurst 2007; Warnecke and Hurst 2007; Gu et al. 2010; Lin 2011) and may experience less efficient selection compared to internal codons (Loewe and Charlesworth 2007).
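One reading of the exon-end trimming rule can be sketched as follows, assuming each coding exon is given in 0-based, half-open coordinates along the coding sequence, so that multiples of three are codon boundaries. The function name and the choice to drop exons that leave no whole codon are assumptions.

```python
def trim_exon_ends(exon_start, exon_end, min_trim=15):
    """Remove 15-17 bases from each end of a coding exon, keeping whole codons.

    Coordinates are 0-based, half-open positions along the coding sequence.
    Returns the retained (start, end) interval, or None if nothing remains.
    """
    # Round up to the next codon boundary after removing at least 15 bases.
    new_start = -(-(exon_start + min_trim) // 3) * 3
    # Round down to the previous codon boundary before the last 15 bases.
    new_end = ((exon_end - min_trim) // 3) * 3
    if new_end <= new_start:
        return None
    return new_start, new_end


print(trim_exon_ends(0, 60))   # exon starting in frame
print(trim_exon_ends(4, 64))   # exon starting out of frame: 17 and 16 bases removed
print(trim_exon_ends(0, 30))   # too short to retain any codon
```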
Results
Codon usage bias is shaped by selection in mammals
Using three different approaches, we found strong evidence that CUB has been shaped by selection across mammals. First, across the 41 mammalian genomes, an average of 91.1% of genes (range: 85.8–93.7%; average number of 17,271 genes analyzed, range: 10,235–19,410) showed significant evidence of selection (ENCp more biased than expected at P < 0.05 after Benjamini–Hochberg correction).
Second, for the six-species analysis, approximately 90% of the 11,000 orthologous gene alignments showed statistically significant evidence of selection (twice the difference in log-likelihoods estimated for the FMutSel0 versus FMutSel models ≥ 56.94, df = 41, P < 0.05, significance determined after Benjamini–Hochberg correction, Yang and Nielsen 2008). This number is similar to Yang and Nielsen (2008), who found evidence of selection in 94% of genes evolving across a phylogeny of five divergent mammal species. We repeated the analysis using only human and chimp, the two species with the smallest effective population sizes. Our power to detect selection is expected to plummet for the human–chimp comparison after trimming out most of the evolutionary divergence from the phylogeny. Furthermore, we might expect to detect less selection given these are the two species with the smallest effective population sizes. In spite of these two expected limitations, we still found statistically significant evidence for selection in 77% of orthologous genes (likelihood ratio test ≥ 56.94, df = 41, P < 0.05, significance determined after Benjamini–Hochberg correction). This number is similar to the 87% of genes found to be under selection by Yang and Nielsen (2008), who compared human and macaque.
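Because each gene is tested separately, significance throughout is assessed after Benjamini–Hochberg correction. The step-up procedure can be sketched as below; the function name and the example P values are hypothetical and are not taken from the analyses.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans: True where the null is rejected at FDR alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])

    # Find the largest rank k with p_(k) <= alpha * k / m.
    largest_rank = 0
    for rank, index in enumerate(order, start=1):
        if p_values[index] <= alpha * rank / m:
            largest_rank = rank

    # Reject every hypothesis ranked at or below that k.
    rejected = [False] * m
    for rank, index in enumerate(order, start=1):
        if rank <= largest_rank:
            rejected[index] = True
    return rejected


# Hypothetical per-gene P values.
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.5]))
print(benjamini_hochberg([0.02, 0.03, 0.035, 0.5]))
```

The second call illustrates the step-up behavior: the two smallest P values exceed their own thresholds but are still rejected because the third-ranked value passes.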
Third, Yoruban genomes showed a significantly higher proportion of variable sites that are fixed for unpreferred-to-preferred versus preferred-to-unpreferred mutations (0.49 vs. 0.43, χ2 = 66.62, P < 10^−15, Table 1). Interestingly, this same pattern was observed in fake “codons” constructed from intronic regions (0.36 vs. 0.35, χ2 = 458.46, P < 10^−15, Table 1), suggesting some mutational bias mimics some patterns of codon usage bias. However, the observed χ2 deviation normalized by the total number of observed mutations was more than an order of magnitude larger in exonic versus intronic regions (χ2/N = 2.60 × 10^−3 vs. 0.23 × 10^−3, respectively, Table 1), strongly suggesting that selection favors unpreferred-to-preferred mutations in exons above and beyond mutational biases. After excluding any sites that could have arisen via CpG hypermutation, the overall patterns remain the same (numbers in parentheses in Table 1).
Table 1.
| | Polymorphic | Fixed¹ | P (Fixed) | Chisq | P | Chisq/N (×10^−3) |
|---|---|---|---|---|---|---|
| Exons | | | | | | |
| Preferred→unpreferred | 10,249 (10,249) | 7729 (7729) | 0.43 (0.43) | | | |
| Unpreferred→preferred | 3957 (2859) | 3731 (2487) | 0.49 (0.47) | 66.62 (20.71) | 10^−15 (10^−5) | 2.60 (0.89) |
| Introns | | | | | | |
| Preferred→unpreferred | 690,248 (690,248) | 366,993 (366,993) | 0.35 (0.35) | | | |
| Unpreferred→preferred | 609,530 (521,822) | 345,218 (285,076) | 0.36 (0.35) | 458.46 (76.7) | 10^−15 (10^−15) | 0.23 (0.04) |

Values in parentheses are after excluding sites that could have arisen via CpG hypermutation.
¹Polarized by comparison of human segregating sites to chimpanzee + gorilla genomes.
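The test statistics in Table 1 can be checked directly from the counts. The sketch below computes a 2 × 2 χ2 with Yates' continuity correction, which appears to be the form of the test that yields the reported values; that inference, and the function name, are assumptions. Dividing by the total number of mutations gives values on the order of 10^−3.

```python
def yates_chi_square(a, b, c, d):
    """Chi-square with continuity correction for the 2 x 2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    numerator = n * (abs(a * d - b * c) - n / 2) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


# Rows: preferred->unpreferred, unpreferred->preferred.
# Columns: polymorphic, fixed (counts from Table 1, all sites).
exon_counts = (10249, 7729, 3957, 3731)
intron_counts = (690248, 366993, 609530, 345218)

exon_chi2 = yates_chi_square(*exon_counts)
intron_chi2 = yates_chi_square(*intron_counts)

print(round(exon_chi2, 2), round(exon_chi2 / sum(exon_counts) * 1000, 2))
print(round(intron_chi2, 2), round(intron_chi2 / sum(intron_counts) * 1000, 2))
```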
In sum, all three approaches provided strong support that selection has shaped codon bias in mammals, even those species with the smallest effective population sizes. Our results are consistent with a growing body of work demonstrating codon usage is under selection in mammals (Urrutia and Hurst 2003; Comeron 2004; Lu and Wu 2005; Kondrashov et al. 2006; Yang and Nielsen 2008; Waldman et al. 2011; Doherty and McInerney 2013). Other studies have argued that patterns of selection in mammals are either absent or the result of mutational processes or methodological artifacts (Urrutia and Hurst 2001; Duret 2002). These latter studies point out the confounding factors that mutational processes and base composition have on estimates of codon usage bias, which are controlled for in all three approaches used above.
Codon usage bias was not correlated to inferred variation in Ne across 41 mammalian species
In spite of the evidence that selection shapes codon usage, ENCp was not correlated to either proxy of effective population size. ENCp varied from the most biased score of 48.93 in cow to the least biased score of 51.99 in hedgehog (Fig. 1, Appendix S3). Variation in log10 ENCp was not correlated to log10 age at sexual maturity (phylogenetically controlled t39 = −0.09, P = 0.93, Fig. 2A) or log10 body mass (phylogenetically controlled t39 = −1.35, P = 0.18, Fig. 2B). For the most part, we did not find either correlation if we analyzed (1) the lowest, intermediate, or highest third of genes ranked according to ENCp (generation time: t39 = −0.44, −0.08, 1.15; r = −0.19, −0.14, −0.02; P = 0.67, 0.93, 0.26, for the three groups, respectively; body mass: t39 = −0.99, −1.33, −0.39; r = −0.21, −0.27, −0.09; P = 0.33, 0.19, 0.70, for the three groups, respectively); (2) each amino acid family separately (exceptions being twofold redundant arginine where ENCp correlated to generation time and threonine where ENCp correlated to body mass, Appendix S5); or (3) each transcript after excluding potential exon splice enhancers (correlation to log10 generation time: t39 = −0.09, r = −0.16, P = 0.92; correlation to log10 body mass: t39 = −1.24, r = −0.30, P = 0.22).
The phylogenetically controlled methods just presented test for a linear relationship between ENCp and Ne. To check for nonlinear relationships, we performed a simple nonparametric test, asking whether codon usage bias tended to increase in those parts of the phylogeny where effective population size increased, regardless of magnitude. After calculating phylogenetically independent contrasts using the pic function in ape (Paradis et al. 2004), there was no evidence for this pattern using either proxy of effective population size (for both body mass and generation time: 22 of 40 independent contrasts showed increased codon bias with increased Ne, Fisher's exact test P > 0.65), again arguing against a strong correlation between codon usage and effective population size.
One possible explanation for the lack of a strong correlation between Ne and CUB is that each species is subject to its own unique distribution of selection coefficients associated with codon usage. If true, then within each species, variation in CUB may still correlate with intragenomic variation in Ne. To test this prediction, we now turn our attention to comparisons of genes that are X-linked versus autosomal, as well as in high versus low recombination regions.
Codon bias was weaker for X-linked genes
For all 17 species for which chromosomal compartment was annotated, autosomal genes were more biased than X-linked genes, significantly so for 15/17 species (P < 0.05 after Benjamini–Hochberg correction, the exceptions being gorilla and opossum), in an analysis of covariance (ANCOVA) taking into account the important covariates of exon and intron lengths (Moriyama and Powell 1998; Duret and Mouchiroud 1999; Comeron and Kreitman 2000; Vinogradov 2001; Stoletzki and Eyre-Walker 2007; Stoletzki 2011; Behura et al. 2013; Fig. 3A and B, Appendices S3 and S6). The reduced bias of X-linked genes is consistent with their expected reduction in Ne relative to autosomes, although much of the variation remains to be explained.
The pattern of reduced CUB on X-linked genes observed here is opposite that observed in flies, worms, and plants, where X-linked genes were more strongly biased than autosomes (Singh et al. 2005; Haddrill et al. 2010; Zeng and Charlesworth 2010; Qiu et al. 2011a). Although outside the main focus of this manuscript, the differences may be due to differences in X inactivation, dosage compensation, and/or the history of gene traffic between X and autosomes (Emerson et al. 2004).
Codon bias was weaker for low recombination genes
For all five species for which centromere and telomere were annotated, centromeric genes were less biased than telomeric genes, significantly so for 4/5 species (P < 0.05 after Benjamini–Hochberg correction, with marmoset the exception) in an ANCOVA taking into account length of exons and introns (Fig. 3C and D, Appendices S3 and S7). As with comparisons between X chromosomes and autosomes, this result is consistent with the idea that intragenomic differences in Ne predict variation in CUB. Although recombination itself may favor mutations toward GC (Marais et al. 2001), such processes are not expected to explain our results because ENCp takes mutational biases into account. As argued in Materials and Methods, we chose a biologically meaningful chromosome length (10 Mb) to define centromeric and telomeric loci, but we uncovered the same qualitative results if we used either 20 Mb or 5 Mb cutoffs instead.
Codon usage bias was not strongly correlated to variation in genetic estimates of Ne across six mammalian species
Among the six mammalian species for which independent genetic estimates of Ne existed, there was a significant correlation between Ne and ENCp (phylogenetically controlled t = −5.716, r = −0.617, P = 0.005, Appendix S8). When analyzed separately, 18.1% of genes showed a significant, negative correlation between Ne and ENCp (P < 0.05 after Benjamini–Hochberg correction). However, the differences in ENCp were modest, ranging from 51.41 in humans to 50.46 in rabbit even though Ne varies by roughly 78-fold among these species (Appendices S8 and S9). Furthermore, 15.0% of genes showed a significant, positive correlation between Ne and ENCp (P < 0.05 after Benjamini–Hochberg correction), opposite the prediction of weak selection. Although rabbit is clearly an outlier (Appendix S8), removing it did not change the significance (phylogenetically controlled t = −2.892, r = −0.5992, P = 0.01).
For the six-species analyses, three different subsets of the data revealed some interesting patterns. First, for genes ranked in the lowest third of ENCp (indicating high bias), there was not a statistically significant correlation to Ne (phylogenetically controlled t4 = 2.32, r = 0.028, P = 0.08 [although there is a trend, note that the positive t-value indicates it is in the opposite direction predicted by weak selection]). The relationship held for the other two groups (phylogenetically controlled t4 = −3557340, −16.46; r = −0.53, −0.82; P ≤ 0.0001, 0.0001, for the intermediate and highest ranked thirds, respectively). Second, when we analyzed each amino acid family separately, 18 of the 21 amino acid groups showed statistically significant evidence of the correlation (phylogenetically controlled P < 0.05 in all cases, the exceptions being asparagine, the fourfold serine, and the twofold leucine, Appendix S10). However, two of the 18 significant results (proline and threonine) showed a correlation in the opposite direction predicted by weak selection (Appendix S10). Third, when transcripts were analyzed after removing potential exon splice enhancers, there was not a significant correlation between Ne and codon usage (phylogenetically controlled t = 2.15, r = 0.29, P = 0.09 [although there is a trend, note that it is in the opposite direction predicted by weak selection]). Removing potential exon splice enhancers removed a median of 252 bp from transcripts, covering a median 21.3% of each transcript. Overall, then, there is not a consistently strong correlation between Ne and CUB in the 6-species analyses.
Discussion
It is often assumed that interspecific variation in effective population size (Ne) explains a significant amount of the interspecific variation in CUB. We found almost no support for a correlation between Ne and CUB. In our 41-species analyses, we did not uncover any evidence that interspecific differences in Ne predicted variation in codon usage bias. In the six-species analyses, we uncovered a significant phylogenetic correlation, but the differences in CUB were subtle in spite of a 78-fold range in Ne and inconsistent across different subsets of the data. Furthermore, even though rabbit has by far the largest mammalian Ne in the present study, it does not have the most biased genome (Fig. 1, Appendix S3). On the whole, our study does not support a strong relationship between effective population size and codon usage bias.
There are multiple hypotheses that could explain why Ne and CUB were not strongly correlated in our study. The general prediction that Ne predicts CUB assumes that the average selective coefficient affecting codon usage is both small and homogeneous across species. Although these assumptions seem to hold in a variety of studies (see Introduction), they may be violated under some scenarios. For example, selection associated with CUB may be stronger than previously appreciated (Carlini and Stephan 2003; Lawrie et al. 2013), may vary according to the number of suboptimal codons in a genome (Akashi 1995; Hershberg and Petrov 2008), or may act synergistically (Kondrashov et al. 2006; Charlesworth 2013). Another possibility is that species with large Ne experience elevated rates of adaptive evolution on protein coding genes, potentially interfering with selection on codon usage (Betancourt and Presgraves 2002; Haddrill et al. 2011; Phifer-Rixey 2012), though such an effect has been argued to be weak (Bierne and Eyre-Walker 2006). Ultimately, codon usage bias is correlated to many factors (Behura and Severson 2013), especially gene expression level (Gouy and Gautier 1982), and it may simply be that the appropriate data for parsing out different forces affecting CUB (e.g., gene expression data across tissues and species) are currently lacking. Species divergence in any of these correlates could obscure any simple link between CUB and Ne. We now explore three potential hypotheses in more detail.
First, the distribution of selection coefficients may differ across species. For example, species with more deleterious codon usage might experience stronger selection toward preferred codons (Akashi 1995; Kondrashov et al. 2006; Hershberg and Petrov 2008; Charlesworth 2013). To test this hypothesis further, we estimated |Ne*s| (Yang and Nielsen 2008), where s is the selection coefficient acting specifically on codon usage, for two independent species pairs: human–chimp and M. castaneus–M. domesticus. The median |Ne*s| did not differ across 11,000 genes (median |Ne*s| = 6.61 and 6.62 for rodents and primates, respectively; Wilcoxon rank-sum test, P = 0.93), even though the Ne of the rodent species pair (M. castaneus = 220K, M. domesticus = 100K) is roughly nine times larger than the Ne of the primate species pair (chimp = 25K, human = 10K). By extension, the average selection coefficient associated with codon usage must be roughly nine times larger in primates than in rodents, supporting the hypothesis that selection coefficients vary across species.
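The arithmetic behind this inference can be made explicit with a minimal sketch. It uses only the point estimates quoted above; summarizing each species pair by the arithmetic mean of its two Ne values is our assumption for illustration, and the function names are hypothetical. Any constant scaling applied equally to both estimates of |Ne*s| cancels in the ratio.

```python
# Ne point estimates quoted in the text (number of individuals).
NE = {
    "M. castaneus": 220_000,
    "M. domesticus": 100_000,
    "chimp": 25_000,
    "human": 10_000,
}

# Median |Ne*s| for each species pair, as reported in the text.
MEDIAN_NES = {"rodents": 6.61, "primates": 6.62}


def pair_mean(ne_a, ne_b):
    """Arithmetic mean of two Ne values (an assumed summary of a species pair)."""
    return (ne_a + ne_b) / 2


def implied_s(median_nes, ne):
    """Magnitude of s implied by |Ne*s| = median_nes at a given Ne."""
    return median_nes / ne


rodent_ne = pair_mean(NE["M. castaneus"], NE["M. domesticus"])
primate_ne = pair_mean(NE["chimp"], NE["human"])

# How much larger is Ne in the rodent pair than in the primate pair?
ne_ratio = rodent_ne / primate_ne

# Selection coefficients implied by near-identical medians of |Ne*s|.
s_rodent = implied_s(MEDIAN_NES["rodents"], rodent_ne)
s_primate = implied_s(MEDIAN_NES["primates"], primate_ne)
s_ratio = s_primate / s_rodent

print(f"Ne ratio (rodent/primate): {ne_ratio:.2f}")
print(f"implied |s| ratio (primate/rodent): {s_ratio:.2f}")
```

Because the two medians are nearly identical, the implied ratio of selection coefficients is essentially the inverse of the Ne ratio; under the assumed pair means, both come out close to nine.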
Second, other predictors of the efficacy of selection, such as recombination rate, may differ between species, and could obscure any straightforward predictions about the effects of population size. A growing body of evidence suggests that species with small Ne have evolved increased rates of recombination. For example, the number of chiasmata is positively correlated with generation time across mammals (Burt and Bell 1987). Additionally, artificial selection experiments, where organisms experience both a bottleneck in numbers and an increase in selective intensity, often result in the evolution of increased recombination rate (Burt and Bell 1987; Otto and Lenormand 2002). Recombination rates for three mouse species studied here (M. castaneus, M. domesticus, and M. musculus) vary by ∼30%; M. musculus, the species with the smallest effective population size, has the largest recombination rate while M. castaneus, the species with the largest effective population size, has the smallest recombination rate (Dumont et al. 2011). Furthermore, a small island population of M. domesticus has evolved a higher recombination rate compared to classic strains of M. domesticus (Dumont and Payseur 2011). It is possible that reductions in effective population size are somewhat counterbalanced by the expected increase in selective efficiency gained by elevated rates of recombination, complicating straightforward predictions about Ne-CUB correlations.
Third, codon usage bias may simply evolve on a different time scale than effective population size or its correlates (Marais et al. 2004; Zeng and Charlesworth 2009, 2010; Jensen and Bachtrog 2011), which would be especially important if populations frequently deviate from equilibrium (Zeng and Charlesworth 2009). However, we note that patterns of adaptive protein evolution correlated with effective population size were detected in the three mouse species studied here (Phifer-Rixey et al. 2012), even though they have only been separated for ∼350K years (Geraldes et al. 2008). Thus, the time scale would seem to be long enough to detect a correlation if it existed.
In sum, species-specific selection coefficients and/or recombination rates may obscure the predicted correlation between CUB and Ne across 41 mammalian species. At least in mammals, our study rejects the common assumption that interspecific differences in codon usage can be attributed to variation in effective population sizes. This pattern may be widespread: across 13 independent pairs of eukaryotic species, Gossmann et al. (2012) failed to find a correlation between CUB and Ne. Gossmann et al. (2012) analyzed two mammalian species pairs, averaging 261 genes; our study extends to 41 mammalian genomes. The continued accumulation of population-level resequencing data and whole genomes, as well as independent estimates of Ne over a broader range of taxa, will shed further light on the evolutionary processes that shape codon usage.
Acknowledgments
Funding was provided by USC startup funds (MDD) and National Science Foundation Grant #1146525 (MDD). J. Jensen, D. Lawrie, R. Nielsen, J. Novembre, S. Nuzhdin, P. Ralph, M. Somel, and S. Wright made many valuable contributions. K. Tsung provided assistance gathering data. M. Springer provided electronic versions of mammal phylogenies. A. Moore and four anonymous reviewers provided many helpful comments.
Conflict of Interest
None declared.
Supporting Information
Additional Supporting Information may be found in the online version of this article.
References
- Agashe D, Martinez-Gomez NC, Drummond DA, Marx CJ. Good codons, bad transcript: large reductions in gene expression and fitness arising from synonymous mutations in a key enzyme. Mol. Biol. Evol. 2013;30:549–560. doi: 10.1093/molbev/mss273.
- Akashi H. Inferring weak selection from patterns of polymorphism and divergence at “silent” sites in Drosophila DNA. Genetics. 1995;139:1067–1076. doi: 10.1093/genetics/139.2.1067.
- Akashi H. Molecular evolution between Drosophila melanogaster and D. simulans: reduced codon bias, faster rates of amino acid substitution, and larger proteins in D. melanogaster. Genetics. 1996;144:1297–1307. doi: 10.1093/genetics/144.3.1297.
- Akashi H, Kliman R. Mutation pressure, natural selection, and the evolution of base composition in Drosophila. In: Woodruff R, Thompson J Jr, Eyre-Walker A, editors. Mutation and evolution. Dordrecht, the Netherlands: Springer; 1998. pp. 49–60.
- Andolfatto P, Przeworski M. Regions of lower crossing over harbor more rare variants in African populations of Drosophila melanogaster. Genetics. 2001;158:657–665. doi: 10.1093/genetics/158.2.657.
- Andolfatto P, Wong KM, Bachtrog D. Effective population size and the efficacy of selection on the X chromosomes of two closely related Drosophila species. Genome Biol. Evol. 2011;3:114–128. doi: 10.1093/gbe/evq086.
- Aquadro CF, Lado KM, Noon WA. The rosy region of Drosophila melanogaster and Drosophila simulans. I. Contrasting levels of naturally occurring DNA restriction map variation and divergence. Genetics. 1988;119:875–888. doi: 10.1093/genetics/119.4.875.
- Bachtrog D. Reduced selection for codon usage bias in Drosophila miranda. J. Mol. Evol. 2007;64:586–590. doi: 10.1007/s00239-006-0257-x.
- Behura SK, Severson DW. Coadaptation of isoacceptor tRNA genes and codon usage bias for translation efficiency in Aedes aegypti and Anopheles gambiae. Insect Mol. Biol. 2011;20:177–187. doi: 10.1111/j.1365-2583.2010.01055.x.
- Behura SK, Severson DW. Codon usage bias: causative factors, quantification methods and genome-wide patterns: with emphasis on insect genomes. Biol. Rev. 2013;88:49–61. doi: 10.1111/j.1469-185X.2012.00242.x.
- Behura SK, Singh BK, Severson DW. Antagonistic relationships between intron content and codon usage bias of genes in three mosquito species: functional and evolutionary implications. Evol. Appl. 2013;6:1079–1089. doi: 10.1111/eva.12088.
- Bernardi G. The human genome: organization and evolutionary history. Annu. Rev. Genet. 1995;29:445–476. doi: 10.1146/annurev.ge.29.120195.002305.
- Bernardi G. Isochores and the evolutionary genomics of vertebrates. Gene. 2000;241:3–17. doi: 10.1016/s0378-1119(99)00485-0.
- Betancourt AJ, Presgraves DC. Linkage limits the power of natural selection in Drosophila. Proc. Natl Acad. Sci. USA. 2002;99:13616–13620. doi: 10.1073/pnas.212277199.
- Bierne N, Eyre-Walker A. Variation in synonymous codon use and DNA polymorphism within the Drosophila genome. J. Evol. Biol. 2006;19:1–11. doi: 10.1111/j.1420-9101.2005.00996.x.
- Bulmer M. The selection-mutation-drift theory of synonymous codon usage. Genetics. 1991;129:897–907. doi: 10.1093/genetics/129.3.897.
- Burt A, Bell G. Mammalian chiasma frequencies as a test of two theories of recombination. Nature. 1987;326:803–805. doi: 10.1038/326803a0.
- Carlini DB, Stephan W. In vivo introduction of unpreferred synonymous codons into the Drosophila Adh gene results in reduced levels of ADH protein. Genetics. 2003;163:239–243. doi: 10.1093/genetics/163.1.239.
- Carneiro M, Ferrand N, Nachman MW. Recombination and speciation: loci near centromeres are more differentiated than loci near telomeres between subspecies of the European rabbit (Oryctolagus cuniculus). Genetics. 2009;181:593–606. doi: 10.1534/genetics.108.096826.
- Chamary JV, Hurst LD. Evidence for selection on synonymous mutations affecting stability of mRNA secondary structure in mammals. Genome Biol. 2005;6:R75. doi: 10.1186/gb-2005-6-9-r75.
- Chamary JV, Parmley JL, Hurst LD. Hearing silence: non-neutral evolution at synonymous sites in mammals. Nat. Rev. Genet. 2006;7:98–108. doi: 10.1038/nrg1770.
- Charlesworth B. Effective population size and patterns of molecular evolution and variation. Nat. Rev. Genet. 2009;10:195–205. doi: 10.1038/nrg2526.
- Charlesworth B. Stabilizing selection, purifying selection and mutational bias in finite populations. Genetics. 2013;194:955–971. doi: 10.1534/genetics.113.151555.
- Comeron JM. Selective and mutational patterns associated with gene expression in humans: influences on synonymous composition and intron presence. Genetics. 2004;167:1293–1304. doi: 10.1534/genetics.104.026351.
- Comeron JM, Kreitman M. The correlation between synonymous and nonsynonymous substitutions in Drosophila: mutation, selection or relaxed constraints? Genetics. 1998;150:767–775. doi: 10.1093/genetics/150.2.767.
- Comeron JM, Kreitman M. The correlation between intron length and recombination in Drosophila: dynamic equilibrium between mutational and selective forces. Genetics. 2000;156:1175–1190. doi: 10.1093/genetics/156.3.1175.
- 1000 Genomes Project Consortium. An integrated map of genetic variation from 1,092 human genomes. Nature. 2012;491:56–65. doi: 10.1038/nature11632.
- Cox A, Ackert-Bicknell CL, Dumont BL, Ding Y, Bell JT, Brockmann GA, et al. A new standard genetic map for the laboratory mouse. Genetics. 2009;182:1335–1344. doi: 10.1534/genetics.109.105486.
- Cutter AD, Charlesworth B. Selection intensity on preferred codons correlates with overall codon usage bias in Caenorhabditis remanei. Curr. Biol. 2006;16:2053–2057. doi: 10.1016/j.cub.2006.08.067.
- Damuth J. Population density and body size in mammals. Nature. 1981;290:699–700.
- Doherty A, McInerney JO. Translational selection frequently overcomes genetic drift in shaping synonymous codon usage patterns in vertebrates. Mol. Biol. Evol. 2013;30:2263–2267. doi: 10.1093/molbev/mst128.
- Dumont BL, Payseur BA. Evolution of the genomic recombination rate in murid rodents. Genetics. 2011;187:643–657. doi: 10.1534/genetics.110.123851.
- Dumont BL, White MA, Steffy B, Wiltshire T, Payseur BA. Extensive recombination rate variation in the house mouse species complex inferred from genetic linkage maps. Genome Res. 2011;21:114–125. doi: 10.1101/gr.111252.110.
- Duret L. Evolution of synonymous codon usage in metazoans. Curr. Opin. Genet. Dev. 2002;12:640–649. doi: 10.1016/s0959-437x(02)00353-2.
- Duret L, Mouchiroud D. Expression pattern and, surprisingly, gene length shape codon usage in Caenorhabditis, Drosophila, and Arabidopsis. Proc. Natl Acad. Sci. USA. 1999;96:4482–4487. doi: 10.1073/pnas.96.8.4482.
- Emerson JJ, Kaessmann H, Betran E, Long M. Extensive gene traffic on the mammalian X chromosome. Science. 2004;303:537–540. doi: 10.1126/science.1090042.
- Eory L, Halligan DL, Keightley PD. Distributions of selectively constrained sites and deleterious mutation rates in the hominid and murid genomes. Mol. Biol. Evol. 2010;27:177–192. doi: 10.1093/molbev/msp219.
- Eyre-Walker A. Synonymous codon bias is related to gene length in Escherichia coli: selection for translational accuracy? Mol. Biol. Evol. 1996;13:864–872. doi: 10.1093/oxfordjournals.molbev.a025646.
- Eyre-Walker A, Bulmer M. Reduced synonymous substitution rate at the start of enterobacterial genes. Nucleic Acids Res. 1993;21:4599–4603. doi: 10.1093/nar/21.19.4599.
- Fuglsang A. Estimating the “effective number of codons”: the Wright way of determining codon homozygosity leads to superior estimates. Genetics. 2006;172:1301–1307. doi: 10.1534/genetics.105.049643.
- Geraldes A, Basset P, Gibson B, Smith KL, Harr B, Yu HT, et al. Inferring the history of speciation in house mice from autosomal, X-linked, Y-linked and mitochondrial genes. Mol. Ecol. 2008;17:5349–5363. doi: 10.1111/j.1365-294X.2008.04005.x.
- Geraldes A, Basset P, Smith KL, Nachman MW. Higher differentiation among subspecies of the house mouse (Mus musculus) in genomic regions with low recombination. Mol. Ecol. 2011;20:4722–4736. doi: 10.1111/j.1365-294X.2011.05285.x.
- Gossmann TI, Keightley PD, Eyre-Walker A. The effect of variation in the effective population size on the rate of adaptive molecular evolution in eukaryotes. Genome Biol. Evol. 2012;4:658–667. doi: 10.1093/gbe/evs027.
- Gouy M, Gautier C. Codon usage in bacteria: correlation with gene expressivity. Nucleic Acids Res. 1982;10:7055–7074. doi: 10.1093/nar/10.22.7055.
- Grantham R, Gautier C, Gouy M. Codon frequencies in 119 individual genes confirm consistent choices of degenerate bases according to genome type. Nucleic Acids Res. 1980a;8:1893–1912. doi: 10.1093/nar/8.9.1893.
- Grantham R, Gautier C, Gouy M, Mercier R, Pave A. Codon catalog usage and the genome hypothesis. Nucleic Acids Res. 1980b;8:r49–r62. doi: 10.1093/nar/8.1.197-c.
- Grantham R, Gautier C, Gouy M, Jacobzone M, Mercier R. Codon catalog usage is a genome strategy modulated for gene expressivity. Nucleic Acids Res. 1981;9:r43–r74. doi: 10.1093/nar/9.1.213-b.
- Gu W, Zhou T, Wilke CO. A universal trend of reduced mRNA stability near the translation-initiation site in prokaryotes and eukaryotes. PLoS Comput. Biol. 2010;6:e1000664. doi: 10.1371/journal.pcbi.1000664.
- Haddrill PR, Loewe L, Charlesworth B. Estimating the parameters of selection on nonsynonymous mutations in Drosophila pseudoobscura and D. miranda. Genetics. 2010;185:1381–1396. doi: 10.1534/genetics.110.117614.
- Haddrill PR, Zeng K, Charlesworth B. Determinants of synonymous and nonsynonymous variability in three species of Drosophila. Mol. Biol. Evol. 2011;28:1731–1743. doi: 10.1093/molbev/msq354.
- Hammer MF, Woerner AE, Mendez FL, Watkins JC, Cox MP, Wall JD. The ratio of human X chromosome to autosome diversity is positively correlated with genetic distance from genes. Nat. Genet. 2010;42:830–831. doi: 10.1038/ng.651.
- Hammer MF, Mendez FL, Cox MP, Woerner AE, Wall JD. Sex-biased evolutionary forces shape genomic patterns of human diversity. PLoS Genet. 2008;4:e1000202. doi: 10.1371/journal.pgen.1000202.
- Hense W, Anderson N, Hutter S, Stephan W, Parsch J, Carlini DB. Experimentally increased codon bias in the Drosophila ADH gene leads to an increase in larval, but not adult, alcohol dehydrogenase activity. Genetics. 2010;184:547–555. doi: 10.1534/genetics.109.111294.
- Hershberg R, Petrov DA. Selection on codon bias. Annu. Rev. Genet. 2008;42:287–299. doi: 10.1146/annurev.genet.42.110807.091442.
- Hershberg R, Petrov DA. General rules for optimal codon choice. PLoS Genet. 2009;5:e1000556. doi: 10.1371/journal.pgen.1000556.
- Hey J, Kliman RM. Interactions between natural selection, recombination and gene density in the genes of Drosophila. Genetics. 2002;160:595–608. doi: 10.1093/genetics/160.2.595.
- Hill WG, Robertson A. The effect of linkage on limits to artificial selection. Genet. Res. 1966;8:269–294.
- Ikemura T. Correlation between the abundance of Escherichia coli transfer RNAs and the occurrence of the respective codons in its protein genes. J. Mol. Biol. 1981;146:1–21. doi: 10.1016/0022-2836(81)90363-6.
- Ikemura T. Codon usage and tRNA content in unicellular and multicellular organisms. Mol. Biol. Evol. 1985;2:13–24. doi: 10.1093/oxfordjournals.molbev.a040335.
- Ingvarsson PK. Molecular evolution of synonymous codon usage in Populus. BMC Evol. Biol. 2008;8:307. doi: 10.1186/1471-2148-8-307.
- Ingvarsson PK. Natural selection on synonymous and nonsynonymous mutations shapes patterns of polymorphism in Populus tremula. Mol. Biol. Evol. 2010;27:650–660. doi: 10.1093/molbev/msp255.
- Jensen JD, Bachtrog D. Characterizing the influence of effective population size on the rate of adaptation: Gillespie's Darwin Domain. Genome Biol. Evol. 2011;3:687–701. doi: 10.1093/gbe/evr063.
- Kaplan NL, Hudson RR, Langley CH. The “hitchhiking effect” revisited. Genetics. 1989;123:887–899. doi: 10.1093/genetics/123.4.887.
- Keane TM, Goodstadt L, Danecek P, White MA, Wong K, Yalcin B, et al. Mouse genomic variation and its effect on phenotypes and gene regulation. Nature. 2011;477:289–294. doi: 10.1038/nature10413.
- Kimura M. The neutral theory of molecular evolution. Cambridge, U.K: Cambridge Univ. Press; 1983.
- Kliman RM. Evidence that natural selection on codon usage in Drosophila pseudoobscura varies across codons. G3 (Bethesda). 2014;4:681–692. doi: 10.1534/g3.114.010488.
- Kliman RM, Hey J. Reduced natural selection associated with low recombination in Drosophila melanogaster. Mol. Biol. Evol. 1993;10:1239–1258. doi: 10.1093/oxfordjournals.molbev.a040074.
- Kober KM, Pogson GH. Genome-wide patterns of codon bias are shaped by natural selection in the purple sea urchin, Strongylocentrotus purpuratus. G3 (Bethesda). 2013;3:1069–1083. doi: 10.1534/g3.113.005769.
- Kondrashov FA, Ogurtsov AY, Kondrashov AS. Selection in favor of nucleotides G and C diversifies evolution rates and levels of polymorphism at mammalian synonymous sites. J. Theor. Biol. 2006;240:616–626. doi: 10.1016/j.jtbi.2005.10.020.
- Kousathanas A, Halligan DL, Keightley PD. Faster-X Adaptive Protein Evolution in House Mice. Genetics. 2014;196:1131–1143. doi: 10.1534/genetics.113.158246.
- Lawrie DS, Messer PW, Hershberg R, Petrov DA. Strong purifying selection at synonymous sites in D. melanogaster. PLoS Genet. 2013;9:e1003527. doi: 10.1371/journal.pgen.1003527.
- Lee S, Weon S, Lee S, Kang C. Relative codon adaptation index, a sensitive measure of codon usage bias. Evol. Bioinform. Online. 2010;6:47–55. doi: 10.4137/ebo.s4608.
- Lin MF, Kheradpour P, Washietl S, Parker BJ, Pedersen JS, Kellis M. Locating protein-coding sequences under selection for additional, overlapping functions in 29 mammalian genomes. Genome Res. 2011;21:1916–1928. doi: 10.1101/gr.108753.110.
- Loewe L, Charlesworth B. Background selection in single genes may explain patterns of codon bias. Genetics. 2007;175:1381–1393. doi: 10.1534/genetics.106.065557.
- Löytynoja A, Goldman N. An algorithm for progressive multiple alignment of sequences with insertions. Proc. Natl Acad. Sci. USA. 2005;102:10557–10562. doi: 10.1073/pnas.0409137102.
- Lu J, Wu CI. Weak selection revealed by the whole-genome comparison of the X chromosome and autosomes of human and chimpanzee. Proc. Natl Acad. Sci. USA. 2005;102:4063–4067. doi: 10.1073/pnas.0500436102.
- Marais G, Piganeau G. Hill-Robertson interference is a minor determinant of variations in codon bias across Drosophila melanogaster and Caenorhabditis elegans genomes. Mol. Biol. Evol. 2002;19:1399–1406. doi: 10.1093/oxfordjournals.molbev.a004203.
- Marais G, Mouchiroud D, Duret L. Does recombination improve selection on codon usage? Lessons from nematode and fly complete genomes. Proc. Natl Acad. Sci. USA. 2001;98:5688–5692. doi: 10.1073/pnas.091427698.
- Marais G, Domazet-Loso T, Tautz D, Charlesworth B. Correlated evolution of synonymous and nonsynonymous sites in Drosophila. J. Mol. Evol. 2004;59:771–779. doi: 10.1007/s00239-004-2671-2.
- Maside X, Lee AW, Charlesworth B. Selection on codon usage in Drosophila americana. Curr. Biol. 2004;14:150–154. doi: 10.1016/j.cub.2003.12.055.
- Maynard Smith J, Haigh J. The hitchhiking effect of a favorable gene. Genet. Res. 1974;23:23–25.
- McDonald JH, Kreitman M. Adaptive protein evolution at the Adh locus in Drosophila. Nature. 1991;351:652–654. doi: 10.1038/351652a0.
- McVean GA, Vieira J. Inferring parameters of mutation, selection and demography from patterns of synonymous site evolution in Drosophila. Genetics. 2001;157:245–257. doi: 10.1093/genetics/157.1.245.
- Meredith RW, Janečka JE, Gatesy J, Ryder OA, Fisher CA, Teeling EC, et al. Impacts of the cretaceous terrestrial revolution and KPg extinction on mammal diversification. Science. 2011;334:521–524. doi: 10.1126/science.1211028.
- Michely S, Toulza E, Subirana L, John U, Cognat V, Marechal-Drouard L, et al. Evolution of codon usage in the smallest photosynthetic eukaryotes and their giant viruses. Genome Biol. Evol. 2013;5:848–859. doi: 10.1093/gbe/evt053.
- Moriyama EN, Hartl DL. Codon usage bias and base composition of nuclear genes in Drosophila. Genetics. 1993;134:847–858. doi: 10.1093/genetics/134.3.847.
- Moriyama EN, Powell JR. Gene length and codon usage bias in Drosophila melanogaster, Saccharomyces cerevisiae and Escherichia coli. Nucleic Acids Res. 1998;26:3188–3193. doi: 10.1093/nar/26.13.3188.
- Novembre JA. Accounting for background nucleotide composition when measuring codon usage bias. Mol. Biol. Evol. 2002;19:1390–1394. doi: 10.1093/oxfordjournals.molbev.a004201.
- Novoa EM, Ribas de Pouplana L. Speeding with control: codon usage, tRNAs, and ribosomes. Trends Genet. 2012;28:574–581. doi: 10.1016/j.tig.2012.07.006.
- Ohta T. Evolutionary rate of cistrons and DNA divergence. J. Mol. Evol. 1972;1:150–157. doi: 10.1007/BF01659161.
- Otto SP, Lenormand T. Resolving the paradox of sex and recombination. Nat. Rev. Genet. 2002;3:252–261. doi: 10.1038/nrg761.
- Pagel M. Inferring the historical patterns of biological evolution. Nature. 1999;401:877–884. doi: 10.1038/44766.
- Palidwor GA, Perkins TJ, Xia X. A general model of codon bias due to GC mutational bias. PLoS ONE. 2010;5:e13431. doi: 10.1371/journal.pone.0013431.
- Paradis E, Claude J, Strimmer K. APE: analyses of phylogenetics and evolution in R language. Bioinformatics. 2004;20:289–290. doi: 10.1093/bioinformatics/btg412.
- Parmley JL, Hurst LD. Exonic splicing regulatory elements skew synonymous codon usage near intron-exon boundaries in mammals. Mol. Biol. Evol. 2007;24:1600–1603. doi: 10.1093/molbev/msm104.
- Parmley JL, Huynen MA. Clustering of codons with rare cognate tRNAs in human genes suggests an extra level of expression regulation. PLoS Genet. 2009;5:e1000548. doi: 10.1371/journal.pgen.1000548.
- Phifer-Rixey M, Bonhomme F, Boursot P, Churchill GA, Piálek J, Tucker PK, et al. Adaptive evolution and effective population size in wild house mice. Mol. Biol. Evol. 2012;29:2949–2955. doi: 10.1093/molbev/mss105.
- Plotkin JB, Kudla G. Synonymous but not the same: the causes and consequences of codon bias. Nat. Rev. Genet. 2011;12:32–42. doi: 10.1038/nrg2899.
- Qian W, Yang J-R, Pearson NM, Maclean C, Zhang J. Balanced codon usage optimizes eukaryotic translational efficiency. PLoS Genet. 2012;8:e1002603. doi: 10.1371/journal.pgen.1002603.
- Qiu S, Bergero R, Zeng K, Charlesworth D. Patterns of codon usage bias in Silene latifolia. Mol. Biol. Evol. 2011a;28:771–780. doi: 10.1093/molbev/msq251.
- Qiu S, Zeng K, Slotte T, Wright S, Charlesworth D. Reduced efficacy of natural selection on codon usage bias in selfing Arabidopsis and Capsella species. Genome Biol. Evol. 2011b;3:868–880. doi: 10.1093/gbe/evr085.
- Ran W, Higgs PG. The influence of anticodon-codon interactions and modified bases on codon usage bias in bacteria. Mol. Biol. Evol. 2010;27:2129–2140. doi: 10.1093/molbev/msq102.
- Ran W, Higgs PG. Contributions of speed and accuracy to translational selection in bacteria. PLoS ONE. 2012;7:e51652. doi: 10.1371/journal.pone.0051652.
- Rao Y, et al. Mutation bias is the driving force of codon usage in the Gallus gallus genome. DNA Res. 2011;18:499–512. doi: 10.1093/dnares/dsr035.
- dos Reis M, Savva R, Wernisch L. Solving the riddle of codon usage preferences: a test for translational selection. Nucleic Acids Res. 2004;32:5036–5044. doi: 10.1093/nar/gkh834.
- Rocha EP. Codon usage bias from tRNA's point of view: redundancy, specialization, and efficient decoding for translation optimization. Genome Res. 2004;14:2279–2286. doi: 10.1101/gr.2896904.
- Shabalina SA, Spiridonov NA, Kashina A. Sounds of silence: synonymous nucleotides as a key to biological regulation and complexity. Nucleic Acids Res. 2013;41:2073–2094. doi: 10.1093/nar/gks1205.
- Shah P, Gilchrist MA. Explaining complex codon usage patterns with selection for translational efficiency, mutation bias, and genetic drift. Proc. Natl Acad. Sci. USA. 2011;108:10231–10236. doi: 10.1073/pnas.1016719108.
- Sharp PM, Cowe E, Higgins DG, Shields DC, Wolfe KH, Wright F. Codon usage patterns in Escherichia coli, Bacillus subtilis, Saccharomyces cerevisiae, Schizosaccharomyces pombe, Drosophila melanogaster and Homo sapiens; a review of the considerable within-species diversity. Nucleic Acids Res. 1988;16:8207–8211. doi: 10.1093/nar/16.17.8207.
- Sharp PM, Li W-H. The codon adaptation index-a measure of directional synonymous codon usage bias, and its potential applications. Nucleic Acids Res. 1987;15:1281–1295. doi: 10.1093/nar/15.3.1281. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Sharp PM, Emery LR, Zeng K. Forces that influence the evolution of codon bias. Philos. Trans. R. Soc. Lond. B Biol. Sci. 2010;365:1203–1212. doi: 10.1098/rstb.2009.0305. doi: 10.1098/rstb.2009.0305. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Shields DC, Sharp PM, Higgins DG, Wright F. “Silent” sites in Drosophila genes are not neutral: evidence of selection among synonymous codons. Mol. Biol. Evol. 1988;5:704–716. doi: 10.1093/oxfordjournals.molbev.a040525. [DOI] [PubMed] [Google Scholar]
- Singh ND, Davis JC, Petrov DA. X-linked genes evolve higher codon bias in Drosophila and Caenorhabditis. Genetics. 2005;171:145–155. doi: 10.1534/genetics.105.043497. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Stergachis AB, Haugen E, Shafer A, Fu W, Vernot B, Reynolds A, et al. Exonic transcription factor binding directs codon choice and affects protein evolution. Science. 2013;342:1367–1372. doi: 10.1126/science.1243490. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Stoletzki N. The surprising negative correlation of gene length and optimal codon use–disentangling translational selection from GC-biased gene conversion in yeast. BMC Evol. Biol. 2011;11:93. doi: 10.1186/1471-2148-11-93.
- Stoletzki N, Eyre-Walker A. Synonymous codon usage in Escherichia coli: selection for translational accuracy. Mol. Biol. Evol. 2007;24:374–381. doi: 10.1093/molbev/msl166.
- Sun X, Yang Q, Xia X. An improved implementation of effective number of codons (nc). Mol. Biol. Evol. 2013;30:191–196. doi: 10.1093/molbev/mss201.
- Thompson JD, Higgins DG, Gibson TJ. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 1994;22:4673–4680. doi: 10.1093/nar/22.22.4673.
- Urrutia AO, Hurst LD. Codon usage bias covaries with expression breadth and the rate of synonymous evolution in humans, but this is not evidence for selection. Genetics. 2001;159:1191–1199. doi: 10.1093/genetics/159.3.1191.
- Urrutia AO, Hurst LD. The signature of selection mediated by expression on human genes. Genome Res. 2003;13:2260–2264. doi: 10.1101/gr.641103.
- Vicario S, Moriyama E, Powell J. Codon usage in twelve species of Drosophila. BMC Evol. Biol. 2007;7:226. doi: 10.1186/1471-2148-7-226.
- Vinogradov AE. Intron length and codon usage. J. Mol. Evol. 2001;52:2–5. doi: 10.1007/s002390010128.
- Waldman YY, Tuller T, Keinan A, Ruppin E. Selection for translation efficiency on synonymous polymorphisms in recent human evolution. Genome Biol. Evol. 2011;3:749–761. doi: 10.1093/gbe/evr076.
- Warnecke T, Hurst LD. Evidence for a trade-off between translational efficiency and splicing regulation in determining synonymous codon usage in Drosophila melanogaster. Mol. Biol. Evol. 2007;24:2755–2762. doi: 10.1093/molbev/msm210.
- White MA, Ané C, Dewey CN, Larget BR, Payseur BA. Fine-scale phylogenetic discordance across the house mouse genome. PLoS Genet. 2009;5:e1000729. doi: 10.1371/journal.pgen.1000729.
- Won YJ, Hey J. Divergence population genetics of chimpanzees. Mol. Biol. Evol. 2005;22:297–307. doi: 10.1093/molbev/msi017.
- Wright F. The ‘effective number of codons’ used in a gene. Gene. 1990;87:23–29. doi: 10.1016/0378-1119(90)90491-9.
- Yang Z. PAML 4: phylogenetic analysis by maximum likelihood. Mol. Biol. Evol. 2007;24:1586–1591. doi: 10.1093/molbev/msm088.
- Yang Z, Nielsen R. Mutation-selection models of codon substitution and their use to estimate selective strengths on codon usage. Mol. Biol. Evol. 2008;25:568–579. doi: 10.1093/molbev/msm284.
- Yang J-R, Chen X, Zhang J. Codon-by-codon modulation of translational speed and accuracy via mRNA folding. PLoS Biol. 2014;12:e1001910. doi: 10.1371/journal.pbio.1001910.
- Zeng K, Charlesworth B. Estimating selection intensity on synonymous codon usage in a nonequilibrium population. Genetics. 2009;183:651–662. doi: 10.1534/genetics.109.101782.
- Zeng K, Charlesworth B. Studying patterns of recent evolution at synonymous sites and intronic sites in Drosophila melanogaster. J. Mol. Evol. 2010;70:116–128. doi: 10.1007/s00239-009-9314-6.
- Zhang Z, Li J, Cui P, Ding F, Li A, Townsend JP, et al. Codon Deviation Coefficient: a novel measure for estimating codon usage bias and its statistical significance. BMC Bioinformatics. 2012;13:43. doi: 10.1186/1471-2105-13-43.
- Zhao Z, Jin L, Fu YX, Ramsay M, Jenkins T, Leskinen E, et al. Worldwide DNA sequence variation in a 10-kilobase noncoding region on human chromosome 22. Proc. Natl Acad. Sci. USA. 2000;97:11354–11358. doi: 10.1073/pnas.200348197.
Associated Data