Abstract
Previous studies observed a higher ratio of divergences at nonsynonymous and synonymous sites (ω = dN/dS) in species with a small population size compared to that estimated for those with a large population size. Here we examined the theoretical relationship between ω, effective population size (Ne), and selection coefficient (s). Our analysis revealed that when purifying selection is strong, ω of species with small Ne is much higher than that of species with large Ne. However, the difference between the two ω values decreases as selection pressure declines (s → 0). We examined this relationship using primate and rodent genes and found that the ω estimated for highly constrained genes of primates was up to 2.9 times higher than that obtained for their orthologous rodent genes. Conversely, for genes under weak purifying selection the ω of primates was only 17% higher than that of rodents. When tissue specificity was used as a proxy for selection pressure we found that the ω of broadly expressed genes of primates was up to 2.1-fold higher than that of their rodent counterparts and this difference was only 27% for tissue-specific genes. Since most of the nonsynonymous mutations in constrained or broadly expressed genes are deleterious, fixation of these mutations is influenced by Ne. This results in a higher ω of these genes in primates compared to those from rodents. Conversely, the majority of nonsynonymous mutations in less-constrained or tissue-specific genes are neutral or nearly neutral and therefore their fixation is largely independent of Ne, which leads to the similarity of ω in primates and rodents.
Keywords: population size effect, deleterious mutations, amino acid substitutions, gene expression and population genetics theory
The magnitude of selection constraint on proteins is quantified by the ratio of divergences at nonsynonymous (dN) and synonymous (dS) positions (ω = dN/dS) (Li 1997). Assuming neutral evolution at synonymous sites, theoretically the ratio ω denotes the fraction of nonsynonymous mutations fixed in the species compared (Nielsen and Yang 2003; Kryazhimskiy and Plotkin 2008). However, ω estimates for species with small population sizes include a fraction of deleterious amino acid substitutions. This is because theories suggest that the rate of fixation of slightly deleterious mutations is determined by the product of the effective population size (Ne) and the selection coefficient (s) (Ohta and Kimura 1971; Ohta 1992). Therefore a much higher fraction of deleterious mutations is expected to be fixed in species with small population sizes compared to those with large population sizes. Evidence for this prediction was first shown by Ohta (1972), using generation time as a proxy for population size. Ohta used the ratio of divergences at amino acid and nucleotide positions as a measure of selection pressure (assuming constrained and neutral evolution at these sites, respectively) and showed a negative relationship between this ratio and the generation times of species. This suggests a much higher rate of amino acid substitution in species with long generation times or small population sizes, which indicates fixation of a higher fraction of deleterious mutations in these species compared to those with short generation times or large population sizes. Later studies compared the ratio of nonsynonymous to synonymous substitutions (ω) estimated from species with different generation times and observed similar relationships, which confirmed the results of the previous study (Ohta 1993; Keightley and Eyre-Walker 2000; Piganeau and Eyre-Walker 2009).
Similarly a higher ω was also observed for endosymbiotic bacteria compared to their free-living counterparts owing to a small population size of the former (Moran 1996; Woolfit and Bromham 2003). Furthermore ω estimated for island birds were found to be much higher than those estimated for their mainland sister taxa, which again is due to the small population sizes imposed by the confined habitat of the former (Johnson and Seger 2001; Woolfit and Bromham 2005).
Using over 7000 genes, a previous genome-wide study estimated ω for primates (human–chimpanzee) and rodents (mouse–rat), which were 0.20 and 0.13, respectively (Mikkelsen et al. 2005). These estimates suggest a 35% higher amino acid substitution in primates compared to rodents and this excess is most likely due to the fixation of deleterious mutations in primates. Further detailed studies on the nuclear genes of mammals and fruit fly (Eyre-Walker et al. 2002) and on the mitochondrial genes of mammals (Eyre-Walker et al. 2002; Popadin et al. 2007; Piganeau and Eyre-Walker 2009) also observed a higher ω in species with small Ne. Furthermore these studies showed that the fraction of radical amino acid substitutions (those involving changes between amino acids with dissimilar biochemical properties) was higher in species with small population sizes compared to those with large population sizes. This also points to the fixation of more deleterious mutations in species with small populations as radical changes are more deleterious than conservative (similar) amino acid changes.
A previous study showed a large difference between ω estimated for primates and rodents using the amino acid sites that are conserved or necessary for the structure and/or function of protein (Subramanian 2011). In contrast, this difference was small using less-conserved or noncritical amino acid positions. This study also revealed a significant difference in the fraction of radical amino acid substitutions between primates and rodents using the amino acids that are conserved across vertebrates and this difference was not significant when less-conserved amino acid sites were examined. These discrepancies were attributed to the difference in the selection constraints on amino acid positions. However, the actual theoretical basis for these observations is not clear. Importantly, we do not know how and to what extent the difference in population sizes influences the ω of various mammalian genes that are under different levels of selection constraints. Furthermore, apart from the structural and functional constraints, breadth of gene expression is also known to be an important determinant of protein evolution (Duret and Mouchiroud 2000; Liao et al. 2006). Therefore, it is essential to examine how population size affects the estimation of ω for genes with different breadths of expression.
In population genetics and evolutionary studies, gene-based ω estimates are routinely used to infer the magnitude of selection pressure on species with different population sizes. Therefore it is important to understand the factors that influence the estimation of ω. Hence, we conducted a detailed study by first examining the theoretical relationship between ω, Ne, and s and showed how Ne modulates ω when s is large and small. We then examined this pattern using real data, and for this purpose we grouped genes from primates and rodents into different categories based on their level of selective constraint. We used two different ways to quantify the negative selection pressure on genes. First, ω of artiodactyls (cow–pig) was used as a proxy for selection pressure to group the corresponding orthologous genes of primates and rodents. Second, previous studies have shown that the breadth of gene expression negatively correlates with the rate of protein evolution (Duret and Mouchiroud 2000; Liao et al. 2006). While ubiquitously expressed genes were found to be under high selective constraints, those expressed in only one or a few tissues were under relaxed selective constraints. Therefore, we used breadth of expression as a proxy for the magnitude of selection and genes were grouped based on their level of tissue specificity. The ω estimates from each group of orthologous genes for primates and rodents were then compared.
Materials and Methods
Genomic sequence data
Protein and cDNA sequence data of human (Homo sapiens), macaque (Macaca mulatta), cow (Bos taurus), pig (Sus scrofa), mouse (Mus musculus), and rat (Rattus norvegicus) were obtained from GenBank (http://www.ncbi.nlm.nih.gov/genbank/). A reciprocal BLAST (Altschul et al. 1997) hit approach was employed to obtain the genes from each species that are orthologous to human genes, using the significance threshold described by Duret et al. (1994). This resulted in 6633 genes with orthologs in all six mammalian species. The orthologous protein sequences of each gene were aligned using CLUSTALW (Larkin et al. 2007) and the cDNA alignment for each gene was created using the protein sequence alignment as the guide. Alignment gaps were excluded.
Estimation of evolutionary divergence
Evolutionary divergences at synonymous (dS) and nonsynonymous positions (dN) were estimated by the codeml program of the PAML package (phylogenetic analysis by maximum likelihood) using the pairwise option (Yang 2007). For each gene, dN and dS were estimated for the orthologous sequences belonging to human–macaque, mouse–rat, and cow–pig pairs. To reduce estimation errors, genes with dN or dS > 0.8 were excluded. Furthermore genes under positive selection (dN/dS > 1.0) were also excluded as this study is focused only on the effect of purifying selection. This resulted in a final data set of 6209 genes. The orthologous genes of human, macaque, mouse, and rat were grouped into 12 categories based on the mean dN/dS estimates of their corresponding orthologous artiodactyl (cow–pig) counterparts, as follows (numbers of genes are given in parentheses): <0.05 (1536), 0.05–0.1 (1290), 0.1–0.15 (983), 0.15–0.2 (748), 0.2–0.25 (512), 0.25–0.3 (353), 0.3–0.35 (257), 0.35–0.4 (170), 0.4–0.45 (111), 0.45–0.5 (77), 0.5–0.55 (64), and >0.55 (108). To reduce the variance all genes belonging to each group were concatenated and combined dN and dS values were estimated using PAML. However, in this method the estimates might be influenced by a small number of highly conserved or highly variable genes in a category. Therefore, we also estimated the variance using a bootstrap method (1000 replicates) in which we resampled (with replacement) the genes belonging to each category. To avoid any methodological bias we also estimated divergence at zerofold and fourfold degenerate sites (instead of dN and dS) using the Tamura–Nei method.
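The gene-level bootstrap described above can be sketched in a few lines. This is only an illustration under stated assumptions: pooling per-gene substitution and site counts stands in for re-running codeml on each resampled concatenated alignment, and the field names and toy counts are hypothetical, not values from this study.

```python
import math
import random

def pooled_omega(genes):
    """Pooled dN/dS for a list of genes (a stand-in for concatenation + codeml)."""
    dn = sum(g["nonsyn_subs"] for g in genes) / sum(g["nonsyn_sites"] for g in genes)
    ds = sum(g["syn_subs"] for g in genes) / sum(g["syn_sites"] for g in genes)
    return dn / ds

def bootstrap_se(genes, replicates=1000, seed=0):
    """Standard error of the pooled omega from resampling genes with replacement."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(replicates):
        sample = rng.choices(genes, k=len(genes))  # resample genes, not sites
        estimates.append(pooled_omega(sample))
    mean = sum(estimates) / replicates
    variance = sum((e - mean) ** 2 for e in estimates) / (replicates - 1)
    return math.sqrt(variance)

# Hypothetical counts for three genes in one category.
toy_genes = [
    {"nonsyn_sites": 700, "nonsyn_subs": 7, "syn_sites": 300, "syn_subs": 30},
    {"nonsyn_sites": 1400, "nonsyn_subs": 28, "syn_sites": 600, "syn_subs": 60},
    {"nonsyn_sites": 350, "nonsyn_subs": 7, "syn_sites": 150, "syn_subs": 15},
]
omega_hat = pooled_omega(toy_genes)
se = bootstrap_se(toy_genes)
print(f"omega = {omega_hat:.3f}, bootstrap SE = {se:.3f}")
```

Resampling whole genes rather than individual sites keeps each gene's sites together, so the standard error reflects between-gene heterogeneity within a category.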
Tissue specificity of gene expression
Gene expression data were obtained from a previous large-scale study on 79 human and 61 mouse tissues (Su et al. 2004) and both these estimates were available for 4768 genes. To estimate the breadth of expression, we used the tissue specificity index (τ) described previously (Yanai et al. 2005; Liao et al. 2006). This index τ is defined as
where n is the number of tissues examined, S is the signal intensity obtained from the data, and Smax is the maximum signal intensity of the expression of a gene across all tissues. When signal intensities were obtained for a single gene from more than one probe, average estimates were used. The value τ ranges from 0 to 1 with higher value indicating higher tissue specificity. If a gene is equally expressed in all tissues then τ = 0 and in contrast it will approach 1 when a gene is expressed in only one tissue. This method accounts for the relative intensities of expression in each tissue while estimating the tissue specificity (or breadth of expression). The average of τ for human and mouse was used as a proxy of selection magnitude. All genes were grouped into 10 categories based on their mean τ. This grouping was done by combining genes with τ values of <0.15 (289), 0.15–0.2 (887), 0.2–0.25 (1031), 0.25–0.3 (948), 0.3–0.35 (620), 0.35–0.4 (416), 0.4–0.45 (260), 0.45–0.5 (142), 0.5–0.55 (104), and >0.55 (71). The numbers of genes are given in parentheses.
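A minimal sketch of the index, assuming the untransformed form given by Yanai et al. (2005), τ = Σ(1 − S_i/S_max)/(n − 1), where S_i is the signal intensity in tissue i. Whether intensities were log-transformed beforehand is not stated above, so that choice is an assumption, and the function name is illustrative.

```python
def tissue_specificity(signals):
    """Return tau for one gene from its per-tissue signal intensities."""
    n = len(signals)
    s_max = max(signals)
    if n < 2 or s_max <= 0:
        raise ValueError("need at least two tissues and a positive maximum signal")
    # Each tissue contributes 1 - S_i / S_max; the maximum tissue contributes 0.
    return sum(1.0 - s / s_max for s in signals) / (n - 1)

# Equal expression in all 79 tissues gives tau = 0.
uniform = tissue_specificity([50.0] * 79)
# Expression in a single tissue gives tau = 1.
single = tissue_specificity([200.0] + [0.0] * 78)
print(uniform, single)
```

The two limiting cases reproduce the behavior described in the text: τ = 0 for uniform expression and τ = 1 for expression confined to one tissue.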
Proportion of radical amino acid substitutions
Radical and conservative amino acid substitutions are those that occur between amino acids of dissimilar and similar biochemical properties, respectively. Although the 20 amino acids could be grouped several ways (Zhang 2000), a previous study showed that one particular grouping (classification A) performed better than others in capturing the magnitude of selection constraints on mammalian protein coding genes (Hanada et al. 2007). Therefore, the current study used this classification, in which amino acids were grouped into four categories, namely, acidic (D, E), basic (R, Q, H, K, F, W, Y), neutral small (A, N, C, G, P, S, T), and neutral large (I, L, M, V). All substitutions involving amino acids belonging to different groups were considered radical and those from within groups were considered conservative in nature. The fraction of radical amino acid substitution (ρ) was computed as the number of radical changes divided by the number of all amino acid differences between two genomes. A binomial variance was used to compute the standard error.
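The classification and the binomial standard error can be illustrated as follows. This is a sketch only: the four groups are copied from the text above, while the function name, the handling of gaps and unknown residues, and the toy sequences are assumptions.

```python
import math

# Amino acid groups as listed in the text (classification A).
GROUPS = {
    "acidic": "DE",
    "basic": "RQHKFWY",
    "neutral_small": "ANCGPST",
    "neutral_large": "ILMV",
}
AA_TO_GROUP = {aa: name for name, members in GROUPS.items() for aa in members}

def radical_fraction(seq_a, seq_b):
    """Return (rho, standard error, number of differences) for two aligned proteins."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    radical = total = 0
    for a, b in zip(seq_a, seq_b):
        # Skip identical residues and anything outside the 20 amino acids (e.g. gaps).
        if a == b or a not in AA_TO_GROUP or b not in AA_TO_GROUP:
            continue
        total += 1
        if AA_TO_GROUP[a] != AA_TO_GROUP[b]:
            radical += 1
    if total == 0:
        raise ValueError("no amino acid differences to classify")
    rho = radical / total
    se = math.sqrt(rho * (1.0 - rho) / total)  # binomial standard error
    return rho, se, total

# Toy example: D/E and A/G are conservative; K/D and I/S are radical.
rho, se, total = radical_fraction("DAKI", "EGDS")
print(rho, se, total)
```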
Results
Theoretical relationship between ω, population size (Ne), and selection intensity (s)
Using Kimura’s formula on the probability of fixation (Kimura 1983), Nielsen and Yang (2003) showed the relationship between dN/dS ratio (ω) and selection coefficient (s) as
$$\omega = \frac{S}{1 - e^{-S}} \qquad (1)$$
where S = 4Nes. Now let us assume that Ne of species A and B is two times higher than that of species C and D and s is the mean selection coefficient on the nonsynonymous sites of a gene or a collection of genes. Assuming neutral evolution in synonymous sites we plotted ω as a function of 4Nes (Figure 1A). For simplicity we used values from −0.001 to 0 for the selection coefficient (s) and 1000 and 500 as the effective population sizes (Ne) of the species pairs AB and CD, respectively. The theoretical relationship clearly shows that when the magnitude of negative selection is high, the difference between the ω of the species pairs AB and CD is large, whereas this difference reduces exponentially when s approaches zero (s → 0) and both ω attain a value of 1 at s = 0. Figure 1B shows that the proportion of ω estimated for the species pair AB is very small when s is −0.001. This fraction increases and the ω obtained for the two species pairs become equal when s = 0. The magnitude of the difference between the ω of the species pairs AB and CD is revealed by Figure 1C. The ratio ωAB/ωCD declines as the magnitude of negative selection increases. Importantly this ratio is only 0.24 when s is −0.001, which is more than four times smaller than the ratio (1.0) at s = 0. These results highlight the importance of Ne on the fixation of deleterious mutations. When selection constraint is high on a gene, a higher fraction of nonsynonymous mutations is eliminated in species with large population sizes than in those with small Ne. On the contrary, for genes under neutral evolution, the fractions of nonsynonymous mutations fixed are similar between the species with small and large Ne.
Figure 1.
Theoretical relationship between ω (dN/dS), effective population size (Ne), and selection coefficient (s) using Equation 1. (A) Two ω are plotted as functions of 4Nes, assuming population sizes of 1000 and 500 for the species pairs AB and CD, respectively. (B) The stacked columns show ω estimated for 10 different values of s ranging from −0.001 to 0. (C) Correlation between selection coefficient (s) and the ratio ωAB/ωCD.
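The numbers quoted for this theoretical relationship can be reproduced with a short script. The sketch assumes the form ω = S/(1 − e^(−S)) with S = 4Nes, which matches the ratio of 0.24 reported in the text at s = −0.001. The function name and the particular grid of s values are illustrative choices.

```python
import math

def omega(ne, s):
    """Expected dN/dS for effective population size ne and selection coefficient s."""
    big_s = 4.0 * ne * s
    if big_s == 0.0:
        return 1.0  # neutral limit of S / (1 - exp(-S))
    # -expm1(-S) equals 1 - exp(-S) and is numerically stable near zero.
    return big_s / -math.expm1(-big_s)

NE_AB, NE_CD = 1000, 500  # population sizes used in the text

for s in (-0.001, -0.00075, -0.0005, -0.00025, 0.0):
    w_ab, w_cd = omega(NE_AB, s), omega(NE_CD, s)
    print(f"s = {s:+.5f}  omega_AB = {w_ab:.3f}  omega_CD = {w_cd:.3f}  "
          f"ratio = {w_ab / w_cd:.2f}")

ratio_strong = omega(NE_AB, -0.001) / omega(NE_CD, -0.001)
```

At s = −0.001 the two ω values are about 0.075 and 0.313, giving the ratio of about 0.24, and the ratio rises toward 1 as s approaches zero.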
Comparison of observed ω for orthologous genes from primates and rodents
To examine the effect of population size on the fixation of deleterious mutations in genes under different levels of selection pressure, we examined the ω for genes from primates and rodents. For this purpose the orthologous genes of human and mouse need to be grouped based on the level of selective constraints on them. To group these genes in an unbiased way we used the ω of the orthologous genes of artiodactyls (see Materials and Methods) and clustered them into 12 different categories (Table 1). Here the ω of artiodactyls (ωArt) is used as a proxy for the magnitude of selection pressure on orthologous genes from primates and rodents. The stacked columns in Figure 2A show the mean ω estimates for rodents (ωRod) and primates (ωPri). In general ωPri estimates were much higher than ωRod for highly constrained genes whereas these estimates were more similar for genes under relaxed constraint. This is clear from the strong positive relationship (Spearman’s ρ = 0.99, P = 0.001) between selection pressure (ωArt) and the ratio ωRod/ωPri (Figure 2B). The estimates for the genes under strong selective constraint (ωArt = 0.025) showed that the ω of primates is 2.9 times higher than that of rodents (Table 1). In contrast, ωPri was only 17% higher than ωRod for the genes under relaxed selective constraint (ωArt = 0.659), suggesting that the excess of nonsynonymous substitutions in primates relative to rodents is much larger for constrained genes than for genes under weak purifying selection.
Table 1. Mean estimates of ω for primates (human-macaque), rodents (mouse-rat) and artiodactyls (cow-pig).
| ωArtᵃ | Genes | ωPri (SE) | ωRod (SE) | ωRod/ωPri (SE)ᵇ | δᶜ |
|---|---|---|---|---|---|
| 0.025 | 1536 | 0.144 (0.001) | 0.050 (0.001) | 0.349 (0.016) | 0.65 |
| 0.074 | 1290 | 0.189 (0.002) | 0.099 (0.001) | 0.527 (0.019) | 0.47 |
| 0.124 | 983 | 0.240 (0.002) | 0.143 (0.001) | 0.594 (0.017) | 0.41 |
| 0.173 | 748 | 0.285 (0.003) | 0.188 (0.001) | 0.659 (0.018) | 0.34 |
| 0.223 | 512 | 0.330 (0.004) | 0.215 (0.002) | 0.653 (0.021) | 0.35 |
| 0.273 | 353 | 0.360 (0.005) | 0.243 (0.002) | 0.677 (0.024) | 0.32 |
| 0.324 | 257 | 0.400 (0.007) | 0.277 (0.003) | 0.691 (0.028) | 0.31 |
| 0.374 | 170 | 0.442 (0.009) | 0.305 (0.004) | 0.691 (0.031) | 0.31 |
| 0.423 | 111 | 0.471 (0.012) | 0.343 (0.006) | 0.727 (0.036) | 0.27 |
| 0.470 | 77 | 0.500 (0.015) | 0.379 (0.008) | 0.757 (0.045) | 0.24 |
| 0.525 | 64 | 0.532 (0.018) | 0.422 (0.010) | 0.793 (0.039) | 0.21 |
| 0.659 | 108 | 0.551 (0.014) | 0.470 (0.008) | 0.852 (0.032) | 0.15 |
ᵃ All orthologous genes were sorted based on the ω of artiodactyls and grouped into 12 categories. The average estimates are given above (see Materials and Methods).
ᵇ Standard error was estimated using a bootstrap procedure of sampling genes (see Materials and Methods).
ᶜ δ = (ωPri − ωRod)/ωPri, the proportion of deleterious amino acid substitutions fixed in primates (see Discussion).
Figure 2.
Relationship between the intensity of selection pressure and ω of primates and rodents. (A) Mean ω shown were estimated for the human–macaque (ωPri) and mouse–rat (ωRod) pairs by grouping genes into 12 categories based on the ω estimates of the orthologous genes of artiodactyls (ωArt), which was used as a proxy for selection pressure. (B) Positive correlation between selection pressure (ωArt) and ωRod/ωPri ratios is shown. Error bars show the standard error estimated using a bootstrap procedure of resampling genes.
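The rank correlation shown in Figure 2B can be checked approximately from the rounded values in Table 1. This is a sketch using only the published table entries, so small differences from an analysis of unrounded estimates are possible. The helper names are illustrative, and ties receive average ranks.

```python
import math

def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rho computed as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)

# Columns 1 and 5 of Table 1.
omega_art = [0.025, 0.074, 0.124, 0.173, 0.223, 0.273,
             0.324, 0.374, 0.423, 0.470, 0.525, 0.659]
rod_over_pri = [0.349, 0.527, 0.594, 0.659, 0.653, 0.677,
                0.691, 0.691, 0.727, 0.757, 0.793, 0.852]

rho = spearman(omega_art, rod_over_pri)
print(f"Spearman's rho = {rho:.2f}")
```

With these rounded values the coefficient comes out close to 0.99, in line with the value reported in the text.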
In the above analysis dN and dS were estimated using a likelihood method implemented in PAML. However, similar results were obtained when we estimated divergences at zerofold and fourfold degenerate sites (as proxies for dN and dS, respectively) using the Tamura–Nei method. For instance this method also produced a highly significant positive relationship (Spearman’s ρ = 0.96, P = 0.0014) between selection pressure (ωArt) and the ratio ωRod/ωPri. Inclusion of duplicate genes in the analysis might have some influence on the overall results because paralogous genes might occasionally interfere with orthology determination. Therefore, we excluded all genes belonging to multigene families and used only the singleton genes from primates and rodents. Although this removed 78% of the genes in our data set we observed similar results. For example, ωPri (0.151) of constrained singleton genes was 2.8 times higher than ωRod (0.054). In contrast, for weakly selected singleton genes, ωPri (0.466) was only 39% higher than ωRod (0.335).
Tissue specificity of gene expression as a proxy for selection pressure
Previous studies using expressed sequence tag (EST) data revealed a negative relationship between breadth of gene expression and ω (Duret and Mouchiroud 2000; Zhang and Li 2004). This was further confirmed by a later study based on microarray data that showed a positive relationship between tissue specificity and ω (Liao et al. 2006). For this reason we used tissue specificity as a proxy for selection intensity to group primate and rodent genes. Over 4500 orthologous genes were grouped into 10 categories based on the mean tissue specificity measure (τ) obtained for human and mouse (see Materials and Methods). The mean ω estimates for each group of genes showed results similar to our previous observations (Figure 3A). A strong positive relationship (Spearman’s ρ = 0.99, P = 0.003) between tissue specificity (τ) and the ωRod/ωPri ratio is shown in Figure 3B. The ωPri of broadly expressed primate genes (τ = 0.13) was 2.1 times higher than the ωRod estimated for the corresponding orthologous genes from rodents (Table 2). On the contrary, ωPri of tissue-specific genes (τ = 0.592) was only 27% higher than that of their rodent counterparts (ωRod). These results suggest that, relative to their rodent orthologs, broadly expressed primate genes accumulate more nonsynonymous substitutions than genes expressed in one or a few tissues.
Figure 3.
Tissue specificity (τ) and ω estimates of primate and rodent genes. (A) Average ω were estimated by grouping genes into 10 categories based on their breadth of expression. (B) Relationship between tissue specificity (τ) vs. ωRod/ωPri ratio. Error bars are the SE estimated using a bootstrap method.
Table 2. Tissue specificity and evolutionary divergence estimates for primates and rodents.
| Tissue specificity (τ)ᵃ | Genes | ωPri (SE) | ωRod (SE) | ωRod/ωPri (SE)ᵇ | δᶜ |
|---|---|---|---|---|---|
| 0.130 | 289 | 0.175 (0.004) | 0.083 (0.002) | 0.474 (0.044) | 0.53 |
| 0.176 | 887 | 0.223 (0.002) | 0.121 (0.001) | 0.544 (0.021) | 0.46 |
| 0.225 | 1031 | 0.229 (0.002) | 0.132 (0.001) | 0.578 (0.018) | 0.42 |
| 0.275 | 948 | 0.246 (0.002) | 0.150 (0.001) | 0.611 (0.027) | 0.39 |
| 0.323 | 620 | 0.240 (0.003) | 0.155 (0.001) | 0.648 (0.028) | 0.35 |
| 0.373 | 416 | 0.248 (0.003) | 0.159 (0.002) | 0.639 (0.033) | 0.36 |
| 0.423 | 260 | 0.277 (0.005) | 0.181 (0.002) | 0.654 (0.014) | 0.35 |
| 0.473 | 142 | 0.275 (0.007) | 0.183 (0.004) | 0.666 (0.042) | 0.33 |
| 0.521 | 104 | 0.324 (0.009) | 0.226 (0.005) | 0.698 (0.051) | 0.30 |
| 0.592 | 71 | 0.337 (0.012) | 0.266 (0.007) | 0.790 (0.051) | 0.21 |
ᵃ All orthologous genes were sorted based on the level of tissue specificity (τ) and grouped into 10 categories. The average estimates are given above (see Materials and Methods).
ᵇ Standard error was estimated using a bootstrap procedure of sampling genes (see Materials and Methods).
ᶜ δ = (ωPri − ωRod)/ωPri, the proportion of deleterious amino acid substitutions fixed in primates (see Discussion).
Severity of amino acid substitutions in constrained primate and rodent genes
The above results suggested an increase in the amino acid substitutions in the constrained genes of primates compared to those from rodents. To examine whether these excess substitutions are due to the fixation of deleterious mutations in primates we examined the nature of the amino acid changes. It is well known that mutations involving amino acids with different biochemical properties (radical) are more deleterious than those involving amino acids with similar properties (conservative) (Williamson et al. 2005). To determine the proportion of radical amino acid substitutions we used the matrix suggested by Hanada et al. (2007) (see Materials and Methods). The proportion of radical amino acid substitutions (ρ) was estimated for primate (ρPri) and rodent (ρRod) genes belonging to the 12 categories based on selective constraints (ωArt). The fraction of radical substitutions in primates (ρPri) is slightly higher for constrained genes than that for genes under weak selection (Figure 4A). However, an increasing trend of ρRod was observed for the rodent genes. The ratio ρRod/ρPri showed a strong positive relationship (Spearman’s ρ = 0.97, P = 0.001) with selection intensity (ωArt) (Figure 4B). The ρPri (0.524) estimated for highly constrained genes was 54% higher than ρRod (0.341) and the difference is highly significant (P < 10−6), whereas ρPri (0.506) and ρRod (0.500) estimates are almost identical (P = 0.56) for the genes under weak purifying selection. These results reveal a much higher rate of fixation of radical mutations in the primate genes under strong selection than in their orthologous counterparts in rodents. In contrast, this rate was similar for primate and rodent genes under weak selection pressure. Similar results were observed when tissue specificity (τ) was used as a proxy for selection pressure (Figure 4C). The ratio ρRod/ρPri positively correlated with τ (Spearman’s ρ = 0.98, P = 0.003) (Figure 4D).
ρPri (0.510) estimated for broadly expressed primate genes was 42% higher than ρRod (0.359) estimated for broadly expressed rodent genes and this difference was statistically significant (P < 10−6). On the contrary, the difference between ρPri (0.457) and ρRod (0.445) estimated for tissue-specific genes of primates and rodents was not significant (P = 0.46). These results confirm that the excess amino acid substitutions in constrained or broadly expressed primate genes are indeed deleterious in nature.
Figure 4.
The fraction of radical amino acid substitutions in primates (ρPri) and rodents (ρRod) was estimated using genes under different levels of selective constraint. ωArt (A) and τ (C) were used as proxies for the magnitude of selection pressure. Correlations of ωArt vs. ρRod/ρPri (B) and τ vs. ρRod/ρPri (D) are shown. Error bars denote the standard error.
Discussion
In this study we showed how the population size effect can influence the dN/dS ratios estimated from different species. We showed that the observed difference in these ratios between primates and rodents (Figure 2) was strikingly similar to the patterns predicted by theoretical expectations (Figure 1). However, the theoretical relationship shown in Figure 1 rests on a number of assumptions. For instance, s was assumed to be the same for the orthologous genes from species with different Ne, which may not be entirely true as s itself could vary to some extent between species. Furthermore, Ne is also known to vary within a genome, and Ne for a particular location in a genome is influenced by the effects of recombination, background selection, and hitchhiking (Hill–Robertson effects) (Charlesworth 2009). However, the similarity of the expected and observed patterns in this study suggests that these effects are not large enough to affect its conclusions. The slight difference in the pattern of nonlinear curves fitted for the expected (Figure 1C) and observed (Figure 2B and Figure 3B) relationships could be due to the reasons mentioned above and possibly other variables that were not considered here.
The proportion of deleterious substitutions in primates can be quantified by estimating the normalized difference between ωPri and ωRod. The difference in the ω ratios between species with small and large population sizes reflects the excess fraction of nonsynonymous mutations fixed in species with small population sizes. Theories predict these excess substitutions to be deleterious in nature. Based on this rationale, the proportion of deleterious amino acid mutations fixed in primates (δ) can be estimated as δ = (ωPri − ωRod)/ωPri (Subramanian 2011). The estimate δ is the proportion of deleterious nonsynonymous substitutions among all the amino acid replacement mutations fixed in primates. Table 1 shows that δ estimated for highly constrained genes is over fourfold higher than that observed for the genes under relaxed constraint (65% vs. 15%). Similarly Table 2 shows that the observed δ of broadly expressed genes (53%) is 2.5 times higher than that estimated for tissue-specific genes (21%). Furthermore, the high rate of fixation of deleterious mutations in primates is also supported by the higher fraction of radical amino acid substitutions observed in constrained primate genes compared to their rodent orthologs.
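The δ values quoted here follow directly from the tabulated ω estimates. The sketch below assumes the normalized difference δ = (ωPri − ωRod)/ωPri, equivalently 1 − ωRod/ωPri, which is consistent with the δ columns of Tables 1 and 2. The function and dictionary names are illustrative.

```python
def delta(omega_pri, omega_rod):
    """Proportion of primate substitutions in excess of the rodent expectation."""
    return (omega_pri - omega_rod) / omega_pri

# (omega_Pri, omega_Rod) for the first and last rows of Tables 1 and 2.
rows = {
    "Table 1, most constrained": (0.144, 0.050),
    "Table 1, least constrained": (0.551, 0.470),
    "Table 2, broadly expressed": (0.175, 0.083),
    "Table 2, tissue specific": (0.337, 0.266),
}
deltas = {label: delta(pri, rod) for label, (pri, rod) in rows.items()}
for label, value in deltas.items():
    print(f"{label}: delta = {value:.2f}")
```

Rounded to two decimals these give 0.65, 0.15, 0.53, and 0.21, matching the percentages cited in the paragraph above.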
In this study we assumed similar levels of selection pressure on the orthologous genes of primates, rodents, and artiodactyls. In fact all pairwise comparisons of ω between the three species showed a very high positive correlation (r = 0.57–0.67, P < 10−7). The results of this study are under the assumption that the fraction of adaptive nonsynonymous substitutions in mammals is negligible. Since fixation of slightly beneficial mutations is determined by Nes, a much higher proportion of adaptive mutations is expected to be fixed in rodents than primates due to the large Ne of the former. Hence, assuming a much higher fraction of adaptive substitutions in mammals will only increase the difference between ω estimates of primates and rodents. Similarly, this study assumes that substitutions at synonymous sites are neutral. If selection is assumed at these sites (Chamary et al. 2006), then the observed dS is less than that expected under neutrality and thus ω will be overestimated. However, the magnitude of selection at synonymous sites is expected to be higher for rodents than for primates due to the relatively high Ne of the former. Therefore, the magnitude of overestimation of ω will be much higher for rodents than primates, which will further increase the difference between ω of primates and rodents. Hence our assumptions made the results of this study conservative in nature.
Population genetic theories suggest that purifying selection prevents deleterious mutations from becoming fixed. Therefore, highly constrained genes are expected to be devoid of deleterious substitutions. Contrary to this belief, this study suggests that a high fraction of deleterious mutations are in fact fixed in constrained primate genes. However, the results of this study could be explained based on the frequency of occurrence of deleterious mutations. Since the vast majority of the nonsynonymous positions in constrained genes are under high purifying selection, most of the mutations occurring in these genes are detrimental to the structure and/or function of the protein. As shown in Figure 1A, fixation of deleterious mutations is strongly influenced by Ne and, therefore, ω of species with small Ne is much higher than that of species with large Ne. Conversely, most of the amino acid changing mutations in genes under weak selection are neutral or nearly neutral and fixation of these mutations is largely independent of Ne. Therefore, ω estimated for the weakly selected genes of species with large and small population sizes are largely similar.
The effect of small population size on the fixation of mutations has been well documented by studies in the past few decades (Ohta 1993; Keightley and Eyre-Walker 2000; Popadin et al. 2007; Piganeau and Eyre-Walker 2009). However, this study reveals that the population-size effect on the fixation of deleterious mutations is much more pronounced in genes under strong purifying selection and that this effect is only marginal for the less-constrained genes. The findings of this study suggest that when comparing ω between species, the selection intensity of the genes should be considered. For example, comparing ω of weakly selected genes from two species will mask the effect of the actual difference between their population sizes. Although the results of this study are based on protein-coding genes, a similar pattern is expected for constrained noncoding regions such as UTRs, promoters, enhancers, and silencers.
Acknowledgment
The author is grateful to David Lambert and acknowledges the support from Environmental Futures Centre, Griffith University. I thank Caitlin Curtis, Leon Huynen, and Bhuvana for critical comments.
Footnotes
Communicating editor: J. Wall
Literature cited
- Altschul S. F., Madden T. L., Schaffer A. A., Zhang J., Zhang Z., et al., 1997. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25: 3389–3402.
- Chamary J. V., Parmley J. L., Hurst L. D., 2006. Hearing silence: non-neutral evolution at synonymous sites in mammals. Nat. Rev. Genet. 7: 98–108.
- Charlesworth B., 2009. Fundamental concepts in genetics: effective population size and patterns of molecular evolution and variation. Nat. Rev. Genet. 10: 195–205.
- Duret L., Mouchiroud D., 2000. Determinants of substitution rates in mammalian genes: expression pattern affects selection intensity but not mutation rate. Mol. Biol. Evol. 17: 68–74.
- Duret L., Mouchiroud D., Gouy M., 1994. HOVERGEN: a database of homologous vertebrate genes. Nucleic Acids Res. 22: 2360–2365.
- Eyre-Walker A., Keightley P. D., Smith N. G., Gaffney D., 2002. Quantifying the slightly deleterious mutation model of molecular evolution. Mol. Biol. Evol. 19: 2142–2149.
- Hanada K., Shiu S. H., Li W. H., 2007. The nonsynonymous/synonymous substitution rate ratio vs. the radical/conservative replacement rate ratio in the evolution of mammalian genes. Mol. Biol. Evol. 24: 2235–2241.
- Johnson K. P., Seger J., 2001. Elevated rates of nonsynonymous substitution in island birds. Mol. Biol. Evol. 18: 874–881.
- Keightley P. D., Eyre-Walker A., 2000. Deleterious mutations and the evolution of sex. Science 290: 331–333.
- Kimura M., 1983. The Neutral Theory of Molecular Evolution. Cambridge University Press, Cambridge, UK.
- Kryazhimskiy S., Plotkin J. B., 2008. The population genetics of dN/dS. PLoS Genet. 4: e1000304.
- Larkin M. A., Blackshields G., Brown N. P., Chenna R., McGettigan P. A., et al., 2007. Clustal W and Clustal X version 2.0. Bioinformatics 23: 2947–2948.
- Li W. H., 1997. Molecular Evolution. Sinauer, Sunderland, MA.
- Liao B. Y., Scott N. M., Zhang J., 2006. Impacts of gene essentiality, expression pattern, and gene compactness on the evolutionary rate of mammalian proteins. Mol. Biol. Evol. 23: 2072–2080.
- Mikkelsen T. S., Hillier L. W., Eichler E. E., Zody M. C., Jaffe D. B., et al., 2005. Initial sequence of the chimpanzee genome and comparison with the human genome. Nature 437: 69–87.
- Moran N. A., 1996. Accelerated evolution and Muller’s rachet in endosymbiotic bacteria. Proc. Natl. Acad. Sci. USA 93: 2873–2878.
- Nielsen R., Yang Z., 2003. Estimating the distribution of selection coefficients from phylogenetic data with applications to mitochondrial and viral DNA. Mol. Biol. Evol. 20: 1231–1239.
- Ohta T., 1972. Evolutionary rate of Cistrons and DNA divergence. J. Mol. Evol. 1: 150–157.
- Ohta T., 1992. The nearly neutral theory of molecular evolution. Annu. Rev. Ecol. Syst. 23: 263–286.
- Ohta T., 1993. An examination of the generation-time effect on molecular evolution. Proc. Natl. Acad. Sci. USA 90: 10676–10680.
- Ohta T., Kimura M., 1971. On the constancy of the evolutionary rate of Cistrons. J. Mol. Evol. 1: 18–25.
- Piganeau G., Eyre-Walker A., 2009. Evidence for variation in the effective population size of animal mitochondrial DNA. PLoS ONE 4: e4396.
- Popadin K., Polishchuk L. V., Mamirova L., Knorre D., Gunbin K., 2007. Accumulation of slightly deleterious mutations in mitochondrial protein-coding genes of large vs. small mammals. Proc. Natl. Acad. Sci. USA 104: 13390–13395.
- Su A. I., Wiltshire T., Batalov S., Lapp H., Ching K. A., et al., 2004. A gene atlas of the mouse and human protein-encoding transcriptomes. Proc. Natl. Acad. Sci. USA 101: 6062–6067.
- Subramanian S., 2011. Fixation of deleterious mutations at critical positions in human proteins. Mol. Biol. Evol. 28: 2687–2693.
- Williamson S. H., Hernandez R., Fledel-Alon A., Zhu L., Nielsen R., et al., 2005. Simultaneous inference of selection and population growth from patterns of variation in the human genome. Proc. Natl. Acad. Sci. USA 102: 7882–7887.
- Woolfit M., Bromham L., 2003. Increased rates of sequence evolution in endosymbiotic bacteria and fungi with small effective population sizes. Mol. Biol. Evol. 20: 1545–1555.
- Woolfit M., Bromham L., 2005. Population size and molecular evolution on islands. Proc. Biol. Sci. 272: 2277–2282.
- Yanai I., Benjamin H., Shmoish M., Chalifa-Caspi V., Shklar M., et al., 2005. Genome-wide midrange transcription profiles reveal expression level relationships in human tissue specification. Bioinformatics 21: 650–659.
- Yang Z., 2007. PAML 4: phylogenetic analysis by maximum likelihood. Mol. Biol. Evol. 24: 1586–1591.
- Zhang J., 2000. Rates of conservative and radical nonsynonymous nucleotide substitutions in mammalian nuclear genes. J. Mol. Evol. 50: 56–68.
- Zhang L., Li W. H., 2004. Mammalian housekeeping genes evolve more slowly than tissue-specific genes. Mol. Biol. Evol. 21: 236–239.




