Proceedings of the National Academy of Sciences of the United States of America. 2004 May 24;101(22):8396–8401. doi: 10.1073/pnas.0400146101

The distribution of fitness effects caused by single-nucleotide substitutions in an RNA virus

Rafael Sanjuán, Andrés Moya, Santiago F. Elena
PMCID: PMC420405  PMID: 15159545

Abstract

Little is known about the mutational fitness effects associated with single-nucleotide substitutions on RNA viral genomes. Here, we used site-directed mutagenesis to create 91 single mutant clones of vesicular stomatitis virus derived from a common ancestral cDNA and performed competition experiments to measure the relative fitness of each mutant. The distribution of nonlethal deleterious effects was highly skewed and had a long, flat tail. As expected, fitness effects depended on whether mutations were chosen at random or reproduced previously described ones. The effect of random deleterious mutations was well described by a log-normal distribution, with a 19% reduction in average fitness; the distribution of effects of preobserved deleterious mutations was better explained by a β model. The fit of both models was improved when combined with a uniform distribution. Up to 40% of random mutations were lethal. The proportion of beneficial mutations was unexpectedly high. Beneficial effects followed a γ distribution, with expected fitness increases of 1% for random mutations and 5% for preobserved mutations.


Mutation is a double-edged sword. On one side, it is the ultimate source of genetic variation and the raw material for selection to act upon; a genotype with a null mutation rate would be sentenced to extinction because of its inability to respond to environmental perturbations. On the other side, mutations typically lead to reduced fitness and are removed by purifying selection. It is generally assumed that mutation is a blind process, so that living beings cannot benefit from it without suffering its negative consequences, which is why the avoidance of the detrimental consequences of mutation may be as important to survival as the genesis of adaptive novelties. For example, recombination and sex are thought to be advantageous to accelerate the fixation of beneficial mutations (1, 2) but also to avoid the accumulation of deleterious mutations (2, 3). Therefore, the distribution of mutational effects on fitness is of fundamental importance for predicting evolutionary dynamics (4–6). Yet, surprisingly little quantitative information on the distribution of mutational effects exists. A few ambitious studies sought to measure the distribution of mutational effects in Drosophila melanogaster (7, 8), Caenorhabditis elegans (9), and Escherichia coli (10). However, these studies suffer from at least one of the following limitations: (i) they are focused on phenotypic traits of unclear adaptive significance or on viability, which represents only one fitness component; (ii) they were done by introducing an unknown number of mutations by chemical mutagenesis or by the accumulation of spontaneous mutations under conditions of relaxed selection; and/or (iii) they focused on particular types of mutations such as gene knock-outs caused by transposon insertion.

A key property of RNA viruses is their error-prone replication (11), which is believed to confer on them the advantage of great adaptability (12). In fact, RNA viral populations are usually described as molecular quasispecies that replicate near the maximum error rate compatible with the maintenance of the encoded genetic information (13). However, the nature of RNA viral populations does not depend only on mutation rate but also on the distribution of mutational fitness effects (14). Elena and Moya (15) analyzed fitness data for vesicular stomatitis virus (VSV) clones serially transferred through bottlenecks (16, 17), finding that the probability density function (pdf) that best fitted the data was a complex one in which a minority of clones had fitness values drawn from a [0, 1] uniform, whereas the majority had fitness values sampled from a γ distribution (15). Recently, Lázaro et al. (18) explored the effect of random mutations on the long-term survival of foot-and-mouth disease virus clones subjected to continuous bottlenecks of size one. They found that the distribution of mutational effects was well described by a Weibull pdf, whereas the distribution observed for large, nonevolving populations was best described by a log-normal pdf (18). Regardless of the ground-breaking importance of these studies for evolutionary virology, they suffer from one of the problems mentioned above: the number of mutations fixed per clone and their molecular nature are unknown. Therefore, inferences are only possible for the distribution of accumulated effects. Additionally, sequence analysis has revealed the difficulty of unambiguously establishing the relationship between multiple mutations fixed and fitness (19–21).

The goal of this work is to avoid this “black-box” process of mutagenesis by creating a collection of single-nucleotide substitution mutants by site-directed mutagenesis on an infectious VSV cDNA. Then we measure fitness for each member of the collection to infer the statistical properties of the distribution of mutational fitness effects.

Materials and Methods

Site-Directed Mutagenesis. We created a collection of single-nucleotide substitution mutants of VSV. The collection comprised two different sets of mutations. The first contained 48 mutants for which both the site to be changed and the nucleotide to be introduced were chosen randomly. The second contained 43 substitutions already described in wild isolates (22, 23), laboratory populations (19, 20, 24–26), or laboratory clones (27–30). Mutations were distributed evenly along the genome. Table 3, which is published as supporting information on the PNAS web site, contains information about each mutant.

A full-length infectious cDNA clone (kindly provided by G. T. W. Wertz, University of Alabama at Birmingham, Birmingham) was used as template for creating the collection of mutants (31). Site-directed mutagenesis reactions were performed by using the high-fidelity Pfu DNA polymerase (Promega) to minimize the chance of appearance of undesired mutations (32). The products were digested with DpnI (Stratagene) to remove the parental methylated strands and then transformed into ultracompetent XL-10 Gold cells (Stratagene). Sequencing of the cDNAs was done to confirm that each desired mutation was incorporated successfully.

As a first step, we introduced the substitution A-3853 → C in the plus strand (Asp-259 → Ala substitution in the G surface protein), which confers the ability of growing in the presence of the I1 mAb (MARM phenotype), at concentrations that inhibit wild-type growth (33). This cDNA clone, named MARM RSV, was used as template for the rest of mutagenesis.

Virus Recovery from cDNA Clones. Approximately $10^5$ (90–95% confluent) baby hamster kidney (BHK21) cells (American Type Culture Collection) were infected with a recombinant vaccinia virus, vTF7-3 (American Type Culture Collection), which expressed the T7 RNA polymerase. After incubation, cells were cotransfected with the full-length mutant cDNA clone and three support plasmids that provided in trans the P, L, and N genes of VSV as described by Whelan et al. (31). Transfections were done by using Lipofectamine supplemented with Plus reagent (Invitrogen) and adding 25 μg/ml 1-β-d-arabinofuranosylcytosine to the cultures 6 h postinfection (hpi) to inhibit the replication of vaccinia virus vTF7-3. After 96 hpi, the cultures were frozen and thawed, and the supernatant was harvested. Dilutions ($10^0$- to $10^4$-fold) were plated on a fresh monolayer with 0.4% agarose in the overlay DMEM (supplemented with 5% calf serum). The presence of plaque-forming units (PFU) 24 hpi indicated the successful recovery of infectious VSV particles, because vaccinia virus vTF7-3 is unable to produce PFU in such conditions (E. Martínez-Salas, personal communication). Any residual vaccinia virus vTF7-3 particle was removed by filtering the supernatant through 0.2-μm membranes (Millipore). Titers of successful transfections ranged between $10^4$ and $10^6$ PFU/ml. Preliminary experiments showed that the accuracy of fitness estimates depended on the titer obtained after the transfection. Therefore, to homogenize the titer of all mutants, 50 μl from the filtered supernatant were used to infect ≈$10^4$ cells. After 48 h, cultures were harvested by freezing-thawing and stored in aliquots at -80°C. Titers, estimated in triplicate, were now ≈$5 \times 10^6$ PFU/ml. Failed transfection experiments were repeated until a positive result was obtained, with a maximum of 10 trials.

Transfection experiments were performed for the whole collection of mutants, the nonmutated wild type, and the MARM RSV clones. A large volume of wild type with a high titer was produced and kept at -80°C. This stock constituted our common competitor for fitness assays.

The MARM phenotype of all mutants, as well as the sensitivity of wild type to I1 mAb, was confirmed by plating assays in which the overlay medium was supplemented with 25% (vol/vol) of antibody.

Relative Fitness Assays. The fitness of each mutant relative to the nonmutated wild type was assessed by seeding ≈$2.5 \times 10^3$ PFU of each genotype into ≈$10^5$ cells. To minimize the probability of fixation of new mutations during competition experiments, they were run for only 12 hpi. Preliminary assays showed that exponential growth occurred during this interval. Samples were taken at 6, 8, 10, and 12 hpi. The titer of both genotypes was determined by plating the appropriate dilution in the presence and absence of I1 mAb. The fitness of each mutant relative to wild type (ω) was estimated as the slope of the linear regression $\log[N_M(t)/N_M(t_0)] = \omega \log[N_W(t)/N_W(t_0)]$, where $N_M(\cdot)$ and $N_W(\cdot)$ represent the titer of mutant and wild type, respectively, at the beginning of the infection ($t_0$) and at $t$ hpi. Under exponential growth, ω is equal to the ratio of intrinsic growth rates, $r_M/r_W$, of the mutant and the wild type, respectively. All assays were replicated in five independent blocks. For each block, fitness was also assayed for the MARM RSV progenitor in triplicate. Fitness estimates of each mutant relative to its progenitor (W) were adjusted by dividing the ω values obtained in each block by the fitness value of MARM RSV estimated in the same block. The average fitness value of MARM RSV relative to wild type was 0.859 ± 0.019 (±1 SEM).
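The estimation of ω described above can be illustrated with a minimal sketch. It assumes a least-squares slope forced through the origin (the regression equation has no intercept term); the function name, the growth rates, and the titers are hypothetical and are generated under exact exponential growth, so the recovered slope equals the ratio of growth rates.

```python
import math

def relative_fitness(mutant_titers, wildtype_titers, mutant_t0, wildtype_t0):
    """Slope of log[N_M(t)/N_M(t0)] on log[N_W(t)/N_W(t0)], forced through the origin."""
    x = [math.log(w / wildtype_t0) for w in wildtype_titers]
    y = [math.log(m / mutant_t0) for m in mutant_titers]
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)

# Hypothetical titers at the four sampling times, under exact exponential growth.
hours = [6, 8, 10, 12]
r_wt, r_mut = 1.0, 0.8      # assumed intrinsic growth rates (per hour)
n0 = 2.5e3                  # inoculum of each genotype (PFU)
wt = [n0 * math.exp(r_wt * t) for t in hours]
mut = [n0 * math.exp(r_mut * t) for t in hours]

omega = relative_fitness(mut, wt, n0, n0)   # approaches r_mut / r_wt
omega_marm = 0.859                          # reported mean fitness of MARM RSV vs. wild type
W = omega / omega_marm                      # fitness relative to the MARM RSV progenitor
```

In the actual design the division is done block by block with the MARM RSV value measured in the same block; the single average used here is only a stand-in.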

Statistical Analyses. Statistical analyses were performed by using SPSS 11.5. For the purpose of describing the distribution of mutational effects on fitness, each mutant was treated as an independent observation. The fit of the observed distribution to alternative pdf models was performed by least-squares nonlinear regression. The models chosen share the basic feature that mutations with small effects are more common than mutations with larger effects. Akaike's information criterion (AIC) was used to compare the log likelihood of nonnested models (34). The model that best explains the observations while requiring the fewest parameters is the one with the lowest AIC.
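As a rough illustration of how AIC trades goodness of fit against the number of parameters, the sketch below uses the common least-squares form AIC = n ln(RSS/n) + 2k. This form is an assumption: the text does not state the exact expression or additive constant used, so values computed this way are not expected to match those in Table 2. The residuals are hypothetical.

```python
import math

def aic_least_squares(residuals, n_params):
    """AIC for a least-squares fit with Gaussian errors, up to an additive constant."""
    n = len(residuals)
    rss = sum(r * r for r in residuals)
    return n * math.log(rss / n) + 2 * n_params

# Hypothetical residuals from fitting a one-parameter model to 22 data points.
res_simple = [0.03, -0.02, 0.04, -0.05, 0.01, 0.02, -0.03, 0.04, -0.01, 0.02, -0.04,
              0.03, -0.02, 0.05, -0.03, 0.01, -0.02, 0.03, -0.04, 0.02, -0.01, 0.03]
# Hypothetical two-parameter model whose residuals are 10% smaller.
res_complex = [0.9 * r for r in res_simple]

aic_simple = aic_least_squares(res_simple, 1)
aic_complex = aic_least_squares(res_complex, 2)
preferred = "simple" if aic_simple < aic_complex else "complex"
```

With these made-up numbers the reduction in residual variance outweighs the penalty of 2 for the extra parameter, so the more complex model has the lower AIC.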

Results

Discarding Compensatory Mutations. The study of the distribution of single-nucleotide substitution fitness effects strongly depends on whether each genotype carries only the desired mutation or whether additional mutations having a fitness effect arise during the early stages of replication and are common to most progeny of a transfection experiment. The number of generations, defined as cycles of cell infection and production of progeny (35), elapsed between the transfection and the beginning of the competition experiment is low enough (in the range of 1.96–6.13, with a median of 2.92) to prevent compensatory mutations from arising and distorting the fitness of single mutants. However, to rule out this potential problem, we took a twofold strategy. First, we ran four independent transfection experiments for five genotypes and competed the resulting viruses against our reference wild type. These genotypes covered the whole distribution of fitness effects. As expected, fitness depended on the mutation introduced (nested ANOVA: $F_{4,15} = 470.614$; P < 0.001). If additional compensatory mutations had accumulated before fitness assays, we would also expect to detect differences between transfection experiments. However, there was no evidence supporting this hypothesis (nested ANOVA: $F_{15,80} = 0.975$; P = 0.489). Second, we determined the full-length RNA consensus sequence resulting from one transfection experiment for these five genotypes. Not a single unexpected change was observed in three of them. Two of them (originally having nonsynonymous mutations), however, presented one additional synonymous change that obviously has no fitness effect. In conclusion, compensatory mutations occurring before competition experiments do not take place at a noticeable rate.
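The degrees of freedom reported for the nested ANOVA follow from the design, as the short sketch below shows. The number of fitness assays per transfection is an assumption (five, one per block); it is the value consistent with the 80 error degrees of freedom.

```python
genotypes = 5        # genotypes that were re-transfected
transfections = 4    # independent transfections per genotype
replicates = 5       # assumed fitness assays per transfection

df_genotype = genotypes - 1                               # among genotypes
df_transfection = genotypes * (transfections - 1)         # transfections within genotypes
df_error = genotypes * transfections * (replicates - 1)   # assays within transfections
```

The genotype effect is therefore tested over the transfection term (4 and 15 d.f.), and the transfection effect over the residual (15 and 80 d.f.).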

Assessing the Proportion of Deleterious, Neutral, and Beneficial Mutations. We recovered infectious particles for 67 of 91 mutants. The fitness for each mutant was compared with the neutral value (W = 1) using a one-sample t test, and each mutation was subsequently classified into one of three categories: deleterious, neutral, or beneficial. Overall, 31 mutations had no significant fitness effect, 32 were deleterious, and 4 were beneficial. Two kinds of statistical errors can affect these proportions: (i) rejecting the hypothesis of neutrality when it is actually true (type I error) and (ii) accepting it when it is actually false (type II error). If all mutations were neutral, we would expect to detect one or two (67 × 0.025) false-deleterious effects as well as one or two false-beneficial effects as a consequence of a type I error. Clearly, this would not be important for the estimated proportion of deleterious mutations. For beneficial mutations, we could apply a multiple test correction, but this enlarges type II errors. Instead, we performed five additional fitness assays for the 10 upper extreme fitness cases, in which the four putative beneficial mutants were included. After additional replication, these four cases remained statistically significant, and another four became so, adding up to a total of eight beneficial mutations and leaving 27 neutral ones (Table 1). It is noteworthy that these estimates of the proportions of deleterious and beneficial mutations have to be considered as lower bounds, because some of the mutations classified as neutral could actually have a fitness effect too weak to be detected by our experimental method (type II error).
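The classification step can be sketched as follows, assuming five replicate fitness values per mutant and a two-tailed 5% threshold (the critical value of Student's t with 4 d.f. is 2.776). The function name and the replicate values are hypothetical and serve only to illustrate the procedure.

```python
import math
import statistics

T_CRIT_DF4 = 2.776   # two-tailed 5% critical value of Student's t with 4 d.f.

def classify(fitness_replicates, t_crit=T_CRIT_DF4):
    """One-sample t test of mean fitness against the neutral value W = 1."""
    n = len(fitness_replicates)
    mean = statistics.mean(fitness_replicates)
    sem = statistics.stdev(fitness_replicates) / math.sqrt(n)
    t = (mean - 1.0) / sem
    if abs(t) <= t_crit:
        return "neutral"
    return "deleterious" if t < 0 else "beneficial"

# Hypothetical replicate fitness values (five blocks each).
examples = {
    "mutant_a": [0.78, 0.81, 0.80, 0.76, 0.79],
    "mutant_b": [0.97, 1.03, 1.01, 0.99, 1.02],
    "mutant_c": [1.06, 1.08, 1.05, 1.07, 1.09],
}
calls = {name: classify(values) for name, values in examples.items()}

# Expected number of false calls in each tail if all 67 viable mutants were neutral.
expected_false_per_tail = 67 * 0.025
```

The last line reproduces the "one or two" false calls per tail mentioned above (67 × 0.025 ≈ 1.7).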

Table 1. Proportion and number (in parentheses) of lethal, deleterious, neutral, and beneficial effects for random and previously described mutations.

| Category | Random: proportion, % | Random: effect, % | Preobserved: proportion, % | Preobserved: effect, % | Total: proportion, % | Total: effect, % |
|---|---|---|---|---|---|---|
| Lethal | 39.6 (19) | -100 | 11.6 (5) | -100 | 26.4 (24) | -100 |
| Deleterious | 29.2 (14) | -24.4 | 41.9 (18) | -16.4 | 35.2 (32) | -19.9 |
| Neutral | 27.1 (13) | -3.8 | 32.6 (14) | -0.9 | 29.7 (27) | -2.3 |
| Beneficial | 4.2 (2) | 4.2 | 14.0 (6) | 7.9 | 8.8 (8) | 7.0 |
| Total | 100 (48) | -47.6 | 100 (43) | -17.7 | 100 (91) | -33.4 |

For each category, the mean fitness effect is shown.

Dealing with the Existence of Lethal Mutations. Lethal mutations and failed transfection experiments produce the same apparent result: an absence of infectious particles in the supernatant of the transfection. We failed to recover viral particles from the supernatant after 10 trials for 24 mutants. To rule out the possibility that these mutations were not lethal but simply failed transfection experiments, we estimated our rate of transfection failure as follows. We ran 67 new, independent transfection experiments with either the MARM RSV or the wild-type cDNA. We recovered infectious particles in 39 of these experiments after one trial. Therefore, our rate of failure is 41.8% per transfection experiment. By using this figure, the likelihood of not recovering infectious particles because of recurrent experimental failure after 10 trials is $0.418^{10} = 1.63 \times 10^{-4}$. In a sample of 91 mutants, hence, we expect much less than one case ($91 \times 1.63 \times 10^{-4} = 0.015$) to be assigned erroneously to the category of lethal mutations. In conclusion, we are quite confident that the cases classified as lethal mutations are really so. This conclusion is further supported by considering the kind of mutations that were putatively lethal (Table 3): 19 produced nonsynonymous substitutions, 3 introduced stop codons, and 1 disrupted the initiation codon of the G gene. By contrast, there was only one case of a lethal synonymous substitution, 53 nt before the end of the M gene. Among random mutations, 40% were putatively lethal. For preobserved mutations, although significantly reduced (Fisher's exact test, P < 0.010), this proportion was still 12% (Table 1).
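The arithmetic behind this argument can be checked in a few lines. The sketch uses only the counts given above (39 successes in 67 control transfections, 10 trials per mutant, 91 mutants); the variable names are illustrative.

```python
successes, trials = 39, 67
failure_rate = 1 - successes / trials            # per-transfection failure rate, about 0.418
p_ten_failures = failure_rate ** 10              # ten consecutive failures by chance alone
expected_false_lethals = 91 * p_ten_failures     # expected misclassified mutants out of 91
```

Working from the unrounded failure rate gives a probability of about 1.6 × 10⁻⁴ and an expectation of about 0.015 misclassified mutants, in line with the figures quoted in the text.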

Distribution of Negative Fitness Effects. The average fitness effect for the 51 mutations with fitness values that were <1.0 (not necessarily significant) but nonlethal was -0.139 ± 0.021. The distribution was highly and significantly skewed toward strongly negative values ($g_1 = -2.002$; $t_{50} = 6.005$; P < 0.001), and consequently the median (-0.092) was well above the mean. The distribution was also strongly and significantly leptokurtic ($g_2 = 4.970$; $t_{50} = 7.578$; P < 0.001), such that many values lie near the center and in the tail, whereas relatively few have intermediate values. These general properties are valid for both random and preobserved mutations. However, the analysis of fitness distribution needs to be done separately for random and preobserved mutations, because the biological meaning of both data sets is a priori different: the former group reflects pure mutational fitness effects, whereas the latter is influenced by the action of drift and natural selection. As expected, the mean negative fitness effect was larger for random than for preobserved nonsynonymous mutations (Fig. 1; Mann–Whitney test, Z = 2.098; one-tailed, P = 0.018). For synonymous mutations, fitness did not differ from 1 (t test, $t_8 = 1.197$; P = 0.266).
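The t statistics quoted for skewness and kurtosis can be approximately reproduced by dividing each coefficient by its standard error. The sketch below assumes the standard sample formulas for these standard errors, as implemented in common statistical packages; the text does not state which formulas were used, so this is a plausibility check rather than a description of the authors' procedure.

```python
import math

def se_skewness(n):
    """Standard error of the sample skewness coefficient for sample size n."""
    return math.sqrt(6.0 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))

def se_kurtosis(n):
    """Standard error of the sample (excess) kurtosis coefficient for sample size n."""
    return 2.0 * se_skewness(n) * math.sqrt((n * n - 1.0) / ((n - 3) * (n + 5)))

n = 51                                   # nonlethal mutations with fitness below 1
t_skew = abs(-2.002) / se_skewness(n)    # compare with the reported 6.005
t_kurt = 4.970 / se_kurtosis(n)          # compare with the reported 7.578
```

Both ratios agree with the reported values to within rounding of the coefficients.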

Fig. 1. Frequency of fitness values associated with single-nucleotide substitutions measured for random (A) and previously described (B) mutations.

Because fitness effects are not distributed normally, it becomes necessary to determine which of several alternative models best describes our observations. Table 2 shows the statistics describing the fitting of several models to the negative effects. The first model tested was the exponential distribution. Exponential pdfs have long been used for describing deleterious mutational effects (36), and more recently they have been proposed as a good model for describing beneficial effects as well (37–39). The only parameter, λ, is the inverse of the expected value. This model fitted significantly well to random ($F_{1,21} = 2120.132$; P < 0.001) and preobserved ($F_{1,20} = 3327.380$; P < 0.001) effects, explaining 95.8% and 96.4% of the observed variation, respectively.

Table 2. Fit of the observed distribution of deleterious mutational effects to several models for random and preobserved mutations.

| Model | Parameters | Random: R² | Random: AIC | Preobserved: R² | Preobserved: AIC |
|---|---|---|---|---|---|
| Exponential | 1 | 0.958 | 105.047 | 0.964 | 106.746 |
| γ | 2 | 0.961 | 106.866 | 0.974 | 110.546 |
| β | 2 | 0.955 | 109.071 | 0.977 | 109.470 |
| Weibull | 2 | 0.962 | 106.618 | 0.973 | 110.527 |
| Log-normal | 2 | 0.974 | 102.217 | 0.956 | 113.200 |
| Exponential + uniform | 3 | 0.968 | 104.775 | 0.989 | 101.083 |
| γ + uniform | 4 | 0.992 | 92.615 | 0.996 | 97.630 |
| β + uniform | 4 | 0.992 | 94.775 | 0.996 | 97.610 |
| Weibull + uniform | 4 | 0.991 | 92.523 | 0.993 | 97.776 |
| Log-normal + uniform | 4 | 0.993 | 92.382 | 0.995 | 98.138 |

See text for details.

Then we tested several two-parameter models. The first model was the γ distribution (40). A γ distribution is characterized by the scale, α, and the shape, β. The expected value of a γ is β/α. Because the exponential is a particular case of the γ, it is possible to use a partial F test to compare the fit of both models. For preobserved mutations, the γ significantly improved over the exponential distribution ($F_{1,20} = 11.394$; P = 0.003). An alternative to the γ is the β distribution. It has a narrower range of values; whereas the domain of application of the γ is 0 ≤ W < +∞, the β is bounded in the range of 0 ≤ W ≤ 1. Therefore, it is especially well suited to model mutational effects. The β distribution is characterized by two shape parameters, α and β. The expected value of a β distribution is α/(α + β). This pdf scored the best fit for preobserved mutational effects. According to AIC, it was better than the γ and other alternative two-parameter models such as the Weibull and the log-normal. The least-squares parameter estimates for the β distribution were α = 0.742 ± 0.049 and β = 5.767 ± 0.526. The expected reduction in fitness was -11.4%, a value that is still 18.0% discrepant with the observed average reduction in fitness. The fit of the β model to the data is shown in Fig. 2B.
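The expected value quoted for the β model follows directly from the reported parameter estimates. The short sketch below recomputes it and also defines the β density; here s denotes the magnitude of the fitness reduction (sign dropped), which is a notational assumption made for the example.

```python
import math

def beta_pdf(s, a, b):
    """Density of a beta distribution with shape parameters a and b on (0, 1)."""
    norm = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    return norm * s ** (a - 1) * (1 - s) ** (b - 1)

alpha, beta = 0.742, 5.767                   # least-squares estimates reported above
expected_effect = alpha / (alpha + beta)     # about 0.114, i.e. an 11.4% reduction
```

Because α < 1 and β > 1, the density decreases monotonically from small to large effects, which is the exponential-like shape the models were chosen to capture.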

Fig. 2. Cumulative frequency distributions for nonlethal deleterious fitness effects associated with single-nucleotide substitutions. The observed distributions are represented by filled circles. (A) Mutations chosen randomly. The continuous line shows the predicted probabilities using a log-normal pdf; the dashed line shows the predicted probabilities using a log-normal + uniform pdf. (B) Previously observed changes. Predicted values using a β pdf are shown with a continuous line; the dashed line shows the probabilities predicted by a β + uniform pdf.

For random mutations, the γ did not improve on the fit of the exponential distribution ($F_{1,20} = 1.468$; P = 0.240). Similarly, neither the β nor the Weibull was significantly better than the exponential (larger AIC values; Table 2). The best fit for random mutations was obtained for the log-normal distribution. This model is characterized by a scale parameter, m, and a shape parameter, σ. The least-squares parameter estimates were m = 0.092 ± 0.003 and σ = 1.206 ± 0.067. The expected value for the log-normal distribution, $m e^{\sigma^2/2}$, was a fitness reduction of -19.1%. The fit of this model to the data is shown in Fig. 2A.
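The expected value of the log-normal model can be recomputed from the reported estimates, treating m as the median of the distribution and σ as the standard deviation on the log scale. The density function is included for completeness; s denotes the magnitude of the fitness reduction, a notational assumption made for the example.

```python
import math

def lognormal_pdf(s, m, sigma):
    """Log-normal density with median m and log-scale standard deviation sigma."""
    z = (math.log(s) - math.log(m)) / sigma
    return math.exp(-0.5 * z * z) / (s * sigma * math.sqrt(2.0 * math.pi))

m, sigma = 0.092, 1.206                          # least-squares estimates reported above
expected_effect = m * math.exp(sigma ** 2 / 2)   # about 0.19
```

The result, roughly a 19% reduction, matches the reported figure to within the rounding of the parameter estimates.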

Elena et al. (10) proposed that deleterious fitness effects should be explained better by more complex models intended to capture cases with large effects unexplained by simpler distributions. Thus, we tried to combine the above single-distribution models with a uniform pdf. For example, in the case of the exponential, the complex model was $p \times \exp(s|\lambda) + (1 - p) \times \mathrm{Un}(s|0, b)$, with $\mathrm{Un}(s|0, b)$ being the uniform pdf in the range [0, b], p the fraction of mutations sampled from the exponential distribution, and 1 - p the fraction sampled from the uniform. The fit of simple models was strongly improved when combined with the uniform distribution, according to partial F tests (all cases P ≤ 0.049). In combination with the uniform pdf, the β distribution again was the best descriptor for preobserved mutations (Table 2 and Fig. 2B), whereas the log-normal remained the best descriptor for random mutations (Table 2 and Fig. 2A). The consequence of adding a uniform term is to raise the probability of highly deleterious mutations. In fact, in the case of preobserved mutations, the uniform pdf accounted for >99% of the overall predicted probability for fitness effects beyond -8%, whereas the β pdf explained less deleterious effects. In the case of random mutations, this transition was shifted to a fitness effect of -15%. Under the compound models, the expected mean fitness effects are -10.5% for preobserved mutations and -15.4% for random mutations. However, these values are dominated by the uniform pdf and thus are strongly dependent on the upper bound of this distribution, which in turn is highly dependent on sampling error.
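The exponential + uniform compound model described above can be written out as a short sketch. The parameter values are hypothetical and chosen only to show how the uniform term comes to dominate the density for large effects; s denotes the magnitude of the fitness effect.

```python
import math

def exponential_pdf(s, lam):
    return lam * math.exp(-lam * s) if s >= 0 else 0.0

def uniform_pdf(s, lower, upper):
    return 1.0 / (upper - lower) if lower <= s <= upper else 0.0

def compound_pdf(s, p, lam, b):
    """p * Exp(s | lam) + (1 - p) * Un(s | 0, b)."""
    return p * exponential_pdf(s, lam) + (1 - p) * uniform_pdf(s, 0.0, b)

# Hypothetical parameter values, for illustration only.
p, lam, b = 0.8, 20.0, 0.7

def uniform_share(s):
    """Fraction of the compound density at s that is contributed by the uniform term."""
    return (1 - p) * uniform_pdf(s, 0.0, b) / compound_pdf(s, p, lam, b)

share_small = uniform_share(0.01)   # small effects: the exponential term dominates
share_large = uniform_share(0.5)    # large effects: the uniform term dominates
```

With these values the uniform term contributes only a few percent of the density at an effect of 0.01 but more than 99% at 0.5, mirroring the transition described in the text. Replacing `exponential_pdf` with another single-distribution density gives the other compound models.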

Distribution of Beneficial Fitness Effects. For the 16 mutants showing beneficial effects, the average fitness effect was 0.044 ± 0.012, a value significantly greater than zero ($t_{15} = 3.690$; P = 0.002). The distribution was skewed toward small beneficial effects ($g_1 = 1.744$; $t_{15} = 3.091$; P = 0.008), with the median fitness effect (0.032) below the mean. The distribution was also significantly leptokurtic ($g_2 = 2.587$; $t_{15} = 2.358$; P = 0.017). As expected, the mean positive fitness effect was stronger for preobserved mutations than for random mutations (Mann–Whitney test, Z = 2.315; one-tailed, P = 0.010).

Positive fitness effects are much rarer than deleterious ones (Fig. 1), which makes it difficult to infer complex distributions from the data. The exponential distribution provided a relatively poor fit to both preobserved and random data sets, leaving unexplained >10% of the total variance ($R^2 = 0.888$ in both cases). The γ distribution provided better fits ($R^2 = 0.937$ for preobserved and $R^2 = 0.953$ for random mutations), although the benefit of including an additional parameter was barely significant (preobserved mutations: $F_{1,7} = 5.532$, P = 0.051; random mutations: $F_{1,5} = 6.935$, P = 0.046). Alternative two-parameter pdfs provided similar fits (data not shown). The mean beneficial effects according to a γ distribution were 4.6% for preobserved and 1.7% for random mutations. The fit of the γ model to the data is shown in Fig. 3.

Fig. 3. Cumulative frequency distributions for beneficial fitness effects associated with single-nucleotide substitutions measured for random (A) and previously described (B) mutations are shown. The filled circles represent the observed distributions; the accumulated probabilities predicted by using a γ pdf are shown by a continuous line.

Discussion

This work represents a study of the distribution of mutational effects on fitness for an RNA virus using explicit single-nucleotide substitutions. On average, mutations were deleterious even when lethals were ignored. Functional and structural analyses (41, 42) have shown that RNA viruses have a very narrow tolerance to accumulate mutations and still be functional, and thus it is not surprising to find that lethal and deleterious mutations are so common. Additionally, previous indirect approaches (15) estimated that the frequency of deleterious mutations in VSV was ≈34%, a value close to ours (Table 1).

On the other side, we found that among 48 random mutations, two were apparently beneficial. It is generally accepted that beneficial effects are ≈1,000-fold less common than neutral and deleterious ones (6, 39, 43). Therefore, it is striking that two of 48 random mutations were beneficial. However, this result is not so surprising if we recall that we used a chimeric genome as template for our mutagenesis experiments. The template cDNA was assembled from clones of each of the VSV genes and intergenic sequences from two different sources. Whereas the N, P, M, and L genes were obtained from the San Juan strain of the Indiana serotype, the G gene was obtained from the Orsay strain of the same serotype (31). At the amino acid level, the divergence between the San Juan and the Orsay G proteins is ≈5%. The question is whether this difference precludes an efficient interaction between the Orsay G protein and the rest of the gene products from the San Juan strain. If this is the case, many different ways to optimize such genomes are available. Furthermore, the ratio of beneficial to deleterious mutations depends on the degree of adaptation of the virus to the laboratory conditions, which in this case is minimal.

As expected, the mean mutational effects as well as the proportion of lethals were different for the random and preobserved mutation sets. However, the effect of preobserved mutations was still deleterious on average, and in a few cases even lethal (Table 1). This result is not surprising for those changes reported in isolated clones, because RNA virus populations are in a dynamic equilibrium between the input of deleterious variants and purifying selection (13). Additionally, some of these variants could have been hidden from natural selection by genetic complementation, provided that multiplicity of infection was high enough (29, 44). However, 18 of the mutations introduced were not found in isolated clones but in consensus sequences characterized for laboratory populations. Novella et al. (19) sequenced half of the genome of viruses evolved in mammalian cells, insect cells, or alternating between both cell types. A total of 13 nt substitutions were detected, and 2 of them arose independently in viruses isolated from different evolutionary regimes. Interestingly, both convergent mutations conferred increased fitness when recreated in our experiments (Pro-120 → Ala and Leu-123 → Trp, both in the M gene), which made them good candidates for conferring a general nonspecific adaptive advantage. All three lineages harbored at least one mutation with a positive fitness effect, but on the other side, all of them also contained at least one mutation with a negative effect, measured in our experimental setup. (The latter are good candidates for environment-specific mutations.) The rise in frequency of deleterious mutations can be explained by hitchhiking with beneficial mutations in a nonrecombining genome. Cuevas et al. (20) found 25 different mutations in 21 independently evolving populations of VSV undergoing adaptive evolution, most of them occurring recurrently in different populations, in a remarkable case of parallel evolution. Among them, we chose 12 nonsynonymous mutations. In at least four of these experimental populations, all the substitutions fixed had a negative fitness effect when introduced in our experiments, and one was even lethal. In contrast, we found only one beneficial mutation. It is therefore naive to expect a predominance of neutral and beneficial effects among preobserved mutations, because fitness effects strongly depend on genotype (epistasis) and environment (20, 45).

Much effort has gone into studying the distribution of deleterious mutational effects in biological systems such as Caenorhabditis (9, 46), Drosophila (40, 47–49), E. coli (10), and RNA viruses (15, 18). Using a set of random mutations, we have shown that mutational fitness effects in VSV are well described by a log-normal pdf. Many processes in life sciences such as latent periods of infectious diseases, microorganisms' sensitivity to drug treatments, survival times in medicine, presence of contaminants in the air, or the abundance of species in ecology have been described by using log-normal models (50). In general, this distribution arises when a given variable is determined by multiple multiplicative small effects. Recently, Lázaro et al. (18) showed that the pattern of titer fluctuations in nonevolving foot-and-mouth disease virus populations was log-normally distributed. Such a result was not unexpected, because numerous cellular factors participate in virus replication, each of them having a small effect on the viral yield. However, in their experimental system, these cellular factors could not be distinguished from mutational effects. In contrast, our results unravel the effect of explicit mutations on viral fitness. RNA viruses have a very compact genome such that a given genomic region may be involved in multiple functions, not only as mere carriers of genetic information but as regulatory elements or even ribozymes (21, 51). Consequently, a single-nucleotide change may have strong pleiotropic effects.

For the set of preobserved mutations, we found that deleterious effects were better described by a β pdf, although a γ also gave a very satisfactory fit. Similar distributions, with an exponential-like shape, have been reported previously for different kinds of DNA organisms and RNA viruses (9, 10, 15, 40, 46–48). Similarly, the variation of codon substitution rates across viral genomes has been modeled by using β and γ distributions (52, 53). This exponential-like shape, with most of the mutations having very small effects but a few having very large deleterious effects, is explained easily under the action of natural selection simply because mutations with small effects are more influenced by genetic drift and less efficiently eliminated from the population (54). When a uniform pdf was added to two-parameter pdfs, models fitted substantially better to the empirical deleterious fitness effects (Table 2). A compound model in which a proportion p of the mutants is drawn from a uniform distribution and a proportion 1 - p from a γ distribution was the best descriptor for the deleterious fitness effects associated with Tn10 transposition mutations in E. coli (10) and with mutations accumulated by the action of Muller's ratchet in VSV (15).

Studies characterizing the statistical properties of beneficial effects are scarcer than those dealing with deleterious mutations, probably because of the difficulty of isolating beneficial mutations in sufficient numbers to make reliable statistical inferences. Thus far, only two studies using E. coli populations have directly tackled this issue. Imhof and Schlötterer (37) reported an exponential distribution for the beneficial mutations that survived drift and reached a detectable frequency in the population. Rozen et al. (38) found an exponential-like distribution among beneficial mutations fixed. However, neither of these studies provides information about the actual distribution of all possible beneficial effects. Using extreme value theory, Orr (39) showed that the distribution of beneficial effects has to be exponential independently of the fitness of the wild-type allele. Despite the limited number of mutations with positive effects, our results support the notion that the distribution of beneficial effects is skewed toward low effects, with a long tail of very large beneficial effects. However, the fit of the exponential distribution might be improved on by more general two-parameter models such as the γ distribution, suggesting that, in analogy to deleterious mutations, the distribution of positive effects may not be so simple.

Acknowledgments

We thank G. T. W. Wertz for kindly providing the VSV full-length infectious cDNA as well as the three support plasmids. We are indebted to A. V. Bordería, C. López-Galíndez, E. Martínez-Salas, and I. S. Novella for invaluable technical advice. This study was supported by Spanish Ministerio de Ciencia y Tecnología Grant BMC2001-3096 (to A.M.) and Generalitat Valenciana Grant GV01-65 (to S.F.E.). R.S. enjoyed a predoctoral fellowship from the Ministerio de Educación, Cultura y Deporte.

This paper was submitted directly (Track II) to the PNAS office.

Abbreviations: VSV, vesicular stomatitis virus; pdf, probability density function; hpi, hours postinfection; PFU, plaque-forming units; AIC, Akaike's information criterion.


