PLOS Computational Biology. 2011 Aug 18;7(8):e1002134. doi: 10.1371/journal.pcbi.1002134

Evolutionary Accessibility of Mutational Pathways

Jasper Franke 1, Alexander Klözer 1, J Arjan G M de Visser 2, Joachim Krug 1,*
Editor: Claus O Wilke 3
PMCID: PMC3158036  PMID: 21876664

Abstract

Functional effects of different mutations are known to combine to the total effect in highly nontrivial ways. For the trait under evolutionary selection (‘fitness’), measured values over all possible combinations of a set of mutations yield a fitness landscape that determines which mutational states can be reached from a given initial genotype. Understanding the accessibility properties of fitness landscapes is conceptually important in answering questions about the predictability and repeatability of evolutionary adaptation. Here we theoretically investigate accessibility of the globally optimal state on a wide variety of model landscapes, including landscapes with tunable ruggedness as well as neutral ‘holey’ landscapes. We define a mutational pathway to be accessible if it contains the minimal number of mutations required to reach the target genotype, and if fitness increases in each mutational step. Under this definition accessibility is high, in the sense that at least one accessible pathway exists with a substantial probability that approaches unity as the dimensionality of the fitness landscape (set by the number of mutational loci) becomes large. At the same time the number of alternative accessible pathways grows without bounds. We test the model predictions against an empirical 8-locus fitness landscape obtained for the filamentous fungus Aspergillus niger. By analyzing subgraphs of the full landscape containing different subsets of mutations, we are able to probe the mutational distance scale in the empirical data. The predicted effect of high accessibility is supported by the empirical data and is very robust, which we argue reflects the generic topology of sequence spaces. 
Together with the restrictive assumptions that lie in our definition of accessibility, this implies that the globally optimal configuration should be accessible to genome wide evolution, but the repeatability of evolutionary trajectories is limited owing to the presence of a large number of alternative mutational pathways.

Author Summary

Fitness landscapes describe the fitness of related genotypes in a given environment, and can be used to identify which mutational steps lead towards higher fitness under particular evolutionary scenarios. The structure of a fitness landscape results from the way mutations interact in determining fitness, and can be smooth when mutations have multiplicative effect or rugged when interactions are strong and of opposite sign. Little is known about the structure of real fitness landscapes. Here, we study the evolutionary accessibility of fitness landscapes by using various landscape models with tunable ruggedness, and compare the results with an empirical fitness landscape involving eight marker mutations in the fungus Aspergillus niger. We ask how many mutational pathways from a low-fitness to the globally optimal genotype are accessible by natural selection in the sense that each step increases fitness. We find that for all landscapes with lower than maximal ruggedness the number of accessible pathways increases with increases of the number of loci involved, despite decreases in the accessibility for each pathway individually. We also find that models with intermediate ruggedness describe the A. niger data best.

Introduction

Mutations are the main sources of evolutionary novelty, and as such constitute a key driving force in evolution. They act on the genetic constitution of an organism at very different levels, from single nucleotide substitutions to large-scale chromosomal modifications. Selection, a second major evolutionary force, favors organisms best adapted to their respective surroundings. Selection acts on the fitness of the organism. How fitness is connected to specific traits such as reproduction or survival depends strongly on the environmental conditions, but indirectly it can be viewed as a function of the organism's genotype.

If one considers mutations at more than one locus, it is not at all clear how they combine in their final effect on fitness. Two mutations that individually have no significant effect on a trait under selection can in combination be highly advantageous or deleterious. Well known examples of such epistatic interactions [1] include resistance evolution in pathogens [2]–[4] and metabolic changes in yeast [5]. In general, the presence of epistatic interactions makes the fitness landscape more rugged, particularly when epistasis affects the sign of the fitness effects of mutations [6]–[8]. Fitness landscapes are most easily dealt with in the context of asexual haploid organisms, and we will restrict our considerations here to this case.

In a remarkable recent development, several experimental studies have probed the effect of epistatic interactions on fitness landscapes [3], [4], [7], [9]–[16]. Most of these studies are based on two genotypes, one that is well adapted to the given environment, and another that differs by a known, small set of mutations; the largest landscapes studied so far involve five mutations [3], [10], [16]. All (or some fraction of the) intermediate genotypes are then constructed and their fitness measured. However, selection in natural populations does not act on small, carefully selected sets of mutations, but rather on all possible beneficial mutations that occur anywhere in the genome, making the number of possible mutations many orders of magnitude greater than those considered in empirical studies.

Figure 1 shows three sample landscapes obtained from an empirical 8-locus data set of fitness values for the fungus Aspergillus niger originally obtained in [17] (see Materials and Methods for details on the data set and its representations). These landscapes display a wide variation in topography, and despite their moderate size of $2^4 = 16$ genotypes, the combinatorial proliferation of possible mutational pathways makes it difficult to infer the adaptive fate of a population without explicit simulation [10]. In fact, in view of the broad range of possible landscape topographies, even a thorough understanding of evolution on one of these landscapes would be of limited use when confronted with another subset of mutations or even fitness landscapes from a different organism. Instead, one would like to understand and quantify the typical features of ensembles of fitness landscapes, where an ensemble can be formed e.g. by selecting different subsets of mutations from an empirical data set, or by generating different realizations of a random landscape model.

Figure 1. Graphical representation of three fitness landscapes of size $L = 4$ extracted from the empirical 8-locus fitness data set for A. niger.

The presence/absence of a given mutation is indicated by 1/0. Arrows point towards higher fitness, local maxima are enlarged and underlined, and colors mark basins of attraction of maxima under a greedy (steepest ascent) adaptive walk. (A) All combinations of mutations argH12, pyrA5, leuA1, oliC2. This landscape has a single fitness maximum (the wildtype), but only 9 out of 4! = 24 paths from {1111} to {0000} are accessible. (B) Mutations argH12, pyrA5, leuA1, pheA1. This landscape has three maxima and no accessible path. (C) Mutations fwnA1, leuA1, oliC2, crnB12. The landscape has four maxima and 2 accessible paths.

Although genome-wide surveys of pairwise epistatic interactions have recently become feasible [18], exploring an entire fitness landscape on a genome-wide scale remains an elusive goal. In this situation theoretical considerations are indispensable to assess the influence of epistasis on the outcome of evolutionary adaptation. Here, we aim to perform part of this task by answering the following question: Does epistasis make the global fitness optimum selectively inaccessible?

This question has a long history in evolutionary theory, and two contradictory intuitions can be discerned in the still ongoing debate [1]. One viewpoint generally attributed to Fisher [19] emphasizes the proliferation of mutational pathways in high dimensional genotype spaces to argue that, because of the sheer number of possible paths, accessibility will remain high. The second line of argument originally formulated by Wright [20], and more recently promoted by Kauffman [21] and others, focuses instead on the proliferation of local fitness maxima, which present obstacles to adaptation and reduce accessibility with increasing genotypic dimensionality. Here we show that both views are valid at a qualitative level, but that Fisher's scenario prevails on the basis of a specific, quantitative definition of accessibility, since the number of accessible pathways grows much faster with landscape dimensionality than the inaccessibility per pathway as long as the fitness landscape is not completely uncorrelated. Moreover, our analysis of accessibility in the empirical A. niger data set illustrated in Figure 1 shows how evolutionary accessibility can be used to quantify the degree of sign epistasis in a given fitness landscape.

Mathematical framework

The dynamics of adaptation of a haploid asexual population on a given fitness landscape is governed by population size, selection strength and mutation rate, and different regimes for these parameters have been identified [22]–[24]. Here we assume a ‘strong-selection/weak-mutation’ (SSWM) regime [25], [26], which implies that mutations are selected one by one and prohibits the population from crossing fitness valleys. In natural populations of sufficient size, a number of double mutants is present at all times, and the crossing of fitness valleys can be relatively facile [27], [28]; the SSWM assumption may therefore seem overly restrictive. However, we will see that even under these conditions, the landscapes considered are typically very accessible.

In the remainder of the paper, the genetic configuration of the organism will be represented as a binary sequence $\sigma = (\sigma_1, \ldots, \sigma_L)$ of total length $L$, where $\sigma_i = 1$ ($\sigma_i = 0$) stands for the presence (absence) of a given mutation in the landscape of interest. The SSWM assumption together with the fact that we only consider binary sequences gives the configuration space the topological structure of a hypercube of dimension $L$. Accessibility can then be quantified by studying the accessible mutational paths [2], [3], [29]. A mutational path is a collection of point mutations connecting an initial state $\sigma$ with a final state $\sigma'$. If these two states differ at $d$ sites, there are $d!$ shortest paths connecting them, corresponding to the different orders in which the mutations can be introduced into the population [30]. The assumed weak mutation rate implies that paths longer than the shortest possible path have a much lower probability of occurrence and hence are not considered here, adding to the constraints already imposed on accessibility. A mutational path is considered selectively accessible (or accessible for short) if the fitness values encountered along it are monotonically increasing; thus along such a path, the population never encounters a decline in fitness. If a path has to cross a fitness valley, it is inaccessible. Neutral mutations are generally not detected in the empirical fitness data sets of interest here, though they may be present at a finer scale of resolution [31]. In our modeling we therefore assume that the fitness values of neighboring genotypes can always be distinguished (but see the discussion of the holey landscape model below).
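The definition of a selectively accessible path can be made concrete with a short sketch. The function name `count_accessible_paths` and the dictionary layout of the landscape are illustrative assumptions, not part of the original analysis, and the toy fitness values are made up. The function enumerates every ordering of the differing loci (one shortest path per ordering) and keeps those along which fitness strictly increases.

```python
from itertools import permutations


def count_accessible_paths(fitness, start, end):
    """Count shortest paths from start to end along which fitness strictly increases.

    fitness maps genotype tuples of 0/1 to fitness values (hypothetical layout).
    """
    # Loci at which the two genotypes differ; each ordering of them is one shortest path.
    differing = [i for i, (a, b) in enumerate(zip(start, end)) if a != b]
    accessible = 0
    for order in permutations(differing):
        genotype = list(start)
        previous = fitness[tuple(genotype)]
        for locus in order:
            genotype[locus] = end[locus]
            current = fitness[tuple(genotype)]
            if current <= previous:  # a fitness decline (or tie) blocks the path
                break
            previous = current
        else:
            accessible += 1  # the loop finished without meeting a decline
    return accessible


# Toy two-locus landscapes with made-up fitness values.
smooth = {(0, 0): 1.0, (0, 1): 1.2, (1, 0): 1.3, (1, 1): 1.6}
sign_epistasis = {(0, 0): 1.0, (0, 1): 0.8, (1, 0): 1.3, (1, 1): 1.6}
print(count_accessible_paths(smooth, (0, 0), (1, 1)))          # both orderings are accessible
print(count_accessible_paths(sign_epistasis, (0, 0), (1, 1)))  # only the path via (1, 0)
```

Brute-force enumeration is adequate for the small landscapes discussed here; for larger numbers of loci a dynamic-programming count over genotypes would scale better.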

Unlike Ref. [3] we only consider whether a given path is at all accessible or not, independent of the probability of the path actually being found by the population. Our reason for focusing on this restricted notion of accessibility is that it can be formulated solely with reference to the underlying fitness landscape, without the need to specify the adaptive dynamics of the population (see also Discussion). The endpoint of the paths considered here, much like in the experimental studies [3], [4], [10], is the global fitness maximum, and the starting point is the ‘antipodal’ sequence which differs from the optimal sequence at all $L$ loci. Because the starting point lies at the opposite end of the configuration space, these are the longest direct paths. As such, they are a priori the least likely to be accessible and thus give a lower limit on the accessibility of typical paths (note that the mean length of the path from a randomly chosen genotype to the global maximum is $L/2$).

For a fitness landscape comprised of up to $L$ mutations, there are a total of $L!$ paths connecting the antipodal sequence to the global maximum. How many of them are selectively accessible in the sense described above? Given that natural selection is expected to act genome-wide, we are interested in the behavior of accessibility properties when the number of loci $L$ becomes very large. Two questions are of particular interest: What is the probability of finding at least one accessible path, and what number of accessible paths can one expect to find on average? The first question addresses the overall accessibility of the global fitness maximum [32], while the second question is relevant for the repeatability of evolution: If there are many possible mutational pathways connecting the initial genotype to the global maximum, depending on population dynamics different pathways can be chosen in replicate experiments and repeatability will be low. To address these questions in a quantitative way, consider a sample of fitness landscapes, obtained e.g. as random realizations of a landscape model or by choosing subsets of mutations from a large empirical data set (see Figure 1). The fraction of these that have exactly $n$ accessible paths is denoted by $P_L(n)$, and gives an estimate of the probability that a given fitness landscape has $n$ accessible paths (cf. Figure 2). The expected number of paths is given by the mean of this probability distribution,

$$\langle n \rangle = \sum_{n=0}^{L!} n \, P_L(n) \qquad (1)$$

and $1 - P_L(0)$ is the probability to find at least one accessible path. The behavior of these two quantities will be investigated in the following, both for model landscapes and on the basis of empirical data.
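As a minimal sketch of how these two quantities are estimated from an ensemble, the snippet below tabulates the empirical distribution of path counts, its mean, and the fraction of landscapes with at least one accessible path. The helper name `path_statistics` is an assumption made for illustration; the sample consists of the three path counts quoted in the caption of Figure 1 (9, 0 and 2), which is far too small for a meaningful estimate and serves only to show the bookkeeping.

```python
from collections import Counter


def path_statistics(counts):
    """Return (distribution, mean, prob_at_least_one) for a list of accessible-path counts."""
    total = len(counts)
    # Empirical probability of observing exactly n accessible paths.
    distribution = {n: c / total for n, c in sorted(Counter(counts).items())}
    # Mean of the distribution, i.e. the expected number of accessible paths.
    mean = sum(n * p for n, p in distribution.items())
    # Complement of the probability of finding no accessible path.
    prob_at_least_one = 1.0 - distribution.get(0, 0.0)
    return distribution, mean, prob_at_least_one


# Path counts of the three four-locus landscapes in Figure 1 (panels A, B, C).
distribution, mean, prob_any = path_statistics([9, 0, 2])
print(distribution, mean, prob_any)
```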

Figure 2. Accessibility of mutational pathways in the House-of-Cards model.

Main figure shows the distribution of the number of accessible paths for three different sequence lengths in the HoC model on a semi-logarithmic scale. The value of $P_L(0)$ is an outlier, indicating that a large fraction of landscapes have no accessible paths at all. This is a typical feature of rugged fitness landscapes of moderate dimensionality $L$, see Figures S4 and S5. Inset shows $P_L(0)$ as a function of $L$ for the HoC model. The top curve makes no assumptions about the antipodal sequence, while the bottom curve assumes it to be the global fitness minimum. Note the decline in the bottom curve.

Results

House of Cards (HoC) model

Consider a model where fitness values are uncorrelated and a single mutation may change fitness completely [21], [24], [29]; following Kingman [33] we refer to this as the ‘House of Cards’ model. In real organisms one expects fitnesses of closely related genotypes to be at least somewhat correlated, and in this sense the HoC model serves as a null model. The expected number of accessible paths can be computed exactly by a simple order statistics argument [34]. Each of the $L!$ shortest paths contains $L+1$ genotypes. Out of the $L+1$ fitness values encountered along a path, all but the last one (which is known to be the global maximum) are arranged in any order with equal probability. One of the $L!$ possible orderings is monotonic in fitness, hence for the HoC model

$$\langle n \rangle = L! \cdot \frac{1}{L!} = 1 \qquad (2)$$

for all $L$. The probability $P_L(0)$ of not finding any path is more difficult to compute and has so far only been analyzed by numerical simulations. We find that for sequence lengths up to Inline graphic, $P_L(0)$ appears to approach unity, see inset of Figure 2 and Figure S1. Whether this is asymptotically true remains to be established, but the scaling plot in the inset of Figure 3 suggests that $P_L(0)$ is indeed monotonically increasing for all finite $L$.
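The order statistics argument can be checked exactly for very small landscapes. The sketch below is not the simulation procedure used for the figures: it enumerates every rank ordering of the non-optimal genotypes, with the global maximum pinned to the endpoint, and tallies the number of accessible paths for each ordering. Genotypes are encoded as bitmasks, and the function names are assumptions made for illustration.

```python
from collections import Counter
from itertools import permutations


def accessible_paths(fitness, L):
    """Count fitness-increasing shortest paths from genotype 0 to genotype 2**L - 1.

    Genotypes are bitmasks; fitness is a sequence indexed by bitmask.
    """
    paths = [0] * (1 << L)
    paths[0] = 1
    for mask in range(1, 1 << L):
        for i in range(L):
            bit = 1 << i
            # Extend every accessible path that arrives from a lower-fitness predecessor.
            if mask & bit and fitness[mask ^ bit] < fitness[mask]:
                paths[mask] += paths[mask ^ bit]
    return paths[-1]


def exact_hoc_counts(L):
    """Tally accessible-path counts over all rank orderings with the optimum fixed."""
    size = 1 << L
    tally = Counter()
    for ranks in permutations(range(size - 1)):
        fitness = list(ranks) + [size]  # the last genotype holds the global maximum
        tally[accessible_paths(fitness, L)] += 1
    return tally


for L in (2, 3):
    tally = exact_hoc_counts(L)
    total = sum(tally.values())
    mean = sum(n * c for n, c in tally.items()) / total
    print(L, mean, tally[0] / total)  # mean and fraction of orderings with no path
```

For two loci the enumeration covers six orderings, of which two have no accessible path, so the probability of finding none is 1/3 in that case; the mean equals one for both sizes, as the argument above requires. The enumeration grows factorially and is only practical for two or three loci.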

Figure 3. Accessibility in fitness landscape models with tunable ruggedness.

(A) Behavior of $P_L(0)$ in the RMF model as function of the correlation parameter $\theta$. Inset shows normalized rescaled curves, all taking their maximum at Inline graphic. This implies that $P_L(0)$ increases monotonically only for $\theta = 0$. (B) Probability $P_L(0)$ for the $LK$ model as a function of $L$ at fixed $K/L$ (main figure) and fixed $K$ (inset), respectively.

This behavior changes drastically when the antipodal state is required to be the global fitness minimum. This case was considered previously by Carneiro and Hartl [32], who postulated that $P_L(0)$ saturates to an asymptotic value around Inline graphic for large $L$. However, by continuing the simulations to Inline graphic, one sees a clear decline (inset of Figure 2), indicating that accessibility increases with increasing $L$. We will see in the following that this is in fact the generic situation.

Rough Mount Fuji (RMF) model

Next we ask what happens if some fitness correlations are introduced. The Rough Mount Fuji (RMF) model [35] accomplishes just that: Denoting the number of mutations separating a given genotype $\sigma$ from the global optimum by $d(\sigma)$, the RMF model assigns fitness values according to

$$F(\sigma) = -\theta \, d(\sigma) + \eta(\sigma) \qquad (3)$$

where $\theta$ is a constant and the $\eta(\sigma)$ are independent normal random variables with zero mean and unit variance. When $\theta = 0$ the RMF reduces to the HoC case, and thus it can serve as starting point for approximate calculations to first order in $\theta$. For the expected number of accessible paths one obtains [34]

graphic file with name pcbi.1002134.e062.jpg (4)

where Inline graphic and terms of higher order have been neglected (see also Eq. (7)). In this limit $\langle n \rangle$ grows like Inline graphic for large $L$ and constant $\theta$. Compared to the HoC case Inline graphic, this shows that the large-$L$ behavior of a landscape with even the slightest correlation between fitness values is substantially different from the case without correlations.

The probability of finding no accessible paths was again obtained by numerical simulation, and is shown in Figure 3(A). In striking contrast to the unconstrained HoC model, the probability $1 - P_L(0)$ of finding at least one accessible path is seen to increase for large $L$. Motivated by the result (4), in the inset of Figure 3(A) the simulation results are plotted as a function of Inline graphic, which leads to an approximate collapse of the different data sets. On the basis of these results we conjecture that, for any $\theta > 0$, the probability $P_L(0)$ decreases for large $L$, and most likely approaches zero asymptotically for $L \to \infty$.
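A Monte Carlo estimate of the kind described here can be sketched as follows. The sketch assumes the RMF fitness is a linear decrease with the distance to a reference genotype, with a slope called `theta` below, plus independent standard normal noise; the names `rmf_landscape` and `estimate`, the sample sizes and the parameter values are illustrative choices. Paths run from the antipode to the reference genotype, which for small slopes need not be the fittest genotype in a given realization; how the original simulations handle that case is not specified in this section.

```python
import random


def rmf_landscape(L, theta, rng):
    """Fitness indexed by bitmask; bitmask 2**L - 1 is the reference optimum.

    Assumed form: -theta * (distance to reference) + standard normal noise.
    """
    fitness = []
    for mask in range(1 << L):
        distance = L - bin(mask).count("1")  # loci still differing from the reference
        fitness.append(-theta * distance + rng.gauss(0.0, 1.0))
    return fitness


def accessible_paths(fitness, L):
    """Count fitness-increasing shortest paths from bitmask 0 to bitmask 2**L - 1."""
    paths = [0] * (1 << L)
    paths[0] = 1
    for mask in range(1, 1 << L):
        for i in range(L):
            bit = 1 << i
            if mask & bit and fitness[mask ^ bit] < fitness[mask]:
                paths[mask] += paths[mask ^ bit]
    return paths[-1]


def estimate(L, theta, samples, seed=0):
    """Return (mean number of accessible paths, fraction of landscapes with none)."""
    rng = random.Random(seed)
    counts = [accessible_paths(rmf_landscape(L, theta, rng), L) for _ in range(samples)]
    return sum(counts) / samples, counts.count(0) / samples


for theta in (0.0, 0.25, 1.0):
    print(theta, estimate(6, theta, samples=2000))
```

Scanning the number of loci at fixed slope with this function gives curves of the type shown in Figure 3(A), up to the caveat about the endpoint noted above.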

LK model

Better known as the NK-model [21], [36], this classical model explicitly takes into account epistatic interactions among different loci. Each of the $L$ sites in the genome is assigned a certain number Inline graphic of other sites with which it interacts, and for each of the possible Inline graphic states of this set of interacting loci the site under consideration contributes to the fitness by a random amount. Thus the parameter $K$ defines the size of the epistatically interacting parts of the sequence and provides a measure for the amount of epistasis. Like the RMF model, the $LK$ model reduces in one limit to the HoC case, which is realized for Inline graphic.

Due to the construction of the model, even local properties such as the number of local fitness optima [37], [38] are generally very difficult to compute. Figure 3(B) shows the variation of $P_L(0)$ with $L$ obtained from numerical simulations of the $LK$ model. In this figure two different relations between $K$ (the number of interacting loci) and $L$ (the total number of loci) were employed. In the main plot the fraction of interacting loci $K/L$ was kept constant. Under this scenario, the curves show a non-monotonic behavior of $P_L(0)$ similar to that of the RMF model at constant epistasis parameter $\theta$. In the inset, the number of interacting loci $K$ is kept fixed, which results in a monotonic decrease of $P_L(0)$. A third possibility is to fix the difference $L - K$ (the number of non-interacting loci), see Figure S2. In this case one can argue that for Inline graphic, the difference in behavior between Inline graphic and Inline graphic, say, should not be substantial, and indeed the curves for Inline graphic seem to be monotonically increasing with $L$, showing qualitatively the same behavior as the curve for Inline graphic, which is equivalent to the HoC model. Finally, in Figure S3 we show the expected number of accessible paths for different values of $K$ and $L$. The data are seen to interpolate smoothly between the known limits Inline graphic for Inline graphic and Inline graphic for Inline graphic.
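The construction of such a landscape can be sketched in a few lines. The sketch follows the common Kauffman convention in which each locus interacts with itself and `K` randomly chosen other loci, so that `K = 0` is additive and `K = L - 1` is fully uncorrelated; the indexing of the interaction parameter used in this paper may differ by one, and the uniform contributions, the random choice of partners and the function names are further assumptions.

```python
import random
from math import factorial


def lk_landscape(L, K, rng):
    """Fitness indexed by bitmask for L loci, each interacting with K other loci."""
    neighbours = [[i] + rng.sample([j for j in range(L) if j != i], K) for i in range(L)]
    # One random contribution per locus and per state of its interaction set.
    tables = [[rng.random() for _ in range(2 ** (K + 1))] for _ in range(L)]
    fitness = []
    for genotype in range(1 << L):
        total = 0.0
        for i in range(L):
            index = 0
            for j in neighbours[i]:
                index = (index << 1) | ((genotype >> j) & 1)
            total += tables[i][index]
        fitness.append(total / L)
    return fitness


def paths_to_optimum(fitness, L):
    """Accessible shortest paths from the antipode of the global optimum to the optimum."""
    optimum = max(range(1 << L), key=fitness.__getitem__)
    start = optimum ^ ((1 << L) - 1)
    paths = [0] * (1 << L)  # indexed by the set of loci already mutated
    paths[0] = 1
    for mask in range(1, 1 << L):
        here = fitness[start ^ mask]
        for i in range(L):
            bit = 1 << i
            if mask & bit and fitness[start ^ mask ^ bit] < here:
                paths[mask] += paths[mask ^ bit]
    return paths[-1]


rng = random.Random(1)
for K in (0, 2, 4):
    counts = [paths_to_optimum(lk_landscape(5, K, rng), 5) for _ in range(500)]
    print(K, sum(counts) / len(counts), counts.count(0) / len(counts))
```

With no interactions every locus has an independently preferred state, so all shortest paths to the optimum are accessible, which gives a simple sanity check of the implementation.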

Holey landscapes

The neutral theory of evolution [39] implies a very simple, flat fitness landscape without maxima or minima. When strongly deleterious mutations are included, the resulting fitness landscape has plateaus of viable states and stretches of lethal states [40]. Such ‘holey’ landscapes can be mapped [41] to the problem of percolation, a paradigm of statistical physics [42]. In percolation, each configuration is either viable (fitness $1$) with probability $p$ or lethal (fitness $0$) with probability $1 - p$, independent of the others. Our definition of accessibility must be adapted in this case, as there is no notion of increasing fitness and no global fitness optimum. However, one can still ask the question whether it is possible to get from one end of configuration space to the other on a shortest path of length $L$ without encountering a ‘hole’, i.e. a non-viable state. Apart from the restriction to shortest paths, the probability $1 - P_L(0)$ of finding at least one connecting path then corresponds to the percolation probability.

The percolation problem on the hypercube differs from the standard case of percolation on finite-dimensional lattices [42] in that the parameter $L$ represents both the dimensionality and the diameter of the configuration space. Percolation properties are therefore described by statements that hold asymptotically for large $L$ under some suitable scaling of the viability probability $p$ [43], [44]. Specifically, when Inline graphic for some constant Inline graphic, it is known that for $L \to \infty$ a giant connected set of viable genotypes emerges for Inline graphic. Conversely, taking $L \to \infty$ at fixed $p$ one expects that two antipodal genotypes are connected by a path with a probability approaching unity. Indeed, the simulation results shown in Figure S4 support the conjecture that the quantity corresponding to $P_L(0)$ vanishes for large $L$ and any $p > 0$. The equivalent of computing $\langle n \rangle$ is straightforward: The probability that Inline graphic consecutive states are viable factorizes by independence of the fitness values to the product of the individual probabilities of viability, to simply yield Inline graphic, which, as $p < 1$, decays exponentially. We already know that there are $L!$ possible paths in the sequence space, thus we find

graphic file with name pcbi.1002134.e129.jpg (5)

Since $L!$ grows faster than Inline graphic declines, $\langle n \rangle$ grows without bounds for large $L$.
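The competition between the factorial number of paths and the exponentially small probability per path can be illustrated with a small simulation. The sketch assumes that both endpoints are viable by construction, so that each shortest path has L - 1 intermediate genotypes and the expected number of open paths is L! p^(L-1) under this assumption; the convention behind Eq. (5) may count the endpoint states differently. Function names and parameter values are illustrative.

```python
import random
from math import factorial


def viable_paths(viable, L):
    """Count shortest paths from bitmask 0 to 2**L - 1 that visit only viable genotypes."""
    paths = [0] * (1 << L)
    paths[0] = 1 if viable[0] else 0
    for mask in range(1, 1 << L):
        if not viable[mask]:
            continue  # a hole: no path may pass through this genotype
        for i in range(L):
            bit = 1 << i
            if mask & bit:
                paths[mask] += paths[mask ^ bit]
    return paths[-1]


def holey_landscape(L, p, rng):
    """Each genotype is viable with probability p; both endpoints are forced viable."""
    viable = [rng.random() < p for _ in range(1 << L)]
    viable[0] = viable[-1] = True  # assumption made for this sketch
    return viable


def mean_viable_paths(L, p, samples, seed=0):
    rng = random.Random(seed)
    return sum(viable_paths(holey_landscape(L, p, rng), L) for _ in range(samples)) / samples


for L in (3, 4, 5, 6):
    simulated = mean_viable_paths(L, 0.5, samples=4000)
    expected = factorial(L) * 0.5 ** (L - 1)  # holds under the endpoint assumption above
    print(L, simulated, expected)
```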

Comparison to empirical data

Next we compare the predictions of the models described so far to the results of the analysis of a large empirical data set obtained from fitness measurements for the asexual filamentous fungus A. niger. As described in more detail in Materials and Methods, we analyzed the accessibility properties of ensembles of subgraphs containing subsets of Inline graphic out of a total of 8 mutations which are individually deleterious but display significant epistatic interactions [17]. The full data set contains fitness values for 186 out of the $2^8 = 256$ possible strains, and statistical analysis shows that the 70 missing combinations can be treated as non-viable genotypes with zero fitness. The distribution of the non-viable genotypes in the subgraph ensemble is well described by a simple two-parameter model which reveals that the lysine deficiency mutation lysD25 is about 25 times more likely to cause lethality than the other seven mutations (see Materials and Methods).
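The bookkeeping behind a subgraph ensemble can be sketched as follows. The sketch assumes that a subgraph consists of all combinations of the chosen mutations with the remaining loci held at 0 (mutation absent), in line with the caption of Figure 1; the function names and the single non-viable genotype used in the example are hypothetical and do not come from the A. niger data.

```python
from itertools import combinations, product
from math import comb

TOTAL_LOCI = 8


def subgraph_genotypes(loci, total_loci=TOTAL_LOCI):
    """All genotypes carrying mutations only at the chosen loci (others kept at 0)."""
    genotypes = []
    for states in product((0, 1), repeat=len(loci)):
        genotype = [0] * total_loci
        for locus, state in zip(loci, states):
            genotype[locus] = state
        genotypes.append(tuple(genotype))
    return genotypes


def viable_subgraphs(size, nonviable, total_loci=TOTAL_LOCI):
    """Subsets of loci whose subgraph contains no non-viable genotype."""
    return [loci for loci in combinations(range(total_loci), size)
            if not any(g in nonviable for g in subgraph_genotypes(loci, total_loci))]


# Hypothetical example: the double mutant at loci 0 and 1 is the only non-viable genotype.
toy_nonviable = {(1, 1, 0, 0, 0, 0, 0, 0)}
for size in range(2, 7):
    print(size, comb(TOTAL_LOCI, size), len(viable_subgraphs(size, toy_nonviable)))
```

The second printed column reproduces the total numbers of subgraphs per size listed in Table 1 (28, 56, 70, 56, 28); the third column depends entirely on the made-up non-viable set.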

Results of the subgraph analysis are displayed in Table 1 and in Figure 4. The data in Figure 4(A) show a systematic increase of the average number of accessible paths with the mutational distance Inline graphic in the empirical data, which rules out the null hypothesis of uncorrelated fitness values and is quantitatively consistent with the RMF model with Inline graphic (inset). The data for even subgraph sizes Inline graphic are equally well described by the $LK$ model with Inline graphic and Inline graphic (main figure). Alternatively, the empirical data can be compared to the results of a subgraph analysis of a Inline graphic fitness landscape with fixed Inline graphic and Inline graphic (Figure S5). While the fit between model and data is less satisfactory than that shown in Figure 4(A), the comparison is consistent with a value of $K$ between 4 and 5, which again indicates that each locus interacts with roughly half of the other loci.

Table 1. Subgraphs of the A. niger data set.

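Subgraph size | # SG | # VSG (model) | Accessible paths limited only by non-viability (model) | Mean accessible paths | Probability of no accessible path
2 | 28 | 20 (19.5) | 1.61 (1.72) | 0.82 | 0.36
3 | 56 | 29 (28.1) | 4.05 (4.22) | 1.34 | 0.39
4 | 70 | 19 (19.5) | 12.53 (13.19) | 2.01 | 0.50
5 | 56 | 4 (4.9) | 55.32 (48.81) | 3.16 | 0.63
6 | 28 | 0 (0.2) | 246.0 (201.16) | 6.07 | 0.68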

The table summarizes properties of subgraphs of sizes 2 to 6 of the empirical A. niger fitness landscape. Second column shows the total number of subgraphs Inline graphic and third column the number of viable subgraphs not containing any non-viable genotypes, with the model prediction (10) given in brackets. Fourth column contains the number of accessible paths that would be present if accessibility were reduced only because of the presence of non-viable genotypes, with the model prediction (11) shown in brackets. Finally, the last two columns show the mean number of accessible paths Inline graphic and the probability of no accessible path Inline graphic, respectively, computed from the full subgraph ensemble.

Figure 4. Comparison of models to empirical data.

(A) Mean number of accessible paths for HoC, RMF and $LK$ models compared to the empirical A. niger data. With the exception of the HoC model, all curves show an increase of $\langle n \rangle$ with Inline graphic. Both RMF (inset) and $LK$ (main plot) models can be fit to the empirical data. Error bars on the empirical data represent standard deviations obtained from the resampling analysis. (B) Cumulative probability of the number of accessible paths as observed in the empirical fitness landscape compared to the $LK$ (main plot) and RMF (inset) models. Error bars represent the standard deviation estimated by the resampling method.

Further analysis of statistical properties of the A. niger landscape confirms this conclusion. As an example, in Figure 4(B) we display the cumulative distribution of the number of accessible paths

graphic file with name pcbi.1002134.e159.jpg (6)

obtained from the analysis of the largest subgraph ensemble with Inline graphic. The main figure shows that good quantitative agreement is achieved with the Inline graphic Inline graphic model. The inset displays a similar comparison to the RMF-model, which leads to the estimate Inline graphic for the roughness parameter, in close agreement with the estimate obtained from Inline graphic.

For the Inline graphic subgraph ensemble, the probability Inline graphic of finding no accessible path is approximately 0.5. Corresponding estimates Inline graphic for other values of Inline graphic can be found in the last column of Table 1. Up to Inline graphic, the probability is found to increase with Inline graphic, which implies that the ultimate increase of accessibility (decrease in Inline graphic) predicted by the models cannot yet be seen on the scale of the empirical data. This is consistent with the estimates of the epistasis parameters Inline graphic and Inline graphic mentioned above, for which the maximum in Inline graphic is reached at or beyond six loci (compare to Figure 3).

Discussion

Evolutionary accessibility

The models considered here represent a wide variety of intuitions about fitness landscapes, from the null hypothesis of uncorrelated fitness values through explicitly epistatic models to the holey fitness landscapes derived from neutral theory, thus covering all classes of fitness landscapes that are expected to be relevant for real organisms. With the exception of the extreme case of uncorrelated fitness values, which is ruled out by comparison to the empirical data, all models show that fitness landscapes become highly accessible in the biologically relevant limit of large $L$: The probability of finding at least one accessible path is an increasing function of $L$ which we conjecture to reach unity for $L \to \infty$, and the expected number of paths grows with $L$ without bounds. The latter feature limits the repeatability of evolutionary trajectories.

In view of the robustness of these properties, we believe that their origin lies in the topological structure of the configuration space: The probability of accessibility of a given path (and thus the relative fraction of accessible paths) decreases exponentially with $L$, but this is overwhelmed by the combinatorial proliferation of possible paths ($L!$), see Eq. (5) for the neutral model and Eq. (8) for the RMF model. As we have imposed severe constraints on the adaptive process by prohibiting the crossing of fitness valleys by double mutations and by only considering shortest paths, our estimate of accessibility is rather conservative. We therefore expect that naturally occurring, genome-wide fitness landscapes should show a very high degree of accessibility as well.

A second general conclusion of our study is that pathway accessibility in epistatic fitness landscapes is subject to large fluctuations, as evidenced by the typical form of the probability distribution $P_L(n)$ in Figure 2 and Figures S6, S7. For landscape dimensionalities $L$ in the range relevant for the available empirical studies, a substantial fraction of landscapes, given by $P_L(0)$, does not possess a single accessible pathway. On the other hand, for all models except the HoC model, the average number of accessible pathways exceeds unity and increases rapidly with increasing $L$. This implies that in those landscapes in which the maximum is accessible at all, it is typically accessible through a large number of pathways. For example, among the 70 subgraphs of size 4 of the A. niger landscape, half do not contain a single accessible path, but the average number of paths among the graphs with at least one accessible path is 4, and two subgraphs display as many as 10 accessible paths.
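The figure of about 4 paths quoted for the size-4 subgraphs follows from the last two columns of Table 1: landscapes without any accessible path contribute zero to the mean, so the mean over all landscapes equals the fraction of landscapes with at least one path times the mean among those landscapes. A minimal sketch of this arithmetic, with the Table 1 values typed in by hand and a helper name chosen for illustration:

```python
# (mean accessible paths, probability of no accessible path) per subgraph size, from Table 1
table1 = {2: (0.82, 0.36), 3: (1.34, 0.39), 4: (2.01, 0.50), 5: (3.16, 0.63), 6: (6.07, 0.68)}


def conditional_mean(mean, p_none):
    """Mean number of paths among landscapes that have at least one accessible path."""
    return mean / (1.0 - p_none)


for size, (mean, p_none) in table1.items():
    print(size, round(conditional_mean(mean, p_none), 2))
```

For subgraph size 4 this gives 2.01 / 0.50 = 4.02, consistent with the value quoted in the text.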

This observation becomes relevant when applying similar analyses to empirical fitness landscapes based on mutations that are collectively beneficial, such as the examples described in [4], [15], [16]. In these cases the adapted multiple mutant could not have been formed easily by natural selection (alone) unless at least one selectively accessible pathway from the wildtype to the mutant existed. The statistics of such landscapes is therefore biased towards larger accessibility, and a comparison with random models should then be based on the probability distribution Inline graphic conditioned on Inline graphic. The general question as to whether landscapes formed by combinations of beneficial or deleterious mutations have similar topographical properties can only be answered by further empirical studies.

The A. niger landscape

The analysis of accessible mutational pathways in the empirical A. niger data set has allowed us to quantify the amount of sign epistasis in this landscape in terms of model parameters like the roughness scale Inline graphic in the RMF model or the number of interacting loci Inline graphic in the Inline graphic model. Similar to a recent experimental study of viral adaptation [45], we ruled out the null model of a completely uncorrelated fitness landscape. Nevertheless our results suggest that the epistatic interactions in this system are remarkably strong. To put our estimate of Inline graphic into perspective, we carried out a subgraph analysis of the TEM β-lactamase antibiotic resistance landscape obtained in [3] (Figure S8). In this case the number of loci is Inline graphic, and the comparison of the mean number of accessible paths in subgraphs of sizes Inline graphic with simulation results for the Inline graphic model suggests that Inline graphic, significantly smaller than the estimate Inline graphic obtained for the A. niger landscape. A low value of Inline graphic was also found in the analysis of a DNA-protein affinity landscape for the set of all possible 10-base oligomers [46].

Our finding of a high level of intergenic sign epistasis, compared to the examples of intragenic epistasis considered in [3] and [46], contradicts the general expectation that epistatic interactions should be stronger within genes than between genes [15], [16], [47]. Note, however, that the comparisons among the available epistasis data are confounded by differences in the combined fitness of the mutations involved: while the A. niger mutations were chosen without a priori knowledge of their (combined) fitness effects, the mutations considered in most studies were known to be collectively beneficial [3], [4], [9], [13], [15], [16], and hence biased against negative epistatic combinations.

Population dynamics

In the present paper we have focused on the existence of accessible mutational pathways, without explicitly addressing the probability that a given pathway will actually be found under a specific evolutionary scenario. This probability is expected to depend on population parameters, primarily on the mutation supply rate Inline graphic, in a complex way. In the SSWM regime characterized by Inline graphic it is straightforward in principle to assign probabilistic weights to mutational pathways in terms of the known transition probabilities of the individual steps [3], [26]. For larger populations additional effects come into play, whose bearing on accessibility and predictability is difficult to assess.
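As an illustration of how such probabilistic weights could be assigned, the sketch below multiplies single-step transition probabilities along a path. It assumes, as a common small-effect approximation in the SSWM literature and not necessarily the form used in [3] or [26], that the next mutation to fix is chosen among all beneficial single-mutant neighbors with probability proportional to its selection coefficient. Genotypes are encoded as bitmasks, and the names `step_probabilities` and `path_weight` are hypothetical.

```python
def step_probabilities(fitness, g, n_loci):
    """Transition probabilities from genotype g to its fitter one-mutant neighbors.

    Assumption: fixation probability is proportional to the selection
    coefficient s = f(neighbor)/f(g) - 1, so P(g -> j) = s_j / sum_k s_k.
    `fitness` maps bitmask genotypes to strictly positive fitness values.
    """
    gains = {}
    for i in range(n_loci):
        neighbor = g ^ (1 << i)          # flip locus i
        s = fitness[neighbor] / fitness[g] - 1.0
        if s > 0:
            gains[neighbor] = s
    total = sum(gains.values())
    # If g is a local maximum, gains is empty and an empty dict is returned.
    return {nb: s / total for nb, s in gains.items()}


def path_weight(fitness, path, n_loci):
    """Probability of following `path` (a list of genotypes) step by step."""
    weight = 1.0
    for g, nxt in zip(path, path[1:]):
        weight *= step_probabilities(fitness, g, n_loci).get(nxt, 0.0)
    return weight
```

With this normalization the weights of all adaptive walks leaving a genotype sum to one, so a path that passes through genotypes with many beneficial neighbors receives a smaller weight than one with few alternatives.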

On the one hand, an increase in the mutation supply rate Inline graphic may bias adaptation towards the use of mutations of large beneficial effects, which makes the evolutionary process more deterministic [24] but also more prone to trapping at local fitness maxima [48], thereby reducing the accessibility of the global optimum. On the other hand, the crossing of fitness valleys becomes more likely due to the fixation of multiple mutations at once [28], which tests mutants for their short-term evolvability [49] and enlarges the set of possible mutational pathways. We plan to address the interplay between landscape structure and population parameters in their effect on pathway accessibility in a future publication.

Materials and Methods

Numerical simulations

For the numerical simulations of random landscapes, fitness values were assigned to each of the Inline graphic genotypes according to the ensemble to be sampled from (HoC, RMF or Inline graphic model). The number of paths was then found by a depth-first backtracking algorithm implemented as an iterative subroutine starting at the antipodal genotype and either moving forward, i.e. towards the global fitness maximum, or, if a local maximum is reached, going back to the last genotype encountered before the local maximum. For finding the probability Inline graphic of no accessible paths, the search was ended upon finding the first path, making this search much faster than that for the full distribution of paths and thus enabling us to consider much larger genotype spaces. Results were typically averaged over Inline graphic realizations of the random landscape. In analyzing the empirical A. niger data, the same routines were used but with the measured fitness values as input instead of fitness values sampled from one of the models.
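The path search described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the routine used for the published results: genotypes are encoded as bitmasks, `fitness` is any sequence indexed by genotype, and the function name `count_accessible_paths` and the `stop_at_first` flag are hypothetical. The search uses an explicit stack (iterative depth-first backtracking), only takes steps that reduce the distance to the global maximum, and only follows steps along which fitness strictly increases.

```python
def count_accessible_paths(fitness, n_loci, stop_at_first=False):
    """Count shortest, fitness-monotonic paths from the antipode to the global maximum.

    fitness: sequence of length 2**n_loci indexed by bitmask genotype.
    stop_at_first: if True, return 1 as soon as one accessible path is found
                   (enough to decide whether the landscape is accessible at all).
    """
    full_mask = (1 << n_loci) - 1
    target = max(range(1 << n_loci), key=lambda g: fitness[g])
    start = target ^ full_mask           # antipodal genotype of the maximum
    count = 0
    stack = [start]                      # genotypes still to be expanded
    while stack:
        g = stack.pop()
        if g == target:
            count += 1
            if stop_at_first:
                return 1
            continue
        remaining = g ^ target           # loci that still differ from the target
        for i in range(n_loci):
            bit = 1 << i
            if remaining & bit:
                nxt = g ^ bit
                if fitness[nxt] > fitness[g]:
                    stack.append(nxt)    # accessible step towards the maximum
        # If no step was pushed, g is a dead end and the search backtracks.
    return count
```

On a purely additive landscape every one of the `n_loci`! shortest paths is accessible, which provides a simple sanity check. For an uncorrelated (HoC-type) sample one could pass, for instance, a list of independent uniform random numbers as `fitness`.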

Analytic results for the RMF model

It was argued above that both the expected number of accessible paths Inline graphic and the probability of no accessible path Inline graphic behave fundamentally differently for Inline graphic (HoC model) and the RMF model with strictly positive Inline graphic, even if Inline graphic. Here we provide additional information on the relation (4) and lend support to the statement that typically Inline graphic, the probability of a given path being accessible, decays exponentially in Inline graphic. Since by linearity of the expected value Inline graphic, it is sufficient to consider Inline graphic to compute Inline graphic.

It was shown in [34] that

graphic file with name pcbi.1002134.e217.jpg (7)

for Inline graphic, where Inline graphic is the probability density of the random fitness contribution Inline graphic. From this form it is clear that the HoC case Inline graphic is quite different from the general case Inline graphic. Note that according to (7), Inline graphic still decays factorially as Inline graphic. This changes, however, when higher order terms in Inline graphic are taken into account.

For the special case when the random fitness contributions are drawn from the Gumbel distribution Inline graphic, the probability Inline graphic can be computed explicitly for any Inline graphic [34]. One obtains the expression

graphic file with name pcbi.1002134.e229.jpg (8)

with Inline graphic. For large Inline graphic, the denominator approaches a constant given by

graphic file with name pcbi.1002134.e232.jpg (9)

and thus Inline graphic decays exponentially, Inline graphic. We expect this behavior to be generic for most choices of Inline graphic.

Data set

The fitness values constituting the 8-locus empirical data set are presented in Table S1. Here we briefly describe how these values were obtained. A detailed description of the construction and fitness measurement of the A. niger strains is given elsewhere [10], [17].

Briefly, A. niger is an asexual filamentous fungus with a predominantly haploid life cycle. However, at a low rate haploid nuclei fuse and become diploid; these diploid nuclei are often unstable and generate haploid nuclei by random chromosome segregation. This alternation of ploidy levels resembles the sexual life cycle of haploid organisms and is termed the parasexual cycle, since it does not involve two sexes. We exploited the parasexual cycle of A. niger to isolate haploid segregants from a diploid strain that originated from a heterokaryon between two strains that were isogenic, except for the presence of eight phenotypic marker mutations in one strain, one on each of its eight chromosomes. These mutations include, in increasing chromosomal order, fwnA1 (fawn-colored conidiospores), argH12 (arginine deficiency), pyrA5 (pyrimidine deficiency), leuA1 (leucine deficiency), pheA1 (phenylalanine deficiency), lysD25 (lysine deficiency), oliC2 (oligomycin resistance), and crnB12 (chlorate resistance). The wild-type strain only carried a spore-color marker (olvA1, causing olive-colored conidiospores) on its first chromosome to allow haploid segregants to be distinguished from the diploid mycelium with black-colored conidiospores. Because these mutations were individually induced with a low dose of UV and combined using the parasexual cycle, it was unlikely that the two strains differed at loci other than those of the eight markers.

From the $2^8 = 256$ possible haploid segregants, 186 were isolated after forced haploidization of the heterozygous diploid strain on benomyl medium from among 2,500 strains tested. Fitness of all strains was measured with two-fold replication by measuring the linear mycelium growth rate in two perpendicular directions during radial colony growth on supplemented medium that allowed the growth of all strains, and was expressed relative to the mycelium growth rate of the olvA1 strain with the highest growth rate (see Table S1). As will be explained in the next section, missing genotypes are assigned zero fitness.

Data analysis

To analyze the data set, first one has to address the problem of missing strains. In the experiments, 186 out of $2^8 = 256$ possible strains were found in approximately 2,500 segregants. Assume first that all genotypes are equally likely to be found in the sample. Denoting the number of segregants by Inline graphic, the probability for a given strain to be missed by chance is Inline graphic. The probability Inline graphic for at most Inline graphic genotypes to have been missed is then given by a Poisson distribution with mean Inline graphic. This gives the estimates Inline graphic and Inline graphic. For a more conservative estimate, one may assume that different genotypes have different likelihoods to be found, which are uniformly distributed in the interval Inline graphic with Inline graphic. Choosing Inline graphic which corresponds to the lowest relative fitness that was observed among the viable genotypes, simulations of this scenario yield Inline graphic and Inline graphic. We conclude that it is unlikely that more than one viable genotype has been missed by chance. This justifies the assignment of zero fitness to the missing 70 genotypes.
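The uniform-sampling part of this argument can be made concrete with a short calculation. The sketch below is only an illustration under assumptions: the sample size is taken as the roughly 2,500 segregants quoted above, each segregant is assumed to be an independent uniform draw from the 256 genotypes, and the number of genotypes that could in principle have been missed (`n_candidates`) is left as a free parameter because its value is not recoverable here. The function names are hypothetical.

```python
import math


def miss_probability(n_segregants, n_genotypes):
    """Probability that one given genotype never appears among the segregants,
    assuming independent draws that are uniform over all genotypes."""
    return (1.0 - 1.0 / n_genotypes) ** n_segregants


def prob_at_most_k_missed(k, n_candidates, n_segregants, n_genotypes):
    """Poisson approximation to P(at most k of the candidate genotypes were missed)."""
    mean = n_candidates * miss_probability(n_segregants, n_genotypes)
    return sum(math.exp(-mean) * mean ** j / math.factorial(j) for j in range(k + 1))


# Illustrative call: treat all 256 genotypes as candidates (an upper bound).
q = miss_probability(2500, 256)
p0 = prob_at_most_k_missed(0, 256, 2500, 256)
p1 = prob_at_most_k_missed(1, 256, 2500, 256)
```

Under these assumptions the per-genotype miss probability is close to $e^{-2500/256}$, so the Poisson mean is small and the probability of having missed no genotype at all is close to one. The more conservative scenario with unequal detection likelihoods would require a simulation and is not covered by this sketch.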

Next we need to verify that accessibility in the empirical fitness landscape is predominantly determined by sign epistasis among viable genotypes, rather than by the presence of lethals. As described in the main text, we consider subgraphs of the A. niger data set containing all combinations of Inline graphic of the eight mutations in total. The set of subgraphs of size Inline graphic is composed of Inline graphic distinct Inline graphic-locus landscapes, each of which spans a region in genotype space ranging from the wild type genotype shared by all subgraphs to one particular Inline graphic-fold mutant. We focus here on the ensembles with Inline graphic.
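The construction of the subgraph ensembles can be sketched as follows. This is a minimal illustration assuming a bitmask encoding in which bit i marks the presence of mutation i and the wild type is genotype 0; the function names `subgraph_genotypes` and `enumerate_subgraphs` are hypothetical and not taken from the original analysis.

```python
from itertools import combinations
from math import comb


def subgraph_genotypes(loci):
    """All genotypes (bitmasks) that carry mutations only at the given loci.

    The list starts with the wild type (0) and ends with the multiple mutant
    carrying every mutation in `loci`.
    """
    genotypes = [0]
    for locus in loci:
        # Each new locus doubles the set: every genotype with and without it.
        genotypes += [g | (1 << locus) for g in genotypes]
    return genotypes


def enumerate_subgraphs(n_loci, size):
    """Yield (loci, genotypes) for every subset of `size` out of `n_loci` mutations."""
    for loci in combinations(range(n_loci), size):
        yield loci, subgraph_genotypes(loci)


# For eight mutations and subgraphs of four mutations there are comb(8, 4) = 70 subgraphs.
n_subgraphs_of_four = sum(1 for _ in enumerate_subgraphs(8, 4))
```

Each subgraph can then be passed to a path-counting routine by restricting the full 8-locus fitness table to the genotypes returned for that subset.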

Key properties of the subgraph ensembles are summarized in Table 1. The second column shows the total number Inline graphic of subgraphs, and the third column shows the number of viable subgraphs (VSG's), defined as subgraphs which contain no non-viable strains. Two of the four VSG's with Inline graphic were previously analyzed in [10], and three of the 19 VSG's with Inline graphic are shown in Figure 1. To assess the impact of lethal genotypes on accessibility, let Inline graphic denote the average number of accessible paths per subgraph (averaged across all subgraphs of fixed Inline graphic) that would be present if only lethal states were allowed to block a path and the actual fitness values of viable genotypes were ignored. Similarly, Inline graphic denotes the average number of accessible paths per subgraph for fixed Inline graphic if both mechanisms for blocking are taken into account. Comparison between the two numbers, displayed in the fourth and fifth column of Table 1, shows that the contribution of the lethal mutants to reducing pathway accessibility is relatively minor. For example, for Inline graphic lethals reduce the number of accessible paths from Inline graphic to Inline graphic, by a factor of Inline graphic, whereas the epistasis among viable genotypes leads to a much more substantial further reduction from Inline graphic to Inline graphic, by a factor of Inline graphic; for Inline graphic the corresponding factors are Inline graphic and Inline graphic. We conclude that pathway accessibility is determined primarily by epistasis among viable genotypes.
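The two path counts compared here differ only in the rule that decides whether a single step is allowed. The sketch below makes this explicit under stated assumptions: genotypes are bitmasks, non-viable genotypes carry fitness zero, the start and target genotypes are passed in by the caller, and the names `count_paths`, `lethal_only` and `lethal_and_epistasis` are hypothetical. For brevity it enumerates all orderings of the differing loci, which is adequate for the small subgraphs considered.

```python
from itertools import permutations


def count_paths(fitness, start, target, step_allowed):
    """Count shortest paths from start to target whose every step satisfies
    step_allowed(fitness_before, fitness_after)."""
    diff = start ^ target
    bits = [1 << i for i in range(diff.bit_length()) if (diff >> i) & 1]
    count = 0
    for order in permutations(bits):     # one ordering = one shortest path
        g = start
        for bit in order:
            nxt = g ^ bit
            if not step_allowed(fitness[g], fitness[nxt]):
                break                    # path is blocked at this step
            g = nxt
        else:
            count += 1                   # all steps were allowed
    return count


def lethal_only(f_from, f_to):
    """A step is blocked only if it leads to a non-viable genotype."""
    return f_to > 0


def lethal_and_epistasis(f_from, f_to):
    """A step must lead to a viable genotype of strictly higher fitness."""
    return f_to > 0 and f_to > f_from
```

Averaging `count_paths(..., lethal_only)` and `count_paths(..., lethal_and_epistasis)` over all subgraphs of a given size would give two numbers analogous to the two path counts compared in Table 1.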

Inspection of the VSG's shows that the role of different mutations in causing lethality is strikingly inhomogeneous. In particular, we find that the lysine deficiency mutation lysD25 is not present in any of the VSG's, whereas the distribution of the other mutations across the VSG's is roughly homogeneous. The lys mutation is also strongly overrepresented in the non-viable strains, being present in Inline graphic out of Inline graphic cases. The main features of the set of lethal mutations can be captured in a simple model in which the presence of a mutation Inline graphic leads to a non-viable strain with probability Inline graphic, and different mutations interact multiplicatively, such that a strain containing two mutations Inline graphic and Inline graphic is viable with probability Inline graphic. The data for the number of VSG's for different Inline graphic cannot be described assuming the Inline graphic to be the same for all mutations, but a two-parameter model assigning probability Inline graphic to the Inline graphic mutation and a common value Inline graphic to all others suffices. A simple analysis shows that under this model the expected total number of viable strains is Inline graphic, while the total number of viable strains in the subset of strains excluding lys is Inline graphic. With Inline graphic and Inline graphic we obtain the estimates Inline graphic and Inline graphic. Given that the VSG's do not contain the lys mutation, the expected number of VSG's depends only on Inline graphic, and is given by

graphic file with name pcbi.1002134.e294.jpg (10)

The prediction for the expected number of viable subgraphs is shown in brackets in the third column of Table 1, and is seen to match the data very well. Similarly, the expected number of paths that do not contain any lethal genotypes can be computed analytically, resulting in the expression

graphic file with name pcbi.1002134.e295.jpg (11)

which is shown in brackets in the fourth column of Table 1.
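The expected number of viable strains in the multiplicative lethality model described above can be checked numerically. The sketch below is a derivation from the model as stated in the text, not a transcription of the published formulas: if a strain is viable with probability equal to the product of (1 - p_i) over the mutations it carries, then summing over all strains gives a product of (2 - p_i) over the loci. The parameter values in the example are purely hypothetical and are not the estimates obtained for the A. niger data.

```python
from itertools import product
from math import prod


def expected_viable_bruteforce(p):
    """Sum the viability probability over all 2**len(p) strains.

    p[i] is the probability that mutation i, when present, makes the strain
    non-viable; mutations are assumed to act multiplicatively.
    """
    total = 0.0
    for genotype in product((0, 1), repeat=len(p)):
        total += prod(1.0 - p_i for p_i, present in zip(p, genotype) if present)
    return total


def expected_viable_closed_form(p):
    """Closed form of the same sum: each locus contributes a factor 1 + (1 - p_i)."""
    return prod(2.0 - p_i for p_i in p)


# Hypothetical two-parameter example: one strongly lethal mutation, seven weak ones.
p_example = [0.6] + [0.05] * 7
n_all = expected_viable_closed_form(p_example)          # all eight loci
n_without_first = expected_viable_closed_form(p_example[1:])  # excluding the first locus
```

In the two-parameter setting the ratio of the two expected counts depends only on the lethality probability of the excluded mutation, which is why the two observed counts suffice to estimate both parameters.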

Resampling procedure

The accessibility of mutational pathways in the A. niger data set was analyzed using two different approaches. The first approach is based on a single set of fitness values obtained by averaging the two replicate fitness measurements for each strain; these average fitness values are shown in Table S1. In the second approach the fitness assigned to each viable genotype is a normally distributed random variable with the mean given by the average of the two fitness measurements and a common standard deviation Inline graphic estimated from the mean squared differences between replicate fitness values in the entire data set; the fitness of genotypes identified as non-viable remains zero. Statistical properties of accessible pathways are then computed by averaging over Inline graphic realizations of this resampled landscape ensemble. Empirical data points and error bars shown in Figure 4 represent the mean and standard deviations obtained from the second approach. Results obtained by directly analyzing the mean fitness landscape (first approach) do not differ significantly from those presented here.
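A resampling scheme of this kind can be sketched as follows. Several details are assumptions made for illustration: the common standard deviation is estimated as the square root of half the mean squared replicate difference (the per-measurement error if both replicates have equal, independent noise), which may differ from the estimator actually used; the random number generator and seed are arbitrary; and the function names are hypothetical.

```python
import math
import random


def estimate_sigma(replicates):
    """Common standard deviation from pairs of replicate measurements.

    Assumption: both replicates have independent errors of equal variance,
    so the squared difference has expectation 2 * sigma**2.
    """
    mean_sq_diff = sum((a - b) ** 2 for a, b in replicates) / len(replicates)
    return math.sqrt(mean_sq_diff / 2.0)


def resample_landscape(mean_fitness, sigma, rng):
    """Draw one realization: Gaussian noise on viable genotypes, zero stays zero."""
    return {
        g: (rng.gauss(m, sigma) if m > 0 else 0.0)
        for g, m in mean_fitness.items()
    }


def resampled_statistic(mean_fitness, sigma, statistic, n_realizations, seed=0):
    """Mean and standard deviation of `statistic` over resampled landscapes."""
    rng = random.Random(seed)
    values = [statistic(resample_landscape(mean_fitness, sigma, rng))
              for _ in range(n_realizations)]
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, math.sqrt(var)
```

Here `statistic` would be any function of a landscape, for example the number of accessible paths in a given subgraph, so that the returned mean and standard deviation play the role of the data points and error bars.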

Supporting Information

Figure S1

Plot of Inline graphic as function of Inline graphic for the HoC model. While the extrapolation to Inline graphic is not straightforward, Inline graphic clearly decreases monotonically with a limiting value below Inline graphic.

(PDF)

Figure S2

Simulation results for the probability of finding no accessible path in the Inline graphic model when the number of non-interacting loci Inline graphic is kept fixed.

(PDF)

Figure S3

Simulation results for the mean number of accessible paths for the Inline graphic model.

(PDF)

Figure S4

Simulation results for the probability of finding no shortest connected path between two viable antipodal genotypes for the holey landscape (neutral) model at different viability probabilities Inline graphic. In these simulations the initial genotype and its antipode were constrained to be viable.

(PDF)

Figure S5

Mean number of accessible paths obtained from subgraph analysis of the A. niger landscape (diamonds with error bars) compared to the results of a subgraph analysis of Inline graphic landscapes with Inline graphic, Inline graphic (circles) and Inline graphic (squares) and Inline graphic (triangles).

(PDF)

Figure S6

Distribution of the number of accessible paths in the RMF model with Inline graphic. Note that the behavior for the HoC-case Inline graphic is typical for small values of Inline graphic with most of the probabilistic weight on Inline graphic. This changes for larger values of Inline graphic, where the probabilistic weight shifts towards many accessible paths. This effect becomes more pronounced as Inline graphic grows.

(PDF)

Figure S7

Distribution of the number of accessible paths for the LK model with Inline graphic and different values of Inline graphic. For all Inline graphic, the most likely outcome is Inline graphic. Note the pronounced peaks for Inline graphic, which reflect complex combinatorial correlations among the paths.

(PDF)

Figure S8

Mean number of accessible paths obtained from subgraph analysis of the TEM β-lactamase resistance landscape of Weinreich et al. [3] (squares) compared to the results of a subgraph analysis of Inline graphic landscapes with Inline graphic, Inline graphic (triangles), Inline graphic (crosses) and Inline graphic (circles).

(PDF)

Table S1

Mean fitness Inline graphic (mycelium growth rate) of the 186 segregants of A. niger relative to that of the wildtype strain with the olv marker. Presence or absence of marker mutations is indicated with 1 and 0, respectively. Missing genotypes are marked with Inline graphic.

(PDF)

Acknowledgments

We thank Simon Gravel, Su-Chan Park, Chris Marx, Martijn Schenk and Shamil Sunyaev for useful discussions and suggestions.

Footnotes

The authors have declared that no competing interests exist.

This work was supported by Deutsche Forschungsgemeinschaft (http://www.dfg.de) within SFB 680 “Molecular Basis of Evolutionary Innovations” and within the Bonn-Cologne Graduate School of Physics and Astronomy, as well as by Studienstiftung des deutschen Volkes (www.studienstiftung.de) through a fellowship to JF. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

  • 1. Phillips PC. Epistasis - the essential role of gene interactions in the structure and evolution of genetic systems. Nat Rev Genet. 2008;9:855–867. doi: 10.1038/nrg2452.
  • 2. Hall BG. Predicting evolution by in vitro evolution requires determining evolutionary pathways. Antimicrob Agents Chemother. 2002;46:3035–3038. doi: 10.1128/AAC.46.9.3035-3038.2002.
  • 3. Weinreich DM, Delaney NF, DePristo MA, Hartl DM. Darwinian evolution can follow only very few mutational paths to fitter proteins. Science. 2006;312:111–114. doi: 10.1126/science.1123539.
  • 4. Lozovsky ER, Chookajorn T, Brown KM, Imwong M, Shaw PJ, et al. Stepwise acquisition of pyrimethamine resistance in the malaria parasite. Proc Natl Acad Sci U S A. 2009;106:12025–12030. doi: 10.1073/pnas.0905922106.
  • 5. Segrè D, DeLuna A, Church GM, Kishony R. Modular epistasis in yeast metabolism. Nat Genet. 2005;37:77–83. doi: 10.1038/ng1489.
  • 6. Weinreich DM, Watson RA, Chao L. Perspective: Sign epistasis and genetic constraints on evolutionary trajectories. Evolution. 2005;59:1165–1174.
  • 7. Poelwijk FJ, Kiviet DJ, Weinreich DM, Tans SJ. Empirical fitness landscapes reveal accessible evolutionary paths. Nature. 2007;445:383–386. doi: 10.1038/nature05451.
  • 8. Kvitek DJ, Sherlock G. Reciprocal sign epistasis between frequently experimentally evolved adaptive mutations causes a rugged fitness landscape. PLoS Genet. 2011;7:e1002056. doi: 10.1371/journal.pgen.1002056.
  • 9. Lunzer M, Miller SP, Felsheim R, Dean AM. The biochemical architecture of an ancient adaptive landscape. Science. 2005;310:499–501. doi: 10.1126/science.1115649.
  • 10. de Visser JAGM, Park SC, Krug J. Exploring the effect of sex on empirical fitness landscapes. Am Nat. 2009;174:S15–S30. doi: 10.1086/599081.
  • 11. Kogenaru M, de Vos MGJ, Tans SJ. Revealing evolutionary pathways by fitness landscape reconstruction. Crit Rev Biochem Mol. 2009;44:169–174. doi: 10.1080/10409230903039658.
  • 12. Dawid A, Kiviet DJ, Kogenaru M, de Vos M, Tans SJ. Multiple peaks and reciprocal sign epistasis in an empirically determined genotype-phenotype landscape. Chaos. 2010;20:026105. doi: 10.1063/1.3453602.
  • 13. da Silva J, Coetzer M, Nedellec R, Pastore C, Mosier DE. Fitness epistasis and constraints on adaptation in a human immunodeficiency virus type 1 protein region. Genetics. 2010;185:293–303. doi: 10.1534/genetics.109.112458.
  • 14. Tan L, Serene S, Chao HX, Gore J. Hidden randomness between fitness landscapes limits reverse evolution. Phys Rev Lett. 2011;106:198102. doi: 10.1103/PhysRevLett.106.198102.
  • 15. Chou HH, Chiu HC, Delaney NF, Segrè D, Marx CJ. Diminishing returns epistasis among beneficial mutations decelerates adaptation. Science. 2011;332:1190–1192. doi: 10.1126/science.1203799.
  • 16. Khan AI, Dinh DM, Schneider D, Lenski RE, Cooper TF. Negative epistasis between beneficial mutations in an evolving bacterial population. Science. 2011;332:1193–1196. doi: 10.1126/science.1203801.
  • 17. de Visser JAGM, Hoekstra RF, van den Ende H. Test of interaction between genetic markers that affect fitness in Aspergillus niger. Evolution. 1997;51:1499–1505. doi: 10.1111/j.1558-5646.1997.tb01473.x.
  • 18. Costanzo M, Baryshnikova A, Bellay J, Kim Y, Spear ED, et al. The genetic landscape of a cell. Science. 2010;327:425–431. doi: 10.1126/science.1180823.
  • 19. Fisher RA. The genetical theory of natural selection. New York: Dover; 1958.
  • 20. Wright S. The roles of mutation, inbreeding, cross-breeding and selection in evolution. Proc Sixth Int Cong Genet. 1932;1:356–366.
  • 21. Kauffman SA. The Origins of Order. Oxford: Oxford University Press; 1993.
  • 22. Gillespie JH. Population Genetics: A concise guide. Baltimore: Johns Hopkins University Press; 2004.
  • 23. Hartl DL, Clark AG. Principles of Population Genetics. Sunderland, Massachusetts: Sinauer Associates; 1997.
  • 24. Jain K, Krug J. Deterministic and stochastic regimes of asexual evolution on rugged fitness landscapes. Genetics. 2007;175:1275–1288. doi: 10.1534/genetics.106.067165.
  • 25. Gillespie JH. Some properties of finite populations experiencing strong selection and weak mutation. Am Nat. 1983;121:691–708.
  • 26. Orr HA. The population genetics of adaptation: The adaptation of DNA sequences. Evolution. 2002;56:1317–1330. doi: 10.1111/j.0014-3820.2002.tb01446.x.
  • 27. Weinreich DM, Chao L. Rapid evolutionary escape by large populations from local fitness peaks is likely in nature. Evolution. 2005;59:1175–1182.
  • 28. Weissman DB, Desai MM, Fisher DS, Feldman MW. The rate at which asexual populations cross fitness valleys. Theor Popul Biol. 2009;75:286–300. doi: 10.1016/j.tpb.2009.02.006.
  • 29. Kauffman S, Levin S. Towards a general theory of adaptive walks on rugged landscapes. J Theor Biol. 1987;128:11–45. doi: 10.1016/s0022-5193(87)80029-2.
  • 30. Gokhale CS, Iwasa Y, Nowak MA, Traulsen A. The pace of evolution across fitness valleys. J Theor Biol. 2009;259:613–620. doi: 10.1016/j.jtbi.2009.04.011.
  • 31. Wagner A. Neutralism and selectionism: a network-based reconciliation. Nat Rev Genet. 2008;9:965–974. doi: 10.1038/nrg2473.
  • 32. Carneiro M, Hartl DL. Adaptive landscapes and protein evolution. Proc Natl Acad Sci U S A. 2010;107:1747–1751. doi: 10.1073/pnas.0906192106.
  • 33. Kingman JFC. A simple model for the balance between mutation and selection. J Appl Prob. 1978;15:1–12.
  • 34. Franke J, Wergen G, Krug J. Records and sequences of records from random variables with a linear trend. J Stat Mech: Theory Exp. 2010:P10013.
  • 35. Aita T, Uchiyama H, Inaoka T, Nakajima M, Kokubo T, et al. Analysis of a local fitness landscape with a model of the rough Mt. Fuji-type landscape: Application to prolyl endopeptidase and thermolysin. Biopolymers. 2000;54:64–79. doi: 10.1002/(SICI)1097-0282(200007)54:1<64::AID-BIP70>3.0.CO;2-R.
  • 36. Kauffman SA, Weinberger ED. The NK model of rugged fitness landscapes and its application to maturation of the immune response. J Theor Biol. 1989;141:211–245. doi: 10.1016/s0022-5193(89)80019-0.
  • 37. Durrett R, Limic V. Rigorous results for the NK model. Ann Prob. 2003;31:1713–1753.
  • 38. Limic V, Pemantle R. More rigorous results on the Kauffman Levin model of evolution. Ann Prob. 2004;32:2149–2178.
  • 39. Kimura M. The neutral theory of molecular evolution. Cambridge: Cambridge University Press; 1983.
  • 40. Maynard Smith J. Natural selection and the concept of a protein space. Nature. 1970;225:563–564. doi: 10.1038/225563a0.
  • 41. Gavrilets S. Fitness Landscapes and the Origin of Species. Princeton: Princeton University Press; 2004.
  • 42. Stauffer D, Aharony A. Introduction to percolation theory. London: Taylor & Francis; 1992.
  • 43. Gavrilets S, Gravner J. Percolation on the fitness hypercube and the evolution of reproductive isolation. J Theor Biol. 1997;184:51–64. doi: 10.1006/jtbi.1996.0242.
  • 44. Reidys CM. Random induced subgraphs of generalized n-cubes. Adv Appl Math. 1997;19:360–377.
  • 45. Miller CR, Joyce P, Wichman H. Mutational effects and population dynamics during viral adaptation challenge current models. Genetics. 2011;187:185–202. doi: 10.1534/genetics.110.121400.
  • 46. Rowe W, Platt M, Wedge DC, Day PJ, Kell DB, et al. Analysis of a complete DNA-protein affinity landscape. J R Soc Interface. 2010;7:397–408. doi: 10.1098/rsif.2009.0193.
  • 47. Watson RA, Weinreich DM, Wakeley J. Genome structure and the benefit of sex. Evolution. 2011;65:523–536. doi: 10.1111/j.1558-5646.2010.01144.x.
  • 48. Jain K, Krug J, Park SC. Evolutionary advantage of small populations on complex fitness landscapes. Evolution. 2011;65:1945–1955. doi: 10.1111/j.1558-5646.2011.01280.x.
  • 49. Woods RJ, Barrick JE, Cooper TF, Shrestha U, Kauth MR, et al. Second-order selection for evolvability in a large Escherichia coli population. Science. 2011;331:1433–1436. doi: 10.1126/science.1198914.


Articles from PLoS Computational Biology are provided here courtesy of PLOS
