Abstract
Evolution is a dynamic process. The two classical forces of evolution are mutation and selection. Assuming small mutation rates, evolution can be predicted based solely on the fitness differences between phenotypes. Predicting an evolutionary process under varying mutation rates as well as varying fitness is still an open question. Experimental systems, however, include these complexities along with fluctuating population sizes and stochastic events such as extinctions. Using a theoretical and computational framework, we investigate the mutational path probabilities of systems with epistatic effects on both fitness and mutation rates. In contrast to previous models, we do not limit ourselves to the typical strong selection, weak mutation (SSWM) regime or to fixed population sizes, and we allow epistatic interactions to also affect mutation rates. This can lead to qualitatively non-trivial dynamics: pathways that are negligible in the SSWM regime can cross fitness valleys and become accessible. This finding has the potential to extend traditional predictions based on the SSWM foundation and bring us closer to what is observed in experimental systems.
How repeatable is evolution? As Stephen J. Gould's metaphor goes, ‘if we run the tape of life back from the start, how likely is it that we will get the same outcome that we see around us today?’1. The pioneering work of Lenski et al. tackled this question experimentally with microbes: it is now possible to literally replay evolution from a given starting point and see where it leads2,3,4,5,6.
Such empirical explorations made the hitherto theoretical concept of fitness landscapes tangible. The concept of a fitness landscape builds on the mapping between the genotype and the phenotype of an organism. Since selection acts on the phenotype, or essentially on its fitness, each genotype can be assigned the fitness of the phenotype it produces. Connecting genotypes that are one mutational step apart yields a fitness landscape7,8. These empirical studies also make clear that predictions cannot rest on simple rules, because complex phenomena such as epistasis and epigenetics play a major role in the process of evolution6,9,10.
Epistasis is any deviation from the additive effects of alleles at different loci11. Epistasis gives rise to rugged fitness landscapes, which are common in experimental observations across a variety of model systems12,13. In particular, reciprocal sign epistasis is a necessary condition for a rugged fitness landscape14. In magnitude epistasis, fitness always increases (or decreases) with every additional mutation, albeit in a non-additive manner. In sign epistasis, by contrast, valleys appear in the fitness landscape: a mutation can lower fitness relative to the previous state even though it eventually leads to higher fitness. In such a case, not all paths in the fitness landscape may be accessible to the population15. Comparing experimental systems with theoretical predictions based on the underlying fitness landscape helps elucidate how microscopic properties of the system determine the macroscopic evolutionary trajectory. Details of the process, such as the mutation rate, the fitnesses of individual states and the global population size, constrain the accessibility of paths13. Under the assumption of strong selection and weak mutation (SSWM), the system advances on the fitness landscape in a stepwise fashion, which automatically limits the number of possible adaptive paths10.
Evolutionary predictability and the speed of the dynamics are determined not only by the molecular constraints of fitness and mutation rate but also by population dynamics14. Theoretical explorations often assume a population of fixed size that starts at one node of the fitness landscape and track its movement over time. When the population size or the mutation rate increases, clonal interference arises15,16: a second-step mutant appears in the population before the first-step mutation has gone to fixation. In other words, the SSWM assumption is no longer valid. Clonal interference has been explored extensively both experimentally17,18,19 and theoretically16,20,21,22,23,24,25. It relaxes the restriction on the accessibility of non-adaptive trajectories. If fitnesses and mutation rates satisfy particular conditions, for instance if the mutation rates are also subject to epistatic interactions, such valley crossings can even be faster than adaptive trajectories24,26.
Populations in real systems are finite, and their size can fluctuate, possibly leading to extinction. Together with clonal interference and epistatic interactions between mutations (correlated rugged fitness landscapes), this makes predicting evolution on a given fitness landscape seem like an impossible task. Here we develop a general methodology for predicting all path probabilities in a multi-dimensional fitness landscape with epistatic interactions. To reflect a realistic scenario, we use a multi-type branching process (e.g. Ref. 27), which drops the assumption of a constant population size. For presentation purposes we limit ourselves to systems without back mutations. The model in its full generality is free of this assumption, although it is unclear how to define pathways when back mutations are allowed (see Supplementary Information for a detailed explanation). To introduce the framework, we begin with a simple model in which the wild type can acquire two independent mutations leading to the fittest type. We then increase the number of mutational steps needed to reach each mutant type, which generalizes the methodology. We briefly discuss an application of this approach to a cancer initiation model28, showing how mutational epistasis changes the path probabilities. Finally, we outline how to extend the model to a general system in which different mutations need to be acquired to reach the final mutant.
Methods and Results
Probability Generating Function
For our methodology we make use of extinction probabilities, more specifically the probabilities that different types are present or absent. In a branching process these probabilities can be obtained recursively using probability generating functions (PGFs). Since the relation between PGFs and the probability that a type is present is our main tool, we devote this subsection to a short overview of this relation, although the material is technical and well known (e.g. Refs. 27, 30).
The PGF in discrete time for a one-type process is in general defined as

$$f(s) = \sum_{k=0}^{\infty} p_k\, s^k,$$
where $k$ denotes the number of offspring and $p_k$ represents the probability of having $k$ offspring (in this context the focal individual itself dies, i.e. it is replaced by its offspring)27. For many biological processes, for example cell multiplication, it makes sense to consider only offspring numbers of 0 (death), 1 (nothing happens), and 2 (cell division). Other biological systems, however, call for many offspring at once, for example reproduction via numerous seeds in plants. Our analysis is not restricted to any particular offspring distribution. For simplicity, however, we restrict our examples to so-called binary splitting, i.e. either two or no offspring, so that $f(s) = p_0 + p_2 s^2$ with $p_0 + p_2 = 1$. The role of the argument $s$ is not obvious at this point. If we set $s = 0$, the PGF reduces to $f(0) = p_0$, which is the extinction probability of a population of one individual within one time step. Since all individuals behave independently, $f(0)^N$ is the extinction probability of a population of size $N$ within one time step. Turning to the extinction probability within two time steps, we note that with probability $p_2$ a single individual gives rise to two individuals in the next time step. Hence, the extinction probability of a single individual within two time steps is

$$f(f(0)) = p_0 + p_2\, f(0)^2 = p_0 + p_2\, p_0^2,$$
and that of a population with $N$ individuals is

$$\big[f(f(0))\big]^N.$$
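Continuing for further time steps, we see that $\big[f^{(t)}(0)\big]^N$, where $f^{(t)} = f \circ f \circ \cdots \circ f$ denotes the $t$-fold composition of $f$, is the extinction probability of the system within $t$ time steps.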
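The recursion above is easy to evaluate numerically. The following minimal Python sketch assumes binary splitting with $p_2 = b$ and $p_0 = d = 1 - b$; the function names are our own and purely illustrative. Iterating $s \mapsto f(s)$ from $s = 0$ yields $f^{(t)}(0)$ after $t$ iterations.

```python
def pgf_binary(s: float, b: float, d: float) -> float:
    """PGF of binary splitting: two offspring with probability b, none with probability d."""
    return d + b * s * s


def extinction_within(t: int, b: float, d: float, n: int = 1) -> float:
    """Probability that n independent individuals leave no descendants within t steps."""
    s = 0.0
    for _ in range(t):
        s = pgf_binary(s, b, d)  # after k iterations, s equals f^(k)(0)
    return s ** n


if __name__ == "__main__":
    b, d = 0.6, 0.4
    for t in (1, 2, 10, 1000):
        print(t, extinction_within(t, b, d))
    # For b > d the iterates approach the ultimate extinction probability d / b.
    print("d/b =", d / b)
```

With $b = 0.6$ and $d = 0.4$ this gives $0.4$ after one step and $0.4 + 0.6 \cdot 0.4^2 = 0.496$ after two steps, and the iterates approach the ultimate extinction probability $d/b = 2/3$; for $b \le d$ they approach 1.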
So far we have assumed that individuals reproduce clonally, i.e. give rise to offspring of the same type. We now turn to the extinction probability of a two-type process. Consider two types A and B, where an A individual can produce any number of A or B individuals, and likewise for B. The general PGFs for a process starting with one type A or one type B individual are then defined as

$$f_A(s_A, s_B) = \sum_{k_A, k_B \geq 0} p^{A}_{k_A k_B}\, s_A^{k_A} s_B^{k_B}, \qquad f_B(s_A, s_B) = \sum_{k_A, k_B \geq 0} p^{B}_{k_A k_B}\, s_A^{k_A} s_B^{k_B},$$
where $p^{A}_{k_A k_B}$ ($p^{B}_{k_A k_B}$) denotes the probability that one A (B) individual produces $k_A$ A and $k_B$ B individuals in the next time step. Let us try to recover the extinction probability as for the one-type process. If we set both $s_A$ and $s_B$ to zero and start with one A individual, we obtain, as above, the total extinction probability within one time step

$$f_A(0, 0) = p^{A}_{00}.$$
Often one is instead interested in the extinction, or absence, of just one particular type. Suppose, for example, that we are interested only in the presence of B individuals. The probability of having no B individuals in time step 1, starting with one A (B) individual, is the sum over all probabilities in which no B offspring is produced, $f_A(1, 0) = \sum_{k_A} p^{A}_{k_A 0}$ ($f_B(1, 0) = \sum_{k_A} p^{B}_{k_A 0}$). For the probability of having no B individuals in time step 2, we need to account for the probability that $k_A$ A and $k_B$ B individuals are produced in the first time step. This leads to

$$\sum_{k_A, k_B \geq 0} p^{A}_{k_A k_B}\, f_A(1, 0)^{k_A} f_B(1, 0)^{k_B} = f_A\big(f_A(1, 0),\, f_B(1, 0)\big).$$
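Continuing this procedure, and analogously to the one-type process, the probability of having no B individual at time $t$ is $f_A^{(t)}(1, 0)$, where $\mathbf{f}^{(t)}(\mathbf{s}) = \mathbf{f}\big(\mathbf{f}^{(t-1)}(\mathbf{s})\big)$ with $\mathbf{f} = (f_A, f_B)$ and $\mathbf{f}^{(0)}(\mathbf{s}) = \mathbf{s}$.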
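As a hedged illustration of the two-type recursion, consider a hypothetical model that is not taken from the text: an A individual divides with probability $b_A$ or dies with probability $d_A = 1 - b_A$, each daughter independently becomes B with probability $u$, and B reproduces clonally by binary splitting. Then $f_A(s_A, s_B) = d_A + b_A[(1-u)s_A + u s_B]^2$ and $f_B(s_A, s_B) = d_B + b_B s_B^2$, and the probability of having no B individual at time $t$ follows by iterating the pair $(s_A, s_B)$ from $(1, 0)$.

```python
from typing import Tuple


def pgf_two_type(sa: float, sb: float, bA: float, dA: float,
                 bB: float, dB: float, u: float) -> Tuple[float, float]:
    """Return (f_A(sa, sb), f_B(sa, sb)) for the hypothetical two-type model."""
    fa = dA + bA * ((1.0 - u) * sa + u * sb) ** 2  # each daughter of A becomes B w.p. u
    fb = dB + bB * sb ** 2                          # B only produces B
    return fa, fb


def prob_no_B_at(t: int, bA: float, dA: float, bB: float, dB: float, u: float) -> float:
    """f_A^{(t)}(1, 0): probability that one A ancestor has no B descendant at time t."""
    sa, sb = 1.0, 0.0  # s_A = 1 ignores A individuals, s_B = 0 "detects" B individuals
    for _ in range(t):
        sa, sb = pgf_two_type(sa, sb, bA, dA, bB, dB, u)
    return sa


if __name__ == "__main__":
    for t in range(1, 6):
        print(t, prob_no_B_at(t, bA=0.5, dA=0.5, bB=0.6, dB=0.4, u=0.1))
```

For $t = 2$ the loop evaluates exactly $f_A\big(f_A(1,0), f_B(1,0)\big)$, matching the expression above; with $u = 0$ no B can ever appear and the result is 1.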
In a similar fashion this procedure can be extended to a multi-type process with an arbitrary number of types. For further information and detailed insights into extinction of branching processes we refer to Refs. 27, 30.
Two dimensional fitness landscape
We begin with a minimal fitness landscape. Consider a wild type ab that can mutate at its two loci to A and B, respectively. With both mutations, the system reaches the final state AB. In such a system there are two different paths, as illustrated in Figure 1.
Traditionally, epistatic models are discussed in terms of different fitness values, whereas the mutation rates stay the same13,14. Figure 1 shows, as an example, the fitness landscape of a system with sign epistasis. In such a system, where the mutation rates do not depend on the genetic background, i.e. $u_{ab\to Ab} = u_{aB\to AB}$ and $u_{ab\to aB} = u_{Ab\to AB}$ with $u_{i\to j}$ the probability that a daughter of type $i$ mutates to type $j$, the path via Ab is clearly the most probable one. However, if the mutation rates depend on the genetic background, the path via aB can also become accessible. Changing mutation rates amounts to including epistasis in the mutational landscape in addition to epistasis in the fitness landscape29.
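For the four types of the above model, we need to consider four different PGFs, one for each type. Writing $u_{i\to j}$ for the probability that a daughter of type $i$ mutates to type $j$, they read

$$\begin{aligned}
f_{ab} &= d_{ab} + b_{ab}\big[(1 - u_{ab\to Ab} - u_{ab\to aB})\, s_{ab} + u_{ab\to Ab}\, s_{Ab} + u_{ab\to aB}\, s_{aB}\big]^2,\\
f_{Ab} &= d_{Ab} + b_{Ab}\big[(1 - u_{Ab\to AB})\, s_{Ab} + u_{Ab\to AB}\, s_{AB}\big]^2,\\
f_{aB} &= d_{aB} + b_{aB}\big[(1 - u_{aB\to AB})\, s_{aB} + u_{aB\to AB}\, s_{AB}\big]^2,\\
f_{AB} &= d_{AB} + b_{AB}\, s_{AB}^2,
\end{aligned}$$

where $f_i = f_i(s_{ab}, s_{Ab}, s_{aB}, s_{AB})$, $b_i$ and $d_i$ are the birth and death probabilities of type $i$ ($b_i + d_i = 1$), and each daughter mutates independently. The exponent of 2 arises from a branching process with binary splitting. The arguments $s_{ab}, \dots, s_{AB}$ correspond to extinction probabilities of the respective types as discussed above. The functions $f_i$ correspond to the extinction probability of the whole process given that it starts with a single individual of type $i$. The PGF $f_i$ at time $t$ is calculated recursively as

$$f_i^{(t)} = f_i\big(f_{ab}^{(t-1)}, f_{Ab}^{(t-1)}, f_{aB}^{(t-1)}, f_{AB}^{(t-1)}\big).$$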
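A sketch of the four-type system in Python, under stated assumptions: binary splitting with $d_i = 1 - b_i$, each daughter mutating independently with probability $u_{i\to j}$, no direct double mutation ab → AB and no back mutations. The `Params` container and all function names are hypothetical. Besides the recursion $\mathbf{f}^{(t)}$, the sketch computes the total extinction probabilities as the smallest fixed point $\mathbf{f}(\mathbf{s}^*) = \mathbf{s}^*$ in $[0,1]^4$, which is reached by iterating from $\mathbf{s} = 0$.

```python
from dataclasses import dataclass
from typing import Tuple

State = Tuple[float, float, float, float]  # (s_ab, s_Ab, s_aB, s_AB)


@dataclass
class Params:
    """Hypothetical parameters: birth probabilities b_i (d_i = 1 - b_i)
    and per-daughter mutation probabilities u_{i->j}."""
    b_ab: float
    b_Ab: float
    b_aB: float
    b_AB: float
    u_ab_Ab: float
    u_ab_aB: float
    u_Ab_AB: float
    u_aB_AB: float


def pgf4(s: State, p: Params) -> State:
    """Evaluate (f_ab, f_Ab, f_aB, f_AB) at s for binary splitting without back mutations."""
    s_ab, s_Ab, s_aB, s_AB = s
    stay = 1.0 - p.u_ab_Ab - p.u_ab_aB  # ab daughter that does not mutate
    f_ab = (1 - p.b_ab) + p.b_ab * (stay * s_ab + p.u_ab_Ab * s_Ab + p.u_ab_aB * s_aB) ** 2
    f_Ab = (1 - p.b_Ab) + p.b_Ab * ((1 - p.u_Ab_AB) * s_Ab + p.u_Ab_AB * s_AB) ** 2
    f_aB = (1 - p.b_aB) + p.b_aB * ((1 - p.u_aB_AB) * s_aB + p.u_aB_AB * s_AB) ** 2
    f_AB = (1 - p.b_AB) + p.b_AB * s_AB ** 2
    return (f_ab, f_Ab, f_aB, f_AB)


def iterate(s0: State, p: Params, t: int) -> State:
    """Return f^{(t)}(s0) by applying the vector PGF t times."""
    s = s0
    for _ in range(t):
        s = pgf4(s, p)
    return s


def total_extinction(p: Params, tol: float = 1e-14, max_iter: int = 10**6) -> State:
    """Smallest fixed point f(s*) = s*, obtained by iterating from s = 0."""
    s: State = (0.0, 0.0, 0.0, 0.0)
    for _ in range(max_iter):
        nxt = pgf4(s, p)
        if max(abs(x - y) for x, y in zip(nxt, s)) < tol:
            return nxt
        s = nxt
    return s


if __name__ == "__main__":
    p = Params(0.5, 0.51, 0.45, 0.6, 1e-3, 1e-3, 1e-3, 1e-3)
    print(total_extinction(p))
```

For a supercritical final type ($b_{AB} > d_{AB}$) the AB component of the fixed point equals $d_{AB}/b_{AB}$, which offers a quick consistency check.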
Time Distribution
Using the generating functions, we now derive the time distribution of the binary branching process. Starting with one wild-type individual, the probability of having no AB individual at time $t$ is $f(t) := f_{ab}^{(t)}(1, 1, 1, 0)$. Thus the probability of having at least one AB individual at time $t$ is $1 - f(t)$. The probability that at least one AB individual appears exactly at time $t$ is the probability that there is an AB individual at $t$ minus the probability that there was already one at time $t-1$:

$$P(t) = \big[1 - f(t)\big] - \big[1 - f(t-1)\big] = f(t-1) - f(t).$$
Starting with $N$ wild-type individuals, the probability that there is no AB individual at time $t$ is then $f(t)^N$. This leads to the time distribution

$$P_N(t) = f(t-1)^N - f(t)^N.$$
However, the arising AB individual should found a lineage that does not die out. Hence we are interested in the probability of having a successful AB individual. To calculate it, we use the known extinction probability of an AB lineage, $e_{AB}$, in place of $s_{AB}$. For binary splitting with $b_{AB} > d_{AB}$, the probability that an AB lineage goes extinct is its death probability divided by its birth probability, $e_{AB} = d_{AB}/b_{AB}$31. The modified PGFs are then obtained by replacing $s_{AB}$ with $e_{AB}$:

$$f_i(s_{ab}, s_{Ab}, s_{aB}, e_{AB}), \qquad i \in \{ab, Ab, aB\}.$$
Note that the PGF for the final mutant type is no longer needed. We can now calculate the time distribution until the first successful mutant appears in the same way as described above. Figure 2 shows the agreement between the recursive solution and 5000 simulations. The parameters, specified in the caption of Figure 2, are chosen arbitrarily to reflect an epistatic fitness landscape as sketched in Figure 1. We chose a very slightly advantageous fitness for Ab individuals solely to stress that the method holds for any fitness values, not only for restricted ones, for example neutral mutations.
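The time distribution until the first successful AB mutant can be sketched as follows, under a hypothetical parametrisation: binary splitting with $d_i = 1 - b_i$, independent per-daughter mutation probabilities, and parameter tuples ordered as in the comments. The iteration starts from $(1, 1, 1)$, keeps the AB argument fixed at $e_{AB}$, and stops once the iterates have converged to machine precision; this truncation rule is our own choice.

```python
import sys
from typing import List, Tuple

Vec = Tuple[float, float, float]          # (s_ab, s_Ab, s_aB)
Mut = Tuple[float, float, float, float]   # (u_ab->Ab, u_ab->aB, u_Ab->AB, u_aB->AB)


def pgf3(s: Vec, b: Vec, u: Mut, e_AB: float) -> Vec:
    """(f_ab, f_Ab, f_aB) with the AB argument replaced by e_AB (d_i = 1 - b_i)."""
    s_ab, s_Ab, s_aB = s
    b_ab, b_Ab, b_aB = b
    u_ab_Ab, u_ab_aB, u_Ab_AB, u_aB_AB = u
    f_ab = (1 - b_ab) + b_ab * ((1 - u_ab_Ab - u_ab_aB) * s_ab + u_ab_Ab * s_Ab + u_ab_aB * s_aB) ** 2
    f_Ab = (1 - b_Ab) + b_Ab * ((1 - u_Ab_AB) * s_Ab + u_Ab_AB * e_AB) ** 2
    f_aB = (1 - b_aB) + b_aB * ((1 - u_aB_AB) * s_aB + u_aB_AB * e_AB) ** 2
    return (f_ab, f_Ab, f_aB)


def time_distribution(b: Vec, u: Mut, b_AB: float, N: int = 1,
                      eps: float = sys.float_info.epsilon,
                      t_max: int = 10**5) -> List[float]:
    """Return [P_N(1), P_N(2), ...] with P_N(t) = f(t-1)^N - f(t)^N, where f(t) is the
    probability that one ab ancestor has produced no successful AB by time t."""
    e_AB = min(1.0, (1 - b_AB) / b_AB)  # extinction probability of a single AB lineage
    s: Vec = (1.0, 1.0, 1.0)             # f^{(0)}: nothing has happened at t = 0
    probs: List[float] = []
    for _ in range(t_max):
        new = pgf3(s, b, u, e_AB)
        probs.append(s[0] ** N - new[0] ** N)
        # stop once the iteration has reached its fixed point to machine precision
        if max(abs(x - y) for x, y in zip(new, s)) <= eps:
            break
        s = new
    return probs


if __name__ == "__main__":
    pr = time_distribution((0.5, 0.51, 0.45), (1e-3, 1e-3, 1e-3, 1e-3), b_AB=0.6)
    print(len(pr), sum(pr))
```

Because the first step can only produce single mutants, $P_N(1)$ vanishes, and the sum of all terms equals the probability that a successful AB mutant ever arises.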
For a three-type continuous-time branching process, the time distribution was computed in Ref. 32. This was done using the analytical solution of the PGF for the two-type process33 and the fact that in continuous time mutations follow a Poisson distribution. Adding a second intermediate type, e.g. B2, leads to a process of the same kind, but the analytical calculations immediately become unwieldy.
Path Probabilities
In the current example there are two possible paths by which the wild type can reach the final mutant AB, either ab → Ab → AB or ab → aB → AB. Experimental evidence shows that not all paths are equally probable15,34. Starting from ab, what is the probability that the first AB mutant arises via either path, and how long do the different pathways take?
The probability that the first mutant arises exactly at time $t$ via pathway Ab is (derived in the SI),
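where the auxiliary function it contains is defined in the Supporting Information (SI) and is computed in a similar fashion to $f(t)$. The total probability of this path is then the sum of $\rho_{Ab}(t)$ over all time steps,

$$\rho_{Ab} = \sum_{t=1}^{\infty} \rho_{Ab}(t).$$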
Computationally, the sum is truncated at a time $t_{\max}$ beyond which further terms fall below a tolerance $\epsilon$, usually machine epsilon ($\epsilon \approx 2.2 \times 10^{-16}$ in double precision). The total extinction probability of a multi-type branching process is determined by the smallest fixed point of the PGFs, $\mathbf{f}(\mathbf{s}^*) = \mathbf{s}^*$, where $s^*_{ab}$ is the extinction probability if the process starts with one ab individual27. However, these total extinction probabilities cannot answer the question of via which path the first successful AB mutant arises. The problem lies in timing: the pathway via Ab, for example, could have a very low extinction probability, whereas the pathway via aB might have an extinction probability of 1/2. Intuitively, one would expect the path via Ab to be more frequent. However, if the path via aB is much faster (e.g. because of higher mutation rates along it), each path would in fact be taken with a probability approaching 1/2. It is therefore important to use the recursive analysis, which includes the probability that a successful mutant has not already arisen through another path.
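The SI expression for $\rho_{Ab}(t)$ is not reproduced here. As a hedged alternative, path-resolved densities can be obtained with the same PGF machinery by distinguishing AB daughters according to their parent. Let $F_t$ be the probability that no successful AB has arisen by time $t$, and $H_t$ the probability that no successful AB from an Ab parent has arisen by time $t$ and none from an aB parent by time $t-1$ (the off-path side lags one step). Then $F_{t-1}^N - H_t^N$ is the probability that the first successful AB mutants appear at time $t$ and at least one of them stems from the Ab path. This construction is our own, reuses the hypothetical parametrisation of the previous sketches, and counts the rare ties in which both paths succeed in the same time step towards both paths.

```python
from typing import List, Tuple

Vec = Tuple[float, float, float]          # (s_ab, s_Ab, s_aB)
Mut = Tuple[float, float, float, float]   # (u_ab->Ab, u_ab->aB, u_Ab->AB, u_aB->AB)


def pgf_marked(s: Vec, b: Vec, u: Mut, x_Ab: float, x_aB: float) -> Vec:
    """PGFs of ab, Ab, aB where an AB daughter of an Ab parent contributes x_Ab and an
    AB daughter of an aB parent contributes x_aB (AB lineages are not tracked further)."""
    s_ab, s_Ab, s_aB = s
    b_ab, b_Ab, b_aB = b
    u_ab_Ab, u_ab_aB, u_Ab_AB, u_aB_AB = u
    f_ab = (1 - b_ab) + b_ab * ((1 - u_ab_Ab - u_ab_aB) * s_ab + u_ab_Ab * s_Ab + u_ab_aB * s_aB) ** 2
    f_Ab = (1 - b_Ab) + b_Ab * ((1 - u_Ab_AB) * s_Ab + u_Ab_AB * x_Ab) ** 2
    f_aB = (1 - b_aB) + b_aB * ((1 - u_aB_AB) * s_aB + u_aB_AB * x_aB) ** 2
    return (f_ab, f_Ab, f_aB)


def path_densities(b: Vec, u: Mut, b_AB: float, N: int = 1,
                   t_max: int = 20000) -> Tuple[List[float], List[float], float]:
    """Return rho_Ab(t), rho_aB(t) for t = 1..t_max and 1 - F_{t_max}^N."""
    e = min(1.0, (1 - b_AB) / b_AB)     # extinction probability of one AB lineage
    ones: Vec = (1.0, 1.0, 1.0)
    F = ones                             # F_0: nothing has happened yet
    H = pgf_marked(ones, b, u, e, 1.0)   # H_1: aB-origin AB at step 1 is not counted
    G = pgf_marked(ones, b, u, 1.0, e)   # G_1: mirror image for the aB path
    rho_Ab: List[float] = []
    rho_aB: List[float] = []
    for _ in range(t_max):
        rho_Ab.append(F[0] ** N - H[0] ** N)   # F_{t-1}^N - H_t^N
        rho_aB.append(F[0] ** N - G[0] ** N)   # F_{t-1}^N - G_t^N
        F = pgf_marked(F, b, u, e, e)          # F_t
        H = pgf_marked(H, b, u, e, e)          # H_{t+1}
        G = pgf_marked(G, b, u, e, e)          # G_{t+1}
    return rho_Ab, rho_aB, 1 - F[0] ** N


if __name__ == "__main__":
    rA, rB, p_any = path_densities((0.5, 0.51, 0.45), (1e-3,) * 4, b_AB=0.6)
    print(sum(rA), sum(rB), p_any)
```

Since every first success is attributed to at least one path, the two totals together are at least the overall success probability, and they exceed it only by the probability of ties.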
Figure 3 shows the probability densities for the different pathways of the minimal model. Interestingly, the pathway via aB dominates at early times but is less likely overall. Hence, experiments stopped after a short time interval might support conclusions that are overturned by observing the same experiments at a later time point.
Multiple mutations in two dimensions
In the earlier model the wild type had two possible mutations, a → A and b → B. Each of these may instead be a multi-step process: suppose it takes m mutations to go from a to A and n mutations to go from b to B. For m = n = 1 we recover the simple model discussed above. The calculation of the time distribution carries over directly from the simple model by including the PGFs of all available types. Increasing the length of the dimensions directly affects the number of paths leading from the wild type to the final mutant: without back mutations there are $\binom{m+n}{m} = \frac{(m+n)!}{m!\,n!}$ possible paths. For general m and n we enumerate the paths as follows. Path 1 is the path in which first all A mutations and then all B mutations happen. Path 2 is the path in which all but one A mutations happen first, then one B, then the last A, and finally all remaining B mutations. Figure 4 shows the different paths for a system with four mutations for type A and one mutation for type B. The path probability of any particular path p then takes the form,
where f(t) is the PGF as in Eq. A.2 and the corresponding path-specific function is defined analogously to Eq. A.9 in the SI
Here, the PGFs with a p index belong to types along the considered path (in total m + n + 1 types without back mutations, indexed starting from 0, the index of the wild type). Accordingly, PGFs with a q index are associated with types that do not belong to the respective path (in total m × n types). The PGF for the final mutant type is again replaced by the extinction probability of this type. We apply our framework with this extension to the cancer initiation model proposed in Ref. 28, which analyzes a model with several mutational steps to reach state A and one mutational step to reach state B (cf. Fig. 4). The direct change in fitness for the A mutations is (nearly) zero, and the B mutation alone is even deleterious. However, if an individual obtains all A mutations and the B mutation, its fitness is enhanced, which in the model leads to rapid proliferation. Here, we provide an example of how the path probabilities change when epistasis acts not just in the fitness landscape but in the mutational landscape as well. Figure 5 compares the path probability distributions with and without epistasis in the mutational landscape. The fitness values (i.e. the birth and death probabilities) as well as the “nonepistatic” mutation probabilities are the same as in Ref. 28.
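The path numbering and the split into on-path ($p$) and off-path ($q$) types can be illustrated with a short Python sketch. The text fixes only paths 1 and 2; continuing the numbering in lexicographic order (A before B) is our assumption, as are the function names. A type is encoded as $(i, j)$, the numbers of A and B mutations acquired so far, so that $(0, 0)$ is the wild type and $(m, n)$ the final mutant.

```python
from itertools import combinations
from math import comb
from typing import List, Set, Tuple

Type = Tuple[int, int]  # (number of A mutations, number of B mutations)


def enumerate_paths(m: int, n: int) -> List[str]:
    """All orders of m 'A' steps and n 'B' steps; index 0 is path 1 (all A first)."""
    total = m + n
    paths = []
    for b_positions in combinations(range(total), n):  # positions of the B mutations
        steps = ["A"] * total
        for i in b_positions:
            steps[i] = "B"
        paths.append("".join(steps))
    paths.sort()  # lexicographic order with 'A' < 'B' (assumed numbering)
    return paths


def count_paths(m: int, n: int) -> int:
    """Number of paths without back mutations: binomial(m + n, m)."""
    return comb(m + n, m)


def split_types(path: str) -> Tuple[List[Type], Set[Type]]:
    """Types visited along `path` (p index, wild type first) and the types off it (q index)."""
    m, n = path.count("A"), path.count("B")
    i = j = 0
    on_path: List[Type] = [(0, 0)]
    for step in path:
        if step == "A":
            i += 1
        else:
            j += 1
        on_path.append((i, j))
    all_types = {(a, c) for a in range(m + 1) for c in range(n + 1)}
    return on_path, all_types - set(on_path)


if __name__ == "__main__":
    for k, path in enumerate(enumerate_paths(4, 1), start=1):
        on, off = split_types(path)
        print(k, path, len(on), len(off))
```

For $m = 4$ and $n = 1$ this yields the five paths AAAAB, AAABA, AABAA, ABAAA and BAAAA, each with $m + n + 1 = 6$ on-path types and $m \times n = 4$ off-path types.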
Multi dimensional fitness landscapes
The cancer landscape discussed above is a two-dimensional system. In principle, the approach extends to higher dimensions. For fitness landscapes of higher orders15,35 it is still possible to write down the system of PGFs and apply the approach explained here. The concept remains the same: a PGF is needed for every type except the final mutant type, for which only the extinction probability is necessary (SI). Finally, the PGF of the wild type is calculated recursively to obtain the time distribution. For the path probabilities, the PGFs of types not along the considered path are again one time step behind, similar to Eq. 16. However, while experiments can provide accurate data on such fitness landscapes, the mutational landscape is usually hard to determine.
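For higher-dimensional landscapes, the bookkeeping of which PGFs are needed can be sketched by encoding each genotype of $L$ loci as a bitmask. This Python sketch assumes biallelic loci and no back mutations (both our assumptions, with hypothetical function names); it lists the forward mutational neighbours that enter each PGF and counts the shortest paths from the wild type to the final mutant, which equals $L!$. One PGF is then needed for each of the $2^L - 1$ genotypes other than the final mutant.

```python
from math import factorial
from typing import Dict, List


def forward_neighbours(L: int) -> Dict[int, List[int]]:
    """Map each genotype (bitmask over L biallelic loci) to the genotypes one
    forward mutation away; back mutations are excluded."""
    return {g: [g | (1 << k) for k in range(L) if not g & (1 << k)]
            for g in range(2 ** L)}


def count_shortest_paths(L: int) -> int:
    """Number of mutational paths from the wild type 0 to the final mutant 2^L - 1,
    counted backwards from the final mutant over the neighbour graph."""
    nbrs = forward_neighbours(L)
    final = 2 ** L - 1
    ways = {final: 1}
    # process genotypes with more mutations first, so all successors are known
    for g in sorted(nbrs, key=lambda x: -bin(x).count("1")):
        if g != final:
            ways[g] = sum(ways[h] for h in nbrs[g])
    return ways[0]


if __name__ == "__main__":
    for L in range(1, 6):
        print(L, 2 ** L - 1, count_shortest_paths(L), factorial(L))
```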
Discussion
We have presented a theoretical framework to study mutational pathways in epistatic systems. The crucial point is that in our analysis epistasis affects not only fitness (i.e. proliferation and death rates) but also mutation rates. We showed that pathways which are unlikely without mutational epistasis can become accessible (cf. e.g. Figure 5). Our analysis is based on multi-type branching processes and hence does not rely on the assumption of a constant population size.
While we have focused on a fairly simple system with a single-peaked fitness landscape, the approach can be extended to rugged fitness landscapes. Moreover, if back mutations are involved, one can still calculate the time distribution, although pathways are then no longer clearly defined (see SI). Furthermore, in the current scenario individuals either replicate or die in each time step. One could add a resting probability with which individuals remain in the same state. Such more complicated scenarios can be incorporated in our framework as well (SI). The computations can be expressed exactly in analytic terms and are solved recursively.
We apply our framework to a cancer model including mutational epistasis28 and show how it alters the path probabilities. Mutational epistasis can thus lead to heterogeneity in the density of different mutant types between age groups, because reaching the final mutant early is possible only via one mutational pathway, whereas at later time points other pathways contribute as well.
As shown here, the mutational landscape can undermine current predictions based solely on fitness landscapes. Just as in long-term evolution itself, experimental as well as theoretical approaches ought to balance the study of selection with the study of mutation. Analysis based on the approach explained here helps in understanding the importance of mutational epistasis, even though the computations have to be solved recursively. In particular, it makes the analysis of fitness and mutational landscapes more interactive, since long-running simulations are no longer necessary.
Supplementary Material
Acknowledgments
We thank Laura Hindersin and Arne Traulsen for providing constructive comments on the manuscript. Funding from the Max Planck Society, the New Zealand Institute for Advanced Study and the DFG Priority Programme 1590 Probabilistic Structures in Evolution (Grant GO2270/1-1) is gratefully acknowledged.
Footnotes
The authors declare no competing financial interests.
Author Contributions B.B. did the mathematical analysis and performed simulations. B.B. and C.S.G. developed the recursive algorithm and wrote the manuscript.
References
- Beatty J. Replaying life's tape. J. Phil. 103, 336–362 (2006).
- Lenski R. E., Rose M. R., Simpson S. C. & Tadler S. C. Long-term experimental evolution in Escherichia coli. I. Adaptation and divergence during 2,000 generations. Am. Nat. 138, 1315–1341 (1991).
- Cooper T. F., Rozen D. E. & Lenski R. E. Parallel changes in gene expression after 20,000 generations of evolution in Escherichia coli. Proc. Natl. Acad. Sci. 100, 1072–1077 (2003).
- Blount Z. D., Barrick J. E., Davidson C. J. & Lenski R. E. Genomic analysis of a key innovation in an experimental Escherichia coli population. Nature 489, 513–518 (2012).
- Meyer J. R. et al. Repeatability and contingency in the evolution of a key innovation in phage lambda. Science 335, 428–432 (2012).
- Travisano M. & Shaw R. G. Lost in the map. Evolution 67, 305–314 (2013).
- Haldane J. B. S. A mathematical theory of natural and artificial selection. V. Selection and mutation. Proc. Cam. Phil. Soc. 23, 838–844 (1927).
- Fisher R. A. The Genetical Theory of Natural Selection (Clarendon Press, Oxford, 1930).
- Travisano M., Mongold J. A., Bennett A. F. & Lenski R. E. Experimental tests of the roles of adaptation, chance, and history in evolution. Science 267, 87–90 (1995).
- Weinreich D. M., Watson R. & Chao L. Perspective: sign epistasis and genetic constraint on evolutionary trajectories. Evolution 59, 1165–1174 (2005).
- Fisher R. A. The correlation between relatives on the supposition of Mendelian inheritance. Trans. Roy. Soc. Edinburgh 52, 399–433 (1918).
- Jain K. & Krug J. Deterministic and stochastic regimes of asexual evolution on rugged fitness landscapes. Genetics 175, 1275–1288 (2007).
- Szendro I. G., Franke J., de Visser J. A. G. M. & Krug J. Predictability of evolution depends nonmonotonically on population size. Proc. Natl. Acad. Sci. 110, 571–576 (2013).
- Poelwijk F. J., Kiviet D. J., Weinreich D. M. & Tans S. J. Empirical fitness landscapes reveal accessible evolutionary paths. Nature 445, 383–386 (2007).
- Weinreich D., Delaney N., DePristo M. & Hartl D. Darwinian evolution can follow only very few mutational paths to fitter proteins. Science 312, 111–114 (2006).
- Park S.-C. & Krug J. Clonal interference in large populations. Proc. Natl. Acad. Sci. 104, 18135–18140 (2007).
- Imhof M. & Schlötterer C. Fitness effects of advantageous mutations in evolving Escherichia coli populations. Proc. Natl. Acad. Sci. 98, 1113–1117 (2001).
- Elena S. F. & Lenski R. E. Evolution experiments with microorganisms: the dynamics and genetic bases of adaptation. Nat. Rev. Gen. 4, 457–469 (2003).
- Hegreness M., Shoresh N., Hartl D. & Kishony R. An equivalence principle for the incorporation of favorable mutations in asexual populations. Science 311, 1615–1617 (2006).
- Gerrish P. J. & Lenski R. E. The fate of competing beneficial mutations in an asexual population. Genetica 102–103, 127–144 (1998).
- Iwasa Y., Michor F. & Nowak M. A. Stochastic tunnels in evolutionary dynamics. Genetics 166, 1571–1579 (2004).
- Weinreich D. M. & Chao L. Rapid evolutionary escape by large populations from local fitness peaks is likely in nature. Evolution 59, 1175–1182 (2005).
- Desai M. M., Fisher D. S. & Murray A. W. The speed of evolution and maintenance of variation in asexual populations. Curr. Biol. 17, 385–394 (2007).
- Gokhale C. S., Iwasa Y., Nowak M. A. & Traulsen A. The pace of evolution across fitness valleys. J. Theor. Biol. 259, 613–620 (2009).
- Weissman D. B., Desai M. M., Fisher D. S. & Feldman M. W. The rate at which asexual populations cross fitness valleys. Theor. Pop. Biol. 75, 286–300 (2009).
- Lynch M. & Abegg A. The rate of establishment of complex adaptations. Mol. Biol. Evol. 27, 1404–1414 (2010).
- Haccou P., Jagers P. & Vatutin V. A. Branching Processes: Variation, Growth, and Extinction of Populations, vol. 5 (Cambridge University Press, Cambridge, 2005).
- Bauer B., Siebert R. & Traulsen A. Cancer initiation with epistatic interactions between driver and passenger mutations. J. Theor. Biol. 358, 52–60 (2014).
- Sasaki A. & Nowak M. A. Mutation landscapes. J. Theor. Biol. 224, 241–247 (2003).
- Kimmel M. & Axelrod D. E. Branching Processes in Biology (Springer, New York, 2002).
- Athreya K. B. & Ney P. E. Branching Processes (Springer, Berlin, 1972).
- Bozic I. et al. Evolutionary dynamics of cancer in response to targeted combination therapy. eLife 2 (2013).
- Antal T. & Krapivsky P. Exact solution of a two-type branching process: models of tumor progression. J. Stat. Mech.: Theory and Experiment 2011, P08018 (2011).
- Lee T. H., DSouza L. M. & Fox G. E. Equally parsimonious pathways through an RNA sequence space are not equally likely. J. Mol. Evol. 45, 278–284 (1997).
- Khan A. I., Dinh D. M., Schneider D., Lenski R. E. & Cooper T. F. Negative epistasis between beneficial mutations in an evolving bacterial population. Science 332, 1193–1196 (2011).