Genetics. 2017 Apr 24;206(2):1049–1079. doi: 10.1534/genetics.116.199497

Genotypic Complexity of Fisher’s Geometric Model

Sungmin Hwang *, Su-Chan Park †,1, Joachim Krug *
PMCID: PMC5499163  PMID: 28450460

In his celebrated model of adaptation, Fisher assumed a smooth phenotype fitness map with one optimum. This assumption is at odds with the rugged.....

Keywords: fitness landscape, genotype–phenotype map, epistasis, adaptation, fitness peaks

Abstract

Fisher’s geometric model was originally introduced to argue that complex adaptations must occur in small steps because of pleiotropic constraints. When supplemented with the assumption of additivity of mutational effects on phenotypic traits, it provides a simple mechanism for the emergence of genotypic epistasis from the nonlinear mapping of phenotypes to fitness. Of particular interest is the occurrence of reciprocal sign epistasis, which is a necessary condition for multipeaked genotypic fitness landscapes. Here we compute the probability that a pair of randomly chosen mutations interacts sign epistatically, which is found to decrease with increasing phenotypic dimension n, and varies nonmonotonically with the distance from the phenotypic optimum. We then derive expressions for the mean number of fitness maxima in genotypic landscapes comprised of all combinations of L random mutations. This number increases exponentially with L, and the corresponding growth rate is used as a measure of the complexity of the landscape. The dependence of the complexity on the model parameters is found to be surprisingly rich, and three distinct phases characterized by different landscape structures are identified. Our analysis shows that the phenotypic dimension, which is often referred to as phenotypic complexity, does not generally correlate with the complexity of fitness landscapes and that even organisms with a single phenotypic trait can have complex landscapes. Our results further inform the interpretation of experiments where the parameters of Fisher’s model have been inferred from data, and help to elucidate which features of empirical fitness landscapes can be described by this model.


A fundamental question in the theory of evolutionary adaptation concerns the distribution of mutational effect sizes and the relative roles of mutations of small vs. large effects in the adaptive process (Orr 2005). In his seminal 1930 monograph, Ronald Fisher devised a simple geometric model of adaptation in which an organism is described by n phenotypic traits and mutations are random displacements in the trait space (Fisher 1930). Each trait has a unique optimal value and the combination of these values defines a single phenotypic fitness optimum that constitutes the target of adaptation. Because random mutations act pleiotropically on multiple traits, the probability that a given mutation brings the phenotype closer to the target decreases with increasing n. Fisher’s analysis showed that, for large n, the mutational step size in units of the distance to the optimum must be smaller than $1/\sqrt{n}$ for the mutation to be beneficial with an appreciable probability. He thus concluded that the evolution of complex adaptations involving a large number of traits must rely on mutations of small effect. This conclusion was subsequently qualified by the realization that small effect mutations are likely to be lost by genetic drift, and therefore mutations of intermediate size contribute most effectively to adaptation (Kimura 1983; Orr 1998, 2000).

During the past decade, Fisher’s geometric model (FGM) has become a standard reference point for theoretical and experimental work on fundamental aspects of evolutionary adaptation (Tenaillon 2014). In particular, it has been found that FGM provides a versatile and conceptually simple mechanism for the emergence of epistatic interactions between genetic mutations in their effect on fitness (Martin et al. 2007; Gros et al. 2009; Blanquart et al. 2014). For this purpose, two extensions of Fisher’s original formulation of the model have been suggested. First, phenotypes are assigned an explicit fitness value, which is usually taken to be a smooth function on the trait space with a single maximum at the optimal phenotype. Second, and more importantly, mutational effects on the phenotypes are assumed to be additive. As a consequence, any deviations from additivity that arise on the level of fitness are solely due to the nonlinear mapping from phenotype to fitness, or, in mathematical terms, due to the curvature of the fitness function. Because the curvature is largest around the phenotypic optimum, epistasis generally increases upon approaching the optimal phenotype and is weak far away from the optimum. Several recent studies have made use of the framework of FGM to interpret experimental results on pairwise epistatic interactions and to estimate the parameters of the model from data (Martin et al. 2007; Velenich and Gore 2013; Weinreich and Knies 2013; Perfeito et al. 2014; Schoustra et al. 2016).

A particularly important form of epistatic interaction is sign epistasis, where a given mutation is beneficial or deleterious depending on the genetic background (Weinreich et al. 2005). Two types of sign epistasis are distinguished depending on whether one of the mutations affects the effect sign of the other, but the reverse is not true [simple sign epistasis (SSE)]; or whether the interaction is reciprocal [reciprocal sign epistasis (RSE)]. For a pictorial representation of the two kinds of sign epistasis, see, for example, Poelwijk et al. (2007). Sign epistasis can arise in FGM either between large effect beneficial mutations that in combination overshoot the fitness optimum, or between mutations of small fitness effect that display antagonistic pleiotropy (Blanquart et al. 2014). The presence of sign epistasis is a defining feature of genotypic fitness landscapes that are complex, in the sense that not all mutational pathways are accessible through simple hill climbing, and multiple genotypic fitness peaks may exist (Weinreich et al. 2005; Franke et al. 2011; de Visser and Krug 2014). Specifically, RSE is a necessary condition for the existence of multiple fitness peaks (Poelwijk et al. 2011; Crona et al. 2013).

Following common practice, here a genotypic fitness landscape is understood to consist of the assignment of fitness values to all combinations of L haploid, biallelic loci that together constitute the L-dimensional genotype space. A peak in such a landscape is a genotype that has higher fitness than all its L neighbors that can be reached by a single point mutation (Kauffman and Levin 1987). Note that, in contrast to the continuous phenotypic space on which FGM is defined, the space of genotypes is discrete.

Blanquart et al. (2014) showed that an ensemble of L-dimensional genotypic landscapes can be constructed from FGM by combining subsets of L randomly chosen mutational displacements. Each sample of L mutations defines another realization of the landscape ensemble, and the exploratory simulations reported by Blanquart et al. (2014) indicate a large variability among the realized landscapes. Nevertheless, some general trends in the properties of the genotypic landscapes were identified. In particular, as expected on the basis of the considerations outlined above, the genotypic landscapes are essentially additive when the focal phenotype representing the unmutated wild type is far away from the optimum and become increasingly rugged as the optimal phenotype is approached.

In this article we present a detailed and largely analytic study of the properties of genotypic landscapes generated under FGM. The focus is on two types of measures of landscape complexity, that is, the fraction of sign-epistatic pairs of random mutations and the number of fitness maxima in the genotypic landscape. A central motivation for our investigation is to assess the potential of FGM and related phenotypic models to explain the properties of empirical genotypic fitness landscapes of the kind that have been recently reported in the literature (Szendro et al. 2013; Weinreich et al. 2013; de Visser and Krug 2014). The ability of nonlinear phenotype-fitness maps to explain epistatic interactions among multiple loci has been demonstrated for a virus (Rokyta et al. 2011) and for an antibiotic resistance enzyme (Schenk et al. 2013), but a comparative study of several different data sets using approximate Bayesian computation (ABC) has questioned the broader applicability of phenotype-based models (Blanquart and Bataillon 2016). It is thus important to develop a better understanding of the structure of genotypic landscapes generated by phenotypic models such as FGM.

In the next section we describe the mathematical setting and introduce the relevant model parameters: the phenotypic and genotypic dimensionalities n and L, the distance of the focal phenotype to the optimum, and the standard deviation (SD) of mutational displacements. As in previous studies of FGM, specific scaling relations among these parameters have to be imposed to arrive at meaningful results for large n and L. We then present analytic results for the probability of sign epistasis and the behavior of the number of fitness maxima for large L, both in the case of fixed phenotypic dimension n and for a situation where the joint limit $n, L \to \infty$ is taken at constant ratio $\alpha = n/L$.

Similar to other probabilistic models of genotypic fitness landscapes (Kauffman and Levin 1987; Weinberger 1991; Evans and Steinsaltz 2002; Durrett and Limic 2003; Limic and Pemantle 2004; Neidhart et al. 2014), the number of maxima generally increases exponentially with L, and we use the exponential growth rate as a measure of genotypic complexity. We find that this quantity displays several phase transitions as a function of the parameters of FGM which separate parameter regimes characterized by qualitatively different landscape structures. Depending on the regime, the genotypic landscapes induced by FGM become more or less rugged with increasing phenotypic dimension. This indicates that the role of the number of phenotypic traits in shaping the fitness landscapes of FGM is much more subtle than has been previously appreciated, and that the sweeping designation of n as (phenotypic) “complexity” can be misleading. Further implications of our study for the theory of adaptation and the interpretation of empirical data will be elaborated in the Discussion.

Model

Basic properties of FGM

In FGM, the phenotype of an organism is modeled as a set of n real-valued traits and represented by a vector $\mathbf{y} = (y_1, y_2, \ldots, y_n)$ in the n-dimensional Cartesian space, $\mathbf{y} \in \mathbb{R}^n$. The fitness $W(\mathbf{y})$ is assumed to be a smooth, single-peaked function of the phenotype $\mathbf{y}$. By choosing an appropriate coordinate system, the optimum phenotype, i.e., the combination of phenotypic traits with the highest fitness value, can be placed at the origin in $\mathbb{R}^n$. We also assume that the fitness $W(\mathbf{y})$ depends on the distance to the optimum $|\mathbf{y}|$ but not on the direction of $\mathbf{y}$, which can be justified by arguments based on random matrix theory (Martin 2014). The uniqueness of the phenotypic optimum at the origin implies that $W(\mathbf{y})$ is a decreasing function of $|\mathbf{y}|$. The form of the fitness function will be specified below when needed. Most of the results presented in this article are, however, independent of the explicit shape of $W(\mathbf{y})$, as they rely solely on the relative ordering of different genotypes with respect to their fitness.

When a mutation arises, the phenotype of the mutant becomes $\mathbf{y} + \boldsymbol{\xi}$, where $\mathbf{y}$ is the parental phenotype and the mutational vector $\boldsymbol{\xi}$ corresponds to the change of traits due to the mutation. The key result derived by Fisher (1930) concerns the fraction $P_b$ of beneficial mutations arising from a wild-type phenotype located at distance d from the optimum. Assuming that mutational displacements have a fixed length $|\boldsymbol{\xi}| = r$ and random directions, he showed that for $n \gg 1$

$$P_b = \frac{1}{\sqrt{2\pi}} \int_x^{\infty} e^{-t^2/2}\, dt = \frac{1}{2}\,\mathrm{erfc}\!\left(x/\sqrt{2}\right), \tag{1}$$

where erfc denotes the complementary error function and $x = r\sqrt{n}/(2d)$. Thus, for large n the mutational step size has to be much smaller than the distance to the optimum, $r \sim d/\sqrt{n} \ll d$, for the mutation to have a chance of increasing fitness.
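Equation 1 can be checked numerically. The following Python sketch is an illustration and not code from the article; the function names `fisher_pb` and `simulate_pb`, the seed, and the sample sizes are assumptions made here. It draws fixed-length displacements in uniformly random directions and compares the observed fraction of beneficial mutations with the asymptotic formula.

```python
import math
import random


def fisher_pb(x):
    """Asymptotic fraction of beneficial mutations (Equation 1)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def simulate_pb(n, d, r, trials=10000, seed=0):
    """Monte Carlo estimate of the beneficial fraction.

    The wild type sits at distance d from the optimum (the origin) and each
    mutation is a step of fixed length r in a uniformly random direction.
    """
    rng = random.Random(seed)
    beneficial = 0
    for _ in range(trials):
        # A normalized Gaussian vector is uniformly distributed on the sphere.
        g = [rng.gauss(0.0, 1.0) for _ in range(n)]
        norm = math.sqrt(sum(v * v for v in g))
        # Place the wild type on the first axis, y = (d, 0, ..., 0). Then
        # |y + xi|^2 - |y|^2 = 2 d r g[0]/norm + r^2, negative if beneficial.
        if 2.0 * d * r * g[0] / norm + r * r < 0.0:
            beneficial += 1
    return beneficial / trials


if __name__ == "__main__":
    n, d, x = 50, 1.0, 0.5
    r = 2.0 * x * d / math.sqrt(n)  # step length that gives Fisher parameter x
    print(simulate_pb(n, d, r), fisher_pb(x))
```

For moderate n the two numbers should agree up to sampling noise and finite-n corrections.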

As has become customary in the field, we here assume that the mutational displacements are independent and identically distributed random variables drawn from an n-dimensional Gaussian distribution with zero mean. The covariance matrix can be taken to be of diagonal form $\sigma^2 \mathbf{I}$, where $\mathbf{I}$ is the n-dimensional identity matrix and $\sigma^2$ is the variance of a single trait (Blanquart et al. 2014). In the limit $n \to \infty$, the form of the distribution of the mutational displacements becomes irrelevant owing to the central limit theorem (CLT), and therefore Fisher’s result of Equation 1 also holds in the present setting of Gaussian mutational displacements of mean size $r = \sigma\sqrt{n}$ (Waxman and Welch 2005; Ram and Hadany 2015); an explicit derivation will be provided below. Because lengths in the phenotype space can be naturally measured in units of $\sigma$, the parameters d and $\sigma$ should always appear as the ratio $d/\sigma$, as can be seen in Equation 1. Thus, without loss of generality, we can set $\sigma = 1$. In the following we denote the (scaled) wild-type phenotype by $\mathbf{Q}$, its distance to the optimum by

$$Q = |\mathbf{Q}| = \frac{d}{\sigma}, \tag{2}$$

and draw the displacement vectors ξ from the n-dimensional Gaussian density p(ξ) with unit covariance matrix.

By normalizing phenotypic distances to the SD $\sigma$ of the mutational effect on a single trait, we are adopting a particular pleiotropic scaling that has been referred to as the “Euclidean superposition model” (Hermisson and McGregor 2008; Wagner et al. 2008). An alternative choice which is closer to Fisher’s original formulation but appears to have less empirical support is the “total effect model,” wherein the total length r of the mutational displacements is taken to be independent of n. Since $r = \sigma\sqrt{n}$, this implies that the single trait effect size decreases with n as $\sigma \sim 1/\sqrt{n}$. As a consequence, the parameter Q defined by Equation 2 becomes n dependent and increases as $\sqrt{n}$, provided d does not depend on n (Orr 2000). The results presented below will always be given in terms of ratios of the basic parameters of FGM, such that their translation to the total effect model is in principle straightforward. We will nevertheless explicitly point out instances where the two settings give rise to qualitatively different behaviors.

The genotypic fitness landscape induced by FGM

To study epistasis within FGM, Fisher’s original definition has to be supplemented with a rule for how the effects of multiple mutations are combined. Based on earlier work (Lande 1980) in quantitative genetics, Martin et al. (2007) introduced the assumption that mutations act additively on the level of the phenotype. Thus the phenotype arising from two mutations $\boldsymbol{\xi}_1$ and $\boldsymbol{\xi}_2$ applied to the wild type $\mathbf{Q}$ is simply given by $\mathbf{Q} + \boldsymbol{\xi}_1 + \boldsymbol{\xi}_2$. This definition suffices to associate an L-dimensional genotypic fitness landscape to any set of L mutational displacements $\boldsymbol{\xi}_1, \boldsymbol{\xi}_2, \ldots, \boldsymbol{\xi}_L$ (Blanquart et al. 2014). For this purpose the haploid genotype $\tau$ is represented by a binary sequence with length L, $\tau = (\tau_1, \tau_2, \ldots, \tau_L)$ with $\tau_i = 1$ ($\tau_i = 0$) in the presence (absence) of the ith mutation. For the wild type $\tau_i = 0$ for all i, and in general the phenotype vector associated with the genotype $\tau$ reads

$$\mathbf{z}(\tau) = \mathbf{Q} + \sum_{i=1}^{L} \tau_i \boldsymbol{\xi}_i. \tag{3}$$

Two examples illustrating this genotype–phenotype map and the resulting genotypic fitness landscapes with L=3 and n=2 are shown in Figure 1.

Figure 1.


Examples of three-dimensional genotypic fitness landscapes induced by FGM with two phenotypic dimensions (L = 3 and n = 2). The panels show the projection of the discrete genotype space onto the phenotype plane, where the phenotypic optimum is represented by a black ●. In the left panel, the binary sequence notation for genotypes is indicated. The wild-type genotype 000, marked by a green ▴, is located at distance Q from the phenotypic optimum. The nodes represented by red ▪’s are local fitness maxima of the genotypic landscapes, as can be seen from the contour lines of constant fitness. In the right panel, the mutant phenotypes overshoot the optimum, whereas in the left panel they do not.

As can be seen from the figure, the projection of the discrete genotype space onto the continuous phenotype space can give rise to multiple genotypic fitness maxima, although the phenotypic landscape is single peaked. It is the assumption of a finite (and hence discrete) set of phenotypic mutation vectors that distinguishes our setting from much of the earlier work on FGM, where mutations are drawn from a continuum of alleles (Fisher 1930; Orr 1998, 2000, 2005) and the probability of further improvement (as given by Equation 1) vanishes only strictly at the phenotypic optimum. Remarkably, our analysis shows that the conventional setting is not simply recovered by taking the number of mutational vectors L to infinity; rather, the number of genotypic fitness maxima is found to increase exponentially with L.
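The construction of a genotypic landscape from a finite set of mutation vectors, and the search for its local maxima, can be sketched in a few lines of Python. This is a minimal illustration under assumptions made here (the function names `squared_distances` and `local_maxima` are hypothetical, and the squared distance to the optimum is used as an inverse proxy for fitness); it is not the authors' Mathematica or C++ implementation.

```python
import itertools


def squared_distances(wild_type, mutations):
    """Map each genotype tau (tuple of 0/1) to |z(tau)|^2, using Equation 3."""
    n = len(wild_type)
    dist2 = {}
    for tau in itertools.product((0, 1), repeat=len(mutations)):
        z = list(wild_type)
        for present, xi in zip(tau, mutations):
            if present:
                for k in range(n):
                    z[k] += xi[k]
        dist2[tau] = sum(v * v for v in z)
    return dist2


def local_maxima(dist2):
    """Genotypes strictly closer to the optimum than all single-mutant neighbors."""
    peaks = []
    for tau, d2 in dist2.items():
        neighbors = (tau[:i] + (1 - tau[i],) + tau[i + 1:] for i in range(len(tau)))
        if all(dist2[nb] > d2 for nb in neighbors):
            peaks.append(tau)
    return peaks


if __name__ == "__main__":
    # One trait, wild type at Q = 1, two mutations that each reach the optimum
    # but jointly overshoot it.
    landscape = squared_distances((1.0,), [(-1.0,), (-1.0,)])
    print(local_maxima(landscape))
```

In the one-trait example each single mutant sits exactly at the optimum while the double mutant overshoots it, so both single mutants are peaks even though the phenotypic landscape has a single optimum. Enumerating all $2^L$ genotypes is only feasible for small L.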

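Since fitness decreases monotonically with the distance to the optimum phenotype, a natural proxy for fitness is the negative squared magnitude of the phenotype vector

$$-|\mathbf{z}(\tau)|^2 = -|\mathbf{Q}|^2 - 2\sum_{i=1}^{L} (\mathbf{Q}\cdot\boldsymbol{\xi}_i)\,\tau_i - \sum_{i,j=1}^{L} (\boldsymbol{\xi}_i\cdot\boldsymbol{\xi}_j)\,\tau_i\tau_j, \tag{4}$$

where $\mathbf{x}\cdot\mathbf{y}$ denotes the scalar product between two vectors $\mathbf{x}$ and $\mathbf{y}$. This quantity is thus seen to consist of a part that is additive across loci with coefficients given by the scalar products $\mathbf{Q}\cdot\boldsymbol{\xi}_i$, and a pairwise epistatic part with coefficients $\boldsymbol{\xi}_i\cdot\boldsymbol{\xi}_j$.

It is instructive to decompose Equation 4 into contributions from the mutational displacements parallel and perpendicular to $\mathbf{Q}$. Writing $\boldsymbol{\xi}_i = \xi_i^{\parallel} Q^{-1}\mathbf{Q} + \boldsymbol{\xi}_i^{\perp}$ with $\mathbf{Q}\cdot\boldsymbol{\xi}_i^{\perp} = 0$, Equation 4 can be recast into the form

$$-|\mathbf{z}(\tau)|^2 = -\left(Q + \sum_{i=1}^{L} \xi_i^{\parallel}\tau_i\right)^2 - \sum_{i,j=1}^{L} (\boldsymbol{\xi}_i^{\perp}\cdot\boldsymbol{\xi}_j^{\perp})\,\tau_i\tau_j. \tag{5}$$

The first term on the right-hand side contains both additive and epistatic contributions associated with displacements along the $\mathbf{Q}$ direction. The second term is dominated by the diagonal contributions with $i = j$ and is of order $L(n-1)$ because $|\boldsymbol{\xi}_i^{\perp}|^2 = n-1$ on average.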

We now show how the first term on the right-hand side of Equation 5 can be made to vanish for a range of Q. For this purpose, consider the subset of phenotypic displacement vectors for which the component $\xi_i^{\parallel}$ in the direction of $\mathbf{Q}$ is negative. There are on average L/2 such mutations, and the expected value of each component is

$$2\int_{-\infty}^{0} dy\, \frac{y}{\sqrt{2\pi}}\, e^{-y^2/2} = -\sqrt{\frac{2}{\pi}} \equiv -2q_0, \tag{6}$$

where the factor 2 in front of the integral arises from conditioning on $\xi_i^{\parallel} < 0$. Setting $\tau_i = 1$ for s out of these L/2 vectors and $\tau_i = 0$ for all other mutations, the sum inside the brackets in Equation 5 becomes approximately equal to $-2q_0 s$, which cancels the Q term for $s = Q/(2q_0)$. Since s can be at most L/2 in a typical realization, such genotypes can be constructed with a probability approaching unity provided $Q < q_0 L$.

We will see below that the structure of the genotypic fitness landscapes induced by FGM depends crucially on whether or not the phenotypes of multiple mutants are able to closely approach the phenotypic optimum. Assuming that the contributions from the perpendicular displacements in Equation 5 can be neglected, which will be justified shortly, the simple argument given above shows that a close approach to the optimum is facile when $Q < q_0 L$, but becomes unlikely when $Q \gg q_0 L$. This observation hints at a possible transition between different types of landscape topographies at some value of Q which is proportional to L. The existence and nature of this transition is a central theme of this article.
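The cancellation argument can be illustrated numerically. The sketch below is a hedged illustration rather than part of the original analysis: the function name `residual_parallel_distance`, the seed, and the choice of simply taking the first s suitable mutations are assumptions made here. It samples only the parallel components, switches on $s = Q/(2q_0)$ mutations with negative parallel component, and reports the remaining distance along the $\mathbf{Q}$ direction.

```python
import math
import random

Q0 = 1.0 / math.sqrt(2.0 * math.pi)  # q_0 as defined through Equation 6


def residual_parallel_distance(Q, L, seed=0):
    """Switch on s = Q/(2 q_0) mutations with negative parallel component.

    Returns Q + sum_i xi_par_i * tau_i, the remaining distance along the Q
    direction, or None if fewer than s suitable mutations are available.
    """
    rng = random.Random(seed)
    parallel = [rng.gauss(0.0, 1.0) for _ in range(L)]
    negative = [v for v in parallel if v < 0.0]
    s = round(Q / (2.0 * Q0))
    if s > len(negative):
        return None  # typical outcome when Q exceeds q_0 * L
    return Q + sum(negative[:s])


if __name__ == "__main__":
    L = 2000
    print(residual_parallel_distance(0.2 * L, L))  # Q < q_0 L: small residual
    print(residual_parallel_distance(0.5 * L, L))  # Q > q_0 L: not enough mutations
```

With $Q = 0.2L$ the residual is small compared with Q (its fluctuations grow only like $\sqrt{s}$), whereas for $Q = 0.5L > q_0 L \approx 0.4L$ the required number of mutations exceeds the roughly L/2 that are available.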

Scaling limits

Since we are interested in describing complex organisms with large phenotypic and genotypic dimensions, appropriate scaling relations have to be imposed to arrive at meaningful asymptotic results. Three distinct scaling limits will be considered.

  1. Fisher’s classic result (Equation 1) shows that the distance of the wild type from the phenotypic optimum has to be increased with increasing n to maintain a nonzero fraction of beneficial mutations for $n \to \infty$. In our notation Fisher’s parameter is
    $$x = \frac{n}{2Q}, \tag{7}$$
    and hence Fisher scaling implies taking $n, Q \to \infty$ at fixed ratio $n/Q$. We will extend Fisher’s analysis by computing the probability of sign epistasis between pairs of mutations for fixed x and large n, which amounts to characterizing the shape of genotypic fitness landscapes of size $L = 2$.

  2. We have argued above that the distance toward the phenotypic optimum that can be covered by typical multiple mutations is of order L, and hence the limit $L \to \infty$ is naturally accompanied by a limit $Q \to \infty$ at fixed ratio
    $$q = \frac{Q}{L}. \tag{8}$$
    From a biological point of view, one expects that $L \gg n \gg 1$, which motivates considering the limit $L, Q \to \infty$ at constant phenotypic dimension n. Under this scaling, the first term on the right-hand side of Equation 5 is of order $L^2$, whereas the contribution from the perpendicular displacements is only $(n-1)L$. Thus in this regime the topography of the fitness landscape is determined mainly by the one-dimensional mutational displacements in the $\mathbf{Q}$ direction, which is reflected by the fact that the genotypic complexity is independent of n to leading order and coincides with its value for the case $n = 1$, in which the perpendicular contribution in Equation 5 does not exist (see Results).

  3. By contrast, the perpendicular displacements play an important role when both the phenotypic and genotypic dimensions are taken to infinity at fixed ratio
    $$\alpha = \frac{n}{L}. \tag{9}$$
    Combining this with the limit $Q \to \infty$ at fixed $q = Q/L$, both terms on the right-hand side of Equation 5 are of the same order $L^2$. Fisher’s parameter (Equation 7) is then also a constant given by $x = \alpha/(2q)$.

Preliminary considerations about genotypic fitness maxima

To set the stage for the detailed investigation of the number of genotypic fitness maxima in Results, it is useful to develop some intuition for the behavior of this quantity based on the elementary properties of FGM that have been described so far. For this purpose we consider the probability $P_{\mathrm{wt}}$ for the wild type to be a local fitness maximum, which is equal to the probability that all the L mutations are deleterious. Since mutations are statistically independent, we have

$$P_{\mathrm{wt}} = (1 - P_b)^L = 2^{-L}\left[1 + \mathrm{erf}\!\left(x/\sqrt{2}\right)\right]^L, \tag{10}$$

where $\mathrm{erf} = 1 - \mathrm{erfc}$ is the error function. Under the (highly questionable) assumption that this estimate can be applied to all $2^L$ genotypes in the landscape, we arrive at the expression

$$\mathcal{N}_{\mathrm{wt}} = 2^L P_{\mathrm{wt}} = \left[1 + \mathrm{erf}\!\left(x/\sqrt{2}\right)\right]^L \tag{11}$$

for the expected number of genotypic fitness maxima.

Consider first the scaling limit 2, where $x = n/(2Q) = n/(2qL) \to 0$. Expanding the error function for small arguments as $\mathrm{erf}(y) \approx 2y/\sqrt{\pi}$ we obtain

$$\mathcal{N}_{\mathrm{wt}} \approx \left(1 + \frac{2x}{\sqrt{2\pi}}\right)^L \to \exp\!\left(\frac{q_0 n}{q}\right) \tag{12}$$

for $L \to \infty$, where $q_0 = 1/\sqrt{2\pi}$ was defined in Equation 6. We will show below that this expression correctly captures the asymptotic behavior for very large q but generally grossly underestimates the number of maxima. The reason for this is that for moderate values of q (in particular for $q < q_0$), the relevant mutant phenotypes are much closer to the origin than the wild type, which entails a mechanism for generating a large number of fitness maxima that grows exponentially with L.

Such an exponential dependence on L is expected from Equation 11 in the scaling limit 3, where x=α/(2q) is a nonzero constant and the expression in the square brackets is >1. Although this general prediction is confirmed by the detailed analysis for this case, the behavior of the number of maxima predicted by Equation 11 will again turn out to be valid only when q is very large. In particular, whereas Equation 11 is an increasing function of α for any q, we will see below that the expected number of maxima actually decreases with increasing phenotypic dimension (hence increasing α) in a substantial range of q. In qualitative terms, this can be attributed to the effect of the perpendicular displacements in Equation 5, which grows with α and makes it increasingly more difficult for the mutant phenotypes to closely approach the origin.

The observation that the number of genotypic fitness maxima grows exponentially with L in most cases motivates us to make use of the corresponding growth rate as a measure of the ruggedness of the landscape. We therefore define the genotypic complexity $\Sigma^*$ through the limiting relation

$$\Sigma^* = \lim_{L\to\infty} \frac{\ln \mathcal{N}}{L}, \tag{13}$$

where $\mathcal{N}$ is the average number of genotypic fitness maxima and L is the sequence length. Since the total number of binary genotypes is $2^L$, the complexity is bounded from above by $\ln 2$. If every genotype had the same probability $P_{\max}$ of being a fitness maximum (which is in fact not the case for FGM), we could write $\mathcal{N} = 2^L P_{\max}$ and hence $P_{\max} \sim \exp[-(\ln 2 - \Sigma^*)L]$.
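The naive wild-type estimate of Equations 11 and 12, together with the growth rate it would imply through Equation 13, is easy to evaluate. The Python sketch below is an illustration only; the function names are hypothetical, and it evaluates the naive estimate, which the text cautions is valid only for very large q, not the true complexity of FGM.

```python
import math

Q0 = 1.0 / math.sqrt(2.0 * math.pi)  # q_0 as defined through Equation 6


def log_n_wt(x, L):
    """ln of the naive estimate N_wt = [1 + erf(x/sqrt(2))]^L (Equation 11)."""
    return L * math.log1p(math.erf(x / math.sqrt(2.0)))


def log_n_wt_fixed_n_limit(n, q):
    """ln of the large-L limit exp(q_0 n / q) in scaling limit 2 (Equation 12)."""
    return Q0 * n / q


def naive_growth_rate(alpha, q):
    """ln(N_wt)/L in scaling limit 3, where x = alpha/(2q) stays constant."""
    return log_n_wt(alpha / (2.0 * q), 1)


if __name__ == "__main__":
    n, q = 5, 1.0
    for L in (10, 100, 10000):
        x = n / (2.0 * q * L)  # Fisher parameter in scaling limit 2
        print(L, log_n_wt(x, L), log_n_wt_fixed_n_limit(n, q))
    print(naive_growth_rate(1.0, 1.0), math.log(2.0))
```

In scaling limit 2 the logarithm of the estimate saturates at $q_0 n/q$, so its growth rate vanishes, while in scaling limit 3 the growth rate is a positive constant below $\ln 2$.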

Data availability

The authors state that all data necessary for confirming the conclusions presented in the article are represented fully within the article. All numerical calculations including simulations described in this work were implemented in Mathematica and C++. When counting the number of local genotypic maxima, we checked all genotypes and counted the exact number for a randomly realized landscape, then took an average. All relevant source codes are available upon request.

Results

Preliminary note

In the following sections our results on the structure of genotypic fitness landscapes induced by FGM are stated in precise mathematical terms and the key steps of their derivation are outlined, with some technical details relegated to the appendices. To facilitate the navigation through the inevitable mathematical formalism, we display the definitions of the most commonly used mathematical symbols in Table 1. Moreover, we provide numbered summaries at the end of each subsection which state the main results without resorting to mathematical expressions.

Table 1. List of mathematical symbols.

Symbols Description
n Number of phenotypic traits, also referred to as phenotypic dimension
L Length of the binary genetic sequence, also referred to as genotypic dimension
α Ratio of phenotypic to genotypic dimension ($\alpha = n/L$)
$\mathbf{Q}$, $Q$ n-dimensional vector $\mathbf{Q}$ representing the wild-type phenotype and its magnitude $Q = |\mathbf{Q}|$
$\mathbf{q}$, $q$ Wild-type phenotype vector in units of L ($\mathbf{q} = \mathbf{Q}/L$) and its magnitude ($q = |\mathbf{q}|$)
$\tau$ Genotype represented by a binary sequence of length L
$\tau_i$ Binary number indicating absence (0) or presence (1) of a mutation at site i ($i = 1, \ldots, L$) of the genotype $\tau$
$\mathbf{z}(\tau)$, $z(\tau)$ Phenotype vector corresponding to the genotype $\tau$ (Equation 3) and its magnitude $z = |\mathbf{z}|$
$\boldsymbol{\xi}$, $\xi$ Random phenotypic displacement vector representing a mutation and its magnitude $\xi = |\boldsymbol{\xi}|$
$p(\boldsymbol{\xi})$ Probability density of the random vector $\boldsymbol{\xi}$. In this article, the density is Gaussian with unit covariance matrix
$x$ Fisher’s scaling parameter; in our notation $x = n/(2Q)$
$\mathcal{N}$ Total number of fitness maxima in a genotypic fitness landscape averaged over all realizations of sets of $\boldsymbol{\xi}$’s
$\Sigma^*$ Genotypic complexity defined as the ratio of $\ln \mathcal{N}$ to L for $L \to \infty$; see Equation 13
$q_c$ Transition point of q that separates regimes I and II
$q_0$ Half of the average mutational displacement $\xi_i$ of a single trait conditioned on being positive ($q_0 = 1/\sqrt{2\pi}$)
$\rho$ Fraction of mutations that are present in a genotype $\tau$, $\rho = L^{-1}\sum_{i=1}^{L}\tau_i$
$\rho^*$ Mean value of $\rho$ of a local maximum, also referred to as mean genotypic distance from the wild type
$z^*$ Mean value of $z(\tau)$ of a local maximum, also referred to as mean phenotypic distance from the optimum

Sign epistasis

Random mutations:

We first study the local topography of the fitness landscape around the wild type, focusing on the epistasis between two random mutations with phenotypic displacements $\boldsymbol{\xi}$ and $\boldsymbol{\eta}$. Since fitness is determined by the magnitude of a phenotypic vector, i.e., the distance of the phenotype from the origin, the epistatic effect of the two mutations can be understood by analyzing how the magnitudes of the four vectors $\mathbf{Q}$, $\mathbf{Q}+\boldsymbol{\xi}$, $\mathbf{Q}+\boldsymbol{\eta}$, and $\mathbf{Q}+\boldsymbol{\xi}+\boldsymbol{\eta}$ are ordered. To this end, we introduce the quantities

$$R_1 \equiv \frac{1}{n}\left(|\boldsymbol{\xi}+\mathbf{Q}|^2 - Q^2\right), \quad R_2 \equiv \frac{1}{n}\left(|\boldsymbol{\eta}+\mathbf{Q}|^2 - Q^2\right), \quad \text{and} \quad R \equiv \frac{1}{n}\left(|\boldsymbol{\xi}+\boldsymbol{\eta}+\mathbf{Q}|^2 - Q^2\right), \tag{14}$$

where division by n guarantees the existence of a finite limit for $n \to \infty$. The sign of these quantities determines whether a mutation is beneficial or deleterious. For example, if $R_1 < 0$, the mutation $\boldsymbol{\xi}$ is beneficial; if $R > 0$, the two mutations combined together confer a deleterious effect; and so on. We will see later that $R_{1,2}$ and $R$ are actually closely related to the selection coefficients of the respective mutations.

We proceed to express the different types of pairwise epistasis defined by Weinreich et al. (2005) and Poelwijk et al. (2007) in terms of conditions on the quantities defined in Equation 14. Without loss of generality we assume R1<R2 and consider first the case where both mutations are beneficial, R1<R2<0. Then magnitude epistasis (ME), the absence of sign epistasis, applies when the fitness of the double mutant is higher than that of each of the single mutants, i.e., R<R1<R2<0. Similarly, for two deleterious mutations the condition for ME reads R>R2>R1>0. When one mutant is deleterious and the other beneficial, in the case of ME, the double mutant fitness has to be intermediate between the two single mutants, which implies that R1<R<R2 when R2>0>R1.

The condition for RSE reads $R > R_2 > R_1$ when both single mutants are beneficial and $R < R_1 < R_2$ when both are deleterious, and the remaining possibility $R_1 < R < R_2$ corresponds to SSE between two mutations of the same sign. If the two single mutant effects are of different signs, RSE is impossible and SSE applies when $R < R_1 < 0 < R_2$ or $R > R_2 > 0 > R_1$. Figure 2 depicts the different categories of epistasis as regions in the $(R_2, R)$ plane. Note that the corresponding picture for $R_1 > R_2$ is obtained by exchanging $R_1 \leftrightarrow R_2$.
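The case distinctions above reduce to one rule: a mutation shows sign epistasis if the sign of its effect on the wild-type background differs from the sign of its effect on the background carrying the other mutation. A minimal Python sketch of this rule follows; the function name `classify_epistasis` is hypothetical, and ties (exactly equal values) are ignored because they occur with probability zero for continuous displacements.

```python
def classify_epistasis(r1, r2, r):
    """Classify a pair of mutations as 'ME', 'SSE', or 'RSE'.

    r1, r2, r are the quantities of Equation 14; a negative value means the
    corresponding genotype is fitter than the wild type.
    """
    # Mutation 1 is beneficial on the wild type if r1 < 0, and on the
    # background carrying mutation 2 if r < r2. Sign epistasis means that
    # these two statements disagree; likewise for mutation 2.
    flips_1 = (r1 < 0) != (r < r2)
    flips_2 = (r2 < 0) != (r < r1)
    if flips_1 and flips_2:
        return "RSE"
    if flips_1 or flips_2:
        return "SSE"
    return "ME"


if __name__ == "__main__":
    print(classify_epistasis(-1.0, -0.5, -2.0))  # both beneficial, R < R1 < R2
    print(classify_epistasis(-1.0, -0.5, -0.7))  # both beneficial, R1 < R < R2
    print(classify_epistasis(-1.0, -0.5, 0.3))   # both beneficial, R > R2 > R1
```

The rule is symmetric under exchange of the two mutations, so no ordering of $R_1$ and $R_2$ has to be assumed.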

Figure 2.


Domains in the $(R_2, R)$ plane contributing to different types of epistasis: ME, SSE, and RSE. The two panels illustrate the two cases: (A) $R_1 > 0$ and (B) $R_1 < 0$. The red solid lines indicate $R = R_1 + R_2$. The labeling of the domains $D_1, \ldots, D_6$ is used in the derivation in Appendix B.

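To find the probability of each type of epistasis, we require the joint probability density $P(R_1, R_2, R)$. In Appendix A it is shown that

$$P(R_1, R_2, R) = \frac{x^2 n^{1/2}}{4\sqrt{2}\,\pi^{3/2}} \exp\!\left[-\frac{n}{8}\left(-R + R_1 + R_2\right)^2 - \frac{x^2}{2}\left((R_1 - 1)^2 + (R_2 - 1)^2\right)\right] \times \left[1 + O\!\left(\frac{1}{\sqrt{n}}\right)\right], \tag{15}$$

which can be obtained rather easily by resorting to the CLT. The applicability of the CLT follows from the fact that $R_{1,2}$ and $R$ are sums of a large number of independent terms for $n \to \infty$ (Waxman and Welch 2005; Ram and Hadany 2015). According to the CLT, it is sufficient to determine the first and second cumulants of these quantities. Denoting averages by angular brackets, we find the mean $\langle R_i \rangle = 1$, the variance $\langle R_i^2 \rangle - \langle R_i \rangle^2 = 1/x^2$, and the covariance $\langle R_1 R_2 \rangle - \langle R_1 \rangle \langle R_2 \rangle = 0$ ($i = 1, 2$). Similarly, the corresponding quantities evaluated for $R - R_1 - R_2$ are $\langle R - R_1 - R_2 \rangle = 0$, $\langle (R - R_1 - R_2)^2 \rangle - \langle R - R_1 - R_2 \rangle^2 = 4/n$, and $\langle (R - R_1 - R_2) R_i \rangle - \langle R - R_1 - R_2 \rangle \langle R_i \rangle = 0$ ($i = 1, 2$). With an appropriate normalization constant, this leads directly to Equation 15.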
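The cumulants quoted above can be checked by direct sampling. The following Python sketch is an assumption-laden illustration (function names, seed, and sample sizes are chosen here): it places the wild type on the first coordinate axis with $\sigma = 1$ and $Q = n/(2x)$, draws Gaussian displacement pairs, and estimates the mean and variance of $R_1$ and the variance of $R - R_1 - R_2$. Under this Gaussian model the finite-n variance of $R_1$ works out to $1/x^2 + 2/n$, which approaches the quoted $1/x^2$ for large n.

```python
import random


def sample_r(n, x, rng):
    """Draw (R1, R2, R) of Equation 14 for Gaussian mutations with sigma = 1."""
    Q = n / (2.0 * x)  # wild-type distance for Fisher parameter x = n/(2Q)
    xi = [rng.gauss(0.0, 1.0) for _ in range(n)]
    eta = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # The wild type is placed on the first axis, Q_vec = (Q, 0, ..., 0).
    xi2 = sum(v * v for v in xi)
    eta2 = sum(v * v for v in eta)
    dot = sum(a * b for a, b in zip(xi, eta))
    r1 = (xi2 + 2.0 * Q * xi[0]) / n
    r2 = (eta2 + 2.0 * Q * eta[0]) / n
    r = (xi2 + eta2 + 2.0 * dot + 2.0 * Q * (xi[0] + eta[0])) / n
    return r1, r2, r


def moments(n, x, trials=5000, seed=0):
    """Sample mean and variance of R1, and sample variance of R - R1 - R2."""
    rng = random.Random(seed)
    draws = [sample_r(n, x, rng) for _ in range(trials)]
    r1s = [d[0] for d in draws]
    us = [d[2] - d[0] - d[1] for d in draws]
    mean_r1 = sum(r1s) / trials
    var_r1 = sum((v - mean_r1) ** 2 for v in r1s) / trials
    mean_u = sum(us) / trials
    var_u = sum((v - mean_u) ** 2 for v in us) / trials
    return mean_r1, var_r1, var_u


if __name__ == "__main__":
    n, x = 50, 0.5
    print(moments(n, x), (1.0, 1.0 / x ** 2, 4.0 / n))
```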

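As a first application, we rederive Fisher’s Equation 1 by integrating $P(R_1, R_2, R)$ over the region $R_1 < 0$ for all $R_2$ and $R$, which indeed yields

$$P_b = \int_{-\infty}^{0} dR_1 \int_{-\infty}^{\infty} dR_2 \int_{-\infty}^{\infty} dR\; P(R_1, R_2, R) = \frac{1}{2}\,\mathrm{erfc}\!\left(\frac{x}{\sqrt{2}}\right).$$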

An immediate conclusion from the form of $P(R_1, R_2, R)$ is that it is unlikely to observe sign epistasis for large n, because $P(R_1, R_2, R)$ becomes concentrated along the line $R = R_1 + R_2$ as n increases. As can be seen in Figure 2, this line touches the region of SSE in one point for $R_1 < 0$, whereas it maintains a finite distance to the region of RSE everywhere. This indicates that the probability of RSE decays more rapidly with increasing n than the probability of SSE. Moreover, one expects the latter probability to be proportional to the width of the region around the line $R = R_1 + R_2$, where the joint probability in Equation 15 has appreciable weight, which is of order $1/\sqrt{n}$.

To be more quantitative, we need to integrate $P(R_1, R_2, R)$ over the domains in Figure 2 corresponding to the different categories of epistasis. In Appendix B, we obtain the asymptotic expressions

$$P_{\mathrm{RSE}} = \frac{2x^2}{\pi n}\, e^{-x^2} + O\!\left(n^{-3/2}\right) \tag{16}$$

and

$$P_{\mathrm{SSE}} = \frac{4x}{\pi\sqrt{n}}\, e^{-x^2/2} + O\!\left(n^{-1}\right) \tag{17}$$

for the probabilities of RSE ($P_{\mathrm{RSE}}$) and SSE ($P_{\mathrm{SSE}}$). Due to the nonlinearity of the phenotype-fitness map, FGM does not allow for strictly nonepistatic combination of fitness effects. The probability of ME, therefore, is given by $P_{\mathrm{ME}} = 1 - P_{\mathrm{RSE}} - P_{\mathrm{SSE}}$. Interestingly, the probability of sign epistasis varies nonmonotonically with x. To confirm our analytic results, we compare them with simulations in Figure 3, which shows excellent agreement.
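A rough numerical check of Equations 16 and 17 can be set up along the following lines. This Python sketch is an illustration under assumptions made here (function names, seed, and sample sizes; it is not the simulation code behind Figure 3): it samples pairs of Gaussian mutations around a wild type at distance $Q = n/(2x)$, classifies each pair by comparing effect signs on the two backgrounds, and returns the observed frequencies next to the leading-order formulas.

```python
import math
import random


def asymptotic_probabilities(n, x):
    """Leading-order (P_RSE, P_SSE) from Equations 16 and 17."""
    p_rse = 2.0 * x * x / (math.pi * n) * math.exp(-x * x)
    p_sse = 4.0 * x / (math.pi * math.sqrt(n)) * math.exp(-x * x / 2.0)
    return p_rse, p_sse


def simulated_probabilities(n, x, trials=8000, seed=0):
    """Monte Carlo (P_RSE, P_SSE) for pairs of Gaussian mutations, sigma = 1."""
    rng = random.Random(seed)
    Q = n / (2.0 * x)  # wild type at (Q, 0, ..., 0), so that x = n/(2Q)
    n_rse = n_sse = 0
    for _ in range(trials):
        xi = [rng.gauss(0.0, 1.0) for _ in range(n)]
        eta = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # Changes in squared distance to the optimum: n*R1, n*R2, and n*R.
        a = sum(v * v for v in xi) + 2.0 * Q * xi[0]
        b = sum(v * v for v in eta) + 2.0 * Q * eta[0]
        c = a + b + 2.0 * sum(u * v for u, v in zip(xi, eta))
        # A mutation shows sign epistasis if its effect sign differs between
        # the wild-type background and the background with the other mutation.
        flips_1 = (a < 0.0) != (c < b)
        flips_2 = (b < 0.0) != (c < a)
        if flips_1 and flips_2:
            n_rse += 1
        elif flips_1 or flips_2:
            n_sse += 1
    return n_rse / trials, n_sse / trials


if __name__ == "__main__":
    print(simulated_probabilities(40, 0.5), asymptotic_probabilities(40, 0.5))
```

Because RSE is rare at these parameter values, many more samples than used here are needed to resolve $P_{\mathrm{RSE}}$ accurately; the SSE frequency converges faster.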

Figure 3. Comparison of analytic results for the probability of epistasis with simulations. Depicted are probabilities of SSE ($P_{\mathrm{SSE}}$) and RSE ($P_{\mathrm{RSE}}$) between two randomly chosen mutations among nearest neighbor genotypes of the wild type (A) as functions of $n$ for fixed Fisher parameter $x = 0.5$ and (B) as functions of $x$ for fixed phenotypic dimension $n = 640$. For each parameter set, $10^4$ randomly generated landscapes were analyzed. The asymptotic expressions provide accurate approximations even for moderate $n > 10$. The nonmonotonic behavior with respect to $x$ means that the probabilities are nonmonotonic functions of $Q$ for fixed $n$ and vice versa.

Similarly, we can calculate the probabilities of sign epistasis conditioned on both mutations being beneficial, which in our setting means $R_1, R_2 < 0$. The conditioning requires normalization by the unconditional probability of two random mutations being beneficial, which is given by the square of $P_b$ in Equation 1. Hence

$$P_{\mathrm{RSE}}^{b} = \frac{2\Pr(D_1)}{P_b^2} \approx \frac{4x^2}{\pi n\, \mathrm{erfc}(x/\sqrt{2})^2}\, e^{-x^2} \tag{18}$$

and

$$P_{\mathrm{SSE}}^{b} = \frac{2\Pr(D_5)}{P_b^2} \approx \frac{4x}{\pi \sqrt{n}\, \mathrm{erfc}(x/\sqrt{2})}\, e^{-x^2/2}, \tag{19}$$

where $\Pr(D_i)$ denotes the integral of the joint probability density over the domain $D_i$ in Figure 2 (see Appendix B).

As anticipated from the form of Equation 15, the fraction of sign-epistatic pairs of mutations decreases with increasing phenotypic dimension $n$, and this decay is faster for RSE ($\sim 1/n$) than for SSE ($\sim 1/\sqrt{n}$). At first glance this might seem to suggest that FGM has little potential for generating rugged genotypic fitness landscapes. However, as we will see below, the results obtained in this section apply only to the immediate neighborhood of the wild-type phenotype. They are modified qualitatively in the presence of a large number of mutations that are able to substantially displace the phenotype and allow it to approach the phenotypic optimum.

Mutations of fixed effect size:

As a slight variation to the previous setting, one may consider the fraction of sign epistasis conditioned on the two single mutations having the same selection strength, as recently investigated by Schoustra et al. (2016). In our notation this implies that $R_1 = R_2 \equiv \tilde{R}$, and it is easy to see that sign epistasis is always reciprocal in this case. If the two mutations are beneficial, $\tilde{R} < 0$, and the condition for (reciprocal) sign epistasis is $R > \tilde{R}$. The corresponding probability is

$$P_{\mathrm{RSE}}(\tilde{R}) = \frac{\int_{\tilde{R}}^{\infty} P(\tilde{R}, \tilde{R}, R)\, dR}{\int_{-\infty}^{\infty} P(\tilde{R}, \tilde{R}, R)\, dR} = \frac{1}{2}\,\mathrm{erfc}\!\left(-\frac{\sqrt{n}\,\tilde{R}}{2\sqrt{2}}\right). \tag{20}$$

Following the same procedure for deleterious mutations ($\tilde{R} > 0$) one finds that the probability is actually symmetric around $\tilde{R} = 0$ and hence depends only on $|\tilde{R}|$.

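To express $P_{\mathrm{RSE}}$ in terms of the selection coefficient of the single mutations, we introduce a Gaussian phenotypic fitness function of the form

$$W(y) = W_0 \exp(-\lambda |y|^2), \tag{21}$$

where $\lambda > 0$ is a measure for the strength of selection. The selection coefficient of a mutation with phenotypic effect $\xi$ is then given by

$$S = \ln\!\left[\frac{W(Q + \xi)}{W(Q)}\right] = -\lambda\left(|Q + \xi|^2 - |Q|^2\right) = -\lambda n \tilde{R}. \tag{22}$$

To fix the value of $\lambda$ we note that the largest possible selection coefficient, which is achieved for mutations that reach the phenotypic optimum, is $S_0 = \lambda Q^2$, and hence the single-mutation effect $\tilde{R}$ entering Equation 20 is related to the selection coefficient through $\tilde{R} = -(Q^2/n)(S/S_0)$. With this substitution, and using $Q = n/(2x)$ together with the symmetry in $\tilde{R}$, the result in Equation 20 becomes

$$P_{\mathrm{RSE}}(S) = \frac{1}{2}\,\mathrm{erfc}\!\left(\frac{n^{3/2}}{8\sqrt{2}\,x^2}\,\frac{|S|}{S_0}\right). \tag{23}$$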

The probability of sign epistasis conditioned on selection strength takes on its maximal value $P_{\mathrm{RSE}} = 1/2$ in the neutral limit $S \to 0$ and decreases monotonically with $|S|$. Similar to the results of Equations 16, 17, and 18 for unconstrained mutations, it also decreases with increasing phenotypic dimension $n$ when $S$ and $x$ are kept fixed.
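The dependence on the selection coefficient can be made concrete with a minimal Python sketch of Equations 20 and 23. It assumes the relations used in this section, $Q = n/(2x)$ and a single-mutation effect of magnitude $(Q^2/n)\,|S|/S_0$; the function names are illustrative.

```python
import math


def p_rse_from_R(R_single, n):
    """Equation 20 written in terms of the magnitude of the common single effect."""
    return 0.5 * math.erfc(math.sqrt(n) * abs(R_single) / (2.0 * math.sqrt(2.0)))


def p_rse_equal_effect(s_ratio, n, x):
    """Equation 23; s_ratio is S / S0 for the two equal-effect mutations."""
    arg = n ** 1.5 / (8.0 * math.sqrt(2.0) * x * x) * abs(s_ratio)
    return 0.5 * math.erfc(arg)


if __name__ == "__main__":
    n, x = 100, 1.0
    for s in (0.0, 0.005, 0.01, 0.02):
        print(s, p_rse_equal_effect(s, n, x))
```

The printed values start at 1/2 for $S = 0$ and fall off rapidly, which reflects the factor $n^{3/2}$ in the argument of the complementary error function.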

In a previous numerical study carried out at finite $Q$ and $n$, it was found that $P_{\mathrm{RSE}}$ varies nonmonotonically with $S$ for the case of beneficial mutations, and displays a second peak at the maximum selection coefficient $S = S_0$ (Schoustra et al. 2016). The two peaks were argued to reflect the two distinct mechanisms giving rise to sign epistasis within FGM (Blanquart et al. 2014). Mutations of small effect correspond to phenotypic displacements that proceed almost perpendicularly to the direction of the phenotypic optimum, and sign epistasis is generated through antagonistic pleiotropy. On the other hand, for mutations of large effect, the dominant mechanism for sign epistasis is through overshooting of the phenotypic optimum. Because of the Fisher scaling implemented in this section with $Q, n \to \infty$ at fixed $x = n/(2Q)$, the second class of mutations cannot be captured by our approach and only the peak at small $S$ remains. Figure 4A shows the full two-peak structure for a few representative values of $n$, and Figure 4B illustrates the convergence to the asymptotic expression Equation 23 for the left peak. Using the results of Schoustra et al. (2016), it can be shown that the right peak becomes a step function for $n \to \infty$, displaying a discontinuous jump from $P_{\mathrm{RSE}} = 0$ to $P_{\mathrm{RSE}} = 1$ at $S/S_0 = 8/9 = 0.888\ldots$.

Figure 4. Probability of RSE, $P_{\mathrm{RSE}}$, conditioned on the selection coefficients $S$ of the two single mutations being equal and positive: (A) for the full range of $S$ on a linear scale and (B) for $S/S_0$ smaller than 0.2 on a semilogarithmic scale. Here, the fitness of a phenotype $y$ is assumed to be $W(y) = W_0 \exp(-\lambda |y|^2)$, where the parameter $\lambda$ is related to the maximal beneficial selection coefficient $S_0$ through the relation $S_0 = \lambda Q^2$. Dashed lines depict the asymptotic expression Equation 23, and solid lines were obtained numerically using the Gaussian approximation for the distribution of epistasis developed by Schoustra et al. (2016).

Summary 1:

When the phenotypic dimension $n$ is large and the Fisher parameter $x$ is moderate, the probability of RSE decays as $1/n$, while that of SSE decays as $1/\sqrt{n}$. Although these probabilities decrease monotonically with $n$ at fixed $x$, they have a nonmonotonic behavior as a function of $x$: For small $x$ they increase with $x$ and for large $x$ they decrease with $x$ (see Figure 3). Under the pleiotropic scaling adopted in this work, this implies that the probabilities are nonmonotonic functions of the wild-type distance $Q$ at fixed $n$ and vice versa. In contrast, under the total effect model, where both the wild-type distance $Q$ and $x$ scale as $\sqrt{n}$, the probabilities decrease monotonically and exponentially with $n$.

Genotypic complexity at a fixed phenotypic dimension

In this section, we are interested in the number of local maxima in the genotypic fitness landscape. We focus on the expected number of maxima, which we denote by N, and analyze how this quantity behaves in the limit of large genotypic dimension, L, when the phenotypic dimension n is fixed. For the sake of clarity, the (unique) maximum of the phenotypic fitness landscape will be referred to as the phenotypic optimum throughout.

The number of local fitness maxima:

Since fitness decreases monotonically with the distance to the phenotypic optimum, a genotype τ is a local fitness maximum if the corresponding phenotype defined by Equation 3 satisfies

$$|z(\tau)| < |z(\tau) + (1 - 2\tau_i)\,\xi_i| \tag{24}$$

for all $1 \le i \le L$. The phenotype vector appearing on the right-hand side of this inequality arises from $z(\tau)$, either by removing a mutation vector that is already part of the sum in Equation 3 ($\tau_i = 1$) or by adding a mutation vector that was not previously present ($\tau_i = 0$). The condition in Equation 24 is obviously always fulfilled if $z(\tau) = 0$, that is, if the phenotype is optimal, and we will see that in general the probability for this condition to be satisfied is larger the more closely the phenotype approaches the origin. A graphical illustration of the condition in Equation 24 is shown in Figure 5.
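The condition in Equation 24 translates directly into a brute-force count of local maxima for small landscapes. The Python sketch below assumes unit-variance Gaussian mutation vectors and a wild-type phenotype placed at distance $Q$ along the first axis; the function names are illustrative, and the exhaustive enumeration over all $2^L$ genotypes is only practical for small $L$.

```python
import itertools
import math
import random


def random_landscape(L, n, Q, rng):
    """Wild-type phenotype at distance Q plus L Gaussian mutation vectors."""
    wild = [Q] + [0.0] * (n - 1)
    muts = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(L)]
    return wild, muts


def phenotype(tau, wild, muts):
    """z(tau): wild type plus the sum of all mutation vectors with tau_i = 1."""
    z = list(wild)
    for t, xi in zip(tau, muts):
        if t:
            z = [zc + c for zc, c in zip(z, xi)]
    return z


def norm(v):
    return math.sqrt(sum(c * c for c in v))


def is_local_max(tau, wild, muts):
    """Equation 24: every one-mutant neighbor lies farther from the origin."""
    z = phenotype(tau, wild, muts)
    r = norm(z)
    for t, xi in zip(tau, muts):
        sign = 1 - 2 * t   # +1 adds an absent mutation, -1 removes a present one
        neighbor = [zc + sign * c for zc, c in zip(z, xi)]
        if norm(neighbor) <= r:
            return False
    return True


def local_maxima(wild, muts):
    """All genotypes (tuples of 0/1) that satisfy Equation 24."""
    L = len(muts)
    return [tau for tau in itertools.product((0, 1), repeat=L)
            if is_local_max(tau, wild, muts)]


if __name__ == "__main__":
    rng = random.Random(0)
    L, n, q = 10, 2, 0.4
    wild, muts = random_landscape(L, n, q * L, rng)
    print(len(local_maxima(wild, muts)), "local maxima")
```

Averaging the number of maxima over many such realizations gives a simulation estimate of the expected number of maxima discussed in the remainder of this section.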

Figure 5. Illustration of the condition for a genotype to be a local fitness maximum. The circle encloses phenotypes that have higher fitness than the focal phenotype $z(\tau)$. For $\tau$ to be a genotypic fitness maximum, both a phenotype with a further mutation (dash-dotted green arrow) and a phenotype without one of the mutations in $\tau$ (red segment and blue dotted arrows) should lie outside the circle.

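The ability of a phenotype $z(\tau)$ to approach the origin clearly depends on the number $s = \sum_{i=1}^{L} \tau_i$ of mutant vectors it is composed of, and all phenotypes with the same number of mutations are statistically equivalent. The expected number of fitness maxima can therefore be decomposed as

$$N = \sum_{s=0}^{L} \binom{L}{s}\, \mathcal{P}_s(L), \tag{25}$$

where $\binom{L}{s}$ is the number of possible combinations of $s$ out of $L$ mutation vectors and $\mathcal{P}_s(L)$ is the probability that a genotype with $s$ mutations is a fitness maximum. The latter can be written as

$$\mathcal{P}_s(L) = \int_n dz \left[\prod_{i=s+1}^{L} \int_{D(-z)} d\xi_i\, p(\xi_i)\right] \times \left[\prod_{i=1}^{s} \int_{D(z)} d\xi_i\, p(\xi_i)\right] \delta\!\left(z - Q - \sum_{i=1}^{s} \xi_i\right), \tag{26}$$

with

$$D(y) \equiv \left\{\xi \in \mathbb{R}^n \,\middle|\, |\xi - y| > |y|\right\}. \tag{27}$$

Here and below, $\int_n$ stands for the integral over $\mathbb{R}^n$.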

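Equation 26 can be understood as follows. First, the $\delta$ function $\delta(z - Q - \sum_{i=1}^{s} \xi_i)$ constrains $z$ to be the phenotype of $\tau$ as defined in Equation 3. Next, the integration domains of the $\xi_i$ reflect the condition in Equation 24. Assuming, without loss of generality, that the $L$ genetic loci are ordered such that $\tau_i = 1$ for $i \le s$ and $\tau_i = 0$ for $i > s$, the maximum condition for $i \le s$ requires $|z| < |z - \xi_i|$, so the integration domain should be $D(z)$; whereas for $i > s$ the condition is $|z| < |z + \xi_i|$, corresponding to the integration domain $D(-z)$. Using the integral representation of the $\delta$ function,

$$\delta(y) = \frac{1}{(2\pi)^n} \int_n dk\, \exp(i k \cdot y), \tag{28}$$

we can write the probability $\mathcal{P}_s(L)$ that a genotype with $s$ mutations is a fitness maximum as

$$\mathcal{P}_s(L) = \int_n \int_n \frac{dz\, dk}{(2\pi)^n} \exp[i k \cdot (z - Q)]\, F(k, z)^s \times F(0, -z)^{L-s}, \tag{29}$$

where

$$F(k, z) \equiv \int_{D(z)} d\xi\, p(\xi) \exp(-i k \cdot \xi). \tag{30}$$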

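It was argued on qualitative grounds in Model that phenotypes that approach arbitrarily close to the origin are easily generated when the scaled wild-type distance $q$ is small, but they become rare for large $q$. As a consequence, it turns out that the main contribution to the integral over $z$ in Equation 29 comes from the region around the origin $z = 0$ for small $q$, but shifts to a distance $z \sim L$ along the $Q$ direction for large $q$. To account for this possibility, it is necessary to divide the integral domain into two parts, $|z| < z_0$ and $|z| > z_0$, where $z_0$ is an arbitrary nonzero number with $z_0/L \to 0$ as $L \to \infty$. Thus, we write the probability $\mathcal{P}_s(L)$ that a genotype with $s$ mutations is a fitness maximum as

$$\mathcal{P}_s(L) = \mathcal{P}_s^{<}(L) + \mathcal{P}_s^{>}(L), \tag{31}$$

where

$$\begin{aligned}
\mathcal{P}_s^{<}(L) &= \int_{|z| < z_0} dz \int_n \frac{dk}{(2\pi)^n}\, e^{i k \cdot (z - Q)}\, F(k, z)^s\, F(0, -z)^{L-s},\\
\mathcal{P}_s^{>}(L) &= \int_{|z| > z_0} dz \int_n \frac{dk}{(2\pi)^n}\, e^{i k \cdot (z - Q)}\, F(k, z)^s\, F(0, -z)^{L-s},
\end{aligned} \tag{32}$$

and correspondingly define $N_{<}$ and $N_{>}$ as

$$N_{<} = \sum_{s} \binom{L}{s}\, \mathcal{P}_s^{<}(L) \quad \text{and} \quad N_{>} = \sum_{s} \binom{L}{s}\, \mathcal{P}_s^{>}(L). \tag{33}$$

The total number of local maxima is then $N = N_{<} + N_{>}$.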

Regime I:

We first consider the contribution $\mathcal{P}_s^{<}(L)$ from the region $|z| < z_0$. Expanding $F(k, z)$ around the origin $z = 0$, we show in Appendix C that

$$\mathcal{P}_s^{<}(L) \approx \frac{s^{-n/2} \exp[-Q^2/(2s)]}{s \exp[-Q^2/(2s^2)] + L - s}. \tag{34}$$

For an interpretation of Equation 34 it is helpful to refer to Figure 5. Note first that the probability that $z = Q + \sum_{i=1}^{s} \xi_i$ lies in the ball $|z| < \zeta$ with radius $\zeta \ll 1$ is

$$\mathrm{Prob}(|z| < \zeta) \approx \frac{V_n}{(2\pi)^{n/2}}\, s^{-n/2} \exp[-Q^2/(2s)], \tag{35}$$

where $V_n(\zeta) \sim \zeta^n$ is the volume of the ball. We need to estimate how small $\zeta$ has to be for $\tau$ to be a local fitness maximum with an appreciable probability. Since the $s$ random vectors contributing to $z$ are statistically equivalent, it is plausible to assume that their average component parallel to $Q$ is $\xi_i \approx -Q/s$. We further assume that the conditional probability density $p_s(\xi)$ of these vectors, conditioned on their sum $z$ reaching the ball around the origin, can be approximated by a Gaussian, which consequently has the form

$$p_s(\xi) \approx \frac{1}{(2\pi)^{n/2}} \exp\!\left(-\frac{1}{2}\left|\xi + \frac{Q}{s}\right|^2\right). \tag{36}$$

For $z$ to be a phenotype vector of a local maximum, all these random vectors should lie in the region $D(z)$ and the remaining (unconstrained) $L - s$ vectors should lie in $D(-z)$. This event happens with the probability given in Equation 37. Thus, we can estimate the typical value of $\zeta$ as the solution of

$$\frac{V_n(\zeta)}{(2\pi)^{n/2}} \left\{ s \exp[-Q^2/(2s^2)] + L - s \right\} \sim 1, \tag{38}$$

which, combined with Equation 35, indeed gives Equation 34.

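To find the asymptotic behavior of $N_{<}$ for large $L$, we use Stirling’s formula in Equation 33 and approximate the summation over $s$ by an integral over $\rho \equiv s/L$. This yields

$$N_{<} \approx \int_0^1 d\rho\, \frac{1}{L^{n/2} \rho^{n/2}}\, \frac{e^{L \Sigma(\rho)}}{\sqrt{2\pi L \rho (1 - \rho)}}\, \frac{1}{1 - \rho + \rho\, e^{-q^2/(2\rho^2)}}, \tag{39}$$

where the exponent $\Sigma(\rho)$ is given by

$$\Sigma(\rho) \equiv -\rho \ln \rho - (1 - \rho) \ln(1 - \rho) - \frac{q^2}{2\rho}. \tag{40}$$

Under the condition $L \gg 1$, the remaining integral with respect to $\rho$ can be performed by expanding $\Sigma(\rho)$ to the second order around the saddle point $\rho^*$ determined by the condition

$$0 = \left.\partial_\rho \Sigma(\rho)\right|_{\rho = \rho^*} = \frac{q^2}{2(\rho^*)^2} - \ln \frac{\rho^*}{1 - \rho^*}. \tag{41}$$

Performing the resulting Gaussian integral with respect to $\rho$ one finally obtains

$$N_{<} \approx \frac{1}{L^{1 + n/2}}\, \frac{1}{\sqrt{1 + (1 - \rho^*)(q/\rho^*)^2}}\, \frac{(\rho^*)^{-n/2}\, e^{L \Sigma(\rho^*)}}{1 - \rho^* + \rho^*\, e^{-q^2/[2(\rho^*)^2]}}, \tag{42}$$

where $\rho^* = \rho^*(q)$ is the solution of Equation 41, which is the (scaled) mean number of mutations in a local maximum. We will call $\rho^*$ the mean genotypic distance. This solution is not available in closed form, but it can be shown that $\rho^* = (1/2) + (q^2/2) + O(q^4)$ and $\Sigma(\rho^*) = \ln 2 - q^2 + O(q^4)$ for small $q$. Figure 6 compares Equation 42 with the mean number of local maxima obtained by numerical simulations for various values of $q$ with $n = 1$, and shows excellent agreement even for $L = 10$.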
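Because Equation 41 has no closed-form solution, it is convenient to solve it numerically. The Python sketch below uses bisection, which is applicable because the right-hand side of Equation 41 decreases monotonically from $+\infty$ to $-\infty$ on $0 < \rho < 1$. It also evaluates Equation 42 and locates the value of $q$ at which $\Sigma(\rho^*)$ changes sign. The function names, bracketing intervals, and tolerances are choices made for this example.

```python
import math


def sigma(rho, q):
    """Exponent of Equation 40."""
    return (-rho * math.log(rho) - (1.0 - rho) * math.log(1.0 - rho)
            - q * q / (2.0 * rho))


def rho_star(q, tol=1e-14):
    """Solve Equation 41 for the saddle point by bisection on (0, 1)."""
    def f(rho):
        return q * q / (2.0 * rho * rho) - math.log(rho / (1.0 - rho))

    lo, hi = 1e-12, 1.0 - 1e-12
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def n_less(L, n, q):
    """Asymptotic number of maxima from Equation 42."""
    r = rho_star(q)
    prefactor = L ** (-(1.0 + n / 2.0)) / math.sqrt(1.0 + (1.0 - r) * (q / r) ** 2)
    weight = r ** (-n / 2.0) * math.exp(L * sigma(r, q))
    return prefactor * weight / (1.0 - r + r * math.exp(-q * q / (2.0 * r * r)))


def critical_q(tol=1e-10):
    """Value of q at which sigma(rho_star(q), q) changes sign."""
    lo, hi = 0.5, 1.5
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if sigma(rho_star(mid), mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    for q in (0.0, 0.2, 0.4, 0.6):
        print(q, rho_star(q), sigma(rho_star(q), q), n_less(10, 1, q))
    print("sign change of the exponent at q =", critical_q())
```

The outer bisection relies on $\Sigma(\rho^*)$ being a decreasing function of $q$, which follows from $d\Sigma(\rho^*)/dq = -q/\rho^* < 0$ at the saddle point.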

Figure 6. Plots of the mean number of local maxima $N$ as a function of the genotypic dimension $L$ for $q = 0$, 0.2, 0.4, and 0.6 with $n = 1$ on a semilogarithmic scale. Data from numerical simulations are represented as dots, and the analytical prediction of Equation 42 is shown as solid lines. Each dot represents the average over $10^5$ realizations of landscapes. In this parameter regime, $N$ grows exponentially with $L$ and the growth rate (i.e., the slopes of the lines) decreases with increasing $q$.

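$$\left[\int_{D(z)} d\xi\, p_s(\xi)\right]^{s} \left[\int_{D(-z)} d\xi\, p(\xi)\right]^{L-s} \approx \left\{1 - \frac{V_n}{(2\pi)^{n/2}} \exp[-Q^2/(2s^2)]\right\}^{s} \left[1 - \frac{V_n}{(2\pi)^{n/2}}\right]^{L-s} \approx \exp\!\left[-\frac{V_n}{(2\pi)^{n/2}} \left\{ s \exp[-Q^2/(2s^2)] + L - s \right\}\right]. \tag{37}$$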

It is obvious that $\Sigma(\rho)$ will eventually be negative as $q$ increases for any value of $\rho$, and this must be true also for the maximum value $\Sigma(\rho^*)$. Indeed, we found the threshold $q_c \approx 0.924809$, above which $\Sigma(\rho^*)$ is negative. This signals a phase transition in the landscape properties. Inspection of Equation 40 shows that the transition is driven by a competition between the abundance of genotypes with a certain number of mutations and their likelihood to bring the phenotype close to the optimum. The first two terms in the expression for $\Sigma(\rho)$ are the standard sequence entropy (see, for example, Schmitt and Herzel 1997), which is maximal at $\rho = 1/2$ ($s = L/2$), whereas the last term represents the statistical cost associated with “stretching” the phenotype toward the origin. With increasing $q$, the genotypes contributing to the formation of local maxima become increasingly atypical, in the sense that they contain more than the typical fraction $\rho = 1/2$ of mutations, and $\rho^*$ increases. For $q > q_c$, the cost can no longer be compensated by the entropy term and $\Sigma(\rho^*)$ becomes negative. In this regime $N_{<}$ decreases exponentially with $L$, and therefore the total number of fitness maxima $N$, which by construction cannot be smaller than 1, must be dominated by the second contribution $N_{>}$.

Regime II:

We defer the detailed derivation of $N_{>}$ to Appendix C and here only report the final result obtained in the limit $L \to \infty$, which is independent of $L$ and reads

$$N_{>} \approx \left[\frac{q - q_0}{q} \exp\!\left(\frac{1}{q/q_0 - 1}\right)\right]^{n-1}. \tag{43}$$

This expression is valid for $q > q_0 = 1/\sqrt{2\pi} \approx 0.399$, but it dominates the contribution $N_{<}$ for large $L$ only when $q > q_c$. Figure 7 indeed shows that Equation 43 approximates the mean number of local maxima for $q > q_c$, that is, $N$ converges to $N_{>}$ for large $L$. This figure also shows, as is clear from Equation 43, that $N$ is an increasing (decreasing) function of $n$ ($q$) for a fixed value of $q$ ($n$). The expected number of maxima is small in absolute terms in this regime, which can be attributed to the fact that the expression inside the brackets in Equation 43 takes the value 1.214 at $q = q_c$, and decreases rapidly toward unity for larger $q$.
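The magnitudes quoted here are easy to reproduce. The following Python sketch evaluates Equation 43 with $q_0 = 1/\sqrt{2\pi}$; the function names are illustrative, and the value of $q_c$ is taken from the text rather than recomputed.

```python
import math

Q0 = 1.0 / math.sqrt(2.0 * math.pi)   # q0 = 1 / sqrt(2 pi), about 0.399


def n_greater_base(q):
    """Bracketed expression of Equation 43, defined for q > q0."""
    if q <= Q0:
        raise ValueError("Equation 43 requires q > q0")
    return (q - Q0) / q * math.exp(1.0 / (q / Q0 - 1.0))


def n_greater(q, n):
    """Equation 43: asymptotic mean number of maxima in regime II."""
    return n_greater_base(q) ** (n - 1)


if __name__ == "__main__":
    q_c = 0.924809                     # threshold quoted in the text
    print("base at q_c:", n_greater_base(q_c))
    for n in (1, 2, 4, 8):
        print(n, n_greater(1.0, n))
```

Writing $t = q_0/q$, the logarithm of the bracketed expression is $\ln(1 - t) + t/(1 - t)$, which vanishes at $t = 0$ and increases with $t$; the base therefore exceeds 1 for all $q > q_0$, consistent with $N_{>}$ growing with $n$.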

Figure 7. Comparison of simulation results (symbols) of the mean number of local maxima $N$ with analytic approximations (lines) for $q > q_c$. Each symbol is the result of averaging over $2 \times 10^6$ realizations. (A) $N$ is shown to increase with $n$ for fixed $q$. (B) $N$ is shown to decrease with $q$ for fixed $n$. (C) Deviation of the analytic expression from the simulation results, defined as $1 - (N_{\mathrm{data}}/N_{\mathrm{theory}})$, is depicted as a function of $L$ on a double logarithmic scale. The phenotypic dimension for this panel is $n = 4$, where the largest deviations are observed in (A). The deviation decreases inversely with $L$ as indicated by the black dashed line with slope $-1$.

To understand the appearance of $q_0$, we refer to Model, where it was argued that $2 q_0 s$ is the maximal distance toward the origin, which can be covered by a phenotype made up of $s$ typical mutation vectors. Correspondingly, the analysis in Appendix C shows that the main contribution to $\mathcal{P}_s^{>}(L)$ comes from phenotypes located at a distance $z = 2s(q - q_0)$ from the origin, i.e., at a distance $2 s q_0$ from the wild type. The sum over $s$ in Equation 33 is dominated by typical genotypes with $s = L/2$, and therefore the main contribution to $N_{>}$ comes from phenotypes at a distance $z = (q - q_0)L$ from the origin. The seeming divergence of $N_{>}$ as $q \to q_0^{+}$ is an artifact of the approximation scheme, which assumes that the main contribution comes from the region where $z \sim O(L)$; clearly this assumption becomes invalid when $q \to q_0^{+}$. We note that for very large $q$ and large $n$, Equation 43 reduces to the expression $N_{\mathrm{wt}}$ obtained in Equation 11 on the basis of Fisher’s formula for the fraction of beneficial mutations from the wild-type phenotype.

Phase transition:

To sum up, the leading behavior of N is

$$N = \begin{cases} N_{<}, & q < q_c,\\ N_{>}, & q \ge q_c, \end{cases} \tag{44}$$

with $N_{<}$ and $N_{>}$ given by Equation 42 and Equation 43, respectively. Since $N_{<}$ decreases to zero with $L$ in a power-law fashion at $q = q_c$, the dominant contribution at this value is $N_{>}$. At $q = q_c$, the mean genotypic distance $\rho^*$ jumps discontinuously from $\rho^*(q_c) \approx 0.7035$ to $\rho^* = 1/2$; and the mean phenotypic distance $z^*$, which is defined as the averaged magnitude of phenotype vectors for local maxima, jumps from $z^* \approx 0$ to $z^* = (q_c - q_0)L$. The genotypic complexity $\Sigma^*$ defined in Equation 13 is given by

$$\Sigma^* = \begin{cases} \Sigma(\rho^*), & q < q_c,\\ 0, & q \ge q_c, \end{cases} \tag{45}$$

where $\rho^*$ is the solution of Equation 41, and hence $\Sigma^*$ vanishes continuously at $q = q_c$. These results are graphically represented in Figure 8. Recall that the value $\Sigma^* = \ln 2$ attained at $q = 0$ is the largest possible, because the total number of genotypes is $2^L = \exp(L \ln 2)$. Remarkably, these leading order results are independent of the phenotypic dimension. A dependence on $n$ emerges at the subleading order, and it affects the number of fitness maxima in qualitatively different ways in the two phases. For $q < q_c$, the preexponential factor in Equation 42 is a power law in $L$ with exponent $-(1 + n/2)$ and hence decreases with increasing $n$; whereas the expression in Equation 43 describing the regime $q > q_c$ increases exponentially with $n$.

Figure 8. Plot of the genotypic complexity $\Sigma^*$ as a function of the scaled phenotypic wild-type distance $q$. Here the phenotypic dimension $n$ is kept finite while taking the genotypic dimension $L$ to infinity. The complexity vanishes at the phase transition point $q = q_c \approx 0.924809$. Inset: Plot of the mean genotypic distance $\rho^*$ of local maxima from the wild type as a function of $q$. Starting from 1/2, $\rho^*$ increases with $q$ for $q < q_c$ and remains at 1/2 for $q > q_c$.

Interpretation:

The phase transition reflects a shift between two distinct mechanisms for generating genotypic complexity in FGM, which are analogous to the two origins of pairwise sign epistasis that were identified by Blanquart et al. (2014) and discussed above in Sign epistasis. In regime I ($q < q_c$), the mutant phenotype closely approaches the origin and multiple fitness maxima are generated by overshooting the phenotypic optimum. By contrast, in regime II ($q > q_c$), the phenotypic optimum cannot be reached and the genotypic complexity arises from the local curvature of the fitness isoclines. These two situations are exemplified by the two panels of Figure 1. For the sake of brevity, in the following discussion we will refer to the two mechanisms as mechanism I and mechanism II, respectively.

The approach to the origin in regime I is a largely one-dimensional phenomenon governed by the components of the mutation vector along the direction of the wild-type phenotype $Q$, which explains why the leading order behavior of the genotypic complexity is independent of $n$. For $q < q_c$, the $n$ dependence of the preexponential factor in Equation 42 arises from the increasing difficulty of the random walk formed by the mutational vectors to locate the origin in high dimensions. By contrast, mechanism II operating for $q > q_c$ relies on the existence of the transverse dimensions, which is the reason why $N_{>}$ in Equation 43 is an increasing function of $n$ with $N_{>} = 1$ for $n = 1$.

When $q_0 < q < q_c$, both mechanisms seem to be present simultaneously. As our analysis is restricted to the average number of local maxima, at this point we cannot decide whether both mechanisms appear in a single realization of the fitness landscape, or if one of them dominates for a given realization. To answer this question, we generated $10^4$ fitness landscapes randomly for given parameter sets and identified all local maxima for each landscape. We then determined the number of local maxima and averaged the phenotypic distance of the local maxima to the optimum for each realization. This mean distance will be denoted by $\bar{z}$ and is itself a random variable; it should not be confused with the mean phenotypic distance $z^*$, which is calculated by taking an average over all fitness peaks in all realizations, giving the same weight to each peak. The results are depicted as a two-dimensional histogram in Figure 9A.

Figure 9. Coexistence of the two mechanisms I and II for $q_0 < q < q_c$. (A) Two-dimensional histogram of the number of fitness maxima and the average phenotypic distance of the maxima to the optimum within a single realization. Here $L = 15$ and $n = 2$ are used and $10^4$ different landscapes are randomly generated for each value of $q$. Only a small number of realizations have a small average distance but these contribute an exceptionally large number of fitness peaks. (B) Two examples of genotype–phenotype maps selected from realizations with $q = 0.5$, $L = 6$, and $n = 2$. The wild-type phenotype is marked by a green ▴ and local fitness maxima by red ▪’s. When the phenotypes of the local fitness maxima are close to (far away from) the origin, the number of maxima is large (small), which corresponds to mechanism I (II).

The figure shows that the marginal distribution of the per-realization mean distance $\bar{z}$ displays a pronounced peak around $\bar{z}/L \approx q - q_0$, which corresponds to the behavior that is typical of mechanism II. For most realizations, $\bar{z}/L$ deviates significantly from zero and only a small number of landscapes have local maxima near $z = 0$. However, these landscapes have many more maxima than typical landscapes and therefore dominate the mean number of maxima $N$. This shows that within a single realization the two mechanisms are not operative together and only a single mechanism exists. Since most realizations exhibit mechanism II, whereas the mean number of local maxima grows exponentially as expected for mechanism I, we conclude that mechanism I occurs rarely but, once it does, it generates a huge number of local maxima, which compensates for the low probability of occurrence. We may thus say that both mechanisms coexist for $q_0 < q < q_c$ and $q_0$ can be regarded as the threshold of coexistence. Two fitness landscape realizations generated for the same value of $q$ located in the coexistence region that exemplify the two mechanisms are shown in Figure 9B.

Summary 2:

If the dimension $n$ of phenotypic space is much smaller than the dimension $L$ of genotypic space, there exists a threshold $q_c$ of the scaled wild-type distance $q$ to the phenotypic optimum below which the mean number $N$ of local maxima in a genotypic fitness landscape increases exponentially with $L$, and above which it saturates to a finite value. The genotypic complexity $\Sigma^*$, which is defined as the exponential growth rate of $N$ with $L$, is a decreasing function of $q$ but does not depend on $n$. On the other hand, $N$ decreases with $n$ for $q < q_c$ yet increases with $n$ for $q > q_c$. Figure 8 depicts $\Sigma^*$ and the mean genotypic distance $\rho^*$ as functions of $q$. For $q_0 < q < q_c$, where $q_0 = 1/\sqrt{2\pi}$, $N$ is dominated by a small fraction of landscape realizations that display an exceptionally large number of maxima. If the pleiotropic scaling is assumed to follow the total effects model, we need to specify how the unscaled wild-type distance $d$ in Equation 2 depends on $L$. Assuming that $d = d_0 L$, where $d_0$ is independent of $n$ (Orr 2000), the scaled wild-type distance $q = Q/L = d_0 \sqrt{n}$ becomes an increasing function of $n$, and therefore the relation $q < q_c$ for regime I is never realized when $n$ is sufficiently large.

Genotypic complexity in the joint limit

In the previous subsection, we have calculated the mean number of local fitness maxima $N$ at a fixed phenotypic dimension $n$, assuming that the genotypic dimension $L$ is much larger than $n$ ($L \gg n$). However, in applications of FGM one often expects that both $L$ and $n$ are large and possibly of comparable magnitude. In this case, the results derived above can be unreliable for large $n$, as exemplified by the fact that the subleading correction to Equation 42 is of the order of $O(L^{-1/n})$ (see Appendix C).

To obtain a reliable expression for $N$ that is valid when both $n$ and $L$ are large, we now consider the joint limit $n, L \to \infty$ at fixed ratio $\alpha = n/L$. This will allow us to find the leading behavior of the mean number of local maxima with a correction of order $O(1/L)$. Furthermore, we will clarify the role of the phenotypic dimension in the two phases described in the previous subsection, and we will uncover a third phase that appears at large $\alpha$ (see Figure 10).

Figure 10. Phase diagrams in the parameter space $(q, \alpha)$. Here, $q = Q/L$ is the scaled distance of the wild-type phenotype from the origin and $\alpha = n/L$ is the ratio of phenotypic dimension to genotypic dimension. Dashed lines are phase boundaries at which the mean genotypic and phenotypic distances change discontinuously. (A) The phase boundary separating regimes I and II starts at $(q, \alpha) \approx (0.925, 0)$ and continues to exist until approximately $\alpha \approx 0.18$. (B) The phase boundary separating regimes II and III starts at $(q, \alpha) \approx (0, 2.38)$ and continues to exist until approximately $q \approx 0.62$.

The number of local fitness maxima:

We relegate the detailed calculation to Appendix D and directly present our final expression for the mean number of local maxima,

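$$N = C(a^*, b^*, g^*)\, e^{L \Sigma_{\mathrm{red}}(a^*, b^*, g^*)} \left[1 + O\!\left(\frac{1}{L}\right)\right], \tag{46}$$

where the function $\Sigma_{\mathrm{red}}(a, b, g)$ in the exponent is given by

$$\Sigma_{\mathrm{red}}(a, b, g) = -\frac{\alpha}{2} \ln\!\left[\frac{\alpha(\alpha + g)}{2\left(a\, c(g) + b^2\right)}\right] + \frac{\alpha + 2b + g}{2} - \ln 2 + \ln\!\left\{ e^{-2c(g)} \left[\mathrm{erf}\!\left(\frac{\alpha + 2b}{\sqrt{2a}}\right) + 1\right] + \mathrm{erf}\!\left(\frac{\alpha}{\sqrt{2a}}\right) + 1 \right\}, \tag{47}$$

with $c(g) = (\alpha^2 - g^2)/(16 q^2)$. As before, the starred variables $a^*$, $b^*$, and $g^*$ denote the solution of the extremum condition

$$\left.\nabla \Sigma_{\mathrm{red}}(a, b, g)\right|_{(a, b, g) = (a^*, b^*, g^*)} = (0, 0, 0), \tag{48}$$

where $\nabla$ is the gradient with respect to the three variables $(a, b, g)$. When several solutions of Equation 48 exist, the one giving the largest value of $\Sigma_{\mathrm{red}}$ is chosen. The prefactor $C(a^*, b^*, g^*)$, which is independent of $L$, can be determined from Equation D17 presented in Appendix D. Even though the variables $(a, b, g)$ lack a direct interpretation in terms of the original setting of FGM, we show in Appendix E that $a^*$ is related to the mean phenotypic distance $z^*$ by the equation $z^* = L\sqrt{a^*}/2$.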
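The extremum condition in Equation 48 can be probed numerically. The Python sketch below implements $\Sigma_{\mathrm{red}}$ and a central-difference gradient, and evaluates both at the leading-order regime III values reported later in Equation 50. It rests on an assumed reading of Equations 47 and 50 in which the first logarithm enters with a negative sign, the exponential is $e^{-2c(g)}$, the error-function arguments are divided by $\sqrt{2a}$, and $b^* \approx -\alpha$; this reading was chosen because it makes the gradient vanish to first order in the expansion parameter. All function names and the step size are illustrative.

```python
import math


def c_of_g(g, alpha, q):
    """c(g) = (alpha^2 - g^2) / (16 q^2)."""
    return (alpha ** 2 - g ** 2) / (16.0 * q ** 2)


def sigma_red(a, b, g, alpha, q):
    """Equation 47 under the sign conventions assumed in the lead-in."""
    c = c_of_g(g, alpha, q)
    log_term = -0.5 * alpha * math.log(
        alpha * (alpha + g) / (2.0 * (a * c + b * b)))
    root = math.sqrt(2.0 * a)
    inner = (math.exp(-2.0 * c) * (math.erf((alpha + 2.0 * b) / root) + 1.0)
             + math.erf(alpha / root) + 1.0)
    return log_term + 0.5 * (alpha + 2.0 * b + g) - math.log(2.0) + math.log(inner)


def gradient(a, b, g, alpha, q, h=1e-6):
    """Central-difference gradient of sigma_red with respect to (a, b, g)."""
    def f(a_, b_, g_):
        return sigma_red(a_, b_, g_, alpha, q)
    return (
        (f(a + h, b, g) - f(a - h, b, g)) / (2.0 * h),
        (f(a, b + h, g) - f(a, b - h, g)) / (2.0 * h),
        (f(a, b, g + h) - f(a, b, g - h)) / (2.0 * h),
    )


def regime3_point(alpha, q):
    """Leading-order regime III solution, following the assumed reading of Eq. 50."""
    eps = math.exp(-alpha ** 2 / (8.0 * q ** 2))
    a = 4.0 * q * q - 16.0 * math.sqrt(2.0 / math.pi) * q ** 3 / alpha ** 2 * eps
    b = -alpha + alpha * eps / (math.sqrt(2.0 * math.pi) * q)
    return a, b, alpha


if __name__ == "__main__":
    alpha, q = 4.0, 0.5
    point = regime3_point(alpha, q)
    print("point:", point)
    print("gradient:", gradient(*point, alpha, q))
    print("sigma_red:", sigma_red(*point, alpha, q))
```

A general solver for Equation 48 would additionally have to search for all stationary points and select the one with the largest $\Sigma_{\mathrm{red}}$; this sketch only checks a single candidate point.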

An immediate consequence of Equation 46 is that the number of local maxima increases exponentially in $L$ for any value of $q$ and $\alpha$ without algebraic corrections of the kind found in Equation 42. Obtaining closed-form solutions of Equation 48, which ultimately determine the functional dependence of the complexity $\Sigma^*$ on $\alpha$ and $q$, seems to be a formidable task. Instead, we resort to numerical methods by sweeping through the most interesting intervals, $q \in (0, 2)$ and $\alpha \in (0, 3)$. Surprisingly, we find three independent branches of solutions that correspond to distinct phases. To acquire a qualitative understanding of these branches, it is instructive to first focus on the small $\alpha$ behavior, where one expects a smooth continuation to the results of Equations 42 and 43 as $\alpha \to 0$.

Small α behavior:

In contrast to the fixed $n$ case where two separate analyses were carried out for the two regimes $q < q_c$ and $q > q_c$, the present approach yields a single expression describing the genotypic complexity for arbitrary values of $q$ and $\alpha$. Consistent with the fixed $n$ analysis, only two out of the three branches of solutions that were found in the numerical analysis exist for sufficiently small $\alpha$, and they are separated by a phase transition as shown in the phase diagram in Figure 10A. By extrapolating the behavior of $\Sigma^*$ toward $\alpha \to 0$ as shown in Figure 11, we are able to identify the correct counterparts for each of the two previously found regimes.

Figure 11. Convergence of the complexity to the fixed $n$ case for small $\alpha$. (A) The solid lines depict numerical solutions of Equation 48 for values of $\alpha$ belonging to regime I. The convergence to Equation 42 (dashed line) is clearly seen as $\alpha \to 0$. (B) The blue solid line depicts the numerical solution of Equation 48 for $\alpha = 0.1$ belonging to regime II. Except for a slight deviation detectable when $q$ is close to $q_0$, Equation 49 (dashed line) remains a good approximation.

The extrapolation is straightforward in regime II, where the replacement $n \to \alpha L$ in Equation 43 yields an exponential dependence of $N$ on $L$ with the growth rate

$$\Sigma^{*(\mathrm{II})}_{\mathrm{approx}} = \alpha \ln\!\left\{ \frac{q - q_0}{q} \exp\!\left[(q/q_0 - 1)^{-1}\right] \right\}. \tag{49}$$

This crude approximation turns out to be remarkably accurate even at $\alpha = 0.1$, as illustrated in Figure 11B. By contrast, in regime I the naive replacement of $n$ by $\alpha L$ in Equation 42 yields an expression that vanishes faster than exponentially in $L$, as $\exp[-(\alpha/2) L \ln L]$. This reflects the fact that the mean phenotypic distance $z^*$ moves away from the origin for any $\alpha > 0$ and hence the complexity cannot be derived only by inspecting Equation 26 around $z = 0$ (see Figure 12, A and D). At the same time, the mean genotypic distance $\rho^*$ decreases with increasing $\alpha$ and eventually falls below the value $\rho^* = 1/2$ favored by the sequence entropy (Figure 12, B and E).

Figure 12. Plots of scaled mean phenotypic distance $z^*/L$ (left column), mean genotypic distance $\rho^*$ (middle column), and genotypic complexity $\Sigma^*$ (right column) against $q$ for fixed $\alpha$ (top row) and against $\alpha$ for fixed $q$ (bottom row). The curves in the top (bottom) panels are drawn along the arrows in the inset of (C) and (F). Top row (A, B, and C): When $\alpha$ is small, the landscape behaves similarly to the fixed $n$ case which effectively corresponds to $\alpha = 0$. In this case $z^*$ and $\rho^*$ for large $q$ are well approximated by $q - q_0$ and 1/2, respectively. As $\alpha$ increases beyond the transition line, the first-order transition visualized by the red dashed lines disappears and all quantities change smoothly with $q$. Bottom row (D, E, and F): As $\alpha$ increases for small $q$, another phase transition with discontinuities in $z^*$ and $\rho^*$ (blue dashed lines) signals the appearance of regime III. The genotypic maxima in regime III are located very close to the wild-type position, $z^*/L \approx q$ and $\rho^* \approx 0$. This transition ceases to exist when $q$ exceeds approximately 0.62. Note that the dependence of $\Sigma^*$ on $\alpha$ is nonmonotonic for $q = 0.7$ (F).

Both trends can be attributed to the increasing role of the perpendicular mutational displacements that make up the second term on the right-hand side of Equation 5. Under the scaling of the joint limit, this term is of order $\rho L (n - 1) \approx \rho \alpha L^2$ and hence comparable to the first term originating from the parallel displacements. The perpendicular displacements always increase the phenotypic distance to the origin, and they are present even when $q = 0$. The additional cost to reduce the perpendicular contribution results in a smaller value of $\Sigma^*$ compared to the case of fixed $n$. Moreover, whereas the parallel contribution is minimized (for $q > q_0$) by making $\rho$ as large as possible, the reduction of the perpendicular displacements requires small $\rho$.

In the fixed $n$ analysis, the number of fitness maxima was found to decrease (increase) with $n$ in regime I (II) and this tendency is recovered from the joint-limit case when $\alpha$ is not too large (Figure 12C). Because of these opposing trends of $\Sigma^*$ in the two regimes, the location of the phase transition separating them is expected to decrease with increasing $\alpha$, as can be seen in Figure 10A. If one ignores the contribution from the perpendicular displacements, the phenotypic position of the fitness maxima is expected to jump from $z^* = 0$ to $z^* = q_c - q_0$ at the transition, and thus the jump size should decrease as $q_c$ decreases. This observation suggests that the two branches should merge into one when $q_c$ reaches $q_0$. With the additional contribution of perpendicular dimensions, we numerically found that this critical end point at which the phases I and II merge occurs even earlier, at $\alpha \approx 0.18$ and $q \approx 0.62 > q_0$ (Figure 10). For $\alpha > 0.18$, $\rho^*$ does not show any discontinuity for any $q$ as long as the parameters are in regime II.

Large α behavior and regime III:

To develop some intuition about the FGM fitness landscape in the regime where α=n/L≫1, we revisit the results obtained in Sign epistasis, where pairs of mutations were considered. Two conclusions can be drawn about the typical shape of these small genotypic landscapes (of size L=2) in the limit n→∞. First, the probability that the wild type is a genotypic maximum tends to unity according to Equation 10. Second, the joint distribution given in Equation 15 enforces additivity of mutational effects for large n, and correspondingly the probability for sign epistasis vanishes. Thus for large n the two-dimensional genotypic landscape becomes smooth with a single maximum located at the wild type. Assuming that this picture holds more generally whenever the limit n→∞ is taken at finite L, we expect the following asymptotic behaviors of the quantifiers of genotypic complexity for large α: (i) N→1, Σ*→0 (unique genotypic optimum); and (ii) z*/L→q, ρ*→0 (location of the maximum at the wild-type phenotype and genotype).

This expectation is largely borne out by the numerical results shown in the bottom panels of Figure 12. However, depending on the value of q, the approach to the limit of a smooth landscape can be either continuous (for large q) or display characteristic jumps as indicated by the blue dashed lines in Figure 12, D and E. These jumps as well as the discontinuity in the slope of Σ* as a function of α in Figure 12F are hallmarks of the phase transition to the new regime III, which is represented by the dashed line in Figure 10B.

Fortunately, the solution of Equation 48 describing the new phase can be obtained analytically from Equation 47 or Equation F3 in Appendix F as a series expansion. The derivation presented in Appendix G yields

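$$a^* = 4q^2 - \left[16\sqrt{\frac{2}{\pi}}\,\frac{q^3}{\alpha^2} + O(q^4/\alpha^3)\right]\epsilon + O(\epsilon^2),\quad b^* = -\alpha + \frac{\alpha\epsilon}{\sqrt{2\pi}\,q} + O(\epsilon^2),\quad g^* = \alpha + O(\epsilon^2),\quad\text{and}\quad \rho^* = \sqrt{\frac{2}{\pi}}\,\frac{q\epsilon}{\alpha} + O(\epsilon^2), \tag{50}$$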

where the expansion parameter ε=exp[−α²/(8q²)] decays rapidly with increasing α/q. The corresponding genotypic complexity can also be evaluated in a series expansion,

$$\Sigma^{(\mathrm{III})}(a^*,b^*,c^*) = \frac{\alpha\,\epsilon^2}{4\pi q^2} + O(\epsilon^3), \tag{51}$$

which shows that Σ* is positive but vanishingly small in this regime. We note that using Equation E5, the expression for a* in Equation 50 amounts to

$$\frac{z^*}{L} \approx q - 2\sqrt{\frac{2}{\pi}}\,\frac{q^2}{\alpha^2}\,\epsilon, \tag{52}$$

implying that the small number of local maxima that exist in this phase are located very close to the wild-type phenotype.
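As a rough numerical illustration of how small these regime III quantities are, the sketch below evaluates the leading-order expressions for ε, ρ*, z*/L, and Σ* quoted in Equations 50 to 52. The function name `regime3_asymptotics` and the sample parameter values are hypothetical, and the formulas are only the leading terms of the expansion, so the numbers are meaningful only when α/q is large.

```python
import math

def regime3_asymptotics(alpha, q):
    """Leading-order regime III estimates (hypothetical helper, not from the paper)."""
    # Expansion parameter: eps = exp(-alpha^2 / (8 q^2))
    eps = math.exp(-alpha ** 2 / (8.0 * q ** 2))
    # Mean genotypic distance rho* to first order in eps (Equation 50)
    rho = math.sqrt(2.0 / math.pi) * (q / alpha) * eps
    # Scaled phenotypic distance z*/L to first order in eps (Equation 52)
    z_over_L = q - 2.0 * math.sqrt(2.0 / math.pi) * (q ** 2 / alpha ** 2) * eps
    # Genotypic complexity to second order in eps (Equation 51)
    sigma = alpha * eps ** 2 / (4.0 * math.pi * q ** 2)
    return {"eps": eps, "rho": rho, "z_over_L": z_over_L, "sigma": sigma}

if __name__ == "__main__":
    for alpha in (2.0, 3.0, 4.0):
        print(alpha, regime3_asymptotics(alpha, q=0.5))
```

Because Σ* enters at order ε² while ρ* is of order ε, the complexity falls off much faster with α/q than the fraction of mutations carried by a typical maximum.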

To first order in ε, the results for ρ* and z* in Equations 50 and 52 can be easily derived from the idea that mutational effects become approximately additive for large α, thus providing further support for this assumption. If mutational effects are strictly additive, the probability for a genotype containing s mutations to be a local fitness maximum is given by

$$\mathcal{P}_s^{\mathrm{add}} = P_b^{\,s}\,(1-P_b)^{L-s}, \tag{53}$$

where Pb is the probability for a mutation to be beneficial. Equation 53 expresses the condition that reverting any one of the s mutations contained in the genotype as well as adding one of the unused L−s mutations should lower the fitness. Using Fisher’s Equation 1, the probability for a beneficial mutation is Pb≈√(2/π)(q/α)ε for large α. Thus to linear order in ε or Pb, the expected number of mutations contributing to such a genotype is LPb=Lρ*=L√(2/π)(q/α)ε, which is consistent with Equation 50.
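The additive picture can be checked with a few lines of arithmetic. The sketch below assumes that the beneficial-mutation probability is the Gaussian tail Pb = erfc(x/√2)/2 with x = α/(2q), which is the form implied by the Gaussian distribution of R1 with mean 1 and variance 1/x² = 4q²/α² used in this section; the function names are hypothetical. It compares this tail with the large-α approximation √(2/π)(q/α)ε and sums Equation 53 over all genotypes.

```python
import math

def p_beneficial(alpha, q):
    """Gaussian-tail estimate of P_b, assuming x = alpha / (2 q)."""
    x = alpha / (2.0 * q)
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def p_beneficial_asymptotic(alpha, q):
    """Large-alpha form sqrt(2/pi) * (q/alpha) * eps."""
    eps = math.exp(-alpha ** 2 / (8.0 * q ** 2))
    return math.sqrt(2.0 / math.pi) * (q / alpha) * eps

def additive_summary(L, pb):
    """Expected number of maxima and mean number of mutations per maximum
    when every genotype with s mutations is a maximum with probability
    pb^s (1 - pb)^(L - s), as in Equation 53."""
    weights = [math.comb(L, s) * pb ** s * (1.0 - pb) ** (L - s) for s in range(L + 1)]
    n_maxima = sum(weights)
    mean_mutations = sum(s * w for s, w in enumerate(weights))
    return n_maxima, mean_mutations

if __name__ == "__main__":
    pb = p_beneficial(5.0, 0.5)
    print(pb, p_beneficial_asymptotic(5.0, 0.5))
    print(additive_summary(30, pb))
```

Under strict additivity the binomial sum gives exactly one maximum on average, carrying L·Pb mutations, which is the limit that regime III approaches.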

The phenotypic location of a local maximum deviates from Q in those rare instances where one of the mutations from the wild type is beneficial, which happens with probability Pb. To estimate the corresponding shift in z*, we refer to the results of subsection Sign epistasis, where it was shown that the squared phenotypic displacement R1 defined in Equation 14 has a Gaussian distribution with mean 1 and variance 1/x²=4q²/α² for large n. Using this, it is straightforward to show that the expected value of R1 conditioned on the mutation being beneficial (R1<0) is R̄1=−4q²/α² to leading order. Multiplying this by the expected number of mutations LPb we obtain the relation

$$L\rho^*\,\bar{R}_1 \approx \frac{(z^*)^2 - Q^2}{n}, \tag{54}$$

which yields the same leading behavior for z*/L as in Equation 52.
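The conditional mean used in this estimate can be checked directly. The following sketch evaluates the exact mean of a Gaussian variable with mean 1 and variance 1/x², truncated to negative values, using the standard truncated-normal formula, and compares it with the leading-order value −1/x² = −4q²/α². The function name is hypothetical.

```python
import math

def conditional_mean_r1(x):
    """E[R1 | R1 < 0] for R1 ~ Normal(mean=1, variance=1/x^2)."""
    sigma = 1.0 / x
    # Standard normal density evaluated at -x (the standardized threshold)
    pdf = math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)
    # P(R1 < 0) = Phi(-x), written with erfc to stay accurate in the tail
    cdf = 0.5 * math.erfc(x / math.sqrt(2.0))
    # Truncated-normal mean: mu - sigma * pdf / cdf
    return 1.0 - sigma * pdf / cdf

if __name__ == "__main__":
    for x in (3.0, 5.0, 10.0):
        print(x, conditional_mean_r1(x), -1.0 / x ** 2)
```

The agreement improves as x grows, consistent with the statement that R̄1 = −4q²/α² holds only to leading order.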

As previously observed for the transition between regimes I and II, the phase boundary separating regimes II and III terminates at a point where the two solutions defining the regimes merge (Figure 10B). Beyond this point the jumps in z* and ρ* seen in Figure 12, D and E, disappear and all quantities approach smoothly to their asymptotic values. A surprising feature of the large α behavior that persists also for larger q is that the complexity becomes an increasing function of q when α>1.7 (Figure 12F). In Figure 13 we verify this behavior using direct simulations of FGM. These simulations also show that the predictions based on Equation 47 are already remarkably accurate for moderate values of L and n.

Figure 13.

Semilogarithmic plots of the mean number of local maxima N vs. the genotypic dimension L for (A) α=0.2 and (B) α=2.5 and for various values of q. Each symbol represents the average over 10^5 randomly generated landscapes, and lines depict the analytic approximation of Equation D17. The approximation is good even for moderate L.
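A direct simulation of this kind can be sketched in a few lines. The code below is a minimal brute-force illustration and not the authors' simulation code: it assumes standard normal mutational displacements, a wild type placed at distance Q = qL from the optimum along the first trait axis, and fitness that decreases monotonically with phenotypic distance, so that a local fitness maximum is a genotype whose distance is smaller than that of all L single-mutation neighbors. The function names, the rounding of n = αL to an integer, and the sample sizes are choices made for this sketch.

```python
import random

def count_local_maxima(L, n, Q, rng):
    """Count local fitness maxima of one random FGM landscape by enumeration."""
    # L random displacement vectors with independent standard normal components
    xi = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(L)]
    # Squared distance to the optimum (origin) for every genotype bitmask
    dist2 = [0.0] * (1 << L)
    for g in range(1 << L):
        z = [Q] + [0.0] * (n - 1)  # wild-type phenotype on the first axis
        for i in range(L):
            if (g >> i) & 1:
                for s in range(n):
                    z[s] += xi[i][s]
        dist2[g] = sum(c * c for c in z)
    # A genotype is a local maximum if every single-site flip moves it farther away
    return sum(
        all(dist2[g ^ (1 << i)] > dist2[g] for i in range(L))
        for g in range(1 << L)
    )

def mean_local_maxima(L, alpha, q, samples, seed=0):
    """Average the count over independent landscapes with n = alpha*L, Q = q*L."""
    rng = random.Random(seed)
    n = max(1, round(alpha * L))
    Q = q * L
    return sum(count_local_maxima(L, n, Q, rng) for _ in range(samples)) / samples

if __name__ == "__main__":
    for L in (4, 6, 8, 10):
        print(L, mean_local_maxima(L, alpha=0.5, q=0.2, samples=200))
```

Enumeration costs of order L·2^L per landscape, so this approach is limited to small L; it is meant only to make the definition of a local maximum concrete.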

Summary 3:

When the dimension n of the phenotypic trait space and the dimension L of the genotypic space are large and comparable, the genotypic complexity Σ* is always nonzero and depends on the ratios α=n/L and q=Q/L. There are three regimes where the behavior of the genotypic complexity and the mean genotypic distance ρ* (the average number of mutations in a local maximum divided by L) are qualitatively different. In regime I, which is roughly characterized by small q and small α, there are many local maxima in the region located far away from the wild type but close to the phenotypic optimum, and the fitness landscape is quite rugged. In regime II, which is roughly characterized by large q and small α, there is an appreciable number of local maxima, though smaller than in regime I, and typically half of the L mutations contribute to the corresponding genotypes. In regime III, which is roughly characterized by large α, the genotypic complexity is very small, though nonzero. Also ρ* is close to zero, which means that the wild type has a high probability to be the global fitness maximum. An overview of the three regimes is found in Table 2.

Table 2. Characteristics of the three regimes in the joint limit.
Regime    Condition        Σ*     ρ*       Landscape
I         q≪1, α≪1         >0     >1/2     rugged
II        q≫1, α≪1         ≈0     ≈1/2     intermediate
III       α≫1              ≈0     ≈0       almost smooth

Discussion

FGM provides a simple yet generic scenario for the emergence of complex epistatic interactions from a nonlinear mapping of an additive, multidimensional phenotype onto fitness. Its role in the theory of adaptation may be aptly described as that of a “proof-of-concept model” (Servedio et al. 2014), and as such it is widely used in fundamental theoretical studies (Gros et al. 2009; Chevin et al. 2010; Blanquart et al. 2014; Martin 2014; Fraïsse et al. 2016; Moura de Sousa et al. 2016) as well as for the parameterization and interpretation of empirical data (Martin et al. 2007; Velenich and Gore 2013; Weinreich and Knies 2013; Bank et al. 2014; Perfeito et al. 2014; Blanquart and Bataillon 2016; Schoustra et al. 2016). Rather than tracing the mutational effects and their interactions to the underlying molecular basis, the model aims at identifying robust features of the adaptive process that can be expected to be shared by large classes of organisms.

To give an example of such a feature that is of central importance in the present context, it was pointed out by Blanquart et al. (2014) that pairwise sign epistasis is generated in FGM through two distinct mechanisms. In one case the mutational displacements overshoot the phenotypic optimum, whereas in the other case the displacements are directed approximately perpendicular to the direction of the optimum, and sign epistasis arises because the fitness isoclines are curved. The first mechanism is obviously also operative in a one-dimensional phenotype space, but in the second case (termed antagonistic pleiotropy by Blanquart et al. 2014) at least two phenotypic dimensions are required. Interestingly, both mechanisms have been invoked in empirical studies where a nonlinear phenotype-fitness map was used to model epistatic interactions between multiple mutations. In one study, Rokyta et al. (2011) explained the pairwise epistatic interactions between nine beneficial mutations in the single-stranded DNA bacteriophage ID11 by assuming that fitness is a single-peaked nonlinear function of a one-dimensional additive phenotype. In the second study the genotypic fitness landscapes based on all combinations of two groups of four antibiotic resistance mutations in the enzyme β-lactamase were parameterized by a nonlinear function mapping a two-dimensional phenotype to resistance (Schenk et al. 2013). The fitted function was in fact monotonic and did not possess a phenotypic optimum, which makes it clear that the epistatic interactions arose solely from antagonistic pleiotropy in this case.

In this work we have shown that the two mechanisms described by Blanquart et al. (2014) lead to distinct regimes or phases in the parameter space of FGM, where the genotypic fitness landscapes display qualitatively different properties (Figure 10A). When the phenotypic dimension n is much smaller than the genotypic dimension L, the two regimes are separated by a sharp phase transition where the average number and location of genotypic fitness maxima change abruptly as the distance q of the wild-type phenotype from the optimum is varied. In regime I (q<qc), the phenotypic optimum is reachable at least by some combinations of mutational displacements. Overshooting of the optimum is therefore possible and sign epistasis is strong, leading to rugged genotypic landscapes with a large number of local fitness maxima that grows exponentially with L. By contrast, in regime II (q>qc), only antagonistic pleiotropy is operative and the number of fitness maxima is much smaller. More precisely, for finite n the number tends to a finite limit for L→∞, but the limiting value is an exponentially growing function of n.

An important consequence of our results is that the dependence of the fitness landscape ruggedness on the phenotypic dimension n is remarkably complicated. For n≪L, landscapes become less rugged with increasing n in regime I (q<qc), but display increasing ruggedness in regime II (q>qc). When n≫L, the ruggedness decreases with n for all q and the landscapes become approximately additive (regime III). In particular, the probability of sign epistasis vanishes algebraically with n in this regime. Thus n cannot in general be regarded as a measure of “phenotypic complexity,” as a larger value of n does not imply that the corresponding fitness landscape is more complex.

This observation is relevant for the interpretation of experiments where the parameters of FGM are estimated from data. In recent work, FGM was used to analyze data on pairwise epistasis between beneficial mutations in the filamentous fungus Aspergillus nidulans growing in two different media (Schoustra et al. 2016). The estimates obtained for the phenotypic dimension and the distance of the wild-type phenotype from the optimum were n = 19.3 and Q = 6.89 in complete medium and n = 34.8 and Q = 9.81 in minimal medium, which, surprisingly, may seem to suggest a higher phenotypic complexity in the minimal medium. Using the results derived in this article, we can translate the estimated parameter values into the average number of maxima that a genotypic fitness landscape of a given dimension L would have. As can be seen in Figure 14, with respect to this measure the fitness landscape of the fungus growing in complete medium is actually more rugged. This is consistent with experiments using Escherichia coli, which found a greater heterogeneity of fitness trajectories in complete medium (Rozen et al. 2008), and indicates that the complete medium allows for a greater diversity of paths to adaptation than the minimal medium.

Figure 14.

The logarithm of the number of local fitness maxima divided by the number of loci L is shown as a function of L for FGM with the parameter values n=19.3,Q=6.89 and n=34.8,Q=9.81 obtained by Schoustra et al. (2016) for the fungus A. nidulans growing in complete (CM) and minimal medium (MM), respectively. For the evaluation of N Equation 47 was used.

We hope that the results presented here will promote the use of FGM as part of the toolbox of probabilistic models that are currently available for the analysis of empirical fitness landscapes (Hayashi et al. 2006; Szendro et al. 2013; de Visser and Krug 2014; Neidhart et al. 2014; Bank et al. 2016; Blanquart and Bataillon 2016). Compared to purely genotype-based models such as the NK and rough-Mount-Fuji (RMF) models, FGM is arguably more realistic in that it introduces an explicit phenotypic layer mediating between genotypes and fitness (Martin 2014). Somewhat similarly to the RMF model, the fitness landscapes of FGM are anisotropic and display a systematic change of properties as a function of the distance to the optimal phenotype (FGM) or the reference sequence (RMF), respectively (Neidhart et al. 2014). The idea that fitness landscape ruggedness increases systematically and possibly abruptly when approaching the optimum has been proposed previously in the context of in vitro evolution of proteins (Hayashi et al. 2006). If this is indeed a generic pattern, it may have broader implications. For example, de Visser et al. (2009) showed that the evolutionary benefits of recombination are severely limited by the presence of multiple peaks. If such peaks are rarely encountered far away from the optimum, the benefits of recombination would be most pronounced for particularly maladapted populations.

A recent investigation of 26 published empirical fitness landscapes using ABC concluded that FGM could account for the full structure of the landscapes only in a minority of cases (Blanquart and Bataillon 2016). One of the features of the empirical landscapes that prevented a close fit to FGM was the occurrence of sign epistasis far away from the phenotypic optimum. Our analysis confirms that this is an unlikely event in FGM, and precisely quantifies the corresponding probability through Equation 16 and Equation 17. Blanquart and Bataillon (2016) also found that the phenotypic dimension is particularly difficult to infer from realizations of genotypic fitness landscapes, which matches our observation that the structure of the landscape depends only weakly on n when n≪L. We expect that our results will help to further clarify which features of an empirical fitness landscape make it more or less amenable to a phenotypic description in terms of FGM or some generalization thereof.

We conclude by mentioning some open questions that should be addressed in future theoretical work on FGM. First, a significant limitation of our results lies in their restriction to the average number of local fitness maxima. The number of maxima induced by a given realization of mutational displacements is a random variable, and unless the distribution of this variable is well concentrated, the average value may not reflect the typical behavior. The large fluctuations between different realizations of fitness landscapes generated by FGM were noticed already by Blanquart et al. (2014) on the basis of small-scale simulations, and they clearly contribute to the difficulty of inferring the parameters of FGM from individual realizations that was reported by Blanquart and Bataillon (2016). In light of our analysis, this pronounced heterogeneity can be attributed to the existence of multiple phases in the model, and it is exemplified by the simulation results in Figure 9. To quantitatively characterize the fluctuations between different realizations, a better understanding of the distribution of the number of fitness maxima and its higher moments is required.

Second, the consequences of relaxing some of the assumptions underlying the formulation of FGM used in this work should be explored. The level of pleiotropy can be reduced by restricting the effects of mutational displacements to a subset of traits (Chevin et al. 2010; Moura de Sousa et al. 2016), and it would be interesting to see how this affects the ruggedness of the fitness landscape. However, the most critical and empirically poorly motivated assumption of FGM is clearly the absence of epistatic interactions on the level of phenotypes. It would therefore be important to understand how robust the results presented here are with respect to some level of phenotypic epistasis, which should ideally arise from a realistic model of phenotypic networks (Martin 2014).

Third, a natural extension of the present study is to consider multiallelic genetic sequences. An immediate generalization keeping the additivity of mutational effects on the level of phenotypes is to consider the following genotype–phenotype map:

$$\mathbf{z}(\tau) = \mathbf{Q} + \sum_{i=1}^{L}\sum_{k=1}^{A}\tau_{ik}\,\boldsymbol{\xi}_{ik}, \tag{55}$$

where A is the size of the alphabet from which the sequence elements are drawn (e.g., A=4 for DNA or RNA and A=20 for proteins), τik=1 (0) if the allele at site i is (is not) k, and the ξik are uncorrelated random vectors. Clearly, our results for pairwise epistasis remain the same for this generalized model because they only concern mutations at different sites. However, the condition for a local fitness maximum now involves mutations to different alleles at the same site, which may lead to a nontrivial dependence on A. On the basis of a recent study of evolutionary accessibility in multiallelic sequence spaces, one may expect the fitness landscapes to become less rugged with increasing A (Zagorski et al. 2016), but this conjecture would have to be corroborated by a detailed analysis.
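The multiallelic map of Equation 55 is simple to state in code. In the sketch below a genotype is a list of allele indices (0 to A−1), so the indicator τik selects exactly one displacement vector per site; the function name `multiallelic_phenotype` and the nested-list layout `xi[i][k]` are illustrative choices, and the small example values are arbitrary.

```python
def multiallelic_phenotype(sequence, Q, xi):
    """Phenotype z = Q + sum_i xi[i][sequence[i]] for a multiallelic genotype.

    sequence: allele index at each of the L sites
    Q:        wild-type phenotype vector (length n)
    xi:       xi[i][k] is the displacement vector of allele k at site i
    """
    z = list(Q)
    for site, allele in enumerate(sequence):
        # tau_ik is 1 only for the allele present at this site,
        # so the inner sum over k reduces to a single term
        for s, component in enumerate(xi[site][allele]):
            z[s] += component
    return z

if __name__ == "__main__":
    # Two sites, two alleles per site, two phenotypic traits
    xi = [
        [[1.0, 0.0], [0.0, 1.0]],
        [[2.0, 2.0], [-1.0, 0.0]],
    ]
    print(multiallelic_phenotype([0, 1], [10.0, 0.0], xi))
```

A single-site neighbor of a genotype is then any sequence that differs in one allele index, which gives L(A−1) neighbors instead of L and is the source of the possible dependence on A mentioned in the text.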

Finally, whereas the present work focused on the structure of the fitness landscapes induced by FGM, it is of obvious importance to understand how the adaptive process actually proceeds on such a landscape (Orr 2005). A simple framework that allows us to address this question is provided by adaptive walks following Gillespie’s strong selection/weak mutation dynamics (Gillespie 1983, 1984; Orr 2002). In a pioneering study, Orr (1998) considered adaptive walks in FGM assuming that the number L of possible mutations is unlimited. In this setting, any population not located precisely at the phenotypic optimum has a nonzero probability of generating another beneficial mutation and the adaptive walk never stops; see Park and Krug (2008) for a related analysis of adaptation in the house-of-cards landscape. For finite but large L, an interesting question concerns the number of steps until the population finds a local fitness maximum when the adaptive dynamics is random (Kauffman and Levin 1987; Park et al. 2015; Park and Krug 2016) or greedy (Orr 2003; Park et al. 2016). This problem is currently under investigation.

Acknowledgments

We thank Anton Bovier, David Dean, Guillaume Martin, Olivier Tenaillon, and an anonymous reviewer for helpful remarks. This work was supported by Deutsche Forschungsgemeinschaft within Sonderforschungsbereich 680 “Molecular Basis of Evolutionary Innovations” and Schwerpunktprogramm 1590 “Probabilistic Structures in Evolution.” S.-C.P. acknowledges the support by the Basic Science Research Program through the National Research Foundation of Korea funded by the Ministry of Science, ICT and Future Planning (grant no. 2014R1A1A2058694). S.-C.P. would also like to thank Korea Institute for Advanced Study for its support and hospitality during his stay there on sabbatical leave (2016–2017).

Appendix A: Derivation of the Joint Probability Density P(R1,R2,R)

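For the purpose of this calculation it will turn out to be convenient to locate the wild-type phenotype on the diagonal of the trait space, i.e., to set $\mathbf{Q}=(Q/\sqrt{n})(1,1,\ldots,1)$. The probability density P(R1,R2,R) can then be formally defined as

$$P(R_1,R_2,R)=n^3\left\langle \delta\!\left\{nR_1-\sum_{i=1}^{n}\left[(\xi_i+Q_i)^2-Q_i^2\right]\right\}\,\delta\!\left\{nR_2-\sum_{i=1}^{n}\left[(\eta_i+Q_i)^2-Q_i^2\right]\right\}\,\delta\!\left\{nR-\sum_{i=1}^{n}\left[(\xi_i+\eta_i+Q_i)^2-Q_i^2\right]\right\}\right\rangle_{\xi,\eta}, \tag{A1}$$

where $Q_i=Q/\sqrt{n}$ and $\langle\cdots\rangle_{\xi,\eta}$ stands for the average over the distribution of ξ and η. Using the integral representation of the δ function, we can write

$$P(R_1,R_2,R)=\frac{n^3}{(2\pi)^3}\int d\mathbf{k}\; e^{ik_1nR_1+ik_2nR_2+iknR}\prod_i\left\langle e^{-ik_1(\xi_i+Q_i)^2-ik_2(\eta_i+Q_i)^2-ik(\xi_i+\eta_i+Q_i)^2+iQ_i^2(k_1+k_2+k)}\right\rangle_{\xi,\eta}, \tag{A2}$$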

where dR and dk stand for dR1dR2dR and dk1dk2dk, respectively, and we factorized the average by taking into account that the ξis and ηis are all independent and identically distributed. The average in Equation A2 is readily calculated as

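$$\frac{1}{2\pi}\int d\xi\,d\eta\,\exp\left[-\frac{\eta^2}{2}-\frac{\xi^2}{2}-ik_1\xi^2-ik_2\eta^2-ik(\xi+\eta)^2-2ik_1\frac{Q}{\sqrt{n}}\xi-2ik_2\frac{Q}{\sqrt{n}}\eta-2ik\frac{Q}{\sqrt{n}}(\xi+\eta)\right]=\frac{1}{\sqrt{(1+2ik_1)(1+2ik_2)-4k(k_1+k_2-i)}}\exp\left[\frac{2Q^2}{n}\,\frac{2i(k+k_1)(k+k_2)(k_1+k_2-i)+(k_1-k_2)^2}{4k(k_1+k_2-i)+(2k_1-i)(2k_2-i)}\right], \tag{A3}$$

which gives

$$P(R_1,R_2,R)=\frac{n^3}{(2\pi)^3}\int d\mathbf{k}\;e^{ik_1nR_1+ik_2nR_2+iknR}\left[(1+2ik_1)(1+2ik_2)-4k(k_1+k_2-i)\right]^{-n/2}\exp\left[\frac{n^2}{2x^2}\,\frac{2i(k+k_1)(k+k_2)(k_1+k_2-i)+(k_1-k_2)^2}{4k(k_1+k_2-i)+(2k_1-i)(2k_2-i)}\right]. \tag{A4}$$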

In the limit n→∞, the integral is dominated by contributions from the vicinity of the extremum of the exponent, which can be algebraically determined to be k=k1=k2=0. By expanding the argument of the exponential function up to the second order around this point and performing the Gaussian integral, we obtain

$$P(R_1,R_2,R)\approx\frac{n^3}{(2\pi)^3}\int d\mathbf{k}\,\exp\left\{-\frac{n^2}{2x^2}\left[(k+k_1)^2+(k+k_2)^2\right]-n\left[4k^2+k_1^2+k_2^2-ik_1(R_1-1)-ik_2(R_2-1)+k(2i+2k_1+2k_2-iR)\right]\right\}=\frac{\sqrt{n}\,x^2}{4\sqrt{2}\,\pi^{3/2}}\left[1+O(n^{-1})\right]\exp\left\{-\frac{n}{8}(R-R_1-R_2)^2-\frac{x^2}{2}\left[(R_1-1)^2+(R_2-1)^2\right]\right\}, \tag{A5}$$

which is Equation 15.
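The Gaussian form can be probed with a small Monte Carlo experiment. The sketch below samples R1, R2, and R directly from their definitions in Equation A1 with standard normal ξ and η and reports three moments: the mean of R1 (expected to be 1), the variance of R1, and the variance of R−R1−R2 (which equals 4/n, matching the factor n/8 in the exponent). The comparison value 1/x²+2/n for the variance of R1, with x=n/(2Q), is an elementary finite-n computation made for this sketch rather than a formula from the text; it reduces to 1/x² for large n. Function names and sample sizes are hypothetical.

```python
import math
import random

def sample_R(n, Q, rng):
    """Draw one (R1, R2, R) triple from the definitions in Equation A1."""
    Qi = Q / math.sqrt(n)  # wild type on the diagonal of trait space
    xi = [rng.gauss(0.0, 1.0) for _ in range(n)]
    eta = [rng.gauss(0.0, 1.0) for _ in range(n)]
    R1 = sum((a + Qi) ** 2 - Qi ** 2 for a in xi) / n
    R2 = sum((b + Qi) ** 2 - Qi ** 2 for b in eta) / n
    R = sum((a + b + Qi) ** 2 - Qi ** 2 for a, b in zip(xi, eta)) / n
    return R1, R2, R

def estimate_moments(n, Q, samples, seed=0):
    """Return (mean of R1, variance of R1, variance of R - R1 - R2)."""
    rng = random.Random(seed)
    r1_values, d_values = [], []
    for _ in range(samples):
        R1, R2, R = sample_R(n, Q, rng)
        r1_values.append(R1)
        d_values.append(R - R1 - R2)
    mean_r1 = sum(r1_values) / samples
    var_r1 = sum((v - mean_r1) ** 2 for v in r1_values) / samples
    mean_d = sum(d_values) / samples
    var_d = sum((v - mean_d) ** 2 for v in d_values) / samples
    return mean_r1, var_r1, var_d

if __name__ == "__main__":
    n, Q = 50, 5.0
    x = n / (2.0 * Q)
    print(estimate_moments(n, Q, samples=4000))
    print("expected:", 1.0, 1.0 / x ** 2 + 2.0 / n, 4.0 / n)
```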

Appendix B: Probability of Sign Epistasis

In this appendix, we present the mathematical details of the derivation of the probabilities Pr and Ps of observing RSE and SSE, respectively. As in the main text, let us assume R1<R2. In calculating the probabilities, the integral over R takes one of three forms

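$$\int_{-\infty}^{R_1}\sqrt{\frac{n}{8\pi}}\,e^{-\frac{n}{8}(R-R_1-R_2)^2}dR=\frac{1}{2}\,\mathrm{erfc}\!\left(\frac{\sqrt{n}R_2}{2\sqrt{2}}\right),\quad\int_{R_2}^{\infty}\sqrt{\frac{n}{8\pi}}\,e^{-\frac{n}{8}(R-R_1-R_2)^2}dR=\frac{1}{2}\,\mathrm{erfc}\!\left(-\frac{\sqrt{n}R_1}{2\sqrt{2}}\right),\quad\text{or}\quad\int_{R_1}^{R_2}\sqrt{\frac{n}{8\pi}}\,e^{-\frac{n}{8}(R-R_1-R_2)^2}dR=\frac{1}{2}\left[\mathrm{erfc}\!\left(\frac{\sqrt{n}R_1}{2\sqrt{2}}\right)-\mathrm{erfc}\!\left(\frac{\sqrt{n}R_2}{2\sqrt{2}}\right)\right]. \tag{B1}$$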

First, we consider RSE which corresponds to the two domains

$$D_1=\{(R_1,R_2,R)\,|\,R_1<R_2<R,\;R_2<0\}\quad\text{and}\quad D_2=\{(R_1,R_2,R)\,|\,R<R_1<R_2,\;R_1>0\}, \tag{B2}$$

as illustrated in Figure 2. The probability of being in D1 is

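$$\Pr(D_1)=\int_{-\infty}^{0}dR_1\int_{R_1}^{0}dR_2\int_{R_2}^{\infty}dR\,P(R_1,R_2,R)=\frac{x^2}{4\pi}\int_{-\infty}^{0}dR_1\int_{R_1}^{0}dR_2\exp\left[-\frac{x^2}{2}(R_1-1)^2-\frac{x^2}{2}(R_2-1)^2\right]\mathrm{erfc}\!\left(-\frac{\sqrt{n}R_1}{2\sqrt{2}}\right)=\frac{x^2}{4\pi}\int_{0}^{\infty}dR_1\int_{0}^{R_1}dR_2\exp\left[-\frac{x^2}{2}(R_1+1)^2-\frac{x^2}{2}(R_2+1)^2\right]\mathrm{erfc}\!\left(\frac{\sqrt{n}R_1}{2\sqrt{2}}\right), \tag{B3}$$

where we have changed variables $R_i\to-R_i$. Since $\mathrm{erfc}(y)\approx e^{-y^2}/(y\sqrt{\pi})$ for $y\gg1$, the above integral is dominated by the region $R_1\ll1$ for large n. Thus, it is sufficient to approximate $\exp[-x^2((R_1+1)^2+(R_2+1)^2)/2]\approx e^{-x^2}$, which yields

$$\Pr(D_1)\approx\frac{x^2e^{-x^2}}{4\pi}\int_0^\infty dR_1\int_0^{R_1}dR_2\,\mathrm{erfc}\!\left(\frac{\sqrt{n}R_1}{2\sqrt{2}}\right)=\frac{x^2e^{-x^2}}{4\pi}\int_0^\infty dR_1\,R_1\,\mathrm{erfc}\!\left(\frac{\sqrt{n}R_1}{2\sqrt{2}}\right)=\frac{2x^2e^{-x^2}}{n\pi}\int_0^\infty dy\,y\,\mathrm{erfc}(y)=\frac{x^2}{2n\pi}e^{-x^2}. \tag{B4}$$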

The probability of being in D2 has the same leading behavior,

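$$\Pr(D_2)=\int_0^\infty dR_2\int_0^{R_2}dR_1\int_{-\infty}^{R_1}dR\,P(R_1,R_2,R)=\frac{x^2}{4\pi}\int_0^\infty dR_2\int_0^{R_2}dR_1\exp\left[-\frac{x^2}{2}(R_1-1)^2-\frac{x^2}{2}(R_2-1)^2\right]\mathrm{erfc}\!\left(\frac{\sqrt{n}R_2}{2\sqrt{2}}\right)\approx\frac{x^2}{4\pi}\int_0^\infty dR_1\int_0^{R_1}dR_2\,e^{-x^2}\,\mathrm{erfc}\!\left(\frac{\sqrt{n}R_1}{2\sqrt{2}}\right)=\frac{x^2}{2n\pi}e^{-x^2}, \tag{B5}$$

where we have exchanged the variables $R_1\leftrightarrow R_2$. Due to the symmetrical roles of R1 and R2, the total probability of RSE is

$$P_r\approx2\sum_{i=1}^{2}\Pr(D_i)\approx\frac{2x^2}{n\pi}e^{-x^2}. \tag{B6}$$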

We can use a similar approximation scheme to calculate the probability of SSE. There are four domains contributing to SSE (see Figure 2):

$$D_3=\{(R_1,R_2,R)\,|\,R<R_1<0<R_2\},\quad D_4=\{(R_1,R_2,R)\,|\,R_1<0<R_2<R\},\quad D_5=\{(R_1,R_2,R)\,|\,R_1<R<R_2<0\},\quad D_6=\{(R_1,R_2,R)\,|\,0<R_1<R<R_2\}. \tag{B7}$$

As we will see, all integrals can be represented by the functions

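$$G_1(a,b)=\frac{x^2}{4\pi}\int_0^\infty dR_1\int_0^{R_1}dR_2\exp\left[-\frac{x^2}{2}(R_1+a)^2-\frac{x^2}{2}(R_2+b)^2\right]\mathrm{erfc}\!\left(\frac{\sqrt{n}R_2}{2\sqrt{2}}\right), \tag{B8}$$

$$G_2(a,b)=\frac{x^2}{4\pi}\int_0^\infty dR_1\int_0^{R_1}dR_2\exp\left[-\frac{x^2}{2}(R_1+a)^2-\frac{x^2}{2}(R_2+b)^2\right]\mathrm{erfc}\!\left(\frac{\sqrt{n}R_1}{2\sqrt{2}}\right)=\frac{x}{4\sqrt{2\pi}}\,\mathrm{erfc}\!\left(\frac{bx}{\sqrt{2}}\right)\int_0^\infty dR_1\exp\left[-\frac{x^2}{2}(R_1+a)^2\right]\mathrm{erfc}\!\left(\frac{\sqrt{n}R_1}{2\sqrt{2}}\right)-G_1(b,a), \tag{B9}$$

where a,b=±1 and we have used that

$$\int_0^\infty dy\int_y^\infty dz\,f(y,z)=\int_0^\infty dy\int_0^y dz\,f(z,y). \tag{B10}$$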

To be specific, we write the probabilities of being in each domain as

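$$\Pr(D_3)=\frac{x^2}{4\pi}\int_{-\infty}^{0}dR_1\int_0^\infty dR_2\exp\left[-\frac{x^2}{2}(R_1-1)^2-\frac{x^2}{2}(R_2-1)^2\right]\mathrm{erfc}\!\left(\frac{\sqrt{n}R_2}{2\sqrt{2}}\right)=G_1(1,-1)+G_2(-1,1),$$

$$\Pr(D_4)=\frac{x^2}{4\pi}\int_{-\infty}^{0}dR_1\int_0^\infty dR_2\exp\left[-\frac{x^2}{2}(R_1-1)^2-\frac{x^2}{2}(R_2-1)^2\right]\mathrm{erfc}\!\left(-\frac{\sqrt{n}R_1}{2\sqrt{2}}\right)=G_1(-1,1)+G_2(1,-1),$$

$$\Pr(D_5)=\frac{x^2}{4\pi}\int_{-\infty}^{0}dR_1\int_{R_1}^{0}dR_2\exp\left[-\frac{x^2}{2}(R_1-1)^2-\frac{x^2}{2}(R_2-1)^2\right]\left[\mathrm{erfc}\!\left(\frac{\sqrt{n}R_1}{2\sqrt{2}}\right)-\mathrm{erfc}\!\left(\frac{\sqrt{n}R_2}{2\sqrt{2}}\right)\right]=G_1(1,1)-G_2(1,1),$$

$$\Pr(D_6)=\frac{x^2}{4\pi}\int_0^\infty dR_1\int_{R_1}^{\infty}dR_2\exp\left[-\frac{x^2}{2}(R_1-1)^2-\frac{x^2}{2}(R_2-1)^2\right]\left[\mathrm{erfc}\!\left(\frac{\sqrt{n}R_1}{2\sqrt{2}}\right)-\mathrm{erfc}\!\left(\frac{\sqrt{n}R_2}{2\sqrt{2}}\right)\right]=G_1(-1,-1)-G_2(-1,-1),$$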

where we have changed negative integral domains into positive domains and made use of Equation B10. Using the approximation scheme explained above, we get

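$$G_1(a,b)=\frac{x}{4\sqrt{2\pi}}\int_0^\infty dR_2\,e^{-x^2(R_2+b)^2/2}\,\mathrm{erfc}\!\left(\frac{\sqrt{n}R_2}{2\sqrt{2}}\right)\mathrm{erfc}\!\left[\frac{x(R_2+a)}{\sqrt{2}}\right]\approx\frac{x}{4\sqrt{2\pi}}\,\mathrm{erfc}\!\left(\frac{ax}{\sqrt{2}}\right)e^{-x^2/2}\int_0^\infty dR_2\,\mathrm{erfc}\!\left(\frac{\sqrt{n}R_2}{2\sqrt{2}}\right)=\frac{x}{2\pi\sqrt{n}}\,\mathrm{erfc}\!\left(\frac{ax}{\sqrt{2}}\right)e^{-x^2/2}+O(1/n). \tag{B11}$$

Since

$$\frac{x}{4\sqrt{2\pi}}\,\mathrm{erfc}\!\left(\frac{bx}{\sqrt{2}}\right)\int_0^\infty dR_1\exp\left[-\frac{x^2}{2}(R_1+a)^2\right]\mathrm{erfc}\!\left(\frac{\sqrt{n}R_1}{2\sqrt{2}}\right)\approx\frac{x}{2\pi\sqrt{n}}\,\mathrm{erfc}\!\left(\frac{bx}{\sqrt{2}}\right)e^{-x^2/2}+O(1/n), \tag{B12}$$

we conclude that $G_2(a,b)=O(1/n)$. Using $\mathrm{erfc}(y)+\mathrm{erfc}(-y)=2$, we finally obtain

$$P_s\approx2\sum_{i=3}^{6}\Pr(D_i)\approx\frac{4x}{\pi\sqrt{n}}\,e^{-x^2/2}. \tag{B13}$$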
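The two elementary integrals that fix the prefactors in this appendix, ∫₀^∞ y erfc(y) dy = 1/4 (used in Equation B4) and ∫₀^∞ erfc(y) dy = 1/√π (used in Equation B11), can be confirmed numerically. The sketch below does so with a plain midpoint rule and also packages the asymptotic probabilities of Equations B6 and B13 as functions. The helper names, the integration cutoff, and the step count are choices made for this sketch.

```python
import math

def midpoint_integral(f, a, b, steps=20000):
    """Composite midpoint rule for a smooth integrand on [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))

def p_rse(x, n):
    """Large-n probability of reciprocal sign epistasis (Equation B6)."""
    return 2.0 * x ** 2 / (n * math.pi) * math.exp(-x ** 2)

def p_sse(x, n):
    """Large-n probability of simple sign epistasis (Equation B13)."""
    return 4.0 * x / (math.pi * math.sqrt(n)) * math.exp(-x ** 2 / 2.0)

if __name__ == "__main__":
    # erfc decays like exp(-y^2), so truncating at y = 8 is harmless
    print(midpoint_integral(lambda y: y * math.erfc(y), 0.0, 8.0), 0.25)
    print(midpoint_integral(math.erfc, 0.0, 8.0), 1.0 / math.sqrt(math.pi))
    print(p_rse(1.0, 100), p_sse(1.0, 100))
```

The different powers of n in the two formulas mean that, at fixed x, reciprocal sign epistasis becomes rare faster than simple sign epistasis as the phenotypic dimension grows.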

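Appendix C: Large L Behavior of $\mathcal{P}_s(L)$ for Fixed n

In this appendix, we calculate the asymptotic behavior of the probability $\mathcal{P}_s(L)$ for a genotype with s mutations to be a local fitness maximum in the limit where L is large and the phenotypic dimension n is fixed. As explained in the main text, this probability has two contributions which arise from expanding the function $F(\mathbf{k},\mathbf{z})$ defined in Equation 30 near $|\mathbf{z}|=0$ and $|\mathbf{z}|=z^*\sim L$, respectively (see Equation 31).

First, we consider the contribution from the region $|\mathbf{z}|\ll1$. In this case, we can approximate $F(\mathbf{k},\mathbf{z})$ as

$$F(\mathbf{k},\mathbf{z})=\int_{\mathbb{R}^n}e^{-i\mathbf{k}\cdot\boldsymbol{\xi}}p(\boldsymbol{\xi})\,d\boldsymbol{\xi}-\int_{D^c(\mathbf{z})}e^{-i\mathbf{k}\cdot\boldsymbol{\xi}}p(\boldsymbol{\xi})\,d\boldsymbol{\xi}\approx e^{-k^2/2}-\int_{D^c(\mathbf{z})}p(0)\,d\boldsymbol{\xi}=e^{-k^2/2}-A_nz^n\approx\exp\left[-\frac{k^2}{2}-A_nz^ne^{k^2/2}\right], \tag{C1}$$

where $k=|\mathbf{k}|$; $z=|\mathbf{z}|$; $D^c(\mathbf{z})=\{\mathbf{y}\,|\,|\mathbf{y}-\mathbf{z}|\le z\}$, which is the complement of $D(\mathbf{z})$; $A_n=p(0)S_{n-1}/n$ with $S_{n-1}=2\pi^{n/2}/\Gamma(n/2)$ being the surface area of the unit sphere in $(n-1)$ dimensions; and $p(0)=(2\pi)^{-n/2}$. Note that the error of the above approximation is $O(z^{n+1})$. Thus, setting $\rho\equiv s/L$ we can approximate

$$\mathcal{P}_s^{<}(L)\approx\int_{\mathbb{R}^n}\frac{d\mathbf{z}\,d\mathbf{k}}{(2\pi)^n}\exp\left[i\mathbf{k}\cdot\mathbf{z}+LH_1(\mathbf{k},\mathbf{z})\right],\qquad H_1(\mathbf{k},\mathbf{z})\equiv-i\mathbf{k}\cdot\mathbf{q}-\frac{\rho k^2}{2}-\rho A_nz^ne^{k^2/2}-(1-\rho)A_nz^n, \tag{C2}$$

where $\mathbf{q}=\mathbf{Q}/L$. Since L is large, we can employ the saddle-point approximation. One can easily see that the saddle point solving the equations $\partial_{k_j}H_1=\partial_{z_k}H_1=0$ is at $\mathbf{z}=0$ and $k_j=-iq_j/\rho$. Around the saddle point, we expand

$$H_1\approx-\frac{q^2}{2\rho}-\frac{\rho}{2}\left(\mathbf{k}+i\mathbf{q}/\rho\right)^2-A_nz^n\left[\rho\,e^{-q^2/(2\rho^2)}+(1-\rho)\right], \tag{C3}$$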

which gives

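$$\mathcal{P}_s^{<}(L)\approx\exp\left(-\frac{Lq^2}{2\rho}\right)\int_{\mathbb{R}^n}d\mathbf{z}\,\exp\left\{-LA_nz^n\left[\rho\,e^{-q^2/(2\rho^2)}+(1-\rho)\right]\right\}\int_{\mathbb{R}^n}\frac{d\mathbf{k}}{(2\pi)^n}\exp\left[-\frac{L\rho}{2}\left(\mathbf{k}+i\mathbf{q}/\rho\right)^2\right]=\exp\left(-\frac{Lq^2}{2\rho}\right)(2\pi L\rho)^{-n/2}\int_0^\infty S_{n-1}z^{n-1}dz\,\exp\left\{-LA_nz^n\left[\rho\,e^{-q^2/(2\rho^2)}+(1-\rho)\right]\right\}=\frac{s^{-n/2}\exp[-Q^2/(2s)]}{s\exp[-Q^2/(2s^2)]+L-s}\left[1+O(L^{-1/n})\right], \tag{C4}$$

and the last step involves a change of variables $z\to t=S_{n-1}z^n/n$. Since L appears in the integrand in the combination $Lz^n$, the error that arises from neglecting terms of $O(z^{n+1})$ is $O(L^{-1/n})$. The leading order of Equation C4 was reported in Equation 34.

Now we move on to the calculation of $\mathcal{P}_s^{>}(L)$, where the dominant contribution to $F(\mathbf{k},\mathbf{z})$ comes from a region where $z\sim O(L)$. Using $\int d\boldsymbol{\xi}\,p(\boldsymbol{\xi})\exp(-i\mathbf{k}\cdot\boldsymbol{\xi})=\exp(-k^2/2)$, we calculate the integral $I\equiv\exp(-k^2/2)-F(\mathbf{k},\mathbf{z})$ as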

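$$\begin{aligned}I&=\frac{1}{(2\pi)^{n/2}}\int_0^{2z}d\xi_n\,e^{-ik_n\xi_n-\xi_n^2/2}\int_{B(2z,\xi_n)}d\boldsymbol{\xi}_\perp\,e^{-i\mathbf{k}_\perp\cdot\boldsymbol{\xi}_\perp-\xi_\perp^2/2}\\&=\frac{1}{(2\pi)^{n/2}}\int_0^{2z}d\xi_n\,e^{-ik_n\xi_n-\xi_n^2/2}\left[\int_{\mathbb{R}^{n-1}}d\boldsymbol{\xi}_\perp\,e^{-i\mathbf{k}_\perp\cdot\boldsymbol{\xi}_\perp-\xi_\perp^2/2}-\int_{B^c(2z,\xi_n)}d\boldsymbol{\xi}_\perp\,e^{-i\mathbf{k}_\perp\cdot\boldsymbol{\xi}_\perp-\xi_\perp^2/2}\right]\\&=\frac{e^{-k_\perp^2/2}}{\sqrt{2\pi}}\left[\int_0^\infty d\xi_n\,e^{-ik_n\xi_n-\xi_n^2/2}-\int_{2z}^\infty d\xi_n\,e^{-ik_n\xi_n-\xi_n^2/2}\right]-\frac{1}{(2\pi)^{n/2}}\int_0^{2z}d\xi_n\,e^{-ik_n\xi_n-\xi_n^2/2}\int_{B^c(2z,\xi_n)}d\boldsymbol{\xi}_\perp\,e^{-i\mathbf{k}_\perp\cdot\boldsymbol{\xi}_\perp-\xi_\perp^2/2}\\&=\frac{e^{-k^2/2}}{2}\left[1-\mathrm{erf}\!\left(\frac{ik_n}{\sqrt{2}}\right)\right]-\frac{e^{-k_\perp^2/2}}{\sqrt{2\pi}}\int_{2z}^\infty d\xi_n\,e^{-ik_n\xi_n-\xi_n^2/2}-C_1(\mathbf{k},\mathbf{z}),\end{aligned} \tag{C5}$$

where we set $\mathbf{z}=z\mathbf{e}_n$, $\boldsymbol{\xi}_\perp=\boldsymbol{\xi}-\xi_n\mathbf{e}_n$, and $\mathbf{k}_\perp=\mathbf{k}-k_n\mathbf{e}_n$ with $\mathbf{e}_n=(0,\ldots,0,1)$, $B(2z,\xi_n)$ is an (n − 1)-dimensional ball with radius $\sqrt{\xi_n(2z-\xi_n)}$ whose center is located at the origin, $B^c$ is the relative complement of B with respect to $\mathbb{R}^{n-1}$, and $\mathrm{erf}(z)=2\int_0^ze^{-t^2}dt/\sqrt{\pi}$ is the error function. The definition of $C_1$ is self-explanatory. Since

$$\left|\int_{2z}^\infty d\xi_n\,e^{-ik_n\xi_n-\xi_n^2/2}\right|\le\int_{2z}^\infty d\xi_n\,e^{-\xi_n^2/2}=2z\int_1^\infty dy\,e^{-2z^2y^2}\approx\frac{e^{-2z^2}}{2z}, \tag{C6}$$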

where we used the Laplace method for the asymptotic expansion, the leading finite z correction is expected to come from C1 for n>1. Note that C1 is identically zero for n=1. Thus we get

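$$F(\mathbf{k},\mathbf{z})\approx\frac{1}{2}e^{-k^2/2}\left[1+\mathrm{erf}\!\left(\frac{i\mathbf{k}\cdot\mathbf{z}}{\sqrt{2}\,z}\right)\right]+C_1(\mathbf{k},\mathbf{z}), \tag{C7}$$

where $k_n$ is written as a projection of $\mathbf{k}$ along the $\mathbf{z}$ direction, $k_n=\mathbf{k}\cdot\mathbf{z}/z$. Since

$$|C_1(\mathbf{k},\mathbf{z})|\le\frac{1}{(2\pi)^{n/2}}\int_0^{2z}d\xi_n\,e^{-\xi_n^2/2}\int_{B^c(2z,\xi_n)}d\boldsymbol{\xi}_\perp\,e^{-\xi_\perp^2/2}=C_1(0,\mathbf{z}), \tag{C8}$$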

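it is sufficient to find an approximate formula for $C_1(0,\mathbf{z})$ to determine the z dependence of $C_1(\mathbf{k},\mathbf{z})$. Using spherical coordinates in $\mathbb{R}^{n-1}$, we get

$$C_1(0,\mathbf{z})=\frac{S_{n-2}}{(2\pi)^{n/2}}\int_0^{2z}dy\,e^{-y^2/2}\int_{\sqrt{y(2z-y)}}^\infty dx\,x^{n-2}e^{-x^2/2}=\frac{S_{n-2}}{(2\pi)^{n/2}}\left\{\int_0^{2z}dy\,e^{-y^2/2}\int_z^\infty dx\,x^{n-2}e^{-x^2/2}+\int_0^zdx\,x^{n-2}e^{-x^2/2}\left[\int_0^{M_-(x,z)}dy\,e^{-y^2/2}+\int_{M_+(x,z)}^{2z}dy\,e^{-y^2/2}\right]\right\}, \tag{C9}$$

where $S_{n-2}=2\pi^{(n-1)/2}/\Gamma[(n-1)/2]$ is the surface area of the unit (n−2) sphere. In the second term on the second line, the order of integration was reversed and the integration boundaries $M_\pm(x,z)=z\pm\sqrt{z^2-x^2}$ were introduced. Since the first integral ($\int_z^\infty dx$) and the third integral ($\int_{M_+}^{2z}dy$) decrease exponentially with z, the main contribution to $C_1(0,\mathbf{z})$ comes from the second integral. Thus,

$$C_1(0,\mathbf{z})\approx\frac{S_{n-2}}{(2\pi)^{n/2}}\int_0^zdx\,x^{n-2}e^{-x^2/2}\int_0^{M_-(x,z)}dy\,e^{-y^2/2}=\frac{S_{n-2}z^n}{(2\pi)^{n/2}}\int_0^1dx\,x^{n-2}e^{-z^2x^2/2}\int_0^{M_-(x,1)}dy\,e^{-z^2y^2/2}=\frac{S_{n-2}z^{n-1}}{(2\pi)^{n/2}}\sqrt{\frac{\pi}{2}}\int_0^1dx\,x^{n-2}e^{-z^2x^2/2}\,\mathrm{erf}\!\left[M_-(x,1)z/\sqrt{2}\right]. \tag{C10}$$

Since the last integral is dominated by the region $x\sim z^{-1}$, we can approximate $M_-(x,1)z\approx x^2z/2\sim O(1/z)$ and $\mathrm{erf}[M_-(x,1)z/\sqrt{2}]\approx x^2z/\sqrt{2\pi}$. Finally, we get

$$C_1(0,\mathbf{z})\approx\frac{S_{n-2}z^n}{(2\pi)^{(n+1)/2}}\sqrt{\frac{\pi}{2}}\int_0^1dx\,x^ne^{-z^2x^2/2}\approx\frac{S_{n-2}z^n}{(2\pi)^{(n+1)/2}}\sqrt{\frac{\pi}{2}}\int_0^\infty dx\,x^ne^{-z^2x^2/2}=\frac{n-1}{2\sqrt{2\pi}\,z}, \tag{C11}$$

which also implies that $C_1(\mathbf{k},\mathbf{z})\sim O(z^{-1})$.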

If we write

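$$F(\mathbf{k},\mathbf{z})=\frac{1}{2}e^{-k^2/2}\left[1+\mathrm{erf}\!\left(\frac{i\mathbf{k}\cdot\mathbf{r}}{\sqrt{2}\,r}\right)\right]\left[1+\frac{1}{L}f(\mathbf{r},\mathbf{k})+O(z^{-2})\right], \tag{C12}$$

where $\mathbf{r}=\mathbf{z}/L$ and $r=z/L$, then comparison with Equation C7 and Equation C11 shows that $f(\mathbf{r},0)=(n-1)/(\sqrt{2\pi}\,r)$. Inserting Equation C12 into Equation 32, it follows that

$$\mathcal{P}_s^{>}(L)\approx\frac{L^n}{2^L}\int\frac{d\mathbf{r}\,d\mathbf{k}}{(2\pi)^n}\exp\left[\rho f(\mathbf{r},\mathbf{k})+(1-\rho)f(\mathbf{r},0)\right]\exp\left[LH_2(\mathbf{k},\mathbf{r})\right], \tag{C13}$$

with

$$H_2(\mathbf{k},\mathbf{r})=i\mathbf{k}\cdot(\mathbf{r}-\mathbf{q})-\frac{\rho k^2}{2}+\rho\ln\left[1+\mathrm{erf}\!\left(\frac{i\mathbf{k}\cdot\mathbf{r}}{\sqrt{2}\,r}\right)\right]. \tag{C14}$$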

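Now we employ the steepest-descent method. For convenience, we set $\mathbf{q}=(q,0,\ldots,0)$. The saddle point satisfies the equations

$$\frac{\partial H_2}{\partial r_j}=ik_j+i\frac{\sqrt{2}\,\rho}{\sqrt{\pi}\,r^3}\exp\left[\frac{(\mathbf{k}\cdot\mathbf{r})^2}{2r^2}\right]\left[k_jr^2-r_j(\mathbf{k}\cdot\mathbf{r})\right]\left[1+\mathrm{erf}\!\left(\frac{i\mathbf{k}\cdot\mathbf{r}}{\sqrt{2}\,r}\right)\right]^{-1}=0, \tag{C15}$$

$$\frac{\partial H_2}{\partial k_j}=i(r_j-q\delta_{j1})-\rho k_j+i\frac{\sqrt{2}\,\rho}{\sqrt{\pi}\,r}\exp\left[\frac{(\mathbf{k}\cdot\mathbf{r})^2}{2r^2}\right]r_j\left[1+\mathrm{erf}\!\left(\frac{i\mathbf{k}\cdot\mathbf{r}}{\sqrt{2}\,r}\right)\right]^{-1}=0, \tag{C16}$$

with the solution $\mathbf{k}^*=0$ and $\mathbf{r}^*=(q-\rho\sqrt{2/\pi},0,\ldots,0)$. Note that there is no solution if $q<\rho\sqrt{2/\pi}$, so the valid range of ρ has the upper boundary $\rho_c(q)=\min(1,\sqrt{\pi/2}\,q)$. The matrix of second derivatives around the saddle point $\mathbf{k}^*,\mathbf{r}^*$ is

$$\left.\frac{\partial^2H_2}{\partial r_m\partial r_j}\right|_*=0,\quad\left.\frac{\partial^2H_2}{\partial k_m\partial k_j}\right|_*=-\rho\,\delta_{mj}\left(1-\frac{2}{\pi}\delta_{m1}\right),\quad\left.\frac{\partial^2H_2}{\partial r_m\partial k_j}\right|_*=i\delta_{jm}\left[1+(1-\delta_{m1})\frac{\sqrt{2}\,\rho}{\sqrt{\pi}\,q-\sqrt{2}\,\rho}\right]. \tag{C17}$$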

Thus, we get

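$$\mathcal{P}_s^{>}(L)\approx\frac{L^n}{2^L}e^{f(\mathbf{r}^*,0)}\int\frac{d\mathbf{y}\,d\mathbf{k}}{(2\pi)^n}\exp\left[-\frac{L\rho}{2\pi}(\pi-2)k_1^2-\frac{L\rho}{2}k_\perp^2+iLk_1y_1+iL\frac{\sqrt{\pi}\,q}{\sqrt{\pi}\,q-\sqrt{2}\,\rho}\,\mathbf{k}_\perp\cdot\mathbf{y}_\perp\right], \tag{C18}$$

where $\mathbf{y}=\mathbf{r}-\mathbf{r}^*$, $\mathbf{y}_\perp=(0,y_2,\ldots,y_n)$, and $\mathbf{k}_\perp=(0,k_2,\ldots,k_n)$. If we perform the integration over $\mathbf{y}$ first, we obtain δ functions which make the integral over $\mathbf{k}$ trivial. Finally, we arrive at

$$\mathcal{P}_s^{>}(L)\approx2^{-L}\,\theta(\rho_c-\rho)\left[\frac{q-2\rho/\sqrt{2\pi}}{q}\exp\left(\frac{1}{\sqrt{2\pi}\,q-2\rho}\right)\right]^{n-1}, \tag{C19}$$

where θ(x) is the Heaviside step function defined by θ(x≥0)=1 and θ(x<0)=0.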

To evaluate the corresponding contribution to the number of fitness maxima, N>, we replace the summation over s in Equation 33 by an integral over ρ=s/L and use Stirling’s formula to approximate the binomial coefficients. This yields

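$$N^{>}=\sqrt{L}\int_0^1d\rho\,\frac{\exp\left\{L\left[-\rho\ln\rho-(1-\rho)\ln(1-\rho)-\ln2\right]\right\}}{\sqrt{2\pi\rho(1-\rho)}}\left[\frac{q-2\rho/\sqrt{2\pi}}{q}\exp\left(\frac{1}{\sqrt{2\pi}\,q-2\rho}\right)\right]^{n-1}\theta(\rho_c-\rho). \tag{C20}$$

If $\rho_c<1/2$ or $q<(2\pi)^{-1/2}=q_0$, the integral is dominated around $\rho\approx\rho_c$, which results in an exponential decrease with L. On the other hand, if $\rho_c>1/2$, the integral is dominated around $\rho\approx1/2$, which gives

$$N^{>}\approx\sqrt{\frac{2L}{\pi}}\int_{-\infty}^{\infty}dx\,e^{-2Lx^2}\left[\frac{q-1/\sqrt{2\pi}}{q}\exp\left(\frac{1}{\sqrt{2\pi}\,q-1}\right)\right]^{n-1}=\left[\frac{q-1/\sqrt{2\pi}}{q}\exp\left(\frac{1}{\sqrt{2\pi}\,q-1}\right)\right]^{n-1} \tag{C21}$$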

as reported in Equation 43.
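The limiting value in Equation C21 is easy to tabulate. The sketch below evaluates the bracketed base and its (n−1)th power for q above q0 = 1/√(2π); the function names are hypothetical, and the formula is used exactly as written in Equation C21, so it applies only in regime II at fixed n and large L.

```python
import math

Q0 = 1.0 / math.sqrt(2.0 * math.pi)  # q0, about 0.399

def plateau_base(q):
    """Bracketed factor of Equation C21; requires q > q0."""
    if q <= Q0:
        raise ValueError("Equation C21 applies only for q > q0")
    return (q - Q0) / q * math.exp(1.0 / (math.sqrt(2.0 * math.pi) * q - 1.0))

def plateau_maxima(q, n):
    """Large-L limit of the mean number of maxima in regime II at fixed n."""
    return plateau_base(q) ** (n - 1)

if __name__ == "__main__":
    for q in (0.5, 1.0, 2.0, 5.0):
        print(q, plateau_base(q), plateau_maxima(q, n=10))
```

The base exceeds 1 for every q > q0 and approaches 1 as q grows, so the plateau increases exponentially with n and is highest close to the transition; for n = 1 it equals 1 for all q.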

Appendix D: Derivation of Equation 47

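In this appendix, we calculate the average number of fitness maxima N in the limit n,L→∞ at fixed ratio α≡n/L. To this end, we write $I_\tau$, the probability for the genotype τ to be a local fitness maximum, using the Heaviside step function as

$$I_\tau=\prod_{k=1}^{L}\left[\int d\boldsymbol{\xi}_k\,p(\boldsymbol{\xi}_k)\,\theta\!\left\{\frac{1}{L}\left[\mathbf{z}+(1-2\tau_k)\boldsymbol{\xi}_k\right]^2-\frac{1}{L}|\mathbf{z}|^2\right\}\right]\equiv\int\mathcal{D}\xi\prod_k\theta(\varepsilon_k), \tag{D1}$$

where $\mathbf{z}$ is determined by τ through Equation 3, $\mathcal{D}\xi\equiv\prod_kd\boldsymbol{\xi}_k\,p(\boldsymbol{\xi}_k)$, and $\varepsilon_k$ is defined as

$$\varepsilon_k=\frac{1}{L}\left[\mathbf{z}+\boldsymbol{\xi}_k(1-2\tau_k)\right]^2-\frac{1}{L}|\mathbf{z}|^2=\frac{1}{L}\left[2\mathbf{z}\cdot\boldsymbol{\xi}_k(1-2\tau_k)+|\boldsymbol{\xi}_k|^2\right]=\frac{1}{L}\left[2\Big(\mathbf{Q}+\sum_j\boldsymbol{\xi}_j\tau_j\Big)\cdot\boldsymbol{\xi}_k(1-2\tau_k)+|\boldsymbol{\xi}_k|^2\right]. \tag{D2}$$

Note that the prefactor 1/L is introduced to make $\varepsilon_k$ finite in the limit L→∞ and we have used that $(1-2\tau_k)^2=1$. Applying the identity (Tanaka and Edwards 1980; Bray and Moore 1980)

$$\theta(\varepsilon_k)=\int_0^\infty d\lambda_k\,\delta(\lambda_k-\varepsilon_k)=\int_0^\infty d\lambda_k\int_{-\infty}^{\infty}\frac{d\varphi_k}{2\pi}\exp\left[i\varphi_k(\lambda_k-\varepsilon_k)\right] \tag{D3}$$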

to Equation D1, the expected number of local fitness maxima reads

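$$\begin{aligned}N&=\sum_\tau\int\mathcal{D}\xi\prod_{k=1}^{L}\left[\int_0^\infty d\lambda_k\int\frac{d\varphi_k}{2\pi}e^{i\varphi_k(\lambda_k-\varepsilon_k)}\right]\\&=\sum_\tau\int\mathcal{D}\xi\,\mathcal{D}\lambda\,\mathcal{D}\varphi\,\exp\left\{\sum_{k=1}^{L}\left[i\varphi_k\lambda_k+\frac{i}{L}\varphi_k\Big(2\boldsymbol{\xi}_k\cdot\sum_{j=1}^{L}\boldsymbol{\xi}_j\tau_j+2\boldsymbol{\xi}_k\cdot\mathbf{Q}-|\boldsymbol{\xi}_k|^2\Big)\right]\right\}\\&=\sum_\tau\int\mathcal{D}\xi\,\mathcal{D}\lambda\,\mathcal{D}\varphi\,\exp\left\{\sum_{k=1}^{L}\left[i\varphi_k\lambda_k+\frac{i}{L}\varphi_k\left(2\boldsymbol{\xi}_k\cdot\mathbf{Q}-|\boldsymbol{\xi}_k|^2\right)\right]\right\}\exp\left[\frac{i}{L}\sum_{k=1}^{L}\varphi_k\boldsymbol{\xi}_k\cdot\sum_{j=1}^{L}\boldsymbol{\xi}_j(2\tau_j)\right],\end{aligned} \tag{D4}$$

where $\mathcal{D}\lambda\equiv\int_0^\infty\prod_kd\lambda_k$, $\mathcal{D}\varphi\equiv\int\prod_kd\varphi_k/2\pi$, and we made the change of variables $(2\tau_k-1)\boldsymbol{\xi}_k\to\boldsymbol{\xi}_k$ to arrive at the second equality. Using the identity

$$\exp\left(\frac{i}{L}\mathbf{X}\cdot\mathbf{Y}\right)=L^n\int_{\mathbb{R}^n}d\boldsymbol{\nu}\,\delta(L\boldsymbol{\nu}-\mathbf{X})\exp(i\mathbf{Y}\cdot\boldsymbol{\nu})=\left(\frac{L}{2\pi}\right)^n\int_{\mathbb{R}^n}d\boldsymbol{\mu}\,d\boldsymbol{\nu}\,\exp\left[iL\boldsymbol{\mu}\cdot\boldsymbol{\nu}-i\mathbf{X}\cdot\boldsymbol{\mu}+i\mathbf{Y}\cdot\boldsymbol{\nu}\right], \tag{D5}$$

which is valid for any n-dimensional real vectors $\mathbf{X}$ and $\mathbf{Y}$, we can write the last term of Equation D4 as

$$\exp\left[\frac{i}{L}\sum_k\varphi_k\boldsymbol{\xi}_k\cdot\sum_j\boldsymbol{\xi}_j(2\tau_j)\right]=\left(\frac{L}{2\pi}\right)^n\int_{\mathbb{R}^n}d\boldsymbol{\mu}\,d\boldsymbol{\nu}\,\exp\left[iL\boldsymbol{\mu}\cdot\boldsymbol{\nu}+i\sum_k\boldsymbol{\xi}_k\cdot\left(-\varphi_k\boldsymbol{\mu}+2\tau_k\boldsymbol{\nu}\right)\right], \tag{D6}$$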

which gives

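$$N=\sum_\tau\int\mathcal{D}\lambda\,\mathcal{D}\varphi\,\mathcal{D}\mu\,\mathcal{D}\nu\;e^{i\boldsymbol{\varphi}\cdot\boldsymbol{\lambda}+iL\boldsymbol{\mu}\cdot\boldsymbol{\nu}}\prod_{k=1}^{L}\prod_{s=1}^{n}\int\frac{d\xi_{ks}}{\sqrt{2\pi}}\exp\left\{i\left[\frac{1}{L}\varphi_k\left(2\xi_{ks}Q_s-\xi_{ks}^2\right)+\xi_{ks}\left(-\varphi_k\mu_s+2\tau_k\nu_s\right)\right]-\frac{\xi_{ks}^2}{2}\right\}, \tag{D7}$$

where $\mathcal{D}\mu\,\mathcal{D}\nu\equiv(L/2\pi)^n\int_{\mathbb{R}^n}d\boldsymbol{\mu}\,d\boldsymbol{\nu}$, $\boldsymbol{\varphi}\cdot\boldsymbol{\lambda}\equiv\sum_{k=1}^{L}\varphi_k\lambda_k$, and $\xi_{ks}$, $Q_s$, $\mu_s$, $\nu_s$ are the sth components of the vectors $\boldsymbol{\xi}_k$, $\mathbf{Q}$, $\boldsymbol{\mu}$, and $\boldsymbol{\nu}$, respectively. Note that the integrals over the $\xi_{ks}$ become independent of each other. If we choose $Q_s=Q/\sqrt{n}=q\sqrt{L/\alpha}$ for all s and define $\chi\equiv2Q_s/L$, the integral over $\xi_{ks}$ becomes

$$\int\frac{d\xi_{ks}}{\sqrt{2\pi}}\exp\left\{-\frac{\xi_{ks}^2}{2}\left(1+2i\varphi_k/L\right)+i\xi_{ks}\left[\varphi_k(\chi-\mu_s)+2\tau_k\nu_s\right]\right\}=\frac{1}{\sqrt{1+2i\varphi_k/L}}\exp\left\{-\frac{\left[2\nu_s\tau_k+(\chi-\mu_s)\varphi_k\right]^2}{2(1+2i\varphi_k/L)}\right\}=\frac{1}{\sqrt{1+2i\varphi_k/L}}\exp\left[\frac{-(\mu_s-\chi)^2\varphi_k^2+4\tau_k(\mu_s-\chi)\nu_s\varphi_k-4\nu_s^2\tau_k}{2(1+2i\varphi_k/L)}\right], \tag{D8}$$

which, in turn, gives

$$N=\sum_\tau\int\mathcal{D}\lambda\,\mathcal{D}\varphi\,\mathcal{D}\mu\,\mathcal{D}\nu\;e^{i\boldsymbol{\varphi}\cdot\boldsymbol{\lambda}+iL\boldsymbol{\mu}\cdot\boldsymbol{\nu}}\prod_{k=1}^{L}\frac{1}{(1+2i\varphi_k/L)^{n/2}}\exp\left[\frac{-\varphi_k^2\sum_s(\mu_s-\chi)^2+4\tau_k\varphi_k\sum_s(\mu_s-\chi)\nu_s-4\tau_k\sum_s\nu_s^2}{2(1+2i\varphi_k/L)}\right]. \tag{D9}$$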

If we now insert the identity

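$$1=\int_0^\infty da\int db\,dc\;\delta\Big(a-\sum_s(\mu_s-\chi)^2\Big)\,\delta\Big(b-\sum_s(\mu_s-\chi)\nu_s\Big)\,\delta\Big(c-\sum_s\nu_s^2\Big)=\int_0^\infty da\int db\,dc\,\frac{dA}{2\pi/L}\frac{dB}{2\pi/L}\frac{dC}{2\pi/L}\exp\left[iAL\Big\{a-\sum_s(\mu_s-\chi)^2\Big\}+iBL\Big\{b-\sum_s(\mu_s-\chi)\nu_s\Big\}+iCL\Big\{c-\sum_s\nu_s^2\Big\}\right], \tag{D10}$$

we can write

$$N=\sum_\tau\int\mathcal{D}\lambda\,\mathcal{D}\varphi\,\mathcal{D}\mu\,\mathcal{D}\nu\;e^{i\boldsymbol{\varphi}\cdot\boldsymbol{\lambda}+iL\boldsymbol{\mu}\cdot\boldsymbol{\nu}}\int_0^\infty\frac{da\,dA\,db\,dB\,dc\,dC}{(2\pi/L)^3}\prod_k\left(1+2i\varphi_k/L\right)^{-L\alpha/2}\exp\left[\frac{-a\varphi_k^2+4b\tau_k\varphi_k-4c\tau_k}{2(1+2i\varphi_k/L)}\right]\times\exp\left\{iAL\Big[a-\sum_s(\mu_s-\chi)^2\Big]+iBL\Big[b-\sum_s(\mu_s-\chi)\nu_s\Big]+iCL\Big(c-\sum_s\nu_s^2\Big)\right\}, \tag{D11}$$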

where we have replaced $n$ by $L\alpha$. The integral domain of $a$ is restricted to the positive real axis to ensure that the integral with respect to $\varphi_k$ in Equation D9 continues to be well-defined after the substitution. Performing the integrals over $\mu_s$ and $\nu_s$, we get

$$\frac{L}{2\pi}\int d\mu_s\, d\nu_s\; e^{iL\mu_s\nu_s - iLA(\mu_s - \chi)^2 - iLB(\mu_s - \chi)\nu_s - iLC\nu_s^2} = \exp\left[-\frac{1}{2}\left(\mathrm{Ln}\left\{\frac{(B-1)^2 - 4AC}{Ai}\right\} + \mathrm{Ln}(Ai)\right) + \frac{i\,4q^2A/\alpha}{4AC - (B-1)^2}\right], \tag{D12}$$

where $\mathrm{Ln}(x)$ is the principal value of the logarithm with argument in the interval $(-\pi, \pi]$ and the branch cut lies on the negative real axis.

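Subsequently, the remaining integral over $\varphi_k$ and $\lambda_k$ can be readily evaluated as follows:

$$\frac{1}{2\pi}\int_0^\infty d\lambda_k \int_{-\infty}^{\infty} d\varphi_k\, (1 + 2i\varphi_k/L)^{-L\alpha/2} \exp\left[\frac{-a\varphi_k^2 + 4b\tau_k\varphi_k - 4c\tau_k}{2(1 + 2i\varphi_k/L)} + i\varphi_k\lambda_k\right] = T(a, bi, c, \tau_k) + \frac{1}{L}U(a, bi, c, \tau_k) + O(1/L^2), \tag{D13}$$

where

$$\begin{aligned}
T(a,b,c,\tau) &= \frac{1}{2}e^{-2c\tau}\left[\mathrm{erf}\left(\frac{\alpha + 2b\tau}{\sqrt{2a}}\right) + 1\right], \\
U(a,b,c,\tau) &= -\frac{4ac\tau + a + 2b\tau(\alpha + 2b\tau)}{\sqrt{2\pi}\,a^{3/2}} \exp\left[-\frac{(\alpha + 2b\tau)^2}{2a} - 2c\tau\right].
\end{aligned} \tag{D14}$$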

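After summing over the $\tau_k$, we arrive at the equation

$$\mathcal{N} = \int_0^\infty \frac{da\, dA\, db\, dB\, dc\, dC}{(2\pi/L)^3} \exp\left[\frac{U(a, bi, c, 1) + U(a, bi, c, 0)}{T(a, bi, c, 1) + T(a, bi, c, 0)}\right] \exp\left[L\Sigma(a, bi, c, Ai, B, Ci)\right], \tag{D15}$$

where

$$\Sigma(a,b,c,A,B,C) = aA + bB + cC - \frac{1}{2}\alpha\ln\left[4AC + (B-1)^2\right] - \frac{4Aq^2}{4AC + (B-1)^2} + \ln\left[T(a,b,c,1) + T(a,b,c,0)\right]. \tag{D16}$$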
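The step from Equation D13 to the prefactor in Equation D15 can be illustrated numerically. The sketch below evaluates $T$ and $U$ at real arguments and compares the exact sum over all $\tau \in \{0,1\}^L$ of $\prod_k [T + U/L]$, which factorizes into $[T_1 + T_0 + (U_1 + U_0)/L]^L$, with the large-$L$ form $(T_1 + T_0)^L \exp[(U_1 + U_0)/(T_1 + T_0)]$. The overall sign and normalization of `U` follow a reading of Equation D14 reconstructed from the $1/L$ expansion of Equation D13, which is an assumption; the function names and parameter values are illustrative.

```python
import math


def T(a, b, c, tau, alpha):
    """Leading-order single-site factor (Equation D14)."""
    arg = (alpha + 2.0 * b * tau) / math.sqrt(2.0 * a)
    return 0.5 * math.exp(-2.0 * c * tau) * (math.erf(arg) + 1.0)


def U(a, b, c, tau, alpha):
    """1/L correction to the single-site factor (assumed reading of D14)."""
    m = alpha + 2.0 * b * tau
    numerator = 4.0 * a * c * tau + a + 2.0 * b * tau * m
    gauss = math.exp(-m * m / (2.0 * a) - 2.0 * c * tau)
    return -numerator * gauss / (math.sqrt(2.0 * math.pi) * a ** 1.5)


def log_site_sum_exact(a, b, c, alpha, L):
    """log of sum_tau prod_k [T + U/L]; the sum factorizes over sites."""
    t = T(a, b, c, 1, alpha) + T(a, b, c, 0, alpha)
    u = U(a, b, c, 1, alpha) + U(a, b, c, 0, alpha)
    return L * math.log(t + u / L)


def log_site_sum_asymptotic(a, b, c, alpha, L):
    """log of (T1 + T0)^L * exp[(U1 + U0)/(T1 + T0)], as in Equation D15."""
    t = T(a, b, c, 1, alpha) + T(a, b, c, 0, alpha)
    u = U(a, b, c, 1, alpha) + U(a, b, c, 0, alpha)
    return L * math.log(t) + u / t


if __name__ == "__main__":
    # Illustrative parameter values (not taken from the article).
    for L in (10, 100, 1000):
        exact = log_site_sum_exact(1.0, -0.3, 0.2, 1.0, L)
        approx = log_site_sum_asymptotic(1.0, -0.3, 0.2, 1.0, L)
        print(L, exact - approx)
```

The difference between the two logarithms shrinks like $1/L$, which is why only the ratio $(U_1 + U_0)/(T_1 + T_0)$ survives in the prefactor.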

The remaining integrals are hard to evaluate analytically. Instead, we resort to the saddle-point method to obtain an asymptotic expansion of the integral. Since $\Sigma$ is the exponential growth factor of the number of local maxima, which must be a real number, one expects that the saddle points of Equation D16 are formed for real arguments of $\Sigma$. This suggests that we should make the changes of variables $b \to b/i$, $A \to A/i$, and $C \to C/i$. For large $L$, the integrals are then dominated by the saddle point $(a^*, b^*, c^*, A^*, B^*, C^*)$ of $\Sigma(a,b,c,A,B,C)$. If there is more than one saddle point, the one giving the largest value of $\Sigma(a,b,c,A,B,C)$ has to be chosen. Then, the leading behavior of the number of maxima can be expressed in terms of the saddle point as

$$\mathcal{N} = \frac{1}{\sqrt{|\det H(\Sigma)|}} \exp\left[\frac{U(a^*,b^*,c^*,1) + U(a^*,b^*,c^*,0)}{T(a^*,b^*,c^*,1) + T(a^*,b^*,c^*,0)}\right] \exp\left[L\Sigma(a^*,b^*,c^*,A^*,B^*,C^*)\right], \tag{D17}$$

where $H(\Sigma)$ is the Hessian matrix around the saddle point. The reader may have noticed that the two principal values of the logarithm defined in Equation D12 are replaced by a real-valued logarithm in Equation D16, which can be dangerous in general. However, it can be shown that this substitution is indeed correct by verifying that $(B^*-1)^2 + 4A^*C^*$ is always positive for all saddle points of Equation D16, and thus the imaginary arguments always cancel each other out.

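Now, let us evaluate the saddle-point conditions. The derivatives of $\Sigma$ with respect to $A$, $B$, and $C$ are

$$\begin{aligned}
\frac{\partial\Sigma}{\partial A} &= a - 2\,\frac{\alpha C\left[4AC + (B-1)^2\right] + 2q^2(B-1)^2}{\left[4AC + (B-1)^2\right]^2}, \\
\frac{\partial\Sigma}{\partial B} &= b - \frac{(B-1)\left[\alpha(B-1)^2 + A(4C\alpha - 8q^2)\right]}{\left[4AC + (B-1)^2\right]^2}, \\
\frac{\partial\Sigma}{\partial C} &= c - \frac{2A\left[\alpha(B-1)^2 + A(4C\alpha - 8q^2)\right]}{\left[4AC + (B-1)^2\right]^2}.
\end{aligned} \tag{D18}$$

By requiring that the above three equations are zero at the saddle point, we get

$$A = \frac{\alpha c}{2(ac + b^2)}, \qquad B - 1 = \frac{\alpha b}{ac + b^2}, \qquad C = \frac{1}{4}\left(\frac{2a\alpha}{ac + b^2} + \frac{-\alpha \pm \sqrt{\alpha^2 - 16cq^2}}{c}\right). \tag{D19}$$

The two solutions of $C$ force us to perform a two-fold analysis for the remaining integrals since we cannot a priori determine which solution will yield the correct saddle point. Instead, we introduce another real number $g = \pm\sqrt{\alpha^2 - 16cq^2}$, which is allowed to take both signs. Then, by imposing the functional relation $c(g) = (\alpha^2 - g^2)/(16q^2)$, both solutions are covered by a single analysis. In this way, the saddle point is obtained in terms of $g$ instead of $c$. Finally, substituting this solution into Equation D17 gives Equation 47.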
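As a sanity check on the algebra, one can substitute the solution of Equation D19 into the three derivatives of Equation D18 and confirm that they vanish for either sign of $g$. The sketch below does this in plain Python; the function names and the test values of $a$, $b$, $g$, $\alpha$, and $q$ are illustrative assumptions, and the point $g = \pm\alpha$ (where $c = 0$) is excluded because Equation D19 divides by $c$.

```python
def c_of_g(g, alpha, q):
    """Functional relation c(g) = (alpha^2 - g^2) / (16 q^2)."""
    return (alpha ** 2 - g ** 2) / (16.0 * q * q)


def saddle_ABC(a, b, g, alpha, q):
    """Solution (A, B, C) of Equation D19, parametrized by g instead of c."""
    c = c_of_g(g, alpha, q)
    s = a * c + b * b
    A = alpha * c / (2.0 * s)
    B = 1.0 + alpha * b / s
    C = 0.25 * (2.0 * a * alpha / s + (g - alpha) / c)
    return c, A, B, C


def grad_ABC(a, b, c, A, B, C, alpha, q):
    """Derivatives of Sigma with respect to A, B, C (Equation D18)."""
    d = 4.0 * A * C + (B - 1.0) ** 2
    common = alpha * (B - 1.0) ** 2 + A * (4.0 * C * alpha - 8.0 * q * q)
    dA = a - 2.0 * (alpha * C * d + 2.0 * q * q * (B - 1.0) ** 2) / d ** 2
    dB = b - (B - 1.0) * common / d ** 2
    dC = c - 2.0 * A * common / d ** 2
    return dA, dB, dC


if __name__ == "__main__":
    # Illustrative values; both signs of g correspond to the same c.
    for g in (0.5, -0.5):
        c, A, B, C = saddle_ABC(1.3, -0.4, g, 1.0, 0.3)
        print(g, grad_ABC(1.3, -0.4, c, A, B, C, 1.0, 0.3))
```

Both branches give residuals at the level of floating-point round-off, which illustrates why a single analysis in terms of $g$ covers the two solutions for $C$.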

Appendix E: Mean Phenotypic Distance z* in the Joint Limit

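In this appendix, we will associate the saddle-point value $a^*$ of the variable $a$ entering the complexity function Equation 47 with the mean phenotypic distance $z^*$. To this end, we first consider the probability density $P(\tau, a)$ that a genotype $\tau$ whose phenotypic vector is of squared magnitude $L^2a/4$ is a local maximum. Formally, we can write

$$\begin{aligned}
P(\tau, a) &= \int \mathcal{D}\xi \prod_k \theta(\varepsilon_k)\, \delta\!\left[a - \frac{4}{L^2}\left(\vec{Q} + \sum_k \vec{\xi}_k\tau_k\right)^2\right] \\
&= \frac{L}{2\pi}\int dA \int \mathcal{D}\xi \prod_k \theta(\varepsilon_k) \exp\left[iLaA - i\frac{4A}{L}\left(\vec{Q} + \sum_k \vec{\xi}_k\tau_k\right)^2\right] \\
&= \frac{L}{2\pi}\int dA \int \mathcal{D}\xi \prod_k \theta(\varepsilon_k) \int \mathcal{D}\psi \exp\left[iLaA + \frac{iL}{16A}|\vec{\psi}|^2 + i\vec{\psi}\cdot\left(\vec{Q} + \sum_k \vec{\xi}_k\tau_k\right)\right],
\end{aligned} \tag{E1}$$

where we have used the identity $\int_{-\infty}^{\infty} dx\, \exp(ipx^2 + iqx) = \sqrt{\pi/p}\, e^{i\pi/4} \exp[-iq^2/(4p)]$ for $p > 0$, $\mathcal{D}\psi = \prod_s \left(\sqrt{L}\, e^{-i\pi/4}\, d\psi_s\right)/\left(4\sqrt{\pi A}\right)$, and the notation is the same as in Appendix D. Following the same procedure as in the previous appendix, we get

$$\begin{aligned}
P(\tau, a) = \int \mathcal{D}\xi\, \mathcal{D}\lambda\, \mathcal{D}\varphi\, \mathcal{D}\mu\, \mathcal{D}\nu\; e^{i\varphi\cdot\lambda + iL\vec{\mu}\cdot\vec{\nu}} &\exp\left[\sum_k \frac{i}{L}\varphi_k\left(2\vec{\xi}_k\cdot\vec{Q} - |\vec{\xi}_k|^2\right) + i\sum_k \vec{\xi}_k\cdot\left(-\varphi_k\vec{\mu} + 2\tau_k\vec{\nu}\right)\right] \\
&\times \frac{L}{2\pi}\int dA \int \mathcal{D}\psi \exp\left[iLaA + \frac{iL}{16A}|\vec{\psi}|^2 + i\vec{\psi}\cdot\left(\vec{Q} + \sum_i \vec{\xi}_i\tau_i\right)\right].
\end{aligned} \tag{E2}$$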

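By shifting $\vec{\nu} \to \vec{\nu} - \vec{\psi}/2$ and integrating over $\vec{\psi}$, we have

$$\begin{aligned}
P(\tau, a) &= \int \mathcal{D}\xi\, \mathcal{D}\lambda\, \mathcal{D}\varphi\, \mathcal{D}\mu\, \mathcal{D}\nu\; e^{i\varphi\cdot\lambda + iL\vec{\mu}\cdot\vec{\nu}} \exp\left[\sum_k \frac{i}{L}\varphi_k\left(2\vec{\xi}_k\cdot\vec{Q} - |\vec{\xi}_k|^2\right) + i\sum_k \vec{\xi}_k\cdot\left(-\varphi_k\vec{\mu} + 2\tau_k\vec{\nu}\right)\right] \\
&\qquad \times \frac{L}{2\pi}\int dA \int \mathcal{D}\psi \exp\left[iLaA + \frac{iL}{16A}|\vec{\psi}|^2 + i\vec{\psi}\cdot\left(\vec{Q} - L\vec{\mu}/2\right)\right] \\
&= \int \mathcal{D}\xi\, \mathcal{D}\lambda\, \mathcal{D}\varphi\, \mathcal{D}\mu\, \mathcal{D}\nu\; e^{i\varphi\cdot\lambda + iL\vec{\mu}\cdot\vec{\nu}} \exp\left[\sum_k \frac{i}{L}\varphi_k\left(2\vec{\xi}_k\cdot\vec{Q} - |\vec{\xi}_k|^2\right) + i\sum_k \vec{\xi}_k\cdot\left(-\varphi_k\vec{\mu} + 2\tau_k\vec{\nu}\right)\right] \\
&\qquad \times \int \frac{dA}{2\pi/L} \exp\left\{iLA\left[a - \left(\frac{2\vec{Q}}{L} - \vec{\mu}\right)^2\right]\right\}.
\end{aligned} \tag{E3}$$

Since we have set $Q_s = Q/\sqrt{n}$ for $s = 1, \ldots, n$, the last integral becomes

$$\int \frac{dA}{2\pi/L} \exp\left\{iLA\left[a - \left(\frac{2\vec{Q}}{L} - \vec{\mu}\right)^2\right]\right\} = \int \frac{dA}{2\pi/L} \exp\left\{iLA\left[a - \sum_s(\mu_s - \chi)^2\right]\right\}. \tag{E4}$$

Since $\mathcal{N} = \sum_\tau \int_0^\infty da\, P(\tau, a)$, by applying the manipulations of Appendix D to Equation E3 we arrive at the same integral form as in Equation D10. Since $\sum_\tau P(\tau, a)$ is the mean number of local maxima whose phenotypic vectors have squared magnitude $L^2a/4$, we see that the saddle point $a^*$ of Equation D16 determines the mean phenotypic distance $z^*$ through

$$z^* = \frac{L\sqrt{a^*}}{2}. \tag{E5}$$

This shows in particular that $z^*$ is linear in $L$.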

Appendix F: Mean Genotypic Distance ρ* in the Joint Limit

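To have access to the information about the typical value of the genotypic (Hamming) distance of a local fitness maximum from the wild type, we rewrite Equation D15 as

$$\begin{aligned}
\mathcal{N} &= \int_0^\infty \frac{da\, dA\, db\, dB\, dc\, dC}{(2\pi/L)^3} \sum_{s=0}^{L}\binom{L}{s}\left[T(a,b,c,1) + \frac{U(a,b,c,1)}{L}\right]^s \left[T(a,b,c,0) + \frac{U(a,b,c,0)}{L}\right]^{L-s} \\
&\approx \int_0^\infty \frac{da\, dA\, db\, dB\, dc\, dC}{(2\pi/L)^3} \int_0^1 d\rho\, \frac{e^{L\Sigma(a,b,c,A,B,C,\rho)}}{\sqrt{2\pi L\rho(1-\rho)}} \exp\left\{\rho\left[1 + \frac{U(a,b,c,1)}{T(a,b,c,1)}\right] + (1-\rho)\left[1 + \frac{U(a,b,c,0)}{T(a,b,c,0)}\right]\right\},
\end{aligned} \tag{F1}$$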

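where we have rearranged the summation $\sum_\tau$ as $\sum_{s=0}^{L}\binom{L}{s}$, taking advantage of the inherent permutation symmetry. Stirling's formula has been used to evaluate the binomial coefficients, $\sum_s$ is approximated as $L\int_0^1 d\rho$ with $s = L\rho$, and

$$\begin{aligned}
\Sigma(a,b,c,A,B,C,\rho) \equiv{}& aA + bB + cC - \frac{1}{2}\alpha\ln\left[4AC + (B-1)^2\right] - \frac{4Aq^2}{4AC + (B-1)^2} \\
&+ \rho\ln T(a,b,c,1) + (1-\rho)\ln T(a,b,c,0) - \rho\ln\rho - (1-\rho)\ln(1-\rho).
\end{aligned} \tag{F2}$$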

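The saddle-point equations for this expression involve seven variables including $\rho$. Since the saddle-point equations for $A$, $B$, and $C$ are the same as Equation D18, we may again insert Equation D19 into Equation F2, which yields

$$\begin{aligned}
\Sigma_{\mathrm{red}}(a,b,g,\rho) ={}& -\ln 2 + \frac{\alpha}{2}\left(1 - \ln\frac{\alpha}{2}\right) - \frac{\alpha}{2}\ln\left[\frac{\alpha + g}{a\,c(g) + b^2}\right] + b + \frac{g}{2} - \rho\ln\rho - (1-\rho)\ln(1-\rho) \\
&+ (1-\rho)\ln\left(\mathrm{erf}\left(\frac{\alpha}{\sqrt{2a}}\right) + 1\right) + \rho\left\{\ln\left[\mathrm{erf}\left(\frac{\alpha + 2b}{\sqrt{2a}}\right) + 1\right] - 2c(g)\right\}.
\end{aligned} \tag{F3}$$

Since

$$\frac{\partial\Sigma}{\partial\rho} = \ln\frac{T(a,b,c,1)}{T(a,b,c,0)} - \ln\rho + \ln(1-\rho), \tag{F4}$$

the saddle-point value of $\rho^*$ is

$$\rho^* = \frac{T(a^*,b^*,c^*,1)}{T(a^*,b^*,c^*,1) + T(a^*,b^*,c^*,0)} = \left\{1 + e^{2c^*}\left[\frac{\mathrm{erf}\left(\alpha/\sqrt{2a^*}\right) + 1}{\mathrm{erf}\left[(\alpha + 2b^*)/\sqrt{2a^*}\right] + 1}\right]\right\}^{-1}. \tag{F5}$$

By inserting $\rho^*$ into the saddle-point equations for $a$, $b$, and $c$, one can easily see that the final equations are the same as those derived from Equation D16.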
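The statement that eliminating $\rho$ recovers Equation D16 can be illustrated numerically: at $\rho = \rho^*$, the $\rho$-dependent part of Equation F2 equals $\ln[T(a,b,c,1) + T(a,b,c,0)]$, and it is smaller for any other $\rho$. The following is a minimal sketch with hypothetical function names and illustrative parameter values; it also compares the two forms of $\rho^*$ given in Equation F5.

```python
import math


def T(a, b, c, tau, alpha):
    """Single-site factor T of Equation D14."""
    arg = (alpha + 2.0 * b * tau) / math.sqrt(2.0 * a)
    return 0.5 * math.exp(-2.0 * c * tau) * (math.erf(arg) + 1.0)


def rho_star(a, b, c, alpha):
    """First form of Equation F5: T1 / (T1 + T0)."""
    t1 = T(a, b, c, 1, alpha)
    t0 = T(a, b, c, 0, alpha)
    return t1 / (t1 + t0)


def rho_star_closed(a, b, c, alpha):
    """Second form of Equation F5, written with error functions."""
    num = math.erf(alpha / math.sqrt(2.0 * a)) + 1.0
    den = math.erf((alpha + 2.0 * b) / math.sqrt(2.0 * a)) + 1.0
    return 1.0 / (1.0 + math.exp(2.0 * c) * num / den)


def rho_part(a, b, c, rho, alpha):
    """rho-dependent terms of Equation F2 (T terms plus mixing entropy)."""
    t1 = T(a, b, c, 1, alpha)
    t0 = T(a, b, c, 0, alpha)
    return (rho * math.log(t1) + (1.0 - rho) * math.log(t0)
            - rho * math.log(rho) - (1.0 - rho) * math.log(1.0 - rho))


if __name__ == "__main__":
    a, b, c, alpha = 1.0, -0.3, 0.2, 1.0  # illustrative values
    r = rho_star(a, b, c, alpha)
    log_sum = math.log(T(a, b, c, 1, alpha) + T(a, b, c, 0, alpha))
    print(r, rho_part(a, b, c, r, alpha), log_sum)
```

For $b = c = 0$ the two single-site factors coincide and the sketch returns $\rho^* = 1/2$, as expected when mutant and wild-type alleles are equally likely at a maximum.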

Appendix G: Derivation of Equation 50

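The determination of the solution describing regime III relies on the intuition that as $\alpha$ becomes large, the fitness landscape is asymptotically linear with the wild type being the global fitness maximum, as demonstrated in the section "Sign epistasis for L = 2". This suggests an ansatz where $a^*$ is close to $4q^2$, which corresponds to the wild-type phenotypic distance as shown in Equation E5. Given this clue, one can additionally find that

$$\frac{\partial}{\partial a}\Sigma_{\mathrm{red}}(a,b,g) = 0 \tag{G1}$$

is solved by $a = 4q^2$, $b = -\alpha$, and $g = \alpha$. Furthermore, if we evaluate the remaining saddle-point conditions around this point, we find that this solution fails to solve them by a slight margin,

$$\left.\frac{\partial}{\partial b}\Sigma_{\mathrm{red}}(a,b,g)\right|_{a=4q^2,\,b=-\alpha,\,g=\alpha} = \frac{e^{-\alpha^2/(8q^2)}}{\sqrt{2\pi}\,q} \tag{G2}$$

and

$$\left.\frac{\partial}{\partial g}\Sigma_{\mathrm{red}}(a,b,g)\right|_{a=4q^2,\,b=-\alpha,\,g=\alpha} = \frac{\alpha}{8q^2}\,\mathrm{erfc}\left(\frac{\alpha}{2\sqrt{2}\,q}\right). \tag{G3}$$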
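The residual derivatives in Equations G1 to G3 can be checked by finite differences. The sketch below assumes that $\Sigma_{\mathrm{red}}(a,b,g)$ is Equation F3 with $\rho$ eliminated, so that the $\rho$-dependent terms together with $-\ln 2$ combine into $\ln[T(a,b,c,1) + T(a,b,c,0)]$; Equation 47 itself is not reproduced in this appendix, so this form is an assumption. Function names and the values $\alpha = 2$, $q = 0.5$ are illustrative.

```python
import math


def sigma_red(a, b, g, alpha, q):
    """Assumed reduced complexity function: F3 with rho eliminated."""
    c = (alpha ** 2 - g ** 2) / (16.0 * q * q)
    t0 = 0.5 * (math.erf(alpha / math.sqrt(2.0 * a)) + 1.0)
    t1 = 0.5 * math.exp(-2.0 * c) * (
        math.erf((alpha + 2.0 * b) / math.sqrt(2.0 * a)) + 1.0)
    return (0.5 * alpha * (1.0 - math.log(alpha / 2.0))
            - 0.5 * alpha * math.log((alpha + g) / (a * c + b * b))
            + b + 0.5 * g + math.log(t0 + t1))


def partial(f, point, index, h=1e-5):
    """Central finite difference of f along one coordinate."""
    up = list(point)
    down = list(point)
    up[index] += h
    down[index] -= h
    return (f(*up) - f(*down)) / (2.0 * h)


def regime_three_residuals(alpha, q):
    """Derivatives of sigma_red at the zeroth-order point (4q^2, -alpha, alpha)."""
    f = lambda a, b, g: sigma_red(a, b, g, alpha, q)
    point = (4.0 * q * q, -alpha, alpha)
    return tuple(partial(f, point, i) for i in range(3))


if __name__ == "__main__":
    alpha, q = 2.0, 0.5  # illustrative values
    eps = math.exp(-alpha ** 2 / (8.0 * q * q))
    da, db, dg = regime_three_residuals(alpha, q)
    print(da, db, eps / (math.sqrt(2.0 * math.pi) * q))
    print(dg, alpha / (8.0 * q * q) * math.erfc(alpha / (2.0 * math.sqrt(2.0) * q)))
```

Under this assumption the derivative with respect to $a$ vanishes, while the other two reproduce the exponentially small right-hand sides of Equations G2 and G3.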

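Given the fact that $\mathrm{erfc}(x) = e^{-x^2}\left[1/(\sqrt{\pi}x) + O(x^{-3})\right]$, these nonvanishing terms are seen to be of the order of $\epsilon = e^{-\alpha^2/(8q^2)}$. Hence, it is sufficient to consider an expansion around the zeroth-order solution of the form $\Sigma_{\mathrm{red}}(4q^2 + A_1\epsilon, -\alpha + A_2\epsilon, \alpha + A_3\epsilon)$ to show that Equation 50 satisfies the saddle-point conditions of Equation 48. To this end, we first focus on the derivatives with respect to $A_1$ and $A_2$,

$$\begin{aligned}
\frac{1}{\epsilon}\frac{\partial}{\partial A_1}\Sigma_{\mathrm{red}}(4q^2 + A_1\epsilon, -\alpha + A_2\epsilon, \alpha + A_3\epsilon) &= -\frac{A_3\epsilon}{16q^2} + O(\epsilon^2), \\
\frac{1}{\epsilon}\frac{\partial}{\partial A_2}\Sigma_{\mathrm{red}}(4q^2 + A_1\epsilon, -\alpha + A_2\epsilon, \alpha + A_3\epsilon) &= \left(\frac{1}{\sqrt{2\pi}\,q} - \frac{2A_2 + A_3}{2\alpha}\right)\epsilon + O(\epsilon^2).
\end{aligned} \tag{G4}$$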

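The absence of contributions of zeroth order in $\epsilon$ implies that the zeroth-order solution $(4q^2, -\alpha, \alpha)$ satisfies the first two saddle-point conditions. Additionally, we find that the corrections of the order $O(\epsilon)$ are $A_3 = 0$ and $A_2 = \alpha/(\sqrt{2\pi}\,q)$. Since $A_3 = 0$ to leading order, the saddle-point equation with respect to $g$ should be evaluated to order $O(\epsilon^2)$. This yields

$$\frac{1}{\epsilon^2}\frac{\partial}{\partial B_3}\Sigma_{\mathrm{red}}(4q^2 + A_1\epsilon, -\alpha + A_2\epsilon, \alpha + B_3\epsilon^2) = \left[-\frac{A_1}{16q^2} - \sqrt{\frac{2}{\pi}}\frac{q}{\alpha^2} + O\!\left(\frac{q^3}{\alpha^4}\right)\right]\epsilon + O(\epsilon^2), \tag{G5}$$

and subsequently, $A_1$ is solved to be $A_1 = \left[-\left(16\sqrt{2/\pi}\,q^3/\alpha^2\right) + O(q^4/\alpha^3)\right]\epsilon + O(\epsilon^2)$. Finally, by inserting the solutions $A_1$, $A_2$, and $A_3$ as well as the zeroth-order solutions into Equation F5, the solution for $\rho^*$ is found to be

$$\rho^* = \left[\sqrt{\frac{2}{\pi}}\frac{q\epsilon}{\alpha} + O\!\left(\frac{q^3}{\alpha^3}\right)\right] + O(\epsilon^2). \tag{G6}$$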

Footnotes

Communicating editor: J. Hermisson

Literature Cited

  1. Bank C., Hietpas R. T., Wong A., Bolon D. N., Jensen J. D., 2014.  A Bayesian MCMC approach to assess the complete distribution of fitness effects of new mutations: uncovering the potential for adaptive walks in challenging environments. Genetics 196: 841–852.
  2. Bank C., Matuszewski S., Hietpas R. T., Jensen J. D., 2016.  On the (un)predictability of a large intragenic fitness landscape. Proc. Natl. Acad. Sci. USA 113: 14085–14090.
  3. Blanquart F., Bataillon T., 2016.  Epistasis and the structure of fitness landscapes: are experimental fitness landscapes compatible with Fisher’s model? Genetics 203: 847–862.
  4. Blanquart F., Achaz G., Bataillon T., Tenaillon O., 2014.  Properties of selected mutations and genotypic landscapes under Fisher’s geometric model. Evolution 68: 3537–3554.
  5. Bray A. J., Moore M. A., 1980.  Metastable states in spin glasses. J. Phys. C Solid State Phys. 13: L469–L476.
  6. Chevin L.-M., Martin G., Lenormand T., 2010.  Fisher’s model and the genomics of adaptation: restricted pleiotropy, heterogeneous mutation, and parallel evolution. Evolution 64: 3213–3231.
  7. Crona K., Greene D., Barlow M., 2013.  The peaks and geometry of fitness landscapes. J. Theor. Biol. 317: 1–10.
  8. de Visser J. A. G. M., Krug J., 2014.  Empirical fitness landscapes and the predictability of evolution. Nat. Rev. Genet. 15: 480–490.
  9. de Visser J. A. G. M., Park S.-C., Krug J., 2009.  Exploring the effect of sex on empirical fitness landscapes. Am. Nat. 174: S15–S30.
  10. Durrett R., Limic V., 2003.  Rigorous results for the NK model. Ann. Probab. 31: 1713–1753.
  11. Evans S. N., Steinsaltz D., 2002.  Estimating some features of NK fitness landscapes. Ann. Appl. Probab. 12: 1299–1321.
  12. Fisher R. A., 1930.  The Genetical Theory of Natural Selection. Clarendon Press, Oxford.
  13. Fraïsse C., Gunnarsson P. A., Roze D., Bierne N., Welch J. J., 2016.  The genetics of speciation: insights from Fisher’s geometric model. Evolution 70: 1450–1464.
  14. Franke J., Klözer A., de Visser J. A. G. M., Krug J., 2011.  Evolutionary accessibility of mutational pathways. PLoS Comp. Biol. 7: e1002134.
  15. Gillespie J. H., 1983.  A simple stochastic gene substitution model. Theor. Popul. Biol. 23: 202–215.
  16. Gillespie J. H., 1984.  Molecular evolution over the mutational landscape. Evolution 38: 1116–1129.
  17. Gros P.-A., Le Nagard H., Tenaillon O., 2009.  The evolution of epistasis and its links with genetic robustness, complexity and drift in a phenotypic model of adaptation. Genetics 182: 277–293.
  18. Hayashi Y., Aita T., Toyota H., Husimi Y., Urabe I., et al., 2006.  Experimental rugged fitness landscape in protein sequence space. PLoS One 1: e96.
  19. Hermisson J., McGregor A. P., 2008.  Pleiotropic scaling and QTL data. Nature 456: E3.
  20. Kauffman S., Levin S., 1987.  Towards a general theory of adaptive walks on rugged landscapes. J. Theor. Biol. 128: 11–45.
  21. Kimura M., 1983.  The Neutral Theory of Molecular Evolution. Cambridge University Press, Cambridge, United Kingdom.
  22. Lande R., 1980.  The genetic covariance between characters maintained by pleiotropic mutations. Genetics 94: 203–215.
  23. Limic V., Pemantle R., 2004.  More rigorous results on the Kauffman-Levin model of evolution. Ann. Probab. 32: 2149–2178.
  24. Martin G., 2014.  Fisher’s geometric model emerges as a property of complex integrated phenotypic networks. Genetics 197: 237–255.
  25. Martin G., Elena S. F., Lenormand T., 2007.  Distributions of epistasis in microbes fit predictions from a fitness landscape model. Nat. Genet. 39: 555–560.
  26. Moura de Sousa J. A., Alpendrinha J., Campos P. R. A., Gordo I., 2016.  Competition and fixation of cohorts of adaptive mutations under Fisher geometrical model. PeerJ 4: e2256.
  27. Neidhart J., Szendro I. G., Krug J., 2014.  Adaptation in tunably rugged fitness landscapes: the Rough Mount Fuji model. Genetics 198: 699–721.
  28. Orr H. A., 1998.  The population genetics of adaptation: the distribution of factors fixed during adaptive evolution. Evolution 52: 935–949.
  29. Orr H. A., 2000.  Adaptation and the cost of complexity. Evolution 54: 13–20.
  30. Orr H. A., 2002.  The population genetics of adaptation: the adaptation of DNA sequences. Evolution 56: 1317–1330.
  31. Orr H. A., 2003.  A minimum on the mean number of steps taken in adaptive walks. J. Theor. Biol. 220: 241–247.
  32. Orr H. A., 2005.  The genetic theory of adaptation: a brief history. Nat. Rev. Genet. 6: 119–127.
  33. Park S.-C., Krug J., 2008.  Evolution in random fitness landscapes: the infinite sites model. J. Stat. Mech.: Theory Exp. 2008: P04014.
  34. Park S.-C., Krug J., 2016.  δ-exceedance records and random adaptive walks. J. Phys. A Math. Theor. 49: 315601.
  35. Park S.-C., Szendro I. G., Neidhart J., Krug J., 2015.  Phase transition in random adaptive walks on correlated fitness landscapes. Phys. Rev. E Stat. Nonlin. Soft Matter Phys. 91: 042707.
  36. Park S.-C., Neidhart J., Krug J., 2016.  Greedy adaptive walks on a correlated fitness landscape. J. Theor. Biol. 397: 89–102.
  37. Perfeito L., Sousa A., Bataillon T., Gordo I., 2014.  Rates of fitness decline and rebound suggest pervasive epistasis. Evolution 68: 150–162.
  38. Poelwijk F. J., Kiviet D. J., Weinreich D. M., Tans S. J., 2007.  Empirical fitness landscapes reveal accessible evolutionary paths. Nature 445: 383–386.
  39. Poelwijk F. J., Tănase-Nicola S., Kiviet D. J., Tans S. J., 2011.  Reciprocal sign epistasis is a necessary condition for multipeaked fitness landscapes. J. Theor. Biol. 272: 141–144.
  40. Ram Y., Hadany L., 2015.  The probability of improvement in Fisher’s geometric model: a probabilistic approach. Theor. Popul. Biol. 99: 1–6.
  41. Rokyta D., Joyce P., Caudle S., Mille C., Beisel C., et al., 2011.  Epistasis between beneficial mutations and the phenotype-to-fitness map for a ssDNA virus. PLoS Genet. 7: e1002075.
  42. Rozen D. E., Habets M. G. J. L., Handel A., de Visser J. A. G. M., 2008.  Heterogeneous adaptive trajectories of small populations on complex fitness landscapes. PLoS One 3: e1715.
  43. Schenk M. F., Szendro I. G., Salverda M. L., Krug J., de Visser J. A. G. M., 2013.  Patterns of epistasis between beneficial mutations in an antibiotic resistance gene. Mol. Biol. Evol. 30: 1779–1787.
  44. Schmitt A. O., Herzel H., 1997.  Estimating the entropy of DNA sequences. J. Theor. Biol. 188: 369–377.
  45. Schoustra S., Hwang S., Krug J., de Visser J. A. G. M., 2016.  Diminishing-returns epistasis among random beneficial mutations in a multicellular fungus. Proc. Biol. Sci. 283: 20161376.
  46. Servedio M. R., Brandvain Y., Dhole S., Fitzpatrick C. L., Goldberg E. E., et al., 2014.  Not just a theory-the utility of mathematical models in evolutionary biology. PLoS Biol. 12: e1002017.
  47. Szendro I. G., Schenk M. F., Franke J., Krug J., de Visser J. A. G. M., 2013.  Quantitative analyses of empirical fitness landscapes. J. Stat. Mech.: Theory Exp. 2013: P01005.
  48. Tanaka F., Edwards S. F., 1980.  Analytic theory of the ground state properties of a spin glass. I. Ising spin glass. J. Phys. F: Met. Phys. 10: 2769.
  49. Tenaillon O., 2014.  The utility of Fisher’s geometric model in evolutionary genetics. Annu. Rev. Ecol. Evol. Syst. 45: 179–201.
  50. Velenich A., Gore J., 2013.  The strength of genetic interactions scales weakly with mutational effects. Genome Biol. 14: R76.
  51. Wagner G. P., Kenney-Hunt J. P., Pavlicev M., Peck J. R., Waxman D., et al., 2008.  Pleiotropic scaling of gene effects and the ‘cost of complexity’. Nature 452: 470–472.
  52. Waxman D., Welch J. J., 2005.  Fisher’s microscope and Haldane’s ellipse. Am. Nat. 166: 447–457.
  53. Weinberger E. D., 1991.  Local properties of Kauffman’s N-k model: a tunably rugged energy landscape. Phys. Rev. A 44: 6399–6413.
  54. Weinreich D. M., Knies J. L., 2013.  Fisher’s geometric model of adaptation meets the functional synthesis: data on pairwise epistasis for fitness yields insights into the shape and size of phenotype space. Evolution 67: 2957–2972.
  55. Weinreich D. M., Watson R. A., Chao L., 2005.  Sign epistasis and genetic constraint on evolutionary trajectories. Evolution 59: 1165–1174.
  56. Weinreich D. M., Lan Y., Wylie C. S., Heckendorn R. B., 2013.  Should evolutionary geneticists worry about higher-order epistasis? Curr. Opin. Genet. Dev. 23: 700–707.
  57. Zagorski M., Burda Z., Waclaw B., 2016.  Beyond the hypercube: evolutionary accessibility of fitness landscapes with realistic mutational networks. PLoS Comp. Biol. 12: e1005218.

Data Availability Statement

The authors state that all data necessary for confirming the conclusions presented in the article are represented fully within the article. All numerical calculations including simulations described in this work were implemented in Mathematica and C++. When counting the number of local genotypic maxima, we checked all genotypes and counted the exact number for a randomly realized landscape, then took an average. All relevant source codes are available upon request.
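The exhaustive counting procedure described in this statement can be sketched as follows. This is not the authors' Mathematica or C++ code, which is available only on request; it is a minimal illustration that assumes standard normal mutational effects $\vec{\xi}_k$ (as in the Gaussian weight of Appendix D), a wild-type phenotype with equal components $Q_s = qL/\sqrt{n}$, and fitness that decreases monotonically with the distance $|\vec{z}|$ from the optimum. All function names are hypothetical.

```python
import itertools
import math
import random


def count_local_maxima(xi, Q):
    """Count genotypes tau in {0,1}^L whose L single-site neighbors all
    lie farther from the phenotypic optimum (strictly lower fitness)."""
    L = len(xi)
    n = len(Q)
    count = 0
    for tau in itertools.product((0, 1), repeat=L):
        # Phenotype z = Q + sum_k tau_k * xi_k
        z = [Q[s] + sum(tau[k] * xi[k][s] for k in range(L)) for s in range(n)]
        d2 = sum(v * v for v in z)
        is_max = True
        for k in range(L):
            sign = 1 - 2 * tau[k]  # +1 adds mutation k, -1 removes it
            d2_nb = sum((z[s] + sign * xi[k][s]) ** 2 for s in range(n))
            if d2_nb <= d2:
                is_max = False
                break
        if is_max:
            count += 1
    return count


def mean_number_of_maxima(L, n, q, samples, seed=0):
    """Average the exact count over independently drawn random landscapes."""
    rng = random.Random(seed)
    Q = [q * L / math.sqrt(n)] * n  # equal components, |Q| = q * L
    total = 0
    for _ in range(samples):
        xi = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(L)]
        total += count_local_maxima(xi, Q)
    return total / samples


if __name__ == "__main__":
    # Small illustrative run; the cost grows as L * 2^L per landscape.
    print(mean_number_of_maxima(L=8, n=2, q=0.5, samples=20))
```

Because every genotype is enumerated, this approach is limited to small $L$; it is useful mainly for comparing finite-size averages against the asymptotic complexity.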


Articles from Genetics are provided here courtesy of Oxford University Press
