Author manuscript; available in PMC: 2015 Apr 28.
Published in final edited form as: Theor Popul Biol. 2013 Nov 7;91:3–19. doi: 10.1016/j.tpb.2013.10.004

Evolution of learning strategies in temporally and spatially variable environments: A review of theory

Kenichi Aoki a,*, Marcus W Feldman b
PMCID: PMC4412376  NIHMSID: NIHMS527132  PMID: 24211681

Abstract

The theoretical literature from 1985 to the present on the evolution of learning strategies in variable environments is reviewed, with the focus on deterministic dynamical models that are amenable to local stability analysis, and on deterministic models yielding evolutionarily stable strategies. Individual learning, unbiased and biased social learning, mixed learning, and learning schedules are considered. A rapidly changing environment or frequent migration in a spatially heterogeneous environment favors individual learning over unbiased social learning. However, results are not so straightforward in the context of learning schedules or when biases in social learning are introduced. The three major methods of modeling temporal environmental change – coevolutionary, two-timescale, and information decay – are compared and shown to sometimes yield contradictory results. The so-called Rogers’ paradox is inherent in the two-timescale method as originally applied to the evolution of pure strategies, but is often eliminated when the other methods are used. Moreover, Rogers’ paradox is not observed for the mixed learning strategies and learning schedules that we review. We believe that further theoretical work is necessary on learning schedules and biased social learning, based on models that are logically consistent and empirically pertinent.

Keywords: Dynamical models, Evolutionarily stable strategies, Monte Carlo/agent-based simulations, Rogers’ paradox

1. Introduction

Learning is a means of acquiring information about the environment and of expressing a phenotype (behavior) appropriate to that environment. Two forms of learning may be distinguished by the source of the information acquired. Individual learning (IL) occurs when an organism depends on its personal experience to gather the information directly from the environment, e.g., by trial-and-error. The second form of learning is social learning (SL), which occurs when an organism obtains the information indirectly by copying other organisms, e.g., by imitation.

A learning strategy is the way in which an organism combines IL and SL, either simultaneously or sequentially, and its relative dependence on each. Biases associated with SL in the choice of whom to copy are also an integral part of a learning strategy. The simplest strategies involve the use of IL or SL but not both. Each learning strategy can be regarded as a genetic adaptation to a specific kind of environmental variability. A learning strategy supports culture, to the extent that an innovation produced by IL is propagated through the population by SL. The learning strategy available to a species will – in conjunction with other factors such as its demography – determine the nature and properties of its culture.

Evolutionary models of learning, the subject of this review, are to be distinguished from classical learning models in psychology, which were constructed as mathematical formulations for how to assess the probabilities of alternative behaviors upon presentation of stimuli to the subject. These probabilities changed dynamically so that the subject’s behavior over time would also change. The focus was on the modification of individual behavior over the course of such trials (Bush and Mosteller, 1955; Hanania, 1959). Extensions of such models have been made to competitive situations where the members of a set of players adopt behaviors at each time step that depend on the history of decisions made by all the players (e.g., Izquierdo and Izquierdo, 2008). Common applications allow players to choose one of two behaviors, and the time-dependent and asymptotic probabilities of adopting each behavior are computed.

Our focus is on the evolution of learning strategies in a population. Each learning strategy is assumed to be genetically determined and – in the models that we consider in this review – not modifiable by learning. The fitness of a learning strategy in a given environment depends on whether the behavior(s) it dictates is (are) adaptive or maladaptive in that environment. The environment may change in time or vary spatially, and a behavior that may have been the best, in terms of natural selection, in one environment may not be the best in another. The fitness of a learning strategy also depends in a frequency-dependent manner on what the competing strategies are doing. Earlier studies (e.g., Boyd and Richerson, 1985; Rogers, 1988; Feldman et al., 1996) emphasized SL, as this form of learning is essential for culture. More recently, learning strategies combining IL and SL that support cumulative culture are receiving attention (e.g., Enquist et al., 2007; Borenstein et al., 2008; Aoki, 2010; Lehmann et al., 2010; Aoki et al., 2012).

In ecology, the evolution of learning has been widely studied in the context of foraging (e.g., Barnard and Sibly, 1981; Stephens, 1991; Rodriguez-Gironés and Vásquez, 1997; Giraldeau and Caraco, 2000; Eliassen et al., 2009; Dubois et al., 2010; Katsnelson et al., 2011; Arbilly et al., 2011). The models in this area often address complex situations and posit specific targets of learning, such as where to forage or whether to produce or to scrounge. As such, these models are usually not amenable to a formal mathematical treatment. The evolutionary models of learning that we consider in this review are more “abstract”, in the sense that the behavioral alternatives are distinguished only by whether they are adaptive or maladaptive, or by the degree of adaptedness. Some models are phrased in terms of the number of adaptive cultural traits carried by an organism (Lehmann and Feldman, 2009; Nakahashi, 2010), but they will not be addressed in this review. In the simplest situations, we can write down the dynamical equations describing the changes in the frequencies of the competing learning strategies in terms of their variable fitnesses in the different environments to which they are exposed. More complicated situations involving strategies that differ in the probabilities of using IL or SL can sometimes be modeled by the evolutionarily stable strategy (ESS) approach (Maynard Smith, 1982).

The models reviewed in detail in this paper are numbered sequentially from 1 to 11. We seek the stable equilibria of the dynamical equations or alternatively the ES learning strategy. In addition, we briefly discuss several interesting but complex models, some of which have been investigated using Monte Carlo/agent-based simulations. It will be seen that the results obtained from the simpler models can usefully be applied to interpreting the observations on the more complex models. Finally, we ask whether the presence of SL will improve the (geometric) mean fitness of a population relative to when it is absent—i.e., we address the so-called Rogers’ paradox (Rogers, 1988; Boyd and Richerson, 1995). Table 1 summarizes the provenance of models 1–11 and indicates for each model whether or not Rogers’ paradox occurs.

Table 1.

Provenance of Models 1–11 and possibility of Rogers’ paradox.

Model   Provenance                         Comments (a)               Rogers’ paradox
1       Feldman et al. (1996)              Generalization             Always observed
2       Feldman et al. (1996)              Parameter range extended   Sometimes resolved
3       Rogers (1988)                      Modified formulation       Always observed
4       Kendal et al. (2009)               Simplification             Sometimes resolved
5       Feldman et al. (1996)              Detailed analysis          Resolved
6       Boyd and Richerson (1988, 1995)    Modified formulation       Not considered
7       Aoki and Nakahashi (2008)          Unmodified                 Sometimes resolved
8       Enquist et al. (2007)              Reworded                   Sometimes resolved
9       Aoki et al. (2012)                 Unmodified                 Resolved
10      Nakahashi et al. (2012)            Unmodified                 Not addressed
11      Wakano and Aoki (2006)             Unmodified                 Not addressed

(a) Comments refer to the present analysis and discussion of the models in the corresponding references.

2. Dynamical models in temporally variable environments

The basic models of this section assume the simplest learning strategies, namely those that involve the use of IL or SL but not both. They also assume dichotomous variation in the phenotype (behavior) that can be acquired by learning. It is then possible to write down the difference equations governing the frequency dynamics of the learning strategies and phenotypes, which is done here for three of the four models.

2.1. Model 1: infinite-states l-cycle coevolutionary model

This model, which was first described by Feldman et al. (1996) in a slightly less general form, is coevolutionary in the sense that the learning strategies and behaviors can coevolve. Consider an infinite population of haploid organisms in which a genetic locus with two alleles determines whether an organism is an obligate individual learner or an obligate social learner. Among the adults of each generation, we distinguish two behaviors, correct or wrong, which are adaptive or maladaptive, respectively, in the environment faced by that generation. Behaviors are defined relative to the environment, so that when the environment changes, so do the behaviors that are correct or wrong. These adults reproduce asexually without fertility differences.

A newborn individual learner gathers information directly from the environment and achieves the correct behavior on its own before becoming an adult. However, it suffers a cost, c, which can be interpreted as the probability of making a fatal mistake. Hence, a fraction 1 − c of individual learners survive to adulthood, and they all show the correct behavior.

A newborn social learner, on the other hand, acquires its behavior by faithfully copying (i.e., imitating) a random member of the parental generation. Its behavior will be correct only if the behavior that it copies from its exemplar (i.e., cultural parent) is correct in the environment into which it is born. We assume that the environment changes every l generations, with that change occurring just prior to birth. Moreover, an environmental change results in a previously unknown state, which entails that neither of the two preexisting behaviors (correct or wrong) can be correct after the environmental change. Hence, only the individual learners can acquire the correct behavior immediately after an environmental change; this is known as the infinite environmental states assumption. A social learner with correct behavior has fitness (relative viability) 1, whereas the fitness associated with wrong behavior is 1 − s. We assume 0 < c < s < 1; otherwise, the individual learners will be selected out unconditionally.

Hence, among the surviving adults of any generation, there can be three phenogenotypes (i.e., genotype–phenotype combinations, Feldman and Cavalli-Sforza, 1984): individual learner, social learner with correct behavior (SLC), and social learner with wrong behavior (SLW). Let us denote their respective frequencies in the parental generation by z, x, and y, and the corresponding frequencies in the offspring generation by z′, x′, and y′. Then, the difference equations governing the dynamics of these variables can be written as follows. When there is an intervening environmental change, which occurs once every l generations, we have

$$V x' = 0, \tag{1.1a}$$
$$V y' = (1-s)(x+y), \tag{1.1b}$$
$$V z' = (1-c)z, \tag{1.1c}$$

where

$$V = (1-c)z + (1-s)(1-z). \tag{1.1d}$$

On the other hand, when the environment does not change between the generations,

$$W x' = (x+y)(x+z), \tag{1.2a}$$
$$W y' = (1-s)(x+y)y, \tag{1.2b}$$
$$W z' = (1-c)z, \tag{1.2c}$$

where

$$W = (1-c)z + (1-sy)(1-z). \tag{1.2d}$$

Eqs. (1.2) are to be applied consecutively l − 1 times, since there are l − 1 generations of environmental stasis after a change. The variables V and W in Eqs. (1.1d) and (1.2d) give the population mean fitnesses when the environment changes and when it does not change, respectively, and serve to normalize the equations.

To illustrate how these recursions follow from the assumptions, we briefly explain the derivation of Eqs. (1.1a), (1.1b), (1.2a) and (1.2b). Note first that SLC and SLW have the same genotype although their phenotypes differ, so that the frequency of social learners among the newborns of the offspring generation will be the sum of their frequencies, x+y. When the environment changes, neither the correct behavior nor the wrong behavior in the parental generation is correct as viewed from – i.e., in the environment faced by – the offspring generation. Hence, all newborn social learners will be SLW, and since their viability is 1 − s, we obtain Eq. (1.1b). In this case the frequency of SLC will of course be 0, resulting in Eq. (1.1a). With environmental stasis, on the other hand, fractions x + z and y of the parental generation – recall x, y, and z are the frequencies of SLC, SLW, and individual learners, respectively – have correct and wrong behaviors, respectively, as viewed from the offspring generation. Hence, when a newborn social learner copies a random member of the parental generation, it acquires the correct behavior with probability x + z and the wrong behavior with complementary probability y. On including the effects of viability selection, we obtain Eqs. (1.2a) and (1.2b).

There are two monomorphic equilibria of the model, which always (i.e., for all valid parameter combinations) exist. At the first such equilibrium, the individual learners are fixed, and we have ẑ(i) = 1 for 0 ≤ i ≤ l − 1, where the hat indicates equilibrium, and the generations are counted with i = 0 indicating the generation immediately before an environmental change. This equilibrium is (locally) stable if

$$1-s < (1-c)^{l}. \tag{1.3}$$

At the second monomorphic equilibrium, SLW are fixed, such that ŷ(i) = 1 for 0 ≤ i ≤ l − 1. This equilibrium is always unstable. In addition, we conjecture the existence of a periodic fully polymorphic (all three phenogenotypes are present) equilibrium, which is stable when inequality (1.3) is reversed. Clearly, inequality (1.3) is more likely to be satisfied when l is small, where l is the period of environmental change. Since a shorter period corresponds to greater environmental instability, the model predicts that individual learners are more likely to be fixed in a more changeable environment. See Appendix A for details.
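As a small illustration of how inequality (1.3) ties the fate of individual learners to the environmental period, the following sketch evaluates the condition and searches for the longest period at which fixation of individual learners is still locally stable. The function names are hypothetical and chosen only for this example.

```python
def il_fixation_is_stable(s: float, c: float, l: int) -> bool:
    """Inequality (1.3): fixation of individual learners is locally
    stable when 1 - s < (1 - c)**l."""
    return 1.0 - s < (1.0 - c) ** l


def longest_stable_period(s: float, c: float) -> int:
    """Largest environmental period l for which inequality (1.3) holds.

    Assumes 0 < c < s < 1, as in the text, so the condition holds for
    l = 1 and eventually fails as (1 - c)**l shrinks toward zero.
    """
    l = 0
    while il_fixation_is_stable(s, c, l + 1):
        l += 1
    return l
```

For s = 0.5 and c = 0.1, `longest_stable_period` returns 6, because 0.9^6 ≈ 0.531 exceeds 0.5 whereas 0.9^7 ≈ 0.478 does not.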

2.2. Rogers’ paradox

Rogers (1988) reasoned that the introduction of social learners into a population of individual learners would improve the mean fitness of that population, because social learners make culture possible, and culture is presumably an adaptation the capacity for which evolved by natural selection. Contrary to expectation, his simple model (see Model 3) suggested that this was not true. Rogers’ paradox, as it has come to be called, is said to occur when the parameter values are such that (a) a stable equilibrium exists with social learning (learners) present, (b) a monomorphic equilibrium of individual learning (learners) also exists, and (c) the two equilibria have the same mean fitness (Boyd and Richerson, 1995; Enquist et al., 2007). In such a case, culture does not appear to impart a fitness advantage: hence the paradox.

For Model 1 described above, numerical work suggests that a periodic fully polymorphic equilibrium exists and is stable when inequality (1.3) is reversed. Because ẑ(i) > 0 for 0 ≤ i ≤ l − 1 at this equilibrium, Eqs. (1.1c) and (1.2c) entail that the equilibrium mean fitness in any generation is equal to 1 − c, and hence that the geometric mean of the mean fitnesses over the l-cycle is 1 − c. The same argument shows that the geometric mean when individual learners are fixed is also 1 − c. Thus, Rogers’ paradox is observed in this model.

2.3. Model 2: two-state l-cycle coevolutionary model

We modify Model 1 by assuming that the environment fluctuates between two states, e.g., hot and cold, again with period l. There are two behaviors, each of which is correct in one environmental state and wrong in the other. Hence, when there is an intervening environmental change, the wrong behavior in the parental generation becomes the correct behavior as viewed from the offspring generation. For this model, the difference Eqs. (1.2) still apply, but we must replace Eqs. (1.1) by

$$V x' = (x+y)y, \tag{2.1a}$$
$$V y' = (1-s)(x+y)(x+z), \tag{2.1b}$$
$$V z' = (1-c)z, \tag{2.1c}$$

where

$$V = (1-c)z + [1-s(1-y)](1-z). \tag{2.1d}$$

Note that x + z and y in Eqs. (2.1a) and (2.1b) are interchanged, reflecting the reversal of correct and wrong after an environmental change.

In relation to Rogers’ paradox, an interesting aspect of this model is that a population that is monomorphic for social learners – but in which SLC and SLW coexist – cannot be invaded by individual learners if

$$(1-c)^{2} < 1-s. \tag{2.2}$$

There are an infinite number of periodic equilibria where social learners are fixed, which are neutrally stable with regard to perturbations in the frequencies of SLC and SLW. Nevertheless, each such equilibrium is stable to invasion by individual learners provided inequality (2.2) is satisfied. Moreover, the geometric mean of the mean fitnesses is √(1 − s), whereas it is 1 − c when individual learners are fixed. Hence, a population that is monomorphic for social learners can be stable and have a higher mean fitness than one that is monomorphic for individual learners; that is, Rogers’ paradox is resolved. See Appendix B for details. (In Feldman et al. (1996), we stated that the fixation of social learners was always unstable. This result followed from the parameterization used in that paper, which entailed that c < s/2.)
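To see where a geometric mean of √(1 − s) comes from, one can iterate the recursions with social learners fixed (z = 0). Under Eqs. (2.1) and (1.2), the product of the mean fitnesses over two full cycles works out to (1 − s)^l whatever the starting frequencies, so the per-generation geometric mean is √(1 − s). The sketch below checks this numerically; the function names are hypothetical and the starting frequency is arbitrary.

```python
import math


def il_cannot_invade(s: float, c: float) -> bool:
    """Inequality (2.2): a social-learner monomorphism resists invasion
    by individual learners when (1 - c)**2 < 1 - s."""
    return (1.0 - c) ** 2 < 1.0 - s


def sl_only_geometric_mean(x: float, s: float, l: int, cycles: int = 2) -> float:
    """Geometric mean of the mean fitness with social learners fixed.

    x is the initial frequency of SLC and 1 - x that of SLW. Use an
    even number of cycles so that each behavior is correct equally often.
    """
    y = 1.0 - x
    log_sum = 0.0
    for _ in range(cycles):
        # Environmental change: Eqs. (2.1) with z = 0.
        V = 1.0 - s * (1.0 - y)
        x, y = y / V, (1.0 - s) * x / V
        log_sum += math.log(V)
        for _ in range(l - 1):
            # Environmental stasis: Eqs. (1.2) with z = 0.
            W = 1.0 - s * y
            x, y = x / W, (1.0 - s) * y / W
            log_sum += math.log(W)
    return math.exp(log_sum / (cycles * l))
```

When inequality (2.2) holds, √(1 − s) exceeds 1 − c, which is the comparison underlying the resolution of the paradox in this model.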

2.4. Model 3: Rogers’ two-timescale model

The two-timescale model assumes that the genetic evolution is much slower than the cultural evolution and hence that genetic variables can be regarded as constants of cultural evolution (Rogers, 1988; Enquist et al., 2007). Although this is not a dynamical model for phenogenotypes, we include it here because it is closely related to the other models in this section. We explain Rogers’ result using an argument that differs slightly from the original. There are two environmental states as in Model 2 described above. The possible phenogenotypes and their fitnesses are as in Models 1 and 2, which is a more general parameterization than Rogers (1988).

Let p be the genetically-determined frequency of social learners among adults. Consider a focal newborn social learner and a social transmission chain extending backward in time with random copying from the previous generation. This chain may comprise social learners of ascending generations, but will eventually end in an individual learner. Since p is assumed constant during cultural evolution, the probability that an individual learner occurs in this chain for the first time exactly t generations ago is

$$p^{t-1}(1-p). \tag{3.1}$$

Next, let ε be the probability of an environmental change between generations. Since the environment fluctuates between two states, fitness is clearly dependent on whether the number of environmental changes is even or odd. Appendix C shows that the probability of no environmental change or an even number of changes in t generations is

$$\tfrac{1}{2}\left[1+(1-2\varepsilon)^{t}\right]. \tag{3.2}$$

In this case, the (correct) behavior acquired t generations ago by an individual learner is correct in the current generation. If, on the other hand, there are an odd number of environmental changes, which occurs with complementary probability $\tfrac{1}{2}[1-(1-2\varepsilon)^{t}]$, this behavior is now wrong, and the fitness will be 1 − s rather than 1. Hence, the expected fitness of a social learner is

$$w_s^{(2)} = \frac{1}{2}\sum_{t=1}^{\infty} p^{t-1}(1-p)\left\{\left[1+(1-2\varepsilon)^{t}\right]+\left[1-(1-2\varepsilon)^{t}\right](1-s)\right\}, \tag{3.3}$$

which reduces to

$$w_s^{(2)} = 1-\frac{s\varepsilon}{1-p(1-2\varepsilon)}. \tag{3.4}$$

The superscript (2) indicates that the fitness of the social learner has been calculated on the two environmental states model. Rogers (1988) sets the fitnesses of SLC and SLW to w + b and w − b, respectively. It can be shown with the appropriate substitutions (w → 1 − s/2, b → s/2, u → 2ε) that Eq. (3.4) is equivalent to the unnumbered equation in Rogers (1988) immediately above Eq. (2) of that paper.
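Both the parity probability of Eq. (3.2) and the reduction of the series (3.3) to the closed form (3.4) can be checked numerically. The sketch below builds the probability of an even number of changes one generation at a time and sums a truncated version of the series; the function names and the truncation length are assumptions of this example.

```python
def prob_even_changes(eps: float, t: int) -> float:
    """Probability of zero or an even number of environmental changes in
    t generations, built up recursively: the parity stays even with
    probability 1 - eps and flips with probability eps."""
    p_even = 1.0
    for _ in range(t):
        p_even = (1.0 - eps) * p_even + eps * (1.0 - p_even)
    return p_even


def prob_even_closed(eps: float, t: int) -> float:
    """Closed form of Eq. (3.2)."""
    return 0.5 * (1.0 + (1.0 - 2.0 * eps) ** t)


def ws_two_state_series(p: float, eps: float, s: float, terms: int = 2000) -> float:
    """Truncated sum of Eq. (3.3)."""
    total = 0.0
    for t in range(1, terms + 1):
        even = prob_even_closed(eps, t)
        total += p ** (t - 1) * (1.0 - p) * (even + (1.0 - even) * (1.0 - s))
    return total


def ws_two_state_closed(p: float, eps: float, s: float) -> float:
    """Closed form of Eq. (3.4)."""
    return 1.0 - s * eps / (1.0 - p * (1.0 - 2.0 * eps))
```

For any 0 ≤ p < 1 the truncated series and the closed form agree to within floating-point error once enough terms are included.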

The fitness of an individual learner is 1 − c. Thus, at the genetically polymorphic equilibrium

$$1-c = 1-\frac{s\varepsilon}{1-p(1-2\varepsilon)}. \tag{3.5}$$

Solving Eq. (3.5) yields

$$\hat{p} = \frac{c-s\varepsilon}{c(1-2\varepsilon)}, \tag{3.6}$$

which is valid for sε < c. If sε > c, on the other hand, the only valid equilibrium is one that is monomorphic for individual learners.
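A quick numerical check of the polymorphic equilibrium: substituting the solution of Eq. (3.5) back into Eq. (3.4) should return the fitness 1 − c of an individual learner. In the sketch below the function names are hypothetical. The additional requirement that the equilibrium frequency be below 1 is not stated in the text; it is an assumption added here, and for ε < 1/2 it amounts to s > 2c.

```python
from typing import Optional


def ws_two_state(p: float, eps: float, s: float) -> float:
    """Expected fitness of a social learner, Eq. (3.4)."""
    return 1.0 - s * eps / (1.0 - p * (1.0 - 2.0 * eps))


def p_hat_two_state(c: float, s: float, eps: float) -> Optional[float]:
    """Equilibrium frequency of social learners, Eq. (3.6).

    Returns None when the value falls outside the open interval (0, 1),
    i.e. when no genetically polymorphic equilibrium is available.
    """
    p_hat = (c - s * eps) / (c * (1.0 - 2.0 * eps))
    return p_hat if 0.0 < p_hat < 1.0 else None
```

For c = 0.1, s = 0.5, and ε = 0.1 the function gives 0.625, and `ws_two_state(0.625, 0.1, 0.5)` evaluates to 0.9 = 1 − c, as Eq. (3.5) requires.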

Let us now briefly redo the calculations for an infinite environmental states version of Rogers’ model. The probability of no environmental change in t generations is (1 − ε)t, and the probability of at least one change is 1 − (1 − ε)t. Hence, the expected fitness of the newborn social learner in this modified Rogers’ model is

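$$w_s^{(\infty)} = \sum_{t=1}^{\infty} p^{t-1}(1-p)\left\{(1-\varepsilon)^{t}+\left[1-(1-\varepsilon)^{t}\right](1-s)\right\}, \tag{3.7}$$

which reduces to

$$w_s^{(\infty)} = 1-\frac{s\varepsilon}{1-p(1-\varepsilon)}. \tag{3.8}$$

The superscript (∞) distinguishes the infinite from the two environmental states model. Hence, the frequency of social learners at the genetically polymorphic equilibrium for this case is

$$\hat{p} = \frac{c-s\varepsilon}{c(1-\varepsilon)}. \tag{3.9}$$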
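For readers who want the intermediate steps, the reduction of the infinite-states series to its closed form uses only the geometric series, and the equilibrium frequency follows by equating the result to 1 − c. A worked version, using the symbols of the text:

```latex
\begin{align*}
w_s^{(\infty)}
  &= (1-s)\sum_{t=1}^{\infty} p^{t-1}(1-p)
     + s\,(1-p)\sum_{t=1}^{\infty} p^{t-1}(1-\varepsilon)^{t} \\
  &= (1-s) + \frac{s\,(1-p)(1-\varepsilon)}{1-p(1-\varepsilon)} \\
  &= 1 - s\,\frac{1-p(1-\varepsilon)-(1-p)(1-\varepsilon)}{1-p(1-\varepsilon)}
   = 1 - \frac{s\varepsilon}{1-p(1-\varepsilon)} .
\end{align*}
% Setting w_s^{(\infty)} = 1 - c gives c[1 - p(1-\varepsilon)] = s\varepsilon, hence
\[
\hat{p} = \frac{c - s\varepsilon}{c\,(1-\varepsilon)} .
\]
```

The two-state case follows the same steps with 1 − 2ε in place of 1 − ε and an overall factor of 1/2 on the s-dependent term.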

We have two comments on the analysis by Rogers (1988). First, the so-called Rogers’ paradox is inherent in his derivation, since the genetically polymorphic equilibrium, p̂, is obtained by equating the fitnesses of the individual learners and the social learners (see Eq. (3.5)). Second, it fails to identify the stable monomorphism of social learners that is predicted by the dynamical model (Model 2).

2.5. Model 4: information decay model

Assume, as in the modified (infinite environmental states) Rogers’ model, that the correct behavior becomes outdated and hence wrong with probability ε per generation. Parameter ε would appear to have the same meaning as in the modified Rogers’ model. Then, with the possible phenogenotypes and their fitnesses as in Model 1, the difference equations are

$$W x' = (x+y)(x+z)(1-\varepsilon), \tag{4.1a}$$
$$W y' = (1-s)(x+y)[(x+z)\varepsilon+y], \tag{4.1b}$$
$$W z' = (1-c)z, \tag{4.1c}$$

where

$$W = (1-c)z + \{1-s[\varepsilon+y(1-\varepsilon)]\}(1-z). \tag{4.1d}$$

In Eqs. (4.1a) and (4.1b), the term (x + z) ε represents the decay of adaptive information. With this model, we do not require a separate set of equations for environmental change and stasis. This modeling approach has been used by Kendal et al. (2009), Lehmann and Feldman (2009), Lehmann et al. (2010), and Nakahashi (2010). In fact, the inclusion of Model 4 in the current paper was motivated by the model of Kendal et al. (2009) for the case of unbiased social learning.

The equilibria of Eq. (4.1) and their stability properties are as follows.

  (A) x̂ = 0, ŷ = 0, ẑ = 1, Ŵ = 1 − c; this is monomorphic for individual learners, always (i.e., for all parameter combinations) exists, and is stable if c < sε.

  (B) x̂ = 0, ŷ = 1, ẑ = 0, Ŵ = 1 − s; this always exists but is never stable.

  (C) $\hat{x} = \frac{s-\varepsilon}{s(1-\varepsilon)}$, $\hat{y} = \frac{\varepsilon(1-s)}{s(1-\varepsilon)}$, ẑ = 0, Ŵ = 1 − ε; this is genetically monomorphic for social learners, exists if ε < s, and is stable if ε < c.

  (D) $\hat{x} = \frac{(c-s\varepsilon)(s-c)}{s(1-\varepsilon)c(1-s)}$, $\hat{y} = \frac{c-s\varepsilon}{s(1-\varepsilon)}$, $\hat{z} = \frac{(s-c)(\varepsilon-c)}{(1-\varepsilon)c(1-s)}$, Ŵ = 1 − c; this is a fully polymorphic equilibrium, exists if sε < c < ε, and is stable whenever it exists.

See Appendix D for details.

Fig. 1 illustrates the mutually exclusive regions in the parameter space of c and s where equilibria (A), (C), and (D) exist and are stable. Equilibrium (A), which is monomorphic for individual learners, coexists with the other two equilibria throughout the upper triangular region of Fig. 1, and is always associated with a mean fitness of 1 − c. Social learners are present at equilibria (C) and (D). The mean fitness at equilibrium (D) is 1 − c, which is identical to that at equilibrium (A), and these two equilibria can coexist for the same parameter values (i.e., sε < c < ε). Hence, Rogers’ paradox occurs in this case.
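The partition of parameter space among equilibria (A), (C), and (D) can be turned into a small lookup and checked against the recursions. The following sketch is illustrative only: the function names are hypothetical, and boundary cases where c equals sε or ε are deliberately rejected rather than classified. It returns the locally stable equilibrium of Eqs. (4.1) for given s, c, and ε, and `step_model4` can be used to confirm that the returned point is indeed a fixed point.

```python
def step_model4(x, y, z, s, c, eps):
    """One generation of the information decay model, Eqs. (4.1)."""
    W = (1 - c) * z + (1 - s * (eps + y * (1 - eps))) * (1 - z)
    x_next = (x + y) * (x + z) * (1 - eps) / W
    y_next = (1 - s) * (x + y) * ((x + z) * eps + y) / W
    z_next = (1 - c) * z / W
    return x_next, y_next, z_next, W


def stable_equilibrium(s, c, eps):
    """Return (label, x, y, z, mean fitness) for the stable equilibrium.

    Assumes 0 < c < s < 1 and 0 < eps < 1, as in the text.
    """
    if c < s * eps:                       # (A): individual learners fixed
        return "A", 0.0, 0.0, 1.0, 1 - c
    if c > eps:                           # (C): social learners fixed
        x = (s - eps) / (s * (1 - eps))
        y = eps * (1 - s) / (s * (1 - eps))
        return "C", x, y, 0.0, 1 - eps
    if s * eps < c < eps:                 # (D): fully polymorphic
        x = (c - s * eps) * (s - c) / (s * (1 - eps) * c * (1 - s))
        y = (c - s * eps) / (s * (1 - eps))
        z = (s - c) * (eps - c) / ((1 - eps) * c * (1 - s))
        return "D", x, y, z, 1 - c
    raise ValueError("boundary case (c == s*eps or c == eps) not handled")
```

For example, with s = 0.5 and ε = 0.3, the cost values c = 0.1, 0.2, and 0.4 fall in regions (A), (D), and (C), respectively.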

Fig. 1.

Regions of local stability of the equilibria of Model 4 (information decay model) in the (c, s)-parameter space. In the triangular regions labeled IL, SL, and P, the parameter values are such that fixation of individual learners, fixation of social learners, and polymorphism of both are, respectively, the unique locally stable equilibrium. The lower triangular region is void by assumption. Parameter ε represents the rate of decay of adaptive information per generation.

When c > ε, on the other hand, equilibrium (C) is stable, and the mean fitness at this equilibrium, 1 − ε, is greater than at the coexisting unstable monomorphism of individual learners. Hence, Rogers’ paradox is resolved in this case. Interestingly, an equilibrium corresponding to (C) does not exist in the model of Kendal et al. (2009), which entails that Rogers’ paradox is not eliminated (in the case of unbiased social learning). But when notational idiosyncrasies are accounted for, their model is identical to Model 4 considered here, except that selection acts through fertility rather than viability differences. Clearly, whether or not Rogers’ paradox occurs is determined by subtle differences in the assumptions.

Equilibrium (D) in this model would seem to correspond to the genetically polymorphic equilibrium in the modified Rogers’ model given by Eq. (3.9). Hence, one might expect x̂ + ŷ = p̂ to hold, but this is not the case. In fact, $\hat{x}+\hat{y} = \hat{p}\,\frac{1-c}{1-s}$.
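The relation between the frequency of social learners at equilibrium (D) and the value from Eq. (3.9) follows directly from the expressions for the equilibrium frequencies. A short worked version, using only the formulas already given:

```latex
\begin{align*}
\hat{x}+\hat{y}
  &= \frac{c-s\varepsilon}{s(1-\varepsilon)}
     \left[\frac{s-c}{c(1-s)} + 1\right]
   = \frac{c-s\varepsilon}{s(1-\varepsilon)}\cdot\frac{s(1-c)}{c(1-s)} \\
  &= \frac{c-s\varepsilon}{c(1-\varepsilon)}\cdot\frac{1-c}{1-s}
   = \hat{p}\,\frac{1-c}{1-s}.
\end{align*}
```

Because c < s by assumption, the factor (1 − c)/(1 − s) exceeds 1, so the frequency of social learners at equilibrium (D) is larger than the value given by Eq. (3.9).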

3. ESS models in temporally variable environments

3.1. Model 5: mixed strategy model with infinite-states l-cycle

The four models described above share the perhaps “unrealistic” assumption that an organism can engage in individual learning (IL) or social learning (SL) but not both. Here, we consider a model due to Feldman et al. (1996) in which IL and SL are both available to the same organism. The analysis requires us to posit two genetically determined mixed strategies, the resident and the mutant, that differ in the probability of using IL, which we denote by L for the resident and L + dL for the mutant. As before, a behavior is either correct or wrong. Hence, we need to distinguish four phenogenotypes: resident with correct behavior, resident with wrong behavior, mutant with correct behavior, and mutant with wrong behavior. Their frequencies among asexually reproducing adults are x, x̄, y, and ȳ, respectively. Note that these variables are defined differently from those in the dynamical models considered in the previous section. An environmental change results in a state that has not been experienced before (infinite states).

When the environment changes, the correct behavior is achieved only by IL. Hence, the difference equations are

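$$V x' = (x+\bar{x})L(1-c), \tag{5.1a}$$
$$V \bar{x}' = (x+\bar{x})(1-L)(1-s), \tag{5.1b}$$
$$V y' = (y+\bar{y})(L+dL)(1-c), \tag{5.1c}$$
$$V \bar{y}' = (y+\bar{y})(1-L-dL)(1-s), \tag{5.1d}$$

with

$$V = (x+\bar{x})[L(1-c)+(1-L)(1-s)] + (y+\bar{y})[(L+dL)(1-c)+(1-L-dL)(1-s)]. \tag{5.1e}$$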

When there is no environmental change between the generations, the correct behavior can be achieved either by IL, or by SL from an appropriate exemplar. Hence,

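$$W x' = (x+\bar{x})[L(1-c)+(1-L)(x+y)], \tag{5.2a}$$
$$W \bar{x}' = (x+\bar{x})(1-L)(1-s)(\bar{x}+\bar{y}), \tag{5.2b}$$
$$W y' = (y+\bar{y})[(L+dL)(1-c)+(1-L-dL)(x+y)], \tag{5.2c}$$
$$W \bar{y}' = (y+\bar{y})(1-L-dL)(1-s)(\bar{x}+\bar{y}), \tag{5.2d}$$

with

$$W = (x+\bar{x})\{L(1-c)+(1-L)[1-s(\bar{x}+\bar{y})]\} + (y+\bar{y})\{(L+dL)(1-c)+(1-L-dL)[1-s(\bar{x}+\bar{y})]\}. \tag{5.2e}$$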

As before, we assume 0 < c < s < 1.

We show in Appendix E that the genetically monomorphic periodic equilibrium of the resident can be written as

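$$\hat{\bar{x}}(i) = \frac{1-\alpha}{1-\alpha+\beta(1-\alpha^{i})}, \qquad \hat{x}(i) = 1-\hat{\bar{x}}(i) \tag{5.3a}$$

for 1 ≤ i ≤ l, where

$$\alpha = \frac{L(1-c)+1-L}{(1-L)(1-s)}, \qquad \beta = \frac{L(1-c)}{(1-L)(1-s)}. \tag{5.3b}$$

This equilibrium has period l, since $\hat{\bar{x}}(1) = \frac{1}{1+\beta}$ every l generations. Hence, $\hat{\bar{x}}(0) = \hat{\bar{x}}(l)$. In the special case of L = 1, we clearly have $\hat{\bar{x}}(i) = 0$ for 1 ≤ i ≤ l from Eqs. (5.1b) and (5.2b).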

Furthermore, the eigenvalue governing the invasion of the mutant is

$$\lambda = A\prod_{i=1}^{l-1}B(i), \tag{5.4a}$$

where

$$A = 1+dL\,\frac{s-c}{L(1-c)+(1-L)(1-s)}, \qquad B(i) = 1+dL\,\frac{s\hat{\bar{x}}(i)-c}{L(1-c)+(1-L)(1-s\hat{\bar{x}}(i))}. \tag{5.4b}$$

It can readily be shown that the L = 0 (pure SL) strategy cannot be an ESS, and that a necessary condition for the L = 1 (pure IL) strategy to be an ESS is l < s/c. The latter condition entails that shorter periods of environmental change are necessary for the evolution of IL, which is consistent with our findings based on the previous models. Furthermore, when dL is small, a necessary condition for an internal ESS is given by

$$f(L) = \frac{s-c}{L(1-c)+(1-L)(1-s)} + \sum_{i=1}^{l-1}\frac{s\hat{\bar{x}}(i)-c}{L(1-c)+(1-L)(1-s\hat{\bar{x}}(i))} = 0. \tag{5.5}$$

The solution(s) for L can be found numerically for given values of s, c, and l on substituting Eq. (5.3a) with (5.3b) into (5.5).
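One possible way to carry out this numerical step is sketched below. Rather than substituting the closed form (5.3a), the sketch obtains the equilibrium frequencies of the wrong behavior by iterating Eqs. (5.1b) and (5.2b) for a resident-only population, which is equivalent but avoids the division by zero that α and β would produce at L = 1. The function names, the use of bisection, and the tolerance are assumptions of this example. Bisection returns one root of f(L) = 0 and does not rule out others.

```python
from typing import List, Optional


def resident_wrong_freqs(L: float, s: float, c: float, l: int) -> List[float]:
    """Equilibrium frequencies of the wrong behavior in generations
    1, ..., l - 1 after a change, for a resident-only population."""
    wrong = (1 - L) * (1 - s)
    xbar = wrong / (L * (1 - c) + wrong)      # Eq. (5.1b), generation 1
    freqs = [xbar]
    for _ in range(l - 2):
        W = L * (1 - c) + (1 - L) * (1 - s * xbar)
        xbar = wrong * xbar / W               # Eq. (5.2b) under stasis
        freqs.append(xbar)
    return freqs[: l - 1]


def f(L: float, s: float, c: float, l: int) -> float:
    """Left-hand side of Eq. (5.5)."""
    total = (s - c) / (L * (1 - c) + (1 - L) * (1 - s))
    for xbar in resident_wrong_freqs(L, s, c, l):
        total += (s * xbar - c) / (L * (1 - c) + (1 - L) * (1 - s * xbar))
    return total


def interior_ess_candidate(s: float, c: float, l: int,
                           tol: float = 1e-12) -> Optional[float]:
    """Bisection for a root of f on (0, 1); None if f does not change
    sign between L = 0 and L = 1."""
    lo, hi = 0.0, 1.0
    if not (f(lo, s, c, l) > 0.0 > f(hi, s, c, l)):
        return None
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid, s, c, l) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Since f(0) = l(s − c)/(1 − s) is positive and f(1) = (s − lc)/(1 − c), a sign change, and hence a candidate interior ESS, is found whenever l > s/c.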

Let us now examine the special case of environmental period 2 (l = 2) in some detail. We wish to determine the conditions for an interior ESS – which is analogous to a genetically polymorphic equilibrium in the dynamical models – to exist and whether Rogers’ paradox is observed. Eqs. (5.3) and (5.4) reduce to

$$\hat{\bar{x}}(0) = \frac{1}{1+\beta(1+\alpha)}, \qquad \hat{\bar{x}}(1) = \frac{1}{1+\beta}, \tag{5.6}$$

with α and β given by Eq. (5.3b), and

$$\lambda = 1+dL\left[\frac{s-c}{L(1-c)+(1-L)(1-s)}+\frac{s\hat{\bar{x}}(1)-c}{L(1-c)+(1-L)(1-s\hat{\bar{x}}(1))}\right]+(dL)^{2}\left[\frac{s-c}{L(1-c)+(1-L)(1-s)}\right]\times\left[\frac{s\hat{\bar{x}}(1)-c}{L(1-c)+(1-L)(1-s\hat{\bar{x}}(1))}\right]. \tag{5.7}$$

The coefficient of dL in Eq. (5.7) is clearly f(L) as defined in Eq. (5.5) for the special case of l = 2. Using Eqs. (5.3) and (5.6), we find that f(0) > 0 since c < s by assumption, and that f(1) < 0 provided s/c < 2. We assume s/c < 2 in what follows. There is an (at least one) interior root of f(L) = 0 in this case, which we denote by L*. Of the two terms in f(L*), the first is positive, so the second term must be negative. Hence, the coefficient of (dL)² is also negative, ensuring that L* is an ESS.

Next, the product of the mean fitnesses in generations 0 and 1 is given by the quadratic

$$g(L) = \hat{W}(0)\hat{W}(1) = [L(1-c)+(1-L)(1-s)]^{2}+L(1-L)s(1-c). \tag{5.8}$$

Clearly, g(0) = (1 − s)², g(1) = (1 − c)², and g(L) has a maximum between 0 and 1, as shown schematically in Fig. 2. If Rogers’ paradox occurs, then we should have g(L*) = (1 − c)². Unfortunately, although f(L) and g(L) are both simple functions of L, we are unable to show analytically whether this is true or false. On the other hand, we find on numerically solving f(L) = 0 for L* that g(L*) > (1 − c)². Thus, the product (or alternatively the geometric mean) of the mean fitnesses is apparently greater for the interior ESS than in a population using the L = 1 (pure IL) strategy, but is not maximized.
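The quadratic form of g(L) can be recovered from the resident-only recursions, and a simple factorization shows for which values of L the product exceeds (1 − c)². In the worked steps below, D is shorthand introduced here for the mean fitness in the generation spanning the environmental change; it is not notation from the original analysis.

```latex
\begin{align*}
D &= L(1-c)+(1-L)(1-s), \qquad
\hat{\bar{x}}(1) = \frac{(1-L)(1-s)}{D}, \\
g(L) &= D\,\bigl\{L(1-c)+(1-L)\bigl[1-s\,\hat{\bar{x}}(1)\bigr]\bigr\}
      = D^{2} + (1-L)s\,\bigl[D-(1-L)(1-s)\bigr] \\
     &= D^{2} + L(1-L)\,s\,(1-c), \\
g(L)-(1-c)^{2}
     &= (1-L)\,\bigl\{\,sL(1-c) - (s-c)\,\bigl[D+(1-c)\bigr]\bigr\}.
\end{align*}
```

Hence, for L < 1, g(L) exceeds (1 − c)² exactly when sL(1 − c) > (s − c)[D + (1 − c)]. Whether L* satisfies this inequality still has to be checked numerically, since L* is defined only implicitly by f(L) = 0.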

Fig. 2.

Product of the mean fitnesses in two successive generations (g(L)) plotted against the probability of using individual learning (L) for the special case of Model 5 (mixed strategy model) with environmental period 2 (l = 2). L* denotes the evolutionarily stable probability of individual learning. Note that g(L*) > (1 − c)², whereas equality would hold if Rogers’ paradox were to apply.

3.2. Model 6: sampling the environment

We briefly describe a model due to Boyd and Richerson (1988, 1995), which we include here because it has a mathematical structure that is quite similar to Model 5 (Wakano and Aoki, 2007). The original formulation and analysis by Boyd and Richerson (1988, 1995) were based on a combination of the two-timescale (see Model 3) and the information decay (see Model 4) approaches.

In this model, the environment fluctuates between two states. Using a parameter μ, the two states are denoted μ > 0 and −μ < 0. We assume that the temporal fluctuations occur periodically, rather than with a given probability as in the model of Boyd and Richerson (1988, 1995). An organism first samples the environment by IL, and then resorts to oblique SL when the information obtained is indecisive. More specifically, suppose that the environment is in state μ > 0. Information, z, obtained on sampling the environment is distributed normally as

φ(z)=12πσexp{-(z-μ)22σ2}. (6.1)

Large positive values of z correctly suggest that the environment is in state μ > 0, while moderately large negative values incorrectly suggest that the environment is in state −μ < 0. The resident strategy achieves the correct behavior by IL with probability

πC=dφ(z)dz (6.2a)

when z exceeds a genetically determined threshold, d. Similarly, it acquires the wrong behavior by IL with probability

$$\pi_W=\int_{-\infty}^{-d}\varphi(z)\,dz \tag{6.2b}$$

when z is small, and it resorts to SL with probability

$$K=\int_{-d}^{d}\varphi(z)\,dz \tag{6.2c}$$

when z takes intermediate values. Hence K + πC + πW = 1, and because all three quantities in Eq. (6.2) are determined by the single threshold d, only one of them is independent.
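As an illustrative sketch (not part of the original analysis), the three probabilities in Eq. (6.2) can be evaluated from the normal cumulative distribution function. The function names `normal_cdf` and `learning_probabilities` and the parameter values in the demonstration are assumptions chosen only for illustration.

```python
import math


def normal_cdf(z, mean, sd):
    """Cumulative distribution function of a normal variate."""
    return 0.5 * (1.0 + math.erf((z - mean) / (sd * math.sqrt(2.0))))


def learning_probabilities(d, mu, sigma):
    """Return (pi_C, pi_W, K) of Eq. (6.2) for a threshold d >= 0,
    assuming the environment is in state mu > 0."""
    pi_c = 1.0 - normal_cdf(d, mu, sigma)   # z > d: correct behavior by IL
    pi_w = normal_cdf(-d, mu, sigma)        # z < -d: wrong behavior by IL
    k = normal_cdf(d, mu, sigma) - normal_cdf(-d, mu, sigma)  # |z| < d: SL
    return pi_c, pi_w, k


if __name__ == "__main__":
    # Hypothetical parameter values, for illustration only.
    for d in (0.0, 0.5, 1.0, 2.0):
        pi_c, pi_w, k = learning_probabilities(d, mu=0.5, sigma=1.0)
        print(f"d={d:.1f}  pi_C={pi_c:.4f}  pi_W={pi_w:.4f}  K={k:.4f}")
```

Raising the threshold d moves probability mass from the two IL outcomes into K, which is why d = 0 corresponds to pure IL and d → ∞ to pure SL.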

Analogous equations apply when the environment is in state −μ. For the mutant strategy, we substitute K + dK for K, πC + dπC for πC, and πW + dπW for πW, where dK + dπC + dπW = 0.

As before, we denote the frequencies among adults of the residents with correct behavior, the residents with wrong behavior, the mutants with correct behavior, and the mutants with wrong behavior by x, x̄, y, and ȳ, respectively. Then, when the environment changes between generations,

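$$V x'=(x+\bar{x})\left[K(\bar{x}+\bar{y})+\pi_C\right], \tag{6.3a}$$
$$V \bar{x}'=(x+\bar{x})\left[K(x+y)+\pi_W\right](1-s), \tag{6.3b}$$
$$V y'=(y+\bar{y})\left[(K+dK)(\bar{x}+\bar{y})+\pi_C+d\pi_C\right], \tag{6.3c}$$
$$V \bar{y}'=(y+\bar{y})\left[(K+dK)(x+y)+\pi_W+d\pi_W\right](1-s), \tag{6.3d}$$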

and when the environment remains constant between generations

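$$W x'=(x+\bar{x})\left[K(x+y)+\pi_C\right], \tag{6.4a}$$
$$W \bar{x}'=(x+\bar{x})\left[K(\bar{x}+\bar{y})+\pi_W\right](1-s), \tag{6.4b}$$
$$W y'=(y+\bar{y})\left[(K+dK)(x+y)+\pi_C+d\pi_C\right], \tag{6.4c}$$
$$W \bar{y}'=(y+\bar{y})\left[(K+dK)(\bar{x}+\bar{y})+\pi_W+d\pi_W\right](1-s). \tag{6.4d}$$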

Let the period of environmental fluctuations be l. Assuming that dK, dπC, and dπW are small, Wakano and Aoki (2007) showed that the K = 0 (i.e., d = 0; pure IL) strategy is unstable if l > 2 and the K = 1 (i.e., d → ∞; pure SL) strategy is always unstable. Hence, at least one interior ESS, 0 < K* < 1 (0 < d* < ∞), is predicted for l > 2, and can be identified numerically.

A comparison of Eqs. (5.1) and (5.2) with Eqs. (6.3) and (6.4) shows that Model 6 indeed has a similar mathematical structure to Model 5, although there are two significant differences. First, Model 6 assumes two environmental states as does Model 2, so that the wrong behavior in the parental generation becomes the correct behavior in the offspring generation after an environmental change. Second, IL can result in the wrong behavior, but in Model 6 does not entail an exogenous cost measured by c in Model 5. The model of Boyd and Richerson (1988, 1995) on which Model 6 is based has often been interpreted as prescribing a learning schedule in which IL is followed by SL. However, the similarities with Model 5 indicate that the sequential use of IL and SL is not a necessary aspect of this learning strategy.

Boyd and Richerson (1995) find that the interior ESS is associated with a greater mean fitness than the K = 0 (pure IL) strategy (i.e., Rogers’ paradox is resolved), but does not maximize the mean fitness, which again is qualitatively similar to the result for Model 5.

4. Population structure, spatially variable environment, and migration

Spatial environmental heterogeneity such that migrating organisms will experience novel environments has been addressed by Boyd and Richerson (1985, 1988, 1995), Henrich and Boyd (1998), Rendell et al. (2010), Aoki (2010), and Nakahashi et al. (2012), among others. Here, we briefly review one dynamical model due to Aoki and Nakahashi (2008), which permits a fairly complete analysis.

4.1. Model 7: finite island model with environmentally heterogeneous sites

Organisms are of two genetically distinct types, individual learners and social learners. They may occupy any of n equally-connected sites in a spatially heterogeneous world. Each site has a different environment. We distinguish n behaviors, each of which is locally adapted to one particular environment but maladaptive in the n − 1 other environments. Behaviors that are maladaptive in all n environments are not incorporated into the dynamics. Let $X_{ij}$ (1 ≤ i, j ≤ n) be the number of social learners at site i that are adapted to the environment of site j. Then, at site i there are $X_i=\sum_{j=1}^{n}X_{ij}$ social learners, of which $X_{ii}$ are behaving adaptively (SLC), and $X_i-X_{ii}$ are behaving maladaptively (SLW) at fitness cost s. Similarly, let $Z_i$ (1 ≤ i ≤ n) be the number of individual learners at site i. Individual learners always acquire the correct behavior, but suffer an exogenous cost c. Write $N_i=X_i+Z_i$ for the total population at site i. These numbers are enumerated at the adult stage just prior to reproduction. Birth is followed by social learning, migration, individual learning, and then viability selection, in that order.

The life cycle begins with reproduction, where each organism gives birth asexually to b(Ni) offspring according to the discrete logistic

$$b(N_i)=1+r(1-N_i/K). \tag{7.1}$$

Here, the intrinsic growth rate, r > 0, and the carrying capacity, K > 0, are assumed to be the same for all sites. Second, social learners acquire their behavior by copying a random member of the parental generation at their site. As a result, the number of social learners at site i that are adapted to the environment of site j becomes

$$X_i\,b(N_i)\,\frac{X_{ij}+Z_i\delta_{ij}}{N_i}, \tag{7.2}$$

where δij = 1 for i = j and 0 otherwise. All members of the parental generation die immediately afterward. The third event is migration, where a fixed fraction of the organisms at each site emigrate (constant forward migration rate). We assume reciprocal migration between all pairs of sites at rate m/(n−1) with 0 < m ≤ 1/2. Fourth, the individual learners acquire the behavior suitable to their environment. And fifth, viability selection occurs such that all social learners behaving adaptively (SLC) survive, a fraction 1 − s of social learners behaving maladaptively (SLW) survive, and a fraction 1 − c of individual learners survive, where 0 < c < s < 1.

The above assumptions entail that the difference equations for this island model be written as

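$$X_{ii}'=(1-m)X_i\,b(N_i)\frac{X_{ii}+Z_i}{N_i}+\frac{m}{n-1}\sum_{k\neq i}^{n}X_k\,b(N_k)\frac{X_{ki}}{N_k}, \tag{7.3a}$$
$$X_{ij}'=(1-s)\left\{(1-m)X_i\,b(N_i)\frac{X_{ij}}{N_i}+\frac{m}{n-1}X_j\,b(N_j)\frac{X_{jj}+Z_j}{N_j}+\frac{m}{n-1}\sum_{k\neq i,j}^{n}X_k\,b(N_k)\frac{X_{kj}}{N_k}\right\}, \tag{7.3b}$$
$$Z_i'=(1-c)\left\{(1-m)Z_i\,b(N_i)+\frac{m}{n-1}\sum_{k\neq i}^{n}Z_k\,b(N_k)\right\}, \tag{7.3c}$$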

where 1 ≤ i, j ≤ n and i ≠ j in Eq. (7.3b). Consider, for example, Eq. (7.3b). Noting Eq. (7.2), the three terms in braces on the right hand side represent social learners – all with behavior adapted to site j – that are natives of site i, immigrants originating at site j, and immigrants originating at sites other than i and j, respectively.
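The life cycle behind Eq. (7.3) can be sketched as a single update function. This is a minimal illustration under the stated assumptions, not the authors' implementation: the names `birth_rate` and `island_step` are hypothetical, the carrying capacity K is called `cap`, and every site is assumed to be occupied (N_i > 0) with n ≥ 2.

```python
def birth_rate(n_i, r, cap):
    """Discrete logistic per-capita offspring number, Eq. (7.1)."""
    return 1.0 + r * (1.0 - n_i / cap)


def island_step(X, Z, m, s, c, r, cap):
    """One generation of Eq. (7.3).

    X[i][j]: social learners at site i with behavior adapted to site j.
    Z[i]:    individual learners at site i.
    Assumes N_i > 0 at every site and n >= 2 sites.
    """
    n = len(Z)
    x_tot = [sum(row) for row in X]
    N = [x_tot[i] + Z[i] for i in range(n)]
    b = [birth_rate(N[i], r, cap) for i in range(n)]
    # Newborn social learners after copying a random parent, Eq. (7.2).
    S = [[x_tot[i] * b[i] * (X[i][j] + (Z[i] if i == j else 0.0)) / N[i]
          for j in range(n)] for i in range(n)]
    # Newborn individual learners (behavior is acquired after migration).
    I = [Z[i] * b[i] for i in range(n)]
    rate = m / (n - 1)  # migration rate between each pair of sites
    X_new = [[0.0] * n for _ in range(n)]
    Z_new = [0.0] * n
    for i in range(n):
        for j in range(n):
            arrivals = sum(S[k][j] for k in range(n) if k != i)
            after_migration = (1.0 - m) * S[i][j] + rate * arrivals
            survival = 1.0 if i == j else 1.0 - s  # SLC vs. SLW
            X_new[i][j] = survival * after_migration
        z_arrivals = sum(I[k] for k in range(n) if k != i)
        Z_new[i] = (1.0 - c) * ((1.0 - m) * I[i] + rate * z_arrivals)
    return X_new, Z_new


if __name__ == "__main__":
    # Hypothetical starting numbers and parameter values.
    n = 3
    X = [[5.0 if i == j else 1.0 for j in range(n)] for i in range(n)]
    Z = [10.0] * n
    for _ in range(2000):
        X, Z = island_step(X, Z, m=0.1, s=0.5, c=0.1, r=0.4, cap=100.0)
    print("social learners per site:", [round(sum(row), 3) for row in X])
    print("individual learners per site:", [round(z, 3) for z in Z])
```

Summing the arrivals over all sites k ≠ i reproduces the second and third terms of Eq. (7.3b) together, because the newborns at site j that carry behavior j already include those who copied individual learners.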

Using local stability analysis and heuristic arguments, Aoki and Nakahashi (2008) identify four classes of stable equilibria of Eq. (7.3): fixation of individual learners ($\hat{X}_{ii}=0$, $\hat{X}_{ij}=0$, $\hat{Z}_i>0$; class I), polymorphism of individual learners and social learners ($\hat{X}_{ii}>0$, $\hat{X}_{ij}>0$, $\hat{Z}_i>0$; class II), fixation of social learners ($\hat{X}_{ii}>0$, $\hat{X}_{ij}>0$, $\hat{Z}_i=0$; class III), and extinction ($\hat{X}_{ii}=0$, $\hat{X}_{ij}=0$, $\hat{Z}_i=0$; class IV), which exist for mutually exclusive regions of the parameter space of m and c (Fig. 3). A higher migration rate between the environmentally heterogeneous sites of a subdivided population is analogous to a greater instability of the temporally changing environment, as pointed out by Boyd and Richerson (1985, 1988, 1995). Clearly, the evolution of individual learners is more likely at higher migration rates.

Fig. 3.

Heuristic diagram showing the four regions of the (m, c)-parameter space (0 < m ≤ 1/2, 0 < c < s), corresponding to the four classes of stable equilibria of Model 7 (finite island model with environmentally heterogeneous sites). Region I: fixation of individual learners. Region II: polymorphism of individual learners and social learners. Region III: fixation of social learners. Region IV: extinction. Boundary between regions I and II: c = ms. Boundary between regions II and III: c = m(1 − θ), where $\theta=\left\{m-s-\frac{(1-s)m}{n-1}+\sqrt{\left[m-s-\frac{(1-s)m}{n-1}\right]^2+\frac{4(1-s)m^2}{n-1}}\right\}/(2m)$. Boundary between regions II and IV: horizontal straight line c = r/(1 + r). Boundary between regions III and IV: vertical straight line defined implicitly by r = m(1 − θ)/[1 − m(1 − θ)]. Other parameter values are s = 0.5, r = 0.4, n = 15, K = 100.

However, as we will show later in connection with learning schedules and biased SL, rapid environmental change and high migration rates do not necessarily ensure that (pure) IL will evolve.

Let us now consider Rogers’ question of whether the presence of social learners – in addition to, or to the exclusion of, individual learners – enhances the adaptedness of a population at equilibrium. Total population size serves as a proxy measure of adaptedness in this and some other models (Lehmann and Feldman, 2009; Rendell et al., 2010). From symmetry considerations, the total population at the class II equilibria (stable polymorphism of individual learners and social learners) is

$$n\hat{N}=nK\left[1-\frac{c}{r(1-c)}\right], \tag{7.4}$$

which is the same as when individual learners are fixed. At the class III equilibria (stable monomorphism of social learners), on the other hand, the total population size exceeds the value given by Eq. (7.4). Hence, Rogers’ paradox is observed in the former case, but is resolved in the latter.

5. Learning schedules

With the possible exception of Model 6, all models considered so far assume “one-shot” strategies in which an organism uses IL and/or SL with fixed probabilities throughout its life. In this section, we consider models that allow for two learning stages per generation, in which the probabilities of using IL and/or SL can differ between the two stages. We assume an unstructured population of infinite size with asexual reproduction.

5.1. Model 8: critical social learning

Enquist et al. (2007) introduce two novel strategies called “critical social learning” and “conditional social learning”. Critical social learning is a learning schedule in which an organism first uses SL. If the correct behavior is acquired by SL, where the organism is assumed to be capable of judging whether it has succeeded in doing so, no further learning occurs. However, if the wrong behavior is acquired by SL, the organism next tries IL. In conditional social learning, the order in which SL and IL are used is interchanged. For both learning strategies, the exemplar in SL is chosen randomly, and the occurrence of the second learning stage is contingent on the outcome of the first.

In this section, we summarize their proof that critical social learning can be an ESS in the absence of conditional social learners. The competing strategies are then obligate individual learner, obligate social learner, and critical social learner. Enquist et al. (2007) adopt the two-timescale approach, which assumes that cultural evolution can be investigated with the genetic variables held constant (see also Model 3). They obtain the frequency of the correct behavior – they call this “the OK solution” – at the equilibrium of the cultural dynamics, qOK, which they then use to calculate the selection coefficients – fitnesses, in their terminology – of the competing strategies.

Let us define the following parameters as in their model, using a more compact notation: ε is the probability of environmental change per generation (i.e., information decay rate, see Model 4), α is the probability that an individual learner acquires the correct behavior, β is the probability that a social learner faithfully copies the exemplar, ci is the exogenous cost of IL, and cs is the exogenous cost of SL. Then, we can write the selection coefficients of obligate individual learners, obligate social learners, and critical social learners as

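$$w_i=\alpha-c_i, \tag{8.1a}$$
$$w_s=q^{\mathrm{OK}}(1-\varepsilon)\beta-c_s, \tag{8.1b}$$
$$w_{si}=q^{\mathrm{OK}}(1-\varepsilon)\beta-c_s+\left[1-q^{\mathrm{OK}}(1-\varepsilon)\beta\right](\alpha-c_i), \tag{8.1c}$$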

respectively. Note that wsi > ws if, as we assume, individual learning is adaptive, i.e. α > ci.

To determine qOK from the cultural dynamics, we denote the frequencies of obligate individual learners, obligate social learners, and critical social learners – which are by assumption fixed during cultural evolution – by qi, qs, and qsi, respectively, and the variable frequency of the OK solution in generation t by $q_t^{\mathrm{OK}}$. Then,

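$$q_{t+1}^{\mathrm{OK}}=q_i\alpha+q_s q_t^{\mathrm{OK}}(1-\varepsilon)\beta+q_{si}\left\{q_t^{\mathrm{OK}}(1-\varepsilon)\beta+\left[1-q_t^{\mathrm{OK}}(1-\varepsilon)\beta\right]\alpha\right\}. \tag{8.2}$$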

In deriving Eq. (8.2), the exogenous costs, ci and cs, that appear in Eq. (8.1) have been ignored, using the standard assumption of the two-timescale approach that selection is weak. Hence, at equilibrium (obtained by setting $q_{t+1}^{\mathrm{OK}}=q_t^{\mathrm{OK}}=q^{\mathrm{OK}}$), we have

$$q^{\mathrm{OK}}=\frac{\alpha(1-q_s)}{1-(1-\varepsilon)\beta\left[q_s+q_{si}(1-\alpha)\right]}. \tag{8.3}$$

Substitution of Eq. (8.3) into (8.1) yields the selection coefficients of the three strategies, which depend on their frequencies, qi, qs, and qsi.

Since wsi > ws holds by assumption for all frequencies, critical social learner will be an ESS if wsi > wi when qsi = 1. Hence, the condition for critical social learner to be an ESS is

$$c_i>\alpha-1+\frac{c_s\left[1-(1-\alpha)(1-\varepsilon)\beta\right]}{\alpha(1-\varepsilon)\beta}. \tag{8.4}$$

Clearly, Rogers’ paradox is resolved since wsi > wi at the evolutionarily stable equilibrium where critical social learners are fixed.
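The two-timescale calculation of Model 8 can be checked numerically. The sketch below is illustrative only: it iterates the cultural recursion (8.2), compares the result with the closed form (8.3), and evaluates the selection coefficients (8.1) and the ESS condition (8.4). All function names are hypothetical, and the parameter values used in the demonstration are assumptions.

```python
def q_ok_equilibrium(alpha, beta, eps, q_s, q_si):
    """Closed-form equilibrium frequency of the OK solution, Eq. (8.3)."""
    kappa = (1.0 - eps) * beta
    return alpha * (1.0 - q_s) / (1.0 - kappa * (q_s + q_si * (1.0 - alpha)))


def q_ok_iterate(alpha, beta, eps, q_i, q_s, q_si, q0=0.0, generations=2000):
    """Iterate the cultural recursion, Eq. (8.2), starting from q0."""
    kappa = (1.0 - eps) * beta
    q = q0
    for _ in range(generations):
        copied = q * kappa  # SL yields a behavior that is still correct
        q = (q_i * alpha + q_s * copied
             + q_si * (copied + (1.0 - copied) * alpha))
    return q


def selection_coefficients(alpha, beta, eps, c_i, c_s, q_ok):
    """Return (w_i, w_s, w_si) of Eq. (8.1)."""
    kappa = (1.0 - eps) * beta
    w_i = alpha - c_i
    w_s = q_ok * kappa - c_s
    w_si = q_ok * kappa - c_s + (1.0 - q_ok * kappa) * (alpha - c_i)
    return w_i, w_s, w_si


def critical_sl_is_ess(alpha, beta, eps, c_i, c_s):
    """Condition (8.4) for critical social learning to be an ESS."""
    kappa = (1.0 - eps) * beta
    bound = alpha - 1.0 + c_s * (1.0 - (1.0 - alpha) * kappa) / (alpha * kappa)
    return c_i > bound


if __name__ == "__main__":
    # Hypothetical parameter values, for illustration only.
    pars = dict(alpha=0.6, beta=0.9, eps=0.1)
    q_ok = q_ok_equilibrium(q_s=0.0, q_si=1.0, **pars)
    print("q_OK with critical social learners fixed:", q_ok)
    print("w_i, w_s, w_si:",
          selection_coefficients(c_i=0.2, c_s=0.05, q_ok=q_ok, **pars))
    print("ESS condition (8.4):", critical_sl_is_ess(c_i=0.2, c_s=0.05, **pars))
```

Evaluating `selection_coefficients` at the equilibrium with `q_si = 1` and comparing `w_si` with `w_i` gives the same verdict as `critical_sl_is_ess`, since condition (8.4) is that comparison solved for ci.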

5.2. Model 9: evolutionarily stable learning schedules with a continuous phenotype

All models presented so far have assumed a dichotomous phenotype that is either correct or wrong. Here, we posit a continuous phenotype with a fitness optimum that depends on the environmental state. The question of learning schedules was first addressed by Boyd and Richerson (1985), who introduced a learning strategy or process (in this latter context) that they called “guided variation”. Guided variation entails that an organism acquires its “initial phenotype” by SL – from the parental generation according to a “blending rule” – which is then adaptively modified by IL to yield the “mature phenotype”. Such a strategy permits a gradual and cumulative improvement of the phenotype over generations.

IL or SL or a mixture of the two can be sequentially combined in an infinite number of ways to form a learning schedule. Aoki et al. (2012) used a method informed by optimal control theory to ask what the evolutionarily stable learning schedules would be under various regimes of environmental change.

Suppose that there are two learning stages per discrete generation, stage-0 and stage-1 learning. Genetically determined learning strategies assign weights between 0 and 1 to the information gathered by IL and by SL during each stage. The phenotype of a naïve newborn before any learning occurs will be called the “initial phenotype” and is set to 0. We refer to the phenotype after stage-0 learning as the “intermediate phenotype” and the phenotype after stage-1 learning as the “mature phenotype”. These terms are used differently here from Boyd and Richerson (1985). With each learning strategy are associated one intermediate phenotype and one mature phenotype that are sequentially expressed. Only the mature phenotype contributes to fitness. Post-learning organisms reproduce in proportion to their fitness (fertility selection) and survive to serve as exemplars (i.e., cultural parents) for the next generation.

In what follows, we deal only with the situation where the environment undergoes periodic changes twice per generation, i.e., between birth and stage-0 learning, and between stage-0 learning and stage-1 learning. The optimal phenotype is −z* during stage-0 learning, whereas it is z* during stage-1 learning and reproduction. Since the initial phenotype is set to 0, this requires a special assumption regarding the phenotypic scale. The target for IL in each learning stage is the optimal phenotype in that learning stage. The target for SL is always the mature phenotype of the previous generation if the population is genetically monomorphic, or the population mean of the mature phenotypes if more than one learning schedule is segregating. The efficiencies of IL and SL during stage-i learning are αi and βi(i = 0, 1), respectively, where efficiency is defined as the proportional reduction in the deviation from the target and assumed to be the same for all learning strategies. We assume 0 < αi, βi < 1 (i = 0, 1), which entails that the mature phenotype approaches, but never converges to, the optimal phenotype.

First, we consider a population that is monomorphic for the resident learning strategy, letting zt be the mature phenotype in generation t. This strategy assigns weight ui (called the control) to IL and complementary weight 1 − ui to SL during stage-i learning, where 0 ≤ ui ≤ 1(i = 0, 1). In generation t + 1, pure IL would produce the phenotype −α0z* during stage-0 learning, whereas pure SL would yield the phenotype β0zt. Hence, taking the weights into consideration, we have the intermediate phenotype

$$z_{t+1}^{\mathrm{int}}=-u_0\alpha_0 z^{*}+(1-u_0)\beta_0 z_t. \tag{9.1}$$

On further application of pure IL during stage-1 learning, the incremental change in the phenotype would be $\alpha_1(z^{*}-z_{t+1}^{\mathrm{int}})$, because the optimal phenotype and hence the target for IL has changed to z*. Pure SL during stage-1 learning would produce the incremental change $\beta_1(z_t-z_{t+1}^{\mathrm{int}})$. Thus, the mature phenotype is

$$z_{t+1}=z_{t+1}^{\mathrm{int}}+u_1\alpha_1(z^{*}-z_{t+1}^{\mathrm{int}})+(1-u_1)\beta_1(z_t-z_{t+1}^{\mathrm{int}}). \tag{9.2}$$

Substituting Eq. (9.1) into (9.2), the difference equation in the mature phenotype across generations can be written as

$$z_{t+1}=Az_t+Cz^{*}, \tag{9.3a}$$

where

$$A=(1-u_0)\beta_0\left[1-u_1\alpha_1-(1-u_1)\beta_1\right]+(1-u_1)\beta_1, \tag{9.3b}$$
$$C=u_1\alpha_1-u_0\alpha_0\left[1-u_1\alpha_1-(1-u_1)\beta_1\right]. \tag{9.3c}$$

At equilibrium of the cultural dynamics described by Eq. (9.3), we have

$$\hat{z}=Cz^{*}/(1-A). \tag{9.4}$$
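As a minimal sketch of the cultural dynamics, the two-stage update in Eqs. (9.1) and (9.2) can be iterated directly and compared with the linear form (9.3) and the equilibrium (9.4). The function names and the numerical values of the controls and efficiencies are assumptions made for illustration.

```python
def coefficients(u0, u1, a0, a1, b0, b1):
    """A and C of Eqs. (9.3b) and (9.3c); a = IL and b = SL efficiencies."""
    residual = 1.0 - u1 * a1 - (1.0 - u1) * b1
    A = (1.0 - u0) * b0 * residual + (1.0 - u1) * b1
    C = u1 * a1 - u0 * a0 * residual
    return A, C


def next_mature(z_t, z_opt, u0, u1, a0, a1, b0, b1):
    """One generation of two-stage learning, Eqs. (9.1) and (9.2)."""
    # Stage 0: the IL target is -z_opt, the SL target is z_t.
    z_int = -u0 * a0 * z_opt + (1.0 - u0) * b0 * z_t
    # Stage 1: the IL target has switched to +z_opt.
    return (z_int + u1 * a1 * (z_opt - z_int)
            + (1.0 - u1) * b1 * (z_t - z_int))


def equilibrium(z_opt, u0, u1, a0, a1, b0, b1):
    """Equilibrium mature phenotype, Eq. (9.4)."""
    A, C = coefficients(u0, u1, a0, a1, b0, b1)
    return C * z_opt / (1.0 - A)


if __name__ == "__main__":
    # Hypothetical controls and efficiencies.
    pars = dict(u0=0.3, u1=0.7, a0=0.5, a1=0.6, b0=0.8, b1=0.7)
    z = 0.0
    for _ in range(200):
        z = next_mature(z, 1.0, **pars)
    print("iterated:", z, " closed form:", equilibrium(1.0, **pars))
```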

Next, introduce a rare mutant strategy of small effect into the population of resident strategists at this equilibrium. Denote the control for this mutant strategy by $u_i^m$ (i = 0, 1), and let $A^m$ and $C^m$ be the functions of $u_i^m$ (i = 0, 1) obtained by substituting $u_i=u_i^m$ (i = 0, 1) in Eqs. (9.3b) and (9.3c), respectively. Then, the mature phenotype of the mutant strategy is given approximately by

$$z^m=A^m\hat{z}+C^m z^{*}. \tag{9.5}$$

The method for determining an evolutionarily stable control, $u_i^{*}$ (i = 0, 1), prescribes that we define an objective function, which gives the fitness of a mutant strategy introduced at low frequency into the population of resident strategists at equilibrium (Maynard Smith, 1982). Specifically, we assume an objective function of the Gaussian form,

$$F(u_0^m,u_1^m,u_0,u_1)=\exp\left\{-\frac{(z^m-z^{*})^2}{w}\right\}, \tag{9.6}$$

where zm is given by Eq. (9.5). Eq. (9.6) formalizes the assumption that deviations from the optimal phenotype, z*, are penalized, with an intensity of selection inversely proportional to w.

An evolutionarily stable control, $u_i^{*}$ (i = 0, 1), is one such that $F(u_0^m,u_1^m,u_0,u_1)$ takes a (local) maximum when $u_0^m=u_0=u_0^{*}$ and $u_1^m=u_1=u_1^{*}$. Set

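$$s_0=\left.\frac{\partial F}{\partial u_0^m}\right|_{u_0^m=u_0,\;u_1^m=u_1}, \tag{9.7a}$$
$$s_1=\left.\frac{\partial F}{\partial u_1^m}\right|_{u_0^m=u_0,\;u_1^m=u_1}, \tag{9.7b}$$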

which represent the selection gradients on the mutant control for stage-0 and stage-1 learning, respectively. In Appendix F, we show that s0 < 0 and s1 > 0 for all values of u0 and u1. Hence, there is only one evolutionarily stable control, which is $u_0^{*}=0$, $u_1^{*}=1$. In words, when the environment changes twice per generation – between birth and stage-0 learning, and between stage-0 learning and stage-1 learning – we predict an evolutionarily stable learning schedule where pure SL is followed by pure IL, which is equivalent to guided variation. Note that this schedule incorporates SL in spite of the highly changeable environment.
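The signs of the selection gradients in Eq. (9.7) can be probed numerically with central finite differences. This is only a spot check under assumed parameter values, not a substitute for the analytical argument in Appendix F. The function names, the tuple `eff = (α0, α1, β0, β1)`, and the defaults z* = 1 and w = 1 are assumptions.

```python
import math


def coefficients(u0, u1, a0, a1, b0, b1):
    """A and C of Eqs. (9.3b) and (9.3c)."""
    residual = 1.0 - u1 * a1 - (1.0 - u1) * b1
    A = (1.0 - u0) * b0 * residual + (1.0 - u1) * b1
    C = u1 * a1 - u0 * a0 * residual
    return A, C


def mutant_fitness(um0, um1, u0, u1, eff, z_opt=1.0, w=1.0):
    """Objective function F of Eq. (9.6) for a rare mutant (um0, um1)
    in a resident population (u0, u1) at cultural equilibrium."""
    A, C = coefficients(u0, u1, *eff)
    z_hat = C * z_opt / (1.0 - A)          # Eq. (9.4)
    Am, Cm = coefficients(um0, um1, *eff)
    z_m = Am * z_hat + Cm * z_opt          # Eq. (9.5)
    return math.exp(-(z_m - z_opt) ** 2 / w)


def selection_gradients(u0, u1, eff, h=1e-6):
    """Central-difference estimates of s0 and s1 in Eq. (9.7)."""
    s0 = (mutant_fitness(u0 + h, u1, u0, u1, eff)
          - mutant_fitness(u0 - h, u1, u0, u1, eff)) / (2.0 * h)
    s1 = (mutant_fitness(u0, u1 + h, u0, u1, eff)
          - mutant_fitness(u0, u1 - h, u0, u1, eff)) / (2.0 * h)
    return s0, s1


if __name__ == "__main__":
    eff = (0.5, 0.6, 0.8, 0.7)  # hypothetical (a0, a1, b0, b1)
    for u0 in (0.0, 0.5, 1.0):
        for u1 in (0.0, 0.5, 1.0):
            s0, s1 = selection_gradients(u0, u1, eff)
            print(f"u0={u0:.1f} u1={u1:.1f}  s0={s0:+.4f}  s1={s1:+.4f}")
```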

The fitness of a genetically monomorphic population at the cultural equilibrium, ẑ, is given by

$$W(\hat{z})=\exp\left\{-\frac{(\hat{z}-z^{*})^2}{w}\right\}, \tag{9.8}$$

which is of the same form as Eq. (9.6). To obtain the fitness of a population using the evolutionarily stable strategy, we substitute $u_0=u_0^{*}=0$ and $u_1=u_1^{*}=1$ in Eq. (9.3) and then use Eq. (9.4) to compute ẑ. This yields

$$\hat{z}=\frac{z^{*}\alpha_1}{1-(1-\alpha_1)\beta_0}, \tag{9.9}$$

and it can be shown that this value of ẑ maximizes W(ẑ). Hence, any other control including the pure IL strategy (u0 = 1, u1 = 1) is associated with a lower fitness at the cultural equilibrium. Thus, Rogers’ paradox is not observed in this model.
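The claim that the evolutionarily stable control maximizes the equilibrium fitness can be illustrated with a coarse grid search over (u0, u1). This is a sketch under assumed efficiencies, with z* = 1 and w = 1; the function names are hypothetical, and a grid search only illustrates the result for one parameter set.

```python
import math


def equilibrium_phenotype(u0, u1, a0, a1, b0, b1, z_opt=1.0):
    """Equilibrium mature phenotype from Eqs. (9.3) and (9.4)."""
    residual = 1.0 - u1 * a1 - (1.0 - u1) * b1
    A = (1.0 - u0) * b0 * residual + (1.0 - u1) * b1
    C = u1 * a1 - u0 * a0 * residual
    return C * z_opt / (1.0 - A)


def equilibrium_fitness(u0, u1, eff, z_opt=1.0, w=1.0):
    """Population fitness at the cultural equilibrium, Eq. (9.8)."""
    z_hat = equilibrium_phenotype(u0, u1, *eff, z_opt=z_opt)
    return math.exp(-(z_hat - z_opt) ** 2 / w)


def best_control_on_grid(eff, steps=10):
    """Grid point (u0, u1) with the highest equilibrium fitness."""
    grid = [i / steps for i in range(steps + 1)]
    pairs = [(u0, u1) for u0 in grid for u1 in grid]
    return max(pairs, key=lambda p: equilibrium_fitness(p[0], p[1], eff))


if __name__ == "__main__":
    eff = (0.5, 0.6, 0.8, 0.7)  # hypothetical (a0, a1, b0, b1)
    print("best control on grid:", best_control_on_grid(eff))
    print("fitness, SL then IL :", equilibrium_fitness(0.0, 1.0, eff))
    print("fitness, pure IL    :", equilibrium_fitness(1.0, 1.0, eff))
```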

6. Biased social learning

The previous models all assume that SL occurs from a random member of the parental generation (unbiased or random SL). The empirical literature sometimes suggests that organisms choose their exemplar(s) according to certain criteria such as conformity or perceived success (e.g., Mesoudi, 2009; Henrich and Broesch, 2011; Chudek et al., 2012). In this section, we review a model due to Nakahashi et al. (2012) that investigates the conditions under which such biases in SL may evolve.

6.1. Model 10: conformist and payoff bias with spatial environmental heterogeneity

Nakahashi et al. (2012) extend Model 7 by the simultaneous introduction of two new SL strategies, conformist bias and payoff bias, which are in competition with each other and with individual learners and unbiased social learners. As before, there are n environmentally different sites, each associated with a behavior that is correct there but equally wrong at the n − 1 other sites. Migration at rate m occurs among the sites after SL but before individual learning. An individual learner always achieves the correct behavior, but suffers an exogenous cost c. The cost of wrong behavior for an unbiased social learner is s.

Social learning is said to exhibit conformist bias when the probability that a common phenotype – or the majority phenotype when there are only two options – is adopted exceeds the frequency of that phenotype among the possible exemplars. Specifically, let bij be the frequency of organisms at site i with behavior adapted to site j in the parental generation, where $\sum_{j=1}^{n}b_{ij}=1$ for each i. Then, an organism of the offspring generation using the conformist bias strategy at site i acquires the behavior adapted to site j with probability

$$\rho_{ij}=\frac{(b_{ij})^{a}}{\sum_{k=1}^{n}(b_{ik})^{a}}, \tag{10.1}$$

where a > 1. When a → ∞, as is assumed here, the most common behavior at site i is copied. In particular, if bii > bij for each j ≠ i, then ρii → 1, so that the locally correct behavior is copied with probability one in this case.
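The effect of the exponent a in Eq. (10.1) can be seen in a short sketch. The function name `conformist_probabilities` and the example frequencies are assumptions, and the code assumes that at least one weight stays above floating-point underflow.

```python
def conformist_probabilities(freqs, a):
    """Adoption probabilities of Eq. (10.1) for behavior frequencies
    `freqs` at one site and conformity exponent `a`."""
    weights = [f ** a for f in freqs]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    b_i = [0.5, 0.3, 0.2]  # hypothetical frequencies at site i
    for a in (1, 2, 5, 50):
        probs = conformist_probabilities(b_i, a)
        print(a, [round(p, 4) for p in probs])
```

With a = 1 the rule reduces to unbiased copying, and as a grows the adoption probability of the most common behavior approaches one.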

Payoff bias, on the other hand, is a strategy in which the locally correct behavior is always copied. An organism using the payoff bias strategy will acquire the behavior adapted to its natal site. Such behavior always exists provided individual learners are initially present in the population.

Nakahashi et al. (2012) assume exogenous costs d to conformist bias and g to payoff bias, in addition to a cost s for the wrong behavior. Hence, for example, the fitness of an organism with the payoff bias strategy that has acquired the wrong behavior – i.e., all migrants – will be (1 − g)(1 − s). It is assumed that

$$0<d<g<c<s<1. \tag{10.2}$$

In particular, inequalities (10.2) entail that the payoff bias is more costly to implement than the conformist bias.

Nakahashi et al. (2012) derive conditions for the local stability of monomorphic and polymorphic equilibria at which the conformist bias exists. They find that a high cost of IL (large c) in conjunction with a high migration rate (large m) favors the evolution of conformist bias. Thus, high migration rates do not necessarily favor IL when in competition with biased SL. On the other hand, fixation of payoff bias is unstable if d < g, as is assumed, and polymorphisms involving the payoff bias strategy are either not possible or are unstable when they exist. Hence, payoff bias can never evolve. The intuitive reason for this latter result is that the frequency of the correct behavior at a site eventually exceeds the frequency of any one of the n − 1 wrong behaviors, i.e., bii > bij for each j ≠ i. Thus, both conformist bias and payoff bias will achieve the correct behavior, but the former strategy suffers a smaller exogenous cost and so out-competes the latter (given d < g).

7. Emergence of learning

We have reviewed ten models for the evolution of learning strategies in species where IL and SL are already well-established. In doing so, we have ignored the possibility that the phenotype (behavior) may be innately determined. We now consider conditions for the emergence of a partial reliance on learning.

7.1. Model 11: mixed strategy model that includes innate behavior

Wakano and Aoki (2006) proposed an extension of Model 5 to include innate behavior. As in Model 5, the environment changes every l generations to a previously unknown state. However, this extended mixed strategy model has several differences. First, an organism with the resident strategy uses SL with probability K, IL with probability L, and acts innately with probability 1 − K − L. The corresponding probabilities for the mutant strategy are K + dK, L + dL, and 1 − K − dK − L − dL, respectively.

Second, there is a genetic locus (other than the locus that determines the strategy), which carries the phenotypic information that is expressed when the organism behaves innately. We call this genetic locus the “innate information” locus and assume that it has many alleles, which can be classified into those that produce adaptive behavior and those that produce maladaptive behavior in a given environment. A small subset amounting to a fraction ρ of each class has the special property of producing different behaviors before and after an environmental change. In particular, these rare variants produce adaptive behavior in the post-change generation(s) – they can be regarded as pre-adapted alleles awaiting an environmental change – but the behavior they produce in the pre-change generation(s) is maladaptive as viewed from the post-change generations(s).

Third, an organism behaving innately and expressing the correct behavior is assigned a fitness of 1. Relative to this phenogenotype, SL and IL have exogenous costs d and c, respectively. An additional cost, s, accrues to an organism behaving innately or using SL that has acquired the wrong behavior. By assumption, IL always results in the correct behavior. These costs are additive, such that an organism using SL with the wrong behavior, for example, has fitness 1 − d − s. Moreover,

$$0<d<c<s<d+s<1. \tag{11.1}$$

We now consider the conditions for a mutant strategy that relies on a small amount of learning to invade a population that is monomorphic for the pure innate resident strategy. In terms of the probabilities of using SL and/or IL, the resident strategy can be defined as K = L = 0. Then, the corresponding probabilities for the mutant strategy are dK and dL, respectively, where dK and dL are small and non-negative, and at least one is positive. For small dK and dL, the eigenvalue takes the approximate form

$$\lambda=1+C_K\,dK+C_L\,dL, \tag{11.2}$$

and the invasion condition is λ > 1.

Wakano and Aoki (2006) show that CK < 0. Hence, CL > 0 and dL > 0 are necessary conditions for invasion. The latter inequality entails that a small amount of SL in isolation (dK > 0, dL = 0) confers no selective advantage over the pure innate strategy, so that a successful mutant strategy must have an IL component (dL >0). Furthermore, CL > 0 if and only if

$$\frac{\eta(l)}{\eta(l-1)}<1-c, \tag{11.3a}$$

where we define

$$\eta(l)=\rho+(1-\rho)(1-s)^{l}. \tag{11.3b}$$

The ratio $\eta(l)/\eta(l-1)$ is monotone increasing in l, with $\eta(1)/\eta(0)=\rho+(1-\rho)(1-s)$ and $\lim_{l\to\infty}\eta(l)/\eta(l-1)=1$. Hence, provided

$$\rho+(1-\rho)(1-s)<1-c, \tag{11.4}$$

there exists a unique integer lM ≥ 1 such that (11.3) is satisfied for l ≤ lM. Restated, the pure innate strategy is evolutionarily stable if inequality (11.4) is reversed or if l > lM.

Inequality (11.4) can be rewritten as

$$c<(1-\rho)s. \tag{11.5}$$

The left hand side of inequality (11.5) is the exogenous cost of IL. The right hand side is the fitness loss to innate behavior due to the expression of non-adapted alleles at the innate information locus immediately after an environmental change. Thus, the conditions that favor the emergence of learning are (1) a small exogenous cost of IL, (2) a high cost of wrong behavior, (3) a low frequency of pre-adapted alleles at the innate information locus, and (4) a changeable environment. In addition, learning must include some IL.
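The threshold period lM implied by conditions (11.3) to (11.5) can be found by direct search. The sketch below is illustrative only: the function names, the search cap `l_max`, and the example parameter values are assumptions, and it takes ρ > 0 so that the ratio η(l)/η(l − 1) approaches 1.

```python
def eta(l, rho, s):
    """Eq. (11.3b)."""
    return rho + (1.0 - rho) * (1.0 - s) ** l


def learning_can_invade(l, rho, s, c):
    """Condition (11.3a) for environmental period l >= 1."""
    return eta(l, rho, s) / eta(l - 1, rho, s) < 1.0 - c


def max_period_for_invasion(rho, s, c, l_max=10_000):
    """Largest period l_M at which (11.3a) holds, or None when
    inequality (11.5) fails, i.e. the pure innate strategy is stable
    for every period. Assumes rho > 0; the search stops at l_max."""
    if not c < (1.0 - rho) * s:
        return None
    l = 1
    # The ratio increases with l, so stop at the first failure.
    while l < l_max and learning_can_invade(l + 1, rho, s, c):
        l += 1
    return l


if __name__ == "__main__":
    # Hypothetical parameter values, for illustration only.
    print(max_period_for_invasion(rho=0.01, s=0.5, c=0.1))
    print(max_period_for_invasion(rho=0.9, s=0.5, c=0.1))
```

Lowering ρ or c, or raising s, enlarges lM, which matches conditions (1) to (3) in the text; condition (4) corresponds to l ≤ lM.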

8. Discussion

8.1. Comparing the dynamical models in temporally variable environments

Models 1, 2, and 4 are three basic dynamical models for the evolution of obligate individual learners and obligate social learners in temporally variable environments. They produce predictions that differ both qualitatively and quantitatively, the main distinction being between Models 2 and 4 for which fixation of social learners can be stable, and Model 1 for which it cannot. The contradictory predictions can be related to the different assumptions. Models 2 and 4 allow for some of the behaviors occurring in the parental generation to remain adaptive in the offspring generation. On the other hand, Model 1 posits that none of the preexisting behaviors can be adaptive after an environmental change. Model 1 has the merit of being relatively easy to analyze. However, Models 2 and 4 may be better representations of reality, at least in the sense that some continuity in what makes a behavior adaptive across generations is permitted.

As noted above, fixation of social learners can be stable in Models 2 and 4. At such equilibria, social learners with the correct behavior (SLC) and social learners with the wrong behavior (SLW) coexist at what may loosely be described, in analogy with genetics, as a “mutation–selection balance”. Moreover, since there is no input of novel behaviors by IL, the same correct behavior and the same wrong behavior are maintained indefinitely. Specifically, Model 2 produces periodic fluctuations in the frequencies of the two behaviors, whereas Model 4 entails that they are constant. In neither case can there be sustained cultural change. Nevertheless, the mean fitness of the population at such equilibria will be higher than at the other possible equilibria, including fixation of individual learners. Hence, Rogers’ paradox is resolved, but this result arguably introduces another paradox, in that the highest mean fitness is associated with a population that does not evolve culturally.

The three models are deterministic and have in common the assumptions that the environment changes at regular intervals and population size is infinite. Aoki et al. (2005) conducted Monte Carlo simulations of an extended version of Model 1 that includes innate determination of behavior as a third alternative. In these simulations, the environment changes between generations with probability p – as measured by a uniformly distributed pseudo-random number – so that on average the environment changes every 1/p generations. As expected, they found that the equilibrium frequencies of individual learners, social learners, and innate determination depended on 1/p in the same way as on the fixed period of environmental change, l, in the deterministic model. Specifically, individual learners, social learners, and innate determination were favored by natural selection when environmental changes occurred at short, intermediate, and long intervals, respectively.

We note in passing that the simple forms of vertical transmission (i.e. SL from parents) are congruent with innate determination. Hence, it comes as no surprise when McElreath and Strimling (2008) find that vertical transmission, where one of the two parents is copied with equal probability, should be selectively favored when the environment is stable.

Rendell et al. (2010) conducted agent-based simulations on the evolution of various SL strategies (pure, critical, and conditional SL) in a finite population with stochastic temporal changes of the environment. More specifically, their model posits that organisms in this population inhabit cells on a lattice that are arranged as a two-dimensional torus. We will have more to say on this study when we discuss the effects of population structure. Here, we consider their results obtained under the global conditions (SL occurs from a randomly chosen member of the population) and when all cells are in the same environmental state at any one time. In the competition between pure social learners and pure individual learners, Rendell et al. (2010) find that the frequency of social learners and the mean fitness at polymorphic equilibrium, obtained from their simulations, are in good “qualitative” agreement with the analytical predictions of Rogers (1988) and Enquist et al. (2007). However, they also indicate that the frequency of social learners is lower than expected, whereas the mean fitness is higher.

It is not clear to us how the analytical predictions of Rogers (1988) and Enquist et al. (2007) on the frequency of social learners at equilibrium would apply, because the model of Rendell et al. (2010) differs in various respects, such as the number of environmental states and the possible transitions among them. However, the mean fitness of the population when individual learners are fixed is 1 − c in all three models, where c (ca in their notation) is the exogenous cost to individual learners, providing a point of comparison. As noted above, Rendell et al. (2010) find that the mean fitness at polymorphic equilibrium is marginally higher than 1 − c, suggesting that Rogers’ paradox is resolved. These authors do not attribute this discrepancy to environmental stochasticity or finite population size. Rather, they argue that the difference may be caused, for example, by their use of the coevolutionary approach in their simulations, as opposed to the two-timescale argument favored by Rogers (1988) and Enquist et al. (2007). This interpretation is consistent with what we have shown in this paper, that Rogers’ paradox is inherent in the two-timescale method at least as applied to the evolution of pure strategies (Model 3), whereas it can be resolved in a coevolutionary model with two (Model 2) – or a finite number (Model 7) – of environmental states.

8.2. ESS mixed strategy models in temporally variable environments

Models in which an organism can engage in both IL and SL are arguably more realistic than those that assume it can only do one or the other. The goal with these mixed strategy models is to investigate the dependence of the evolutionarily stable probability, L, of using IL (as in Model 5) – or the complementary probability, K, of using SL (as in Model 6) – on the environmental variability.

Model 5, which assumes an infinite number of environmental states, is the direct analog of the dynamical model, Model 1. Hence, it comes as no surprise that the ESS value of L predicted by Model 5 depends on the environmental variability in qualitatively the same way as the equilibrium frequency of obligate individual learners does in Model 1. Specifically, both increase as the environmental stability decreases, such that total reliance on IL (L = 1) and fixation of individual learners can both be stable if the environment is sufficiently changeable. On the other hand, total reliance on SL (L = 0), or equivalently fixation of social learners, cannot be stable in either model. However, Rogers’ paradox was seen to occur in Model 1, but to be eliminated in Model 5—for the latter model, the geometric mean of the mean fitnesses is higher at an interior ESS than in a population with L = 1. The reason for this discrepancy remains unclear.

Model 6 is the coevolutionary version of a model described by Boyd and Richerson (1988, 1995) for fluctuating environments (Wakano and Aoki, 2007). The original model was formulated and analyzed by a combination of the two-timescale and information decay approaches. The ESS analysis of Model 6 agrees qualitatively with Boyd and Richerson (1988, 1995) in finding that the total reliance on SL (K = 1) cannot be an ESS, whereas the total reliance on IL (K = 0) can in a rapidly fluctuating environment. At intermediate levels of environmental variability, an organism evolves to depend on both IL and SL. Hence, the original and reformulated models would appear to produce qualitatively consistent predictions.

In the model of Boyd and Richerson (1988, 1995), the mean fitness associated with an ESS mixing IL and SL is greater than the mean fitness of the pure IL strategy. We suspect Rogers’ paradox is also resolved in Model 6, since it has a mathematical structure similar to Model 5, for which we have already seen that Rogers’ paradox does not occur. Boyd and Richerson (1995) argue that Rogers’ paradox is resolved in their model, because increased reliance on SL has the effect of improving the accuracy of IL (i.e., reducing the probability that the wrong behavior is acquired when IL is used), as suggested by their Fig. 3. We have no reason to question this interpretation. However, it should be noted that Rogers’ paradox is also eliminated in Model 5, where IL always results in the correct behavior regardless of the degree of reliance on SL.

8.3. Spatially variable environment

Model 7 addresses the effect of spatial environmental heterogeneity on the evolution of obligate individual learners and obligate social learners. The habitable world comprises a finite number of islands, each with a different environment and among which organisms may migrate. All stable equilibria of this deterministic model are symmetric; that is, the numbers of individual learners, SLC, and SLW are the same at all sites. Fixation of individual learners, polymorphism of individual learners and social learners, fixation of social learners, and extinction are the possible stable equilibria. Let us take a horizontal transect through Fig. 3 at a value of the exogenous cost to individual learners, c, that is small relative to the cost of maladaptive behavior, s. Then we can see that fixation of social learners is stable for small values of the migration rate, m, polymorphism of individual learners and social learners is stable for intermediate values of m, and fixation of individual learners is stable for large values of m. In addition, it can be shown that the total population size is greater at the stable monomorphism of social learners than at the coexisting unstable monomorphism of individual learners, which can be interpreted as indicating that Rogers’ paradox is resolved.

Let us compare these observations with the findings of Rendell et al. (2010) from their agent-based simulations. Their treatment of the competition between individual learners and pure social learners under the local conditions (SL occurs from neighbors and dispersal is limited to neighboring sites) with spatial variation in the environment corresponds most closely to the postulates of Model 7. An important parameter in the model of Rendell et al. (2010) is the spatial correlation, pn, defined as the probability that neighboring cells have the same environmental state. Small values of pn in this model would appear to be analogous to large values of m in Model 7. This is because smaller values of the spatial correlation entail that a newborn social learner is more likely to be exposed to a neighbor that has experienced a different environmental state from its own. Similarly, higher migration rates entail that a newborn social learner is more likely to acquire its behavior from an immigrant with locally maladaptive behavior.

Rendell et al. (2010) find that the proportion of social learners at equilibrium increases as pn increases (see their Fig. 4). Moreover, social learners can be “effectively fixed”—their model includes mutation among the strategies. These results mirror the predictions of Model 7 when we note the analogy between pn and m. Rendell et al. (2010) also claim that Rogers’ paradox does not apply when the spatial correlation is high and social learners are effectively fixed. This result is what we would expect from our analysis of Model 7, but is apparently at odds with their Fig. 4 for the case that a social learner does not initially learn from its parent.

8.4. Learning schedules and cumulative culture

Boyd and Richerson (1985) were the first to deal with the evolution of learning schedules. In their model of guided variation, a continuous cultural trait is acquired by a two-step process, whereby an initial phenotype that is acquired by oblique SL is adaptively modified by IL to yield the mature phenotype. These authors rely on the two-timescale approach to predict the “evolutionary equilibrium” contributions of SL and IL to the mature phenotype. Enquist et al. (2007) introduced a strategy called critical social learning, which also entails that SL occurs before IL.

Learning schedules are of particular interest as they relate to the possibility of cumulative culture. By cumulative culture we mean a cumulative improvement in the adaptiveness of a cultural trait, although the term can also describe an increase in the number of adaptive cultural traits. A learning schedule in which each organism accurately absorbs an extant variant of a cultural trait by SL and then builds on it by IL – SL followed by IL – can be supportive of cumulative culture. In this regard, guided variation can result in a gradual and cumulative improvement of the phenotype over generations. However, critical social learning cannot, because this learning strategy entails that the behavior acquired by SL is rejected if it is judged to be maladaptive, in which case IL must occur from scratch.

An explicit study of learning schedules in the context of cumulative culture was made by Aoki et al. (2012), part of which has been reproduced as Model 9. The model assumes two learning stages per discrete generation in each of which an organism can use IL, SL, or a mixture of the two. Analysis of Model 9 showed that if the environment is highly changeable, undergoing periodic fluctuations twice per generation, the evolutionarily stable learning schedule comprised pure SL during the earlier learning stage and pure IL during the latter. Pure SL followed by pure IL can also be the evolutionarily stable learning schedule in a constant environment, depending on the efficiencies of SL and IL (Aoki et al., 2012). Interestingly, a mixture of IL and SL was never found to be evolutionarily stable for either of the learning stages.

An “unrealistic” aspect of the model of Aoki et al. (2012) is the prediction noted above that an organism should use only IL or SL to the exclusion of the other during any one learning stage. This result can, of course, be modified to be more consonant with actual learning behavior by introducing stage-dependent exogenous costs to IL and/or SL. The guided variation model of Boyd and Richerson (1985), on the other hand, allows for intermediate levels of IL and SL at evolutionary equilibrium, even in the absence of exogenous costs. Unfortunately, the two models are not really comparable – neither model can be reduced to a special case of the other – so that it is not possible to ascertain whether the contradictory outcomes can be reconciled.

One reason why the guided variation model sometimes predicts intermediate levels of IL and SL may be that this model can be interpreted as providing only one learning stage per generation, instead of two learning stages as in the model of Aoki et al. (2012). Thus, Eq. (4.9) of Boyd and Richerson (1985) defines a weighted average of the phenotypes that are acquired by SL and IL, as explained in further detail by these authors on page 97 of their book. In fact, their Eq. (4.9) is compatible with the assumption that SL and IL occur concurrently, rather than in some temporal sequence. Hence, the intermediate levels of IL and SL arising from this model may be congruent with the one-shot mixed strategies observed in Models 5 and 6.

8.5. Biased social learning

Conformist bias has been studied from the standpoint of evolutionary theory and in psychological experiments. Evolutionary theorists have provided several definitions of conformity in the context of SL. Eq. (10.1) is suitable for dealing with the dynamics of three or more culturally transmitted phenotypes (Lachlan et al., 2004; Nakahashi, 2007). On the other hand, when there are only two options, the probability that the focal variant – which exists at frequency p – is chosen is usually written as

p+Dp(1-p)(2p-1),

where D is a parameter assumed to be positive (Boyd and Richerson, 1985, p. 208). That is, the majority (minority) phenotype is adopted with a probability greater (less) than its representation in the population. Different formulations of conformist bias are possible in finite populations, where, for example, each newborn samples a relatively small number of exemplars from the parental generation and adopts the majority phenotype in that sample (Eriksson et al., 2007; Aoki et al., 2011).
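The two-variant conformity rule above can be made concrete with a few lines of code. The following is a minimal sketch, not part of any of the cited models; the function name and the illustrative value D = 0.5 are our own assumptions. It shows that a majority variant (p > 1/2) is adopted with probability greater than p, a minority variant with probability less than p, and that p = 1/2 is left unchanged.

```python
def conformist_adoption_probability(p: float, D: float) -> float:
    """Probability that the focal variant (frequency p) is adopted.

    Implements p + D p (1 - p)(2p - 1) for two variants.
    D > 0 is the strength of conformist bias (illustrative values only).
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if D <= 0.0:
        raise ValueError("D is assumed to be positive")
    return p + D * p * (1.0 - p) * (2.0 * p - 1.0)


# Illustrative frequencies: minority, evenly split, majority.
for p in (0.2, 0.5, 0.8):
    print(p, conformist_adoption_probability(p, D=0.5))
```

Because the correction term D p (1 − p)(2p − 1) is antisymmetric about p = 1/2, the adoption probabilities of the two variants sum to one.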

Empirical evidence for conformity in humans as defined above – psychologists use a slightly different definition – is poor (Boyd and Richerson, 1985, pp. 223–227; Eriksson and Coultas, 2009; Eriksson et al., 2007; Claidière et al., 2012). On the other hand, theoretical work shows that the conditions for the evolution of conformist bias are not particularly stringent. For example, Model 10 predicts that a high migration rate in a spatially heterogeneous environment, in conjunction with a high cost of IL, favors conformist bias. Nakahashi (2007), Wakano and Aoki (2007), and Kendal et al. (2009) obtained an analogous result that conformity is selected for when the environment changes rapidly in time. Henrich and Boyd (1998) conducted a numerical study on a model incorporating both spatial and temporal variability of the environment, and reached the conclusion that “conformist transmission is favored under a very broad range of conditions”. This may be true, but it is also true that selection on conformist bias is often extremely weak (Wakano and Aoki, 2007). Hence, conformist bias may not necessarily evolve, even under conditions that favor it. Eriksson et al. (2007) give additional reasons for doubting that conformist bias is a general phenomenon.

Payoff bias and direct bias are closely related concepts. Direct bias can be defined as “a naïve individual (choosing) his/her exemplar (cultural parent) based only on the competence of that exemplar in the specific skill that is to be copied” (Kobayashi and Aoki, 2012). This definition is consistent with the one given originally by Boyd and Richerson (1985, p. 137) and with the current usage in the theoretical literature on cultural evolutionary rates (Powell et al., 2009; Mesoudi, 2011). More generally, direct bias means that a particular variant of a cultural trait is preferred and an individual carrying that variant is identified and copied. The term payoff bias makes more explicit the assumption that a phenotypic variant is more likely to be adopted if it is associated with a higher fitness; Boyd and Richerson (1985) were agnostic with regard to this aspect of direct bias. Payoff bias can take many forms. For example, Model 10 due to Nakahashi et al. (2012) assumes that an organism implementing payoff bias always identifies and acquires the (most) adaptive behavior, no matter how low the frequency of that behavior in the population. Alternatively, the fidelity of SL may be set proportional to the fitness benefit from adopting the adaptive behavior (Kendal et al., 2009).

Empirical studies in the laboratory (for review see Mesoudi, 2009; Chudek et al., 2012) and in the field (Henrich and Broesch, 2011) may suggest that present-day humans are capable of payoff and/or direct bias. Apparently, the evidence for direct and indirect biases – the latter includes attraction to prestigious and/or successful individuals (Boyd and Richerson, 1985) – is stronger than for conformist bias. It is then ironic that Model 10 predicts conformist bias will out-compete payoff bias in a spatially variable environment. Nakahashi et al. (2012) also consider the temporal infinite environmental states analog of Model 10, for which they find that payoff bias can be maintained in the population. Unfortunately, this model is not well formulated, since it assumes the presence of individual learners from which the payoff bias strategists can acquire the adaptive behavior, without explicitly incorporating them into the dynamical equations. On the other hand, Kendal et al. (2009), using the information decay approach to model temporal environmental change (see Model 4), show that payoff bias in competition with individual learners can reach polymorphic frequencies. This latter result suggests that Nakahashi et al. (2012) may be right for the wrong reasons.

Clearly, the evolution of biases in SL is an important issue that needs to be pursued further. Such biases can interact with population size to have a large effect on cultural evolutionary rates and have been invoked to explain archeological observations on changes in lithic traditions of various hominid species (Henrich, 2004; Powell et al., 2009; Mesoudi, 2011; Aoki et al., 2011; Kobayashi and Aoki, 2012).

8.6. Extensions

Our assumptions may be translated into the language of decision making and behavior. In most of the models (e.g., Models 1–5, 7, 10, 11), individual learners always make the correct decision about what they would gain (in fitness) from the environment, but usually pay a cost to do so. Social learners pay a cost when they make an incorrect decision, in which case they behave inappropriately for the environment they are in, and the latter cost is greater than that paid by individual learners. For human learners, it may be difficult to discern which behavior should be “invented” as an individual learner or “copied” as a social learner, because the optimum behavior in a given environment may be difficult to identify. Kahneman and Tversky (1979) showed that in choosing between sets of payoffs with different probabilities it is not always the highest mean payoff that humans decide upon. In our terminology, it may be very difficult to decide whom to copy in a given environment, or what the payoffs to possible decision choices are. Quantification of uncertainty and the probability of acting upon perceived measures of uncertainty are features of evolution that are crucial to learning, and in principle both should contribute to fitnesses of different learning strategies.

A similar issue has been shown to arise in recent agent-based models of SL and IL in uncertain environments. The context is the producer–scrounger game, where some organisms, producers, discover resources, and others, scroungers, then join them and take advantage of the producers’ discoveries (Barnard and Sibly, 1981; Giraldeau and Caraco, 2000). This game provides a frequency-dependent situation in which IL and SL strategies can compete with innate behavior, and the structure of the competitive environment can be manipulated to give appropriate optimum combinations of these strategies.

Arbilly et al. (2011) used this approach with an environment they designed to have the highest payoff occur with the lowest probability. In their simulations, agents learned in which patch to forage either individually (as a producer) or socially (as a scrounger) by observing or joining a producer, over many time steps during the agents’ lifetimes. When the number of learning steps per lifetime was large enough, the social learners were able to learn to forage on the high-payoff-low-probability food sources, which resulted in their ultimately taking over the population. Other such studies have shown that learners can invade a population of non-learners (Dubois et al., 2010) and that if the producer–scrounger dimorphism is not perfectly heritable, learners could come to dominate the population when the amount of resource was either fixed or fluctuating (Katsnelson et al., 2011).

Although these agent-based simulations involve complexities of finite population size, multiple phenogenotypes and complex environments, which preclude their representation by the recursion systems that we would like to analyze formally, they provide an informative complement to the mathematical analyses reviewed here. They also suggest avenues along which the mathematical models might profitably be developed, albeit in a more abstract form.

Acknowledgments

We are grateful to many colleagues. In particular, Wataru Nakahashi and Magnus Enquist provided clarification on their models, Laurent Lehmann and Jeremy Kendal made helpful comments on the first draft, and Shripad Tuljapurkar contributed Appendix C. This research was supported in part by Monbukagakusho grant 22101004 to KA and NIH grant GM28016 to MWF.

Appendix A. Infinite-states l-cycle model (Model 1)

We summarize the local stability analysis for the two monomorphic equilibria.

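  1. $\hat{z}(i) = 1$ for $0 \le i \le l - 1$.

    When the environment changes between generations,
    $$x' + y' = \frac{1-s}{1-c}\,(x + y).$$
    When the environment does not change,
    $$x' + y' = \frac{1}{1-c}\,(x + y).$$
    Hence after $l$ generations,
    $$x(l) + y(l) = \frac{1-s}{(1-c)^{l}}\,\bigl(x(0) + y(0)\bigr).$$

    Thus, this equilibrium is stable if $\frac{1-s}{(1-c)^{l}} < 1$, i.e., if $l < \frac{\log(1-s)}{\log(1-c)}$.

  2. $\hat{x}(i) = 0$, $\hat{z}(i) = 0$, $\hat{y}(i) = 1$ for $0 \le i \le l - 1$.

    Assume that $\hat{z}(i) = 0$. Then Eq. (1.1) gives $\frac{\hat{x}(1)}{\hat{y}(1)} = 0$, and Eq. (1.2) gives $\frac{\hat{x}(i+1)}{\hat{y}(i+1)} = \frac{1}{1-s}\cdot\frac{\hat{x}(i)}{\hat{y}(i)}$ for $1 \le i \le l - 1$. Hence, $\hat{y}(i) = 1$ for $0 \le i \le l - 1$, when social learners are fixed.

    After an environmental change,
    $$x(1) = 0, \qquad z(1) = \frac{1-c}{1-s}\,z(0).$$
    Without an environmental change,
    $$x(i+1) = \frac{1}{1-s}\,\bigl(x(i) + z(i)\bigr), \qquad z(i+1) = \frac{1-c}{1-s}\,z(i).$$
    Hence,
    $$\begin{pmatrix} x(l) \\ z(l) \end{pmatrix} = \begin{pmatrix} \frac{1}{1-s} & \frac{1}{1-s} \\ 0 & \frac{1-c}{1-s} \end{pmatrix}^{l-1} \begin{pmatrix} 0 & 0 \\ 0 & \frac{1-c}{1-s} \end{pmatrix} \begin{pmatrix} x(0) \\ z(0) \end{pmatrix} = \begin{pmatrix} 0 & * \\ 0 & \left(\frac{1-c}{1-s}\right)^{l} \end{pmatrix} \begin{pmatrix} x(0) \\ z(0) \end{pmatrix}$$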

where * indicates a non-zero quantity. The eigenvalues do not depend on the order of matrix multiplication (Caswell, 2001, pp. 350–351), and since c < s the leading eigenvalue is larger than 1, and the equilibrium is unstable.
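The two conditions derived in this appendix can be checked numerically. The sketch below is illustrative only: the function names and the parameter values in the usage note are our own assumptions, and the matrices are simply the linearized recursions written above for perturbations near fixation of social learners.

```python
import math


def il_fixation_is_stable(s: float, c: float, l: int) -> bool:
    """Individual learners resist invasion iff (1 - s) / (1 - c)^l < 1."""
    return (1.0 - s) / (1.0 - c) ** l < 1.0


def critical_cycle_length(s: float, c: float) -> float:
    """Threshold log(1 - s) / log(1 - c); IL fixation is stable for l below it."""
    return math.log(1.0 - s) / math.log(1.0 - c)


def matmul2(P, Q):
    """Product of two 2 x 2 matrices stored as nested lists."""
    return [[sum(P[i][k] * Q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def cycle_matrix(s: float, c: float, l: int):
    """Linearized map over one l-cycle near fixation of social learners.

    Acts on the perturbation vector (x, z): one generation with an
    environmental change followed by l - 1 generations without change.
    """
    a = 1.0 / (1.0 - s)
    b = (1.0 - c) / (1.0 - s)
    stasis = [[a, a], [0.0, b]]
    change = [[0.0, 0.0], [0.0, b]]
    M = change
    for _ in range(l - 1):
        M = matmul2(stasis, M)
    return M
```

For example, with s = 0.5 and c = 0.1 the threshold is about 6.58, so `il_fixation_is_stable` holds for l = 6 but not for l = 7. For any c < s, the lower-right entry of `cycle_matrix` equals ((1 − c)/(1 − s))^l > 1, which is the leading eigenvalue that makes fixation of social learners unstable.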

Appendix B. Two-state l-cycle model (Model 2)

We identify the genetically monomorphic equilibria and determine their local stability conditions.

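  1. $\hat{z}(i) = 1$ for $0 \le i \le l - 1$.

    After an environmental change,
    $$x' + y' = \frac{1-s}{1-c}\,(x + y).$$
    Without an environmental change,
    $$x' + y' = \frac{1}{1-c}\,(x + y).$$
    Hence after $l$ generations,
    $$x(l) + y(l) = \frac{1-s}{(1-c)^{l}}\,\bigl(x(0) + y(0)\bigr).$$

    Thus, this equilibrium is stable if $\frac{1-s}{(1-c)^{l}} < 1$, i.e., if $l < \frac{\log(1-s)}{\log(1-c)}$.

  2. $\hat{x}(i) + \hat{y}(i) = 1$ for $0 \le i \le l - 1$.

    The natural period of the periodic solutions is $2l$ as shown below. After the first environmental change,
    $$\frac{x(1)}{y(1)} = \frac{1}{1-s}\cdot\frac{y(0)}{x(0)}.$$
    During the subsequent $l - 1$ generations without change,
    $$\frac{x(i)}{y(i)} = \frac{1}{1-s}\cdot\frac{x(i-1)}{y(i-1)}$$
    where $2 \le i \le l$. Similarly, after the second environmental change
    $$\frac{x(l+1)}{y(l+1)} = \frac{1}{1-s}\cdot\frac{y(l)}{x(l)},$$
    and during the following $l - 1$ generations of stasis,
    $$\frac{x(l+i)}{y(l+i)} = \frac{1}{1-s}\cdot\frac{x(l+i-1)}{y(l+i-1)}$$
    for $2 \le i \le l$. Hence,
    $$\frac{x(2l)}{y(2l)} = (1-s)^{-l}\cdot\frac{y(l)}{x(l)} = (1-s)^{-l}\cdot(1-s)^{l}\cdot\frac{x(0)}{y(0)} = \frac{x(0)}{y(0)},$$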

which entails that there exist an infinite number of periodic solutions of period 2l, each of which is neutrally stable (see eigenvalue λ1 below).

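Next, we consider the local stability of each of these periodic solutions. Set $y(i) = \hat{y}(i) + \varepsilon_y(i)$, $z(i) = \varepsilon_z(i)$. After the first environmental change,

$$\begin{aligned}
W(0) &= (1-c)\,\varepsilon_z(0) + \bigl[1 - s\bigl(1 - \hat{y}(0) - \varepsilon_y(0)\bigr)\bigr]\bigl(1 - \varepsilon_z(0)\bigr) \\
&\approx 1 - s\bigl(1 - \hat{y}(0)\bigr) + s\,\varepsilon_y(0) + O\bigl(\varepsilon_z(0)\bigr), \\
\varepsilon_y(1) &\approx \frac{(1-s)\bigl(1 - \varepsilon_z(0)\bigr)\bigl(1 - \hat{y}(0) - \varepsilon_y(0)\bigr)}{1 - s\bigl(1 - \hat{y}(0)\bigr) + s\,\varepsilon_y(0) + O\bigl(\varepsilon_z(0)\bigr)} - \frac{(1-s)\bigl(1 - \hat{y}(0)\bigr)}{1 - s\bigl(1 - \hat{y}(0)\bigr)} \\
&\approx -\frac{1-s}{\bigl[1 - s\bigl(1 - \hat{y}(0)\bigr)\bigr]^{2}}\,\varepsilon_y(0) + O\bigl(\varepsilon_z(0)\bigr), \\
\varepsilon_z(1) &= \frac{1-c}{1 - s\bigl(1 - \hat{y}(0)\bigr)}\,\varepsilon_z(0).
\end{aligned}$$

Without change,

$$\begin{aligned}
W(i-1) &= (1-c)\,\varepsilon_z(i-1) + \bigl[1 - s\bigl(\hat{y}(i-1) + \varepsilon_y(i-1)\bigr)\bigr]\bigl(1 - \varepsilon_z(i-1)\bigr) \\
&\approx 1 - s\,\hat{y}(i-1) - s\,\varepsilon_y(i-1) + O\bigl(\varepsilon_z(i-1)\bigr), \\
\varepsilon_y(i) &\approx \frac{(1-s)\bigl(1 - \varepsilon_z(i-1)\bigr)\bigl(\hat{y}(i-1) + \varepsilon_y(i-1)\bigr)}{1 - s\,\hat{y}(i-1) - s\,\varepsilon_y(i-1) + O\bigl(\varepsilon_z(i-1)\bigr)} - \frac{(1-s)\,\hat{y}(i-1)}{1 - s\,\hat{y}(i-1)} \\
&= \frac{1-s}{\bigl(1 - s\,\hat{y}(i-1)\bigr)^{2}}\,\varepsilon_y(i-1) + O\bigl(\varepsilon_z(i-1)\bigr), \\
\varepsilon_z(i) &= \frac{1-c}{1 - s\,\hat{y}(i-1)}\,\varepsilon_z(i-1).
\end{aligned}$$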

Analogous equations hold for the next l generations. It follows that the eigenvalues of the local stability matrix are

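$$\lambda_1 = (1-s)^{2l}\mu^{2} \quad\text{and}\quad \lambda_2 = (1-c)^{2l}\mu,$$

where

$$\mu = \frac{1}{1 - s\,\hat{x}(0)}\left(\prod_{i=1}^{l-1}\frac{1}{1 - s\,\hat{y}(i)}\right)\cdot\frac{1}{1 - s\,\hat{x}(l)}\left(\prod_{i=1}^{l-1}\frac{1}{1 - s\,\hat{y}(l+i)}\right).$$

Note that $\mu$ is the reciprocal of the product of the mean fitnesses over the $2l$ generations.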

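To evaluate these eigenvalues, we need to obtain the explicit solution for the periodic equilibrium. Set $K_0 = \frac{\hat{y}(0)}{\hat{x}(0)}$ and $K_l = \frac{\hat{y}(l)}{\hat{x}(l)}$. Then

$$1 - s\,\hat{y}(i) = (1-s)\,\frac{1 + (1-s)^{-(i+1)}K_0}{1 + (1-s)^{-i}K_0}, \qquad 1 - s\,\hat{y}(l+i) = (1-s)\,\frac{1 + (1-s)^{-(i+1)}K_l}{1 + (1-s)^{-i}K_l}$$

for $1 \le i \le l$. Moreover,

$$1 - s\,\hat{x}(0) = \frac{1 - s + K_0}{1 + K_0}, \qquad 1 - s\,\hat{x}(l) = \frac{1 - s + K_l}{1 + K_l}.$$

Hence,

$$\mu = (1-s)^{-2l}\,\frac{1 + K_0}{1 + (1-s)^{-l}K_0}\cdot\frac{1 + K_l}{1 + (1-s)^{-l}K_l}.$$

But

$$K_l = \frac{\hat{y}(l)}{\hat{x}(l)} = (1-s)^{l}\,\frac{\hat{x}(0)}{\hat{y}(0)} = (1-s)^{l}K_0^{-1}.$$

Hence,

$$\mu = (1-s)^{-2l}\,\frac{1 + K_0}{1 + K_l^{-1}}\cdot\frac{1 + K_l}{1 + K_0^{-1}} = (1-s)^{-2l}K_0K_l = (1-s)^{-l},$$

which entails that the geometric mean of the mean fitnesses over the $2l$ generations is $\sqrt{1-s}$.

Thus, we finally have

$$\lambda_1 = 1 \quad\text{and}\quad \lambda_2 = \frac{(1-c)^{2l}}{(1-s)^{l}}.$$

Since the corresponding eigenvectors are orthogonal, we conclude that the fixation of social learners is stable to invasion by individual learners if $(1-c)^{2} < 1-s$.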
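The periodicity and the value of μ obtained in this appendix can be verified by direct iteration. The following sketch is an illustration under our own assumptions: it iterates only the social-learner subsystem, in which x is the frequency of social learners with the correct behavior and y = 1 − x, using the ratio recursions given above. The starting frequency and parameter values are arbitrary.

```python
import math


def simulate_two_cycles(x0: float, s: float, l: int):
    """Iterate 2l generations with social learners fixed.

    The environment switches at generations 0 and l. Returns the final
    frequency of correct behavior and the list of mean fitnesses.
    """
    x, y = x0, 1.0 - x0
    mean_fitnesses = []
    for generation in range(2 * l):
        if generation % l == 0:
            # Environmental change: formerly correct behavior is now wrong.
            w = 1.0 - s * x
            x, y = y / w, (1.0 - s) * x / w
        else:
            # No change: wrong behavior is selected against.
            w = 1.0 - s * y
            x, y = x / w, (1.0 - s) * y / w
        mean_fitnesses.append(w)
    return x, mean_fitnesses


def geometric_mean_fitness(mean_fitnesses) -> float:
    """Geometric mean of the per-generation mean fitnesses."""
    return math.prod(mean_fitnesses) ** (1.0 / len(mean_fitnesses))


def lambda_2(s: float, c: float, l: int) -> float:
    """Eigenvalue governing invasion by individual learners."""
    return (1.0 - c) ** (2 * l) / (1.0 - s) ** l


def sl_fixation_resists_il(s: float, c: float, l: int = 1) -> bool:
    """True when lambda_2 < 1, i.e., when (1 - c)^2 < 1 - s."""
    return lambda_2(s, c, l) < 1.0
```

Starting from any interior x0, the frequency returns to x0 after 2l generations, the product of the mean fitnesses equals (1 − s)^l, and their geometric mean equals the square root of 1 − s, independently of x0.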

Appendix C. Probability of an even number of environmental changes in t generations (Model 3)

Let q be the probability of an environmental change per generation, and set p = 1 − q. The probability of an even number of environmental changes in t generations, including the case of no change, is

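$$\pi = \binom{t}{0}q^{0}p^{t} + \binom{t}{2}q^{2}p^{t-2} + \cdots + \binom{t}{t}q^{t}p^{0}$$

when $t$ is even, and

$$\pi = \binom{t}{0}q^{0}p^{t} + \binom{t}{2}q^{2}p^{t-2} + \cdots + \binom{t}{t-1}q^{t-1}p^{1}$$

when $t$ is odd. In either case, we can rewrite $\pi$ as

$$\pi = \frac{1}{2}\left[(p+q)^{t} + (p-q)^{t}\right].$$

In particular, if we set $q = \varepsilon$ and $p = 1 - \varepsilon$, we obtain

$$\pi = \frac{1}{2}\left[1 + (1 - 2\varepsilon)^{t}\right].$$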
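The identity can be confirmed by comparing the binomial sum over even numbers of changes with the closed form. This is a minimal numerical check; the function names and test values are ours.

```python
from math import comb


def prob_even_changes_sum(q: float, t: int) -> float:
    """Sum the binomial probabilities of 0, 2, 4, ... changes in t generations."""
    p = 1.0 - q
    return sum(comb(t, k) * q ** k * p ** (t - k) for k in range(0, t + 1, 2))


def prob_even_changes_closed(q: float, t: int) -> float:
    """Closed form (1/2)[1 + (1 - 2q)^t]."""
    return 0.5 * (1.0 + (1.0 - 2.0 * q) ** t)


# The two expressions agree for both even and odd t.
for t in (4, 7):
    print(t, prob_even_changes_sum(0.1, t), prob_even_changes_closed(0.1, t))
```

The closed form follows because adding the binomial expansions of (p + q)^t and (p − q)^t cancels every term with an odd power of q and doubles every term with an even power.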

Appendix D. Fully polymorphic equilibria of the information decay model (Model 4)

From Eqs. (4.1c) and (4.1d), we have at equilibrium

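$$\hat{z}(1 - \hat{z})\bigl\{c - s\bigl[\varepsilon + \hat{y}(1 - \varepsilon)\bigr]\bigr\} = 0.$$

Since $0 < \hat{z} < 1$,

$$\hat{y} = \frac{c - s\varepsilon}{s(1 - \varepsilon)},$$

which is valid for $s\varepsilon < c$. Next, from Eqs. (4.1a) and (4.1b), we have

$$\frac{\hat{x}}{\hat{y}} = \frac{(1 - \hat{y})(1 - \varepsilon)}{(1 - s)\bigl[\varepsilon + \hat{y}(1 - \varepsilon)\bigr]},$$

from which we obtain

$$\hat{x} = \frac{c - s\varepsilon}{s(1 - \varepsilon)}\cdot\frac{s - c}{c(1 - s)}.$$

Hence,

$$\hat{z} = 1 - \hat{x} - \hat{y} = \frac{(s - c)(\varepsilon - c)}{c(1 - s)(1 - \varepsilon)},$$

which is valid for $c < \varepsilon$. In summary, the fully polymorphic equilibrium exists if $s\varepsilon < c < \varepsilon$.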

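Next, using $y = \hat{y} + \varepsilon_y$ and $z = \hat{z} + \varepsilon_z$ as the variables for the local stability analysis, the characteristic polynomial is

$$M(\lambda) = \begin{vmatrix} \dfrac{c - s\varepsilon}{c}\cdot\dfrac{1 - \varepsilon - (s - c)}{(1 - \varepsilon)(1 - s)} - \lambda & -\dfrac{c(1 - s)}{s(1 - c)} \\[2ex] \dfrac{(s - c)(\varepsilon - c)}{(1 - \varepsilon)c(1 - s)}\cdot\dfrac{s(c - s\varepsilon)}{c(1 - s)} & 1 - \lambda \end{vmatrix}.$$

Assuming real eigenvalues, we can write this as

$$M(\lambda) = \left[\frac{c - s\varepsilon}{c}\cdot\frac{1 - \varepsilon - (s - c)}{(1 - \varepsilon)(1 - s)} - \lambda\right](1 - \lambda) + \frac{c(1 - s)}{s(1 - c)}\cdot\frac{(s - c)(\varepsilon - c)}{(1 - \varepsilon)c(1 - s)}\cdot\frac{s(c - s\varepsilon)}{c(1 - s)}.$$

By assumption $c < s$, and the condition for existence is $s\varepsilon < c < \varepsilon$. Hence,

$$0 < \frac{c - s\varepsilon}{c}\cdot\frac{1 - \varepsilon - (s - c)}{(1 - \varepsilon)(1 - s)} < 1 \quad\text{and}\quad \frac{c(1 - s)}{s(1 - c)}\cdot\frac{(s - c)(\varepsilon - c)}{(1 - \varepsilon)c(1 - s)}\cdot\frac{s(c - s\varepsilon)}{c(1 - s)} > 0,$$

and both eigenvalues will be positive and smaller than 1. Assuming complex eigenvalues, on the other hand, we can rewrite the characteristic polynomial as

$$M(\lambda) = \lambda^{2} - \left[1 + \frac{c - s\varepsilon}{c}\cdot\frac{1 - \varepsilon - (s - c)}{(1 - \varepsilon)(1 - s)}\right]\lambda + \frac{c - s\varepsilon}{c(1 - c)},$$

where the constant term can be shown to satisfy $0 < \frac{c - s\varepsilon}{c(1 - c)} < 1$. Hence, the eigenvalues will be less than 1 in absolute value. Thus, whether the eigenvalues are real or complex, the existence of the equilibrium implies its stability.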
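The equilibrium frequencies and the stability claim can be illustrated numerically. The sketch below uses only the closed-form expressions of this appendix; the function names and the parameter values in the usage note are illustrative assumptions, and the eigenvalues are obtained as the roots of the quadratic form of M(λ).

```python
import cmath


def polymorphic_equilibrium(c: float, s: float, eps: float):
    """Return (x_hat, y_hat, z_hat); requires s*eps < c < eps and c < s."""
    if not (s * eps < c < eps and c < s):
        raise ValueError("equilibrium exists only if s*eps < c < eps and c < s")
    y_hat = (c - s * eps) / (s * (1.0 - eps))
    x_hat = y_hat * (s - c) / (c * (1.0 - s))
    z_hat = (s - c) * (eps - c) / (c * (1.0 - s) * (1.0 - eps))
    return x_hat, y_hat, z_hat


def stability_eigenvalues(c: float, s: float, eps: float):
    """Roots of lambda^2 - (1 + a) lambda + (c - s*eps) / (c (1 - c))."""
    a = (c - s * eps) / c * (1.0 - eps - (s - c)) / ((1.0 - eps) * (1.0 - s))
    constant = (c - s * eps) / (c * (1.0 - c))
    root = cmath.sqrt((1.0 + a) ** 2 - 4.0 * constant)
    return (1.0 + a + root) / 2.0, (1.0 + a - root) / 2.0
```

For instance, with c = 0.1, s = 0.5 and ε = 0.15 the three frequencies sum to one, the relation c = s[ε + ŷ(1 − ε)] holds, and both eigenvalues are smaller than 1 in absolute value.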

Appendix E. Analysis of mixed strategy model with infinite-states l-cycle (Model 5)

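To prove that the periodic equilibrium of the resident type is given by Eq. (5.3), we note from Eqs. (5.1a) and (5.1b) that

$$\frac{1}{\hat{\bar{x}}(1)} = \frac{L(1-c) + (1-L)(1-s)}{(1-L)(1-s)} = \alpha - \frac{s}{1-s},$$

and from Eqs. (5.2a) and (5.2b) that

$$\frac{1}{\hat{\bar{x}}(i)} = \frac{L(1-c) + (1-L)\bigl(1 - s\,\hat{\bar{x}}(i-1)\bigr)}{(1-L)(1-s)\,\hat{\bar{x}}(i-1)} = \alpha\cdot\frac{1}{\hat{\bar{x}}(i-1)} - \frac{s}{1-s}$$

for $2 \le i \le l$. Hence

$$\frac{1}{\hat{\bar{x}}(i)} = \alpha^{i-1}\,\frac{1}{\hat{\bar{x}}(1)} - \frac{s}{1-s}\sum_{j=0}^{i-2}\alpha^{j} = \alpha^{i} - \frac{s}{1-s}\sum_{j=0}^{i-1}\alpha^{j} = \frac{1 - \alpha + \beta\,(1 - \alpha^{i})}{1 - \alpha}.$$

Next, we obtain the condition for invasion of the mutant type. Adding Eqs. (5.1c) and (5.1d) gives

$$y(1) + \bar{y}(1) = A\,\bigl(y(0) + \bar{y}(0)\bigr),$$

where

$$A = \frac{(L + dL)(1-c) + (1 - L - dL)(1-s)}{L(1-c) + (1-L)(1-s)} > 0.$$

Similarly, adding Eqs. (5.2c) and (5.2d) gives

$$y(i) + \bar{y}(i) = B(i-1)\,\bigl(y(i-1) + \bar{y}(i-1)\bigr),$$

where

$$B(i) = \frac{(L + dL)(1-c) + (1 - L - dL)\bigl(1 - s\,\hat{\bar{x}}(i)\bigr)}{L(1-c) + (1-L)\bigl(1 - s\,\hat{\bar{x}}(i)\bigr)} > 0.$$

Hence the residents will be invaded by mutants if

$$\lambda = A\prod_{i=1}^{l-1}B(i) > 1.$$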
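The recursion for the resident equilibrium and the invasion condition can be evaluated numerically as follows. This is a sketch under stated assumptions: the expressions α = (1 − Lc)/[(1 − L)(1 − s)] and β = L(1 − c)/[(1 − L)(1 − s)] are not given in this appendix but are inferred here from the recursion above (they are the values for which the recursion and its closed form agree); the function names and parameter values are ours.

```python
def _alpha(L: float, c: float, s: float) -> float:
    # Inferred from the recursion above; requires 0 < L < 1.
    return (1.0 - L * c) / ((1.0 - L) * (1.0 - s))


def resident_x(L: float, c: float, s: float, l: int):
    """Return the resident equilibrium values for i = 1, ..., l by recursion."""
    alpha = _alpha(L, c, s)
    k = s / (1.0 - s)
    inverse = alpha - k              # reciprocal of the value at i = 1
    values = [1.0 / inverse]
    for _ in range(2, l + 1):
        inverse = alpha * inverse - k
        values.append(1.0 / inverse)
    return values


def resident_x_closed(L: float, c: float, s: float, i: int) -> float:
    """Closed form (1 - alpha) / (1 - alpha + beta (1 - alpha^i))."""
    alpha = _alpha(L, c, s)
    beta = L * (1.0 - c) / ((1.0 - L) * (1.0 - s))   # inferred, see lead-in
    return (1.0 - alpha) / (1.0 - alpha + beta * (1.0 - alpha ** i))


def invasion_eigenvalue(L: float, dL: float, c: float, s: float, l: int) -> float:
    """lambda = A * B(1) * ... * B(l-1) for a mutant using IL with prob. L + dL."""
    xs = resident_x(L, c, s, l)

    def growth(sl_fitness: float) -> float:
        numerator = (L + dL) * (1.0 - c) + (1.0 - L - dL) * sl_fitness
        denominator = L * (1.0 - c) + (1.0 - L) * sl_fitness
        return numerator / denominator

    lam = growth(1.0 - s)                        # factor A
    for i in range(1, l):
        lam *= growth(1.0 - s * xs[i - 1])       # factor B(i)
    return lam
```

A mutant identical to the resident (dL = 0) gives λ = 1, as it must. For l = 1, where the environment changes every generation, λ reduces to A, which exceeds 1 for any dL > 0 when c < s.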

Appendix F. Sign of selection gradients in Model 9

The selection gradients are

s0=Fu0m|u0m=u0,u1m=u1=-2(z-z)wexp{-(z^-z)2w}zmu0m|u0m=u0,u1m=u1,s1=Fu1m|u0m=u0,u1m=u1=-2(z^-z)wexp{-(z^-z)2w}zmu1m|u0m=u0,u1mu1.

It can readily be shown that

z-z^=z-zm|u0m=u0,u1m=u1=z(1-A-C)1-A>0,

where A and C are given by Eq. (9.3). Moreover,

(z)-1[1-u1α1-(1-u1)β1]-1(1-A)zmu0m|u0m=u0,u1m=u1=-{u1α1β0(1+α0)+α0(1-β0)[1-(1-u1)β1]}

and

(z)-1(1-A)zmu1m|u0m=u0,u1m=u1=α1(1-β1)[1+u0α0-(1-u0)β0].

Clearly,

zmu0m|u0m=u0,u1m=u1<0andzmu1m|u0m=u0,u1m=u1>0,

from which we conclude s0 < 0 and s1 > 0.

Footnotes

Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

References

  1. Aoki K. Evolution of the social-learner–explorer strategy in an environmentally heterogeneous two-island model. Evolution. 2010;64:573–581. doi: 10.1111/j.1558-5646.2010.01017.x. [DOI] [PubMed] [Google Scholar]
  2. Aoki K, Lehmann L, Feldman MW. Rates of cultural change and patterns of cultural accumulation in stochastic models of social transmission. Theor Popul Biol. 2011;79:192–202. doi: 10.1016/j.tpb.2011.02.001. [DOI] [PubMed] [Google Scholar]
  3. Aoki K, Nakahashi W. Evolution of learning in subdivided populations that occupy environmentally heterogeneous sites. Theor Popul Biol. 2008;74:356–368. doi: 10.1016/j.tpb.2008.09.006. [DOI] [PubMed] [Google Scholar]
  4. Aoki K, Wakano JY, Feldman MW. The emergence of social learning in a temporally changing environment: a theoretical model. Curr Anthropol. 2005;46:334–340. [Google Scholar]
  5. Aoki K, Wakano JY, Lehmann L. Evolutionarily stable learning schedules and cumulative culture in discrete generation models. Theor Popul Biol. 2012;81:300–308. doi: 10.1016/j.tpb.2012.01.006. [DOI] [PubMed] [Google Scholar]
  6. Arbilly M, Motro U, Feldman MW, Lotem A. Evolution of social learning when high expected payoffs are associated with high risk of failure. J R Soc Interface. 2011;8:1604–1615. doi: 10.1098/rsif.2011.0138. [DOI] [PMC free article] [PubMed] [Google Scholar]
  7. Barnard CJ, Sibly RM. Producers and scroungers: a general model and its application to captive flocks of house sparrows. Anim Behav. 1981;29:543–550. [Google Scholar]
  8. Borenstein E, Feldman MW, Aoki K. Evolution of learning in fluctuating environments: when selection favors both social and exploratory individual learning. Evolution. 2008;62:586–602. doi: 10.1111/j.1558-5646.2007.00313.x. [DOI] [PubMed] [Google Scholar]
  9. Boyd R, Richerson PJ. Culture and the Evolutionary Process. University of Chicago Press; Chicago, IL: 1985. [Google Scholar]
  10. Boyd R, Richerson PJ. An evolutionary model of social learning: the effect of spatial and temporal variation. In: Zentall T, Galef BG Jr, editors. Social Learning. Erlbaum; Hillsdale, NJ: 1988. pp. 29–48. [Google Scholar]
  11. Boyd R, Richerson PJ. Why does culture increase human adaptability? Ethol. Sociobiol. 1995;16:125–143. [Google Scholar]
  12. Bush RR, Mosteller F. A Comparison of Eight Models. In: Bush RR, Estes WK, editors. Studies in Mathematical Learning Theory. Chapter 16 Stanford University Press; Stanford, CA: 1955. pp. 293–307. [Google Scholar]
  13. Caswell H. Matrix Population Models. 2. Sinauer; Sunderland, MA: 2001. [Google Scholar]
  14. Chudek M, Heller S, Birch S, Henrich J. Prestige-biased cultural learning: bystander’s differential attention to potential models influences children’s learning. Evol Hum Behav. 2012;33:46–56. [Google Scholar]
  15. Claidière N, Bowler M, Whiten A. Evidence for weak or linear conformity but not for hyper-conformity in an everyday social learning context. PLoS One. 2012;7 (2):1–8. doi: 10.1371/journal.pone.0030970. [DOI] [PMC free article] [PubMed] [Google Scholar]
  16. Dubois F, Morand-Ferron J, Giraldeau LA. Learning in a game context: strategy choice by some keeps learning from evolving in others. Proc R Soc Lond (Biol ) 2010;277:3609–3616. doi: 10.1098/rspb.2010.0857. [DOI] [PMC free article] [PubMed] [Google Scholar]
  17. Eliassen S, Jørgensen C, Mangel M, Giske J. Quantifying the adaptive value of learning in foraging behavior. Amer Nat. 2009;174:478–489. doi: 10.1086/605370. [DOI] [PubMed] [Google Scholar]
  18. Enquist M, Eriksson K, Ghirlanda S. Critical social learning: a solution to Rogers’s paradox of nonadaptive culture. Am Anthropol. 2007;109:727–734. [Google Scholar]
  19. Eriksson K, Coultas JC. Are people really conformist-biased? An empirical test and a new mathematical model. J Evol Psych. 2009;7:5–21. [Google Scholar]
  20. Eriksson K, Enquist M, Ghirlanda S. Critical points in current theory of conformist social learning. J Evol Psych. 2007;5:67–87. [Google Scholar]
  21. Feldman MW, Cavalli-Sforza LL. Cultural and biological evolutionary processes II: gene-culture disequilibrium. Proc Natl Acad Sci. 1984;81:1604–1607. doi: 10.1073/pnas.81.5.1604. [DOI] [PMC free article] [PubMed] [Google Scholar]
  22. Feldman MW, Aoki K, Kumm J. Individual versus social learning: evolutionary analysis in a fluctuating environment. Anthropol Sci. 1996;104:209–213. [Google Scholar]
  23. Giraldeau L-A, Caraco T. Social Foraging Theory. Princeton University Press; Princeton, NJ: 2000. [Google Scholar]
  24. Hanania MI. A generalization of the Bush–Mosteller model with some significance tests. Psychometrika. 1959;24 (1):53–68. [Google Scholar]
  25. Henrich J. Demography and cultural evolution: how adaptive cultural processes can produce maladaptive losses—the Tasmanian case. Am Antiquity. 2004;69:197–214. [Google Scholar]
  26. Henrich J, Boyd R. The evolution of conformist transmission and the emergence of between-group differences. Evol Hum Behav. 1998;19(4):215–241. [Google Scholar]
  27. Henrich J, Broesch J. On the nature of cultural transmission networks: evidence from Fijian villages for adaptive learning biases. Phil Trans R Soc Lond B Bio. 2011;366:1139–1148. doi: 10.1098/rstb.2010.0323. [DOI] [PMC free article] [PubMed] [Google Scholar]
  28. Izquierdo LR, Izquierdo SS. Dynamics of the Bush–Mosteller learning algorithm in 2 2 games. In: Weber C, Elshaw M, Mayer NM, editors. Reinforcement Learning: Theory and Applications. I-Tech Education and Publishing; Vienna, Austria: 2008. pp. 199–224. [Google Scholar]
  29. Kahneman D, Tversky A. Prospect theory: an analysis of decision under risk. Econometrica. 1979;47(2):263–292.
  30. Katsnelson E, Motro U, Feldman MW, Lotem A. Individual-learning ability predicts social-foraging strategy in house sparrows. Proc R Soc Lond (Biol) 2011;278:582–589. doi: 10.1098/rspb.2010.1151.
  31. Kendal J, Giraldeau L-A, Laland K. The evolution of social learning rules: payoff-biased and frequency-dependent biased transmission. J Theoret Biol. 2009;260:210–219. doi: 10.1016/j.jtbi.2009.05.029.
  32. Kobayashi Y, Aoki K. Innovativeness, population size and cumulative cultural evolution. Theor Popul Biol. 2012;82:38–47. doi: 10.1016/j.tpb.2012.04.001.
  33. Lachlan RF, Janik VM, Slater JB. The evolution of conformity-enforcing behavior in cultural communication systems. Anim Behav. 2004;68:561–570.
  34. Lehmann L, Feldman MW. Coevolution of adaptive technology, maladaptive culture, and population size in a producer–scrounger game. Proc R Soc Lond (Biol) 2009;276:3853–3862. doi: 10.1098/rspb.2009.0724.
  35. Lehmann L, Feldman MW, Kaeuffer R. Cumulative cultural dynamics and the coevolution of cultural innovation and transmission: an ESS model for panmictic and structured populations. J Evol Biol. 2010;23(11):2356–2369. doi: 10.1111/j.1420-9101.2010.02096.x.
  36. Maynard Smith J. Evolution and the Theory of Games. Cambridge University Press; Cambridge: 1982.
  37. McElreath R, Strimling P. When natural selection favors imitation from parents. Curr Anthropol. 2008;49:307–316.
  38. Mesoudi A. How cultural evolutionary theory can inform social psychology and vice versa. Psychol Rev. 2009;116(4):929–952. doi: 10.1037/a0017062.
  39. Mesoudi A. Variable cultural acquisition costs constrain cumulative cultural evolution. PLoS One. 2011;6(3):1–10. doi: 10.1371/journal.pone.0018239.
  40. Nakahashi W. The evolution of conformist transmission in social learning when the environment changes periodically. Theor Popul Biol. 2007;72:52–66. doi: 10.1016/j.tpb.2007.03.003.
  41. Nakahashi W. Evolution of learning capacities and learning levels. Theor Popul Biol. 2010;78(3):211–224. doi: 10.1016/j.tpb.2010.08.001.
  42. Nakahashi W, Wakano JY, Henrich J. Adaptive social learning strategies in temporally and spatially varying environments. Hum Nat. 2012;23:386–418. doi: 10.1007/s12110-012-9151-y.
  43. Powell A, Shennan S, Thomas MG. Late Pleistocene demography and the appearance of modern human behavior. Science. 2009;324:1298–1301. doi: 10.1126/science.1170165.
  44. Rendell L, Fogarty L, Laland KN. Rogers’ paradox recast and resolved: population structure and the evolution of social learning strategies. Evolution. 2010;64(2):534–548. doi: 10.1111/j.1558-5646.2009.00817.x.
  45. Rodriguez-Gironés MA, Vásquez RA. Density-dependent patch exploitation and acquisition of environmental information. Theor Popul Biol. 1997;52(1):32–42. doi: 10.1006/tpbi.1997.1317.
  46. Rogers AR. Does biology constrain culture? Am Anthropol. 1988;90(4):819–831.
  47. Stephens DW. Change, regularity, and value in the evolution of animal learning. Behav Ecol. 1991;2(1):77–89.
  48. Wakano JY, Aoki K. A mixed strategy model for the emergence and intensification of social learning in a periodically changing natural environment. Theor Popul Biol. 2006;70(4):486–497. doi: 10.1016/j.tpb.2006.04.003.
  49. Wakano JY, Aoki K. Do social learning and conformist bias coevolve? Henrich and Boyd revisited. Theor Popul Biol. 2007;72(4):504–512. doi: 10.1016/j.tpb.2007.04.003.
