Abstract
The interaction between phenotypic plasticity, e.g. learning, and evolution is an important topic both in Evolutionary Biology and Machine Learning. The evolution of learning is commonly studied in Evolutionary Biology, while the use of an evolutionary process to improve learning is of interest to the field of Machine Learning. This paper takes a different point of view by studying the effect of learning on the evolutionary process, the so-called Baldwin effect. A well-studied result in the literature about the Baldwin effect is that learning affects the speed of convergence of the evolutionary process towards some genetic configuration, which corresponds to the environment-induced plastic response. This paper demonstrates that learning can change the outcome of evolution, i.e., lead to a genetic configuration that does not correspond to the plastic response. Results are obtained both analytically and experimentally by means of an agent-based model of a foraging task, in an environment where the distribution of resources follows seasonal cycles and the foraging success on different resource types is conditioned by trade-offs that can be evolved and learned. This paper attempts to answer a question that has been overlooked: whether learning has an effect on what genotypic traits are evolved, i.e. the selection of a trait that enables a plastic response changes the selection pressure on a different trait, in what could be described as co-evolution between different traits in the same genome.
1 Introduction
The so-called Baldwin effect [1] is a much-debated theory in the literature of evolution [2] about how new features are inherited by an individual with phenotypic plasticity [3–5]. Baldwin proposed this new “factor in evolution” [1] to explain how complex features such as an eye can evolve [6–8], as an alternative to the then-popular Lamarckian evolution, which assumed that traits acquired by an individual through phenotypic plasticity would be transferred directly to its offspring’s genome [9]. This idea went unnoticed until the late 1990s, when it caught the interest of the fields of Psychology, in reference to the evolution of human learning, and Computer Science, in reference to evolutionary computation, machine learning, and artificial life. Only from the mid-2000s did the Baldwin effect start gaining ground in the field of Evolutionary Biology [10].
Given the long debate surrounding the Baldwin effect, there are different definitions of it with different levels of generality, e.g. “The Baldwin Effect, states that learned behavior and characteristics at the level of individuals can significantly affect evolution at the level of species” [11], while Schull relates the Baldwin effect to statements such as “individual developmental responses will necessarily lead to directed and non-random evolutionary change” [12]. The open peer commentaries of [12] highlight different conflicting stances regarding the Baldwin effect and its definition. The working definition of the Baldwin effect used in this paper is: plasticity is a “positive driving force of evolution” that affects the selection pressure such that “standing genetic variation can be selected upon so that evolution can proceed in the direction of the induced plastic response” [8]. According to this definition, the Baldwin effect describes the evolution of a “target” genotypic trait that corresponds to the environment-induced plastic response at the phenotypic level. In other words, the induced plastic response determines the direction towards which the genotype evolves. This definition is especially relevant when considering biologically inspired optimization techniques [13].
A well-known example of the Baldwin effect is that learning, i.e. an instance of phenotypic plasticity, affects the evolutionary process by either speeding up or slowing down the evolution of the “target” genetic configuration.
This work demonstrates that this definition is too restrictive, as a genotypic trait is shown to evolve that differs from the environment-induced plastic response. The term Baldwin veering effect is introduced to refer to this new finding and defined as follows: a change in the selection pressure of genetic variations, caused by phenotypic plasticity and induced plastic responses, leads evolution in a different direction from that indicated by the induced plastic response. In other words, the Baldwin veering effect happens when plasticity causes a trait to evolve that does not correspond to the environment-induced plastic response.
In order to demonstrate the existence of the Baldwin veering effect, the following two conditions have to be verified:
(i) A trait evolves that differs from the induced plastic response, i.e. the genome and the phenotype converge towards different trait values.
(ii) The evolution of such a trait is caused by plasticity, i.e. the genome converges towards different trait values in the presence or absence of plasticity.
The effect of plasticity—we choose learning among many potential mechanisms, e.g. polyphenism [14]—on evolution is studied both computationally by means of an agent-based model of a foraging task, modeled after previous work [15, 16], and analytically by means of a mathematical model [17]. Computational experiments and analytical results in a cyclically-changing environment demonstrate the existence of both the Baldwin effect and of the Baldwin veering effect. Specifically, it is found that in a quickly-changing cyclical environment, learning agents evolve a generalist foraging strategy that allows them to adapt quickly to changes in the resource distribution. A generalist configuration is never induced, i.e. learned, at the phenotypic level. Analytical results confirm that plasticity changes the fitness landscape in a way that makes a generalist configuration a global optimum in the space of genotypes.
The novelty of this result is to expand the understanding of the effect of plasticity on evolution by demonstrating that plasticity can affect both the speed and the outcome of evolution. A fundamental difference between this result and previous work [14, 18, 19] is that learning is shown to change not only the phenotype but also the genotype.
The main contributions of this paper are to show that in a cyclically-changing environment: (I) the well-known Baldwin effect is present, (II) the novel Baldwin veering effect is present, (III) a mathematical model captures this new effect and confirms the experimental findings, and (IV) the existence of this new effect depends only on the relation between the speed of learning and the frequency of change in the environment.
2 Methods
The computational model follows the agent-based methodology [20] by studying the interactions of a population of software agents, subject to an evolutionary process [21], that perform a foraging task [15, 22], i.e. search for food in a grid-like environment. This model builds on the extensive research in the artificial life community, where software agents have been provided with learning mechanisms [11, 23–26] in an evolutionary context. The time-step driven simulation model is based on previous work [27] and favors simplicity over realism. Modeling realistic entities and ecosystems is outside the scope of this work.
This section provides an overview of the essential components of the computational model, the possible events occurring at every simulated abstract time-unit, and a detailed description of the environment, agents, decision making, and evaluation metrics. Subsection 2.1 describes the cyclically-changing environment, the resources to be foraged and their seasonality. Subsection 2.2 describes the agents’ most relevant parameters (aptitude and skill) and the trade-offs that these parameters cause during foraging, the relation between foraging and energy levels, and how energy levels affect fitness, reproduction, and death. Subsection 2.3 details the agent behavior (reactive and learning); additionally, it explains how reproduction affects the parameters related to the decision-making process. Finally, subsection 2.4 defines the measures used to evaluate the agents’ behavior. Table 1 contains an overview of the notation used throughout the description of the model. The results presented in this paper are the outcome of 300 Monte Carlo type simulations for each specific scenario.
Table 1. Summary of the mathematical notation used in order of appearance in the text.
Math symbol | Description |
---|---|
 | The set of all N agents ever alive in the simulation |
R = {r0, …, rM} | The set of M resource types |
 | Set of all cells containing resources |
Φ | The maximum quantity of resource that any cell can contain |
 | The quantity of resources of type r in cell i at time t |
 | The configuration of the environment at time t |
 | The time steps, t of the simulation |
 | The skill level of agent a at time t |
 | The population at time t |
f(a, t) | The fitness function |
ϵ | Energy level increased by successful foraging |
g | The foraging success function of agent a for resource type r |
 | The decision function which determines the behavior of agent a at time t |
O = {o1, …, on} | The set of n possible actions |
Pf | The probability at time t of agent a to forage resources of type r |
Pr | The probability of reproduction of agent a at time t, capped at cr |
cr | The normalization constant of reproduction |
Pd | The probability of death of agent a at time t, capped at cd |
 | The age function, linearly increasing in time. |
cd | The normalization constant of death |
is visible to a } | The perception vector of agent a at time t |
 | The foraging history of agent a and resource type r at time t |
Ta,r = {t ∈ T: a chooses to eat r} | The times at which agent a executes a foraging action on a resource of type r |
 | The simulation length |
 | The length of seasons |
 | The foraging history of agent a at time t |
 | The behavior function which assigns a value to every action |
2.1 Environment
The environment is modeled as a square grid of size m × m with continuous boundary conditions in which agents can move. Every grid cell can contain one of the two resource types, i.e. |R| = 2, whose proportions vary over time [28] such that in every “season” a specific resource is more abundant than the other.
2.1.1 Food sources
The number of cells with resources, |Ft|, is constant at every point in time: whenever one cell is emptied, a random quantity of resources of the same type spawns at a random location. New food sources are initialized to contain a random quantity of food, driven by the parameter Φ that determines the abundance of food.
2.1.2 Seasons
The environment cycles periodically between two different configurations, named seasons [14, 28], which determine what resources are available for agents to forage. Foraging of different resource types is subject to trade-offs: the more an agent specializes in the gathering and consumption of one resource, the less effectively it forages the other resource, e.g. due to neophobia [29], a non-transferable skill set or other constraints, e.g. energy or memory constraints. This trade-off is modeled by a single skill parameter that determines the probability of success of foraging two resource types [30]. Environmental change is a known requirement for the evolution of learning, and seasons offer enough predictability for learning to be effective [31].
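The seasonal cycle described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names and the `majority_share` parameter (the fraction of resources belonging to the abundant type) are assumptions introduced here, since the text does not state the actual proportions.

```python
def abundant_resource(t, season_length):
    """Index (0 or 1) of the resource type that is more abundant at time step t.

    Assumes two seasons of equal length that alternate periodically.
    """
    return (t // season_length) % 2


def resource_proportions(t, season_length, majority_share=0.75):
    """Return (pi0, pi1), the proportions of the two resource types at time t.

    majority_share is a hypothetical parameter: the share of the resource
    that is favoured by the current season.
    """
    if abundant_resource(t, season_length) == 0:
        return (majority_share, 1.0 - majority_share)
    return (1.0 - majority_share, majority_share)


# Example: with seasons of 100 steps, step 150 falls in the second season.
print(resource_proportions(150, season_length=100))
```

A short season length relative to the agents' lifetime corresponds to what the text calls a quickly-changing environment, and a long one to a slowly-changing environment.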
2.2 Agents
The agents serve as an abstract model for simple biological entities, which need to find and forage food in order to survive and reproduce. Agents are able to perceive their surroundings, defined as their range-one Moore neighborhood in the grid-like environment; the perception vector is denoted as . A range-one Moore neighborhood in a two-dimensional square grid comprises the eight surrounding cells (two horizontal, two vertical, and four diagonal).
Agent actions can either be a movement, that displaces them by one cell in the environment, or foraging, that consumes any available food in their current location. A foraging action fails if the current cell does not contain any resource, or randomly with probability otherwise (for agent a with skill level s at time t for resource type r).
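As an illustration of the perception described above, the following sketch computes the range-one Moore neighborhood on an m × m grid. It assumes that the "continuous boundary conditions" of Section 2.1 mean a wrap-around (toroidal) grid, and it represents the grid as a dictionary from coordinates to resource type; both choices, like the function names, are assumptions made for this example.

```python
def moore_neighborhood(x, y, m):
    """Return the eight range-one neighbors of cell (x, y) on an m x m grid.

    Coordinates wrap around at the borders (assumed toroidal grid, m >= 3).
    """
    return [((x + dx) % m, (y + dy) % m)
            for dx in (-1, 0, 1)
            for dy in (-1, 0, 1)
            if (dx, dy) != (0, 0)]


def perceive(x, y, grid, m):
    """Perception sketch: the resource type (or None) seen in each neighbor.

    grid maps (x, y) -> resource type for cells that contain food; the amount
    of food is deliberately not part of the perception.
    """
    return [grid.get(cell) for cell in moore_neighborhood(x, y, m)]


# Example: an agent in the corner of a 5 x 5 grid sees food across the border.
food = {(4, 4): 1, (1, 0): 0}
print(perceive(0, 0, food, m=5))
```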
2.2.1 Aptitude and skill
The foraging strategy of an agent is determined by two parameters: (i) aptitude, which defines the value encoded in the genome and inherited from the parent, and (ii) skill, which defines the corresponding phenotypic expression and models the trade-off of specialization in a specific resource type [30] by influencing the probability of successful foraging.
For this reason, the skill of an agent is a determining factor for its energy intake, its ability to reproduce, and consequently its fitness. The aptitude remains constant during the whole lifetime of an individual and changes only between generations via random mutations during reproduction. The initial value of skill at birth is determined by the inherited value of aptitude. If the skill parameter is plastic, i.e. adapts to the environment during the agent’s lifetime, then the value of aptitude influences the energy intake of learning agents only indirectly.
2.2.2 Energy level
The energy level of an individual depends on three factors: (i) the availability of resources in the environment at each given time, (ii) the individual skill which determines the probability of successful foraging, and (iii) the individual behavior which determines what actions to execute for a given configuration of the environment.
More formally, the fitness function f(a, t) of an agent at time t is defined as the total energy intake:
(1) |
Where is the foraging history and ϵ is the energy level increase factor.
Fitness depends on the foraging success function g:
(2) |
2.2.3 Foraging, reproduction, death, and fitness
The experimental design introduces a trade-off between the foraging success of different resource types, determined by the skill : agents can either become generalists, i.e. be able to forage both resources with a low probability, or specialize, i.e. be able to forage one resource with a high probability and lose the ability to forage the other.
Successful foraging increases the energy of an individual which determines the probability of reproduction. As agents compete for the same limited resources, efficient foraging translates to high reproduction rate.
The probabilities of foraging Pf, reproduction Pr and of death Pd are defined as:
(3) |
With a linear relation between skill and probability of foraging success, i.e. q = 1, the average total intake of an agent is equivalent to the average resource distribution: a specialist agent forages one type of resource with certainty but none of the other, while a generalist agent forages each resource with 50% probability. With a non-linear relation between skill and foraging probability instead, i.e. q > 1, specialization leads to higher fitness than generalization.
The effects of these values can be found in the supplementary material, S3–S5 Figs.
The framework determines the reproduction and death events by means of a genetic operator called roulette wheel selection with stochastic acceptance (as in Torney et al. 2011 [32]), according to which agents reproduce asexually with a probability Pr proportional to their fitness and die with a probability Pd proportional to their age. Upon reproduction, the energy level ϵ of the parent is split equally between the parent and the offspring and the offspring inherits a randomly-mutated copy of the parent’s genetic configuration.
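For readers unfamiliar with the operator, the sketch below shows the generic form of roulette wheel selection with stochastic acceptance, together with the equal energy split on reproduction. It is an illustration under assumptions: the function names are hypothetical, and the way the model maps fitness and age to the capped probabilities Pr and Pd is given by Eq (3), which this sketch does not attempt to reproduce.

```python
import random


def select_by_stochastic_acceptance(weights, rng=random):
    """Pick an index with probability proportional to its weight.

    A candidate is drawn uniformly at random and accepted with probability
    weights[i] / max(weights); otherwise a new candidate is drawn.
    """
    w_max = max(weights)
    if w_max <= 0:
        raise ValueError("at least one weight must be positive")
    while True:
        i = rng.randrange(len(weights))
        if rng.random() < weights[i] / w_max:
            return i


def split_energy(parent_energy):
    """Split the parent's energy equally between parent and offspring."""
    half = parent_energy / 2.0
    return half, half


# Example: the fittest agent (index 2) is selected most often.
fitness = [1.0, 2.0, 7.0]
picks = [select_by_stochastic_acceptance(fitness) for _ in range(1000)]
print(picks.count(2) / len(picks))
```

The appeal of stochastic acceptance is that it needs only the maximum weight, not a cumulative sum over the whole population, which is convenient when the population changes at every time step.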
2.3 Agent behavior
An agent’s desired behavior associates the desired action to each perception vector , containing a representation of the surroundings that informs about the location and presence of resources. This mapping between perception and action can be achieved by different techniques, e.g. an artificial neural network. The success of the desired action is determined by the skill value, which is defined as the phenotypic expression of the aptitude genotype.
The aptitude and the mapping B(a, t) change from one generation to the next due to random mutations, and learning allows the inherited skill and the phenotypic expression of the mapping B(a, t) to be more suited to the current state of the environment.
2.3.1 Agent types and learning
Two types of agents are introduced: reactive agents keep their behavior and skill constant throughout their lifetime, as they are a direct expression of the genotype, while learning agents adapt their behavior and skill according to their experience via reinforcement learning [16, 25, 26, 33–35]. Learning optimizes the expected reward associated with successfully foraging a resource of any type. Different reinforcement learning architectures are evaluated: Q-Learning [36], reinforcement learning based on a Restricted Boltzmann Machine [37], Deep Reinforcement Learning [38] and reinforcement learning based on a single feed-forward perceptron. The results presented in the main text are based on a single feed-forward perceptron (see the supplementary material for further details, Section B in S1 File).
If learning is disabled (reactive agents), weights and skills cannot be learned and remain constant and equal to the inherited value for the whole lifetime of the individual, hence individuals are selected based on their inherited aptitude value. If learning is enabled, the behavior can adapt to changes in the environment. Specifically, the adaptation process happens through directly increasing the skill value after every successful foraging event and by using the successful foraging event as a reward signal in the reinforcement learning algorithm.
2.3.2 Genotype and mutations
Upon reproduction, an offspring is generated that contains a mutated copy of the parent’s genome, consisting of the initial weights of the neural network, prior to any learning, and an additional gene called aptitude. These values are used to initialize the phenotype of the offspring.
2.3.3 Reinforcement learning
Although modeling biologically realistic entities is outside the scope of this paper, studies of the biological feasibility of different learning techniques, including different versions of reinforcement learning, have shown that reinforcement learning is able to reproduce certain human decision-making processes and equilibria [39–41]. More recently it has been shown that human-level strategies can arise from reinforcement learning-based systems, even without human data [42]. In the case of this implementation of reinforcement learning, the behavior function of an agent a takes the form of which indicates the Q-values for all actions and state . The mapping between perceptions and actions is done via a neural network. Agents perceive their environment: specifically, they are able to see a subset of the grid centered at their location (range one Moore neighborhood) and are able to identify food sources within this visual range, . For the current model, a 3 × 3 region is observable, and food sources are observable but the amount of food they contain is not. Based on this perception and using the neural-network-based choice model, agents choose an action from their action space: move (north, south, east, west) or eat. Agents with different learning algorithms (Neural Network type) behave differently when faced with a variable environment, in terms of convergence and adaptation to change (see the supplementary material, Sections B and C in S1 File).
The agent skill is learned by increasing its value by ΔS each time the agent forages successfully, while for the choice of action the learning algorithms are based on the reinforcement learning approach Q-Learning [36]. The Q-Table, a mapping from states/perceptions and possible actions O to the quality value of each action for that state , of the original Q-Learning approach is replaced by a Q-Network as per [38] and the corresponding algorithm for the specific Q-Network structure is used for its training. The following equations describe the update to the quality values:
(4) |
(5) |
The results presented in the main text are based on reinforcement learning using a single-layer feed-forward perceptron as its network architecture to “store” and query the Q-values, trained with backpropagation (PQL). The Q-network structure is where is an input vector, W are the weights of the neural network and β the biases associated to the input layer. Further details about the variations depending on different neural network structures can be found in the supplementary material, Sections B and C in S1 File.
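To make the learning mechanism more concrete, the following sketch combines the two ingredients described in this section: a skill that grows by ΔS after each successful foraging event, and a Q-network with weights W and biases β that is updated from a reward signal. It is a hedged illustration, not the PQL implementation used in the paper: the purely linear network, the standard one-step Q-learning target, the ε-greedy action choice, the cap of the skill at 1, and all parameter values (`lr`, `gamma`, `epsilon`, `delta_s`) are assumptions.

```python
import random

ACTIONS = ["north", "south", "east", "west", "eat"]


class LinearQNetwork:
    """Single-layer Q-network: one linear unit per action (assumed form)."""

    def __init__(self, n_inputs, n_actions, rng=random):
        self.W = [[rng.uniform(-0.01, 0.01) for _ in range(n_inputs)]
                  for _ in range(n_actions)]
        self.beta = [0.0] * n_actions

    def q_values(self, p):
        """Q-value of every action for perception vector p."""
        return [sum(w * x for w, x in zip(row, p)) + b
                for row, b in zip(self.W, self.beta)]

    def update(self, p, action, reward, p_next, lr=0.1, gamma=0.9):
        """One gradient step on the squared temporal-difference error."""
        target = reward + gamma * max(self.q_values(p_next))
        error = target - self.q_values(p)[action]
        for k, x in enumerate(p):
            self.W[action][k] += lr * error * x
        self.beta[action] += lr * error
        return error


def choose_action(net, p, epsilon=0.1, rng=random):
    """Epsilon-greedy choice over the Q-values (exploration is an assumption)."""
    if rng.random() < epsilon:
        return rng.randrange(len(net.beta))
    q = net.q_values(p)
    return q.index(max(q))


def learn_skill(skill, success, delta_s=0.05):
    """Increase the skill by delta_s after a successful foraging event."""
    return min(1.0, skill + delta_s) if success else skill


# Example: reward the "eat" action once and observe the greedy choice.
net = LinearQNetwork(n_inputs=9, n_actions=len(ACTIONS))
p = [1.0] + [0.0] * 8
net.update(p, ACTIONS.index("eat"), reward=1.0, p_next=p)
print(ACTIONS[choose_action(net, p, epsilon=0.0)])
```

In this sketch a reactive agent corresponds to never calling `update` or `learn_skill`, so that weights and skill stay at their inherited values.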
2.4 Measures
The degree of specialization of a population is measured with different metrics:
(I) the distribution of individual aptitudes across the population, according to which a higher frequency of extreme values corresponds to a more specialized population, (II) the individual foraging history, i.e. the frequency of successful foraging actions for a specific resource type, according to which extreme values indicate a specialized diet, (III) standard measures of group behavior that quantify the rate of consumption of resources ([43], page 241).
The degree of specialization of the population is measured by the distribution of aptitudes (I) at each given timestep, normalized by the population size at that timestep:
(6) |
The foraging history (II) of the population at value x is measured as the frequency of individuals in the population who, during their lifetime, foraged a specific proportion of type r resources corresponding to x:
(7) |
Additionally, standard measures of group behavior (III), taken from [43], page 241, are used to quantify the specialization of the population. The measures are defined and explained in the supplementary materials, Section H in S1 File.
While (I) measures the characteristics of the genotype, (II) and (III) measure the behavior of the agents which is determined by the phenotype.
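Measures (I) and (II) can be approximated with simple histograms. The sketch below is an illustration under assumptions: the number of bins, the list-based representation of a foraging history, and the function names are choices made for this example and are not taken from Eqs (6) and (7).

```python
def aptitude_distribution(aptitudes, n_bins=10):
    """Fraction of the population whose aptitude falls in each bin of [0, 1].

    The counts are normalized by the population size at that time step.
    """
    counts = [0] * n_bins
    for a in aptitudes:
        counts[min(int(a * n_bins), n_bins - 1)] += 1
    n = len(aptitudes)
    return [c / n for c in counts] if n else [0.0] * n_bins


def foraging_share(history, resource=0):
    """Proportion of an agent's foraged resources that were of type `resource`.

    history is the list of resource types the agent foraged successfully.
    """
    return history.count(resource) / len(history)


# Example: a population split between the two extremes (specialists).
print(aptitude_distribution([0.0, 0.05, 0.95, 1.0]))
print(foraging_share([0, 0, 0, 1]))
```

A distribution with most of its mass in the first and last bins indicates a specialized population, while mass concentrated in the middle bins indicates generalists.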
3 Results: Computational model
3.1 The Baldwin effect
Previous work in the literature about the Baldwin effect found that the evolutionary process can be either sped up or slowed down [2] depending on the learning mechanism, the fitness function and the starting conditions of the population. Simulations are performed to verify whether or not the Baldwin effect exists in a cyclical environment, a question that, to the best of our knowledge, has not been answered before [17].
The existence of the Baldwin effect is evaluated by means of simulation by comparing the speed of genetic assimilation of phenotypic features as a function of the learning ability.
Fig 1 shows a comparison over time of three agent types in terms of the genetic assimilation of aptitude values due to changes in skill value:
Reactive agents: baseline, i.e. unable to learn.
Learning (Actions): agents that can modify their own actions through learning; a speedup in the genetic assimilation is observed.
Learning (Actions & Skill): agents that can modify their own actions and their skill through learning; a slowdown in the genetic assimilation is observed.
The only difference between agent types concerns what traits can be learned. All other parameters of the learning algorithms are constant across types. The dependence of the speed of genetic assimilation on the degree of learning confirms the presence of the Baldwin effect.
3.2 A new effect: The Baldwin veering effect
This experiment investigates whether the Baldwin veering effect exists, i.e. a trait evolves by effect of plasticity that does not correspond to the plastic response induced by a cyclically-changing environment.
Slowly-changing environments allow populations to adapt via natural selection. Learning helps natural selection traverse the space of genetic configurations [44], and does so on a shorter timescale; therefore learning might speed up or delay this process. In quickly-changing environments, which change faster than the evolutionary timescale, learning and natural selection take on two different roles: learning improves the behavior of agents in response to environmental variability, while natural selection improves the efficiency of learning.
The Baldwin veering effect is present if the following two conditions are verified: (i) the evolved trait differs from the environment-induced plastic response, i.e. the genome and the phenotype converge towards different values, and (ii) this effect is determined by the presence of plasticity, i.e. the genetic configuration evolved by learning agents differs from that evolved under the same conditions by reactive agents.
Two genetic configurations are considered: a specialist configuration is defined as a genome whose aptitude evolves to one of the extreme values, i.e. specializes in either resource type, while a generalist configuration is defined as a genome whose aptitude evolves to an intermediate value. Different genetic configurations correspond to different initial learning efforts in terms of time required to adapt to the environment; assuming that an individual has the same probability of being born in either season, the optimal genetic configuration should reduce equally the effort of learning either skill.
The first condition is verified in Fig 2 by comparing the evolution over time of the inherited aptitude of populations of learning agents in a slowly-changing (Left) and in a quickly-changing environment (Right). Being able to quickly adapt to changes in the environment, the phenotype of learning agents tracks changes in resource availability, hence the induced plastic response is a specialized strategy corresponding to the most abundant resource type. In a slowly-changing environment, the genetic trait evolves towards the induced plastic response, while in a quickly-changing environment, the genetic trait evolves an intermediate value that corresponds to a generalist strategy which is not induced at the phenotypic level.
The second condition is verified in Fig 3 by comparing the evolution over time of the inherited aptitude of a population of reactive agents (Left) with that of a population of learning agents (Right). Both populations are initialized with an intermediate aptitude value, which evolves over time until it converges to some configuration after around 4000 timesteps. Reactive agents evolve extreme aptitude values, i.e. a specialist configuration. Specifically, half of the population evolves a high aptitude value (specialist in one type of resource) and the other half a low aptitude value (specialist in the other type of resource). The foraging success of reactive agents is determined directly by the static skill as inherited from the aptitude value, hence each half of the population specializes in foraging one or the other type of resource. For learning agents the foraging success is determined by the skill level, whose initial adaptation effectiveness is determined by the aptitude value. As a consequence, learning agents evolve an intermediate aptitude value, i.e. a generalist configuration, which allows them to adapt quickly to any environmental condition. Fig 4 highlights the difference between genetic configurations evolved by the two populations at the end of the simulation.
In the following section, we present further supporting evidence for these results.
3.3 Differences in individual behaviors
In order to verify that a difference in genetic configuration actually results in different behaviors, in this section we analyze reactive agents instantiated with the genetic configurations of the agents that are alive during the last time-step of the previous simulations (see Fig 4). In order to produce a fair comparison, reproduction is also disabled; this way, the only variable in the simulations is the genetic configuration, which remains constant during these simulations and is expressed directly in the phenotype.
The goal of this new set of simulations is to quantify the difference between genetic configurations evolved by different populations; this is achieved by evaluating the behavior that such configurations encode.
In these new simulations, the environment is set to have only one season and contains an equivalent quantity of both types of resources. An abundance of both resource types allows any foraging strategy to perform at its best, hence contributing to a fair comparison of different foraging strategies in terms of foraging success. The behavior of individuals is compared with the measures of foraging history and of group behavior, which are described in Section 2.4.
Fig 5 shows the foraging history of the two populations of study. Additionally, it shows the foraging history of two baseline populations. These baseline populations are also reactive agents, instantiated with genetic configurations specifically aimed to produce specialist behavior (being able to eat only one food type with high probability) and generalist behavior (being able to eat both food types with 50% probability). The foraging history shows that the behaviors in the two populations of study differ (cf. Fig 5): the population instantiated with the last reactive configuration is split into two groups of comparable size, each of which is specialized in foraging one type of resource, while the population instantiated with the last learning configuration has a more uniform foraging pattern which includes more generalists. The measure of individual foraging history is quantified by the frequency of foraging resources of type one, e.g. a value of 90% indicates that 90% of all resources foraged by the agent were of type one, and the remaining 10% of type two. These values are then aggregated across the population to determine the frequency of different values of foraging history. The inset of Fig 5 reports the L2 norm between the distributions: the distribution of the learning configuration agents is closer to the generalist distribution (0.19), while the distribution of the reactive configuration agents is closer to the specialist distribution (0.25).
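The comparison reported in the inset of Fig 5 amounts to computing the L2 norm of the difference between two binned distributions and checking which baseline is nearer. The sketch below illustrates that computation; the function names and the example distributions are made up for illustration and are not the data behind Fig 5.

```python
import math


def l2_distance(p, q):
    """L2 norm of the difference between two distributions on the same bins."""
    if len(p) != len(q):
        raise ValueError("distributions must use the same number of bins")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def closest_baseline(dist, baselines):
    """Name of the baseline distribution with the smallest L2 distance."""
    return min(baselines, key=lambda name: l2_distance(dist, baselines[name]))


# Hypothetical three-bin distributions of foraging history.
baselines = {
    "specialist": [0.5, 0.0, 0.5],
    "generalist": [0.1, 0.8, 0.1],
}
print(closest_baseline([0.2, 0.6, 0.2], baselines))
```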
Besides the measure of foraging history, different standard measures of group behavior [43] are used to compare the behavior of the populations (cf. Fig 6). The interpretation of these measures is not straightforward, so baselines are added for reference: the dashed line represents the value of a population where half of the agents specialize in one resource and the other half in the other resource, while the continuous line represents a population of generalists.
The measures confirm that the learning configuration agents develop a generalist foraging strategy, both on the group level (among-resource diversity) and on the individual level (within-individual diversity). In contrast, last reactive configuration agents develop a more specialized foraging strategy on the group level (among-resource diversity). Understanding whether or not reactive configuration agents develop a specialized foraging strategy on the individual level is not straightforward, as a high value of among-resource diversity can either mean that different agents have different specialized diets or that agents have generalized diets. Combining this measure with that of within-individual diversity, which indicates a specialized diet on the individual level, allows us to conclude that specialization occurs also on the group level.
4 Results: Analytical model
The results outlined in the previous section showcase the existence of the Baldwin veering effect, but give little information about the process behind it. This section introduces and analyzes the predictions of an analytical model, inspired by previous work [45], which gives possible explanations for the simulation results and identifies the conditions under which the Baldwin veering effect manifests. The model defines a fitness function for a generic individual; the evolutionary process is not explicitly modeled, so evolutionary outcomes are inferred from considerations about the relative fitness of different individuals. Time and location of agents are not explicitly modeled; this abstraction is sensible because of the deterministic nature of seasonal changes, i.e. the environment displays the same conditions on average over each seasonal cycle. More fine-grained results about evolution and its dynamics might be obtained by pairing the fitness function with an existing model of evolution, e.g. [45, 46]; such effort is outside the scope of this paper and is left for future work.
4.1 Description of the analytical model
The environment contains two types of resources, j ∈ {0, 1}, whose proportions are denoted by π0 and π1.
The fitness Wi of a reactive agent i is formulated as follows:
$$W_i = \sum_{j \in \{0, 1\}} \pi_j \, s_{i,j}^{\,q} \tag{8}$$
Here the foraging success is determined by the agent’s skill si,j ∈ [0, 1] (which, for a reactive agent, is equal to the aptitude level) and by a parameter q, which defines the relation between skill and foraging success. If q = 1, specializing on one resource and generalizing on two resources lead to the same foraging success. If q < 1, generalization becomes more beneficial than specialization, as intermediate aptitudes produce a higher foraging success than extreme ones. Vice versa, specialization is more beneficial when q > 1 as the reward function is concave, a requirement for the co-existence of specialists and generalists in the same environment [47].
Following the design of the computational model, the two skills of an agent, as well as the resource proportions, are assumed to be complementary, i.e. si,0 + si,1 = 1 and π0 + π1 = 1. The notation can therefore be simplified by defining si ≔ si,0, 1 − si ≔ si,1 and π1 = 1 − π0, which leads to:
$$W_i = \pi_0 \, s_i^{\,q} + (1 - \pi_0)\,(1 - s_i)^{\,q} \tag{9}$$
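As a numerical illustration of the role of q, the following sketch evaluates the fitness of a reactive agent under the assumption that fitness is the proportion-weighted sum of the skills raised to the power q, i.e. W = π0·s^q + (1 − π0)·(1 − s)^q. This functional form is inferred from the description of q given above and should be read as an assumption; the function name `reactive_fitness` is hypothetical and is not taken from the published source code.

```python
def reactive_fitness(s: float, pi0: float = 0.5, q: float = 1.0) -> float:
    """Fitness of a reactive agent with skill s on resource 0 and 1 - s on resource 1.

    Assumed form (not confirmed by the source): W = pi0 * s**q + (1 - pi0) * (1 - s)**q.
    """
    if not 0.0 <= s <= 1.0:
        raise ValueError("skill s must lie in [0, 1]")
    if not 0.0 <= pi0 <= 1.0:
        raise ValueError("resource proportion pi0 must lie in [0, 1]")
    return pi0 * s**q + (1.0 - pi0) * (1.0 - s) ** q


# Compare a full specialist (s = 1) with a generalist (s = 0.5)
# in a symmetric environment (pi0 = 0.5) for three values of q.
for q in (0.5, 1.0, 2.0):
    specialist = reactive_fitness(1.0, q=q)
    generalist = reactive_fitness(0.5, q=q)
    print(f"q={q}: specialist={specialist:.3f}, generalist={generalist:.3f}")
```

Under this assumed form and with equal resource proportions, the two strategies tie for q = 1, the generalist has the higher fitness for q < 1, and the specialist has the higher fitness for q > 1, in line with the description above.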
In order to model learning agents, a new parameter δ is introduced which represents plasticity. The parameter c determines the cost of plasticity [48, 49]. A learning agent is not constrained by its inherited aptitude α, as its skill can adapt to changes in the environment. The value of δ determines the skills an agent can express by defining the maximum and minimum skill values: this range is centered on the aptitude and spans in both directions (cf. Fig 7), si = αi ± δ. Given that the skill value is limited to the domain [0, 1], the bounds of the skill value are clipped as follows: si = min(1, αi + δ) on the beneficial side and si = max(0, αi − δ) on the unfavorable side. For example, an individual with aptitude 0.1 and δ = 0.6 can express any skill value in the range [0, 0.7]. As the aptitude is also constrained to the range [0, 1], the range of meaningful values of δ is also [0, 1]. In this model, we consider only the effect of plasticity that increases the skill level (i.e. learning that improves a skill):
(10) |
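The clipping of the expressible skill range described above can be sketched in a few lines. The helper name `skill_range` is hypothetical; it only encodes the bounds max(0, α − δ) and min(1, α + δ) stated in the text.

```python
def skill_range(aptitude: float, delta: float) -> tuple[float, float]:
    """Return the (lowest, highest) skill a learning agent can express.

    The range is centered on the inherited aptitude, spans delta in both
    directions, and is clipped to the valid skill domain [0, 1].
    """
    if not 0.0 <= aptitude <= 1.0:
        raise ValueError("aptitude must lie in [0, 1]")
    if not 0.0 <= delta <= 1.0:
        raise ValueError("delta must lie in [0, 1]")
    lowest = max(0.0, aptitude - delta)
    highest = min(1.0, aptitude + delta)
    return lowest, highest


# Example from the text: aptitude 0.1 and delta = 0.6.
low, high = skill_range(0.1, 0.6)
print(f"expressible skills: [{low:.1f}, {high:.1f}]")
```

For the example in the text the lower bound is clipped at 0, so the range is asymmetric around the aptitude.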
For simplicity, the model assumes that agents adapt instantaneously to the environment by adopting the best available skill value for each resource type, i.e. a skill of si = αi + δ for the resource type with proportion π0 and a skill of si,1 = αi,1 + δ = 1 − (αi − δ) for the resource type with proportion π1, which maximize the fitness function. The speed of learning, also called time lag, is modeled by reducing the value of δ (cf. Fig 8). In practice, the value of δ depends on the ratio between the speed of learning and the season length: a slower learning mechanism reduces the distance over which the skill value can change; similarly, a shorter season reduces the number of experiences an agent has during a season.
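The instantaneous-adaptation assumption can be illustrated with a minimal sketch of a learning agent's fitness. Two ingredients are assumptions made only for illustration: the power-law relation between skill and foraging success (s^q), and a cost of plasticity that is subtracted linearly as c·δ. The function name `learning_fitness` is hypothetical.

```python
def learning_fitness(alpha: float, delta: float, pi0: float = 0.5,
                     q: float = 2.0, c: float = 0.0) -> float:
    """Fitness of a learning agent that always expresses its best available skill.

    Assumptions (not confirmed by the source): foraging success is skill**q,
    and plasticity carries a linear cost c * delta.
    """
    # Best skill on resource 0: aptitude plus plasticity, clipped at 1.
    s0 = min(1.0, alpha + delta)
    # Best skill on resource 1: complementary aptitude plus plasticity, clipped at 1.
    s1 = min(1.0, (1.0 - alpha) + delta)
    return pi0 * s0**q + (1.0 - pi0) * s1**q - c * delta


# A slower learner (smaller effective delta) reaches a lower fitness
# for the same intermediate aptitude.
for delta in (0.5, 0.3, 0.1):
    print(f"delta={delta}: W={learning_fitness(0.5, delta):.3f}")
```

With δ = 0 the expression reduces to the fitness of a reactive agent, and reducing δ to mimic a time lag lowers the fitness of an agent with an intermediate aptitude.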
4.2 Analysis: Baldwin veering effect
Fig 9 shows how different aptitudes compare, in terms of fitness, for varying values of plasticity δ. A combination of aptitude and plasticity associated with a higher fitness value produces more fit individuals that are favored by natural selection. The red circles represent the globally optimal aptitudes for a given value of δ, i.e. the attractors in genetic configuration space of the evolutionary process. If δ < 0.5, agents evolve a specialist configuration, as opposed to a generalist configuration if δ = 0.5. Note that the configuration with δ = 0.5 and aptitude αi = 0.5 maximizes the fitness, as it allows agents to choose any skill value in the range [0, 1] and hence to forage both resource types with certainty. This condition is observable in the agent-based model simulation when the speed of learning is as fast as the frequency of change in the environment, i.e. an agent adapts its skill to a new environmental state but does so too slowly to remain specialized for a long time before the environment changes again. This confirms the existence of the “Baldwin veering effect”, as any value of δ > 0 changes the fitness landscape such that fitness is maximized by a different aptitude, which is then selected. These results hold even for asymmetric seasons, i.e. when the probability of one season is higher (cf. Fig 9 right).
For values of δ > 0.5, learning makes an increasingly large range of aptitude values equivalent in terms of evolutionary fitness, which could allow agents to generalize, but such a configuration would not evolve in reality as the overall fitness is reduced when compared to δ = 0.5. These results are confirmed also for c = 0 and q ≤ 1; see the supplementary material, Section G in S1 File.
In conclusion, learning agents evolve an intermediate aptitude, i.e. a generalist configuration, only if learning speed is proportionate to the season length such that agents can adapt to both resource types. This result is general and holds independently of the value of q and resource proportion π0, and hence confirms that the Baldwin veering effect depends exclusively on the timescales of learning and environmental change.
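The qualitative shape of this fitness landscape can be explored with a simple grid search over aptitudes. The sketch below is not the analysis used to produce Fig 9; it assumes a power-law foraging success (s^q with q = 2), symmetric seasons, and a linear plasticity cost c·δ, and all function names are hypothetical.

```python
def learning_fitness(alpha: float, delta: float, pi0: float = 0.5,
                     q: float = 2.0, c: float = 0.1) -> float:
    """Assumed fitness of a learning agent (power-law success, linear cost)."""
    s0 = min(1.0, alpha + delta)
    s1 = min(1.0, (1.0 - alpha) + delta)
    return pi0 * s0**q + (1.0 - pi0) * s1**q - c * delta


def best_aptitudes(delta: float, steps: int = 101, tol: float = 1e-9,
                   **params: float) -> tuple[list[float], float]:
    """Return the grid aptitudes with maximal fitness and that maximal fitness."""
    grid = [k / (steps - 1) for k in range(steps)]
    values = [learning_fitness(a, delta, **params) for a in grid]
    top = max(values)
    # Keep every aptitude whose fitness ties with the maximum up to tol.
    optima = [a for a, w in zip(grid, values) if top - w <= tol]
    return optima, top


for delta in (0.0, 0.2, 0.5, 0.7):
    optima, top = best_aptitudes(delta)
    print(f"delta={delta}: best aptitudes from {optima[0]:.2f} to {optima[-1]:.2f} "
          f"({len(optima)} grid points), W={top:.3f}")
```

Under these assumptions the optimum consists of two aptitudes on opposite sides of 0.5 when δ < 0.5, collapses to the single aptitude 0.5 at δ = 0.5, and for larger δ widens into a plateau of equivalent aptitudes whose fitness is lower because of the cost term.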
5 Discussion
A common finding in the literature about the interactions between plasticity and cyclically changing environments is that plastic individuals, who can adapt to changes in the environment after a certain time lag, i.e., speed of learning, are more fit than non-plastic individuals, who are unable to adapt, when the frequency of change in the environment is faster than a certain threshold. The definition of plasticity varies in the literature: plasticity is modeled as switching between two distinct phenotypes [14, 50], as a change in niche breadth [51], or, as in this work, as behavioral adaptation through learning [52, 53]. Related work concludes that similar patterns of specialization and generalization in the phenotype might develop also when assuming non-reversible plasticity, i.e. a phenotypical trait can assume only one specialized state in an individual’s lifetime [19]. Although this work focuses on reversible plasticity, i.e. the same phenotypical trait can change from one specialized state to another, reversibility is not claimed to be a prerequisite for the existence of the Baldwin veering effect. Our claim about the existence of the Baldwin veering effect is not invalidated by whether or not the effect manifests also in the presence of non-reversible plastic traits; this is nevertheless a promising research question for future work.
It is important to note that although the concepts of specialization, i.e. adaptation to only one state, and generalization, i.e. trading off adaptation across more than one state, are consistent across the literature, the concepts of generalist and specialist can differ substantially: while the definition of specialists adopted by this work implies the ability to specialize in only one resource, unless the agent is able to learn, other work defines them as able to specialize simultaneously on many resources [14]. To the best of our knowledge, this work is the first to investigate the effect of plasticity and cyclical changes in the environment on the evolution of the genetic configuration. Although [54] considers local variation (cycles), its analysis is focused on a single movement to an extreme environment; our work is consistent in terms of the expected genetic assimilation. Previous work investigates the evolution of plasticity by analyzing the co-evolution of populations with different genetic configurations [55, 56], or by analyzing the scaling of plasticity itself [54], while in the current work traits are either plastic or not. Due to the differences between our models, we cannot make more extensive claims, i.e. regarding the development of co-evolving populations or the evolution of plasticity itself. However, results pertaining to the more general aspects of the Baldwin effect, i.e. genetic assimilation, are consistent across the works. Other work [14] predicts that non-plastic individuals would evolve a genotype that leads to a wide tolerance function, making the individual able to adapt to a broader set of environmental configurations. This is in conflict with our finding that non-plastic individuals specialize in one environmental configuration, which leads to a split in the population. We believe this difference is caused by a modeling assumption that is relaxed in this paper, i.e. each agent can express a different phenotype (tolerance function) for each environmental state. The thread of literature looking at the evolution of artificial neural networks [52, 57] concludes that different levels of plasticity lead to the evolution of different weights. The main difference with the proposed model is that the environment changes [52] during an individual’s lifetime [57], hence that model is not able to capture the effect of the frequency of environmental change on evolution, which is crucial for the results presented in this work.
The results presented in this work rely on the assumption that information about the environment is always precise. Relaxing this assumption requires the consideration of imperfect perceptions. Hence, the agents need to learn an estimate of the environmental state, before they can begin phenotypic adaptation [58] or while they are adapting [19]. Previous work finds that agents with imperfect perception learn accurate estimates of the probability distribution of environmental states and demonstrates genetic assimilation of phenotypic features, i.e. the Baldwin effect [59]. These results suggest that the Baldwin veering effect does not depend on the assumption of perception accuracy.
The aim of this paper is to provide a proof of concept rather than to model realistic entities; hence the model is constrained to only two resources. Increasing the complexity of the environment, as well as introducing group behavior, is required to model any realistic ecosystem and is left for future work.
The Baldwin veering effect can be interpreted as the interaction between two different traits throughout the evolutionary process. This interpretation could be described as a co-evolutionary process between two different traits in the same genotype: (1) the evolutionary selection pressure on the existence of plasticity implies a specific evolutionary selection pressure on (2) the aptitude level.
This effect can also be interpreted as an extended form of gene interactions [18] that affects both the phenotype and the genotype.
Plastic behavior is the outcome of complex interactions between genes; for the sake of tractability, this work abstracts these interactions as the effect of one single gene called aptitude. This simplification is reasonable for modeling some simple natural organisms, e.g. fish behavior [60] and foraging in bacteria [61]. Gene interaction has an effect on the model, i.e. the interaction between the aptitude and the gene for plasticity changes the phenotypic expression of individuals [18]. Nevertheless, the results are more than just a special case of gene interaction, as the presence of a “plasticity gene” causes changes both at the phenotypic and at the genotypic level, i.e. a different genetic code evolves in the population, which in turn produces different phenotypic traits.
Although this suggests the existence of the hypothesized effect, the theory does not clarify what processes cause learning agents to evolve a generalist configuration instead of a specialist configuration. One possible mechanism is that a generalist configuration allows individuals to have a more constant foraging success than a specialist configuration, as a skill level that oscillates around an average value allows individuals to forage more or less constantly throughout their lifetime, while a skill level oscillating around any of the extremes would result in periods of high and periods of low foraging success. An imbalance in foraging success translates to higher variance in offspring number, which is known to reduce the fitness [62]. Another possible mechanism for the evolution of a generalist configuration would be to provide indirect rewards: then, an intermediate aptitude would increase the evolutionary fitness indirectly by allowing for a faster adaptation to any environmental configuration. A similar mechanism has been described in the literature about intrinsic motivation, where evolution favors actions that are providing rewards only indirectly. For example, it has been found that curiosity and playfulness at a young age can improve fitness at a later age [63]. Understanding what processes cause the evolution of a generalist configuration is a worthy result in its own right which should be addressed in future work.
Another relevant observation concerns convergent phenotypic outcomes. This phenomenon is highlighted by the fact that the neural network weights defining the desired behavior are initialized randomly, and individuals with different genetic composition (weights) converge to similar behaviors but different weight compositions. This shows that the large space of weight combinations possesses a large number of equivalent optima. Further studies on this topic might provide deeper insights relevant to the machine learning community.
Future work will also verify the predictions of the analytical model within the agent-based simulation framework, in particular, that there exists a configuration for which a learning population splits into two groups of specialists with aptitude values in [0, 0.5] and [0.5, 1] respectively, and a configuration in which a learning population evolves a uniform distribution of aptitude values.
6 Conclusions
Plasticity, e.g. learning, is known to influence the speed at which evolution converges to some “target” configuration. This work, in contrast, addresses the question of whether or not plasticity in a cyclically changing environment can lead to the evolution of a different genetic configuration. Following previous work, this question is answered by means of an agent-based model of a foraging task, with cyclical variability in the resource distribution. Additionally, this result is confirmed through an analytical model.
Experimental and analytical results show the existence of the Baldwin effect in a cyclical environment and identify the novel “Baldwin veering effect”, i.e. by effect of plasticity a trait evolves (generalist configuration) that does not correspond to the plastic response induced by a cyclically-changing environment (specialist configuration), together with the conditions under which this effect exists. A mathematical model verifies that the introduction of plasticity in the phenotype changes the fitness landscape in such a way that a generalist configuration becomes the global optimum in the space of genotypes.
These results are relevant for the literature of Evolutionary Biology, as they expand the understanding of how phenotypic plasticity influences evolution and present a novel effect caused by the interaction between learning and evolution. These results might also help to understand the effect of a fast learning process on a slow learning process in another context, which has a cyclical component, for example, opinion formation in settings where learning [64, 65] mediates the rate of exposure to different opinions [66, 67].
6.1 Data availability
The source code used to generate and analyze the data-sets is available on GitHub [68, 69]. Other information, including the parameters and libraries used, is provided in the Supplementary information, see Sections D and E in S1 File.
Supporting information
Acknowledgments
The authors would also like to thank Leonard Wossnig and Johannes Thiele for their help in developing the simulation framework. S.B. and L.A. are Joint First Authors of this publication and their names are presented in alphabetical order.
Data Availability
The source code and data to reproduce this work are available from Github at the following link: https://github.com/leaguilar/baldwin_veering/.
Funding Statement
The authors acknowledge support by the European Commission through the European Research Council Advanced Investigator Grant ‘Momentum’ (Grant No. 324247). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
References
- 1. Baldwin JM. A new factor in evolution. The american naturalist. 1896;30(354):441–451. doi:10.1086/276408
- 2. Ancel LW. Undermining the Baldwin expediting effect: does phenotypic plasticity accelerate evolution? Theoretical population biology. 2000;58(4):307–319. doi:10.1006/tpbi.2000.1484
- 3. West-Eberhard MJ. Phenotypic plasticity and the origins of diversity. Annual review of Ecology and Systematics. 1989;20(1):249–278. doi:10.1146/annurev.es.20.110189.001341
- 4. DeWitt TJ, Scheiner SM. Phenotypic plasticity: functional and conceptual approaches. Oxford University Press; 2004.
- 5. Via S, Gomulkiewicz R, De Jong G, Scheiner SM, Schlichting CD, Van Tienderen PH. Adaptive phenotypic plasticity: consensus and controversy. Trends in Ecology & Evolution. 1995;10(5):212–217. doi:10.1016/S0169-5347(00)89061-8
- 6. Sterelny K. A review of Evolution and learning: the Baldwin effect reconsidered edited by Bruce Weber and David Depew. Evolution & Development. 2004;6(4):295–300. doi:10.1111/j.1525-142X.2004.04035.x
- 7. DeJager J. Baldwin’s Remarkable Effect. Biological Theory. 2016;11(4):207–219. doi:10.1007/s13752-016-0250-6
- 8. Crispo E. The Baldwin effect and genetic assimilation: revisiting two mechanisms of evolutionary change mediated by phenotypic plasticity. Evolution. 2007;61(11):2469–2479. doi:10.1111/j.1558-5646.2007.00203.x
- 9. Burkhardt RW. Lamarck, evolution, and the inheritance of acquired characters. Genetics. 2013;194(4):793–805. doi:10.1534/genetics.113.151852
- 10. Scheiner SM. The Baldwin effect: neglected and misunderstood. The American Naturalist. 2014;184(4):ii–iii. doi:10.1086/677944
- 11. French R, Messinger A. Genes, phenes and the Baldwin effect: Learning and evolution in a simulated population. In: Artificial Life IV; 1994. p. 277–282.
- 12. Schull J. Are species intelligent? Behavioral and Brain Sciences. 1990;13(1):63–75. doi:10.1017/S0140525X00077785
- 13. Whitley D, Gordon VS, Mathias K. Lamarckian evolution, the Baldwin effect and function optimization. In: International Conference on Parallel Problem Solving from Nature. Springer; 1994. p. 5–15.
- 14. Gabriel W, Luttbeg B, Sih A, Tollrian R. Environmental tolerance, heterogeneity, and the evolution of reversible plastic responses. The American Naturalist. 2005;166(3):339–353. doi:10.1086/432558
- 15. Hamblin S, Giraldeau LA. Finding the evolutionarily stable learning rule for frequency-dependent foraging. Animal Behaviour. 2009;78(6):1343–1350. doi:10.1016/j.anbehav.2009.09.001
- 16. Red’ko V, Prokhorov D. Learning and Evolution of Autonomous Adaptive Agents. Advances in Machine Learning I. 2010; p. 491–500.
- 17. Sznajder B, Sabelis M, Egas M. How adaptive learning affects evolution: reviewing theory on the Baldwin effect. Evolutionary biology. 2012;39(3):301–310. doi:10.1007/s11692-011-9155-2
- 18. Phillips PC. The language of gene interaction. Genetics. 1998;149(3):1167–1171.
- 19. Panchanathan K, Frankenhuis WE. The evolution of sensitive periods in a model of incremental development. In: Proc. R. Soc. B. vol. 283. The Royal Society; 2016. p. 20152439.
- 20. Epstein JM. Agent-based computational models and generative social science. Complexity. 1999;4(5):41–60.
- 21. Perc M, Gómez-Gardeñes J, Szolnoki A, Floría LM, Moreno Y. Evolutionary dynamics of group interactions on structured populations: a review. Journal of the royal society interface. 2013;10(80):20120997. doi:10.1098/rsif.2012.0997
- 22. Beauchamp G. Learning rules for social foragers: implications for the producer–scrounger game and ideal free distribution theory. Journal of Theoretical Biology. 2000;207(1):21–35. doi:10.1006/jtbi.2000.2153
- 23. Hinton GE, Nowlan SJ. How Learning Can Guide Evolution. Complex systems. 1987;1(3):495–502.
- 24. Nolfi S, Floreano D. Learning and Evolution. Autonomous robots. 1999;7(1):89–113. doi:10.1023/A:1008973931182
- 25. Menczer F, Belew RK. Evolving sensors in environments of controlled complexity. In: Artificial life IV. MIT Press; 1994. p. 210–221.
- 26. Ackley D, Littman M. Interactions between learning and evolution. Artificial life II. 1991;10:487–509.
- 27. Bennati S. On the role of collective sensing and evolution in group formation. Swarm Intelligence. 2018. doi:10.1007/s11721-018-0156-y
- 28. Pulliam HR, Dunning JB, Liu J. Population dynamics in complex landscapes: a case study. Ecological Applications. 1992;2(2):165–177. doi:10.2307/1941773
- 29. Beissinger S, Donnay T, Walton R. Experimental analysis of diet specialization in the snail kite: the role of behavioral conservatism. Oecologia. 1994;100(1):54–65. doi:10.1007/BF00317130
- 30. Laverty TM. Costs to foraging bumble bees of switching plant species. Canadian Journal of Zoology. 1994;72(1):43–47. doi:10.1139/z94-007
- 31. Dridi S, Lehmann L. Environmental complexity favors the evolution of learning. Behavioral Ecology. 2015;27(3):842–850. doi:10.1093/beheco/arv184
- 32. Torney CJ, Berdahl A, Couzin ID. Signalling and the evolution of cooperative foraging in dynamic environments. PLoS computational biology. 2011;7(9). doi:10.1371/journal.pcbi.1002194
- 33. Gruau F, Whitley D. Adding learning to the cellular development of neural networks: Evolution and the Baldwin effect. Evolutionary computation. 1993;1(3):213–233. doi:10.1162/evco.1993.1.3.213
- 34. Batali J, Grundy WN. Modeling the evolution of motivation. Evolutionary Computation. 1996;4(3):235–270. doi:10.1162/evco.1996.4.3.235
- 35. Nolfi S, Miglino O, Parisi D. Phenotypic plasticity in evolving neural networks. In: From Perception to Action Conference, 1994. IEEE; 1994. p. 146–157.
- 36. Watkins CJ, Dayan P. Q-learning. Machine learning. 1992;8(3-4):279–292. doi:10.1007/BF00992698
- 37. Hinton G, Osindero S, Welling M, Teh YW. Unsupervised discovery of nonlinear structure using contrastive backpropagation. Cognitive science. 2006;30(4):725–731. doi:10.1207/s15516709cog0000_76
- 38. Mnih V, Kavukcuoglu K, Silver D, Rusu AA, Veness J, Bellemare MG, et al. Human-level control through deep reinforcement learning. Nature. 2015;518(7540):529–533. doi:10.1038/nature14236
- 39. Roth AE, Erev I. Learning in extensive-form games: Experimental data and simple dynamic models in the intermediate term. Games and economic behavior. 1995;8(1):164–212. doi:10.1016/S0899-8256(05)80020-X
- 40. Erev I, Roth AE. Predicting how people play games: Reinforcement learning in experimental games with unique, mixed strategy equilibria. American economic review. 1998; p. 848–881.
- 41. Cooper DJ, Feltovich N. Selection of Learning Rules: Theory and Experimental Evidence. 1997.
- 42. Silver D, Schrittwieser J, Simonyan K, Antonoglou I, Huang A, Guez A, et al. Mastering the game of go without human knowledge. Nature. 2017;550(7676):354. doi:10.1038/nature24270
- 43. Giraldeau LA, Caraco T. Social foraging theory. Princeton University Press; 2000.
- 44. Watson RA, Szathmáry E. How can evolution learn? Trends in ecology & evolution. 2016;31(2):147–157. doi:10.1016/j.tree.2015.11.009
- 45. Frankenhuis WE, Panchanathan K, Belsky J. A mathematical model of the evolution of individual differences in developmental plasticity arising through parental bet-hedging. Developmental science. 2016;19(2):251–274. doi:10.1111/desc.12309
- 46. Van Tienderen PH. Evolution of generalists and specialists in spatially heterogeneous environments. Evolution. 1991;45(6):1317–1331. doi:10.1111/j.1558-5646.1991.tb02638.x
- 47. Wilson DS, Yoshimura J. On the coexistence of specialists and generalists. The American Naturalist. 1994;144(4):692–707. doi:10.1086/285702
- 48. DeWitt TJ, Sih A, Wilson DS. Costs and limits of phenotypic plasticity. Trends in ecology & evolution. 1998;13(2):77–81. doi:10.1016/S0169-5347(97)01274-3
- 49. Murren CJ, Auld JR, Callahan H, Ghalambor CK, Handelsman CA, Heskel MA, et al. Constraints on the evolution of phenotypic plasticity: limits and costs of phenotype and plasticity. Heredity. 2015;115(4):293. doi:10.1038/hdy.2015.8
- 50. Padilla DK, Adolph SC. Plastic inducible morphologies are not always adaptive: the importance of time delays in a stochastic environment. Evolutionary Ecology. 1996;10(1):105–117. doi:10.1007/BF01239351
- 51. Kassen R. The experimental evolution of specialists, generalists, and the maintenance of diversity. Journal of evolutionary biology. 2002;15(2):173–190. doi:10.1046/j.1420-9101.2002.00377.x
- 52. Floreano D, Nolfi S. Adaptive behavior in competing co-evolving species. In: 4th European Conference on Artificial Life; 1997. p. 378–387.
- 53. Wakano JY, Aoki K, Feldman MW. Evolution of social learning: a mathematical analysis. Theoretical population biology. 2004;66(3):249–258. doi:10.1016/j.tpb.2004.06.005
- 54. Lande R. Adaptation to an extraordinary environment by evolution of phenotypic plasticity and genetic assimilation. Journal of evolutionary biology. 2009;22(7):1435–1446. doi:10.1111/j.1420-9101.2009.01754.x
- 55. Fordyce JA. The evolutionary consequences of ecological interactions mediated through phenotypic plasticity. Journal of Experimental Biology. 2006;209(12):2377–2383. doi:10.1242/jeb.02271
- 56. Chakra MA, Hilbe C, Traulsen A. Plastic behaviors in hosts promote the emergence of retaliatory parasites. Scientific reports. 2014;4:4251. doi:10.1038/srep04251
- 57. Nolfi S, Parisi D. Learning to adapt to changing environments in evolving neural networks. Adaptive behavior. 1996;5(1):75–98. doi:10.1177/105971239600500104
- 58. Frankenhuis WE, Panchanathan K. Balancing sampling and specialization: An adaptationist model of incremental development. Proceedings of the Royal Society of London B: Biological Sciences. 2011; p. rspb20110055. doi:10.1098/rspb.2011.0055
- 59. Ramírez JC, Marshall JA. Can natural selection encode Bayesian priors? Journal of theoretical biology. 2017;426:57–66. doi:10.1016/j.jtbi.2017.05.017
- 60. Chapman BB, Morrell LJ, Krause J. Plasticity in male courtship behaviour as a function of light intensity in guppies. Behavioral Ecology and Sociobiology. 2009;63(12):1757–1763. doi:10.1007/s00265-009-0796-4
- 61. Gottschal JC, de Vries S, Kuenen JG. Competition between the facultatively chemolithotrophic Thiobacillus A2, an obligately chemolithotrophic Thiobacillus and a heterotrophic Spirillum for inorganic and organic substrates. Archives of Microbiology. 1979;121(3):241–249. doi:10.1007/BF00425062
- 62. Gillespie JH. Natural selection for variances in offspring numbers: a new evolutionary principle. The American Naturalist. 1977;111(981):1010–1014. doi:10.1086/283230
- 63. Singh S, Lewis RL, Barto AG, Sorg J. Intrinsically motivated reinforcement learning: An evolutionary perspective. IEEE Transactions on Autonomous Mental Development. 2010;2(2):70–82. doi:10.1109/TAMD.2010.2051031
- 64. Pariser E. The filter bubble: What the Internet is hiding from you. Penguin UK; 2011.
- 65. Nguyen TT, Hui PM, Harper FM, Terveen L, Konstan JA. Exploring the filter bubble: the effect of using recommender systems on content diversity. In: Proceedings of the 23rd international conference on World wide web. ACM; 2014. p. 677–686.
- 66. Mäs M, Flache A. Differentiation without distancing. Explaining bi-polarization of opinions without negative influence. PloS one. 2013;8(11):e74516. doi:10.1371/journal.pone.0074516
- 67. Quattrociocchi W, Caldarelli G, Scala A. Opinion dynamics on interacting networks: media competition and social influence. Scientific reports. 2014;4:4938. doi:10.1038/srep04938
- 68. Code Repository: How learning can change the course of evolution. https://github.com/bennati/baldwin_veering
- 69. Code Repository: How learning can change the course of evolution. https://github.com/leaguilar/baldwin_veering