Abstract
An in vitro evolution is a simplified Darwinian evolution in well-controlled surroundings. This evolution process can be modeled as a hill-climbing or adaptive walk on a fitness landscape in sequence space. The evolving molecular system gains at least two kinds of information originating from the converged sequences and the fitness increment of the evolving biopolymer as the adaptive walker. These two represent two aspects of the biomolecular information, its extent and its content, respectively. Here, we review studies related to formulation of the “content” and “extent” of biomolecular information. The two aspects are interconnected through physicochemical properties of the biopolymer, contrary to the case of conventional information, which seems to be independent of matter. The interconnection was analyzed based on the analogy between the evolution process and thermodynamics. The linear combination of the two by a temperature-like fluctuation factor resulted in a free-energy-like monotonically increasing function during the evolution process.
Keywords: Biological information, Fitness landscape, Free fitness, In vitro evolution, Pragmatic information, Quasi-species, Sequence space
Introduction
In vitro evolution is an artificial molecular evolution that is conducted in a laboratory and driven by the Darwinian evolution mechanism. Study in this field began with Spiegelman’s experiment in 1967 (Mills et al. 1967) and has been further developed in both science and engineering, that is, the quest for the principle of emergence of functional biopolymers and practical applications to industry and medicine, e.g., evolutionary (or adaptive) drug design.
Here, we introduce several terms needed to treat in vitro evolution theoretically. A quantitative measure of a molecular phenotype, that is, a certain physicochemical property of an evolving molecule (such as enzymatic activity, affinity to a ligand molecule, or replication rate constant), is designated as the fitness. The conceptual space of all conceivable base or amino acid sequences (=genotypes) is designated as the sequence space (Maynard-Smith 1970). Each conceivable sequence is mapped onto its corresponding point in the sequence space. The distance between two arbitrary points is measured by the Hamming distance between the two corresponding sequences. The scalar field constructed by assigning the fitness value of each sequence to the corresponding point in the sequence space is designated as the fitness landscape, which is regarded as the evolutionary attribute of the biopolymer (Eigen and Schuster 1979; Kauffman 1993).
An in vitro molecular evolution process has two parts: the evolving system, which consists of phenotype/genotype molecules as evolving entities, and the surroundings (or environments), which constitute the experimental setup. For example, we consider a typical case where we try to create specific peptides with high binding affinity to a given receptor protein through in vitro evolution (Fig. 1). The peptides correspond to the evolving system, and the shape of the receptor molecular surface, including the distribution of the electric charge on it, and other experimental conditions of the solvent correspond to the surroundings. In this case, the fitness should be defined as the logarithm of the association constant between the peptide and the receptor, ln K_a. Using the logarithm of the association constant means that the fitness is treated at the free-energy level. We must set and control the surroundings properly for correct evaluation of the fitness. Darwinian evolution is considered as an information gaining process from the surroundings. In order to simplify the information gaining process and treat it physically, we focus on in vitro molecular evolution in a well-controlled environment as an extremely simplified process of biological evolution.
Fig. 1.
Emergence of a specific peptide with a high affinity to a receptor molecule, by absorbing the fitness information from the surroundings, where T_E represents the evolutionary temperature and T_T represents the thermodynamical temperature. This is a case of M = 1
In this simplified system, an evolution process of a biopolymer is considered as an “adaptive walk or hill-climbing” on the corresponding fitness landscape in the sequence space (Fig. 2 right). Here, the evolving molecular system gains at least two kinds of information originating from the converged sequences and the fitness increment of the evolving biopolymer as the adaptive walker. These represent two aspects of the biomolecular information, extent and content, respectively. The two are interconnected through physicochemical properties of the biopolymer. The interconnection may be analyzed based on the analogy between the evolution process and thermodynamics. In fact, the right side of Fig. 2 is analogous to the conceptual view that protein folding is considered to be related to the downhill walk on the energy landscape in the conformation space (Wolynes and Luthey-Schulten 1997) (Fig. 2 left). Therefore, there have been many studies on the analogy between evolution and thermodynamics.
Fig. 2.
An analogy between protein folding and protein evolution. In protein folding, a folding polypeptide descends the energy landscape by emitting the thermal entropy –ΔH/T (= ΔS_sur) to the surroundings. In protein evolution, we interpret that the evolving sequences climb the fitness landscape by absorbing fitness information as the negative entropy from the surroundings, which is defined in an experimental setup. Adapted from Aita and Husimi (2006)
Extending the interpretation of evolution by thermodynamics-like concepts, we can clarify a view of the evolution process as an information gaining process from the surroundings. Eigen raised a question about the extent and content of information in biological evolution (Eigen 2000). According to him, the extent of information is related to the constrained volume of the sequence space and can be treated within the classical information theory (Shannon and Weaver 1949). On the other hand, the content of information means the meaning or semantic value of information.
In this review, we summarize two typical studies on the evolutionary dynamics of in vitro evolution, which has been interpreted in terms of thermodynamics-like concepts leading to both aspects of the information concept, its extent and its content. One is a theory for a natural selection type model. That is the quasi-species theory developed by Eigen’s group (Eigen and Schuster 1979; Eigen 2000; Weinberger 2002). The other is a theory of an artificial selection type model, which was developed by the authors (Aita et al. 2004, 2005, 2007; Aita and Husimi 2006). These studies gave a formulation for the information gaining process from the surroundings. Other studies are also reviewed within the framework of the two typical studies.
Natural selection type model of in vitro molecular evolution
We can classify models of in vitro molecular evolution into two types: natural selection type and artificial selection type. In the natural selection type, the fitness is the specific growth rate of evolving molecules. An example of this is self-replicating RNA molecules in a flow reactor. In the artificial selection type, the fitness is one of the physicochemical properties of evolving molecules, e.g., binding free energy to a target receptor, and the selection process is conducted through a cycle of evaluation and amplification by the experimenter. Therefore, in vitro evolution of the artificial selection type is also called directed evolution.
Eigen and Schuster proposed the quasi-species model as the natural selection type model. This model describes the evolutionary dynamics of an ensemble of simple self-replicators in a flow reactor, such as self-replicating RNA molecules (Eigen and Schuster 1979; Eigen 2000). The genome sequence of the self-replicators consists of ν sites, and λ letters are available at each site, that is, there are n = λ^ν possible sequences. Each genome sequence is mapped onto the corresponding point in the λ-valued ν-dimensional sequence space. In the replication process, each letter is replaced by one of the other λ–1 letters with a probability given by the mutation rate μ. The mole fraction of a certain species s at time t, denoted by x_s(t), obeys the following differential equation:

$$\frac{dx_s(t)}{dt} = \sum_{u} m_{su}\, f_u\, x_u(t) - D(t)\, x_s(t) \tag{1}$$

$$m_{su} = \left(\frac{\mu}{\lambda-1}\right)^{d(s,u)} (1-\mu)^{\nu-d(s,u)} \tag{2}$$

where f_s is the replication rate or fitness of s, and m_su is the probability of mutation between s and u. Here, d(s, u) is the Hamming distance between s and u. D(t) is the dilution rate; it works to satisfy Σ_s x_s(t) = 1 and is given by the mean replication rate, D(t) = Σ_s f_s x_s(t). Equations 1 and 2 mean that any species propagates cooperatively with neighbor species in the sequence space through mutation. It should be noted that the quasi-species model assumes an infinite population and can be applied to any model of fitness landscapes.
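The dynamics described above can be illustrated numerically. The following is a minimal sketch, not the authors' implementation: it assumes the standard quasi-species form dx_s/dt = Σ_u m_su f_u x_u – D(t) x_s with m_su = (μ/(λ–1))^d(s,u) (1–μ)^(ν–d(s,u)), uses a simple forward-Euler integrator, and adopts a hypothetical single-peak landscape (one sequence with f = 2, all others with f = 1). The function names, step size, and parameter values are illustrative choices.

```python
import itertools
import numpy as np

def mutation_matrix(nu, lam, mu):
    """Mutation probabilities m_su for all lam**nu sequences (assumed form of Eq. 2)."""
    seqs = np.array(list(itertools.product(range(lam), repeat=nu)))
    # Hamming distance d(s, u) between every pair of sequences
    d = (seqs[:, None, :] != seqs[None, :, :]).sum(axis=2)
    return (mu / (lam - 1)) ** d * (1.0 - mu) ** (nu - d)

def simulate_quasispecies(f, m, x0, dt=0.01, steps=5000):
    """Forward-Euler integration of the mole fractions x_s(t)."""
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(steps):
        growth = m @ (f * x)       # sum_u m_su f_u x_u
        dilution = np.dot(f, x)    # D(t): mean replication rate
        x = x + dt * (growth - dilution * x)
        x = x / x.sum()            # guard against round-off drift
    return x

nu, lam = 4, 2
n = lam ** nu
f = np.ones(n)
f[0] = 2.0                         # hypothetical single-peak landscape
x0 = np.full(n, 1.0 / n)

x_no_mutation = simulate_quasispecies(f, mutation_matrix(nu, lam, 0.0), x0)
x_with_mutation = simulate_quasispecies(f, mutation_matrix(nu, lam, 0.05), x0)
```

Without mutation the population converges on the fittest sequence, whereas with μ > 0 a cloud of mutants persists around it.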
Artificial selection type model of in vitro molecular evolution
We considered evolutionary dynamics of the following artificial selection type model (Aita et al. 2004, 2005). M parents (parent sequences) produce N offspring (mutant sequences), and the fitness value of each offspring is evaluated. Subsequently, the best M individuals among the N offspring become new parents in the next generation. N is the library size of mutants to be screened in a single generation. In the reproduction process, d-fold point mutations occur randomly, that is, d represents the Hamming distance between a parent and each of its offspring. This iterative process is called the adaptive walk or hill-climbing, and the parents are regarded as the walkers or climbers with the step length of d on the landscape. The parameters M, N, and d are constant throughout the walking process.
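The selection cycle described above can be sketched in a few lines. This is an illustrative sketch under stated assumptions: the text does not specify how the N offspring are distributed among the M parents, so each offspring here is derived from a randomly chosen parent, and the fitness function `count_zeros` is a hypothetical smooth landscape used only for demonstration.

```python
import random

def mutate(parent, d, lam, rng):
    """Return a copy of parent with exactly d sites changed (Hamming distance d)."""
    child = list(parent)
    for site in rng.sample(range(len(parent)), d):
        child[site] = rng.choice([a for a in range(lam) if a != parent[site]])
    return tuple(child)

def adaptive_walk(fitness, nu, lam, M, N, d, generations, seed=0):
    """Adaptive walk: M parents, N offspring per generation, step length d."""
    rng = random.Random(seed)
    parents = [tuple(rng.randrange(lam) for _ in range(nu)) for _ in range(M)]
    history = []
    for _ in range(generations):
        # Assumption: each offspring is a d-fold mutant of a randomly chosen parent
        offspring = [mutate(rng.choice(parents), d, lam, rng) for _ in range(N)]
        # The best M offspring become the parents of the next generation
        parents = sorted(offspring, key=fitness, reverse=True)[:M]
        history.append(sum(fitness(p) for p in parents) / M)
    return parents, history

def count_zeros(seq):
    """Hypothetical smooth landscape: number of sites carrying letter 0."""
    return sum(1 for a in seq if a == 0)

parents, history = adaptive_walk(count_zeros, nu=30, lam=4, M=2, N=50, d=1,
                                 generations=200)
```

The list `history` records the mean fitness of the parents in each generation, which rises and then fluctuates around a stationary level.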
The evolutionary dynamics of a finite population depends on the local structure of the fitness landscape. We supposed the following NK landscape (Kauffman 1993) in the λ-valued ν-dimensional sequence space (Aita et al. 2005, 2007)¹. In this model, an arbitrary site in a sequence interacts with k other sites. The fitness W for a given sequence A_1A_2...A_ν is defined by

$$W = \sum_{j=1}^{\nu} w_j(A_j \mid A_{j_1}, \ldots, A_{j_k}) \tag{3}$$

where w_j is the site fitness, i.e., the fitness contribution from the jth site, and A_j represents a particular letter at the jth site. The value of w_j is given as a function of the 1+k letters at the jth site (A_j) and the k other sites (A_j1, ..., A_jk). The interacting k sites {j_1, ..., j_k} are randomly chosen from among all of the ν–1 sites except the jth site. Once a set of letters {A_j1, ..., A_jk} at these k sites is given, the value of w_j for an arbitrary letter A_j, w_j(A_j|A_j1, ..., A_jk), is assigned randomly from a given probability distribution. Here, we adopt a discrete uniform distribution in the range [–ε, ε], where ε is a positive constant. Let σ² be the variance of the uniform distribution; then σ² = ε²/3 for this case. On the whole, the ruggedness of the landscape is controlled by the parameter k. In the case of k = 0, the resulting fitness landscape has a smooth surface and a single global peak. As the k value increases, the surface of the fitness landscape becomes more rugged and many local optima appear.
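The construction of such a landscape can be sketched as follows. This is a minimal illustration rather than the authors' code: site-fitness values are drawn lazily on first use, and a continuous uniform distribution on [–ε, ε] is used as a stand-in for the discrete uniform distribution mentioned in the text. The name `make_nk_landscape` and its parameters are hypothetical.

```python
import random

def make_nk_landscape(nu, k, lam, eps=1.0, seed=0):
    """Return a fitness function W(seq) for a random NK landscape."""
    rng = random.Random(seed)
    # For each site j, choose k interacting sites among the other nu-1 sites
    partners = [rng.sample([i for i in range(nu) if i != j], k)
                for j in range(nu)]
    # One lookup table per site, filled on demand with random site-fitness values
    tables = [dict() for _ in range(nu)]

    def fitness(seq):
        total = 0.0
        for j in range(nu):
            key = (seq[j],) + tuple(seq[i] for i in partners[j])
            if key not in tables[j]:
                # Assumption: continuous uniform draw on [-eps, eps]
                tables[j][key] = rng.uniform(-eps, eps)
            total += tables[j][key]   # site fitness w_j
        return total

    return fitness

landscape = make_nk_landscape(nu=20, k=3, lam=4)
w = landscape(tuple([0] * 20))
```

With k = 0 every table depends on a single site, so the landscape is additive; increasing k couples the sites and makes the surface more rugged.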
Analogy between evolution and thermodynamics
Thermodynamic concepts connect the evolution process to information. Here, we review studies related to the analogy between evolution and thermodynamics from various viewpoints.
Iwasa stressed the importance of genetic drift rather than of mutation in a finite population of size N and introduced the free fitness I, demonstrating that the free fitness I always increases with time in the evolution process. This scheme indicates the analogy of the free fitness to free energy, of the mean fitness to enthalpy, and of the fluctuation parameter 1/2N to temperature (Iwasa 1988). The special case of N = ∞ corresponds to Fisher’s fundamental theorem of natural selection (Fisher 1930). Blackburne and Hirst conducted a simulation of population dynamics using simple lattice model proteins (Blackburne and Hirst 2005). They also estimated the population using the analogy with the Boltzmann distribution in thermodynamics, in which a temperature-like parameter was empirically derived as a function of the mutation rate and selection pressure.
Sato et al. referred to a mathematical relationship between the fluctuation V[X]_a and the response of the average of X to a parameter change Δa in a biological system (where the average and the variance V[X]_a of the variable X are taken at the initial parameter value a), and demonstrated that the relationship they found is similar to Einstein’s relation in the fluctuation-dissipation theorem in Brownian motion (Sato et al. 2003). They confirmed the relationship through an experimental observation in which X represents the logarithm of fluorescence intensity per E. coli cell including mutant GFP proteins and Δa represents the synonymous mutation rate of their genes. Ao presented the relationship between Darwinian evolution and thermodynamics from the viewpoint of Langevin dynamics (Ao 2008). His theory describes the dynamics on a potential surface in genetic frequency space, where each coordinate axis represents the frequency of a species. From the viewpoint of molecular imprinting, Pande et al. developed statistical mechanics of protein folding and design (Pande et al. 1997), introducing the design temperature T_des, which controls the probability of the occurrence of amino acid sequences with low energy (designed sequences) in a given canonical ensemble. They obtained a phase diagram for model heteropolymers in a two-dimensional T_des–T space (T is the thermodynamic temperature).
Quasi-species
We summarize the result of the mathematical analysis of the natural selection type model (Eq. 1). In the special case of μ = 0, the solution of Eq. 1 is easily obtained by introducing a transformed variable that absorbs the dilution term D(t). Let s* be the fittest species, which has the maximum replication rate among all the n species. In the stationary state, we can observe

$$\lim_{t\to\infty} x_s(t) = \begin{cases} 1 & (s = s^*) \\ 0 & (s \neq s^*) \end{cases}$$

That is, only the fittest species s* exists in the reactor.
In general cases of μ > 0, Eq. 1 must be transformed in the following manner. Considering the n-dimensional matrix [m_su f_u], we denote the qth eigenvalue and eigenvector of this matrix by Λ_q and v_q = (v_1q, ..., v_nq) (for q = 1, 2, ..., n), respectively. By diagonalizing the matrix [m_su f_u] with the unitary matrix V = [v_sq] and introducing y_q(t) = Σ_s (V^–1)_qs x_s(t), Eq. 1 is transformed to

$$\frac{dy_q(t)}{dt} = \left(\Lambda_q - D(t)\right) y_q(t) \tag{4}$$

where q is designated as the quasi-species. The dilution rate D(t) is rewritten accordingly in terms of y_q(t). The solution of Eq. 4 is easily obtained in the same manner as mentioned above. Let q* be the quasi-species that has the maximum eigenvalue (according to the Frobenius theorem, this eigenvalue is real and positive, and all components of the corresponding eigenvector are positive). In the stationary state, we can observe

$$\lim_{t\to\infty} y_q(t) = 0 \quad (q \neq q^*)$$

That is, only the quasi-species q* is realized in the reactor. The x_s(t) in the stationary state is given by

$$\lim_{t\to\infty} x_s(t) = \frac{v_{sq^*}}{\sum_{u} v_{uq^*}}$$

The realized eigenvector v_q* is designated as the quasi-species distribution. The quasi-species distribution is caused by the mutation-selection balance.
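The stationary distribution can also be obtained directly from the dominant eigenvector, without integrating the differential equation. The sketch below assumes the mutation probabilities m_su = (μ/(λ–1))^d(s,u) (1–μ)^(ν–d(s,u)) and a hypothetical single-peak landscape; the function name and parameter values are illustrative, and `numpy.linalg.eig` is used as a generic eigen-solver.

```python
import itertools
import numpy as np

def quasispecies_distribution(f, nu, lam, mu):
    """Largest eigenvalue of [m_su f_u] and its normalized eigenvector."""
    seqs = np.array(list(itertools.product(range(lam), repeat=nu)))
    d = (seqs[:, None, :] != seqs[None, :, :]).sum(axis=2)   # Hamming distances
    m = (mu / (lam - 1)) ** d * (1.0 - mu) ** (nu - d)       # assumed form of m_su
    eigvals, eigvecs = np.linalg.eig(m * f[None, :])         # matrix [m_su f_u]
    q_star = np.argmax(eigvals.real)
    # Frobenius theorem: the dominant eigenvector has components of a single sign
    v = np.abs(eigvecs[:, q_star].real)
    return eigvals.real[q_star], v / v.sum()

nu, lam = 6, 2
f = np.ones(lam ** nu)
f[0] = 2.0   # hypothetical master sequence with twofold replication rate

eig_low, x_low = quasispecies_distribution(f, nu, lam, mu=0.01)
eig_high, x_high = quasispecies_distribution(f, nu, lam, mu=0.5)
```

At the low mutation rate the distribution is localized around the master sequence, while at μ = 0.5 (for binary sequences) every species has the same mole fraction.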
It is important to note that the quasi-species distribution is strongly dependent on the shape of the fitness landscape ({f_s}) and the mutation rate (µ). For proper landscapes, the quasi-species distribution shows a phase transition at several critical points of the mutation rate. When the mutation rate µ exceeds a certain critical point called the error threshold,

$$\mu_c \approx \frac{\ln\left(f_m / \bar{f}_{-m}\right)}{\nu} \tag{5}$$

an error catastrophe or a localization-delocalization transition occurs. In the delocalization state, all species have the identical mole fraction. In Eq. 5, the species m is the master sequence and f̄_{-m} is the mean fitness over all the species except m.
An example of a fitness landscape demonstrating a sharp localization-localization transition is that of asymmetric twin peaks consisting of a sharp high peak and a broad low peak. When µ is very small, the quasi-species members localize at the high peak. When µ becomes greater than some critical value (and less than the error threshold), the quasi-species members become localized at the broad low peak because the population at the broad peak is mutationally robust based on mutational interconnectedness. The transition is very narrow for µ and shows a critical slowing down phenomenon (Husimi 1988; Schuster and Swetina 1988). Wilke et al. called this situation the survival of the flattest (Wilke et al. 2001).
These results can be interpreted by thermodynamics-like concepts. Mutation causes the species to diffuse in the sequence space, while selection causes them to converge on a local area. Therefore, the mutation rate µ corresponds to a temperature-like parameter T. When µ = 0, the fittest species s*, which has the maximum fitness f_s*, is realized. This is analogous to the case of T = 0 in thermodynamics, in which the thermodynamic system realizes the minimum energy state. On the other hand, when µ > 0, the quasi-species q*, which has the maximum eigenvalue Λ_q*, is realized. This is analogous to the case of the thermodynamic system realizing the minimum free energy state. In the above-mentioned asymmetric twin peak case, the localization at the broad low peak is analogous to an intermediate conformation X of a protein in the unfolding process (native ↔ X ↔ denatured). Therefore, the eigenvalue Λ_q could be called the free fitness (Husimi 1988).
In thermodynamics, the phase transition temperature between state A and state B is given by T = ΔH/ΔS, where ΔH and ΔS are the enthalpy change and entropy change between A and B, respectively. In Eq. 5, the numerator represents an energy- or enthalpy-like quantity, and the denominator represents an entropy-like quantity. Therefore, Eq. 5 is analogous to T = ΔH/ΔS.
Free fitness
We summarize the result of the statistical analysis of the artificial selection type model. Denoting the mean fitness over the M parents in every generation by W̄, we focus on the statistical properties of the time course of W̄ through the hill-climbing process. Consider that the hill-climbing starts from the foot of the landscape (W̄ ≈ 0). The mean fitness W̄ increases exponentially and tends toward a stationary value denoted by W̄*. In the stationary state, the value of W̄ fluctuates around the attractor W̄*. As a result, under extreme conditions where λ, ν, d, and N have large values², the attractor W̄* is given by:
| 6 |
where ln(N/M) is the selection pressure because we select the best M individuals from among N offspring, and d(1 + k) is the expected number of the affected sites by random d-fold point mutations because a single point mutation changes the site-fitness values of on average 1 + k sites.
This stationary state is caused by the mutation-selection-random drift balance.
In order to interpret the evolutionary dynamics mentioned above, we introduce the following thermodynamics-like quantities. The analogy between the concepts in evolution and those in thermodynamics is compiled in Table 1. We denote the fitness coordinate by W. The frequency distribution of fitness over all conceivable sequences (of λ^ν) is given approximately by the following normal distribution:

$$\Omega(W) \approx \frac{\lambda^{\nu}}{\sqrt{2\pi\nu\sigma^2}} \exp\left(-\frac{W^2}{2\nu\sigma^2}\right) \tag{7}$$

where νσ² = H²/(3ν) is the variance of fitness over the whole sequence space, with H ≡ εν. The average of fitness over the whole sequence space corresponds to the foot of the landscape, while regions where W < 0 correspond to below sea level and are negligible for adaptive walks that start from random points, which are likely to be located at the foot of the landscape. Since the fitness at the global peak is about H (= εν), H corresponds to the height of the landscape from the foot to the global peak. In this paper, we focus on the region from the foot of the landscape to the global peak: 0 ≦ W ≦ H.
Table 1.
An analogy between thermodynamics and in vitro evolution
We define the landscape constant k_L and the entropy S as a function of W as follows:
![]() |
8 |
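$$S(W) \equiv k_L \ln \Omega(W) \tag{9}$$

where Ω(W) is the number of all sequences with a given fitness value W.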
$$S(W) \approx S(0) - \frac{k_L W^2}{2\nu\sigma^2} \tag{10}$$
Following the definition of thermodynamic temperature, we define the evolutionary temperature T as follows:

$$\frac{1}{T} \equiv -\left.\frac{\partial S}{\partial W}\right|_{W=\overline{W}^*} \tag{11}$$

where W̄* is the stationary value of the mean fitness W̄ for the M walkers. Using Eqs. 6 and 10, T is given by
![]() |
12 |
The quantity appearing in Eq. 8 is the standard deviation of a change in fitness for a unit Hamming distance. That is, the landscape constant k_L indicates the degree of ruggedness of the landscape. In Eq. 12, d indicates the degree of diffusion in the sequence space by random mutation, while ln(N/M) indicates the degree of convergence of sequence diversity by selection. T is the ratio of these conflicting effects. By using these parameters, the stationary value W̄* given by Eq. 6 is rewritten as:
| 13 |
According to the thermodynamics of protein folding, the most probable energy of a canonical ensemble is given by an equation similar to Eq. 13 (e.g., Eq. 11 in Wolynes and Luthey-Schulten 1997). Therefore, we can say that k_L and T are analogous to the Boltzmann constant and the thermodynamic temperature of the thermal bath, respectively.
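Here, we define the evolutionary potential ϕ and free fitness G for the M walkers with mean fitness W̄ as follows:

$$\phi(\overline{W}) \equiv S(\overline{W}) + \frac{\overline{W}}{T} \tag{14}$$

$$G(\overline{W}) \equiv T\phi(\overline{W}) = \overline{W} + T\,S(\overline{W}) \tag{15}$$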
Using Eq. 10, ϕ is rewritten as the following convex function of W̄:
| 16 |
| 17 |
Equation 17 is derived by substituting Eq. 13 into Eq. 16. We can see that the evolutionary potential ϕ and free fitness G take their maximum values at W̄ = W̄*. Furthermore, ϕ and G are Lyapunov functions of the evolution process (proof is shown in the next section).
In addition to these quantities, we define the evolutionary force X by

$$X \equiv \frac{\partial \phi}{\partial \overline{W}} \tag{18}$$
| 19 |
Substituting Eq. 14 into Eq. 18, the evolutionary force X is decomposed into the fitness force X_fit and the entropy force X_ent, where

$$X_{fit} \equiv \frac{1}{T}, \qquad X_{ent} \equiv \frac{\partial S}{\partial \overline{W}} \tag{20}$$

X_fit is caused by a selection event and pushes the walkers upward, while X_ent is caused by a mutation event and pushes the walkers downward. The mutation-selection-random drift balance occurs when X_fit and X_ent cancel each other out.
Based on the definitions mentioned above, we can describe the dynamics of the adaptive walk as follows (see Fig. 3). Driven by the evolutionary force X, the walkers tend to achieve the maximum free fitness state (or maximum evolutionary potential state), in which the fitness force X_fit and the entropy force X_ent cancel each other out. The evolutionary force X depends strongly on the evolutionary temperature T. Here, consider that the walkers are located at the middle point on the landscape. If T = ∞ (which is the case where N/M ≈ 1), then W̄* lies at the foot of the landscape and a negative force (X < 0) acts on the walkers, so the walkers are pushed downward. In this case, the maximum entropy state is realized. As T becomes lower (which is the case where N/M ≫ 1), W̄* moves up toward the top of the landscape and a positive force (X > 0) acts on the walkers, so the walkers are pushed upward. In this case, the (nearly) maximum fitness state is realized. If T = 0 (which is the case of d = 0), the walkers cannot move in the sequence space, and evolution does not occur.
Fig. 3.
Interpretation of evolutionary dynamics by thermodynamics-like concepts. The solid and dotted lines represent the entropy S and free fitness G, respectively, as a function of fitness. The right and left dotted lines are for the cases of low and high evolutionary temperature T, respectively. W̄* represents the stationary point, at which G takes its maximum under a given T. The vectors (arrows) represent the evolutionary force X that acts on the walkers with the mean fitness W̄. Details are described in the text. Adapted from Aita and Husimi (2006)
Fitness fluctuation
We denote the change in mean fitness W̄ after a single generation by ΔW̄ (top of Fig. 4). Since the exploration is done by random sampling of N mutants from among the underlying mutant population, the fitness change ΔW̄ is a stochastic quantity and is described by the theory of extreme value statistics. Let J and Σ be the expectation and standard deviation of ΔW̄, respectively. These are described in the following based on Aita et al. (2004).
Fig. 4.
Upper: A probability density of the mean-fitness change after a single step walk (=generation) from the mean fitness W̄. J and Σ represent the expectation and standard deviation of the change. Middle: The fitness-information change, ΔI_fit, is the fitness change digitized by the fluctuation size, Σ. Bottom: The concept of ΔI_fit is analogous to the thermal entropy, –ΔH/T, which is the enthalpy change digitized by the fluctuation size of thermal energy per degree of freedom, k_BT. Adapted from Aita and Husimi (2006)
The expectation J is described as follows:
![]() |
21 |
where L is the linear transport coefficient or mobility. The other coefficient in Eq. 21 represents the diffusion coefficient for the mean fitness W̄ along the fitness coordinate when walkers perform a random walk in the sequence space. The random walk occurs when N = M because there is no selection pressure (X_fit = 0). Equation 21 is analogous to Einstein’s relation in Brownian motion (Einstein 1905). Sato et al. found a similar scheme to Eq. 21 for the relationship between fluctuation and response in a biological system (Sato et al. 2003). Iguchi has tried to extend Eq. 21 to a case of coevolving biopolymers in a framework similar to Onsager’s reciprocal relations (Iguchi 2008).
The standard deviation Σ is described as follows:
![]() |
22 |
Σ is the degree of fluctuation of the change in mean fitness W̄ after a single generation, while k_LT is analogous to the thermal fluctuation energy per degree of freedom k_BT (k_B is the Boltzmann constant and T is temperature). Therefore, Eq. 22 is analogous to the energy-fluctuation formula in thermo-statistical mechanics.
Information gained through in vitro evolution
Biomolecular information
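Suppose an adaptive walk from the foot of the landscape (W̄ ≈ 0) up to the stationary state (W̄ = W̄*). Here, using entropy S in Eq. 9 and evolutionary potential ϕ in Eq. 14, we introduce the following three quantities as functions of mean fitness W̄:

$$I_{fit}(\overline{W}) \equiv \frac{\overline{W}}{T} \tag{23}$$

$$I_{Sha}(\overline{W}) \equiv -\left[S(\overline{W}) - S(0)\right] \tag{24}$$

$$I_{bio}(\overline{W}) \equiv \phi(\overline{W}) - \phi(0) \tag{25}$$

$$\phantom{I_{bio}(\overline{W})} = I_{fit}(\overline{W}) - I_{Sha}(\overline{W}) \tag{26}$$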
and let ΔI_fit, ΔI_Sha, and ΔI_bio be the changes in I_fit, I_Sha, and I_bio, respectively, after a single generation. Since Eq. 24 describes the change in entropy S between the initial state and a certain state (Fig. 5), I_Sha is interpreted as the Shannon information (Shannon and Weaver 1949).
Fig. 5.
Schematic representation of the fitness information I_fit and Shannon information I_Sha. The cone represents a fitness landscape schematically. The height represents the fitness W divided by T, while the area of the cross section represents the entropy S(W) ≡ k_L × ln Ω(W), where Ω(W) is the number of all sequences with a given fitness value W. I_fit(W) is defined as the change in W/T from the foot of the landscape, while I_Sha(W) is defined as the negative change in S from the foot. ΔI_fit and ΔI_Sha are the changes in these quantities after a single step walk (=generation)
On the other hand, from Eq. 22, the meaning of ΔI_fit is as follows:
| 27 |
We can see that ΔI_fit is the fitness change digitized by the fitness fluctuation size Σ (middle of Fig. 4). Here, the analog-to-digital conversion is realized as the significant figures of the fitness change observed by the adaptive walkers with the observation error Σ. This is analogous to the thermal entropy change –ΔH/T, that is, the enthalpy change digitized by the fluctuation size of thermal energy k_BT, when a system emits the heat –ΔH to the surroundings (bottom of Fig. 4) (Atkins 1978). According to the analogy with thermodynamics, we can interpret ΔI_fit as the negative entropy that the evolving system absorbs from the surroundings (Fig. 2 right). Here, the surroundings mean the experimental setup (e.g., a column of affinity-chromatography) around the biopolymer as an evolving entity (Fig. 1). We designate I_fit as the fitness information. We can say that the evolving entity gains the fitness information from the surroundings (Figs. 1 and 2).
The expectation of ΔI_bio is given by
![]() |
28 |
Equation 28 proves the theorem that I_bio is a Lyapunov function of the evolution process. Therefore, we conclude that the evolution is driven in the direction in which I_bio increases, and we designate I_bio as the biomolecular information in in vitro evolution.
In the stationary state (W̄ = W̄*), I_fit, I_Sha, and I_bio satisfy the following simple relation:

$$I_{bio}(\overline{W}^*) = I_{Sha}(\overline{W}^*) = \frac{1}{2} I_{fit}(\overline{W}^*) \tag{29}$$

$$I_{bio}(\overline{W}^*) \approx t^* \times k_L \ln\frac{N}{M} \tag{30}$$

where t* represents the mean generation (or mean step number) up to the stationary state, defined in terms of the expectation of the mean fitness W̄ at the tth generation. Since in each selection step the best M mutants are selected from among the N mutants, the term k_L ln(N/M) on the right-hand side corresponds to the operational information gain by selection. Therefore, the result shown in Eq. 30 is reasonable in that I_bio in the stationary state is approximately equivalent to the sum of the operational information gain over generations up to the stationary state, t* × k_L ln(N/M).
Extent and content of information
According to Eigen (2000), the extent of information is related to the constrained volume of the sequence space. Therefore, we define explicitly the extent of information in in vitro evolution in the following. Let p_s be the probability (or frequency) of occurrence of sequence s in a population. Entropy for this state is given by

$$S = -\sum_{s} p_s \log p_s \tag{31}$$

while the maximum entropy is given by S_max = log λ^ν = ν log λ, which is for the case where every sequence in the sequence space occurs with the same probability of λ^–ν. Particularly, S_max is called the potential (or source) entropy. The extent of information is defined as

$$I_{extent} \equiv S_{max} - S \tag{32}$$
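As a small illustration of this definition, the following sketch computes the extent of information as the difference between the maximum entropy ν log λ and the Shannon entropy of an observed sequence distribution. The logarithm base (2, giving bits) and the function names are assumptions made for the example; sequences that do not occur are simply omitted from the probability list.

```python
import math

def shannon_entropy(p, base=2.0):
    """Entropy of a probability list; zero-probability entries contribute nothing."""
    return -sum(ps * math.log(ps, base) for ps in p if ps > 0.0)

def extent_of_information(p, nu, lam, base=2.0):
    """Maximum entropy of the sequence space minus the entropy of the population."""
    s_max = nu * math.log(lam, base)
    return s_max - shannon_entropy(p, base)

# Sequence space with nu = 3 sites and lam = 4 letters (64 sequences)
uniform = [1.0 / 64] * 64          # every sequence equally frequent
converged = [1.0]                  # population converged on a single sequence

extent_uniform = extent_of_information(uniform, nu=3, lam=4)
extent_converged = extent_of_information(converged, nu=3, lam=4)
```

A population spread uniformly over the whole sequence space carries no extent of information, whereas a population converged on one sequence carries the full ν log λ.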
Several concepts, namely the genomic complexity (Adami et al. 2000), R_sequence (Schneider 2000; Kim et al. 2003), functional information (Szostak 2003), functional sequence complexity (Durston et al. 2007), and I_Sha (Eq. 24), can be classified as I_extent. For example, Szostak introduced the functional information, which is defined as –log2 of the fraction of functional sequences that have fitness values (activity of a biopolymer) greater than a specified value (Szostak 2003). Suppose that, among all 4^70 possible 70-mer RNA sequences, the fraction of functional sequences is 10^–11; the functional information in this case is 37 bits, compared with the 140 bits needed to specify a unique 70-mer sequence. Szostak suggested the importance of the activity-functional information relationship, which affects the evolvability of the biopolymer.
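The arithmetic of this example can be checked with a short calculation. The sketch below simply evaluates –log2 of the functional fraction and the number of bits needed to specify one 70-mer over a four-letter alphabet; the function name is illustrative.

```python
import math

def functional_information(fraction_functional):
    """Functional information in bits: -log2 of the fraction of functional sequences."""
    return -math.log2(fraction_functional)

bits_functional = functional_information(1e-11)   # about 36.5 bits
bits_unique_70mer = 70 * math.log2(4)             # 2 bits per base
```

Rounded to the nearest bit, the first value gives the 37 bits quoted in the text, and the second gives 140 bits.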
On the other hand, the content of information means the meaning or semantic value of information. One may say that the concept of fitness corresponds to the content of information in biological evolution. However, the fitness cannot be treated within the same level of Shannon information. Therefore, it is desirable to introduce a novel measure for the content of information.
Pragmatic information
As an answer to the issue of the content of information, Weinberger proposed pragmatic information, which quantifies the impact of a message on the receiver’s subsequent actions (Weinberger 2002). His theory is based on a communication system consisting of a decision maker and an effector (Fig. 6) and the assumption that the semantic value of information stems from its usefulness in making an informed decision.
Fig. 6.
Conceptual framework for pragmatic information. Adapted from Weinberger (2002)
Suppose that the decision maker, in some current state S, receives a set of M messages (m = 1, 2, 3, ..., M) and chooses a message m from the set with the probability φ_m. Subsequently, based on the practical meaning of the chosen message m, the effector selects an outcome o from among a set of N outcomes (o = 1, 2, 3, ..., N). The probability that an outcome o is selected without any message is given by q_o, and the conditional probability that an outcome o is selected when message m is given is P_o|m. The pragmatic information of the message ensemble is defined by

$$I_{pra} \equiv \sum_{m=1}^{M} \varphi_m \sum_{o=1}^{N} P_{o|m} \log_2 \frac{P_{o|m}}{q_o} \tag{33}$$

Thus, the pragmatic information is the average of the relative entropy between {P_o|m} and {q_o} over the message ensemble.
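The definition as an average relative entropy can be sketched directly in code. This is an illustrative sketch: the base-2 logarithm and the function name are assumptions, `phi` is the list of message probabilities, `P[m]` is the outcome distribution given message m, and `q` is the outcome distribution without any message.

```python
import math

def pragmatic_information(phi, P, q):
    """Average over messages of the relative entropy between P[m] and q (in bits)."""
    total = 0.0
    for phi_m, P_m in zip(phi, P):
        # Relative entropy of the message-conditioned outcomes with respect to q
        kl = sum(p * math.log2(p / q_o) for p, q_o in zip(P_m, q) if p > 0.0)
        total += phi_m * kl
    return total

q = [0.5, 0.5]
# A message that leaves the outcome probabilities unchanged carries no information
no_effect = pragmatic_information([1.0], [[0.5, 0.5]], q)
# Two equally likely messages, each fixing the outcome completely
decisive = pragmatic_information([0.5, 0.5], [[1.0, 0.0], [0.0, 1.0]], q)
```

In the first case the result is zero, and in the second case each message resolves one binary choice, giving one bit.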
Weinberger demonstrated that the pragmatic information is a global Lyapunov function for the quasi-species model. Here, the setup of the flow reactor effectively decides on the fitness of phenotype corresponding to each given genotype, where a phenotype’s fitness is defined to be its reproduction rate. At each time t, the flow reactor receives messages about the fitness of a particular replicator via the number of copies of that replicator’s genome. Prior to receipt of the messages, the initial probability of selecting a species s at random from the system is q s = x s(0). The probability of selecting a species s at subsequent time t is P s|m = x s(t), in which the process’s state at various times is the only message (M = 1) received, that is, φ 1 = 1. Then, the pragmatic information for the quasi-species model is given by
$$I_{\mathrm{pra}}(t) = \sum_{s} x_s(t) \log \frac{x_s(t)}{x_s(0)} \qquad (34)$$
Regardless of the initial distribution $\{x_s(0)\}$, $dI_{\mathrm{pra}}/dt > 0$ holds for all finite times. Thus, pragmatic information is generated through the evolution of the quasi-species, and it answers Eigen’s call for a value parameter for the level of evolution.
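The monotonic growth of the pragmatic information in Eq. 34 can be checked numerically in a simplified setting. The sketch below is an assumption-laden illustration rather than the full quasi-species model: mutation is neglected, so that the frequencies follow $x_s(t) \propto x_s(0)\exp(w_s t)$ with constant reproduction rates $w_s$. The function names, the rates, the initial frequencies, and the use of the natural logarithm are all hypothetical choices.

```python
import math

def selection_only_frequencies(x0, w, t):
    """Frequencies x_s(t) proportional to x_s(0) * exp(w_s * t).

    Assumption: pure selection with constant rates w_s and no mutation,
    which is a limiting case and not the full quasi-species dynamics.
    """
    weights = [x * math.exp(ws * t) for x, ws in zip(x0, w)]
    z = sum(weights)
    return [v / z for v in weights]

def pragmatic_information_quasispecies(x0, xt):
    """Relative entropy between x(t) and x(0), as in Eq. 34 (in nats)."""
    return sum(x * math.log(x / x_init)
               for x, x_init in zip(xt, x0) if x > 0.0)

# Hypothetical three-species system.
x0 = [0.7, 0.2, 0.1]     # initial frequencies x_s(0)
w = [1.0, 1.5, 2.0]      # reproduction rates (fitness values)
times = [0.5 * k for k in range(11)]

i_pra = [
    pragmatic_information_quasispecies(x0, selection_only_frequencies(x0, w, t))
    for t in times
]
for t, value in zip(times, i_pra):
    print(f"t = {t:4.1f}  I_pra = {value:.4f}")
```

In this mutation-free case, the time derivative of the relative entropy equals t times the variance of $w_s$ over the current population, so the printed values start at zero and increase as long as more than one species is present.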
For the natural selection type model (quasi-species model) and the artificial selection type model, the Lyapunov functions are the pragmatic information $I_{\mathrm{pra}}$ (Eq. 34) and the biomolecular information $I_{\mathrm{bio}}$ (Eq. 25), respectively. We have not yet demonstrated the relationship between them. In addition, these concepts should be related to thermodynamic entropy production in the real evolving system and the surroundings (Smith 2008a, 2008b, 2008c), in a similar way to the problem of Maxwell’s demon (Szilard 1929; Brillouin 1956).
Fitness information
We provide further discussion of the concept of the fitness information $I_{\mathrm{fit}}$ defined in Eq. 23. The evolving entity (=biopolymer) gains from the surroundings a content of information for adaptation and existence under given conditions. We interpret the fitness information $I_{\mathrm{fit}}$ as this content of information. For example, in the emergence of a specific peptide with a high affinity to a receptor molecule (see Fig. 1), as mentioned in the “Introduction” section, the emerging peptides gain $I_{\mathrm{fit}}$ from the receptor under the experimental conditions. In this case, fitness should be defined as the natural logarithm of the association constant between the peptide and the receptor: $W \equiv \ln K_a$. The value of the content of information is quantified by dividing a fitness change, $\Delta \ln K_a$, by the evolutionary temperature T, or in other words, by scaling $\Delta \ln K_a$ by the accuracy of observation by the adaptive walkers, that is, the fluctuation of the fitness change after a single generation, $\mathrm{SD}[\Delta \ln K_a]$. On the other hand, $I_{\mathrm{Sha}}$ represents the extent of information. The biomolecular information in in vitro evolution, $I_{\mathrm{bio}}$, consists of the content and the extent of information (see Fig. 5).
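As a small worked illustration of the scaling described above, the sketch below divides a change in $\ln K_a$ by an evolutionary temperature T. It assumes only the relation stated in the text (fitness change divided by T); the function name, the association constants, and the value of T are hypothetical, and T is taken as a given input rather than derived from the fluctuation of the fitness change.

```python
import math

def fitness_information(k_a_initial, k_a_final, temperature):
    """Fitness change in W = ln(K_a), divided by the evolutionary temperature T.

    All arguments are illustrative inputs: k_a_initial and k_a_final are
    association constants in the same units, and temperature is the
    evolutionary temperature T, assumed to be known and positive.
    """
    if k_a_initial <= 0.0 or k_a_final <= 0.0:
        raise ValueError("association constants must be positive")
    if temperature <= 0.0:
        raise ValueError("evolutionary temperature must be positive")
    delta_w = math.log(k_a_final) - math.log(k_a_initial)
    return delta_w / temperature

# Hypothetical numbers: affinity improves 1000-fold, with T = 0.5.
i_fit = fitness_information(1.0e5, 1.0e8, temperature=0.5)
print(f"I_fit = {i_fit:.3f}")
```

For a fixed improvement in affinity, a smaller T (a sharper observation of fitness differences by the adaptive walkers) gives a larger value in this sketch.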
Acknowledgments
We thank Dr. Kazumoto Iguchi for helpful discussions and providing information.
Footnotes
The validity of using the NK landscape was demonstrated in Aita et al. (2007).
The conditions are ln(N/M) […] and […], where […] is the size of the d-boundary of any sequence. The derivation of Eq. 6 is described in Aita et al. (2007).
Note that we limited the selected references described here to those published after the year 2000, although there are many important references published before that year.
**Aita et al. 2004. Based on additive fitness landscapes, the evolutionary dynamics of the artificial selection type model is analyzed and interpreted by showing an analogy to thermodynamics. This interpretation is extended to quantify information gain from the surroundings (environments) in an evolution process.
**Eigen 2000. A question about the extent and content of information in biological evolution is raised. The quasi-species theory is summarized, and the evolutionary dynamics including phase transitions of the quasi-species is discussed by showing an analogy to thermodynamics and from the viewpoint of the extent and content of information.
**Sato et al. 2003. A mathematical relationship between fluctuation and response in a biological system is described by showing an analogy to the fluctuation-dissipation theorem in physics. An application to an experimental observation is demonstrated.
**Weinberger 2002. As an answer to Eigen’s call for a value parameter, that is, the content of information, for biological evolution, a theory of pragmatic information is described. A proof that the pragmatic information is a global Lyapunov function for the quasi-species model is given.
*Adami et al. 2000. A simulation study of the evolution of digital organisms. For analysis of the evolutionary dynamics, the concept of genomic complexity, which is a quantity increasing through the evolution process, is proposed.
*Aita et al. 2005. The thermodynamics-like concepts and information-theoretical concepts proposed in Aita et al. 2004 are extended to Kauffman’s NK fitness landscapes.
*Blackburne and Hirst 2005. A simulation study of population dynamics of lattice model proteins. The population is estimated using the analogy to thermodynamics.
*Szostak 2003. Starting from the question of how to define and quantify the information content of biopolymer sequences, Szostak concisely introduces the concept of functional information.
References
- Adami C, Ofria C, Collier TC. Evolution of biological complexity. Proc Natl Acad Sci USA. 2000;97:4463–4468. doi: 10.1073/pnas.97.9.4463.
- Aita T, Husimi Y. An interpretation of evolutionary dynamics in in vitro molecular evolution from thermodynamical and informational viewpoint. Seibutsu Butsuri (Biophysics) 2006;46:137–143. doi: 10.2142/biophys.46.137.
- Aita T, Morinaga S, Husimi Y. Thermodynamical interpretation of evolutionary dynamics on a fitness landscape in an evolution reactor, I. Bull Math Biol. 2004;66:1371–1403. doi: 10.1016/j.bulm.2004.01.004.
- Aita T, Morinaga S, Husimi Y. Thermodynamical interpretation of evolutionary dynamics on a fitness landscape in an evolution reactor, II. Bull Math Biol. 2005;66:1371–1403. doi: 10.1016/j.bulm.2004.01.004.
- Aita T, Hayashi Y, Toyota H, Husimi Y, Urabe I, Yomo T. Extracting characteristic properties of fitness landscape from in vitro molecular evolution: a case study on infectivity of fd phage to E.coli. J Theor Biol. 2007;246:538–550. doi: 10.1016/j.jtbi.2006.12.037.
- Ao P. Emerging of stochastic dynamical equalities and steady state thermodynamics from Darwinian dynamics. Commun Theor Phys. 2008;49:1073–1090. doi: 10.1088/0253-6102/49/5/01.
- Atkins PW. Physical chemistry. New York: Oxford University; 1978.
- Brillouin L. Science and information theory. New York: Academic Press; 1956.
- Blackburne BP, Hirst JD. Population dynamics simulations of functional model proteins. J Chem Phys. 2005;123:154907. doi: 10.1063/1.2056545.
- Durston KK, Chiu DK, Abel DL, Trevors JT. Measuring the functional sequence complexity of proteins. Theor Biol Med Model. 2007;4:47. doi: 10.1186/1742-4682-4-47.
- Eigen M. Natural selection: a phase transition? Biophys Chem. 2000;85:101–123. doi: 10.1016/S0301-4622(00)00122-8.
- Eigen M, Schuster P. The hypercycle. Berlin: Springer; 1979.
- Einstein A. On the movement of small particles suspended in stationary liquids required by the molecular-kinetic theory of heat. Ann D Phys. 1905;17:549–560. doi: 10.1002/andp.19053220806.
- Fisher RA. The genetical theory of natural selection. Oxford: Clarendon; 1930.
- Husimi Y. Selective value landscape on the base sequence space and concept of free selective value: a model of quasi-species. Viva Origino. 1988;16:136–141.
- Iwasa Y. Free fitness that always increases in evolution. J Theor Biol. 1988;135:265–281. doi: 10.1016/S0022-5193(88)80243-1.
- Iguchi K. Reciprocal relations in evolutionary processes. Prog Theor Phys. 2008;Suppl 173:235–242.
- Kauffman SA. The origin of order. Oxford: Oxford University Press; 1993.
- Kim JT, Martinetz T, Polani D. Bioinformatic principles underlying the information content of transcription factor binding sites. J Theor Biol. 2003;220:529–544. doi: 10.1006/jtbi.2003.3153.
- Maynard-Smith J. Natural selection and the concept of a protein space. Nature. 1970;225:563–564. doi: 10.1038/225563a0.
- Mills DR, Peterson RL, Spiegelman S. An extracellular Darwinian experiment with a self-duplicating nucleic acid molecule. Proc Natl Acad Sci USA. 1967;58:217–224. doi: 10.1073/pnas.58.1.217.
- Pande VS, Grosberg AY, Tanaka T. Statistical mechanics of simple models of protein folding and design. Biophys J. 1997;73:3192–3210. doi: 10.1016/S0006-3495(97)78345-0.
- Sato K, Ito Y, Yomo T, Kaneko K. On the relation between fluctuation and response in biological systems. Proc Natl Acad Sci USA. 2003;100:14086–14090. doi: 10.1073/pnas.2334996100.
- Schneider TD. Evolution of biological information. Nucleic Acid Res. 2000;28:2794–2799. doi: 10.1093/nar/28.14.2794.
- Schuster P, Swetina J. Stationary mutant distributions and evolutionary optimization. Bull Math Biol. 1988;50:635–660. doi: 10.1007/BF02460094.
- Shannon CE, Weaver W. The mathematical theory of communication. Champaign: University of Illinois Press; 1949.
- Smith E. Thermodynamics of natural selection I: energy flow and the limits on organization. J Theor Biol. 2008;252:185–197. doi: 10.1016/j.jtbi.2008.02.010.
- Smith E. Thermodynamics of natural selection II: chemical Carnot cycles. J Theor Biol. 2008;252:198–212. doi: 10.1016/j.jtbi.2008.02.008.
- Smith E. Thermodynamics of natural selection III: Landauer’s principle in computation and chemistry. J Theor Biol. 2008;252:213–220. doi: 10.1016/j.jtbi.2008.02.013.
- Szilard L. Über die Entropieverminderung in einem thermodynamischen System bei Eingriffen intelligenter Wesen. Z Physik. 1929;53:840–856. doi: 10.1007/BF01341281.
- Szostak JW. Functional information: molecular messages. Nature. 2003;423:689. doi: 10.1038/423689a.
- Weinberger ED. A theory of pragmatic information and its application to the quasi-species model of biological evolution. Biosystems. 2002;66:105–119. doi: 10.1016/S0303-2647(02)00038-2.
- Wilke CO, Wang JL, Ofria C, Lenski RE, Adami C. Evolution of digital organisms at high mutation rates leads to survival of the flattest. Nature. 2001;412:331–333. doi: 10.1038/35085569.
- Wolynes PG, Luthey-Schulten Z. The energy landscape theory of protein folding. In: Flyvbjerg H, Hertz J, Jensen MH, Mouritsen OG, Sneppen K (eds) Physics of biological systems: from molecules to species. New York: Springer; 1997. pp 61–79.