Biophysical Reviews. 2009 Dec 15;2(1):1–11. doi: 10.1007/s12551-009-0021-8

Biomolecular information gained through in vitro evolution

Takuyo Aita, Yuzuru Husimi
PMCID: PMC5425671  PMID: 28509942

Abstract

In vitro evolution is a simplified Darwinian evolution in well-controlled surroundings. This evolution process can be modeled as a hill-climbing or adaptive walk on a fitness landscape in sequence space. The evolving molecular system gains at least two kinds of information originating from the converged sequences and the fitness increment of the evolving biopolymer as the adaptive walker. These two represent two aspects of the biomolecular information, its extent and its content, respectively. Here, we review studies related to the formulation of the “content” and “extent” of biomolecular information. The two aspects are interconnected through physicochemical properties of the biopolymer, contrary to the case of conventional information, which seems to be independent of matter. The interconnection was analyzed based on the analogy between the evolution process and thermodynamics. The linear combination of the two by a temperature-like fluctuation factor resulted in a free-energy-like function that increases monotonically during the evolution process.

Keywords: Biological information, Fitness landscape, Free fitness, In vitro evolution, Pragmatic information, Quasi-species, Sequence space

Introduction

In vitro evolution is an artificial molecular evolution that is conducted in a laboratory and driven by the Darwinian evolution mechanism. Study in this field began with Spiegelman’s experiment in 1967 (Mills et al. 1967) and has been further developed in both science and engineering, that is, the quest for the principle of emergence of functional biopolymers and practical applications to industry and medicine, e.g., evolutionary (or adaptive) drug design.

Here, we introduce several terms that are important for understanding in vitro evolution theoretically. A quantitative measure of a molecular phenotype, that is, a certain physicochemical property (such as enzymatic activity, affinity to a ligand molecule, or replication rate constant) of an evolving molecule, is designated as the fitness. The conceptual space of all conceivable base or amino acid sequences (=genotypes) is designated as the sequence space (Maynard-Smith 1970). Each conceivable sequence is mapped onto its corresponding point in the sequence space. The distance between two arbitrary points is measured by the Hamming distance between the two corresponding sequences. The scalar field constructed by plotting the fitness value of each sequence at the corresponding point in the sequence space is designated as the fitness landscape, which is regarded as the evolutionary attribute of the biopolymer (Eigen and Schuster 1979; Kauffman 1993).
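As a small illustration of these definitions, the following Python sketch enumerates a tiny sequence space and measures Hamming distances in it. The function names and the four-letter RNA alphabet are illustrative choices, not taken from the reviewed studies, and full enumeration is feasible only for very short sequences.

```python
from itertools import product


def hamming(s: str, u: str) -> int:
    """Number of sites at which two equal-length sequences differ."""
    if len(s) != len(u):
        raise ValueError("sequences must have equal length")
    return sum(a != b for a, b in zip(s, u))


def sequence_space(alphabet: str, length: int) -> list:
    """Enumerate all len(alphabet)**length sequences (tiny cases only)."""
    return ["".join(p) for p in product(alphabet, repeat=length)]


space = sequence_space("ACGU", 3)          # 4 letters, 3 sites
neighbours = [u for u in space if hamming("ACG", u) == 1]

print(len(space))         # number of points in the sequence space
print(len(neighbours))    # one-mutant neighbours of "ACG"
```

Each sequence of length ν over λ letters has ν(λ–1) one-mutant neighbours, which is what the second count reports for this toy case.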

In an in vitro molecular evolution process, there are two parts: the evolving system, which consists of phenotype/genotype molecules as evolving entities, and the surroundings (or environments) as an experimental setup. For example, we consider a typical case where we try to create specific peptides with high binding affinity to a given receptor protein through in vitro evolution (Fig. 1). The peptides correspond to the evolving system, and the shape of the receptor molecular surface, including the distribution of electric charge on it, and other experimental conditions of the solvent correspond to the surroundings. In this case, the fitness should be defined as the logarithm of the association constant between the peptide and the receptor, ln K_a. Using the logarithm of the association constant means that the fitness is treated at the free-energy level. We must set and control the surroundings properly for correct evaluation of the fitness. Darwinian evolution is considered as an information gaining process from the surroundings. In order to simplify the information gaining process and treat it physically, we focused on in vitro molecular evolution in a well-controlled environment as an extremely simplified process of biological evolution.

Fig. 1.


Emergence of a specific peptide with a high affinity to a receptor molecule, by absorbing the fitness information Inline graphic from the surroundings, where T E represents the evolutionary temperature and T T represents the thermodynamical temperature. This is a case of M = 1

In this simplified system, an evolution process of a biopolymer is considered as an “adaptive walk or hill-climbing” on the corresponding fitness landscape in the sequence space (Fig. 2 right). Here, the evolving molecular system gains at least two kinds of information originating from the converged sequences and the fitness increment of the evolving biopolymer as the adaptive walker. These represent two aspects of the biomolecular information, extent and content, respectively. The two are interconnected through physicochemical properties of the biopolymer. The interconnection may be analyzed based on the analogy between the evolution process and thermodynamics. In fact, the right side of Fig. 2 is analogous to the conceptual view that protein folding is considered to be related to the downhill walk on the energy landscape in the conformation space (Wolynes and Luthey-Schulten 1997) (Fig. 2 left). Therefore, there have been many studies on the analogy between evolution and thermodynamics.

Fig. 2.


An analogy between protein folding and protein evolution. In protein folding, a folding polypeptide descends the energy landscape by emitting the thermal entropy –ΔH/T (= ΔS sur) to the surroundings. In protein evolution, we interpret that the evolving sequences climb the fitness landscape by absorbing fitness information Inline graphic as the negative entropy from the surroundings, which is defined in an experimental setup. Adapted from Aita and Husimi (2006)

Extending the interpretation of evolution by thermodynamics-like concepts, we can clarify a view of the evolution process as an information gaining process from the surroundings. Eigen raised a question about the extent and content of information in biological evolution (Eigen 2000). According to him, the extent of information is related to the constrained volume of the sequence space and can be treated within the classical information theory (Shannon and Weaver 1949). On the other hand, the content of information means the meaning or semantic value of information.

In this review, we summarize two typical studies on the evolutionary dynamics of in vitro evolution, which has been interpreted in terms of thermodynamics-like concepts leading to both aspects of the information concept, its extent and its content. One is a theory for a natural selection type model. That is the quasi-species theory developed by Eigen’s group (Eigen and Schuster 1979; Eigen 2000; Weinberger 2002). The other is a theory of an artificial selection type model, which was developed by the authors (Aita et al. 2004, 2005, 2007; Aita and Husimi 2006). These studies gave a formulation for the information gaining process from the surroundings. Other studies are also reviewed within the framework of the two typical studies.

Natural selection type model of in vitro molecular evolution

We can classify models of in vitro molecular evolution into two types: natural selection type and artificial selection type. In the natural selection type, the fitness is the specific growth rate of evolving molecules. An example of this is self-replicating RNA molecules in a flow reactor. In the artificial selection type, the fitness is one of the physicochemical properties of evolving molecules, e.g., binding free energy to a target receptor, and the selection process is conducted through a cycle of evaluation and amplification by the experimenter. Therefore, in vitro evolution of the artificial selection type is also called directed evolution.

Eigen and Schuster proposed the quasi-species model as the natural selection type model. This model describes the evolutionary dynamics of the ensemble of simple self-replicators in a flow reactor, such as self-replicating RNA molecules (Eigen and Schuster 1979; Eigen 2000). The genome sequence of the self-replicators consists of ν sites, and λ letters are available at each site, that is, there are $\lambda^{\nu}$ possible sequences. Then, each genome sequence is mapped onto the corresponding point in the λ-valued ν-dimensional sequence space. In the replication processes, it is possible to replace each letter with one of the other λ–1 letters with a probability of the mutation rate μ. The mole fraction of a certain species s at time t, denoted by x s(t), obeys the following differential equation:

$$\frac{dx_s(t)}{dt} = \sum_{u} m_{su} f_u x_u(t) - D(t)\, x_s(t) \tag{1}$$
$$m_{su} = \left(\frac{\mu}{\lambda-1}\right)^{d(s,u)} (1-\mu)^{\nu-d(s,u)} \tag{2}$$

where f s is the replication rate or fitness of s, and m su is the probability of mutation from s to u. The d(s, u) is the Hamming distance between s and u. D(t) is the dilution rate, which works to satisfy $\sum_s x_s(t) = 1$ and is given by the mean replication rate $D(t) = \sum_s f_s x_s(t)$. Equations 1 and 2 mean that any species propagates cooperatively with neighbor species in the sequence space through mutation. It should be noted that the quasi-species model assumes an infinite population and can be applied to any model of fitness landscapes.
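The stationary behavior of this model can be explored numerically for very small sequence spaces. The sketch below assumes the standard Eigen form of the dynamics, dx_s/dt = Σ_u m_su f_u x_u – D(t) x_s with m_su = (μ/(λ–1))^d(s,u) (1–μ)^(ν–d(s,u)). This form, the function names, and the single-peak toy landscape are assumptions made for illustration only. The stationary mole fractions are taken from the dominant eigenvector of the matrix [m_su f_u].

```python
import itertools

import numpy as np


def quasispecies(fitness, nu, lam, mu):
    """Stationary mole fractions of the replication-mutation equation.

    Assumed form: dx_s/dt = sum_u m_su f_u x_u - D x_s, with
    m_su = (mu/(lam-1))**d(s,u) * (1-mu)**(nu-d(s,u)).
    """
    seqs = list(itertools.product(range(lam), repeat=nu))
    f = np.array([fitness(s) for s in seqs], dtype=float)
    arr = np.array(seqs)
    # Hamming distance between every pair of sequences
    d = (arr[:, None, :] != arr[None, :, :]).sum(axis=2)
    m = (mu / (lam - 1)) ** d * (1.0 - mu) ** (nu - d)
    w = m * f[None, :]                      # matrix [m_su f_u]
    eigvals, eigvecs = np.linalg.eig(w)
    top = int(np.argmax(eigvals.real))      # largest eigenvalue
    x = np.abs(eigvecs[:, top].real)        # dominant eigenvector has one sign
    return seqs, x / x.sum(), float(eigvals[top].real)


def single_peak(seq):
    # Hypothetical landscape: the all-zero master sequence replicates 10x faster.
    return 10.0 if not any(seq) else 1.0


seqs, x, top = quasispecies(single_peak, nu=6, lam=2, mu=0.01)
master_fraction = x[seqs.index((0,) * 6)]
print(f"master fraction = {master_fraction:.3f}, largest eigenvalue = {top:.3f}")
```

Because the columns of m sum to one under this assumed form, the largest eigenvalue equals the stationary mean replication rate, that is, the stationary dilution rate.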

Artificial selection type model of in vitro molecular evolution

We considered evolutionary dynamics of the following artificial selection type model (Aita et al. 2004, 2005). M parents (parent sequences) produce N offspring (mutant sequences), and the fitness value of each offspring is evaluated. Subsequently, the best M individuals among the N offspring become new parents in the next generation. N is the library size of mutants to be screened in a single generation. In the reproduction process, d-fold point mutations occur randomly, that is, d represents the Hamming distance between a parent and each of its offspring. This iterative process is called the adaptive walk or hill-climbing, and the parents are regarded as the walkers or climbers with the step length of d on the landscape. The parameters M, N, and d are constant throughout the walking process.
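The selection cycle described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each offspring is derived from a parent picked uniformly at random (the text does not specify how the N offspring are distributed among the M parents), and `toy_fitness` is a hypothetical additive landscape used as a stand-in for a real fitness measurement.

```python
import random


def mutate(parent, d, lam, rng):
    """Return a copy of parent with exactly d sites changed (Hamming distance d)."""
    child = list(parent)
    for j in rng.sample(range(len(parent)), d):
        child[j] = rng.choice([a for a in range(lam) if a != parent[j]])
    return tuple(child)


def adaptive_walk(fitness, nu, lam, M, N, d, generations, seed=0):
    """M parents -> N offspring -> best M offspring become the new parents."""
    rng = random.Random(seed)
    parents = [tuple(rng.randrange(lam) for _ in range(nu)) for _ in range(M)]
    history = []
    for _ in range(generations):
        # Assumption: every offspring comes from a uniformly chosen parent.
        offspring = [mutate(rng.choice(parents), d, lam, rng) for _ in range(N)]
        offspring.sort(key=fitness, reverse=True)
        parents = offspring[:M]
        history.append(sum(fitness(p) for p in parents) / M)  # mean fitness
    return parents, history


def toy_fitness(seq):
    # Hypothetical additive landscape: one unit per site carrying letter 0.
    return sum(1 for a in seq if a == 0)


parents, history = adaptive_walk(toy_fitness, nu=30, lam=4, M=2, N=50, d=1,
                                 generations=100)
print(history[0], history[-1])
```

Note that the parents themselves are not carried over to the next generation, so the mean fitness can fluctuate or even decrease once the walkers approach a stationary state.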

The evolutionary dynamics of a finite population depends on the local structure of the fitness landscape. We supposed the following NK landscape (Kauffman 1993) in the λ-valued ν-dimensional sequence space (Aita et al. 2005, 2007)1. In this model, an arbitrary site in a sequence interacts with k other sites. The fitness W for a given sequence $A_1 A_2 \cdots A_\nu$ is defined by

$$W = \sum_{j=1}^{\nu} w_j(A_j \mid A_{j_1}, \ldots, A_{j_k}) \tag{3}$$

where w j is the site fitness, i.e., a fitness contribution from the jth site, and Aj represents a particular letter at the jth site. The value of w j is given as a function of 1+k letters at the jth site (Aj) and other k sites (Aj1, ..., Ajk). The interacting k sites {j 1, ..., j k} are randomly chosen from among all of the ν–1 sites except the jth site. Once a set of letters {Aj1, ..., Ajk} at these k sites is given, the value of w j for an arbitrary letter Aj, w j(Aj|Aj1, ..., Ajk) is assigned randomly from a given probability distribution. Here, we adopt a discrete uniform distribution in the range [–ε, ε], where ε is a positive constant (Inline graphic). Let σ 2 be the variance for the uniform distribution, then σ 2 = ε 2/3 for this case. On the whole, the ruggedness of the landscape is controlled by the parameter k. In the case of k = 0, the resulting fitness landscape has a smooth surface and a single global peak. As the k value increases, the surface of the fitness landscape becomes more rugged and many local optima appear.
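A minimal sketch of such an NK landscape is given below. It assumes that the fitness is the sum of the site-fitness contributions and, for simplicity, draws each contribution from a continuous uniform distribution on [–ε, ε] rather than the discrete one used in the text. The class name `NKLandscape` and the lazy filling of the site-fitness tables are illustrative choices, not the authors' implementation.

```python
import random


class NKLandscape:
    """NK-type landscape with nu sites, lam letters and k interacting sites."""

    def __init__(self, nu, lam, k, eps=1.0, seed=0):
        self.nu, self.lam, self.k, self.eps = nu, lam, k, eps
        self.rng = random.Random(seed)
        # For each site j, choose k partner sites among the other nu-1 sites.
        self.partners = [
            self.rng.sample([i for i in range(nu) if i != j], k)
            for j in range(nu)
        ]
        # Site-fitness tables, filled lazily on first lookup.
        self.tables = [dict() for _ in range(nu)]

    def site_fitness(self, j, seq):
        """w_j as a function of the letter at site j and at its k partners."""
        key = (seq[j],) + tuple(seq[i] for i in self.partners[j])
        table = self.tables[j]
        if key not in table:
            # Assumption: continuous uniform on [-eps, eps] (variance eps**2/3).
            table[key] = self.rng.uniform(-self.eps, self.eps)
        return table[key]

    def fitness(self, seq):
        """Total fitness W: sum of the nu site-fitness contributions."""
        return sum(self.site_fitness(j, seq) for j in range(self.nu))


rugged = NKLandscape(nu=10, lam=4, k=2, seed=1)
smooth = NKLandscape(nu=10, lam=4, k=0, seed=1)
s = (0,) * 10
t = (1,) + (0,) * 9          # one point mutation at the first site
print(rugged.fitness(s), rugged.fitness(t))
print(smooth.fitness(s), smooth.fitness(t))
```

With k = 0 a point mutation changes a single site-fitness term, whereas with k > 0 it also changes the terms of every site that has the mutated site as a partner, which is the origin of the ruggedness.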

Analogy between evolution and thermodynamics

Thermodynamic concepts connect the evolution process to information. Here, we review studies related to the analogy between evolution and thermodynamics from various viewpoints.

Iwasa stressed the importance of genetic drift rather than of mutation in a finite population and introduced the free fitness Inline graphic, where N is population size, and demonstrated that free fitness I always increases with time in the evolution process. This scheme indicates the analogy of free fitness to free energy, the mean fitness to enthalpy, and fluctuation parameter 1/2N to temperature (Iwasa 1988). The special case of N = ∞ corresponds to Fisher’s fundamental theorem of natural selection (Fisher 1930). Blackburne and Hirst conducted a simulation of population dynamics using simple lattice model proteins (Blackburne and Hirst 2005). They also estimated the population using the analogy with Boltzmann distribution in thermodynamics, in which a temperature-like parameter was empirically derived as a function of the mutation rate and selection pressure.

Sato et al. referred to a mathematical relationship between fluctuation V[X]a and response Inline graphic in a biological system (where Inline graphic and V[X]a are the average and variance of the variable X at the initial parameter value a), and demonstrated that the relationship they found, Inline graphic, is similar to Einstein’s relation in the fluctuation-dissipation theorem in Brownian motion (Sato et al. 2003). They confirmed the relationship through an experimental observation in which X represents the logarithm of fluorescence intensity per E. coli cell including mutant GFP proteins and Δa represents the synonymous mutation rate of their genes. Ao presented the relationship between Darwinian evolution and thermodynamics from the viewpoint of Langevin dynamics (Ao 2008). His theory describes the dynamics on a potential surface in genetic frequency space, where each coordinate axis represents the frequency of a species. From the viewpoint of molecular imprinting, Pande et al. developed statistical mechanics of protein folding and design (Pande et al. 1997), introducing the design temperature T des, which controls the probability of the occurrence of amino acid sequences with low energy (designed sequences) in a given canonical ensemble. They obtained a phase diagram for model heteropolymers in a two-dimensional T des–T space (T is the thermodynamic temperature).

Quasi-species

We summarized the result of the mathematical analysis of the natural selection type model (Eq. 1). In a special case of μ = 0, the solution of Eq. 1 is easily obtained by introducing the variable Inline graphic. Let s* be the fittest species that has a maximum replication rate among all the n species. In the stationary state, we can observe

$$x_s(\infty) = \begin{cases} 1 & (s = s^*) \\ 0 & (s \neq s^*) \end{cases}$$

That is, only the fittest species s* exists in the reactor.

In general cases of μ > 0, Eq. 1 must be transformed in the following manner. Considering the n-dimensional matrix [m su f u], we denote the qth eigenvalue and eigenvector of this matrix by Λq and Inline graphic (for q = 1,2, ..., n), respectively. By diagonalizing the matrix [m su f u] by the unitary matrix Inline graphic and introducing Inline graphic, Eq. 1 is transformed to

graphic file with name M20.gif 4

where q is designated as the quasi-species. The dilution rate is rewritten as Inline graphic. The solution of Eq. 4 is easily obtained in the same manner as mentioned above. Let q* be the quasi-species that has a maximum eigenvalue (according to the Frobenius theorem, Inline graphic and Inline graphic). In the stationary state, we can observe

graphic file with name M24.gif

That is, only the quasi-species q* is realized in the reactor. The x s(t) in the stationary state is given by

graphic file with name M25.gif

The realized eigenvector Inline graphic is designated as the quasi-species distribution. The quasi-species distribution is caused by the mutation-selection balance.

It is important to note that the quasi-species distribution is strongly dependent on the shape of the fitness landscape ({f s}) and mutation rate (µ). For proper landscapes, the quasi-species distribution shows a phase transition at several critical points of the mutation rate. When the mutation rate µ exceeds a certain critical point called the error threshold,

graphic file with name M27.gif 5

an error catastrophe or a localization-delocalization transition occurs. In the delocalization state, all species have the identical mole fraction. In Eq. 5, the species m is the master sequence and Inline graphic is the mean fitness over all the species except m.

An example of a fitness landscape demonstrating a sharp localization-localization transition is that of asymmetric twin peaks consisting of a sharp high peak and a broad low peak. When µ is very small, the quasi-species members localize at the high peak. When µ becomes greater than some critical value (and less than the error threshold), the quasi-species members become localized at the broad low peak because the population at the broad peak is mutationally robust based on mutational interconnectedness. The transition is very narrow for µ and shows a critical slowing down phenomenon (Husimi 1988; Schuster and Swetina 1988). Wilke et al. called this situation the survival of the flattest (Wilke et al. 2001).

These results can be interpreted by thermodynamics-like concepts. Mutation causes the species to diffuse in the sequence space, while selection causes them to converge on a local area. Therefore, the mutation rate µ corresponds to a temperature-like parameter T. When µ = 0, the fittest species s*, which has the maximum fitness $f_{s^*}$, is realized. This is analogous to the case of T = 0 in thermodynamics because the thermodynamic system realizes the minimum energy state. On the other hand, when µ > 0, the quasi-species q*, which has the maximum eigenvalue $\Lambda_{q^*}$, is realized. This is analogous to the case of the thermodynamic system realizing the minimum free energy state. In the above-mentioned asymmetric twin peak case, the localization at the broad low peak is analogous to an intermediate conformation X of a protein in the unfolding process (native ↔ X ↔ denatured). Therefore, the eigenvalue Λq could be called the free fitness (Husimi 1988).

In thermodynamics, the phase transition temperature between state A and state B is given by $T = \Delta H / \Delta S$, where ΔH and ΔS are the enthalpy change and entropy change between A and B, respectively. In Eq. 5, the numerator represents energy- or enthalpy-like quantity, and the denominator represents entropy-like quantity Inline graphic. Therefore, Eq. 5 is analogous to $T = \Delta H / \Delta S$.

Free fitness

We summarized the result of the statistical analysis of the artificial selection type model. Denoting the mean fitness over the M parents by Inline graphic for every generation, we focus on the statistical properties of a time course of Inline graphic through the hill-climbing process. Consider that the hill-climbing starts from the foot of the landscape Inline graphic. The mean fitness Inline graphic increases exponentially and tends toward a stationary value denoted by Inline graphic. In the stationary state, the value of Inline graphic fluctuates around the attractor Inline graphic. As a result, under extreme conditions where λ, ν, d, and N have large values2, the attractor Inline graphic is given by:

graphic file with name 12551_2009_21_Equ1_HTML.gif 6

where ln(N/M) is the selection pressure because we select the best M individuals from among N offspring, and d(1 + k) is the expected number of the affected sites by random d-fold point mutations because a single point mutation changes the site-fitness values of on average 1 + k sites.

This stationary state is caused by the mutation-selection-random drift balance.

In order to interpret the evolutionary dynamics mentioned above, we introduce the following thermodynamics-like quantities. The analogy between the concepts in evolution and those in thermodynamics is compiled in Table 1. We denote the fitness coordinate by W. The frequency distribution of fitness over all conceivable sequences (of Inline graphic) is given approximately by the following normal distribution:

graphic file with name 12551_2009_21_Equ2_HTML.gif 7

where H = εν and Inline graphic. The average of fitness over the whole sequence space corresponds to the foot of the landscape, while regions where W < 0 correspond to below sea level and are negligible for the adaptive walks that start from random points, which are likely to be located at the foot of the landscape. Since the fitness at the global peak takes about H (= ɛν), H corresponds to the height of the landscape from the foot to the global peak. In this paper, we focus on the regions from the foot of the landscape to the global peak: 0 ≤ W ≤ H.

Table 1.

An analogy between thermodynamics and in vitro evolution

graphic file with name 12551_2009_21_Tab1_HTML.jpg

We define the landscape constant k L and the entropy S as a function of W as follows:

graphic file with name M35.gif 8
graphic file with name M36.gif 9
graphic file with name 12551_2009_21_Equ3_HTML.gif 10

Following the definition of thermodynamic temperature, we define the evolutionary temperature T as follows:

graphic file with name 12551_2009_21_Equ4_HTML.gif 11

where Inline graphic is a stationary value of mean fitness Inline graphic for the M walkers. Using Eqs. 6 and 10, T is given by

graphic file with name M37.gif 12

In Eq. 8, the Inline graphic means the standard deviation of a change in fitness for a unit Hamming distance. That is, the landscape constant k L indicates the degree of ruggedness of the landscape. In Eq. 12, the d indicates the degree of diffusion in the sequence space by random mutation, while ln(N/M) indicates the degree of convergence of sequence diversity by selection. T is the ratio of these conflicting effects. By using these parameters, the stationary value Inline graphic given by Eq. 6 is rewritten by:

graphic file with name 12551_2009_21_Equ5_HTML.gif 13

According to thermodynamics of protein folding, the most probable energy of a canonical ensemble is given by an equation similar to Eq. 13 (e.g., Eq. 11 in Wolynes and Luthey-Schulten 1997). Therefore, we can say that k L and T are analogous to the Boltzmann constant and thermodynamic temperature of the thermal bath, respectively.

Here, we define the evolutionary potential ϕ and free fitness G for the M walkers with mean fitness Inline graphic as follows:

graphic file with name 12551_2009_21_Equ6_HTML.gif 14
graphic file with name 12551_2009_21_Equ7_HTML.gif 15

Using Eq. 10, ϕ is rewritten as the following convex function of Inline graphic:

graphic file with name 12551_2009_21_Equ8_HTML.gif 16
graphic file with name 12551_2009_21_Equ9_HTML.gif 17

Equation 17 is derived by substituting Eq. 13 into Eq. 16. We can see that the evolutionary potential ϕ and free fitness G take the maximum values at Inline graphic. Furthermore, ϕ and G are Lyapunov functions of the evolution process (proof is shown in the next section).

In addition to these quantities, we define the evolutionary force X by

graphic file with name 12551_2009_21_Equ10_HTML.gif 18
graphic file with name 12551_2009_21_Equ11_HTML.gif 19

Substituting Eq. 14 into Eq. 18, the evolutionary force X is decomposed into the fitness force X fit and the entropy force Inline graphic, where

graphic file with name 12551_2009_21_Equ12_HTML.gif 20

X fit is caused by a selection event and pushes the walkers upward, while X ent is caused by a mutation event and pushes the walkers downward. The mutation-selection-random drift balance occurs when X fit and X ent cancel each other out.

Based on the definition we mentioned above, we can describe the dynamics of the adaptive walk as follows (see Fig. 3). Driven by the evolutionary force X, the walkers tend to achieve the maximum free fitness state (or maximum evolutionary potential state) in which the fitness force X fit and the entropy force X ent cancel each other out. The evolutionary force X depends strongly on the evolutionary temperature T. Here, consider the walkers are located at the middle point on the landscape. If T = ∞ (which is the case where N/M ≈ 1), then Inline graphic lies at the foot of the landscape and a negative force (X < 0) acts on the walkers so the walkers are pushed downward. In this case, the maximum entropy state is realized. As T becomes lower (which is the case where Inline graphic), the Inline graphic becomes higher up near the top of the landscape and a positive force (X > 0) acts on the walkers so the walkers are pushed upward. In this case, the (nearly) maximum fitness state is realized. If T = 0 (which is the case of d = 0), the walkers cannot move in the sequence space, and evolution does not occur.

Fig. 3.


Interpretation of evolutionary dynamics by thermodynamics-like concepts. The solid and dotted lines represent the entropy S and free fitness G, respectively, as a function of fitness. The right and left dotted lines are for the cases of low and high evolutionary temperature T, respectively. Inline graphic represents the stationary point, in which G takes the maximum under a given T. The vectors (arrows) represent the evolutionary force X that acts on the walkers with the mean fitness Inline graphic. Details are described in the text. Adapted from Aita and Husimi (2006)

Fitness fluctuation

We denote the change in mean fitness Inline graphic after a single generation by Inline graphic (top of Fig. 4). Since the exploration is done by random sampling of N mutants from among the underlying mutant population, the fitness change Inline graphic is a stochastic quantity and is described by the theory of extreme value statistics. Let J and Σ be the expectation and standard deviation of Inline graphic, respectively. These are described in the following based on Aita et al. (2004).

Fig. 4.


Upper: A probability density of the mean-fitness change after a single step walk (=generation) from the mean fitness Inline graphic. J and Σ represent the expectation and standard deviation of the change. Middle: The fitness-information change, ΔI fit, is the fitness change digitized by the fluctuation size, Inline graphic. Bottom: The concept of ΔI fit is analogous to the thermal entropy, –ΔH/T, which is the enthalpy change digitized by the fluctuation size of thermal energy per degree of freedom, k B T. Adapted from Aita and Husimi (2006)

The expectation J is described as follows:

graphic file with name M43.gif 21

where L is the linear transport coefficient or mobility. Inline graphic represents the diffusion coefficient for the mean fitness Inline graphic along the fitness coordinate when walkers perform a random walk in the sequence space. The random walk occurs when N = M because there is no selection pressure (X fit = 0). Equation 21 is analogous to Einstein’s relation in Brownian motion (Einstein 1905). Sato et al. found a similar scheme to Eq. 21 for the relationship between fluctuation and response in a biological system (Sato et al. 2003). Iguchi has tried to extend Eq. 21 to a case of coevolving biopolymers in a framework similar to Onsager’s reciprocal relations (Iguchi 2008).

The standard deviation Σ is described as follows:

graphic file with name M45.gif 22

Σ is the degree of fluctuation of the change in mean fitness Inline graphic after a single generation, while k L T is analogous to the thermal fluctuation energy per degree of freedom k B T (k B is the Boltzmann constant and T is temperature). Therefore, Eq. 22 is analogous to the energy-fluctuation formula in thermo-statistical mechanics.

Information gained through in vitro evolution

Biomolecular information

Suppose an adaptive walk from the foot of the landscape Inline graphic up to the stationary state Inline graphic. Here, using entropy S in Eq. 9 and evolutionary potential ϕ in Eq. 14, we introduce the following three quantities as functions of mean fitness Inline graphic:

graphic file with name 12551_2009_21_Equ13_HTML.gif 23
graphic file with name 12551_2009_21_Equ14_HTML.gif 24
graphic file with name 12551_2009_21_Equ15_HTML.gif 25
graphic file with name 12551_2009_21_Equ16_HTML.gif 26

and let ΔI fit, ΔI Sha, and ΔI bio be the changes in I fit, I Sha, and I bio, respectively, after a single generation. Since Eq. 24 describes the change in entropy S between the initial state and a certain state (Fig. 5), I Sha is interpreted as the Shannon information (Shannon and Weaver 1949).

Fig. 5.


Schematic representation of the fitness information I fit and Shannon information I Sha. The cone represents a fitness landscape schematically. The height represents the fitness W divided by T, while the area of the cross section represents the entropy S(W) ≡ k L × ln Ω(W), where Ω(W) is the number of all sequences with a given fitness value W. I fit(W) is defined as the change in W/T from the foot of the landscape, while I Sha(W) is defined as the negative change in S from the foot. ΔI fit and ΔI Sha are the changes in these quantities after a single step walk (=generation)

On the other hand, from Eq. 22, the meaning of Inline graphic is as follows:

graphic file with name 12551_2009_21_Equ17_HTML.gif 27

We can see that ΔI fit is the fitness change digitized by the fitness fluctuation size Σ (in the middle of Fig. 4). Here, the analog to digital conversion is realized as the significant figures of the fitness change observed by the adaptive walkers with the observation error Σ. This is analogous to the thermal entropy change –ΔH/T, that is the enthalpy change digitized by the fluctuation size of thermal energy k B T, when a system emits the heat –ΔH to the surroundings (bottom of Fig. 4) (Atkins 1978). According to the analogy with thermodynamics, we can interpret Inline graphic as the negative entropy that the evolving system absorbs from the surroundings (Fig. 2 right). Here, the surroundings mean the experimental setup (e.g., a column of affinity-chromatography) around the biopolymer as an evolving entity (Fig. 1). We designate I fit as the fitness information. We can say that the evolving entity gains the fitness information from the surroundings (Figs. 1 and 2).

The expectation of ΔI bio is given by

graphic file with name M46.gif 28

Equation 28 proves the theorem that I bio Inline graphic is a Lyapunov function of the evolution process. Therefore, we conclude that the evolution is driven in the direction in which I bio increases, and then we designate I bio as the biomolecular information in in vitro evolution.

In the stationary state Inline graphic, I fit, I Sha, and I bio satisfy the following simple relation:

graphic file with name 12551_2009_21_Equ18_HTML.gif 29
graphic file with name 12551_2009_21_Equ19_HTML.gif 30

where t* represents the mean generation (or mean step number) up to the stationary state and is defined as follows: Inline graphic, where Inline graphic is the expectation of the mean fitness Inline graphic at the tth generation. The t* in this model is given approximately by Inline graphic. Since in each selection step, the best M mutants are selected from among the N mutants, the term k L ln (N/M) on the right-hand side corresponds to the operational information gain by selection. Therefore, the result shown in Eq. 30 is reasonable in that I bio in a stationary state is approximately equivalent to the sum of the operational information gain over generations up to the stationary state: Inline graphic.

Extent and content of information

According to Eigen (2000), the extent of information is related to the constrained volume of the sequence space. Therefore, we define explicitly the extent of information in in vitro evolution in the following. Let p s be the probability (or frequency) of occurrence of sequence s in a population. Entropy for this state is given by

graphic file with name M49.gif 31

while the maximum entropy is given by Inline graphic, which is for the case where every sequence in the sequence space occurs with the same probability of $\lambda^{-\nu}$. In particular, S max is called the potential (or source) entropy. The extent of information is defined as

$$I_{\mathrm{extent}} \equiv S_{\max} - S \qquad (32)$$
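As a rough illustration, the extent of information can be estimated from a sampled population by reading it as the gap between the maximum entropy and the population entropy. The sketch below makes several assumptions that are not in the original: entropies are measured in bits, the maximum entropy is taken as the sequence length times log2 of the alphabet size, and p s is estimated from raw sample frequencies. The function names and the toy populations are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy_bits(population):
    """Entropy (bits) of the empirical sequence frequencies p_s."""
    counts = Counter(population)
    total = len(population)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def extent_of_information_bits(population, alphabet_size=4):
    """Maximum entropy minus population entropy, in bits (assumed form)."""
    length = len(population[0])
    if any(len(seq) != length for seq in population):
        raise ValueError("all sequences must have the same length")
    # Assumption: uniform occupation of all alphabet_size**length sequences.
    s_max = length * math.log2(alphabet_size)
    return s_max - shannon_entropy_bits(population)


# Hypothetical toy populations of 4-mers over a 4-letter alphabet.
converged = ["ACGU"] * 8
mixed = ["ACGU"] * 4 + ["ACGA"] * 2 + ["ACGC", "ACGG"]

print(extent_of_information_bits(converged))  # fully converged population
print(extent_of_information_bits(mixed))      # partially converged population
```

A fully converged population reaches the maximum value, while any residual diversity lowers the estimate. For realistic sequence lengths the sample covers a vanishing fraction of sequence space, so frequency-based estimates of this kind should be treated with caution.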

Several concepts, including genomic complexity (Adami et al. 2000), R sequence (Schneider 2000; Kim et al. 2003), functional information (Szostak 2003), functional sequence complexity (Durston et al. 2007), and I Sha (Eq. 24), can be classified as I extent. For example, Szostak introduced functional information, defined as –log2 of the fraction of functional sequences, that is, sequences whose fitness (the activity of the biopolymer) exceeds a specified value (Szostak 2003). Consider all 4^70 possible RNA sequences of length 70: if the fraction of functional sequences among them is 10^–11, the functional information is about 37 bits, compared with the 140 bits needed to specify a unique 70-mer sequence. Szostak suggested the importance of the relationship between activity and functional information, which affects the evolvability of the biopolymer.
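The 37-bit and 140-bit figures quoted above can be checked with a few lines of arithmetic. This is a minimal sketch; the function names are hypothetical and chosen only for illustration.

```python
import math


def functional_information_bits(functional_fraction):
    """-log2 of the fraction of sequences above the activity threshold."""
    if not 0.0 < functional_fraction <= 1.0:
        raise ValueError("functional_fraction must lie in (0, 1]")
    return -math.log2(functional_fraction)


def bits_to_specify_sequence(length, alphabet_size=4):
    """Bits needed to single out one sequence among alphabet_size**length."""
    return length * math.log2(alphabet_size)


fi = functional_information_bits(1e-11)   # fraction of functional 70-mers
full = bits_to_specify_sequence(70)       # one unique RNA 70-mer

print(f"functional information: {fi:.1f} bits")
print(f"unique 70-mer:          {full:.0f} bits")
```

The first value comes out slightly below 37 bits, which rounds to the figure in the text; the second is exactly 2 bits per base times 70 bases.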

On the other hand, the content of information refers to the meaning or semantic value of information. One may say that the concept of fitness corresponds to the content of information in biological evolution. However, fitness cannot be treated on the same level as Shannon information. Therefore, it is desirable to introduce a novel measure for the content of information.

Pragmatic information

As an answer to the issue of the content of information, Weinberger proposed pragmatic information, which quantifies the impact of a message on the receiver’s subsequent actions (Weinberger 2002). His theory is based on a communication system consisting of a decision maker and an effector (Fig. 6) and the assumption that the semantic value of information stems from its usefulness in making an informed decision.

Fig. 6. Conceptual framework for pragmatic information. Adapted from Weinberger (2002)

Suppose that the decision maker, in some current state S, receives a set of M messages (m = 1,2,3, ..., M) and chooses a message m among the set with the probability of φ m. Subsequently, based on the practical meaning of the chosen message m, the effector selects an outcome o from among a set of N outcomes (o = 1,2,3, ..., N). The probability that an outcome o is selected without any message is given by q o, and the conditional probability that an outcome o is selected when message m is given is P o|m. The pragmatic information of the message ensemble is defined by

$$I_{\mathrm{pra}} \equiv \sum_{m=1}^{M} \varphi_m \sum_{o=1}^{N} P_{o|m} \log \frac{P_{o|m}}{q_o} \qquad (33)$$

Then, the pragmatic information is the average of the relative entropy between {P o|m} and {q o} over the message ensemble.
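The verbal definition above, a φ m-weighted average of the relative entropy between {P o|m} and {q o}, can be sketched directly in code. The logarithm base (2, giving bits) is an assumption here, and the function name and the two toy message ensembles are hypothetical.

```python
import math


def pragmatic_information_bits(phi, p_given_m, q):
    """Average over messages of the relative entropy D(P_.|m || q), in bits.

    phi[m]          probability that message m is chosen
    p_given_m[m][o] probability of outcome o given message m
    q[o]            probability of outcome o without any message
    """
    total = 0.0
    for phi_m, p_m in zip(phi, p_given_m):
        # Terms with P_o|m = 0 contribute nothing to the relative entropy.
        kl = sum(p * math.log2(p / q_o) for p, q_o in zip(p_m, q) if p > 0.0)
        total += phi_m * kl
    return total


q = [0.5, 0.5]        # two equally likely outcomes before any message
phi = [0.5, 0.5]      # two equally likely messages

decisive = [[1.0, 0.0], [0.0, 1.0]]     # each message fixes the outcome
irrelevant = [[0.5, 0.5], [0.5, 0.5]]   # messages leave the outcome unchanged

print(pragmatic_information_bits(phi, decisive, q))    # → 1.0
print(pragmatic_information_bits(phi, irrelevant, q))  # → 0.0
```

Messages that fully determine a binary outcome carry one bit of pragmatic information, while messages that do not change the outcome probabilities carry none, which matches the idea that the value of a message lies in its effect on the decision.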

Weinberger demonstrated that the pragmatic information is a global Lyapunov function for the quasi-species model. Here, the setup of the flow reactor effectively decides the fitness of the phenotype corresponding to each given genotype, where a phenotype’s fitness is defined as its reproduction rate. At each time t, the flow reactor receives messages about the fitness of a particular replicator via the number of copies of that replicator’s genome. Before any message is received, the probability of selecting a species s at random from the system is q s = x s(0). The probability of selecting a species s at a subsequent time t is P s|m = x s(t), where the state of the process at that time is the only message received (M = 1), that is, φ 1 = 1. The pragmatic information for the quasi-species model is then given by

$$I_{\mathrm{pra}}(t) = \sum_{s} x_s(t) \log \frac{x_s(t)}{x_s(0)} \qquad (34)$$

Regardless of the initial distribution {x s(0)}, dI pra/dt > 0 holds for all finite times. Pragmatic information is thus generated through the evolution of the quasi-species, which answers Eigen’s call for a value parameter for the level of evolution.
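The growth of pragmatic information can be illustrated numerically with a deliberately simplified model. The sketch below is not the quasi-species model itself: it assumes selection only (no mutation), so that the frequency of species s at time t is proportional to x s(0) exp(w s t) for a fixed reproduction rate w s. The rates, initial frequencies, and function names are hypothetical, and the relative entropy between {x s(t)} and {x s(0)} is measured in bits.

```python
import math


def frequencies(x0, rates, t):
    """Selection-only replicator solution: x_s(t) ∝ x_s(0) * exp(w_s * t)."""
    weights = [x * math.exp(w * t) for x, w in zip(x0, rates)]
    z = sum(weights)
    return [w / z for w in weights]


def pragmatic_information_bits(x_t, x_0):
    """Relative entropy between the current and initial distributions."""
    return sum(a * math.log2(a / b) for a, b in zip(x_t, x_0) if a > 0.0)


x0 = [0.7, 0.2, 0.1]       # hypothetical initial frequencies
rates = [1.0, 1.5, 2.0]    # hypothetical reproduction rates (fitness)
times = [0.0, 0.5, 1.0, 2.0, 4.0, 8.0]

trajectory = [pragmatic_information_bits(frequencies(x0, rates, t), x0)
              for t in times]

for t, info in zip(times, trajectory):
    print(f"t = {t:4.1f}   I_pra = {info:.4f} bits")
```

In this reduced setting the quantity rises monotonically from zero and approaches log2(1/0.1), the value reached when the initially rarest but fastest replicator takes over the population. Mutation, which the full quasi-species model includes, is ignored here.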

For the natural selection type model (quasi-species model) and the artificial selection type model, the Lyapunov functions are the pragmatic information I pra (Eq. 34) and the biomolecular information I bio (Eq. 25), respectively. The relationship between the two has not yet been demonstrated. In addition, these concepts should be related to thermodynamic entropy production in the real evolving system and its surroundings (Smith 2008a, 2008b, 2008c), in a way similar to the problem of Maxwell’s demon (Szilard 1929; Brillouin 1956).

Fitness information

We now discuss further the concept of the fitness information I fit defined in Eq. 23. The evolving entity (=biopolymer) gains from the surroundings a content of information for adaptation and existence under given conditions. We interpret the fitness information I fit as the content of information. For example, concerning the emergence of a specific peptide with a high affinity to a receptor molecule (see Fig. 1), as mentioned in the “Introduction” section, the emerging peptides gain I fit from the receptor under experimental conditions. In this case, fitness should be defined as the natural logarithm of the association constant between the peptide and the receptor: W ≡ ln K a. The value of the content of information is quantified by dividing a fitness change, Δ ln K a, by the evolutionary temperature T, or in other words, by scaling Δ ln K a by the accuracy of observation by the adaptive walkers, that is, the fluctuation of the fitness change after a single generation, SD[Δ ln K a]. On the other hand, I Sha represents the extent of information. The biomolecular information in in vitro evolution, I bio, consists of the content and the extent of information (see Fig. 5).
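The scaling described above can be made concrete with a small numerical sketch. It only illustrates the ratio Δ ln K a / SD[Δ ln K a]; any constant prefactor in Eq. 23 is omitted, the association constants and single-generation changes are invented values, and the function name is hypothetical.

```python
import math
import statistics


def scaled_fitness_gain(ka_start, ka_end, single_generation_changes):
    """Fitness change in W = ln K_a, in units of the one-generation spread."""
    delta_w = math.log(ka_end) - math.log(ka_start)
    # Sample standard deviation of the fitness change after one generation.
    sigma = statistics.stdev(single_generation_changes)
    if sigma == 0.0:
        raise ValueError("fitness changes show no fluctuation")
    return delta_w / sigma


# Hypothetical data: affinity improves from 1e4 to 1e8 per molar, and
# single-generation changes in ln K_a fluctuate around 0.5.
changes = [0.2, 0.6, 0.4, 0.8, 0.5]
gain = scaled_fitness_gain(1.0e4, 1.0e8, changes)

print(f"scaled fitness gain: {gain:.1f}")
```

The same affinity improvement counts for more when the single-generation fluctuation is small, which is the sense in which the fitness change is "digitized" by the observation accuracy of the adaptive walkers.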

Acknowledgments

We thank Dr. Kazumoto Iguchi for helpful discussions and providing information.

Footnotes

1

The validity of using the NK landscape was demonstrated in Aita et al. (2007)

2

The conditions are ln(N/M) Inline graphic and Inline graphic, where Inline graphic is the size of the d-boundary of any sequence. The derivation of Eq. 6 is described in Aita et al. (2007).

Note that we limited the selected references described here to those published after the year 2000, although there are many important references published before that year.

**Aita et al. 2004. Based on additive fitness landscapes, the evolutionary dynamics of the artificial selection type model is analyzed and interpreted by showing an analogy to thermodynamics. This interpretation is extended to quantify information gain from the surroundings (environments) in an evolution process.

**Eigen 2000. A question about the extent and content of information in biological evolution is raised. The quasi-species theory is summarized, and the evolutionary dynamics including phase transitions of the quasi-species is discussed by showing an analogy to thermodynamics and from the viewpoint of the extent and content of information.

**Sato et al. 2003. A mathematical relationship between fluctuation and response in a biological system is described by showing an analogy to the fluctuation-dissipation theorem in physics. An application to an experimental observation is demonstrated.

**Weinberger 2002. As an answer to Eigen’s call for a value parameter, that is a content of information, for biological evolution, a theory of pragmatic information is described. The proof of the pragmatic information being a global Lyapunov function for the quasi-species model is given.

*Adami et al. 2000. A simulation study of the evolution of digital organisms. For analysis of the evolutionary dynamics, the concept of genomic complexity, which is a quantity increasing through the evolution process, is proposed.

*Aita et al. 2005. The thermodynamics-like concepts and information-theoretical concepts proposed in Aita et al. 2004 are extended to Kauffman’s NK fitness landscapes.

*Blackburne and Hirst 2005. A simulation study of population dynamics of lattice model proteins. The population is estimated using the analogy to thermodynamics.

*Szostak 2003. Based on a question as to how we can define and quantify the information content of biopolymer sequences, Szostak introduced the functional information by a concise description.

References

  1. Adami C, Ofria C, Collier TC. Evolution of biological complexity. Proc Natl Acad Sci USA. 2000;97:4463–4468. doi: 10.1073/pnas.97.9.4463.
  2. Aita T, Husimi Y. An interpretation of evolutionary dynamics in in vitro molecular evolution from thermodynamical and informational viewpoint. Seibutsu Butsuri (Biophysics) 2006;46:137–143. doi: 10.2142/biophys.46.137.
  3. Aita T, Morinaga S, Husimi Y. Thermodynamical interpretation of evolutionary dynamics on a fitness landscape in an evolution reactor, I. Bull Math Biol. 2004;66:1371–1403. doi: 10.1016/j.bulm.2004.01.004.
  4. Aita T, Morinaga S, Husimi Y. Thermodynamical interpretation of evolutionary dynamics on a fitness landscape in an evolution reactor, II. Bull Math Biol. 2005;66:1371–1403. doi: 10.1016/j.bulm.2004.01.004.
  5. Aita T, Hayashi Y, Toyota H, Husimi Y, Urabe I, Yomo T. Extracting characteristic properties of fitness landscape from in vitro molecular evolution: a case study on infectivity of fd phage to E.coli. J Theor Biol. 2007;246:538–550. doi: 10.1016/j.jtbi.2006.12.037.
  6. Ao P. Emerging of stochastic dynamical equalities and steady state thermodynamics from Darwinian dynamics. Commun Theor Phys. 2008;49:1073–1090. doi: 10.1088/0253-6102/49/5/01.
  7. Atkins PW. Physical chemistry. New York: Oxford University; 1978.
  8. Brillouin L. Science and information theory. New York: Academic Press; 1956.
  9. Blackburne BP, Hirst JD. Population dynamics simulations of functional model proteins. J Chem Phys. 2005;123:154907. doi: 10.1063/1.2056545.
  10. Durston KK, Chiu DK, Abel DL, Trevors JT. Measuring the functional sequence complexity of proteins. Theor Biol Med Model. 2007;4:47. doi: 10.1186/1742-4682-4-47.
  11. Eigen M. Natural selection: a phase transition? Biophys Chem. 2000;85:101–123. doi: 10.1016/S0301-4622(00)00122-8.
  12. Eigen M, Schuster P. The hypercycle. Berlin: Springer; 1979.
  13. Einstein A. On the movement of small particles suspended in stationary liquids required by the molecular-kinetic theory of heat. Ann D Phys. 1905;17:549–560. doi: 10.1002/andp.19053220806.
  14. Fisher RA. The genetical theory of natural selection. Oxford: Clarendon; 1930.
  15. Husimi Y. Selective value landscape on the base sequence space and concept of free selective value: a model of quasi-species. Viva Origino. 1988;16:136–141.
  16. Iwasa Y. Free fitness that always increases in evolution. J Theor Biol. 1988;135:265–281. doi: 10.1016/S0022-5193(88)80243-1.
  17. Iguchi K. Reciprocal relations in evolutionary processes. Prog Theor Phys. 2008;Suppl 173:235–242.
  18. Kauffman SA. The origin of order. Oxford: Oxford University Press; 1993.
  19. Kim JT, Martinetz T, Polani D. Bioinformatic principles underlying the information content of transcription factor binding sites. J Theor Biol. 2003;220:529–544. doi: 10.1006/jtbi.2003.3153.
  20. Maynard-Smith J. Natural selection and the concept of a protein space. Nature. 1970;225:563–564. doi: 10.1038/225563a0.
  21. Mills DR, Peterson RL, Spiegelman S. An extracellular Darwinian experiment with a self-duplicating nucleic acid molecule. Proc Natl Acad Sci USA. 1967;58:217–224. doi: 10.1073/pnas.58.1.217.
  22. Pande VS, Grosberg AY, Tanaka T. Statistical mechanics of simple models of protein folding and design. Biophys J. 1997;73:3192–3210. doi: 10.1016/S0006-3495(97)78345-0.
  23. Sato K, Ito Y, Yomo T, Kaneko K. On the relation between fluctuation and response in biological systems. Proc Natl Acad Sci USA. 2003;100:14086–14090. doi: 10.1073/pnas.2334996100.
  24. Schneider TD. Evolution of biological information. Nucleic Acids Res. 2000;28:2794–2799. doi: 10.1093/nar/28.14.2794.
  25. Schuster P, Swetina J. Stationary mutant distributions and evolutionary optimization. Bull Math Biol. 1988;50:635–660. doi: 10.1007/BF02460094.
  26. Shannon CE, Weaver W. The mathematical theory of communication. Champaign: University of Illinois Press; 1949.
  27. Smith E. Thermodynamics of natural selection I: energy flow and the limits on organization. J Theor Biol. 2008;252:185–197. doi: 10.1016/j.jtbi.2008.02.010.
  28. Smith E. Thermodynamics of natural selection II: chemical Carnot cycles. J Theor Biol. 2008;252:198–212. doi: 10.1016/j.jtbi.2008.02.008.
  29. Smith E. Thermodynamics of natural selection III: Landauer’s principle in computation and chemistry. J Theor Biol. 2008;252:213–220. doi: 10.1016/j.jtbi.2008.02.013.
  30. Szilard L. Über die Entropieverminderung in einem thermodynamischen System bei Eingriffen intelligenter Wesen. Z Physik. 1929;53:840–856. doi: 10.1007/BF01341281.
  31. Szostak JW. Functional information: molecular messages. Nature. 2003;423:689. doi: 10.1038/423689a.
  32. Weinberger ED. A theory of pragmatic information and its application to the quasi-species model of biological evolution. Biosystems. 2002;66:105–119. doi: 10.1016/S0303-2647(02)00038-2.
  33. Wilke CO, Wang JL, Ofria C, Lenski RE, Adami C. Evolution of digital organisms at high mutation rates leads to survival of the flattest. Nature. 2001;412:331–333. doi: 10.1038/35085569.
  34. Wolynes PG, Luthey-Schulten Z. The energy landscape theory of protein folding. In: Flyvbjerg H, Hertz J, Jensen MH, Mouritsen OG, Sneppen K, editors. Physics of biological systems: from molecules to species. New York: Springer; 1997. pp. 61–79.

Articles from Biophysical Reviews are provided here courtesy of Springer
