Abstract
The immune system normally protects the human host against death by infection. However, when an immune response is mistakenly directed at self antigens, autoimmune disease can occur. We describe a model of protein evolution to simulate the dynamics of the adaptive immune response to antigens. Computer simulations of the dynamics of antibody evolution show that different evolutionary mechanisms, namely gene segment swapping and point mutation, lead to different evolved antibody binding affinities. Although a combination of gene segment swapping and point mutation can yield a greater affinity to a specific antigen than point mutation alone, the antibodies so evolved are highly cross-reactive and would cause autoimmune disease, and this is not the chosen dynamics of the immune system. We suggest that in the immune system a balance has evolved between binding affinity and specificity in the mechanism for searching the amino acid sequence space of antibodies.
One of the main functions of the immune system is to generate B cells that secrete antibodies, protein molecules that recognize and bind to antigen. The immune system has evolved a hierarchical strategy to produce antibodies that combat invading pathogens [1]. The first level is VDJ recombination, in which gene fragments are rearranged to encode functional antibodies, producing a range of antibodies with a diversity on the order of 1012 to 1014. Upon invasion of the body by a foreign antigen, the second level of B cell somatic hypermutation evolution is initiated, in which division, mutation, and selection of B cells occurs. Those B cells that produce antibodies that bind the antigen with higher affinities are selected and propagated. Thus, the individual point mutations of somatic hypermutation essentially perform an optimizing local search of amino acid sequence space.
When an immune response is mistakenly directed at self antigens, autoimmune disease occurs [1]. In most cases, the antibody/antigen interaction is highly specific. Sometimes, however, the antibody evolved in response to one antigen can bind other antigens, and this phenomenon is called cross-reactivity [2]. Cross-reactivity happens when the other antigen has chemical features in common with the original antigen and is quantified experimentally by measuring the affinity of the antibody for the other antigen. Cross-reactivity is one mechanism by which autoimmune disease may develop.
In this Letter, we present two major results. First, we show how different evolutionary mechanisms influence the relaxation dynamics of an evolving population of proteins. Specifically we study the dynamics of antibody generation by the immune system in response to pathogen invasion. These dynamics are crucial to the efficacy of the immune response. The hierarchical structure of our model protein Hamiltonian plays a critical role in the dynamics. The hierarchical structure also distinguishes our model from traditional short- or long-range spin glass energy models [3, 4]. Second, we show that antibodies with significantly higher affinities to antigen than those produced in a primary immune response can be found with a biologically-plausible evolution process, but are more cross-reactive and would greatly increase susceptibility to autoimmune disorders. The concept of cross-reactivity is related to that of the chaos exponent in traditional spin glasses [5, 6], except that for cross-reactivity we are interested in the immediate energy response rather than in the equilibrated response of the system to a change in the couplings. Taking our two results together, we show that in the primary adaptive immune response a careful balance has evolved between specificity for and binding affinity to foreign antigen.
To express the interaction energy between antibodies and antigen, we use the generalized NK model, a model that captures important features of the immune response to vaccination and disease and of protein molecular evolution [7–9]. The energy of interaction for a given protein or antibody sequence is defined in the model as
| (1) |
The term Usd includes interactions within a secondary structure. Interactions between secondary structures are essential to protein folding and function. These terms are in Usd–sd. The total number of interactions with a typical amino acid is roughly 12, and half of these are in Usd (2(K − 1)), and the other half in Usd–sd (D(M − 1)/N). The term Uc is the direct binding interaction–between the antibody and antigen. The number of antibody secondary structural subdomains is M = 10, and the number of antibody amino acids involved directly in the binding is known to be approximately P = 5 [8]. We define the binding constant between antibody and antigen to be equal to K = expa−b⟨U⟩ where a and b are constants found by comparison to immunological data [9]. Thus, the lower the overall energy, the better the binding affinity of the antibody to the given antigen. The secondary structure energy Usd is
| (2) |
where N = 10 is the number of amino acids in a subdomain, and K = 4 [10] specifies the range of the local interactions within the secondary structure. We consider five chemically distinct amino acid classes (negative, positive, polar, hydrophobic, and other), and all subdomains belong to one of L = 5 different types (helices, strands, loops, turns, and others). The quenched Gaussian random number σαi is different for each value of its argument for a given subdomain type, αi. All of the Gaussian σ values have zero mean and unit variance. The energy of interaction between secondary structures i and j is
| (3) |
where D = 6 specifies the number of interactions between secondary structures. Here and the interacting amino acids, l1, …, lK, are selected at random for each interaction i, j, k. The chemical binding energy of each antibody amino acid to the antigen is given by . The contributing amino acid, ai, and the unit-normal weight of the binding, σi, are chosen at random.
In our first study, we replicate the immune system dynamics. Each protein sequence is of the length of N × M = 100 amino acids, corresponding to the length of the heavy chain variable region of an antibody. Initially there are 103 sequences, as in the human immune response [9]. Two different strategies are used in our simulations to search protein sequence space for high affinity antibodies. The first strategy mimics the normal adaptive humoral immune response. It starts with the combinatorial joining of optimized subdomains, after which the sequences undergo rounds of point mutation (PM) and selection. This is a simulation of the VDJ recombination, somatic hypermutation, and clonal selection that occurs in B cell development [2]. We generate five optimized subdomain pools, each composed of 300 low-energy subdomains, corresponding to the L = 5 types. We use three sequences from each subdomain pool in our VDJ recombination, mimicking the known diversity [9].
In our second study, we include a more powerful search of sequence space in the antibody evolution dynamics. VDJ recombination is used to generate an initial population, as in the normal immune response, but in addition to somatic hypermutation, we perform gene segment swapping (GSS) during each round of evolution and selection. GSS-type processes are used by experimentalists to produce antibodies with binding constants ≈ 1011 − 1013 [11], and exist within the natural hierarchy of evolutionary events [7, 12].
In the process of GSS, a subdomain belonging to one of the five types is replaced by another one from the optimized subdomain pool of the same type. Each sequence undergoes an average of 0.5 point mutations in each round of selection in both strategies. In GSS+PM, each subdomain in a sequence has a probability of 0.05 of being replaced in the process of GSS. Following mutation, selection occurs, and the 20% lowest energy antibodies are kept and are amplified to form the population of 103 sequences for the next round of mutation and selection. The primary response is comprised of 30 rounds of affinity maturation, corresponding to a lag phase of approximately 10 days [9], during which B cells undergo clonal selection in response to antigen and differentiate into plasma cells and memory cells. The results that follow are averaged over 5000 instances of the ensemble.
The average affinity of the population of antibodies evolves during mutation and selection. The evolution of the affinity energy under the two different strategies is shown in Fig. 1. The GSS+PM yields sequences with a lower energy than those from PM. The gap is due to each of the subdomains having an energy improved from an average to an exceptional one, , and due to better sd-sd interactions found by GSS. In other words, the former is a better mechanism than the latter in searching sequence space for higher affinity antibodies to the given antigen. The best binding energy averaged over 5000 instances of the ensemble is UG = −21.9 in the GSS+PM case and UP = −19.7 in the PM only case at round 30, corresponding to affinities of K = 6.7 × 107 l/mol and K = 1.6 × 106 l/mol, respectively. So, GSS+PM yields an affinity more than an order of magnitude improved, which is even better than the affinity obtained from a secondary response by PM only [9]. That is, the multi-spin flips of GSS+PM especially accelerate the optimization of Usd–sd in amino acid sequence space. Antibodies with a higher affinity to the antigen work more effectively in many ways. For example they can neutralize bacterial toxins, inhibit the infectiousness of viruses, and block the adhesion of bacteria to host cells at lower serum concentrations of antibody [2]. Based solely on Fig. 2, it is hard to understand why Darwinian evolution did not result in GSS+PM or any other more efficient strategy, rather than somatic hypermutation, as the preferred strategy for B cell expansion.
FIG. 1.

Evolution of the affinity energy for the cases of point mutation (PM) only and gene segment swapping (GSS) plus PM as a function of the number of rounds of mutation and selection used to evolve the population of antibodies. Energies evolved at larger number of rounds shown in inset.
FIG. 2.

Affinity of memory antibody sequences after a primary immune response for the two different immune system strategies (PM and GSS+PM) to altered antigens. The binding constant is K, and the antigenic distance of the new altered antigen from the original antigen is p. Cross-reactivity ceases at larger distances in the GSS+PM case (no cross-reactivity for p > 0.472) than in the PM only case (no cross-reactivity for p > 0.368).
A calculation of cross-reactivity, which quantifies the specificity of the antibody, is used to compare the antibodies generated by the two strategies. Thirty rounds of primary response affinity maturation are conducted for both PM and GSS+PM. The antigen is then changed by p, and so each interaction parameter in the Hamiltonian is changed with probability p. The affinity constants are calculated for the new antigen in the two cases. As shown in Fig. 2, we find that cross-reactivity ceases approximately Δp = 0.10 later in the GSS+PM case. We also find that the affinity constant, K, decreases exponentially with the degree of antigenic change, i.e. the binding energy, U, increases linearly with the change in the antigen, p. Within the region of specific binding (K > 100 l/mol), affinity is always better in the GSS+PM case. These cross-reactivity experiments show that the antibodies generated by the dynamics of GSS+PM can recognize more antigens and with higher affinity than those given by the PM dynamics only. Such cross reactivity has recently been observed [13]. Interestingly, U(p = 1) = U(T = 0)/5, since at p = 1 the only correlation is that αi remains unchanged 20% of the time. The fraction p of new σ values in Usd–sd and Uc are centered around zero and negligible compared to the original, negative evolved values. In Usd, a fraction p of the αi are changed, and the original, evolved sequence gives a value of roughly zero in a changed type αj. The overall energy lost in U is, thus, proportional to p, Fig. 2
Now the question we must answer is how many protein molecules are there between p1 = 0.368 and p2 = 0.472 that can be recognized by the antibodies produced through VDJ recombination and GSS+PM, but cannot be recognized by the antibodies produced through VDJ recombination and PM only. Antibodies recognize and bind to epitope regions of an antigen, so we are interested in how many epitopes these two different classes of antibodies are likely to recognize. We now show that the abnormal antibodies recognize 103 times more epitopes than do the antibodies produced by the normal primary immune response. We take the typical epitope length to be B = 20 amino acids [2, 14]. In the theory, the antigenic distance is defined by the chemical composition of the epitope. The total number of possible epitopes is, therefore, 20B since there are 20 different amino acids [15]. In Fig. 3 we show the normalized number of epitopes that are at an antigenic distance of p from the native epitope of the antibody, given by N(p) = 19iB!/[i!(B − i)!], where i = pB. This number is N(p1)/20B = 2 × 10−12 and N(p2)/20B = 2 × 10−9. The number of epitopes between p = 0 and p1 and the number between p1 and p2 is approximately A1 ≈ 2 × 10−12 and A2 ≈ 2 × 10−9, respectively, since N grows exponentially with p. In the limit of large B, N(p2)/N(p1) ~ exp{−B[ln(1 − p2)/(1 − p1) + p1 ln 19(1/p1 − 1) − p2 ln 19 (1/p2 − 1)]}. This ratio is 103 for 20 amino acids. The ratio varies between 160 and 4900 at the 15–25 amino acid limits of known epitope sizes. The ratio varies by only 30% with ±10% variation of both p values, the precision to which the generalized NK model p values agree with experiment [9, 16].
FIG. 3.

The number of possible epitopes that are an antigenic distance of p from another epitope. The epitopes are assumed to be 20 amino acids in length. The plot is normalized, so that ∑pN(p) = 2020.
In order to answer the question of why selection has favored the relatively slower dynamics in the immune response, we consider the function of the immune system as a whole rather than just the function of a single antibody binding to a single foreign antigen. The immune system is a delicate balance of a strong response to invading non-self antigens and weak or no response to self antigens. Thus, it is very important for the antibodies produced by the immune system to discriminate self from non-self [17–19]. An immune system incapable of recognizing an invading pathogen and initiating a response is an inadequate defense mechanism. On the other hand, production of antibodies binding to self antigens results in autoimmune diseases, e.g. diseases such as type I diabetes and rheumatoid arthritis. It might be thought that GSS+PM would be superior dynamics, if only the immune system were able to perform ≈ 10 rounds. However, each subsequent exposure to related antigen would lead to another ≈ 10 rounds. After a couple such exposures, GSS+PM has evolved an an unacceptably large binding constant, which saturates after several exposures to of the order of 1010 l/mol, whereas PM saturates at the value of 107 l/mol. Even with PM dynamics, our model predicts that chronic infection may lead to autoimmune disease, a mechanism postulated to be responsible for some fraction of rheumatic diseases, including arthritis [20]. To resolve the significance of this mechanism, it would be interesting to see whether the distribution of onset times is broad, as predicted by our model.
It has been experimentally shown that cross-reactivity can lead to autoimmune disease [1]. From Figs. 2–3, we know that the antibodies obtained at the end of a primary immune response can recognize a random epitope with a length of 20 amino acids with a probability of ≈ 10−12. It is also known that in a typical cell there are ≈ 104 proteins, each with a length of ≈ 500 amino acids [21]. Antibodies binding to proteins recognize only surface epitopes. The amino acids exposed on the surface of proteins are typically part of a loop or turn, and typically 1/3 of the loop or turn is exposed. Thus, a typical contiguous recognition fragment length is 6–7 amino acids. Given a typical length of 20 amino acids for the entire loop or turn, and the typical epitope size of 20 amino acids, an antibody will recognize roughly 3 non-contiguous regions of length 6–7 amino acids in the protein sequence of 500 amino acids. There are approximately 5003 ≈ 108 such epitopes. Given the 104 proteins in a cell, there will be ≈ 1012 total epitopes expressed in each cell. Thus, the number of epitopes recognized by a typical antibody in each cell is A1 × 1012 ≈ 1. Even taking an exceptional protein of length 1000 amino acids, the number increases only to 10.
This remarkable result shows that the antibodies produced by the immune system recognize on average only their intended target. Conversely, antibodies that would be evolved through an immune response composed of VDJ recombination followed by a period of GSS+PM would recognize on average A2 × 1012 ≈ 103 epitopes in each cell. Such antibodies, while having higher affinities for the intended target, would also lead to many more instances of autoimmune disease. Such promiscuous antibodies would place too large a burden on the regulatory mechanisms that eliminate the occasional aberrant antibody [22]. Thus, we find that selection has successfully evolved the human immune system to generate antibodies that recognize on average only the intended epitope after a humoral immune response. Inclusion of “more efficient” moves is generally excluded by the bound A × 1012 = O(1).
Footnotes
PACS numbers: 87.10.+e, 87.15.Aa, 87.17.-d, 87.23.Kg
References
- [1].Janeway CA, Travers P, Walport M, Shlomchik MJ. Immunobiology. 6th ed Garland Publishing; New York: 2004. [Google Scholar]
- [2].Goldsby RA, Kindt TJ, Osborne BA, Kuby J. Immunobiology. 5th ed W. H. Freeman and Company; New York: 2002. [Google Scholar]
- [3].Fischer KH, Hertz JA. Spin Glasses. Cambridge University Press; New York: 1991. [Google Scholar]
- [4].Gross DJ, Mézard M. Nucl. Phys. B. 1984;240:431. [Google Scholar]
- [5].Ney-Nifle M, Young AP. J. Phys. A. 1997;30:5311. [Google Scholar]
- [6].Azcoiti V, Follana E, Ritort F. J. Phys. A. 1995;28:3863. [Google Scholar]
- [7].Earl DJ, Deem MW. Proc. Natl. Acad. Sci. USA. 2004;101:11531. doi: 10.1073/pnas.0404656101. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [8].Bogarad LD, Deem MW. Proc. Natl. Acad. Sci. USA. 1999;96:2591. doi: 10.1073/pnas.96.6.2591. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [9].Deem MW, Lee H-Y. Phys. Rev. Lett. 2003;91:068101. doi: 10.1103/PhysRevLett.91.068101. [DOI] [PubMed] [Google Scholar]
- [10].Kauffman S, Levin S. J. Theor. Biol. 1987;128:11. doi: 10.1016/s0022-5193(87)80029-2. [DOI] [PubMed] [Google Scholar]
- [11].S. R, et al. J. Mol. Bio. 1996;263:551. [Google Scholar]
- [12].Kidwell MG, Lisch DR. Evolution. 2001;55:1. doi: 10.1111/j.0014-3820.2001.tb01268.x. [DOI] [PubMed] [Google Scholar]
- [13].Holler PD, Chlewicki LK, Kranz DM. Nature Im. 2003;4:55. doi: 10.1038/ni863. [DOI] [PubMed] [Google Scholar]
- [14].Muñoz ET, Deem MW. Vaccine. 2004;23:1142. [Google Scholar]
- [15].Campbell NA, Reece JB. Biology. 7th ed Benjamin Cummings; San Francisco: 2004. [Google Scholar]
- [16].Gupta V, Earl DJ, Deem MW. 2005. q-bio.BM/0503030.
- [17].Percus JK, Percus OE, Perelson AS. Proc. Natl. Acad. Sci. USA. 1993;90:1691. doi: 10.1073/pnas.90.5.1691. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [18].Detours V, Bersini H, Stewart J, Varela F. J. Theor. Biol. 1994;170:401. doi: 10.1006/jtbi.1994.1201. [DOI] [PubMed] [Google Scholar]
- [19].Sulzer J, van Hemmen L, Newmann AU, Behn U. Bull. Math. Biol. 1993;55:1133. doi: 10.1007/BF02460702. [DOI] [PubMed] [Google Scholar]
- [20].Leirisalo-Repo M. Curr. Opin. in Rheumatology. 2005;17:433. doi: 10.1097/01.bor.0000166388.47604.8b. [DOI] [PubMed] [Google Scholar]
- [21].Tan T, Deem MW. Physica A. 2004;350:52. [Google Scholar]
- [22].Larché M, Wraith DC. Nature Med. 2005;11:S69. doi: 10.1038/nm1226. [DOI] [PubMed] [Google Scholar]
