Author manuscript; available in PMC: 2024 Aug 5.
Published in final edited form as: Phys Rev Lett. 2010 Nov 17;105(21):218101. doi: 10.1103/PhysRevLett.105.218101

Factors Governing Fibrillogenesis of Polypeptide Chains Revealed by Lattice Models

Mai Suan Li 1, Nguyen Truong Co 2, Govardhan Reddy 3, Chin-Kun Hu 5,6, J E Straub 7, D Thirumalai 3,4
PMCID: PMC11298782  NIHMSID: NIHMS2007320  PMID: 21231356

Abstract

Using lattice models we explore the factors that determine the tendencies of polypeptide chains to aggregate by exhaustively sampling the sequence and conformational space. The morphologies of the fibril-like structures and the time scales $\tau_{\mathrm{fib}}$ for their formation depend on a balance between hydrophobic and Coulomb interactions. The extent of population of an ensemble of $N^*$ structures, which are fibril-prone structures in the spectrum of conformations of an isolated protein, is the major determinant of $\tau_{\mathrm{fib}}$. This observation is used to determine the aggregating sequences by exhaustively exploring the sequence space, thus providing a basis for a genome-wide search of fragments that are aggregation prone.


Proteins that are unrelated by sequence or structure aggregate to form amyloid-like fibrils with a characteristic cross-β structure [1(a)]. The observation that almost any protein could form fibrils seemed to imply that fibril formation rates can be predicted solely from sequence composition and the propensity to adopt global secondary structure. Such a conclusion has limited validity because it does not account for fluctuations that populate aggregation-prone structures. Despite the common structural characteristics of amyloid fibrils [1], the factors that determine the fibril formation tendencies are not understood.

Experiments on fibril formation times $\tau_{\mathrm{fib}}$ have been rationalized using global factors such as the hydrophobicity of side chains [2(a)], net charge [2(b),2(c)], patterns of polar and nonpolar residues [2(d)], frustration in secondary structure elements [2(e),2(f)], and aromatic interactions [2(g)]. However, the inability to sample the sequence and conformational spaces exhaustively [3] has prevented deciphering plausible general principles that govern protein aggregation. Here, we obtain a quantitative correlation between intrinsic properties of polypeptide sequences and their fibril growth rates using lattice models, which have given remarkable insights into the general principles of protein folding and aggregation [4]. Using a modification of the model in [5], we explore how the sequence-dependent variations of $\tau_{\mathrm{fib}}$ depend on the nature of the conformations explored by the monomer. We highlight the role of the aggregation-prone ensemble of $N^*$ structures [6] in the folding landscape of the monomer in determining $\tau_{\mathrm{fib}}$ and the propensity of sequences to form fibrils.

Lattice model.—

We use a lattice model [5] in which each chain consists of $M$ connected beads that are confined to the vertices of a cube. The simulations are done using $N$ identical chains with $M = 8$. The peptide sequence used to illustrate the roles of electrostatic and hydrophobic interactions is +HHPPHH− (Fig. 1), where H, P, +, and − are hydrophobic, polar, positively charged, and negatively charged beads, respectively [5].

FIG. 1.

(a) Spectrum of energies and low-energy structures of the monomer sequence +HHPPHH−. H, P, +, and − are in green, yellow, blue, and red, respectively. We set $E_{HH} = -1$ and $E_{+-} = -1.4$. There are 1831 possible conformations that are spread among 17 possible energy values. The conformations in the first excited state represent the ensemble of $N^*$ structures, and the $N^*$ conformation that coincides with the peptide state in the fibril [see Fig. 2(a)] is enclosed in a box. (b) The probability $P_{N^*}$ of populating the structure in the box in (a) as a function of $T$ for $E_{+-} = 0, -0.3, -0.6, -1$, and $-1.4$, keeping $E_{HH} = -1$. The arrow indicates $T^*$, where $P_{N^*} = P_{N^*}^{\max}$. Dependence of $P_{N^*}^{\max}$ on $E_{+-}$ for $E_{HH} = -1$ (c), and on $E_{HH}$ for $E_{+-} = -1.4$ (d).

The energy of $N$ chains is [5]

$$E = \sum_{l=1}^{N} \sum_{i<j}^{M} E_{s_{li} s_{lj}}\, \delta(r_{ij} - a) + \sum_{m<l}^{N} \sum_{i,j}^{M} E_{s_{li} s_{mj}}\, \delta(r_{ij} - a),$$

where $r_{ij}$ is the distance between residues $i$ and $j$, $a$ is the lattice spacing, $s_{mi}$ indicates the type of residue $i$ from the $m$th peptide, and $\delta(0) = 1$ and zero otherwise. The first and second terms represent intrapeptide and interpeptide interactions, respectively.

The propensity of polar and charged residues to be “solvated” is mimicked using $E_{P\alpha} = -0.2$ (in units of the hydrogen bond energy $\epsilon_H$), where $\alpha = P, +$, or $-$. To assess the importance of electrostatic and hydrophobic interactions, we vary either $E_{+-}$ in the interval $-1.4 \le E_{+-} \le 0$ or $E_{HH}$ between $-1$ and 0. If $E_{+-}$ is varied, we set $E_{HH} = -1$, while if $E_{HH}$ is varied, then $E_{+-} = -1.4$. We used $E_{++} = E_{--} = -E_{+-}/2$, and all other contact interactions have $E_{\alpha\beta} = 0.2$.
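The contact energies and the two-term energy function can be sketched in Python as follows. This is a minimal illustration, not the authors' code. It assumes attractive contacts carry negative energies (for example $E_{HH} = -1$ and $E_{+-} = -1.4$), a lattice spacing of 1, and no periodic boundaries. It also skips bonded neighbours in the intrapeptide sum, which only shifts every energy of a given sequence by a constant. The function names `contact_energies` and `total_energy` are hypothetical.

```python
from itertools import combinations, product

def contact_energies(e_pm=-1.4, e_hh=-1.0):
    """Contact-energy table for bead types H, P, +, - (units of the hydrogen bond energy)."""
    table = {}
    for a, b in product("HP+-", repeat=2):
        if a == b == "H":
            e = e_hh            # hydrophobic contact
        elif "H" in (a, b):
            e = 0.2             # "all other" contacts: H with P, + or -
        elif "P" in (a, b):
            e = -0.2            # polar bead with P, + or - (solvation-like term)
        elif a != b:
            e = e_pm            # salt bridge between + and -
        else:
            e = -e_pm / 2       # like charges: E_++ = E_-- = -E_+-/2
        table[a, b] = e
    return table

def total_energy(chains, sequences, table):
    """chains: list of lists of integer (x, y, z) points; sequences: matching bead strings."""
    def in_contact(p, q):
        # nearest neighbours on a unit cubic lattice
        return sum(abs(u - v) for u, v in zip(p, q)) == 1

    energy = 0.0
    # intrapeptide term (bonded neighbours skipped, an assumption of this sketch)
    for pos, seq in zip(chains, sequences):
        for i in range(len(pos)):
            for j in range(i + 2, len(pos)):
                if in_contact(pos[i], pos[j]):
                    energy += table[seq[i], seq[j]]
    # interpeptide term over all distinct pairs of chains
    for (pos_l, seq_l), (pos_m, seq_m) in combinations(zip(chains, sequences), 2):
        for p, s in zip(pos_l, seq_l):
            for q, t in zip(pos_m, seq_m):
                if in_contact(p, q):
                    energy += table[s, t]
    return energy
```

For a four-bead toy chain `+HH-` folded into a square, the only non-bonded contact is the salt bridge, so the function returns the value of `e_pm`.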

Monomer spectrum depends on $E_{+-}$ and $E_{HH}$.—

The spectrum of energy states of the monomer for a given sequence is determined by exact enumeration of all possible conformations (Fig. 1). For all sets of contact energies the native state (NS) of the monomer is compact [lowest-energy conformation in Fig. 1(a)]. For $E_{+-} < 0$, the ensemble of $N^*$ structures forms the first excited state [Fig. 1(a)]. However, if $E_{+-} = 0$, the ensemble of $N^*$ structures is part of the 19-fold degenerate second excited state [see supplementary information (SI) [7], Fig. 1].
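Exact enumeration for a short chain can be sketched as a depth-first search over self-avoiding walks on the cubic lattice. The sketch below is illustrative only: the names `self_avoiding_walks` and `energy_spectrum` are hypothetical, and `energy_of` stands for any user-supplied function that maps a walk to its energy. The first bond is fixed along +x, but other symmetry-related copies are not merged, so the number of walks it yields is larger than the number of distinct conformations quoted in the caption of Fig. 1(a).

```python
from collections import Counter

STEPS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]

def self_avoiding_walks(n_beads):
    """Yield every self-avoiding walk of n_beads whose first bond points along +x."""
    if n_beads < 2:
        raise ValueError("need at least two beads")

    def grow(walk, occupied):
        if len(walk) == n_beads:
            yield tuple(walk)           # copy, because walk is mutated while backtracking
            return
        x, y, z = walk[-1]
        for dx, dy, dz in STEPS:
            nxt = (x + dx, y + dy, z + dz)
            if nxt not in occupied:     # self-avoidance
                walk.append(nxt)
                occupied.add(nxt)
                yield from grow(walk, occupied)
                occupied.remove(nxt)
                walk.pop()

    start = [(0, 0, 0), (1, 0, 0)]
    yield from grow(start, set(start))

def energy_spectrum(walks, energy_of):
    """Histogram of energies: maps each energy value to its degeneracy."""
    return Counter(energy_of(w) for w in walks)
```

Passing the eight-bead walks and a single-chain energy function to `energy_spectrum` gives the degeneracy of each level, which is the input needed for a partition function.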

The population of the putative fibril-prone conformation in the monomeric state is $P_{N^*} = \exp(-E_{N^*}/k_B T)/Z$, where $Z$, the partition function, is obtained by exact enumeration. Figure 1(b) shows the temperature dependence of $P_{N^*}$ for various values of the $E_{+-}$ interaction, with $E_{HH} = -1$ and other contact energies constant. Depending on $E_{+-}$, the maximum value of $P_{N^*}$ lies in the range $2\% < P_{N^*}^{\max} < 12\%$ [Fig. 1(c)]. $P_{N^*}^{\max}$ decreases to a lesser extent as the hydrophobic interaction grows [Fig. 1(d)]. Here we consider only $E_{HH} \le -0.4$ because the fibril-like structure is not the lowest-energy state when $E_{HH} > -0.4$.
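Given an energy spectrum, the Boltzmann population of a single conformation and the temperature at which it peaks can be computed as in the sketch below. It assumes units in which $k_B = 1$ and takes the spectrum as a dictionary of energies and degeneracies. The function names and the three-level spectrum in the usage note are hypothetical and are not data from the paper.

```python
import math

def population(e_target, spectrum, temperature):
    """Boltzmann probability of ONE conformation with energy e_target.

    spectrum maps energy -> number of conformations with that energy;
    energies and temperature share the same units (k_B = 1).
    """
    z = sum(g * math.exp(-e / temperature) for e, g in spectrum.items())
    return math.exp(-e_target / temperature) / z

def peak_temperature(e_target, spectrum, temperatures):
    """Temperature on the supplied grid at which the population is largest."""
    return max(temperatures, key=lambda t: population(e_target, spectrum, t))
```

With a toy spectrum such as `{-3.0: 1, -2.0: 4, 0.0: 100}`, the population of one first-excited conformation vanishes at low temperature, where the ground state dominates, and again at high temperature, where the many high-energy states dominate. It therefore has a maximum at an intermediate temperature, which is the qualitative behaviour described for $P_{N^*}$.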

Morphology of lowest-energy structures of multichain systems depends on sequences.—

When multiple chains are present in the unit cell, aggregation is readily observed, and in due course the chains form ordered structures. We used the Monte Carlo (MC) [5] annealing protocol, which allows for an exhaustive conformational search, to find the lowest-energy conformation. For nonzero values of $E_{+-}$ the chains adopt an antiparallel arrangement in the ordered protofilament, which ensures that the numbers of salt-bridge and hydrophobic contacts are maximized [Fig. 2(a)]. If $E_{+-} = 0$, then the lowest-energy fibril structure has a vastly different architecture even though it is assembled from $N^*$ [Fig. 2(b)]. The structure in Fig. 2(b), in which a pair of $N^*$ conformations are stacked by flipping one with respect to the other, is rendered stable by maximizing the number of +P and −P contacts. We now set $E_{+-} = -1.4$ and vary $E_{HH}$. For $E_{HH} < -0.4$, the fibril conformation adopts the same shape as that shown in Fig. 2(a), but for $E_{HH} = -0.4$ the energetically more favorable double-layer structure emerges [Fig. 2(c)]. If $E_{HH} \ge -0.3$, then the lowest-energy conformation ceases to have the fibril-like shape [Fig. 2(d)]. The close-packed heterogeneous structure is stitched together by a mixture of the NS conformation and one of the second excited conformations. Even for this simple model a variety of lowest-energy structures of oligomers and protofilaments with different morphologies emerge, depending on a subtle balance between electrostatic and hydrophobic interactions.

FIG. 2.

(a) The lowest-energy fibril structure for $E_{+-} = -1.4$ and $E_{HH} = -1$. (b) Same as in (a) but with $E_{+-} = 0$. (c) Double-layer structure for $E_{HH} = -0.4$ but with $E_{+-} = -1.4$. (d) For $E_{+-} = -1.4$ and $E_{HH} = -0.3$ the fibril structure is entirely altered. (e) Temperature dependence of $\tau_{\mathrm{fib}}$ for $E_{+-} = -1.4$ (circles) and $E_{+-} = -0.6$ (triangles). $N = 6$ and $E_{HH} = -1$. Arrows show the temperatures at which the fibril formation is fastest.

Dependence of $\tau_{\mathrm{fib}}$ on $E_{+-}$ and $E_{HH}$.—

Simulations were performed by enclosing $N$ chains in a box with periodic boundary conditions and move sets described in Ref. [5]. The effect of finite size is discussed in the SI, Fig. 2. The fibril formation time $\tau_{\mathrm{fib}}$ is defined as an average of first passage times needed to reach the fibril state with the lowest energy starting from initial random conformations. For a given value of $T$, we generated 50–100 MC trajectories to compute $\tau_{\mathrm{fib}}$. We measure time in units of a Monte Carlo step (MCS), which is a combination of local and global moves.
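The definition of the fibril formation time as an average of first passage times can be sketched generically. In the sketch below, `step`, `new_state`, and `reached` are hypothetical callables standing in for one Monte Carlo step, a random initial configuration, and the test for the lowest-energy fibril state. The actual move sets are those of Ref. [5] and are not reproduced here.

```python
def first_passage_time(step, state, reached, max_steps):
    """Number of steps until reached(state) is true, or None if max_steps is exceeded."""
    for t in range(max_steps + 1):
        if reached(state):
            return t
        state = step(state)
    return None

def mean_first_passage_time(step, new_state, reached, n_traj=100, max_steps=10**6):
    """Average first passage time over n_traj independent trajectories."""
    times = [first_passage_time(step, new_state(), reached, max_steps)
             for _ in range(n_traj)]
    if any(t is None for t in times):
        # averaging only the finished runs would bias the estimate downwards
        raise RuntimeError("some trajectories did not reach the target state")
    return sum(times) / len(times)
```

Raising an error when a trajectory fails to finish is a design choice of this sketch: silently dropping unfinished runs would underestimate the mean.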

We performed an exhaustive study of the dependence of $\tau_{\mathrm{fib}}$ on the number of chains in the simulation box, $N$ (SI, Fig. 2). For a highly favorable interaction between the terminal charged residues, $E_{+-} = -1.4$, $\tau_{\mathrm{fib}}$ scales linearly with the size of the system [SI, Fig. 2(a)], while for less favorable interactions, $E_{+-} = -0.6$ and $-0.8$, $\ln \tau_{\mathrm{fib}}$ scales linearly with the size of the system [SI, Fig. 2(b)]. The temperature dependence of $\tau_{\mathrm{fib}}$ displays a U shape [Fig. 2(e)], and the fastest assembly occurs at $T_{\min}$, which roughly coincides with the temperature $T^*$ at which $P_{N^*}$ reaches its maximum [Fig. 1(b)]. To probe the correlation between $\tau_{\mathrm{fib}}$ and $E_{+-}$ and $E_{HH}$, we performed simulations at $T_{\min}$. The dependence of $\tau_{\mathrm{fib}}$ on $E_{+-}$ can be fit using $\tau_{\mathrm{fib}} \sim \exp[-c(-E_{+-})^{\alpha}]$, where $\alpha \approx 0.6$ and the constant $c \approx 7.12$ and 9.23 for $N = 6$ and 10, respectively [Fig. 3(a)]. Thus, variation of $E_{+-}$ drastically changes not only the morphology of the ordered protofilament (Fig. 2) but also $\tau_{\mathrm{fib}}$. The stronger the charge interaction between the terminal beads, the faster the fibril formation process. Interestingly, the fibril formation rate at $E_{+-} = 0$ is about 4 orders of magnitude slower than that at $E_{+-} = -1.4$. The propensity for fibril assembly strongly depends on the charge states of the polypeptide sequences [1(b)].

FIG. 3.

(a) Dependence of $\tau_{\mathrm{fib}}$ on $E_{+-}$ for $N = 6$ (circles) and $N = 10$ (triangles) with $E_{HH} = -1$. The solid curves are fits to $y = c_0 - c(-x)^{\alpha}$, where $\alpha \approx 0.59$; $c_0 = 21.32$ and $c = 7.12$ for $N = 6$, and $c_0 = 25.14$ and $c = 9.23$ for $N = 10$. (b) Dependence of $\tau_{\mathrm{fib}}$ on $E_{HH}$ with $E_{+-} = -1.4$ held constant for $N = 6$ (solid circles) and $N = 10$ (solid triangles). Lines are fits $y = 19.17 + 7.97x$ and $y = 22.69 + 8.56x$ for $N = 6$ and 10, respectively. For $N = 6$ the first point, $E_{HH} = -1$, is excluded from fitting. (c) Dependence of $\tau_{\mathrm{fib}}$ on $P_{N^*}^{\max}$ for $N = 6$ and 10. Symbols are the same as in (a) and (b). $\tau_{\mathrm{fib}}$ is measured in MCS and $P_{N^*}^{\max}$ in percent. The correlation coefficient for all fits is $R \approx 0.98$.

By fixing $E_{+-} = -1.4$ we calculated the dependence of $\tau_{\mathrm{fib}}$ on the hydrophobic interaction [Fig. 3(b)], which may be approximated using $\tau_{\mathrm{fib}} \sim \exp(c E_{HH})$. Here the constant $c \approx 7.97$ and 8.56 for $N = 6$ and 10, respectively. For $N = 10$, a change in hydrophobicity of $\Delta E_{HH} = 0.6$ leads to self-assembly rates that differ by more than 2 orders of magnitude. Thus, enhancement of hydrophobic interactions speeds up fibril formation rates [1,8].
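The orders of magnitude quoted for the two fits can be checked with a short calculation that uses the fit parameters from the caption of Fig. 3. The sketch assumes the fitted quantity is $\ln \tau_{\mathrm{fib}}$ and that the fit forms are $c_0 - c(-E_{+-})^{\alpha}$ for the charge interaction and $c_0 + cE_{HH}$ for the hydrophobic interaction, with both energies zero or negative. These are the sign conventions under which stronger attraction shortens $\tau_{\mathrm{fib}}$. The function names are hypothetical.

```python
import math

ALPHA = 0.59
FIT_CHARGE = {6: (21.32, 7.12), 10: (25.14, 9.23)}       # (c0, c) for N = 6 and 10
FIT_HYDROPHOBIC = {6: (19.17, 7.97), 10: (22.69, 8.56)}  # (intercept, slope)

def ln_tau_charge(e_pm, n_chains):
    """ln(tau_fib) from the charge fit; e_pm must be <= 0 (assumed sign convention)."""
    c0, c = FIT_CHARGE[n_chains]
    return c0 - c * (-e_pm) ** ALPHA

def ln_tau_hydrophobic(e_hh, n_chains):
    """ln(tau_fib) from the linear hydrophobic fit."""
    c0, c = FIT_HYDROPHOBIC[n_chains]
    return c0 + c * e_hh

def orders_of_magnitude(delta_ln_tau):
    """Convert a difference in ln(tau) to a difference in log10(tau)."""
    return delta_ln_tau / math.log(10)

# slowdown on switching off the salt bridge (N = 6), roughly 3.8 decades
charge_gap = orders_of_magnitude(ln_tau_charge(0.0, 6) - ln_tau_charge(-1.4, 6))
# effect of weakening E_HH from -1.0 to -0.4 (N = 10), roughly 2.2 decades
hydrophobic_gap = orders_of_magnitude(
    ln_tau_hydrophobic(-0.4, 10) - ln_tau_hydrophobic(-1.0, 10))
```

Both numbers agree with the statements in the text: about 4 orders of magnitude for the charge interaction and more than 2 orders of magnitude for a change of 0.6 in $E_{HH}$.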

Fibril formation rates depend on $P_{N^*}$.—

A plot of the data in Figs. 3(a) and 3(b) as a function of $P_{N^*}^{\max}$ [Fig. 3(c)] yields the surprising relation

$$\tau_{\mathrm{fib}} = \tau_{\mathrm{fib}}^{0} \exp\left(-c P_{N^*}^{\max}\right), \qquad (1)$$

where the prefactor $\tau_{\mathrm{fib}}^{0} \approx 1.014 \times 10^{10}$ MCS and $3.981 \times 10^{11}$ MCS, and $c \approx 0.9$ and 1.0, for $N = 6$ and 10, respectively. Equation (1) is also valid for three other degenerate conformations in the $N^*$ ensemble, which are structurally similar to the one enclosed in the box in Fig. 1(a). There are a few implications of the central result given in Eq. (1). (i) The sequence-dependent spectrum of the monomer is a harbinger of fibril formation. In proteins there are multiple $N^*$ conformations corresponding to distinct free energy basins of attraction [6(b)]. Aggregation from each of the structures in the various basins of attraction could lead to fibrils with different morphologies (polymorphism) that cannot be captured using lattice models. (ii) Enhancement of $P_{N^*}$ either by mutation or chemical cross-linking should increase fibril formation rates. Indeed, a recent experiment [9] showed that the aggregation rate of Aβ$_{1-40}$-lactam[D23-K28], in which the residues D23 and K28 are chemically constrained by a lactam bridge, is nearly 1000 times greater than that of the wild type. Since the salt-bridge constraint increases the population of the $N^*$ conformation in the monomeric state [10], it follows from Eq. (1) that $\tau_{\mathrm{fib}}$ should decrease. (iii) Since $P_{N^*}(T)$ depends on the spectrum of the precise sequence for a given set of external conditions, it follows that the entire free energy landscape of the monomer [6(b)], and not merely the sequence composition as ascertained elsewhere [1(b)], should be considered in predictions of amyloidogenic tendencies. (iv) Equation (1) is suggestive of a fluctuation-driven nucleation mechanism with a complicated temperature dependence. (v) Finally, as a negative control, plots of $\ln \tau_{\mathrm{fib}}$ as a function of $P_{C}^{\max}$, where $C$ represents a conformation from the second or the third excited state (SI, Fig. 3), show that Eq. (1) does not hold for these structures.
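As a rough numerical illustration of Eq. (1), the sketch below evaluates the fitted relation with the quoted prefactors and slopes. It assumes the exponent is negative, so that a larger $N^*$ population gives faster assembly, and that $P_{N^*}^{\max}$ is entered in percent. The function names are hypothetical.

```python
import math

# (tau0 in MCS, c per percent of P_max) for N = 6 and N = 10, as quoted with Eq. (1)
PARAMS = {6: (1.014e10, 0.9), 10: (3.981e11, 1.0)}

def tau_fib(p_max_percent, n_chains):
    """Fibril formation time in MCS predicted by the fitted exponential relation."""
    tau0, c = PARAMS[n_chains]
    return tau0 * math.exp(-c * p_max_percent)

def speedup(p_low, p_high, n_chains):
    """Factor by which assembly accelerates when P_max rises from p_low to p_high."""
    return tau_fib(p_low, n_chains) / tau_fib(p_high, n_chains)
```

For $N = 6$, raising $P_{N^*}^{\max}$ from 2% to 12% gives `speedup(2, 12, 6)` equal to $e^{9}$, roughly $8 \times 10^{3}$. A modest change in the monomer population therefore translates into a change of nearly four orders of magnitude in the assembly time.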

Sequence space scanning.—

We use Eq. (1) to determine the amylome [11], the universe of sequences in the lattice model that can form fibrils. We posit that aggregation-prone sequences are those with a unique native state and a maximum in $P_{N^*}(T)$ in the interval $1.0 \le T^*/T_F \le 1.25$. If $T_F = 300$ K, which is physically reasonable, then $T^* = 375$ K when $T^*/T_F = 1.25$. Thus, for values of $T^*/T_F > 1.25$, $T^*$ would be far too high to be physically relevant. Our conclusions will not change by increasing $T^*/T_F$ or, alternatively, by choosing a reasonable threshold value for $P_{N^*}$. Out of the 65536 sequences only 217 satisfy these criteria (see supplementary information for details [7]). The sequence space exploration shows that there is a high degree of correlation between the positions of charged and hydrophobic residues, leading to a limited number of aggregation-prone sequences, with +HHPPHH− being an example. In addition, there are substantial variations in $T^*/T_F$ for sequences with identical sequence composition, which reinforces the recent finding [11] that the context in which charged and hydrophobic residues are found is important in the tendency to form amyloid-like fibrils. Our study also provides a basis for a genome-wide search for consensus sequences with a propensity to aggregate.
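The size of the sequence space and the temperature-window criterion can be illustrated with a short sketch. It enumerates every eight-bead sequence over the four bead types and applies the window on $T^*/T_F$. Computing $T^*$, $T_F$, and native-state uniqueness for each sequence is not shown, and the names `all_sequences`, `in_window`, and `same_composition` are hypothetical.

```python
from itertools import product

ALPHABET = "HP+-"   # hydrophobic, polar, positive, negative

def all_sequences(length=8):
    """Generate every sequence of the given length over the four bead types."""
    return ("".join(beads) for beads in product(ALPHABET, repeat=length))

def in_window(t_star, t_fold, low=1.0, high=1.25):
    """Window criterion on the ratio T*/T_F."""
    return low <= t_star / t_fold <= high

def same_composition(reference):
    """All sequences that share the bead composition of the reference sequence."""
    target = sorted(reference)
    return [s for s in all_sequences(len(reference)) if sorted(s) == target]

n_total = sum(1 for _ in all_sequences())       # 4**8 sequences in total
n_same = len(same_composition("+HHPPHH-"))      # 8!/(4! 2! 1! 1!) arrangements
```

The count of sequences that share one composition shows why composition alone is a coarse descriptor. There are 840 distinct arrangements with the composition of +HHPPHH−, and the text reports substantial variation in $T^*/T_F$ among sequences of identical composition.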

Acknowledgments

The work was supported by the Ministry of Science and Informatics in Poland (grant No 202-204-234), grants NSC 96-2911-M 001-003-MY3 & AS-95-TP-A07, National Center for Theoretical Sciences in Taiwan, NIH Grant R01GM076688-05, and Department of Science and Technology of Ho Chi Minh City, Vietnam.
