Abstract
Computational studies have contributed greatly to our current understanding of the complex behavior of protein molecules; nevertheless, a complete characterization of their free energy landscape remains a major challenge. Here, we introduce a new coarse-grained approach that allows for extensive sampling of the conformational space of a large number of sequences. We explicitly discuss its application to protein design and, by studying four representative proteins, show that the method generates sequences with a relatively smooth free energy surface directed towards the target structures.
Introduction
Protein molecules play a central role in the large majority of biochemical reactions in living organisms [1]. Performance of these functions generally requires folding of the proteins into a specific three-dimensional structure, the so-called native state [2], [3], although a number of exceptions involving the so-called “disordered” proteins have also been discovered [4]. Computer simulations combined with experiments have contributed greatly to our current understanding of the complex behavior of protein molecules and of the mechanism by which folding takes place [3], [5], [6]. Advances have been made through the use of atomistic models, which are capable of providing detailed descriptions of protein dynamics [7], [5], [6], and through the development of coarse-grained representations, which enable more comprehensive sampling of the conformational space [3], [8], [9], [10].
A common approach to protein folding involves the use of Go-models [11]. Go-models are non-transferable potentials tailored to the native structure such that each amino acid interacts selectively with a subset of residues and only when in the native configuration. Hence, Go-proteins are hypothetical proteins with an arbitrary variety of pair interactions among the residues (the alphabet) that are nevertheless able to fold successfully and have a smooth free-energy landscape with a single global minimum at the native structure. However, if the size of the alphabet is reduced, for instance to the 20-letter alphabet of real proteins, it becomes increasingly difficult to observe folding for a random sequence, as the landscape in most cases changes from the smoothness of Go-models to a rugged surface with many local minima. Hence, folding becomes more complex and requires an extensive search in the space of possible sequences to obtain a folding chain. For this reason, such methods are often referred to as “protein design”. Protein design was originally developed for lattice heteropolymers by Wolynes [12], and has recently been extended by Coluzza et al. [13]. By using lattice models it was possible not only to design heteropolymers with a large variety of target configurations, but also to generate lattice proteins with more complex self-assembly properties [14], [15]. The solution of the design problem is of considerable interest in biotechnology, as it holds promise for the engineering of proteins with new functional properties. Some successful designs of novel artificial enzymes have been obtained by introducing residues expected to play a catalytic role in a specific reaction [16] into sequences with known folds.
In this work we go beyond lattice models by introducing a novel design procedure that can produce realistic amino acid sequences able to fold into protein structures taken directly from experimental data. In what follows we demonstrate for the first time that an accurate representation of the protein backbone is a necessary condition for successful protein design, as such constraints confine the possible configurations of proteins to the structural space of real proteins. Our hypothesis is based on the observation that the design procedure developed for lattice proteins was unable to produce folding sequences when applied to simple off-lattice representations (e.g. a flexible chain of particles), as indicated by our earlier simulations. In order to understand the importance of constraints, let us ignore for a moment long-range correlations in the system (i.e., we make a mean-field approximation). Energy minimization can then be viewed as a local optimization of the residue-residue pair interactions. Under such conditions, sequence mutation (design) is guaranteed to find the same minimum as configuration changes (folding) [12], provided that the number of possible sequences is larger than or equal to the number of all possible configurations. Hence, for real proteins with an alphabet limited to 20 letters, it becomes clear that one needs to introduce constraints that limit the size of the configurational space (e.g. a cubic lattice). Of course, in order to reproduce the space of real proteins, a specific set of constraints is needed that, contrary to Go potentials [11], does not vary from protein to protein.
Recently, Maritan and co-workers [17], [18], [19] introduced a novel protein coarse-graining procedure that represents a typical protein as a flexible self-avoiding tube (hence the name “Tube” model) of finite radius with effective hydrogen-bond interactions along the tube. The configurations of the tube model are controlled by just two parameters, the total hydrophobicity and the bending rigidity, which drive the tube into all secondary structures and many known protein tertiary structures. Hence, the results obtained with the tube model strongly suggest that the typical protein structures are inherent in the geometrical constraints of the backbone, as the latter are the main features of the tube model. In the words of the authors, the tube “pre-sculpts” the free energy landscape. So far, a design method for the tube model has not been introduced, and when hydrophilic/hydrophobic patterns of typical proteins were tried, the tube model could not systematically fold to the native structures [20]. However, we believe that the tube model highlighted the important types of constraints necessary to design sequences for real protein structures, namely the self-avoidance of the backbone and the hydrogen bonds.
In order to support our hypothesis, we have developed a new model inspired by the work of Maritan and co-workers. Unlike the tube model, the physico-chemical properties of individual amino acids are represented by an effective spherical potential centered on the Cα atoms, and a more realistic potential represents the hydrogen bonding interactions. We refer to this model as the caterpillar model because of the image created by the spheres that follow the backbone (Fig. 1). The behavior of the caterpillar model depends on the balance between the spherical and hydrogen bond potentials. The main differences between the caterpillar and the tube model are that our model considers an arbitrary alphabet of amino acids and has a more detailed structure of the backbone that represents the hydrogen bonding interactions more faithfully. However, we retain the tube nature of the protein via the self-avoiding core of the spheres centered on the Cα atoms [21]. We then expect the constraints resulting from the spherical and the hydrogen bonding potentials to confine the polypeptide chains to regions of the conformational space with realistic protein-like structure elements. It is important to note that the higher level of description of the caterpillar model allows not only for a higher precision in the representation of structures, but also for a direct transfer of the results obtained with the caterpillar model to further refinement by full atomistic simulations. In fact, in order to further study the results of the caterpillar model with full atomistic simulations, we only need to add the atoms of the side chains of each amino acid directly onto the backbone configurations of the coarse-grained simulations. Moreover, the use of spheres to account for self-avoidance is computationally more efficient [21] than the three-body interaction rules used in the tube model [18].
In this paper, we show that the caterpillar model satisfies the two conditions mentioned above for foldability and designability, as it retains the elements of the polypeptide chain essential for the folding of designed sequences and, at the same time, is simple enough to allow for an extensive exploration of the configurational space. Below we describe the novel design procedure based on the caterpillar model and discuss the design of four representative protein structures taken directly from the Protein Data Bank (PDB) [22], [38]. We show that with our model we are able to design all test structures and to generate a large number of sequences with the target configurations sitting at the bottom of a global free energy minimum. Finally, to further support the tangible link with real proteins, we show that the hydrophobic/philic profile of designed sequences agrees with that typical of real sequences and, more importantly, we demonstrate that the caterpillar model can refold the sequence of one of the four test proteins to its corresponding native structure.
Methods
Model
As outlined above, the caterpillar model is a 5-bead model in which the Cα atoms are augmented by the full main-chain atomic positions to introduce directional hydrogen bonds. The degrees of freedom of the model are the torsional angles φ and ψ; all other structural parameters are kept fixed at values from the literature [23]. The C, O, N, H positions were determined from the Cα atoms as shown in Fig. 1.
The side chain interactions are represented by an effective Cα–Cα sphere-sphere interaction energy given by
(1) |
where is the distance between the atoms at the centers of spheres and and () is the distance at which ; is a scale factor; see below. This expression provides a continuous square well form for the sphere-sphere interaction energy [39]. To determine the parameter we made use of the model of Betancourt and Thirumalai (BT) [24], in which the interaction energies were derived from a calculation of the contact frequency in the PDB. This potential had been used primarily for lattice proteins, but it is also appropriate for the caterpillar model, which employs a square-well-like potential. Backbone hydrogen bonds were modeled with a 10–12 Lennard-Jones type potential using the expression [25]
(2) |
where is the distance between the hydrogen atom of the amide group (NH) and the oxygen atom of the carboxyl group (CO) of the main chain. We set , , and ; the values are given in [25].
To complete the parametrization, we need to determine and . Since BT is a contact potential, there is no cutoff value. Here, we considered the Cα–Cα pair-correlation function g() of several proteins and found that it begins to decay at approximately (see Figure S1). This behavior can be interpreted as the range of the effective interactions among amino acids. For larger values of , the system tends to acquire a mean-field behavior, where every particle interacts with all the others, regardless of the geometry. By contrast, for smaller values of , correlations that are crucial for the stability of the target structure can be missed. The parameter was chosen to balance the contributions of (Eq. (1)) and (Eq. (2)). With , and provide approximately the same contributions to the energy per particle. If is too small, all sequences form α-helices, while if it is too large all sequences fail to self-assemble and collapse into random glassy structures. In Eq. (2), the directionality of the hydrogen bonds is accounted for by multiplying the Lennard-Jones term by a factor containing the and angles between the atoms COH and OHN, respectively (Figures S2 and S3 show the distance and angular dependence of the hydrogen-bond energy). The directionality of the hydrogen bonds is essential to make more probable those regions of conformational space characterized by the secondary structure elements typical of proteins. The spheres centered on the positions of the Cα atoms ensure that only the maximum of the term in Eq. (2) for angles close to is accessible; that at corresponds to configurations that are not allowed by the self-avoiding volumes of the spheres.
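The interplay between the radial 10–12 term and the angular factor described above can be illustrated with a minimal sketch. It assumes the common 10–12 form 5(σ/r)^12 − 6(σ/r)^10, whose minimum is −1 at r = σ, multiplied by an even power of the product of the cosines of the COH and OHN angles. The function name and the values of `epsilon`, `sigma` and `nu` are hypothetical placeholders and are not the parameters used in this work.

```python
import math


def hbond_energy(r, theta_coh_deg, theta_ohn_deg, epsilon=1.0, sigma=2.0, nu=2):
    """Directional 10-12 hydrogen-bond energy (illustrative sketch only).

    r             : H...O distance (same length unit as sigma)
    theta_coh_deg : COH angle in degrees
    theta_ohn_deg : OHN angle in degrees
    epsilon, sigma, nu : hypothetical placeholder parameters
    """
    x = sigma / r
    # Radial 10-12 term: equals -1 at r = sigma and tends to 0 at large r.
    radial = 5.0 * x**12 - 6.0 * x**10
    # Angular factor: with an even exponent it is maximal at 0 and 180 degrees.
    angular = (
        math.cos(math.radians(theta_coh_deg))
        * math.cos(math.radians(theta_ohn_deg))
    ) ** nu
    return epsilon * radial * angular


if __name__ == "__main__":
    for angle in (180.0, 135.0, 90.0, 0.0):
        print(angle, hbond_energy(2.0, angle, angle))
```

With an even exponent the angular factor has two equivalent maxima, at 0 and 180 degrees, which is why a separate excluded-volume term is needed to make only one of them accessible.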
The energy function, Eq. (1), does not take the effects of the solvent into account explicitly. Although the designed sequences are able to fold to their respective target structures, their surface exposure profiles do not necessarily reproduce those of actual proteins. To improve this aspect of the design, we added an energy term that penalizes the surface exposure of hydrophobic amino acids; the expression has the form
(3) |
where is a threshold for the number of contacts in the native structure above which the amino acid is considered to be fully buried and is the Dolittle hydrophobicity index [26], rescaled by to make this term match the contributions from the other energy terms. The number of contacts for the amino acids in the native state varies between and ; the value 24 was chosen for (see Fig. 5).
The designs described in this paper were carried out mainly using only Eqs. (1) and (2) for the energy. A comparison calculation including the solvation energy term of Eq. (3) was then made for one protein.
Design procedure
Given the potential function for the caterpillar model, there are two steps in the design procedure. First, a larger number () of sequences with a low energy and high sequence heterogeneity are generated using the target structure. Second, a selected subset is studied to determine its free energy surface and folding properties.
Several methods have been proposed to design the sequence of proteins such that they fold into a specific target conformation [27], [28], [13], [29]. We use here a modified version of a method that we described recently [13], which generates sequences by minimizing the energy of the target configuration and, at the same time, maximizes the number of amino acid permutations to increase the sequence heterogeneity. With this procedure the distribution of possible sequences remains large, which is necessary to generate sequences with a free energy minimum low enough to stabilize the folded state [27]. The search in sequence space is carried out by a parallel tempering Monte Carlo procedure with single point mutation moves. As in the conventional Metropolis scheme, the acceptance of trial moves depends on the ratio of the Boltzmann weights at a design temperature of the new and old states [30]. However, if this were the only criterion, there would be a tendency to generate homopolymer chains with a low energy, rather than chains that fold selectively into a specific target structure. To ensure an amino acid composition far from the homopolymer region of the sequence space, we impose the following acceptance criterion for a single mutation
(4) |
where is the difference of the energy before and after the mutation attempt, is a scale factor for the relative value of the two terms in the equation, and is the number of permutations that are possible for a given set of amino acids; is given by the multinomial distribution
$$N_P = \frac{N!}{n_1!\, n_2!\, n_3! \cdots} \qquad (5)$$
where $N$ is the total number of monomers and $n_1$, $n_2$, etc. are the numbers of amino acids of type 1, 2, etc. While sampling the sequence space with the Monte Carlo scheme, we set to a high enough value to generate sequences with a heterogeneous composition. To adequately sample the sequence space, we generated sequences with the native structure as the template using the parallel tempering scheme [13] with a set of temperatures in units of . From these we selected the ones most likely to yield stable structures for the native state. For this purpose, we used the Landau free energy , defined by
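The permutation count entering the acceptance rule can be evaluated without overflow by working with its logarithm. The sketch below computes the logarithm of the multinomial coefficient for a sequence and, as an explicitly assumed form, combines it with a Metropolis factor. The function names, the sign convention `delta_e = E_new - E_old`, and the way the scale factor `e_p` multiplies the change in the log-permutation count are illustrative assumptions and may differ from Eq. (4).

```python
import math
from collections import Counter


def log_permutations(sequence):
    """ln of the multinomial coefficient N! / (n_1! n_2! ...) for a sequence."""
    counts = Counter(sequence)
    n_total = len(sequence)
    # lgamma(n + 1) = ln(n!) avoids overflow for long chains.
    return math.lgamma(n_total + 1) - sum(
        math.lgamma(n + 1) for n in counts.values()
    )


def mutation_acceptance(delta_e, ln_np_old, ln_np_new, temperature=1.0, e_p=1.0):
    """Acceptance probability of a point mutation (assumed form, see lead-in).

    delta_e   : E_new - E_old for the target structure
    ln_np_old : ln of the permutation count before the mutation
    ln_np_new : ln of the permutation count after the mutation
    e_p       : hypothetical weight of the heterogeneity term
    """
    log_weight = -delta_e / temperature + e_p * (ln_np_new - ln_np_old)
    # Clamp at zero so the probability never exceeds 1 and exp cannot overflow.
    return math.exp(min(0.0, log_weight))


if __name__ == "__main__":
    print(log_permutations("AAAA"))  # homopolymer: a single arrangement
    print(log_permutations("AABB"))  # 4!/(2! 2!) = 6 arrangements
```

In this sketch a homopolymer has a log-permutation count of zero, so any mutation that increases compositional diversity is favored by the second term, which is the qualitative behavior described in the text.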
(6) |
and generate the two-dimensional normalized histogram of the distribution of the pair ( and ) collected over the ensemble of the generated sequences. For further study we chose a small number of sequences with low Landau free energy; i.e., ensembles of sequences that have a reasonably low energy and a high probability of being observed. The rationale for this choice is that such sequences are robust against point mutations, which are correlated with the overall thermodynamic stability ([31], [32]; see also [33]). Our criterion can be understood with a simple argument in the mean-field approximation, where we consider only short range correlations between the amino acids in the chains. In these conditions point mutations are equivalent to small structural distortions, as both perturbations only have a local effect. Hence, proteins that are robust against point mutations are most probably resistant to small deformations induced by thermal fluctuations.
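The selection step described above can be sketched as a two-dimensional histogram over the sampled sequences, converted into a free energy through the negative logarithm of the normalized counts. In this minimal sketch the pair of variables is assumed to be the energy and the logarithm of the number of permutations, the free energy is expressed in units of the design temperature, and the bin widths and function names are hypothetical.

```python
import math
from collections import Counter


def landau_free_energy(samples, e_bin=1.0, p_bin=1.0):
    """Map each occupied (energy, ln N_P) bin to F = -ln P.

    samples : iterable of (energy, ln_permutations) pairs, one per sequence
    e_bin, p_bin : hypothetical bin widths along the two axes
    """
    counts = Counter(
        (math.floor(e / e_bin), math.floor(p / p_bin)) for e, p in samples
    )
    total = sum(counts.values())
    return {key: -math.log(n / total) for key, n in counts.items()}


def lowest_free_energy_bin(free_energy):
    """Return the bin index with the lowest Landau free energy."""
    return min(free_energy, key=free_energy.get)


if __name__ == "__main__":
    sampled = [(-10.2, 30.5)] * 3 + [(-12.7, 28.1)]
    surface = landau_free_energy(sampled)
    print(lowest_free_energy_bin(surface), surface)
```

In the toy data the lowest-energy sequence is not in the most populated bin, which mirrors the distinction drawn in the text between low energy and low Landau free energy.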
For each selected sequence, we computed the free energy as a function of the order parameter , where is defined by
(7) |
where denotes a normalized histogram of the number of sampled conformations with order parameter , and is the distance root mean square difference (DRMSD) from the native structure. In practice, a direct calculation of this histogram is not efficient, since even the caterpillar model tends to become trapped in local minima, especially at low temperatures. To induce escape from these local minima, we made use of the Virtual Move Parallel Tempering Monte Carlo sampling scheme proposed by Coluzza and Frenkel [34], based on the Waste Recycling approach [35]. This scheme is very efficient in sampling both high and low free energy states (see supplementary information). We find that on 4 quad-core dual Xeon (Harpertown) compute nodes the calculation of as a function of for a single sequence requires 336 hours of CPU time, while generation of the sequences requires only 2 hours of CPU time.
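The order parameter and the resulting free energy profile can be illustrated with a short sketch. It assumes the usual definition of the DRMSD as the root mean square difference between all pairwise distances of a conformation and those of the native structure, and it builds the profile from a plain histogram rather than from the virtual-move parallel tempering estimator used in this work. Function names and the bin width are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def drmsd(coords, native):
    """Distance root mean square difference between two conformations.

    coords, native : sequences of (x, y, z) tuples with the same length
    """
    if len(coords) != len(native) or len(native) < 2:
        raise ValueError("need two conformations of equal length >= 2")
    pairs = list(combinations(range(len(native)), 2))
    total = sum(
        (math.dist(coords[i], coords[j]) - math.dist(native[i], native[j])) ** 2
        for i, j in pairs
    )
    return math.sqrt(total / len(pairs))


def free_energy_profile(drmsd_values, bin_width=0.5):
    """Return {bin lower edge: -ln P} from sampled DRMSD values."""
    counts = Counter(math.floor(q / bin_width) for q in drmsd_values)
    total = sum(counts.values())
    return {k * bin_width: -math.log(n / total) for k, n in sorted(counts.items())}


if __name__ == "__main__":
    native = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
    shifted = [(x + 1.0, y + 2.0, z + 3.0) for x, y, z in native]
    print(drmsd(shifted, native))  # rigid translations leave DRMSD unchanged
    print(free_energy_profile([0.1, 0.2, 0.3, 1.2]))
```

Because the DRMSD depends only on internal distances, no structural superposition is required, which makes it convenient as an order parameter during sampling.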
We used the native conformations of four representative proteins as target structures (see Fig. 2): the B1 immunoglobulin-binding domain of streptococcal protein G (PDB ID 1PGB), the C-terminal domain of the ribosomal protein L7/L12 of E. coli (PDB ID 1CTF), a putative lipoprotein from Pseudomonas syringae (Gene Locus PSPTO2350, PDB ID 2K57), and the UBA domain of Tap/NXF1 (PDB ID 1OAI).
Results
The Landau free energy diagram for protein 1CTF, which we studied in detail, is shown in Fig. 3. As is evident from the diagram, the lowest energy sequences and the lowest Landau free energy are not directly correlated; i.e., there are numerous very low energy structures with sequences that have a low probability of being observed. We then calculated the free energy as a function of the DRMSD from the native structure (Eq. (7)) for five selected low free energy, high heterogeneity sequences of protein 1CTF; they correspond to the point indicated by the arrow labeled “LowF” in Fig. 3. Figure 4 shows the free energy surfaces for these proteins at a low temperature, where the proteins are stable with the present energy function. Each is a relatively smooth surface with the minimum of the free energy at a DRMSD in the range to ; the breadth of the surface can be argued to reflect the structural fluctuations present in the native state. Because of the definition of DRMSD, structures that are long lived would appear as free energy minima at high values of DRMSD. Hence, the smoothness of the free energy profiles in Fig. 4 indicates that the folding process of our artificial sequences occurs spontaneously with no long-lived metastable states. An important result is that in the low temperature simulations, the free energy surface shows no misfolded states with free energies below that of the target structure. It is important to note that, in order to obtain a single free energy minimum, we did not explicitly require the design process to disfavor particular conformations of the chain. Similar results for the other three systems are given in the supplementary information (Figure S4), and overall we obtain a structure prediction precision between and in DRMSD ( and in RMSD [40]) as shown in Fig. 2.
We tested whether the sequence selection mechanism, based on the Landau free energy, performed better than simply taking a low energy sequence, as was done previously for lattice proteins [27]. Figure 4 also shows the results for a sequence selected for its low energy (LowE) with a relatively high number of permutations; see Fig. 3. In order to show how important it is to select sequences from the most probable ensemble, we chose the LowE sequence not too far from the global sequence free energy minimum. Nevertheless, the folding of “LowE” is significantly less reliable than that of the LowF sequences, as the equilibrium configuration of LowE differs dramatically from the native structure.
We finally introduced the solvation term of Eq. (3). With this term included, we repeated the design procedure for L7/L12 and the refolding of the natural sequence of L7/L12 as taken from the PDB (1CTF). We set . In Fig. 5 we plot the hydrophobic/philic profile (HP) of the protein 1CTF designed with and without the solvation term of Eq. (3). The first important observation is that even with our ranged potential (Eq. (2)), we are able to distinguish between buried amino acids and surface residues, as is demonstrated by the large variation in the number of contacts (top frame). Moreover, the HP profiles averaged over the sequences designed with the solvation term (W.S.) follow the contact profile much better than the profile of sequences designed without the solvation term (Wo.S.), indicating that our “artificial” proteins have a hydrophobic core surrounded by hydrophilic amino acids, as expected for molecules that live in aqueous solutions [36]. Finally, we compared the artificial HP profiles to the average profile obtained from the Pfam alignment data (PF00542) for protein 1CTF. The curve for W.S. sequences is qualitatively comparable to that of the real proteins; the discrepancies (between residues 20 and 30 and around residue 45) occur in regions where the wild type proteins express hydrophobic residues even though they are highly exposed to the solvent, which could be the result of functionalities that we did not include in the design procedure. At this point it is natural to ask whether the caterpillar model, with the solvation term, is able to reproduce the folded structures of real proteins, since we have shown that designed sequences refold to the target structure and that the design now produces protein-like sequences. In Fig. 4 we plot the folding free energy profile of the natural sequence of protein L7/L12. The profile is qualitatively similar to the one obtained from the folding of the artificial sequences, and the distance of the global free energy minimum from the X-ray structure is still small (1.6 Å DRMSD, 3.4 Å RMSD). The quality is again striking considering that the only parameters we had to adjust in the model are the range of the potential, the scaling factor of the interaction and the threshold of the solvation term.
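The hydrophobic/philic profiles discussed above are position-wise averages of a hydrophobicity index over a set of sequences of equal length. A minimal sketch of such an average is given below. It assumes the Kyte-Doolittle hydropathy scale as the index and omits the rescaling applied in Eq. (3); the function name and the toy sequences are hypothetical and are not data from this work.

```python
# Kyte-Doolittle hydropathy values (one-letter amino acid codes).
KYTE_DOOLITTLE = {
    "I": 4.5, "V": 4.2, "L": 3.8, "F": 2.8, "C": 2.5,
    "M": 1.9, "A": 1.8, "G": -0.4, "T": -0.7, "S": -0.8,
    "W": -0.9, "Y": -1.3, "P": -1.6, "H": -3.2, "E": -3.5,
    "Q": -3.5, "D": -3.5, "N": -3.5, "K": -3.9, "R": -4.5,
}


def hp_profile(sequences):
    """Average hydropathy at each position over equal-length sequences."""
    if not sequences:
        raise ValueError("at least one sequence is required")
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("all sequences must have the same length")
    return [
        sum(KYTE_DOOLITTLE[seq[i]] for seq in sequences) / len(sequences)
        for i in range(length)
    ]


if __name__ == "__main__":
    # Toy example: first position hydrophobic, second position charged.
    print(hp_profile(["IK", "IR"]))
```

A profile of this kind can be compared position by position with the number of native contacts per residue to check whether hydrophobic residues are preferentially buried.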
We conclude that a carefully tuned external field can produce a protein-like hydrophobicity profile (Fig. 5) and closely predict the native structure of the real sequence (Fig. 4). The proposed framework is thus fully self-consistent, since the design procedure is able to produce natural-like sequences, while the folding properties of the caterpillar model are compatible with the folding of real natural sequences. Moreover, the estimation of all free parameters is based on the condition that designed sequences must refold into their respective target structures, and as a result we recover fundamental properties of real proteins that we did not impose on the system. The model and methodology are therefore shown to be an important step forward in bridging the crucial gap between a coarse-grained representation and a fully atomistic description of proteins.
Discussion
In this work we introduce a fundamental criterion for the designability of coarse-grained models of proteins. With the caterpillar model we are able to design protein sequences for various proteins representative of the typical combinations of protein secondary structures. Each of the tested sequences reached the target structure with very high precision considering the simplicity of the model, demonstrating that the procedure is universal for proteins with different proportions of alpha helices and beta sheets. With our model we could characterize in detail the free energy of the folding process, and we showed that each of the free energy landscapes has a global free energy minimum near the target structure. Moreover, the landscapes are relatively smooth, indicating that our designed proteins can spontaneously fold without remaining trapped for a long time in metastable states.
The caterpillar model provides strong evidence to support our hypothesis that a minimum number of constraints is necessary in order to successfully perform protein design. By applying an accurate representation of the backbone we demonstrated that design and folding of real proteins are possible to a degree of accuracy that could not have been anticipated given the level of coarse-graining applied. To the best of our knowledge, a direct analysis of the importance of constraints for the design of protein-like structures has never been done before. Our results thus not only extend protein design beyond lattice proteins, but also further extend the important work of Maritan and co-workers [17], [18], [19] on the tube model. With the tube model, the authors showed that the protein structure universe is largely determined by the particular geometry imposed by the backbone, independently of the accuracy used to represent the amino acid pair interactions. With the caterpillar model we not only verify the results of Maritan and co-workers, but also extend the function of the backbone geometry to the crucial role of enforcing the minimal set of constraints responsible for the protein design property.
It is important to stress that the three free parameters of the model have been adjusted only on the refolding ability of the designed sequences and that, as a result, the artificial sequences resemble real proteins in their hydrophilic/phobic profiles, and the folding of real sequences predicts the correct native structure with surprisingly high accuracy. This last result suggests that it is possible to determine a universal set of parameter values valid for all proteins, which we intend to make the focus of further investigation. Moreover, given its computational efficiency, we anticipate that the caterpillar model will be useful for studying other important aspects of protein behaviour such as folding, misfolding and aggregation, especially since, thanks to the high detail of the backbone, the results of our model can be easily integrated into full atomistic simulations by adding the side chains of each amino acid.
Supporting Information
Acknowledgments
The author acknowledges the help provided by Prof M. Vendruscolo with enlightening discussions, and numerous suggestions from Prof A. Cacciuto, Dr B. Capone, Dr M. Miller and Prof C. Dellago to increase the quality of the manuscript. The figures were made with VMD. VMD is developed with NIH support by the Theoretical and Computational Biophysics group at the Beckman Institute, University of Illinois at Urbana-Champaign [37].
Footnotes
Competing Interests: The author has declared that no competing interests exist.
Funding: The author acknowledges financial support by the Austrian Science Foundation (FWF) within the SFB ViCoM (F 41), and Intra-European mobility grant FP6 Project No.: 24496. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
References
- 1. Fersht A. New York: W. H. Freeman & Co; 1999. Structure and Mechanism in Protein Science.
- 2. Anfinsen CB. Principles that govern folding of protein chains. Science. 1973;181:223–230. doi: 10.1126/science.181.4096.223.
- 3. Dobson C, Šali A, Karplus M. Protein folding: A perspective from theory and experiment. Angewandte Chemie International Edition. 1998;37:868–893. doi: 10.1002/(SICI)1521-3773(19980420)37:7<868::AID-ANIE868>3.0.CO;2-H.
- 4. Dunker AK, Brown C, Obradovic Z. Identification and functions of usefully disordered proteins. Advances in Protein Chemistry Volume. 2002;62:25–49. doi: 10.1016/s0065-3233(02)62004-2.
- 5. Fersht A, Daggett V. Protein folding and unfolding at atomic resolution. Cell. 2002;108:573–582. doi: 10.1016/s0092-8674(02)00620-7.
- 6. Karplus M, Kuriyan J. Molecular dynamics and protein function. P Natl Acad Sci USA. 2005;102:6679–6685. doi: 10.1073/pnas.0408930102.
- 7. Das R, Baker D. Macromolecular modeling with rosetta. Annu Rev Biochem. 2008;77:363–382. doi: 10.1146/annurev.biochem.77.062906.171838.
- 8. Mirny L, Shakhnovich E. Evolutionary conservation of the folding nucleus. J of Mol Bio. 2001;308:123–129. doi: 10.1006/jmbi.2001.4602.
- 9. Onuchic JN, Wolynes PG. Theory of protein folding. Curr Opin Struc Biol. 2004;14:70–75. doi: 10.1016/j.sbi.2004.01.009.
- 10. Tozzini V. Coarse-grained models for proteins. Curr Opin Struc Biol. 2005;15:144–150. doi: 10.1016/j.sbi.2005.02.005.
- 11. Go N, Taketomi H. Respective roles of short-range and long-range interactions in protein folding. P Natl Acad Sci USA. 1978;75:559–563. doi: 10.1073/pnas.75.2.559.
- 12. Bryngelson JD, Wolynes PG. Spin-glasses and the statistical-mechanics of protein folding. P Natl Acad Sci USA. 1987;84:7524–7528. doi: 10.1073/pnas.84.21.7524.
- 13. Coluzza I, Muller H, Frenkel D. Designing refoldable model molecules. Phys Rev E. 2003;68:046703. doi: 10.1103/PhysRevE.68.046703.
- 14. Coluzza I, Frenkel D. Monte carlo study of substrate-induced folding and refolding of lattice proteins. Biophys J. 2007;92:1150–1156. doi: 10.1529/biophysj.106.084236.
- 15. Coluzza I, Frenkel D. Designing specificity of protein-substrate interactions. Phys Rev E. 2004;70:051917. doi: 10.1103/PhysRevE.70.051917.
- 16. Rothlisberger D, Khersonsky O, Wollacott AM, Jiang L, Dechancie J, et al. Kemp elimination catalysts by computational enzyme design. Nature. 2008;453:190–U4. doi: 10.1038/nature06879.
- 17. Maritan A, Micheletti C, Trovato A, Banavar JR. Optimal shapes of compact strings. Nature. 2000;406:287–290. doi: 10.1038/35018538.
- 18. Hoang T, Trovato A, Seno F, Banavar J, Maritan A. Geometry and symmetry presculpt the free-energy landscape of proteins. P Natl Acad Sci USA. 2004;101:7960–7964. doi: 10.1073/pnas.0402525101.
- 19. Magee JE, Victor V, Lue L. Helical structures from an isotropic homopolymer model. Phys Rev Lett. 2006;96:207802. doi: 10.1103/PhysRevLett.96.207802.
- 20. Hoang T, Marsella L, Trovato A, Seno F, Banavar J, et al. Common attributes of native-state structures of proteins, disordered proteins, and amyloid. P Natl Acad Sci USA. 2006;103:6883–6888. doi: 10.1073/pnas.0601824103.
- 22. Berman H, Westbrook J, Feng Z, Gilliland G, Bhat T, et al. Announcing the worldwide protein data bank. Nuc Acid Res. 2000;28:235–242. doi: 10.1093/nar/28.1.235.
- 23. Creighton . W.H.Freeman & Co Ltd 2nd Revised edition; 1992. Proteins.
- 24. Betancourt M, Thirumalai D. Pair potentials for protein folding: Choice of reference states and sensitivity of predicted native states to variations in the interaction schemes. Protein Science. 1999;8:361–369. doi: 10.1110/ps.8.2.361.
- 25. Irbäck A, Sjunnesson F, Wallin S. Three-helix-bundle protein in a ramachandran model. P Natl Acad Sci USA. 2000;97:13614–13618. doi: 10.1073/pnas.240245297.
- 26. Dolittle R. Springer; 1989. In Predictions of Protein Structure and the Principles of Protein Conformation. pp. 599–623.
- 27. Shakhnovich E, Gutin A. Engineering of stable and fast-folding sequences of model proteins. P Natl Acad Sci USA. 1993;90:7195–7199. doi: 10.1073/pnas.90.15.7195.
- 28. Deutsch J, Kurosky T. New algorithm for protein design. Phys Rev Lett. 1996;76:323–326. doi: 10.1103/PhysRevLett.76.323.
- 29. Seno F, Trovato A, Banavar JR, Maritan A. Maximum entropy approach for deducing amino acid interactions in proteins. Phys Rev Lett. 2008;100:1–4. doi: 10.1103/PhysRevLett.100.078102.
- 30. Frenkel D, Smit B. Academic Press; 2002. Understanding Molecular Simulations.389
- 31. Vendruscolo M. Modified configurational bias monte carlo method for simulation of polymer systems. J Chem Phys. 1997;106:2970–2976.
- 32. Wilke C, Bloom J, Drummond D, Raval A. Predicting the tolerance of proteins to random amino acid substitution. Biophys J. 2005;89:3714–3720. doi: 10.1529/biophysj.105.062125.
- 33. Gutin A, Shakhnovich E. Ground-state of random copolymers and the discrete random energy-model. J Chem Phys. 1993;98:8174–8177.
- 34. Coluzza I, Frenkel D. Virtual-move parallel tempering. ChemPhysChem. 2005;6:1779–1783. doi: 10.1002/cphc.200400629.
- 35. Frenkel D. Speed-up of monte carlo simulations by sampling of rejected states. P Natl Acad Sci USA. 2004;101:17571–17575. doi: 10.1073/pnas.0407950101.
- 36. Abeln S, Frenkel D. Accounting for protein-solvent contacts facilitates design of non- aggregating lattice proteins. Biophys J. 2011;100:693–700. doi: 10.1016/j.bpj.2010.11.088.
- 37. Humphrey W, Dalke A, Schulten K. Vmd - visual molecular dynamics. J Molec Graphics. 1996;14:33–38. doi: 10.1016/0263-7855(96)00018-5.
- 38. Protein Data Bank website. Available: www.pdb.org. Accessed 2011 May 20.
- 39. See supplementary information about the methods.
- 40. The values of RMSD have been calculated using “RMSD Calculator” in VMD excluding the first and the last 5 residues.