PLOS Computational Biology. 2012 Apr 26;8(4):e1002501. doi: 10.1371/journal.pcbi.1002501

Are Long-Range Structural Correlations Behind the Aggregation Phenomena of Polyglutamine Diseases?

Mahmoud Moradi, Volodymyr Babin, Christopher Roland, Celeste Sagui*
Editor: James M Briggs
PMCID: PMC3343152  PMID: 22577357

Abstract

We have characterized the conformational ensembles of polyglutamine Inline graphic peptides of various lengths Inline graphic (ranging from Inline graphic to Inline graphic), both with and without the presence of a C-terminal polyproline hexapeptide. For this, we used state-of-the-art molecular dynamics simulations combined with a novel statistical analysis to characterize the various properties of the backbone dihedral angles and secondary structural motifs of the glutamine residues. For Inline graphic (i.e., just above the pathological length Inline graphic for Huntington's disease), the equilibrium conformations of the monomer consist primarily of disordered, compact structures with non-negligible Inline graphic-helical and turn content. We also observed a relatively small population of extended structures suitable for forming aggregates including Inline graphic- and Inline graphic-strands, and Inline graphic- and Inline graphic-hairpins. Most importantly, for Inline graphic we find that there exists a long-range correlation (ranging for at least Inline graphic residues) among the backbone dihedral angles of the Q residues. For polyglutamine peptides below the pathological length, the population of the extended strands and hairpins is considerably smaller, and the correlations are short-range (at most Inline graphic residues apart). Adding a C-terminal hexaproline to Inline graphic suppresses both the population of these rare motifs and the long-range correlation of the dihedral angles. We argue that the long-range correlation of the polyglutamine homopeptide, along with the presence of these rare motifs, could be responsible for its aggregation phenomena.

Author Summary

Nine neurodegenerative diseases are caused by polyglutamine (polyQ) expansions greater than a given threshold in proteins with little or no homology except for the polyQ regions. The diseases all share a common feature: the formation of polyQ aggregates and eventual neuronal death. Using molecular dynamics simulations, we have explored the conformations of polyQ peptides. For Inline graphic peptides (i.e., just above the pathological length for Huntington's disease), the equilibrium conformations consist primarily of disordered, compact structures with a non-negligible Inline graphic-helical and turn content. We also observed a small population of extended structures suitable for forming aggregates. For peptides below the pathological length, the population of these structures was found to be considerably lower. For longer Inline graphic peptides, we found evidence for long-range correlations among the dihedral angles. This correlation turns out to be short-range for the smaller polyQ peptides, and is suppressed (along with the extended structural motifs) when a C-terminal polyproline tail is added to the peptides. We believe that the existence of these long-range correlations in above-threshold polyQ peptides, along with the presence of rare motifs, could be responsible for the experimentally observed aggregation phenomena associated with polyQ diseases.

Introduction

Polyglutamine (polyQ) diseases involve a set of nine late-onset progressive neurodegenerative diseases caused by the expansion of CAG triplet sequence repeats [1]. These repeats result in the transcription of proteins with abnormally long polyQ inserts. When these inserts expand beyond a normal repeat length, the affected proteins form toxic aggregates [2] leading to neuronal death. PolyQ aggregation takes place through a complex multistage process involving transient and metastable structures that occur before, or simultaneously with, fibril formation [3]–[9]. Experimental findings suggest that the therapeutic target for polyQ diseases should be the soluble oligomeric intermediates, or the conformational transitions that lead to them [9], [10], and not the insoluble ordered fibrils. These findings, common to all amyloid diseases [11], have spurred efforts to understand the structural attributes of soluble oligomers and amyloidogenic precursors.

The free energy landscapes of polyQ aggregates display countless minima of similar depth that correspond to a great variety of metastable and/or glassy states. The aggregation kinetics of pure polyQ have been described as a nucleation-growth polymerization process [4]–[6], [12], where soluble expanded glutamine requires a considerable time lag for the creation of a critical nucleus, which then readily converts into a sheet in the presence of a template [13]. However, the “time lag” seems to be properly associated with the formation of the fully aggregated precipitates, since soluble aggregates (sometimes called “protofibrils”) that form during the putative lag phase have been reported [14], [15]. The variety of polyQ soluble and insoluble aggregates might correlate with the conformational flexibility of monomeric (non-aggregate single-chain) polyQ regions, which are influenced by the conformations of neighboring protein regions [4], [16]–[18]. One striking example of this conformational wealth, and still a source of controversy, is given by the polyQ expansion in the N-terminal of the huntingtin protein that is encoded in exon 1 (EX1) of the gene. The N-terminal amino acid sequence consists of a seventeen-residue mixed sequence, the polyQ region of variable length, two polyproline regions of 11 and 10 residues separated by a region of mixed residues, and a C-terminal sequence. Toxicity develops after the polyQ expansion exceeds a threshold of approximately 36 repeats, leading to Huntington's disease. The flanking sequences have been shown to play a structural role in polyQ sequences, both in synthetic and natural peptides, and both in monomeric and aggregate form [4], [16], [17], [19]. In particular, a polyproline (polyP) region immediately adjacent to the C-terminal of a polyQ region has been shown to affect the conformation of the polyQ region; the resulting conformations depend on the lengths of both the polyQ and polyP sequences [16], [17], [20], [21].

In this work, we set out to obtain a conceptual and quantitative understanding of the role played by a polyP sequence that is placed at the C-terminal of a polyQ peptide, which is relevant for the understanding of the behavior of the EX1 segment in the huntingtin protein. Sedimentation aggregation kinetics experiments [17] show that the introduction of a Inline graphic sequence C-terminal to polyQ in synthetic peptides decreases both the rate of formation and the apparent stability of the associated aggregates. The polyP sequence can be trimmed to Inline graphic without altering the suppression effect, but a Inline graphic sequence is ineffective. There are no effects when the polyP sequences are attached to the N-terminal or via a side-chain tether [17]. These experiments were complemented with CD spectra for monomeric peptides, where the presence of polyP at the C-terminal of Inline graphic showed remarkable changes in the spectra. Analysis of their data led the authors to propose that addition of the C-terminal Inline graphic sequence does not alter the aggregation mechanism, which is nucleated growth by monomer addition with a critical nucleus of 1 monomer (for Inline graphic), but destabilizes both the Inline graphic-helical and the (still unknown) aggregation-competent conformations of the monomer. These experimental results were unexpected: although a single proline residue interrupting an amyloidogenic sequence can decrease the propensity of that sequence to aggregate [22], [23], Pro replacements in amyloidogenic sequences placed in turns or disordered regions do not alter the aggregate core [23].

Here, we consider monomeric polyQ and polyQ-polyP chains, and quantify changes brought about in the conformations of the polyQ sequences by the addition of the polyP sequences at their C-terminal. In order to assess these changes, one must first characterize the conformation of pure monomeric polyQ in water. Wildly diverse conformations have been postulated experimentally for monomeric polyQ, including a totally random coil, Inline graphic-sheet, Inline graphic-helix, and PPII structures. At present there is growing experimental evidence that single polyQ chains are mainly disordered [6], [13]–[15]. The solvated polyQ disorder, however, is different from a total random coil or a protein denatured state. In particular, atomic X-ray experiments [18] show that single chains of polyQ (in the presence of flanking sequences) present isolated elements of Inline graphic-helix, random coil and extended loop. Single-molecule force-clamp techniques were used to probe the mechanical behavior of polyQ chains of varying lengths spanning normal and diseased polyQ expansions [24]. Under the application of force, no extension was observed for any of the polyQ constructs. Further analysis led the authors to propose that polyQ chains collapse to form a heterogeneous ensemble of globular conformations that are mechanically stable.

Simulation results for the monomer conformation have also been contradictory [25]–[31]. It is interesting that in the search for soluble prefibrillar intermediates, an Inline graphic-sheet was proposed to play a role in polyQ toxicity [32], [33]. In these molecular dynamics simulations, polyQ monomers of various lengths were found to display transient Inline graphic-strands of four residues or less. The authors proposed that fibril formation in polyQ may proceed through Inline graphic strand intermediates [33]. More recently, a molecular dynamics study of hexamers of Inline graphic in explicit water showed that Inline graphic-sheet aggregates are very stable (more stable than Inline graphic-sheets) [34]. These results strongly support the idea that Inline graphic-sheet may be a stable, a metastable, or at least a long-lived transient secondary structure of polyQ aggregates. Coming back to the monomeric polyQ conformation, further simulation evidence [35]–[38] supports the experimental findings that monomeric polyglutamine of various lengths is a disordered statistical coil in solution. The disorder is inherently different from that of denatured proteins and the average compactness and magnitude of conformational fluctuations increase with chain length [35]. In addition, the coils may present considerable Inline graphic-helical content [38], but there are acute entropic bottlenecks for the formation of Inline graphic-sheets.

The molecular dynamics results presented here for single polyQ and polyQ-polyP chains consisting of Inline graphic, Inline graphic, Inline graphic, Inline graphic, Inline graphic, and Inline graphic glutamine residues are in qualitative agreement with the experimental and simulation results mentioned above: polyQ is primarily disordered, with non-negligible Inline graphic-helical content and a small population of other secondary structures including both Inline graphic and Inline graphic strands. The addition of polyP reduces the population of the Inline graphic region of the Ramachandran plot [39], and increases the population of Inline graphic and PPII Ramachandran regions for all polyQ lengths. If one considers secondary structure motifs (i.e., hydrogen-bond patterns in addition to dihedral angles), the addition of the polyP segment increases the populations of the PPII helices and turns, and decreases the Inline graphic-helical content of all peptides but Inline graphic (which may have a protective effect against aggregation, as discussed later). The addition of polyP does not change the average radius of gyration of polyQ, but changes the radius of gyration distribution function for Inline graphic, which becomes dependent on the prolyl bond isomerization state. Most importantly, the addition of polyP decreases the population of small Inline graphic and Inline graphic strands, and Inline graphic and Inline graphic hairpins.

Since the extended strands and hairpins in both Inline graphic and Inline graphic forms are found only in a small fraction of the structures, we used a novel statistical measure based on the odds ratio construction [40] to study the secondary structural propensities [41], [42], thereby learning about the possibility of the growth of such secondary structures under nucleation conditions. This study, also supported by more conventional linear correlation analysis, provides evidence that among all the peptides studied here, only Inline graphic exhibits a long-range correlation between all glutamine residue pairs that favors formation of both Inline graphic and Inline graphic-strands. This correlation is suppressed by the addition of only six proline residues to the C-terminal of the peptide, which suggests a mechanism in which nucleation starts at these scarcely populated secondary structures (mainly Inline graphic, Inline graphic, Inline graphic and Inline graphic strands, as well as Inline graphic-hairpins and Inline graphic-hairpins) and can only spread through positive correlations in polyQ peptides of approximately 40 residues or longer.

This paper is organized as follows. The Methods section details our simulation methodology and analysis. Specifically, we discuss the generalized Replica Exchange scheme used here for enhanced sampling, the simulation details, our clustering techniques to identify the Ramachandran regions and the secondary structural motifs, and the odds ratio construction, used here to study the correlations between residues. In the Results section, we present our results with a focus on a statistical analysis of the equilibrium conformations based on (i) Ramachandran regions (ii) secondary structure (iii) correlation analysis and (iv) radius of gyration. A discussion of our results and a short summary of this work is given in the last section.

Methods

In this section, we briefly describe the generalized replica exchange molecular dynamics [41]–[44] approach used to generate the equilibrium conformations. In addition, we describe our quantification of the secondary structural content, and review the odds ratio [40] construction for correlations between residues. For a more detailed description of our simulation methods and the clustering approach used to classify the secondary structure motifs of the peptides, please see the Supporting Information section.

Sampling Protocol

Room temperature, regular molecular dynamics (MD) simulations are often too computationally limited to carry out a full sampling of the conformational space of a biomolecular system and generate a reliable statistical ensemble. Thus, in order to deal with the sampling issue, we make use of a replica exchange scheme [43], [45]. In the replica exchange molecular dynamics (REMD) [43], [46] method, one considers several replicas of a system subject to some sort of ergodic dynamics based on different Hamiltonians, and attempts to exchange the trajectories of these replicas at a predetermined rate to increase the barrier crossing rates (i.e., decrease the ergodic time scale). One possibility is to successively increase the temperatures of the replicas [46]. This method, known as parallel tempering, is here referred to as Temperature REMD (T-REMD). Another possibility [43] is to construct the replicas by adding a biasing potential to the original Hamiltonian that acts on some collective variable that describes the slow modes of the system that need “acceleration”. This method can be referred to as Hamiltonian REMD (H-REMD). In practice, T-REMD is used to promote the barrier crossing events in a generic way, but the use of H-REMD allows one to directly focus on specific slow modes of the system, such as the cis-trans isomerization of proline amino acids, which involves a barrier of 10 to 20 kcal/mol [47]. A combination of the two methods, known as Hamiltonian-Temperature REMD (HT-REMD) [41]–[44], provides a practical way to reduce the computational costs associated with REMD sampling, since it facilitates the sampling by both means.

In this work, we used the T-REMD and HT-REMD methods for polyQ and polyQ-polyP peptides, respectively. In the T-REMD method, one replica runs at room temperature and the rest of the replicas run at higher temperatures. Care must be taken with respect to the choice of the number of replicas and their temperatures. The performance of the setting can be checked by monitoring the exchange rate between the neighboring replicas (i.e., with closest temperatures) as well as the ergodic time scale of the “hottest” replica. The equilibrium conformational ensemble is then generated by taking the structures at a predetermined rate from the trajectory of the replica at the lowest (room) temperature.
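The exchange step between two temperature replicas can be sketched with the standard Metropolis criterion for parallel tempering. This is a minimal illustration and not the implementation used for the simulations reported here; the function names and the example energies are hypothetical.

```python
import math
import random

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol*K)


def exchange_probability(energy_i, temp_i, energy_j, temp_j):
    """Metropolis acceptance probability for swapping two temperature replicas.

    Standard parallel tempering rule: min(1, exp[(beta_i - beta_j)(E_i - E_j)]).
    Energies are potential energies in kcal/mol, temperatures in kelvin.
    """
    beta_i = 1.0 / (K_B * temp_i)
    beta_j = 1.0 / (K_B * temp_j)
    delta = (beta_i - beta_j) * (energy_i - energy_j)
    return 1.0 if delta >= 0.0 else math.exp(delta)


def attempt_exchange(energy_i, temp_i, energy_j, temp_j, rng=random):
    """Return True if the swap between the two replicas is accepted."""
    p_accept = exchange_probability(energy_i, temp_i, energy_j, temp_j)
    return rng.random() < p_accept


# Hypothetical energies: the colder replica sits at the higher energy,
# so the swap is always accepted.
p_downhill = exchange_probability(-100.0, 300.0, -120.0, 320.0)
# Reverse situation: the swap is accepted only with some probability.
p_uphill = exchange_probability(-120.0, 300.0, -100.0, 320.0)
```

Monitoring the fraction of accepted swaps between neighboring temperatures is one way to check the replica spacing described above.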

In the HT-REMD method, the replicas have different biasing potentials. The biasing potential is usually described in terms of a collective variable Inline graphic, defined as a smooth function of the atomic positions Inline graphic. The corresponding free energy or potential of mean force (PMF) [48], Inline graphic (where the angular brackets denote the equilibrium ensemble average), provides an ideal biasing potential. Indeed, if the biasing potential is exactly Inline graphic, then the probabilities of different values of the collective variable would all be equal, since there are no barriers present. Although the true free energy Inline graphic is typically unknown in advance, a roughly approximate Inline graphic is often sufficient to improve the sampling considerably in an H-REMD or HT-REMD setting. Such free energies can be computed in a variety of ways [48]. For the polyQ-polyP systems, some of the slow modes originate in the cis-trans isomerization of the prolyl bonds, which occurs when polyproline is in solution. We have recently carried out extensive work on proline-rich systems [41], [42], [44], [47], [49] and can take advantage of the free energy profiles previously obtained for polyproline of various lengths [44], calculated using the Adaptively Biased Molecular Dynamics (ABMD) [50], [51] method. The ABMD method is an umbrella sampling method with a time-dependent biasing potential, which can be used in conjunction with the REMD protocol, by combining different collective variables and/or temperatures on a per-replica basis [43], [50]. The ABMD method is currently implemented in versions 10 and 11 of the AMBER simulation package [52]. Details of the calculation of the polyproline potentials are given elsewhere [41], [42], [44], [47].
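The flattening effect of an ideal bias can be illustrated on a discretized collective variable. The sketch below assumes the convention that the bias is added to the Hamiltonian and that the ideal bias equals the negative of the free energy profile; the five-point double-well profile and the `scale` parameter are hypothetical and serve only as an illustration.

```python
import math


def boltzmann(energies, kT):
    """Normalized Boltzmann weights exp(-E/kT) over a discrete grid."""
    shift = min(energies)  # subtract the minimum for numerical stability
    weights = [math.exp(-(e - shift) / kT) for e in energies]
    total = sum(weights)
    return [w / total for w in weights]


def biased_distribution(free_energy, kT, scale=1.0):
    """Distribution of the collective variable under a bias of -scale * F.

    scale = 0 is the unbiased case, scale = 1 the ideal bias, and
    intermediate values correspond to a scaled-down (reduced) bias.
    """
    effective = [f - scale * f for f in free_energy]
    return boltzmann(effective, kT)


# Hypothetical double-well free energy profile (kcal/mol) on five grid points.
profile = [0.0, 6.0, 12.0, 6.0, 1.0]
kT = 0.596  # approximately kT at 300 K in kcal/mol

unbiased = biased_distribution(profile, kT, scale=0.0)
half = biased_distribution(profile, kT, scale=0.5)
flat = biased_distribution(profile, kT, scale=1.0)
```

With the full bias every grid point is equally populated, while the barrier region is essentially never visited without it; a reduced bias gives an intermediate population.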

The HT-REMD simulations proceeded in several stages. We recycled the previously computed free energies associated with a collective variable that “captures” the cis-trans transitions of the prolyl bonds of polyproline peptides of different lengths in implicit water at different temperatures.

The collective variable used for these calculations is defined based on the backbone dihedral angle Inline graphic of prolyl bonds, Inline graphic (here the sum runs over all the prolyl bonds Inline graphic). The dihedral angle Inline graphic takes the values around Inline graphic and Inline graphic for cis and trans conformations, therefore Inline graphic can “capture” different patterns of the cis/trans conformations in any proline-containing peptide. The biasing potentials, transferred from our previous calculations, were then refined for the polyQ-polyP peptides using similar simulation settings. Next, several additional replicas running at the lowest temperature Inline graphic were introduced into the setup. One of these replicas is completely unbiased, and therefore samples the Boltzmann distribution at Inline graphic. The other replicas, also at Inline graphic, are subject to a reduced bias (i.e., these biasing potentials are scaled down by a constant factor). The purpose of these “proxy” replicas is to ensure adequate exchange rates between the conformations, and thereby enhance the mixing [43]. Data was then taken from the unbiased replica at a suitable, predetermined rate.
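As a rough illustration of how prolyl dihedral angles encode a cis/trans pattern, the sketch below labels each prolyl bond from its dihedral angle, using the textbook values of about 0° for cis and about 180° for trans. The 90° cutoff, the function names, and the example angles are assumptions made for this illustration; the sketch does not reproduce the collective variable defined above.

```python
def wrap_degrees(angle):
    """Map an angle in degrees onto the interval [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def isomer_pattern(omegas, cutoff=90.0):
    """Label each prolyl bond as cis ('c') or trans ('t').

    A bond is called cis when its dihedral angle lies within `cutoff`
    degrees of 0, and trans otherwise (assumed cutoff, not from the text).
    """
    labels = []
    for omega in omegas:
        is_cis = abs(wrap_degrees(omega)) < cutoff
        labels.append("c" if is_cis else "t")
    return "".join(labels)


# Hypothetical dihedral angles (degrees) for five prolyl bonds.
pattern = isomer_pattern([178.0, -175.0, 5.0, 182.0, -10.0])
```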

Simulation Details

Simulations were carried out for the peptides with sequence Inline graphic (denoted as Inline graphic) and Inline graphic (denoted as Inline graphic). These peptides include Inline graphic, Inline graphic, Inline graphic, Inline graphic, Inline graphic, Inline graphic, Inline graphic, Inline graphic, Inline graphic, Inline graphic, and Inline graphic. In each case, we refer to the Inline graphic glutamine and Inline graphic proline residues as Inline graphic and Inline graphic, respectively. The simulations were carried out using the AMBER [52] simulation package with the ff99SB version of the Cornell et al. force field [53] with an implicit water model based on the Generalized Born approximation (GB) [54], [55] including the surface area contributions computed using the LCPO model [56] (GB/SA). For more simulation details, our implementation of the REMD scheme, and a discussion of convergence issues, please see the Supporting Information (Text S1).

Secondary Structure

We used the (Inline graphic, Inline graphic) dihedral angles (see Fig. 1 for their definition) to identify different regions [57] of the Ramachandran map [39]. Table 1 provides the corresponding definition for these regions. Although this delineates clear regions for the dihedrals of most residues, it turns out that the populations may overlap around the borders. In order to handle this situation, we used a clustering technique as explained in the Supporting Information (Text S1) to classify the conformations, rather than strictly enforcing the sharp boundaries between the defined regions.

Figure 1. (a) Schematic of amino acid backbone dihedrals Inline graphic and Inline graphic, and (b) a corresponding Ramachandran plot.

In a typical Ramachandran plot of a glutamine residue, each pixel represents a Inline graphic bin, whose intensity represents its relative population, ranging from 1,2,Inline graphic,9, and 10 or more conformations, sampled in our simulations. Blue, yellow, grey, and pink clusters identify PPII, Inline graphic, Inline graphic, and Inline graphic regions, respectively.

Table 1. Secondary structure definitions.

Ramachandran regions
Inline graphic Inline graphic,Inline graphic
Inline graphic Inline graphic,Inline graphic
PPII Inline graphic,(Inline graphic or Inline graphic)
Inline graphic Inline graphic,(Inline graphic or Inline graphic) or Inline graphic,Inline graphic

For a detailed description see Methods.

Although the backbone dihedral angles of all the residues forming a right-handed Inline graphic-helix fall into the Inline graphic region of the Ramachandran map, many of the residues in this region do not actually form Inline graphic-helices. As a matter of fact, several other secondary structural motifs, such as Inline graphic and Inline graphic helices as well as random coil and turn, are characterized by or may involve backbone dihedral angles falling in the same region. An interesting example is provided by polyglutamine itself. It has been suggested recently [32]–[34] that an Inline graphic-sheet, whose backbone dihedral angles alternate between the Inline graphic and Inline graphic helical regions, can be a stable, metastable, or at least a long-lived transient secondary structure in oligomers.

In general, for a residue to be considered to belong to a given secondary structure, it is not enough to identify the Ramachandran region of its dihedral angles. Thus, we used the secondary structure assignment program DSSP [58], [59], which uses not only the backbone dihedral angles, but also the inter-residual hydrogen bonding as well as the relative position of the CInline graphic atoms to identify secondary structural motifs. For our peptides, the DSSP secondary structures with highest probabilities were: (i) helices, including Inline graphic and Inline graphic types, (ii) turns, including H-bonded turns and bends, (iii) coils. There are also isolated residues involved in Inline graphic bridges and extended strands, participating in the Inline graphic ladders with small probabilities. Since DSSP does not specifically identify isolated Inline graphic or Inline graphic strands (i.e., strands not H-bonded to another strand of their type) or Inline graphic hairpins, we used a combination of H-bonding results from DSSP analysis and the Ramachandran regions from the clustering analysis to define Inline graphic and Inline graphic strands and hairpins. A Inline graphic strand is defined here as at least Inline graphic adjacent residues all falling into the Inline graphic region of the Ramachandran plot. A Inline graphic strand is referred to as isolated if none of its Inline graphic residues is H-bonded. A Inline graphic hairpin is defined as two adjacent Inline graphic strands with a turn in between and at least one H-bond between the two strands. The turn between the two strands of a hairpin may or may not be H-bonded and can be of any length, but it has to have the geometrical form of a turn (i.e., be identified as a bend by DSSP). Each of the two strands has at least three adjacent residues in the Inline graphic region to ensure the structure is relatively extended. At least one of these three Inline graphic residues is H-bonded to another Inline graphic residue in the other strand.

We define an Inline graphic repeat as two adjacent residues, whose backbone dihedral angles alternate between Inline graphic and Inline graphic regardless of the order (i.e., this includes both Inline graphic and Inline graphic). An Inline graphic strand is formed from Inline graphic adjacent residues, involving Inline graphic alternating Inline graphic and Inline graphic repeats. In this definition, an Inline graphic strand is either Inline graphic or Inline graphic and an Inline graphic strand is either Inline graphic or Inline graphic but not Inline graphic. An isolated Inline graphic strand is defined as an Inline graphic strand not H-bonded to another strand, and the Inline graphic hairpin is defined as two adjacent Inline graphic strands with a turn in between and at least one H-bond between the two strands, similar to the Inline graphic hairpin. Another relatively extended secondary structure is PPII, which is defined here as adjacent residues whose dihedral angles fall into the PPII region of the Ramachandran plot. A PPIIInline graphic structure is defined as a structure having Inline graphic adjacent PPII residues. A summary of these secondary structures is given in Table 1.
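The run-based definitions above can be sketched as a scan over a per-residue string of Ramachandran region labels. In this illustration the one-letter codes ('B', 'R', 'L', 'P', 'O'), the function names, and the minimum lengths passed in the example are all hypothetical; the hydrogen-bond criteria used for hairpins are not included.

```python
def find_runs(labels, target, min_len):
    """Return (start, end) index pairs of maximal runs of `target`.

    `end` is exclusive. Only runs with at least `min_len` residues are kept.
    """
    runs, start = [], None
    for i, label in enumerate(list(labels) + [None]):  # sentinel closes last run
        if label == target:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_len:
                runs.append((start, i))
            start = None
    return runs


def find_alternating(labels, first, second, min_len):
    """Return (start, end) pairs of maximal segments that strictly alternate
    between the two given labels, in either order."""
    segments, start = [], None
    n = len(labels)
    for i in range(n + 1):
        label = labels[i] if i < n else None
        member = label in (first, second)
        if member and start is not None and label != labels[i - 1]:
            continue  # the current segment keeps alternating
        if start is not None and i - start >= min_len:
            segments.append((start, i))
        start = i if member else None
    return segments


# Hypothetical label string for a 14-residue stretch:
# O = other, B = extended region, R/L = right/left helical regions, P = PPII.
labels = "OBBBBORLRLRPPO"
extended = find_runs(labels, "B", min_len=3)
alternating = find_alternating(labels, "R", "L", min_len=4)
ppii = find_runs(labels, "P", min_len=2)
```

Keeping the minimum length as a parameter lets the same scan serve for strands of different minimum size.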

Finally, we determined the type of turn from both the DSSP analysis and our Ramachandran region clustering analysis. DSSP distinguishes between H-bonded turns and geometrical bends that do not involve any H-bonding. The DSSP analysis can also be used to identify Inline graphic and Inline graphic types based on the number of residues involved, which is 4 and 3 respectively. The dihedral angles of the two middle residues of Inline graphic turns (i.e., the second and the third residues) can be used to partition Inline graphic turns into more types such as I, I′, II, II′, etc., but we will only consider type I-Inline graphic that involves an Inline graphic sequence and the “other” type Inline graphic turns that involve other combinations of dihedral angles. Since the population of “other” combinations is relatively small, we group these all together.

Odds Ratio

To quantify how the secondary structures of Gln residues influence each other, we made use of the odds ratio (OR) construction [40]–[42]. The OR is a descriptive statistic that measures the strength of association, or non-independence, between two binary values. The OR is defined for two binary random variables (denoted as Inline graphic and Inline graphic) as:

graphic file with name pcbi.1002501.e206.jpg (1)

where Inline graphic is the joint probability of the Inline graphic event (with Inline graphic and Inline graphic taking on binary values of 0 and 1). For the purposes of this study, we can think of Inline graphic and Inline graphic as being some characteristic properties describing the conformations of different residues. For example, the variables could be assigned values of 1 or 0 depending on whether or not the backbone dihedral angles of the corresponding residue fall into the Inline graphic region of the Ramachandran plot. We denote this definition of OR as ORInline graphic. Similarly, one can define Inline graphic based on the involvement of residues in Inline graphic repeats. In this case, to define the Inline graphic of two given residues Inline graphic and Inline graphic, the probabilities Inline graphic are defined such that the variables Inline graphic and Inline graphic take the values 1 or 0 depending on whether or not the corresponding residue is involved in an Inline graphic repeat as defined in the last subsection. For instance, Inline graphic if and only if residue Inline graphic either is in the Inline graphic region and is neighboring a residue in the Inline graphic region, or it is in the Inline graphic region and is neighboring a residue in the Inline graphic region. Note that in general, to calculate the Inline graphic of two residues, dihedral angles of not only the two residues but also their neighbors are needed, i.e., up to 6 residues could be involved.

The usefulness of the OR in quantifying the influence of one binary random variable upon another can be readily seen. If the two variables are statistically independent, then Inline graphic so that Inline graphic. In the opposite extreme case of Inline graphic (complete dependence) both Inline graphic and Inline graphic are zero, and the OR is infinite. Similarly, for Inline graphic Inline graphic rendering Inline graphic. To summarize, an OR of unity indicates that the values of Inline graphic are equally likely for both values of Inline graphic (i.e., Inline graphic, Inline graphic and Inline graphic are therefore independent); an OR greater than unity indicates that Inline graphic is more likely when Inline graphic (Inline graphic and Inline graphic are positively correlated), while an OR less than unity indicates that Inline graphic is more likely when Inline graphic (Inline graphic and Inline graphic are negatively correlated).
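The limiting cases described above can be checked numerically with the textbook form of the odds ratio, OR = p11·p00 / (p10·p01), estimated from two binary sequences sampled over the same set of conformations. This is a minimal sketch; the function names and the toy sequences are hypothetical.

```python
import math


def joint_probabilities(x, y):
    """Estimate the four joint probabilities p[(i, j)] from two 0/1 sequences."""
    if len(x) != len(y) or not x:
        raise ValueError("sequences must be non-empty and of equal length")
    counts = {(i, j): 0 for i in (0, 1) for j in (0, 1)}
    for xi, yi in zip(x, y):
        counts[(xi, yi)] += 1
    total = len(x)
    return {key: value / total for key, value in counts.items()}


def odds_ratio(p):
    """Textbook odds ratio p11*p00 / (p10*p01) for a 2x2 joint distribution."""
    numerator = p[(1, 1)] * p[(0, 0)]
    denominator = p[(1, 0)] * p[(0, 1)]
    if denominator == 0.0:
        # Complete dependence gives an infinite OR; 0/0 is undefined.
        return math.inf if numerator > 0.0 else math.nan
    return numerator / denominator


# Independent toy variables: every combination occurs equally often.
or_independent = odds_ratio(joint_probabilities([0, 0, 1, 1], [0, 1, 0, 1]))

# Positively associated toy variables: the two usually agree.
x = [1, 1, 1, 0, 0, 0, 1, 0]
y = [1, 1, 1, 0, 0, 0, 0, 1]
or_positive = odds_ratio(joint_probabilities(x, y))

# Complete dependence: the two variables are identical.
or_identical = odds_ratio(joint_probabilities([1, 0, 1, 0], [1, 0, 1, 0]))
```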

It is convenient to recast the log of the OR in terms of free energy language. If one expresses the probability of the Inline graphic events in terms of a free energy Inline graphic:

graphic file with name pcbi.1002501.e254.jpg (2)

then the ratio of probabilities Inline graphic translates into a free energy difference:

graphic file with name pcbi.1002501.e256.jpg (3)

Clearly, the logarithm of the OR then maps onto the difference of those differences, i.e.,

graphic file with name pcbi.1002501.e257.jpg (4)

For the case of statistically independent properties, Inline graphic; otherwise, this quantity takes on either positive or negative values, whose magnitude depends on the mutual dependence between the two variables. The standard error in its asymptotic approximation is:

graphic file with name pcbi.1002501.e259.jpg (5)

in which Inline graphic is the total number of independent Inline graphic events sampled. While this development may be perceived as purely formal, the use of an OR analysis couched in terms of free energy language provides for a useful and intuitive measure of the inter-residual correlations, as has been illustrated before [41], [42].
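For a concrete feel of the numbers involved, the sketch below computes the logarithm of the odds ratio and its standard asymptotic standard error, sqrt(1/n11 + 1/n10 + 1/n01 + 1/n00), from the four event counts. The formula is the usual large-sample result for a 2x2 table and is assumed, not confirmed, to match the expression used here; the counts are hypothetical. Multiplying the result by kT would express it in energy units, with the sign fixed by the convention adopted in Eq. (4).

```python
import math


def log_odds_ratio(n11, n10, n01, n00):
    """Return (ln OR, asymptotic standard error) from four event counts.

    The counts should refer to statistically independent samples;
    correlated trajectory frames would make the error estimate too small.
    """
    if min(n11, n10, n01, n00) <= 0:
        raise ValueError("all four cells must be populated")
    log_or = math.log((n11 * n00) / (n10 * n01))
    std_err = math.sqrt(1.0 / n11 + 1.0 / n10 + 1.0 / n01 + 1.0 / n00)
    return log_or, std_err


# Hypothetical counts out of 100 independent conformations.
log_or, std_err = log_odds_ratio(n11=30, n10=10, n01=10, n00=50)

# A simple check of whether the association exceeds two standard errors.
is_significant = abs(log_or) > 2.0 * std_err
```

Because the error grows quickly when any of the four cells is sparsely populated, rare motifs require large ensembles before a correlation can be resolved.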

In this work, our OR-based correlation analysis is supported by the conventional linear correlation analysis. We have used the correlation coefficient (also known as cross-correlation or Pearson correlation) of Inline graphic dihedral angles of glutamine residues to measure the correlation of glutamine residues in different situations. We emphasize that in the context of secondary structural propensities, the odds ratio analysis is more powerful than the correlation coefficient since it eliminates the noise associated with the dihedral angles. This noise may dominate the linear correlation results such that even substantial correlations may be completely missed. The OR-based correlation analysis, combined with the clustering technique explained here, takes into account both nonlinearity and multivariate components of amino acid correlations in a peptide chain, although in some particular cases a conventional univariate linear correlation may reveal a correlation, as we will report in the results. In the context of this paper, the multivariate component is particularly evident when the correlation of Inline graphic repeats is considered, since this may involve Inline graphic and Inline graphic angles of up to six residues for each single odds ratio calculation.

Results

We generated Inline graphic equilibrium structures of the Inline graphic and Inline graphic peptides, Inline graphic structures of Inline graphic, Inline graphic, and Inline graphic, and 10Inline graphic structures of Inline graphic, Inline graphic, Inline graphic, Inline graphic, Inline graphic, and Inline graphic peptides at 300 K to compute the probabilities of different secondary structural motifs and thereby characterize the conformational ensemble of these peptides.

Here, we present our results in terms of (i) the regions of the Ramachandran map occupied by each individual glutamine residue, (ii) the secondary structures identified based not only on the backbone dihedral angles but also on the inter-residual hydrogen bonds and positions of the Inline graphic atoms, (iii) a correlation analysis on the dihedral angles of glutamine residues, and (iv) the ensemble distribution of the radius of gyration, describing the overall compactness of the structures. Figures 1–8 (and Figures S1, S2, S3) and Tables 2–3 (and Table S1) summarize these results.

Figure 2. Inline graphic, PPII and Inline graphic content of selected polyQ peptides.

Shown are the contents (as a percentage) of individual glutamine residues found in the (a,b) Inline graphic-region, (c,d) PPII-region, and (e,f) Inline graphic. These percentages are plotted against the Gln residue numbers for (a,c,e) Inline graphic [red], Inline graphic [blue] and (b,d,f) Inline graphic [red], Inline graphic [blue]. These percentages are obtained from clustering the conformations based on their dihedral angles in the Ramachandran plot.

Figure 3. Helical, turn and coil content of selected polyQ peptides.

Shown are the contents (as a percentage) of individual glutamine residues found in the following conformations: (a,b) helical (Inline graphic, Inline graphic), (c,d) turn (H-bonded, bend), and (e,f) coil. These percentages are plotted against the Gln residue numbers for (a,c,e) Inline graphic [red], Inline graphic [blue] and (b,d,f) Inline graphic [red], Inline graphic [blue]. These percentages are obtained from the DSSP [58], [59] analysis code.

Figure 4. Sample conformations of Inline graphic and Inline graphic.

Cartoon representation of sample conformations of (a) Inline graphic and (b) Inline graphic. Purple, blue, cyan, and orange represent Inline graphic-helix, Inline graphic-helix, turn, and coil secondary structural motifs, respectively. The licorice-like representation of the proline segment of Inline graphic is given in (b). These structures are plotted by VMD [61] using STRIDE [60] for secondary structure prediction.

Figure 5. Selected extended conformations of Inline graphic peptides.

Here, we give (a) cartoon and (b) licorice-like representations of select conformations of the Inline graphic peptide with (Inline graphic,Inline graphic,Inline graphic,Inline graphic) Inline graphic and (Inline graphic,Inline graphic,Inline graphic,Inline graphic) Inline graphic strands. (a) The coloring is similar to Fig. 4, with yellow and green representing Inline graphic and Inline graphic strands, respectively. We used a dihedral angle-based algorithm to detect the Inline graphic strands, and for the other secondary structures in these plots we used STRIDE [60] distributed with VMD [61]. (b) The residues involved in (Inline graphic) Inline graphic-hairpin, (Inline graphic) isolated Inline graphic-strand, (Inline graphic) Inline graphic-hairpin, and (iInline graphic) isolated Inline graphic-strand are highlighted. The rest of the residues are grey and all the side chains are represented by thin lines.

Figure 6. Correlation analysis results for selected polyQ peptides.

Shown is (a) the odds-ratio-based Inline graphic between any two glutamine residues (Inline graphic and Inline graphic) of Inline graphic [red] and Inline graphic [blue] in terms of (Inline graphic). From each side of the peptide Inline graphic ending residues are omitted in the calculations to reduce the end effects. (b) Similar to (a) for Inline graphic [red], Inline graphic [blue], and Inline graphic [black]. Here Inline graphic residues from each end are omitted. (c,d) Correlation coefficient between Inline graphic dihedral angles of any two glutamine residues (Inline graphic and Inline graphic) in terms of (Inline graphic) for (c) Inline graphic [red], Inline graphic [blue] and (d) Inline graphic [red], Inline graphic [blue], and Inline graphic [black]. The end residues were omitted according to the same protocol used for the odds ratio analysis. (e,f) Similar to (a,b) but with the odds ratio calculated using the probabilities that residues belong or do not belong to an Inline graphic repeat region.

Figure 7. Correlation analysis results for selected polyQ peptides.

Specifically, we give Inline graphic for (a) Inline graphic, (b) Inline graphic, and (c) Inline graphic based on OR(Inline graphic) [red], OR(PPII) [blue], and OR(Inline graphic) [black]. (d) To compare the linear and OR-based results, we plotted Inline graphic(r) versus the correlation coefficient corrInline graphic(r) for Inline graphic, which suggests an almost linear behavior with a correlation coefficient of 0.97.

Figure 8. Distribution of radius of gyration of polyQ peptides.

(a) The estimated Inline graphic distribution for Inline graphic [red] and Inline graphic [blue]. (b) The estimated Inline graphic distribution for Inline graphic [red] and Inline graphic [blue]. The blue curve can be estimated as the sum [black] of three Gaussian distributions [dotted]. (c) The estimated Inline graphic distribution for Inline graphic, considering only the structures with an all-trans proline segment [green]. Similarly, the green curve can be estimated as the sum [black] of four Gaussian distributions [dotted]. Considering only the structures that have at least one cis-proline results in the magenta curve for the Inline graphic distribution. All the histograms are obtained using a window of width Inline graphic. (d) The exponent Inline graphic in the Inline graphic relation estimated from select pairs of Inline graphic (x axis) and Inline graphic (Inline graphic for blue circles and Inline graphic for yellow squares). Inset: The average Inline graphic (in Inline graphic) of QInline graphic peptides for Inline graphic.

Table 2. Secondary structure analysis of the polyQ peptides.

| peptide | (a) Ramachandran regions: Inline graphic | (a) Inline graphic | (a) PPII | (a) Inline graphic | (a) Inline graphic | (b) secondary structures: helix | (b) turn | (b) other | (c) extended structures: PPII | (c) Inline graphic-s | (c) Inline graphic-h | (c) Inline graphic-s | (c) Inline graphic-h |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Inline graphic | 87 | 5 | 5 | 3 | 7 | 30 | 23 | 47 | 6.5 (1.3) | 7.1 (1.2) | 1.1 | 42 (1.6) | 1.9 |
| Inline graphic | 78 | 9 | 9 | 4 | 6 | 43 | 36 | 21 | 8.9 (3.3) | 3.9 (0.1) | 0.5 | 25 (0.1) | 0.1 |
| Inline graphic | 80 | 8 | 9 | 3 | 7 | 37 | 32 | 31 | 7.3 (1.3) | 4.2 (0.5) | 0.5 | 19 (0.1) | 0.7 |
| Inline graphic | 81 | 7 | 8 | 4 | 7 | 34 | 23 | 43 | 2.4 (0.3) | 1.5 (0.5) | 0.5 | 19 (0.1) | 0.2 |
| Inline graphic | 72 | 13 | 12 | 3 | 6 | 14 | 23 | 63 | 7.3 (1.0) | 0.9 (0.3) | 0.1 | 15 (0.1) | 0.1 |
| Inline graphic | 79 | 8 | 9 | 4 | 8 | 38 | 31 | 31 | 1.9 (0.2) | 0.9 (0.2) | 0.3 | 19 (0.8) | 0.1 |
| Inline graphic | 70 | 14 | 12 | 4 | 6 | 26 | 38 | 36 | 2.3 (0.3) | 1.6 (0.2) | 0.1 | 8 (0.5) | 0.0 |
| Inline graphic | 78 | 9 | 9 | 4 | 8 | 31 | 31 | 38 | 1.5 (0.2) | 0.6 (0.1) | 0.4 | 14 (0.0) | 0.0 |
| Inline graphic | 68 | 15 | 11 | 6 | 10 | 23 | 51 | 26 | 1.1 (0.1) | 1.1 (0.2) | 0.6 | 17 (0.2) | 0.0 |
| Inline graphic | 73 | 12 | 13 | 2 | 4 | 18 | 29 | 53 | 1.1 (0.2) | 1.3 (0.2) | 0.0 | 2 (0.0) | 0.0 |
| Inline graphic | 67 | 17 | 13 | 3 | 8 | 10 | 50 | 40 | 1.3 (0.2) | 1.4 (0.2) | 0.0 | 2 (0.1) | 0.0 |

Here, we give the (a) population (as a percentage) of the residues in the different Ramachandran regions (Inline graphic, Inline graphic, PPII, and Inline graphic), as well as the population of residues involved in Inline graphic repeats; (b) the population (as a percentage) of residues in different secondary structures (helix, turn, and other secondary structures); (c) the percentage of conformations having at least one PPII, Inline graphic, or Inline graphic extended secondary structure, including isolated strands and hairpins. The isolated Inline graphic, Inline graphic, or Inline graphic (Inline graphic, Inline graphic, or Inline graphic) strands – identified in the table as PPII-s, Inline graphic-s, Inline graphic-s – are defined based on at least three (four) adjacent residues with the backbone dihedral angles falling into the region associated with these structures and not involved in any inter-residual hydrogen bonding. Similarly, a hairpin – identified in the table as PPII-h, Inline graphic-h, Inline graphic-h – is defined based on two adjacent strands of at least three residues with one or more hydrogen bonds between the two strands and a turn in between. For more details of this analysis, which is based on both DSSP [58], [59] and dihedral-based clustering, see Methods.

Table 3. Helix and turn populations of the polyQ peptides.

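| peptide | helical content / helix type: Inline graphic | helical content / helix type: 3Inline graphic | helical content / helical segments: 0 | helical content / helical segments: 1,2,3,4,5 | turn content / H-bonding: H-bonded | turn content / H-bonding: bend | turn content / turn type: I-Inline graphic | turn content / turn type: other Inline graphic | turn content / turn type: Inline graphic |
|---|---|---|---|---|---|---|---|---|---|
| Inline graphic | 23 | 7 | 31 | 3,16,27,18,4 | 15 | 7 | 18 | 1 | 4 |
| Inline graphic | 31 | 12 | 1 | 3,21,40,28,6 | 23 | 13 | 24 | 3 | 9 |
| Inline graphic | 28 | 9 | 11 | 15,37,30,6 | 22 | 10 | 23 | 2 | 7 |
| Inline graphic | 27 | 7 | 28 | 39,31,2 | 18 | 6 | 16 | 2 | 5 |
| Inline graphic | 10 | 4 | 61 | 25,13,1 | 13 | 10 | 12 | 2 | 9 |
| Inline graphic | 29 | 9 | 15 | 76,9 | 25 | 6 | 25 | 2 | 4 |
| Inline graphic | 20 | 6 | 30 | 66,4 | 23 | 15 | 28 | 3 | 7 |
| Inline graphic | 22 | 9 | 32 | 67 | 25 | 6 | 24 | 1 | 6 |
| Inline graphic | 15 | 8 | 48 | 52 | 31 | 20 | 37 | 4 | 10 |
| Inline graphic | 7 | 11 | 69 | 31 | 24 | 5 | 19 | 2 | 8 |
| Inline graphic | 3 | 7 | 82 | 18 | 25 | 25 | 36 | 4 | 10 |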

The helical content is partitioned into Inline graphic- and Inline graphic-helix populations. The structures are also categorized based on the number of their helical segments. The population of each category (0,1,2,Inline graphic) is given if greater than Inline graphic%. The turn content is partitioned based on both the hydrogen-bonding and turn types. For the secondary structure prediction, the DSSP analysis code [58], [59] was used along with the protocols discussed in Methods.

Ramachandran Regions

Figure 1b shows the Ramachandran plot of a typical glutamine residue, for which the clusters in the different regions are computed according to the protocol described in the Methods section. Four clusters can be identified in these plots including PPII (blue), Inline graphic (yellow), Inline graphic (gray), and Inline graphic (pink). Figures S2 and S3 show the Ramachandran plots of all 40 glutamine residues of both Inline graphic and Inline graphic. Considering these, as well as similar plots for other peptides (not shown here), we observe the following trends: (i) The dominant region of most residues is the Inline graphic cluster that is present in all residues, except for the glutamines immediately followed by a proline, for which this region is precluded; (ii) PPII and Inline graphic clusters are present in almost all residues; (iii) The Inline graphic cluster is present in more than half of the residues but its population is often very small; (iv) Compared to Inline graphic, Inline graphic displays regions with higher non-Inline graphic intensities, particularly for the Inline graphic cluster (see Inline graphic, Inline graphic, Inline graphic, and Inline graphic).

Figure 2 plots the percent population of the Inline graphic, PPII, and Inline graphic regions of glutamine residues (top, middle and bottom rows, respectively) in terms of the residue number. The left column shows results for Inline graphic [red] and Inline graphic [blue] and the right column for Inline graphic [red] and Inline graphic [blue]. Table 2 presents the population of the different Ramachandran regions (averaged over all glutamine residues) and the Inline graphic repeats, the secondary structure motifs, and the “extended structures” including hairpins. The residue populations in the Ramachandran plot show that, on average, 67–87Inline graphic of the residues are in the Inline graphic region of the Ramachandran plot, 5–13Inline graphic of the residues are in the PPII region and 5–17Inline graphic of the residues are in the Inline graphic region. The PPII and Inline graphic regions are almost always equally probable, as can be seen in Figs. 2, S2, S3. The lowest population belongs to the Inline graphic region, comprising only 3–6Inline graphic although in certain residues it could be as high as 38% as, for instance, in Inline graphic in Inline graphic where the content of Inline graphic correlates with the presence of turns. The addition of PInline graphic decreases the population of the Inline graphic Ramachandran region and increases that of the Inline graphic and PPII regions, while leaving the small population of Inline graphic approximately invariant. In Inline graphic peptides, proline residues are excluded from the statistical analysis so that only Q residue propensities are compared (for instance, when we state that the average helical content of Inline graphic is 43%, it means that 43% of all Q residues are in a helix – the P residues are not counted in the statistic).

Figure 2 shows that the populations of the PPII and Inline graphic regions are always higher at the two ends of the polyQ peptides, particularly at the C-terminal. When a short proline segment is added at the C-terminal of polyQ, the population of these regions in the neighboring glutamines increases even more. For Inline graphic peptides shorter than Inline graphic (not shown here), the population of the PPII-Inline graphic region decreases in the middle of the peptide, but for Inline graphic (red line) we see a small peak in the middle of the peptide for both PPII and Inline graphic regions. In Inline graphic, we have two small peaks (rather than a single peak) centered around residues 13 and 25 for both the Inline graphic and PPII regions. The presence of the prolines at the C-terminal of a polyglutamine can drastically alter the population distribution. Fig. 2 shows that the few relatively wide peaks of the Inline graphic-PPII regions in both Inline graphic and Inline graphic are replaced by several narrow peaks of larger heights. Regarding the residues involved in Inline graphic repeats, one can see from Fig. 2e,f that the distribution of these repeats throughout these peptides depends both on the position of glutamine residues and the presence or absence of the C-terminal prolines although, as seen in Table 2, the average Inline graphic content is similar (6–7%) in all four peptides: Inline graphic, Inline graphic, Inline graphic, and Inline graphic. We note that the distribution of Inline graphic content in the peptide is mostly determined by the Inline graphic content as the Inline graphic content is abundant in these peptides and most Inline graphic residues are involved in an Inline graphic repeat. One can compare Fig. 2e,f with Figs. S2,S3 and observe similar behaviour, i.e., the residues with high Inline graphic content (Fig. 2e,f) have more intense Inline graphic clusters (pink clusters in Figs. S2,S3).

Secondary Structure

When one considers not only the backbone dihedral angles, i.e., the (Inline graphic,Inline graphic) regions occupied by individual glutamine residues, but also the inter-residual hydrogen bonding and the relative positions of the Inline graphic atoms, one can identify different secondary structures, particularly Inline graphic-helical segments, in many of the sampled conformations. Short Inline graphic helices are also possible, but the majority of the residues are in either a turn or a coil conformation according to both DSSP [58], [59] and STRIDE [60] analysis. Figure 3 plots the helical, turn, and coil content of the individual glutamine residues against their residue numbers for Inline graphic, Inline graphic, Inline graphic, and Inline graphic. Figure 4 shows plots of select conformations of Inline graphic and Inline graphic peptides, as generated by VMD [61] using STRIDE [60] for the secondary structure assignment. Table 2 lists the population of helix, turn, and “other” secondary structures as obtained from DSSP, averaged over all residues. The “other” secondary structure category includes mainly what DSSP identifies as “loop or irregular” – sometimes called “coil” in other programs – but may also include a very small population of other secondary structures such as extended Inline graphic strand and “isolated Inline graphic-bridge”. We use the protocols explained in the Methods section to further identify these, as well as other extended structures (Tables 2 and 3).

When the population of residues in the Inline graphic region is compared to the actual helical content, one realizes that the majority of the residues in the Inline graphic region do not form Inline graphic or any other type of helices. Many of these residues in the Inline graphic region are followed and/or preceded by a residue in a different Ramachandran region, such as Inline graphic, as discussed in the previous subsection, forming an Inline graphic repeat. Similarly, an Inline graphic repeat does not necessarily form an Inline graphic strand. Table 2 gives the population of the structures (or conformations) having at least one segment in one of the extended conformation forms, as defined in the Methods section, including Inline graphic and Inline graphic strands either in the isolated form of length 3 (or length 4 in parentheses) or in the hairpin form, as well as PPII structures of length 3 (or length 4 in parentheses). Note that unlike the other populations in parts (a) and (b) of Table 2, the population of extended secondary structures in part (c) is not averaged over the residues. Instead, we counted all the conformations having at least one such secondary structure in the polyQ portion of the molecule and divided this number by the total number of sampled conformations. These structures are less common than helices or turns, but they are possible and form a small subpopulation of the secondary structures. Indeed, one can see that a non-negligible portion of the structures has at least one such segment. In particular, isolated Inline graphic strands are quite common, although they may simply be considered as part of a random coil. The isolated Inline graphic and PPII strands form the second most populated extended structures. Similarly, these structures may also be considered as part of a random coil. However, Inline graphic, Inline graphic and Inline graphic strands form extended structures that are unlikely to be considered random coil elements. Figure 4 shows some examples of isolated and adjacent extended structures in both Inline graphic and Inline graphic forms.
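The per-conformation counting described above (at least one run of adjacent residues in the same Ramachandran region, divided by the number of sampled conformations) can be sketched as follows. This is a simplified illustration: the one-letter region labels are hypothetical, and the additional hydrogen-bonding criteria that distinguish isolated strands from hairpins are deliberately left out.

```python
def has_run(labels, target, min_len=3):
    """True if `labels` contains at least `min_len` consecutive `target` entries."""
    run = 0
    for label in labels:
        run = run + 1 if label == target else 0
        if run >= min_len:
            return True
    return False


def fraction_with_run(frames, target, min_len=3):
    """Fraction of conformations having at least one qualifying run.

    frames: one sequence of per-residue region labels per conformation
            (hypothetical labels, e.g. "a", "b", "p").
    """
    if not frames:
        raise ValueError("need at least one conformation")
    hits = sum(1 for labels in frames if has_run(labels, target, min_len))
    return hits / len(frames)
```

Calling `fraction_with_run(frames, "b", 3)` and then again with `min_len=4` mirrors the "length 3 (length 4 in parentheses)" layout of part (c) of Table 2.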

Remarkably, among all the sequences presented here, Inline graphic has the highest percentage of extended structures. This peptide shows a significantly higher propensity for the extended structures, particularly the Inline graphic strands. The population of the structures having at least one Inline graphic-hairpin is almost 2%, and is higher than the number of structures having at least one Inline graphic-hairpin. However, the Inline graphic-hairpin rate is still the highest among all the peptides studied here. Adding the proline segment to the Inline graphic peptide dramatically reduces the chance of forming Inline graphic or Inline graphic extended structures, especially in the case of Inline graphic-hairpins and isolated strands of length four or more. However, PPII propensity is increased in the peptides of length Inline graphic by adding the proline segment.

Table 3 gives more details on the helices and turns observed in the polyQ and polyQ-polyP structures. The helices are found mostly in the right-handed Inline graphic form except for Inline graphic and Inline graphic, which favor Inline graphic helices due to their short length. This table also shows the percentage of helical segments present in a given peptide. A helical segment is defined as a series of residues adjacent in the sequence whose secondary structure has been identified as helical by DSSP. Thus helical segments can have varying lengths, and the table lists the number of helical segments (independent of their length). For example, among Inline graphic conformations, 31% do not have any helical segment, but when the prolines are added 99% form at least one helical segment (in particular, 40% of the structures in Inline graphic have 3 helical segments). The addition of PInline graphic to Inline graphic increases the helical content from 30% in Inline graphic to 43% in Inline graphic (the highest helical content in all peptides), while the addition of polyP decreases the helical content in all other peptides. Comparing Inline graphic and Inline graphic structures, the population of the structures having more than one helix increases.
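The helical-segment bookkeeping described above can be illustrated with a short sketch that operates on per-residue DSSP strings. It assumes that "helical" means the DSSP codes H (α-helix) and G (3-10 helix); that choice, the function names, and the treatment of adjacent H and G residues as a single segment are assumptions made for illustration.

```python
from collections import Counter


def helical_segments(dssp, helix_codes="HG"):
    """Lengths of maximal runs of helix codes in one per-residue DSSP string."""
    lengths = []
    run = 0
    for code in dssp:
        if code in helix_codes:
            run += 1
        else:
            if run:
                lengths.append(run)
            run = 0
    if run:  # close a segment that reaches the end of the chain
        lengths.append(run)
    return lengths


def segment_count_distribution(dssp_frames, helix_codes="HG"):
    """Percentage of conformations having 0, 1, 2, ... helical segments."""
    if not dssp_frames:
        raise ValueError("need at least one conformation")
    counts = Counter(len(helical_segments(s, helix_codes)) for s in dssp_frames)
    total = len(dssp_frames)
    return {k: 100.0 * v / total for k, v in sorted(counts.items())}
```

The segment lengths returned by `helical_segments` can also be thresholded (for example at 7 or more residues) to separate short from long helices.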

The select Inline graphic and Inline graphic structures given in Fig. 4a,b illustrate various conformations, for which a statistical description is given in Figs. 2, 3 and Tables 2–3. In particular, the left column of Fig. 3 indicates that adding a polyP segment to Inline graphic reduces the helical content but increases the coil content (while the turn content stays the same). Instead, adding a polyP to Inline graphic (right column of Fig. 3) results in an increase of the helical content in the N-terminal of Inline graphic, farther away from the polyP segment. The addition of Inline graphic to Inline graphic increases not only the number of helical segments but also their length, particularly in the N-terminal half. The population of the structures having short helices (less than 7 residues) is very similar in Inline graphic (26%) and Inline graphic (27%) but 72% of Inline graphic conformations have longer helices (7 residues or more) as compared to only 43% in Inline graphic. Also 37% of the Inline graphic conformations have a helical segment longer than 9 residues while only 20% of Inline graphic conformations do.

Adding the polyP segment generally increases the turn content (both of Inline graphic and Inline graphic types), except for Inline graphic, where the total population of turns stays constant. The majority of turns are of I-Inline graphic type but there is a smaller population of other types of Inline graphic turns as well as Inline graphic turns. The increase in the Inline graphic-turn content of polyQ-polyP peptides can explain why adding the polyP to polyQ sometimes increases the Inline graphic content, as Inline graphic residues are involved in most of Inline graphic-turns. For instance, one finds more Inline graphic content in the residues of Inline graphic compared to Inline graphic but there are fewer residues in Inline graphic involved in Inline graphic repeats. There is no contradiction here as part of the Inline graphic content is involved in Inline graphic turns rather than Inline graphic-strands. Finally, Fig. 5 presents examples of (rare) extended conformations in the Inline graphic peptides. In particular, the figure shows Inline graphic hairpins and isolated strands, and Inline graphic hairpins and isolated strands.

Correlation Analysis

An odds ratio analysis based on the Ramachandran regions was conducted, and the results are summarized in Figures 6 and 7 for Inline graphic, Inline graphic, Inline graphic, Inline graphic, and Inline graphic peptides. We defined the OR as a function of sequence distance Inline graphic between two glutamine residues Inline graphic and Inline graphic. Inline graphic indicates an OR based on the Inline graphic region of Ramachandran plot. These figures display Inline graphic, for a better intuitive illustration. Inline graphic measures how the presence or absence of Inline graphic in the Inline graphic region can influence the presence or absence of Inline graphic in the Inline graphic region. Here, to reduce the end effects, Inline graphic only runs between Inline graphic and Inline graphic, with Inline graphic for Inline graphic and Inline graphic for Inline graphic.
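To make the distance-dependent bookkeeping concrete, the sketch below pools the 2×2 contingency counts over all residue pairs separated by a given sequence distance, with a fixed number of residues trimmed from each end. The function name, the boolean input layout, and the pooling over all pairs within a conformation are illustrative assumptions; pairs taken from the same conformation are not strictly independent samples, so the resulting counts are best read as a rough guide.

```python
def pooled_counts(in_region, r, trim):
    """Pool 2x2 contingency counts over residue pairs (i, i + r).

    in_region: list of conformations, each a list of booleans telling whether
               a residue falls in the Ramachandran region of interest.
    r:         sequence distance between the two residues (r >= 1).
    trim:      number of residues omitted at each end of the chain.
    Returns (n11, n10, n01, n00).
    """
    if r < 1 or trim < 0:
        raise ValueError("r must be >= 1 and trim must be >= 0")
    n11 = n10 = n01 = n00 = 0
    for frame in in_region:
        n_res = len(frame)
        # i and i + r must both lie inside the untrimmed window
        for i in range(trim, n_res - trim - r):
            a, b = frame[i], frame[i + r]
            if a and b:
                n11 += 1
            elif a:
                n10 += 1
            elif b:
                n01 += 1
            else:
                n00 += 1
    return n11, n10, n01, n00
```

The four counts returned for each distance are the inputs needed for an odds ratio and its standard error at that distance.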

In Fig. 6a, Inline graphic shows higher correlation Inline graphic for Inline graphic than Inline graphic. In other words, Inline graphic would have a greater chance of forming Inline graphic strands if the population of Inline graphic residues increases. However the correlation range between the Inline graphic residues in both Inline graphic and Inline graphic is about Inline graphic since for Inline graphic there is no significant deviation from Inline graphic, the expected value for independent events. This situation changes with polymer length. Inline graphic in Fig. 6b has a correlation length of about Inline graphic, after which it quickly loses correlation (it even becomes “anti-correlated”). Once again, Inline graphic exhibits unique behavior since Inline graphic does not decay to zero but oscillates around Inline graphic kcal/mol and more importantly, the oscillation does not seem to be damped by increasing Inline graphic (ignoring the smaller Inline graphic values). This indicates a long-range correlation between the glutamine residues of Inline graphic. (Oscillations can be seen for Inline graphic as well, but they are around zero).

The results of the OR analysis can be further confirmed by conducting a direct correlation analysis on the Inline graphic angles of the glutamine residues. We used the correlation coefficient (also known as cross-correlation or Pearson correlation) as a measure of linear correlation between the Inline graphic angles of Gln residues of sequence distance Inline graphic, using the same protocol explained above for the odds ratio analysis (i.e., omitting the end residues), and verified the same unique behavior of Inline graphic. First, the Inline graphic dihedral angles were shifted Inline graphic degrees (with the assumption of periodic boundary condition at Inline graphic), then the correlation coefficient of Inline graphic of the residues with a sequence distance Inline graphic, corrInline graphic(r), was calculated. Note that this correlation measure does not involve any clustering and ignores any dependence on the Inline graphic dihedral angle; however, it confirms the OR predictions. Although in general both Inline graphic and Inline graphic angles are needed to identify the Ramachandran region of an amino acid, the linear correlation analysis on Inline graphic angles is still able to detect a long-range, positive correlation for Inline graphic (Figs. 6c,d).
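The shift-then-correlate procedure can be sketched in a few lines. The shift value and the wrapping interval are left as parameters because the specific numbers are not reproduced in this text; the function names and the pooling of all residue pairs at a given distance are assumptions made for illustration.

```python
import math


def shift_angle(angle_deg, shift_deg):
    """Shift a dihedral angle and wrap it back into [-180, 180)."""
    return (angle_deg + shift_deg + 180.0) % 360.0 - 180.0


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        raise ValueError("zero variance in one of the sequences")
    return sxy / math.sqrt(sxx * syy)


def corr_at_distance(angles, r, trim, shift_deg):
    """Correlation of shifted angles for residue pairs (i, i + r).

    angles: list of conformations, each a list of one dihedral angle
            (in degrees) per residue; `trim` residues are skipped at each end.
    """
    xs, ys = [], []
    for frame in angles:
        shifted = [shift_angle(a, shift_deg) for a in frame]
        for i in range(trim, len(shifted) - trim - r):
            xs.append(shifted[i])
            ys.append(shifted[i + r])
    return pearson(xs, ys)
```

Shifting before correlating matters because a linear coefficient treats angles near the wrap-around point as far apart; moving that point into a sparsely populated part of the distribution avoids the artifact.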

An OR-based correlation analysis for Inline graphic is illustrated in Fig. 6e,f. Here, a residue is considered to be an Inline graphic residue if it is involved in an Inline graphic repeat. In the case of Inline graphic and Inline graphic there is an even shorter positive correlation range (compared to Inline graphic) for both peptides, with a significant negative correlation when Inline graphic increases. Inline graphic shows a somewhat similar oscillatory behavior around a non-zero average, with negative troughs. Note that the Pearson correlation coefficient cannot be used here for the Inline graphic analysis (in its univariate form) due to the fact that the definition of an Inline graphic repeat is highly dependent on the dihedral angles of both adjacent residues, involving four residues in the correlation analysis instead of two. The Inline graphic angles are also quite important for the Inline graphic/Inline graphic distinction.

Finally, Fig. 7 compares the behavior of OR-based Inline graphic in Inline graphic, Inline graphic, and Inline graphic peptides for Inline graphic. In Inline graphic there are differences between these different regions, but they all decay with increasing Inline graphic, as expected for short correlations. However, in Inline graphic we see an almost identical behaviour for all three Ramachandran regions. This clearly indicates that the dihedral angles of most of the glutamine residues are correlated in an indirect manner, influencing each other. We compared the Inline graphic of glutamine residues based on their distance Inline graphic and the correlation coefficients of their Inline graphic angles for Inline graphic. Fig. 7d shows that the two vary similarly for different Inline graphic and have a correlation coefficient of about 0.97, suggesting that OR and corr are linearly correlated.

In terms of the error estimate, we note that the estimated standard error for these calculations is different not only for different plots but also for different data points (varying by Inline graphic) in one plot. The latter is the result of having fewer samples with larger Inline graphic than shorter Inline graphic but the former is due to the difference between the population of secondary structures, the number of residues in each peptide, and the number of sampled conformations for each peptide. However, the standard error remains less than Inline graphic kcal/mol in most cases. In some exceptions in Fig. 6e,f the standard error could be as high as Inline graphic kcal/mol.

Radius of Gyration

Here we consider the statistical ensemble results concerning the radius of gyration and its distribution. The radius of gyration Inline graphic gives a simple and intuitive measure of the overall structure of the polyQ peptides as the collapsed (stretched) structures are associated with smaller (larger) values of Inline graphic. Table S1 gives the Inline graphic of the Inline graphic atoms of the Gln residues in Inline graphic and Inline graphic. The proline segments are not included in the calculation of Inline graphic so that the polyQ sequences are compared on equal footing. The averages are accompanied by the standard deviation that somewhat estimates the width of the distribution, if it is close to a normal distribution. The averages do not show much difference between Inline graphic and Inline graphic peptides. The standard deviation is also very similar between the two in most cases except for the case Inline graphic. Fig. 8a shows the Inline graphic distribution of Inline graphic [red] and Inline graphic [blue] peptides that is close to a normal distribution with a longer tail on the right as expected for a random-coil structure. Inline graphic is only slightly more compact. The normal distribution with a slightly longer tail as a characteristic distribution of random coil is seen for all of these peptides except for Inline graphic. Fig. 8b shows that although Inline graphic follows the same distribution, Inline graphic can be estimated as the sum of three distinct Gaussian distributions.
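For reference, the radius of gyration of a set of equally weighted atoms is the root-mean-square distance of the atoms from their centroid. The sketch below assumes one atom per glutamine residue with equal weights and coordinates supplied as plain (x, y, z) tuples; the function name and input layout are illustrative.

```python
import math


def radius_of_gyration(coords):
    """Root-mean-square distance of equally weighted atoms from their centroid.

    coords: list of (x, y, z) tuples, e.g. one selected atom per Gln residue.
    """
    n = len(coords)
    if n == 0:
        raise ValueError("need at least one atom")
    cx = sum(p[0] for p in coords) / n
    cy = sum(p[1] for p in coords) / n
    cz = sum(p[2] for p in coords) / n
    mean_sq = sum(
        (p[0] - cx) ** 2 + (p[1] - cy) ** 2 + (p[2] - cz) ** 2 for p in coords
    ) / n
    return math.sqrt(mean_sq)
```

Restricting `coords` to the glutamine atoms, as done in the text, keeps the comparison between peptides with and without the proline segment on equal footing.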

We used the Marquardt-Levenberg algorithm [62] to fit the probability distribution of Inline graphic as the sum of three Gaussian distributions (see Fig. 8b), each representing one class of structures covering 24, 44, and 32Inline graphic of the samples, distributed around an Inline graphic of 11.41, 13.65, and 17.08 Inline graphic, respectively. The fit gave a reduced Inline graphic smaller than Inline graphic, indicating that this model describes the probability distribution of Inline graphic well. Examining the structures of each class shows that the Inline graphic segment is responsible for this clear difference between the three classes. The structures distributed around Inline graphic, accounting for almost one third of the samples, have relatively stretched conformations (see Fig. Inline graphic), and this correlates with the presence of all-trans prolyl bonds in Inline graphic. In these proline isomers, Inline graphic forms a rigid, stretched helical segment, in contrast with a proline segment that includes one or more cis-isomers, particularly in the middle of the segment (see Fig. Inline graphic). Table S1 shows the trans content of each of the prolyl bonds of Inline graphic as well as the population of the Inline graphic isomers with all-trans prolyl bonds. There is a clear difference between Inline graphic and the rest of the proline-containing peptides in terms of cis-trans isomerization. Although 73–77% of the residues are in the trans conformation in the shorter peptides, only 12–23% of the structures are all-trans. In Inline graphic, 60% of the structures are stretched all-trans conformations. More interestingly, the distribution of the radius of gyration is meaningfully different for the all-trans proline sub-ensemble, as shown in Fig. 4c. The green curve is the Inline graphic distribution of this sub-ensemble and the magenta curve is the Inline graphic distribution obtained from the rest of the structures (i.e., cis-containing polyP). Here we can recognize roughly four normal distributions. We used a method similar to the one described above to fit these Gaussians. We find four clusters with 6, 17, 29, and 48% of the population centered around Inline graphic 11.02, 12.24, 13.94, and 17.27, respectively. The conclusion is that all-trans prolines considerably increase the population of the stretched cluster. This partly explains why we do not observe this partitioning into clusters for the shorter peptides with a proline segment (see Fig. 8a): in those cases the population of all-trans conformations is not large enough to affect the overall Inline graphic distribution.
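The multi-Gaussian description used above can be made concrete with a short sketch. It is not the authors' fitting code: it only evaluates a sum of weighted Gaussians and the reduced chi-square of such a model against a histogram, using the Python standard library. The weights and centers are the values quoted in the text; the widths, the function names, and the choice of three free parameters per Gaussian are assumptions made for illustration.

```python
import math


def gaussian(x, mean, sigma):
    """Normalized Gaussian density evaluated at x."""
    z = (x - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def mixture_pdf(x, components):
    """Sum of weighted Gaussians; components is a list of (weight, mean, sigma)."""
    return sum(w * gaussian(x, m, s) for w, m, s in components)


def reduced_chi_square(bin_centers, observed, errors, components):
    """Chi-square per degree of freedom of the mixture against a histogram.

    Assumes three free parameters (weight, mean, sigma) per Gaussian.
    """
    n_params = 3 * len(components)
    dof = len(bin_centers) - n_params
    if dof <= 0:
        raise ValueError("need more histogram bins than fitted parameters")
    chi2 = sum(
        ((obs - mixture_pdf(x, components)) / err) ** 2
        for x, obs, err in zip(bin_centers, observed, errors)
    )
    return chi2 / dof


# Weights and centers are the values quoted in the text (units as in the text);
# the widths are HYPOTHETICAL placeholders, since the text does not report them.
THREE_CLASSES = [(0.24, 11.41, 0.8), (0.44, 13.65, 1.0), (0.32, 17.08, 1.2)]

if __name__ == "__main__":
    for rg in (11.0, 13.5, 17.0):
        print(f"P(Rg = {rg:.1f}) = {mixture_pdf(rg, THREE_CLASSES):.4f}")
```

In an actual fit, the nine parameters would be adjusted by a least-squares optimizer until the reduced chi-square is minimized; a value well below one, as reported in the text, indicates that the model reproduces the histogram within its error bars.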

As the peptides Inline graphic grow with residue number Inline graphic, their structure becomes more collapsed. In particular, the average radius of gyration for Inline graphic is only about 1.1 Å larger than for Inline graphic. The inset in Fig. 8d illustrates the dependence of the radius of gyration on the length of the peptide. Assuming Inline graphic one can estimate Inline graphic using any pair of peptides such as Inline graphic and Inline graphic from Inline graphic. Fig. 8d gives examples of the estimated Inline graphic for different pairs of Inline graphic and Inline graphic: Inline graphic is given by the indices in the x axis and Inline graphic is Inline graphic (cyan circles) or Inline graphic (yellow squares). There is an abrupt collapse of the structure (Inline graphic) on going from Inline graphic to Inline graphic.
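The two-peptide estimate described above can be sketched as follows, assuming the power-law form in which the radius of gyration is proportional to the number of residues raised to an exponent nu. This form is an assumption here, since the original formula is not reproduced in this text, and the function name and the sample radii are hypothetical.

```python
import math


def scaling_exponent(n1, rg1, n2, rg2):
    """Two-point estimate of nu, assuming Rg = a * N**nu for both peptides."""
    if n1 == n2:
        raise ValueError("the two peptides must differ in length")
    if min(n1, n2, rg1, rg2) <= 0:
        raise ValueError("lengths and radii must be positive")
    # Taking the ratio of the two power laws cancels the prefactor a.
    return math.log(rg2 / rg1) / math.log(n2 / n1)


if __name__ == "__main__":
    # HYPOTHETICAL radii generated from an exact power law with nu = 0.6
    lengths = [5, 10, 20, 40]
    radii = [2.0 * n ** 0.6 for n in lengths]
    pairs = list(zip(lengths, radii))
    for (n1, r1), (n2, r2) in zip(pairs, pairs[1:]):
        print(n1, n2, round(scaling_exponent(n1, r1, n2, r2), 3))
```

If the data followed a single power law, every pair would return the same exponent. Estimates that change strongly with the chosen pair indicate that no single exponent describes the data.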

Discussion

Our atomistic simulations show the disordered nature of monomeric polyglutamine peptides, in agreement with experimental conclusions [6], [13]–[15] and with previous all-atom MD simulations [35]–[38]. Our simulations also agree with recent experiments [18] in that monomeric polyQ differs from a purely random coil or a denatured protein state, with a significant presence of short Inline graphic-helices. Polyglutamine is therefore a disordered peptide that is somewhat preorganized, containing short rigid segments [63], [64]. Contrary to certain coarse-grained models [27]–[29], [31], our atomistic simulations provide no evidence for a large Inline graphic content in monomeric polyglutamines.

We observed that the Inline graphic peptide forms an ensemble of mostly compact structures with an average radius of gyration only about 1.1 Å larger than that of Inline graphic. This agrees with the conclusions from single-molecule force-clamp experiments [24] that polyQ chains collapse to form a heterogeneous ensemble of globular conformations that are mechanically stable. For the radius of gyration of the shorter peptides, we observed an exponent Inline graphic slightly larger than that of a random coil in a good solvent (i.e., about 0.6 [65]). However, we have not been able to simulate a large enough range of peptide sizes to obtain a reliable estimate of Inline graphic. This may not be necessary, since the simulations suggest that the radius of gyration does not follow a power law in any case (see Fig. 8d).

The addition of a short C-terminal proline segment to the Inline graphic peptide changes the distribution of the radius of gyration from a Gaussian-like function with a longer tail for larger Inline graphic (a characteristic of a random coil, also seen in all the other peptides studied here) to a combination of three distinct Gaussians. The way the proline segment affects the Inline graphic distribution is closely correlated with the cis-trans pattern of its prolyl bonds. An all-trans proline segment (the most common pattern in Inline graphic) results in the multi-modal distribution of Fig. 8. In contrast, proline isomers with cis bonds are abundant in the shorter peptides, which results in the normal Inline graphic distribution. We note that prolyl bond isomerization requires crossing barriers of 10–20 kcal/mol, which can only be accomplished with special enhanced-sampling techniques such as those used here [44], [47], [49].

The addition of the polyP segment to polyQ introduces position-dependent features among the Gln residues. This is readily seen in Fig. 3. The fluctuations observed cannot be explained as “noise” resulting from sampling limitations. As explained in the previous section, sampling of independent data produces the same features, which suggests a sensitive dependence on the position of the residue in the sequence. Interestingly, polyP induces helix formation in the more distant residues toward the N-terminus of Inline graphic, while creating more turns in the nearer Gln residues. As a result of the polyP addition, the overall Inline graphic-helical content of Inline graphic increases. This is in contrast with the shorter peptides, in which the Inline graphic-helical content drops considerably when the polyP segment is added.

Experimentally, it has been claimed that the addition of polyP to polyQ decreases the Inline graphic-helical content of polyQ for all polyQ lengths [17]. A superficial comparison might suggest that this contradicts our results for Inline graphic. Our results are, however, in agreement with the experimental data, which are based on the CD spectra of these peptides. These CD spectra identify the distribution of individual backbone dihedral angles rather than the actual Inline graphic-helical content, a quantity that depends not only on the individual residues but also on the way they are aligned. Our simulations fully agree with this observation: we see a decrease in the population of the Inline graphic cluster (i.e., the residues falling into the Inline graphic region of the Ramachandran plot) in all the peptides studied here when we add a Inline graphic segment to the C-terminus (Table 2). As we have pointed out before [41], [42], care is needed in the interpretation of CD data. Table 2 shows that the majority of the residues in the Inline graphic cluster are not involved in any form of helix in either polyQ or polyQ-polyP peptides, and while the helical content of all other peptides decreases, that of Inline graphic actually increases with the addition of Inline graphic. Although we cannot rule out that this effect for Inline graphic is an artifact of a deficiency in the force field, it is interesting to note that it would represent quite an effective way of neutralizing Inline graphic, since the rather stable Inline graphic helix will not be prone to aggregation.

In addition to Inline graphic and Inline graphic helices, as well as Inline graphic and Inline graphic turns, one can identify a small but non-negligible population of extended secondary structures of Inline graphic and Inline graphic strands, particularly in the Inline graphic peptides. PolyP increases the Inline graphic-region content in the Ramachandran plot, but decreases the Inline graphic-strand content (as explained before, several Inline graphic residues need to be adjacent in order to form a Inline graphic-strand). For Inline graphic, the addition of polyP dramatically decreases the content of Inline graphic, Inline graphic, Inline graphic and Inline graphic strands. On the other hand, relatively short PPII helices in polyQ form another extended secondary structure that happens to be more common in Inline graphic peptides than in Inline graphic peptides for Inline graphic. The PPII strands do not form inter-residue hydrogen bonds (hairpins, sheets) and would not favor aggregation.

In this work we used an odds ratio analysis to quantify the dependencies among certain properties of the molecules. Regarding the Inline graphic-strand formation in Inline graphic, the graph for Inline graphic in Fig. 6 shows a positive, long-range correlation in sequence distance. In other words, the chances of two glutamine residues falling into the Inline graphic region of the Ramachandran map correlate positively with each other, even if the residues are distant in the sequence. This long-range correlation was not seen in any peptide other than Inline graphic. Interestingly, this long-range correlation for the Inline graphic peptide is not limited to the Inline graphic-region but is also seen in other regions such as Inline graphic and PPII. In particular, Inline graphic scales for the Inline graphic, Inline graphic and PPII regions as shown in Fig. 7. A linear correlation analysis on the Inline graphic dihedral angle verifies the very same long-range correlation between glutamine residues of the Inline graphic peptide, a correlation that is absent in the other peptides studied here. This surprising phenomenon could be interpreted as the possibility of the growth of any of these secondary structures in the long polyQ peptides, especially if the conformation were “seeded” with a given secondary structure. In a polymeric form of polyglutamine, the nucleation of Inline graphic or Inline graphic strands could result in further growth of those strands or could induce growth in adjacent strands, resulting in the growth of Inline graphic or Inline graphic sheets. Interestingly, the “period” for the oscillations of Inline graphic is approximately 7–8 residues, which is also the optimal experimental extended chain length in an aggregate [7].
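To illustrate the kind of quantity an odds ratio analysis works with, the sketch below computes a log odds ratio from a 2×2 contingency table (in the sense of ref. [40]) for two residues across an ensemble of conformations. It is a generic example, not the authors' correlation function, whose exact definition is not reproduced in this text. The function name, the boolean input format, and the 0.5 pseudocount used to avoid division by zero are assumptions.

```python
import math


def log_odds_ratio(in_region_i, in_region_j, pseudocount=0.5):
    """Log odds ratio for two residues occupying a Ramachandran region.

    in_region_i[k] and in_region_j[k] are True when residue i (or j) falls
    into the chosen region in conformation k. Positive values mean the two
    events tend to occur together; zero means no association.
    """
    if len(in_region_i) != len(in_region_j):
        raise ValueError("both residues need one entry per conformation")
    n11 = n10 = n01 = n00 = 0
    for a, b in zip(in_region_i, in_region_j):
        if a and b:
            n11 += 1
        elif a:
            n10 += 1
        elif b:
            n01 += 1
        else:
            n00 += 1
    # The pseudocount (an assumption of this sketch) keeps the ratio finite
    # when one cell of the 2x2 table is empty.
    numerator = (n11 + pseudocount) * (n00 + pseudocount)
    denominator = (n10 + pseudocount) * (n01 + pseudocount)
    return math.log(numerator / denominator)


if __name__ == "__main__":
    # HYPOTHETICAL occupancy data for two residues over eight conformations
    res_i = [True, True, False, False, True, False, True, False]
    res_j = [True, True, False, False, True, False, False, True]
    print(round(log_odds_ratio(res_i, res_j), 3))
```

Evaluating such a measure for every residue pair and averaging over pairs with the same sequence separation would give a correlation profile as a function of distance along the chain, which is the kind of curve discussed in the text.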

The populations of Inline graphic-strand, Inline graphic-strand, Inline graphic-hairpin, and Inline graphic-hairpin (Table 2) decrease and the long-range correlations Inline graphic and Inline graphic are disrupted by the presence of the C-terminal proline residues in Inline graphic. For shorter peptides, the corresponding populations are much lower, and the Inline graphic correlations are short-ranged. Taken together, these results indicate that for Inline graphic (but not for the shorter peptides) nucleation could start in one of these strands or hairpins (that can align two strands) and then grow from there, favored by the positive correlations generated by the longer peptide.

We can summarize the main findings of this work as follows:

  1. The monomeric Inline graphic peptide forms an ensemble of disordered, mostly compact structures with non-negligible Inline graphic helical content and other secondary structures, and with a very slow growth of the radius of gyration with the number of residues for longer polyQ peptides. This is in agreement with previous experimental and simulation results [6], [13]–[15], [24], [35]–[38]. The average radius of gyration of Inline graphic is only about 1.1 Å larger than that of Inline graphic.

  2. The average radius of gyration for polyQ does not vary with the addition of polyP, but its distribution in Inline graphic is affected by the isomerization states of the polyP segment.

  3. For peptides of all lengths, the population of the Inline graphic region in the Ramachandran plot decreases while the populations of the Inline graphic and PPII Ramachandran regions increase with the addition of polyP.

  4. With respect to secondary structures (i.e., dihedral angles and hydrogen bonds), the addition of polyP increases the PPII and turn contents, and decreases the helical content in all peptides but Inline graphic. These effects probably disfavor aggregation, as PPII structures do not favor backbone H-bonding, turns increase disorder, and the increase of helical content in Inline graphic may also disfavor aggregation as helices are quite stable, with all their H-bonds properly engaged.

  5. Although small, the populations of Inline graphic, Inline graphic, Inline graphic and Inline graphic strands, as well as Inline graphic-hairpins and Inline graphic-hairpins, are considerably larger for Inline graphic than for smaller peptides. These populations decrease when polyP is added. These small secondary structures are good candidates to initiate nucleation: the strands might “attract” other strands to hydrogen bond and the hairpins help to align two strands. Their suppression by the presence of polyP would disfavor aggregation.

  6. An odds-ratio-based correlation function Inline graphic describes how the chances of two Gln residues falling into a given region of the Ramachandran plot correlate. Only Inline graphic shows positive, long-range correlation in sequence space for various regions of the Ramachandran plot. The addition of polyP destroys this long-range correlation for Inline graphic and Inline graphic. In particular, Inline graphic scales for the Inline graphic, Inline graphic and PPII regions. Together with the results described in (5) above, this could be interpreted as the possibility of the growth of the Inline graphic or Inline graphic strands or hairpins already present in disordered Inline graphic (or longer polyQ peptides). Interestingly, the “period” for the oscillations of Inline graphic is approximately 7–8 residues, which is also the optimal experimental extended chain length in an aggregate [7]. A linear correlation analysis on Inline graphic dihedral angles confirms that this period is a “universal” feature of correlations in long polyQ peptides.

Our careful statistical analysis has revealed a wealth of very subtle effects that are far from obvious. Secondary structures such as Inline graphic helices, Inline graphic-sheets, Inline graphic-sheets, PPII helices, and coils have all been reported in the literature. The picture that is emerging is that if one can induce the nucleation of one of these structures, or provide a template for it, a long enough polyQ polymer or an aggregate will probably continue growing in the given conformation, even if it is not the absolute thermodynamic minimum. In this sense, the wealth of conformations of polyQ is reminiscent of the different phases that appear in ‘inorganic’ systems with short-range attractive interactions and long-range electrostatic interactions, such as Langmuir monolayers or block copolymers, where kinetic effects also play a fundamental role in determining the final phase of the system. PolyQ is a very special homopeptide due to its long side chains and the dipoles at their ends. The van der Waals packing of the side chains provides the source of short-range attractive interactions, while the carboxamide groups provide the long-range dipolar interactions [34]. In this sense, the only other peptide that would exhibit similar behavior is polyasparagine, whose side chain has one fewer methylene group [34]. The “collapsed” random coil would just represent the frustration between different phases.

Supporting Information

Figure S1

Inline graphic-helical content of Inline graphic and Inline graphic peptides. Here, we give (a,b) the Inline graphic-helical content (as a percentage) of individual glutamine residues plotted against their residue numbers for Inline graphic [red] and Inline graphic [blue] as obtained from the last 100 Inline graphic of two 200 Inline graphic long independent simulations; (c,d) The Inline graphic-helical content (as a percentage) of individual glutamine residues plotted against their residue numbers for Inline graphic [red] and Inline graphic [blue] as obtained from the third (c) and the fourth (d) 250 Inline graphic of 1000 Inline graphic REMD simulations.

(EPS)

Figure S2

Ramachandran plots of Gln residues in the Inline graphic peptide. On these plots, each pixel represents a Inline graphic bin, whose intensity represents its relative population, ranging from 1,2,Inline graphic, 49, and 50 or more samples out of Inline graphic conformations. Color scheme is as in Fig. 1.

(EPS)

Figure S3

Ramachandran plots of Gln residues in the Inline graphic peptide. See Figures 1 and S2 for the details.

(EPS)

Table S1

Radius of gyration and cis-trans isomerization.

(PDF)

Text S1

This text includes a description of our simulation details, secondary structure assignments, and radius of gyration analysis.

(PDF)

Acknowledgments

We thank the NC State HPC Center for extensive computational support.

Footnotes

The authors have declared that no competing interests exist.

This work was supported by the NSF grants FRG-0804549 and 1021883. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

  • 1. Zoghbi HY, Orr HT. Glutamine repeats and neurodegeneration. Ann Rev Neurosci. 2000;23:217–247. doi: 10.1146/annurev.neuro.23.1.217.
  • 2. Davies SW, Turmaine M, Cozens BA, DiFiglia M, Sharp AH, et al. Formation of neuronal intranuclear inclusions underlies the neurological dysfunction in mice transgenic for the hd mutation. Cell. 1997;90:537–548. doi: 10.1016/s0092-8674(00)80513-9.
  • 3. Michalik A, Van Broeckhoven C. Pathogenesis of polyglutamine disorders: aggregation revisited. Hum Mol Genet. 2003;12:R173–186. doi: 10.1093/hmg/ddg295.
  • 4. Scherzinger E, Lurz R, Turmaine M, Mangiarini L, Hollenbach B, et al. Huntingtin-encoded polyglutamine expansions form amyloid-like protein aggregates in vitro and in vivo. Cell. 1997;90:549–558. doi: 10.1016/s0092-8674(00)80514-0.
  • 5. Scherzinger E, Sittler A, Schweiger K, Heiser V, Lurz R, et al. Self-assembly of polyglutamine-containing huntingtin fragments into amyloid-like fibrils: Implications for huntingtons disease pathology. Proc Natl Acad Sci U S A. 1999;96:4604–4609. doi: 10.1073/pnas.96.8.4604.
  • 6. Chen S, Berthelier V, Yang W, Wetzel R. Polyglutamine aggregation behavior in vitro supports a recruitment mechanism of cytotoxicity. J Mol Biol. 2001;311:173–182. doi: 10.1006/jmbi.2001.4850.
  • 7. Thakur AK, Wetzel R. Mutational analysis of the structural organization of polyglutamine aggregates. Proc Natl Acad Sci U S A. 2002;99:17014–17019. doi: 10.1073/pnas.252523899.
  • 8. Wacker JL, Zareie MH, Fong H, Sarikaya M, Muchowski PJ. Hsp70 and Hsp40 attenuate formation of spherical and annular polyglutamine oligomers by partitioning monomer. Nat Struct Mol Biol. 2004;11:1215–1222. doi: 10.1038/nsmb860.
  • 9. Nagai Y, Inui T, Popiel HA, Fujikake N, Hasegawa K, et al. A toxic monomeric conformer of the polyglutamine protein. Nat Struct Mol Biol. 2007;14:332–340. doi: 10.1038/nsmb1215.
  • 10. Bodner RA, Outeiro TF, Altmann S, Maxwell MM, Cho SH, et al. Pharmacological promotion of inclusion formation: A therapeutic approach for Huntington's and Parkinson's diseases. Proc Natl Acad Sci U S A. 2006;103:4246–4251. doi: 10.1073/pnas.0511256103.
  • 11. Glabe CG, Kayed R. Common structure and toxic function of amyloid oligomers implies a common mechanism of pathogenesis. Neurology. 2006;66:S74–S78. doi: 10.1212/01.wnl.0000192103.24796.42.
  • 12. Kar K, Jayaraman M, Sahoo B, Kodali R, Wetzel R. Critical nucleus size for disease-related polyglutamine aggregation is repeat-length dependent. Nat Struct Mol Biol. 2011;18:328–36. doi: 10.1038/nsmb.1992.
  • 13. Chen S, Ferrone FA, Wetzel R. Huntington's disease age-of-onset linked to polyglutamine aggregation nucleation. Proc Natl Acad Sci U S A. 2002;99:11884–11889. doi: 10.1073/pnas.182276099.
  • 14. Lee CC, Walters RH, Murphy RM. Reconsidering the mechanism of polyglutamine peptide aggregation. Biochemistry. 2007;46:12810–12820. doi: 10.1021/bi700806c.
  • 15. Walters RH, Murphy RM. Examining polyglutamine peptide length: A connection between collapsed conformations and increased aggregation. J Mol Biol. 2009;393:978–992. doi: 10.1016/j.jmb.2009.08.034.
  • 16. Nozaki K, Onodera O, Takano H, Tsuji S. Amino acid sequences flanking polyglutamine stretches influence their potential for aggregate formation. Neuroreport. 2001;12:3357–3364. doi: 10.1097/00001756-200110290-00042.
  • 17. Bhattacharyya A, Thakur AK, Chellgren VM, Thiagarajan G, Williams AD, et al. Oligo-proline effects on polyglutamine conformation and aggregation. J Mol Biol. 2006;355:524–535. doi: 10.1016/j.jmb.2005.10.053.
  • 18. Kim MW, Chelliah Y, Kim SW, Otwinowski Z, Bezprozvanny I. Secondary structure of huntingtin amino-terminal region. Structure. 2009;17:1205–1212. doi: 10.1016/j.str.2009.08.002.
  • 19. Thakur AK, Jayaraman M, Mishra R, Thakur M, Chellgren VM, et al. Polyglutamine disruption of the huntingtin exon 1 n terminus triggers a complex aggregation mechanism. Nat Struct Mol Biol. 2009;16:380–389. doi: 10.1038/nsmb.1570.
  • 20. Darnell GD, Orgel JP, Pahl R, Meredith SC. Flanking polyproline sequences inhibit beta-sheet structure in polyglutamine segments by inducing ppii-like helix structure. J Mol Biol. 2007;374:688–704. doi: 10.1016/j.jmb.2007.09.023.
  • 21. Darnell GD, Derryberry J, Kurutz JW, Meredith SC. Mechanism of cis-inhibition of polyq fibrillation by polyp: Ppii oligomers and the hydrophobic effect. Biophys J. 2009;97:2295–2305. doi: 10.1016/j.bpj.2009.07.062.
  • 22. Wood SJ, Wetzel R, Martin JD, Hurle MR. Prolines and amyloidogenicity in fragments of the alzheimers peptide beta/a4. Biochem. 1995;34:724–730. doi: 10.1021/bi00003a003.
  • 23. Thakur AK, Wetzel R. Mutational analysis of the structural organization of polyglutamine aggregates. Proc Natl Acad Sci U S A. 2002;99:17014–17019. doi: 10.1073/pnas.252523899.
  • 24. Dougan L, Li J, Badilla CL, Berne BJ, Fernandez JM. Single homopolypeptide chains collapse into mechanically rigid conformations. Proc Natl Acad Sci U S A. 2009;106:12605–12610. doi: 10.1073/pnas.0900678106.
  • 25. Starikov EB, Lehrach H, Wanker EE. Folding of oligoglutamines: a theoretical approach based upon thermodynamics and molecular mechanics. J Biomol Struct Dyn. 1999;17:409–427. doi: 10.1080/07391102.1999.10508374.
  • 26. Burke MG, Woscholski R, Yaliraki SN. Differential hydrophobicity drives self-assembly in Huntington's disease. Proc Natl Acad Sci U S A. 2003;100:13928–13933. doi: 10.1073/pnas.1936025100.
  • 27. Barton S, Jacak R, Khare SD, Ding F, Dokholyan NV. The length dependence of the polyq-mediated protein aggregation. J Biol Chem. 2007;282:25487–25492. doi: 10.1074/jbc.M701600200.
  • 28. Marchut AJ, Hall CK. Effects of chain length on the aggregation of model polyglutamine peptides: Molecular dynamics simulations. Prot: Struct Func Bioinf. 2007;66:96–109. doi: 10.1002/prot.21132.
  • 29. Lakhani VV, Ding F, Dokholyan NV. Polyglutamine induced misfolding of huntingtin exon1 is modulated by the flanking sequences. PLoS Comput Biol. 2010;6:e1000772. doi: 10.1371/journal.pcbi.1000772.
  • 30. Laghaei R, Mousseau N. Spontaneous formation of polyglutamine nanotubes with molecular dynamics simulations. J Chem Phys. 2010;132:165102. doi: 10.1063/1.3383244.
  • 31. Digambaranath JL, Campbell TV, Chung A, McPhail MJ, Stevenson KE, et al. An accurate model of polyglutamine. Prot: Struct Funct Bioinf. 2011;79:1427–1440. doi: 10.1002/prot.22970.
  • 32. Daggett V. α-sheet: The toxic conformer in amyloid diseases? Acc Chem Res. 2006;39:594–602. doi: 10.1021/ar0500719.
  • 33. Armen RS, Bernard BM, Day R, Alonso DOV, Daggett V. Characterization of a possible amyloidogenic precursor in glutamine-repeat neurodegenerative diseases. Proc Natl Acad Sci U S A. 2005;102:13433–13438. doi: 10.1073/pnas.0502068102.
  • 34. Babin V, Roland C, Sagui C. The α-sheet: A missing-in-action secondary structure? Prot: Struct Funct Bioinf. 2011;79:937–946. doi: 10.1002/prot.22935.
  • 35. Wang X, Vitalis A, Wyczalkowski MA, Pappu RV. Characterizing the conformational ensemble of monomeric polyglutamine. Prot: Struct Funct Bioinf. 2006;63:297–311. doi: 10.1002/prot.20761.
  • 36. Vitalis A, Wang X, Pappu RV. Quantitative characterization of intrinsic disorder in polyglutamine: Insights from analysis based on polymer theories. Biophys J. 2007;93:1923–1937. doi: 10.1529/biophysj.107.110080.
  • 37. Vitalis A, Wang X, Pappu RV. Atomistic simulations of the effects of polyglutamine chain length and solvent quality on conformational equilibria and spontaneous homodimerization. J Mol Biol. 2008;384:279–297. doi: 10.1016/j.jmb.2008.09.026.
  • 38. Wang Y, Voth GA. Molecular dynamics simulations of polyglutamine aggregation using solvent-free multiscale coarse-grained models. J Phys Chem B. 2010;114:8735–8743. doi: 10.1021/jp1007768.
  • 39. Ramachandran GN, Ramakrishnan C, Sasisekharan V. Stereochemistry of polypeptide chain configurations. J Mol Biol. 1963;7:95–99. doi: 10.1016/s0022-2836(63)80023-6.
  • 40. Edwards AWF. The measure of association in a 2×2 table. J Royal Stat Soc Series A (General) 1963;126:109–114.
  • 41. Moradi M, Babin V, Sagui C, Roland C. A statistical analysis of the PPII propensity of amino acid guests in proline-rich peptides. Biophys J. 2011;100:1083–1093. doi: 10.1016/j.bpj.2010.12.3742.
  • 42. Moradi M, Babin V, Sagui C, Roland C. PPII propensity of multiple-guest amino acids in a proline-rich environment. J Phys Chem B. 2011;115:8645–8656. doi: 10.1021/jp203874f.
  • 43. Babin V, Sagui C. Conformational free energies of methyl-β-l-iduronic and methyl-β-d-glucuronic acids in water. J Chem Phys. 2010;132:104108. doi: 10.1063/1.3355621.
  • 44. Moradi M, Babin V, Roland C, Sagui C. A classical molecular dynamics investigation of the free energy and structure of short polyproline conformers. J Chem Phys. 2010;133:125104. doi: 10.1063/1.3481087.
  • 45. Geyer CJ. Computing Science and Statistics: The 23rd symposium on the interface. Fairfax: Interface Foundation of North America; 1991. Markov chain monte carlo maximum likelihood. pp. 156–163.
  • 46. Sugita Y, Okamoto Y. Replica-exchange molecular dynamics method for protein folding. Chem Phys Lett. 1999;314:141.
  • 47. Moradi M, Babin V, Roland C, Darden T, Sagui C. Conformations and free energy landscapes of polyproline peptides. Proc Natl Acad Sci U S A. 2009;106:20746. doi: 10.1073/pnas.0906500106.
  • 48. Frenkel D, Smit B. Understanding Molecular Simulation. Comput Sci Ser Acad Press; 2002.
  • 49. Moradi M, Lee JG, Babin V, Roland C, Sagui C. Free energy and structure of polyproline peptides: an ab initio and classical molecular dynamics investigation. Int J Quant Chem. 2010;110:2865–2879.
  • 50. Babin V, Roland C, Sagui C. Adaptively biased molecular dynamics for free energy calculations. J Chem Phys. 2008;128:134101. doi: 10.1063/1.2844595.
  • 51. Babin V, Karpusenka V, Moradi M, Roland C, Sagui C. Adaptively biased molecular dynamics: An umbrella sampling method with a time-dependent potential. Int J Quant Chem. 2009;109:3666–3678.
  • 52. Case DA, Darden TA, Cheatham TE, III, Simmerling CL, Wang J, et al. “AMBER 10”. San Francisco: University of California; 2008.
  • 53. Hornak V, Abel R, Okur A, Strockbine B, Roitberg A, et al. Comparison of multiple amber force fields and development of improved protein backbone parameters. Proteins. 2006;65:712–725. doi: 10.1002/prot.21123.
  • 54. Onufriev A, Bashford D, Case DA. Modification of the generalized Born model suitable for macromolecules. J Phys Chem B. 2000;104:3712–3720.
  • 55. Onufriev A, Bashford D, Case DA. Exploring protein native states and large-scale conformational changes with a modified generalized Born model. Proteins. 2004;55:383–394. doi: 10.1002/prot.20033.
  • 56. Weiser J, Shenkin PS, Still WC. Approximate Atomic Surfaces from Linear Combinations of Pairwise Overlaps (LCPO). J Comp Chem. 1999;20:217–230.
  • 57. Zimmerman SS, Pottle MS, Némethy G, Scheraga HA. Conformational analysis of the 20 naturally occurring amino acid residues using ecepp. Macromolecules. 1977;10:1–9. doi: 10.1021/ma60055a001.
  • 58. Kabsch W, Sander C. Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers. 1983;22:2577–2637. doi: 10.1002/bip.360221211.
  • 59. Joosten RP, Te Beek TAH, Krieger E, Hekkelman ML, Hooft RWW, et al. A series of pdb related databases for everyday needs. Nucleic Acids Res. 2011;39:D411–D419. doi: 10.1093/nar/gkq1105.
  • 60. Frishman D, Argos P. Knowledge-based secondary structure assignment. Prot: Struct Funct Bioinf. 1995;23:566–579. doi: 10.1002/prot.340230412.
  • 61. Humphrey W, Dalke A, Schulten K. VMD – Visual Molecular Dynamics. J Mol Graph. 1996;14:33–38. doi: 10.1016/0263-7855(96)00018-5.
  • 62. Levenberg K. A method for the solution of certain non-linear problems in least squares. Q App Math. 1944;2:164–168.
  • 63. Rose GD, Fleming PJ, Banavar JR, Maritan A. A backbone-based theory of protein folding. Proc Natl Acad Sci U S A. 2006;103:16623–16633. doi: 10.1073/pnas.0606843103.
  • 64. Fitzkee NC, Rose GD. Reassessing random-coil statistics in unfolded proteins. Proc Natl Acad Sci U S A. 2004;101:12497–12502. doi: 10.1073/pnas.0404236101.
  • 65. Flory PJ. Principles of Polymer Chemistry. Ithaca, NY: Cornell Univ. Press; 1953.
