Abstract
We have developed a computational method of atomistically refining the structural ensemble of intrinsically disordered peptides (IDPs) facilitated by experimental measurements using circular dichroism spectroscopy (CD). A major challenge surrounding this approach stems from the deconvolution of experimental CD spectra into secondary structure features of the IDP ensemble. Currently available algorithms for CD deconvolution were designed to analyze the spectra of proteins with stable secondary structures. Herein, our work aims to minimize any bias from the peptide deconvolution analysis by implementing a non-negative linear least-squares fitting algorithm in conjunction with a CD reference data set that contains soluble and denatured proteins (SDP48). The non-negative linear least-squares method yields the best results for deconvolution of proteins with higher disordered content than currently available methods, according to a validation analysis of a set of protein spectra with Protein Data Bank entries. We subsequently used this analysis to deconvolute our experimental CD data to refine our computational model of the peptide secondary structure ensemble produced by all-atom molecular dynamics simulations with implicit solvent. We applied this approach to determine the ensemble structures of a set of short IDPs, that mimic the calmodulin binding domain of calcium/calmodulin-dependent protein kinase II and its 1-amino-acid and 3-amino-acid mutants. Our study offers a, to our knowledge, novel way to solve the ensemble secondary structures of IDPs in solution, which is important to advance the understanding of their roles in regulating signaling pathways through the formation of complexes with multiple partners.
Significance
It is challenging to experimentally determine the structural ensemble of an intrinsically disordered peptide (IDP) alone because it lacks a defined structure in solution. Herein, we have developed a computational method of atomistically refining the structural ensemble of IDPs from the experimental measurement by circular dichroism. Our study offers a, to our knowledge, novel way to solve the secondary structures of the IDPs in solution, which is important to advance the understanding of their roles in regulating signaling pathways through the formation of complexes with multiple partners.
Introduction
Intrinsically disordered proteins/peptides (IDPs) are a category of proteins that possess a poorly defined equilibrium structure; they sample an ensemble of weakly ordered and unordered structures in solution (1, 2, 3, 4, 5). IDPs have been shown to play a central role in biological systems through cellular signaling, regulation, and translation (4,6,7). Additionally, misregulated IDPs are associated with cancer (8) and neurodegenerative diseases (9, 10, 11) such as Alzheimer’s disease. A distinguishing feature of IDPs is that they do not adhere to the classical structure-function paradigm and typically form stable secondary or tertiary structures only upon binding to target proteins (12,13). The lack of stable structures in the ensemble of unbound state (14, 15, 16, 17) enables binding to multiple targets on demand while maintaining a degree of selectivity and specificity because of their polymorphic properties (18). Multiple binding pathways exist between a given IDP and its protein targets (19). Furthermore, IDPs are susceptible to post-translational modifications (20, 21, 22, 23).
It is challenging to determine the structural feature from an ensemble of IDPs. Popular methods for experimental structure determination of proteins, such as cryogenic electron microscopy or crystallography, are incapable of determining the structure of IDPs (12). Solution experimental methods such as NMR spectroscopy are only able to produce an ensemble-averaged structure; thus, additional analysis must be performed to generate the structural ensemble (24). Computational approaches such as molecular dynamics (MD) simulations are also used to generate IDP structures; however, these methods largely rely on Hamiltonians whose coefficients are tuned using experimentally determined structures of stable proteins, resulting in overly biased structures (25, 26, 27). To address these drawbacks, combined computational and experimental approaches have also been used (16,28, 29, 30). A necessary feature of these combined approaches is the conversion between experimental observables and computationally generated structures. NMR structure back calculations (31) use a database that provides a relationship between known structures and chemical shifts. Unfortunately, the NMR chemical shift databases consist of conformationally stable proteins with α-helix and β-sheet structures instead of IDPs. The relationship between the spectroscopic observables and the distinguishing feature of a given protein is deconvoluted from the set of reference structures so that the features of proteins with unknown structures can be determined. This is the typical method for generation of computational models and force field refinement using spectroscopic methods (29,31, 32, 33, 34, 35). Despite the popularity of NMR analysis, there are several advantages to using circular dichroism (CD) spectroscopy for the analysis of IDPs in certain circumstances. CD measurements are of low cost, can be quickly performed, and require a small amount of sample material (36,37); however, they cannot provide high-resolution (residue-specific) structure approximations.
We used several standard CD deconvolution algorithms, including SELCON3, CDSSTR, and CONTIN/LL (38), and the reference data set SDP48 (39) to analyze our IDP experimental CD spectra. We discovered incoherent outcomes on measuring IDPs, mostly likely due to biases from these algorithms that favor defined secondary structures from stable globular proteins. Another deconvolution algorithm with no such biases, developed in the 1980s, uses a non-negative least-squares (NN-LSQ) fitting method for solving globular structures (40). However, the reference data set the authors used then does not include any information from denatured peptides. Now that we have noticed the knowledge gap for solving the CD spectra from IDPs, we applied NN-LSQ in conjunction with the SDP48 data set developed in the 2000s to infer secondary structures in our study. The results suggest that the NN-LSQ method in conjunction with the SDP48 reference data set is superior for proteins with high degrees of disorder, prompting us to use the results from this method in subsequent analysis. Using the secondary structure features from our CD deconvolution, we extract an approximate ensemble of structures from all-atomistic MD simulations. We applied this approach on refining the structures of a set of small disordered peptides derived from the calmodulin (CaM)-binding domain of calcium/CaM-dependent protein kinase II (CaMKII, 293–312) and its 1-amino-acid and 3-amino-acid mutants (see Table 1 for the amino acid sequences). These peptides were chosen for detailed examination because the mutated peptide induces a significant (up to 3000-fold) decrease in the CaM binding equilibrium dissociation constant (Kd) in solution at physiological ionic strength (41), with no current understanding of the underlying mechanism.
Table 1.
Peptide | Sequence |
---|---|
RRK (wild-type) | 293FNARRKLKGAILTTMLATRN312 |
RAK (mutant: 1 site) | 293FNARAKLKGAILTTMLATRN312 |
AAA (mutant: 3 sites) | 293FNAAAALKGAILTTMLATRN312 |
With our combined approach of CD experiments and MD simulations, we have unexpectedly discovered that the increase of secondary structures in a particularly revealing peptide mutant (AAA) was due to the formation of a β-hairpin conformation, which we speculate is the mechanism behind changes in the encounter rate for this set of IDPs. Obtaining the structural ensembles of CaMKII peptides was a necessary and essential step toward a more accurate estimation of their binding rates for CaM and presently serves as a, to our knowledge, novel example for how secondary structure can be a barrier to productive protein-protein interactions. The deconvolution of CD spectra and subsequent refinement of MD data associated with IDPs and proteins with significant disorder is extremely useful for studying the all-atom conformational dynamics of IDPs, which continue to remain elusive.
Materials and Methods
Peptide synthesis and preparation
The three 20-amino-acid-long peptides used in this study were modeled after the CaM-binding domain of CaMKII (residues 293–312; see amino acid sequences in Table 1) and were synthesized by LifeTein LLC (Somerset, NJ). Their purity was greater than 95%, and the composition of each peptide was validated by mass spectroscopy. The kinetics of their binding to calcium/CaM was determined previously using stopped-flow fluorimetry (41), and the values for kon and koff can also be found in Table 1. These experiments revealed that mutation (R296A/R297A/K298A) produced a 3000-fold decrease in the binding affinity.
Measurement with CD spectroscopy
Far-ultraviolet CD spectra were collected on a JASCO-815 spectrophotometer (Easton, MD) controlled by Spectra Manager software. Suprasil cuvettes with a 1.0 mm pathlength were used for all experiments. The spectrometer parameters were typically set to the following unless noted otherwise: bandwidth, 1 nm; response time, 1 s; and data pitch, 0.2 nm/min. A solution consisting of 100 μM peptide was made using 10 mM Tris buffer (pH 7.5), and measurements were taken by scanning the excitation wavelength between 190 and 260 nm with temperature controlled at 20°C. A total of 10 data accumulations for each run were made with a sweep rate of 100 nm/min. Data collection was repeated for each peptide a total of three times, using a freshly prepared sample in each run.
Deconvolution of CD using a standard package
The CDPro software package suite (38,42) was used to deconvolute the experimental CD spectra of the wild-type and mutant CaMKII peptides. We used the soluble and denatured protein (SDP48) data set in conjunction with the CDPro standard numerical fitting methods: CDSSTR, CONTIN/LL, and SELCON3 (38,42). Because CDPro gives reliable results with CD data in the range of wavelengths 190–240 nm when a large reference set is used (such as SDP48) (38), we inputted our data in the same range in increment of 1 nm. The resulting structure approximation is presented as fractional values for six main secondary structure categories: helix (regular), helix (distorted), strand (regular), strand (distorted), turn, and unordered. We generalized the secondary structure codes into four main categories by consolidating the helix (regular) and helix (distorted) into the helix category and strand (regular) and strand (distorted) into the strand category for comparison of the structure fractions produced by other analysis methods (see Table 2).
Table 2.
Defined Structure Categories | CDPro Structures | DSSP Structures | CPPTRAJ Implementation of DSSP |
---|---|---|---|
helix | helix (regular) | α-helix | α-helix |
helix (distorted) | 3–10 helix | 3–10 helix | |
β-strand | strand (regular) | β-strand | parallel β-sheet |
strand (distorted) | antiparallel β-sheet | ||
turn | turn | turn | turn |
bend | |||
other | unordered | π-helix | π-helix |
β-bridge | none | ||
irregular/loop | |||
turn (1 residue) | |||
bend (1 residue) |
We choose a consolidation scheme similar to Kardos et al. (86), in which the π-helix secondary structure is counted as unordered because of its lack of distinction as a stable secondary structure. The DSSP was implemented using the AMBERTOOLS trajectory analysis software CPPTRAJ, which contains an alternate set of structure codes despite using the DSSP algorithm.
Deconvolution of CD data using NN-LSQ fitting
To reliably deconvolute the experimentally determined CD spectra of the wild-type and mutant CaMKII peptides, we applied an NN-LSQ fitting method (40). Unlike other tools using generalized spectra for each secondary structure, we used the reference data set SDP48, which consists of CD spectra of 48 soluble and denatured proteins, as the basis spectra. We assumed that a linear combination of CD spectra from the reference data set is sufficient to approximate the experimental spectra seen in RRK, RAK, or AAA and that the CD spectra of the reference proteins are linearly independent. The squared difference between any non-negative linear combination of the CD basis spectra and the experimental CD spectrum Δ2 is minimized by finding the optimal weight coefficients , as shown in Eq. 1.
(1) |
where C is the 51 × 48 matrix representing the 51 CD spectrum points for all 48 reference proteins of SDP48, is a vector (xi ≥ 0, i = 1, 2, …, 48) of the weight coefficients for the reference proteins, and is the 51 by 1 vector of the experimentally measured CD values of the CaMKII peptide in the 190–240 nm wavelength range. The weight coefficients vector is determined by NN-LSQ fitting. To note, here each coefficient is not bounded between 0 and 1 to account for the possible differences in the signal amplitude in our experimental results and the reference data set CD spectrum.
We subsequently use the fitted weight coefficients vector to compute the secondary structure fractions given by Eq. 2,
(2) |
where A is the 6 × 48 matrix representing the six possible secondary structure fractions for each of the 48 reference proteins and is the 6 × 1 secondary structure solution for the CaMKII peptide.
The resulting structure approximation is presented as fractional values for six main secondary structure categories: helix (regular), helix (distorted), strand (regular), strand (distorted), turn, and unordered. We generalized the secondary structure codes into four main categories by consolidating the helix (regular) and helix (distorted) into the helix category, and strand (regular) and strand (distorted) into the strand category for comparison of secondary structure fractions produced by other analysis methods (see Table 2).
Validation of NN-LSQ deconvolution results
The performance of NN-LSQ, CONTIN/LL, SELCON3, and CDSSTR deconvolution methods were compared using the root mean square deviation (δ) and correlation (r) coefficients shown in Eqs. 3 and 4, originally defined by Woody and Sreerama (38). The analysis uses subsets of 411 proteins obtained from the Protein Circular Dichroism Data Bank (PCDDB) (43) with known secondary structures and CD spectra. The PCDDB entries for the selected spectra are provided in Table S11.
(3) |
and
(4) |
where N is the number of proteins, is the structure content obtained from CD deconvolution for structure i, and is the known structure fractional content for structure i.
All-atom MD simulations with implicit solvent of the peptides
MD setup and initialization
Because there is no high-resolution solved structure because of the disordered nature of the CaMKII peptides, we built the initial structures for MD simulations using the LEaP module of AMBERTOOLS 14 (44) based only on the amino acid sequences (Table 1). To be consistent with the experimental study (41), the N- and C-termini of these peptides were not capped or modified. All MD simulations were carried out using the package AMBER 14 with the ff99sb force field (26,44). We used an implicit solvent model with the generalized Born (45, 46, 47) approximation and the modified Born radius parameter set mbondi2 (46). We performed energy minimization on the initial structures using 1000 steps of conjugate gradient, followed by 1000 steps of steepest descent algorithms. The minimized structures were brought to the desired temperatures in two steps: heating each minimized structure to 277, 285, or 293 K, followed by a simulated annealing cycle. Simulated annealing was carried out by heating structure coordinates obtained in the previous step to 400 K over a period of 600 ps, followed by cooling to the designated temperature over a period of 600 ps with velocity randomization every 100 ps. All setup runs used a time step of 2 fs. We restrained hydrogen dynamics by employing the SHAKE algorithm (48). We used Langevin dynamics with a collision frequency of 2 ps−1 to regulate the temperature (Langevin thermostat); periodically randomizing the velocity distributions was therefore necessary to avoid the synchronization effects associated with Langevin thermostats (49).
MD production runs
We performed all-atomistic implicit solvent simulations for each CaMKII peptide at 277, 285, and 293 K, replicating the operating temperature of the stop-flow kinetics experiment (41), a midpoint temperature, and the operating temperature of the CD measurements, respectively. The production run was performed at the designated temperature for a period of 80 ns with a 2-fs time step. We sampled energy and trajectory data every 4 ps, which was determined through correlation time analysis. All simulation steps from the setup and production runs were repeated an additional 14 times for every temperature and peptide combination, resulting in a total production run simulation time of 2.4 μs (per peptide per temperature). Trajectories were tested for convergence using two approaches: Kullback-Liebler divergence (50,51) between distributions of the potential energy in accumulated simulation time (Table S5) and cluster analysis with respect to simulation time (Fig. S5). Details of convergence analysis can be found in the Supporting Materials and Methods.
Data-guided extraction of all-atom peptide conformation ensembles
Determination of the secondary structure content in MD trajectories
The secondary structure content of the peptides was computed using the CPPTRAJ module of AMBERTOOLS (52), which calculates structure content based on the Dictionary of Secondary Structures of Proteins (DSSP) (53). The results of our structure analysis generated seven possible secondary structure categories per residue: α-helix, parallel β-sheet, antiparallel β-sheet, 3–10 helix, π-helix, turn, and unordered. We consolidated the seven secondary structure categories into four generalized secondary structure categories (see Table 2) and generated a histogram of the structure codes associated with each residue to produce the overall fractional secondary structure values in each trajectory frame.
Refinement of IDP ensemble structures from MD using CD deconvolution data
Using the secondary structure data for each frame of our MD trajectories, we selected pairs of trajectory frames that produce average secondary structure fractions similar to those observed in the CD deconvolution data from our NN-LSQ fitting. For a given peptide trajectory, frames are extracted in pairs if the following equality is satisfied for each structure fraction:
(5) |
where and are the fractional values for the k-th structure category (helix, β-sheet, turn, or unordered secondary structure categories) for frames i and j and is the structure fraction for category k derived from our NN-LSQ deconvolution results.
Contact map analysis
CD-guided MD structures of the peptides from the CD-refined ensemble were used for contact map analysis. The definitions are described as follows:
-
1)
A contact between residue i and j (at least four residues away) is formed if any atom from residue i is within a cutoff distance of 4 Å of any atom from the residue j.
-
2)
A backbone (side-chain) contact between residue i and j (at least four residues away) is formed if any backbone (side-chain) atom from the residue i is within a cutoff distance of 4 Å of any backbone (side-chain) atom from the residue j. A single hydrogen atom from glycine is considered as its side chain.
-
3)
A hydrogen bonding contact between residue i and j is formed if a donor atom (D) from residue i is within a cutoff distance of 4 Å of an acceptor atom (A) from residue j and the D-H-A angle through a bonding hydrogen (H) is within a cutoff angle of 30°.
Results
CD spectra indicate a distinct secondary structure shift between RRK and AAA
The CD spectra presented in Fig. 1 show the average secondary structure ensembles of RRK, RAK, and AAA peptides: three peptides of identical length exhibiting significantly different binding kinetics with CaM (41). In general, a negative CD band at 220 nm indicates the presence of helical or strand structures, and a negative band at 195 nm corresponds to denatured or disordered structures (54). Here, the experimental data show the existence of a secondary structure in AAA that does not exist in RRK or RAK. These data suggest that each charged residue mutation reduces the disordered content of the peptide’s structure ensemble. Overall, the charged residue mutations of R296A/R297A/K298/A result in a significant conformational change from the disorder in RRK to the more ordered structures in AAA.
We first speculated the increased structures in AAA were due to a helical secondary structure because alanine residues have the highest propensity to form α-helices (55). However, the experimental CD spectrum for AAA displays only one negative peak at 222 nm but is missing a second smaller signal peak at 208 nm, which is a hallmark of α-helical regions in CD spectra (54). This indicates that there is a mixture of secondary structure components in the peptides. Therefore, we employed CDPro to deconvolute the CD spectra on the three peptides in the next section.
Standard CD deconvolution solvers produce inconsistent results on the content of secondary structures
We employed three standard CD deconvolution solvers, CDPro, CAPITO, and BeStSeL, in attempts to analyze the structural information of the peptide spectra shown in Fig. 1. We found that all three algorithms show nonconvergence and unacceptably large RMSDs compared with the experimental spectra as follows:
-
1)
CDPro: We generalized the secondary structure codes used by CDPro into four main categories (see Table 2). The three standard deconvolution solvers (CDSSTR, SELCON3, and CONTIN/LL) from CDPro generate inconsistent fractions of secondary structures as shown in Table 3. The CONTIN/LL method shows that RRK contains mostly turn and unordered secondary structures; however, the CDSSTR method shows that RRK contains similar quantities of structured and unstructured regions. In the AAA deconvolution results, the CDSSTR and CONTIN/LL methods suggest opposing secondary structure content, with CDSSTR resulting in the increase of helical fractions and CONTIN/LL resulting in the increase of turn content. The SELCON3 methods appear to perform the worst among the three, giving large RMSDs between the reconstructed CD spectra and the experimental data (Fig. 2) and producing unrealistic fractions of secondary structures.
-
2)
CAPITO: Use of more recently developed tools for the analysis of CD spectra either shows large RMSDs or underestimates the fraction of unordered secondary structure for proteins with rich disordered segments. Specifically, CAPITO (56), which uses basis spectra for each of the α-helix, β-strand, and irregular secondary structures extracted from SP-175, produced a poor fit for the CaMKII peptides (see Fig. S1; Table S1).
-
3)
BeStSel (57) carries out a detailed secondary structure analysis, providing information on eight secondary structure components, and provides improved estimation of the β-strand content. Our analysis of the CaMKII peptides with BeStSeL produced relatively large RMSDs (Fig. S2) and reinforces that present CD analysis tools are not useful for this class of peptides.
Table 3.
Helix | Strand | Turn | Unordered | RMSD in Δε | ||
---|---|---|---|---|---|---|
RRK | ∗SELCON3 | 0.00 | −0.06 | −0.07 | 1.28 | 15.86 |
CDSSTR | 0.15 | 0.32 | 0.28 | 0.24 | 1.38 | |
CONTIN/LL | 0.01 | 0.01 | 0.10 | 0.88 | 0.35 | |
NN-LSQ | 0.04 | 0.15 | 0.09 | 0.72 | 0.26 | |
RAK | SELCON3 | 0.04 | 0.03 | 0.01 | 0.94 | 4.72 |
CDSSTR | 0.17 | 0.29 | 0.23 | 0.31 | 0.84 | |
CONTIN/LL | 0.03 | 0.02 | 0.08 | 0.87 | 0.40 | |
NN-LSQ | 0.04 | 0.22 | 0.13 | 0.62 | 0.23 | |
AAA | SELCON3 | 0.29 | 0.20 | 0.19 | 0.36 | 3.26 |
CDSSTR | 0.37 | 0.30 | 0.16 | 0.17 | 0.63 | |
CONTIN/LL | 0.03 | 0.05 | 0.30 | 0.62 | 0.44 | |
NN-LSQ | 0.04 | 0.34 | 0.17 | 0.46 | 0.36 |
The approximate CD spectrum representing the CaMKII peptides is recreated from a linear combination of SDP48 known conformation and spectra definitions that we developed. The RMSD between the approximated and experimental spectrum (Δε) is given in unit of M-1 cm -1.
SELCON3 was unable to reach a convergent solution during the analysis of RRK.
CD deconvolution with NN-LSQ fitting indicates presence of β-hairpin secondary structure
The inconsistencies associated with the standard deconvolution models prompted us to review the fitting methods from the three standard deconvolution solvers. We noted that these methods overly favor helical content by fitting the CD spectrum to a data set of predominantly globular or membrane-bound proteins, as well as by employing algorithms emphasizing the weights on helical structures. To avert these two issues, we chose to fit the CD spectrum with the data set of denatured proteins (SDP48) and search for alternative fitting routines. It is necessary to use the data set of only denatured proteins (rendering the lowest RMSD between the approximated and experimental spectrum (Δε)) because using other data sets made up of globular proteins do not yield good fits (rendering large RMSDs between the approximated and experimental spectrum (Δε)), as shown in Table S2. We used NN-LSQ fitting, which simultaneously took into account the data from all protein structures in the SDP48 reference set and made no a priori assumptions about the secondary structure. Our NN-LSQ fit deconvolution results, presented in Table 3, indicate that the primary effects of the mutation in the CaMKII peptides emerge through an increase in the β-sheet category (strand) secondary structure, whereas the helical content remains the same. The increase in the strand secondary structure is naturally associated with a decrease in disordered secondary structure, where RRK has the highest disordered content with 72%, and AAA has the lowest disordered content with 46%.
All-atom MD simulations produce strongly biased structure ensembles
To generate an equilibrium ensemble of structures for the three peptides, we employed all-atom MD simulations with implicit solvent at three temperatures: 277, 285, and 293 K. A total of 2.4 μs of data sampled at 4-ps intervals was collected for each peptide and temperature combination and analyzed for their secondary structure content using the DSSP. Data produced from this analysis were translated into a four-category generalized secondary structure scheme shown in Table 2. The secondary structure fractions for each trajectory were first averaged to illustrate the overall conformational trend produced in each simulation (Fig. 3). The analysis of the secondary structures shows that the MD simulations was incapable of generating an ensemble of structures that match with the CD analyses. More specifically, compared with the deconvoluted secondary structure fractions from the CD data, the MD ensembles illustrate a significant bias toward helical content. The data for all peptides at all temperatures show the β-sheet content at less than 5% and the helical, turn, and unordered content in the range of 30–40%. Additionally, the DSSP analysis indicates no significant overall secondary structure shift between the wild-type and mutant peptides for all three temperatures. This finding is in contrast to the deconvolution data in NN-LSQ fitting, in which the fraction of β-sheet content for RRK is 15% and increases to 22% and 34% for RAK and AAA, respectively. In summary, the outcomes from the MD simulations do not appear to accurately represent the secondary structure shift that occurs between mutant peptides as indicated by our CD data.
Approximate structure ensemble of IDPs from all-atom trajectories and CD deconvolution
To gain useful information from the MD simulations that agrees with our CD deconvolution data, we select pairs of trajectory frames from the production run with similar averaged secondary structure fractions as those observed in our NN-LSQ fitted CD deconvolution data shown in Table 3. We analyzed peptide trajectories for the 293 K production run using a ϕ-value of 0.035 for each structure category (Eq. 5). Using the criteria, we obtained 11,002 structures for RRK, 2410 structures for RAK, and 130 structures for AAA. Deviations between the number of structures generated for each peptide appears to be correlated with the relative β-sheet content. All MD trajectories displayed poor sampling of β-sheet structures (Fig. 3), which may explain the decreasing number of extracted frames as the β-sheet content for each peptide increases. Specifically, ∼5% of the sampled trajectory frames contained β-sheet secondary structures for all three peptides. The results from NN-LSQ deconvolution showed that RRK, RAK, and AAA contained 15, 22, and 34% β-sheet secondary structures, respectively (Table 3). Because RRK contained the lowest fraction of β-sheet content, significantly more frames were able to be extracted from our trajectories than for AAA using Eq. 5.
We assume that the structural ensemble of the MD simulations is biased but still samples the correct peptide conformations in significantly smaller quantities. Because spectroscopic methods produce observables corresponding to the ensemble-averaged state, we only require that the extracted MD frames produce an ensemble whose average corresponds to the experimental CD data. Using the solutions obtained from CD deconvolution enables us to separate MD trajectory data that agree with the experimental data from biased trajectory data.
A set of 10 structures representing each peptide ensemble was generated by clustering. Initially, the Hieragglo clustering method from CPPTRAJ with 10 total clusters was used. The results of this clustering method appear to be misleading because of the disproportionally large populations of the first clusters in RRK and RAK (Fig. S7; Table S6). Because the CaMKII peptides possess significant fractions of disordered content, it is likely that these large clusters have conformational variation within them and are poor representations of the ensemble. To gain better resolution of the representative ensemble structures, a previously developed clustering algorithm was chosen to resolve the extracted structures. The combinatorial averaged transient structure (CATS) method has produced better structure resolution for IDPs than traditional clustering methods (58) and is therefore employed in this study.
The selected structures (Fig. 4) from CATS represent a set of highly probable conformations exhibited by the peptides in solution. Based on these representative structures, RRK and RAK display significant conformational variation compared with AAA, which forms compact β-sheet structures. In our NN-LSQ CD deconvolution results, RRK and RAK present a high percentage of unordered structure at 72 and 62%, respectively. On the other hand, AAA possesses a lower degree of unordered structure at 46% (Table 3). This result is consistent with the generated structure ensemble, which possess a maximal RMSD of 12.5 Å for RRK, 12.4 Å for RAK, and 10.5 Å for AAA (Table S3). The RMSD analysis of the generated ensemble also illustrates that AAA has the lowest standard deviation of RMSD values (1.0 Å) compared with RRK and RAK (1.4 and 1.7 Å, respectively).
An increasing secondary structure content can be observed in Fig. 4 as a result of each sequential mutation of the RRK peptide. This observation is in agreement with the shift in ordered and disordered content predicted by CD deconvolution, despite the apparent force field bias observed in the analysis of the complete trajectories (Fig. 3). We acknowledge that the precise quantitative shift in secondary structure fractions in each mutant may not be completely represented by the generated ensembles shown in Fig. 4; however, they illustrate the approximate location of residual secondary structure.
The set of RRK structures (Fig. 4 A) contains relatively small regions of helical and β-hairpin regions. The N-terminus appears to be largely unstructured, with the ability to participate in β-strand formation with C-terminus residues. On the other hand, the C-terminus appears to form turn, helical, and hairpin structures more readily with other C-terminal or central residues. In the set of RAK structures (Fig. 4 B), the presence of β-strand conformations is more prevalent in comparison with RRK. It can be observed that the N-terminus of RAK participates in the majority of β-hairpin structure formation. Additionally, the number of structures with turn/helical regions in the C-terminus has decreased with respect to RRK; however, this appears to be correlated with the increase in β-hairpin structure formation. Lastly, the set of AAA structures (Fig. 4 C) all contain the β-hairpin secondary structure; however, there appear to be two distinct variations of the hairpin: a symmetrical β-hairpin structure and an asymmetrical hairpin-helix structure. The asymmetrical structures begin their hairpin motif closer to the N-terminus and form a helical structure on the unbound C-terminal tail. Alternatively, the symmetrical structures start forming the hairpin motif in the central region of the peptide, with N- and C- terminal binding instead.
Contact map analysis shows AAA mutant adopts strong secondary structure formation
To gain more insight in the characteristics of the differential hairpin structures in the three peptides, we analyzed the amino acid contacts formed by each peptide (Fig. 5). The CaMKII peptide can be broken down into three regions: N-terminus, C-terminus, and the center. The N-terminal region (293–298) contains positively charged residues in RRK/RAK and neutral residues in AAA. The central region, or the CaM-binding motif (L299–L308), is mainly composed of hydrophobic residues. The C-terminal region of each peptide (309–312) contains a charged arginine residue, which can potentially form hydrogen bonds or repel other positively charged residues in the N-terminus.
-
1)
In the wild-type peptide RRK, as seen in Fig. 5 A, the probability of contact formation is generally low (<0.5), which suggests high variation in the conformations adopted by the peptide. Secondary structures such as β-sheets can be formed at a low probability. More specifically, the N-terminus and the C-terminus can possibly form an antiparallel β-sheet, suggested by the interactions in the cross-diagonal region of the contact map (blue ellipses), especially between side chains of M307 and the middle basic residue (R297); the central region of the peptide can form parallel β-sheets, suggested by the low-probability (∼0.2) interactions in the region of the contact map that are parallel to the diagonal (orange ellipse); more likely, the central region can form an α-helix, indicated by the sparsely distributed higher-probability contacts (∼0.4) parallel to the diagonal (residues separated by four residues, dotted lines parallel to the diagonal in Fig. 5 A), such as the backbone-to-backbone contact between L304 and L308 and the side-chain-to-side-chain contacts between L299 and I303 and between I303 and M307.
-
2)
Upon mutation of R297A, in Fig. 5 B, the interactions are sparser but mostly of higher probabilities. Compared with the wild-type, there is a higher probability of forming an antiparallel β-sheet between the N-terminus and the central region of the peptide (blue ellipses in Fig. 5 B). The N-terminus is likely to form stable contacts with hydrophobic residues in the central region close to the C-terminus, especially between the residues around the mutation A297 and M307-L308. Compared with the wild-type, interactions in the center of the RAK peptide do not seem to form any parallel β-sheet structures (Figs. 5 and S3).
-
3)
In the peptide AAA, further compaction in the peptide structure (Fig. 4) and increase in the secondary structures are observed (Fig. 5 C). In contrast to RRK and RAK, there is a relatively high probability of forming antiparallel β-sheet structures between the N-terminus and the central region (blue ellipses, Fig. 5 C) and a low probability of forming antiparallel β-sheet structures between the central region and the C-terminus (orange ellipses, Fig. 5 C). Interestingly, the mutated residues play an essential role. There are stable backbone-to-backbone interactions between the hydrophobic region formed by the mutated residues and neighboring residues (A297–G301) and hydrophobic residues in the central region (M308–L309) and side-chain-to-side-chain interactions between the mutated residues and residues in the central region, as well as the C-terminus. To note, the mutated residue A298 has a high probability for forming a side-chain-to-side-chain contact with charged residue R311, which is prohibited in RRK or RAK because of electrostatic repulsion. In summary, the AAA peptide shows a high probability of adopting an antiparallel β-sheet conformation (as shown in Fig. 4 C), and the stabilizing hydrophobic interactions of the AAA mutant may interfere with helix formation, which is a necessary conformational adjustment that aligns the CaM-binding motif to residues in CaM, including residues L299, I303, and L308 (lack of interactions within the CaM-binding motif along the lines parallel to the diagonal in Fig. 5 C).
Furthermore, we analyzed the hydrogen bonds within each peptide ensemble to investigate the role of charged residue distribution in each peptide’s equilibrium conformation (Fig. 6). Our analysis reveals two diagonal hydrogen bonding patterns in AAA between N- and C-terminal residues that do not exist in RRK or RAK. Upon closer examination of AAA, we observe that the charged residue mutation sites form hydrogen bonds with the C-terminal region near R311. This binding pattern appears to contribute to the β-sheet secondary structure formation of AAA. On the other hand, the two highest-probability contacts exist between T310-M307 and R311-L308, which may contribute to the formation of the C-terminal helical motif that is observed in several extracted AAA ensemble conformations (Fig. 4 C). RRK and RAK show alternate hydrogen bond patterns on the diagonal that loosely resemble helical or turn conformations. RRK and RAK appear to form only one high-probability hydrogen bond between M307 and L304, which is not formed by AAA. Although RRK and RAK both appear to form low-probability hydrogen bonds with both N- and C-terminal residues, RAK possesses a high-probability hydrogen bond between R296 and M307. Examination of the bonds formed by the charged residues of the N-terminus in RRK and RAK illustrates a pattern of interactions with over half of the other residues, with RRK possessing a greater spread of low-probability bonds than RAK. Direct binding pattern changes between RRK, RAK, and AAA at the mutation sites are expected; however, many of the new hydrogen bonds do not appear to directly involve the charge residues at the mutation sites, implying the effect of charged residue mutations is not localized. This phenomenon is observed in the L299-L308 hydrogen bond: AAA has a high probability of forming this contact compared to RRK and RAK, even though neither residue was mutated.
Discussion
Conformational ensemble of the CaMKII peptides are dependent on charged residue distribution
Our experimental CD measurements and CD deconvolution results indicate that the residual secondary structure of the three-residue mutant AAA is hairpin like. Additionally, our analysis revealed that the RRK and RAK peptides were composed of disordered and hairpin conformations, along with 4% residual helix structure (Table 3). The equilibrium conformational ensemble shift between the wild-type and mutant CaMKII peptides is directly correlated to solvation and electrostatic effects. Previously, several studies have shown that the specific distribution of charged residues within a peptide will affect the equilibrium conformation (59, 60, 61). To determine whether the conformational shift observed between RRK, RAK, and AAA can be attributed to changes in charge distribution, we analyzed the sequences of the CaMKII peptides using the IDP analysis tool CIDER (62). Our analysis found that AAA is predicted to be in a compact or globular ensemble, whereas RRK is predicted to be in the most expanded conformation (see Fig. S4; Table S4). This result was expected because RRK has the most heterogeneously distributed charges with respect to RAK or AAA. CIDER also predicted that RAK will be in a globular form based on the fraction of charged residues; however, the similarity between the RRK and RAK CD spectra leads us to question the validity of this prediction.
NN-LSQ deconvolution yields best results for proteins with large disordered regions
We chose to employ an alternative deconvolution method for the CaMKII peptide because using the NN-LSQ algorithm in conjunction with a data set containing soluble and denatured proteins (SDP48) yielded the best results. Table 3 illustrates that the reconstructed CD spectra using NN-LSQ has the lowest RMSD with respect to the experimental spectra in the 190–240 nm region for all three CaMKII peptides. Additionally, we searched PCDDB (43) to obtain a set of proteins with known structures and associated CD spectra. This criterion matched a set of 411 proteins, which were used to validate the NN-LSQ algorithm results. We determine the quality of deconvolution and validate our results using the methods described by Sreerama and Woody for SELCON3, CDSSTR, and CONTIN/LL (38,39), which are given by Eqs. 3 and 4. We analyzed subsets of proteins with varying disordered content, which showed that the NN-LSQ method has the highest correlation and lowest deviation between predicted and known secondary structures for proteins with high degrees of disorder (Fig. 7). Conversely, our validation revealed that the CDPro algorithms perform better than the NN-LSQ method for proteins that contain lower fractions of disordered content. The complete validation and comparison results can be found in Tables S7–S10.
The CONTIN/LL, CDSSTR and SELCON3 fitting algorithms have been developed to reliably analyze the spectra of stable proteins through a robust set of iterative and variable selection rules that can discard certain solutions. Because the algorithm features were optimized for globular proteins, the accepted solutions are inherently biased and are not applicable for this set of CaMKII peptides, which have a high probability of disorder (Fig. S8; Table S4). Because of this revealed incompatibility, we deconvoluted the experimental CD spectra with NN-LSQ and the SDP48 reference protein set.
Force fields for MD simulations favor helical formation
The Hamiltonian used in MD force fields refines coefficients through experiments with larger globular proteins, which are structured by nature (63,64). This effect has been demonstrated in our equilibrium peptide simulations, which were performed for all mutant variations and at different temperatures (Fig. 3). The effect of temperature on the secondary structures of each peptide appears to be minimal. In each simulation, the helical conformation is overexpressed regardless of temperature or even mutation. In all three peptide runs at each temperature, the same structural trend appears: helix, turn, and unordered structure components are similarly distributed. Compared with the experimental CD results (Fig. 1), we expect the emergence of a dominant structure in AAA that does not appear in RRK or RAK. Because the trajectory data do not display this trend, the force field we used is assumed to contain conformational bias despite previous efforts to improve accuracy (27).
Newer force fields for MD simulations that are designed for IDPs and folded proteins are available (65,66). However, choosing the best model for our specific system was not a simple task. In addition, variations in the water model heavily affect the outcome of IDP simulations (67,68). There is a limitation to IDP force field development because of the lack of experimental data detailing the conformational ensemble of IDPs. Common methods for experimentally refining force fields such as small-angle X-ray scattering (SAXS), fluorescence resonance energy transfer (FRET), and NMR are only able to produce an average of the conformational ensemble (34,69) and do not necessarily contain the observables needed to describe IDPs in silico. We elected to sample a larger set of data by implementing an implicit solvent model instead of focusing our efforts on finding the best MD parameterization. By combining simulation and experimental results, we are able to reveal and partially resolve the shortcomings in each method (70).
Conformations of unbound CaMKII peptide may be important to binding with CaM
The experimental study of the CaM-CaMKII binding kinetics between CaM and the CaMKII peptides illustrate an ∼6-fold increase in the association rate of RRK compared with AAA (41) in 150 mM ionic solution. This ionic strength effectively screens the electrostatic potential by a Debye length of 7.8 Å−1. This screening effect can decrease the electrostatic rate enhancement for diffusion-limited binding kinetics (71,72) of CaM and the CaMKII peptides; however, the electrostatic potential is not completely screened over localized peptide regions. Comparison of the kinetic results to the conformational analysis in this study resolves a finite set of possible binding mechanisms between CaM and the CaMKII peptides. We initially assumed that AAA would have a higher affinity for CaM because of the residual helical propensity induced by the alanine residues because these peptides are known to adopt a helical conformation when they bind to CaM. It has been hypothesized that the presence of a residual structure that resembles the bound state increases the rate of association (73,74). Because the stopped-flow experimental results (decreased on-rate for AAA relative to RRK (41)) disproved this hypothesis, we turned to our CD analysis, which has shown to offer a diverse range of secondary structures for other CaM-binding target peptides (75), for additional potential mechanisms.
The CD measurements indicate a distinct difference in the ensemble of RRK and AAA secondary structures. Our CD deconvolution results indicate that the secondary structure formed through each mutation is actually in the form of a hairpin structure. The apparent lack of helical structure in the peptide ensemble implies that the hypothesis that increased kinetics and peptide residual structure are positively correlated (76, 77, 78) is not applicable in modeling the CaM-CaMKII peptide binding. Moreover, a larger energy gap between the bound and unbound states may exist because of the presence of the stable hairpin structure in AAA (79, 80, 81). For the mutual and induced conformational fit mechanism (82) to take place, the peptide must transition from the hairpin structure to the extended state to form productive and stable contacts with CaM. Our findings suggest that a significant conformational change must occur for the AAA peptide, reversing the hairpin structure to allow formation of the helical conformation upon formation of the CaM-bound complex (83). This provides a plausible mechanistic explanation for the differences in association rates (41) and emphasizes that conformational frustration can be an important step in regulating the kinetics of protein-protein interactions.
Conclusions
The importance of IDPs in biological function has become readily apparent in recent years. A major challenge in IDP modeling stems from experimental sampling of the structure ensemble. Popular methods such as NMR spectroscopy offer higher resolution but are still limited in IDP ensemble determination. To overcome difficulties pertaining to experimental ensemble construction of IDPs, combined theoretical approaches are often used. CD spectroscopy does not offer high-resolution structure determination; however, this drawback appears to be inconsequential for IDPs because MD simulation can be used to perturb the averaged structure to generate the IDP ensemble. In this study, we have used a combination of techniques to bridge the experimental data with theoretical data to generate a detailed picture of our CaMKII peptides despite the inherent inaccuracy of the MD simulation. Our resulting ensemble approximations illustrate how the residual secondary structure of the CaMKII peptides changes because of charged residue mutation. Our findings suggest that the AAA ensemble becomes stabilized through the formation of the hairpin secondary structure, which may explain the binding phenomenon observed in previous studies (41). In addition to the free-peptide ensemble, the observed structure shift may play a significant role in complex stability postbinding because of the formation (or lack thereof) of “fuzzy structures” (84,85).
Author Contributions
M.N.W. and M.S.C. designed research. J.C.E. and P.Z. performed research. J.C.E., P.Z., and N.C.J. analyzed data. J.C.E., P.Z., M.S.C., and M.N.W. wrote the article.
Acknowledgments
M.S.C. and M.N.W. are grateful for the support from the National Institute of Health (2R01GM097553). J.C.E. is supported by a training fellowship from the Gulf Coast Consortia to the Houston Area Molecular Biophysics Program (National Institute of General Medical Sciences grant T32GM008280). The computational resource is uHPC at the University of Houston supported by the National Science Foundation (OAC: 1531814). M.S.C. acknowledges support from the Moores Professorship. M.N.W. also acknowledges support from an endowment from the William Wheless III Professorship.
Editor: Alan Grossfield.
Footnotes
Supporting Material can be found online at https://doi.org/10.1016/j.bpj.2020.02.015.
Supporting Material
References
- 1.Eliezer D. Biophysical characterization of intrinsically disordered proteins. Curr. Opin. Struct. Biol. 2009;19:23–30. doi: 10.1016/j.sbi.2008.12.004. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 2.Dunker A.K., Lawson J.D., Obradovic Z. Intrinsically disordered protein. J. Mol. Graph. Model. 2001;19:26–59. doi: 10.1016/s1093-3263(00)00138-8. [DOI] [PubMed] [Google Scholar]
- 3.Tran H.T., Mao A., Pappu R.V. Role of backbone-solvent interactions in determining conformational equilibria of intrinsically disordered proteins. J. Am. Chem. Soc. 2008;130:7380–7392. doi: 10.1021/ja710446s. [DOI] [PubMed] [Google Scholar]
- 4.Uversky V.N., Oldfield C.J., Dunker A.K. Showing your ID: intrinsic disorder as an ID for recognition, regulation and cell signaling. J. Mol. Recognit. 2005;18:343–384. doi: 10.1002/jmr.747. [DOI] [PubMed] [Google Scholar]
- 5.Pauwels K., Lebrun P., Tompa P. To be disordered or not to be disordered: is that still a question for proteins in the cell? Cell. Mol. Life Sci. 2017;74:3185–3204. doi: 10.1007/s00018-017-2561-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 6.Borgia A., Borgia M.B., Schuler B. Extreme disorder in an ultrahigh-affinity protein complex. Nature. 2018;555:61–66. doi: 10.1038/nature25762. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 7.Wright P.E., Dyson H.J. Intrinsically disordered proteins in cellular signalling and regulation. Nat. Rev. Mol. Cell Biol. 2015;16:18–29. doi: 10.1038/nrm3920. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 8.Iakoucheva L.M., Brown C.J., Dunker A.K. Intrinsic disorder in cell-signaling and cancer-associated proteins. J. Mol. Biol. 2002;323:573–584. doi: 10.1016/s0022-2836(02)00969-5. [DOI] [PubMed] [Google Scholar]
- 9.Grundke-Iqbal I., Iqbal K., Binder L.I. Abnormal phosphorylation of the microtubule-associated protein tau (tau) in Alzheimer cytoskeletal pathology. Proc. Natl. Acad. Sci. USA. 1986;83:4913–4917. doi: 10.1073/pnas.83.13.4913. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 10.Mudher A., Lovestone S. Alzheimer’s disease-do tauists and baptists finally shake hands? Trends Neurosci. 2002;25:22–26. doi: 10.1016/s0166-2236(00)02031-2. [DOI] [PubMed] [Google Scholar]
- 11.Levine Z.A., Larini L., Shea J.E. Regulation and aggregation of intrinsically disordered peptides. Proc. Natl. Acad. Sci. USA. 2015;112:2758–2763. doi: 10.1073/pnas.1418155112. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 12.Wright P.E., Dyson H.J. Linking folding and binding. Curr. Opin. Struct. Biol. 2009;19:31–38. doi: 10.1016/j.sbi.2008.12.003. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 13.Fuxreiter M. Fold or not to fold upon binding - does it really matter? Curr. Opin. Struct. Biol. 2019;54:19–25. doi: 10.1016/j.sbi.2018.09.008. [DOI] [PubMed] [Google Scholar]
- 14.Ball K.A., Phillips A.H., Head-Gordon T. Homogeneous and heterogeneous tertiary structure ensembles of amyloid-β peptides. Biochemistry. 2011;50:7612–7628. doi: 10.1021/bi200732x. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 15.Fisher C.K., Stultz C.M. Constructing ensembles for intrinsically disordered proteins. Curr. Opin. Struct. Biol. 2011;21:426–431. doi: 10.1016/j.sbi.2011.04.001. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 16.Gong H., Zhang S., Zeng J. Constructing structure ensembles of intrinsically disordered proteins from chemical shift data. J. Comput. Biol. 2016;23:300–310. doi: 10.1089/cmb.2015.0184. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 17.Lindorff-Larsen K., Kristjansdottir S., Vendruscolo M. Determination of an ensemble of structures representing the denatured state of the bovine acyl-coenzyme a binding protein. J. Am. Chem. Soc. 2004;126:3291–3299. doi: 10.1021/ja039250g. [DOI] [PubMed] [Google Scholar]
- 18.Dyson H.J., Wright P.E. Intrinsically unstructured proteins and their functions. Nat. Rev. Mol. Cell Biol. 2005;6:197–208. doi: 10.1038/nrm1589. [DOI] [PubMed] [Google Scholar]
- 19.Collins A.P., Anderson P.C. Complete coupled binding-folding pathway of the intrinsically disordered transcription factor protein brinker revealed by molecular dynamics simulations and Markov state modeling. Biochemistry. 2018;57:4404–4420. doi: 10.1021/acs.biochem.8b00441. [DOI] [PubMed] [Google Scholar]
- 20.Van Roey K., Gibson T.J., Davey N.E. Motif switches: decision-making in cell regulation. Curr. Opin. Struct. Biol. 2012;22:378–385. doi: 10.1016/j.sbi.2012.03.004. [DOI] [PubMed] [Google Scholar]
- 21.Borg M., Mittag T., Chan H.S. Polyelectrostatic interactions of disordered ligands suggest a physical basis for ultrasensitivity. Proc. Natl. Acad. Sci. USA. 2007;104:9650–9655. doi: 10.1073/pnas.0702580104. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 22.Lee C.W., Ferreon J.C., Wright P.E. Graded enhancement of p53 binding to CREB-binding protein (CBP) by multisite phosphorylation. Proc. Natl. Acad. Sci. USA. 2010;107:19290–19295. doi: 10.1073/pnas.1013078107. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 23.Gsponer J., Babu M.M. The rules of disorder or why disorder rules. Prog. Biophys. Mol. Biol. 2009;99:94–103. doi: 10.1016/j.pbiomolbio.2009.03.001. [DOI] [PubMed] [Google Scholar]
- 24.Brutscher B., Felli I.C., Sólyom Z. NMR methods for the study of instrinsically disordered proteins structure, dynamics, and interactions: general overview and practical guidelines. In: Felli I.C., Pierattelli R., editors. Intrinsically Disordered Proteins Studied by NMR Spectroscopy. Springer International Publishing; 2015. pp. 49–122. [DOI] [PubMed] [Google Scholar]
- 25.Best R.B., Buchete N.V., Hummer G. Are current molecular dynamics force fields too helical? Biophys. J. 2008;95:L07–L09. doi: 10.1529/biophysj.108.132696. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 26.Hornak V., Abel R., Simmerling C. Comparison of multiple Amber force fields and development of improved protein backbone parameters. Proteins. 2006;65:712–725. doi: 10.1002/prot.21123. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 27.Lindorff-Larsen K., Piana S., Shaw D.E. Improved side-chain torsion potentials for the Amber ff99SB protein force field. Proteins. 2010;78:1950–1958. doi: 10.1002/prot.22711. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 28.Ota M., Koike R., Fukuchi S. An assignment of intrinsically disordered regions of proteins based on NMR structures. J. Struct. Biol. 2013;181:29–36. doi: 10.1016/j.jsb.2012.10.017. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 29.Brookes D.H., Head-Gordon T. Experimental inferential structure determination of ensembles for intrinsically disordered proteins. J. Am. Chem. Soc. 2016;138:4530–4538. doi: 10.1021/jacs.6b00351. [DOI] [PubMed] [Google Scholar]
- 30.Navarro-Retamal C., Bremer A., Thalhammer A. Molecular dynamics simulations and CD spectroscopy reveal hydration-induced unfolding of the intrinsically disordered LEA proteins COR15A and COR15B from Arabidopsis thaliana. Phys. Chem. Chem. Phys. 2016;18:25806–25816. doi: 10.1039/c6cp02272c. [DOI] [PubMed] [Google Scholar]
- 31.Xu X.-P., Case D.A. Automated prediction of 15N, 13Calpha, 13Cbeta and 13C′ chemical shifts in proteins using a density functional database. J. Biomol. NMR. 2001;21:321–333. doi: 10.1023/a:1013324104681. [DOI] [PubMed] [Google Scholar]
- 32.Lincoff J., Sasmal S., Head-Gordon T. The combined force field-sampling problem in simulations of disordered amyloid-β peptides. J. Chem. Phys. 2019;150:104108. doi: 10.1063/1.5078615. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 33.Cavalli A., Salvatella X., Vendruscolo M. Protein structure determination from NMR chemical shifts. Proc. Natl. Acad. Sci. USA. 2007;104:9615–9620. doi: 10.1073/pnas.0610313104. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 34.Jensen M.R., Zweckstetter M., Blackledge M. Exploring free-energy landscapes of intrinsically disordered proteins at atomic resolution using NMR spectroscopy. Chem. Rev. 2014;114:6632–6660. doi: 10.1021/cr400688u. [DOI] [PubMed] [Google Scholar]
- 35.Wang W., Ye W., Chen H.F. New force field on modeling intrinsically disordered proteins. Chem. Biol. Drug Des. 2014;84:253–269. doi: 10.1111/cbdd.12314. [DOI] [PubMed] [Google Scholar]
- 36.Kumagai P.S., DeMarco R., Lopes J.L.S. Advantages of synchrotron radiation circular dichroism spectroscopy to study intrinsically disordered proteins. Eur. Biophys. J. 2017;46:599–606. doi: 10.1007/s00249-017-1202-1. [DOI] [PubMed] [Google Scholar]
- 37.Whitmore L., Wallace B.A. Protein secondary structure analyses from circular dichroism spectroscopy: methods and reference databases. Biopolymers. 2008;89:392–400. doi: 10.1002/bip.20853. [DOI] [PubMed] [Google Scholar]
- 38.Sreerama N., Woody R.W. Estimation of protein secondary structure from circular dichroism spectra: comparison of CONTIN, SELCON, and CDSSTR methods with an expanded reference set. Anal. Biochem. 2000;287:252–260. doi: 10.1006/abio.2000.4880. [DOI] [PubMed] [Google Scholar]
- 39.Sreerama N., Venyaminov S.Y., Woody R.W. Estimation of protein secondary structure from circular dichroism spectra: inclusion of denatured proteins with native proteins in the analysis. Anal. Biochem. 2000;287:243–251. doi: 10.1006/abio.2000.4879. [DOI] [PubMed] [Google Scholar]
- 40.Provencher S.W., Glöckner J. Estimation of globular protein secondary structure from circular dichroism. Biochemistry. 1981;20:33–37. doi: 10.1021/bi00504a006. [DOI] [PubMed] [Google Scholar]
- 41.Waxham M.N., Tsai A.L., Putkey J.A. A mechanism for calmodulin (CaM) trapping by CaM-kinase II defined by a family of CaM-binding peptides. J. Biol. Chem. 1998;273:17579–17584. doi: 10.1074/jbc.273.28.17579. [DOI] [PubMed] [Google Scholar]
- 42.Sreerama N., Woody R.W. On the analysis of membrane protein circular dichroism spectra. Protein Sci. 2004;13:100–112. doi: 10.1110/ps.03258404. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 43.Whitmore L., Woollett B., Wallace B.A. PCDDB: the Protein Circular Dichroism Data Bank, a repository for circular dichroism spectral and metadata. Nucleic Acids Res. 2011;39:D480–D486. doi: 10.1093/nar/gkq1026. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 44.Case D.A., Babin V., Betz R.M. University of California; San Francisco, CA: 2014. AMBER 14. [Google Scholar]
- 45.Srinivasan J., Trevathan M.W., Case D.A. Application of a pairwise generalized Born model to proteins and nucleic acids: inclusion of salt effects. Theor. Chem. Acc. 1999;101:426–434. [Google Scholar]
- 46.Onufriev A., Bashford D., Case D.A. Exploring protein native states and large-scale conformational changes with a modified generalized born model. Proteins. 2004;55:383–394. doi: 10.1002/prot.20033. [DOI] [PubMed] [Google Scholar]
- 47.Nguyen H., Roe D.R., Simmerling C. Improved generalized born solvent model parameters for protein simulations. J. Chem. Theory Comput. 2013;9:2020–2034. doi: 10.1021/ct3010485. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 48.Ryckaert J.-P., Ciccotti G., Berendsen H.J.C. Numerical integration of the cartesian equations of motion of a system with constraints: molecular dynamics of n-alkanes. J. Comput. Phys. 1977;23:327–341. [Google Scholar]
- 49.Sindhikara D.J., Kim S., Roitberg A.E. Bad seeds sprout perilous dynamics: stochastic thermostat induced trajectory synchronization in biomolecules. J. Chem. Theory Comput. 2009;5:1624–1631. doi: 10.1021/ct800573m. [DOI] [PubMed] [Google Scholar]
- 50.Eguchi S., Copas J. Interpreting Kullback–Leibler divergence with the Neyman–Pearson lemma. J. Multivariate Anal. 2006;97:2034–2040. [Google Scholar]
- 51.Kullback S., Leibler R.A. On information and sufficiency. Ann. Math. Stat. 1951;22:79–86. [Google Scholar]
- 52.Roe D.R., Cheatham T.E., III PTRAJ and CPPTRAJ: software for processing and analysis of molecular dynamics trajectory data. J. Chem. Theory Comput. 2013;9:3084–3095. doi: 10.1021/ct400341p. [DOI] [PubMed] [Google Scholar]
- 53.Kabsch W., Sander C. Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers. 1983;22:2577–2637. doi: 10.1002/bip.360221211. [DOI] [PubMed] [Google Scholar]
- 54.Greenfield N.J. Using circular dichroism spectra to estimate protein secondary structure. Nat. Protoc. 2006;1:2876–2890. doi: 10.1038/nprot.2006.202. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 55.Pace C.N., Scholtz J.M. A helix propensity scale based on experimental studies of peptides and proteins. Biophys. J. 1998;75:422–427. doi: 10.1016/s0006-3495(98)77529-0. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 56.Wiedemann C., Bellstedt P., Görlach M. CAPITO--a web server-based analysis and plotting tool for circular dichroism data. Bioinformatics. 2013;29:1750–1757. doi: 10.1093/bioinformatics/btt278. [DOI] [PubMed] [Google Scholar]
- 57.Micsonai A., Wien F., Kardos J. BeStSel: a web server for accurate protein secondary structure prediction and fold recognition from the circular dichroism spectra. Nucleic Acids Res. 2018;46:W315–W322. doi: 10.1093/nar/gky497. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 58.Ezerski J.C., Cheung M.S. CATS: a tool for clustering the ensemble of intrinsically disordered peptides on a flat energy landscape. J. Phys. Chem. B. 2018;122:11807–11816. doi: 10.1021/acs.jpcb.8b08852. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 59.Das R.K., Pappu R.V. Conformations of intrinsically disordered proteins are influenced by linear sequence distributions of oppositely charged residues. Proc. Natl. Acad. Sci. USA. 2013;110:13392–13397. doi: 10.1073/pnas.1304749110. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 60.Mao A.H., Crick S.L., Pappu R.V. Net charge per residue modulates conformational ensembles of intrinsically disordered proteins. Proc. Natl. Acad. Sci. USA. 2010;107:8183–8188. doi: 10.1073/pnas.0911107107. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 61.Pace C.N., Alston R.W., Shaw K.L. Charge-charge interactions influence the denatured state ensemble and contribute to protein stability. Protein Sci. 2000;9:1395–1398. doi: 10.1110/ps.9.7.1395. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 62.Holehouse A.S., Ahad J., Pappu R.V. CIDER: classification of intrinsically disordered ensemble regions. Biophys. J. 2015;108:228a. [Google Scholar]
- 63.Huang J., MacKerell A.D., Jr. CHARMM36 all-atom additive protein force field: validation based on comparison to NMR data. J. Comput. Chem. 2013;34:2135–2145. doi: 10.1002/jcc.23354. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 64.Lange O.F., van der Spoel D., de Groot B.L. Scrutinizing molecular mechanics force fields on the submicrosecond timescale with NMR data. Biophys. J. 2010;99:647–655. doi: 10.1016/j.bpj.2010.04.062. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 65.Robustelli P., Piana S., Shaw D.E. Developing a molecular dynamics force field for both folded and disordered protein states. Proc. Natl. Acad. Sci. USA. 2018;115:E4758–E4766. doi: 10.1073/pnas.1800690115. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 66.Huang J., Rauscher S., MacKerell A.D., Jr. CHARMM36m: an improved force field for folded and intrinsically disordered proteins. Nat. Methods. 2017;14:71–73. doi: 10.1038/nmeth.4067. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 67.Best R.B., Mittal J. Protein simulations with an optimized water model: cooperative helix formation and temperature-induced unfolded state collapse. J. Phys. Chem. B. 2010;114:14916–14923. doi: 10.1021/jp108618d. [DOI] [PubMed] [Google Scholar]
- 68.Boonstra S., Onck P.R., van der Giessen E. CHARMM TIP3P water model suppresses peptide folding by solvating the unfolded state. J. Phys. Chem. B. 2016;120:3692–3698. doi: 10.1021/acs.jpcb.6b01316. [DOI] [PubMed] [Google Scholar]
- 69.Brucale M., Schuler B., Samorì B. Single-molecule studies of intrinsically disordered proteins. Chem. Rev. 2014;114:3281–3317. doi: 10.1021/cr400297g. [DOI] [PubMed] [Google Scholar]
- 70.Fisher C.K., Huang A., Stultz C.M. Modeling intrinsically disordered proteins with bayesian statistics. J. Am. Chem. Soc. 2010;132:14919–14927. doi: 10.1021/ja105832g. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 71.Vijayakumar M., Wong K.Y., Zhou H.X. Electrostatic enhancement of diffusion-controlled protein-protein association: comparison of theory and experiment on barnase and barstar. J. Mol. Biol. 1998;278:1015–1024. doi: 10.1006/jmbi.1998.1747. [DOI] [PubMed] [Google Scholar]
- 72.Alsallaq R., Zhou H.X. Electrostatic rate enhancement and transient complex of protein-protein association. Proteins. 2008;71:320–335. doi: 10.1002/prot.21679. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 73.Radivojac P., Vucetic S., Dunker A.K. Calmodulin signaling: analysis and prediction of a disorder-dependent molecular recognition. Proteins. 2006;63:398–410. doi: 10.1002/prot.20873. [DOI] [PubMed] [Google Scholar]
- 74.Mohan A., Oldfield C.J., Uversky V.N. Analysis of molecular recognition features (MoRFs) J. Mol. Biol. 2006;362:1043–1059. doi: 10.1016/j.jmb.2006.07.087. [DOI] [PubMed] [Google Scholar]
- 75.Dunlap T.B., Kirk J.M., Creamer T.P. Thermodynamics of binding by calmodulin correlates with target peptide α-helical propensity. Proteins. 2013;81:607–612. doi: 10.1002/prot.24215. [DOI] [PubMed] [Google Scholar]
- 76.Csermely P., Palotai R., Nussinov R. Induced fit, conformational selection and independent dynamic segments: an extended view of binding events. Trends Biochem. Sci. 2010;35:539–546. doi: 10.1016/j.tibs.2010.04.009. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 77.Arai M., Sugase K., Wright P.E. Conformational propensities of intrinsically disordered proteins influence the mechanism of binding and folding. Proc. Natl. Acad. Sci. USA. 2015;112:9614–9619. doi: 10.1073/pnas.1512799112. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 78.Iešmantavičius V., Dogan J., Kjaergaard M. Helical propensity in an intrinsically disordered protein accelerates ligand binding. Angew. Chem. Int.Engl. 2014;53:1548–1551. doi: 10.1002/anie.201307712. [DOI] [PubMed] [Google Scholar]
- 79.Weinkam P., Pletneva E.V., Wolynes P.G. Electrostatic effects on funneled landscapes and structural diversity in denatured protein ensembles. Proc. Natl. Acad. Sci. USA. 2009;106:1796–1801. doi: 10.1073/pnas.0813120106. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 80.Higo J., Ito N., Nakamura H. Energy landscape of a peptide consisting of α-helix, 3(10)-helix, β-turn, β-hairpin, and other disordered conformations. Protein Sci. 2001;10:1160–1171. doi: 10.1110/ps.44901. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 81.Higo J., Nishimura Y., Nakamura H. A free-energy landscape for coupled folding and binding of an intrinsically disordered protein in explicit solvent from detailed all-atom computations. J. Am. Chem. Soc. 2011;133:10448–10458. doi: 10.1021/ja110338e. [DOI] [PubMed] [Google Scholar]
- 82.Wang Q., Zhang P., Cheung M.S. Protein recognition and selection through conformational and mutually induced fit. Proc. Natl. Acad. Sci. USA. 2013;110:20545–20550. doi: 10.1073/pnas.1312788110. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 83.Meador W.E., Means A.R., Quiocho F.A. Modulation of calmodulin plasticity in molecular recognition on the basis of x-ray structures. Science. 1993;262:1718–1721. doi: 10.1126/science.8259515. [DOI] [PubMed] [Google Scholar]
- 84.Tompa P., Fuxreiter M. Fuzzy complexes: polymorphism and structural disorder in protein-protein interactions. Trends Biochem. Sci. 2008;33:2–8. doi: 10.1016/j.tibs.2007.10.003. [DOI] [PubMed] [Google Scholar]
- 85.Wu H., Fuxreiter M. The structure and dynamics of higher-order assemblies: amyloids, signalosomes, and granules. Cell. 2016;165:1055–1066. doi: 10.1016/j.cell.2016.05.004. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 86.Micsonai A., Wien F., Kardos J. Accurate secondary structure prediction and fold recognition for circular dichroism spectroscopy. Proc. Natl. Acad. Sci. USA. 2015;112:E3095–E3103. doi: 10.1073/pnas.1500851112. [DOI] [PMC free article] [PubMed] [Google Scholar]
Associated Data
This section collects any data citations, data availability statements, or supplementary materials included in this article.