Abstract
Intrinsically disordered proteins (IDPs) play an important role in an array of biological processes but present a number of fundamental challenges for computational modeling. Recently, simple polymer models have regained popularity for interpreting the experimental characterization of IDPs. Homopolymer theory provides a strong foundation for understanding generic features of phenomena ranging from single-chain conformational dynamics to the properties of entangled polymer melts, but is difficult to extend to the copolymer context. This challenge is magnified for proteins due to the variety of competing interactions and large deviations in side-chain properties. In this work, we apply a simple physics-based coarse-grained model for describing largely disordered conformational ensembles of peptides, based on the premise that sampling sterically forbidden conformations can compromise the faithful description of both static and dynamical properties. The Hamiltonian of the employed model can be easily adjusted to investigate the impact of distinct interactions and sequence specificity on the randomness of the resulting conformational ensemble. In particular, starting with a bead–spring-like model and then adding more detailed interactions one by one, we construct a hierarchical set of models and perform a detailed comparison of their properties. Our analysis clarifies the role of generic attractions, electrostatics, and side-chain sterics, while providing a foundation for developing efficient models for IDPs that retain an accurate description of the hierarchy of conformational dynamics, which is nontrivially influenced by interactions with surrounding proteins and solvent molecules.
1. Introduction
Despite lacking stable tertiary structure under physiological conditions, intrinsically disordered proteins (IDPs) are involved in a large number of important biological functions, including intracellular signaling and regulation, and are also associated with a broad range of diseases, including cancer, neurodegenerative diseases, amyloidoses, diabetes, and cardiovascular disease.1,2 The experimental characterization of IDPs is complicated by the heterogeneous nature of their disordered conformational ensembles (i.e., conformational distributions), which challenges traditional techniques developed for folded proteins. For example, X-ray crystallography and cryo-EM, which recover high-resolution images of biomolecules in the crystalline or frozen state, are fundamentally inappropriate for characterizing the distribution of relevant IDP conformations.3 In contrast, techniques including nuclear magnetic resonance (NMR), small-angle X-ray scattering (SAXS), single-molecule Förster resonance energy transfer (FRET), dynamic light scattering (DLS), and two-focus fluorescence correlation spectroscopy (2f-FCS) are capable of identifying the conformational transitions sampled by IDPs,4−7 since they measure the protein as it fluctuates within its “natural” environment. These measurements nevertheless provide limited resolution: they do not specify a unique corresponding microscopic distribution of conformations. In other words, there may exist multiple distinct conformational ensembles which reproduce the experimental measurements, requiring molecular models to infer the correct underlying distribution. As a result, molecular simulations have become increasingly important tools for obtaining microscopic insight that supports experimental observations (e.g., for the characterization of IDP conformational ensembles).4
All-atom (AA) models have gained significant popularity for providing detailed descriptions of complex biomolecular processes and, in conjunction with reweighting techniques, can also be used to assist in the interpretation of experimental measurements. The application of AA simulations to study IDPs has brought to light transferability problems of standard models, which were constructed to stabilize three-dimensional structures of folded proteins. These force fields not only predict overly compact structures,8 but distinct AA models can also generate widely varying and qualitatively different secondary structure content for a given protein sequence.9 Recent efforts have been made to adjust these models to more accurately describe the properties of IDPs.8,10,11 Despite these improvements, AA simulations remain prohibitively expensive for investigating the environment-dependent conformational dynamics of IDPs, due to the expansive conformational landscape traversed by these systems. Moreover, the large range of time scales (from picoseconds to hours), thermodynamic or chemical conditions (e.g., denaturant concentrations), as well as system variations (e.g., sequence mutations) commonly explored in experimental studies represent an overwhelming gap in computational accessibility for AA models that is unlikely to be overcome in the near future through improvements in software or hardware.
The computational expense of these detailed models has motivated the use of much simpler polymer models12,13 (e.g., ensemble construction methods14 or analytically solvable polymer models)4 to provide microscopic interpretations for the experimental characterizations of processes involving IDPs. The disordered nature of IDPs results in conformational heterogeneity and broad intramolecular distance distributions, reminiscent of generic models from the study of polymer physics.15 However, these models are limited in resolution and often lack the ability to provide significant microscopic insight beyond what can already be inferred from experiments. Moreover, the simplicity of the model approximations has been shown to generate inconsistencies in the interpretation of experimental measurements.16−19 Native-biased models (e.g., Go̅-type models),20 which use experimentally determined protein structures to construct a potential energy function with the protein’s native state at the global minimum, have contributed immensely to our basic understanding of the driving forces for protein folding.21−23 When combined with additional non-native interactions, these models provide a straightforward route to elucidate the essential features for reproducing a given experimental observation.24−26 Although these models have been useful for investigating the environment-dependent folding processes of IDPs27,28 (i.e., coupled folding and binding processes), their reliance on a well-defined native structure limits their ability to describe unfolded or disordered conformations.
This limitation can even propagate into the characterization of the folding process of globular proteins, resulting in a qualitatively incorrect representation of folding pathways.29 Recent work from Shell and co-workers aims to partially alleviate this limitation by combining transferable bonded interactions with traditional nativelike “nonbonded” interactions.30 There have also been significant advancements in the development of physics-based coarse-grained (CG) models to describe the temperature-dependent collapse and liquid–liquid phase separation of IDPs.31,32
Recently, Rudzinski and Bereau proposed a simple physics-based model33 for describing largely disordered conformational ensembles of peptides. The foundational premise of the model is that the sampling of sterically forbidden conformations, due to missing degrees of freedom, can seriously complicate the faithful description of both static and dynamic properties in CG models of proteins. This complication is perhaps most severe for disordered ensembles, where conformational entropy plays an important role in shaping the free-energy landscape. For this reason, the steric interactions and local stiffness of the protein are described at a united-atom resolution (i.e., explicit representation of all heavy atoms). These interactions account only for excluded volume and chain stiffness, without explicit attractions between atoms which reside at significant separation along the peptide chain. In addition to these detailed interactions, CG attractive interactions are added to represent the characteristic driving forces for peptide secondary and tertiary structure formation. For example, in the introductory studies,33,34 the authors employed a generic attractive interaction between Cβ carbons in order to model the effective attractions between side chains due to the hydrophobic effect. Additionally, attractive interactions between Cα atoms separated by three peptide bonds along the protein backbone were employed to model helix-forming hydrogen-bonding interactions. These two interactions represent the minimum set of interactions necessary for qualitative reproduction of the conformational ensemble of short peptides, that is, to sample helical, coil, and swollen (i.e., hairpinlike) structures. 
The model was shown to accurately characterize both structural and kinetic properties of helix–coil transitions in small peptides, demonstrating its potential for efficiently describing disordered ensembles, while retaining relevant microscopic details.33,34 Furthermore, the Hamiltonian of the model can be easily adjusted to investigate the driving forces for particular processes.
In this work, we apply variants of this simple physics-based model to investigate the role of distinct interactions in shaping disordered protein ensembles. As a model system, we consider the activation domain, ACTR, of the SRC-3 protein, a “fully disordered” protein with only transient helical propensity.35−37 One way in which IDPs perform their function is by adapting to their environment through so-called coupled folding and binding processes.38 For example, ACTR can form a structured complex with the nuclear-coactivator binding domain (NCBD) of the transcriptional coactivator CREB-binding protein (CBP), which plays an important role in the regulation of eukaryotic transcription.39 CBP demonstrates the functional advantages of IDPs in the regulation of genes,39 participating in interactions with more than 400 transcription factors in the cell.40 In the absence of a binding partner, NCBD is a molten globule with three substantial helical regions39 but undergoes coupled folding and binding processes with a variety of distinct ligands.41 Within the NCBD/ACTR complex, the three helices of NCBD form a bundle with a hydrophobic groove in which ACTR is docked, and the assembly of the two proteins promotes three helices in ACTR37 (see Figure 1).
Figure 1.
Visualization of the NCBD/ACTR folded complex (PDB ID: 1KBH). The number labels correspond to residue numbers at the beginning and ends of helices formed by NCBD (red) and ACTR (blue).
Great efforts have been made to understand how IDPs recognize their binding partners.42,43 For example, electrostatic attractions have been shown to play an important role in driving the formation of encounter complexes between the binding pair.38,44 Additionally, the change in the solvent-accessible surface area of IDP residues upon binding suggests that IDPs can utilize different residues along the amino acid sequence for interactions with different binding partners.2 The conformational diversity of IDPs leading to the folded state makes it challenging to precisely characterize their binding mechanisms both experimentally and computationally.45,46 Previous work has identified two limiting mechanisms of coupled folding and binding: “conformational selection” and “induced fit”. The conformational selection mechanism is characterized by an IDP which samples the relevant folded structure (or some fraction of this structure) within the unbound ensemble. In the induced fit mechanism, the folded state only arises within the conformational ensemble of the IDP through interactions with its binding partner. In practice a combination of these is typically observed.35,47,48 Thus, a key step to describing the binding mechanism for a particular IDP/partner pair, especially in cases where conformational selection is prominent, is to characterize the unbound ensembles of the molecules. Previous computational work employing Go̅-type models has found that the NCBD/ACTR folding process demonstrates dominant characteristics of the induced fit mechanism.47 However, a mechanistic shift toward conformational selection is also possible when NCBD attains a distinct folded structure after a proline isomerization.49 Computational investigations of coupled folding and binding typically employ models that are not explicitly constructed to accurately represent the unbound ensembles of the individual binding partners. 
While the unbound ensemble of NCBD has been analyzed using both AA and CG simulations,35,36,50 the unbound conformational behavior of ACTR has not, to our knowledge, been investigated in detail. Instead, ACTR is usually taken to be a fully disordered ensemble, as characterized by simple polymer models.15,51
The present investigation employs an intermediate resolution physics-based model to characterize the ensemble of ACTR in the absence of a binding partner. This model enables systematic analysis of the impact of distinct interactions and sequence specificity on the randomness of the resulting conformational ensemble. In particular, starting with a model akin to a bead–spring (BS) model and then adding more detailed interactions one by one, we construct a hierarchical set of models and perform a detailed comparison of their properties. Our analysis shows the following: (i) The incorporation of generic attractions between amino acid side chains significantly expands the diversity of the conformational ensemble, without severely perturbing the distribution of the radius of gyration. (ii) Electrostatic interactions can increase the ruggedness of the conformational landscape, reducing the overall conformational heterogeneity, but can simultaneously introduce additional routes for stabilizing particular secondary structures. (iii) Side chain sterics play a crucial role in determining the overall shape of the free-energy landscape through stabilization of particular structural motifs.
The rest of the paper is organized as follows. In section 2, the hierarchical set of models, associated simulation protocol, and relevant analysis tools are described in detail. Section 3 presents a detailed characterization of the hierarchy of CG models describing the unbound conformational ensemble of ACTR. Two additional polypeptides are also considered to investigate the effect of side chain excluded volume on the conformational ensembles of IDPs. Then, the transferability to the unbound ensemble of NCBD using the more detailed models is assessed. Finally, section 4 provides a brief discussion and conclusions from the investigation.
2. Methods
2.1. Protein Sequences
This work considers the activation domain, ACTR, of the SRC-3 protein and the nuclear-coactivator binding domain (NCBD) of the transcriptional coactivator CREB-binding protein (CBP). The amino acid sequences of ACTR and NCBD are given by:
![]() |
1 |
where eqs 1 and 2 are the sequences for ACTR and NCBD with 71 and 59 residues, respectively. The spacing in the equations separates the sequences into groups of 10 residues. Hydrophobic residues are labeled in red font, while the positively and negatively charged residues are denoted by “+” and “–”, respectively. Upon interaction, NCBD and ACTR form a stable folded complex (Protein Data Bank (PDB) ID: 1KBH, see Figure 1).
2.2. A Simple Physics-Based Model for Describing Disordered Ensembles
ACTR and NCBD were modeled using a physics-based approach that represents the protein in near-atomic detail while treating the solvent implicitly through effective interactions between protein atoms.33,34 The total potential energy function of the model can be written as a sum of three terms:
$$U_{\mathrm{tot}} = U_{\mathrm{loc}} + U_{\mathrm{exc}} + U_{\mathrm{att}} \tag{3}$$
Uloc represents local interactions contributing to chain connectivity and stiffness and employs the standard functional forms and parameters for bond, angle, dihedral, and 1–4 interactions given by the Amber99SB-ILDN force field52,53 (see Figure S1). For reasons that will become clear below, we write Uloc as a sum of two contributions:
$$U_{\mathrm{loc}} = U_{\mathrm{bond}} + U_{\mathrm{stiff}} \tag{4}$$
where Ubond represents the bond interactions between pairs of covalently bonded atoms and Ustiff represents the remaining local interactions listed above. Uexc represents excluded volume interactions at a united-atom resolution (i.e., an explicit representation of all heavy atoms, without hydrogens). The excluded volume interaction for each heavy atom pair was determined by transforming the Lennard-Jones interactions between the pair (again given by the Amber99SB-ILDN force field) to a Weeks–Chandler–Andersen (WCA) potential (i.e., a purely repulsive potential). Uatt represents the attractive interactions employed between Cα, Cβ, and representative side-chain atoms and can contain several distinct contributions. In this work, we consider a hierarchy of eight different models which systematically build upon each other (see Table 1). The first model employs only bond and excluded volume interactions similar to the self-avoiding random walk model from polymer theory: U(1) = Ubond + Uexc.12 The second model adds stiffness to the chain by incorporating the other local interactions: U(2) = Uloc + Uexc. The remaining models employ the full Utot potential with varying representations of Uatt: U(id) = Uloc + Uexc + Uatt(id), for id ∈ {3a, 3b, 4, 5a, 5b, 6}. Model 3 employs attractive interactions between Cβ atoms, Uhp, to model the hydrophobic attraction between side chains:
![]() |
with $\sigma_{ij}$ = 0.5 nm. We consider two variants of model 3: (i) x = a, where the same parameter is employed for all amino acids (denoted homo), $\varepsilon_{hp,ij}^{(a)} = \varepsilon_{hp}$, and (ii) x = b, where the parameter depends on the identity of the pair of residues (denoted hetero), $\varepsilon_{hp,ij}^{(b)} = \sqrt{\varepsilon_{hp,i}\,\varepsilon_{hp,j}}$, where $\varepsilon_{hp,i}$ is determined according to the Miyazawa–Jernigan interaction matrix54 (see Figure S2 and Table S1). More specifically, to set the absolute scale of these interactions, we followed the work of Bereau and Deserno.55 Briefly, the 20 × 20 Miyazawa–Jernigan interaction matrix is reduced to 20 residue-specific energy values, which approximately generate the full matrix through geometric averages between pairs of residue types. These energy values are then normalized to be between 0 (most hydrophilic) and 1 (most hydrophobic). Finally, a single overall interaction scale, $\varepsilon_{hp}$, is chosen to determine all values of $\varepsilon_{hp,i}$ simultaneously. Model 4 builds upon model 3 by incorporating electrostatic interactions, $U_{DH}$, between charged residues: $U_{att} = U_{hp}^{(b)} + U_{DH}$. These interactions are described at a coarse-grained level of resolution (see Figure S3), using the Debye–Hückel formalism,56 where the full point charge is placed on the last side chain carbon (i.e., furthest from the backbone) for each charged residue: arginine (R), lysine (K), aspartic acid (D), and glutamic acid (E). In particular, the electrostatic energy is given by

$$U_{DH} = \sum_{i<j} \frac{q_i q_j}{4\pi\varepsilon_0\,\varepsilon\, r_{ij}} \exp(-\kappa r_{ij}) \tag{5}$$

where $1/(4\pi\varepsilon_0) = 138.935$ kJ mol$^{-1}$ nm e$^{-2}$ and $\varepsilon$ = 80 at room temperature for monovalent salt; $\kappa^{-1}$ is the Debye screening length; $q_i$ and $q_j$ are the point charges of the ith and jth charged sites; and $r_{ij}$ is the distance between these sites. $\kappa^{-1} = 0.313\, I^{-1/2}$ nm mol$^{1/2}$ L$^{-1/2}$, where $I = \frac{1}{2}\sum_{i=1}^{n_i} c_i q_i^2$ is the ionic strength of the solution, $n_i$ is the number of unique ionic species, and $c_i$ is the molar concentration of the ion type i with charge $q_i$.57 Employing physiological concentrations, $c_i$ = 0.1 mol/L for all ions, we obtain $\kappa^{-1} \approx 1$ nm.
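The screening-length arithmetic above can be checked with a few lines of Python. This is a minimal sketch, not the authors' implementation: it assumes the standard screened-Coulomb form for the pair energy, the Coulomb prefactor 138.935 kJ mol$^{-1}$ nm e$^{-2}$ of the nm/kJ/mol unit system, and illustrative function names. The prefactor 0.313 and the concentrations are taken from the text.

```python
import math

# Assumption: 1/(4*pi*eps0) in kJ mol^-1 nm e^-2 (nm / kJ/mol / e unit system).
COULOMB_CONST = 138.935

def ionic_strength(concentrations, charges):
    """I = 1/2 * sum_i c_i * q_i^2, with c_i in mol/L and q_i in units of e."""
    return 0.5 * sum(c * q * q for c, q in zip(concentrations, charges))

def debye_length_nm(ionic_str):
    """Screening length kappa^-1 = 0.313 / sqrt(I) in nm (prefactor from the text)."""
    return 0.313 / math.sqrt(ionic_str)

def debye_huckel_pair_energy(q_i, q_j, r_nm, debye_nm, eps_r=80.0):
    """Screened Coulomb energy in kJ/mol for one pair of charged sites (assumed form)."""
    return COULOMB_CONST * q_i * q_j / (eps_r * r_nm) * math.exp(-r_nm / debye_nm)

# Monovalent salt at 0.1 mol/L for both ion types, as in the text.
I = ionic_strength([0.1, 0.1], [+1, -1])
kappa_inv = debye_length_nm(I)
# Energy of an oppositely charged pair (e.g., K and E side chains) at 1 nm separation.
u_pair = debye_huckel_pair_energy(+1, -1, 1.0, kappa_inv)
print(f"I = {I:.2f} mol/L, Debye length = {kappa_inv:.3f} nm, U(1 nm) = {u_pair:.3f} kJ/mol")
```

With these inputs the screening length comes out just below 1 nm, and the attraction between opposite unit charges at 1 nm is well below the thermal energy at 300 K (about 2.5 kJ/mol).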
Table 1. Overview of Interactions for Model Hierarchy.
model id | Ubond | Ustiff | Uexc | Uhp(x) | hp type | Uhb | UDH |
---|---|---|---|---|---|---|---|
1 | yes | no | yes | no | N/A | no | no |
2 | yes | yes | yes | no | N/A | no | no |
3a | yes | yes | yes | yes | homo | no | no |
3b | yes | yes | yes | yes | hetero | no | no |
4 | yes | yes | yes | yes | hetero | no | yes |
5a | yes | yes | yes | yes | homo | yes | no |
5b | yes | yes | yes | yes | hetero | yes | no |
6 | yes | yes | yes | yes | hetero | yes | yes |
Model 5 builds upon model 3 by incorporating “local” hydrogen-bonding interactions, Uhb, between Cα atoms that are separated by three residues along the peptide backbone: Uatt(5x) = Uhp + Uhb. This interaction ensures that the proteins are capable of forming α-helical conformations. The incorporation of 1–4 hydrogen bonds independently from hydrogen bonds occurring between residues farther apart along the peptide chain allows the independent investigation of the driving forces for helical versus β-sheet conformations. The latter are not considered in the present study since ACTR does not have a substantial propensity toward β-sheet formation. In a way, the local hydrogen bonds represent a “nativelike” interaction for peptides that fold into a single helix. For this reason, the model was originally designated as a “hybrid Go̅” model, indicating the combination of atomically detailed physics-based interactions with simplistic (possibly natively biased) attractive interactions at a coarser level of resolution. Note that in previous work the hydrogen-bonding interaction was denoted nc for “native contact”. Following previous work employing native-biased CG models, we employ a hydrogen-bonding interaction with a Lennard-Jones form along with a desolvation barrier using the following functional form:24 $U_{hb} = \sum_{i,\,j=i+3} U_{db,ij}$, where
![]() |
6 |
In eq 6, $r_{cm}$ = 0.5 nm is the position of the first potential minimum with a corresponding depth of $\varepsilon_{hb}$, and $r_{db}$ = 0.65 nm is the position of the desolvation barrier maximum with a corresponding height of $\varepsilon_{db} = 0.4\,\varepsilon_{hb}$. The auxiliary quantities are $Z(r_{ij}) = (r_{cm}/r_{ij})^{k}$, $Y(r_{ij}) = (r_{ij} - r_{db})^{2}$, and $B = m\,\varepsilon_{ssm}(r_{ssm} - r_{db})^{2(m-1)}$ with $\varepsilon_{ssm} = \varepsilon_{db}/100$ and $r_{ssm} = r_{cm} + 0.3$ nm; the remaining coefficients follow the functional form of ref 24. The parameters k = 6, m = 3, and n = 2 control the shape of $U_{hb}$ (see ref 33 for a plot of the potential). Again, two variants of $U_{hp}^{(x)}$ are considered, with homo- and heterotype interactions for x = a and x = b, respectively, as described above. Finally, model 6 also incorporates electrostatic interactions: $U_{att} = U_{hp}^{(b)} + U_{hb} + U_{DH}$. The hierarchy of models employed in this work is summarized in Table 1.
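Because each model in the hierarchy differs from its predecessor by a single interaction, the contents of Table 1 map naturally onto a small lookup structure. The following sketch transcribes the table; the dictionary layout, the lowercase term labels, and the helper names are illustrative choices and not part of the original model definition.

```python
# Interaction terms per model id, transcribed from Table 1.
MODELS = {
    "1":  {"bond", "exc"},
    "2":  {"bond", "stiff", "exc"},
    "3a": {"bond", "stiff", "exc", "hp"},
    "3b": {"bond", "stiff", "exc", "hp"},
    "4":  {"bond", "stiff", "exc", "hp", "dh"},
    "5a": {"bond", "stiff", "exc", "hp", "hb"},
    "5b": {"bond", "stiff", "exc", "hp", "hb"},
    "6":  {"bond", "stiff", "exc", "hp", "hb", "dh"},
}

# Variant of the hydrophobic attraction (only defined for models that include U_hp).
HP_TYPE = {"3a": "homo", "3b": "hetero", "4": "hetero",
           "5a": "homo", "5b": "hetero", "6": "hetero"}

def added_terms(model_id, parent_id):
    """Terms present in model_id but absent from parent_id."""
    return MODELS[model_id] - MODELS[parent_id]

def describe(model_id):
    """Human-readable summary of the potential energy terms of one model."""
    terms = " + ".join("U_" + t for t in sorted(MODELS[model_id]))
    hp = HP_TYPE.get(model_id)
    return f"model {model_id}: {terms}" + (f" (hp type: {hp})" if hp else "")

for mid in MODELS:
    print(describe(mid))
```

Comparing two entries, for example `added_terms("6", "5b")`, isolates the interaction whose effect a pairwise comparison of ensembles is probing.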
Previous work using model 5a performed an extensive search in parameter space to characterize the behavior of the model in the context of helix–coil transitions of short peptides.33,34 Here, we tune the parameters of the model in an attempt to accurately describe the conformational ensemble of ACTR. There are no adjustable parameters for the local, excluded volume, and electrostatic interactions. Moreover, as described above, several of the parameters for the hydrogen-bonding interactions have been fixed based on previous work.33,34 Thus, the models are left with just two free parameters: εhp and εhb. εhp was initially determined by simulating model 3a with various parameter values and then comparing the generated Rg distribution with that determined from experimental measurements.6 For model 3b (hetero hp type), the residue-specific hydrophobic attractions were applied such that the average hydrophobic interaction energy (i.e., the average value of εhp,i along the chain) was identical to that of model 3a (homo hp type). εhp,ij(b) for ACTR is presented in Figure S2. After fixing εhp, εhb was determined by simulating model 5a, which adds the hydrogen-bonding interaction to model 3a, with various parameter values and then comparing the generated average fraction of helical segments per residue, h(i), to experiments.6 With the exception of the difference in εhp used for the homo and hetero variants described above, identical εhp and εhb parameters were employed for the entire hierarchy of models, wherever applicable. We hypothesize that the very accurate representation of sterics in the model will result in energetic parameters that are quite sequence-transferable for sequences that exhibit largely disordered ensembles. A challenging test of transferability is assessed toward the end of this work by considering the molten globule NCBD.
When comparing models with fundamentally different interactions, there is no unique procedure for calibrating the energy scales of the models. When the interaction sets are not entirely different (as is the case for the hierarchy of models considered here), one option is to evaluate the models on the same absolute temperature scale, as dictated by the simulation protocol. This would lead to different ensemble properties at the relevant temperature, due to changes in the incorporated interactions. Alternatively, one can work with a reduced temperature scale, defined with respect to a reference temperature, T*, at which a particular ensemble property is reproduced. We follow this latter approach in the present work, and define T* as the temperature at which the average experimental radius of gyration is reproduced. In terms of the absolute temperatures employed in the simulation protocol, T* corresponds to 300 K for ACTR for models 3a, 3b, 5a, and 5b and 270 K for models 4 and 6. For NCBD, T* corresponds to 330 K for the two considered models, 5b and 6.
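The conversion from absolute simulation temperatures to the reduced scale is a simple rescaling by the model-specific reference temperature. The sketch below uses the ACTR reference temperatures quoted above; the function and variable names are hypothetical, and the assumption is that the reduced temperature is the plain ratio T/T*.

```python
# Reference temperatures T* (in K) for ACTR, as quoted in the text.
T_STAR_ACTR = {"3a": 300.0, "3b": 300.0, "5a": 300.0, "5b": 300.0,
               "4": 270.0, "6": 270.0}

def reduced_temperature(t_abs, model_id, t_star=T_STAR_ACTR):
    """Reduced temperature T / T* for a given model (assumed to be a plain ratio)."""
    return t_abs / t_star[model_id]

# The same absolute temperature corresponds to different reduced temperatures
# for models with and without electrostatics.
for model_id in ("5b", "6"):
    print(model_id, round(reduced_temperature(300.0, model_id), 3))
```

On this scale, models 4 and 6 at 300 K sit about 11% above their reference temperature, which is why comparisons between models are made at equal T/T* rather than at equal absolute temperature.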
2.3. Simulations
All simulations of the hierarchical set of physics-based models were performed with the GROMACS 4.5.5 simulation suite58 in the constant NVT ensemble while employing the stochastic dynamics algorithm with a friction coefficient $\gamma = (2.0\,\tau)^{-1}$ and a time step of $1 \times 10^{-3}\,\tau$. The CG unit of time, τ, can be determined from the fundamental units of length, mass, and energy of the simulation model. Employing any one of the Lennard-Jones radii and energies from the Amber99SB-ILDN force field yields a time unit on the order of 1 ps. We report the connection to physical units since the models are simulated using these units within the GROMACS suite. For simplicity, we define τ = 1 ps, and report the simulation protocol in units of τ. This relationship to physical units does not provide any meaningful description of the absolute time scale of characteristic dynamical processes generated by the model, due to a lost connection to the true dynamics.59 The present study focuses on ensemble-averaged properties of the generated ensembles and does not attempt to calibrate or interpret the generated dynamics, although previous studies with this model have demonstrated the faithful reproduction of kinetic processes for secondary-structure formation.33,34 For each peptide, a single chain was placed in a cubic box with a volume of (20 nm)$^3$ and simulated without periodic boundary conditions. Thus, no explicit cutoffs were used for the interaction functions described in the previous section. Replica exchange simulations60 were performed to enhance the sampling of the system. In total, 16 temperatures ranging from 225 to 450 K were scanned with an average acceptance ratio of 0.4. These represent absolute simulation temperatures, which were transformed to reduced temperatures for comparison of different models (as described above). The exchange of replicas was attempted every 500 or 1000 τ, and each simulation was run for at least 500 000 τ.
The convergence of the simulations was assessed by randomly dividing each trajectory into two groups and then checking for consistency of various observables, including the average radius of gyration and the average fraction of helical segments, as well as autocorrelation functions of the radius of gyration and of the end-to-end distance. Representative examples of the convergence tests are presented in Figures S4 and S5.
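The statement that the natural time unit is on the order of 1 ps follows from dimensional analysis, τ = σ(m/ε)^{1/2}. As a rough illustration, the sketch below evaluates this expression for carbon-like Lennard-Jones parameters. The numerical values of σ, ε, and m are approximate and chosen for illustration only, not taken from the paper, and the function name is hypothetical.

```python
import math

def lj_time_unit_ps(sigma_nm, mass_amu, eps_kj_mol):
    """tau = sigma * sqrt(m / eps).

    With sigma in nm, m in amu (g/mol), and eps in kJ/mol, the result is in ps:
    sqrt(amu / (kJ/mol)) = 1e-3 s/m, and 1e-3 s/m * 1e-9 m = 1e-12 s.
    """
    return sigma_nm * math.sqrt(mass_amu / eps_kj_mol)

# Illustrative, approximate parameters for an aliphatic carbon (assumed values).
tau = lj_time_unit_ps(sigma_nm=0.34, mass_amu=12.011, eps_kj_mol=0.458)
print(f"tau ~ {tau:.2f} ps")

# Protocol bookkeeping in units of tau, using the values quoted in the text.
time_step = 1e-3            # tau
run_length = 500_000        # tau (minimum per simulation)
exchange_interval = 500     # tau (shorter of the two quoted intervals)
n_steps = round(run_length / time_step)
n_exchange_attempts = run_length // exchange_interval
print(n_steps, n_exchange_attempts)
```

For these inputs τ is between 1 and 2 ps, consistent with the convention τ = 1 ps, and a 500 000 τ run corresponds to 5 × 10^8 integration steps with at least 1000 exchange attempts.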
For comparison with more generic polymer ensembles, we considered a BS model (often referred to as the Kremer–Grest model),61 which represents each monomer (i.e., residue) with a single CG site. Connections between monomers are represented with the finite extensible nonlinear elastic (FENE) potential. We considered two variations of the BS model, which differed in the treatment of nonbonded interactions. The first (denoted “BS”) employed a purely repulsive WCA potential to represent interactions between monomers, while the second (denoted “BS-LJ”) employed a standard Lennard-Jones (LJ) potential with a cutoff $r_c = 2.5\sigma$. The properties of the BS models are determined in reduced units in terms of the LJ interaction radius, σ, the well depth, ε, and the mass, m, of a monomer. The corresponding time unit is $\tau = \sigma\sqrt{m/\varepsilon}$. The BS models were simulated at a temperature of $T^{*} = 2.0\,\varepsilon/k_B$. Simulations of the BS models were performed with the ESPResSo++ package.62 Each simulation employed a time step of 0.005 τ and was run for $3.2 \times 10^{9}\,\tau$, while using the Langevin thermostat with a damping coefficient of 1.0 $\tau^{-1}$.
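The difference between the two bead–spring variants, and the way a WCA potential is obtained from a Lennard-Jones interaction, can be made concrete in reduced units. The following is a minimal sketch under stated assumptions: the FENE parameters k = 30 ε/σ² and R0 = 1.5 σ are the conventional Kremer–Grest choices and are not specified in the text, and the LJ potential is simply truncated (not shifted) at the cutoff.

```python
import math

R_MIN = 2.0 ** (1.0 / 6.0)  # position of the LJ minimum, in units of sigma

def lj(r, eps=1.0, sigma=1.0):
    """Standard 12-6 Lennard-Jones potential."""
    sr6 = (sigma / r) ** 6
    return 4.0 * eps * (sr6 * sr6 - sr6)

def lj_truncated(r, r_cut=2.5, eps=1.0, sigma=1.0):
    """LJ potential set to zero beyond r_cut * sigma (plain truncation, an assumption)."""
    return lj(r, eps, sigma) if r < r_cut * sigma else 0.0

def wca(r, eps=1.0, sigma=1.0):
    """Purely repulsive WCA potential: LJ shifted up by eps and cut at its minimum."""
    return lj(r, eps, sigma) + eps if r < R_MIN * sigma else 0.0

def fene(r, k=30.0, r0=1.5):
    """FENE bond potential; k and r0 are conventional Kremer-Grest values (assumed)."""
    if r >= r0:
        raise ValueError("bond length must be smaller than r0")
    return -0.5 * k * r0 * r0 * math.log(1.0 - (r / r0) ** 2)

for r in (0.95, 1.0, R_MIN, 1.5, 2.0):
    print(f"r = {r:.3f}  LJ = {lj_truncated(r):+.3f}  WCA = {wca(r):+.3f}")
```

The same shift-and-truncate construction, applied pair by pair to the force-field Lennard-Jones parameters, is what yields a purely repulsive excluded-volume interaction of the kind used in the physics-based models.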
2.4. Analysis
2.4.1. Polymeric Behavior
Because IDPs possess some properties similar to those of more generic polymer systems, such as long-range fluctuations and structural heterogeneity, traditional polymer physics analysis can be useful for providing an overarching description of the conformational ensembles of IDPs.15 The single-chain backbone structure factor, which characterizes the overall shape of a molecule, is given by63,64
![]() |
7 |
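where N is the number of residues (N = 71 for ACTR and N = 59 for NCBD) and q is the wave vector. $\mathbf{r}_i$ corresponds to the position of the Cα atom of the ith residue for the physics-based models and the position of the ith bead for the BS models. S(q) is widely used to characterize polymer systems.64 We also calculated the shape parameters $R_g^2 = \frac{1}{2N^2}\sum_{i,j}(\mathbf{r}_i - \mathbf{r}_j)^2$ (radius of gyration), $R_e^2 = (\mathbf{r}_N - \mathbf{r}_1)^2$ (end-to-end distance), and $d_{C\alpha}^2(i,j) = (\mathbf{r}_i - \mathbf{r}_j)^2$ (inter-residue distance between the Cα atoms). We will use the notation $\langle X \rangle \equiv \langle X^2 \rangle^{1/2}$, where X = {$R_g$, $R_e$, $d_{C\alpha}(i,j)$}. The average (real space) distance between two residues separated by m residues along the chain is calculated as $\langle d_{C\alpha}(m) \rangle = \frac{1}{N_{ij}}\sum_{ij}^{\prime} \langle d_{C\alpha}(i,j) \rangle$, where $\sum_{ij}^{\prime}$ is a sum over all ij pairs with |j – i| = m and $N_{ij}$ is the number of such pairs. Note that $\langle d_{C\alpha}(m) \rangle \propto m^{\nu}$, where ν is the Flory scaling exponent. Thus, it is useful to consider the normalized quantity

$$\langle d_{C\alpha}(m) \rangle / m^{1/2}$$

such that $\langle d_{C\alpha}(m) \rangle / m^{1/2}$ is constant for a random walk and proportional to $m^{0.1}$ for a self-avoiding random walk.13,64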
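The internal-distance analysis described above can be sketched as follows for a single conformation. The helper names are hypothetical, and two simplifications are assumed: the distance for each sequence separation m is taken as a root-mean-square over all pairs with |j – i| = m, and the scaling exponent is estimated by a plain least-squares fit in log-log space. For an ensemble, the squared distances would additionally be averaged over frames.

```python
import math

def internal_distance_profile(coords):
    """Return {m: rms distance between sites i and j with |j - i| = m}."""
    n = len(coords)
    profile = {}
    for m in range(1, n):
        sq = [sum((a - b) ** 2 for a, b in zip(coords[i], coords[i + m]))
              for i in range(n - m)]
        profile[m] = math.sqrt(sum(sq) / len(sq))
    return profile

def normalized_profile(profile):
    """Divide by m^(1/2): flat for an ideal random walk, rising for swollen chains."""
    return {m: d / math.sqrt(m) for m, d in profile.items()}

def scaling_exponent(profile):
    """Least-squares slope of log d(m) versus log m (estimate of the Flory exponent)."""
    xs = [math.log(m) for m in profile]
    ys = [math.log(d) for d in profile.values()]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    num = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    den = sum((x - x_mean) ** 2 for x in xs)
    return num / den

# Limiting case: a fully extended rod with unit spacing has d(m) = m, i.e., nu = 1.
rod = [(float(i), 0.0, 0.0) for i in range(10)]
profile = internal_distance_profile(rod)
nu = scaling_exponent(profile)
norm = normalized_profile(profile)
print(f"nu = {nu:.3f}, normalized distance at m = 4: {norm[4]:.3f}")
```

The rod is used only because its exponent is known exactly; a self-avoiding walk would give ν ≈ 0.59 and hence the weak residual growth of the normalized curve noted above.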
For a slightly more detailed view of the ensemble, we also calculated contact probability maps, which are obtained by determining the probability that a pair of Cα atoms are within a given cutoff distance, rc, from one another. In this case, we have chosen rc = 1.0 nm. Additionally, we calculated the gyration tensor:
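$$S_{mn} = \frac{1}{N}\sum_{i=1}^{N}\left(r_i^{(m)} - \bar{r}^{(m)}\right)\left(r_i^{(n)} - \bar{r}^{(n)}\right)$$

where m, n ∈ {x, y, z}, $r_i^{(m)}$ is the m component of the position of the ith site, and $\bar{r}^{(m)}$ is the corresponding component of the mean position. Note that only the Cα atoms were taken into account in the calculation of the gyration tensor, for consistency with the BS models. The eigenvalues of $S_{mn}$ are calculated and ordered as $\lambda_1 \le \lambda_2 \le \lambda_3$. The asphericity of the chain can be characterized in terms of these eigenvalues: $b = \lambda_3 - \frac{1}{2}(\lambda_1 + \lambda_2)$. The asphericity values reported throughout the text are normalized by $R_g^2 = \lambda_3 + \lambda_2 + \lambda_1$: $\tilde{b} = b/R_g^2$. For a self-avoiding random walk, the ratios of eigenvalues are $\lambda_3:\lambda_2:\lambda_1 \cong 12:3:1$ (i.e., $\lambda_3/\lambda_1 = 12$ and $\lambda_3/\lambda_2 = 4$).65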
2.4.2. Helical Propensity
The helical propensity of the peptide is characterized by the average fraction of helical segments, h(i), for each residue i. h(i) is calculated within the context of the Lifson–Roig formulation,66 which represents the state of each residue as being in either a helical, h, or coil, c, state.67 More specifically, h(i) is defined as the average propensity of sequential triplets of h states along the peptide chain. Following previous work,68 we define the helical region of the Ramachandran (ϕ, ψ) map as ϕ ∈ [−160°, −20°] and ψ ∈ [−120°, 50°], although the precise definition has little impact on h(i).
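The per-residue helical fraction can be sketched from backbone dihedral angles as follows. The helical region of the Ramachandran map is the one quoted above. The interpretation of "sequential triplets" is an assumption of this sketch: a residue is counted as helical in a given frame if it belongs to at least one run of three consecutive residues in the h state. The function names are hypothetical.

```python
PHI_RANGE = (-160.0, -20.0)   # degrees, helical region quoted in the text
PSI_RANGE = (-120.0, 50.0)

def is_helical(phi, psi):
    """True if (phi, psi) lies inside the helical region of the Ramachandran map."""
    return (PHI_RANGE[0] <= phi <= PHI_RANGE[1]
            and PSI_RANGE[0] <= psi <= PSI_RANGE[1])

def helical_fraction(frames):
    """h(i): fraction of frames in which residue i is part of an hhh triplet.

    frames: list of conformations, each a list of (phi, psi) tuples in degrees,
    one tuple per residue.
    """
    n_res = len(frames[0])
    counts = [0] * n_res
    for frame in frames:
        states = [is_helical(phi, psi) for phi, psi in frame]
        in_triplet = [False] * n_res
        for i in range(n_res - 2):
            if states[i] and states[i + 1] and states[i + 2]:
                in_triplet[i] = in_triplet[i + 1] = in_triplet[i + 2] = True
        for i, flag in enumerate(in_triplet):
            counts[i] += flag
    return [c / len(frames) for c in counts]

H = (-60.0, -45.0)    # typical alpha-helical dihedrals
C = (-120.0, 130.0)   # extended (coil) dihedrals, outside the helical region
frames = [[H, H, H, C, H],
          [C, H, H, H, H]]
h = helical_fraction(frames)
print(h)
```

In the toy input, the isolated helical residue at the end of the first frame does not contribute, since it is not part of a triplet; this is the feature that distinguishes h(i) from a simple per-residue occupancy of the helical basin.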
2.4.3. Dimensionality Reduction and Clustering
The conformational landscape of disordered proteins is difficult to characterize within a low-dimensional representation. Linear dimensionality reduction methods typically fail to provide meaningful representations, due to the high level of structural heterogeneity and subtle distinctions between different sub-ensembles. Nonlinear manifold learning methods overcome the limited ability of linear methods to capture nonlinear relationships in the data and can determine the low-dimensional embedding based on a wide variety of criteria. These methods have been more successful in finding low-dimensional embeddings which provide a clear picture of distinct structures in disordered landscapes.69,70 Here we employed the Uniform Manifold Approximation and Projection (UMAP) method, a type of multidimensional-scaling algorithm that attempts to find a balance between resolving global and local properties of the conformational landscape.71 More specifically, given a set of N input features (e.g., intramolecular coordinates), the conformation of the peptide is defined within an N-dimensional space. UMAP obtains the optimal (nonlinear) projection into an n-dimensional space (n < N) using a cost function which simultaneously incorporates pairwise distances between conformations at the largest (global) and smallest (local) scales. In other words, the projection attempts to preserve these two sets of high-dimensional pairwise distances in the low-dimensional space, which results in the preservation of certain features of the conformational landscape. As input features, we employed pairwise distances between Cα atoms and angles between triplets of Cα atoms. To reduce the dimension of the input, we applied the following coarse-graining procedure. We divided the peptide into four-residue segments and computed the minimum distance between atoms belonging to pairs of segments. Pairs of segments separated by less than 3 other segments were excluded. 
Thus, a total of 28 pairwise distances were included in the input features. We then applied the same segment representation to calculate the average angles between triplets of segments, again excluding any combinations where any pair of segments is separated by less than 3 other segments. This yields a total of 84 angles.
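The segment-based distance features described above can be illustrated with a minimal sketch. It assumes one representative atom per residue (the text uses all atoms in each segment), and the function names and the `min_gap` convention (number of intervening segments) are hypothetical choices made for illustration. Because the exact segment bookkeeping is not specified here, the number of features produced by this sketch need not match the counts quoted above.

```python
from itertools import combinations

import numpy as np


def segment_indices(n_residues, seg_len=4):
    """Split residue indices 0..n_residues-1 into consecutive segments."""
    return [np.arange(s, min(s + seg_len, n_residues))
            for s in range(0, n_residues, seg_len)]


def segment_min_distances(coords, seg_len=4, min_gap=3):
    """Minimum inter-segment distances for one conformation.

    coords  : (n_residues, 3) array with one representative atom per residue
              (an assumption made for this sketch).
    min_gap : pairs with fewer than `min_gap` segments between them are skipped.
    """
    coords = np.asarray(coords, dtype=float)
    segs = segment_indices(len(coords), seg_len)
    feats = []
    for a, b in combinations(range(len(segs)), 2):
        if b - a - 1 < min_gap:
            continue
        # all atom-atom distances between the two segments
        diff = coords[segs[a]][:, None, :] - coords[segs[b]][None, :, :]
        feats.append(np.linalg.norm(diff, axis=-1).min())
    return np.array(feats)
```

Applying `segment_min_distances` to every frame of a trajectory and stacking the results gives a feature matrix of the kind that could be passed to a manifold learning method.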
We performed UMAP with an embedding dimension of 2, using the standard Euclidean distance as the metric for evaluating similarity of structures (according to their input features). UMAP requires the choice of two other hyperparameters: the number of neighbors and the minimum distance. Over the range of hyperparameters considered, the resulting embedding space appeared to be relatively robust, but displayed a noticeable change in the “clustering” of data points as a function of either of the hyperparameters. We chose parameter values which resulted in “reasonable” clustering (i.e., a balance between a single cluster and a very diffuse landscape of points): 819 neighbors and a minimum distance of 0.01. Since the conclusions drawn from this analysis are largely qualitative, we do not believe that the hyperparameter choice plays a significant role in our analysis. The UMAP projection was determined using the conformational ensemble generated by model 4. Subsequently, this projection was applied to the ensembles from each of the other models for consistent comparisons. This projection involves a “small” statistical component which has been shown to be normally distributed. Thus, we performed the projection 10 times for each configuration while randomly shuffling the input features. The average of the resulting UMAP coordinates was taken as the “true” projection and used to generate the free-energy landscapes presented below.
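As an illustration of the final step above, a free-energy landscape can be estimated from a set of two-dimensional embedding coordinates by histogramming and taking the negative logarithm of the resulting probabilities. The following is a minimal sketch, not the procedure used in this work: the function name, the bin count, and the choice of kT = 1 are assumptions.

```python
import numpy as np


def free_energy_surface(xy, bins=50, kT=1.0):
    """Return F = -kT ln P on a 2D grid, shifted so that min(F) = 0.

    xy : (n_frames, 2) array of embedding coordinates.
    Empty bins are assigned NaN. kT and the bin count are illustrative.
    """
    xy = np.asarray(xy, dtype=float)
    hist, xedges, yedges = np.histogram2d(xy[:, 0], xy[:, 1], bins=bins)
    prob = hist / hist.sum()
    F = np.full(prob.shape, np.nan)
    mask = prob > 0
    F[mask] = -kT * np.log(prob[mask])
    F -= np.nanmin(F)
    return F, xedges, yedges
```

If repeated projections are stored as an array of shape (n_repeats, n_frames, 2), the averaged coordinates are obtained with `.mean(axis=0)` before calling `free_energy_surface`.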
While nonlinear dimensionality reduction is necessary for providing a clear description of the overall conformational ensemble, linear methods are very effective if one is only interested in distinguishing between different helical states. Thus, we also applied principal component analysis (PCA) on the conformation space characterized by the ϕ/ψ dihedral angles of each residue along the peptide backbone.72 We then performed a k-means clustering73 along the largest three principal components in order to partition the conformation space into 50 states. We subsequently grouped these 50 microstates into 8 coarser states by applying the PCCA+ dynamical coarse-graining method.74
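A dihedral PCA of the kind described above can be sketched with NumPy alone. The sketch assumes the common sine/cosine transformation of the angles, which avoids the artificial discontinuity at ±π; whether this exact featurization was used here is an assumption, and the function name is hypothetical.

```python
import numpy as np


def dihedral_pca(phi_psi, n_components=3):
    """PCA on sine/cosine-transformed dihedral angles.

    phi_psi : (n_frames, n_angles) array of angles in radians.
    Returns (projections, explained_variance).
    """
    angles = np.asarray(phi_psi, dtype=float)
    # sin/cos transform removes the jump at +/- pi (assumed featurization)
    feats = np.concatenate([np.cos(angles), np.sin(angles)], axis=1)
    feats -= feats.mean(axis=0)
    # SVD of the centered data: rows of vt are the principal axes
    u, s, vt = np.linalg.svd(feats, full_matrices=False)
    proj = feats @ vt[:n_components].T
    var = s[:n_components] ** 2 / (len(feats) - 1)
    return proj, var
```

The leading columns of `proj` can then be passed to a clustering method to define microstates.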
3. Results and Discussion
In this work, we characterize the role of
distinct interactions
in determining the disordered ensembles of IDPs. The focus of the
study is the “fully disordered” peptide ACTR, which
displays only transient helical structures. ACTR has 71 residues consisting
of 26 hydrophobic residues, 18 charged residues, and a net charge
of −8 (eq 1).
The average radius of gyration of ACTR, ⟨Rg⟩, determined from small-angle X-ray scattering
experiments, is 26.5 Å at 5 °C and 23.9 Å at 45 °C.6 Note that the average size of ACTR decreases
when the temperature is increased from 5 to 45 °C. It has been
argued that many disordered proteins undergo such a collapse with
increasing temperature due to the unfavorable solvation free energy
of individual residues.75 The temperature-dependent
collapse of IDPs can be captured by atomistic simulations with explicit
solvent, while temperature-dependent force field parameters are required
for implicit solvent CG models.76 For this
reason, the present study focuses on the ensemble of conformations
sampled at a single temperature. In particular, we focus on the higher
temperature ensemble of ACTR and investigate models which approximately
reproduce the experimental ⟨Rg⟩ at 45 °C. We employ an intermediate-resolution physics-based
CG model, which represents the excluded volume of the peptide with
united-atom resolution, while treating the attractive interactions
which stabilize secondary and tertiary structure in a much coarser
manner. The model also represents the solvent implicitly through these
attractive interactions. We consider eight distinct models with different
interaction sets, as summarized in Table 1 and described in detail in the Methods. The models are separated into three groups:
(i) models 1 and 2, without explicit attractive interactions, (ii)
models 3a, 3b, and 4, without hydrogen-bonding-like interactions,
and (iii) models 5a, 5b, and 6, with hydrogen-bonding-like interactions.
3.1. ACTR as a Sequence-Specific Self-Avoiding Random Walk
By employing only bond and excluded volume interactions,
model 1 treats ACTR as a self-avoiding polymer, similar to standard
BS polymer models. The main difference here is that the excluded volume
interactions are highly specific (represented at a united-atom level
of resolution), such that they induce some amount of sequence specificity
into the model. Figure 2a shows the distribution of Rg values
for ACTR generated by simulations of model 1 (blue curve) at a reduced
temperature 0.87T*. T* is defined
as the temperature at which the model reproduces the experimental ⟨Rg⟩ at 45 °C. For model 1, ⟨Rg⟩ is
approximately independent of temperature,
as expected for a self-avoiding random walk under athermal solvent
conditions.12 For this reason, and since
there is no free interaction parameter in model 1 for reproducing
the experimental ⟨Rg⟩, we cannot directly define T* in this case. However,
the value of the temperature-independent
⟨Rg⟩ for model 1 is 26.4 Å (dashed blue line in Figure 2a), which is nearly the same as the experimentally
measured ⟨Rg⟩ at 5 °C (26.5 Å). Therefore,
we can interpret this model
as representing an ensemble at 0.87T* ([5 + 273] K/[45 + 273] K ≃ 0.87).
Figure 2.
(a) Distribution
of the radius of gyration, Rg, (b) single-chain
backbone structure factor, S(q),
(c) root-mean-square normalized distance between
pairs of residues separated by |j – i| residues along the chain, and (d) the probability of pairs of Cα atoms to be within a cutoff of 1.0 nm. In panel (a)
the dashed black line indicates the experimental result of ⟨Rg⟩ at 45 °C. In panels (a)–(c),
the blue, red and magenta curves correspond to results from model
1, model 2 and the BS model, respectively. The arrows in panel (b)
indicate the value of q at which the scaling law
of S(q) changes for model 1 (filled
arrow) and for the BS model (empty arrow). In panel (d), the top and
bottom triangles correspond to results from model 1 and the BS model,
respectively.
Figure 2b–d
presents various ensemble-averaged properties of model 1 at 0.87T* (blue curves). The average fraction of helical segments
per residue is negligible in this model, due to the lack of interactions
that stabilize helices (see Figure S6). Figure 2b presents the structure
factor, S(q), which describes the
overall shape of the protein at three characteristic length scales.64 For small q (q < 2π/Rg), S(q) ≈ N (N = 71 for ACTR).
For 2π/Rg < q < 2π/lk, a power law of S(q) ∼ q–1/ν occurs, where ν describes the quality of the solvent according
to standard polymer theory.12,63 The so-called Kuhn
length, lk, is model-dependent. For q > 2π/lk, S(q)
∼ q–1, corresponding to a
rigid rod. For model 1, lk ≈ 2.2
nm, since the crossover to rigid rod scaling occurs at approximately q ∼ 2.9 nm–1 (filled arrow in Figure 2b). Additionally,
ν ≈ 3/5 in the region 2π/Rg < q < 2π/lk, indicating that the
conformational ensemble
generated by model 1 is comparable to a polymer in good solvent (i.e.,
extended conformations are prominent). Figure 2c presents the root-mean-square (normalized)
distance between Cα atoms for two residues
separated by |j – i| residues along the chain. For each separation m, the average
is taken over all ij pairs
with |j – i| = m, where Nij* is the number of such pairs. Note that the normalization
by |j – i| is in contrast
to other related
work77 (see Methods for further details and Figures S7 and S8 for plots of the unnormalized root-mean-square distances and additional
analysis of the normalized distance, respectively).
This normalized distance characterizes the local concentration of
peptide segments for short separation distances and the global behavior
of the chain for
larger separation distances (|j – i| ∼ N). For model 1,
the normalized distance increases monotonically as a function of
|j – i|, reaching a value
of approximately 0.82 nm at |j – i| = N. This behavior is very similar to that of
a self-avoiding random walk (magenta curve, discussed further below)
and is thus consistent with the analysis of S(q). Figure 2d presents the probability that a particular pair of residues, i and j, are in contact (i.e., their Cα atoms are within 1 nm of one another). The top left
triangle of the plot corresponds to the conformational ensemble generated
by model 1, displaying a very low probability of two residues being
in contact if they are situated more than a few residues from one
another along the chain. In other words, the chain is very extended,
in further support of the results from the shape parameters.
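For reference, the single-chain structure factor discussed above can be evaluated for a single conformation with the Debye formula, which gives S(q → 0) = N as stated in the text. The sketch below is an illustration under stated assumptions, not the analysis code used here: the function name is hypothetical, and an ensemble average would be obtained by averaging the result over frames.

```python
import numpy as np


def structure_factor(coords, q_values):
    """Single-chain S(q) from the Debye formula for one conformation.

    S(q) = (1/N) * sum_{i,j} sin(q r_ij) / (q r_ij), so that S(q -> 0) = N.
    coords   : (N, 3) array of bead positions (e.g., in nm).
    q_values : 1D array of wavevector magnitudes (e.g., in 1/nm).
    """
    coords = np.asarray(coords, dtype=float)
    n = len(coords)
    r = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    s = np.empty(len(q_values))
    for k, q in enumerate(q_values):
        # np.sinc(x) = sin(pi x)/(pi x), which handles r_ij = 0 cleanly
        s[k] = np.sinc(q * r / np.pi).sum() / n
    return s
```

On a log-log plot of S(q), the slope in the intermediate regime gives –1/ν, and the crossover to q–1 scaling gives an estimate of the Kuhn length; the values quoted above for model 1 (q ≈ 2.9 nm–1, lk ≈ 2.2 nm) are consistent with lk ≈ 2π/q.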
For a more direct comparison with a standard polymer model, we also simulated a BS polymer model, commonly referred to as the Kremer–Grest model.61 Figure S9 demonstrates the temperature-independent distribution of Rg values for this model. We aligned the length scale of the models by applying a rescaling factor (0.45 nm) to the BS model such that ⟨Rg(BS)⟩ = ⟨Rg⟩. The temperature of the BS model (T* = 2.0 ε/kB) was chosen such that the width of the distribution of Rg values approximately reproduced that of model 1. In the BS model, residues interact according to a purely repulsive (i.e., WCA) potential and the bonds between neighboring beads are represented with a FENE potential. Thus, the main difference between the models is the accuracy with which model 1 describes the excluded volume of both the backbone and the side chains. Figure 2 demonstrates that, with the exception of a broader distribution of Rg values (Figure 2a), a shorter Kuhn length (lk ≈ 0.66 nm, indicated by the empty arrow in Figure 2b), and a modest change in the probabilities of contact for neighboring residues (Figure 2d), the conformational ensemble of the BS model is very similar to the ensemble generated by model 1. We also compared the gyration tensors from the BS model and from model 1. The gyration tensor eigenvalues and normalized asphericity values, b̃, are given in Table 2. As shown in Figure 3, the ratios of the gyration tensor eigenvalues are λ3:λ2:λ1 = 12.20:3.13:1 for the BS model compared with λ3:λ2:λ1 = 11.81:3.12:1 for model 1, further confirming the self-avoiding random walk behavior generated by model 1. Additionally, the ensembles generated by these models yield similar asphericity values: b̃(BS) = 0.62; b̃(1) = 0.61.
Table 2. Eigenvalues of the Gyration Tensor and Normalized Asphericity Values.
model id | λ3 [nm2] | λ2 [nm2] | λ1 [nm2] | b̃ |
---|---|---|---|---|
BS | 4.88 | 1.25 | 0.40 | 0.62 |
1 | 5.08 | 1.34 | 0.43 | 0.61 |
2 | 6.47 | 1.76 | 0.55 | 0.61 |
BS-LJ | 3.58 | 1.41 | 0.57 | 0.47 |
3a | 3.54 | 1.20 | 0.41 | 0.53 |
3b | 3.80 | 1.19 | 0.42 | 0.55 |
4 | 3.56 | 1.14 | 0.39 | 0.55 |
5a | 3.55 | 1.15 | 0.40 | 0.54 |
5b | 3.51 | 1.12 | 0.39 | 0.55 |
6 | 3.38 | 1.04 | 0.36 | 0.56 |
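The quantities in Table 2 can be related to one another with a short sketch. It assumes the standard definition of the normalized asphericity, b̃ = [λ3 – (λ1 + λ2)/2]/(λ1 + λ2 + λ3); the precise definition used in this work is given in the Methods, but this form reproduces the tabulated b̃ values to two decimal places from the tabulated eigenvalues. The function names are hypothetical.

```python
import numpy as np


def gyration_eigenvalues(coords):
    """Eigenvalues (ascending: l1 <= l2 <= l3) of the gyration tensor."""
    coords = np.asarray(coords, dtype=float)
    d = coords - coords.mean(axis=0)
    tensor = d.T @ d / len(coords)
    return np.linalg.eigvalsh(tensor)


def normalized_asphericity(l1, l2, l3):
    """b~ = [l3 - (l1 + l2)/2] / (l1 + l2 + l3), with l3 the largest."""
    return (l3 - 0.5 * (l1 + l2)) / (l1 + l2 + l3)


# Consistency check against selected rows of Table 2 (eigenvalues in nm^2)
table2 = {"BS": (0.40, 1.25, 4.88), "1": (0.43, 1.34, 5.08),
          "BS-LJ": (0.57, 1.41, 3.58), "3a": (0.41, 1.20, 3.54)}
for name, (l1, l2, l3) in table2.items():
    print(name, round(normalized_asphericity(l1, l2, l3), 2),
          round(l3 / l1, 2), round(l2 / l1, 2))
```

The eigenvalue ratios printed by the loop correspond to the λ3:λ2:λ1 ratios quoted in the text.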
Figure 3.
Ratio of eigenvalues of the gyration tensor: (a) λ3/λ1; (b) λ2/λ1.
Figure 2 also presents
properties generated from simulations of model 2 (red curves). In
contrast to model 1, model 2 introduces an effective backbone stiffness
into the set of interactions which results in an overall expansion
of the peptide for comparable absolute temperatures. In fact, for
this particular model, ⟨Rg(expt)⟩45°C is too low to reproduce
at any temperature due to the fixed nature of the effective stiffness
of the backbone, as determined by the Amber99SB-ILDN force field.
Nevertheless, to illustrate the overall properties of the model, Figure 2 presents results
from 0.4T*, with ⟨Rg(2)⟩0.4T* = 30.9 Å (dashed
red line in Figure 2a). In this case, T* was approximated via a linear
extrapolation of ln Rg(T) (i.e., assuming Arrhenius behavior). Figure 2b demonstrates that model 2 has properties
similar to those of model 1 (i.e., the peptide behaves approximately
as a polymer in good solvent). However, the crossover to S(q) ∼ q–1 occurs at a smaller q compared with model 1, indicating
that the addition of backbone stiffness results in a larger approximate lk, as expected. The contact probability maps
of the two models are also quite similar (Figure S10). However, Figure 2c demonstrates more clearly the effect of local backbone stiffness.
In particular, the normalized distance grows more quickly
for |j – i| ≤ 40,
compared with model 1,
and then drops slowly to a value of about 0.92 nm at |j – i| = N. The peak at |j – i| ≈ 40 indicates that
the chain is locally more rigid in model 2, while the larger distance
at |j – i| = N is indicative of more extended conformations overall, as seen in Figure 2a. We also compared
the gyration tensor for these models (Figure 3). The ratios of the gyration tensor eigenvalues
for model 2 are λ3:λ2:λ1 = 11.76:3.20:1, again demonstrating behavior similar to that
of model 1. The ensemble generated by model 2 also has asphericity
comparable to that of the ensemble generated by model 1 (see Table 2).
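The normalized root-mean-square distance compared in Figure 2c can be sketched as follows. The sketch assumes one plausible reading of the normalization, in which the mean-squared distance is divided by |j – i| before the square root is taken; the precise definition is given in the Methods, and the function name is hypothetical.

```python
import numpy as np


def normalized_rms_distance(traj):
    """RMS distance between residues i and j, normalized by |j - i|.

    traj : (n_frames, n_residues, 3) array of C-alpha positions.
    Returns an array R of length n_residues, where R[m] (m >= 1) is
    sqrt( < r_ij^2 > / m ), averaged over frames and all pairs with j - i = m.
    R[0] is set to NaN. This normalization is an assumption of the sketch.
    """
    traj = np.asarray(traj, dtype=float)
    n = traj.shape[1]
    out = np.full(n, np.nan)
    for m in range(1, n):
        d2 = ((traj[:, m:, :] - traj[:, :-m, :]) ** 2).sum(axis=-1)
        out[m] = np.sqrt(d2.mean() / m)
    return out
```

Under this definition, an ideal random walk gives a constant value equal to the bond length, whereas a chain with ⟨r²⟩ ∼ m^(2ν) and ν > 1/2 gives a slowly increasing curve, and a straight rod grows as the square root of m.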
To obtain a more detailed picture of the conformational landscapes of these models, we performed a dimensionality reduction using the UMAP nonlinear manifold learning algorithm71 to determine a two-dimensional embedding upon which to view the ensembles. UMAP attempts to retain both the local pairwise connectivity as well as the overall global structure of the high-dimensional input space, within a lower-dimensional (e.g., two-dimensional) projection. As input features for this procedure, we employed distances between pairs of segments along the peptide and angles between triplets of segments, as described in the Methods. For consistent comparison, the two-dimensional UMAP embedding was determined from simulations of model 4 and subsequently applied to the other conformational ensembles. Figure 4a,b demonstrates an approximate physical interpretation of each of the embedding dimensions. Figure 4bi presents a scatter plot of points sampled along the embedding, with colors corresponding to the Rg of each conformation. There is a significant correlation between UMAP-1 and Rg, although this relationship is notably nonlinear. Additionally, the distribution of conformations is significantly broader along UMAP-1 compared with Rg. The second dimension is more difficult to directly interpret. Figure 4bii presents a scatter plot with colors corresponding to the average angle formed between segments 1, 7, and 11, when the peptide is partitioned into segments of four residues. Figure 4a presents an illustration of this angle for two representative conformations. As one moves from the lower left to the upper right of the embedding space, conformations display an overall transition from extended structures to more hairpin-like structures. The UMAP landscape provides a clearer view of the heterogeneous ensemble of structures sampled by ACTR, compared with, for example, free-energy landscapes plotted as a function of Rg and Re (Figure S11). 
The nonlinear nature of this embedding results in structured free-energy landscapes, which are often not possible for disordered ensembles using linear techniques.69,70 Figure 4c presents the free-energy landscapes along the embedding for model 1 and for the BS model. Both models appear to sample very similar conformational ensembles (of primarily extended, larger Rg, structures), consistent with the analysis of the shape parameters above.
Figure 4.
(a) Illustrations of the angle θ, formed between segments 1, 7, and 11, when the peptide is partitioned into segments of four consecutive residues along the backbone. (b) Heat maps of (i) Rg and (ii) θ along the coordinates determined from the UMAP manifold learning algorithm. (c) Free-energy landscapes generated by (i) model 1 and (ii) the BS model along the UMAP coordinates.
3.2. Effect of Hydrophobic Attraction between Side-Chains
Models 3a, 3b, and 4 go beyond the simple self-avoiding walk picture by incorporating attractive interactions between Cβ atoms to represent the solvent-induced hydrophobic attraction between amino acid side chains. While models 3b and 4 take into account the relative hydrophobicity of each residue and scale this hydrophobic attraction accordingly, model 3a employs a uniform hydrophobic attraction which reproduces the average hydrophobicity of the peptide chain. In addition to hydrophobic attractions, model 4 incorporates explicit electrostatic interactions between charged residues via the Debye–Hückel formalism. Figure 5 presents a comparison of the properties generated by these models at T*. Figure 5a presents the distribution of Rg values for models 3a, 3b, and 4 as the blue, red, and orange curves, respectively. The distributions are nearly identical, although model 4 has a slight tendency toward more collapsed structures. This demonstrates an insensitivity in the overall dimensions of the peptide to changes in specific interactions between residues (given the constraints enforced by the excluded volume interactions). Similar to models 1 and 2, the formation of helices is negligible for these models (Figure S6). However, these models no longer demonstrate properties of a polymer in good solvent (Figure 5b,c). In particular, S(q) displays ν = 1/2 dependence, representing a polymer in Θ solvent. In other words, the attractive hydrophobic interactions approximately counteract the effect of excluded volume and chain stiffness, resulting in random walk behavior.
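For readers unfamiliar with the Debye–Hückel formalism mentioned above, the screened Coulomb energy between two point charges can be sketched as follows. The relative permittivity and Debye length used as defaults are illustrative assumptions and are not the parameters of model 4; the function name is hypothetical.

```python
import numpy as np

# e^2 / (4 pi eps0) in kJ mol^-1 nm, approximate value commonly used in MD codes
COULOMB_CONST = 138.935458


def debye_huckel_energy(r, q_i, q_j, eps_r=80.0, debye_length=1.0):
    """Screened Coulomb energy (kJ/mol) between two point charges.

    r in nm; charges in units of the elementary charge.
    eps_r and debye_length are illustrative assumptions, not the paper's values.
    """
    r = np.asarray(r, dtype=float)
    return COULOMB_CONST * q_i * q_j / (eps_r * r) * np.exp(-r / debye_length)
```

Like charges repel and opposite charges attract, with the interaction decaying faster than the bare Coulomb form beyond the Debye length.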
Figure 5.
(a) Distribution of the radius of gyration, Rg, (b) single-chain backbone structure factor, S(q), (c) root-mean-square normalized distance
between
pairs of residues separated by |j – i| residues along the chain, and (d) the probability of pairs of Cα atoms to be within a cutoff of 1.0 nm. In panel (a)
the dashed black line indicates the experimental result of ⟨Rg⟩ at 45 °C. In panels (a)–(c),
blue, red, orange, and magenta curves correspond to results from model
3a, model 3b, model 4, and the BS-LJ model, respectively. In panel
(d), the top and bottom triangles correspond to results from model
3a and the BS-LJ model, respectively.
Figure 5c also demonstrates
notable differences of these conformational ensembles, relative to
the self-avoiding random walks. In particular, the normalized distance displays a maximum at |j – i| ≈ 15, which reflects the local rigidity of the chain due
to the backbone stiffness (as seen for model 2). As |j – i| increases
beyond 15,
the normalized distance decreases until a minimum is reached at
|j – i| ≈ 55, due
to the attractive hydrophobic interactions between side chains which
promote more collapsed structures. Finally, the slight increase of
the normalized distance for larger |j – i| values demonstrates persistent conformational heterogeneity
(i.e., the ensemble is not completely collapsed). Models 3a and 3b
demonstrate very similar behavior, although a slight expansion of
distances is observed in model 3b over the entire range of |j – i| separations. The inclusion
of electrostatics in model 4 results in noticeable compaction of the
ensemble for larger |j – i| separations. This result may seem surprising, since ACTR has a
−8 net charge. However, recall that we have calibrated the
energy scale of each model by adjusting the absolute simulation temperature
to match ⟨Rg⟩ with the experimental
value. In this case, the direct effect of adding electrostatics to
the model does indeed result in a shift in the Rg distribution to larger values if the absolute simulation
temperature remains fixed, as expected from the net charge on the
chain. By considering the models at T* we demonstrate
that, given ensembles with fixed ⟨Rg⟩, the ensemble generated by the model
with electrostatics samples somewhat more compact structures.
We again compare these ensembles with a standard polymer model
(BS-LJ), which now incorporates attractive interactions between monomers,
as described in the Methods. The obtained
distribution of Rg values as a function
of temperature can be seen in Figure S9. We again aligned the length scale of the models by applying a rescaling
factor (0.73 nm) to the BS-LJ model such that ⟨Rg(BS-LJ)⟩ = ⟨Rg⟩45°C. The
distribution of Rg generated by the BS-LJ
model is presented in Figure 5a (magenta curve), showing a narrower distribution and fewer
very compact structures compared with model 3a. This may be partially
due to the fact that we have not reoptimized the temperature for the
BS-LJ model (T* = 2.0 ε/kB) to fit the width of the distribution of Rg values. Significant differences are also observed in S(q) (Figure 5b), which demonstrates ν = 1/4 behavior,
indicating that the chain behaves more like a polymer under poor solvent
conditions in the BS-LJ model (i.e., samples overall more compact
conformations). This result is consistent with previous work with
this model, which identified the Θ temperature as approximately T* = 3.0 ε/kB.78 The S(q) behavior
appears to be in conflict with the distribution of Rg (Figure 5a), which is narrower than the distribution generated by model 3a,
without sampling the compact tail of the distribution from model 3a.
However, Figure 5c
demonstrates that although a maximum in the normalized distance occurs at short residue separations in
the BS-LJ model, due to a lack of interactions governing local stiffness
of the chain, larger
values of the normalized distance are also
attained in this region.
These larger average distances between residues at short separation
along the chain likely prevent the sampling of structures with the
smallest Rg values. At the same time,
the lack of chain rigidity along with the presence of attractive interactions
between monomers together promote an increased sampling of compact
structures, leading to apparently compact behavior at intermediate
length scales. Additional distinctions between the two ensembles can
be seen by examining the ratios of the gyration tensor eigenvalues,
which are λ3:λ2:λ1 = 6.28:2.47:1 for the BS-LJ model and λ3:λ2:λ1 = 8.63:2.93:1 for model 3a (Figure 3). Moreover, the
ensemble generated by the BS-LJ model (b̃(BS-LJ) = 0.47) is slightly more spherical than the
ensemble generated by model 3a (b̃(3a) = 0.53). Figure 5d presents the contact probability maps generated by model 3a (upper
left) and the BS-LJ model (lower right). While both models display
increased probability of long-separation (along the chain) contacts,
relative to the models without attractive interactions, the comparison
highlights the simplicity of the BS-LJ ensemble relative to the ensemble
generated by model 3a. In contrast to the slightly more expanded ensembles
generated by models 1 and 2, sequence-specific excluded volume interactions
(along with the details of local protein chain stiffness) appear to
play a more significant role in determining the finer details of these
more collapsed conformational ensembles. However, the contact probability
maps of models 3a, 3b, and 4 display relatively smaller deviations
from one another (Figure S10). Overall,
the inclusion of attractive interactions results in a structured contact
probability map that remains largely independent of the precise distribution
of hydrophobic attractions.
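The contact probability maps discussed above can be computed from Cα coordinates with a short sketch. The 1.0 nm cutoff follows the figure captions; the function names and the convention for combining two models into a single map are assumptions made for illustration.

```python
import numpy as np


def contact_probability(traj, cutoff=1.0):
    """Fraction of frames in which each residue pair lies within `cutoff` (nm).

    traj : (n_frames, n_residues, 3) array of C-alpha positions in nm.
    Note: builds the full distance array, so long trajectories should be
    processed in chunks.
    """
    traj = np.asarray(traj, dtype=float)
    d = np.linalg.norm(traj[:, :, None, :] - traj[:, None, :, :], axis=-1)
    return (d < cutoff).mean(axis=0)


def combine_triangles(p_upper, p_lower):
    """Show two models in one map: p_upper above the diagonal, p_lower below."""
    combined = np.triu(p_upper, k=1) + np.tril(p_lower, k=-1)
    np.fill_diagonal(combined, 1.0)
    return combined
```

Placing one model in each triangle, as in panel (d) of Figures 2 and 5, allows two ensembles to be compared within a single plot.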
Column (i) of Figure 6 presents the free-energy landscapes for
models 3a, 3b, and 4, plotted
along the UMAP embedding introduced above. The most striking difference
between these landscapes compared to those generated by the self-avoiding
walk models is the expanded diversity of structures sampled despite
rather similar distributions of Rg. The
addition of attractive interactions results in sampling both more
collapsed and more expanded structures compared with model 1. There
are also more subtle differences between the conformational ensembles
generated by models 3a, 3b, and 4. The redistribution of hydrophobicity
in model 3b, compared with model 3a, leads to only a slight shift
in the conformational ensemble, as indicated by the analysis above.
The most prominent difference is perhaps the increased sampling of
the smallest UMAP-1 (largest Rg) values,
although this difference manifests itself as only a minor change in
the overall distribution of Rg. There
is also an increase of structures corresponding to the largest values
of UMAP-1. Overall, the differences between the ensembles generated
by models 3a and 3b appear to be distributed throughout the entire
embedding space, resulting in “averaging out” and little
difference in the overarching features of the disordered ensembles.
However, the introduction of electrostatics (model 4) leads to more
significant differences in the ensemble of structures and, in particular,
a more rugged free-energy landscape (i.e., a larger number of clearly
separated local minima), as seen in Figure 6ci. ACTR has 18 charged residues: 5 are positively
charged and 13 are negatively charged (see eq 1). Overall, the electrostatic interactions
lead to increased sampling of compact structures (positive values
of UMAP-1) and a slight increase in structures with the smallest UMAP-1
(largest Rg) values. The conformations
along UMAP-2 (i.e., with different θ values) appear to be more
uniformly affected by the addition of electrostatic interactions.
It should be noted that the calibration of the energy scales through
the use of reduced temperatures, as discussed above, results in a
distinct balance of stiffness versus attractive interactions in the
different models. In the case of model 4 (and for model 6 below),
a lower absolute simulation temperature is required for this model
to reproduce the appropriate ⟨Rg⟩ value, resulting in larger stiffness energies relative to kBT*. This difference in absolute
simulation temperatures might be interpreted as the reason for the
larger difference in the free-energy landscape for model 4, compared
with models 3a and 3b. Alternatively, one can say that given the fixed
model details (e.g., chain stiffness, hydrophobicity, etc.), the ensemble
which incorporates electrostatics and reproduces the experimental ⟨Rg⟩ does so through an increase in the ensemble
ruggedness.
Figure 6.
Free-energy landscapes generated by models 3a, 3b, and 4 [column (i)] and models 5a, 5b, and 6 [column (ii)] along the coordinates determined from the UMAP manifold learning algorithm.
3.3. Transient Helices
In addition to hydrophobic attractions between side chains, models 5a, 5b, and 6 employ attractive interactions between Cα atoms separated by three peptide bonds along the protein backbone in order to represent hydrogen-bonding interactions. The parameter for this interaction was chosen to approximately reproduce the overall propensity for helices in ACTR, as measured in experiments (described further in the Methods). The current models do not include hydrogen-bonding-like interactions between residues farther apart along the peptide chain, since a propensity toward β-sheet-like secondary structures has not been observed in ACTR. Similar to the previous set of models, model 5a employs uniform hydrophobic interactions, while models 5b and 6 use residue-specific hydrophobicity parameters. Additionally, model 6 incorporates electrostatic interactions between charged residues. Figure 7a presents the distribution of Rg values at T* for models 5a, 5b, and 6 as the blue, red, and orange curves, respectively. We find that the distributions are rather insensitive to the addition of hydrogen-bonding interactions. These models generate SAXS profiles and corresponding Kratky plots in good agreement with experimental measurements (see Figure S12 compared with Figure 2b of ref (7)).
Figure 7.
(a) Distribution of the radius of gyration, Rg, (b) single-chain backbone structure factor, S(q), (c) root-mean-square normalized distance between
pairs of residues separated by |j – i| residues along the chain, and (d) the average fraction of helical
segments, h(i). In panel (a) the
dashed black line indicates the experimental result of ⟨Rg⟩ at 45 °C. In panels (a)–(d),
blue, red, and orange curves correspond to results from models 5a,
5b, and 6, respectively.
Figure 7b,c presents S(q) and the normalized distance, respectively, for models 5a, 5b, and 6.
No significant differences are observed in the behavior of S(q), which can be fit to q–2 (i.e., a polymer in Θ solvent). Figure 7c demonstrates that
the behavior of the normalized distance
is insensitive
to the inclusion of hydrogen-bonding
interactions (i.e., it
follows the same
trend as for models 3a,
3b, and 4). However, similar to the case of model 4,
the normalized distance for model 6, which includes electrostatics,
is smaller than for models 5a and 5b for all separation distances
|j – i|. Additional differences
in the ensembles generated by these three models can be seen by examining
the gyration tensor. As shown in Figure 3, the ratios of the gyration tensor eigenvalues
are λ3:λ2:λ1 =
8.87:2.87:1 for model 5a, 9.00:2.87:1 for model 5b, and 9.39:2.89:1
for model 6. Overall, these results indicate that incorporating hydrogen-bonding
interactions causes a slight shift in the ensembles toward self-avoiding
walk behavior, although the conformational ensemble as a whole still
behaves like a random walk (per S(q)). Moreover, the inclusion of electrostatics amplifies this effect
through increased stabilization of helices, as examined in more detail
below. At the same time, the ensembles remain largely spherical (see Table 2). The contact probability
maps for these models are presented in Figure S10, but they exhibit differences similar to those between
the models without hydrogen-bonding interactions. We characterize
the formation of helices by the propensity of each residue to form
a helical segment, h(i), as described
in the Methods. Figure 7d presents h(i) for models 5a, 5b, and 6 (blue, red, and orange curves, respectively).
All three models demonstrate similar behavior in terms of the position
of helix formation along the chain, due to the accurate treatment
of side-chain excluded volume. For example, there is a dip in the region
with residue indices from 28 to 30, likely due to the presence of
arginine with residue index 29, which is hydrophilic and contains
a rather bulky side chain. The helical regions at residue positions
[9:14], [29:40], [48:54], and [58:62] are in agreement with experimental
observations.6,79 The helical content of models
5a and 5b appears to be somewhat insensitive to the distribution of
hydrophobic interactions, indicating that the precise hydrophobic
contacts play a limited role in the formation of helices (given the
fixed representation of sterics). However, model 6 demonstrates significantly
larger helicity. This may be due to either (i) the generic stabilization
of helices from the increased compaction of the ensemble or (ii) the
increased contact of specific residues which then promote the formation
of helices, which we discuss further below.
The free-energy landscapes for models 5a, 5b, and 6, plotted along the UMAP embedding introduced above, are presented in Figure 6. Similar to the results for models 3a and 3b, the UMAP projections for models 5a and 5b are quite comparable. In contrast to the previous set of models, while model 6 does slightly focus the sampling toward particular regions of the landscape, the ensemble does not appear as rugged as for model 4. However, a clear view of the ensemble is perhaps clouded by the helical conformations, since the UMAP coordinates were determined based on an ensemble without helical conformations. To obtain a more detailed picture of the formation of helices, we performed a dimensionality reduction using PCA while employing the backbone dihedral angles as input features (i.e., dihedral PCA).72 Although the ensembles are largely disordered, linear dimensionality reduction can effectively characterize the formation of transient helices within these ensembles. Figure 8b–d presents the free-energy surfaces generated by models 5a, 5b, and 6, respectively, along the first two PCs. A clustering was performed along the first three PCs, in order to partition the conformational space into 50 microstates. Here, we present a coarse-grained view of this clustering, attained by grouping together sets of microstates. The coarse cluster definitions are presented in Figure 8a as a function of the first two PCs. Figure 9 characterizes each cluster with the intracluster h(i) distributions. Cluster A (blue curve) represents structures with a small helix formed from residues 25–40, while cluster E (gray curve) contains structures with negligible helical conformations. There are two pathways from cluster A to cluster E, which sample either negative (a) or positive (b) values of PC-2. Figure 9 demonstrates that pathway (a) corresponds to unraveling the helix from the N-terminus (Figure 9a), while pathway (b) corresponds to unraveling the helix from the C-terminus (Figure 9b).
The free-energy surfaces in Figure 8 show that models 5a and 5b sample a single dominant pathway for helix formation, while the introduction of electrostatics in model 6 allows for helix formation from either end. This additional pathway leads to a significant increase in the sampling of helical conformations (Figure 7d).
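The dihedral PCA and free-energy projection described above can be illustrated with a minimal sketch. It assumes the backbone dihedral angles are already available as an array in radians. The function names, the cosine/sine featurization, the histogram binning, and the synthetic two-state test data are illustrative assumptions, not the analysis pipeline used in this work.

```python
import numpy as np

def dihedral_pca(dihedrals, n_components=2):
    """Project an (n_frames, n_angles) array of angles (radians) onto its leading PCs."""
    # Map each periodic angle to (cos, sin) so that -pi and +pi are treated as identical.
    features = np.concatenate([np.cos(dihedrals), np.sin(dihedrals)], axis=1)
    centered = features - features.mean(axis=0)
    # SVD of the centered data; the rows of vt are the principal axes.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    projections = centered @ vt[:n_components].T
    explained = singular_values**2 / np.sum(singular_values**2)
    return projections, explained[:n_components]

def free_energy_surface(pc1, pc2, bins=30):
    """Return F = -ln P (units of kT, minimum shifted to zero); empty bins are NaN."""
    hist, xedges, yedges = np.histogram2d(pc1, pc2, bins=bins)
    prob = hist / hist.sum()
    with np.errstate(divide="ignore"):
        free_energy = -np.log(prob)
    free_energy[~np.isfinite(free_energy)] = np.nan
    free_energy -= np.nanmin(free_energy)
    return free_energy, xedges, yedges

# Synthetic stand-in for simulation data (hypothetical): a two-state mixture of
# helix-like and extended-like (phi, psi) angles for 10 residues.
rng = np.random.default_rng(0)
n_frames, n_res = 2000, 10
helical = rng.random(n_frames) < 0.3
phi = np.where(helical[:, None], -1.0, -2.0) + 0.2 * rng.standard_normal((n_frames, n_res))
psi = np.where(helical[:, None], -0.8, 2.5) + 0.2 * rng.standard_normal((n_frames, n_res))
dihedrals = np.concatenate([phi, psi], axis=1)

projections, explained = dihedral_pca(dihedrals)
free_energy, xedges, yedges = free_energy_surface(projections[:, 0], projections[:, 1])
```

In this toy example, the helix-like and extended-like frames separate along the first PC. The partitioning into 50 microstates mentioned in the text would be applied to the first three columns of such a projection (i.e., with `n_components=3`) using a separate clustering step, which is not shown here.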
Figure 8.
(a) Conformational clusters of ACTR presented along the two dominant principal components (PCs). (b)–(d) Free-energy surfaces of ACTR generated using models 5a, 5b, and 6, plotted along the two dominant PCs.
Figure 9.
Intracluster fraction of helical segments, h(i), for (a) N-terminus and (b) C-terminus folding pathways, as characterized on the PCA landscape (Figure 8) using model 6.
3.4. Clarifying the Role of Excluded Volume in the Formation of Helical Structures
We have considered the impact that both generic and specific attractive interactions have on the resulting conformational ensembles of peptides with the length and approximate excluded volume of ACTR. To explicitly demonstrate the role that the steric interactions play, we consider models for the uncharged polypeptides (Alanine)71 and (Glycine)71, denoted as polyA and polyG, respectively, which have the same parameters εhp and εhb as model 5a but lack the sequence-specific side-chain sterics of ACTR. Because the resulting ensembles are dramatically different from one another, it is not feasible to match ⟨Rg⟩ by adjusting the temperature, as performed for the other models in this study. Instead, we employ the same absolute simulation temperature as for model 5a, allowing ⟨Rg⟩ to deviate from the experimental value.
Figure 10 presents the average fraction of helical segments, h(i), for ACTR, polyA, and polyG. In going from the ACTR model to the polyA model, all side-chain atoms except the Cβ atoms are removed. The removal of excluded volume interactions reduces the entropy loss upon helix formation, significantly promoting the sampling of helical conformations. By further removing the Cβ atoms, from polyA to polyG, the attractive interactions which stabilize compact structures (including helices) are removed, resulting in a complete absence of helical conformations. Despite the uniformity of the sequence, the helicity of polyA demonstrates sequence-dependent behavior. The two regions of smaller helicity at [25:30] and [47:52] (black lines in Figure 10) arise due to the likelihood of the chain bending at positions corresponding to 1/3 and 2/3 of the total chain length, in order to maximize hydrophobic contact in compact structures (see Figure S13). Thus, even if one were to reparametrize the model to reproduce the appropriate ⟨Rg⟩ value and overall h(i) magnitude, the formation of helices in the models that do not accurately represent the side-chain sterics would be qualitatively incorrect. Furthermore, in contrast to the small differences in the overall “shape” of the protein for the various models with differing energetics considered above, there is a dramatic change in the conformational ensemble associated with the modification of excluded volume interactions (Figure 10b). This motivates the use of models that accurately represent the protein sterics for the investigation of disordered conformational ensembles.
Figure 10.
(a) Average fraction of helical segments, h(i), and (b) distribution of the radius of gyration, Rg, for ACTR (blue), polyA (red), and polyG (orange), determined from simulations of model 5a. The two regions marked by the black lines in panel (a) span the residue ranges [25:30] and [47:52].
3.5. Conformational Ensemble of NCBD
To investigate the applicability of the considered models for investigating distinct disordered ensembles, we consider NCBD, the binding partner of ACTR. NCBD has 59 residues, with 27 hydrophobic residues and 8 charged residues (see eq 1). Its unbound conformer (PDB ID: 2KKJ) is a molten globule that has three helices at residue positions [6:19], [23:36], and [36:47] (see Figure 1 for helix positions in the bound state).35,50 Thus, the unbound NCBD protein generates a very distinct conformational ensemble compared with that of ACTR. In fact, NCBD and ACTR are representative examples of two different classes of IDPs77 (see Figure S2b). The ⟨Rg⟩ for NCBD was measured from SAXS experiments to be approximately 18.8 Å under nativelike conditions.6 We consider here only models 5b and 6, to investigate whether electrostatics play a significant role in shaping the unbound conformational ensemble of NCBD. Because initial simulations of these models resulted in a lack of helix stabilization, we increased the energy of the hydrogen-bonding-like interaction, εhb, from 13 to 16.9 (30% larger than that of ACTR), which led to good agreement of both ⟨Rg⟩ and h(i) with respect to the experimental values. We have again calibrated the energy scale of the model by finding the simulation temperature at which the experimental values of ⟨Rg⟩ and h(i) are reproduced, independently of ACTR, although the resulting T* is only 10% larger (in absolute temperature units, i.e., K) than the value for ACTR. The adjustment of parameters to reproduce the properties of NCBD was expected, since these quantities are free-energy functions which rigorously depend on the system identity and thermodynamic state point.80 In fact, the relative insensitivity of the model parameters indicates a certain level of transferability of the model, further motivating the use of simple energetic functions for representing disordered ensembles.
Figure 11a presents the distribution of Rg for models 5b (red curve) and 6 (orange curve), which are nearly identical (⟨Rg⟩ = 18.7 and 18.3 Å, respectively). Similarly, S(q) (Figure 11b) demonstrates ν = 1/2 behavior for both models, indicating that electrostatics play a relatively small role in the overall shape of NCBD. However, Figure 11c demonstrates that the root-mean-square normalized inter-residue distance is significantly different for models 5b and 6 for |j – i| > 15, qualitatively similar to the comparison of models 5b and 6 for ACTR. Without electrostatics (model 5b), NCBD demonstrates less compaction in the intermediate regime due to the onset of attractive interactions, compared with ACTR. The inclusion of electrostatics (model 6) leads to a greater degree of compaction for NCBD in this regime and a significant difference in the normalized distances generated by the two models. The eventual increase of the normalized distance demonstrates that NCBD retains significant conformational heterogeneity within its molten globule ensemble, despite the presence of largely formed helices. The difference in the behavior of the normalized distance for the two models in the case of NCBD is striking, considering the similarity of the ensemble in terms of ⟨Rg⟩, S(q), and h(i). This may be a result of the slightly lower propensity for middle helices in model 6 (Figure 11d), which can allow for the sampling of more compact structures through stacking of the outer helices. However, the gyration tensor provides further evidence of the similarity of the ensembles generated by models 5b and 6. The ratios of the gyration tensor eigenvalues are 9.75:3.35:1 and 9.78:3.04:1 for models 5b and 6, respectively, while the normalized asphericity values are 0.53 and 0.56. Overall, it appears that the conformational ensembles of IDPs with large fractions of secondary structure motifs may be more robust to perturbations in the interactions, assuming a fixed representation of sterics.
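The gyration tensor quantities quoted above can be illustrated with a short sketch. It assumes equal weights for all sites and takes the normalized asphericity to be (λ1 − (λ2 + λ3)/2)/(λ1 + λ2 + λ3), with λ1 ≥ λ2 ≥ λ3 the tensor eigenvalues. This is a common convention, but it is an assumption here rather than a definition confirmed by this section; the function names and the random-walk test chain are likewise hypothetical.

```python
import numpy as np

def gyration_tensor(coords):
    """Return the 3x3 gyration tensor of an (n_sites, 3) coordinate array (equal weights)."""
    centered = coords - coords.mean(axis=0)
    return centered.T @ centered / len(coords)

def shape_descriptors(coords):
    """Return Rg, eigenvalue ratios (largest first, smallest set to 1), and normalized asphericity."""
    eigvals = np.sort(np.linalg.eigvalsh(gyration_tensor(coords)))[::-1]
    rg_squared = eigvals.sum()  # trace of the gyration tensor equals Rg^2
    ratios = eigvals / eigvals[-1]
    asphericity = (eigvals[0] - 0.5 * (eigvals[1] + eigvals[2])) / rg_squared
    return np.sqrt(rg_squared), ratios, asphericity

def asphericity_from_ratios(ratios):
    """Normalized asphericity (assumed definition) computed directly from eigenvalue ratios."""
    l1, l2, l3 = ratios
    return (l1 - 0.5 * (l2 + l3)) / (l1 + l2 + l3)

# Hypothetical single conformation: a 71-site Gaussian random walk.
rng = np.random.default_rng(1)
coords = np.cumsum(rng.standard_normal((71, 3)), axis=0)
rg, ratios, b_norm = shape_descriptors(coords)

# Applying the assumed definition to the eigenvalue ratios reported in the text.
b_5b = asphericity_from_ratios((9.75, 3.35, 1.0))
b_6 = asphericity_from_ratios((9.78, 3.04, 1.0))
```

Under this assumed definition, the reported ratios give values of roughly 0.54 and 0.56, close to the quoted 0.53 and 0.56. A small residual difference would not be surprising if, for instance, the reported asphericity were averaged frame by frame rather than computed from averaged eigenvalues.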
Figure 11.
(a) Distribution of the radius of gyration, Rg, (b) single-chain backbone structure factor, S(q), (c) root-mean-square normalized distance between pairs of residues separated by |j – i| residues along the chain, and (d) the average fraction of helical segments, h(i). In panel (a), the dashed black line indicates the experimental result of ⟨Rg⟩. In panels (a)–(d), red and orange curves correspond to results from models 5b and 6, respectively.
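The ν = 1/2 reading of S(q) rests on the standard scaling S(q) ∝ q^(−1/ν) in the intermediate regime between the inverse chain size and the inverse segment size. The following sketch shows how a single-chain structure factor and an apparent exponent could be estimated from coordinates. The Debye-sum normalization (S → N as q → 0), the function names, and the random-walk test chain are assumptions made for illustration and may differ from the definitions used for Figure 11b.

```python
import numpy as np

def structure_factor(coords, q_values):
    """Isotropically averaged single-chain S(q) from the Debye sum over all site pairs."""
    n_sites = len(coords)
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    # np.sinc(x) = sin(pi x)/(pi x), so sinc(q r / pi) = sin(q r)/(q r), with value 1 at r = 0.
    return np.array([np.sinc(q * dist / np.pi).sum() / n_sites for q in q_values])

def apparent_nu(q_values, s_values):
    """Estimate nu from a log-log fit of S(q) ~ q^(-1/nu) over the supplied q range."""
    slope = np.polyfit(np.log(q_values), np.log(s_values), 1)[0]
    return -1.0 / slope

# Hypothetical single conformation: a 59-site Gaussian random walk.
rng = np.random.default_rng(2)
coords = np.cumsum(rng.standard_normal((59, 3)), axis=0)
rg = np.sqrt(((coords - coords.mean(axis=0)) ** 2).sum(axis=1).mean())

# Small-q check: S(q)/N approaches the Guinier form 1 - (q Rg)^2 / 3.
q_small = 0.1 / rg
s_small = structure_factor(coords, np.array([q_small]))[0]
```

In practice, S(q) would be averaged over all sampled conformations before fitting, and the fit would be restricted to the intermediate q range. For a chain of only 59 residues this window is narrow, so an apparent exponent obtained this way should be interpreted with care.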
4. Conclusions
We have studied the ensembles of two intrinsically disordered peptides, ACTR and NCBD, using a simple physics-based model, which accurately represents peptide sterics and allows an adjustable parametrization to match experimental quantities (e.g., ⟨Rg⟩ and h(i)). A hierarchy of models was considered, which systematically incorporated an increasing number and complexity of interactions, in order to clarify the impact of these interactions on the features of the resulting ensembles. Our analysis demonstrates that the differences between these distinct ensembles are difficult to fully characterize using only traditional shape parameters, such as the distribution of radius of gyration values and the single-chain backbone structure factor. However, the root-mean-square normalized inter-residue distances between Cα atoms, the ratios of gyration tensor eigenvalues, and the contact probability map assist in further distinguishing the overarching features of the ensembles. Additionally, we have employed a manifold learning algorithm in this work, to determine an optimal two-dimensional representation for viewing the ensemble of conformations, which provides an effective way to further clarify the differences between distinct disordered ensembles.
Our investigation found that, with respect to a self-avoiding random walk, disordered ensembles that incorporate hydrophobic interactions lead to a significant increase in conformational heterogeneity. However, given the presence of attractive interactions, the precise identity of these interactions (e.g., the distribution of hydrophobic interactions along the chain or the presence of electrostatics) appears to play a relatively small role in determining the major features of the disordered free-energy landscapes. At the same time, specific interactions can stabilize particular structures which may be relevant for processes under a perturbation of the system (e.g., when a disordered peptide comes into contact with its binding partner). For example, electrostatic interactions increase the ruggedness of the free-energy landscape and stabilize multiple routes to secondary structure formation. These effects appear to be more significant for more disordered, flexible IDPs (e.g., ACTR) than for molten globules (e.g., NCBD). While electrostatics are thought to play an important role in the formation of encounter complexes in IDPs,38,39 the present work suggests that specific contacts between charged residues can promote the presence of transient helices within the ensemble of conformations sampled in solution, which may be relevant for coupled folding and binding processes.
The flexible physics-based model employed in this work facilitated the reproduction of experimental ⟨Rg⟩ and h(i) values for both ACTR and NCBD. These two peptides are representative examples of two different classes of IDPs: “fully disordered” (ACTR) and molten globule (NCBD). Although the (free-energy) parameters of this simple model should be, in principle, highly sequence-specific, we find that only relatively small adjustments were necessary to reproduce the experimental measurements for both systems. This indicates a certain level of transferability in terms of the essential features shaping the free-energy landscape for these disordered systems, motivating the continued use of CG models. Moreover, in conjunction with previous investigations of helix–coil transitions,33,34 our results indicate that excluded volume interactions play a key role in determining the overarching characteristics of heterogeneous landscapes. This further motivates the development of models that can accurately model protein sterics while efficiently sampling conformational space.
Acknowledgments
The authors thank Hsiao-Ping Hsu and Govardhan Reddy for critical reading of the manuscript. Y.Z. and J.F.R. thank Tristan Bereau and Hsiao-Ping Hsu for fruitful discussions. J.F.R. thanks Yasemin Bozkurt Varolgüneş for assistance with the UMAP calculations. J.F.R. is very grateful to Ben Schuler and his group for insightful discussions regarding the NCBD/ACTR system. This work was partially supported by the European Research Council under the European Union’s Seventh Framework Programme (FP7/2007-2013)/ERC Grant Agreement No. 340906-MOLPROCOMP, and by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) project number 233630050-TRR 146.
Supporting Information Available
The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/acs.jpcb.0c01949.
Additional model and simulation details as well as further analysis (PDF)
The authors declare no competing financial interest.
References
- Oldfield C. J.; Dunker A. K. Intrinsically Disordered Proteins and Intrinsically Disordered Protein Regions. Annu. Rev. Biochem. 2014, 83, 553–584. 10.1146/annurev-biochem-072711-164947.
- Uversky V. N.; Oldfield C. J.; Dunker A. K. Intrinsically Disordered Proteins in Human Diseases: Introducing the D(2) Concept. Annu. Rev. Biophys. 2008, 37, 215–246. 10.1146/annurev.biophys.37.032807.125924.
- Kaptein R.; Wagner G. Integrative Methods in Structural Biology. J. Biomol. NMR 2019, 73, 261–263. 10.1007/s10858-019-00267-z.
- Borgia A.; Zheng W.; Buholzer K.; Borgia M. B.; Schueler A.; Hofmann H.; Soranno A.; Nettels D.; Gast K.; Grishaev A.; et al. Consistent View of Polypeptide Chain Expansion in Chemical Denaturants from Multiple Experimental Methods. J. Am. Chem. Soc. 2016, 138, 11714–11726. 10.1021/jacs.6b05917.
- Sterckx Y. G. J.; Volkov A. N.; Vranken W. F.; Kragelj J.; Jensen M. R.; Buts L.; Garcia-Pino A.; Jove T.; Van Melderen L.; Blackledge M.; et al. Small-Angle X-Ray Scattering and Nuclear Magnetic Resonance-Derived Conformational Ensemble of the Highly Flexible Antitoxin PaaA2. Structure 2014, 22, 854–865. 10.1016/j.str.2014.03.012.
- Kjaergaard M.; Teilum K.; Poulsen F. M. Conformational Selection in the Molten Globule State of the Nuclear Coactivator Binding Domain of CBP. Proc. Natl. Acad. Sci. U. S. A. 2010, 107, 12535–12540. 10.1073/pnas.1001693107.
- Kjaergaard M.; Nørholm A. B.; Hendus-Altenburger R.; Pedersen S. F.; Poulsen F. M.; Kragelund B. B. Temperature-Dependent Structural Changes in Intrinsically Disordered Proteins: Formation of α–Helices or Loss of Polyproline II? Protein Sci. 2010, 19, 1555–1564. 10.1002/pro.435.
- Robustelli P.; Piana S.; Shaw D. E. Developing a Molecular Dynamics Force Field for Both Folded and Disordered Protein States. Proc. Natl. Acad. Sci. U. S. A. 2018, 115, E4758–E4766. 10.1073/pnas.1800690115.
- Rauscher S.; Gapsys V.; Gajda M. J.; Zweckstetter M.; de Groot B. L.; Grubmüller H. Structural Ensembles of Intrinsically Disordered Proteins Depend Strongly on Force Field: A Comparison to Experiment. J. Chem. Theory Comput. 2015, 11, 5513–5524. 10.1021/acs.jctc.5b00736.
- Best R. B.; Zheng W.; Mittal J. Balanced Protein-Water Interactions Improve Properties of Disordered Proteins and Non-Specific Protein Association. J. Chem. Theory Comput. 2014, 10, 5113–5124. 10.1021/ct500569b.
- Huang J.; Rauscher S.; Nawrocki G.; Ran T.; Feig M.; de Groot B. L.; Grubmueller H.; MacKerell A. D. Jr. CHARMM36m: An Improved Force Field for Folded and Intrinsically Disordered Proteins. Nat. Methods 2017, 14, 71–73. 10.1038/nmeth.4067.
- Doi M.; Edwards S. F. The Theory of Polymer Dynamics; Clarendon Press: Oxford, U.K., 1986.
- de Gennes P. G. Scaling Concepts in Polymer Physics; Cornell University Press: Ithaca, NY, 1979.
- Huang J.; Grzesiek S. Ensemble Calculations of Unstructured Proteins Constrained by RDC and PRE Data: A Case Study of Urea-Denatured Ubiquitin. J. Am. Chem. Soc. 2010, 132, 694–705. 10.1021/ja907974m.
- Schuler B.; Soranno A.; Hofmann H.; Nettels D. Single-Molecule FRET Spectroscopy and the Polymer Physics of Unfolded and Intrinsically Disordered Proteins. Annu. Rev. Biophys. 2016, 45, 207–231. 10.1146/annurev-biophys-062215-010915.
- O’Brien E. P.; Morrison G.; Brooks B. R.; Thirumalai D. How Accurate are Polymer Models in the Analysis of Forster Resonance Energy Transfer Experiments on Proteins? J. Chem. Phys. 2009, 130, 124903. 10.1063/1.3082151.
- Maity H.; Reddy G. Folding of Protein L with Implications for Collapse in the Denatured State Ensemble. J. Am. Chem. Soc. 2016, 138, 2609–2616. 10.1021/jacs.5b11300.
- Fuertes G.; Banterle N.; Ruff K. M.; Chowdhury A.; Mercadante D.; Koehler C.; Kachala M.; Girona G. E.; Milles S.; Mishra A.; et al. Decoupling of Size and Shape Fluctuations in Heteropolymeric Sequences Reconciles Discrepancies in SAXS vs. FRET Measurements. Proc. Natl. Acad. Sci. U. S. A. 2017, 114, E6342–E6351. 10.1073/pnas.1704692114.
- Thirumalai D.; Samanta H. S.; Maity H.; Reddy G. Universal Nature of Collapsibility in the Context of Protein Folding and Evolution. Trends Biochem. Sci. 2019, 44, 675–687. 10.1016/j.tibs.2019.04.003.
- Taketomi H.; Ueda Y.; Go̅ N. Studies on Protein Folding, Unfolding and Fluctuations by Computer-Simulation. 1. Effect of Specific Amino-Acid Sequence Represented by Specific Inter-Unit Interactions. Int. J. Pept. Protein Res. 1975, 7, 445–459. 10.1111/j.1399-3011.1975.tb02465.x.
- Dill K. A.; Chan H. S. From Levinthal to Pathways to Funnels. Nat. Struct. Mol. Biol. 1997, 4, 10–19. 10.1038/nsb0197-10.
- Onuchic J.; Luthey-Schulten Z.; Wolynes P. Theory of Protein Folding: The Energy Landscape Perspective. Annu. Rev. Phys. Chem. 1997, 48, 545–600. 10.1146/annurev.physchem.48.1.545.
- Onuchic J. N.; Wolynes P. G. Theory of Protein Folding. Curr. Opin. Struct. Biol. 2004, 14, 70–75. 10.1016/j.sbi.2004.01.009.
- Cheung M.; Garcia A.; Onuchic J. Protein Folding Mediated by Solvation: Water Expulsion and Formation of the Hydrophobic Core Occur After the Structural Collapse. Proc. Natl. Acad. Sci. U. S. A. 2002, 99, 685–690. 10.1073/pnas.022387699.
- Clementi C.; Plotkin S. S. The Effects of Nonnative Interactions on Protein Folding Rates: Theory and Simulation. Protein Sci. 2004, 13, 1750–1766. 10.1110/ps.03580104.
- Chan H. S.; Zhang Z.; Wallin S.; Liu Z. Cooperativity, Local-Nonlocal Coupling, and Nonnative Interactions: Principles of Protein Folding from Coarse-Grained Models. Annu. Rev. Phys. Chem. 2011, 62, 301–326. 10.1146/annurev-physchem-032210-103405.
- De Sancho D.; Best R. B. Modulation of an IDP Binding Mechanism and Rates by Helix Propensity and Non-Native Interactions: Association of HIF1 Alpha with CBP. Mol. BioSyst. 2012, 8, 256–267. 10.1039/C1MB05252G.
- Kumar S.; Showalter S. A.; Noid W. G. Native-Based Simulations of the Binding Interaction Between RAP74 and the Disordered FCP1 Peptide. J. Phys. Chem. B 2013, 117, 3074–3085. 10.1021/jp310293b.
- Habibi M.; Rottler J.; Plotkin S. S. As Simple as Possible, but not Simpler: Exploring the Fidelity of Coarse-Grained Protein Models for Simulated Force Spectroscopy. PLoS Comput. Biol. 2016, 12, e1005211. 10.1371/journal.pcbi.1005211.
- Sanyal T.; Mittal J.; Shell M. S. A Hybrid, Bottom-Up, Structurally Accurate, Go̅-Like Coarse-Grained Protein Model. J. Chem. Phys. 2019, 151, 044111. 10.1063/1.5108761.
- Dignon G. L.; Zheng W.; Kim Y. C.; Best R. B.; Mittal J. Sequence Determinants of Protein Phase Behavior from a Coarse-Grained Model. PLoS Comput. Biol. 2018, 14, e1005941. 10.1371/journal.pcbi.1005941.
- Dignon G. L.; Zheng W.; Kim Y. C.; Mittal J. Temperature-Controlled Liquid-Liquid Phase Separation of Disordered Proteins. ACS Cent. Sci. 2019, 5, 821–830. 10.1021/acscentsci.9b00102.
- Rudzinski J. F.; Bereau T. Structural-Kinetic-Thermodynamic Relationships Identified from Physics-Based Molecular Simulation Models. J. Chem. Phys. 2018, 148, 204111. 10.1063/1.5025125.
- Rudzinski J. F.; Bereau T. The Role of Conformational Entropy in the Determination of Structural-Kinetic Relationships for Helix-Coil Transitions. Computation 2018, 6, 21. 10.3390/computation6010021.
- Knott M.; Best R. B. A Preformed Binding Interface in the Unbound Ensemble of an Intrinsically Disordered Protein: Evidence from Molecular Simulations. PLoS Comput. Biol. 2012, 8, e1002605. 10.1371/journal.pcbi.1002605.
- Knott M.; Best R. B. Discriminating Binding Mechanisms of an Intrinsically Disordered Protein via a Multi-State Coarse-Grained Model. J. Chem. Phys. 2014, 140, 175102. 10.1063/1.4873710.
- Demarest S.; Martinez-Yamout M.; Chung J.; Chen H.; Xu W.; Dyson H.; Evans R.; Wright P. Mutual Synergistic Folding in Recruitment of CBP/p300 by p160 Nuclear Receptor Coactivators. Nature 2002, 415, 549–553. 10.1038/415549a.
- Marino J.; Buholzer K. J.; Zosel F.; Nettels D.; Schuler B. Charge Interactions Can Dominate Coupled Folding and Binding on the Ribosome. Biophys. J. 2018, 115, 996–1006. 10.1016/j.bpj.2018.07.037.
- Dyson H. J.; Wright P. E. Role of Intrinsic Protein Disorder in the Function and Interactions of the Transcriptional Coactivators CREB-binding Protein (CBP) and p300. J. Biol. Chem. 2016, 291, 6714–6722. 10.1074/jbc.R115.692020.
- Bedford D. C.; Kasper L. H.; Fukuyama T.; Brindle P. K. Target Gene Context Influences the Transcriptional Requirement for the KAT3 Family of CBP and p300 Histone Acetyltransferases. Epigenetics 2010, 5, 9–15. 10.4161/epi.5.1.10449.
- Lin C.; Hare B.; Wagner G.; Harrison S.; Maniatis T.; Fraenkel E. A Small Domain of CBP/p300 Binds Diverse Proteins: Solution Structure and Functional Studies. Mol. Cell 2001, 8, 581–590. 10.1016/S1097-2765(01)00333-1.
- Uversky V. N. The Multifaceted Roles of Intrinsic Disorder in Protein Complexes. FEBS Lett. 2015, 589, 2498–2506. 10.1016/j.febslet.2015.06.004.
- Habchi J.; Tompa P.; Longhi S.; Uversky V. N. Introducing Protein Intrinsic Disorder. Chem. Rev. 2014, 114, 6561–6588. 10.1021/cr400514h.
- Fyodorov D. V.; Zhou B. R.; Skoultchi A. I.; Bai Y. Emerging Roles of Linker Histones in Regulating Chromatin Structure and Function. Nat. Rev. Mol. Cell Biol. 2018, 19, 192–206. 10.1038/nrm.2017.94.
- Arai M.; Sugase K.; Dyson H. J.; Wright P. E. Conformational Propensities of Intrinsically Disordered Proteins Influence the Mechanism of Binding and Folding. Proc. Natl. Acad. Sci. U. S. A. 2015, 112, 9614–9619. 10.1073/pnas.1512799112.
- Bonetti D.; Troilo F.; Brunori M.; Longhi S.; Gianni S. How Robust is the Mechanism of Folding-Upon-Binding for an Intrinsically Disordered Protein? Biophys. J. 2018, 114, 1889–1894. 10.1016/j.bpj.2018.03.017.
- Boehr D. D.; Nussinov R.; Wright P. E. The Role of Dynamic Conformational Ensembles in Biomolecular Recognition. Nat. Chem. Biol. 2009, 5, 789. 10.1038/nchembio.232.
- Sugase K.; Dyson H. J.; Wright P. E. Mechanism of Coupled Folding and Binding of an Intrinsically Disordered Protein. Nature 2007, 447, 1021–1027. 10.1038/nature05858.
- Zosel F.; Mercadante D.; Nettels D.; Schuler B. A Proline Switch Explains Kinetic Heterogeneity in a Coupled Folding and Binding Reaction. Nat. Commun. 2018, 9, 3332. 10.1038/s41467-018-05725-0.
- Naganathan A. N.; Orozco M. The Native Ensemble and Folding of a Protein Molten-Globule: Functional Consequence of Downhill Folding. J. Am. Chem. Soc. 2011, 133, 12154–12161. 10.1021/ja204053n.
- Samanta H. S.; Chakraborty D.; Thirumalai D. Charge Fluctuation Effects on the Shape of Flexible Polyampholytes with Applications to Intrinsically Disordered Proteins. J. Chem. Phys. 2018, 149, 163323. 10.1063/1.5035428.
- Hornak V.; Abel R.; Okur A.; Strockbine B.; Roitberg A.; Simmerling C. Comparison of Multiple Amber Force Fields and Development of Improved Protein Backbone Parameters. Proteins: Struct., Funct., Genet. 2006, 65, 712–725. 10.1002/prot.21123.
- Lindorff-Larsen K.; Piana S.; Palmo K.; Maragakis P.; Klepeis J. L.; Dror R. O.; Shaw D. E. Improved Side-Chain Torsion Potentials for the Amber ff99SB Protein Force Field. Proteins: Struct., Funct., Genet. 2010, 78, 1950–1958. 10.1002/prot.22711.
- Miyazawa S.; Jernigan R. Residue-Residue Potentials with a Favorable Contact Pair Term and an Unfavorable High Packing Density Term, for Simulation and Threading. J. Mol. Biol. 1996, 256, 623–644. 10.1006/jmbi.1996.0114.
- Bereau T.; Deserno M. Generic Coarse-grained Model for Protein Folding and Aggregation. J. Chem. Phys. 2009, 130, 235106. 10.1063/1.3152842.
- Wright M. R. An Introduction to Aqueous Electrolyte Solutions; John Wiley & Sons Ltd: West Sussex, U.K., 2007.
- Givaty O.; Levy Y. Protein Sliding Along DNA: Dynamics and Structural Characterization. J. Mol. Biol. 2009, 385, 1087–1097. 10.1016/j.jmb.2008.11.016.
- Hess B.; Kutzner C.; van der Spoel D.; Lindahl E. GROMACS 4: Algorithms for Highly Efficient, Load-Balanced, and Scalable Molecular Simulation. J. Chem. Theory Comput. 2008, 4, 435–447. 10.1021/ct700301q.
- Rudzinski J. F. Recent Progress Towards Chemically-Specific Coarse-Grained Simulation Models with Consistent Dynamical Properties. Computation 2019, 7, 42. 10.3390/computation7030042.
- Best R. B.; Mittal J. Protein Simulations with an Optimized Water Model: Cooperative Helix Formation and Temperature-Induced Unfolded State Collapse. J. Phys. Chem. B 2010, 114, 14916–14923. 10.1021/jp108618d.
- Kremer K.; Grest G. Dynamics of Entangled Linear Polymer Melts: A Molecular-Dynamics Simulation. J. Chem. Phys. 1990, 92, 5057–5086. 10.1063/1.458541.
- Halverson J. D.; Brandes T.; Lenz O.; Arnold A.; Bevc S.; Starchenko V.; Kremer K.; Stuehn T.; Reith D. ESPResSo++: A Modern Multiscale Simulation Package for Soft Matter Systems. Comput. Phys. Commun. 2013, 184, 1129–1149. 10.1016/j.cpc.2012.12.004.
- Stevens M. J.; Kremer K. The Nature of Flexible Linear Polyelectrolytes in Salt Free Solution: A Molecular Dynamics Study. J. Chem. Phys. 1995, 103, 1669–1690. 10.1063/1.470698.
- Hsu H. P.; Kremer K. Static and Dynamic Properties of Large Polymer Melts in Equilibrium. J. Chem. Phys. 2016, 144, 154907. 10.1063/1.4946033.
- Vettorel T.; Grosberg A. Y.; Kremer K. Statistics of Polymer Rings in the Melt: A Numerical Simulation Study. Phys. Biol. 2009, 6, 025013. 10.1088/1478-3975/6/2/025013.
- Lifson S.; Roig A. On the Theory of Helix-Coil Transition in Polypeptides. J. Chem. Phys. 1961, 34, 1963–1974. 10.1063/1.1731802.
- Vitalis A.; Caflisch A. 50 Years of Lifson-Roig Models: Application to Molecular Simulation Data. J. Chem. Theory Comput. 2012, 8, 363–373. 10.1021/ct200744s.
- Best R. B.; Hummer G. Optimized Molecular Dynamics Force Fields Applied to the Helix-Coil Transition of Polypeptides. J. Phys. Chem. B 2009, 113, 9004–9015. 10.1021/jp901540t.
- Kukharenko O.; Sawade K.; Steuer J.; Peter C. Using Dimensionality Reduction to Systematically Expand Conformational Sampling of Intrinsically Disordered Peptides. J. Chem. Theory Comput. 2016, 12, 4726–4734. 10.1021/acs.jctc.6b00503.
- Lemke T.; Peter C. EncoderMap: Dimensionality Reduction and Generation of Molecule Conformations. J. Chem. Theory Comput. 2019, 15, 1209–1215. 10.1021/acs.jctc.8b00975.
- McInnes L.; Healy J.; Melville J. UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction. 2018, arXiv:1802.03426. arXiv.org e-Print archive. https://arxiv.org/abs/1802.03426.
- Altis A.; Otten M.; Nguyen P. H.; Hegger R.; Stock G. Construction of the Free Energy Landscape of Biomolecules via Dihedral Angle Principal Component Analysis. J. Chem. Phys. 2008, 128, 245102. 10.1063/1.2945165.
- MacQueen J. Some Methods for Classification and Analysis of Multivariate Observations. In Proceedings of 5th Berkeley Symposium on Mathematical Statistics and Probability; Le Cam L. M., Neyman J., Eds.; Univ. of California Press, 1967; pp 281–297.
- Röblitz S.; Weber M. Fuzzy Spectral Clustering by PCCA+: Application to Markov State Models and Data Classification. Advances in Data Analysis and Classification 2013, 7, 147–179. 10.1007/s11634-013-0134-6.
- Zerze G. H.; Best R. B.; Mittal J. Sequence- and Temperature-Dependent Properties of Unfolded and Disordered Proteins from Atomistic Simulations. J. Phys. Chem. B 2015, 119, 14622–14630. 10.1021/acs.jpcb.5b08619.
- Vitalis A.; Pappu R. V. ABSINTH: A New Continuum Solvation Model for Simulations of Polypeptides in Aqueous Solutions. J. Comput. Chem. 2009, 30, 673–699. 10.1002/jcc.21005.
- Das R. K.; Pappu R. V. Conformations of Intrinsically Disordered Proteins are Influenced by Linear Sequence Distributions of Oppositely Charged Residues. Proc. Natl. Acad. Sci. U. S. A. 2013, 110, 13392–13397. 10.1073/pnas.1304749110.
- Graessley M.; Hayward R.; Grest G. Excluded-Volume Effects in Polymer Solutions. 2. Comparison of Experimental Results with Numerical Simulation Data. Macromolecules 1999, 32, 3510–3517. 10.1021/ma981915p.
- Iešmantavičius V.; Jensen M. R.; Ozenne V.; Blackledge M.; Poulsen F. M.; Kjaergaard M. Modulation of the Intrinsic Helix Propensity of an Intrinsically Disordered Protein Reveals Long-Range Helix-Helix Interactions. J. Am. Chem. Soc. 2013, 135, 10155–10163. 10.1021/ja4045532.
- Noid W. G. Perspective: Coarse-Grained Models for Biomolecular Systems. J. Chem. Phys. 2013, 139, 090901. 10.1063/1.4818908.