Abstract
MD simulations can provide uniquely detailed models of intrinsically disordered proteins (IDPs). However, these models need careful experimental validation. The coefficient of translational diffusion $D_{tr}$, measurable by pulsed field gradient NMR, offers a potentially useful piece of experimental information related to the compactness of the IDP’s conformational ensemble. Here, we investigate, both experimentally and via MD modeling, the translational diffusion of a 25-residue N-terminal fragment from histone H4 (N-H4). We found that the predicted values of $D_{tr}$, as obtained from the mean-square displacement of the peptide in the MD simulations, are largely determined by the viscosity of the MD water (which has been reinvestigated as a part of our study). Beyond that, our analysis of the diffusion data indicates that MD simulations of N-H4 in the TIP4P-Ew water give rise to an overly compact conformational ensemble for this peptide. In contrast, TIP4P-D and OPC simulations produce ensembles that are consistent with the experimental $D_{tr}$ result. These observations are supported by the analyses of the 15N spin relaxation rates. We also tested a number of empirical methods to predict $D_{tr}$ based on the IDP’s coordinates extracted from the MD snapshots. In particular, we show that the popular approach involving the program HYDROPRO can produce misleading results. This happens because HYDROPRO is not intended to predict the diffusion properties of highly flexible biopolymers such as IDPs. Likewise, recent empirical schemes that exploit the relationship between the small-angle x-ray scattering-informed conformational ensembles of IDPs and the respective experimental $D_{tr}$ values also prove to be problematic. In this sense, the first-principles calculations of $D_{tr}$ from the MD simulations, such as demonstrated in this work, should provide a useful benchmark for future efforts in this area.
Significance
Intrinsically disordered proteins play a prominent role in neurodegenerative disease as well as cancer. Structural propensities and dynamics of these proteins can be comprehensively characterized by MD modeling. However, such MD models need rigorous experimental validation. In this study, we show how different MD models of the disordered N-terminal fragment of histone H4 can be successfully validated using the translational diffusion coefficient measured by diffusion NMR experiments. Our first-principles approach, which tracks the diffusion of the peptide in the MD trajectory, provides a useful benchmark for further efforts in this area. In particular, it demonstrates that some widely used empirical tools, which predict the diffusion properties of disordered proteins based on their MD snapshots, can produce misleading results.
Introduction
Intrinsically disordered proteins (IDPs) and intrinsically disordered protein regions (IDPRs) have gained much attention over the last three decades. From the evolutionary perspective, IDPs are considered to be an advanced element of cellular machinery: the fraction of disordered sequences increases from archaeal proteomes to bacterial proteomes to eukaryotic proteomes (1). IDPs or IDPRs play a prominent role in molecular machines such as the ribosome (2), in chromatin (3), in nuclear pore transporters (4), in cytoskeletal assembly (5), and in phase-separated organelles (6). In broad terms, the prominence of IDPs and IDPRs stems from their involvement in cell signaling. Signaling networks involving disordered proteins have a higher degree of interconnectedness and a broader range of dynamic responses, resulting in improved functional efficiency (7). The increased efficiency, however, comes at a price: disordered proteins are susceptible to proteolytic cleavage (8), aberrant modifications (9), conversion to a prionic form (10), or misassembly such as formation of amyloid fibrils (11). All of this explains the ominous role of IDPs in neurodegenerative disorders, as well as their wide involvement in cancer (12).
A search for pharmaceuticals to target IDPs is ongoing. However, finding effective small-molecule ligands of IDPs (which is a prerequisite for intracellular targets) remains an outstanding challenge (13). Industry-wide efforts to develop therapeutic antibodies against extracellular IDPs such as Aβ and extracellular tau so far have proved futile (14,15).
From a structural standpoint, IDPs can be thought of as dynamic ensembles consisting of a multitude of constantly interconverting conformers. Roughly speaking, an unbound IDP can be viewed as a “statistical coil.” Typically, it acquires structure upon binding to its structured target, cf. “folding upon binding” (16), although in some cases IDPs remain partially disordered after forming the so-called fuzzy complexes (17,18) or even fully disordered after an engagement with another IDP (19). Historically, unbound IDPs were assumed to be featureless coils that were of little interest per se (20). However, it was eventually realized that IDPs have certain structural propensities, termed “residual structure” (21,22). These structural propensities can have an effect on IDPs’ affinities for their biological targets. The effect is usually modest, yet biologically significant. For example, the level of residual helicity in the transactivation domain of p53 has an influence on its binding constant to Mdm2 (23). This and many other such examples have spurred interest in structural studies of IDPs.
Standard methods of structural biology such as x-ray crystallography, cryo-EM, or NOE-based NMR are unsuitable for characterization of IDPs and IDPRs or, otherwise, provide only indirect information. Instead, the bulk of experimental evidence about structural propensities of IDPs comes from other NMR data, such as heteronuclear chemical shifts, heteronuclear relaxation rates, and paramagnetic relaxation enhancements, as well as diffusion data obtained from pulsed-field gradient NMR (PFG-NMR) experiments. In particular, diffusion data offer a potentially important piece of information, shedding light on the compactness of the IDP’s conformational ensemble (which can have a different appearance ranging from a highly extended random coil to a rather densely packed molten globule). The PFG-NMR measurements could be particularly useful for smallish IDPs (disordered peptides) with molecular weights of several kDa. For systems of this size it can be difficult to obtain comparable information by means of small-angle x-ray scattering (SAXS) since the sensitivity of SAXS declines at low molecular weight (24) and boosting the signal by using high-brilliance synchrotron sources often causes radiation damage to protein samples (25). In contrast, diffusion NMR spectroscopy can be successfully used for IDPs of an arbitrary size and, furthermore, smaller IDPs usually lend themselves to a comprehensive characterization by various NMR methods.
While in the case of folded proteins the experimental data translate into structural models that are deposited into the Protein Data Bank, what are the comparable models for disordered proteins? Early on, IDP conformational ensembles were modeled by a collection of multiple conformers with attached statistical weights. The best-known ensemble generation schemes are ENSEMBLE (26) and ASTEROIDS (27). Later, it was proposed that MD trajectories can serve as structural models of IDPs, allowing for direct testing against NMR observables and other experimental data (28). Indeed, MD simulations offer a natural path toward a realistic conformational ensemble of an IDP. Moreover, they capture the dynamic aspect of the IDP’s conformational equilibrium, i.e., encode the transition rates between a multitude of different conformers. This is particularly valuable for analyses of relaxation rates and paramagnetic relaxation enhancements, which depend on motional time constants; these observables can be calculated directly from the MD trajectories and then compared with the experimental data.
While the idea of using an MD trajectory as a “structural-dynamic model” of an IDP seemed appealing, it faced some major challenges. First, the conventional MD force fields (and water models) proved to be ill-suited for modeling of disordered proteins. Second, typical MD trajectories were too short to sample the vast conformational phase space of an IDP. Over the last decade, substantial headway has been made in addressing both of these problems. A number of new force fields and water models have been developed to model unstructured (as well as structured) proteins (29,30,31,32). At the same time, the advent of GPU computing along with the development of special schemes for enhanced conformational sampling improved the situation with statistics of IDP simulations (33,34,35,36). As a consequence, it became possible to generate a converged trajectory involving a short IDP (i.e., a long peptide) and use it as a bona fide model to test against the experimental data (37,38). Otherwise, for longer IDPs, Blackledge and co-workers developed the ABSURD approach whereby the disordered protein is modeled using a collection of shorter MD trajectories, which enter into the model with adjustable weights (39).
In this report, we address the question of whether an MD model of an IDP can be validated against PFG-NMR diffusion data. As already mentioned, the translational diffusion coefficient $D_{tr}$ is one of the potentially informative pieces of data, characterizing the overall compactness of the IDP’s conformational ensemble. However, calculating $D_{tr}$ from the MD data (for the purpose of further comparison with the experimental result) is less than trivial.
Indeed, it is well known that calculated $D_{tr}$ values depend on the size of the simulation box. This has been demonstrated for simple fluids, polymers, and proteins (40,41,42,43,44,45,46). The origin of this effect can be traced to the constraints imposed on the MD simulation and, more specifically, the requirement of zero net momentum. This requirement leads to a situation where some of the dynamics in the simulated system is subtracted out; the effect is significant for small-sized systems, leading to slower-than-expected apparent diffusion (47). To counter this problem, Yeh and Hummer proposed to conduct several simulations in boxes of increasing size and then extrapolate the resulting $D_{tr}$ values toward the limit of an infinitely large box (47). While this approach has been commonly accepted, it is rather demanding from a computational standpoint. In particular, it requires one to record a long IDP simulation in a very large water box.
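For orientation, the finite-size effect described above is commonly summarized by the Yeh-Hummer relation. The form below is the standard expression for a cubic periodic box of edge length $L$; the symbols $D_{PBC}$, $D_0$, $\eta$, and $\xi$ are introduced here for illustration only. For other cell shapes, such as a truncated octahedron, the numerical constant may differ, so the expression should be read as a guide to the expected $1/L$ scaling rather than as the correction applied in this work:

$$D_0 = D_{PBC}(L) + \frac{\xi\, k_B T}{6 \pi \eta L}, \qquad \xi \approx 2.837$$

Here $D_{PBC}(L)$ is the apparent diffusion coefficient obtained in a box of size $L$, $D_0$ is the infinite-box limit, $k_B T$ is the thermal energy, and $\eta$ is the shear viscosity of the simulated solvent. The relation shows why the apparent diffusion coefficient is expected to vary linearly with $1/L$ and why the slope of that dependence carries information on the viscosity of the water model.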
Furthermore, accurate determination of $D_{tr}$ under the commonly used NPT ensemble equipped with a Langevin thermostat proves to be problematic. It is a general observation that simulations employing the Langevin thermostat do not reproduce hydrodynamics (48). For the typically used values of the friction constant (which are necessary for effective temperature control), the friction leads to appreciable increases in solvent viscosity and, accordingly, results in underestimated $D_{tr}$ (49). In principle, empirical schemes can be developed to correct for this effect; e.g., Hicks et al. recently presented such corrections for the rotational diffusion coefficient (50). However, these corrections are rather cumbersome and depend on the size of the simulated protein; in the case of IDPs, this approach is likely to face further complications. A better solution to this problem is to use another thermostat, such as the Bussi-Parrinello velocity rescaling thermostat (51).
In this study we focus on the intrinsically disordered N-terminal fragment of human histone H4 (residues 1–25). The N-terminal tail of H4 (hereafter referred to as N-H4) plays an important role in regulating gene transcription as well as chromatin architecture and remodeling (52). In addition to N-H4, we also studied the well-known globular protein ubiquitin (Ub), which serves as a point of reference in our study of translational diffusion. Since the MD-predicted $D_{tr}$ constants of N-H4 and Ub are dependent on the viscosity of the modeled solvent, we also systematically investigated the self-diffusion of water (53).
The simulations were conducted in the program Amber20 (54) with the ff14SB force field (55) using three different water models: TIP4P-Ew (56), TIP4P-D (31), and OPC (57). For both Ub and N-H4, we observe that the simulations using TIP4P-Ew significantly overestimate the diffusion coefficients, the simulations using TIP4P-D somewhat underestimate them, while the simulations using OPC overestimate them but only very slightly. These observations are largely attributable to variations in viscosity between the three water models. However, in the case of the disordered N-H4 peptide the results also convey some information about the conformational ensemble of the peptide. Specifically, our data suggest that the TIP4P-Ew solvent leads to an overly compact representation of the N-H4 species. This observation is confirmed by the comparison of MD-simulated and experimental 15N relaxation data. On the other hand, both OPC and TIP4P-D simulations are compatible with the experimental results within the uncertainty range.
Interestingly, analyses of the MD data using HYDROPRO software (58) or HYDROPRO-based parameterization (59) lead to the opposite conclusion, i.e., that the TIP4P-Ew model is the best water model to simulate N-H4. This serves as a cautionary note regarding the application of HYDROPRO, which is neither intended nor suited to be used on disordered proteins. Similarly, caution should be exercised with regard to other empirical schemes to predict $D_{tr}$ using the simulated conformational ensembles of IDPs (60,61). Direct determination of $D_{tr}$ from MD simulations, as demonstrated in this work, can provide a useful benchmark for further studies in this area.
Materials and methods
MD simulations
The initial coordinates of N-H4 peptide (amino acid sequence SGRGKGGKGLGKGGAKRHRKVLRDN) were generated as described previously (37). In brief, we built 2000 random N-H4 conformations using the server unfolded.uchicago.edu (62) and the program Scwrl4 (63). All conformations were energy minimized and then ordered according to their energies in GBneck2 solvent (64). One structure was chosen at random from the central portion of the energy histogram and subsequently used in all simulations.
The initial structure was protonated in accordance with the experimental pH 4.0 using the program PROPKA (65). The low pH was originally chosen to minimize amide proton exchange with solvent; the protonation of the peptide at pH 4.0 is the same as at physiologically relevant pH 7.2 except for a single histidine residue (37). The simulations were conducted in Amber 20 (ff14SB) using three different water models: TIP4P-Ew, TIP4P-D, and OPC. For each choice of water model, we recorded the trajectories in solvation boxes of different size. Here, we describe the simulations using smaller boxes; the procedure to record trajectories in medium- and large-sized boxes is explained later.
The truncated octahedral boxes were built using the SolvateOct command in LEaP; the distance parameter (nominally, a minimal separation between the peptide and the boundary of the cell) was set to 12 Å for TIP4P-Ew simulations or, otherwise, 15 Å for TIP4P-D and OPC simulations. The system was neutralized by adding nine Cl‒ ions (66) and then energy minimized with harmonic restraints applied to N-H4 heavy atoms. After that, the system was brought to the target temperature of 298 K and equilibrated for 1 ns in the NVE ensemble before the production run.
As already pointed out, in our simulations we opted for a velocity rescaling thermostat, also known as the Bussi thermostat (51), which preserves the native-like dynamics of the system. For this purpose, we have implemented the Bussi thermostat as a part of the official Amber 20 release (option ntt = 11). The pressure was controlled by the Berendsen barostat (67) with a coupling time of 1 ps. The nonbonded interactions were calculated with a cutoff of 11 Å. The particle mesh Ewald summation scheme was employed to treat long-range electrostatic interactions with the default parameters for grid spacing and spline interpolation. Bonds involving hydrogens were constrained by means of the SHAKE algorithm (68). The integration timestep was 1 fs (motivated by a separate series of NVE simulations, see below). The coordinates were stored every 1 ps. The simulations were conducted using in-house GPU workstations equipped with NVIDIA GeForce GTX 1080, RTX 2080 Ti, and RTX 3080 cards. The length of the small-box N-H4 simulations employing three different water models was 5 μs apiece.
Note that the above description refers to the MD simulation of the disordered peptide in a periodic boundary cell of a limited size. Does this have any constraining effect on N-H4 exploring its conformational phase space? The answer depends on the number of encounters between the peptide and its periodic images during the course of the simulations. To address this issue, we analyzed the trajectories and calculated the fraction of all frames featuring close encounters between the master copy of N-H4 and its periodic images (where at least one pair of atoms comes to within 5 Å of each other). As it turns out, the proportion of such frames is only a fraction of a percentage point, viz. 0.06, 0.36, and 0.17% for trajectories in TIP4P-Ew, TIP4P-D, and OPC water, respectively. Hence, we conclude that the crowding effect in our simulations must be minimal and can be safely ignored.
As already pointed out, the procedure to determine $D_{tr}$ and $D_{rot}$ relies on several MD trajectories recorded in water boxes of increasing size. While the simulations using relatively small solvent boxes (see above) are reasonably fast, their counterparts involving medium- and large-sized boxes are far more time consuming. As a time-saving device, we propose a special scheme to record these latter trajectories (see Fig. 1). In brief, we extract the N-H4 coordinates from the frames at times 0, 10, 20, …, 4990 ns of the small-box trajectory and then place these conformers into bigger boxes. Specifically, medium-sized boxes were generated by adding, via SolvateOct, a water shell with the minimal thickness of 24 Å, whereas the large boxes were generated with the water layer of 48 Å. The so-prepared simulation cells were then equilibrated as described above and used to start short trajectories with the length of $T = 5$ ns (see Fig. 1).
Figure 1.
Schematic design of N-H4 simulations using small-, medium-, and large-size water boxes. The short trajectories were initially recorded with the length of $T = 5$ ns and later extended to $T = 10$ ns. To see this figure in color, go online.
In this work, we recorded a series of trajectories with $T = 5$ ns. This scheme offers substantial savings in computational time. First, there is a twofold gain due to the reduced overall length of the simulations (see Fig. 1). Second, there is a severalfold gain from using a network of GPU computers equipped with a queuing system (in our case, SLURM), which allows one to quickly record a series of short trajectories. At the same time, the described scheme ensures satisfactory conformational sampling for the medium- and large-box simulations, as inherited from the continuous 5-μs trajectory of N-H4 in a smaller water box. To demonstrate that the results are converged, we later extended the $T = 5$ ns series of simulations to $T = 10$ ns.
Alongside the NPT simulations employing the Bussi thermostat, we also recorded a series of NVE simulations of the N-H4 peptide in TIP4P-Ew and TIP4P-D water. It is well known that NVE simulations represent “true” dynamics of the system, free from potential interference by thermostating algorithms. At the same time, NVE simulations often develop a substantial temperature drift, which makes them ill-suited for routine applications (69,70). In the context of our work, NVE simulations were conducted as a control, with the aim to validate the NPT results using the Bussi thermostat. The protocol to record NVE trajectories was identical to the one described above, including the integration timestep of 1 fs (intended to reduce the temperature drift). Using the scheme shown in Fig. 1, we only need to worry about the temperature drift during the 5-μs NVE simulations in the smaller box (since the simulations in medium- and large-sized boxes are recorded in short segments of 5 or 10 ns and hence do not suffer from the temperature drift). As it happens, the observed temperature drift in the 5-μs NVE simulations of N-H4 using small water boxes is modest: on average, the temperature is increased by 1.2 K in the TIP4P-Ew trajectory and by 0.5 K in the TIP4P-D trajectory. Although these effects are not inconsequential, they are tolerable for the purpose of our analyses, which is to validate the results of the NPT simulations.
The summary of all N-H4 trajectories recorded in this study is given in Table S1. The net length of these trajectories amounts to 75 μs.
Separately, to investigate the effect of N-H4 internal mobility on its diffusion properties (in relation to HYDROPRO-based predictions of $D_{tr}$) we conducted a special restrained simulation of N-H4. Specifically, we regenerated the set of five hundred 10-ns trajectories of N-H4 in OPC water in the medium-sized box. Each of these trajectories was recorded with soft restraints imposed on all pairwise interatomic distances (a total of 73,536 restraints). The restraint potential was a well with a parabolic bottom (extending from $0.99\,r_0$ to $1.01\,r_0$, where $r_0$ is the interatomic distance found in the initial frame, with the force constant set to 1 kcal mol−1 Å−2) and linear sides beyond that (54). The combined effect of all restraints was to maintain the peptide conformation close to its initial conformation (which is specific to each individual 10-ns simulation); the root-mean-square deviation of the peptide coordinates relative to the initial frame did not exceed 0.15 Å. The temperature, pressure, and volume of the restrained simulations remained on target and stable; the net restraint energy was a small fraction, 0.11%, of the total energy. The same kind of simulations were also conducted in a small box; in doing so, the continuous small-box trajectory of N-H4 in OPC water was divided into 10-ns segments and used to seed five hundred 10-ns restrained simulations.
In addition to N-H4, we also recorded a series of trajectories for the popular model protein Ub, which has been chosen in our study as a control system. The initial coordinates were from the structure PDB: 1UBQ (71) protonated using tleap (according to PROPKA, the protonation pattern of Ub remains the same between pH 4.5 and 6.5). The simulation temperature was 303 K; other MD parameters were the same as in the N-H4 simulation. The design of the simulations was similar to the one described above (see Fig. 1), i.e., the trajectory in a small box was recorded in one piece, whereas the trajectories in medium and large boxes were recorded as collections of multiple short trajectories with $T = 5$ ns (subsequently extended to 10 ns). Since the requirements regarding conformational sampling in Ub simulations are rather minimal, we reduced the duration of the simulations from 5 to 2 μs. The same tactic was used for NVE simulations of Ub in TIP4P-Ew and TIP4P-D water. The temperature drift in the small-box NVE trajectories of Ub proved to be insignificant (0.4 and 0.2 K, respectively), thus facilitating the type of procedure shown in Fig. 1. The summary of Ub simulations can be found in Table S2; their net length amounts to 30 μs.
To obtain a handle on water viscosity for TIP4P-Ew, TIP4P-D, and OPC models, we also recorded a set of trajectories for pure water in the simulation boxes of increasing size. The dimensions of the small, medium, and large boxes were chosen to be the same as in Ub simulations. The length of each trajectory was 0.15 μs; the net length of all water simulations was 1.35 μs.
In addition, to explore the significance of the integration timestep, we simulated Ub in OPC water using 2 and 4 fs timesteps with or without the hydrogen mass repartitioning (HMR) scheme (72).
Processing of MD data
As a first step, the MD trajectories using periodic boundary conditions were unwrapped, paying special attention to box volume fluctuations in the NPT simulations (73). To this end we used an in-house script based on the Python library pyxmolpp2.
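To make the unwrapping step concrete, here is a minimal sketch of a displacement-based unwrapping that uses the box of the current frame, so that box fluctuations in NPT runs do not introduce spurious jumps. It is not the in-house pyxmolpp2 script: it assumes an orthorhombic box (the simulations described here use truncated octahedra, which need lattice vectors instead of edge lengths), a single tracked point such as the center of mass, and the hypothetical function name `unwrap_npt`.

```python
import numpy as np

def unwrap_npt(wrapped, box):
    """Unwrap the trajectory of one point under periodic boundary conditions.

    wrapped : (n_frames, 3) wrapped coordinates
    box     : (n_frames, 3) orthorhombic box edge lengths for every frame
              (assumption: orthorhombic cell, not a truncated octahedron)
    """
    wrapped = np.asarray(wrapped, dtype=float)
    box = np.asarray(box, dtype=float)
    unwrapped = np.empty_like(wrapped)
    unwrapped[0] = wrapped[0]
    for i in range(len(wrapped) - 1):
        step = wrapped[i + 1] - wrapped[i]
        # Remove the jump caused by crossing a periodic boundary,
        # using the box dimensions of the new frame.
        step -= np.floor(step / box[i + 1] + 0.5) * box[i + 1]
        unwrapped[i + 1] = unwrapped[i] + step
    return unwrapped
```

The sketch accumulates minimum-image displacements rather than adding integer multiples of the box to the wrapped coordinates; the latter can produce artificial jumps when the box size changes between frames.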
The translational diffusion coefficients were calculated in a standard manner using mean-square displacement (MSD) of the protein’s center of mass (74). For a collection of $N_{traj}$ trajectories, each of length $T$, the MSD was computed as follows:
$$\mathrm{MSD}(\tau) = \frac{1}{N_{traj}} \sum_{i=1}^{N_{traj}} \frac{1}{N_\tau} \sum_{t} \left| \mathbf{r}_i(t+\tau) - \mathbf{r}_i(t) \right|^2 \tag{1}$$
where $\mathbf{r}_i(t)$ defines the protein’s center of mass in the i-th trajectory, $t$ and $t+\tau$ are a pair of time points separated by the interval $\tau$, and $N_\tau$ is the number of such pairs within the trajectory of length $T$. Whenever a single long trajectory is used instead of the collection of short trajectories, the formula in Eq. 1 is reduced to the familiar simple expression (74).
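The averaging over trajectories and time origins can be sketched as follows. This is an illustrative NumPy implementation, not the authors' script; the function name `msd_from_trajectories` and the array layout are assumptions, and the input is expected to contain unwrapped center-of-mass coordinates sampled at a constant time step.

```python
import numpy as np

def msd_from_trajectories(com, max_lag):
    """Mean-square displacement averaged over trajectories and time origins.

    com     : (n_traj, n_frames, 3) unwrapped centre-of-mass coordinates
    max_lag : largest lag (in frames) for which the MSD is evaluated
    Returns an array msd[0..max_lag], with msd[0] = 0.
    """
    com = np.asarray(com, dtype=float)
    n_frames = com.shape[1]
    msd = np.zeros(max_lag + 1)
    for lag in range(1, max_lag + 1):
        # Displacements for all pairs of frames separated by `lag`
        disp = com[:, lag:, :] - com[:, :n_frames - lag, :]
        # Equal-length trajectories: a plain mean over (trajectory, origin)
        # is the same as the nested average over trajectories and pairs.
        msd[lag] = np.mean(np.sum(disp ** 2, axis=-1))
    return msd
```

Because all short trajectories have the same length, a single mean over both axes reproduces the nested average; trajectories of unequal length would need explicit per-trajectory weights.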
The obtained $\mathrm{MSD}(\tau)$ dependencies were fitted using a linear fitting function to extract $D_{tr}$. In doing so, it is important to choose the appropriate interval over which the fitting is performed. This interval should not extend to large $\tau$ values, where the accuracy of the $\mathrm{MSD}(\tau)$ curve suffers from increasingly poor statistics that manifests itself in large correlated errors (75,76). On the other hand, one can argue that small $\tau$ values should also be left out. In this connection, the role of internal protein dynamics is particularly relevant (discussed below).
Conformational transitions in a disordered protein lead to a (limited) displacement of the protein’s center of mass. This effect can be thought of as restrained translational diffusion. The characteristic time constants of this diffusion process can be gleaned from N-H4 simulations, where two dominant motional modes were observed: the fast mode on a ∼100 ps timescale and the slow mode on a ∼1 ns timescale (37). Therefore, one can expect that conformational dynamics of N-H4 makes a certain contribution to the simulated $\mathrm{MSD}(\tau)$ dependence in the subnanosecond time window. Indeed, some evidence of this behavior can be seen in our MD data (an example is shown in the inset of Fig. S1 E). On the other hand, we note that the experimentally measured $D_{tr}$ values correspond to very long time intervals, on the order of tens or hundreds of milliseconds, and thus must be completely insensitive to protein internal motions. Therefore, our goal is to fit $\mathrm{MSD}(\tau)$ in such a way as to leave out the presumed small contributions from fast internal dynamics.
Considering these criteria, we chose the fitting interval [100 ps, 1 ns] to fit $\mathrm{MSD}(\tau)$ profiles and determine $D_{tr}$ (illustrated in Fig. S1). We also tested other choices of the fitting interval and found only a small amount of random variation in the resulting $D_{tr}$ values, thus indicating that the scheme is sufficiently robust (Fig. S1). We also tested the convergence of our scheme by comparing the $D_{tr}$ values from the $T = 5$ ns and $T = 10$ ns sets of trajectories. The so-obtained values are in good agreement with each other, confirming that the results are converged (see Fig. S2 A). In what follows, we use the $T = 10$ ns data to determine $D_{tr}$ with maximum possible accuracy. Likewise, the more statistically sound $T = 10$ ns data have also been used for all other calculations in this paper.
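The fitting step can be illustrated with a short sketch based on the three-dimensional Einstein relation, $\mathrm{MSD}(\tau) \approx 6 D_{tr} \tau + \mathrm{const}$. The function name `diffusion_from_msd`, the free intercept (intended to absorb any offset from fast internal motions), and the unit conventions (ps and Å²) are assumptions made for illustration; the text does not specify whether the authors' fit includes an intercept.

```python
import numpy as np

# 1 Å^2/ps = 1e-20 m^2 / 1e-12 s = 1e-8 m^2/s
ANGSTROM2_PER_PS_TO_M2_PER_S = 1e-8

def diffusion_from_msd(lag_times_ps, msd_A2, t_min_ps=100.0, t_max_ps=1000.0):
    """Estimate D from a linear fit of MSD(tau) within [t_min_ps, t_max_ps].

    Uses MSD = 6 * D * tau + offset (3D Einstein relation with a free offset).
    Returns D in Å^2/ps.
    """
    lag = np.asarray(lag_times_ps, dtype=float)
    msd = np.asarray(msd_A2, dtype=float)
    mask = (lag >= t_min_ps) & (lag <= t_max_ps)
    slope, _offset = np.polyfit(lag[mask], msd[mask], 1)
    return slope / 6.0
```

Multiplying the result by `ANGSTROM2_PER_PS_TO_M2_PER_S` converts it to m²/s. Changing `t_min_ps` and `t_max_ps` is a simple way to probe the sensitivity of the estimate to the fitting window.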
The procedure to calculate $D_{tr}$ for Ub is the same as described above for N-H4. The procedure to calculate the diffusion coefficient for water has a similar design as well. In water molecules, internal dynamics occurs on a femtosecond timescale; therefore, we used a short fitting interval [1 ps, 100 ps] to reduce the amount of statistical uncertainty affecting the result (75,76). The values were determined for 100 water molecules chosen at random from within the water box and subsequently averaged.
Ultimately, to predict the translational diffusion coefficient, we plot the obtained $D_{tr}$ values as a function of inverse simulation box size. Any linear size of the truncated octahedron can be used for this purpose, e.g., the edge length $a$; we chose the diameter of the inscribed sphere, $L$, as a characteristic size of the box. Conveniently, the value of $L$ is listed in the last line of the rst7 coordinate file.
Of note, N-H4 simulations employing medium and large cells involve boxes of somewhat variable size (depending on the starting peptide conformation, see above). For example, considering large boxes in the 500 short OPC trajectories, we find $L$ = 142 ± 9 Å. Of interest to us, the inverse size, $1/L$, varies according to 0.0071 ± 0.0004 Å−1. When extrapolating to an infinitely large box, the average value has been used, $1/L$ = 0.0071 Å−1 (see Fig. 5). In principle, it would be straightforward to use boxes of unified size (e.g., corresponding to the largest of all large boxes generated in our treatment).
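The extrapolation to an infinitely large box amounts to a linear regression of the per-box diffusion coefficients against the inverse box size, with the y-intercept taken as the prediction. A minimal sketch is given below; the function name `extrapolate_to_infinite_box` is hypothetical and the inputs would be the box sizes (in Å) and the corresponding per-box estimates.

```python
import numpy as np

def extrapolate_to_infinite_box(box_sizes_A, d_box):
    """Linear extrapolation of per-box diffusion coefficients to 1/L -> 0.

    box_sizes_A : characteristic box sizes L (Å), one per simulation series
    d_box       : diffusion coefficients obtained in these boxes
    Returns (intercept, slope); the intercept is the infinite-box estimate.
    """
    inv_size = 1.0 / np.asarray(box_sizes_A, dtype=float)
    slope, intercept = np.polyfit(inv_size, np.asarray(d_box, dtype=float), 1)
    return intercept, slope
```

With three box sizes the regression has only one degree of freedom, so the quality of the intercept depends strongly on the precision of the individual per-box values.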
Figure 5.
Determination of diffusion coefficients for (A) N-H4 and (B and C) Ub from the series of MD simulations using TIP4P-Ew, TIP4P-D, and OPC water models. The simulations were conducted in the NPT ensemble at 298 K (for N-H4) and 303 K (for Ub). The per-box $D_{tr}$ values are plotted as a function of the inverse linear size of the simulation box, $1/L$, and fitted via the linear regression (dashed lines). The y-intercept of each regression line corresponds to the predicted diffusion coefficient $D_{tr}$ of N-H4 or Ub. The per-box $D_{rot}$ values are plotted as a function of the inverse (generalized) box volume, $1/L^3$, and likewise fitted via the linear regression. The y-intercept of each regression line corresponds to the predicted rotational diffusion coefficient $D_{rot}$. The positioning of the points in graphs B and C may lead one to assume that the y-intercept of the $D_{rot}$ vs. $1/L^3$ dependence is determined more reliably than that of the $D_{tr}$ vs. $1/L$ dependence. However, there is some amount of scatter associated with $D_{rot}$ values (readily visible for the middle point in C). Ultimately, the comparison of the results from the NPT and NVE simulations, as well as jackknife estimation of uncertainties (see Table S3), indicate that $D_{tr}$ and $D_{rot}$ were determined with similar precision. In addition to the simulated data, the experimental results are also shown in the plot (horizontal red lines). To see this figure in color, go online.
Comparing values derived from the independently performed NPT and NVE simulations provides a good handle on MD-related uncertainty. In addition, we also employed a jackknife method to estimate the uncertainties of MD-predicted diffusion coefficients. In doing so, the trajectories in small-sized boxes were divided into 10-ns segments, similar to trajectories in medium- and large-sized boxes. Then for each box size we discarded at random 20% of all 10-ns segments and reprocessed the resulting redacted data set to determine the translational diffusion coefficient. This procedure was repeated 1000 times; the standard deviation of the obtained distribution was taken to be the uncertainty of the prediction (reported in Table S3).
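The resampling procedure described above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: the MSD curve of every 10-ns segment is assumed to be precomputed, the per-box coefficient is taken from a linear fit with a free intercept, and the names `jackknife_uncertainty` and `per_segment_msd` are hypothetical.

```python
import numpy as np

def jackknife_uncertainty(per_segment_msd, lag_times, n_repeats=1000,
                          drop_fraction=0.2, seed=0):
    """Delete-a-fraction resampling of the infinite-box diffusion coefficient.

    per_segment_msd : dict {box size L: array (n_segments, n_lags)} holding
                      the MSD curve of every segment within the fit window
    lag_times       : lag times matching the columns of the MSD arrays
    Returns (standard deviation, array of resampled estimates).
    """
    rng = np.random.default_rng(seed)
    lag = np.asarray(lag_times, dtype=float)
    sizes = sorted(per_segment_msd)
    inv_size = 1.0 / np.array(sizes, dtype=float)
    estimates = np.empty(n_repeats)
    for k in range(n_repeats):
        d_box = []
        for size in sizes:
            curves = np.asarray(per_segment_msd[size], dtype=float)
            n_keep = int(round(len(curves) * (1.0 - drop_fraction)))
            # Discard a random 20% (by default) of the segments
            keep = rng.choice(len(curves), size=n_keep, replace=False)
            mean_msd = curves[keep].mean(axis=0)
            d_box.append(np.polyfit(lag, mean_msd, 1)[0] / 6.0)
        # y-intercept of D versus 1/L is the infinite-box estimate
        estimates[k] = np.polyfit(inv_size, d_box, 1)[1]
    return estimates.std(ddof=1), estimates
```

Averaging the per-segment MSD curves is equivalent to recomputing the MSD from the retained segments because all segments have the same length. The spread from this delete-20% scheme is reported as-is in the sketch, without any rescaling factor.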
While $D_{tr}$ has been determined for both N-H4 and Ub, the rotational diffusion coefficient $D_{rot}$ is well defined only for Ub. To extract this parameter from the MD data, we use the following multistep procedure:
1) superpose (via the secondary-structure Cα atoms) the Ub molecules from all MD frames onto the Ub molecule from the first frame
2) parameterize the above superposition operations via rotation matrices
3) construct a pseudo-molecule containing 64 vectors (emulating N-HN bonds) with near-uniform distribution on a unit sphere (77)
4) apply the above rotation matrices to this pseudo-molecule (thus generating a pseudo-trajectory that encodes the protein's tumbling motion)
5) evaluate the time-correlation functions for all 64 vectors

$$g_k(\tau) = \frac{1}{N_{\mathrm{traj}}} \sum_{i=1}^{N_{\mathrm{traj}}} \frac{1}{N_{\tau}} \sum_{t} P_2\left(\cos \chi_k^{(i)}(t, t+\tau)\right) \qquad (2)$$

where $P_2(x) = (3x^2 - 1)/2$ is the second-order Legendre polynomial, $\chi_k^{(i)}(t, t+\tau)$ is the angle between the orientations of the $k$-th vector at points in time $t$ and $t+\tau$ in the $i$-th trajectory, $N_{\tau}$ is the number of pairs of time points separated by the interval $\tau$, and the averaging is over these pairs and over the $N_{\mathrm{traj}}$ trajectories of fixed length (78). To speed up the calculations, the inner sum in Eq. 2 was evaluated using fast Fourier transformation (79).
6) combine the correlation functions from all 64 vectors, including the prescribed integration weights (77), to obtain the overall correlation function $g(\tau)$
7) fit $g(\tau)$ to a mono-exponential function to determine the tumbling time $\tau_{\mathrm{rot}}$ and subsequently calculate $D_{\mathrm{rot}} = 1/(6\tau_{\mathrm{rot}})$. Similar to the treatment of translational diffusion, it is important to carefully select the fitting interval. We investigated this aspect in some depth using long MD and BD (Brownian dynamics) simulations of Ub (not shown) and concluded that the most accurate results are obtained when using the fitting interval [0, 2 ns]. Examples of $g(\tau)$ curves along with their best fits are shown in Fig. S3.
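The core of steps 4 to 7 can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the probe vectors are averaged with equal weights (rather than with the prescribed integration weights), the correlation function is evaluated by direct summation (rather than by fast Fourier transformation), and all function names are hypothetical rather than taken from the authors' scripts.

```python
import numpy as np
from scipy.optimize import curve_fit

def p2_correlation(rotations, vectors, max_lag):
    """P2 correlation function averaged over time origins and probe vectors.

    rotations: (n_frames, 3, 3) rotation matrices, one per MD frame
    vectors:   (n_vec, 3) unit vectors of the pseudo-molecule
    """
    # Orientation of every probe vector in every frame: (n_frames, n_vec, 3)
    traj = np.einsum("fij,vj->fvi", rotations, vectors)
    g = np.empty(max_lag + 1)
    for lag in range(max_lag + 1):
        cos_chi = np.sum(traj[: len(traj) - lag] * traj[lag:], axis=-1)
        g[lag] = np.mean(1.5 * cos_chi**2 - 0.5)  # P2(cos chi), equal weights
    return g

def tumbling_time(g, dt_ns, fit_window_ns=2.0):
    """Mono-exponential fit of g on [0, fit_window_ns]; returns tau in ns."""
    t = np.arange(len(g)) * dt_ns
    mask = t <= fit_window_ns
    (tau,), _ = curve_fit(lambda x, tau: np.exp(-x / tau), t[mask], g[mask], p0=[1.0])
    return float(tau)

def rotational_diffusion(tau_ns):
    """Isotropic rotational diffusion coefficient (s^-1) from tau (ns)."""
    return 1.0 / (6.0 * tau_ns * 1.0e-9)
```

The relation used in `rotational_diffusion` reflects the fact that, for isotropic tumbling, the second-rank correlation function decays as exp(−6·D_rot·τ).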
The above procedure allows us to recover a rotational diffusion coefficient that is comparable with the one extracted from NMR relaxation studies. There are also other schemes that can be used to extract the rotational diffusion coefficient from MD trajectories. For example, one can use MD data to calculate 15N relaxation rates (cf. discussion below), analyze these simulated data to determine the anisotropic rotational diffusion tensor of Ub (80), and from there make a transition to the isotropic diffusion coefficient. However, here we favor a simpler and more general approach by Wong and Case (78), as described above.
Similar to translation, the rotational diffusion data also need to be extrapolated to the infinite-size box. In this case, the dependence of the per-box rotational diffusion coefficient on the inverse box volume, rather than the inverse linear size, is used to perform linear extrapolation (81). In lieu of the volume one can use any quantity that is proportional to the volume; for the sake of convenience we used one such quantity. The convergence of this procedure is illustrated in Fig. S2 B. The uncertainty of the rotational diffusion predictions has been estimated using the same jackknife procedure as described above for translational diffusion.
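The extrapolation step amounts to an ordinary least-squares line whose y-intercept is the infinite-box prediction. A minimal sketch is given below; the function name and the idea of passing either the inverse box size (translation) or the inverse box volume (rotation) as the abscissa are illustrative assumptions.

```python
import numpy as np

def extrapolate_to_infinite_box(inverse_size, per_box_values):
    """Linear fit of per-box diffusion coefficients vs. an inverse size measure.

    inverse_size:   1/L for translational data, 1/V (or any quantity
                    proportional to it) for rotational data
    per_box_values: diffusion coefficients obtained in boxes of different size
    Returns (intercept, slope); the intercept is the infinite-box estimate.
    """
    slope, intercept = np.polyfit(inverse_size, per_box_values, deg=1)
    return intercept, slope
```

With only three box sizes per water model, the fit has a single degree of freedom, which is one reason why an independent uncertainty estimate (such as the jackknife procedure) is useful.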
The scripts written in-house to extract and from an MD trajectory are available for download at https://github.com/bionmr-spbu-projects/2023-UBQ-NH4-DIFFUSION. This repository also contains installation and usage notes, as well as a short sample trajectory of Ub that can be used for the purpose of testing. The requisite python library pyxmolpp2, which is a toolkit for processing of MD data developed in-house, can be downloaded from https://github.com/bionmr-spbu/pyxmolpp2.
While the concept of rotational diffusion is not well defined for a disordered protein, comparable information on dynamics can be obtained from heteronuclear relaxation data. Here, we used the MD simulations of N-H4 to calculate residue-specific 15N longitudinal relaxation rates and CSA-dipolar cross-correlated cross-relaxation rates (37). The corresponding correlation functions were calculated using Eq. 2; in the case of cross-correlations, the angle in Eq. 2 was taken to be the angle between the dipolar (N-HN) vector at time $t$ and the unique axis of the axially symmetric 15N CSA tensor at time $t+\tau$. When dealing with fragmented trajectories, the calculations were conducted on the ns data set (same as for and ).
The MD-derived correlation functions were fitted using the 4-exponential ansatz, $g(\tau) = \sum_{k=1}^{4} a_k \exp(-\tau/\tau_k)$. The four characteristic times $\tau_k$ were constrained to the intervals [1 ps, 10 ps], [10 ps, 100 ps], [100 ps, 1 ns], and [1 ns, 10 ns], while the weights $a_k$ were normalized to ensure that they sum up to 1.0. The fitting was performed using the Levenberg-Marquardt algorithm implemented in the SciPy function curve_fit (82). The fitted $a_k$ and $\tau_k$ values were translated into spectral densities, which were in turn used to calculate the longitudinal and cross-correlated relaxation rates (83). In these calculations, we used the standard values for the N-HN bond length (1.02 Å), 15N chemical shift anisotropy (−170 ppm), and the angle that the unique axis of the nitrogen CSA tensor makes with the N-HN bond (20°) (84).
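A constrained multi-exponential fit of this kind can be sketched as shown below. Several points are assumptions of the sketch rather than details taken from the original scripts: the fourth weight is eliminated through the normalization condition, the initial guesses are placed at the geometric centers of the allowed intervals, and the spectral density is written in one common convention without prefactors. Note also that when bounds are supplied, `curve_fit` switches to a trust-region reflective solver instead of the Levenberg-Marquardt method.

```python
import numpy as np
from scipy.optimize import curve_fit

# Intervals (in ps) to which the four characteristic times are constrained
TAU_BOUNDS_PS = [(1.0, 10.0), (10.0, 100.0), (100.0, 1.0e3), (1.0e3, 1.0e4)]

def four_exp(t_ps, a1, a2, a3, tau1, tau2, tau3, tau4):
    """Sum of four exponentials whose weights add up to 1 (a4 is implied)."""
    weights = np.array([a1, a2, a3, 1.0 - a1 - a2 - a3])
    taus = np.array([tau1, tau2, tau3, tau4])
    return np.exp(-np.outer(t_ps, 1.0 / taus)) @ weights

def fit_four_exp(t_ps, g):
    """Bounded least-squares fit; returns (weights, taus in ps)."""
    lower = [0.0, 0.0, 0.0] + [lo for lo, _ in TAU_BOUNDS_PS]
    upper = [1.0, 1.0, 1.0] + [hi for _, hi in TAU_BOUNDS_PS]
    p0 = [0.25, 0.25, 0.25] + [np.sqrt(lo * hi) for lo, hi in TAU_BOUNDS_PS]
    popt, _ = curve_fit(four_exp, t_ps, g, p0=p0, bounds=(lower, upper))
    weights = np.append(popt[:3], 1.0 - popt[:3].sum())
    return weights, popt[3:]

def spectral_density(omega_rad_per_ps, weights, taus_ps):
    """J(omega) = sum_k a_k tau_k / (1 + omega^2 tau_k^2), no prefactor."""
    weights, taus_ps = np.asarray(weights), np.asarray(taus_ps)
    return np.sum(weights * taus_ps / (1.0 + (omega_rad_per_ps * taus_ps) ** 2))
```

The sketch does not prevent the implied fourth weight from becoming negative; a production implementation would need to enforce this explicitly.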
The HYDROPRO calculations were conducted on the individual frames from the small-box MD simulations (stride 1 ns) using the experimental water viscosity and density at 298 or 303 K (for N-H4 and Ub, respectively) and other input parameters set to default values. The calculations using HullRadSAS (85) were performed in the same manner using the experimental water viscosity and otherwise in the default mode.
The empirical relationship between the MD-derived average radius of gyration and the HYDROPRO-calculated hydrodynamic radius $R_h$ was used as given in the original report (59). The hydrodynamic radius $R_h$ was converted into the translational diffusion coefficient $D_{\mathrm{tr}}$ according to the authors' prescriptions via the Einstein-Stokes equation:

$$D_{\mathrm{tr}} = \frac{k_B T}{6 \pi \eta R_h} \qquad (3)$$

where $k_B$ is the Boltzmann constant, $T$ is the temperature, and $\eta$ is the water viscosity, or otherwise via the diffusion data from the reference molecule, 1,4-dioxane:

$$D_{\mathrm{tr}} = D_{\mathrm{diox}} \frac{R_{h,\mathrm{diox}}}{R_h} \qquad (4)$$

where the hydrodynamic radius of dioxane, $R_{h,\mathrm{diox}}$, was assumed to be 2.27 Å, as advocated by the same investigators (61), and the diffusion coefficient of dioxane in water at 298 K, $D_{\mathrm{diox}}$, was taken to be 1.10 × 10−9 m2/s (86).
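The two conversions from hydrodynamic radius to diffusion coefficient can be written out as a short sketch. The function names are hypothetical; the dioxane parameters are those quoted in the text, while the viscosity must be supplied by the user (a value of about 0.89 mPa·s for water at 298 K is a typical literature figure and is treated here as an assumption).

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, J/K

def d_from_rh_stokes_einstein(rh_angstrom, temperature_k, viscosity_pa_s):
    """Einstein-Stokes relation: D = k_B T / (6 pi eta R_h), in m^2/s."""
    rh_m = rh_angstrom * 1.0e-10
    return K_B * temperature_k / (6.0 * math.pi * viscosity_pa_s * rh_m)

def d_from_rh_dioxane(rh_angstrom, d_dioxane=1.10e-9, rh_dioxane_angstrom=2.27):
    """Reference-molecule scaling: D = D_dioxane * R_h,dioxane / R_h, in m^2/s."""
    return d_dioxane * rh_dioxane_angstrom / rh_angstrom

# Example call for a hypothetical hydrodynamic radius of 12 Å at 298 K
d_se = d_from_rh_stokes_einstein(12.0, 298.0, 0.89e-3)
d_ref = d_from_rh_dioxane(12.0)
```

The two routes need not give identical answers, since the second one implicitly absorbs the viscosity and any deviations from ideal Stokes behavior into the reference data for dioxane.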
The Kirkwood-Riseman formula was used in the following form (87):

$$\frac{1}{R_h} = \frac{1}{N^2} \sum_{i \neq j} \left\langle \frac{1}{r_{ij}} \right\rangle \qquad (5)$$

where $r_{ij}$ is the distance between the Cα atoms from the i-th and j-th residues, $N$ is the number of residues in the disordered protein chain, and the angular brackets denote averaging over multiple MD frames. The transition from $R_h$ to the translational diffusion coefficient has been made using either Eq. 3 or Eq. 4.
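A Kirkwood-Riseman-type estimate can be computed directly from the Cα coordinates of the MD frames. The sketch below assumes the common double-sum form, 1/R_h = (1/N²) Σ_{i≠j} ⟨1/r_ij⟩, which may differ in its normalization from the exact form used in the cited reference; the function name and the array layout are likewise illustrative.

```python
import numpy as np

def kirkwood_riseman_rh(ca_coords):
    """Hydrodynamic radius (Å) from Cα coordinates of shape (n_frames, n_res, 3)."""
    coords = np.asarray(ca_coords, dtype=float)
    n_res = coords.shape[1]
    # Pairwise Cα-Cα distances in every frame: (n_frames, n_res, n_res)
    diff = coords[:, :, None, :] - coords[:, None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    # Unique pairs i < j; the factor of 2 below restores the sum over i != j
    i, j = np.triu_indices(n_res, k=1)
    mean_inverse = (1.0 / dist[:, i, j]).mean(axis=0)  # <1/r_ij> over frames
    inverse_rh = 2.0 * mean_inverse.sum() / n_res**2
    return 1.0 / inverse_rh
```

For long trajectories the pairwise distance array can become large, in which case the frames can be processed in batches and the running averages of 1/r_ij accumulated.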
Experimental measurements and data
The N-H4 peptide was synthesized by Pepmic (Suzhou, China). The sample with peptide concentration 1 mM was prepared in 20 mM NaAc-d3 buffer containing 5% D2O (pH 4.0). Recombinant Ub was expressed and purified as described elsewhere (88). The sample was prepared with low protein concentration, 0.23 mM, to reduce the proportion of Ub dimers (89); lyophilized Ub was dissolved in 20 mM NaAc-d3 buffer containing 5% D2O (pH 6.0), approximating the conditions of a Ub dimerization study (89). For the purpose of diffusion measurements, both N-H4 and Ub solutions were placed in D2O-susceptibility-matched Shigemi tubes with the sample volume 250 μL (90).
Translational diffusion measurements were conducted on a Bruker Avance III 500 MHz spectrometer equipped with 5-mm BBI probehead with z axis gradient. The experiments were carried out at 298 K for N-H4 and 303 K for Ub. The convection compensated double-stimulated echo sequence (91) with 3-9-19 WATERGATE (92) water suppression was employed; the sequence code is based on Bruker pulse programs dstebpgp3s and stebpgp1s19. The standard Bruker smoothed square shape SMSQ10.100 with shape factor of 0.9 was used for all gradient pulses. The duration of each component of the bipolar encoding/decoding gradient pulses was equal to 2.7 ms. Twenty spectra were acquired with the gradient amplitude ranging from 2.4 to 50 G/cm. The diffusion delay was 0.1 s, the acquisition time was 3.6 s and the recycling delay was 5 s.
The aliphatic high-field proton signals (several spectral lines in the range from 0.93 to 0.73 ppm for N-H4 and a single line at −0.17 ppm for Ub) that are far removed from the residual water resonance were chosen for diffusion coefficient determination. The integral intensities of these signals were evaluated using the new algorithm developed in our laboratory (93), which is now available through the web server DDfit (Diffusion Data fit), https://ddfit.bio-nmr.spbu.ru. To extract the translational diffusion coefficient, the data were fitted using the Stejskal-Tanner equation (94) with Jerschow-Müller modifications (91).
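For orientation, a bare-bones Stejskal-Tanner analysis is sketched below. It is deliberately simplified and is not the procedure implemented in DDfit: it uses the textbook attenuation law for rectangular gradient pulses, I = I0·exp(−D·γ²g²δ²(Δ − δ/3)), and ignores the gradient shape factor, the bipolar-pulse timing, and the Jerschow-Müller modifications for the double-stimulated echo. The function name and argument conventions are assumptions.

```python
import numpy as np

GAMMA_H = 2.6752218744e8  # proton gyromagnetic ratio, rad s^-1 T^-1

def fit_stejskal_tanner(gradients_g_per_cm, intensities, delta_s, big_delta_s):
    """Log-linear fit of signal attenuation; returns (D in m^2/s, I0).

    gradients_g_per_cm: gradient amplitudes in G/cm
    delta_s:            effective gradient pulse duration in s
    big_delta_s:        diffusion delay in s
    """
    g = np.asarray(gradients_g_per_cm, dtype=float) * 1.0e-2  # G/cm -> T/m
    b = (GAMMA_H * g * delta_s) ** 2 * (big_delta_s - delta_s / 3.0)
    slope, intercept = np.polyfit(b, np.log(intensities), deg=1)
    return -slope, np.exp(intercept)
```

A log-linear fit weights noisy, strongly attenuated points more heavily than a direct exponential fit would, so for real data a nonlinear fit of the intensities is usually preferable.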
Before comparison with the MD data, the experimental results were corrected for a small fraction of D2O and sodium acetate in the buffer (assuming that the diffusion coefficient is inversely proportional to solvent viscosity, cf. the Einstein-Stokes equation). Furthermore, for Ub the result was corrected for the presence of Ub dimers (89) (which are responsible for 8% of the NMR signal from the dilute sample at hand). Of note, this correction amounts to only 1.7% of the measured value; in this sense, the experiment is rather insensitive to the modest proportion of dimers in the sample. The corrected values, corresponding to a monomeric protein in pure water, allow for direct comparison with the results of MD simulations.
The experimental rotational diffusion coefficient of Ub is from the paper by Charlier et al., where a unique set of 15N relaxation data at multiple magnetic fields has been collected for samples with different protein concentrations, including a highly dilute sample (95). Under the conditions of their study, pH 4.5, the authors did not find any evidence of Ub dimerization, but observed nonspecific self-association behavior at high protein concentration. To avoid this effect, we used the reported value measured at low protein concentration of 0.2 mM. This result was corrected for temperature (296.6 K in the experimental measurements), as well as the presence of D2O and acetate in the solvent, using the Einstein-Stokes equation.
The 15N relaxation data for N-H4 peptide were used as originally reported (37).
Results and discussion
Water models for IDP simulations
The field of biomolecular MD simulations is dominated by classical 3-site or 4-site rigid point-charge water models (96). The interactions of these water models are limited to pairwise Coulomb attraction/repulsion and pairwise Lennard-Jones attraction (dispersion)/repulsion. The geometry of the water molecule, the magnitudes of point charges and their placement, as well as the Lennard-Jones constants are usually adjusted to reproduce certain experimental characteristics of water. For example, the widely used 3-site TIP3P model approximates the geometry of the water molecule in the gas phase, with tunable interaction parameters adjusted so as to reproduce the density and heat of vaporization of water (97). The 4-site TIP4P model, reported by the same authors, has the negative charge slightly shifted away from the oxygen atom, which improves the agreement with the oxygen-oxygen and oxygen-hydrogen radial distribution functions from the neutron scattering experiments. Later, the TIP4P model was reoptimized by using the Ewald summation technique instead of simply truncating the Coulomb interactions (56). The resulting TIP4P-Ew model, along with the classical TIP3P and SPC/E (98) models, have been the most popular choices in the field of biomolecular MD simulations over the last two decades.
About 15 years ago, emerging interest in IDPs and advent of GPU computing prompted a number of research groups to begin simulating flexible peptide molecules. Soon it was recognized that such simulations suffer from inadequate solvation of peptide chains (99). Specifically, solvation free energies of peptide moieties turned out to be less favorable than those measured experimentally. A number of attempts have been made to repair this problem by fine-tuning the parameters of the relevant van der Waals interactions (100,101). In particular, these efforts led to a new version of the Amber ff99SB force field (102,103), which was later dubbed ff99SB-UCB. This modified force field has been complemented with a suitably modified TIP4P-Ew water model. Specifically, van der Waals parameters of the oxygen atom were tuned to better reproduce solvation free energies for a set of small molecules representative of peptide chemical space. Unfortunately, this modified water model remains nameless.
Soon thereafter, another modified water model was introduced under the name TIP4P-D (31). This model was built upon TIP4P/2005 (104), with van der Waals potential originating on the oxygen atom reparametrized such as to make peptide hydration more enthalpically favorable. Charges were also adjusted to fit the temperature-dependent density and heat of vaporization data. The model has been tested with a number of force fields, including Amber ff99SB-ILDN (105), and showed good results in IDP simulations. However, the authors also noted a slight tendency of TIP4P-D to destabilize globular proteins.
Two years later, a similarly modified TIP3P model was introduced for use in conjunction with the amended force field, CHARMM36m, with the intent to simulate both folded and unfolded proteins (29). Likewise, this revised water model was constructed by redefining van der Waals parameters in the original model. Unfortunately, this water model is also lacking a unique name.
A similar effort has also been undertaken to generalize Amber force fields, with a99SB-disp aspiring to model both folded and unfolded proteins (30). This force field was equipped with a slightly revised version of TIP4P-D, which also lacks a distinctive name. The new package was extensively tested on a set of 21 benchmark proteins.
In the meantime, a different type of water model, OPC, has been developed by optimizing positions of point charges with no regard for covalent geometry; the van der Waals parameters of the oxygen atom were tuned to match certain key experimental metrics (57). Later, the authors reported that the OPC model achieves good results in modeling of IDPs (32). In particular, the trajectory of a 26-residue N-terminal peptide from histone H4 recorded in OPC water under the Amber99SB force field showed a significantly more expanded conformational ensemble compared with the trajectory in conventional TIP3P water.
A number of papers have also been published which compared the performance of the newly developed water models combined with various popular force fields (37,106,107,108,109,110,111,112,113,114). The authors used radii of gyration measured by SAXS, chemical shifts and 3J(HN,Hα) couplings measured by NMR, as well as a host of other diverse experimental characteristics, to test and validate the new water models. From these studies it emerged that the new water models clearly perform better in simulating disordered proteins than the previous-generation models such as TIP3P. However, beyond that it is difficult to draw any conclusions—it appears that some systems are better modeled by one water / force field combination, while other systems favor other combinations. It should also be added that only the TIP4P-D model has been tested in more than a few studies, whereas other models received little attention. Altogether, the field needs more information on this problem, including a wider range of IDPs, more comprehensive experimental data sets, longer trajectories, and improved study designs. The search for an optimal water model should be generally framed as a search for an optimal force field, including a water model as one of its most consequential elements.
In this study we chose three water models, TIP4P-Ew, TIP4P-D, and OPC, to investigate their performance in the context of MD modeling of protein diffusion, with the focus on disordered proteins. TIP4P-Ew is a classic model, which is selected here as a point of reference. TIP4P-D is a better established specialized model, developed for disordered proteins. Finally, OPC represents the latest generation of water models, aspiring to model both structured and disordered proteins. All of them are 4-site fixed-charge fixed-geometry models, which facilitates the comparison (e.g., there is no difference in the computational costs between these models). For reader’s convenience, the parameters of these models are summarized in Fig. 2.
Figure 2.
Parameterization of water models TIP4P-Ew, TIP4P-D, and OPC. van der Waals parameters are for the oxygen atom, with the potential expressed as a function of the oxygen-to-oxygen distance. To see this figure in color, go online.
To interpret our MD results on protein diffusion, which is the main focus of this paper, we also need data on the viscosity of the modeled bulk water. To address this question, we consider the self-diffusion coefficients of the TIP4P-Ew, TIP4P-D, and OPC water models. While the data on self-diffusion coefficients for various water models are available in the literature, they are often obtained from simulating a relatively small cluster of water molecules with no regard for the box-size dependence of the self-diffusion coefficient (53). Furthermore, the published data do not always document the temperature dependence of the self-diffusion coefficient. Therefore, we decided to quantify the self-diffusion of water as a part of this study, using the same simulation setup as employed in our study of protein diffusion.
Specifically, for each of the considered water models we recorded three 150-ns-long trajectories involving small-, medium-, and large-sized water boxes. The box dimensions were taken to be the same as in the Ub simulations (for instance, the large OPC box contained 65,243 water molecules). The simulations were conducted at two temperatures, 298 and 303 K (relevant for N-H4 and Ub, respectively). The trajectories were processed as described in materials and methods, resulting in three per-box self-diffusion coefficients per water model per temperature. These results are plotted in Fig. 3 as a function of the inverse box size (color coded as indicated in the legend). Linear extrapolation of the per-box values to the limit of an infinitely large box allows one to recover the true MD-predicted self-diffusion coefficients of water. These extrapolated values, corresponding to the y-intercepts of the dashed lines in the graph, can be compared with the relevant experimental data (115), as indicated by the horizontal red lines. The results are also summarized in Table 1.
Figure 3.
Determination of self-diffusion coefficients from the series of MD simulations of water using TIP4P-Ew, TIP4P-D, and OPC models at (A) 298 K and (B) 303 K. The results are plotted as a function of the inverse linear size of the simulation box. The per-box values are extrapolated to the limit of an infinitely large box using simple linear regression (dashed lines). The y-intercept of each regression line corresponds to the predicted self-diffusion coefficient for a given water model (listed in Table 1). The experimental results (115) are represented by solid red lines. To see this figure in color, go online.
Table 1.
MD-predicted self-diffusion coefficients for three different water models at 298 and 303 K together with the corresponding experimental results (115)
| Temperature (K) | Experiment (10−9 m2/s) | MD, TIP4P-Ew (10−9 m2/s) | MD, TIP4P-D (10−9 m2/s) | MD, OPC (10−9 m2/s) |
|---|---|---|---|---|
| 298 | 2.30 | 2.63 | 2.13 | 2.41 |
| 303 | 2.60 | 2.96 | 2.39 | 2.66 |
The inspection of Fig. 3 confirms that there is a slight but distinct size dependence of the obtained per-box values. Therefore, if the objective is to compare the MD results with the experimental data, one should use the extrapolated (infinite-box) values. Likewise, when discussing the properties of solvent water in the protein simulations, it is appropriate to refer to the extrapolated values. Surveying the results in Fig. 3 and Table 1, we observe that the TIP4P-Ew model leads to appreciably overestimated self-diffusion coefficients (by about 14%), the TIP4P-D model leads to somewhat underestimated values (by 7 to 8%), whereas the OPC model leads to minimally overestimated values (by 2 to 5%).
It is also instructive to compare the results with the information in the literature. For TIP4P-Ew water, the self-diffusion coefficient at 298 K was reported to be 2.4 × 10−9 m2/s (56), which is appreciably lower than the value of 2.63 × 10−9 m2/s reported here. This can be understood by noticing that the original estimates were obtained using a small water cluster (512 molecules) simulated under the Andersen thermostat (116), which is, strictly speaking, not suitable for modeling of transport properties (49). A similar observation concerning TIP4P-Ew has been recently made by the Economou group, who reported the size-corrected diffusion coefficient of 2.7 ×10−9 m2/s for this water model (53).
For the OPC water, the originally reported self-diffusion coefficient at 298 K was 2.3 × 10−9 m2/s (57), which is slightly lower than the value of 2.41 × 10−9 m2/s found in our study. One should bear in mind that the water cluster simulated by Izadi et al. was also rather small (804 molecules) and that the Langevin thermostat used by the authors of that study tends to bias the extracted diffusion rates (discussed in the introduction).
Finally, for the TIP4P-D water, the self-diffusion coefficient of 2.1 × 10−9 m2/s was originally reported at a temperature of 300 K. This is slightly lower than the values 2.13 × 10−9 and 2.39 × 10−9 m2/s that we obtained at 298 and 303 K, respectively. Of note, the authors used the Nosé-Hoover thermostat (117,118), which is appropriate for this type of problem, and a somewhat bigger water box (3,054 water molecules).
MD simulations of N-H4
We recorded an extensive series of MD simulations for the disordered N-terminal tail of histone H4 (net length 75 μs, see Table S1 for details). This positively charged peptide is comprised of the highly mobile glycine-rich segment (residues 1–15) and the segment that consists of bulkier residues with a very high proportion of charged amino acids (residues 16–25). There were some early reports of α-helical propensity in N-H4, particularly in the acetylated form of the peptide, as well as propensity to adopt β sheet conformations (119,120,121,122). However, subsequent experimental studies and simulations of the isolated N-H4 peptide, as well as the N-terminal H4 tail within the nucleosome core particle, have not found any evidence of residual secondary structure in this segment (37,123,124,125,126). Of interest, NMR studies of nucleosome core particle samples have exposed some intriguing aspects of N-H4 dynamics, including fuzzy interactions with the nucleosomal DNA (scalable by lysine acetylation) and a competition between H4 and H3 tails for the DNA interaction sites (38,126,127,128).
Reviewing the trajectories of N-H4 in the TIP4P-Ew, TIP4P-D, and OPC water, we note that in all of them the peptide dynamically interconverts between a multitude of random conformations. However, in the case of TIP4P-Ew simulation, it tends to constantly form various hairpin-like arrangements; sometimes, the entire peptide forms a semblance of hairpin, which continually morphs from one shape to another, but remains recognizable for up to several hundreds of nanoseconds. Occasionally, the peptide chain forms a “mini fold,” which features little or no secondary structure, but likewise can retain its topology for up to several hundreds of nanoseconds. Such fluid structural motifs are held together by a handful of backbone-to-backbone, sidechain-to-backbone, and sidechain-to-sidechain hydrogen bonds. In contrast, in the TIP4P-D trajectory the peptide mainly adopts extended conformations. Different hairpin- and loop-like motifs are more local in character and appear less frequently. Among them there is a distinctive small hairpin that is sporadically formed by the C-terminal portion of the peptide (37). Finally, the OPC trajectory falls somewhere in between the other two, as it features a mix of the extended conformations and loosely packed conformations.
From the perspective of diffusion measurements, it is essential that N-H4 adopts more compact conformations in the TIP4P-Ew solvent, more extended conformations in the TIP4P-D solvent, and a mixture of the two in the OPC solvent. This can be conveniently visualized through the distribution of the gyration radius. These distributions for the N-H4 simulations in the TIP4P-Ew, TIP4P-D, and OPC water are shown in Fig. 4 (blue, green, and magenta histograms, respectively). In each case, the distribution can be loosely described as bimodal. For the TIP4P-Ew histogram, the dominant narrow peak is centered just under 10 Å, with the secondary broader peak appearing at around 13 Å. For the TIP4P-D distribution, the broad main peak is centered at approximately 16 Å, while a minor component is visible at 11 Å. Finally, in the OPC histogram the broad peak with the larger area under the curve is positioned at ca. 15 Å, whereas a sharper peak with a roughly triangular shape occurs at 11 Å. The average gyration radii are 11.5, 13.8, and 15.3 Å for the TIP4P-Ew, OPC, and TIP4P-D trajectories, respectively, indicating that the TIP4P-Ew and TIP4P-D results are far apart, with OPC somewhat closer to the latter.
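The gyration radius distributions of this kind can be obtained from the MD frames with a few lines of NumPy. The sketch below is an illustration only: the function name, the array layout, and the choice between mass-weighted and unweighted averaging are assumptions (the text does not specify which atoms or weights were used).

```python
import numpy as np

def radius_of_gyration(coords, masses=None):
    """Per-frame radius of gyration for coordinates of shape (n_frames, n_atoms, 3).

    If masses is None, all atoms are weighted equally; otherwise the
    mass-weighted definition is used.
    """
    coords = np.asarray(coords, dtype=float)
    n_atoms = coords.shape[1]
    w = np.ones(n_atoms) if masses is None else np.asarray(masses, dtype=float)
    w = w / w.sum()
    center = np.einsum("a,fad->fd", w, coords)            # weighted centroid
    sq_dist = np.sum((coords - center[:, None, :]) ** 2, axis=-1)
    return np.sqrt(sq_dist @ w)

def rg_histogram(rg_values, bin_width=0.25):
    """Normalized histogram of gyration radii (probability density)."""
    edges = np.arange(np.min(rg_values), np.max(rg_values) + 2 * bin_width, bin_width)
    density, edges = np.histogram(rg_values, bins=edges, density=True)
    return density, edges
```

The average of the per-frame values returned by `radius_of_gyration` corresponds to the ensemble-averaged gyration radius quoted in the text.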
Figure 4.
Gyration radius distributions for the N-H4 simulations in the TIP4P-Ew, TIP4P-D, and OPC water (small-box trajectories, NPT simulations). The histograms for medium- and large-box NPT simulations, Fig. S4, B and C, are near-identical to the ones shown in this plot; this is not surprising given that the medium- and large-box trajectories were recorded as a set of short 10-ns segments starting from the frames that were extracted from the small-box trajectory (see Fig. 1). The histograms for small-, medium-, and large-box NVE simulations employing TIP4P-Ew and TIP4P-D water, Fig. S4, D–F, are similar but not identical to the ones shown in this plot. Specifically, the distributions derived from NVE and NPT simulations in TIP4P-D solvent are, in fact, almost indistinguishable, whereas in the case of TIP4P-Ew there are some small but visible differences. This outcome is understandable: one should expect that TIP4P-D simulations involving extended and highly dynamic peptide species are better converged than TIP4P-Ew simulations involving a host of interconverting hairpin-like conformers and other loosely structured arrangements. To see this figure in color, go online.
The results shown in Fig. 4 are as expected. As indicated previously, TIP4P-Ew favors intraprotein interactions at the expense of protein-to-water interactions, thus giving rise to (transiently) structured, more compact N-H4 species. At the same time, TIP4P-D shifts the balance in the opposite direction, emphasizing amply solvated, extended species. In the context of this work, it is important that the three water models, TIP4P-Ew, TIP4P-D, and OPC, predict significantly different conformational ensembles. The key question is whether the experimental measurements can help to discriminate between these three distinctive N-H4 ensembles. This question will be addressed in the remainder of the paper.
To conclude this section, we briefly discuss the situation with Ub simulations (see Table S2 for the summary of trajectories). Theoretically speaking, one can imagine that the Ub molecule becomes somewhat "compacted" in the TIP4P-Ew solvent and somewhat "expanded" in the TIP4P-D solvent. This can conceivably happen due to the surface side chains that cling to the protein surface in the TIP4P-Ew solvent, but become extended outward in the TIP4P-D solvent. In fact, this kind of effect proved to be minor, with the average gyration radii of Ub equal to 11.69, 11.70, and 11.81 Å in the TIP4P-Ew, OPC, and TIP4P-D trajectories, respectively (small-box NPT simulations). Therefore, we can safely assume that, insofar as diffusion characteristics are concerned, the structural state of Ub does not depend on the water model used in the simulations.
Protein diffusion coefficients from the MD simulations
The MD data from the simulations of N-H4 in the TIP4P-Ew, TIP4P-D, and OPC solvent were processed as described in materials and methods to determine the box-size-dependent diffusion coefficients, which were then extrapolated to the limit of an infinitely large box, yielding the predictions for the translational diffusion coefficients. The latter step is illustrated in Fig. 5 A for the N-H4 simulations in the NPT ensemble (the net duration of the trajectories is 45 μs). The per-box values (circles in the plot, colored according to the type of water model used in the simulations) are successfully fitted with straight lines, thus confirming the sound character of the box-size correction scheme (47) and indicating that our results are well converged. We further observe that the predicted diffusion coefficient from the simulations in the TIP4P-Ew water significantly overestimates the experimental result (cf. the y-intercepts of the dashed blue line and the solid red line in Fig. 5 A). At the same time, the simulations in TIP4P-D water (green symbols) and OPC water (magenta symbols) lead to moderate under- and overestimation of the experimental diffusion coefficient, respectively. The OPC result is particularly close to the experimental value (see Table 2).
Table 2.
Summary of experimental and calculated diffusion coefficients for N-H4 and Ub
| Protein | Diffusion coefficient | Water model | Experimental value | Direct | HYDROPRO | Nygaard | Kirkwood-Riseman | HullRadSAS |
|---|---|---|---|---|---|---|---|---|
| N-H4 | Translational (10−10 m2/s) | TIP4P-Ew | 1.71 | **2.14** | 1.75 | 1.72 | 2.34 | 2.05 |
| N-H4 | Translational (10−10 m2/s) | TIP4P-D | 1.71 | **1.59** | 1.56 | 1.62 | 1.96 | 1.80 |
| N-H4 | Translational (10−10 m2/s) | OPC | 1.71 | **1.81** | 1.63 | 1.66 | 2.11 | 1.90 |
| Ub | Translational (10−10 m2/s) | TIP4P-Ew | 1.62 | **1.89** | 1.61 | 1.66 | – | 1.75 |
| Ub | Translational (10−10 m2/s) | TIP4P-D | 1.62 | **1.50** | 1.60 | 1.65 | – | 1.74 |
| Ub | Translational (10−10 m2/s) | OPC | 1.62 | **1.69** | 1.61 | 1.66 | – | 1.75 |
| Ub | Rotational (107 rad2/s) | TIP4P-Ew | 4.87 | **5.50** | 4.00 | – | – | 4.14 |
| Ub | Rotational (107 rad2/s) | TIP4P-D | 4.87 | **4.35** | 3.87 | – | – | 4.05 |
| Ub | Rotational (107 rad2/s) | OPC | 4.87 | **4.94** | 3.98 | – | – | 4.13 |
The direct scheme to calculate diffusion coefficients from the MD data is based on the analyses of the MD trajectories and accounts for the box-size dependence of the results (see Fig. 5) (the obtained values of diffusion constants are in bold). Four empirical schemes to predict the diffusion coefficients from a series of MD frames (HYDROPRO, Nygaard, Kirkwood-Riseman, and HullRadSAS schemes) are detailed in materials and methods. The experimental translational diffusion coefficients of N-H4 and Ub are from this study, while the experimental rotational diffusion coefficient of Ub is from (95), subject to minor adjustments related to temperature and solvent viscosity (see materials and methods).
The same kind of analysis has also been conducted for translational and rotational diffusion of Ub (the net duration of the NPT simulations is 18 μs). The results shown in Fig. 5, B and C display the same pattern as described above: TIP4P-Ew predictions are significantly overestimated, TIP4P-D predictions are underestimated, while OPC predictions are close to the target but slightly over the mark. This outcome can be explained by considering the following two factors. First, the viscosity of the simulated solvent varies between the different water models (see Fig. 3 and Table 1). Second, in the case of N-H4, the simulated conformational ensemble of the peptide also varies between the models (see Fig. 4). We will analyze these two factors and their impact on the predicted translational and rotational diffusion coefficients in the next section.
In addition to the NPT simulations, we also generated a series of N-H4 and Ub trajectories using the NVE ensemble. These simulations were conducted in TIP4P-Ew and TIP4P-D solvent, totaling 30 μs for N-H4 and 12 μs for Ub. The trajectories were processed and analyzed in the same manner as discussed above. The availability of the independent translational and rotational diffusion data obtained from the NVE simulations offers a good opportunity to test the convergence of our computational scheme.
A direct comparison of the diffusion coefficients obtained from the independent NPT and NVE simulations is presented in Fig. S5 and also summarized in Table S3. The agreement is indeed very good, which confirms the suitability of the Bussi-Parrinello thermostat for modeling of molecular diffusion. The average (unsigned) deviation between the NPT and NVE results amounts to a mere 2.0%. This is similar to the uncertainty of the experimental measurements—for example, the translational diffusion coefficient of Ub determined in this study has an uncertainty of 2.5% (see Table S3). Therefore, we conclude that the precision of our calculations is sufficient to conduct a meaningful comparison to the experimental data. The calculated and experimental diffusion coefficients should be regarded as fully consistent so long as they fall within several percentage points of each other.
MD-simulated versus experimental diffusion coefficients: Interpretation
In Fig. 6 we visualize the deviations (expressed in percentage points) between the MD-predicted and experimental diffusion coefficients for water, Ub, and N-H4. It is convenient to first discuss the case of Ub, where the conformational state of the protein is uniquely defined and protein diffusion is presumably controlled by solvent viscosity alone. The data on solvent viscosity, in the form of the water self-diffusion coefficient at the relevant temperature of 303 K, are summarized in Fig. 6 A. We compare those with the data on translational diffusion of Ub shown in Fig. 6 B.
Figure 6.
The differences between the simulated and experimental values of diffusion coefficients for (A–C) water and Ub at 303 K, as well as (D and E) water and N-H4 at 298 K (based on Tables 1, 2, and S3). The simulation and measurement conditions are annotated in the plot. Of note, the results for water self-diffusion coefficient at 298 K are very similar to those at 303 K (cf. A and D). While different water models approximate the experimental water viscosity at different temperatures with a variable degree of success, the temperature difference here is too small to make any significant difference. To see this figure in color, go online.
As seen from the plot, in the TIP4P-Ew simulations the self-diffusion coefficient of water is overestimated by 13.8% (blue bar in Fig. 6 A). Similarly, the diffusion coefficient of Ub using this water model is overestimated by 16.7 or 14.8% (blue and light blue bars in Fig. 6 B, corresponding to the NPT and NVE simulations, respectively).
Next, in the TIP4P-D simulations the self-diffusion coefficient of water is underestimated by 8.1% (green bar in Fig. 6 A). Similarly, the diffusion coefficient of Ub using this water model is underestimated by 7.4% (green and light green bars in Fig. 6 B, corresponding to the NPT and NVE simulations).
Finally, in the OPC simulations the self-diffusion coefficient of water is overestimated by 2.3% (magenta bar in Fig. 6 A). Similarly, the diffusion coefficient of Ub using this water model is overestimated by 4.3% (magenta bar in Fig. 6 B, NPT simulations).
Thus, comparing the results in Fig. 6, A and B, we observe that they are essentially identical to within a couple of percentage points. This means that the simulated translational diffusion of Ub is indeed determined entirely by the viscosity of the MD solvent. In particular, when using the OPC solvent, which closely reproduces the experimental self-diffusion coefficient of water, the simulations also closely reproduce the experimental diffusion coefficient of Ub. In turn, this means that the MD-based scheme employed in our study can be used to successfully predict the translational diffusion of a globular protein: one just needs to use a water model with correct viscosity (such as OPC) or, otherwise, apply a simple ex post viscosity correction (advisable for TIP4P-Ew and TIP4P-D).
This conclusion is supported by the rotational diffusion data for Ub (see Fig. 6 C). As can be seen from the plot, the predicted values of the rotational diffusion coefficient can be fully explained by the variations in solvent viscosity. In particular, when using the OPC solvent, the predicted diffusion coefficient reproduces the experimental result to within 1.4% (magenta bar in Fig. 6 C). Otherwise, in the case of TIP4P-Ew or TIP4P-D solvent one needs to apply a simple viscosity correction to obtain a similarly accurate result.
In the above discussion, we observe that the bars in Fig. 6 A are very similar in magnitude to the bars in Fig. 6, B and C. While some deviations are seen in the graphs, they are small and largely reflect the statistical uncertainty in the MD-based calculations (cf. green and light green bars in Fig. 6 C, corresponding to the NPT and NVE simulations, respectively). Another source of uncertainty is the experimental error. As already indicated (see the previous section), the data entries in Fig. 6 can be deemed consistent so long as they fall within several percentage points of each other.
Next, we turn to the N-H4 data, which are of prime interest to us in the context of this study. Since N-H4 in solution is disordered, we expect that the accuracy of the MD predictions for this peptide depends in this case on two factors: the viscosity of the simulated solvent (already discussed for Ub) and the characteristics of the simulated conformational ensemble (see Fig. 4).
The deviations between the MD-predicted and experimental diffusion coefficients for water and N-H4 at the relevant temperature of 298 K are graphed in Fig. 6, D and E. Comparing the data from water simulations with those from the N-H4 simulations, we immediately notice a big difference between the respective TIP4P-Ew results. While this water model overestimates the water self-diffusion coefficient by 14.3% (blue bar in Fig. 6 D), it overestimates the N-H4 diffusion coefficient by a much wider margin, 25.1 or 23.4% (blue and light blue bars in Fig. 6 E). The mismatch is too large to be explained away by the experimental error or statistical uncertainty of the MD simulations. Hence, we are led to conclude that the reason lies with the modeling of the peptide conformational ensemble in the N-H4 simulations.
Indeed, considering the results in Fig. 4 we observe that the N-H4 peptide in the TIP4P-Ew solvent tends to adopt compact conformations. These compact conformational species diffuse more rapidly, resulting in the higher-than-expected translational diffusion coefficient. Thus, our analysis identifies a problem with the modeling of the N-H4 conformational ensemble in TIP4P-Ew water. Additional relaxation-based evidence that supports this interpretation is shown below.
Returning to the discussion of Fig. 6, D and E, we note for the TIP4P-D and OPC models that the data on self-diffusion of water are compatible with the data on diffusion of N-H4. Specifically, in the case of the TIP4P-D model the self-diffusion coefficient of water is underestimated by 7.4% (green bar in Fig. 6 D), while the diffusion coefficient of N-H4 is underestimated by 7.0 or 3.5% (green and light green bars in Fig. 6 E). Finally, in the case of the OPC model the self-diffusion coefficient of water is overestimated by 4.8% (magenta bar in Fig. 6 D), while the diffusion coefficient of N-H4 is overestimated by 5.8% (magenta bar in Fig. 6 E).
The simple explanation is that both water models, TIP4P-D and OPC, are reasonably successful in modeling of the N-H4 conformational ensemble. Specifically, it can be suggested that the true distribution for this peptide is somewhere in between those predicted by the TIP4P-D and OPC simulations (cf. Fig. 4). In this situation, the MD-derived diffusion coefficient of N-H4 is determined mainly by solvent viscosity, which is what we observe for these two water models.
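The reasoning of this section can be condensed into a small calculation. The Python sketch below takes the percentage deviations quoted in the text for the NPT simulations and divides out the solvent bias, leaving the part of the solute deviation that is not explained by viscosity. It rests on an assumption that is ours for the purpose of illustration: the solute diffusion coefficient is taken to scale in proportion to the water self-diffusion coefficient of the given water model. The function and variable names are hypothetical and do not come from the authors' code.

```python
# Percentage deviations (simulation vs. experiment) quoted in the text, NPT runs.
# Each entry: water model -> (water self-diffusion deviation, solute diffusion deviation).
UB_303K = {"TIP4P-Ew": (13.8, 16.7), "TIP4P-D": (-8.1, -7.4), "OPC": (2.3, 4.3)}
NH4_298K = {"TIP4P-Ew": (14.3, 25.1), "TIP4P-D": (-7.4, -7.0), "OPC": (4.8, 5.8)}


def residual_deviation(water_pct, solute_pct):
    """Solute deviation (in %) left after dividing out the solvent viscosity bias.

    Assumption: solute diffusion scales with water self-diffusion, so the
    corrected ratio is (D_solute_MD / D_solute_exp) / (D_water_MD / D_water_exp).
    """
    ratio = (1.0 + solute_pct / 100.0) / (1.0 + water_pct / 100.0)
    return 100.0 * (ratio - 1.0)


def residual_table(data):
    """Apply the viscosity correction to every water model in a data set."""
    return {model: residual_deviation(w, s) for model, (w, s) in data.items()}


for label, data in (("Ub, 303 K", UB_303K), ("N-H4, 298 K", NH4_298K)):
    for model, value in residual_table(data).items():
        print(f"{label:12s} {model:9s} residual = {value:+.1f}%")
```

Under this assumption the residuals for Ub stay within a few percent for all three water models, as do the residuals for N-H4 in TIP4P-D and OPC water, whereas N-H4 in TIP4P-Ew water retains a residual of roughly 9%. This is the mismatch attributed above to the overly compact conformational ensemble.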
To corroborate these observations, we turn to the 15N relaxation data that have been previously collected in our laboratory (37). These data include a set of per-residue longitudinal relaxation rates and transverse cross-correlated (dipolar-CSA) relaxation rates measured at the temperature of 298 K. To draw a comparison, we also calculated both sets of rates using the MD trajectories reported in this study (see materials and methods). Both experimental and calculated results are shown in Fig. 7.
Figure 7.
Simulated and experimental 15N relaxation rates in the N-H4 peptide: (A) longitudinal relaxation rates and (B) transverse cross-correlated (dipolar-CSA) relaxation rates. The notations are described in the figure legend. The simulated data are from the NPT trajectories in the small box. We found that the calculated relaxation rates are essentially independent of the box size (see Fig. S6), which is understandable since the spin relaxation in the N-H4 peptide is dictated by its extensive conformational dynamics rather than the overall tumbling.
Of particular interest to us are the cross-correlated relaxation rates shown in Fig. 7 B. Unlike the longitudinal rates, the transverse cross-correlated rates have a simple dependence on motional time constants: any slowing of protein dynamics leads to an increase in these rates (129). Let us first discuss the simulations employing TIP4P-Ew water. As already demonstrated, the viscosity of TIP4P-Ew water is substantially lower than the experimentally measured viscosity (cf. Table 1). In this situation, one would expect the simulated dynamics of the N-H4 peptide in TIP4P-Ew water to be faster than it is in reality. In turn, this implies that the simulated cross-correlated rates should be lower than the experimentally measured ones. However, these expectations are not borne out by the actual results; in fact, the simulated values are substantially higher than the experimental values (blue profile versus the black circles in Fig. 7 B).
Therefore, we must conclude that the results in Fig. 7 B are influenced by the details of the N-H4 conformational ensemble. Recall that the ensemble observed in TIP4P-Ew simulations involves a significant proportion of loosely structured (hydrogen bonded) species and overall appears to be excessively compact (see MD simulations of N-H4). The conformational dynamics of N-H4 is obviously slowed down in such transiently structured compact species, resulting in elevated cross-correlated relaxation rates. The effect is significant enough to overcompensate for the low viscosity of the TIP4P-Ew water, producing simulated rates that are greater than the experimental values by a factor of ∼1.5 (see Fig. 7 B). Thus, the relaxation evidence appears to support our diffusion-based findings, indicating that the conformational ensemble of the N-H4 peptide in the TIP4P-Ew water is excessively compact. This is also in line with the previous knowledge that the classical water models tend to produce unrealistic “collapsed” models of disordered proteins (99).
Finally, returning to the discussion of Fig. 7, we note that both TIP4P-D and OPC water produce a better description of the N-H4 relaxation rates than TIP4P-Ew. In fact, they accurately capture the rates in the more disordered N-terminal portion of the peptide, while still overestimating the rates in the C-terminal segment (green and magenta profiles versus the black circles in Fig. 7). Given how small the difference is between the relaxation-rate predictions from the TIP4P-D and OPC simulations, it is hardly possible to favor one model over the other (i.e., it is impossible to tell which one provides a more realistic description of the N-H4 conformational ensemble, see Fig. 4). While both diffusion and relaxation analyses point to a problem with the TIP4P-Ew water, further studies are needed to assess the relative performance of the TIP4P-D and OPC models.
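The statement that the transverse cross-correlated rates grow monotonically with the motional time constant, while the longitudinal rates do not, can be illustrated with a short Python sketch. It is a simplified model and not the calculation used in this study: we assume a single-exponential correlation function with one correlation time, a hypothetical 600 MHz spectrometer, and we keep only the frequency-dependent combinations of spectral densities (all prefactors are omitted, as is the CSA contribution to the longitudinal rate). The function names are ours.

```python
import numpy as np

GAMMA_RATIO_N_TO_H = -0.1014  # gamma(15N) / gamma(1H), approximate value


def spectral_density(omega, tau):
    """Lorentzian J(omega) for a single-exponential correlation function (assumed model)."""
    return 0.4 * tau / (1.0 + (omega * tau) ** 2)


def transverse_ccr_combination(tau, omega_n):
    """Frequency dependence of the dipolar-CSA transverse cross-correlated rate: 4J(0) + 3J(wN)."""
    return 4.0 * spectral_density(0.0, tau) + 3.0 * spectral_density(omega_n, tau)


def longitudinal_dipolar_combination(tau, omega_h, omega_n):
    """Frequency dependence of the dipolar part of the longitudinal rate:
    J(wH - wN) + 3J(wN) + 6J(wH + wN)."""
    return (spectral_density(omega_h - omega_n, tau)
            + 3.0 * spectral_density(omega_n, tau)
            + 6.0 * spectral_density(omega_h + omega_n, tau))


omega_h = 2.0 * np.pi * 600.0e6        # hypothetical 600 MHz proton frequency, rad/s
omega_n = GAMMA_RATIO_N_TO_H * omega_h
taus = np.logspace(-11, -7, 200)        # correlation times from 10 ps to 100 ns

ccr = transverse_ccr_combination(taus, omega_n)
r1 = longitudinal_dipolar_combination(taus, omega_h, omega_n)

print("transverse combination monotonic in tau:", bool(np.all(np.diff(ccr) > 0)))
print("longitudinal combination monotonic in tau:", bool(np.all(np.diff(r1) > 0)))
```

In this simplified picture the transverse combination is dominated by J(0), which is proportional to the correlation time, so any slowing of the dynamics raises it. The longitudinal combination passes through a maximum and therefore cannot be interpreted as simply.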
Alternative methods to predict diffusion coefficients from the MD data
In this paper, we present a direct scheme to determine the protein translational diffusion coefficient from MD simulations. As a simple alternative, one can rely on the program HYDROPRO (58), which is intended to calculate hydrodynamic properties of macromolecules based on their coordinates. Of particular significance to us, in a number of studies this program has been used to predict the hydrodynamic radii and translational diffusion coefficients of disordered proteins. Specifically, HYDROPRO was applied to a set of MD-simulated IDP conformers and the results were subsequently averaged to obtain the predicted values of the hydrodynamic radius and the translational diffusion coefficient (106,130,131,132,133). The same approach has also been used for various ensemble models of IDPs generated by means other than MD simulations (134,135,136). Furthermore, a few years ago, Nygaard et al. proposed a simple empirical parameterization to express the relationship between the average radius of gyration and the HYDROPRO-predicted hydrodynamic radius for simulated or otherwise constructed conformational ensembles of IDPs (59). This parameterization has also been used to predict the translational diffusion parameters of the MD-simulated IDPs (113,114).
Strictly speaking, the approach whereby HYDROPRO is used to predict the translational diffusion coefficient of a disordered protein lacks any solid theoretical foundation. The description of HYDROPRO clearly states that it is intended to calculate the hydrodynamic properties of rigid macromolecules (58); hydrodynamic calculations on flexible macromolecules require different approaches (137,138).
In effect, application of HYDROPRO to conformational ensembles of IDPs implies that all conformers are “frozen” and preserve their shape while diffusing in solution. This is clearly different from the real-life situation where the peptide’s diffusion involves continuous conformational rearrangements. Intuitively, we anticipate that diffusion of the frozen conformers should be, on average, slower than that of the conformationally mobile peptide. Indeed, numeric simulations of simple polymer chains support this conjecture (139,140).
To directly test this point, we designed a special restrained simulation of the N-H4 peptide. Specifically, we regenerated a series of five hundred 10-ns trajectories representing N-H4 in OPC water in a medium-sized box. In each of these new trajectories, a large number of “soft” distance restraints were imposed on the peptide so that it preserved its initial conformation during the simulation (see materials and methods). The obtained set of short trajectories is thus representative of the diverse conformational ensemble of N-H4, but all of the simulated conformers are essentially rigid (frozen) and diffuse as such. As it turns out, this model leads to a translational diffusion coefficient that is 12% lower than our original (unrestrained) result. The same kind of restrained simulation in a small-size box leads to a value that is lower by 8%. These findings directly confirm our notion that the ensemble consisting of frozen conformers shows a slower diffusion rate compared with the fully dynamic peptide model.
Based on these observations, we expect that application of HYDROPRO to our original (unrestrained) MD data should lead to a similar underestimation of diffusion coefficients. To address this question, we conducted HYDROPRO calculations on 5000 frames from the small-box trajectories of N-H4 in TIP4P-Ew, TIP4P-D, and OPC water. The calculations used the experimental values for water viscosity and water density at 298 K, with other input parameters set to default values. The results were compared with the experimental value as measured in this work; the deviations between the HYDROPRO-based predictions and the experiment are illustrated in Fig. 8 A.
Figure 8.
The differences between the calculated and experimental values of diffusion coefficients for N-H4. The calculations use 5000 frames from the small-box NPT trajectories of N-H4 in TIP4P-Ew, TIP4P-D, and OPC water (blue, green, and magenta bars, respectively). The results from medium- and large-box simulations are very similar to those shown in the plot, which is understandable given that the respective conformational ensembles are very similar (see Fig. S4). The calculated values are from the following computational tools: (A) HYDROPRO program, (B) empirical formula, which emulates the HYDROPRO results (59), (C) Kirkwood-Riseman formula, Eq. 5, (D) HullRadSAS program. In the calculations B and C, the calculated values of the hydrodynamic radius were converted into translational diffusion coefficients by means of the Einstein-Stokes equation, Eq. 3. The alternative conversion method using the data from the reference molecule, Eq. 4, produces similar results (not shown).
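The conversion between a hydrodynamic radius and a translational diffusion coefficient mentioned in the caption of Fig. 8 can be sketched as follows. The snippet uses the standard Stokes-Einstein form, D = k_B T / (6 π η R_h), which we assume corresponds to Eq. 3 of the paper. The water viscosity of about 0.89 mPa·s at 298 K is a textbook value, and the 12 Å radius is a hypothetical input chosen only for illustration.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, J/K


def diffusion_from_radius(r_h_m, temperature_k, viscosity_pa_s):
    """Stokes-Einstein: translational diffusion coefficient (m^2/s) from R_h (m)."""
    return K_B * temperature_k / (6.0 * math.pi * viscosity_pa_s * r_h_m)


def radius_from_diffusion(d_m2_s, temperature_k, viscosity_pa_s):
    """Inverse relation: hydrodynamic radius (m) from the diffusion coefficient (m^2/s)."""
    return K_B * temperature_k / (6.0 * math.pi * viscosity_pa_s * d_m2_s)


TEMPERATURE = 298.0        # K, as in the N-H4 measurements
WATER_VISCOSITY = 0.89e-3  # Pa*s, approximate experimental value for water at 298 K
r_h = 12.0e-10             # hypothetical hydrodynamic radius of 12 Angstrom, in meters

d_tr = diffusion_from_radius(r_h, TEMPERATURE, WATER_VISCOSITY)
print(f"D = {d_tr:.3e} m^2/s for R_h = {r_h * 1e10:.1f} Angstrom")
```

Because the viscosity enters as a user-supplied constant, a prediction of this kind carries no information about the viscosity of the MD water model.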
Inspection of the HYDROPRO-based results in Fig. 8 A suggests that TIP4P-Ew is the best water model, which faithfully reproduces the experimentally measured diffusion coefficient of N-H4. Indeed, the deviation from the target for this water model is only 2.3%, comparable with the experimental uncertainty and appreciably better than that for the TIP4P-D and OPC models. This is in contrast to our direct analyses of the MD data (see MD-simulated versus experimental diffusion coefficients: Interpretation), which suggest that the use of TIP4P-Ew leads to an overly compact conformational ensemble for the N-H4 peptide; in addition, this water model suffers from low viscosity. We believe that the HYDROPRO results are in error, as determined by the following two factors.
1) since HYDROPRO treatment tacitly assumes that the peptide molecules are rigid, it tends to underestimate the diffusion coefficients of the disordered species (see above);
2) at the same time, the simulations using TIP4P-Ew water produce an overly compact peptide ensemble, which results in HYDROPRO overestimating the relevant diffusion constants.
The combination of factors (1) and (2) leads to error compensation, resulting in a false conclusion that TIP4P-Ew is the best choice of water model to simulate this disordered peptide.
In this connection, one other factor should also be mentioned.
3) HYDROPRO calculations rely on the user-supplied value of water viscosity (which is usually the experimentally measured value) and, therefore, are unaffected by the TIP4P-Ew viscosity per se.
From our perspective, this latter aspect is both a strength and a weakness. On the one hand, the results are immune to the viscosity bias, which we have to deal with in our direct analyses of the peptide diffusion, see MD-simulated versus experimental diffusion coefficients: Interpretation. On the other hand, the HYDROPRO results fail to alert us to the problem with TIP4P-Ew viscosity (which can compromise other MD-based calculations, e.g., the calculations of spin relaxation rates).
To conclude the discussion of HYDROPRO, we note that this program shows very good accuracy in predicting the translational diffusion coefficient of Ub (see Fig. S7 A). The results are essentially independent of the water model used. Indeed, the structure of this small globular protein is only minimally sensitive to the type of the water model employed in the MD simulations (see MD simulations of N-H4). At the same time, HYDROPRO performs rather poorly in predicting the rotational diffusion coefficient of Ub, registering errors of up to 20% (see Fig. S7 D). This fact, which has also been noted by others (141), probably reflects the limitations of the hydrodynamics models used by HYDROPRO.
It should also be noted that the empirical relationship between the radius of gyration and the HYDROPRO-predicted hydrodynamic radius due to Nygaard et al. holds well for the proteins at hand (cf. Fig. 8, A vs. B and Fig. S7, A vs. B). This means, however, that our criticism of the approach whereby HYDROPRO is used to predict translational diffusion coefficients of IDPs also applies to the formula of Nygaard et al.
In conclusion, we recommend against using HYDROPRO on ensemble models of disordered proteins because this program is not designed for disordered proteins and can produce misleading results.
Very recently, Lindorff-Larsen and co-workers proposed two alternative schemes to predict translational diffusion parameters of disordered proteins based on their ensemble models. For a number of IDPs of different size, the authors generated conformational ensembles by using either the well-known program Flexible-Meccano (142) or their original Langevin simulations using the coarse-grained force field CALVADOS (143). The ensembles were subsequently reweighted using the experimental SAXS data. The authors then sought to establish the relationship between these ensembles and the translational diffusion (hydrodynamic radius) data from the PFG-NMR diffusion measurements.
To this end, the authors initially invoked a general Kirkwood-Riseman model pertaining to the hydrodynamic properties of flexible polymers (144). As it appeared, this model was capable of accurately reproducing the experimental values based on the ensemble models at hand, subject to some reservations concerning the protein size (60). However, shortly thereafter the experimental data were reassessed and it was found that the Kirkwood-Riseman formula actually tends to overestimate the translational diffusion coefficients of IDPs (61).
As an alternative, the same investigators proposed to use the recent program HullRadSAS (85). When applied to the ensemble models at hand, this approach reproduces the experimental values with fairly good accuracy (61). It should be noted, however, that HullRadSAS is a program for hydrodynamics calculations, which is conceptually similar to HYDROPRO. Just like HYDROPRO, it is not intended for disordered proteins. Therefore, the use of HullRadSAS in this context should be regarded as an empirical solution to draw a bridge between the given set of models and the translational diffusion (hydrodynamic radius) data.
As a part of our study, we tested both the Kirkwood-Riseman formula and the HullRadSAS scheme on our data (see materials and methods). The results from the Kirkwood-Riseman interpretation appear to be unsatisfactory (see Fig. 8 C). In accordance with the latest observations (61), this method overestimates the diffusion coefficients of IDPs. While our rigorous analysis of the N-H4 diffusion in the MD simulations suggests that the trajectories recorded in TIP4P-D and OPC water achieve a near-quantitative accuracy in modeling the translational diffusion of this peptide (see MD-simulated versus experimental diffusion coefficients: interpretation), the application of the Kirkwood-Riseman formula to the MD snapshots leads to large errors of 14.6 and 23.4% for the two respective water models.
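For orientation, a Kirkwood-Riseman type estimate of the hydrodynamic radius from a set of MD snapshots can be sketched in Python as shown below. We use the common double-sum form, in which the inverse hydrodynamic radius equals the ensemble-averaged sum of inverse inter-site distances over all pairs i ≠ j, divided by N². The choice of sites (for example, Cα atoms) and the normalization are assumptions on our part and may differ in detail from Eq. 5 of the paper. The function name is hypothetical.

```python
import numpy as np


def kirkwood_riseman_radius(coords):
    """Kirkwood-Riseman estimate of the hydrodynamic radius.

    coords: array of shape (n_frames, n_sites, 3), e.g. C-alpha positions
    from MD snapshots. The result has the same length unit as the input.
    Assumed form: 1/R_h = (1/N^2) * sum over i != j of <1/r_ij>.
    """
    coords = np.asarray(coords, dtype=float)
    n_sites = coords.shape[1]
    i, j = np.triu_indices(n_sites, k=1)                 # each pair counted once
    dist = np.linalg.norm(coords[:, i, :] - coords[:, j, :], axis=-1)
    mean_inv = np.mean(1.0 / dist, axis=0)               # ensemble average per pair
    inv_r_h = 2.0 * np.sum(mean_inv) / n_sites ** 2      # factor 2: pairs (i,j) and (j,i)
    return 1.0 / inv_r_h


# Toy check: three sites forming an equilateral triangle with a side of 2 length units.
triangle = np.array([[[0.0, 0.0, 0.0], [2.0, 0.0, 0.0], [1.0, np.sqrt(3.0), 0.0]]])
print(f"R_h (triangle) = {kirkwood_riseman_radius(triangle):.3f}")
```

The inverse distances are averaged over the frames before the final inversion, so the result reflects the ensemble as a whole rather than an average of per-frame radii. The radius can then be converted into a diffusion coefficient with the Stokes-Einstein relation and a user-supplied viscosity.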
Turning to the very recent HullRadSAS scheme (see Fig. 8 D), we note that the results are indeed more in line with the direct MD-based analysis. Nevertheless, they still fall short of quantitative agreement. Specifically, if we factor out the trivial viscosity effects, we expect that the conformational ensembles obtained from the TIP4P-D and OPC simulations should accurately reproduce the experimental diffusion characteristics of N-H4 (see Fig. 6, D and E). Instead, the predictions by HullRadSAS overestimate the experimental value by 5.3 and 11.1% for TIP4P-D and OPC, respectively. This may lead one to incorrectly conclude that the OPC model is poorly suited for modeling of disordered proteins.
Given that HullRadSAS has not been designed to work with flexible IDPs, we repeat the call for caution that we previously made with regard to HYDROPRO. If the goal is to validate an MD model of a disordered peptide, it is safer to directly extract the diffusion coefficient from the MD simulations (such as demonstrated in this work) rather than rely on empirical tools to predict the diffusion coefficient based on a selection of MD snapshots.
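To make the direct approach concrete, the following Python sketch estimates a translational diffusion coefficient from the mean-square displacement (MSD) of a center-of-mass trajectory by using the Einstein relation, MSD(t) = 6Dt in three dimensions. It is a minimal illustration on a synthetic random walk and not the authors' processing pipeline. The function names, the lag range, and the synthetic data are our assumptions; a real analysis would also require unwrapped coordinates and a correction for the finite size of the simulation box.

```python
import numpy as np


def mean_square_displacement(positions, max_lag):
    """MSD for lags 1..max_lag (in frames), averaged over all time origins.

    positions: array of shape (n_frames, 3) with unwrapped coordinates.
    """
    positions = np.asarray(positions, dtype=float)
    msd = np.empty(max_lag)
    for lag in range(1, max_lag + 1):
        disp = positions[lag:] - positions[:-lag]
        msd[lag - 1] = np.mean(np.sum(disp ** 2, axis=1))
    return msd


def diffusion_coefficient(positions, dt, max_lag):
    """Einstein relation in 3D: D is the slope of MSD versus time, divided by 6."""
    msd = mean_square_displacement(positions, max_lag)
    lag_times = dt * np.arange(1, max_lag + 1)
    slope, _intercept = np.polyfit(lag_times, msd, 1)
    return slope / 6.0


# Synthetic Brownian trajectory with a known diffusion coefficient (arbitrary units).
rng = np.random.default_rng(0)
d_true, dt = 1.0, 1.0
steps = rng.normal(0.0, np.sqrt(2.0 * d_true * dt), size=(200_000, 3))
trajectory = np.cumsum(steps, axis=0)

d_est = diffusion_coefficient(trajectory, dt, max_lag=50)
print(f"true D = {d_true:.3f}, estimated D = {d_est:.3f}")
```

The estimate converges slowly with trajectory length, which is one reason why the direct approach requires long simulations.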
Accelerating MD simulations of protein diffusion using hydrogen mass repartitioning
As discussed above, direct determination of the translational diffusion coefficient using the mean-square displacement metric is preferable to various empirical schemes that make predictions based on the coordinates of the MD-simulated conformers. However, the downside of the direct approach is that it is time consuming. We already showed that the computational time can be reduced severalfold by recording fragmented trajectories of the peptide in the medium- and large-size water boxes. In this section we consider the possibility that further time savings can be achieved by using a longer integration timestep. To obtain some insight into this problem, we treat the simple test case, that of Ub.
Toward this goal, we recorded three additional series of Ub trajectories using longer integration timesteps: Δt = 2 fs, Δt = 4 fs (with the HMR scheme applied to Ub, but not to water, as per the Amber manual recommendation), and Δt = 4 fs (with the HMR scheme applied to both Ub and solvent water). Each series consists of the simulations in small-, medium-, and large-size boxes and has a combined length of 6 μs. Aside from the timestep and the HMR scheme, the simulation protocol is identical to the one described in materials and methods. One of the two more successful solvent models, OPC, has been selected for these simulations. The results are summarized in Table 3.
Table 3.
Diffusion coefficients of Ub from the additional series of MD simulations recorded with different integration timesteps Δt and partial or full use of the HMR scheme
| Protein | Diffusion coefficient | Experimental value | Simulation, Δt = 1 fs, no HMR | Simulation, Δt = 2 fs, no HMR | Simulation, Δt = 4 fs, HMR (protein) | Simulation, Δt = 4 fs, HMR (protein + H2O) |
|---|---|---|---|---|---|---|
| Ub | Translational (10⁻¹⁰ m²/s) | 1.62 | 1.69 | 1.80 | 1.93 | 1.68 |
| Ub | Rotational (10⁷ rad²/s) | 4.87 | 4.94 | 4.97 | 5.91 | 4.72 |
Other than the integration parameters, all simulations employed the same MD protocol as described in materials and methods. The NPT thermodynamic ensemble and the OPC solvent model were used. The extrapolation of the calculated diffusion coefficients to the limit of an infinitely large water box is illustrated in Fig. S8.
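The extrapolation to an infinitely large box mentioned in the note to Table 3 can be sketched as a linear fit. The Python example below assumes that the translational diffusion coefficient under periodic boundary conditions varies linearly with the inverse box length, which is the leading-order hydrodynamic finite-size behavior; the exact functional form used for Fig. S8 is not stated in this section, so this is an assumption. The numbers are synthetic and are not taken from the paper.

```python
import numpy as np


def extrapolate_to_infinite_box(box_lengths, diffusion_values):
    """Fit D(L) = D_inf + slope / L and return (D_inf, slope).

    Assumes a linear dependence on 1/L (leading-order finite-size correction).
    """
    inv_length = 1.0 / np.asarray(box_lengths, dtype=float)
    slope, intercept = np.polyfit(inv_length, np.asarray(diffusion_values, dtype=float), 1)
    return intercept, slope


# Synthetic illustration (NOT data from the paper): three box sizes in Angstrom
# and diffusion coefficients generated from an exact 1/L law.
box_lengths = np.array([50.0, 75.0, 100.0])
d_inf_true, slope_true = 1.70, -20.0
d_box = d_inf_true + slope_true / box_lengths

d_inf, slope = extrapolate_to_infinite_box(box_lengths, d_box)
print(f"extrapolated D_inf = {d_inf:.3f}, slope = {slope:.2f}")
```

With only three box sizes the fit has a single degree of freedom, so the scatter of the individual points translates almost directly into the uncertainty of the extrapolated value.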
Inspection of the data in Table 3 shows that making a transition from Δt = 1 fs to 2 fs causes moderate increases in the Ub diffusion coefficients on the order of several percentage points. On the other hand, an attempt to use a longer timestep, Δt = 4 fs, in conjunction with the HMR scheme on Ub leads to more substantial increases, on the order of 15–20%. At the same time, if the HMR scheme is applied to both Ub and water, the results from Δt = 4 fs simulations are brought back in line with the original 1 fs results.
This outcome is apparently at odds with the Amber default setup, where the HMR scheme is only applied to the protein molecule (145). The rationale for this default setting is that the water molecules are already constrained to their rigid geometry via the SETTLE algorithm (146) and, therefore, do not need to be additionally constrained via the HMR. However, it has been previously pointed out that the SETTLE algorithm does not necessarily eliminate all high-frequency motions in water (147). In fact, there are short-range nonbonded interactions between proximal water molecules, which can give rise to such high-frequency motions (147). We argue that the application of the HMR scheme to both Ub and solvent helps to suppress these high-frequency motional modes and thus improve the accuracy of the MD simulations using the extended 4 fs timestep.
Our preliminary studies indicate that this conclusion also holds for pure water, as well as the intrinsically disordered N-H4 peptide. It appears that the simulations using Δt = 4 fs are feasible for these systems and produce accurate diffusion parameters, but if and only if the HMR scheme is applied to the entire simulation cell. If confirmed, this result means that the direct MD-based procedure to quantify protein diffusion can be completed in one-quarter of the time that has been expended in this work.
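For readers unfamiliar with hydrogen mass repartitioning, the idea can be shown on a toy example. In the Python sketch below, the mass of each hydrogen is increased by a scale factor and the added mass is subtracted from the heavy atom it is bonded to, so that the total mass is conserved. The scale factor of 3 is a commonly used choice and is an assumption here; the function, its arguments, and the toy methyl group are hypothetical and do not reproduce the Amber implementation.

```python
def repartition_hydrogen_mass(masses, bonds, is_hydrogen, h_scale=3.0):
    """Return new atomic masses after hydrogen mass repartitioning.

    masses: list of atomic masses (Da); bonds: list of (i, j) index pairs;
    is_hydrogen: list of booleans. Each hydrogen mass is multiplied by
    h_scale (assumed value), and the added mass is taken from the bonded
    heavy atom, so the total mass is unchanged.
    """
    new_masses = list(masses)
    for a, b in bonds:
        if is_hydrogen[a] == is_hydrogen[b]:
            continue  # skip bonds that do not join a hydrogen to a heavy atom
        hydrogen, heavy = (a, b) if is_hydrogen[a] else (b, a)
        added = masses[hydrogen] * (h_scale - 1.0)
        new_masses[hydrogen] += added
        new_masses[heavy] -= added
    return new_masses


# Toy methyl group: one carbon bonded to three hydrogens (masses in Da).
masses = [12.011, 1.008, 1.008, 1.008]
bonds = [(0, 1), (0, 2), (0, 3)]
is_hydrogen = [False, True, True, True]

new_masses = repartition_hydrogen_mass(masses, bonds, is_hydrogen)
print("original masses:     ", masses)
print("repartitioned masses:", new_masses)
```

Applying the same operation to solvent molecules leaves the mass of each water molecule unchanged but redistributes it from oxygen to the hydrogens, which alters the moments of inertia of the rigid water model.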
Concluding remarks
Nowadays long peptides or smaller IDPs can be adequately modeled by means of conventional MD simulation techniques. It is therefore highly desirable to test and experimentally validate different MD models of such disordered systems. In particular, it would be useful to identify a model system that can be investigated through concerted efforts of many research groups. For instance, peptides such as RS peptide (29,60,106,148) or N-H4 (37,125) investigated in this work can be used toward this goal. Such a thoroughly characterized disordered system could play a role similar to the one played by Ub or lysozyme in structural and dynamic studies of globular proteins.
Measurements of translational diffusion by PFG-NMR can potentially provide a valuable piece of experimental data to test and validate MD models of disordered proteins. However, this type of analysis faces a number of hurdles. As it happens, the translational diffusion coefficient is not a very sensitive parameter. This becomes clear when one considers the fact that the diffusion coefficients of monomeric and homodimeric proteins differ by only a factor of 2^(1/3) ≈ 1.26. This lack of sensitivity makes it difficult to register small changes in compactness/extendedness of the IDP’s conformational ensemble via the diffusion measurements.
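The factor quoted above follows from a simple scaling argument, sketched here in Python under the assumption of compact particles of equal density: the hydrodynamic radius scales as the cube root of the mass, and the diffusion coefficient is inversely proportional to the hydrodynamic radius. The function names are ours.

```python
def diffusion_ratio(mass_ratio):
    """D_small / D_large for two compact particles of equal density.

    Assumes R_h proportional to M^(1/3) and D proportional to 1 / R_h.
    """
    return mass_ratio ** (1.0 / 3.0)


def diffusion_change_percent(radius_change_percent):
    """Percent change in D caused by a given percent change in R_h (D ~ 1/R_h)."""
    return 100.0 * (1.0 / (1.0 + radius_change_percent / 100.0) - 1.0)


print(f"monomer vs. homodimer: D ratio = {diffusion_ratio(2.0):.2f}")
print(f"5% larger R_h changes D by {diffusion_change_percent(5.0):+.1f}%")
```

Doubling the mass thus changes the diffusion coefficient by about a quarter, and a few-percent change in the effective size of a conformational ensemble changes it by a similar few percent, which is comparable to the uncertainties discussed in this work.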
Despite their wide popularity, PFG-NMR experiments are technically challenging, suffering from convection artifacts and baseline distortions caused by residual water signal (90). On the computational front, prediction of the translational diffusion coefficient from the MD data also presents a number of challenges. As shown in this work, it requires a judicious choice of the thermostating algorithm and multiple long MD simulations, including those in extralarge water boxes. To interpret the results, it is also necessary to accurately know the viscosity of the MD-simulated water for the water models at hand.
In addressing these problems, we implemented the Bussi-Parrinello velocity rescaling thermostat in the MD simulation program Amber. We also implemented a fragmentation scheme, which allows one to efficiently record MD trajectories in large-sized boxes while maintaining a good sampling of the peptide’s conformational space. Our preliminary findings suggest that the simulations can be further accelerated by switching to the 4 fs integration step, with the proviso that the HMR scheme should be applied not only to the peptide but also to solvent. Separately, we investigated the viscosity of TIP4P-Ew, TIP4P-D, and OPC water and validated the results by modeling the diffusion of a small globular protein, Ub.
The central theme of this work is a careful analysis of translational diffusion of the N-H4 peptide based on a series of NPT and NVE simulations with the net length of 75 μs. The analysis suggests that the simulations using the classical TIP4P-Ew water model produce an overly compact conformational ensemble for this disordered peptide. This conclusion is convincingly supported by the comparison of the simulated and experimental 15N relaxation rates. Indeed, in the TIP4P-Ew trajectory the peptide is observed to form various mini fold arrangements, which are held together by opportunistic hydrogen bonds and gradually morph from one shape to another, but remain recognizable for up to several hundreds of nanoseconds.
On the other hand, we found that both TIP4P-D and OPC water models lead to the N-H4 conformational ensembles that are consistent with our experimental result. While the two ensembles are somewhat different, as characterized by the average values of 15.3 and 13.8 Å, the diffusion analyses are not sufficiently sensitive to discriminate between them. Likewise, the 15N relaxation rates calculated from the TIP4P-D and OPC trajectories are similar to each other and both show good agreement with the experimental results, such that we cannot prefer one water model over the other. Further data are needed to shed additional light on this problem.
As an alternative to the rigorous procedure to extract the translational diffusion coefficient from the MD simulation data, various simplified methods have been widely used in this area. In particular, the diffusion coefficient is often calculated by applying the program HYDROPRO to the MD-simulated conformers of the disordered protein. Here, we show that this approach can lead to misleading results. For example, it identifies TIP4P-Ew as the water model that is best suited to simulate the disordered N-H4 peptide, which is contrary to our (rigorous) findings. The main issue with the use of HYDROPRO is that it is designed to predict the diffusion properties of rigid biomolecules rather than the intrinsically flexible IDPs.
Very recently, a number of empirical tools have been developed to predict the translational diffusion coefficients of IDPs based on their simulated conformational ensembles. These tools were calibrated on the (presumed accurate) ensembles informed by the experimental SAXS data and the experimental results from PFG-NMR experiments. We found that these predictions can also be inaccurate, likely due to subtle experimental biases, e.g., related to the use of 1,4-dioxane as a reference molecule (61). In this sense, the first-principle predictions of the translational diffusion coefficient from MD simulations of a disordered protein, such as demonstrated in this paper, provide an important benchmark and validation point for future efforts in this area.
Data and code availability
Programmatic tools to calculate translational and rotational diffusion coefficients based on MD simulation data can be downloaded from https://github.com/bionmr-spbu-projects/2023-UBQ-NH4-DIFFUSION.
Author contributions
O.O.L. conducted the simulations, analyzed and interpreted the results, and wrote the first draft of the manuscript. V.A.S. made the samples of N-H4 and Ub and performed the diffusion measurements. S.A.I. provided initial advice and developed software tools to record and process MD trajectories. I.S.P. implemented the experimental setup and developed the program to process diffusion NMR data. N.R.S. conceptualized the results and wrote the article with input from O.O.L., I.S.P., and other authors.
Acknowledgments
This work was supported by the SPbU grant 104236506 to N.R.S. The study has used the facilities of the Center for Magnetic Resonance (where we would like to acknowledge the assistance of M.A. Vovk and A.S. Mazur), Center for Molecular & Cell Technologies, Center for Chemical Analysis & Materials Research and Computing Center at SPbU. The authors express their strong wishes for peace in Ukraine.
Declaration of interests
The authors declare no competing interests.
Editor: Scott Showalter.
Footnotes
Supporting material can be found online at https://doi.org/10.1016/j.bpj.2023.11.020.
References
- 1. Xue B., Dunker A.K., Uversky V.N. Orderly order in protein intrinsic disorder distribution: disorder in 3500 proteomes from viruses and the three domains of life. J. Biomol. Struct. Dyn. 2012;30:137–149. doi: 10.1080/07391102.2012.675145.
- 2. Peng Z., Oldfield C.J., et al. Uversky V.N. A creature with a hundred waggly tails: intrinsically disordered proteins in the ribosome. Cell. Mol. Life Sci. 2014;71:1477–1504. doi: 10.1007/s00018-013-1446-6.
- 3. Musselman C.A., Kutateladze T.G. Characterization of functional disordered regions within chromatin-associated proteins. iScience. 2021;24:102070. doi: 10.1016/j.isci.2021.102070.
- 4. Denning D.P., Patel S.S., et al. Rexach M. Disorder in the nuclear pore complex: the FG repeat regions of nucleoporins are natively unfolded. Proc. Natl. Acad. Sci. USA. 2003;100:2450–2455. doi: 10.1073/pnas.0437902100.
- 5. Guharoy M., Szabo B., et al. Tompa P. Intrinsic Structural Disorder in Cytoskeletal Proteins. Cytoskeleton. 2013;70:550–571. doi: 10.1002/cm.21118.
- 6. Uversky V.N. Intrinsically disordered proteins in overcrowded milieu: Membrane-less organelles, phase separation, and intrinsic disorder. Curr. Opin. Struct. Biol. 2017;44:18–30. doi: 10.1016/j.sbi.2016.10.015.
- 7. Wright P.E., Dyson H.J. Intrinsically disordered proteins in cellular signalling and regulation. Nat. Rev. Mol. Cell Biol. 2015;16:18–29. doi: 10.1038/nrm3920.
- 8. Suskiewicz M.J., Sussman J.L., et al. Shaul Y. Context-dependent resistance to proteolysis of intrinsically disordered proteins. Protein Sci. 2011;20:1285–1297. doi: 10.1002/pro.657.
- 9. Darling A.L., Uversky V.N. Intrinsic disorder and posttranslational modifications: the darker side of the biological dark matter. Front. Genet. 2018;9:a158. doi: 10.3389/fgene.2018.00158.
- 10. Halfmann R., Alberti S., et al. Lindquist S. Opposing Effects of Glutamine and Asparagine Govern Prion Formation by Intrinsically Disordered Proteins. Mol. Cell. 2011;43:72–84. doi: 10.1016/j.molcel.2011.05.013.
- 11. De Simone A., Kitchen C., et al. Frenkel D. Intrinsic disorder modulates protein self-assembly and aggregation. Proc. Natl. Acad. Sci. USA. 2012;109:6951–6956. doi: 10.1073/pnas.1118048109.
- 12. Iakoucheva L.M., Brown C.J., et al. Dunker A.K. Intrinsic disorder in cell-signaling and cancer-associated proteins. J. Mol. Biol. 2002;323:573–584. doi: 10.1016/s0022-2836(02)00969-5.
- 13. Santofimia-Castaño P., Rizzuti B., et al. Iovanna J. Targeting intrinsically disordered proteins involved in cancer. Cell. Mol. Life Sci. 2020;77:1695–1707. doi: 10.1007/s00018-019-03347-3.
- 14. Ji C., Sigurdsson E.M. Current Status of Clinical Trials on Tau Immunotherapies. Drugs. 2021;81:1135–1152. doi: 10.1007/s40265-021-01546-6.
- 15. Song C., Shi J., et al. Chen H. Immunotherapy for Alzheimer’s disease: targeting β-amyloid and beyond. Transl. Neurodegener. 2022;11:18. doi: 10.1186/s40035-022-00292-3.
- 16. Dyson H.J., Wright P.E. Coupling of folding and binding for unstructured proteins. Curr. Opin. Struct. Biol. 2002;12:54–60. doi: 10.1016/s0959-440x(02)00289-0.
- 17. Sigalov A.B., Zhuravleva A.V., Orekhov V.Y. Binding of intrinsically disordered proteins is not necessarily accompanied by a structural transition to a folded form. Biochimie. 2007;89:419–421. doi: 10.1016/j.biochi.2006.11.003.
- 18. Tompa P., Fuxreiter M. Fuzzy complexes: polymorphism and structural disorder in protein-protein interactions. Trends Biochem. Sci. 2008;33:2–8. doi: 10.1016/j.tibs.2007.10.003.
- 19. Borgia A., Borgia M.B., et al. Schuler B. Extreme disorder in an ultrahigh-affinity protein complex. Nature. 2018;555:61–66. doi: 10.1038/nature25762.
- 20. Oldfield C.J., Dunker A.K. Intrinsically Disordered Proteins and Intrinsically Disordered Protein Regions. Annu. Rev. Biochem. 2014;83:553–584. doi: 10.1146/annurev-biochem-072711-164947.
- 21. Aune K.C., Salahuddin A., et al. Tanford C. Evidence for residual structure in acid- and heat-denatured proteins. J. Biol. Chem. 1967;242:4486–4489.
- 22. Hughson F.M., Wright P.E., Baldwin R.L. Structural characterization of a partly folded apomyoglobin intermediate. Science. 1990;249:1544–1548. doi: 10.1126/science.2218495.
- 23. Borcherds W., Theillet F.X., et al. Daughdrill G.W. Disorder and residual helicity alter p53-Mdm2 binding affinity and signaling in cells. Nat. Chem. Biol. 2014;10:1000–1002. doi: 10.1038/nchembio.1668.
- 24. Da Vela S., Svergun D.I. Methods, development and applications of small-angle X-ray scattering to characterize biological macromolecules in solution. Curr. Res. Struct. Biol. 2020;2:164–170. doi: 10.1016/j.crstbi.2020.08.004.
- 25. Schroer M.A., Blanchet C.E., et al. Svergun D.I. Smaller capillaries improve the small-angle X-ray scattering signal and sample consumption for biomacromolecular solutions. J. Synchrotron Radiat. 2018;25:1113–1122. doi: 10.1107/S1600577518007907.
- 26. Choy W.Y., Forman-Kay J.D. Calculation of ensembles of structures representing the unfolded state of an SH3 domain. J. Mol. Biol. 2001;308:1011–1032. doi: 10.1006/jmbi.2001.4750.
- 27. Nodet G., Salmon L., et al. Blackledge M. Quantitative description of backbone conformational sampling of unfolded proteins at amino acid resolution from NMR residual dipolar couplings. J. Am. Chem. Soc. 2009;131:17908–17918. doi: 10.1021/ja9069024.
- 28. Xue Y., Skrynnikov N.R. Motion of a disordered polypeptide chain as studied by paramagnetic relaxation enhancements, 15N relaxation, and Molecular Dynamics simulations: how fast is segmental diffusion in denatured ubiquitin? J. Am. Chem. Soc. 2011;133:14614–14628. doi: 10.1021/ja201605c.
- 29. Huang J., Rauscher S., et al. MacKerell A.D. CHARMM36m: an improved force field for folded and intrinsically disordered proteins. Nat. Methods. 2017;14:71–73. doi: 10.1038/nmeth.4067.
- 30. Robustelli P., Piana S., Shaw D.E. Developing a molecular dynamics force field for both folded and disordered protein states. Proc. Natl. Acad. Sci. USA. 2018;115:E4758–E4766. doi: 10.1073/pnas.1800690115.
- 31. Piana S., Donchev A.G., et al. Shaw D.E. Water dispersion interactions strongly influence simulated structural properties of disordered protein states. J. Phys. Chem. B. 2015;119:5113–5123. doi: 10.1021/jp508971m.
- 32. Shabane P.S., Izadi S., Onufriev A.V. General purpose water model can improve atomistic simulations of intrinsically disordered proteins. J. Chem. Theory Comput. 2019;15:2620–2634. doi: 10.1021/acs.jctc.8b01123.
- 33. Do T.N., Choy W.-Y., Karttunen M. Accelerating the Conformational Sampling of Intrinsically Disordered Proteins. J. Chem. Theory Comput. 2014;10:5081–5094. doi: 10.1021/ct5004803.
- 34. Stanley N., Esteban-Martín S., De Fabritiis G. Kinetic modulation of a disordered protein domain by phosphorylation. Nat. Commun. 2014;5:5272. doi: 10.1038/ncomms6272.
- 35. Shrestha U.R., Smith J.C., Petridis L. Full structural ensembles of intrinsically disordered proteins from unbiased molecular dynamics simulations. Commun. Biol. 2021;4:243. doi: 10.1038/s42003-021-01759-1.
- 36. Löhr T., Kohlhoff K., et al. Vendruscolo M. A kinetic ensemble of the Alzheimer’s Aβ peptide. Nat. Comput. Sci. 2021;1:71–78. doi: 10.1038/s43588-020-00003-w.
- 37. Kämpf K., Izmailov S.A., et al. Skrynnikov N.R. What drives 15N spin relaxation in disordered proteins? Combined NMR/MD study of the H4 histone tail. Biophys. J. 2018;115:2348–2367. doi: 10.1016/j.bpj.2018.11.017.
- 38. Rabdano S.O., Shannon M.D., et al. Jaroniec C.P. Histone H4 Tails in Nucleosomes: a Fuzzy Interaction with DNA. Angew. Chem., Int. Ed. 2021;60:6480–6487. doi: 10.1002/anie.202012046.
- 39. Salvi N., Abyzov A., Blackledge M. Multi-Timescale Dynamics in Intrinsically Disordered Proteins from NMR Relaxation and Molecular Simulation. J. Phys. Chem. Lett. 2016;7:2483–2489. doi: 10.1021/acs.jpclett.6b00885.
- 40. Dünweg B., Kremer K. Molecular dynamics simulation of a polymer chain in solution. J. Chem. Phys. 1993;99:6983–6997.
- 41. Heyes D.M., Cass M.J., et al. Evans W.A.B. Self-Diffusion Coefficient of the Hard-Sphere Fluid: System Size Dependence and Empirical Correlations. J. Phys. Chem. B. 2007;111:1455–1464. doi: 10.1021/jp067373s.
- 42. Raabe G., Sadus R.J. Molecular dynamics simulation of the effect of bond flexibility on the transport properties of water. J. Chem. Phys. 2012;137:104512. doi: 10.1063/1.4749382.
- 43. Aimoli C.G., Maginn E.J., Abreu C.R.A. Transport properties of carbon dioxide and methane from molecular dynamics simulations. J. Chem. Phys. 2014;141:134101. doi: 10.1063/1.4896538.
- 44. Moultos O.A., Zhang Y., et al. Maginn E.J. System-size corrections for self-diffusion coefficients calculated from molecular dynamics simulations: The case of CO2, n-alkanes, and poly(ethylene glycol) dimethyl ethers. J. Chem. Phys. 2016;145:074109. doi: 10.1063/1.4960776.
- 45. Ferrario V., Pleiss J. Simulation of protein diffusion: a sensitive probe of protein-solvent interactions. J. Biomol. Struct. Dyn. 2019;37:1534–1544. doi: 10.1080/07391102.2018.1461689.
- 46. Klein T., Lenahan F.D., et al. Fröba A.P. Characterization of Long Linear and Branched Alkanes and Alcohols for Temperatures up to 573.15 K by Surface Light Scattering and Molecular Dynamics Simulations. J. Phys. Chem. B. 2020;124:4146–4163. doi: 10.1021/acs.jpcb.0c01740.
- 47. Yeh I.-C., Hummer G. System-Size Dependence of Diffusion Coefficients and Viscosities from Molecular Dynamics Simulations with Periodic Boundary Conditions. J. Phys. Chem. B. 2004;108:15873–15879.
- 48. Dünweg B. Molecular dynamics algorithms and hydrodynamic screening. J. Chem. Phys. 1993;99:6977–6982.
- 49. Basconi J.E., Shirts M.R. Effects of Temperature Control Algorithms on Transport Properties and Kinetics in Molecular Dynamics Simulations. J. Chem. Theory Comput. 2013;9:2887–2899. doi: 10.1021/ct400109a.
- 50. Hicks A., MacAinsh M., Zhou H.-X. Removing Thermostat Distortions of Protein Dynamics in Constant-Temperature Molecular Dynamics Simulations. J. Chem. Theory Comput. 2021;17:5920–5932. doi: 10.1021/acs.jctc.1c00448.
- 51. Bussi G., Donadio D., Parrinello M. Canonical sampling through velocity rescaling. J. Chem. Phys. 2007;126:014101. doi: 10.1063/1.2408420.
- 52. Hansen J.C., Tse C., Wolffe A.P. Structure and Function of the Core Histone N-Termini: More Than Meets the Eye. Biochemistry. 1998;37:17637–17641. doi: 10.1021/bi982409v.
- 53. Tsimpanogiannis I.N., Moultos O.A., et al. Economou I.G. Self-diffusion coefficient of bulk and confined water: a critical review of classical molecular simulation studies. Mol. Simul. 2019;45:425–453.
- 54. Case D.A., Belfon K., et al. Kollman P.A. University of California; 2020. Amber 2020.
- 55. Maier J.A., Martinez C., et al. Simmerling C. ff14SB: improving the accuracy of protein side chain and backbone parameters from ff99SB. J. Chem. Theory Comput. 2015;11:3696–3713. doi: 10.1021/acs.jctc.5b00255.
- 56. Horn H.W., Swope W.C., et al. Head-Gordon T. Development of an improved four-site water model for biomolecular simulations: TIP4P-Ew. J. Chem. Phys. 2004;120:9665–9678. doi: 10.1063/1.1683075.
- 57. Izadi S., Anandakrishnan R., Onufriev A.V. Building water models: a different approach. J. Phys. Chem. Lett. 2014;5:3863–3871. doi: 10.1021/jz501780a.
- 58. García De La Torre J., Huertas M.L., Carrasco B. Calculation of hydrodynamic properties of globular proteins from their atomic-level structure. Biophys. J. 2000;78:719–730. doi: 10.1016/S0006-3495(00)76630-6.
- 59. Nygaard M., Kragelund B.B., et al. Lindorff-Larsen K. An Efficient Method for Estimating the Hydrodynamic Radius of Disordered Protein Conformations. Biophys. J. 2017;113:550–557. doi: 10.1016/j.bpj.2017.06.042.
- 60. Pesce F., Newcombe E.A., et al. Lindorff-Larsen K. Assessment of models for calculating the hydrodynamic radius of intrinsically disordered proteins. Biophys. J. 2023;122:310–321. doi: 10.1016/j.bpj.2022.12.013.
- 61. Tranchant E.E., Pesce F., et al. Lindorff-Larsen K. Revisiting the Use of Dioxane as a Reference Compound for Determination of the Hydrodynamic Radius of Proteins by Pulsed Field Gradient NMR Spectroscopy. bioRxiv. 2023 Preprint at.
- 62. Jha A.K., Colubri A., et al. Sosnick T.R. Statistical coil model of the unfolded state: resolving the reconciliation problem. Proc. Natl. Acad. Sci. USA. 2005;102:13099–13104. doi: 10.1073/pnas.0506078102.
- 63. Krivov G.G., Shapovalov M.V., Dunbrack R.L. Improved prediction of protein side-chain conformations with SCWRL4. Proteins. 2009;77:778–795. doi: 10.1002/prot.22488.
- 64. Nguyen H., Roe D.R., Simmerling C. Improved Generalized Born solvent model parameters for protein simulations. J. Chem. Theory Comput. 2013;9:2020–2034. doi: 10.1021/ct3010485.
- 65. Olsson M.H.M., Søndergaard C.R., et al. Jensen J.H. PROPKA3: consistent treatment of internal and surface residues in empirical pKa predictions. J. Chem. Theory Comput. 2011;7:525–537. doi: 10.1021/ct100578z.
- 66. Joung I.S., Cheatham T.E. Determination of alkali and halide monovalent ion parameters for use in explicitly solvated biomolecular simulations. J. Phys. Chem. B. 2008;112:9020–9041. doi: 10.1021/jp8001614.
- 67. Berendsen H.J.C., Postma J.P.M., et al. Haak J.R. Molecular Dynamics with coupling to an external bath. J. Chem. Phys. 1984;81:3684–3690.
- 68. Ryckaert J.P., Ciccotti G., Berendsen H.J. Numerical integration of the cartesian equations of motion of a system with constraints: molecular dynamics of N-alkanes. J. Comput. Phys. 1977;23:327–341.
- 69. Page A.J., Isomoto T., et al. Morokuma K. Effects of Molecular Dynamics Thermostats on Descriptions of Chemical Nonequilibrium. J. Chem. Theory Comput. 2012;8:4019–4028. doi: 10.1021/ct3004639.
- 70. Frenkel D., Smit B. Academic Press; 2002. Understanding Molecular Simulation: From Algorithms to Applications.
- 71. Vijay-Kumar S., Bugg C.E., Cook W.J. Structure of ubiquitin refined at 1.8 Å resolution. J. Mol. Biol. 1987;194:531–544. doi: 10.1016/0022-2836(87)90679-6.
- 72. Hopkins C.W., Le Grand S., et al. Roitberg A.E. Long-Time-Step Molecular Dynamics through Hydrogen Mass Repartitioning. J. Chem. Theory Comput. 2015;11:1864–1874. doi: 10.1021/ct5010406.
- 73. von Bülow S., Bullerjahn J.T., Hummer G. Systematic errors in diffusion coefficients from long-time molecular dynamics simulations at constant pressure. J. Chem. Phys. 2020;153:021101. doi: 10.1063/5.0008316.
- 74. Haile J.M. John Wiley & Sons, Inc.; 1992. Molecular Dynamics Simulation.
- 75. Qian H., Sheetz M.P., Elson E.L. Single particle tracking. Analysis of diffusion and flow in two-dimensional systems. Biophys. J. 1991;60:910–921. doi: 10.1016/S0006-3495(91)82125-7.
- 76. Michalet X. Mean square displacement analysis of single-particle trajectories with localization error: Brownian motion in an isotropic medium. Phys. Rev. E Stat. Nonlin. Soft Matter Phys. 2010;82:041914. doi: 10.1103/PhysRevE.82.041914.
- 77. Fliege J., Maier U. The distribution of points on the sphere and corresponding cubature formulae. IMA J. Numer. Anal. 1999;19:317–334.
- 78. Wong V., Case D.A. Evaluating rotational diffusion from protein MD simulations. J. Phys. Chem. B. 2008;112:6013–6024. doi: 10.1021/jp0761564.
- 79. Press W.H., Teukolsky S.A., et al. Flannery B.P. Cambridge University Press; 1992. Numerical Recipes in C.
- 80. Tjandra N., Feller S.E., et al. Bax A. Rotational diffusion anisotropy of human ubiquitin from 15N NMR relaxation. J. Am. Chem. Soc. 1995;117:12562–12566.
- 81. Linke M., Köfinger J., Hummer G. Rotational Diffusion Depends on Box Size in Molecular Dynamics Simulations. J. Phys. Chem. Lett. 2018;9:2874–2878. doi: 10.1021/acs.jpclett.8b01090.
- 82. Virtanen P., Gommers R., et al. SciPy 1.0 Contributors SciPy 1.0: fundamental algorithms for scientific computing in Python. Nat. Methods. 2020;17:261–272. doi: 10.1038/s41592-019-0686-2.
- 83. Palmer A.G. NMR characterization of the dynamics of biomacromolecules. Chem. Rev. 2004;104:3623–3640. doi: 10.1021/cr030413t.
- 84. Palmer A.G. NMR probes of molecular dynamics: overview and comparison with other techniques. Annu. Rev. Biophys. Biomol. Struct. 2001;30:129–155. doi: 10.1146/annurev.biophys.30.1.129.
- 85. Fleming P.J., Correia J.J., Fleming K.G. Revisiting macromolecular hydration with HullRadSAS. Eur. Biophys. J. 2023;52:215–224. doi: 10.1007/s00249-022-01627-8.
- 86. Leaist D.G., MacEwan K., et al. Zamari M. Binary Mutual Diffusion Coefficients of Aqueous Cyclic Ethers at 25 °C. Tetrahydrofuran, 1,3-Dioxolane, 1,4-Dioxane, 1,3-Dioxane, Tetrahydropyran, and Trioxane. J. Chem. Eng. Data. 2000;45:815–818.
- 87. Clisby N., Dünweg B. High-precision estimate of the hydrodynamic radius for self-avoiding walks. Phys. Rev. E. 2016;94:052102. doi: 10.1103/PhysRevE.94.052102.
- 88. Lazar G.A., Desjarlais J.R., Handel T.M. De novo design of the hydrophobic core of ubiquitin. Protein Sci. 1997;6:1167–1178. doi: 10.1002/pro.5560060605.
- 89. Liu Z., Zhang W.P., et al. Tang C. Noncovalent dimerization of ubiquitin. Angew. Chem., Int. Ed. 2012;51:469–472. doi: 10.1002/anie.201106190.
- 90. Price W.S. Cambridge University Press; 2009. NMR Studies of Translational Motion: Principles and Applications.
- 91. Jerschow A., Müller N. Suppression of convection artifacts in stimulated-echo diffusion experiments. Double-stimulated-echo experiments. J. Magn. Reson. 1997;125:372–375.
- 92. Sklenar V., Piotto M., et al. Saudek V. Gradient-tailored water suppression for 1H-15N HSQC experiments optimized to retain full sensitivity. J. Magn. Reson. 1993;102:241–245.
- 93. Kharkov B.B., Podkorytov I.S., et al. Skrynnikov N.R. The role of rotational motion in diffusion NMR experiments on supramolecular assemblies: application to Sup35NM fibrils. Angew. Chem. Int. Ed. 2021;60:15445–15451. doi: 10.1002/anie.202102408.
- 94. Stejskal E.O., Tanner J.E. Spin diffusion measurements: spin echoes in the presence of a time-dependent field gradient. J. Chem. Phys. 1965;42:288–292.
- 95. Charlier C., Khan S.N., et al. Ferrage F. Nanosecond Time Scale Motions in Proteins Revealed by High-Resolution NMR Relaxometry. J. Am. Chem. Soc. 2013;135:18665–18672. doi: 10.1021/ja409820g.
- 96. Onufriev A.V., Izadi S. Water models for biomolecular simulations. WIREs Comput. Mol. Sci. 2018;8:e1347.
- 97. Jorgensen W.L., Chandrasekhar J., et al. Klein M.L. Comparison of simple potential functions for simulating liquid water. J. Chem. Phys. 1983;79:926–935.
- 98. Berendsen H.J.C., Grigera J.R., Straatsma T.P. The missing term in effective pair potential. J. Phys. Chem. 1987;91:6269–6271.
- 99. Best R.B., Mittal J. Protein Simulations with an Optimized Water Model: Cooperative Helix Formation and Temperature-Induced Unfolded State Collapse. J. Phys. Chem. B. 2010;114:14916–14923. doi: 10.1021/jp108618d.
- 100. Mobley D.L., Bayly C.I., et al. Dill K.A. Small Molecule Hydration Free Energies in Explicit Solvent: An Extensive Test of Fixed-Charge Atomistic Simulations. J. Chem. Theory Comput. 2009;5:350–358. doi: 10.1021/ct800409d.
- 101. Best R.B., Zheng W., Mittal J. Balanced protein-water interactions improve properties of disordered proteins and non-specific protein association. J. Chem. Theory Comput. 2014;10:5113–5124. doi: 10.1021/ct500569b.
- 102. Nerenberg P.S., Head-Gordon T. Optimizing Protein−Solvent Force Fields to Reproduce Intrinsic Conformational Preferences of Model Peptides. J. Chem. Theory Comput. 2011;7:1220–1230. doi: 10.1021/ct2000183.
- 103. Nerenberg P.S., Jo B., et al. Head-Gordon T. Optimizing solute-water van der Waals interactions to reproduce solvation free energies. J. Phys. Chem. B. 2012;116:4524–4534. doi: 10.1021/jp2118373.
- 104. Abascal J.L.F., Vega C. A general purpose model for the condensed phases of water: TIP4P/2005. J. Chem. Phys. 2005;123:234505. doi: 10.1063/1.2121687.
- 105. Lindorff-Larsen K., Piana S., et al. Shaw D.E. Improved side-chain torsion potentials for the Amber ff99SB protein force field. Proteins. 2010;78:1950–1958. doi: 10.1002/prot.22711.
- 106. Rauscher S., Gapsys V., et al. Grubmüller H. Structural ensembles of intrinsically disordered proteins depend strongly on force field: a comparison to experiment. J. Chem. Theory Comput. 2015;11:5513–5524. doi: 10.1021/acs.jctc.5b00736.
- 107. Henriques J., Skepö M. Molecular Dynamics simulations of intrinsically disordered proteins: on the accuracy of the TIP4P-D water model and the representativeness of protein disorder models. J. Chem. Theory Comput. 2016;12:3407–3415. doi: 10.1021/acs.jctc.6b00429.
- 108. Miller M.S., Lay W.K., Elcock A.H. Osmotic Pressure Simulations of Amino Acids and Peptides Highlight Potential Routes to Protein Force Field Parameterization. J. Phys. Chem. B. 2016;120:8217–8229. doi: 10.1021/acs.jpcb.6b01902.
- 109. Zapletal V., Mládek A., et al. Hritz J. Choice of force field for proteins containing structured and intrinsically disordered regions. Biophys. J. 2020;118:1621–1633. doi: 10.1016/j.bpj.2020.02.019.
- 110. Gil Pineda L.I., Milko L.N., He Y. Performance of CHARMM36m with modified water model in simulating intrinsically disordered proteins: a case study. Biophys. Rep. 2020;6:80–87.
- 111. Gopal S.M., Wingbermühle S., et al. Schäfer L.V. Conformational Preferences of an Intrinsically Disordered Protein Domain: A Case Study for Modern Force Fields. J. Phys. Chem. B. 2021;125:24–35. doi: 10.1021/acs.jpcb.0c08702.
- 112. Abriata L.A., Dal Peraro M. Assessment of transferable forcefields for protein simulations attests improved description of disordered states and secondary structure propensities, and hints at multi-protein systems as the next challenge for optimization. Comput. Struct. Biotechnol. J. 2021;19:2626–2636. doi: 10.1016/j.csbj.2021.04.050.
- 113. Paul A., Samantray S., et al. Strodel B. Thermodynamics and kinetics of the amyloid-β peptide revealed by Markov state models based on MD data in agreement with experiment. Chem. Sci. 2021;12:6652–6669. doi: 10.1039/d0sc04657d.
- 114. Pedersen K.B., Flores-Canales J.C., Schiøtt B. Predicting molecular properties of α-synuclein using force fields for intrinsically disordered proteins. Proteins. 2023;91:47–61. doi: 10.1002/prot.26409.
- 115. Holz M., Heil S.R., Sacco A. Temperature-dependent self-diffusion coefficients of water and six selected molecular liquids for calibration in accurate 1H NMR PFG measurements. Phys. Chem. Chem. Phys. 2000;2:4740–4742.
- 116. Andersen H.C. Molecular dynamics simulations at constant pressure and/or temperature. J. Chem. Phys. 1980;72:2384–2393.
- 117. Nosé S. A unified formulation of the constant temperature molecular dynamics methods. J. Chem. Phys. 1984;81:511–519.
- 118. Hoover W.G. Canonical dynamics: Equilibrium phase-space distributions. Phys. Rev. A. 1985;31:1695–1697. doi: 10.1103/physreva.31.1695.
- 119. Wang X., Moore S.C., et al. Ausió J. Acetylation increases the alpha-helical content of the histone tails of the nucleosome. J. Biol. Chem. 2000;275:35013–35020. doi: 10.1074/jbc.M004998200.
- 120. Potoyan D.A., Papoian G.A. Energy Landscape Analyses of Disordered Histone Tails Reveal Special Organization of Their Conformational Dynamics. J. Am. Chem. Soc. 2011;133:7405–7415. doi: 10.1021/ja1111964.
- 121. Yang D., Arya G. Structure and binding of the H4 histone tail and the effects of lysine 16 acetylation. Phys. Chem. Chem. Phys. 2011;13:2911–2921. doi: 10.1039/c0cp01487g.
- 122. Winogradoff D., Echeverria I., et al. Papoian G.A. The acetylation landscape of the H4 histone tail: disentangling the interplay between the specific and cumulative effects. J. Am. Chem. Soc. 2015;137:6245–6253. doi: 10.1021/jacs.5b00235.
- 123. Zhou B.R., Feng H., et al. Bai Y. Histone H4 K16Q mutation, an acetylation mimic, causes structural disorder of its N-terminal basic patch in the nucleosome. J. Mol. Biol. 2012;421:30–37. doi: 10.1016/j.jmb.2012.04.032.
- 124. Gao M., Nadaud P.S., et al. Jaroniec C.P. Histone H3 and H4 N-terminal tails in nucleosome arrays at cellular concentrations probed by magic angle spinning NMR spectroscopy. J. Am. Chem. Soc. 2013;135:15278–15281. doi: 10.1021/ja407526s.
- 125. Shabane P.S., Onufriev A.V. Significant compaction of H4 histone tail upon charge neutralization by acetylation and its mimics, possible effects on chromatin structure. J. Mol. Biol. 2021;433:166683. doi: 10.1016/j.jmb.2020.10.017.
- 126.Kim T.H., Nosella M.L., et al. Kay L.E. Correlating histone acetylation with nucleosome core particle dynamics and function. Proc. Natl. Acad. Sci. USA. 2023;120 doi: 10.1073/pnas.2301063120. e2301063120. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 127.Furukawa A., Wakamori M., et al. Nishimura Y. Acetylated histone H4 tail enhances histone H3 tail acetylation by altering their mutual dynamics in the nucleosome. Proc. Natl. Acad. Sci. USA. 2020;117:19661–19663. doi: 10.1073/pnas.2010506117. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 128.Peng Y., Li S., et al. Panchenko A.R. Binding of regulatory proteins to nucleosomes is modulated by dynamic histone tails. Nat. Commun. 2021;12:5280. doi: 10.1038/s41467-021-25568-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 129.Cavanagh J., Fairbrother W.J., et al. Rance M. Principles and Practice. Second edition. Academic Press Inc.; 2007. Protein NMR Spectroscopy. [Google Scholar]
- 130.Lindorff-Larsen K., Kristjansdottir S., et al. Vendruscolo M. Determination of an ensemble of structures representing the denatured state of the bovine acyl-coenzyme A binding protein. J. Am. Chem. Soc. 2004;126:3291–3299. doi: 10.1021/ja039250g. [DOI] [PubMed] [Google Scholar]
- 131.Mao A.H., Crick S.L., et al. Pappu R.V. Net charge per residue modulates conformational ensembles of intrinsically disordered proteins. Proc. Natl. Acad. Sci. USA. 2010;107:8183–8188. doi: 10.1073/pnas.0911107107. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 132.Lincoff J., Haghighatlari M., et al. Head-Gordon T. Extended Experimental Inferential Structure Determination Method in Determining the Structural Ensembles of Disordered Protein States. Commun. Chem. 2020;3:74. doi: 10.1038/s42004-020-0323-0. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 133.Fagerberg E., Lenton S., et al. Skepö M. Self-Diffusive Properties of the Intrinsically Disordered Protein Histatin 5 and the Impact of Crowding Thereon: A Combined Neutron Spectroscopy and Molecular Dynamics Simulation Study. J. Phys. Chem. B. 2022;126:789–801. doi: 10.1021/acs.jpcb.1c08976. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 134.Choy W.Y., Mulder F.A.A., et al. Kay L.E. Distribution of molecular size within an unfolded state ensemble using small-angle X-ray scattering and pulse field gradient NMR techniques. J. Mol. Biol. 2002;316:101–112. doi: 10.1006/jmbi.2001.5328. [DOI] [PubMed] [Google Scholar]
- 135.Bernadó P., Blackledge M. A self-consistent description of the conformational behavior of chemically denatured proteins from NMR and small angle scattering. Biophys. J. 2009;97:2839–2845. doi: 10.1016/j.bpj.2009.08.044. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 136.Naullage P.M., Haghighatlari M., et al. Head-Gordon T. Protein Dynamics to Define and Refine Disordered Protein Ensembles. J. Phys. Chem. B. 2022;126:1885–1894. doi: 10.1021/acs.jpcb.1c10925. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 137.Amorós D., Ortega A., García de la Torre J. Prediction of hydrodynamic and other solution properties of partially disordered proteins with a simple, coarse-grained model. J. Chem. Theory Comput. 2013;9:1678–1685. doi: 10.1021/ct300948u. [DOI] [PubMed] [Google Scholar]
- 138.García de la Torre J., Hernández Cifre J.G. Hydrodynamic Properties of Biomacromolecules and Macromolecular Complexes: Concepts and Methods. A Tutorial Mini-review. J. Mol. Biol. 2020;432:2930–2948. doi: 10.1016/j.jmb.2019.12.027. [DOI] [PubMed] [Google Scholar]
- 139.Braga C., Travis K.P. Computer simulation of the role of torsional flexibility on mass and momentum transport for a series of linear alkanes. J. Chem. Phys. 2012;137:064116. doi: 10.1063/1.4742187. [DOI] [PubMed] [Google Scholar]
- 140. Bulacu M., van der Giessen E. Effect of bending and torsion rigidity on self-diffusion in polymer melts: A molecular-dynamics study. J. Chem. Phys. 2005;123:114901. doi: 10.1063/1.2035086.
- 141. Brookes E., Demeler B., et al. Rocco M. The implementation of SOMO (SOlution MOdeller) in the UltraScan analytical ultracentrifugation data analysis suite: enhanced capabilities allow the reliable hydrodynamic modeling of virtually any kind of biomacromolecule. Eur. Biophys. J. 2010;39:423–435. doi: 10.1007/s00249-009-0418-0.
- 142. Ozenne V., Bauer F., et al. Blackledge M. Flexible-meccano: a tool for the generation of explicit ensemble descriptions of intrinsically disordered proteins and their associated experimental observables. Bioinformatics. 2012;28:1463–1470. doi: 10.1093/bioinformatics/bts172.
- 143. Tesei G., Schulze T.K., et al. Lindorff-Larsen K. Accurate model of liquid–liquid phase behavior of intrinsically disordered proteins from optimization of single-chain properties. Proc. Natl. Acad. Sci. USA. 2021;118:e2111696118. doi: 10.1073/pnas.2111696118.
- 144. Kirkwood J.G., Riseman J. The Intrinsic Viscosities and Diffusion Constants of Flexible Macromolecules in Solution. J. Chem. Phys. 1948;16:565–573.
- 145. Case D.A., Aktulga H.M., et al. Kollman P.A. Amber 2023. University of California; 2023.
- 146. Miyamoto S., Kollman P.A. SETTLE: an analytical version of the SHAKE and RATTLE algorithm for rigid water models. J. Comput. Chem. 1992;13:952–962.
- 147. Jung J., Kobayashi C., Sugita Y. Optimal Temperature Evaluation in Molecular Dynamics Simulations with a Large Time Step. J. Chem. Theory Comput. 2019;15:84–94. doi: 10.1021/acs.jctc.8b00874.
- 148. Hermann M.R., Hub J.S. SAXS-Restrained Ensemble Simulations of Intrinsically Disordered Proteins with Commitment to the Principle of Maximum Entropy. J. Chem. Theory Comput. 2019;15:5103–5115. doi: 10.1021/acs.jctc.9b00338.
Data Availability Statement
Programmatic tools for calculating translational and rotational diffusion coefficients from MD simulation data can be downloaded from https://github.com/bionmr-spbu-projects/2023-UBQ-NH4-DIFFUSION.








