Abstract
For intrinsically disordered proteins (IDPs), a pressing question is how sequence codes for function. Dynamics serves as a crucial link, reminiscent of the role of structure in the sequence-function relations of structured proteins. To define general rules governing sequence-dependent backbone dynamics, we carried out long molecular dynamics simulations of eight IDPs. Blocks of residues exhibiting large amplitudes in slow dynamics are rigidified by local inter-residue interactions or secondary structures. A long region or an entire IDP can be slowed down by long-range contacts or secondary-structure packing. On the other hand, glycines promote fast dynamics and either demarcate rigid blocks or facilitate multiple modes of local and long-range inter-residue interactions. The sequence-dependent backbone dynamics endows IDPs with a versatile response to binding partners: some blocks are recalcitrant to intermolecular interactions, whereas others readily adapt to them.
1. INTRODUCTION
The sequence-structure-function paradigm has guided protein biophysics for many decades. Intrinsically disordered proteins (IDPs) account for 30% to 50% of proteomes and perform myriad cellular functions including signaling and regulation1–2. Many IDPs, including amyloid-β peptides (e.g., Aβ40), tau, and α-synuclein, are also central players in human diseases such as Alzheimer’s and Parkinson’s3–4. The lack of well-defined structures poses the fundamental question of how IDP sequences code for functions. Many studies have shown that, for IDPs, dynamics serves as a crucial link between sequence and function, replacing the role ascribed to structure for structured proteins. For example, fast dynamics allows IDPs to rapidly adapt to target proteins in forming stereospecific complexes5. Likewise, uncoupled dynamics between neighboring blocks of an IDP enables fast dissociation from its target protein, leading to a desired short lifetime for the signaling complex6. The level of dynamic coupling between neighboring blocks may also dictate the competition between IDPs in binding to the same target protein7–8.
Sequence-dependent backbone dynamics of many IDPs on the ps-ns timescales has been characterized by NMR relaxation9–17. Whereas relaxation properties of structured proteins can be readily interpreted in terms of well-separated ns global motions and ps internal motions18, the dynamics of IDPs occurs on many intricately linked timescales, so analyzing the resulting NMR relaxation properties is not trivial. One approach is to fit the time correlation function of each backbone N-H bond vector to a sum of exponentials11–13, but this leaves open the molecular origins of any sequence dependence of the backbone dynamics. Another approach is to incorporate results from molecular dynamics (MD) simulations14, 16. Force fields for MD simulations have become accurate for predicting conformational properties19–21 but, until recently, have not been reliable for predicting dynamic properties14, 16. Therefore, MD results for relaxation rates or time constants had to be rescaled to match NMR relaxation data, which unfortunately introduces uncertainty into MD-derived interpretations of backbone dynamics. For α-synuclein, both residues with small amplitudes (the NAC region, i.e., residues 62–95) and those with large amplitudes (e.g., Tyr39) on the slow timescale play important roles in fibrillation, but the connection between backbone dynamics and fibrillation is unclear14. In Aβ40, residues Leu16-Phe20 have large amplitudes on the slow timescale; noting that these residues form direct contacts with the protofibril surface22, it was speculated that their higher rigidity reduces the entropic cost of monomer-protofibril binding16. Similarly, a block of residues around Lys311 of tau nucleates its aggregation23; in the K18 fragment (residues 244–372), these residues exhibit restricted fast motions10.
Recent advances in the implementation of force fields and water models in MD simulations have led to accurate predictions of not only conformational but also dynamic properties of IDPs24–25. We tested multiple force fields in simulations of the disordered N-terminal region (NT) of ChiZ25, a member of the cell division machinery of Mycobacterium tuberculosis. The AMBER ff14SB force field26 in conjunction with the TIP4P-D water model27 accurately predicted not only the small-angle X-ray scattering (SAXS) profile and chemical shifts but also NMR relaxation rates. Two blocks of residues in the N-terminal half (N-half) of the 64-residue ChiZ-NT exhibit large amplitudes on the slow timescale, due to salt bridges, hydrogen bonds, cation-π interactions, and polyproline II (PPII) formation. It was proposed that the N-half, rigidified by intramolecular interactions, would be recalcitrant to interactions with binding partners, whereas the C-terminal half (C-half) would readily adapt to binding partners. This model was validated in a subsequent study on the association of ChiZ with acidic membranes, which showed that the C-half of NT dominates the formation of membrane contacts28. Our study on ChiZ backbone dynamics further suggested that the thermostat used to regulate temperature in MD simulations can distort dynamics, and that correcting for such distortions can further improve agreement with NMR relaxation data25. We have since developed a method for removing thermostat distortions of protein dynamics29.
To define general rules governing sequence-dependent backbone dynamics, here we performed long MD simulations of eight IDPs with varying sequence lengths and extents of secondary structure formation (Table 1 and Fig. 1). The MD results were validated by a variety of experimental data, including SAXS profiles, chemical shifts, paramagnetic relaxation enhancements (PREs), and NMR relaxation rates. The time correlation function of each backbone NH bond vector was fit to a sum of three exponentials, with time constants of 3–8 ns (τ1), 1–2 ns (τ2), and 0.1–0.2 ns (τ3). Blocks exhibiting large amplitudes in slow dynamics (τ1) are rigidified by side chain-side chain interactions or secondary structures, whereas glycines promote fast dynamics (τ3) and either demarcate rigid blocks or facilitate multiple modes of side chain-side chain interactions. This sequence-dependent backbone dynamics endows IDPs with a versatile response to binding partners: some blocks are recalcitrant to intermolecular interactions, whereas others readily adapt to them.
Table 1.
Sequences of eight IDPs.
| IDP name | # of Res | First residue | Sequence* | Last residue |
|---|---|---|---|---|
| Aβ40 | 40 | 1 | DAEFRHDSGY EVHHQKLVFF AEDVGSNKGA IIGLMVGGVV | 40 |
| HOX-SCR | 87 | 298 | KKNPPQIYPW MKRVHLGTST VNANGETKRQ RTSYTRYQTL | 337 |
|  |  | 338 | ELEKEFHFNR YLTRRRRIEI AHALSLTERQ IKIWFQNRRM | 377 |
|  |  | 378 | KWKKEHK | 384 |
| HOX-DFD | 90 | 337 | TDGERIIYPW MKKIHVAGVA NGSYQPGMEP KRQRTAYTRH | 376 |
|  |  | 377 | QILELEKEFH YNRYLTRRRR IEIAHTLVLS ERQIKIWFQN | 416 |
|  |  | 417 | RRMKWKKDNK | 426 |
| SEV-NT | 124 | 401 | LSGGDGAYHE PTGGGAIEVA LDNADIDLET EAHADQDARG | 440 |
|  |  | 441 | WGGESGERWA RQVSGGHFVT LHGAERLEEE TNDEDVSDIE | 480 |
|  |  | 481 | RRIAMRLAER RQEDSATHGD EGRNNGVDHD EDDDAAAVAG | 520 |
|  |  | 521 | IGGI | 524 |
| tau K18 | 129 | 244 | QTAPVPMPDL KNVKSKIGST ENLKHQPGGG KVQIINKKLD | 283 |
|  |  | 284 | LSNVQSKCGS KDNIKHVPGG GSVQIVYKPV DLSKVTSKAG | 323 |
|  |  | 324 | SLGNIHHKPG GGQVEVKSEK LDFKDRVQSK IGSLDNITHV | 363 |
|  |  | 364 | PGGGNKKIE | 372 |
| A1-LCD | 131 | 1 | GSMASASSSQ RGRSGSGNFG GGRGGGFGGN DNFGRGGNFS | 40 |
|  |  | 41 | GRGGFGGSRG GGGYGGSGDG YNGFGNDGSN FGGGGNYNNQ | 80 |
|  |  | 81 | SSNFGPMKGG NFGGRSSGPY GGGGQYFAKP RNQGGYGGSS | 120 |
|  |  | 121 | SSSSYGSGRR F | 131 |
| β-synuclein | 134 | 1 | MDVFMKGLSM AKEGVVAAAE KTKQGVTEAA EKTKEGVLYV | 40 |
|  |  | 41 | GSKTREGVVQ GVASVAEKTK EQASHLGGAV FSGAGNIAAA | 80 |
|  |  | 81 | TGLVKREEFP TDLKPEEVAQ EAAEEPLIEP LMEPEGESYE | 120 |
|  |  | 121 | DPPQEEYQEY EPEA | 134 |
| α-synuclein | 140 | 1 | MDVFMKGLSK AKEGVVAAAE KTKQGVAEAA GKTKEGVLYV | 40 |
|  |  | 41 | GSKTKEGVVH GVATVAEKTK EQVTNVGGAV VTGVTAVAQK | 80 |
|  |  | 81 | TVEGAGSIAA ATGFVKKDQL GKNEEGAPQE GILEDMPVDP | 120 |
|  |  | 121 | DNEAYEMPSE EGYQDYEPEA | 140 |
Shaded and underlined residues: olive, α-helices; cyan, PGGG near the end of a microtubule-binding motif of tau; yellow, KTKEGV motifs of α- and β-synuclein; underline, GGGG, GGG, or GG.
Figure 1. Snapshots from MD simulations of three IDPs.

(A) Aβ40. (B) HOX-DFD. (C) α-synuclein. Basic and acidic residues are colored blue and red, respectively. The N-terminal and C-terminal residue numbers are shown, as are the residue number marking the start of the helical region of HOX-DFD and the residue numbers marking the end of the N-terminal region and the start of the C-terminal region of α-synuclein.
2. COMPUTATIONAL METHODS
2.1. MD Simulations.
MD simulations were run in the AMBER18 package30 using the ff14SB force field26 for proteins and TIP4P-D for water27. The eight IDPs span a wide range in sequence length and level of secondary structure; accordingly, we generated their initial structures in several ways. Those of Aβ40 were snapshots, with a short α-helix or a short β-sheet, from a previous simulation using a different force field31. Initial structures of HOX-SCR and HOX-DFD were generated by the I-TASSER web server32; that of tau K18 was from the RaptorX web server33. The initial structure of the C-terminal domain of the nucleoprotein of Sendai virus (SEV-NT) consisted of an α-helix for residues 476–492 [built in PyMOL (https://pymol.org/)] and disordered N- and C-terminal regions (built by tleap). Initial structures of the low-complexity domain of heterogeneous nuclear ribonucleoprotein A1 (A1-LCD) and of α- and β-synuclein were generated by the TraDES web server34. Each initial structure was placed into a rectangular box, with a solvent layer ranging from ~20 Å (for the more open structures) to ~50 Å (for the more compact ones). Na+ and Cl− ions were added to neutralize the systems and to reproduce the experimental salt concentrations. The initial extent of secondary structure, total number of atoms, NaCl concentration, and simulation temperature for each IDP are listed in Table S1.
For each system, energy minimization with sander (2000 steps of steepest descent followed by 3000 steps of conjugate gradient) was followed by heating from 0 K to the final temperature (Table S1) over 100 ps with a 1 fs timestep. Temperature was regulated by the Langevin thermostat35 with a damping constant of 3.0 ps−1. Bond lengths involving hydrogens were constrained using the SHAKE algorithm36. Long-range electrostatic interactions were treated by the particle mesh Ewald method37. The cutoff distance for nonbonded interactions was 10 Å. The simulations then continued on GPUs using pmemd.cuda38 in four replicates with different random seeds, at constant temperature and pressure (1 atm) with a 2 fs timestep, initially for 3 ns and then for 3.2 μs. For Aβ40 and A1-LCD, the four replicates were instead prepared from the beginning using different initial structures. Pressure was regulated using the Berendsen barostat39. The final 2.5 μs of each trajectory, with snapshots saved every 20 ps, was used for analysis.
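The production settings above map onto standard AMBER `&cntrl` keywords. The helper below is a hypothetical sketch that assembles one production-chunk namelist; it is not the authors' input file. The pressure relaxation time `taup`, the restart flags (`irest`, `ntx`), and running the 3.2 μs trajectory in chunks are our assumptions, since the text does not state them.

```python
def production_mdin(temp_k: float, seed: int, run_ns: float,
                    dt_ps: float = 0.002, save_every_ps: float = 20.0) -> str:
    """Build a hypothetical pmemd.cuda &cntrl block for one NPT production chunk."""
    nstlim = round(run_ns * 1000.0 / dt_ps)   # number of MD steps in this chunk
    ntwx = round(save_every_ps / dt_ps)       # steps between saved snapshots (20 ps)
    params = {
        "imin": 0, "irest": 1, "ntx": 5,      # continue MD from a restart file (assumed)
        "nstlim": nstlim, "dt": dt_ps,        # 2 fs timestep
        "ntc": 2, "ntf": 2,                   # SHAKE on bonds involving hydrogen
        "cut": 10.0,                          # 10 A cutoff; PME is used for periodic boxes
        "ntb": 2, "ntp": 1,                   # periodic box at constant pressure
        "barostat": 1, "pres0": 1.0,          # Berendsen barostat at 1 atm
        "taup": 1.0,                          # pressure relaxation time in ps (assumed)
        "ntt": 3, "gamma_ln": 3.0,            # Langevin thermostat, 3.0 ps^-1 damping
        "temp0": temp_k, "ig": seed,          # target temperature, replicate-specific seed
        "ntwx": ntwx, "ntpr": ntwx,           # write coordinates and energies every 20 ps
    }
    lines = [f"  {key}={value}," for key, value in params.items()]
    return "production (hypothetical)\n &cntrl\n" + "\n".join(lines) + "\n /\n"


print(production_mdin(temp_k=298.0, seed=1, run_ns=100.0))
```

Each replicate would reuse the same namelist with a different `ig`; the temperature passed in should be the one listed for that IDP in Table S1.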
Chemical Shifts, Small Angle X-ray Scattering Profiles, and Paramagnetic Relaxation Enhancements.
These properties were averaged over MD snapshots saved every 1 ns. Chemical shifts were calculated using the SHIFTX2 program40. Secondary Cα and Cβ chemical shifts were calculated by subtracting random-coil values obtained from POTENCI41. SAXS profiles were calculated using FoXS42, with the calculated profile for each snapshot scaled to optimize the match with the experimental data. PREs were calculated using DEER-PREdict43.
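FoXS handles profile fitting internally. Purely as an illustration of what "scaled to optimize the match" can mean, the sketch below assumes a χ² criterion weighted by the experimental errors σ(q), for which the optimal scale factor has the closed form c = Σ(I_exp·I_calc/σ²) / Σ(I_calc²/σ²). The helper names are hypothetical; the secondary-shift helper simply subtracts random-coil values (for example from POTENCI) from snapshot-averaged calculated shifts.

```python
import numpy as np


def secondary_shifts(calc_shifts, random_coil):
    """Snapshot-averaged calculated shifts minus random-coil values.

    calc_shifts: array (n_snapshots, n_residues); random_coil: array (n_residues,).
    """
    mean_shift = np.mean(np.asarray(calc_shifts, dtype=float), axis=0)
    return mean_shift - np.asarray(random_coil, dtype=float)


def optimal_scale(i_calc, i_exp, sigma):
    """Closed-form c minimizing sum(((i_exp - c * i_calc) / sigma) ** 2)."""
    i_calc, i_exp, sigma = (np.asarray(x, dtype=float) for x in (i_calc, i_exp, sigma))
    w = 1.0 / sigma**2
    return np.sum(w * i_exp * i_calc) / np.sum(w * i_calc**2)


def chi_square(i_calc, i_exp, sigma):
    """Chi^2 per data point between the data and the optimally scaled profile."""
    i_calc, i_exp, sigma = (np.asarray(x, dtype=float) for x in (i_calc, i_exp, sigma))
    c = optimal_scale(i_calc, i_exp, sigma)
    return np.mean(((i_exp - c * i_calc) / sigma) ** 2)


def averaged_scaled_profile(snapshot_profiles, i_exp, sigma):
    """Scale each snapshot's profile to the data, then average over snapshots."""
    scaled = [optimal_scale(p, i_exp, sigma) * np.asarray(p, dtype=float)
              for p in snapshot_profiles]
    return np.mean(scaled, axis=0)
```

Scaling per snapshot before averaging follows the description above; scaling only the ensemble-averaged profile would be an equally simple alternative.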
Secondary Structures.
Secondary structures were calculated using the dssp command in cpptraj44. These results were expanded to include PPII formation when three or more consecutive coil residues fell into the PPII region of the Ramachandran map.
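The PPII rule above amounts to a run-length test on residues that are both DSSP coil and inside a PPII window of the Ramachandran map. The text does not give the window boundaries, so the sketch below assumes a hypothetical box centered at (φ, ψ) = (−75°, 145°) with ±30° half-widths; the per-residue coil flags are assumed to come from a prior DSSP assignment (for example via cpptraj).

```python
import numpy as np

# Hypothetical PPII window (degrees); the boundaries are an assumption, not from the text.
PPII_CENTER = (-75.0, 145.0)
PPII_HALF_WIDTH = (30.0, 30.0)


def _angular_diff(a, b):
    """Signed difference a - b in degrees, wrapped into [-180, 180)."""
    return (np.asarray(a, dtype=float) - b + 180.0) % 360.0 - 180.0


def ppii_mask(phi, psi, is_coil, min_run=3):
    """Flag residues in runs of >= min_run consecutive coil residues inside the PPII window."""
    in_box = ((np.abs(_angular_diff(phi, PPII_CENTER[0])) <= PPII_HALF_WIDTH[0])
              & (np.abs(_angular_diff(psi, PPII_CENTER[1])) <= PPII_HALF_WIDTH[1]))
    candidate = np.asarray(is_coil, dtype=bool) & in_box
    mask = np.zeros_like(candidate)
    start = None
    # Append a False sentinel so that a run reaching the C-terminus is also closed.
    for i, flag in enumerate(np.append(candidate, False)):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            if i - start >= min_run:
                mask[start:i] = True
            start = None
    return mask
```

Applied snapshot by snapshot, the fraction of snapshots in which a residue is flagged gives its PPII propensity.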
Contact Numbers and Contact Maps.
Distances between heavy atoms (except those in the same residue) were calculated by loading trajectories into MDTraj45. A contact was defined when two heavy atoms were within 3.5 Å of each other. The contact number of a residue was the number of residues beyond the immediate neighbors with which at least one contact was formed. This number was calculated for each snapshot and then averaged over all the saved snapshots. For calculating the contact map, the contact frequency between any two heavy atoms from two different side chains was calculated by counting the fraction of snapshots in which that contact was formed. We also tested a 5 Å cutoff in calculating contact maps.
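Both quantities defined above reduce to thresholding distance arrays. The sketch below assumes the per-snapshot minimum heavy-atom distances between residues (and the side-chain atom-pair distances for the contact map) have already been computed, for example with MDTraj, and converted from nm to Å; the function names are hypothetical.

```python
import numpy as np


def contact_numbers(min_dist, cutoff=3.5):
    """Average contact number of each residue.

    min_dist: array (n_frames, n_res, n_res) of minimum heavy-atom distances (Angstrom)
    between residues i and j in each snapshot.
    """
    d = np.asarray(min_dist, dtype=float)
    idx = np.arange(d.shape[1])
    # Exclude the residue itself and its immediate sequence neighbors (|i - j| <= 1).
    allowed = np.abs(idx[:, None] - idx[None, :]) > 1
    in_contact = (d < cutoff) & allowed        # boolean (n_frames, n_res, n_res)
    per_frame = in_contact.sum(axis=2)         # number of partner residues per snapshot
    return per_frame.mean(axis=0)              # average over all saved snapshots


def contact_frequencies(pair_dist, cutoff=3.5):
    """Fraction of snapshots in which each side-chain heavy-atom pair is in contact.

    pair_dist: array (n_frames, n_pairs) of atom-pair distances (Angstrom).
    """
    return (np.asarray(pair_dist, dtype=float) < cutoff).mean(axis=0)
```

Re-running `contact_frequencies` with `cutoff=5.0` corresponds to the 5 Å test mentioned above.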
NMR Relaxation Properties.
For each non-proline residue, the time correlation function of the NH bond vector was calculated as
$$C(t) = \left\langle P_2\big(\mathbf{n}(t') \cdot \mathbf{n}(t'+t)\big) \right\rangle_{t'} \qquad [1]$$
where n(t′) is the unit vector along the NH bond at time t′; P2(x) = (3x² − 1)/2 is the second-order Legendre polynomial; and ⟨⋯⟩t′ denotes the time average over each trajectory. After further averaging over the four replicate simulations, the correlation function was fit to a tri-exponential function,
$$C(t) = A_1 e^{-t/\tau_1} + A_2 e^{-t/\tau_2} + A_3 e^{-t/\tau_3} \qquad [2]$$
by custom Python code25. The fit was done without constraints on any parameters. In particular, the sum of the amplitudes was not constrained to 1, allowing for missing amplitude from ultrafast motions (i.e., on timescales comparable to the 20 ps interval at which snapshots were saved). The default upper bound of the fitted time range was 15 ns, but for a small number of residues, fitting errors were reduced by increasing this upper bound. We also tested fits to a sum of two or four exponentials, and the tri-exponential function was judged to be optimal (see below). Following our previous study29, the three time constants were corrected by a scaling factor according to Eq [8] given below, in order to remove dynamic distortions by the Langevin thermostat used in the MD simulations.
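The two steps described above (computing the NH time correlation function and fitting it to three exponentials) can be sketched as follows. This is not the authors' code: it uses SciPy's `curve_fit` with unconstrained parameters, the initial guess `p0` and `maxfev` are our assumptions, and the thermostat correction of the time constants is not included.

```python
import numpy as np
from scipy.optimize import curve_fit


def p2_correlation(nh_vectors, max_lag):
    """C(lag) = <P2(n(t') . n(t' + lag))>_t' for one NH bond, lag in frames."""
    n = np.asarray(nh_vectors, dtype=float)
    n = n / np.linalg.norm(n, axis=1, keepdims=True)        # unit vectors n(t')
    corr = np.empty(max_lag + 1)
    for lag in range(max_lag + 1):
        cos = np.sum(n[: len(n) - lag] * n[lag:], axis=1)    # dot products over all time origins
        corr[lag] = np.mean(1.5 * cos**2 - 0.5)              # P2(x) = (3x^2 - 1) / 2
    return corr


def tri_exp(t, a1, tau1, a2, tau2, a3, tau3):
    """Sum of three exponentials; the amplitudes are free, so A1 + A2 + A3 may fall below 1."""
    return a1 * np.exp(-t / tau1) + a2 * np.exp(-t / tau2) + a3 * np.exp(-t / tau3)


def fit_tri_exp(t_ns, corr, t_max_ns=15.0, p0=(0.3, 5.0, 0.3, 1.0, 0.3, 0.1)):
    """Unconstrained fit over 0 <= t <= t_max_ns; returns [(A, tau), ...], slowest first."""
    t_ns = np.asarray(t_ns, dtype=float)
    corr = np.asarray(corr, dtype=float)
    keep = t_ns <= t_max_ns
    popt, _ = curve_fit(tri_exp, t_ns[keep], corr[keep], p0=p0, maxfev=20000)
    amps, taus = popt[0::2], popt[1::2]
    order = np.argsort(taus)[::-1]                           # tau1 > tau2 > tau3
    return [(float(amps[i]), float(taus[i])) for i in order]
```

With snapshots saved every 20 ps, lag k corresponds to t = 0.02·k ns; the correlation functions from the four replicates would be averaged before calling `fit_tri_exp`.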
The spectral density for the time correlation function of Eq [2] was
$$J(\omega) = \frac{2}{5} \sum_{i=1}^{3} \frac{A_i \tau_i}{1 + \omega^2 \tau_i^2} \qquad [3]$$
Finally R1, R2, and NOE were given by
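$$R_1 = \frac{d^2}{4}\left[J(\omega_H - \omega_N) + 3J(\omega_N) + 6J(\omega_H + \omega_N)\right] + c^2 J(\omega_N) \qquad [4]$$

$$R_2 = \frac{d^2}{8}\left[4J(0) + J(\omega_H - \omega_N) + 3J(\omega_N) + 6J(\omega_H) + 6J(\omega_H + \omega_N)\right] + \frac{c^2}{6}\left[4J(0) + 3J(\omega_N)\right] \qquad [5]$$

$$\mathrm{NOE} = 1 + \frac{d^2}{4R_1}\,\frac{\gamma_H}{\gamma_N}\left[6J(\omega_H + \omega_N) - J(\omega_H - \omega_N)\right] \qquad [6]$$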
where $d = \mu_0 \hbar \gamma_H \gamma_N / (4\pi r_{NH}^3)$ and $c = \omega_N \Delta_{CSA} / \sqrt{3}$. The symbols are: μ0, permeability of free space; ℏ, reduced Planck constant; γH and γN, gyromagnetic ratios of hydrogen and nitrogen; ωH = γH B0, Larmor frequency of hydrogen in magnetic field B0; ωN, its nitrogen counterpart; rNH, NH bond length (set at 1.02 Å); and ΔCSA (= −170 ppm), chemical shift anisotropy of nitrogen. The magnetic field strengths corresponded to 1H Larmor frequencies of 600 MHz for Aβ40, α- and β-synuclein, HOX-DFD, and HOX-SCR; 700 MHz for tau K18; 800 MHz for A1-LCD; and 850 MHz for SEV-NT. Root-mean-square errors (RMSEs) for R1, R2, and NOE were calculated over non-terminal residues.
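To show how fitted amplitudes and time constants turn into R1, R2, and NOE, here is a sketch of the standard model-free-style relations, assuming a spectral density with a 2/5 prefactor, J(ω) = (2/5) Σ Ai τi / (1 + ω²τi²), together with d = μ0ℏγHγN/(4π rNH³) and c = ωN ΔCSA/√3. The physical constants are standard values, not taken from the paper, and the function names are hypothetical.

```python
import numpy as np

# Physical constants (SI units); standard values, not taken from the paper.
MU0 = 4e-7 * np.pi             # vacuum permeability, T m A^-1
HBAR = 1.054571817e-34         # reduced Planck constant, J s
GAMMA_H = 2.6752218744e8       # 1H gyromagnetic ratio, rad s^-1 T^-1
GAMMA_N = -2.7116e7            # 15N gyromagnetic ratio, rad s^-1 T^-1
R_NH = 1.02e-10                # NH bond length used in the text, m
DELTA_CSA = -170e-6            # 15N chemical shift anisotropy used in the text (-170 ppm)


def spectral_density(omega, amps, taus_s):
    """J(omega) = (2/5) * sum_i A_i tau_i / (1 + (omega tau_i)^2); omega in rad/s, tau in s."""
    amps = np.asarray(amps, dtype=float)
    taus = np.asarray(taus_s, dtype=float)
    return 0.4 * np.sum(amps * taus / (1.0 + (omega * taus) ** 2))


def relaxation_rates(amps, taus_ns, proton_mhz):
    """Return (R1 in s^-1, R2 in s^-1, NOE) for fitted amplitudes and time constants."""
    taus = np.asarray(taus_ns, dtype=float) * 1e-9
    b0 = 2.0 * np.pi * proton_mhz * 1e6 / GAMMA_H     # field in tesla from the 1H frequency
    w_h = GAMMA_H * b0                                # 1H Larmor frequency, rad/s
    w_n = abs(GAMMA_N) * b0                           # 15N Larmor frequency (magnitude), rad/s
    d = MU0 * HBAR * GAMMA_H * GAMMA_N / (4.0 * np.pi * R_NH**3)
    c = w_n * DELTA_CSA / np.sqrt(3.0)

    def J(w):
        return spectral_density(w, amps, taus)

    r1 = d**2 / 4 * (J(w_h - w_n) + 3 * J(w_n) + 6 * J(w_h + w_n)) + c**2 * J(w_n)
    r2 = (d**2 / 8 * (4 * J(0.0) + J(w_h - w_n) + 3 * J(w_n) + 6 * J(w_h) + 6 * J(w_h + w_n))
          + c**2 / 6 * (4 * J(0.0) + 3 * J(w_n)))
    noe = 1.0 + d**2 / (4 * r1) * (GAMMA_H / GAMMA_N) * (6 * J(w_h + w_n) - J(w_h - w_n))
    return r1, r2, noe
```

As a sanity check, a single rigid component with τ = 5 ns at 600 MHz gives R1 of roughly 2.3 s−1, R2 of roughly 8 s−1, and NOE near 0.9, whereas a 0.1 ns component drives the NOE negative, illustrating why the τ3 term dominates the NOE.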
3. RESULTS
The eight IDPs studied here range in sequence length from Aβ40 with 40 residues to α-synuclein with 140 residues (Table 1). Five of the proteins, Aβ40, tau K18, A1-LCD, α- and β-synuclein, are fully disordered10, 17, 46–47 (Fig. 1A,C), but two others, HOX transcription factors DFD and SCR, have three stable α-helices15 (Fig. 1B), and one, SEV-NT, has a well-populated α-helix11. Following our previous study on ChiZ-NT25, we used the AMBER ff14SB force field26 and the TIP4P-D water model27 to run four replicate simulations for each of the eight IDPs. Each replicate simulation was 3.2 μs, resulting in a total simulation time of 102.4 μs. The total numbers of atoms in the simulation systems ranged from 240,000 to 370,000 atoms (Table S1). Below we present validation of the MD results by a variety of experimental data (Table S1) and identify the origins of the sequence-dependent backbone dynamics.
3.1. Experimental Validation of Conformational Ensembles From MD Simulations.
SAXS profiles report on the overall size of a protein and have been determined for three of the IDPs: the tau K18 fragment48, A1-LCD17, and α-synuclein49. In all three cases, the SAXS profiles calculated from conformations sampled by our MD simulations show good agreement with their experimental counterparts (Fig. S1). To provide a direct measure of the sizes of the IDPs, we calculated the distributions of the radius of gyration (Rg; Fig. S2). For three of the IDPs, Aβ40, tau K18, and α-synuclein, the mean Rg values agree with those predicted from a scaling relation,
| [7] |
deduced from a set of IDPs (N: number of residues)50. The mean Rg values of HOX-DFD, HOX-SCR, and SEV-NT are lower than predicted by Eq [7], due to the presence of α-helices. The mean Rg value of A1-LCD is also lower than predicted, due to extensive local and long-range interactions (see below).
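The Rg comparison above can be sketched as follows, assuming Eq [7] takes the usual power-law form Rg = R0·N^ν. The prefactor R0 and exponent ν are left as inputs because their values are not reproduced in this text, and the helper names are hypothetical.

```python
import numpy as np


def radius_of_gyration(coords, masses=None):
    """Rg of one snapshot; coords is (n_atoms, 3), optional masses give mass weighting."""
    x = np.asarray(coords, dtype=float)
    w = np.ones(len(x)) if masses is None else np.asarray(masses, dtype=float)
    center = np.average(x, axis=0, weights=w)
    sq = np.sum((x - center) ** 2, axis=1)
    return float(np.sqrt(np.average(sq, weights=w)))


def power_law_rg(n_res, r0, nu):
    """Assumed scaling form Rg = r0 * N**nu; r0 and nu must be supplied by the user."""
    return r0 * n_res**nu


def compaction_ratio(rg_samples, n_res, r0, nu):
    """Mean simulated Rg over the power-law estimate; values well below 1 flag compaction."""
    return float(np.mean(rg_samples)) / power_law_rg(n_res, r0, nu)
```

In this picture, HOX-DFD, HOX-SCR, SEV-NT, and A1-LCD would show ratios noticeably below 1, whereas Aβ40, tau K18, and α-synuclein would sit near 1.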
Secondary chemical shifts indicate the formation of α-helices and β-strands and have been measured for all eight IDPs9–11, 15, 17, 46, 51. In Figs. 2A–C and S3A–E, we compare the experimental data with the secondary chemical shifts calculated from conformations sampled by our MD simulations. Overall, there is reasonable agreement. Most of the secondary chemical shifts are close to 0, indicating the absence of stable secondary structure. The exceptions are three blocks in each of the HOX transcription factors and one block in SEV-NT, with large positive secondary chemical shifts indicating high α-helix propensities. In later figures, we also report the helix and β-strand propensities of individual residues, which confirm the lack of well-populated secondary structures other than the α-helices in HOX-DFD, HOX-SCR, and SEV-NT.
Figure 2. Experimental validation of MD conformational ensembles.

(A-C) Comparison of calculated and experimental secondary chemical shifts for Aβ40, HOX-DFD, and α-synuclein. RMSE values are shown in the legends. (D) Comparison of calculated (red curves) and experimental (gray bars) PREs by spin labels at 12 residues in α-synuclein.
Inter-residue nuclear Overhauser effect (NOE) cross-peaks report on pairs of hydrogen atoms in close proximity; for Aβ40, NOEs between Hα at residue i and HN at residue i + 2 “were seen within the Asp7-Glu11, Phe20-Ser26, and Gly29-Ile31 regions”46 (Fig. S3F). We calculated the effective interproton distances52, $\langle r^{-6}\rangle^{-1/6}$, which indicated αN(i, i + 2) NOEs for His6, Asp7, His13, Asp23, and Asn27 (Fig. S3F). Except for the addition of His13, the calculated results fall within the same three regions as the observed αN(i, i + 2) NOEs.
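Because the r⁻⁶ average weights short distances heavily, a proton pair that is only transiently close can still produce an NOE even when its mean distance is large. A minimal sketch of the effective distance follows; the 5 Å criterion for calling an NOE is our assumption, since the text does not state the threshold used.

```python
import numpy as np


def effective_distance(distances):
    """<r^-6>^(-1/6) over snapshots for one proton pair (same units as the input)."""
    r = np.asarray(distances, dtype=float)
    return float(np.mean(r**-6.0) ** (-1.0 / 6.0))


def predicts_noe(distances, cutoff=5.0):
    """Hypothetical criterion: flag an NOE when the effective distance is within cutoff (Angstrom)."""
    return effective_distance(distances) <= cutoff
```

For example, a pair sampled half the time at 2 Å and half at 10 Å has an effective distance of about 2.2 Å, far below its arithmetic mean of 6 Å.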
PREs can reveal long-range interactions and have been measured for spin labels placed at 12 residues distributed throughout the sequence of α-synuclein47, 53–55. Consistent with the experimental data, a site in the N-terminal region (residues 1–61) can interact with the C-terminal region (residues 96–140) and vice versa; likewise, a site in the central NAC region can interact with the C-terminal region and vice versa (Fig. 2D). For example, our MD results correctly predict that, as observed experimentally47, a spin label at residue 120 has propensities for interacting with residues around positions 40 and 90. The calculated PREs for β-synuclein do not agree as well with the experimental values47 (Fig. S4A–C). The MD results correctly predict the absence of contacts between a spin label at position 20 and the central region, but incorrectly predict interaction propensities between a spin label at residue 114 and residues around positions 10 and 75, due to over-compaction of the conformations sampled in the simulations (Fig. S2G). For tau K18, consistent with experimental data56, PREs calculated from the MD conformations indicate the absence of long-range interactions (Fig. S4D). Next we document the validation of our MD simulations by NMR relaxation data.
3.2. Correcting for Dynamic Distortion by the Langevin Thermostat.
The backbone amide transverse (R2) and longitudinal (R1) relaxation rates and nuclear Overhauser effects (NOE) are determined by the time correlation function of the NH bond vector (see Computational Methods). Following our previous study25, we fit this correlation function to a sum of three exponentials, with time constants τ1, τ2, and τ3 (ordered from long to short) and amplitudes A1, A2, and A3. Illustrative fits are shown in Fig. S5A–C. Note that we did not constrain the sum of the amplitudes to be 1; the missing amplitude represents ultrafast motion (on timescales comparable to the 20 ps interval for saving snapshots from the MD simulations). The exponential components with the long (τ1), intermediate (τ2), and short (τ3) time constants contribute the most to R2, R1, and NOE, respectively. We also tested fits to a sum of two or four exponentials (Fig. S6). The bi-exponential fits clearly miss a component with a time constant of ~0.1 ns, except for residues in stably structured regions (e.g., Ile397 in the middle of a helix in HOX-DFD; Fig. S6D). The four-exponential function produced marginally lower residuals for disordered residues than the tri-exponential function, but overfit the data for stably structured residues. The tri-exponential function thus offers the optimal balance between accuracy and robustness.
We have recently shown that the Langevin thermostat35, used here and elsewhere for regulating temperature in MD simulations, causes systematic time dilation, and we developed a correction scheme29. A raw time constant τraw obtained in an MD simulation is corrected by a scaling factor,
| [8] |
The constants a and b depend on the damping constant of the Langevin thermostat and are 1.526 and 0.086, respectively, for the damping constant of 3 ps−1 used in the present study. The MD simulations here were also run at constant pressure using the Berendsen barostat39; however, we did not find any additional effect of the barostat on protein dynamics. Note that the scaling factor is proportional to τraw, and thus has the strongest effect on τ1 and the weakest effect on τ3. Figure S5D illustrates the correction for the three time constants of Gln15 in Aβ40 and of Ile397 in HOX-DFD. Figure S7 demonstrates that correcting for the thermostat effects dramatically reduces the RMSE of the α-synuclein R2 values, from 3.1 s−1 to 0.5 s−1, while maintaining the sequence-dependent profile. In fact, the R2, R1, and NOE values calculated from the MD simulations match the experimental data well for all eight IDPs (Figs. 3 and S8A to S12A).
Figure 3. Backbone NMR relaxation properties.

(A) Aβ40. (B) HOX-DFD. (C) α-synuclein. Calculated and experimental R2, R1, and NOE values are compared, with RMSE values shown in the legends. Horizontal dashed lines show average values over the N-half and C-half of Aβ40 or of HOX-DFD (demarcated by black vertical lines), or over the entire sequence of α-synuclein. For the latter protein, blue vertical lines are placed at residues 15, 26, 37, 48, 68, and 85 to indicate dips in R1 (and R2 in the case of Gly68); red vertical lines are placed at Tyr39, Lys96, and Asp121 to indicate elevated R2. Also shown at the top are secondary structure propensities and average contact numbers. The latter are displayed as bars on a gray scale, with white at 0 and black at 5 or above.
3.3. Backbone Dynamics of Aβ40.
Figure 3A compares the calculated and experimental16 R2, R1, and NOE values for Aβ40 residues. We make the following observations on the sequence dependence of the calculated results. First, the profile of each relaxation property along the sequence has a bell shape, with the terminal residues having much lower values than internal residues. This observation holds for all the IDPs, and the correspondingly enhanced fast dynamics of the terminal residues can be attributed to the fact that these residues are linked to the polypeptide chain on only one side. Second, the N-half of the sequence has higher R2 and NOE values than the C-half; the mean R2 values are ~4 s−1 and ~3 s−1 for the N- and C-halves, respectively. This asymmetry, corresponding to more significant slow dynamics in the N-half, is similar to what was observed for ChiZ-NT25. Third, the highest R2 values are found for residues His13-His14-Gln15. The experimental R2 values are even higher, likely due to exchange between unprotonated and protonated histidines16, which is not modeled by our MD simulations. Fourth, a local R2 peak is observed at His6-Asp7 in the N-half, while the first block of eight residues in the C-half (Ala21-Lys28) has higher R2 values than the next block of eight residues.
To uncover the origins of the sequence-dependent backbone dynamics (of the non-terminal residues), we calculated the secondary structure propensities and contact numbers of individual residues (displayed above the relaxation properties in Fig. 3A). The contact number of a residue was defined as the number of residues, other than its immediate neighbors, that form contacts with it, where a contact means two heavy atoms within a cutoff distance of 3.5 Å. In line with the secondary chemical shifts (Fig. 2A), helices and β-strands are sampled only infrequently (< ~10% for every residue). PPII is also sampled only at low levels. These sparsely populated secondary structures in Aβ40 do not explain the sequence-dependent backbone dynamics; the contact numbers do. It is immediately clear that the N-half residues have higher contact numbers than the C-half residues. Moreover, His6, His13, and Gln15, which lie in the local or global R2 peaks, are among the residues with the highest contact numbers.
To further characterize residue-residue contacts, we calculated the contact map, which displays the frequencies of contact formation between any pair of side chain heavy atoms25. The contact frequencies above a threshold of 0.005 are displayed in Fig. 4A. Two blocks of residues, Glu3-Glu11 and Glu11-Lys16, show tendencies to form extensive contacts, leading to rigidification. That these two blocks overlap and are both in the N-half underlies the asymmetry in backbone dynamics between the N- and C-halves. In the Glu3-Glu11 block, the most frequent contacts include salt bridges of Arg5 with Glu3, Asp7, and Glu11 (Fig. 4A inset); hydrogen bonds of Ser8 with His6 (Fig. 4A inset) and Asp7; and a cation-π interaction between Arg5 and Tyr10. In the Glu11-Lys16 block, the most frequent contacts include a salt bridge of Glu11 with Lys16; π-π stacking between His13 and His14 (Fig. 4A inset); a hydrogen bond between His13 and Lys16; and amino-π interactions of Gln15 with His13 and His14. These side chain-side chain interactions explain the local R2 peak in residues 6–7 and the global R2 peak in residues 13–15. Interestingly, Vemulapalli et al.57 recently reported NMR evidence for salt bridge formation of Arg5 with nearby acidic residues including Glu3 and Asp7. In the C-half, the Glu22-Lys28 block tends to form a less extensive network of contacts, including a salt bridge between Glu22 and Lys28 (Fig. 4A inset) and hydrogen bonds of Ser26 with Glu22, Asp23, Asn27, and Lys28. These interactions explain why the first eight residues of the C-half have higher R2 values than the rest of the C-half. Note that the two serine residues, Ser8 and Ser26, can both hydrogen bond with multiple neighboring residues. These hydrogen bonds may be the reason for the αN(i, i + 2) NOEs described above.
Figure 4. Side chain-side chain contact maps.

(A) Aβ40. Contacts with frequencies higher than 0.005 are shown, but for Arg5-Tyr10 contacts, the threshold is lowered to 0.002. (B) HOX-DFD. Contacts with frequencies higher than 0.018 are shown, but for the Arg394-Trp346 contact, the threshold is lowered to 0.010. Blocks of residues with tendencies for extensive contacts are highlighted by red boxes. For HOX-DFD, α-helices are identified by blue boxes, and prominent interhelical and linker-helix contacts are indicated by magenta and orange ovals. Insets: snapshots illustrating selected contacts.
All the foregoing interactions involve polar (including charged) residues. Val18 and Phe20 are the only two nonpolar residues that form contacts with elevated frequencies. The N- and C-halves contain 12 and 5 polar residues, respectively, and thus the former sequence is much more polar than the latter sequence. Therefore the origin of the asymmetry in backbone dynamics can be traced to the disparity in number of polar residues between the two halves.
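The polar-residue counts quoted above can be checked directly from the Aβ40 sequence in Table 1. The residue set below is our assumption: charged plus polar uncharged residues, with Tyr counted as polar, which is the choice that reproduces the stated 12 and 5.

```python
# Assumed definition of "polar" (charged + polar uncharged, Tyr included); our choice.
POLAR = set("DEKRHSTNQYC")

# Aβ40 sequence transcribed from Table 1.
AB40 = "DAEFRHDSGYEVHHQKLVFFAEDVGSNKGAIIGLMVGGVV"


def polar_count(seq):
    """Number of residues in seq that belong to the assumed polar set."""
    return sum(res in POLAR for res in seq)


n_half, c_half = AB40[:20], AB40[20:]
print(polar_count(n_half), polar_count(c_half))  # 12 5
```

Counting Tyr10 as nonpolar would lower the N-half count to 11 without changing the conclusion.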
3.4. Backbone Dynamics of HOX-DFD.
In Fig. 3B we compare the calculated and experimental15 NMR relaxation data for HOX-DFD. In line with the experimental data, the most noticeable feature of the calculated relaxation data along the sequence is the significantly higher R2 and NOE of the helical C-half (starting at residue Arg375) relative to the disordered N-half. The mean R2 values are ~4 s−1 and 9 s−1 for the disordered and helical regions, respectively. In contrast, R1 is relatively uniform, except for a dip around residue Gly354. In the helical region, R2 shows dips in the two inter-helical linkers. In the disordered region, R2, R1, and NOE all show a local peak in residues Trp346-Met347, which is a part of the conserved YPWM motif. After the dip around Gly354, R2 rises steadily from Tyr360 to Arg368, and then rapidly until reaching the helical region.
The high R2 and NOE of the C-half can be easily explained by the three α-helices, which remain intact in the MD simulations (Fig. 3B, top row). On the other hand, the N-half has low secondary structure content and forms few contacts. The contact numbers do show higher values in the YPWM motif and in the 14 residues (starting at Tyr360) preceding the helical region, and thus track the sequence dependence of R2 in the N-half well. The contact map in Fig. 4B, with a frequency threshold of 0.018, provides more details. In the C-half, 119 side-chain heavy atoms, or 43%, form contacts with other side-chain heavy atoms with frequencies above the threshold. The corresponding numbers for the N-half are only 30 atoms, or 20%. The side chain-side chain contacts in the C-half show a well-packed 3-helix bundle, including salt bridges between the helices (Glu384-Arg395, Glu382-Arg417, and Arg396-Glu407) and cation-π interactions between the inter-helical linkers and the helices (Arg389-Trp421 and Tyr390-Arg418). In the N-half, most of the atoms with elevated contact frequencies belong to two blocks: the YPWM motif and Tyr360 to Arg368. The contacts in the YPWM motif are anchored by Tyr344-Trp346 π-π stacking (Fig. 4B inset). In addition, this motif can occasionally form long-range contacts with the second α-helix in the C-terminal region, including a hydrogen bond between Tyr344 and Arg393 and a cation-π interaction between Trp346 and Arg394 (Fig. 4B inset). These interactions explain the local R2 peak in residues Trp346-Met347. The interactions within the Tyr360 to Arg368 block (e.g., a Glu365-Arg368 salt bridge), together with those between Tyr373 and the third α-helix and between Thr374 and the first α-helix, explain the rise in R2 to the level of the C-terminal helical region. Interestingly, Maiti and De58 recently showed that mutations introducing three alternating oppositely charged residues between Met347 and Tyr360 rigidify the intervening region, presumably by forming salt bridges.
3.5. Backbone Dynamics of α-Synuclein.
The calculated and experimental14 R2, R1, and NOE values for α-synuclein are shown in Fig. 3C. In both sets of data, R1 and NOE are rather uniform along the sequence, but R2, with a mean value of ~ 3 s−1, shows local peaks at Tyr39, Lys96, and Asp121. α-Synuclein contains six imperfect repeats with a consensus sequence KTKEGV (Table 1). In five of these repeats, the fifth position is a glycine. R1 dips at the sixth position in each of these five repeats (vertical lines in Fig. 3C). Two adjacent Gly residues occupy positions 67 and 68. Each of the three relaxation properties dips prominently at Gly68.
The secondary structure propensities and contact numbers provide some explanations for the R2 local peaks. Both Leu38 and Val95 have relatively high propensities (~20%) for β-strands, while residues near Asp121 have PPII propensities close to 50%. Four proline residues, at positions 108, 117, 120, and 128, surround Asp121. Moreover, Tyr39 and the residues near Lys96 have the highest contact numbers. The contact map, with a frequency threshold of 0.005, provides more details (Fig. S13). Tyr39 is upstream of the fourth KTKE motif and can hydrogen bond with Ser42 (Fig. S13 inset); the latter can also form additional hydrogen bonds with Thr44 and Lys45 from the fourth KTKE motif. These interactions explain the elevated R2 of Tyr39. Lys96 starts a block of residues with elevated frequencies for contact formation, including a salt bridge between Lys97 and Asp98 and hydrogen bonding of Lys96 with Gln99 (Fig. S13 inset) and of Asn103 with Gln99 and Glu105. Furthermore, Lys96, along with Gln99, can also form long-range contacts with Asp121 (Fig. S13 inset). These interactions explain the elevated R2 of Lys96. In addition to the just-noted long-range contacts, the elevated R2 of Asp121 is also due to its location inside a block of residues, Pro108-Pro128, which contains four Pro residues and thus has a high propensity for forming PPII.
3.6. Backbone Dynamics of the Other Five IDPs.
HOX-SCR is a very close homologue of HOX-DFD, and their relaxation properties show very similar sequence dependences (Figs. 3B and S8A). Likewise, β- and α-synuclein have significant sequence similarity, and their relaxation properties are all fairly uniform along the sequence, except for minor R2 peaks and a few R1 dips around glycines (Figs. 3C and S9A). Compared to the HOX proteins and the synucleins, SEV-NT is intermediate both in the level of secondary structure formation and in the variation of relaxation properties over the sequence (Fig. S10A). Whereas the three α-helices of the HOX proteins are intact, the α-helix (Val476 to Gln492) in SEV-NT is only transient: the helical propensities peak at 90% over three residues but taper off on both sides. R2 exhibits a significant elevation in the middle of this transient helix, though the MD results do not rise as much as the experimental values11. NOE also shows a modest rise whereas R1 shows a depression over the helical region. NOE and R1 are otherwise very uniform, but R2 shows a local peak at Ala450 and dips at Gly415 and Gly456. That glycines promote fast dynamics is by now a familiar observation. The elevated R2 of Ala450 can be explained by its proximity to a block of residues, Gln436-Lys448, that tends to form extensive interactions, including Arg439-Asp437, Arg439-Glu444, and Arg448-Glu444 salt bridges, Gln436-Arg439 and Trp441-Glu444 hydrogen bonds, and an Arg448-Trp441 cation-π interaction. Arg439 can also form salt bridges with upstream Asp427, Glu429, and Glu431, as well as with Asp494, which is downstream of the transient helix. Glu429 can also form a salt bridge with Arg482, which is within the helical region. These long-range interactions help keep SEV-NT compact (Fig. S2D).
Tau K18 and A1-LCD happen to have similar sequence lengths (129 and 131 residues, respectively). While A1-LCD has propensities (~20%) for β-strands in some residues, both of these IDPs lack stable secondary structures (Figs. S11A and S12A, top). However, as already noted above, whereas the overall size of tau K18 is typical of IDPs, A1-LCD is much more compact (Figs. S2E,F). Correspondingly, slow dynamics is largely absent in tau K18 but prominent in A1-LCD. In tau K18, R2 is below 5 s−1 for all but a few residues (e.g., Lys311 and Val313) (Fig. S11A). In contrast, in A1-LCD, R2 is above 5 s−1 for essentially all non-terminal residues, and reaches 9 s−1 for Tyr61-Phe64 and Asn79-Gln80 (Fig. S12A). The contact maps explain the difference. In tau K18, contacts with frequencies above 0.01 are sparse and all local (Fig. S14A). The sequence of this IDP can be divided into four imperfect repeats ending with PGGG (Table 1 and Fig. S14A). All three relaxation properties show prominent dips at the last glycine of each repeat (Fig. S11A). That observation, along with the lack of any significant contact between the repeats, suggests that the four repeats behave dynamically as independent units. In the third repeat, the relatively higher R2 values of Lys311 and Val313 can be accounted for by local contacts, including a salt bridge between Lys311 and Asp314 and a hydrogen bond between Asp314 and Ser316 (Fig. S14A), and additionally by some rigidification provided by Pro312.
The sequence of A1-LCD contains three GGGG motifs, with the third Gly at positions 52, 74, and 103 (Table 1). R2 dips at these positions; the dips are particularly prominent for the last two GGGG motifs, where dips in NOE also occur (Fig. S12A). The contact map of A1-LCD shows three blocks of residues, Phe19-Arg49, Ser57-Phe71, and Asn76-Phe84, with elevated frequencies for extensive contacts (Fig. S14B). It is clear that the GGGG motifs demarcate these blocks. Within the second block, hydrogen bonds can be formed by Ser57 with Asp59, Asp59 with Asn62, Asn62 with Asn66, Asn66 with Asn70, and Asp67 with Ser69, and an amino-π interaction can be formed between Asn70 and Phe71. This block contains five Gly residues, and it appears that the flexibility of these Gly residues allows neighboring residues to form multiple modes of interactions, e.g., Asp59 hydrogen bonding with either Ser57 or Asn62. In addition, residues in this block can also form long-range interactions, including hydrogen bonding of Tyr54 with Ser16, Asn62 with Gln113 and Asn79, and Asn66 with Asn91, cation-π interactions of Tyr61 (and Tyr54) with Arg95 (Fig. S14B inset) and of Phe64 with Arg95, and an amino-π interaction of Tyr61 with Asn91. Again, intervening Gly residues, e.g., between Asn91 and Arg95, appear to facilitate the multiple modes of interactions. These extensive networks of interactions explain the elevated R2 of Tyr61-Phe64. In the third block, hydrogen bonds can form between Asn76 and Asn78 or Gln80 and between Ser81 and Asn83, and an amino-π interaction can form between Asn83 and Phe84, along with the already noted inter-block hydrogen bond between Asn62 and Asn79. These interactions explain the elevated R2 of Asn79-Gln80.
The calculated relaxation properties exhibit good reproducibility among the four replicate simulations for each IDP. For example, for Aβ40, the standard deviations calculated among the replicate simulations (Fig. S15A) and then averaged over all the residues are 0.35 s−1, 0.07 s−1, and 0.09 for R2, R1, and NOE, respectively. These values are smaller than the RMSEs of the MD predictions benchmarked against the experimental results: 0.38 s−1, 0.17 s−1, and 0.14. Likewise, for A1-LCD, the standard deviations (Fig. S15B) average 1.4 s−1, 0.08 s−1, and 0.1 for R2, R1, and NOE, respectively. The average standard deviation for R2 is on par with the RMSE (experimental data for R1 and NOE were not available).
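The reproducibility check above compares two statistics: the spread among replicates and the error against experiment. A minimal Python sketch of both follows, assuming each replicate's profile is a plain list of per-residue values and that the MD prediction benchmarked against experiment is the residue-by-residue replicate average. The function names are illustrative, and the use of the sample (n − 1) standard deviation is an assumption, since the text does not specify it.

```python
from math import sqrt
from statistics import mean, stdev


def replicate_spread(replicates):
    """Per-residue standard deviation across replicates, averaged over residues.

    replicates: list of profiles (one per replicate simulation), each a list of
    per-residue values such as R2 in s^-1; all profiles have the same length.
    """
    n_res = len(replicates[0])
    per_residue_sd = [stdev([rep[i] for rep in replicates]) for i in range(n_res)]
    return mean(per_residue_sd)


def replicate_average(replicates):
    """Residue-by-residue mean over replicates (assumed MD prediction)."""
    return [mean(values) for values in zip(*replicates)]


def rmse(predicted, measured):
    """Root-mean-square error between a predicted and a measured profile."""
    return sqrt(mean([(p - m) ** 2 for p, m in zip(predicted, measured)]))


# Made-up R2 profiles (s^-1) for four replicates of a three-residue segment,
# for illustration only; these are not data from the paper.
reps = [[4.0, 5.0, 3.0], [4.4, 5.2, 3.1], [3.8, 4.9, 2.8], [4.2, 5.3, 3.3]]
exp_r2 = [4.5, 5.5, 3.5]
print(replicate_spread(reps), rmse(replicate_average(reps), exp_r2))
```

When the first number is clearly smaller than the second, replicate-to-replicate noise cannot account for the deviation from experiment, which is the comparison the text draws for Aβ40.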
All the contact maps presented above were calculated using a 3.5 Å cutoff between side-chain heavy atoms. To see whether the cutoff distance affects the conclusions drawn, we also calculated contact maps at a 5 Å cutoff. The results are shown in Fig. S16A for HOX-DFD and S16B for tau K18. By raising the contact-frequency threshold to compensate for the relaxed cutoff distance and thereby select the same number of residues with elevated contact frequencies, we found 44 shared residues (out of 51) between the 3.5-Å and 5-Å maps for HOX-DFD, and 35 shared residues (out of 47) for tau K18. The contact patterns of the 3.5-Å and 5-Å maps are also very similar, except that the latter are denser around bulky residues and have slightly fewer long-range contacts due to the raised threshold.
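The contact-map procedure (a distance cutoff, then a frequency threshold) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: it assumes side-chain heavy-atom coordinates (in Å) have already been extracted frame by frame, and skipping only intra-residue pairs is a hypothetical choice, since the text does not state which pairs were excluded.

```python
import math
from itertools import combinations


def contact_frequencies(frames, cutoff=3.5, min_seq_sep=1):
    """Fraction of frames in which each atom pair lies within `cutoff` (Å).

    frames: list of frames; each frame is a list of (residue_number, (x, y, z))
    tuples for side-chain heavy atoms, listed in the same order in every frame.
    Pairs whose residue numbers differ by less than `min_seq_sep` are skipped
    (min_seq_sep=1 excludes only intra-residue pairs; a hypothetical choice).
    """
    n_atoms = len(frames[0])
    counts = {}
    for frame in frames:
        for i, j in combinations(range(n_atoms), 2):
            (res_i, xyz_i), (res_j, xyz_j) = frame[i], frame[j]
            if abs(res_i - res_j) < min_seq_sep:
                continue
            if math.dist(xyz_i, xyz_j) <= cutoff:
                counts[(i, j)] = counts.get((i, j), 0) + 1
    return {pair: n / len(frames) for pair, n in counts.items()}


def atoms_above_threshold(freqs, threshold):
    """Indices of atoms with at least one contact more frequent than `threshold`."""
    selected = set()
    for (i, j), f in freqs.items():
        if f > threshold:
            selected.update((i, j))
    return selected
```

Running `contact_frequencies` at `cutoff=3.5` and `cutoff=5.0`, tuning the second threshold until both selections have the same size, and intersecting the two sets with `&` mirrors the robustness check described above. The all-pairs loop scales quadratically with atom count; a trajectory library would be preferable for full-length proteins.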
3.7. Time Constants and Amplitudes of Three-Exponential Fits.
Our NMR relaxation properties were calculated from the three-exponential fits of backbone NH time correlation functions. The time constants and amplitudes of these fits provide additional insight into the backbone dynamics of the IDPs. We display the distributions of the three time constants for each IDP in Fig. S17, showing good separation. The ranges of τ1, τ2, and τ3 are 3–8 ns, 1–2 ns, and 0.1–0.2 ns, respectively. The means and standard deviations of each time constant and the corresponding amplitude, calculated among all the residues of each IDP, are collected in Table S2; the values for individual residues are shown in Figs. 5 and S8B,C to S12B,C. The mean τ1 values separate the 8 IDPs into two groups. The first group, with mean τ1 close to or even above 6 ns, consists of HOX-DFD, HOX-SCR, and SEV-NT, all of which have stable or well-populated α-helices, and A1-LCD, which forms numerous side chain-side chain contacts. The other four IDPs, with mean τ1 around 4 ns, are fully disordered and form far fewer side chain-side chain contacts than A1-LCD (β-synuclein has a mean τ1 of 5.2 ns, but this value is probably an overestimate). The mean τ2 values of the first group are close to 1.6 ns and somewhat higher than those of the second group, which are close to 1 ns. However, there is no distinction in mean τ3 between the two groups of IDPs. The first group has approximately double the mean A1 (at ~0.6) but half the mean A2 (at ~0.2) and mean A3 (at ~0.1) of the second group. Correspondingly, A1 is the dominant amplitude in the first group, but A1 and A2 are roughly equal in the second group. It is clear that the formation of secondary structures and side chain-side chain interactions promotes slow dynamics and suppresses fast dynamics.
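To make the link between the fitted parameters and the relaxation properties concrete, the sketch below converts a set of amplitudes and time constants into R1, R2, and NOE using the standard dipolar plus chemical-shift-anisotropy expressions for an amide 15N-1H pair, with the spectral density J(ω) = (2/5) Σk Ak τk / [1 + (ωτk)²]. It is an illustrative sketch, not necessarily the authors' exact procedure: the 600 MHz field, 1.02 Å NH bond length, and −160 ppm 15N CSA are assumed common choices, and the missing amplitude (1 − Asum) from ultrafast motions is simply neglected.

```python
import math

# Physical constants (SI units)
MU0_OVER_4PI = 1e-7           # mu_0 / (4 pi), H m^-1
HBAR = 1.054571817e-34        # J s
GAMMA_H = 2.6752218744e8      # 1H gyromagnetic ratio, rad s^-1 T^-1
GAMMA_N = -2.71261804e7       # 15N gyromagnetic ratio, rad s^-1 T^-1 (negative)

# Assumed parameters (common choices, not taken from the paper)
R_NH = 1.02e-10               # N-H bond length, m
CSA_N = -160e-6               # 15N chemical shift anisotropy (-160 ppm)


def spectral_density(omega, amps, taus):
    """J(omega) = (2/5) sum_k A_k tau_k / (1 + (omega tau_k)^2); taus in s."""
    return 0.4 * sum(a * t / (1.0 + (omega * t) ** 2) for a, t in zip(amps, taus))


def relaxation_rates(amps, taus, proton_mhz=600.0):
    """Return (R1 in s^-1, R2 in s^-1, NOE) for a backbone 15N-1H pair."""
    b0 = 2.0 * math.pi * proton_mhz * 1e6 / GAMMA_H    # magnetic field, T
    w_h = GAMMA_H * b0                                  # 1H Larmor frequency, rad/s
    w_n = abs(GAMMA_N) * b0                             # 15N Larmor frequency (magnitude)
    d = MU0_OVER_4PI * HBAR * GAMMA_H * abs(GAMMA_N) / R_NH ** 3   # dipolar constant
    c = w_n * CSA_N / math.sqrt(3.0)                    # CSA constant

    def j(w):
        return spectral_density(w, amps, taus)

    j0, jn, jh = j(0.0), j(w_n), j(w_h)
    j_diff, j_sum = j(w_h - w_n), j(w_h + w_n)          # zero- and double-quantum terms

    r1 = d**2 / 4 * (j_diff + 3 * jn + 6 * j_sum) + c**2 * jn
    r2 = (d**2 / 8 * (4 * j0 + j_diff + 3 * jn + 6 * jh + 6 * j_sum)
          + c**2 / 6 * (4 * j0 + 3 * jn))
    noe = 1.0 + d**2 / (4 * r1) * (GAMMA_H / GAMMA_N) * (6 * j_sum - j_diff)
    return r1, r2, noe
```

For example, `relaxation_rates([0.6, 0.2, 0.1], [6e-9, 1.6e-9, 0.15e-9])` evaluates a hypothetical residue with amplitudes and time constants near the first-group means. Because R2 is dominated by J(0) = (2/5) Σk Ak τk, shifting amplitude from A2 to A1 raises R2 much more than R1, consistent with the grouping described above.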
Figure 5. Time constants and amplitudes from fitting NH time correlation functions to a sum of three exponentials.

(A-B) Aβ40. (C-D) HOX-DFD. (E-F) α-synuclein. In panels (A), (C), and (E), the left ordinate displays the scale for τ1 and τ2 whereas the right ordinate displays the scale for τ3. In panels (B), (D), and (F), the sum of the amplitudes, Asum, was not constrained to 1 and is shown in black; the missing amplitude is due to ultrafast motions. Horizontal dashed lines show average values over the N-half and C-half (demarcated by black vertical lines) or over the entire sequence. For α-synuclein, vertical lines are placed in panel (E) at Tyr39, Lys96, and Asp121 to indicate elevated τ1, and in panel (F) at residues 15, 26, 37, 48, 68, and 85 to indicate dips in Asum.
Next, we examine the sequence dependences of the time constants and amplitudes of the IDPs. Figures 5C and S8B show that, for HOX-DFD and HOX-SCR, the helical C-half has moderately higher τ1, slightly higher τ2, and similar τ3 when compared with the disordered N-half. More interestingly, as shown in Figs. 5D and S8C, going from the N-half to the C-half, there is a significant increase in A1 along with significant decreases in A2 and A3. Correspondingly, A1 and A2 are approximately equal in the N-half but A1 dominates in the C-half. The contrasts in time constants and amplitudes between the C- and N-halves of the two HOX proteins mirror precisely the contrasts between the IDPs as two separate groups, one rich in structures or interactions and the other lacking such features. The same contrasts apply to SEV-NT when the helical region is compared to the non-helical region, except that the dominance of A1 extends to most of the non-helical region as well (Fig. S10B,C), due to the long-range contacts detailed above. Likewise, due to the extensive local and long-range contacts, A1 dominates nearly the entire sequence of A1-LCD (Fig. S12B,C). The opposite scenario is represented by α-synuclein and tau K18, which are disordered and lack significant long-range contacts (Figs. S13 and S14A); correspondingly, motions on the intermediate timescale have the largest amplitudes (Figs. 5F and S11C). β-Synuclein is similar to α-synuclein, except that A1 is overestimated and becomes comparable to A2 (Fig. S9C). Aβ40 represents an intermediate scenario, where the N-half with extensive contacts has higher A1 than A2 but the C-half has comparable A1 and A2 (Fig. 5B).
Lastly, let us take a closer look at the time constants and amplitudes of some residues noted above for elevated or depressed relaxation rates. The elevated R2 at Tyr39, Lys96, and Asp121 in α-synuclein (Fig. 3C) and at Lys311 and Val313 in tau K18 (Fig. S11A) can be attributed to elevated τ1 (Figs. 5E and S11B), whereas the elevated R2 at Ala450 of SEV-NT (Fig. S10A) and at Tyr61-Phe64 and Asn79-Gln80 in A1-LCD (Fig. S12A) is due to elevated A1 (Figs. S10C and S12C). Many Gly residues have been seen to promote fast dynamics, as evidenced by dips in R2 and/or R1, including six in α-synuclein (Fig. 3C), five in β-synuclein (Fig. S9A), two in SEV-NT (Fig. S10A), four in tau K18 (Fig. S11A), and three in A1-LCD (Fig. S12A). All these Gly residues (or their immediate neighbors) can be recognized by dips in Asum (Figs. 5F and S9C to S12C). In some cases the dips in Asum are specifically due to dips in A1 (Figs. S10C to S12C), and occasionally they are also accompanied by dips in τ1 (Fig. S10B).
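Locating Gly-associated dips in a per-residue profile such as Asum, R1, or R2 can be automated with a simple neighbor comparison. The sketch below is a hypothetical criterion, not one used in the paper: a residue counts as a dip when its value is lower than both immediate neighbors by at least a chosen depth.

```python
def find_dips(profile, depth=0.05, first_residue=1):
    """Residue numbers whose value lies below both neighbors by at least `depth`.

    profile: per-residue values (e.g., Asum, R1, or R2) in sequence order.
    first_residue: residue number of profile[0].
    depth: hypothetical minimum dip depth, in the units of the profile.
    """
    dips = []
    for i in range(1, len(profile) - 1):
        if (profile[i - 1] - profile[i] >= depth
                and profile[i + 1] - profile[i] >= depth):
            dips.append(first_residue + i)
    return dips
```

With made-up values `find_dips([0.9, 0.9, 0.7, 0.9, 0.9, 0.88, 0.9], first_residue=66)`, only residue 68 qualifies; the shallow 0.02 dip at residue 71 falls below the threshold. Cross-referencing the returned residue numbers with the sequence shows which dips coincide with glycines.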
4. DISCUSSION
Based on experimentally validated MD simulations, we have characterized the sequence-dependent backbone dynamics of a variety of IDPs. The dynamics spans a wide range of timescales, from sub-ps to 10 ns, and the distribution of amplitudes on these timescales can vary greatly from IDP to IDP. Slow dynamics can be promoted by two mechanisms. The first mimics what happens in structured proteins, through the formation and packing of secondary structures, as exemplified by the helical C-half of the two HOX proteins. The second is through extensive local and long-range contacts, including hydrogen bonds, salt bridges, and cation- and amino-π interactions. Local contacts can rigidify a block of residues, as illustrated by the N-half of Aβ40. When long-range contacts are also formed, as typified by A1-LCD, the entire IDP becomes relatively compact and tumbles as a globule without any secondary structures. SEV-NT represents a hybrid of the above two mechanisms, where an α-helix is well-populated and also engages in long-range contacts.
Glycine is flexible and generally promotes fast dynamics. This flexibility allows glycine to play opposite roles in IDP dynamics. On the one hand, Gly repeats are very effective in interrupting rigid blocks, as illustrated by the three GGGG motifs in A1-LCD and the GGG and GG motifs in SEV-NT. In tau K18, four GGG motifs effectively break the IDP into four dynamically independent units. On the other hand, as seen in A1-LCD, Gly residues facilitate multiple modes of local interactions within rigid blocks and multiple modes of long-range contacts between rigid blocks.
The present work has further validated the force-field combination, AMBER ff14SB26 for proteins and TIP4P-D27 for water, as reliable for MD simulations of IDPs. Both conformational (e.g., SAXS and chemical shifts) and dynamic properties (NMR relaxation) are predicted well without any input from experiments. Some of the 8 IDPs studied here have been simulated previously using other force-field combinations (e.g., CHARMM27/TIP3P), resulting in high levels of secondary structures for Aβ4031, 59 and overly compact ensembles for tau K18 (Rg ~ 16 Å)60 and α-synuclein (Rg ~ 14 Å)61. Our Aβ40 simulations started from snapshots generated in one of these studies31, but these initial secondary structures melted in the present simulations, so Aβ40 became fully disordered, in agreement with NMR data46 and with MD simulations using IDP-specific force fields19. The NMR study46 concluded that the Leu17-Phe20 stretch forms a β-strand. This conclusion appears to be supported by the negative values of the secondary chemical shift parameter ΔCα − ΔCβ in this stretch of residues (Fig. 2A, black curve). Our MD simulations largely reproduced these negative values (Fig. 2A, red curve), yet the β-strand content of these residues in the simulations is still minuscule (Fig. 3A, top). The lesson is two-fold. First, caution should be exercised when interpreting secondary chemical shifts with moderate amplitudes. Second, when validating MD simulations, it is important to make direct quantitative comparisons between measured properties (e.g., chemical shifts) and those calculated from the simulations, rather than relying on qualitative interpretations of experimental data. For tau K18 and α-synuclein, our average Rg values (36 Å and 32 Å; Fig. S2E,H) are more than twice those obtained in the previous simulations60–61 and are validated by SAXS data for both proteins (Fig. S1A,C).
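The secondary chemical shift parameter discussed above is a simple per-residue difference of deviations from random-coil values, Δ = δobserved − δrandom coil. Sustained positive ΔCα − ΔCβ values indicate helical tendency and sustained negative values indicate β-strand tendency. The sketch below assumes random-coil reference shifts are supplied from elsewhere (for example, a predictor such as POTENCI, ref. 41); the function name and the input values are illustrative placeholders.

```python
def secondary_shift_parameter(obs_ca, obs_cb, rc_ca, rc_cb):
    """Per-residue ΔCα − ΔCβ in ppm, where Δ = observed − random-coil shift.

    All four lists are aligned by residue. None marks a missing shift (e.g.,
    glycine has no Cβ), and the output is None for that residue.
    """
    out = []
    for ca, cb, rca, rcb in zip(obs_ca, obs_cb, rc_ca, rc_cb):
        if None in (ca, cb, rca, rcb):
            out.append(None)
        else:
            out.append((ca - rca) - (cb - rcb))
    return out
```

Applying the same function to shifts back-calculated from MD snapshots (e.g., with SHIFTX2, ref. 40) and to measured shifts gives the direct quantitative comparison advocated above. A stretch with only moderately negative values, as for Leu17-Phe20 of Aβ40, is exactly where the text urges caution.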
Still, even in our simulations, an IDP can become trapped in overly compact conformations, as occurred for β-synuclein (Fig. S2G), so continued refinement of force fields and development of sampling strategies are needed for IDPs. An earlier NMR study of tau K18, based on Cα and C′ secondary chemical shifts, deduced the highest β-strand contents, at 16% and 24%, respectively, for Lys274-Asp283 and Ser305-Asp31462. It should be noted that Cα and C′ chemical shifts are not as reliable as Cβ chemical shifts in distinguishing β-strands from random coils63. Cβ chemical shifts of tau K18 became available in a later NMR study10, and the resulting ΔCα − ΔCβ data are reproduced well by our MD simulations (Fig. S3C). In our simulations, the Lys274-Asp283 and Ser305-Asp314 stretches are local maxima in β-strand propensity, but the peaks reach only 7% (Fig. S11A, top).
We have fit NH time correlation functions to a sum of three exponentials. The three time constants fall in the ranges of 3–8 ns (τ1), 1–2 ns (τ2), and 0.1–0.2 ns (τ3) for all the IDPs. The greatest differences in dynamics between different regions of an IDP and among different IDPs are captured by the motional amplitudes on the different timescales. A1, the amplitude on the slow timescale, is dominant in the entire A1-LCD and SEV-NT, in the helical C-half of the two HOX proteins, and in the N-half of Aβ40, reflecting the effects of secondary structures and local and long-range interactions. In contrast, A2, the amplitude on the intermediate timescale, is either dominant or on par with A1 in the entire tau K18, α- and β-synuclein, in the disordered N-half of the two HOX proteins, and in the C-half of Aβ40, reflecting the lack of long-range or even local interactions. The A1-dominated IDPs or regions thereof also have a slightly higher Asum than their A2-dominated counterparts, indicating that the suppression of faster dynamics in the former group extends all the way to the ultrafast (i.e., ps and sub-ps) timescale. Lastly, the A1-dominated group has somewhat longer τ1 (and, to a lesser extent, τ2) than the A2-dominated group.
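The A1- versus A2-dominance language can be made operational with a simple per-residue rule. In the minimal sketch below, the 0.1 tolerance is a hypothetical choice rather than a criterion from the paper, and the missing amplitude is computed as 1 − Asum, the part of the decay attributed above to ultrafast motions.

```python
def classify_residue(a1, a2, a3, tolerance=0.1):
    """Classify a residue by its slow (A1) versus intermediate (A2) amplitude.

    Returns (label, missing_amplitude), where missing_amplitude = 1 - Asum.
    `tolerance` (hypothetical) is how much larger one amplitude must be to
    count as dominant; otherwise the two are labeled comparable.
    """
    if a1 - a2 > tolerance:
        label = "A1-dominated"
    elif a2 - a1 > tolerance:
        label = "A2-dominated"
    else:
        label = "comparable"
    return label, 1.0 - (a1 + a2 + a3)
```

With amplitudes near the first-group means quoted in Section 3.7 (A1 ≈ 0.6, A2 ≈ 0.2, A3 ≈ 0.1), `classify_residue(0.6, 0.2, 0.1)` returns the A1-dominated label with a missing amplitude of about 0.1. Averaging labels over a region would then separate, for example, the helical C-half from the disordered N-half of a HOX protein.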
It is instructive to compare the time constants and amplitudes of the IDPs with those of structured globular proteins. In the latter case, the NH time correlation functions could be fit to a sum of two exponentials, with one time constant (τG) for ns global tumbling and the other (τL) for fast local motion29. τG scales linearly with the number of residues (τG ≈ 0.051N) whereas τL fluctuates around 25 ps. The typical amplitudes of these two types of motions are 0.85 (AG) and 0.05 (AL). The τ1 motion of IDPs has some resemblance to the global tumbling of globular proteins, but differs in important ways. First, while the helical region of the two HOX proteins and the few most highly helical residues of SEV-NT have A1 amplitudes that approach the typical value of 0.85 for AG (Figs. 5D, S8C, and S10C), other IDP residues, especially those in the IDPs that lack long-range contacts, have much lower A1 amplitudes. Second, the mean τ1 values of the IDPs show no dependence on sequence length. The expected τG value for a 40-residue protein like Aβ40 is 2.0 ns and that for a 140-residue protein like α-synuclein is 7.1 ns. However, Aβ40 has a mean τ1 of 3.8 ns and α-synuclein has a mean τ1 of 4.4 ns. Relative to the expected τG, the higher mean τ1 of Aβ40 can be explained by its conformational ensemble being more open than the structure of a globular protein of the same sequence length, whereas the lower mean τ1 of α-synuclein can be rationalized if each of its residues belongs to a dynamically independent unit that is much shorter than the full-length protein. The fact that both Aβ40 and α-synuclein have a mean τ1 around 4 ns suggests that, for IDPs that lack well-populated secondary structures and long-range contacts, such dynamically independent units may be as short as 30 residues. As noted above, tau K18 may be divided into four such units demarcated by PGGG motifs, and the length of each unit is about 30 residues.
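The length-scaling argument above reduces to simple arithmetic with τG ≈ 0.051N. The short sketch below reproduces the expected τG values and compares them with the mean τ1 values quoted in the text; only the 0.051 ns-per-residue slope and the quoted numbers come from the text, and the function name is illustrative.

```python
def expected_tau_g_ns(n_residues, ns_per_residue=0.051):
    """Global tumbling time of a globular protein, tau_G ≈ 0.051 N (ns)."""
    return ns_per_residue * n_residues


# Sequence lengths and mean tau_1 values (ns) quoted in the text
for name, n_res, tau1 in [("Abeta40", 40, 3.8), ("alpha-synuclein", 140, 4.4)]:
    tau_g = expected_tau_g_ns(n_res)
    print(f"{name}: expected tau_G = {tau_g:.1f} ns, mean tau_1 = {tau1} ns, "
          f"ratio = {tau1 / tau_g:.2f}")
```

The ratio of mean τ1 to expected τG comes out near 1.9 for Aβ40 and near 0.6 for α-synuclein, which is the opposite-direction deviation that the two explanations above (a more open ensemble versus short dynamically independent units) are meant to account for.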
Apparently, then, two factors pull the τ1 values of IDPs in opposite directions: being more open than globular proteins could lead to higher τ1, but the accompanying loss of long-range contacts could lead to lower τ1.
The fast local motion seen in globular proteins, with its very small amplitude, does not show up in our three-exponential fits for IDPs, possibly being lumped into the missing amplitude. Instead, τ2 and τ3 emerge. An interesting question is how many residues are involved in motions on these timescales. If a 4-ns τ1 indeed involves 30 residues, then a 1-ns τ2 might involve perhaps 5–7 residues, and a 0.15-ns τ3 might involve at most the nearest neighbors.
Sequence-dependent backbone dynamics can potentially code for functions or contribute to disease mechanisms, by endowing IDPs with versatile responses to binding partners. In a previous study, based on sequence-dependent dynamics, we proposed that the rigid N-half of ChiZ-NT would be recalcitrant to interactions with binding partners whereas the flexible C-half would readily adapt to binding partners25. This prediction was verified by a subsequent study on the association of ChiZ with acidic membranes, showing that the C-half is dominant in forming membrane contacts28. A similar mechanism may be at work in the nucleation of Aβ40 fibrils, where the flexible C-half may readily undergo a conformational transition to form a β-sheet between monomers under concentrated conditions (Fig. 6A). Likewise, tau K18 may utilize its flexibility to achieve multiple modes of binding to microtubules. On the other hand, the preformed three-helix bundle allows the HOX transcription factors to recognize their DNA target, while the dynamic N-terminal tail, which occasionally forms intramolecular contacts with the helix bundle in the free state, binds to the co-transcription factor in the complex to provide further stabilization64 (Fig. 6B). Similarly, the nascent α-helix in SEV-NT recognizes the target protein and becomes further rigidified on the target surface65. Note that most of the local and long-range intramolecular contacts in free SEV-NT can still form in the bound state; a notable exception is the salt bridge between Glu429 and Arg482, as Arg482 now engages in intermolecular interactions with the target protein. It is also likely that many intramolecular contacts of A1-LCD become intermolecular under concentrated conditions, thereby enabling condensate formation17. Intermolecular interactions in turn dictate the dynamic properties of biomolecular condensates66. Knowledge of sequence-dependent dynamics and its origins provides deep insight into functional and disease mechanisms.
Figure 6. Mechanisms by which sequence-dependent backbone dynamics of IDPs codes for functions or contributes to diseases.

(A) The fast dynamics of the C-half of Aβ40 allows it to readily form a β-sheet between monomers and nucleate fibrillation. (B) The rigid three-helix bundle of a HOX transcription factor allows it to recognize the DNA target; the dynamic N-terminal region can form intramolecular contacts in the free state but engage with the co-transcription factor to further stabilize the complex.
Supplementary Material
ACKNOWLEDGMENT
We thank Alan Hicks and Ramash Prasad for technical assistance. This work was supported by National Institutes of Health Grant R35 GM118091.
ABBREVIATIONS
- A1-LCD
low-complexity domain of heterogeneous nuclear ribonucleoprotein A1
- ChiZ-NT
N-terminal region of ChiZ
- IDP
intrinsically disordered protein
- MD
molecular dynamics
- PPII
polyproline II
- PRE
paramagnetic relaxation enhancement
- SAXS
small-angle X-ray scattering
- SEV-NT
C-terminal domain of the nucleoprotein of Sendai virus
Footnotes
Supporting Information. Two tables listing simulation details and time constants, and 17 figures displaying SAXS profiles; radii of gyration; experimental validation of MD conformations; PREs; fits of NH time correlation functions and the resulting residuals; transverse relaxation rates; residue-specific backbone dynamics of five proteins; contact maps of four proteins; reproducibility of predicted NMR relaxation properties; and histograms of time constants (PDF)
This information is available free of charge via the Internet at http://pubs.acs.org
The authors declare no competing financial interests.
REFERENCES
- 1. Xue B; Dunker AK; Uversky VN Orderly order in protein intrinsic disorder distribution: disorder in 3500 proteomes from viruses and the three domains of life. J Biomol Struct Dyn 2012, 30, 137–49.
- 2. Uversky VN Introduction to Intrinsically Disordered Proteins (IDPs). Chemical Reviews 2014, 114, 6557–6560.
- 3. Busche MA; Hyman BT Synergy between amyloid-beta and tau in Alzheimer’s disease. Nat Neurosci 2020, 23, 1183–1193.
- 4. Stefanis L alpha-Synuclein in Parkinson’s disease. Cold Spring Harb Perspect Med 2012, 2, a009399.
- 5. Wu D; Zhou HX Designed Mutations Alter the Binding Pathways of an Intrinsically Disordered Protein. Sci Rep 2019, 9, 6172.
- 6. Zhou HX Intrinsic disorder: signaling via highly specific but short-lived association. Trends Biochem Sci 2012, 37, 43–8.
- 7. Berlow RB; Martinez-Yamout MA; Dyson HJ; Wright PE Role of backbone dynamics in modulating the interactions of disordered ligands with the TAZ1 domain of the CREB-binding protein. Biochemistry 2019, 58, 1354–1362.
- 8. Berlow RB; Dyson HJ; Wright PE Multivalency enables unidirectional switch-like competition between intrinsically disordered proteins. Proc Natl Acad Sci U S A 2022, 119.
- 9. Bertoncini CW; Rasia RM; Lamberto GR; Binolfi A; Zweckstetter M; Griesinger C; Fernandez CO Structural characterization of the intrinsically unfolded protein beta-synuclein, a natural negative regulator of alpha-synuclein aggregation. J Mol Biol 2007, 372, 708–22.
- 10. Barre P; Eliezer D Structural transitions in tau k18 on micelle binding suggest a hierarchy in the efficacy of individual microtubule-binding repeats in filament nucleation. Protein Sci 2013, 22, 1037–48.
- 11. Abyzov A; Salvi N; Schneider R; Maurin D; Ruigrok RW; Jensen MR; Blackledge M Identification of Dynamic Modes in an Intrinsically Disordered Protein Using Temperature-Dependent NMR Relaxation. J Am Chem Soc 2016, 138, 6240–51.
- 12. Gill ML; Byrd RA; Palmer AG III Dynamics of GCN4 facilitate DNA interaction: a model-free analysis of an intrinsically disordered region. Phys Chem Chem Phys 2016, 18, 5839–49.
- 13. Khan SN; Charlier C; Augustyniak R; Salvi N; Dejean V; Bodenhausen G; Lequin O; Pelupessy P; Ferrage F Distribution of Pico- and Nanosecond Motions in Disordered Proteins from Nuclear Spin Relaxation. Biophys J 2015, 109, 988–99.
- 14. Rezaei-Ghaleh N; Parigi G; Soranno A; Holla A; Becker S; Schuler B; Luchinat C; Zweckstetter M Local and global dynamics in intrinsically disordered synuclein. Angew Chem Int Ed Engl 2018, 57, 15262–15266.
- 15. Maiti S; Acharya B; Boorla VS; Manna B; Ghosh A; De S Dynamic Studies on Intrinsically Disordered Regions of Two Paralogous Transcription Factors Reveal Rigid Segments with Important Biological Functions. J Mol Biol 2019, 431, 1353–1369.
- 16. Rezaei-Ghaleh N; Parigi G; Zweckstetter M Reorientational Dynamics of Amyloid-beta from NMR Spin Relaxation and Molecular Simulation. J Phys Chem Lett 2019, 10, 3369–3375.
- 17. Martin EW; Holehouse AS; Peran I; Farag M; Incicco JJ; Bremer A; Grace CR; Soranno A; Pappu RV; Mittag T Valence and patterning of aromatic residues determine the phase behavior of prion-like domains. Science 2020, 367, 694–699.
- 18. Lipari G; Szabo A Model-free approach to the interpretation of nuclear magnetic resonance relaxation in macromolecules. 1. Theory and range of validity. Journal of the American Chemical Society 1982, 104, 4546–4559.
- 19. Meng F; Bellaiche MMJ; Kim JY; Zerze GH; Best RB; Chung HS Highly disordered amyloid-beta monomer probed by single-molecule FRET and MD simulation. Biophys J 2018, 114, 870–884.
- 20. Robustelli P; Piana S; Shaw DE Developing a molecular dynamics force field for both folded and disordered protein states. Proc Natl Acad Sci U S A 2018, 115, E4758–E4766.
- 21. Shrestha UR; Juneja P; Zhang Q; Gurumoorthy V; Borreguero JM; Urban V; Cheng X; Pingali SV; Smith JC; O’Neill HM; Petridis L Generation of the configurational ensemble of an intrinsically disordered protein from unbiased molecular dynamics simulation. Proc Natl Acad Sci U S A 2019, 116, 20446–20452.
- 22. Fawzi NL; Ying J; Ghirlando R; Torchia DA; Clore GM Atomic-resolution dynamics on the surface of amyloid-beta protofibrils probed by solution NMR. Nature 2011, 480, 268–72.
- 23. von Bergen M; Friedhoff P; Biernat J; Heberle J; Mandelkow EM; Mandelkow E Assembly of tau protein into Alzheimer paired helical filaments depends on a local sequence motif ((306)VQIVYK(311)) forming beta structure. Proc Natl Acad Sci U S A 2000, 97, 5129–34.
- 24. Kampf K; Izmailov SA; Rabdano SO; Groves AT; Podkorytov IS; Skrynnikov NR What drives (15)N spin relaxation in disordered proteins? combined NMR/MD study of the H4 histone tail. Biophys J 2018, 115, 2348–2367.
- 25. Hicks A; Escobar CA; Cross TA; Zhou HX Sequence-dependent correlated segments in the intrinsically disordered region of ChiZ. Biomolecules 2020, 10, 1–23.
- 26. Maier JA; Martinez C; Kasavajhala K; Wickstrom L; Hauser KE; Simmerling C ff14SB: improving the accuracy of protein side chain and backbone parameters from ff99SB. J Chem Theory Comput 2015, 11, 3696–713.
- 27. Piana S; Donchev AG; Robustelli P; Shaw DE Water dispersion interactions strongly influence simulated structural properties of disordered protein states. J Phys Chem B 2015, 119, 5113–23.
- 28. Hicks A; Escobar CA; Cross TA; Zhou HX Fuzzy association of an intrinsically disordered protein with acidic membranes. JACS Au 2021, 1, 66–78.
- 29. Hicks A; MacAinsh M; Zhou HX Removing thermostat distortions of protein dynamics in constant-temperature molecular dynamics simulations. J Chem Theory Comput 2021, 17, 5920–5932.
- 30. Case DA; Ben-Shalom IY; Brozell SR; Cerutti DS; Cheatham TE; Cruzeiro VWD; Darden TA; Duke RE; Ghoreishi D; Gilson MK AMBER 2018, University of California, San Francisco. 2018.
- 31. Guo C; Zhou HX Fatty acids compete with Abeta in binding to serum albumin by quenching Its conformational flexibility. Biophys J 2019, 116, 248–257.
- 32. Yang J; Yan R; Roy A; Xu D; Poisson J; Zhang Y The I-TASSER Suite: protein structure and function prediction. Nat Methods 2015, 12, 7–8.
- 33. Kallberg M; Wang H; Wang S; Peng J; Wang Z; Lu H; Xu J Template-based protein structure modeling using the RaptorX web server. Nat Protoc 2012, 7, 1511–22.
- 34. Feldman HJ; Hogue CW Probabilistic sampling of protein conformations: new hope for brute force? Proteins 2002, 46, 8–23.
- 35. Pastor RW; Brooks BR; Szabo A An analysis of the accuracy of Langevin and molecular dynamics algorithms. Molecular Physics 1988, 65, 1409–1419.
- 36. Ryckaert J-P; Ciccotti G; Berendsen HJC Numerical integration of the cartesian equations of motion of a system with constraints: molecular dynamics of n-alkanes. J Comput Phys 1977, 23, 327–341.
- 37. Essmann U; Perera L; Berkowitz ML; Darden T; Lee H; Pedersen LG A smooth particle mesh Ewald method. J Chem Phys 1995, 103, 8577–8593.
- 38. Salomon-Ferrer R; Götz AW; Poole D; Le Grand S; Walker RC Routine Microsecond Molecular Dynamics Simulations with AMBER on GPUs. 2. Explicit Solvent Particle Mesh Ewald. J Chem Theory Comput 2013, 9, 3878–3888.
- 39. Berendsen HJC; Postma JPM; van Gunsteren WF; DiNola A; Haak JR Molecular dynamics with coupling to an external bath. The Journal of Chemical Physics 1984, 81, 3684–3690.
- 40. Han B; Liu Y; Ginzinger SW; Wishart DS SHIFTX2: significantly improved protein chemical shift prediction. J Biomol NMR 2011, 50, 43–57.
- 41. Nielsen JT; Mulder FAA POTENCI: prediction of temperature, neighbor and pH-corrected chemical shifts for intrinsically disordered proteins. J Biomol NMR 2018, 70, 141–165.
- 42. Schneidman-Duhovny D; Hammel M; Tainer JA; Sali A FoXS, FoXSDock and MultiFoXS: Single-state and multi-state structural modeling of proteins and their complexes based on SAXS profiles. Nucleic Acids Res 2016, 44, W424–9.
- 43. Tesei G; Martins JM; Kunze MBA; Wang Y; Crehuet R; Lindorff-Larsen K DEER-PREdict: Software for efficient calculation of spin-labeling EPR and NMR data from conformational ensembles. PLoS Comput Biol 2021, 17, e1008551.
- 44. Roe DR; Cheatham TE PTRAJ and CPPTRAJ: Software for Processing and Analysis of Molecular Dynamics Trajectory Data. J Chem Theory Comput 2013, 9, 3084–3095.
- 45. McGibbon RT; Beauchamp KA; Harrigan MP; Klein C; Swails JM; Hernandez CX; Schwantes CR; Wang LP; Lane TJ; Pande VS MDTraj: A Modern Open Library for the Analysis of Molecular Dynamics Trajectories. Biophys J 2015, 109, 1528–32.
- 46. Hou L; Shao H; Zhang Y; Li H; Menon NK; Neuhaus EB; Brewer JM; Byeon IJ; Ray DG; Vitek MP; Iwashita T; Makula RA; Przybyla AB; Zagorski MG Solution NMR studies of the A beta(1–40) and A beta(1–42) peptides establish that the Met35 oxidation state affects the mechanism of amyloid formation. J Am Chem Soc 2004, 126, 1992–2005.
- 47.Sung YH; Eliezer D Residual structure, backbone dynamics, and interactions within the synuclein family. J Mol Biol 2007, 372, 689–707. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 48.Mylonas E; Hascher A; Bernado P; Blackledge M; Mandelkow E; Svergun DI Domain conformation of tau protein studied by solution small-angle X-ray scattering. Biochemistry 2008, 47, 10345–53. [DOI] [PubMed] [Google Scholar]
- 49.Ahmed MC; Skaanning LK; Jussupow A; Newcombe EA; Kragelund BB; Camilloni C; Langkilde AE; Lindorff-Larsen K Refinement of alpha-synuclein ensembles against SAXS data: comparison of force fields and methods. Front Mol Biosci 2021, 8, 654333. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 50.Bernado P; Blackledge M A self-consistent description of the conformational behavior of chemically denatured proteins from NMR and small angle scattering. Biophys J 2009, 97, 2839–45. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 51.Rao JN; Kim YE; Park LS; Ulmer TS Effect of pseudorepeat rearrangement on alpha-synuclein misfolding, vesicle binding, and micelle binding. J Mol Biol 2009, 390, 516–29. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 52.Schwalbe H; Fiebig KM; Buck M; Jones JA; Grimshaw SB; Spencer A; Glaser SJ; Smith LJ; Dobson CM Structural and dynamical properties of a denatured protein. Heteronuclear 3D NMR experiments and theoretical simulations of lysozyme in 8 M urea. Biochemistry 1997, 36, 8977–91. [DOI] [PubMed] [Google Scholar]
- 53.Bertoncini CW; Jung YS; Fernandez CO; Hoyer W; Griesinger C; Jovin TM; Zweckstetter M Release of long-range tertiary interactions potentiates aggregation of natively unstructured alpha-synuclein. Proc Natl Acad Sci U S A 2005, 102, 1430–5. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 54.Dedmon MM; Lindorff-Larsen K; Christodoulou J; Vendruscolo M; Dobson CM Mapping long-range interactions in alpha-synuclein using spin-label NMR and ensemble molecular dynamics simulations. J Am Chem Soc 2005, 127, 476–7. [DOI] [PubMed] [Google Scholar]
- 55.Salmon L; Nodet G; Ozenne V; Yin G; Jensen MR; Zweckstetter M; Blackledge M NMR characterization of long-range order in intrinsically disordered proteins. J Am Chem Soc 2010, 132, 8407–18. [DOI] [PubMed] [Google Scholar]
- 56.Peterson DW; Zhou H; Dahlquist FW; Lew J A soluble oligomer of tau associated with fiber formation analyzed by NMR. Biochemistry 2008, 47, 7393–404. [DOI] [PubMed] [Google Scholar]
- 57.Vemulapalli SPB; Becker S; Griesinger C; Rezaei-Ghaleh N Combined high-pressure and multiquantum NMR and molecular simulation propose a role for N-terminal salt bridges in amyloid-beta. J Phys Chem Lett 2021, 12, 9933–9939. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 58.Maiti S; De S Identification of potential short linear motifs (SLiMs) in intrinsically disordered sequences of proteins by fast time-scale backbone dynamics. J Magn Reson Open 2022, 10–11, 100029. [Google Scholar]
- 59.Rosenman DJ; Connors CR; Chen W; Wang C; García AE Aβ Monomers Transiently Sample Oligomer and Fibril-Like Configurations: Ensemble Characterization Using a Combined MD/NMR Approach. J Mol Biol 2013, 425, 3338–3359. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 60.Dong X; Bera S; Qiao Q; Tang Y; Lao Z; Luo Y; Gazit E; Wei G Liquid–Liquid Phase Separation of Tau Protein Is Encoded at the Monomeric Level. J Phys Chem Lett 2021, 12, 2576–2586. [DOI] [PubMed] [Google Scholar]
- 61.Brodie NI; Popov KI; Petrotchenko EV; Dokholyan NV; Borchers CH Conformational ensemble of native α-synuclein in solution as determined by short-distance crosslinking constraint-guided discrete molecular dynamics simulations. PLoS Comput Biol 2019, 15, e1006859. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 62.Mukrasch MD; Biernat J; von Bergen M; Griesinger C; Mandelkow E; Zweckstetter M Sites of Tau Important for Aggregation Populate β-Structure and Bind to Microtubules and Polyanions*. J Biol Chem 2005, 280, 24978–24986. [DOI] [PubMed] [Google Scholar]
- 63.Wang Y; Jardetzky O Probability-based protein secondary structure identification using combined NMR chemical-shift data. Protein Sci 2002, 11, 852–61. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 64.Joshi R; Passner JM; Rohs R; Jain R; Sosinsky A; Crickmore MA; Jacob V; Aggarwal AK; Honig B; Mann RS Functional specificity of a Hox protein mediated by the recognition of minor groove structure. Cell 2007, 131, 530–43. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 65.Houben K; Marion D; Tarbouriech N; Ruigrok RW; Blanchard L Interaction of the C-terminal domains of sendai virus N and P proteins: comparison of polymerase-nucleocapsid interactions within the paramyxovirus family. J Virol 2007, 81, 6807–16. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 66.Ghosh A; Kota D; Zhou HX Shear relaxation governs fusion dynamics of biomolecular condensates. Nat Commun 2021, 12, 5995. [DOI] [PMC free article] [PubMed] [Google Scholar]