Abstract
Diffusion measurements by pulsed-field gradient NMR and fluorescence correlation spectroscopy can be used to probe the hydrodynamic radius of proteins, which contains information about the overall dimension of a protein in solution. The comparison of this value with structural models of intrinsically disordered proteins is nonetheless impaired by the uncertainty of the accuracy of the methods for computing the hydrodynamic radius from atomic coordinates. To tackle this issue, we here build conformational ensembles of 11 intrinsically disordered proteins that we ensure are in agreement with measurements of compaction by small-angle x-ray scattering. We then use these ensembles to identify the forward model that more closely fits the radii derived from pulsed-field gradient NMR diffusion experiments. Of the models we examined, we find that the Kirkwood-Riseman equation provides the best description of the hydrodynamic radius probed by pulsed-field gradient NMR experiments. While some minor discrepancies remain, our results enable better use of measurements of the hydrodynamic radius in integrative modeling and for force field benchmarking and parameterization.
Significance
Accurate models of the conformational properties of intrinsically disordered proteins rely on our ability to interpret experimental data that report on the conformational ensembles of these proteins in solution. Methods to calculate experimental observables from conformational ensembles are central to link experiments and computation, for example, in integrative modeling or the assessment of molecular force fields. Benchmarking such methods is, however, difficult for disordered proteins because it is difficult to construct accurate ensembles without using the data. Here, we circumvent this problem by combining independent measures of protein compaction to test several methods to calculate the hydrodynamic radius of a disordered protein, as measured by pulsed-field gradient NMR diffusion experiments, and find the Kirkwood-Riseman model to be most accurate.
Introduction
Intrinsically disordered proteins and regions (here collectively termed IDPs) are highly flexible molecules in solution and they should therefore be described as ensembles of different conformations. The biological function of IDPs is often linked to their dynamics and therefore the knowledge of the conformational ensemble can be helpful in understanding their functions (1,2). Integrative modeling approaches are often used to study the conformational ensembles of IDPs (3,4,5,6,7,8). Here, experiments typically probe ensemble-averaged structural information, and are interpreted using computational methods to generate structures at atomic or coarse-grained resolutions.
A key property that describes the conformation of an IDP is its average dimension. For example, the expansion of an IDP determines its “capture radius” for binding (9) and is correlated with its propensity to phase separate (10,11). Until relatively recently, the force fields used in all-atom and certain coarse-grained molecular dynamics simulations led to conformational ensembles that were too compact (12,13,14,15,16,17,18). Experimentally, compaction may be probed by, for example, small-angle x-ray scattering (SAXS) (19), pulsed-field gradient (PFG) nuclear magnetic resonance (NMR) diffusion experiments (20), fluorescence correlation spectroscopy (21), and dynamic light scattering (22).
Comparison of experiments and simulations is often based on so-called forward models that enable the calculation of experimental observables (or close proxies) from atomic (or coarse-grained) coordinates. Forward models play a key role in integrative modeling and force field assessment. Developing accurate forward models for IDPs is, however, complicated by the lack of precise conformational ensembles that can be used to train and parametrize these models (23). Instead, forward models are generally developed and benchmarked for folded and relatively static proteins, whose structures may more easily and accurately be determined, but it is not always clear how well these models are transferable to highly dynamic, unfolded, and disordered proteins.
Here, we examine the accuracy of methods to calculate the hydrodynamic radius ($R_h$) of IDPs from conformational ensembles and the comparison to PFG NMR diffusion measurements. The $R_h$ is a measure of the overall dimension of a protein as it represents the radius of a sphere that diffuses with the same translational diffusion coefficient ($D_t$) as the protein, and may conveniently be probed via PFG NMR experiments. In these, $D_t$ is probed by monitoring the effects of a nonuniform magnetic field (defined by a gradient strength, $G$) in a spin echo NMR experiment. Depending on how far the protein has moved in the sample during a set diffusion time, $\Delta$, different levels of signal decay are observed (20,24,25,26). In practice, this is often detected by integrating a specific region of the NMR spectrum to measure the signal intensity, $I$, while varying the gradient strength, $G$. This profile can then be fitted to the Stejskal-Tanner equation to obtain $D_t$ (20):
$$I = I_0 \exp\left[-G^2 \gamma^2 \delta^2 \left(\Delta - \frac{\delta}{3}\right) D_t\right] \tag{1}$$
Here, $I_0$ is the signal intensity at zero gradient strength, $\gamma$ is the gyromagnetic ratio, and $\delta$ is the length of the gradient. The value of $R_h$ may then be obtained from $D_t$ either via the Stokes-Einstein equation, when the solvent viscosity is known:
$$R_h = \frac{k_B T}{6 \pi \eta D_t} \tag{2}$$
where $k_B$ is the Boltzmann constant, $T$ is the temperature, and $\eta$ is the solvent viscosity, or, as is most often done, indirectly using an internal reference with known $R_h$:
$$R_{h,\mathrm{protein}} = \frac{D_{t,\mathrm{ref}}}{D_{t,\mathrm{protein}}} R_{h,\mathrm{ref}} \tag{3}$$
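As an illustration of how a gradient series is converted to a hydrodynamic radius, the following minimal Python sketch estimates the diffusion coefficient from a log-linear least-squares fit of the Stejskal-Tanner decay and then applies either the Stokes-Einstein relation or an internal reference. The function names, the use of SI units, and the choice of a log-linear fit are assumptions made for this example; they are not the fitting procedure used in this work.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K


def fit_diffusion_coefficient(gradients, intensities, gamma, delta, big_delta):
    """Estimate D_t (m^2/s) from a log-linear fit of the Stejskal-Tanner decay.

    ln(I) = ln(I_0) - D_t * b, with b = (G * gamma * delta)^2 * (Delta - delta/3).
    Assumed units: G in T/m, gamma in rad s^-1 T^-1, delta and Delta in s.
    """
    b = [(g * gamma * delta) ** 2 * (big_delta - delta / 3.0) for g in gradients]
    y = [math.log(i) for i in intensities]
    n = len(b)
    b_mean = sum(b) / n
    y_mean = sum(y) / n
    num = sum((bi - b_mean) * (yi - y_mean) for bi, yi in zip(b, y))
    den = sum((bi - b_mean) ** 2 for bi in b)
    return -num / den  # the slope of ln(I) versus b is -D_t


def rh_stokes_einstein(d_t, temperature, viscosity):
    """R_h (m) from D_t (m^2/s), temperature (K), and viscosity (Pa s)."""
    return K_B * temperature / (6.0 * math.pi * viscosity * d_t)


def rh_from_reference(d_protein, d_reference, rh_reference):
    """R_h of the protein from an internal reference with known R_h."""
    return d_reference / d_protein * rh_reference
```

The reference-based route only requires the ratio of the two diffusion coefficients, so viscosity and temperature cancel as long as both species are measured in the same sample.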
Different forward models have been proposed to calculate $R_h$ from atomic coordinates (27,28,29). In particular, HYDROPRO (27) and HYDROPRO-derived models (30) are widely used to compare $R_h$ values obtained by PFG NMR diffusion experiments to conformational ensembles of IDPs from molecular simulations (15,31,32,33,34,35). Despite this, differences of ∼20% between the results provided by different forward models have been observed (30,36).
Given the widespread use of $R_h$ measurements for constructing conformational ensembles and the potential for assessing and improving force fields, we decided to assess the accuracy of different forward models for $R_h$. Having an accurate forward model, for example, makes it possible to provide a more fine-grained assessment of force field accuracy. In the context of integrative modeling, a conformational ensemble of an IDP can be pushed to be either more expanded or more compact to fit the experimental data depending on the forward model used. The first step in our work was thus to generate conformational ensembles of IDPs that are accurate in terms of reproducing their overall dimensions, but without using the PFG NMR diffusion measurements. We did so by using state-of-the-art computational methods for sampling the overall dimensions of IDPs, and further used SAXS data to benchmark and improve the agreement with independent experimental data that also provide information on the average dimension of proteins in solution (Fig. 1). While SAXS and NMR diffusion experiments may probe different aspects of compaction (34), we assume that these differences are small and would vary between different proteins. Thus, we used the SAXS-refined conformational ensembles as input to different forward models for $R_h$ and compared the results with experiments. As data for benchmarking the forward models, we chose 11 IDPs with varying lengths (24–441 residues) and amino acid compositions and recorded both SAXS and PFG NMR diffusion data unless data were already available in the literature. We find that, for this diverse set of proteins, the Kirkwood-Riseman equation (28) gives the best agreement between our ensembles and the measured hydrodynamic radii.
Materials and methods
Protein purifications and experimental conditions
The growth hormone receptor intracellular domain
For SAXS experiments, the growth hormone receptor intracellular domain (GHR-ICD) (residues 270–620) was expressed and purified as described in (37). For NMR experiments, GHR-ICD was expressed as a His6-SUMO fusion protein (His6-SUMO-GHR-ICD) in E. coli BL21(DE3) cells, transformed by heat shock. One liter of LB medium supplemented with 50 μg L−1 kanamycin was inoculated with a preculture and grown at 37°C. At an OD600 of 0.6–0.8, expression was induced by addition of 1 mM isopropyl β-D-1-thiogalactopyranoside and the culture was grown for 4 h. Cells were harvested by centrifugation at 5000 × g for 15 min at 4°C, and the pellet was resuspended in 25 mL lysis buffer (50 mM Tris-HCl, 150 mM NaCl, 10 mM imidazole, 10 mM β-mercaptoethanol [βME], 1 mM phenylmethylsulfonyl fluoride, 1 tablet ethylenediaminetetraacetic acid-free protease inhibitor [Roche Diagnostics, Copenhagen, Denmark] [pH 8]) and lysed using a French pressure cell disrupter (Constant Systems MC Cell Disrupter, Daventry, United Kingdom) at 20 kPsi. The lysate was cleared by centrifugation at 20,000 × g at 4°C for 20 min. The supernatant was incubated with 2 mL Ni-NTA resin (GE Healthcare, Brøndby, Denmark), equilibrated with buffer A (50 mM Tris-HCl, 150 mM NaCl, 10 mM imidazole [pH 8]) for 1 h at room temperature. The column was washed with 50 mL buffer B (50 mM Tris-HCl, 1 M NaCl, 10 mM imidazole, 10 mM βME [pH 8]), and His6-SUMO-GHR-ICD was eluted with 15 mL buffer C (50 mM Tris-HCl, 250 mM imidazole, 10 mM βME [pH 8]). The eluate was kept for further purification, while the flowthrough was reincubated with 2 mL Ni-NTA resin and His6-SUMO-GHR-ICD was eluted. The His6-SUMO tag was cleaved off by adding 200 μg of the ULP1 protease, and the sample was dialyzed overnight at 4°C against 3 L cleavage buffer (50 mM Tris-HCl, 150 mM NaCl, 10 mM βME [pH 8]). After cleavage, the His6-SUMO tag was separated from GHR-ICD by incubating the sample with 2 mL Ni-NTA resin for 1 h.
The flowthrough was collected and used for further purification by reversed-phase chromatography using a Resource RPC column (GE Healthcare), equilibrated in Milli-Q water with 0.08% trifluoroacetic acid (v/v), and eluted with a linear gradient from 0 to 100% of 70% acetonitrile (v/v), 0.1% trifluoroacetic acid (v/v). NMR experiments were recorded in 20 mM Na2HPO4/NaH2PO4, 150 mM NaCl, 10 mM βME (pH 7.3), 10% (v/v) D2O, 0.25 mM DSS, 0.05% (v/v) dioxane, 0.02% NaN3, and SAXS data in 20 mM Na2HPO4/NaH2PO4, 300 mM NaCl, 10× excess of DTT, 2% (v/v) glycerol as described, with protein concentration in the range from 1 to 6 mg mL−1 (38).
The human sodium-proton exchanger 6 intracellular distal domain
The sodium-proton exchanger 6 intracellular distal domain (NHE6cmdd) sequence (residues 554–669) was inserted into a modified pET-24b vector with an N-terminal His6-SUMO tag. BL21(DE3) E. coli cells were heat shock transformed with the finalized plasmid and incubated in LB medium for 45 min at 37°C, plated on agar containing 50 mg L−1 kanamycin, and incubated overnight at 37°C. Preheated LB medium (10 mL) with 50 mg L−1 kanamycin was inoculated with one colony and incubated overnight at 37°C and 200 rpm. The next day, the culture was added to 1 L of LB medium containing 50 mg L−1 kanamycin and incubated at 37°C and 200 rpm. For His6-SUMO-NHE6cmdd expression, isopropyl β-D-1-thiogalactopyranoside was added to a final concentration of 0.5 mM at an OD600 of 0.6–0.8. Cells were harvested after 4 h by centrifuging at 5000 × g for 20 min at 4°C. The cell pellet was resuspended in 20 mL of Tris-HCl buffer (50 mM Tris-HCl [pH 8.0], 150 mM NaCl, 10 mM imidazole, 1 mM DTT) and cells were lysed by 1 cycle of French Press at 25 kPsi (Constant Systems MC Cell Disrupter). The lysate was centrifuged at 4°C and 20,000 × g for 30 min and the supernatant was applied to a gravity flow column with 4 mL preequilibrated Ni-NTA Sepharose resin (GE Healthcare). The column was washed with 50 mL of high-salt Tris-HCl buffer (50 mM Tris-HCl [pH 8.0], 1 M NaCl, 10 mM imidazole, 1 mM DTT) and bound protein was eluted with 15 mL of high-salt Tris-HCl buffer (50 mM Tris-HCl [pH 8.0], 150 mM NaCl, 250 mM imidazole, 1 mM DTT). An aliquot of 100 μg His-ULP-1 was added, and the sample was transferred to a presoaked dialysis bag with a 3.5 kDa cutoff and dialyzed against 2 L of a low-salt Tris-HCl buffer (50 mM Tris-HCl [pH 8.0], 150 mM NaCl, 10 mM imidazole, 1 mM DTT) overnight at 4°C while stirred. The sample was applied to 4 mL Ni-NTA Sepharose resin and the flowthrough containing NHE6cmdd was collected.
The NHE6cmdd was concentrated to <2 mL by centrifuging with 3 kDa cutoff spin filters (Amicon Ultra), before being loaded onto a 3 mL RPC column (Cytiva prepacked 3 mL SOURCE 15RPC) mounted on an Äkta Purifier system, preequilibrated with 50 mM NH4HCO3 (pH 7.8). A 0–100% linear gradient (20 column volumes) of 50 mM NH4HCO3 (pH 7.8), 70% (v/v) acetonitrile was used to elute the bound NHE6cmdd. Identity and purity of NHE6cmdd were confirmed by SDS-PAGE analysis and mass spectrometry. NMR data were recorded in 20 mM Tris-HCl (pH 7.4), 150 mM NaCl, 5 mM DTT, 0.1% (v/v) dioxane, 25 μM DSS, 10% (v/v) D2O, 15°C, and SAXS data in 20 mM Tris-HCl (pH 7.4), 150 mM NaCl, 2% (v/v) glycerol, 5 mM DTT, 15°C. PFG NMR experiments were recorded with 150 μM (1.9 mg mL−1) NHE6cmdd, while the SAXS experiments were recorded with 0.7, 1.2, and 1.6 mg mL−1 NHE6cmdd.
Prothymosin-α
Prothymosin-α (ProTα) was produced and purified as described in (39), where PFG NMR diffusion experiments are also reported. SAXS intensities were recorded in 1× TBSK (10 mM Tris, 0.1 mM ethylenediaminetetraacetic acid, 155 mM KCl [pH 7.4]) and 2% glycerol at 15°C. Protein concentrations for SAXS samples were 0.27, 0.74, and 1.6 mg mL−1. Because the sequence of ProTα contains no aromatic residues, the absorbance had to be measured at 214 nm. This was not possible in the TBSK buffer, where salts absorb most of the light at 214 nm. The concentration of the most dilute sample was therefore calculated from the elution peak of a reversed-phase chromatogram, where the protein is in water and acetonitrile and there is no background absorbance. The eluted fractions were then lyophilized, resuspended in 1× TBSK, and concentrated. For the samples at 0.74 and 1.6 mg mL−1, we recovered the concentrations from the intensity of the forward scattering of their SAXS profiles, using the most dilute sample, which had a known concentration, as reference. The intensity of the forward scattering was obtained by Guinier fit using the ATSAS package (40).
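The concentration estimate from forward scattering can be illustrated with a minimal Python sketch. It performs a Guinier fit, $\ln I(q) = \ln I(0) - q^2 R_g^2/3$, by linear regression and then scales a reference concentration by the ratio of forward-scattering intensities. The function names are hypothetical, the data are assumed to be already restricted to the Guinier region and measured under identical conditions, and this is not the ATSAS implementation used in this work.

```python
import math


def guinier_fit(q, intensity):
    """Return (I0, Rg) from a linear fit of ln(I) versus q^2.

    Assumes the points supplied are already within the Guinier region.
    """
    x = [qi ** 2 for qi in q]
    y = [math.log(i) for i in intensity]
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    slope = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y)) / sum(
        (xi - x_mean) ** 2 for xi in x
    )
    intercept = y_mean - slope * x_mean
    return math.exp(intercept), math.sqrt(-3.0 * slope)


def concentration_from_i0(i0_sample, i0_reference, c_reference):
    """Scale a known reference concentration by the ratio of I(0) values.

    Assumes the same protein, buffer, and intensity scale for both samples.
    """
    return c_reference * i0_sample / i0_reference
```

The proportionality between $I(0)$ and concentration holds only in the absence of interparticle effects, which is why the most dilute sample serves as the reference.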
α-Synuclein
α-Synuclein (αSyn) was produced and purified as described in (41). NMR experiments were recorded in PBS buffer (20 mM Na2HPO4/NaH2PO4, 150 mM NaCl [pH 7.4]), 2% glycerol, 10% D2O, 0.25 mM DSS, 0.02% dioxane, 0.02% NaN3, at 20°C. SAXS data are from (42).
ANAC046(172–338)
The disordered region (residues 172–338) of the Arabidopsis NAC (no apical meristem, ATAF1/2, and cup-shaped cotyledon [CUC2]) transcription factor, ANAC046, was produced and purified as described in (41). NMR experiments were recorded in PBS buffer (20 mM Na2HPO4/NaH2PO4, 100 mM NaCl, 1 mM DTT, 0.02% dioxane, 0.02% NaN3 [pH 7.0]) at 25°C, and SAXS data in 20 mM Na2HPO4/NaH2PO4, 100 mM NaCl, 5 mM DTT (pH 7.0), same temperature. Protein concentrations for SAXS samples were 1, 3, and 5 mg mL−1.
Dss1
Deleted in split hand/split foot 1 protein (Dss1) from S. pombe (43) was produced and purified as in (41) in the presence of 5 mM βME. NMR experiments were recorded in Tris buffer (20 mM Tris, 150 mM NaCl, 5 mM DTT, 2% glycerol, 10% D2O, 0.25 mM DSS, 0.02% dioxane, 0.02% NaN3) at 15°C, and SAXS data were recorded in 20 mM Tris, 150 mM NaCl, 2% glycerol, 5 mM DTT (pH 7.4) at 15°C. Protein concentrations for SAXS samples were 1, 1.5, and 3 mg mL−1.
hnRNPA1-LCD
The low complexity domain from hnRNPA1 (hereafter called A1) was produced and purified as described previously (10,44). NMR experiments were recorded in 20 mM HEPES, 150 mM NaCl, 0.02% dioxane, 10% D2O at 25°C. Protein concentration was 70 μM.
Diffusion ordered NMR spectroscopy
Translational diffusion constants for each protein (50–150 μM) and the internal reference were determined by fitting peak intensity decays between 0.5 and 2.5 ppm (where protons belonging to methyl and methylene groups resonate) (26) from diffusion ordered spectroscopy experiments (45), using the Stejskal-Tanner equation (20). We used 1,4-dioxane (0.02–0.10% [v/v]) as internal reference, with an $R_h$ value of 2.12 Å (24). Spectra (16 scans for αSyn, 64 scans for NHE6cmdd and A1, and 32 scans for the other proteins) were recorded on a Bruker 600 MHz spectrometer equipped with a cryoprobe and Z-field gradient, and were obtained over gradient strengths from 2 to 98% ($\gamma$ = 26,752 rad s−1 Gauss−1) with a diffusion time of 200 ms (299.9 ms for GHR-ICD and 50 ms for A1) and gradient length of 3 ms, except for NHE6cmdd where this was 2 ms, and A1 where it was 6 ms. Diffusion constants were fitted in Dynamics Center v2.5.6 (Bruker, Fällanden, Switzerland) and GraphPad Prism v9.2.0. Diffusion constants were used to estimate the $R_h$ for each protein (46), with error propagation using the diffusion coefficients of both the protein and dioxane.
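A minimal Python sketch of the reference-based estimate with error propagation is given here for illustration. It assumes first-order propagation of uncorrelated errors on the two fitted diffusion coefficients and treats the dioxane radius as exact; the function name and these assumptions are ours and may differ in detail from the procedure in (46).

```python
import math


def rh_with_error(d_protein, sd_protein, d_dioxane, sd_dioxane, rh_dioxane=2.12):
    """Return (R_h, sigma_Rh) in the units of rh_dioxane (Å by default).

    R_h = R_h,dioxane * D_dioxane / D_protein. The relative errors of the two
    diffusion coefficients are combined in quadrature (assumed uncorrelated).
    """
    rh = rh_dioxane * d_dioxane / d_protein
    rel = math.sqrt((sd_protein / d_protein) ** 2 + (sd_dioxane / d_dioxane) ** 2)
    return rh, rh * rel
```

Because only the ratio of the two coefficients enters, they can be supplied in any common unit.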
SAXS
Samples for SAXS were prepared in the same buffers as for the NMR experiment, leaving out D2O and DSS, in some cases adding 2% (v/v) glycerol, and using a range of protein concentrations. The samples were either dialyzed extensively into the buffer or subjected to a final size-exclusion step in the buffer, and either the dialysate or the SEC buffer was collected for SAXS analyses. Buffer samples were run before and after the protein samples. SAXS data on Dss1, ProTα, and NHE6cmdd were collected at the DIAMOND beamline B21 (London, UK), using a monochromatic ($\lambda$ = 0.9524 Å) beam operating with a flux of photons/s. The detector was an EigerX 4M (Dectris, Baden, Switzerland). The detector to sample distance was set to 3.7 m. Samples were placed in a Ø = 1.5 mm capillary at 288 K during data acquisition. SAXS data on ANAC046 and GHR-ICD were collected at the EMBL bioSAXS-P12 beam line ($\lambda$ = 0.124 nm, 10 keV) at the PETRA III storage ring (Hamburg, Germany) (47). Scattering profiles were recorded on a Pilatus 2M detector (Dectris) (47) following standard procedures and at 298 K. The resulting scattering curves were analyzed as an average of consecutive frames recorded for each sample (detected degenerate frames were removed). The averaged scattering curves of the buffer were subtracted from the averaged scattering curve of the samples. Finally, we scaled the buffer-subtracted curves to absolute scale with DATABSOLUTE, part of the ATSAS package (48), using water and empty capillary measurements, performed at the same temperature as the experiments.
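The frame averaging and buffer subtraction steps amount to simple per-point arithmetic, sketched here in Python. The function names are hypothetical; the sketch assumes that all frames share the same scattering-angle grid and that degenerate frames have already been removed, and it omits error propagation and absolute scaling.

```python
def average_frames(frames):
    """Average a list of intensity frames point by point."""
    n = len(frames)
    return [sum(column) / n for column in zip(*frames)]


def subtract_buffer(sample_frames, buffer_frames):
    """Subtract the averaged buffer curve from the averaged sample curve."""
    sample = average_frames(sample_frames)
    buffer_curve = average_frames(buffer_frames)
    return [s - b for s, b in zip(sample, buffer_curve)]
```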
Conformational ensembles
We generated conformational ensembles with two distinct methods, specifically Flexible-Meccano (hereafter FM) (49) and Langevin simulations with the CALVADOS (coarse-graining approach to liquid-liquid phase separation via an automated data-driven optimization scheme) M1 parameters for a Cα-based coarse-grained model (50).
FM generates conformations for the backbone atoms of IDPs by sampling from backbone dihedral potentials derived from disordered regions of entries in the PDB. We varied the number of conformers produced with the length of the proteins, to reflect the higher complexity of the ensembles for longer chains (Table S1).
Langevin simulations with CALVADOS were run for 1 μs with a 10 fs time step using OpenMM v7.5.1 (51). Trajectories were subsampled by taking a frame every 50 ps, chosen as the shortest observed lag time giving a close-to-zero autocorrelation function of the $R_g$ (Fig. S1), which resulted in 20,000 frames per simulation. Temperatures of the experimental measurements were reproduced in the simulations, as was the ionic strength, by means of the Debye-Hückel potential used in CALVADOS to describe electrostatic interactions.
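Choosing a subsampling interval from an autocorrelation function can be sketched as follows. This Python example computes the normalized autocorrelation of a time series at a given lag and returns the first lag at which it falls below a threshold. The function names and the threshold of 0.05 are assumptions made for illustration and are not taken from the analysis scripts of this work.

```python
def autocorrelation(series, lag):
    """Normalized autocorrelation of a time series at a given lag (in frames)."""
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series) / n
    cov = sum(
        (series[i] - mean) * (series[i + lag] - mean) for i in range(n - lag)
    ) / (n - lag)
    return cov / var


def first_decorrelated_lag(series, threshold=0.05, max_lag=None):
    """Return the first lag with autocorrelation below threshold, or None."""
    if max_lag is None:
        max_lag = len(series) // 2
    for lag in range(1, max_lag + 1):
        if autocorrelation(series, lag) < threshold:
            return lag
    return None
```

Multiplying the returned lag by the time between saved frames gives the subsampling interval in time units.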
For both FM and CALVADOS simulations we generated all-atom representations for the ensembles before SAXS calculations using PULCHRA (52) with default settings; these structures were also used to calculate the hydrodynamic radius when using centers of mass to represent the positions of the amino acids.
SAXS calculations
We calculated SAXS intensities using Pepsi-SAXS (53) as described recently (54). The scale factor and constant background were fitted as global parameters for all the conformers in an ensemble (see below), while the contrast of the hydration layer and the effective atomic radius were fixed (3.34 e/nm³ and 1.025 $r_m$, respectively, where $r_m$ is the average atomic radius of the protein).
Ensemble reweighting
We used the Bayesian/maximum entropy (BME) software (3) to improve the agreement of the conformational ensembles with the SAXS experiments by minimizing the functional (4,6):
$$\mathcal{L}(w_1 \ldots w_n) = \frac{m}{2} \chi^2_r(w_1 \ldots w_n) - \theta S_{\mathrm{rel}}(w_1 \ldots w_n) \tag{4}$$
Here, $m$ is the number of experimental data points, $w_1 \ldots w_n$ are the weights associated with each conformer of an ensemble, $\chi^2_r$ measures the agreement between the calculated and experimental data, $S_{\mathrm{rel}}$ measures how much the optimized weights (i.e., the posterior distribution) diverge from the initial weights (i.e., the prior distribution), and $\theta$ is a parameter that sets the balance between minimizing $\chi^2_r$ and maximizing $S_{\mathrm{rel}}$. The value $\phi_{\mathrm{eff}} = \exp(S_{\mathrm{rel}})$ indicates the fraction of the frames that effectively contributes to the averages calculated with the optimized weights. A low $\phi_{\mathrm{eff}}$ means a considerable deviation from the initial ensemble and it can indicate overfitting and artifacts in the reweighted ensemble (3). By scanning different values for $\theta$ and plotting $\phi_{\mathrm{eff}}$ versus $\chi^2_r$, it is possible to choose the optimal value for $\theta$ as the one located at the “elbow” of the curve, where the $\chi^2_r$ reaches a plateau with the least amount of deviation from the initial weights.
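To make the meaning of the effective fraction of frames concrete, the following Python sketch computes the relative entropy of a set of weights with respect to a prior and its exponential. The function name is hypothetical and the sketch is independent of the BME software; it assumes a uniform prior when none is given and skips conformers with zero weight, which contribute nothing to the sum.

```python
import math


def phi_eff(weights, prior=None):
    """Effective fraction of frames, exp(S_rel), for reweighted conformers.

    S_rel = -sum_j w_j * ln(w_j / w0_j), with both weight sets normalized.
    """
    n = len(weights)
    if prior is None:
        prior = [1.0 / n] * n
    w_sum = sum(weights)
    p_sum = sum(prior)
    s_rel = 0.0
    for w, p in zip(weights, prior):
        w /= w_sum
        p /= p_sum
        if w > 0.0:
            s_rel -= w * math.log(w / p)
    return math.exp(s_rel)
```

With unchanged weights the result is 1, and if all weight is placed evenly on half of the conformers it is 0.5, consistent with the interpretation as a fraction of the ensemble.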
For SAXS data, the iterative extension of BME (iBME) (54) enabled us to fit a scale factor ($s$) and constant background ($cst$) of the calculated SAXS profile by iterating least-squares fitting of experimental and calculated SAXS profiles and BME reweighting until convergence of the $\chi^2_r$. In this approach, the $\chi^2_r$ in the functional is:
$$\chi^2_r = \frac{1}{m} \sum_{q}^{m} \frac{\left(s \sum_{j} w_j I_j^{\mathrm{calc}}(q) + cst - I^{\mathrm{exp}}(q)\right)^2}{\sigma^2_{\mathrm{BIFT}}(q)} \tag{5}$$
where $I_j^{\mathrm{calc}}(q)$ is the calculated SAXS intensity at scattering angle $q$ for the conformer $j$, $I^{\mathrm{exp}}(q)$ is the experimental SAXS intensity at scattering angle $q$, and $\sigma_{\mathrm{BIFT}}(q)$ is the error of the experimental intensity at scattering angle $q$ normalized as described by Larsen and Pedersen (55). The Bayesian indirect Fourier transformation (BIFT) was used to compute the pair distance distribution function $p(r)$ from a model SAXS profile by minimizing the $\chi^2_{r,\mathrm{BIFT}}$ calculated against the experimental SAXS profile and maximizing a prior on the smoothness of the $p(r)$. Then the experimental errors were corrected according to $\sigma_{\mathrm{BIFT}} = \sigma \sqrt{\chi^2_{r,\mathrm{BIFT}}}$. This procedure enabled a more direct comparison of $\chi^2_r$ values from different systems.
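One iteration of the least-squares step described above can be sketched in Python as follows. The example averages the conformer profiles with the current weights, fits the scale factor and constant background by weighted linear regression, and evaluates the reduced $\chi^2$. The function names are hypothetical, and the sketch is not the iBME implementation; it only illustrates the arithmetic under the assumption that all profiles share the same grid of scattering angles.

```python
def ensemble_average(profiles, weights):
    """Weighted average of per-conformer profiles (weights are normalized)."""
    total = sum(weights)
    return [
        sum(w * p[k] for w, p in zip(weights, profiles)) / total
        for k in range(len(profiles[0]))
    ]


def fit_scale_and_background(i_calc, i_exp, sigma):
    """Weighted least-squares fit of i_exp ~ s * i_calc + cst."""
    w = [1.0 / sg ** 2 for sg in sigma]
    sw = sum(w)
    sx = sum(wi * x for wi, x in zip(w, i_calc))
    sy = sum(wi * y for wi, y in zip(w, i_exp))
    sxx = sum(wi * x * x for wi, x in zip(w, i_calc))
    sxy = sum(wi * x * y for wi, x, y in zip(w, i_calc, i_exp))
    s = (sw * sxy - sx * sy) / (sw * sxx - sx ** 2)
    cst = (sy - s * sx) / sw
    return s, cst


def reduced_chi2(i_calc, i_exp, sigma, s, cst):
    """Reduced chi-square between scaled calculated and experimental data."""
    m = len(i_exp)
    return sum(
        ((s * c + cst - e) / sg) ** 2 for c, e, sg in zip(i_calc, i_exp, sigma)
    ) / m
```

In an iterative scheme, these functions would be called after each reweighting step until the reduced $\chi^2$ stops changing.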
Hydrodynamic radius calculation
We employed four distinct approaches to compute the $R_h$ from a specific protein conformation:
1. The equation described by Nygaard et al. (30), who derived a sequence-length-dependent relationship (with $N$ the number of residues) between the radius of gyration ($R_g$) of the Cα atoms and the $R_h$:
$$R_h = \frac{R_g}{\dfrac{\alpha_1 \left(R_g - \alpha_2 N^{0.33}\right)}{N^{0.60} - N^{0.33}} + \alpha_3} \tag{6}$$
where the fitting parameters are $\alpha_1$ = 0.216 Å−1, $\alpha_2$ = 4.06 Å, and $\alpha_3$ = 0.821. This expression for $R_h$ was obtained by fitting the $R_h$ calculated with HYDROPRO (27) as a means to have a more computationally efficient forward model, and it interpolates between the behavior for compact and expanded states.
2. The HullRad algorithm (29), which uses the convex hull method to predict hydrodynamic properties of proteins. The $R_h$ computed with this method is hereafter referred to as the HullRad $R_h$.
3. The Kirkwood-Riseman equation (28,56,57): $R_h = \left(\frac{1}{N^2} \sum_{i \neq j} r_{ij}^{-1}\right)^{-1}$, where $r_{ij}$ is the distance between the Cα atoms $i$ and $j$. An alternative approach is to calculate $R_h$ employing the center of mass of each residue instead of the Cα atom. The two strategies can lead to minor differences in the resulting $R_h$ (see results and supporting material).
4. The linear fit proposed by Nygaard et al. (30) to approximate the $R_h$ of HYDROPRO from $R_g$ (both in Å).
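Two of the four forward models are simple enough to sketch directly. The Python example below implements the sequence-length-dependent Nygaard relationship and the Kirkwood-Riseman sum for a single conformation given as a list of (x, y, z) bead positions in Å. The function names are hypothetical, the Kirkwood-Riseman prefactor follows the common $1/N^2$ convention over ordered pairs, and the code is an illustration rather than the scripts used in this work. HullRad and the linear fit are not included.

```python
import math


def radius_of_gyration(coords):
    """R_g of equally weighted beads, in the units of the coordinates."""
    n = len(coords)
    center = [sum(c[k] for c in coords) / n for k in range(3)]
    return math.sqrt(
        sum(sum((c[k] - center[k]) ** 2 for k in range(3)) for c in coords) / n
    )


def rh_nygaard(rg, n_res, a1=0.216, a2=4.06, a3=0.821):
    """R_h (Å) from R_g (Å) and chain length, using the parameters in the text."""
    ratio = a1 * (rg - a2 * n_res ** 0.33) / (n_res ** 0.60 - n_res ** 0.33) + a3
    return rg / ratio


def rh_kirkwood_riseman(coords):
    """R_h from 1/R_h = (1/N^2) * sum over ordered pairs i != j of 1/r_ij."""
    n = len(coords)
    inv_sum = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            inv_sum += 1.0 / math.dist(coords[i], coords[j])
    # Each unordered pair appears twice in the sum over i != j.
    return n ** 2 / (2.0 * inv_sum)
```

Passing residue centers of mass instead of Cα positions to `rh_kirkwood_riseman` gives the alternative variant mentioned in the text.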
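Once $R_h$ was calculated for all conformers of an ensemble of size $n$, the average was calculated as $\langle R_h \rangle = \left(\sum_{j=1}^{n} w_j R_{h,j}^{-1}\right)^{-1}$, where $w_j$ are the normalized conformer weights ($w_j = 1/n$ before reweighting) (31,34). The inverse transformation of the $R_h$ of each conformer of the ensemble before averaging was done to reflect that the intensities measured by PFG NMR are proportional to $e^{-1/R_h}$ (Eqs. 1 and 3). The exponential transformation can be omitted because it does not change the calculated average (31). For simplicity, we hereafter refer to $\langle R_h \rangle$ as $R_h$.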
We calculated the $\chi^2$ to compare $R_h$ calculated with the models above with experiments across the 11 proteins. The reported errors of the experimentally determined values of $R_h$ varied considerably across the different experiments. To avoid putting too much weight on a few experiments with the smallest estimated errors, we instead used the average relative error of $R_h$ (∼2%) in the calculation of $\chi^2$.
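The ensemble averaging and the comparison with experiment can be sketched in Python as follows. The example computes the weighted harmonic mean of per-conformer radii and a $\chi^2$ in which every experimental value is assigned the same relative error. The function names are hypothetical, and two details are assumptions of this sketch: the error of each protein is taken as 2% of its experimental value, and the sum is not divided by the number of proteins.

```python
def ensemble_rh(rh_values, weights=None):
    """Weighted harmonic mean of per-conformer R_h values."""
    if weights is None:
        weights = [1.0] * len(rh_values)
    total = sum(weights)
    inverse_mean = sum(w / r for w, r in zip(weights, rh_values)) / total
    return 1.0 / inverse_mean


def chi2_relative_error(calculated, experimental, relative_error=0.02):
    """Chi-square using a uniform relative error on the experimental values."""
    return sum(
        ((c - e) / (relative_error * e)) ** 2
        for c, e in zip(calculated, experimental)
    )
```

Because the harmonic mean gives more weight to small radii, it is always less than or equal to the arithmetic mean of the same values.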
Scripts and data used in this study are available at https://github.com/KULL-Centre/papers/tree/main/2022/rh-fwd-model-pesce-et-al.
Results and discussion
Proteins and experimental measurements
We collected a data set consisting of 11 IDPs of different lengths (spanning from 24 to 441 residues) and sequence features (net charge per residue, number of prolines, overall charge, etc.; see Table S1) with both SAXS and PFG NMR diffusion measurements. We measured PFG NMR diffusion and SAXS data for those proteins (see materials and methods) where data were not already available in the literature (Table 1). We stress that, although different approaches exist to extract the $R_g$ from a SAXS profile (63,64,65) (Table S2), we do not build our ensembles using the SAXS-derived $R_g$ values; rather we use the SAXS intensities themselves. We note, however, that the average ratio of the $R_g$ extracted from SAXS data and the $R_h$ from PFG NMR for the 11 proteins is 1.2 (Table S2), in line with expectations from disordered Gaussian chains (66).
Table 1.
Name | Length (residues) | $R_h$ (nm) | SAXS data |
---|---|---|---|
Hst5 | 24 | 1.28 ± 0.02 (58) | (58) |
RS | 24 | 1.19 ± 0.01 (59) | (15) |
Dss1 | 71 | 1.70 ± 0.06 | this study |
Sic1 | 90 | 2.15 ± 0.1 (60) | (36) |
ProTα | 111 | 2.89 ± 0.08 (39) | this study |
NHE6cmdd | 116 | 2.67 ± 0.02 | this study |
A1 | 137 | 2.29 ± 0.06 | (44) |
αSyn | 140 | 2.79 ± 0.03 | (42) |
ANAC046 | 167 | 3.04 ± 0.01 | this study |
GHR-ICD | 351 | 5.08 ± 0.02 | (38) |
Tau | 441 | 5.40 ± 0.2 (61) | (62) |
To minimize discrepancies related to the dimensions of IDPs being influenced by experimental conditions (for example, temperature and ionic strength of the buffer), we aimed at having SAXS and PFG NMR diffusion measured in the same buffer and conditions. There are a few exceptions, where we note some differences in the conditions at which PFG NMR and SAXS measurements were performed (Table S3). Buffers used for SAXS often contain glycerol to limit radiation damage, which might in principle cause some discrepancies as glycerol was not present in some of the PFG NMR experiments. Previous work, however, suggests that small amounts (2%) of glycerol do not affect the compaction of IDPs (67,68). Similarly, PFG NMR measurements of $R_h$ use dioxane as an internal reference, and potential interactions between dioxane and the IDPs could also cause discrepancies between NMR and SAXS experiments. Previous work shows consistency across different internal reference compounds (69) and below we find good consistency between SAXS and PFG NMR data; together these results suggest that protein-dioxane interactions do not affect the experimental diffusion measurements. To examine this further, we recorded 1H-15N HSQC spectra of 15N-labeled ANAC046 alone and in the presence of different concentrations of dioxane. The resulting data show no changes in position or intensity of the peaks in the spectra (Fig. S2). Together, these observations support the notion that minor discrepancies between experimental conditions in SAXS and NMR experiments do not cause systematic differences.
We measured SAXS data for Dss1, ProTα, NHE6cmdd, and ANAC046 at different protein concentrations (Fig. S3). We then inspected the resulting intensities as a function of the scattering angle to check for signs of aggregation or interparticle repulsion in the small-angle region (19). In the absence of these effects, we selected the SAXS profiles showing the lowest amount of experimental noise. Therefore, in further analyses we used SAXS data collected at 3 mg/mL for Dss1, 1.6 mg/mL for ProTα, 1.6 mg/mL for NHE6cmdd, and 5 mg/mL for ANAC046 (Fig. 2).
Agreement of the ensembles with SAXS data
As described above, our approach involved first generating conformational ensembles that were in agreement with SAXS data and then assessing four different forward models by calculating $R_h$ from these ensembles. We based this procedure on the assumption that conformational ensembles that are generated by accurate physical models and that are in agreement with SAXS data will also be in good agreement with measurements of $R_h$. While there can be differences in the conformational properties probed by SAXS and PFG NMR (31,34), we expect that such differences are generally small and will be “averaged out” when examining a diverse set of proteins.
We generated ensembles for the 11 proteins using both FM (49) and Langevin simulations with the CALVADOS coarse-grained model (50), both of which are known to generate ensembles in good agreement with SAXS data (8,50,70,71,72). Forward models for SAXS data are relatively consistent with each other and many are based on the same physical principle and spherical harmonics approximation (53,73). Moreover, issues related to fitting free parameters describing hydration layer and excluded volume in implicit-solvent-based SAXS forward models have recently been addressed also in the context of IDPs (54,74). We therefore calculated SAXS data from the conformational ensembles and compared these with experiments (Fig. 3, partially transparent bars). For both FM and CALVADOS we found good agreement in many cases. The largest outlier was the highly charged protein ProTα, where the FM ensemble did not provide as good a fit to the SAXS data (Figs. 3 and S4). Presumably the difference in agreement for ProTα arises because CALVADOS explicitly takes the effect of the charges into account.
We improved the agreement with the SAXS data further by using BME reweighting of the ensembles against the SAXS data (Fig. 3, solid bars). For all but the ProTα FM ensemble this led to excellent agreement with experiments with only minor levels of reweighting (Table S4). For the FM ensemble of ProTα we were also able to obtain a reasonably good fit, although at the cost of stronger reweighting and lower $\phi_{\mathrm{eff}}$ (Table S4).
We analyzed the effect of the different priors (FM versus CALVADOS) and reweighting by examining the distribution of the $R_g$ (Fig. 4). In most cases, we found very similar distributions both before and after reweighting and with the two different methods to generate the conformational ensembles. For the few ensembles with intermediate values of $\chi^2_r$ (in the range 2–5) before reweighting, we also observe minor adjustments in the distributions due to reweighting. In general, the fit to the small-angle region of the SAXS profile was already good in most cases for the CALVADOS prior, indicating that this prior is highly efficient in reproducing the average chain dimensions (Fig. S5). We note that this is likely explained, at least in part, by the fact that CALVADOS was parameterized to reproduce $R_g$ for IDPs. The deviations responsible for the higher $\chi^2_r$ of ANAC046 and Tau were decreased by reweighting the CALVADOS ensembles (Fig. S5).
To summarize, with the only exception of ProTα, the two priors provided similar levels of agreement with SAXS data (Fig. 3) and similar distributions of $R_g$ (Fig. 4). Given that the CALVADOS prior provided a good estimate of the average chain dimensions even without reweighting and considering the specific case of ProTα, we below focus our further analyses on the ensembles generated by CALVADOS.
Comparison of forward models for the
We tested four previously described forward models to compute from atomic coordinates. These models are based on different principles, different ways of treating the hydration of proteins, and were developed for different types of molecules. We applied these models to the SAXS-reweighted CALVADOS ensembles, and compared the resulting ensemble-averaged values for to the from PFG NMR diffusion experiments (Fig. 5). We find that all four models led to a high correlation between the calculated and experimental values of . To quantify the accuracy of the models, we calculated the across the 11 proteins, and find the Kirkwood-Riseman equation provided the best agreement with experiments (Fig. 5), with a of 188. Indeed, for all but the two longest proteins and Dss1, the Kirkwood-Riseman equation resulted in very good agreement with experiments.
We generally use distances between the Cα atoms (or beads) when applying the Kirkwood-Riseman equation to calculate Rh. We also explored the effect of instead using the center of mass to represent the position of each amino acid. Overall, we find very similar results from the two approaches for all but the shortest IDPs (Fig. S6). The two approaches also give very similar agreement with experiments (χ² of 188 for Cα atoms and 232 for centers of mass); additional work is needed to examine which method is more accurate for shorter proteins.
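As an illustration of the calculation discussed above, the following is a minimal sketch of a Kirkwood-Riseman-type estimate of Rh from bead positions (for example Cα atoms). It assumes the commonly used form 1/Rh = (1 − 1/N)⟨1/r_ij⟩ averaged over pairs i ≠ j; the function names, and the choice to average per-conformation Rh values with optional weights, are assumptions made for this sketch and are not taken from the analysis code used in this work.

```python
import math
from itertools import combinations

def kirkwood_riseman_rh(coords):
    """Rh of one conformation from a list of (x, y, z) bead positions.

    Assumed form: 1/Rh = (1 - 1/N) * <1/r_ij>, averaged over pairs i != j.
    The units of Rh follow the units of the coordinates.
    """
    n = len(coords)
    if n < 2:
        raise ValueError("need at least two beads")
    # Each unordered pair once; this mean equals the mean over ordered pairs i != j
    inv_dists = [1.0 / math.dist(a, b) for a, b in combinations(coords, 2)]
    mean_inv = sum(inv_dists) / len(inv_dists)
    return 1.0 / ((1.0 - 1.0 / n) * mean_inv)

def ensemble_rh(frames, weights=None):
    """Weighted mean of per-conformation Rh (the averaging scheme is an assumption)."""
    if weights is None:
        weights = [1.0] * len(frames)
    total = sum(weights)
    return sum(w * kirkwood_riseman_rh(f) for w, f in zip(weights, frames)) / total
```

Passing weights from a reweighting procedure to `ensemble_rh` gives the reweighted ensemble average; replacing the Cα positions by residue centers of mass only changes the input coordinates.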
The other three forward models (the two equations described by Nygaard et al. and HullRad) gave χ² values that are five to eight times greater than that for the Kirkwood-Riseman equation. For all three models, this higher χ² is due to an apparently overestimated Rh (Figs. 5 and S7), and we also see that the three models give results that are very similar to one another. The exception is for the shortest proteins, Histatin 5 (Hst5) and the RS repeat peptide (RS). This may be because the length of Hst5 and RS is at the limit of the chain length range that the Nygaard equations were parameterized for. Despite the overall better agreement when using the Kirkwood-Riseman equation, the agreement with the Rh from PFG NMR experiments is not uniform across the data set and shows a sequence-length-dependent trend (Figs. 5 and S7). For the two longest proteins, GHR-ICD (351 residues) and Tau (441 residues) (Figs. 5 and S7), the Kirkwood-Riseman equation apparently underestimates Rh, whereas the other three models give values closer to experiments.
The general picture for the eight shortest proteins (24–167 residues) is thus that the Kirkwood-Riseman equation provides an accurate model for Rh, and that the other three models provide relatively similar values that are generally greater than the experimental values. For most proteins, there is thus good agreement between the experiments and the Rh value calculated from the SAXS-refined ensembles using the Kirkwood-Riseman equation, without the need for any further refinement of the ensembles. In contrast, for the other three methods, the experimental values lie in the tail of the distributions (Fig. S7). Thus, while it would be possible to construct ensembles that simultaneously agree with both the SAXS and PFG NMR data (31,34), this would require a greater level of reweighting.
We suspect that the Nygaard equations, which are derived from HYDROPRO, and HullRad may be less accurate for disordered proteins because the models themselves were derived to predict the Rh of globular, folded proteins. The Kirkwood-Riseman equation was instead developed in the context of theoretical studies on the hydrodynamic properties of disordered polymer chains. The behavior of these chains is more similar to that of IDPs than to that of folded proteins, as they exist in an extreme disordered state governed only by self-avoidance of the component particles. This observation is supported by calculating the ratio Rg/Rh from the experiments. This ratio is expected to be around 0.78 for globular proteins and between 1.2 and 1.5 for disordered chains (30,34,66,75), and indeed we find the average to be 1.2 with some variation across proteins (Table S3).
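As a brief worked example, the value of about 0.78 quoted for globular proteins can be rationalized with an idealized model. The sketch below assumes a uniform solid sphere of radius R whose hydrodynamic radius equals R; this idealization is an assumption made here for illustration and is not a statement about how the cited studies obtained the value.

```latex
% Radius of gyration of a uniform solid sphere of radius R
R_g^2 = \frac{\int_0^R r^2 \, 4\pi r^2 \, \mathrm{d}r}{\int_0^R 4\pi r^2 \, \mathrm{d}r}
      = \frac{4\pi R^5 / 5}{4\pi R^3 / 3}
      = \frac{3}{5} R^2
% With R_h = R for the sphere, the ratio becomes
\frac{R_g}{R_h} = \sqrt{\frac{3}{5}} \approx 0.775
```

Expanded chains place more of their mass far from the center than a compact sphere does, which raises Rg relative to Rh and is consistent with the larger ratios expected for disordered chains.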
In the analyses above, we compared the ensembles with the estimated values of Rh; however, this value is derived from fitting the intensity profiles in the PFG NMR experiments. To examine whether a more direct comparison would give a different picture, we used the predicted values of Rh to derive the diffusion profiles using Eqs. 1 and 3. We found that comparing the calculated and experimental diffusion profiles (for the PFG NMR measurements reported in this study) gave a similar picture as when comparing the Rh (Fig. S8).
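The step from a predicted Rh to a diffusion profile can be sketched as follows. Eqs. 1 and 3 are not reproduced in this section, so the sketch assumes the standard Stejskal-Tanner decay and a Stokes-Einstein relation referenced to dioxane measured in the same sample; the function names, the parameter names, and the omission of any pulse-sequence-specific correction terms are assumptions of this example.

```python
import math

GAMMA_H = 2.675e8  # 1H gyromagnetic ratio in rad s^-1 T^-1 (rounded)

def protein_diffusion_coefficient(rh_protein, d_dioxane, rh_dioxane=2.12):
    """Diffusion coefficient of the protein from its Rh, relative to dioxane.

    Stokes-Einstein gives D proportional to 1/Rh at fixed temperature and
    viscosity, so D_protein = D_dioxane * Rh_dioxane / Rh_protein.
    Both radii must be in the same unit (here Angstrom).
    """
    return d_dioxane * rh_dioxane / rh_protein

def stejskal_tanner(gradients, d_coeff, delta, big_delta, i0=1.0, gamma=GAMMA_H):
    """Signal decay I(g) = I0 * exp(-D * (gamma * g * delta)^2 * (Delta - delta / 3)).

    gradients in T/m, delta and big_delta in s, d_coeff in m^2/s.
    """
    factor = big_delta - delta / 3.0
    return [i0 * math.exp(-d_coeff * (gamma * g * delta) ** 2 * factor)
            for g in gradients]
```

A calculated profile obtained this way can be compared point by point with the measured intensities, instead of comparing a single fitted Rh.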
We also tested the forward models on the ensembles generated by FM. Since the SAXS-refined FM and CALVADOS ensembles are similar in terms of the level of compaction (except for ProT), the results were similar. In particular, we saw an overall better agreement with experiments using the Kirkwood-Riseman equation (Fig. S9) and a sequence-length-dependent discrepancy in this agreement. We also note that Dss1 appears to be an outlier, since both ensembles seem to be more expanded than what the PFG NMR diffusion experiment detects, with all the forward models for Rh that we used.
To find possible reasons for this apparent sequence length dependency, we looked at different conformational properties of the ensembles produced with CALVADOS and their relation to sequence length. We computed the sequence-length-normalized asphericity (the degree to which a molecule deviates from a fully spherical shape) and the relative shape anisotropy to highlight potential differences in the shapes adopted by short and long chains. However, we did not find any properties for which the two longest proteins stood out (Fig. S10).
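The shape descriptors mentioned above can be obtained from the eigenvalues of the gyration tensor. The sketch below uses one common set of definitions (asphericity b = λ3 − (λ1 + λ2)/2 and relative shape anisotropy κ² = 1 − 3(λ1λ2 + λ2λ3 + λ3λ1)/(λ1 + λ2 + λ3)²) with equally weighted beads; these definitions, the function names, and the absence of any sequence-length normalization are assumptions of this example and may differ from what was used to produce Fig. S10.

```python
import math

def gyration_tensor(coords):
    """3x3 gyration tensor of equally weighted beads given as (x, y, z) tuples."""
    n = len(coords)
    center = [sum(c[k] for c in coords) / n for k in range(3)]
    return [[sum((c[a] - center[a]) * (c[b] - center[b]) for c in coords) / n
             for b in range(3)] for a in range(3)]

def symmetric_eigenvalues(s):
    """Eigenvalues (ascending) of a symmetric 3x3 matrix, trigonometric closed form."""
    p1 = s[0][1] ** 2 + s[0][2] ** 2 + s[1][2] ** 2
    q = (s[0][0] + s[1][1] + s[2][2]) / 3.0
    p2 = sum((s[i][i] - q) ** 2 for i in range(3)) + 2.0 * p1
    if p2 == 0.0:
        return [q, q, q]  # matrix is a multiple of the identity
    p = math.sqrt(p2 / 6.0)
    b = [[(s[i][j] - (q if i == j else 0.0)) / p for j in range(3)] for i in range(3)]
    det = (b[0][0] * (b[1][1] * b[2][2] - b[1][2] * b[2][1])
           - b[0][1] * (b[1][0] * b[2][2] - b[1][2] * b[2][0])
           + b[0][2] * (b[1][0] * b[2][1] - b[1][1] * b[2][0]))
    # Clamp to [-1, 1] to guard against rounding before taking the arccosine
    phi = math.acos(max(-1.0, min(1.0, det / 2.0))) / 3.0
    e_max = q + 2.0 * p * math.cos(phi)
    e_min = q + 2.0 * p * math.cos(phi + 2.0 * math.pi / 3.0)
    return [e_min, 3.0 * q - e_max - e_min, e_max]

def shape_descriptors(coords):
    """Return (Rg, asphericity b, relative shape anisotropy kappa^2)."""
    l1, l2, l3 = symmetric_eigenvalues(gyration_tensor(coords))
    rg2 = l1 + l2 + l3
    asphericity = l3 - 0.5 * (l1 + l2)
    kappa2 = 1.0 - 3.0 * (l1 * l2 + l2 * l3 + l3 * l1) / rg2 ** 2
    return math.sqrt(rg2), asphericity, kappa2
```

With these definitions, κ² is 0 for a spherically symmetric arrangement of beads and 1 for beads on a straight line, so averaging the descriptors over an ensemble gives a simple way to compare the shapes of short and long chains.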
Expanding the data set
The analyses above were made possible by collecting a set of proteins for which both SAXS and PFG NMR data had been measured on the same protein and under comparable conditions. While the set of 11 proteins covers a wide range of lengths and sequence properties (Table S1), there are two areas that are not covered well. First, there are no proteins of length between 167 and 351 residues (Table S1). Second, most of the proteins are relatively expanded IDPs, with 9 of the 11 proteins having SAXS-derived scaling exponents (Table S2). To complement the analysis described above, we therefore collected data from the literature for an additional 11 proteins for which Rh had been measured using PFG NMR, although not in all cases using internal referencing by dioxane (Table S5). We used CALVADOS to generate ensembles for these 11 proteins and calculated Rh using the four different models (Fig. S11 a). As expected from the fact that the ensembles were not refined using SAXS data and that the data are more heterogeneous, the agreement is noisier. We find that, within this set of proteins, the four different approaches perform comparably well, with χ² values in the range of 606 to 669 (compared with the span of 188–1406 for the proteins with both SAXS and PFG NMR data).
Comparing the distributions of Rh calculated with the different methods for these 11 proteins with experiments shows that the largest discrepancies between experiments and calculations using the Kirkwood-Riseman model are for A2, FUS, and SBD (Fig. S12). Both A2 and FUS are known to form relatively compact ensembles, and so we examined whether there was a relationship between the compaction of the protein, evaluated by the scaling exponent calculated from the conformational ensembles, and the accuracy of the values calculated using the Kirkwood-Riseman model (Fig. S11 b). Overall we find a correlation between compaction and the error in the calculated values of Rh. We note, however, that we obtain accurate results for the two compact disordered proteins Ddx4 and A1, and that the experimental measurements of A2 and FUS did not use internal referencing with dioxane. Finally, we analyzed simulations of 200-residue-long homopolymeric peptides to examine whether the calculations of Rh using the Kirkwood-Riseman model capture the expected relationship between Rg/Rh and compaction (30,34,66,75). Indeed, we find a high correlation between the calculated scaling exponent, ν, and the Rg/Rh ratio, so that the most compact peptides have the lowest ratios and the most expanded peptides have ratios approaching 1.4 (Fig. S13).
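One way to estimate a scaling exponent from a conformational ensemble is to fit the dependence of the root-mean-square inter-residue distance on sequence separation. The sketch below assumes the power law sqrt(⟨r_ij²⟩) = r0 |i − j|^ν and a simple least-squares fit in log-log space; the function name, the `min_sep` cutoff for short separations, and the unweighted fit are assumptions of this example rather than the procedure used for Figs. S11 b and S13.

```python
import math

def scaling_exponent(frames, min_sep=5):
    """Fit nu and r0 in sqrt(<r_ij^2>) = r0 * |i - j|**nu by log-log least squares.

    frames: list of conformations, each a list of (x, y, z) bead positions
    of equal length. Separations below min_sep are excluded from the fit.
    """
    n = len(frames[0])
    xs, ys = [], []
    for sep in range(min_sep, n):
        # Mean squared distance over all frames and all pairs at this separation
        sq = [math.dist(f[i], f[i + sep]) ** 2
              for f in frames for i in range(n - sep)]
        xs.append(math.log(sep))
        ys.append(0.5 * math.log(sum(sq) / len(sq)))
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, math.exp(my - slope * mx)
```

For a fully extended rod the fit returns ν = 1, while more compact ensembles give smaller exponents; pairing the fitted ν with the Rg/Rh ratio of the same ensemble gives the kind of comparison described above.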
Conclusions
Reliable forward models to compare conformational ensembles and biophysical measurements of IDPs are important both in integrative modeling and for benchmarking and optimizing molecular mechanics force fields. Here, we have explored the accuracy of forward models to calculate the Rh from structural ensembles of IDPs. To do so, we first constructed conformational ensembles for 11 IDPs, ranging in length from 24 to 441 residues and diverse in sequence composition. We then determined and optimized the agreement of these ensembles with SAXS data so that they reproduce the average chain dimensions in solution. Finally, we used four different models to calculate Rh from the refined ensembles and assessed their accuracy by comparison to PFG NMR measurements. Of the four models that we tested, the Kirkwood-Riseman equation gives the best overall agreement with experiments. Nevertheless, we also found that the accuracy of this model appears to drop in a sequence-length-dependent fashion, which is evident for GHR-ICD and Tau. It is not clear whether the sequence-length-dependent discrepancy is due to inaccuracies in the forward model or in the ensembles, or whether it is a property of long IDPs, and further studies are needed to clarify this.
In addition to collecting additional data for long IDPs, one approach to gain some insight might be to refine the ensembles of GHR-ICD and Tau (and other IDPs of similar length) simultaneously against the Rh and SAXS data. Indeed, previous studies have demonstrated that it is possible to combine SAXS and PFG NMR measurements to refine the distribution of conformations in a disordered ensemble (31,34,36). Examining whether such a refinement is possible and which conformations are retained might give insights into whether the problems are with the ensembles or the forward model. Nevertheless, a more detailed analysis would require measurements on several more proteins.
Another issue to consider relates to how the experimental Rh values have been obtained. In particular, the values depend on the Rh value for dioxane (2.12 Å from Wilkins et al. (24)) that is used as a reference in the PFG NMR diffusion experiments. This value, however, comes with some uncertainty. Specifically, it is based on an assumed value of Rg/Rh for a globular protein and the use of a SAXS-derived value of Rg for a single protein (24). If the reference Rh used for dioxane is not exact, the forward models may implicitly absorb the resulting scale factor. Given that our calculations predict Rh relatively accurately, our results suggest that the estimate of the Rh for dioxane is probably quite accurate. To examine to what extent our conclusions depend on the reference value for dioxane, we analyzed how the χ² of the Rh obtained with the different models from the CALVADOS simulations would change if different values for the Rh of dioxane had been used to obtain the Rh of the proteins from PFG NMR (Fig. S14). We find that 2.12 Å lies very close to the minimum of χ² for the Kirkwood-Riseman equation, and that an Rh for dioxane greater than ca. 2.25 Å would be needed to change the conclusion that the Kirkwood-Riseman model provides the best fit to the data (Fig. S14). Additional measurements on folded proteins as well as measurements using other references, such as cyclodextrin (26), might help understand these issues better. Finally, it has recently been observed that the Rh of dioxane and other reference compounds may be pressure dependent (76). This suggests that the value could also be temperature dependent, and this should be studied in more detail to help better interpret PFG NMR diffusion experiments at different temperatures (58).
We also analyzed an additional set of 11 proteins with PFG NMR measurements. These proteins do not have measured SAXS data and so we instead rely on the overall accuracy of the CALVADOS model to capture the expansion of these proteins. We selected these proteins to include more proteins with intermediate lengths and to represent proteins with a wider range of properties. While the results confirmed that all models perform relatively well, the results are less clear than for the proteins with consistent SAXS and PFG NMR data. The results hint at a dependency of the accuracy of the Kirkwood-Riseman model on the level of compaction, in line with the expectation that this model works best for disordered, expanded polymers. A more detailed analysis, however, would ideally be based on a set of proteins that have been referenced in a consistent way and for which both SAXS and PFG NMR data have been recorded at near identical conditions.
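The dependence on the dioxane reference can be made concrete with a small sketch. With internal referencing, the protein Rh is obtained as the dioxane Rh multiplied by the ratio of the dioxane and protein diffusion coefficients, so changing the reference radius rescales all experimental Rh values by the same factor. The code below assumes this linear rescaling, assumes that the experimental errors scale in the same way, and uses a plain error-normalized sum of squared deviations as χ²; the function names and these choices are assumptions of this example, not the definitions used for Fig. S14.

```python
def rescale_rh(rh_exp, new_ref, old_ref=2.12):
    """Experimental Rh values re-derived for a different dioxane reference radius."""
    return [rh * new_ref / old_ref for rh in rh_exp]

def chi2(calc, exp, err):
    """Sum of squared, error-normalized deviations (assumed definition)."""
    return sum(((c - e) / s) ** 2 for c, e, s in zip(calc, exp, err))

def scan_reference(calc, rh_exp, rh_err, refs, old_ref=2.12):
    """Return (reference, chi2) pairs for each candidate dioxane Rh in refs."""
    results = []
    for ref in refs:
        exp = rescale_rh(rh_exp, ref, old_ref)
        err = rescale_rh(rh_err, ref, old_ref)  # assumption: errors scale alike
        results.append((ref, chi2(calc, exp, err)))
    return results
```

Running `scan_reference` for each forward model over a range of candidate reference radii shows where each model reaches its minimum χ² and at which reference value the ranking of the models would change.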
In summary, we present an analysis of 11 proteins for which we have collected SAXS, PFG NMR, and simulation data to generate conformational ensembles. We have used these data to compare different methods to calculate the hydrodynamic radius from conformational ensembles of disordered proteins. Overall we find good agreement from all models, and that the Kirkwood-Riseman model gives the best overall agreement.
Author contributions
F.P., B.B.K., and K.L.-L. designed the study. E.A.N., P.S., E.E.T., F.P., and C.R.G. purified proteins for NMR and SAXS measurements and recorded and analyzed the NMR data. J.G.O. and F.P. analyzed the SAXS data. F.P. produced and analyzed the ensembles. F.P. and K.L.-L. analyzed the data and wrote the paper with input from all authors.
Acknowledgments
We thank our colleagues who measured the data that have made this work possible. We acknowledge the use of computational resources from the core facility for biocomputing at the Department of Biology and support for NMR infrastructure from Villumfonden. We thank the beamline scientists Cy Jeffries at EMBL DESY P12 and Nathan Cowieson at DIAMOND B21 for technical support and data acquisition, and Jacob H. Martinsen for assistance with protein purification and Eric M. Morrow and Stine F. Pedersen for input on NHE6cmdd. We thank Amanda D. Due for preparation of the ANAC046 samples. We thank Anne Bremer and Tanja Mittag for support and assistance in purification and studies of the A1 LCD, and acknowledge access to the St. Jude Biomolecular NMR Spectroscopy Center. We thank Giulio Tesei for fruitful discussion on how to calculate the Kirkwood-Riseman equation. We thank David de Sancho for comments on our work. This research was supported by the Lundbeck Foundation BRAINSTRUC initiative (R155-2015-2666 to B.B.K. and K.L.-L.) and the Novo Nordisk Foundation Challenge grant REPIN (no. NNF18OC0033926 to B.B.K.). E.A.N. has received funding from the European Union’s Horizon 2020 research and innovation program under the Marie Sklodowska-Curie grant agreement no. 101023654.
Declaration of interests
The authors declare no competing interests.
Editor: Samrat Mukhopadhyay.
Footnotes
Supporting material can be found online at https://doi.org/10.1016/j.bpj.2022.12.013.
Contributor Information
Birthe B. Kragelund, Email: bbk@bio.ku.dk.
Kresten Lindorff-Larsen, Email: lindorff@bio.ku.dk.
References
- 1. Wright P.E., Dyson H.J. Intrinsically disordered proteins in cellular signalling and regulation. Nat. Rev. Mol. Cell Biol. 2015;16:18–29. doi: 10.1038/nrm3920. http://www.nature.com/articles/nrm3920
- 2. Babu M.M. The contribution of intrinsically disordered regions to protein function, cellular complexity, and human disease. Biochem. Soc. Trans. 2016;44:1185–1200. doi: 10.1042/BST20160172.
- 3. Bottaro S., Bengtsen T., Lindorff-Larsen K. Springer US; 2020. Integrating Molecular Simulation and Experimental Data: A Bayesian/Maximum Entropy Reweighting Approach; pp. 219–240.
- 4. Orioli S., Larsen A.H., et al. Lindorff-Larsen K. In: Computational Approaches for Understanding Dynamical Systems: Protein Folding and Assembly. Strodel B., Barz B., editors. Academic Press; 2020. Chapter Three - how to learn from inconsistencies: integrating molecular simulations with experimental data; pp. 123–176. https://www.sciencedirect.com/science/article/pii/S1877117319302121 Volume 170 of Progress in Molecular Biology and Translational Science.
- 5. Bonomi M., Camilloni C., et al. Vendruscolo M. Metainference: a Bayesian inference method for heterogeneous systems. Sci. Adv. 2016;2:e1501177. doi: 10.1126/sciadv.1501177.
- 6. Hummer G., Köfinger J. Bayesian ensemble refinement by replica simulations and reweighting. J. Chem. Phys. 2015;143:243150. doi: 10.1063/1.4937786.
- 7. Różycki B., Kim Y.C., Hummer G. SAXS ensemble refinement of ESCRT-III CHMP3 conformational transitions. Structure. 2011;19:109–116. doi: 10.1016/j.str.2010.10.006. http://www.sciencedirect.com/science/article/pii/S0969212610003953
- 8. Bernadó P., Blanchard L., et al. Blackledge M. A structural model for unfolded proteins from residual dipolar couplings and small-angle x-ray scattering. Proc. Natl. Acad. Sci. USA. 2005;102:17002–17007. doi: 10.1073/pnas.0506202102. https://www.pnas.org/content/102/47/17002
- 9. Shoemaker B.A., Portman J.J., Wolynes P.G. Speeding molecular recognition by using the folding funnel: the fly-casting mechanism. Proc. Natl. Acad. Sci. USA. 2000;97:8868–8873. doi: 10.1073/pnas.160259697.
- 10. Martin E.W., Holehouse A.S., et al. Mittag T. Valence and patterning of aromatic residues determine the phase behavior of prion-like domains. Science. 2020;367:694–699. doi: 10.1126/science.aaw8653.
- 11. Lin Y.-H., Chan H.S. Phase separation and single-chain compactness of charged disordered proteins are strongly correlated. Biophys. J. 2017;112:2043–2046. doi: 10.1016/j.bpj.2017.04.021. https://www.sciencedirect.com/science/article/pii/S000634951730437X
- 12. Thomasen F.E., Pesce F., et al. Lindorff-Larsen K. Improving martini 3 for disordered and multidomain proteins. J. Chem. Theor. Comput. 2022;18:2033–2041. doi: 10.1021/acs.jctc.1c01042.
- 13. Henriques J., Cragnell C., Skepö M. Molecular dynamics simulations of intrinsically disordered proteins: force field evaluation and comparison with experiment. J. Chem. Theor. Comput. 2015;11:3420–3431. doi: 10.1021/ct501178z.
- 14. Palazzesi F., Prakash M.K., et al. Barducci A. Accuracy of current all-atom force-fields in modeling protein disordered states. J. Chem. Theor. Comput. 2015;11:2–7. doi: 10.1021/ct500718s.
- 15. Rauscher S., Gapsys V., et al. Grubmüller H. Structural ensembles of intrinsically disordered proteins depend strongly on force field: a comparison to experiment. J. Chem. Theor. Comput. 2015;11:5513–5524. doi: 10.1021/acs.jctc.5b00736.
- 16. Piana S., Donchev A.G., et al. Shaw D.E. Water dispersion interactions strongly influence simulated structural properties of disordered protein states. J. Phys. Chem. B. 2015;119:5113–5123. doi: 10.1021/jp508971m.
- 17. Best R.B., Zheng W., Mittal J. Balanced protein–water interactions improve properties of disordered proteins and non-specific protein association. J. Chem. Theor. Comput. 2014;10:5113–5124. doi: 10.1021/ct500569b.
- 18. Robustelli P., Piana S., Shaw D.E. Developing a molecular dynamics force field for both folded and disordered protein states. Proc. Natl. Acad. Sci. USA. 2018;115:E4758–E4766. doi: 10.1073/pnas.1800690115.
- 19. Mertens H.D.T., Svergun D.I. Structural characterization of proteins and complexes using small-angle X-ray solution scattering. J. Struct. Biol. 2010;172:128–141. doi: 10.1016/j.jsb.2010.06.012. https://www.sciencedirect.com/science/article/pii/S1047847710001905 New Trends in Protein Expression.
- 20. Stejskal E.O., Tanner J.E. Spin diffusion measurements: spin echoes in the presence of a time-dependent field gradient. J. Chem. Phys. 1965;42:288–292. doi: 10.1063/1.1695690.
- 21. Rigler R., Mets U., et al. Kask P. Fluorescence correlation spectroscopy with high count rate and low background: analysis of translational diffusion. Eur. Biophys. J. 1993;22 http://link.springer.com/10.1007/BF00185777
- 22. Stetefeld J., McKenna S.A., Patel T.R. Dynamic light scattering: a practical guide and applications in biomedical sciences. Biophys. Rev. 2016;8:409–427. doi: 10.1007/s12551-016-0218-6. http://link.springer.com/10.1007/s12551-016-0218-6
- 23. Lindorff-Larsen K., Kragelund B.B. On the potential of machine learning to examine the relationship between sequence, structure, dynamics and function of intrinsically disordered proteins. J. Mol. Biol. 2021;433:167196. doi: 10.1016/j.jmb.2021.167196. https://www.sciencedirect.com/science/article/pii/S0022283621004290 From Protein Sequence to Structure at Warp Speed: How Alphafold Impacts Biology.
- 24. Wilkins D.K., Grimshaw S.B., et al. Smith L.J. Hydrodynamic radii of native and denatured proteins measured by pulse field gradient NMR techniques. Biochemistry. 1999;38:16424–16431. doi: 10.1021/bi991765q.
- 25. Kärger J., Pfeifer H., Heink W. Academic Press; 1988. Principles and Application of Self-Diffusion Measurements by Nuclear Magnetic Resonance; pp. 1–89. https://www.sciencedirect.com/science/article/pii/B978012025512250004X Volume 12 of Advances in Magnetic and Optical Resonance.
- 26. Leeb S., Danielsson J. Springer US; 2020. Obtaining Hydrodynamic Radii of Intrinsically Disordered Protein Ensembles by Pulsed Field Gradient NMR Measurements; pp. 285–302.
- 27. Ortega A., Amorós D., García de la Torre J. Prediction of hydrodynamic and other solution properties of rigid proteins from atomic- and residue-level models. Biophys. J. 2011;101:892–898. doi: 10.1016/j.bpj.2011.06.046. https://www.sciencedirect.com/science/article/pii/S0006349511007764
- 28. Kirkwood J.G., Riseman J. The intrinsic viscosities and diffusion constants of flexible macromolecules in solution. J. Chem. Phys. 1948;16:565–573. doi: 10.1063/1.1746947.
- 29. Fleming P.J., Fleming K.G. HullRad: fast calculations of folded and disordered protein and nucleic acid hydrodynamic properties. Biophys. J. 2018;114:856–869. doi: 10.1016/j.bpj.2018.01.002. https://www.sciencedirect.com/science/article/pii/S0006349518300651
- 30. Nygaard M., Kragelund B.B., et al. Lindorff-Larsen K. An efficient method for estimating the hydrodynamic radius of disordered protein conformations. Biophys. J. 2017;113:550–557. doi: 10.1016/j.bpj.2017.06.042. https://www.sciencedirect.com/science/article/pii/S0006349517306926
- 31. Ahmed M.C., Crehuet R., Lindorff-Larsen K. Springer US; 2020. Computing, Analyzing, and Comparing the Radius of Gyration and Hydrodynamic Radius in Conformational Ensembles of Intrinsically Disordered Proteins; pp. 429–445.
- 32. Naullage P.M., Haghighatlari M., et al. Head-Gordon T. Protein dynamics to define and refine disordered protein ensembles. J. Phys. Chem. B. 2022;126:1885–1894. doi: 10.1021/acs.jpcb.1c10925.
- 33. Lincoff J., Haghighatlari M., et al. Head-Gordon T. Extended experimental inferential structure determination method in determining the structural ensembles of disordered protein states. Commun. Chem. 2020;3:74. doi: 10.1038/s42004-020-0323-0. http://www.nature.com/articles/s42004-020-0323-0
- 34. Choy W.-Y., Mulder F.A.A., et al. Kay L.E. Distribution of molecular size within an unfolded state ensemble using small-angle X-ray scattering and pulse field gradient NMR techniques. J. Mol. Biol. 2002;316:101–112. doi: 10.1006/jmbi.2001.5328. https://www.sciencedirect.com/science/article/pii/S0022283601953288
- 35. Lindorff-Larsen K., Kristjansdottir S., et al. Vendruscolo M. Determination of an ensemble of structures representing the denatured state of the bovine acyl-coenzyme A binding protein. J. Am. Chem. Soc. 2004;126:3291–3299. doi: 10.1021/ja039250g.
- 36. Gomes G.-N.W., Krzeminski M., et al. Gradinaru C.C. Conformational ensembles of an intrinsically disordered protein consistent with NMR, SAXS, and single-molecule FRET. J. Am. Chem. Soc. 2020;142:15697–15710. doi: 10.1021/jacs.0c02088.
- 37. Haxholm G.W., Nikolajsen L.F., et al. Kragelund B.B. Intrinsically disordered cytoplasmic domains of two cytokine receptors mediate conserved interactions with membranes. Biochem. J. 2015;468:495–506. doi: 10.1042/BJ20141243.
- 38. Seiffert P., Bugge K., et al. Kragelund B.B. Orchestration of signaling by structural disorder in class 1 cytokine receptors. Cell Commun. Signal. 2020;18:132. doi: 10.1186/s12964-020-00626-6. https://biosignaling.biomedcentral.com/articles/10.1186/s12964-020-00626-6
- 39. Borgia A., Borgia M.B., et al. Schuler B. Extreme disorder in an ultrahigh-affinity protein complex. Nature. 2018;555:61–66. doi: 10.1038/nature25762. http://www.nature.com/articles/nature25762
- 40. Manalastas-Cantos K., Konarev P.V., et al. Franke D. Atsas 3.0: expanded functionality and new tools for small-angle scattering data analysis. J. Appl. Crystallogr. 2021;54:343–355. doi: 10.1107/S1600576720013412.
- 41. Newcombe E.A., Fernandes C.B., et al. Kragelund B.B. Insight into calcium-binding motifs of intrinsically disordered proteins. Biomolecules. 2021;11:1173. doi: 10.3390/biom11081173. https://www.mdpi.com/2218-273X/11/8/1173
- 42. Ahmed M.C., Skaanning L.K., et al. Lindorff-Larsen K. Refinement of α-synuclein ensembles against SAXS data: comparison of force fields and methods. Front. Mol. Biosci. 2021;8:654333. doi: 10.3389/fmolb.2021.654333. https://www.frontiersin.org/article/10.3389/fmolb.2021.654333
- 43. Crackower M.A., Scherer S.W., et al. Tsui L.-C. Characterization of the split hand/split foot malformation locus SHFM1 at 7q21.3–q22.1 and analysis of a candidate gene for its expression during limb development. Hum. Mol. Genet. 1996;5:571–579. doi: 10.1093/hmg/5.5.571.
- 44. Bremer A., Farag M., et al. Mittag T. Deciphering how naturally occurring sequence features impact the phase behaviours of disordered prion-like domains. Nat. Chem. 2022;14:196–207. doi: 10.1038/s41557-021-00840-w. https://www.nature.com/articles/s41557-021-00840-w
- 45. Wu D., Chen A., Johnson C. An improved diffusion-ordered spectroscopy experiment incorporating bipolar-gradient pulses. J. Magn. Reson., Ser. A. 1995;115:260–264. https://www.sciencedirect.com/science/article/pii/S106418588571176X
- 46. Prestel A., Bugge K., et al. Kragelund B.B. In: Intrinsically Disordered Proteins. Rhoades E., editor. Academic Press; 2018. Chapter eight - characterization of dynamic IDP complexes by NMR spectroscopy; pp. 193–226. https://www.sciencedirect.com/science/article/pii/S0076687918303057 Volume 611 of Methods in Enzymology.
- 47. Blanchet C.E., Spilotros A., et al. Svergun D.I. Versatile sample environments and automation for biological solution X-ray scattering experiments at the P12 beamline (PETRA III, DESY). J. Appl. Crystallogr. 2015;48:431–443. doi: 10.1107/S160057671500254X. http://scripts.iucr.org/cgi-bin/paper?S160057671500254X
- 48. Franke D., Petoukhov M.V., et al. Svergun D.I. Atsas 2.8: a comprehensive data analysis suite for small-angle scattering from macromolecular solutions. J. Appl. Crystallogr. 2017;50:1212–1225. doi: 10.1107/S1600576717007786. http://scripts.iucr.org/cgi-bin/paper?S1600576717007786
- 49. Ozenne V., Bauer F., et al. Blackledge M. Flexible-meccano: a tool for the generation of explicit ensemble descriptions of intrinsically disordered proteins and their associated experimental observables. Bioinformatics. 2012;28:1463–1470. doi: 10.1093/bioinformatics/bts172.
- 50. Tesei G., Schulze T.K., et al. Lindorff-Larsen K. Accurate model of liquid-liquid phase behavior of intrinsically disordered proteins from optimization of single-chain properties. Proc. Natl. Acad. Sci. USA. 2021;118:e2111696118. doi: 10.1073/pnas.2111696118.
- 51. Eastman P., Swails J., et al. Pande V.S. OpenMM 7: rapid development of high performance algorithms for molecular dynamics. PLoS Comput. Biol. 2017;13:e1005659. doi: 10.1371/journal.pcbi.1005659.
- 52. Rotkiewicz P., Skolnick J. Fast procedure for reconstruction of full-atom protein models from reduced representations. J. Comput. Chem. 2008;29:1460–1465. doi: 10.1002/jcc.20906.
- 53. Grudinin S., Garkavenko M., Kazennov A. Pepsi-SAXS: an adaptive method for rapid and accurate computation of small-angle X-ray scattering profiles. Acta Crystallogr. D Struct. Biol. 2017;73:449–464. doi: 10.1107/S2059798317005745.
- 54. Pesce F., Lindorff-Larsen K. Refining conformational ensembles of flexible proteins against small-angle x-ray scattering data. Biophys. J. 2021;120:5124–5135. doi: 10.1016/j.bpj.2021.10.003. https://www.sciencedirect.com/science/article/pii/S0006349521008286
- 55. Larsen A.H., Pedersen M.C. Experimental noise in small-angle scattering can be assessed using the Bayesian indirect Fourier transformation. J. Appl. Crystallogr. 2021;54:1281–1289. doi: 10.1107/S1600576721006877.
- 56. Kirkwood J.G. The general theory of irreversible processes in solutions of macromolecules. J. Polym. Sci. 1954;12:1–14. doi: 10.1002/pol.1954.120120102.
- 57. Clisby N., Dünweg B. High-precision estimate of the hydrodynamic radius for self-avoiding walks. Phys. Rev. E. 2016;94:052102. doi: 10.1103/PhysRevE.94.052102.
- 58. Jephthah S., Staby L., et al. Skepö M. Temperature dependence of intrinsically disordered proteins in simulations: what are we missing? J. Chem. Theor. Comput. 2019;15:2672–2683. doi: 10.1021/acs.jctc.8b01281.
- 59. Xiang S., Gapsys V., et al. Zweckstetter M. Phosphorylation drives a dynamic switch in serine/arginine-rich proteins. Structure. 2013;21:2162–2174. doi: 10.1016/j.str.2013.09.014. https://www.sciencedirect.com/science/article/pii/S0969212613003651
- 60. Mittag T., Orlicky S., et al. Forman-Kay J.D. Dynamic equilibrium engagement of a polyvalent ligand with a single-site receptor. Proc. Natl. Acad. Sci. USA. 2008;105:17772–17777. doi: 10.1073/pnas.0809222105.
- 61.Mukrasch M.D., Bibow S., et al. Zweckstetter M. Structural polymorphism of 441-residue Tau at single residue resolution. PLoS Biol. 2009;7:e1000034. doi: 10.1371/journal.pbio.1000034. https://dx.plos.org/10.1371/journal.pbio.1000034 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 62.Mylonas E., Hascher A., et al. Svergun D.I. Domain conformation of Tau protein studied by solution small-angle X-ray scattering. Biochemistry. 2008;47:10345–10353. doi: 10.1021/bi800900d. [DOI] [PubMed] [Google Scholar]
- 63.Guinier A. La diffraction des rayons X aux très petits angles : application à l’étude de phénomènes ultramicroscopiques. Ann. Phys. 1939;11:161–237. [Google Scholar]
- 64.Riback J.A., Bowman M.A., et al. Sosnick T.R. Innovative scattering analysis shows that hydrophobic disordered proteins are expanded in water. Science. 2017;358:238–241. doi: 10.1126/science.aan5774. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 65.Zheng W., Best R.B. An extended Guinier analysis for intrinsically disordered proteins. J. Mol. Biol. 2018;430:2540–2553. doi: 10.1016/j.jmb.2018.03.007. https://www.sciencedirect.com/science/article/pii/S0022283618301359
- 66.Oono Y., Kohmoto M. Renormalization group theory of transport properties of polymer solutions. I. Dilute solutions. J. Chem. Phys. 1983;78:520–528. doi: 10.1063/1.444477.
- 67.Soranno A., Buchli B., et al. Schuler B. Quantifying internal friction in unfolded and intrinsically disordered proteins with single-molecule spectroscopy. Proc. Natl. Acad. Sci. USA. 2012;109:17800–17806. doi: 10.1073/pnas.1117368109.
- 68.Moses D., Yu F., et al. Sukenik S. Revealing the hidden sensitivity of intrinsically disordered proteins to their chemical environment. J. Phys. Chem. Lett. 2020;11:10131–10136. doi: 10.1021/acs.jpclett.0c02822.
- 69.Jones J.A., Wilkins D.K., et al. Dobson C.M. Characterisation of protein unfolding by NMR diffusion measurements. J. Biomol. NMR. 1997;10:199–203. http://link.springer.com/10.1023/A:1018304117895
- 70.Jensen M.R., Markwick P.R., et al. Blackledge M. Quantitative determination of the conformational properties of partially folded and intrinsically disordered proteins using NMR dipolar couplings. Structure. 2009;17:1169–1185. doi: 10.1016/j.str.2009.08.001. https://www.sciencedirect.com/science/article/pii/S0969212609002986
- 71.Wells M., Tidow H., et al. Fersht A.R. Structure of tumor suppressor p53 and its intrinsically disordered N-terminal transactivation domain. Proc. Natl. Acad. Sci. USA. 2008;105:5762–5767. doi: 10.1073/pnas.0801353105. https://www.pnas.org/content/105/15/5762
- 72.Mukrasch M.D., Markwick P., et al. Blackledge M. Highly populated turn conformations in natively unfolded Tau protein identified from residual dipolar couplings and molecular simulation. J. Am. Chem. Soc. 2007;129:5235–5243. doi: 10.1021/ja0690159.
- 73.Svergun D., Barberato C., Koch M.H.J. Crysol – a program to evaluate X-ray solution scattering of biological macromolecules from atomic coordinates. J. Appl. Crystallogr. 1995;28:768–773. doi: 10.1107/S0021889895007047.
- 74.Henriques J., Arleth L., et al. Skepö M. On the calculation of SAXS profiles of folded and intrinsically disordered proteins from computer simulations. J. Mol. Biol. 2018;430:2521–2539. doi: 10.1016/j.jmb.2018.03.002. https://www.sciencedirect.com/science/article/pii/S0022283618301232
- 75.Burchard W., Schmidt M., Stockmayer W.H. Information on polydispersity and branching from combined quasi-elastic and integrated scattering. Macromolecules. 1980;13:1265–1272.
- 76.Ramanujam V., Alderson T.R., et al. Bax A. Protein structural changes characterized by high-pressure, pulsed field gradient diffusion NMR spectroscopy. J. Magn. Reson. 2020;312:106701. doi: 10.1016/j.jmr.2020.106701. https://www.sciencedirect.com/science/article/pii/S1090780720300197
Associated Data