Biophysical Journal. 2021 Oct 8;120(22):5124–5135. doi: 10.1016/j.bpj.2021.10.003

Refining conformational ensembles of flexible proteins against small-angle x-ray scattering data

Francesco Pesce, Kresten Lindorff-Larsen
PMCID: PMC8633713  PMID: 34627764

Abstract

Intrinsically disordered proteins and flexible regions in multidomain proteins display substantial conformational heterogeneity. Characterizing the conformational ensembles of these proteins in solution typically requires combining one or more biophysical techniques with computational modeling or simulations. Experimental data can either be used to assess the accuracy of a computational model or to refine the computational model to get a better agreement with the experimental data. In both cases, one generally needs a so-called forward model (i.e., an algorithm to calculate experimental observables from individual conformations or ensembles). In many cases, this involves one or more parameters that need to be set, and it is not always trivial to determine the optimal values or to understand the impact of the choice of parameters. For example, in the case of small-angle x-ray scattering (SAXS) experiments, many forward models include parameters that describe the contribution of the hydration layer and displaced solvent to the background-subtracted experimental data. Often, one also needs to fit a scale factor and a constant background for the SAXS data, but across the entire ensemble rather than for each conformation. Here, we present a protocol to dissect the effect of the free parameters on the calculated SAXS intensities and to identify a reliable set of values. We have implemented this procedure in our Bayesian/maximum entropy framework for ensemble refinement and demonstrate the results on four intrinsically disordered proteins and a protein with three domains connected by flexible linkers. Our results show that the resulting ensembles can depend on the parameters used for solvent effects and suggest that these should be chosen carefully. We also find a set of parameters that works robustly across all proteins.

Significance

The flexibility of a protein is often key to its biological function, yet understanding and characterizing its conformational heterogeneity is difficult. Here, we describe a robust protocol for combining small-angle x-ray scattering experiments with computational modeling to obtain a conformational ensemble. In particular, we focus on the contribution of protein hydration to the experiments and how this is included in modeling the data. Our resulting algorithm and software should make modeling intrinsically disordered proteins and multidomain proteins more robust, thus aiding in understanding the relationship between protein dynamics and biological function.

Introduction

Small-angle x-ray scattering (SAXS) experiments are widely used in the field of integrative structural biology as a versatile tool to probe conformational ensembles of biomolecules in solution. When faced with highly dynamic and flexible systems, solving a crystallographic structure may either not be possible or only provide a static image that does not capture key aspects of the system. SAXS experiments, instead, give a low-to-medium resolution ensemble-averaged view of the biomolecule. Whereas for relatively rigid macromolecules it may be possible to derive a single averaged shape directly from a SAXS experiment (1, 2, 3, 4), this is generally not possible for very flexible molecules. The possibility to calculate SAXS profiles from atomic coordinates, however, makes it possible to average across a distribution of conformations (a “prior distribution”), compare the result with an experiment, and to correct the prior distribution in case of poor agreement (5,6).

SAXS measurements report on the total scattering of x rays from all molecules in solution. Thus, the resulting scattering profile represents the entire solution (buffer and solute). For this reason, one also collects scattering data for the buffer alone, which is then subtracted from the data from the solution with the macromolecule to get the excess scattering. Because the density of solvent around the protein (the hydration layer) may differ from that of the bulk solvent (7), the resulting data (colloquially termed SAXS data, although in practice it is a difference between two SAXS measurements) represents the signal coming from the protein together with its hydration envelope and the solvent displaced by the protein.

The calculation of SAXS intensities is done using a so-called forward model (i.e., an algorithm to predict an experimental observable from a structural model). Several forward models for SAXS exist, and a key distinction is how they treat the scattering contribution from the protein hydration layer (5). In particular, to account for the hydration layer contribution, modeling approaches for SAXS data generally fall into two distinct categories.

In explicit solvent models (8, 9, 10, 11, 12), displaced water molecules and perturbed properties of the hydration envelope are explicitly taken into account in the calculation of scattering intensities. In particular, the scattering from the solute (protein) and hydration layer effects is estimated by explicit subtraction of the scattering calculated from the solvated protein and the solvent alone.

In contrast, in implicit solvent models (13, 14, 15, 16), the contribution of the hydration layer and displaced volume to the scattering is typically modeled through one or more parameters that need to be set. The effects of the hydration layer can be modeled by a hydration shell of some width, Δ, and with excess density (and thus changed scattering compared to bulk solvent), δρ. Typically, only the product Δ × δρ is important, so Δ is often fixed (e.g., at 3 Å), and only δρ needs to be set (or fit). Similarly, implicit solvent models may include a parameter for the effective atomic radius (r0), which affects both the overall displaced volume and, to some extent, the contribution of the hydration layer.

Whereas the explicit solvent strategy may provide a more realistic representation of the hydration layer and its contribution to the scattering data, it can be computationally expensive and requires a force field and water model that accurately models protein-water interactions. Although it has been shown that the calculations on folded proteins are not particularly sensitive to the choice of water model (9,17), there is more uncertainty about the best models for protein-water interactions for disordered proteins (17, 18, 19, 20).

In many cases, one would want to use an implicit solvent strategy to calculate scattering data because of the smaller computational overhead. On the other hand, and as described above, these methods may require setting parameters that describe, for example, the protein’s solvent envelope, and it is not always clear how best to determine these. Here, we note that many forward models, including implicit solvent-based forward models for SAXS, have mostly been developed, parametrized, and benchmarked using globular and folded protein structures. For folded proteins, free parameters in forward models may be determined using SAXS data for proteins with known structures or fitted for a given structural model. This approach, however, is difficult to apply to disordered proteins because for these there is not a well-defined reference structure from, for example, crystallography and there is uncertainty in computational methods for generating distributions of conformations (17,20, 21, 22), both in terms of sampling efficiently all the possible conformations with the right probabilities or in terms of parametrization. Similarly, it is not reasonable to fit these parameters independently to each structure because of the risk of substantial overfitting (9,17) and because it is expected that the properties of the hydration layer will not depend fundamentally on the details of the structure. Finally, a key problem is that SAXS calculations are often used to construct or bias conformational ensembles, so that a procedure needs to be able to determine the free parameters self-consistently together with the conformational ensemble.

Here, we focus our attention on these issues with implicit solvent calculations of scattering data for heterogeneous ensembles of conformations. We illustrate the effect of varying the parameters describing the hydration layer and displaced volume on ensemble refinement of intrinsically disordered proteins (IDPs) and a flexible multidomain protein. We do so via an iterative and self-consistent strategy to select and optimize free parameters in SAXS calculations while at the same time constructing a conformational ensemble to represent the data.

Our approach is based on a reweighting approach that is rooted in Bayesian inference (23, 24, 25, 26, 27, 28, 29, 30, 31) and the maximum entropy principle (32, 33, 34, 35, 36, 37). Although these methods show similarities to other approaches based, for example, on genetic algorithms (6,38,39) or Monte Carlo processes (40,41), they differ in how they balance prior information (often encoded in a force field) with the experimental data. This balance can be particularly important for disordered proteins in which the solutions are typically severely underdetermined and in which large ensembles are required to provide a realistic structural description of the conformations present in solution (38). Finally, we note that in some cases it is possible to absorb some effects of protein dynamics into the forward model rather than to represent it explicitly in the form of a conformational ensemble, and such modifications exist both for various types of NMR data (42, 43, 44, 45) and x-ray scattering or diffraction data (46, 47, 48, 49). Although this may be useful when studying small fluctuations around a well-defined “average” conformation or when the dynamics is of rigid bodies in a crystal, here we examine systems in which we explicitly represent a broad conformational ensemble.

Materials and methods

Generating conformational ensembles of IDPs

We generated conformational ensembles for the polypeptide backbones of four IDPs using flexible-meccano (50). Flexible-meccano implicitly represents a potential energy function derived from the populations of backbone dihedrals in loop regions in folded protein structures. The backbone chains are built by randomly sampling these potentials. Other methods exist to efficiently generate conformational ensembles of IDPs, also with the possibility of taking into account transient secondary structure elements (if part of the sequence is known to assume these) (51). We chose flexible-meccano for most of the analyses presented here because it has been shown to generate conformational ensembles of IDPs that are in good agreement with both NMR observables and SAXS data (52, 53, 54, 55) without the need to provide any prior knowledge about the system. Because the complexity of the ensembles may be influenced by the length of the protein, we generated larger ensembles for the longer proteins: Hst5 (24 residues, 10,000 conformers), Sic1 (90 residues, 15,000 conformers), α-Synuclein (140 residues, 20,000 conformers), and Tau (441 residues, 30,000 conformers). We added side chains to the backbone structures generated by flexible-meccano using PULCHRA (56) with default settings.

Iterative Bayesian/maximum entropy reweighting scheme

In integrative structural modeling, one approach is to use reweighting to refine probability distributions to improve the agreement between calculated averages and experimental values (37). Here, we use the Bayesian/maximum entropy (BME) reweighting procedure (36) that, by minimal modification of the prior distribution and taking into account the uncertainty in the experimental observable ($\sigma_i$), modifies the prior weights $\omega_j^0$ to minimize the pseudo-free energy functional (24,26,37):

$$L(\omega_1 \ldots \omega_n) = \frac{m}{2}\,\chi^2_{\mathrm{red}}(\omega_1 \ldots \omega_n) - \theta\, S_{\mathrm{rel}}(\omega_1 \ldots \omega_n) \tag{1}$$

Here, m is the number of experimental data points, $(\omega_1 \ldots \omega_n)$ are the weights associated with each conformer of the ensemble, and the reduced $\chi^2$ quantifies the agreement of the weighted average of the forward model predicted from each conformation $x_j$ ($F(x_j)$) with the experimental data $F_i^{\mathrm{EXP}}$ as follows:

$$\chi^2_{\mathrm{red}}(\omega_1 \ldots \omega_n) = \frac{1}{m}\sum_i^m \frac{\left(\sum_j^n \omega_j F(x_j) - F_i^{\mathrm{EXP}}\right)^2}{\sigma_i^2}, \tag{2}$$

and $S_{\mathrm{rel}} = -\sum_j^n \omega_j \ln\frac{\omega_j}{\omega_j^0}$ is the relative entropy that quantifies how much the reweighted distribution deviates from the prior.

Thus, when minimizing L, we aim to lower $\chi^2_{\mathrm{red}}$ (to improve agreement with experiment), while not decreasing the relative entropy term too much, in such a way as to obtain the minimal modification of the prior distribution that results in a better agreement with the experimental data. The parameter θ is a temperature-like free parameter that effectively sets the balance between experiments and computation and takes into account various sources of error, such as inaccuracies in the force field or the forward model (24,37). In the limit θ→∞, no confidence is assigned to the experimental data, and no reweighting is performed. As θ is decreased, more weight is put on improving agreement with experiments ($\chi^2_{\mathrm{red}}$) but at the cost of an increased deviation between the posterior (refined) distribution and the prior distribution (in this case generated by flexible-meccano or simulations with the Martini or all-atom force fields). This deviation can also be quantified as φeff = exp(Srel), corresponding to the fraction of the original n frames that effectively contributes to the refined ensemble. For θ = 0, the $\chi^2$ is minimized without considering the prior distribution, in some cases leading to very low values of φeff, so that very few conformations contribute to the final average. In methods such as BME, θ should be chosen so as to balance minimizing the $\chi^2$ against retaining as much information as possible from the prior, for example, by choosing the point at which $\chi^2$ reaches a plateau (24,37).
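The quantities in Eqs. 1 and 2 can be evaluated directly for a given set of weights. The following is a minimal sketch in plain Python, not the implementation in the BME package; the function names and the layout of `calc` (one list of calculated observables per conformer) are assumptions made for illustration.

```python
import math
from typing import Sequence, Tuple


def chi2_red(weights: Sequence[float], calc: Sequence[Sequence[float]],
             exp: Sequence[float], sigma: Sequence[float]) -> float:
    """Eq. 2. calc[j][i] is observable i calculated for conformer j (assumed layout)."""
    m = len(exp)
    total = 0.0
    for i in range(m):
        # Weighted ensemble average of observable i
        avg_i = sum(w * profile[i] for w, profile in zip(weights, calc))
        total += ((avg_i - exp[i]) / sigma[i]) ** 2
    return total / m


def relative_entropy(weights: Sequence[float], prior: Sequence[float]) -> float:
    """S_rel = -sum_j w_j ln(w_j / w_j^0); terms with w_j = 0 contribute zero."""
    return -sum(w * math.log(w / w0) for w, w0 in zip(weights, prior) if w > 0.0)


def bme_functional(weights: Sequence[float], prior: Sequence[float],
                   calc: Sequence[Sequence[float]], exp: Sequence[float],
                   sigma: Sequence[float], theta: float) -> Tuple[float, float]:
    """Return (L, phi_eff), with L from Eq. 1 and phi_eff = exp(S_rel)."""
    s_rel = relative_entropy(weights, prior)
    loss = 0.5 * len(exp) * chi2_red(weights, calc, exp, sigma) - theta * s_rel
    return loss, math.exp(s_rel)
```

With uniform prior weights over four conformers, weights of (0.5, 0.5, 0, 0) give Srel = −ln 2 and hence φeff = 0.5, matching the interpretation of φeff as the fraction of frames that effectively contributes.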

We highlight that, as a consequence of the points above, the prior distribution is an important part of the procedure because the goal of BME and related approaches is to perform the minimal modification of the prior to get a reasonable agreement with the experimental data. The definition of the prior weights is strictly dependent on the method used to sample conformations. In case of standard molecular dynamics simulations, in which the conformations are sampled from a Boltzmann distribution, or flexible-meccano, in which the conformations are sampled from specific backbone dihedral angle potential wells, the weights are uniform because the probability of a certain conformation is related to its occurrence in the ensemble.

Because the goal of the BME is to decrease the χ2, it is important to ensure, when needed, that the experimental and calculated values are on the same scale. Whereas SAXS intensities can be measured and calibrated on an absolute scale, this depends on careful calibration of the instrument and accurate measurements of the protein concentration. Thus, calculated SAXS profiles are often rescaled to match the experimental data. Moreover, experimental SAXS data may contain a small, nonzero background scattering (e.g., from imperfect background subtraction), which sometimes is dealt with by shifting calculated SAXS profiles to get a better fit.

To account for these issues, we present here the iterative Bayesian/maximum entropy (iBME) approach, an iterative scheme that we have developed with the aim of coupling ensemble refinement and optimization of scale factor and constant background of the calculated SAXS profiles. The scheme is structured as follows:

  1) Given a set of SAXS profiles calculated from each structure in a conformational ensemble, the corresponding ensemble-averaged SAXS profile is calculated using a set of initial (prior) weights (uniform weights in all our ensembles). We then perform a weighted least-squares fit between the ensemble-averaged calculated SAXS profile and the experimental SAXS profile to get the slope and intercept of the resulting linear fit. Weights for the weighted least-squares fit are defined as $1/\sigma_i^2$.

  2) The slope and intercept from step 1 are used as scale factor and constant background to rescale and shift the calculated SAXS profiles.

  3) BME is used for optimizing the weights starting from the prior weights.

  4) The optimized weights from step 3 are used to calculate a new ensemble average of the SAXS profiles, which in turn is used for a new weighted least-squares fit to the experimental profile.

  5) With the new slope and intercept, the calculated SAXS data set used in the previous BME reweighting is again rescaled and shifted.

  6) Steps 3–5 are repeated until the decrease in $\chi^2_{\mathrm{red}}$ between consecutive iterations of the algorithm falls below a predefined threshold or for a fixed number of iterations (we used 20 iterations in our analyses).
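The six steps above can be sketched as a short loop. This is an illustrative outline in plain Python rather than the code in the BME repository: the BME step itself is passed in as a callable named `reweight`, which is a hypothetical placeholder, and the function names, argument order, and default tolerance are assumptions.

```python
from typing import Callable, List, Sequence, Tuple

Profiles = List[List[float]]
Reweighter = Callable[[Profiles, Sequence[float], Sequence[float], Sequence[float]], List[float]]


def weighted_linear_fit(x: Sequence[float], y: Sequence[float],
                        sigma: Sequence[float]) -> Tuple[float, float]:
    """Fit y = slope * x + intercept with weights 1 / sigma_i^2."""
    w = [1.0 / s ** 2 for s in sigma]
    sw = sum(w)
    mean_x = sum(wi * xi for wi, xi in zip(w, x)) / sw
    mean_y = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mean_x) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mean_x) * (yi - mean_y) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def ensemble_average(profiles: Profiles, weights: Sequence[float]) -> List[float]:
    return [sum(w * p[i] for w, p in zip(weights, profiles))
            for i in range(len(profiles[0]))]


def chi2_red(avg: Sequence[float], exp: Sequence[float], sigma: Sequence[float]) -> float:
    return sum(((a - e) / s) ** 2 for a, e, s in zip(avg, exp, sigma)) / len(exp)


def ibme(profiles: Profiles, exp: Sequence[float], sigma: Sequence[float],
         prior: Sequence[float], reweight: Reweighter,
         n_iter: int = 20, tol: float = 1e-6) -> Tuple[List[float], float, float, float]:
    """Return (weights, total scale factor, total constant background, chi2_red)."""
    calc = [list(p) for p in profiles]
    weights = list(prior)
    scale, offset = 1.0, 0.0
    previous = None
    current = float("inf")
    for _ in range(n_iter):
        # Steps 1 and 4: fit the experiment against the current ensemble average
        a, b = weighted_linear_fit(ensemble_average(calc, weights), exp, sigma)
        # Steps 2 and 5: rescale and shift every calculated profile
        calc = [[a * v + b for v in p] for p in calc]
        scale, offset = a * scale, a * offset + b
        # Step 3: reweight, always starting from the prior weights
        weights = reweight(calc, exp, sigma, prior)
        current = chi2_red(ensemble_average(calc, weights), exp, sigma)
        # Step 6: stop once the drop in chi2_red is below the threshold
        if previous is not None and previous - current < tol:
            break
        previous = current
    return weights, scale, offset, current
```

Passing the reweighting step as an argument keeps the sketch independent of any particular BME implementation; with a reweighter that simply returns the prior, the loop reduces to a single scale and background fit.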

We initially tested the method using synthetic data to examine how well it can recover the scale factor and constant background (see Supporting materials and methods and Figs. S1–S4). We note also that iBME, in part, has overlap with features in BioEn (26), in which only the scale factor is adjusted iteratively upon optimizing the weights.

iBME is implemented in an updated version of BME (https://github.com/KULL-Centre/BME). Data and scripts used for the analyses presented in this manuscript are available at https://github.com/KULL-Centre/papers/tree/main/2021/SAXS-pesce-et-al.

Calculation of the radii of gyration

We use two different methods to estimate the (average) radius of gyration (Rg) of a conformational ensemble, one based on the protein coordinates and another based on the SAXS data.

From a conformational ensemble, the Rg for each conformer of n atoms can be calculated as $R_g = \sqrt{\frac{\sum_i^n m_i\,|\mathbf{r}_i - \mathbf{r}_{\mathrm{COM}}|^2}{\sum_i^n m_i}}$, with $\mathbf{r}_i$ being the position of the ith atom, $m_i$ its mass, and $\mathbf{r}_{\mathrm{COM}} = \frac{\sum_i^n \mathbf{r}_i m_i}{\sum_i^n m_i}$ the center of mass. We used MDTraj (57) for these calculations and calculated the ensemble average ‹Rg› as a linear or weighted average of the Rg-values from each conformer.

As an alternative to using the atomic masses to weigh the distances in the calculation of Rg, we also use the atom contrasts, defined for the ith atom as $\delta\rho_i^2 = (\rho_i - \rho_w)^2$, where $\rho_w$ is the density of bulk water (334 e/nm³) and $\rho_i$ is the density of the ith atom calculated as the ratio between its number of electrons and its volume (58). We do not, however, observe substantial differences between the mass-weighted and contrast-weighted values of Rg (Fig. S5).
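As an illustration of the coordinate-based calculation, the sketch below computes Rg for one conformer with arbitrary per-atom weights, so that either masses or contrasts can be supplied, and then averages over an ensemble. It is written in plain Python and is not the MDTraj routine used for the analyses; the function names are hypothetical, and using the same weights for the center as for the distances in the contrast-weighted case is an assumption.

```python
import math
from typing import Optional, Sequence


def contrast_weight(n_electrons: float, volume_nm3: float, rho_w: float = 334.0) -> float:
    """Atom contrast (rho_i - rho_w)^2, with rho_i = electrons / volume in e/nm^3."""
    return (n_electrons / volume_nm3 - rho_w) ** 2


def radius_of_gyration(coords: Sequence[Sequence[float]],
                       atom_weights: Sequence[float]) -> float:
    """Weighted Rg of one conformer; atom_weights are masses or contrasts."""
    total = sum(atom_weights)
    # Weighted center (center of mass when the weights are masses)
    center = [sum(w * r[k] for w, r in zip(atom_weights, coords)) / total
              for k in range(3)]
    # Weighted mean squared distance from the center
    msd = sum(w * sum((r[k] - center[k]) ** 2 for k in range(3))
              for w, r in zip(atom_weights, coords)) / total
    return math.sqrt(msd)


def ensemble_rg(rg_values: Sequence[float],
                weights: Optional[Sequence[float]] = None) -> float:
    """Linear average if weights is None, otherwise the weighted average."""
    if weights is None:
        return sum(rg_values) / len(rg_values)
    return sum(w * rg for w, rg in zip(weights, rg_values)) / sum(weights)
```

For two atoms of equal mass placed 2 Å apart, `radius_of_gyration` returns 1 Å, as expected for a center of mass midway between them.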

From an experimental SAXS profile, we use the Guinier approximation to estimate the average Rg in solution (59). We first transform the SAXS profile as ln I(q) vs. q², then obtain ‹Rg› from the slope (a) of a linear fit in the small-angle region using $R_g = \sqrt{-3a}$. The linear fit takes into account the uncertainty of the intensities (propagated as $|\sigma_i / I_i|$) and was performed using the scikit-learn Python library (60).
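The Guinier estimate can be sketched as follows. This example uses a closed-form weighted least-squares fit in plain Python instead of scikit-learn, and the function name and the `q_max` argument that delimits the small-angle region are assumptions; how the fitting range was chosen is not specified here.

```python
import math
from typing import Sequence, Tuple


def guinier_rg(q: Sequence[float], intensity: Sequence[float],
               sigma: Sequence[float], q_max: float) -> Tuple[float, float]:
    """Fit ln I(q) = ln I(0) + a * q^2 for q <= q_max; return (Rg, I(0))."""
    # Transform to (q^2, ln I) and propagate the error as |sigma / I|
    pts = [(qi ** 2, math.log(ii), abs(si / ii))
           for qi, ii, si in zip(q, intensity, sigma)
           if qi <= q_max and ii > 0.0]
    w = [1.0 / err ** 2 for _, _, err in pts]
    sw = sum(w)
    mean_x = sum(wi * x for wi, (x, _, _) in zip(w, pts)) / sw
    mean_y = sum(wi * y for wi, (_, y, _) in zip(w, pts)) / sw
    sxx = sum(wi * (x - mean_x) ** 2 for wi, (x, _, _) in zip(w, pts))
    sxy = sum(wi * (x - mean_x) * (y - mean_y) for wi, (x, y, _) in zip(w, pts))
    slope = sxy / sxx
    if slope >= 0.0:
        raise ValueError("Guinier slope must be negative to define Rg")
    intercept = mean_y - slope * mean_x
    # slope a = -Rg^2 / 3, so Rg = sqrt(-3a)
    return math.sqrt(-3.0 * slope), math.exp(intercept)
```

Applied to noise-free intensities generated from I(q) = I(0) exp(−q²Rg²/3), the function returns the Rg and I(0) used to generate them.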

Results and discussion

Conformational ensembles and SAXS data

Our aim here is to develop a strategy to model conformational ensembles of flexible proteins with SAXS data, taking into account both uncertainty about a scale factor and constant background in the experimental SAXS data as well as effects of the hydration layer and displaced solvent. As the object for our analyses, we selected five proteins for which SAXS profiles had been determined experimentally and published. Because protein flexibility may exist in multiple forms, and to include different types, we first chose three IDPs of different lengths and a multidomain protein with flexible linkers: Histatin 5 (Hst5) (SAXS data collected at 323 K from (61)), Sic1 (SAXS data from (62)), full-length (ht40-)Tau (SAXS data from (63)), and the three-domain protein TIA1 without its flexible low-complexity domain (SAXS data from (64)). Furthermore, we also analyze below an additional IDP (α-Synuclein, with SAXS data from (65)) to examine the robustness of the analyses done on the four proteins listed above.

We generated conformational ensembles of the four IDPs using flexible-meccano (50). Additionally, we analyzed two previously performed molecular dynamics simulations of α-Synuclein (66) produced using either the Amber a99SB-disp or the Amber ff03ws force field. We also used a previously generated molecular dynamics simulation of TIA1 (67). The TIA1 simulations were performed with the Martini force field (68) after increasing the interaction strength between protein and water by 6% (67). All structures were converted to all-atom representation before calculating SAXS data.

We used the implicit solvent SAXS calculation approach Pepsi-SAXS (Polynomial Expansions of Protein Structures and Interactions SAXS) (14) to calculate SAXS profiles from atomic coordinates. We choose this method for its versatility and computational efficiency, but our approach will also apply to other similar methods (13,15), and below we also discuss and show results using FoXS. When no additional information, other than atomic coordinates and an experimental SAXS profile, is provided to Pepsi-SAXS, the software may tune four parameters to optimize the fit between the calculated and experimental SAXS profile: 1) the intensity of the forward scattering I(0) (i.e., the scale of the profiles), 2) a constant background cst, 3) the effective atomic radius r0, and 4) the contrast of the hydration layer δρ. In our calculations, we do not enable direct parameter fitting within Pepsi-SAXS and, instead, keep these parameters fixed to the same value for each conformer of an ensemble. As described in more detail below, we instead fit I(0) and cst as global ensemble averages and scan r0 and δρ to determine self-consistent ensembles.

Determining self-consistent ensembles and hydration layer and displaced solvent parameters

By default, Pepsi-SAXS performs a grid search for the combination of r0 and δρ that provides the best fit (lowest χ²) between the SAXS profile calculated from a specific protein structure and the experimental data. Although this strategy may be appropriate to calculate a SAXS profile for globular proteins with little conformational heterogeneity, it can result in overfitting if applied to each structure in highly heterogeneous conformational ensembles. Default values of r0 and δρ might be determined by fitting SAXS data to known crystal structures and used without modification on other proteins. This, however, would amount to making the assumption that the hydration effects are constant and transferable from specific globular proteins to, for example, IDPs. We note here that it has been shown that the surface properties of the protein affect the hydration contribution (69,70).

Instead, to determine the combination of parameters that best describes a conformational ensemble, to shed light on the influence of these two parameters, and to find a single set of parameters that provides a good description of the data, here, we want to keep the rationale of a grid search but add an ensemble perspective. Similar to the standard grid scan, we calculate SAXS data for a range of values of r0 and δρ. To define the ranges for the grid, we compare the search ranges for r0 and δρ implemented by three of the most widely used algorithms for SAXS calculations, CRYSOL (13), FoXS (15), and Pepsi-SAXS (Table S1), and use the widest ranges allowed by the three methods. Specifically, for r0, we use 11 values in the range 1.4–1.8 Å, whereas for δρ, we use 30 values in the range −27.0 to 70.0 e/nm3. We also make the assumption that r0 and δρ are the same for all conformers in the ensemble. Whereas these might in principle be conformation dependent (70,71), we do so to decrease the risk of overfitting when varying these two parameters for each of the thousands of conformations. Also, because the goal here is to describe the conformational distribution of the protein in solution, we do not expect a substantial difference as long as there is not a strong conformational dependency on the properties of the solvation layer.
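A grid of this size can be enumerated as in the sketch below, which builds 11 × 30 = 330 parameter pairs. Evenly spaced, endpoint-inclusive values between the quoted bounds are an assumption made for illustration; the best-fitting values reported in Table 1 (e.g., r0 = 1.722 Å) indicate that the grid points actually used do not coincide exactly with these.

```python
from itertools import product
from typing import List, Tuple


def linspace(start: float, stop: float, n: int) -> List[float]:
    """n evenly spaced values from start to stop, both included."""
    step = (stop - start) / (n - 1)
    return [start + k * step for k in range(n)]


# Assumed spacing: only the ranges and the number of values come from the text
r0_values = linspace(1.4, 1.8, 11)        # effective atomic radius, in Å
drho_values = linspace(-27.0, 70.0, 30)   # hydration layer contrast, in e/nm^3

# Every (r0, drho) pair; each pair is applied to all conformers of an ensemble
grid: List[Tuple[float, float]] = list(product(r0_values, drho_values))
```

Each pair in `grid` would then be used to calculate SAXS profiles for the whole ensemble, followed by one reweighting run per pair.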

Given that the input (“prior”) ensemble may not be fully representative of the protein in solution, we do not just compare the experimental SAXS profiles with each of the average SAXS profiles calculated with Pepsi-SAXS with different values of r0 and δρ (37,72). Instead, we use the BME approach to reweight the prior ensembles against the experimental data (36), using as input (one at a time) the SAXS calculations with different values of r0 and δρ. In the reweighting, it is key that the calculated SAXS profiles match the intensity of the forward scattering I(0) and the constant background cst of the experimental signal. To accurately fit both I(0) and cst upon reweighting, we developed the iBME scheme (see Materials and methods for detailed description and Supporting materials and methods for validation). The iBME method uses iterations of rescaling and shifting the calculated SAXS profiles and reweighting of the conformational ensemble to fit a global value of I(0) and cst. The only requirement is that the same values for both I(0) and cst are used to calculate the SAXS data for all conformers (I(0) and cst are ensemble properties related to the experimental SAXS profile and independent of the single conformation). We set I(0) and cst = 0 for all conformers in the Pepsi-SAXS calculations, but because these parameters are adjusted by iBME, the choice of the starting values is not essential. In this way, we scan a range of r0 and δρ and use iBME to fit I(0), cst, and the conformational ensemble. To simplify interpretations and analysis, we kept the parameter θ constant for each protein (values specified in Table 1). This was done to keep the balance between the prior and experiment constant so as to focus on changes that arise because of differences in the hydration and displaced solvent parameters. The resulting reweighted ensembles (at different values of r0 and δρ) are analyzed further below.

Table 1.

Best fitting SAXS parameters, input, and results of the iBME optimization

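| Parameter | Hst5 | Sic1 | Tau | TIA1 |
|---|---|---|---|---|
| r0 (Å) | 1.722 | 1.558 | 1.640 | 1.722 |
| δρ (e/nm³) | 10.02 | 10.02 | 0.00 | −3.34 |
| θ | 80 | 80 | 50 | 100 |
| $\chi^2_{\mathrm{red}}$ (before iBME) | 3.52 | 1.39 | 1.64 | 0.919 |
| $\chi^2_{\mathrm{red}}$ (after iBME) | 1.04 | 1.02 | 1.14 | 0.540 |
| φeff | 0.911 | 0.941 | 0.707 | 0.884 |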

We also validated the grid-scanning approach using synthetic SAXS data generated with a specific choice of r0 and δρ (see Supporting materials and methods). The results show that, both with a correct prior (Fig. S6) and a prior that is different from that used to generate the synthetic data (Fig. S7), the method is able to recover values of r0 and δρ very close to those used to generate the data.

A scoring function for the ensembles on the r0 × δρ grid

Once we calculated SAXS profiles and refined (i.e., reweighted) the ensembles for each pair of parameters on the r0 × δρ grid, we needed a scoring function to quantify the agreement with the experimental data. We already note here the complication arising from the fact that the ensembles have been refined against the experiments.

We first calculated the $\chi^2_{\mathrm{red}}$ after the iBME optimization to indicate the quality of the ensembles. For Hst5, Sic1, Tau, and TIA1, this led to a large region with low $\chi^2_{\mathrm{red}}$ (Fig. 1, a–j), suggesting that most of the combinations of r0 and δρ tested with r0 ≤ 1.722 Å can be fitted to the experimental data. We note also that, although the $\chi^2_{\mathrm{red}}$ is widely used for the purpose of comparing SAXS profiles, it has been noticed that it can be prone to overfitting if the noise is not estimated correctly (73,74). For this reason, previous studies have focused, for example, on identifying the amount of information in a SAXS profile or on correcting the experimental noise (74, 75, 76). Here, because we are comparing different fits to the same data and with the same number of degrees of freedom, we did not use such corrections. In turn, this means that the calculated values of $\chi^2_{\mathrm{red}}$ cannot easily be compared across the four systems that we analyzed.

Figure 1. Reweighting ensembles using SAXS data calculated using different values for the parameters that affect the contribution from the hydration layer and displaced solvent. The grids show the results from the iBME ensemble optimization with different combinations of δρ and r0. The top row (a–c) shows Hst5, the second row (d–f) shows Sic1, the third row (g–i) shows Tau, and the last row (j–l) shows results for TIA1. For each protein, we show in the first column (a, d, g, and j) $\ln(\chi^2_{\mathrm{red}})$, in the second column (b, e, h, and k) φeff, and in the third column (c, f, i, and l) $\gamma = \ln(\chi^2_{\mathrm{red}} / \phi_{\mathrm{eff}})$. White spots correspond to ensembles in which the iBME reweighting failed.

When reweighting an ensemble against experiments, it is important to monitor the effective fraction of frames (φeff) that, as explained above, quantifies how much the posterior distribution deviates from the prior. When φeff is low, this indicates that the ensemble has to be modified substantially to achieve the desired agreement with experiments. To ease comparison across the different ensembles in the grid, we have chosen to use the same value of θ for all combinations of r0 and δρ, where θ sets the balance between not deviating too much from the prior ensemble (maximizing φeff) and fitting the experimental data (minimizing $\chi^2_{\mathrm{red}}$). Thus, at fixed values of θ, the resulting value of φeff is another indicator of the quality of the ensembles (77), and we find a relatively narrow region of the grids with high values of φeff (Fig. 1, b, e, h, and k). Comparing the maps of $\chi^2_{\mathrm{red}}$ and φeff, we find that, whereas it is possible to achieve a relatively good fit at a wider range of values of δρ and r0, in many cases this comes at the cost of a substantial deviation from the prior (low φeff). To balance achieving a low $\chi^2_{\mathrm{red}}$ against retaining a high φeff, we thus introduce a variable, $\gamma = \ln(\chi^2_{\mathrm{red}} / \phi_{\mathrm{eff}})$, that combines these two effects in a single number (Fig. 1, c, f, i, and l). The results show that it is not possible to obtain a good fit (defined here as giving rise to a low γ) at all values of δρ and r0, but that there are certain regions that appear to give rise to comparable fits. The parameter sets that give the lowest values of γ for Hst5, Sic1, Tau, and TIA1 are reported in Table 1 together with the $\chi^2_{\mathrm{red}}$ before and after reweighting and the φeff. We note that the final ensemble may not be optimal and that further refinement could be obtained by scanning θ (24,36,37).
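Selecting the best grid point by γ amounts to a few lines of code. The sketch below is illustrative only: the dictionary layout mapping each (r0, δρ) pair to its reduced χ² and φeff, and the use of `None` to mark grid points at which reweighting failed, are assumptions.

```python
import math
from typing import Dict, Optional, Tuple

GridPoint = Tuple[float, float]  # (r0, drho)


def gamma_score(chi2_red: float, phi_eff: float) -> float:
    """gamma = ln(chi2_red / phi_eff); lower values indicate a better ensemble."""
    return math.log(chi2_red / phi_eff)


def best_parameters(
    results: Dict[GridPoint, Optional[Tuple[float, float]]]
) -> Tuple[GridPoint, Dict[GridPoint, float]]:
    """results maps (r0, drho) to (chi2_red, phi_eff), or None if reweighting failed."""
    scored = {point: gamma_score(*values)
              for point, values in results.items() if values is not None}
    # The grid point with the lowest gamma, plus all scores for plotting
    return min(scored, key=scored.get), scored
```

Because γ rises both when the fit worsens and when fewer frames contribute, a grid point that reaches a low χ² only by discarding most of the prior ensemble is penalized.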

We observe that, whereas there are some differences between the four proteins analyzed above, it also appears that there is a region that gives relatively good fits for all proteins (Fig. 1). Because it may be computationally expensive to scan many sets of parameters, we also aimed to find a set of parameters that provides good scores for these four proteins. We therefore normalized and averaged the γ scores and found the minimum to be at δρ = 3.34 e/nm3 and r0 = 1.681 Å.
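One way to combine the γ maps of several proteins is sketched below. The text does not specify how the scores were normalized, so the min-max normalization per protein used here is an assumption, as are the function name and the data layout.

```python
from typing import Dict, Tuple

GridPoint = Tuple[float, float]  # (r0, drho)


def average_normalized_gamma(
    gamma_by_protein: Dict[str, Dict[GridPoint, float]]
) -> Tuple[GridPoint, Dict[GridPoint, float]]:
    """Average min-max normalized gamma over proteins; return the best grid point."""
    # Only use grid points for which every protein has a score
    shared = set.intersection(*(set(g) for g in gamma_by_protein.values()))
    normalized = []
    for scores in gamma_by_protein.values():
        lo = min(scores[p] for p in shared)
        hi = max(scores[p] for p in shared)
        span = (hi - lo) or 1.0  # avoid dividing by zero for a flat map
        # Assumed normalization: rescale each protein's map to the range [0, 1]
        normalized.append({p: (scores[p] - lo) / span for p in shared})
    averaged = {p: sum(n[p] for n in normalized) / len(normalized) for p in shared}
    return min(averaged, key=averaged.get), averaged
```

Normalizing before averaging prevents a protein with a wide spread of γ values from dominating the position of the common minimum.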

We note that, in the literature, higher values are generally reported as default for δρ (generally 10% (7) or 6% (17,71) of the bulk density). In the context of SAXS calculations, however, we also note that the main quantity that determines the contribution of the hydration layer is the product between δρ and its width Δ (13,14). Whereas Δ is 3 Å in CRYSOL, it is chosen in a slightly different fashion in Pepsi-SAXS, and it is 5 Å in most of the cases that we examined. To demonstrate that different values for the contrast of the hydration layer alone can lead to the same result when the width is treated in different ways, we also repeated the grid scans for Hst5, Sic1, Tau, and TIA1 employing FoXS. Although the minima for γ are different from those obtained using Pepsi-SAXS (Fig. S8), the reweighted distributions of Rg from these minima are essentially identical (Fig. S9), reinforcing the observation that δρ alone is meaningful only in the context of a specific SAXS calculator. In addition, by again normalizing and averaging the γ score, we obtain a global minimum for the parameters in FoXS at r0 =1.68 Å (as in Pepsi-SAXS) and δρ = −7.07 e/nm3. We note that this value for δρ appears substantially different from those used for folded proteins and suggest that further studies are needed to examine better the physical origins of these effects.

Conversely, the value r0 = 1.681 Å is slightly higher than the average values used by CRYSOL and FoXS (1.62 Å) and Pepsi-SAXS (1.64 Å). Although the origin of this observation is unclear, we note that there are differences in protein volume, depending on whether a protein is folded or unfolded (78,79). Thus, the excluded volume, as described by the Fraser model (58), might need different parameters for compact and expanded proteins, and we suggest that this could be studied further using molecular simulations (9).

Effect of hydration and atomic radius parameters on the conformational ensemble

The idea of the grid search is to find a combination of r0 and δρ that gives rise to the best agreement with experimental data, also taking into account that we need to determine the parameters and ensemble weights at the same time. Here, we explore the effect of choosing specific sets of these parameters on the conformational ensembles.

We first examined how much the individual ensembles differed from those determined using the r0 and δρ parameters that give rise to the lowest value of γ (Table 1). We therefore calculated, as a measure of the difference between ensembles, φeff between the weights optimized using the different combinations of r0 and δρ relative to the weights obtained using the “optimal” values of r0 and δρ (Fig. 2). As expected, values around the optimum give rise to comparable weights (φeff close to 1). For Sic1 and TIA1, we also note a correlation between r0 and δρ, such that increasing the excess density (δρ) and decreasing the radius (r0) appear to give rise to more comparable ensembles. Nevertheless, the results also show that, whereas several different combinations of r0 and δρ can give rise to a good fit (Fig. 1), the resulting ensembles differ depending on the choice of parameters used to calculate the scattering data. In particular, we find that the ensembles are rather sensitive to the choice of δρ, in particular for the three IDPs analyzed above.

Figure 2.

Comparing ensembles relative to the optimum. For each protein (a: Hst5, b: Sic1, c: Tau and d: TIA1) we calculated the effective fraction of frames (shown here as φeff) between the weights obtained using the parameters in Table 1 and the weights obtained at all other combinations of r0 and δρ. White spots correspond to ensembles in which the iBME reweighting failed. Purple spots correspond to the minima for γ. To see this figure in color, go online.

SAXS data are often used to estimate Rg, so we demonstrate how the different ensembles have different distributions of Rg. Using Sic1 as an example, we chose the optimal parameters as well as three other combinations of r0 and δρ and calculated p(Rg) after reweighting (Fig. 3). The results show that, as long as r0 and δρ are chosen within the range that gives a low value of γ, the resulting distribution is relatively similar. On the other hand, if more extreme values for the r0 and δρ parameters are chosen, the average Rg may differ substantially in the reweighted ensembles (Fig. S10).
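As a sketch of how such a reweighted distribution can be obtained, the snippet below computes a weighted histogram and a weighted average from per-frame Rg values. It assumes that the per-frame Rg values and the optimized weights are already available as arrays, and that the ensemble average is the linear weighted mean; the function name is illustrative.

```python
import numpy as np

def reweighted_rg(rg, weights, bins=30):
    """Weighted mean Rg and normalized p(Rg) histogram for a reweighted ensemble."""
    rg = np.asarray(rg, dtype=float)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()  # make sure the weights sum to one
    mean_rg = float(np.sum(w * rg))
    # density=True normalizes the histogram so that it integrates to one
    density, edges = np.histogram(rg, bins=bins, weights=w, density=True)
    return mean_rg, density, edges
```

Calling the function once with uniform weights and once with the optimized weights gives the prior and posterior distributions on the same footing.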

Figure 3.

Effect of the δρ and r0 parameters on reweighted probability distributions of Rg. We use Sic1 as an example and show p(Rg) from both the optimal (lowest γ) parameters (blue) as well as three other choices of r0 and δρ in the low-γ region (orange, green, and red). The inset shows the parameters used in each case and the results of the reweighting on the Rg distribution. To see this figure in color, go online.

Assessing the influence of the prior on the parameters search

Because our strategy to determine self-consistent values for δρ and r0 is based on the BME refinement of probability distributions, it is reasonable to ask how the results depend on the statistical prior used in the approach. In this context, there are two related questions that we address here. First, as we have also examined previously (65,67), is the question of how much the distribution of conformations after reweighting depends on the prior that is used. Second is the question of how much the δρ and r0 parameters depend on the prior. The latter is important because the optimal parameters may in part compensate for imperfections in the prior.

To examine these questions we applied our protocol to three different ensembles of α-Synuclein. The first ensemble was generated using flexible-meccano, whereas the other two were previously generated by molecular dynamics simulations (66) using either the Amber a99SB-disp or the Amber ff03ws (a03ws) force field. The distributions of the Rg for the three priors are relatively different (Fig. 4 a), and consequently, the minima of the γ parameter indicate small differences in the best values of δρ and r0 (Fig. 5). The reweighted distributions of Rg, however, appear very similar (Fig. 4 b). Notably, for each prior, we obtain essentially indistinguishable distributions of Rg whether we use the optimal parameters (for each prior) from the grid search or the set of parameters that we proposed as default values (Fig. 4 b).

Figure 4.

Effect of the prior distribution. (a) Distributions of Rg of α-Synuclein sampled with flexible-meccano (FM), a99SB-disp (disp), and a03ws. (b) Reweighted Rg distributions, either from the optimal (lowest γ) δρ and r0 parameters for each ensemble (solid lines) or using the default values that we propose (δρ = 3.34 e/nm³ and r0 = 1.681 Å; dotted lines). To see this figure in color, go online.

Figure 5.

Reweighting α-Synuclein ensembles using SAXS data calculated using different values for the parameters that affect the contribution from the hydration layer and displaced solvent. The grids show the results from the iBME ensemble optimization with different combinations of δρ and r0. The top row (a–c) shows the results from the flexible-meccano ensemble, the second row (d–f) shows the results using a99SB-disp as the prior, and the third row (g–i) shows the results from a03ws as the prior. For each ensemble we show ln(χ²red) in the first column (a, d, and g), φeff in the second column (b, e, and h), and γ = ln(χ²red/φeff) in the third column (c, f, and i). White spots correspond to ensembles in which the iBME reweighting failed. Purple spots in the third column correspond to the minima for γ. To see this figure in color, go online.

Returning to the original two questions, these results show that the prior may influence the optimal parameters resulting from the grid search, similar to our observations using synthetic SAXS data (Figs. S6 and S7). They also show, in line with previous observations (65,67), that even when starting from somewhat different priors, the posterior distributions tend to be substantially similar. Notably, the results are robust to the choice of δρ and r0, so that very similar results are obtained, even when using the global minimum from our analyses of Hst5, Sic1, Tau, and TIA1.

Comparing ensembles to experimental estimates of Rg

In the analyses of the Rg described above, we implicitly referred to the values calculated from the protein coordinates as the mass-weighted root mean-square distance from the center of mass of the protein. This is a geometric quantity that is often used to study protein behavior and biophysics. Because the ensembles were constructed by fitting to the experimental SAXS data, the resulting averages and distributions of Rg represent the experimental system, but exactly because the hydration effects were included in the SAXS calculations, this means that these Rg-values only represent the protein.
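For concreteness, the geometric definition above can be written as a short function. This is a minimal sketch that assumes coordinates and atomic masses are supplied as plain arrays for a single conformation; the function name is illustrative.

```python
import numpy as np

def radius_of_gyration(coords, masses):
    """Mass-weighted root mean-square distance from the center of mass."""
    coords = np.asarray(coords, dtype=float)   # shape (n_atoms, 3)
    masses = np.asarray(masses, dtype=float)   # shape (n_atoms,)
    center_of_mass = np.average(coords, axis=0, weights=masses)
    sq_dist = np.sum((coords - center_of_mass) ** 2, axis=1)
    return float(np.sqrt(np.average(sq_dist, weights=masses)))
```

Because only protein atoms enter the sum, the result does not include any contribution from the hydration layer.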

Another approach to estimate ‹Rg› from experiment is to fit the SAXS data directly without resorting to a conformational ensemble. The most common approach is to use the Guinier approximation (59), although other approaches exist (80, 81, 82). Because the SAXS data potentially contain a contribution from the hydration layer, the ‹Rg› estimated by a Guinier analysis (or similar methods) may, in principle, contain contributions from this (17). One complication of a Guinier analysis is to identify the linear part of the curve (the Guinier region), in particular because the first few low q points of the scattering curve may often be noisy. As a rule of thumb, the maximal scattering angle that can be used for the Guinier approximation satisfies the condition qmaxRg < 1.3 (1), but a threshold value of 0.9 has also been proposed for disordered systems (83).

Because both approaches to estimate ‹Rg› are commonly used, we here compare the two results. In addition to shedding light on differences, this analysis is also relevant because it is relatively common to compare ‹Rg›-values calculated from simulations with values estimated from experiments, although the two might differ because of effects of the hydration layer. Thus, we performed a Guinier analysis of the SAXS data for the four proteins, progressively extending the upper limit of the q-range from 0.9 to 1.3 and plotting Rg vs. qmaxRg (83). We find that the Guinier fits can show substantial differences in the estimated ‹Rg› values, depending on the range used. Returning to the question of how the Rg-values estimated from the Guinier fit compare to the average Rg from the conformational ensembles with the lowest γ scores (horizontal black line in Fig. 6), we find that these are in a reasonable agreement (within 0.2 nm) with the values calculated from Guinier fits using qmaxRg = 1.3. Looking across the four proteins, we do not find a unique qmaxRg-value for which the Guinier fit gives rise to an average Rg that is similar to that obtained from the conformational ensembles.
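A Guinier fit of this kind can be sketched as follows. The snippet assumes the standard Guinier form ln I(q) = ln I(0) − (q Rg)²/3, q values sorted in ascending order, and strictly positive intensities; the self-consistent update of the fitting range and all names are illustrative choices rather than the procedure used to produce Fig. 6.

```python
import numpy as np

def guinier_fit(q, intensity, qmax_rg=1.3, skip=0, n_iter=20):
    """Fit ln I vs q^2 on points with q * Rg <= qmax_rg, iterating on the range."""
    q = np.asarray(q, dtype=float)[skip:]            # optionally drop noisy first points
    ln_i = np.log(np.asarray(intensity, dtype=float)[skip:])
    n = min(len(q), 10)                              # start from the lowest-q points
    rg = i0 = float("nan")
    for _ in range(n_iter):
        slope, intercept = np.polyfit(q[:n] ** 2, ln_i[:n], 1)
        if slope >= 0:
            raise ValueError("non-negative slope: no Guinier region found")
        rg = float(np.sqrt(-3.0 * slope))
        i0 = float(np.exp(intercept))
        # number of points that satisfy q * Rg <= qmax_rg for the current Rg
        n_new = max(int(np.searchsorted(q, qmax_rg / rg, side="right")), 3)
        if n_new == n:
            break
        n = n_new
    return rg, i0, n
```

Repeating the fit for several values of `qmax_rg` between 0.9 and 1.3 shows how sensitive the estimate is to the chosen range.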

Figure 6.

Estimating ‹Rg› from experimental SAXS profiles of (a) Hst5, (b) Sic1, (c) α-Synuclein, (d) Tau, and (e) TIA1 using Guinier fitting and ensemble refinement. We used the Guinier approximation to estimate Rg by fitting from the lowest measured value of q (in the case of Hst5 we ignored the first 10 points due to noise) to different values of qmax, reporting the results as Rg vs. qmaxRg (black circles). The horizontal black lines are the ensemble-averaged Rg calculated from the conformational ensembles (in the case of α-Synuclein, we used the flexible-meccano prior) with the chosen optimal r0 and δρ parameters (Table 1).

Conclusions

SAXS experiments are widely used as a source of structural information and are often integrated with computational methods to determine conformational ensembles. Generally, such approaches rely on a forward model, such as Pepsi-SAXS (14), to calculate SAXS data from one or more conformations and optimize the structures or weights to improve agreement with experiments. Although these approaches are very powerful, they are subject to uncertainty due to the choice of unknown parameters in the forward model. In principle, these parameters can be “integrated out” using Bayesian approaches (31,84), although this can become computationally prohibitive for SAXS calculations. Thus, the aim of our work is to provide a robust protocol that estimates values for the relevant free parameters. In the context of SAXS, these include the two parameters that determine the effects of the hydration layer and displaced volume (δρ and r0) as well as a scale factor and constant background (I(0) and cst) that are often necessary to estimate.

We have developed and tested iBME as an extension to BME to include a scale factor and constant background between experimental and calculated values. Importantly, the values are estimated as the globally best fitting parameters and are determined self-consistently with the weights of the ensembles. Although we have presented iBME here in the context of SAXS data, other types of data, such as NMR residual dipolar couplings, solvent paramagnetic relaxation enhancement effects, or circular dichroism spectra, may also involve estimating an overall scale. For small-angle neutron scattering data, the ability to include (fit) a constant background can be important because of contributions from incoherent scattering.
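The scale and background step can be illustrated with an error-weighted linear least-squares fit of the ensemble-averaged intensities to the experimental profile. This is only a sketch of that single step under stated assumptions: the weights are taken as given, the reduced χ² is normalized by the number of points, and the function and argument names are hypothetical rather than those of the BME software. In a self-consistent scheme, such a fit would be alternated with the optimization of the weights.

```python
import numpy as np

def fit_scale_offset(i_calc_frames, weights, i_exp, sigma):
    """Fit I_exp ~ scale * <I_calc> + offset, weighting each point by 1/sigma."""
    i_calc_frames = np.asarray(i_calc_frames, dtype=float)  # (n_frames, n_q)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    i_exp = np.asarray(i_exp, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    i_avg = w @ i_calc_frames                                # ensemble average
    design = np.column_stack([i_avg, np.ones_like(i_avg)]) / sigma[:, None]
    solution, *_ = np.linalg.lstsq(design, i_exp / sigma, rcond=None)
    scale, offset = float(solution[0]), float(solution[1])
    resid = (scale * i_avg + offset - i_exp) / sigma
    return scale, offset, float(np.mean(resid ** 2))
```

Fitting one scale and one offset to the ensemble average, rather than to each conformer, keeps the number of free parameters independent of the ensemble size.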

We also present the results from an extensive analysis of the effect of the r0 and δρ parameters on calculated SAXS data and the resulting ensembles. We have determined self-consistent ensembles in which the ensembles are reweighted using SAXS data calculated using different values for these parameters. Such an analysis is, in particular, important for large ensembles of flexible molecules because fitting these parameters to each conformation could lead to substantial overfitting. We also note that the calculations of SAXS intensities could potentially be improved further by being able to predict the features and contribution from the hydration layer for different sequences and conformations rather than relying on fitting parameters.

Combining these two aspects, the approach that we have described can be summarized as follows: 1) sampling a conformational ensemble; 2) calculating SAXS profiles from the conformers of the ensemble, keeping scale and background parameters fixed (I(0) = 1, cst = 0) and performing a grid scan for δρ and r0; 3) for each value of δρ and r0, optimizing the weights, I(0), and cst using iBME; and 4) examining the results by calculating χ²red, φeff, and γ, and selecting the ensemble with the lowest value of γ.

One complication of the algorithm is that it requires a large number of calculations of SAXS intensities. In cases where the prior ensemble already exists or is fast to generate, the SAXS calculations can quickly become limiting in terms of computational efficiency. For these reasons we also propose default values for δρ and r0 that we find to provide relatively accurate results for the four proteins that we examined. To test this further, we also used these default parameters to calculate SAXS intensities from different conformational ensembles of α-Synuclein and find that the resulting distributions of Rg are almost the same as if the parameters are optimized. We also note that the computational overhead of the grid scans could be drastically reduced by precomputing partial SAXS intensities once per grid and then adding the contributions from δρ and r0. Although this procedure is already internally used by several methods to calculate SAXS data, options to output and process partial intensities for specific scattering angles are, at the moment, not easily accessible.
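The idea of reusing partial intensities can be sketched for the δρ dependence. The snippet assumes a CRYSOL-like form in which the total scattering amplitude is a solute term (atoms minus displaced solvent, at fixed r0) plus δρ times a hydration-layer term, so that the orientationally averaged intensity is a quadratic polynomial in δρ. The amplitude arrays are hypothetical inputs, since current tools do not readily export them, and the r0 dependence is not treated here.

```python
import numpy as np

def partial_intensities(a_solute, a_layer):
    """Three partial intensities from complex amplitudes of shape (n_orient, n_q)."""
    a_solute = np.asarray(a_solute)
    a_layer = np.asarray(a_layer)
    i_ss = np.mean(np.abs(a_solute) ** 2, axis=0)                       # solute-solute
    i_sl = np.mean(2.0 * np.real(a_solute * np.conj(a_layer)), axis=0)  # cross term
    i_ll = np.mean(np.abs(a_layer) ** 2, axis=0)                        # layer-layer
    return i_ss, i_sl, i_ll

def intensity_from_partials(partials, drho):
    """I(q) = I_ss + drho * I_sl + drho^2 * I_ll for any hydration contrast drho."""
    i_ss, i_sl, i_ll = partials
    return i_ss + drho * i_sl + drho ** 2 * i_ll
```

Once the three partial intensities are stored for a conformer, scanning δρ requires only this cheap polynomial evaluation instead of a new SAXS calculation per grid point.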

Finally, we also discuss the common practice of comparing the experimentally determined Rg (calculated with the Guinier approximation) with Rg calculated from the structural ensemble. Although the results show good agreement, they also suggest that caution should be exercised when comparing average Rg-values from simulations and experiments. In particular, we find that changing either the r0 and δρ parameters (Fig. S10) or the region used for Guinier fitting (Fig. 6) can change the Rg substantially, so generally, we recommend comparing the experimental data (in this case SAXS intensities) with values calculated from simulations rather than comparing parameters estimated from experiments. Nevertheless, even such comparisons contain ambiguities because one needs to choose parameters in the SAXS calculations. Thus, we suggest that our work will be useful when benchmarking molecular simulations against SAXS data by providing additional insight into the effect of the hydration layer (17) and by suggesting default values that can be used as a starting point.

Author contributions

F.P. and K.L.-L. designed research. F.P. performed research. F.P. and K.L.-L. analyzed data. F.P. and K.L.-L. wrote the manuscript.

Acknowledgments

We acknowledge Sandro Bottaro for fruitful discussions and for implementing the iBME algorithm in the BME software, and Simone Orioli for useful discussions and input about small-angle x-ray scattering and other aspects of this manuscript. We thank Andreas Haahr Larsen and Ramon Crehuet for discussions and comments on the manuscript. We are grateful to Dimitri Svergun and Tanja Mittag for sharing small-angle x-ray scattering data for, respectively, Tau and Sic1.

This research was funded by the Lundbeck Foundation BRAINSTRUC initiative in structural biology (R155-2015-2666 to K.L.-L.).

Editor: Jill Trewhella.

Footnotes

Supporting material can be found online at https://doi.org/10.1016/j.bpj.2021.10.003.

Supporting materials and methods

Document S1. Supporting materials and methods, Figs. S1–S10, and Table S1
Document S2. Article plus supporting material

References

  • 1.Kikhney A.G., Svergun D.I. A practical guide to small angle X-ray scattering (SAXS) of flexible and intrinsically disordered proteins. FEBS Lett. 2015;589:2570–2577. doi: 10.1016/j.febslet.2015.08.027. [DOI] [PubMed] [Google Scholar]
  • 2.Grant T.D. Ab initio electron density determination directly from solution scattering data. Nat. Methods. 2018;15:191–193. doi: 10.1038/nmeth.4581. [DOI] [PubMed] [Google Scholar]
  • 3.Prior C., Davies O.R., et al. Pohl E. Obtaining tertiary protein structures by the ab initio interpretation of small angle X-ray scattering data. J. Chem. Theory Comput. 2020;16:1985–2001. doi: 10.1021/acs.jctc.9b01010. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 4.He H., Liu C., Liu H. Model reconstruction from small-angle x-ray scattering data using deep learning methods. iScience. 2020;23:100906. doi: 10.1016/j.isci.2020.100906. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 5.Hub J.S. Interpreting solution X-ray scattering data using molecular simulations. Curr. Opin. Struct. Biol. 2018;49:18–26. doi: 10.1016/j.sbi.2017.11.002. [DOI] [PubMed] [Google Scholar]
  • 6.Tria G., Mertens H.D.T., et al. Svergun D.I. Advanced ensemble modelling of flexible macromolecules using X-ray solution scattering. IUCrJ. 2015;2:207–217. doi: 10.1107/S205225251500202X. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 7.Svergun D.I., Richard S., et al. Zaccai G. Protein hydration in solution: experimental observation by x-ray and neutron scattering. Proc. Natl. Acad. Sci. USA. 1998;95:2267–2272. doi: 10.1073/pnas.95.5.2267. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 8.Park S., Bardhan J.P., et al. Makowski L. Simulated x-ray scattering of protein solutions using explicit-solvent models. J. Chem. Phys. 2009;130:134114. doi: 10.1063/1.3099611. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 9.Chen P.C., Hub J.S. Validating solution ensembles from molecular dynamics simulation by wide-angle X-ray scattering data. Biophys. J. 2014;107:435–447. doi: 10.1016/j.bpj.2014.06.006. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 10.Knight C.J., Hub J.S. WAXSiS: a web server for the calculation of SAXS/WAXS curves based on explicit-solvent molecular dynamics. Nucleic Acids Res. 2015;43 doi: 10.1093/nar/gkv309. W225-30. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 11.Köfinger J., Hummer G. Atomic-resolution structural information from scattering experiments on macromolecules in solution. Phys. Rev. E Stat. Nonlin. Soft Matter Phys. 2013;87:052712. doi: 10.1103/PhysRevE.87.052712. [DOI] [PubMed] [Google Scholar]
  • 12.Grishaev A., Guo L., et al. Bax A. Improved fitting of solution x-ray scattering data to macromolecular structures and structural ensembles by explicit water modeling. J. Am. Chem. Soc. 2010;132:15484–15486. doi: 10.1021/ja106173n. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 13.Svergun D., Barberato C., Koch M.H.J. CRYSOL – a program to evaluate x-ray solution scattering of biological macromolecules from atomic coordinates. J. Appl. Cryst. 1995;28:768–773. [Google Scholar]
  • 14.Grudinin S., Garkavenko M., Kazennov A. Pepsi-SAXS: an adaptive method for rapid and accurate computation of small-angle x-ray scattering profiles. Acta Crystallogr. D Struct. Biol. 2017;73:449–464. doi: 10.1107/S2059798317005745. [DOI] [PubMed] [Google Scholar]
  • 15.Schneidman-Duhovny D., Hammel M., Sali A. FoXS: a web server for rapid computation and fitting of SAXS profiles. Nucleic Acids Res. 2010;38:W540–W544. doi: 10.1093/nar/gkq461. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 16.Schneidman-Duhovny D., Hammel M., et al. Sali A. Accurate SAXS profile computation and its assessment by contrast variation experiments. Biophys. J. 2013;105:962–974. doi: 10.1016/j.bpj.2013.07.020. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 17.Henriques J., Arleth L., et al. Skepö M. On the calculation of SAXS profiles of folded and intrinsically disordered proteins from computer simulations. J. Mol. Biol. 2018;430:2521–2539. doi: 10.1016/j.jmb.2018.03.002. [DOI] [PubMed] [Google Scholar]
  • 18.Best R.B., Zheng W., Mittal J. Balanced protein – water interactions improve properties of disordered proteins and non-specific protein association. J. Chem. Theory Comput. 2014;10:5113–5124. doi: 10.1021/ct500569b. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 19.Piana S., Donchev A.G., et al. Shaw D.E. Water dispersion interactions strongly influence simulated structural properties of disordered protein states. J. Phys. Chem. B. 2015;119:5113–5123. doi: 10.1021/jp508971m. [DOI] [PubMed] [Google Scholar]
  • 20.Henriques J., Cragnell C., Skepö M. Molecular dynamics simulations of intrinsically disordered proteins: force field evaluation and comparison with experiment. J. Chem. Theory Comput. 2015;11:3420–3431. doi: 10.1021/ct501178z. [DOI] [PubMed] [Google Scholar]
  • 21.Rauscher S., Gapsys V., et al. Grubmüller H. Structural ensembles of intrinsically disordered proteins depend strongly on force field: a comparison to experiment. J. Chem. Theory Comput. 2015;11:5513–5524. doi: 10.1021/acs.jctc.5b00736. [DOI] [PubMed] [Google Scholar]
  • 22.Palazzesi F., Prakash M.K., et al. Barducci A. Accuracy of current all-atom force-fields in modeling protein disordered states. J. Chem. Theory Comput. 2015;11:2–7. doi: 10.1021/ct500718s. [DOI] [PubMed] [Google Scholar]
  • 23.Fisher C.K., Huang A., Stultz C.M. Modeling intrinsically disordered proteins with Bayesian statistics. J. Am. Chem. Soc. 2010;132:14919–14927. doi: 10.1021/ja105832g. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 24.Różycki B., Kim Y.C., Hummer G. SAXS ensemble refinement of ESCRT-III CHMP3 conformational transitions. Structure. 2011;19:109–116. doi: 10.1016/j.str.2010.10.006. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 25.Beauchamp K.A., Pande V.S., Das R. Bayesian energy landscape tilting: towards concordant models of molecular ensembles. Biophys. J. 2014;106:1381–1390. doi: 10.1016/j.bpj.2014.02.009. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 26.Hummer G., Köfinger J. Bayesian ensemble refinement by replica simulations and reweighting. J. Chem. Phys. 2015;143:243150. doi: 10.1063/1.4937786. [DOI] [PubMed] [Google Scholar]
  • 27.Bonomi M., Camilloni C., et al. Vendruscolo M. Metainference: a Bayesian inference method for heterogeneous systems. Sci. Adv. 2016;2:e1501177. doi: 10.1126/sciadv.1501177. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 28.Shevchuk R., Hub J.S. Bayesian refinement of protein structures and ensembles against SAXS data using molecular dynamics. PLoS Comput. Biol. 2017;13:e1005800. doi: 10.1371/journal.pcbi.1005800. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 29.Potrzebowski W., Trewhella J., Andre I. Bayesian inference of protein conformational ensembles from limited structural data. PLoS Comput. Biol. 2018;14:e1006641. doi: 10.1371/journal.pcbi.1006641. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 30.Lincoff J., Haghighatlari M., et al. Head-Gordon T. Extended experimental inferential structure determination method in determining the structural ensembles of disordered protein states. Commun. Chem. 2020;3:74. doi: 10.1038/s42004-020-0323-0. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 31.Spill Y.G., Karami Y., et al. Nilges M. Automatic Bayesian weighting for SAXS data. Front. Mol. Biosci. 2021;8:671011. doi: 10.3389/fmolb.2021.671011. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 32.Pitera J.W., Chodera J.D. On the use of experimental observations to bias simulated ensembles. J. Chem. Theory Comput. 2012;8:3445–3451. doi: 10.1021/ct300112v. [DOI] [PubMed] [Google Scholar]
  • 33.Boomsma W., Ferkinghoff-Borg J., Lindorff-Larsen K. Combining experiments and simulations using the maximum entropy principle. PLOS Comput. Biol. 2014;10:e1003406. doi: 10.1371/journal.pcbi.1003406. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 34.Cesari A., Reißer S., Bussi G. Using the maximum entropy principle to combine simulations and solution experiments. Computation (Basel) 2018;6:15. [Google Scholar]
  • 35.Hermann M.R., Hub J.S. SAXS-restrained ensemble simulations of intrinsically disordered proteins with commitment to the principle of maximum entropy. J. Chem. Theory Comput. 2019;15:5103–5115. doi: 10.1021/acs.jctc.9b00338. [DOI] [PubMed] [Google Scholar]
  • 36.Bottaro S., Bengtsen T., Lindorff-Larsen K. Springer US; New York, NY: 2020. Integrating Molecular Simulation and Experimental Data: A Bayesian/Maximum Entropy Reweighting Approach. [DOI] [PubMed] [Google Scholar]
  • 37.Orioli S., Larsen A.H., et al. Lindorff-Larsen K. In: Computational Approaches for Understanding Dynamical Systems: Protein Folding and Assembly. Strodel B., Barz B., editors. Academic Press; 2020. Chapter Three - How to learn from inconsistencies: integrating molecular simulations with experimental data; pp. 123–176. [Google Scholar]
  • 38.Bernadó P., Svergun D.I. Structural analysis of intrinsically disordered proteins by small-angle x-ray scattering. Mol. Biosyst. 2012;8:151–167. doi: 10.1039/c1mb05275f. [DOI] [PubMed] [Google Scholar]
  • 39.Pelikan M., Hura G.L., Hammel M. Structure and flexibility within proteins as identified through small angle x-ray scattering. Gen. Physiol. Biophys. 2009;28:174–189. doi: 10.4149/gpb_2009_02_174. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 40.Krzeminski M., Marsh J.A., et al. Forman-Kay J.D. Characterization of disordered proteins with ENSEMBLE. Bioinformatics. 2013;29:398–399. doi: 10.1093/bioinformatics/bts701. [DOI] [PubMed] [Google Scholar]
  • 41.Yang S., Blachowicz L., et al. Roux B. Multidomain assembled states of Hck tyrosine kinase in solution. Proc. Natl. Acad. Sci. U.S.A. 2010;107:15757–15762. doi: 10.1073/pnas.1004569107. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 42.Brüschweiler R., Case D. Adding harmonic motion to the Karplus relation for spin-spin coupling. J. Am. Chem. Soc. 1994;116:11199–11200. [Google Scholar]
  • 43.Lindorff-Larsen K., Best R.B., Vendruscolo M. Interpreting dynamically-averaged scalar couplings in proteins. J. Biomol. NMR. 2005;32:273–280. doi: 10.1007/s10858-005-8873-0. [DOI] [PubMed] [Google Scholar]
  • 44.Louhivuori M., Otten R., et al. Annila A. Conformational fluctuations affect protein alignment in dilute liquid crystal media. J. Am. Chem. Soc. 2006;128:4371–4376. doi: 10.1021/ja0576334. [DOI] [PubMed] [Google Scholar]
  • 45.Salvatella X., Richter B., Vendruscolo M. Influence of the fluctuations of the alignment tensor on the analysis of the structure and dynamics of proteins using residual dipolar couplings. J. Biomol. NMR. 2008;40:71–81. doi: 10.1007/s10858-007-9210-6. [DOI] [PubMed] [Google Scholar]
  • 46.Vitkup D., Ringe D., et al. Petsko G.A. Why protein R-factors are so large: a self-consistent analysis. Proteins. 2002;46:345–354. doi: 10.1002/prot.10035. [DOI] [PubMed] [Google Scholar]
  • 47.Moore P.B. The effects of thermal disorder on the solution-scattering profiles of macromolecules. Biophys. J. 2014;106:1489–1496. doi: 10.1016/j.bpj.2014.02.016. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 48.Meisburger S.P., Thomas W.C., et al. Ando N. X-ray scattering studies of protein structural dynamics. Chem. Rev. 2017;117:7615–7672. doi: 10.1021/acs.chemrev.6b00790. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 49.Xu D., Meisburger S.P., Ando N. Correlated motions in structural biology. Biochemistry. 2021;60:2331–2340. doi: 10.1021/acs.biochem.1c00420. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 50.Ozenne V., Bauer F., et al. Blackledge M. Flexible-meccano: a tool for the generation of explicit ensemble descriptions of intrinsically disordered proteins and their associated experimental observables. Bioinformatics. 2012;28:1463–1470. doi: 10.1093/bioinformatics/bts172. [DOI] [PubMed] [Google Scholar]
  • 51.Estaña A., Sibille N., et al. Bernadó P. Realistic ensemble models of intrinsically disordered proteins using a structure-encoding coil database. Structure. 2019;27:381–391.e2. doi: 10.1016/j.str.2018.10.016. [DOI] [PubMed] [Google Scholar]
  • 52.Jensen M.R., Markwick P.R., et al. Blackledge M. Quantitative determination of the conformational properties of partially folded and intrinsically disordered proteins using NMR dipolar couplings. Structure. 2009;17:1169–1185. doi: 10.1016/j.str.2009.08.001. [DOI] [PubMed] [Google Scholar]
  • 53.Bernadó P., Blanchard L., et al. Blackledge M. A structural model for unfolded proteins from residual dipolar couplings and small-angle x-ray scattering. Proc. Natl. Acad. Sci. USA. 2005;102:17002–17007. doi: 10.1073/pnas.0506202102. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 54.Wells M., Tidow H., et al. Fersht A.R. Structure of tumor suppressor p53 and its intrinsically disordered N-terminal transactivation domain. Proc. Natl. Acad. Sci. USA. 2008;105:5762–5767. doi: 10.1073/pnas.0801353105. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 55.Mukrasch M.D., Markwick P., et al. Blackledge M. Highly populated turn conformations in natively unfolded tau protein identified from residual dipolar couplings and molecular simulation. J. Am. Chem. Soc. 2007;129:5235–5243. doi: 10.1021/ja0690159. [DOI] [PubMed] [Google Scholar]
  • 56.Rotkiewicz P., Skolnick J. Fast procedure for reconstruction of full-atom protein models from reduced representations. J. Comput. Chem. 2008;29:1460–1465. doi: 10.1002/jcc.20906. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 57.McGibbon R.T., Beauchamp K.A., et al. Pande V.S. MDTraj: a modern open library for the analysis of molecular dynamics trajectories. Biophys. J. 2015;109:1528–1532. doi: 10.1016/j.bpj.2015.08.015. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 58.Fraser R.D.B., MacRae T.P., Suzuki E. An improved method for calculating the contribution of solvent to the x-ray diffraction pattern of biological molecules. J. Appl. Cryst. 1978;11:693–694. [Google Scholar]
  • 59.Guinier A. La diffraction des rayons X aux très petits angles: application à l’étude de phénomènes ultramicroscopiques. Ann. Phys. (Paris) 1939;11:161–237. [Google Scholar]
  • 60.Pedregosa F., Varoquaux G., et al. Duchesnay E. Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 2011;12:2825–2830. [Google Scholar]
  • 61.Jephthah S., Staby L., et al. Skepö M. Temperature dependence of intrinsically disordered proteins in simulations: what are we missing? J. Chem. Theory Comput. 2019;15:2672–2683. doi: 10.1021/acs.jctc.8b01281. [DOI] [PubMed] [Google Scholar]
  • 62.Gomes G.-N.W., Krzeminski M., et al. Gradinaru C.C. Integrating multiple experimental data to determine conformational ensembles of an intrinsically disordered protein. bioRxiv. 2020 doi: 10.1101/2020.02.05.935890. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 63.Mylonas E., Hascher A., et al. Svergun D.I. Domain conformation of tau protein studied by solution small-angle x-ray scattering. Biochemistry. 2008;47:10345–10353. doi: 10.1021/bi800900d. [DOI] [PubMed] [Google Scholar]
  • 64.Sonntag M., Jagtap P.K.A., et al. Sattler M. Segmental, domain-selective perdeuteration and small-angle neutron scattering for structural analysis of multi-domain proteins. Angew. Chem. Int.Engl. 2017;56:9322–9325. doi: 10.1002/anie.201702904. [DOI] [PubMed] [Google Scholar]
  • 65. Ahmed M.C., Skaanning L.K., et al. Lindorff-Larsen K. Refinement of α-synuclein ensembles against SAXS data: comparison of force fields and methods. Front. Mol. Biosci. 2021;8:654333. doi: 10.3389/fmolb.2021.654333.
  • 66. Robustelli P., Piana S., Shaw D.E. Developing a molecular dynamics force field for both folded and disordered protein states. Proc. Natl. Acad. Sci. USA. 2018;115:E4758–E4766. doi: 10.1073/pnas.1800690115.
  • 67. Larsen A.H., Wang Y., et al. Lindorff-Larsen K. Combining molecular dynamics simulations with small-angle x-ray and neutron scattering data to study multi-domain proteins in solution. PLoS Comput. Biol. 2020;16:e1007870. doi: 10.1371/journal.pcbi.1007870.
  • 68. Monticelli L., Kandasamy S.K., et al. Marrink S.-J. The MARTINI coarse-grained force field: extension to proteins. J. Chem. Theory Comput. 2008;4:819–834. doi: 10.1021/ct700324x.
  • 69. Virtanen J.J., Makowski L., et al. Freed K.F. Modeling the hydration layer around proteins: HyPred. Biophys. J. 2010;99:1611–1619. doi: 10.1016/j.bpj.2010.06.027.
  • 70. Virtanen J.J., Makowski L., et al. Freed K.F. Modeling the hydration layer around proteins: applications to small- and wide-angle x-ray scattering. Biophys. J. 2011;101:2061–2069. doi: 10.1016/j.bpj.2011.09.021.
  • 71. Persson F., Söderhjelm P., Halle B. The geometry of protein hydration. J. Chem. Phys. 2018;148:215101. doi: 10.1063/1.5026744.
  • 72. van Gunsteren W.F., Daura X., et al. Smith L.J. Validation of molecular simulation: an overview of issues. Angew. Chem. Int. Ed. Engl. 2018;57:884–902. doi: 10.1002/anie.201702945.
  • 73. Rambo R.P., Tainer J.A. Accurate assessment of mass, models and resolution by small-angle scattering. Nature. 2013;496:477–481. doi: 10.1038/nature12070.
  • 74. Larsen A.H., Pedersen M.C. Experimental noise in small-angle scattering can be assessed and corrected using the Bayesian Indirect Fourier Transformation. J. Appl. Cryst. 2020;54:1281–1289.
  • 75. Hansen S. Bayesian estimation of hyperparameters for indirect Fourier transformation in small-angle scattering. J. Appl. Cryst. 2000;33:1415–1421.
  • 76. Konarev P.V., Svergun D.I. A posteriori determination of the useful data range for small-angle scattering experiments on dilute monodisperse systems. IUCrJ. 2015;2:352–360. doi: 10.1107/S2052252515005163.
  • 77. Qian H. Relative entropy: free energy associated with equilibrium fluctuations and nonequilibrium deviations. Phys. Rev. E Stat. Nonlin. Soft Matter Phys. 2001;63:042103. doi: 10.1103/PhysRevE.63.042103.
  • 78. Zamyatnin A.A. Protein volume in solution. Prog. Biophys. Mol. Biol. 1972;24:107–123. doi: 10.1016/0079-6107(72)90005-3.
  • 79. Roche J., Royer C.A. Lessons from pressure denaturation of proteins. J. R. Soc. Interface. 2018;15:20180244. doi: 10.1098/rsif.2018.0244.
  • 80. Vestergaard B., Hansen S. Application of Bayesian analysis to indirect Fourier transformation in small-angle scattering. J. Appl. Cryst. 2006;39:797–804.
  • 81. Riback J.A., Bowman M.A., et al. Sosnick T.R. Innovative scattering analysis shows that hydrophobic disordered proteins are expanded in water. Science. 2017;358:238–241. doi: 10.1126/science.aan5774.
  • 82. Zheng W., Best R.B. An extended Guinier analysis for intrinsically disordered proteins. J. Mol. Biol. 2018;430:2540–2553. doi: 10.1016/j.jmb.2018.03.007.
  • 83. Borgia A., Zheng W., et al. Schuler B. Consistent view of polypeptide chain expansion in chemical denaturants from multiple experimental methods. J. Am. Chem. Soc. 2016;138:11714–11726. doi: 10.1021/jacs.6b05917.
  • 84. Rieping W., Habeck M., Nilges M. Inferential structure determination. Science. 2005;309:303–306. doi: 10.1126/science.1110428.

Associated Data


Supplementary Materials

Document S1. Supporting materials and methods, Figs. S1–S10, and Table S1
Document S2. Article plus supporting material

Articles from Biophysical Journal are provided here courtesy of The Biophysical Society
