Author manuscript; available in PMC: 2018 Aug 6.
Published in final edited form as: Methods Mol Biol. 2018;1688:375–389. doi: 10.1007/978-1-4939-7386-6_17

Structures of dynamic protein complexes: Hybrid techniques to study MAP kinase complexes and the ESCRT system

Wolfgang Peti 1,2,*, Rebecca Page 3, Evzen Boura 4, Bartosz Rozycki 5,*
PMCID: PMC6078100  NIHMSID: NIHMS979485  PMID: 29151218

Abstract

The integration of complementary molecular methods (including X-ray crystallography, NMR spectroscopy, small angle X-ray/neutron scattering and computational techniques) is frequently required to obtain a comprehensive understanding of dynamic macromolecular complexes. In particular, these techniques are critical for studying intrinsically disordered protein regions (IDRs) or intrinsically disordered proteins (IDPs) that are part of large protein:protein complexes. Here we explain how to prepare IDP samples suitable for study using NMR spectroscopy, and describe a novel SAXS modeling method (ensemble refinement of SAXS; EROS) that integrates the results from complementary methods, including crystal structures and NMR chemical shift perturbations, among others, to accurately model SAXS data and describe ensemble structures of dynamic macromolecular complexes.

Keywords: intrinsically disordered proteins (IDP), NMR spectroscopy, SAXS, EROS, ensemble

1. Introduction

Hybrid methods are increasingly being used to molecularly characterize the macromolecular machines that control all cellular processes. These complementary techniques include X-ray crystallography, nuclear magnetic resonance (NMR) spectroscopy, electron microscopy, small angle X-ray scattering (SAXS), small angle neutron scattering (SANS), mass spectrometry, chemical cross-linking, super-high resolution optical microscopy, optical microscopy, fluorescence spectroscopy and electron paramagnetic resonance (EPR) spectroscopy, among others. In particular, SAXS, a technique commonly used to characterize the shape(s) and dimensions of proteins in solution [1,2], is increasingly used as a hybrid method in structural biology, especially as a complementary technique with NMR spectroscopy and X-ray crystallography. This is especially true for the structural investigation of intrinsically disordered proteins (IDPs), IDP complexes and large complexes with large intrinsically disordered regions (IDRs), all of which are difficult or impossible to crystallize and are also difficult to study using cryo-EM.

Like most NMR measurements, SAXS experiments are performed on samples in aqueous solutions. As a result, the macromolecules that scatter X-rays in SAXS experiments are oriented randomly relative to the incident beam. This results in the spherical averaging of the signal, and as a consequence the diffraction image depends on only a single scattering angle. In this way, three-dimensional molecular structures are reduced to one-dimensional intensity profiles. Despite this loss of information, the scattering intensity profiles can be used to determine the molecular shapes and dimensions of the biomacromolecule under investigation. Because of this, NMR spectroscopy and SAXS experiments are frequently used concurrently to study IDPs and IDP:protein complexes, as well as multi-domain proteins with long IDRs. However, in order to model these proteins and protein complexes that exhibit considerable conformational fluctuations, it is necessary to employ methods that use multiple structural models to fit experimental data as an ensemble rather than a single structure, as is commonly done for X-ray crystallography. Thus, multiple SAXS-based approaches for modeling conformational ensembles have been recently developed, including the ensemble optimization method [3], the minimal ensemble search method [4], the SAXS module in the integrative modeling platform [5] and the ensemble refinement of SAXS (EROS) method [6]. Critically, all these approaches require a vast pool of diverse protein conformations as input; i.e. extensive sampling of the ensemble/conformation pool. This is essential as a conformation pool that is not inclusive enough is likely to be incapable of accounting for the experimental data. This conformation pool can be generated based on steric exclusion [3], high-temperature molecular dynamics simulations [4], statistical potentials for protein binding [5,6], or topology-based (Go-type) models [7].
After generating this conformation pool, heuristic algorithms are used to determine the combination of conformations that optimally fit the experimental SAXS data. Additional experimental constraints (e.g. NMR constraints, FRET, among others) can be used to either modify the ensemble pool or to determine the conformations that allow for the best SAXS data fit.
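The spherical averaging described above can be illustrated with the Debye formula, in which the orientation-averaged intensity of a set of point-like scatterers depends only on their pairwise distances. The following Python sketch is an illustration under simplifying assumptions (identical beads with a constant form factor, no solvent or hydration-shell contribution); the function name `debye_profile` and the random bead coordinates are hypothetical and are not taken from any of the cited packages.

```python
import numpy as np

def debye_profile(coords, q, form_factor=1.0):
    """Orientation-averaged intensity I(q) of point-like beads (Debye formula).

    Assumption: every bead has the same, q-independent form factor.
    """
    coords = np.asarray(coords, dtype=float)
    q = np.asarray(q, dtype=float)
    # Matrix of pairwise bead-bead distances d_nm
    diff = coords[:, None, :] - coords[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=-1))
    # np.sinc(x) = sin(pi x)/(pi x), hence sin(q d)/(q d) = np.sinc(q d / pi)
    qd = q[:, None, None] * d[None, :, :]
    return form_factor ** 2 * np.sinc(qd / np.pi).sum(axis=(1, 2))

# Hypothetical 50-bead structure (coordinates in Angstrom)
rng = np.random.default_rng(0)
beads = rng.normal(scale=10.0, size=(50, 3))
q = np.linspace(0.0, 0.5, 51)
profile = debye_profile(beads, q)

# Rotating the structure leaves the one-dimensional profile unchanged
angle = 0.7
rot = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                [np.sin(angle),  np.cos(angle), 0.0],
                [0.0, 0.0, 1.0]])
profile_rotated = debye_profile(beads @ rot.T, q)
```

Because only distances enter the formula, the rotated structure yields the same profile, which is one way to see why the three-dimensional structure cannot be recovered uniquely from I(q) alone.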

EROS uses a different strategy; namely, the pool of simulated structures is gently reweighted to improve the agreement with the SAXS data. It can also incorporate data from other methods, including chemical shift perturbations from NMR spectroscopy, FRET and EPR distance data. Here, we provide a detailed description of how to express and purify IDPs, collect high quality SAXS data on these samples and, finally, obtain hybrid structures (ones that combine SAXS, NMR and crystallographic data) using the EROS method.

2. Materials

Detailed methods for obtaining chemical shift perturbation (CSP) data on large, dynamic complexes have recently been described [8]. Here, we focus on sample preparation and data collection for SAXS, with a focus on strategies for enhancing the expression and stability of IDPs for NMR and SAXS measurements. Finally, we describe how to combine these data with NMR-derived CSPs using EROS to obtain accurate models of IDP ensembles.

Prepare all solutions using ultrapure water (Milli-Q water purification system, Millipore). Chemicals should be at least ACS grade. Prepare and store reagents/solutions at temperatures and conditions recommended by the manufacturer. Standard buffers are used for purification; when necessary, buffers are autoclaved when prepared in order to prevent unwanted proteolytic degradation due to the presence of trace amounts of protease. This is particularly important for the study of IDPs, which are highly susceptible to proteolytic degradation. All buffers are stored at 4 °C and are filtered (0.22 μm PES filter, Millipore) immediately prior to use. When possible, use uniform buffers throughout all experiments. Standard water baths and/or heat blocks that can be heated to 90 °C are required for heat purification steps.

3. Methods

3.1 IDP expression and purification

IDPs and IDRs are estimated to comprise more than 30% of the human genome [9]. It is now clear that IDPs play essential roles in multiple biological processes, especially in signaling. However, their lack of a single stable structure often renders them highly susceptible to proteolytic degradation in the laboratory. Additional experimental steps are therefore required to ensure that they can be expressed to high levels and remain stable, so that they can be studied at a molecular level using techniques such as NMR spectroscopy and SAXS (see Note 1).

  1. E. coli expression plasmids: Typically, expression plasmids are used that allow IDPs to be fused to N-terminal tags that facilitate both expression (maltose binding protein, MBP, and glutathione-S-transferase, GST) and purification (6xHis, tobacco etch virus [TEV] protease sequence) (see Notes 2–4).

  2. Expression: Standard methods are used to express IDPs in E. coli [10].

  3. Protease inhibition during cell lysis and purification: Standard methods are used for cell lysis and purification [10]. However, because IDPs are extremely sensitive to proteolytic degradation, additional steps are used to minimize protease exposure.

    1. Protease inhibitors (i.e., EDTA-free Complete tabs, Sigma-Aldrich) are added to all lysis buffers and, if needed, purification buffers.

    2. Buffers: Purification buffers are autoclaved prior to use.

    3. Columns and purification systems: All columns and purification system tubing are rigorously cleaned using 1 M NaOH (1 column volume [CV]) and 30% isopropanol/water (0.5 CV) prior to use.

    4. Elution collection: fraction collection tubes/blocks are autoclaved prior to use.

  4. N-terminal tag cleavage: Dialyze the purified IDP with TEV protease for N-terminal tag cleavage (see Note 5) using standard protocols [10].

  5. Heat purification: Because IDPs do not adopt a single, folded conformation, they are often heat stable. This provides a unique opportunity for both purification from folded N-terminal fusion tags (MBP/GST) and for minimizing proteolytic exposure (see Note 6).

    1. Transfer dialysate to 50 ml conical vial.

    2. Incubate dialysate in water bath (65 °C) for 15 min.

    3. Centrifuge at 10,000 ×g for 15 min to separate soluble and insoluble fractions (see Note 7).

    4. In a second step, incubate soluble fraction at 90 °C (see Note 8) for 15 min.

    5. Repeat step 3 (centrifugation) (see Note 9).

  6. Purify the IDPs using size exclusion chromatography (SEC) to remove any remaining contaminant proteins and aggregates (see Note 10).

  7. In a final step, heat (90 °C) the pooled, concentrated IDP to denature any trace proteases (see Note 11).

  8. If steps 1–7 do not overcome IDP proteolytic degradation during purification, additional steps, including adding protease inhibitors at every step of the purification procedure and minimizing the time from lysis to final heat purification (i.e., < 12 hours) can also increase IDP protein yield and stability.

3.2 SAXS experiments

The use of SAXS data in molecular modeling has a number of advantages, the most significant of which is that SAXS experiments are performed on samples in aqueous solutions. Thus, SAXS provides information about the conformations of macromolecules in their natural environment. However, improper data processing can lead to errors. While this is true for any method, SAXS is especially sensitive to such errors. For example, the SAXS intensity profile is the difference in signals between the sample and the corresponding buffer; inadequate signal subtraction can lead to significant systematic errors in the resulting profile. Furthermore, SAXS-based modeling must take into account the protein hydration shell. However, programs that compute SAXS intensity profiles from atomic models, such as CRYSOL [11], FoXS [12], AXES [13], AquaSAXS [14] and SASTBX [15], differ in how they treat the hydration shell, which adds uncertainty to SAXS-derived models.

Sample preparation for SAXS experiments requires neither crystal growth nor protein labeling. Unlike X-ray crystallography, which relies on diffracting crystals, macromolecules in solution always scatter X-rays. Similarly, unlike solution NMR techniques which have some molecular size limitations, SAXS is not limited by the molecular mass; rather larger proteins will scatter better. Furthermore, the quality of SAXS data depends neither on the size nor on the flexibility of the macromolecules under study [16].

SAXS measurements can be performed using a home X-ray source or, more typically, synchrotron radiation. They are performed on samples in a wide range of solution conditions, molecular concentrations and temperatures. For all SAXS experiments, optimal sample preparation is essential for obtaining interpretable SAXS data. In particular, SAXS is exceptionally sensitive to aggregation, as soluble aggregates, even if they represent less than 1% of the sample, are significantly larger and thus will have a major impact on the overall measured scattering. Thus, identifying conditions that prevent sample aggregation is essential (see Note 12). Detailed protocols for SAXS methods (including strategies for detecting aggregation and minimizing radiation) have been recently summarized [16]. Here, we focus on SAXS data collection for IDPs and IDP containing proteins.
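The sensitivity to aggregation can be made concrete with a short back-of-the-envelope estimate. It assumes a dilute sample in which monomer and aggregate have the same scattering contrast, so that the forward scattering I(0) of each species is proportional to its mass concentration times its molecular mass. The mass fraction f = 0.01 follows the 1% figure above; the aggregate size n = 100 monomers is a hypothetical value chosen only for illustration.

```latex
% Forward scattering of a species: I(0) \propto c \, M
% (c: mass concentration, M: molecular mass, equal contrast assumed)
\[
\frac{I_{\mathrm{agg}}(0)}{I_{\mathrm{mono}}(0)}
  = \frac{f \, c_{\mathrm{tot}} \, (nM)}{(1-f) \, c_{\mathrm{tot}} \, M}
  = \frac{f \, n}{1-f}
  = \frac{0.01 \times 100}{0.99} \approx 1.0
\]
```

Under these assumptions, aggregates that make up only 1% of the protein mass contribute about as much forward scattering as the remaining 99% of monomeric protein.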

  1. Sample production: Purify all samples immediately prior to SAXS measurements using size exclusion chromatography (SEC) to remove trace aggregates.

  2. When possible, filter any trace aggregates immediately prior to SAXS measurements. For this, 0.02 μm syringe filters (GE Healthcare Anotop 10) are suitable (see Note 12).

  3. Sample cells: Sample cells should be thoroughly cleaned prior to use to eliminate trace proteases (i.e., NaOH washes). Prior to measurements, the cells should subsequently be thoroughly washed with SAXS buffer (see Note 13).

  4. Sample concentration: The optimal concentration for SAXS measurements depends on the X-ray source, the size and assembly of the cell (flow-through or static), among other parameters. Typically, measurements are initiated using the lowest concentrations possible. Samples are then concentrated and the SAXS data collected. This is continued until the sample is concentrated as high as can be achieved before aggregation is detected (sometimes this can be as high as 30 mg/ml) (see Note 14).

  5. SAXS data analysis: Numerous software packages are available to analyze SAXS data, with the two most widely used being ATSAS [17] and SCATTER (https://bl1231.als.lbl.gov/scatter/), both of which also allow for the calculation of 3D envelopes from the data; these calculations are usually performed with the dataset that has the highest signal-to-noise ratio (commonly, the highest measured concentration). This can also be done using Fast-SAXS-pro [18].

  6. IDP detection: The Kratky plot, i.e., the plot of $q^2 I(q)$ as a function of the momentum transfer $q$, is used to identify IDPs (see Note 15).

    1. Convergence of the Kratky plot at high q suggests compaction, whereas a hyperbolic shape suggests flexibility [19]; the hyperbolic feature is a trademark of random coils and IDPs.

    2. In practice, Kratky plots may be difficult to assess if the SAXS data are noisy or truncated. Recently, analysis based on the Porod-Debye law, i.e. analysis of $q^4 I(q)$ versus $q^4$ at intermediate q-values, has been introduced as a more robust approach to tell apart flexible molecules from rigid ones [19].

    3. Molecular flexibility can also be presumed if SAXS data cannot be accounted for with a single model, suggesting that an ensemble of models may be required to fit the experimental data [20,21].

  7. Data processing and the need for ensemble modeling approaches: The standard approach for SAXS data analysis is to analyze the scattering intensity profile, I(q), which enables the determination of the pair-distance distribution function, P(r), and the corresponding molecular envelope [22,23,17]. Molecular envelopes provide an informative visual interpretation of the SAXS data; however, this approach holds only for rigid systems with minor ensemble fluctuations. When SAXS is used to study IDPs, this standard envelope calculation fails [24]. Furthermore, SAXS can be used to determine structures of protein complexes if atomic structures of the constituent proteins are known [19]. However, to achieve this goal with optimal accuracy, structural models of the protein complexes should be fitted directly to the experimental SAXS data; simply placing the protein models into molecular envelopes does not fully use the structural information encoded in the scattering intensity profile I(q).
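The buffer subtraction and Kratky transformation used in the steps above can be sketched in a few lines of Python. This is a minimal illustration, not a replacement for the cited analysis packages. The two model curves are textbook idealizations introduced here as assumptions: a Guinier-type Gaussian decay for a compact particle and the Debye function of an ideal Gaussian chain for a fully flexible one. All function names and the radius of gyration are hypothetical.

```python
import numpy as np

def subtract_buffer(i_sample, i_buffer):
    """SAXS profile as the difference between sample and buffer signals."""
    return np.asarray(i_sample, dtype=float) - np.asarray(i_buffer, dtype=float)

def kratky(q, intensity):
    """Kratky transform: q^2 * I(q) as a function of q."""
    q = np.asarray(q, dtype=float)
    return q ** 2 * np.asarray(intensity, dtype=float)

def compact_guinier(q, rg):
    """Idealized compact particle: I(q) = exp(-q^2 Rg^2 / 3)."""
    return np.exp(-(np.asarray(q, dtype=float) * rg) ** 2 / 3.0)

def gaussian_chain(q, rg):
    """Idealized flexible chain (Debye function), x = (q Rg)^2."""
    x = (np.atleast_1d(np.asarray(q, dtype=float)) * rg) ** 2
    out = np.ones_like(x)          # limit of the Debye function for x -> 0
    nz = x > 0
    out[nz] = 2.0 * (np.exp(-x[nz]) + x[nz] - 1.0) / x[nz] ** 2
    return out

q = np.linspace(0.001, 0.5, 500)   # momentum transfer in 1/Angstrom
rg = 30.0                          # hypothetical radius of gyration in Angstrom
k_compact = kratky(q, compact_guinier(q, rg))   # bell-shaped, returns to zero
k_chain = kratky(q, gaussian_chain(q, rg))      # rises to a plateau near 2/Rg^2
```

In this idealized picture the compact model gives a peak near q = √3/Rg followed by convergence toward zero, whereas the chain model levels off at a plateau and does not converge. Real data are noisier, which is why the Porod-Debye analysis mentioned above can be more robust.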

3.3 Ensemble refinement of SAXS (EROS) method

EROS was developed from the outset to combine SAXS with other spectroscopy methods, especially those that use site-directed labeling [25,20], such as fluorescence and electron paramagnetic resonance (EPR) spectroscopy. In this way, data from various biophysical experiments can be readily combined and used simultaneously for molecular modeling. For example, by combining X-ray crystallography (which provides high resolution structures of individual domains) with SAXS data (which provide information on the global size and shape of the molecular assembly) and NMR chemical shift, DEER (EPR) or FRET data (which allow local restraints to be imposed on distances between selected sites), it is possible to obtain detailed representations of the structures in a variety of biologically important systems, such as the ESCRT membrane-protein trafficking system [25,20] and protein kinase complexes with their regulatory phosphatases [21,26].

  1. Unlike other SAXS ensemble analysis methods, EROS only moderately reweights the pool of simulated structures to improve the agreement with the experimental SAXS data. The maximum-entropy method is used to prevent data over-fitting.

  2. Generating a starting model: A structural model of the protein system/complex under investigation is constructed using atomic structures (PDB files) of the constituent proteins or/and domains. If individual experimental structures are unavailable, homology models can be used. All flexible loops and inter-domain linkers, which are often missing in atomic structures (e.g. due to lack of electron density), must be built using programs such as MODELLER [27].

  3. Generating the EROS ensemble: An ensemble of structural models is generated using molecular dynamics simulations; the structural model obtained in (2) is used as input for the simulations.

    1. All-atom molecular dynamics simulations of large proteins are computationally demanding, especially when it comes to simulating large conformational fluctuations in flexible protein systems. Thus, EROS uses a more efficient, coarse-grained approach [28] in order to gain speed and to increase sampling of ensemble structures.

    2. EROS simulations of multiple protein complexes have been performed using in-house software but many coarse-grained protein simulation packages that are freely available [29,30] can be used to generate starting ensembles (see Note 16).

    3. These protein simulations can be biased using experimental data [31]. For example, we used this approach for the simulations of the p38α:HePTP complex, where NMR chemical shift perturbation restraints were incorporated into the energy function via a weak bias potential acting between the dynamic HePTP-KIS linker and residues on the surface of p38α [21].

  4. Scattering intensity profile: For each of the simulated structures obtained in (3) a scattering intensity profile is computed. Different algorithms are available to compute SAXS intensity profiles on the basis of coarse-grained protein representations [32]. The EROS method uses a particularly simple approach, which assumes constant form-factors of the amino-acid beads [6] and thereby makes this step simpler and faster than many other approaches.

  5. Comparison with experiment: For each of the simulated structures additional parameters can be calculated and directly compared with experimental results, e.g. from FRET efficiencies or DEER dipolar evolution functions (see Note 17). However, for this approach it is necessary to model the fluorescence or spin labels onto the protein surface. Either rotamer libraries [33] or molecular dynamics simulations [34,35] can be used to generate a pool of possible conformations of the fluorescence or spin labels.

  6. Cluster generation: Simulation structures are sorted into clusters based on their mutual similarity.

    1. Standard clustering algorithms, such as k-means [36] or QT-clustering method [37], are typically sufficient.

    2. However, it is important to choose an appropriate metric to cluster the simulated structures. Indeed, many quantities can be used as a measure of similarity between protein structures. The most common one is the root-mean-square deviation (RMSD) of atomic positions.

    3. To compute the RMSD it is necessary to superimpose structures, which can be problematic in the case of flexible protein systems. For this reason, EROS employs the distance root-mean-square (DRMS) analysis. The DRMS between structures A and B is defined as $\mathrm{DRMS}(A,B)=\left(\frac{1}{N_2}\sum_{n,m}\left(d_{n,m}(A)-d_{n,m}(B)\right)^2\right)^{1/2}$, where $d_{n,m}(A)$ is the Cartesian distance between the amino-acid beads $n$ and $m$ in structure A, and $N_2$ is the number of bead pairs over which the sum is performed.

  7. Assign measurable quantities to the clusters of the simulation structures. Use the results obtained in (4), (5) and (6).

    1. The SAXS intensity $I_k(q)$ assigned to cluster number $k$ is the arithmetic mean of SAXS intensities resulting from all individual structures in this cluster.

    2. By analogy, FRET efficiencies or DEER dipolar evolution functions assigned to a given cluster are arithmetic averages of FRET or DEER signals resulting from all structures in this cluster.

  8. Assign statistical weights to the clusters obtained in step 6.

    1. Use normalized weights, which fulfill the condition $\sum_k w_k = 1$, where $w_k$ denotes the statistical weight of cluster number $k$. The average SAXS intensity profile resulting from the whole ensemble of simulation structures is now given by a weighted average over the clusters, i.e., $I_{\mathrm{sim}}(q)=\sum_k w_k I_k(q)$, where $I_k(q)$ are the SAXS intensity profiles assigned to the individual clusters in (7).

    2. $I_{\mathrm{sim}}(q)$ depends on the set of weights $w_k$ assigned to the clusters. Other ensemble-averaged quantities, such as FRET efficiencies or DEER dipolar evolution functions, which should be compared directly to experimental data, also depend on the cluster weights.

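    3. The discrepancy between the computed, ensemble-averaged intensity profile $I_{\mathrm{sim}}(q)$ and the experimental SAXS data $I_{\mathrm{exp}}(q)$ can be quantified by $\chi^2_{\mathrm{SAXS}}=\frac{1}{N_q}\sum_{i=1}^{N_q}\frac{\left(c\,I_{\mathrm{sim}}(q_i)-I_{\mathrm{exp}}(q_i)\right)^2}{\sigma^2(q_i)}$, where the scale factor $c$ results from the condition $\partial\chi^2_{\mathrm{SAXS}}/\partial c=0$.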

    4. The discrepancy between the computed, ensemble-averaged FRET or DEER signals and the data from FRET or DEER experiments, respectively, can be quantified by analogous expressions.

    5. The resulting model-data discrepancy $\chi^2=\chi^2_{\mathrm{SAXS}}+\chi^2_{\mathrm{FRET}}+\chi^2_{\mathrm{DEER}}$ is a function of the statistical weights of the clusters. Note that re-weighting the clusters can result in a decrease in the discrepancy $\chi^2$ between simulations and experiments.

  9. Fit the simulation ensemble to experimental data by optimizing the statistical weights of the clusters.

    1. Before any refinement, the statistical weights are proportional to cluster populations. That is, if cluster number $k$ consists of $n_k$ structures, its initial weight is $w_k^{(0)}=n_k/\sum_j n_j$. In the course of the ensemble refinement, the cluster weights are varied to improve agreement with experimental data.

    2. To re-weight the clusters in a controlled way, and to prevent data over-fitting, a maximum-entropy method can be used. This method is based on numerical minimization of a pseudo-potential function $F = \chi^2 - S$ that consists of the model-data discrepancy function, $\chi^2$ as introduced in (8), and a cross-entropy term $S=-\frac{1}{\beta}\sum_k w_k \ln\frac{w_k}{w_k^{(0)}}$ that quantifies how far the refined ensemble is from the original simulation ensemble. Here, $\beta$ is a control parameter. Including the two terms, $\chi^2$ and $S$, in the pseudo-potential function $F$ reflects our confidence in both experiments and simulations. The function $F$ can be minimized with respect to the statistical weights $w_k$ by using simulated annealing [6] or more advanced algorithms such as COPER [38]. As a result, the optimal weights $w_k(\beta)$ are obtained for a given value of parameter $\beta$.

    3. For sufficiently small $\beta$-values, when $\chi^2$ is negligible in comparison to $S$, minimization of function $F$ leads to only small changes in the initial weights, i.e., $w_k(\beta)\approx w_k^{(0)}$ for most of the clusters. In contrast, for large values of $\beta$, when $F\approx\chi^2$, minimizing $F$ leads to the best possible agreement with experiment but may result in data over-fitting.

    4. Therefore, a sensible approach is to determine a value of parameter $\beta$ for which minimization of $F$ yields $\chi^2\approx 1$. Note that if the simulation structures correctly capture the relevant conformations of the protein system under study, then the condition $\chi^2\approx 1$ is obtained when $w_k(\beta)\approx w_k^{(0)}$ for most of the clusters.

  10. Optional: Refine the structural ensemble using an alternative method and compare the outcomes [21].

    1. Another way of refining the simulation ensemble is the minimum ensemble method that selects the smallest possible set of clusters that accounts for experimental data [20,21].

    2. In this approach, another function $G = \chi^2 + \mu N$ is minimized numerically. Here, $N$ is the number of clusters with non-zero weights, $w_k>0$, and $\mu$ is another control parameter, which should be fine-tuned in such a way that minimization of function $G$ leads to $\chi^2\approx 1$.

    3. The advantage of this method is that it usually produces only a small set of representative structures that can be easily inspected visually. However, by discarding a significant portion of the simulation ensemble, the minimum ensemble method does not fully use the predictive power of molecular simulations.

  11. Validate the structural ensemble using independent datasets excluded from refinement [25,20].
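The clustering metric and the reweighting steps above can be sketched in Python as follows. This is a simplified illustration under stated assumptions and not the in-house EROS software: only the SAXS term of the discrepancy is included, the DRMS sum runs over all unique bead pairs, and a plain gradient descent on log-weights with backtracking is swapped in for the simulated annealing or COPER minimizers named in the text. The function names, the toy Guinier-like cluster profiles and the synthetic "experimental" curve are all hypothetical.

```python
import numpy as np

def drms(coords_a, coords_b):
    """Distance RMS between two bead models; no superposition is required."""
    a, b = np.asarray(coords_a, float), np.asarray(coords_b, float)
    iu = np.triu_indices(len(a), k=1)  # assumption: all unique bead pairs n < m
    d_a = np.linalg.norm(a[:, None] - a[None, :], axis=-1)[iu]
    d_b = np.linalg.norm(b[:, None] - b[None, :], axis=-1)[iu]
    return np.sqrt(np.mean((d_a - d_b) ** 2))

def chi2_saxs(w, profiles, i_exp, sigma):
    """Return (chi2_SAXS, c), with the scale factor c fixed by d(chi2)/dc = 0."""
    i_sim = w @ profiles  # weighted average over the cluster profiles I_k(q)
    c = np.sum(i_sim * i_exp / sigma**2) / np.sum(i_sim**2 / sigma**2)
    return np.mean(((c * i_sim - i_exp) / sigma) ** 2), c

def pseudo_potential(w, w0, profiles, i_exp, sigma, beta):
    """F = chi2 - S with S = -(1/beta) * sum_k w_k ln(w_k / w_k0)."""
    chi2, _ = chi2_saxs(w, profiles, i_exp, sigma)
    nz = w > 0
    s = -np.sum(w[nz] * np.log(w[nz] / w0[nz])) / beta
    return chi2 - s

def refine_weights(w0, profiles, i_exp, sigma, beta, n_iter=2000):
    """Minimize F by gradient descent on log-weights (softmax keeps sum_k w_k = 1)."""
    g, w = np.log(w0), w0.copy()
    f = pseudo_potential(w, w0, profiles, i_exp, sigma, beta)
    step = 1.0
    for _ in range(n_iter):
        _, c = chi2_saxs(w, profiles, i_exp, sigma)
        resid = (c * (w @ profiles) - i_exp) / sigma**2
        grad_w = 2.0 * c * (profiles @ resid) / len(i_exp)
        grad_w += (np.log(np.maximum(w, 1e-300) / w0) + 1.0) / beta
        grad_g = w * (grad_w - w @ grad_w)  # chain rule through the softmax
        while step > 1e-12:  # backtracking: accept only steps that lower F
            g_new = g - step * grad_g
            w_new = np.exp(g_new - g_new.max())
            w_new /= w_new.sum()
            f_new = pseudo_potential(w_new, w0, profiles, i_exp, sigma, beta)
            if f_new < f:
                break
            step *= 0.5
        else:
            break  # no acceptable step left: stop iterating
        g, w, f = g_new, w_new, f_new
        step *= 2.0
    return w

# Toy example with three hypothetical clusters
q = np.linspace(0.01, 0.3, 60)
rg = np.array([15.0, 25.0, 40.0])                 # hypothetical cluster sizes
profiles = np.exp(-np.outer(rg**2, q**2) / 3.0)   # toy Guinier-like I_k(q)
n_k = np.array([20, 30, 50])                      # structures per cluster
w0 = n_k / n_k.sum()                              # initial weights w_k^(0)
w_true = np.array([0.6, 0.3, 0.1])
sigma = np.full(q.size, 0.05)
i_exp = 2.5 * (w_true @ profiles)                 # synthetic "experiment"
w_soft = refine_weights(w0, profiles, i_exp, sigma, beta=1e-8)
w_hard = refine_weights(w0, profiles, i_exp, sigma, beta=1e6)
```

With a very small β the entropy term dominates and the weights stay close to the simulation populations, whereas a large β lets the weights move to lower the SAXS discrepancy. In practice β would be scanned until χ² is close to 1, as described in step 9.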

Figure 1. Diagram showing the integration of experimental (crystallography, NMR, SAXS, FRET, EPR; marked in red) and computational data (molecular simulations, ensemble refinement procedures; marked in blue) to determine representative ensemble structures of protein complexes containing IDPs and/or IDRs in EROS.

Acknowledgments

The authors thank all members of the Page and Peti laboratory. This work was supported by NIH grant R01GM098482 to RP; NIH R01GM100910 and American Diabetes Association Pathway to the Cure 1-14-ACN-31 to WP. EB was supported by the Project InterBioMed LO1302 from the Ministry of Education of the Czech Republic and by the Academy of Sciences of the Czech Republic (RVO: 61388963). BR was supported by the European Framework Programme VII NMP grant 604530-2 (CellulosomePlus) and co-financed by the Polish Ministry of Science and Higher Education from the resources granted for the years 2014–2017 in support of scientific projects.

Footnotes

1

All steps below are suitable for IDPs and folded proteins that contain extended IDP domains, with the exception of the heat purification steps, which are appropriate only for IDPs (i.e., those samples without folded domains).

2

IDPs expressed in the absence of an N-terminally fused folded protein (i.e., with just a 6xHis-TEV sequence) are often rapidly degraded by the bacterial proteolytic machinery in E. coli. In these cases, fusion to a large folded protein (MBP or GST) most commonly overcomes this problem.

3

For this reason, we routinely use 1 of 3 pET-based expression plasmids that contain different N-terminal tags for IDP expression: (1) 6xHis-TEV-IDP, (2) 6xHis-MBP-TEV-IDP or (3) 6xHis-GST-TEV-IDP.

4

TEV protease rarely exhibits non-specific cleavage and thus is more suitable for IDP tag cleavage than other proteases, such as Factor Xa or thrombin.

5

Because the samples are IDPs, complete cleavage is often achieved in less time than for folded proteins, sometimes within just a few hours. TEV cleavage time courses can be performed to minimize the duration of this step.

6

Not all IDPs can be readily heat purified; rather it must be determined experimentally.

7

We typically prepare samples of both the soluble and insoluble fractions for SDS-PAGE gel analysis.

8

The maximum temperature used for heat purification is specific to each IDP and must be determined experimentally. Typically, for a new IDP purification, temperature intervals of 5 °C are tested in order to identify the maximal temperature at which the IDP remains soluble (commonly from 60–90 °C).

9

Greater than 90% of the MBP, GST and TEV precipitate at temperatures of ≥70 °C. Thus, heat purification is highly effective for removing large, folded proteins when working with IDPs.

10

Because IDPs are not globular, they sometimes elute in positions expected for ‘larger’ proteins during the SEC step.

11

This last heat purification step maximizes the long-term stability of IDPs, as it substantially slows/prevents proteolytic degradation, which is often important for NMR spectroscopy and SAXS analysis steps.

12

Typically, we prepare 4–8 L of the appropriate buffer that is then used for all subsequent experiments. This ensures accurate buffer subtraction.

13

The SAXS intensity profile must be taken as a difference in signals between the sample and the corresponding buffer, which may lead to significant systematic errors if the signal subtraction is inadequate.

14

Concentrating samples to high concentration and then diluting the samples for SAXS measurements often results in unwanted aggregation and thus is avoided.

15

We note that in addition to the analysis of SAXS data, it is essential to also assess molecular flexibility using biochemical or biophysical methods, such as limited proteolysis or hydrogen/deuterium exchange.

16

The ensemble of structures generated by the simulations does not need to be perfect to be useful. The molecular simulations serve merely to produce an initial pool of meaningful candidate conformations. However, part of the success of the EROS method is the use of appropriate, physics-based simulations to sample the relevant conformations of multi-domain proteins and multi-protein complexes. In fact, the transferable energy function used in the EROS simulations has been shown to correctly predict structures and binding affinities of a number of protein-protein complexes [28]. Also, in a recent study on cellulosomal proteins [39], the ensemble of simulation structures has been found to fit SAXS experimental data without any refinement.

17

In principle, the structural ensemble can be fitted either to raw experimental data or to commensurate quantities such as SAXS-derived pair-distance distribution function or DEER-derived inter-label distance distribution. However, to avoid introducing any regularization-dependent artifacts into the ensemble refinement, the simulation structures are fitted directly to experimental data in the framework of the EROS method.

References

  • 1. Blanchet CE, Svergun DI. Small-angle X-ray scattering on biological macromolecules and nanocomposites in solution. Annu Rev Phys Chem. 2013;64:37–54. doi: 10.1146/annurev-physchem-040412-110132.
  • 2. Graewert MA, Svergun DI. Impact and progress in small and wide angle X-ray scattering (SAXS and WAXS). Curr Opin Struct Biol. 2013;23(5):748–754. doi: 10.1016/j.sbi.2013.06.007.
  • 3. Bernado P, Perez Y, Svergun DI, Pons M. Structural characterization of the active and inactive states of Src kinase in solution by small-angle X-ray scattering. J Mol Biol. 2008;376(2):492–505. doi: 10.1016/j.jmb.2007.11.066.
  • 4. Pelikan M, Hura GL, Hammel M. Structure and flexibility within proteins as identified through small angle X-ray scattering. Gen Physiol Biophys. 2009;28(2):174–189. doi: 10.4149/gpb_2009_02_174.
  • 5. Alber F, Forster F, Korkin D, Topf M, Sali A. Integrating diverse data for structure determination of macromolecular assemblies. Annu Rev Biochem. 2008;77:443–477. doi: 10.1146/annurev.biochem.77.060407.135530.
  • 6. Rozycki B, Kim YC, Hummer G. SAXS ensemble refinement of ESCRT-III CHMP3 conformational transitions. Structure. 2011;19(1):109–116. doi: 10.1016/j.str.2010.10.006.
  • 7. Yang S, Blachowicz L, Makowski L, Roux B. Multidomain assembled states of Hck tyrosine kinase in solution. Proc Natl Acad Sci U S A. 2010;107(36):15757–15762. doi: 10.1073/pnas.1004569107.
  • 8. Peti W, Page R. NMR Spectroscopy to Study MAP Kinase Binding to MAP Kinase Phosphatases. Methods Mol Biol. 2016;1447:181–196. doi: 10.1007/978-1-4939-3746-2_11.
  • 9. Wright PE, Dyson HJ. Intrinsically disordered proteins in cellular signalling and regulation. Nat Rev Mol Cell Biol. 2015;16(1):18–29. doi: 10.1038/nrm3920.
  • 10. Peti W, Page R. Strategies to maximize heterologous protein expression in Escherichia coli with minimal cost. Protein Expr Purif. 2007;51(1):1–10. doi: 10.1016/j.pep.2006.06.024.
  • 11. Svergun DI, Barberato C, Koch MHJ. CRYSOL - A program to evaluate x-ray solution scattering of biological macromolecules from atomic coordinates. J Appl Crystallogr. 1995;28:768–773.
  • 12. Schneidman-Duhovny D, Hammel M, Sali A. FoXS: a web server for rapid computation and fitting of SAXS profiles. Nucleic Acids Res. 2010;38(Web Server issue):W540–544. doi: 10.1093/nar/gkq461.
  • 13. Grishaev A, Guo L, Irving T, Bax A. Improved fitting of solution X-ray scattering data to macromolecular structures and structural ensembles by explicit water modeling. J Am Chem Soc. 2010;132(44):15484–15486. doi: 10.1021/ja106173n.
  • 14. Poitevin F, Orland H, Doniach S, Koehl P, Delarue M. AquaSAXS: a web server for computation and fitting of SAXS profiles with non-uniformally hydrated atomic models. Nucleic Acids Res. 2011;39(Web Server issue):W184–189. doi: 10.1093/nar/gkr430.
  • 15. Liu HG, Hexemer A, Zwart PH. The small angle scattering ToolBox (SASTBX): an open-source software for biomolecular small-angle scattering. J Appl Crystallogr. 2012;45:587–593.
  • 16. Kikhney AG, Svergun DI. A practical guide to small angle X-ray scattering (SAXS) of flexible and intrinsically disordered proteins. FEBS Lett. 2015;589(19 Pt A):2570–2577. doi: 10.1016/j.febslet.2015.08.027.
  • 17. Petoukhov MV, Franke D, Shkumatov AV, Tria G, Kikhney AG, Gajda M, Gorba C, Mertens HD, Konarev PV, Svergun DI. New developments in the ATSAS program package for small-angle scattering data analysis. J Appl Crystallogr. 2012;45(Pt 2):342–350. doi: 10.1107/S0021889812007662.
  • 18. Ravikumar KM, Huang W, Yang S. Fast-SAXS-pro: a unified approach to computing SAXS profiles of DNA, RNA, protein, and their complexes. J Chem Phys. 2013;138(2):024112. doi: 10.1063/1.4774148.
  • 19. Rambo RP, Tainer JA. Characterizing flexible and intrinsically unstructured biological macromolecules by SAS using the Porod-Debye law. Biopolymers. 2011;95(8):559–571. doi: 10.1002/bip.21638.
  • 20. Boura E, Rozycki B, Herrick DZ, Chung HS, Vecer J, Eaton WA, Cafiso DS, Hummer G, Hurley JH. Solution structure of the ESCRT-I complex by small-angle X-ray scattering, EPR, and FRET spectroscopy. Proc Natl Acad Sci U S A. 2011;108(23):9437–9442. doi: 10.1073/pnas.1101763108.
  • 21. Francis DM, Rozycki B, Koveal D, Hummer G, Page R, Peti W. Structural basis of p38alpha regulation by hematopoietic tyrosine phosphatase. Nat Chem Biol. 2011;7(12):916–924. doi: 10.1038/nchembio.707.
  • 22. Svergun DI, Petoukhov MV, Koch MH. Determination of domain structure of proteins from X-ray solution scattering. Biophys J. 2001;80(6):2946–2953. doi: 10.1016/S0006-3495(01)76260-1.
  • 23. Franke D, Svergun DI. DAMMIF, a program for rapid ab-initio shape determination in small-angle scattering. J Appl Crystallogr. 2009;42(Pt 2):342–346. doi: 10.1107/S0021889809000338.
  • 24. Rozycki B, Boura E. Large, dynamic, multi-protein complexes: a challenge for structural biology. J Phys Condens Matter. 2014;26(46):463103. doi: 10.1088/0953-8984/26/46/463103.
  • 25. Boura E, Rozycki B, Chung HS, Herrick DZ, Canagarajah B, Cafiso DS, Eaton WA, Hummer G, Hurley JH. Solution structure of the ESCRT-I and -II supercomplex: implications for membrane budding and scission. Structure. 2012;20(5):874–886. doi: 10.1016/j.str.2012.03.008.
  • 26. Francis DM, Rozycki B, Tortajada A, Hummer G, Peti W, Page R. Resting and active states of the ERK2:HePTP complex. J Am Chem Soc. 2011;133(43):17138–17141. doi: 10.1021/ja2075136.
  • 27. Fiser A, Do RK, Sali A. Modeling of loops in protein structures. Protein Sci. 2000;9(9):1753–1773. doi: 10.1110/ps.9.9.1753.
  • 28. Kim YC, Hummer G. Coarse-grained models for simulations of multiprotein complexes: application to ubiquitin binding. J Mol Biol. 2008;375(5):1416–1433. doi: 10.1016/j.jmb.2007.11.063.
  • 29. Kenzaki H, Koga N, Hori N, Kanada R, Li W, Okazaki K, Yao XQ, Takada S. CafeMol: A Coarse-Grained Biomolecular Simulator for Simulating Proteins at Work. J Chem Theory Comput. 2011;7(6):1979–1989. doi: 10.1021/ct2001045.
  • 30. Liwo A, Baranowski M, Czaplewski C, Golas E, He Y, Jagiela D, Krupa P, Maciejczyk M, Makowski M, Mozolewska MA, Niadzvedtski A, Oldziej S, Scheraga HA, Sieradzan AK, Slusarz R, Wirecki T, Yin Y, Zaborowski B. A unified coarse-grained model of biological macromolecules based on mean-field multipole-multipole interactions. J Mol Model. 2014;20(8):2306. doi: 10.1007/s00894-014-2306-5.
  • 31. Dannenhoffer-Lafage T, White AD, Voth GA. A Direct Method for Incorporating Experimental Data into Multiscale Coarse-Grained Models. J Chem Theory Comput. 2016;12(5):2144–2153. doi: 10.1021/acs.jctc.6b00043.
  • 32. Yang S, Park S, Makowski L, Roux B. A rapid coarse residue-based computational method for x-ray solution scattering characterization of protein folds and multiple conformational states of large protein complexes. Biophys J. 2009;96(11):4449–4463. doi: 10.1016/j.bpj.2009.03.036.
  • 33. Polyhach Y, Bordignon E, Jeschke G. Rotamer libraries of spin labelled cysteines for protein studies. Phys Chem Chem Phys. 2011;13(6):2356–2366. doi: 10.1039/c0cp01865a.
  • 34. Best RB, Merchant KA, Gopich IV, Schuler B, Bax A, Eaton WA. Effect of flexibility and cis residues in single-molecule FRET studies of polyproline. Proc Natl Acad Sci U S A. 2007;104(48):18964–18969. doi: 10.1073/pnas.0709567104.
  • 35. Merchant KA, Best RB, Louis JM, Gopich IV, Eaton WA. Characterizing the unfolded states of proteins using single-molecule FRET spectroscopy and molecular simulations. Proc Natl Acad Sci U S A. 2007;104(5):1528–1533. doi: 10.1073/pnas.0607097104.
  • 36. Hartigan JA, Wong MA. A k-means clustering algorithm. Applied Statistics. 1979;28:100–108.
  • 37. Heyer LJ, Kruglyak S, Yooseph S. Exploring expression data: identification and analysis of coexpressed genes. Genome Res. 1999;9(11):1106–1115. doi: 10.1101/gr.9.11.1106.
  • 38. Leung HT, Bignucolo O, Aregger R, Dames SA, Mazur A, Berneche S, Grzesiek S. A Rigorous and Efficient Method To Reweight Very Large Conformational Ensembles Using Average Experimental Data and To Determine Their Relative Information Content. J Chem Theory Comput. 2016;12(1):383–394. doi: 10.1021/acs.jctc.5b00759.
  • 39. Rozycki B, Cieplak M, Czjzek M. Large conformational fluctuations of the multi-domain xylanase Z of Clostridium thermocellum. J Struct Biol. 2015;191(1):68–75. doi: 10.1016/j.jsb.2015.05.004.