Author manuscript; available in PMC: 2023 Dec 1.
Published in final edited form as: Curr Opin Struct Biol. 2022 Sep 29;77:102470. doi: 10.1016/j.sbi.2022.102470

Reweighting methods for elucidation of conformation ensembles of proteins

Raquel Gama Lima Costa a, David Fushman a,b,*
PMCID: PMC9771963  NIHMSID: NIHMS1842428  PMID: 36183447

Abstract

Proteins are inherently dynamic macromolecules that exist in equilibrium among multiple conformational states, and motions of protein backbone and side chains are fundamental to biological function. The ability to characterize the conformational landscape is particularly important for intrinsically disordered proteins, multidomain proteins, and weakly bound complexes, where single-structure representations are inadequate. As the focus of structural biology shifts from relatively rigid macromolecules toward larger and more complex systems and molecular assemblies, there is a need for structural approaches that can paint a more realistic picture of such conformationally heterogeneous systems. Here we review reweighting methods for elucidation of structural ensembles based on experimental data, with the focus on applications to multidomain proteins.

Keywords: Maximum parsimony, Maximum entropy, Bayesian method, Multi-domain protein, Reweighting, Ensemble selection

1. Introduction

Proteins are inherently dynamic macromolecules that exist in equilibrium among multiple conformational states, and knowledge of the relevant conformers and of the extent of motions is essential for understanding their energy landscapes and the molecular mechanisms underlying key elements of their biological function including recognition, allostery, and catalysis [1]. Conformational heterogeneity of macromolecules is essential for conformational selection that underlies biomolecular recognition and other events that enable protein function. Aside from conformational heterogeneity of a macromolecule there is also heterogeneity in assembly. In addition, metabolites and post-translational modifications can shift the population of the conformational ensemble rather than fully lock down a single conformation. Thus, methods that can probe the energy landscapes of macromolecular systems and detect population shifts as a result of interactions or other factors provide valuable tools for structural biology.

Characterization of proteins as structurally heterogeneous systems presents a major challenge because traditional structural biology approaches are geared towards producing a coherent set of similar structures and are generally deficient in treating macromolecules as conformational ensembles. For example, experimental data from solution NMR measurements generally reflect physical characteristics averaged over multiple conformational states of a molecule; yet the existing software packages for biomolecular structure determination (such as CNS/XPLOR, CYANA, HADDOCK, etc.) [2–4] were originally designed to produce a single-structure snapshot that matches the conformationally averaged constraints in their entirety. This might work well for relatively “rigid” single-domain proteins, but could paint an inadequate portrait of inherently flexible macromolecular systems such as intrinsically disordered proteins (IDPs), multidomain proteins, and weakly bound macromolecular assemblies. A conceptually different approach is needed that aims at determining an ensemble of conformers where no single one needs to match the composite experimental data, but instead, a weighted average of a given pertinent physical observable over such an ensemble has to be in agreement with experiment. This paradigm shift in structural biology, from a single-snapshot picture to a more adequate ensemble representation of biomacromolecules, requires novel computational approaches and tools. Here we review some of these approaches, with the focus on applications to multidomain proteins in solution.

Determining structural ensembles from experimental data faces a fundamental challenge of solving a mathematically underdetermined system because the number of degrees of freedom associated with dynamic macromolecules generally greatly exceeds the number of experimentally available independent observables. This renders direct conversion of experimental data into a representative ensemble an ill-posed problem that can yield an unlimited number of possible solutions. This also implies that an integrated structural approach combining different methods/types of data is needed because no single experimental technique can fully capture all the features of a conformationally heterogeneous macromolecule, and each on its own therefore has difficulties in ‘explaining’ the composite of the data [5–7].

Experimental techniques that can provide ensemble-related structural information range from those capable of observing or detecting individual molecules (like cryo-EM or smFRET) to methods extracting distance distribution information (smFRET and DEER) to those methods where the measured values of the observables are averaged over all molecules in the sample (FRET, SAS) and also over various conformations each individual molecule samples during the characteristic measurement time (solution NMR). Determining structural ensembles from experimental data obtained by the latter methods is particularly challenging because of the need to deconvolute contributions from various conformational states, and here we mainly focus on those methods.

A conformational ensemble can be defined by a set of relevant structures/conformers and their respective populations (relative weights). Conceptually, the ensemble selection methods aim at finding the weights for various members of (typically) an in-silico generated set of structures [8] that provide the best match to experimental data. In this chapter we review the so-called reweighting methods for ensemble selection. The name “reweighting” reflects that at the beginning all conformations included in the initial input ensemble are considered possible and with equal a priori probabilities/weights [9]. As the result of analysis, a new weight w_i is assigned to each conformer i, such that the ensemble-averaged predicted data (dpred) match the experimental data (dexpt) within their errors, see Figure 1. This is achieved by predicting experimental data for each member of the input ensemble and finding the appropriate weights (elements of vector w) by solving the relevant optimization problem, as discussed below. Thus, reweighting methods work in an a posteriori manner: an initial pool of structures is generated and experimental data are used to refine the ensemble to a final solution. Note that reweighting methods are different from approaches that include experimental data as restraints when sampling the conformational space using molecular dynamics or Monte-Carlo simulations [10].

Figure 1:

Illustration of the workflow of reweighting methods for determining the conformational ensemble based on experimental data. (From left to right) Step 1: An initial ensemble of N possible conformers is generated. Step 2: Given the type of experimental data available, the corresponding data are predicted for each conformer of the initial ensemble. From the mathematical perspective, the experimental data can be considered as a vector (dexpt), and the predicted data can be combined to form a matrix (A) in which columns 1 through N represent predicted data for each of the N conformers. Step 3: The relative populations/weights for the conformers (vector w of length N) are determined by solving the optimization problem of matching the experimental data to the conformationally weighted predicted data for the entire ensemble, dpred = Aw. The agreement between dexpt and dpred is quantified through χ2, where σ’s represent experimental errors. Finally, step 4: The resulting ensemble is analyzed and the relevant structures are visualized. The input experimental data can be in the form of various observables (for example, RDC, PCS, PRE data for various residues/atoms in a protein or SAS data for various values of the scattering vector length q) analyzed separately or a combination of such data analyzed together, as exemplified here for NMR and SAS data [6, 11].
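The matching step described in the caption can be illustrated with a minimal Python sketch. It assumes that χ2 is the plain sum of squared, error-normalized residuals between dexpt and dpred = Aw (no normalization by the number of data points); the function names and the toy numbers are hypothetical and are not taken from any of the cited programs.

```python
def predicted_data(A, w):
    """Ensemble-averaged prediction d_pred = A w.

    A[j][i] is the value of observable j predicted for conformer i,
    so each column of A corresponds to one conformer.
    """
    return [sum(a_ji * w_i for a_ji, w_i in zip(row, w)) for row in A]


def chi_squared(A, w, d_expt, sigma):
    """Sum of squared residuals, each scaled by its experimental error."""
    d_pred = predicted_data(A, w)
    return sum(((p - e) / s) ** 2 for p, e, s in zip(d_pred, d_expt, sigma))


# Toy input (made-up numbers): 3 observables, 2 conformers.
A = [[1.0, 3.0],
     [2.0, 0.0],
     [0.0, 4.0]]
d_expt = [2.0, 1.0, 2.0]
sigma = [0.1, 0.1, 0.1]

# An equal mixture reproduces the toy data; a single conformer does not.
print(chi_squared(A, [0.5, 0.5], d_expt, sigma))
print(chi_squared(A, [1.0, 0.0], d_expt, sigma))
```

In this toy case neither conformer alone matches the data, while their weighted average does, which is the situation reweighting methods are designed to handle.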

2. Reweighting methods

A variety of methods for ensemble selection using reweighting have been developed. They generally range from those searching for a smallest-size ensemble (based on the Maximum Parsimony or Occam’s razor principle) that can reproduce the experimental data to methods searching for solutions that encompass the entire input set of conformers (using the Maximum Entropy principle). These principles reflect two extremes in approaching the complicated problem of treating structurally heterogeneous macromolecular systems. The Maximum Entropy principle provides an intuitively meaningful approximation of the generally continuous distribution of structures. However, due to the sheer size of the ensemble the resulting solutions could be difficult to visualize and interpret without performing additional analyses, including clustering/discretization. Furthermore, such solutions often contain numerous conformers having almost negligible weights, and might eventually require a further analysis to separate significant from less significant conformers [12]. On the other hand, the typically small discrete set of structures resulting from the Maximum Parsimony principle, while clearly a simplification that picks conformers making major contributions to the measured data, has the appeal of yielding a solution with a number of conformers that is easier to visualize and interpret.

At the heart of finding the weights for various conformers is the “matching game” of minimizing the difference between the experimental and ensemble-averaged predicted data, quantified as χ2(w) (see Fig. 1). This has been generally approached from two different directions: (A) solving the following minimization problem:

$$\mathbf{w} = \underset{\mathbf{w} \ge 0}{\arg\min}\,\left\{\chi^2(\mathbf{w}) + F(\mathbf{w})\right\} \qquad (1)$$

where F(w) is a regularization term included to prevent overfitting (see below); or (B) by maximizing the probability p(w | dexpt, E) of finding the proper combination of weights w, given the experimental data dexpt and ensemble E:

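$$\mathbf{w} = \underset{\mathbf{w} \ge 0}{\arg\max}\,\left\{p(\mathbf{w} \mid \mathbf{d}_{\mathrm{expt}}, E)\right\} \qquad (2)$$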

where the solution with the highest probability is the optimal solution. To solve the latter problem, Bayesian inference-based methods are used, and the model is often assumed to be a Gaussian-type function that depends on (dpred − dexpt), such that by maximizing the probability, minimization of χ2(w) is achieved under regularization functions that depend on the data and implementation.
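As a rough illustration of approach (A), the sketch below minimizes χ2(w) subject to w ≥ 0 with the regularization term omitted (F(w) = 0), using plain projected gradient descent. This is a generic textbook optimizer chosen for brevity, not the algorithm of any program cited here; the function name `fit_weights`, the iteration count, and the toy matrix are assumptions made for the example. No sum-to-one constraint is imposed, since Eq. 1 as written only requires non-negative weights.

```python
def fit_weights(A, d_expt, sigma, n_iter=2000):
    """Minimise chi^2(w) subject to w >= 0 by projected gradient descent."""
    n_obs, n_conf = len(A), len(A[0])
    # Scale each row by its experimental error so that chi^2 = |B w - b|^2.
    B = [[A[j][i] / sigma[j] for i in range(n_conf)] for j in range(n_obs)]
    b = [d_expt[j] / sigma[j] for j in range(n_obs)]
    # Conservative step size: 2 * (squared Frobenius norm of B) bounds the
    # Lipschitz constant of the gradient, so the iteration cannot diverge.
    step = 1.0 / (2.0 * sum(x * x for row in B for x in row))
    w = [1.0 / n_conf] * n_conf  # start from equal weights
    for _ in range(n_iter):
        resid = [sum(B[j][i] * w[i] for i in range(n_conf)) - b[j]
                 for j in range(n_obs)]
        grad = [2.0 * sum(B[j][i] * resid[j] for j in range(n_obs))
                for i in range(n_conf)]
        # Gradient step followed by projection onto w >= 0.
        w = [max(0.0, w[i] - step * grad[i]) for i in range(n_conf)]
    return w


# Toy input: synthetic, noise-free data generated from known weights.
A = [[1.0, 0.0, 2.0],
     [0.0, 1.0, 1.0],
     [2.0, 1.0, 0.0]]
w_true = [0.7, 0.0, 0.3]
d_expt = [sum(a * w for a, w in zip(row, w_true)) for row in A]
sigma = [0.1, 0.1, 0.1]

w_fit = fit_weights(A, d_expt, sigma)
print([round(w, 3) for w in w_fit])
```

With a well-conditioned toy matrix and noise-free data the fit should return weights close to `w_true`. Real problems are underdetermined and noisy, which is why the regularization term F(w) discussed below is needed.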

2.1. Maximum Parsimony

In the search for the smallest number of conformers necessary to ‘explain’ experimental data, constraints that limit the resulting ensemble size have to be imposed depending on the approach. When minimizing χ2 using Eq 1, regularization is directly imposed by finding solutions for a fixed size (M) of the resulting ensemble (i.e. the number of nonzero elements in w), and screening various M values to determine the smallest M that provides a match between dexpt and dpred within experimental errors (see, for example: [12, 13]). Finding the right size solutions can be tricky, and L-curve based methods [12, 14], initial guesses etc. have been used to achieve that. For an initial ensemble of N conformers, testing all possible N!/(M!(N−M)!) combinations for a given solution size M could be intractable even for N as small as ∼100, and greedy-type algorithms (e.g. [12]) reducing the computational complexity are used to efficiently select possible relevant combinations of conformers while minimizing the risk of missing proper solutions.
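To give a sense of scale for the exhaustive search, the short sketch below evaluates the number of distinct conformer subsets, N!/(M!(N−M)!), for N = 100 and a few ensemble sizes M. The helper name `n_subsets` is hypothetical; `math.comb` is part of the Python standard library (version 3.8 and later).

```python
import math


def n_subsets(n_conformers, ensemble_size):
    """Number of ways to pick `ensemble_size` conformers out of `n_conformers`."""
    return math.comb(n_conformers, ensemble_size)


for m in (2, 5, 10):
    print(m, n_subsets(100, m))
# M = 2 gives 4950 subsets and M = 5 gives 75287520;
# for M = 10 the count already exceeds 10**13.
```

Each of these subsets would require its own weight optimization, which is why an exhaustive search quickly becomes impractical and greedy strategies are used instead.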

In the probabilistic approach (Eq 2), the ensemble size reduction is used as a means to simplify the probability p(w | dexpt, E), such that convergence of the assumed probability with the true probability is achieved [11, 15]. The weights are optimized by reducing the ensemble size iteratively using a cutoff threshold for w_i until an optimal size is reached.

Examples of Maximum Parsimony-based methods using χ2 minimization include SES (Sparse Ensemble Selection) [12], Sample and Select [16], ASTEROIDS (A Selection Tool for Ensemble Representations Of Intrinsically Disordered States) [7], MES (Minimum Ensemble Search) [17], MESMER (Minimal Ensemble Solutions to Multiple Experimental Restraints) [18], and EOM (Ensemble Optimization Method) [19, 20], while Bayesian inference-based methods (maximizing the probability p(w | dexpt, E)) include BW (Bayesian Weighting) [21] and BioEM (Bayesian inference of Electron Microscopy) [22].

A different approach called Maximum Occurrence (MaxOcc) [23] has also been developed. Instead of finding an ensemble solution, the focus is on determining the maximum possible weight a conformer from a predefined set can have as part of an ensemble. After finding the conformers with the highest possible weights, the method can be combined with MaxOR and MinOR (Maximum and Minimum Occurrence of a Region) [24] to zoom in on the respective regions of the conformational space that provide a match to experimental data.

It should be mentioned here that Maximum Parsimony-based methods could produce multiple solutions with comparable values of the target function [12], and care should be exercised to validate them by comparison with other experimental data (e.g., [25, 26]) as well as with the outcomes of the Maximum Entropy-based analysis (see e.g. [12]).

2.2. Maximum Entropy

In this approach, when minimizing χ2(w) that contains contributions from the entire input ensemble, a relative entropy term of the form

$$F(\mathbf{w}) = \lambda \sum_{i=1}^{N} w_i \log\left(\frac{w_i}{p_i}\right)$$

is included as the regularizer in Eq 1, where λ > 0 is a regularization parameter that can be obtained using an L-curve method [27] and p_i is a prior probability (p_i = 1/N for a uniform distribution).
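The behavior of this regularizer can be seen in a small numerical sketch. It evaluates the relative entropy term for a uniform prior, using the usual convention that a zero weight contributes nothing to the sum; the function names and the value λ = 1 are illustrative assumptions only.

```python
import math


def relative_entropy(w, p):
    """Sum over i of w_i * log(w_i / p_i), with 0 * log(0) taken as 0."""
    return sum(w_i * math.log(w_i / p_i)
               for w_i, p_i in zip(w, p) if w_i > 0.0)


def max_entropy_penalty(w, p, lam):
    """Regularization term F(w) = lambda * relative entropy of w versus p."""
    return lam * relative_entropy(w, p)


N = 4
prior = [1.0 / N] * N  # uniform prior, p_i = 1/N

# Weights equal to the prior carry no penalty.
print(max_entropy_penalty(prior, prior, lam=1.0))

# Putting all weight on one conformer gives the largest penalty, log(N).
sparse = [1.0, 0.0, 0.0, 0.0]
print(max_entropy_penalty(sparse, prior, lam=1.0))
```

The penalty vanishes when the weights equal the prior and grows as weight concentrates on fewer conformers, so larger λ keeps the solution closer to the full input ensemble.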

When solving the problem by maximizing the probability p(w | dexpt, E), the Bayesian inference principle is applied, and regularization is achieved by using relative entropy as a measure of deviation between the probability density and a reference distribution [28].

In both cases, the reweighting is performed for the entire initial ensemble. The implementation of these methods has been done using various approaches (e.g., [29]) to facilitate the solution convergence. Examples of methods using the Maximum Entropy principle to minimize χ2(w) include: EROS (Ensemble-Refinement Of SAXS) [30], COPER (Convex Optimization for Ensemble Reweighting) [31], ENSEMBLE [32], and MaxEnt (Maximum Entropy) [12]. Methods maximizing p(w | dexpt, E) are used in BE-SAXS (Bayesian Ensemble SAXS) [33], BELT (Bayesian Energy Landscape Tilting) [34], Bayesian ensemble refinement [28], BME (Bayesian/Maximum Entropy) [35], and its iterative version, iBME [36].

In the methods discussed here reweighting is primarily used to select conformers from an initial unbiased ensemble of structures. However, reweighting can also be implemented when generating ensembles using molecular dynamics simulations [37, 38].

3. Types of experimental data

Below we briefly review various types of experimental data typically used for ensemble determination.

3.1. Nuclear Magnetic Resonance (NMR) Spectroscopy

NMR spectroscopy is a versatile technique capable of detecting and providing information on every magnetically active nucleus in a molecule. It allows studies of proteins in their native milieu and usually with no need for chemical modifications. Solution NMR experiments provide various types of data/constraints for structure/ensemble analysis. These include information on inter-atomic distances (Nuclear Overhauser Effect (NOE)), bond orientations (Residual Dipolar Coupling (RDC)) [25, 26, 39] and dynamics (spin-relaxation rates [40–42]), as well as distances between protein atoms and a paramagnetic moiety (metal ion or radical) in the form of pseudo-contact shift (PCS) [43–47] and paramagnetic relaxation enhancement (PRE) [40, 44, 45]. RDCs can be caused by weak molecular alignment in anisotropic media (such as liquid crystalline media, stretched gels, etc.) or by the anisotropy of magnetic susceptibility of the paramagnetic moiety (reference [13] compares RDCs caused by steric versus paramagnetic alignment and PCSs as constraints for ensemble selection).

The interconversion among various conformational states of a protein is typically fast on the time scale of the NMR experiment, resulting in the measured values of the observables being averaged over the conformational ensemble. Several dual-domain proteins have been used as ‘toy’ systems for developing tools for ensemble analysis using various NMR data, including calmodulin [11, 13] and covalently linked ubiquitin dimers [12, 25, 26, 48] among others.

3.2. Small-Angle Scattering: SAXS and SANS

Data from small-angle scattering (SAS) of X-rays (SAXS) and neutrons (SANS) in solution are also commonly used for ensemble selection [25, 36, 49, 50]. Although SAS measurements do not yield atomic-resolution structures, this technique is capable of providing information on the overall size and shape of the molecule [51]. The measured SAS data come from all molecules in the sample, thus reflecting all possible conformational states. The scattering profile or the reconstructed pair distance distribution function can then be used for ensemble selection either on their own or combined with other types of data, for example, from NMR measurements [5, 6, 25, 52, 53]. SAS data have also been used for validation and ranking of the NMR-derived ensemble solutions [5, 25, 53].

3.3. Förster Resonance Energy Transfer (FRET)

FRET measures distance-dependent energy transfer between the donor and acceptor fluorophores attached to specific sites on a molecule [54, 55]. Bulk/ensemble FRET data are averaged over a large number of molecules, while single-molecule FRET (smFRET) can detect those distances in individual molecules, thus uniquely sensing the conformational states of each molecule with both spatial and temporal resolution. This enables smFRET to directly sample the conformational ensemble of a macromolecule (e.g. [55]). However, this capability is limited by the time resolution of the measurement, and for flexible molecules, like IDPs, that interconvert among different states on a faster time scale, the measured smFRET efficiency becomes averaged over a range of distances/conformations [7]. Therefore, ensemble FRET and smFRET can be used as input in ensemble selection methods and also to validate ensemble solutions derived from SAS, NMR, or other data.
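The effect of conformational averaging on FRET data can be sketched numerically. The example assumes the standard Förster relation E = 1/(1 + (r/R0)^6) between transfer efficiency and donor-acceptor distance, and a population-weighted average of E over conformers; it ignores dye linker flexibility and orientation effects. The Förster radius of 50 Å and the two distances are hypothetical values chosen for illustration.

```python
def fret_efficiency(r, r0):
    """Transfer efficiency for donor-acceptor distance r and Forster radius r0."""
    return 1.0 / (1.0 + (r / r0) ** 6)


def ensemble_fret(distances, weights, r0):
    """Population-weighted average efficiency over a set of conformers."""
    total = sum(weights)
    return sum(w * fret_efficiency(r, r0)
               for r, w in zip(distances, weights)) / total


R0 = 50.0                 # hypothetical Forster radius, in angstroms
distances = [30.0, 70.0]  # one compact and one extended conformer
weights = [0.5, 0.5]

mean_distance = sum(w * r for r, w in zip(distances, weights)) / sum(weights)
print(ensemble_fret(distances, weights, R0))  # average of the efficiencies
print(fret_efficiency(mean_distance, R0))     # efficiency at the mean distance
```

Because the efficiency depends steeply and nonlinearly on distance, the averaged efficiency differs from the efficiency at the average distance, so predicted FRET data should be computed per conformer before weighting.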

3.4. Double Electron-Electron Resonance (DEER) Spectroscopy

The DEER (aka PELDOR) method measures the distance between two paramagnetic labels attached to select sites in a protein [56, 57]. It utilizes magnetic dipole-dipole interaction between their electron spins and allows extraction of the distance distribution profile directly from the observed modulation of the signal. Unlike smFRET, DEER measurements are performed on frozen samples and the results are averaged over all molecules/conformations. DEER has recently been used to characterize conformational ensembles of structurally heterogeneous protein systems [58, 59].

3.5. Cryo-electron Microscopy (cryo-EM)

Cryo-EM is a powerful technique that has recently revolutionized structural biology. It takes 2D snapshots of single particles frozen in various conformations and therefore could be ideally suited for direct observation and reconstruction of the structural ensembles of macromolecules in unprecedented detail. However, converting the low-contrast 2D images into high-resolution 3D density maps/structures of the individual states requires the ability to detect numerous molecules trapped in the same conformational state. This works well for relatively rigid macromolecular systems (e.g. the proteasome) or for highly populated states of a dynamic system, but the reconstruction of less populated conformers or transition states still presents a significant challenge. Addressing this problem requires integrative approaches using ensemble selection methods [60, 61].

It should be noted that the accuracy of ensemble reweighting methods depends on the ability to generate realistic predicted values, which is more challenging for some experimental techniques than for others. For example, simplified physical models (disks, rods) used to simulate steric alignment can limit accuracy of the predicted RDCs [62, 63], and prediction of paramagnetic effects (PCSs, PREs, RDCs, DEER), while based on exact equations, can be affected by the intrinsic conformational heterogeneity of some paramagnetic tags/moieties [44]. In addition, data predicted for multidomain structures generated using rigid-body Monte-Carlo-based methods might not faithfully account for internal motions (of loops, side chains, etc.) within segments that were kept rigid.

4. Outlook: progress and challenges

Past reviews of the subject from 2015–2017 [64, 65] proposed several community goals for further development of ensemble determination methods. Below we highlight recent progress toward these goals as well as some challenges.

Validation of the derived structural ensembles is essential, and recent studies [6, 7, 49] exhibit advances in this direction. An encouraging trend is to use more than one category of ensemble selection methods, for example, one based on Maximum Entropy and one using Maximum Parsimony. Another approach to validation is by using different types of experimental data, for example, comparing predicted data for ensembles derived using NMR and SAS with experimental smFRET and/or DEER data. The use of the probabilistic/Bayesian approach has significantly enhanced the capability of ensemble selection methods by allowing different types of data to be combined together and new types of data (e.g., cryo-EM) to be incorporated. Coverage of the experimentally relevant conformational space by the initial ensemble is essential for the success of reweighting methods. However, finding proper means to assess and visualize the completeness of conformational coverage has proven challenging. An advance in visual and quantitative evaluation of conformational coverage was made in a 2017 study [8] that used density plots obtained by dividing the 3D space into voxels to count the number of occupied voxels as a function of the generated ensemble size.
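The voxel-counting idea mentioned above can be sketched as follows. This is only an illustration of the general principle, not the procedure of reference [8]: each conformer is represented here by a list of 3D points (for example, selected atom positions), and the function names, voxel size, and coordinates are all hypothetical.

```python
import math


def occupied_voxels(points, voxel_size):
    """Set of integer voxel indices containing at least one point."""
    return {tuple(math.floor(c / voxel_size) for c in p) for p in points}


def coverage_curve(conformer_points, voxel_size):
    """Cumulative number of occupied voxels after adding each conformer."""
    seen = set()
    curve = []
    for points in conformer_points:
        seen |= occupied_voxels(points, voxel_size)
        curve.append(len(seen))
    return curve


# Toy input: five conformers, each reduced to a single point (made-up values).
conformers = [
    [(0.5, 0.5, 0.5)],
    [(0.7, 0.2, 0.1)],
    [(5.5, 0.5, 0.5)],
    [(5.9, 0.1, 0.3)],
    [(-0.5, 0.5, 0.5)],
]
print(coverage_curve(conformers, voxel_size=1.0))
```

A curve that levels off as conformers are added suggests that the generated ensemble is approaching saturation of the sampled space at the chosen voxel size, whereas a curve that keeps rising indicates that more sampling may be needed.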

Visual representation of the ensemble solutions is another challenge, especially when considering the full ensemble (Maximum Entropy), due to the dimensionality of the conformational distribution space. Even a simple dual-domain protein requires six degrees of freedom to specify the orientation and spatial location of one domain relative to another. Thus, there is a need for reduced-dimensionality representations. An example of such an approach, proposed in a 2018 paper [66], utilizes a disk-on-sphere model to visualize probability distributions of interdomain orientations in a dual-domain protein, and enables representation of continuous distributions.

Ensemble determination methods thus far have focused on soluble proteins and RNA (not discussed here). However, tools are much needed to elucidate the structure and conformational ensembles of membrane proteins which, despite constituting about a quarter of all proteins, remain largely unexplored structurally [67, 68]. These systems are particularly difficult to study for a number of reasons including flexibility, instability, and the need to extract them from the cell membrane while maintaining or closely mimicking their native environments in order to preserve their structure and function. Developing experimental and computational approaches to tackle membrane proteins should be one of the next frontiers for structural biology.

Several web portals are available at no cost to users. SASSIE [69], URL: https://sassie-web.chem.utk.edu/sassie2 provides efficient tools for generating structural ensembles and for prediction and analysis of SAS data; ATSAS online [70], URL: https://www.embl-hamburg.de/biosaxs/atsas-online/ implements RanCh and GAJOE in the EOM method [19] to generate an ensemble of structures and perform ensemble selection. As for ensemble selection, NMRsuite, URL: https://nmrsuite.genapp.rocks/ enables ensemble determination using MaxEnt and SES methods; WeNMR, URL: http://py-enmr.cerm.unifi.it/access/index/maxocc performs the MaxOcc analysis [23]. Other resources and tutorials are also available online.

As recent publications demonstrate [7, 20, 35, 36], various methods for ensemble determination have been developed in recent years. These are exciting advancements in the methodology, and it is important that new implementations do not stay hidden within a specific research group; otherwise combining these tools could become a challenge. Therefore, although the potential of ensemble selection methods is vast and multiple developments are already available, new ideas have yet to reach their full potential impact.

Outside the method development community, ensemble selection methods have yet to reach the broad community of potential users. Published applications often use ‘toy’ models for the purpose of testing new methods, although encouraging recent extensions to other protein systems are found in references [6, 7, 49, 50]. Scientists studying structurally heterogeneous protein systems are generally focused on directly using data as restraints and are often unaware that reweighting methods can help answer their questions and at a lesser computational cost. Making the reweighting methods known and accessible to the scientific community requires presenting a clear picture of what each method does, what it is best used for, and what the limitations are.

Acknowledgements

This work was supported by NIH grant GM065334 to D.F. D.F. thanks Drs. Emre H. Brookes, Joseph E. Curtis, and Susan Krueger for insightful discussions.

Footnotes

The authors declare no conflict of interest.

References:

  • [1]. Boehr DD, Nussinov R and Wright PE: The role of dynamic conformational ensembles in biomolecular recognition. Nat Chem Biol 2009, 5:789–796.
  • [2]. Brünger AT, Adams PD, Clore GM, DeLano WL, Gros P, Grosse-Kunstleve RW, Jiang JS, Kuszewski J, Nilges M, Pannu NS et al.: Crystallography & NMR System: A New Software Suite for Macromolecular Structure Determination. Acta Crystallogr D 1998, 54:905–921.
  • [3]. Güntert P: Automated NMR Structure Calculation With CYANA. In Protein NMR Techniques. Edited by Downing AK. Humana Press; 2004:353–378.
  • [4]. Dominguez C, Boelens R and Bonvin AMJJ: HADDOCK: A Protein-Protein Docking Approach Based on Biochemical or Biophysical Information. J Am Chem Soc 2003, 125:1731–1737.
  • [5]. Boughton AJ, Krueger S and Fushman D: Branching via K11 and K48 Bestows Ubiquitin Chains with a Unique Interdomain Interface and Enhanced Affinity for Proteasomal Subunit Rpn1. Structure 2020, 28:29–43.e6. *Structural studies combining X-ray crystallography, solution NMR, and SANS revealed a novel interdomain interface in K11/K48-linked branched tri-ubiquitin. However, the crystal and NMR-derived structures of this compact state of the tri-ubiquitin were significantly different from each other and neither agreed with SANS data. A multi-state ensemble comprising compact and open conformers was needed to achieve agreement with SANS data.
  • [6]. Gomes GNW, Krzeminski M, Namini A, Martin EW, Mittag T, Head-Gordon T, Forman-Kay JD and Gradinaru CC: Conformational Ensembles of an Intrinsically Disordered Protein Consistent with NMR, SAXS, and Single-Molecule FRET. J Am Chem Soc 2020, 142:15697–15710. **An integrative approach combining modeling with SAXS, NMR, and smFRET data is developed and applied to characterize the intrinsically disordered N-terminal region of protein Sic1 and (phosphorylated) pSic1. When analyzed separately, SAXS and smFRET data yield discrepant inferences of Sic1 and pSic1 global dimensions. However, a joint refinement of SAXS and NMR/PRE data produced conformational ensembles that are consistent with smFRET and other experimental data, demonstrating the utility of integrating diverse experimental data for characterization of IDPs.
  • [7]. Naudi-Fabra S, Tengo M, Jensen MR, Blackledge M and Milles S: Quantitative Description of Intrinsically Disordered Proteins Using Single-Molecule FRET, NMR, and SAXS. J Am Chem Soc 2021, 143:20109–20121. **An integrated approach using a combination of FRET, NMR, and SAXS data in conjunction with the ASTEROIDS selection method is developed and applied to characterize the conformational landscape of an intrinsically disordered N-terminal region of the measles virus phosphoprotein. This detailed study shows how to properly generate predicted smFRET data for the ensemble of an IDP and how combining with PRE data helps overcome smFRET limitations, to obtain conformational ensembles that agree with all these data.
  • [8].Zhang W, Howell SC, Wright DW, Heindel A, Qiu X, Chen J and Curtis JE: Combined Monte Carlo/torsion-angle molecular dynamics for ensemble modeling of proteins, nucleic acids and carbohydrates. J Mol Graph Model 2017, 73:179–190. [DOI] [PubMed] [Google Scholar]
  • [9].Daughdrill GW, Kashtanov S, Stancik A, Hill SE, Helms G, Muschol M, Receveur-Bréchot V and Ytreberg FM: Understanding the structural ensembles of a highly extended disordered protein. Mol Biosyst 2012, 8:308–319. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [10].Kaynak BT, Krieger JM, Dudas B, Dahmani ZL, Costa MGS, Balog E, Scott AL, Doruker P, Perahia D and Bahar I: Sampling of Protein Conformational Space Using Hybrid Simulations: A Critical Assessment of Recent Methods. Front Mol Biosci 2022, 9:832847. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [11].Potrzebowski W, Trewhella J and Andre I: Bayesian inference of protein conformational ensembles from limited structural data. PLoS Comput Biol 2018, 14:e1006641.
  • [12].Berlin K, Castañeda CA, Schneidman-Duhovny D, Sali A, Nava-Tudela A and Fushman D: Recovering a Representative Conformational Ensemble from Underdetermined Macromolecular Structural Data. J Am Chem Soc 2013, 135:16595–16609.
  • [13].Andralojć W, Berlin K, Fushman D, Luchinat C, Parigi G, Ravera E and Sgheri L: Information content of long-range NMR data for the characterization of conformational heterogeneity. J Biomol NMR 2015, 62:353–371.
  • [14].Hansen PC: The L-Curve and its Use in the Numerical Treatment of Inverse Problems. In Computational Inverse Problems in Electrocardiology. Edited by Johnston P. WIT Press; 2000:119–142.
  • [15].Fisher CK, Ullman O and Stultz CM: Efficient construction of disordered protein ensembles in a Bayesian framework with optimal selection of conformations. Pac Symp Biocomput 2012, 82–93.
  • [16].Chen Y, Campbell SL and Dokholyan NV: Deciphering Protein Dynamics from NMR Data Using Explicit Structure Sampling and Selection. Biophys J 2007, 93:2300–2306.
  • [17].Pelikan M, Hura GL and Hammel M: Structure and flexibility within proteins as identified through small angle X-ray scattering. Gen Physiol Biophys 2009, 28:174–189.
  • [18].Ihms EC and Foster MP: MESMER: minimal ensemble solutions to multiple experimental restraints. Bioinformatics 2015, 31:1951.
  • [19].Bernadó P, Mylonas E, Petoukhov MV, Blackledge M and Svergun DI: Structural Characterization of Flexible Proteins Using Small-Angle X-ray Scattering. J Am Chem Soc 2007, 129:5656–5664.
  • [20].Sagar A, Jeffries CM, Petoukhov MV, Svergun DI and Bernadó P: Comment on the Optimal Parameters to Derive Intrinsically Disordered Protein Conformational Ensembles from Small-Angle X-ray Scattering Data Using the Ensemble Optimization Method. J Chem Theory Comput 2021, 17:2014–2021.
  • [21].Fisher CK, Huang A and Stultz CM: Modeling Intrinsically Disordered Proteins with Bayesian Statistics. J Am Chem Soc 2010, 132:14919–14927.
  • [22].Cossio P and Hummer G: Bayesian analysis of individual electron microscopy images: Towards structures of dynamic and heterogeneous biomolecular assemblies. J Struct Biol 2013, 184:427–437.
  • [23].Bertini I, Ferella L, Luchinat C, Parigi G, Petoukhov M, Ravera E, Rosato A and Svergun DI: MaxOcc: a web portal for maximum occurrence analysis. J Biomol NMR 2012, 53:271–280.
  • [24].Andralojć W, Luchinat C, Parigi G and Ravera E: Exploring Regions of Conformational Space Occupied by Two-Domain Proteins. J Phys Chem B 2014, 118:10576–10587.
  • [25].Castañeda CA, Chaturvedi A, Camara CM, Curtis JE, Krueger S and Fushman D: Linkage-specific conformational ensembles of non-canonical polyubiquitin chains. Phys Chem Chem Phys 2016, 18:5771–5788.
  • [26].Castañeda CA, Dixon EK, Walker O, Chaturvedi A, Nakasone MA, Curtis JE, Reed MR, Krueger S, Cropp TA and Fushman D: Linkage via K27 Bestows Ubiquitin Chains with Unique Properties among Polyubiquitins. Structure 2016, 24:423–436.
  • [27].Chiang YW, Borbat PP and Freed JH: Maximum entropy: A complement to Tikhonov regularization for determination of pair distance distributions by pulsed ESR. J Magn Reson 2005, 177:184–196.
  • [28].Hummer G and Köfinger J: Bayesian ensemble refinement by replica simulations and reweighting. J Chem Phys 2015, 143:243150.
  • [29].Byrd R, Hribar M and Nocedal J: An Interior Point Algorithm for Large-Scale Nonlinear Programming. SIAM J Optim 1999, 9:877–900.
  • [30].Różycki B, Kim YC and Hummer G: SAXS Ensemble Refinement of ESCRT-III CHMP3 Conformational Transitions. Structure 2011, 19:109–116.
  • [31].Leung HTA, Bignucolo O, Aregger R, Dames SA, Mazur A, Bernèche S and Grzesiek S: A Rigorous and Efficient Method To Reweight Very Large Conformational Ensembles Using Average Experimental Data and To Determine Their Relative Information Content. J Chem Theory Comput 2016, 12:383–394.
  • [32].Choy WY and Forman-Kay JD: Calculation of ensembles of structures representing the unfolded state of an SH3 domain. J Mol Biol 2001, 308:1011–1032.
  • [33].Antonov LD, Olsson S, Boomsma W and Hamelryck T: Bayesian inference of protein ensembles from SAXS data. Phys Chem Chem Phys 2016, 18:5832–5838.
  • [34].Beauchamp KA, Pande VS and Das R: Bayesian Energy Landscape Tilting: Towards Concordant Models of Molecular Ensembles. Biophys J 2014, 106:1381–1390.
  • [35].Bottaro S, Bengtsen T and Lindorff-Larsen K: Integrating Molecular Simulation and Experimental Data: A Bayesian/Maximum Entropy Reweighting Approach. In Structural Bioinformatics. Edited by Gáspári Z. Springer US; 2020:219–240.
  • [36].Pesce F and Lindorff-Larsen K: Refining conformational ensembles of flexible proteins against small-angle x-ray scattering data. Biophys J 2021, 120:5124–5135. *A method is presented for obtaining conformational ensembles of flexible proteins by combining SAXS data with computer modeling, using the Bayesian Maximum Entropy (BME) approach and improved prediction of SAXS data. The method is applied to several IDPs and a multi-domain protein containing flexible linkers. An iterative version of BME (iBME) is proposed in which minimization of χ² is first used to rescale the predicted data, and BME is then used to optimize the weights and calculate a new solution.
  • [37].Rangan R, Bonomi M, Heller GT, Cesari A, Bussi G and Vendruscolo M: Determination of Structural Ensembles of Proteins: Restraining vs Reweighting. J Chem Theory Comput 2018, 14:6632–6641.
  • [38].Cesari A, Reißer S and Bussi G: Using the Maximum Entropy Principle to Combine Simulations and Solution Experiments. Computation 2018, 6:15.
  • [39].Tjandra N and Bax A: Direct measurement of distances and angles in biomolecules by NMR in a dilute liquid crystalline medium. Science 1997, 278:1111–1114.
  • [40].Fushman D, Varadan R, Assfalg M and Walker O: Determining domain orientation in macromolecules by using spin-relaxation and residual dipolar coupling measurements. Prog Nucl Magn Reson Spectrosc 2004, 44:189–214.
  • [41].Ryabov YE and Fushman D: A Model of Interdomain Mobility in a Multidomain Protein. J Am Chem Soc 2007, 129:3315–3327.
  • [42].Kauffmann C, Zawadzka-Kazimierczuk A, Kontaxis G and Konrat R: Using Cross-Correlated Spin Relaxation to Characterize Backbone Dihedral Angle Distributions of Flexible Protein Segments. ChemPhysChem 2021, 22:18–28. *A methodology is developed for using cross-correlated spin-relaxation (CCR) effects in conjunction with a Maximum Entropy-based ensemble reweighting approach to characterize the distribution of backbone dihedral angles in a protein. Ubiquitin is used as an example in which both flexible and rigid regions are analyzed. This study also demonstrates the utility of the less commonly used CCR data as experimental constraints to guide ensemble analysis.
  • [43].Parigi G, Ravera E and Luchinat C: Magnetic susceptibility and paramagnetism-based NMR. Prog Nucl Magn Reson Spectrosc 2019, 114–115:211–236.
  • [44].Miao Q, Nitsche C, Orton H, Overhand M, Otting G and Ubbink M: Paramagnetic Chemical Probes for Studying Biological Macromolecules. Chem Rev 2022, 122:9571–9642.
  • [45].Orton HW, Huber T and Otting G: Paramagpy: software for fitting magnetic susceptibility tensors using paramagnetic effects measured in NMR spectra. Magn Reson 2020, 1:1–12. *A new publicly available software package for analysis of paramagnetic NMR data (PCS, PRE, RDC and CCR) is presented. Given experimental PCS data and a molecular structure, the software determines the magnetic susceptibility tensor and predicts related NMR data that can be used in ensemble selection methods.
  • [46].Müntener T, Joss D, Häussinger D and Hiller S: Pseudocontact Shifts in Biomolecular NMR Spectroscopy. Chem Rev 2022, 122:9422–9467.
  • [47].Pilla KB, Leman JK, Otting G and Huber T: Capturing Conformational States in Proteins Using Sparse Paramagnetic NMR Data. PLoS One 2015, 10:1–16.
  • [48].Hou XN, Sekiyama N, Ohtani Y, Yang F, Miyanoiri Y, Akagi Ki, Su XC and Tochio H: Conformational Space Sampled by Domain Reorientation of Linear Diubiquitin Reflected in Its Binding Mode for Target Proteins. ChemPhysChem 2021, 22:1505–1517. **Paramagnetic NMR data were used to characterize the conformations of linear (head-to-tail linked) di-ubiquitin. Two sets of PCS and RDC data, obtained from lanthanide tagging at two different positions, were analyzed together using the ensemble selection methods MESMER and MaxOcc. The results agree with published SAXS and smFRET data. Comparison with crystal structures of di-ubiquitin in complexes with binding partners suggests the existence of conformers prealigned for ligand binding.
  • [49].Sicorello A, Różycki B, Konarev PV, Svergun DI and Pastore A: Capturing the Conformational Ensemble of the Mixed Folded Polyglutamine Protein Ataxin-3. Structure 2021, 29:70–81.e5. **Ensemble selection methods (EROS and EOM) are applied to NMR and SAXS data to study the protein Ataxin-3. The study explores how the conformational plasticity of Ataxin-3 affects its function as it relates to the disease.
  • [50].Hammel M, Amlanjyoti D, Reyes FE, Chen JH, Parpana R, Tang HYH, Larabell CA, Tainer JA and Adhya S: HU multimerization shift controls nucleoid compaction. Sci Adv 2016, 2:e1600650.
  • [51].Jacques DA and Trewhella J: Small-angle scattering for structural biology—Expanding the frontier while avoiding the pitfalls. Protein Sci 2010, 19:642–657.
  • [52].Sterckx YGJ, Volkov AN, Vranken WF, Kragelj J, Jensen MR, Buts L, Garcia-Pino A, Jové T, Van Melderen L, Blackledge M et al.: Small-Angle X-Ray Scattering- and Nuclear Magnetic Resonance-Derived Conformational Ensemble of the Highly Flexible Antitoxin PaaA2. Structure 2014, 22:854–865.
  • [53].Castañeda CA, Kashyap TR, Nakasone MA, Krueger S and Fushman D: Unique Structural, Dynamical, and Functional Properties of K11-Linked Polyubiquitin Chains. Structure 2013, 21:1168–1181.
  • [54].Hellenkamp B, Schmid S, Doroshenko O, Opanasyuk O, Kühnemuth R, Adariani SR, Ambrose B, Aznauryan M, Barth A, Birkedal V et al.: Precision and accuracy of single-molecule FRET measurements-a multi-laboratory benchmark study. Nat Methods 2018, 15:669–676.
  • [55].Ye Y, Blaser G, Horrocks MH, Ruedas-Rama MJ, Ibrahim S, Zhukov AA, Orte A, Klenerman D, Jackson SE and Komander D: Ubiquitin chain conformation regulates recognition and activity of interacting proteins. Nature 2012, 492:266–270.
  • [56].Schiemann O, Heubach CA, Abdullin D, Ackermann K, Azarkh M, Bagryanskaya EG, Drescher M, Endeward B, Freed JH, Galazzo L et al.: Benchmark Test and Guidelines for DEER/PELDOR Experiments on Nitroxide-Labeled Biomolecules. J Am Chem Soc 2021, 143:17875–17890.
  • [57].Pannier M, Veit S, Godt A, Jeschke G and Spiess HW: Dead-time free measurement of dipole-dipole interactions between electron spins. J Magn Reson 2000, 142:331–340.
  • [58].Sweger SR, Pribitzer S and Stoll S: Bayesian Probabilistic Analysis of DEER Spectroscopy Data Using Parametric Distance Distribution Models. J Phys Chem A 2020, 124:6193–6202.
  • [59].Kniss A, Schuetz D, Kazemi S, Pluska L, Spindler PE, Rogov VV, Husnjak K, Dikic I, Güntert P, Sommer T et al.: Chain Assembly and Disassembly Processes Differently Affect the Conformational Space of Ubiquitin Chains. Structure 2018, 26:249–258.e4.
  • [60].Bonomi M, Pellarin R and Vendruscolo M: Simultaneous Determination of Protein Structure and Dynamics Using Cryo-Electron Microscopy. Biophys J 2018, 114:1604–1613.
  • [61].Bonomi M and Vendruscolo M: Determination of protein structural ensembles using cryo-electron microscopy. Curr Opin Struct Biol 2019, 56:37–45.
  • [62].Zweckstetter M: NMR: prediction of molecular alignment from structure using the PALES software. Nat Protoc 2008, 3:679–690.
  • [63].Berlin K, O’Leary DP and Fushman D: Improvement and analysis of computational methods for prediction of residual dipolar couplings. J Magn Reson 2009, 201:25–33.
  • [64].Bonomi M, Heller GT, Camilloni C and Vendruscolo M: Principles of protein structural ensemble determination. Curr Opin Struct Biol 2017, 42:106–116.
  • [65].Ravera E, Sgheri L, Parigi G and Luchinat C: A critical assessment of methods to recover information from averaged data. Phys Chem Chem Phys 2016, 18:5686–5701.
  • [66].Qi Y, Martin JW, Barb AW, Thélot F, Yan AK, Donald BR and Oas TG: Continuous Interdomain Orientation Distributions Reveal Components of Binding Thermodynamics. J Mol Biol 2018, 430:3412–3426.
  • [67].Carpenter EP, Beis K, Cameron AD and Iwata S: Overcoming the challenges of membrane protein crystallography. Curr Opin Struct Biol 2008, 18:581.
  • [68].Liang B and Tamm LK: NMR as a tool to investigate the structure, dynamics and function of membrane proteins. Nat Struct Mol Biol 2016, 23:468–474.
  • [69].Curtis JE, Raghunandan S, Nanda H and Krueger S: SASSIE: A program to study intrinsically disordered biological molecules and macromolecular ensembles using experimental scattering restraints. Comput Phys Commun 2012, 183:382–389.
  • [70].Petoukhov MV, Franke D, Shkumatov AV, Tria G, Kikhney AG, Gajda M, Gorba C, Mertens HDT, Konarev PV and Svergun DI: New developments in the ATSAS program package for small-angle scattering data analysis. J Appl Crystallogr 2012, 45:342.
