Author manuscript; available in PMC: 2020 Feb 5.
Published in final edited form as: Structure. 2018 Dec 6;27(2):359–370.e12. doi: 10.1016/j.str.2018.10.013

Structural characterization of biomolecules through atomistic simulations guided by DEER measurements

Fabrizio Marinelli 1,*, Giacomo Fiorin 1
PMCID: PMC6860373  NIHMSID: NIHMS1510434  PMID: 30528595

Summary

Double Electron-Electron Resonance (DEER) is a popular technique that exploits attached spin-labels to probe the collective dynamics of biomolecules in a native environment. Like most spectroscopic approaches, DEER detects an ensemble of states that reflects both the biomolecular dynamics and the intrinsic flexibility of the labels. Hence, the DEER data alone do not provide high-resolution structural information. To disentangle this variability, we introduce a minimally-biased simulation method to sample a structural ensemble that reproduces multiple experimental signals within the uncertainty. In contrast to previous approaches, our method targets the raw data itself and thereby provides an unbiased molecular interpretation of the experiments.

After validation on the T4-Lysozyme, we apply this technique to interpret recent DEER experiments on a membrane transporter binding-protein (VcSiaP). The results highlight the large-scale conformational movement that occurs upon substrate binding and reveal that the unbound VcSiaP is more open in solution than the X-ray structure.

Keywords: restrained-average dynamics, double electron-electron resonance, coupling simulations and experiments, molecular dynamics, maximum entropy principle, adaptive biasing approach, T4-Lysozyme, VcSiaP, substrate binding protein, tripartite ATP-independent periplasmic transporter


eTOC blurb

Spectroscopic techniques that monitor functional changes of biomolecular structures, such as double electron-electron resonance, are difficult to interpret. To address this problem, Marinelli and Fiorin devised a method to optimally combine experiments and molecular simulations. After benchmarking, they applied the approach to a binding protein, resulting in key mechanistic insights.

Introduction

Spectroscopic methods provide important insights into the molecular mechanisms of biomolecules. Long-range detection approaches, such as fluorescence resonance energy transfer (FRET) and double electron-electron resonance (DEER), monitor spectroscopic probes attached to a biomolecule, furnishing information on its structural dynamics. DEER, also referred to as pulsed electron-electron double resonance (PELDOR), is a paramagnetic resonance technique that exploits the dipolar interaction between pairs of electron spins to measure distances between spin-labels added to a biomolecule (Jeschke, 2012; Jeschke et al., 2002). In contrast to X-ray crystallography and nuclear magnetic resonance (NMR) spectroscopy, DEER does not provide high-resolution structural data and requires chemical alteration of the biomolecule, but it is not limited by the size of the biomolecule and does not require crystallization. The main observable is a time-domain signal that is translated into a distance distribution between pairs of spin-labels, often from 1.5 to 8 nm (Jeschke, 2012) and, for deuterated samples, up to 16 nm (Schmidt et al., 2016). Differences in the distribution under different experimental conditions reveal changes in structure and dynamics across diverse physiological environments. Thus, DEER is a prominent approach to study conformational movements of globular and membrane proteins (Cafiso, 2014; Jeschke, 2012; Mchaourab et al., 2011). Despite these achievements, it is often not straightforward to describe the biomolecular flexibility from the DEER signal, which encompasses an ensemble average of specific observables over multiple states of the biomolecule and spin-label rotamers.

A clear-cut example of this problem is VcSiaP, a protein investigated here. VcSiaP is the substrate-binding protein (SBP) of the tripartite ATP-independent periplasmic (TRAP) transporter, VcSiaPQM, from Vibrio cholerae. X-ray structures (Gangi Setty et al., 2014; Johnston et al., 2008; Müller et al., 2006) and DEER data (Glaenzer et al., 2017) identify an open-to-closed conformational change upon substrate binding. Nonetheless, the DEER signals lump together protein movements and different spin-label conformations. Hence, it is unclear to what degree the X-ray structures agree with the DEER experiments and whether the apo form can transiently visit closed conformations of the protein, affecting transport (Mulligan et al., 2009). The distance distribution estimated by sampling label rotamer libraries (Alexander et al., 2013; Polyhach et al., 2011) on a single structure (e.g. derived from crystallography), or by sampling the label accessible volume (Hagelueken et al., 2012), does not fully resolve this issue (Glaenzer et al., 2017). Unless all relevant protein conformations are considered, the protein structural variability is underestimated.

One of the most accurate approaches is to restrain selected observables in molecular dynamics (MD) simulations so that their mean value matches the experimental one, a strategy widely used for NMR refinement (Schwieters et al., 2006). Yet the typical experimental restraints used, e.g. harmonic potentials, alter not only the mean value of the observable but also the amplitude of its fluctuations (Pitera and Chodera, 2012). The implication is that these schemes introduce more conformational restrictions than the ones actually required to reproduce the experimental data. Thus, they are more suitable for short-range structural descriptors (e.g. from nuclear Overhauser enhancement) that are not associated with flexible regions of the biomolecule. In the case of DEER, applying harmonic restraints may result in overfitting, even in a rigid system, due to the high flexibility of the conventional spin-labels.

Alternatively, the experimental constraints can be enforced by applying a “minimal bias” to an MD simulation. This may be formulated under the framework of a Bayesian (Bonomi et al., 2016; Hummer and Koefinger, 2015) or a maximum entropy approach (Pitera and Chodera, 2012). Nevertheless, there is no consensus on how to include experimental uncertainty or unknown parameters in the simulation (Boomsma et al., 2014; Hummer and Koefinger, 2015). Some of the previous efforts in this direction, such as Ensemble-Biased MetaDynamics (EBMetaD) (Marinelli and Faraldo-Gomez, 2015) and the restrained ensemble technique (Roux and Islam, 2013), target, exactly and with a minimal bias, the probability density of the distance between spin-labels inferred from the DEER signal. The drawback is that such distance distributions are obtained assuming either smoothness (Jeschke et al., 2006) or Gaussian fitting (Brandon et al., 2012) and are strongly affected by the experimental uncertainty.

We propose a minimally biased atomistic MD simulation technique to target directly the DEER signal without the need to formulate a prior structural model. This approach, here called restrained-average dynamics (RAD), is based on an extended formulation of the maximum-entropy principle that includes explicitly the experimental error and unknown model parameters, as for example the DEER background signal (Brandon et al., 2012). The RAD method is computationally efficient and can be applied either to a single simulation or to multiple coupled replicas. Multiple experiments can be used concurrently, and the energetic contributions of each converge to equilibrium values that can be used to assess simulation models easily and rapidly.

After testing this technique on the T4 lysozyme, we applied it to VcSiaP in apo and substrate bound forms, revealing new insights into the structural features of this protein.

Proposed method

Molecular analysis of the DEER signal

The DEER signal is here indicated as $F_{t_d}^{\mathrm{exp}}$, where $t_d$ is the experimental detection time of each data point and is unrelated to the simulation time $t$ (discussed below). At each value of $t_d$, $F_{t_d}^{\mathrm{exp}}$ is a mean value over multiple biomolecular configurations: $F_{t_d}^{\mathrm{exp}}=\langle F_{t_d}(r(X),\Lambda_1,\Lambda_2)\rangle$, where $\langle\ldots\rangle$ stands for ensemble average. The DEER observable, $F_{t_d}(r(X),\Lambda_1,\Lambda_2)$, is the signal originating from a single molecular configuration $X$, which for a dilute solution of a biomolecule with two spin-labels attached is well described by (Milov et al., 1984; Milov and Tsvetkov, 1997):

$$F_{t_d}(r(X),\Lambda_1,\Lambda_2)=\left[(1-\Lambda_1)+\Lambda_1\,k(t_d,r(X))\right]e^{-(\Lambda_2|t_d|)^{D/3}} \qquad (1)$$

The dipolar kernel, $k(t_d,r(X))$, represents the fundamental component of the signal related to the intra-molecular spin-spin dipolar interactions, and $r(X)$ is the distance between the two spin labels in the biomolecule; for MTSSL spin labels (Berliner et al., 1982) it is the distance between the centers of mass of the respective nitroxide groups. The expression for $k(t_d,r(X))$ is provided in the Methods (eq. 47). The amplitude $\Lambda_1$ is called modulation depth and the exponential component represents the background contribution arising from the spin-spin interactions between different biomolecules in the sample. In the latter term, $\Lambda_2$ depends on the spin concentration (Edwards and Stoll, 2016), and $D$ is the dimensionality of the system: $D=3$ for a soluble protein in a homogeneous solution, and $D=2$ for a membrane protein reconstituted into liposomes (Hilger et al., 2007). $\Lambda_1$ and $\Lambda_2$ are usually not known directly, and are determined from fitting procedures (Jeschke et al., 2006).
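As an illustration of eq. 1, the following minimal Python sketch evaluates the single-configuration signal and its average over a set of distances. It is not the authors' implementation. Because eq. 47 is not reproduced here, the kernel below is an assumption: the standard powder-averaged two-spin kernel, $k(t_d,r)=\int_0^1\cos[(1-3z^2)\,\omega_{dd}\,t_d]\,dz$ with $\omega_{dd}=2\pi\cdot 52.04\ \mathrm{MHz\,nm^3}/r^3$, integrated numerically with a midpoint rule. The function names, the units (μs, nm, μs⁻¹ for $\Lambda_2$) and the argument names `depth` ($\Lambda_1$) and `bg_rate` ($\Lambda_2$) are illustrative choices.

```python
import numpy as np

# Assumed dipolar constant for two nitroxide spins (g ~ 2.0023), in MHz nm^3.
DIPOLAR_CONST_MHZ_NM3 = 52.04


def dipolar_kernel(td_us, r_nm, n_points=2000):
    """Powder-averaged kernel k(td, r); an assumed stand-in for eq. 47."""
    td = np.atleast_1d(np.asarray(td_us, dtype=float))
    w_dd = 2.0 * np.pi * DIPOLAR_CONST_MHZ_NM3 / r_nm**3  # rad / us
    # Midpoint rule over z = cos(theta) in [0, 1].
    z = (np.arange(n_points) + 0.5) / n_points
    phase = (1.0 - 3.0 * z**2)[None, :] * w_dd * td[:, None]
    return np.cos(phase).mean(axis=1)


def deer_signal(td_us, r_nm, depth, bg_rate, dim=3):
    """Eq. 1: signal of one configuration with spin-spin distance r_nm."""
    td = np.atleast_1d(np.asarray(td_us, dtype=float))
    background = np.exp(-(bg_rate * np.abs(td)) ** (dim / 3.0))
    return ((1.0 - depth) + depth * dipolar_kernel(td, r_nm)) * background


def ensemble_signal(td_us, distances_nm, depth, bg_rate, dim=3):
    """Ensemble average <F_td> over a set of sampled distances."""
    signals = [deer_signal(td_us, r, depth, bg_rate, dim) for r in distances_nm]
    return np.mean(signals, axis=0)
```

For example, `ensemble_signal(np.linspace(0, 3, 31), [3.0, 3.5, 4.0], 0.3, 0.05)` returns a 31-point trace that starts at 1 and decays toward $(1-\Lambda_1)$ times the background as the dipolar oscillations dephase.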

Restrained-average dynamics

We describe here the basic methodology underlying the optimal bias of MD simulations to reproduce the DEER signal. The RAD approach introduced here builds on concepts outlined in previous studies based on MD simulations (Cesari et al., 2016; Hummer and Koefinger, 2015; Olsson et al., 2015; White and Voth, 2014). Like those methods, RAD can be applied to other types of experiments by replacing $F_{t_d}(r(X),\Lambda_1,\Lambda_2)$ with the respective observables.

An ideal agreement between simulation and experiments implies that the time average of $F_{t_d}(r(X),\Lambda_1,\Lambda_2)$ converges to the experimental signal. This outcome is unlikely, due to inaccuracies of force field and model observables, sampling inefficiencies and/or the experimental uncertainty.

To compensate for such discrepancies with a minimal bias on the MD simulation, we use the maximum entropy principle, from which we define a linear perturbation to the MD energy function, $U_{\mathrm{MD}}(X)$ (Pitera and Chodera, 2012):

$$U_{\mathrm{RAD}}(X)=U_{\mathrm{MD}}(X)-k_BT\sum_{t_d}\lambda_{t_d}F_{t_d}(r(X),\Lambda_1,\Lambda_2). \qquad (2)$$

The bias potential, $V(X)=-k_BT\sum_{t_d}\lambda_{t_d}F_{t_d}(r(X),\Lambda_1,\Lambda_2)$ ($k_B$ is the Boltzmann constant and $T$ is the temperature), results in additional atomic forces that remodel the conformational dynamics of the biomolecule to conform to the experiments. In the absence of uncertainty, the parameters $\lambda_{t_d}$ can be selected so that the time average of $F_{t_d}(r(X),\Lambda_1,\Lambda_2)$ is equal to the experimental value, $F_{t_d}^{\mathrm{exp}}$. If experimental/model errors are not negligible, the parameters $\lambda_{t_d}$ are set so that the latter average converges to an optimal value $\bar{F}_{t_d}^{\mathrm{RAD}}$ (Cesari et al., 2016; Hummer and Koefinger, 2015) that is compatible with the level of uncertainty. Under this requirement, optimal values and model parameters are determined by minimizing the bias on the MD simulation (i.e. optimizing the functional of eq. 8).

Using a gradient descent minimization approach (Pitera and Chodera, 2012), the parameters $\lambda_{t_d}$ are evolved during the simulation until convergence:

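$$\lambda_{t_d}[t+dt]=\lambda_{t_d}[t]-\frac{F_{t_d}(r(X[t]),\Lambda_1[t],\Lambda_2[t])-\bar{F}_{t_d}^{\mathrm{RAD}}[t]}{\sigma_{t_d}^{2}[t]\,\tau[t]}\,dt \qquad (3)$$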

where $t$ is the simulation time, $\sigma_{t_d}^{2}$ is a parameter resembling the variance of $F_{t_d}(r(X),\Lambda_1,\Lambda_2)$ and $\tau[t]$ is the coupling time; $\sigma_{t_d}^{2}$ and $\tau[t]$ control the update rate. To ensure asymptotic convergence of eq. 3, the coupling time is gradually increased as $\tau[t]=\sqrt{\tau_0(\tau_0+t)}$ (Zinkevich, 2003), in which $\tau_0$ is the initial value of $\tau[t]$, reflecting the typical time scale of the fluctuations of $F_{t_d}(r(X[t]),\Lambda_1[t],\Lambda_2[t])$ (and thus of the spin-label distance). Assuming that errors (e.g. experimental and of the model observable) are described by a Gaussian distribution (Edwards and Stoll, 2016), the update rule for the bias targets, $\bar{F}_{t_d}^{\mathrm{RAD}}$, is given by:

$$\bar{F}_{t_d}^{\mathrm{RAD}}[t]=F_{t_d}^{\mathrm{exp}}-\frac{\eta^{2}}{\gamma[t]}\,\lambda_{t_d}[t]. \qquad (4)$$

In eq. 4, η is an estimate of the overall uncertainty. Often for DEER it is dominated by the random noise. Otherwise, it can be inferred from RAD simulations on a minimal model system in which η is obtained by scaling the experimental noise to reasonably fit the experimental data (see Assessing the Uncertainty of the DEER data). The noise component can be assessed from the imaginary part of the signal (Brandon et al., 2012) and is generally constant and uncorrelated between data points (Edwards and Stoll, 2016).

The factor $\gamma[t]$ in eq. 4 is defined by imposing a certain degree of agreement between simulation and experiments (Boomsma et al., 2014), for example by setting $D_{\mathrm{exp}}=(1/N)\sum_{t_d}|\bar{F}_{t_d}^{\mathrm{RAD}}-F_{t_d}^{\mathrm{exp}}|/\eta=\alpha$, where $N$ is the number of data points:

$$\gamma[t]=\frac{\eta}{\alpha N}\sum_{t_d}\left|\lambda_{t_d}[t]\right|. \qquad (5)$$

The tolerance level, α, accounts for the contribution of unknown errors and can be set to avoid an unnecessarily large simulation bias (Hummer and Koefinger, 2015). Here we selected α = 1, considering η as an accurate estimate of the uncertainty. Alternative choices can be based on the outcome of initial simulation runs. The value of α can be then validated a posteriori or changed by reweighting the simulation ensemble (see Tuning the agreement with the experiments from Reweighted Simulations).
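The update rules of eqs. 3–5 can be sketched in a few lines of Python. This is a schematic illustration under stated assumptions, not the authors' code: the function and argument names are hypothetical, a single scalar `sigma2` is used for all data points, and when all $\lambda_{t_d}$ are zero (where $\eta^2/\gamma$ is undefined) the targets are simply set to the experimental values.

```python
import numpy as np


def rad_targets(lam, f_exp, eta, alpha):
    """Eqs. 4-5: bias targets shifted from the data by (eta^2 / gamma) * lambda."""
    lam = np.asarray(lam, dtype=float)
    f_exp = np.asarray(f_exp, dtype=float)
    norm = np.sum(np.abs(lam))
    if norm == 0.0:
        # Assumption: with no bias applied yet, target the data directly.
        return f_exp.copy()
    gamma = eta * norm / (alpha * lam.size)  # eq. 5
    return f_exp - (eta**2 / gamma) * lam    # eq. 4


def rad_update(lam, f_inst, f_exp, t, dt, eta, alpha, sigma2, tau0):
    """Eq. 3: one gradient-descent step for the multipliers lambda_td.

    f_inst holds F_td evaluated on the current configuration X[t].
    """
    tau = np.sqrt(tau0 * (tau0 + t))  # coupling time, grows with simulation time
    f_target = rad_targets(lam, f_exp, eta, alpha)
    lam = np.asarray(lam, dtype=float)
    f_inst = np.asarray(f_inst, dtype=float)
    lam_new = lam - (f_inst - f_target) / (sigma2 * tau) * dt
    return lam_new, f_target
```

By construction, the targets returned by `rad_targets` deviate from the data by exactly the tolerance, i.e. $(1/N)\sum_{t_d}|\bar{F}_{t_d}^{\mathrm{RAD}}-F_{t_d}^{\mathrm{exp}}|/\eta=\alpha$, whenever at least one multiplier is non-zero.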

Non-Gaussian errors, including e.g. measurement correlations, are supported through a more general relation (eq. 24). An expression for $\gamma[t]$ based on $\chi^2$ is also given by eq. 37.

We avoid using a priori estimates of the background correction in the DEER signal, and optimize the parameters Λ1 and Λ2 during the simulation to minimize the discrepancy between simulated and experimental signal:

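$$\Lambda_i[t+dt]=\Lambda_i[t]+\varepsilon_i[t]\sum_{t_d}\lambda_{t_d}[t]\,\frac{\partial F_{t_d}(r(X),\Lambda_1,\Lambda_2)}{\partial\Lambda_i}\,dt. \qquad (6)$$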

The same expression can be used to optimize model parameters for experiments other than DEER: for example, to quantify the molecular alignment with the magnetic field in residual dipolar couplings (Losonczi et al., 1999; Olsson et al., 2015). Owing to the linear dependence of $F_{t_d}(r(X),\Lambda_1,\Lambda_2)$ with respect to the model parameters, eq. 6 can be directly expressed as a function of the time average of $F_{t_d}(r(X),\Lambda_1,\Lambda_2)$ (or $\bar{F}_{t_d}^{\mathrm{RAD}}$; eqs. 53–54). The term $\varepsilon_i[t]$ in eq. 6 regulates the update rate of the parameters; closed-form expressions for it and other specifics are provided in Methods (eqs. 55–57).
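To make the parameter update of eq. 6 concrete, the sketch below evaluates the derivatives of the signal in eq. 1 with respect to the modulation depth and the background parameter, and applies one update step. It is only illustrative: the names are hypothetical, the kernel values `k` are taken as given inputs, and the rates `eps_depth` and `eps_rate` are plain constants standing in for the closed-form $\varepsilon_i[t]$ of the Methods, which are not reproduced here. `bg_rate` is assumed to be strictly positive.

```python
import numpy as np


def signal(k, td, depth, bg_rate, dim=3):
    """Eq. 1 for precomputed kernel values k = k(td, r(X))."""
    background = np.exp(-(bg_rate * np.abs(td)) ** (dim / 3.0))
    return ((1.0 - depth) + depth * k) * background


def signal_gradients(k, td, depth, bg_rate, dim=3):
    """Analytic dF/dLambda_1 and dF/dLambda_2 obtained from eq. 1."""
    p = dim / 3.0
    background = np.exp(-(bg_rate * np.abs(td)) ** p)
    d_depth = (k - 1.0) * background
    # d/dLambda_2 of exp(-(Lambda_2 |td|)^p) = -p Lambda_2^(p-1) |td|^p * exp(...)
    d_rate = -signal(k, td, depth, bg_rate, dim) * p * bg_rate ** (p - 1.0) * np.abs(td) ** p
    return d_depth, d_rate


def update_params(depth, bg_rate, lam, k, td, eps_depth, eps_rate, dt, dim=3):
    """Eq. 6: one step for Lambda_1 (depth) and Lambda_2 (bg_rate)."""
    d_depth, d_rate = signal_gradients(k, td, depth, bg_rate, dim)
    new_depth = depth + eps_depth * np.sum(lam * d_depth) * dt
    new_rate = bg_rate + eps_rate * np.sum(lam * d_rate) * dt
    return new_depth, new_rate
```

A finite-difference check against `signal` is a quick way to confirm the analytic derivatives before using them in an update loop.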

The RAD method can be also used to target concurrently multiple experimental data over different pairs of spin-labels. This is achieved by summing the corresponding bias potential terms in eq. 2 (Pitera and Chodera, 2012). The use of distinct error terms η/γ ensures that each measurement is included with the proper weight. A useful estimate of the amount of bias introduced in the simulation is the reversible work required to fulfill the experimental measurements (see eq. 23 in Methods):

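$$W=k_BT\ln\left(\frac{1}{t_{\mathrm{tot}}-t_e}\int_{t_e}^{t_{\mathrm{tot}}}\exp\left\{\left[V(X[t])-\bar{V}\right]/k_BT\right\}dt\right) \qquad (7)$$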

where $V(X)$ is the RAD bias potential at convergence, $t_{\mathrm{tot}}$ is the final simulation time, $t_e$ is the equilibration time and $\bar{V}=(t_{\mathrm{tot}}-t_e)^{-1}\int_{t_e}^{t_{\mathrm{tot}}}V(X[t])\,dt$ is the mean value of $V(X)$ along the simulation. Because the RAD bias potential is designed to converge asymptotically (eq. 36), the work value, $W$, is calculated near equilibrium conditions. Larger discrepancies between experiment and simulation yield larger values of $W$, which can be used to rank competing structural interpretations of the same experiment. If kinetically distinct conformational states require different amounts of work to equally agree with the experiments, the state that best represents the experimental data entails the smallest work value $W$ (see eq. 8).
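As a sketch of how eq. 7 might be evaluated from a saved trajectory, the function below assumes that the bias potential was recorded at uniformly spaced times, so that the time integral reduces to a plain average over frames after the equilibration period. The function name and the `n_equil` argument are illustrative. The maximum exponent is subtracted before exponentiation to avoid numerical overflow.

```python
import numpy as np


def reversible_work(v_traj, kT, n_equil=0):
    """Eq. 7 from bias-potential values sampled at uniform time intervals.

    v_traj : values of V(X[t]) in the same energy units as kT
    n_equil: number of initial frames discarded as equilibration (t < t_e)
    """
    v = np.asarray(v_traj, dtype=float)[n_equil:]
    x = (v - v.mean()) / kT
    x_max = x.max()
    # log-mean-exp computed with the largest exponent factored out
    return kT * (x_max + np.log(np.mean(np.exp(x - x_max))))
```

A bias potential that does not fluctuate gives $W=0$, and by Jensen's inequality the estimate is never negative; for a series alternating between $+a$ and $-a$, the result is $k_BT\ln\cosh(a/k_BT)$.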

In the following, we illustrate the method by using multiple observables on a single simulation: applications with coupled replicas (eqs. 40–41) or as post-simulation reweighting are also supported (see Ensemble Reweighting).

Results

Illustration of RAD on a model system

We illustrate the basic features of RAD on a simple two-dimensional system undergoing Langevin dynamics. We first constructed a putative DEER signal (Fig. 1B) with selected random noise (see Methods, eq. 17), reflecting the level of uncertainty (η = 0.002 in eq. 45). Then, we mimicked the effect of the simulation force field with a potential energy function that depends on two coordinates, one of which is the spin-label distance (r) outlined in Fig. 1A. This potential is specifically designed so that the time average of the DEER observables, $F_{t_d}(r,\Lambda_1,\Lambda_2)$, during the dynamics does not reproduce the DEER signal (Fig. 1B–C). The RAD method modifies the potential energy function of the system over time, adding a bias potential that is a function of the spin-label distance (as in eq. 2; see also Fig. 1A). After an initial equilibration stage, the simulated inter-spin heterogeneity matches the experimental data (Fig. 1B–1C) within the uncertainty, and without overfitting (Fig. S2AC); $D_{\mathrm{exp}}$ converges to α = 1 (Fig. 1C). Concurrently, RAD optimizes the value of the model parameters, Λ1 (modulation depth) and Λ2 (background contribution). As in standard procedures, we determined the initial values of Λ1 and Λ2 by an exponential fit of the DEER signal at long times (Jeschke et al., 2006) (after ~2 μs in Fig. 1B), when signal undulations become negligible (i.e. k(td, r) ~ 0 in eq. 1). However, such initial values are suboptimal, and during the dynamics Λ1 and Λ2 converge to ideal values that minimize the bias on the simulation and that are closer to the “true values” (orange lines in Fig. 1D) used to construct the DEER signal. Owing to the presence of the signal noise, ideal and “true values” are not necessarily identical. Further details on the impact of the uncertainty and parameter optimization are reported in the STAR Methods (Test of RAD on a Two-Dimensional System) and Supplemental Information (SI, Fig. S1, S2).

Figure 1.

Illustration of RAD on a two-dimensional system. A. Scheme of a protein with two attached MTSSL spin-labels (R1) (Berliner et al., 1982). B. Comparison between reference, $F_{t_d}^{\mathrm{exp}}$ (orange points), and calculated DEER signals (eq. 59), from the unbiased dynamics (green line) or from RAD (black line). C. Average deviation between reference and calculated DEER data ($D_{\mathrm{exp}}$) along the simulation (see Quantification and Statistical Analysis) for standard dynamics (green) and RAD (black). D. Trend of the model parameters, Λ1 and Λ2, during the RAD simulation. See also Fig. S1.

RAD is the least biased approach to target DEER experiments on the spin-labeled T4 Lysozyme

RAD was applied to the T4-Lysozyme, a benchmark molecular system for DEER analysis and interpretation (Brandon et al., 2012; Kazmier et al., 2011; Marinelli and Faraldo-Gomez, 2015; Roux and Islam, 2013). As in our previous work, we considered the DEER data from three double spin-labeled variants: residue pairs 62–109, 62–134 and 109–134 (Marinelli and Faraldo-Gomez, 2015). In the X-ray structure of T4-Lysozyme (Weaver and Matthews, 1987), these residues are sufficiently far apart that we can assume no interconnected movements between the associated spin-labels. Therefore, although the experiments are performed for each pair independently, we solvated the T4-Lysozyme with three R1 spin-labels attached at positions 62, 109 and 134 (Fig. 2A). We simultaneously targeted the three DEER experimental signals (Fig. 2BCD) along 470 ns of RAD simulation. Our reference is a 600 ns conventional MD simulation. The DEER signals calculated with the RAD simulation (see eq. 59) converge to the experimental ones within the estimated uncertainty (Fig. 2BCD), which for this system is dominated by the random noise (see Methods, table S1 and Fig. S3ABC). Conversely, the MD simulation is slower to converge (insets of Fig. 2BCD) and the results are not consistent with the DEER data, particularly for the 109–134 pair (Fig. 2D).

Figure 2.

Comparison between experimental and calculated DEER signals for the spin-labeled T4-Lysozyme. A. Cartoon representation of Solvated T4-Lysozyme (pdb 2LZM(Weaver and Matthews, 1987)). Nitroxide groups distances between labels pairs (outlined as sticks) used in the experiments are highlighted in red. Mesh surfaces reflect nitroxide groups occupancies during MD (green) and RAD (grey) simulations. B, C and D. Experimental (orange dots) and calculated DEER signals (see eq. 59) using MD (green line) or RAD simulations (black line). The shaded area in the signals reflects the statistical uncertainty (see quantification and statistical analysis). Inset: average deviation between calculated and experimental DEER signal (Dexp; see Quantification and Statistical Analysis) during MD (green), and RAD (black) simulations.

Our previous approach, EBMetaD (Marinelli and Faraldo-Gomez, 2015), targets the distribution of the spin-label distances (red lines in Fig. 2A) derived from the DEER signal, e.g. using the Tikhonov regularization (TR) approach (Jeschke et al., 2006). Instead, the RAD method directly reproduces the DEER signal, and distance distributions are obtained from simulation analysis. For T4-Lysozyme, the discrepancy between DEER signals calculated with RAD and MD is translated directly into the corresponding spin-label distance distributions (Fig. 3ABC: larger panels). Consistently, the largest deviation is observed for the 109–134 pair (Fig. 3C). The Cα-Cα distance distributions obtained with RAD or MD closely match each other (Fig. 3ABC: narrow panels), indicating that RAD mostly shifts the population of the label rotameric states (Jeschke, 2013) (occupancy iso-surfaces of Fig. 2A). This effect mainly arises from a larger solvent exposure of spin-label residue 109 (Fig. 3D), arguably related to an overstabilization of compact states in the MD force field (Piana et al., 2014) or to a weakened hydrophobic effect (van Dijk et al., 2015) induced by cryogenic conditions (Jeschke, 2012, 2013). Note that residues 109 and 134 reside on the protein C lobe whereas 62 pertains to the helix connecting C lobe and N lobe, implying that the typical hinge-bending motion of T4-Lysozyme (Yirdaw and McHaourab, 2012) is not significantly altered in the RAD simulation (Fig. S3GH). Owing to the high signal-to-noise ratio of the DEER experiments, the spin-label distance distributions obtained with RAD are similar to the target ones used for the EBMetaD approach (Fig. 3ABC: larger panels). There are a few differences between the two approaches at the tail of each histogram, mainly for the 62–134 pair (Fig. 3B). Owing to the long-time undulations of this DEER signal (Fig. 2C), such discrepancy is arguably related to the suboptimal values of the model parameters (Λ1, Λ2) obtained by exponential fit, requiring larger regularization in the TR approach (Jeschke et al., 2006).

Figure 3.

Comparison between MD, RAD and EBMetaD conformational ensembles for T4-Lysozyme. A B and C. Probability distribution of spin-labels (larger panels) and Cα-Cα(narrow panel) distances calculated with RAD (black), MD (green) and EBMetaD (orange) simulations. For EBMetaD, the spin-labels probabilities reflect the reference distributions obtained with the TR approach(Marinelli and Faraldo-Gomez, 2015). D. Probability density of the spin-labels solvent accessible surface area (SASA) calculated from RAD (black) and MD (green). E. Reversible work required to sustain the bias potential (eq. 7) in RAD (black) or EBMetaD (orange). The shaded area in the distributions and the error bars in (E) reflect the statistical uncertainty (see quantification and statistical analysis).

To compare the performance of RAD vs EBMetaD, we computed the reversible work required to construct the bias potential of both methods (eq. 7). Fig. 3E shows that RAD entails the smaller amount of work; hence, it is the less biased of the two approaches. This result is also reflected in the larger deviation from MD of the Cα-Cα distance distributions of EBMetaD compared to RAD (Fig. 3ABC: narrow panels).

RAD simulations reproduce DEER signals for apo and substrate bound VcSiaP

After validation on the T4-Lysozyme, we used the proposed method to investigate the structural dynamics of VcSiaP. The shape of this binding protein comprises a cleft between two lobes, denoted as C and N lobe, in which the substrate binds (Fig. 4A). As in other TRAP transporter binding proteins (Marinelli et al., 2011), the two lobes are connected through a long helix termed α9. X-ray structures of the homologous H. influenzae sialic acid SBP (HiSiaP) (Johnston et al., 2008; Müller et al., 2006) reveal a structural transition upon ligand binding in which the two protein lobes approach each other. In the closed conformation, the substrate interacts with conserved residues of the C and N lobe, while the helix α9 is bent (Gangi Setty et al., 2014; Johnston et al., 2008; Müller et al., 2006). Despite the lack of a substrate-bound X-ray structure for VcSiaP, recent DEER data support this kind of ligand-induced conformational change also for this protein (Glaenzer et al., 2017). Nonetheless, it is not clear whether the apo X-ray structure and the DEER signals are fully compatible, nor what the relative populations of open vs closed states are in the apo and substrate-bound forms.

Figure 4.

Experimental and calculated DEER signals for VcSiaP (A) in the apo (B) and substrate bound (C) states. A. The structure of VcSiaP (pdb 4MAG(Gangi Setty et al., 2014)) is drawn in a cartoon representation and the distance between spin-labels (depicted in sticks) pairs used in the experiments(Glaenzer et al., 2017) are drawn in red. B,C. Experimental data are displayed as orange dots and the DEER signal calculated with RAD and MD simulations (see eq. 59) as black and green lines respectively. The standard error (from three independent simulations) in the calculated signal, is shown as a shaded area.

To address this issue, we performed several µs-long RAD simulations of solvated VcSiaP in apo and sialic acid (Neu5Ac) bound states, biasing each system with the corresponding DEER experimental data reported in a recent work (Glaenzer et al., 2017) (Fig. 4B and 4C). These measurements refer to four double spin-labeled mutants corresponding to residue numbers 54–173, 110–173, 173–225 and 54–110 (Fig. 4). Among these, the first three pairs are used to monitor relative movements of the C lobe with respect to the N lobe, whereas the last one is a control pair to assess structural variations within the N lobe (Fig. 4A). These residues are sufficiently separated in space that we can exclude interrelated dynamics of the corresponding spin-labels; hence, in the simulation setup we introduced four R1 spin-labels at positions 54, 110, 173 and 225 (Fig. 4A). As a control, we also carried out three independent conventional MD simulations of ~1 μs for each system. All simulations were initiated from the apo X-ray structure of VcSiaP (Gangi Setty et al., 2014).

The results highlight that in contrast to conventional MD, RAD simulations provide a set of protein conformations and rotameric states of the spin-labels that fulfill the experimental data (Fig. 4B, 4C) within the uncertainty (see Methods and Fig. S6A, S7A).

Atomistic simulations and DEER experiments reveal that the apo state of VcSiaP is on average more open than the X-ray structure

To rationalize the protein structural ensemble arising from RAD and MD trajectories, we calculated the distance between the C lobe and the N lobe during the simulations (see Computational Setup and Analysis). The reference N lobe-C lobe distances of the apo VcSiaP X-ray structure (Gangi Setty et al., 2014) (31 Å, Fig. 5) and of a model of the substrate-bound state (23.6 Å, Fig. 5; see Computational Setup and Analysis) reflect the extent of protein conformational movement that occurs upon ligand binding. Apo VcSiaP simulations employing the RAD approach were performed enforcing the experimental data obtained in the absence of substrate (Fig. 4B). Distance distributions calculated from independent RAD simulations are unimodal and remarkably consistent with each other (Fig. 5A). In the histograms of Fig. 5A, the apo X-ray structure represents a sparsely populated conformation located at a shorter distance (31 Å, Fig. 5A and Apo X-ray in Fig. 5C) compared to the main distribution peak (33 Å, Fig. 5A and RAD in Fig. 5C). These distributions drop off below 30 Å, indicating that only open protein conformations are populated in RAD simulations.

Figure 5.

Conformational ensemble of apo VcSiaP from RAD and MD simulations. A. N lobe-C lobe distance distribution calculated using three RAD simulations of 1 μs each (black lines) or three 250 ns RAD simulations (magenta line) started from the final configurations of the MD simulations of panel B. B. N lobe-C lobe distance distributions calculated from three 1 μs MD simulations. Vertical blue and brown lines denote the N lobe-C lobe distance in the apo VcSiaP X-ray structure(Gangi Setty et al., 2014) and in a model ligand bound structure (see Methods). C. Representative open (RAD) and semi-closed conformations extracted from the simulations and reference apo-X-ray and model bound structures. RAD and semi-closed reference structures represent the peaks of the RAD histogram and of the MD one at lower distances. Helix α9 (red) is shown in cartoon representation. For each structure, we show the protein surface (orange), its cross-section (grey area) parallel to helix α9 and the N lobe-C lobe distance (see Methods).

C lobe-N lobe distance histograms derived from different MD simulations are instead dissimilar (Fig. 5B), suggesting a slower convergence compared to RAD. MD simulations point to a larger flexibility of apo VcSiaP, in which partially closed conformations are accessible (semi-closed in Fig. 5C). This result is consistent with previous simulations performed on a related binding protein(Marinelli et al., 2011). Such semi-closed state is however unstable in presence of the experimental restraints: after a few ns of RAD simulation, these conformations are completely reverted into an open state (magenta line in Fig. 5A). This result is also robust with respect to the choice of RAD parameters (red α values in Fig S6AB).

Fully closed structures resembling the model bound conformation (model bound in Fig. 5C) are undetected in both RAD and MD simulated ensembles, suggesting that this conformation is not accessible in absence of substrate.

Summarizing, in contrast to T4-Lysozyme, the restraints imposed on the DEER data alter the backbone conformations of apo VcSiaP, stabilizing a state of the protein that is on average more open than the X-ray structure(Gangi Setty et al., 2014) (Fig. 5 and Fig. S4B). No major differences in the structural features of the individual C/N lobe domains are instead observed between RAD and MD simulations (see Fig. S5).

The conformation of VcSiaP that best represents the DEER data in presence of substrate resembles ligand bound X-ray structures of homologous SBPs

Simulations and experiments underscore a marked change in the DEER signal from apo to substrate-bound VcSiaP. In the previous section, we outlined that in the absence of sialic acid, the experimental data mainly reflect a completely open state of the protein. Here, we employ a similar analysis to obtain a molecular interpretation of the DEER data in the presence of substrate. To elucidate the role of conserved C lobe residues (e.g. R145 and R125) in the protein conformational change, we started the simulations (Fig. 6C) from the apo X-ray structure of the protein (Gangi Setty et al., 2014), in which the sialic acid molecule (Fig. S4A) is docked into the N lobe of VcSiaP according to the X-ray structures of homologous proteins (Gangi Setty et al., 2014; Johnston et al., 2008; Müller et al., 2006). In this conformation, direct contacts between substrate and C lobe residues are absent. We then performed several μs-long RAD simulations that were biased using the DEER signals obtained for sialic acid bound VcSiaP (Fig. 4C). During all simulations, the protein undergoes a large-scale structural transition from open to closed conformations (Fig. S7D and movie S1). To quantify the results, we computed the N lobe-C lobe distance distributions: their histograms (Fig. 6A) show that, compared to apo VcSiaP, the presence of substrate and experimental restraints completely shifts the structural ensemble toward closed conformations, in which the model bound structure is now well populated. Consistent with the latter (model bound, Fig. 6G), helix α9 is at least partially bent in all simulations (Fig. 6DEF). However, RAD is based on the principle of minimal perturbation to the MD force field and it is not designed to cancel out free energy barriers that are present in the unbiased simulation. This implies that each simulation remains trapped in a different structural state, corresponding to three slightly different peaks in the distance histograms (Fig. 6A). One of these states (RAD3, Fig. 6D) corresponds to a partially closed protein conformation featuring a salt bridge interaction between the ligand and residue R145; nevertheless, H-bonds with other important C lobe residues are not fully formed (Fig. 6 and table 1). The middle peak (RAD2, Fig. 6E) reflects a closed conformation in which sialic acid interacts with several conserved C lobe residues and partially with R125 (Fig. 6 and table 1). The remaining structural state (RAD1, Fig. 6F) entails a fully coordinated configuration of the ligand with C lobe/N lobe residues and closely resembles the substrate-bound X-ray structure of the homologous HiSiaP (Johnston et al., 2008) (model bound, Fig. 6G and Fig. S4C).

Figure 6.

Structural features of sialic acid-bound VcSiaP from RAD and MD simulations. The N lobe-C lobe distance distribution is shown for independent RAD (panel A, black lines) and MD (panel B) trajectories. The bar chart in panel A represents the work performed by the RAD bias potential for each simulation (see eq. 7), from which we excluded the contribution of the 54–110 pair, as it is not associated with opening or closing motions of the protein (see also Fig. S7B). The error bars reflect the statistical uncertainty (see Quantification and Statistical Analysis). C. Starting configuration of all simulations. D, E and F. Reference structures corresponding to peaks of the distance histograms in panel A. G. Homology model of the VcSiaP substrate-bound state (see Computational Setup and Analysis). The protein structure is represented as in Fig. 5C; insets show the sialic acid molecule coordinated by VcSiaP residues in the binding site.

Table 1.

Contact probabilities between R145/R125 residues and sialic acid in RAD and MD simulations. The R145-Neu5Ac and R125-Neu5Ac contacts were measured by monitoring the distance between the CZ carbon of R145 or R125 and, respectively, the carbon of the Neu5Ac carboxylate group or the carbon adjacent to it. Cutoffs of 4.5 Å and 5.5 Å were used for the R145-Neu5Ac and R125-Neu5Ac contacts, respectively.

Contact        RAD1   RAD2   RAD3
R145-Neu5Ac    100%   100%   74%
R125-Neu5Ac    100%   42%    4%

Contact        MD1    MD2    MD3
R145-Neu5Ac    74%    10%    0%
R125-Neu5Ac    6%     0%     0%
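The contact probabilities in Table 1 are, in essence, the fraction of trajectory frames in which a monitored distance falls within its cutoff. The sketch below illustrates that calculation; the function name, the use of "less than or equal" at the cutoff, and the synthetic distance series are assumptions made for illustration and are not taken from the original analysis.

```python
import numpy as np

def contact_probability(distances, cutoff):
    """Fraction of frames in which the distance (Å) is within the cutoff (Å)."""
    d = np.asarray(distances, dtype=float)
    return float(np.mean(d <= cutoff))

# Hypothetical distance time series (Å), generated only to exercise the function.
rng = np.random.default_rng(0)
d_r145 = rng.normal(4.0, 0.4, size=10_000)  # CZ(R145) to carboxylate carbon
d_r125 = rng.normal(5.8, 0.6, size=10_000)  # CZ(R125) to adjacent carbon

# Cutoffs quoted in the Table 1 caption: 4.5 Å (R145) and 5.5 Å (R125).
p_r145 = contact_probability(d_r145, cutoff=4.5)
p_r125 = contact_probability(d_r125, cutoff=5.5)
print(f"R145-Neu5Ac: {100 * p_r145:.0f}%  R125-Neu5Ac: {100 * p_r125:.0f}%")
```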

To evaluate which of these conformational states is the most compatible with the experimental data, we calculated for each of them the work required to bias the sampling (eq. 7). Remarkably, the state featuring the smallest value of the work (RAD1) is also the closest to the model bound structure (bar chart in Fig. 6A). Different ranking strategies based on ensemble reweighting (see Ensemble Reweighting) lead to the same result (Fig. S7BC), supporting the reliability of the work values to sort distinct structural states.

Our results are supported by previous mutational analyses (Gangi Setty et al., 2014; Glaenzer et al., 2017; Johnston et al., 2008) underlining the role of residue R125 in stabilizing the closed conformation of the protein (Glaenzer et al., 2017). Consistently, we observed a progressive weakening of the R125-Neu5Ac interaction going from the RAD1 to the RAD3 state (see table 1), which correlates with the associated higher degree of opening (histograms in Fig. 6A).

As a control, we performed three μs-long MD simulations. Owing to the slower convergence compared to RAD, the distance histograms obtained with different MD trajectories are very diverse (Fig. 6B). Furthermore, none of these simulations reach a fully closed state of the protein. Instead, two out of three trajectories mainly populate open protein conformations (Fig. 6B and table 1).

These results highlight that the experimental restraints enhance the exploration of structural states compatible with the DEER data. Note that in the absence of substrate, the restraints derived from the experiments (i.e. those of Fig. 4C) are not sufficient to sample a fully closed state of the protein (Fig. S6C), supporting the notion that the ligand effectively stabilizes the latter conformation.

Discussion

It is widely recognized that both structured and unstructured biomolecules reside in multiple conformational states. The relative balance between the states of a biomolecule profoundly affects the response to different stimuli and is an integral component of many cellular mechanisms. Several biophysical techniques can be used to obtain information on the structural variability of biomolecules, such as diamagnetic NMR, paramagnetic relaxation enhancement (PRE), FRET and DEER. Among all, DEER-based methods allow measuring the distances between spin-labels attached to a system of interest over a range of several nanometers, thus capturing large-scale conformational transitions in a biomolecule(Mchaourab et al., 2011).

Like most solution techniques, a DEER measurement represents an ensemble average of specific observables over multiple conformations, and it is often difficult to translate the spectroscopic signal into a set of high-resolution structures of the biomolecule. A practical solution to this limitation is to use additional structural information alongside the actual DEER data. This approach has been widely followed for diamagnetic NMR, where atomistic models are obtained directly from experimental restraints (Schwieters et al., 2006). The underlying assumption is that the biomolecule is quasi-rigid beyond the range of nuclear dipolar interactions (less than 10 Å), which allows a single structure to be used to analyze short-ranged dynamics (Lipari and Szabo, 1982). When this assumption is not valid, computation may offer a framework to derive structure and dynamics simultaneously from NMR data (Lindorff-Larsen et al., 2005).

In the case of DEER spectroscopy, the commonly used MTSSL spin-labels (Berliner et al., 1982) and their chemical analogs are highly flexible by themselves: a unified approach to characterize the structural features of the biomolecule is needed in nearly every experiment. To this purpose, we introduced the restrained-average dynamics (RAD) method to generate a structural ensemble consistent with DEER experiments. Leveraging the good accuracy achieved by atomistic simulations with modern force fields (Piana et al., 2014), the premise of the RAD approach is to limit the bias to the extent of the information provided by the experiment. Any additional force needed to align the predicted observables with the experiment is limited in magnitude by the experimental/model uncertainty.

The interpretation of each DEER signal relies on parameters, such as the background term, which are unknown a priori and must be estimated from fitting (Jeschke et al., 2006). A similar approach is used to analyze residual dipolar couplings in diamagnetic NMR(Losonczi et al., 1999). In the RAD method, no separate fitting is required, and the optimal value of each parameter is determined by minimizing the bias added during the simulation.

The method introduced here (RAD) is rather flexible and can be applied to a single simulation trajectory or to parallel simulation replicas (eqs. 40–41, Fig. S2D) (Bonomi et al., 2016; Roux and Islam, 2013). Compared to existing techniques, our approach has the advantage that it is designed to reproduce the experimental signal itself, rather than the geometrical distributions derived from it (Marinelli and Faraldo-Gomez, 2015; Roux and Islam, 2013), a concept that has been previously applied to modelling protein dimer formation (Hilger et al., 2007). Therefore, RAD avoids the approximations and error accumulation that arise at the onset of the data analysis process (Brandon et al., 2012; Jeschke et al., 2006).

The numerical performance of RAD was validated on an idealized model (Fig. 1) and on a widely used benchmark system, the spin-labeled T4-Lysozyme (Fig. 2). For this system, we demonstrated that a conventional MD simulation fails to reproduce the DEER measurements, and that the addition of a minimal bias through the RAD approach fully removes the discrepancy. The RAD ensemble is statistically consistent with the distance distributions derived from previous DEER-analysis methods (Jeschke et al., 2006) (Fig. 3). However, by directly targeting the experimental signal rather than model distance distributions, RAD simulations require a smaller bias and their results are more transferable (Fig. 3E). For the T4-Lysozyme, the RAD correction translates essentially into a perturbation of the rotameric states of the spin-labels (Fig. 2A, Fig. 3, Fig. S3GH).

Lastly, in a more biologically relevant application, we used μs-long RAD and MD simulations to obtain a structural interpretation of recent DEER measurements on the substrate-binding protein VcSiaP (Glaenzer et al., 2017), which is used by the Na+-coupled sialic acid importer, VcSiaPQM, to sequester the substrate and deliver it to its membrane domain (Mulligan et al., 2009).

X-ray crystallography and DEER experiments have established that VcSiaP, as other members of its protein family, undergoes an open-to-closed conformational transition upon substrate binding(Gangi Setty et al., 2014; Glaenzer et al., 2017). The closed conformation of the protein has been suggested to bind the membrane domain of VcSiaPQM and promote substrate transport (Mulligan et al., 2009). It was unclear whether the unbound VcSiaP is able to assume transiently closed conformations and form spurious interactions with the membrane domain of VcSiaPQM. To elucidate this issue we used simulations of unbound and sialic acid-bound VcSiaP targeting the available DEER data for the respective conditions (Glaenzer et al., 2017) (Fig. 4).

Our simulations support an induced-fit binding mechanism, in which the unbound VcSiaP never populates closed conformations and favors instead structures more open than the unbound X-ray structure (Gangi Setty et al., 2014) (Fig. 5). This result is a direct outcome of the use of DEER information (Fig. 5), and assumes that cryogenic conditions do not affect the protein conformational equilibrium (Glaenzer et al., 2017). Binding of sialic acid produces a full closure of VcSiaP in multiple simulations, prompted by the interaction of the ligand with conserved protein residues. In agreement with the mutational analyses, residue R125 is essential to stabilize the closed state of the protein(Gangi Setty et al., 2014; Glaenzer et al., 2017; Johnston et al., 2008). The calculated closed structure most compatible with DEER experiments is also supported by the X-ray structures of substrate-bound homologous proteins (Gangi Setty et al., 2014; Johnston et al., 2008; Müller et al., 2006) (Fig. 6). This last result reinforces the suitability of the work performed by the RAD forces as a parameter to rank multiple independent simulations.

Conclusions

We introduced an MD simulation-based method (restrained-average dynamics, or RAD) to reconstruct structural ensembles from DEER experiments. RAD does not target model distributions of interatomic distances as in recent simulation approaches; rather, it achieves a minimal bias by reproducing the experimental signal itself. Its application to a substrate-binding protein identifies the structural details of the protein's conformational response, which is observed directly once the DEER signals are introduced in the simulation. Numerical results converge asymptotically to the equilibrium ensemble, and the work performed to fulfill the experiments provides the likelihood of each predicted model. The theory and computational implementation are formulated without restriction to DEER spectroscopy and can be applied to multiple experimental measurements, opening the possibility of an integrative framework to accurately characterize the structural features of complex biomolecules.

Star Methods

Contact for reagent and resource sharing

Further information and requests for resources and reagents should be directed to and will be fulfilled by the Lead Contact Fabrizio Marinelli (fabrizio.marinelli@nih.gov)

Method details

Computational setup and analysis

In this work, we used two types of computer-simulation analyses based on atomistic simulations: conventional MD simulations and the approach presented in this study, restrained-average dynamics (RAD).

All simulations were carried out in explicit solvent using NAMD (Phillips et al., 2005). We used CHARMM27/CMAP (Mackerell et al., 2004) and CHARMM36 (Best et al., 2012) for T4-Lysozyme and VcSiaP, respectively. The simulations were carried out at constant temperature (298 K) and pressure (1 bar) with a time step of 2 fs and periodic boundary conditions. A cutoff of 12 Å was used for van der Waals and short-range electrostatic interactions; long-range electrostatic interactions were evaluated using the PME approach. The EBMetaD simulations of T4-Lysozyme (~200 ns) are taken from previous work (Marinelli and Faraldo-Gomez, 2015). The total simulated time amounts to ~1.1 μs for T4-Lysozyme and ~13 μs for VcSiaP. Additional details on the setup of the calculations are provided in the section below. The model bound structure of Fig. 5–6 is a homology model of sialic acid-bound VcSiaP that was constructed using the SWISS-MODEL web server (Bienert et al., 2017), employing as template the ligand-bound X-ray structure of HiSiaP (sequence identity ~50%, PDB code 3B50) (Johnston et al., 2008). For the VcSiaP system, the N lobe-C lobe distance is calculated as the distance between the centers of mass (white spheres in Fig. 5C) of the Cα atoms belonging to the N lobe (residues 2–7, 37–41, 49–57, 61–63) and the C lobe (residues 145–147, 149–157, 163–165, 169–176). Protein reference structures were depicted using PyMOL (Schrödinger, 2015).
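The N lobe-C lobe distance defined above can be computed from Cα coordinates as sketched below. This is only an illustration of the definition: the function names and the input layout (one coordinate row and one residue number per Cα atom) are assumptions, and reading coordinates from a trajectory is left out.

```python
import numpy as np

# Residue ranges quoted in the text (inclusive).
N_LOBE = [(2, 7), (37, 41), (49, 57), (61, 63)]
C_LOBE = [(145, 147), (149, 157), (163, 165), (169, 176)]

def in_ranges(resids, ranges):
    """Boolean mask selecting residue numbers that fall in any inclusive range."""
    resids = np.asarray(resids)
    mask = np.zeros(resids.shape, dtype=bool)
    for lo, hi in ranges:
        mask |= (resids >= lo) & (resids <= hi)
    return mask

def lobe_distance(ca_xyz, ca_resids):
    """Distance (Å) between the centers of mass of N-lobe and C-lobe Cα atoms.

    All Cα atoms have the same mass, so each center of mass is a plain centroid.
    """
    ca_xyz = np.asarray(ca_xyz, dtype=float)
    n_com = ca_xyz[in_ranges(ca_resids, N_LOBE)].mean(axis=0)
    c_com = ca_xyz[in_ranges(ca_resids, C_LOBE)].mean(axis=0)
    return float(np.linalg.norm(n_com - c_com))

# Hypothetical coordinates: C-lobe atoms displaced by 30 Å along x.
resids = np.arange(1, 301)
xyz = np.zeros((resids.size, 3))
xyz[in_ranges(resids, C_LOBE), 0] = 30.0
print(lobe_distance(xyz, resids))
```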

Computational setup details

The VcSiaP simulation systems comprise the protein, ~18000 TIP3P water molecules, 42 (41 for apo VcSiaP) Na+ ions and 34 Cl- ions (~100 mM plus additional counterions). The setup was initially prepared using the CHARMM-GUI graphical interface (Jo et al., 2008) and then slowly equilibrated in different stages lasting ~30 ns in total, in which restraints on protein and, where present, ligand atoms were gradually removed.

To set the protonation state of ionizable residues we used PROPKA (Olsson et al., 2011) at pH = 7.5, the same as in the DEER experiments (Glaenzer et al., 2017). To determine the protonation state of ambiguous residues (E66 in particular: pKa in the range 6–7), we performed preliminary MD simulations assessing the stability of the initial X-ray configuration for different protonation states. Accordingly, we evaluated the protonation states of both the substrate-bound and the unbound protein, using MD simulations (tens of ns long) in which the initial structure was either the apo VcSiaP X-ray structure (Gangi Setty et al., 2014) or the ligand-bound HiSiaP X-ray structure (Johnston et al., 2008) (assessing in this case the protonation state of analogous residues). These tests suggest that there is no change in the protonation state upon ligand binding, e.g. residue E66 is charged in both the apo and substrate-bound states of the protein.

For the substrate-bound VcSiaP system, flat-bottom restraining potentials were applied during all simulations to keep the ligand bound to the N lobe of the protein (Fig. 4A). These restraints are defined according to a half-harmonic potential, $V(d) = (k/2)(d - d_{upper})^2$ for $d > d_{upper}$ and zero otherwise, where $d_{upper}$ is an upper limit for the distance $d$, beyond which the bias potential $V(d)$ is applied, and $k$ is a force constant. In particular (see sialic acid atom names in Fig. S4A), the distances between the nitrogen atom of the side chain of residue Q9 and atoms O or O7 of sialic acid were kept within 4 Å and 5.5 Å, respectively. The distance between the Cγ atom of residue D48 and atom O7 of sialic acid was maintained within 4 Å. The distance between the Cδ atom of residue E66 and atoms O8 or O9 of sialic acid was kept below 4.3 Å. The force constant of the bias potential was in each case 100 kcal/mol/Å². An additional harmonic restraint (k = 25 kcal/mol/Å²) was imposed on the root mean square deviation of the sialic acid heavy atoms with respect to the X-ray coordinates of the ligand (Johnston et al., 2008) in order to limit the conformational movements of the substrate.
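As an illustration of the half-harmonic (upper-wall) restraint described above, the following sketch evaluates the energy and the force along the restrained distance, assuming the potential is zero below the upper limit. The function name is hypothetical, and the code is not the restraint implementation used in the simulations.

```python
import numpy as np

def half_harmonic(d, d_upper, k=100.0):
    """Upper-wall restraint: V(d) = (k/2)(d - d_upper)^2 for d > d_upper, else 0.

    d and d_upper are in Å and k in kcal/mol/Å^2. Returns the energy (kcal/mol)
    and the force along d (kcal/mol/Å).
    """
    excess = np.maximum(np.asarray(d, dtype=float) - d_upper, 0.0)
    energy = 0.5 * k * excess**2
    force = -k * excess  # -dV/dd, pushes d back below d_upper
    return energy, force

# Example with the 4 Å limit quoted in the text: inactive at 3.5 Å, active at 4.5 Å.
for d in (3.5, 4.5):
    e, f = half_harmonic(d, d_upper=4.0)
    print(f"d = {d} Å: V = {float(e):.2f} kcal/mol, force = {float(f):.1f} kcal/mol/Å")
```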

In all RAD simulations, the time constant of eq. 3 (or eq. 50) was scaled as $\tau(t) = \sqrt{\tau_0(\tau_0 + t)}$, in which $\tau_0$ is its initial value. The value of τ0 was selected to be smaller for T4-Lysozyme than for VcSiaP to account for the slower protein backbone movements that occur in the latter system (see table S1 and S2). To enhance the equilibration of the system, for the apo VcSiaP RAD simulations the time constant was kept fixed at τ0 = 2.5 ns for the initial 40 ns of trajectory and then scaled as $\tau(t) = \sqrt{\tau_0(\tau_0 + t)}$. For substrate-bound VcSiaP, the time constant was kept fixed at 5 ns for the first 140 ns of simulation and then scaled in the same manner. Other parameters for the RAD atomistic simulations of T4-Lysozyme and VcSiaP are provided in table S1 and S2.
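The time-constant schedule can be sketched as follows, assuming the square-root form τ(t) = sqrt(τ0(τ0 + t)), which is the dimensionally consistent reading of the scaling rule. A second assumption, not stated in the text, is that t is counted from the end of the initial fixed-τ stage, so that the schedule is continuous; the function name and arguments are hypothetical.

```python
import math

def time_constant(t, tau0, t_hold=0.0):
    """Time constant (same units as t) of the assumed schedule.

    tau is held at tau0 for t <= t_hold, then grows as sqrt(tau0 * (tau0 + t')),
    where t' = t - t_hold is the time elapsed since the end of the hold stage
    (an assumption made here for continuity).
    """
    if t <= t_hold:
        return tau0
    return math.sqrt(tau0 * (tau0 + (t - t_hold)))

# Apo VcSiaP settings quoted in the text: tau0 = 2.5 ns, held for the first 40 ns.
for t_ns in (0.0, 40.0, 140.0, 1000.0):
    print(t_ns, round(time_constant(t_ns, tau0=2.5, t_hold=40.0), 2))
```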

For T4-Lysozyme, the starting value of Λ1 and Λ2 was obtained from a Gaussian-fit analysis of the experimental data(Brandon et al., 2012; Stein et al., 2015). For VcSiaP, the initial value of such parameters was obtained using preliminary tests on the two-dimensional model system and subsequent RAD equilibration runs on the fully atomistic system.

Maximum entropy formulation

The main concept of the maximum entropy principle (Boomsma et al., 2014; Pitera and Chodera, 2012) is to correct the conformational ensemble of a biomolecule on the basis of the best compromise between amount of bias applied (W) and agreement with the experimental data (Perr). This can be formalized as an optimization problem whose result minimizes the generalized Kullback–Leibler functional:

\[ D_{KL} = \frac{W(\bar{F}_{t_d}^{RAD}, \Lambda_1, \Lambda_2)}{k_B T} - \ln P_{err}(\bar{F}_{t_d}^{RAD}) + C \tag{8} \]

in which $P_{err}(\bar{F}_{t_d}^{RAD})$ is the uncertainty distribution (including all sources of error) and $C$ is an irrelevant shift constant. $W(\bar{F}_{t_d}^{RAD}, \Lambda_1, \Lambda_2)$ is the reversible work performed to bias the ensemble (same as eq. 7), which depends on the optimal values, $\bar{F}_{t_d}^{RAD}$ (and therefore on $\lambda_{t_d}$), and on a finite set of model parameters (here labeled $\Lambda_1$, $\Lambda_2$).

Eqs. 3–6 arise from a gradient-based minimization of $D_{KL}$; in particular, eq. 4 stems from the assumption that the error distribution is Gaussian:

\[ P_{err}(\bar{F}_{t_d}^{RAD}) = \frac{\exp\left\{ -\gamma \sum_{t_d}^{N} \left( \bar{F}_{t_d}^{RAD} - F_{t_d}^{exp} \right)^2 / 2\eta^2 \right\}}{\left( \eta \sqrt{2\pi/\gamma} \right)^{N}} \tag{9} \]

The derivations are reported in the sections Theoretical basis of the maximum entropy formulation and Theoretical derivation of the RAD approach.
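To make the trade-off expressed by eq. 8 concrete, the sketch below evaluates the Gaussian log-likelihood of eq. 9 and combines it with a work value given in units of kBT, dropping the constant C. The function names and the numerical inputs are hypothetical; the sketch only illustrates how the two terms combine.

```python
import numpy as np

def log_p_err(f_rad, f_exp, eta, gamma=1.0):
    """Logarithm of the Gaussian error distribution of eq. 9."""
    f_rad = np.asarray(f_rad, dtype=float)
    f_exp = np.asarray(f_exp, dtype=float)
    n = f_exp.size
    chi2_term = gamma * np.sum((f_rad - f_exp) ** 2) / (2.0 * eta**2)
    log_norm = n * np.log(eta * np.sqrt(2.0 * np.pi / gamma))
    return float(-chi2_term - log_norm)

def d_kl(work_kT, f_rad, f_exp, eta, gamma=1.0):
    """Eq. 8 without the constant C: work (in kBT units) minus log-likelihood."""
    return work_kT - log_p_err(f_rad, f_exp, eta, gamma)

# Hypothetical three-point signal with noise level eta = 0.01.
f_exp = np.array([1.00, 0.90, 0.85])
f_rad = np.array([0.99, 0.91, 0.85])
print(d_kl(work_kT=1.5, f_rad=f_rad, f_exp=f_exp, eta=0.01))
```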

Ensemble reweighting

In principle the conformational ensemble of any simulation can be reweighted to account for a modified energy function, provided that the starting ensemble covers the most relevant conformations of the target one. The energy function of the reference simulation can be generically written as:

\[ U_{REF}(X) = U_{MD}(X) + V_{REF}(X) \tag{10} \]

where VREF (X) is an initial bias potential, for example the average bias potential of a previous RAD simulation. Here we assume that the target ensemble stems from imposing a given degree of agreement with respect to a set of experimental (DEER) data, thereby it arises from the same type of energy function given by eq. 2:

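\[ \rho_{RAD}(X) = \frac{\exp\left\{ -\frac{U_{MD}(X)}{k_B T} + \sum_{t_d} \lambda_{t_d} F_{t_d}(r(X), \Lambda_1, \Lambda_2) \right\}}{\int \exp\left\{ -\frac{U_{MD}(X)}{k_B T} + \sum_{t_d} \lambda_{t_d} F_{t_d}(r(X), \Lambda_1, \Lambda_2) \right\} dX} \tag{11} \]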

After some rearrangement, eq. 11 can be also written as:

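\[ \rho_{RAD}(X) = \frac{\rho_{REF}(X)\,\Omega(X)}{\int \rho_{REF}(X)\,\Omega(X)\,dX} \tag{12} \]

where $\rho_{REF}(X) \propto \exp\{-(U_{MD}(X) + V_{REF}(X))/k_B T\}$ is the reference ensemble distribution and $\Omega(X) = \exp\{\sum_{t_d} \lambda_{t_d} F_{t_d}(r(X), \Lambda_1, \Lambda_2) + V_{REF}(X)/k_B T\}$ is the weight required to correct the initial ensemble. As in eq. 2, the parameters $\lambda_{t_d}$ ensure that the ensemble average of $F_{t_d}(r(X), \Lambda_1, \Lambda_2)$ converges to the optimal value, $\bar{F}_{t_d}^{RAD}$, and, exploiting eq. 45, they can be calculated self-consistently according to the following equations:

\[ {}^{n}\bar{F}_{t_d}^{RAD} = \frac{\int F_{t_d}(r(X), \Lambda_1^n, \Lambda_2^n)\,\rho_{REF}(X)\,\Omega^n(X)\,dX}{\int \rho_{REF}(X)\,\Omega^n(X)\,dX} \tag{13} \]
\[ \lambda_{t_d}^{n+1} = \lambda_{t_d}^{n}(1 - \theta) + \theta\,\frac{\gamma^n \left( F_{t_d}^{exp} - {}^{n}\bar{F}_{t_d}^{RAD} \right)}{\eta^2} \tag{14} \]
\[ \gamma^{n+1} = \frac{\eta}{\alpha N} \sum_{t_d} \left| \lambda_{t_d}^{n+1} \right| \tag{15} \]

in which $\Omega^n(X) = \exp\{\sum_{t_d} \lambda_{t_d}^n F_{t_d}(r(X), \Lambda_1^n, \Lambda_2^n) + V_{REF}(X)/k_B T\}$ is the weight at step $n$ and $\theta$ is a suitable update rate that can be chosen as a fraction of $\eta/(\gamma^n D_{exp}^n \sigma_{t_d})$ (see also eq. 3), in which $D_{exp}^n = \sum_{t_d} |{}^{n}\bar{F}_{t_d}^{RAD} - F_{t_d}^{exp}| / (N\eta)$. The parameters $\Lambda_1^n, \Lambda_2^n$ can be updated at each step according to eq. 6, i.e. minimizing the discrepancy between optimal and experimental values (eq. 57). At convergence, the work required to bias the ensemble in reference to the unbiased MD is estimated as in eq. 7:

\[ W = k_B T \ln \frac{\int \exp\{[V(X) - \bar{V}]/k_B T\}\,\rho_{REF}(X)\,\Omega(X)\,dX}{\int \rho_{REF}(X)\,\Omega(X)\,dX} \tag{16} \]

where $V(X) - \bar{V} = -k_B T \sum_{t_d} \lambda_{t_d} \left( F_{t_d}(r(X), \Lambda_1, \Lambda_2) - \bar{F}_{t_d}^{RAD} \right)$. Besides the global work value, the work associated with each individual experiment can be calculated by introducing the respective bias potential term in eq. 16 (or in eq. 7). Note however that, owing to the correlations between the different experimental observables, the total amount of work is generally not the sum of the individual components.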

This scheme can be used to reweight a conventional MD simulation to account for the experimental data or to correct/validate the ensemble of a previous RAD simulation imposing different degrees of agreement with the experiments, α (see below).
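A minimal numerical sketch of the self-consistent reweighting (eqs. 13–16) is given below, under several simplifying assumptions: the model parameters Λ1 and Λ2 are held fixed (the eq. 6 update is omitted), the update rate θ is a fixed hypothetical value rather than the adaptive choice described above, and the observable is precomputed for every frame. The toy data at the end (a single observable on a uniform grid of "frames") is purely illustrative.

```python
import numpy as np

def frame_weights(F, lam, v_ref_kT):
    """Normalized weights proportional to Omega(X) for frames sampled from rho_REF."""
    log_w = F @ lam + v_ref_kT
    w = np.exp(log_w - log_w.max())  # subtract the maximum for numerical stability
    return w / w.sum()

def reweight(F, f_exp, eta, alpha=1.0, v_ref_kT=None, theta=0.1, gamma0=1.0, n_iter=1000):
    """Iterate eqs. 13-15 and return weights, lambdas, D_exp and work (in kBT).

    F has shape (n_frames, N): the observable F_td evaluated on each frame.
    v_ref_kT is V_REF(X)/kBT per frame (zeros for a conventional MD reference).
    """
    F = np.asarray(F, dtype=float)
    f_exp = np.asarray(f_exp, dtype=float)
    n_frames, n_obs = F.shape
    v = np.zeros(n_frames) if v_ref_kT is None else np.asarray(v_ref_kT, dtype=float)
    lam, gamma = np.zeros(n_obs), gamma0
    for _ in range(n_iter):
        f_bar = frame_weights(F, lam, v) @ F                                # eq. 13
        lam = (1.0 - theta) * lam + theta * gamma * (f_exp - f_bar) / eta**2  # eq. 14
        gamma = eta / (alpha * n_obs) * np.abs(lam).sum()                   # eq. 15
    w = frame_weights(F, lam, v)
    f_bar = w @ F
    d_exp = float(np.abs(f_bar - f_exp).sum() / (n_obs * eta))
    # Eq. 16 with (V - Vbar)/kBT = -sum_td lambda_td (F_td - Fbar_td).
    work_kT = float(np.log(np.sum(w * np.exp(-(F - f_bar) @ lam))))
    return w, lam, d_exp, work_kT

# Toy reference ensemble: one observable, uniformly distributed in [0, 1].
x = np.linspace(0.0, 1.0, 2001)[:, None]
w, lam, d_exp, work_kT = reweight(x, f_exp=[0.6], eta=0.05, alpha=1.0)
print(f"lambda = {lam[0]:.3f}, D_exp = {d_exp:.3f}, W = {work_kT:.4f} kBT")
```

In this toy case the iteration settles where the deviation from the target equals α in units of η, so the reweighted average stops short of the target value, as intended by the uncertainty-aware formulation.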

Tuning the agreement with the experiments from reweighted simulations

In absence of precise information on all sources of error, the RAD conformational ensemble is constructed in order to achieve a good balance between amount of bias introduced by the experimental restraints (work) and agreement with the experiments, α (see eq. 5).

A useful strategy to set this balance, without resorting to multiple simulation trials, is to reweight the ensemble of existing simulations at different values of the parameter α, as reported in the previous section. Exploiting eq. 16, the work required to bias the sampling can be measured at each value of α. The typical shape of this function is an “L-curve” (see for example Fig. S2AB) in which for small values of α a large amount of work is necessary to further improve the agreement with the experiments(Hummer and Koefinger, 2015). The smallest reasonable value of α is then near the kink of the function (L-curve criterion). If values of α near the kink are associated to a significant amount of bias, α can be chosen as the smallest value for which the associated work does not exceed the expected accuracy of the force field(Hummer and Koefinger, 2015), e.g. 2–3 kBT. Large bias values can also suggest that, owing to sampling issues, the simulation did not explore structural states that are fully compatible with the experiments (e.g. RAD3 trajectory in Fig. 6) or that the experimental conditions stabilize particular conformations that are poorly sampled in the unbiased simulation. If the kink is located near values of α that are significantly smaller than one, then α = 1 can be considered an adequate choice provided that the estimated uncertainty, η, is reasonably accurate, e.g. η is given by the DEER random noise and error correlation effects are negligible(Edwards and Stoll, 2016). In the applications presented in this work, this behavior was observed especially for experimental signals that are already close to the ones obtained with the unbiased simulation (e.g. the RAD1 trajectory of VcSiaP in the bound state, Fig. S7A).
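The fallback rule described above (choosing the smallest α whose work does not exceed the expected force-field accuracy) can be sketched as follows. The function name and the work-versus-α values are hypothetical; in practice each work value would come from reweighting an existing simulation at the corresponding α.

```python
import numpy as np

def smallest_alpha_within_budget(alphas, works_kT, max_work_kT=3.0):
    """Smallest alpha whose associated work (kBT) does not exceed the budget.

    Returns None when no tested alpha satisfies the budget.
    """
    alphas = np.asarray(alphas, dtype=float)
    works = np.asarray(works_kT, dtype=float)
    ok = works <= max_work_kT
    return float(alphas[ok].min()) if ok.any() else None

# Hypothetical L-curve: the work rises steeply as alpha decreases.
alphas = [0.25, 0.5, 0.75, 1.0, 1.5, 2.0]
works_kT = [12.0, 5.5, 2.8, 1.6, 0.9, 0.6]
print(smallest_alpha_within_budget(alphas, works_kT, max_work_kT=3.0))
```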

This procedure can be used to validate and, if necessary, adjust a posteriori the value of α of a previous RAD simulation. If such a simulation involves multiple experimental data sets, this analysis can be performed on each individual measurement while leaving the bias potential on the others unaltered (see Fig. S3ABC, Fig. S6A and Fig. S7A). The RAD conformational ensemble can then be reweighted according to the optimal values of α, provided that such reweighting is reliable; for example, by verifying that the work associated with this correction (referred to the initial RAD simulation) is within a few (2–3) kBT.

The application of the L-curve approach to model systems with and without systematic error (Fig. S2AB, see also the section Test of RAD on a two-dimensional system) and to the systems investigated in this work (Fig. S3ABC, Fig. S6A, Fig. S7A) is reported in the supplemental information (SI). In particular, it shows that, for the RAD simulations of T4-Lysozyme and VcSiaP, the combination of estimated uncertainties, η, and selected values of α(= 1) is reasonably valid. In these systems, the number of DEER data points (N) is much larger than the number of effective degrees of freedom of the model (table S1 and S2, eq. 45) and, inspecting the imaginary component of the DEER signal, error correlations between adjacent data points are not apparent(Edwards and Stoll, 2016), thus α = 1 is an indication of good fit within the estimated uncertainty(Stein et al., 2015). In particular, for spin-label pairs in which the kink of the L-curve is significantly below α = 1, the estimated uncertainty is dominated by the signal noise (table S1, S2).

The reweighting scheme can be also exploited to select a reasonable value of α from preliminary RAD or MD simulations, to be used in a subsequent RAD simulation. Owing to the expected poor sampling of such initial simulations, the value of α can be chosen near the kink of the L-curve.

Other approaches can also be used to select the optimal value of α, for example cross validation and Bayesian criteria (Edwards and Stoll, 2018). These methods, however, are not based on an assessment of the bias on the simulation; therefore, the L-curve approach is preferable in the context of molecular simulations. An example of application of the generalized cross validation approach to RAD (eq. 46) is reported in the SI (Fig. S2C).

Two-dimensional model system

The model system that we used to test the RAD approach (Fig. 1 and Fig. S1-S2) comprises two coordinates (r1 and r2) undergoing an overdamped Langevin dynamics. Each of these coordinates represents a spin-labels distance. We considered a free energy function along each distance that is given by F(r) = −kBTlnP(r), in which P(r) is a reference “unbiased” distribution (Fig. S1A) and r is the distance between a putative spin-labels pair. The unbiased potential energy function that modulates the Langevin dynamics was defined as U(r1,r2)=F(r1)+F(r2). To construct a reference DEER signal we used the following equation (based on eq. 48):

\[ F_{t_d}^{exp} = \int P_{ref}(r)\,F_{t_d}(r, \Lambda_1, \Lambda_2)\,dr + g(\eta) \tag{17} \]

where $P_{ref}(r)$ is a designed spin-labels distance distribution (orange distribution in Fig. S1A) and $g(\eta)$ is a Gaussian random noise with standard deviation η.
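The construction of a synthetic target signal according to eq. 17 can be sketched as below. The explicit form of F_td(r, Λ1, Λ2) is not given in this section (it refers to eq. 48), so the sketch assumes the commonly used DEER expression, with Λ1 acting as the modulation depth, Λ2 as an exponential background decay rate, and a powder-averaged dipolar kernel with a dipolar constant of 52.04 MHz nm³. These choices, the Gaussian P_ref, the noise level and all function names are assumptions for illustration.

```python
import numpy as np

def trapz_uniform(y, dx):
    """Trapezoid rule along the last axis for a uniformly spaced grid."""
    return dx * (y.sum(axis=-1) - 0.5 * (y[..., 0] + y[..., -1]))

def deer_kernel(r_nm, t_us, n_x=501):
    """Powder-averaged dipolar kernel K(r, t) = int_0^1 cos[(1 - 3x^2) w_dd t] dx."""
    x = np.linspace(0.0, 1.0, n_x)
    omega = 2.0 * np.pi * 52.04 / r_nm**3  # dipolar frequency in rad/us, r in nm
    phase = (1.0 - 3.0 * x**2) * omega[:, None, None] * t_us[None, :, None]
    return trapz_uniform(np.cos(phase), x[1] - x[0])  # shape (n_r, n_t)

def deer_signal(p_r, r_nm, t_us, lam1, lam2):
    """Assumed signal: [1 - lam1 (1 - int P(r) K(r, t) dr)] exp(-lam2 t)."""
    dr = r_nm[1] - r_nm[0]
    p = p_r / trapz_uniform(p_r, dr)  # normalize the distance distribution
    form = trapz_uniform((p[:, None] * deer_kernel(r_nm, t_us)).T, dr)
    return (1.0 - lam1 * (1.0 - form)) * np.exp(-lam2 * t_us)

r = np.linspace(2.0, 4.5, 101)                 # distances in nm
t = np.linspace(0.0, 3.0, 61)                  # dipolar evolution times in us
p_ref = np.exp(-0.5 * ((r - 3.0) / 0.3) ** 2)  # hypothetical P_ref(r)
eta = 0.01                                     # hypothetical noise level

f_clean = deer_signal(p_ref, r, t, lam1=0.18, lam2=0.08193)
f_exp = f_clean + np.random.default_rng(1).normal(0.0, eta, size=t.size)  # eq. 17
```

With this functional form the long-time signal approaches (1 − Λ1) exp(−Λ2 t), which is why a single-exponential fit of the tail can provide starting estimates of the two parameters.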

The DEER signal from eq. 17 was used as target experimental data for the RAD approach (Fig. 1 and Fig. S1–S2), in which the RAD bias potential acts on one of the two spin-labels distances. For the target DEER signal and for the one calculated from the unbiased dynamics (Fig. 1B and S1), the model parameters were set to Λ1 = 0.18 and Λ2 = 0.08193 μs⁻¹. Their initial values for RAD simulations were calculated by a single exponential fit of the target DEER signal of Fig. 1B (same as Fig. S1C) for times larger than 2 μs, yielding Λ1 = 0.1865 and Λ2 = 0.07994 μs⁻¹. For all Langevin simulations, the diffusion coefficients in the two coordinates were set to 0.001 Å²/[simulation step] and the temperature was 300 K. For the RAD simulation, τ0 = 10⁷ steps (see eq. 3 and eq. 50).
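The unbiased dynamics of the two-dimensional model can be sketched with a simple Euler scheme for overdamped Langevin motion on U(r1, r2) = F(r1) + F(r2). The sketch assumes Gaussian reference distributions P(r) (the actual distributions are those of Fig. S1A), and it propagates many independent walkers to obtain statistics quickly; centers, widths and function names are hypothetical. Because F(r) = −kBT ln P(r), the drift −βD dF/dr reduces to D d ln P/dr and the temperature does not appear explicitly.

```python
import numpy as np

def langevin_2d(n_steps, n_walkers, r0=(30.0, 40.0), s=(1.0, 1.5), D=0.001, seed=0):
    """Overdamped Langevin dynamics of two independent distances (Å).

    P(r_i) is assumed Gaussian with mean r0[i] and standard deviation s[i].
    D is the diffusion coefficient in Å^2 per simulation step.
    """
    rng = np.random.default_rng(seed)
    r0 = np.asarray(r0, dtype=float)
    s = np.asarray(s, dtype=float)
    r = np.tile(r0, (n_walkers, 1))       # all walkers start at the minimum
    for _ in range(n_steps):
        drift = -D * (r - r0) / s**2      # D * d ln P / dr for a Gaussian P
        r = r + drift + np.sqrt(2.0 * D) * rng.standard_normal(r.shape)
    return r

samples = langevin_2d(n_steps=12000, n_walkers=2000)
print("mean:", samples.mean(axis=0), "std:", samples.std(axis=0))
```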

Assessing the uncertainty of the DEER data

A useful strategy to assess the contribution of systematic errors and of DEER observables approximations on the overall uncertainty, η, is the following:

  • Set the initial uncertainty, η, as the signal noise, e.g. estimated from the imaginary component of the experimental signal (Brandon et al., 2012).

  • Perform test RAD simulations on a fast model system consisting of a Langevin dynamics along the spin-labels distance (as in Fig. 1), using a constant potential energy landscape. For such an uninformative model, in the absence of errors other than noise, RAD would typically best fit the DEER data at approximately α = 1: if this choice underestimated the uncertainty, the deviation between simulation and experiments, $D_{exp}[t] = \sum_{t_d} |\bar{F}_{t_d}[t] - F_{t_d}^{exp}| / (N\eta)$ (where $\bar{F}_{t_d}[t]$ is given by eq. 59), would remain greater than one throughout the simulation and large biasing forces would develop (as in the case of RAD with static Λ1 and Λ2 in Fig. S1B). A simple correction in this case would be to set η as the signal noise multiplied by a factor α, estimated as the value of $D_{exp}$ that the simulation can easily reach.

    More rigorously, one could run Langevin simulations at different values of α and compute the function W (work) vs α (see above). The corrected η is the signal noise multiplied by the value of α at the kink of the curve. An example is in the inset of Fig. S7A (110–173 pair).

    According to this scheme, for several DEER measurements used in this work, the random noise already provides a reasonable estimate of the uncertainty (table S1, S2); for a few signals the latter had to be adjusted to avoid overfitting the experimental data (VcSiaP system: values marked in red in table S2).

  • A RAD simulation can now be performed on the atomistic system using the value of η estimated in the previous steps and setting α = 1. In the applications presented here, the preliminary estimate of η from the simple model system and the choice α = 1 proved to be adequate, as shown in the subsequent L-curve validation tests from the reweighted RAD ensemble (Fig. S3ABC, S6A, S7A). This notwithstanding, the value of α can be verified during the simulation by monitoring the degree of agreement with the experiments and the work performed by the bias potential. If the value of α is underestimated, $D_{exp}[t]$ is expected to remain greater than α during the simulation and the work calculated for different time ranges will increase, reaching large unphysical values (case α = 0.1 in Fig. S3). This behavior can be amended by setting α to a small value of $D_{exp}$ measured from the simulation before an excessive bias is built up.

  • If preliminary simulations are available, the initial value of α can be alternatively assessed using the L-curve criterion from the reweighted ensembles.

  • The RAD ensemble can be finally refined through reweighting at optimal values of α as reported in the previous sections.
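The two quantities that drive the procedure above, the deviation D_exp and the rescaled uncertainty, can be computed as in the following sketch. The function names and the numerical values are hypothetical; the calculated signal would in practice be the running time average from a test simulation.

```python
import numpy as np

def deviation(f_bar, f_exp, eta):
    """D_exp = sum_td |F_bar_td - F_exp_td| / (N * eta)."""
    f_bar = np.asarray(f_bar, dtype=float)
    f_exp = np.asarray(f_exp, dtype=float)
    return float(np.mean(np.abs(f_bar - f_exp)) / eta)

def corrected_eta(noise_eta, d_exp_reached):
    """Signal noise multiplied by the D_exp value the test simulation easily reaches."""
    return noise_eta * d_exp_reached

# Hypothetical two-point example with a noise estimate of 0.01.
f_bar = [0.50, 0.60]
f_exp = [0.52, 0.57]
d = deviation(f_bar, f_exp, eta=0.01)
print(d, corrected_eta(0.01, d))
```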

Theoretical basis of the maximum entropy formulation

Here we introduce the basic principles to couple simulations and experiments according to the maximum entropy principle, in presence of uncertainty and unknown model parameters. The proposed formulation can be applied to all the experimental techniques for which the experimental measurement reflects the mean value of a particular observable over an ensemble of atomic configurations. In the context of a molecular simulation, such ensemble average is represented by the time average of the latter observable over the simulation trajectory.

According to the maximum entropy principle, to impose a specific time average on a set of observables, $\xi_i(X, \Lambda)$, it is sufficient to add a linear perturbation to the simulation energy function, $U_{MD}(X)$ (Pitera and Chodera, 2012):

\[ U_{RAD}(X) = U_{MD}(X) - \frac{1}{\beta} \sum_i \lambda_i\,\xi_i(X, \Lambda) \tag{18} \]

where $X$ denotes the atomic configurations, $\Lambda_1, \Lambda_2, \ldots, \Lambda_n$ are unknown model parameters and $\beta = 1/k_B T$, where $k_B$ is the Boltzmann constant and $T$ is the temperature. The bias potential in eq. 18, $V(X) = -(1/\beta) \sum_i \lambda_i\,\xi_i(X, \Lambda)$, generates additional atomic forces, $f_V = -\nabla V(X) = \frac{1}{\beta} \sum_i \lambda_i \nabla \xi_i(X, \Lambda)$.

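If the uncertainty is negligible, the parameters $\lambda_i$ ensure that the time average of $\xi_i(X, \Lambda)$ during the simulation (denoted $\bar{\xi}_i$) is equal to the experimental value, $\xi_i^{exp}$. We now assume instead that the overall uncertainty in the ensemble average of the experimental observable, $\langle \xi_i(X, \Lambda) \rangle = \bar{\xi}_i$, in a realistic system, is described by a given distribution function $P_{err}(\bar{\xi})$ representing the likelihood that $\bar{\xi} = \bar{\xi}_1, \bar{\xi}_2, \ldots, \bar{\xi}_m$ reproduces the experimental observations.

The concept is to determine the corrected ensemble of configurations and average values, $\rho_{RAD}(X, \bar{\xi}/\Lambda)$, that deviates minimally from the molecular dynamics (MD) ensemble, $\rho_{MD}(X) \propto \exp\{-\beta U_{MD}(X)\}$, and from the error distribution, $P_{err}(\bar{\xi})$. Our premise is that, notwithstanding the uncertainty, eq. 18 still provides the optimal correction to a conventional MD simulation; i.e. $\rho_{RAD}(X) = \int \rho_{RAD}(X, \bar{\xi})\,d\bar{\xi} \propto \exp\{-\beta U_{MD}(X) + \sum_i \lambda_i\,\xi_i(X, \Lambda)\}$. However, the parameters $\lambda_i$ are now set so that the average value of $\xi_i(X, \Lambda)$ converges not exactly to the experimental data but to an optimal value, $\bar{\xi}_i^{RAD}$, accounting for such uncertainty. These considerations translate into the following expression for $\rho_{RAD}(X, \bar{\xi}/\Lambda)$:

\[ \rho_{RAD}(X, \bar{\xi}/\Lambda) \propto \exp\left\{ -\beta U_{MD}(X) + \sum_i \lambda_i\,\xi_i(X, \Lambda) \right\} \prod_i \tilde{\delta}\left( \bar{\xi}_i - \bar{\xi}_i^{RAD} \right) \tag{19} \]

in which $\tilde{\delta}(\bar{\xi}_i - \bar{\xi}_i^{RAD})$ denotes a function that is peaked around $\bar{\xi}_i^{RAD}$, resembling a Dirac δ. Note that $\lambda_i$ depends implicitly on $\bar{\xi}$ according to the equation below:

\[ \bar{\xi}_i = \frac{\int \xi_i(X, \Lambda) \exp\left\{ -\beta U_{MD}(X) + \sum_k \lambda_k\,\xi_k(X, \Lambda) \right\} dX}{\int \exp\left\{ -\beta U_{MD}(X) + \sum_k \lambda_k\,\xi_k(X, \Lambda) \right\} dX}. \tag{20} \]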

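The next step is to identify the corrected ensemble that maximizes the excess cross-entropy relative to the reference probability distribution, $\rho_{MD}(X)P_{err}(\bar{\xi})$. Therefore, we introduce the following Kullback–Leibler functional (the negative of the excess cross-entropy):

$$D_{KL} = \int \rho_{RAD}(X,\bar{\xi}/\Lambda)\,\ln\frac{\rho_{RAD}(X,\bar{\xi}/\Lambda)}{\rho_{MD}(X)\,P_{err}(\bar{\xi})}\,dX\,d\bar{\xi}. \tag{21}$$

Substituting the expressions for $\rho_{RAD}(X,\bar{\xi}/\Lambda)$ and $\rho_{MD}(X)$ in eq. 21, the latter equation can be written as:

$$D_{KL} = \sum_i \lambda_i\bar{\xi}_i^{RAD} - \ln\int\exp\Big\{-\beta U_{MD}(X) + \sum_i \lambda_i\xi_i(X,\Lambda)\Big\}\,dX - \ln P_{err}(\bar{\xi}^{RAD}) + C. \tag{22}$$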

In eq. 22, $C$ denotes an irrelevant constant, while the first two terms are related to the work required to bias the conformational ensemble (see below), yielding eq. 8. The reversible work can in fact be defined as the free energy difference between the biased and unbiased ensemble distributions:

$$W = -\frac{1}{\beta}\ln\left[\frac{\int\exp\{-\beta(U_{MD}(X) + V(X) - \bar{V})\}\,dX}{\int\exp\{-\beta U_{MD}(X)\}\,dX}\right] \tag{23}$$

where $\bar{V} = \langle V(X)\rangle = \int V(X)\exp\{-\beta(U_{MD}(X)+V(X))\}\,dX \,/ \int\exp\{-\beta(U_{MD}(X)+V(X))\}\,dX$, i.e. the notation $\langle\,\rangle$ stands for the average over the simulation ensemble. Notably, eq. 23 can also be rewritten in terms of the Kullback–Leibler divergence from the biased to the unbiased conformational ensemble, $\beta W = \int\rho_{RAD}(X)\ln[\rho_{RAD}(X)/\rho_{MD}(X)]\,dX$, leading to $\beta W = \sum_i \lambda_i\bar{\xi}_i^{RAD} - \ln\int\exp\{-\beta U_{MD}(X) + \sum_i \lambda_i\xi_i(X,\Lambda)\}\,dX$ (up to an additive constant), which are the first two terms of eq. 22.

Additionally, eq. 23 can also be rearranged as $W = (1/\beta)\ln\langle\exp\{\beta[V(X) - \bar{V}]\}\rangle$, which is formally equivalent to eq. 7 and eq. 16.
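As an illustration of the last expression, the work can be estimated from a list of bias-potential values recorded along a biased trajectory. The following is a minimal sketch, not the Colvars implementation; the function name `bias_work` and its arguments are hypothetical, and the samples are assumed to be drawn from the biased (RAD) ensemble at a uniform stride.

```python
import math

def bias_work(bias_values, beta):
    """Estimate W = (1/beta) * ln < exp{beta [V - <V>]} > from bias samples.

    bias_values: values of V(X) sampled along the biased trajectory (assumed).
    beta: 1 / (kB * T), in units consistent with bias_values.
    """
    n = len(bias_values)
    v_mean = sum(bias_values) / n
    exponents = [beta * (v - v_mean) for v in bias_values]
    # Log of the average exponential, with the maximum factored out for stability.
    e_max = max(exponents)
    log_avg = e_max + math.log(sum(math.exp(e - e_max) for e in exponents) / n)
    return log_avg / beta
```

By Jensen's inequality this estimate is never negative, and it vanishes when the bias is constant over the sampled configurations.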

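The optimal values ($\bar{\xi}_i^{RAD}$) and model parameters ($\Lambda$) are obtained by minimizing the $D_{KL}$ functional, for example by finding the values that set the derivatives of eq. 22 to zero:

$$\left(\frac{\partial D_{KL}}{\partial \bar{\xi}_j^{RAD}}\right)_{\Lambda,\,\bar{\xi}_{i\neq j}^{RAD}} = \lambda_j - \left(\frac{\partial \ln P_{err}(\bar{\xi}^{RAD})}{\partial \bar{\xi}_j^{RAD}}\right)_{\bar{\xi}_{i\neq j}^{RAD}} \tag{24}$$
$$\left(\frac{\partial D_{KL}}{\partial \Lambda_j}\right)_{\Lambda_{n\neq j},\,\bar{\xi}^{RAD}} = -\sum_k \lambda_k \left\langle\left(\frac{\partial \xi_k(X,\Lambda)}{\partial \Lambda_j}\right)_{\Lambda_{n\neq j}}\right\rangle. \tag{25}$$

It is worth pointing out that eq. 24 can also be derived using the Bayesian approach introduced by Hummer et al. (Hummer and Koefinger, 2015).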

Theoretical derivation of the RAD approach

In this section, we derive a set of equations describing the time evolution of the bias potential used in the RAD approach. This time evolution arises from the time dependence of the parameters $\lambda_i$ and $\Lambda$ in eq. 18 and is determined such that, at convergence, the underlying dynamics fulfills the experimental measurements within the uncertainty. To simplify the formulation, we initially assume a single observable and introduce an energy function analogous to the one in eq. 18 (Pitera and Chodera, 2012):

$$U_{RAD}(X,\lambda) = U_{MD}(X) - \frac{1}{\beta}\lambda\,(\xi(X,\Lambda) - \bar{\xi}^{RAD}) + \frac{(K+1)}{\beta}\ln\int\exp\{-\beta U_{MD}(X) + \lambda(\xi(X,\Lambda) - \bar{\xi}^{RAD})\}\,dX. \tag{26}$$

From the latter energy function, we can derive the dynamic evolution of both the atomic configurations and the parameter $\lambda$. Compared to eq. 18, eq. 26 features an additional term representing a bias potential acting on $\lambda$, in which $K$ reflects the biasing strength. The usefulness of eq. 26 becomes clear by introducing the free energy as a function of $\lambda$:

$$F(\lambda) = -\frac{1}{\beta}\ln\int\exp\{-\beta U_{RAD}(X,\lambda)\}\,dX = \frac{K}{\beta}\ln\int\exp\{-\beta U_{MD}(X) + \lambda(\xi(X,\Lambda) - \bar{\xi}^{RAD})\}\,dX. \tag{27}$$

The derivatives of $F(\lambda)$ with respect to $\lambda$ are given by:

$$\frac{dF(\lambda)}{d\lambda} = \frac{K}{\beta}\left[\langle\xi(X,\Lambda)\rangle - \bar{\xi}^{RAD}\right] \tag{28}$$
$$\frac{d^2F(\lambda)}{d\lambda^2} = \frac{K}{\beta}\left[\langle\xi(X,\Lambda)^2\rangle - \langle\xi(X,\Lambda)\rangle^2\right]. \tag{29}$$

Hence, the free energy, $F(\lambda)$, has a minimum at the $\lambda$ value for which $\langle\xi(X,\Lambda)\rangle = \bar{\xi}^{RAD}$.

This result implies that, for a large value of the parameter $K$, the dynamical evolution of $\lambda$ and of the atomic coordinates dictated by eq. 26 provides a time average of $\xi(X,\Lambda)$ that is close to the optimal value $\bar{\xi}^{RAD}$. To illustrate a specific case, we assume that the dynamics of the parameter $\lambda$ is characterized by an overdamped Langevin process:

$$\dot{\lambda} = -\beta D_\lambda\frac{\partial U_{RAD}(X,\lambda)}{\partial\lambda} + \sqrt{2D_\lambda}\,R[t] \tag{30}$$

where $D_\lambda$ is the diffusion coefficient of $\lambda$, $R[t]$ is a stochastic term and $t$ is the simulation time. Introducing the expression of $U_{RAD}(X,\lambda)$ in eq. 30 we obtain the following equation:

$$\dot{\lambda} = D_\lambda(\xi(X,\Lambda) - \bar{\xi}^{RAD}) - D_\lambda(K+1)(\langle\xi(X,\Lambda)\rangle - \bar{\xi}^{RAD}) + \sqrt{2D_\lambda}\,R[t]. \tag{31}$$

Eq. 31 is not directly applicable in its present form because the mean value, $\langle\xi(X,\Lambda)\rangle$, is not known a priori. To obtain a useful expression for the time evolution of $\lambda$, we select $D_\lambda$ small enough that the time variation of $\lambda$ is slow compared to that of the observable $\xi(X,\Lambda)$. Under this condition the dynamics of $\lambda$ captures only the average effect of the fluctuations of $\xi(X,\Lambda)$; hence, in eq. 31, $\xi(X,\Lambda) \approx \langle\xi(X,\Lambda)\rangle$. As we are only interested in converging to the global minimum of $F(\lambda)$, we also neglect the noise term of eq. 31 (large $K$ approximation). Therefore, we obtain the following expression:

$$\dot{\lambda} = -\frac{(\xi(X,\Lambda) - \bar{\xi}^{RAD})}{\sigma^2\tau} \tag{32}$$

where $D_\lambda K = 1/\sigma^2\tau$, in which $\sigma^2$ denotes the typical variance of $\xi(X,\Lambda)$ during the dynamics and $\tau$ is a time constant.

Relations similar to eq. 32 have also been proposed in previous studies based either on a Bayesian or on a maximum entropy approach (Cesari et al., 2016; Hummer and Koefinger, 2015; Olsson et al., 2015; White and Voth, 2014). To avoid large time oscillations of $\lambda$, $\tau$ should usually be selected on the order of the equilibration time of $\xi(X,\Lambda)$. Integrating eq. 32 we obtain the update rule for $\lambda$:

$$\lambda[t+dt] = \lambda[t] - \frac{(\xi(X[t],\Lambda[t]) - \bar{\xi}^{RAD}[t])}{\sigma^2[t]\,\tau[t]}\,dt. \tag{33}$$

The present formulation can be extended to multiple observables and experimental measurements by simply updating over time the individual parameters, $\lambda_i$, in eq. 18 according to eq. 33, leading to an expression equivalent to eq. 3:

$$\lambda_i[t+dt] = \lambda_i[t] - \frac{(\xi_i(X[t],\Lambda[t]) - \bar{\xi}_i^{RAD}[t])}{\sigma_i^2[t]\,\tau_i[t]}\,dt. \tag{34}$$

The time average of the selected observables along the trajectory typically converges to the optimal values $\bar{\xi}_i^{RAD}$, provided that the latter values are energetically feasible and that different observables are not strongly correlated (Pitera and Chodera, 2012; White and Voth, 2014). Note however that the presence of correlations between the observables could increase the strength of the bias, leading to larger time fluctuations of the terms $\lambda_i[t]$. In such cases, it can be useful to reduce the bias strength by increasing the term $\sigma_i^2$. According to the tests performed in this work, large time oscillations of $\lambda_i[t]$ give rise to an overall larger simulation bias.

Guided by a series of preliminary tests, we set the time update of the coupling constant $\tau_i$ according to Zinkevich's online gradient descent algorithm (Zinkevich, 2003), $\tau_i[t] = \sqrt{\tau_{0i}(\tau_{0i}+t)}$, in which $\tau_{0i}$ is the initial coupling time and can be selected as the time range associated with the fluctuations of $\xi_i(X,\Lambda)$. Such a gradual increase of $\tau_i$ induces smaller fluctuations of $\lambda_i$ during the simulation, thereby facilitating convergence.
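The update rule of eq. 34 and the growing coupling time can be sketched in a few lines. This is only an illustrative sketch: the function names are hypothetical, and the coupling time is assumed to follow the square-root schedule $\tau[t] = \sqrt{\tau_0(\tau_0 + t)}$, which equals $\tau_0$ at $t = 0$ and grows as $\sqrt{t}$ at long times.

```python
import math

def coupling_time(tau0, t):
    """Assumed schedule: tau[t] = sqrt(tau0 * (tau0 + t))."""
    return math.sqrt(tau0 * (tau0 + t))

def update_lambda(lam, xi_now, xi_target, sigma, tau, dt):
    """One step of eq. 34 for a single observable.

    lam decreases when the instantaneous observable exceeds its target,
    so that the bias -(1/beta) * lam * xi penalizes large values of xi.
    """
    return lam - (xi_now - xi_target) / (sigma ** 2 * tau) * dt
```

In a simulation loop, `update_lambda` would be called at every step with the current value of the observable and `tau = coupling_time(tau0, t)`.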

To consider the overall uncertainty, the term $\bar{\xi}_i^{RAD}$ in eq. 34 is updated over time according to eq. 24, which for a Gaussian error distribution leads to an expression equivalent to eq. 4:

$$\bar{\xi}_i^{RAD}[t] = \xi_i^{exp} - \frac{\eta_i^2}{\gamma[t]}\lambda_i[t]. \tag{35}$$

The meaning of the parameter $\gamma$ is explained in the Restrained-average dynamics section and resembles the confidence factor introduced by Hummer et al. (Hummer and Koefinger, 2015).

In this regard, our approach is also related to the method introduced by Gull and Daniell (Gull and Daniell, 1978) for image reconstruction and then proposed by Boomsma et al. (Boomsma et al., 2014) in the context of molecular simulations. Specifically, $\gamma$ is selected so that $\bar{\xi}_i^{RAD}$ reproduces the experimental value, $\xi_i^{exp}$, up to a certain level, which we defined as $D_{exp} = \sum_i|\bar{\xi}_i^{RAD}[t] - \xi_i^{exp}|/N\eta_i$, in which $N$ is the number of data points, leading to the expression for $\gamma[t]$ given by eq. 5. Note that the degree of agreement with the experiments can alternatively be defined by the (more common) sum of squared residuals:

$$\frac{\chi^2}{N} = \frac{1}{N}\sum_i\frac{(\bar{\xi}_i^{RAD}[t] - \xi_i^{exp})^2}{\eta_i^2} = \alpha^2. \tag{36}$$

Substituting the expression for $\bar{\xi}_i^{RAD}[t]$ provided by eq. 35 into eq. 36, we obtain an alternative update rule for $\gamma$:

$$\gamma[t] = \frac{1}{\alpha}\sqrt{\frac{1}{N}\sum_i\eta_i^2\lambda_i^2[t]}. \tag{37}$$

Note however that, assuming for example $\alpha = 1$, eq. 37 usually entails a larger amount of bias than eq. 5.
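The two criteria can be compared numerically. The sketch below uses hypothetical function names. It assumes that imposing $D_{exp} = \alpha$ together with eq. 35 gives $\gamma = \sum_i \eta_i|\lambda_i|/(\alpha N)$, which we take to be the content of eq. 5, while `gamma_rms` implements eq. 37. Since a root-mean-square is never smaller than a mean absolute value, eq. 37 yields the larger $\gamma$ at equal $\alpha$, consistent with the remark above.

```python
import math

def gamma_mean_abs(lams, etas, alpha):
    """gamma such that (1/N) * sum |xi_RAD - xi_exp| / eta equals alpha (assumed eq. 5)."""
    n = len(lams)
    return sum(e * abs(l) for l, e in zip(lams, etas)) / (alpha * n)

def gamma_rms(lams, etas, alpha):
    """Eq. 37: gamma such that chi^2 / N equals alpha^2."""
    n = len(lams)
    return math.sqrt(sum((e * l) ** 2 for l, e in zip(lams, etas)) / n) / alpha

def rad_targets(xi_exp, lams, etas, gamma):
    """Eq. 35: optimal values for a Gaussian error model (requires gamma > 0)."""
    return [x - e ** 2 * l / gamma for x, l, e in zip(xi_exp, lams, etas)]
```

Both functions return zero when all $\lambda_i$ vanish, in which case eq. 35 is undefined and the targets should simply be left at the experimental values.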

The specific strategies to select the parameter $\alpha$ (using either eq. 37 or eq. 5) are discussed in the sections above.

It is worth pointing out that large values of the effective uncertainty, $\eta_i/\sqrt{\gamma}$, i.e. small values of $\gamma$, may in principle lead to instabilities due to broad time fluctuations of $\bar{\xi}_i^{RAD}[t]$ (see eq. 35). In practice, we did not experience such instabilities in the application of this methodology to DEER spectroscopy, even when the signal is very noisy or when the unbiased simulation already reproduces the experimental observations, unless the uncertainty becomes unphysically large (e.g. $\eta \sim 1$ with $\alpha \sim 1$). In such extreme cases, a convergent behavior can be facilitated by updating $\bar{\xi}_i^{RAD}[t]$ (and possibly also $\gamma[t]$) at a slower pace:

$$\bar{\xi}_i^{RAD}[t+dt] = \bar{\xi}_i^{RAD}[t]\,(1-\theta) + \theta\left(\xi_i^{exp} - \frac{\eta_i^2}{\gamma[t]}\lambda_i[t]\right) \tag{38}$$

in which $\theta$ ($0 \le \theta \le 1$) is a suitably chosen update rate. The parameters $\Lambda$ are determined by minimizing the $D_{KL}$ functional (i.e. reducing the simulation bias and improving the agreement with the experiments) according to a gradient optimization algorithm (equivalent to Equation 6) based on eq. 25:

$$\Lambda_j[t+dt] = \Lambda_j[t] + \varepsilon_j[t]\sum_i\lambda_i[t]\left(\frac{\partial\xi_i(X,\Lambda)}{\partial\Lambda_j}\right)_{\Lambda_{n\neq j}}dt \tag{39}$$

where $\varepsilon_j[t]$ is an appropriately selected update rate (see below for the application to DEER).

The trend of a RAD simulation can be inspected by monitoring the model parameters $\Lambda_j[t]$ (Fig. 1D) and the factor $\gamma[t]$ (Fig. S1B, S2D) during the trajectory, in which $\gamma[t]$ reflects the global evolution of the terms $\lambda_i[t]$. Typically, after an initial equilibration stage, they converge to specific values. Non-convergent behavior of either of these terms is an indication of instability, owing for example to low values of $\alpha$ (Fig. S3E, $\alpha = 0.1$) and/or to an incorrect setting of the simulation parameters. Another useful indicator for assessing the simulation behavior is $D_{exp} = \sum_i|\bar{\xi}_i[t] - \xi_i^{exp}|/N\eta_i$, in which $\bar{\xi}_i[t]$ is the time average of $\xi_i(X,\Lambda)$ along the simulation, e.g. after the equilibration time. Ideally, $D_{exp}$ converges over time to the imposed value of $\alpha$; if it remains larger than $\alpha$ throughout the simulation, this might suggest that $\alpha$ must be increased (Fig. S3D, $\alpha = 0.1$). Lastly, the trend of the work for different simulation intervals (global or of the individual observables) is also a useful descriptor to assess the simulation progression. Again, large and divergent work values can indicate that the value of $\alpha$ is too small (Fig. S3F, $\alpha = 0.1$).

Using RAD with multiple walkers

Sampling and convergence of the RAD approach can be improved using a simple generalization of the methodology involving parallel simulation trajectories. Similarly to the multiple walkers metadynamics technique (Raiteri et al., 2006), such a scheme can be devised using multiple simulation replicas that share a common bias potential arising from the average of the bias potentials of the individual simulations:

$$U_{RAD}(X) = U_{MD}(X) - \frac{1}{\beta}\sum_i\left(\frac{1}{N_R}\sum_{j=1}^{N_R}\lambda_i^j\right)\xi_i(X,\Lambda) \tag{40}$$

in which $N_R$ is the number of replicas and $\lambda_i^j$ is the bias factor of the observable $\xi_i(X,\Lambda)$ associated with replica $j$. In practice, eq. 40 exploits bias factors that are averaged over the replicas. Each $\lambda_i^j$ evolves over time according to eq. 34, leading to the following update rule for the average bias factor, $\bar{\lambda}_i = \sum_{j=1}^{N_R}\lambda_i^j/N_R$:

$$\bar{\lambda}_i[t+dt] = \bar{\lambda}_i[t] - \frac{\left(\frac{1}{N_R}\sum_{j=1}^{N_R}\xi_i(X_j[t],\Lambda[t]) - \bar{\xi}_i^{RAD}[t]\right)}{\sigma_i^2[t]\,\tau_i[t]}\,dt \tag{41}$$

where $X_j[t]$ are the atomic coordinates of replica $j$ at time $t$. Optimal values and model parameters are evolved according to eq. 35 and 39 (eq. 4 and 6), in which the term $\bar{\lambda}_i$ is used in place of $\lambda_i$. Similarly, the parameter $\gamma$ is calculated using eq. 5 or eq. 37 according to the average bias factor.
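A minimal sketch of the shared update of eq. 41 for one observable is given below; the function name and argument layout are hypothetical. Because eq. 34 is linear in the observable, averaging the per-replica updates gives the same result as applying one update with the replica-averaged observable.

```python
def update_mean_lambda(lam_bar, xi_replicas, xi_target, sigma, tau, dt):
    """One step of eq. 41: update the replica-averaged bias factor.

    xi_replicas holds the instantaneous value of the observable in each replica.
    """
    xi_mean = sum(xi_replicas) / len(xi_replicas)
    return lam_bar - (xi_mean - xi_target) / (sigma ** 2 * tau) * dt
```

With a single replica this reduces to the single-trajectory update of eq. 34.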

Interestingly, in the limit of small time variations of the term $\left(\frac{1}{N_R}\sum_{j=1}^{N_R}\xi_i(X_j[t],\Lambda[t]) - \bar{\xi}_i^{RAD}[t]\right)$, e.g. near convergence and using many replicas, this scheme becomes equivalent to the replica average approach (Bonomi et al., 2016; Cavalli et al., 2013).

The multiple walker RAD method provides in principle the same conformational ensemble as the single RAD trajectory, as illustrated on a model system (inset of Fig. S2D); however, the use of multiple replicas improves the sampling and reduces the time fluctuations of the bias potential, leading to better convergence (Fig. S2D).

Finally, the performance of RAD could be further improved by coupling the latter approach with enhanced sampling techniques through a replica exchange approach (Cesari et al., 2016).

Alternative methods to select the degree of agreement with the experiments

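Besides the L-curve plot discussed previously, an alternative route to infer the optimal degree of agreement with the experiments, $\alpha$, is to exploit methods that are typically used to select the value of the regularization term in the Tikhonov regularization approach (Edwards and Stoll, 2018) or to assess the best model in fitting procedures (Brandon et al., 2012; Stein et al., 2015), for example cross validation methods and Bayesian criteria. Most of these approaches require an estimate of the effective number of degrees of freedom in the model, which can be calculated as the trace of the hat matrix or influence matrix, defined according to the following equation:

$$\bar{\xi}^{RAD} = \hat{H}\,\xi^{exp}. \tag{42}$$

From eq. 42 the elements of the hat matrix can be derived as follows:

$$H_{ij} = \frac{\partial\bar{\xi}_i^{RAD}}{\partial\xi_j^{exp}}. \tag{43}$$

Exploiting eq. 35 we can express the previous derivatives as $\frac{\partial\bar{\xi}_i^{RAD}}{\partial\xi_j^{exp}} = \sum_k\frac{\partial\bar{\xi}_i^{RAD}}{\partial\lambda_k}\frac{\partial\lambda_k}{\partial\xi_j^{exp}} = \sum_k C_{ik}\frac{\gamma}{\eta_k^2}\left(I_{kj} - \frac{\partial\bar{\xi}_k^{RAD}}{\partial\xi_j^{exp}}\right)$, in which $\frac{\partial\bar{\xi}_i^{RAD}}{\partial\lambda_k} = C_{ik} = \langle\xi_i(X,\Lambda)\xi_k(X,\Lambda)\rangle - \langle\xi_i(X,\Lambda)\rangle\langle\xi_k(X,\Lambda)\rangle$ is the covariance matrix of the observables calculated on the simulation ensemble. From the previous relation we can derive the following expression for the hat matrix:

$$\hat{H} = \hat{I} - (\hat{I} + \hat{C}\hat{\gamma}_\eta)^{-1} \tag{44}$$

in which $\{\hat{\gamma}_\eta\}_{ii} = \gamma/\eta_i^2$ is a diagonal matrix. Therefore, the effective number of model parameters or degrees of freedom, $n_f$, can be estimated as:

$$n_f = \mathrm{tr}(\hat{H}) = N - \mathrm{tr}\left((\hat{I} + \hat{C}\hat{\gamma}_\eta)^{-1}\right) \tag{45}$$

or using symmetrized matrices as $n_f = N - \mathrm{tr}\left((\hat{\gamma}_\eta^{1/2}[\hat{\gamma}_\eta^{-1} + \hat{C}]\hat{\gamma}_\eta^{1/2})^{-1}\right)$.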
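For a small number of observables, eq. 45 can be evaluated directly. The sketch below is illustrative only and uses hypothetical function names. It assumes the diagonal matrix $\hat{\gamma}_\eta$ has elements $\gamma/\eta_i^2$ and uses a plain Gauss-Jordan inversion to stay within the Python standard library; a linear-algebra package would be preferable for realistic data sizes.

```python
def mat_inverse(a):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def effective_dof(cov, etas, gamma):
    """n_f = N - tr[(I + C G)^-1], with G diagonal and G_ii = gamma / eta_i^2 (eq. 45)."""
    n = len(cov)
    m = [[(1.0 if i == j else 0.0) + cov[i][j] * gamma / etas[j] ** 2
          for j in range(n)] for i in range(n)]
    inv = mat_inverse(m)
    return n - sum(inv[i][i] for i in range(n))
```

For two observables with unit variance and unit noise, fully correlated observables give $n_f = 2/3$ whereas uncorrelated ones give $n_f = 1$, in line with the expectation that correlations reduce the effective number of degrees of freedom.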

In general, the number of effective degrees of freedom is smaller for observables that are correlated (as $\hat{H}$ depends on the covariance matrix) and increases on decreasing the effective uncertainty, $\eta_i/\sqrt{\gamma}$, and thereby for small values of $\alpha$ in eq. 5 (see Fig. S2C). Once such a quantity has been estimated, it can be used for example to calculate the generalized cross validation function at different values of $\alpha$ (Edwards and Stoll, 2018):

$$GCV(\gamma[\alpha]) = \frac{\sum_i(\bar{\xi}_i^{RAD} - \xi_i^{exp})^2}{[1 - n_f/N]^2}. \tag{46}$$

The optimal value of $\alpha$ can be selected as the one that minimizes the GCV function. Our test on a model system dominated by the random error shows for example that most of the decrease in the GCV function occurs in the region $\alpha > 1$ (inset of Fig. S2C); i.e. in agreement with the L-curve method, $\alpha \sim 1$ provides a reasonable choice of this parameter.

Details on the application of RAD to DEER experiments

As discussed previously, DEER experiments entail a time dependent signal, $F_{t_d}^{exp}$, that can be related to the ensemble average of a set of observables. Therefore, we can combine simulations and DEER data using the RAD approach by substituting in the previous equations $\xi_i(X,\Lambda)$ with the corresponding DEER observable, $F_{t_d}(r(X),\Lambda_1,\Lambda_2)$ (see eq. 1). Here we assume that the DEER signal has been scaled and shifted so that $F_0^{exp} \approx 1$ (Brandon et al., 2012).

For two isolated spins, the dipolar kernel of eq. 1 is typically modeled as (Brandon et al., 2012; Edwards and Stoll, 2016):

$$k(t_d,r) = \sqrt{\frac{\pi[C(z)^2 + S(z)^2]}{6\,\omega_d t_d}}\,\cos\left[\omega_d t_d - \arctan\frac{S(z)}{C(z)}\right] \tag{47}$$

where $r$ is the spin-label distance, $\omega_d = g^2\mu_B^2\mu_0/4\pi\hbar r^3$ is the dipolar frequency, whereas $C(z) = \int_0^z\cos\left(\frac{\pi}{2}x^2\right)dx$ and $S(z) = \int_0^z\sin\left(\frac{\pi}{2}x^2\right)dx$ are the cosine and sine Fresnel integrals ($z = \sqrt{6\,\omega_d t_d/\pi}$). The previous equation implicitly assumes a macroscopically disordered sample and ideal pulses (Jeschke et al., 2006). The implication is that such a formulation is not accurate for spin-label distances below 17–18 Å (Jeschke et al., 2006). To avoid artifacts in the RAD methodology, the biasing forces can be removed below those distances. We also neglected possible issues arising from orientation selectivity (Jeschke, 2012; Schiemann and Prisner, 2007), overlapping of the electron paramagnetic resonance spectra of the spin-label pair and overlapping of the excitation bands of pump and detection pulses (Salikhov and Khairuzhdinov, 2015). More complex expressions of the DEER kernel, including more parameters to be optimized, can be employed if needed to account for such effects. Similarly, more accurate expressions for the background term in eq. 1, involving additional parameters, can be adopted to account for excluded volume effects in large biomolecules (Brandon et al., 2012). The connection between the DEER signal and the spin-label distance distribution, $P(r)$, becomes explicit by expressing the experimental signal as the mean value of the DEER observables:

$$F_{t_d}^{exp} \approx \langle F_{t_d}(r(X),\Lambda_1,\Lambda_2)\rangle = \int P(r)\,F_{t_d}(r,\Lambda_1,\Lambda_2)\,dr. \tag{48}$$

The typical analysis of the DEER data aims at obtaining $P(r)$ from fitting and/or regularization procedures based on eq. 48 (Brandon et al., 2012; Edwards and Stoll, 2016; Jeschke et al., 2006). The value of the parameters $\Lambda_1$, $\Lambda_2$ is also usually obtained by data fitting (Brandon et al., 2012; Jeschke et al., 2006). The common procedure in this regard is to exploit a single exponential fit at long time ranges (Jeschke et al., 2006). If the DEER kernel function has not decayed to zero (small signal undulations) in the available time range of the fit, such an analysis may lead to suboptimal values of the parameters (see Fig. 1D) that then affect the shape of the DEER-based distance distribution (Brandon et al., 2012).
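To make the kernel of eq. 47 concrete, the sketch below evaluates it with the Python standard library only and checks it against the direct powder average $\int_0^1\cos[(1-3x^2)\,\omega_d t_d]\,dx$, to which eq. 47 is analytically equivalent. All function names are hypothetical, the Fresnel integrals are obtained by Simpson quadrature (adequate only for moderate values of $\omega_d t_d$), and the dipolar constant of 52.04 MHz nm$^3$ is the value commonly quoted for nitroxide spin labels with $g \approx 2.0023$; treat these choices as assumptions of the example.

```python
import math

def _simpson(f, a, b, n=2000):
    """Composite Simpson rule with n (even) subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0

def fresnel(z):
    """Return (C(z), S(z)), the cosine and sine Fresnel integrals."""
    c = _simpson(lambda x: math.cos(0.5 * math.pi * x * x), 0.0, z)
    s = _simpson(lambda x: math.sin(0.5 * math.pi * x * x), 0.0, z)
    return c, s

def dipolar_frequency(r_nm, omega_ref=2.0 * math.pi * 52.04):
    """omega_d in rad/us for a distance in nm (assumed constant, g ~ 2.0023)."""
    return omega_ref / r_nm ** 3

def deer_kernel(t_us, r_nm):
    """Eq. 47 for two isolated spins in a disordered sample with ideal pulses."""
    wt = dipolar_frequency(r_nm) * abs(t_us)
    if wt == 0.0:
        return 1.0  # limit of eq. 47 for t_d -> 0
    z = math.sqrt(6.0 * wt / math.pi)
    c, s = fresnel(z)
    amplitude = math.sqrt(math.pi * (c * c + s * s) / (6.0 * wt))
    return amplitude * math.cos(wt - math.atan2(s, c))

def deer_kernel_powder(t_us, r_nm):
    """Reference value: direct powder average over x = cos(theta)."""
    wt = dipolar_frequency(r_nm) * abs(t_us)
    return _simpson(lambda x: math.cos((1.0 - 3.0 * x * x) * wt), 0.0, 1.0)
```

For example, `deer_kernel(1.0, 3.0)` gives the kernel at 1 µs for a 3 nm distance; averaging `deer_kernel` over a distance distribution $P(r)$ yields the kernel part of eq. 48.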

Considering the linear relationship between $F_{t_d}(r(X),\Lambda_1,\Lambda_2)$ and the DEER kernel in eq. 1, to simplify the implementation of the RAD approach, we considered $k(t_d,r(X))$ as the actual observable for the maximum entropy correction. Therefore, the simulation energy function is biased according to the following equation:

$$U_{RAD}(X) = U_{MD}(X) - \frac{1}{\beta}\sum_{t_d}^{N}\lambda_{t_d}^k\,k(t_d,r(X)) \tag{49}$$

in which $N$ is the number of data points and $\lambda_{t_d}^k$ is updated according to eq. 34:

$$\lambda_{t_d}^k[t+dt] = \lambda_{t_d}^k[t] - \frac{k(t_d,r(X[t])) - \bar{k}_{t_d}^{RAD}[t]}{\sigma_k^2\,\tau[t]}\,dt. \tag{50}$$

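The parameter $\sigma_k$ in eq. 50 is related to the variance of the observable $k(t_d,r(X))$ and, for all the simulations carried out in this study, was selected as $\sigma_k \sim 0.1$. The term $\bar{k}_{t_d}^{RAD}$ is calculated according to eq. 1:

$$\bar{k}_{t_d}^{RAD}[t] = \frac{\left(\bar{F}_{t_d}^{RAD}[t] - e^{-(\Lambda_2[t]|t_d|)^{D/3}}\right)}{e^{-(\Lambda_2[t]|t_d|)^{D/3}}\,\Lambda_1[t]} + 1. \tag{51}$$

Comparing eq. 49 and eq. 2, the relationship between the corresponding biasing parameters is:

$$\lambda_{t_d}[t] = \lambda_{t_d}^k[t]\,/\left(e^{-(\Lambda_2[t]|t_d|)^{D/3}}\,\Lambda_1[t]\right). \tag{52}$$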
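The conversions between the signal and the kernel can be checked with a short sketch. It assumes that the DEER observable has the form $F = [\Lambda_1(k-1)+1]\,e^{-(\Lambda_2|t_d|)^{D/3}}$, which is the form of eq. 1 implied by eq. 51; the function names are hypothetical.

```python
import math

def background(t, lam2, dim=3.0):
    """Background decay exp[-(Lambda2 * |t|)^(D/3)]."""
    return math.exp(-((lam2 * abs(t)) ** (dim / 3.0)))

def signal_from_kernel(k, t, lam1, lam2, dim=3.0):
    """F = [Lambda1 * (k - 1) + 1] * background (assumed form of eq. 1)."""
    return (lam1 * (k - 1.0) + 1.0) * background(t, lam2, dim)

def kernel_from_signal(f, t, lam1, lam2, dim=3.0):
    """Eq. 51: kernel-level value corresponding to a signal-level value."""
    b = background(t, lam2, dim)
    return (f - b) / (b * lam1) + 1.0

def signal_multiplier(lam_k, t, lam1, lam2, dim=3.0):
    """Eq. 52: bias factor acting on F, given the factor acting on the kernel."""
    return lam_k / (background(t, lam2, dim) * lam1)
```

With these definitions, $\lambda_{t_d}F$ and $\lambda_{t_d}^k k$ differ only by a term that does not depend on the kernel, so both forms of the bias produce the same forces.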

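Hence, considering eq. 52, the time update of $\bar{F}_{t_d}^{RAD}[t]$ and $\gamma[t]$ follows eq. 4 and 5, in which we used a single noise term ($\eta_i = \eta$). The evolution of the parameters $\Lambda_1$ and $\Lambda_2$ can be obtained by substituting the expression of the DEER observable given by eq. 1 in eq. 39 and considering that $\langle k(t_d,r(X))\rangle \approx \bar{k}_{t_d}^{RAD}$. Thus, we obtain these specific expressions for eq. 6:

$$\Lambda_1[t+dt] = \Lambda_1[t] - \varepsilon_1[t]\sum_{t_d}^{N}\lambda_{t_d}^k[t]\,\frac{(1 - \bar{k}_{t_d}^{RAD}[t])}{\Lambda_1[t]}\,dt \tag{53}$$
$$\Lambda_2[t+dt] = \Lambda_2[t] - \varepsilon_2[t]\sum_{t_d}^{N}\lambda_{t_d}^k[t]\,\frac{(\Lambda_1[t](\bar{k}_{t_d}^{RAD}[t] - 1) + 1)}{\Lambda_1[t]}\,\frac{D}{3}\,|t_d|^{D/3}\,\Lambda_2[t]^{(D-3)/3}\,dt. \tag{54}$$

The terms $\varepsilon_1[t]$ and $\varepsilon_2[t]$ in eq. 53 and 54 regulate the update rate of the parameters and are related to the second derivatives of the $D_{KL}$ functional (eq. 22) with respect to the parameters. In practice, $\varepsilon_1[t]$ and $\varepsilon_2[t]$ are selected so that, at each step, variations of $\Lambda_1$ or $\Lambda_2$ over a time $\tau[t]$ induce changes in $\bar{k}_{t_d}^{RAD}$ that are on the order of $\sigma_k$:

$$\varepsilon_1[t] = \frac{\epsilon\,\sigma_k^2\,\Lambda_1^2[t]}{\tau[t]\sum_{t_d}^{N}(\bar{k}_{t_d}^{RAD}[t] - 1)^2} \tag{55}$$
$$\varepsilon_2[t] = \frac{\epsilon\,\sigma_k^2\,\Lambda_1^2[t]}{\tau[t]\sum_{t_d}^{N}\left((\Lambda_1[t]\bar{k}_{t_d}^{RAD}[t] - \Lambda_1[t] + 1)\,\frac{D}{3}\,|t_d|^{D/3}\,\Lambda_2[t]^{(D-3)/3}\right)^2}. \tag{56}$$

For a good behavior of the parameter evolution, the term $\epsilon$ is typically chosen on the order of $\epsilon = N/\sigma_k\sum_{t_d}^{N}|\lambda_{t_d}^k|$.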
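The parameter updates can be sketched as follows. This is an illustration under stated assumptions rather than the reference implementation: the function name is hypothetical, the update rates are taken as $\varepsilon_{1,2} = \epsilon\,\sigma_k^2\Lambda_1^2/(\tau\sum_{t_d}(\ldots)^2)$ with $\epsilon = N/(\sigma_k\sum_{t_d}|\lambda_{t_d}^k|)$, and the observable has the form $F = [\Lambda_1(k-1)+1]\,e^{-(\Lambda_2|t_d|)^{D/3}}$.

```python
def update_background_parameters(lam_k, k_rad, times, lam1, lam2,
                                 sigma_k, tau, dt, dim=3.0):
    """One step of eq. 53-54 with rates from eq. 55-56 (assumed form).

    lam_k, k_rad, times: per-time-point bias factors, kernel targets and t_d values.
    Requires at least one non-zero bias factor.
    """
    n = len(times)
    eps = n / (sigma_k * sum(abs(l) for l in lam_k))
    # Sensitivity terms entering eq. 53 and eq. 54, already divided by Lambda1.
    g1 = [(k - 1.0) / lam1 for k in k_rad]
    g2 = [(lam1 * (k - 1.0) + 1.0) / lam1 * (dim / 3.0)
          * abs(t) ** (dim / 3.0) * lam2 ** ((dim - 3.0) / 3.0)
          for k, t in zip(k_rad, times)]
    eps1 = eps * sigma_k ** 2 / (tau * sum(g * g for g in g1))
    eps2 = eps * sigma_k ** 2 / (tau * sum(g * g for g in g2))
    new_lam1 = lam1 + eps1 * sum(l * g for l, g in zip(lam_k, g1)) * dt
    new_lam2 = lam2 - eps2 * sum(l * g for l, g in zip(lam_k, g2)) * dt
    return new_lam1, new_lam2
```

In the single-point example used to check this sketch ($\Lambda_1 = 0.5$, $\bar{k}^{RAD} = 0.5$, $\sigma_k = 0.1$, $\tau = 10$), $\Lambda_1$ changes by 0.01 per unit time, i.e. by 0.1 over a time $\tau$, which shifts $\bar{k}^{RAD}$ by about $\sigma_k$ as intended.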

Alternatively, the parameters can be selected at each step by a gradient based minimization of the following function:

$$\chi_k^2 = \sum_{t_d}\left(\Lambda_1 e^{-(\Lambda_2|t_d|)^{D/3}}\right)^2\frac{\left(\langle k(t_d,r(X))\rangle - k_{t_d}^{exp}\right)^2}{\eta^2} \tag{57}$$

where $k_{t_d}^{exp} = \left(F_{t_d}^{exp} - (1 - \Lambda_1[t])\,e^{-(\Lambda_2[t]|t_d|)^{D/3}}\right)/\Lambda_1[t]\,e^{-(\Lambda_2[t]|t_d|)^{D/3}}$ and $\langle k(t_d,r(X))\rangle$ is the time average of $k(t_d,r(X))$ along the simulation. Note that such a procedure is exactly equivalent to eq. 6 or eq. 53–54, considering that $\lambda_{t_d}^k$ is related to the difference between $k_{t_d}^{exp}$ and $\bar{k}_{t_d}^{RAD}$ (through eq. 4) and that the latter optimal value converges to $\langle k(t_d,r(X))\rangle$. Typically, eq. 53–54 can converge faster than the latter approach in case the unbiased simulation does not reproduce the experimental signal. However, in case of large noise on the experimental data, to avoid instabilities due to the time oscillations of the optimal value (see eq. 4), it is more convenient to replace the latter with the corresponding time average (i.e. to minimize the $\chi_k^2$ function).

Finally, the RAD formulation can be extended to multiple independent DEER experiments by simply adding the corresponding bias potential terms in eq. 49:

$$U_{RAD}(X) = U_{MD}(X) - \frac{1}{\beta}\sum_j\sum_{t_d^j}^{N_j}\lambda_{t_d^j}^k\,k(t_d^j,r_j(X)) \tag{58}$$

in which the index $j$ accounts for the different DEER experiments and the other parameters are updated over time following the equations reported above.

Test of RAD on a two-dimensional system

To illustrate the importance of considering the experimental noise and of optimizing the model parameters, we show the outcome of RAD simulations on the two-dimensional model system of Fig. 1 (see also the section Two-dimensional model system) in which either the parameter optimization (as in eq. 53–54) was not carried out or the signal noise was artificially set to zero. For all simulations, the parameter $\sigma_k$ in eq. 50 was set to the average standard deviation of the observables $k(t_d,r(X))$ along an unbiased Langevin dynamics. As shown in Fig. S1B-S1C (and in Fig. 1), in contrast to the unbiased dynamics, the RAD approach matches the target DEER data up to the signal noise (as we imposed $\alpha = 1$ in eq. 5) and the corresponding spin-label distance distribution approaches the reference histogram (black line in the inset of Fig. S1C). Although the initial values of $\Lambda_1$ and $\Lambda_2$ are close to the reference ones (see Two-dimensional model system and Fig. 1D), if such parameters are not optimized (i.e. setting $\epsilon = 0$ in eq. 55–56) the system experiences large biasing forces (large values of $\gamma$; eq. 5) in order to reproduce the target data (Fig. S1B; right panel), resulting in a noisy spin-label distance histogram (blue line in the inset of Fig. S1C). The value of $\gamma$ is even larger when the signal noise is neglected ($\eta = \eta_i = 0$ in eq. 35 or in eq. 4) and it constantly increases during the simulation (Fig. S1B; right panel). Conversely, the agreement between the reference and the calculated DEER signal does not improve significantly along the simulation (Fig. S1B; left panel). The implication is a very rough spin-label distance distribution (cyan line in the inset of Fig. S1C), resembling the behavior observed in the Tikhonov regularization approach for small values of the regularization parameter (Jeschke et al., 2006).
These results underscore the importance of considering parameter optimization and the signal noise in order to avoid a large simulation bias due to overfitting of the experimental data. Note that large divergent forces are in general a signature of instability of the RAD simulation and can be experienced, for example, when the parameter $\alpha$ in eq. 5 is underestimated.

As in this case the overall uncertainty is determined by the random error, the discrepancy between the reference histogram and the distance distribution obtained with RAD depends on the signal noise and on the number of data points (Fig. S1D). When the noise encompasses the DEER signal calculated from the conventional dynamics, the RAD approach does not bias the sampling, yielding a probability distribution that matches that of the unbiased dynamics (Fig. S1D, right panel).

In particular, the trend of the bias potential for different noise levels shown in Fig. S1E reflects the corresponding values of the work required to sustain the latter potential (table in Fig. S1E). In this regard, a negligible value of the work is compatible with a bias potential that is almost zero everywhere. It is worth pointing out that, for the examples shown in Fig. S1, calculating the work using only the RAD sampling according to eq. 7 or measuring the Kullback–Leibler divergence from the unbiased dynamics to the RAD configurational ensemble, $W = k_BT\,D_{KL}(\rho_{RAD}\|\rho_{Unb})$, leads to very similar values (see table in Fig. S1E).

The choice of the parameter $\alpha = 1$ in the simulation of Fig. 1 can be confirmed by assessing the bias work ($W$) for different values of $\alpha$ using the reweighting approach reported in the section Ensemble Reweighting. As shown in Fig. S2A, the kink of the curve $W$ vs $\alpha$ is located at $\alpha < 1$; therefore, considering that the uncertainty is given in this case by the random noise, $\alpha = 1$ can be considered a good choice. For $\alpha = 1$ the number of effective degrees of freedom (eq. 45 and Fig. S2C) is much smaller than the number of DEER data points ($N = 275$), hence the selection $\alpha = 1$ resembles a good fit condition (Stein et al., 2015). It is worth pointing out that, despite the correlations between observables $F_{t_d}(r,\Lambda_1,\Lambda_2)$ at adjacent time values, raising the density of data points by a factor of four does not change the results (red curves in Fig. S2A).

To mimic the presence of a systematic error in the observable $F_{t_d}(r,\Lambda_1,\Lambda_2)$, we considered the case in which the RAD approach is used without model parameter optimization and the parameters are suboptimal (as in Fig. S1; RAD with static $\Lambda_1$ and $\Lambda_2$). In this case, the curve $W$ vs $\alpha$ (Fig. S2B) has a kink for $\alpha > 1$, providing a signature of the presence of a systematic error. The most reasonable choice of $\alpha$ is near the kink ($\alpha = 1.35$ in Fig. S2B), which leads to an adequate simulation bias and to a smoother distribution of the spin-label distance compared to the case in which $\alpha = 1$ (inset of Fig. S2B).

Quantification and statistical analysis

For all simulations, the initial segment of each trajectory was considered equilibration and excluded from analysis (50 ns and 100 ns for T4-Lysozyme RAD and MD simulations respectively, and 200 ns for VcSiaP). Such equilibration stage was assessed monitoring the time-convergence of the factor γ, the model parameters Λ1, Λ2 and the quantity Dexp (see below and Theoretical derivation of the RAD approach).

The simulated DEER signal was calculated as the time average of $F_{t_d}(r(X),\Lambda_1,\Lambda_2)$ (see eq. 1) after equilibration ($t_e$):

$$\bar{F}_{t_d}[t_{tot}] = \frac{1}{t_{tot} - t_e}\int_{t_e}^{t_{tot}}F_{t_d}(r(X[t]),\Lambda_1[t],\Lambda_2[t])\,dt. \tag{59}$$

The DEER signal calculated from MD uses best fitted values of $\Lambda_1$, $\Lambda_2$ (eq. 57).

The mean deviation between the calculated and the experimental DEER signal $F_{t_d}^{exp}$ reported in Fig. 1–2 is defined as $D_{exp}[t] = \sum_{t_d}|\bar{F}_{t_d}[t] - F_{t_d}^{exp}|/N\eta$, in which $\eta$ is the estimated uncertainty and $N$ is the number of experimental data points (in Fig. 1C, $t_e = 0$).
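For frames saved at a uniform stride, the time integral of eq. 59 reduces to a mean over the frames after equilibration, and $D_{exp}$ follows directly. The sketch below uses hypothetical names and assumes that each frame stores the DEER observable at all experimental time points.

```python
def time_averaged_signal(frames, n_equil=0):
    """Eq. 59 for uniformly spaced frames: average each time point after equilibration.

    frames: list of per-frame signals, each a list over the DEER time points.
    n_equil: number of initial frames discarded as equilibration.
    """
    kept = frames[n_equil:]
    n_points = len(kept[0])
    return [sum(frame[i] for frame in kept) / len(kept) for i in range(n_points)]

def mean_deviation(f_avg, f_exp, eta):
    """D_exp = sum |F_avg - F_exp| / (N * eta), with a single noise level eta."""
    n = len(f_exp)
    return sum(abs(a - b) for a, b in zip(f_avg, f_exp)) / (n * eta)
```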

The error reflected by the shaded area in Fig. 2–3 as well as the error bars of the bar charts in Fig. 3E and in Fig. 6A were obtained by block averaging of the respective simulation trajectories.
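A generic block-averaging estimate of the standard error can be sketched as follows. The block count and the function name are assumptions of this example; the specific settings used for the figures are not stated here.

```python
import math

def block_average_error(series, n_blocks):
    """Mean and standard error estimated from the means of contiguous blocks.

    Trailing samples that do not fill a complete block are ignored.
    """
    size = len(series) // n_blocks
    means = [sum(series[b * size:(b + 1) * size]) / size for b in range(n_blocks)]
    grand = sum(means) / n_blocks
    var = sum((m - grand) ** 2 for m in means) / (n_blocks - 1)
    return grand, math.sqrt(var / n_blocks)
```

The blocks should be longer than the correlation time of the quantity of interest, otherwise the error is underestimated.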

Supplementary Material

Supplemental Information
movie S1

Highlights.

  • Proposed method to couple experiments and simulations implemented in Colvars

  • Applied to target the double electron-electron resonance signal within the error

  • Reproduced the experimental data for the test system T4-Lysozyme and for VcSiaP

  • In contrast to substrate bound, apo VcSiaP is more open than the X-ray structure

Acknowledgements

We thank Dr. José D. Faraldo-Gómez for helpful discussions and supporting this research. We thank prof. Gregor Hagelueken for providing the VcSiaP DEER data used in this study. We are grateful to Vanessa Leone for suggestions and reading the manuscript. This work was funded by the Division of Intramural Research of the National Heart, Lung and Blood Institute (NHLBI), National Institutes of Health (USA) and utilized the computational resources of the NIH HPC Biowulf cluster and of a local NHLBI cluster.

Footnotes

Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

Declaration of interests

The authors declare no competing interests.

Data and Software Availability

The restrained-average dynamics approach for atomistic and model systems and the related reweighting tool are implemented in the Colvars module (Fiorin et al., 2013) included in the NAMD, LAMMPS and VMD programs (https://github.com/Colvars/colvars).

References

  1. Alexander NS, Stein RA, Koteiche HA, Kaufmann KW, Mchaourab HS, and Meiler J (2013). RosettaEPR: Rotamer Library for Spin Label Structure and Dynamics. Plos One 8, 1–14. [DOI] [PMC free article] [PubMed] [Google Scholar]
  2. Berliner LJ, Grunwald J, Hankovszky HO, and Hideg K (1982). A novel reversible thiol-specific spin label: papain active site labeling and inhibition. Anal Biochem 119, 450–455. [DOI] [PubMed] [Google Scholar]
  3. Best RB, Zhu X, Shim J, Lopes PE, Mittal J, Feig M, and Mackerell AD Jr. (2012). Optimization of the additive CHARMM all-atom protein force field targeting improved sampling of the backbone phi, psi and side-chain chi(1) and chi(2) dihedral angles. J Chem Theory Comput 8, 3257–3273. [DOI] [PMC free article] [PubMed] [Google Scholar]
  4. Bienert S, Waterhouse A, de Beer TA, Tauriello G, Studer G, Bordoli L, and Schwede T (2017). The SWISS-MODEL Repository-new features and functionality. Nucleic Acids Res 45, D313–D319. [DOI] [PMC free article] [PubMed] [Google Scholar]
  5. Bonomi M, Camilloni C, Cavalli A, and Vendruscolo M (2016). Metainference: A Bayesian inference method for heterogeneous systems. Sci Adv 2, e1501177. [DOI] [PMC free article] [PubMed] [Google Scholar]
  6. Boomsma W, Ferkinghoff-Borg J, and Lindorff-Larsen K (2014). Combining Experiments and Simulations Using the Maximum Entropy Principle. Plos Comput Biol 10, 1–9. [DOI] [PMC free article] [PubMed] [Google Scholar]
  7. Brandon S, Beth AH, and Hustedt EJ (2012). The global analysis of DEER data. J Magn Reson 218, 93–104. [DOI] [PMC free article] [PubMed] [Google Scholar]
  8. Cafiso DS (2014). Identifying and Quantitating Conformational Exchange in Membrane Proteins Using Site-Directed Spin Labeling. Accounts Chem Res 47, 3102–3109. [DOI] [PMC free article] [PubMed] [Google Scholar]
  9. Cavalli A, Camilloni C, and Vendruscolo M (2013). Molecular dynamics simulations with replica-averaged structural restraints generate structural ensembles according to the maximum entropy principle. The Journal of Chemical Physics 138, 094112. [DOI] [PubMed] [Google Scholar]
  10. Cesari A, Gil-Ley A, and Bussi G (2016). Combining Simulations and Solution Experiments as a Paradigm for RNA Force Field Refinement. J Chem Theory Comput 12, 6192–6200. [DOI] [PubMed] [Google Scholar]
  11. Edwards TH, and Stoll S (2016). A Bayesian approach to quantifying uncertainty from experimental noise in DEER spectroscopy. J Magn Reson 270, 87–97. [DOI] [PMC free article] [PubMed] [Google Scholar]
  12. Edwards TH, and Stoll S (2018). Optimal Tikhonov regularization for DEER spectroscopy. J Magn Reson 288, 58–68. [DOI] [PMC free article] [PubMed] [Google Scholar]
  13. Fiorin G, Klein ML, and Henin J (2013). Using collective variables to drive molecular dynamics simulations. Mol Phys 111, 3345–3362. [Google Scholar]
  14. Gangi Setty T., Cho C, Govindappa S, Apicella MA, and Ramaswamy S (2014). Bacterial periplasmic sialic acid-binding proteins exhibit a conserved binding site. Acta Crystallogr D Biol Crystallogr 70, 1801–1811. [DOI] [PMC free article] [PubMed] [Google Scholar]
  15. Glaenzer J, Peter MF, Thomas GH, and Hagelueken G (2017). PELDOR Spectroscopy Reveals Two Defined States of a Sialic Acid TRAP Transporter SBP in Solution. Biophysical Journal 112, 109–120.
  16. Gull SF, and Daniell GJ (1978). Image reconstruction from incomplete and noisy data. Nature 272, 686–690.
  17. Hagelueken G, Ward R, Naismith JH, and Schiemann O (2012). MtsslWizard: In Silico Spin-Labeling and Generation of Distance Distributions in PyMOL. Applied Magnetic Resonance 42, 377–391.
  18. Hilger D, Polyhach Y, Padan E, Jung H, and Jeschke G (2007). High-resolution structure of a Na+/H+ antiporter dimer obtained by pulsed electron paramagnetic resonance distance measurements. Biophysical Journal 93, 3675–3683.
  19. Hummer G, and Koefinger J (2015). Bayesian ensemble refinement by replica simulations and reweighting. The Journal of Chemical Physics 143, 243150.
  20. Jeschke G (2012). DEER Distance Measurements on Proteins. Annu Rev Phys Chem 63, 419–446.
  21. Jeschke G (2013). Conformational dynamics and distribution of nitroxide spin labels. Prog Nucl Mag Res Sp 72, 42–60.
  22. Jeschke G, Chechik V, Ionita P, Godt A, Zimmermann H, Banham J, Timmel CR, Hilger D, and Jung H (2006). DeerAnalysis2006—a comprehensive software package for analyzing pulsed ELDOR data. Applied Magnetic Resonance 30, 473–498.
  23. Jeschke G, Pannier M, and Spiess HW (2002). Double Electron-Electron Resonance. Biological Magnetic Resonance 19, 493–512.
  24. Jo S, Kim T, Iyer VG, and Im W (2008). Software news and updates - CHARMM-GUI: A web-based graphical user interface for CHARMM. Journal of Computational Chemistry 29, 1859–1865.
  25. Johnston JW, Coussens NP, Allen S, Houtman JCD, Turner KH, Zaleski A, Ramaswamy S, Gibson BW, and Apicella MA (2008). Characterization of the N-acetyl-5-neuraminic acid-binding site of the extracytoplasmic solute receptor (SiaP) of nontypeable Haemophilus influenzae strain 2019. Journal of Biological Chemistry 283, 855–865.
  26. Kazmier K, Alexander NS, Meiler J, and Mchaourab HS (2011). Algorithm for selection of optimized EPR distance restraints for de novo protein structure determination. J Struct Biol 173, 549–557.
  27. Lindorff-Larsen K, Best RB, Depristo MA, Dobson CM, and Vendruscolo M (2005). Simultaneous determination of protein structure and dynamics. Nature 433, 128–132.
  28. Lipari G, and Szabo A (1982). Model-Free Approach to the Interpretation of Nuclear Magnetic-Resonance Relaxation in Macromolecules. 1. Theory and Range of Validity. Journal of the American Chemical Society 104, 4546–4559.
  29. Losonczi JA, Andrec M, Fischer MW, and Prestegard JH (1999). Order matrix analysis of residual dipolar couplings using singular value decomposition. J Magn Reson 138, 334–342.
  30. Mackerell AD Jr., Feig M, and Brooks CL 3rd. (2004). Extending the treatment of backbone energetics in protein force fields: limitations of gas-phase quantum mechanics in reproducing protein conformational distributions in molecular dynamics simulations. Journal of Computational Chemistry 25, 1400–1415.
  31. Marinelli F, and Faraldo-Gomez JD (2015). Ensemble-Biased Metadynamics: A Molecular Simulation Method to Sample Experimental Distributions. Biophysical Journal 108, 2779–2782.
  32. Marinelli F, Kuhlmann SI, Grell E, Kunte HJ, Ziegler C, and Faraldo-Gomez JD (2011). Evidence for an allosteric mechanism of substrate release from membrane-transporter accessory binding proteins. Proc Natl Acad Sci U S A 108, E1285–E1292.
  33. Mchaourab HS, Steed PR, and Kazmier K (2011). Toward the Fourth Dimension of Membrane Protein Structure: Insight into Dynamics from Spin-Labeling EPR Spectroscopy. Structure 19, 1549–1561.
  34. Milov AD, Ponomarev AB, and Tsvetkov YD (1984). Electron Electron Double-Resonance in Electron-Spin Echo - Model Biradical Systems and the Sensitized Photolysis of Decalin. Chem Phys Lett 110, 67–72.
  35. Milov AD, and Tsvetkov YD (1997). Double electron-electron resonance in electron spin echo: Conformations of spin-labeled poly-4-vinilpyridine in glassy solutions. Applied Magnetic Resonance 12, 495–504.
  36. Müller A, Severi E, Mulligan C, Watts AG, Kelly DJ, Wilson KS, Wilkinson AJ, and Thomas GH (2006). Conservation of structure and mechanism in primary and secondary transporters exemplified by SiaP, a sialic acid binding virulence factor from Haemophilus influenzae. J Biol Chem 281, 22212–22222.
  37. Mulligan C, Geertsma ER, Severi E, Kelly DJ, Poolman B, and Thomas GH (2009). The substrate-binding protein imposes directionality on an electrochemical sodium gradient-driven TRAP transporter. Proc Natl Acad Sci U S A 106, 1778–1783.
  38. Olsson MHM, Sondergaard CR, Rostkowski M, and Jensen JH (2011). PROPKA3: Consistent Treatment of Internal and Surface Residues in Empirical pK(a) Predictions. J Chem Theory Comput 7, 525–537.
  39. Olsson S, Ekonomiuk D, Sgrignani J, and Cavalli A (2015). Molecular Dynamics of Biomolecules through Direct Analysis of Dipolar Couplings. J Am Chem Soc 137, 6270–6278.
  40. Phillips JC, Braun R, Wang W, Gumbart J, Tajkhorshid E, Villa E, Chipot C, Skeel RD, Kale L, and Schulten K (2005). Scalable molecular dynamics with NAMD. Journal of Computational Chemistry 26, 1781–1802.
  41. Piana S, Klepeis JL, and Shaw DE (2014). Assessing the accuracy of physical models used in protein-folding simulations: quantitative evidence from long molecular dynamics simulations. Curr Opin Struct Biol 24, 98–105.
  42. Pitera JW, and Chodera JD (2012). On the Use of Experimental Observations to Bias Simulated Ensembles. J Chem Theory Comput 8, 3445–3451.
  43. Polyhach Y, Bordignon E, and Jeschke G (2011). Rotamer libraries of spin labelled cysteines for protein studies. Phys Chem Chem Phys 13, 2356–2366.
  44. Raiteri P, Laio A, Gervasio FL, Micheletti C, and Parrinello M (2006). Efficient reconstruction of complex free energy landscapes by multiple walkers metadynamics. J Phys Chem B 110, 3533–3539.
  45. Roux B, and Islam SM (2013). Restrained-Ensemble Molecular Dynamics Simulations Based on Distance Histograms from Double Electron-Electron Resonance Spectroscopy. J Phys Chem B 117, 4733–4739.
  46. Salikhov KM, and Khairuzhdinov IT (2015). Four-Pulse ELDOR Theory of the Spin 1/2 Label Pairs Extended to Overlapping EPR Spectra and to Overlapping Pump and Observer Excitation Bands. Applied Magnetic Resonance 46, 67–83.
  47. Schiemann O, and Prisner TF (2007). Long-range distance determinations in biomacromolecules by EPR spectroscopy. Q Rev Biophys 40, 1–53.
  48. Schmidt T, Walti MA, Baber JL, Hustedt EJ, and Clore GM (2016). Long Distance Measurements up to 160 angstrom in the GroEL Tetradecamer Using Q-Band DEER EPR Spectroscopy. Angew Chem Int Edit 55, 15905–15909.
  49. Schrödinger LLC (2015). The PyMOL Molecular Graphics System, Version 1.8.
  50. Schwieters CD, Kuszewski JJ, and Clore GM (2006). Using Xplor-NIH for NMR molecular structure determination. Prog Nucl Mag Res Sp 48, 47–62.
  51. Stein RA, Beth AH, and Hustedt EJ (2015). A Straightforward Approach to the Analysis of Double Electron-Electron Resonance Data. Method Enzymol 563, 531–567.
  52. van Dijk E, Hoogeveen A, and Abeln S (2015). The hydrophobic temperature dependence of amino acids directly calculated from protein structures. Plos Comput Biol 11, e1004277.
  53. Weaver LH, and Matthews BW (1987). Structure of Bacteriophage-T4 Lysozyme Refined at 1.7 Å Resolution. J Mol Biol 193, 189–199.
  54. White AD, and Voth GA (2014). Efficient and Minimal Method to Bias Molecular Simulations with Experimental Data. J Chem Theory Comput 10, 3023–3030.
  55. Yirdaw RB, and Mchaourab HS (2012). Direct observation of T4 lysozyme hinge-bending motion by fluorescence correlation spectroscopy. Biophysical Journal 103, 1525–1536.
  56. Zinkevich M (2003). Online Convex Programming and Generalized Infinitesimal Gradient Ascent. Proceedings of the Twentieth International Conference on Machine Learning, 928–936.

Associated Data

Supplementary Materials

Supplemental Information
movie S1