Author manuscript; available in PMC: 2018 Sep 11.
Published in final edited form as: J Chem Theory Comput. 2015 Apr 14;11(4):1919–1927. doi: 10.1021/ct5011455

Identification of Mutational Hot Spots for Substrate Diffusion: Application to Myoglobin

David De Sancho, Adam Kubas ‡,§, Po-Hung Wang ‡, Jochen Blumberger, Robert B Best ¶,*
PMCID: PMC6132223  NIHMSID: NIHMS986417  PMID: 26574395

Abstract

The pathways by which small molecules (substrates or inhibitors) access active sites are a key aspect of the function of enzymes and other proteins. A key problem in designing or altering such proteins is to identify sites for mutation which will have the desired effect on the substrate transport properties. While specific access channels have been invoked in the past, molecular simulations suggest that multiple routes are possible, complicating the analysis. This complexity, however, can be captured by a Markov State Model (MSM) of the ligand diffusion process. We have developed a sensitivity analysis of the resulting rate matrix, which identifies the locations where mutations should have the largest effect on the diffusive on rate. We apply this method to myoglobin, which is the best characterized example both from experiment and simulation. We validate the approach by translating the sensitivity parameter obtained from this method into the CO binding rates in myoglobin upon mutation, resulting in a semi-quantitative correlation with experiment. The model is further validated against an explicit simulation for one of the experimental mutants.

TOC Figure


Introduction

The diffusion of small molecules inside proteins is often essential to their function. Perhaps the best known example is the binding of oxygen and carbon monoxide by hemoglobin and myoglobin. Early crystal structures of myoglobin did not reveal a clear access path to the heme pocket, implying a role for protein dynamics.1 The same theme occurs in the diffusion of enzyme substrates from the solvent to buried active sites, such as in P450 cytochromes2 or flavoenzymes,3,4 or from one active site to another in multi-enzyme complexes such as tryptophan synthase5 and carbon monoxide dehydrogenase/acetyl-CoA synthase.6 The study of ligand migration pathways is therefore key to a fundamental understanding of protein function and, critically, to our ability to engineer these systems, for example to enhance substrate diffusion or to reduce active-site access for inhibitors.

The dynamics of the diffusion process and the conformational states involved can be partially resolved by combining ultrafast spectroscopy and time-resolved crystallography. Molecular dynamics (MD) simulations can provide a complementary, and more detailed, picture of the mechanisms by which small molecules reach protein active sites or binding sites. By far the best characterized model system is myoglobin, owing to its suitability for both ultrafast spectroscopy and crystallography.7,8 Experiments9–18 and simulations19–35 have now reached a general consensus on the protein cavities occupied by the gas molecules and on the access tunnels to the heme group. An interesting avenue of research is the possibility of engineering the ligand diffusion process in proteins of biomedical or industrial interest using site-directed mutagenesis. However, engineering enzymes to modulate ligand access is still primarily guided by visual inspection of structures and by human intuition. Computational approaches have the potential to direct such experimental efforts by proposing candidate mutation sites on a more quantitative basis.

Here we present a general approach to calculate the effect of mutations on the binding kinetics of ligand molecules to proteins. We approximate the dynamics of gas molecules within the protein, and exchange with the solvent, via a Markov state model (MSM), which we show to provide a good description at relevant time scales.36 We then present a method to analyze the sensitivity to mutation of diffusive rates obtained from the MSM, to identify local “hot spots” where mutations would be expected to have the largest effect. We have applied our approach to CO diffusion in myoglobin, for which the abundant experimental data available can serve to validate the results from our approach. We find good correspondence between the effects of mutations predicted by our approach and experiment. Lastly, we show how the effect of mutations identified by this perturbative approach can be more accurately quantified by resampling relevant transitions in the rate matrix with new simulations.

Theory

Master equation

We start from the chemical master equation that describes the evolution of probability in a network of stochastic transitions (see Figure 1a) between a set of metastable states (or microstates)

$$\frac{d\mathbf{p}}{dt} = \mathbf{K}\,\mathbf{p}(t) \qquad (1)$$

Figure 1:


Theoretical framework. (a) Scheme of a network of metastable states (nodes). Edges are shown between pairs of nodes that are connected by direct transitions. The network is divided into three regions, separated by dashed lines, corresponding to the unbound state (U, blue, with φG = 0), the geminate or bound state (G, red, with φG = 1) and an intermediate region (I, pink, with 0 < φG < 1). The dot-dashed curve marks the dividing surface Σ across which the reactive flux is calculated. A perturbation is introduced in microstate s, which affects all the connected microstates i (for clarity, only one is shown, with a gray frame). (b) Schematic free energy diagram for the perturbed microstate s and a connected microstate i. Upon a change of δgs in the free energy of microstate s, the barrier to reach microstate i changes by δgs/2.

Here K is the rate matrix, with elements kji corresponding to the rate coefficients for the transition between microstates i → j, and p is the vector with the probabilities of all microstates in the network.
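To make the conventions of K concrete, here is a minimal Python sketch. It assumes that kji is the rate of i → j and that the diagonal holds minus the total escape rate, so every column sums to zero. The sketch builds a hypothetical three-state rate matrix and integrates Equation 1 with a fixed-step fourth-order Runge-Kutta scheme. The rates are invented for illustration, and RK4 is simply a convenient integrator chosen here, not one used in the original work.

```python
import numpy as np

def rate_matrix(offdiag):
    """Build K from off-diagonal rates, offdiag[j][i] = k_ji (transition i -> j).

    The diagonal is set to minus the total escape rate of each state, so every
    column of K sums to zero and total probability is conserved.
    """
    K = np.array(offdiag, dtype=float)
    np.fill_diagonal(K, 0.0)
    np.fill_diagonal(K, -K.sum(axis=0))
    return K

def propagate(K, p0, t, n_steps=10_000):
    """Integrate dp/dt = K p (Equation 1) from 0 to t with fixed-step RK4."""
    p = np.array(p0, dtype=float)
    h = t / n_steps
    for _ in range(n_steps):
        k1 = K @ p
        k2 = K @ (p + 0.5 * h * k1)
        k3 = K @ (p + 0.5 * h * k2)
        k4 = K @ (p + h * k3)
        p = p + (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
    return p

# Hypothetical three-state chain 0 <-> 1 <-> 2 (arbitrary rate units)
K = rate_matrix([[0.0, 2.0, 0.0],
                 [1.0, 0.0, 0.25],
                 [0.0, 0.5, 0.0]])
p = propagate(K, [1.0, 0.0, 0.0], t=50.0)
print(np.round(p, 6))  # relaxes to the equilibrium distribution [0.4, 0.2, 0.4]
```

Because each column of K sums to zero, every RK4 stage has zero total, and the sum of p stays at 1 up to round-off.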

Sensitivity analysis

We are interested in the sensitivity of an overall reaction rate (here the binding rate, k+1) to mutations in the network. The sensitivity to mutation of a microstate s, αs, is defined as the partial derivative of the logarithm of the binding rate with respect to the free energy of that microstate, gs:

$$\alpha_s = \frac{\partial \ln k_{+1}}{\partial g_s} \qquad (2)$$

The binding rate is calculated from the rate matrix K as the steady-state rate of reaching the active site, using the Berezhkovskii-Hummer-Szabo (BHS) method.37 This method allows us to estimate the flux from the rate coefficients of the metastable network. The derivative in Equation 2 can then be expressed in terms of the elements of K. We use linear free energy relationships to approximate how the microscopic rate coefficients depend on changes in the stability of a microstate.

Reactive flux and commitment probabilities

We obtain the binding rate as flux over population, k+1 = JU→G/fU. Here JU→G is the total flux through an interface separating the unbound and bound states, and fU is the equilibrium fractional population of unbound states. We calculate the flux using the BHS expression37

$$J_{U\to G} = \sum_{j\in G^*,\; i\in U^*} k_{ji}\, p_{eq}(i)\,\left[\phi_G(j) - \phi_G(i)\right] \qquad (3)$$

The sum in Equation 3 runs over pairs of connected states (i.e. kji > 0), with i on the unbound side (U*) and j on the bound side (G*) of a dividing surface Σ, such as that illustrated in Fig. 1a. Here peq(i) is the equilibrium probability of state i, and φG(i) is the commitment probability of state i. The values of φG (also termed the committor, or pfold in the context of protein folding) are defined to be 0 for microstates in the unbound state and 1 for microstates in the geminate state, i.e. φG(i ∈ U) = 0 and φG(i ∈ G) = 1. For the intermediate region, the values of φG are determined by solving37

$$\sum_j \phi_G(j)\,k_{ji} = \sum_{j\in I} \phi_G(j)\,k_{ji} + \sum_{j\in G} k_{ji} = 0, \qquad \forall\, i \in I \qquad (4)$$
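Equations 3 and 4 translate almost line by line into linear algebra. The Python sketch below assumes a hypothetical four-state chain: state 0 is the solvent (U), states 1 and 2 are intermediates (I), and state 3 is the geminate site (G), with made-up rates. It solves Equation 4 for φG, evaluates the flux of Equation 3 for two different dividing surfaces, and divides by fU as in k+1 = JU→G/fU. The function names are illustrative only.

```python
import numpy as np

def stationary(K):
    """Equilibrium populations: solve K p = 0 subject to sum(p) = 1."""
    n = K.shape[0]
    M = K.copy()
    M[-1, :] = 1.0          # replace one (redundant) balance equation by normalization
    rhs = np.zeros(n)
    rhs[-1] = 1.0
    return np.linalg.solve(M, rhs)

def committor(K, U, G):
    """phi_G from Equation 4, with phi_G = 0 on U and phi_G = 1 on G."""
    n = K.shape[0]
    I = [i for i in range(n) if i not in U and i not in G]
    phi = np.zeros(n)
    phi[G] = 1.0
    A = K[np.ix_(I, I)].T            # A[i, j] = k_ji for i, j in I
    c = K[np.ix_(G, I)].sum(axis=0)  # c_i = sum over j in G of k_ji
    phi[I] = np.linalg.solve(A, -c)
    return phi

def reactive_flux(K, p_eq, phi, G_side):
    """Equation 3: net flux from the U side (i) to the G side (j) of a surface."""
    G_side = set(G_side)
    return sum(K[j, i] * p_eq[i] * (phi[j] - phi[i])
               for i in range(K.shape[0]) if i not in G_side
               for j in G_side)

# Hypothetical chain: 0 = solvent (U), 1 and 2 = internal sites (I), 3 = geminate (G)
rates = {(1, 0): 1.0, (0, 1): 4.0, (2, 1): 2.0, (1, 2): 1.0, (3, 2): 3.0, (2, 3): 0.5}
K = np.zeros((4, 4))
for (j, i), k in rates.items():
    K[j, i] = k                      # k_ji: rate of the transition i -> j
np.fill_diagonal(K, -K.sum(axis=0))

p_eq = stationary(K)
phi = committor(K, U=[0], G=[3])     # equals [0, 3/11, 9/11, 1] for these rates
J_near_G = reactive_flux(K, p_eq, phi, G_side=[3])        # surface just before G
J_near_U = reactive_flux(K, p_eq, phi, G_side=[1, 2, 3])  # surface just after U
k_on = J_near_G / p_eq[[0]].sum()    # k_+1 = J / f_U
```

For this chain, both dividing surfaces give the same flux. This follows from Equation 4, which makes the net flux through every intermediate state vanish. The resulting k+1 is 3/11 in the arbitrary rate units of the example.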

Effects of mutations on the microscopic rates

To approximate the effect of a mutation near a microstate on the rate coefficients of K, we use linear free energy relationships (see Figure 1b). Consider a mutation causing a change δgs in the free energy of microstate s. We assume that the “transition state” for the s ⇌ i transition lies halfway between microstates s and i. The perturbed forward (s → i) and backward (i → s) rate coefficients are thus

$$k_{is}(\delta g_s) = k_{is}\,e^{\beta\,\delta g_s/2} \quad\text{and}\quad k_{si}(\delta g_s) = k_{si}\,e^{-\beta\,\delta g_s/2}, \qquad (5)$$

respectively. Hence, for negative δgs (i.e. a stabilizing mutation) we get kis(δgs) < kis, a slowdown of the s → i rate. For positive δgs (i.e. a destabilizing mutation) we get kis(δgs) > kis, a speedup of the s → i rate, in line with intuition.
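A minimal Python sketch of Equation 5 follows. It assumes that K stores kji in row j and column i. Multiplying column s then scales all exits from s, multiplying row s scales all entries into s, and the diagonal is rebuilt so that the columns still sum to zero. The choice β = 1 (energies in units of kBT) and the example matrix are assumptions for illustration.

```python
import numpy as np

def perturb_state(K, s, dg, beta=1.0):
    """Apply Equation 5 to every transition into or out of microstate s.

    Raising g_s by dg (destabilizing s) multiplies k_is (s -> i) by
    exp(+beta*dg/2) and k_si (i -> s) by exp(-beta*dg/2).
    """
    Kp = np.array(K, dtype=float)
    np.fill_diagonal(Kp, 0.0)
    Kp[:, s] *= np.exp(beta * dg / 2.0)    # column s: exits s -> i
    Kp[s, :] *= np.exp(-beta * dg / 2.0)   # row s: entries i -> s
    np.fill_diagonal(Kp, -Kp.sum(axis=0))  # restore zero column sums
    return Kp

# Hypothetical 3-state rate matrix (columns sum to zero)
K = np.array([[-1.0, 2.0, 0.0],
              [1.0, -2.5, 0.25],
              [0.0, 0.5, -0.25]])
Kp = perturb_state(K, s=1, dg=0.7)   # destabilize state 1 by 0.7 kT
```

Each pair of rates shifts by exp(±βδgs/2). The detailed-balance ratio ksi/kis, and hence peq(s)/peq(i), therefore changes by exp(−βδgs). This is the Boltzmann factor expected for shifting gs by δgs.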

Calculation of the sensitivity parameter

We calculate the sensitivity to point mutations in microstate s (Equation 2) as

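$$\alpha_s = \frac{\partial \ln k_{+1}}{\partial g_s} = \frac{\dfrac{\partial J_{U\to G}}{\partial g_s}\, f_U - J_{U\to G}\,\dfrac{\partial f_U}{\partial g_s}}{k_{+1}\, f_U^{2}} \qquad (6)$$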

This involves estimating the derivative of the flux (Equation 3) with respect to the free energy of the mutated state, gs,

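$$\begin{aligned}\frac{\partial J_{U\to G}}{\partial g_s} &= \sum_{j\in G^*,\; i\in U^*} \frac{\partial}{\partial g_s}\Big(k_{ji}\, p_{eq}(i)\,[\phi_G(j) - \phi_G(i)]\Big) \\ &= \sum_{j\in G^*,\; i\in U^*} \left( \frac{\partial k_{ji}}{\partial g_s}\, p_{eq}(i)\,[\phi_G(j)-\phi_G(i)] + k_{ji}\,\frac{\partial p_{eq}(i)}{\partial g_s}\,[\phi_G(j)-\phi_G(i)] + k_{ji}\, p_{eq}(i)\,\frac{\partial [\phi_G(j)-\phi_G(i)]}{\partial g_s} \right)\end{aligned} \qquad (7)$$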

The derivatives (i) ∂kji/∂gs, (ii) ∂peq(j)/∂gs and (iii) ∂[φG(j) − φG(i)]/∂gs are determined as follows:

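  • (i)
    The derivative of a rate coefficient in which one of the microstates involved is the mutated state s follows from the linear free energy relation for the rates (Equation 5). All other rate coefficients are unchanged:
    $$\frac{\partial k_{is}}{\partial g_s} = \frac{\beta}{2}\,k_{is} \quad\text{and}\quad \frac{\partial k_{si}}{\partial g_s} = -\frac{\beta}{2}\,k_{si} \qquad (8)$$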
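  • (ii)
    The derivative of the equilibrium population follows from the Boltzmann weights, peq(j) ∝ exp(−βgj), together with conservation of population:
    $$\frac{\partial p_{eq}(j)}{\partial g_s} = \begin{cases} -\beta\,p_j\,(1-p_j), & \text{if } j = s \\ \beta\,p_j\,p_s, & \text{if } j \neq s \end{cases} \qquad (9)$$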
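  • (iii)
    We solve for the derivative of φG(i) after differentiating Equation 4:
    $$\sum_{j\in I}\left(\frac{\partial k_{ji}}{\partial g_s}\,\phi_G(j) + k_{ji}\,\frac{\partial \phi_G(j)}{\partial g_s}\right) + \sum_{j\in G}\frac{\partial k_{ji}}{\partial g_s} = 0, \qquad \forall\, i\in I \qquad (10)$$
    In practice we can rewrite this expression as a vector equation of dimension |I|:
    $$\frac{\partial \mathbf{A}}{\partial g_s}\,\mathbf{b} + \mathbf{A}\,\frac{\partial \mathbf{b}}{\partial g_s} + \frac{\partial \mathbf{c}}{\partial g_s} = 0 \qquad (11)$$
    where for i ∈ I we define
    $$A_{ji} = k_{ij} \ \text{for } j \in I; \qquad b_i = \phi_G(i); \qquad c_i = \sum_{j\in G} k_{ji} \qquad (12)$$
    so that
    $$\frac{\partial \mathbf{b}}{\partial g_s} = -\mathbf{A}^{-1}\left(\frac{\partial \mathbf{c}}{\partial g_s} + \frac{\partial \mathbf{A}}{\partial g_s}\,\mathbf{b}\right) \qquad (13)$$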
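The analytic route of Equations 6 to 13 can be cross-checked numerically. The Python sketch below is a swapped-in finite-difference check, not the authors' analytic implementation. It perturbs one microstate with Equation 5, recomputes k+1 = JU→G/fU from Equations 3 and 4, and takes a central difference of ln k+1. The four-state chain and its rates are hypothetical, and β = 1, so αs is in units of 1/kBT.

```python
import numpy as np

def perturb_state(K, s, dg, beta=1.0):
    """Equation 5: shift the free energy of microstate s by dg."""
    Kp = np.array(K, dtype=float)
    np.fill_diagonal(Kp, 0.0)
    Kp[:, s] *= np.exp(beta * dg / 2.0)    # exits s -> i
    Kp[s, :] *= np.exp(-beta * dg / 2.0)   # entries i -> s
    np.fill_diagonal(Kp, -Kp.sum(axis=0))
    return Kp

def binding_rate(K, U, G):
    """k_+1 = J_{U->G} / f_U (Equations 3 and 4), surface placed just before G."""
    n = K.shape[0]
    I = [i for i in range(n) if i not in U and i not in G]
    phi = np.zeros(n)
    phi[G] = 1.0
    phi[I] = np.linalg.solve(K[np.ix_(I, I)].T, -K[np.ix_(G, I)].sum(axis=0))
    M = K.copy()
    M[-1, :] = 1.0                              # K p = 0 plus normalization
    p_eq = np.linalg.solve(M, np.eye(n)[-1])
    J = sum(K[j, i] * p_eq[i] * (phi[j] - phi[i])
            for i in range(n) if i not in G for j in G)
    return J / p_eq[U].sum()

def sensitivity(K, s, U, G, beta=1.0, h=1e-5):
    """alpha_s = d ln k_+1 / d g_s, estimated by a central finite difference."""
    up = binding_rate(perturb_state(K, s, +h, beta), U, G)
    down = binding_rate(perturb_state(K, s, -h, beta), U, G)
    return (np.log(up) - np.log(down)) / (2.0 * h)

# Hypothetical four-state chain: 0 = U (solvent), 1 and 2 = I, 3 = G (geminate)
K = np.zeros((4, 4))
for (j, i), k in {(1, 0): 1.0, (0, 1): 4.0, (2, 1): 2.0,
                  (1, 2): 1.0, (3, 2): 3.0, (2, 3): 0.5}.items():
    K[j, i] = k                                 # k_ji: rate of i -> j
np.fill_diagonal(K, -K.sum(axis=0))
alphas = {s: sensitivity(K, s, U=[0], G=[3]) for s in (1, 2)}
```

For this chain, k+1 = k10 φG(1), which gives the analytic values α1 = −9/22 and α2 = −4/11. Destabilizing either intermediate slows binding, and the finite difference reproduces both values closely. In practice the analytic derivatives avoid the two extra linear solves per microstate that this check needs.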

Methods

Molecular dynamics simulations

We have run atomistic molecular dynamics simulations of sperm whale myoglobin (PDB id: 1mbn) using the Amber ff03 protein force field38 in explicit TIP3P water39. The simulations include 20 CO molecules, for a total of 12436 atoms and a gas concentration of 266 mM (see Figure 2a). Dynamics were propagated for a total of 500 ns using the stochastic leap-frog integrator implemented in the Gromacs package (version 4.6.5)40 with a time step of 2 fs. A temperature of 300 K was maintained by a Langevin thermostat with a friction coefficient of 1 ps−1, and a pressure of 1 bar by a Parrinello-Rahman barostat.41 We use the protonation state of the protein described by Onufriev and co-workers, based on available empirical evidence.33 This choice is important because the protonation state has been shown to determine the conformation of His64, which can act as a gate for the entry of ligand molecules into the binding pocket.42 We note that in simulations with an alternative protonation state the binding mechanism of the CO molecules changes considerably (see SI). For CO we use a well-established 3-site model43 employed in our previous work.44

Figure 2:


MSM microstates and access pathways. (a) Structure of myoglobin (white ribbon) with the positions of CO molecules from the MD trajectory (cyan dots) used in the clustering. Transparent spheres represent the clusters corresponding to (non-solvent) microstates used in the construction of the Markov state model, coloured from red to blue according to their proximity to the heme group. The clusters matching relevant xenon sites within myoglobin are shown as solid spheres. (b) Access pathways to the geminate site. Lines show representative trajectory paths for entry events through the three most important access ports.

The high concentration of gas molecules was used to enhance the sampling of transitions in the MSM; for comparison, a CO pressure of 1 bar corresponds to a 0.93 mM concentration. We first checked that the gas molecules do not substantially alter the protein dynamics, relative to a simulation of the protein in water. Both in the absence and in the presence of CO, the global Cα-RMSD of myoglobin remains within 0.1–0.2 nm of the native structure. The residue-wise RMSD profiles of the two simulations are also very similar, differing by less than 0.05 nm for most residues (see Figures S1 and S2). We have also verified that the kinetic parameters obtained from the MSM (see below) depend little on the number of gas molecules used (see Figures S4 and S5).

Markov state model

We derive a Master equation / Markov state model (MSM) from the simulation data. The procedure requires the following steps: (i) discretization of the simulation in order to define the microstates of the model; (ii) assignment of transitions among this set of microstates, and (iii) estimation of the transition and rate matrices based on the state trajectory. We briefly describe these steps below.

Discretization

To define the microstates of the model, we first cluster the Cartesian coordinates of virtual atoms (VCO) placed at the center of mass of each CO molecule. To do this, we superimpose all snapshots of the simulation on the initial structure, using the alpha carbons of myoglobin as reference, and recenter all molecules to lie within this primary cell. Only gas-molecule positions within 6 Å of the protein alpha carbons or heme atoms are used in the clustering; every other position of the gas molecule is lumped into the “solvent” microstate. Clustering is carried out using the Daura algorithm,45 as implemented in the Gromacs package,40 with a cutoff of 3 Å. This yields 434 clusters, of which we keep only the most populated set, accounting for 95% of the total population. The result is 194 clusters, which we use, together with the solvent cluster, to construct the MSM.
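A compact Python sketch of the Daura (GROMOS) clustering step and the subsequent 95% population cutoff is given below. It is an illustration under simplifying assumptions, not the Gromacs implementation. All pairwise distances are held in memory, ties go to the lowest index, and the function names are hypothetical.

```python
import numpy as np

def daura_cluster(x, cutoff):
    """Daura et al. (GROMOS) clustering of points x with shape (n, d).

    Repeatedly take the point with the most neighbors within `cutoff` as a
    cluster center, assign it and its neighbors to that cluster, and remove
    them from the pool. Ties go to the lowest index.
    """
    x = np.asarray(x, dtype=float)
    dist = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    neighbors = dist < cutoff             # includes each point itself
    remaining = np.ones(len(x), dtype=bool)
    clusters = []
    while remaining.any():
        counts = (neighbors & remaining[None, :]).sum(axis=1)
        counts[~remaining] = -1           # assigned points cannot become centers
        center = int(np.argmax(counts))
        members = np.nonzero(neighbors[center] & remaining)[0]
        clusters.append((center, members.tolist()))
        remaining[members] = False
    return clusters

def keep_most_populated(clusters, n_points, frac=0.95):
    """Keep the largest clusters until they account for `frac` of all points."""
    kept, covered = [], 0
    for center, members in clusters:      # Daura emits clusters largest-first
        if covered >= frac * n_points:
            break
        kept.append((center, members))
        covered += len(members)
    return kept

# Hypothetical 1D example: two well separated groups of gas positions
pts = np.array([[0.0], [0.1], [0.2], [5.0], [5.1]])
print(daura_cluster(pts, cutoff=0.3))  # [(0, [0, 1, 2]), (3, [3, 4])]
```

Each new center has the most remaining neighbors, and earlier clusters only shrink the pool. Cluster sizes therefore never increase along the list, which is what the population cutoff relies on.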

Assignment

We use transition-based assignment (TBA)46,47 to identify transitions between pairs of microstates i and j. For each of the individually defined clusters, we calculate the distribution of distances from the cluster center to its members. We define the transition-based assignment distance for that cluster, rTBA(i), as the radius that contains 80% of the molecules assigned to that cluster. This procedure results in a set of non-overlapping spheres corresponding to the different clusters. A transition from a state i to another state j is identified when the virtual atom of the gas molecule comes closer to the center of cluster j than rTBA(j). For transitions from a state i to the solvent, we require that the gas molecule has really left the vicinity of the protein. We therefore impose an escape cutoff distance of 6 Å from any protein alpha carbon or heavy atom of the heme group.
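The transition-based assignment rule can be sketched in Python as follows. This illustrative implementation rests on several assumptions. The 80% radius uses NumPy's default linear-interpolation percentile. The escape criterion is supplied as a precomputed boolean mask rather than computed from protein coordinates. The solvent label (one past the last cluster index) and the −1 label for frames before the first core entry are arbitrary conventions introduced here.

```python
import numpy as np

def tba_radii(positions, labels, centers, percentile=80.0):
    """r_TBA(i): radius around center i enclosing `percentile`% of its members."""
    positions, labels = np.asarray(positions, float), np.asarray(labels)
    radii = np.empty(len(centers))
    for i, c in enumerate(np.asarray(centers, float)):
        d = np.linalg.norm(positions[labels == i] - c, axis=1)
        radii[i] = np.percentile(d, percentile)
    return radii

def tba_assign(traj, centers, radii, escaped=None):
    """Transition-based assignment of one gas-molecule trajectory.

    A frame is assigned to cluster j only when it lies within r_TBA(j) of
    center j; otherwise the molecule keeps its previous state. Frames flagged
    in `escaped` (e.g. beyond the escape cutoff) go to the solvent state,
    labeled len(centers). Frames before the first core entry get -1.
    """
    centers = np.asarray(centers, float)
    solvent = len(centers)
    state, out = -1, []
    for t, x in enumerate(np.asarray(traj, float)):
        if escaped is not None and escaped[t]:
            state = solvent
        else:
            inside = np.nonzero(np.linalg.norm(centers - x, axis=1) < radii)[0]
            if inside.size:               # core spheres assumed non-overlapping
                state = int(inside[0])
        out.append(state)
    return out

# Hypothetical example: two cores of radius 1 at x = 0 and x = 10
centers = np.array([[0.0, 0.0], [10.0, 0.0]])
radii = np.array([1.0, 1.0])
traj = [[0.5, 0.0], [3.0, 0.0], [9.5, 0.0], [5.0, 0.0], [0.2, 0.0]]
print(tba_assign(traj, centers, radii))  # [0, 0, 1, 1, 0]
```

Frames between cores keep the last core visited. This is what suppresses the spurious recrossings that a nearest-center rule would count.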

We treat the simulation data of each individual CO molecule as an independent trajectory, assuming that its motion is uncorrelated with that of the other gas molecules in the system. Each trajectory is assigned following the rules described above. We then count the number of transitions between every pair of microstates, nji, after a given lag time (Δt), which gives the transition count matrix N(Δt). For the calculation of the transition probability matrix and the rate matrix, we use only the largest strongly connected subset of the network, selected using Tarjan’s algorithm.48
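Counting transitions at a lag time and restricting the network to its largest strongly connected set might look like the following Python sketch. The sliding-window counting (every frame used as a time origin) and the recursive Tarjan implementation are choices made here for clarity. For networks with many thousands of states, an iterative version or a library routine would be preferable.

```python
import numpy as np

def count_matrix(dtrajs, n_states, lag):
    """N[j, i] = number of i -> j transitions observed `lag` frames apart (lag >= 1).

    Every frame is used as a time origin (sliding window), and each discrete
    trajectory (one per CO molecule) is counted independently.
    """
    N = np.zeros((n_states, n_states), dtype=int)
    for traj in dtrajs:
        traj = np.asarray(traj)
        np.add.at(N, (traj[lag:], traj[:-lag]), 1)
    return N

def largest_strongly_connected(N):
    """Tarjan's algorithm on the graph with an edge i -> j whenever N[j, i] > 0."""
    n = N.shape[0]
    index, low, on_stack, stack, components = {}, {}, set(), [], []
    counter = [0]

    def visit(v):
        index[v] = low[v] = counter[0]
        counter[0] += 1
        stack.append(v)
        on_stack.add(v)
        for w in map(int, np.nonzero(N[:, v])[0]):   # successors of v
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:                       # v is the root of an SCC
            component = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                component.append(w)
                if w == v:
                    break
            components.append(sorted(component))

    for v in range(n):
        if v not in index:
            visit(v)
    return max(components, key=len)

N = count_matrix([[0, 1, 0, 1, 2]], n_states=3, lag=1)
print(largest_strongly_connected(N))   # [0, 1]; state 2 is never left
```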

Estimation

As in previous work,49 we first determine the transition probability matrix T(Δt) using the maximum likelihood estimator50

$$t_{ji}(\Delta t) = \frac{n_{ji}(\Delta t)}{\sum_k n_{ki}(\Delta t)} \qquad (14)$$

where tji(Δt) are the elements of the transition probability matrix, i.e. the probability that a gas molecule initially in state i is found in state j after a lag time Δt. The lag time is chosen so that the relaxation times of the system are converged51 (see Figure S5).

As before we calculate the elements of the rate matrix K, using the following approximation49

$$k_{ji} \approx \frac{t_{ji}(\Delta t)}{\Delta t} \quad \text{for } i \neq j, \qquad k_{ii} = -\sum_{j \neq i} k_{ji} \qquad (15)$$

The approximation in Equation 15 becomes exact in the limit Δt → 0.
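Equations 14 and 15 amount to column-normalizing the count matrix and dividing by the lag time. Below is a small Python sketch with a made-up 2 × 2 count matrix. The lag of 0.01 ns (10 ps, the lag time quoted for Figure 3) is used purely for illustration, so the resulting rates are in ns−1.

```python
import numpy as np

def transition_matrix(N):
    """Equation 14: t_ji = n_ji / sum_k n_ki (column-normalized counts)."""
    N = np.asarray(N, dtype=float)
    return N / N.sum(axis=0, keepdims=True)

def rate_from_transition(T, lag):
    """Equation 15: k_ji ~ t_ji / lag off the diagonal; k_ii = -sum_{j != i} k_ji."""
    K = T / lag
    np.fill_diagonal(K, 0.0)
    np.fill_diagonal(K, -K.sum(axis=0))
    return K

T = transition_matrix([[8, 1], [2, 9]])   # hypothetical counts N[j, i]
K = rate_from_transition(T, lag=0.01)     # lag in ns -> rates in ns^-1
print(K)
```

The diagonal of T is discarded: it only reflects the probability of staying put, which Equation 15 recovers from the zero-column-sum condition instead.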

The rates obtained from the individual CO trajectories correspond to a concentration of ~1/V^sim_H2O. To obtain the rates at a reference concentration of 1 mM, we scale the rate coefficients connecting the solvent microstate with those inside the protein by V^sim_H2O/V^0_H2O, where V^0_H2O is the volume per gas molecule at a 1 mM concentration, as before.6,44,52,53
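The concentration rescaling is simple arithmetic, sketched here in Python. The reference volume per molecule is V0 = 1/(c0 NA). The simulation volume in the example is a placeholder, since the box volume is not given in this section. Which rate coefficients to multiply (physically, the concentration-dependent solvent → protein entries) is left to the caller.

```python
AVOGADRO = 6.02214076e23   # mol^-1
A3_PER_LITER = 1e27        # 1 L = 1e27 cubic angstrom

def volume_per_molecule(conc_molar):
    """Volume (A^3) per molecule at a given molar concentration: 1 / (c N_A)."""
    return A3_PER_LITER / (conc_molar * AVOGADRO)

def concentration_scale(v_sim_A3, c_ref_molar=1e-3):
    """Scale factor V_sim / V_0 that converts rates seen by one gas molecule
    in a volume v_sim_A3 to a reference concentration c_ref (1 mM default)."""
    return v_sim_A3 / volume_per_molecule(c_ref_molar)

print(volume_per_molecule(1e-3))   # about 1.66e6 A^3 per molecule at 1 mM
print(concentration_scale(1.0e5))  # hypothetical 1e5 A^3 volume -> about 0.060
```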

Analysis

We compute equilibrium populations from the right eigenvector of the stationary mode of the rate matrix (ψ0^R), and relaxation times τi from its eigenvalues λi as τi = −1/λi. Errors for these quantities are obtained by a bootstrap method.54 Each bootstrap sample was generated by randomly drawing trajectory segments from the pool of simulations with replacement, until the same amount of data as in the original dataset was obtained.
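A Python sketch of this analysis follows. The eigendecomposition assumes a rate matrix that satisfies detailed balance, so the spectrum is real and imaginary round-off can be dropped. The bootstrap helper takes a caller-supplied, hypothetical `build_K` function that turns a list of trajectory segments into a rate matrix. It draws as many segments as the pool contains, which matches "the same amount of data" only if all segments have equal length.

```python
import numpy as np

def populations_and_timescales(K):
    """Equilibrium populations and relaxation times tau_i = -1/lambda_i of K."""
    evals, evecs = np.linalg.eig(K)
    evals, evecs = evals.real, evecs.real     # real spectrum under detailed balance
    order = np.argsort(-evals)                # stationary mode (lambda_0 = 0) first
    evals, evecs = evals[order], evecs[:, order]
    p_eq = evecs[:, 0] / evecs[:, 0].sum()    # normalized right eigenvector psi_0^R
    return p_eq, -1.0 / evals[1:]             # slowest relaxation first

def bootstrap_std(segments, build_K, n_boot=200, seed=0):
    """Bootstrap error of the relaxation times: resample segments with
    replacement, rebuild K with the (hypothetical) build_K, and repeat."""
    rng = np.random.default_rng(seed)
    samples = []
    for _ in range(n_boot):
        picks = rng.integers(0, len(segments), size=len(segments))
        K = build_K([segments[i] for i in picks])
        samples.append(populations_and_timescales(K)[1])
    return np.std(samples, axis=0)

# Hypothetical 3-state rate matrix (columns sum to zero)
K = np.array([[-1.0, 2.0, 0.0],
              [1.0, -2.5, 0.25],
              [0.0, 0.5, -0.25]])
p_eq, tau = populations_and_timescales(K)
```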

Estimation of free energy changes for experimental mutations

To make a direct comparison with experiment, we use a database of kinetic data for mutations designed to affect the E7 gate (H64 and F46), the DP site (L29), the Xe4 cavity (I28, V68 and I107) and the Xe1 cavity (L89, L104 and F138)55 (see Table 1). The rate we can compare with is the bimolecular rate constant kentry. It corresponds to access from the solvent to the initial docking site, from which the gas molecule is able to bind to the heme group.55 Although this dataset corresponds to O2 binding rates, we assume that the relative changes will be the same for CO, as the mutations have a primarily steric effect and the two molecules are similar in size.

Table 1:

Dataset of entry rates for O2 binding to myoglobin from Olson et al.55

Protein    kentry (μM−1 s−1)
WT 34±7

E7 gate mutations

H64A 410
H64W 8.6
F46A 110
F46W 35

Docking site mutations

L29A 78
L29W ≥6

Xe4 cavity mutations

I28A 49
I28W 14
V68A 44
V68W 0.2
I107A 59
I107W 21

Xe1 cavity mutations

L89G 31
L89W 57
L104A 50
L104W 28
F138A 48
F138W 27
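The experimental quantity that Equation 16 is compared against is a change in ln kentry relative to WT. The short Python sketch below computes it from the Table 1 values. Treating the L29W lower bound (≥ 6) as an equality is an assumption. The sign convention, ln(kmut/kWT), makes slower entry negative.

```python
import math

# O2 entry rate constants k_entry (uM^-1 s^-1) transcribed from Table 1.
# L29W is reported as ">= 6"; using 6 here treats the bound as a value.
K_ENTRY = {
    "WT": 34.0,
    "H64A": 410.0, "H64W": 8.6, "F46A": 110.0, "F46W": 35.0,   # E7 gate
    "L29A": 78.0, "L29W": 6.0,                                 # docking site
    "I28A": 49.0, "I28W": 14.0, "V68A": 44.0, "V68W": 0.2,
    "I107A": 59.0, "I107W": 21.0,                              # Xe4 cavity
    "L89G": 31.0, "L89W": 57.0, "L104A": 50.0, "L104W": 28.0,
    "F138A": 48.0, "F138W": 27.0,                              # Xe1 cavity
}

def delta_ln_k_exp(table, reference="WT"):
    """ln(k_mut / k_ref) for every entry except the reference."""
    k_ref = table[reference]
    return {name: math.log(k / k_ref) for name, k in table.items() if name != reference}

changes = delta_ln_k_exp(K_ENTRY)
for name, d in sorted(changes.items(), key=lambda item: item[1]):
    print(f"{name:6s} {d:+.2f}")
```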

In order to compare our results with this dataset of mutants, we require an estimate of the free energy change that the mutation induces in the microstates of the MSM. Here we use an extremely coarse-grained description of these effects. First, we assume that a given mutation in the protein will only affect the closest microstate s, and hence the overall effect of the mutation can be obtained as

$$\Delta \ln k_{+1} = \Delta g_{\mathrm{mut}} \times \alpha_s \qquad (16)$$

Our estimate of Δgmut assumes that the change in amino acid volume upon mutation translates into a decrease or increase of the cavity volume vs, and hence of its population. The change in population is therefore taken to be proportional to the relative change in the cavity volume of the WT protein:

$$\Delta g_s = -k_BT \ln\!\left(\frac{p_s + p_s\,\Delta v_s/v_s}{p_s}\right) = -k_BT \ln\!\left(1 + \frac{\Delta v_s}{v_s}\right) \qquad (17)$$

We assume that the change in the cavity volume (Δvs) equals the change in amino acid volume from WT to mutant, but with opposite sign. The amino acid volumes are those reported by Zamyatnin.56 The reference cavity volume for the WT (vs) is approximated as that of a sphere with radius equal to the RMSD of the particle positions for that cluster, scaled by a single adjustable parameter. We set this parameter to λ = 1/5 to improve the correlation with experiment. The justification is that each cavity fluctuates, so the RMSD of the particles assigned to a given cluster will generally be larger than the cavity size. Where the estimated volume change from amino acid replacement is negative and larger in magnitude than the reference volume, a very small volume (10−6 Å3) is assigned instead. The site should then be completely blocked, so it receives a very high free energy and is essentially never visited.
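Equations 16 and 17 combine into a few lines of Python. In this sketch the reference cavity volume vs (the RMSD-based sphere scaled by λ described above) is simply taken as an input. Volumes are in Å3 and energies in units of kBT. The example numbers are hypothetical, not Zamyatnin's values or the clusters of this study.

```python
import math

def delta_g_site(v_site, vol_wt, vol_mut, kT=1.0, v_min=1e-6):
    """Equation 17: free-energy change of cavity (microstate) s upon mutation.

    v_site          reference WT cavity volume v_s (A^3)
    vol_wt, vol_mut side-chain volumes of the WT and mutant residues (A^3)
    The cavity is assumed to lose exactly the volume the side chain gains,
    and a fully blocked cavity is given the small volume v_min.
    """
    dv = -(vol_mut - vol_wt)
    v_new = max(v_site + dv, v_min)
    return -kT * math.log(v_new / v_site)

def delta_ln_k(dg, alpha):
    """Equation 16: first-order change in ln k_+1 from the sensitivity alpha_s."""
    return dg * alpha

# Hypothetical numbers: a 50 A^3 cavity, a side chain that grows by 25 A^3,
# and a sensitivity of -0.8 per kT at the closest microstate.
dg = delta_g_site(50.0, 100.0, 125.0)   # = -ln(0.5) = ln 2 (in kT)
print(dg, delta_ln_k(dg, -0.8))
```

The v_min floor reproduces the blocked-site rule: the free energy becomes large but finite, so the linear estimate of Equation 16 stays defined.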

Results

The MSM captures essential features of the binding of CO to myoglobin

To characterise the diffusion of CO molecules into myoglobin, we have analyzed equilibrium simulations of the protein in the presence of the gas with a Markov state model. As a first test of our computational model, we check whether it captures the pockets identified in the protein by structural studies. In particular, we can compare our clusters with the sites occupied by xenon in equilibrium crystal structures, which have been shown to be populated by CO in time-resolved crystallographic experiments (Figure 2a). The xenon sites Xe1, Xe2, Xe3 and Xe4 are transiently visited by CO molecules after photodissociation,17 and all of them are found by the clustering algorithm without any preconditioning. There is a one-to-one mapping between the Xe1, Xe2 and Xe4 sites and single gas-molecule clusters, while two of our clusters are equally close to the Xe3 site. In addition, the clustering identifies the distal pocket (DP), here referred to as the geminate site (G). A cluster also appears immediately adjacent to His64, the key residue in the E7 gate, which is widely accepted to be the main entry gate to myoglobin. These results indicate that all of these protein cavities are preferentially populated in our simulations, in accord with experiment.

In the 500 ns equilibrium run we find seven instances in which gas molecules reach the geminate site, through the diverse pathways summarized in Figure 2b. Three of the entries occur via a side tunnel reaching Phe46 (pathway 1), two via His64 (pathway 2) and another two via a back gate (pathway 3). The first two pathways can be mapped onto the main access portal identified in a previous simulation study by Onufriev et al.,33 with the entry clusters corresponding to a broad region in the vicinity of the E7 gate. The back gate into the active site is widely regarded as being of minor relevance. We note, however, that in simulations carried out with a different protonation state, access to the geminate site occurred predominantly via this back gate, resulting in a very similar net binding rate (see SI).

Estimate of binding rates and committors

The advantage of the MSM methodology is that it stitches together the transitions of the 20 different CO molecules, analyzed as independent trajectories. This lets us integrate the information on the different pathways into a unified kinetic model. In Figure 3a we show the spectrum of relaxation times corresponding to the different modes of the MSM. The slowest relaxation has a characteristic time scale of 5.7 ns, and two more modes have similarly slow time scales. By inspecting the eigenvectors of the rate matrix, we can identify the states that exchange on each relaxation time. The slowest mode (λ1) corresponds to exchange between the solvent cluster and a series of microstates in the proximity of the heme group, including those at the geminate site and the Xe1 and Xe2 clusters. The associated process is thus binding of CO from the solvent to the protein interior. The next two eigenmodes (λ2 and λ3) correspond to exchange between these internal clusters and those at the back entry to myoglobin (one of which is close to the Xe3 site). This description is consistent with that provided by Onufriev and co-workers, with two main discrete pathways connected by a bottleneck. Here the bottleneck corresponds to the separation emerging from modes λ2 and λ3.

Figure 3:


Markov State Model for CO diffusion in myoglobin. (a) Relaxation times for the first 50 eigenvalues (λi) of the rate matrix K(Δt) obtained with a lag time Δt = 10 ps. (b) Commitment probabilities, or φG values. The inset emphasizes the gas sites with φG > 0.1, relative to the positions of the Xe clusters and the DP site.

While there are many accessible sites within the protein, it is not immediately evident which are most important, or at what point the gas molecule becomes more likely to bind than to escape. This can be quantified by computing the committor for binding to the geminate (or DP) site, φG. It is defined here as the probability that a gas molecule will reach the geminate site before reaching the solvent, so the critical value φG = 0.5 signifies an equal probability of binding or dissociation. We use the BHS method37 (see Theory) to determine the committors directly from the rate matrix. Here we define the end states, unbound (φG(U) = 0) and bound (φG(G) = 1), as the solvent microstate and the DP or geminate site, respectively (see Figure 3b). For every other microstate, the committors are calculated analytically from the rate matrix using Equation 4.37 By definition, the highest value of the committor, φG = 1, corresponds to the geminate site. Sites in its vicinity also have very high values (0.89 < φG < 1, shown in red in Figure 3b). The cluster in front of His64, a key residue for entry through the E7 gate, has φG ≈ 0.3, making it the closest to a transition state by this criterion (orange in Figure 3b). For the remaining clusters, the committor decreases rapidly. For microstates either on the surface of myoglobin near His64 or in the internal pockets of the protein (including Xe1 and Xe3), we obtain φG ≃ 0.1 (green in Figure 3b). For the remaining microstates at the protein surface we get φG ≃ 0 (blue in Figure 3b).

The BHS method also allows us to calculate the flux for each microscopic transition, so we can compute the reactive flux through a dividing surface Σ placed just before the geminate state. Using Equation 3, we recover a binding rate k+1 = 646 μM−1 s−1. Reversing the definition of the end states gives a dissociation rate k−1 = 15 μs−1. These estimates are faster than the experimental values of 12 μM−1 s−1 and 5.3 μs−1,57 by roughly 50-fold and 3-fold respectively. The resulting binding constant (K = k+1/k−1 = 43 M−1) is about 20 times larger than the experimental one (2.2 M−1). The fast dynamics may partly be explained by the low viscosity of the TIP3P water model, which results in a diffusion coefficient of CO in water about 3 times faster than in experiment.58

Sensitivity of the binding rate to mutations

Our main focus is to capture the relative effects of changes in one microstate of the network on the global dynamics. In particular, we would like to predict the effect of mutations on the rate of binding of the gas molecules to the heme. One approach would be to directly simulate all mutants of interest, but this would be computationally demanding and subject to considerable statistical error. To overcome these problems, we instead apply a perturbative approach to the rate matrix of the wild-type protein. We consider mutations as either increasing or decreasing the volume accessible to gas molecules at a specific site within the protein, thus changing the relative free energy of that site. From the change in free energy, we estimate the changes in the microscopic rate coefficients kji via a linear free energy relation, which allows us to determine changes in the overall rates. The sensitivity of the binding rate k+1 to mutations at a specific site s in the diffusion network is characterized by the sensitivity parameter αs = ∂ln k+1/∂gs, where gs is the free energy of site s (see Theory section).

In Figure 4a we present the calculated sensitivity parameter, αs, determined from the rate matrix using Equations 5 to 13. The parameter αs can be interpreted as the effect that an infinitesimal, microstate-blocking mutation would have on the binding rate. A negative sign indicates that a decrease in population (i.e. an increase in the free energy gs) would translate into a decrease in the overall binding rate. As expected, changes in the stability of microstates near the protein surface have very little effect on the rates (blue in Figure 4a). In the vicinity of the DP site, however, the effects are much stronger, with ∂ln k+1/∂gs becoming large and negative (green to red in Figure 4a). Zooming in on the region that our analysis identifies as most sensitive to mutations, we find that the microstates with the largest sensitivity are close to some of the sites that were experimentally mutated to block the Xe2 and Xe4 cavities (see Table 1 and Figure 4a, inset). A particularly strong effect is observed around residue H64, the key amino acid for entry via the E7 gate. Mutation of this residue to tryptophan resulted in a considerable decrease in the binding rate of O2 to myoglobin.55 Another residue that exhibited considerable sensitivity in experiments was L29, whose mutation to tryptophan decreased the entry rate by a factor of ≃ 6. The largest effect of a blocking mutation in experiments came from V68W, which was intended to affect the Xe2 site. In our model, the sensitivity of the Xe2 site is larger than that of most other clusters; however, the simulation model seems to emphasize the effect of the E7 gate pathway.

Figure 4:


Sensitivity analysis of the MSM. (a) Normalized sensitivity parameters for each of the microstates, coloured onto the structure. The inset shows a close-up view emphasizing the positions of the experimental mutations. (b) Correlation of experimental and calculated changes in the logarithm of the on rate. The colour code is the same in both panels. The triangle corresponds to the result from the explicit simulation of the L29W mutant.

Validation of the sensitivity analysis

To make a more direct comparison with experiment, we make a rough estimate of the free energy change of the clusters (Δgs) based on the original cluster volume and the volume change between swapped amino acid residues (see Methods). We can then calculate the change in the rate due to a change Δgs in the free energy of site s as Δln k+1,calc = Δgs αs. Using this coarse-grained description of the effects of mutation, we obtain values of Δln k+1,calc for each of the mutants in the dataset (see Figure 4b). We find semi-quantitative agreement with the experimental results, with a correlation coefficient of R = 0.62. If we remove the most notable outlier (V68W), the correlation coefficient increases to R = 0.79. The model seems to overestimate the effect of the E7 gate while underestimating effects in alternative pathways. It correctly captures three trends: the relative insensitivity of the rates to most of the alanine and glycine mutations, the lack of response to the Xe1 cavity mutations, and the larger effects of mutations in the vicinity of the Xe4 and docking sites. The residues L29, V68 and H64 are all identified as hot spots for mutation. However, their rank order is not correctly captured, owing to a failure of our coarse-grained description of the effects of the mutations on the free energy of the site. The discrepancy could be due to uncertainties in the effective volume estimate for tryptophan, whose orientation in the mutant may determine the strength of the perturbation. Effects other than volume (e.g. electrostatics), or structural relaxation in the actual mutant, which are not included in our simple model, may also be important.

As a second test of our sensitivity calculation, we have explicitly simulated one of the most sensitive mutants in the experiments (L29W). Instead of constructing an MSM from scratch for the mutant, we retain the assumption that the mutation principally affects the microstate closest to the mutated residue, s, and all its connected clusters, i.e. all i for which kis ≠ 0 or ksi ≠ 0. We ran simulations of a single CO molecule started from those clusters. These simulations allowed us to recalculate two sets of column elements: kis for all transitions out of the mutated site s to connected sites i, and kji for all transitions out of each connected site i to all other sites j connected to it. We then repeated the calculation of the binding rate, k+1. Via this approach, we obtain a binding rate of 64.1 μM−1 s−1. The change in the binding rate from the explicit simulation, Δln k+1 = 2.04, is then very close to that predicted by the model, Δln k+1 = 2.29, and also to the experimental change, Δln k+1 = 1.74 (see Figure 4b).
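The resampling step amounts to replacing the columns of the wild-type rate matrix for the re-simulated states (transitions out of s and out of its neighbors i). The diagonal is then rebuilt, after which k+1 is recomputed with Equation 3. Below is a minimal Python sketch with hypothetical numbers. Note that splicing in independently estimated columns does not by itself enforce detailed balance.

```python
import numpy as np

def splice_columns(K_wt, new_columns):
    """Return a copy of K_wt in which selected columns are replaced.

    new_columns maps an origin state i to a vector of re-estimated rates k_ji
    (transitions out of i, e.g. from short simulations of the mutant started
    in i). The diagonal is rebuilt so that every column sums to zero.
    """
    K = np.array(K_wt, dtype=float)
    for i, column in new_columns.items():
        K[:, i] = column
    np.fill_diagonal(K, 0.0)
    np.fill_diagonal(K, -K.sum(axis=0))
    return K

# Hypothetical 3-state WT matrix; state 1 is re-simulated in the "mutant"
K_wt = np.array([[-1.0, 2.0, 0.0],
                 [1.0, -2.5, 0.25],
                 [0.0, 0.5, -0.25]])
K_mut = splice_columns(K_wt, {1: [3.0, 0.0, 0.1]})   # diagonal entry is ignored
```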

Conclusions

We have presented a new approach for quickly determining the sensitivity of diffusive on rates for protein substrates to mutations in the protein interior. The methodology can also be coupled with calculations that explicitly consider the chemical attachment of the ligand to the binding site.59,60 This approach can serve as an initial screen, both for selecting mutants for explicit simulation and for directing experimental protein engineering studies. The method is based on inferring a kinetic Markovian model from atomistic simulation data, from which we calculate a sensitivity metric for the effects of mutations on the rates.

We have applied the method to the study of ligand binding in myoglobin. The MSM that we have constructed is in good agreement with the current models for gas diffusion within this protein obtained from simulation and experiment, and sheds new light on the role of the E7 gate as entry port to myoglobin and on the time-scales for communication between the primary and secondary pathways, which are comparable to the slowest relaxation time in the system. Our approach is validated in two different ways. First, using a set of simple assumptions to estimate the free energy change of a model microstate upon mutation in myoglobin, we find semi-quantitative agreement with experiment. Second, we test our results against the explicit simulation of one of the experimentally relevant mutants (L29W), again finding remarkable agreement. In future work we will address a more refined estimation of the free energy changes, although the results of the very coarse approach used here are encouraging.

The method proposed here will hence make it possible to identify “hot-spots” for mutations in proteins which should maximally affect ligand diffusion. A similar approach would also be more generally applicable to analyze the sensitivity of global rate coefficients obtained from any MSM to the stability of its constituent microstates.

Supplementary Material


Acknowledgement

D.D.S. was supported by EPSRC grant EP/J016764/1 and A.K. by EPSRC grant EP/J015571/1. R.B.B. was supported by the Intramural Research Program of the National Institute of Diabetes and Digestive and Kidney Diseases of the National Institutes of Health. J.B. thanks the Royal Society for a University Research Fellowship. This work was carried out on the HECToR and Archer computing facilities (Edinburgh), access to which was granted through the Materials Chemistry Consortium (EPSRC grants EP/F067496 and EP/L000202). This study utilized the high-performance computational capabilities of the Biowulf Linux cluster at the National Institutes of Health, Bethesda, MD (http://biowulf.nih.gov). The authors acknowledge the use of the UCL Legion High Performance Computing Facility (Legion@UCL), and associated support services, in the completion of this work. D.D. acknowledges Jacob Stevenson for useful discussions.

Footnotes

Supporting Information Available

Additional molecular visualizations, details of the MSM methodology and results, and analytical expressions for the sensitivity parameter. This material is available free of charge via the Internet at http://pubs.acs.org/.

References
