Biophysical Journal. 2019 Jan 25;116(5):818–830. doi: 10.1016/j.bpj.2018.11.3144

Cooperative Changes in Solvent Exposure Identify Cryptic Pockets, Switches, and Allosteric Coupling

Justin R Porter 1, Katelyn E Moeder 1, Carrie A Sibbald 1, Maxwell I Zimmerman 1, Kathryn M Hart 2, Michael J Greenberg 1, Gregory R Bowman 1,3
PMCID: PMC6400826  PMID: 30744991

Abstract

Proteins are dynamic molecules that undergo conformational changes to a broad spectrum of different excited states. Unfortunately, the small populations of these states make it difficult to determine their structures or functional implications. Computer simulations are an increasingly powerful means to identify and characterize functionally relevant excited states. However, this advance has uncovered a further challenge: it can be extremely difficult to identify the most salient features of large simulation data sets. We reasoned that many functionally relevant conformational changes are likely to involve large, cooperative changes to the surfaces that are available to interact with potential binding partners. To examine this hypothesis, we introduce a method that returns a prioritized list of potentially functional conformational changes by segmenting protein structures into clusters of residues that undergo cooperative changes in their solvent exposure, along with the hierarchy of interactions between these groups. We term these groups exposons to distinguish them from other types of clusters that arise in this analysis and others. We demonstrate, using three different model systems, that this method identifies experimentally validated and functionally relevant conformational changes, including conformational switches, allosteric coupling, and cryptic pockets. Our results suggest that key functional sites are hubs in the network of exposons. As a further test of the predictive power of this approach, we apply it to discover cryptic allosteric sites in two different β-lactamase enzymes that are widespread sources of antibiotic resistance. Experimental tests confirm our predictions for both systems. Importantly, we provide the first evidence, to our knowledge, for a cryptic allosteric site in CTX-M-9 β-lactamase. Experimentally testing this prediction did not require any mutations and revealed that this site exerts the most potent allosteric control over activity of any pocket found in β-lactamases to date. Discovery of a similar pocket that was previously overlooked in the well-studied TEM-1 β-lactamase demonstrates the utility of exposons.

Introduction

Proteins are highly dynamic molecules that are capable of accessing a wide variety of excited conformations. Many of these excited states have important biological functions. For example, many proteins predominantly adopt an off state until they interact with a binding partner that stabilizes a higher-energy on state. However, the most common tools for structural biology, such as x-ray crystallography and cryoelectron microscopy, typically provide only a static picture of one (or a few) low-energy states. Characterizing proteins’ excited states would improve our understanding of how they function and open new therapeutic opportunities (1).

Computer simulations, because of their excellent spatiotemporal resolution, are a promising means to identify functionally relevant excited states and conformational changes (2). However, simulations have historically faced severe limitations. In particular, the inability to capture slow processes, such as large-scale conformational changes, has hampered the routine discovery of physiologically relevant excited states using computer simulations. Enormous advances in computer hardware and simulation algorithms have since made it possible to capture processes that occur on timescales of tens to hundreds of milliseconds, finally giving access to this physiologically important timescale for many proteins (3, 4). One successful approach has been to combine many parallel simulations executed on commodity hardware into a single model of protein dynamics using Markov state models (MSMs) (5, 6, 7). MSMs are network models of protein energy landscapes composed of many conformational states and the probabilities of hopping between them. Because they integrate information from many independent simulations, they can reach timescales many orders of magnitude longer than any of the individual simulations used to build the model.

The growing availability of long-timescale simulations has revealed a new major challenge: extracting meaningful insights from the resulting colossal data sets. These data sets are not only composed of hundreds of millions or billions of timepoints but are also embedded in tens or hundreds of thousands of dimensions. Numerous methods have been developed to address this challenge. One method, projecting simulation data onto specific order parameters, is a valuable means to test hypotheses, but this approach requires detailed foreknowledge of which parameters are important to avoid obscuring important features (8). For data sets for which such foreknowledge is not available, unsupervised methods have been developed to learn which degrees of freedom are important. For example, principal components analysis (9) highlights large geometric changes. Unfortunately, larger conformational changes are not necessarily more functionally relevant. For example, the large variance in the atomic positions of a disordered loop can easily dwarf a subtler but more functionally relevant conformational change. Another approach is to leverage the variational principle (10), typically operationalized in the form of time-lagged independent component analysis (tICA) (11, 12), which instead focuses on slowly varying dimensions. Slowness, however, does not necessarily imply functional relevance. For instance, the process of flipping a phenylalanine about its ring axis may be slow, but the exchange symmetry of atoms on either side of the ring means the process does not alter the conformation at all. Indeed, practitioners have increasingly moved toward hand-tuned features combined with tICA (13), an approach that focuses the featurization but is labor intensive and requires detailed foreknowledge of the system to avoid omitting important features.
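The contrast between variance and slowness can be made concrete with a toy NumPy sketch. This is illustrative background, not part of the exposon method. The synthetic data have two coordinates: a slow, low-variance AR(1) process and a fast, high-variance white-noise process. PCA's leading component follows the noisy coordinate, while a basic tICA estimate (symmetrized covariances, with the generalized eigenproblem solved by whitening) follows the slow one. The data, the lag time, and the function names are all assumptions made for this illustration.

```python
import numpy as np


def pca_top_component(X):
    """Direction of maximal variance (leading principal component)."""
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / (len(Xc) - 1)
    evals, evecs = np.linalg.eigh(cov)
    return evecs[:, np.argmax(evals)]


def tica_top_component(X, lag):
    """Slowest linear combination: solves C(tau) v = lambda C(0) v."""
    Xc = X - X.mean(axis=0)
    X0, Xt = Xc[:-lag], Xc[lag:]
    n = len(X0)
    # Symmetrized instantaneous and time-lagged covariance matrices.
    C0 = (X0.T @ X0 + Xt.T @ Xt) / (2 * n)
    Ct = (X0.T @ Xt + Xt.T @ X0) / (2 * n)
    # Whiten with C0^(-1/2) to turn the generalized problem into a symmetric one.
    s, U = np.linalg.eigh(C0)
    W = U @ np.diag(s ** -0.5) @ U.T
    evals, evecs = np.linalg.eigh(W @ Ct @ W)
    v = W @ evecs[:, np.argmax(evals)]
    return v / np.linalg.norm(v)


rng = np.random.default_rng(0)
n_frames = 20000
slow = np.zeros(n_frames)
for t in range(1, n_frames):
    # AR(1) process: lag-1 autocorrelation 0.99, stationary variance ~0.5.
    slow[t] = 0.99 * slow[t - 1] + rng.normal(scale=0.1)
fast = rng.normal(scale=5.0, size=n_frames)  # uncorrelated in time, variance 25
X = np.column_stack([slow, fast])

pc = pca_top_component(X)
tic = tica_top_component(X, lag=10)
print("PCA top component: ", np.round(pc, 3))
print("tICA top component:", np.round(tic, 3))
```

Eigenvector signs are arbitrary, so only the magnitudes of the components are meaningful: PCA's vector points almost entirely along the second (fast) column, and tICA's along the first (slow) column.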

To address these challenges, we hypothesized that functionally relevant conformational changes are likely to result in large, cooperative changes to the surfaces of a protein that are available to interact with potential binding partners. This hypothesis was inspired, in part, by the fact that surface chemistry is an especially important feature of most proteins because it is how the protein interacts with other objects, including any substrates and binding partners. Furthermore, we reasoned that functionless cooperativity at protein surfaces is rare. Physically, the folded state is bombarded by thermal noise, which—in the absence of a specific design constraint—will tend to decorrelate any arbitrary pair of features. Genetically, sequence drift would be expected to eliminate cooperativity that is not selected for over time, much as early protein redesigns sometimes inadvertently destroyed cooperative folding (14). This assumption is also the basis for sequence-based methods that use patterns of conservation and covariance to infer pairs of residues that are in direct contact in a protein’s three-dimensional structure or that are allosterically coupled (15, 16, 17).

To make testing this hypothesis tractable, we developed a method that returns a prioritized list of potentially functional conformational changes by segmenting protein sequences into clusters of residues that undergo cooperative changes in their solvent exposure and uncovers the hierarchy of interactions between these groups. We term these clusters of mutually correlating residues “exposons” to disambiguate them from other forms of clustering found in this work and in the literature. To identify exposons and the structural motions that give rise to them, we present an efficient, MSM-based approach.

For a concrete example of the utility of identifying cooperative changes in solvent exposure, consider cryptic pockets. Cryptic pockets are transient concavities on protein surfaces that open when the protein fluctuates to an excited state (18, 19). Pocket opening concomitantly increases the solvent exposure of surrounding residues, and closing a pocket simultaneously reduces their exposure. Thus, these residues undergo correlated changes in their solvent exposure and are likely to form an exposon.

To establish the value of exposons, we demonstrate that they naturally identify a variety of functionally relevant conformational changes without any foreknowledge of what structural features are important for a given system. First, we show that exposons identify cryptic pockets and allostery in the enzyme TEM-1 β-lactamase. We then show that they detect a conformational switch in the Ebola virus nucleoprotein (eNP) and allostery in the catabolite activator protein (CAP). Finally, we use exposons to prospectively discover cryptic allosteric sites in two different β-lactamase enzymes with less than 40% sequence identity, TEM-1 and CTX-M-9, and perform in vitro biochemical experiments to test our predictions.

Methods

Simulations

As described previously (20), simulations were run at 300 K with the GROMACS software package (21) using the Amber03 force field (22) and TIP3P (23) explicit solvent. β-Lactamase simulations were deployed on the Folding@home distributed computing platform (24), whereas simulations of eNP and CAP were performed on NVIDIA P100 graphics processing units on our local cluster. For our retrospective work, we used previously published data sets including 90.5 μs of simulation of TEM-1 β-lactamase with the M182T substitution (20), 28.0 μs of simulation of eNP (25), and 1.5 μs of simulation of CAP (26). For our prospective work on CTX-M-9 β-lactamase, we ran 76.0 μs of aggregate simulation on Folding@home.

Solvent-exposure featurization

To generate a solvent-exposure featurization of each data set, we computed the solvent-accessible surface area (SASA) of each residue’s side chain in each simulation frame to a drug-fragment-sized probe using the Shrake-Rupley (27) algorithm, as implemented in MDTraj (28). The result is a set of t vectors of length n, where n is the number of residues and t is the number of frames. A probe size of 2.8 Å was chosen because previous work suggests this value identifies pockets that can accommodate a drug-sized molecule (18). Jug (29) was used to organize the parallel execution of many independent tasks, including parallelizing solvent-accessibility calculations across many cores.
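The side-chain featurization reduces per-atom SASA values to a per-residue value. The sketch below assumes you already have a frames × atoms array of per-atom areas; for example, MDTraj's `md.shrake_rupley(traj, mode='atom')` returns one, in nm². It also assumes each atom's residue index and a boolean side-chain mask are available. The helper name and toy arrays are hypothetical.

```python
import numpy as np


def side_chain_sasa(atom_sasa, atom_residue, is_side_chain, n_residues):
    """Sum per-atom SASA over the side-chain atoms of each residue.

    atom_sasa:     (t, n_atoms) array, one row per frame
    atom_residue:  (n_atoms,) residue index of each atom
    is_side_chain: (n_atoms,) boolean mask selecting side-chain atoms
    Returns a (t, n_residues) array, i.e., t vectors of length n.
    """
    atom_sasa = np.asarray(atom_sasa, dtype=float)
    out = np.zeros((atom_sasa.shape[0], n_residues))
    for atom in np.flatnonzero(is_side_chain):
        out[:, atom_residue[atom]] += atom_sasa[:, atom]
    return out


# Toy input: 2 frames, 4 atoms in 2 residues; atoms 1 and 3 are side-chain atoms.
atom_sasa = np.array([[1.0, 0.5, 2.0, 0.0],
                      [1.0, 0.1, 2.0, 0.3]])
atom_residue = np.array([0, 0, 1, 1])
is_side_chain = np.array([False, True, False, True])
features = side_chain_sasa(atom_sasa, atom_residue, is_side_chain, n_residues=2)
print(features)
```

Backbone atoms are ignored entirely. A residue with no atoms in the side-chain mask therefore gets zero in every frame.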

MSMs

We defined MSM microstates by clustering the side-chain SASA-featurized representation. Clusters were discovered using k-centers, which adds new cluster centers until the maximal within-cluster distance drops below a threshold value: 2.6 nm² for TEM-1, 3.5 nm² for CTX-M-9, 5.0 nm² for eNP, and 3.7 nm² for CAP. These values were chosen based on the implied timescales test (Fig. S1). Then, five rounds of k-medoids updates were performed, in which updates were accepted if the largest distance to the nearest medoid decreased. To estimate transition probabilities from the assignments of frames to clusters, we first constructed a transition count matrix, whose element Cij is the number of transitions observed from state i to state j. We then added a pseudocount of 1/n (where n is the number of states) to each element of the transition count matrix and row normalized this matrix to obtain a transition probability matrix, as suggested in (30, 31). The lag times were 0.1 ns (TEM-1), 0.1 ns (CTX-M-9), 0.5 ns (eNP), and 0.4 ns (CAP), also chosen by the implied timescales test (Fig. S1). The highest-flux pathways between two sets of states were then extracted using transition path theory (32, 33).
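The estimation steps above can be sketched in NumPy as follows. The sketch covers greedy k-centers clustering, lagged transition counts, the 1/n pseudocount with row normalization, the stationary distribution, and implied timescales t_i = −τ/ln λ_i. It omits the k-medoids refinement, and it is not the enspara implementation. All function names and toy data are hypothetical.

```python
import numpy as np


def k_centers(X, max_radius):
    """Greedy k-centers: repeatedly promote the point farthest from its center
    until every point lies within max_radius of its nearest center."""
    centers = [0]  # arbitrary choice: start from the first frame
    dist = np.linalg.norm(X - X[0], axis=1)
    assignments = np.zeros(len(X), dtype=int)
    while dist.max() > max_radius:
        new = int(np.argmax(dist))
        centers.append(new)
        d_new = np.linalg.norm(X - X[new], axis=1)
        closer = d_new < dist
        assignments[closer] = len(centers) - 1
        dist = np.minimum(dist, d_new)
    return np.array(centers), assignments


def transition_matrix(trajectories, n_states, lag):
    """Count i -> j transitions at the given lag (in frames) over all
    trajectories, add a pseudocount of 1/n_states everywhere, row-normalize."""
    counts = np.zeros((n_states, n_states))
    for traj in trajectories:
        traj = np.asarray(traj)
        for i, j in zip(traj[:-lag], traj[lag:]):
            counts[i, j] += 1
    counts += 1.0 / n_states
    return counts / counts.sum(axis=1, keepdims=True)


def stationary_distribution(T):
    """Left eigenvector of T with eigenvalue 1, normalized to sum to 1."""
    evals, evecs = np.linalg.eig(T.T)
    pi = np.real(evecs[:, np.argmax(np.real(evals))])
    return pi / pi.sum()


def implied_timescales(T, lag_time):
    """t_i = -lag_time / ln(lambda_i) for the non-stationary, positive eigenvalues."""
    evals = np.sort(np.real(np.linalg.eigvals(T)))[::-1][1:]
    evals = evals[evals > 0]  # ln is undefined for non-positive eigenvalues
    return -lag_time / np.log(evals)


X = np.array([[0.0], [0.1], [5.0], [5.2]])  # toy 1-D feature vectors
centers, labels = k_centers(X, max_radius=1.0)
T = transition_matrix([np.array([0, 0, 0, 1, 1])], n_states=2, lag=1)
pi = stationary_distribution(T)
its = implied_timescales(T, lag_time=0.1)  # e.g., ns
print(centers, labels)
print(T, pi, its)
```

In practice, the clustering threshold and lag time would be scanned and chosen where the implied timescales level off, as the text describes.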

Exposon calculation

Beginning with the featurized representation of the representative conformation for each MSM state, we classified each side chain in each state as exposed or buried using a fixed threshold. We chose a hard threshold rather than a continuous function to reduce the number of parameters (a sigmoid, for example, would require a step width and a step midpoint) and because previous work suggested that mutual information performs better when fewer bins are used (26). Our threshold in this study was 2.0 Å², but choices in the range 2.0–5.0 Å², as well as thresholds formulated as 1–3% of the maximal possible side-chain exposure, gave similar results. This invariance suggests that our algorithm does not erroneously favor larger residues because of their larger maximal possible SASA. The end result is a featurization in which each MSM state is represented by a binary vector with one entry per residue, containing a one for exposed residues and a zero for buried residues.

We then calculate the mutual information between each pair of residues. Mutual information (MI) is a measure of the statistical interdependence of two random variables. It is given by the equation

$$\mathrm{MI}(X,Y)=\sum_{y\in Y}\sum_{x\in X}p(x,y)\log\left(\frac{p(x,y)}{p(x)\,p(y)}\right) \qquad (1)$$

where X and Y are any pair of residues and x and y represent the solvent-accessibility states (i.e., buried or exposed) of the corresponding residues. The probability p(x) is the probability that a residue is observed in state x, and p(x, y) is the joint probability of x and y. These probabilities are computed from the equilibrium state populations obtained during MSM fitting. Other methods could be used in place of MSMs to identify a set of representative structures and their equilibrium probabilities. However, MSMs are advantageous because they provide a facile means to extract the motions that give rise to an exposon. Another key advantage of MSMs is that they often provide better estimates of true equilibrium probabilities from finite-length trajectories. To compute MI matrices for CAP, we leveraged its dimer symmetry to improve sampling of solvent-exposure states. If Ai and Bi are the random variables representing the exposure states of residue i of chains A and B, respectively, then, because the two chains are chemically identical, P(Ai, Bj) = P(Aj, Bi) at equilibrium. To exploit this fact, enhancing our sampling of the state space and making our predictions more robust to sampling error, we take the mean of these two probabilities when computing MI.
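The thresholding and Eq. 1 can be sketched together. In the sketch below, each MSM state's binary exposure vector is weighted by that state's equilibrium population. MI is reported in bits (base-2 logarithm), matching the bit-valued edge weights used later in the text. The helper names and the four-state toy data are hypothetical, and the CAP symmetrization step is not included.

```python
import numpy as np


def exposure_states(state_sasa, threshold=2.0):
    """Binary exposed (1) / buried (0) matrix, one row per MSM state.
    state_sasa is (m_states, n_residues), in the same units as threshold."""
    return (np.asarray(state_sasa) > threshold).astype(int)


def mutual_information_matrix(exposed, populations):
    """Pairwise MI (bits) between residues' binary exposure states,
    using equilibrium state populations as probability weights (Eq. 1)."""
    exposed = np.asarray(exposed)
    p = np.asarray(populations, dtype=float)
    p = p / p.sum()
    n = exposed.shape[1]
    mi = np.zeros((n, n))
    for a in range(n):
        for b in range(n):
            total = 0.0
            for x in (0, 1):
                for y in (0, 1):
                    pxy = p[(exposed[:, a] == x) & (exposed[:, b] == y)].sum()
                    px = p[exposed[:, a] == x].sum()
                    py = p[exposed[:, b] == y].sum()
                    if pxy > 0:  # terms with p(x, y) = 0 contribute nothing
                        total += pxy * np.log2(pxy / (px * py))
            mi[a, b] = total
    return mi


# Rows: 4 equally populated MSM states; columns: 4 residues (side-chain SASA, Å^2).
state_sasa = np.array([[0.5, 0.0, 0.0, 20.0],
                       [1.0, 1.0, 9.0, 20.0],
                       [10.0, 8.0, 1.0, 20.0],
                       [12.0, 9.0, 9.0, 20.0]])
populations = np.full(4, 0.25)
exposed = exposure_states(state_sasa, threshold=2.0)
mi = mutual_information_matrix(exposed, populations)
print(np.round(mi, 3))
```

Residues 0 and 1 switch together and share 1 bit. Residue 2 switches independently of them, and residue 3 is always exposed. Residue 3 therefore has zero entropy and zero MI with everything, which is the property the Results section highlights.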

Exposons are the cluster assignments computed by affinity propagation (34). We chose this algorithm because it operates on a similarity matrix (rather than distances), does not require its similarities to satisfy the triangle inequality (MI does not satisfy this inequality), and has relatively few tunable parameters. We use the affinity propagation implemented in scikit-learn 0.19.0 (35) and zero initial affinities. The one parameter that must be chosen is the so-called “damping parameter,” higher values of which usually (but not always) cause the algorithm to produce fewer clusters. In practice, the results are similar throughout most of the valid range of the damping parameter (0.5 to 1.0) (Fig. S2), although they vary for values very near 0.5 or 1.0. The damping parameters were 0.9 for TEM-1, CTX-M-9, and eNP and 0.95 for CAP, generally chosen to be as high as possible (yielding few exposons) without causing the algorithm to converge on a single exposon for the entire protein. This typically generates 10–50 exposons, of which we visualize the top 3–10.
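To show what affinity propagation does with an MI matrix, here is a from-scratch NumPy sketch of the standard responsibility and availability updates. It is written for self-containment and is not the scikit-learn 0.19.0 implementation the authors used, which, for example, also adds tiny noise to break ties. The default preference (median off-diagonal similarity) and the convergence rule (exemplar set unchanged for `convergence_iter` iterations) are assumptions. The toy matrix mimics two groups of residues with high within-group MI.

```python
import numpy as np


def affinity_propagation(S, damping=0.9, preference=None, max_iter=1000,
                         convergence_iter=50):
    """Cluster with affinity propagation on a similarity matrix S
    (here, a residue-residue MI matrix). Returns one label per residue."""
    S = np.array(S, dtype=float)
    n = S.shape[0]
    if preference is None:
        preference = np.median(S[~np.eye(n, dtype=bool)])  # assumed default
    np.fill_diagonal(S, preference)
    R = np.zeros((n, n))  # responsibilities r(i, k)
    A = np.zeros((n, n))  # availabilities a(i, k)
    rows = np.arange(n)
    last, stable = None, 0
    for _ in range(max_iter):
        # r(i,k) = s(i,k) - max_{k' != k} [a(i,k') + s(i,k')]
        AS = A + S
        best = np.argmax(AS, axis=1)
        first = AS[rows, best]
        AS[rows, best] = -np.inf
        second = AS.max(axis=1)
        R_new = S - first[:, None]
        R_new[rows, best] = S[rows, best] - second
        R = damping * R + (1 - damping) * R_new
        # a(i,k) = min(0, r(k,k) + sum_{i' not in {i,k}} max(0, r(i',k))), i != k
        # a(k,k) = sum_{i' != k} max(0, r(i',k))
        Rp = np.maximum(R, 0)
        np.fill_diagonal(Rp, np.diag(R))
        A_new = Rp.sum(axis=0)[None, :] - Rp
        self_avail = np.diag(A_new).copy()
        A_new = np.minimum(A_new, 0)
        np.fill_diagonal(A_new, self_avail)
        A = damping * A + (1 - damping) * A_new
        # Exemplars: points with r(k,k) + a(k,k) > 0
        exemplars = np.flatnonzero(np.diag(A) + np.diag(R) > 0)
        stable = stable + 1 if last is not None and np.array_equal(exemplars, last) else 0
        last = exemplars
        if stable >= convergence_iter and len(exemplars) > 0:
            break
    if len(exemplars) == 0:
        return np.full(n, -1)  # no exemplar emerged
    labels = np.argmax(S[:, exemplars], axis=1)
    labels[exemplars] = np.arange(len(exemplars))
    return labels


# Toy MI matrix: residues {0, 1, 2} and {3, 4, 5} form two tightly coupled groups.
S = np.zeros((6, 6))
for i, j, s in [(0, 1, 0.9), (1, 2, 0.9), (0, 2, 0.5),
                (3, 4, 0.9), (4, 5, 0.9), (3, 5, 0.5)]:
    S[i, j] = S[j, i] = s
labels = affinity_propagation(S, damping=0.9)
print(labels)
```

Damping only blends each update with the previous value, so higher damping slows the message passing. The number of clusters is governed mainly by the preference.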

Coarse-grained exposon graphs are network models of the communication between each pair of exposons. In this model, each node represents an exposon and each edge represents the communication between a pair of exposons. The weight of each edge in the coarse-grained network is calculated by

$$\mathrm{MI}_{\mathrm{coarse}}(A,B)=\frac{\sum_{i\in A}\sum_{j\in B}\mathrm{MI}(i,j)}{C(A,B)} \qquad (2)$$

where A and B are any pair of exposons (sets of residues) and C is the channel capacity, the maximal possible MI between two exposons (26). Here, the channel capacity is C(A, B) = min(|A|, |B|), that is, one bit per residue in the smaller of the two exposons. When visualizing coarse-grained exposon graphs, we omit self-edges and very low-valued edges (<0.015 bits for TEM-1, <0.075 bits for eNP, and <0.2 bits for CAP). Eigenvector centrality calculations were performed with NetworkX (36).
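Eq. 2 and the centrality calculation can be sketched in NumPy. The paper used NetworkX for eigenvector centrality. The power iteration below is a stand-in that iterates on W + I, a shift that avoids oscillation on bipartite graphs, and returns a unit-norm vector. Treating coarse-grained MI values as undirected edge weights with self-edges dropped is an assumption. Function names and inputs are hypothetical.

```python
import numpy as np


def coarse_grain_mi(mi, exposons):
    """Eq. 2: total residue-level MI between two exposons divided by the
    channel capacity C(A, B) = min(|A|, |B|) bits."""
    k = len(exposons)
    coarse = np.zeros((k, k))
    for a, A in enumerate(exposons):
        for b, B in enumerate(exposons):
            coarse[a, b] = mi[np.ix_(A, B)].sum() / min(len(A), len(B))
    return coarse


def eigenvector_centrality(weights, tol=1e-12, max_iter=10000):
    """Power iteration on (W + I) for a symmetric, non-negative weight matrix;
    self-edges are removed first. Returns a unit-norm centrality vector."""
    W = np.array(weights, dtype=float)
    np.fill_diagonal(W, 0.0)
    x = np.ones(len(W)) / len(W)
    for _ in range(max_iter):
        x_new = W @ x + x
        x_new /= np.linalg.norm(x_new)
        if np.abs(x_new - x).sum() < tol:
            break
        x = x_new
    return x_new


mi = np.array([[1.0, 0.8, 0.5, 0.1],
               [0.8, 1.0, 0.2, 0.2],
               [0.5, 0.2, 1.0, 0.6],
               [0.1, 0.2, 0.6, 1.0]])
coarse = coarse_grain_mi(mi, [[0, 1], [2, 3]])  # two exposons of two residues each
star = np.array([[0, 1, 1, 1],
                 [1, 0, 0, 0],
                 [1, 0, 0, 0],
                 [1, 0, 0, 0]], dtype=float)  # node 0 is a hub
centrality = eigenvector_centrality(star)
print(coarse)
print(np.round(centrality, 4))
```

In the star graph, the hub's centrality is 1/√2 and each leaf's is 1/√6. This is the sense in which a well-connected exposon appears as a large node in the coarse-grained network.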

Labeling-rate predictions

We predict time-dependent labeling behavior using the MSM fitted as described above. To make a labeling-rate prediction, we first identify all states in which the residue of interest is exposed and convert them to sink states by zeroing out their rows in the transition probability matrix. Then, iteratively multiplying the equilibrium probability distribution by this modified matrix gives a monotonically decaying fraction of “unlabeled” probability density as density flows into the sink states. Finally, we fit the unlabeled fraction as a function of time to a single exponential to obtain a rate. In the limit of a perfect single-exponential fit, this rate is the inverse of the mean first-passage time to the sink states (37). An implementation of this simple procedure is provided in a Jupyter notebook (see Code Availability). We used SciPy (38) version 0.19.1 for curve fitting.
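A minimal NumPy sketch of the sink-state procedure follows. It zeroes the rows of exposed states, propagates the equilibrium distribution, and fits a single exponential. Two choices here are assumptions not taken from the text. First, the fit is a log-linear least-squares fit with `np.polyfit` instead of SciPy curve fitting. Second, the t = 0 point is skipped, because the first step only removes density that starts in already-exposed states. The function name and the two-state MSM are hypothetical.

```python
import numpy as np


def predicted_labeling_rate(T, populations, exposed, lag_time, n_steps=200):
    """Predict a labeling rate by treating exposed states as sinks.

    T:           (m, m) row-stochastic MSM transition matrix
    populations: (m,) equilibrium probabilities
    exposed:     (m,) booleans, True where the residue of interest is exposed
    lag_time:    MSM lag time (sets the time units of the returned rate)
    """
    T_sink = np.array(T, dtype=float)
    T_sink[np.asarray(exposed, dtype=bool), :] = 0.0  # density entering these states vanishes
    p = np.asarray(populations, dtype=float)
    unlabeled = [p.sum()]
    for _ in range(n_steps):
        p = p @ T_sink
        unlabeled.append(p.sum())
    unlabeled = np.array(unlabeled)
    t = np.arange(n_steps + 1) * lag_time
    # Fit ln(unlabeled) = ln(a) - k t on t >= lag_time (assumption: skip t = 0).
    slope, _ = np.polyfit(t[1:], np.log(unlabeled[1:]), 1)
    return -slope, t, unlabeled


T = np.array([[0.9, 0.1],
              [0.1, 0.9]])  # toy MSM; state 1 is the "exposed" state
rate, t, unlabeled = predicted_labeling_rate(T, [0.5, 0.5], [False, True], lag_time=1.0)
print(rate, unlabeled[:4])
```

For this toy model, the unlabeled fraction decays by a factor of 0.9 per lag time, so the fitted rate is −ln 0.9 ≈ 0.105 per lag time.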

Protein expression and purification

TEM-1 was purified from the periplasmic fraction of BL21(DE3) cells (Agilent Technologies, Santa Clara, CA) using both cation exchange and size-exclusion chromatography. The full protocol is described in previous work (39).

We subcloned the CTX-M-9 gene into the multiple cloning site of pET9-a vector. Plasmids were transformed into BL21(DE3) Gold cells (Agilent Technologies) for expression under T7 promoter control. Cells were induced with 1 mM isopropyl β-D-1-thiogalactopyranoside at an optical density of 0.6 and grown for 5 h at 37°C. The cells were then centrifuged, and the pellet was frozen at −80°C.

Cells expressing CTX-M-9 were resuspended in 20 mM sodium acetate (pH 5.5), sonicated, and then centrifuged. The protein was purified from the insoluble cytoplasmic fraction. The pellet was unfolded in 9 M urea, 20 mM sodium acetate (pH 5.5) and centrifuged. CTX-M-9 was then refolded in 20 mM sodium acetate (pH 5.5), purified by both cation exchange and size-exclusion chromatography, and stored similarly to TEM-1. Site-specific variants were constructed via site-directed mutagenesis and verified by DNA sequencing.

Thiol labeling

We observe the change in absorbance over time of 5,5′-dithiobis-(2-nitrobenzoic acid) (DTNB, Ellman’s reagent; Thermo Fisher Scientific, Waltham, MA), a small molecule that changes its absorbance as it covalently binds reduced cysteine side chains (18). We used an SX20 stopped-flow instrument (Applied Photophysics, Leatherhead, UK) with a dead time of 1.5 ms. Measurements were taken in 20 mM Tris (pH 8), 1% dimethyl sulfoxide, monitored by absorbance at 412 nm (ε412 = 14,150 M−1 cm−1), and fitted by a single exponential (Fig. S3). Our previous work with thiol labeling was performed using manual mixing in a standard UV-Vis spectrophotometer (18), but in this work, we used a stopped-flow instrument, which gives access to faster motions and improves the quality of fits because its dead time is much shorter than the timescale of our experiments. It also allows for the use of lower DTNB and protein concentrations.

The labeling rate at a given DTNB concentration can be described by the Linderstrøm-Lang model, originally derived for hydrogen-deuterium exchange (40):

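$$\text{closed}\ \underset{k_{cl}}{\overset{k_{op}}{\rightleftharpoons}}\ \text{open}\ \xrightarrow{k_{int}[\mathrm{DTNB}]}\ \text{labeled} \qquad (3)$$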

The observed rate is given by

$$k_{obs}=\frac{k_{op}\,k_{int}[\mathrm{DTNB}]}{k_{op}+k_{cl}+k_{int}[\mathrm{DTNB}]} \qquad (4)$$

which is a nonlinear function that approaches a linear dependence on [DTNB] at low concentrations and [DTNB] independence at high concentrations. In the limiting case in which kcl ≪ kint[DTNB], called the EX1 regime, the observed rate of labeling reduces to

$$k_{obs}^{EX1}=k_{op}, \qquad k_{cl}\ll k_{int}[\mathrm{DTNB}] \qquad (5)$$

In the limiting case in which kcl ≫ kint[DTNB], called the EX2 regime, the observed rate of labeling reduces to

$$k_{obs}^{EX2}=\frac{k_{op}}{k_{cl}}\,k_{int}[\mathrm{DTNB}]=K\,k_{int}[\mathrm{DTNB}], \qquad k_{cl}\gg k_{int}[\mathrm{DTNB}] \qquad (6)$$

where K is the equilibrium constant between the open and closed forms. In the intermediate regime in which kcl ≈ kint[DTNB], called the EXX regime, one must fit to the full expression (Eq. 4). We found that over the concentrations of DTNB used, TEM-1 S243C labeling was in the EX2 regime (linear dependence on [DTNB]) and CTX-M-9 labeling was in the EXX regime (nonlinear dependence on [DTNB]).
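A short numerical check of Eq. 4 and its limits makes the regimes concrete. The rate constants below are made up for illustration and are not fitted values from this study. The crossover concentration, where kcl = kint[DTNB], marks the EXX regime, in which neither limit applies. Function names are hypothetical.

```python
def k_obs(k_op, k_cl, k_int, dtnb):
    """Eq. 4: observed labeling rate for closed <-> open -> labeled."""
    k_chem = k_int * dtnb  # pseudo-first-order labeling rate of the open state
    return k_op * k_chem / (k_op + k_cl + k_chem)


def k_obs_ex1(k_op):
    """EX1 limit (k_cl << k_int[DTNB]): pocket opening is rate limiting."""
    return k_op


def k_obs_ex2(k_op, k_cl, k_int, dtnb):
    """EX2 limit (k_cl >> k_int[DTNB]): K * k_int[DTNB], with K = k_op / k_cl."""
    return (k_op / k_cl) * k_int * dtnb


k_op, k_cl, k_int = 0.01, 1.0, 1.0e4  # s^-1, s^-1, M^-1 s^-1 (illustrative only)
crossover = k_cl / k_int  # [DTNB] at which k_cl = k_int[DTNB]
for dtnb in (1e-6, crossover, 1.0):
    print(f"[DTNB] = {dtnb:.0e} M: k_obs = {k_obs(k_op, k_cl, k_int, dtnb):.3e} s^-1")
```

At 1 μM DTNB, the full expression is within about 2% of the EX2 line. At 1 M DTNB, it saturates at kop. Data spanning the crossover require the full Eq. 4.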

The three regimes differ in terms of the controls required to demonstrate that labeling is not occurring predominantly in the unfolded state. In the EX1 regime, the observed labeling rate for a pocket must be faster than the rate of global unfolding. Neither of the pockets we test in this work labeled in this regime, but we have previously observed pockets that showed this behavior (18). In the EX2 regime, the equilibrium constant for pocket opening must be greater than that for global unfolding (Eq. 6). To determine these quantities for TEM-1 S243C, we measured the equilibrium constant K of unfolding (Table S1) and the intrinsic rate of labeling for the denatured protein. To determine the intrinsic rate of labeling, kint, we repeated our labeling assay with the addition of 6 M urea. In the EXX regime, the observed labeling rate for a pocket must be greater than the maximal expected labeling rate of the unfolded state in either the EX1 or EX2 regime (derivation in Supporting Materials and Methods). Thus, to test whether CTX-M-9’s labeling rate could be explained by global unfolding alone, we measured both its rate of unfolding (Fig. S4) and its thermodynamic stability (Table S2). We then combined these with the fit value of kint (Table S1) to produce a piecewise function that is an upper bound for labeling from the unfolded state. In this case, however, we found that the unfolding rate is the relevant control for all DTNB concentrations used here; the population of unfolded enzyme is relevant only at DTNB concentrations below ∼170 nM, the concentration at which kcl = kint[DTNB].

Urea melts and unfolding kinetics

Equilibrium stabilities and unfolding kinetics were acquired on a Chirascan circular dichroism spectrometer (Applied Photophysics) at a temperature of 25°C. Protein denaturation was observed by measuring the average ellipticity over 60 s at 222 nm as a function of urea concentration (Fig. S5; Table S2). Samples of 35 μg/mL protein were equilibrated in 50 mM potassium phosphate (pH 7) and varying concentrations of urea overnight before data collection.

To determine the global unfolding rate, we fit a linear model (41) of the log observed unfolding rates as a function of urea concentration at concentrations above the protein’s denaturation midpoint (Cm, Table S2) and extrapolated the global unfolding rate back to 0 M urea (Fig. S4). The urea concentrations used for this fit were between 4 and 5.5 M for TEM-1 M182T and between 1.8 and 2.8 M for CTX-M-9.
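The extrapolation amounts to a straight-line fit of ln ku against [urea], evaluated at 0 M. In the sketch below, the data are synthetic and noise-free, generated from an assumed ku(0 M) and slope. The urea range mirrors the CTX-M-9 range given in the text, but the rates are not measurements. The function name is hypothetical.

```python
import numpy as np


def extrapolate_unfolding_rate(urea_molar, k_unfold):
    """Fit ln(k_u) = ln(k_u at 0 M) + m_u * [urea]; return (k_u at 0 M, m_u in 1/M)."""
    slope, intercept = np.polyfit(urea_molar, np.log(k_unfold), 1)
    return np.exp(intercept), slope


urea = np.linspace(1.8, 2.8, 6)  # M, points above the denaturation midpoint
k_true0, m_true = 1.0e-4, 1.5    # assumed values used to generate the toy data
k_u = k_true0 * np.exp(m_true * urea)  # s^-1, synthetic and noise-free
k0, m_u = extrapolate_unfolding_rate(urea, k_u)
print(f"k_u(0 M urea) = {k0:.2e} s^-1, m_u = {m_u:.2f} M^-1")
```

Only points on the unfolding arm enter the fit. Including points below the Cm would mix in the refolding branch and bend the line.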

Activity measurements

Activity measurements were performed on both labeled and unlabeled proteins. To measure the activities of the labeled proteins, 10 μM TEM-1 S243C and 5 μM CTX-M-9 were each incubated with excess DTNB for 1 h, giving ample time for both proteins to fully label before the activity measurements. The proteins were then separated from excess DTNB using size-exclusion chromatography.

Enzyme activities against nitrocefin (Cayman Chemical, Ann Arbor, MI) were monitored at 482 nm (ε482 = 15,000 M−1 cm−1) using a Cary 100 UV-vis spectrophotometer (Agilent Technologies). Reactions were measured in 50 mM potassium phosphate, 10% glycerol (v:v), 2% dimethyl sulfoxide (pH 7.0) at 25°C using 2 nM enzyme. Initial velocities were plotted as a function of nitrocefin concentration and fit to a Michaelis-Menten model to extract kcat and Km values (Fig. S6; Table S3).
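The conversion from absorbance slopes to kcat and Km can be sketched as follows. Beer-Lambert converts initial A482 slopes to velocities, assuming a 1 cm path length. The Michaelis-Menten parameters are then recovered with a Hanes-Woolf linearization ([S]/v vs. [S]). This linearization is swapped in to keep the example dependency-light. The text describes a direct Michaelis-Menten fit, and nonlinear least squares is preferable for noisy data. The kinetic constants and substrate concentrations are illustrative, and the function names are hypothetical.

```python
import numpy as np

EPS_482 = 15000.0  # M^-1 cm^-1, nitrocefin extinction coefficient from the text


def absorbance_slope_to_velocity(dA_per_s, path_cm=1.0):
    """Beer-Lambert: convert an initial slope in A482 per second to M/s."""
    return dA_per_s / (EPS_482 * path_cm)


def michaelis_menten_hanes_woolf(substrate, velocity, enzyme_conc):
    """Fit [S]/v = [S]/Vmax + Km/Vmax by linear least squares; return (kcat, Km)."""
    substrate = np.asarray(substrate, dtype=float)
    velocity = np.asarray(velocity, dtype=float)
    slope, intercept = np.polyfit(substrate, substrate / velocity, 1)
    v_max = 1.0 / slope
    k_m = intercept * v_max
    return v_max / enzyme_conc, k_m


kcat_true, km_true, enzyme = 500.0, 50e-6, 2e-9  # s^-1, M, M (illustrative)
substrate = np.array([10, 25, 50, 100, 200, 400]) * 1e-6  # M
velocity = kcat_true * enzyme * substrate / (km_true + substrate)  # M/s, noise-free
kcat, km = michaelis_menten_hanes_woolf(substrate, velocity, enzyme)
print(f"kcat = {kcat:.1f} s^-1, Km = {km * 1e6:.1f} uM")
```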

Visualizations

Protein structures were visualized using PyMOL 2.2 (42). Graphs were embedded with the Fruchterman-Reingold algorithm (43) in NetworkX (36).

Code availability

Library code is available on GitHub as bowman-lab/enspara (44). MSM weights and state representative structures, along with a Jupyter notebook demonstrating the analysis described in this manuscript, can be found at https://wustl.box.com/v/2018-exposons.

Results

Exposons simultaneously capture conformational changes and allosteric communication at protein surfaces

To identify exposons, we first construct an MSM (see Methods). In this work, we defined states using the Euclidean distance between vectors of side-chain SASAs, but in general, other methods can also be used. Each state in the MSM is then represented by a binary vector that characterizes the solvent exposure of each residue in the cluster center for that state (i.e., element i is zero if residue i is buried or one if it is exposed) (Fig. 1 A). Based on previous work (18), side chains are classified as exposed if their surface area exceeds 2 Å². We then compute the MI between each pair of residues, giving a square MI matrix (Fig. 1 B). MI, defined in Eq. 1 of Methods, is a nonlinear measure of the statistical interdependence of two random variables that has been previously used in studies of protein allostery (26, 45). A particularly useful property of the MI is that residues that never change their solvent exposure (i.e., have zero entropy) have zero MI with every other residue. Finally, we cluster this MI matrix using affinity propagation (34) to assign residues to exposons. The resulting MI matrix can be visualized as a network with nodes colored according to their exposon assignment (Fig. 1 C). The list of exposons can be prioritized for further analysis based on the total information of each exposon, which is the sum of the off-diagonal elements of that exposon’s row in the coarse-grained exposon MI matrix (26). This tends to identify larger exposons with more communication, which we reasoned are more likely to be functionally relevant and less susceptible to noise and errors introduced by finite sampling.
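Prioritizing by total information is a one-line reduction over the coarse-grained MI matrix: each row sum minus its diagonal entry. The sketch below ranks exposons from highest to lowest total information. The function name and the 3 × 3 matrix are hypothetical.

```python
import numpy as np


def rank_by_total_information(coarse_mi):
    """Total information per exposon (off-diagonal row sum of the coarse-grained
    MI matrix); returns (exposon indices from highest to lowest, totals)."""
    coarse_mi = np.asarray(coarse_mi, dtype=float)
    totals = coarse_mi.sum(axis=1) - np.diag(coarse_mi)
    order = np.argsort(totals)[::-1]
    return order, totals


coarse = np.array([[2.0, 0.5, 0.1],
                   [0.5, 1.0, 0.4],
                   [0.1, 0.4, 3.0]])
order, totals = rank_by_total_information(coarse)
print(order, totals)
```

The diagonal (self-information) is excluded, so a large but isolated exposon does not outrank a smaller one that communicates strongly with the rest of the protein.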

Figure 1.


A schematic outline of our method for identifying exposons. (A) An MSM composed of variably populated states (circles; population is indicated by circle diameter) and transitions between states (single-headed arrows; probability is indicated by arrow length) is shown. Each state is associated with a binary exposed or buried classification for each residue (column of black and white boxes; white denotes buried and black denotes exposed). (B) An all-against-all pairwise mutual information (MI) matrix calculated from (A) is shown. Exposons, indicated by the colored groups in the margins, are clusters of residues with mutually high pairwise MI. (C) The residue-level network representation of an MI matrix. Residues are indicated by double-edged circles and are colored by their exposon membership. Exposons are indicated by dashed-line circles. The mutual information between residues is indicated by straight lines between double-edged circles, and the weight of each line represents the strength of the correlation.

Once exposons have been computed, we typically wish to identify, in terms of atomic coordinates and protein conformations, which motion or motions give rise to an exposon. MSMs, which contain kinetic as well as thermodynamic information, are a natural source of this information. An MSM’s top eigenmodes capture how much each conformational state participates in the slowest motions observed in a simulation. In each eigenmode, each state is assigned a value between −1 and 1. The absolute value of each assignment represents the degree of participation of the state in the eigenmotion; the overall sign is arbitrary, but states with opposite signs lie at opposite ends of the motion. That is, states with low values in the eigenmode slowly interconvert with states with high values. Therefore, we reasoned that an MSM’s top eigenmodes provide a facile means to identify the dominant motions contributing to an exposon (5). To identify which eigenmode reports on a particular exposon, we compute the degree to which changes in an exposon’s solvent exposure are correlated with each eigenmotion. Specifically, for a residue i of interest, we choose the eigenmode j that maximizes the Pearson correlation coefficient R between the eigenvector’s components and the vector of that residue’s solvent accessibilities across states. More formally, we compute $\arg\max_j R(\nu_j, S_i)$, where R is the function that calculates Pearson’s R, ν is the m × m matrix of eigenvectors (ν_j is its jth eigenvector), and S is the n × m matrix of state exposures (S_i is the row for residue i) for an MSM with m states of a protein with n residues. This approach is similar to dynamical fingerprinting (46). We then extract the structures of the conformers at the extremes of the selected eigenmode.
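One way to implement this eigenmode selection is sketched below in NumPy. Three choices are assumptions not stated in the text. Right eigenvectors of the transition matrix serve as the per-state values. Modes are ranked by |R| because eigenvector signs are arbitrary. Only the first few non-stationary modes are scanned. The four-state matrix is a made-up two-basin example, and the function name is hypothetical.

```python
import numpy as np


def best_eigenmode(T, exposure, n_modes=10):
    """Pick the non-stationary MSM eigenmode whose per-state values correlate
    most strongly (by |Pearson R|) with one residue's per-state exposure.

    Returns (mode index, eigenvector, state at low extreme, state at high extreme).
    """
    evals, evecs = np.linalg.eig(T)  # columns are right eigenvectors
    order = np.argsort(-np.real(evals))
    evecs = np.real(evecs[:, order])
    best_j, best_r = None, -1.0
    for j in range(1, min(n_modes, len(order))):  # j = 0 is the stationary mode
        r = np.corrcoef(evecs[:, j], exposure)[0, 1]
        if abs(r) > best_r:
            best_j, best_r = j, abs(r)
    v = evecs[:, best_j]
    return best_j, v, int(np.argmin(v)), int(np.argmax(v))


# Two metastable basins, {0, 1} and {2, 3}, that interconvert slowly.
T = np.array([[0.80, 0.19, 0.01, 0.00],
              [0.19, 0.80, 0.00, 0.01],
              [0.01, 0.00, 0.80, 0.19],
              [0.00, 0.01, 0.19, 0.80]])
exposure = np.array([0.0, 0.0, 1.0, 1.0])  # residue exposed only in basin {2, 3}
best, v, lo, hi = best_eigenmode(T, exposure)
print(best, np.round(v, 3), lo, hi)
```

Here the slowest non-stationary mode (eigenvalue 0.98) separates the two basins and correlates perfectly with the residue's exposure. Its extreme states, one from each basin, are the structures one would extract to visualize the motion.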

This conceptual framework has several important advantages over more traditional geometric approaches. First, it does not make any assumptions about which types of surfaces are most interesting; instead, any cooperative surface rearrangement will be detected. Second, this approach explicitly considers the entire sampled ensemble and uses this information to prioritize the most interesting features of the ensemble rather than relying on structural features of particular conformers. Third, because exposons are defined in sequence space, the results are insensitive to structural alignments and can be easily compared with experimental techniques that provide a readout at the level of primary structure, including thiol labeling. Consequently, this tool is applicable to a wide variety of conformational changes and scientific questions, as we demonstrate below.

Our approach is predicated on the assumption that interesting features are those that change at the surface. Our motivation for this assumption was that most interesting protein behavior ultimately is a consequence of the protein’s ability to interact with other objects, which occurs at the surface. Any cooperative rearrangement that does not substantially alter a protein’s solvent exposure will not be detected. For instance, a rotameric transition that creates geometry necessary for catalysis may not entail any change in solvent exposure. Likewise, any cooperative changes occurring exclusively in the protein core will not be detected. Allosteric coupling between two surface sites that occurs through the core will be detected, but the mechanism will not be apparent because exposons will only be sensitive to the endpoints. Another potential limitation of our approach is imposed by the use of MI, which is only sensitive to features that change. A concavity at a protein surface that never changes its conformation will not be identified by this method—this situation is much better suited to the many excellent geometrical pocket detection methods proposed over the years (47, 48, 49, 50).

Retrodiction of a cryptic allosteric site in TEM-1 β-lactamase

As a first test of our model, we examined its ability to identify cryptic allosteric sites. A cryptic allosteric site is a pocket that is absent in available structures but is present in excited states and can exert allosteric control over a distant functional site, such as an enzyme’s active site. Cryptic pockets are a particularly interesting class of excited states because identifying new cryptic sites could offer new druggable sites on established drug targets, provide a means to inhibit targets that are currently considered undruggable, or even enable the enhancement of desirable activities (19, 51). Therefore, a systematic means to identify functionally relevant conformational transitions to excited states in the absence of stabilizing interactions could provide biophysical insight and new therapeutic opportunities. We expect the formation of a cryptic pocket to result in an exposon because, as explained above, the opening and closing of a pocket should result in cooperative changes in the solvent exposure of surrounding residues. Furthermore, for a cryptic allosteric site, we expect the allosteric coupling to give rise to correlations between the pocket exposon and residues around the relevant functional site.

We chose to test our approach on the enzyme TEM-1 β-lactamase because it is known to contain several cryptic allosteric sites (18, 20). It is also an important source of antibiotic resistance, so new inhibitors could provide a valuable means to restore the efficacy of existing antibiotics. In pursuit of new inhibitors, allosteric modulators have been discovered for at least one of these sites, which is created when a short α-helix undocks from the protein, exposing a ligand binding site (18, 52, 53). To distinguish this site from other putative allosteric sites on this protein, we will refer to this site as the Horn pocket, or the Horn allosteric site, after the author who first reported this pocket (52).

As expected, we identify exposons corresponding to known cryptic pockets in TEM-1 β-lactamase (Fig. 2 A). To visualize this, we mapped the highest-total-information exposons onto a crystal model of the TEM-1 ground state (Fig. 2 B). In this format, we observe a small number of spatially condensed clusters of residues that are distant in sequence, recapitulating our expectation that spatially adjacent (but not necessarily sequence-adjacent) residues are more likely to act cooperatively. The exposon with the highest total information (dark blue) corresponds to the active site. The exposon with the second-highest total information (light blue) corresponds to the Horn site (20, 52), shown in Fig. 2 C, gray structure. Yet another exposon (beige, fourth-highest total information) reports on a second cryptic pocket that we reported previously (18). Each of the exposons corresponding to a cryptic pocket has substantial interexposon communication with the active site (i.e., at or above the 90th percentile of all exposon-exposon edges), suggesting that perturbations to these pockets could exert allosteric control over activity.
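The 90th-percentile criterion can be stated precisely. The sketch below assumes a symmetric matrix of exposon-exposon edge weights. The function name is hypothetical. The text does not say whether self-edges enter the percentile, so excluding them and counting each pair once is an assumption.

```python
import numpy as np

def is_top_decile_edge(edge_weights, i, j):
    """True if edge (i, j) is at or above the 90th percentile of all distinct
    exposon-exposon edges. Self-edges are excluded (an assumption)."""
    w = np.asarray(edge_weights, dtype=float)
    iu = np.triu_indices_from(w, k=1)   # each undirected pair once, diagonal skipped
    threshold = np.percentile(w[iu], 90)
    return bool(w[i, j] >= threshold)
```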

Figure 2.

Figure 2

Exposons for TEM-1 β-lactamase. (A) The coarse-grained exposon network for TEM-1 is shown. Edge weights are proportional to the total correlation between each pair of exposons, and node sizes are proportional to their eigenvector centrality. Self-edges and edges with very low values are omitted. (B) The highest-total-information exposons superimposed on a crystal model of unliganded TEM-1, Protein Data Bank (PDB): 1JWP (67), are shown. Residue colors match the exposon colors in (A). Note the spatially contiguous exposons centered about the active site (dark blue), Horn allosteric site (light blue), a previously identified cryptic pocket (beige), and the Ω-loop (light and dark green). (C) A representative structure of the open state from the light blue exposon (teal) overlaid on a ligand-bound crystal structure of the Horn allosteric site (gray, PDB: 1PZO) (52) is shown.

To assess how well an MSM’s eigenmodes identify the motions that induce a particular exposon, we compared the structures identified in this way with known crystal models of the relevant ligand-bound state. In the case of the Horn cryptic allosteric site, a crystallographic model of the ligand-bound, open state is available (52). We compared this structure with the structures at the extremes of the eigenmotion that best correlates with this exposon’s exposure state, as described in Methods. In this case, one extreme of the motion resembles the ligand-free crystal structure, and the other is similar to the ligand-bound crystal structure (Fig. 2 C, teal structure). The open structure from our model is somewhat more open than the ligand-bound structure. This is consistent with previous evidence that this pocket opens even further in solution than in the crystal structure (20).
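A minimal sketch of the eigenmotion-selection step follows. It is an illustrative reconstruction under stated assumptions, not the procedure in Methods. It assumes a reversible, row-stochastic MSM transition matrix, so the eigenvalues are real. It also assumes a hypothetical per-state exposure summary for the exposon, such as the fraction of its residues exposed in each state. The function returns the slow mode whose right eigenvector is most strongly correlated with that exposure, together with the two states at the extremes of the mode. Those two states would then be compared with crystal structures.

```python
import numpy as np

def best_eigenmotion(T, exposure):
    """Return (mode, state_a, state_b) for the MSM mode best tracking an exposon.

    T: row-stochastic, reversible transition matrix (assumed input).
    exposure: hypothetical per-state exposon exposure summary.
    Eigenvector signs are arbitrary, so which extreme is "open" must be
    checked against `exposure` afterwards.
    """
    exposure = np.asarray(exposure, dtype=float)
    if np.std(exposure) == 0:
        raise ValueError("exposure never changes, so no mode can correlate with it")
    evals, right = np.linalg.eig(np.asarray(T, dtype=float))
    order = np.argsort(-evals.real)           # slowest (largest eigenvalue) first
    right = right[:, order].real
    best_mode, best_r = None, -1.0
    for k in range(1, right.shape[1]):        # k = 0 is the stationary (constant) mode
        v = right[:, k]
        if np.std(v) == 0:
            continue
        r = abs(np.corrcoef(v, exposure)[0, 1])
        if r > best_r:
            best_mode, best_r = k, r
    v = right[:, best_mode]
    return best_mode, int(np.argmin(v)), int(np.argmax(v))
```

In a toy three-state model where states 0 and 1 interconvert quickly and state 2 is a slowly reached open state, the selected mode separates state 2 from the others.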

As an even more stringent test of our model, we assessed the consistency of its predictions with an in vitro measurement of the kinetics of solvent exposure. Specifically, we used a thiol-labeling approach, which we have improved from our previous work (18) by using a stopped-flow instrument (see Methods). In brief, this assay uses a drug-sized labeling reagent, DTNB (Ellman’s reagent), that changes absorbance upon covalently reacting with solvent-exposed reduced cysteines, providing a time-resolved measurement of residue-level solvent exposure with millisecond resolution. If a cysteine is not natively present at a position of interest, one can be introduced via mutation (see Methods). To compare our MSM with the thiol-labeling experiments, we also developed a method for predicting labeling rates (see Methods). It gives, as a function of time, the fraction of the population that has ever exposed the relevant side chain to solvent.
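The labeling-rate prediction described above can be framed as a first-passage calculation on the MSM. In the following minimal sketch, the exposed states are made absorbing and the population is propagated forward. It assumes a row-stochastic transition matrix at lag time τ, a boolean per-state exposure mask, and irreversible, instantaneous labeling. This is an illustrative reconstruction rather than necessarily the procedure in Methods, and all names are hypothetical.

```python
import numpy as np

def predicted_labeling_curve(T, exposed, pi0, n_steps):
    """Fraction of the population that has ever visited an exposed state.

    T: row-stochastic MSM transition matrix at lag time tau (assumed input).
    exposed: boolean mask of states in which the side chain counts as exposed.
    pi0: starting distribution, e.g. the equilibrium state populations.
    Returns f, where f[k] is the fraction labeled after k lag times.
    """
    T_abs = np.array(T, dtype=float)
    exposed = np.asarray(exposed, dtype=bool)
    idx = np.flatnonzero(exposed)
    # Make exposed states absorbing: once exposed, a molecule stays "labeled"
    # (an idealization of an irreversible, instantaneous labeling reaction).
    T_abs[idx, :] = 0.0
    T_abs[idx, idx] = 1.0
    p = np.asarray(pi0, dtype=float)
    f = [p[exposed].sum()]
    for _ in range(n_steps):
        p = p @ T_abs
        f.append(p[exposed].sum())
    return np.array(f)
```

Multiplying the step index by τ gives physical time. The resulting curves for different residues can then be ranked against one another and compared with experiment.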

As predicted, experimentally confirmed pocket positions (S203, A232, L286) expose at intermediate rates in our model (Fig. S7). Furthermore, positions that do not label in our experiments (L190, I260) remain buried in our simulations. Similarly, a surface control (A150) labels immediately in our experiments and never buries in our simulations. Additionally, the rank order is preserved: residues that label faster in vitro are predicted to label faster in silico. The main discrepancy between predicted and experimental labeling occurs at S249, which labels very slowly in vitro but does not label in silico. This is likely because finite sampling prevented us from ever observing the slow process that exposes this residue. It is worth noting, however, that this residue is located just “beneath” (i.e., deeper toward the core of the protein) an exposon that reports on cryptic pocket opening at this position (Fig. 2 B, beige). This suggests that the exposon analysis may be somewhat robust to sampling error. As discussed previously (18), the fact that we can place cysteines at positions where they remain buried reassures us that we have not introduced pockets where none existed before. Furthermore, the strong agreement between our predicted and observed labeling rates supports the conclusion that we have not erroneously created pockets. In the future, a more precise understanding of the labeling reaction’s geometric requirements could enable quantitative predictions of labeling rates. For now, our model’s ability to correctly order pocket opening rates demonstrates its utility for identifying and characterizing pockets.
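The rank-preservation claim can be quantified with a Spearman rank correlation between predicted and observed labeling rates. The following sketch is one hedged way to do so. It assumes untied values and is restricted to residues that label both in vitro and in silico; S249, for example, has no finite predicted rate and would be excluded. The function name is hypothetical.

```python
import numpy as np

def spearman_rho(a, b):
    """Spearman rank correlation for untied data: the Pearson correlation of ranks."""
    ra = np.argsort(np.argsort(a))   # rank of each entry of a (0 = smallest)
    rb = np.argsort(np.argsort(b))
    return np.corrcoef(ra, rb)[0, 1]
```

A value of 1 means the two orderings agree exactly. Ties (e.g., several residues that never label) would call for a tie-aware implementation such as scipy.stats.spearmanr.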

Retrodiction of a conformational switch in nucleoprotein

As a subsequent test of our model, we assessed its ability to retrodict a conformational switch that was previously identified by Su et al. (25). Proteins must frequently act as switches, altering their behavior in response to some signal. For example, many signaling proteins undergo conformational changes in response to specific stimuli that either increase or decrease their propensity to interact with downstream binding partners. We expect these concerted changes to manifest as exposons in our analysis.

As a test of the hypothesis that functional conformational switches at protein surfaces induce exposons, we analyzed eNP, a conformational switch that controls access to and replication of the viral genome. Understanding and manipulating this conformational switch is of interest because Ebola was the causative agent in several recent, high-case-fatality epidemics in sub-Saharan Africa (54) and is a pathogen for which very limited treatment options are available. Therefore, an improved biophysical understanding of this virus’s lifecycle may prove useful in understanding how it can be therapeutically targeted. In one state, eNP oligomerizes and encapsidates the viral genome to package it for transport and protect it from degradation (55). In a second state, eNP exists as a monomer, releasing RNA to allow transcription of the viral genome (55). Recent evidence suggests that oligomerization is controlled by the curling of C-terminal helices of eNP into the RNA-binding cleft (25). We therefore expect an exposon to be formed by the residues in this groove and by the residues in the C-terminal tail that transiently occupies it.

Consistent with our expectation that the surface rearrangements required for eNP function result in exposon formation, the highest-total-information exposon in eNP (Fig. 3 A, dark blue) spans the residues in the C-terminal polymerization domain and the RNA-binding groove. This is an interesting case in which the residues of an exposon are not spatially contiguous when mapped onto a crystal model of the ground state. This exposon is also at the center of the network of exposons (Fig. 3 B). Extracting the motion that induces this exposon, shown in Fig. 3 C, reveals that this exposon reports on the very same collective curling of the terminal helices into the RNA-binding cleft identified previously (56). Crucially, this dynamic process is consistent with hydrogen-deuterium exchange data that cannot be accounted for using available cryoelectron microscopy models (25). Manipulating this conformational equilibrium with small molecules or peptides could provide a powerful means of modulating the Ebola lifecycle. Indeed, a peptide that binds this interface has already been found to inhibit viral replication (56).

Figure 3.

Figure 3

Exposons for eNP. (A) The distribution of the highest-total-information exposons superimposed upon a crystal model of monomeric eNP (25) is shown. Note the spatially noncontiguous nature of some of the exposons, especially the dark blue exposon. (B) The coarse-grained exposon network of eNP is shown. Node colors match exposon colors in (A). Edge weights are proportional to the correlation between each pair of exposons, and node sizes are proportional to their eigenvector centrality. Self-edges and edges with very low value are omitted. (C) The extremes of the eigenmotion best correlating with the highest-total-information exposon (blue exposon in A) are shown.

Retrodiction of allosteric coupling between domains in CAP

As a further test of our model, we investigated its capacity to identify allosteric coupling between binding sites. Wherever an element of conformational selection is present, a binding site will sample both its bound and unbound configurations. Whenever these configurations differ in their pattern of solvent exposure, an exposon is expected to form. Because bound and unbound configurations presumably present different patterns of surface chemistry (one compatible with association and the other incompatible), we expect differing patterns of solvent exposure to be nearly a requirement. Furthermore, if these sites are allosterically coupled, they may even cluster into the same exposon.

CAP is a homodimeric transcriptional activator in Escherichia coli that allosterically couples cAMP binding to sequence-specific DNA association (57). This allosteric coupling between the cAMP-binding domains (CBDs) and DNA-binding domains (DBDs) is realized by a dramatic swiveling motion of the DBDs (58, 59). This motion changes the pattern of solvent accessibility on both the CBDs and DBDs, potentially producing one or more exposons. Besides coupling between the CBDs and DBDs (60), CAP also exhibits strong negative cooperativity between the two cAMP-binding sites (61). Because these binding sites show different solvent exposure in cAMP-free and doubly cAMP-liganded crystal models, we expect these sites to induce exposons as well. Previous computational work on this protein suggests that evidence of this coupling is present in equilibrium simulations of the unliganded state (26), so we expect to observe exposons that encompass residues in these regions.

As expected, the two highest-total-information exposons computed from simulations of CAP in the unliganded state (Fig. 4 A) are a symmetric pair stretching from the cAMP-binding site in each monomer’s CBD to both DBDs. There is very strong communication between these two exposons, consistent with the negative cooperativity between the CBDs (57, 60).

Figure 4.

Figure 4

Exposons for CAP. (A) The distribution of the highest-total-information exposons superimposed upon a crystal model of unliganded, dimeric CAP (59) is shown. Purple circles indicate cAMP-binding sites, and the cAMP-binding domains (CBDs) and DNA-binding domains (DBDs) are labeled. (B) The coarse-grained exposon network of CAP is shown in graph form. Node colors match exposon colors in (A). Edge weights are proportional to the correlation between each pair of exposons, and node sizes are proportional to their eigenvector centrality. Self-edges and edges with very low value are omitted.

The third-highest-total-information pair of exposons (Fig. 4 A, beige and orange) is centered about the individual cAMP-binding sites. These two exposons show less communication with one another than the larger DBD/CBD exposons do (Fig. 4 B). One explanation for why these sites cluster separately from the rest of the cAMP-binding site is that they are responsible primarily for cAMP recognition rather than allostery. The two highest-total-information residues in this exposon, Q80 and R82, are two of only four residues in the cAMP-binding cassette that become less dynamic upon binding (62). This is opposite to the trend of the rest of the molecule and to the hypothesized entropy-driven mechanism of allostery in this system. Furthermore, R82 is predicted to form a salt bridge with the cAMP phosphate, and its mutation strongly affects binding (63).

To understand the motions that create the larger exposons reporting on interdomain and intermolecular allostery in CAP, we examined the eigenmotion that best correlates with its highest-total-information exposon (dark blue in Fig. 4 A). A morph between the two extreme states of this eigenmotion (Video S1) indicates that this exposon represents a see-saw motion of the DBDs coupled to the closing of one cAMP site and the opening of the other. This is consistent with structural evidence (58) that the coupling between CBDs and DBDs involves large, rigid-body displacements of the two DBDs. It also immediately suggests a testable hypothesis for how the negative coupling between cAMP-binding sites might be achieved. This hypothesis could be further refined and dissected using methods like CARDS, as we have done previously (26), or using experimental methods.

Video S1. Rigid-Body Movement of Catabolite Activator Protein’s DNA-Binding Domains Identified by Exposons

A see-saw motion between Catabolite Activator Protein’s DNA-binding domains and its cAMP-binding domains coupled to opening of the cAMP binding sites suggests a structural hypothesis for how negative coupling between cAMP binding sites might be achieved.


Functional sites are exposon graph hubs

Exposons are a network model and consequently provide facile access to a protein’s allosteric topology. Because we have partitioned the residues into disjoint sets, we can coarse grain our original MI matrix, which represents the sparse communication graph between all residues, into a much smaller graph representing communication between exposons. To calculate the communication between two exposons, we simply sum all edges that begin in one exposon and end in the other and normalize by the channel capacity (26). The channel capacity is a measure of the maximal information that could possibly be transmitted between two exposons, given the number of residues each contains (see Methods). Normalizing by this quantity allows an intuitive comparison of communication strength across different pairs of exposons.
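The coarse-graining step can be sketched as a block sum over the residue-level MI matrix. The exact channel capacity is defined in Methods. Here it is assumed, for illustration only, to be the number of residue pairs times ln 2 nats, which is the maximum MI between two binary exposed/buried variables. The function name and the use of −1 for residues outside any exposon are hypothetical conventions.

```python
import numpy as np

def coarse_grain_mi(mi, labels, max_mi_per_pair=np.log(2)):
    """Collapse a residue-level MI matrix into normalized exposon-exposon communication.

    mi: symmetric (n_res x n_res) mutual-information matrix, in nats.
    labels: exposon index per residue; residues labeled -1 are ignored.
    max_mi_per_pair: assumed per-pair bound (ln 2 for binary variables).
    Diagonal blocks (self-edges) are computed too but would be dropped when plotting.
    """
    mi = np.asarray(mi, dtype=float)
    labels = np.asarray(labels)
    ids = np.unique(labels[labels >= 0])
    out = np.zeros((len(ids), len(ids)))
    for a, ia in enumerate(ids):
        for b, ib in enumerate(ids):
            block = mi[np.ix_(labels == ia, labels == ib)]
            # Assumed channel capacity: every residue pair transmitting the maximum.
            capacity = block.size * max_mi_per_pair
            out[a, b] = block.sum() / capacity
    return out
```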

All exposon networks we examined had a hub-and-spoke architecture, with the exposon(s) with the highest total information serving as a hub and having a clear functional role. In TEM-1, the active site exposon (colored dark blue in Fig. 2 B), which includes the active site serine, is visually central to the exposon graph (Fig. 5 A), and every other node has its strongest connection with this node. We formalize this intuition by calculating each exposon’s eigenvector centrality (Fig. 5, A–C) (64). Eigenvector centrality scores each node in proportion to the edge-weighted sum of its neighbors’ scores. Equivalently, the scores are the components of the leading eigenvector of the weighted adjacency matrix. Hence, nodes with more, or more strongly weighted, connections to other well-connected nodes have a higher eigenvector centrality. In TEM-1, we also find two groups of exposons that are attached to the hub but relatively uncorrelated with each other. Interestingly, one is a set of exposons under and around the Ω-loop, a critical modulator of substrate specificity and activity. In eNP, we also found that the exposon with the highest total information is a hub (Fig. 5 B). As discussed previously, this exposon captures the dramatic curling motion that has been proposed to mediate RNA binding (25). In CAP, we find that the two exposons with the highest total information both have high centrality (Fig. 5 C). These exposons appear to couple the cAMP-binding and DNA-binding domains of that protein.
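For a weighted, undirected exposon graph, eigenvector centrality can be computed directly from a symmetric eigendecomposition. The minimal sketch below assumes a connected graph with nonnegative weights. It drops self-edges, as in the exposon graphs, and normalizes scores to sum to 1. That normalization is a presentational choice, and the function name is hypothetical.

```python
import numpy as np

def eigenvector_centrality(W):
    """Eigenvector centrality of a connected, weighted, undirected graph.

    W: symmetric, nonnegative exposon-exposon weight matrix. Self-edges are
    removed. Returns the leading eigenvector of W rescaled to sum to 1.
    """
    W = np.array(W, dtype=float)
    np.fill_diagonal(W, 0.0)
    vals, vecs = np.linalg.eigh(W)   # eigenvalues in ascending order
    v = np.abs(vecs[:, -1])          # Perron vector; its overall sign is arbitrary
    return v / v.sum()
```

On a star graph, this sketch gives the highest score to the hub. That is the pattern the exposon graphs display. For graph objects, networkx.eigenvector_centrality(G, weight="weight") computes the same quantity by power iteration, although its output is normalized differently.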

Figure 5.

Figure 5

Eigenvector centrality of exposons for TEM-1, eNP, and CAP. Exposons are numbered from highest to lowest total information. In each case, the central exposon or exposons are associated with the primary function of the molecule they are found in. (A) In TEM-1, the most central exposon is at the active site. (B) In eNP, the central exposon is associated with a curling motion crucial to protein function (Fig. 3C). (C) In CAP, the pair of central exposons report on allosteric coupling between the DBDs and CBDs.

Functionally relevant conformational changes give rise to exposons with high total information and high centrality in three unrelated proteins with very different functions. This is consistent with our motivating hypothesis that cooperativity does not arise at random.

Discovery of the first known cryptic allosteric site in CTX-M-9

To demonstrate how the exposon model can be used to generate hypotheses and design experiments, we applied it to predict cryptic pockets in the enzyme CTX-M-9 β-lactamase. CTX-M-9 is interesting because, to the best of our knowledge, no cryptic pockets have been reported in this protein. It has less than 40% sequence identity with TEM-1, so it is not obvious whether or not it is likely to have similar cryptic pockets.

Examining the exposons for CTX-M-9 revealed that one of them contains the protein’s single native cysteine, C69 (Fig. 6 A, yellow). This cysteine is completely buried in the apo crystal structure. Examining the motion that gives rise to this exposon reveals that C69 is exposed to solvent by a displacement of the Ω-loop (Fig. 6 A). The Ω-loop is a structural element that is conserved among many β-lactamases, contains residues absolutely required for enzymatic activity (65), and displays significant conformational heterogeneity (39). The open state of this pocket appears to be well structured rather than disordered, making it a potentially viable drug target. We therefore expect that a small molecule that binds this pocket and displaces the Ω-loop would be a potent inhibitor, whereas one that stabilizes the closed conformation would increase activity. The exposure of C69 is of particular interest because our thiol-labeling assay can be applied without introducing a cysteine. Therefore, unlike in previous applications of this method, there is no concern that an introduced cysteine created a pocket where none existed before.

Figure 6.

Figure 6

Exposons identify novel, to our knowledge, cryptic pockets in the CTX-M-9 and TEM-1 β-lactamases. (A) The extremes of the eigenmotion for the exposon containing C69 identify closed (left) and open (right) conformations of a cryptic pocket under the Ω-loop. Residues within 7 Å of C69 are shown as spheres, and residues participating in the same exposon as C69 are shown as red sticks. Residue C69 is colored yellow. (B) The observed labeling rates (solid green line) are in the EXX regime. The labeling rates expected for the global unfolding process (dashed line) are much slower. (C) The extremes of the eigenmotion best correlating with the exposon containing S243 identify closed (left) and open (right) conformations of a cryptic pocket under and behind the Ω-loop in TEM-1. Residues within 7 Å of S243 are shown as spheres, and residues participating in the same exposon as S243 are colored dark green and shown as sticks. Residue S243 is colored yellow. (D) The observed labeling rates (solid blue line) of a cysteine introduced at position 243 are in the EX2 regime. The labeling rates expected for the global unfolding process (dashed line) are much slower. In (B) and (D), SDs across three experiments were on the order of 10⁻⁵ and 10⁻⁴, respectively, and are not shown for visual clarity.
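The EX2 and EXX designations in panels B and D use the regime terminology of hydrogen-deuterium exchange. As a reading aid, the standard Linderstrøm-Lang treatment is sketched below. It assumes that labeling occurs only from the open state, that labeling is irreversible with a pseudo-first-order rate k_lab = k_int[DTNB], and that the open state is at steady state. These symbols are introduced here for illustration. EXX denotes the mixed regime in which neither limit holds.

```latex
\mathrm{closed}\;\underset{k_{\mathrm{cl}}}{\overset{k_{\mathrm{op}}}{\rightleftharpoons}}\;\mathrm{open}\;\xrightarrow{k_{\mathrm{lab}}}\;\mathrm{labeled},
\qquad k_{\mathrm{lab}} = k_{\mathrm{int}}\,[\mathrm{DTNB}]

k_{\mathrm{obs}} = \frac{k_{\mathrm{op}}\,k_{\mathrm{lab}}}{k_{\mathrm{op}} + k_{\mathrm{cl}} + k_{\mathrm{lab}}}
\;\approx\;
\begin{cases}
\dfrac{k_{\mathrm{op}}}{k_{\mathrm{cl}}}\,k_{\mathrm{lab}}, & k_{\mathrm{cl}} \gg k_{\mathrm{op}},\ k_{\mathrm{lab}} \quad (\text{EX2})\\[6pt]
k_{\mathrm{op}}, & k_{\mathrm{lab}} \gg k_{\mathrm{op}} + k_{\mathrm{cl}} \quad (\text{EX1})
\end{cases}
```

In the EX2 limit, the observed rate grows linearly with DTNB concentration and reports the opening equilibrium constant. In the EX1 limit, it is independent of concentration and reports the opening rate directly.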

We examined the labeling of C69 using our thiol-labeling assay. The single-exponential labeling that we observe is consistent with our prediction that C69 lines the first cryptic pocket to be identified in CTX-M-9 (Fig. 6 B). C69’s labeling rate is much faster than the rate of the global unfolding process (Fig. 6 B, dashed line) measured by circular dichroism (Fig. S5). This supports our prediction that C69 is exposed by a fluctuation within the native state.
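One simple way to extract an observed rate from a single-exponential labeling trace is sketched below. It is not necessarily the fitting procedure used here. For each trial rate k on a logarithmic grid, the offset and amplitude are solved by linear least squares, and the k with the smallest residual is kept. This keeps the sketch dependent on NumPy alone. The function name and default grid bounds are assumptions.

```python
import numpy as np

def fit_single_exponential(t, signal, k_grid=None):
    """Fit signal(t) ≈ c0 + c1 * exp(-k t); returns (k, c0, c1).

    For each trial k, c0 and c1 follow from linear least squares, so only k
    is scanned (a simple variable-projection fit).
    """
    t = np.asarray(t, dtype=float)
    y = np.asarray(signal, dtype=float)
    if k_grid is None:
        # Assumed default: rates from 0.1/(total time) up to 10/(sampling interval).
        k_grid = np.logspace(np.log10(0.1 / t.max()),
                             np.log10(10.0 / np.min(np.diff(t))), 2000)
    best = None
    for k in k_grid:
        X = np.column_stack([np.ones_like(t), np.exp(-k * t)])
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        sse = np.sum((X @ coef - y) ** 2)
        if best is None or sse < best[0]:
            best = (sse, k, coef[0], coef[1])
    return best[1], best[2], best[3]
```

The fitted k can then be compared directly with the unfolding rate measured independently, as in Fig. 6 B.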

To test whether exposon participation is a bona fide signal of pocket formation, we assayed the labeling of S123. This residue is buried in the crystal model but does not participate in an exposon. According to our model, a cysteine at this position should therefore not label. Consistent with this prediction, the S123C variant of CTX-M-9 does not show significant labeling.

To assess the allosteric potency of this site, we also measured the catalytic efficiency of the label-conjugated enzyme. We incubated CTX-M-9 with DTNB so that C69 became conjugated to TNB. We then measured the rate at which the enzyme degrades nitrocefin, a β-lactam substrate (Fig. S7). TNB conjugation acts as a proxy for the binding of a drug. However, owing to TNB’s small size and hydrophilicity, this assay could easily underestimate the effect a true drug could have. We found a ∼15-fold reduction in catalytic efficiency (Fig. S7). By comparison, the same assay applied to previously identified cryptic pockets in TEM-1 showed a less than threefold change in activity (18). This makes the newly identified site, to our knowledge, the most potent site in either TEM-1 or CTX-M-9.
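Catalytic efficiency is kcat/Km from a Michaelis-Menten fit of initial rates against substrate concentration. The minimal sketch below assumes initial rates measured at several nitrocefin concentrations and a known total enzyme concentration E0. It scans Km on a logarithmic grid and, for each Km, solves for the best Vmax in closed form. The names and the grid are hypothetical, and this is not necessarily the analysis behind Fig. S7. A fold reduction such as the one reported above is then the ratio of unlabeled to labeled efficiency.

```python
import numpy as np

def fit_michaelis_menten(S, v0, E0, km_grid=None):
    """Fit v0 = kcat*E0*S/(Km + S); returns (kcat, Km, kcat/Km).

    For each trial Km, the least-squares Vmax (= kcat*E0) has a closed form,
    so only Km needs to be scanned.
    """
    S = np.asarray(S, dtype=float)
    v0 = np.asarray(v0, dtype=float)
    if km_grid is None:
        km_grid = np.logspace(np.log10(S.min() / 100), np.log10(S.max() * 100), 4000)
    best = None
    for km in km_grid:
        g = S / (km + S)
        vmax = (g @ v0) / (g @ g)
        sse = np.sum((vmax * g - v0) ** 2)
        if best is None or sse < best[0]:
            best = (sse, vmax, km)
    _, vmax, km = best
    kcat = vmax / E0
    return kcat, km, kcat / km

# Hypothetical usage: fold_reduction = eff_unlabeled / eff_labeled
```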

Taken together, these results make this newly predicted pocket the most attractive cryptic drug target site found to date in either TEM-1 or CTX-M-9. Because no mutation was required to perform thiol labeling of C69, these results also provide some of the most compelling support for the predictive power of exposons in particular and MSMs in general.

Prediction of a novel cryptic allosteric site in TEM-1

In light of our results for CTX-M-9, we examined the exposon graph for TEM-1 to see whether a similar cryptic pocket might arise from a displacement of the Ω-loop. TEM-1 has been studied extensively for the purpose of identifying cryptic allosteric sites. Discovering a new pocket in this molecule would therefore be strong evidence for the utility of exposons for pocket discovery.

Two exposons (Fig. 2 B, dark and light green), showing strong communication with one another, map onto the Ω-loop. The best-correlating MSM eigenmode revealed that S243 is significantly exposed by the opening of this pocket (Fig. 6 C) and that the open form also appears well-structured and druggable. This conclusion is supported by quantitative druggability scores from fpocket (66) (Fig. S8). Interestingly, in our previous work, we were unable to detect this pocket because it frequently forms a channel-like connection with the active site, causing it to be combined with the active site pocket by pocket clustering methods (18, 20).

As position 243’s participation in an exposon predicts, the S243C variant labels at an intermediate rate that is slower than the near-instant labeling of a surface residue but substantially faster than the global unfolding process (Fig. 6 D), which is on the order of hours (18). Once again, we also measured the catalytic function of the TNB-labeled enzyme. Somewhat surprisingly, the TNB adduct had a 3.75-fold increase in catalytic efficiency—the ratio of kcat over Km—driven primarily by a ∼4-fold decrease in Km (Fig. S6; Table S3). This is consistent with recent evidence suggesting that both activation and inhibition are possible at the same allosteric site (53) and suggests that TNB may pack into the Ω-loop in such a way as to stabilize the closed conformation. Examination of crystal models of the closed state (67) reveals a void under the Ω-loop into which TNB might plausibly pack.
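Because catalytic efficiency is a ratio, its fold change factors into a kcat term and a Km term. Substituting the reported 3.75-fold increase in efficiency and the roughly 4-fold decrease in Km gives the following. The "∼4" is approximate, so the result is approximate too.

```latex
\frac{(k_{\mathrm{cat}}/K_m)_{\mathrm{TNB}}}{(k_{\mathrm{cat}}/K_m)_{\mathrm{unlabeled}}}
= \frac{k_{\mathrm{cat,TNB}}}{k_{\mathrm{cat,unlabeled}}}\cdot\frac{K_{m,\mathrm{unlabeled}}}{K_{m,\mathrm{TNB}}}
\quad\Longrightarrow\quad
\frac{k_{\mathrm{cat,TNB}}}{k_{\mathrm{cat,unlabeled}}} \approx \frac{3.75}{4} \approx 0.94
```

kcat is therefore nearly unchanged by TNB conjugation. This is why the gain in efficiency can be attributed primarily to the change in Km.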

The fact that exposons identify a new, to our knowledge, cryptic allosteric site even in TEM-1—a protein that has been studied for many years by many groups, including intensively by our group with the specific goal of locating these sites—highlights the value of our approach for identifying functionally relevant conformational changes. It also supports the hypothesis that the paucity of known cryptic allosteric pockets may stem more from technical limitations in locating them than from a low prevalence.

Conclusions

We have demonstrated that exposons provide a powerful conceptual framework for identifying functionally relevant conformational transitions. Exposons retrodict cryptic pockets, conformational switches, and allosteric coupling between domains. We also showed that exposons can make bona fide predictions by discovering two new, to our knowledge, cryptic allosteric sites and experimentally verifying their existence. One of these sites is in a protein, CTX-M-9 β-lactamase, that was not known to have any cryptic pockets and in which no mutations were required to experimentally test our prediction. The other is in an enzyme that has been the target of an extensive search for cryptic pockets, so discovering a new site there is a striking testament to the power of exposons. Taken together, these results are compelling evidence for the utility of exposons.

Because many proteins’ most biologically interesting behavior involves changes at their surfaces, we expect our methodology to serve as a powerful first step in the analysis pipeline for proteins with complex, allosteric functions. Our results applying exposons to cryptic pockets, for example, demonstrate this method’s potential as the first step of a drug development pipeline targeting cryptic sites. The motions giving rise to exposons are substantially more diverse than pocket formation alone. Exposons may therefore also serve as a nearly automatic, high-throughput means of dissecting allostery at protein surfaces, either to refine an existing hypothesis or to identify alternative hypotheses.

Finally, the apparent ubiquity of the centrality of important functional surfaces in informational graphs for all four of the systems studied in this work is provocative. It may be, for example, that a general feature of protein evolution creates this behavior: that genetic drift destroys functionless cooperativity or that allostery incurs a thermodynamic penalty and is hence selected against. It remains to be seen, however, whether this is a general physical or biological principle in the organization of proteins or whether this finding generalizes to proteins of other sizes and with other functions. Whatever the case, exposons’ value for rapidly analyzing conformational ensembles is clear, and we expect this method may have the capacity to detect even larger allosteric changes, such as folding-upon-binding events.

Author Contributions

J.R.P., K.E.M., M.I.Z., K.M.H., M.J.G., and G.R.B. designed research and analyzed results. J.R.P., K.E.M., and C.A.S. performed research. J.R.P., K.E.M., M.I.Z., K.M.H., C.A.S., M.J.G., and G.R.B. wrote the manuscript.

Acknowledgments

We are grateful to the Folding@home users for computing resources.

This work was funded by National Institutes of Health Grants R01GM12400701, U19AI109664, and T32GM02700, as well as by the National Science Foundation CAREER Award MCB-1552471. G.R.B. holds a Career Award at the Scientific Interface from the Burroughs Wellcome Fund and a Packard Fellowship for Science and Engineering from the David & Lucile Packard Foundation. M.I.Z. holds a Monsanto Graduate Fellowship and a Center for Biological Systems Engineering Fellowship.

Editor: Nathan Baker.

Footnotes

Justin R. Porter and Katelyn E. Moeder contributed equally to this work.

Supporting Materials and Methods, eight figures, three tables, and one video are available at http://www.biophysj.org/biophysj/supplemental/S0006-3495(19)30053-0.

Supporting Material

Document S1. Supporting Materials and Methods, Figs. S1–S8, and Tables S1–S3
Document S2. Article plus Supporting Material

References

  • 1.Knoverek C.R., Amarasinghe G.K., Bowman G.R. Advanced methods for accessing protein shape-shifting present new therapeutic opportunities. Trends Biochem. Sci. 2018 doi: 10.1016/j.tibs.2018.11.007. Published online December 13, 2018: S0968-0004(18)30248-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 2.Chodera J.D., Noé F. Markov state models of biomolecular conformational dynamics. Curr. Opin. Struct. Biol. 2014;25:135–144. doi: 10.1016/j.sbi.2014.04.002. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 3.Lane T.J., Shukla D., Pande V.S. To milliseconds and beyond: challenges in the simulation of protein folding. Curr. Opin. Struct. Biol. 2013;23:58–65. doi: 10.1016/j.sbi.2012.11.002. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 4.Voelz V.A., Bowman G.R., Pande V.S. Molecular simulation of ab initio protein folding for a millisecond folder NTL9(1-39) J. Am. Chem. Soc. 2010;132:1526–1528. doi: 10.1021/ja9090353. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 5. Bowman G.R., Pande V.S., Noé F. An Introduction to Markov State Models and Their Application to Long Timescale Molecular Simulation. Springer Science & Business Media; Berlin: 2013.
  • 6. Swope W.C., Pitera J.W., Suits F. Describing protein folding kinetics by molecular dynamics simulations. 1. Theory. J. Phys. Chem. B. 2004;108:6571–6581.
  • 7. Buchete N.V., Hummer G. Peptide folding kinetics from replica exchange molecular dynamics. Phys. Rev. E Stat. Nonlin. Soft Matter Phys. 2008;77:030902. doi: 10.1103/PhysRevE.77.030902.
  • 8. Laio A., Gervasio F.L. Metadynamics: a method to simulate rare events and reconstruct the free energy in biophysics, chemistry and material science. Rep. Prog. Phys. 2008;71:126601.
  • 9. Shlens J. A tutorial on principal component analysis. arXiv. 2014. arXiv:1404.1100. https://arxiv.org/abs/1404.1100.
  • 10. Nüske F., Keller B.G., Noé F. Variational approach to molecular kinetics. J. Chem. Theory Comput. 2014;10:1739–1752. doi: 10.1021/ct4009156.
  • 11. Naritomi Y., Fuchigami S. Slow dynamics in protein fluctuations revealed by time-structure based independent component analysis: the case of domain motions. J. Chem. Phys. 2011;134:065101. doi: 10.1063/1.3554380.
  • 12. Pérez-Hernández G., Paul F., Noé F. Identification of slow molecular order parameters for Markov model construction. J. Chem. Phys. 2013;139:015102. doi: 10.1063/1.4811489.
  • 13. Meng Y., Gao C., Roux B. Predicting the conformational variability of Abl tyrosine kinase using molecular dynamics simulations and Markov state models. J. Chem. Theory Comput. 2018;14:2721–2732. doi: 10.1021/acs.jctc.7b01170.
  • 14. Dantas G., Kuhlman B., Baker D. A large scale test of computational protein design: folding and stability of nine completely redesigned globular proteins. J. Mol. Biol. 2003;332:449–460. doi: 10.1016/s0022-2836(03)00888-x.
  • 15. Lockless S.W., Ranganathan R. Evolutionarily conserved pathways of energetic connectivity in protein families. Science. 1999;286:295–299. doi: 10.1126/science.286.5438.295.
  • 16. Burger L., van Nimwegen E. Accurate prediction of protein–protein interactions from sequence alignments using a Bayesian method. Mol. Syst. Biol. 2008;4:165. doi: 10.1038/msb4100203.
  • 17. Burger L., van Nimwegen E. Disentangling direct from indirect co-evolution of residues in protein alignments. PLoS Comput. Biol. 2010;6:e1000633. doi: 10.1371/journal.pcbi.1000633.
  • 18. Bowman G.R., Bolin E.R., Marqusee S. Discovery of multiple hidden allosteric sites by combining Markov state models and experiments. Proc. Natl. Acad. Sci. USA. 2015;112:2734–2739. doi: 10.1073/pnas.1417811112.
  • 19. Vajda S., Beglov D., Whitty A. Cryptic binding sites on proteins: definition, detection, and druggability. Curr. Opin. Chem. Biol. 2018;44:1–8. doi: 10.1016/j.cbpa.2018.05.003.
  • 20. Bowman G.R., Geissler P.L. Equilibrium fluctuations of a single folded protein reveal a multitude of potential cryptic allosteric sites. Proc. Natl. Acad. Sci. USA. 2012;109:11681–11686. doi: 10.1073/pnas.1209309109.
  • 21. Berendsen H.J., van der Spoel D., van Drunen R. GROMACS: a message-passing parallel molecular dynamics implementation. Comput. Phys. Commun. 1995;91:43–56.
  • 22. Duan Y., Wu C., Kollman P. A point-charge force field for molecular mechanics simulations of proteins based on condensed-phase quantum mechanical calculations. J. Comput. Chem. 2003;24:1999–2012. doi: 10.1002/jcc.10349.
  • 23. Jorgensen W.L., Chandrasekhar J., Klein M.L. Comparison of simple potential functions for simulating liquid water. J. Chem. Phys. 1983;79:926–935.
  • 24. Shirts M., Pande V.S. COMPUTING: screen savers of the world unite! Science. 2000;290:1903–1904. doi: 10.1126/science.290.5498.1903.
  • 25. Su Z., Wu C., Amarasinghe G.K. Electron cryo-microscopy structure of Ebola virus nucleoprotein reveals a mechanism for nucleocapsid-like assembly. Cell. 2018;172:966–978.e12. doi: 10.1016/j.cell.2018.02.009.
  • 26. Singh S., Bowman G.R. Quantifying allosteric communication via both concerted structural changes and conformational disorder with CARDS. J. Chem. Theory Comput. 2017;13:1509–1517. doi: 10.1021/acs.jctc.6b01181.
  • 27. Shrake A., Rupley J.A. Environment and exposure to solvent of protein atoms. Lysozyme and insulin. J. Mol. Biol. 1973;79:351–371. doi: 10.1016/0022-2836(73)90011-9.
  • 28. McGibbon R.T., Beauchamp K.A., Pande V.S. MDTraj: a modern open library for the analysis of molecular dynamics trajectories. Biophys. J. 2015;109:1528–1532. doi: 10.1016/j.bpj.2015.08.015.
  • 29. Coelho L.P. Jug: software for parallel reproducible computation in Python. J. Open Res. Softw. 2017;5:30.
  • 30. Zimmerman M.I., Porter J.R., Bowman G.R. Choice of adaptive sampling strategy impacts state discovery, transition probabilities, and the apparent mechanism of conformational changes. J. Chem. Theory Comput. 2018;14:5459–5475. doi: 10.1021/acs.jctc.8b00500.
  • 31. Prinz J.H., Wu H., Noé F. Markov models of molecular kinetics: generation and validation. J. Chem. Phys. 2011;134:174105. doi: 10.1063/1.3565032.
  • 32. E W., Vanden-Eijnden E. Transition-path theory and path-finding algorithms for the study of rare events. Annu. Rev. Phys. Chem. 2010;61:391–420. doi: 10.1146/annurev.physchem.040808.090412.
  • 33. Metzner P., Schütte C., Vanden-Eijnden E. Transition path theory for Markov jump processes. Multiscale Model. Simul. 2009;7:1192–1219.
  • 34. Frey B.J., Dueck D. Clustering by passing messages between data points. Science. 2007;315:972–976. doi: 10.1126/science.1136800.
  • 35. Pedregosa F., Varoquaux G., Duchesnay E. Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 2011;12:2825–2830.
  • 36. Hagberg A., Swart P., Schult D. Exploring network structure, dynamics, and function using NetworkX. In: Varoquaux G., Vaught T., Millman J., editors. Proceedings of the 7th Python in Science Conference. Pasadena; 2008. pp. 11–15.
  • 37. Noé F., Banisch R., Clementi C. Commute maps: separating slowly mixing molecular configurations for kinetic modeling. J. Chem. Theory Comput. 2016;12:5620–5630. doi: 10.1021/acs.jctc.6b00762.
  • 38. Oliphant T.E. Python for scientific computing. Comput. Sci. Eng. 2007;9:10–20.
  • 39. Hart K.M., Ho C.M., Bowman G.R. Modelling proteins’ hidden conformations to predict antibiotic resistance. Nat. Commun. 2016;7:12965. doi: 10.1038/ncomms12965.
  • 40. Berger A., Linderstrøm-Lang K. Deuterium exchange of poly-DL-alanine in aqueous solution. Arch. Biochem. Biophys. 1957;69:106–118. doi: 10.1016/0003-9861(57)90478-2.
  • 41. Pace C.N., Shaw K.L. Linear extrapolation method of analyzing solvent denaturation curves. Proteins. 2000;41:1–7. doi: 10.1002/1097-0134(2000)41:4+<1::aid-prot10>3.3.co;2-u.
  • 42. DeLano W.L. The PyMOL molecular graphics system. DeLano Scientific; San Carlos: 2002.
  • 43. Fruchterman T.M.J., Reingold E.M. Graph drawing by force-directed placement. Softw. Pract. Exper. 1991;21:1129–1164.
  • 44. Porter J.R., Zimmerman M.I., Bowman G.R. Enspara: modeling molecular ensembles with scalable data structures and parallel computing. bioRxiv. 2018. doi: 10.1063/1.5063794.
  • 45. McClendon C.L., Friedland G., Jacobson M.P. Quantifying correlations between allosteric sites in thermodynamic ensembles. J. Chem. Theory Comput. 2009;5:2486–2502. doi: 10.1021/ct9001812.
  • 46. Noé F., Doose S., Smith J.C. Dynamical fingerprints for probing individual relaxation processes in biomolecular dynamics with simulations and kinetic experiments. Proc. Natl. Acad. Sci. USA. 2011;108:4822–4827. doi: 10.1073/pnas.1004646108.
  • 47. Hendlich M., Rippmann F., Barnickel G. LIGSITE: automatic and efficient detection of potential small molecule-binding sites in proteins. J. Mol. Graph. Model. 1997;15:359–363, 389. doi: 10.1016/s1093-3263(98)00002-3.
  • 48. Huang B., Schroeder M. LIGSITEcsc: predicting ligand binding sites using the Connolly surface and degree of conservation. BMC Struct. Biol. 2006;6:19. doi: 10.1186/1472-6807-6-19.
  • 49. Coleman R.G., Sharp K.A. Protein pockets: inventory, shape, and comparison. J. Chem. Inf. Model. 2010;50:589–603. doi: 10.1021/ci900397t.
  • 50. Decherchi S., Rocchia W. A general and robust ray-casting-based algorithm for triangulating surfaces at the nanoscale. PLoS One. 2013;8:e59744. doi: 10.1371/journal.pone.0059744.
  • 51. Hardy J.A., Wells J.A. Searching for new allosteric sites in enzymes. Curr. Opin. Struct. Biol. 2004;14:706–715. doi: 10.1016/j.sbi.2004.10.009.
  • 52. Horn J.R., Shoichet B.K. Allosteric inhibition through core disruption. J. Mol. Biol. 2004;336:1283–1291. doi: 10.1016/j.jmb.2003.12.068.
  • 53. Hart K.M., Moeder K.E., Bowman G.R. Designing small molecules to target cryptic pockets yields both positive and negative allosteric modulators. PLoS One. 2017;12:e0178678. doi: 10.1371/journal.pone.0178678.
  • 54. Agua-Agum J., Ariyarajah A., Wijekoon Kannangarage N., WHO Ebola Response Team. West African Ebola epidemic after one year--slowing but not yet under control. N. Engl. J. Med. 2015;372:584–587. doi: 10.1056/NEJMc1414992.
  • 55. Ruigrok R.W., Crépin T., Kolakofsky D. Nucleoproteins and nucleocapsids of negative-strand RNA viruses. Curr. Opin. Microbiol. 2011;14:504–510. doi: 10.1016/j.mib.2011.07.011.
  • 56. Leung D.W., Borek D., Amarasinghe G.K. An intrinsically disordered peptide from Ebola virus VP35 controls viral RNA synthesis by modulating nucleoprotein-RNA interactions. Cell Rep. 2015;11:376–389. doi: 10.1016/j.celrep.2015.03.034.
  • 57. Brown A.M., Crothers D.M. Modulation of the stability of a gene-regulatory protein dimer by DNA and cAMP. Proc. Natl. Acad. Sci. USA. 1989;86:7387–7391. doi: 10.1073/pnas.86.19.7387.
  • 58. Schultz S.C., Shields G.C., Steitz T.A. Crystal structure of a CAP-DNA complex: the DNA is bent by 90 degrees. Science. 1991;253:1001–1007. doi: 10.1126/science.1653449.
  • 59. Seok S.H., Im H., Lee B.J. Structures of inactive CRP species reveal the atomic details of the allosteric transition that discriminates cyclic nucleotide second messengers. Acta Crystallogr. D Biol. Crystallogr. 2014;70:1726–1742. doi: 10.1107/S139900471400724X.
  • 60. Harman J.G. Allosteric regulation of the cAMP receptor protein. Biochim. Biophys. Acta. 2001;1547:1–17. doi: 10.1016/s0167-4838(01)00187-x.
  • 61. Heyduk E., Heyduk T., Lee J.C. Intersubunit communications in Escherichia coli cyclic AMP receptor protein: studies of the ligand binding domain. Biochemistry. 1992;31:3682–3688. doi: 10.1021/bi00129a017.
  • 62. Popovych N., Sun S., Kalodimos C.G. Dynamically driven protein allostery. Nat. Struct. Mol. Biol. 2006;13:831–838. doi: 10.1038/nsmb1132.
  • 63. Belduz A.O., Lee E.J., Harman J.G. Mutagenesis of the cyclic AMP receptor protein of Escherichia coli: targeting positions 72 and 82 of the cyclic nucleotide binding pocket. Nucleic Acids Res. 1993;21:1827–1835. doi: 10.1093/nar/21.8.1827.
  • 64. Bonacich P. Power and centrality: a family of measures. Am. J. Sociol. 1987;92:1170–1182.
  • 65. Adachi H., Ohta T., Matsuzawa H. Site-directed mutants, at position 166, of RTEM-1 beta-lactamase that form a stable acyl-enzyme intermediate with penicillin. J. Biol. Chem. 1991;266:3186–3191.
  • 66. Schmidtke P., Barril X. Understanding and predicting druggability. A high-throughput method for detection of drug binding sites. J. Med. Chem. 2010;53:5858–5867. doi: 10.1021/jm100574m.
  • 67. Wang X., Minasov G., Shoichet B.K. Evolution of an antibiotic resistance enzyme constrained by stability and activity trade-offs. J. Mol. Biol. 2002;320:85–95. doi: 10.1016/S0022-2836(02)00400-X.

Supplementary Materials

Video S1. Rigid-Body Movement of Catabolite Activator Protein’s DNA-Binding Domains Identified by Exposons

A seesaw motion of Catabolite Activator Protein’s DNA-binding domains relative to its cAMP-binding domains, coupled to opening of the cAMP-binding sites, suggests a structural hypothesis for how negative coupling between the cAMP-binding sites might be achieved.

Document S1. Supporting Materials and Methods, Figs. S1–S8, and Tables S1–S3
mmc1.pdf (2MB, pdf)
Document S2. Article plus Supporting Material
mmc3.pdf (4MB, pdf)
