Abstract
Identifying the network of residues essential for allosteric coupling is a prerequisite for rationally engineering allostery in proteins. Deep mutational scanning has emerged as a function-centric approach for identifying such allostery hotspots in a comprehensive and unbiased fashion, leading to observations that challenge our understanding of allostery at the molecular level. Specifically, a recent deep mutational scanning study of the tetracycline repressor (TetR) revealed an unexpectedly broad distribution of allostery hotspots throughout the protein structure. Using extensive molecular dynamics simulations (up to 50 μs) and free energy computations, we establish the molecular and energetic basis for the strong anti-cooperativity between the ligand and DNA binding sites. The computed free energy landscapes in different ligation states show that allostery in TetR is well described by a conformational selection model, in which the apo state samples a broad set of conformations and specific ones are selectively stabilized by either ligand or DNA binding. By examining a range of structural and dynamic properties of residues at both local and global scales, we find that different analyses capture different subsets of the experimentally identified hotspots, suggesting that these residues modulate allostery in distinct ways. These results motivate a thermodynamic model that qualitatively explains the broad distribution of hotspot residues and their distinct features in molecular dynamics simulations. The multi-faceted strategy established here for evaluating hotspots, together with our insights into their mechanistic contributions, should be useful for modulating protein allostery in both mechanistic and engineering studies.
Introduction
Allostery, which couples distant sites in a protein, is a fundamental and prevalent regulatory mechanism in many life processes,1,2 with prominent examples including signal and energy transduction, gene transcription, and enzyme catalysis.3–5 Moreover, targeting allosteric rather than active sites has emerged as an attractive strategy for drug design,6–8 especially for targets (e.g., kinases) whose substrates are ubiquitous cellular molecules such as ATP. Therefore, there has been long-standing interest in understanding the molecular basis of allostery, with the ultimate goal of controlling allostery for biomedical and bioengineering applications.9 Thanks to decades of experimental and computational analyses, the general principles that govern allostery are understood in outline;10–13 for example, in the popular Monod-Wyman-Changeux (MWC) model,1,12 equilibria between pre-existing conformations are modulated by the binding of allosteric effectors, leading to different activities at the functional site. While the key parameters in such statistical thermodynamic models can be derived from experimental data and assigned physical significance,14 the molecular details that encode allostery often remain obscure. As a result, although the engineering of allosteric responses has been reported,15–17 the number of successful applications remains limited.
To engineer allostery with molecular-level precision, it is imperative to systematically identify the residues that dictate the degree of cooperativity between allosteric and active sites in a protein and to establish the mechanism of their contributions. Indeed, a robust search for these “allostery hotspot” residues has remained a major challenge in the field for both experimental and computational approaches. Experimentally, most investigations use X-ray crystallography,5 nuclear magnetic resonance18–20 or single-molecule spectroscopies21 to target residues whose chemical environment or dynamics change as part of the allosteric response; the role of specific residues in allostery is then verified with site-directed mutagenesis. While such studies can provide valuable characterizations of specific residues, they are time-consuming and thus typically limited to a handful of sites chosen a priori from structural considerations, which precludes a comprehensive analysis of critical residues; moreover, the energetic and mechanistic contributions of these residues to allostery are often difficult to evaluate from experiments alone. For a more efficient exploration of allostery hotspots and their contributions, various computational approaches have been proposed over the years, ranging from sequence-based techniques that focus on the statistics of amino acid co-evolution22,23 through structural-informatics models that probe the propagation of conformational distortions24,25 to molecular dynamics-based approaches that analyze transient intermediates26 or networks of motional coupling27–31 among protein residues. While experimental support for computationally predicted hotspots has been reported in many studies,32–35 the validation is often not comprehensive, again because of the limited amount of experimental data.
Many previous computational studies relied on relatively short molecular simulations, typically a few hundred nanoseconds or less, which limits the robustness of the results. Moreover, each study tended to focus on residues that exhibit a particular type of property, which may cause critical residues with other features to be missed. As a result, the predictive power of these computational approaches remains to be systematically evaluated.
A case in point is the bacterial transcription factor TetR(B) (Fig. 1a), which belongs to a family of well-characterized transcriptional repressors that are induced by a broad set of molecules ranging from antibiotics (e.g., tetracycline) to lipids and signaling molecules.36 With more than 10⁵ members, these one-component regulators are widely associated with antibiotic resistance as well as with the regulation of genes involved in metabolism, antibiotic production, quorum sensing, and many other aspects of prokaryotic physiology; they have also been transferred into mammalian cells.37 Thus, understanding the induction mechanism of these proteins is of broad interest for developing novel strategies to battle bacterial infections and drug resistance, and for engineering mammalian genetic sensors and circuits.
Figure 1:

Key conformational states of TetR and allostery hotspot distributions from deep mutational scanning analysis. (a) The crystal structure of TetR with bound minocycline and Mg2+ (PDB code: 4AC0). Some key structural elements are highlighted in colors; DNA binding domain (DBD, α1–3): red; the ligand binding domain (LBD) includes several secondary structural elements surrounding the ligand, α4: green, α7: blue, α8: mauve, α6 – α7 linker (l6): orange. The prime (’) sign denotes structural elements in the other monomer. (b) Allosteric hotspots determined in deep mutational scanning experiments.52 The red and orange colors highlight strong and intermediate hotspots, respectively. They are defined as sites for which at least 50% and 22%, respectively, of mutations abolished induction, and are not in direct contact with the ligand in the crystal structure. As described in Ref. 52, the hotspot residues are distributed in four regions: region 1 is at the interface of the DBD and LBD on α4 and α6, region 2 is a short motif connecting α7 and α8, region 3 is at the dimer interface consisting of α8, and region 4 is at the C-terminus of α10. (c) A cartoon representation for the putative allostery mechanism of TetR induction,40,47 which involves movements of several helices upon ligand (L) binding that ultimately propagate into the pendulum motion of α4 and reorientation of the DBDs; only one monomer is shown for clarity.
In TetR(B), which will simply be referred to as TetR below (unless a distinction from other homologs is needed), ligand (tetracycline·Mg2+) binding and DNA binding are highly anti-cooperative, despite the long distance (>20 Å) between the two binding sites. Structural, biochemical and molecular dynamics analyses38–48 suggested that allostery in TetR can be largely explained in terms of the classical two-state model,46,49 in which the two conformational states exhibit distinct DNA binding affinities and their populations are modulated by ligand binding. In a putative mechanism40,47 (Fig. 1c), ligand binding perturbs helix 8 of one TetR monomer and pulls helix 6 of the other monomer toward the central core of the structure; the last turn of helix 6 (H100-T103) unwinds and shifts to interact with Mg2+. Hydrophobic interactions with helix 6 then pull helix 4 in a pendulum motion, leading to reorientation and further separation of the two DNA binding domains by ~5 Å and thus a significant decrease in DNA binding affinity. This mechanism predicts that allostery hotspots are localized to regions that connect the ligand and DNA binding sites, which has been verified by numerous mutation experiments.47,50 However, many more residues are known to contribute;38,41 for example, the pioneering mutagenesis study of Hillen and co-workers51 identified 93 single-site TetR mutations with a non-inducible phenotype. While ~60 of these mutations clustered in two central regions, non-inducible mutations were spread over the entire TetR structure up to the C-terminus.
Advances in next-generation sequencing technology have made it possible to conduct deep mutational scanning analysis of protein function,52,53 in which every site in a protein is exhaustively mutated to each of the other 19 amino acids. Such a function-centric strategy complements the traditional reductionist approach by providing a comprehensive and thus unbiased analysis of the sequence-function relationship. For allostery specifically, saturation mutagenesis followed by high-throughput functional readouts can identify allostery hotspots far more comprehensively and is thus poised to challenge our understanding of allostery at the molecular level. Motivated by such considerations, Leander and co-workers52 conducted a deep mutational scanning analysis of TetR, one of the first applications to an allosteric system, and observed a rather unexpected distribution of allostery hotspots (Fig. 1b) compared with previous analyses.40,41,47 As shown in Fig. 1b, the allostery hotspots, established based on the fraction of mutations at a given site that abolished induction, are distributed over a significant portion of the protein structure, in contrast to the popular view that hotspot residues tend to form well-defined pathways for “information transduction”.22,34 For example, allostery hotspots were observed in the C-terminal region of helix 7, in a major portion of helix 8 and at the C-terminus of helix 10, most of which are far from any apparent pathway connecting the ligand and DNA binding sites.
The remarkable discrepancy between the hotspot distribution expected from the accepted allostery mechanism in TetR (Fig. 1c) and that from deep mutational scanning analysis calls for an in-depth investigation of the molecular basis of allostery in TetR; owing to the conservation of key structural, and therefore mechanistic, features,36 the insights gained for TetR(B) are expected to apply to other TetR family members. Moreover, the broad hotspot distribution suggests that the contributions of these residues are unlikely to be explained by a single mechanism, which requires analysis strategies different from those of previous computational studies. We tackle these challenges with extensive molecular dynamics and free energy calculations, which enable a systematic analysis of residues potentially important to the anti-cooperativity between ligand and DNA binding.
Specifically, we have conducted 6–50 μs of unbiased, explicit-solvent atomistic simulations for the three distinct ligation states of TetR (apo, ligand-bound and DNA-bound), up to two orders of magnitude longer than previous computational analyses.42,43,52,54 This thorough sampling allows us to use statistical analysis based on the Jensen-Shannon divergence55 to select collective variables that properly characterize the conformational state of TetR, and to construct free energy landscapes spanned by these collective variables. The computed free energy surfaces in different ligation states clearly show that allostery in TetR is well described by a conformational selection model,49 in which the apo state samples a broad set of conformational basins and specific ones are selectively stabilized by either ligand or DNA binding. We have also systematically examined a range of structural and dynamic properties of residues at both local and global scales, which proves to be an effective strategy for establishing features that help identify hotspots and infer their physical contributions to allostery. The results suggest that residues may contribute to allostery by controlling either the relative stability of key free energy basins or the propagation of conformational cascades between the two binding sites. Accordingly, we develop an MWC-like thermodynamic model12,56 to qualitatively explain the broad distribution of hotspot residues and their distinct features in molecular dynamics simulations.
The multi-faceted computational framework we establish here for evaluating hotspots, and the insights we gain into their mechanistic contributions to allostery, are particularly relevant to allosteric systems that feature relatively modest conformational changes.57,58 Such systems include many transcription regulators44,59 that share the general two-domain architecture of the TetR family. Nevertheless, the strategy of integrating comprehensive analyses based on different metrics at the molecular level, to understand both the underlying mechanism of allostery and the diverse roles of allostery hotspots, is applicable to allosteric systems in general. Moreover, we speculate that the broad distribution of allostery hotspots is relevant to many systems, further supporting the idea that protein function can be regulated by perturbing deeply buried regions and surface residues,23 including cryptic binding sites.60,61
Computational Methods
System setup and equilibrium MD simulations
The starting structure for the ligand-bound simulations is the crystal structure (PDB code: 4AC0) of TetR(B) with bound Mg2+ and minocycline; the latter is converted to anhydrotetracycline (aTC), the ligand used in the experimental studies of Ref. 52, which is treated with the CHARMM General Force Field (CGENFF) model;62 the protonation state of the ligand is adopted from the computational analysis of Simonson and co-workers.63,64 The Mg2+ ion is treated with the standard CHARMM force field model,65 and His64, which interacts closely with the ligand, is protonated at the Nϵ position. For the DNA-bound state, the DNA-bound TetR(D) crystal structure (PDB code: 1QPI) is used as the template to first generate a structural model of DNA-bound TetR(B) with MODELLER66 (the sequence homology between TetR(B) and TetR(D) is 66%). A short targeted MD simulation is then performed to pull the ligand-bound TetR(B) crystal structure (without ligand or Mg2+) toward the modeled DNA-bound TetR(B) conformation. The final structure after 20 ps of targeted MD is combined with the DNA 15-mer (sequence CCTATCAATGATAGA) from the DNA-bound TetR(D) crystal structure to establish the starting coordinates of the TetR(B)-DNA complex for subsequent simulations (see Fig. S1). The starting structure of apo TetR(B) is generated in the same way, with the apo TetR(D) crystal structure (PDB code: 1BJZ) as the template. To facilitate exploration of the broad conformational space of the apo state, two additional apo starting structures are generated by deleting the ligand-Mg2+ complex from the ligand-bound structure and the DNA from the DNA-bound TetR(B) structure, respectively. CHARMM-GUI67 is used to model the missing protein residues, to solvate the proteins in cubic TIP3P water68 boxes with a 15.0 Å edge distance under periodic boundary conditions (PBC), and to randomly place ions at 150 mM NaCl to neutralize the system and mimic physiological conditions.
Particle-mesh Ewald (PME) summation69 is used to calculate the electrostatic interactions. Van der Waals (vdW) interactions are truncated at a cutoff distance of 12 Å, with a switching function applied beyond 10 Å. The SHAKE algorithm is used in all simulations to constrain bonds involving hydrogen atoms.
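To make the 10 Å switch / 12 Å cutoff concrete, the sketch below evaluates a smooth switching function that scales vdW energies from full strength at the switch distance to zero at the cutoff. It assumes the polynomial form documented for OpenMM's NonbondedForce; CHARMM's own vdW switching options use different functional forms, and the function names and Lennard-Jones parameters here are illustrative only.

```python
import numpy as np


def switching_function(r, r_switch=10.0, r_cut=12.0):
    """Switch S(r): 1 for r <= r_switch, 0 for r >= r_cut (distances in Å).

    Assumed form: S = 1 - 6x^5 + 15x^4 - 10x^3 with
    x = (r - r_switch) / (r_cut - r_switch); S and dS/dr are continuous at both ends.
    """
    r = np.asarray(r, dtype=float)
    x = np.clip((r - r_switch) / (r_cut - r_switch), 0.0, 1.0)
    return 1.0 - 6.0 * x**5 + 15.0 * x**4 - 10.0 * x**3


def switched_lj(r, epsilon, sigma, r_switch=10.0, r_cut=12.0):
    """Lennard-Jones energy (units of epsilon) multiplied by the switching function."""
    r = np.asarray(r, dtype=float)
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6**2 - sr6) * switching_function(r, r_switch, r_cut)


# Full strength up to 10 Å, halfway at 11 Å, zero from 12 Å on.
print(switching_function([9.0, 11.0, 12.0, 13.0]))
```

The polynomial is chosen so that both the energy and the force go smoothly to zero at the cutoff, which avoids the energy drift that an abrupt truncation would introduce.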
Systems are energy minimized with the steepest descent (SD) and adopted basis Newton-Raphson (ABNR) algorithms, followed by 500 ps equilibration runs in the NVT (constant particle number, volume and temperature) ensemble with a time step of 1 fs in CHARMM,70 using the CHARMM36m force field for both the protein65 and DNA.71 During equilibration, weak harmonic restraints are applied to the heavy atoms of the protein backbone (force constant: 1 kcal/(mol Å²)) and side chains (force constant: 0.1 kcal/(mol Å²)). NPT (constant particle number, pressure and temperature) production runs are carried out with a time step of 2 fs in OpenMM (version 7.3)72 with GPU acceleration. All simulations are performed at 303.15 K and 1 bar. A Langevin integrator with a friction coefficient of 1 ps−1 and a MonteCarloBarostat with a pressure coupling frequency of 100 steps are used in the OpenMM production runs.
For the Anton 2 simulations,73 the equilibration runs are extended to 5 ns in CHARMM. The CHARMM topology and the final coordinates and velocities from equilibration are used to convert the system into an Anton 2-compatible format in VMD. The Anton 2 production runs are performed in the NPT ensemble. The multigrator scheme74 is used to allow separate updates of the thermostat, barostat, and Newtonian particle dynamics. The simulation temperature and pressure are controlled with a Nosé-Hoover thermostat75,76 and a Martyna-Tobias-Klein (MTK)77 barostat with isotropic scaling, respectively. The thermostat and barostat are updated every 24 and every 480 steps, respectively. Van der Waals interactions are calculated with a cutoff of 9 Å, and long-range electrostatic interactions are calculated using the u-series method based on Gaussian split Ewald.78 The reversible reference system propagator algorithm (RESPA)79 is used with a time step of 2.4 fs, evaluating the long-range non-bonded forces every 5 steps and the short-range non-bonded and bonded forces at every step. Frames are saved every 10 ps in the OpenMM runs and every 240 ps in the Anton 2 runs. Overall, combining independent simulations on local GPUs and on Anton 2, we collect 6 μs of trajectories for each of the ligand-bound and DNA-bound states (1 μs on local GPUs and 5 μs on Anton 2) and 53 μs for the apo state (3 independent 1 μs trajectories on local GPUs and 3 independent 15–20 μs trajectories on Anton 2).
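The multiple-time-step idea behind RESPA (slow forces applied less often than fast ones) can be illustrated on a toy one-dimensional system. The sketch below is a generic impulse (r-)RESPA scheme with hypothetical parameters in arbitrary units (a stiff "fast" spring and a soft "slow" spring, 5 inner steps per outer step, mirroring the 1:5 ratio above); it is not Anton 2's implementation, which also couples the thermostat and barostat through the multigrator scheme.

```python
def respa_step(x, v, m, dt_inner, n_inner, fast_force, slow_force):
    """One outer step of impulse RESPA for a single 1D particle.

    The slow force is applied as two half-kicks using the outer step
    dt_outer = n_inner * dt_inner; the fast force is integrated with
    n_inner velocity-Verlet sub-steps in between.
    """
    dt_outer = n_inner * dt_inner
    v += 0.5 * dt_outer * slow_force(x) / m
    for _ in range(n_inner):
        v += 0.5 * dt_inner * fast_force(x) / m
        x += dt_inner * v
        v += 0.5 * dt_inner * fast_force(x) / m
    v += 0.5 * dt_outer * slow_force(x) / m
    return x, v


def max_relative_energy_error(n_outer=2000, dt_inner=0.01, n_inner=5,
                              k_fast=100.0, k_slow=1.0, m=1.0):
    """Run the toy system and return the worst relative deviation of the total energy."""
    def fast(x):
        return -k_fast * x  # stiff spring, updated every inner step

    def slow(x):
        return -k_slow * x  # soft spring, updated once per outer step

    def energy(x, v):
        return 0.5 * m * v * v + 0.5 * (k_fast + k_slow) * x * x

    x, v = 1.0, 0.0
    e0 = energy(x, v)
    worst = 0.0
    for _ in range(n_outer):
        x, v = respa_step(x, v, m, dt_inner, n_inner, fast, slow)
        worst = max(worst, abs(energy(x, v) - e0) / e0)
    return worst


print(max_relative_energy_error())
```

Because the splitting is symplectic, the total energy oscillates around its initial value rather than drifting, provided the outer step stays well below half the period of the fastest motion.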
Critical DNA Binding Domain (DBD) order parameters
For the changes in the DBDs, previous structural and MD analyses40–43 highlighted several candidate structural parameters (Fig. 2a): the center-of-mass separation between the two DBDs, which controls the contacts between the recognition helices (α3) and the DNA; and the angle between α4 during the simulation and its reference orientation in the DNA-bound crystal structure, which describes the pendulum-like motion in the proposed induction mechanism.41,47 To compute the α4 angle, we define a vector connecting the centers of mass of the backbone atoms of K48 and H63. The normalized vector from the starting structure of DNA-bound TetR(B) serves as the reference; the dot product between the normalized vector in each MD frame and the reference vector is then converted to an angle. In addition, a scissor-like motion of the two DBDs has also been noted,42 and we describe this relative twisting motion by the angle between the long principal axes of the two DBDs. Specifically, the principal axes of the moment of inertia of the DBD residues are computed separately for each monomer in each frame, and the dot product between the two first principal axes is converted to the DBD twist angle.
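The two angle definitions above can be sketched in NumPy as follows. This is a minimal illustration, assuming that centers of mass and DBD coordinates/masses have already been extracted from each frame; the function names are hypothetical. Because eigenvectors have an arbitrary sign, the sketch orients each principal axis along the first-to-last atom direction of the selection; that convention is an assumption, not something stated in the text.

```python
import numpy as np


def angle_deg(u, v):
    """Angle in degrees between two vectors, via the normalized dot product."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    cos_theta = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return float(np.degrees(np.arccos(np.clip(cos_theta, -1.0, 1.0))))


def alpha4_angle(com_k48, com_h63, reference_vector):
    """α4 angle: K48 -> H63 backbone center-of-mass vector versus the reference vector."""
    vec = np.asarray(com_h63, dtype=float) - np.asarray(com_k48, dtype=float)
    return angle_deg(vec, reference_vector)


def long_principal_axis(coords, masses):
    """Unit vector along the long principal axis (smallest moment of inertia)."""
    x = np.asarray(coords, dtype=float)
    m = np.asarray(masses, dtype=float)
    r = x - np.average(x, axis=0, weights=m)  # coordinates relative to the center of mass
    # Inertia tensor: sum_i m_i (|r_i|^2 * I - r_i r_i^T)
    inertia = np.sum(m * np.sum(r**2, axis=1)) * np.eye(3) - np.einsum("i,ij,ik->jk", m, r, r)
    _, evecs = np.linalg.eigh(inertia)  # eigenvalues returned in ascending order
    axis = evecs[:, 0]
    if np.dot(axis, x[-1] - x[0]) < 0:  # assumed sign convention: first -> last atom
        axis = -axis
    return axis


def dbd_twist_deg(coords_a, masses_a, coords_b, masses_b):
    """Twist angle between the long principal axes of the two DBDs in one frame."""
    return angle_deg(long_principal_axis(coords_a, masses_a),
                     long_principal_axis(coords_b, masses_b))
```

Applying `alpha4_angle` and `dbd_twist_deg` frame by frame yields the time series whose histograms correspond to the distributions discussed next.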
Figure 2:

Structural parameters relevant to the structural transitions of the DBDs in TetR among different ligation states and their distributions in different MD ensembles. (a) A schematic diagram illustrating the structural parameters commonly discussed for the structural transitions of the DBDs during induction, including the center-of-mass separation between the two DBDs (DBD distance), the angle between the long principal axes of the two DBDs (DBD twist) and the angle between α4 during simulations and its reference orientation in the starting structure of the DNA-bound state (α4 angle). (b-d) The corresponding probability distributions of these structural parameters. The vertical dashed lines denote the values in the DNA-bound crystal structure.
The distributions of these parameters in the MD simulations of the three ligation states are compared in Fig. 2b–d, which illustrate that, for all these parameters, the distributions for the ligand-bound and DNA-bound states are indeed distinct, although there is a non-negligible overlap in all cases. For example, previous MD simulations on the order of 50–100 ns43 suggested that the separation between the two DBDs can be used to distinguish induced (ligand-bound) and non-induced (DNA-bound) conformations. Our simulations largely support this statement, since the ligand-bound and DNA-bound distributions peak at ~42 and ~37 Å, respectively. However, in our substantially longer MD simulations, the two distributions overlap in the 37–42 Å range. These overlaps suggest that the free energy changes along the individual structural parameters are small, which is inconsistent with the significantly different DNA binding affinities of the ligand- and DNA-bound states. In other words, the motions of the DBDs during induction are more complex; thus, a collective variable based on principal component analysis (Fig. S2) is more appropriate for capturing the relevant free energy profiles.80 Accordingly, principal component analysis is performed on the combined trajectories of the ligand-bound, DNA-bound, and apo states using only the coordinates of Cα atoms; using only the apo trajectories leads to similar results (Fig. S11). The motion along PC1 is visualized in VMD and PyMOL, represented by either arrows or a color scale (Fig. S2).
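A principal component analysis of Cartesian Cα coordinates can be sketched with a singular value decomposition, as below. This is a minimal illustration, assuming the pooled frames have already been superposed on a common reference (the superposition step is not shown); the function name is hypothetical and this is not necessarily the software used in the study.

```python
import numpy as np


def pca_cartesian(frames):
    """PCA of Cartesian Cα coordinates.

    frames: (n_frames, n_atoms, 3), assumed already superposed on a common reference
    (e.g. pooled ligand-bound, DNA-bound and apo frames).
    Returns variances (descending), component vectors (rows), and projections.
    """
    X = np.asarray(frames, dtype=float).reshape(len(frames), -1)
    X = X - X.mean(axis=0)                    # remove the mean structure
    _, s, vt = np.linalg.svd(X, full_matrices=False)
    variances = s**2 / (len(X) - 1)           # eigenvalues of the covariance matrix
    projections = X @ vt.T                    # column 0 is the PC1 coordinate
    return variances, vt, projections
```

Using the SVD of the centered coordinate matrix avoids explicitly forming the 3N × 3N covariance matrix, and the PC1 column of `projections` is the per-frame coordinate that can then be histogrammed for each ligation state.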
The apo state, which is substantially more flexible (Fig. S3), exhibits broader distributions of all three structural parameters, especially for the DBD-distance and α4-angle. As a result, the apo distributions overlap well with those from both ligand and DNA bound states, hinting at a conformational selection mechanism in TetR (see Results).
Critical Ligand Binding Domain (LBD) order parameters
Structural rearrangements in the ligand-binding pocket in different ligation states are generally subtle, so it is not straightforward to establish a set of geometrical parameters that describe them based on, for example, a comparison of static structures. Accordingly, we first identify the set of residues that are in contact with the ligand during extensive (~6 μs) MD simulations (Fig. 3b; for sample time traces, see Fig. S4); specifically, ligand-contacting residues are defined as protein residues with at least one heavy atom within 5 Å of any ligand heavy atom for at least 75% of the simulation time. Next, for all pair-wise distances among these residues, we compute the Jensen-Shannon (JS) divergence between the distance distributions in the ligand-bound and DNA-bound simulations; the pairs with the highest JS divergences are then identified and their distance distributions examined. Several examples are shown in Fig. 3c–d and Fig. S5. The two pairs that best distinguish the ligand- and DNA-bound ensembles are Q109 and E147’, whose distance characterizes the separation between α7 in one monomer and α8 in the other monomer, and T103 and F177’, whose distance characterizes the separation between l6 and α9 in the other monomer. As shown in Fig. 3c–d, their distributions exhibit minimal overlap, suggesting that they are meaningful collective variables for describing variations in the compactness of the LBD, which induce a series of conformational changes that propagate to the DBDs through the domain interface (vide infra). Similar to the situation for the DBDs shown in Fig. 2, the distributions of these parameters in the apo state are broad and overlap significantly with those of both the ligand- and DNA-bound states, again highlighting a conformational selection mechanism (see Results and Discussions).
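The contact criterion above (any heavy atom within 5 Å of any ligand heavy atom in at least 75% of frames) can be sketched in NumPy. The sketch assumes heavy-atom coordinates have already been loaded into arrays (for example with a trajectory library) and that each protein atom carries a residue identifier; the function name is hypothetical. Broadcasting builds an n_frames × n_protein_atoms × n_ligand_atoms array, so long trajectories would need to be processed in chunks.

```python
import numpy as np


def ligand_contacting_residues(prot_xyz, res_ids, lig_xyz, cutoff=5.0, min_fraction=0.75):
    """Select residues in contact with the ligand for at least `min_fraction` of frames.

    prot_xyz: (n_frames, n_prot_heavy, 3) protein heavy-atom coordinates (Å)
    res_ids:  (n_prot_heavy,) residue identifier of each protein heavy atom
    lig_xyz:  (n_frames, n_lig_heavy, 3) ligand heavy-atom coordinates (Å)
    Returns (selected residue IDs, dict of per-residue contact fractions).
    """
    prot = np.asarray(prot_xyz, dtype=float)
    lig = np.asarray(lig_xyz, dtype=float)
    res_ids = np.asarray(res_ids)
    # Distance from each protein atom to its nearest ligand atom, per frame.
    d_min = np.linalg.norm(prot[:, :, None, :] - lig[:, None, :, :], axis=-1).min(axis=2)
    atom_in_contact = d_min <= cutoff  # (n_frames, n_prot_heavy)
    fractions = {}
    for rid in np.unique(res_ids):
        # A residue is in contact in a frame if any of its heavy atoms is.
        residue_in_contact = atom_in_contact[:, res_ids == rid].any(axis=1)
        fractions[rid.item()] = float(residue_in_contact.mean())
    selected = [rid for rid, frac in fractions.items() if frac >= min_fraction]
    return selected, fractions
```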
Figure 3:

Structural properties of the LBD in TetR. (a) Chemical structure of the ligand. (b) The list of ligand-contacting residues (see text for the selection criterion), which is identical for the two monomers, includes α4: L60, H64; α4 – α5 linker (l4): F67; α5: N82, F86; α6: H100; α6 – α7 linker (l6): T103, R104, P105; α7: Q109, T112, L113, Q116, L117; α8: L131, L134, S135, E147’; α9: I174’, F177’. The side chains and the ligand are shown in licorice; the Mg2+ ion is shown as a pink sphere. (c-d) Representative distributions of pair-wise distances of the ligand-contacting residues identified as those that feature the largest JS divergence values between multi-μs ligand- and DNA-bound simulations; they suggest distinct compactness of the LBD among the ligand-bound, DNA-bound, and apo states. The difference is captured by (c) the distance between α7 and α8’ (represented by the Cα distance between Q109 and E147’); and (d) the distance between l6 and α9’ (represented by the Cα distance between T103 and F177’). More inter-helical or loop-helical distances that characterize the compactness of the ligand-binding site are included in Fig. S5.
Free energy landscape projection
Two-dimensional (2D) free energy landscapes are generated by first constructing a 2D histogram along the selected collective variables and then converting the probability distribution to free energy via ΔF(V1,V2) = −[ln p(V1,V2) − ln pmax], where p(V1,V2) is the joint probability density estimated from the 2D histogram of the data (V1,V2) and pmax is its maximum value; subtracting ln pmax ensures that ΔF = 0 at the free energy minimum. One-dimensional (1D) free energy profiles are constructed analogously as ΔF(V1) = −[ln p(V1) − ln pmax]. Note that these equations give reduced free energy differences in units of kBT, where kB is the Boltzmann constant and T is the absolute temperature. For the apo state, the free energy landscape is obtained by collecting statistics from independent trajectories with different starting structures (see above). Examination of the distributions of key collective variables confirms substantial overlap among these independent apo simulations (Figs. S6–S7), which supports combining these trajectories for meaningful free energy analyses.
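The histogram-to-free-energy conversion described above can be sketched with NumPy as follows. It is a minimal illustration; the bin count is an arbitrary choice here, and empty bins (p = 0) are returned as +inf rather than being smoothed or filled.

```python
import numpy as np

KB_KCAL = 1.987204e-3  # Boltzmann constant in kcal/(mol K)


def free_energy_2d(v1, v2, bins=50):
    """Reduced free energy -[ln p(V1,V2) - ln p_max] (units of kBT) from a 2D histogram."""
    p, edges1, edges2 = np.histogram2d(v1, v2, bins=bins, density=True)
    with np.errstate(divide="ignore"):  # log(0) -> -inf, so empty bins become +inf
        dF = -(np.log(p) - np.log(p.max()))
    return dF, edges1, edges2


def free_energy_1d(v1, bins=50):
    """1D analogue: -[ln p(V1) - ln p_max]."""
    p, edges = np.histogram(v1, bins=bins, density=True)
    with np.errstate(divide="ignore"):
        dF = -(np.log(p) - np.log(p.max()))
    return dF, edges


# Converting to kcal/mol at 303.15 K: multiply dF by KB_KCAL * 303.15 (about 0.60).
```

Because only the ratio p/pmax enters, the histogram normalization cancels, and the most populated bin is zero by construction.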
RMSD and RMSF
Backbone atoms in the DNA-bound or apo states are aligned against those in the ligand-bound state. The coordinates of the backbone atoms of each residue are used to compute the per-residue root-mean-square deviation (RMSD) between two crystal structures. Only Cα atoms are included in the root-mean-square fluctuation (RMSF) calculation for each residue, relative to the average structure.
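The superposition, per-residue RMSD and RMSF steps can be sketched as below. The text does not name the superposition routine, so the Kabsch algorithm used here is an assumption (it is the standard least-squares choice), and the function names are hypothetical.

```python
import numpy as np


def kabsch_superpose(mobile, reference):
    """Rigidly superpose `mobile` onto `reference` (both (n_atoms, 3)) with the Kabsch algorithm."""
    P = np.asarray(mobile, dtype=float)
    Q = np.asarray(reference, dtype=float)
    p0, q0 = P.mean(axis=0), Q.mean(axis=0)
    H = (P - p0).T @ (Q - q0)  # 3x3 cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0  # exclude improper rotations
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return (P - p0) @ R.T + q0


def per_residue_rmsd(coords, reference, res_ids):
    """RMSD per residue between two already-superposed structures (e.g. backbone atoms)."""
    diff2 = np.sum((np.asarray(coords, float) - np.asarray(reference, float)) ** 2, axis=1)
    res_ids = np.asarray(res_ids)
    return {r.item(): float(np.sqrt(diff2[res_ids == r].mean())) for r in np.unique(res_ids)}


def rmsf(frames):
    """Per-atom RMSF about the average structure; frames: (n_frames, n_atoms, 3), superposed."""
    x = np.asarray(frames, dtype=float)
    return np.sqrt(np.mean(np.sum((x - x.mean(axis=0)) ** 2, axis=2), axis=0))
```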
SAXS profiles calculations
The small-angle X-ray scattering (SAXS) profiles of the ligand-bound and apo states are computed from MD trajectories with the FoXS method.81 Profiles of the cluster centroids are computed and then reweighted with the minimal ensemble search (MES) method to best fit the experimental SAXS profiles.48
Contact probability calculations
A contact between two protein residues is considered formed in a given frame if the distance between their α carbon (Cα) atoms is within a 5 Å cutoff. The contact probability is the fraction of frames in which a given contact exists.
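The contact-probability definition translates directly into a few lines of NumPy. This sketch assumes Cα coordinates are already available as an array (the function name is hypothetical); the diagonal is trivially 1, and for long trajectories the broadcasted distance array should be built in chunks of frames.

```python
import numpy as np


def contact_probability(ca_xyz, cutoff=5.0):
    """Fraction of frames in which each Cα-Cα distance is within `cutoff` (Å).

    ca_xyz: (n_frames, n_res, 3). Returns an (n_res, n_res) matrix.
    """
    x = np.asarray(ca_xyz, dtype=float)
    d = np.linalg.norm(x[:, :, None, :] - x[:, None, :, :], axis=-1)  # (n_frames, n_res, n_res)
    return (d <= cutoff).mean(axis=0)
```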
Analysis of JS divergence in dihedral space
Distributions of the backbone (ϕ, ψ) and side-chain (χ1) dihedral angles are calculated for the ligand-bound, DNA-bound, and apo states. To compare the distributions of the same dihedral angle between different states, the Jensen-Shannon (JS) divergence is computed; it is symmetric and, with base-2 logarithms, bounded between 0 and 1, which facilitates comparison. A larger JS divergence indicates a larger difference between the two distributions under comparison. The JS divergence between two distributions P and Q is defined as
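$$ J(P \,\|\, Q) = \tfrac{1}{2} D(P \,\|\, M) + \tfrac{1}{2} D(Q \,\|\, M) \qquad (1) $$
$$ D(P \,\|\, Q) = \sum_{i} P(i) \log_2 \frac{P(i)}{Q(i)} \qquad (2) $$
where M = ½(P + Q) is the average distribution, and J and D denote the JS divergence and the Kullback-Leibler divergence, respectively.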
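The JS comparison of dihedral distributions can be sketched as below, using J = ½D(P‖M) + ½D(Q‖M) with M = (P + Q)/2 and base-2 logarithms so that the result lies in [0, 1]. The 10° bin width and the wrapping of angles into [−180°, 180°) are assumptions for illustration; the function names are hypothetical.

```python
import numpy as np


def _kl_base2(p, q):
    """Kullback-Leibler divergence D(p || q) in bits; terms with p = 0 contribute zero."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log2(p[mask] / q[mask])))


def js_divergence(p, q):
    """Jensen-Shannon divergence between two histograms (normalized here)."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p, q = p / p.sum(), q / q.sum()
    m = 0.5 * (p + q)  # average distribution; m > 0 wherever p > 0 or q > 0
    return 0.5 * _kl_base2(p, m) + 0.5 * _kl_base2(q, m)


def _wrap_deg(angles):
    """Map angles (degrees) into [-180, 180)."""
    return (np.asarray(angles, dtype=float) + 180.0) % 360.0 - 180.0


def dihedral_js(angles_a, angles_b, n_bins=36):
    """JS divergence between two dihedral-angle samples histogrammed on shared bins."""
    edges = np.linspace(-180.0, 180.0, n_bins + 1)
    ha, _ = np.histogram(_wrap_deg(angles_a), bins=edges)
    hb, _ = np.histogram(_wrap_deg(angles_b), bins=edges)
    return js_divergence(ha, hb)
```

Histogramming both states on the same bins is essential; otherwise the bin-wise ratios in the Kullback-Leibler terms would compare mismatched intervals.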
Covariance and dynamical network analysis
The Pearson correlation coefficients between the Cartesian coordinates of the Cα atoms of each residue pair are computed as in our previous study.52 The mutual information between the Cartesian coordinates of the Cα atoms is also computed to capture non-linear correlations.82
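A common way to compute residue-residue correlations from Cα coordinates is the dynamic cross-correlation form C_ij = ⟨Δr_i·Δr_j⟩ / √(⟨|Δr_i|²⟩⟨|Δr_j|²⟩), as implemented, for example, by Bio3D's `dccm`. The NumPy sketch below assumes this form and pre-superposed frames; whether it matches the exact definition of the previous study is an assumption, and the function name is hypothetical.

```python
import numpy as np


def cross_correlation(ca_xyz):
    """Dynamic cross-correlation matrix of Cα fluctuations.

    ca_xyz: (n_frames, n_res, 3), frames already superposed.
    Atoms with zero fluctuation would give a division by zero.
    """
    x = np.asarray(ca_xyz, dtype=float)
    dr = x - x.mean(axis=0)                               # displacements from the mean
    cov = np.einsum("fik,fjk->ij", dr, dr) / len(x)       # <Δr_i · Δr_j>
    norm = np.sqrt(np.diag(cov))
    return cov / np.outer(norm, norm)
```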
A dynamical network for the apo state is constructed with the Bio3D package83 based on the Pearson correlation coefficients. Self-correlations and correlations between nearest neighbors are excluded. Nodes in the network represent protein residues and are connected by edges. The weight (“length”) of the edge between nodes i and j is wij = −ln cij, where cij is the Pearson correlation coefficient between i and j. The shortest path between two nodes, i.e., the path that minimizes the total length between them, is found with the Floyd-Warshall algorithm. Suboptimal path analysis is also performed in Bio3D. The source residues are the DNA-binding residues (residue IDs 2 to 47) and the sink residues are the ligand-contacting residues described above. Ten suboptimal paths, including the shortest path, are found for each pair of source and sink residues. For each suboptimal path, the residues between the source and sink residues are recorded, and the occurrences of each residue are counted. The 50 residues with the highest occurrence counts are selected as hub residues.
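The edge-length definition and the Floyd-Warshall search can be sketched in NumPy as follows. This is an illustrative re-implementation, not Bio3D's code: it takes |cij| (an assumption, since −ln cij is undefined for negative correlations), drops near-zero correlations as missing edges, and recovers only the shortest path, not the suboptimal paths. All function names are hypothetical.

```python
import numpy as np


def edge_lengths(corr, exclude_neighbors=True, min_abs_corr=1e-6):
    """Edge lengths w_ij = -ln|c_ij| from a residue correlation matrix.

    Pairs without an edge (near-zero correlation, or sequence neighbors i, i+1
    when exclude_neighbors is True) get +inf; the diagonal is set to 0.
    """
    c = np.abs(np.asarray(corr, dtype=float))
    n = len(c)
    w = np.full((n, n), np.inf)
    mask = c > min_abs_corr
    w[mask] = -np.log(c[mask])
    idx = np.arange(n)
    if exclude_neighbors:
        w[idx[:-1], idx[1:]] = np.inf
        w[idx[1:], idx[:-1]] = np.inf
    w[idx, idx] = 0.0
    return w


def floyd_warshall(w):
    """All-pairs shortest path lengths plus a successor matrix for path recovery."""
    n = len(w)
    dist = np.array(w, dtype=float)
    nxt = np.where(np.isfinite(dist), np.arange(n)[None, :], -1)
    for k in range(n):
        via_k = dist[:, k:k + 1] + dist[k:k + 1, :]   # length of i -> k -> j
        better = via_k < dist
        dist = np.where(better, via_k, dist)
        nxt = np.where(better, nxt[:, k:k + 1], nxt)  # first hop from i toward k
    return dist, nxt


def shortest_path(nxt, i, j):
    """Residue indices along the shortest path from i to j ([] if unreachable)."""
    if nxt[i, j] < 0:
        return []
    path = [i]
    while i != j:
        i = int(nxt[i, j])
        path.append(i)
    return path
```

Strongly correlated pairs have short edges, so the shortest path preferentially threads through chains of highly correlated residues; counting the intermediate residues over many such paths is what identifies hubs.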
Results
In this section, we first present computed free energy landscapes for the three ligation states (apo, ligand-bound and DNA-bound) to establish the energetic basis for the anti-cooperativity between the ligand-binding and DNA-binding sites. To gain further mechanistic insights into residues that dictate the anti-cooperativity, we conduct systematic comparisons for these structural ensembles, followed by the analysis of motional correlations among residues.
Computed free energy landscapes reveal a conformational selection mechanism for TetR
With the extensive sampling (6–50 μs) conducted here, we construct free energy landscapes by projecting the MD ensembles onto two collective variables that represent structural features of the LBDs and DBDs, respectively (see Computational Methods). In this way, the energetic coupling between these distant sites is explicitly visualized. For the DBDs, the first principal component (PC1) is used to describe their collective structural transitions (see Fig. S2). For the LBDs, we explore different pair-wise distances among the ligand-contacting residues to characterize the compactness of the binding pocket. Free energy landscapes with different combinations of collective variables are shown in Fig. 4 and Fig. S8.
Figure 4:

Computed free energy landscapes for the three ligation states of TetR. The overlay of two-dimensional free energy landscapes for the ligand-bound (grey contour), DNA-bound (brown contour), and apo (green contour) states; individual free energy maps are shown in Fig. S9. Color bars are in the unit of kBT. In all landscapes, the horizontal axis is the first principal component (PC1) and the vertical axis in (a) is the α7 – α8’ distance, and in (b) is the l6-α9’ distance. PC1 mainly reflects the large-amplitude movement of the DBDs, while the inter-helical or helix-loop distances characterize the structural changes in the LBDs. Along both directions, the distributions of the bound states are clearly separated and the apo state ensemble features a broad landscape with multiple basins.
The gross features of the free energy landscapes are not sensitive to the specific distance used to characterize the LBDs. With either the ligand or DNA bound, the system samples a limited region in the space of the collective variables, leading to clearly defined free energy basins. There is minimal overlap between the basins that represent the ligand- and DNA-bound conformations, suggesting a significant free energy penalty for adjusting the DBD conformation from the ligand-bound basin to one that allows favorable DNA binding (see also Fig. S10a for an explicit 1D free energy projection along PC1), which is consistent with the induction function of TetR. The better separation of the ligand- and DNA-bound states along PC1 compared to the structural parameters discussed in Fig. 2 highlights the complex motions of the DBDs during induction, which involve displacements of their centers of mass as well as a relative twist of the two domains. The widths of the bound basins appear to suggest that the DNA-bound state is more flexible than the ligand-bound form, especially along PC1. This difference, however, is due mainly to the minor population observed at the longer DBD separation in the DNA-bound simulations (Fig. 2b). Overall, computed RMSF values (Fig. S3) suggest that both bound states have similar degrees of flexibility, which is best illustrated by the RMSF difference (dRMSF) mapped onto the protein structure in Fig. S3e.
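A 1D projection of the kind shown in Fig. S10a can be sketched as follows. We assume that frames are superimposed on a common reference and that, when comparing ligation states, the ensembles are pooled before the principal component analysis so that every state is projected onto the same PC1 axis; this is our assumption about the protocol. The helper names are hypothetical, and the SVD-based PCA is a standard construction rather than the exact implementation used here.

```python
import numpy as np

def fit_pc1(coords):
    """First principal axis of Cartesian fluctuations.

    coords: (n_frames, n_atoms, 3), already superimposed. Returns (mean, axis, explained_fraction).
    """
    X = coords.reshape(len(coords), -1)
    mean = X.mean(axis=0)
    _, S, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[0], S[0] ** 2 / np.sum(S ** 2)

def project(coords, mean, axis):
    """PC1 coordinate of each frame (fit once, then project each state separately)."""
    return (coords.reshape(len(coords), -1) - mean) @ axis

def free_energy_1d(values, bins=60, kT=1.0):
    """F(x) = -kT ln P(x) on histogram bin centers; unsampled bins are +inf."""
    hist, edges = np.histogram(values, bins=bins, density=True)
    with np.errstate(divide="ignore"):
        F = -kT * np.log(hist)
    F -= F[np.isfinite(F)].min()
    return 0.5 * (edges[:-1] + edges[1:]), F

# Usage on synthetic data dominated by a single collective motion along `direction`
rng = np.random.default_rng(1)
direction = rng.normal(size=(10, 3))
t = 5.0 * rng.normal(size=500)
coords = t[:, None, None] * direction + 0.1 * rng.normal(size=(500, 10, 3))
mean, axis, explained = fit_pc1(coords)
pc1 = project(coords, mean, axis)
centers, F = free_energy_1d(pc1)
```

The sign of a principal axis is arbitrary, so PC1 values from separate fits are only comparable if the same `mean` and `axis` are reused.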
In the apo state, regardless of the collective variables used, the free energy landscape is broad and overlaps well with both the ligand- and DNA-bound basins (Fig. 4). In the two-dimensional space, the apo landscape has a diagonal (or anti-diagonal, depending on the specific distance used for the vertical axis) shape, which is a clear manifestation of the coupling between the compactness of the ligand-binding site and the motions of the DNA-binding domains; in the absence of strong coupling, a rectangular landscape would be expected instead. Therefore, the computed free energy landscapes clearly indicate a conformational selection mechanism49 for the allostery that underlies the induction function of TetR. The apo state is inherently flexible and samples a broad set of conformations; however, the ligand- and DNA-binding sites are strongly coupled even in the apo state, so that at any given time only one of them adopts a configuration favorable for binding, which is in turn stabilized by the actual binding event. The dominant conformational state of TetR therefore depends on the concentrations of the ligand and DNA.
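The distinction between a diagonal and a rectangular landscape can be made quantitative. The sketch below is an illustration we add, not an analysis reported in this work: it estimates the mutual information between the two collective variables from a 2D histogram. The estimate vanishes when P(x, y) = P(x)P(y), i.e., when F(x, y) = F(x) + F(y) and the landscape is rectangular, and grows as the two variables become coupled. Histogram estimates carry a small positive bias for finite samples, so only large differences should be interpreted; the function name and bin count are assumptions.

```python
import numpy as np

def cv_mutual_information(cv1, cv2, bins=40):
    """Mutual information (nats) between two collective variables, from a 2D histogram."""
    joint, _, _ = np.histogram2d(cv1, cv2, bins=bins)
    P = joint / joint.sum()
    px = P.sum(axis=1, keepdims=True)   # marginal of cv1, shape (bins, 1)
    py = P.sum(axis=0, keepdims=True)   # marginal of cv2, shape (1, bins)
    independent = px @ py               # P(x)P(y): the 'rectangular' reference
    mask = P > 0                        # 0 * log 0 = 0; marginals are > 0 wherever P > 0
    return float(np.sum(P[mask] * np.log(P[mask] / independent[mask])))

rng = np.random.default_rng(2)
x = rng.normal(size=50000)
uncoupled = cv_mutual_information(x, rng.normal(size=50000))          # close to 0
coupled = cv_mutual_information(x, -x + 0.2 * rng.normal(size=50000))  # clearly > 0
```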
Comparisons of structural ensembles and residues that undergo local changes
To shed further light on the molecular mechanism that governs the coupling between the two binding sites, we compare the structural ensembles of the three ligation states, since residues that undergo a change in their local environment are likely to play a key role in allostery. Specifically, we compare the three ligation states from several perspectives: thermally averaged conformations, residue contact probabilities and the distributions of key dihedral angles (backbone ϕ, ψ and sidechain χ1). Such a multi-faceted analysis is particularly important for systems such as TetR, which exhibit relatively modest structural variations among ligation states.
In terms of thermally averaged conformations, the per-residue RMSDs between the apo and ligand-bound states are relatively small across the entire protein (Fig. 5a), with larger values observed only in the DBDs, residues close to the ligand, and some residues at the interface between the LBDs and DBDs (e.g., the N-terminus of α4, l6, and the bottom of α8). While it is not surprising that the ligand-contacting region adopts somewhat different average configurations once the ligand·Mg²⁺ complex is removed, the structural transitions at the bottom of α8, α4 and the DBDs in the apo state clearly highlight that these regions are tightly coupled to the ligand binding site. Indeed, when the ligand-bound and DNA-bound structural ensembles are compared (Fig. 5b), substantial differences of ~4–7 Å are seen in the same general regions that exhibit major variations between the apo and ligand-bound states. The observation that similar structural motifs respond to ligand and DNA binding highlights the coupling between the two binding sites that is inherent to the structure.
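For reference, a per-residue RMSD between two thermally averaged structures can be computed as in the minimal sketch below. It assumes that the two average structures share the same atom ordering and that a single global least-squares (Kabsch) superposition is applied; which atoms enter the fit and the per-residue average (backbone or all heavy atoms) are assumptions not specified in the text, and the helper names are hypothetical.

```python
import numpy as np

def kabsch_superimpose(mobile, target):
    """Return `mobile` (N, 3) after the rotation + translation that best fits `target`."""
    mc, tc = mobile.mean(axis=0), target.mean(axis=0)
    A, B = mobile - mc, target - tc
    U, _, Vt = np.linalg.svd(A.T @ B)
    d = np.sign(np.linalg.det(Vt.T @ U.T))      # avoid an improper rotation (reflection)
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return A @ R.T + tc

def per_residue_rmsd(avg_a, avg_b, residue_ids):
    """{residue: RMSD in input units} between two average structures with matched atoms."""
    sq = np.sum((kabsch_superimpose(avg_a, avg_b) - avg_b) ** 2, axis=1)
    residue_ids = np.asarray(residue_ids)
    return {int(r): float(np.sqrt(sq[residue_ids == r].mean()))
            for r in np.unique(residue_ids)}

# Usage: a rigidly rotated and translated copy should give ~0 for every residue
rng = np.random.default_rng(3)
target = rng.normal(scale=5.0, size=(12, 3))
theta = 0.7
Rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta), np.cos(theta), 0.0],
               [0.0, 0.0, 1.0]])
mobile = target @ Rz.T + np.array([1.0, 2.0, 3.0])
rmsd = per_residue_rmsd(mobile, target, [0] * 4 + [1] * 4 + [2] * 4)
```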
Figure 5:

Comparison of average structures of different ligation states of TetR. Per-residue RMSD (in Å) between the ligand-bound state and (a) the apo state or (b) the DNA-bound state. In both comparisons, larger structural deviations occur in the DBDs, residues close to the ligand, and some residues at the interface between the LBDs and DBDs (e.g., the N-terminus of α4, l6, and the bottom of α8). Note that computed SAXS profiles for the apo and ligand-bound states show good agreement with experimental data (Fig. S12), supporting the validity of our MD ensembles.
To further examine the physical nature of the structural differences, we compute residue–residue contact probabilities and evaluate their differences between ligation states. Overall, ligand binding results in new contacts around the ligand binding pocket and near the interface between the LBDs and DBDs, while DNA binding mainly affects the contacts in the DBDs and at the LBD/DBD interface. For example, the DNA-bound state lacks many inter-monomer contacts present in the ligand-bound state (upper left quadrant of Fig. S13b), while the ligand-bound state loses many intra-monomer contacts (e.g., in α6, due to unwinding of its last turn) compared to the DNA-bound state (upper right and lower left quadrants of Fig. S13b).
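A contact probability is simply the fraction of frames in which two residues are within a distance cutoff. The minimal sketch below assumes that per-frame minimal heavy-atom distances between residues have already been computed; the 4.5 Å cutoff and the helper names are assumptions for illustration, not necessarily the criterion used in this work.

```python
import numpy as np

def contact_probability(min_dist, cutoff=4.5):
    """Fraction of frames in which each residue pair is in contact.

    min_dist: (n_frames, n_res, n_res) minimal heavy-atom distances in Å.
    cutoff: contact threshold in Å (an assumed value for illustration).
    """
    return (min_dist < cutoff).mean(axis=0)

def top_contact_changes(p_a, p_b, n_top=10):
    """Residue pairs (i < j) ranked by |p_a - p_b|, returned as (i, j, p_a - p_b)."""
    diff = p_a - p_b
    i, j = np.triu_indices_from(diff, k=1)
    order = np.argsort(-np.abs(diff[i, j]), kind="stable")[:n_top]
    return [(int(i[k]), int(j[k]), float(diff[i[k], j[k]])) for k in order]

# Toy example: residues 0 and 2 are always in contact in state A and never in state B
d_a = np.full((4, 3, 3), 10.0)
d_a[:, 0, 2] = d_a[:, 2, 0] = 3.0
d_b = np.full((4, 3, 3), 10.0)
changes = top_contact_changes(contact_probability(d_a), contact_probability(d_b), n_top=1)
```

Ranking pairs by the absolute change in probability is one way to arrive at the "top contact probability changes" discussed next.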
In particular, the largest contact probability changes consistently point to a set of polar interactions (Fig. 6a–b) that govern the coupling among α9′ (D178′), α6–l6 (G102, T103, R104), the bottom of α8′ (E147′, E157′ and R158′) and α4 (R49, D53). For example, upon ligand binding, which stabilizes a more compact binding site, R104 is able to interact with D178′ in α9′ and therefore no longer engages with E157′ in α8′. In turn, these changes alter the interactions involving α8′ and α4; for example, following the reorientation of R104, E157′ becomes more available to interact with R49 in α4 (Fig. 6c–e). Similarly, T103 is hydrogen bonded to D53 in the DNA-bound state; upon ligand binding, which modifies the orientation of E147′ through Mg²⁺ (see Fig. S4), T103 switches to interact with E147′, leaving D53 to interact with R158′ instead (Fig. 6f–h). These polar interactions undergo substantial fluctuations in the apo state, some of which (e.g., R104–E157′) appear particularly correlated with key parameters such as the α4 angle (Fig. S14). In short, rearrangements of these polar interactions help propagate changes due to ligand binding to the interface between α4 and the DBDs, thus making a major contribution to the coupling between the two binding sites, as evidenced by the computed free energy landscapes shown in Fig. 4.
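The distance traces in Fig. 6c–h report, at each time point, the minimal hydrogen-bonding distance over all protomer combinations for a given residue pair. A minimal sketch of that bookkeeping is given below; the helper names, the choice of which atoms count as hydrogen-bonding partners, and the 3.5 Å occupancy cutoff are assumptions for illustration.

```python
import numpy as np

def min_pair_distance(atoms_a, atoms_b):
    """Per-frame minimal distance between any atom of group A and any atom of group B.

    atoms_a: (n_frames, n_a, 3); atoms_b: (n_frames, n_b, 3), e.g. candidate donor and
    acceptor atoms of two residues taken from a given pair of protomers.
    """
    diff = atoms_a[:, :, None, :] - atoms_b[:, None, :, :]   # (n_frames, n_a, n_b, 3)
    return np.linalg.norm(diff, axis=-1).min(axis=(1, 2))

def min_over_combinations(pairs):
    """Minimal distance at each frame over all protomer combinations.

    pairs: list of (atoms_a, atoms_b), one entry per protomer combination
    (e.g. R104 of chain A with E157 of chain B, and R104 of chain B with E157 of chain A).
    """
    return np.min([min_pair_distance(a, b) for a, b in pairs], axis=0)

def occupancy(distances, cutoff=3.5):
    """Fraction of frames in which the interaction is formed (cutoff in Å, assumed)."""
    return float(np.mean(distances < cutoff))

# Toy example: two protomer combinations over five frames
frames = 5
a1, b1 = np.zeros((frames, 1, 3)), np.zeros((frames, 2, 3))
b1[:, 0, 0], b1[:, 1, 0] = 4.0, 6.0
a2, b2 = np.zeros((frames, 1, 3)), np.zeros((frames, 1, 3))
b2[:, 0, 0] = [2.0, 5.0, 2.0, 5.0, 2.0]
d = min_over_combinations([(a1, b1), (a2, b2)])   # [2, 4, 2, 4, 2]
```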
Figure 6:

Comparison of residue contact probabilities between the ligand-bound and DNA-bound states of TetR reveals a set of polar residues near the LBD/DBD interface that undergo structural rearrangements and couple changes in the ligand binding sites to the DBDs. Rearrangements of polar interactions (dotted lines) among these residues in (a) the ligand-bound state and (b) the DNA-bound state, along with the steric interactions discussed in previous work,40,47 form the basis for the anti-cooperativity between ligand and DNA binding. The time evolution of the minimal key polar distances (in Å) during the MD simulations is shown for (c) R104–E157′, (d) R104–D178′, (e) E157′–R49, (f) T103–D53, (g) T103–E147′ and (h) R158′–D53. At each time point, the minimal hydrogen-bonding distance over all possible protomer combinations is shown for each pair.
While contact probability analysis is best at revealing rearrangements in the relative positions and orientations of structural motifs, internal conformational changes of these motifs and the hinge residues that mediate their relative displacements are better captured by dihedral angle variations. Therefore, we also compare ϕ, ψ and χ1 angles in the three ligation states of TetR; χ1 angles are included because they may point to major sidechain reorientations relevant to the formation of important polar or nonpolar interactions either within TetR or between TetR and DNA (e.g., Lys48, ref. 42). Specifically, we compare both differences between static structures (Fig. S15) and differences between the distributions sampled in the MD ensembles; for the latter, the difference between the distributions of each dihedral in two ligation states is quantified by their Jensen-Shannon (JS) divergence.
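A minimal sketch of the JS divergence between two sampled distributions of one dihedral angle is shown below. It assumes histograms on a periodic 10° grid and base-2 logarithms, so that the divergence lies between 0 (identical distributions) and 1 (non-overlapping distributions); the bin width, logarithm base and function names are assumptions and may differ from those used in this work.

```python
import numpy as np

def wrap_degrees(angles):
    """Map angles onto [-180, 180) so that periodic images fall in the same bin."""
    return (np.asarray(angles, dtype=float) + 180.0) % 360.0 - 180.0

def dihedral_js_divergence(angles_a, angles_b, n_bins=36):
    """Jensen-Shannon divergence (bits) between two samples of one dihedral angle (degrees)."""
    edges = np.linspace(-180.0, 180.0, n_bins + 1)
    p, _ = np.histogram(wrap_degrees(angles_a), bins=edges)
    q, _ = np.histogram(wrap_degrees(angles_b), bins=edges)
    p = p / p.sum()
    q = q / q.sum()
    m = 0.5 * (p + q)

    def kl(x, y):
        nz = x > 0               # 0 * log 0 = 0; y > 0 wherever x > 0 because y = m
        return np.sum(x[nz] * np.log2(x[nz] / y[nz]))

    return float(0.5 * kl(p, m) + 0.5 * kl(q, m))

# Usage: a complete rotamer switch versus the same angle written with a 360° offset
switched = dihedral_js_divergence([-60.0] * 100, [60.0] * 100)   # 1 bit
periodic = dihedral_js_divergence([179.0] * 50, [-181.0] * 50)   # 0 bits
```

Wrapping the angles before binning matters for χ1 in particular, where the same rotamer may be reported near +180° or −180°.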
With static structures, as shown in Fig. S16a–c, very few residues exhibit significant differences in backbone dihedral angles in either the apo/ligand-bound or the DNA-bound/ligand-bound comparison, while many more sidechains exhibit large deviations in χ1 angles. With thermal fluctuations included in the JS divergence analysis (Fig. S16b–d), the key trends in the backbone ϕ-ψ differences remain largely the same and implicate only a modest number of residues. This observation highlights that the conformational transitions among the three ligation states mainly involve rigid-body motions of structural motifs, so that only hinge residues experience large changes in backbone dihedrals. For example, for the comparison between the ligand-bound and DNA-bound ensembles (Fig. 7a–b), residues with large JS divergence in backbone dihedrals are located near the ends of α4 (R49), α7 (e.g., N109), α8 (e.g., R158) and α9 (e.g., G181), which are either at the domain interface or near the ligand binding pocket. In addition, large JS divergence is also observed for residues in α3 (e.g., H44) and α6–l6 (e.g., L101, G102, T103); the latter region was shown to undergo a transition from α-helix to β-turn upon ligand binding.38,41 For the sidechain χ1 angles, the number of residues with a large JS divergence is substantially smaller than in the static structure analysis (Fig. S16f), and these residues are located mainly in the DBDs, α4, α7, α9 and l6 (see Fig. 7c). As shown in Fig. 7d, residues that exhibit consistently large dihedral changes in both static and dynamic analyses are found throughout the protein, although many appear to cluster near the ligand binding site or the dimer interface.
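The selection rule stated in the Figure 7 caption (both the static and the dynamic difference of at least one dihedral angle ranking within the top 50) can be written compactly. In this sketch the input format, in which each dihedral type maps residues to a score such as |Δangle| for static structures or the JS divergence for ensembles, is our assumption; static χ1 differences are assumed to already account for periodicity, and the scores below are toy numbers, not data from this work.

```python
def consistent_top_residues(static_scores, dynamic_scores, top_n=50):
    """Residues ranking in the top `top_n` in both analyses for at least one dihedral type.

    static_scores, dynamic_scores: {dihedral: {residue: score}}, dihedral in ('phi', 'psi', 'chi1').
    """
    def top(scores):
        return set(sorted(scores, key=scores.get, reverse=True)[:top_n])

    selected = set()
    for dihedral, static in static_scores.items():
        selected |= top(static) & top(dynamic_scores[dihedral])
    return selected

# Toy scores for six hypothetical residues
static = {"phi": {1: 40.0, 2: 35.0, 3: 2.0}, "chi1": {4: 120.0, 5: 90.0, 6: 5.0}}
dynamic = {"phi": {1: 0.6, 2: 0.1, 3: 0.5}, "chi1": {4: 0.2, 5: 0.7, 6: 0.4}}
print(sorted(consistent_top_residues(static, dynamic, top_n=2)))  # [1, 5]
```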
Figure 7:

Dihedral angle comparison between the ligand-bound and DNA-bound states of TetR. (a–c) Mapping of the Jensen-Shannon (JS) divergence of dihedral angle distributions (a: ϕ, b: ψ, c: χ1) based on multiple μs-long simulations. Most distributions do not show significant deviations, and large JS divergences are mostly observed at the ends of helices or in loops, suggesting that most of the structural differences are rigid-body in nature. (d) Residues in red are those for which both the static (Fig. S15) and the dynamic differences of at least one dihedral angle rank within the top 50. For the comparison between the apo and ligand-bound states, see Fig. S17.
Distributed long-range motional correlations in the apo TetR
Finally, we analyze the motional correlations among residues in different ligation states using the extensive MD ensembles. This analysis is motivated by the consideration that such correlations might help identify additional residues that mediate long-range coupling between the ligand and DNA binding sites, as discussed in many previous computational analyses.27–31
As shown in Fig. S18a–b, the motional covariance is substantially stronger in the apo state than in the two bound states; the same trend is observed in the computed mutual information (Fig. S19), which also captures non-linear correlations. This difference is further illustrated in Fig. 8a–b, which shows only moderate correlations with absolute values between 0.3 and 0.7, because strong correlations (above 0.7) are present only between neighboring residues in all ligation states (e.g., Fig. S18c). The observation of long-range moderate correlations in the apo state is consistent with the fact that the anti-cooperativity between the ligand- and DNA-binding sites is evident in the apo free energy landscape (Fig. 4), whereas localization of the system in a single free energy basin in the bound states likely obscures the coupling between the two sites. Therefore, in the further analysis of long-range coupling among residues, we focus on the apo state.
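For completeness, a minimal sketch of a Pearson-type correlation matrix of positional fluctuations, C_ij = ⟨Δr_i·Δr_j⟩ / (⟨|Δr_i|²⟩⟨|Δr_j|²⟩)^½, and of the 0.3–0.7 filter is given below. It assumes one representative atom per residue (e.g., Cα) and frames already superimposed on a common reference; these choices and the helper names are assumptions rather than a description of the exact protocol.

```python
import numpy as np

def correlation_matrix(coords):
    """C_ij = <dr_i . dr_j> / sqrt(<|dr_i|^2> <|dr_j|^2>) for (n_frames, n_res, 3) coordinates."""
    d = coords - coords.mean(axis=0)                  # fluctuations about the mean structure
    cov = np.einsum("fik,fjk->ij", d, d) / len(d)     # <dr_i . dr_j>
    norm = np.sqrt(np.diag(cov))
    return cov / np.outer(norm, norm)

def moderate_pairs(C, low=0.3, high=0.7):
    """Boolean mask of residue pairs with low <= |C_ij| <= high, diagonal excluded."""
    mask = (np.abs(C) >= low) & (np.abs(C) <= high)
    np.fill_diagonal(mask, False)
    return mask

# Usage on a synthetic four-"residue" ensemble
rng = np.random.default_rng(4)
u = rng.normal(size=(5000, 3))
coords = np.stack([u, u + 5.0, -u, u + np.sqrt(3.0) * rng.normal(size=(5000, 3))], axis=1)
C = correlation_matrix(coords)   # C[0,1] = 1, C[0,2] = -1, C[0,3] close to 0.5
mask = moderate_pairs(C)
```

Because this coefficient only captures linear, collinear motions, mutual information (Fig. S19) is a natural complement.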
Figure 8:

Motional correlations among residues are stronger in the apo state of TetR. Pearson correlation coefficient matrices of (a) the ligand-bound (upper triangle) and DNA-bound (lower triangle) states and (b) the apo state. Only moderate correlations with absolute values between 0.3 and 0.7 are shown. (c) Residues (in red) that have moderate correlations with the ligand-contacting residues but weak correlations with the DBD residues in the apo state. For a similar analysis of residues with moderate correlations with the DNA binding site but weak correlations with ligand-contacting residues, see Fig. S20. (d) Top 50 residues most frequently sampled in suboptimal path analysis connecting the LBDs and DBDs in the apo state.
A close examination of the apo covariance matrix reveals that no residue features moderate correlations with both the DNA and ligand binding sites, suggesting that the coupling between them is likely mediated indirectly by multiple residues. We thus examine residues that exhibit moderate correlations with ligand-binding residues but not with the DNA binding site. As shown in Fig. 8c, a significant number of residues fall into this class; they are distributed mainly in α7, α8, α9 and the bottom of α10. A similar analysis is conducted for residues that feature moderate correlations with the DNA binding site but not with ligand-binding residues; as shown in Fig. S20, these residues mainly lie near the domain interface and in α5. Since there is no direct motional coupling between the ligand and DNA binding sites, we conduct suboptimal path analysis27 to identify residues frequently sampled along short paths that connect the two binding sites (see Fig. S21). As shown in Fig. 8d, the analysis identifies residues not only in regions between the ligand and DNA binding sites, such as α4 and l6, but also in regions far from the DBDs, such as α7 and the middle region of α8.
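The idea behind suboptimal path analysis can be sketched on a toy network. The sketch follows a common construction that we assume here rather than reproduce from Computational Methods: residues in contact are connected by edges weighted by −ln|C_ij|, the optimal path is found with Dijkstra's algorithm, and all simple paths no longer than the optimal length plus an offset δ are collected. The contact list, δ and the helper names are hypothetical, and the exhaustive depth-first enumeration is practical only for small networks.

```python
import heapq
import math
from collections import Counter

def build_graph(C, contacts):
    """Undirected residue network: an edge for each contacting pair, weighted by -ln|C_ij|."""
    graph = {i: {} for i in range(len(C))}
    for i, j in contacts:
        c = abs(C[i][j])
        if c > 0:
            graph[i][j] = graph[j][i] = -math.log(c)
    return graph

def distances_to(graph, target):
    """Dijkstra shortest-path lengths from every reachable node to `target`."""
    dist = {target: 0.0}
    heap = [(0.0, target)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph[u].items():
            if d + w < dist.get(v, math.inf):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist

def suboptimal_paths(graph, source, target, delta, tol=1e-9):
    """All simple source->target paths no longer than the optimal length + delta."""
    to_target = distances_to(graph, target)
    if source not in to_target:
        return []
    limit = to_target[source] + delta + tol
    paths = []

    def extend(node, length, path):
        if node == target:
            paths.append((length, list(path)))
            return
        for nxt, w in graph[node].items():
            # prune revisits and branches whose best possible completion exceeds the limit
            if nxt in path or length + w + to_target.get(nxt, math.inf) > limit:
                continue
            path.append(nxt)
            extend(nxt, length + w, path)
            path.pop()

    extend(source, 0.0, [source])
    return sorted(paths)

def residue_frequencies(paths):
    """Occurrences of each intermediate residue across the returned paths."""
    return Counter(r for _, path in paths for r in path[1:-1])

# Toy network: two routes from node 0 to node 3, through node 1 (strongly correlated) or node 2
C = [[1.0, 0.9, 0.5, 0.0], [0.9, 1.0, 0.0, 0.9], [0.5, 0.0, 1.0, 0.5], [0.0, 0.9, 0.5, 1.0]]
g = build_graph(C, [(0, 1), (1, 3), (0, 2), (2, 3)])
best = suboptimal_paths(g, 0, 3, delta=0.0)
both = suboptimal_paths(g, 0, 3, delta=2.0)
```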
Discussion
The general aim of our study is to better define the allostery mechanism in TetR with molecular and energetic details and to understand the unexpectedly broad distribution of allostery hotspots revealed by deep mutational scanning analysis.52 This is a challenging task because the broad hotspot distribution suggests that more than a single mechanism is at play, which calls for analysis strategies that differ from those of previous computational studies. By combining extensive MD simulations, free energy calculations and systematic comparisons of the structural and dynamic (motional covariance) features of the three ligation states, we have gained new mechanistic insights and demonstrated the need to combine local (e.g., dihedral angle distributions and contact probabilities) and global (e.g., long-range covariance) analyses to identify critical residues that are likely to contribute to allostery via different mechanisms. These findings lead to a thermodynamic model that rationalizes the broad distribution of hotspot residues in TetR in terms of two distinct molecular mechanisms through which mutations may modulate the phenotypes of TetR.
Allostery mechanism with molecular and energetic details
The extensive MD simulations conducted here help evaluate and enrich the allostery mechanism in TetR proposed in previous structural and computational studies.40,42,43,47 The inherent coupling between the ligand- and DNA-binding sites is best illustrated by the apo simulations, which reveal that even in the absence of DNA, the DBD and its interface with the LBD show considerable average structural deviations compared to the ligand-bound state (Fig. 5a). Moreover, the apo free energy landscape exhibits a broad but (anti-)diagonal feature in the space of the collective variables (Fig. 4 and Fig. S8) that describe the compactness of the ligand binding site and motions of the DBDs, thus demonstrating explicitly that it is energetically feasible for only one of the binding sites to adopt the binding-competent configuration. The apo state also exhibits stronger long-range correlations than the bound states, in which rigidification of the structure is observed to quench long-range correlation. Therefore, TetR simultaneously encodes overall structural flexibility (Fig. S3) and strong anti-cooperativity between the two binding sites in the apo state; this is functionally significant as flexibility and anti-cooperativity are essential to the speed and accuracy of response to changes in ligand concentration, respectively.
Our systematic comparison of structural ensembles, especially the contact probabilities, points to residues likely to dictate the strong anti-cooperativity between the two binding sites. Consistent with previous structural and MD studies,40,42,43,47 we find that the most critical residues are at the bottom of α8′, the end of α6, the linker l6 and the N-terminus of α4. As illustrated schematically in Fig. 1c, ligand binding stabilizes a set of conformational changes that propagate through the interfaces between these helices and the linker region to ultimately modify the relative distance and orientation of the DBDs, leading to reduced DNA affinity. However, while previous analyses emphasized steric/non-polar interactions for the propagation of the conformational cascade,40,47 our results reveal the importance of a number of polar interactions (Fig. 6c–d) that have not been recognized in previous work. In particular, binding of the ligand leads to a more compact binding pocket, which modifies the interaction pattern of residues in α6–l6 (e.g., T103 and R104); this in turn changes the interactions between the bottom of α8′ (E157′, R158′) and α4 (R49, D53). Collectively, these altered polar interactions upon ligand binding, along with the steric effects discussed in previous work,40,47 modify the orientation of α4 (see Fig. S14), an important property that correlates with the DNA binding affinity (Fig. 2b).
Distinct contributions of allostery hotspots revealed by structural and MD analyses
The mechanistic understanding that emerges from our MD simulations and free energy calculations suggests that mutations may perturb the induction function of TetR either by disrupting the conformational cascade linking the ligand and DNA binding sites or by altering the relative stabilities of the key free energy basins in the apo state (vide infra). Accordingly, residues critical to allostery may be distinguished by either local or global features, reflecting their distinct physical contributions. In Table 1, we summarize the locations of the allostery hotspots identified in the deep mutational scanning analysis52 and whether they stand out in the various structural and dynamic properties analyzed here; these include contact probability changes between the ligand-bound and DNA-bound conformations, dihedral (ϕ, ψ and χ1) differences among ligation states from static and MD ensemble analyses (the latter based on the Jensen-Shannon divergence), and features associated with the motional covariance matrix of the apo state. The overlaps between critical residues identified in the current work and experimentally identified hotspots are mapped onto the protein structure in Fig. S22.
Table 1:
List of experimental allostery hotspots,52 their locations in the protein structure and whether they are featured in various structural and dynamic analyses based on MD simulations.a
| Residue | Location | Property |
|---|---|---|
| 49 | α4 | CP; JS |
| 53 | α4 | CP |
| 82 | α5 | JS; LB |
| 86 | α5 | LB |
| 100 | α6 | CP; JS; LB |
| 102 | α6 | CP; JS |
| 103 | l6 | JS; SOP; CP; LB |
| 105 | l6 | CP; LB |
| 116–118 | α7 | JS (1); COV (1); SOP (2); LB (2) |
| 120–123 | α7 | JS (1); COV (3) |
| 125–126 | l7 | JS (1); COV (1) |
| 127–129 | α8 | JS (1); COV (3) |
| 132–135 | α8 | COV (2); CP; LB (2) |
| 137–139 | α8 | COV (1); SOP (1) |
| 142–145 | α8 | — |
| 147–152 | α8 | CP (3); JS (2); COV (4); LB |
| 193–203 | α10 | JS (6) |
aThe analyses include CP: contact probability; JS: Jensen-Shannon divergence of dihedral distributions; COV: covariance with the ligand/DNA binding sites; SOP: suboptimal path analysis connecting the ligand-binding and DNA-binding sites. Numbers in parentheses indicate the number of residues identified in a given segment. “LB” indicates that the residue(s) is (are) observed to be in contact with the ligand in ligand-bound simulations (Fig. 3).
While we have focused on simulations of the WT protein, it is remarkable that a significant fraction of the experimental hotspots is captured by a combination of computational analyses. Of the 48 hotspots,52 fewer than ten are not identified by any of the metrics explored here; these missing hotspots are either in the middle of a rigid helix (residues 142–145 in α8), and therefore spatially close to other hotspots (see Table 1), or at the C-terminal end of the protein (α10), which is an important part of the monomer-monomer interface and is also close to ligand-contacting residues, especially those in the top half of α7. A notable observation is that different types of computational analyses are effective at picking out different subsets of allostery hotspots, consistent with the notion mentioned above that these residues make distinct contributions to TetR function. In particular, contact probability changes and dihedral distribution differences tend to identify residues near the ends of several helices (α4, α6 and α8) or in loop regions (e.g., l6), which readily adjust their local environment during conformational transitions (see Figs. 6a and S22a); these residues are likely to participate in propagating conformational cascades and therefore contribute to inter-domain coupling, as discussed in the previous subsection. On the other hand, a residue that experiences limited local changes but exhibits long-range motional covariance with the ligand- or DNA-binding sites can also be an allostery hotspot (Table 1). A perturbation (mutation) is expected to induce changes at sites correlated with the mutation site, thus shifting the free energy landscape; for example, within the linear response framework,84–86 the response of residue j to a perturbation at residue i is directly proportional to the motional covariance between the two sites. As shown in Fig. S22b,d, a significant number of experimental hotspots appear to fall into this class and reside in the ligand binding domain, especially the middle and top regions of α7 and α8, suggesting that they contribute by controlling the intra-domain energetics of the LBDs.
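The linear-response argument can be illustrated with a short sketch. Within that framework, the mean displacement under a small constant force f is Δ⟨r⟩ = C f / kBT, where C is the covariance matrix of positional fluctuations, so a residue responds to a perturbation at residue i in proportion to its covariance with i. The synthetic ensemble and the helper name are assumptions for illustration; this is not a calculation reported in this work.

```python
import numpy as np

def response_magnitudes(coords, residue, force, kT=1.0):
    """|<dr_j>| for every residue j under a small force on `residue`, via <dr> = C f / kT.

    coords: (n_frames, n_res, 3) superimposed coordinates; force in energy / length units.
    """
    X = coords.reshape(len(coords), -1)
    X = X - X.mean(axis=0)
    C = X.T @ X / len(X)                       # (3N, 3N) positional covariance matrix
    f = np.zeros(X.shape[1])
    f[3 * residue:3 * residue + 3] = force     # the perturbation acts on one residue only
    return np.linalg.norm((C @ f / kT).reshape(-1, 3), axis=1)

# Residue 1 moves together with residue 0; residue 2 fluctuates independently
rng = np.random.default_rng(5)
u = rng.normal(size=(20000, 3))
coords = np.stack([u, u, rng.normal(size=(20000, 3))], axis=1)
resp = response_magnitudes(coords, residue=0, force=np.array([1.0, 0.0, 0.0]))
```

The co-moving residue responds as strongly as the perturbed one, while the uncorrelated residue barely moves, which is the sense in which covariance-based analyses can flag hotspots that show little local structural change.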
It is important to recognize that the experimental hotspots are a subset of the residues identified as relevant to allostery in our analyses. For example, quite a few residues identified via dihedral distributions (compare Figs. 7d and S22a), motional correlation with the ligand/DNA-binding sites (compare Figs. 8c, S20 and S22b–c), and suboptimal path analysis (compare Figs. 8d and S22d) are not part of the experimental hotspot set. These observations highlight that the allostery network is likely to have a considerable degree of degeneracy,52 which is essential to the robustness of function during protein evolution. In other words, while some residues participate in allostery in the WT protein, their mutations may have a limited functional impact, and they are therefore not identified as hotspots in the deep mutational scanning analysis. For example, while several charged residues at the C-terminus of α8 (E157 and R158) are observed to relay conformational changes to α4, they were not identified as hotspot residues,52 likely because the C-terminal region of α8 features several other charged amino acids (K155, E156, E159) that can take over the roles of E157/R158 in the relevant mutants; these additional charged residues do not stand out in our analysis because E157/R158 are better positioned to interact with α4 in the WT protein. Similarly, mutational effects at surface residues are likely to be compensated by nearby groups, so residues with high solvent-accessible surface areas (SASAs) are less likely to show up as hotspots even if they participate in the WT allostery network. Indeed, most hotspots have low SASA values, with an average of 31 Å² excluding α10 (see Table S1); even including α10, only 10 of the 48 hotspots have SASA values above 60 Å². By contrast, residues that are identified by the JS divergence of dihedral distributions but are not part of the hotspot set have an average SASA larger than 60 Å²; among these, 10 residues have very low SASA values, 3 of which (L101, T106, A154) are very close to hotspots, while the rest are mainly in α10, half of which are allostery hotspots.52
Overall, our observation that various computational analyses have captured different subsets of hotspots indicates that these residues modulate allostery in distinct ways. On the one hand, this explains why only a small subset of hotspots is usually picked up by one analysis method (Fig. S22, Table 1); on the other hand, the observation highlights that a comprehensive analysis based on both local and global properties is required to understand the underlying mechanism of allostery.
Two types of allostery hotspots: an MWC-like model for TetR
The key motivation for our work is to provide a physical rationale for the broad distribution of allostery hotspots identified in the deep mutational scanning study,52 as well as for the observation from the current MD analysis that different groups of hotspot residues appear to contribute to allostery in distinct ways. With these considerations in mind, we develop an MWC-like statistical thermodynamic model12 for TetR populations to explain these trends qualitatively. In this regard, it is worth recognizing that the deep mutational scanning experiment defined allostery hotspots in terms of a functional readout, which reflects the population of a specific (DNA-bound) state of TetR when the cell is exposed to a fixed concentration of ligand.
As shown schematically in Fig. 9a, it is natural to divide TetR into two domains that bind to the ligand (L) and DNA (D), respectively; in treating the energetics, we ignore the dimeric nature of TetR here, although this feature can be included in a more complex model. In the spirit of the MWC model,12,56 we surmise that each domain has two distinct conformations that differ in binding affinity; for simplicity, we assume that the relaxed/inactive conformation exhibits negligible binding affinity, while binding occurs only in the active conformation, which is higher in free energy than the inactive conformation in the absence of binding. In addition, there is a free energy penalty for both domains to assume the active, binding-competent conformation simultaneously. Therefore, the key parameters in the model are the two intra-domain activation free energies (ϵL for the LBD and ϵD for the DBD) and the inter-domain coupling free energy, γ, which explicitly controls the anti-cooperativity (i.e., when γ > 0) between the two domains. The intrinsic binding affinities of the active conformations for the ligand and DNA (ΔGL, ΔGD) are also parameters, but they can be regarded as constants when mutations are far from the corresponding binding sites and are therefore not considered explicitly in the subsequent discussion.
Figure 9:

An MWC-like model that qualitatively distinguishes two types of allostery hotspots related to the deep mutational scanning analysis of TetR. (a) Both the LBD (L) and DBD (D) can adopt two conformations; the relaxed/inactive conformation is incapable of binding, while the active conformation is binding competent but lies at a higher free energy (ϵL for L and ϵD for D); an unfavorable coupling free energy γ disfavors both domains from simultaneously adopting the active conformations. The intrinsic binding free energies for the active conformations are indicated as ΔGL and ΔGD in the schematic free energy diagrams in (b–d). A complete description is included in Fig. S23. (b) The qualitative free energy diagram for the WT protein, which generally favors the ligand-bound state as the dominant population (red), is consistent with the induction function of TetR. (c) Mutations of allostery hotspots that control γ lead to the dead (non-inducible) phenotype since the doubly-bound state becomes the dominant population (red). (d) Mutations of allostery hotspots that control ϵL can also lead to the non-inducible phenotype by reducing the binding-competent population of L, resulting in the DNA-bound state as the dominant state (red).
TetR may adopt four different states: unbound (LE · DE), ligand-bound (LB · DE), DNA-bound (LE · DB) and doubly-bound (LB · DB); a complete description is included in Fig. S23. At given ligand and DNA concentrations, the populations of these states depend only on the three key parameters ϵL, ϵD and γ. With proper parameters (Fig. 9b), the dominant population is ligand-bound, which corresponds to the normal function of TetR. Mutations perturb the thermodynamic parameters, which may lead to situations where the dominant population is either DNA-bound or doubly-bound, resulting in the non-inducible phenotype observed in experiments. The important realization is that mutations may lead to the non-inducible phenotype through two distinct mechanisms: they may perturb inter-domain coupling (γ; see Fig. 9c), which reduces the anti-cooperativity between the two domains and therefore makes doubly-bound TetR the major population, or they may perturb intra-domain properties (ϵL or ϵD; see Fig. 9d), which, for example, reduces the binding-competent population of the LBD and, in turn, the ligand-bound population of TetR. We distinguish the latter case from “trivial” mutations in the ligand-binding pocket that directly modify the intrinsic ligand binding affinity (ΔGL); thus, our analysis generally excludes residues in direct contact with the ligand.
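The qualitative behavior in Fig. 9b–d can be reproduced with a minimal implementation of the two-domain model, under assumptions that we state explicitly because the complete description in Fig. S23 is not reproduced here. Each domain is inactive (no binding), active and empty, or active and bound; the free energy of a microstate includes ϵL and ϵD for each active domain, γ when both domains are active, and the effective binding free energies ΔGL and ΔGD (at fixed ligand and DNA concentrations) for each bound domain, all in units of kBT. The function names and parameter values are arbitrary illustrative choices, not fitted values.

```python
import itertools
import math

def binding_state_populations(eps_L, eps_D, gamma, dG_L, dG_D):
    """Populations of LE.DE, LB.DE, LE.DB and LB.DB in a two-domain MWC-like model (kBT units).

    Each domain is 0 = inactive, 1 = active and empty, or 2 = active and bound.
    dG_L and dG_D are effective binding free energies at fixed ligand/DNA concentrations.
    """
    weights = dict.fromkeys(["LE.DE", "LB.DE", "LE.DB", "LB.DB"], 0.0)
    for s_L, s_D in itertools.product(range(3), repeat=2):
        active_L, active_D = int(s_L > 0), int(s_D > 0)
        G = eps_L * active_L + eps_D * active_D + gamma * active_L * active_D
        G += dG_L * (s_L == 2) + dG_D * (s_D == 2)
        key = ("LB" if s_L == 2 else "LE") + "." + ("DB" if s_D == 2 else "DE")
        weights[key] += math.exp(-G)          # Boltzmann weight of this microstate
    Z = sum(weights.values())
    return {state: w / Z for state, w in weights.items()}

def dominant(populations):
    return max(populations, key=populations.get)

# Illustrative (not fitted) parameters, in kBT
wt = binding_state_populations(eps_L=2.0, eps_D=2.0, gamma=10.0, dG_L=-8.0, dG_D=-6.0)
weak_coupling = binding_state_populations(2.0, 2.0, 0.0, -8.0, -6.0)   # gamma perturbed
costly_L = binding_state_populations(8.0, 2.0, 10.0, -8.0, -6.0)       # eps_L perturbed
print(dominant(wt), dominant(weak_coupling), dominant(costly_L))  # LB.DE LB.DB LE.DB
```

Lowering γ makes the doubly-bound state dominant (as in Fig. 9c), whereas raising ϵL shifts the population to the DNA-bound state (as in Fig. 9d), even though neither perturbation touches ΔGL or ΔGD.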
This thermodynamic model is consistent with the broad distribution of allostery hotspots identified in the deep mutational scanning experiments. While hotspots that modulate inter-domain coupling are expected to be located in regions between the two domains, such as R49 in α4, G102 in l6 and H151 in α8, hotspots that perturb intra-domain energetics do not have to be near the domain interface and can be more broadly distributed in the protein structure. For example, hotspots that perturb ϵL are likely to be in the LBD but not in immediate contact with the ligand; this is consistent with the experimental observation (Fig. 1b), as well as with our motional covariance-based analyses (Fig. S22b–d), that many hotspots are in the upper half of TetR, far from the DBD, where they can potentially perturb the energetic properties of the ligand binding pocket. Moreover, another remarkable observation from the deep mutational scanning study52 was that the loss of allostery due to mutation of hotspot residues could be rescued in multiple ways by additional mutations, suggesting a considerable degree of degeneracy in the allosteric network. These rescuing mutations are also distributed throughout the protein structure, especially the upper half of TetR, including, for example, the C-terminus of α10; these residues likely restore allostery via perturbation of intra-domain properties.
The computed free energy landscapes from detailed atomistic simulations are, in fact, qualitatively consistent with the thermodynamic model. For the apo state, the thermodynamic model suggests that, in the presence of strong anti-cooperativity between the two domains, three free energy basins arranged along a diagonal are expected (Fig. 9), corresponding to LE · DE, LB · DE and LE · DB, with the latter two expected to resemble the ligand-bound and DNA-bound states, respectively. These features are indeed observed in the computed apo free energy landscape (Fig. 4), in which the middle basin along the diagonal represents LE · DE, thus providing explicit support for our two-domain model.
Implications for the analysis of allostery in other systems
First and foremost, our study highlights that many residues make essential contributions to allostery through distinct mechanisms, either by mediating conformational cascades across domain interfaces or by modulating energetic properties within a domain. Previous studies usually focused on only one of these mechanisms. For example, most molecular dynamics simulations27–31 as well as structure-based24,25 and motion-based34,35 analyses focused entirely on residues that either propagate conformational changes (e.g., hinge residues) or undergo conformational transitions themselves; accordingly, such analyses generally aimed to identify a group of residues that form pathways of “information transfer” between the functional and effector sites. By contrast, analyses based entirely on thermodynamic models13,56 argued against the existence of specific allostery pathways87 and emphasized a holistic view of residue contributions in terms of modulating the free energies (or statistical weights) of key conformational states.12 Our analysis of TetR suggests that the two “views” are not in conflict, as both types of allostery hotspots are expected to exist and contribute to allostery in a given system. Therefore, we expect that the broad distribution of allostery hotspots is not limited to TetR, further supporting the notion that allostery can be modulated by perturbing not only deeply buried residues but also surface residues,23 including those at cryptic binding sites.60,61 In other words, adopting the perspective of broad hotspot distributions will considerably broaden the strategies for engineering allostery in proteins.
The broad distribution of allostery hotspots points to a considerable degree of redundancy in the allosteric network, which was further highlighted by the distribution of rescuing mutations identified for TetR.52 As discussed in previous work,52 mutation of allostery hotspots led to non-inducible (dead) phenotypes whose inducibility could be rescued by additional mutations; however, different dead mutants could be rescued by different numbers of mutations (i.e., the hotspots featured varying degrees of “rescuability”). For biomedical applications, therefore, a potential implication is that successful design of allosteric inhibition7,8 needs to target allostery hotspots that are particularly difficult to rescue; otherwise, allostery can be readily restored by additional mutations, abolishing the inhibition. An important future endeavor is thus to develop computational methodologies that not only identify allostery hotspots but also evaluate their rescuability, which is likely related to the degree to which the underlying free energy landscape is perturbed by the hotspot mutations.
In terms of identifying hotspot residues for mechanistic or engineering studies, our general strategy of integrating multiple computational analyses at both local and global scales is applicable to other systems. In particular, we find that analysis of the Jensen-Shannon (JS) divergence is valuable for selecting collective variables that best characterize conformational changes and for identifying residues that undergo major local conformational transitions. While covariance matrix-based analyses have been used extensively in the literature to identify allosteric pathways between functional and effector sites,27–35 our work highlights that such analyses need not be limited to inter-domain communication; judiciously designed covariance analyses can also identify sites that modulate allostery by controlling intra-domain energetics. Along this line, for systems that are well described by a conformational selection model, our study suggests that focusing covariance-based analyses on the apo state is likely to be most productive. With increasing computational power,88 MD simulations hold great promise for testing engineering ideas and designs, although systematic comparisons between computations and experiments,35,61 as done in the current work, are required as essential validation.
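The two types of analysis mentioned above can be sketched in a few lines of Python with NumPy. This is a generic illustration under stated assumptions, not the exact protocol used in this work. The Jensen-Shannon divergence is computed in bits (so it lies between 0 and 1) from histograms of one observable, such as a dihedral angle in degrees, sampled in two ligation states; the bin count and range are arbitrary choices. The covariance analysis shown is the standard normalized cross-correlation of atomic fluctuations, which assumes the trajectory frames have already been superposed on a reference structure.

```python
import numpy as np


def js_divergence(samples_a, samples_b, bins=36, value_range=(-180.0, 180.0)):
    """Jensen-Shannon divergence (log base 2, bounded in [0, 1]) between the
    histogrammed distributions of one observable (e.g., a dihedral angle in
    degrees) sampled in two ligation states."""
    p, _ = np.histogram(samples_a, bins=bins, range=value_range)
    q, _ = np.histogram(samples_b, bins=bins, range=value_range)
    p = p / p.sum()
    q = q / q.sum()
    m = 0.5 * (p + q)  # mixture distribution

    def kl(x, y):
        nz = x > 0  # bins with x == 0 contribute zero; y > 0 wherever x > 0
        return float(np.sum(x[nz] * np.log2(x[nz] / y[nz])))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def cross_correlation(coords):
    """Normalized cross-correlation of atomic fluctuations,
    C_ij = <dr_i . dr_j> / sqrt(<|dr_i|^2> <|dr_j|^2>),
    for coords of shape (n_frames, n_atoms, 3), assumed already superposed."""
    disp = coords - coords.mean(axis=0)  # fluctuations about the mean structure
    cov = np.einsum("fik,fjk->ij", disp, disp) / coords.shape[0]
    sd = np.sqrt(np.diag(cov))
    return cov / np.outer(sd, sd)
```

In practice one might rank residues by the JS divergence of their dihedral distributions between the apo and ligand- or DNA-bound trajectories. To probe intra-domain coupling, one could compute the correlation matrix from apo-state coordinates restricted to the atoms of a single domain.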
Regarding computational cost, the local-scale analyses (e.g., of contacts and dihedral angles) do not require very extensive sampling, especially when reliable structural models exist for multiple ligation/functional states. The global-scale analyses, especially free energy landscape calculations, however, require extensive sampling due to the slow convergence of global properties. Enhanced sampling methodologies,89 including innovative techniques for identifying appropriate collective variables that best describe the relevant conformational transitions,90 can be used to reduce the computational demand. Nevertheless, as highlighted by our work, the key is to integrate multiple types of analyses for a comprehensive identification of allostery hotspots. It is worth emphasizing that computations, especially those focusing on the WT protein, are likely to identify a larger set of residues as hotspot candidates than those identified experimentally (compare Figs. 7d, 8c, 8d and S20 with Fig. S22). In the absence of experimental data, therefore, it is valuable to conduct mutant simulations to explicitly establish the roles of the hotspot candidates. For this purpose, computational methodologies that allow efficient screening of a large number of sites, such as multi-site λ dynamics,91,92 are particularly attractive.
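As an illustration of why global-scale properties are costly to converge, the sketch below estimates a two-dimensional free energy landscape as F(x, y) = -kT ln P(x, y) from samples of two collective variables, and compares the surfaces obtained from the two halves of a trajectory. It assumes unbiased, equilibrium sampling (enhanced-sampling data would first need reweighting), a temperature of 300 K and energies in kcal/mol. The half-split comparison is a crude diagnostic chosen for illustration, not the convergence test used in this work.

```python
import numpy as np

KB_KCAL_PER_MOL_K = 0.0019872041  # Boltzmann constant in kcal/(mol K)


def free_energy_surface(cv1, cv2, ranges, bins=50, temperature=300.0):
    """F(x, y) = -kT ln P(x, y) from unbiased samples of two collective variables,
    shifted so the global minimum is 0 kcal/mol; empty bins are np.inf."""
    hist, xedges, yedges = np.histogram2d(cv1, cv2, bins=bins, range=ranges, density=True)
    kT = KB_KCAL_PER_MOL_K * temperature
    with np.errstate(divide="ignore"):
        F = -kT * np.log(hist)  # log(0) -> -inf, so empty bins become +inf
    F -= F[np.isfinite(F)].min()
    return F, xedges, yedges


def half_split_deviation(cv1, cv2, ranges, bins=50, temperature=300.0):
    """Crude convergence check: mean |F_first_half - F_second_half| (kcal/mol)
    over bins populated in both halves; a shared `ranges` keeps the grids aligned."""
    n = len(cv1) // 2
    fa, _, _ = free_energy_surface(cv1[:n], cv2[:n], ranges, bins, temperature)
    fb, _, _ = free_energy_surface(cv1[n:], cv2[n:], ranges, bins, temperature)
    both = np.isfinite(fa) & np.isfinite(fb)
    return float(np.abs(fa[both] - fb[both]).mean())
```

A large half-split deviation, typically dominated by sparsely populated basins and the barriers between them, signals that the landscape has not yet converged and that more sampling, or an enhanced sampling method, is needed.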
In the current work, we have developed a thermodynamic model at a qualitative level. Nevertheless, the model goes beyond the classical two-state model for allostery in TetR.40,47 Together with observations from extensive MD simulations, it provides a plausible explanation for the unexpectedly distributed nature of allostery hotspots observed in the deep mutational scanning study.52 In future studies, it will be of interest to extract the key thermodynamic parameters (ϵL, ϵD and γ) from either molecular dynamics simulations (e.g., free energy simulations with multi-site λ dynamics91,92) or experimental data such as the level of induction at different ligand concentrations.14 Establishing how these parameters depend on molecular and sequence features,93 with techniques such as machine learning,94–97 could then lead to a powerful approach for guiding the design of allostery in proteins. For example, it is difficult for deep mutational scanning to thoroughly explore combinations of multiple mutations, since the number of corresponding sequences grows exponentially with the number of simultaneous mutations. Therefore, guidance from a predictive computational model that is able to explore higher-order mutations and epistasis will be extremely valuable for engineering allosteric systems that exhibit novel phenotypes.98
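A quick count makes the combinatorial argument explicit. Assuming a protein of 200 mutable positions (a round number chosen for illustration) and all 19 alternative amino acids at each mutated site, the number of variants carrying exactly k substitutions is C(N, k) · 19^k. This gives 3,800 single mutants, 7,183,900 double mutants and roughly 9 × 10^9 triple mutants, far beyond what a deep mutational scanning library can cover.

```python
from math import comb


def n_variants(n_positions, n_mutations, n_alternatives=19):
    """Number of distinct sequences with exactly n_mutations substitutions,
    each mutated position taking any of n_alternatives other amino acids."""
    return comb(n_positions, n_mutations) * n_alternatives ** n_mutations


if __name__ == "__main__":
    # 200 positions is an illustrative assumption, not the length used in this work.
    for k in range(1, 5):
        print(k, n_variants(200, k))
```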
Conclusion
While allostery has been a topic of intense interest for the past several decades, our understanding of the underlying mechanism at the molecular level continues to be challenged by new experimental observations. In recent years, deep mutational scanning analysis has provided a function-centric approach to identify allostery hotspots in a comprehensive and unbiased fashion, leading to observations not entirely expected from traditional mechanistic considerations. Specifically, a recent deep mutational scanning study of a bacterial transcription factor, TetR,52 found that allostery hotspot residues are broadly distributed over a major portion of the protein structure, rather than being clustered near the ligand-binding and DNA-binding domain interfaces as often discussed in structure-based studies.39,40,42,47 Similarly, loss of inducibility due to mutation of hotspots could be rescued by additional mutations that were also broadly distributed throughout the protein. These findings suggest that the contributions of hotspot residues are unlikely to be explained by a single mechanism, thus calling for analysis strategies different from those of previous computational studies.
Motivated by these considerations, we have conducted extensive (up to 50 μs) molecular dynamics simulations and free energy calculations for different ligation states of TetR. The computed free energy landscapes explicitly illustrate that allostery in TetR is well described by a conformational selection model, in which the apo state samples a broad set of conformations and specific ones are selectively stabilized by either ligand or DNA binding. By examining a range of structural and dynamic properties of residues at both local and global scales, we find that various computational analyses capture different subsets of experimentally identified hotspots, supporting the notion that these residues modulate allostery in distinct ways.
These computational results motivate the development of an MWC-like thermodynamic model that qualitatively explains the broad distribution of hotspot residues and their distinct features in molecular dynamics simulations. The key realization is that allostery hotspots may contribute either by mediating inter-domain communication or by modulating intra-domain energetics. Thus, our analysis highlights that the “communication pathway”24,25,27–35 and “shifting ensemble”13,56 views of protein allostery are not in conflict, since both types of allostery hotspots are likely to contribute in a single system, which explains their broad distribution. Taking this perspective on allostery hotspots will broaden the strategies to modulate protein allostery in mechanistic and engineering studies.
Supplementary Material
Acknowledgement
We acknowledge discussions with Prof. S. Raman throughout this project and his comments on the manuscript. This work was supported by R01-GM106443 and R35-GM141930. Computational resources from the Extreme Science and Engineering Discovery Environment (XSEDE),99 which is supported by NSF grant number ACI-1548562, are greatly appreciated; part of the computational work was performed on the Shared Computing Cluster, which is administered by Boston University’s Research Computing Services (URL: www.bu.edu/tech/support/research/). Anton 2 computer time was provided by the Pittsburgh Supercomputing Center (PSC, award MCB200062P) through Grant R01GM116961 from the National Institutes of Health. The Anton 2 machine at PSC was generously made available by D.E. Shaw Research.
Footnotes
Supporting Information Available
Additional structural, energetic and covariance analyses of TetR based on MD simulations; also included are the illustration of experimentally determined hotspots in TetR captured by the various computational analyses and a more detailed thermodynamic model for the different ligation states of TetR. This material is available free of charge via the Internet at http://pubs.acs.org/.
References
- (1). Monod J; Wyman J; Changeux J-P On the nature of allosteric transitions: a plausible model. J. Mol. Biol 1965, 12, 88–118.
- (2). Koshland DE Jr; Nemethy G; Filmer D Comparison of experimental binding data and theoretical models in proteins containing subunits. Biochemistry 1966, 5, 365–385.
- (3). Alberts B; Bray D; Lewis J; Raff M; Roberts K; Watson JD Molecular biology of the cell; Garland Publishing, Inc., 1994.
- (4). Yu EW; Koshland DE Jr Propagating conformational changes over long (and short) distances in proteins. Proc. Natl. Acad. Sci. U.S.A 2001, 98, 9517–9520.
- (5). Changeux J-P; Edelstein SJ Allosteric mechanisms of signal transduction. Science 2005, 308, 1424–1428.
- (6). Nussinov R; Tsai CJ Allostery in Disease and in Drug Discovery. Cell 2013, 153, 293–305.
- (7). Chatzigoulas A; Cournia Z Rational design of allosteric modulators: Challenges and successes. WIREs Comput. Mol. Sci 2021, 11, e1529.
- (8). Ni D; Chai Z; Wang Y; Li M; Yu Z; Liu Y; Lu S; Zhang J Along the allostery stream: Recent advances in computational methods for allosteric drug discovery. WIREs Comput. Mol. Sci 2021, 11, e1585.
- (9). Taylor ND; Garruss AS; Moretti R; Chan S; Arbing MA; Cascio D; Rogers JK; Isaacs FJ; Kosuri S; Baker D; Fields S; Church GM; Raman S Engineering an allosteric transcription factor to respond to new ligands. Nat. Methods 2016, 13, 177–183.
- (10). Cui Q; Karplus M Allostery and cooperativity revisited. Prot. Sci 2008, 17, 1295–1307.
- (11). Changeux JP Allostery and the Monod-Wyman-Changeux Model After 50 Years. Annu. Rev. Biophys 2012, 41, 103–133.
- (12). Marzen S; Garcia HG; Phillips R Statistical Mechanics of Monod-Wyman-Changeux (MWC) Models. J. Mol. Biol 2013, 425, 1433–1460.
- (13). Motlagh HN; Wrabl JO; Li J; Hilser VJ The ensemble nature of allostery. Nature 2014, 508, 331–339.
- (14). Chure G; Razo-Mejia M; Belliveau NM; Einav T; Kaczmarek ZA; Barnes SL; Lewis M; Phillips R Predictive shifts in free energy couple mutations to their phenotypic consequences. Proc. Natl. Acad. Sci. U.S.A 2019, 116, 18275–18284.
- (15). Schrank TP; Bolen DW; Hilser VJ Rational modulation of conformational fluctuations in adenylate kinase reveals a local unfolding mechanism for allostery and functional adaptation in proteins. Proc. Natl. Acad. Sci. U.S.A 2009, 106, 16984–16989.
- (16). Dokholyan NV Controlling allosteric networks in proteins. Chem. Rev 2016, 116, 6463–6487.
- (17). Wodak SJ et al. Allostery in Its Many Disguises: From Theory to Applications. Structure 2019, 27, 566–578.
- (18). Masterson LR; Mascioni A; Traaseth NJ; Taylor SS; Veglia G Allosteric cooperativity in protein kinase A. Proc. Natl. Acad. Sci. U.S.A 2008, 105, 506–511.
- (19). Tzeng SR; Kalodimos CG Protein dynamics and allostery: an NMR view. Curr. Opin. Struct. Biol 2011, 21, 62–67.
- (20). Lisi GP; Loria JP Solution NMR Spectroscopy for the Study of Enzyme Allostery. Chem. Rev 2016, 116, 6323–6369.
- (21). Mazal H; Haran G Single-molecule FRET methods to study the dynamics of proteins at work. Curr. Opin. Biomed. Engr 2019, 12, 8–17.
- (22). Suel GM; Lockless SW; Wall MA; Ranganathan R Evolutionarily conserved networks of residues mediate allosteric communication in proteins. Nat. Struct. Biol 2003, 10, 59–69.
- (23). Reynolds KA; McLaughlin RN; Ranganathan R Hot Spots for Allosteric Regulation on Protein Surfaces. Cell 2011, 147, 1564–1575.
- (24). Xu CY; Tobi D; Bahar I Allosteric changes in protein structure computed by a simple mechanical model: Hemoglobin T ↔ R2 transition. J. Mol. Biol 2003, 333, 153–168.
- (25). Wang J; Jain A; McDonald LR; Gambogi C; Lee AL; Dokholyan NV Mapping allosteric communications within individual proteins. Nat. Commun 2020, 11, 3862.
- (26). Kohlhoff KJ; Shukla D; Lawrenz M; Bowman GR; Konerding DE; Belov D; Altman RB; Pande VS Cloud-based simulations on Google Exacycle reveal ligand modulation of GPCR activation pathways. Nat. Chem 2014, 6, 15–21.
- (27). Sethi A; Eargle J; Black AA; Luthey-Schulten Z Dynamical networks in tRNA:protein complexes. Proc. Natl. Acad. Sci. U.S.A 2009, 106, 6620–6625.
- (28). VanWart A; Eargle J; Luthey-Schulten Z; Amaro RE Exploring Residue Component Contributions to Dynamical Network Models of Allostery. J. Chem. Theory Comput 2012, 8, 2949–2961.
- (29). Rivalta I; Batista V In Allostery: Methods and Protocols; DiPaola L, Giuliani A, Eds.; Methods in Molecular Biology; Springer, 2021; Vol. 2253; Chapter Community Network Analysis of Allosteric Proteins, pp 137–151.
- (30). Guo J; Zhou HX Protein allostery and conformational dynamics. Chem. Rev 2016, 116, 6503–6515.
- (31). Colizzi F; Orozco M Probing allosteric regulations with coevolution-driven molecular simulations. Sci. Adv 2021, 7, eabj0786.
- (32). Doshi U; Holliday MJ; Eisenmesser EZ; Hamelberg D Dynamical network of residue-residue contacts reveals coupled allosteric effects in recognition, catalysis, and mutation. Proc. Natl. Acad. Sci. U.S.A 2016, 113, 4735–4740.
- (33). Negre CFA; Morzan UN; Hendrickson HP; Pal R; Lisi GP; Loria JP; Rivalta I; Ho JM; Batista VS Eigenvector centrality for characterization of protein allosteric pathways. Proc. Natl. Acad. Sci. U.S.A 2018, 115, E12201–E12208.
- (34). East KW; Newton JC; Morzan UN; Narkhede YB; Acharya A; Skeens E; Jogl G; Batista VS; Palermo G; Lisi GP Allosteric Motions of the CRISPR-Cas9 HNH Nuclease Probed by NMR and Molecular Dynamics. J. Am. Chem. Soc 2020, 142, 1348–1358.
- (35). Nierzwicki L; East KW; Morzan UN; Arantes PR; Batista VS; Lisi GP; Palermo G Enhanced specificity mutations perturb allosteric signaling in CRISPR-Cas9. eLife 2021, 10, e73601.
- (36). Cuthbertson L; Nodwell JR The TetR Family of Regulators. Microbiol. Mol. Biol. Rev 2013, 77, 440–475.
- (37). Stanton BC; Siciliano V; Ghodasara A; Wroblewska L; Clancy K; Trefzer AC; Chesnut JD; Weiss R; Voigt CA Systematic Transfer of Prokaryotic Sensors and Circuits to Mammalian Cells. ACS Synth. Biol 2014, 3, 880–891.
- (38). Hinrichs W; Kisker C; Düvel M; Müller A; Tovar K; Hillen W; Saenger W Structure of the Tet repressor-tetracycline complex and regulation of antibiotic resistance. Science 1994, 264, 418–420.
- (39). Kisker C; Hinrichs W; Tovar K; Hillen W; Saenger W The complex formed between Tet repressor and tetracycline-Mg2+ reveals mechanism of antibiotic resistance. J. Mol. Biol 1995, 247, 260–280.
- (40). Orth P; Schnappinger D; Hillen W; Saenger W; Hinrichs W Structural basis of gene regulation by the tetracycline inducible Tet repressor–operator system. Nat. Struct. Biol 2000, 7, 215–219.
- (41). Saenger W; Orth P; Kisker C; Hillen W; Hinrichs W The tetracycline repressor – a paradigm for a biological switch. Angew. Chem. Int. Ed. Engl 2000, 39, 2042–2052.
- (42). Aleksandrov A; Schuldt L; Hinrichs W; Simonson T Tet Repressor Induction by Tetracycline: A Molecular Dynamics, Continuum Electrostatics, and Crystallographic Study. J. Mol. Biol 2008, 378, 898–912.
- (43). Haberl F; Lanig H; Clark T Induction of the tetracycline repressor: Characterization by molecular-dynamics simulations. Proteins: Struct. Funct. & Bioinf 2009, 77, 857–866.
- (44). Seedorff JE; Rodgers ME; Schleif R Opposite allosteric mechanisms in TetR and CAP. Prot. Sci 2009, 18, 775–781.
- (45). Zhou Y; Reichheld SE; Savchenko A; Parkinson J; Davidson AR A Comprehensive Analysis of Structural and Sequence Conservation in the TetR Family Transcriptional Regulators. J. Mol. Biol 2010, 400, 847–864.
- (46). Sevvana M; Goetz C; Goeke D; Wimmer C; Berens C; Hillen W; Muller YA An Exclusive α/β Code Directs Allostery in TetR–Peptide Complexes. J. Mol. Biol 2012, 416, 46–56.
- (47). Werten S; Schneider J; Palm GJ; Hinrichs W Modular organisation of inducer recognition and allostery in the tetracycline repressor. FEBS J. 2016, 283, 2102–2114.
- (48). Palm GJ; Buchholz I; Werten S; Girbardt B; Berndt L; Delcea M; Hinrichs W Thermodynamics, cooperativity and stability of the tetracycline repressor (TetR) upon tetracycline binding. Biochim. Biophys. Acta - Prot. & Proteom 2020, 1868, 140404.
- (49). Takeuchi K; Imai M; Shimada I Conformational equilibrium defines the variable induction of the multidrug-binding transcriptional repressor QacR. Proc. Natl. Acad. Sci. U.S.A 2019, 116, 19963–19972.
- (50). Müller G; Hecht B; Helbl V; Hinrichs W; Saenger W; Hillen W Characterization of non-inducible Tet repressor mutants suggests conformational changes necessary for induction. Nat. Struct. Biol 1995, 2, 693–703.
- (51). Hecht B; Müller G; Hillen W Noninducible Tet Repressor Mutations Map from the Operator Binding Motif to the C Terminus. J. Bacteriol 1993, 175, 1206–1210.
- (52). Leander M; Yuan Y; Mager A; Cui Q; Raman S Functional Plasticity and Evolutionary Adaptation of Allosteric Regulation. Proc. Natl. Acad. Sci. U.S.A 2020, 117, 25445–25454.
- (53). Fowler DM; Fields S Deep mutational scanning: a new style of protein science. Nat. Methods 2014, 11, 801–807.
- (54). Lanig H; Othersen OG; Beierlein FR; Seidel U; Clark T Molecular Dynamics Simulations of the Tetracycline-repressor Protein: The Mechanism of Induction. J. Mol. Biol 2006, 359, 1125–1136.
- (55). Manning CD; Schütze H Foundations of Statistical Natural Language Processing; MIT Press: Cambridge, MA, USA, 1999.
- (56). Hilser VJ; Wrabl JO; Motlagh HN Structural and energetic basis of allostery. Annu. Rev. Biophys 2012, 41, 585–609.
- (57). Lockless SW; Ranganathan R Evolutionarily conserved pathways of energetic connectivity in protein families. Science 1999, 286, 295–299.
- (58). Tzeng SR; Kalodimos CG Dynamic activation of an allosteric regulatory protein. Nature 2009, 462, 368–372.
- (59). Jain D Allosteric Control of Transcription in GntR Family of Transcription Regulators: A Structural Overview. IUBMB Life 2015, 67, 556–563.
- (60). Kuzmanic A; Bowman GR; Juarez-Jimenez J; Michel J; Gervasio FL Investigating Cryptic Binding Sites by Molecular Dynamics Simulations. Acc. Chem. Res 2020, 53, 654–661.
- (61). Knoverek CR; Mallimadugula UL; Singh S; Rennella E; Frederick TE; Yuwen T; Raavicharla S; Kay LE; Bowman GR Opening of a cryptic pocket in β-lactamase increases penicillinase activity. Proc. Natl. Acad. Sci. U.S.A 2021, 118, e2106473118.
- (62). Vanommeslaeghe K; Hatcher E; Acharya C; Kundu S; Zhong S; Shim J; Darian E; Guvench O; Lopes P; Vorobyov I; Mackerell AD CHARMM general force field: A force field for drug-like molecules compatible with the CHARMM all-atom additive biological force fields. J. Comput. Chem 2010, 31, 671–690.
- (63). Aleksandrov A; Simonson T The tetracycline:Mg2+ complex: A molecular mechanics force field. J. Comput. Chem 2006, 27, 1517–1533.
- (64). Aleksandrov A; Proft J; Hinrichs W; Simonson T Protonation patterns in tetracycline:Tet repressor recognition: Simulations and experiments. ChemBioChem 2007, 8, 675–685.
- (65). Huang J; MacKerell AD Jr CHARMM36 all-atom additive protein force field: Validation based on comparison to NMR data. J. Comput. Chem 2013, 34, 2135–2145.
- (66). Sali A; Blundell TL Comparative protein modelling by satisfaction of spatial restraints. J. Mol. Biol 1993, 234, 779–815.
- (67). Jo S; Kim T; Iyer VG; Im W CHARMM-GUI: a web-based graphical user interface for CHARMM. J. Comput. Chem 2008, 29, 1859–1865.
- (68). Jorgensen WL; Chandrasekhar J; Madura JD; Impey RW; Klein ML Comparison of simple potential functions for simulating liquid water. J. Chem. Phys 1983, 79, 926–935.
- (69). Darden T; York D; Pedersen L Particle mesh Ewald: An N·log(N) method for Ewald sums in large systems. J. Chem. Phys 1993, 98, 10089–10092.
- (70). Brooks BR et al. CHARMM: The Biomolecular Simulation Program. J. Comput. Chem 2009, 30, 1545–1614.
- (71). Hart K; Foloppe N; Baker CM; Denning EJ; Nilsson L; MacKerell AD Jr Optimization of the CHARMM additive force field for DNA: Improved treatment of the BI/BII conformational equilibrium. J. Chem. Theory Comput 2012, 8, 348–362.
- (72). Eastman P; Swails J; Chodera JD; McGibbon RT; Zhao Y; Beauchamp KA; Wang LP; Simmonett AC; Harrigan MP; Stern CD et al. OpenMM 7: Rapid Development of High Performance Algorithms for Molecular Dynamics. PLoS Comput. Biol 2017, 13, e1005659.
- (73). Shaw DE et al. Anton 2: Raising the Bar for Performance and Programmability in a Special-Purpose Molecular Dynamics Supercomputer. IEEE 2014, 41–53.
- (74). Lippert RA; Predescu C; Ierardi DJ; Mackenzie KM; Eastwood MP; Dror RO; Shaw DE Accurate and efficient integration for molecular dynamics simulations at constant temperature and pressure. J. Chem. Phys 2013, 139, 164106.
- (75). Nosé S A unified formulation of the constant temperature molecular dynamics methods. J. Chem. Phys 1984, 81, 511–519.
- (76). Hoover WG Canonical dynamics: Equilibrium phase-space distributions. Phys. Rev. A 1985, 31, 1695–1697.
- (77). Martyna GJ; Tobias DJ; Klein ML Constant pressure molecular dynamics algorithms. J. Chem. Phys 1994, 101, 4177–4189.
- (78). Predescu C; Lerer AK; Lippert RA; Towles B; Grossman JP; Shaw DE The u-series: A separable decomposition for electrostatics computation with improved accuracy. J. Chem. Phys 2020, 152, 084113.
- (79). Tuckerman M; Berne BJ; Martyna GJ Reversible multiple time scale molecular dynamics. J. Chem. Phys 1992, 97, 1990–2001.
- (80). Zhuravlev PI; Papoian GA Protein functional landscapes, dynamics, allostery: a tortuous path towards a universal theoretical framework. Q. Rev. Biophys 2010, 43, 295–332.
- (81). Schneidman-Duhovny D; Hammel M; Tainer JA; Sali A FoXS, FoXSDock and MultiFoXS: Single-state and multi-state structural modeling of proteins and their complexes based on SAXS profiles. Nucleic Acids Res 2016, 44, W424–W429.
- (82). Lange OF; Grubmüller H Generalized correlation for biomolecular dynamics. Proteins: Struct. Funct. & Bioinf 2005, 62, 1053–1061.
- (83). Grant BJ; Rodrigues APC; ElSawy KM; McCammon JA; Caves LSD Bio3D: An R package for the comparative analysis of protein structures. Bioinformatics 2006, 22, 2695–2696.
- (84). Ikeguchi M; Ueno J; Sato M; Kidera A Protein structural change upon ligand binding: Linear response theory. Phys. Rev. Lett 2005, 94, 078102.
- (85). Van Wynsberghe AW; Cui Q Interpretating correlated motions using normal mode analysis. Structure 2006, 14, 1647–1653.
- (86). Zheng WJ; Brooks BR; Thirumalai D Low-frequency normal modes that describe allosteric transitions in biological nanomachines are robust to sequence variations. Proc. Natl. Acad. Sci. U.S.A 2006, 103, 7664–7669.
- (87). Pan H; Lee JC; Hilser VJ Binding sites in Escherichia coli dihydrofolate reductase communicate by modulating the conformational ensemble. Proc. Natl. Acad. Sci. U.S.A 2000, 97, 12020–12025.
- (88). Zimmerman MI et al. SARS-CoV-2 simulations go exascale to predict dramatic spike opening and cryptic pockets across the proteome. Nat. Chem 2021, 13, 651–659.
- (89). Valsson O; Tiwary P; Parrinello M Enhancing Important Fluctuations: Rare Events and Metadynamics from a Conceptual Viewpoint. Annu. Rev. Phys. Chem 2016, 67, 159–184.
- (90). Wang YH; Ribeiro JML; Tiwary P Machine learning approaches for analyzing and enhancing molecular dynamics simulations. Curr. Opin. Struct. Biol 2020, 61, 139–145.
- (91). Raman EP; Paul TJ; Hayes RL; Brooks CL III Automated, Accurate, and Scalable Relative Protein–Ligand Binding Free-Energy Calculations Using Lambda Dynamics. J. Chem. Theory Comput 2020, 16, 7895–7914.
- (92). Hayes RL; Buckner J; Brooks CL III BLaDE: A Basic Lambda Dynamics Engine for GPU-Accelerated Molecular Dynamics Free Energy Calculations. J. Chem. Theory Comput 2021, 17, 6799–6807.
- (93). Levy RM; Haldane A; Flynn WF Potts Hamiltonian models of protein co-variation, free energy landscapes, and evolutionary fitness. Curr. Opin. Struct. Biol 2017, 43, 55–62.
- (94). Garruss AS; Collins KM; Church GM Deep representation learning improves prediction of LacI-mediated transcriptional repression. Proc. Natl. Acad. Sci. U.S.A 2021, 118, e2022838118.
- (95). Gelman S; Fahlberg SA; Heinzelman P; Romero PA; Gitter A Neural networks to learn protein sequence–function relationships from deep mutational scanning data. Proc. Natl. Acad. Sci. U.S.A 2021, 118, e2104878118.
- (96). Biswas S; Khimulya G; Alley EC; Esvelt KM; Church GM Low-N protein engineering with data-efficient deep learning. Nat. Methods 2021, 18, 389–396.
- (97). Luo Y; Jiang G; Yu T; Liu Y; Vo L; Ding H; Su Y; Qian WW; Zhao H; Peng J ECNet is an evolutionary context-integrated deep learning framework for protein engineering. Nat. Commun 2021, 12, 5743.
- (98). Tack DS; Tonner PD; Pressman A; Olson ND; Levy SF; Romantseva EF; Alperovich N; Vasilyeva O; Ross D The genotype-phenotype landscape of an allosteric protein. Mol. Syst. Biol 2021, 17, e10179.
- (99). Towns J; Cockerill T; Dahan M; Foster I; Gaither K; Grimshaw A; Hazlewood V; Lathrop S; Lifka D; Peterson GD; Roskies R; Scott JR; Wilkins-Diehr N XSEDE: Accelerating Scientific Discovery. Comput. Sci. Eng 2014, 16, 62–74.
