PLOS ONE. 2020 May 29;15(5):e0233509. doi: 10.1371/journal.pone.0233509

Predicting the viability of beta-lactamase: How folding and binding free energies correlate with beta-lactamase fitness

Jordan Yang 1, Nandita Naik 2, Jagdish Suresh Patel 3,4, Christopher S Wylie 2, Wenze Gu 1, Jessie Huang 5, F Marty Ytreberg 3,6, Mandar T Naik 7, Daniel M Weinreich 2, Brenda M Rubenstein 1,*
Editor: Jose M Sanchez-Ruiz 8
PMCID: PMC7259980  PMID: 32470971

Abstract

One of the long-standing holy grails of molecular evolution has been the ability to predict an organism’s fitness directly from its genotype. With such predictive abilities in hand, researchers would be able to more accurately forecast how organisms will evolve and how proteins with novel functions could be engineered, leading to revolutionary advances in medicine and biotechnology. In this work, we assemble the largest reported set of experimental TEM-1 β-lactamase folding free energies and use this data in conjunction with previously acquired fitness data and computational free energy predictions to determine how much of the fitness of β-lactamase can be directly predicted by thermodynamic folding and binding free energies. We focus upon β-lactamase because of its long history as a model enzyme and its central role in antibiotic resistance. Based upon a set of 21 β-lactamase single and double mutants expressly designed to influence protein folding, we first demonstrate that modeling software designed to compute folding free energies such as FoldX and PyRosetta can meaningfully, although not perfectly, predict the experimental folding free energies of single mutants. Interestingly, while these techniques also yield sensible double mutant free energies, we show that they do so for the wrong physical reasons. We then go on to assess how well both experimental and computational folding free energies explain single mutant fitness. We find that folding free energies account for, at most, 24% of the variance in β-lactamase fitness values according to linear models and, somewhat surprisingly, complementing folding free energies with computationally-predicted binding free energies of residues near the active site only increases the folding-only figure by a few percent. This strongly suggests that the majority of β-lactamase’s fitness is controlled by factors other than free energies. 
Overall, our results shed light on the extent to which the community is justified in using thermodynamic measures to infer protein fitness, as well as on how applicable modern computational techniques for predicting free energies will be to the forthcoming large data sets of multiply-mutated proteins.

Introduction

The ability to predict how an organism’s fitness is influenced by mutations is central to being able to project and, in some cases, steer the course of natural evolution [1–3], engineer protein sequences with novel biological functions [4, 5], and treat genetic disorders [6]. Nevertheless, to this day, such predictions remain far from routine. Only in rare instances can a mutation’s effect on an organism’s fitness be directly tied to a single phenotypic consequence, such as a protein’s fitness for performing a specific function. Yet, even in those rare instances, the simplest protein’s fitness is influenced by a wide variety of factors [7], including protein and gene expression levels [8], interactions with chaperones [9–11], protein folding stability [12–15], protein folding dynamics [16, 17], and proteolytic susceptibility [18], as well as many complex factors yet to be uncovered or understood. Unfortunately, even many of these well-understood factors are often difficult, if not impossible, to model in vitro or in silico [19], limiting their overall utility. Given this backdrop, simple, calculable indicators that can predict phenotypes, and ultimately, organismal fitness, are of high value and in high demand.

Protein biophysical measures, such as proteins’ thermodynamic stabilities, constitute one experimentally accessible set of phenotypic predictors for the effects of nonsynonymous mutations [6, 14, 20, 21]. As nearly all biological processes and structures involve proteins, one would expect proteins’ abilities to properly fold, catalyze reactions of small-molecule substrates, or bind to partners to be highly correlated with their proper function, and by extension, with organisms’ abilities to survive and reproduce. How these abilities are influenced by mutations may be quantified by the thermodynamic predictors ΔΔGfold, the change in a protein’s folding free energy upon mutation relative to the wild type, and ΔΔGbind, the change in a protein’s binding free energy for a given substrate upon mutation relative to the wild type. Negative ΔΔG values indicate that proteins are stabilized by a mutation, while positive values indicate that they are destabilized (see the Supporting Information for further details). Most proteins have an optimal thermodynamic regime in which they function, as being too stable can also compromise their ability to function [22]. Indeed, past research has shown that most globular proteins have ΔGfold values in the range of -5 to -15 kcal/mol and that most mutations are accompanied by ΔΔGfold values of -4 to 10 kcal/mol, meaning that many mutants possess ΔGfold values roughly equal to zero and therefore exist at the edge of stability [23]. ΔΔGfold values may be experimentally determined via circular dichroism [24], differential scanning calorimetry [25, 26], or single-molecule fluorescence techniques [27], while ΔΔGbind values may be determined via isothermal titration calorimetry [28] or surface plasmon resonance [29]. 
Although advances in saturation mutagenesis for producing a plurality of mutations [30] and deep sequencing for rapidly sequencing large numbers of mutants [31] have accelerated aspects of these techniques, measuring protein free energy changes remains a comparatively low-throughput and time-intensive process, largely owing to the time it takes to express and purify hundreds to thousands of proteins. Thus, while past Herculean experiments on single mutants have produced a smattering of free energy data [32–37] and more recent quasi-exhaustive approaches have shed light on distributions of fitness effects by directly measuring fitness [38–41], even higher-throughput means of estimating the free energy changes of proteins’ full complement of mutations are needed to forge a more complete picture of the correlation between biophysical predictors and organismal phenotypes.

A key tool that has emerged for accelerating the estimation of these predictors is computation. Thermodynamic quantities such as the free energies discussed above may be calculated via equilibrium statistical mechanical simulations of the underlying proteins. While molecular dynamics [42, 43] and Monte Carlo [44] simulations that attempt to fully sample proteins’ degrees of freedom based upon judiciously parameterized force fields are most accurate for estimating these quantities, these simulations are often orders of magnitude too slow to separately model each of a protein’s thousands of distinct single mutants, let alone its multiple mutants. Indeed, conventional molecular dynamics simulations of just a handful of mutants remain state-of-the-art [45]. What has therefore transformed the field, by making the prediction of free energies of large numbers of mutations not only viable but routine, is the development of empirical effective free energy function techniques [46, 47], which take in the conformations of proteins and ligands and directly estimate their ΔΔGfold and ΔΔGbind values using functions parameterized on large databases of protein free energies. Such calculations have enabled a number of previously inconceivable comparisons between mutant free energy changes and organismal measures of fitness, such as minimum inhibitory concentrations (MIC) in bacteria [39] or the viability of viral plaques [15]. One of the primary messages to arise from these studies has been that fitness often falls off precipitously as a protein’s ΔG value surpasses 0, and therefore that large, positive ΔΔG values correlate with low fitness, but not necessarily vice versa [6, 48]. Despite these seminal findings, much remains to be understood, not only about the accuracy with which empirical free energy functions predict individual proteins’ free energy changes upon mutation, but also about the finer relationships between free energies and fitness.

In this work, we experimentally determine the folding free energies of 21 TEM-1 β-lactamase single and double mutants and compare them with computational results for the ΔΔG values of folding from a variety of empirical free energy function techniques. As our data set of experimental β-lactamase folding free energies is the largest currently available, it has granted us the unique opportunity to make apples-to-apples comparisons between computational and experimental folding free energies, unlike previous works, which have been constrained to apples-to-oranges comparisons of folding free energies to fitness [38, 49]. We then analyze how predictive these experimental and computational free energies are of TEM-1 β-lactamase fitness. The TEM-1 β-lactamase protein is a model enzyme [50] that hydrolyzes such essential β-lactam drugs as penicillins (including the ampicillin modeled here) and cephalosporins, and is therefore directly responsible for the evolution of many common forms of bacterial drug resistance (see the Supporting Information for further information about β-lactamases, including TEM-1) [51, 52]. Beyond basic research interest, being able to predict the fitness of TEM-1’s many possible mutants is therefore also helpful for predicting, and ultimately combating, the mutants that will lead to the next generation of drug-resistant “superbug” bacteria. To compute ΔΔG values of folding and binding, we employ FoldX (and MD+FoldX) [53], PyRosetta [54], PoPMuSiC [55], and AutoDock Vina [56]. (Note that PyRosetta and AutoDock Vina utilize a combination of empirical and physical free energy contributions by weighting physically-inspired terms based upon fits to larger data sets; they are therefore not strictly empirical free energy function techniques.) These programs were selected from numerous possible packages [57] because of the balance of computational expediency and accuracy they bring to the problem of predicting single mutant free energies. 
We find that PyRosetta, in particular, reproduces reasonably well the experimental folding free energies of the single mutants and, less systematically, those of the double mutants studied. Using these reasonably accurate single-mutant free energies of folding, we then study how correlated folding free energies are with β-lactamase fitness. We demonstrate that the overall low predictive capacity of folding free energies alone can be modestly boosted by supplementing them with information about binding free energies. Nevertheless, as may be expected given the complexity of the overall transcription, translation, and post-translational processes, we demonstrate that thermodynamic descriptors explain only a small fraction of the variation in β-lactamase fitness. Overall, our findings shed light not only on the accuracy of high-throughput approaches for estimating protein thermodynamic predictors, but also on just how predictive of organismal fitness these measures can be anticipated to be for an important model protein.

Materials and methods

Experimental determination of folding free energies

We began by experimentally determining the folding free energies of 21 TEM-1 β-lactamase mutants, informally known as the ‘Wylie’ mutants (see Fig 1 for a listing and Fig 2 for an illustration of the relative locations of the Wylie mutants), and wild type TEM-1 using circular dichroism.

Fig 1.


(Top) Table of the 15 single and 6 double TEM-1 mutants that constitute the Wylie mutant data set. Note that in naming the mutants, we first specify the wild type residue abbreviation, followed by the Ambler residue number, and end with the mutant residue abbreviation. (Bottom) The mutant data sets produced or analyzed using the different experimental and computational methods described in this manuscript.

Fig 2. A cartoon representation of the TEM-1 β-lactamase protein (PDB ID: 1xpb) and its ampicillin ligand after the wild type protein has been relaxed and ampicillin docked to it.


The positions of many of the residues mutated in this work, including S70 and G144, are indicated in red.

In order to do so, the Wylie mutants were first sub-cloned into a pBAD202 expression vector available in the pBAD directional TOPO expression kit (Invitrogen). The native leader peptide sequence of the β-lactamase gene was retained to achieve periplasmic transport for proper folding of the enzyme. Since TEM-1 has a weak intrinsic affinity for metal ions, no purification tag was added to the protein. The plasmids carrying the gene of each mutant were then transformed into TOP10 (Invitrogen) E. coli strains, and 25 ml LB media starter cultures were grown overnight at 37°C using kanamycin as a selection marker. The next morning, the cells were transferred to 800 ml of fresh LB culture supplemented with kanamycin, and protein expression was induced by the addition of 0.1% arabinose at culture OD600 ∼0.6. The culture was then grown overnight at 18°C and spun at 4000 rpm in a Sorvall RC5 centrifuge. The supernatant medium was next discarded and the cell pellet was suspended in sucrose buffer (30 mM TRIS, 20% sucrose; pH 8.0). The suspension was spun at 6000 rpm and the supernatant was again discarded. The resultant cell pellet was subsequently gently resuspended in MgSO4 buffer (5 mM MgSO4; pH 7.0) to induce osmotic shock and incubated at 4°C for 30 minutes for maximum release of the enzyme from the periplasm. The protein was separated from the cells by spinning at 14000 rpm. This periplasmic extract was then incubated with 5 ml Ni-NTA beads (Qiagen) at 4°C for 15 minutes, and the resin slurry was packed on an open chromatography column. The flow-through from the column was discarded and the column was generously washed with binding buffer (50 mM potassium phosphate, 100 mM NaCl; pH 7.5). The protein was eluted using 10 column volumes of elution buffer (50 mM potassium phosphate, 100 mM NaCl, 15 mM imidazole; pH 7.5). The sample was afterwards concentrated to < 5 ml using an Amicon Ultra-15 centrifugal device with a 10 kDa molecular weight cutoff membrane. 
The sample was then passed through a Superdex-75 16/600 size exclusion column connected to a GE Akta FPLC using the storage buffer (200 mM potassium phosphate, 4% glycerol, pH 7.0). The purity of these samples was ascertained using SDS-PAGE, and the protein concentration was determined from the absorbance at 280 nm using a UV spectrophotometer. The samples were flash frozen in liquid nitrogen and stored at -80°C. Our typical yields were 2–20 mg of purified β-lactamase per liter of LB media.
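The A280 step above rests on the Beer-Lambert law, c = A / (ε·l). The sketch below converts an absorbance reading into a molar concentration and computes how much stock is needed for a sample at the 15 μM used in the circular dichroism scans. The extinction coefficient and sample volume are hypothetical placeholders, not values reported here.

```python
def concentration_from_a280(a280, epsilon_m_cm, path_cm=1.0):
    """Molar concentration from absorbance via Beer-Lambert: c = A / (epsilon * l)."""
    return a280 / (epsilon_m_cm * path_cm)


def volume_for_target(stock_molar, target_molar, final_volume_ml):
    """Stock volume (ml) giving target_molar in final_volume_ml, from C1*V1 = C2*V2."""
    return target_molar * final_volume_ml / stock_molar


# HYPOTHETICAL extinction coefficient: the TEM-1 value is not given in the text.
EPSILON_HYPOTHETICAL = 25_000.0  # M^-1 cm^-1, placeholder

stock = concentration_from_a280(0.50, EPSILON_HYPOTHETICAL)  # molar
v = volume_for_target(stock, 15e-6, 0.4)  # ml of stock for 0.4 ml (hypothetical) of 15 uM sample
print(f"stock = {stock * 1e6:.1f} uM, use {v:.3f} ml of stock")
```

In practice the extinction coefficient would be computed from the mutant's own sequence, since substitutions involving Trp or Tyr change it.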

The thermodynamic stability of each allele was determined by circular dichroism (CD) on a Jasco J-815 spectrometer in 200 mM potassium phosphate buffer (pH 7.0) containing 4% glycerol. Briefly, each enzyme at 15 μM was heated at a rate of 2°C/min in 2 mm cuvettes. The relatively slow temperature ramp and small cuvettes helped to ensure that the samples attained equilibrium at each temperature. Changes in the ellipticity at 223 nm were recorded from 20 to 90°C. The experiment was performed in triplicate, and the melting temperature (Tm) and van’t Hoff enthalpy (ΔH) were calculated by fitting the resultant data to a two-state transition as described in previous work (see S1 Table in S1 File for a summary of the circular dichroism data obtained) [24, 50].
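As an illustration of a two-state analysis, the sketch below uses one common route: van’t Hoff linear regression of ln K against 1/T, where K = [U]/[N] is derived from a baseline-corrected fraction unfolded. It then estimates ΔGfold(T) = -ΔH(1 - T/Tm), which assumes a temperature-independent ΔH (ΔCp ≈ 0). The exact fitting procedure of the cited work may differ, for example a nonlinear fit with sloping baselines. The melt below is synthetic, with hypothetical Tm and ΔH values.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal mol^-1 K^-1


def fraction_unfolded(t_k, tm_k, dh_kcal):
    """Two-state fraction unfolded at t_k for van't Hoff enthalpy dh_kcal (unfolding)."""
    k_unfold = math.exp(-dh_kcal / R_KCAL * (1.0 / t_k - 1.0 / tm_k))  # K = [U]/[N]
    return k_unfold / (1.0 + k_unfold)


def vant_hoff_fit(temps_k, f_unfolded, lo=0.1, hi=0.9):
    """Estimate (Tm, dH) by linear regression of ln K on 1/T.

    Assumes f_unfolded is already baseline-corrected. Only points inside the
    transition (lo < f < hi) are used, because K is poorly determined elsewhere.
    ln K = -(dH/R)(1/T) + dH/(R*Tm), so slope = -dH/R and Tm = -slope/intercept.
    """
    pts = [(1.0 / t, math.log(f / (1.0 - f)))
           for t, f in zip(temps_k, f_unfolded) if lo < f < hi]
    n = len(pts)
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    sxx = sum((x - mx) ** 2 for x, _ in pts)
    sxy = sum((x - mx) * (y - my) for x, y in pts)
    slope = sxy / sxx
    intercept = my - slope * mx
    return -slope / intercept, -slope * R_KCAL


def dg_fold(t_k, tm_k, dh_kcal):
    """Folding free energy assuming Delta Cp = 0: dG_fold = -dH * (1 - T/Tm)."""
    return -dh_kcal * (1.0 - t_k / tm_k)


# Synthetic, noise-free melt from 20 to 90 C (HYPOTHETICAL Tm = 325 K, dH = 120 kcal/mol).
temps = [293.15 + i for i in range(71)]
fu = [fraction_unfolded(t, 325.0, 120.0) for t in temps]
tm_fit, dh_fit = vant_hoff_fit(temps, fu)
print(round(tm_fit, 2), round(dh_fit, 2), round(dg_fold(298.15, tm_fit, dh_fit), 2))
```

The linearized fit is shown because it is transparent. With noisy data, a direct nonlinear fit of the ellipticity, including baselines, is usually more robust.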

Computing free energies of folding with FoldX

As a computational starting point, the folding free energies (ΔΔGfold) of all 4978 (= 262 residues × 19 possible mutations per residue) TEM-1 β-lactamase single missense mutants and specific double mutants were calculated using the FoldX 4.0 algorithm [47, 53]. FoldX was selected to accomplish this necessarily high throughput task due to its relatively high accuracy among fast algorithms—the algorithm has been demonstrated to achieve correlation coefficients as large as 0.7 on a mix of ProTherm [58] and the 1088 Guerois mutants [47], outperforming several alternative algorithms based on both empirical and physical force fields [57]—at minimal computational expense. Of relevance to this work, FoldX is especially designed to model single mutants and has been trained on the select set of β-lactamase mutants found in the ProTherm database, but has been infrequently applied to multi-point mutants [59].
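The count of 4978 single missense mutants follows from substituting each of the 262 positions with each of the 19 non-wild-type amino acids. The sketch below enumerates such mutants using the naming convention of Fig 1 (wild type residue, residue number, mutant residue). It takes an explicit list of residue numbers because a standard numbering scheme such as Ambler numbering need not be contiguous in a given protein. The three-residue fragment is purely illustrative.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes


def single_mutants(wild_type, residue_numbers):
    """Yield labels such as 'S70G': wild type residue, residue number, mutant residue."""
    if len(wild_type) != len(residue_numbers):
        raise ValueError("need one residue number per wild-type position")
    for wt, num in zip(wild_type, residue_numbers):
        for mut in AMINO_ACIDS:
            if mut != wt:  # 19 missense substitutions per position
                yield f"{wt}{num}{mut}"


# Illustrative 3-residue fragment; not the full TEM-1 sequence.
toy = list(single_mutants("SKG", [70, 73, 144]))
print(len(toy), toy[:3])
print(262 * 19)  # single missense mutants for a 262-residue model
```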

In order to obtain folding free energy differences for β-lactamase, we initialized our FoldX calculations with the 1xpb Protein Data Bank β-lactamase structure [60]. The structure file was first modified to remove everything but TEM-1 β-lactamase and crystal structure water molecules. To match the experimental residue numbering, the residues between 51 and 58 were renumbered sequentially.

FoldX simulations were performed on structures with and without prior molecular dynamics relaxation of the initial wild type conformation. In the following, we term those simulations in which molecular dynamics relaxation was performed before FoldX free energies were computed MD+FoldX simulations. Past studies performed by our team have shown that relaxing the wild type structure before introducing mutations can significantly improve the predictive capacity of FoldX on proteins, such as TEM-1, on which FoldX was not explicitly trained [61, 62]. In our MD+FoldX simulations, the final clean structure file was used to carry out atomistic molecular dynamics simulations using the protocol reported in our previous studies [61, 62]. Briefly, the GROMACS 2018.4 software package was used to perform the MD simulations with the AMBER99SB*-ILDN force field [63]. Final production simulations were carried out for 100 ns, and snapshots were saved every 1 ns, resulting in 100 snapshots of each TEM-1 β-lactamase structure. Either the MD snapshots or the cleaned PDB structures were then repaired using the RepairPDB function six times in succession to minimize and converge the potential energy [64]. During this procedure, FoldX searches for residues with poor torsion angles due to incorrect rotamer assignment and, after calculating interactions with neighboring atoms, replaces them with the correct rotamer assignment. FoldX subsequently performs a local optimization of the side chains to relieve van der Waals clashes. Lastly, FoldX identifies residues with high free energies and samples new rotamer combinations composed of these residues and their neighbors to pinpoint new free energy minima.

After optimizing the original 3D structures, all mutant structures were generated using the FoldX BuildModel command [64]. Subsequently, FoldX selects the rotamer with the optimal placement. Mutant free energy changes are lastly calculated based upon these final structures using the FoldX free energy function [53]. Given their previous success reproducing experimental free energy changes [57], in this work, the FoldX weights in its free energy function were set to their default values. To compute ΔΔG values, the ΔG of the wild type was subtracted from the ΔG of each mutant. In many cases, this leads to fortuitous error cancellations, particularly involving the difficult-to-evaluate free energies of the unfolded proteins, that improve the overall accuracy of the predictions. For the MD+FoldX calculations, the final ΔΔGfold values for each mutant were obtained by averaging the FoldX results across all individual snapshot estimates.
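The MD+FoldX bookkeeping described above can be sketched as follows. For each snapshot, ΔΔG = ΔG(mutant) - ΔG(wild type) is computed, and these differences are then averaged. This sketch assumes that each mutant energy is paired with the wild type energy of the same repaired snapshot. The energies are hypothetical stand-ins for FoldX output, not real values.

```python
from statistics import mean, stdev


def ddg_per_snapshot(dg_mutant, dg_wild_type):
    """ddG = dG(mutant) - dG(wild type), paired snapshot by snapshot (kcal/mol)."""
    if len(dg_mutant) != len(dg_wild_type):
        raise ValueError("mutant and wild-type energies must be paired by snapshot")
    return [m - w for m, w in zip(dg_mutant, dg_wild_type)]


def md_foldx_ddg(dg_mutant, dg_wild_type):
    """Average the per-snapshot ddG values; also return their standard deviation."""
    ddgs = ddg_per_snapshot(dg_mutant, dg_wild_type)
    return mean(ddgs), stdev(ddgs)


# HYPOTHETICAL FoldX total energies for 4 snapshots (the protocol above uses 100).
wt = [-120.0, -118.5, -121.2, -119.3]
mut = [-117.5, -116.0, -118.0, -117.8]
avg, spread = md_foldx_ddg(mut, wt)
print(f"ddG_fold = {avg:.3f} +/- {spread:.3f} kcal/mol")
```

Because averaging is linear, the mean of per-snapshot differences equals the difference of mean energies. The per-snapshot form is still convenient because its spread indicates how sensitive a prediction is to the starting conformation.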

Computing free energies of folding with PyRosetta

In order to improve upon the accuracy of our FoldX calculations, we also employed PyRosetta to compute free energies of folding. PyRosetta is a Python-based interface to the Rosetta molecular modeling package that enables users to design and implement structure prediction and design algorithms using Rosetta’s underlying sampling and scoring functions [54]. Because PyRosetta possesses more robust ways of relaxing mutant structure side chains than FoldX, it is expected to yield more accurate predictions than FoldX, particularly for mutants in which more compact or weakly charged wild type residues are substituted with more voluminous or highly charged mutant residues. Recent work by Kellogg et al. has demonstrated that PyRosetta can achieve correlation coefficients in excess of 0.5 against databases of experimental folding free energies [65].

In this work, we used PyRosetta-4 [54] to compute the difference in Rosetta score, directly in experimentally comparable units of kcal/mol [66], between each mutant structure and wild type TEM-1 as represented by the PDB 1xpb structure [60]. During our PyRosetta simulations, we first repacked all 1xpb side chains by sampling from the 2010 Dunbrack rotamer library [67], and then used Monte Carlo (MC) sampling coupled with energy minimization to optimize the wild type structure based upon the Rosetta REF2015 scoring function [66]. Next, we introduced each missense mutation and repacked all residues within 10 Å of the mutated residue’s center, followed by a linear minimization of the backbone and all side chains. As part of our protocol, we performed five independent simulations of 300,000 Monte Carlo cycles each, and the predicted ΔΔGfold value was taken to be the average of the two lowest-scoring structures of the five. During each simulation, each mutant structure was perturbed, and each perturbation was accepted or rejected based upon the Metropolis criterion. We selected a 10 Å repacking radius as a reasonable compromise between computational expediency and accurate relaxation that avoids becoming trapped in metastable minima [38], although other radii could in principle be selected. Because PyRosetta performs Monte Carlo sampling and minimization, it is significantly more computationally expensive than FoldX. We therefore primarily applied it to the Wylie mutant data set and to mutants for which FoldX predictions seemed questionable relative to other experimental and simulation predictions.
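Two pieces of the protocol above are independent of Rosetta itself and can be sketched in plain Python: the Metropolis acceptance test applied to each perturbation, and the rule that the predicted ΔΔGfold is the mean of the two lowest of five final scores relative to the wild type. The temperature factor `kt` and all scores below are hypothetical. This sketch does not call PyRosetta and is not the authors’ script.

```python
import math
import random


def metropolis_accept(delta_score, kt, rng):
    """Accept a perturbation with probability min(1, exp(-delta_score / kt))."""
    if delta_score <= 0.0:
        return True  # downhill moves are always accepted
    return rng.random() < math.exp(-delta_score / kt)


def predicted_ddg(final_mutant_scores, wild_type_score, n_best=2):
    """Mean of the n_best lowest final scores across independent runs, minus wild type."""
    best = sorted(final_mutant_scores)[:n_best]
    return sum(best) / len(best) - wild_type_score


# HYPOTHETICAL final scores (kcal/mol-like units) from five independent runs.
runs = [-812.4, -815.1, -809.9, -814.3, -811.0]
wild_type = -818.0
print(f"predicted ddG_fold = {predicted_ddg(runs, wild_type):.2f}")
print(metropolis_accept(-0.5, 1.0, random.Random(0)))
```

Averaging the best two runs, rather than taking only the single best, damps the effect of one trajectory that happens to find an unusually deep minimum.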

Computing ampicillin binding free energies with AutoDock Vina

As residues nearest the active site are expected to most influence binding free energies (for exceptions, see Stiffler et al. [41]), ampicillin docking simulations were performed on mutants of residues within 8 Å of the active site. Ampicillin was chosen as a substrate so as to be consistent with the ampicillin-based minimum inhibitory concentration (MIC) and fitness experiments described below. Docking was performed using AutoDock Vina (ADV) 1.1.2 [56]. As with PyRosetta, ADV uses a physically-inspired scoring function that takes protein and ligand conformations as input and whose terms are weighted to reproduce experimental binding free energies [56]. With this scoring function, ADV determines the lowest free energy protein-ligand conformations by taking a sequence of steps, each consisting of a proposed change in conformation followed by an optimization performed by the Iterated Local Search global optimizer algorithm [68, 69]. The performance of the AutoDock and ADV scoring functions was compared with that of 29 other scoring functions in the CASF-2013 benchmark [70]. This benchmark showed that ADV outperforms AutoDock and 75% of the other tested methods. ADV therefore represents a useful compromise between speed and accuracy for the purposes of high-throughput calculations.

For our ADV simulations, the ligand PDBQT input file was prepared by first downloading the ampicillin structure file from PubChem (https://pubchem.ncbi.nlm.nih.gov/compound/6249). Because of ampicillin’s low pKa of 2.5, all ampicillin carboxylic acid groups were modeled as carboxylates in our ADV simulations. Six of the ligand’s 32 bonds were made rotatable. Then, all hydrogens were added to the ligand using AutoDockTools, Gasteiger charges were computed, and the non-polar hydrogens were merged. As no co-crystal structure of ampicillin with TEM-1 is publicly available, preliminary docking calculations first had to be performed on wild type TEM-1 to identify a reliable ampicillin docked pose before mutants were docked (see S9 Fig in S1 File). After finding that pose, all 12 residues within 8 Å of the alpha carbon of residue 70 (representative of the active site) were selected for binding affinity calculations. First, PyRosetta was used to introduce each missense mutation and relax each mutant structure. Then, to prepare each receptor PDBQT file, AutoDockTools was used to add polar hydrogens to the macromolecule and to assign Kollman united-atom charges. To create a configuration file, a grid box of 24 Å × 26 Å × 24 Å was generated and centered on the α-carbon of residue 70. We chose a relatively small grid box to reduce the chance that ampicillin binds to a region of the enzyme outside the active site. We set num_modes to 5 and exhaustiveness to 8; the remaining docking parameters were kept at their default values. At the conclusion of each docking calculation, the predicted binding affinity (in kcal/mol) of the best of the 5 generated binding modes was selected and reported in the figures below.
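The residue selection and docking setup above can be sketched as three small helpers. The first selects residues near residue 70. This sketch assumes a Cα-to-Cα distance criterion, because the text does not say which atoms of the other residues were measured. The second writes a Vina configuration file using Vina’s standard `receptor`, `ligand`, `center_*`, `size_*`, `num_modes` and `exhaustiveness` keys, with the box size and settings given above. The third picks the best (most negative) mode. Coordinates and file names are hypothetical.

```python
import math


def residues_within(ca_coords, center, cutoff=8.0):
    """Residue numbers whose C-alpha lies within `cutoff` Angstrom of `center`."""
    return sorted(num for num, xyz in ca_coords.items() if math.dist(xyz, center) <= cutoff)


def vina_config(receptor, ligand, center, size=(24.0, 26.0, 24.0), num_modes=5, exhaustiveness=8):
    """Text of an AutoDock Vina config file for a search box centered on `center`."""
    cx, cy, cz = center
    sx, sy, sz = size
    lines = [
        f"receptor = {receptor}",
        f"ligand = {ligand}",
        f"center_x = {cx:.3f}",
        f"center_y = {cy:.3f}",
        f"center_z = {cz:.3f}",
        f"size_x = {sx:.1f}",
        f"size_y = {sy:.1f}",
        f"size_z = {sz:.1f}",
        f"num_modes = {num_modes}",
        f"exhaustiveness = {exhaustiveness}",
    ]
    return "\n".join(lines) + "\n"


def best_affinity(mode_affinities):
    """Vina affinities are in kcal/mol; the most negative value is the best mode."""
    return min(mode_affinities)


# HYPOTHETICAL C-alpha coordinates (Angstrom); residue 70 marks the active site.
ca = {70: (0.0, 0.0, 0.0), 73: (3.0, 4.0, 0.0), 130: (6.0, 6.0, 6.0), 166: (0.0, 0.0, 8.0)}
cfg = vina_config("tem1_mutant.pdbqt", "ampicillin.pdbqt", ca[70])  # hypothetical file names
print(residues_within(ca, ca[70]))
print(cfg)
print(best_affinity([-6.1, -7.4, -5.9]))
```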

Experimental measures of fitness

There are numerous ways of characterizing organismal fitness, even within the same organism. As this study focuses on mutations to the TEM-1 enzyme that confer antibiotic resistance, we have chosen to gauge the fitness of bacteria containing TEM-1 mutants based upon how resistant they remain to one of the primary β-lactam antibiotics, ampicillin. This resistance may be quantified by minimum inhibitory concentrations (MIC), which are the lowest concentrations of, in this case, ampicillin that prevent all detectable bacterial growth. While we have determined the MIC values of the Wylie mutant data set (see the Supporting Information, including S6 and S7 Figs in S1 File, for further information), in our analyses we primarily employ the more comprehensive set of fitness values acquired by Firnberg et al. (see Fig 1) [38]. Firnberg et al. quantified each mutant’s fitness by taking an average of the number of copies of the mutant alleles weighted by the range of ampicillin concentrations at which they were grown and normalizing this by the wild type average [38]. We have verified that the Firnberg fitness values correlate well (r2 > 0.7) with our previously determined MIC values as well as with other data sets we have acquired, thus validating their use here (see S6 Fig in S1 File).
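The Firnberg-style fitness score described above can be sketched as follows, under one plausible reading of that description. For each allele, take a counts-weighted average of a per-bin value over the ampicillin selection bins, then divide by the same quantity for the wild type. Using log2 of the concentration as the per-bin value, ignoring per-bin sequencing depth, and the bin concentrations and counts themselves are all assumptions made for illustration. Consult Firnberg et al. [38] for the exact definition.

```python
import math


def weighted_bin_score(counts, bin_values):
    """Counts-weighted mean of per-bin values (ASSUMED here: log2 ampicillin concentration)."""
    total = sum(counts)
    if total == 0:
        raise ValueError("allele was not observed in any bin")
    return sum(c * v for c, v in zip(counts, bin_values)) / total


def relative_fitness(mut_counts, wt_counts, bin_values):
    """Mutant score normalized by the wild-type score, so the wild type maps to 1.0."""
    return weighted_bin_score(mut_counts, bin_values) / weighted_bin_score(wt_counts, bin_values)


# HYPOTHETICAL selection bins (ug/ml) and read counts.
concentrations = [1, 4, 16, 64, 256, 1024]
bins = [math.log2(a) for a in concentrations]  # 0, 2, 4, ..., 10
wt_counts = [0, 0, 10, 30, 50, 10]
mut_counts = [20, 50, 30, 0, 0, 0]
print(round(relative_fitness(mut_counts, wt_counts, bins), 4))
```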

Results

The Wylie mutant data set: Direct comparisons between computational and experimental free energies of folding

Accuracy of single mutant predictions

In order to first assess how accurately computational techniques predict the ΔΔGfold values of β-lactamase, we begin by comparing our experimentally-determined folding free energies to our computational predictions from both MD+FoldX and PyRosetta. Note that throughout the remainder of this paper, ‘free energy changes’ will refer to ΔΔG values for brevity. Because experimentally determining folding free energies is inherently low throughput, we focused our experimental efforts on the Wylie mutant data set (see Fig 1 for a detailed list of these mutants). The residues within this set are all greater than 6 Å from the active site and were purposefully selected because, at these distances, they were expected to have potentially significant effects on folding, but limited effects on binding and kinetics, allowing us to mostly attribute their influence on fitness to folding changes. The one exception is the S70G mutant, as residue 70 resides in the heart of the binding site and transforming β-lactamase’s primary catalytic serine into an inert glycine is known to markedly decrease the enzyme’s catalytic efficiency, yet markedly increase its folding stability [71, 72]. In Fig 3, we plot experimentally-determined folding free energies against computationally-predicted free energies for the single mutants. We have shaded the first and third quadrants in this figure to ease identification of the mutants whose experimental and computational free energies are of the same sign. It is thus gratifying to see that both MD+FoldX and PyRosetta free energies of folding positively correlate with the experimentally determined values for these mutants: more positive experimental values are matched by more positive computational predictions, while more negative experimental values are matched by more negative computational predictions (see S1 Fig in S1 File for purely FoldX predictions, which parallel the MD+FoldX results). 
Indeed, as can be determined by counting the number of mutants in the shaded regions, MD+FoldX correctly predicts the signs of 14 out of 15 mutants, while PyRosetta does so for 11 out of 15 mutants. In general, both MD+FoldX and PyRosetta predict the majority of the mutants to lie in the same relative places on these plots (see S2 Fig in S1 File for a direct comparison of MD+FoldX and PyRosetta predictions). It is moreover pleasing to see that PyRosetta predicts S70G, which is known to be stabilizing and is thus in some sense a control, to have a negative ΔΔGfold value; MD+FoldX fortunately also yields a reasonably accurate, although not fully stabilizing, prediction for this mutant. In concurrence with the results for free energy distributions presented in the next section, even this relatively small data set demonstrates that the majority of mutants destabilize folding. The ΔGfold value for β-lactamase that we determined via experiment is -8.4 kcal/mol. As mutants lose their folding stability as their ΔGfold values approach 0, and ΔGfold,mutant = ΔGfold,wildtype + ΔΔGfold,mutant, many of the mutations we have studied that have ΔΔGfold values nearing 8 kcal/mol lie on the verge of unfolding the protein.
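The two quantitative arguments above can be made concrete. The first is counting mutants in the shaded first and third quadrants, meaning experiment and prediction share a sign, with exact zeros counted as disagreement. The second adds ΔΔGfold to the wild type ΔGfold of -8.4 kcal/mol and converts the result into a two-state folded fraction. The temperature of 25 °C is an assumption, since the text does not state the reference temperature. The sign-test values are hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # kcal mol^-1 K^-1


def same_sign_count(experimental, predicted):
    """Mutants in the first or third quadrant: experiment and prediction share a sign."""
    return sum(1 for e, p in zip(experimental, predicted) if e * p > 0)


def mutant_dg(dg_wild_type, ddg):
    """dG_fold(mutant) = dG_fold(wild type) + ddG_fold(mutant)."""
    return dg_wild_type + ddg


def fraction_folded(dg_fold, t_k=298.15):
    """Two-state folded fraction: K_fold = exp(-dG_fold / RT), f = K / (1 + K)."""
    k_fold = math.exp(-dg_fold / (R_KCAL * t_k))
    return k_fold / (1.0 + k_fold)


dg_mut = mutant_dg(-8.4, 8.0)  # a mutant with ddG_fold near 8 kcal/mol
print(f"dG_mut = {dg_mut:.1f} kcal/mol, folded fraction = {fraction_folded(dg_mut):.3f}")
print(f"wild type folded fraction = {fraction_folded(-8.4):.6f}")
```

Under these assumptions, the wild type is essentially fully folded. A mutant with ΔΔGfold near 8 kcal/mol retains only about two thirds of its molecules in the folded state, which illustrates the "verge of unfolding" argument.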

Fig 3. Scatterplots depicting the correlation between experimentally- and computationally-determined β-lactamase TEM-1 free energies of folding for single mutants.


(Left) Correlation based upon MD+FoldX free energy predictions. (Right) Correlation based upon PyRosetta free energy predictions. The labels on the points indicate the mutant name based upon residue changes to the wild type structure as described above. The MD+FoldX coefficient of determination is r2 = 0.071, which is significantly lower than the r2 = 0.44 value obtained using PyRosetta. The shaded regions delineate the first and third quadrants of the plot, which contain mutants whose free energies are of the same sign according to both experiment and computation.

Despite the qualitative agreement between the Fig 3 panels, however, they do differ quantitatively. First and foremost, the range of PyRosetta folding free energies is significantly larger than the range of MD+FoldX folding free energies. Much of this difference in range may be attributed to PyRosetta’s strongly negative ΔΔGfold values for the S70G, E212K, and K234Q mutants. Without these mutants, PyRosetta’s ability to predict the experimental data would significantly decline. MD+FoldX also seems less able to discern experimentally stable from unstable mutants, as it predicts many mutants to be less stable than they are in reality. In combination, these factors make PyRosetta more strongly correlated with the experimental data, as evidenced by its coefficient of determination of 0.44 relative to MD+FoldX’s 0.071. Indeed, PyRosetta’s correlation coefficient of 0.67 for β-lactamase is among the highest PyRosetta correlation coefficients for proteins published in the literature [73, 74]. To complement our regression analysis, we additionally computed Spearman’s rank correlation coefficients [75] for our Experiment vs. MD+FoldX and Experiment vs. PyRosetta data sets. Spearman’s rank correlation coefficient assesses how correlated the rank orders of two lists, here based upon ΔΔGfold magnitudes, are. In concurrence with our regression results, we obtain rank correlation coefficients of 0.22 for MD+FoldX and 0.44 for PyRosetta, corroborating that PyRosetta more accurately captures the experimental data, even beyond a simple linear model. Overall, these results suggest that, while current computational tools are not perfect, they can qualitatively predict single mutant free energy trends.
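To make the distinction between the two statistics concrete, the sketch below computes the Pearson coefficient (whose square is the r² of a simple linear fit) and Spearman’s ρ. Spearman’s ρ is computed as the Pearson correlation of ranks, with tied values assigned the average of the ranks they span. The toy data are monotonic but nonlinear. They therefore give ρ = 1 while r² stays below 1, which is why the rank statistic can reward a predictor that orders mutants correctly without being linearly calibrated.

```python
def pearson_r(x, y):
    """Pearson correlation coefficient; its square is the r^2 of a simple linear fit."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of (tie-averaged) ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [v ** 3 for v in x]  # monotonic but nonlinear
print(f"r^2 = {pearson_r(x, y) ** 2:.3f}, rho = {spearman_rho(x, y):.3f}")
```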

PyRosetta likely outperforms FoldX at quantifying the folding free energies of the Wylie single mutants because the majority of these mutants are solvent accessible (see S5 Fig in S1 File). Past work comparing the accuracies of PyRosetta and FoldX on a combination of Guerois [47] and ProTherm database mutants [76] demonstrated that, relative to other classes of mutants, Rosetta performs best and FoldX performs worst on mutants involving solvent-exposed residues [57, 74]. This is because FoldX often implausibly favors placing hydrophobic residues on protein surfaces.

Accuracy of double mutant predictions

Given their ability to predict the folding free energy trends of single residue mutants, for which they were largely designed, we next explored how well these computational techniques perform on double mutants constructed from Wylie data set constituent single mutants. While many multiply-mutated proteins are known to possess folding free energies that are simply the sums of their constituent single mutation free energies because their constituent mutations act largely independently, some of the most biophysically intriguing mutations possess non-additive free energies and thus lead to epistatic effects that can dictate the course of protein evolution [77–79]. The right-most panel of Fig 4, which plots the folding free energies directly measured for double mutant structures against those obtained by adding the free energies of the constituent single mutants, illustrates that three of the Wylie double mutants, K234Q/R241H, A172P/G283C, and E212K/G218V, possess essentially additive folding free energies, while three others, the A213G/L57H, G144E/L199F, and D163Y/R93S mutants, possess non-additive free energies according to experiment (see S4 Fig in S1 File for a tabulation of the underlying quantitative data). Despite being limited, this set of mutants is thus ripe for benchmarking how predictive computational techniques are for multiply-mutated proteins. Interestingly, we find that, regardless of whether experiment finds the mutants to be additive or non-additive, FoldX and MD+FoldX always yield additive predictions (middle panel of Fig 4). The additivity of MD+FoldX, even when supplemented with MD relaxation of the original wild type structure, may be anticipated from the fact that it does not globally relax mutant conformations. In contrast, PyRosetta generally yields non-additive predictions (left-most panel of Fig 4).
It is because of this non-additivity that PyRosetta outperforms FoldX in predicting the folding free energies of the Wylie double mutants, as depicted in Fig 5. Nevertheless, the fact that PyRosetta’s double mutant free energy predictions are always superadditive, likely because it is unable to fully relax double mutant structures, also makes its predictions questionable.
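The additivity comparison in Fig 4 amounts to checking each double mutant’s ΔΔGfold against the sum of its two constituent singles. Below is a minimal sketch of that check. The tolerance `tol` used to call a point “additive” is a hypothetical choice (the text does not specify one), and we assume the convention that positive ΔΔGfold is destabilizing, so “superadditive” here means the double mutant is more destabilized than the sum of its singles.

```python
def epistasis(ddg_double, ddg_single_a, ddg_single_b):
    """Deviation from additivity in kcal/mol: ddG(AB) - [ddG(A) + ddG(B)]."""
    return ddg_double - (ddg_single_a + ddg_single_b)


def classify_additivity(ddg_double, ddg_single_a, ddg_single_b, tol=0.5):
    """Label a double mutant as additive, superadditive, or subadditive.

    `tol` (kcal/mol) is a hypothetical tolerance: points within `tol` of the
    y = x line of a plot like Fig 4 are treated as additive. Positive ddG is
    assumed to mean destabilizing.
    """
    eps = epistasis(ddg_double, ddg_single_a, ddg_single_b)
    if abs(eps) <= tol:
        return "additive"
    return "superadditive" if eps > 0 else "subadditive"
```

Under this test, a method that simply sums single-mutant terms, as the text describes for FoldX and MD+FoldX, would label every double mutant “additive” by construction, whatever the experimental behavior.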

Fig 4. Scatterplots of directly computed or measured Wylie double mutant free energies vs. the free energies obtained by adding their corresponding constituent single mutant free energies.

(Left) Results for PyRosetta; (Middle) MD+FoldX; and (Right) Experiment. The labels on the points indicate the double mutants involved. The dotted lines are y = x lines, which indicate where computed or measured double mutant free energies are perfectly additive. Those points closest to the plotted lines are thus closest to being additive.

Fig 5. Scatterplots of experimental vs. predicted folding free energies for the Wylie double mutants.

(Left) Experiment vs. MD+FoldX predictions; (Right) Experiment vs. PyRosetta predictions. The shaded regions delineate the first and third quadrants of the plot, which contain mutants whose free energies are of the same sign according to both experiment and computation.

That said, it is noteworthy that both PyRosetta and MD+FoldX are more accurate at predicting the folding free energies of this set of double mutants than those of the single mutants presented above (see S3 Fig in S1 File for a scatterplot of all of the Wylie mutants). This is likely an artifact of the small double mutant sample size, but it is disconcerting that MD+FoldX achieves this larger double mutant correlation based upon the incorrect assumption of additive free energies and that PyRosetta does so based upon consistently superadditive predictions. All in all, these results call for the development of fast yet accurate techniques that pair a computationally affordable amount of structural relaxation with empirical, if not fully physical, energy functions that are also parameterized to account for multiply-mutated proteins.

Firnberg mutant dataset: Analyzing the correlation between free energies of folding and fitness

Folding free energy distributions

Encouraged by our computational predictions for the Wylie mutant data set, we next used simulation to predict the free energy trends of all β-lactamase single mutants, with the ultimate aim of characterizing their influence on β-lactamase fitness. We find that the shape of β-lactamase’s folding free energy distribution according to FoldX may best be fit by a gamma distribution (see Fig 6), which can capture the right skew of the distribution due to the substantial number of mutants predicted to have ΔΔGfold > 10 kcal/mol. Here, a gamma distribution, denoted Γ(α, β), is characterized by its shape parameter α, which determines whether the distribution is exponentially shaped (for α ≤ 1) or mounded (for α > 1; the further α is above one, the less skewed the distribution is), and by its scale parameter β, which determines how slowly the distribution decays, with distributions with larger values of β decaying more slowly than those with smaller values [80]. Interestingly, we find only very slight differences between the FoldX and MD+FoldX distributions. MD relaxation of the wild type structure simply shifts some probability from large free energy mutants toward low free energy mutants. The fact that the overall form of the distribution is preserved with and without MD suggests that it is most strongly influenced by the FoldX scoring function. We then compared this distribution to that obtained by Firnberg et al. with PyRosetta as well as to our own PoPMuSiC results. PoPMuSiC is a popular web server for protein stability prediction (see the Supporting Information for further details regarding the PoPMuSiC algorithm). While the PyRosetta distribution may also be described by a gamma distribution, with a slightly larger right skew, the PoPMuSiC distribution was best described by a normal distribution.
This is because PoPMuSiC considers neither mutations that destabilize the structure by more than 5 kcal/mol nor those involving a proline, which are likely to induce significant structural modifications [55]. Many of the large free energy changes predicted by FoldX and PyRosetta stem from these expedient methods’ inability to fully relax the structures of mutants in which tryptophans or other volumetrically bulky residues replace volumetrically smaller amino acids (for an illustration, see Fig 8). While experiments find that, in certain cases, tryptophans do in fact grossly destabilize the enzyme [59], in other cases FoldX and PyRosetta strongly overestimate tryptophan-induced clashes and their related free energies. The key point to take from this comparison of distributions is that, while disparate computationally expedient techniques may differ in their predictions for specific mutant free energies, they yield similar free energy trends overall, particularly for mildly destabilizing mutants.
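The text does not state how the Γ fits in Fig 6 were obtained. As one simple possibility, the sketch below estimates gamma parameters by the method of moments. This is a swapped-in technique; maximum likelihood, for example `scipy.stats.gamma.fit`, is a common alternative. It parameterizes Γ by shape α and scale θ, so a larger θ gives a slower-decaying tail. The optional `shift` is a hypothetical location offset for samples that include negative (stabilizing) ΔΔGfold values, since a gamma distribution has positive support.

```python
from statistics import mean, pvariance


def gamma_moments_fit(sample, shift=0.0):
    """Method-of-moments estimates (shape alpha, scale theta) for a gamma fit.

    For Γ(α, θ): mean = α·θ and variance = α·θ², hence
    α = mean² / variance and θ = variance / mean (the rate would be 1/θ).
    `shift` is added to every value first so the data are positive; it is a
    hypothetical location parameter, not something the paper specifies.
    """
    data = [x + shift for x in sample]
    m = mean(data)
    if m <= 0:
        raise ValueError("shifted sample mean must be positive")
    v = pvariance(data, mu=m)
    if v == 0:
        raise ValueError("sample variance is zero; gamma fit is undefined")
    return m * m / v, v / m
```

Moment matching is quick and needs no optimizer, but it is sensitive to a heavy right tail such as the ΔΔGfold > 10 kcal/mol mutants, which is one reason likelihood-based fits are often preferred.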

Fig 6. The probability distribution functions of folding free energies based upon PoPMuSiC, FoldX, PyRosetta, and MD+FoldX results.

The curves depicted are histograms of the data, while the parameters given in the legend are based on smooth fits. The PoPMuSiC distribution is significantly more peaked and less skewed than the other distributions, making it most consistent with a Gaussian distribution. FoldX, MD+FoldX, and PyRosetta all possess more right-skewed distributions with significant high free energy tails such that their distributions are best captured by Γ-distributions [80].

Fig 8. β-lactamase fitness values as measured by Firnberg et al. vs. MD+FoldX predictions of the folding free energy for all β-lactamase single mutants.

(Right) All mutants; (Left) Mutants with ΔΔGfold < 5 kcal/mol. Mutants involving residues within 8 Å of the binding site and substitutions to tryptophan, proline, and glycine are colored in red, green, purple, and pink, respectively. All other mutants are depicted in blue. As labeled, many of the mutants with the largest predicted free energies of folding involve substitutions to tryptophan (W), resulting in large ΔGclash terms within FoldX.

As a further physical check on MD+FoldX’s predictions, we additionally analyze its predicted folding free energies as a function of residue number. β-lactamase crystal structures reveal that β-lactamase is composed of two domains comprising a total of 5 β-sheets and 9 α-helices [60, 81]. Mutations that disrupt the formation of these secondary structures are thus the most likely to significantly alter β-lactamase’s folding free energy. In Fig 7, we plot the folding free energies as a function of mutant residue number and color the residues according to the secondary structures they form. This plot shows that many of the largest predicted folding free energy changes occur in regions with stabilizing β-sheet or α-helical character. Since previous work has shown that FoldX predicts ΔΔGfold values essentially as accurately for helix and sheet regions as for all other residues [57], the large free energy changes observed in these regions are likely due to the disruption of secondary structure and therefore lend support to the predictive capacity of this method.
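The secondary-structure comparison in Fig 7 can also be summarized numerically by grouping predicted ΔΔGfold values by the structural class of the mutated residue. In this sketch, the residue-to-class mapping `ss_of_residue` is a hypothetical input, for example derived from a DSSP assignment of a β-lactamase crystal structure; it is not taken from the paper.

```python
from collections import defaultdict
from statistics import mean


def mean_ddg_by_structure(mutants, ss_of_residue):
    """Average predicted ddG_fold (kcal/mol) per secondary-structure class.

    mutants: iterable of (residue_number, ddg_fold) pairs, one per single mutant.
    ss_of_residue: dict mapping residue number -> "sheet", "helix", or "other";
    residues missing from the dict are counted as "other".
    """
    groups = defaultdict(list)
    for residue, ddg in mutants:
        groups[ss_of_residue.get(residue, "other")].append(ddg)
    return {ss: mean(values) for ss, values in groups.items()}
```

Class means compress Fig 7 into three numbers, which makes it easy to check whether sheet and helix positions carry systematically larger predicted destabilization than the remaining residues.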

Fig 7. β-lactamase folding free energies predicted using MD+FoldX vs. residue number.

Each point depicts a possible single-residue mutant. Those residues located in β sheets are depicted in red, those in α helices are depicted in blue, and all others are depicted in green. Many of the mutants with the largest ΔΔGfold values are present in β sheets, α helices, or interestingly, around key catalytic residues (such as S70, S130, and K234 as discussed in the Supplemental Information).

Correlation between Firnberg fitness data and folding free energies

With these modeling considerations in mind, we then returned to our original goal of understanding how predictive free energies of folding are of protein fitness by comparing our folding free energies against Firnberg et al.’s [38] fitness data. The left-hand panel of Fig 8 and S2 Table in S1 File show that folding free energies are reasonable predictors of fitness: many (2175) mutants predicted to be stable, with ΔΔGfold < 5 kcal/mol, are in fact fit, possessing fitness values greater than 0.5 (so-called ‘true positives’). We note that the ΔΔGfold and fitness cutoffs used in these definitions are somewhat arbitrary; a ΔΔGfold cutoff of 5 kcal/mol was selected because β-lactamase unfolds at a destabilization of roughly 8 kcal/mol and is thus expected to be unstable above 5 kcal/mol. There are additionally many (506) mutants predicted to be unstable, with ΔΔGfold > 5 kcal/mol, that are unfit, possessing fitness values less than 0.5 (so-called ‘true negatives’), also as one would hope. As can be inferred from the labeled residues toward the right of the left-hand panel of Fig 8, most of these large free energy mutants involve substitutions of volumetrically small residues, such as glycine and alanine, with larger, bulky residues, such as tryptophan and tyrosine, which dramatically raise the free energy contributions associated with steric clash. Remarkably, our plots manifest strikingly few (22) cases in which mutants with large folding free energies (>5 kcal/mol) possess high fitness (>0.5), which we term false negatives. Even though this is heartening, there nevertheless exist numerous (2079) false positives: mutants in the lower left corner of the plot whose small folding free energy differences (<5 kcal/mol), which one would expect to correspond to high fitness values (>0.5), nonetheless map to low fitness values (<0.5).
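The four categories above follow from two cutoffs, 5 kcal/mol on ΔΔGfold and 0.5 on fitness. A minimal counting sketch follows; how points lying exactly on a cutoff are assigned is our assumption (here they count as unstable or unfit), since the text uses strict inequalities on both sides.

```python
def classify_mutants(points, ddg_cut=5.0, fit_cut=0.5):
    """Count true/false positives/negatives for (ddg_fold, fitness) pairs.

    Predicted stable (ddG < ddg_cut) and fit (fitness > fit_cut): true positive.
    Predicted stable but unfit: false positive.
    Predicted unstable but fit: false negative.
    Predicted unstable and unfit: true negative.
    Boundary values (ddG == ddg_cut or fitness == fit_cut) are treated as
    unstable / unfit, which is an assumption of this sketch.
    """
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for ddg, fitness in points:
        stable = ddg < ddg_cut
        fit = fitness > fit_cut
        if stable and fit:
            counts["TP"] += 1
        elif stable:
            counts["FP"] += 1
        elif fit:
            counts["FN"] += 1
        else:
            counts["TN"] += 1
    return counts
```

Because the cutoffs are keyword arguments, the sensitivity of the true-positive and false-positive counts to the somewhat arbitrary 5 kcal/mol and 0.5 thresholds can be probed by rerunning with different values.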
It is also clear from the right-hand panel of Fig 8 that small (< 5 kcal/mol) changes in folding free energy, which destabilize the protein but do not unfold it (given its ΔGfold ∼ −8 kcal/mol), lead to a wide range of fitness values and are therefore not strongly correlated with fitness. Putting these factors together, we find that roughly 22% of the variance in β-lactamase fitness can be explained by linearly fitting a regression line to the MD+FoldX data points with −5 ≤ ΔΔGfold ≤ 10 kcal/mol (to remove the influence of large free energy outliers on a linear fit). Better fits that account for mutants with larger folding free energies may be obtained by fitting non-linear functions to the data. As we discuss in the Supplemental Information, the full data set may best be modeled by the offset Duncan equation, $f_{\text{fold}}(x) = -\frac{0.804}{\left[1 + e^{-(x + 8.30)/1.17}\right]^{3597}} + 0.871$, with a correlation coefficient of r = 0.647. This is an improved correlation coefficient, but the fit still struggles to accommodate the problematic false positives.
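The offset Duncan fit quoted above, read as f(x) = −0.804 / [1 + e^{−(x+8.30)/1.17}]^{3597} + 0.871 with x = ΔΔGfold in kcal/mol, is awkward to evaluate directly: raising the bracket to the 3597th power overflows double precision for sufficiently negative (stabilizing) x. A minimal sketch that evaluates it in log space follows; the function names and the log-space evaluation are our own choices, not the authors’ code. The curve falls from a plateau of 0.871 for stable mutants to 0.871 − 0.804 = 0.067 for strongly destabilized ones.

```python
import math


def softplus(z):
    """Numerically stable log(1 + e^z)."""
    return z + math.log1p(math.exp(-z)) if z > 0 else math.log1p(math.exp(z))


def offset_duncan(x, a=-0.804, x0=-8.30, s=1.17, nu=3597.0, offset=0.871):
    """Evaluate a / [1 + exp(-(x - x0)/s)]**nu + offset without overflow.

    Default constants are the fitted values quoted in the text; x is ddG_fold
    in kcal/mol. The power term is computed as exp(-nu * log(1 + e^z)) with
    z = -(x - x0)/s, so the exponent passed to math.exp is never positive.
    """
    z = -(x - x0) / s
    return a * math.exp(-nu * softplus(z)) + offset
```

Evaluating this function on a grid of ΔΔGfold values places the midpoint of the drop at roughly 2 kcal/mol, so by this fit much of the loss of fitness occurs below the 5 kcal/mol cutoff used above.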

To better understand the origin of these many false positives, we have colored mutations within 8 Å of the active site and those involving tryptophan, proline, or glycine in Fig 8. It is well known that FoldX predictions for mutants containing proline and glycine are often incorrect, as both residues disrupt protein secondary structure and glycine side chains are so minimal that replacing them with most other amino acids results in prohibitive levels of steric clash [57]. As already discussed and clearly indicated by the labeled residues in the figure, substitutions to tryptophan often generate aberrantly large folding free energies. Moreover, mutations located near the active site are more likely to change fitness by affecting enzyme catalysis than by affecting protein folding. This has been borne out in a previous study in which the predictive capacity of FoldX for β-lactamase MIC values increased from 0.15 to 0.19 when active site residues were excluded from consideration [39]. Discarding all of these mutant classes removes roughly 12% of the false positives, here defined as mutants with a fitness between 0 and 0.5 and a ΔΔGfold value between −5 and 10 kcal/mol. This boosts the percentage of variance in fitness explained by linear fits to our MD+FoldX calculations to 23.8%, a slight, but not profound, improvement over our previous fit.
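The exclusion step above can be sketched as a filter followed by an ordinary least-squares refit. This is a minimal illustration under assumed inputs: each mutant is a dict whose key names (`ddg`, `fitness`, `target_aa`, `dist_to_site`) are hypothetical, `target_aa` is the one-letter code of the substituted-in residue, and `dist_to_site` is the mutated residue’s distance in Å from the active site.

```python
from statistics import mean


def linear_r2(x, y):
    """Coefficient of determination of an ordinary least-squares line y = a + b*x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    return 1.0 - ss_res / ss_tot


def filtered_r2(mutants, ddg_range=(-5.0, 10.0),
                excluded_targets=frozenset("WPG"), site_cutoff=8.0):
    """Refit fitness vs. ddG_fold after dropping the mutant classes discussed above.

    Keeps mutants whose ddG lies in `ddg_range`, whose substituted residue is not
    W, P, or G, and that lie farther than `site_cutoff` Å from the active site.
    Returns (r², number of mutants kept). Dict keys are hypothetical.
    """
    kept = [
        m for m in mutants
        if ddg_range[0] <= m["ddg"] <= ddg_range[1]
        and m["target_aa"] not in excluded_targets
        and m["dist_to_site"] > site_cutoff
    ]
    r2 = linear_r2([m["ddg"] for m in kept], [m["fitness"] for m in kept])
    return r2, len(kept)
```

Returning the number of retained mutants alongside r² makes it easy to see how much of the data set each exclusion removes, which matters when comparing a fit such as 23.8% against the unfiltered one.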

Correlation between Firnberg fitness and experimental folding free energies

Given the inability of computational folding free energies to fully explain fitness, one may ask whether our lack of predictive power stems from modeling errors. To address this point, in Fig 9 we plot fitness vs. our experimental Wylie free energies. Keeping in mind the limitations of this small data set, there appears to be virtually no correlation between experimental folding free energies and fitness over this narrow range of folding free energy values. ΔΔGfold values less than 5 kcal/mol may, again, simply not be large enough to induce the structural changes needed to clearly impact fitness. The fact that the same conclusion may be drawn from both experimental and computational data adds credence to our computational results. While a larger experimental free energy data set may end up manifesting a stronger correlation between fitness and folding free energies, these results lead one to wonder whether taking other potential thermodynamic predictors, such as binding free energy changes, into account may improve matters.

Fig 9. Scatterplot of fitness vs. Wylie data set experimental free energies of folding.

Virtually no correlation can be observed between fitness and folding free energies based on this data set alone.

Improving fitness predictions with binding calculations

Since many of the mutants whose fitness values cannot be well explained by their ΔΔGfold values reside near the active site, going beyond previous work, we lastly considered these mutants’ computed free energies of binding, ΔΔGbind. Based upon our ADV docking calculations, we indeed find that many of the mutations within 8 Å of the alpha carbon of S70 significantly increase their mutants’ binding free energies. Plotting the fitness of these 8 Å mutants against both their folding and binding free energies, as in Fig 10, shows that a larger fraction of β-lactamase fitness can be explained by a combination of ΔΔGfold and ΔΔGbind data. Based upon non-linear fits to the functional forms given in Fig 10 (see the Supporting Information for fitting details), the fit r-values increase from 0.244 for folding alone and 0.304 for binding alone to 0.360 when both folding and binding are accounted for. The larger r-value for binding than for folding furthermore supports our expectation that binding has the more important influence on fitness for residues neighboring the active site.
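To illustrate how combining predictors can raise the fit r-value, the sketch below fits fitness against ΔΔGfold, ΔΔGbind, or both by ordinary linear least squares. This is a deliberate simplification: the paper’s fits use the non-linear functional forms shown in Fig 10, which are not reproduced here, and the function names are hypothetical.

```python
from statistics import mean


def simple_r(x, y):
    """Pearson r for a one-predictor fit."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def _solve3(m, v):
    """Solve the 3x3 system m·c = v by Gaussian elimination with partial pivoting."""
    a = [list(row) + [rhs] for row, rhs in zip(m, v)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, 3):
            factor = a[r][col] / a[col][col]
            for k in range(col, 4):
                a[r][k] -= factor * a[col][k]
    c = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        c[r] = (a[r][3] - sum(a[r][k] * c[k] for k in range(r + 1, 3))) / a[r][r]
    return c


def multiple_r(fold, bind, fitness):
    """Multiple correlation R for fitness ≈ c0 + c1*fold + c2*bind (linear model)."""
    n = len(fitness)
    cols = [[1.0] * n, [float(f) for f in fold], [float(b) for b in bind]]
    # Normal equations: (XᵀX) c = Xᵀy.
    xtx = [[sum(p * q for p, q in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(p * y for p, y in zip(ci, fitness)) for ci in cols]
    c0, c1, c2 = _solve3(xtx, xty)
    pred = [c0 + c1 * f + c2 * b for f, b in zip(fold, bind)]
    my = mean(fitness)
    ss_res = sum((y - p) ** 2 for y, p in zip(fitness, pred))
    ss_tot = sum((y - my) ** 2 for y in fitness)
    return max(0.0, 1.0 - ss_res / ss_tot) ** 0.5
```

Because the two-predictor linear model contains each one-predictor model as a special case, its in-sample R can never be smaller than either single-predictor |r|; how large the gain is depends on how independent the two predictors are, which is the point made about folding and binding in Fig 10.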

Fig 10. β-lactamase fitness values as measured by Firnberg et al. vs. MD+FoldX predictions of the folding free energy and AutoDock Vina predictions of the binding free energy for residues within 8 Å of the active site.

Using a combination of folding and binding free energies as predictors of fitness improves their predictive capability beyond using either individually by accounting for both of the largely independent (as may be gleaned from the plot) effects of folding and binding. Indeed, the r-value of 0.244 for the folding data alone and the r-value of 0.304 for the binding data alone improve to 0.360 when both data sets are used to explain the fitness. These data were fit by the two- and three-dimensional non-linear functions provided on the plot.

Interestingly, we find that, given β-lactamase’s structure, mutants that affect folding are overwhelmingly independent of mutants that affect binding, as can be seen from the limited number of mutants that cluster around the ΔΔGfold-ΔΔGbind diagonal in Fig 10 (we leave the correlation between folding and binding suggested in Fig 7 for the residues neighboring S70, S130, and K234 to future work). We would expect this situation to vary for proteins whose folding and binding mechanisms are more intimately intertwined or that have a more deeply concealed active site.

As with folding, we can also analyze the form of the fitness vs. binding free energy curve. As shown in Fig 11, the probability distribution associated with the change in free energies of binding also exhibits a precipitous decline beyond a ΔΔGbind value of roughly 0.25 kcal/mol, a comparatively tight threshold. Like the folding distribution, the binding distribution may therefore be characterized by a gamma distribution with a significant right skew. Although we have computed binding free energies for only a small set of mutants and a larger set may manifest different trends, we furthermore find that a comparatively small fraction of them exhibit negative ΔΔGbind values. S8 Fig in S1 File, which labels the largest ΔΔGbind points in Fig 11 with their corresponding mutants, additionally demonstrates that the majority of the mutants that most affect binding alter the critical catalytic residues 70 and 130, as well as several of the residues in the 230 range known to influence catalysis.

Fig 11. Scatterplot of the fitness vs. binding free energies produced using AutoDock Vina of β-lactamase mutants whose mutated residues are within 8 Å of the active site.

The distribution is best captured by a Γ distribution with α = 2.39 and β = 0.268.

Conclusion

In closing, in this work we have analyzed how well thermodynamic biophysical indicators predict organismal fitness, focusing in particular on protein folding and binding free energies as predictors of the fitness of β-lactamase mutants. As a prelude to our fitness studies, we first presented the largest published data set of experimental β-lactamase ΔΔGfold values for mutants purposefully selected to predominantly affect folding. We subsequently demonstrated that trends in these values can be reasonably predicted using high-throughput modeling techniques such as MD+FoldX and PyRosetta. More specifically, we find that while FoldX and PyRosetta can both qualitatively match experimental results, PyRosetta, with its more robust conformational sampling algorithms, can more quantitatively predict the folding free energies of surface-exposed β-lactamase single mutants. Interestingly, we find that both MD+FoldX and PyRosetta are capable of making sensible predictions of double mutant free energies of folding, even though they are not explicitly designed to do so and often do so for the wrong physical reasons. Using MD+FoldX predictions and previously acquired β-lactamase fitness data, we moreover demonstrated that large, positive ΔΔGfold values are highly predictive of low fitness, but that ΔΔGfold values account for, at most, 24% of the variance in β-lactamase fitness based on linear models. Adding credence to our simulation results, this low overall predictive capacity was also borne out by comparisons between fitness and experimental folding free energies. Lastly, going beyond previous work, we demonstrated that, for a select set of mutants, the fraction of fitness that can be accounted for by thermodynamic measures can be improved by including binding free energy information.
Nevertheless, the fact that all combinations of our thermodynamic indicators consistently explain only a small fraction of the variance in β-lactamase fitness indicates that, to achieve its ambition of predicting fitness landscapes, the community must redouble its efforts to develop other potential predictors of organismal fitness and to analyze their predictive capacity. Even though the techniques used here likely fail to capture some fraction of the variance in fitness owing to their inherent approximations, they have been shown to perform as well as many of the best empirical effective free energy function methods available; our results thus point more to the deficiencies of thermodynamic predictors than to deficiencies in our modeling. Indeed, our results strongly suggest that, at least for β-lactamase, non-thermodynamic measures play a central role in determining fitness.

Although the community’s understanding is still evolving, recent research points to the significant impact that the kinetics of catalysis [50], protein quality control [10], protein aggregation, degradation, and interactions with other proteins more generally [6], and post-translational modifications [82], among other non-thermodynamic factors, have on fitness. Accurately modeling protein kinetics still represents a formidable challenge for simulators, as doing so requires either the ability to simulate out to times long enough to capture all relevant protein dynamics or models that can reliably extrapolate to these long times [83]. Despite the modeling challenges at hand, it may be worthwhile to explore how well even the short-time dynamics of proteins can predict their fitness, as dynamical fingerprints of proteins have recently been used to uncover new inhibitors [84] and understand allostery [85]. In fact, recent molecular dynamics simulations and NMR experiments performed on a select set of proteins and their mutants have shown that mutations can have “propagatory effects” that influence the conformation and dynamics of residues up to 25 Å away [86–88]. It would be fascinating and worthwhile to eventually relate such propagatory effects to fitness landscapes. Relatedly, seeing which aspects of recent, cutting-edge kinetic models of β-lactamase [89] can be used to predict fitness would be an intriguing next step. Sufficient progress has also been made toward modeling post-translational modifications that one can readily imagine incorporating these effects into high-throughput computational methodologies for mutants in the very near future [90].
The computational modeling of protein quality control and of transcriptional and translational dynamics, however, remains in its infancy owing to the difficulty of experimentally determining the strengths and frequencies of the protein-protein and protein-nucleic acid interactions involved, which are needed to parameterize reaction network models [91]. Given the wealth of information these models could yield, they represent an exciting frontier that needs significantly more researcher attention moving forward.

Shy of achieving these blue sky ambitions, our results strikingly exemplify the nearer-term need for computational techniques capable of predicting correlations among mutants, which we will loosely term epistatic effects here. Although we studied a limited set of double mutants and their constituent singles using a select set of computationally expedient techniques, the methods explored tended to yield sensible fits to double mutant data but for the wrong reasons: MD+FoldX implicitly assumed that all double mutants possessed additive free energies, whereas PyRosetta calculated all double mutants to possess superadditive free energies, regardless of whether the experimental free energies were additive or not. Based on previous literature, it is likely that other empirical effective free energy techniques will perform similarly and that many techniques that more completely sample each mutant’s conformation space will be too costly to bring to bear on the results of saturation mutagenesis experiments. This glaringly limits our understanding of epistasis to the few painstakingly obtained complete sets of mutants available, making it difficult, if not impossible, to establish just how prevalent epistatic effects are and what ultimate impact they have on evolution across species. While it is true that the FoldX and PyRosetta free energy functions are parameterized on data sets composed overwhelmingly of single mutants and could therefore be improved via machine learning or better fitting to predict the folding free energies of proteins containing multiple mutations, our data suggest that most of the inadequacies of these techniques stem from their inability to fully relax mutant conformations. All-atom molecular dynamics or Monte Carlo techniques are designed to realize such full relaxation, but typically at costs prohibitive for the high-throughput studies of mutants necessary for understanding fitness landscapes.
Thought must therefore be dedicated to how best to relax the regions directly surrounding and connecting mutations while maintaining efficiency. Possible paths to achieving the needed relaxation may include umbrella sampling [92], Hamiltonian-exchange-like techniques [93], or simply developing relaxation protocols compatible with FoldX and PyRosetta that can reliably relax the majority of multiply-mutated proteins.

Even though our results are limited to the β-lactamase protein, a protein exceptional in that its catalytic function may be directly tied to organismal fitness in bacteria, there is significant evidence that our findings are generalizable across many protein families [23]. We therefore hope that our results motivate the community to develop the beyond-free energy computational tools that will be central to once-and-for-all seizing the holy grail of rapidly and accurately predicting organismal fitness from molecular principles.

Supporting information

S1 File. Supplemental information file of supporting discussion and figures.

Contains all supporting discussion regarding β-lactamase and methods, as well as two supporting tables and nine supporting figures.

(ZIP)

Acknowledgments

The authors deeply thank Craig Miller, Holly Wichman, Christopher Marx, and many other members of the University of Idaho Center for Modeling Complex Interactions Molecular Modeling working group as well as Hersh Gupta for numerous fruitful discussions. We thank Gabriel Monteiro da Silva for assistance with our binding calculations. We also thank Elad Firnberg for graciously providing us with his fitness data set and related conversations. Computer resources were provided by the Brown Center for Computation and Visualization (CCV) and the high-performance computing center at Idaho National Laboratory, which is supported by the Office of Nuclear Energy of the U.S. DOE and the Nuclear Science User Facilities under Contract No. DE-AC07-05ID14517.

Data Availability

All input/output, script, and data files are available from this manuscript’s GitHub Repository, yanghaobojordan/PLOS_BetaLactamase: https://github.com/yanghaobojordan/plos_betalactamase.

Funding Statement

JY, NN, JSP, WG, JH, MY, MTN, DMW, and BMR were funded by National Science Foundation EPSCoR Track-II award number OIA1736253. JSP was additionally supported by the Center for Modeling Complex Interactions sponsored by the NIGMS under award number P20 GM104420. Computer resources employed by JY, JH, WG, and BMR were provided by the Brown Center for Computation and Visualization (CCV), and JSP and MY employed resources provided by the high-performance computing center at Idaho National Laboratory, which is supported by the Office of Nuclear Energy of the U.S. DOE and the Nuclear Science User Facilities under Contract No. DE-AC07-05ID14517. EPSCoR Grant Information: https://www.nsf.gov/awardsearch/showAward?AWD_ID=1736253 Center for Modeling Complex Interactions Information: http://grantome.com/grant/NIH/P20-GM104420-04 The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

  • 1. Eyre-Walker A, Keightley PD. The distribution of fitness effects of new mutations. Nature Reviews Genetics. 2007;8:610–618. 10.1038/nrg2146
  • 2. Silander OK, Tenaillon O, Chao L. Understanding the evolutionary fate of finite populations: The dynamics of mutational effects. PLOS Biology. 2007;5(4):1–10.
  • 3. Soskine M, Tawfik DS. Mutational effects and the evolution of new protein functions. Nature Reviews Genetics. 2010;11:572–582. 10.1038/nrg2808
  • 4. Huang PS, Boyken SE, Baker D. The coming of age of de novo protein design. Nature. 2016;537:320–327. 10.1038/nature19946
  • 5. Arnold FH. Design by directed evolution. Accounts of Chemical Research. 1998;31(3):125–131. 10.1021/ar960017f
  • 6. DePristo MA, Weinreich DM, Hartl DL. Missense meanderings in sequence space: a biophysical view of protein evolution. Nature Reviews Genetics. 2005;6(9):678–687. 10.1038/nrg1672
  • 7. Liberles DA, Teichmann SA, Bahar I, Bastolla U, Bloom J, Bornberg-Bauer E, et al. The interface of protein structure, protein biophysics, and molecular evolution. Protein Science. 2012;21(6):769–785. 10.1002/pro.2071
  • 8. Drummond DA, Wilke CO. Mistranslation-induced protein misfolding as a dominant constraint on coding-sequence evolution. Cell. 2008;134(2):341–352. 10.1016/j.cell.2008.05.042
  • 9. Rutherford SL. Between genotype and phenotype: protein chaperones and evolvability. Nature Reviews Genetics. 2003;4(4):263–274. 10.1038/nrg1041
  • 10. Bershtein S, Mu W, Serohijos AWR, Zhou J, Shakhnovich EI. Protein quality control acts on folding intermediates to shape the effects of mutations on organismal fitness. Molecular Cell. 2013;49(1):133–144. 10.1016/j.molcel.2012.11.004
  • 11. Tokuriki N, Tawfik DS. Chaperonin overexpression promotes genetic variation and enzyme evolution. Nature. 2009;459:668–673. 10.1038/nature08009
  • 12. Araya CL, Fowler DM, Chen W, Muniez I, Kelly JW, Fields S. A fundamental protein property, thermodynamic stability, revealed solely from large-scale measurements of protein function. Proceedings of the National Academy of Sciences. 2012;109(42):16858–16863. 10.1073/pnas.1209751109
  • 13. Mirny LA, Shakhnovich EI. Universally conserved positions in protein folds: reading evolutionary signals about stability, folding kinetics and function. Journal of Molecular Biology. 1999;291(1):177–196. 10.1006/jmbi.1999.2911
  • 14. Tokuriki N, Tawfik DS. Stability effects of mutations and protein evolvability. Current Opinion in Structural Biology. 2009;19(5):596–604. 10.1016/j.sbi.2009.08.003
  • 15. Miller CR, Lee KH, Wichman HA, Ytreberg FM. Changing folding and binding stability in a viral coat protein: A Comparison between substitutions accessible through mutation and those fixed by natural selection. PLOS ONE. 2014;9(11):1–10.
  • 16. Maguid S, Fernández-Alberti S, Parisi G, Echave J. Evolutionary conservation of protein backbone flexibility. Journal of Molecular Evolution. 2006;63(4):448–457. 10.1007/s00239-005-0209-x
  • 17. Pandini A, Mauri G, Bordogna A, Bonati L. Detecting similarities among distant homologous proteins by comparison of domain flexibilities. Protein Engineering, Design and Selection. 2007;20(6):285–299. 10.1093/protein/gzm021
  • 18. Gur E, Biran D, Ron EZ. Regulated proteolysis in Gram-negative bacteria —how and when? Nature Reviews Microbiology. 2011;9:839.
  • 19. Jack BR, Boutz DR, Paff ML, Smith BL, Wilke CO. Transcript degradation and codon usage regulate gene expression in a lytic phage. bioRxiv. 2019.
  • 20. Bloom JD, Raval A, Wilke CO. Thermodynamics of neutral protein evolution. Genetics. 2007;175(1):255–266. 10.1534/genetics.106.061754 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 21. Bloom JD, Silberg JJ, Wilke CO, Drummond DA, Adami C, Arnold FH. Thermodynamic prediction of protein neutrality. Proceedings of the National Academy of Sciences. 2005;102(3):606–611. 10.1073/pnas.0406744102 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 22. Tokuriki N, Stricher F, Serrano L, Tawfik DS. How protein stability and new functions trade off. PLOS Computational Biology. 2008;4(2):1–7. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 23. Tokuriki N, Stricher F, Schymkowitz J, Serrano L, Tawfik DS. The stability effects of protein mutations appear to be universally distributed. Journal of Molecular Biology. 2007;369(5):1318–1332. 10.1016/j.jmb.2007.03.069 [DOI] [PubMed] [Google Scholar]
  • 24. Greenfield NJ. Using circular dichroism collected as a function of temperature to determine the thermodynamics of protein unfolding and binding interactions. Nature Protocols. 2006;1(6):2527–2535. 10.1038/nprot.2006.204 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 25. Johnson CM. Differential scanning calorimetry as a tool for protein folding and stability. Archives of Biochemistry and Biophysics. 2013;531(1):100–109. [DOI] [PubMed] [Google Scholar]
  • 26. Freire E. In: Shirley BA, editor. Differential Scanning Calorimetry. Totowa, NJ: Humana Press; 1995. p. 191–218. [DOI] [PubMed] [Google Scholar]
  • 27. Ramanathan R, Muñoz V. A method for extracting the free energy surface and conformational dynamics of fast-folding proteins from single molecule photon trajectories. The Journal of Physical Chemistry B. 2015;119(25):7944–7956. 10.1021/acs.jpcb.5b03176 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 28. Leavitt S, Freire E. Direct measurement of protein binding energetics by isothermal titration calorimetry. Current Opinion in Structural Biology. 2001;11(5):560–566. 10.1016/S0959-440X(00)00248-7 [DOI] [PubMed] [Google Scholar]
  • 29. Schasfoort RBM. Chapter 1 Introduction to Surface Plasmon Resonance In: Handbook of Surface Plasmon Resonance (2). The Royal Society of Chemistry; 2017. p. 1–26. [Google Scholar]
  • 30. Siloto RMP, Weselake RJ. Site saturation mutagenesis: Methods and applications in protein engineering. Biocatalysis and Agricultural Biotechnology. 2012;1(3):181–189. 10.1016/j.bcab.2012.03.010 [DOI] [Google Scholar]
  • 31. Fowler DM, Araya CL, Fleishman SJ, Kellogg EH, Stephany JJ, Baker D, et al. High-resolution mapping of protein sequence-function relationships. Nature Methods. 2010;7:741–746. 10.1038/nmeth.1492 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 32. Petrosino JF, Palzkill T. Systematic mutagenesis of the active site omega loop of TEM-1 beta-lactamase. Journal of Bacteriology. 1996;178(7):1821–1828. 10.1128/JB.178.7.1821-1828.1996 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 33. Jacob F, Joris B, Lepage S, Dusart J, Frère JM. Role of the conserved amino acids of the ‘SDN’ loop (Ser130, Asp131 and Asn132) in a class A β-lactamase studied by site-directed mutagenesis. Biochemical Journal. 1990;271(2):399–406. 10.1042/bj2710399
  • 34. Zaccolo M, Gherardi E. The effect of high-frequency random mutagenesis on in vitro protein evolution: a study on TEM-1 beta-lactamase. Journal of Molecular Biology. 1999;285(2):775–783. 10.1006/jmbi.1998.2262
  • 35. Orencia MC, Yoon JS, Ness JE, Stemmer WPC, Stevens RC. Predicting the emergence of antibiotic resistance by directed evolution and structural analysis. Nature Structural Biology. 2001;8:238–242. 10.1038/84981
  • 36. Delaire M L S JP, Masson JM. Site-directed mutagenesis at the active site of Escherichia coli TEM-1 beta-lactamase. Suicide inhibitor-resistant mutants reveal the role of arginine 244 and methionine 69 in catalysis. Journal of Biological Chemistry. 1992;267:20600–20606.
  • 37. Gibson RM, Christensen H, Waley SG. Site-directed mutagenesis of beta-lactamase I. Single and double mutants of Glu-166 and Lys-73. Biochemical Journal;(3):613–619.
  • 38. Firnberg E, Labonte JW, Gray JJ, Ostermeier M. A comprehensive, high-resolution map of a Gene’s fitness landscape. Molecular Biology and Evolution. 2014;31(6):1581–1592. 10.1093/molbev/msu081
  • 39. Jacquier H, Birgy A, Le Nagard H, Mechulam Y, Schmitt E, Glodt J, et al. Capturing the mutational landscape of the beta-lactamase TEM-1. Proceedings of the National Academy of Sciences. 2013;110(32):13067–13072. 10.1073/pnas.1215206110
  • 40. Deng Z, Huang W, Bakkalbasi E, Brown NG, Adamski CJ, Rice K, et al. Deep sequencing of systematic combinatorial libraries reveals β-lactamase sequence constraints at high resolution. Journal of Molecular Biology. 2012;424(3-4):150–167. 10.1016/j.jmb.2012.09.014
  • 41. Stiffler MA, Hekstra DR, Ranganathan R. Evolvability as a function of purifying selection in TEM-1 beta-lactamase. Cell. 2015;160(5):882–892. 10.1016/j.cell.2015.01.035
  • 42. Allen MP, Tildesley DJ. Computer Simulation of Liquids. New York, NY, USA: Clarendon Press; 1989.
  • 43. Dror RO, Dirks RM, Grossman JP, Xu H, Shaw DE. Biomolecular simulation: A computational microscope for molecular biology. Annual Review of Biophysics. 2012;41(1):429–452. 10.1146/annurev-biophys-042910-155245
  • 44. Frenkel D, Smit B. Understanding Molecular Simulation. 2nd ed. Orlando, FL, USA: Academic Press, Inc.; 2001.
  • 45. Kasson PM, Ensign DL, Pande VS. Combining Molecular Dynamics with Bayesian Analysis To Predict and Evaluate Ligand-Binding Mutations in Influenza Hemagglutinin. Journal of the American Chemical Society. 2009;131(32):11338–11340. 10.1021/ja904557w
  • 46. Lazaridis T, Karplus M. Effective energy functions for protein structure prediction. Current Opinion in Structural Biology. 2000;10(2):139–145. 10.1016/S0959-440X(00)00063-4
  • 47. Guerois R, Nielsen JE, Serrano L. Predicting changes in the stability of proteins and protein complexes: A study of more than 1000 mutations. Journal of Molecular Biology. 2002;320(2):369–387. 10.1016/S0022-2836(02)00442-4
  • 48. Wylie CS, Shakhnovich EI. A biophysical protein folding model accounts for most mutational fitness effects in viruses. Proceedings of the National Academy of Sciences. 2011;108(24):9916–9921. 10.1073/pnas.1017572108
  • 49. Jacquier H, Birgy A, Le Nagard H, Mechulam Y, Schmitt E, Glodt J, et al. Capturing the mutational landscape of the beta-lactamase TEM-1. Proceedings of the National Academy of Sciences. 2013;110(32):13067–13072. 10.1073/pnas.1215206110
  • 50. Knies JL, Cai F, Weinreich DM. Enzyme efficiency but not thermostability drives cefotaxime resistance evolution in TEM-1 β-lactamase. Molecular Biology and Evolution. 2017;34(5):1040–1054.
  • 51. Walsh C. Antibiotics: actions, origins, resistance. Walsh C, editor. Washington, USA: American Society for Microbiology (ASM); 2003.
  • 52. Morin RB, Gorman M. The Biology of β-Lactam Antibiotics. Elsevier; 2014.
  • 53. Schymkowitz J, Borg J, Stricher F, Nys R, Rousseau F, Serrano L. The FoldX web server: an online force field. Nucleic Acids Research. 2005;33(suppl_2):W382–W388.
  • 54. Chaudhury S, Lyskov S, Gray JJ. PyRosetta: a script-based interface for implementing molecular modeling algorithms using Rosetta. Bioinformatics. 2010;26(5):689–691. 10.1093/bioinformatics/btq007
  • 55. Dehouck Y, Kwasigroch JM, Gilis D, Rooman M. PoPMuSiC 2.1: a web server for the estimation of protein stability changes upon mutation and sequence optimality. BMC Bioinformatics. 2011;12(1):151. 10.1186/1471-2105-12-151
  • 56. Trott O, Olson AJ. AutoDock Vina: improving the speed and accuracy of docking with a new scoring function, efficient optimization, and multithreading. Journal of Computational Chemistry. 2010;31(2):455–461.
  • 57. Potapov V, Cohen M, Schreiber G. Assessing computational methods for predicting protein stability upon mutation: good on average but not in the details. Protein Engineering, Design and Selection. 2009;22(9):553–560. 10.1093/protein/gzp030
  • 58. Bava KA, Gromiha MM, Uedaira H, Kitajima K, Sarai A. ProTherm, version 4.0: Thermodynamic database for proteins and mutants. Nucleic Acids Research. 2004;32(suppl_1):D120–D121.
  • 59. Bershtein S, Segal M, Bekerman R, Tokuriki N, Tawfik DS. Robustness–epistasis link shapes the fitness landscape of a randomly drifting protein. Nature. 2006;444(7121):929–932. 10.1038/nature05385
  • 60. Fonzé E, Charlier P, To’th Y, Vermeire M, Raquet X, Dubus A, et al. TEM1 β-lactamase structure solved by molecular replacement and refined structure of the S235A mutant. Acta Crystallographica Section D. 1995;51(5):682–694. 10.1107/S0907444994014496
  • 61. Patel JS, Quates CJ, Johnson EL, Ytreberg FM. Expanding the watch list for potential Ebola virus antibody escape mutations. PLOS ONE. 2019;14(3):1–12.
  • 62. Miller CR, Johnson EL, Burke AZ, Martin KP, Miura TA, Wichman HA, et al. Initiating a watch list for Ebola virus antibody escape mutations. PeerJ. 2016;4:e1674. 10.7717/peerj.1674
  • 63. Hess B, Kutzner C, van der Spoel D, Lindahl E. GROMACS 4: Algorithms for highly efficient, load-balanced, and scalable molecular simulation. Journal of Chemical Theory and Computation. 2008;4(3):435–447. 10.1021/ct700301q
  • 64. Van Durme J, Delgado J, Stricher F, Serrano L, Schymkowitz J, Rousseau F. A graphical interface for the FoldX forcefield. Bioinformatics. 2011;27(12):1711–1712. 10.1093/bioinformatics/btr254
  • 65. Kellogg EH, Leaver-Fay A, Baker D. Role of conformational sampling in computing mutation-induced changes in protein structure and stability. Proteins: Structure, Function, and Bioinformatics. 2011;79(3):830–838. 10.1002/prot.22921
  • 66. Alford RF, Leaver-Fay A, Jeliazkov JR, O’Meara MJ, DiMaio FP, Park H, et al. The Rosetta all-atom energy function for macromolecular modeling and design. Journal of Chemical Theory and Computation. 2017;13(6):3031–3048. 10.1021/acs.jctc.7b00125
  • 67. Shapovalov MV, Dunbrack RL Jr. A smoothed backbone-dependent rotamer library for proteins derived from adaptive kernel density estimates and regressions. Structure. 2011;19(6):844–858. 10.1016/j.str.2011.03.019
  • 68. Baxter J. Local optima avoidance in depot location. Journal of the Operational Research Society. 1981;32(9):815–819. 10.2307/2581397
  • 69. Blum C, Roli A, Sampels M. Hybrid metaheuristics: an emerging approach to optimization. vol. 114. Springer; 2008.
  • 70. Gaillard T. Evaluation of AutoDock and AutoDock Vina on the CASF-2013 benchmark. Journal of Chemical Information and Modeling. 2018;58(8):1697–1706. 10.1021/acs.jcim.8b00312
  • 71. Stec B, Holtz KM, Wojciechowski CL, Kantrowitz ER. Structure of the wild-type TEM-1 beta-lactamase at 1.55 Å and the mutant enzyme Ser70Ala at 2.1 Å suggest the mode of noncovalent catalysis for the mutant enzyme. Acta Crystallographica Section D: Biological Crystallography. 2005;61(8):1072–1079. 10.1107/S0907444905014356
  • 72. Stojanoski V, Adamski CJ, Hu L, Mehta SC, Sankaran B, Zwart P, et al. Removal of the side chain at the active-site serine by a glycine substitution increases the stability of a wide range of serine beta-lactamases by relieving steric strain. Biochemistry. 2016;55(17):2479–2490. 10.1021/acs.biochem.6b00056
  • 73. Broom A, Jacobi Z, Trainor K, Meiering EM. Computational tools help improve protein stability but with a solubility tradeoff. Journal of Biological Chemistry. 2017;292(35):14349–14361. 10.1074/jbc.M117.784165
  • 74. Bub O, Rudat J, Ochsenreither K. FoldX as Protein Engineering Tool: Better Than Random Based Approaches? Computational and Structural Biotechnology Journal. 2018;16:25–33.
  • 75. Hampton RE, Havel JE. Introductory Biological Statistics. Waveland Press; 2006. Available from: https://books.google.com/books?id=SBJFAQAAIAAJ.
  • 76. Kumar MDS, Bava KA, Gromiha MM, Prabakaran P, Kitajima K, Uedaira H, et al. ProTherm and ProNIT: thermodynamic databases for proteins and protein-nucleic acid interactions. Nucleic Acids Research. 2006;34(suppl_1):D204–D206. 10.1093/nar/gkj103
  • 77. Weinreich DM, Delaney NF, DePristo MA, Hartl DL. Darwinian evolution can follow only very few mutational paths to fitter proteins. Science. 2006;312(5770):111–114. 10.1126/science.1123539
  • 78. Weinreich DM, Watson RA, Chao L. Perspective: Sign epistasis and genetic constraints on evolutionary trajectories. Evolution. 2005;59(6):1165–1174. 10.1111/j.0014-3820.2005.tb01768.x
  • 79. Weinreich DM, Lan Y, Wylie CS, Heckendorn RB. Should evolutionary geneticists worry about higher-order epistasis? Current Opinion in Genetics and Development. 2013;23(6):700–707. 10.1016/j.gde.2013.10.007
  • 80. Wackerly D, Mendenhall W, Scheaffer RL. Mathematical Statistics with Applications. International student edition / Brooks-Cole. Cengage Learning; 2014. Available from: https://books.google.com/books?id=ZvPKTemPsY4C.
  • 81. Herzberg O, Moult J. Bacterial resistance to beta-lactam antibiotics: crystal structure of beta-lactamase from Staphylococcus aureus PC1 at 2.5 A resolution. Science. 1987;236(4802):694–701. 10.1126/science.3107125
  • 82. Brunk E, Chang RL, Xia J, Hefzi H, Yurkovich JT, Kim D, et al. Characterizing posttranslational modifications in prokaryotic metabolism using a multiscale workflow. Proceedings of the National Academy of Sciences. 2018;115(43):11096–11101. 10.1073/pnas.1811971115
  • 83. Hart KM, Ho CMW, Dutta S, Gross ML, Bowman GR. Modelling proteins’ hidden conformations to predict antibiotic resistance. Nature Communications. 2016;7:12965. 10.1038/ncomms12965
  • 84. Ash J, Fourches D. Characterizing the chemical space of ERK2 kinase inhibitors using descriptors computed from molecular dynamics trajectories. Journal of Chemical Information and Modeling. 2017;57(6):1286–1299. 10.1021/acs.jcim.7b00048
  • 85. VanWart AT, Eargle J, Luthey-Schulten Z, Amaro RE. Exploring residue component contributions to dynamical network models of allostery. Journal of Chemical Theory and Computation. 2012;8(8):2949–2961. 10.1021/ct300377a
  • 86. Naganathan AN. Modulation of allosteric coupling by mutations: from protein dynamics and packing to altered native ensembles and function. Current Opinion in Structural Biology. 2019;54:1–9. 10.1016/j.sbi.2018.09.004
  • 87. Rajasekaran N, Suresh S, Gopi S, Raman K, Naganathan AN. A general mechanism for the propagation of mutational effects in proteins. Biochemistry. 2017;56(1):294–305. 10.1021/acs.biochem.6b00798
  • 88. Rajasekaran N, Sekhar A, Naganathan AN. A universal pattern in the percolation and dissipation of protein structural perturbations. The Journal of Physical Chemistry Letters. 2017;8(19):4779–4784. 10.1021/acs.jpclett.7b02021
  • 89. Bowman GR, Bolin ER, Hart KM, Maguire BC, Marqusee S. Discovery of multiple hidden allosteric sites by combining Markov state models and experiments. Proceedings of the National Academy of Sciences. 2015;112(9):2734–2739. 10.1073/pnas.1417811112
  • 90. Audagnotto M, Peraro MD. Protein post-translational modifications: In silico prediction tools and molecular modeling. Computational and Structural Biotechnology Journal. 2017;15:307–319. 10.1016/j.csbj.2017.03.004
  • 91. Powers ET, Powers DL, Gierasch LM. FoldEco: a model for proteostasis in E. coli. Cell Reports. 2012;1(3):265–276. 10.1016/j.celrep.2012.02.011
  • 92. Patel JS, Ytreberg FM. Fast calculation of protein–protein binding free energies using umbrella sampling with a coarse-grained model. Journal of Chemical Theory and Computation. 2018;14(2):991–997. 10.1021/acs.jctc.7b00660
  • 93. Fukunishi H, Watanabe O, Takada S. On the Hamiltonian replica exchange method for efficient sampling of biomolecular systems: Application to protein structure prediction. The Journal of Chemical Physics. 2002;116(20):9058–9067. 10.1063/1.1472510

Decision Letter 0

Jose M Sanchez-Ruiz

22 Apr 2020

PONE-D-20-04028

Predicting the Viability of Beta-Lactamase: How Folding and Binding Free Energies Correlate with Beta-Lactamase Fitness

PLOS ONE

Dear Dr. Rubenstein,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Your manuscript has been examined by two experts in the field. They both find your work of interest, but point to a number of issues that must be convincingly addressed in a revised version.

We would appreciate receiving your revised manuscript by Jun 06 2020 11:59PM. When you are ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter.

To enhance the reproducibility of your results, we recommend that if applicable you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). This letter should be uploaded as separate file and labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. This file should be uploaded as separate file and labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. This file should be uploaded as separate file and labeled 'Manuscript'.

Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

We look forward to receiving your revised manuscript.

Kind regards,

Jose M. Sanchez-Ruiz

Academic Editor

PLOS ONE

Journal requirements:

When submitting your revision, we need you to address these additional requirements:

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at http://www.plosone.org/attachments/PLOSOne_formatting_sample_main_body.pdf and http://www.plosone.org/attachments/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. We note that you have stated that you will provide repository information for your data at acceptance. Should your manuscript be accepted for publication, we will hold it until you provide the relevant accession numbers or DOIs necessary to access your data. If you wish to make changes to your Data Availability statement, please describe these changes in your cover letter and we will update your Data Availability statement to reflect the information you provide.


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: The authors of this work employ multiple computational servers to predict and rationalise the free energy changes resulting from specific single-point and double mutations in beta-lactamase. They take a step further to explore how well fitness correlates with the change in stability measured in experiments. The work is written well and, though the results are not positive, I enjoyed reading the overall thought process associated with their discussion and the need for fitness prediction algorithms. I have the following comments on their work, which should be addressed.

1) Correlation in itself does not mean a good predictive ability as the mutant free energy changes should also lie in the correct quadrant. The authors should highlight the four quadrants in plots where experimental and predicted stability changes are compared. This makes it easier for the reader to quickly get an estimate of how many are correctly predicted apart from knowing how well they are predicted.

2) Fig. 4: It is tough to say whether PyRosetta or FOLDX performs better here, as there are only 6 data points. Many more data points would be needed to say authoritatively that one method works better than the other. The correlations in this case therefore make little sense, as can also be seen from the p-values.

3) Fig. 7: In the discussion associated with this figure, the authors say that "folding free energies are reasonable predictors of fitness for many residues". How do they come to this conclusion, and what fraction of the points 'correlate' well? This number alone could give some insight, as all Fig. 7 suggests is that DDG cannot be a good measure of fitness.

4) Adding to the point above, DDG need not be the only factor that affects fitness as quantified here. There are numerous factors associated with a protein, including kinetic stability, diffusivity within a cell, and quinary interactions, that aid or impede an organism's fitness in a myriad of ways. These could be other factors that confound the observations in Figure 7.

5) On the effect of mutations: it is likely that most mutational servers should also consider propagatory effects which are not currently included. For example, see recent works on this in PMIDs: 30268910, 28910120, 27958720. The authors could consider discussing these as one of the reasons why PyRosetta or FOLDX perform worse when predicting changes in stability.
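Reviewer #1's second point, that six data points cannot support a strong claim, can be made quantitative. The sketch below computes a Pearson correlation and its two-sided p-value under the textbook null hypothesis (bivariate-normal data, true correlation zero), for which the sample r has density proportional to (1 − r²)^((n−4)/2). The function names are illustrative and this is not the analysis used in the manuscript; with so few points the normality assumption itself cannot really be checked, and a permutation test would be a distribution-free alternative.

```python
import math


def pearson_r(x, y):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with n >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def pearson_p_two_sided(r, n, steps=2000):
    """Two-sided p-value for H0: rho = 0, assuming bivariate-normal data.

    Under H0 the sample r has density (1 - r^2)^((n-4)/2) / B(1/2, (n-2)/2)
    on (-1, 1). The tail area beyond |r| is integrated with Simpson's rule,
    so `steps` must be even.
    """
    if n < 4:
        raise ValueError("this sketch requires n >= 4")
    a = abs(r)
    if a >= 1.0:
        return 0.0
    k = (n - 4) / 2
    # log of the Beta function B(1/2, (n-2)/2), the normalising constant
    log_beta = math.lgamma(0.5) + math.lgamma((n - 2) / 2) - math.lgamma((n - 1) / 2)
    h = (1.0 - a) / steps
    total = 0.0
    for i in range(steps + 1):
        x = a + i * h
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        total += weight * (1.0 - x * x) ** k
    one_tail = total * h / 3.0 * math.exp(-log_beta)
    return 2.0 * one_tail
```

For six points the tail integral reduces to p = 1 − 1.5|r| + 0.5|r|³, so even a fairly strong r = 0.7 gives p ≈ 0.12, and |r| must exceed about 0.81 before p drops below 0.05. That is why two methods with similar, moderately high correlations on six mutants cannot be ranked reliably.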

Reviewer #2: The manuscript by Yang et al. compared the folding free energies of 21 TEM-1 β-lactamase single and double mutants with previous fitness data. The concept of this study is very interesting and within the scope of PLOS ONE. However, some unclear points should be addressed before publication.

page 2: differential scanning calorimetry…. any references for the methods/tools would be extremely helpful;

page 2: be determined via isothermal titration calorimetry or surface plasmon resonance…… any references for the methods/tools would be extremely helpful;

Figure 1: add the pdb entry for the TEM-1 structure

Page 5: The thermodynamic stability of each allele was determined by circular dichroism (CD) on a Jasco J-815 spectrometer… What buffer/pH?

Page 5: In the methodology, it is stated that the melting temperature and van't Hoff enthalpy were fitted to a two-state transition; however, none of the corresponding figures or results are shown (a table listing all results would be helpful).

Page 6: “it does not contain information regarding TEM-1 and thus FoldX has neither been explicitly fit nor tested on the TEM-1 mutants analysed here”. I do not think this main conclusion of the study is correct. See PROTEINS 23 63-72 (1995) PMID: 8539251 included in the ProTherm database.
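For context on the fit the reviewer asks about: a two-state (F ⇌ U) analysis of a CD thermal melt is often done with a van't Hoff plot, where the equilibrium constant K = f_U / (1 − f_U) obeys ln K = −ΔH_vH/(RT) + ΔH_vH/(R·T_m). The sketch below illustrates that idea only. It assumes flat folded and unfolded baselines and zero heat-capacity change (ΔC_p = 0); the function names and the parameter values in the demo are hypothetical and are not taken from the manuscript, whose exact fitting procedure is not described in this record.

```python
import math

R = 8.314  # gas constant, J/(mol*K)


def fraction_unfolded(signal, folded_baseline, unfolded_baseline):
    """Convert a CD signal to fraction unfolded, assuming flat baselines."""
    return (signal - folded_baseline) / (unfolded_baseline - folded_baseline)


def dG_unfold(T, dH, Tm):
    """Two-state unfolding free energy with dCp = 0: dG(T) = dH * (1 - T/Tm).

    Positive values mean the folded state is favoured.
    """
    return dH * (1.0 - T / Tm)


def vant_hoff_fit(temps_K, f_unf):
    """Least-squares fit of ln K versus 1/T, with K = fU / (1 - fU).

    Only points strictly inside the transition (0 < fU < 1) are used.
    Slope = -dH/R and intercept = dH/(R*Tm), so Tm = -slope/intercept.
    Returns (dH_vH in J/mol, Tm in K).
    """
    xs, ys = [], []
    for T, f in zip(temps_K, f_unf):
        if 0.0 < f < 1.0:
            xs.append(1.0 / T)
            ys.append(math.log(f / (1.0 - f)))
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two points inside the transition")
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    intercept = my - slope * mx
    return -R * slope, -slope / intercept


# Demo on a synthetic, noise-free melt with hypothetical parameters.
true_dH, true_Tm = 400e3, 325.0
temps = [318.0 + 0.5 * i for i in range(29)]  # 318 K to 332 K
fU = []
for T in temps:
    K = math.exp(-dG_unfold(T, true_dH, true_Tm) / (R * T))
    fU.append(K / (1.0 + K))
fit_dH, fit_Tm = vant_hoff_fit(temps, fU)
```

On noise-free synthetic data the fit recovers the input ΔH_vH and T_m; on real melts, sloping baselines and a non-zero ΔC_p are usually fitted together with the transition (for example by nonlinear least squares on the raw signal), which is why reporting the fitted values per mutant, as the reviewer requests, matters.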

Please change “wildtype” on page 8 to “wild type”

- Figure S2 is not mentioned

- Figure S3 is not mentioned

- Figure S7 is not mentioned

References 35, 41, 57, 69, 70, 72, 77: please remove the DOI numbers.

Supporting information:

Page 6 : “10^6 cfu/mL, not 106 cfu/mL”

Figure S4. Data for the Wylie mutants are incomplete.

I found Figure S6 to be confusing. Some Wylie mutants are missing (A172P, G218V, S70G), and there are other mutants (not named in the manuscript).

**********

6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files to be viewed.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2020 May 29;15(5):e0233509. doi: 10.1371/journal.pone.0233509.r002

Author response to Decision Letter 0


4 May 2020

We uploaded our response letter; however, our responses are also copied below. Please refer to the uploaded letter, as it is properly formatted.

Dear Dr. Heber and Reviewers,

We sincerely thank you and the reviewers for taking the time to review our manuscript. Please find our point-by-point responses below.

Review Comments to the Authors

Reviewer #1: The authors of this work employ multiple computational servers to predict and rationalise the free energy changes caused by specific single-point and double mutations in beta-lactamase. They take a step further to explore how well fitness correlates with the change in stability measured in experiments. The work is written well and, though the results are not positive, I enjoyed reading the overall thought process associated with their discussion and the need for fitness prediction algorithms.

First and foremost, we thank the reviewer for his/her/their time carefully reviewing our manuscript and are pleased that the reviewer found it enjoyable to read. We have done our best to address your insightful comments below. All changes to our manuscript are indicated in blue in the manuscript and supplemental information documents.

I have the following comments on their work which should be addressed.

1) Correlation in itself does not mean a good predictive ability as the mutant free energy changes should also lie in the correct quadrant. The authors should highlight the four quadrants in plots where experimental and predicted stability changes are compared. This makes it easier for the reader to quickly get an estimate of how many are correctly predicted apart from knowing how well they are predicted.

We thank the reviewer for this suggestion, which will make our results easier to understand. Following your advice, we have shaded the first and third quadrants of main text Figures 2 and 4, and Supplementary Figure 3. By shading these quadrants, we make it clear which points are predicted to be positive by both experimental and computational techniques and which are predicted to be negative by both sets of techniques. This shading complements what we discussed in our original manuscript:

“As illustrated in Figure 2, which compares experimental and computational free energies for the single mutants, both MD+FoldX and PyRosetta free energies of folding positively correlate with the experimentally determined values for single mutants: more positive experimental values are matched by more positive computational predictions (see Supporting Figure S1 for purely FoldX predictions, which parallel the MD+FoldX results). It is gratifying to see that both methods predict the majority of the mutants to lie in the same relative places on these plots, with mutants that induce negative effects on free energies of folding experimentally inducing negative or near-negative effects on free energies computationally and vice-versa. Overall, MD+FoldX correctly predicts the signs of 14 out of 15 mutants, while PyRosetta does so for 11 out of 15 mutants.”

However, building upon your suggestions, we have incorporated a discussion of the shading into our new main text description of this figure and clarified our discussion:

“In Figure 2, we plot experimentally-determined folding free energies against computationally-predicted free energies for the single mutants. We have shaded the first and third quadrants in this figure to ease identification of the mutants whose experimental and computational free energies are of the same sign. It is thus gratifying to see that both MD+FoldX and PyRosetta free energies of folding positively correlate with the experimentally determined values for these mutants: more positive experimental values are matched by more positive computational predictions, while more negative experimental values are matched by more negative computational predictions (see Supporting Figure S1 for purely FoldX predictions, which parallel the MD+FoldX results). Indeed, as can be determined by counting the number of mutants in the shaded regions, MD+FoldX correctly predicts the signs of 14 out of 15 mutants, while PyRosetta does so for 11 out of 15 mutants. In general, both MD+FoldX and PyRosetta predict the majority of the mutants to lie in the same relative places on these plots (see Figure S2 for a direct comparison of MD+FoldX and PyRosetta predictions). It is moreover pleasing to see that PyRosetta predicts S70G, which is known to be stabilizing and is thus in some sense a control, to have a negative ΔΔGfold value; MD+FoldX fortunately also yields a reasonably accurate, although not fully stabilizing, prediction for this mutant.”

We have moreover added the following text to our captions:

“The shaded regions delineate the first and third quadrants of the plot, which contain mutants whose free energies are of the same sign according to both experiment and computation.”
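The quadrant count described in this response (mutants falling in the shaded first and third quadrants) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' analysis code: the function name and the example values are hypothetical, and treating a value of exactly zero as a disagreement is an assumption.

```python
def count_sign_agreement(experimental, predicted):
    """Return (agreements, total) for paired experimental/predicted ddG values.

    A pair agrees when both values are positive (first quadrant) or both are
    negative (third quadrant); a value of exactly zero counts as disagreement.
    """
    if len(experimental) != len(predicted):
        raise ValueError("experimental and predicted must have the same length")
    agreements = sum(
        1
        for exp_ddg, pred_ddg in zip(experimental, predicted)
        if (exp_ddg > 0 and pred_ddg > 0) or (exp_ddg < 0 and pred_ddg < 0)
    )
    return agreements, len(experimental)


# Hypothetical values in kcal/mol, for illustration only (not data from the study).
exp_values = [1.2, -0.5, 0.8, 2.4]
pred_values = [0.9, 0.3, 1.1, 1.7]
print(count_sign_agreement(exp_values, pred_values))  # prints (3, 4)
```

A tally of this kind is what "14 out of 15" and "11 out of 15" summarize; it complements a correlation coefficient, which can be high even when many points sit in the wrong quadrants.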

2) Fig. 4: It is tough to say whether PyRosetta or FOLDX performs better here as there are only 6 data points. The number of data points should be a lot more to authoritatively say that one method works better than the other. The correlations in this case therefore make little sense as can also be seen from the p-values.

We thank the reviewer for this comment as we whole-heartedly agree. Experiments aimed at creating and measuring the free energies of folding of a wider array of double mutants are underway, although, particularly due to COVID-19, we will not have access to these for quite some time. Thus, we have to make judgements based on the mutants we have.

This said, in the original manuscript, we indicated our agreement with your statement in the following sentence:

“Despite being limited, this set of mutants is thus ripe for benchmarking how predictive computational techniques are for multiply-mutated proteins.”

Having carefully reread the manuscript, we note that we never claimed that PyRosetta is better than FoldX at predicting mutant free energies based upon the r² values provided in the figures. Our argument was predicated on the fact that, regardless of whether the double mutants’ folding free energies are sums of their constituent single mutants’ folding free energies or not, FoldX always predicts their free energies to be additive, which is clearly incorrect. PyRosetta at least leaves room for its simulations to yield nonadditive double mutant free energies, even though these are also not particularly accurate. We made this argument in the original manuscript through the following text:

“Interestingly, we find that, regardless of whether experiment predicts the mutants to be additive or non-additive, FoldX and MD+FoldX always yield additive predictions (middle panel of Figure 3). The additivity of MD+FoldX, even when supplemented with MD relaxation of the original wild type structure, may be anticipated based upon the fact that it does not globally relax mutant conformations. In contrast, PyRosetta generally yields non-additive predictions (left-most panel of Figure 3). It is because of this non-additivity that PyRosetta (r=0.73) outperforms FoldX (r=0.33) in predicting the folding free energies of the Wylie double mutants, as depicted in Figure 4. Nevertheless, the fact that PyRosetta's double mutant free energy predictions are always superadditive, likely because it is unable to fully relax double mutant structures, also makes its predictions questionable.”

To clarify that we are not entirely basing our arguments in the above text on r-values, we have removed the parenthetical references to these values in our revised text. We have nevertheless chosen to keep them on our plot for reference, as they are accompanied by accurate p-values that should help the reader assess the significance of the correlation.
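The additivity argument above compares a double mutant's ΔΔG with the sum of its two constituent single-mutant ΔΔGs. A minimal sketch of that comparison follows; the tolerance used to call a double mutant "additive" is a hypothetical parameter, not a value from the study, and positive ΔΔG is taken to mean destabilization, as elsewhere in this response.

```python
def nonadditivity(ddg_double, ddg_a, ddg_b):
    """Deviation (kcal/mol) of a double mutant's ddG from the sum of its singles."""
    return ddg_double - (ddg_a + ddg_b)


def classify_additivity(ddg_double, ddg_a, ddg_b, tol=0.5):
    """Label a double mutant as 'additive', 'superadditive', or 'subadditive'.

    tol is a hypothetical tolerance in kcal/mol, not a value used in the study.
    With positive ddG meaning destabilization, 'superadditive' means the double
    mutant is more destabilized than the sum of its single mutants.
    """
    eps = nonadditivity(ddg_double, ddg_a, ddg_b)
    if abs(eps) <= tol:
        return "additive"
    return "superadditive" if eps > 0 else "subadditive"


# A predictor that simply sums single-mutant ddGs can only ever return "additive".
print(classify_additivity(1.0 + 1.2, 1.0, 1.2))  # additive
print(classify_additivity(3.0, 1.0, 1.2))        # superadditive
print(classify_additivity(1.0, 1.0, 1.2))        # subadditive
```

The first call illustrates the point made about FoldX: if the predicted double-mutant value is, in effect, the sum of the single-mutant values, the nonadditivity is zero by construction, whatever the experiment shows.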

3) Fig. 7: In the discussion associated with this figure, the authors say that 'folding free energies are reasonable predictors of fitness for many residues'. How do they come to this conclusion, and what fraction of the points 'correlate' well? This number alone could give some insight, as all Fig 7 suggests is that DDG cannot be a good measure of fitness.

We agree with the reviewer that the data in Figure 7 show that free energies do not correlate perfectly with fitness values. In fact, that’s the crux of the paper and we state this in several places in the original manuscript, including in the Introduction:

“We find that folding free energies account for, at most, 24% of the variance in beta-lactamase fitness values according to linear models and, somewhat surprisingly, complementing folding free energies with computationally-predicted binding free energies of residues near the active site only increases the folding-only figure by a few percent. This strongly suggests that the majority of beta-lactamase's fitness is controlled by factors other than free energies.”
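The "24% of the variance" figure refers to the coefficient of determination of a linear model, R² = 1 − SS_res/SS_tot. The sketch below assumes the simplest case, a one-variable ordinary least-squares fit of fitness on ΔΔGfold; the models in the manuscript may include additional terms (for example, binding free energies), so this is an illustration of the quantity rather than a reproduction of the analysis.

```python
def linear_r_squared(x, y):
    """R^2 of an ordinary least-squares line y = a + b*x (fraction of variance explained)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must be equal-length sequences of at least 2 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx             # raises ZeroDivisionError if all x are identical
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot  # raises ZeroDivisionError if all y are identical


# Hypothetical data: a perfect line gives R^2 = 1, a flat fit gives R^2 = 0.
print(linear_r_squared([0, 1, 2, 3], [1, 3, 5, 7]))
print(linear_r_squared([0, 1, 2], [0, 1, 0]))
```

For a single predictor, R² equals the square of Pearson's r, so an R² of 0.24 corresponds to |r| of roughly 0.49.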

Our statement that the free energies are reasonable (note that we said reasonable, not good or strong) predictors of fitness is primarily based on the fact that our computational free energy predictions yield relatively few false negatives (22) in comparison with the number of true negatives (506) and positives (2175) they yield. The number of false negatives, false positives, true negatives, and true positives our modeling obtains is presented below and has been added as Figure S4 to the Supplementary Materials:

We have defined false negatives to be mutants that are predicted to be unstable (ΔΔG > 5 kcal/mol) yet are experimentally determined to be fit (Fitness > 0.5), false positives to be mutants that are predicted to be stable (ΔΔG < 5 kcal/mol) yet are experimentally unfit (Fitness < 0.5), true negatives to be mutants that are predicted to be unstable (ΔΔG > 5 kcal/mol) and are unfit (Fitness < 0.5), and true positives to be mutants that are predicted to be stable (ΔΔG < 5 kcal/mol) and are fit (Fitness > 0.5). We note that the ΔΔG and fitness cutoffs used in these definitions are somewhat arbitrary, but a ΔΔGfold cutoff of 5 kcal/mol was selected because a ΔΔGfold of about 8 kcal/mol would offset the wild type's folding free energy of roughly -8 kcal/mol and unfold beta-lactamase; above 5 kcal/mol, the protein is thus expected to be unstable.
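The four categories defined above amount to a 2×2 confusion matrix built from two thresholds. A minimal sketch follows; the function names are hypothetical, and because the definitions use strict inequalities on both sides, assigning values that fall exactly on a cutoff to "unstable" or "unfit" is an assumption.

```python
def classify_mutant(ddg_fold, fitness, ddg_cutoff=5.0, fitness_cutoff=0.5):
    """Return 'TP', 'FP', 'FN', or 'TN' for one mutant.

    'Positive' = predicted stable (ddg_fold below ddg_cutoff, in kcal/mol);
    'fit' = fitness above fitness_cutoff. Values exactly at a cutoff are
    treated as unstable / unfit, an assumption the definitions leave open.
    """
    predicted_stable = ddg_fold < ddg_cutoff
    fit = fitness > fitness_cutoff
    if predicted_stable:
        return "TP" if fit else "FP"
    return "FN" if fit else "TN"


def tally(pairs, ddg_cutoff=5.0, fitness_cutoff=0.5):
    """Count TP/FP/FN/TN over an iterable of (ddg_fold, fitness) pairs."""
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for ddg_fold, fitness in pairs:
        counts[classify_mutant(ddg_fold, fitness, ddg_cutoff, fitness_cutoff)] += 1
    return counts


# Hypothetical (ddG_fold, fitness) pairs, one per category.
print(tally([(1.0, 0.9), (1.0, 0.1), (7.0, 0.8), (7.0, 0.2)]))
# prints {'TP': 1, 'FP': 1, 'FN': 1, 'TN': 1}
```

Applied to the counts quoted in this response (2175 true positives, 2079 false positives, 22 false negatives, 506 true negatives), about 96% (506 of 528) of the mutants predicted to be unstable are unfit, whereas only about 51% (2175 of 4254) of those predicted to be stable are fit. This asymmetry is what "few false negatives but numerous false positives" expresses.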

In the original manuscript, we indicated that this is what we meant by reasonable predictor in the following text:

“From the left-hand panel of Figure 7, it is clear that folding free energies are reasonable predictors of fitness for many residues: many mutants with negative or near neutral effects on folding free energies also possess large fitness values. There is additionally an abundance of mutants for which large positive free energy predictions correlate with low fitness, also as one would hope...Remarkably, our plots manifest strikingly few cases for which mutants with large folding free energies (>5 kcal/mol) possess high fitness, which we will term false negatives. Even though this is heartening, there nevertheless exist numerous false positives: mutants that exist in the lower left corner of the plot whose small folding free energy differences, which one would expect to correspond to high fitness values, nonetheless map to low fitness values.”

Based upon your comments, we decided to quantify these statements by including a more thorough discussion of true positives, true negatives, false positives, and false negatives in the text in addition to a discussion of how many of the mutants fall into these four categories:

“From the left-hand panel of Figure 7 and Supplemental Information Table S4, it is clear that folding free energies are reasonable predictors of fitness: many (2175) mutants predicted to be stable with ΔΔGfold<5 kcal/mol are in fact fit, possessing fitness values greater than 0.5 (so-called 'true positives') \\footnote{We note that the ΔΔGfold and fitness cutoffs used in these definitions are somewhat arbitrary, but a ΔΔGfold of 5 kcal/mol was selected for the folding free energy because at 8 kcal/mol, beta-lactamase unfolds and thus, above 5 kcal/mol, it is expected to be unstable.}. There are additionally many (506) mutants predicted to be unstable, with ΔΔGfold>5 kcal/mol, that are unfit, possessing fitness values less than 0.5 (so-called 'true negatives'), also as one would hope. As can be inferred from the labeled residues toward the right of the left-hand panel of Figure 7, most of these large free energy mutants involve substitutions of volumetrically smaller residues, such as glycine and alanine, with larger, bulky residues, such as tryptophan and tyrosine, which dramatically raise the free energy contributions associated with steric clash. Remarkably, our plots manifest strikingly few (22) cases for which mutants with large folding free energies (> 5 kcal/mol) possess high fitness (> 0.5), which we term false negatives. Even though this is heartening, there nevertheless exist numerous (2079) false positives: mutants that exist in the lower left corner of the plot whose small folding free energy differences (< 5 kcal/mol), which one would expect to correspond to high fitness values (> 0.5), nonetheless map to low fitness values (< 0.5). It is also clear from the right-hand panel of Figure 7 that small (< 5 kcal/mol) changes in folding free energies which destabilize the protein, but do not unfold it (based upon its ΔGfold~ -8 kcal/mol), lead to a wide range of fitness values and are therefore not strongly correlated with fitness.”

4) Adding to the point above, DDG need not be the only factor that can affect fitness as quantified here. There are numerous factors associated with a protein, including kinetic stability, diffusivity within a cell, and quinary interactions, that aid or impede an organism's fitness in a myriad of ways. These could be other factors that confound the observations in Figure 7.

Again, we absolutely agree and this is one of the key punchlines of the manuscript. For example, in the Abstract, we note that:

“This strongly suggests that the majority of beta-lactamase's fitness is controlled by factors other than free energies.”

In the Introduction, we say:

“Yet, even in those rare instances, even the simplest protein's fitness is influenced by a wide variety of factors including protein and gene expression levels, interactions with chaperones, protein folding stability, protein folding dynamics, and proteolytic susceptibility -- as well as many complex factors yet to be uncovered or understood.”

As well as:

“Nevertheless, as may be expected given the complexity of the overall transcription, translation, and post-translation processes, we demonstrate that thermodynamic descriptors only explain a small fraction of beta-lactamase fitness results.”

And, in the Conclusion, we state:

“Even though it is likely that the techniques used here fail to correctly capture some fraction of the variance in the fitness due to their inherent approximations, these techniques have been shown to perform as well as many of the best empirical effective free energy function methods available and thus our results point more to the deficiencies of thermodynamic predictors than to the deficiencies in our modeling. Although the community's understanding is still evolving, recent research points to the significant impact the kinetics of catalysis \\cite{Knies_MBE_2017}, protein quality control \\cite{Bershtein_MolCell_2013}, protein aggregation and degradation \\cite{DePristo_NatRevGen_2005}, and post-translational modifications \\cite{Brunk_PNAS}, among other non-thermodynamic factors, have on fitness.”

To make both of our points clearer, we have added the following sentence to the Conclusion:

“Indeed, our results strongly suggest that, at least for beta-lactamase, non-thermodynamic measures play a central role in determining fitness.”

You bring up specific examples of non-thermodynamic metrics such as “kinetic stability, diffusivity within a cell, and quinary interactions.” We believe we have addressed kinetic stability and diffusivity by mentioning “kinetics of catalysis” above. To ensure that we address quinary interactions, we have modified the above to explicitly mention protein-protein interactions:

“Although the community's understanding is still evolving, recent research points to the significant impact the kinetics of catalysis \\cite{Knies_MBE_2017}, protein quality control \\cite{Bershtein_MolCell_2013}, protein aggregation, degradation, and interactions with other proteins more generally \\cite{DePristo_NatRevGen_2005}, and post-translational modifications \\cite{Brunk_PNAS}, among other non-thermodynamic factors, have on fitness.”

5) On the effect of mutations: it is likely that most mutational servers should also consider propagatory effects which are not currently included. For example, see recent works on this in PMIDs: 30268910, 28910120, 27958720. The authors could consider discussing these as one of the reasons why PyRosetta or FOLDX perform worse when predicting changes in stability.

We agree with the reviewer that FoldX and PyRosetta only capture the essentially static and short-range effects of mutations. Our group is in fact working on relating intraprotein residue interaction networks to fitness landscapes, very much as described in the papers suggested, of which we were previously unaware (thank you for suggesting these; they were very helpful given this more recent thrust). That said, it is difficult to simulate all single and higher order mutants of a given protein using molecular dynamics to most accurately capture the propagatory effects you describe and hence we do not address them in this paper.

Based on your recommendations, we have incorporated a discussion of these effects and the suggested papers into our Conclusion:

“In fact, recent molecular dynamics simulations and NMR experiments performed on a select set of proteins and their mutants have shown that mutations can have ``propagatory effects'' that can influence the conformation and dynamics of residues up to 25 Å away from them \\cite{naganathan2019modulation,rajasekaran2017general,rajasekaran2017universal}. It would be fascinating and worthwhile to eventually be able to relate such propagatory effects to fitness landscapes.”

We thank the reviewer again for carefully scrutinizing our manuscript, and in particular, for his/her/their recommendations of the "propagatory effects" manuscripts.

____________________________________________________________________________

Reviewer #2: The manuscript by Yang et al. compared the folding free energies of 21 TEM-1 b-lactamase single and double mutants and previous fitness data. The concept of this study is very interesting and within the scope of Plos One. However, some unclear points should be addressed before publication.

We thank the reviewer for his/her/their time carefully reviewing our manuscript and for finding our paper interesting. We do our best to address your points below.

page 2: differential scanning calorimetry…. any references for the methods/tools would be extremely helpful;

Thank you for pointing out our omission of these references. We have now added the following references about differential scanning calorimetry to our Introduction:

Christopher M. Johnson. Differential scanning calorimetry as a tool for protein folding and stability. Archives of Biochemistry and Biophysics. 531, 100-109 (2013). https://doi.org/10.1016/j.abb.2012.09.008

Ernesto Freire. Differential scanning calorimetry. Protein Stability and Folding. 191-218 (1995). https://link.springer.com/protocol/10.1385/0-89603-301-5:191

page 2: be determined via isothermal titration calorimetry or surface plasmon resonance…… any references for the methods/tools would be extremely helpful;

Thank you again for pointing out these omissions. We have now added the following references about isothermal titration calorimetry and surface plasmon resonance to our Introduction:

Isothermal titration calorimetry:

Stephanie Leavitt and Ernesto Freire. Direct measurement of protein binding energetics by isothermal titration calorimetry. Current Opinion in Structural Biology. 11 (5): 560-566 (2001). https://doi.org/10.1016/S0959-440X(00)00248-7

Surface plasmon resonance imaging:

Richard B. M. Schasfoort. Chapter 1: Introduction to Surface Plasmon Resonance in Handbook of Surface Plasmon Resonance (2), 1-26 (2017). https://pubs.rsc.org/en/content/chapterhtml/2017/bk9781782627302-00001?isbn=978-1-78262-730-2&sercode=bk

Figure 1: add the pdb entry for the TEM-1 structure

As suggested, we have added the PDB entry for TEM-1, 1XPB, to the figure.

Page 5: The thermodynamic stability of each allele was determined by circular dichroism (CD) on a Jasco J-815 spectrometer… What buffer/pH?

Thank you for pointing out this omission. We performed our circular dichroism experiments in 200 mM potassium phosphate buffer (pH 7.0) containing 4% glycerol. We have added this information to the following sentence of the manuscript:

“The thermodynamic stability of each allele was determined by circular dichroism (CD) on a Jasco J-815 spectrometer in a 200 mM potassium phosphate pH 7.0 with 4% glycerol buffer.”

Page 5: In the methodology, it is stated that the melting temperature and van't Hoff enthalpy were fitted to a two-state transition; however, none of the figures or results are shown (a table listing all results would be helpful).

Based upon your request, we have added a new table (Table S1) under the section entitled “Wylie Mutant Circular Dichroism Data at 25℃” to the supplement containing all mutants’ melting temperatures, enthalpies, and free energies.
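The response does not state how the 25 °C free energies in Table S1 were derived from the fitted melting temperatures and van't Hoff enthalpies. One common route, shown here as an assumption rather than as the authors' procedure, is the Gibbs-Helmholtz equation for a two-state transition, ΔG_unfold(T) = ΔH_vH(1 − T/Tm) + ΔCp[(T − Tm) − T ln(T/Tm)], with ΔCp set to zero unless it is supplied. The input values below are hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol K)


def dg_unfold(temp_k, tm_k, dh_vh, dcp=0.0):
    """Two-state unfolding free energy (kcal/mol) at temp_k via Gibbs-Helmholtz.

    dh_vh: van't Hoff enthalpy at Tm (kcal/mol); dcp: heat-capacity change of
    unfolding (kcal/(mol K)), assumed zero by default.
    """
    return (dh_vh * (1.0 - temp_k / tm_k)
            + dcp * ((temp_k - tm_k) - temp_k * math.log(temp_k / tm_k)))


def fraction_unfolded(temp_k, tm_k, dh_vh, dcp=0.0):
    """Population of the unfolded state in a two-state model, K / (1 + K)."""
    k = math.exp(-dg_unfold(temp_k, tm_k, dh_vh, dcp) / (R_KCAL * temp_k))
    return k / (1.0 + k)


# Hypothetical inputs: Tm = 55 C (328.15 K), dH_vH = 120 kcal/mol, evaluated at 25 C.
print(dg_unfold(298.15, 328.15, 120.0))
print(fraction_unfolded(298.15, 328.15, 120.0))
```

With these made-up inputs, ΔG_unfold at 25 °C is about 11 kcal/mol. The folding free energy used in the manuscript (for example, ΔGfold of roughly -8 kcal/mol for the wild type) is the negative of this unfolding free energy.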

Page 6: “it does not contain information regarding TEM-1 and thus FoldX has neither been explicitly fit nor tested on the TEM-1 mutants analysed here”. I do not think this main conclusion of the study is correct. See PROTEINS 23 63-72 (1995) PMID: 8539251 included in the ProTherm database.

Upon further reviewing the ProTherm database, we found that the reviewer is 100% correct. There are actually many different beta-lactamase entries in the ProTherm database, but only a few relate to the protein we studied here, TEM-1. However, this point actually strengthens the main argument of our study, which is that FoldX, PyRosetta, and free energy changes are not predictive of fitness for the variety of reasons we have articulated. Beta-lactamase’s presence in the ProTherm database emphasizes that FoldX is even less predictive, given its training, than one would hope.

To rectify the text, we have deleted this sentence on page 6, instead saying that:

“Of relevance to this work, FoldX is especially designed to model single mutants and has been trained on the select set of beta-lactamase mutants found in the ProTherm database, but has been infrequently applied to multi-point mutants \\cite{Bershtein_Nature_2006}.”

Please change “wildtype” on page 8 to “wild type”

We have made this edit, as suggested.

- Figure S2 is not mentioned

We have now added a reference to this Figure in the following text on page 10:

“In general, both MD+FoldX and PyRosetta predict the majority of the mutants to lie in the same relative places on these plots (see Figure S2 for a direct comparison of MD+FoldX and PyRosetta predictions).”

- Figure S3 is not mentioned

We have now added a reference to Figure S3 on page 15 in the following text:

“This said, it is noteworthy that both PyRosetta and MD+FoldX are more accurate at predicting the folding free energies of this set of double mutants than the single mutants presented above (see Figure S3 for a scatterplot of all of the Wylie mutants).”

- Figure S7 is not mentioned

We have added a mention of Figure S7 to page 8 in the following sentence (note that we have renumbered our Supplementary Figures to be clearer about which are figures and which are tables, so Figure S7 has become Figure S5 in the new draft):

“While we have determined the MIC values of the Wylie mutant data set (see the Supporting Information, including Figures S4 and S5, for further information), here, we overwhelmingly employ the more comprehensive set of fitness values acquired by Firnberg et al. in our analyses (see Table 1)\\cite{firnberg2014comprehensive}.”

Reference: 35,41,57,69,70,72,77: Remove DOI number please.

We have removed these numbers based upon your recommendation.

Supporting information:

Page 6 : “10^6 cfu/mL, not 106 cfu/mL”

Thank you, we have fixed the exponent.

Figure S4. Data of the Wylie mutants are incomplete

Presumably, you are referring to the fact that we list 12, not 15, single mutants. This is because the table is meant to list only the Wylie double mutants and their constituent single mutants (that is, only the singles that are part of the double mutants). Only 12 of the 15 Wylie single mutants are part of the 6 double mutants (and are therefore “constituent singles”). This table is thus complete for the purpose it is meant to serve.

I found Figure S6 to be confusing. Some Wylie mutants are missing (A172P, G218V, S70G) and there are other mutants (not named in the manuscript).

The intent of this figure was simply to show that there is a reasonably strong positive correlation between MIC and fitness data for beta-lactamase mutants. The Wylie mutants need not be the ones used to determine the strength of this correlation, which is why we left out some Wylie MIC data and included data for other mutants (G218V is, however, in the figure). Ultimately, this plot shows that we can safely use the expansive fitness data set employed throughout the main text, which should be the focus of the reader's attention, rather than the MIC data set (which is why all MIC data are confined to the supplement). To clarify this point, we changed the name of this section of the Supplement to “Correlation Between MIC Data and Firnberg Fitness Data” instead of “Correlation Between Wylie Mutant MIC Data and Firnberg Fitness Data.”

We thank the reviewer again for his/her/their careful review of our manuscript, and in particular, for pointing out our various omissions.

____________________________________________________________________________

Additional Edits Introduced by Authors During the Review Process

During the review process, we realized that our original docking calculations were performed on ampicillin in the wrong protonation state (with a COOH group). The carboxylic acid group of ampicillin has a pKa of 2.5, so at neutral pH it is almost entirely deprotonated. As such, the correct protonation state to use is COO-. We thus reran our AutoDock Vina calculations with this corrected protonation state. As a consistency check, we also changed our AutoDock Vina grid box size from 40x60x40 to 24x26x24. Because we want docking scores that reflect ampicillin binding to the active site, we felt that using a smaller grid box would ensure that we do not end up measuring docking scores associated with spurious docking events to non-active site portions of the enzyme. Altogether, these modifications did not change our docking scores significantly. This is apparent from the plots of fitness vs. docking score presented below from before and after our change.

In the left plot, we plot our new results (our new Figure 10) with the deprotonated carboxylate (COO-) and the 24x26x24 grid box; on the right, we plot our old results with the carboxylic acid (COOH) and a 40x60x40 grid box. The overall shape and range of the distribution remain essentially the same, as is also indicated by the comparatively small changes to the fit parameters. We have nevertheless corrected all of the manuscript's plots, including main text Figures 9 and 10 and Supplementary Materials Figure 6, that were based on the original docking calculations.

We have also added the following text explaining the appropriate ampicillin protonation states to the “Computing Ampicillin Binding Free Energies with Autodock Vina” section:

“Because of ampicillin's low pKa of 2.5, all ampicillin carboxylic acid groups were modeled as carboxylates in our ADV simulations.”
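The reasoning behind modeling the carboxyl group as a carboxylate can be checked with the Henderson-Hasselbalch equation, which gives the deprotonated fraction as 1 / (1 + 10^(pKa − pH)). The pH of 7.0 below is an assumption (it matches the CD buffer, but docking itself has no explicit pH), and the sketch addresses only the carboxyl group discussed in the text.

```python
def fraction_deprotonated(pka, ph):
    """Henderson-Hasselbalch fraction of an acidic group present as its conjugate base."""
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


# Carboxylic acid pKa of 2.5 (from the text) at an assumed pH of 7.0.
print(f"{fraction_deprotonated(2.5, 7.0):.5f}")  # prints 0.99997
```

At 4.5 pH units above the pKa, fewer than 1 in 30,000 carboxyl groups remain protonated, which supports using COO- throughout.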

and

“We chose to employ a relatively small grid box so as to reduce the chance that ampicillin binds to a non-active site region of the enzyme, which is undesirable.”
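A search box like the one described is specified in an AutoDock Vina configuration file through the `center_x`/`center_y`/`center_z` and `size_x`/`size_y`/`size_z` options. The sketch below writes such a file, assuming the 24x26x24 dimensions are in ångströms (the unit Vina uses for box sizes). The file names and box centre are hypothetical placeholders, since the active-site coordinates are not given here.

```python
def vina_config(receptor, ligand, center, size=(24, 26, 24)):
    """Return AutoDock Vina config-file text for a box centred at `center`.

    `center` is an (x, y, z) tuple; the active-site coordinates are not given
    in the text, so any value passed here is a placeholder.
    """
    cx, cy, cz = center
    sx, sy, sz = size
    return "\n".join([
        f"receptor = {receptor}",
        f"ligand = {ligand}",
        f"center_x = {cx}",
        f"center_y = {cy}",
        f"center_z = {cz}",
        f"size_x = {sx}",
        f"size_y = {sy}",
        f"size_z = {sz}",
    ]) + "\n"


def box_volume(size):
    """Volume of a search box in cubic angstroms (Vina box sizes are in angstroms)."""
    sx, sy, sz = size
    return sx * sy * sz


# Hypothetical file names and centre, for illustration only.
print(vina_config("tem1_receptor.pdbqt", "ampicillin.pdbqt", (0.0, 0.0, 0.0)))
print(box_volume((40, 60, 40)) / box_volume((24, 26, 24)))  # roughly 6.4
```

The resulting text could be saved to a file and passed to the Vina command-line program with its `--config` option. The volume ratio shows that the smaller box searches roughly one sixth of the original space, which is the intended effect of confining ampicillin to the active-site region.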

We furthermore note that we have performed Smina calculations that independently corroborate our new AutoDock Vina results, as evidenced by the plot of fitness vs. Smina binding free energy changes below.

Attachment

Submitted filename: Response to Reviewers.pdf

Decision Letter 1

Jose M Sanchez-Ruiz

7 May 2020

Predicting the viability of beta-lactamase: How folding and binding free energies correlate with beta-lactamase fitness

PONE-D-20-04028R1

Dear Dr. Rubenstein,

We are pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it complies with all outstanding technical requirements.

Within one week, you will receive an e-mail containing information on the amendments required prior to publication. When all required modifications have been addressed, you will receive a formal acceptance letter and your manuscript will proceed to our production department and be scheduled for publication.

Shortly after the formal acceptance letter is sent, an invoice for payment will follow. To ensure an efficient production and billing process, please log into Editorial Manager at https://www.editorialmanager.com/pone/, click the "Update My Information" link at the top of the page, and update your user information. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to enable them to help maximize its impact. If they will be preparing press materials for this manuscript, you must inform our press team as soon as possible and no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

With kind regards,

Jose M. Sanchez-Ruiz

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: All comments have been addressed

Reviewer #2: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: (No Response)

Reviewer #2: The authors responded satisfactorily to the prior reviewers' comments; this reviewer is pleased to recommend the manuscript for publication.

**********

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

Acceptance letter

Jose M Sanchez-Ruiz

15 May 2020

PONE-D-20-04028R1

Predicting the viability of beta-lactamase: How folding and binding free energies correlate with beta-lactamase fitness

Dear Dr. Rubenstein:

I am pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please notify them about your upcoming paper at this point, to enable them to help maximize its impact. If they will be preparing press materials for this manuscript, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

For any other questions or concerns, please email plosone@plos.org.

Thank you for submitting your work to PLOS ONE.

With kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Prof. Jose M. Sanchez-Ruiz

Academic Editor

PLOS ONE

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 File. Supplemental information file of supporting discussion and figures.

    Contains all supporting discussion regarding β-lactamase and methods, as well as two supporting tables and nine supporting figures.

    (ZIP)

    Attachment

    Submitted filename: Response to Reviewers.pdf

    Data Availability Statement

    All input/output, script, and data files are available from this manuscript’s GitHub Repository, yanghaobojordan/PLOS_BetaLactamase: https://github.com/yanghaobojordan/plos_betalactamase.

