
This is a preprint.

It has not yet been peer reviewed by a journal.

The National Library of Medicine is running a pilot to include preprints that result from research funded by NIH in PMC and PubMed.

[Preprint]. 2023 May 9:2023.05.09.539914. [Version 1] doi: 10.1101/2023.05.09.539914

Simultaneous enhancement of multiple functional properties using evolution-informed protein design

Benjamin Fram 1,#, Ian Truebridge 3,4,5, Yang Su 1, Adam J Riesselman 1,2, John B Ingraham 1, Alessandro Passera 6,7, Eve Napier 8, Nicole N Thadani 1, Samuel Lim 1, Kristen Roberts 10, Gurleen Kaur 10, Michael Stiffler 6, Debora S Marks 1, Christopher D Bahl 3,4,5, Amir R Khan 8,9, Chris Sander 1, Nicholas P Gauthier 1,6,#
PMCID: PMC10197589  PMID: 37214973

Abstract

Designing optimized proteins is important for a range of practical applications. Protein design is a rapidly developing field that would benefit from approaches that enable many changes in the amino acid primary sequence, rather than a small number of mutations, while maintaining structure and enhancing function. Homologous protein sequences contain extensive information about various protein properties and activities that have emerged over billions of years of evolution. Evolutionary models of sequence co-variation, derived from a set of homologous sequences, have proven effective in a range of applications including structure determination and mutation effect prediction. In this work we apply one of these models (EVcouplings) to computationally design highly divergent variants of the model protein TEM-1 β-lactamase, and characterize these designs experimentally using multiple biochemical and biophysical assays. Nearly all designed variants were functional, including one with 84 mutations from the nearest natural homolog. Surprisingly, all functional designs had large increases in thermostability and most had a broadening of available substrates. These property enhancements occurred while maintaining a nearly identical structure to the wild type enzyme. Collectively, this work demonstrates that evolutionary models of sequence co-variation (1) are able to capture complex epistatic interactions that successfully guide large sequence departures from natural contexts, and (2) can be applied to generate functional diversity useful for many applications in protein design.

Introduction

As proteins become increasingly useful across a range of fields including medicine and industry, there is a growing need for designed proteins with optimized characteristics, such as elevated thermostability, higher binding affinity, or increased catalytic activity. Natural proteins are often used as starting points for the development of useful proteins, which can then be engineered as high-performance, task-specific tools. However, efficiently mutating enzymes to yield optimized variants is exceedingly difficult, and randomly mutating enzymes almost always leads to loss of performance, which decreases considerably with every additional mutation [1]. Information-based ‘rational’ engineering can avoid performance loss, but is generally limited to a very small number of sequence changes. One common approach to protein design, directed evolution, makes use of iterative rounds of mutagenesis followed by selection to optimize a specific property like activity or thermostability. However, because a higher count of random mutations overwhelmingly reduces fitness [1], the number of amino acid changes that can be introduced while still maintaining a reasonable number of functional variants is limited by assay throughput. This stepwise incremental selection strategy is often effective at finding sequences with improved properties with a limited number of mutations. The introduction of many simultaneous changes to a protein’s primary sequence is likely required to diversify and optimize multiple desirable properties, and new methods that enable such large changes in primary sequence are needed. Computational design strategies [2–6], which account for the complexity of how each mutated residue interacts with all other residues, are likely required to maintain function when introducing more than a handful of mutations.

An evolution-informed computational protein design strategy may provide a means to generate many changes in primary sequence, enabling the exploration of diverse structural and functional properties. Evolutionary models that take into account complex selective conditions over millions of years by learning meaningful constraints on function from related sets of protein sequences [4,7–9] have been shown to recapitulate core aspects of protein biology, such as 3D structure [7,10,11] and the effects of mutations on protein fitness [2,9,12–14]. Some of these models have been used for protein design, generating mutated proteins from a wild type scaffold that maintain function [2,4–6,15]. Evolutionary couplings models are one subclass of evolutionary models based on residue site- and pairwise dependencies in natural sequence variation [9,16,17]. These models are unsupervised, inferring sequence constraints characteristic of a functional space and quantifying fitness differences between variants without experimentally measured phenotype labels. To make use of these discriminative models for protein design, a sampling algorithm is used to iteratively generate variant sequences that are chosen to optimize a fitness function.

The TEM-1 β-lactamase model system has been extensively used to study protein evolution [5,8,18–21]. β-lactamases are a class of enzymes that are produced by bacteria in order to provide resistance to bactericidal β-lactam antibiotics through hydrolysis of their core β-lactam ring. Many bench biologists are familiar with the use of TEM-1 β-lactamase as a marker for successful transformation, in which selection of functional TEM-1-containing plasmids is as simple as growth in a β-lactam antibiotic like ampicillin. Due to experimental tractability, many publications report the effects of mutations on TEM-1 function and stability, including several studies using deep mutational scans [18,20,22]. Other studies have described the exponential decrease in TEM-1 function when subjected to multiple mutations, with the cumulative effect of 10 random mutations completely abrogating enzyme activity [1].

In this work we investigate whether evolutionary models of sequence co-variation can be used to generate novel enzyme variants that contain many changes to the target sequence while maintaining function. In addition, we test whether making large jumps in the primary amino acid sequence can lead to augmented protein properties such as increased thermostability and altered or broadened substrate specificity, and investigate the implications of these mutations on protein 3D structure.

Results

Protein design and testing workflow

Protein design from discriminative models such as EVcouplings [9,23] is a multistep process (Figure 1). A multiple sequence alignment (MSA) of homologous proteins is generated for the protein of interest, which is then used to generate a site and pairwise evolutionary model. This maximum entropy model quantifies evolutionary constraints and is parameterized by both site-specific (h_i) and pairwise or epistatic (J_ij) constraints (where i and j indicate amino acid positions) with minimal spurious information in the parameter set. The predicted fitness of a specific sequence (σ) is defined as the statistical energy (evolutionary Hamiltonian or EVH, Figure 1 top). To have confidence that the model is capable of generating functional sequences, model quality is assessed here by comparing predictions to known biological properties (e.g., recapitulating known structural contacts common to the protein family and/or the known effects of point mutations on protein fitness of individual sequences). Designed sequences are then generated with a sampling algorithm (e.g., Markov Chain Monte Carlo or Gibbs sampling, Figure 1 bottom left) that optimizes EVH fitness of each entire sequence and satisfies user-specified sequence distance constraints.
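As an illustration of how a site-plus-pairwise statistical energy is evaluated, the sketch below sums site terms h_i and coupling terms J_ij over a sequence. The dictionary layout, the two-letter alphabet, and all parameter values are made up for this example; they are not the EVcouplings data structures or fitted parameters.

```python
from itertools import combinations

def evh(seq, h, J):
    """Statistical energy: site terms h_i plus pairwise terms J_ij (i < j)."""
    site = sum(h[i][aa] for i, aa in enumerate(seq))
    pair = sum(
        J.get((i, j), {}).get((seq[i], seq[j]), 0.0)  # missing couplings count as 0
        for i, j in combinations(range(len(seq)), 2)
    )
    return site + pair

# Toy 3-position model over the alphabet "AB" (hypothetical values).
h = [{"A": 0.5, "B": -0.5}, {"A": 0.0, "B": 0.2}, {"A": -0.1, "B": 0.3}]
J = {
    (0, 1): {("A", "A"): 1.0, ("A", "B"): -1.0, ("B", "A"): -1.0, ("B", "B"): 1.0},
    (1, 2): {("A", "B"): 0.5, ("B", "A"): 0.5},
}

wt, variant = "AAB", "BAB"
delta_evh = evh(variant, h, J) - evh(wt, h, J)  # negative: variant predicted less fit
print(round(evh(wt, h, J), 3), round(delta_evh, 3))
```

In this toy model the single substitution at position 0 changes both its site term and its coupling to position 1, which is the kind of epistatic dependence a site-only model would miss.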

Figure 1: The β-lactamase variant design process.


Strategy applied to generate and test design variants using an evolution-informed statistical model of the β-lactamase protein family. [Computationally Model β-lactamase, top] WT TEM-1 β-lactamase was used to generate a multiple sequence alignment that was used as input to derive an EVcouplings maximum entropy model. The predicted fitness (EVH) for any sequence (σ) can be calculated as the sum of coupling terms J_ij between every pair of residues as well as site-wise conservation terms h_i. [Computationally Design β-lactamase Variants, bottom left] Design variants are generated by using Gibbs sampling to iteratively optimize an objective function that takes into account EVH and sequence similarity to WT TEM-1, to natural homologs, as well as to the other designed sequences. Designs with increasingly enforced dissimilarity to natural sequences were nominated for experimental testing. [Experimentally Test Designs, bottom right] Designs were synthesized, cloned into plasmids, expressed in E. coli, and several experimental protocols were performed to characterize each design including cell-based activity assays, biochemical kinetics analysis, and structural analysis.

Conceptually, a single design variant is generated from an iterative process in which a random starting sequence is mutated over and over with the identity of the retained mutations chosen to optimize a function that includes parameters such as predicted fitness and sequence distance to target or homologous proteins. Testing of designs is dependent on the protein of interest, and can include cellular biological activity assays or detailed biochemical characterization after protein purification (Figure 1 bottom right). For this work we used the TEM-1 β-lactamase model system, which is highly tractable for high-throughput experimental analysis in bacterial culture, and has been well studied for many purposes including evolvability [8,24,25], design [5], and the fitness effect of point mutations [8,20,21].

Diverse design variants generated from model of protein evolution

We generated a global probability model [7,9] of the β-lactamase protein family using a multiple sequence alignment of 14,793 sequences compiled by the jackhmmer [26] sequence search and alignment tool, seeded with wild type TEM-1 (WT TEM-1; UniProt P62593; bitscore cutoff = 0.5*length, Neff = 3,757, Figure 1). The alignment depth was chosen so that the alignment consists largely of bona fide β-lactamases. Analysis of the model revealed that over 80% of the top L predicted residue-residue interactions (“evolutionary couplings”, where L is the length of aligned WT TEM-1 residues) match structural contacts in a known 3D structure of WT TEM-1 solved using X-ray crystallography (PDB: 1XPB [27], Figure 2 Model Quality), as well as in crystal structures of homologs (data not shown). In addition, mutation effect prediction from the model, for single residue variants, is positively correlated with a published deep mutational scan of WT TEM-1 (replicate 1: n = 4,788, Spearman ρ = 0.717; replicate 2: n = 4,769, Spearman ρ = 0.702; Supplemental Figure 1) [18]. These data indicate that the evolutionary Hamiltonian (EVH) model is able to capture both the structure and functional sequence dependence of TEM-1 β-lactamase.
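The contact-based quality check can be sketched as follows: rank couplings by score, keep the top L pairs that are at least 5 positions apart in sequence (the separation stated in the Figure 2 caption), and report the fraction that are structural contacts. The function name, the input layout, and the toy numbers are assumptions for illustration only.

```python
def top_l_precision(couplings, contacts, L, min_sep=5):
    """Fraction of the top-L couplings (by score) that are structural contacts.

    couplings: iterable of (i, j, score); contacts: set of (i, j) with i < j.
    Pairs closer than min_sep in primary sequence are ignored.
    """
    eligible = [(i, j, s) for i, j, s in couplings if abs(i - j) >= min_sep]
    top = sorted(eligible, key=lambda c: c[2], reverse=True)[:L]
    if not top:
        return 0.0
    hits = sum((min(i, j), max(i, j)) in contacts for i, j, _ in top)
    return hits / len(top)

# Toy data (hypothetical): pair (2, 3) scores highest but is too close in sequence.
couplings = [(1, 10, 0.9), (2, 3, 0.95), (4, 20, 0.8), (5, 12, 0.7), (6, 30, 0.1)]
contacts = {(1, 10), (5, 12), (2, 3)}
print(top_l_precision(couplings, contacts, L=3))
```

A structural contact set would in practice come from a distance cutoff applied to a crystal structure such as 1XPB; that step is omitted here.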

Figure 2: Quality of computational model and properties of experimentally tested design variants.


[Model Quality] The computational model was evaluated for quality by comparison to published metrics of β-lactamase structure and function. Predicted residue-residue interactions (top L model couplings that are at least 5 positions apart in primary sequence) are compared to known structural contacts of WT TEM-1 as determined by X-ray crystallography (PDB: 1XPB [27]). Over 80% of the top predicted interactions are structural contacts. [Design Properties] (predicted fitness, left): Predicted fitness changes (ΔEVH) between WT TEM-1 and designs (bottom) as well as the natural homologs of β-lactamases (top, from the multiple sequence alignment used to generate the model). (diversity, right): Table of similarities and general properties of each design. [Design Sequences] Multiple sequence alignment of all designs, with amino acid changes relative to WT TEM-1 colored by new residue property (standard colors: green, hydrophobic and glycine (G); blue, negative charge; red, positive charge; light blue, polar). Conservation and logo for each position in the multiple sequence alignment of natural homologs are displayed above the design sequences.

WT TEM-1 design variants were generated algorithmically. To begin, a random amino acid sequence of length L was separately generated for each design. Gibbs sampling was applied to each sequence: at each iteration, a random position was chosen for mutation and the identity of the amino acid selected to persist to the next round was determined by sampling from the conditional probability distribution of residues at that position proportional to a three-term objective function. The objective function aims to (a) maximize predicted fitness (EVH) while (b) constraining to a target sequence identity relative to WT TEM-1 and (c) enforcing an upper bound on the sequence identity relative to known natural homologs as well as to other design variants. For example, a target of 70% results in a design that has ~70% sequence identity with WT TEM-1 and at most a 70% sequence identity with any natural homolog or other design. We generated sequences with varying sequence identity targets (98%, 95%, 90%, 80%, 70%, and 50%). Of the six sequences that were generated at each threshold, two were selected (Methods) and underwent experimental testing. The sequences assayed are referred to hereafter as “98.a, 98.b, 95.a…50.a, 50.b”, corresponding to their respective sequence identity threshold and an arbitrary secondary identifier. In addition, two sequences were optimized solely towards generating the highest predicted fitness (unconstrained by sequence identity) with one in which the mutations were chosen in a greedy manner (opt.a) and one that was selected using parallel tempering (opt.b, [28,29]).
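The sampling loop described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the exponential (Boltzmann-like) weighting, the quadratic and hinge penalty forms, the weights `w_target` and `w_bound`, and the toy fitness function standing in for EVH are all assumptions made for this example.

```python
import math
import random

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"

def identity(a, b):
    """Fraction of aligned positions with the same residue."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def objective(seq, fitness, wt, others, target, w_target=50.0, w_bound=50.0):
    """Hypothetical three-term score; penalty forms and weights are assumptions."""
    n = len(seq)
    score = fitness(seq)                                       # (a) predicted fitness
    score -= w_target * n * (identity(seq, wt) - target) ** 2  # (b) identity target vs WT
    for other in others:                                       # (c) identity cap vs others
        score -= w_bound * n * max(0.0, identity(seq, other) - target)
    return score

def gibbs_design(fitness, wt, others, target, n_iter=2000, temperature=1.0, seed=0):
    rng = random.Random(seed)
    seq = [rng.choice(ALPHABET) for _ in wt]       # random starting sequence
    for _ in range(n_iter):
        pos = rng.randrange(len(seq))              # random position to resample
        scores = []
        for aa in ALPHABET:
            seq[pos] = aa
            scores.append(objective(seq, fitness, wt, others, target) / temperature)
        top = max(scores)                          # subtract max for numerical stability
        weights = [math.exp(s - top) for s in scores]
        seq[pos] = rng.choices(ALPHABET, weights=weights)[0]
    return "".join(seq)

def toy_fitness(seq):
    """Stand-in for EVH: counts hydrophobic residues (purely illustrative)."""
    return float(sum(a in "AILMV" for a in seq))

wt = "MSIQHFRVALIPFFAAFCLP"          # toy 20-residue target, not a full protein
homologs = ["MSIQHFRVALIPFLAAFCLP"]  # made-up homolog
design = gibbs_design(toy_fitness, wt, homologs, target=0.7)
print(design, round(identity(design, wt), 2))
```

Passing previously accepted designs in `others` alongside the homologs would mimic the constraint that each new design also stays below the identity bound relative to the other designs.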

We next examined general properties of the designs (Figure 2). Controls include WT TEM-1, a consensus sequence that contains the most represented amino acid at each position in the frequency-reweighted alignment used for model generation (rw-consensus), and a catalytically-inactive negative control where the catalytic residue Ser70 (Ser68 in UniProt numbering) was mutated to alanine (S70A, neg. ctrl) [18,30]. The designs all had a predicted fitness (EVH) higher than WT TEM-1 except for 50.a and 50.b (Figure 2, Predicted Fitness and Table). These designs also had much higher predicted fitness than sequences produced by randomly introducing mutations into the WT TEM-1 sequence (Supplemental Figure 2). While a given position was often altered across multiple designs, the identity of the amino acid change often varied (Figure 2, bottom). In general and as expected, the algorithm preferred to change positions that were variable rather than conserved in the multiple sequence alignment (Figure 2 and Supplemental Figure 3A). Although the algorithm did make mutations to residues in the core of the protein (Table in Figure 2), the algorithm preferred mutating positions that were closer to the surface (Supplemental Figures 3B and 4). In addition, positions mutated in the designs generally had fewer overall interactions with other positions in WT TEM-1 (PDB: 1XPB) when compared to positions that were not mutated (Supplemental Figure 3C). An overview of each design, including the number of amino acid changes relative to WT TEM-1 and closest natural homolog as well as some general properties of mutated positions (e.g., number of mutations at core residues) can be found in Figure 2 (Table).

The majority of designs confer resistance to ampicillin in bacteria and are able to hydrolyze ampicillin and nitrocefin in biochemical assays

To assess the effect of the many mutations in each design on function, the sequences were synthesized and cloned in front of the WT TEM-1 promoter and N-terminal signal peptide, transformed into E. coli, and assayed for growth in the presence of ampicillin. Using a Clinical and Laboratory Standards Institute (CLSI) broth microdilution assay, we quantified the minimum inhibitory concentration (MIC) of ampicillin required to completely abrogate the growth of bacteria that express the designed variant (Methods). Results are shown in Figure 3 (top left). Of the 14 designed variants, 11 conferred resistance to ampicillin. Eight had equal or increased MICs compared to WT TEM-1, three had a decreased MIC, and three had the same MIC as the catalytically-inactive negative control (neg. ctrl). In at least one replicate, several of the designs (98.b, 95.a, 95.b, 90.b, and 80.a) grew on the maximum tested concentration of ampicillin (4096 μg/mL), which completely inhibited growth of bacteria expressing WT TEM-1. Bacteria expressing the rw-consensus sequence were unable to grow in any concentration of ampicillin above the MIC of the negative control. The two designs that were optimized solely for predicted fitness (i.e., distance unconstrained) both conferred resistance to ampicillin: the greedy optimized sequence (opt.a) had an MIC equal to that of WT TEM-1, and the parallel tempering optimized sequence (opt.b) had a large decrease in MIC (roughly 100-fold lower). In summary, nearly all of the designs were able to confer resistance to ampicillin, including a design with 84 mutations relative to its closest natural homolog (70.a) and two designs with over 50 mutations (80.a and 80.b).
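For readers unfamiliar with how an MIC is read from a two-fold dilution series, the sketch below takes per-concentration growth calls and returns the lowest concentration at and above which no growth is observed. The growth calls are invented for illustration, and the replicate aggregation described in Methods is not reproduced here.

```python
import math

def mic_from_wells(growth):
    """Lowest concentration with no growth at it and at every higher concentration.

    growth maps concentration (ug/mL) -> True if the culture grew.
    Returns None when cells grow at the highest tested concentration (MIC above range).
    """
    mic = None
    for conc in sorted(growth, reverse=True):
        if growth[conc]:
            break
        mic = conc
    return mic

def dilution_steps(mic_design, mic_reference):
    """Difference in MIC expressed as two-fold dilution steps (log2 fold change)."""
    return math.log2(mic_design / mic_reference)

# Two-fold series from 4096 down to 4 ug/mL with hypothetical growth calls.
concs = [4096 / 2 ** k for k in range(11)]
reference = {c: c < 2048 for c in concs}   # grows below 2048, inhibited at 2048 and above
resistant = {c: True for c in concs}       # grows at every tested concentration

print(mic_from_wells(reference), mic_from_wells(resistant), dilution_steps(64, 2))
```

Returning `None` rather than the top concentration keeps "above the tested range" distinct from a measured MIC.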

Figure 3: Most designs are able to confer ampicillin resistance to bacteria as well as hydrolyze both ampicillin and nitrocefin in vitro.


[top left panel] Minimum inhibitory concentration (MIC) of ampicillin in E. coli as determined by a CLSI broth microdilution assay in which a fixed concentration of design-expressing bacteria was subjected to various ampicillin concentrations. The aggregated MIC calls (gray bars, see Methods) summarize three individual replicate experiments. Most designs (11 of 14) confer resistance to ampicillin. Three designs (70.b, 50.a, 50.b) have an MIC similar to that of the negative control (a catalytically dead point mutant). [top right panel] Specific activity (change in absorbance per mg enzyme per minute, three replicates each) of purified designs on the antibiotic ampicillin. Hydrolysis was measured at an absorbance of 235 nm with an initial saturating concentration of 800 μM ampicillin. Specific activity values are generally consistent with the results of cell-based resistance experiments. [bottom panel] Michaelis–Menten kinetics of each design towards the canonical colorimetric β-lactam substrate nitrocefin (standard deviation from 3 replicates). N/H: no hydrolysis was detectable (neg. ctrl, rw-consensus, and 70.b). N/A: designs that could not be purified (50.a and 50.b). [colored bars] Binary call for each sample and assay. Red: non-functional. Green: functional.

In addition to the broth microdilution assay, we assessed the designs’ resistance to ampicillin in cells with two further independent antibiotic-resistance assays and obtained similar results: MIC determination by colony formation on a serial dilution of ampicillin on agar plates (Figure 5 and Supplemental Figure 5A) and growth on agar plates containing MIC strips (Liofilchem, Figure 5 and Supplemental Figure 5B).

Figure 5: Summary of in-cell, biochemical and stability properties of designed variants.


[Blue color scheme]: Summary values from multiple independent resistance assays of the minimum inhibitory concentration (MIC) in E. coli of multiple β-lactams. Numbers for each assay are as follows: [broth microdilution] aggregate of three or five replicates (Methods), [colony formation] mean of three replicates, [MIC Strip] mean of three replicates unless any replicate was above the highest tested dose, in which case the mode is displayed if available and otherwise the median. Colors are log normalized; darker colors: larger values. [Green and red color scheme]: In vitro biochemical analysis of each design’s ability to hydrolyze ampicillin or nitrocefin (green) as well as the change in melting temperature (thermostability) from WT TEM-1 (red). N/A: the design could not be purified; N/H: the design had no hydrolysis. Heatmap colors are linear; darker colors: larger values.

Intrigued that nearly all designs (11 of 14) were able to confer resistance to ampicillin in bacteria, we next investigated the biochemistry of the enzymatic reaction. Each design was expressed in E. coli and purified. The catalytic activities on the colorimetric β-lactam substrate nitrocefin were measured and initial velocities were fit to the Michaelis–Menten equation (Methods, Figure 3, bottom panel). All of the designs that enabled resistance to ampicillin in the biological assays had a catalytic efficiency (kcat/KM) similar to or higher than that of WT TEM-1. This includes the 70.a design with 84 amino acid differences relative to any known protein. The remaining designs that did not confer resistance to ampicillin in bacteria had no detectable biochemical activity (70.b), or were unable to be purified (50.a, 50.b). Although several designs had an increase in kcat (98.b, 90.b, opt.a, opt.b), the overall effect of this increase on catalytic efficiency was generally nullified by a concordant increase in KM. Interestingly, many of the changes in catalytic efficiency are driven by changes to KM, with the majority of the designs showing a decreased KM relative to WT TEM-1, likely reflecting increased binding affinity to nitrocefin. In summary, these results offer a biochemical explanation (β-lactam hydrolysis) for each design’s ability to confer ampicillin resistance in bacteria.
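To make the kinetic quantities concrete, the sketch below recovers Vmax and KM from initial velocities and then derives kcat and kcat/KM. It uses a Hanes–Woolf linearization ([S]/v against [S]) solved by ordinary least squares, chosen here only because it needs no external libraries; it is not the fitting procedure described in Methods, and the substrate concentrations, enzyme concentration, and parameter values are synthetic.

```python
def fit_hanes_woolf(substrate, velocity):
    """Estimate (Vmax, KM) from [S]/v = [S]/Vmax + KM/Vmax by least squares."""
    y = [s / v for s, v in zip(substrate, velocity)]
    n = len(substrate)
    mean_s = sum(substrate) / n
    mean_y = sum(y) / n
    sxx = sum((s - mean_s) ** 2 for s in substrate)
    sxy = sum((s - mean_s) * (yi - mean_y) for s, yi in zip(substrate, y))
    slope = sxy / sxx                      # slope = 1 / Vmax
    intercept = mean_y - slope * mean_s    # intercept = KM / Vmax
    vmax = 1.0 / slope
    return vmax, intercept * vmax

def michaelis_menten(s, vmax, km):
    """Initial velocity predicted by the Michaelis-Menten equation."""
    return vmax * s / (km + s)

# Synthetic, noise-free data (hypothetical units: uM substrate, uM/s velocity).
true_vmax, true_km, enzyme_conc = 2.0, 50.0, 0.001
substrate = [5.0, 10.0, 25.0, 50.0, 100.0, 200.0]
velocity = [michaelis_menten(s, true_vmax, true_km) for s in substrate]

vmax, km = fit_hanes_woolf(substrate, velocity)
kcat = vmax / enzyme_conc        # turnover number
efficiency = kcat / km           # catalytic efficiency kcat/KM
print(round(vmax, 3), round(km, 3), round(efficiency, 3))
```

Because efficiency is the ratio kcat/KM, a rise in kcat accompanied by a proportional rise in KM leaves it unchanged, which is the pattern noted for several designs above.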

We also performed biochemical analysis of each design’s activity on ampicillin. As absorbance-based detection of ampicillin cleavage was noisy in high-throughput plate-based formats and in low concentrations of ampicillin, we calculated specific activity using a single saturating concentration of ampicillin. Specific activity values for each design are shown in Figure 3 (top right). All of the designs that enabled resistance to ampicillin in bacteria (except for opt.b) had activities similar to WT TEM-1, and the designs that did not confer resistance in bacteria had activity values similar to the catalytically dead negative control (neg. ctrl) or could not be purified (50.a and 50.b). The one exception to this agreement was opt.b, whose biochemical specific activity did not differ from that of the negative control even though it had an increased MIC in the bacterial assays. We found purified opt.b unstable and prone to precipitation, which may explain the discrepancy between bacterial and biochemical assays. Overall, as with the nitrocefin biochemical analysis, specific activity on ampicillin further confirms that the majority of designs are able to effectively hydrolyze β-lactams.

Taken together, designs with the largest number of amino acid differences (50.a, 50.b, 70.b) were non-functional in both bacterial resistance assays and biochemical analysis. The other designs (11 out of 14) conferred resistance to ampicillin in bacteria using multiple independent assays, and exhibited β-lactam hydrolysis in biochemical analysis of nitrocefin and/or ampicillin. These functional designs had varying numbers of amino acid changes, including two sequences with over 50 amino acid changes (80.a and 80.b), one with 62 changes (opt.b) and one with 84 changes (70.a) relative to any known homolog. These data are consistent with the general view that it is increasingly difficult to maintain or improve activity with an increasing number of amino acid changes in a protein, whether by computational design (this work) or experiment-based approaches [1]. The key encouraging difference, however, is that the design process used here can maintain function at a much larger number of mutations than a random mutation process.

Designs have increased stability and broadened specificity

Enhanced enzyme stability and altered substrate specificity or catalytic profile are common goals of protein design. The spectrum of protein stabilities and catalytic substrates of the β-lactamases included in the MSA used to derive the computational model likely extends far beyond that of WT TEM-1. As the design process is informed by this ensemble of homologous proteins with various characteristics (melting temperature, specificity, enzyme kinetics, etc.), it is plausible that the designs are not just an optimized version of WT TEM-1, but rather that each reflects information from all of the sequences used in the MSA.

We assayed the melting temperature (Tm) of each purified design using differential scanning fluorimetry (DSF). Surprisingly, every design we were able to purify (all except 50.a and 50.b) had substantial increases in Tm relative to WT TEM-1 (Figure 4, left). The WT TEM-1 Tm is 50.6°C and the absolute Tm of each design ranged from 55°C to 78°C. Over half (9 of 14) of the designs had an increase of over 10°C, and three exceeded a 20°C increase. Additionally, these increases in thermostability were not at the expense of enzymatic activity as, aside from 70.b, all of these designs were also functional (at mesophilic temperatures). In general a protein’s consensus sequence often has increased thermostability [31], and indeed the rw-consensus (reweighted consensus, Methods) of our MSA had a large Tm increase of 15°C. Although the computational fitness prediction takes into account positional conservation, all of the design sequences are substantially different from the rw-consensus (Figure 2), suggesting that these increases in Tm are not solely accounted for by conserved residues. In summary, every design that we were able to purify had increased thermostability, and, despite these increases, all of these designs except 70.b conferred resistance to ampicillin in E. coli and were able to hydrolyze nitrocefin (Figure 3).
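One common way to call a melting temperature from a DSF melt curve is to take the temperature at which the fluorescence signal rises most steeply. The sketch below applies that idea to a synthetic sigmoidal curve; whether the authors used this derivative-based call or a curve fit is not stated in this section, so the method, the temperature grid, and the example Tm of 60°C are assumptions. Only the WT value of 50.6°C comes from the text.

```python
import math

def melting_temperature(temps, fluorescence):
    """Temperature at the maximum of dF/dT, using central differences."""
    best_t, best_slope = None, -math.inf
    for k in range(1, len(temps) - 1):
        slope = (fluorescence[k + 1] - fluorescence[k - 1]) / (temps[k + 1] - temps[k - 1])
        if slope > best_slope:
            best_t, best_slope = temps[k], slope
    return best_t

def sigmoid_melt(t, tm, width=1.5):
    """Synthetic two-state unfolding curve (normalized fluorescence)."""
    return 1.0 / (1.0 + math.exp((tm - t) / width))

temps = [25.0 + 0.5 * k for k in range(141)]           # 25 to 95 C in 0.5 C steps
curve = [sigmoid_melt(t, tm=60.0) for t in temps]      # hypothetical design, Tm = 60 C

WT_TM = 50.6                                           # WT TEM-1 Tm reported above
tm_design = melting_temperature(temps, curve)
print(tm_design, round(tm_design - WT_TM, 1))          # Tm and change relative to WT
```

Real melt curves are noisier and often need smoothing before differentiation; that step is left out of this sketch.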

Figure 4: All of the designs have increased stability and many also provide increased resistance to multiple classes of β-lactam antibiotics relative to WT TEM-1.


[Thermostability, left panel]: Quantification of thermostability (Tm) using differential scanning fluorimetry (DSF) to determine the melting temperature of each design. The change in melting temperature relative to WT TEM-1 ranges up to 25°C (three replicate experiments). Except for the two designs that could not be purified (50.a and 50.b), all designs were more thermostable, with 9 of the 11 functional designs having a >10°C increase in melting temperature. [Resistance to multiple β-lactams, right panels]: Minimum inhibitory concentration (MIC) of several classes of β-lactam antibiotics was determined using CLSI broth microdilution. Gray bars: aggregated MIC call (Methods). Red diamonds: individual replicate experiments. Upper left in each plot is the antibiotic name, class name, and chemical structure. Many of the designs confer resistance to higher concentrations of antibiotics than WT TEM-1 (left-most column).

We next profiled the designs’ ability to confer resistance to a panel of β-lactam antibiotic substrates using the same CLSI broth microdilution assay used to profile ampicillin resistance. Interestingly, many of the designs had acquired broadened substrate specificities relative to WT TEM-1 (Figure 4 right panel and Figure 5). Other than the two designs that were optimized solely for predicted fitness (opt.a and opt.b), most of the designs that were able to confer resistance to ampicillin also conferred increased resistance on E. coli grown in the presence of at least one of the other tested β-lactam antibiotics (aztreonam, ceftazidime, cefazolin, and cephalothin). Most notably, the 70.a design, which contains 88 mutations relative to WT TEM-1, had a ~32-fold increased MIC of the monobactam β-lactam antibiotic aztreonam, a 16-fold increased MIC of ceftazidime, an 8-fold increased MIC of cefazolin, and a 4-fold increased MIC of cephalothin. The 80.a and 80.b designs also had a 4-fold increased MIC of aztreonam, as well as a 16-fold increased MIC of ceftazidime. Many of the designs had an increased MIC of cefazolin (98.a, 95.a, 95.b, 80.a, and 70.a) and cephalothin (98.a, 98.b, 95.a, 95.b, 80.a, and 70.a). For three of the antibiotics (cefoxitin, imipenem, and meropenem), the designs showed no consistent differences in MIC compared to the negative controls or WT TEM-1 (Supplemental Figure 6). We obtained similar results assessing resistance to aztreonam, ceftazidime, and cephalothin using MIC strips (Supplemental Figure 7).

In summary, every design that we were able to purify had an increase in thermostability, and most of the designs had an increase in the ability to confer resistance towards at least one of the tested β-lactams. In particular, the 80.a and 70.a designs had between 4-fold and 32-fold increased MIC of four tested antibiotics as well as 24°C and 6°C increased Tm, respectively. Most of the rest of the distance-constrained designs (98.a, 98.b, 95.a, 95.b, and 80.b), which all had more than an 11°C increased melting temperature, also had broadened specificity towards at least one of the four tested β-lactams. The simultaneous increase in thermostability and broadened substrate specificity suggests that the design process enhanced multiple parameters, resulting in a diverse set of designed variants that contain a set of useful properties.

Highly mutated design variants have 3D protein structures nearly identical to WT TEM-1

To examine the structural effects of these mutations, we obtained X-ray crystal structures of designs 80.a, 80.b, and 70.a. These three designs had some of the highest mutation counts relative to any natural sequence while still retaining function. Aligning these structures in 3D to a published WT TEM-1 structure (PDB: 1XPB [27]) revealed that all three have nearly identical Cα backbones to WT TEM-1 (0.26–0.59 Å RMSD over all Cα atoms). Both 80.a and 80.b have 55 amino acid substitutions relative to WT TEM-1, of which 80.a has 15 mutations in the core of the structure and 80.b has 19 in the core (Figure 6A). 70.a has 88 mutations relative to WT TEM-1, 35 of which are in the core.
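The RMSD figure quoted above is the root mean square of the distances between matched Cα atoms once two structures have been superimposed. The sketch below computes only that final quantity; it assumes the coordinates are already superimposed and matched residue by residue (the superposition itself, for example by the Kabsch algorithm, is omitted), and the coordinates are invented.

```python
import math

def ca_rmsd(coords_a, coords_b):
    """RMSD (in the input units, e.g. Angstrom) between matched, superimposed C-alpha atoms."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))

# Hypothetical three-residue fragments; the second is displaced 0.5 A along y.
reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
design = [(0.0, 0.5, 0.0), (3.8, 0.5, 0.0), (7.6, 0.5, 0.0)]

print(ca_rmsd(reference, design))
```

Because RMSD squares each deviation, a few displaced loop residues can dominate the value even when the rest of the backbone overlays closely.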

Figure 6: Structural evaluation of designs.


[A] Structural alignment of the crystal structure of 70.a (top, blue), 80.a (middle, yellow) and 80.b (bottom, teal) with WT TEM-1 (PDB:1XPB, silver). Catalytic residues—E166 and S70 [41]—are circled in red and their side chains are in stick representation. [B] Highlight of the mutations (red sections of the ribbons) in each design relative to WT TEM-1. Catalytic residues (E166 and S70) marked as in A. [C] Structural superimposition of 70.a (blue), 80.a (yellow) and 80.b (teal) with publicly available β-lactamase structures (silver, 927 protein chains from 542 PDB entries). [C, inset] Distribution of structural similarity (root mean square deviation in Cα atomic positions, RMSD) of each publicly available β-lactamase structure relative to WT TEM-1 (PDB 1XPB). RMSD of each design: marginal ticks on the x-axis. [D] The relationship between sequence identity and Cα backbone structural similarity (RMSD) for the same publicly-available β-lactamase structures as in C and the three designs. Colors as in C.

Although all three designs had Cα backbone structures that were nearly identical to WT TEM-1, there were slight deviations, and so we analyzed structural variations among WT TEM-1 and its natural homologs. We collected a set of β-lactamase structures (947 polypeptide chains from 542 PDB structures, Methods) and structurally aligned each, as well as the 70.a, 80.a, and 80.b structures, to WT TEM-1. The Cα backbones of the homologous structures largely overlap with one another and with those of the three designs. Quantifying the structural deviation from WT TEM-1 (1XPB) reveals that the designs have a similar amount of variation as the natural homologous structures (Inset of Figure 6C). We next investigated the relationship between sequence identity (to WT TEM-1) and structural variation, and found that the three designs have structural variation that is similar to other PDB entries with a similar sequence identity to WT TEM-1 (i.e., ~70–80%, Figure 6D). In summary, the mutations introduced by the design process do not lead to larger structural variations than those of naturally evolved proteins, in spite of the imposed design constraint of upper bounds on the sequence identity not only to WT TEM-1, but also to all known natural homologs.

We next attempted to find structural evidence for the increased melting temperature and broadened substrate specificity of the three functional designs for which we were able to obtain crystal structures. To investigate a structural basis for the increased Tm (80.a had a 24°C increase relative to WT TEM-1, 80.b had an 18°C increase, and 70.a had a 6.3°C increase, Figure 4 and Figure 5), we tallied the total number of hydrogen bonds and the number of atom pairs in contact (1.7–4 Å) in each structure. These analyses did not reveal any substantial differences with WT TEM-1 that would explain the increase in Tm (data not shown). However, we did observe literature-reported globally stabilizing mutations such as M182T in every one of the designed sequences; M182T has been shown to confer a 7.5°C increase in Tm as a single substitution [21]. The 70.a, 80.a and 80.b designs also showed increased resistance to aztreonam (80.a and 80.b had a 4-fold increased MIC, and 70.a had a 32-fold increased MIC, see broth microdilution assay in Figure 4 and Figure 5). There are five shared mutations near the active site that are exclusive to these three designs (M69A, E104T, P167T, E168A, E240G) that may contribute to this increased resistance. Compared to published structures of β-lactamases bound to aztreonam (PDB 5G18, 1FR6, 2ZQC, 4WBG, 4X53, 5KSC) [32–36] and penicillin binding proteins (PDB 3PBS, 3UE0, 5HLB, 6KGU) [37–40], the smaller side chains of E104T and E240G possibly avoid steric hindrance with aztreonam, but further experimental testing is necessary in order to arrive at a definitive biophysical explanation for the increased activity.

In summary, the three designs possessed identical folds and highly conserved backbone conformations relative to WT TEM-1. The structural variation was of similar magnitude to that of other published structures with a similar sequence identity to WT TEM-1 (~70–80%), and the increases in thermostability and broadened substrate specificities could not be attributed to any obvious structural changes.

The ensemble of mutations in each design positively influences fitness beyond that of individual point mutations

So far this work indicates that our design algorithm, which utilizes a scoring function that takes into account both positional constraints and pairwise interactions between positions (i.e., epistasis between positions), can generate sequence variants with a very large number of amino acid changes—up to 84 in a single sequence—that maintained function and had enhanced properties. It is common to utilize experimentally-derived fitness measurements of individual point mutations to inform protein design. For example, deep mutational scans (DMS), which aim to quantify the fitness of all single point mutations in a wild type background, are a useful means to increase the yield of obtaining functionally active variants by avoiding the introduction of deleterious mutations [42,43]. However, the fitness effect of point mutations on the wild type sequence is not necessarily additive and likely becomes less useful for design as the variant sequences diverge substantially from wild type.

We examined the experimentally-determined fitness effect of individual amino acid changes in the functional designs in the WT TEM-1 background from a published DMS [18] (Supplemental Figure 8). We define “fitness defect” as an amino acid substitution having a fitness score of less than −1 in at least one replicate in the published DMS study, which conceptually equates to a 10-fold decrease in fitness at a high ampicillin concentration (2,500 μg/mL) relative to WT TEM-1. Designs with the fewest number of amino acid changes (98.a, 98.b, 95.a, 95.b, 90.a, 90.b, opt.a) did not contain any mutations that exhibited a fitness defect at any concentration of ampicillin. However, the other functional designs contain amino acid changes that exhibit fitness defects in WT TEM-1; 80.a has two amino acid changes that show fitness defects, 80.b has four, 70.a has 11 and opt.b has 28. The presence of mutations that cause fitness defects to WT TEM-1 in four of the functional designs was noteworthy, especially given that all of these mutations also had a negative fitness prediction as point mutants in the WT TEM-1 background. Several key questions arise from these observations. How are these designs able to function with mutations that are deleterious to WT TEM-1? Why does the design generation algorithm, which attempts to optimize the predicted fitness, produce designs with mutations that were predicted to have a negative fitness impact to WT TEM-1? We next investigated these questions by (1) performing a targeted analysis on one particularly well studied inactivating mutation (G251W), and (2) taking a more general look at the predicted fitness effects of a set of mutations known to be deleterious to WT TEM-1 in each design.

The G251W mutation, which was present in both 70.a and opt.b, stood out due to the numerous studies that describe it as negatively impacting fitness in WT TEM-1 [18,44]. In addition, the G251W mutation in WT TEM-1 had one of the lowest predicted fitness scores. The opt.b design had the lowest performance of any functional design in both the bacterial resistance and biochemical assays, which could in part be due to G251W. Intriguingly, 70.a was reasonably active on ampicillin and was the most effective design (highest MIC) towards three of the five tested β-lactam antibiotics (Figure 2, Figure 4, and Figure 5).

We next investigated whether any single mutation in the 70.a design enables it to retain function in the presence of G251W. In the literature [44], several “compensatory mutations” have been described to at least partially alleviate the fitness defects caused by G251W, and both 70.a and opt.b contain some of these mutations. However, each of these mutations was also present in at least one other design that did not contain the G251W mutation, suggesting that these mutations were not specifically selected by the design algorithm to compensate for G251W. It is also possible that the design generation algorithm selected compensatory mutations not uncovered in the literature review. Although there are no large predicted epistatic effects between G251W and the 70.a mutations on the WT TEM-1 sequence background (Supplemental Figure 9), these double mutants do have some of the highest predicted changes in fitness of all possible double mutations (Supplemental Figure 10). For example, in the 70.a structure, new local interactions between mutated residues W251 and R230 and non-mutated residue E212 may contribute to maintaining β-lactamase structure and function (Supplemental Figure 11). In summary, (1) 70.a contains many point mutations that have been described in the literature as compensatory, (2) these mutations combined with G251W have some of the highest predicted fitness effects, and (3) structural analysis suggests that local interactions between W251 and R230 (introduced in 70.a by the F230R mutation) may be important for maintaining function. Each of these factors likely helps 70.a overcome the deleterious effect of the G251W mutation.

Rather than explaining the presence of mutations in the designs that would have caused fitness defects in WT TEM-1 solely through a single compensatory amino acid change, it is plausible that it is the ensemble of mutations in a given design that accounts for why the algorithm selects such mutations. To probe this hypothesis, we examined how the predicted fitness score of a point mutation may be affected by the amino acids in the rest of the sequence (i.e., the “background” for the point mutation). Using the 70.a design as an example, we isolated each of its 88 mutations and calculated the change in predicted fitness (ΔEVH) from the WT TEM-1 amino acid to the 70.a amino acid in both the WT TEM-1 background and 70.a background (Figure 7A, left). The predicted fitness effect of each point mutation is positive (+) or negative (−) to 70.a and/or to WT TEM-1, and so each mutation falls into one of four possible categories: (Q1) positive in 70.a and negative in WT TEM-1, (Q2) positive in both 70.a and WT TEM-1, (Q3) negative in 70.a and positive in WT TEM-1, and (Q4) negative in both 70.a and WT TEM-1. For all designs, the percent of mutations in each category is shown in Figure 7A (right). As expected, nearly all of the mutations are predicted to have a positive effect on fitness in the design backgrounds. Interestingly, a large fraction of the design mutations are predicted to have a negative impact in the WT TEM-1 background. For example, G251W in WT TEM-1 is predicted to have an approximately 3 point decrease (arbitrary units) in fitness whereas the same mutation in 70.a is predicted to have a ~3 point increase (Figure 7A, left). Conceptually, even though introducing an individual mutation from the designs into WT TEM-1 often lowers its predicted fitness, combining this set of mutations into one sequence results in a higher predicted fitness than WT TEM-1. 
Furthermore, in the context of the designed sequence, reverting the mutated residues individually to their WT TEM-1 amino acids lowers the design’s predicted fitness (Figure 7B). These data suggest that the specific sequence context is very important, i.e., that the combination of all mutations can drastically influence the fitness effect of individual point mutations. Furthermore, the site and pairwise evolutionary model is able to capture complex epistatic interactions between amino acids and, with reasonable accuracy, correctly predict the fitness of highly divergent designs.
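The background dependence described above can be illustrated with a minimal sketch of a site-plus-pairwise scoring function. The two-position model, its parameter values, and the function names `evh` and `delta_evh` are invented purely for illustration; they are not the fitted TEM-1 model or the EVcouplings implementation.

```python
from itertools import combinations

# Toy site (h) and pairwise (J) parameters. All values are hypothetical and
# chosen only to show how a coupling term can flip the sign of a mutation effect.
h = [
    {"G": 0.5, "W": -0.5},  # position 0
    {"F": 0.2, "R": 0.0},   # position 1
]
J = {
    (0, 1): {("G", "F"): 1.0, ("G", "R"): 0.0, ("W", "F"): -1.5, ("W", "R"): 2.5},
}

def evh(seq):
    """Sum of site terms and pairwise coupling terms for a sequence."""
    site = sum(h[i][a] for i, a in enumerate(seq))
    pair = sum(J[(i, j)][(seq[i], seq[j])]
               for i, j in combinations(range(len(seq)), 2))
    return site + pair

def delta_evh(background, pos, new_aa):
    """Predicted effect of one substitution in a given sequence background."""
    mutant = background[:pos] + new_aa + background[pos + 1:]
    return evh(mutant) - evh(background)

# Same substitution (G -> W at position 0), two different backgrounds.
in_wt = delta_evh("GF", 0, "W")      # wild-type-like background
in_design = delta_evh("GR", 0, "W")  # design-like background (position 1 already mutated)
print(in_wt, in_design)
```

In this toy case the substitution lowers the score in the first background and raises it in the second, which corresponds to the top left quadrant (Q1) of Figure 7A.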

Figure 7: The set of mutations in each design collectively influence the predicted effect of individual point mutations.

[A, left] Example design (70.a) of the raw data that are summarized in the table on the right. Plotted is a comparison of how point mutations in 70.a change the predicted fitness (ΔEVH) in WT TEM-1 (x-axis) versus 70.a (y-axis). Each of the 88 points represents a single amino acid substitution that is found in 70.a. Pink background indicates the top left quadrant, which contains those mutations that are predicted to have a negative effect on fitness in WT TEM-1 and a positive effect on fitness in 70.a. The percent of amino acid changes in each quadrant is indicated. [A, right] Summary of predicted fitness changes (ΔEVH) for each design’s amino acid changes compared to their predicted effect in WT TEM-1. The “percent of mutations” is the percent of mutations in each quadrant (see example on left), with the pink background containing the percentage that are predicted to have a negative fitness effect in WT TEM-1 and positive effect in the design. [B] Distribution of predicted fitness effects (ΔEVH) of all possible amino acid changes (252 × 19) introduced in the WT TEM-1 sequence background (top, blue) and 70.a sequence background (bottom, blue). Red lines are the predicted fitness effect of the 70.a mutations if added individually to WT TEM-1 (top) or removed from 70.a (bottom; i.e., reverted to the WT TEM-1 amino acid).

Discussion

In this work we demonstrate a design algorithm based on an evolutionary model of sequence co-variation that enables large changes to primary sequence while enhancing function and thermostability. Previous studies have demonstrated that the random sequential introduction of mutations into β-lactamase results in a rapid decay of activity (resistance to ampicillin), resulting in complete ablation of activity in nearly every variant after the introduction of only 10 mutations [1]. Of the small set of 14 β-lactamase designs that we tested, the hit rate of active sequences, with up to 30% of residues mutated, was surprisingly high with 11 designed proteins enabling bacterial growth on ampicillin and hydrolysis of nitrocefin. These functional designs contained between 6 and 84 amino acid differences from any known protein, indicating that our algorithm far outperforms random mutagenesis-based approaches.

Perhaps one of the most interesting results is the joint optimization of both stability and activity, which has often been viewed as an inherent tradeoff in the protein engineering literature [45–47]. In addition to maintaining enzymatic activity, all of the functional designs also had thermostability increases of between 6°C and 27°C and most conferred increased resistance to one or more β-lactam antibiotics. Crystal structures of the functional designs with the highest mutation count to any natural sequence revealed nearly complete structural preservation relative to WT TEM-1 β-lactamase. The increase in stability and resistance to β-lactams plausibly emerges from the information in the collective set of homologs, which evolved in many species over millions of years, used in sequence fitness prediction. A simple explanation is that each designed variant reflects the set of evolutionary constraints that enabled stability and function of the naturally occurring proteins in the multiple sequence alignment. The expansion of substrate specificity in some of the designs is plausibly related to the diversity of substrate preferences in different species and different conditions represented in the set of sequences in the multiple sequence alignment.

We expect that this design strategy, which leverages natural diversity to engineer stability and activity, is broadly applicable to a range of protein families. The β-lactamase family in particular has a high level of functional and sequence diversity, as reported, with over 4,000 known enzymes in 17 functional groups differentially targeting four classes of substrates [48,49]; these characteristics likely facilitate learning of a fairly broad framework of constraints that retain fold and function while allowing for a range of substrate specificities. A previous study using a similar modeling framework demonstrated the generation of a large set of functional chorismate mutase variants of considerable sequence diversity, illustrating the potential broad applicability of this type of design process [2]. It remains to be determined what levels of sequence and functional diversity in the training alignment are necessary for designing libraries of stable variants with broad specificities, and which protein families satisfy these requirements.

It is tempting to conclude that these designs, which have large numbers of primary sequence changes while maintaining or even increasing activity, are better starting points for refining the specificity of a design in new directions, e.g., by further exploration of very similar sequences in the neighborhood of the designed starting point, than the original starting sequence. One common approach for refining the properties of a protein is directed evolution where mutations in a starting sequence are gradually introduced and accumulated (usually in a greedy manner) over multiple rounds of selection for the desired properties. Conceptually, as the negative effect of mutations can often be linked to decreased stability [50], starting with a protein that has high stability is useful for directed evolution as increased mutational tolerance enables a higher “hit rate” of stable sequences. All of the purifiable designs had increased stability with over half having melting temperature increases of more than 10°C from the wild type. This increased stability suggests that the majority of the designed sequences may be more tolerant to mutation than WT TEM-1, which would be useful in traditional protein optimization strategies like directed evolution. Conceptually, the design process presented here enables large “jumps” to new regions of functional sequence space through the introduction of many mutations. Further improvements to optimize specific functional and structural properties could then be achieved through smaller mutational steps (i.e., “walking” in sequence space). We believe this “jump and walk” strategy may enable us to efficiently discover protein variants with diverse structural and functional property changes while maintaining or increasing function.

One interesting methodological question that arises from these results is how important accounting for epistasis is for our design strategy, and to what extent other machine learning models [4,5,12,51–53] that explicitly include higher-order dependencies, such as variational auto-encoders and large scale language models, are also able to capture the collective effects of protein stability and function necessary to perform a similar design strategy. The evolutionary couplings or Potts model [9] used in this work does approximate collective effects by inferring coupling terms (residue-residue interactions) up to second order (pairwise) that best describe the full dataset of available sequences. When these interactions act iteratively through the entire system, collective effects are approximately captured, in analogy to similar, highly successful, models in statistical physics. Similarly, methods such as variational auto-encoders are also an approximation with parameters derived by minimizing the difference between encoded (input) and decoded (generated) sequence distributions. In practice, which approximation best captures collective effects for the design of entirely new sequences depends on the particular problem and remains to be determined in each case or against large carefully crafted benchmark datasets.

This work supports the use of statistical models of evolutionary sequence information for protein design, enabling the simultaneous introduction of many mutations into the primary protein sequence while maintaining function. Future work will investigate the biophysical explanation for the enhanced properties obtained here, including increased thermostability and broadened substrate specificity, as well as how generalizable this enhancement strategy is to other proteins. We anticipate that this type of approach will be readily applicable to many protein classes as a means to enhance and design new industrial or therapeutic functions.

Methods

Design process and parameters

Alignment

A multiple sequence alignment of β-lactamases was constructed using five iterations of jackhmmer search against the UniRef100 database with a length-normalized bitscore threshold of 0.5 (selected to ensure primarily β-lactamases in the alignment). The alignment was filtered to exclude positions with more than 30% gaps and to exclude sequence fragments aligning to less than 50% of the target sequence. To account for redundancy, similar sequences in the alignment were re-weighted according to their uniqueness using a Hamming distance cutoff of 0.2 [9].
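The gap filter and the redundancy re-weighting can be sketched as follows. This is an illustrative reading of the description above, not the EVcouplings code: the function names are hypothetical, and the sketch assumes that a sequence's weight is the inverse of the number of alignment members (itself included) within a normalized Hamming distance below the cutoff.

```python
def hamming_fraction(a, b):
    """Fraction of aligned positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)

def sequence_weights(alignment, cutoff=0.2):
    """Weight each sequence by 1 / (number of sequences closer than the cutoff).

    Assumption: the sequence itself counts as one of its own neighbours and the
    comparison is strict (< cutoff).
    """
    weights = []
    for s in alignment:
        neighbours = sum(hamming_fraction(s, t) < cutoff for t in alignment)
        weights.append(1.0 / neighbours)
    return weights

def keep_columns(alignment, max_gap_fraction=0.3, gap="-"):
    """Indices of alignment columns with at most the allowed fraction of gaps."""
    n = len(alignment)
    return [j for j in range(len(alignment[0]))
            if sum(s[j] == gap for s in alignment) / n <= max_gap_fraction]

toy = ["AAAAAAAAAA", "AAAAAAAAAC", "CCCCCCCCCC"]
print(sequence_weights(toy))  # the two near-identical sequences share their weight
```

Down-weighting near-duplicates in this way keeps densely sampled clades from dominating the inferred statistics.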

EVcouplings model

EVcouplings is a probabilistic model of the evolutionary process of sequence generation, parameterized by site-specific and pairwise constraints. The site and coupling parameters are inferred using regularized maximum pseudolikelihood [9].

Gibbs sampling

Design sequences were generated under the EVcouplings model using Gibbs sampling, a type of Markov chain Monte Carlo (MCMC) sampling. Gibbs sampling is implemented by starting with a random seed sequence, iterating through positions at random, computing the conditional probability distribution of each potential residue at those positions according to an objective function, and selecting a new residue according to that probability distribution. The objective function used to define the probability distribution consists of the EVH score and constraint terms to restrict distance from the target sequence, distance from any other natural sequences (those in the multiple sequence alignment), and distance from other designs to a set threshold.
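A minimal sketch of this sampling loop is given below. The form of the constraint term is not specified in the text, so the quadratic penalty on the distance to the target, the four-letter alphabet, the `temperature` parameter, and all function names are assumptions made for illustration only.

```python
import math
import random

ALPHABET = "ACDE"  # toy alphabet; a real model would use all 20 amino acids

def make_score(evh, target, wanted_distance, strength=5.0):
    """Objective = model score minus a (hypothetical) quadratic distance penalty."""
    def score(seq):
        distance = sum(a != b for a, b in zip(seq, target))
        return evh(seq) - strength * (distance - wanted_distance) ** 2
    return score

def gibbs_sample(seed, score, n_sweeps=50, temperature=1.0, rng=None):
    """Resample one position at a time from its conditional distribution."""
    rng = rng or random.Random(0)
    seq = list(seed)
    for _ in range(n_sweeps):
        positions = list(range(len(seq)))
        rng.shuffle(positions)  # visit positions in random order
        for i in positions:
            # Score every candidate residue at position i, rest of sequence fixed
            scores = [score("".join(seq[:i] + [a] + seq[i + 1:])) / temperature
                      for a in ALPHABET]
            top = max(scores)  # subtract the maximum for numerical stability
            weights = [math.exp(s - top) for s in scores]
            seq[i] = rng.choices(ALPHABET, weights=weights, k=1)[0]
    return "".join(seq)

# Usage with a stand-in score (number of 'A' residues) instead of a fitted model
score = make_score(lambda s: s.count("A"), target="AAAAAA", wanted_distance=2)
print(gibbs_sample("CCCCCC", score, n_sweeps=20, temperature=0.05,
                   rng=random.Random(1)))
```

Replacing the weighted draw with a plain argmax over `scores` would give a greedy variant of the same loop.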

Greedy sampling - opt.a

One sequence (opt.a) was also generated using a greedy sampling protocol where rather than selecting a new residue according to a probability distribution, the highest scoring residue was selected at each step.

Parallel tempering sampling - opt.b

One sequence (opt.b) was generated using a parallel tempering protocol to avoid getting trapped in local minima. In this approach, multiple replicas are initiated with different temperature parameters, and these parameters are then exchanged between replicas according to the Metropolis-Hastings criterion at regular intervals during the optimization [28].
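The exchange step can be sketched with the standard replica-exchange acceptance rule. The sketch assumes that a higher score is better and that each replica is represented by a small dictionary; the names and data layout are hypothetical and do not describe the implementation used for opt.b.

```python
import math
import random

def swap_probability(score_a, temp_a, score_b, temp_b):
    """Metropolis acceptance probability for exchanging two replica temperatures.

    Assumes higher scores are better, i.e. energy = -score.
    """
    exponent = (1.0 / temp_a - 1.0 / temp_b) * (score_b - score_a)
    return 1.0 if exponent >= 0 else math.exp(exponent)

def maybe_swap_temperatures(replica_a, replica_b, rng):
    """Exchange the temperature parameters of two replicas with the above probability."""
    p = swap_probability(replica_a["score"], replica_a["temp"],
                         replica_b["score"], replica_b["temp"])
    if rng.random() < p:
        replica_a["temp"], replica_b["temp"] = replica_b["temp"], replica_a["temp"]
        return True
    return False

cold = {"score": 1.0, "temp": 1.0}
hot = {"score": 3.0, "temp": 2.0}
# The hot replica holds the better sequence, so the exchange is always accepted
print(maybe_swap_temperatures(cold, hot, random.Random(0)), cold, hot)
```

A swap that hands the lower temperature to the better-scoring replica is always accepted; the reverse is accepted only with exponentially decreasing probability.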

Frequency reweighted sequence - rw-consensus

The reweighted consensus control was generated by assigning the most frequent residue at each position in the alignment, after redundancy-reweighting each sequence according to its uniqueness with a Hamming distance cutoff of 0.2 [9].
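As a sketch, the reweighted consensus is a per-column weighted vote. The function name is hypothetical, the weights are assumed to come from the redundancy-reweighting step, and the handling of gaps and ties is not described in the text and is left unspecified here.

```python
from collections import defaultdict

def reweighted_consensus(alignment, weights):
    """Pick, at each column, the residue with the largest summed sequence weight."""
    consensus = []
    for j in range(len(alignment[0])):
        totals = defaultdict(float)
        for seq, w in zip(alignment, weights):
            totals[seq[j]] += w
        consensus.append(max(totals, key=totals.get))
    return "".join(consensus)

toy = ["AC", "AC", "AC", "GT", "GT"]
# Illustrative weights: three redundant sequences down-weighted, two kept at full weight
print(reweighted_consensus(toy, [0.25, 0.25, 0.25, 1.0, 1.0]))
print(reweighted_consensus(toy, [1.0] * 5))  # unweighted majority vote for comparison
```

The two calls give different answers, which shows how reweighting keeps an over-represented group of near-identical sequences from fixing the consensus.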

Sequence subsampling

Six designs were generated at each distance threshold (50%–98% sequence identity). From these designs, two were selected for experimental characterization. Designs with mutations in positions that are over 90% conserved in the multiple sequence alignment and contain a different amino acid from the WT TEM-1 sequence were removed. For the remaining sequences, the number of mutations on the surface and in the core of each design was estimated with a TEM-1 structure (PDB 1ZG4) as a reference, using the “FindSurfaceResidues” PyMOL script. The two sequences at each threshold with the most core mutations were selected for synthesis and testing. If two sequences were equally ranked, the sequence with the highest energy value was chosen for validation.

Similarity to known sequences

We used BLAST to find the nearest homologs when preparing for publication. The database used was the non-redundant protein sequences (nr) database and the algorithm selected was blastp (protein-protein BLAST).

Cloning for Antibiotic Resistance Assays

Plasmid preparation

Designs were cloned into a modified pSTC0 plasmid after a native ampR promoter and WT TEM-1 N-terminal signal peptide. The pSTC0 plasmid originally contained two antibiotic cassettes, ampicillin and kanamycin. We replaced the kanamycin resistance cassette with a zeocin resistance cassette in order to reduce the probability of contamination with other ongoing projects in the lab.

Codon Optimization

Reverse-translation of sequences to DNA was performed using the canonical E. coli codon table. To reduce the impact of differential translation efficiencies on the rate of β-lactamase translation, we used codons for the mutant amino-acid with the most similar codon usage frequency to that of the wild type codon.
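The codon choice rule can be sketched as a nearest-frequency lookup. The usage values below are placeholders and are not real E. coli codon frequencies; the table layout and function name are likewise hypothetical, and whether usage was expressed as a relative fraction or per thousand codons is not stated in the text.

```python
# Placeholder usage values for illustration only; NOT real E. coli frequencies.
CODON_USAGE = {
    "GAA": 0.70, "GAG": 0.30,                            # Glu (E)
    "ACT": 0.20, "ACC": 0.40, "ACA": 0.15, "ACG": 0.25,  # Thr (T)
}
CODONS_FOR = {"E": ["GAA", "GAG"], "T": ["ACT", "ACC", "ACA", "ACG"]}

def pick_mutant_codon(wt_codon, mutant_aa, usage=CODON_USAGE, codons_for=CODONS_FOR):
    """Choose the mutant codon whose usage is closest to that of the wild type codon."""
    wt_usage = usage[wt_codon]
    return min(codons_for[mutant_aa], key=lambda codon: abs(usage[codon] - wt_usage))

# An E -> T substitution at a position encoded by a frequently used Glu codon
print(pick_mutant_codon("GAA", "T"))
# The same substitution at a position encoded by a rarely used Glu codon
print(pick_mutant_codon("GAG", "T"))
```

Matching usage in this way aims to keep local translation speed similar to that of the wild type gene, rather than maximizing expression.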

Gene Synthesis and Assembly into Plasmid

Designs were synthesized as gBlocks by Integrated DNA Technologies and cloned by Gibson assembly (New England Biolabs) into the modified pSTC0 backbone.

Determination of bacterial resistance to ampicillin and other β-lactam antibiotics

Sequence-validated glycerol stocks of DH5α E. coli (New England Biolabs) were used for all bacterial resistance assays.

MIC determination using a broth microdilution assay

A fixed concentration of design-expressing E. coli (DH5α), as determined by optical density (OD) calibrated to a McFarland prep, was added to a 2-fold serial dilution of ampicillin in cation-adjusted Mueller-Hinton broth (CAMHB). Three experiments were performed for each design-ampicillin concentration, and the final MIC was determined as (1) the mode of the 3 replicates or, if there was no mode, (2) the median if all replicates were in essential agreement (i.e., within one serial dilution of one another) or, if there was no essential agreement, (3) the median of 5 replicates after performing an additional 2 replicates.
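The three-step decision rule can be written out as a short function. This is a sketch under one reading of “essential agreement”: since three distinct values on a 2-fold grid cannot all lie within a single dilution step, the sketch treats replicates as agreeing when neighbouring sorted values differ by at most one 2-fold step. The function names are hypothetical.

```python
import math
from statistics import median, multimode

def in_essential_agreement(mics):
    """True if neighbouring sorted MICs differ by at most one 2-fold dilution.

    Assumption: this is one interpretation of 'within one serial dilution'.
    """
    ordered = sorted(mics)
    return all(math.log2(hi / lo) <= 1.0 + 1e-9
               for lo, hi in zip(ordered, ordered[1:]))

def final_mic(first_three, extra_two=None):
    """Apply the mode / median / additional-replicates rule to MIC replicates."""
    modes = multimode(first_three)
    if len(modes) == 1:                      # (1) a unique mode exists
        return modes[0]
    if in_essential_agreement(first_three):  # (2) no mode, replicates agree
        return median(first_three)
    if extra_two is None:                    # (3) two more replicates are needed
        raise ValueError("no mode and no essential agreement: run 2 more replicates")
    return median(list(first_three) + list(extra_two))

print(final_mic([64, 64, 128]))                   # rule 1
print(final_mic([64, 128, 256]))                  # rule 2
print(final_mic([16, 128, 1024], [128, 256]))     # rule 3
```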

Ampicillin MIC determination by assessing colony formation on agar plates

E. coli (DH5α) that expressed each design were used to inoculate Mueller-Hinton (MH) broth containing 50 μg/mL zeocin (to ensure plasmid maintenance). Cultures were grown overnight (37°C at 250 RPM). The following day, cells were spun down and resuspended in 0.85% NaCl, and diluted to a final optical density of 0.125. This dilution was further diluted to achieve 5000 cells/mL and 40 μL of this final solution (~200 cells) was pipetted into one well of a 6-well plate that contained a serial dilution of ampicillin in MH agar. Two 6-well plates were used for each of the three replicates, which contained ampicillin at the following concentrations: 4608 μg/mL, 2304 μg/mL, 1152 μg/mL, 576 μg/mL, 288 μg/mL, 144 μg/mL, 72 μg/mL, 36 μg/mL, 18 μg/mL, 9 μg/mL, 4.5 μg/mL, 0 μg/mL. MIC was defined as the lowest ampicillin concentration having no visible colonies after overnight culture at 37°C.
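The plating arithmetic and the MIC read-out can be checked with a few lines. The function names and the colony counts in the example are hypothetical; only the concentrations, volume, and cell density come from the text.

```python
def ampicillin_series(top=4608.0, n_halvings=10):
    """2-fold dilution series from the top concentration, plus a drug-free well."""
    return [top / 2 ** k for k in range(n_halvings + 1)] + [0.0]

def cells_plated(cells_per_ml=5000, volume_ul=40):
    """Expected number of cells delivered to one well."""
    return cells_per_ml * volume_ul / 1000

def agar_mic(colony_counts):
    """Lowest concentration with no visible colonies, or None if all wells grew."""
    clear = [conc for conc, n in colony_counts.items() if n == 0]
    return min(clear) if clear else None

series = ampicillin_series()
print(len(series), cells_plated())  # 12 wells (two 6-well plates), 200 cells per well

# Made-up colony counts, listed from the highest to the lowest concentration
counts = dict(zip(series, [0, 0, 0, 3, 40, 150, 200, 200, 200, 200, 200, 200]))
print(agar_mic(counts))
```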

MIC determination using MIC strips

E. coli (NEB5α, New England Biolabs) that expressed each design were used to inoculate Mueller-Hinton (MH) broth containing 50 μg/mL zeocin (to ensure plasmid maintenance). Cultures were grown overnight (37°C at 250 RPM). The following day, cells were spun down and resuspended in 0.85% NaCl, and diluted to a final optical density of 0.125. 500 μL of each design-expressing E. coli culture was pipetted onto three 15-cm plates containing MH agar + 50 μg/mL zeocin, and 25 glass beads were dropped onto the plate and shaken vigorously until the liquid culture was well distributed. Five MIC strips (ampicillin, aztreonam, ceftazidime, cefazolin, and cephalothin) were placed on the plate in a star pattern. Plates were placed at 37°C overnight, and the following day plates were imaged with a Bio-Rad ChemiDoc. Each sample was quantitated by eye by assessing the intersection of bacterial growth and non-growth on the strip.

Protein Purification

All TEM-1 designs were expressed and purified as follows. DNA was transformed into Lemo21(DE3) cells and grown in MDAG-135 media overnight at 37°C. The saturated culture was used to inoculate an autoinduction media culture of TBM-5052 supplemented with 1 mM rhamnose and Y-Antifoam (Sigma, A5758) as a 1:100 dilution. The expression culture was grown for 20–22 hr at 30°C. Cells were harvested by centrifugation and resuspended in Buffer A (20 mM NaPi, 500 mM NaCl pH 7.4, 20 mM Imidazole) supplemented with 0.25 mg/ml Lysozyme (Sigma, L6876), Turbonuclease (Accelagen, N0103), and EDTA-free Protease Inhibitor tablets. Cells were lysed by sonication at 4°C and clarified by centrifugation at 14,000 × g for 20 min. The supernatant was applied to 2 mL of cOmplete Ni2+-agarose (Roche) prewashed with Buffer B (20 mM NaPi, 500 mM NaCl pH 7.4, 20 mM Imidazole) and batch bound at room temperature for 1 hour. The Ni2+-agarose resin was washed with 60 mL of Buffer B, and sample was eluted with 5 mL Buffer C (20 mM NaPi, 500 mM NaCl pH 7.4, 400 mM Imidazole). Protein was concentrated to 0.5 mL using Amicon-4 10K concentrators and applied to a Superdex 75 pg 10/300 column equilibrated with Buffer D (20 mM NaPi, 150 mM NaCl pH 7.4). The β-lactamase peak was collected and incubated with a 1:100 (w/w) dilution of His-CthSUMO protease [54] at room temperature for 1 hour. The protein solution was incubated with 2.5 mL of Ni2+-agarose prewashed with Buffer D, and the flow through was collected as the final product. Concentration was measured by A280 and purity was determined by SDS-PAGE. Protein was brought to 5% (v/v) glycerol, aliquoted, flash-frozen in liquid nitrogen, and stored at −80°C.

Enzyme Kinetics

All assays were performed at 25°C in 20 mM NaPi, 150 mM NaCl pH 7.4. A SpectraMax i3x instrument was used to monitor substrate hydrolysis at wavelengths 482 nm for nitrocefin and 235 nm for ampicillin (delta E 900).

Nitrocefin

Purified designs and control enzymes were incubated with multiple concentrations of nitrocefin, and hydrolysis was measured at an absorbance of 482 nm over time. For each plate, the buffer-only well absorbance was subtracted from each datapoint. A standard curve of absorbance versus hydrolyzed nitrocefin concentration was generated, and a linear fit of these data was used to calculate an absolute concentration of hydrolyzed nitrocefin for each sample. For each design-nitrocefin concentration, a linear regression was calculated for all possible sliding windows of 4 timepoints for each of the replicates. The maximum slope was selected and the quality of fit was verified by normalized RMSD as well as by eye. Samples neg. ctrl (S70A), rw-consensus, and 70.b had no detectable hydrolysis (N/H) and samples 50.a and 50.b were unable to be purified (N/A).
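The sliding-window rate estimate can be sketched as an ordinary least-squares slope computed over every run of consecutive timepoints, keeping the steepest one. The function names and the example trace are hypothetical, and the fit-quality check by normalized RMSD is omitted.

```python
def slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

def max_window_slope(times, product, window=4):
    """Return (slope, start index) of the steepest window of consecutive timepoints."""
    best = None
    for start in range(len(times) - window + 1):
        s = slope(times[start:start + window], product[start:start + window])
        if best is None or s > best[0]:
            best = (s, start)
    return best

# Made-up progress curve: linear at first, then flattening as substrate is depleted
times = [0, 1, 2, 3, 4, 5, 6, 7]
product = [0.0, 2.0, 4.0, 6.0, 7.0, 7.5, 7.8, 8.0]
print(max_window_slope(times, product))
```

Taking the maximum over windows is intended to pick out the initial, approximately linear phase of the reaction even when the first reads are noisy or the curve saturates early.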

Ampicillin

Specific activity measurements were determined using a saturating condition of reagent and a constant amount of enzyme for each given substrate. Hydrolysis was measured at an absorbance of 235 nm (delta E 900) over time. For each design, a linear regression was calculated for all possible sliding windows of 5 timepoints for each of the replicates. The maximum slope was selected and the quality of fit was verified by RMSD as well as by eye. Final specific activity values were derived as previously reported [55].

Differential Scanning Fluorimetry

Protein thermal stability was measured by differential scanning fluorimetry using a QuantStudio Pro 6/7 (Applied Biosystems). Protein was brought to a concentration of 10 μM with 5x Sypro Orange dye in a final volume of 20 μL in 20 mM NaPi, 150 mM NaCl pH 7.4. Raw data processing and derivative fitting were performed using Applied Biosystems Protein Thermal Shift software, version 1.2. Measurements were conducted in triplicate.

X-ray structure determination

Expression and purification

Designs 80.a and 80.b were purified as described in Protein Purification. This purification method did not yield crystals for 70.a so a different strategy was applied. A construct with the sequence corresponding to the designed variant was synthesized by Genscript. The cDNA was inserted into a pET15b vector at the NdeI/BamHI sites, with an N-terminal His tag followed by a thrombin protease recognition sequence. The DNA construct was transformed into BL21(DE3) cells and expressed in liter amounts in 2xYT media (FORMEDIUM, UK) in the presence of 100 μg/mL of ampicillin (SIGMA Co). Cells were grown at 37°C to an OD600 of 0.6, induced with 0.1 mM IPTG and incubated overnight at 18°C. Cells were then harvested and lysed by sonication in extraction buffer (300 mM NaCl, 20 mM imidazole, 10 mM β-mercaptoethanol, 20 mM Tris-Cl pH 8). Following centrifugation (30 min, 277 K, 12,000 × g), the soluble fraction was applied to a Ni2+-agarose resin by gravity. The resin was washed in the extraction buffer supplemented with 40 mM imidazole, and bound proteins were eluted with a step gradient to 200 mM imidazole. Eluted proteins were dialyzed overnight in extraction buffer in the presence of thrombin protease at 277 K. Following thrombin cleavage, there were 4 non-native residues (Gly-Ser-His-Met) prior to Pro27 of mature TEM-1 β-lactamase. Cleaved bLac70.a was collected as a flow-through fraction from a second Ni2+-agarose gravity column. Following exchange into a low salt buffer (10 mM NaCl, 10 mM Tris-HCl pH 8, 1 mM DTT), the protein was loaded onto a Mono Q 5/50 GL ion exchange column and a 50% gradient of high salt buffer was applied (low salt buffer supplemented with 1 M NaCl). Fractions corresponding to the first peak of β-lactamase protein were pooled and loaded onto a Superdex 75 10/300 GF column (Cytiva) equilibrated in gel filtration buffer for further purification. The β-lactamase peak was concentrated in a 10 kDa molecular weight cut-off Amicon centrifugal concentrator to a final concentration of 20 mg/ml.

Crystallization

Purified 70.a proteins were subjected to hanging drop crystallization using commercial screens and Mosquito robotics (SPT Labtech). Crystals were directly harvested from 96-well plates and plunged briefly into 25% glycerol as a cryoprotectant. 70.a crystals grew from the commercial screen Morpheus containing 0.002 M Divalent II [0.005 M manganese(II) chloride tetrahydrate, 0.005 M cobalt(II) chloride hexahydrate, 0.005 M nickel(II) chloride hexahydrate, 0.005 M zinc acetate dihydrate], 0.1 M buffer system 6 pH 8.5 [Gly-Gly, AMPD], and 30% precipitant mix 7 [20% w/v PEG 8000, 40% v/v 1,5-pentanediol]. Crystals of purified 80.a and 80.b proteins were grown and harvested from commercial screens under numerous PEG/salt conditions. However, the best crystals were obtained with MES buffer at pH 6.5, with PEG3350 as the precipitant. See Supplemental Table 1 for Crystallographic Data and Refinement Statistics.

Data collection and refinement

The crystals were frozen in liquid nitrogen and subjected to X-ray data collection at the indicated synchrotron (Table 1). Data were processed using XDS [56] and Aimless [57]. The structure of bLac70.a was determined by molecular replacement using WT β-lactamase as a search model in Phaser (PDB code 1XPB) [27,58]. The initial model was built using the program AutoBuild implemented in PHENIX [59]. This was followed by alternating cycles of manual model building in COOT [60] and refinement using PHENIX.

Structural Comparison to Published β-lactamase Homologs

Publicly-available β-lactamase structures

Experimental β-lactamase structures were collected by performing a PSI-BLAST search (5 iterations) of the WT TEM-1 sequence with the database set to Protein Data Bank (PDB). Structures annotated as beta lactamase were further filtered to remove those that had less than 20% sequence identity to WT TEM-1. After filtering, 947 polypeptide chains from 542 PDB structures remained.
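The identity filter described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names, the input layout (pairs of gapped, equal-length aligned strings keyed by a chain identifier), and the choice to compute identity over columns where both sequences have a residue are all assumptions, since the text does not state how sequence identity was defined.

```python
def percent_identity(aligned_query, aligned_hit):
    """Percent identity over alignment columns where neither sequence has a gap.

    Assumption: both inputs are gapped strings of equal length from a
    pairwise alignment, with '-' as the gap character.
    """
    if len(aligned_query) != len(aligned_hit):
        raise ValueError("aligned sequences must have the same length")
    # Keep only columns in which both sequences carry a residue.
    pairs = [(q, h) for q, h in zip(aligned_query, aligned_hit)
             if q != "-" and h != "-"]
    if not pairs:
        return 0.0
    matches = sum(1 for q, h in pairs if q == h)
    return 100.0 * matches / len(pairs)


def filter_by_identity(alignments, min_identity=20.0):
    """Keep chains whose identity to the query is at least min_identity.

    `alignments` maps a hypothetical chain identifier to a tuple of
    (aligned_query, aligned_hit). Chains below the threshold are dropped,
    mirroring the "less than 20%" removal rule in the text.
    """
    return [chain_id
            for chain_id, (query, hit) in alignments.items()
            if percent_identity(query, hit) >= min_identity]
```

A different denominator (for example, the full query length) would give different identities for gapped alignments, so the threshold is only comparable across chains when one definition is used throughout.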

Structural alignment

Polypeptide chains were aligned in PyMOL using the “super” or “align” command, and the alignment with the lowest RMSD for each chain is reported and used to visualize structural variation.
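The "keep the lower-RMSD command" step can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' script: the function name `best_superposition` is hypothetical, default command parameters are assumed, and the code relies on PyMOL's `cmd.super` and `cmd.align` returning a sequence whose first element is the RMSD after refinement. The `cmd` object is passed in as an argument so the selection logic can be exercised without a PyMOL installation.

```python
def best_superposition(mobile, target, cmd):
    """Superpose `mobile` onto `target` with both PyMOL commands and
    return (method_name, rmsd) for whichever gives the lower RMSD.

    `cmd` is expected to behave like `pymol.cmd`; `mobile` and `target`
    are PyMOL selection strings (for example, one chain each).
    """
    rmsd_by_method = {
        # Element 0 of the returned sequence is the post-refinement RMSD.
        "super": cmd.super(mobile, target)[0],
        "align": cmd.align(mobile, target)[0],
    }
    # Pick the method with the smallest RMSD (ties go to "super").
    method = min(rmsd_by_method, key=rmsd_by_method.get)
    return method, rmsd_by_method[method]


# Example use inside a PyMOL session (object and chain names are placeholders):
#   from pymol import cmd
#   method, rmsd = best_superposition("design and chain A",
#                                     "1xpb and chain A", cmd)
```

Because each command moves the mobile object, the coordinates left in the session correspond to the last command run; rerunning the winning method before visualization would be one way to handle that.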

Supplementary Material

Supplement 1
media-1.pdf (4.5MB, pdf)

Acknowledgements

We would like to thank Dan Davidi for advice relating to enzyme kinetics, Frank Poelwijk for insightful discussion, and The Center for Macromolecular Interactions at Harvard Medical School for advice, discussion, and equipment. This work is based upon research conducted at the Northeastern Collaborative Access Team beamlines, which are funded by the National Institute of General Medical Sciences from the National Institutes of Health (P30 GM124165). This research used resources of the Advanced Photon Source, a U.S. Department of Energy (DOE) Office of Science User Facility operated for the DOE Office of Science by Argonne National Laboratory under Contract No. DE-AC02-06CH11357. This research also used the FMX beamline of the National Synchrotron Light Source II, a U.S. Department of Energy (DOE) Office of Science User Facility operated for the DOE Office of Science by Brookhaven National Laboratory under Contract No. DE-SC0012704. The Center for BioMolecular Structure (CBMS) is primarily supported by the National Institutes of Health, National Institute of General Medical Sciences (NIGMS) through a Center Core P30 Grant (P30GM133893), and by the DOE Office of Biological and Environmental Research (KP1607011). This material is based upon work supported by the U.S. Department of Energy, Office of Science, Office of Biological and Environmental Research, Genomic Science Program under Award Number DE-SC0022024 (N.P.G.), a grant from Science Foundation Ireland (SFI 20/FFP-A/8446) (A.R.K.), and funding from Dana-Farber Cancer Institute (N.P.G. and C.S.).

Code and data

All data and code are available in the supplemental material.

Bibliography

  • 1. Soskine M, Tawfik DS. Mutational effects and the evolution of new protein functions. Nat Rev Genet. 2010;11: 572–582.
  • 2. Russ WP, Figliuzzi M, Stocker C, Barrat-Charlaix P, Socolich M, Kast P, et al. An evolution-based model for designing chorismate mutase enzymes. Science. 2020;369: 440–445.
  • 3. Ingraham J, Baranov M, Costello Z, Frappier V, Ismail A, Tie S, et al. Illuminating protein space with a programmable generative model. bioRxiv. 2022. p. 2022.12.01.518682. doi: 10.1101/2022.12.01.518682
  • 4. Alley EC, Khimulya G, Biswas S, AlQuraishi M, Church GM. Unified rational protein engineering with sequence-based deep representation learning. Nat Methods. 2019;16: 1315–1322.
  • 5. Biswas S, Khimulya G, Alley EC, Esvelt KM, Church GM. Low-N protein engineering with data-efficient deep learning. Nat Methods. 2021;18: 389–396.
  • 6. Shin J-E, Riesselman AJ, Kollasch AW, McMahon C, Simon E, Sander C, et al. Protein design and variant prediction using autoregressive generative models. Nat Commun. 2021;12: 2403.
  • 7. Marks DS, Colwell LJ, Sheridan R, Hopf TA, Pagnani A, Zecchina R, et al. Protein 3D structure computed from evolutionary sequence variation. PLoS One. 2011;6: e28766.
  • 8. Stiffler MA, Poelwijk FJ, Brock KP, Stein RR, Riesselman A, Teyra J, et al. Protein Structure from Experimental Evolution. Cell Syst. 2020;10: 15–24.e5.
  • 9. Hopf TA, Ingraham JB, Poelwijk FJ, Schärfe CPI, Springer M, Sander C, et al. Mutation effects predicted from sequence co-variation. Nat Biotechnol. 2017;35: 128–135.
  • 10. Jumper J, Evans R, Pritzel A, Green T, Figurnov M, Ronneberger O, et al. Highly accurate protein structure prediction with AlphaFold. Nature. 2021;596: 583–589.
  • 11. Baek M, DiMaio F, Anishchenko I, Dauparas J, Ovchinnikov S, Lee GR, et al. Accurate prediction of protein structures and interactions using a three-track neural network. Science. 2021;373: 871–876.
  • 12. Riesselman AJ, Ingraham JB, Marks DS. Deep generative models of genetic variation capture the effects of mutations. Nat Methods. 2018;15: 816–822.
  • 13. Frazer J, Notin P, Dias M, Gomez A, Min JK, Brock K, et al. Publisher Correction: Disease variant prediction with deep generative models of evolutionary data. Nature. 2022;601: E7.
  • 14. Reva B, Antipin Y, Sander C. Predicting the functional impact of protein mutations: application to cancer genomics. Nucleic Acids Res. 2011;39: e118.
  • 15. Giessel A, Dousis A, Ravichandran K, Smith K, Sur S, McFadyen I, et al. Therapeutic enzyme engineering using a generative neural network. Sci Rep. 2022;12: 1536.
  • 16. Marks DS, Hopf TA, Sander C. Protein structure prediction from sequence variation. Nature Biotechnology. 2012;30: 1072–1080.
  • 17. Rivoire O, Reynolds KA, Ranganathan R. Evolution-based functional decomposition of proteins. PLoS Comput Biol. 2016;12: e1004817.
  • 18. Stiffler MA, Hekstra DR, Ranganathan R. Evolvability as a function of purifying selection in TEM-1 β-lactamase. Cell. 2015;160: 882–892.
  • 19. Bershtein S, Segal M, Bekerman R, Tokuriki N, Tawfik DS. Robustness-epistasis link shapes the fitness landscape of a randomly drifting protein. Nature. 2006;444: 929–932.
  • 20. Firnberg E, Labonte JW, Gray JJ, Ostermeier M. A comprehensive, high-resolution map of a gene’s fitness landscape. Mol Biol Evol. 2014;31: 1581–1592.
  • 21. Jacquier H, Birgy A, Le Nagard H, Mechulam Y, Schmitt E, Glodt J, et al. Capturing the mutational landscape of the beta-lactamase TEM-1. Proc Natl Acad Sci U S A. 2013;110: 13067–13072.
  • 22. Klesmith JR, Bacik J-P, Wrenbeck EE, Michalczyk R, Whitehead TA. Trade-offs between enzyme fitness and solubility illuminated by deep mutational scanning. Proc Natl Acad Sci U S A. 2017;114: 2265–2270.
  • 23. Hopf TA, Green AG, Schubert B, Mersmann S, Schärfe CPI, Ingraham JB, et al. The EVcouplings Python framework for coevolutionary sequence analysis. Bioinformatics. 2019;35: 1582–1584.
  • 24. Salverda MLM, De Visser JAGM, Barlow M. Natural evolution of TEM-1 β-lactamase: experimental reconstruction and clinical relevance. FEMS Microbiol Rev. 2010;34: 1015–1036.
  • 25. Modi T, Risso VA, Martinez-Rodriguez S, Gavira JA, Mebrat MD, Van Horn WD, et al. Hinge-shift mechanism as a protein design principle for the evolution of β-lactamases from substrate promiscuity to specificity. Nat Commun. 2021;12: 1852.
  • 26. Potter SC, Luciani A, Eddy SR, Park Y, Lopez R, Finn RD. HMMER web server: 2018 update. Nucleic Acids Res. 2018;46: W200–W204.
  • 27. Fonzé E, Charlier P, To’th Y, Vermeire M, Raquet X, Dubus A, et al. TEM1 β-lactamase structure solved by molecular replacement and refined structure of the S235A mutant. Acta Crystallogr D Biol Crystallogr. 1995;51: 682–694.
  • 28. Earl DJ, Deem MW. Parallel tempering: theory, applications, and new perspectives. Phys Chem Chem Phys. 2005;7: 3910–3916.
  • 29. Desjardins G, Courville A, Bengio Y, Vincent P, Delalleau O. Parallel tempering for training of restricted Boltzmann machines. 2010. [cited 2 Mar 2023]. Available: http://proceedings.mlr.press/v9/desjardins10a/desjardins10a.pdf
  • 30. Sideraki V, Huang W, Palzkill T, Gilbert HF. A secondary drug resistance mutation of TEM-1 beta-lactamase that suppresses misfolding and aggregation. Proc Natl Acad Sci U S A. 2001;98: 283–288.
  • 31. Porebski BT, Buckle AM. Consensus protein design. Protein Eng Des Sel. 2016;29: 245–251.
  • 32. Vandavasi VG, Langan PS, Weiss KL, Parks JM, Cooper JB, Ginell SL, et al. Active-Site Protonation States in an Acyl-Enzyme Intermediate of a Class A β-Lactamase with a Monobactam Substrate. Antimicrob Agents Chemother. 2017;61. doi: 10.1128/AAC.01636-16
  • 33. Oefner C, D’Arcy A, Daly JJ, Gubernator K, Charnas RL, Heinze I, et al. Refined crystal structure of beta-lactamase from Citrobacter freundii indicates a mechanism for beta-lactam hydrolysis. Nature. 1990;343: 284–288.
  • 34. Bank RPD. RCSB PDB - 2ZQC: Aztreonam acyl-intermediate structure of class a beta-lactam Toho-1 E166A/R274N/R276N triple mutant. [cited 4 May 2023]. Available: https://www.rcsb.org/structure/2ZQC
  • 35. Oguri T, Ishii Y, Shimizu-Ibuka A. Conformational Change Observed in the Active Site of Class C β-Lactamase MOX-1 upon Binding to Aztreonam. Antimicrob Agents Chemother. 2015;59: 5069–5072.
  • 36. Mitchell JM, Clasman JR, June CM, Kaitany K-CJ, LaFleur JR, Taracila MA, et al. Structural basis of activity against aztreonam and extended spectrum cephalosporins for two carbapenem-hydrolyzing class D β-lactamases from Acinetobacter baumannii. Biochemistry. 2015;54: 1976–1987.
  • 37. Han S, Zaniewski RP, Marr ES, Lacey BM, Tomaras AP, Evdokimov A, et al. Structural basis for effectiveness of siderophore-conjugated monocarbams against clinically relevant strains of Pseudomonas aeruginosa. Proc Natl Acad Sci U S A. 2010;107: 22002–22007.
  • 38. Han S, Caspers N, Zaniewski RP, Lacey BM, Tomaras AP, Feng X, et al. Distinctive attributes of β-lactam target proteins in Acinetobacter baumannii relevant to development of new antibiotics. J Am Chem Soc. 2011;133: 20536–20545.
  • 39. King DT, Wasney GA, Nosella M, Fong A, Strynadka NCJ. Structural Insights into Inhibition of Escherichia coli Penicillin-binding Protein 1B. J Biol Chem. 2017;292: 979–993.
  • 40. Lu Z, Wang H, Zhang A, Liu X, Zhou W, Yang C, et al. Structures of Mycobacterium tuberculosis Penicillin-Binding Protein 3 in Complex with Five β-Lactam Antibiotics Reveal Mechanism of Inactivation. Mol Pharmacol. 2020;97: 287–294.
  • 41. Maveyraud L, Pratt RF, Samama JP. Crystal structure of an acylation transition-state analog of the TEM-1 beta-lactamase. Mechanistic implications for class A beta-lactamases. Biochemistry. 1998;37: 2622–2628.
  • 42. Starr TN, Thornton JW. Exploring protein sequence-function landscapes. Nature biotechnology. Nature Publishing Group; 2017. pp. 125–126.
  • 43. Sruthi CK, Balaram H, Prakash MK. Toward Developing Intuitive Rules for Protein Variant Effect Prediction Using Deep Mutational Scanning Data. ACS Omega. 2020;5: 29667–29677.
  • 44. Birgy A, Magnan M, Hobson CA, Figliuzzi M, Panigoni K, Codde C, et al. Local and Global Protein Interactions Contribute to Residue Entrenchment in Beta-Lactamase TEM-1. Antibiotics (Basel). 2022;11. doi: 10.3390/antibiotics11050652
  • 45. Thomas VL, McReynolds AC, Shoichet BK. Structural Bases for Stability–Function Tradeoffs in Antibiotic Resistance. J Mol Biol. 2010;396: 47–59.
  • 46. Tokuriki N, Stricher F, Serrano L, Tawfik DS. How protein stability and new functions trade off. PLoS Comput Biol. 2008;4: e1000002.
  • 47. Miller SR. An appraisal of the enzyme stability-activity trade-off. Evolution. 2017;71: 1876–1887.
  • 48. Tooke CL, Hinchliffe P, Bragginton EC, Colenso CK, Hirvonen VHA, Takebayashi Y, et al. β-Lactamases and β-Lactamase Inhibitors in the 21st Century. J Mol Biol. 2019;431: 3472–3500.
  • 49. Bush K. Past and Present Perspectives on β-Lactamases. Antimicrob Agents Chemother. 2018;62. doi: 10.1128/AAC.01076-18
  • 50. Stimple SD, Smith MD, Tessier PM. Directed evolution methods for overcoming trade-offs between protein activity and stability. AIChE J. 2020;66. doi: 10.1002/aic.16814
  • 51. Anand N, Eguchi R, Mathews II, Perez CP, Derry A, Altman RB, et al. Protein sequence design with a learned potential. Nat Commun. 2022;13: 746.
  • 52. Anishchenko I, Pellock SJ, Chidyausiku TM, Ramelot TA, Ovchinnikov S, Hao J, et al. De novo protein design by deep network hallucination. Nature. 2021;600: 547–552.
  • 53. Nijkamp E, Ruffolo J, Weinstein EN, Naik N, Madani A. ProGen2: Exploring the Boundaries of Protein Language Models. arXiv [cs.LG]. 2022. Available: http://arxiv.org/abs/2206.13517
  • 54. Lau Y-TK, Baytshtok V, Howard TA, Fiala BM, Johnson JM, Carter LP, et al. Discovery and engineering of enhanced SUMO protease enzymes. J Biol Chem. 2018;293: 13224–13233.
  • 55. Bar-Even A, Noor E, Savir Y, Liebermeister W, Davidi D, Tawfik DS, et al. The moderately efficient enzyme: evolutionary and physicochemical trends shaping enzyme parameters. Biochemistry. 2011;50: 4402–4410.
  • 56. Kabsch W. XDS. Acta Crystallogr D Biol Crystallogr. 2010;66: 125–132.
  • 57. Evans PR, Murshudov GN. How good are my data and what is the resolution? Acta Crystallogr D Biol Crystallogr. 2013;69: 1204–1214.
  • 58. McCoy AJ, Grosse-Kunstleve RW, Adams PD, Winn MD, Storoni LC, Read RJ. Phaser crystallographic software. J Appl Crystallogr. 2007;40: 658–674.
  • 59. Adams PD, Afonine PV, Bunkóczi G, Chen VB, Davis IW, Echols N, et al. PHENIX: a comprehensive Python-based system for macromolecular structure solution. Acta Crystallogr D Biol Crystallogr. 2010;66: 213–221.
  • 60. Emsley P, Lohkamp B, Scott WG, Cowtan K. Features and development of Coot. Acta Crystallogr D Biol Crystallogr. 2010;66: 486–501.


Articles from bioRxiv are provided here courtesy of Cold Spring Harbor Laboratory Preprints
