Applied and Environmental Microbiology. 2019 May 2;85(10):e00054-19. doi: 10.1128/AEM.00054-19

Validation and Stabilization of a Prophage Lysin of Clostridium perfringens by Using Yeast Surface Display and Coevolutionary Models

Seth C Ritter, Benjamin J Hackel
Editor: Isaac Cann
PMCID: PMC6498168  PMID: 30850429

KEYWORDS: Clostridium perfringens, antimicrobial protein, coevolutionary model, homolog, lysin

ABSTRACT

Bacteriophage lysins are compelling antimicrobial proteins whose biotechnological utility and evolvability would be aided by elevated stability. Lysin catalytic domains, which evolved as modular entities distinct from cell wall binding domains, can be classified into one of several families with highly conserved structure and function, many of which contain thousands of annotated homologous sequences. Motivated by the quality of these evolutionary data, the ability of generative protein models incorporating coevolutionary information to predict stability was analyzed in a collection of 9,749 multimutants across 10 libraries diversified at different regions of a putative lysin from a prophage region of a Clostridium perfringens genome. Protein stability was assessed via a yeast surface display assay with accompanying high-throughput sequencing. Statistical fitness of mutant sequences, derived from second-order Potts models inferred with different levels of sequence homolog information, was predictive of experimental stability with areas under the curve (AUCs) ranging from 0.78 to 0.85. To extract an experimentally derived model of stability, a logistic model with site-wise score contributions was regressed on the collection of multimutants. This achieved a cross-validated classification performance (AUC) of 0.95. Using this experimentally derived model, 5 designs incorporating 5 or 6 mutations from multiple libraries were constructed. All designs retained enzymatic activity, with 4 of 5 increasing the melting temperature and with the highest-performing design achieving an improvement of +4°C.

IMPORTANCE Bacteriophage lysins exhibit high specificity and activity toward host bacteria with which the phage coevolved. These properties of lysins make them attractive for use as antimicrobials. Although there has been significant effort to develop platforms for rapid lysin engineering, there have been numerous shortcomings when pursuing the ultrahigh throughput necessary for the discovery of rare combinations of mutations to improve performance. In addition to validation of a putative lysin and stabilization thereof, the experimental and computational methods presented here offer a new avenue for improving protein stability and are easily scalable to analysis of tens of millions of mutations in single experiments.

INTRODUCTION

The increased prevalence of antibiotic-resistant bacteria (1) is necessitating the development of new platforms for the discovery and optimization of antimicrobials. Lysins, enzymes that degrade peptidoglycan within the cell wall of bacteria (2), possess the potential to be one such platform. Important for the bacteriophage lytic cycle as well as bacterial host cell wall remodeling, lysins have evolved to have high host specificity (3). Most Gram-positive lysins are multidomain proteins possessing a domain which binds specifically to the cell wall of target bacteria and a catalytic domain which hydrolyzes bonds in the peptidoglycan backbone (4). Lysins, however, generally have only modest stability, and there has been considerable effort to improve the stability of several lysins, traditionally by domain swapping (5, 6). Mutagenesis studies, either random (7) or rational, using crystal structures (when available) in conjunction with protein design frameworks such as Rosetta and FoldX (8), have had modest success. Lysin engineering efforts have been extensively reviewed (9). High-throughput engineering strategies employed for lysins include microfluidic encapsulation (10) and microtiter plates (7).

Beneficial mutations have been preferentially identified based on their presence in protein homologs, both natural (11, 12) and synthetic (13, 14). Combining homolog-guided design with combinatorial library screening, to design efficient sets of mutants with a high frequency of beneficial mutations, has proven effective (15–17). Although the cell wall binding domains of lysins tend to be unique, the catalytic domains have high structural and sequence homology (18) with other members of the same catalytic families. Some catalytic-domain families, owing to the presence of prophage genomic elements as well as autolysins, have tens of thousands of homologous family members. Together, these two facts suggest that homology-guided generative models incorporating higher-order interactions are promising tools for guiding the design of mutagenic lysin libraries.

The promise of bacteriophage lysins as antimicrobials has been hindered by environmental as well as growth-phase-dependent cell wall modifications, leading to poor in vivo performance in many instances despite in vitro success (19). Improving lysin catalytic performance through mutagenesis may be greatly enhanced by first stabilizing the catalytic domain, as seen in other enzymes (20, 21). This is because most mutations will be destabilizing (22), and improving the starting stability increases the fraction of folded variants, thereby increasing the available sequence space that can be explored for performance optimization.

Yeast surface display (23, 24) has been utilized to engineer a wide variety of protein properties, including thermostability (25), binding affinity (23, 26, 27), and enzyme activity (28, 29). This is accomplished, but also limited, by the nature of the assay, which tethers a protein of interest via a flexible linker to a displayed protein on the yeast surface. Here, yeast surface display was used as a high-throughput protein stability assay to evaluate the use of homology-guided protein generative models for the design of mutagenic libraries of the catalytic domain of a putative lysin (30) from a prophage region of the genome of Clostridium perfringens ATCC 13124. Furthermore, a first-order Potts model was regressed over the stability information of the multimutants in order to extract single-mutant stability contributions. From these data, promising mutations from multiple libraries were combined into designs that were not themselves observed in the assay, generating a small collection of designed proteins (Fig. 1). Of the five designs tested, each carrying five or six mutations, all degrade C. perfringens cell walls; four have a higher melting temperature (Tm) than the wild type, with the largest increase being approximately 4°C; and three retain more activity after incubation at 42°C than the wild type.

FIG 1. Experimental and data analysis workflow.

RESULTS

LysCP2 degrades C. perfringens cell walls and exhibits modest stability.

LysCP2 was previously identified as a putative lysin in the genome of C. perfringens ATCC 13124 (30). Of the list of putative lysins, LysCP2 was selected for the present study for the following reasons: it belonged to the largest catalytic family (glycoside hydrolase 25 [GH25]) of identified lysin genes; it possessed an SH3 domain, common among almost all putative lysins identified in the study; and it originated from a prophage region of the genome, indicating that it is most likely of bacteriophage origin. To aid in the visualization and identification of catalytic-domain residues, the automated pipeline of SWISS-MODEL (31) was used to generate a homology model of LysCP2. This process identified an X-ray structure of the endolysin of C. perfringens phage phiSM101 (32) as a template with 40% identity with LysCP2. The resulting model had a qualitative model energy analysis (QMEAN) score of −3.14, with the highest confidence within the catalytic domain. This structure comprises two functional domains, an N-terminal GH25 domain and a C-terminal SH3 cell wall binding domain (Fig. 2A and B). In the homology model, none of the cysteines, in either the catalytic domain or the cell wall binding domain, is oriented such that it would be expected to form a disulfide. Escherichia coli effectively produced LysCP2 in the soluble lysate fraction with a genetically appended GSHHHHHH epitope at the C terminus to facilitate purification by cobalt affinity chromatography (Fig. 2C).

FIG 2. Sequence and purification of LysCP2. (A) Homology model of LysCP2, with the locations of designed mutagenic libraries highlighted as colored spheres. (B) Primary sequence of LysCP2, with the catalytic domain underlined and mutagenic libraries colored as in the structure. (C) SDS-PAGE analysis of LysCP2 purified from the soluble fraction of the E. coli lysate by Co-NTA chromatography. The expected molecular weight is 41.1 kDa.

The activity of purified LysCP2 was assessed against crude cell wall extract of C. perfringens ATCC 12916. LysCP2 at 200 nM effectively degrades the C. perfringens cell wall (Fig. 3A), consistent with its hypothesized function. The thermal stability of LysCP2 was determined by a Sypro orange thermal assay, which indicates a modest Tm of 38.3°C ± 0.7°C (Fig. 3B). To evaluate the influence of the catalytic domain on stability, the catalytic domain with the addition of the flexible linker region (INKESSKVT) was expressed in isolation and evaluated for thermal stability (Fig. 3B). Although the Tm cannot be reliably measured, it can be estimated to be <30°C from the data. Thus, LysCP2 exhibits the expected cell wall degradation function, although it has modest thermal stability, including within its catalytic domain, which motivates evolution of elevated stability.
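The text does not state how Tm was extracted from the Sypro orange trace. One common approach, sketched below under that assumption, takes Tm as the temperature at which the fluorescence derivative dF/dT is largest. The function name and the synthetic sigmoidal trace (midpoint 38.3°C, width 1.5°C) are hypothetical and serve only to illustrate the calculation.

```python
import math


def melting_temperature(temps, signal):
    """Return the temperature at which the central-difference dF/dT is largest.

    Assumes `signal` has been trimmed to the rising (unfolding) phase; real
    Sypro orange traces often decline again at high temperature.
    """
    best_t, best_slope = None, -math.inf
    for k in range(1, len(temps) - 1):
        slope = (signal[k + 1] - signal[k - 1]) / (temps[k + 1] - temps[k - 1])
        if slope > best_slope:
            best_t, best_slope = temps[k], slope
    return best_t


# Synthetic trace: 25 to 98 degC in 0.1 degC steps, two-state midpoint at 38.3 degC.
temps = [(250 + k) / 10 for k in range(731)]
signal = [1.0 / (1.0 + math.exp(-(t - 38.3) / 1.5)) for t in temps]
tm = melting_temperature(temps, signal)  # approximately 38.3
```

Taking the derivative maximum avoids having to fit baselines, but it is sensitive to noise, so smoothing or a sigmoid fit is often applied to real data first.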

FIG 3. Purified LysCP2 degrades C. perfringens cell walls with modest stability. (A) C. perfringens ATCC 12916 cell walls were exposed to either 0 nM (dotted lines) or 200 nM (solid lines) LysCP2. Cell wall integrity was monitored via the optical density at 600 nm. Three replicates of each condition are shown. (B) Full LysCP2 (solid lines) or the catalytic domain of LysCP2 (dotted lines) was heated from 25°C to 98°C, and the Sypro orange signal was monitored. The Tm of LysCP2 is indicated by a vertical red line. AU, arbitrary units.

Proteinase K degradation of yeast-displayed LysCP2 demonstrates thermal transition consistent with stability and inactivation properties.

Yeast surface display can be used for high-throughput evaluation of protein stability based on resistance to proteinase K digestion under thermal stress (Fig. 4A) (33). Yeast-displayed LysCP2 demonstrates resistance to proteinase K cutting until unfolding at 40°C (Fig. 4B). This transition is consistent with stability as determined by a purified protein Sypro orange assay (Fig. 3B). This transition provided evidence that the proteinase K assay could be used to evaluate the stabilities of LysCP2 mutants in the display context (Fig. 4C) (see below).

FIG 4. Proteinase K degradation of yeast-displayed LysCP2 with different thermal stresses. (A) Proteinase K preferentially cleaves unfolded displayed proteins. The fraction of protein cleaved was analyzed by dual-color flow cytometry labeling the N-terminal HA and C-terminal c-myc epitopes. (B and C) Clonal LysCP2 (B) or the full-length mutagenic library (C) was displayed, incubated in proteinase K at the indicated temperatures (None, without proteinase K), and evaluated by flow cytometry. Note that the None condition for panel C was assessed on a different instrument from that used for the accompanying thermal conditions; the None signals have been linearly scaled so that the mode of the distribution is 1.0. The clonal sample exhibits a unimodal distribution, whereas the library is bimodal with a population that exhibits proteolytic cutting, even at 30°C. Note the difference in temperatures between panels B and C.

In addition to the goal of engineering enhanced stability in LysCP2, the capacity of coevolutionary statistical models to aid in this process was assessed. In particular, a second-order Potts model was selected as the generative model of interest. The parameters of this model, inferred from thousands of homologous sequences (Table 1), decompose a protein sequence into sets of site-wise (hi) and pairwise (Ji,j) contributions, which sum to give a statistical fitness [E(σ)].

$$E(\sigma) = \sum_{i} h_i(\sigma_i) + \sum_{i} \sum_{j>i} J_{i,j}(\sigma_i, \sigma_j)$$
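As a minimal sketch of this sum, the functions below compute E(σ) and the Δ statistical fitness of a variant relative to the wild type. The three-site, two-letter parameters are hypothetical toy values, not parameters inferred from any alignment, and missing couplings are assumed to be zero.

```python
def statistical_fitness(seq, h, J):
    """E(sigma) = sum_i h_i(sigma_i) + sum_i sum_{j>i} J_ij(sigma_i, sigma_j).

    h: list of dicts; h[i][a] is the site-wise field for residue a at site i.
    J: dict keyed by site pair (i, j) with i < j; each value is a dict keyed by
       residue pair (a, b). Missing entries are treated as zero couplings.
    """
    L = len(seq)
    energy = sum(h[i][seq[i]] for i in range(L))
    for i in range(L):
        for j in range(i + 1, L):
            energy += J.get((i, j), {}).get((seq[i], seq[j]), 0.0)
    return energy


def delta_statistical_fitness(mutant, wild_type, h, J):
    """Delta statistical fitness of a variant relative to the wild type."""
    return statistical_fitness(mutant, h, J) - statistical_fitness(wild_type, h, J)


# Hypothetical toy parameters: 3 sites, alphabet {A, G}.
h = [{"A": 0.5, "G": -0.2}, {"A": 0.1, "G": 0.3}, {"A": -0.4, "G": 0.0}]
J = {(0, 2): {("A", "A"): 0.6, ("G", "G"): 0.6}}  # sites 0 and 2 favor matching residues

single_0 = delta_statistical_fitness("GAA", "AAA", h, J)  # about -1.3
single_2 = delta_statistical_fitness("AAG", "AAA", h, J)  # about -0.2
double = delta_statistical_fitness("GAG", "AAA", h, J)    # about -0.3, not -1.5
```

The double mutant shows why a second-order model is used: the pairwise term makes its Δ statistical fitness differ from the sum of the two single-mutant effects.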

TABLE 1. Generative model characteristics

| Model | Sequence set | No. of positions | Coupling regularization | No. of sequences (total) | No. of sequences (effective)^a |
|---|---|---|---|---|---|
| Firm_L1 | Firmicutes | 197 | L1 | 10,174 | 3,607 |
| Firm_L2 | Firmicutes | 197 | L2 | 10,174 | 3,607 |
| GH25_L1 | Glycoside hydrolase 25 | 169 | L1 | 25,738 | 5,523 |
| GH25_L2 | Glycoside hydrolase 25 | 169 | L2 | 25,738 | 5,523 |

^a The effective number of sequences is the sum of weights of sequences where the weight for each sequence is the inverse of the number of sequences with percent identity greater than 80%, including the sequence of interest.
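The sequence-weighting rule in the table footnote can be sketched as follows. This assumes aligned sequences of equal length and compares gap characters like residues; tools such as PLMC may handle gaps and identity differently, and the toy alignment is hypothetical.

```python
def percent_identity(a, b):
    """Percent of aligned columns at which two equal-length aligned sequences agree.

    Gap characters are compared like residues here (an assumption).
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


def sequence_weights(alignment, threshold=80.0):
    """Weight = 1 / (number of sequences, itself included, with identity above threshold)."""
    weights = []
    for s in alignment:
        n_similar = sum(1 for t in alignment if percent_identity(s, t) > threshold)
        weights.append(1.0 / n_similar)
    return weights


def effective_number_of_sequences(alignment, threshold=80.0):
    return sum(sequence_weights(alignment, threshold))


# Toy alignment: three near-identical sequences (90% identity) and one unrelated sequence.
toy = ["ACDEFGHIKL", "ACDEFGHIKM", "ACDEFGHIKN", "WWWWWWWWWW"]
n_eff = effective_number_of_sequences(toy)  # about 2.0
```

The three near-duplicates together contribute one effective sequence, which is why the effective counts in the table are well below the totals.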

Given the importance of lysin folding for its function, it was hypothesized that the statistical fitness of lysins would correlate with their stabilities, as has been observed for other protein families (34).

The theoretical relevance of the model chosen can be derived from random energy machines (34), with the energy of a state (sequence) being equal to its statistical fitness and the distribution of protein sequences following a Boltzmann distribution.

$$P(\sigma) = \frac{e^{E(\sigma)}}{Z}$$
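Z sums e^E(σ) over all q^L sequences (for 169 positions, more than 20^169 even ignoring gaps), which is why the pseudolikelihood approach described next is used. The toy sketch below, with hypothetical parameters, enumerates Z exactly for 3 sites and shows that the site-wise conditional P(σi | all other sites), the building block of a pseudolikelihood, needs no Z. PLMC's actual objective also includes sequence weights and regularization, which are omitted here.

```python
import itertools
import math


def statistical_fitness(seq, h, J):
    """Second-order Potts statistical fitness; missing couplings are zero."""
    e = sum(h[i][a] for i, a in enumerate(seq))
    for i in range(len(seq)):
        for j in range(i + 1, len(seq)):
            e += J.get((i, j), {}).get((seq[i], seq[j]), 0.0)
    return e


def boltzmann(h, J, alphabet):
    """Exact P(sigma) = exp(E(sigma)) / Z by enumerating all q**L sequences (toy sizes only)."""
    seqs = ["".join(p) for p in itertools.product(alphabet, repeat=len(h))]
    weights = {s: math.exp(statistical_fitness(s, h, J)) for s in seqs}
    Z = sum(weights.values())
    return {s: w / Z for s, w in weights.items()}, Z


def site_conditional(seq, i, h, J, alphabet):
    """P(sigma_i = a | all other sites); only terms touching site i survive, so Z cancels."""
    scores = {}
    for a in alphabet:
        s = h[i][a]
        for j, b in enumerate(seq):
            if j < i:
                s += J.get((j, i), {}).get((b, a), 0.0)
            elif j > i:
                s += J.get((i, j), {}).get((a, b), 0.0)
        scores[a] = math.exp(s)
    total = sum(scores.values())
    return {a: v / total for a, v in scores.items()}


def pseudo_log_likelihood(seq, h, J, alphabet):
    """Sum over sites of log P(sigma_i | rest) for one sequence (unweighted, unregularized)."""
    return sum(math.log(site_conditional(seq, i, h, J, alphabet)[a]) for i, a in enumerate(seq))


# Hypothetical toy parameters: L = 3 sites, q = 2 letters.
ALPHABET = "AG"
h = [{"A": 0.5, "G": -0.2}, {"A": 0.1, "G": 0.3}, {"A": -0.4, "G": 0.0}]
J = {(0, 2): {("A", "A"): 0.6, ("G", "G"): 0.6}}
P, Z = boltzmann(h, J, ALPHABET)
```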

The model parameters were inferred from a multiple-sequence alignment of homologs of the LysCP2 catalytic domain by maximizing a pseudolikelihood, which avoids computing the intractable partition function Z, with previously described regularization coefficients (35). These sequences were identified with an iterative homology search in JackHmmer (36), using the catalytic domain of LysCP2 as the seed sequence and restricting homologs to those annotated as belonging to the Firmicutes phylum in the UniProtKB database (37). Homology was restricted in this way because there is evidence that some lysin catalytic domains maintain a specificity similar to that of their full-length counterparts (38–40), implying that some specificity can be encoded in the catalytic domain alone. These data provided the observations for inference of a Potts model using PLMC (35) with L1 regularization. With the fitness model in place, combinatorial libraries were designed both to identify stabilized LysCP2 variants and to evaluate the efficacy of the generative model. Each library diversified six contiguous residues, restricted to sites without significant evolutionary couplings (as computed by PLMC) to the catalytic residues. Residues were considered evolutionarily coupled if their coupling score (35) was greater than 0.2 (see Fig. S1A in the supplemental material). Residues coupled to the putative catalytic residues (Y66, D98, E100, Y156, and Q174) (32) were defined as belonging to the catalytic network. The catalytic residues, and any sites evolutionarily coupled to them, were therefore excluded from library diversification, as we aimed to study mutations primarily affecting stability.
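The site-exclusion rule above can be sketched as a filter over a coupling-score table. The coupling scores below are hypothetical placeholders (not PLMC output), only direct couplings to the listed catalytic residues are considered, and the helper names are illustrative.

```python
def catalytic_network(coupling_scores, catalytic_sites, threshold=0.2):
    """Catalytic sites plus every site directly coupled to one of them above `threshold`."""
    network = set(catalytic_sites)
    for (i, j), score in coupling_scores.items():
        if score > threshold:
            if i in catalytic_sites:
                network.add(j)
            if j in catalytic_sites:
                network.add(i)
    return network


def candidate_windows(n_residues, excluded, width=6, first=1):
    """1-based start positions of contiguous `width`-residue windows avoiding all excluded sites."""
    return [s for s in range(first, first + n_residues - width + 1)
            if not any(p in excluded for p in range(s, s + width))]


CATALYTIC = {66, 98, 100, 156, 174}  # Y66, D98, E100, Y156, Q174, as listed in the text
# Hypothetical coupling scores keyed by residue-number pairs.
scores = {(66, 70): 0.35, (98, 120): 0.10, (10, 100): 0.25, (20, 30): 0.90}
network = catalytic_network(scores, CATALYTIC)  # adds 70 and 10; (20, 30) is ignored
```

A strong coupling between two noncatalytic sites, such as (20, 30), does not exclude either site, because only connections to the catalytic residues define the network in this sketch.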
The design algorithm (see Materials and Methods) selects amino acid degeneracy assuming additive benefits only from site-wise mutations, prioritizing the inclusion of the highest-performing variants and, when possible, selecting degenerate codons that avoid the lowest-performing mutations. From this collection of possible libraries, 10 were selected (as detailed in Algorithm S1 and Fig. S1 in the supplemental material). The designed diversity of the final libraries, and comparisons with the experimentally observed library diversity, are presented in Fig. S2 in the supplemental material. At the positions of all designed libraries (Fig. 5), a second library was also tested with full degeneracy (NNK codons). The use of designed and random libraries assesses the utility of the generative model and samples a broad range of statistical fitnesses.
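Algorithm S1 is not reproduced here. The sketch below only illustrates the bookkeeping it relies on: which amino acids a degenerate codon encodes (using the standard genetic code and IUPAC ambiguity codes) and the genetic diversity of a library built from such codons. The helper names are hypothetical. For the fully degenerate NNK libraries, six positions give 32^6 ≈ 1.07 × 10^9 DNA sequences, consistent with the approximately 1 billion members noted in the Discussion.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code; codons ordered TTT, TTC, TTA, TTG, TCT, ... ("*" = stop).
AA_TABLE = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
            "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
GENETIC_CODE = {
    b1 + b2 + b3: AA_TABLE[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
# IUPAC nucleotide ambiguity codes.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "AG", "Y": "CT", "S": "CG",
         "W": "AT", "K": "GT", "M": "AC", "B": "CGT", "D": "AGT", "H": "ACT",
         "V": "ACG", "N": "ACGT"}


def expand_codon(degenerate):
    """All concrete codons encoded by a degenerate codon such as 'NNK'."""
    return ["".join(c) for c in product(*(IUPAC[b] for b in degenerate))]


def encoded_amino_acids(degenerate):
    """Map each encoded amino acid (or '*' for stop) to the number of codons encoding it."""
    counts = {}
    for codon in expand_codon(degenerate):
        aa = GENETIC_CODE[codon]
        counts[aa] = counts.get(aa, 0) + 1
    return counts


def genetic_diversity(codons):
    """Number of distinct DNA sequences in a library built from these degenerate codons."""
    n = 1
    for c in codons:
        n *= len(expand_codon(c))
    return n


nnk_library = genetic_diversity(["NNK"] * 6)  # 32**6 = 1,073,741,824
gma_site = encoded_amino_acids("GMA")  # {'A': 1, 'E': 1}: a restricted Ala/Glu codon
```

A design procedure would score each candidate codon by the site-wise statistical fitness of the residues it encodes and keep the product of codon sizes under the library's diversity budget.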

FIG 5. Fraction of stable variants observed across all libraries for different library design strategies. Each library number is followed by the number of the first diversified residue and the identities of the six wild-type residues. Error bars are 95% confidence intervals computed with the Clopper-Pearson method.
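The Clopper-Pearson interval used for the error bars in Fig. 5 is the exact binomial interval: its bounds are the proportions at which the observed count sits at the α/2 tail. The sketch below solves for those bounds by bisection on binomial tail probabilities rather than calling a beta-quantile function, so it needs only the standard library; the function names are illustrative.

```python
from math import exp, lgamma, log


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p) with 0 < p < 1, summed in log space to avoid overflow."""
    if k < 0:
        return 0.0
    if k >= n:
        return 1.0
    log_norm = lgamma(n + 1)
    return sum(
        exp(log_norm - lgamma(i + 1) - lgamma(n - i + 1) + i * log(p) + (n - i) * log(1 - p))
        for i in range(k + 1)
    )


def _bisect(g, target, increasing, iters=100):
    """Solve g(p) = target on (0, 1) for a monotone function g."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if (g(mid) < target) == increasing:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(x, n, alpha=0.05):
    """Exact two-sided (1 - alpha) interval for a binomial proportion x / n."""
    # Lower bound: P(X >= x | p) = alpha / 2, which increases with p.
    lower = 0.0 if x == 0 else _bisect(lambda p: 1.0 - binom_cdf(x - 1, n, p), alpha / 2, True)
    # Upper bound: P(X <= x | p) = alpha / 2, which decreases with p.
    upper = 1.0 if x == n else _bisect(lambda p: binom_cdf(x, n, p), alpha / 2, False)
    return lower, upper
```

For example, `clopper_pearson(0, 10)` returns (0.0, 1 − 0.025^(1/10)) ≈ (0.0, 0.3085). The interval is conservative, which suits small per-library counts.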

Genes were synthesized via PCR with degenerate oligonucleotides (for the full procedure, see Materials and Methods) and transferred into a yeast surface display system as fusions to the C terminus of Aga2p with a PAS40-(Gly4Ser)3 linker (41). A total of 1.2 million yeast transformants were obtained. A collection of variants with full-length gene products was isolated by sorting 200,000 yeast cells displaying the C-terminal c-myc epitope. The collection of mutant libraries was then subjected to protease stability screening. A substantial fraction of the population exhibits reduced stability, as evidenced by protease susceptibility at low temperature (30°C) (Fig. 4C). This bimodality is not observed in the absence of protease treatment, providing evidence that even at low thermal stress, many of the displayed mutants are unstable. Additional variants exhibit protease susceptibility at temperatures in the mid-30s (°C). Yet, similar to the wild type, a substantial fraction of the mutagenic library of LysCP2 does not display a transition until thermal treatment at a more elevated temperature. Notably, very few variants exhibit stability superior to that of the wild type.

LysCP2 variant genes from stable and unstable populations at 30°C (isolated by flow cytometry of yeast with high and low c-myc/hemagglutinin [HA] signals, respectively) were extracted and analyzed via Illumina MiSeq sequencing. This process generated 1.5 million sequences for analysis. Following preprocessing to remove noise and sequences that did not match library designs, a sequence was classified as stable or unstable if its count in one population (stable or unstable) exceeded its count in the other by at least five. This cutoff served to remove sequences with intermediate stability as well as noisy observations due to erroneous sequencing. Increasing this threshold progressively removes sequences while improving performance, as assessed by the area under the curve (AUC), until too many sequences are removed, which degrades model quality (Fig. S3). As a result, 9,749 multimutants were classified as stable or unstable, and summary statistics were computed for each library (Fig. 5 and Table S1). For 8 of 10 libraries, the designed library yielded a greater proportion of stable variants than the library of randomized sequences (Fig. 5). The stable fractions of the remaining designed libraries, libraries 8 and 10, were not statistically different from those of their random counterparts. Given the nature of the classification, these summary statistics do not directly represent the magnitude of stabilities of variants within the libraries.
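The count-difference rule can be sketched as below. The sketch compares raw read counts directly; whether counts were normalized for sequencing depth is not stated here, so that step is omitted by assumption. The function name and read counts are hypothetical.

```python
def classify_by_count_difference(stable_counts, unstable_counts, margin=5):
    """Label 1 (stable) or 0 (unstable) when one population's count exceeds the other's by >= margin.

    Sequences whose counts differ by less than `margin` (intermediate or noisy)
    are dropped. Raw counts are compared without depth normalization.
    """
    labels = {}
    for seq in set(stable_counts) | set(unstable_counts):
        diff = stable_counts.get(seq, 0) - unstable_counts.get(seq, 0)
        if diff >= margin:
            labels[seq] = 1
        elif diff <= -margin:
            labels[seq] = 0
    return labels


# Hypothetical read counts for three variants.
stable = {"variant_a": 7, "variant_b": 2}
unstable = {"variant_a": 1, "variant_b": 5, "variant_c": 6}
labels = classify_by_count_difference(stable, unstable)
# variant_a -> 1, variant_c -> 0; variant_b (difference of 3) is discarded
```

Raising `margin` trades the number of labeled sequences for label confidence, which is the trade-off explored in Fig. S3.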

High-throughput analysis of variant stabilities.

Using the extensive set of stable and unstable sequences observed from the assay, model predictive performance was evaluated for different homolog data sets and regularization methods. Potts models inferred using PLMC on homologous sequences were used to compute the difference in statistical fitness relative to wild-type LysCP2 (Δ statistical fitness).

In addition to these homology-driven methods, a first-order Potts model was inferred by logistic regression on the collection of stable and unstable sequences from the yeast display stability assay. Logistic regression was selected because it is analytically equivalent to a two-state thermal equilibrium and is less biased than naive Bayes (42). To assess this experimental model’s predictive capability on sequences outside its training set, 90% of sequences were used to train the model, and the stabilities of the remaining 10% were predicted. This process was repeated with a different held-out 10% each time until the stability of every sequence had been predicted (10-fold cross-validation). Furthermore, bootstrap regression enabled error estimation for each library to elucidate mutations with a statistically significant effect.
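A minimal sketch of a site-wise logistic model with 10-fold cross-validation follows. The actual solver and regularization used in the study are not given here, so plain batch gradient descent with a small L2 penalty is swapped in. The four-letter alphabet, the penalty values, and the synthetic data are hypothetical; each sequence is encoded by one-hot (position, residue) indicators so that the fitted log-odds are a sum of site-wise contributions.

```python
import math
import random

ALPHABET = "ALPG"  # hypothetical reduced alphabet for the toy example


def active_features(seq, alphabet=ALPHABET):
    """Indices of the one-hot (position, residue) indicators equal to 1 for this sequence."""
    q = len(alphabet)
    return [i * q + alphabet.index(aa) for i, aa in enumerate(seq)]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_sitewise_logistic(seqs, labels, n_features, lr=1.0, epochs=200, l2=1e-3):
    """Batch gradient descent on the mean L2-regularized logistic loss; returns (weights, bias)."""
    w, b = [0.0] * n_features, 0.0
    feats = [active_features(s) for s in seqs]
    n = len(seqs)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for f, y in zip(feats, labels):
            err = sigmoid(b + sum(w[k] for k in f)) - y  # derivative of the loss w.r.t. the logit
            grad_b += err
            for k in f:
                grad_w[k] += err
        b -= lr * grad_b / n
        w = [wk - lr * (gk / n + l2 * wk) for wk, gk in zip(w, grad_w)]
    return w, b


def predict(seq, w, b):
    """Predicted probability that `seq` is stable (label 1)."""
    return sigmoid(b + sum(w[k] for k in active_features(seq)))


def cross_validated_predictions(seqs, labels, k=10, seed=0):
    """k-fold CV: every sequence is scored by a model trained without it."""
    order = list(range(len(seqs)))
    random.Random(seed).shuffle(order)
    n_features = len(seqs[0]) * len(ALPHABET)
    preds = [0.0] * len(seqs)
    for fold in range(k):
        held_out = set(order[fold::k])
        train = [i for i in order if i not in held_out]
        w, b = fit_sitewise_logistic([seqs[i] for i in train], [labels[i] for i in train], n_features)
        for i in held_out:
            preds[i] = predict(seqs[i], w, b)
    return preds


# Toy data: six-residue variants labeled stable when hypothetical additive
# penalties (P costs 2, G costs 1) total at most 3.
rng = random.Random(1)
PENALTY = {"A": 0, "L": 0, "P": 2, "G": 1}
seqs = ["".join(rng.choice(ALPHABET) for _ in range(6)) for _ in range(200)]
labels = [1 if sum(PENALTY[a] for a in s) <= 3 else 0 for s in seqs]
preds = cross_validated_predictions(seqs, labels)
accuracy = sum((p > 0.5) == (y == 1) for p, y in zip(preds, labels)) / len(seqs)
```

The learned weights `w` play the role of site-wise stability scores; differences between a mutant's and the wild type's weight at a position give that mutation's Δ stability score.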

The predictive performance of each model was computed as the AUC from a receiver operating characteristic (ROC) curve with weighting applied to equalize classes (stable and unstable) as well as library representation, so that each library had equal representation regardless of the number of sequences (Fig. 6). All models substantially outperformed random classification (AUC = 0.5). The weakest performance (AUC = 0.78) was observed for the model derived from the firmicute homologs with L1 regularization. Notably, a model that merely considered the Hamming distance between mutant and wild type yielded an equivalent AUC. Expansion of the homolog sequences to the entire glycoside hydrolase 25 family, maintaining L1 regularization, elevated the AUC to 0.83. The use of L2 regularization further improved performance to 0.85 for both homolog sets. The logistic regression from experimental data achieves the highest performance of any strategy (AUC = 0.95).
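The exact weighting formula is not given in the text. One plausible scheme, assumed here for illustration, weights each sequence by 1 / (number of sequences in its library with the same label), so that every (library, class) cell carries equal total weight; the AUC is then the weighted probability that a stable sequence outscores an unstable one. The Hamming-distance baseline is included as a scorer. All data and names are hypothetical.

```python
from collections import Counter


def equalizing_weights(labels, libraries):
    """Weight = 1 / (count of sequences sharing this library and label); an assumed scheme."""
    counts = Counter(zip(libraries, labels))
    return [1.0 / counts[(lib, y)] for lib, y in zip(libraries, labels)]


def weighted_auc(scores, labels, weights):
    """Weighted Mann-Whitney estimate of ROC AUC (ties count 1/2); label 1 = stable."""
    pos = [(s, w) for s, y, w in zip(scores, labels, weights) if y == 1]
    neg = [(s, w) for s, y, w in zip(scores, labels, weights) if y == 0]
    concordant = 0.0
    for sp, wp in pos:
        for sn, wn in neg:
            if sp > sn:
                concordant += wp * wn
            elif sp == sn:
                concordant += 0.5 * wp * wn
    return concordant / (sum(w for _, w in pos) * sum(w for _, w in neg))


def hamming_score(seq, wild_type):
    """Baseline predictor: fewer mutations from the wild type means higher predicted stability."""
    return -sum(a != b for a, b in zip(seq, wild_type))


# Hypothetical toy data from two libraries.
wt = "AAAAAA"
seqs = ["AAAAAA", "AAAAAG", "AAGGAA", "GGGAAA", "GGGGGG", "AGAAAA"]
labels = [1, 1, 0, 0, 0, 1]
libraries = ["lib1", "lib1", "lib1", "lib2", "lib2", "lib2"]
scores = [hamming_score(s, wt) for s in seqs]
weights = equalizing_weights(labels, libraries)
auc = weighted_auc(scores, labels, weights)  # 1.0 for this toy data
```

The pairwise loop is quadratic in the number of sequences, which is acceptable for a sketch; a sort-based implementation would scale better to tens of thousands of variants.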

FIG 6. Performance of homology and experimentally driven models to predict stability of LysCP2 variants with up to six mutations. Receiver operating characteristics for Hamming distance to LysCP2 (blue line) or with generative models Firm_L1 (black solid line), Firm_L2 (black dashed line), GH25_L1 (red solid line), and GH25_L2 (red dashed line) or the cross-validated experimental model (green line).

There is a positive and significant correlation between site-wise Δ statistical fitnesses computed with the GH25_L2 model and the experimental logistic model across all mutated positions (Fig. 7A to C) (slope 95% confidence interval, 0.085 ± 0.019 Δ stability score per unit Δ statistical fitness). A slope other than unity is expected, given that these values have not been transformed into common units. Under the conditions of the assay, a variant remaining 50% uncleaved would see that value rise to 60% after the addition of a beneficial mutation with Δ statistical fitness of +5.0.
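The 50% to 60% figure can be checked by treating the stability score as the log-odds of the site-wise logistic model (an interpretation assumed here): a +5.0 Δ statistical fitness shifts the log-odds by 0.085 × 5.0 = 0.425. The sketch also propagates the ends of the slope's 95% confidence interval.

```python
import math


def logit(p):
    return math.log(p / (1.0 - p))


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


SLOPE = 0.085             # Delta stability score per unit Delta statistical fitness (central estimate)
SLOPE_HALF_WIDTH = 0.019  # half-width of the 95% confidence interval
delta_fitness = 5.0       # beneficial mutation, as in the text
p_before = 0.50

p_after = sigmoid(logit(p_before) + SLOPE * delta_fitness)                     # about 0.60
p_low = sigmoid(logit(p_before) + (SLOPE - SLOPE_HALF_WIDTH) * delta_fitness)   # about 0.58
p_high = sigmoid(logit(p_before) + (SLOPE + SLOPE_HALF_WIDTH) * delta_fitness)  # about 0.63
```

Because the shift is applied on the log-odds scale, the same Δ statistical fitness produces a smaller change in percentage points for variants that start far from 50%.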

FIG 7. Connection between statistical fitness and stability. (A) Δ statistical fitness values of all possible single mutations (rows), relative to wild-type residues (dotted squares), across each library position (columns). (B) Δ stability score derived separately for each library at the indicated positions for the indicated single mutants. (C) Δ statistical fitness versus the Δ stability score exhibits significant correlation. Red lines show the best fit as well as upper and lower bounds of the 95% confidence interval (slope, 0.085 ± 0.019) with a constant intercept. (D) Fraction of mutants that are stable at the indicated levels of Δ statistical fitness, with the median (solid line), 95% confidence interval (dashed lines), and a fitted two-state model (dotted line).

When approximating the stable fraction of sampled protein variants as a function of Δ statistical fitness, a transition can be seen (Fig. 7D). The close agreement between the estimation and a fitted two-state model is consistent with models whereby the fitness penalty is proportional to the unfolded fraction of molecules of a protein (34).
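The fitting procedure for the two-state curve in Fig. 7D is not described here. As an illustration under stated assumptions, the sketch models the stable fraction as 1 / (1 + exp(−(a·ΔE + b))), with a folding free energy assumed linear in Δ statistical fitness ΔE, and fits (a, b) by least squares over a coarse grid. The "binned" data are synthetic and noise free, generated from hypothetical values a = 0.5 and b = 1.0.

```python
import math


def two_state_fraction(delta_e, a, b):
    """Fraction stable = 1 / (1 + exp(-(a * dE + b))); stability assumed linear in dE."""
    return 1.0 / (1.0 + math.exp(-(a * delta_e + b)))


def fit_two_state(delta_es, fractions):
    """Least-squares fit of (a, b) by exhaustive search on a grid with 0.05 steps."""
    best = None
    for i in range(1, 41):          # a from 0.05 to 2.0
        a = i / 20
        for j in range(-100, 101):  # b from -5.0 to 5.0
            b = j / 20
            sse = sum((two_state_fraction(x, a, b) - f) ** 2 for x, f in zip(delta_es, fractions))
            if best is None or sse < best[0]:
                best = (sse, a, b)
    return best[1], best[2]


# Synthetic binned data generated from a = 0.5, b = 1.0 (hypothetical values).
xs = list(range(-20, 6))
fs = [two_state_fraction(x, 0.5, 1.0) for x in xs]
a_fit, b_fit = fit_two_state(xs, fs)  # recovers (0.5, 1.0); midpoint at dE = -b / a = -2
```

A grid search keeps the sketch dependency-free; with noisy data a gradient-based or likelihood fit on the raw stable/unstable labels would normally be preferred.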

Designed sublibraries exhibit a more equal representation of stable and unstable sequences than the completely random libraries (Fig. 5). These sublibraries were regressed in the same way as the larger libraries (cross-validated AUCs between 0.9 and 0.97 for all sublibraries), and stability scores were computed and compared with statistical fitnesses (Fig. 8A). Focusing on this subset of sequences, libraries 1, 2, 7, 8, and 10 exhibit strong correlation between Δ stability score and Δ statistical fitness. Of these libraries, libraries 1, 2, 7, and 10 are predicted to have dominant alpha-helical character, and library 8 is predicted to be buried and to have low secondary structure. Of the noncorrelated libraries, libraries 6 and 9 are on opposite edges of the catalytic domain and are possibly close to the peptide cross-bridges of the peptidoglycan (32); libraries 3 and 4 are on the upper face of the catalytic pocket (Fig. 8B).

FIG 8. Structural insights provide context of where coevolutionary models are most predictive. (A) Full-sequence Δ statistical fitness and Δ stability score of mutants belonging to libraries designed with Algorithm S1 in the supplemental material. (B) Homology model positions, highlighted in red.

Examining these libraries further, library 6 can be seen to contain two parallel subpopulations (Fig. 8A). These two subpopulations are distinguished by a highly inaccurate prediction at position 26. The designed sublibrary allows only two residues at this position, wild-type L and mutant V. Although L26V is highly beneficial by statistical fitness, it is highly detrimental by stability score (Fig. 7A and B), thereby creating the bimodality. Library 5 displays two populations that are each uncorrelated on their own but positively correlated when taken together. This appears to be due to C130, which is the only position with strong correlation.

These observations imply that statistical fitness offers the highest confidence in predicting stability changes of the catalytic domain of LysCP2 when considering positions with high secondary structure or low solvent-accessible surface area or that are more distal from the active site.

A site-wise model of stability identified stabilizing mutants that maintain activity against C. perfringens cell walls.

To estimate the error of model parameters, the coefficients for the experimental model were inferred across 1,000 bootstrap resamples to approximate a 90% confidence interval for each parameter. Using this information, mutations with significantly greater stability than the wild type were identified (Fig. 9). From this collection, N54R was selected instead of N54K to preserve possible hydrogen bonding; Y107R and Y107G were selected as representatives of the two most beneficial sets of mutations at position 107 (positive and small, respectively); G43E was not predicted to be significantly different from the wild type, but it was added for comparison due to its similarity to G43D and G43N; and N112D, A127I, and C130V are the only beneficial mutations at their respective positions. These were combined into a collection of 5 design sequences incorporating between 5 and 6 mutations (Fig. 10A and B).
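A percentile bootstrap of this kind can be sketched as follows. In the study, each bootstrap replicate refits the site-wise logistic model and the statistic is a coefficient. Here `statistic` is a simple mean over hypothetical per-observation effect estimates so that the sketch runs on its own, and a mutation is called significant when its interval excludes a reference value (for example, the wild-type contribution).

```python
import random


def percentile(sorted_vals, q):
    """Simple percentile of a pre-sorted list (index rounded to the nearest rank)."""
    idx = int(round(q * (len(sorted_vals) - 1)))
    return sorted_vals[min(max(idx, 0), len(sorted_vals) - 1)]


def bootstrap_ci(data, statistic, n_boot=1000, level=0.90, seed=0):
    """Percentile bootstrap: resample with replacement, recompute `statistic`, take quantiles."""
    rng = random.Random(seed)
    stats = sorted(statistic([rng.choice(data) for _ in data]) for _ in range(n_boot))
    tail = (1.0 - level) / 2.0
    return percentile(stats, tail), percentile(stats, 1.0 - tail)


def is_significant(ci, reference=0.0):
    """True when the interval excludes the reference value."""
    lo, hi = ci
    return lo > reference or hi < reference


def mean(xs):
    return sum(xs) / len(xs)


# Hypothetical per-observation effect estimates for one mutation (not data from the study).
effects = [0.8, 1.2, 0.9, 1.1, 1.0, 0.7, 1.3, 0.95]
ci_example = bootstrap_ci(effects, mean)
```

With 1,000 replicates and a 90% level, the bounds are the 5th and 95th percentiles of the bootstrap distribution.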

FIG 9. Bootstrap error estimates of residue stability scores. One thousand bootstrap resamples of library sequences were used to estimate site-wise stability score contributions for wild-type (red) and mutant (black) amino acids. Error bars are 90% confidence intervals. Closed black circles indicate those mutants for which both the mean and error were zero, indicating insufficient information for parameter estimation.

FIG 10. Site-wise model of stability used to identify potentially stabilizing mutants yields improved thermal stability while maintaining activity against Clostridium perfringens cell walls. (A) Residues of LysCP2 identified for mutagenesis to improve stability. (B) Compositions of different tested constructs. (C) Melting temperature (Tm) for each construct as determined via a Sypro orange thermal denaturation assay (n = 11). (D and E) Normalized or residual activity of constructs at 200 nM on crude purifications of Clostridium perfringens strain ATCC 12916 cell walls after 30 min at 4°C (D) or 42°C (E), respectively (n = 3). For panels C to E, error bars are standard errors, and statistical tests are two-sided unequal-variance t tests with Bonferroni correction. **, P < 0.01; *, P < 0.05.

The designs were produced in E. coli and evaluated for thermal stability and activity. Of the designs, 4 of the 5 display higher thermal stability than the wild type, with improvements of up to 4°C as assessed by a Sypro orange thermal denaturation assay (Fig. 10C). All tested designs degrade crude cell wall preparations of C. perfringens ATCC 12916 at 37°C in phosphate-buffered saline (PBS) (Fig. 10D). The combination of stability and activity was tested by treating C. perfringens cell walls with 200 nM lysin after incubation at an elevated temperature (42°C) for 30 min. Following this thermal treatment, the wild type retains only 20% of its activity relative to unheated lysin, whereas variants D1, D3, and D5 exhibit 4- to 5-fold greater activity than the wild type (Fig. 10E). D2 is nominally more active than the wild type, and D4 loses all activity despite thermal stability similar to that of the wild type.

DISCUSSION

Generative models have been shown to predict protein fitness changes (35). Yet how statistical fitness, however accurate, can be interpreted to engineer physical properties such as stability, especially in the context of multiple mutations, is still being studied. A protein’s contribution to organism fitness is a nonlinear function of many attributes, including, for example, stability, solubility, and activity. In some cases, these attributes are the result of different spatially connected regions of a protein (43). Many proteins are only marginally stable under physiological conditions, and destabilizing mutations therefore incur a fitness cost on the organism due to a decreased effective concentration as well as increased aggregation and protein degradation stresses. Focusing mutations in regions of LysCP2 that were evolutionarily decoupled from the catalytic residues enabled the study of mutations primarily affecting folding properties, including stability. The results shown here for the glycoside hydrolase 25 catalytic domain of a phage-lytic enzyme isolated from Clostridium perfringens reveal that a second-order Potts coevolutionary model is predictive in its classification of variants at these positions as stable or unstable. The near-wild-type activity of all designs (Fig. 10D) is consistent with the hypothesis that the selected positions were removed from catalytic activity, thereby allowing a cleaner test of the coevolutionary model’s ability to infer stability changes.

Although libraries designed using the homology-based generative model did not yield a high proportion of variants more stable than the wild type (Fig. 3), the aim was to infer potentially stabilizing mutations from the data and combine them to generate an improved molecule. A site-wise global model was fit over the experimental data of each library and achieved high performance in predicting the effect of mutations on assay stability (cross-validated ROC AUC = 0.95). Comparing predicted single-mutant statistical fitnesses of the homology-based model to stability scores of the experimental model demonstrated a correlation across all mutations. When examining each sublibrary separately (Fig. 8), five libraries (sites 54 to 59, 83 to 88, 90 to 95, 118 to 123, and 133 to 138) exhibit strong linear correlation between statistical fitness and stability score.

Several factors could explain the disagreement observed in four libraries (a fifth library’s discordance is explained by a single strongly inaccurate prediction [L26V]). These inconsistencies may arise from systematic biases in the experimental setup or from assumptions made in connecting statistical fitness to stability. First, the assay itself is a measure of protease stability in the context of yeast surface display and may not be as strongly correlated with soluble-form stability as assumed. Second, it is possible that the cell wall binding domain improves the overall stability of the molecule, thereby decreasing the fitness burden of a less stable catalytic domain. Indeed, for LysCP2, the catalytic domain produced in isolation has significantly reduced stability compared to the full-length form. Third, the in vivo function of phage lysins requires that they pass through holins to degrade the cell wall at the end of the lytic cycle. This process may favor particular surface-exposed residues at the expense of other properties such as stability. Nonetheless, the highly correlated results of some libraries imply that selecting positions that have high secondary structure, are buried, or are distal from the catalytic pocket or possible substrate interactions can improve the predictive performance of statistical fitness.

Not all designs that the experimentally derived model predicted to be more stable proved to be so. The Y107G substitution was predicted to be nearly as stabilizing as Y107R; however, it resulted in a significant decrease in both melting temperature and kinetic stability (Fig. 10C and E). This discrepancy may reflect the cleavage preferences of proteinase K, which could make Y107G more resistant to protease degradation, on average, across the mutational backgrounds present in that sublibrary. Similar, though less pronounced, biases may be seen in the comparison of G43D and G43E: the experimentally derived model predicted G43E to be less stabilizing than G43D, whereas for the purified proteins this order appears to be reversed. Protease biases can be mitigated by performing this stability assay with multiple proteases, as in the work of Rocklin et al. (33).

Although relatively small, the improvements to stability seen in the final designs, which maintain wild-type activity levels, support the use of coevolutionary models to augment library design in the pursuit of stabilizing mutations to lysins. Within the context of methods presented here, experimental outcomes could be further improved in multiple ways: (i) increased sampling under a wider range of environmental conditions, which could reduce noise and enable pairwise parameter estimation from experimental data; (ii) creation of second-generation libraries from these experimental results, which would enable direct evaluation of stabilities of many more designs (i.e., the current results were accomplished with a single iteration of evolution); and (iii) selection of library positions based on structural arguments without the requirement of continuity in primary sequence. Beyond these, coevolutionary models could be used to augment existing stabilizing techniques, such as iterative-saturation mutagenesis.

The designed libraries were limited to a maximum possible genetic diversity of 16,000 to demonstrate that a designed, restricted library can outperform a random library at the same positions (the maximum diversity of the random libraries at each set of positions is approximately 1 billion members). The first arm of the study was meant to provide a large number of multimutant observations to assess the utility of the generative protein model; although it could be expanded, the roughly 10,000 variants whose stabilities could be categorized by the assay were sufficient to explore this question. This work also supports the translatability of the yeast surface display protease stability assay, combined with thermal shifts, to a particular lysin. Beyond the work presented here, future efforts to stabilize lysin catalytic domains would benefit greatly from using coevolutionary models to guide the design of libraries larger than those presented here (diversified at many more positions) that exploit the full capacity of fluorescence-activated cell sorting and yeast display. In particular, given the principles applied here, one could titrate the temperature used to assay these libraries and use multiple rounds of sorting to enrich directly for the rare multimutants that provide large benefits in stability.
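As a rough check on the diversity figures above, assume that the random libraries used NNK codons (N = A/C/G/T, K = G/T), as the Lib.#.NNK primer names in the library construction protocol suggest, at each of the six diversified positions. The DNA-level diversity of one random library is then

$$(4 \times 4 \times 2)^6 = 32^6 = 2^{30} = 1{,}073{,}741{,}824 \approx 1.1 \times 10^{9},$$

so a designed library capped at 16,000 members covers roughly 16,000 / 1.07 × 10⁹ ≈ 1.5 × 10⁻⁵ of that sequence space.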

MATERIALS AND METHODS

Bacterial and yeast cultures.

Escherichia coli cells were grown in lysogeny broth (LB) in liquid, or solid with 1.5% agar, supplemented with either 100 μg/ml ampicillin or 50 μg/ml kanamycin when noted. All cultures were grown at 37°C, with liquid cultures shaken at 250 rpm, unless otherwise noted.

Clostridium perfringens strains (generously provided by S. Swift in the Donovan Laboratory at the U.S. Department of Agriculture) were grown on solid brain heart infusion (BHI) medium (37 g/liter brain heart infusion medium, 1.5% agar) or liquid BYC (BHI medium with 5 g/liter yeast extract and 0.5 g/liter L-cysteine). Solid culture plates were prepared fresh for each use and incubated overnight anaerobically using the AnaeroGen compact system (Thermo Fisher). Liquid cultures were sealed immediately after autoclaving to reduce dissolved gases. After equilibration to 37°C, the liquid culture was inoculated with fresh bacterial colonies.

Saccharomyces cerevisiae strain EBY100 was grown nonselectively in liquid YPD (10 g/liter yeast extract, 20 g/liter Bacto peptone, and 20 g/liter D-glucose) or on solid YPD with 1.5% agar. For selective growth, yeast cells were grown in SD-CAA (16.8 g/liter sodium citrate dihydrate, 3.9 g/liter citric acid, 20 g/liter D-glucose, 6.7 g/liter yeast nitrogen base, 5 g/liter Casamino Acids). For induction, yeast was grown in SG-CAA (10.2 g/liter Na2HPO4·7H2O, 8.6 g/liter NaH2PO4·H2O, 19 g/liter D-galactose, 1 g/liter D-glucose, 6.7 g/liter yeast nitrogen base, 5 g/liter Casamino Acids). Unless otherwise noted, yeast was grown at 30°C, with liquid cultures shaken at 250 rpm.

Generation of a LysCP2 expression plasmid.

The pET-24 expression plasmid, previously modified to incorporate a C-terminal six-histidine tag (pETh) (44), was digested with NdeI and BamHI-HF (New England Biolabs) and isolated by gel electrophoresis. The digested plasmid was then joined by Gibson assembly (HiFi; New England Biolabs) to a codon-optimized gBlock (Integrated DNA Technologies) encoding LysCP2 (GenBank accession number WP_003469445); the product was transformed into E. coli MC1061 F′ (Lucigen), plated on solid LB supplemented with 50 μg/ml kanamycin, and sequence verified.

Protein production and purification.

Assembled plasmids were transformed into NEB T7 Express LysY/Iq (New England Biolabs), plated on solid LB with 50 μg/ml kanamycin, and grown overnight at 37°C. Fresh colonies were then used to inoculate 3 ml LB with 50 μg/ml kanamycin, which was grown overnight at 37°C and 250 rpm. The culture was then diluted into 100 ml LB, grown to an optical density at 600 nm (OD600) of 0.5 to 0.8, and induced with isopropyl β-D-1-thiogalactopyranoside (IPTG) at a final concentration of 0.5 mM. After 3 h of induction, the culture was chilled and pelleted at 6,000 × g and 4°C in 3-min increments until the entire culture was pelleted. Cell pellets were then resuspended in 0.6 ml of lysis buffer {137 mM NaCl, 2.7 mM KCl, 8 mM Na2HPO4, and 2 mM KH2PO4 (PBS) plus 5% glycerol, 3.1 g/liter 3-[(3-cholamidopropyl)-dimethylammonio]-1-propanesulfonate (CHAPS), and 1.7 g/liter imidazole}. The suspension was then subjected to four freeze-thaw cycles at −80°C.

A total of 0.6 ml of wash buffer (20 mM imidazole in 1× PBS) was then added to each sample and mixed by inversion several times. The suspension was then spun at 17,000 × g for 10 min at 4°C. Following centrifugation, the insoluble fraction was removed by pipetting, and the soluble fraction was sterilized through a 0.22-μm syringe filter (GE Healthcare). Purification was performed with 0.2 ml HisPur cobalt nitrilotriacetic acid (Co-NTA) spin columns according to the manufacturer’s instructions (Thermo Fisher). Proteins were then desalted into PBS with Zeba desalting columns according to the manufacturer’s instructions (Thermo Fisher).

Protein purity was assessed by sodium dodecyl sulfate-polyacrylamide gel electrophoresis (SDS-PAGE; NuPAGE Bis-Tris 4% to 12% gels) and ranged from 70% to 88% for full-length lysins; purity of the catalytic-domain-only construct was 91% (see Fig. S4 in the supplemental material).

Sypro orange thermal denaturation assay.

Purified proteins were diluted in PBS to stock concentrations of 5 μM, and 45 μl was aliquoted into optically clear PCR tubes. The Sypro orange stock solution (Thermo Fisher) was diluted to 200× in PBS, and 5 μl was added to each tube and gently mixed. Samples were then loaded into a CFX Connect real-time PCR detection system, and the temperature was ramped from 25°C to 98°C in 0.5°C increments, with 30 s at each step to allow for equilibration. The Sypro orange signal was monitored via the fluorescence resonance energy transfer (FRET) channel (450- to 490-nm excitation with 560- to 580-nm emission). Temperature derivatives of these data were determined by smoothing with local second-degree polynomials over a window width of 2.5°C using a Savitzky-Golay filter (sklearn). The Tm of each sample was defined as the temperature with the maximum derivative.
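The derivative-based Tm call described above can be sketched as follows. This is a minimal illustration, not the authors' code: it uses SciPy's `scipy.signal.savgol_filter` as the Savitzky-Golay implementation, assumes a uniform temperature grid, and converts the 2.5°C window into an odd number of points (5 points at 0.5°C steps). The function name `melting_temperature` is hypothetical.

```python
import numpy as np
from scipy.signal import savgol_filter


def melting_temperature(temps, fluorescence, window_degC=2.5, polyorder=2):
    """Tm as the temperature of maximal dF/dT after Savitzky-Golay smoothing.

    Assumes evenly spaced temperatures; the window in degrees is converted
    to a point count (2.5 / 0.5 = 5 points for 0.5 degC steps).
    """
    temps = np.asarray(temps, dtype=float)
    step = temps[1] - temps[0]
    window = int(round(window_degC / step))
    if window % 2 == 0:
        window += 1  # savgol_filter requires an odd window length
    # deriv=1 with delta=step returns the first derivative per degC
    # of the local second-degree polynomial fits.
    dfdt = savgol_filter(fluorescence, window, polyorder, deriv=1, delta=step)
    return temps[np.argmax(dfdt)]
```

Because `delta` is set to the temperature step, the smoothed derivative is expressed per °C; only the location of its maximum matters for the Tm call.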

Clostridium perfringens cell wall extraction and crude purification.

C. perfringens was cultured as described above to mid-exponential phase in liquid. The liquid culture was then cooled on wet ice for 15 min, transferred to sterile 50-ml conical tubes, and centrifuged at 6,000 × g at 4°C for 5 min. The cell pellet was then resuspended in 1 ml 50 mM Tris-HCl and added dropwise to 20 ml boiling 5% SDS. Cells were then boiled for 15 min, cooled to room temperature, and centrifuged at 10,000 × g for 10 min. The pellet was then washed twice with 1 ml 1 M NaCl, 7 times with 1 ml deionized water, and then 2 times with 1 ml PBS; all washes were pelleted at 17,000 × g for 10 min. Crude cell walls were then stored at 4°C in PBS until use.

Cell wall degradation assay.

The crude cell wall extract was diluted in PBS to an OD600 of 1.0. In wells of a 384-well plate, 40 μl of crude cell wall was added to 10 μl of 1 μM test protein or buffer. The OD600 was then monitored with a spectrophotometer at 37°C, with readings taken every 2 min. Activity was calculated as the slope of the linear region of OD600 versus time. For thermal inactivation assays, proteins were first incubated at 42°C for 30 min and then cooled on wet ice before use in the assays described above.
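With 10 μl of 1 μM protein in a 50-μl well, the final lysin concentration in this assay is 0.2 μM. The slope calculation can be sketched as below. The source does not say how the linear region was chosen, so this sketch assumes it is the fixed-length stretch of consecutive readings with the most negative least-squares slope; the function name `lytic_activity` and the 5-point window are hypothetical choices.

```python
import numpy as np


def lytic_activity(times_min, od600, window=5):
    """Slope (OD600 units per min) of the steepest linear stretch of a trace.

    Assumption: the 'linear region' is the `window`-point run of consecutive
    readings whose straight-line fit has the most negative slope.
    """
    t = np.asarray(times_min, dtype=float)
    y = np.asarray(od600, dtype=float)
    # np.polyfit(..., 1) returns [slope, intercept] for each window.
    slopes = [np.polyfit(t[k:k + window], y[k:k + window], 1)[0]
              for k in range(len(t) - window + 1)]
    return float(min(slopes))
```

Lysis yields a negative slope; a buffer-only well run in parallel gives the background slope against which test proteins can be compared.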

Homology-guided generative model of LysCP2.

The statistical fitness of lysin variants was predicted using a Potts model:

$$E(\sigma) = \sum_{i} h_i(\sigma_i) + \sum_{i} \sum_{j>i} J_{i,j}(\sigma_i, \sigma_j)$$

$$P(\sigma) = \frac{e^{E(\sigma)}}{Z}$$

where E(σ), the statistical fitness of a sequence σ, is the sum of site-wise compositional contributions, $h_i$, and pairwise contributions, $J_{i,j}$. The probability of observing a sequence in the data set, P(σ), follows a Boltzmann distribution. The parameters of this model can be inferred in a number of ways, each of which avoids exact computation of the intractable partition function, Z. Inference was performed using PLMC (https://github.com/debbiemarkslab/plmc), which maximizes a regularized pseudolikelihood, with recommended regularization coefficients as described previously (35). These parameters were a neighborhood cutoff of 80% identity for each sequence; a site-wise regularization coefficient, $\lambda_h$, of 0.01; and pairwise regularization coefficients, $\lambda_J$, of 39.2 and 33.6 for the firmicute and glycoside hydrolase 25 family models, respectively.
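To make the Potts model concrete, the sketch below evaluates E(σ) from site-wise and pairwise parameter arrays and, for a toy model small enough to enumerate, computes P(σ) exactly. It assumes integer-encoded sequences and dense NumPy arrays `h` (L × q) and `J` (L × L × q × q); this layout is illustrative and is not PLMC's output format. The final check illustrates why single-mutant Δ statistical fitness is convenient: Z cancels in log[P(mutant)/P(wild type)], which equals E(mutant) − E(wild type).

```python
import itertools

import numpy as np


def statistical_fitness(seq, h, J):
    """E(sigma) = sum_i h_i(sigma_i) + sum_{i<j} J_ij(sigma_i, sigma_j).

    seq: integer-encoded sequence of length L; h: (L, q); J: (L, L, q, q),
    of which only the i < j blocks are used.
    """
    L = len(seq)
    e = sum(h[i, seq[i]] for i in range(L))
    for i in range(L):
        for j in range(i + 1, L):
            e += J[i, j, seq[i], seq[j]]
    return float(e)


def boltzmann_probabilities(h, J):
    """Exact P(sigma) = exp(E) / Z by enumerating all q**L sequences.

    Only feasible for toy models; for a real domain (q = 21, L in the
    hundreds) Z is intractable, which is why pseudolikelihood is used.
    """
    L, q = h.shape
    seqs = list(itertools.product(range(q), repeat=L))
    energies = np.array([statistical_fitness(s, h, J) for s in seqs])
    weights = np.exp(energies - energies.max())  # shift for numerical stability
    return seqs, weights / weights.sum()
```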

Sequence data sets were gathered from two sources. (i) The catalytic domain of LysCP2 was used as the seed sequence for an iterative JackHmmer search (36) of the UniProtKB database (37), with the taxonomic restriction set to Firmicutes. Five iterations were performed with the hit threshold set to an E value of 1.0 × 10⁻²⁰. Sequences were then aligned with PROMALS3D (45) using default settings. (ii) The Pfam (46) family alignment for the catalytic domain of LysCP2 (glycoside hydrolase family 25 [PF01183]) was obtained using NCBI as the sequence database source.

Design of libraries using statistical fitness.

The algorithm for library design according to statistical fitness (fully detailed in the supplemental material) adds library diversity up to a fixed limit, based on the single-mutation Δ statistical fitness relative to the parental sequence. The user specifies (i) the maximum library size, L_max, based on experimental synthesis and screening limits; (ii) the residues to be mutated, r_i; (iii) the parental sequence, σ; (iv) a scoring function to compute the fitness of a given sequence, f(σ); and (v) a codon table, c. The algorithm initializes each codon with no added diversity (the parental residue only) and then expands the library iteratively.

The fitness, f_ij, of every single-mutation substitution j at each r_i is computed, and the substitutions are sorted in descending order. Moving progressively through the list, each mutation is proposed as an addition to the growing library. Given the mutations already accepted at that position, the set of degenerate codons that encode all previously accepted mutations as well as the proposed mutation is generated. From this set, the degenerate codon that maximizes the minimum f_ij while having the smallest combinatorial genetic size is selected. If including this degenerate codon keeps the library size below L_max, it replaces the current degenerate codon at that residue location. The algorithm then proceeds to the next-best proposed mutation until L_max is reached.
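The greedy procedure can be sketched as below. This is an illustrative reimplementation under stated assumptions, not the supplemental Algorithm S1: it uses the standard genetic code and IUPAC degenerate bases, excludes degenerate codons that encode a stop, ranks candidate codons first by DNA-level size and then by the minimum single-mutant score among encoded amino acids (the parental residue scoring 0), and accepts a codon when the total library size stays at or below L_max. The input `fitness`, a dictionary mapping (site, amino acid) to Δ statistical fitness, is a hypothetical format, as are all function names.

```python
from itertools import product
from math import inf, prod

BASES = "TCAG"
# Standard genetic code; codons enumerated with the first base varying slowest.
AA_STRING = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AA_STRING)}
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "AG", "Y": "CT",
         "S": "CG", "W": "AT", "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
         "H": "ACT", "V": "ACG", "N": "ACGT"}


def expand(degenerate):
    """All concrete codons encoded by a degenerate codon such as 'NNK'."""
    return ["".join(c) for c in product(*(IUPAC[b] for b in degenerate))]


# (DNA-level size, encoded amino acids) for all 15**3 degenerate codons.
DEGENERATE = {}
for triplet in product(IUPAC, repeat=3):
    codon = "".join(triplet)
    codons = expand(codon)
    DEGENERATE[codon] = (len(codons), frozenset(CODON_TABLE[c] for c in codons))


def best_codon(site, wanted, parent_aa, fitness):
    """Smallest stop-free degenerate codon covering `wanted`; ties go to the
    codon whose worst encoded amino acid has the highest score."""
    def score(aa):
        return 0.0 if aa == parent_aa else fitness.get((site, aa), -inf)

    best, best_key = None, None
    for codon, (size, aas) in DEGENERATE.items():
        if "*" in aas or not wanted <= aas:
            continue
        key = (size, -min(score(a) for a in aas))
        if best_key is None or key < best_key:
            best, best_key = codon, key
    return best


def design_library(parent, sites, fitness, l_max):
    """Greedily add the best-scoring single mutations while size <= l_max."""
    accepted = {s: {parent[s]} for s in sites}
    chosen = {s: best_codon(s, accepted[s], parent[s], fitness) for s in sites}
    for (site, aa), _ in sorted(fitness.items(), key=lambda kv: -kv[1]):
        if site not in accepted or aa in accepted[site]:
            continue
        wanted = accepted[site] | {aa}
        codon = best_codon(site, wanted, parent[site], fitness)
        if codon is None:
            continue
        trial = {**chosen, site: codon}
        if prod(DEGENERATE[c][0] for c in trial.values()) <= l_max:
            chosen, accepted[site] = trial, wanted
    return chosen
```

For example, with parent "GA", scores {(0, "D"): 1.0, (1, "S"): 0.8, (0, "E"): 0.5}, and a size limit of 4, the sketch picks two-member codons encoding {G, D} and {A, S}, then rejects G→E because covering {G, D, E} needs a four-member codon that would double the library.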

Design of mutant libraries.

Evolutionary coupling scores for LysCP2's catalytic domain were computed as described previously (47), using the Potts parameters inferred from the firmicute data set with L1 regularization. Residues statistically coupled to the catalytic triad, as well as residues coupled to that set, were removed from the list of candidate residues. For every contiguous six-residue stretch of the primary sequence, the library design algorithm outlined above was run with a maximum rational library size of 16,000. Two metrics were calculated for each designed library: the maximum single-mutation score and the 90th percentile statistical fitness score, as determined by random sampling from the library. A set of 10 nonoverlapping libraries was selected to maximize point mutant scores, with 90th percentile scores used for tie breaking (Fig. S1 and Algorithm S1).
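The window enumeration and the two library-level metrics can be sketched as below, under explicit assumptions: candidate windows are simply the six-residue stretches that contain no excluded residue; `fitness_fn` is any callable that returns statistical fitness for an amino acid sequence (for example, the Potts energy); and `site_options` lists each position's allowed amino acids with repeats proportional to codon multiplicity, so that uniform sampling approximates DNA-level sampling. All names are hypothetical.

```python
import numpy as np


def candidate_windows(n_residues, excluded, width=6):
    """Start indices of contiguous windows that contain no excluded residue."""
    return [s for s in range(n_residues - width + 1)
            if not any(p in excluded for p in range(s, s + width))]


def library_metrics(parent, site_options, fitness_fn, n_samples=10_000, seed=0):
    """Return (max single-mutant delta fitness, 90th percentile of sampled
    library members' delta fitness), both relative to the parent."""
    base = fitness_fn(parent)
    singles = []
    for pos, options in site_options.items():
        for aa in set(options) - {parent[pos]}:
            mutant = parent[:pos] + aa + parent[pos + 1:]
            singles.append(fitness_fn(mutant) - base)
    rng = np.random.default_rng(seed)
    sampled = np.empty(n_samples)
    for k in range(n_samples):
        seq = list(parent)
        for pos, options in site_options.items():
            seq[pos] = options[rng.integers(len(options))]
        sampled[k] = fitness_fn("".join(seq)) - base
    return max(singles), float(np.percentile(sampled, 90))
```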

Generation of a yeast surface display plasmid for LysCP2 and mutants.

Sequence-verified LysCP2 was amplified from the pETh plasmid using primers LYS001 and LYS005 (for all indicated oligonucleotides, see Table 2), assembled with the pCT40 plasmid (41) that had been digested with NheI-HF and BamHI-HF (New England Biolabs) and gel isolated, transformed into E. coli MC1061 F′ (Lucigen), and plated on solid LB supplemented with 100 μg/ml ampicillin. The resulting construct encodes an Aga2p fusion comprising a linker with an HA epitope, LysCP2, and a Myc epitope.

TABLE 2.

List of oligonucleotides


Multimutant libraries of LysCP2 were generated in two steps: (i) generation of the mutagenic region together with the downstream sequence and (ii) use of these fragments as megaprimers to complete the gene by appending the upstream sequence. First, pCT40:LysCP2 was amplified with library-specific mutagenic forward primers (Lib.#.PLMC and Lib.#.NNK for each library) and a universal reverse primer, Lib.all.REV, via PCR with Q5 polymerase (New England Biolabs), each in a separate reaction. These fragments were then gel isolated and amplified in the same manner, but with a new universal forward primer, Lib.all.FWD, and the pCT40:LysCP2 plasmid in a two-cycle reaction. The products for all libraries were then pooled and transformed with the pCT40:LysCP2 vector that had been linearized with NheI-HF and BamHI-HF, as outlined previously by Woldring et al. (27).

Stability assay via yeast surface display.

Yeast cells transformed with pCT40 plasmids were induced by first growing the cells to an OD600 of ∼1.0 in SD-CAA. Cells were then pelleted for 1 min at 6,000 × g, resuspended in SG-CAA, and induced overnight at 20°C and 250 rpm. The following day, yeast cells were washed twice with PBSA (PBS with 0.1% bovine serum albumin). Cell density was determined by dilution and measurement of the OD600. Ten million cells were diluted into 200 μl PBSA per sample. Proteinase K (New England Biolabs) was diluted in PBSA to a concentration of 2.0 × 10⁻⁴ U/μl. The diluted proteinase K and the yeast samples were separately incubated at the assay temperature (30°C to 45°C) for 5 min. Next, 200 μl of the proteinase K dilution was added to each yeast sample, mixed briefly by pipetting, and incubated for 10 min. Six hundred microliters of ice-cold PBSA was then added, and samples were placed on ice. All subsequent steps were performed on ice or in centrifuges at 4°C.

Yeast cells were washed twice with 500 μl PBSA, labeled with 5 μg/ml anti-c-myc (clone 9E10, catalog number 626802; BioLegend) and 1 μg/ml anti-HA (chicken anti-HA, catalog number ab9111; Abcam) for 1 h, washed twice with 500 μl PBSA, and labeled with 2 μg/ml goat anti-mouse Alexa Fluor 647 (catalog number A-21235; Thermo Fisher Scientific) and 2 μg/ml goat anti-chicken Alexa Fluor 488 (catalog number A-11039; Thermo Fisher Scientific) for 30 min. Yeast cells were then washed twice in 500 μl PBSA and resuspended in 300 μl PBSA for fluorescence-activated cell sorting (FACS) on a FACSAria II instrument (Becton, Dickinson Bioscience). Two sorting gates were used to collect HA-positive cells with high and low ratios of c-myc to HA signal.

High-throughput sequencing and preprocessing.

Following FACS, yeast populations were grown out in SD-CAA overnight. The following day, to isolate plasmids for sequencing, approximately 10 million cells were centrifuged and resuspended in 200 μl of solution 1 (0.13 g/100 ml NaH2PO4·H2O, 1.09 g/100 ml Na2HPO4·7H2O, 1 M sorbitol, 10 mM 2-mercaptoethanol); 10 μl of Longlife Zymolyase (G-Biosciences) was added, and the mixture was incubated at 37°C for 60 min. Plasmid DNA was then purified with GenCatch (Epoch Life Science) according to the manufacturer's instructions, substituting 200 μl of MX2 and 400 μl of MX3. Samples for Illumina sequencing were prepared by Q5 PCR amplification (New England Biolabs) with primers Ni5N501-502, Ni7N701-702, SetFWD_[N, NN, NNN], and SetREV_[N, NN, NNN], and the product of the expected size was isolated by gel electrophoresis. DNA populations were mixed to a final concentration of 5 ng/μl, with the relative proportion of each population matching the relative proportion of yeast collected in each gate. Sequencing using version 3 chemistry on an Illumina MiSeq system generated 1.5 million reads.

Sequences with more than one expected error were removed using USearch v11 (48) (https://drive5.com/usearch/manual/whatsnewv11.html). Nucleotide sequences were then compared with the sequences of wild-type LysCP2 and all library designs. Unique nucleotide sequences with >3 reads were assigned to libraries with zero tolerance for mutations outside degenerate regions. Singletons and doublets with at most one deviation outside degenerate regions were also assigned to their most probable library of origin.
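The filtering and assignment rules can be sketched as below. The expected-error statistic is the sum of per-base error probabilities, 10^(−Q/10), which is what USEARCH's `-fastq_maxee` filter thresholds; this sketch assumes Phred+33 quality encoding and is not a substitute for USEARCH itself. The assignment helper checks one library template at a time and, as an assumption (the text does not cover them), treats sequences with exactly 3 reads with zero tolerance. All function names are hypothetical.

```python
def expected_errors(quality, phred_offset=33):
    """Expected number of errors in a read: sum over bases of 10**(-Q/10)."""
    return sum(10 ** (-(ord(ch) - phred_offset) / 10) for ch in quality)


def mismatches_outside(seq, template, degenerate_positions):
    """Count positions outside the degenerate regions where seq differs."""
    return sum(1 for i, (a, b) in enumerate(zip(seq, template))
               if a != b and i not in degenerate_positions)


def assignable(seq, reads, template, degenerate_positions):
    """Singletons and doublets may carry one deviation outside degenerate
    regions; all other sequences (assumed to include 3-read ones) none."""
    allowed = 1 if reads <= 2 else 0
    return (len(seq) == len(template)
            and mismatches_outside(seq, template, degenerate_positions) <= allowed)
```

A read passes the quality filter when `expected_errors(quality) <= 1.0`; a uniform Q40 read of 250 bases, for instance, has about 0.025 expected errors.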

Nucleotide sequences were then translated, and counts were pooled by amino acid sequence (hereafter, simply "sequence"). A sequence was then designated stable or unstable if it occurred predominantly in the stable or unstable pool, respectively.
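The pooling and labeling step can be sketched as below. Because DNA from each sorting gate was mixed in proportion to the number of yeast collected, raw read counts are treated here as comparable across the two pools. "Predominantly" is interpreted as a strict majority of reads, with ties left unlabeled; that threshold and the function names are assumptions of this sketch.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code; codons enumerated with the first base varying slowest.
AA_STRING = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AA_STRING)}


def translate(nt):
    """Translate an in-frame nucleotide sequence; trailing partial codons are ignored."""
    return "".join(CODON_TABLE[nt[i:i + 3]] for i in range(0, len(nt) - 2, 3))


def label_variants(stable_reads, unstable_reads):
    """Pool nucleotide read counts by amino acid sequence and call each variant.

    Returns {amino acid sequence: True if stable, False if unstable}.
    """
    stable, unstable = Counter(), Counter()
    for nt, n in stable_reads.items():
        stable[translate(nt)] += n
    for nt, n in unstable_reads.items():
        unstable[translate(nt)] += n
    return {aa: stable[aa] > unstable[aa]
            for aa in stable.keys() | unstable.keys()
            if stable[aa] != unstable[aa]}
```

Synonymous variants such as GCTTCT and GCCTCT (both Ala-Ser) are merged before the call, so their reads count together toward the stable or unstable label.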

Site-wise logistic modeling of yeast-displayed protein stability and stability score.

Similarly to the homology model presented above, E, now considered a stability score, can be modeled using a first-order Potts model ($J_{i,j} = 0$). The model parameters can then be inferred by logistic regression over the data set, with the model form

$$P(\text{stable} \mid \sigma) = \frac{1}{1 + e^{-E(\sigma)}}$$

This model was regressed on the data using the sklearn package in Python (https://scikit-learn.org/stable/). To limit overfitting, L2 regularization was applied with λ = 10. This value was chosen because it maximized cross-validated accuracy while maintaining a linear correlation between the parameters inferred from the original data set and the means of those inferred from bootstrap resampling (Fig. S5). The bootstrap estimates of parameter error enabled the design of mutants that combined features (parameters) that were consistently beneficial across bootstrap samples.

The stability score of a sequence is defined as E(σ), using parameters inferred from the library's stability observations. The predictive performance of this experimentally derived model was assessed via 10-fold cross-validation. To simulate prediction of unseen observations, the set of unique observations for each library was randomly divided into 10 folds of equal size. For each fold, prediction accuracy was assessed using a model regressed on the observations in the other 9 folds.
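The cross-validation scheme can be sketched with a scikit-learn pipeline, so that the one-hot encoder is refitted inside each training split. The sketch assumes ROC AUC as the scoring metric (matching the reported AUC), C = 1/λ as above, and one label per unique sequence; the function name is hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder


def cross_validated_auc(seqs, stable, lam=10.0, n_folds=10, seed=0):
    """10-fold CV ROC AUC of a site-wise logistic model on unique sequences."""
    unique = dict(zip(seqs, stable))  # one label per unique sequence (last seen)
    X = np.array([list(s) for s in unique])
    y = np.array(list(unique.values()))
    model = make_pipeline(OneHotEncoder(handle_unknown="ignore"),
                          LogisticRegression(C=1.0 / lam, max_iter=1000))
    # Shuffled folds of (near-)equal size; each fold is scored by a model
    # trained on the remaining folds.
    folds = KFold(n_splits=n_folds, shuffle=True, random_state=seed)
    return cross_val_score(model, X, y, cv=folds, scoring="roc_auc")
```

`handle_unknown="ignore"` matters here: a held-out fold can contain an amino acid that never appears at that site in the training folds, and the encoder then contributes zeros for it instead of raising an error.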

Design of multimutants.

First, mutations in each library whose stability score contributions were significantly greater than that of the wild-type residue (Fig. 8) were identified. Within this set, sites 112, 127, and 130 each had a single significantly beneficial substitution: N112D, A127I, and C130V, respectively. Where multiple choices were possible, substitutions were selected either to explore distinct options or to preserve characteristics of the wild-type residue. N54R was selected instead of N54K to preserve possible hydrogen bonding. Although G43E was not predicted to differ significantly from the wild type, it was selected because of its similarity to G43D and G43N. Several substitutions at Y107 were beneficial; of these, Y107R and Y107G were selected as representatives of the two most beneficial classes of substitution (positively charged and small, respectively).
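One way to operationalize "significantly greater than the wild type" from bootstrap coefficients is sketched below. The criterion used here, that the mutant's coefficient exceeds the wild-type coefficient at the same site in at least 95% of bootstrap refits, is an assumption rather than the authors' stated test. Feature names follow scikit-learn's default `OneHotEncoder` naming (`x{site}_{aa}`, with sites indexed within the library window), and the function name is hypothetical.

```python
import numpy as np


def significant_mutations(boot, names, wt, threshold=0.95):
    """Return (site, wt_aa, mut_aa, fraction) for mutations whose bootstrap
    coefficient beats the wild-type coefficient in >= threshold of refits.

    boot: (n_boot, n_features) bootstrap coefficients; names: feature names
    of the form 'x{site}_{aa}'; wt: wild-type sequence of the library window.
    """
    col = {name: k for k, name in enumerate(names)}
    hits = []
    for name in names:
        site_str, aa = name[1:].split("_")
        site = int(site_str)
        if aa == wt[site]:
            continue
        wt_col = col.get(f"x{site}_{wt[site]}")
        if wt_col is None:  # wild-type residue never observed at this site
            continue
        frac = float(np.mean(boot[:, col[name]] > boot[:, wt_col]))
        if frac >= threshold:
            hits.append((site, wt[site], aa, frac))
    return hits
```

Mutations retained by such a filter form the pool from which designs combining several substitutions can then be drawn.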

Genes encoding the catalytic domains of designs containing a subset of combinations of these selected mutations were synthesized (Twist Bioscience) and assembled (HiFi; New England Biolabs) with the cell wall binding domain of LysCP2 in the pETh vector.

Data availability.

Sequences of mutant constructs generated by Illumina sequencing are available as a repository on Zenodo (https://doi.org/10.5281/zenodo.2600306). The code required to reproduce the analysis is available upon request.

Supplementary Material

Supplemental file 1
AEM.00054-19-s0001.pdf (906.1KB, pdf)

ACKNOWLEDGMENTS

This work was supported by a grant from the National Institutes of Health (R01 GM121777).

We have submitted a patent application pertaining to some of the engineered lysin molecules in this work.

Footnotes

Supplemental material for this article may be found at https://doi.org/10.1128/AEM.00054-19.

REFERENCES

1. O’Neill J. 2014. Antimicrobial resistance: tackling a crisis for the health and wealth of nations. Review on Antimicrobial Resistance, London, United Kingdom.
2. Pastagia M, Schuch R, Fischetti VA, Huang DB. 2013. Lysins: the arrival of pathogen-directed anti-infectives. J Med Microbiol 62:1506–1516. doi: 10.1099/jmm.0.061028-0.
3. Fenton M, Ross P, McAuliffe O, O’Mahony J, Coffey A. 2010. Recombinant bacteriophage lysins as antibacterials. Bioeng Bugs 1:9–16. doi: 10.4161/bbug.1.1.9818.
4. Fischetti VA. 2008. Bacteriophage lysins as effective antibacterials. Curr Opin Microbiol 11:393–400. doi: 10.1016/j.mib.2008.09.012.
5. Swift S, Seal B, Garrish J, Oakley B, Hiett K, Yeh H-Y, Woolsey R, Schegg K, Line J, Donovan D. 2015. A thermophilic phage endolysin fusion to a Clostridium perfringens-specific cell wall binding domain creates an anti-Clostridium antimicrobial with improved thermostability. Viruses 7:3019–3034. doi: 10.3390/v7062758.
6. Blázquez B, Fresco-Taboada A, Iglesias-Bexiga M, Abedon ST. 2016. PL3 amidase, a tailor-made lysin constructed by domain shuffling with potent killing activity against pneumococci and related species. Front Microbiol 7:1156. doi: 10.3389/fmicb.2016.01156.
7. Heselpoth RD, Nelson DC. 2012. A new screening method for the directed evolution of thermostable bacteriolytic enzymes. J Vis Exp 2012:4216. doi: 10.3791/4216.
8. Heselpoth RD, Yin Y, Moult J, Nelson DC. 2015. Increasing the stability of the bacteriophage endolysin PlyC using rationale-based FoldX computational modeling. Protein Eng Des Sel 28:85–92. doi: 10.1093/protein/gzv004.
9. São-José C. 2018. Engineering of phage-derived lytic enzymes: improving their potential as antimicrobials. Antibiotics 7:29. doi: 10.3390/antibiotics7020029.
10. Scanlon TC, Dostal SM, Griswold KE. 2014. A high-throughput screen for antibiotic drug discovery. Biotechnol Bioeng 111:232–243. doi: 10.1002/bit.25019.
11. Steipe B, Schiller B, Plückthun A, Steinbacher S. 1994. Sequence statistics reliably predict stabilizing mutations in a protein domain. J Mol Biol 240:188–192. doi: 10.1006/jmbi.1994.1434.
12. Cochran JR, Kim Y-S, Lippow SM, Rao B, Wittrup KD. 2006. Improved mutants from directed evolution are biased to orthologous substitutions. Protein Eng Des Sel 19:245–253. doi: 10.1093/protein/gzl006.
13. Jäckel C, Bloom JD, Kast P, Arnold FH, Hilvert D. 2010. Consensus protein design without phylogenetic bias. J Mol Biol 399:541–546. doi: 10.1016/j.jmb.2010.04.039.
14. Case BA, Hackel BJ. 2016. Synthetic and natural consensus design for engineering charge within an affibody targeting epidermal growth factor receptor. Biotechnol Bioeng 113:1628–1638. doi: 10.1002/bit.25931.
15. Magliery TJ. 2015. Protein stability: computation, sequence statistics, and new experimental methods. Curr Opin Struct Biol 33:161–168. doi: 10.1016/j.sbi.2015.09.002.
16. Porebski BT, Buckle AM. 2016. Consensus protein design. Protein Eng Des Sel 29:245–251. doi: 10.1093/protein/gzw015.
17. Reynolds KA, Russ WP, Socolich M, Ranganathan R. 2013. Evolution-based design of proteins. Methods Enzymol 523:213–235. doi: 10.1016/B978-0-12-394292-0.00010-2.
18. Broendum SS, Buckle AM, McGowan S. 2018. Catalytic diversity and cell wall binding repeats in the phage-encoded endolysins. Mol Microbiol 110:879–896. doi: 10.1111/mmi.14134.
19. Oliveira H, São-José C, Azeredo J. 2018. Phage-derived peptidoglycan degrading enzymes: challenges and future prospects for in vivo therapy. Viruses 10:E292. doi: 10.3390/v10060292.
20. Bershtein S, Goldin K, Tawfik DS. 2008. Intense neutral drifts yield robust and evolvable consensus proteins. J Mol Biol 379:1029–1044. doi: 10.1016/j.jmb.2008.04.024.
21. Bloom JD, Labthavikul ST, Otey CR, Arnold FH. 2006. Protein stability promotes evolvability. Proc Natl Acad Sci U S A 103:5869–5874. doi: 10.1073/pnas.0510098103.
22. Tokuriki N, Stricher F, Schymkowitz J, Serrano L, Tawfik DS. 2007. The stability effects of protein mutations appear to be universally distributed. J Mol Biol 369:1318–1332. doi: 10.1016/j.jmb.2007.03.069.
23. Boder ET, Wittrup KD. 1997. Yeast surface display for screening combinatorial polypeptide libraries. Nat Biotechnol 15:553–557. doi: 10.1038/nbt0697-553.
24. Gai SA, Wittrup KD. 2007. Yeast surface display for protein engineering and characterization. Curr Opin Struct Biol 17:467–473. doi: 10.1016/j.sbi.2007.08.012.
25. Traxlmayr MW, Obinger C. 2012. Directed evolution of proteins for increased stability and expression using yeast display. Arch Biochem Biophys 526:174–180. doi: 10.1016/j.abb.2012.04.022.
26. Hackel BJ, Kapila A, Wittrup KD. 2008. Picomolar affinity fibronectin domains engineered utilizing loop length diversity, recursive mutagenesis, and loop shuffling. J Mol Biol 381:1238–1252. doi: 10.1016/j.jmb.2008.06.051.
27. Woldring DR, Holec PV, Zhou H, Hackel BJ. 2015. High-throughput ligand discovery reveals a sitewise gradient of diversity in broadly evolved hydrophilic fibronectin domains. PLoS One 10:e0138956. doi: 10.1371/journal.pone.0138956.
28. Chen I, Dorr BM, Liu DR. 2011. A general strategy for the evolution of bond-forming enzymes using yeast display. Proc Natl Acad Sci U S A 108:11399–11404. doi: 10.1073/pnas.1101046108.
29. Dorr BM, Ham HO, An C, Chaikof EL, Liu DR. 2014. Reprogramming the specificity of sortase enzymes. Proc Natl Acad Sci U S A 111:13343–13348. doi: 10.1073/pnas.1411179111.
30. Schmitz JE, Ossiprandi MC, Rumah KR, Fischetti VA. 2011. Lytic enzyme discovery through multigenomic sequence analysis in Clostridium perfringens. Appl Microbiol Biotechnol 89:1783–1795. doi: 10.1007/s00253-010-2982-8.
31. Biasini M, Bienert S, Waterhouse A, Arnold K, Studer G, Schmidt T, Kiefer F, Cassarino TG, Bertoni M, Bordoli L, Schwede T. 2014. SWISS-MODEL: modelling protein tertiary and quaternary structure using evolutionary information. Nucleic Acids Res 42:W252–W258. doi: 10.1093/nar/gku340.
32. Tamai E, Yoshida H, Sekiya H, Nariya H, Miyata S, Okabe A, Kuwahara T, Maki J, Kamitori S. 2014. X-ray structure of a novel endolysin encoded by episomal phage phiSM101 of Clostridium perfringens. Mol Microbiol 92:326–337. doi: 10.1111/mmi.12559.
33. Rocklin GJ, Chidyausiku TM, Goreshnik I, Ford A, Houliston S, Lemak A, Carter L, Ravichandran R, Mulligan VK, Chevalier A, Arrowsmith CH, Baker D. 2017. Global analysis of protein folding using massively parallel design, synthesis, and testing. Science 357:168–175. doi: 10.1126/science.aan0693.
34. Miyazawa S. 2017. Selection originating from protein stability/foldability: relationships between protein folding free energy, sequence ensemble, and fitness. J Theor Biol 433:21–38. doi: 10.1016/j.jtbi.2017.08.018.
35. Hopf TA, Ingraham JB, Poelwijk FJ, Schärfe CPI, Springer M, Sander C, Marks DS. 2017. Mutation effects predicted from sequence co-variation. Nat Biotechnol 35:128–135. doi: 10.1038/nbt.3769.
36. Finn RD, Clements J, Arndt W, Miller BL, Wheeler TJ, Schreiber F, Bateman A, Eddy SR. 2015. HMMER Web server: 2015 update. Nucleic Acids Res 43:W30–W38. doi: 10.1093/nar/gkv397.
37. UniProt Consortium. 2017. UniProt: the universal protein knowledgebase. Nucleic Acids Res 45:D158–D169. doi: 10.1093/nar/gkw1099.
38. Donovan DM, Foster-Frey J, Dong S, Rousseau M, Moineau S, Pritchard DG. 2006. The cell lysis activity of the Streptococcus agalactiae bacteriophage B30 endolysin relies on the cysteine, histidine-dependent amidohydrolase/peptidase domain. Appl Environ Microbiol 72:5108–5112. doi: 10.1128/AEM.03065-05.
39. Donovan DM, Dong S, Garrett W, Rousseau M, Moineau S, Pritchard DG. 2006. Peptidoglycan hydrolase fusions maintain their parental specificities. Appl Environ Microbiol 72:2988–2996. doi: 10.1128/AEM.72.4.2988-2996.2006.
40. Donovan DM, Lardeo M, Foster-Frey J. 2006. Lysis of staphylococcal mastitis pathogens by bacteriophage phi11 endolysin. FEMS Microbiol Lett 265:133–139. doi: 10.1111/j.1574-6968.2006.00483.x.
41. Stern LA, Schrack IA, Johnson SM, Deshpande A, Bennett NR, Harasymiw LA, Gardner MK, Hackel BJ. 2016. Geometry and expression enhance enrichment of functional yeast-displayed ligands via cell panning. Biotechnol Bioeng 113:2328–2341. doi: 10.1002/bit.26001.
42. Ng AY, Jordan MI. 2001. On discriminative vs. generative classifiers: a comparison of logistic regression and naive Bayes. Adv Neural Inform Process Syst 14:605–610.
43. Halabi N, Rivoire O, Leibler S, Ranganathan R. 2009. Protein sectors: evolutionary units of three-dimensional structure. Cell 138:774–786. doi: 10.1016/j.cell.2009.07.038.
44. Woldring DR, Holec PV, Stern LA, Du Y, Hackel BJ. 2017. A gradient of sitewise diversity promotes evolutionary fitness for binder discovery in a three-helix bundle protein scaffold. Biochemistry 56:1656–1671. doi: 10.1021/acs.biochem.6b01142.
45. Pei J, Kim BH, Grishin NV. 2008. PROMALS3D: a tool for multiple protein sequence and structure alignments. Nucleic Acids Res 36:2295–2300. doi: 10.1093/nar/gkn072.
46. Finn RD, Coggill P, Eberhardt RY, Eddy SR, Mistry J, Mitchell AL, Potter SC, Punta M, Qureshi M, Sangrador-Vegas A, Salazar GA, Tate J, Bateman A. 2016. The Pfam protein families database: towards a more sustainable future. Nucleic Acids Res 44:D279–D285. doi: 10.1093/nar/gkv1344.
47. Marks DS, Hopf TA, Sander C. 2012. Protein structure prediction from sequence variation. Nat Biotechnol 30:1072–1080. doi: 10.1038/nbt.2419.
48. Edgar RC. 2010. Search and clustering orders of magnitude faster than BLAST. Bioinformatics 26:2460–2461. doi: 10.1093/bioinformatics/btq461.

Associated Data

Supplementary Materials

Supplemental file 1
AEM.00054-19-s0001.pdf (PDF, 906.1 KB)

Data Availability Statement

Sequences of the mutant constructs, obtained by Illumina sequencing, are available in a Zenodo repository (https://doi.org/10.5281/zenodo.2600306). The code required to reproduce the analysis is available upon request.


Articles from Applied and Environmental Microbiology are provided here courtesy of American Society for Microbiology (ASM)