Significance
Extracting maximal structural and functional information from multiple sequence alignments is difficult for protein families that exhibit extreme sequence variation. We addressed this issue by identifying covarying positions within the sequence alignment to predict networks of coevolving amino acid residues in LAGLIDADG homing endonucleases, enzymes used for genome-engineering applications. Intriguingly, the highest-scoring predicted coevolving network includes the active-site metal-binding residues and adjacent residues. We were able to modulate catalytic efficiency ∼100-fold by substituting residues in the network. Our data show that the evolutionary trajectory and fitness landscape of LAGLIDADG active sites are constrained by a barrier of coevolving residues and imply that generating an optimal coevolving network is an important consideration when engineering these endonucleases.
Keywords: amino acid coevolution, genetic selection
Abstract
The active sites of enzymes consist of residues necessary for catalysis and structurally important noncatalytic residues that together maintain the architecture and function of the active site. Examples of evolutionary interactions between catalytic and noncatalytic residues have been difficult to define and experimentally validate because these residues are generally intolerant of substitution. Here, using computational methods to predict coevolving residues, we identify a network of positions consisting of two catalytic metal-binding residues and two adjacent noncatalytic residues in LAGLIDADG homing endonucleases (LHEs). Distinct combinations of the four residues in the network map to distinct LHE subfamilies, with a striking distribution of the metal-binding Asp (D) and Glu (E) residues. Mutation of these four positions in three LHEs (I-LtrI, I-OnuI, and I-HjeMI) indicates that the combinations of residues tolerated are specific to each enzyme. Kinetic analyses under single-turnover conditions revealed that I-LtrI activity could be modulated over an ∼100-fold range by mutation of residues in the coevolving network. I-LtrI catalytic site variants with low activity could be rescued by compensatory mutations at adjacent noncatalytic sites that restore an optimal coevolving network, and vice versa. Our results demonstrate that LHE activity is constrained by an evolutionary barrier of residues with strong context-dependent effects. Creation of optimal coevolving active-site networks is therefore an important consideration in engineering of LHEs and other enzymes.
The active sites of enzymes are often the most conserved positions in a multiple sequence alignment, as purifying selection for maintenance of function constrains amino acid variation of residues that directly participate in catalysis. Noncatalytic residues, often in close proximity to catalytic residues, contribute to enzymatic function by maintaining the architecture and chemical environment of the active site. Noncatalytic residues often show sequence variation in multiple sequence alignments, yet nonpermissive substitutions at these positions can impair enzymatic function by disrupting the architecture or chemical environment necessary for catalysis. Thus, catalytic and noncatalytic residues must coevolve with each other and with surrounding residues to maintain active-site conformation and chemistry and to buffer against potentially deleterious mutations (1, 2). Coevolving residues within a protein family can be predicted by computational methods that use mutual information theory to identify residue covariation in a multiple sequence alignment (2–5). However, because the magnitude of covariation between positions varies with the magnitude of positional variation (4), the identification of catalytic residues as part of coevolving networks is problematic for the simple reason that catalytic residues show little sequence variation. Although others have identified putative coevolution between residues involved in catalysis in a small number of protein families (6–8), to our knowledge there are no examples demonstrating the functional consequences of coevolution between catalytic and surrounding noncatalytic positions.
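Mutual information (MI) between two alignment columns is the quantity underlying these covariation methods. The Python sketch below computes raw MI from residue frequencies. It is a minimal illustration that assumes each column is a string with one residue per aligned sequence, and it omits the gap handling, sequence weighting, pseudocounts, and background corrections that practical implementations usually add.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (natural log) of one alignment column."""
    n = len(column)
    return -sum((c / n) * math.log(c / n) for c in Counter(column).values())


def mutual_information(col_a, col_b):
    """MI(a, b) = sum over residue pairs x, y of p(x, y) * log[p(x, y) / (p(x) * p(y))]."""
    if len(col_a) != len(col_b):
        raise ValueError("columns must come from the same aligned sequences")
    n = len(col_a)
    p_a, p_b = Counter(col_a), Counter(col_b)
    p_ab = Counter(zip(col_a, col_b))
    return sum(
        (c / n) * math.log((c / n) / ((p_a[x] / n) * (p_b[y] / n)))
        for (x, y), c in p_ab.items()
    )


# Toy columns, one character per aligned sequence (hypothetical data).
print(mutual_information("AAGG", "GGAA"))  # perfectly covarying: log(2), about 0.693
print(mutual_information("AAGG", "AGAG"))  # independent columns: 0.0
print(mutual_information("DDDD", "AGAG"))  # invariant column: 0.0
```

Because MI(a, b) can never exceed the entropy of either column, an invariant column such as an absolutely conserved catalytic Asp has zero entropy and therefore zero MI with every other position, which is the limitation described above.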
We examined coevolution between the catalytic and noncatalytic residues of LAGLIDADG homing endonucleases (LHEs), site-specific DNA endonucleases that are typically encoded within self-splicing introns and inteins (9). LHEs show extreme sequence variation and function as homodimers or as single-chain monomers composed of two LAGLIDADG domains that evolved by gene duplication or gene fusion events. The active site of LHEs consists of two parallel α-helices containing the class-defining LAGLIDADG amino acid motif with acidic metal-binding residues (D or E) at the bottom of each α-helix and positioned in close proximity to the DNA substrate (10–12). Outside of the LAGLIDADG α-helices, the extreme sequence variation makes it difficult to build robust alignments and infer functional information (13–15). Moreover, the monomeric and dimeric LHEs likely evolved under different functional constraints, and the phylogenetic signal and functional information for either form is diluted in alignments that include both the monomeric and dimeric LHEs. Because LHEs are currently under investigation for use as genome-editing agents (16–18), a greater understanding of their functional constraints would aid in engineering studies.
Here, we take advantage of high-quality structure-guided multiple sequence alignments of single-chain LHEs to predict coevolving networks using methods based on mutual information (5, 19, 20). Strikingly, the network with the strongest predictive scores included the metal-binding catalytic residues and adjacent noncatalytic residues that lie on opposite LAGLIDADG α-helices. The coevolving network was experimentally confirmed by functional analyses showing that catalytic activity could be modulated over an ∼100-fold range by mutation of either catalytic or noncatalytic residues in the network. Our results show that maintaining integrity of coevolving networks of catalytic and noncatalytic residues is an important consideration when engineering LHEs and may also be applicable to engineering of other enzyme families.
Results and Discussion
Identification of a Coevolving Network of Catalytic and Noncatalytic Residues.
We noted that the occurrence of two different residues, Asp (D) and Glu (E), at the catalytic metal-binding sites of the single-chain LHEs is an unusual feature for a protein family and might influence enzyme activity. In I-LtrI, the metal-binding residues correspond to positions E29 and E184 of the two LAGLIDADG α-helices (I-LtrI numbering is used throughout this article) (21). The evolutionary distribution of these residues was examined with an alignment of 178 single-chain LHE protein sequences, generated by using Cn3D (22) to develop a structure-based guide alignment followed by LoCo (19) to identify and correct systematically misaligned segments. The resulting alignment was hand-curated to remove all sequences lacking acidic residues at the catalytic positions (19) (Fig. S1). A maximum-likelihood phylogenetic tree was derived from the full alignment by PhyML (23), and unexpectedly, the distribution of the catalytic residues fell into four distinct groups: A, B, C, and D (Fig. 1A and Fig. S2). We considered the possibility that the groupings reflected an evolutionary and functional interaction between the metal-binding residues and other residues in the LAGLIDADG α-helices. Sequence covariation analysis of the alignment showed that LAGLIDADG α-helices 1 and 2 were the most strongly covarying segments of the protein family by Zpx scores (20). The highest covariation score was between the positions immediately preceding the metal-binding residues (A28 and G183 of I-LtrI) (Fig. 1 B and C and Dataset S1). Surprisingly, the positions corresponding to the metal-binding residues (29 and 184) were also identified in this analysis (Dataset S1).
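Zpx scores (20) start from MI values that are corrected for background and then normalized per position. The sketch below is a simplified stand-in, not the published Zpx calculation: it applies the average product correction (MIp, in which MI(a, b) is reduced by the product of the two columns' mean MI values divided by the overall mean MI) and then Z-scores every MIp value against the other values in its row. The function names and the toy alignment are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

import numpy as np


def mutual_information(col_a, col_b):
    n = len(col_a)
    p_a, p_b, p_ab = Counter(col_a), Counter(col_b), Counter(zip(col_a, col_b))
    return sum((c / n) * math.log((c / n) / ((p_a[x] / n) * (p_b[y] / n)))
               for (x, y), c in p_ab.items())


def mip_matrix(alignment):
    """APC-corrected MI: MIp(a, b) = MI(a, b) - MI(a, .) * MI(b, .) / mean MI.

    `alignment` is a list of equal-length aligned sequences.
    """
    columns = ["".join(seq[i] for seq in alignment) for i in range(len(alignment[0]))]
    L = len(columns)
    mi = np.zeros((L, L))
    for i, j in combinations(range(L), 2):
        mi[i, j] = mi[j, i] = mutual_information(columns[i], columns[j])
    off_diag = ~np.eye(L, dtype=bool)
    mean_per_col = mi.sum(axis=1) / (L - 1)  # mean MI of each column with all others
    mean_all = mi[off_diag].mean()           # overall mean MI
    if mean_all == 0:
        return np.zeros_like(mi)             # no covariation anywhere
    mip = mi - np.outer(mean_per_col, mean_per_col) / mean_all
    mip[~off_diag] = 0.0
    return mip


def row_zscores(mip):
    """Z-score each MIp value against the other values in its row (position)."""
    z = np.zeros_like(mip)
    for i in range(mip.shape[0]):
        others = np.delete(mip[i], i)
        sd = others.std()
        if sd > 0:
            z[i] = (mip[i] - others.mean()) / sd
    np.fill_diagonal(z, 0.0)
    return z


# Toy alignment: columns 0 and 2 covary; columns 1 and 3 vary independently.
alignment = ["ADGS", "ADGT", "AEGS", "AEGT", "GDAS", "GDAT", "GEAS", "GEAT"]
mip = mip_matrix(alignment)
i, j = np.unravel_index(np.argmax(mip), mip.shape)
print(int(i), int(j))  # 0 2
print(row_zscores(mip).round(2))
```

The published Zpx statistic combines per-position normalizations in its own specific way (20); this sketch only shows why background correction and normalization are applied before ranking pairs.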
Fig. 1.

A coevolving network in LHEs involving catalytic and noncatalytic residues. (A) A cladogram of single-chain LHEs generated from an unrooted maximum-likelihood tree made by PhyML. The outer ring is colored according to the identity of the metal-binding residues at positions 29 and 184, and the branches leading to the tips are colored according to the identity of residues at positions 28 and 183, as indicated. Each class is colored by the residue 28_183 combination at the deepest branch, with no assumptions made regarding an ancestral state. The locations of the I-LtrI, I-OnuI and I-HjeMI endonucleases are denoted by triangles containing the first letter of the endonuclease. (B) Circos plot (44) illustrating covariation scores between residues of LHEs, mapped onto I-LtrI sequence. The outer ring shows I-LtrI residue number and identity for positions within the LHE alignment. The next ring shows amino acid conservation using a heat map (with red indicating “most conserved” and blue indicating “least conserved”) and bar plot. The internal ring shows covariation scores using bar plots, with height of the bar proportional to the covariation score. Red lines connect positions with the highest covariation scores, black lines connect positions with intermediate scores, and gray lines connect positions with the lowest scores. (C) Covarying positions A28 (yellow), G183 (teal), and E29 and E184 (red) mapped onto the structure of I-LtrI (blue, Protein Data Bank 3R7P) in the presence of DNA substrate. Magnesium cofactors (violet) are shown as dotted spheres. (D) Heat map of residue combinations in the LHE sequence alignment for positions 28, 29, 183, and 184 plotted on a log2 scale. Identities of the metal-binding residue at positions 29 and 184 are on the x axis, and identities of residues at positions 28 and 183 are on the y axis. Combinations of residues at positions 28 and 183 that were observed once in the LHE alignment (G_T, G_R, and G_D) are not plotted. 
ND, residue combination not detected in alignment.
Based on these observations, we tested our initial hypothesis that the group-specific distribution of the metal-binding residues was influenced by a covarying network composed of these four positions (28, 29, 183, and 184 in I-LtrI). The first test examined the frequency of all quartets observed in the LHE alignment and their distribution on the LHE phylogenetic tree (Fig. 1 A and D and Table S1). There was a strong association between the phylogenetic position of the metal-binding residues and that of the adjacent residues, and between the identity of the metal-binding residues and the identity of the residues permitted at the adjacent positions. In particular, group A LHEs, including I-LtrI, consisted largely of proteins with E_D and E_E metal-binding residues, with A_G or G_A combinations favored at the adjacent positions 28 and 183. (Throughout this article, the first metal-binding residue in a pair is orthologous to I-LtrI position 29 and the second to position 184, and likewise the first and second noncatalytic residues to positions 28 and 183; full quartets are written 28_183:29_184, for example A_G:E_E for wild-type I-LtrI.) In contrast, group B and C LHEs possessed D_E and D_D metal-binding residues and a much more diverse set of permitted residues at positions 28 and 183 (Fig. 1A). However, statistical rigor is difficult to achieve from such analyses because of the small number of nonindependent sequences sampled.
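Tabulating quartets as in Fig. 1D amounts to reading four alignment columns per sequence and counting combinations. In this Python sketch the mapping from I-LtrI residue numbers to 0-based alignment columns (`cols`) is a hypothetical input, and noncatalytic pairs observed only once are dropped, mirroring the exclusion of G_T, G_R, and G_D from Fig. 1D.

```python
import math
from collections import Counter


def quartet_counts(alignment, cols):
    """Count (28_183, 29_184) residue combinations across aligned sequences.

    `cols` maps I-LtrI positions 28, 29, 183, and 184 to 0-based alignment
    columns; this mapping is a hypothetical input supplied by the user.
    """
    counts = Counter()
    for seq in alignment:
        noncatalytic = f"{seq[cols[28]]}_{seq[cols[183]]}"
        catalytic = f"{seq[cols[29]]}_{seq[cols[184]]}"
        counts[(noncatalytic, catalytic)] += 1
    return counts


def log2_heatmap(counts, min_pair_count=2):
    """log2(count) per heat-map cell; 28_183 pairs seen fewer than
    `min_pair_count` times are dropped. Missing cells correspond to ND."""
    pair_totals = Counter()
    for (noncatalytic, _), c in counts.items():
        pair_totals[noncatalytic] += c
    return {
        key: math.log2(c)
        for key, c in counts.items()
        if pair_totals[key[0]] >= min_pair_count
    }


# Hypothetical four-column alignment holding only positions 28, 29, 183, 184.
cols = {28: 0, 29: 1, 183: 2, 184: 3}
alignment = ["AEGE", "AEGE", "AEGD", "GEAD", "GDTE"]
counts = quartet_counts(alignment, cols)
print(log2_heatmap(counts))  # G_A and G_T occur once each, so they are dropped
```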
Saturating Mutagenesis of the Coevolving Network Recapitulates Phylogenetic Patterns.
To determine whether the phylogenetic patterns observed at the predicted coevolving positions reflect functional constraints, we performed an unbiased functional screen that selects for active LHE variants (Fig. 2A). Positions 28 and 183 were each randomized to all 20 amino acids in the context of the I-LtrI backbone, whereas the metal-binding residues at positions 29 and 184 were each held at either D or E, giving a library of 1,600 (20 × 20 × 2 × 2) possible I-LtrI variants (Fig. 2A). This library was screened for enzymatic activity in a liquid culture assay in which an active endonuclease (expressed from pEndo-Ltr) cleaves an I-LtrI recognition site on a plasmid (pTox-Ltr) encoding a bacteriostatic DNA gyrase toxin gene, thereby eliminating the toxin-carrying plasmid (24). The rate at which I-LtrI variants destroy pTox-Ltr correlates with their ability to bind and cleave the recognition site. Loss of pTox-Ltr is reflected as an increase in the relative abundance of cells carrying active I-LtrI variants over cells carrying weak or inactive variants (Fig. 2B).
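The library complexity follows directly from the design: 20 choices at each of positions 28 and 183 and 2 choices (D or E) at each of positions 29 and 184. The Python sketch below enumerates the protein-level quartets using labels written as 28_183:29_184 (for example, A_G:E_E); codon-level diversity depends on the randomization scheme, which is not specified here.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids
METAL_BINDING = "DE"                  # positions 29 and 184 held at Asp or Glu

# Labels follow the 28_183:29_184 convention, e.g. "A_G:E_E" for wild-type I-LtrI.
library = [
    f"{r28}_{r183}:{r29}_{r184}"
    for r28, r183, r29, r184 in product(AMINO_ACIDS, AMINO_ACIDS, METAL_BINDING, METAL_BINDING)
]
print(len(library))  # 20 * 20 * 2 * 2 = 1600
```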
Fig. 2.
Competition growth experiments reveal network preferences. (A) Schematic of the liquid competition growth experiment and sequencing strategy used to screen an I-LtrI mutant library. (B) Heat map containing the log2 effect size (25) for quartet pairs with a false discovery rate of less than 0.05. The metal-binding residue identities for 29 and 184 are found along the x axis, and the identities of residues 28 and 183 are found along the y axis. Relative abundance values for all 1,600 possible quartets are found in Fig. S3.
We performed seven independent transformations of the pEndo-Ltr library into pTox-Ltr–competent cells, harvested total plasmid DNA from selective and nonselective conditions after 16 h of outgrowth, and examined the mutated positions by Illumina paired-end sequencing. Because the strategy captured all four positions in paired-end sequencing reads, it was possible to determine the relative enrichment of all 1,600 quartet combinations in the selected condition versus the unselected condition (25, 26) (Fig. S3). As shown in Fig. 2B, I-LtrI variants with small side-chain residues at positions 28 and 183 were strikingly enriched relative to other combinations, indicating that residues at these positions do not assort randomly. In particular, combinations of A and G were favored, suggesting that steric restrictions at the base of the LAGLIDADG α-helices surrounding the active site influence residue identity. Residue combinations at positions 28 and 183 that occurred at low frequency in the LHE alignment, or that were not observed, were also enriched (V_G, T_G, S_G), confirming that the frequency of residues in the current LHE alignment reflects an ascertainment bias. Variants with D_E or D_D combinations at positions 29 and 184 within the context of the I-LtrI backbone were underrepresented relative to I-LtrI variants with E_D or E_E combinations. These observations strongly parallel the preference of the metal-binding residues on the LHE phylogeny (Fig. 1A), supporting the hypothesis that a covarying network of residues influences the amino acid composition of these positions. We conclude that the distribution of metal-binding residues within related LHEs is likely constrained by these context-dependent interactions.
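Counting quartets from paired-end reads can be sketched as follows. The study used a custom Perl script whose details are not given, so this Python version is only an illustration: it assumes that read 1 covers codons 28 and 29 starting at a known offset (`off28`) and that the reverse complement of read 2 covers codons 183 and 184 starting at a known offset (`off183`). Both offsets and all function names are hypothetical, and no quality filtering is applied. Read pairs without D or E at positions 29 and 184 are discarded, mirroring the parsing step described in Materials and Methods.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (first base varies slowest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(seq):
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def quartet_from_pair(read1, read2, off28, off183):
    """Return a 28_183:29_184 label, or None if 29/184 are not D or E.

    Hypothetical layout: codons 28-29 start at `off28` in read1, and codons
    183-184 start at `off183` in the reverse complement of read2.
    """
    r2 = reverse_complement(read2)
    codons = [read1[off28:off28 + 3], read1[off28 + 3:off28 + 6],
              r2[off183:off183 + 3], r2[off183 + 3:off183 + 6]]
    a28, a29, a183, a184 = (CODON_TABLE.get(c, "X") for c in codons)
    if a29 not in ("D", "E") or a184 not in ("D", "E"):
        return None
    return f"{a28}_{a183}:{a29}_{a184}"


def count_quartets(read_pairs, off28, off183):
    """Tally quartet labels over (read1, read2) pairs, skipping rejected pairs."""
    counts = Counter()
    for read1, read2 in read_pairs:
        label = quartet_from_pair(read1, read2, off28, off183)
        if label is not None:
            counts[label] += 1
    return counts


# Two toy read pairs; the second has AAA (Lys) at codon 29 and is rejected.
pairs = [("NNGCTGAANN", "CTCGCC"), ("NNGCTAAANN", "CTCGCC")]
print(count_quartets(pairs, off28=2, off183=0))  # Counter({'A_G:E_E': 1})
```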
We performed a similar experiment with the LHE I-HjeMI to extend these observations to other LHEs. I-HjeMI is a representative of the group B LHEs, which the phylogenetic analysis predicts to prefer D_E at the catalytic residues (positions 18 and 150 of I-HjeMI) (Fig. 1D). We found that variants with D_D or D_E at the catalytic positions were strongly enriched over variants with E_D or E_E (Fig. S4). This result is in stark contrast to that observed with I-LtrI, where E_D or E_E catalytic residues were highly enriched (Fig. 2B). At the I-HjeMI noncatalytic positions (residues 17 and 149), the wild-type G_A pairing was not the preferred pairing; rather, variants with an A or G at position 17 tolerated a wide range of residues at position 149, in agreement with the observed diversity of residues at these positions in the LHE alignment (Fig. 1D).
I-LtrI Variants with Suboptimal Networks Grow Slowly or Not at All.
To investigate the basis for the low abundance of D_D and D_E variants in the I-LtrI liquid competition experiments, we tested individual I-LtrI network variants in a bacterial two-plasmid survival assay on solid media (24) and found that many variants with suboptimal combinations at positions 28_183 and 29_184 survived poorly or not at all (Fig. 3A, Left; Dataset S2). Similar results were obtained when the same mutations were made in I-OnuI (Fig. 3A, Right; Dataset S2), an LHE with ∼30% sequence identity to I-LtrI. Variants of both I-OnuI and I-LtrI with G_A or A_G at positions 28_183 in combination with D_D or D_E at the metal-binding positions survived, but formed smaller colonies than E_D or E_E variants (Fig. S5). This phenotype was supported by liquid growth experiments that measured the time individual I-LtrI variants took to reach midlog phase (A600 = 0.35) (Fig. 3B and Fig. S5). Notably, D_D or D_E metal-binding residue variants all took longer to reach midlog phase than E_E or E_D variants with the same residues at positions 28_183. Together with the I-HjeMI experiment, these data show that interactions between the metal-binding and adjacent residues generalize to other LHE sequence backbones and that the network of coevolving residues influences both survival and growth rate.
Fig. 3.
Growth effects of mutations in the covarying network. (A) Heat map depicting the log2 survival for individual I-LtrI (Left) and I-OnuI (Right) mutants in a bacterial two-plasmid selection. Identities of the metal-binding residues at positions 29 and 184 are on the x axis, and identities of residues at positions 28 and 183 are on the y axis. ND, not determined. (B) Boxplots of time (min) for I-LtrI variants to reach midlog growth (A600 = 0.35) in selective media, with individual data points shown as dots. I-LtrI variants are indicated on the x axis, with colors indicating variants that were analyzed kinetically.
Coevolving Network Modulates Enzymatic Activity.
It is possible that the low enrichment and low survival in the liquid competition and plate experiments were due to structural defects in the network variants. We tested this possibility first by Rosetta modeling of sequence variants on the I-LtrI backbone and found that the range of Rosetta energy units predicted for the wild-type sequence with different initial random seeds encompassed the values observed for the variants (Fig. 4A, Center). Thus, there was no obvious difference in predicted Rosetta energy units between I-LtrI and any of the variants (Fig. 4). These predictions were supported by differential scanning calorimetry of selected purified I-LtrI and I-OnuI variants, which showed that the melting temperatures and enthalpies of denaturation were not obviously different for wild-type enzymes and network variants (Fig. 4 and Table S2). Intriguingly, the A_A:E_E enzyme displayed a much higher melting temperature than the other variants. Many enzymes are thought to exhibit full activity only within a narrow band of optimal stabilities (27); thus, we suggest that the A_A:E_E variant may be locked into a conformation that is incompatible with efficient catalysis.
Fig. 4.
The coevolving network impacts catalysis, not structure. (A) Plot of experimentally determined melting temperatures (temperature in °C, open circles) and Rosetta energy units (black triangles). (Left) Experimentally determined temperature and predicted Rosetta energy units for the indicated proteins. (Center) The range of Rosetta energy units for 20 predictions performed using the wild-type I-LtrI structure. (Right) The Rosetta energy units relaxed with minor constraints for the indicated I-LtrI variants. (B) Log10 plot of kcat* (nM/min) on the x axis versus KM* (nM) on the y axis for the indicated I-LtrI variants. Three replicates for each variant are shown as open circles colored according to quartet combination. (C) Log10 plot of catalytic efficiency versus time to midlog phase for the indicated mutants. Fit of the data to a quadratic regression model is shown by a black line with the 95% confidence interval as a gray-shaded area.
To test the hypothesis that the covarying network modulates the catalytic efficiency of LHE enzymes, the pseudo-Michaelis–Menten parameters (28, 29) kcat* and KM* were determined under single-turnover reaction conditions for a number of I-LtrI variants and compared with those of the wild-type enzyme (Fig. 4 B and C and Fig. S6). For example, a single change at position 183 (G183A) to create the suboptimal A_A:E_E network resulted in an ∼65-fold decrease in kcat*, whereas a single change at position 28 (A28G) to create G_G:E_E generated an enzyme with an ∼3.5-fold lower KM* than the wild-type enzyme (Fig. 4B). Interestingly, the G_G combination is infrequently found in the LHE alignment (Fig. 1D) and exhibits slower growth in liquid selections than A_G variants (Fig. 2B and Fig. 3B), suggesting a penalty for highly efficient enzymes in a cellular context. Plotting catalytic efficiency (kcat*/KM*) versus time to midlog phase in liquid culture for I-LtrI network variants revealed a striking correlation (Fig. 4C and Fig. S1), suggesting that changes in enzymatic efficiency are sufficient to explain the differential growth phenotypes (Fig. 2B and Fig. 3) and the class specificity observed (Fig. 1A). Many of the I-HjeMI variants that are more active than wild type (Fig. S4) may likewise be less fit in their normal context.
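The pseudo-Michaelis–Menten analysis can be sketched as a nonlinear least-squares fit. This is an assumption about the fitting procedure, not the authors' code: it models the observed single-turnover rate as k_obs = kcat*·[E]/(KM* + [E]) with enzyme in excess over substrate, fits it with SciPy's `curve_fit`, and computes catalytic efficiency as kcat*/KM*. A quadratic fit of log10 efficiency against time to midlog phase, analogous to the regression line in Fig. 4C, is also included; treating log10 efficiency as the response variable is an assumption. All data in the example are synthetic, and rates carry whatever units the input data use.

```python
import numpy as np
from scipy.optimize import curve_fit


def observed_rate(enzyme_nM, kcat_star, km_star):
    """Hyperbolic dependence of the single-turnover rate on enzyme concentration."""
    return kcat_star * enzyme_nM / (km_star + enzyme_nM)


def fit_pseudo_mm(enzyme_nM, k_obs):
    """Least-squares fit of kcat* and KM*; returns estimates and standard errors."""
    enzyme_nM = np.asarray(enzyme_nM, dtype=float)
    k_obs = np.asarray(k_obs, dtype=float)
    p0 = [k_obs.max(), np.median(enzyme_nM)]  # rough starting guesses
    params, cov = curve_fit(observed_rate, enzyme_nM, k_obs, p0=p0)
    return params[0], params[1], np.sqrt(np.diag(cov))


def quadratic_trend(time_to_midlog_min, efficiency):
    """Quadratic fit of log10(kcat*/KM*) against time to midlog phase."""
    coeffs = np.polyfit(time_to_midlog_min, np.log10(efficiency), deg=2)
    return np.poly1d(coeffs)


# Hypothetical noise-free data for one variant (kcat* = 2.0, KM* = 50 nM).
enzyme = np.array([5, 10, 20, 40, 80, 160, 320], dtype=float)
kcat_star, km_star, stderr = fit_pseudo_mm(enzyme, observed_rate(enzyme, 2.0, 50.0))
efficiency = kcat_star / km_star  # catalytic efficiency, kcat*/KM*
print(round(kcat_star, 3), round(km_star, 3), round(efficiency, 4))  # close to 2.0, 50.0, 0.04
```

Fold changes such as the ∼65-fold decrease in kcat* for G183A are ratios of the wild-type estimate to the variant estimate (kcat*,WT / kcat*,variant).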
Suppressor Mutations in the Coevolving Network Can Rescue Suboptimal Variants.
The S_G:E_E variant displays both slow growth and low survival in the plate assay, even after an extended outgrowth period (Fig. 5), and was thus a logical candidate for a suppressor screen. To determine whether poorly active variants could be rescued by mutations in the coevolving network, we randomly mutagenized the entire S_G:E_E variant gene by PCR and selected for functional variants. The mutated S_G:E_E library was subjected to two rounds of selection using the two-plasmid survival assay, over which the average survival of the library increased from ∼10 to ∼95% (Fig. 5). DNA sequencing identified 18 unique clones, 5 of which contained an E184D mutation (Fig. 5). This change of the second metal-binding residue from E to D, which converts S_G:E_E to S_G:E_D, restored kcat* and KM* to wild-type levels (Fig. 5) and resulted in more rapid growth (Fig. 2B and Fig. 3B). Mutations at other positions were also isolated in the S_G:E_E suppressors, in particular Y2, Q11, and A12 in the N-terminal tail of I-LtrI. We suggest that these N-terminal mutations may have been selected for increased expression of the suppressors, consistent with a recent study that revealed a relationship between codon use in the N-terminal region and protein expression in Escherichia coli (30).
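Identifying suppressor substitutions such as E184D amounts to comparing each sequenced clone with a reference protein sequence and tallying the differences. The helper names below are hypothetical. The sketch assumes ungapped, equal-length protein sequences and leaves the residue-numbering offset as a parameter, because the expressed constructs carry an added N-terminal Met-Gly (see Materials and Methods) that may or may not be counted.

```python
from collections import Counter


def substitutions(reference, clone, first_residue=1):
    """List substitutions such as 'E184D' between two aligned protein sequences.

    `first_residue` is the number assigned to reference[0]; set it to match the
    numbering convention in use (an assumption left to the user).
    """
    if len(reference) != len(clone):
        raise ValueError("sequences must be the same length (indels are not handled)")
    return [
        f"{ref}{pos}{alt}"
        for pos, (ref, alt) in enumerate(zip(reference, clone), start=first_residue)
        if ref != alt
    ]


def tally(reference, clones, first_residue=1):
    """Count how many clones carry each substitution."""
    counts = Counter()
    for clone in clones:
        counts.update(substitutions(reference, clone, first_residue))
    return counts


# Hypothetical five-residue fragment and three sequenced clones.
reference = "MAEGE"
clones = ["MSEGE", "MSEGD", "MSEGD"]
print(tally(reference, clones))  # Counter({'A2S': 3, 'E5D': 2})
```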
Fig. 5.
A suppressor screen identifies residues in the coevolving network. (Left) Boxplot of survival of the S_G:E_E I-LtrI variant in the two-plasmid genetic selection with 1- or 4-h outgrowth. Red dots indicate data points for different experimental replicates. (Right) Survival of a randomized S_G:E_E library in the two-plasmid genetic selection over two successive rounds of selection with 1-h outgrowths. Genotypes of individual clones sequenced after two rounds of selection. All clones possessed the parental A28S mutation, and clones with the E184D mutation are highlighted in yellow.
Implications for Engineering of LHEs.
Our study identified a number of coevolving networks in single-chain LHEs, the highest scoring of which involved the catalytic and adjacent residues in the LAGLIDADG α-helices. Because catalytic positions in sequence families are often invariant, these positions are refractory to computational methods designed to identify coevolving networks. However, because the acidic metal-binding residues in the LAGLIDADG family exhibit variation, we were able to identify and experimentally validate a coevolving network that included the two acidic metal-binding residues, E29 and E184, and the two adjacent residues, A28 and G183. The importance of these residues to LHE structure and function has not gone unnoted. Silva and Belfort identified the A28 and G183 positions as part of a GxxxG network in LAGLIDADG helices (14) and suggested that residue combinations at these positions were influenced by packing constraints at the base of the helices. Similarly, random mutagenesis screens for increased activity of I-CreI derivatives identified the position equivalent to A28 (G19 of I-CreI) in dimeric and monomeric I-CreI derivatives (17, 31). Likewise, up-activity mutants of I-AniI included randomly selected E-to-D substitutions at the position equivalent to E184 (32). In all cases, the coevolutionary context of the residues involved was not appreciated. Computational modeling using Rosetta and biophysical analyses revealed that I-LtrI and I-OnuI network variants do not exhibit structural defects, whereas our kinetic analyses show substantial effects on kcat* and KM*. The observation that mutation of these positions does not cause structural defects is somewhat surprising, given their location in the core of the molecule and their proximity to the active site.
It is commonly thought that the maintenance of protein stability plays a key role in limiting the rate of protein sequence evolution (27, 33); however, the generally low expression of these proteins is likely to mitigate this effect greatly (34). Thus, the coevolving network appears to act directly on catalytic function, possibly because mutations in the network subtly alter the positioning of the acidic metal-binding residues relative to the DNA substrate. Such subtle differences in the positioning of residues in the network are unlikely to be readily detectable in I-LtrI, I-OnuI, or I-HjeMI variants by structural studies or computational modeling.
The observation that the coevolving network we studied lies across the LAGLIDADG interface has obvious implications for engineering of chimeric LHEs. A common strategy for engineering LHEs with altered specificity is to fuse two halves (or domains) from different LHEs, each with different DNA-binding specificity (14, 16, 35–38). This strategy has been successful, particularly when the N- and C-terminal domains are derived from closely related LHEs. Our data indicate that fusion of more distantly related domains may require subsequent fine-tuning, as suggested by the contrasting preferences for acidic catalytic residues in I-HjeMI and I-LtrI and by the observation that some variants are more active than wild type. Our data show that the identity of the acidic residue in the first LAGLIDADG helix (position 29 of I-LtrI) is the major determinant of transitions between residue combinations (Fig. 6); such transitions would require multiple substitutions and pass through inactive or hyperactive intermediates that are unlikely to be tolerated in a cellular environment. Moreover, the greater diversity of noncatalytic residues that produce highly enriched variants in the I-HjeMI backbone also supports the hypothesis that chimeric enzymes created by fusion of distantly related domains will require optimization of the noncatalytic positions.
Fig. 6.
A network of coevolving residues creates a barrier to active-site evolution in LHEs. The wild-type quartets of residues in the coevolving networks of I-LtrI and I-HjeMI are boxed, and the most frequently observed combinations of residues in the other sequence types shown in Fig. 1D are connected by green or red lines and arrows. Green lines indicate observed, permissive transitions between quartets, and dashed red lines indicate infrequently observed, nonpermissive transitions; the number of amino acid changes required for each transition is indicated by a circled number.
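Counting the amino acid changes between two quartets, and listing the single-substitution intermediates that any transition must pass through, is a small combinatorial exercise. In this Python sketch quartets use 28_183:29_184 labels (A_G:E_E is wild-type I-LtrI); the target quartet G_A:D_E is purely illustrative, and whether a given intermediate is tolerated would have to come from experimental data such as those in Figs. 2 and 3. Function names are hypothetical.

```python
from itertools import permutations


def parse_quartet(label):
    """'A_G:E_E' -> (res28, res29, res183, res184)."""
    noncatalytic, catalytic = label.split(":")
    r28, r183 = noncatalytic.split("_")
    r29, r184 = catalytic.split("_")
    return (r28, r29, r183, r184)


def format_quartet(residues):
    r28, r29, r183, r184 = residues
    return f"{r28}_{r183}:{r29}_{r184}"


def n_changes(a, b):
    """Number of amino acid substitutions separating two quartets."""
    return sum(x != y for x, y in zip(parse_quartet(a), parse_quartet(b)))


def shortest_paths(a, b):
    """All minimal-length, single-substitution paths from quartet a to quartet b."""
    start, end = list(parse_quartet(a)), parse_quartet(b)
    differing = [i for i in range(4) if start[i] != end[i]]
    paths = []
    for order in permutations(differing):
        current = start.copy()
        path = [format_quartet(current)]
        for i in order:
            current[i] = end[i]
            path.append(format_quartet(current))
        paths.append(path)
    return paths


print(n_changes("A_G:E_E", "G_A:D_E"))  # 3
for path in shortest_paths("A_G:E_E", "G_A:D_E"):
    print(" -> ".join(path))
```

With k differing positions there are k! orderings, so a three-change transition can proceed through six distinct routes, each passing through two intermediate quartets.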
Conclusions.
Our data support the existence of a network of coevolving residues in LHEs, which can be manipulated to modulate catalytic efficiency over an ∼100-fold range. Efforts to re-engineer LHEs for genome-editing applications have met with variable success, and we suggest that this is due in part to such efforts inadvertently creating a suboptimal complement of residues within the coevolutionary network that we have identified here. Our analyses predicted additional coevolving networks in LHEs, experimental validation of which may be required to improve the success of re-engineering efforts. More generally, our results suggest that robust, structure-guided alignments will facilitate the identification of coevolving catalytic and noncatalytic residues in other protein families, and experimental validation of such networks would inform enzyme engineering.
Materials and Methods
Oligonucleotides and Plasmids.
All oligonucleotides used in this study were synthesized by Integrated DNA Technologies, Inc. Plasmid DNA was isolated from E. coli cultures grown in Luria Broth (LB) with an EZ-10 Spin Column Plasmid DNA Kit (Bio Basic Inc.) according to the manufacturer’s protocol. A cesium chloride (CsCl) gradient was used to obtain supercoiled plasmid DNA from large-scale cultures, and this DNA was used as substrate for all kinetic analyses. Wild-type I-LtrI and I-OnuI encoding genes (codon-optimized for E. coli) were cloned between the NcoI and NotI sites of plasmid pEndo with a single methionine and glycine sequence added to the N termini. Point mutations were then incorporated into I-LtrI and I-OnuI using a QuikChange II Site-Directed Mutagenesis Kit (Stratagene) to create a variety of amino acid combinations at the coevolving positions, and the mutations were confirmed by sequencing.
Bacterial Two-Plasmid Genetic Selection.
A bacterial two-plasmid genetic selection was used to screen the activity of all LHE variants used in this study, as previously described (39). For liquid media selections, 10 ng of LHE variant (in pEndo) was transformed into 50 μL of competent NovaXGF′ (Novagen) cells harboring the appropriate pTox plasmid. Transformants were allowed to recover in 300 μL of 2× YT medium (16 g/L tryptone, 10 g/L yeast extract, and 5 g/L NaCl) at 37 °C with shaking at 200 rpm for 30 min. Experiments with I-HjeMI included a 1-h expression period in 2× YT supplemented with 0.02% arabinose and 100 μg/mL carbenicillin. Following the recovery and expression periods, both I-LtrI and I-HjeMI cultures were diluted 200-fold into either nonselective [1× M9 salt, 0.8% wt/vol tryptone, 1% vol/vol glycerol, 1 mM MgSO4, 1 mM CaCl2, 0.2% wt/vol thiamine, 100 μg/mL carbenicillin, and 0.02% (wt/vol) l-glucose] or selective media [nonselective media lacking glucose with the addition of 0.02% (wt/vol) l-arabinose and 0.1 mM isopropyl β-d-1-thiogalactopyranoside]. Cultures were then grown at 37 °C and monitored until cell turbidity reached ∼0.7 at 600 nm.
For solid media selections, 50 ng of LHE variant (in pEndo) was transformed into 50 μL of NovaXGF′ (Novagen) cells harboring the appropriate pTox plasmid. Transformants were allowed to recover in 300 μL of 2× YT medium for 10 min, followed by the addition of 2 mL of 2× YT medium supplemented with 100 μg/mL carbenicillin and 0.02% l-arabinose. Cultures were then grown at 37 °C for 1 or 4 h of outgrowth, harvested, and resuspended in sterile saline (0.9% wt/vol NaCl), and dilutions were spread onto nonselective and selective agar plates. Plates were incubated at 37 °C for 16–24 h, and the survival percentage was calculated as the ratio of colonies on selective to nonselective plates. Three biological replicates with two technical replicates per selection were performed. An I-LtrI catalytic mutant (E29A) was used as a negative control for all selections.
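The survival calculation can be sketched in Python as below. It assumes equal volumes were plated on selective and nonselective plates, so that colony counts multiplied by dilution factors give comparable colony-forming units (CFU). The replicate summary (averaging technical duplicates before taking the mean and SD across biological replicates) and the log2 transform of the surviving fraction for a heat map like Fig. 3A are assumptions rather than the authors' stated procedure.

```python
import math
import statistics


def survival_percent(selective_colonies, selective_dilution,
                     nonselective_colonies, nonselective_dilution):
    """Percent survival = CFU on selective plates / CFU on nonselective plates.

    Dilution factors convert colony counts to CFU; equal plated volumes are assumed.
    """
    selective_cfu = selective_colonies * selective_dilution
    nonselective_cfu = nonselective_colonies * nonselective_dilution
    return 100.0 * selective_cfu / nonselective_cfu


def summarize(replicates):
    """Average technical replicates within each biological replicate, then
    return the mean and sample SD across biological replicates."""
    per_biological = [statistics.mean(technical) for technical in replicates]
    return statistics.mean(per_biological), statistics.stdev(per_biological)


print(survival_percent(45, 10, 50, 100))  # 9.0
# Three biological replicates, two technical replicates each (hypothetical values).
mean, sd = summarize([[10.0, 12.0], [8.0, 10.0], [9.0, 11.0]])
print(mean, sd, math.log2(mean / 100))  # log2 of the surviving fraction
```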
Mutant Library Synthesis and Screening.
The I-LtrI and I-HjeMI quartet libraries were constructed by randomizing the noncatalytic positions of I-LtrI (28 and 183) and I-HjeMI (17 and 149) to all 20 amino acids and holding the catalytic positions (29 and 184 of I-LtrI and 18 and 150 of I-HjeMI) at either Asp or Glu (GenScript). Each library was screened for activity using two-plasmid bacterial selection in liquid media followed by paired-end sequencing on the Illumina MiSeq platform at the London Regional Genomics Centre (London, ON). Reads were parsed for the presence of D or E at positions 29 and 184, and the number of reads in all possible quartets was identified using a custom Perl script. The significance of the proportional abundance of each quartet in the selected versus nonselected condition was determined using the ANOVA-like differential analysis method (25, 26). We generated 128 Dirichlet Monte-Carlo instances of the selected and nonselected datasets, performed t tests on each instance, and estimated the associated false discovery rate for each Dirichlet instance using the lfdrtool (25). Values reported and plotted are effect sizes calculated for those variant quartets that had estimated false discovery rates of 0.05 or less (25). Data were plotted using R (40) and the ggplot2 package (41).
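The ANOVA-like differential analysis (25, 26) can be approximated by the Python sketch below; it is a simplified illustration and not the ALDEx2 implementation. For each of 128 Monte-Carlo instances, each replicate's read counts plus a 0.5 pseudocount (an assumed prior) parameterize a Dirichlet draw of quartet proportions, the proportions are centered-log-ratio (CLR) transformed, and a Welch t test compares selected with unselected replicates for each quartet. Two substitutions are labeled plainly: Benjamini–Hochberg adjusted p-values stand in for the local false discovery rates estimated in the study, and the effect reported here is the mean CLR difference rather than ALDEx2's standardized effect size.

```python
import numpy as np
from scipy import stats


def clr(x):
    """Centered log-ratio (log2) transform applied to each row."""
    logx = np.log2(x)
    return logx - logx.mean(axis=1, keepdims=True)


def bh_adjust(p):
    """Benjamini-Hochberg adjusted p-values (stand-in for local FDR)."""
    p = np.asarray(p, dtype=float)
    n = p.size
    order = np.argsort(p)
    ranked = p[order] * n / np.arange(1, n + 1)
    adjusted = np.minimum.accumulate(ranked[::-1])[::-1]
    out = np.empty(n)
    out[order] = np.minimum(adjusted, 1.0)
    return out


def dirichlet_clr_test(selected, unselected, n_mc=128, prior=0.5, seed=0):
    """Compare quartet abundances between conditions.

    selected, unselected: (replicates x quartets) read-count matrices.
    Returns the mean CLR difference (selected - unselected) and the mean
    BH-adjusted p-value for each quartet over `n_mc` Dirichlet instances.
    """
    rng = np.random.default_rng(seed)
    selected = np.asarray(selected, dtype=float)
    unselected = np.asarray(unselected, dtype=float)
    diffs = np.zeros((n_mc, selected.shape[1]))
    fdrs = np.zeros_like(diffs)
    for k in range(n_mc):
        # One Monte-Carlo draw of underlying proportions per replicate.
        sel = clr(np.vstack([rng.dirichlet(row + prior) for row in selected]))
        uns = clr(np.vstack([rng.dirichlet(row + prior) for row in unselected]))
        _, p = stats.ttest_ind(sel, uns, axis=0, equal_var=False)  # Welch t test
        diffs[k] = sel.mean(axis=0) - uns.mean(axis=0)
        fdrs[k] = bh_adjust(p)
    return diffs.mean(axis=0), fdrs.mean(axis=0)


# Hypothetical counts: 7 replicates x 5 quartets; quartet 0 is enriched after selection.
unselected = np.full((7, 5), 1000.0)
selected = np.tile([8000.0, 1000.0, 1000.0, 1000.0, 1000.0], (7, 1))
effect, fdr = dirichlet_clr_test(selected, unselected)
print(effect.round(2), np.nonzero(fdr <= 0.05)[0])
```

Because CLR values are relative to each sample's geometric mean, strong enrichment of one quartet shifts the values of all others downward; in the toy data, quartets 1 to 4 therefore receive negative effects even though their raw counts are unchanged.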
BioScreen Bacterial Growth Assay.
A Microbiology Reader Bioscreen C (MTX Lab Systems, Inc.) was used to measure the growth of individual I-LtrI variants in liquid culture. A total of 10 ng of I-LtrI variant DNA was transformed into 50 μL of NovaXGF′ cells (containing pTox-Ltr), and cells were allowed to recover in 400 μL of 2× YT at 37 °C for 30 min. After recovery, 4 μL of culture was added to 200 μL each of selective and nonselective medium in the wells of a 10 × 10 Honeycomb 2 plate (Oy Growth Curves Ab, Ltd.). Plates were incubated in the Bioscreen instrument at 37 °C with medium-intensity shaking, and culture turbidity was measured every 15 min for 24 h. Each I-LtrI variant was tested using four independent transformations (n = 4) in two separate growth experiments.
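The Methods do not state how the growth curves were summarized. The sketch below shows one possible summary, assumed for illustration only: the area under each turbidity curve (trapezoidal rule over the 15-min readings) and the ratio of selective to nonselective area. The optional blank subtraction and the constant OD values are hypothetical.

```python
def reading_times(interval_min=15, duration_h=24):
    """Times (h) of turbidity readings, including t = 0."""
    n = duration_h * 60 // interval_min
    return [i * interval_min / 60 for i in range(n + 1)]


def auc(times, od):
    """Trapezoidal area under an OD-versus-time curve (OD x h)."""
    return sum((t1 - t0) * (y0 + y1) / 2
               for t0, t1, y0, y1 in zip(times, times[1:], od, od[1:]))


def relative_growth(times, od_selective, od_nonselective, blank=0.0):
    """Selective/nonselective AUC ratio after subtracting an assumed
    medium-only blank reading from every point."""
    sel = auc(times, [y - blank for y in od_selective])
    non = auc(times, [y - blank for y in od_nonselective])
    return sel / non


times = reading_times()
print(len(times), times[-1])  # 97 readings spanning 0 to 24 h
```

An area-based summary integrates lag time, growth rate, and final density into a single number; other summaries (for example, maximum growth rate) could be substituted without changing the rest of the workflow.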
Protein Expression and Purification.
I-LtrI and selected variants were cloned between the NcoI and NotI sites of plasmid pProExHta (Invitrogen/Life Technologies), and the 6× histidine-tagged proteins were expressed in E. coli strain ER2566 (New England Biolabs) at 16 °C for 16 h. Cells were harvested by centrifugation at 6,000 × g for 15 min, and pellets were resuspended (40 mL per gram of cell pellet) in binding buffer [50 mM Tris⋅HCl, pH 8.0, 500 mM NaCl, 1 mM imidazole, and 10% (wt/vol) glycerol] supplemented with SIGMAFAST protease inhibitor (Sigma). Cells were lysed using an EmulsiFlex-C3 high-pressure homogenizer followed by sonication for 30 s. Lysates were cleared by centrifugation at 29,000 × g for 30 min at 4 °C, and the supernatant was loaded onto a 1-mL HiTrap column (GE Healthcare Life Sciences), washed, and eluted in six 1-mL fractions using elution buffer [50 mM Tris⋅HCl (pH 8.0), 500 mM NaCl, 500 mM imidazole, and 10% (wt/vol) glycerol]. Fractions were pooled and dialyzed for 16 h at 4 °C into 50 mM Tris⋅HCl (pH 8.0), 250 mM NaCl, 30 mM imidazole, and 10% glycerol. The N-terminal 6× histidine tags were removed by adding 6× histidine-tagged Tobacco Etch Virus (TEV) protease at a 1:25 molar ratio of TEV to LHE. The protein mixture was dialyzed into binding buffer for 4 h at 4 °C and passed over an equilibrated 1-mL HiTrap column; the flow-through was collected, dialyzed for 16 h into storage buffer [50 mM Tris⋅HCl, pH 8.0, 25 mM NaCl, 1 mM DTT, and 10% (wt/vol) glycerol], and stored at −80 °C.
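Setting up the 1:25 TEV:LHE molar ratio requires converting protein masses to moles. A minimal sketch of that arithmetic follows; the molecular weights (35 kDa for the LHE, 28 kDa for TEV) and the 10-mg LHE yield are hypothetical placeholders, not values from the text.

```python
def nanomoles(mass_mg, mw_da):
    """Convert protein mass (mg) to nmol using molecular weight in Da (g/mol)."""
    return mass_mg / mw_da * 1e6


def tev_mass_mg(lhe_mass_mg, lhe_mw_da, tev_mw_da, tev_per_lhe=1 / 25):
    """Mass of TEV protease (mg) giving the requested TEV:LHE molar ratio."""
    tev_nmol = nanomoles(lhe_mass_mg, lhe_mw_da) * tev_per_lhe
    return tev_nmol * tev_mw_da / 1e6


def resuspension_volume_ml(pellet_g, ml_per_g=40.0):
    """Binding-buffer volume for a cell pellet at 40 mL per gram."""
    return pellet_g * ml_per_g


# Hypothetical: 10 mg of a 35-kDa LHE, cleaved with a 28-kDa TEV protease.
print(f"{tev_mass_mg(10, 35_000, 28_000):.2f} mg TEV")  # 0.32 mg
print(f"{resuspension_volume_ml(2.5):.0f} mL buffer for a 2.5-g pellet")
```

Because the ratio is molar, the required TEV mass scales with the ratio of the two molecular weights, not simply with LHE mass.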
Differential Scanning Calorimetry.
Differential scanning calorimetry (DSC) was performed at the Biomolecular Interaction and Conformation Facility at Western University. Two independent DSC determinations were performed for each LHE construct (0.5 mg/mL). Samples were scanned from 10 °C to 110 °C at 60 °C/h using a MicroCal VP-Differential Scanning Calorimeter (GE Healthcare Life Sciences). Raw data were processed by nonlinear least-squares regression in Origin 7.0 (MicroCal), first subtracting a buffer–buffer reference scan and then fitting to a non–two-state transition model.
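The buffer–buffer subtraction step can be illustrated with a short sketch. This is only the preprocessing step plus a crude apparent melting temperature (the temperature of maximal excess heat capacity); it is not the non–two-state model fit performed in Origin, and the Gaussian-shaped thermogram below is synthetic.

```python
import math


def subtract_buffer(sample, buffer):
    """Point-by-point buffer-buffer reference subtraction on a shared temperature grid."""
    if len(sample) != len(buffer):
        raise ValueError("scans must share a temperature grid")
    return [s - b for s, b in zip(sample, buffer)]


def apparent_tm(temps, excess_cp):
    """Temperature of maximal excess heat capacity (a crude Tm estimate)."""
    i = max(range(len(excess_cp)), key=excess_cp.__getitem__)
    return temps[i]


# Synthetic scan from 10 to 110 °C in 0.5 °C steps with a peak near 55 °C.
temps = [10 + 0.5 * i for i in range(201)]
buffer_scan = [0.1] * len(temps)
sample_scan = [0.1 + math.exp(-((t - 55) / 3) ** 2) for t in temps]
print(apparent_tm(temps, subtract_buffer(sample_scan, buffer_scan)))
```

For multidomain or non–two-state unfolding, the peak maximum can differ from the fitted transition midpoints, which is why a model fit is used for reported values.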
In Vitro Cleavage Assays and Single-Turnover Kinetics.
One copy of the cognate I-LtrI target site (5′-AATGCTCCTATACGACGTTTAG-3′) was cloned between the AflIII and BglII sites of pLITMUS 28i (New England Biolabs), and the resulting plasmid was used as the substrate for in vitro cleavage assays. I-LtrI and variant proteins were diluted in storage buffer to 10× working concentrations. Six protein concentrations spanning a 25-fold range were assayed in a reaction mixture of 50 mM Tris⋅HCl (pH 8.0), 100 mM NaCl, 10 mM MgCl2, 1 mM DTT, and 5 nM substrate. The protein and the substrate-containing reaction mixture were preincubated separately for 5 min at 37 °C before reactions were started by adding protein. In each time course, 37 °C reactions were sampled at six time points (excluding t = 0) after protein addition. Reactions were stopped with a solution of 200 mM EDTA, 30% glycerol, 0.2% SDS, and bromophenol blue and incubated at 50 °C for 5 min. The percentage of product formed was calculated as the intensity of the linear product band divided by the summed intensities of all three species (supercoiled substrate, nicked plasmid, and linear product). Percentage product was plotted against time in GraphPad Prism, and the initial rate was determined by fitting a one-phase association function. Initial rates from the six time courses (one per enzyme concentration) were then plotted against enzyme concentration and fit to a Michaelis–Menten model to determine the parameters kcat* and KM*. All time-course experiments were repeated three times using protein from at least two independent purifications.
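The kinetic workflow can be sketched in plain Python. The authors used GraphPad Prism; this sketch only reproduces the same model forms (one-phase association, then a Michaelis–Menten hyperbola against enzyme concentration) with a simple grid-search least-squares fit, so its numbers will not match Prism's exactly. The fitted plateau and half-saturation constant correspond to kcat* and KM* only once rates are expressed in the authors' units, a conversion not specified here. All data are synthetic.

```python
import math


def percent_product(linear, nicked, supercoiled):
    """Linear product band as a percentage of all three species in the lane."""
    return 100.0 * linear / (linear + nicked + supercoiled)


def log_grid(lo, hi, n=2000):
    """n log-spaced candidate values between lo and hi (inclusive)."""
    return [lo * (hi / lo) ** (i / (n - 1)) for i in range(n)]


def fit_amplitude_and_param(xs, ys, shape, candidates):
    """Least-squares fit of y = A * shape(x, p) by grid search over p.

    For each candidate p the amplitude A enters linearly, so it is solved in
    closed form; the p with the smallest residual sum of squares is kept.
    """
    best = None
    for p in candidates:
        f = [shape(x, p) for x in xs]
        ff = sum(v * v for v in f)
        if ff == 0:
            continue
        amp = sum(v * y for v, y in zip(f, ys)) / ff
        sse = sum((y - amp * v) ** 2 for v, y in zip(f, ys))
        if best is None or sse < best[0]:
            best = (sse, amp, p)
    return best[1], best[2]


def initial_rate(times, pct_product):
    """Fit Y = Ymax * (1 - exp(-k t)); the slope at t = 0 is Ymax * k."""
    ymax, k = fit_amplitude_and_param(
        times, pct_product, lambda t, k: 1.0 - math.exp(-k * t), log_grid(1e-4, 10.0))
    return ymax * k


def fit_michaelis_menten(enzyme_nm, rates):
    """Fit v0 = Vmax * [E] / (K + [E]); returns (Vmax, K) in the input units."""
    return fit_amplitude_and_param(
        enzyme_nm, rates, lambda e, km: e / (km + e), log_grid(1e-3, 1e4))


# Synthetic time course: six points (min), Ymax = 80%, k = 0.3 per min.
times = [1, 2, 5, 10, 20, 30]
pct = [80 * (1 - math.exp(-0.3 * t)) for t in times]
print(f"initial rate ~ {initial_rate(times, pct):.1f} %/min")  # close to 24

# Synthetic rates over a 25-fold enzyme range (nM), Vmax = 1.2, K = 10 nM.
enzyme = [2, 4, 8, 16, 32, 50]
rates = [1.2 * e / (10 + e) for e in enzyme]
print(fit_michaelis_menten(enzyme, rates))
```

Solving the linear amplitude analytically reduces each fit to a one-dimensional search, which is robust for small, well-behaved datasets like six-point time courses; a gradient-based optimizer would be preferable for more complex models.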
I-LtrI A28S Suppressor Screen.
I-LtrI A28S was subjected to error-prone PCR (approximately seven to nine nucleotide substitutions per kilobase) by amplifying the ORF with a GeneMorph II Random Mutagenesis Kit (Agilent Technologies). The resulting I-LtrI A28S mutant library (in pEndo) was used as the input for an initial round (round 1) of two-plasmid bacterial genetic selection. All 480 colonies that survived on selective media after round 1 were individually inoculated into wells of 96-well plates containing LB supplemented with 100 μg/mL ampicillin, grown overnight, and pooled, and plasmid DNA was isolated. The I-LtrI-encoding insert was recloned to create a second mutagenic library, and 30 colonies that survived the second genetic selection (round 2) were sequenced to identify mutations.
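The stated mutation frequency can be translated into an expected number of substitutions per clone. The sketch below assumes that substitutions are Poisson distributed along the ORF, which is a common simplification for error-prone PCR rather than something stated in the text; the 1,000-bp ORF length is a hypothetical placeholder.

```python
import math


def expected_mutations(orf_bp, subs_per_kb):
    """Mean substitutions per ORF for a given per-kilobase rate."""
    return orf_bp / 1000 * subs_per_kb


def poisson_pmf(k, lam):
    """Probability of exactly k substitutions given mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def mutation_spectrum(orf_bp, subs_per_kb, max_k=5):
    """Fraction of clones carrying 0..max_k substitutions (Poisson assumption)."""
    lam = expected_mutations(orf_bp, subs_per_kb)
    return {k: poisson_pmf(k, lam) for k in range(max_k + 1)}


# Hypothetical 1,000-bp ORF at the low and high ends of the stated range.
for rate in (7, 9):
    spec = mutation_spectrum(1000, rate)
    print(rate, f"mean={expected_mutations(1000, rate):.1f}", f"unmutated={spec[0]:.4f}")
```

At these rates almost every clone carries several substitutions, so a suppressor identified after selection is likely to appear alongside passenger mutations.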
Rosetta Modeling.
In silico mutants of the I-LtrI crystal structure (PDB ID 3R7P) (21) were generated with PyRosetta (42), using the mutate_residues function, for all permutations of the residues found in the alignment at positions 28, 29, 183, and 184. Structural refinement and scoring were conducted with Rosetta (43) to evaluate the predicted local energy minima of the wild-type and in silico mutant structures. The Rosetta relax protocol was applied to the in silico mutants with coordinate constraints to the input model, using the standard flag for this protocol (-constrain_relax_to_start_coords). The wild-type structure was also relaxed, with 20 repeats specified by the -nstruct flag. Each structure was scored as part of the relax protocol, and energies are reported in Rosetta Energy Units. The data were plotted using R (40).
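Enumerating every permutation of alignment-observed residues at the four network positions is a Cartesian product. The sketch below builds the quartet strings and the corresponding substitution lists that would then be passed to the PyRosetta mutation step (not shown). The reference residues at 183 and 184 and the residue sets per position are hypothetical placeholders, not values from the alignment.

```python
from itertools import product

POSITIONS = (28, 29, 183, 184)


def quartet_mutations(reference, alignment_residues):
    """Yield (quartet, substitutions) for every combination of residues
    observed at the four network positions.

    `reference` maps position -> wild-type one-letter code;
    `alignment_residues` maps position -> iterable of residues seen there.
    """
    choices = [sorted(set(alignment_residues[p])) for p in POSITIONS]
    for combo in product(*choices):
        subs = [f"{reference[p]}{p}{aa}"
                for p, aa in zip(POSITIONS, combo) if aa != reference[p]]
        yield "".join(combo), subs


# Hypothetical inputs for illustration only.
reference = {28: "A", 29: "E", 183: "G", 184: "D"}
observed = {28: "AS", 29: "DE", 183: "G", 184: "DE"}
variants = list(quartet_mutations(reference, observed))
print(len(variants))  # 2 x 2 x 1 x 2 combinations
for quartet, subs in variants[:3]:
    print(quartet, subs)
```

The number of models grows as the product of the residue counts per position, so restricting each position to residues actually observed in the alignment keeps the modeling tractable.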
Supplementary Material
Acknowledgments
We thank Andrew Fernandes for discussions regarding statistical analyses; the London Regional Genomics Centre and the Biomolecular Interaction and Conformation Facility for technical assistance with Illumina sequencing and differential scanning calorimetry analyses, respectively; and Dr. David Heinrichs for the use of the Microbiology Reader Bioscreen instrument. R.J.D. was supported by a Canada Graduate Scholarship from the Natural Sciences and Engineering Research Council of Canada. G.B.G. is funded by a Discovery Grant from the Natural Sciences and Engineering Research Council of Canada, and D.R.E. is funded by an Operating Grant from the Canadian Institutes of Health Research (MOP 97780) and by Pilot Project funding from the Northwest Genome Engineering Consortium.
Footnotes
The authors declare no conflict of interest.
This article is a PNAS Direct Submission.
This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1322352111/-/DCSupplemental.
References
- 1. Fitch WM, Markowitz E. An improved method for determining codon variability in a gene and its application to the rate of fixation of mutations in evolution. Biochem Genet. 1970;4(5):579–593. doi: 10.1007/BF00486096.
- 2. de Juan D, Pazos F, Valencia A. Emerging methods in protein co-evolution. Nat Rev Genet. 2013;14(4):249–261. doi: 10.1038/nrg3414.
- 3. Gloor GB, Martin LC, Wahl LM, Dunn SD. Mutual information in protein multiple sequence alignments reveals two classes of coevolving positions. Biochemistry. 2005;44(19):7156–7165. doi: 10.1021/bi050293e.
- 4. Martin LC, Gloor GB, Dunn SD, Wahl LM. Using information theory to search for co-evolving residues in proteins. Bioinformatics. 2005;21(22):4116–4124. doi: 10.1093/bioinformatics/bti671.
- 5. Dunn SD, Wahl LM, Gloor GB. Mutual information without the influence of phylogeny or entropy dramatically improves residue contact prediction. Bioinformatics. 2008;24(3):333–340. doi: 10.1093/bioinformatics/btm604.
- 6. Little DY, Chen L. Identification of coevolving residues and coevolution potentials emphasizing structure, bond formation and catalytic coordination in protein evolution. PLoS ONE. 2009;4(3):e4762. doi: 10.1371/journal.pone.0004762.
- 7. Yeang CH, Haussler D. Detecting coevolution in and among protein domains. PLOS Comput Biol. 2007;3(11):e211. doi: 10.1371/journal.pcbi.0030211.
- 8. Chakrabarti S, Panchenko AR. Structural and functional roles of coevolved sites in proteins. PLoS ONE. 2010;5(1):e8591. doi: 10.1371/journal.pone.0008591.
- 9. Stoddard BL. Homing endonuclease structure and function. Q Rev Biophys. 2005;38(1):49–95. doi: 10.1017/S0033583505004063.
- 10. Gimble FS, Duan X, Hu D, Quiocho FA. Identification of Lys-403 in the PI-SceI homing endonuclease as part of a symmetric catalytic center. J Biol Chem. 1998;273(46):30524–30529. doi: 10.1074/jbc.273.46.30524.
- 11. Jurica MS, Monnat RJ, Jr, Stoddard BL. DNA recognition and cleavage by the LAGLIDADG homing endonuclease I-CreI. Mol Cell. 1998;2(4):469–476. doi: 10.1016/s1097-2765(00)80146-x.
- 12. Moure CM, Gimble FS, Quiocho FA. Crystal structure of the intein homing endonuclease PI-SceI bound to its recognition sequence. Nat Struct Biol. 2002;9(10):764–770. doi: 10.1038/nsb840.
- 13. Dalgaard JZ, et al. Statistical modeling and analysis of the LAGLIDADG family of site-specific endonucleases and identification of an intein that encodes a site-specific endonuclease of the HNH family. Nucleic Acids Res. 1997;25(22):4626–4638. doi: 10.1093/nar/25.22.4626.
- 14. Silva GH, Belfort M. Analysis of the LAGLIDADG interface of the monomeric homing endonuclease I-DmoI. Nucleic Acids Res. 2004;32(10):3156–3168. doi: 10.1093/nar/gkh618.
- 15. Grishin A, et al. Identification of conserved features of LAGLIDADG homing endonucleases. J Bioinform Comput Biol. 2010;8(3):453–469. doi: 10.1142/s0219720010004665.
- 16. Baxter S, et al. Engineering domain fusion chimeras from I-OnuI family LAGLIDADG homing endonucleases. Nucleic Acids Res. 2012;40(16):7985–8000. doi: 10.1093/nar/gks502.
- 17. Redondo P, et al. Molecular basis of xeroderma pigmentosum group C DNA recognition by engineered meganucleases. Nature. 2008;456(7218):107–111. doi: 10.1038/nature07343.
- 18. Silva G, et al. Meganucleases and other tools for targeted genome engineering: Perspectives and challenges for gene therapy. Curr Gene Ther. 2011;11(1):11–27. doi: 10.2174/156652311794520111.
- 19. Dickson RJ, Gloor GB. Protein sequence alignment analysis by local covariation: Coevolution statistics detect benchmark alignment errors. PLoS ONE. 2012;7(6):e37645. doi: 10.1371/journal.pone.0037645.
- 20. Dickson RJ, Wahl LM, Fernandes AD, Gloor GB. Identifying and seeing beyond multiple sequence alignment errors using intra-molecular protein covariation. PLoS ONE. 2010;5(6):e11082. doi: 10.1371/journal.pone.0011082.
- 21. Takeuchi R, et al. Tapping natural reservoirs of homing endonucleases for targeted gene modification. Proc Natl Acad Sci USA. 2011;108(32):13077–13082. doi: 10.1073/pnas.1107719108.
- 22. Wang Y, Geer LY, Chappey C, Kans JA, Bryant SH. Cn3D: Sequence and structure views for Entrez. Trends Biochem Sci. 2000;25(6):300–302. doi: 10.1016/s0968-0004(00)01561-9.
- 23. Guindon S, et al. New algorithms and methods to estimate maximum-likelihood phylogenies: Assessing the performance of PhyML 3.0. Syst Biol. 2010;59(3):307–321. doi: 10.1093/sysbio/syq010.
- 24. Chen Z, Zhao H. A highly sensitive selection method for directed evolution of homing endonucleases. Nucleic Acids Res. 2005;33(18):e154. doi: 10.1093/nar/gni148.
- 25. Fernandes AD, Macklaim JM, Linn TG, Reid G, Gloor GB. ANOVA-like differential expression (ALDEx) analysis for mixed population RNA-Seq. PLoS ONE. 2013;8(7):e67019. doi: 10.1371/journal.pone.0067019.
- 26. Fernandes A, et al. Unifying the analysis of high-throughput sequencing datasets: Characterizing RNA-seq, 16S rRNA gene sequencing and selective growth experiments by compositional data analysis. Microbiome. 2014;2:15. doi: 10.1186/2049-2618-2-15.
- 27. DePristo MA, Weinreich DM, Hartl DL. Missense meanderings in sequence space: A biophysical view of protein evolution. Nat Rev Genet. 2005;6(9):678–687. doi: 10.1038/nrg1672.
- 28. Halford SE, Johnson NP, Grinsted J. The EcoRI restriction endonuclease with bacteriophage lambda DNA. Kinetic studies. Biochem J. 1980;191(2):581–592. doi: 10.1042/bj1910581.
- 29. Thyme SB, et al. Exploitation of binding energy for catalysis and design. Nature. 2009;461(7268):1300–1304. doi: 10.1038/nature08508.
- 30. Goodman DB, Church GM, Kosuri S. Causes and effects of N-terminal codon bias in bacterial genes. Science. 2013;342(6157):475–479. doi: 10.1126/science.1241934.
- 31. Grizot S, et al. Efficient targeting of a SCID gene by an engineered single-chain homing endonuclease. Nucleic Acids Res. 2009;37(16):5405–5419. doi: 10.1093/nar/gkp548.
- 32. Takeuchi R, Certo M, Caprara MG, Scharenberg AM, Stoddard BL. Optimization of in vivo activity of a bifunctional homing endonuclease and maturase reverses evolutionary degradation. Nucleic Acids Res. 2009;37(3):877–890. doi: 10.1093/nar/gkn1007.
- 33. Studer RA, Dessailly BH, Orengo CA. Residue mutations and their impact on protein structure and function: Detecting beneficial and pathogenic changes. Biochem J. 2013;449(3):581–594. doi: 10.1042/BJ20121221.
- 34. Wolf YI, Gopich IV, Lipman DJ, Koonin EV. Relative contributions of intrinsic structural-functional constraints and translation rate to the evolution of protein-coding genes. Genome Biol Evol. 2010;2:190–199. doi: 10.1093/gbe/evq010.
- 35. Silva GH, Belfort M, Wende W, Pingoud A. From monomeric to homodimeric endonucleases and back: Engineering novel specificity of LAGLIDADG enzymes. J Mol Biol. 2006;361(4):744–754. doi: 10.1016/j.jmb.2006.06.063.
- 36. Steuer S, Pingoud V, Pingoud A, Wende W. Chimeras of the homing endonuclease PI-SceI and the homologous Candida tropicalis intein: A study to explore the possibility of exchanging DNA-binding modules to obtain highly specific endonucleases with altered specificity. ChemBioChem. 2004;5(2):206–213. doi: 10.1002/cbic.200300718.
- 37. Chevalier BS, et al. Design, activity, and structure of a highly specific artificial endonuclease. Mol Cell. 2002;10(4):895–905. doi: 10.1016/s1097-2765(02)00690-1.
- 38. Epinat JC, et al. A novel engineered meganuclease induces homologous recombination in yeast and mammalian cells. Nucleic Acids Res. 2003;31(11):2952–2962. doi: 10.1093/nar/gkg375.
- 39. Doyon JB, Pattanayak V, Meyer CB, Liu DR. Directed evolution and substrate specificity profile of homing endonuclease I-SceI. J Am Chem Soc. 2006;128(7):2477–2484. doi: 10.1021/ja057519l.
- 40. R Core Team. R: A Language and Environment for Statistical Computing. Vienna: R Foundation for Statistical Computing; 2013.
- 41. Wickham H. ggplot2: Elegant Graphics for Data Analysis. New York: Springer; 2009.
- 42. Chaudhury S, Lyskov S, Gray JJ. PyRosetta: A script-based interface for implementing molecular modeling algorithms using Rosetta. Bioinformatics. 2010;26(5):689–691. doi: 10.1093/bioinformatics/btq007.
- 43. Leaver-Fay A, et al. ROSETTA3: An object-oriented software suite for the simulation and design of macromolecules. Methods Enzymol. 2011;487:545–574. doi: 10.1016/B978-0-12-381270-4.00019-6.
- 44. Simonetti FL, Teppa E, Chernomoretz A, Nielsen M, Marino Buslje C. MISTIC: Mutual information server to infer coevolution. Nucleic Acids Res. 2013;41(Web Server issue):W8–W14. doi: 10.1093/nar/gkt427.