Proceedings of the National Academy of Sciences of the United States of America. 2007 Oct 18;104(44):17353–17357. doi: 10.1073/pnas.0708265104

Identification of functional paralog shift mutations: Conversion of Escherichia coli malate dehydrogenase to a lactate dehydrogenase

Yifeng Yin 1, Jack F Kirsch 1,*
PMCID: PMC2077260  PMID: 17947381

Abstract

Five positions in the Escherichia coli malate dehydrogenase (eMDH) sequence, which distinguish MDH from lactate dehydrogenase (LDH) activity, were identified through a combination of Venn diagrams constructed from whole genomic data and from unbiased representative sequences from terminal clades. Incorporation of the five changes in eMDH sufficed to convert the enzyme from one with (kcat/Km for pyruvate)/(kcat/Km for oxaloacetate) = 6.1 × 10⁻⁹ to one with that ratio = 28. The substrate specificity was thus changed by a factor of 4.6 × 10⁹. The kcat/Km value for pyruvate of the pentamutant (eMDH I12V/R81Q/M85E/G210A/V214I) is 3,500 M⁻¹·s⁻¹, which is ≈1/1,000 of the values found for typical wild-type LDHs. The procedure isolates an intersection of “strong forcing sets” that should prove to be of general use in switching paralog function.

Keywords: enzyme design, Venn diagrams, strong forcing set


Alterations in enzyme substrate specificity can be achieved either by rational design or by directed evolution. Some examples of success realized from the latter approach include change of hydroxylation regiospecificity of cytochrome P450 (1), restriction of aminoacyl-tRNA synthetase activity to desired unnatural amino acids (2), and broadening the substrate specificity of an aspartate aminotransferase (AATase) to obtain a tyrosine aminotransferase (TATase) (3).

An important milestone that set a high standard for subsequent attempts to switch enzyme substrate specificity by purely rational design was the demonstration that the Q102R mutation of Bacillus stearothermophilus lactate dehydrogenase (LDH) sufficed to impart near wild-type malate dehydrogenase (MDH) activity to this enzyme (4). The conserved Arg-81 residue ion pairs with the β-CO2 group of the substrate, oxaloacetate (OAA), in MDH as shown in Fig. 1. This position is usually conserved as Gln in LDH, but occasionally Pro and Met are found.

Fig. 1.

Schematic drawing of the proposed structure of the MDH–OAA–NADH complex (15). The residues are numbered according to their positions in eMDH.

The reverse experiment of incorporating the R81Q mutation into Escherichia coli malate dehydrogenase (eMDH) was subsequently carried out by the Holbrook group (5), but only a small amount of LDH activity was observed in the resultant construct. It is important to understand why the Arg–Gln interchange was successful only in converting LDH to MDH activity. An unrooted phylogenetic tree relating MDH to LDH reveals a relatively close relationship (>28% sequence identity) between tetrameric MDHs and LDHs (all LDHs are tetrameric) (6). Dimeric MDHs are significantly more distantly related to LDHs (sequence identity typically <22%). The original Q102R mutation converted B. stearothermophilus LDH to a protein that has up to 35% identity to extant tetrameric MDHs. However, the sequence of eMDH is <20% identical to those of the LDH family; therefore, the R81Q mutation is insufficient to traverse significantly on the MDH → LDH trajectory. The R81Q mutation made in a more closely related tetrameric MDH does give an enzyme with ≈0.1% of the activity of a typical LDH (7). This evolutionary analysis clarifies our understanding of why the LDH to MDH experiment, which crossed over a relatively small distance in evolutionary space, was successful and why the large jump from eMDH to the distantly related LDH enzymes was only partly so.

Previous attempts to introduce mutations in addition to R81Q into eMDH only led to moderate increases in its LDH activity (8). Is it possible to define a small set of additional substitutions that will introduce significant LDH activity into eMDH? We report here the identification of a group of four additional eMDH mutations that together increase LDH activity by 27-fold and the specificity ratio by 55-fold over R81Q alone. This construct is a true LDH derived from MDH in that it exhibits a 28-fold preference for pyruvate compared with OAA.

Results and Discussion

Identification of Mutations to Change MDH to LDH Specificity.

The results of earlier studies from this laboratory designed to switch AATase to TATase specificity by directed evolution showed that all of the realized mutations occurred in a small subset of 32 amino acid positions (from a total of 384 that were subject to mutation) that are ≥75% conserved in AATases and <75% conserved in TATases (set AAT–TAT) (3). It was hypothesized that corresponding sets (i.e., set Enzyme I–Enzyme II) might identify residues whose mutation would switch the specificity from an arbitrarily chosen Enzyme I to that of a paralog, Enzyme II. This might be a general design strategy. It was decided for the reasons stated above to employ eMDH as a test vehicle to enhance LDH activity.

Fig. 2A shows the Venn diagram resulting from inclusion of all MDH and LDH sequences in the Swiss-Prot database that align with eMDH. The left and right circles include all residues that are ≥60% conserved in MDH and LDH sequences, respectively. The intersection, set MDH∩LDH, is divided into two parts: 43 of the 312 positions (upper) are identical in both enzymes, but the lower partition of the intersection reveals a set that was empty in the AATase/TATase Venn diagram (3). This set (SFA) includes 22 conserved positions that exist as different amino acids in LDH compared with MDH. For example, position 81 is always Arg in MDH and is ≥60% conserved as Gln in LDH.

Fig. 2.

Venn diagrams used to guide the selection of mutations to confer LDH activity on eMDH. (A) Constructed from all 47 MDH and 127 LDH sequences from Swiss-Prot (July, 2003) that could be aligned with eMDH by PileUp (GCG) using the default parameters. The diagram, constructed to include all residues conserved at ≥60%, shows a total of 99 + 43 = 132 of ≈330 residues that are conserved in LDH and 30 + 43 = 73 that are similarly maintained in MDH. Among them, 43 residues are found in both enzymes. The 22 residues in the lower part of the intersection are conserved as different residues in the two enzymes as shown. This group is termed the strong forcing set (SFA). (B) Composed of an arbitrary representative from each major clade of the phylogenetic tree constructed with ClustalW. Total numbers of MDH and LDH sequences are 16 and 13, respectively. See SI Fig. 5. Only 7 residues fall into the set SFB from this more limited selection of sequences. (C) Constructed from sets SFA and SFB. The kinetically characterized mutations are shown in bold. See Results and Discussion.

It seemed reasonable that the substitutions identified in the lower part of the intersection might contain the preponderance of the specificity determinants. This set is thus tentatively named “the strong forcing set” to differentiate it from set Enzyme I–Enzyme II, which is now termed a “forcing set,” as suggested by the earlier directed evolution experiment (3).
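The set definitions above can be illustrated with a minimal sketch. It is not the authors' program: the function names, the toy alignment columns, and the position numbers other than 81 are hypothetical, and a single conservation cutoff is assumed for both families.

```python
from collections import Counter

def consensus(columns, cutoff=0.60):
    """Map position -> consensus residue for columns conserved at >= cutoff.

    `columns` maps an alignment position to the string of residues
    observed at that position across one enzyme family.
    """
    result = {}
    for pos, residues in columns.items():
        residue, count = Counter(residues).most_common(1)[0]
        if residue != "-" and count / len(residues) >= cutoff:
            result[pos] = residue
    return result

def venn_sets(cols_one, cols_two, cutoff=0.60):
    """Partition positions into the Venn regions described in the text."""
    cons_one = consensus(cols_one, cutoff)
    cons_two = consensus(cols_two, cutoff)
    shared = cons_one.keys() & cons_two.keys()
    identical = {p for p in shared if cons_one[p] == cons_two[p]}
    # Strong forcing set: conserved in both families, but as different residues.
    strong_forcing = {p: (cons_one[p], cons_two[p]) for p in shared - identical}
    # Forcing set (Enzyme I - Enzyme II): conserved in I, not conserved in II.
    forcing = {p: cons_one[p] for p in cons_one.keys() - cons_two.keys()}
    return identical, strong_forcing, forcing

# Toy columns (hypothetical data, four sequences per family).
mdh = {81: "RRRR", 150: "GGGG", 200: "AAAV", 250: "LIVM"}
ldh = {81: "QQQP", 150: "GGGG", 200: "STCA", 250: "DDDD"}
identical, strong, forcing = venn_sets(mdh, ldh)
print(identical)  # {150}
print(strong)     # {81: ('R', 'Q')}
print(forcing)    # {200: 'A'}
```

In this toy case position 81 lands in the strong forcing set because it is conserved as Arg in one family and as Gln in the other, which mirrors the R81Q example in the text.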

Because the 22 substitutions in the 60% conservation strong forcing set of the MDH/LDH Venn diagram are too many to be incorporated combinatorially into eMDH, the stringency of the Venn diagram was increased to ≥75% identity (data not shown), which yielded a forcing set with only five substitutions: the expected R81Q, as well as G176E, G210A, V214I, and A223T. These last four mutations were introduced individually and in several combinations into the eMDH R81Q mutant, and their expression and LDH activity were analyzed with the substrate pair pyruvate–NADH.
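A short calculation shows why 22 substitutions cannot be explored combinatorially while five can. This is only an illustrative count, assuming each position is either wild type or substituted. The text states that the four extra mutations were tested individually and in several combinations, not necessarily in all of them.

```python
from itertools import combinations

def n_variants(n_sites):
    """Number of distinct mutants when each site is either wild type or substituted."""
    return 2 ** n_sites

print(n_variants(22))  # 4194304
print(n_variants(5))   # 32

extra = ["G176E", "G210A", "V214I", "A223T"]
# Every non-empty combination of the four extra substitutions on the R81Q background.
combos = [c for k in range(1, len(extra) + 1) for c in combinations(extra, k)]
print(len(combos))  # 15
```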

All mutants containing G176E were expressed solely as insoluble inclusion bodies, which suggested a significant compromise of structural integrity. Because no countercharge to Glu-176 is close in LDH structures, the insolubility is probably the result of steric clashes of Glu-176 with other residues in eMDH. Therefore, we used the EGAD force field (9, 10) on the structure of the ternary complex of wild-type eMDH cocrystallized with the substrate analog citrate and NAD+ [Protein Data Bank (PDB) ID code 1EMD] (11) to identify three likely interfering side chains, Thr-181, Tyr-253, and Leu-299. These were mutated to Ala or to other smaller residues (Ser at position 181 and Thr at position 253). However, these additional mutations failed to yield soluble proteins. A further observation is that all mutants containing A223T failed to catalyze the NADH-mediated reduction of pyruvate to lactate.

The last two mutations, G210A and V214I, do result in a 2.2-fold increase in LDH activity when incorporated together, although individually G210A lowers activity by 30%, and V214I has very little effect (Fig. 3). Thus, these two changes are synergistic. The side chains of Gly-210 and Val-214 are 5.4 Å apart in the eMDH structure (PDB ID code 1EMD), and the corresponding Ala and Ile are separated by 4.7 Å in the B. stearothermophilus LDH structure (PDB ID code 1LDN).

Fig. 3.

LDH-specific activities of eMDH mutants, determined in 200 mM TAPS/100 mM KCl/10 mM sodium pyruvate/150 μM NADH at 25°C and pH 8.0. For comparison, the specific activity of wild-type eMDH is <0.05 nmol·s⁻¹·mg⁻¹ under identical assay conditions. All of the substitutions except V214I, which is from SFA–SFB, are from the set SFA∩SFB of Fig. 2.

The information content of the available sequence database is not distributed uniformly over organismal diversity but is anthropocentrically biased. For example, there are 9 vertebrate cytosolic MDH sequences showing an average identity of 91% [see supporting information (SI) Table 3], whereas 15 of the nonredundant Enterobacteriales (the order containing E. coli) sequences are only 67% identical (SI Table 4). This bias decreases the probability of identifying conserved amino acid residues that are critical for substrate specificity and may explain why the whole genomic approach described above was only modestly successful. The database must therefore be filtered to reduce the number of closely related sequences. To this end, a phylogenetic tree of the above-mentioned MDH and LDH sequences was constructed with ClustalW (12). An arbitrary representative from each major clade of this tree was chosen (see SI Fig. 5), and these sequences were used to construct a new 60% cutoff Venn diagram (Fig. 2B). The new upper intersection includes 34 of the 312 positions that are identical in both enzymes, and the strong forcing set SFB is now reduced to only seven substitutions.

The sequencing bias in SFB is greatly reduced from that of SFA. Importantly, the problematic G176E does not appear in SFB. Nonetheless, the information content of SFA is valuable; therefore, a final Venn diagram was constructed from sets SFA and SFB (Fig. 2C). The intersection, SFA∩SFB, contains the five substitutions that were considered most likely to confer LDH activity onto the MDH framework. These five substitutions include the expected R81Q as well as G210A and A223T, which were already evaluated (see above). The two novel mutations, I12V and M85E, were incorporated individually and together into the triple mutant eMDH R81Q/G210A/V214I, which is 2.2-fold more active in the LDH assay than R81Q alone. Both quadruple mutants and the pentamutant (eMDH with five substitutions I12V/R81Q/M85E/G210A/V214I) are more active than the triple mutant. The pentamutant was fully characterized kinetically (see below).

The two members of set SFB–SFA, N122D and F144I, were individually added to the pentamutant, but both constructs were inactive. Thus, it is important to use the members from both sets SFA and SFB, with set SFA∩SFB being the most productive, to succeed in the design. The procedure appears to be robust, because an independent analysis beginning with a different set of starting sequences (P. O. Syren, data not shown) also identified the critical mutations I12V, R81Q, M85E, and G210A. There were minor differences in SFB from those shown in Fig. 2.
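The relationships among the sets can be written out with ordinary set operations. The sketch below uses only the substitutions named in the text. SFA has 22 members in total, so the SFA set here is deliberately incomplete, and the membership of G176E in SFA is inferred from its appearance in the whole-database analysis. The variable names are hypothetical.

```python
# Members named in the text. SF_A has 22 members in total; only those
# named explicitly in the text are listed here, so SF_A is incomplete.
sf_a_named = {"I12V", "R81Q", "M85E", "G176E", "G210A", "V214I", "A223T"}
# SF_B has seven members: the five shared ones plus N122D and F144I.
sf_b = {"I12V", "R81Q", "M85E", "N122D", "F144I", "G210A", "A223T"}

both = sf_a_named & sf_b      # intersection of the two strong forcing sets
only_a = sf_a_named - sf_b    # named members found only in SF_A
only_b = sf_b - sf_a_named    # members found only in SF_B

print(sorted(both))    # ['A223T', 'G210A', 'I12V', 'M85E', 'R81Q']
print(sorted(only_a))  # ['G176E', 'V214I']
print(sorted(only_b))  # ['F144I', 'N122D']

pentamutant = {"I12V", "R81Q", "M85E", "G210A", "V214I"}
# The only pentamutant substitution that lies outside the intersection.
print(sorted(pentamutant - both))  # ['V214I']
```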

Enzyme Activity and Specificity.

The first and second quadruple mutants, R81Q/G210A/V214I + I12V (first) or + M85E (second), result in further increases in LDH activity by 2.6- and 5-fold, respectively. The effects of these two mutations on the LDH activity are additive because together they yield a construct with a 13.6-fold increase of the LDH activity over that of the triple mutant (theoretical: 2.6 × 5 = 13). The resultant pentamutant exhibits LDH activity that is 30-fold greater than that of R81Q alone and is 1.7 × 10⁴-fold enhanced over that of wild-type eMDH (Fig. 3).
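The arithmetic behind these statements can be checked with a few lines. Effects that are additive in free-energy terms multiply when expressed as fold changes, so the test for additivity is whether the observed combined fold change matches the product of the individual ones. The sketch uses the fold changes quoted in the text; the value 1.0 for V214I alone is an assumption standing in for "very little effect", and the function name is hypothetical.

```python
def combined_fold(*folds):
    """Expected fold change if the individual effects are independent (multiplicative)."""
    product = 1.0
    for f in folds:
        product *= f
    return product

# I12V (2.6-fold) and M85E (5-fold) added to the triple mutant.
expected_penta_vs_triple = combined_fold(2.6, 5.0)
observed_penta_vs_triple = 13.6
# The triple mutant is itself 2.2-fold more active than R81Q alone.
penta_vs_r81q = combined_fold(2.2, observed_penta_vs_triple)

print(round(expected_penta_vs_triple, 1))                             # 13.0
print(round(observed_penta_vs_triple / expected_penta_vs_triple, 2))  # 1.05
print(round(penta_vs_r81q, 1))                                        # 29.9

# G210A alone lowers activity by 30% (0.7-fold); V214I alone is taken as 1.0 (assumption).
expected_pair = combined_fold(0.7, 1.0)
observed_pair = 2.2
print(round(observed_pair / expected_pair, 2))  # 3.14, i.e. synergistic
```

A ratio of observed to expected near 1 indicates additivity (I12V with M85E), whereas a ratio well above 1 indicates synergy (G210A with V214I).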

The eMDH pentamutant exhibits standard Michaelis–Menten steady-state kinetics (Fig. 4). The kinetic parameters are compared with those of wild-type eMDH and eMDH R81Q, which were redetermined here for consistency (Table 1). The presently obtained wild-type and R81Q MDH values differ insignificantly from those in the literature. The R81Q MDH mutant prefers OAA by a factor of 2 over pyruvate (kcat/Km), whereas the eMDH pentamutant favors pyruvate over OAA by 28-fold. This latter figure can be compared with the kcat/Km ratios of wild-type LDHs, which typically favor pyruvate by a factor of 1,000 (Table 1). Thus, the four additional mutations do effect a quantitative change in specificity and result in a true LDH. The increase in kcat contributes most of the enhancement in kcat/Km for pyruvate. The kcat/Km value is ≈0.1% of that of a typical LDH. No curvature is observed in a plot of v versus [OAA] at concentrations ≤10 mM, indicating a Km for OAA of ≥30 mM. The corresponding Km values for wild-type MDHs are in the 50 μM range.

Fig. 4.

Dependence of the reaction rate of the eMDH pentamutant on pyruvate concentration. The assays were performed at 37°C and pH 8.0.

Table 1.

Steady-state kinetic parameters for wild-type and mutant eMDHs

Enzyme | Oxaloacetate kcat, s⁻¹ | Oxaloacetate Km, mM | Oxaloacetate kcat/Km, M⁻¹·s⁻¹ | Pyruvate kcat, s⁻¹ | Pyruvate Km, mM | Pyruvate kcat/Km, M⁻¹·s⁻¹ | (kcat/Km for pyruvate)/(kcat/Km for oxaloacetate)
Wild-type eMDH* | 931 | 0.04 | 2.3 × 10⁷ | NS | NS | 0.14 | 6.1 × 10⁻⁹
eMDH R81Q* | 0.77 | 3 | 257 | 3.3 | 25 | 132 | 0.51
eMDH R81Q/G210A/V214I/I12V/M85E‡ | NS† | NS† | 125 | 46.5 (3.3) | 13.3 (2.2) | 3,500 (630) | 28
Wild-type B. stearothermophilus LDH§ | 6.0 | 1.5 | 4.0 × 10³ | 250 | 0.060 | 4.2 × 10⁶ | 1.1 × 10³

*From ref. 5. The assays were performed at pH 7.5, 30°C. Essentially the same values were obtained under the present condition (pH 8.0, 37°C).

†No curvature was noted in v versus [S] plots with [OAA] ≤ 10 mM. See Results and Discussion.

‡This work. Standard errors are in parentheses.

§From ref. 4. The assays were performed at pH 6.0, 25°C.
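The derived columns of Table 1 follow directly from kcat and Km. The sketch below recomputes kcat/Km and the specificity ratio for the pentamutant, and the overall change in specificity relative to wild-type eMDH, from the tabulated values. The function and variable names are hypothetical.

```python
def catalytic_efficiency(kcat_per_s, km_mM):
    """kcat/Km in M^-1 s^-1, from kcat in s^-1 and Km in mM."""
    return kcat_per_s / (km_mM * 1e-3)

# Table 1 values for the pentamutant with pyruvate.
penta_pyr = catalytic_efficiency(46.5, 13.3)
# Only kcat/Km is reported for oxaloacetate, because no saturation was observed.
penta_oaa = 125.0
penta_ratio = penta_pyr / penta_oaa

# Specificity ratio reported for wild-type eMDH.
wt_ratio = 6.1e-9

print(round(penta_pyr))        # 3496, reported as 3,500
print(round(penta_ratio))      # 28
print(f"{28 / wt_ratio:.1e}")  # 4.6e+09
```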

Other Routes to the Identification of Specificity Determining Residues.

Other approaches include evolutionary trace analysis (13) and SDPpred (14). The former unbiases the sequence database in a manner similar to that used here (see SI Fig. 5) and assigns particular importance to covarying residues that are in close proximity when mapped onto the structure. The SDPpred method is based similarly on the identification of conserved intraortholog residues that vary in paralogs (http://monkey.belozersky.msu.ru/~psn). None of the important MDH/LDH specificity residues found in the present investigation (including the diagnostic R81Q position) was identified by the SDPpred program available at this website with the representative sequence alignment discussed above. A possible explanation for the failure to identify these specificity-determining positions might be found in the use of smoothed frequencies in the SDPpred algorithm to minimize the effect of similar residue substitutions. Three of the five important substitutions identified in our work involve the change of only a single methyl group (I12V, G210A, and V214I).

The Venn diagrams offer a facile and convenient method for isolating the important specificity-determining positions in the strong forcing set. We find that it is important to take data from both the whole genomic and unbiased sequence sets to achieve success. Although the method does not address the challenge of how to achieve de novo enzyme activity, the first step in such an exercise is to understand why given substitutions effect specificity changes.

In summary, the method described here, which identifies important specificity-determining residue changes by incorporating information from the total genomic database and from an unbiased subset of sequences, succeeded in converting the R81Q eMDH mutant into an enzyme with high specificity for the pyruvate–lactate pair with only four substitutions, each of which contributes an average of 2.3-fold to the total change. Expressed as the fractional gain per mutation (1.3 for the designed set versus 0.04 for the natural one), each designed mutation contributes >30-fold (1.3/0.04 = 32.5) over the average gain exhibited by the ≈220 substitutions separating wild-type MDH and LDH (Table 2). Although the designed mutations all map to the active site region, this method does not rely on structural information and is thus more generally applicable than evolutionary trace analysis (13).

Table 2.

Activity of designed LDHs

Clone | No. of mutations | Relative LDH activity (kcat/Km) | Average gain per mutation, fold*
Start: eMDH R81Q | 0 | 1 |
Designed: eMDH pentamutant† | 4 | 30 | 2.3
Natural end: LDH | ca. 220 | 10⁴ | 1.04

*x⁴ = 30; x = 2.3 and x²²⁰ = 10⁴; x = 1.04.

†eMDH I12V/R81Q/M85E/G210A/V214I mutant.
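The average gain per mutation in Table 2 is a geometric mean: the value x for which x raised to the number of mutations equals the total fold change. A minimal sketch of that calculation, with a hypothetical function name, is given here together with the ratio of fractional gains quoted in the text.

```python
def mean_gain_per_mutation(total_fold, n_mutations):
    """Geometric-mean fold gain x such that x ** n_mutations == total_fold."""
    return total_fold ** (1.0 / n_mutations)

designed = mean_gain_per_mutation(30, 4)     # four designed substitutions, 30-fold
natural = mean_gain_per_mutation(1e4, 220)   # ca. 220 natural substitutions, 10^4-fold

print(round(designed, 2))  # 2.34
print(round(natural, 3))   # 1.043

# Ratio of the fractional gains (x - 1), using the rounded table values.
print(round((2.3 - 1) / (1.04 - 1), 1))  # 32.5
```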

Materials and Methods

Strains and Plasmids.

E. coli reference strain MG1655 was used to prepare the genomic DNA template from which the eMDH gene was PCR-amplified. E. coli strain DH10B (Invitrogen, Carlsbad, CA) was used for both DNA manipulation and protein expression. E. coli strain W945T1-2, λ with genotype of F thr-1 araC14 leuB6(Am) lacY1 glnV44(AS) galK2(Oc) λ trp-59 mdh-2 rpsL265(StrR) xylA5 mtl-1 thi-1 (E. coli Genetic Stock Center, New Haven, CT), which does not express wild-type eMDH, was used as an alternative expression host to obtain enzymes free of wild-type contamination. Plasmid pKK223-3 (GE Healthcare, Piscataway, NJ) was the expression vector for eMDH wild-type and mutant enzymes.

Cloning, Mutagenesis, Expression, and Purification.

The wild-type eMDH gene (mdh) was amplified from genomic DNA from E. coli MG1655 by standard PCR protocols with PfuTurbo proofreading thermophilic DNA polymerase (Stratagene, La Jolla, CA). Two primers used were 5′-GTATGAGCTCTAAGAAGGAGATATACATATGAAAGTCGCAGTCCTCGGC-3′, which includes an optimized Shine–Dalgarno sequence, and 5′-GCGTATGCATGCTCAATTACTTATTAACGAACTCTTCGCCCA-3′. The PCR product was inserted into pKK223-3 for expression. The mutations were introduced into the gene with the QuikChange kit (Stratagene). All wild-type and mutant mdh genes were sequenced across their full length to ensure that no undesired mutations were present. The mutant proteins were expressed at 37°C for 12 h, and purification was accomplished by affinity chromatography with Affi-Gel Blue (Bio-Rad, Hercules, CA) (8).

Identification of Substrate Specificity-Determining Positions.

Two sets of MDH and LDH sequences were used to identify key residues in eMDH that dictate substrate specificity. The first set consists of all available MDH and LDH sequences from the Swiss-Prot database. These sequences were aligned, and a phylogenetic tree was constructed with ClustalW. A representative sequence was arbitrarily chosen from each major clade of the tree to form the second set, and a sequence identity matrix was constructed for these representative sequences to eliminate sequences whose identity is >80% with any other in the set. The second set of sequences was separately aligned with PileUp (GCG Seqweb Wisconsin package; Accelrys, San Diego, CA). The alignment was refined with small changes based on available structure and mechanistic information so that all known catalytically important residues were properly aligned. The alignments were analyzed by venn.out (written by Daniel Malashock, University of California, Berkeley) to determine ≥60% consensus positions in the MDH or the LDH group. The program is available at http://mcb.berkeley.edu/labs/kirsch/research.html. The Venn diagrams were constructed from these data by the procedures described previously (3).
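The redundancy filter based on the sequence identity matrix might look like the following sketch. The greedy keep-or-drop order, the function names, and the toy aligned fragments are all assumptions made for illustration; the text does not specify how the sequence to eliminate was chosen from a pair exceeding 80% identity.

```python
def percent_identity(seq_a, seq_b):
    """Percent identity over aligned columns where neither sequence has a gap."""
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    matches = sum(a == b for a, b in pairs)
    return 100.0 * matches / len(pairs)

def filter_redundant(aligned, max_identity=80.0):
    """Greedily keep sequences that are <= max_identity to every sequence already kept."""
    kept = {}
    for name, seq in aligned.items():
        if all(percent_identity(seq, other) <= max_identity for other in kept.values()):
            kept[name] = seq
    return kept

# Hypothetical aligned fragments (10 columns each).
aligned = {
    "seqA": "MKVAVLGAAG",
    "seqB": "MKVAVLGAAS",  # 9/10 = 90% identical to seqA, so it is dropped
    "seqC": "MKIGVIGA-G",  # 6/9 = 66.7% identical to seqA, so it is kept
}
kept = filter_redundant(aligned)
print(list(kept))  # ['seqA', 'seqC']
```

Because the filter is greedy, the outcome depends on the order in which sequences are visited, which is consistent with the arbitrary choice of representatives described in the text.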

Steady-State Kinetics.

Protein concentrations were determined by Bradford assay (Bio-Rad). MDH and LDH activities were measured with OAA and pyruvate, respectively, and rates of reduction were monitored by the decrease in NADH absorbance at 340 nm on a UV-visible spectrophotometer (model 8453; Agilent, Santa Clara, CA). Background rates were measured in the absence of enzymes. Collected data were fit to the Michaelis–Menten equation with KaleidaGraph (Synergy Software, Reading, PA).
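For readers who wish to reproduce the parameter estimation without KaleidaGraph, a dependency-free sketch follows. It does not perform the nonlinear fit used by the authors; a Hanes–Woolf linearization ([S]/v = [S]/Vmax + Km/Vmax) with ordinary least squares is swapped in to keep the example short. The substrate concentrations are hypothetical, and the rates are noise-free synthetic values generated from the pentamutant parameters in Table 1.

```python
def hanes_woolf_fit(substrate_mM, rates):
    """Estimate (Vmax, Km) by least squares on [S]/v = [S]/Vmax + Km/Vmax."""
    xs = list(substrate_mM)
    ys = [s / v for s, v in zip(substrate_mM, rates)]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                    # equals 1 / Vmax
    intercept = mean_y - slope * mean_x  # equals Km / Vmax
    vmax = 1.0 / slope
    km = intercept * vmax
    return vmax, km

def michaelis_menten(s, vmax, km):
    """Michaelis-Menten rate at substrate concentration s."""
    return vmax * s / (km + s)

# Synthetic data: kcat = 46.5 s^-1 (rate per enzyme) and Km = 13.3 mM from Table 1;
# the pyruvate concentrations below are hypothetical.
conc = [1.0, 2.0, 5.0, 10.0, 20.0, 40.0]
rates = [michaelis_menten(s, 46.5, 13.3) for s in conc]
vmax, km = hanes_woolf_fit(conc, rates)
print(round(vmax, 1), round(km, 1))  # 46.5 13.3
```

With real, noisy data a direct nonlinear fit to the Michaelis–Menten equation is generally preferred, because linearized forms distort the error structure.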

Acknowledgments

We thank Daniel Malashock for assistance in constructing Venn diagrams from multiple sequence alignments and Melinda Hanes for carrying out the EGAD analyses. This work was supported by National Institutes of Health Grant GM35393.

Abbreviations

AATase, aspartate aminotransferase; eMDH, Escherichia coli malate dehydrogenase; LDH, lactate dehydrogenase; MDH, malate dehydrogenase; OAA, oxaloacetate; pentamutant, eMDH with five amino acid substitutions I12V/R81Q/M85E/G210A/V214I; SF, strong forcing set; TATase, tyrosine aminotransferase.

Footnotes

The authors declare no conflict of interest.

Position 102 is the B. stearothermophilus sequence number and corresponds to position 81 in E. coli. The E. coli numbering is used in the manuscript.

This article contains supporting information online at www.pnas.org/cgi/content/full/0708265104/DC1.

References

1. Peters MW, Meinhold P, Glieder A, Arnold FH. J Am Chem Soc. 2003;125:13442–13450. doi: 10.1021/ja0303790.
2. Wang L, Brock A, Herberich B, Schultz PG. Science. 2001;292:498–500. doi: 10.1126/science.1060077.
3. Rothman SC, Kirsch JF. J Mol Biol. 2003;327:593–608. doi: 10.1016/s0022-2836(03)00095-0.
4. Wilks HM, Hart KW, Feeney R, Dunn CR, Muirhead H, Chia WN, Barstow DA, Atkinson T, Clarke AR, Holbrook JJ. Science. 1988;242:1541–1544. doi: 10.1126/science.3201242.
5. Nicholls DJ, Miller J, Scawen MD, Clarke AR, Holbrook JJ, Atkinson T, Goward CR. Biochem Biophys Res Commun. 1992;189:1057–1062. doi: 10.1016/0006-291x(92)92311-k.
6. Madern D. J Mol Evol. 2002;54:825–840. doi: 10.1007/s00239-001-0088-8.
7. Cendrin F, Chroboczek J, Zaccai G, Eisenberg H, Mevarech M. Biochemistry. 1993;32:4308–4313. doi: 10.1021/bi00067a020.
8. Boernke WE, Millard CS, Stevens PW, Kakar SN, Stevens FJ, Donnelly MI. Arch Biochem Biophys. 1995;322:43–52. doi: 10.1006/abbi.1995.1434.
9. Pokala N, Handel TM. Protein Sci. 2004;13:925–936. doi: 10.1110/ps.03486104.
10. Pokala N, Handel TM. J Mol Biol. 2005;347:203–227. doi: 10.1016/j.jmb.2004.12.019.
11. Hall MD, Banaszak LJ. J Mol Biol. 1993;232:213–222. doi: 10.1006/jmbi.1993.1377.
12. Thompson JD, Higgins DG, Gibson TJ. Nucleic Acids Res. 1994;22:4673–4680. doi: 10.1093/nar/22.22.4673.
13. Lichtarge O, Bourne HR, Cohen FE. J Mol Biol. 1996;257:342–358. doi: 10.1006/jmbi.1996.0167.
14. Kalinina OV, Mironov AA, Gelfand MS, Rakhmaninova AB. Protein Sci. 2004;13:443–456. doi: 10.1110/ps.03191704.
15. Hall MD, Levitt DG, Banaszak LJ. J Mol Biol. 1992;226:867–882. doi: 10.1016/0022-2836(92)90637-y.
