Significance
The rate at which proteins accumulate amino acid substitutions during evolution depends on the likelihood that mutations will disrupt structure or affect function. Many mutations affect the ability of proteins to fold correctly, and previous studies showed that the burden imposed by misfolded proteins in cells heavily influences evolutionary rates of proteins. However, these studies could not examine the influence of function on evolutionary rates. The work described here examines the relationship between structural and functional divergence in a rapidly evolving protein family. This analysis revealed that family members that evolved a new function retained more ancestral sequence and structural characteristics, suggesting that the rate of protein evolution is not proportional to the capacity to evolve new functions.
Keywords: enolase superfamily, protein structure, protein structure-function relationships
Abstract
The rate of protein evolution is determined by a combination of selective pressure on protein function and biophysical constraints on protein folding and structure. Determining the relative contributions of these properties is an unsolved problem in molecular evolution with broad implications for protein engineering and function prediction. As a case study, we examined the structural divergence of the rapidly evolving o-succinylbenzoate synthase (OSBS) family, which catalyzes a step in menaquinone synthesis in diverse microorganisms and plants. On average, the OSBS family is much more divergent than other protein families from the same set of species, with the most divergent family members sharing <15% sequence identity. Comparing 11 representative structures revealed that loss of quaternary structure and large deletions or insertions are associated with the family’s rapid evolution. Neither of these properties has been investigated in previous studies to identify factors that affect the rate of protein evolution. Intriguingly, one subfamily retained a multimeric quaternary structure and has small insertions and deletions compared with related enzymes that catalyze diverse reactions. Many proteins in this subfamily catalyze both OSBS and N-succinylamino acid racemization (NSAR). Retention of ancestral structural characteristics in the NSAR/OSBS subfamily suggests that the rate of protein evolution is not proportional to the capacity to evolve new protein functions. Instead, structural features that are conserved among proteins with diverse functions might contribute to the evolution of new functions.
Investigating the causes and effects of protein sequence divergence is the key to identifying properties that enable proteins to evolve new functions. Previous studies found that constraints imposed by biophysical properties such as protein folding and stability, translational accuracy, and interactions with other proteins make a large contribution to the rate of protein evolution (1–11). However, the relative contributions of biophysical properties and functional constraints remain an open question (2). Given that the rate of protein evolution varies over several orders of magnitude, the evolutionary rate of each protein is probably determined by a unique blend of biophysical and functional constraints (12–15). Thus, the evolutionary simulations and statistical analyses of large protein datasets that comprise the primary focus of this field need to be supplemented with case studies.
Here, we present the extraordinarily diverse o-succinylbenzoate synthase (OSBS) family as such a case study. The OSBS family belongs to the enolase superfamily, a group of evolutionarily related protein families that have a common fold but catalyze diverse reactions (16). The rate of sequence divergence in the OSBS family is much faster than in other families in the enolase superfamily. For example, the average pairwise amino acid sequence identity of OSBSs from 66 species is 26%, and the most divergent family members share <15% identity. Enzymes from the same set of species that belong to the enolase family, for which the superfamily is named, average 56% amino acid sequence identity (17). These numbers are inversely related to the evolutionary rate because the proteins are from the same set of species and thus have diverged for the same amount of time. However, these numbers underestimate the difference in evolutionary rate between these families, because sequence identity does not account for the occurrence of multiple mutations per site.
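The effect of multiple mutations per site can be illustrated with a standard distance correction. The sketch below is not the analysis used in this study: it assumes a simple Poisson model, in which the expected number of substitutions per site is d = −ln(identity), and applies it to the two average identities quoted above. The function name is illustrative.

```python
import math


def poisson_distance(identity):
    """Expected substitutions per site under an assumed Poisson model."""
    if not 0.0 < identity <= 1.0:
        raise ValueError("identity must be in (0, 1]")
    # identity = 1 - p, where p is the fraction of differing sites,
    # so d = -ln(1 - p) = -ln(identity).
    return -math.log(identity)


osbs_identity = 0.26     # average pairwise identity, OSBS family (from the text)
enolase_identity = 0.56  # average pairwise identity, enolase family (from the text)

# Naive comparison: ratio of the observed fractions of differing sites.
naive_ratio = (1 - osbs_identity) / (1 - enolase_identity)

# Corrected comparison: ratio of Poisson distances.
corrected_ratio = poisson_distance(osbs_identity) / poisson_distance(enolase_identity)

print(f"naive ratio of divergence: {naive_ratio:.2f}")
print(f"Poisson-corrected ratio:   {corrected_ratio:.2f}")
```

Under this assumed model the corrected ratio (about 2.3) is larger than the naive one (about 1.7), in line with the statement that raw identities underestimate the difference in rate. More realistic substitution models would give different numbers.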
All enzymes in the enolase superfamily consist of a C-terminal (β/α)₈-barrel linked to a capping domain composed of the N terminus and the last section of the C terminus (Fig. 1A). The conserved catalytic residues are in the barrel domain, and two loops from the capping domain form the rest of the active site. The only residues conserved in the whole OSBS family are short motifs surrounding the catalytic residues (17). These motifs are also conserved in members of the enolase superfamily that catalyze other reactions, so they are not sufficient to determine specificity for OSBS activity.
Fig. 1.
(A) Canonical structure of enolase superfamily proteins. The catalytic barrel domain is gray, the N-terminal part of the capping domain is green, the C-terminal part of the capping domain is magenta, and the linker between the domains is blue. Amycolatopsis NSAR/OSBS [Protein Data Bank (PDB) ID code 1SJB] is shown. (B) The o-succinylbenzoate synthase (OSBS) and N-succinylamino acid racemase (NSAR) reactions. Structural similarities of the intermediates are red; blue atoms are lost or rearranged in the reactions. R, hydrophobic amino acid side chain. (C) Distribution of crystal structures in the OSBS family (53). Subfamilies (shown as wedges) were defined by grouping sequences whose pairwise amino acid sequence identity is >40%. Underlined PDB ID codes are structures that were determined in this study. Table 1 lists the species that encode each protein. Quaternary structure is indicated with a superscript letter (d, dimer; m, monomer; o, octamer).
We have ruled out several factors that could have contributed to the high sequence divergence of the OSBS family. First, high sequence divergence is not a general property of the enolase superfamily, as mentioned above. Second, the family’s sequence diversity is not due to convergent evolution, because it has a monophyletic origin (17). Third, the OSBS family did not evolve earlier than related families, as demonstrated by comparing OSBS enzymes to paralogs from the same species (17).
Finally, sequence divergence is not due to functional divergence: most proteins in the family catalyze a conserved step in menaquinone (vitamin K) synthesis (Fig. 1B). The exceptions are promiscuous enzymes that catalyze both the OSBS reaction and a second reaction, N-succinylamino acid racemization (NSAR), which is part of a pathway that converts D-amino acids to L-amino acids (18, 19). The NSAR/OSBS enzymes originated in a single branch of the OSBS family, the NSAR/OSBS subfamily, which also includes proteins that only have OSBS activity (Fig. 1C) (17, 18). Proteins within the NSAR/OSBS subfamily share >40% sequence identity, so the high sequence divergence in the OSBS family was not required to evolve the new activity.
In this work, we compared structural and sequence divergence in the OSBS family by determining the structures of a representative set of OSBS enzymes (Fig. 1C). The most significant difference among these enzymes is their quaternary structure. All OSBS enzymes, except those in the NSAR/OSBS subfamily, are monomers. Monomeric OSBS enzymes accumulated insertions, deletions, and other mutations that caused them to diverge from each other and the rest of the enolase superfamily. In contrast, proteins in the NSAR/OSBS subfamily are multimeric, like nearly all other members of the enolase superfamily. Because of structural constraints imposed by their quaternary structure, the sequences and structures of proteins in the NSAR/OSBS subfamily are much more similar to other members of the enolase superfamily than they are to other OSBS enzymes. Thus, structural divergence is associated with the high evolutionary rate of the OSBS family, whereas functional divergence in the NSAR/OSBS subfamily is associated with retention of ancestral structural characteristics.
Results
Comparison of Activities in the OSBS Family.
This study analyzes a representative set of enzymes from the OSBS family whose pairwise amino acid sequence identity is <20%. Previously, we determined that these divergent enzymes belong to the OSBS family based on phylogeny and/or the presence of their genes in menaquinone synthesis gene clusters (17). We verified their activities by enzymatic assays (Table 1 and Table S1). All family members had similar catalytic efficiencies for the OSBS reaction (kcat/KM = 10⁵ to 10⁶ M⁻¹⋅s⁻¹).
Table 1.
Enzymatic activity and quaternary structure of enzymes in the OSBS family
| Species | Subfamily | OSBS kcat/KM, M⁻¹⋅s⁻¹ | NSAR* kcat/KM, M⁻¹⋅s⁻¹ | PDB ID code(s) | Quaternary structure |
|---|---|---|---|---|---|
| Escherichia coli | γ-Proteobacteria 1 | 2.0 × 10⁶† | n.a.‡ | 1FHV (21) | Monomer |
| Desulfotalea psychrophila | Bacteroidetes | 1.1 × 10⁶ | n.a. | 2PGE | Monomer |
| Thermosynechococcus elongatus | Cyanobacteria 1 | 1.0 × 10⁶ | n.a. | 3H7V, 2OZT | Monomer |
| Bdellovibrio bacteriovorus | Not assigned | 3.1 × 10⁵ | n.a. | 3CAW | Monomer |
| Thermobifida fusca | Actinobacteria | 6.7 × 10⁵§ | n.a. | 2QVH, 2OPJ (52) | Monomer |
| Staphylococcus aureus | Not assigned | 1.1 × 10⁶ | n.a. | 3H70, 2OKT, 2OLA | Monomer |
| Amycolatopsis sp. T-1-60 | NSAR/OSBS | 2.5 × 10⁵¶ | 2.0 × 10⁵ | 1SJB (22) | Octamer |
| Deinococcus radiodurans | NSAR/OSBS | 3.1 × 10⁵ | 3.7 × 10⁵ | 1XS2 (24) | Octamer |
| Thermus thermophilus | NSAR/OSBS | 6.5 × 10⁵ | 7.5 × 10⁴ | 2ZC8 (23) | Octamer |
| Enterococcus faecalis | NSAR/OSBS | 1.6 × 10⁶ | 1.4 × 10⁵ | 1WUE | Dimer |
| Listeria innocua | NSAR/OSBS | 2.9 × 10⁶ | 2.6 × 10³ | 1WUF | Dimer‖ |
Table S1 lists all kinetic parameters.
*N-Succinyl-L-phenylglycine was the substrate.
†OSBS activity was measured in ref. 53.
‡n.a., not active. NSAR activity was measured using 10 μM enzyme and 20 mM N-succinyl-L-phenylglycine.
§OSBS activity was measured in ref. 52.
‖On size exclusion chromatography, it primarily elutes as a dimer, although it also has a significant monomer peak (Fig. S1).
NSAR activity was only detected in the NSAR/OSBS subfamily. The two previously uncharacterized members of the NSAR/OSBS subfamily also catalyze the NSAR reaction, like other members of this subfamily (20). Enterococcus faecalis NSAR/OSBS is encoded in a menaquinone synthesis operon, indicating that OSBS is its biological function (17). A pathway that requires NSAR activity has not been identified in this species, so whether NSAR is also a biological activity is unknown. Listeria innocua has the menaquinone synthesis pathway, indicating that the species requires OSBS activity. However, the NSAR/OSBS is not encoded in the menaquinone operon, raising the possibility that both NSAR and OSBS are biological functions, as observed in the NSAR/OSBS from Geobacillus kaustophilus (17, 19).
Quaternary Structure of OSBS Enzymes.
Crystal structures of OSBS family members from 11 species have been determined, including 6 reported in this work (Table 1). All of them have the canonical enolase superfamily fold, but their quaternary structures are not conserved, as determined from crystal packing and size exclusion chromatography (Fig. S1). Like most other members of the enolase superfamily, the five enzymes from the NSAR/OSBS subfamily are multimers (21–40). The three previously characterized NSAR/OSBS subfamily enzymes are octamers, and the NSAR/OSBS subfamily enzymes from E. faecalis and L. innocua are dimers (22–24). In contrast, the OSBSs from other subfamilies are all monomers.
Structural Comparison of OSBS Monomers.
We compared OSBS family structures to 51 other members of the enolase superfamily using TM-align (Fig. 2) (41). The TM score was used because it considers both RMSD between aligned residues and coverage (fraction of residues in the proteins that were aligned) (42). The TM score is 1 for identical structures, >0.5 for structures that have the same fold, and <0.2 for unrelated structures. As expected from sequence divergence between OSBS subfamilies, structural divergence between OSBS subfamilies is much higher than the divergence within the NSAR/OSBS subfamily (columns 1 and 2 versus column 3 in Fig. 2A).
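The way the TM score combines distance and coverage can be seen in its standard scoring formula. The sketch below evaluates that formula for a single alignment that has already been superimposed; TM-align also searches for the alignment and superposition that maximize the score, which is not reproduced here. The function names and inputs are illustrative, and the choice of which protein length to normalize by is left to the caller.

```python
def tm_d0(length):
    """Length-dependent distance scale d0 (in angstroms) of the TM score."""
    d0 = 1.24 * (length - 15) ** (1.0 / 3.0) - 1.8 if length > 15 else 0.0
    if d0 <= 0:
        raise ValueError("chain too short for the d0 formula in this sketch")
    return d0


def tm_score(distances, length):
    """TM score for one fixed superposition.

    distances: distances (angstroms) between each pair of aligned residues.
    length: number of residues in the protein used for normalization.
    Residues that are not aligned contribute nothing, so low coverage
    lowers the score even when the aligned residues fit well.
    """
    d0 = tm_d0(length)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / length


# Toy cases for a 300-residue protein (hypothetical distances).
identical = tm_score([0.0] * 300, 300)      # every residue aligned perfectly
half_covered = tm_score([0.0] * 150, 300)   # only half of the residues aligned
print(identical, half_covered)
```

In the two toy cases, perfect superposition of all residues gives a score of 1, whereas perfect superposition of only half of the residues gives 0.5, which shows why the score penalizes poor coverage as well as large distances.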
Fig. 2.
Structural divergence of the OSBS family comparing (A) full-length proteins, (B) barrel domains, or (C) capping domains. Each point represents the TM score of a pair of proteins. The gray bars are the average TM score of each set. The average percentage sequence identity (number of identical residues divided by the length of the structural alignment) is shown in A to illustrate that the sequence divergence in the OSBS family is similar to the divergence in the whole superfamily. Because calculated percentage identity varies by several percent depending on how the sequences are aligned, the difference between 16% and 23% might not be significant (60). The proteins compared in each column are as follows: (1) OSBS family structures, excluding the NSAR/OSBS subfamily; (2) NSAR/OSBS subfamily structures compared with OSBSs from other subfamilies; (3) NSAR/OSBS subfamily structures, excluding other OSBS subfamilies; (4) OSBS family structures, excluding the NSAR/OSBS subfamily, compared with proteins from other families in the enolase superfamily; (5) NSAR/OSBS subfamily structures compared with proteins from other families in the enolase superfamily; (6) structures from other families in the enolase superfamily, excluding the OSBS family.
Given that a new function evolved in the NSAR/OSBS subfamily, one might expect that structures from this subfamily would have diverged from the rest of the enolase superfamily as much or more than other OSBS enzymes. The opposite is true: proteins in the NSAR/OSBS subfamily are more similar to proteins from other families in the enolase superfamily than to other members of the OSBS family (columns 4–6 in Fig. 2A). The other OSBS subfamilies have diverged both from each other and from the rest of the enolase superfamily.
To determine which parts of the structure have diverged the most, we analyzed the barrel and capping domains separately. The structural divergence of the barrel domain is similar to the full-length protein (Fig. 2B). However, the capping domain is much more divergent, both within the OSBS family and among other members of the enolase superfamily (Fig. 2C). Restricting the analysis to structures bound to substrate or product analogs produced the same result, indicating that divergence of the capping domain is not an artifact from comparing apo- and ligand-bound structures.
Subdividing the capping domain into smaller regions showed that the linker between the capping and barrel domains and the C-terminal portion of the capping domain have extremely low TM scores (∼0.3 within the OSBS family compared with ∼0.45 for the whole enolase superfamily; Table S2 and Fig. 1A). The conformation of the linker in the NSAR/OSBS subfamily is similar to that of other enolase superfamily members (Fig. 3). In other OSBS subfamilies, deletions in the linker resulted in loss of a short helix and an extended conformation. The length of the C-terminal section of the capping domain is especially variable, and extensions at the C terminus lie in a variety of directions relative to the rest of the protein (Fig. S2).
Fig. 3.
Structural divergence of the linker between the barrel and capping domains. (A) OSBS proteins, excluding the NSAR/OSBS subfamily (blue; PDB ID codes 1FHV, 2OKT, 2OZT, 2PGE, 2QVH, and 3CAW). (B) The NSAR/OSBS subfamily (pink; PDB ID codes 1SJB, 1WUE, 1WUF, 1XS2, and 2ZC8). (C) Other members of the enolase superfamily (purple; PDB ID codes 1EBG, 1EC8, 1KKR, 2PMQ, 2QJN, 1TKK, 2MNR, and 3DGB). The entrance to the active site is behind the structures and is marked with a yellow circle in B.
Insertions and Deletions in the OSBS Family.
Insertions and deletions (indels) are mainly responsible for structural divergence of the capping domain in the OSBS family (Fig. 4). The average number and length of indels in the enolase superfamily are 7.5 and 4.0, respectively (Table S3). Similarly, the monomeric OSBSs have 8.8 indels that are 4.8 residues long, on average. Although the average number of indels in the NSAR/OSBS subfamily is similar (8.6), the average length (1.4) is much shorter. Within OSBS subfamilies, the positions of most indels are conserved, although the length can vary.
Fig. 4.
Minimum number of insertions and deletions (indels) in OSBS enzymes. Seven OSBS family members (black) are compared with three other members of the enolase superfamily (gray): muconate lactonizing enzyme (MLE) from Pseudomonas fluorescens (PDB ID code 3DGB), dipeptide epimerase (DE) from Bacillus subtilis (PDB ID code 1TKK), and mandelate racemase (MR) from Pseudomonas putida (PDB ID code 2MNR). The first schematic shows the typical secondary structure of enolase superfamily proteins, with green representing β-sheets, pink representing α-helices, and yellow, cyan, and gray representing loops and linkers. Deletions are black, and insertions are red. White regions are gaps that align with insertions in other sequences. The length of each colored segment is proportional to the number of amino acids. Asterisks indicate the positions of the conserved catalytic residues. The total number of indels listed excludes length heterogeneity at the N and C termini.
Most indels in the capping domain are distant from the active site. The second α-helix of the N-terminal capping domain is missing or truncated in four OSBSs (Escherichia coli, Thermosynechococcus elongatus, Bdellovibrio bacteriovorus, and Thermobifida fusca), but it is present in most other members of the enolase superfamily (Figs. 3 and 4). Both the first and second α-helices are deleted in T. fusca OSBS. In Desulfotalea psychrophila OSBS, another helix is inserted after the second α-helix. This insertion is uncommon in the Bacteroidetes subfamily, and enzymes without the insertion lack both the first and second α-helices of the capping domain. Deletions also occur in the linker between the capping and barrel domains in many OSBSs, which is at the end of the third α-helix. Strikingly, the positions of indels in the capping domain helices and linker in monomeric OSBSs are at the interface between subunits in NSAR/OSBS enzymes and other multimeric enolase superfamily members (Fig. 5).
Fig. 5.
Many deletions are located at lost subunit interfaces. (A) Residues that are at subunit interfaces in the octameric Amycolatopsis NSAR/OSBS (1SJB) are red. (B) Positions at which insertions or deletions occur in monomeric OSBS enzymes from other subfamilies are red. The active site is marked with a yellow circle.
Discussion
Our study highlights two structural features that affected the rate of protein evolution in the OSBS family: quaternary structure and indels. Most studies to identify factors that affect the rate of protein evolution have not considered these features because they used large, multifamily datasets that lack experimental information about quaternary structure or accurate alignments to determine positions of indels. As a result, case studies on model systems like the OSBS family provide critical insights into factors affecting the structural and functional evolution of proteins. The large insertions and deletions in the capping domains of several OSBSs could have accelerated the evolutionary rate of amino acid substitutions to compensate for structural perturbations. Indeed, the evolutionary rates of inserted residues and residues flanking deletions are higher than expected based on their solvent accessibility in several monomeric OSBS subfamilies (Fig. S3A). This result agrees with previous studies that found higher mutation rates near the sites of indels (43, 44).
Previous studies also did not consider the role of homomeric quaternary structure in determining evolutionary rates. However, several studies have shown that protein–protein interactions decrease evolutionary rates, partly by decreasing the fraction of surface-exposed residues (5–9, 45). Likewise, interactions with large capping domains in the haloalkanoate dehalogenase superfamily constrain the structural divergence of their Rossmann-fold core domain (46). Our observation that the OSBS family, which is primarily composed of monomers, evolved at a faster rate than related, homomultimeric families is consistent with these studies. These studies would also suggest that buried residues at the interface between subunits in NSAR/OSBS enzymes would have slower evolutionary rates than homologous, solvent-exposed residues in monomeric OSBS enzymes. Our results offer some support for this idea, but only a small number of sites fit these criteria (Fig. S3B).
Other studies have calculated frequencies of deletions in protein superfamilies and assessed their effect on functional divergence. Reeves et al. (47) reported that the average indel length of proteins with <20% identity is 6.6, which declines to 3.5 for proteins having 20–40% sequence identity. The OSBS family is in the middle of this range, with an average indel length of 4.3 among proteins that have ∼20% sequence identity. Previous studies also noted a steep decline in both structural and functional similarity below ∼30% sequence identity (47, 48). However, OSBS activity has been conserved despite divergence of the tertiary and quaternary structure. Counterintuitively, functional divergence occurred in the subfamily that retained the most structural similarity to functionally diverse members of the enolase superfamily.
Analyzing the structures of 11 OSBS enzymes revealed that absence of quaternary structure is associated with the structural and sequence diversification of the OSBS family. The fact that nearly all other proteins in the enolase superfamily are multimers suggests that the common ancestor of the OSBS family was also a multimer. If so, loss of quaternary structure permitted the extreme structural and sequence divergence seen in the OSBS family. This scenario is supported by rooting the phylogenetic tree using closely related families as the outgroup (17, 49). The root falls between the NSAR/OSBS subfamily and the other OSBS subfamilies, suggesting that quaternary structure was lost once (Fig. 1C). The alternative scenario is that the root falls within the OSBS family. If so, an ancestral, monomeric OSBS would have given rise to homomultimeric descendants that subsequently experienced functional divergence to give rise to proteins with OSBS, NSAR, dipeptide epimerase, muconate cycloisomerase, and other activities. We cannot exclude this possibility because of challenges associated with rooting the phylogeny of paralogous proteins. However, it is less parsimonious than invoking loss of quaternary structure as the driving force for divergence of the OSBS family.
Is loss of quaternary structure sufficient to explain the extreme sequence divergence of monomeric OSBSs? The uncatalyzed rate of the OSBS reaction is 1,000 times faster than the uncatalyzed rate of mandelate racemization, a reaction catalyzed by a related family (50, 51). Authors of these studies suggested that the relatively high uncatalyzed rate of the OSBS reaction might be associated with greater tolerance of mutations and thus a higher evolutionary rate (50). To date, our data do not support this idea, although experiments have been limited to a small number of active-site residues. Mutations of active-site residues have similar effects in E. coli OSBS, T. fusca OSBS, and P. putida mandelate racemase (MR), reducing kcat/KM by ∼10- to 500-fold (52–55).
Instead, our data suggest a model in which the active sites of OSBS enzymes diverged to compensate for (or were permitted to diverge by) mutations that affected the structure outside the active site, such as deletions at former subunit interfaces. Given the large structural changes associated with loss of quaternary structure and indels, the divergence of OSBS enzymes is probably irreversible. Indeed, mutagenesis to swap amino acids at homologous positions in E. coli and T. fusca OSBS enzymes was deleterious (52). This is similar to the observed mutational epistasis in other proteins, such as the glucocorticoid receptors, although structure, not specificity, has diverged among monomeric OSBSs (56).
The only members of the OSBS family that have NSAR activity are in the NSAR/OSBS subfamily, which includes both promiscuous NSAR/OSBS enzymes and enzymes that catalyze only the OSBS reaction. Remarkably, enzymes in the NSAR/OSBS subfamily are more similar to members of the enolase superfamily that have diverse functions than they are to proteins in other OSBS subfamilies. This raises the possibility that retention of ancestral sequence and structural features contributed to the evolution of NSAR activity.
This idea contrasts with the concept of designability as proposed by England and Shakhnovich (57). They define designability as the number of sequences that are capable of folding into a specific topology below a certain energy threshold. Bloom et al. (58) related designability to high rates of protein evolution, which are enhanced due to structural features such as higher densities of interresidue contacts. This concept might be useful when considering designing protein structures, but it may not be applicable to designing new protein functions. Instead, our results show that family members that evolved a new function retained more ancestral sequence and structural characteristics, suggesting that the rate of protein evolution is not proportional to the capacity to evolve new functions.
Materials and Methods
Biochemical Methods.
Genes for OSBS family enzymes from Staphylococcus aureus (menC), T. elongatus (Tlr1174), D. psychrophila (DP0251), B. bacteriovorus (Bd0547), E. faecalis (EF0450), and L. innocua (lin2664) were cloned into N- or C-terminal His-tag vectors for protein expression and purification. Detailed methods for protein production, structure determination, size exclusion chromatography, and catalytic activity assays are in SI Materials and Methods (Table S4).
Mapping Insertions and Deletions.
To accurately map insertions and deletions, 62 proteins from the enolase superfamily were aligned using University of California San Francisco Chimera (59). The alignment was manually refined based on visual inspection of the structural alignment. Positions of insertions and deletions were determined by comparing each protein to the consensus of the structural alignment. Ideally, the consensus would represent the ancestral structure of the enolase superfamily. However, the lengths of some regions are heterogeneous throughout the enolase superfamily, making it difficult to determine the ancestral state. The C-terminal section of the capping domain has additional indels, but they were not enumerated because this region is difficult to align (Fig. S2). To determine the number of indels in each sequence, indels that were separated by fewer than five residues were considered a single indel, to account for inaccuracies in the alignment. Also, indels longer than one residue could represent multiple insertion and deletion events. Consequently, Fig. 4 and Table S3 report the minimum number of indels.
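The counting rule described above can be sketched in a few lines. This is a minimal illustration, not the procedure used to produce Fig. 4 or Table S3: it assumes that each protein and the consensus are available as equal-length gapped strings, merges runs of indel columns that lie close together (threshold `merge_within`, default 5, with the strictness of the comparison being an assumption), and ignores length differences at the termini. All names and the toy sequences are hypothetical.

```python
def count_indels(consensus, sequence, merge_within=5):
    """Count indels in `sequence` relative to `consensus`.

    Both arguments are rows of the same alignment ('-' marks a gap).
    Indel columns separated by fewer than `merge_within` aligned columns
    are merged into one event. Returns (number_of_indels, mean_length).
    """
    if len(consensus) != len(sequence):
        raise ValueError("alignment rows must have the same length")

    # Columns in which both rows have a residue.
    aligned = [i for i, (c, s) in enumerate(zip(consensus, sequence))
               if c != "-" and s != "-"]
    if not aligned:
        return 0, 0.0
    first, last = aligned[0], aligned[-1]  # trims terminal length differences

    events = []        # length (in residues) of each merged indel
    current = 0        # length of the indel currently being built
    matched_since = 0  # aligned columns seen since the last indel column
    for i in range(first, last + 1):
        c, s = consensus[i], sequence[i]
        if c == "-" and s == "-":
            continue  # gap in both rows, introduced by other sequences
        if c == "-" or s == "-":  # insertion or deletion column
            if current and matched_since >= merge_within:
                events.append(current)  # far from the previous indel: new event
                current = 0
            current += 1
            matched_since = 0
        else:
            matched_since += 1
    if current:
        events.append(current)

    mean_length = sum(events) / len(events) if events else 0.0
    return len(events), mean_length


consensus = "MKTAYIAKQRQISFVKSHFSRQLEER"  # toy consensus, 26 residues
far_apart = consensus[:3] + "--" + consensus[5:17] + "---" + consensus[20:]
close_together = consensus[:3] + "--" + consensus[5:7] + "-" + consensus[8:]
print(count_indels(consensus, far_apart))
print(count_indels(consensus, close_together))
```

In the first toy sequence the two deletions are 12 residues apart and are counted separately; in the second they are 2 residues apart and are merged into a single indel of length 3. Because merged or multi-residue indels may reflect several events, counts obtained this way are minimum estimates.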
Other Bioinformatics Methods.
Detailed procedures for automated structural alignment and calculation of evolutionary rates are in SI Materials and Methods.
Acknowledgments
We acknowledge the efforts of all New York SGX Research Center for Structural Genomics (NYSGXRC) and New York Structural Genomics Research Consortium (NYSGRC) personnel who contributed to the structure determination and manuscript preparation. We thank Dr. Larry Dangott of the Protein Chemistry Laboratory at Texas A&M University for assistance with size exclusion chromatography. We thank Jacob Jipp and Christine Jones for assistance with subcloning and Dr. Andrew McMillan for insightful comments on the manuscript. This work was supported by Grant A-1758 from the Robert A. Welch Foundation and Grant 000517-0016-2011 from the Norman Hackerman Advanced Research Program (to M.E.G.). The NYSGXRC was supported by National Institutes of Health (NIH) Grant U54 GM074945 (to S.K.B.). The NYSGRC was supported by NIH Grant U54 GM094662 (to S.C.A.). Diffraction data were collected at the National Synchrotron Light Source, Brookhaven National Laboratory [supported by the US Department of Energy (DOE), Office of Science, Office of Basic Energy Sciences, under Contract DE-AC02-98CH10886]: beamlines X29A and X12C (supported by Grant P30-EB-009998 from the National Institute of Biomedical Imaging and Bioengineering to The Center for Synchrotron Biosciences, and the NIH National Center for Research Resources and the DOE Office of Biological and Environmental Research to the National Synchrotron Light Source Research Resource for Macromolecular Crystallography); beamline X9A (supported by Rockefeller University, Albert Einstein College of Medicine, and Sloan Kettering Institute); and beamline X4A (supported by the New York Structural Biology Center). 
Diffraction data were also collected at the Advanced Photon Source, Argonne National Laboratory (supported by the DOE, Office of Science, Office of Basic Energy Sciences, under Contract DE-AC02-06CH11357): LRL-CAT 31-ID-D beamline (data collection at sector 31 of the Advanced Photon Source was provided by Eli Lilly, which operates the facility).
Footnotes
The authors declare no conflict of interest.
This article is a PNAS Direct Submission. D.L.T. is a guest editor invited by the Editorial Board.
Data deposition: The atomic coordinates and structure factors have been deposited in the Protein Data Bank, www.pdb.org (PDB ID codes 2OKT, 2OLA, 3H70, 2OZT, 3H7V, 2PGE, 3CAW, 1WUE, and 1WUF).
This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1318703111/-/DCSupplemental.