Abstract
The emergence of genomics; ongoing computational advances; and the development of large-scale sequence, structural, and functional databases have created important new interdisciplinary linkages between molecular evolution, molecular biology, and enzymology. The five minireviews in this series survey advances and challenges in this burgeoning field from complementary perspectives. The series has three major themes. The first is the evolution of enzyme superfamilies, in which members exhibit increasing sequence, structural, and functional divergence with increasing time of divergence from a common ancestor. The second is the evolutionary role of promiscuous enzymes, which, in addition to their primary function, have adventitious secondary activities that frequently provide the starting point for the evolution of new enzymes. The third is the importance of in silico approaches to the daunting challenge of assigning and predicting the functions of the many uncharacterized proteins in the large-scale sequence and structural databases that are now available. A recent computational advance, the use of protein similarity networks that map functional data onto proteins clustered by similarity, is presented as an approach that can improve functional insight and inference. The three themes are illustrated with several examples of enzyme superfamilies, including the amidohydrolase, metallo-β-lactamase, and enolase superfamilies.
Keywords: Enzyme Mechanisms, Enzyme Structure, Enzymes, Evolution, Molecular Evolution, Enzyme Superfamily
Introduction
As Theodosius Dobzhansky famously remarked, “Nothing in biology makes sense except in the light of evolution.” Nevertheless, for many decades, protein science and evolutionary biology were largely separate and distinct scientific disciplines, with knowledge in one field often seemingly having little relevance to the other. However, as a result of the emergence of genomics; large-scale sequence, structural, and functional databases; and powerful new computational approaches, insights from molecular evolution are now guiding and informing biochemical investigations and vice versa. The wealth of microbial genomes that are now available and the ease with which function can be manipulated and analyzed experimentally both in vitro and in vivo make enzyme evolution a particularly fruitful area of investigation. The five minireviews in this series provide reinforcing and complementary perspectives on what has been learned from post-genomic multidisciplinary investigations of enzyme evolution and what remains to be learned.
The minireviews have several recurring themes, framed in terms that may be unfamiliar to some readers but that are defined here. The most fundamental concept is the existence of protein families, whose members share a common evolutionary ancestor. The later in evolutionary history two family members diverged from a common ancestor, the greater their sequence, structural, functional, and mechanistic similarity. Thus, a family consists of a group of enzymes that have diverged relatively recently and share similar three-dimensional structures and functions. Superfamilies are made up of several families that diverged earlier in evolutionary time and therefore share fewer common features. Members of specificity diverse superfamilies, as defined by Gerlt and Babbitt in an earlier minireview (1), catalyze the same chemical reaction but do not share the same substrate specificities and have less sequence similarity than members of the same family. Mechanistically diverse superfamilies diverged even earlier and have greater diversity: many of their constituent families show no significant sequence similarity and do not share the same substrate specificities, so families within them catalyze different overall reactions and share only some mechanistic attributes. Finally, enzymes that have different substrates and mechanisms, and that generally share only a common fold and active site residues that have evolved to perform different mechanistic roles in each family, can be considered to form suprafamilies.
The concept of enzyme promiscuity is a second theme in several minireviews. Promiscuous activities are adventitious secondary activities that do not play a physiological role and that do not normally affect the fitness of the organism. Even though they are very inefficient, promiscuous enzyme activities provide an excellent starting place for the evolution of new enzymes. Copley places the evolution of new enzymes in a systems biology context. As she describes, the earliest enzymes probably had low specificity and low catalytic efficiency. The first steps in evolving today's highly specific and efficient enzymes were probably duplications of genes encoding primitive enzymes, followed by divergence of the duplicated genes to generate a family of enzymes with increased specificity and catalytic efficiency. However, at some later time, possibly within a billion years of the origin of life, the usual starting point in evolving new enzymes became not generalist enzymes but the promiscuous activities of specialized enzymes. As discussed by Copley, the potential for evolution of a new enzyme by this mechanism depends upon the collection of enzymes in a microbe, the topology of the metabolic network, the environmental conditions, and the net effect of the trade-offs between the original and novel activities of the enzyme. Examples in this minireview, drawn from several metabolic pathways, illustrate and validate this conceptual framework.
The role of divergent, convergent, and parallel evolution of protein superfamilies is the central theme of the review by Elias and Tawfik. As defined by these authors, divergent families arose from the same common ancestor and are typically in the same superfamily, sharing common structural and functional motifs and significant sequence similarity. In contrast, although convergent families have similar structural and functional features, they arose from different ancestors, belong to different superfamilies, and do not share detectable sequence similarity. However, Elias and Tawfik emphasize that families that are now considered to be examples of convergent evolution may, with further investigation, prove to be examples of parallel evolution from a more distant, ancient ancestor. This minireview touches on two examples of convergent evolution, in glycosidases and membrane proteases, before turning to quorum-quenching lactonases (QQLs) as a possible example of parallel evolution. Three different families of QQLs, belonging to three different superfamilies, have been identified. However, the QQLs in all three superfamilies have the same key active site residues and strikingly similar active site configurations. In all three superfamilies, a promiscuous organophosphate hydrolase activity, completely unrelated to lactonase activity, also has appeared. This promiscuous activity has given rise to organophosphate hydrolases that specifically hydrolyze paraoxon, a derivative of parathion, the first widely used organophosphate pesticide. Currently, these families would be considered an example of convergent evolution of enzymes with different ancestries. However, if the relationship of the three superfamilies to a common ancient ancestor has simply not yet been discovered, the common features of the three QQL families would be the result of parallel, rather than convergent, evolution. The authors argue that the unexpected similarities between the active sites and predicted mechanisms of the lactonases and organophosphate hydrolases and the appearance of organophosphate hydrolase activity in all three superfamilies are further evidence for parallel evolution.
Galperin and Koonin provide well-developed examples of the interplay between convergent and divergent evolution in both eukaryotic and prokaryotic genomes in five superfamilies: ATP-grasp, alkaline phosphatase, cupin, HD hydrolase, and N-terminal nucleophile hydrolase. Several enzymes in these superfamilies are moonlighters, promiscuous enzymes that have other roles, often structural, in atypical environments. This minireview also introduces a new concept: that maintaining the active site in a strained conformation (the entatic state, first identified by Vallee and Williams (2)) that enables it to bind substrates and form the transition state is likely to be one of the most important constraints leading to conservation of sequence motifs and active site residues in enzyme superfamilies throughout evolution. In contrast, the subsequent fate of the transition complex is not necessarily conserved, limiting our current ability to predict function from sequence.
The last two reviews, by Gerlt et al. and Brown and Babbitt, address the daunting challenge of assigning and predicting functions of the many proteins of unknown function in large-scale sequence and structure databases. Gerlt et al. took on this challenge for the large and mechanistically diverse enolase superfamily, which has >7200 members. This minireview compares three widely different approaches: developing inferences from genomic context and testing those inferences through knock-out experiments, screening physical libraries for possible substrates, and in silico ligand docking experiments. The experience of these investigators led them to conclude that, although all three strategies have had some success, in silico approaches have the highest throughput capabilities and will therefore become increasingly important as databases grow.
Building on this conclusion, Brown and Babbitt note how the huge increase in sequencing data opens up opportunities for large-scale mapping of sequences and structures to provide a more informative context than has previously been available for inference and interpretation of functional properties in enzyme superfamilies. This minireview describes and assesses emerging large-scale computational approaches and introduces protein similarity networks to provide summary views of what is known and unknown about large protein superfamilies in functional space. The authors view networks not as a substitute for phylogenetic analysis but as an approach that enables orthogonal data to be mapped onto proteins clustered by similarity and used to improve functional insight and inference. Discussions of the application of this approach to the eukaryotic protein kinase-like superfamily, acid-sugar dehydratases from the enolase superfamily in the human intestinal microbiome, and the GST superfamily illuminate the challenges and opportunities of applying freely available computational tools to other problems.
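The idea of a protein similarity network can be made concrete with a toy example. The sketch below is illustrative only and is not the implementation used by Brown and Babbitt: the protein identifiers (P1 to P6), the pairwise similarity scores, the threshold, and the function labels are all invented for the illustration. Proteins are nodes, an edge joins two proteins whose similarity score meets the threshold, clusters are taken as connected components, and known functional annotations are then mapped onto the clusters to show what is characterized and what is not.

```python
from collections import defaultdict


def build_similarity_network(scores, threshold):
    """Build an undirected network from pairwise similarity scores.

    `scores` maps (protein_a, protein_b) pairs to a similarity score.
    An edge is kept only when the score is at least `threshold`;
    every protein appears as a node even if it has no edges.
    """
    adjacency = defaultdict(set)
    for (a, b), score in scores.items():
        adjacency[a]  # touch the keys so isolated proteins remain nodes
        adjacency[b]
        if score >= threshold:
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency


def connected_clusters(adjacency):
    """Return the connected components of the network as sets of proteins."""
    seen = set()
    clusters = []
    for start in sorted(adjacency):
        if start in seen:
            continue
        stack = [start]
        component = set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.add(node)
            stack.extend(adjacency[node] - seen)
        clusters.append(component)
    return clusters


def annotate_clusters(clusters, known_functions):
    """Map known functions onto each cluster and list uncharacterized members."""
    summary = []
    for cluster in clusters:
        functions = sorted({known_functions[p] for p in cluster if p in known_functions})
        unknown = sorted(p for p in cluster if p not in known_functions)
        summary.append(
            {"members": sorted(cluster), "functions": functions, "uncharacterized": unknown}
        )
    return summary


# Hypothetical data: identifiers, scores, and function labels are invented.
toy_scores = {
    ("P1", "P2"): 80, ("P2", "P3"): 65, ("P1", "P3"): 20,
    ("P4", "P5"): 90, ("P3", "P4"): 10, ("P5", "P6"): 5,
}
toy_functions = {"P1": "function A", "P4": "function B"}

network = build_similarity_network(toy_scores, threshold=50)
for row in annotate_clusters(connected_clusters(network), toy_functions):
    print(row)
```

In this toy network, a cluster containing one characterized protein suggests a hypothesis about its uncharacterized neighbors, whereas a cluster with no annotated member marks unexplored functional space. Raising or lowering the threshold changes how finely the proteins are partitioned, so any such inference is only as good as the choice of score and cutoff.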
Collectively, these minireviews illustrate the potential of multidisciplinary efforts that combine genomics, structural biology, computation, bioinformatics, in vitro investigations of mechanism, and in vivo analyses of function. Although very few investigations will involve all of these approaches, their use in various combinations will inform and accelerate investigations of protein evolution, structure, function, and mechanism.
Footnotes
- QQL: quorum-quenching lactonase.
REFERENCES
- 1. Gerlt J. A., Babbitt P. C. (2001) Annu. Rev. Biochem. 70, 209–246
- 2. Vallee B. L., Williams R. J. P. (1968) Proc. Natl. Acad. Sci. U.S.A. 59, 498–505
