Author manuscript; available in PMC: 2012 Apr 1.
Published in final edited form as: Curr Opin Struct Biol. 2011 Feb 24;21(2):180–188. doi: 10.1016/j.sbi.2011.02.001

Figure 2. Evolutionary Trace Annotation (ETA) of protein function.

A. ETA proceeds through the following steps. 1) The Evolutionary Trace (ET) [55] aligns homologous sequences and ranks positions according to the correlation between evolutionary divergence and amino acid variations. 2) The protein structure is labeled with these evolutionary importance rankings. 3) A heuristic selects clustered, surface-exposed and evolutionarily important amino acids to form a structural template (red spheres). 4) A library of proteins with known function is searched for matches (called hits) to this template. A support vector machine (SVM) filter discards hits that do not fall on top-ranked ET residues (not depicted). 5–8) A reciprocal match is then sought by repeating steps 1–4 in the opposite direction; in the example shown, one is found.

B. ETA matches define a graph. Each protein chain is a node, and structural and evolutionary similarities are the edges. Some nodes are known to carry a given function (blue), other nodes are known not to carry that function (white), and the functional status of the remaining nodes is unknown (?). The labels are then transferred among all nodes in the network based on the number of edges and their strength, in a process analogous to diffusion. The result is a score for every enzymatic function at every node. Finally, these scores are normalized and compared (not depicted). The predicted functional label is the one with the highest normalized weight (called z-score) that is also significant.

C. Performance comparison of ETA network diffusion versus BLAST on a test set of structural genomics proteins. Diffusion of enzymatic function annotations showed a consistent accuracy advantage of approximately 9% over BLAST across many coverage levels [80].

D. UV absorbance (y-axis) confirms the predicted carboxylesterase activity of a previously unannotated protein from the medically relevant organism Staphylococcus aureus (3h04 in the Protein Data Bank). ETA network diffusion predicted this enzymatic function, which was then tested and confirmed in vitro. Specific activity was similar to that of a known carboxylesterase; the negative control, bovine serum albumin (BSA), had no activity.
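The label transfer described in panel B can be illustrated with a small sketch. The caption does not specify the diffusion scheme, the edge weights, the normalization, or the significance cutoff, so all of those are assumptions here: the code uses a simple clamped label-propagation iteration as a stand-in for the diffusion step, computes each z-score against the scores of the unlabeled nodes for the same function, and applies a hypothetical cutoff `z_cutoff`. The node names, function names, and weights are invented toy data, not values from the paper.

```python
from statistics import mean, pstdev


def diffuse(edges, known, n_iter=50):
    """Spread one function's labels over a weighted, undirected graph.

    edges: {(u, v): weight}; known: {node: +1.0 (has function) or -1.0 (lacks it)}.
    Assumption: clamped label propagation stands in for the diffusion step.
    """
    neighbors = {}
    for (u, v), w in edges.items():
        neighbors.setdefault(u, []).append((v, w))
        neighbors.setdefault(v, []).append((u, w))
    # Unlabeled nodes start at 0; labeled nodes keep their known value.
    scores = {n: known.get(n, 0.0) for n in neighbors}
    for _ in range(n_iter):
        updated = {}
        for node, nbrs in neighbors.items():
            if node in known:
                updated[node] = known[node]  # clamp known labels
            else:
                total = sum(w for _, w in nbrs)
                # Weighted average of neighbor scores: more and stronger
                # edges to labeled nodes pull the score harder.
                updated[node] = sum(w * scores[m] for m, w in nbrs) / total
        scores = updated
    return scores


def predict(edges, known_by_function, query, z_cutoff=1.0):
    """Return (function, z) with the highest significant z-score, or None.

    Assumption: z is computed against the unlabeled nodes' scores for the
    same function, and z_cutoff is a hypothetical significance threshold.
    """
    best = None
    for function, known in known_by_function.items():
        scores = diffuse(edges, known)
        unlabeled = [s for n, s in scores.items() if n not in known]
        if len(unlabeled) < 2:
            continue
        sigma = pstdev(unlabeled)
        if sigma == 0:
            continue
        z = (scores[query] - mean(unlabeled)) / sigma
        if z >= z_cutoff and (best is None or z > best[1]):
            best = (function, z)
    return best


# Toy network (hypothetical): A and B are annotated, Q, R and S are not.
EDGES = {
    ("Q", "A"): 0.9, ("Q", "B"): 0.1,
    ("R", "B"): 0.8, ("R", "A"): 0.2,
    ("S", "B"): 0.7, ("S", "R"): 0.3,
}
KNOWN_LABELS = {
    "esterase": {"A": 1.0, "B": -1.0},
    "kinase": {"A": -1.0, "B": 1.0},
}

if __name__ == "__main__":
    for node in ("Q", "R", "S"):
        print(node, predict(EDGES, KNOWN_LABELS, node))
```

In this toy network, Q is strongly linked to the annotated esterase A and receives a confident call, whereas R and S score below the assumed cutoff and are left unannotated, mirroring the requirement that the top-scoring label must also be significant.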