Abstract
Empirically derived models of amino acid replacement are employed to study the association between various physical features of proteins and evolution. The strengths of these associations are statistically evaluated by applying the models of protein evolution to 11 diverse sets of protein sequences. Parametric bootstrap tests indicate that the solvent accessibility status of a site has a particularly strong association with the process of amino acid replacement that it experiences. Significant association between secondary structure environment and the amino acid replacement process is also observed. Careful description of the length distribution of secondary structure elements and of the organization of secondary structure and solvent accessibility along a protein did not always significantly improve the fit of the evolutionary models to the data sets that were analyzed. As indicated by the strength of the association of both solvent accessibility and secondary structure with amino acid replacement, the process of protein evolution, both above and below the species level, will not be well understood until the physical constraints that affect protein evolution are identified and characterized.
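The abstract names parametric bootstrap tests without describing their mechanics. The sketch below illustrates the general idea on a deliberately simple toy problem and is not the evolutionary model used in the article: each site is reduced to "changed" or "unchanged", the null hypothesis gives every site category (for example buried versus exposed) the same change probability, and the alternative gives each category its own. All function names, the binomial model, and the example counts are assumptions made for illustration only.

```python
import math
import random


def log_lik(changed, total, p):
    """Binomial log-likelihood (constant term omitted) of `changed` out of `total` sites."""
    if p <= 0.0 or p >= 1.0:
        # At the boundary the likelihood is nonzero only if the data match it exactly.
        if (p <= 0.0 and changed == 0) or (p >= 1.0 and changed == total):
            return 0.0
        return float("-inf")
    return changed * math.log(p) + (total - changed) * math.log(1.0 - p)


def lrt_statistic(counts):
    """Return (2 * log-likelihood ratio, pooled estimate) for a list of (changed, total) pairs."""
    pooled_changed = sum(c for c, _ in counts)
    pooled_total = sum(n for _, n in counts)
    p_null = pooled_changed / pooled_total  # one shared probability (null model)
    ll_null = sum(log_lik(c, n, p_null) for c, n in counts)
    ll_alt = sum(log_lik(c, n, c / n) for c, n in counts)  # one probability per category
    return 2.0 * (ll_alt - ll_null), p_null


def simulate_under_null(counts, p_null, rng):
    """Draw a new data set of the same size from the fitted null model."""
    simulated = []
    for _, total in counts:
        changed = sum(1 for _ in range(total) if rng.random() < p_null)
        simulated.append((changed, total))
    return simulated


def parametric_bootstrap(counts, replicates=1000, seed=0):
    """Estimate a p-value by comparing the observed statistic with simulated ones."""
    rng = random.Random(seed)
    observed, p_null = lrt_statistic(counts)
    exceed = 0
    for _ in range(replicates):
        stat, _ = lrt_statistic(simulate_under_null(counts, p_null, rng))
        if stat >= observed:
            exceed += 1
    # Add-one correction keeps the estimate away from an impossible p-value of zero.
    return observed, (exceed + 1) / (replicates + 1)


if __name__ == "__main__":
    # Hypothetical counts: (changed sites, total sites) for buried and exposed positions.
    statistic, p_value = parametric_bootstrap([(30, 200), (90, 200)])
    print(f"statistic = {statistic:.2f}, bootstrap p-value = {p_value:.4f}")
```

The design point is that the null distribution of the test statistic is obtained by simulating from the fitted null model rather than by assuming a chi-square approximation, which is what makes the approach usable when the asymptotic approximation is doubtful.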