Abstract
Four methods for weighting aligned biological sequences have recently appeared; they differ mathematically, philosophically, and in their results. Thus, although there is consensus that sequences need to be weighted, which method to use remains contentious. A geometric analysis based on a continuous sequence space is presented that provides a common framework for comparing the methods. It is concluded that there are two "best" methods. When the sequences are known to be phylogenetically related and a tree can be generated without introducing excessive stress into the data, the method of Altschul et al. [Altschul, S. F., Carroll, R. J. & Lipman, D. J. (1989) J. Mol. Biol. 207, 647-653] is appropriate. When the sequences are not known to be phylogenetically related, or a tree cannot be produced without unduly distorting the distances between the sequences, a modification of the method of Sibbald and Argos [Sibbald, P. R. & Argos, P. (1990) J. Mol. Biol. 216, 813-818] is preferable.
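The abstract names the Sibbald and Argos procedure as the starting point for one of the two recommended methods. The sketch below illustrates only the general idea of that original, unmodified approach, Monte Carlo "Voronoi" weighting. It does not implement the continuous-sequence-space modification discussed in this paper. Several choices in it are assumptions rather than details taken from the source: random probe sequences are drawn column by column, uniformly from the distinct residues observed in each column; distance is the Hamming distance; and a probe that is equidistant from several aligned sequences splits its vote equally among them. The names `hamming` and `voronoi_weights` are hypothetical.

```python
import random
from typing import Sequence


def hamming(a: Sequence[str], b: Sequence[str]) -> int:
    """Number of aligned positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def voronoi_weights(alignment: list[str], n_samples: int = 10000, seed: int = 0) -> list[float]:
    """Monte Carlo Voronoi-style weights in the spirit of Sibbald and Argos (1990).

    Assumptions (illustrative, not taken from this paper):
    - each probe residue is drawn uniformly from the distinct residues seen in that column;
    - distance is the Hamming distance;
    - a probe equidistant from several sequences splits its single vote equally.
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(s) != length for s in alignment):
        raise ValueError("all aligned sequences must have the same length")
    rng = random.Random(seed)
    # Distinct residues per column, sorted so that results are reproducible for a given seed.
    columns = [sorted({s[i] for s in alignment}) for i in range(length)]
    votes = [0.0] * len(alignment)
    for _ in range(n_samples):
        probe = [rng.choice(col) for col in columns]
        dists = [hamming(probe, s) for s in alignment]
        best = min(dists)
        nearest = [k for k, d in enumerate(dists) if d == best]
        # Each probe contributes one vote in total, shared among its nearest sequences.
        for k in nearest:
            votes[k] += 1.0 / len(nearest)
    total = sum(votes)
    return [v / total for v in votes]
```

For example, `voronoi_weights(["AAAA", "AAAA", "TTTT"])` gives the two identical sequences exactly equal weight, because they always tie. The distinct third sequence receives more weight than either copy. Under the assumptions above, the expected weights are 9/32, 9/32 and 7/16, so a run with 10,000 samples should land close to those values. Duplicating a sequence therefore does not double its influence, and this redundancy correction is what sequence weighting is meant to provide.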
Selected References
- Altschul S. F., Carroll R. J., Lipman D. J. Weights for data related by a tree. J Mol Biol. 1989 Jun 20;207(4):647–653. doi: 10.1016/0022-2836(89)90234-9.
- Altschul S. F., Lipman D. J. Equal animals. Nature. 1990 Dec 6;348(6301):493–494. doi: 10.1038/348493c0.
- Bandelt H. J., Dress A. W. Weak hierarchies associated with similarity measures--an additive clustering technique. Bull Math Biol. 1989;51(1):133–166. doi: 10.1007/BF02458841.
- Barton G. J., Sternberg M. J. A strategy for the rapid multiple alignment of protein sequences. Confidence levels from tertiary structure comparisons. J Mol Biol. 1987 Nov 20;198(2):327–337. doi: 10.1016/0022-2836(87)90316-0.
- Corpet F. Multiple sequence alignment with hierarchical clustering. Nucleic Acids Res. 1988 Nov 25;16(22):10881–10890. doi: 10.1093/nar/16.22.10881.
- Eigen M., Winkler-Oswatitsch R., Dress A. Statistical geometry in sequence space: a method of quantitative comparative sequence analysis. Proc Natl Acad Sci U S A. 1988 Aug;85(16):5913–5917. doi: 10.1073/pnas.85.16.5913.
- Felsenstein J. Maximum-likelihood estimation of evolutionary trees from continuous characters. Am J Hum Genet. 1973 Sep;25(5):471–492.
- Gribskov M., McLachlan A. D., Eisenberg D. Profile analysis: detection of distantly related proteins. Proc Natl Acad Sci U S A. 1987 Jul;84(13):4355–4358. doi: 10.1073/pnas.84.13.4355.
- Hein J. Unified approach to alignment and phylogenies. Methods Enzymol. 1990;183:626–645. doi: 10.1016/0076-6879(90)83041-7.
- Higgins D. G. Sequence ordinations: a multivariate analysis approach to analysing large sequence data sets. Comput Appl Biosci. 1992 Feb;8(1):15–22. doi: 10.1093/bioinformatics/8.1.15.
- Higgins D. G., Sharp P. M. Fast and sensitive multiple sequence alignments on a microcomputer. Comput Appl Biosci. 1989 Apr;5(2):151–153. doi: 10.1093/bioinformatics/5.2.151.
- Hogeweg P., Hesper B. The alignment of sets of sequences and the construction of phyletic trees: an integrated method. J Mol Evol. 1984;20(2):175–186. doi: 10.1007/BF02257378.
- Li W. H. Simple method for constructing phylogenetic trees from distance matrices. Proc Natl Acad Sci U S A. 1981 Feb;78(2):1085–1089. doi: 10.1073/pnas.78.2.1085.
- Lipman D. J., Altschul S. F., Kececioglu J. D. A tool for multiple sequence alignment. Proc Natl Acad Sci U S A. 1989 Jun;86(12):4412–4415. doi: 10.1073/pnas.86.12.4412.
- Sander C., Schneider R. Database of homology-derived protein structures and the structural meaning of sequence alignment. Proteins. 1991;9(1):56–68. doi: 10.1002/prot.340090107.
- Sibbald P. R., Argos P. Weighting aligned protein or nucleic acid sequences to correct for unequal representation. J Mol Biol. 1990 Dec 20;216(4):813–818. doi: 10.1016/S0022-2836(99)80003-5.
- Taylor W. R. A flexible method to align large numbers of biological sequences. J Mol Evol. 1988 Dec;28(1-2):161–169. doi: 10.1007/BF02143508.
- Vingron M., Argos P. A fast and sensitive multiple sequence alignment algorithm. Comput Appl Biosci. 1989 Apr;5(2):115–121. doi: 10.1093/bioinformatics/5.2.115.
- Zvelebil M. J., Barton G. J., Taylor W. R., Sternberg M. J. Prediction of protein secondary structure and active sites using the alignment of homologous sequences. J Mol Biol. 1987 Jun 20;195(4):957–961. doi: 10.1016/0022-2836(87)90501-8.
- van Heel M. A new family of powerful multivariate statistical sequence analysis techniques. J Mol Biol. 1991 Aug 20;220(4):877–887. doi: 10.1016/0022-2836(91)90360-i.