SAGA: sequence alignment by genetic algorithm

C Notredame; D G Higgins

doi:10.1093/nar/24.8.1515

Nucleic Acids Res. 1996 Apr 15;24(8):1515–1524. doi: 10.1093/nar/24.8.1515


PMCID: PMC145823 PMID: 8628686

Abstract

We describe a new approach to multiple sequence alignment using genetic algorithms and an associated software package called SAGA. The method involves evolving a population of alignments in a quasi-evolutionary manner and gradually improving the fitness of the population, as measured by an objective function that scores multiple alignment quality. SAGA uses an automatic scheduling scheme to control the usage of 22 different operators for combining alignments or mutating them between generations. When used to optimise the well-known sums-of-pairs objective function, SAGA performs better than some of the widely used alternative packages. This is seen both in its ability to reach an optimal solution and in the accuracy of its alignments when compared with reference alignments based on sequences of known tertiary structure. The general attraction of the approach is the ability to optimise any objective function that one can invent.
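The idea of evolving a population of alignments against a sum-of-pairs style objective can be illustrated with a minimal sketch. It is not SAGA's implementation: the scoring values (`match`, `mismatch`, `gap`), the single gap-moving `mutate` operator, and the truncation selection in `evolve` are all assumptions chosen for brevity, whereas SAGA itself schedules 22 operators, including ones that combine alignments. All function names here are hypothetical.

```python
import random
from itertools import combinations

GAP = "-"

def pair_score(a, b, match=1, mismatch=-1, gap=-2):
    """Score one aligned pair of symbols (toy values, assumed for illustration)."""
    if a == GAP and b == GAP:
        return 0  # a gap facing a gap carries no cost in this sketch
    if a == GAP or b == GAP:
        return gap
    return match if a == b else mismatch

def sum_of_pairs(alignment):
    """Add pair_score over every pair of rows and every column."""
    return sum(
        pair_score(x, y)
        for row1, row2 in combinations(alignment, 2)
        for x, y in zip(row1, row2)
    )

def mutate(alignment, rng):
    """Move one gap within one row; row length and residues are preserved."""
    rows = list(alignment)
    i = rng.randrange(len(rows))
    row = rows[i]
    gap_positions = [k for k, c in enumerate(row) if c == GAP]
    if not gap_positions:
        return rows
    k = rng.choice(gap_positions)
    row = row[:k] + row[k + 1:]          # remove the chosen gap
    j = rng.randrange(len(row) + 1)
    rows[i] = row[:j] + GAP + row[j:]    # reinsert it at a random position
    return rows

def evolve(seqs, generations=200, pop_size=20, seed=0):
    """Keep the better half each generation and refill it with mutants."""
    rng = random.Random(seed)
    width = max(len(s) for s in seqs) + 2
    start = [s + GAP * (width - len(s)) for s in seqs]  # gaps padded at the end
    population = [start] * pop_size
    for _ in range(generations):
        population.sort(key=sum_of_pairs, reverse=True)
        survivors = population[: pop_size // 2]
        children = [
            mutate(rng.choice(survivors), rng)
            for _ in range(pop_size - len(survivors))
        ]
        population = survivors + children
    return max(population, key=sum_of_pairs)

if __name__ == "__main__":
    for row in evolve(["GATTACA", "GATACA", "ATTACA"]):
        print(row)
```

Because the best individuals always survive into the next generation, the score of the returned alignment can never fall below that of the starting alignment. Any other scoring function could be passed to `sort` and `max` in place of `sum_of_pairs`, which mirrors the abstract's point that the approach is independent of the particular objective function.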



