CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice

J D Thompson; D G Higgins; T J Gibson

doi:10.1093/nar/22.22.4673

Nucleic Acids Res. 1994 Nov 11;22(22):4673–4680.


PMCID: PMC308517 PMID: 7984417

Abstract

The sensitivity of the commonly used progressive multiple sequence alignment method has been greatly improved for the alignment of divergent protein sequences. Firstly, individual weights are assigned to each sequence in a partial alignment in order to down-weight near-duplicate sequences and up-weight the most divergent ones. Secondly, amino acid substitution matrices are varied at different alignment stages according to the divergence of the sequences to be aligned. Thirdly, residue-specific gap penalties and locally reduced gap penalties in hydrophilic regions encourage new gaps in potential loop regions rather than in regular secondary structure. Fourthly, positions in early alignments where gaps have been opened receive locally reduced gap penalties to encourage the opening of new gaps at these positions. These modifications are incorporated into a new program, CLUSTAL W, which is freely available.
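The abstract states the goal of the sequence weighting (down-weight near-duplicates, up-weight divergent sequences) but not the formula. The sketch below illustrates one tree-based scheme that has this effect, and it should be read as an assumption rather than as the program's actual code: each sequence's weight is the sum of the branch lengths on its path to the root of a guide tree, with every branch length divided by the number of sequences that share that branch. The `Node` class, the `sequence_weights` function and the example tree are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    """Node of a rooted guide tree (hypothetical structure for this sketch).

    `length` is the length of the branch joining this node to its parent;
    leaves carry a sequence `name` and have no children.
    """
    name: str = ""
    length: float = 0.0
    children: List["Node"] = field(default_factory=list)


def count_leaves(node: Node) -> int:
    """Number of sequences (leaves) below this node, counting the node itself if it is a leaf."""
    if not node.children:
        return 1
    return sum(count_leaves(child) for child in node.children)


def sequence_weights(root: Node) -> Dict[str, float]:
    """Assumed weighting scheme: sum branch lengths from leaf to root,
    sharing each branch equally among the leaves that descend from it."""
    weights: Dict[str, float] = {}

    def walk(node: Node, inherited: float) -> None:
        # This branch is shared by every leaf below it, so split its length.
        share = inherited + node.length / count_leaves(node)
        if not node.children:
            weights[node.name] = share
        else:
            for child in node.children:
                walk(child, share)

    walk(root, 0.0)
    return weights


# Example tree ((A:0.1, B:0.1):0.3, C:0.5): A and B are near-duplicates,
# C is the divergent sequence.
tree = Node(children=[
    Node(length=0.3, children=[Node("A", 0.1), Node("B", 0.1)]),
    Node("C", 0.5),
])
weights = sequence_weights(tree)
for name in sorted(weights):
    print(name, round(weights[name], 3))
```

In this example A and B share the 0.3 branch, so each receives about 0.25, while the divergent sequence C keeps its full 0.5. That is the qualitative behaviour the abstract describes; the exact normalisation used by the program is not given here.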

Images in this article: one image, on p. 4679.


