Enhanced graphic matrix analysis of nucleic acid and protein sequences

J V Maizel, Jr; R P Lenk

Proc Natl Acad Sci U S A. 1981 Dec;78(12):7665–7669. doi: 10.1073/pnas.78.12.7665

PMCID: PMC349330 PMID: 6801656

Abstract

The enhanced graphic matrix procedure analyzes nucleic acid and amino acid sequences for features of possible biological interest and reveals the spatial patterns of such features. When a sequence is compared to itself the technique shows regions of self-complementarity, direct repeats, and palindromic subsequences. Comparison of two different sequences, exemplified by immunoglobulin kappa light chain genes, by using colored graphic matrices showed domains of similarity, regions of divergence, and features explainable by transpositions. Analysis of mouse constant domain immunoglobulin sequences revealed self-complementary regions that can be used to fold the molecule into a structure consistent with electron microscopic observations. Computer translation of nucleic acid sequences into all possible amino acid sequences followed by graphic matrix analysis provides a way to detect the most likely protein encoding regions and can predict the correct reading frames in sequences in which splicing patterns are not defined. Application of this technique to regions of simian virus 40 and polyoma virus demonstrates the frames of translation and shows the agreement of sequences determined in separate laboratories with different virus isolates. The graphic matrix technique can also be used to assemble fragmentary sequences during determination, to display local variations in base composition, to detect distant evolutionary relationships, and to display intragenic variation in rates of evolution.
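The core idea of a graphic matrix comparison can be illustrated with a minimal sketch. The abstract does not state the scoring or filtering rules, so the windowed match threshold used here (`window`, `min_matches`) is an assumption, as are the function names `dot_matrix`, `reverse_complement`, and `render`. Comparing a sequence with itself shows direct repeats as off-diagonal points; comparing it with its reverse complement shows self-complementary and palindromic stretches.

```python
# Illustrative sketch only: a windowed dot matrix, not the authors' program.
# The window/threshold filter is an assumed, commonly used scheme.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def dot_matrix(seq_a, seq_b, window=5, min_matches=4):
    """Return the set of (i, j) window start positions that pass the filter.

    A point is recorded when the windows starting at i in seq_a and at j in
    seq_b agree at min_matches or more of their `window` positions.
    """
    hits = set()
    for i in range(len(seq_a) - window + 1):
        for j in range(len(seq_b) - window + 1):
            matches = sum(
                1 for k in range(window) if seq_a[i + k] == seq_b[j + k]
            )
            if matches >= min_matches:
                hits.add((i, j))
    return hits


def render(hits, len_a, len_b, window):
    """Draw the matrix as text: '*' for a recorded point, '.' otherwise."""
    rows = []
    for i in range(len_a - window + 1):
        rows.append(
            "".join(
                "*" if (i, j) in hits else "."
                for j in range(len_b - window + 1)
            )
        )
    return "\n".join(rows)


if __name__ == "__main__":
    # Self-comparison: the repeated GATTACA appears off the main diagonal.
    seq = "GATTACAGGGATTACA"
    repeats = dot_matrix(seq, seq, window=7, min_matches=7)
    print(render(repeats, len(seq), len(seq), 7))

    # Comparison with the reverse complement: the palindromic site GAATTC
    # matches itself on the opposite strand.
    probe = "AAGAATTCAA"
    print(dot_matrix(probe, reverse_complement(probe), window=6, min_matches=6))
```

Lowering `min_matches` relative to `window` admits imperfect matches, which is one way to trade sensitivity against background noise; the right values depend on sequence length and composition and are not given in the abstract.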



