An Integrated Sequence-Structure Database incorporating matching mRNA sequence, amino acid sequence and protein three-dimensional structure data

I A Adzhubei; A A Adzhubei; S Neidle

doi:10.1093/nar/26.1.327

Nucleic Acids Res. 1998 Jan 1;26(1):327–331. doi: 10.1093/nar/26.1.327

PMCID: PMC147252 PMID: 9399866

Abstract

We have constructed a non-homologous database, termed the Integrated Sequence-Structure Database (ISSD), which comprises the coding sequences of genes, the amino acid sequences of the corresponding proteins, their secondary structure and straight phi,psi angle assignments, and polypeptide backbone coordinates. Each protein entry in the database holds the alignment of the nucleotide sequence, the amino acid sequence and the PDB three-dimensional structure data. The nucleotide and amino acid sequences for each entry are selected on the basis of exact matches of the source organism and cell environment. The current version 1.0 of ISSD is available on the WWW at http://www.protein.bio.msu.su/issd/ and includes 107 non-homologous mammalian proteins, of which 80 are human proteins. We have used the database to analyse synonymous codon usage patterns in mRNA sequences, showing their correlation with three-dimensional structure features of the encoded proteins. Possible ISSD applications include optimisation of protein expression, improvement of protein structure prediction accuracy, and analysis of evolutionary aspects of the nucleotide sequence-protein structure relationship.
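To make the idea of a per-residue alignment concrete, the following is a minimal Python sketch of how one aligned entry could be represented and how synonymous codon counts could be grouped by secondary structure state. It is an illustration only: the `AlignedEntry` record, its field names, the `codon_usage_by_structure` helper and the toy sequence are all hypothetical and do not reproduce the actual ISSD file format or the authors' analysis procedure. The sketch assumes the coding sequence excludes the stop codon and that there is exactly one secondary structure symbol per residue.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass
class AlignedEntry:
    """Hypothetical per-protein record; NOT the actual ISSD file format."""
    coding_sequence: str      # coding region without stop codon, length 3 * n
    amino_acids: str          # one-letter residue codes, length n
    secondary_structure: str  # one state symbol per residue, length n

    def __post_init__(self):
        n = len(self.amino_acids)
        if len(self.coding_sequence) != 3 * n or len(self.secondary_structure) != n:
            raise ValueError("sequence and structure tracks are not aligned")

    def codons(self):
        # Normalise RNA to DNA letters, then split into consecutive triplets.
        seq = self.coding_sequence.upper().replace("U", "T")
        return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def codon_usage_by_structure(entries):
    """Count codons for each (structure state, amino acid) pair."""
    usage = defaultdict(Counter)
    for entry in entries:
        tracks = zip(entry.codons(), entry.amino_acids, entry.secondary_structure)
        for codon, residue, state in tracks:
            usage[(state, residue)][codon] += 1
    return usage


# Toy data invented for illustration: Ala-Ala-Glu-Glu-Lys, with 'H' and 'E'
# standing for helix and strand states.
toy = AlignedEntry("GCUGCCGAAGAGAAA", "AAEEK", "HHHEE")
usage = codon_usage_by_structure([toy])
print(dict(usage[("H", "A")]))
```

Grouping counts by the pair (structure state, amino acid) keeps the comparison among synonymous codons only, so differences in amino acid composition between structure types do not leak into the codon counts.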
