Analysis of protein chameleon sequence characteristics

Amine Ghozlane; Agnel Praveen Joseph; Aurelie Bornot; Alexandre G de Brevern

Bioinformation. 2009 May 4;3(9):367–369. doi: 10.6026/97320630003367

PMCID: PMC2732029 PMID: 19759809

Abstract

Conversion of the local structural state of a protein from an α-helix to a β-strand is usually associated with a major change in the tertiary structure. Similar changes have been observed during the self-assembly of amyloidogenic proteins into fibrils, which are implicated in severe disease conditions, e.g., Alzheimer's disease. Studies have emphasized that certain protein sequence fragments, known as chameleon sequences, do not have a strong preference for either the helical or the extended conformation. Surprisingly, information on the local sequence neighborhood can be used to predict their secondary structure with high accuracy. Here we report a large-scale analysis of chameleon sequences to estimate their propensities to be associated with different local structural states such as α-helices, β-strands and coils. Using propensity information derived from amino acid composition, we underline their complexity: more than one quarter of them prefer the coil state over the regular secondary structures. About one fifth show a preference for both α-helix and β-strand conformations, and about half favor only one of these two states.

Keywords: chameleon sequence, structural characteristics, secondary structures

Background

Repetitive secondary structures like α-helices and β-strands have been viewed as key building blocks of proteins. These local protein structures are stabilized mainly by hydrogen bonds within the protein backbone. In 1984, Kabsch and Sander identified identical sequence fragments of limited length that occur in both α-helices and β-strands, namely chameleon sequences [1]. This suggests that local sequence composition and the order of amino acids alone are not sufficient to predict the secondary structure accurately [2]. The number of examples supporting this idea has increased strikingly in the recent past [3]. Elegant experimental studies have shown the importance of nonlocal interactions in guiding the formation of an α-helix or a β-strand, e.g. in the IgG-binding domain of protein G (GB1) [4]. Chameleon sequences have also been designed, e.g. MATa2 and MCM1 DNA complexes [5]. Studies have emphasized that these chameleon sequences have no strong preference for either α-helical or β-strand conformations [6]. Nonetheless, information on the local sequence neighborhood can be used to predict their secondary structure with high accuracy [3,7]. Here, we have analyzed chameleon sequences to estimate their propensities to form not only the regular secondary structures, α-helix and β-strand, but also coil [8].

Description

Unlike previous studies, which focused only on limited parts of the Protein Data Bank [9], all the protein structures available in 2007 (∼40,000 protein structures) have been used. Secondary structures have been assigned for these proteins using the DSSP algorithm [10] with the default parameters of the program. Only proteins with complete side-chain coordinates and without multiple breaks in the chain were considered, leading to a final number of 14,692,070 amino acid residues, each associated with a secondary structure. The 8 secondary structure assignments made by DSSP were reduced to the 3 classical states: helix includes α-, 3₁₀- and π-helices; strand includes only the β-strand assignments; and coil covers the rest of the assignments (β-bridges, turns, bends, and coil).
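The reduction from 8 DSSP states to 3 classes can be sketched as follows. This is a minimal illustration, not the authors' code: the one-letter DSSP codes (H, G, I, E, B, T, S, and blank or "-" for unassigned) are the standard DSSP output codes, while the function name `reduce_dssp` and the output letters H/E/C are assumptions made for this example.

```python
# Sketch of the 8-state to 3-state reduction described in the text.
# The function name and the H/E/C output alphabet are illustrative choices.
HELIX_CODES = set("HGI")   # alpha-helix (H), 3-10 helix (G), pi-helix (I)
STRAND_CODES = set("E")    # extended beta-strand only

def reduce_dssp(states: str) -> str:
    """Map a string of DSSP codes to H (helix), E (strand) or C (coil)."""
    reduced = []
    for code in states:
        if code in HELIX_CODES:
            reduced.append("H")
        elif code in STRAND_CODES:
            reduced.append("E")
        else:
            # Isolated bridge (B), turn (T), bend (S) and unassigned
            # residues are all grouped into coil.
            reduced.append("C")
    return "".join(reduced)
```

For instance, `reduce_dssp("HGIEBTS -")` yields `"HHHECCCCC"`.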

In the second step, we searched for chameleon sequences of length L, with L ranging from 4 to 8 amino acids. A fragment is considered a chameleon sequence if all the residues of the fragment are assigned at least once to the helical conformation and at least once to the β-strand conformation. Numerous chameleon sequences have thus been located: 63,228 (for L = 4 residues), 34,408 (for L = 5), 2,423 (for L = 6), 179 (for L = 7) and 64 (for L = 8). As the dataset is larger and more complete than those used in previous studies, more examples were found, especially for the longer fragments [3].
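The search can be sketched as a sliding-window scan. The snippet below is an illustration under one stated assumption: a fragment counts as chameleon when some occurrence lies entirely in helix and some other occurrence lies entirely in strand. The function name `find_chameleons` and the input layout (pairs of sequence and 3-state string, with H, E, C as state letters) are hypothetical.

```python
from collections import defaultdict

def find_chameleons(chains, length):
    """Return fragments seen fully in helix and fully in strand.

    chains: iterable of (sequence, three_state_string) pairs of equal
    length, using H for helix, E for strand and C for coil.
    """
    states_seen = defaultdict(set)
    for seq, ss in chains:
        for start in range(len(seq) - length + 1):
            window = ss[start:start + length]
            fragment = seq[start:start + length]
            if window == "H" * length:
                states_seen[fragment].add("H")
            elif window == "E" * length:
                states_seen[fragment].add("E")
    # Keep only fragments observed in both regular states.
    return sorted(f for f, s in states_seen.items() if s == {"H", "E"})
```

With the toy input `[("AMLILG", "CHHHHC"), ("KMLILT", "CEEEEC")]` and `length=4`, only `MLIL` is returned, since it is the only fragment found entirely in helix in one chain and entirely in strand in the other.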

Our main goal is to check whether chameleon sequences indeed lack a strong preference for either helical or strand conformations [6], and to extend the question to their preference for the coil state, which was not directly tackled in previous works. For this purpose, we have used a simple methodology based on a non-redundant databank containing proteins with no more than 20% pairwise sequence identity. The selected chains have X-ray crystallographic resolutions better than 1.6 Å and an R-factor of less than 0.25 (details can be found in [11]). Using this non-redundant databank, the propensity of an amino acid k to be associated with a given secondary structure state i, namely p_i^k, has been computed (see equation 1 in supplementary material). Here i corresponds to the α-helix, β-strand or coil state, while k corresponds to one of the 20 amino acids.
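Equation 1 itself is given only in the supplementary material and is not reproduced here. As a hedged illustration, the sketch below uses the common relative-frequency definition of propensity, p_i^k = (n_i^k / n^k) / (n_i / n), which equals 1.0 when amino acid k occurs in state i exactly as often as expected at random. Whether this matches the authors' equation is an assumption, and the function name `propensities` is hypothetical.

```python
from collections import Counter

def propensities(chains):
    """Relative-frequency propensity of each amino acid for each state.

    ASSUMPTION: p[(k, i)] = (n_ki / n_k) / (n_i / n); the paper's own
    equation 1 is in its supplementary material and may differ.
    chains: iterable of (sequence, three_state_string) pairs.
    """
    pair_counts = Counter()
    aa_counts = Counter()
    state_counts = Counter()
    for seq, ss in chains:
        for aa, state in zip(seq, ss):
            pair_counts[(aa, state)] += 1
            aa_counts[aa] += 1
            state_counts[state] += 1
    total = sum(aa_counts.values())
    return {
        (aa, state): (pair_counts[(aa, state)] / aa_counts[aa])
        / (state_counts[state] / total)
        for aa in aa_counts
        for state in state_counts
    }
```

In a toy chain `("AAGG", "HHCC")`, alanine is always helical while helix covers half of all residues, so its helix propensity is 2.0 and its coil propensity is 0.0.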

Hence, each chameleon sequence X^S is associated with three scores, S_α, S_β and S_coil. As these scores are products of propensities, a score S_i of 1.0 corresponds to the random value. If S_i is greater than 1.0, the chameleon sequence is preferentially associated with secondary structure state i; if it is lower, the sequence is disfavored in that state. This measure is crude but gives some basic insight into the behavior of chameleon sequences.
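Since the scores are described as propensity products, a fragment score can be sketched as a plain product over its residues. The exact form used by the authors is in their supplementary material, so the unnormalized product below is an assumption; `fragment_score` and the toy propensity values are hypothetical and not taken from the paper.

```python
import math

def fragment_score(fragment, state, propensity):
    """Product of per-residue propensities for one state.

    propensity: dict mapping (amino_acid, state) to a propensity value.
    A result of 1.0 corresponds to the random expectation.
    """
    return math.prod(propensity[(aa, state)] for aa in fragment)

# Toy values chosen for illustration only (NOT measured propensities).
toy_propensity = {("M", "H"): 2.0, ("L", "H"): 0.5, ("I", "H"): 1.0}
s_alpha = fragment_score("MLIL", "H", toy_propensity)  # 2.0 * 0.5 * 1.0 * 0.5
```

A consequence of using a product is that a single strongly disfavored residue can pull the score of an otherwise favorable fragment below 1.0.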

Figure 1a shows a plot of S_α versus S_β for the 63,228 chameleon sequences of length L = 4 residues. Adequacy scores greater than 4.0 were capped at 4.0. The figure shows that 53.7% and 47.3% of the chameleon sequences have S_β and S_α scores greater than 1.0, respectively. Thus, the four quadrants delineated by the red lines are roughly equally populated. S_β scores reach much higher values than S_α scores: 16% of the S_β scores are greater than 2.0, 5.3% greater than 3.0 and 2.7% greater than 4.0, while only 5.1% of the S_α scores are greater than 2.0 and 0.2% greater than 3.0. For 21.6% of the chameleon sequences, both S_α and S_β are greater than 1.0, with an average S_coil of 0.42 (i.e. less than half the random value). For 25.7% of these fragments, α-helix is statistically preferred over β-strand, with an average S_coil of 0.68, while for 24.7%, only β-strand is preferred (average S_coil of 0.65). Interestingly, 27.9% of the chameleon sequences have both S_α and S_β less than 1.0, i.e., the coil state is favored.
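The four groups discussed above correspond to the quadrants of the S_α versus S_β plane cut at the random value of 1.0. A minimal sketch of that partition follows; the function names and labels are illustrative, and treating a score exactly equal to 1.0 as "not preferred" is an assumption, since the text does not say how ties are handled.

```python
from collections import Counter

def classify(s_alpha, s_beta, threshold=1.0):
    """Assign a fragment to one quadrant of the (S_alpha, S_beta) plane."""
    if s_alpha > threshold and s_beta > threshold:
        return "both"
    if s_alpha > threshold:
        return "helix"
    if s_beta > threshold:
        return "strand"
    return "coil"  # neither regular state scores above random

def quadrant_fractions(scores):
    """scores: list of (s_alpha, s_beta) pairs; returns fraction per label."""
    counts = Counter(classify(a, b) for a, b in scores)
    return {label: n / len(scores) for label, n in counts.items()}
```

Applied to the score pairs of all fragments of a given length, `quadrant_fractions` would give the kind of breakdown reported in the paragraph above.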

Figure 1. (a) Distribution of adequacy scores S_α and S_β of chameleon sequence fragments of length 4. The legend gives the number of occurrences of the observed fragments. (b) Example of the chameleon sequence fragment MLIL, found (left) in a β-strand of guinea pig 11β-hydroxysteroid dehydrogenase type 1 (PDB code 1XSE) and (right) in an α-helix of a hyperthermophilic tungstopterin enzyme, aldehyde ferredoxin oxidoreductase (PDB code 1AOR). The blue point in (a) represents the scores of the example in (b).

Figure 1b shows the chameleon sequence fragment MLIL, which has S_α and S_β scores greater than 2.0 (shown as the blue dot in Figure 1a). In 11β-hydroxysteroid dehydrogenase type 1, this chameleon sequence forms the central β-strand of a β-sheet composed of 5 β-strands (Figure 1b left), while in the hyperthermophilic tungstopterin enzyme, aldehyde ferredoxin oxidoreductase, this sequence is in the middle of a long α-helix (Figure 1b right).

With this simple approach, we have underlined that chameleon sequences have no strong preference for either α- or β-conformation. We have also found that very different chameleon sequences exist, some showing a higher preference for either helical or strand conformations, some showing preference for both, while some sequences favor the coil state over the regular secondary structures. These observations again support the idea that non-local factors [2,3] have a major influence over the secondary structure that an amino acid sequence adopts. Supplementary information can be found on our website: http://www.dsimb.inserm.fr/~joseph/chameleon/

Supplementary material

Data 1

97320630003367S1.pdf (26.4 KB, pdf)

Acknowledgments

This work was supported by grants from the Ministère de la Recherche, Université Paris Diderot - Paris 7, Université de La Réunion and the French Institute for Health and Medical Care (INSERM). APJ has a grant from CEFIPRA number 3903-E and AB has a grant from Ministère de la Recherche.

Footnotes

Citation: Ghozlane et al, Bioinformation 3(9): 367-369 (2009)

References

1. Kabsch W, Sander C. Proc Natl Acad Sci U S A. 1984;81:1075. doi: 10.1073/pnas.81.4.1075
2. Cohen BI, et al. Protein Sci. 1993;2:2134. doi: 10.1002/pro.5560021213
3. Guo JT, et al. Proteins. 2007;67:548. doi: 10.1002/prot.21285
4. Minor DL, Kim PS. Nature. 1996;380:730. doi: 10.1038/380730a0
5. Takano K, et al. Proteins. 2007;68:617. doi: 10.1002/prot.21451
6. Mezei M. Protein Eng. 1998;11:411. doi: 10.1093/protein/11.6.411
7. Jacoboni I, et al. Proteins. 2000;41:535. doi: 10.1002/1097-0134(20001201)41:4<535::aid-prot100>3.0.co;2-c
8. Offmann B, et al. Current Bioinformatics. 2007;3:165.
9. Berman HM, et al. Nucleic Acids Res. 2000;28:235. doi: 10.1093/nar/28.1.235
10. Kabsch W, Sander C. Biopolymers. 1983;22:2577. doi: 10.1002/bip.360221211
11. Faure G, et al. Biochimie. 2008;90:626. doi: 10.1016/j.biochi.2007.11.007
