Abstract
Background:
Establishing theories for designing arbitrary protein structures is complicated and depends on understanding the principles for protein folding, which is affected by applied features. Computer algorithms can reach high precision and stability in computationally designed enzymes and binders by applying informative features obtained from natural structures.
Methods:
In this study, a position-specific analysis of secondary structures (α-helix, β-strand, and tight turn) was performed to reveal novel features for protein structure prediction and protein design.
Results:
Our results showed that secondary structures tend to be more compact in the N-terminal region than in the C-terminal region. Decaying periodicity in the length of α-helices and in the distribution of amino acids within them was deciphered using curve-fitting methods. In contrast to α-helices, β-strands do not show distinct periodicities in length. Also, significant differences in the position-dependent distribution of physicochemical properties were observed in secondary structures.
Conclusion:
Position-specific propensities in our study underline valuable parameters that could be used by researchers in the field of structural biology, particularly protein design through site-directed mutagenesis.
Keywords: Algorithms, Amino acids, Physicochemical, Protein structure
INTRODUCTION
Understanding the relationship between position-specific properties of amino acid sequences and secondary structure formation is vital for protein structure prediction and de novo protein design. The first ideas of protein structure prediction and de novo protein design came from very early studies on the correlation of amino acid distribution in protein structures[1-4]. It has been shown that the occurrence of amino acids in local structures, e.g. secondary structure, is position-dependent[5-9]. Recent studies have explored more details of amino acid distribution in secondary structures[5,10-15] and their functional roles[16,17]. In addition, the physicochemical bases that dictate the preference or avoidance of amino acids in secondary structure formation have been reported in a number of investigations[18,19]. These properties would be useful for designing algorithms to encode the molecular structures of natural proteins, which would improve the stability and precision of the resulting proteins[20]. However, the lack of comprehensive studies on position-specific evolutionary conservation and physicochemical properties of amino acids in secondary structures has motivated us to investigate these matters in the current work.
In this study, to extract rules governing the position-specific preference or avoidance of amino acids in secondary structures, an extensive analysis was performed based on the position-specific distribution and conservation of amino acids in secondary structures, as well as on their position-specific physicochemical properties. This analysis was conducted on a database of secondary structure segments, including helical segments, β-strands, and tight turns (δ-turns, γ-turns, β-turns, α-turns, and π-turns). Our results introduce novel rules that govern the formation and stabilization of secondary structures.
MATERIALS AND METHODS
Utilized dataset
The PDB database was culled at 25% sequence identity using the PISCES web server[21]. The selected protein structures had been determined by X-ray crystallography at a resolution better than 2 Å and with an R-factor lower than 0.3. Proteins shorter than 40 amino acids were excluded. Additionally, we discarded PDB files containing protein chains with chain break(s) and/or a high frequency of nonstandard residues. This procedure resulted in a database of 5362 non-redundant protein chains, corresponding to a total of 1,197,533 amino acid residues. We used the standard Dictionary of Protein Secondary Structure method[22] to derive the secondary structure information from the remaining PDB files. From this database, the secondary structure assignments for α-helix (H), β-strand (E), and tight turn (T) were selected for further investigation. Tight turns were separated into the subclasses δ-turn, γ-turn, β-turn, α-turn, and π-turn[23]. Consequently, three main subsets were formed with 34422, 63279, and 53192 sequences corresponding to α-helix, β-strand, and tight turn, respectively.
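As an illustration of how secondary structure segments might be collected from a per-residue assignment string, the following minimal sketch groups consecutive identical states into segments. The function name `extract_segments`, the example string, and the choice of a single-letter string as input are assumptions made for illustration; the in-house programs used in this study are not described, and the subdivision of tight turns into subclasses is not handled here.

```python
from itertools import groupby


def extract_segments(ss_string, states=("H", "E", "T")):
    """Return (state, start_index, length) for every run of a selected state.

    ss_string is a hypothetical per-residue assignment string in which
    H = alpha-helix, E = beta-strand, T = tight turn, and any other
    character is ignored.
    """
    segments = []
    position = 0
    for state, run in groupby(ss_string):
        length = len(list(run))
        if state in states:
            segments.append((state, position, length))
        position += length
    return segments


# Hypothetical assignment string, not taken from the study's database.
example = "--HHHHHH-TT-EEEE-"
segments = extract_segments(example)
```

Counting the `length` field per state would then give length distributions of the kind discussed in the Results.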
Definition of specific positions in secondary structures
Amino acid positions in secondary structures were annotated as Ni and Ci, where N and C denote the N-terminus and C-terminus of the secondary structure, respectively, and i is the position of the amino acid counted from the reference terminus (N or C). For helices and strands, N-cap and C-cap refer to the residues that immediately precede and follow the helix or strand, respectively; neither residue participates in the conformation[13,24,25]. The notation used for different secondary structures is illustrated in Figure 1.
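The positional notation can be sketched in a few lines of code. The example below assumes that N1 is the first residue of the segment and C1 the last, with the caps lying immediately outside the segment; the function name `label_positions` and the 0-based indexing are illustrative choices, and the exact convention of Figure 1 may differ.

```python
def label_positions(start, length):
    """Map residue indices to (N-label, C-label) pairs for one segment.

    start  : 0-based index of the first residue of the segment
    length : number of residues in the segment
    Assumes N1 is the first residue and C1 the last residue of the segment.
    """
    labels = {}
    if start > 0:
        # Residue immediately preceding the segment.
        labels[start - 1] = ("N-cap", None)
    for k in range(length):
        # Each residue is numbered from both termini.
        labels[start + k] = (f"N{k + 1}", f"C{length - k}")
    # Residue immediately following the segment; the caller should check
    # that this index still lies within the chain.
    labels[start + length] = (None, "C-cap")
    return labels


# Hypothetical six-residue helix starting at index 2.
labels = label_positions(2, 6)
```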
Amino acid propensities in the secondary structure elements
In order to investigate the relationship between a specific position in a secondary structure and the amino acid residue located at this position, we defined position-specific propensity as follows:
$$P_{ij} = \frac{f_{ij}}{f_i}$$
where $N_{ij}$ and $f_{ij}$ are the frequency and fraction of a given amino acid residue (type i) in position j of the secondary structure, respectively. Also, $N_i$ and $f_i$ are the frequency and fraction of a given amino acid (type i) over the entire database.
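A minimal sketch of such a propensity calculation is given below. It reads the definition as the fraction of an amino acid type at a given position divided by its fraction over the whole database, which is an interpretation of the text; the function name and the toy sequences are hypothetical.

```python
from collections import Counter


def position_propensity(residues_at_position, background_residues):
    """Propensity of each amino acid type at one position.

    residues_at_position : residues observed at position j across all segments
    background_residues  : all residues in the database
    Returns {amino acid: (fraction at position j) / (fraction in database)}.
    """
    position_counts = Counter(residues_at_position)
    background_counts = Counter(background_residues)
    n_position = sum(position_counts.values())
    n_background = sum(background_counts.values())
    return {
        aa: (position_counts[aa] / n_position)
        / (background_counts[aa] / n_background)
        for aa in background_counts
    }


# Toy data for illustration only, not taken from the study.
propensity = position_propensity("AAGL", "AAGGLLLL")
```

Under this reading, a value above 1 indicates that the residue type is preferred at the position, and a value below 1 indicates that it is avoided.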
Position-specific physicochemical properties of amino acids
Up to now, the majority of statistical studies of position-specific secondary structure properties have been focused on distribution of amino acid residues in the secondary structure[10,12,26,27]. In this study, we analyzed diverse evolutionary and physicochemical properties of amino acid residues in the secondary structure elements including conservation, compactness, planarity of side chains, crystal contact, B-factor, and surface accessibility. Most of these features were extracted from PDBFIND2 (ftp://ftp.cmbi.ru.nl/pub/molbio/data/pdbfinder2/), using in-house written programs.
Approximation of appropriate function for the obtained data
The Curve Fitting Toolbox of MATLAB V7.14 (R2012a) was applied to fit our data to smooth equations.
RESULTS AND DISCUSSION
The database assembled in this study is larger than those used in earlier studies, such as that of Bhattacharjee and Biswas[10], which contained only 2586 non-redundant protein chains. Also, as illustrated in Figure 1, a more comprehensive analysis was performed covering all the secondary structures, including helical segments, β-strands, and tight turns (δ-turns, γ-turns, β-turns, α-turns, and π-turns), in contrast to the limited set of structures considered in similar studies[10-15].
α-helices
Our database includes a large number of α-helices (n = 34422). The size of this database supports the reliable analysis of length distribution of α-helices and position-specific distribution of amino acids and physicochemical propensities in α-helices.
Decaying periodicity in distribution of α-helices length
The number of α-helices of each length observed in our database is illustrated in Figure 2. Only α-helices shorter than 20 amino acids occur more than 500 times, and α-helices of 10 amino acids are the most frequent. The distribution of helix lengths was fitted to a smooth equation, and the result supports previous reports[13,28]. We found that a vertically shifted two-term Gaussian gave a much better fit than those reported earlier (R² = 0.9953)[13,28].
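For readers who want to reproduce a fit of this kind, the sketch below writes out one plausible form of a vertically shifted two-term Gaussian and the coefficient of determination used to judge it. The parameterization is an assumption (it follows the common two-term Gaussian form with an added constant), and the parameter names are hypothetical; the fitted coefficients of the study are not given in the text, so none are reproduced here.

```python
import math


def shifted_gauss2(x, a1, b1, c1, a2, b2, c2, d):
    """Two Gaussian terms plus a constant vertical shift d (assumed form)."""
    return (
        a1 * math.exp(-(((x - b1) / c1) ** 2))
        + a2 * math.exp(-(((x - b2) / c2) ** 2))
        + d
    )


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_observed = sum(observed) / len(observed)
    ss_residual = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_total = sum((o - mean_observed) ** 2 for o in observed)
    return 1.0 - ss_residual / ss_total
```

In practice the seven parameters would be estimated with a nonlinear least-squares routine, and `r_squared` would then be evaluated on the observed and fitted counts per helix length.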
The weighted residuals, as shown in the inset of Figure 2, confirm the preference for certain helical lengths reported in other works[13,28]. In agreement with those works, the preferences are periodic, showing a periodicity of ~3.6 residues. However, the weighted residuals in Figure 2 highlight a neat two-sided decaying periodicity among α-helices shorter than 20 amino acids, which account for over 70% of our database. The weighted residuals are defined in terms of the observed frequency $f_i(O)$ and the Gaussian fit $f_i(P)$ as follows: $WR_i = (f_i(O) - f_i(P)) / f_i(P)$.
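The weighted residual defined above is straightforward to compute once observed and fitted frequencies are available. The following sketch applies the formula element-wise; the function name and the numbers in the example are hypothetical and serve only to show the sign convention.

```python
def weighted_residuals(observed, predicted):
    """Return (observed - predicted) / predicted for each helix length."""
    return [(o - p) / p for o, p in zip(observed, predicted)]


# Hypothetical frequencies: the first length is over-represented relative
# to the fit, the second is under-represented.
residuals = weighted_residuals([110.0, 90.0], [100.0, 100.0])
```

A positive residual marks a helix length that occurs more often than the smooth fit predicts, and a negative residual marks a length that occurs less often.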
Periodicity in position-specific propensities of amino acids in α-helices
Figures 3 and 4 show the average propensities within helices for each amino acid, grouped according to the physicochemical properties of the amino acids. We examined position-specific propensities for the first 15 positions counted from both the N-cap and the C-cap of α-helices. As depicted in these figures, we grouped the amino acids into five categories: short polar (D, N, S; Figs. 3A and 4A), long polar (E, K, Q, R; Figs. 3B and 4B), aromatic (F, W, Y; Figs. 3C and 4C), hydrophobic aliphatic and Cys (C, I, L, V, M; Figs. 3D and 4D), and other residues that do not fall into any of these categories (G, H, P, T, A; Figs. 3E and 4E). Our results, as demonstrated in Figures 3 and 4, showed position-specific dependence and periodicity in the occurrence of amino acids in helices. In a few cases, the data for positions 6-16 were fitted to a decaying sinusoid with R² values over 0.9 (e.g. Fig. 5).
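The functional form of the decaying sinusoid is not spelled out in the text. The sketch below assumes an exponentially damped sine with a constant offset, which is one common choice; all parameter names and values are hypothetical, and the period of 3.6 residues is used only because it matches the helical periodicity mentioned earlier.

```python
import math


def decaying_sinusoid(x, amplitude, decay, period, phase, offset):
    """Exponentially damped sine wave around a constant offset (assumed form)."""
    envelope = amplitude * math.exp(-x / decay)
    return envelope * math.sin(2.0 * math.pi * x / period + phase) + offset


# Evaluate the assumed model at positions 6-16 with illustrative parameters.
positions = list(range(6, 17))
curve = [
    decaying_sinusoid(p, amplitude=0.5, decay=8.0, period=3.6,
                      phase=0.0, offset=1.0)
    for p in positions
]
```

With an offset of 1, the curve oscillates around the neutral propensity value, and the oscillation shrinks with distance from the helix terminus.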
Position-specific physicochemical propensities in α-helices
In addition to sequence-based position-specific propensities, we analyzed position-specific physicochemical properties, including relative side chain accessibility, B-factor, conservation, crystal contact, entropy, absolute inside/outside distribution, insertions and deletions, packing, and planarity of side chains, for the first 15 positions counted from both the N-cap and the C-cap of α-helices (Figs. 6 and 7). The comparison of the curves demonstrates a periodic pattern of residue positioning with respect to physicochemical properties. This pattern is particularly evident for the 15 N-terminal residues within each helical conformation. Interestingly, the central positions of α-helices are highly conserved, and the N-terminus of α-helices is more compact than the C-terminus.
β-strands
Length distribution, position-dependent distribution of amino acids, and physicochemical propensities were studied for the 63279 β-strands in our database. The obtained results confirm that the rules governing the formation and stabilization of secondary structures are diverse.
No periodicity in the distribution of β-strands length
The occurrence of strands as a function of strand length in our database is plotted in Figure 8. Strands of five amino acids are the most frequent, which is consistent with previous reports[10,29]. The occurrence decreases sharply for β-strands longer than six residues.
Position-specific propensities of amino acids in β-strands
Except at the N1 and C1 positions of β-strands, the average propensities of amino acid residues show no significant fluctuation between positions. Figure 9 shows the average propensities of amino acid residues at five selected positions, i.e. N1, N2, N-int, C1, and C2. In our results, three amino acids, namely glutamine, lysine, and isoleucine, demonstrated higher local and global propensity in β-strands. These three amino acids are, however, avoided at the N1 and C1 positions. Interestingly, some amino acid residues are preferred at one or both of the N1 and C1 positions. For instance, leucine is preferred at the N1 position but not at other positions.
Position-specific physicochemical propensities in β-strands
Position-specific physicochemical properties were analyzed for five different positions in β-strands, i.e. N1, N2, N-int, C1, and C2 (Table 1). We observed that the central positions of β-strands have fewer insertions and deletions and lower relative side chain accessibility, similar to our results for α-helices. In addition, entropy decreased steadily from the N- to the C-terminus. Interestingly, packing and B-factor show higher values in the middle and lower values at both the N1 and C1 positions. However, packing at the C1 position is slightly lower than at the N1 position. As mentioned above, the distribution of amino acids at C1 and N1 differs in β-strands (Fig. 9); therefore, the difference in compactness between the termini is indeed encoded in the primary sequence. In contrast to recent works that evaluated a limited list of properties (propensity, χ2-values, hydrophobicity, and free energy) in β-strands[10,24], Table 1 provides a more complete picture of the position-dependent physicochemical properties of β-strands.
Table 1.
Tight turns
Physicochemical features and position-specific propensities of amino acids were calculated for tight turns, including δ-, γ-, β-, α-, and π-turns, also called 2-, 3-, 4-, 5-, and 6-turns, respectively (Tables 2 and 3). The obtained results demonstrated significant differences in the position-dependent distribution of amino acid residues and physicochemical properties in tight turns. The pattern of preference for physicochemical properties was completely different among the tight turn subclasses. However, packing at the C1 position was slightly lower than at the N1 position in β-, α-, and π-turns (Table 2). The significant differences observed in the distribution of amino acids confirm that the differences in compactness between the termini are indeed encoded in the primary sequence.
Table 2.
Table 3.
This work presents the most comprehensive analysis of position-dependent properties in protein secondary structures. An exhaustive study of the frequency of occurrence of individual amino acids and of physicochemical properties was carried out on a set of 34422, 63279, and 53192 sequences corresponding to α-helix, β-strand, and tight turns, respectively. The protein sample used in this study was very large, hence unbiased, giving high confidence in the obtained results, which are expressed in terms of global and local propensities. Some position-dependent physicochemical features were also studied in α-helix, β-strand, and tight turns. The amount of information collected will require further automated analysis in order to obtain useful predictive rules. The physicochemical properties and the data concerning individual and pair propensities generated in this work would be crucial as a starting point for predictive modeling. With this approach, we aimed to find some general rules that can be applied to any amino acid sequence in order to predict the stability of secondary structures.
In summary, our results suggested more compactness at the N-termini of α-helices, β-strands, and tight turns compared to their C-termini. We observed decaying periodicity in the position-specific propensities of amino acids in α-helices and in the length of α-helices. We also showed significant differences in the propensities of amino acids at different positions, which could guide the formation of secondary structures.
ACKNOWLEDGEMENTS
This research was financially supported by the Medical Image and Signal Processing Research Center, Isfahan University of Medical Sciences, Isfahan, Iran (Grant no. 295236).
Footnotes
CONFLICT OF INTEREST. None declared.
REFERENCES
- 1. Argos P, Palau J. Amino acid distribution in protein secondary structures. International journal of peptide and protein research. 1982;19(4):380–393. doi: 10.1111/j.1399-3011.1982.tb02619.x.
- 2. Levitt M. Conformational preferences of amino acids in globular proteins. Biochemistry. 1978;17(20):4277–4285. doi: 10.1021/bi00613a026.
- 3. Chou PY, Fasman GD. Conformational parameters for amino acids in helical, β-sheet, and random coil regions calculated from proteins. Biochemistry. 1974;13(2):211–222. doi: 10.1021/bi00699a001.
- 4. Davies DR. A correlation between amino acid composition and protein structure. Journal of molecular biology. 1964;9:605–609. doi: 10.1016/s0022-2836(64)80232-1.
- 5. Fooks HM, Martin ACR, Woolfson DN, Sessions RB, Hutchinson EG. Amino acid pairing preferences in parallel β-sheets in proteins. Journal of molecular biology. 2006;356(1):32–44. doi: 10.1016/j.jmb.2005.11.008.
- 6. Petukhov M, Muñoz V, Yumoto N, Yoshikawa S, Serrano L. Position dependence of non-polar amino acid intrinsic helical propensities. Journal of molecular biology. 1998;278(1):279–289. doi: 10.1006/jmbi.1998.1682.
- 7. Petukhov M, Uegaki K, Yumoto N, Yoshikawa S, Serrano L. Position dependence of amino acid intrinsic helical propensities II: Non-charged polar residues: Ser, Thr, Asn, and Gln. Protein science. 1999;8(10):2144–2150. doi: 10.1110/ps.8.10.2144.
- 8. Richardson JS, Richardson DC. Amino acid preferences for specific locations at the ends of alpha helices. Science. 1988;240(4859):1648–1652. doi: 10.1126/science.3381086.
- 9. Kumar S, Bansal M. Dissecting α-helices: position-specific analysis of α-helices in globular proteins. Proteins. 1998;31(4):460–476. doi: 10.1002/(sici)1097-0134(19980601)31:4<460::aid-prot12>3.0.co;2-d.
- 10. Bhattacharjee N, Biswas P. Position-specific propensities of amino acids in the β-strand. BMC structural biology. 2010;10:29. doi: 10.1186/1472-6807-10-29.
- 11. Crasto CJ, Feng J. Sequence codes for extended conformation: A neighbor-dependent sequence analysis of loops in proteins. Proteins. 2001;42(3):399–413. doi: 10.1002/1097-0134(20010215)42:3<399::aid-prot100>3.0.co;2-e.
- 12. Fonseca NA, Camacho R, Magalhaes A. Amino acid pairing at the N- and C-termini of helical segments in proteins. Proteins. 2008;70(1):188–196. doi: 10.1002/prot.21525.
- 13. Engel DE, DeGrado WF. Amino acid propensities are position-dependent throughout the length of α-helices. Journal of molecular biology. 2004;337(5):1195–1205. doi: 10.1016/j.jmb.2004.02.004.
- 14. Enkhbayar P, Hikichi K, Osaki M, Kretsinger RH, Matsushima N. 3₁₀-helices in proteins are parahelices. Proteins. 2006;64(3):691–699. doi: 10.1002/prot.21026.
- 15. Iqbalsyah TM, Doig AJ. Pairwise coupling in an Arg-Phe-Met triplet stabilizes α-helical peptide via shared rotamer preferences. Journal of the American chemical society. 2005;127(14):5002–5003. doi: 10.1021/ja043446e.
- 16. Sato T, Kawasaki T, Mine S, Matsumura H. Functional role of the C-terminal amphipathic helix 8 of olfactory receptors and other G protein-coupled receptors. International journal of molecular sciences. 2016;17(11):1930. doi: 10.3390/ijms17111930.
- 17. Liou YF, Huang HL, Ho SY. A hydrophobic spine stabilizes a surface-exposed α-helix according to analysis of the solvent-accessible surface area. BMC bioinformatics. 2016;17(Suppl 19):503. doi: 10.1186/s12859-016-1368-z.
- 18. Avbelj F, Moult J. Role of electrostatic screening in determining protein main chain conformational preferences. Biochemistry. 1995;34(3):755–764. doi: 10.1021/bi00003a008.
- 19. Street AG, Mayo SL. Intrinsic β-sheet propensities result from van der Waals interactions between side chains and the local backbone. Proceedings of the national academy of sciences. 1999;96(16):9074–9076. doi: 10.1073/pnas.96.16.9074.
- 20. Netzer R, Fleishman SJ. Programming protein shapes and specificities by modular design. Science. 2016;352(6286):657–658. doi: 10.1126/science.aaf7599.
- 21. Wang G, Dunbrack RL. PISCES: a protein sequence culling server. Bioinformatics. 2003;19(12):1589–1591. doi: 10.1093/bioinformatics/btg224.
- 22. Kabsch W, Sander C. Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers. 1983;22(12):2577–2637. doi: 10.1002/bip.360221211.
- 23. Chou K-C. Prediction of tight turns and their types in proteins. Analytical biochemistry. 2000;286(1):1–16. doi: 10.1006/abio.2000.4757.
- 24. Dai Q, Li Y, Liu X, Yao Y, Cao Y, He P. Comparison study on statistical features of predicted secondary structures for protein structural class prediction: From content to position. BMC bioinformatics. 2013;14:152–165. doi: 10.1186/1471-2105-14-152.
- 25. Duan M, Huang M, Ma C, Li L, Zhou Y. Position-specific residue preference features around the ends of helices and strands and a novel strategy for the prediction of secondary structures. Protein science. 2008;17(9):1505–1512. doi: 10.1110/ps.035691.108.
- 26. De Sousa MM, Munteanu CR, Pazos A, Fonseca NA, Camacho R, Magalhaes A. Amino acid pair- and triplet-wise groupings in the interior of α-helical segments in proteins. Journal of theoretical biology. 2011;271(1):136–144. doi: 10.1016/j.jtbi.2010.11.028.
- 27. Fujiwara K, Toda H, Ikeguchi M. Dependence of α-helical and β-sheet amino acid propensities on the overall protein fold type. BMC structural biology. 2012;12:18. doi: 10.1186/1472-6807-12-18.
- 28. Penel S, Morrison RG, Mortishire-Smith RJ, Doig AJ. Periodicity in α-helix lengths and C-capping preferences. Journal of molecular biology. 1999;293(5):1211–1219. doi: 10.1006/jmbi.1999.3206.
- 29. Sreerama N, Venyaminov SY, Woody RW. Estimation of the number of α-helical and β-strand segments in proteins using circular dichroism spectroscopy. Protein science. 1999;8(2):370–380. doi: 10.1110/ps.8.2.370.