Abstract
Linear groups—polypeptide conformations based on a single repeating φ,ψ-pair—are a foundational concept in protein structure, yet how they are presented in textbooks is based largely on theoretical studies from the early days of protein structure analysis. Now, ultra-high resolution protein structures provide a resource for an accurate empirical and systematic assessment of the linear groups that truly exist in proteins. Here, a purely conformation-based survey of linear groups shows that only three distinct φ,ψ-regions occur: a diverse set of extended conformations mostly present as β-strands, a broad population of polyproline-II-like spirals, and a tight cluster that includes the highly populated α-helix and the conformationally-similar but much less populated 310-helix. Rare, short left-handed α-/310-helical turns with repeating φ,ψ-angles occur, but none are longer than three residues. Misperceptions dispelled by this study are the existence of 2.27- and π-helices as linear groups, the existence of specific ideal φ,ψ-angles for each linear group, and the existence of a substantive difference in the φ,ψ-preferences for parallel versus antiparallel β-strands. This study provides a concrete basis for updating and enhancing how we think about and teach the basics of protein structure.
Keywords: Ramachandran plot, linear group, α-helix, β-sheet, polyproline, left-handed helix, protein standard conformation, secondary structure
Statement for Broader Audience
The fundamentals of protein conformation are currently presented in textbooks and taught based on out-of-date information that misses the true nature of the conformations that build proteins. This work asks one simple question that provides key information and insights needed for revising our understanding and teaching of these basics.
In 1950, based on crystal structures of five small-molecule peptides, Corey and Donohue1 published the first set of standard bond lengths and angles for guiding polypeptide modeling. Pauling and coworkers then used this geometry to search for linear groups, defined as a series of residues all having identical conformations†, that could enter regular hydrogen-bonded interactions: they predicted the γ- and α-helices2 and the β-pleated sheet.3 Subsequently, additional linear groups were predicted including the π-helix,4 the 2.27-ribbon and the 310-helix.5 Later, structural studies led to the recognition of the poly-l-proline II (PII)/polyglycine II/collagen-like helix as a linear group that does not undergo regular hydrogen bonding.6–8 Although some predicted linear groups, such as Pauling's γ-helix,2 were never seen in proteins and are no longer referred to, a subset of linear groups compiled in 1970,9 with a set of associated φ,ψ-angles, are often presented in contemporary biochemistry10–13 and structural biology texts14,15 as standard protein conformations (see Fig. 1).
Although much has been learned about structural motifs in proteins,16,17 we were surprised to find no single systematic survey in the literature directly addressing the occurrence of linear groups in proteins that updates the information summarized by IUPAC in 1970. What is needed is a simple, direct analysis that, independent of hydrogen bonding patterns, assesses which single conformations are seen to repeat in real proteins. Here, we present such an analysis.
The 1.2 Å Resolution Data Set
The primary data set was created by a search of a Protein Geometry Database (DSB, PAK, unpublished) for three-residue segments in protein structures determined at ≤1.2 Å resolution and which according to the PDBSelect March 2006 release18 had ≤25% sequence identity with any other included structure. Each residue furthermore was required to have an average backbone B-factor ≤25 Å2 and a trans peptide bond (|ω| > 140°). The search resulted in 30,692 segments from 209 protein chains. Considering just the central residue of each segment, all amino acid types were well-represented (numbers of occurrences ranging from 2,710 for Ala to 472 for Trp) and the φ,ψ-angles [Fig. 2(A)] were distributed as expected based on previous analyses using strict selection criteria.20–22 In terms of regular hydrogen-bonded secondary structures, 10,028 (33%), 8,756 (29%), 1,200 (4%), and 9 (0.03%) residues were classified by DSSP23 as α-helix (α), β-sheet (β), 310-helix, and π-helix, respectively.
Identification of Linear Groups
Identifying linear groups requires selecting criteria for the minimal number of residues and the φ,ψ-variation allowed: the longer the minimal length and the narrower the allowed φ,ψ-variation, the fewer groups will be found but the more truly linear those groups will be. A minimal length of one is meaningless because it would simply define every observed conformation as a linear group. In addition, two φ,ψ-pairs are considered to define turn types rather than “regular” secondary structures.24 While three residue segments also make up many turn structures,25 we settled on three residues as the shortest (i.e. least stringent) reasonable length requirement, acknowledging that some of the groups satisfying these criteria will not actually represent true linear groups that occur at longer lengths.
In terms of φ,ψ-variation allowed, we first carried out a low-stringency study, considering each residue to simply belong to one of the three main well-populated regions of the Ramachandran plot often referred to as the alpha, beta, and l-alpha regions [Fig. 2(A)]. Segments with all three residues residing in the alpha, beta, or l-alpha regions accounted for approximately 30%, 23%, and 0.02% of all residues respectively, leaving over 45% of the segments adopting more conformationally diverse structures. Whereas no hydrogen-bonding criteria were used in the search, in all cases, the majority of qualifying segments were part of regularly hydrogen-bonded β-strands and right-handed α- or 310-helices [Fig. 2(A)]. The six segments qualifying in the L-alpha region included four 3-residue segments and one 4-residue segment, leading us to conclude that whereas they passed this minimal filter, they do not occur in segments long enough to be considered true linear groups (Supporting Information Fig. S1).
At a more discriminatory and informative level, we assessed linear groups as three consecutive residues all having the same φ,ψ-angles within ±10° by mapping φ,ψ-space in 20° × 20° boxes at 10° intervals. There were substantial numbers of observations in the alpha and beta regions [Figs. 2(B,C)]. Qualitatively equivalent results were obtained searching 30° × 30° boxes, showing that the results are not highly sensitive to the choice of box size (Supporting Information Fig. S2).
The alpha region shows a rather tight distribution of qualifying linear groups, with a single 20° × 20° box (centered at ϕ,ψ = −60, −40) having more than 4,000 observations [Fig. 2(B)]. In terms of hydrogen-bonding patterns, all occurrences at lower ψ-values were α-helical with 310- and mixed α-/310-helices occurring at the higher ψ-values.
For the beta region, this more stringent mapping reveals a natural division into two populations: a broad elliptical grouping with nearly all observations having β-sheet hydrogen-bonding patterns and a smaller but also well-dispersed grouping around the classical PII conformation that is nearly devoid of regular β-sheet hydrogen bonding [Fig. 2(C)]. The narrow bridge between the two regions is centered near ϕ,ψ = −95,+140. Despite the standard depiction of distinct ideal values for antiparallel and parallel β-strands (see Fig. 1), both types of strands are seen to have their highest density of observations surrounding ϕ,ψ = −115, +130, and both occur throughout the elliptical β-sheet forming region. In agreement with Nagano,26 it appears that the parallel strands are somewhat less diverse in conformation [Fig. 2(C), Supporting Information Figure S3].
Discussion
This survey shows that considering conformational properties alone, only three distinct linear groups comprise the protein-building toolkit. The most populated is a right-handed helical conformation dominated by the α-helix but also including the 310-helix and mixed α/310 forms. The second is a diverse group of extended conformations that are dominated by residues occurring in β-strands in both parallel and antiparallel β-sheets. The third is a set of left-handed spiral conformations in the PII area that generally are not a part of β-sheets. Although these results may not surprise many structural biologists, they illustrate some basic features of protein structure that are not generally appreciated or incorporated into current curricula.
A first point worth emphasizing is that beyond these three clusters, no other true linear building blocks of proteins exist. Textbooks can now be clear that although isolated residues and rare short segments may exist with the 2.27-ribbon, the π-helical, and the left-handed α- or 310-helical conformations, those conformations do not occur in extended segments having repeating ϕ,ψ-angles. This conclusion is in apparent contradiction to reports describing π-helices in proteins as uncommon but real.27,28 The resolution to this apparent contradiction is that the reported π-helices were defined not by conformation, but simply by the presence of two or more consecutive i+5,i hydrogen-bonds. They do not satisfy the definition of true linear groups as they are formed by a series of residues conformations varying by up to 60° in both ϕ and ψ (see Fig. 2 of Fodje and Al-Karadaghi28); also, the large majority are short with only two hydrogen-bonds.
A related inference is that the 310-helices identified by hydrogen-bonding are mostly not built from a narrowly repeating conformation. Considering just the central residue of each three-residue segment (see above), there are 12% (1200/10,028) as many residues in 310-helices as in α-helices. In contrast, for the three-residue segments having consistent ϕ,ψ-angles, the ratio is only 0.8% [67/8,623 derived from the sums of the occurrences in Fig. 2(B)]. This means over 90% of residues identified as 310-helical in proteins are involved in turn-like conformations rather than true linear groups. This fits with the observation that 310-helices are mostly short and associated with the beginnings or ends of α-helices.29
A second main point is that the β-strand and PII represent fully distinct linear groups [Fig. 2(B)]. The PII conformation was overlooked for many years in proteins because it is not defined by hydrogen bonding. It came to be recognized as an important element of folded proteins in the 1990s30, occurring in lengths up to 12 residues,31 and since then has become recognized as a significant conformation in unfolded peptides and proteins.32–34 It has been noted that PII is a confusing and unfortunate designation, since the conformation is not just associated with Pro but can be adopted by all amino acids. In the study here, about one-third of the residues in the center of PII tripeptides are Pro; the rest include all types of amino acids. Perhaps the common name could be changed to a more general “polypeptide-II” conformation. This would maintain the familiar PII acronym, avoid the misleading association with only Pro, and be consistent with the observation that it is a prominent conformation in unfolded polypeptide chains. Similarly, we suggest that the region of the Ramachandran plot broadly referred to as the β-region [Fig. 2(A)], be renamed the β/PII-region so the nomenclature used lays a foundation for proper recognition of both of the two main contributing conformations.
A third important point is that the β-strand and PII populations are both very spread out and cannot be well-characterized by a single ϕ,ψ-conformation. In contrast, the α-/310-helix group is more tightly clustered. The tighter clustering of the ϕ,ψ-preferences for the α-/310-helices compared to the b-strand and PII spirals is consistent with different hydrogen-bonding constraints: the α-/310-helices are constrained by the need to satisfy local hydrogen bonding, whereas both β-strands and PII spirals exhibit variety as they access many relatively isoenergetic conformations to optimize tertiary hydrogen-bonding interactions with other parts of the protein and solvent. Given these spreads, it is misleading to give a single ϕ,ψ as the “ideal” β or PII value. Also misleading is the assignment of distinct ideal values for parallel and anti-parallel β-strands (see Fig. 1) that are here shown to exhibit no large difference in preferred value [Fig. 2(C)]. Instead, we suggest a simple figure illustrating the full range of ideal values for each type of linear group would better convey the reality that a range of values can all be considered normal (Fig. 3, and Supporting Information Fig. S4).
Acknowledgments
The authors thank Kevin Ahern for discussions and encouragement. They also thank all the crystallographers who have graciously deposited their coordinate sets in the PDB.
Footnotes
Technically, linear groups may also be called helices, but as the term helix is now commonly used to refer to any spiral-like structure, we use the less ambiguous term here.
References
- 1.Corey R, Donohue J. Interatomic distances and bond angles in the polypeptide chain of proteins. Proc Natl Acad Sci USA. 1950;72:2899–2900. [Google Scholar]
- 2.Pauling L, Corey RB, Branson HR. The structure of proteins; two hydrogen-bonded helical configurations of the polypeptide chain. Proc Natl Acad Sci USA. 1951;37:205–211. doi: 10.1073/pnas.37.4.205. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 3.Pauling L, Corey RB. The pleated sheet, a new layer configuration of polypeptide chains. Proc Natl Acad Sci USA. 1951;37:251–256. doi: 10.1073/pnas.37.5.251. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 4.Low B, Baybutt R. The pi-helix—a hydrogen bonded configuration of the polypeptide chain. J Am Chem Soc. 1952;74:5806–5807. [Google Scholar]
- 5.Donohue J. Hydrogen bonded helical configurations of the polypeptide chain. Proc Natl Acad Sci USA. 1953;39:470–478. doi: 10.1073/pnas.39.6.470. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 6.Arnott S, Dover SD. The structure of poly-l-proline II. Acta Crystallogr B. 1968;24:599–601. doi: 10.1107/s056774086800289x. [DOI] [PubMed] [Google Scholar]
- 7.Sasisekharan V. Structure of poly-l-proline II. Acta Crystallogr. 1959;12:897–903. [Google Scholar]
- 8.Ramachandran GN, Ramakrishnan C, Sasisekharan V. Stereochemistry of polypeptide chain configurations. J Mol Biol. 1963;7:95–99. doi: 10.1016/s0022-2836(63)80023-6. [DOI] [PubMed] [Google Scholar]
- 9.IUPAC. IUPAC-IUB Commission on biochemical nomenclature. Abbreviations and symbols for the description of the conformation of polypeptide chains. Biochemistry. 1970;9:3471–3479. doi: 10.1021/bi00820a001. [DOI] [PubMed] [Google Scholar]
- 10.Garrett R, Grisham CM. Biochemistry. 3rd ed. Belmont, CA: Thomson Brooks/Cole; 2005. [Google Scholar]
- 11.Mathews CK, Van Holde KE, Ahern KG. Biochemistry. 3rd ed. San Francisco, California: Benjamin Cummings; 2000. [Google Scholar]
- 12.Voet D, Voet JG. Biochemistry. 3rd ed. Hoboken, NJ: Wiley; 2004. [Google Scholar]
- 13.Lehninger AL, Nelson DL, Cox MM. Lehninger principles of biochemistry. 5th ed. New York: W.H. Freeman; 2008. [Google Scholar]
- 14.Lesk AM. Introduction to protein architecture: the structural biology of proteins. Oxford: Oxford University Press; 2001. [Google Scholar]
- 15.Van Holde KE, Johnson WC, Ho PS. Principles of physical biochemistry. 2nd ed. Upper Saddle River, NJ: Pearson/Prentice Hall; 2006. [Google Scholar]
- 16.Beck DA, Alonso DO, Inoyama D, Daggett V. The intrinsic conformational propensities of the 20 naturally occurring amino acids and reflection of these propensities in proteins. Proc Natl Acad Sci USA. 2008;105:12259–12264. doi: 10.1073/pnas.0706527105. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 17.Perskie LL, Street TO, Rose GD. Structures, basins, and energies: A deconstruction of the protein coil library. Protein Sci. 2008;17:1151–1161. doi: 10.1110/ps.035055.108. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 18.Hobohm U, Sander C. Enlarged representative set of protein structures. Protein Sci. 1994;3:522–524. doi: 10.1002/pro.5560030317. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 19.Laskowski RA, Chistyakov VV, Thornton JM. PDBsum more: new summaries and analyses of the known 3D structures of proteins and nucleic acids. Nucleic Acids Res. 2005;33:D266–D268. doi: 10.1093/nar/gki001. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 20.Karplus PA. Experimentally observed conformation-dependent geometry and hidden strain in proteins. Protein Sci. 1996;5:1406–1420. doi: 10.1002/pro.5560050719. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 21.Kleywegt GJ, Jones TA. Phi/psi-chology: Ramachandran revisited. Structure. 1996;4:1395–1400. doi: 10.1016/s0969-2126(96)00147-5. [DOI] [PubMed] [Google Scholar]
- 22.Lovell SC, Davis IW, Arendall WB, III, de Bakker PI, Word JM, Prisant MG, Richardson JS, Richardson DC. Structure validation by Cα geometry: phi, psi, and Cβ deviation. Proteins. 2003;50:437–450. doi: 10.1002/prot.10286. [DOI] [PubMed] [Google Scholar]
- 23.Kabsch W, Sander C. Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers. 1983;22:2577–2637. doi: 10.1002/bip.360221211. [DOI] [PubMed] [Google Scholar]
- 24.Venkatachalam CM. Stereochemical criteria for polypeptides and proteins. V. Conformation of a system of three linked peptide units. Biopolymers. 1968;6:1425–1436. doi: 10.1002/bip.1968.360061006. [DOI] [PubMed] [Google Scholar]
- 25.Street TO, Fitzkee NC, Perskie LL, Rose GD. Physical-chemical determinants of turn conformations in globular proteins. Protein Sci. 2007;16:1720–1727. doi: 10.1110/ps.072898507. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 26.Nagano K. Logical analysis of the mechanism of protein folding. IV. Super-secondary structures. J Mol Biol. 1977;109:235–250. doi: 10.1016/s0022-2836(77)80032-6. [DOI] [PubMed] [Google Scholar]
- 27.Weaver TM. The pi-helix translates structure into function. Protein Sci. 2000;9:201–206. doi: 10.1110/ps.9.1.201. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 28.Fodje MN, Al-Karadaghi S. Occurrence, conformational features and amino acid propensities for the pi-helix. Protein Eng. 2002;15:353–358. doi: 10.1093/protein/15.5.353. [DOI] [PubMed] [Google Scholar]
- 29.Richardson JS. The anatomy and taxonomy of protein structure. Adv Protein Chem. 1981;34:167–339. doi: 10.1016/s0065-3233(08)60520-3. [DOI] [PubMed] [Google Scholar]
- 30.Adzhubei AA, Sternberg MJ. Left-handed polyproline II helices commonly occur in globular proteins. J Mol Biol. 1993;229:472–493. doi: 10.1006/jmbi.1993.1047. [DOI] [PubMed] [Google Scholar]
- 31.Stapley BJ, Creamer TP. A survey of left-handed polyproline II helices. Protein Sci. 1999;8:587–595. doi: 10.1110/ps.8.3.587. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 32.Shi Z, Woody RW, Kallenbach NR. Is polyproline II a major backbone conformation in unfolded proteins? Adv Protein Chem. 2002;62:163–240. doi: 10.1016/s0065-3233(02)62008-x. [DOI] [PubMed] [Google Scholar]
- 33.Creamer TP, Campbell MN. Determinants of the polyproline II helix from modeling studies. Adv Protein Chem. 2002;62:263–282. doi: 10.1016/s0065-3233(02)62010-8. [DOI] [PubMed] [Google Scholar]
- 34.Whittington SJ, Chellgren BW, Hermann VM, Creamer TP. Urea promotes polyproline II helix formation: implications for protein denatured states. Biochemistry. 2005;44:6269–6275. doi: 10.1021/bi050124u. [DOI] [PubMed] [Google Scholar]