Abstract
We investigate the supersecondary structure of a large group of proteins, the so-called sandwich proteins. The analysis of a large number of such proteins has led us to propose a set of rules that can be used to predict the possible arrangements of strands in the two β-sheets forming a given sandwich structure. These rules imply the existence of certain invariant supersecondary substructures common to all sandwich proteins. Furthermore, they dramatically restrict the number of permissible arrangements. For example, whereas for proteins consisting of three strands in each β-sheet 180 possible strand arrangements exist a priori, our rules imply that only 15 of them are permissible. Five of these predicted arrangements describe all currently known sandwich proteins with six strands.
Keywords: protein secondary structure, supersecondary structure, structure prediction
Proteins that do not appear to have sequence similarity may have similar folds (1–4). This fact implies that structure is better conserved than sequence. One of the possible explanations for the limited number of protein folds is that certain rules exist that constrain the folding of a polypeptide chain. It has been suggested that such rules can be divided into two types: rules that allow one, starting from a given sequence, to predict the secondary structure elements (strands and helices), and rules that govern the assembling of strands and helices into a tertiary structure (5–9). Both of these problems are actively investigated by computational structural biologists.
Much progress has been made on the first problem. Indeed, several efficient programs, most of them based on neural networks, can identify strands and helices in a given amino acid sequence (10–13).
Researchers investigating the second problem are trying to find rules that constrain the packing of strands and helices into the limited set of canonical structures found in nature. For β proteins, this problem can be formulated as follows: Starting from a given consecutive set of strands, the goal is to find the possible arrangements of these strands in the β-sheets. The pioneering research of this kind is the famous work of Richardson (14), who discovered the so-called Greek key topology: it has been shown that this structural motif occurs in ≈70% of β proteins. Another interesting approach to this problem is based on the analysis of the folding mechanism of β proteins (5): it has been shown that the number of permissible topologies depends on the number of β-strands and on the folding pathway. Furthermore, a number of regularities for β-sheet motifs, such as the nonexistence of knots in the chain, have been discovered (15–20). It has also been shown that loops do not cross (14, 21).
In this article, we focus on a large group of β proteins, the so-called sandwich-like proteins. This type of architecture unites a number of very different protein superfamilies, which have no detectable sequence homology. The aim of our research is to find rules that govern the structural motifs common to all sandwich proteins. In this respect, we note the recent discovery of a certain supersecondary substructure (“interlock”), which appears to be a geometrical invariant of sandwich-like proteins (22). Here, we investigate further the supersecondary motifs of sandwich proteins. We show that the formation of an interlock is a particular manifestation of a general set of rules for sandwich proteins. These rules allow one to predict the arrangement of the strands in the β-sheets and all permissible variants of supersecondary structure motifs.
Methods and Results
Supersecondary Structure of Sandwich Proteins. Sandwich-like proteins constitute a large group of proteins. According to the SCOP structural classification of proteins, about one-third of β proteins are described as sandwich-like. Sandwich proteins are defined as those structures that consist of two main β-sheets packed against each other. Some sandwich proteins contain more than two β-sheets; the additional sheets are called “auxiliary” sheets. In this article, we consider only sandwich proteins consisting of two β-sheets with antiparallel strands.
In our analysis, following the SCOP hierarchical classification, we have considered the structures of 27 folds, 52 superfamilies, 87 families, and 177 domains (23).
The analysis starts with the assignment of a secondary state to each residue. To define the secondary structures of a protein from its atomic coordinates we have calculated the hydrogen bonds between the main-chain atoms of pairs of residues. By using the list of the hydrogen bonds, we have determined both the strands and the order of the strands in the β-sheets, i.e., the supersecondary structure motifs.
For example, for chain a of the structure 1MSP, the hydrogen-bond calculations revealed nine strands (Fig. 1) and their arrangement in the two β-sheets, A and B:
![]() |
Fig. 1.
Schematic representation of the strands and of their arrangement in the β-sheets of the 1MSP structure. (A) The strands are numbered consecutively, starting from the N terminus of chain a. (B) Fold of chain a of the motile major sperm protein crystal structure.
An important characteristic of the supersecondary structure is the mutual orientation of the two sheets, i.e., the order of the strands in the sheets. For example, in the 1MSP structure the order of the strands in sheet A from the right to the left edge is 5, 7, 3, 1, and the order of the strands in sheet B from right to left is 6, 4, 8, 9, 2 (see Fig. 1B).
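The step from a list of hydrogen-bonded strand pairs to the order of strands in each sheet can be illustrated with a minimal sketch. The pair list below is an assumption: it is read off the 1MSP strand order quoted above rather than recomputed from atomic coordinates, and the function name `sheets_from_pairs` is hypothetical.

```python
from collections import defaultdict

# Hypothetical input: pairs of strands joined by main-chain hydrogen bonds.
# These pairs are read off the 1MSP strand order quoted in the text; they are
# not recomputed from atomic coordinates.
NEIGHBOR_PAIRS = [(5, 7), (7, 3), (3, 1), (6, 4), (4, 8), (8, 9), (9, 2)]


def sheets_from_pairs(pairs):
    """Group strands into sheets and return each sheet's edge-to-edge order."""
    adjacency = defaultdict(set)
    for a, b in pairs:
        adjacency[a].add(b)
        adjacency[b].add(a)

    sheets, seen = [], set()
    # Edge strands have exactly one neighbor; start each walk from one of them.
    for start in sorted(s for s in adjacency if len(adjacency[s]) == 1):
        if start in seen:
            continue
        order, previous, current = [], None, start
        while current is not None:
            order.append(current)
            seen.add(current)
            following = [s for s in adjacency[current] if s != previous]
            previous, current = current, (following[0] if following else None)
        sheets.append(order)
    return sheets


print(sheets_from_pairs(NEIGHBOR_PAIRS))
# [[1, 3, 7, 5], [2, 9, 8, 4, 6]]
```

The pairing information fixes the order in each sheet only up to reversal, so the output lists the sheets from the left edge to the right edge, the mirror image of the right-to-left listing used in the text. The sketch assumes open sheets; a closed barrel has no edge strand and would need separate handling.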
Jumping Pairs (JPs) of Strands. We will use the following definitions.
Neighboring strands (NS) are strands found in the same β-sheet and connected by hydrogen bonds between the main-chain atoms. Each strand has two NS unless it occurs at the edge of the sheet. For example, in 1MSP, strand 7 is the right NS of strand 3 and strand 1 is the left NS of strand 3 (Fig. 1B).
Two consecutive strands i and i + 1 are called a JP if they are in different sheets. If both i and i + 1 are at the edges of the same side of the two sheets, then the JP is called an edge JP (EJP); otherwise it is called an internal JP (IJP). For example, in the 1MSP structure strands 2 and 3, 3 and 4, 4 and 5, 6 and 7, 7 and 8, and 9 and 1 are IJPs, whereas strands 1 and 2 as well as 5 and 6 are EJPs.
Throughout this article we assume periodicity; i.e., the first strand of the domain follows the last strand. For example, strands 9 and 1 are considered to be consecutive strands.
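The definitions above translate directly into a small classification routine. The following is a minimal sketch, assuming the two sheets are given as lists ordered from the right edge to the left edge (as in the 1MSP example); the function name `classify_jumping_pairs` is hypothetical.

```python
# Strand order in each sheet of 1MSP, listed from the right edge to the left
# edge as in the text (index 0 = right edge, last index = left edge).
SHEET_A = [5, 7, 3, 1]
SHEET_B = [6, 4, 8, 9, 2]


def classify_jumping_pairs(sheet_a, sheet_b):
    """Return (internal, edge) lists of jumping pairs (i, i + 1)."""
    n = len(sheet_a) + len(sheet_b)
    sheet_of = {s: sheet for sheet in (sheet_a, sheet_b) for s in sheet}

    def edge_side(strand):
        """'right' or 'left' for an edge strand, None for an inner strand."""
        sheet = sheet_of[strand]
        if strand == sheet[0]:
            return "right"
        if strand == sheet[-1]:
            return "left"
        return None

    internal, edge = [], []
    for i in range(1, n + 1):
        nxt = i % n + 1  # periodicity: strand 1 follows strand n
        if sheet_of[i] is sheet_of[nxt]:
            continue  # same sheet, so not a jumping pair
        side = edge_side(i)
        if side is not None and side == edge_side(nxt):
            edge.append((i, nxt))
        else:
            internal.append((i, nxt))
    return internal, edge


internal, edge = classify_jumping_pairs(SHEET_A, SHEET_B)
print(internal)  # [(2, 3), (3, 4), (4, 5), (6, 7), (7, 8), (9, 1)]
print(edge)      # [(1, 2), (5, 6)]
```

Strands 8 and 9 lie in the same sheet and therefore do not form a JP; the pair 9 and 1 is picked up through the periodicity condition.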
Constraint Rules. The analysis of the arrangement of the strands has led us to formulate certain rules, which impose severe restrictions on the possible supersecondary structural motifs.
A distinctive feature of sandwich proteins is that an IJP exists in every sandwich structure.
Rule I. This rule describes the formation of a certain fundamental supersecondary sandwich substructure called interlock in ref. 22. Let i and i + 1 be an IJP. For every IJP there exists another IJP, k and k + 1, such that k is a NS of i and k + 1 is a NS of i + 1. Strand k can be either the left or the right NS of i. If k is the left (right) NS of i, then k + 1 is the right (left) NS of i + 1:
![]() |
(The JP i and i + 1 is indicated here by an arrow.) Each IJP participates in the formation of one, and only one, interlock.
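Rule I can be checked mechanically for a given structure. The sketch below is an illustration under stated assumptions: both sheets are listed in the same right-to-left frame, so that "left" and "right" can be compared through list positions, and the list of IJPs for 1MSP is supplied by hand. The function name `interlock_partners` is hypothetical.

```python
SHEET_A = [5, 7, 3, 1]      # 1MSP, right edge to left edge
SHEET_B = [6, 4, 8, 9, 2]
# Internal jumping pairs of 1MSP (the edge pairs 1-2 and 5-6 are excluded).
IJPS = [(2, 3), (3, 4), (4, 5), (6, 7), (7, 8), (9, 1)]


def interlock_partners(ijps, sheet_a, sheet_b):
    """Map each IJP (i, i + 1) to the IJPs (k, k + 1) that satisfy rule I."""
    position = {s: idx for sheet in (sheet_a, sheet_b)
                for idx, s in enumerate(sheet)}
    sheet_of = {s: name for name, sheet in (("A", sheet_a), ("B", sheet_b))
                for s in sheet}

    def offset(strand, reference):
        """Signed distance within a sheet, or None if the sheets differ."""
        if sheet_of[strand] != sheet_of[reference]:
            return None
        return position[strand] - position[reference]

    partners = {}
    for i, i_next in ijps:
        matches = []
        for k, k_next in ijps:
            if (k, k_next) == (i, i_next):
                continue
            first, second = offset(k, i), offset(k_next, i_next)
            # k adjacent to i, k + 1 adjacent to i + 1, on opposite sides.
            if first in (-1, 1) and second == -first:
                matches.append((k, k_next))
        partners[(i, i_next)] = matches
    return partners


for ijp, found in interlock_partners(IJPS, SHEET_A, SHEET_B).items():
    print(ijp, found)
```

For 1MSP every IJP has exactly one partner, giving the three interlocks 2 → 3 with 9 → 1, 3 → 4 with 7 → 8, and 4 → 5 with 6 → 7.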
Rule II. This rule describes the possible arrangements of four consecutive strands i – 1, i, i + 1, and i + 2. Let i and i + 1 be an IJP. Two alternative positions for strand i – 1 exist: either (a) it is a NS of i,
![]() |
or (b) it is on the same sheet with i + 1 and at most four positions away from it,
![]() |
The situation is analogous for i + 2; namely, it is either a NS of i + 1, or it is in the same sheet with i and at most four positions away from it. Furthermore, strands i – 1 and i + 2 are on opposite sides with respect to the IJP i and i + 1, and strand i – 1 is either on the same side with the strand k + 1, or it is strand k + 1.
A particular manifestation of this rule is the case that i – 1 is a NS of i and i + 2 is a NS of i + 1. We call the resulting supersecondary substructure a superinterlock.
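The positional part of rule II can also be sketched in code. This is a partial check only, under several assumptions: the sheets are listed in a common right-to-left frame, "on opposite sides" is interpreted as opposite signs of the list offsets, and the clause relating strand i – 1 to strand k + 1 is not tested. The names `placement` and `satisfies_rule_two` are hypothetical.

```python
SHEET_A = [5, 7, 3, 1]      # 1MSP, right edge to left edge
SHEET_B = [6, 4, 8, 9, 2]
IJPS = [(2, 3), (3, 4), (4, 5), (6, 7), (7, 8), (9, 1)]
N = len(SHEET_A) + len(SHEET_B)
POSITION = {s: idx for sheet in (SHEET_A, SHEET_B) for idx, s in enumerate(sheet)}
SHEET_OF = {s: name for name, sheet in (("A", SHEET_A), ("B", SHEET_B))
            for s in sheet}


def placement(strand, near, far, max_distance=4):
    """Signed offset of `strand` if rule II allows its position, else None.

    `near` is the IJP strand consecutive to `strand`; `far` is the other one.
    """
    if SHEET_OF[strand] == SHEET_OF[near]:
        # Case (a): same sheet as the consecutive strand, so it must be a NS.
        offset = POSITION[strand] - POSITION[near]
        return offset if abs(offset) == 1 else None
    # Case (b): same sheet as the other IJP strand, within max_distance.
    offset = POSITION[strand] - POSITION[far]
    return offset if 1 <= abs(offset) <= max_distance else None


def satisfies_rule_two(i, i_next):
    before = (i - 2) % N + 1   # strand i - 1, with periodicity
    after = i_next % N + 1     # strand i + 2, with periodicity
    side_before = placement(before, near=i, far=i_next)
    side_after = placement(after, near=i_next, far=i)
    if side_before is None or side_after is None:
        return False
    # Assumed reading of "opposite sides": offsets of opposite sign.
    return (side_before > 0) != (side_after > 0)


print(all(satisfies_rule_two(i, i_next) for i, i_next in IJPS))  # True
```

Under these assumptions all six IJPs of 1MSP pass the check.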
Rule III. This rule describes the formation of EJPs. In every sandwich structure two EJPs exist; i.e., JPs are at both edges. In the 1MSP structure two EJPs are strands 1 and 2 at one edge and strands 5 and 6 at the other (Fig. 1B).
An example. We now analyze every IJP of the example 1MSP (see Fig. 1B). For this analysis, we indicate the IJP we analyze by an arrow, and if either of the two strands of the IJP is an edge strand, then we place the symbol • next to it.
The first IJP is formed by strands i = 2 and i + 1 = 3,
![]() |
In accordance with rule II, strand 1 (i – 1) is in sheet A and one position away from 3 (i + 1), whereas strand 4 (i + 2) is in sheet B and three positions away from strand 2 (i). Furthermore, strands 1 and 4 are on opposite sides of the IJP 2 → 3.
In accordance with rule I, there is another IJP, 9 and 1, which forms an interlock with the IJP 2 → 3. The structure consists of nine strands; thus, because of the condition of periodicity, k = 9 and k + 1 = 1.
The next IJP (i → i + 1) is 3 and 4,
![]() |
The IJP 7 and 8 forms an interlock with the IJP 3 → 4, in accordance with rule I (i = 3 and k = 7). Strand 2 (i – 1) is on sheet B and three positions away from strand 4, whereas strand 5 (i + 2) is on sheet A and two positions away from strand 3. Furthermore, strands 2 and 5 are on opposite sides of the IJP 3 → 4.
The next IJP is 4 and 5,
![]() |
The IJP 6 and 7 forms an interlock with the IJP 4 → 5 in accordance with rule I (k = 6). Also, strand 3 (i – 1) is on sheet A and two positions away from strand 5, whereas strand 6 (i + 2) is on sheet B and one position away from strand 4. Furthermore, strands 3 and 6 are on opposite sides of the IJP 4 → 5.
The analysis for the IJPs 6 → 7 and 7 → 8 is similar. The last IJP is 9 and 1,
![]() |
The IJP 2 and 3 forms an interlock with the IJP 9 → 1 in accordance with rule I (k = 2). Also, strand 8 (i – 1) is a NS of 9 and strand 2 (i + 2) is on sheet B and one position away from strand i (9). Furthermore, strands 8 and 2 are on opposite sides of the IJP 9 → 1.
Finally, in accordance with rule III, two EJPs exist, namely, 1 and 2 as well as 5 and 6.
The Analysis of the Arrangement of Strands. We have investigated the arrangement of strands in the structures of 177 protein sandwich domains. The two β-sheets of these structures consist of a total of 6–11 strands. Our analysis has revealed that there are 58 supersecondary structural motifs, which describe all of these domains (Table 1). Each motif is characterized by a unique arrangement of strands.
Table 1. Supersecondary structural motifs.
| Strands | Variants | Domains | IJPs | Motifs |
|---|---|---|---|---|
| 6 | 5 | 6 | 10 | 3 6 1 / 4 5 2 |
| 7 | 13 | 49 | 36 | 4 3 6 7 / 5 2 1 |
| 8 | 13 | 65 | 36 | 6 3 8 1 / 5 4 7 2 |
| 9 | 17 | 39 | 60 | 5 4 8 9 2 / 6 7 3 1 |
| 10 | 7 | 11 | 30 | 6 5 4 9 10 2 / 7 8 3 1 |
| 11 | 3 | 7 | 13 | 8 7 10 11 1 2 / 9 6 5 3 4 |

Strands, the number of strands that make up the two main sheets; Variants, the number of different motifs with a given number of strands; Domains, the number of protein domains where the variants with the given number of strands were found; IJPs, the total number of IJPs in all variants; Motifs, the most common supersecondary structural motif, with the strand orders of the two β-sheets separated by a slash.
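As a quick consistency check, the column totals of Table 1 can be compared with the counts quoted in the text (58 motifs, 177 domains, and 185 IJPs). The dictionary below simply transcribes the table; its name `TABLE_1` is hypothetical.

```python
# Table 1 transcribed as: strands -> (variants, domains, IJPs).
TABLE_1 = {
    6: (5, 6, 10),
    7: (13, 49, 36),
    8: (13, 65, 36),
    9: (17, 39, 60),
    10: (7, 11, 30),
    11: (3, 7, 13),
}

# Sum each column across all strand counts.
variants, domains, ijps = (sum(column) for column in zip(*TABLE_1.values()))
print(variants, domains, ijps)  # 58 177 185
```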
The proteins in these domains have different 3D structures and functions. Included among these proteins are the following: Igs, T cell antigen receptor, fibronectins, β-galactosidase, cadherins, IFN-γ receptor, cupredoxins, fibroblast growth factor receptor, killer cell inhibitory receptor, yeast killer toxin, growth hormone receptor, plant lipoxygenase, trypsin inhibitor, Cu,Zn superoxide dismutase, ganglioside activator, necrovirus coat protein, adenovirus fiber protein, and galectin.
We have found that protein domains from different superfamilies and folds can be described by the same motifs. For example, one particular supersecondary structure motif with seven strands was found in the structures of 27 domains, which according to the scop classification are classified in 3 folds, 10 superfamilies, and 11 protein families.
Rules I–III predict all observed motifs with an even number of strands (i.e., motifs consisting of 6, 8, or 10 strands), as well as 11 of 13 motifs with 7 strands, 15 of 17 motifs with 9 strands, and 2 of 3 motifs with 11 strands. Overall, rules I–III predict 53 of the 58 observed motifs (91%).
The five motifs that are not predicted by rules I–III are the following:
![]() |
We emphasize that these five motifs have the following distinctive feature: one of the two β-sheets has three consecutive strands that are not in consecutive order. For example, the first motif contains the consecutive strands 3, 2, 1, which appear in nonconsecutive order.
The analyzed supersecondary structural motifs contain a total of 185 IJPs. Among these IJPs, 176 (95%) satisfy rules I and II, whereas the remaining 9 (5%) do not: four of them occur in the two motifs with seven strands (motifs 1 and 2), four occur in the two motifs with nine strands (motifs 3 and 4), and one occurs in the motif with 11 strands (motif 5).
We note that one interlock was found in 30 motifs, two interlocks in 25 motifs, and three interlocks in 3 motifs. Furthermore, one superinterlock was found in 31 motifs and two superinterlocks were found in 4 motifs. Overall, one or two superinterlocks were found in 35 (60%) motifs.
Conclusions
We have presented rules that dictate the architectural properties of sandwich-like proteins.
Rule I implies that the number of IJPs in a protein is even. Furthermore, because every IJP participates in the formation of an interlock, this structure can be considered an invariant supersecondary substructure of sandwich proteins. This finding implies that the interlock can be used to construct supersecondary structure-based multiple sequence alignments of nonhomologous proteins (22).
Rules I and II impose severe restrictions on possible NS to a given IJP i and i + 1. For example, suppose that the IJP k and k + 1 forms an interlock with the IJP i and i + 1 (k ≠ i – 1), and suppose that d is a NS of i and m is a NS of i + 1,
![]() |
Rule II implies that d = i – 1 or d ≠ i – 1 and m = i + 2 or m ≠ i + 2. Furthermore, because strand i is already involved in the interlock i, i + 1, k, k + 1, this strand cannot take part in another interlock, for example, in i, i + 1, d, d + 1. Thus, strand d cannot be the first strand of any JP. Similarly, strand m cannot be the second strand of the JP m – 1 and m.
Rules I–III sharply decrease the number of possible strand arrangements. For example, for proteins consisting of three strands in each β-sheet, 180 motifs exist a priori. However, rules I–III imply that only the following 15 motifs are permissible:
![]() |
The first five of these motifs occur in proteins whose structures have already been deposited in the Protein Data Bank (ID codes 1HEH, 1CMO, 1CGT, 1HOE, and 1FLC). These considerations indicate that the above rules provide a powerful tool for structure prediction.
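The a priori count of 180 can be reproduced by brute-force enumeration under one counting convention. This convention is an assumption, because the text does not state how the number was obtained: two arrangements are treated as identical if they differ only by swapping the labels of the two sheets, by reversing the left-to-right order of both sheets simultaneously, or by both operations. The function names are hypothetical.

```python
from itertools import permutations


def canonical(sheet_a, sheet_b):
    """Smallest representative among the four equivalent descriptions."""
    return min(
        (sheet_a, sheet_b),
        (sheet_b, sheet_a),              # swap the sheet labels
        (sheet_a[::-1], sheet_b[::-1]),  # mirror left and right in both sheets
        (sheet_b[::-1], sheet_a[::-1]),  # both operations combined
    )


def count_arrangements(strands_per_sheet=3):
    """Count distinct two-sheet arrangements under the assumed convention."""
    n = 2 * strands_per_sheet
    distinct = set()
    for order in permutations(range(1, n + 1)):
        sheet_a = order[:strands_per_sheet]
        sheet_b = order[strands_per_sheet:]
        distinct.add(canonical(sheet_a, sheet_b))
    return len(distinct)


print(count_arrangements())  # 180
```

Each arrangement has exactly four equivalent descriptions under this convention, so the count equals 6!/4 = 180. The sketch does not apply rules I–III and therefore does not reproduce the reduction to 15 permissible motifs.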
Acknowledgments
We thank Dr. A. Lesk for useful suggestions and critical comments, M. Kleyzit for performing computer calculations, and V. Potapov for drawing the figure. I.M.G. and A.E.K. thank Drs. C. Chothia and A. Finkelstein for very helpful discussions. A.S.F. thanks Dr. T. Papatheodorou for very useful suggestions.
Author contributions: A.S.F., I.M.G., and A.E.K. designed research, performed research, analyzed data, and wrote the paper.
Abbreviations: JP, jumping pair; EJP, edge JP; IJP, internal JP; NS, neighboring strand(s).
References
- 1. Chothia, C. & Lesk, A. M. (1986) EMBO J. 5, 823–826.
- 2. Chothia, C. (1992) Nature 357, 543–544.
- 3. Thornton, J. M., Orengo, C. A., Todd, A. E., Frances, M. G. & Pearl, F. M. G. (1999) J. Mol. Biol. 293, 333–342.
- 4. Russell, R. B. & Barton, G. J. (1994) J. Mol. Biol. 244, 332–350.
- 5. Ptitsyn, O. B., Finkelstein, A. V. & Falk, P. (1979) FEBS Lett. 101, 1–5.
- 6. Cohen, F. E., Sternberg, M. J. E. & Taylor, W. R. (1981) J. Mol. Biol. 148, 253–272.
- 7. Harris, N. L., Presnell, S. R. & Cohen, F. E. (1994) J. Mol. Biol. 236, 1356–1368.
- 8. Yue, K. & Dill, K. A. (2000) Protein Sci. 9, 1935–1946.
- 9. Harrison, A., Pearl, F., Mott, R., Thornton, J. & Orengo, C. (2002) J. Mol. Biol. 323, 909–926.
- 10. Rost, B. (2001) J. Struct. Biol. 134, 204–218.
- 11. Jones, D. T. (1999) J. Mol. Biol. 292, 195–202.
- 12. Cuff, J. A. & Barton, G. J. (2000) Proteins 40, 502–511.
- 13. Chandonia, J. M. & Karplus, M. (1999) Proteins 35, 293–306.
- 14. Richardson, J. S. (1976) Proc. Natl. Acad. Sci. USA 73, 2619–2623.
- 15. Cohen, F. E., Sternberg, M. J. E. & Taylor, W. R. (1982) J. Mol. Biol. 156, 821–862.
- 16. Richardson, J. S. (1977) Nature 268, 495–500.
- 17. Taylor, W. R. & Green, N. M. (1989) Eur. J. Biochem. 179, 241–248.
- 18. Clark, D. A., Shirazi, J. & Rawlings, C. J. (1991) Protein Eng. 4, 751–760.
- 19. Woolfson, D. N., Evans, P. A., Hutchinson, E. G. & Thornton, J. M. (1993) Protein Eng. 6, 461–470.
- 20. Ruczinski, I., Kooperberg, C., Bonneau, R. & Baker, D. (2002) Proteins 48, 85–97.
- 21. Sternberg, M. J. E. & Thornton, J. M. (1977) J. Mol. Biol. 110, 269–283.
- 22. Kister, A. E., Finkelstein, A. V. & Gelfand, I. M. (2002) Proc. Natl. Acad. Sci. USA 99, 14137–14141.
- 23. Murzin, A. G., Brenner, S. E., Hubbard, T. & Chothia, C. (1995) J. Mol. Biol. 247, 536–540.












