Abstract
A wide range of αβ-proteins has been designed de novo based on design rules that describe the secondary structure lengths and loop torsion patterns favorable for the target topologies. This paper proposes design rules for register shifts in βαβ-motifs, which have not been reported previously but are necessary for determining the target structure in the de novo design of αβ-proteins. By analyzing naturally occurring protein structures in a database, we found preferences for register shifts in βαβ-motifs and derived the following empirical rules: (1) register shifts must not be negative, regardless of the torsion type of the constituent loop in a βαβ-motif; (2) preferred register shifts strongly depend on the loop torsion type. To explain these empirical rules by physical interactions, we conducted physics-based simulations for systems mimicking a βαβ-motif that contains the most frequently observed loop type in the database. We performed an exhaustive conformational sampling of the loop region, imposing the exclusion volume and hydrogen bond satisfaction conditions. The distributions of register shifts obtained from the simulations agreed well with those of the database analysis, indicating that the empirical rules are a consequence of physical interactions rather than an evolutionary sampling bias. Our proposed design rules will serve as a guide to making appropriate target structures for the de novo design of αβ-proteins.
Introduction
De novo protein design allows us, in principle, to explore the whole protein sequence space and create proteins with brand new structures and functions, independently of any naturally existing proteins [1]. Recently, significant progress has been made in de novo protein design and many successful examples have been reported, such as proteins with new shapes [2–10] and new therapeutic proteins [11–14]. One of the procedures used for the de novo design of proteins with β-sheets consists of the following three steps: (1) Determining a blueprint of the target structure, a two-dimensional map specifying the number and lengths of secondary structures, the loop torsion angle bins represented by the ABEGO classification [15], etc. (2) Building a three-dimensional backbone structure based on the blueprint. (3) Searching for amino acid sequences that fold to the target structure. In this procedure, determining the blueprint is important because, if a blueprint specifies a three-dimensional structure that is physically undesignable, the de novo design invariably fails. In fact, some studies demonstrated that inappropriate blueprints for β-sheet-containing structures resulted in the failure of the de novo design [7, 10]. How, then, can we identify blueprints that specify designable three-dimensional structures?
Blueprints for de novo protein design have been created based on rules for physically preferred local backbone geometries [1]. Examples of the rules include the following: the length of the loop controls the packing orientation of ββ-, βα-, and αβ-units [3]; side chain directionality determines the preferred loop types connecting unpaired β-strands of β-sandwich structures [8]; and large local deviations in the ideal β-strand twist are necessary to form a closed β-barrel [7]. Incorporating these rules into a blueprint made it possible to design proteins with shapes that could not have been designed rationally before [1]. Therefore, discovering new rules will contribute to the development of de novo protein design technology. For designing proteins containing β-sheets, the following parameters are required for making blueprints: secondary structure lengths [4]; loop geometries [8]; locations of bulges [6]; register shifts [5]; etc. Here, a register shift is defined as the residue offset between the terminal residues of adjacent β-strands. Among these parameters, the design rules specifying register shifts have not been reported, even though they are necessary. To create a blueprint that leads to the successful design of αβ-proteins, understanding the design rules of register shifts from a physical viewpoint is essential. Here, we propose rules for register shifts in βαβ-motifs and explain their physical origin using physics-based simulation.
Results and discussion
Definition of register shift for βαβ-motifs
This section introduces βαβ-motifs and defines their register shifts. A βαβ-motif consists of two parallel β-strands belonging to the same β-sheet and an α-helix connecting the two strands with a right-handed connection (Fig 1A and 1B) [16]. Fig 1C shows a schematic representation of a βαβ-motif with two neighboring β-strands. In the figure, the N-terminal β-strand of the βαβ-motif (S1), the C-terminal β-strand (S2), and the α-helix connecting the two β-strands are colored in cyan, orange, and green, respectively. The neighboring β-strand paired with S1 is referred to as S1’ (white), and the one paired with S2 is referred to as S2’ (gray). S1’ and S2’ are depicted by two-headed arrows to make it clear that we included S1’ and S2’ regardless of their (N- to C-terminus) chain directions in the present database analysis. The most N- and C-terminal residue pairs that form a cross-strand residue pairing are indicated by blue and red arrows, respectively. The definition of the cross-strand residue pairing is given in Ref. [17], and a graphical explanation is shown in S1 Fig.
The register shifts for the βαβ-motif are defined as follows. Let iN(X) and iN(X, Y) be the residue number of the N-terminal residue of the X β-strand and the most N-terminal residue among residues of the X β-strand forming a cross-strand residue pairing with a residue of the Y β-strand, respectively (see S1 Fig). Using these notations, the N-terminal register shift for a parallel β-strand pair of S1 and S2 (denoted by ) is defined as
Similarly, the C-terminal register shift of a pair of S1 and S2 (denoted by ) is defined as
where iC(X) and iC(X, Y) are the residue number of the C-terminal residue of the X β-strand and the most C-terminal residue among residues of the X β-strand forming a cross-strand residue pairing with the Y β-strand, respectively.
When performing a database analysis of the register shifts of βαβ-motifs, we only included conformations with one or more β-strands on both sides of a βαβ-motif, as shown in Fig 1C. Here, any chain connectivity and either (N- to C-terminus) chain direction of S1’ and S2’ were considered. The purpose of including S1’ and S2’ is to avoid artifacts regarding the possible values of register shifts caused by the presence of S1 or S2 at the edge of the β-sheet. For example, if S1 is at the edge of the β-sheet, structures with a register shift of or , as shown in S2A Fig, are not possible in principle because there is no hydrogen bonding partner for the residues at the terminal of S1 and, thus, they cannot exist as a β-strand residue. Conversely, if S1 is not at the edge of the β-sheet, a structure with a register shift of or could, in principle, exist (see S2B Fig). The same argument can be made about S2 (see S2C and S2D Fig). Therefore, depending on whether S1 and S2 are at the edge of the β-sheet, the possible register shift values can be restricted. To eliminate such artifacts, in the database analysis of register shifts, we focused exclusively on conformations with one or more β-strands on both sides of βαβ-motifs.
As shown later, the distributions of register shifts for the strand pair S1 and S1’ and for the strand pair S2 and S2’ also exhibited an interesting behavior. The N-terminal and C-terminal register shifts between S1 and S1’ are expressed as and . Similarly, those between S2 and S2’ are expressed as and . The graphical explanation of these register shifts is shown in Fig 1C, and the mathematical definition is provided in the Materials and methods section. Using these expressions, the register shifts of the βαβ-motif depicted in Fig 1D are given as .
Database analysis of register shifts
We performed a statistical analysis of the six register shifts (the N- and C-terminal shifts for each of the strand pairs S1–S2, S1–S1’, and S2–S2’) for a subset of protein structures deposited in the Protein Data Bank (PDB). The culled PDB dataset generated by the PISCES server [18] was used with the following parameters: resolution, ≤ 2.5 Å; R-factor, ≤ 1.0; and sequence identity, ≤ 25%. For this dataset, we used the STRIDE program [19] for secondary structure assignment, from which βαβ-motifs with β-strands on both sides were identified. The number of identified βαβ-motifs was 5,776. For each motif, we identified the six register shifts and obtained their respective distributions.
The observed frequencies of the six register shifts in the dataset are shown in Fig 1E. The striking features of these graphs were as follows. (1) In all six graphs, a register shift of zero was the most frequently observed, indicating that the start or end points of spatially adjacent β-strands tend to be aligned. (2) In all six distributions, negative shifts were rare compared with positive ones. In particular, , , and exhibited few negative shifts. Later in this paper, we show that the negative register shifts of were physically prohibited for the most frequently observed loop type in αβ-units.
As reported in the literature [4], loops in αβ-units and those in βα-units exhibit various torsion types. It would be instructive to investigate whether the trend of the statistical distribution of register shifts depends on the loop types included in βαβ-motifs. Before conducting this analysis, we performed a census of loop types in our dataset to obtain an overview of the degree of variation in loop types. Fig 2 shows the occurrence frequencies of loop types in βα-units and in αβ-units, classified according to the ABEGO representation [4, 15, 20] and the packing geometry (parallel or anti-parallel) [4]. Here, the 10 loops with the highest frequency of occurrence are shown. In the ABEGO classification, “A”, “B”, “E”, “G”, and “O” denote the right-handed α-helix region of the Ramachandran plot, the right-handed β-strand region, the left-handed β-strand region, the left-handed helix region, and the cis peptide conformation, respectively. As shown in Fig 2(A) and 2(B), there was no predominant loop type among loops in βα-units, whereas three loop types were prominently more frequent among loops in αβ-units: GB(A), GBB(P), and GBA(A). This result suggests that loops in βα-units are highly diverse, whereas those in αβ-units exhibit less diversity. In fact, in this dataset, there were 2,749 loop types in βα-units and only 910 loop types in αβ-units, indicating that loop types in αβ-units have limited variation compared with those in βα-units. In the following, we refer to the GB(A) loop in αβ-units as the GB loop because the GB(P) loop was not observed in the dataset, removing the need to distinguish between GB(A) and GB(P). Similarly, we refer to the GBA(A) loop in αβ-units as the GBA loop.
Do the distributions of register shifts depend on the loop types of the components of a βαβ-motif? The observed register shifts for each loop type in βα- and αβ-units listed in Fig 2 are shown in S3–S6 Figs. These figures indicate that the distributions of register shifts vary significantly according to the loop type and that the distributions for some loop types (e.g., ) largely differed from the average distribution of all loop types shown in Fig 1E.
As interesting examples, the distributions of and , the register shifts of the two most frequent loops with anti-parallel orientation in αβ-units, are shown in Fig 3. In the distribution, almost only was observed (Fig 3A), whereas in the distribution, was observed with roughly the same frequency (Fig 3D). In addition, for , a register shift of zero was rarely observed (Fig 3B), whereas for , the observed frequency of a register shift of zero was prominently large (Fig 3E). Therefore, the difference between the distribution of register shifts of GB and GBA loop was large, implying that the preferred register shifts strongly depend on loop type.
Method for exhaustive structural sampling of model peptides and physical interaction calculations
Can the distributions of register shifts shown in Fig 3, S3 and S4 Figs be attributed to an evolutionary sampling bias, or are they an inevitable consequence of physical interactions? In the following, we show that the distribution of and , as shown in Fig 3, can be explained by physical interactions, as demonstrated by an exhaustive structural sampling of model peptides and physical interaction calculations.
The outline of this calculation is as follows. First, we generated exhaustive GB loop structures with an α-helix on their N-termini and a β-strand connected to their C-termini, and obtained various αβ-unit structures. Then, we placed each of them within a β-sheet consisting of three β-strands with various register shifts, and identified the structures that met the various conditions. The conditions imposed here were the exclusion volume [21] and the hydrogen bond satisfaction conditions [22]. The exclusion volume condition based on the hard-sphere model stipulates that the distance between any two atoms cannot be less than the sum of their van der Waals radii. The hydrogen bond satisfaction condition implies that the polar backbone atoms must form hydrogen bonds, either intramolecularly or with a solvent. Fitzkee et al. showed that the conformational constraints imposed by these two rules are sufficient to reproduce the fragment conformations observed in PDB [23], suggesting that breaking either of the two rules is such a strong violation that it is forbidden in native structures. We show that structural ensembles satisfying the two conditions were qualitatively consistent with the distributions obtained from the database analysis reported in Fig 3A and 3B.
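As an illustration of the exclusion volume condition described above, the following minimal Python sketch tests whether any atom pair from two groups of atoms lies closer than the sum of the two hard-sphere radii. The function name `has_clash`, the data layout, and the radii in `RADII` are assumptions introduced for this example; the radii actually used in this work are those of Ref. [21] and are not reproduced here.

```python
import math

# Hypothetical hard-sphere radii in angstroms, for illustration only.
# The study uses the radii of Ref. [21], which are not reproduced here.
RADII = {"C": 1.7, "N": 1.55, "O": 1.4}


def has_clash(atoms_a, atoms_b, radii=RADII):
    """Return True if any atom in atoms_a overlaps any atom in atoms_b.

    Each atom is a tuple (element, (x, y, z)). Two atoms overlap when
    their distance is less than the sum of their hard-sphere radii.
    """
    for elem_a, pos_a in atoms_a:
        for elem_b, pos_b in atoms_b:
            if math.dist(pos_a, pos_b) < radii[elem_a] + radii[elem_b]:
                return True
    return False
```

This sketch compares two separate groups of atoms (for example, a helix and a neighboring strand). Applying it within a single chain would additionally require skipping covalently bonded neighbors, which is not handled here.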
We used a 12-residue peptide as a model for an αβ-unit connected by a GB loop (see S7 Fig). The amino acid sequence consisted solely of alanine residues except for the ninth residue, which was glycine. This position is the G position of the GB loop, so its dihedral angles fall in the G region of the ABEGO classification, where glycine is physically the most preferred residue. In fact, glycine alone was prominently abundant at the G position of the GB loop in the dataset (see S8 Fig). The structure of the peptide from the 1st to the 8th residue was constrained to be an α-helix, and that of the 12th residue to be a β-strand. The internal coordinates of these residues (dihedral angles, bond angles, and bond lengths) were fixed to those of the consensus structure of the αβ-unit connected by a GB loop. The consensus structure was defined as the one with the highest similarity to all structures of GB loops in αβ-units in the dataset. More specifically, given N αβ-units connected by a GB loop, the sum of the root-mean-square deviations (RMSDs) to all other structures from a structure i,
$$S(i) = \sum_{j \neq i}^{N} \mathrm{RMSD}(i, j) \tag{1}$$
was calculated and the structure with the smallest S(i) was selected as the consensus structure. The consensus structure was the structure of residues 85–96 of a putative oxidoreductase (PDB ID: 3c1aB). For this model peptide, the ϕ and ψ angles of the 9th, 10th, and 11th residues were exhaustively sampled within the G, B, and B regions shown in S9 Fig, respectively, by subdividing the ϕ-ψ space into a 5° × 5° grid. Since there were 145 and 277 discrete states in the regions G and B, respectively, the total number of exhaustively sampled structures was 11,125,705 (= 145 × 277 × 277). Among the generated αβ-unit structures, only conformations that preserved the hydrogen bonds of the consensus structure and met the steric exclusion condition were used in the next stage. Here, for evaluating the steric exclusion condition, only the heavy atoms of the main chain and the Cβ atoms were considered, and the hard-sphere atomic radii described in Ref. [21] were used. For identifying hydrogen bonds, the HBPLUS program [24] was used. The number of structures that met the two conditions was 1,586,681.
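Two steps of this paragraph lend themselves to a short sketch: selecting the consensus structure as the one with the smallest summed RMSD to all others, and counting the grid conformations. The code below is a minimal illustration that assumes a precomputed, symmetric pairwise RMSD matrix; the function names `consensus_index` and `count_grid_conformations` are hypothetical and are not taken from this work.

```python
import math


def consensus_index(rmsd):
    """Return the index i that minimizes S(i), the sum of rmsd[i][j] over j != i.

    rmsd is a square matrix (list of lists) of pairwise RMSD values.
    Ties are resolved in favor of the lowest index.
    """
    n = len(rmsd)
    sums = [sum(rmsd[i][j] for j in range(n) if j != i) for i in range(n)]
    return min(range(n), key=sums.__getitem__)


def count_grid_conformations(states_per_residue):
    """Number of conformations when each residue independently takes
    one of its allowed discrete (phi, psi) grid states."""
    return math.prod(states_per_residue)
```

For the three sampled residues, `count_grid_conformations([145, 277, 277])` reproduces the total of 11,125,705 quoted in the text. The toy matrix `[[0, 1, 2], [1, 0, 1], [2, 1, 0]]` gives summed values of 3, 2, and 3, so `consensus_index` selects index 1.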
Each generated αβ-unit structure that satisfied the two abovementioned conditions was implanted in three-stranded β-sheet structures with the nine different register shifts shown in Fig 4. The systems shown in Fig 4A–4I are termed systems I–IX, respectively. The three-dimensional structures of the three-stranded β-sheet were obtained from the consensus structure of three-stranded β-sheets with strand lengths of five residues. The reason for choosing five-residue β-strands is documented in the Materials and methods section. The method used for determining the consensus structure was essentially the same as that for determining the consensus structure of the αβ-unit, except for the use of the MICAN algorithm [25–27], a method that ignores the connectivity of the β-strands when computing RMSDs. The consensus structure obtained was the structure of residues 5–9, 36–40, and 106–110 of the bacterial cell division regulator protein MipZ (PDB ID: 2xj4A). Only the heavy atoms of the main chain and the Cβ atoms were used for the evaluation (these atoms are shown in S10 Fig). The β-sheet structures with the nine different register shifts shown in Fig 4 were prepared by removing atoms from the consensus β-sheet structure. All atoms of the β-sheet structures of each of systems I–IX are shown in S11–S19 Figs. Each exhaustively generated conformation of the αβ-unit was implanted into each of the nine β-sheet structures. The procedure for this implantation is documented in the Materials and methods section.
For a given conformation of each implanted system, we evaluated whether the conformation satisfied the following three conditions.
- (a) The β-sheet hydrogen bond condition: The hydrogen bonds necessary for each β-sheet shown in Fig 4 must be correctly formed. The list of the hydrogen bond conditions required for each β-sheet system is shown in Table 1. The intra-chain hydrogen bonds used for evaluating β-sheet formation are indicated by red dotted lines in Fig 4. The definition of the hydrogen bonds listed in Table 1 is provided in S20 Fig. Note that the system VII cannot satisfy this condition at all because there is no hydrogen bonding partner for the N-terminal residue of S2 and thus it cannot, in principle, be a β-strand residue.
- (b) The steric exclusion condition: Atomic collisions between the αβ-unit and S1 or S2’ are prohibited. The hard-sphere atomic radii described in Ref. [21] were used for evaluating the steric exclusion condition.
- (c) The hydrogen bond satisfaction condition: Any donor and acceptor in the main chain must form hydrogen bonds, either intramolecularly or with a water solvent. The CHASA program [28] was used to determine whether a polar group that did not form an intra-chain hydrogen bond can undergo hydrogen bonding with a water molecule.
Table 1. List of intra-chain hydrogen bonds that must/must not be satisfied for the systems I–IX.
hydrogen bond\system | I | II | III | IV | V | VI | VII | VIII | IX |
---|---|---|---|---|---|---|---|---|---|
HB1 | - | - | n | - | - | n | - | - | n |
HB2 | - | b | b | - | b | b | - | b | b |
HB3 | b | b | b | b | b | b | - | - | - |
We repeated this evaluation for all exhaustively generated structures in each system, and identified the number of structures that satisfy the three conditions for the system I–IX.
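The check for condition (a) against Table 1 can be sketched as a simple lookup. The encoding below assumes that “b” means the hydrogen bond must be formed, “n” means it must not be formed, and “-” means the bond is not evaluated for that system; this reading is inferred from the text and is not stated as a legend in the table. The function name `satisfies_condition_a` is hypothetical, and system VII is rejected outright because the text states that it cannot satisfy condition (a).

```python
SYSTEMS = ["I", "II", "III", "IV", "V", "VI", "VII", "VIII", "IX"]

# Rows of Table 1, one character per system in the order of SYSTEMS.
# Assumed reading: 'b' = must be formed, 'n' = must not be formed,
# '-' = not evaluated for that system.
TABLE_1 = {
    "HB1": "--n--n--n",
    "HB2": "-bb-bb-bb",
    "HB3": "bbbbbb---",
}


def satisfies_condition_a(system, formed):
    """Check condition (a) for one conformation.

    formed maps 'HB1', 'HB2', 'HB3' to True if that bond is present.
    """
    if system == "VII":
        # Per the text, system VII cannot satisfy condition (a) at all.
        return False
    col = SYSTEMS.index(system)
    for bond, row in TABLE_1.items():
        rule = row[col]
        if rule == "b" and not formed[bond]:
            return False
        if rule == "n" and formed[bond]:
            return False
    return True
```

For example, a system IX conformation with HB2 formed passes only if HB1 is absent, which matches the statement that satisfying condition (a) in system IX requires that the HB1 bond not be formed.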
The computational results were consistent with the statistics obtained in the database analysis
Fig 5A presents the occurrence frequency of structures that satisfied the three conditions for the nine systems shown in Fig 4. It is evident from the figure that the occurrence frequency of structures that satisfied all the three conditions was prominently large only in the system VIII, and almost zero in the other systems. This result implies that the system VIII alone, which has the register shift of , is physically suitable for a GB loop, and that the other systems are almost physically prohibited.
Next, we demonstrate that the results of these calculations are consistent with the statistics of the database analysis shown in Fig 3A and 3B. To compare them, we computed the occurrence frequencies for register shifts of , , , , , and . Here, we assume that their frequencies denoted by P can be obtained by the following equations.
Here, f(x) denotes the frequency of structures satisfying the conditions in the system x (from I to IX). The resulting distributions of and are shown in Fig 5B and 5C, respectively. The comparison of the register shift distribution shown in Fig 5B with the distribution of the range of the horizontal axis from −1 to 1 in Fig 3A showed that both are in good agreement; in both graphs, the observed frequency of is large, and the other states are negligible. Similarly, the comparison of the register shift distribution of Fig 5C with that of Fig 3B revealed that the two graphs are also in good agreement; only the observed frequency of was large. These observations suggest that the statistical distributions of the register shifts obtained by the database analysis shown in Fig 3 are a consequence of physical interactions, rather than an evolutionary sampling bias.
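One way to picture how per-system frequencies f(x) turn into a distribution over each register shift is to sum f over the systems that share a shift value. The sketch below is a hedged illustration, not a reproduction of the equations of this work: the grouping I–III, IV–VI, VII–IX follows the text, whereas the second grouping (I/IV/VII, II/V/VIII, III/VI/IX) is an assumption inferred from the column pattern of Table 1, and the normalization by the total is also an assumption. The function name `marginal_frequencies` is hypothetical.

```python
# Systems arranged as a 3 x 3 grid. Each row shares one register shift
# (grouping taken from the text); each column is ASSUMED to share the
# other register shift (inferred from the pattern of Table 1).
GRID = [
    ["I", "II", "III"],
    ["IV", "V", "VI"],
    ["VII", "VIII", "IX"],
]


def marginal_frequencies(f):
    """Sum f over rows and over columns, then normalize by the total.

    f maps a system label to the frequency of surviving structures.
    Returns (row_marginals, column_marginals) as lists of three floats.
    """
    rows = [sum(f[s] for s in row) for row in GRID]
    cols = [sum(row[k] and f[row[k]] for row in GRID) for k in range(3)]
    total = sum(rows)
    if total == 0:
        raise ValueError("no surviving structures in any system")
    return [r / total for r in rows], [c / total for c in cols]
```

With placeholder input in which only system VIII has a nonzero frequency, all of the weight falls in the third row group and the second column group, which mirrors the qualitative observation that a single register shift combination dominates.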
Physical explanation of the distribution of the GB loop register shift: is prohibited by the exclusion volume condition
The following sections discuss the physical factors prohibiting all the systems, except for the system VIII. As data for consideration, frequencies of structures satisfying the condition (a); (b); (c); (a) and (b); (b) and (c); (c) and (a); and all three conditions for each of the systems (from I–IX) are shown in Fig 6A–6I. We excluded the system VII from the following discussion because it cannot satisfy the condition (a) at all.
First, let us discuss the systems with a register shift of , i.e., the systems I–VI. Fig 6B, 6C, 6E and 6F show that the systems II, III, V, and VI were mostly prohibited by condition (b) alone, i.e., the exclusion volume condition, as their surviving percentages were 2 × 10⁻³%, 6 × 10⁻⁴%, 0.5%, and 0.2%, respectively. Unlike these systems, when the condition (b) alone was imposed, the surviving percentages of the systems I and IV were not close to zero; they were 2.0% and 11%, respectively. When the condition (a) or (c) was added to the condition (b), the surviving percentages of the systems I and IV became almost zero. Note that the satisfaction of the condition (a) in these two systems corresponded to the formation of the HB3 bond (see S20 Fig). Moreover, we confirmed that all structures satisfying the condition (c) in the systems I and IV formed the HB3 bond; i.e., all structures satisfying the condition (c) in the systems I and IV were included in those satisfying the condition (a). These results imply that the systems I and IV cannot achieve collision-free structures while forming the hydrogen bond required for these systems.
As described above, the systems with are primarily prohibited by the exclusion volume condition. Which atomic pairs, then, collide? Fig 7 shows the frequency of structures with inter-atomic collisions between a given secondary structure pair for each system (from I to IX). For the systems with (i.e., I, II, and III), the percentage of inter-atomic collisions between the helix and S2’ was approximately 100%. Similarly, for the systems with (i.e., IV, V, and VI), the percentage was as high as 90%. In contrast to these systems, the systems with (i.e., VII, VIII, and IX) showed inter-atomic collisions with a probability of only about 50%. These results suggest that structures with are almost prohibited because of an atomic clash between the helix and S2’, and that several structures without inter-atomic collisions are possible only with a register shift of . Fig 8A shows an example of a typical structure of the system V, i.e., a structure with a register shift of , with an inter-atomic collision between the helix and S2’. In this figure, the atomic overlap between the Cα atom in the helix and the Cα atom in S2’ is represented as gray-colored transparent van der Waals spheres. As shown in this figure, the C-terminal part of the helix was close to S2’, and an inter-atomic collision occurred.
Rationale for reducing the number of structures of the system VIII
The next question is why many structures in the system IX are forbidden. To address this question, we divide the structures of the system IX into structures that are prohibited for reasons common to those that are prohibited in the system VIII, and those that are prohibited for reasons specific to the system IX. This approach will facilitate the understanding of the significant difference in the number of allowed structures between the systems VIII and IX, although they differ by only one residue in the length of S1.
For this reason, we first determined the structures that are forbidden in the system VIII and examined the mechanism underlying their exclusion. Fig 6H shows that, among the three conditions, condition (b), the exclusion volume condition, had the largest impact when imposed alone: the percentages of the surviving structures when imposing condition (a), (b), and (c) on the system VIII were about 91%, 15%, and 45%, respectively. The percentage of the surviving structures for the condition (b) alone was relatively close to that observed when imposing all three conditions (10.45%), suggesting that the main cause of the reduction in the system VIII is the exclusion volume condition. Next, we discuss the percentage of the surviving structures when two conditions were imposed. The smallest percentage was observed for the combination of the conditions (b) and (c) (i.e., the exclusion volume and the hydrogen bond satisfaction conditions), which was small compared with the other combinations: the percentages for the combinations of (a) and (b); (b) and (c); and (c) and (a) were 15.79%, 10.48%, and 45.23%, respectively. The surviving percentage under the conditions (b) and (c) was close to that under all three conditions (10.45%), indicating that the combination of the conditions (b) and (c) was the dominant factor in the reduction of the system VIII.
Subsequently, we examined which secondary structure pairs frequently collide in the system VIII. Fig 7H shows that the secondary structure pairs with a high collision percentage in the system VIII were the GB loop-S1 pair and the helix-S2’ pair: Their collision percentages were 47.79% and 46.71%, respectively. The frequency of collision for either of these two secondary structure pairs was 83.1% (i.e., the frequency of the two secondary structure pairs colliding simultaneously was 11.40%), which was close to the decrease caused by inter-atomic collisions when considering all atoms in the system (see Fig 6H). This observation implies that the main factor in the decrease under the condition (b) is the collisions between these two secondary structure pairs. Fig 8B and 8C show a structure with an atomic collision between the GB loop and S1 and that with a collision between the helix and S2’, respectively.
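The 83.1% figure follows from the inclusion-exclusion relation for the two collision events. Writing A for a collision in the GB loop-S1 pair and B for a collision in the helix-S2’ pair (symbols introduced here for illustration only), and using the percentages quoted above:

$$P(A \cup B) = P(A) + P(B) - P(A \cap B) = 47.79\% + 46.71\% - 11.40\% = 83.10\%$$

Reading “either of these two secondary structure pairs” as “at least one of the two pairs” is an assumption, but it is the reading under which the quoted numbers agree.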
Next, we investigated which polar groups of the system VIII fail to satisfy the condition (c), i.e., the hydrogen bond satisfaction condition. As described above, the percentage that satisfied the condition (b) alone (15.79%) was further reduced to 10.48% when the condition (c) was added, which is nearly equal to the percentage under all three conditions (10.45%). Here, we focused on this decrease in the percentage (5.31%). For this decrease, the frequencies with which each polar group failed to form a hydrogen bond are shown in Fig 8D. The largest contributor to the decrease was the CO group located just before S2, which failed to form a hydrogen bond in 72% of the cases making up the decrease. Since the other polar groups were unsatisfied far less often than this CO group, the main contributor to the decrease was the unsatisfied hydrogen bonding of the CO group. Fig 8E shows an example of a structure in which this CO group fails to satisfy the hydrogen bond satisfaction condition. As shown in the figure, the N-terminal part of S2’ collided with the virtual water molecule generated from the CO group, implying that the CO group cannot form a hydrogen bond with a water molecule.
Taken together, these findings reveal the existence of three dominant factors for the decrease of structures satisfying the conditions in the system VIII: the atomic collisions between the GB loop and S1, those between the helix and S2’, and the unsatisfied hydrogen bond between the CO group just before S2 and a water solvent. Although the number of structures that satisfied the conditions was decreased due to the aforementioned reasons, the system VIII exhibits a survival of about 10% for the exhaustively generated structures, unlike the other systems.
To understand whether the value of 10% can be considered relevant, we compared it with the surviving percentage for α-helix formation among exhaustively generated structures. The detailed description of the α-helix system and the evaluation procedure is documented in the Materials and methods section. The surviving percentage for α-helix formation was found to be 2.3%, which was smaller than that of the system VIII, implying that the system VIII is more favorable than the α-helix system in terms of the number of conformations that satisfy the conditions. Thus, the value of 10% is relevant.
Factors that reduce the number of structures of the system IX for reasons specific to the system IX
Finally, we discuss the mechanism underlying the decrease in the number of structures of the system IX for reasons specific to the system IX. Fig 8F shows frequencies of structures that satisfied the various conditions imposed on the 165,911 structures of the system IX, which corresponded to the survivors of the system VIII. The figure demonstrates that imposing a single condition alone did not drastically reduce the number of survivors: The percentages of survivors, when conditions (a), (b), and (c) were imposed, were 49.8%, 70.3%, and 60.4%, respectively. Next, we consider the case with the two conditions imposed. The smallest percentage of survivors was observed for the combination of the conditions (c) and (a) (i.e., the hydrogen bonding satisfaction condition and the β-sheet hydrogen bonding), which was significantly small compared with the other combinations: the percentages of the survivors under the conditions (a) and (b); (b) and (c); and (c) and (a) were 43.5%, 36.4%, and 10.2%, respectively. The percentage of survivors under the condition (c) and (a) was close to that under the three conditions (9.6%), suggesting that the dominant factor reducing the system IX for reasons specific to the system IX is the inability to simultaneously satisfy the conditions (c) and (a).
We then identified which polar groups of the system IX cannot satisfy the two conditions. Fig 8G shows the polar groups that failed to satisfy either of the two conditions for the 165,911 structures. There were only two polar groups; the NH group located in the starting residue of S1 and the CO group in the GB loop, indicated by blue and red arrows, respectively. The NH group did not satisfy the conditions (c) or (a) in 99.9% of the decrease. On the other hand, the CO group did not satisfy the conditions in only 59.8%, implying that the dominant reason for the decrease is the inability of the NH group to satisfy the two conditions.
How does this NH group fail to satisfy the two conditions? As indicated in Fig 8G, this NH group did not satisfy the conditions (a) and (c) in 55.8% and 44.1% of the decrease, respectively. Thus, the frequencies of not meeting the conditions were approximately the same for the two conditions. It is worth noting that, when focusing only on the NH group, no structure violated both conditions (a) and (c): a violation of the condition (a) (the HB1 bond is formed) automatically satisfies the condition (c) (the hydrogen bond satisfaction condition). Conversely, a violation of the condition (c) (the NH group is not hydrogen bonded) automatically satisfies the condition (a) (the HB1 bond is not formed). Thus, for the NH group, structures that violate the condition (a) and those that violate the condition (c) are mutually exclusive.
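The case analysis for the NH group can be enumerated explicitly. The sketch below assumes, as a simplification, that the NH donor is in exactly one of three states: bonded through HB1, hydrogen bonded to water, or not hydrogen bonded at all. The state names and function names are hypothetical and serve only to make the logic checkable.

```python
# Assumed, simplified states of the single NH donor in system IX.
STATES = ("hb1_formed", "water_bonded", "unbonded")


def condition_a(state):
    """Condition (a) for system IX: the HB1 bond must NOT be formed."""
    return state != "hb1_formed"


def condition_c(state):
    """Condition (c): the NH group must be hydrogen bonded,
    either intramolecularly (HB1) or with water."""
    return state in ("hb1_formed", "water_bonded")


# For each state, record which of the two conditions it violates.
violations = {
    state: (not condition_a(state), not condition_c(state))
    for state in STATES
}
```

Under these assumptions, no state violates both conditions at once, and only the water-bonded state satisfies both, which is why the NH group must be solvated for a system IX structure to survive.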
Fig 8H shows an example of a structure of system IX that does not satisfy condition (a). Recall that satisfying condition (a) requires that the HB1 bond not be formed (see S20 Fig). If the HB1 bond is formed, S2 extends by one residue toward the N-terminal side and the register shift changes, as shown in Fig 8H, to a value different from that required for system IX. In addition, the formation of the HB1 bond converts the GB loop into the G loop. Therefore, the tendency to form the HB1 bond is a significant factor that hampers the generation of structures of system IX. The next example is a structure that does not satisfy condition (c) (see Fig 8I). The figure indicates that the virtual water molecule generated from the NH group collides with Cα atoms in the GB loop. Because the NH group and the GB loop are close in space, the NH group cannot form a hydrogen bond with a water molecule. Taken together, these findings suggest that the presence of this NH group in system IX explains the large discrepancy in the number of allowed structures between systems VIII and IX: the NH group has no choice but to form the intramolecular HB1 bond or remain hydrogen-bond unsatisfied.
Lessons from de novo designed proteins with blueprints that violate the register shift rules
From the above arguments, we have confirmed that the register shift rule for a GB loop deduced from the database analysis is a consequence of physical interactions. The rule for a GB loop states that must be zero, and must be greater than or equal to 1. Here, we examine the consequences of performing de novo protein design based on blueprints that violate the register shift rules for GB loops. We found that, in the blueprints of the five αβ-fold proteins designed in Ref. [3], some GB loops violated the rules. Let us compare these blueprints with the corresponding structures determined by nuclear magnetic resonance (NMR). Fig 9 shows the blueprints used in Ref. [3] (the first column), the three-dimensional target structures generated from the blueprints (the second column), the blueprints colored according to their consistency with the NMR structures (the third column), the NMR structures (the fourth column), and the frequency of hydrogen bond formation calculated from the NMR structures (the fifth column). In the first and third columns of the figure, the black curves represent GB loops that satisfy the rule, the magenta curves represent GB loops that violate the rule, and the gray filled rectangles represent residues that were designated as β-strands in the blueprints but did not form β-strands in the NMR structures. Although the NMR structures were consistent with the target structures in terms of RMSD, they were not consistent with the hydrogen bonds specified in the blueprints. Note that none of the hydrogen bonds located near the GB loops that violated the rules were formed in the NMR structures. In contrast, the hydrogen bonds near the GB loops that satisfied the rules were, with one exception, well formed, and the exception can be rationalized: it was hydrogen bond 1 in Fold III (Fig 9C), which was disrupted because the C-terminal residue of the red-colored β-strand did not adopt a β-strand conformation; this, in turn, is a consequence of violating the rule.
Additionally, the observation that the N-terminal residues of the red- and blue-colored β-strands did not form β-strands in the NMR structure of Fold V (Fig 9E) can be interpreted as a domino effect of β-strand disruption, initiated by the rule violation of a GB loop connected to the cyan-colored β-strand. Such failures, when they occur in proteins consisting of a larger number of β-strands, could lead to more serious design failures. For a more complete and accurate de novo design of αβ-proteins, the register shift rules are necessary and will play an important role in creating appropriate blueprints.
Materials and methods
Mathematical definition of the register shift
This section describes the mathematical definition of the register shifts. We consider four-stranded β-sheets consisting of the S1’, S1, S2, and S2’ β-strands, as shown in Fig 1D. For these β-sheets, the register shifts , , , , , and are defined as follows.
Here, iN(X) and iC(X) are the residue number of the N-terminal and C-terminal residues of β-strand X, respectively. iN(X, Y) and iC(X, Y) are the most N-terminal and the most C-terminal residue among residues of strand X forming a cross-strand residue pairing with a residue of strand Y, respectively. An explanation of these variables in graphical form is given in S1 Fig.
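The four index functions can be sketched in code as follows. This is a minimal illustration, assuming that each strand is given as an inclusive residue range and that cross-strand pairings are given as a list of residue pairs; the data layout, the function names, and the toy residue numbers are all hypothetical. Only the index functions are shown, not the register shifts themselves.

```python
# Strands as inclusive (first, last) residue numbers; toy values for illustration.
strands = {"S1": (10, 14), "S2": (30, 34)}
# Cross-strand residue pairings as (residue_i, residue_j); toy values.
pairings = [(11, 31), (12, 32), (13, 33)]


def strand_of(residue):
    """Name of the strand containing the residue, or None if it is in no strand."""
    for name, (first, last) in strands.items():
        if first <= residue <= last:
            return name
    return None


def i_N(x):
    """Residue number of the N-terminal residue of strand x."""
    return strands[x][0]


def i_C(x):
    """Residue number of the C-terminal residue of strand x."""
    return strands[x][1]


def paired_residues(x, y):
    """Sorted residues of strand x that pair with some residue of strand y."""
    found = []
    for r1, r2 in pairings:
        if strand_of(r1) == x and strand_of(r2) == y:
            found.append(r1)
        elif strand_of(r2) == x and strand_of(r1) == y:
            found.append(r2)
    return sorted(found)


def i_N_pair(x, y):
    """Most N-terminal residue of strand x paired with strand y."""
    return paired_residues(x, y)[0]


def i_C_pair(x, y):
    """Most C-terminal residue of strand x paired with strand y."""
    return paired_residues(x, y)[-1]
```

In this toy example, strand S1 spans residues 10 to 14, but only residues 11 to 13 are paired with S2, so the strand termini and the paired termini differ by one residue at each end. The two pair functions raise an IndexError when the strands share no pairing.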
The reason for the choice of the five-residue β-strand for the consensus three-stranded β-sheet structure
We used five-residue β-strands for the consensus three-stranded β-sheet structure. The reasons are as follows. First, as shown in S11–S19 Figs, atoms with over five residues are required for the systems. Second, by choosing a five-residue β-strand, the effect of the loop structure attached to the β-strand can be eliminated almost entirely. As shown in S11–S19 Figs, not all five residues need to be β-strand; the middle three residues of the five-residue β-strand structure are sufficient. However, if the edge of a five-residue peptide is a loop, the structure of the edge residue and that of the residues spatially adjacent to it can be optimized for its specific loop structure. To eliminate this dependency on the loop structure, we chose five-residue β-strands for the consensus β-sheet.
The procedure for implanting the structure of the αβ-unit into the three-stranded β-sheet structure
This section describes the procedure for implanting the structure of the αβ-unit into the three-stranded β-sheet structure. A schematic representation of the procedure is presented in S21 Fig. The C and O atoms of the 11th residue and the N, CA, C, and O atoms of the 12th residue of the αβ-unit structure were superimposed on the C and O atoms of the 108th residue and the N, CA, C, and O atoms of the 109th residue of the consensus β-sheet structure (shown in S10 Fig), respectively, so that their RMSD was minimized. To remove the redundant atoms, the atoms of the S2 β-strand of the consensus β-sheet used for the superposition were removed from the system.
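A least-squares superposition of this kind is commonly computed with the Kabsch algorithm. The sketch below is an illustration using NumPy and is not necessarily the implementation used in this work; the function names and the toy coordinates are hypothetical, and in practice the two inputs would be the six backbone atoms listed above.

```python
import numpy as np


def kabsch(mobile, target):
    """Rotation R and translation t minimizing the RMSD of R @ mobile_i + t to target_i."""
    mobile = np.asarray(mobile, dtype=float)
    target = np.asarray(target, dtype=float)
    mobile_center = mobile.mean(axis=0)
    target_center = target.mean(axis=0)
    # Covariance matrix of the centered coordinate sets.
    h = (mobile - mobile_center).T @ (target - target_center)
    u, _, vt = np.linalg.svd(h)
    # Guard against an improper rotation (reflection).
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    translation = target_center - rotation @ mobile_center
    return rotation, translation


def transform(coords, rotation, translation):
    """Apply the rigid-body transform to an (n, 3) coordinate array."""
    return np.asarray(coords, dtype=float) @ rotation.T + translation


def rmsd(a, b):
    """Root-mean-square deviation between two (n, 3) coordinate arrays."""
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))


# Toy example: six anchor atoms, rotated by 90 degrees about z and shifted.
target_atoms = np.array([[0.0, 0.0, 0.0], [1.2, 0.0, 0.0], [1.9, 1.1, 0.3],
                         [3.3, 1.0, 0.8], [4.0, 2.2, 0.5], [3.6, 3.3, 1.4]])
rot_z = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
mobile_atoms = target_atoms @ rot_z.T + np.array([5.0, -2.0, 1.0])

R, t = kabsch(mobile_atoms, target_atoms)
fitted = transform(mobile_atoms, R, t)
```

In an implanting step of this kind, the rotation and translation obtained from the anchor atoms would presumably then be applied to every atom of the αβ-unit, not only to the six anchor atoms.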
The procedure for calculating surviving percentage of an α-helix system
We used an 11-residue peptide as a model system for an α-helix (see S22A Fig). For this system, we selected an amino acid sequence consisting solely of alanine residues. All bond lengths and bond angles were fixed to the standard values defined in the CHARMM Param 19 parameter set [29]. All ϕ, ψ, and ω angles were also fixed to typical α-helix values ((ϕ, ψ, ω) = (−60°, −45°, 180°)), except for the ϕ and ψ angles of the central three residues of the model peptide (the 5th, 6th, and 7th residues). The ϕ and ψ angles of these three residues were exhaustively sampled within the popular region of the A region of the ABEGO classification (S22B and S22C Fig). The threshold for the popular region was determined so that the total number of sampled structures was roughly equal to the number of structures exhaustively generated in the simulation of system VIII (1,586,681). The resultant popular A region had 117 discrete states; thus, the total number of exhaustively sampled structures was 117³ = 1,601,613. The surviving percentage was calculated by counting the number of structures that formed all the hydrogen bonds required for an α-helix.
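The size of this conformational space follows directly from the number of discrete states per residue. The sketch below reproduces the count and shows how such an exhaustive enumeration could be organized with itertools.product; the state list is a placeholder (integer labels), not the actual (ϕ, ψ) grid used in the simulation, and the function names are hypothetical.

```python
from itertools import product

N_STATES = 117    # discrete (phi, psi) states in the popular A region (from the text)
N_RESIDUES = 3    # central residues 5, 6 and 7 of the 11-residue peptide

total_structures = N_STATES ** N_RESIDUES

# Placeholder state labels; each would map to one (phi, psi) pair in a real run.
states = range(N_STATES)


def enumerate_conformations():
    """Yield every combination of states for the three central residues."""
    yield from product(states, repeat=N_RESIDUES)


def surviving_percentage(is_survivor):
    """Percentage of conformations accepted by a user-supplied predicate."""
    survivors = sum(1 for conf in enumerate_conformations() if is_survivor(conf))
    return 100.0 * survivors / total_structures
```

In an actual calculation, the predicate passed to surviving_percentage would build the peptide for each state combination and check whether all hydrogen bonds required for an α-helix are formed.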
Supporting information
Acknowledgments
The authors thank Masaki Sasai and Yukio Tanaka for fruitful discussions.
Data Availability
All relevant data are within the manuscript and its Supporting information files.
Funding Statement
This work was supported by Platform Project for Supporting Drug Discovery and Life Science Research JP20am0101111 (to GC) from Japan Agency for Medical Research and Development (https://www.amed.go.jp/) and by Grant-in-Aid for Scientific Research (B) 19H03166 (to GC) from Japan Society for the Promotion of Science (https://www.jsps.go.jp/). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
References
- 1. Marcos E, Silva DA. Essentials of de novo protein design: Methods and applications. WIREs Computational Molecular Science. 2018;8(6):e1374. doi: 10.1002/wcms.1374
- 2. Kuhlman B, Dantas G, Ireton GC, Varani G, Stoddard BL, Baker D. Design of a novel globular protein fold with atomic-level accuracy. Science. 2003;302(5649):1364–1368. doi: 10.1126/science.1089427
- 3. Koga N, Tatsumi-Koga R, Liu G, Xiao R, Acton TB, Montelione GT, et al. Principles for designing ideal protein structures. Nature. 2012;491(7423):222–227. doi: 10.1038/nature11600
- 4. Lin YR, Koga N, Tatsumi-Koga R, Liu G, Clouser AF, Montelione GT, et al. Control over overall shape and size in de novo designed proteins. Proc Natl Acad Sci USA. 2015;112(40):E5478–E5485. doi: 10.1073/pnas.1509508112
- 5. Huang PS, Feldmeier K, Parmeggiani F, Velasco DAF, Höcker B, Baker D. De novo design of a four-fold symmetric TIM-barrel protein with atomic-level accuracy. Nat Chem Biol. 2016;12(1):29–34. doi: 10.1038/nchembio.1966
- 6. Marcos E, Basanta B, Chidyausiku TM, Tang Y, Oberdorfer G, Liu G, et al. Principles for designing proteins with cavities formed by curved β sheets. Science. 2017;355(6321):201–206. doi: 10.1126/science.aah7389
- 7. Dou J, Vorobieva AA, Sheffler W, Doyle LA, Park H, Bick MJ, et al. De novo design of a fluorescence-activating β-barrel. Nature. 2018;561:485–491. doi: 10.1038/s41586-018-0509-0
- 8. Marcos E, Chidyausiku TM, McShan AC, Evangelidis T, Nerli S, Carter L, et al. De novo design of a non-local β-sheet protein with high stability and accuracy. Nature Structural & Molecular Biology. 2018;25(11):1028–1034. doi: 10.1038/s41594-018-0141-6
- 9. Pan X, Thompson MC, Zhang Y, Liu L, Fraser JS, Kelly MJS, et al. Expanding the space of protein geometries by computational design of de novo fold families. Science. 2020;369(6507):1132–1136. doi: 10.1126/science.abc0881
- 10. Koga N, Koga R, Liu G, Castellanos J, Montelione GT, Baker D. Role of backbone strain in de novo design of complex α/β protein structures. Nat Commun. 2021;12:3921. doi: 10.1038/s41467-021-24050-7
- 11. Fleishman SJ, Whitehead TA, Ekiert DC, Dreyfus C, Corn JE, Strauch EM, et al. Computational design of proteins targeting the conserved stem region of influenza hemagglutinin. Science. 2011;332(6031):816–821. doi: 10.1126/science.1202617
- 12. Silva DA, Yu S, Ulge UY, Spangler JB, Jude KM, Labão-Almeida C, et al. De novo design of potent and selective mimics of IL-2 and IL-15. Nature. 2019;565(7738):186–191. doi: 10.1038/s41586-018-0830-7
- 13. Sesterhenn F, Yang C, Bonet J, Cramer JT, Wen X, Wang Y, et al. De novo protein design enables the precise induction of RSV-neutralizing antibodies. Science. 2020;368(6492). doi: 10.1126/science.aay5051
- 14. Linsky TW, Vergara R, Codina N, Nelson JW, Walker MJ, Su W, et al. De novo design of potent and resilient hACE2 decoys to neutralize SARS-CoV-2. Science. 2020;370(6521):1208–1214. doi: 10.1126/science.abe0075
- 15. Wintjens RT, Rooman MJ, Wodak SJ. Automatic Classification and Analysis of αα-Turn Motifs in Proteins. J Mol Biol. 1996;255(1):235–253.
- 16. Sternberg MJE, Thornton JM. On the conformation of proteins: The handedness of the β-strand-α-helix-β-strand unit. J Mol Biol. 1976;105(3):367–382.
- 17. Merkel JS, Sturtevant JM, Regan L. Sidechain interactions in parallel β sheets: the energetics of cross-strand pairings. Structure. 1999;7(11):1333–1343. doi: 10.1016/S0969-2126(00)80023-4
- 18. Wang G, Dunbrack RL Jr. PISCES: a protein sequence culling server. Bioinformatics. 2003;19(12):1589–1591. doi: 10.1093/bioinformatics/btg224
- 19. Frishman D, Argos P. Knowledge-based protein secondary structure assignment. Proteins. 1995;23(4):566–579. doi: 10.1002/prot.340230412
- 20. Wintjens R, Wodak SJ, Rooman M. Typical interaction patterns in alphabeta and betaalpha turn motifs. Protein Eng Des Sel. 1998;11(7):505–522. doi: 10.1093/protein/11.7.505
- 21. Fitzkee NC, Rose GD. Steric restrictions in protein folding: An α-helix cannot be followed by a contiguous β-strand. Protein Science. 2004;13(3):633–639. doi: 10.1110/ps.03503304
- 22. Fleming PJ, Rose GD. Do all backbone polar groups in proteins form hydrogen bonds? Protein Science. 2005;14(7):1911–1917. doi: 10.1110/ps.051454805
- 23. Fitzkee NC, Rose GD. Sterics and Solvation Winnow Accessible Conformational Space for Unfolded Proteins. J Mol Biol. 2005;353(4):873–887.
- 24. McDonald IK, Thornton JM. Satisfying hydrogen bonding potential in proteins. J Mol Biol. 1994;238(5):777–793.
- 25. Minami S, Sawada K, Chikenji G. MICAN: a protein structure alignment algorithm that can handle Multiple-chains, Inverse alignments, Cα only models, Alternative alignments, and Non-sequential alignments. BMC Bioinformatics. 2013;14(1):24. doi: 10.1186/1471-2105-14-24
- 26. Minami S, Sawada K, Chikenji G. How a Spatial Arrangement of Secondary Structure Elements Is Dispersed in the Universe of Protein Folds. PLoS ONE. 2014;9(9):e107959. doi: 10.1371/journal.pone.0107959
- 27. Minami S, Sawada K, Ota M, Chikenji G. MICAN-SQ: a sequential protein structure alignment program that is applicable to monomers and all types of oligomers. Bioinformatics. 2018;34(19):3324–3331. doi: 10.1093/bioinformatics/bty369
- 28. Fleming PJ, Fitzkee NC, Mezei M, Srinivasan R, Rose GD. A novel method reveals that solvent water favors polyproline II over β-strand conformation in peptides and unfolded proteins: Conditional hydrophobic accessible surface area (CHASA). Protein Science. 2005;14(1):111–118. doi: 10.1110/ps.041047005
- 29. Neria E, Fischer S, Karplus M. Simulation of activation free energies in molecular systems. Journal of Chemical Physics. 1996;105(5):1902–1921. doi: 10.1063/1.472061