Abstract
Symmetric protein architectures have a compelling aesthetic that suggests a plausible evolutionary process (i.e., gene duplication/fusion) yielding complex architecture from a simpler structural motif. Furthermore, symmetry inspires a practical approach to computational protein design that substantially reduces the combinatorial explosion problem, and may provide practical solutions for structure optimization. Despite such broad relevance, the role of structural symmetry in the key area of hydrophobic core‐packing cooperativity has not been adequately studied. In the present report, the threefold rotational symmetry intrinsic to the β‐trefoil architecture is shown to form a geometric basis for highly‐cooperative core‐packing interactions that both stabilize the local repeating motif and promote oligomerization/long‐range contacts in the folding process. Symmetry in the β‐trefoil structure also permits tolerance towards mutational drift that involves a structural quasi‐equivalence at several key core positions.
Keywords: oligomerization, protein design, protein evolution, protein symmetry
1. INTRODUCTION
The “β‐trefoil” is a common globular protein architecture exhibiting an internal pseudo‐threefold rotational symmetry, 1 and with the repeating structural motif known as a “trefoil.” 2 β‐trefoil proteins comprise a functionally diverse group of ligands, consisting of 10 superfamilies, 17 families, and 53 domains, in addition to several symmetric de novo designs (The Structural Classification of Proteins database [SCOP2] 3 ). Functional diversity is provided principally by loop and turn region heterogeneity that distinguishes the different β‐trefoil family tertiary structures. The fundamental trefoil motif is ~40–50 amino acids in length and comprises four β‐strands (β1‐4) separated by three turn regions. Two β‐strands (β1 and β4) from each motif associate to form a six‐stranded anti‐parallel β‐barrel. The other two strands (β2 and β3) from each motif form a β‐hairpin, and the three combined β‐hairpins from each motif cap the “bottom” of the β‐barrel (Figure 1). 2
FIGURE 1.

General architecture of the β‐trefoil: threefold rotational symmetry of a repeating “trefoil” motif. Top panel: a ribbon diagram of the designed symmetric “Symfoil” β‐trefoil protein (RCSB accession 3O4D) oriented with the threefold axis of rotational symmetry vertically aligned (left), or with the view down the axis of symmetry (center). The repeating 42‐mer trefoil motif comprising β‐strands 1–4 is indicated by dark shading, also shown are the N‐ and C‐termini. On the right is a similar view of the “Monofoil” trefoil motif peptide that assembles as a trimeric oligomer to generate an intact β‐trefoil protein (RCSB accession 3OL0). Bottom panel: the primary structure of the Symfoil and Monofoil polypeptides. Monofoil is a single instance of the repeating 42‐mer sequence trefoil motif from Symfoil. The numbering scheme is derived from fibroblast growth factor‐1 (FGF‐1) from which Symfoil was derived via top down symmetric deconstruction 6 , 20
β‐trefoil proteins adopt a similar threefold symmetric architecture despite relatively low sequence identity (~4–12%) between family members, suggesting that the structural determinants of the β‐trefoil architecture likely involve a relatively small number of conserved amino acids. 4 In a statistical study of five representative β‐trefoil protein sequences Xiao and coworkers proposed that key structural residues may play an important role in formation of the threefold symmetric architecture, including buried residues symmetrically located in the structure and exhibiting large residue interaction numbers (i.e., a large number of neighbor contacts). 5
Although naturally‐evolved β‐trefoil proteins exhibit differing extents of threefold primary and tertiary structure symmetry, de novo designed β‐trefoil proteins having an exact threefold symmetry have also been reported. 6 , 7 , 8 , 9 Such proteins were designed using various approaches, including top‐down symmetric deconstruction, folding nucleus symmetric expansion, computational and consensus sequence design. 3 , 6 , 7 , 8 , 9 Designed β‐trefoil proteins having exact primary structure symmetry typically involve a general minimization of polypeptide length due to omission of large/asymmetric turn structure associated with specific functionality; thus, designed symmetric β‐trefoil proteins can exhibit a reduced or absent functionality. Notably, however, such proteins also exhibit enhanced stability and folding properties; thus, symmetric design can be accomplished concomitant with a significant degree of structural optimization. Such optimization is likely due to the “function/stability” and “function/foldability” tradeoff that is present in naturally‐evolved proteins (i.e., functionally deleterious mutations are often associated with enhanced stability and folding properties). 10 , 11 , 12 , 13 , 14 , 15
The archetypal central hydrophobic core‐packing group in the β‐trefoil architecture comprises 21 amino acids, with seven residues contributed by each of the repeating trefoil motifs. Depending upon the particular β‐trefoil, a subset of these residues at the top of the β‐barrel can be interfacial with solvent; thus, while 15 of the central core positions are typically fully solvent inaccessible, the complete cooperative hydrophobic core‐packing group (considered herein) includes 21 positions. 16 , 17 , 18 A number of β‐trefoil proteins also contain three isolated “mini‐core” regions, located outside of the central core and each involving cooperative interactions among three hydrophobic residues. 19 These isolated mini‐core regions, distinct from the central core, are not considered in this report.
Notably, for one member of the symmetric designed β‐trefoil proteins (“Symfoil,” RCSB accession 3O4D) the repeating 42‐mer trefoil motif, when expressed as an isolated peptide (“Monofoil”), spontaneously oligomerizes as a stable trimer, generating an intact β‐trefoil architecture (RCSB accession 3OL0; Figure 1). 6 , 20 Analysis of the core‐packing interactions for the Monofoil oligomeric assembly can therefore identify the specific inter‐motif interactions that define the oligomerization interface, and distinguish them from intra‐motif interactions localized to the trefoil motif. While intra‐motif contacts are short‐range in both Monofoil and Symfoil, the inter‐motif oligomerization interactions of Monofoil become intramolecular interactions, with potentially large contact order, in Symfoil. Large contact order interactions are typically associated with late folding events in the protein folding pathway 21 , 22 , 23 , 24 ; thus, a comparison of the core‐packing interactions between Monofoil and Symfoil can elucidate aspects of core packing cooperativity that promote early versus late events in the folding pathway of the intact β‐trefoil protein.
Analysis of the interactions of the seven core‐packing residues within the trefoil motifs of the Monofoil oligomer identifies a different extent of cooperative interactions at each position. The number of neighbor contacts for each core residue varies from three to eight; furthermore, four of the seven core‐packing residues are involved in direct contacts with their symmetry mates and define oligomerization interactions. Additionally, each of these four residues is located within a different β‐strand of the fundamental trefoil motif structure. The other three core‐packing residues are primarily involved in intra‐motif interactions and are therefore postulated to primarily stabilize the local trefoil motif structure. Two specific core residues are structurally organized by a ubiquitously conserved buried water molecule in each trefoil motif of β‐trefoil proteins. 25 These two residues are unique in exhibiting the largest number of neighbor contacts among the set of seven core residues; furthermore, these two positions belong to the group that has direct threefold symmetry mate contacts. Thus, nascent organization of the conserved buried water with these two core residues establishes an initial intra‐motif packing interface that promotes subsequent oligomerization of adjacent trefoil motifs.
2. RESULTS
Table S1 lists the van der Waals contacts, distances, standard deviations, and pairwise contact order for the central hydrophobic core‐packing residues in the Monofoil structure (RCSB accession 3OL0). There are three instances of the Monofoil trefoil motif in the oligomeric assembly, labeled as chains A, B and C; however, this is an arbitrary nomenclature, and any given Monofoil chain simply has a preceding (i.e., N‐terminal) and a following (i.e., C‐terminal) trefoil motif neighbor related by threefold rotational symmetry. In Table S1 contact with a residue in a preceding motif is indicated by brackets [ResXX], and contact with a residue in a following motif is indicated by braces {ResXX} (where Res indicates the residue type, and XX indicates the residue position). The distance values provided in this table are the average and standard deviation for the three instances of each indicated contact. The following is a description of the neighbor contacts for the individual core residue positions (with the particular β‐strand that each residue occupies given in parentheses); these contacts are illustrated graphically in Figure 2.
FIGURE 2.

Neighbor contacts for the hydrophobic core residues in the Monofoil β‐trefoil structure (RCSB accession 3OL0). Van der Waals contacts for each of the seven hydrophobic core residues (Val12, Leu14, Leu23, Ile25, Val31, Phe44, and Ile46) of the trefoil motif in the Monofoil trimeric assembly are indicated by magenta lines. Three trefoil motifs assemble to generate an intact β‐trefoil architecture; thus, the threefold axis of rotational symmetry is vertically aligned in this schematic. The individual trefoil motifs are indicated in red, green and blue shading, and the β‐strand numbering is also indicated along with N‐ and C‐termini. The central trefoil motif (green) is the frame of reference; thus, the red trefoil motif is considered to be the preceding motif, and the blue trefoil motif is the following motif
2.1. Val12 (β1)
Val12 has four neighbor contacts: one is with Ile46 (β4) in the preceding trefoil motif, and three others involve Leu14 (β1), Phe44 (β4) and Ile46 (β4) within the same trefoil motif (i.e., comprise intra‐motif interactions). The packing contacts for Val12 lie exclusively within the upper β‐barrel region (β1, β4); the majority of which involve intra‐motif contacts. The smallest intra‐motif contact order is with Leu14 (0.05). The interaction with Ile46 in the preceding trefoil motif has no defined contact order in Monofoil; however, in Symfoil this interaction also generates a small contact order of 0.06.
2.2. Leu14 (β1)
Leu14 is one of two core residues that have the greatest number of neighbor contacts, with a total of eight. Leu14 makes contact with its symmetry mates in the preceding and following trefoil motifs. Leu14 makes two additional contacts with Phe44 (β4) and Ile46 (β4) in the preceding trefoil motif. Leu14 makes four intra‐motif contacts with Val12 (β1), Leu23 (β2), Phe44 (β4) and Ile46 (β4). Thus, the majority of contacts for Leu14 involve positions within the upper β‐barrel (involving β1 and β4), and most of these involve intra‐ and preceding‐motif contacts. Leu14 has a single intra‐motif contact with the lower β‐hairpin (β2) region (with Leu23). In addition to the above‐described small (0.05) contact order with Val12, interactions with Ile46 and Phe44 in the preceding trefoil motif in Monofoil generate small contact orders of 0.08 and 0.10, respectively, in the Symfoil β‐trefoil structure.
2.3. Leu23 (β2)
Leu23 is the other core residue that exhibits the greatest number of neighbor contacts, with a total of eight. Leu23 makes contact with its symmetry mates in the preceding and following trefoil motifs. Leu23 makes three packing contacts with Phe44 (β4), Ile25 (β2) and Val31 (β3) in the preceding trefoil motif. Leu23 makes three intra‐motif contacts involving positions Leu14 (β1), Val31 (β3), and Phe44 (β4). Thus, the packing contacts for Leu23 comprise a number of upper β‐barrel (β1 and β4), and lower β‐hairpin (β2, β3) interactions, involving preceding, intra‐ and following motifs. The smallest contact order of Leu23 is for Phe44 (0.17) in the preceding trefoil motif, and therefore generated in the Symfoil β‐trefoil structure.
2.4. Ile25 (β2)
Ile25 exhibits the least number of contacts among all the core residues, with only three neighbor contacts. These contacts involve Val31 (β3) and Phe44 (β4) as intra‐motif interactions, as well as Leu23 (β2) in the following motif. Thus, the packing contacts for Ile25 involve positions in both the upper β‐barrel, as well as β‐hairpin regions, involving intra‐ and following motif contacts. The smallest contact order of Ile25 (0.14) is for the intra‐motif Val31.
2.5. Val31 (β3)
Val31 exhibits five neighbor contacts, and includes its symmetry mates in the preceding and following trefoil motifs. Val31 makes contact with Leu23 (β2) in the following motif, and two intra‐motif contacts with Leu23 (β2) and Ile25 (β2). Thus, the packing contacts for Val31 comprise positions exclusively within the lower β‐hairpin region (β2, β3); involving preceding, intra‐, and following motifs. The smallest contact order of Val31 (0.14) is with the previously described intra‐motif Ile25.
2.6. Phe44 (β4)
Phe44 exhibits six neighbor contacts. These include intra‐motif contacts with Val12 (β1), Leu14 (β1), Leu23 (β2) and Ile25 (β2), as well as Leu14 (β1) and Leu23 (β2) in the following motif. Thus, the packing contacts for Phe44 involve positions in both the upper β‐barrel, as well as lower β‐hairpin regions, involving both intra‐ and following motif contacts. The smallest contact orders involving Phe44 are with Leu23 (0.17) and Leu14 (0.10) in the following trefoil motif, and therefore generated in the Symfoil β‐trefoil structure.
2.7. Ile46 (β4)
Ile46 exhibits six neighbor contacts, and includes its symmetry mates in the preceding and following motifs. Ile46 makes intra‐motif contacts with Val12 (β1) and Leu14 (β1), as well as contacts with Val12 (β1) and Leu14 (β1) of the following motif. Thus, the packing contacts for Ile46 comprise positions exclusively within the upper β‐barrel region (β1, β4) and involve preceding, intra‐ and following motifs. The smallest contact orders for Ile46 are with Val12 (0.06) and Leu14 (0.08) in the following trefoil motif, and therefore generated in the Symfoil β‐trefoil structure; four of the six van der Waals contacts with Ile46 in Monofoil involve inter‐motif interactions, which generate contact order in the Symfoil β‐trefoil structure.
For ease of discussion, the four residue positions having direct contact with symmetry mates are referred to hereafter as the “symmetric” set of core residues; while the three residues that have no direct contact with their symmetry mates are referred to as the “asymmetric” set (however, it should be noted that each core residue in the trefoil motif has threefold symmetry mates in the overall β‐trefoil architecture).
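The neighbor contacts described in Sections 2.1–2.7 can be collected into a small lookup table and checked for internal consistency. The following Python sketch is illustrative only: the dictionary simply transcribes the contacts listed above, and the names `CONTACTS`, `neighbor_counts`, `symmetric_set`, and `is_reciprocal` are hypothetical helpers introduced here, not part of the original analysis.

```python
# Contacts transcribed from Sections 2.1-2.7 (Monofoil, RCSB 3OL0).
# "intra" = same trefoil motif; "preceding"/"following" = adjacent motifs.
CONTACTS = {
    "Val12": {"intra": ["Leu14", "Phe44", "Ile46"],
              "preceding": ["Ile46"], "following": []},
    "Leu14": {"intra": ["Val12", "Leu23", "Phe44", "Ile46"],
              "preceding": ["Leu14", "Phe44", "Ile46"], "following": ["Leu14"]},
    "Leu23": {"intra": ["Leu14", "Val31", "Phe44"],
              "preceding": ["Leu23", "Phe44", "Ile25", "Val31"],
              "following": ["Leu23"]},
    "Ile25": {"intra": ["Val31", "Phe44"],
              "preceding": [], "following": ["Leu23"]},
    "Val31": {"intra": ["Leu23", "Ile25"],
              "preceding": ["Val31"], "following": ["Val31", "Leu23"]},
    "Phe44": {"intra": ["Val12", "Leu14", "Leu23", "Ile25"],
              "preceding": [], "following": ["Leu14", "Leu23"]},
    "Ile46": {"intra": ["Val12", "Leu14"],
              "preceding": ["Ile46"], "following": ["Ile46", "Val12", "Leu14"]},
}


def neighbor_counts(contacts):
    """Total number of neighbor contacts for each core residue."""
    return {res: sum(len(partners) for partners in shells.values())
            for res, shells in contacts.items()}


def symmetric_set(contacts):
    """Residues that directly contact their own threefold symmetry mates."""
    return sorted(res for res, shells in contacts.items()
                  if res in shells["preceding"] or res in shells["following"])


def is_reciprocal(contacts):
    """Check that every contact is also listed from the partner's side.

    A contact with a residue in the preceding motif must appear, for that
    partner, as a contact with the following motif (and vice versa).
    """
    mirror = {"intra": "intra", "preceding": "following", "following": "preceding"}
    return all(res in contacts[partner][mirror[shell]]
               for res, shells in contacts.items()
               for shell, partners in shells.items()
               for partner in partners)
```

Under this transcription, the per‐residue totals range from three (Ile25) to eight (Leu14 and Leu23), and the residues that contact their own symmetry mates are Leu14, Leu23, Val31 and Ile46, in agreement with the description above.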
3. DISCUSSION
What relevance does the Monofoil/Symfoil core packing group have for β‐trefoil proteins in general? To address this question, the core residue composition of representative members of each of the 17 families of β‐trefoil proteins was characterized (Table S2). Each β‐trefoil protein has three trefoil motifs, yielding a composite of 51 different motif sequences. The frequency of the seven core‐packing residues in the β‐trefoil family of proteins is shown graphically as a sequence logo plot in Figure 3. This plot indicates preferred residues of Val12, Leu14, Leu23, Val25, Val31, Phe44, and Ile46 among the β‐trefoil proteins evaluated. This preferred set differs from that of Monofoil/Symfoil at position 25, which is Ile in Monofoil/Symfoil, and Val in the preferred set; however, Ile is the second most frequent residue at this position and essentially equivalent in occurrence to Val. Thus, the core‐packing group of Monofoil/Symfoil, generated by the method of top‐down symmetric deconstruction, 20 , 26 is a highly conserved, and likely optimized, core‐packing set for β‐trefoil proteins in general.
FIGURE 3.

Sequence logo plot of consensus core residues in β‐trefoil proteins. Representative members of each of the 17 families of β‐trefoil proteins were queried for core residue composition (see Table S2). The residue numbering references the FGF‐1/Symfoil/Monofoil proteins. Each β‐trefoil protein has three repeating trefoil motifs; thus, the figure represents a composite of 51 different trefoil motifs
Of the set of seven core residues, four (Leu14, Leu23, Val31, and Ile46) make direct contacts with their symmetry mates (Figure 4). Notably, each of these residue positions is located within a different β‐strand of the trefoil motif (Leu14 in β1, Leu23 in β2, Val31 in β3, and Ile46 in β4). These four residues orient approximately normal (i.e., “horizontally”) relative to the threefold symmetry axis, and therefore describe stratified packing layers of hydrophobic residues (Figure 5). The effective structural diameter (measured from Cα distances) is 11.3 Å at Leu14, 11.3 Å at Leu23, 7.5 Å at Val31, and 11.7 Å at Ile46. Leu14 (β1) and Ile46 (β4) form part of the upper β‐barrel, while Leu23 (β2) and Val31 (β3) contribute to the lower β‐hairpin cap. The diameter of the β‐barrel is therefore essentially identical at Leu14 and Ile46 (11.3 Å and 11.7 Å, respectively). This suggests that Leu and Ile side chains are likely interchangeable at these positions without perturbing the overall β‐barrel dimensions. While Leu23 is part of the lower β‐hairpin, and not part of the upper β‐barrel, its structural diameter of 11.3 Å is essentially identical to that of the Leu14 and Ile46 β‐barrel residues. This indicates that the structure of β2 at position Leu23 conserves the overall barrel dimensions; and furthermore, that Leu and Ile side chains are likely also interchangeable at this position while maintaining compatibility with β‐hairpin tertiary structure.
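The text does not state how the structural diameter is derived from the Cα distances. One plausible reading, offered here purely as an assumption, is the diameter of the circle passing through the three symmetry‐related Cα atoms at a given position. The Python sketch below computes that quantity for any three points; the function name `circumscribed_diameter` and the idea of using a circumscribed circle are illustrative choices, not the author's documented procedure.

```python
import math


def circumscribed_diameter(p1, p2, p3):
    """Diameter of the circle through three points (e.g., three Calpha atoms).

    Uses D = abc / (2K), where a, b, c are the side lengths of the triangle
    and K is its area (Heron's formula). This is an assumed definition of
    "structural diameter", not one confirmed by the source.
    """
    a = math.dist(p2, p3)
    b = math.dist(p1, p3)
    c = math.dist(p1, p2)
    s = (a + b + c) / 2.0
    area = math.sqrt(max(s * (s - a) * (s - b) * (s - c), 0.0))
    if area == 0.0:
        raise ValueError("points are collinear or coincident")
    return a * b * c / (2.0 * area)
```

For an exact threefold arrangement the three Cα atoms form an equilateral triangle with side d, and this expression reduces to 2d/√3.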
FIGURE 4.

β‐Trefoil core‐packing residue positions having threefold symmetry mate contacts. Core positions Leu14, Leu23, Val31, and Ile46 in the Monofoil structure each exhibit van der Waals contacts with threefold symmetry mates. The residue positions are shown in space filling representation, and the view is down the threefold axis of rotational symmetry. Also shown are the associated β‐strands of the individual trefoil motifs. The Cα‐Cα distances (Å) are indicated by light gray dashed lines and numbers; the corresponding β‐barrel diameter (Å) is indicated by black solid line and number
FIGURE 5.

Layering of Monofoil core‐packing residues having symmetry mate contacts. The four core positions having direct symmetry mate contacts are horizontally stratified in relationship to the threefold axis of rotational symmetry (which is vertically aligned in this “side view”). Coloring is used to indicate the three repeating trefoil motif peptides (chains A, B and C) in the oligomeric assembly that generates an intact β‐trefoil protein in the Monofoil structure (PDB accession 3OL0)
The threefold contact interface at Leu14, Leu23 and Ile46 involves a Cδ atom, which is common to both Leu and Ile sidechains; thus, it is not only feasible that a triplet of Leu or Ile could structurally substitute, but also that an asymmetric combination of these two amino acids likely forms a packing group compatible with the overall β‐trefoil architecture (i.e., at both barrel and hairpin positions). Inspection of the sequence logo plot of consensus core residues (Figure 3) indicates that Ile is the next most frequent amino acid at both Leu14 and Leu23; and Leu is the third most frequent (and similar in frequency to Val, the second most frequent) at position Ile46. Thus, a postulated structural quasi‐equivalence of Leu/Ile mutations at these positions is also suggested by sequence consensus. As one example to support this hypothesis, beta‐galactose‐specific lectin 1 in Ricin B‐like lectin family (RCSB 1SZ6) has Ile at the equivalent positions of Leu14, Leu56, and Leu97 in Symfoil (at positions Ile14, Ile58 and Ile99 in 1SZ6, respectively). The small diameter of 7.5 Å at the Val31 position suggests a “collapse” of the general barrel dimensions, perhaps afforded by the lack of Cδ atoms in Val (the most common amino acid observed at this position).
Three residues in the core, Val12, Ile25, and Phe44, have no direct interactions with their symmetry mates (i.e., they form the “asymmetric” set). Instead, these residues have a majority of interactions within their own trefoil motif (i.e., intra‐motif), and only limited interactions with either the preceding or following motif (i.e., inter‐motif). This set of residues is approximately vertically aligned (i.e., parallel to the axis of rotational symmetry; Figure 6), and essentially intercalates between the symmetric set of residues (Figure 7). The asymmetric set of core residues are thus interpreted as having contacts that are primary drivers of structure formation within individual trefoil motifs.
FIGURE 6.

The “asymmetric” set of core‐packing residues. A ribbon drawing of the interior view of the repeating trefoil motif of the Monofoil structure (RCSB accession 3OL0) is shown, along with a space filling representation of the three “asymmetric” core packing residues Val12, Ile25, and Phe44 (in CPK coloring). The threefold axis of rotational symmetry is vertically aligned. The other two trefoil motifs are omitted for clarity. The coloring reflects the secondary structure (yellow = β‐strand; grey = turn)
FIGURE 7.

β‐trefoil core‐packing residues segregated by symmetry‐mate interactions. Shown on the left is a space filling representation of the set of four core residues, from each of the three trefoil motifs, having direct contact with their symmetry mates (the view is down the threefold axis of rotational symmetry). This set is stratified normal (i.e., “horizontal”) to the axis of symmetry. In the center, is a similar representation of the set of three residues, contributed by each of the three trefoil motifs, that lack direct interactions with their symmetry mates (i.e., the “asymmetric” set). This set is aligned parallel to the symmetry axis (i.e., “vertical”) and intercalates between the symmetric set. On the right is a ribbon diagram of the three trefoil motif chains in the Monofoil oligomer (PDB accession 3OL0) illustrating the arrangement of the symmetric and asymmetric sets of core packing residues (the asymmetric set is identified by dark shading)
Segments of β‐strand pairs β1/β4, β1/β2, and β2/β4 hydrogen‐bond in a characteristic anti‐parallel hydrogen‐bonding arrangement. 6 , 20 The site of divergence of each of these three pairs of β‐strands occurs at a common coordinate. A single buried water molecule is located at this unique position, simultaneously providing a terminal bridging hydrogen‐bond for each of the three diverging pairs of β‐strands, and is a ubiquitous feature in the trefoil motifs of β‐trefoil proteins. 25 This water participates in three hydrogen bonds, involving the main chain amide of Leu14 in β1, the main chain carbonyl of Leu23 in β2, and the main chain carbonyl of Ile42 in β4 (referencing the Monofoil protein). Tetrahedral hydrogen‐bonding geometry with this water consequently orients the sidechain Cα‐Cβ vector of Leu14 and Leu23 toward the central core region. Although Ile42 is not located within the central core, establishment of β‐strand secondary structure at this position subsequently positions Phe44 toward the central core. A detailed description of the interaction geometry of this conserved water and local residues has previously been described, and interested readers are directed to that report. 25 Folding studies of circular permutations of the Monofoil trefoil motif indicate that local organization of a Leu14/Leu23/Phe44 packing group in response to this conserved buried water interaction appears essential for the formation of an effective folding nucleus. 25 , 27 Leu14 and Leu23 exhibit the largest number of neighbor contacts (at eight each), and both also exhibit direct contacts with their symmetry mates. The neighbor contact map for the Leu14/Leu23 pair is shown in Figure 8, and these two core positions alone establish six intra‐motif interactions (including contacts between the β‐barrel and β‐hairpin regions), seven inter‐motif interactions with the preceding motif, and two inter‐motif interactions with the following motif. 
Additionally, Leu14 and Leu23 are both internal layers in the horizontal stratification of core residues (Figure 5); thus, nascent structuring of Leu14/Leu23 establishes a favorable packing interface for the subsequent assembly of the other layers of the symmetric core positions (i.e., Val31 and Ile46). This assembly of the symmetric set establishes the complete oligomerization interface of the trefoil‐motif core. It is therefore postulated that Leu14 and Leu23 interactions (perhaps also augmented by Phe44), coordinated by the unique buried solvent, likely form the key folding nucleus structure in β‐trefoil proteins. This interpretation is consistent with phi‐value analyses 28 of Symfoil and fibroblast growth factor‐1 (FGF‐1), which indicate that the folding nucleus (which forms early in the folding pathway) is centrally‐located (i.e., involves the middle trefoil motif), while the N‐ and C‐terminal regions fold late in the folding pathway. 12 , 29 A similar central region forming early in the folding pathway has been reported for interleukin‐1β. 30
FIGURE 8.

Core packing interactions of Leu14 and Leu23 organized by the ubiquitous buried solvent in β‐trefoil proteins. The main chain atoms of these residue positions are oriented via main chain H‐bond interactions with a universally‐conserved buried solvent in β‐trefoil proteins (indicated by red sphere in each trefoil motif) 25
Phe44 is the largest residue in the set of hydrophobic core positions, and therefore participates in the greatest van der Waals contact surface area. Phe/Trp mutations at position 44/85/132 in FGF‐1 and interleukin‐1β are essentially neutral in thermostability. 31 , 32 The indole nitrogen of Trp44/85/132 finds an ideally‐positioned H‐bond partner in the adjacent main chain carbonyl of Leu23/66/109, respectively. The sequence logo plot at position 44 also supports an almost equal preference for Phe or Trp. Therefore, the hypothesis of structural equivalence described above for Leu/Ile substitutions at positions 14, 23 and 46 of the symmetric set is also postulated to extend to Phe/Trp substitutions at position 44 of the asymmetric set. Thus, primary structure drift within the core region, associated with an apparent packing solution degeneracy at several positions, appears to be an intrinsic feature of the β‐trefoil architecture. This packing solution degeneracy is able to maintain the overall β‐trefoil tertiary structure dimensions as well as the intra‐ and inter‐motif core interactions essential for structure and oligomerization/folding.
Packing solution degeneracy can promote oligomerization of the trefoil motif, since it would not require a unique specific sequence at the oligomerization interface. Gene duplication and fusion, in going from Monofoil to Symfoil, converts intermolecular oligomerization interactions into intramolecular folding interactions. This process generally results in novel, but large, contact order interactions. In the case of residues having direct contact with symmetry mates in the β‐trefoil, oligomerization would yield a contact order of 0.33. However, for other residues oligomerization can yield contact orders significantly smaller (i.e., ~0.10) than that seen for the symmetric oligomerization interactions. Some small contact orders are observed with Val12, Leu14, Phe44 and Ile46 in Symfoil (Table S1). Thus, converting intermolecular (i.e., oligomerization) contacts into intramolecular interactions with a small contact order, via gene duplication and fusion, involves contacts that do not involve direct symmetry mates.
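The two contact order regimes can be made explicit with a short worked calculation. It uses the pairwise definition given in Section 4.4 (sequence separation divided by chain length) and assumes that equivalent positions in adjacent trefoil motifs of Symfoil are separated by exactly 42 residues in sequence; the symbols CO, i, j and L are notation introduced here for illustration.

```latex
\[
\mathrm{CO}_{ij} = \frac{|i - j|}{L}, \qquad
\mathrm{CO}_{\text{symmetry mates}} = \frac{42}{126} \approx 0.33, \qquad
\mathrm{CO}_{\text{Leu14, preceding Phe44}} = \frac{14 - (44 - 42)}{126} = \frac{12}{126} \approx 0.10
\]
```

A contact between symmetry mates necessarily spans a full motif, whereas a contact between a residue near the start of one motif and a residue near the end of the preceding motif spans only a few residues of the fused chain.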
4. MATERIALS AND METHODS
4.1. X‐ray structures
The Symfoil‐4P (RCSB accession 3O4D, 1.65 Å resolution) and Monofoil‐4P (RCSB accession 3OL0, 1.48 Å resolution) X‐ray structures were analyzed for core‐packing interactions. Symfoil‐4P is a single‐chain β‐trefoil protein of 126 amino acids. Monofoil‐4P is a 42‐mer polypeptide, representing the repeating trefoil motif present in Symfoil‐4P, and assembles as a homo‐trimer to generate an integral β‐trefoil protein. 6 , 20 Molecular images were constructed using the SwissPDBViewer 33 and POV‐ray 34 software.
4.2. Calculation of solvent accessible surface area and van der Waals contact distance
The solvent accessible surface area of residue atoms, against a background of all protein atoms, was calculated using the EDPDB software package 35 and a 1.4 Å radius probe. Since these crystal structures lack hydrogens, the van der Waals neighbor contacts for hydrophobic side chain carbon atoms were identified using the radius of methane (CH4), determined from both experimental X‐ray crystallography measurements and quantum mechanics calculations as 2.1 Å. 36 Hydrophobic side chains were therefore identified as being in van der Waals contact if the carbon–carbon distance was <4.4 Å (i.e., 4.2 Å plus an allowance of 0.2 Å for potential coordinate positional error). Neighbor queries were performed using the COOT 37 and SwissPDBViewer 33 software packages.
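The distance criterion can be expressed as a simple test on atomic coordinates. The Python sketch below is a minimal illustration, assuming that side chain carbon coordinates have already been extracted as (x, y, z) tuples in Å; the function names and constants are hypothetical, and the original neighbor queries were performed interactively in COOT and SwissPDBViewer rather than with this code.

```python
import math

METHANE_RADIUS = 2.1      # Angstrom, radius used for a carbon with implicit hydrogens
POSITIONAL_ERROR = 0.2    # Angstrom, allowance for coordinate error
CONTACT_CUTOFF = 2 * METHANE_RADIUS + POSITIONAL_ERROR  # 4.4 Angstrom


def carbons_in_contact(xyz_a, xyz_b, cutoff=CONTACT_CUTOFF):
    """True if two carbon atoms are closer than the contact cutoff."""
    return math.dist(xyz_a, xyz_b) < cutoff


def residues_in_contact(carbons_a, carbons_b, cutoff=CONTACT_CUTOFF):
    """True if any pair of side chain carbons (one from each residue) is in contact.

    carbons_a, carbons_b: iterables of (x, y, z) coordinates for the side
    chain carbon atoms of two hydrophobic residues.
    """
    return any(carbons_in_contact(a, b, cutoff)
               for a in carbons_a for b in carbons_b)
```

Treating two residues as neighbors when any single carbon pair satisfies the cutoff is an assumption of this sketch; the source specifies only the carbon–carbon distance threshold.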
4.3. Consensus β‐trefoil core residues and sequence logo chart
Core residues for representative protein structures from the 17 different β‐trefoil families were identified using neighbor distance calculations to the core residues identified in the Symfoil protein. A listing of the RCSB accessions for the set of β‐trefoil proteins can be found in Table S2 of Blaber. 25 The core residues were organized as equivalent positions in a single trefoil motif, referencing the numbering scheme of Monofoil/Symfoil (i.e., positions 12, 14, 23, 25, 31, 44 and 46). A sequence logo chart was generated from this data using WebLogo. 38
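A sequence logo summarizes, for each position, the residue frequencies and the information content of the column. The Python sketch below shows the underlying calculation for a single core position. It is a simplified illustration: it omits the small‐sample correction that WebLogo applies, and the function names and the 20‐letter alphabet default are assumptions made here. Any input column is supplied by the user; no data from Table S2 are reproduced.

```python
import math
from collections import Counter


def column_frequencies(residues):
    """Relative frequency of each residue type observed at one core position."""
    counts = Counter(residues)
    total = len(residues)
    return {aa: n / total for aa, n in counts.items()}


def information_content(residues, alphabet_size=20):
    """Information content (bits) of one column: log2(alphabet) - Shannon entropy.

    No small-sample correction is applied, so values are slightly
    overestimated for small alignments.
    """
    freqs = column_frequencies(residues)
    entropy = -sum(f * math.log2(f) for f in freqs.values())
    return math.log2(alphabet_size) - entropy
```

A fully conserved position gives the maximum of log2(20) bits, while a position split evenly between two residue types (for example Val and Ile) loses exactly one bit relative to that maximum.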
4.4. Neighbor residue contact order calculation
Contact order for pairwise interactions was calculated as primary structure distance/overall polypeptide chain length. 21 In the case of Monofoil the polypeptide length is 42 residues; while for Symfoil the length is 126 residues. In Symfoil the residue separation count for any specific pair of residues can yield two different values depending upon whether contact with a preceding, or following, trefoil subunit spans the (discontinuous) N‐ and C‐termini. In such cases, the apparent residue separation will be 42 residues longer than a corresponding residue pair that does not span the termini. For simplicity, the shorter residue separation value is used in the contact order calculation. Any neighbor residue contact in Monofoil that is inter‐motif has an undefined contact order.
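The calculation described above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the script used for Table S1: positions are given in the motif‐relative numbering (12, 14, 23, 25, 31, 44, 46), a partner in the preceding or following motif is assumed to lie exactly 42 residues earlier or later in the Symfoil sequence, and the function name and `motif_shift` parameter are hypothetical.

```python
MOTIF_LENGTH = 42      # residues in the Monofoil trefoil motif
SYMFOIL_LENGTH = 126   # residues in the single-chain Symfoil protein


def pairwise_contact_order(pos_i, pos_j, motif_shift=0,
                           chain_length=SYMFOIL_LENGTH,
                           motif_length=MOTIF_LENGTH):
    """Pairwise contact order = sequence separation / chain length.

    motif_shift: 0 for an intra-motif contact, +1 if residue j lies in the
    following motif, -1 if it lies in the preceding motif. Using the adjacent
    motif (rather than the pair that spans the termini) corresponds to the
    shorter of the two possible separations in Symfoil.
    Returns None for an inter-motif contact in a single-motif chain
    (Monofoil), where the contact order is undefined.
    """
    if motif_shift != 0 and chain_length <= motif_length:
        return None
    separation = abs((pos_j + motif_shift * motif_length) - pos_i)
    return separation / chain_length
```

For example, `pairwise_contact_order(12, 14, chain_length=42)` gives 2/42 (about 0.05) for the intra‐motif Val12/Leu14 contact in Monofoil, `pairwise_contact_order(12, 46, motif_shift=-1)` gives 8/126 (about 0.06) for Val12 with Ile46 of the preceding motif in Symfoil, and a symmetry‐mate contact such as `pairwise_contact_order(14, 14, motif_shift=1)` gives 42/126 (about 0.33).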
CONFLICT OF INTEREST
Michael Blaber is a cofounder and has equity ownership in Trefoil Therapeutics Inc.
AUTHOR CONTRIBUTIONS
Michael Blaber: Conceptualization; data curation; formal analysis; funding acquisition; investigation; methodology; project administration; resources; software; supervision; validation; visualization; writing‐original draft; writing‐review & editing.
Supporting information
Appendix S1: Supporting information
Table S1: Listing of the van der Waals contacts for the set of seven buried hydrophobic core residues in the oligomeric trefoil motif of the designed Monofoil protein (RCSB 3OL0). The contact order is calculated for both the Monofoil and Symfoil (RCSB 3O4D) proteins.
Table S2: Core packing residues for representative members of the 17 families of β‐trefoil proteins. The numbering scheme references the Symfoil protein and equivalent positions for the repeating trefoil motifs in each protein are listed.
ACKNOWLEDGEMENTS
The author is grateful to Dr. Liam Longo for providing a critical reading of the manuscript and for helpful discussions. This work was supported in part by a research support agreement from Trefoil Therapeutics Inc. Support from the FSU Department of Biomedical Sciences is acknowledged.
Blaber M. Cooperative hydrophobic core interactions in the β‐trefoil architecture. Protein Science. 2021;30:956–965. 10.1002/pro.4059
REFERENCES
- 1. McLachlan AD. Three‐fold structural pattern in the soybean trypsin inhibitor (Kunitz). J Mol Biol. 1979;133:557–563.
- 2. Murzin AG, Lesk AM, Chothia C. β‐trefoil fold. Patterns of structure and sequence in the Kunitz inhibitors interleukins‐1β and 1α and fibroblast growth factors. J Mol Biol. 1992;223:531–543.
- 3. Andreeva A, Howorth D, Chothia C, Kulesha E, Murzin AG. SCOP2 prototype: A new approach to protein structure mining. Nucleic Acids Res. 2013;42:D310–D314.
- 4. Kirioka T, Panyavut A, Kikuchi T. Detection of folding sites of β‐trefoil fold proteins based on amino acid sequence analyses and structure‐based sequence alignment. J Proteom Bioinformat. 2017;10:222–235.
- 5. Feng J, Li M, Huang Y, Xiao Y. Symmetric key structural residues in symmetric proteins with beta‐trefoil fold. PLoS One. 2010;5:e14138.
- 6. Lee J, Blaber M. Experimental support for the evolution of symmetric protein architecture from a simple peptide motif. Proc Natl Acad Sci U S A. 2011;108:126–130.
- 7. Longo LM, Kumru OS, Middaugh CR, Blaber M. Evolution and design of protein structure by folding nucleus symmetric expansion. Structure. 2014;22:1377–1384.
- 8. Broom A, Ma SM, Xia K, et al. Designed protein reveals structural determinants of extreme kinetic stability. Proc Natl Acad Sci U S A. 2015;112:14605–14610.
- 9. Terada D, Voet ARD, Noguchi H, et al. Computational design of a symmetrical β‐trefoil lectin with cancer cell binding activity. Sci Rep. 2017;7:5943.
- 10. Beadle BM, Shoichet BK. Structural basis of stability—function tradeoffs in enzymes. J Mol Biol. 2002;321:285–296.
- 11. Gosavi S. Understanding the folding‐function tradeoff in proteins. PLoS One. 2013;8:e61222.
- 12. Longo L, Lee J, Blaber M. Experimental support for the foldability‐function tradeoff hypothesis: Segregation of the folding nucleus and functional regions in FGF‐1. Protein Sci. 2012;21:1911–1920.
- 13. Rubini M, Lepthien S, Golbik R, Budisa N. Aminotryptophan‐containing barstar: Structure‐function tradeoff in protein design and engineering with an expanded genetic code. Biochim Biophys Acta. 2006;1764:1147–1158.
- 14. Smock RG, Yadid I, Dym O, Clarke J, Tawfik DS. De novo evolutionary emergence of a symmetrical protein is shaped by folding constraints. Cell. 2016;164:476–486.
- 15. Tokuriki N, Stricher F, Serrano L, Tawfik DS. How protein stability and new functions trade off. PLoS Comput Biol. 2008;4:e1000002.
- 16. Brych SR, Blaber SI, Logan TM, Blaber M. Structure and stability effects of mutations designed to increase the primary sequence symmetry within the core region of a β‐trefoil. Protein Sci. 2001;10:2587–2599.
- 17. Brych SR, Kim J, Logan TM, Blaber M. Accommodation of a highly symmetric core within a symmetric protein superfold. Protein Sci. 2003;12:2704–2718.
- 18. Brych SR, Dubey VK, Bienkiewicz E, Lee J, Logan TM, Blaber M. Symmetric primary and tertiary structure mutations within a symmetric superfold: A solution, not a constraint, to achieve a foldable polypeptide. J Mol Biol. 2004;344:769–780.
- 19. Dubey VK, Lee J, Blaber M. Redesigning symmetry‐related "mini‐core" regions of FGF‐1 to increase primary structure symmetry: Thermodynamic and functional consequences of structural symmetry. Protein Sci. 2005;14:2315–2323.
- 20. Lee J, Blaber SI, Dubey VK, Blaber M. A polypeptide "building block" for the β‐trefoil fold identified by "top‐down symmetric deconstruction". J Mol Biol. 2011;407:744–763.
- 21. Plaxco KW, Simons KT, Baker D. Contact order, transition state placement and the refolding rates of single domain proteins. J Mol Biol. 1998;277:985–994.
- 22. Shmygelska A. Search for folding nuclei in native protein structures. Bioinformatics. 2005;21:i394–i402.
- 23. Baxa MC, Freed KF, Sosnick TR. Quantifying the structural requirements of the folding transition state of protein A and other systems. J Mol Biol. 2008;381:1362–1381.
- 24. Naganathan AN, Muñoz V. Insights into protein folding mechanisms from large scale analysis of mutational effects. Proc Natl Acad Sci U S A. 2010;107:8611–8616.
- 25. Blaber M. Conserved buried water molecules enable the β‐trefoil architecture. Protein Sci. 2020;29:1794–1802.
- 26. Blaber M, Lee J. Designing proteins from simple motifs: Opportunities in top‐down symmetric deconstruction. Curr Opin Struct Biol. 2012;22:442–450.
- 27. Tenorio CA, Longo LM, Parker JB, Lee J, Blaber M. Ab initio folding of a trefoil‐fold motif reveals structural similarity with a β‐propeller blade motif. Protein Sci. 2020;29:1172–1185.
- 28. Fersht AR, Sato S. Φ‐value analysis and the nature of protein‐folding transition states. Proc Natl Acad Sci U S A. 2004;101:7976–7981.
- 29. Xia X, Longo LM, Sutherland MA, Blaber M. Evolution of a protein folding nucleus. Protein Sci. 2016;25:1227–1240.
- 30. Capraro DT, Roy M, Onuchic JN, Jennings PA. Backtracking on the folding landscape of the beta‐trefoil protein interleukin‐1beta? Proc Natl Acad Sci U S A. 2008;105:14844–14848.
- 31. Lee J, Blaber M. The interaction between thermodynamic stability and buried free cysteines in regulating the functional half‐life of fibroblast growth factor‐1. J Mol Biol. 2009;393:113–127.
- 32. Adamek DH, Guerrero L, Blaber M, Caspar DL. Structural and energetic consequences of mutations in a solvated hydrophobic cavity. J Mol Biol. 2004;346:307–318.
- 33. Guex N, Peitsch MC. SWISS‐MODEL and the Swiss‐PdbViewer: An environment for comparative protein modeling. Electrophoresis. 1997;18:2714–2723.
- 34. Persistence of Vision Raytracer. 3.6. Williamstown, Victoria, Australia: Retrieved from http://www.povray.org/download/; 2004.
- 35. Zhang X‐J, Matthews BW. EDPDB: A multi‐functional tool for protein structure analysis. J Appl Cryst. 1995;28:624–630.
- 36. Kammeyer CW, Whitman DR. Quantum mechanical calculation of molecular radii. I. Hydrides of elements of periodic groups IV through VII. J Chem Phys. 1972;56:4419–4421.
- 37. Emsley P, Cowtan K. Coot: Model‐building tools for molecular graphics. Acta Crystallogr. 2004;D60:2126–2132.
- 38. Crooks GE, Wolfe J, Brenner SE. Measurements of protein sequence–structure correlations. Proteins. 2004;57:804–810.
