Abstract
Available high‐resolution crystal structures for the family of β‐trefoil proteins in the structural databank were queried for buried waters. Such waters were classified as either (a) unique to a particular domain, family, or superfamily or (b) conserved among all β‐trefoil folds. Three buried waters conserved among all β‐trefoil folds were identified. These waters are related by the threefold rotational pseudosymmetry characteristic of this protein architecture (representing three instances of an identical structural environment within each repeating trefoil‐fold motif). The structural properties of this buried water are remarkable: it resides in a cavity no larger than a single water molecule; it exhibits a positional uncertainty (i.e., normalized B‐factor) substantially lower than that of the average Cα atom; it provides essentially ideal H‐bonding geometry with three solvent‐inaccessible main chain groups; it simultaneously serves as a bridging H‐bond partner for three different β‐strands at a point of secondary structure divergence; and it orients conserved hydrophobic side chains to form a nascent core‐packing group. Other published work supports an interpretation that these interactions are key to the formation of an efficient folding nucleus and to folded thermostability. The fundamental threefold symmetric structural element of the β‐trefoil fold is therefore, surprisingly, a buried water molecule.
Keywords: β‐strand, cavity, protein evolution, protein folding, protein stability
1. INTRODUCTION
The study and interpretation of buried water molecules in protein structures has a long history. Two distinct viewpoints regarding the nature of such waters have been described in the literature. In one view, protein interiors cannot pack perfectly and packing defects are to be expected. If such defects are large enough, then water molecules can potentially enter 1 , 2 (although water occupancy in purely nonpolar cavities has been called into question 3 ). Such buried waters may be disordered, 4 , 5 or, if a polar group is present in the walls of such cavities, may become ordered through specific hydrogen bonding. 6 , 7 Whether such buried waters contribute significantly to protein stability is a matter of debate, 8 , 9 , 10 , 11 and the net energetic balance may depend upon the number and type of H‐bonds that are formed. 12 In the other view, 13 since the polypeptide main chain traverses the interior of globular proteins, the presence of polar atoms in the core is unavoidable. If such main chain carbonyl or amide polar groups in the core are unsatisfied in their H‐bonding potential (for conformational reasons), water may be incorporated into the core to serve as an H‐bond partner. Buried waters are typically investigated through cavity‐creating or cavity‐modifying mutations, 7 , 9 , 10 , 14 , 15 , 16 through surveys of extant protein crystal structures, 6 , 17 or through computational and theoretical studies. 8 , 11 , 12
Single buried waters exhibiting low B‐factors pay the greatest entropic penalty, and so must provide a compensating favorable enthalpy in order to be present in protein structures. Such waters presumably occur at positions that cannot otherwise be satisfied by any protein main chain or side chain polar group. Conserved buried waters have been identified in families of related proteins 18 , 19 , 20 and have been interpreted as playing a key role in the stability, folding, or function of that fold. The purpose of the present study is to identify potentially conserved buried waters in β‐trefoil proteins and, if present, to discern their structural or functional role.
The β‐trefoil is a common protein architecture comprising 10 superfamilies, 17 families, and 53 domains (plus four independent de novo designs). 21 , 22 , 23 , 24 , 25 The structure is characterized by a pseudo‐threefold axis of rotational symmetry aligned vertically through a six‐stranded β‐barrel (Figure 1). The threefold repeating motif, typically 40–50 amino acids in length, comprises four antiparallel β‐strands in a characteristic conformation termed a “trefoil‐fold” 26 , 27 , 28 (where β‐strands 1 and 4 of each motif contribute to the β‐barrel). Proteins with this fold are known for functioning principally as diverse ligands, with no known enzymatic activity. This fold is also known for exhibiting extremely low (~4–12%) amino acid identity between family members, 29 as well as between the individual trefoil‐fold motifs related by threefold rotational symmetry within individual β‐trefoil proteins. 30 Novel β‐trefoil proteins are therefore typically identified as belonging to this fold not by amino acid identity but, rather, by crystal structure or by a characteristic hydrophobic patterning of the primary structure. 29 , 31 , 32
FIGURE 1.

Conserved buried Waters 1A, 1B, and 1C in representative β‐trefoil proteins. These waters are related by threefold rotational symmetry characteristic of the β‐trefoil fold, where A, B, and C reference the associated trefoil‐fold motif. In this view, the threefold axis of rotational symmetry is aligned vertically, and each view represents an approximately 120° rotation about the vertical axis. Representative β‐trefoil structures include the de novo designed symmetric β‐trefoil protein “Symfoil” (3O49), Clostridium neurotoxins, C‐terminal domain (1EPW), MIR domain (1N4K), and proteinase inhibitor 1‐like protein (3VWC)
High‐resolution crystal structures for 46 available β‐trefoil domains were analyzed for buried waters (Table S1). These waters fell into two general categories: (1) waters unique to a particular β‐trefoil family or superfamily and (2) waters common to all β‐trefoil proteins. The latter category comprised a set of three waters related by threefold rotational symmetry; thus, they represent a single water at an identical position in each of the three repeating trefoil‐fold motifs. This water is shown to facilitate the spatial coordination of a nascent cluster of hydrophobic residues that form a substantial portion of the characteristic hydrophobic patterning within β‐trefoil proteins that defines the central core‐packing group. Modeling shows that the structural organization promoted by this buried water is possible only with the archetypal N‐ and C‐terminal definitions of the trefoil‐fold, and reported properties of circular permutants of the trefoil‐fold motif indicate that this structural organization is likely a key element of foldability. 33 Although the β‐trefoil has been described as exhibiting a discernible threefold rotational symmetry, the most conserved structural element of this symmetry is, surprisingly, a buried water molecule.
2. RESULTS
2.1. Crystal data for representative structures of β‐trefoil domains
The highest resolution structure identified for each domain, within each β‐trefoil family, is given in supplementary Table S1. A total of 42 different domain structures were identified, representing the 16 known families of β‐trefoil proteins. An additional informal family was added to represent the purely symmetric de novo designed β‐trefoil proteins (comprising four domain structures). Thus, a total of 46 crystal structures were initially characterized for buried waters. These 46 structures represent 46 different crystal forms. All crystal systems except cubic are present in these structures (although a cubic crystal form, space group I23, is present in a lower resolution structure of IL‐36G, RCSB 6P9E). The resolution ranged from 1.10 to 2.33 Å with an average of 1.81 Å. The Cα B‐factors ranged from 9.7 to 43.4 Å² with an average of 25.1 Å².
2.2. Identification of buried waters
In the identification of buried waters in the different β‐trefoil proteins, it became apparent that such waters could be divided into two general categories: (1) waters unique to a domain or family and (2) waters conserved among all β‐trefoil proteins. Three buried waters were identified that are structurally conserved among the β‐trefoil proteins (Table S2). These waters are related by the threefold rotational symmetry characteristic of the β‐trefoil fold; thus, in the numbering scheme developed for the purpose of comparing among different β‐trefoil proteins they are identified as Waters 1A, 1B, and 1C (Figure 1).
2.3. Cavity volume and Cα‐normalized B‐factors associated with conserved buried Waters 1A, 1B, and 1C
An analysis of the cavity volumes associated with conserved buried Waters 1A, 1B, and 1C for representative β‐trefoil families is provided in Table S2. This analysis shows that 14% of the water cavities are detected using a probe radius of 1.4 Å, 64% using a probe radius of 1.3 Å, and 100% using a probe radius of 1.2 Å. Because a probe radius smaller than 1.4 Å (the conventional radius of a water probe) is required to detect the cavities associated with these buried waters, the cavity volumes are essentially no larger than a single water molecule. The corresponding Cα‐normalized B‐factors for buried Waters 1A, 1B, and 1C, for the set of β‐trefoil proteins, are 0.75 ± 0.23, 0.74 ± 0.25, and 0.75 ± 0.22, respectively (Table S2). Thus, the positional uncertainty of these buried waters is substantially lower than that of the average Cα atom in the protein structures (whose normalized value is 1.0 by definition). Additionally, the normalized B‐factors for Waters 1A, 1B, and 1C are essentially identical, consistent with a threefold symmetric relationship.
2.4. H‐bond interactions of buried Waters 1A, 1B, and 1C
Each of Waters 1A, 1B, and 1C in the β‐trefoil proteins forms three hydrogen bonds with local main chain atoms, and these interactions are structurally identical among the three waters (Table S3). Owing to this symmetric relationship, the H‐bond interactions can be described using a single example. The following discussion refers to the 3O49 structure (a de novo designed symmetric β‐trefoil) and its residue numbering scheme. Water 1A serves as an acceptor for the main chain amide at Position 14, as a donor to the main chain carbonyl at Position 23, and as a donor to the main chain carbonyl at Position 42. Each of these main chain positions is located within the first trefoil‐fold subdomain; thus, for each of the buried waters, the H‐bond interactions are contained entirely within the local trefoil‐fold subdomain. With regard to the H‐bonding geometry of these main chain groups, the angles are within 10° of optimal (i.e., trigonal planar geometry) and the H‐bond distances are similarly optimal (i.e., 2.75–2.87 Å) (Table S3). The coordination geometry of the water involves average angles of 93.6°, 114°, and 142.4°. The ideal tetrahedral angle is 109.5°, so there is some deformation from ideal tetrahedral geometry; however, the distance from the buried water to the plane defined by its three protein H‐bond partners (i.e., the out‐of‐plane distance) is 0.44 ± 0.21 Å, corresponding to a tetrahedron with a radius of 1.30 Å. Thus, the coordination geometry of the buried water with its H‐bond partners is essentially tetrahedral. This tetrahedral geometry points the remaining (unsatisfied) water acceptor toward the exterior of the protein; however, the water is completely buried and there are no obvious H‐bond partners. The buried water may therefore form a bifurcated H‐bond to the main chain amide, thereby avoiding an unsatisfied oxygen lone pair acceptor.
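The angle and out‐of‐plane measurements above can be reproduced from atomic coordinates. The following is a minimal sketch, assuming NumPy and that the water oxygen and its three partner atoms (NX, OY, OZ) have already been extracted as Cartesian coordinates in Å; the function name `coordination_geometry` is hypothetical and is not part of COOT or Swiss PDB‐viewer, which were used for the reported measurements. For a regular tetrahedron with center‐to‐vertex distance r, the center lies r/3 from the plane through any three vertices, which is what relates an out‐of‐plane distance to an equivalent tetrahedron radius.

```python
import numpy as np

def coordination_geometry(water, partners):
    """Return the three partner-water-partner angles (degrees) and the distance
    (Å) of the water oxygen from the plane through its three H-bond partners.

    water:    (x, y, z) of the water oxygen
    partners: three (x, y, z) tuples, e.g. the NX, OY, and OZ atoms (hypothetical input)
    """
    w = np.asarray(water, dtype=float)
    p = np.asarray(partners, dtype=float)
    if p.shape != (3, 3):
        raise ValueError("expected exactly three partner atoms")

    # Angle subtended at the water by each pair of partner atoms
    angles = []
    for i, j in ((0, 1), (0, 2), (1, 2)):
        u, v = p[i] - w, p[j] - w
        cos_t = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
        angles.append(float(np.degrees(np.arccos(np.clip(cos_t, -1.0, 1.0)))))

    # Unit normal of the plane defined by the three partner atoms
    normal = np.cross(p[1] - p[0], p[2] - p[0])
    normal /= np.linalg.norm(normal)

    # Perpendicular distance of the water from that plane
    out_of_plane = float(abs(np.dot(w - p[0], normal)))
    return angles, out_of_plane
```

As a quick check, the three angles sum to 360° when the water is coplanar with its partners and to about 328.4° for ideal tetrahedral coordination.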
In the 3O49 structure, the main chain amide at Position 14 is designated “NX,” the main chain carbonyl at Position 23 “OY,” and the main chain carbonyl at Position 42 “OZ.” Individual values for the H‐bonding geometry of buried Waters 1A, 1B, and 1C with the NX, OY, and OZ partners in representative structures of β‐trefoil families are given in Table S3, and a diagram of the average geometry for buried Waters 1A, 1B, and 1C in all structures is provided in Figure 2.
FIGURE 2.

Summary of H‐bond geometries for the conserved buried Waters 1A, 1B, and 1C in β‐trefoil proteins. Values indicated are the average and SD of 50 measurements from representative structures of 17 families of β‐trefoil proteins (see Table S3 for a comprehensive listing)
3. DISCUSSION
The set of buried waters common to all β‐trefoil proteins was interpreted as serving primarily a structural role, while the set of buried waters unique to a particular β‐trefoil family or domain was hypothesized as likely serving a functional role (or a structural role related to function) unique to that family. The focus of the present report is on the buried waters common to the β‐trefoil proteins. The cavity volume and H‐bond geometry associated with Waters 1A, 1B, and 1C are remarkable: the cavity volume appears precisely sized for a water (i.e., optimizing van der Waals interactions), and the H‐bonding geometry with the three main chain polar groups that form the cavity wall is essentially optimal and involves both donor and acceptor groups. This supports interpreting the electron density at this site in all the crystal structures as water rather than an ion. These waters are positioned where three β‐strands (Strands β1, β2, and β4) of the trefoil‐fold motif diverge from canonical antiparallel H‐bonding interactions (Figure 3). Goodfellow and coworkers 34 reported a detailed survey of protein main chain solvation as determined from high‐resolution crystal structures (although with no distinction as to whether water is buried or accessible). With regard to β‐sheet solvation, this group identified three general categories: (a) edge, (b) middle, and (c) end. A common end‐type water is involved in an H‐bond interaction between the i and j−2 main chain residues, intercalating where antiparallel β‐strands begin to diverge (with this water bridging a final interstrand H‐bond). This arrangement describes buried Waters 1A, 1B, and 1C; thus, at this position, a single water performs an end‐type interaction simultaneously for three diverging β‐strands.
Furthermore, this H‐bond role is structurally compatible with the required donor/acceptor matching, the tetrahedral geometry of water, and the precise positioning of the buried water within a water‐sized cavity. That these waters are rigidly positioned is supported by their average normalized B‐factors (~0.75 of the average Cα B‐factor).
FIGURE 3.

Main chain H‐bond interactions with buried Waters 1A, 1B, and 1C. The buried Waters 1A, 1B, and 1C are located at a site where three β‐strands diverge from the canonical antiparallel H‐bond pattern. The water is an “end” type water 34 that bridges the disrupted H‐bond interactions
The main chain H‐bonding groups of the buried waters are identified as NX, OY, and OZ. The amino acid side chains at the NX and OY positions in the β‐trefoil proteins are exclusively hydrophobic, with Leu the majority residue at both positions (Table S4). These two positions form part of the canonical hydrophobic patterning of the primary structure of β‐trefoil proteins and contribute to the hydrophobic core. 27 , 28 , 35 Carugo 36 performed a statistical study of buried waters in 261 RCSB structures at 1.50 Å resolution or better. On average, an isolated buried water occurred once per 45 amino acids. No trend was observed for neighboring amino acid type (with the exception of a slightly greater representation of Pro residues), and main chain and side chain contacts with buried water appeared with equal frequency. Thus, the H‐bonding of Waters 1A, 1B, and 1C with exclusively hydrophobic amino acids in the β‐trefoil proteins is notable. Water donor/acceptor coordination with the NX, OY, and OZ residues influences the Cα–Cβ side chain bond vector at these positions. Specifically, water H‐bonding coordination orients the two hydrophobic side chains at NX and OY toward the protein interior as a van der Waals interacting pair (Figure 4). Water coordination also influences the Cα–Cβ side chain bond vector at the OZ position. The OZ position does not exhibit any apparent conservation of side chain chemical property and is not located in the protein interior; however, the constraints of the local β4‐strand secondary structure position the Cα–Cβ side chain bond vector at the OZ+2 position in the same general plane as the OZ side chain; consequently, the side chain at the OZ+2 position is oriented toward the core region (Figure 4). Notably, aromatic residues are present at 91% of the OZ+2 positions and form extensive van der Waals interactions with the hydrophobic NX and OY packing group.
Thus, the packing of the NX and OY hydrophobic side chains provides a complementary van der Waals surface for the aromatic ring at position OZ+2, and this conserved aromatic residue is also part of the conserved hydrophobic patterning that forms the central core of β‐trefoil proteins. The 1A, 1B, and 1C buried water molecules therefore not only provide H‐bond partners to unsatisfied buried main chain groups, but also orient conserved hydrophobic residues to promote van der Waals interactions, forming a significant portion of the cooperative hydrophobic core region of the β‐trefoil protein. These waters therefore play a major structural role in the stability and folding of β‐trefoil proteins (reflecting their universally conserved nature).
FIGURE 4.

The H‐bond network of conserved buried water and local hydrophobic packing. A relaxed stereo diagram illustrating the exterior view (top panel) and interior view (bottom panel) of the conserved buried water environment in β‐trefoil proteins (this example is from 3O49, a de novo designed symmetric β‐trefoil). Residues associated with NX (Leu) and OY (Leu) are part of the characteristic hydrophobic patterning of the primary structure of β‐trefoil proteins and contribute to the cooperatively packing core. The H‐bond geometry of the buried water orients the side chains (i.e., Cα–Cβ vectors) at these positions toward the interior, where they engage in van der Waals interactions (lower panel). The OZ+2 position (Phe) is also part of the hydrophobic patterning of the primary structure. The buried water, in combination with local β‐strand secondary structure, orients the Cα–Cβ vector of the OZ+2 side chain toward the interior, promoting van der Waals interactions with the NX and OY side chains (shown in space filling representation in the lower panel)
Water 1A provides H‐bond interactions to β‐strands 1, 2, and 4 within the first trefoil‐fold domain. Similarly, Waters 1B and 1C provide related H‐bond interactions within the second and third trefoil‐fold domains, respectively (i.e., β‐strands 5, 6, and 8, and 9, 10, and 12, respectively). The isolated trefoil‐fold domain (“Monofoil”) of the de novo designed symmetric β‐trefoil protein “Symfoil‐4P” was shown to spontaneously fold as a trimeric oligomer generating an intact β‐trefoil protein. 22 , 37 Thus, this trefoil‐fold domain contains an effective folding nucleus. Experimental studies of circular permutants of the Monofoil trefoil‐fold motif have demonstrated that all circular permutations of the trefoil‐fold motif yield collapsed molten globule monomers. 33 It is only with the wild‐type trefoil‐fold definition that Waters 1A, 1B, or 1C are buried and coordinate the nascent NX, OY, and OZ+2 hydrophobic cluster; all circular permutations yield a solvent accessible water and disruption of one or more H‐bonds. These considerations suggest that the interactions promoted by Waters 1A, 1B, or 1C form the essential folding nucleus of the trefoil‐fold motif (supporting a key role of these conserved buried waters in protein folding).
Huggins used the statistical mechanical method of inhomogeneous fluid solvation theory to quantify the enthalpic and entropic contributions of individual water molecules in protein cavities. 11 This analysis involved 23 waters in five different types of protein folds, including interleukin‐1β (IL‐1β), a β‐trefoil protein belonging to the cytokine superfamily. The set of waters characterized for IL‐1β included Waters 1A and 1B in the current nomenclature. Huggins' calculations indicated an average entropic penalty of 1.85 ± 0.05 kcal/mol (7.74 kJ/mol) and a favorable enthalpy of −8.26 ± 1.22 kcal/mol (−34.6 kJ/mol), for an overall favorable Gibbs energy of water burial of −6.41 ± 1.16 kcal/mol (−26.8 kJ/mol). Thus, these buried water molecules within the folded protein are associated with a modest entropic penalty but a substantially greater favorable interaction enthalpy. Cast in terms of protein unfolding, the thermostability of FGF‐1 (a β‐trefoil protein, also in the cytokine superfamily) corresponds to $\Delta G_{\mathrm{U}} = +21.3$ kJ/mol 38 and an unfolding enthalpy of $\Delta H_{\mathrm{U}} = +423$ kJ/mol. 39 According to Huggins' values, the three conserved buried waters should contribute a total of approximately +104 kJ/mol to $\Delta H_{\mathrm{U}}$; thus, they are a major contributor to the favorable enthalpy of protein folding. Moreover, desolvation of any one of these waters from the folded protein would lead to a net negative $\Delta G_{\mathrm{U}}$ (i.e., spontaneous protein unfolding). These three waters are therefore major contributors to the favorable thermodynamics of folding of the β‐trefoil fold. Water has been described as the “21st” amino acid due to its ubiquitous presence in protein structures. 40 In the case of the β‐trefoil fold, the underlying threefold symmetric architecture appears critically dependent upon a specific buried water with no potential alternative H‐bond partner provided by a nearby side chain.
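The arithmetic behind these statements can be laid out explicitly. The worked equations below assume, as the text does, that Huggins' per‐water averages for IL‐1β transfer to FGF‐1, and use 1 kcal = 4.184 kJ; the entropic penalty enters as the −TΔS term of the burial free energy.

$$
\begin{aligned}
\Delta G_{\text{burial}} &= \Delta H_{\text{burial}} + (-T\Delta S_{\text{burial}}) = -8.26 + 1.85 = -6.41\ \text{kcal/mol} \approx -26.8\ \text{kJ/mol} \\
\Delta H_{\mathrm{U}}^{\text{(3 waters)}} &\approx 3 \times 34.6\ \text{kJ/mol} = 103.8\ \text{kJ/mol} \approx 104\ \text{kJ/mol} \\
\Delta G_{\mathrm{U}}^{\text{(one water desolvated)}} &\approx 21.3 - 26.8 = -5.5\ \text{kJ/mol} < 0
\end{aligned}
$$

On this basis the three waters account for roughly one quarter of the measured $\Delta H_{\mathrm{U}} = +423$ kJ/mol of FGF‐1.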
These results strongly suggest that computational approaches to protein stability and folding that do not include discrete water effects are unlikely to adequately capture key folding and stability properties of the β‐trefoil protein architecture (and perhaps of other folds as well). Revisiting the two alternative views of buried water described in the literature: Waters 1A, 1B, and 1C are not merely occupying a packing defect, nor are their effects limited to providing an H‐bond partner for an unsatisfied main chain polar group in the protein interior. These waters also provide key coordination geometry that directs the formation of a central hydrophobic cluster.
The β‐trefoil architecture is described as having pseudo‐threefold rotational symmetry because of substantial primary structure asymmetry and relative insertions/deletions between members of this large family, as well as between repeating trefoil‐fold motifs within a single β‐trefoil protein (e.g., see Figure 1). Kikuchi and coworkers 29 reported that sequence identity between the different β‐trefoil superfamilies ranges from 3.9 to 12.1%. In a number of cases, only a single position may exhibit a conserved amino acid when comparing the primary structures of the repeating trefoil‐fold domains in a given β‐trefoil protein. Thus, computational identification of β‐trefoil proteins relies more upon characteristic chemical patterning than upon sequence identity. 31 , 32 This raises the question: what particular threefold symmetric structural feature, if any, is conserved among β‐trefoil proteins? The present report yields a surprising answer: a specific buried water and its structural coordination of three intersecting main chain groups and associated hydrophobic side chains. This conserved symmetric buried water is shown to be key to the structure, stability, and folding of the β‐trefoil architecture.
4. MATERIALS AND METHODS
4.1. Set of β‐trefoil structures and reference coordinate frame
The Structural Classification of Proteins database (SCOP2) 21 was queried for proteins belonging to the β‐trefoil fold (ID: 2000422). This search identified 10 superfamilies: ricin B‐like lectins (ID: 3000678), STI‐like (ID: 300715), cytokine (ID: 3000648), actin‐crosslinking proteins (ID: 3000731), DNA‐binding protein LAG‐1 (CSL) (ID: 3000391), AbfB domain (ID: 3000397), agglutinin (ID: 3000397), MIR domain (ID: 3001569), 30 K lipoprotein C‐terminal domain‐like (ID: 3002324), and proteinase inhibitor 1‐like (ID: 3002326). An informal superfamily of four de novo designed symmetric β‐trefoil proteins was also included. Each of these superfamilies contains one or more families of proteins, and each family contains one or more domains. Available X‐ray crystal structures for each domain in the Research Collaboratory for Structural Bioinformatics (RCSB) databank (rcsb.org) 41 were queried for resolution and for the presence of mutations or ligands (NMR structures were excluded because they lack modeled waters). The highest resolution structure with no mutations or ligands was selected for analysis. If no ligand‐free structure was available, the highest resolution complex structure was selected. Mutant forms were selected only if no other structures were available. Structures with a resolution >2.5 Å or an average B‐factor >60 Å² were not considered. If multiple copies of the protein were present in the asymmetric unit, then chain A was selected. If multiple (e.g., tandem) β‐trefoil domains existed in the protein, then the first repeat was selected. Not all domains or families yielded representative structures under these criteria, although the large majority did.
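The selection rules above can be expressed as a preference ordering over candidate entries. The following sketch assumes the metadata for each candidate X‐ray entry has already been retrieved; the `Entry` record, its fields, and `select_representative` are hypothetical, and placing all mutant structures in a single lowest tier (the text does not subdivide mutants by ligand state) is an interpretive assumption.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Entry:
    entry_id: str       # hypothetical identifier for a candidate structure
    resolution: float   # Å
    mean_b: float       # average B-factor, Å^2
    has_ligand: bool
    is_mutant: bool

def _tier(e: Entry) -> int:
    """Preference tier: 0 = wild-type, ligand-free; 1 = wild-type complex; 2 = mutant."""
    if not e.is_mutant and not e.has_ligand:
        return 0
    if not e.is_mutant:
        return 1
    return 2

def select_representative(entries: List[Entry]) -> Optional[Entry]:
    """Return the preferred entry for one domain, or None if none pass the cutoffs."""
    # Hard cutoffs: resolution <= 2.5 Å and average B-factor <= 60 Å^2
    eligible = [e for e in entries if e.resolution <= 2.5 and e.mean_b <= 60.0]
    if not eligible:
        return None
    # Best tier first; within a tier, the smallest resolution value (highest resolution)
    return min(eligible, key=lambda e: (_tier(e), e.resolution))
```

Sorting on the tuple `(tier, resolution)` means a ligand‐free wild‐type structure is always preferred over a higher‐resolution complex or mutant, matching the order in which the criteria are stated.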
Each crystal structure is in a potentially different coordinate frame, preventing direct structural comparison. Therefore, the selected β‐trefoil crystal structures were rotated and translated to overlay them in a common coordinate frame. The reference coordinate frame was that of the “A” molecule of RCSB deposition 1RG8 (a 1.10 Å resolution structure of human FGF‐1), and a coordinate overlay of Cα atoms was performed using the Swiss PDB‐viewer software. 42 The overlay procedure iteratively excluded positional outliers until the retained set of Cα atoms overlaid with an RMSD of less than approximately 1.5 Å. The resulting rotation matrix and translation vector were applied to all protein atoms and waters in the structure. For some proteins, the β‐trefoil fold defines the complete architecture; in other cases, the β‐trefoil fold is a subdomain within a more extensive architecture (sometimes present as multiple subdomains). In such cases, individual β‐trefoil folds were isolated, along with associated waters within 3.4 Å, and treated as separate structures for analysis.
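An iterative overlay of this kind can be sketched with the Kabsch least‐squares superposition. This is a swapped‐in technique for illustration, not the Swiss PDB‐viewer algorithm: it assumes NumPy, assumes the Cα pairs between the mobile structure and the 1RG8 reference are already in one‐to‐one correspondence, and drops the single worst‐fitting pair per round. The function names and the `cutoff` and `min_atoms` parameters are hypothetical.

```python
import numpy as np

def kabsch(mobile, target):
    """Least-squares rotation R and translation t such that mobile @ R.T + t ~ target.
    Both arrays are (N, 3) and must list corresponding atoms in the same order."""
    p0, q0 = mobile.mean(axis=0), target.mean(axis=0)
    H = (mobile - p0).T @ (target - q0)            # 3 x 3 covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))          # correct for a possible reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = q0 - p0 @ R.T
    return R, t

def iterative_superpose(mobile, target, cutoff=1.5, min_atoms=10):
    """Refit repeatedly, discarding the worst-fitting Cα pair each round, until the
    RMSD of the retained pairs is below `cutoff` (Å) or only `min_atoms` remain.
    Returns (R, t, indices of retained pairs, final RMSD)."""
    mobile = np.asarray(mobile, dtype=float)
    target = np.asarray(target, dtype=float)
    keep = np.arange(len(mobile))
    while True:
        R, t = kabsch(mobile[keep], target[keep])
        dev = np.linalg.norm(mobile[keep] @ R.T + t - target[keep], axis=1)
        rmsd = float(np.sqrt(np.mean(dev ** 2)))
        if rmsd < cutoff or len(keep) <= min_atoms:
            return R, t, keep, rmsd
        keep = np.delete(keep, np.argmax(dev))      # drop the largest outlier
```

The returned `R` and `t` would then be applied to every protein atom and water of the mobile structure, for example as `all_xyz @ R.T + t`.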
4.2. Buried water identification, reference numbering scheme, and B‐factor normalization
The accessible surface area of water molecules, against a background of all protein atoms, was calculated with the EDPDB software package 43 using a 1.4 Å radius probe. Waters exhibiting 0 Å² accessible surface area were identified as “buried.” Although all proteins were overlaid in the same reference coordinate frame, waters in RCSB depositions are numbered essentially arbitrarily, preventing a common reference number for structurally related waters; for this reason, a simple nomenclature was developed to identify buried waters. This nomenclature uses the threefold rotational pseudosymmetry of β‐trefoil proteins, with “A,” “B,” or “C” indicating the trefoil‐fold subdomain (from N‐terminus to C‐terminus) with which a particular buried water is principally associated. Waters were assigned an identifying number based upon structural uniqueness, so that waters at the equivalent structural location within different trefoil‐fold subdomains share the same number; thus, Waters 1A, 1B, and 1C reside at structurally equivalent positions in each of the three trefoil‐fold subdomains. Working initially with the 1RG8 structure, all buried waters were labeled in this manner. Several buried waters were located in only one or two of the trefoil‐fold repeats and were labeled accordingly. Structurally equivalent waters in different β‐trefoil proteins were identified as those within 1.8 Å of a water in the reference set established for FGF‐1. This reference set was extended as other β‐trefoil proteins were examined and novel buried waters identified. In this way, a consistent numbering scheme was established that permitted a direct comparison of all buried waters (both conserved and nonconserved) across the set of β‐trefoil proteins. Buried water B‐factors were normalized to the average Cα B‐factor of their respective crystal structure, $B_{\mathrm{norm}} = B_{\mathrm{H_2O}} / \langle B_{\mathrm{C\alpha}} \rangle$, using the EDPDB software package. 43
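The labeling step amounts to a nearest‐neighbor assignment against a reference set, followed by a simple ratio for the B‐factor normalization. A minimal sketch, assuming NumPy, buried waters already overlaid in the 1RG8 frame, and a reference set stored as a dictionary from label (e.g., "1A") to coordinates; `assign_reference_labels` and `normalized_b` are hypothetical illustrations, not EDPDB commands.

```python
import numpy as np

def normalized_b(b_water, b_calpha):
    """Cα-normalized B-factor: B_H2O divided by the mean Cα B-factor of the same structure."""
    return b_water / np.mean(b_calpha)

def assign_reference_labels(waters, reference, cutoff=1.8):
    """Give each overlaid buried water the label of the nearest reference water
    if it lies within `cutoff` Å; otherwise return None for that water.

    waters:    iterable of (x, y, z) coordinates of buried waters
    reference: dict mapping label -> (x, y, z) in the common frame
    """
    labels = list(reference)
    ref = np.array([reference[k] for k in labels], dtype=float)
    assigned = []
    for w in np.asarray(waters, dtype=float):
        dist = np.linalg.norm(ref - w, axis=1)   # distance to every reference water
        i = int(np.argmin(dist))
        assigned.append(labels[i] if dist[i] <= cutoff else None)
    return assigned
```

A water returned as `None` would correspond to a novel buried water, which in the procedure described above would receive a new label and be added to the reference set.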
4.3. Cavity calculations and structural measurements
Interior cavity volumes in crystal structures were calculated using the Molecular Surface Package software. 44 Calculations were performed by first stripping waters from the coordinate files. Cavity volumes were calculated using probe radii of 1.4, 1.3, and 1.2 Å. Model visualization, molecular distance, and angle calculations were performed using the COOT 45 and Swiss PDB‐viewer 42 software packages.
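The probe‐radius dependence reported in Table S2 can be rationalized with a much simpler point test than the one MSP performs. The sketch below, which is not the MSP algorithm, computes the radius of the largest sphere that can be centered at a given site (for example, a buried water position after waters are stripped) without overlapping protein atoms; MSP additionally determines whether a cavity is enclosed and computes its volume. It assumes NumPy and Bondi van der Waals radii chosen for illustration; `largest_probe_radius` is hypothetical.

```python
import numpy as np

# Bondi van der Waals radii (Å), used here as an illustrative assumption
VDW = {"C": 1.70, "N": 1.55, "O": 1.52, "S": 1.80}

def largest_probe_radius(site, atom_xyz, atom_elements):
    """Radius (Å) of the largest sphere centered at `site` that does not overlap
    any protein atom: the minimum over atoms of (center distance - vdW radius)."""
    xyz = np.asarray(atom_xyz, dtype=float)
    radii = np.array([VDW[e] for e in atom_elements])
    gaps = np.linalg.norm(xyz - np.asarray(site, dtype=float), axis=1) - radii
    return float(gaps.min())
```

A probe of radius r can occupy the site only if the returned value is at least r; three oxygen atoms at 2.8 Å from a site, for example, leave room for a 1.2 Å probe but not a 1.3 Å probe.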
CONFLICT OF INTEREST
M. B. is a cofounder and has equity ownership in Trefoil Therapeutics Inc.
AUTHOR CONTRIBUTIONS
Michael Blaber: Conceptualization; data curation; formal analysis; funding acquisition; investigation; methodology; resources; supervision; validation; visualization; writing‐original draft; writing‐review and editing.
Supporting information
Table S1. Crystal data for representative structures of β‐trefoil domains
Table S2. Conserved buried waters 1A, 1B and 1C in representative β‐trefoil family members, their Cα‐normalized B‐factors, and associated cavity volumes (Å3) with different probe radii
Table S3. H‐bond parameters of buried solvent 1A, 1B and 1C in representative families of β‐trefoil proteins
Table S4. Amino acid side chain composition at buried water NX, OY, and OZ+2 positions.
ACKNOWLEDGMENTS
The author thanks Dr Liam Longo for helpful discussions. This work was supported in part by a research support agreement from Trefoil Therapeutics Inc. Support from the FSU Department of Biomedical Sciences is acknowledged.
Blaber M. Conserved buried water molecules enable the β‐trefoil architecture. Protein Science. 2020;29:1794–1802. 10.1002/pro.3899
Funding information Trefoil Therapeutics Inc., Grant/Award Number: RF02251
REFERENCES
- 1. Carugo O. Structure and function of water molecules buried in the protein core. Curr Protein Pept Sci. 2015;16:259–265. [DOI] [PubMed] [Google Scholar]
- 2. Connolly ML. Atomic size packing defects in proteins. Intl J Pept Protein Res. 1986;28:360–363. [DOI] [PubMed] [Google Scholar]
- 3. Quillin ML, Wingfield PT, Matthews BW. Determination of solvent content in cavities in IL‐1beta using experimentally phased electron density. Proc Natl Acad Sci U S A. 2006;103:19749–19753.
- 4. Ernst JA, Clubb RT, Zhou H‐X, Gronenborn AM, Clore GM. Demonstration of a positionally disordered water within a protein hydrophobic cavity by NMR. Science. 1995;267:1813–1817.
- 5. Yu B, Blaber M, Gronenborn AM, Clore GM, Caspar DLD. Disordered water within a hydrophobic protein cavity visualized by x‐ray crystallography. Proc Natl Acad Sci U S A. 1999;96:103–108.
- 6. Williams MA, Goodfellow JM, Thornton JM. Buried waters and internal cavities in monomeric proteins. Protein Sci. 1994;3:1224–1235.
- 7. Buckle AM, Cramer P, Fersht AR. Structural and energetic responses to cavity‐creating mutations in hydrophobic cores: Observation of a buried water molecule and the hydrophilic nature of such hydrophobic cavities. Biochemistry. 1996;35:4298–4305.
- 8. Wolfenden R, Radzicka A. On the probability of finding a water molecule in a nonpolar cavity. Science. 1994;265:936–937.
- 9. Takano K, Funahashi J, Yamagata Y, Fujii S, Yutani K. Contribution of water molecules in the interior of a protein to the conformational stability. J Mol Biol. 1997;274:132–142.
- 10. Takano K, Yamagata Y, Yutani K. Buried water molecules contribute to the conformational stability of a protein. Protein Eng Des Sel. 2003;16:5–9.
- 11. Huggins DJ. Quantifying the entropy of binding for water molecules in protein cavities by computing correlations. Biophys J. 2015;108:928–936.
- 12. Yu H, Rick SW. Free energy, entropy, and enthalpy of a water molecule in various protein environments. J Phys Chem B. 2010;114:11552–11560.
- 13. Park S, Saven JG. Statistical and molecular dynamics studies of buried waters in globular proteins. Proteins. 2005;60:450–463.
- 14. Eriksson AE, Baase WA, Zhang XJ, et al. Response of a protein structure to cavity‐creating mutations and its relation to the hydrophobic effect. Science. 1992;255:178–183.
- 15. Jackson SE, Moracci M, elMasry N, Johnson CM, Fersht AR. Effect of cavity‐creating mutations in the hydrophobic core of chymotrypsin inhibitor 2. Biochemistry. 1993;32:11259–11269.
- 16. Lim WA, Farruggio DC, Sauer RT. Structural and energetic consequences of disruptive mutations in a protein core. Biochemistry. 1992;31:4324–4333.
- 17. Loris R, Langhorst U, de Vos S, et al. Conserved water molecules in a large family of microbial ribonucleases. Proteins. 1999;36:117–134.
- 18. Likić VA, Prendergast FG, Juranić N, Macura S. A “structural” water molecule in the family of fatty acid binding proteins. Protein Sci. 2000;9:497–504.
- 19. Teze D, Hendrickx J, Dion M, et al. Conserved water molecules in family 1 glycosidases: A DXMS and molecular dynamics study. Biochemistry. 2013;52:5900–5910.
- 20. Dey P, Bairagya HR, Roy A. Putative role of invariant water molecules in the X‐ray structures of family G fungal endoxylanases. J Biosci. 2018;43:339–349.
- 21. Andreeva A, Howorth D, Chothia C, Kulesha E, Murzin AG. SCOP2 prototype: A new approach to protein structure mining. Nucleic Acids Res. 2013;42:D310–D314.
- 22. Lee J, Blaber M. Experimental support for the evolution of symmetric protein architecture from a simple peptide motif. Proc Natl Acad Sci U S A. 2011;108:126–130.
- 23. Longo LM, Kumru OS, Middaugh CR, Blaber M. Evolution and design of protein structure by folding nucleus symmetric expansion. Structure. 2014;22:1377–1384.
- 24. Broom A, Ma SM, Xia K, et al. Designed protein reveals structural determinants of extreme kinetic stability. Proc Natl Acad Sci U S A. 2015;112:14605–14610.
- 25. Terada D, Voet ARD, Noguchi H, et al. Computational design of a symmetrical β‐trefoil lectin with cancer cell binding activity. Sci Rep. 2017;7:5943.
- 26. Blow DM, Janin J, Sweet RM. Mode of action of soybean trypsin inhibitor (Kunitz) as a model for specific protein‐protein interactions. Nature. 1974;249:54–57.
- 27. McLachlan AD. Three‐fold structural pattern in the soybean trypsin inhibitor (Kunitz). J Mol Biol. 1979;133:557–563.
- 28. Murzin AG, Lesk AM, Chothia C. β‐Trefoil fold. Patterns of structure and sequence in the Kunitz inhibitors interleukins‐1β and 1α and fibroblast growth factors. J Mol Biol. 1992;223:531–543.
- 29. Kirioka T, Panyavut A, Kikuchi T. Detection of folding sites of β‐trefoil fold proteins based on amino acid sequence analyses and structure‐based sequence alignment. J Proteom Bioinform. 2017;10:222–235.
- 30. Brych SR, Blaber SI, Logan TM, Blaber M. Structure and stability effects of mutations designed to increase the primary sequence symmetry within the core region of a β‐trefoil. Protein Sci. 2001;10:2587–2599.
- 31. Feng J, Li M, Huang Y, Xiao Y. Symmetric key structural residues in symmetric proteins with beta‐trefoil fold. PLoS One. 2010;5:e14138.
- 32. Kimura R, Aumpuchin P, Hamaue S, Shimomura T, Kikuchi T. Analyses of the folding sites of irregular β‐trefoil fold proteins through sequence‐based techniques and Gō‐model simulations. BMC Mol Cell Biol. 2020;21:1–17.
- 33. Tenorio CA, Longo LM, Parker JB, Lee J, Blaber M. Ab initio folding of a trefoil‐fold motif reveals structural similarity with a β‐propeller blade motif. Protein Sci. 2020;29:1172–1185.
- 34. Thanki N, Umrania Y, Thornton JM, Goodfellow JM. Analysis of protein main‐chain solvation as a function of secondary structure. J Mol Biol. 1991;221:669–691.
- 35. Blaber M, DiSalvo J, Thomas KA. X‐ray crystal structure of human acidic fibroblast growth factor. Biochemistry. 1996;35:2086–2094.
- 36. Carugo O. Statistical survey of the buried waters in the Protein Data Bank. Amino Acids. 2016;48:193–202.
- 37. Lee J, Blaber SI, Dubey VK, Blaber M. A polypeptide “building block” for the β‐trefoil fold identified by “top‐down symmetric deconstruction”. J Mol Biol. 2011;407:744–763.
- 38. Blaber SI, Culajay JF, Khurana A, Blaber M. Reversible thermal denaturation of human FGF‐1 induced by low concentrations of guanidine hydrochloride. Biophys J. 1999;77:470–477.
- 39. Longo LM, Gao Y, Tenorio CA, Wang G, Paravastu AK, Blaber M. Folding nucleus structure persists in thermally‐aggregated FGF‐1. Protein Sci. 2018;27:431–440.
- 40. Fraxedas J. Water at interfaces: A molecular approach. Boca Raton, FL: CRC Press, 2014.
- 41. Berman HM, Westbrook J, Feng Z, et al. The Protein Data Bank. Nucleic Acids Res. 2000;28:235–242.
- 42. Guex N, Peitsch MC. SWISS‐MODEL and the Swiss‐PdbViewer: An environment for comparative protein modeling. Electrophoresis. 1997;18:2714–2723.
- 43. Zhang X‐J, Matthews BW. EDPDB: A multi‐functional tool for protein structure analysis. J Appl Cryst. 1995;28:624–630.
- 44. Connolly ML. The molecular surface package. J Mol Graph. 1993;11:139–141.
- 45. Emsley P, Cowtan K. Coot: Model‐building tools for molecular graphics. Acta Crystallogr D Biol Crystallogr. 2004;60:2126–2132.
Supplementary Materials
Table S1. Crystal data for representative structures of β‐trefoil domains
Table S2. Conserved buried waters 1A, 1B and 1C in representative β‐trefoil family members, their Cα‐normalized B‐factors, and associated cavity volumes (Å³) with different probe radii
Table S3. H‐bond parameters of buried solvent 1A, 1B and 1C in representative families of β‐trefoil proteins
Table S4. Amino acid side chain composition at buried water XN, YO and ZO + 2 positions.
