Abstract
The size of a protein is an important factor for understanding the sequence–structure relationship, and it affects both the amino acid composition and the residue burial of proteins. However, it is usually measured as the number of amino acids, although these effects would result from the reduction of surface regions relative to the volume of core regions in larger proteins. In addition, although these two effects are dependent on each other, they have been studied separately. In this study, we investigated them by considering the surface-to-volume ratio (SVR), and observed the correlation between them. We found that the reduction of several hydrophilic residues is more strongly correlated with SVR than with protein size (the number of amino acids) and that SVR directly affects the amino acid composition. The difference as a descriptor between SVR and size is also supported by the observation that the secondary structural elements correlate completely differently with SVR and with size. Furthermore, for the four most hydrophilic residues, glutamine, arginine, glutamic acid, and lysine, balances between the decrease in composition and the increase in core burial were observed. We found that the burial of glutamine and arginine became accelerated at SVR = 0.3 Å−1 (approximately 132 residues) as the protein size increased, but that lysine has an upper limit of 0.9% for its occurrence in the core. The uniqueness of lysine was also elucidated by comparison with the burial environments of the four hydrophilic residues.
Keywords: protein structure, domain size, amino acid propensity, hydrophobicity, surface-to-volume ratio
At present, the numbers of both the total protein structures and the nonredundant representative structures available in the Protein Data Bank (PDB) are growing at an annual rate of >10% (Levitt 2007). Importantly, larger databases facilitate more detailed analyses of protein structure and function. As databases grow, it becomes possible to calculate meaningful statistics about the characteristics of individual amino acids under different structural conditions. This requires a large amount of structural information, because a sufficient number of structures for statistical analyses should remain after dividing the database. Moreover, larger databases provide more frequent chances to observe rare events, such as the complete burial of charged residues in protein core regions.
The size of a protein is an important factor in describing the behavior of amino acids, because the residues in small proteins and those in large ones are in different “average” environments, and because protein sizes range from several tens to several hundreds of residues. As Janin (1976) and Teller (1976) pointed out, if the shape of a protein were close to a sphere, then its surface area would be approximately proportional to the two-thirds power of its volume. In practice, the surface area of a protein reportedly follows a power law with an exponent of 0.73 with respect to its molecular weight (or, approximately, its volume) (Miller et al. 1987). This exponent is slightly larger than the 0.67 expected for a perfect sphere, but qualitatively the relative core size becomes larger and the relative surface shell becomes smaller with increasing protein size.
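The sphere argument can be checked numerically. The following is a minimal sketch for an idealized sphere only; the function names are illustrative, and the geometric surface is used rather than the solvent-accessible surface analyzed in this paper. It shows that the area scales as the two-thirds power of the volume, so that the surface-to-volume ratio falls as 3/r with increasing radius.

```python
import math

def sphere_area(radius):
    """Geometric surface area of a sphere (in Å^2 if radius is in Å)."""
    return 4.0 * math.pi * radius ** 2

def sphere_volume(radius):
    """Volume of a sphere (in Å^3 if radius is in Å)."""
    return 4.0 / 3.0 * math.pi * radius ** 3

def sphere_svr(radius):
    """Surface-to-volume ratio of a sphere; algebraically equal to 3 / radius."""
    return sphere_area(radius) / sphere_volume(radius)

def scaling_exponent(r1, r2):
    """Exponent a in area ∝ volume**a, estimated from two radii."""
    area_ratio = sphere_area(r2) / sphere_area(r1)
    volume_ratio = sphere_volume(r2) / sphere_volume(r1)
    return math.log(area_ratio) / math.log(volume_ratio)

# Illustrative radii (Å); larger spheres have a smaller surface-to-volume ratio.
for radius in (10.0, 15.0, 20.0):
    print(radius, round(sphere_svr(radius), 3))
print("exponent:", round(scaling_exponent(10.0, 20.0), 3))
```

Real domains deviate from this idealization, which is consistent with the larger exponent reported by Miller et al. (1987).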
Since hydrophilic residues usually occupy surface regions to interact with water molecules, the reduction of the surface area would affect the propensities of these residues. Two possible changes are expected: the decrease in occurrence and the burial in the protein core.
For the first possibility, the authors of several reports have examined the effect of protein size on amino acid occurrence. Some of them addressed the correlation between the protein size and the fraction of hydrophobic residues, but no increase was observed, except for small proteins with fewer than 170 residues, when the amino acids were classified as either hydrophobic or hydrophilic residues (Miao et al. 2004; Sandelin 2004; Shen et al. 2005). Others directly compared the frequency of each of the 20 amino acids among subsets of different sizes (White 1992; Bastolla and Demetrius 2005; Rykunov and Fiser 2007). Bastolla and Demetrius (2005) reported the correlations between the protein size and the frequency of residues and found that the frequency of hydrophobic residues remained constant with respect to the protein size but that the frequency of charged residues changed as the protein size increased, where lysine, arginine, and glutamic acid negatively correlated whereas aspartic acid positively correlated with the protein size. In short, the increase in the protein size decreases the frequency of charged residues or highly hydrophilic residues, except for aspartic acid.
For the second possibility, hydrophilic residues reportedly increased in the protein core as the protein size increased (Miller et al. 1987; Sandelin 2004). The partitioning of hydrophobic and hydrophilic residues into surface and core is usually used to evaluate the stability of protein structures, in the forms of statistical potentials. Therefore, statistical potentials have been shown to be sensitive to the protein size in the database, because of changes in the probabilities of hydrophilic residues being buried in the protein core (Thomas and Dill 1996).
As described, there are two alternative ways in which hydrophilic residues adapt to the reduction of the polar environment. These two effects are expected to correlate with each other, yet they have been discussed separately. In addition, the size of a protein is usually expressed as the number of amino acid residues, although the surface-to-volume ratio (SVR) directly evaluates the reduction of the polar environment (surface regions) relative to the hydrophobic regions (protein cores). Therefore, in this paper, we analyzed the dependence of these two effects on each other by classifying proteins according to the SVR. The SVR would lead to a better understanding of how proteins adjust themselves to changes in the environment.
Results and Discussion
Data set
In this study, we used the basic assumption that each domain of a protein comprises an independently folded unit and employed the definition of a structural domain that appears in the Structural Classification of Proteins (SCOP) database for the following analyses (Murzin et al. 1995). More precisely, the SCOP40 domain set derived from the SCOP 1.73 domain set was used, in which representative domains were selected after clustering all of the SCOP domains by sequence similarities, with the threshold of 40% identity (Chandonia et al. 2004). Among them, X-ray structures with resolutions better than or equal to 3.5 Å were selected, and membrane proteins were excluded. As a result, 7309 domains were chosen for the analyses.
It should be noted that our approach assumes that the domain surfaces are entirely solvent accessible. However, some domain surfaces may not be solvent accessible because of domain–domain interactions, which would affect our results. It is not easy to eliminate all of the interfaces on domains, but to estimate their effects, we constructed another data set including only the protein chains consisting of a single domain, in which 3015 chains were selected from the 7309 domains, according to the definition in SCOP. As shown in Supplemental Table 1, which corresponds to the main Table 1 with all domains, the results described in the main text are not seriously affected by selecting single domains. Therefore, we used all of the domains in the data set in this study.
Table 1.
Correlations between the residue occurrences and the reduction of surface area
Changes in amino acid occurrence
SVR was defined as the ratio between the accessible surface area and the volume of the domain, and it ranged from about 0.19 Å−1 to 0.45 Å−1. The distribution of SVR against protein size is shown in Supplemental Figure 1. The correlations between amino acid occurrence and SVR or size were evaluated using Pearson's correlation coefficients (PCCs) and partial correlations in which the effect of size or SVR was removed (Table 1). The distributions of all amino acids against SVR are also shown in Supplemental Figure 2. Since an increase in the domain size decreases the SVR, the negative of the SVR (−SVR) was used instead of the SVR itself, to ensure that the signs of the PCCs in the first and third columns represented the same trend. Thus, negative and positive correlations reflect systematic decreases and increases in amino acid occurrences as the proteins become larger, respectively. For our data set of 7309 domains, at a significance level of P < 0.001, PCCs with absolute values larger than 0.040 are considered significant and are indicated in boldface in Table 1.
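As a rough consistency check on this threshold, the sketch below converts a PCC into an approximate P value. It assumes a two-sided test and approximates the t distribution by a standard normal, which is reasonable for several thousand samples; neither assumption is stated in the text, and the function name is illustrative.

```python
import math
from statistics import NormalDist

def two_sided_p_for_r(r, n):
    """Approximate two-sided P value for a Pearson coefficient r from n samples.

    Uses t = |r| * sqrt(n - 2) / sqrt(1 - r^2) and treats t as a standard
    normal deviate (an assumption that is adequate only for large n).
    """
    t = abs(r) * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)
    return 2.0 * (1.0 - NormalDist().cdf(t))

N_DOMAINS = 7309  # size of the data set used in this study

p_at_threshold = two_sided_p_for_r(0.040, N_DOMAINS)
print(f"P for |r| = 0.040 with n = {N_DOMAINS}: {p_at_threshold:.1e}")
```

Under these assumptions, a coefficient of 0.040 falls below P = 0.001, in line with the threshold used for Table 1.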
As described in the introductory paragraph, hydrophobic residues are expected to have positive correlations, and hydrophilic residues to have negative ones. Among the hydrophilic residues, we observed significant negative correlations for lysine, glutamic acid, arginine, and glutamine, and positive correlations for aspartic acid and asparagine. This observation is fundamentally the same as that previously reported (asterisks in the third column of Table 1 indicate the residues reported by Bastolla and Demetrius [2005]), except that the small negative correlation of glutamine and the small positive correlation of asparagine were detected here. Glutamic acid and glutamine (Glx) were found to be less frequently buried in the core than their similar residues, aspartic acid and asparagine (Asx), respectively (Wang et al. 2004; Zhou and Zhou 2004), because in protein structures, Asx residues have a stronger tendency to hydrogen bond to the local main chain than their Glx residue counterparts (Wan and Milner-White 1999; Eswar and Ramakrishnan 2000). These observations support the idea that the residues with decreased occurrence were highly hydrophilic residues.
On the other hand, we observed significant increases of hydrophobic residues that were not reported previously. The reason these hydrophobic residues increase as proteins become larger is not fully understood. Interestingly, a hydrophobic residue, valine, has a negative correlation with size, which can be explained by changes in the composition of secondary structural elements, as will be described later.
The residue occurrences usually correlated more strongly with SVR than with protein size. To assess which of the two correlations was essential, we measured the partial correlation coefficients between occurrence and size, with the effect of SVR removed (the second column), and those between occurrence and SVR, with the effect of size removed (the fourth column). For the hydrophilic residues, the partial correlation between occurrence and size became insignificant when the effect of SVR was removed, whereas the partial correlation between occurrence and SVR with the effect of size removed remained significant. These results support the idea that the occurrences of hydrophilic residues are more directly dependent on the change in SVR than the change in size.
Similarly, for the hydrophobic residues, the occurrences of highly hydrophobic residues (tyrosine, tryptophan, and phenylalanine) and glycine were directly dependent on SVR. However, for some residues, the situation seemed to be more complicated. As shown in the second and fourth columns of Table 1, valine and isoleucine have significant partial correlations of opposite sign. Valine and isoleucine are known to prefer strand structures (Chou and Fasman 1974). Indeed, in our data set valine and isoleucine accounted for about 13.8% and 10.0% of the residues in strand structures, respectively, which were much higher percentages than in helix (6.1% for valine and 5.8% for isoleucine) and loop (4.6% for valine and 3.5% for isoleucine) structures. Therefore, we suspected that the correlations of secondary structure compositions with SVR might be vastly different from those with size, in spite of the high correlation between SVR and size (PCC = −0.74, Supplemental Fig. 1).
The PCCs and the partial correlation coefficients for secondary structure contents are shown in Table 2. Here, the SVR itself, rather than its negative, was used in the second column to measure the correlation of secondary structures with the increase of surface area. The helix content was significantly correlated with the protein size (correlation = 0.102). Surprisingly, however, the helix content does not change with respect to SVR (correlation = 0.003). Similarly, the correlations of the loop content with SVR and with size were different. These tendencies become clearer in the partial correlations: those in the third and fourth columns have opposite signs. For proteins of the same size, the α-helix and loop contents increased, and the strand content decreased, with SVR. These differences arise because SVR reflects the structural differences among secondary structures. For a given number of amino acids, helical proteins would, on average, have structures that deviate from a sphere, resulting in a large relative surface area. In our data set, the α proteins had significantly larger SVR values than β proteins or α-β proteins of a similar size (data not shown). Loop structures would increase the complexity of the surface, which would increase both the surface area and SVR.
Table 2.
Correlations between the composition of secondary structural elements and the change of protein size
Data set division according to SVR
For further analyses, the data set was divided into subsets according to SVR (Table 3). The second and third rows in Table 3 show the mean and standard deviation of the domain size in each SVR bin. One disadvantage of classifying the domains according to size is that it does not separate them clearly according to the fractional amount of the surface area, as indicated by the standard deviations, which are large relative to the differences between neighboring means. Another disadvantage is that classifying domains into evenly spaced size classes would not separate larger domains as well as smaller domains, because the difference between the mean sizes of neighboring SVR bins enlarges with increasing size or with decreasing SVR. Therefore, to assess the effect of the fractional amount of the surface area appropriately, it is necessary to classify domains according to SVR.
Table 3.
Division of the data set according to the SVR
The balance of the two size effects, the change in occurrence, and the propensity to be buried in the core for hydrophilic residues
The subsets in Table 3 with more than 200 domains, that is, the SVR bins from 0.22 Å−1 to 0.36 Å−1, were used for further analyses. Four hydrophilic residues, glutamine, arginine, glutamic acid, and lysine, were chosen, because their decreases in occurrence with SVR were significant. For these residues, a correlation between the two size effects, the decrease in occurrence and the increase in the propensity to be buried in the core, was observed with the change of SVR (Fig. 1). The corresponding figure for the other two hydrophilic residues, aspartic acid and asparagine, is shown in Supplemental Figure 3.
Figure 1.
The balance between composition change and residue burial. The horizontal and vertical axes show the fractional change of the occurrence from the baseline occurrence (at SVR = 0.3 Å−1) and the normalized occurrence in the domain core, respectively, for glutamine (A), arginine (B), glutamic acid (C), and lysine (D). The data points and the error bars show the estimates and the standard errors for each bin. The numbers near the data points give the corresponding SVR values for the SVR bins of 0.22 Å−1, 0.3 Å−1, and 0.36 Å−1. The broken line represents the regression line for the eight data points. The two solid lines represent the regression lines for the first half (SVR = 0.22–0.28 Å−1) and the latter half (SVR = 0.3–0.36 Å−1) of the data points.
The two effects were measured by the fractional change in occurrence and by the normalized occurrence in the core, and these variables are depicted in the scatter plot for each of the four residues (Fig. 1). Each data point and its error bars correspond to the estimate and the standard errors of these variables for a single SVR bin. In all four graphs, the data points are distributed from right to left, in descending order of their SVR values. The horizontal trend of the data points from right to left as the SVR decreased indicates the decrease in occurrence, and the vertical trend upward indicates the increase in the fraction the residue occupies in the core. The direction of the trend between two data points shows the balance between the magnitudes of the two effects induced by the change in SVR.
Since the vertical axes were normalized to remove the bias from the difference in the residue abundance (see Materials and Methods), the vertical positions of the plots for the same SVR bin can be compared among the residues. Among the four residues, glutamine is most frequently and lysine is least frequently observed in the core, for all of the SVR bins. On the other hand, there is a crossover of the normalized fractions in the core for glutamic acid and arginine between the SVR bins of 0.28 Å−1 and 0.26 Å−1, which correspond to the mean domain sizes of 176 and 238 residues, respectively (Fig. 1B,C). Arginine is more frequently found in the core than glutamic acid, when the relative surface area or SVR is small. This means that the “hydrophobicity” changes according to the protein size, when it is measured by the normalized occurrence in the core, as usual.
Linear regression (Fig. 1, broken line) was performed to approximate the balance between the two effects. The slopes of the regression lines were −0.063, −0.040, −0.013, and −0.011 for glutamine, arginine, glutamic acid, and lysine, respectively. This indicates that a 1% decrease in occurrence (a 20% fractional decrease from the 5% baseline) was accompanied by increases in the core fraction of 1.3%, 0.8%, 0.26%, and 0.21% for these residues, respectively. The shallow slopes of lysine and glutamic acid indicate that they decrease in occurrence rapidly with the reduction of surface area, because of the difficulty of burying these residues, whereas the steep slopes of arginine and glutamine mean that they increase their fractions of buried residues, rather than decrease in occurrence.
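The conversion from a regression slope to the quoted increase in core fraction can be reproduced as follows. This is a minimal sketch using the rounded slopes given in the text, so the last digit may differ slightly from the quoted values, which were presumably derived from unrounded slopes; the names are illustrative.

```python
BASELINE = 0.05           # normalized baseline occurrence (5%)
ABSOLUTE_DECREASE = 0.01  # a 1 percentage-point drop in occurrence

# Slopes of the single regression lines, as quoted in the text.
SLOPES = {
    "glutamine": -0.063,
    "arginine": -0.040,
    "glutamic acid": -0.013,
    "lysine": -0.011,
}

def core_increase_percent(slope):
    """Increase in normalized core fraction (in %) for a 1% drop in occurrence."""
    fractional_change = -ABSOLUTE_DECREASE / BASELINE  # 20% fractional decrease
    return slope * fractional_change * 100.0

for residue, slope in SLOPES.items():
    print(f"{residue}: {core_increase_percent(slope):.2f}%")
```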
Although the regression by a single line approximated the data points well, there seemed to be a change in the slopes around SVR = 0.3 Å−1. Therefore, two regression lines (Fig. 1, solid lines) were added, one for small SVR bins (SVR = 0.22–0.28 Å−1), and the other for large SVR bins (SVR = 0.3–0.36 Å−1). Three- to sevenfold changes in the slopes were observed for glutamine, arginine, and glutamic acid, but for lysine, the two regression lines were almost parallel. The slopes of the regression lines for small and large SVR bins indicated that as the occurrence decreased by 1%, the fraction of the residue in the core increased by 2.2% and 0.56% for glutamine, 1.0% and 0.14% for arginine, 0.33% and 0.11% for glutamic acid, and 0.085% and 0.086% for lysine, respectively. The improvements in the regression were measured by (d1 − d2)/d1, where d1 is the sum of squared residuals for the regression by one line and d2 is that for the regression by two lines, and were 0.86 for glutamine, 0.84 for arginine, 0.77 for glutamic acid, and 0.64 for lysine. The significance of the improvements when using two regression lines is not clear because of the small number of data points, but the strong improvements for glutamine and arginine may imply the existence of a turning point for the balance between the two effects around SVR = 0.3 Å−1 (about 132 residues). The increase in the buried residues seems to be accelerated at SVR <0.3 Å−1, which means that glutamine and arginine can be buried more easily when proteins become large. On the other hand, such a turning point is not clear for lysine. The fraction of lysine in the core peaks at about 0.9% (Fig. 1D), and thus may have an upper limit in the domain core, even if the relative core size increases.
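The improvement measure (d1 − d2)/d1 can be sketched as below. The code assumes ordinary least-squares fits and a split of the ordered data points into two consecutive groups; the data points are invented for illustration and are not taken from Figure 1.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def sum_squared_residuals(xs, ys):
    """Sum of squared residuals around the least-squares line."""
    slope, intercept = fit_line(xs, ys)
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))

def two_line_improvement(xs, ys, split):
    """(d1 - d2) / d1, with d1 from one line and d2 from two lines."""
    d1 = sum_squared_residuals(xs, ys)
    d2 = (sum_squared_residuals(xs[:split], ys[:split])
          + sum_squared_residuals(xs[split:], ys[split:]))
    return (d1 - d2) / d1

# Invented example: eight points whose slope changes after the fourth point.
example_x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
example_y = [0.0, 1.0, 2.0, 3.0, 7.0, 10.0, 13.0, 16.0]
print(two_line_improvement(example_x, example_y, split=4))
```

Because a single line is a special case of two lines, d2 never exceeds d1, and the measure lies between 0 and 1. With only four points per line, a high value is expected even for noisy data, which is why the significance of the improvement remains unclear.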
The increase in buried hydrophilic residues and their stability
These increased fractions of hydrophilic residues in the domain core could cause structural stress, where the stability of the local structure is not satisfied, because of the increase in domain size. Therefore, we examined whether the polar side chains of those hydrophilic residues formed satisfactory polar interactions in the domain core. The stability of the buried hydrophilic residues among proteins with different SVRs was estimated by the number of stabilizing factors, such as hydrogen bonds, amino-aromatic hydrogen bonds, and salt bridges (see Materials and Methods for details). For this analysis, stricter criteria for a buried residue were used, and those subsets with more than 50 buried residues for each of the four hydrophilic residues, the SVR bins from 0.22 Å−1 to 0.30 Å−1, were used. The results for the other two hydrophilic residues are provided as Supplemental Figure 4.
As shown in Figure 2, most of the buried hydrophilic side chains had multiple stabilizing factors. Despite their increasing numbers as the SVR decreased, the number of stabilizing factors per buried residue was independent of the SVR for glutamine, arginine, and glutamic acid (Fig. 2A–C). The increased numbers of buried hydrophilic residues in the core for these residues did not cause structural stress, since they were accommodated by polar interactions. Therefore, as the SVR decreased, the enlarged protein core can afford to provide more polar interactions and to bury increasing numbers of hydrophilic residues.
Figure 2.
Changes in the number of stabilizing factors per buried residue according to SVR. The number of stabilizing factors, hydrogen bonds, amino-aromatic hydrogen bonds, and salt bridges that lack hydrogen bonds, per buried residue of glutamine (A), arginine (B), glutamic acid (C), and lysine (D) are shown in stacked bar graphs. Asterisks indicate that the difference between the distributions of the two bins is statistically significant (P < 0.05 by χ2 test).
On the other hand, the number of stabilizing factors for buried lysine increased as the SVR decreased (Fig. 2D, P < 0.05 by χ2 test), although the significance was relatively low because of the small numbers of buried residues. As the capacity of the core to provide polar interactions increased, the fraction in the domain core did not increase much (Fig. 1D), but each buried lysine had more stabilizing factors. These changes for lysine are in contrast with those for the other three residues, where the fraction in the cores increased, but the number of stabilizing factors for each buried residue remained constant. This difference may result from the fact that lysine is the most difficult residue to bury.
Lysine was, by far, the least frequently buried residue among the 20 amino acids. The average number of stabilizing factors for buried lysine was 2.40(3), which was smaller than that for buried arginine, 3.52(3), or for glutamic acid, 2.43(2). The errors of the last digit are denoted by parentheses, for example, 2.40(3) = 2.40 ± 0.03. However, arginine and glutamic acid could frequently form multiple hydrogen bonds with another residue, which was quite rare for lysine. Therefore, the numbers of interacting residues for buried arginine and glutamic acid were reduced to 2.67(2) and 2.07(2), respectively. On the other hand, that for buried lysine was approximately the same as the number of stabilizing factors, 2.35(3). As the numbers of polar atoms in the side chains for arginine, glutamic acid, and lysine are 3, 2, and 1, the numbers of interacting residues per polar atom are 0.88, 1.04, and 2.35, respectively. Burying a lysine would be difficult, because it requires two other residues to hydrogen bond to the single nitrogen atom in its side chain.
Materials and Methods
Domain size and surface-to-volume ratio (SVR)
As explained in the introductory paragraph, we aimed to observe the behavior of hydrophilic residues when the relative surface area of a domain decreased. The accessible surface area and the volume of the protein domain were calculated as described by Eisenhaber and colleagues (Eisenhaber et al. 1995). The ratio between the accessible surface area and the volume of the domain was defined as SVR, which ranged from 0.19 Å−1 to 0.45 Å−1 in our data set. The surface of a domain in a multidomain chain was calculated in the absence of the other domains of the protein chain, because we assumed that the domain surfaces were accessible to water.
The data set was divided according to the SVR. Twelve bins were created: eleven bins of width 0.02 Å−1, centered at every 0.02 Å−1 from 0.2 Å−1 to 0.4 Å−1, and a last bin containing the domains with SVRs >0.41 Å−1. The number of domains and the average and standard deviation of the domain size for each bin are shown in Table 3.
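The binning scheme can be written compactly as below. This is a minimal sketch; the treatment of values exactly on a bin edge (half-open intervals) and the names used are assumptions, since the text does not specify them.

```python
import math

BIN_WIDTH = 0.02      # Å^-1
FIRST_CENTER = 0.20   # center of the lowest bin, Å^-1
N_REGULAR_BINS = 11   # centers 0.20, 0.22, ..., 0.40
LAST_BIN = ">0.41"    # label for the twelfth, open-ended bin

def svr_bin(svr):
    """Return the bin center (Å^-1) for an SVR value, or LAST_BIN.

    Assumption: each regular bin covers [center - 0.01, center + 0.01).
    """
    lower_edge = FIRST_CENTER - BIN_WIDTH / 2.0
    index = math.floor((svr - lower_edge) / BIN_WIDTH)
    if index < 0:
        raise ValueError("SVR lies below the lowest bin")
    if index >= N_REGULAR_BINS:
        return LAST_BIN
    return round(FIRST_CENTER + index * BIN_WIDTH, 2)

for value in (0.225, 0.30, 0.45):
    print(value, "->", svr_bin(value))
```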
Calculation of the correlations or partial correlations
The correlation between two variables X and Y is defined by the PCC,
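$$ r_{XY} = \frac{\langle XY \rangle - \langle X \rangle \langle Y \rangle}{\sqrt{\left(\langle X^{2} \rangle - \langle X \rangle^{2}\right)\left(\langle Y^{2} \rangle - \langle Y \rangle^{2}\right)}} $$
where 〈X〉 represents the average of X. To assess whether the correlation between X and Y is due to the third variable Z, the partial correlation coefficients were used. The partial correlation coefficient between X and Y when the effect of Z is removed is defined as
$$ r_{XY \cdot Z} = \frac{r_{XY} - r_{XZ}\, r_{YZ}}{\sqrt{\left(1 - r_{XZ}^{2}\right)\left(1 - r_{YZ}^{2}\right)}} $$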
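The two coefficients can be computed as in the following minimal sketch, which uses the standard textbook definitions of the PCC and of the first-order partial correlation. The function names and the toy data are illustrative and are not taken from the data set.

```python
import math

def mean(values):
    return sum(values) / len(values)

def pearson(x, y):
    """PCC written with averages: (<XY> - <X><Y>) / (sd_X * sd_Y)."""
    mean_x, mean_y = mean(x), mean(y)
    cov = mean([a * b for a, b in zip(x, y)]) - mean_x * mean_y
    var_x = mean([a * a for a in x]) - mean_x ** 2
    var_y = mean([b * b for b in y]) - mean_y ** 2
    return cov / math.sqrt(var_x * var_y)

def partial_from_r(r_xy, r_xz, r_yz):
    """Partial correlation of X and Y with the effect of Z removed."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1.0 - r_xz ** 2) * (1.0 - r_yz ** 2))

def partial_corr(x, y, z):
    return partial_from_r(pearson(x, y), pearson(x, z), pearson(y, z))

# Toy data: x and y both follow z closely, so most of their correlation
# is expected to be attributable to z.
z = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x = [1.0, 2.1, 2.9, 4.2, 4.8, 6.1]
y = [0.9, 2.2, 3.1, 3.8, 5.2, 5.9]
print("r_xy   =", round(pearson(x, y), 3))
print("r_xy.z =", round(partial_corr(x, y, z), 3))
```

In the analysis of Table 1, X would be the occurrence of a residue, and Y and Z would be size and −SVR in either order.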
Evaluation of the size effects
The changes in the propensities for hydrophilic residues were determined as follows. The bin of SVR = 0.3 Å−1 was defined as the reference state. The occurrence of residues at the reference state was considered as the baseline occurrence for each residue. The fractional change in occurrence from the baseline was measured as
$$ \frac{f_i^{\mathrm{SVR}} - f_i^{\mathrm{ref}}}{f_i^{\mathrm{ref}}} $$
Here $f_i^{\mathrm{SVR}}$ and $f_i^{\mathrm{ref}}$ refer to the frequency of residue i for each SVR bin and for the reference state, respectively.
The fractions of these hydrophilic residues in the protein core were then calculated as $N_{b,i}^{\mathrm{SVR}} / N_{b}^{\mathrm{SVR}}$, where $N_{b,i}^{\mathrm{SVR}}$ and $N_{b}^{\mathrm{SVR}}$ represent the number of residue i buried in the core and the number of residues in the core for each SVR bin, respectively. A residue was considered as being buried if its accessible surface area was <15% of the maximum value for its residue type (Chothia 1976), and the domain core was considered to consist of all of the buried residues. These fractions were then normalized with respect to the baseline frequency of residues by multiplying them by $0.05 / f_i^{\mathrm{ref}}$. Therefore, the normalized fraction in the domain core represents the occurrence of each residue in the core, when the baseline occurrence for each residue was normalized to 5%.
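The two quantities plotted in Figure 1 can be sketched as follows. The function names and the input numbers are hypothetical and serve only to illustrate the arithmetic.

```python
BASELINE = 0.05  # baseline occurrence to which every residue is normalized (5%)

def fractional_change(f_svr, f_ref):
    """Fractional change in occurrence of a residue relative to the reference bin."""
    return (f_svr - f_ref) / f_ref

def normalized_core_fraction(n_buried_i, n_buried_all, f_ref):
    """Fraction of core residues that are residue i, rescaled to a 5% baseline."""
    return (n_buried_i / n_buried_all) * (BASELINE / f_ref)

# Hypothetical residue: 4.0% occurrence in the reference bin (SVR = 0.3 Å^-1),
# 3.6% in another bin, and 10 of 1000 core residues in that bin.
print(fractional_change(0.036, 0.040))
print(normalized_core_fraction(10, 1000, 0.040))
```

The rescaling by 0.05 divided by the reference frequency is what allows the vertical positions in Figure 1 to be compared among residues with different abundances.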
Stabilizing factors for buried hydrophilic residues
To analyze the number of stabilizing factors for buried hydrophilic residues, stricter criteria for a buried residue were used, in that a residue was defined as being buried if both the accessible surface area (ASA) for the entire residue and that for the polar atoms in its side chain were <5% of the maximum value. Maximum ASA values for the polar atoms in side chains were empirically determined as 100 Å2 for glutamine and glutamic acid, 60 Å2 for lysine, and 132 Å2 for arginine. Stabilizing factors taken into account were (1) hydrogen bonds, (2) amino-aromatic hydrogen bonds (for lysine and arginine only), and (3) salt bridges. Hydrogen bonds and amino-aromatic hydrogen bonds were defined as described by McDonald and Thornton (1994). A salt bridge was defined for a pair of oppositely charged residues if at least one pair of the side chain polar atoms was within 4 Å, and the centers of the side chain polar atoms were within 4 Å (Kumar and Nussinov 1999). A salt bridge was counted as an additional stabilization factor only if the negative and positive side chains lacked hydrogen bonds between them. Since amino-aromatic hydrogen bonds or salt bridges lacking hydrogen bonds were rare, most of the stabilizing factors consisted of hydrogen bonds.
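The stricter burial criterion can be expressed as a simple predicate. In the sketch below, the maximum side-chain polar ASA values are those given above, whereas the maximum ASA of the whole residue is passed in by the caller because those values are not listed in the text; the function and variable names are illustrative.

```python
# Maximum ASA (Å^2) of the side-chain polar atoms, as given in the text.
MAX_POLAR_ASA = {"GLN": 100.0, "GLU": 100.0, "LYS": 60.0, "ARG": 132.0}
STRICT_CUTOFF = 0.05  # both relative accessibilities must be below 5%

def is_strictly_buried(residue_type, residue_asa, residue_max_asa, polar_asa):
    """Apply the strict burial criterion to one residue.

    residue_max_asa is the caller-supplied maximum ASA for the residue type.
    """
    whole_ratio = residue_asa / residue_max_asa
    polar_ratio = polar_asa / MAX_POLAR_ASA[residue_type]
    return whole_ratio < STRICT_CUTOFF and polar_ratio < STRICT_CUTOFF

# Hypothetical lysine with 5 Å^2 total ASA (maximum assumed to be 200 Å^2).
print(is_strictly_buried("LYS", 5.0, 200.0, 2.0))
print(is_strictly_buried("LYS", 5.0, 200.0, 4.0))
```

Requiring both conditions excludes residues whose backbone is buried while the polar end of the side chain still reaches the solvent.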
Electronic supplemental material
The Supplemental materials provide additional details on the following: (1) correlations between the residue occurrences and the reduction of surface area for single domain chains (Supplemental Table 1); (2) scatter plot between the number of amino acids and SVR (Supplemental Fig. 1); (3) amino acid occurrences as functions of SVR (Supplemental Fig. 2); (4) the balance between composition change and residue burial for asparagine and aspartic acid (Supplemental Fig. 3); and (5) changes in the number of stabilizing factors per buried residue according to SVR for asparagine and aspartic acid (Supplemental Fig. 4).
Acknowledgments
This work was partially supported by a Grant-in-Aid for Scientific Research on the Priority Area “Transportsome” from the Ministry of Education, Culture, Sports, Science, and Technology of Japan, and by a Grant-in-Aid from the Institute for Bioinformatics Research and Development, Japan Science and Technology Corporation (BIRD-JST) to K.K. Computation time was provided by the Super Computer System, Human Genome Center, Institute of Medical Science, The University of Tokyo.
Footnotes
Supplemental material: see www.proteinscience.org
Reprint requests to: Kengo Kinoshita, 4-6-1 Shirokanedai, Minato-ku, Tokyo 108-8639, Japan; e-mail: kino@ims.u-tokyo.ac.jp; fax: 81-3-5449-5133.
Article and publication are at http://www.proteinscience.org/cgi/doi/10.1110/ps.035592.108.
References
- Bastolla, U., Demetrius, L. Stability constraints and protein evolution: The role of chain length, composition and disulfide bonds. Protein Eng. Des. Sel. 2005;18:405–415. doi: 10.1093/protein/gzi045.
- Chandonia, J.M., Hon, G., Walker, N.S., Lo Conte, L., Koehl, P., Levitt, M., Brenner, S.E. The ASTRAL compendium in 2004. Nucleic Acids Res. 2004;32:D189–D192. doi: 10.1093/nar/gkh034.
- Chothia, C. The nature of the accessible and buried surfaces in proteins. J. Mol. Biol. 1976;105:1–12. doi: 10.1016/0022-2836(76)90191-1.
- Chou, P.Y., Fasman, G.D. Conformational parameters for amino acids in helical, β-sheet, and random-coil regions calculated from proteins. Biochemistry. 1974;13:211–222. doi: 10.1021/bi00699a001.
- Eisenhaber, F., Lijnzaad, P., Argos, P., Sander, C., Scharf, M. The double cubic lattice method: Efficient approaches to numerical integration of surface and volume and to dot surface contouring of molecular assemblies. J. Comput. Chem. 1995;16:273–284.
- Eswar, N., Ramakrishnan, C. Deterministic features of side-chain main-chain hydrogen bonds in globular protein structures. Protein Eng. 2000;13:227–238. doi: 10.1093/protein/13.4.227.
- Janin, J. Surface area of globular proteins. J. Mol. Biol. 1976;105:13–14. doi: 10.1016/0022-2836(76)90192-3.
- Kumar, S., Nussinov, R. Salt bridge stability in monomeric proteins. J. Mol. Biol. 1999;293:1241–1255. doi: 10.1006/jmbi.1999.3218.
- Levitt, M. Growth of novel protein structural data. Proc. Natl. Acad. Sci. 2007;104:3183–3188. doi: 10.1073/pnas.0611678104.
- McDonald, I.K., Thornton, J.M. Satisfying hydrogen bonding potential in proteins. J. Mol. Biol. 1994;238:777–793. doi: 10.1006/jmbi.1994.1334.
- Miao, J., Klein-Seetharaman, J., Meirovitch, H. The optimal fraction of hydrophobic residues required to ensure protein collapse. J. Mol. Biol. 2004;344:797–811. doi: 10.1016/j.jmb.2004.09.061.
- Miller, S., Janin, J., Lesk, A.M., Chothia, C. Interior and surface of monomeric proteins. J. Mol. Biol. 1987;196:641–656. doi: 10.1016/0022-2836(87)90038-6.
- Murzin, A.G., Brenner, S.E., Hubbard, T., Chothia, C. SCOP: A structural classification of proteins database for the investigation of sequences and structures. J. Mol. Biol. 1995;247:536–540. doi: 10.1006/jmbi.1995.0159.
- Rykunov, D., Fiser, A. Effects of amino acid composition, finite size of proteins, and sparse statistics on distance-dependent statistical pair potentials. Proteins. 2007;67:559–568. doi: 10.1002/prot.21279.
- Sandelin, E. On hydrophobicity and conformational specificity in proteins. Biophys. J. 2004;86:23–30. doi: 10.1016/S0006-3495(04)74080-1.
- Shen, M., Davis, F., Sali, A. The optimal size of a globular protein domain: A simple sphere-packing model. Chem. Phys. Lett. 2005;405:224–228.
- Teller, D.T. Accessible area, packing volumes, and interaction surfaces of globular proteins. Nature. 1976;260:729–731. doi: 10.1038/260729a0.
- Thomas, P.D., Dill, K.A. Statistical potentials extracted from protein structures: How accurate are they? J. Mol. Biol. 1996;257:457–469. doi: 10.1006/jmbi.1996.0175.
- Wan, W.Y., Milner-White, E.J. A natural grouping of motifs with an aspartate or asparagine residue forming two hydrogen bonds to residues ahead in sequence: Their occurrence at α-helical N termini and in other situations. J. Mol. Biol. 1999;286:1633–1649. doi: 10.1006/jmbi.1999.2552.
- Wang, J.Y., Ahmad, S., Gromiha, M.M., Sarai, A. Look-up tables for protein solvent accessibility prediction and nearest neighbor effect analysis. Biopolymers. 2004;75:209–216. doi: 10.1002/bip.20113.
- White, S.H. Amino acid preferences of small proteins. Implications for protein stability and evolution. J. Mol. Biol. 1992;227:991–995. doi: 10.1016/0022-2836(92)90515-l.
- Zhou, H., Zhou, Y. Quantifying the effect of burial of amino acid residues on protein stability. Proteins. 2004;54:315–322. doi: 10.1002/prot.10584.