Skip to main content
Biophysical Journal logoLink to Biophysical Journal
. 2012 Jun 6;102(11):2605–2614. doi: 10.1016/j.bpj.2012.04.029

Self-Complementarity within Proteins: Bridging the Gap between Binding and Folding

Sankar Basu , Dhananjay Bhattacharyya , Rahul Banerjee †,
PMCID: PMC3368132  PMID: 22713576

Abstract

Complementarity, in terms of both shape and electrostatic potential, has been quantitatively estimated at protein-protein interfaces and used extensively to predict the specific geometry of association between interacting proteins. In this work, we attempted to place both binding and folding on a common conceptual platform based on complementarity. To that end, we estimated (for the first time to our knowledge) electrostatic complementarity (Em) for residues buried within proteins. Em measures the correlation of surface electrostatic potential at protein interiors. The results show fairly uniform and significant values for all amino acids. Interestingly, hydrophobic side chains also attain appreciable complementarity primarily due to the trajectory of the main chain. Previous work from our laboratory characterized the surface (or shape) complementarity (Sm) of interior residues, and both of these measures have now been combined to derive two scoring functions to identify the native fold amid a set of decoys. These scoring functions are somewhat similar to functions that discriminate among multiple solutions in a protein-protein docking exercise. The performances of both of these functions on state-of-the-art databases were comparable if not better than most currently available scoring functions. Thus, analogously to interfacial residues of protein chains associated (docked) with specific geometry, amino acids found in the native interior have to satisfy fairly stringent constraints in terms of both Sm and Em. The functions were also found to be useful for correctly identifying the same fold for two sequences with low sequence identity. Finally, inspired by the Ramachandran plot, we developed a plot of Sm versus Em (referred to as the complementarity plot) that identifies residues with suboptimal packing and electrostatics which appear to be correlated to coordinate errors.

Introduction

All forms of biomolecular recognition are said to involve interaction between complementary molecular surfaces. This specific match between two interacting surfaces is primarily supposed to have a dual aspect: 1) surface (or shape) complementarity (1) arising out of the steric fit of closely packed interface atoms in van der Waals contact; and 2), electrostatic complementarity (2) mediated by long-range electric fields due to charged or partially charged atoms. For small-molecule ligands or cofactors binding to proteins, the above point of view appears to be only partially true. Not only can one ligand adopt a wide range of conformations upon binding to different proteins, the binding pocket also exhibits more variability in shape and physicochemical characteristics than can be accounted for by the multiple conformations adopted by the ligand (3–5). For protein-protein interfaces, however, the concept appears to have greater plausibility and wider appeal. Due to the relatively larger size of protein-protein interfaces (∼1600 Å2 on average) (6), the surfaces have to be carefully tailored so that extended areas buried upon association can move into close contact. A variety of shape correlation and electrostatic complementarity measures incorporated into docking algorithms have been shown to be effective in predicting the interfaces between interacting proteins (7,8). Electrostatic complementarity based on optimized charge distribution has also been used to match two halves of the same molecule (myoglobin) from a repertoire of homologous structures (9). On the other hand, surface complementarity has found application in determining native side-chain torsions within proteins (10,11) and has also served to rationalize the variability in the quaternary arrangements of legume lectins (12). Lawrence and Colman (1) and McCoy et al. (2) formulated and estimated shape correlation (Sc) and electrostatic complementarity (EC) measures for a wide range of proteins in quaternary association, protein-inhibitor, and antigen-antibody complexes. It thus appears reasonable that threshold values of geometric and electrostatic complementarities will have to be satisfied for the stereospecific association between two polypeptide chains. Within proteins, surface complementarity (Sm) has been used to enumerate specific modes of packing between amino acid side chains (13) and, somewhat analogously to protein interfaces, all residues upon burial achieve uniformly high measures of surface fit (14).

Although the notion of complementarity lends itself naturally to the characterization of interprotein association, it has been suggested that both binding and folding should be approached from a common conceptual platform (15,16). The native conformation adopted by the polypeptide chain leads to the stereospecific packing of its buried side chains and optimal electrostatic interactions due to the strategic three-dimensional placement of charges. Thus, folding can possibly be described as the self-recognition of the polypeptide chain as it collapses onto itself. However, one inherent problem in equating binding with folding lies in the different characteristics of protein interiors compared with interfaces. Barring dimers, interfaces resemble protein surfaces rather than interiors, both in their composition and in the spatial distribution of amino acid residues (17). Unlike hydrophobic clusters found within proteins, nonpolar residues are found in isolation at protein-protein interfaces, surrounded by polar or charged amino acids. However, despite these differences, the fact remains that both interfacial (1) and interior atoms (13,14) have to satisfy fairly stringent packing requirements, and, at least for the interfaces, significant values of electrostatic complementarity have been found (2,8). To explore the similarities or equivalence between binding and folding (in terms of complementarity), we first estimated the electrostatic complementarity (Em) of residues buried within proteins from a representative database of crystal structures. Second, in similarity to protein-protein docking (7), we used scoring functions based on Sm and Em for protein fold recognition, validated in state-of-the-art databases. Lastly, to detect local regions of suboptimal packing and/or electrostatics in a native fold, we developed a plot based on Sm and Em (in analogy to the famous Ramachandran plot (18)) to identify such residues, which appear to be correlated to coordinate errors.

Materials and Methods

Two representative databases of high-resolution protein crystal structures (resolution ≤ 2.0 Å, R-factor ≤ 20%, sequence identity ≤ 30%) were used in the calculations. The first database (DB1), consisting of 719 polypeptide chains, is described in detail elsewhere (13). This database was used in the computation of all relevant statistics involving Sm. We assembled a subset of this larger database consisting of 400 polypeptide chains (DB2) by removing proteins with deeply embedded prosthetic groups (e.g., cytochromes) and any missing atoms (data set S1 in the Supporting Material). DB2 (composed of 65 all α, 70 all β, 106 α|β, 124 α+β, and 35 multidomain proteins) was used in the calculation of Em of amino acid residues and their related statistics. Sixty-two of these proteins were found to contain metal ions as an integral part of their structure. Hydrogen atoms were geometrically fixed to all structures by means of the program REDUCE (19).

Before calculating the electrostatic potential, we assigned partial charges and atomic radii for all protein atoms from the AMBER94 all-atom molecular-mechanics force field (20). Asp, Glu, Lys, Arg, doubly-protonated histidine (Hip), and both the carboxy and amino terminal groups were considered to be ionized. Crystallographic water molecules and surface-bound ligands were excluded from the calculations and thus modeled as bulk solvent. Ionic radii were assigned to the bound metal ions according to their charges (21).

The van der Waals surfaces of the polypeptide chains were sampled at 10 dots/Å2. The details of the surface generation were discussed in a previous report (14). We estimated the exposure of individual atoms to solvent by rolling a probe sphere of radius 1.4 Å over the protein atoms (22), and estimated the burial (Bur) of individual residues by the ratio of solvent-accessible surface areas of the amino acid X in the polypeptide chain to that of an identical residue located in a Gly-X-Gly peptide fragment with a fully extended conformation.

The finite-difference Poisson-Boltzmann method as implemented in Delphi (version 4) (23,24) was used to compute the electrostatic potential of the molecular surface along the polypeptide chain. The protein interior was considered to be a low dielectric medium (dielectric constant of 2) and the surrounding solvent was considered a high dielectric medium (dielectric constant of 80). Ionic strength was set to zero because adoption of physiological strength has been found to have little effect on the final electrostatic solution (25,26), and calculations were performed at 298 K. The dielectric boundary and the partial charges were mapped onto a cubic grid either 151× 151 × 151 or 201 × 201 × 201 grid points/side in size (the latter for proteins that exhibited pronounced asymmetry in their physical dimensions). The percentage grid fill was set to 80% with a scale of 1.2 grid points/Å. Boundary potentials were approximated by the Debye-Hückel potential of the dipole equivalent to the molecular charge distribution. A probe radius of 1.4 Å was used to delineate the dielectric boundary. The linearized Poisson-Boltzmann equation (LPBE) was then solved iteratively until convergence. The number of cycles to convergence was automatically determined by the program (with the convergence threshold based on the maximum change in potential set to 0.0001 kT/e), and was monitored by examining a plot of convergence in the output log file.

Delphi requires a set of surface points on which the electrostatic potentials are to be computed along with a set of atoms that contribute to the potential. After generating the van der Waals surface of the entire polypeptide chain, we identified the dot surface points of the individual amino acids (targets) and fed them to the program along with the selected set of (charged) atoms. The electrostatic potential for each residue surface was then calculated twice: first, due to the atoms of the particular target residue, and second, from the rest of the protein excluding the selected amino acid. In either case, atoms that did not contribute to the potential (dummy atoms) were only assigned their radii with zero charge, to maintain the scaling and orientation of the molecule on the grid. Thus, each dot surface point of the (selected) residue was tagged with two values of electrostatic potential. Adapted from the function EC originally proposed by McCoy et al. (2) (for protein-protein interfaces), the Em of an amino acid residue (within protein) was then defined as the negative of the correlation coefficient (Pearson's) between these two sets of potential values:

Em=(i=1N(φ(i)φ¯)(φ(i)φ¯)(i=1N(φ(i)φ¯)2i=1N(φ(i)φ¯)2)12), (1)

where for a given residue consisting of a total of N dot surface points, φ(i) is the potential on its ith point realized due to its own atoms and φ(i), due to the rest of the protein atoms, and φ¯ and φ¯ are the mean potentials of φ(i), i = 1…N and φ(i), i = 1…N respectively.

After calculating the electrostatic potentials, we divided the values corresponding to N dot surface points into two distinct sets based on whether the dot point was obtained from main-chain or side-chain atoms of the target residue, and calculated Em separately for each set. Thus for a given residue, Em was estimated for the entire residue (Emall, as described above), the side-chain surface points (Emsc), and the main-chain surface points (Emmc).

The calculation of Sm has been discussed extensively in previous studies (13,14). Briefly, Sm can be calculated between the side-chain surface points of a target residue and all other dot points in its immediate neighborhood (within a distance of 3.5 Å), contributed by the rest of the protein. Any dot surface point (which is essentially an area element) is characterized by its coordinates (x, y, z) and the direction cosines of its normal (dl, dm, dn). Sm is then defined (following Lawrence and Colman (1)) to be the median of the distribution {S(a,b)}, S(a,b), calculated by the following equation:

S(a,b)=na·nb.exp(w.dab2), (2)

where na and nb are two unit normal vectors corresponding to the dot surface point a (located on the side-chain surface of the target residue) and b (the dot point nearest to a, within 3.5 Å), respectively, with dab the distance between them and w, a scaling constant set to 0.5. After identifying nearest neighbors, we could also partition the side-chain surface points of the specified residue into two sets by virtue of their neighbors coming from either side-chain or main-chain atoms, and calculate Sm separately for each set. Thus, every target residue (side chain) has three measures of Sm based on the choice of its nearest neighbors (surface points), whether obtained from side-chain (Smsc), main-chain (Smmc) atoms alone, or all atoms (Small). Because glycines lack any nonhydrogen side-chain atom, they were excluded as targets from all calculations.

Two scoring functions (based on the amino acid identity (Res), burial (Bur), Emsc, and Smsc) were formulated to identify the native fold amid a set of decoys. Only residues that were completely (0.00 ≤ Bur ≤ 0.05) or partially (0.05 < Bur ≤ 0.3) buried were considered. Initially, the average and standard deviation (SD) for both Smsc (Smsc¯, σS) and Emsc (Emsc¯, σE) were estimated (over their respective databases, DB1 and DB2) separately for different amino acid residues (Ala, Val, etc.) distributed into three bins based on their burial (bin 1: 0.0 ≤ Bur ≤ 0.05; bin 2: 0.05 < Bur ≤ 0.15; bin 3: 0.15 < Bur ≤ 0.30). The center (mode:E0sc) and the halfwidth at half-maximum height (γE) were also computed for individual residues (in different burial bins) from the normalized frequency distributions in Emsc by numerical curve fitting. For the first measure, we computed Smsc(i), Emsc(i) for all buried residues (i = 1….N; Bur ≤ 0.30) of a given polypeptide chain, and calculated the following expression:

CSgl=1Ni=1,Bur0.3N(12πσSexp(12(Smsc(i)Smsc¯σS)2))(1π(γE(Emsc(i)E0sc)2+γE2))=1Ni=1,Bur0.3NGaussian(Smsc(i))·Lorentzian(Emsc(i)) (3)

The second scoring function was based on the conditional probability distributions of Emsc and Smsc for each residue type within a particular burial bin. As in the previous case, three burial bins were considered. Distributions of Emsc and Smsc for a given residue type in a particular burial bin were then divided into intervals of 0.05. The conditional probability distributions of Emsc and Smsc were then defined as

P(Cmsc(i)|{Res(i),Bur(i)})=N(Cmsc(i)Res(i)Bur(i))N(Res(i)Bur(i)). (4)

for the ith residue along the polypeptide chain, where Cmsc stands for either Emsc or Smsc, and N denotes the count of residues in the specified sets.

Thus, for example,

P(Smsc:0.450.5|{Valine,Bur:0.00.05})=N(Valine(0.0Bur0.05)(0.45<Smsc0.5))N(Valine(0.0Bur0.05)).

For any given polypeptide chain, the products of the conditional probabilities in Smsc and Emsc for each (ith) residue (i = 1…N, Bur ≤ 0.30) were then summed and divided by the total number of buried residues (N), giving rise to the following measure:

CScp=1Ni=1,Bur0.3N(P(Smsc(i)|{Res(i),Bur(i)}))(P(Emsc(i)|{Res(i),Bur(i)})) (5)

Z-scores corresponding to the native structure (along with its rank) for the complementarity scores (CSgl, CScp) were calculated in a multiple decoy set by the following equation:

ZCS=CSnativeCS¯σ, (6)

where CSnative is the score obtained for the parameter CSgl or CScp from the native structure, and CS¯ and σ are the mean and SD for the scores in the decoy set. Average ZCS (〈Z〉) was calculated for the successful hits (native at rank 1) in a decoy set.

Results

Em within proteins

The electrostatic potential within proteins was computed by means of the LPBE as implemented in Delphi (23,24), and estimation of Em was adapted (see Eq. 1) from a method proposed by McCoy et al. (2) for protein-protein interfaces. Nonlinear PBE at nonzero ionic strengths is preferred for highly charged molecules such as DNA (24), microtubules, and ribosomal subunits (27). Globular proteins, however, have appreciably low net charge densities, and LPBE has been used extensively to compute electrostatic potentials at protein-protein interfaces and solvent-exposed residue surfaces (25,28,29). Electrostatic potentials estimated by nonlinear PBE (in a trial calculation involving 150 polypeptide chains) under physiological counterionic strength (0.15 M NaCl, ion exclusion radii: 2.0 Å) were virtually identical to those calculated by LBPE (Fig. S1).

Em was estimated for all residues at the protein interior (burial ≤ 0.30; see Materials and Methods) from a database of 400 polypeptide chains (DB2). To test the sensitivity of Em with respect to the internal dielectric of the continuum (εp), we repeated all calculations three times, setting εp to 2, 4, and 10, respectively. The root mean-square deviations (RMSDs) among these three sets of Em values for different residues were negligible, indicating the invariance of Em at least in the commonly used ranges of εp (Fig. S2). Identical calculations performed with higher internal dielectric (εp = 20 and 40) also preserved the overall trends in the results (Table S1). It should be noted that Em estimates the correlation between potentials generated by the two sets of atoms (over a collection of surface points) regardless of their magnitude.

Before the statistical analysis was performed, all completely/partially buried (target) residues were distributed in three burial bins (burial: 0.0–0.05, 0.05–0.15, 0.15–0.30; see Materials and Methods). Enumeration of the average Em values in each burial bin for different amino acids (targets), calculated over the entire residue surface (Emall¯), revealed a fairly uniform distribution among the different residues, within the range of ∼0.5–0.7 (Table 1). The high positive values of Emall throughout the protein interior suggest that individual residues buried within proteins have anticorrelated (complementary) surface electrostatic potentials (Fig. S3) similar to those of protein-protein interfaces (2). In fact, Emall¯ values for hydrophobic residues were comparable to those for polar and charged amino acids. From these observations, we thought that the main-chain surface points could be contributing predominantly to Emall, especially for hydrophobic residues. To test this hypothesis, we segregated the surface points by virtue of their residence on main-chain/side-chain atoms, and calculated Em separately for each set, i.e., Emsc and Emmc for side- and main-chain surface points, respectively. As expected, Emmc¯ values were again uniform for all the amino acids and comparable in magnitude to Emall¯. Interestingly, even for hydrophobic residues, Emsc¯ was also found to exhibit fairly significant values. However, differences were observed in Emsc¯ between hydrophobic (Val: 0.48, Leu: 0.46, Ile: 0.48, Phe: 0.41) and charged/polar (Asn: 0.67, Gln: 0.64, Asp: 0.61, Glu: 0.63, Lys: 0.62, Arg: 0.56) residues, albeit within 1 SD (∼0.1–0.25; Table 1). Somewhat reduced values were obtained for sulfur-containing amino acids (Cys: 0.34, Met: 0.32) and proline (0.34). A similar pattern was observed in all three burial bins, indicating that within the protein interior, the distribution in Em appears to be independent of the exposure of a residue to solvent.

Table 1.

Native electrostatic complementarities of completely buried residues

Residue Emall¯ Emsc¯ Emmc¯
ALA 0.68 (0.17) 0.48 (0.25) 0.72 (0.17)
VAL 0.62 (0.16) 0.48 (0.18) 0.72 (0.16)
LEU 0.61 (0.16) 0.46 (0.19) 0.73 (0.16)
ILE 0.61 (0.16) 0.48 (0.17) 0.72 (0.16)
PHE 0.56 (0.15) 0.41 (0.16) 0.70 (0.17)
TYR 0.58 (0.15) 0.50 (0.19) 0.69 (0.18)
TRP 0.57 (0.15) 0.50 (0.17) 0.68 (0.20)
SER 0.64 (0.18) 0.59 (0.27) 0.67 (0.18)
THR 0.62 (0.16) 0.55 (0.23) 0.68 (0.18)
CYS 0.51 (0.18) 0.34 (0.22) 0.66 (0.21)
MET 0.45 (0.13) 0.32 (0.16) 0.72 (0.16)
ASP 0.63 (0.22) 0.61 (0.26) 0.62 (0.17)
GLU 0.64 (0.25) 0.63 (0.28) 0.66 (0.19)
ASN 0.68 (0.17) 0.67 (0.22) 0.68 (0.17)
GLN 0.66 (0.17) 0.64 (0.21) 0.70 (0.18)
LYS 0.72 (0.17) 0.62 (0.22) 0.75 (0.15)
ARG 0.68 (0.16) 0.56 (0.19) 0.75 (0.15)
PRO 0.53 (0.20) 0.34 (0.23) 0.65 (0.19)
HIS 0.54 (0.26) 0.50 (0.28) 0.65 (0.21)

Average Em values and their SDs (in parentheses) for different residues in the first burial bin (0.0 ≤ Bur ≤ 0.05) were calculated from all atoms on the entire residue surface (Emall¯), the side-chain surface (Emsc¯), and the main-chain surface (Emmc¯).

To assess the relative contribution of side- or main-chain atoms to Em, we performed four more sets of calculations based on the choice of residue surface (target: side chain/main chain) on which to calculate the electrostatic potentials and the atoms (side chain/main chain) contributing to the potential:

  • Set 1: Main-chain surface, main-chain atoms.

  • Set 2: Side-chain surface, main-chain atoms.

  • Set 3: Side-chain surface, side-chain atoms.

  • Set 4: Side-chain surface, side-chain atoms of the target, and all atoms from the rest of the polypeptide chain.

Except for the choice of surfaces and atoms, the method used to calculate Em was identical to that outlined above. As expected, set 1 gave a uniform distribution in Em¯ with elevated values for all residues (Table S2). For set 2, fairly significant values of Em¯ were still retained for hydrophobic residues (Ala: 0.43, Val: 0.44, Leu: 0.42, Ile: 0.43, Phe: 0.36, Met: 0.38), which is a reflection of the long-range electric fields generated by the main-chain atoms overwhelmingly contributing to the complementarity attained on hydrophobic side-chain surfaces. This was confirmed by the comparison of Em¯ in set 2 and Emsc¯: both sets of values were almost identical for hydrophobic residues (Table 1 and Table S2), whereas polar/charged residues exhibited a marked reduction in set 2 compared with Emsc¯, because the contribution of side-chain atoms carrying high partial charges was disregarded in set 2. For both set 3 and set 4, Em¯ for hydrophobic residues were practically negligible (Table S2); however, polar/charged residues gave consistently high values for set 4 but were distinctly reduced for set 3. The substantial increase in Em¯ for set 4 relative to set 3 (except for alanine) was indicative of the considerable role played by the main-chain atoms (contributed by the rest of the polypeptide chain) in the overall determination of Em¯. This holds true even for hydrophilic amino acids, where the main-chain atoms contribute appreciably to the neutralization of the electric fields generated by polar/charged side-chain atoms.

It is thus evident that the long-range electric fields generated by main-chain atoms cast their shadow over the side-chain surface in such a manner that all residues, regardless of their hydrophobicity and burial, attain a fairly uniform level of overall complementarity. Polar/charged (side-chain) atoms of hydrophilic residues additionally contribute to the elevated complementarity attained on their side-chain surfaces.

Application of Sm and Em in fold recognition and structure validation

The second part of the work has to do with the application of Sm and Em in the area of protein fold recognition and structure validation. Two such scoring functions were designed based on the combined use of the complementarity measures obtained for different residues distributed in the aforementioned burial bins.

Plots of the normalized frequency distributions in Smsc, Emsc for the individual residues in each burial bin (i.e., P(Smsc|{Res,Bur}), P(Emsc|{Res,Bur})) gave characteristic curves (symmetric for Smsc and negatively skewed for Emsc), which fitted best to Gaussian and Lorentzian functions for Smsc and Emsc, respectively (goodness of fit, R2 ≥ 0.85 for all cases; Fig. 1). From these observations, the first scoring function (CSgl) was designed based on Gaussian for Smsc and Lorentzian for Emsc (see Eq. 3.). The second function (CScp) directly multiplies the conditional probabilities P(Smsc|{Res,Bur}) and P(Emsc|{Res,Bur}) for each residue along the polypeptide chain to obtain the joint probability of their co-occurrence. These individual probabilities were averaged over all buried residues (Bur ≤ 0.3) in the polypeptide chain to give the final score (see Eq. 5.). The conditional probabilities were estimated previously (see Materials and Methods).

Figure 1.

Figure 1

Normalized frequency distributions of Smsc and Emsc give characteristic curves that fit best to Gaussian and Lorentzian functions, respectively. These normalized frequencies for a given burial bin (Bur) and residue type (Res) can also be interpreted as conditional probabilities P(Smsc|{Res,Bur}) and P(Emsc|{Res,Bur}). (A) Distribution in Smsc for leucine (0.0 ≤ Bur ≤ 0.05) fitted to a Gaussian function (R2 = 0.997). (B) Distribution in Emsc for asparagine (same burial) fitted to a Lorentzian function (R2 = 0.948). Similar curves were obtained for all completely/partially buried amino acids for all three burial bins.

It is to be noted that both CSgl and CScp are averages of individual scores given by all the completely/partially buried residues in a protein and thus are independent of the polypeptide chain length. Thus, for any given native structure, one would expect their values to cluster around optimal numbers characteristic of native folds. The distributions of CSgl and CScp computed for the native folds (in DB2) had a very good linear correlation between each other (R2 = 0.94; Fig. S4) and gave mean values of 3.7 (± 0.437) and 0.015 (± 0.0017), respectively. Thus for the native folds, these functions exhibit a reduced scatter about the mean, whereas for decoys, reduced scores for both functions are to be expected. The decoy sets used to benchmark and validate the scoring functions included both single and multiple decoys, with Z-scores calculated for the latter (see Eq. 6.). Because both of the knowledge-based scoring functions were parameterized on crystal structures alone, NMR structures were excluded in their validation.

Identification of the native crystal structure from decoys

One of the single decoy sets tested, Misfold (30), consists of 26 pairs of structures. In each pair, the native sequence is threaded onto an unrelated fold to generate the decoy. Twenty-five pairs were considered in the calculation (with the exception of 1CBH, which is an NMR structure). The Pdberr decoy set (31) consists of three correctly solved x-ray crystal structures along with their erroneous decoy counterparts, whereas sgpa (32) contains the experimental structure of Streptomyces griseus Protease A (2SGA) and its two corresponding decoys, generated by molecular-dynamics simulations. For the three data sets, both functions successfully identified the native structure from the corresponding decoys for all cases (Table S3). A comparison with other knowledge-based scoring functions (Table S4) shows that the performance of the complementarity scores in single decoy sets is as efficient as or better than the other functions.

The four-state reduced decoy set (33) consists of seven sequences (chain length ranging from 54–75 residues), each with nearly 600–700 decoys that include structures with RMSD (Cα atoms) ranging from 0.8 to 9.4 Å from the native. Out of the seven sequences, six native structures were correctly identified (rank 1) by CSgl and CScp with significant Z-scores (Table S5 A). In the case of 4RXN (all-β class), the native structure was found to be at ranks 10 and 15, respectively, for CSgl and CScp. Further investigation revealed that 4RXN has negligible side-chain packing between its secondary structural elements. The decoy set, Fisa (34), contains four small (43–76 residues) all-α proteins, with 500 decoys for each set. Major failures were encountered for this decoy set, where both CSgl and CScp were successful in detecting the native at the top rank in two out of the four proteins (Table S5 B). 1HDD-C was detected at ranks 4 (CSgl) and 5 (CScp); however, for 1FC2, both of the functions failed entirely, leading to insignificant or negative Z-scores. This was due to minimal packing between the helices for these low-resolution structures (2.8 Å). It is notable (Table S6) that for 1HDD-C, 1FC2, and 4RXN, failure is quite common even for the other functions.

Hg_structal is a decoy set composed of 29 globins (35). Each globin was built by comparative modeling using 29 other globins as templates, with Cα RMSDs ranging from 1.96 to 8.57 Å. Thus, for each native globin chain there are 29 decoys. In 23 out of 29 globins, both CSgl and CScp were able to correctly detect the native at the top rank (〈Z〉: 3.23 and 3.24, respectively; Table S7 A). For similar decoy sets, ig_structal CSgl and CScp were successful in 48 and 50 cases (〈Z〉: 3.89, 3.91) out of 61 immunoglobulins, whereas for ig_structal_hires (subset of 20 high-resolution structures), 100% success was achieved for both (Table S7, B and C).

The ROSETTA all-atom decoy sets are built for small, single-domain proteins by the fragment insertion-simulated annealing strategy. The latest ROSETTA decoy set (36) contains >75,000 decoys for 41 proteins (25 of which are x-ray structures, and with the number of decoys in each set ranging from 1610 to 1934), sampling a wide variety of topological folds and polypeptide chain lengths ranging from 35 to 85 amino acids. CSgl, CScp were able to rank the native in 23 and 24 instances, respectively (out of 25). The high average Z-scores (7.24 and 6.98) also demonstrate the discriminatory ability of both scoring functions (Table S8). The only major failure was encountered for 1CC5 (detected at ranks 36 and 58), which is a cytochrome C molecule with an embedded Fe+2-containing protoporphyrin IX ring. Because only protein atoms were considered, a false picture of interior atomic packing was available to the scoring functions.

CASP9 (37) is probably the most challenging test, because the decoys are the best-predicted near-native models submitted by different groups that participated in the CASP experiment. CASP9 (conducted in July–August, 2010) consisted of 111 valid targets with 90 x-ray crystal structures. T0543 (2XRQ) and T0605 (3NMD) were not considered in the calculation, the former because of its excessively huge chain length (887 residues) and the latter because it is a single standalone helix. For the remaining 88 targets (with a total of 9197 models, chain length ranging from 83 to 611 residues) CSgl and CScp detected the native at the top rank in 70, 72 and 85, 86 within rank 5 (<Z>: 3.65, 3.95), respectively (Table S9).

Discrimination between good and bad RMSD models

To test the sensitivity of the functions with respect to deviations from the experimentally determined coordinates of the side-chain atoms, we selected 10 native (top ranked) targets from CASP9 along with their corresponding models. After superposing the models onto the native structure by Dali server (38), we calculated the RMSD of the side-chain atoms at a one-to-one atomic correspondence with respect to the native. Local deviations (in Cα) > 10 Å were considered to be so large as to lose all structural relationship with the corresponding region of the native, as well as models that were nonsuperposable (by Dali), and these were thus not included in the calculation. CSgl and CScp of the native structure and ∼60 models per target were then plotted (Fig. 2 and Fig. S5) as a function of their RMSDs (ranging from ∼1.5 to 10 Å). Although the scores generally fell with an increase in RMSD, especially in the range of 1.5–5 Å, there was substantial scatter among the points that belied the expectation of obtaining a functional relationship between the two variables. However, because these RMSDs contain contributions from both main- and side-chain deviations, we performed a second calculation (with 10 structures; Fig. 2) in which the backbone coordinates were held fixed and errors were incorporated into the side-chain conformations using three distinct methods: 1), randomizing the side-chain χ-angles (50 erroneous models) (13); 2), subjecting the same 50 models as in method 1 to an energy minimization protocol (using CHARMM (39)) as described previously (13); and 3), obtaining a unique solution as determined by SCWRL4.0 (11) upon threading. Two distinct clusters were obtained for methods 1 and 2, and energy minimization significantly improved scores in method 2 relative to method 1. The models derived from SCWRL4.0 (method 3) generally gave values closest to the native (Fig. 2 and Fig. S6), and rarely a few structures from method 2 gave similar/slightly better scores than method 3 (Fig. S6). Thus, the scores indeed reflect errors in side-chain coordinates as estimated by RMSD (with respect to native) and generally drop with an increase in error.

Figure 2.

Figure 2

Complementarity scores drop with increased errors in side-chain coordinates. (A) CScp values as a function of side-chain RMSDs for three CASP9 targets (native and models). N1, N2, and N3 correspond to the native crystal structures of T0522 (3NRD), T0623 (3NKH), and T0586 (3NEU), respectively, and their corresponding models plotted in red, green, and blue. (B) CScp values as a function of side-chain RMSDs for three globular proteins and their models (see text). N1, N2, N3 and S1, S2, S3 correspond to the native structures and the unique solutions generated by SCWRL4.0 for 2OEB (red), 3COU (green), and 2HAQ (blue), respectively. The two distinct clusters are for structures produced by randomization of the side-chain conformers (with lower values) and energy minimization of the same set of randomized conformers (higher values). Similar patterns were obtained for CSgl. Except for 2HAQ, all structures are from DB2.

Fold recognition by cross-threading

The scoring functions were also tested for protein pairs that belonged to the same fold but had low sequence identity upon alignment. We selected 100 such pairs (sequence identities ranging from 6% to 30%) sampling diverse folds from the PREFAB4.0 database (40). The sequence identities upon structural alignment for each pair were determined by Dali Server (38) and their folds assigned according to the SCOP database (41) (data set S2). For every pair, we aligned the two native sequences using CLUSTAL W (42). Insertions in the sequence to be threaded onto the main chain (of its partner) were excised, whereas deletions were padded with glycine to maintain the correct position of the threaded residues consistent with the alignment. For the cross-threaded sequences, padded polyglycine stretches at the N- and C-termini were also excised before the calculations were performed. When the fold was part of a larger polypeptide chain (domain), two possibilities were considered. If the fold was found to be completely separated from the other domains in the chain, it was considered in isolation for all subsequent calculations, whereas if the fold was found to be integrally embedded in the composite structure, the entire chain was used to calculate Smsc and Emsc, and the relevant residues in the domain were then used to compute CSgl and CScp. For all pairs, the native structures gave characteristic similar scores (Table S10) for both CSgl and CScp. The two sequences were then cross-threaded onto the backbone of each other, with their side-chain torsions being set to values determined by SCWRL4.0 (11). For each such pair, 100 random sequences (≤15% identity between any two sequences in a set) were threaded onto each of the two corresponding templates according to the same protocol. Hydrogen atoms were geometrically fixed by REDUCE (19) in all models. In a large majority of the cases, the average score of the two cross-threaded structures was found to be markedly lower than that of their native counterparts but noticeably higher (Z ≥ 2 for 86, 87 pairs) than those obtained from the random decoys (〈Z〉: 3.43, 3.33 for CSgl, CScp, respectively). However, below 15% sequence identity, there was a drop in the Z-scores (<1.5 for 5 out of 21 such pairs) primarily due to large mismatches in structural (Dali server) and sequence (CLUSTL W) alignments. In general, large variations were observed in the Z-scores (ranging from 0.4 to 8.0; Table S10) for different folds.

Complementarity plot

In contrast to evaluating the overall quality of an atomic model in terms of packing and electrostatics, Smsc and Emsc can also be used to identify local packing defects and regions of suboptimal electrostatics in a crystal structure. To that end, we plotted the individual (Smsc, Emsc) values of completely/partially buried residues in a complementarity plot (CP) spanning −1.0 to 1.0 in both the X (Smsc) and Y (Emsc) axes. Given the fact that for residues in correctly folded proteins both Smsc and Emsc are largely constrained to a limited range of values (as a function of their burial), regions in CP encompassing points corresponding to such amino acids could be clearly delineated. From DB2, we plotted Smsc and Emsc of all (target) residues irrespective of the amino acid type separately based on their burial bins, accounting for 23,850, 10,624, and 13,255 residues in bins 1, 2, and 3, respectively (see Materials and Methods). Thus, in all, three plots (CP1, CP2, and CP3) were obtained (Fig. S7, Fig. S8, and Fig. S9). Each two-dimensional plot was then divided into square grids (0.05 × 0.05 wide) and the probability of finding any residue (Pgrid) in a particular grid was estimated by the ratio of the number of points in that grid to the total number of points in the plot. The plots were then contoured based on their probability values Pgrid ≥ 0.005 for the first contour level and ≥ 0.002 for the second. The cumulative probability of locating a point within the second (outer) contour for the three plots was 91%, 90%, and 88%, respectively, whereas for the first (inner) contour, the probability gradually dropped with increasing solvent exposure to 82%, 76%, and 71%, respectively. Inspired by the Ramachandran plot (18), we termed the region within the first contour “probable”, that between the first and second contours “less probable”, and that outside the second contour “improbable”. In such a plot, residues with low Smsc and Emsc (<0.2 for both) are easily identified. However, in general, residues with suboptimal packing (as a function of their burial) and electrostatics could lie in sections partially spanning all three regions of the plots.

To test whether these suboptimal points (in CPs) were correlated with coordinate errors, we obtained 20 pairs of crystal structures consisting of an upgraded structure and its superseded partner from the PDB (Table S11). Subsequent to superposition by Dali server, residues from each pair were selected whose RMSD between side-chain atoms exceeded 1.5 Å for the first burial bin (with respect to native) and 2.0 Å for the second and third bins (method 1) (43). These residues were considered to be erroneous in the superseded PDB file. In addition, the calculation was repeated for residue pairs whose deviation in the side-chain (χ1) torsion was >40° (method 2) (43). Of note, the same residue could have different burials in the two files (superseded and upgraded).

The distribution of points for the upgraded and superseded structures was markedly different in the plots. Based on DB2, 82.1%, 9.2%, and 8.7% of the total points were found to be located in the probable, less-probable, and improbable regions, respectively, for the first burial bin (for the other bins, see Table S12). In sharp contrast, the distribution (method 1) was respectively 41.4%, 14.6%, 44.0% for the superseded structures, and 80.2%, 9.6%, and 10.2% for the upgraded structures. Deviation from the expected distribution (DB2) was estimated by χ2 in each plot (CP1, CP2, and CP3) for both the superseded and upgraded sets. χ2 (df = 3–1: probable, less probable, improbable; χ0.052 = 5.991) for burial bins 1, 2, 3 were found to be 1.1, 10.2, 15.7 and 503.9, 275.3, 187.8 for the upgraded and superseded sets, respectively. A similar pattern was also obtained for method 2 (Table S13). Thus, residues with positional errors have a heightened tendency to lie in the less-probable and improbable regions of the plot (Fig. 3, Fig. S10, and Fig. S11) associated with low complementarities. However, because CPs are essentially probabilistic in nature, there is a significant likelihood to encounter false positives (in the probable regions), specifically with regard to coordinate errors.

Figure 3.

Figure 3

Distribution of points in CP1. Distribution of points from superseded (erroneous: red) and upgraded structures (blue) in the CP for burial bin 1 (0.0 ≤ Bur ≤ 0.05). Errors were estimated by (A) side-chain RMSD and (B) deviation in χ1 torsion angles (see text).

Discussion

In the case of specific association, at least among proteins, some correspondence is to be expected between the geometrical features of their associating surfaces and their electrostatic potentials at the interface. Likewise, for a correctly folded globular protein, all buried residues should achieve optimal packing within the interior of the molecule and meticulously balance the electric fields that arise from different parts of the folded chain so as to neutralize all destabilizing electrostatic effects. Several calculations have confirmed that for correctly folded proteins, all residues upon burial exhibit fairly high levels of Sm for their side-chain atoms, enabling dense packing (13,14). To our knowledge, this is the first time that Em has been calculated within proteins to extend the analogy between folding and binding. The results show that one of the universal characteristics of correctly folded proteins is the almost uniformly elevated values in Smsc and Emsc attained by all deeply buried residues (Fig. S12). However, the constraints in Smsc appear to be more stringent relative to Emsc, given its reduced SD, compared with the latter. The nature of short- and long-range forces that determine the values of Smsc and Emsc also gives rise to their contrasting features. Smsc is a function of burial, whereas Emsc is not (Table S14). Furthermore, the primary determinants of Smsc and Emsc are side-chain atoms (for all residues) and main-chain atoms (for hydrophobic residues), respectively, whereas both side-chain and main-chain atoms contribute equally to the Emsc of hydrophilic residues.

The fact that both folding and binding require a narrow window of Sm and Em values was used to predict the native fold of a sequence. Both functions (CSgl and CScp) based on the probability distributions in Smsc and Emsc performed successfully in state-of-the-art decoy sets. This could be considered analogous to protein-protein docking, wherein both surface and electrostatic complementarities rise to their optimum values upon the interlocking of interacting protein molecules in the correct stereospecific geometry of association. That is to say, folding can be envisaged as the docking of interior residues to their respective native environments consistent with short- and long-range forces. The fact that the performance of both functions was comparable to or better than the best scoring functions currently available in the literature demonstrates the practical application of complementarity in the area of protein folding and structure prediction. The functions were also found to be useful for correctly identifying the same fold for two sequences with low sequence identity. Lastly, individual residues with suboptimal packing and electrostatics are easily identified in the CPs, which are highly correlated with coordinate errors. In contrast to the Ramachandran plot, which detects errors in backbone atoms due to local steric overlap, CPs detect side-chain conformations that are in disharmony with short- and long-range forces sustaining the native fold.

Acknowledgments

We thank Prof. Dipak Dasgupta (Biophysics Division, Saha Institute of Nuclear Physics (SINP)) for his constant support during this project. We also thank Dr. Bhaswar Ghosh and members of the computer section of SINP.

This work was supported by an intramural grant from the Department of Atomic Energy, Government of India (Chemical and Biophysical Approaches for Understanding Natural Processes, project of SINP).

Supporting Material

Document S1. Two data sets, 12 figures, 14 tables, and references (44–54)
mmc1.pdf (956.4KB, pdf)

References

  • 1.Lawrence M.C., Colman P.M. Shape complementarity at protein/protein interfaces. J. Mol. Biol. 1993;234:946–950. doi: 10.1006/jmbi.1993.1648. [DOI] [PubMed] [Google Scholar]
  • 2.McCoy A.J., Chandana Epa V., Colman P.M. Electrostatic complementarity at protein/protein interfaces. J. Mol. Biol. 1997;268:570–584. doi: 10.1006/jmbi.1997.0987. [DOI] [PubMed] [Google Scholar]
  • 3.Stockwell G.R., Thornton J.M. Conformational diversity of ligands bound to proteins. J. Mol. Biol. 2006;356:928–944. doi: 10.1016/j.jmb.2005.12.012. [DOI] [PubMed] [Google Scholar]
  • 4.Kahraman A., Morris R.J., Thornton J.M. Shape variation in protein binding pockets and their ligands. J. Mol. Biol. 2007;368:283–301. doi: 10.1016/j.jmb.2007.01.086. [DOI] [PubMed] [Google Scholar]
  • 5.Kahraman A., Morris R.J., Thornton J.M. On the diversity of physicochemical environments experienced by identical ligands in binding pockets of unrelated proteins. Proteins. 2010;78:1120–1136. doi: 10.1002/prot.22633. [DOI] [PubMed] [Google Scholar]
  • 6.Lo Conte L., Chothia C., Janin J. The atomic structure of protein-protein recognition sites. J. Mol. Biol. 1999;285:2177–2198. doi: 10.1006/jmbi.1998.2439. [DOI] [PubMed] [Google Scholar]
  • 7.Mandell J.G., Roberts V.A., Ten Eyck L.F. Protein docking using continuum electrostatics and geometric fit. Protein Eng. 2001;14:105–113. doi: 10.1093/protein/14.2.105. [DOI] [PubMed] [Google Scholar]
  • 8.Heifetz A., Katchalski-Katzir E., Eisenstein M. Electrostatics in protein-protein docking. Protein Sci. 2002;11:571–587. doi: 10.1110/ps.26002. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 9.Caravella, J. A. 2002. Electrostatics and packing in biomolecules: accounting for conformational change in protein folding and binding. PhD thesis. Massachusetts Institute of Technology, Cambridge.
  • 10.Liang S., Grishin N.V. Side-chain modeling with an optimized scoring function. Protein Sci. 2002;11:322–331. doi: 10.1110/ps.24902. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 11.Krivov G.G., Shapovalov M.V., Dunbrack R.L., Jr. Improved prediction of protein side-chain conformations with SCWRL4. Proteins. 2009;77:778–795. doi: 10.1002/prot.22488. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 12.Prabu M.M., Suguna K., Vijayan M. Variability in quaternary association of proteins with the same tertiary fold: a case study and rationalization involving legume lectins. Proteins. 1999;35:58–69. doi: 10.1002/(sici)1097-0134(19990401)35:1<58::aid-prot6>3.0.co;2-a. [DOI] [PubMed] [Google Scholar]
  • 13.Basu S., Bhattacharyya D., Banerjee R. Mapping the distribution of packing topologies within protein interiors shows predominant preference for specific packing motifs. BMC Bioinformatics. 2011;12:195. doi: 10.1186/1471-2105-12-195. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 14.Banerjee R., Sen M., Saha P. The jigsaw puzzle model: search for conformational specificity in protein interiors. J. Mol. Biol. 2003;333:211–226. doi: 10.1016/j.jmb.2003.08.013. [DOI] [PubMed] [Google Scholar]
  • 15.Tsai C.J., Xu D., Nussinov R. Protein folding via binding and vice versa. Fold. Des. 1998;3:R71–R80. doi: 10.1016/S1359-0278(98)00032-7. [DOI] [PubMed] [Google Scholar]
  • 16.Bahadur R.P., Chakrabarti P. Discriminating the native structure from decoys using scoring functions based on the residue packing in globular proteins. BMC Struct. Biol. 2009;9:76. doi: 10.1186/1472-6807-9-76. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 17.Jones S., Thornton J.M. Principles of protein-protein interactions. Proc. Natl. Acad. Sci. USA. 1996;93:13–20. doi: 10.1073/pnas.93.1.13. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 18.Ramakrishnan C., Ramachandran G.N. Stereochemical criteria for polypeptide and protein chain conformations. II. Allowed conformations for a pair of peptide units. Biophys. J. 1965;5:909–933. doi: 10.1016/S0006-3495(65)86759-5. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 19.Word J.M., Lovell S.C., Richardson D.C. Asparagine and glutamine: using hydrogen atom contacts in the choice of side-chain amide orientation. J. Mol. Biol. 1999;285:1735–1747. doi: 10.1006/jmbi.1998.2401. [DOI] [PubMed] [Google Scholar]
  • 20.Cornell W.D., Cieplak P., Kollman P.A. A second generation force field for the simulation of proteins, nucleic acids, and organic molecules. J. Am. Chem. Soc. 1995;117:5179–5197. [Google Scholar]
  • 21.Shannon R.D. Revised effective ionic radii and systematic studies of interatomic distances in halides and chalcogenides. Acta Crystallogr. A. 1976;32:751–767. [Google Scholar]
  • 22.Lee B., Richards F.M. The interpretation of protein structures: estimation of static accessibility. J. Mol. Biol. 1971;55:379–400. doi: 10.1016/0022-2836(71)90324-x. [DOI] [PubMed] [Google Scholar]
  • 23.Rocchia W., Sridharan S., Honig B. Rapid grid-based construction of the molecular surface and the use of induced surface charge to calculate reaction field energies: applications to the molecular systems and geometric objects. J. Comput. Chem. 2002;23:128–137. doi: 10.1002/jcc.1161. [DOI] [PubMed] [Google Scholar]
  • 24.Nichollos A., Honig B. A rapid finite difference algorithm, utilizing successive over-relaxation to solve the Poisson-Boltzmann equation. J. Comput. Chem. 1991;12:435–445. [Google Scholar]
  • 25.Radhakrishnan M.L., Tidor B. Optimal drug cocktail design: methods for targeting molecular ensembles and insights from theoretical model systems. J. Chem. Inf. Model. 2008;48:1055–1073. doi: 10.1021/ci700452r. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 26.Jackson R.M., Sternberg M.J.E. Application of scaled particle theory to model the hydrophobic effect: implications for molecular association and protein stability. Protein Eng. 1994;7:371–383. doi: 10.1093/protein/7.3.371. [DOI] [PubMed] [Google Scholar]
  • 27.Baker N.A., Sept D., McCammon J.A. Electrostatics of nanosystems: application to microtubules and the ribosome. Proc. Natl. Acad. Sci. USA. 2001;98:10037–10041. doi: 10.1073/pnas.181342398. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 28.Green D.F., Tidor B. Design of improved protein inhibitors of HIV-1 cell entry: Optimization of electrostatic interactions at the binding interface. Proteins. 2005;60:644–657. doi: 10.1002/prot.20540. [DOI] [PubMed] [Google Scholar]
  • 29.Morreale A., Gil-Redondo R., Ortiz A.R. A new implicit solvent model for protein-ligand docking. Proteins. 2007;67:606–616. doi: 10.1002/prot.21269. [DOI] [PubMed] [Google Scholar]
  • 30.Holm L., Sander C.J. Evaluation of protein models by atomic solvation preference. J. Mol. Biol. 1992;225:93–105. doi: 10.1016/0022-2836(92)91028-n. [DOI] [PubMed] [Google Scholar]
  • 31.Branden C.I., Jones T.A. Between objectivity and subjectivity. Nature. 1990;343:687–689. [Google Scholar]
  • 32.Avbelj F., Moult J., Hagler A.T. Molecular dynamics study of the structure and dynamics of a protein molecule in a crystalline ionic environment, Streptomyces griseus protease A. Biochemistry. 1990;29:8658–8676. doi: 10.1021/bi00489a023. [DOI] [PubMed] [Google Scholar]
  • 33.Park B., Levitt M. Energy functions that discriminate X-ray and near native folds from well-constructed decoys. J. Mol. Biol. 1996;258:367–392. doi: 10.1006/jmbi.1996.0256. [DOI] [PubMed] [Google Scholar]
  • 34.Simons K.T., Kooperberg C., Baker D. Assembly of protein tertiary structures from fragments with similar local sequences using simulated annealing and Bayesian scoring functions. J. Mol. Biol. 1997;268:209–225. doi: 10.1006/jmbi.1997.0959. [DOI] [PubMed] [Google Scholar]
  • 35.Samudrala R., Levitt M. Decoys ‘R’ Us: a database of incorrect conformations to improve protein structure prediction. Protein Sci. 2000;9:1399–1401. doi: 10.1110/ps.9.7.1399. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 36.Tsai J., Bonneau R., Baker D. An improved protein decoy set for testing energy functions for protein structure prediction. Proteins. 2003;53:76–87. doi: 10.1002/prot.10454. [DOI] [PubMed] [Google Scholar]
  • 37.Moult J., Fidelis K., Tramontano A. Critical assessment of methods of protein structure prediction (CASP)—round IX. Proteins. 2011;79(Suppl 10):1–5. doi: 10.1002/prot.23200. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 38.Holm L., Rosenström P. Dali server: conservation mapping in 3D. Nucleic Acids Res. 2010;38(Web Server issue) doi: 10.1093/nar/gkq366. W545–549. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 39.Brooks B.R., Bruccoleri R.E., Karplus M. CHARMM: a program for macromolecular energy, minimization, and dynamics calculations. J. Comput. Chem. 1983;4:187–217. [Google Scholar]
  • 40.Edgar R.C. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 2004;32:1792–1797. doi: 10.1093/nar/gkh340. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 41.Murzin A.G., Brenner S.E., Chothia C. SCOP: a structural classification of proteins database for the investigation of sequences and structures. J. Mol. Biol. 1995;247:536–540. doi: 10.1006/jmbi.1995.0159. [DOI] [PubMed] [Google Scholar]
  • 42.Thompson J.D., Higgins D.G., Gibson T.J. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 1994;22:4673–4680. doi: 10.1093/nar/22.22.4673. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 43.Lee C., Subbiah S. Prediction of protein side-chain conformation by packing optimization. J. Mol. Biol. 1991;217:373–388. doi: 10.1016/0022-2836(91)90550-p. [DOI] [PubMed] [Google Scholar]
  • 44.Samudrala R., Moult J. An all-atom distance-dependent conditional probability discriminatory function for protein structure prediction. J. Mol. Biol. 1998;275:895–916. doi: 10.1006/jmbi.1997.1479. [DOI] [PubMed] [Google Scholar]
  • 45.Arab S., Sadeghi M., Sheari A. A pairwise residue contact area-based mean force potential for discrimination of native protein structure. BMC Bioinformatics. 2010;11:16. doi: 10.1186/1471-2105-11-16. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 46.Lu H., Skolnick J. A distance-dependent atomic knowledge-based potential for improved protein structure selection. Proteins. 2001;44:223–232. doi: 10.1002/prot.1087. [DOI] [PubMed] [Google Scholar]
  • 47.Skolnick J., Kolinski A., Ortiz A. Derivation of protein-specific pair potentials based on weak sequence fragment similarity. Proteins. 2000;38:3–16. [PubMed] [Google Scholar]
  • 48.Zhang C., Liu S., Zhou Y. An accurate, residue-level, pair potential of mean force for folding and binding based on the distance-scaled, ideal-gas reference state. Protein Sci. 2004;13:400–411. doi: 10.1110/ps.03348304. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 49.Misura K.M., Chivian D., Baker D. Physically realistic homology models built with ROSETTA can be more accurate than their templates. Proc. Natl. Acad. Sci. USA. 2006;103:5361–5366. doi: 10.1073/pnas.0509355103. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 50.Melo F., Sánchez R., Sali A. Statistical potentials for fold assessment. Protein Sci. 2002;11:430–448. doi: 10.1002/pro.110430. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 51.Li X., Hu C., Liang J. Simplicial edge representation of protein structures and alpha contact potential with confidence measure. Proteins. 2003;53:792–805. doi: 10.1002/prot.10442. [DOI] [PubMed] [Google Scholar]
  • 52.Mirzaie M., Eslahchi C., Sadeghi M. A distance-dependent atomic knowledge-based potential and force for discrimination of native structures from decoys. Proteins. 2009;77:454–463. doi: 10.1002/prot.22457. [DOI] [PubMed] [Google Scholar]
  • 53.Shen M.Y., Sali A. Statistical potential for assessment and prediction of protein structures. Protein Sci. 2006;15:2507–2524. doi: 10.1110/ps.062416606. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 54.Miyazawa S., Jernigan R.L. Residue-residue potentials with a favorable contact pair term and an unfavorable high packing density term, for simulation and threading. J. Mol. Biol. 1996;256:623–644. doi: 10.1006/jmbi.1996.0114. [DOI] [PubMed] [Google Scholar]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

Document S1. Two data sets, 12 figures, 14 tables, and references (44–54)
mmc1.pdf (956.4KB, pdf)

Articles from Biophysical Journal are provided here courtesy of The Biophysical Society

RESOURCES