Abstract
Cellular functions are performed through protein-protein interactions; therefore, identification of these interactions is crucial for understanding biological processes. Recent studies suggest that knowledge-based approaches are more useful than ‘blind’ docking for modeling at large scales. However, a caveat of knowledge-based approaches is that they treat molecules as rigid structures. The Protein Data Bank (PDB) offers a wealth of conformations. Here, we exploited an ensemble of conformations in predictions by a knowledge-based method, PRISM. We tested ‘difficult’ cases in a docking-benchmark dataset, where the unbound and bound protein forms are structurally different. Considering alternative conformations for each protein, the percentage of successfully predicted interactions increased from ~26% to 66%, and 57% of the interactions were successfully predicted in an ‘unbiased’ scenario, in which data related to the bound forms were not utilized. If the appropriate conformation, or relevant template interface, is unavailable in the PDB, PRISM cannot predict the interaction successfully. The pace of the growth of the PDB promises a rapid increase of ensemble conformations, emphasizing the merit of such knowledge-based ensemble strategies for higher success rates in protein-protein interaction predictions on an interactome scale. We constructed the structural network of ERK interacting proteins as a case study.
Keywords: protein-protein interaction prediction, PRISM, structural network, knowledge-based method, conformations, docking
INTRODUCTION
Biological processes take place through protein-protein interactions (PPIs). They are crucially important for cellular function, regulation and signaling. Following the Human Genome Project, high-throughput studies have become popular and PPIs have also been investigated on large scales. Proteome-level studies elucidate cellular functions,1, 2 regulation,3, 4 disease mechanisms,5, 6 conservation through evolution,7-9 drug discovery10 and drug side-effects.11, 12 A key first step in such studies is determining the PPIs. PPIs on large scales are identified by experimental techniques like the yeast two-hybrid system,13 phage display,14 protein arrays,15 and affinity purification.16 The output of these experiments can include false negatives and false positives.17-20 Experimental structural techniques offer more reliable data. The structures of interacting proteins illustrate not only that they interact, but also how they interact. X-ray crystallography,21 nuclear magnetic resonance (NMR) spectroscopy,22 cryo-electron microscopy (Cryo-EM)23 and small-angle X-ray scattering (SAXS)24 provide the 3-dimensional (3D) architecture of the proteins and their interactions, and high resolution structures are available in the PDB.25 The number of structures in the PDB grows exponentially; there are more than 89,000 structures as of March 2013 and more than a quarter of them have been added after 2009.
Computational approaches can assist experiments in verifying results and in predicting new interactions; in addition, they are cheaper and faster. They can be classified into two categories: ‘blind’ docking and knowledge-based methods. ‘Blind’ docking methods26-29 search for the ‘best’ (i.e. native) bound state by considering a large number of possible structural combinations of query proteins. They are computationally expensive. Further, in the absence of biological, functional knowledge that the proteins interact and some data on the interaction site, their predictions may not be reliable. This is because there will always be some favorable modes for any two proteins to interact. ‘Blind’ docking is impractical for large-scale studies, particularly if new interactions are sought, where no data are available on whether the proteins interact. Knowledge-based methods30-35 may fare better: they are based on the notion that motifs recur in nature.36-39 They compare the surfaces of query proteins with known protein-protein interfaces of interacting protein pairs. Since there are fewer templates (derived from known protein interactions), the process is faster and more affordable, and thus applicable on large-scale. Knowledge-based methods are appropriate techniques to construct interactome-scale structural networks. Even though the PDB covers a limited number of protein-protein interactions, a systematic large scale study40 showed that templates are available to model complexes of structurally characterized proteins. However, on the down side, knowledge-based approaches typically consider the structures as rigid bodies, even though proteins are flexible41-44 and their preferred conformational states are expected to change with the environment. Neighboring molecules, atoms, or ions can redistribute their ensembles;45, 46 this redistribution is dynamic47-51 and is reflected in the observed protein structures. 
Binding, post-translational modifications, changes in ligand concentration, pH and ionic strength of the medium will affect the conformational distribution. The ensembles confer on the protein the ability to function.49-51 The bound forms of the proteins can be obtained from the unbound forms in in silico experiments;52 however, protein flexibility and conformational changes upon binding are mostly ignored in large scale knowledge-based methods. This limits the capability and success in prediction, and therefore, the construction of the structural network.
In principle, different conformations of proteins can be considered by knowledge-based methods, and this can be expected to improve the predictions. The PDB offers many protein structures including different conformations of the same protein. These conformations may include bound, unbound or any alternative forms (e.g. following allosteric post-translational modifications, or bound to different ligands) of the proteins which can be utilized. The collection of all available structures of a protein provides a subset of the repertoire of its conformations under different conditions, and can constitute the input rather than a single structure. These conformations would help to more reliably figure out whether the proteins interact, and how they interact, and identifying binary protein interactions is the first step in the construction of the structural network. We tested a motif-based protein interaction prediction tool, PRISM (PRotein Interactions by Structural Matching),33, 35, 53 on a docking benchmark dataset,54 to see if providing different conformations of proteins would indeed increase PRISM’s capability to detect interactions. As we have shown earlier, if the structures are available and the proteins are known to play a role in the same pathway, PRISM can successfully predict if two proteins in these interact (76% and 78% accuracies in the ubiquitination55 and apoptosis56 pathways, respectively). We also demonstrated that it can be used to construct structural networks.57, 58 This knowledge-based tool can successfully predict the ‘easy’ cases of the docking benchmark dataset;59 thus, in this study we targeted the ‘difficult’ cases toward proteome applications. We found the alternative structures of a protein through sequence homology and structural alignment. If two PDB entries of the same protein have different structures, they are included as different molecules in the target set. 
The interactions between the two proteins are predicted and compared with the bound state given in the benchmark dataset. The success of the prediction is assessed by the global energy values of the predicted complex and by the IS-score,60 which shows the structural similarity between the predicted complex and the bound state of the proteins. As a case study we constructed the structural network of the extracellular signal-regulated kinases (ERK) interactions, and show that considering alternative conformations of the query proteins improves the modeled structural network.
MATERIALS AND METHODS
In this section, we first describe PRISM, and then the method used to find different conformations of the proteins together with the dataset.
PRotein Interactions by Structural Matching (PRISM)
The flowchart of PRISM is illustrated in Figure 1 (flow through blue boxes; steps 1-4) and explained below. PRISM was implemented in Python and runs in a UNIX environment. Details on how to run PRISM were given earlier.35 The PRISM source code and external programs can be downloaded at http://prism.ccbb.ku.edu.tr/prism_protocol/. First, the external programs, FASTA version 35,61 NACCESS,62 MultiProt63 and FiberDock,64 need to be installed. The user enters target protein names in the file named “PDB.list” in the “PRISM_protocol/0-SurfaceExtraction” directory and the PDB files are downloaded from the PDB web page (http://www.pdb.org). Surfaces of target proteins are extracted using the script and output files are copied to the structural matching directory, “PRISM_protocol/1-Prediction”. There, the user selects template interfaces. We used all template interfaces in this study. Structural matching between target protein surfaces and template interfaces is done using the script. Transformation of the structures and filtering are done in the “PRISM_protocol/2-DistanceCalculation” directory. At the last step, flexible refinement and energy calculation are done in the directory “PRISM_protocol/6-FiberDock”. When the PRISM run is complete, the structures of the predictions and their energies are available in the “FIBERDOCK_Structures” and “ENERGIES” directories, respectively.
Figure 1. The flowchart of PRISM.
Cyan box shows the flowchart of the PRISM algorithm. Template and target datasets are inputs, and 3 dimensional data and global energies of predicted interactions are outputs of the algorithm. Step 0: Template dataset organization; step 1: surface extraction of target proteins; step 2: structural alignment of target surfaces with template interfaces; step 3: elimination of clashing structures; step 4: flexible refinement.
Template Dataset Organization of PRISM
The template dataset includes a non-redundant set of protein-protein interface structures.38 Interfaces of all pairwise chain interactions in the PDB were extracted (Figure 1, step 0). Interfaces consist of ‘contacting’ and ‘nearby’ residues. If two residues, one from each interacting protein, are close enough, they are labeled as contacting residues. The cut-off value is the sum of the van der Waals radii of the heavy atoms plus 0.5 Å. The ‘nearby’ residues constitute the scaffold of an interface. They are the neighboring residues whose Cα atoms are at most 6.0 Å away from the Cα atom of a contacting residue. Interfaces are structurally clustered to obtain a non-redundant set. Each cluster has members which are structurally similar to the representative interface of the cluster. Computational hotspots of representative interfaces are found via the web server HotPoint.65
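The interface definition above can be illustrated with a minimal sketch. It is not the PRISM code: the data layout (a dictionary of residues, each holding heavy atoms) and the van der Waals radii table are assumptions made for illustration; only the 0.5 Å margin and the 6.0 Å Cα cut-off come from the text.

```python
import math

# ASSUMED van der Waals radii in Å (Bondi-style values); the radii used by
# PRISM are not given in the text.
VDW_RADII = {"C": 1.70, "N": 1.55, "O": 1.52, "S": 1.80}
CONTACT_MARGIN = 0.5  # Å, added to the sum of the radii (from the text)
NEARBY_CUTOFF = 6.0   # Å between Cα atoms (from the text)

# Hypothetical layout: chain = {residue_id: [(atom_name, element, (x, y, z)), ...]}


def contacting_residues(chain_a, chain_b):
    """Return the residue ids of chain_a and chain_b that are in contact."""
    contacts_a, contacts_b = set(), set()
    for res_a, atoms_a in chain_a.items():
        for res_b, atoms_b in chain_b.items():
            in_contact = any(
                math.dist(xyz_a, xyz_b)
                <= VDW_RADII[el_a] + VDW_RADII[el_b] + CONTACT_MARGIN
                for _, el_a, xyz_a in atoms_a
                for _, el_b, xyz_b in atoms_b
            )
            if in_contact:
                contacts_a.add(res_a)
                contacts_b.add(res_b)
    return contacts_a, contacts_b


def ca_position(chain, residue_id):
    """Return the Cα coordinates of a residue, or None if it has no Cα atom."""
    for name, _, xyz in chain[residue_id]:
        if name == "CA":
            return xyz
    return None


def nearby_residues(chain, contacts):
    """Non-contacting residues whose Cα lies within 6.0 Å of a contacting Cα."""
    nearby = set()
    for res in chain:
        if res in contacts:
            continue
        ca = ca_position(chain, res)
        if ca is None:
            continue
        for contact in contacts:
            contact_ca = ca_position(chain, contact)
            if contact_ca is not None and math.dist(ca, contact_ca) <= NEARBY_CUTOFF:
                nearby.add(res)
                break
    return nearby
```

The interface of one chain is then the union of its contacting and nearby residues. The all-against-all loop is quadratic and is kept only for readability; a spatial grid or k-d tree would be the usual choice for whole-PDB extraction.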
Target Dataset of PRISM
The query proteins whose interactions are to be searched constitute the target set (Figure 1). The target set can be assembled by the user.
Prediction Algorithm of PRISM
First, the surfaces of the target proteins are extracted using the NACCESS program (Figure 1, step 1).62 Then, PRISM checks the structural similarity between target surfaces and template interfaces. Structural similarity is searched via alignment of target surfaces onto template interfaces using MultiProt63 (Figure 1, step 2). If a target surface is structurally similar to one side of a template interface and another target surface is structurally similar to the complementary side, these two target structures may interact. To guarantee a proper match between a target surface and a template interface in the alignment, PRISM checks whether the matched residues from the two sides face each other, and whether at least one residue of the target surface matches a hotspot of the template interface. The candidate protein complexes are next physically and biologically evaluated. Clashes between residues of the two structures are counted (Figure 1, step 3). If there are ≥ 5 clashes among alpha-carbons, the candidate complex is discarded. After that, flexible refinement and energy calculation are done using FiberDock64 (Figure 1, step 4). Side-chains of the structures are oriented to reach a more favorable state. Hydrogen atoms are also considered in this process. The backbones of the structures can be slightly re-oriented in this refinement step. Finally, the global energy of the candidate complex is calculated. Complexes with energy lower than the threshold value are considered biologically meaningful, i.e. the proteins are predicted to interact. PRISM gives the atomic coordinates of the potentially interacting proteins.
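The two filters described above (the Cα clash count and the energy threshold) can be sketched as follows. This is an illustration rather than the PRISM implementation: the clash distance is not stated in the text, so the 3.0 Å value below is an assumption, and the function names are hypothetical. The limit of five clashes comes from the text, and −10 kJ/mol is the cut-off used in this study.

```python
import math

CLASH_DISTANCE = 3.0   # Å; ASSUMED, the text does not give the clash distance
MAX_CA_CLASHES = 5     # from the text: >= 5 Cα clashes discards the complex
ENERGY_CUTOFF = -10.0  # kJ/mol; cut-off used in this study


def count_ca_clashes(ca_coords_a, ca_coords_b, clash_distance=CLASH_DISTANCE):
    """Count Cα pairs (one from each protein) closer than clash_distance."""
    return sum(
        1
        for p in ca_coords_a
        for q in ca_coords_b
        if math.dist(p, q) < clash_distance
    )


def passes_clash_filter(ca_coords_a, ca_coords_b):
    """Keep a candidate complex only if it has fewer than five Cα clashes."""
    return count_ca_clashes(ca_coords_a, ca_coords_b) < MAX_CA_CLASHES


def is_energetically_favorable(global_energy):
    """A refined complex is favorable if its global energy is below the cut-off."""
    return global_energy < ENERGY_CUTOFF
```

In the pipeline order given in the text, the clash filter runs before the refinement, so the comparatively expensive FiberDock step is only spent on candidates that survive it.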
Enlarging the Target Dataset Using Different Conformations
Different conformations of target proteins are found in the PDB. First, chains of structures which have the same sequence as the query protein are detected (Figure 2B). 100% (and then 95%) FASTA sequence homology between the molecules is considered. Then, the molecular structures are compared using MultiProt63 (Figure 2C). If MultiProt matches the candidate structure with less than 90% of the query structure and the root-mean-square-deviation (RMSD) value between the matched residues of the two structures is more than 2.0 Å, the candidate structure is considered as a different conformation of the query protein. Different conformations of the target proteins are added to the target set (Figure 2D); and PRISM is run with the enlarged target set to test if the prediction improves.
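The structural criterion above reduces to a two-part test on the MultiProt alignment. The sketch below assumes the alignment has already been run and only its summary (number of matched residues and RMSD) is available; the function and parameter names are hypothetical, while the 90% and 2.0 Å thresholds are taken from the text.

```python
def is_different_conformation(matched_residues, query_length, rmsd,
                              max_match_fraction=0.90, min_rmsd=2.0):
    """Apply the criterion as stated in the text: the candidate counts as a
    different conformation if fewer than 90% of the query residues are
    matched AND the RMSD over the matched residues exceeds 2.0 Å."""
    match_fraction = matched_residues / query_length
    return match_fraction < max_match_fraction and rmsd > min_rmsd


def enlarge_target_set(query_id, alignments):
    """Return the query plus every candidate accepted as a new conformation.

    alignments: {candidate_id: (matched_residues, query_length, rmsd)}
    """
    target_set = [query_id]
    for candidate_id, (matched, length, rmsd) in alignments.items():
        if is_different_conformation(matched, length, rmsd):
            target_set.append(candidate_id)
    return target_set
```

Both conditions must hold for a candidate to be accepted, which follows the wording of the text. Figure 2C additionally describes clustering the accepted conformations and keeping cluster representatives; that step is not covered here.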
Figure 2. Exploiting different conformations to enlarge target set.
A) The standard target set includes one conformation of each protein. B) The PDB structures of the standard target proteins are found using sequence homology. C) Conformations are clustered using structural alignment. Structurally similar conformations are given in the same column. Representatives of the clusters are alternative conformations of the standard target proteins. D) The target dataset is enlarged with the different conformations of the standard target proteins.
The Docking Benchmark Dataset and Prediction with PRISM
We tested the method on ‘difficult’ cases of a docking benchmark dataset.54 The benchmark set provides the bound state of the two given molecules in each case. In some cases, structures are multimeric and there is more than one binary interaction. In these cases (cases 1, 2, 11, 12 and 18), each binary interaction was considered separately. A list of all interactions studied for 30 ‘difficult’ cases is given in Table S1. The structural difference between the bound and unbound forms determines the case-type of the interaction: ‘difficult’ cases of the benchmark dataset include structures with interface root-mean-square distance (iRMSD) larger than 2.2 Å.54 Figures 3 and 4 provide examples of allostery66 and the conformational changes upon binding. First, PRISM was run with the target set containing only the two molecules from the benchmark to predict their bound state (Figure 5A). Second, different conformations of the two molecules were found (Table S2; more than one chain name indicates that the molecule is similar to each chain) and added to the target set (Figure 2). Then, PRISM was run with the enlarged target set (this was repeated for each case in the docking benchmark) (Figure 5B). An energy cut-off of −10 kJ/mol was used to determine favorable predictions. Table S3 provides the FiberDock energies of the bound forms for the benchmark dataset. All interfaces except 2hmiCB and 1h1vAG (28 out of 30 interactions) have low energies (below −10 kJ/mol). Based on this observation and on our previous studies,35, 55, 56, 58, 67 we set the cut-off value as −10 kJ/mol. The success of a prediction was evaluated based on the IS-score,60 which is a metric to evaluate protein-protein interaction predictions. The IS-score does not consider the equivalence of target and template residues,60 unlike the Critical Assessment of PRedicted Interactions (CAPRI) criteria68 which are based on RMSD and native contact fractions. 
These could present problems because PDB structures may lack residues or domains (due to e.g. conformational disorder), and alternative conformations of the query proteins may differ in length. Predicted models are considered as ‘near native’ if the IS-score > 0.17, ‘acceptable’ if the IS-score is between 0.12 and 0.17, and ‘incorrect’ if the IS-score < 0.12. We also present the native (fnat) and the non-native (fnon-nat) contact fractions: fnat is the number of correctly predicted contacts divided by the number of native contacts, and fnon-nat is the number of incorrectly predicted contacts divided by the number of predicted contacts. The IS-score detects substructure similarity and characterizes the similarity of the interfaces of the complex. The alignment exploits iAlign31 and the IS-score compares only matched residues. A contact value of zero indicates that the model and the native complex have no good match and that the model is incorrect. In addition, we present the performance of PRISM as the ratio of successful predictions, as evaluated by the IS-score (IS-score > 0.12), to the energetically favorable PRISM predictions (energy < −10 kJ/mol). The percentage of favorable predictions of PRISM that successfully match the bound forms is given for the benchmark dataset.
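The rating scheme and the contact fractions defined above can be written out as a short sketch. The function names are hypothetical, the handling of scores exactly equal to 0.12 or 0.17 is an assumption (the text does not specify it), and returning 0.0 when a denominator is zero is a convention chosen here to avoid division by zero.

```python
def rate_is_score(is_score):
    """Map an IS-score to the three classes used in this study."""
    if is_score > 0.17:
        return "near native"
    if is_score >= 0.12:  # boundary handling is an assumption
        return "acceptable"
    return "incorrect"


def contact_fractions(real_contacts, model_contacts, common_contacts):
    """Return (fnat, fnon_nat).

    fnat     = correctly predicted contacts / native contacts
    fnon_nat = incorrectly predicted contacts / predicted contacts
    """
    fnat = common_contacts / real_contacts if real_contacts else 0.0
    incorrect = model_contacts - common_contacts
    fnon_nat = incorrect / model_contacts if model_contacts else 0.0
    return fnat, fnon_nat
```

As a check against Table 1, case 1i has 17 real, 27 model and 16 common contacts, giving fnat = 16/17 ≈ 0.94 and fnon-nat = 11/27 ≈ 0.41, with an IS-score of 0.2366 rated as near native.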
Figure 3. An example of allosteric effect on binding – Case 17.
A) Unbound form of heparin cofactor II (blue, 1jmjA). B) Allosteric effect of glycosaminoglycan (GAG) leads to conformational change in heparin cofactor II. Unbound (blue) and bound (cyan) forms of heparin cofactor II are given together. C) Bound forms of heparin cofactor II (cyan: 1jmoA) and thrombin (orange: heavy chain, 1jmoH; green: light chain, 1jmoL) are shown separately. D) Interaction of heparin cofactor II and thrombin.
Figure 4. An example of conformational change upon binding – Case 12ii.
A) Unbound forms of soluble tissue factor (purple, 1tfhB) and blood coagulation factor VIIA (blue, 1qfkL). B) Conformational changes are shown: unbound and bound forms of the soluble tissue factor are given in purple and yellow, and unbound and bound forms of the blood coagulation factor VIIA are given in blue and green, respectively. The large conformational change of the blood coagulation factor VIIA is indicated by an arrow. C) Bound forms are shown separately. D) Interaction of soluble tissue factor (1fakT) and blood coagulation factor VIIA (1fakL).
Figure 5. PRISM predictions with standard and enlarged target datasets.
A) PRISM is run using standard target dataset. No favorable interaction can be found for the given conformations of the proteins. B) PRISM is run using enlarged target dataset. A favorable interaction is found with one conformation of each protein.
Eliminating bias by discarding information on the bound form
Bound forms of the proteins exist in the PDB. These forms can appear as alternative structures of the unbound forms or the template interface. For example, in interaction 14, the bound form given in the benchmark dataset is 1ibrAB. 1ibrA and 1ibrC are found as alternative structures of the unbound form 1qg4A, and 1ibrB is for 1f59A (Table S2). Moreover, the template 1ibrAB is used to predict this interaction. To eliminate bias, the bound forms were discarded from the target and template datasets (Figure 6). First, PDB entries of the bound forms were discarded from the PDB list; then alternative forms were found. These new alternative target structures are given in Table S4. Second, if a template interface was the interface of a bound form, another interface from the same template cluster was chosen as the representative. If the template cluster had no other member, this template was eliminated. The changes in the template organization are shown in Table S5. Since the bound forms are given also as the unbound forms in cases 2i (molecule: 2hmiC) and 2ii (molecule: 2hmiD) in the benchmark dataset, these cases were not considered and the remaining 28 interactions were processed under this scenario.
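The two elimination steps described above can be sketched as simple filters. The sketch assumes that structures are labeled as a four-character PDB ID followed by chain letters (as in 1ibrAB) and that the first member of each cluster list is its current representative. Which remaining cluster member becomes the new representative is not stated in the text, so taking the first one is an assumption. The function names are hypothetical.

```python
def pdb_id(label):
    """Return the four-character PDB ID of a label such as '1ibrAB'."""
    return label[:4].lower()


def remove_bound_entries(candidates, bound_pdb_ids):
    """Drop every candidate structure that comes from a bound-form PDB entry."""
    bound = {p.lower() for p in bound_pdb_ids}
    return [c for c in candidates if pdb_id(c) not in bound]


def unbiased_representative(cluster_members, bound_pdb_ids):
    """Return the first cluster member not derived from a bound form,
    or None if the cluster has no such member (template eliminated)."""
    bound = {p.lower() for p in bound_pdb_ids}
    for member in cluster_members:
        if pdb_id(member) not in bound:
            return member
    return None
```

For interaction 14, for example, removing the entry 1ibr from the candidates 1ibrA, 1ibrC and 1qg4A leaves only 1qg4A, and a template cluster whose only member is 1ibrAB would be eliminated.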
Figure 6. Prediction using or not using the structures of bound forms.
The benchmark dataset gives the unbound and the bound forms of the molecules. Different color tones represent different conformations of proteins shown in green and blue. The green-cyan template represents the interface of the bound forms. We followed two scenarios in PRISM predictions. A) First, we found alternative conformations from all PDB structures and ran PRISM using all templates. B) Second, we eliminated bound forms from the PDB structures and then found alternative conformations. In addition, if a template interface was the interface of a bound form, another interface from the same template cluster was chosen as the representative. If the template cluster had no other member, this template was eliminated.
RESULTS AND DISCUSSION
We obtained two sets of predictions. First, PRISM was run with a ‘standard’ target set of given unbound structures (Figure 5A); one conformation for each molecule. Then, the target set was enlarged with the different conformations of the unbound forms of each protein available in the PDB (Table S2, Figures 2 and 5B). The first target set had two conformations (a size of 2) in all cases; on average the enlarged target sets had a size of 8.87, with a median value of 6.50, which did not considerably affect the computation. Using standard target sets, 9 out of 30 ‘difficult’ interactions were predicted as energetically favorable (−10 kJ/mol cut-off value; Table S6). In the second set, when the target sets were enlarged with the different conformations of the given unbound structures, 7 of these 9 interactions were better predicted, with lower global energy values (interactions 2i, 11i, 11ii, 12i, 16, 18ii and 19). Energetically favorable predictions were obtained for additional interactions when enlarged target sets were used (an increase from 9 to 24; Table S6). We note that energetically favorable models do not necessarily correspond to the native complex structures of the benchmark. Energetically favorable interactions together with their templates are given in Tables S7a and S7b. If the chain is specified, the interacting molecule is the monomer; otherwise, it is the whole structure. If more than one chain is listed, the chains are structurally similar. IS-scores and contact fractions of the predictions obtained using alternative conformations are provided in Table 1 (Table S8 shows the results of predictions obtained using the standard target set). As to IS-scores, among the predictions obtained using standard target sets, 7 cases were ‘near native’, 1 case was ‘acceptable’ and 1 case was ‘incorrect’ (Table S8), and among predictions obtained using enlarged target sets, 18 cases were ‘near native’ and 6 cases were ‘incorrect’ (Table 1). 
When we eliminated incorrect predictions, we had 8 successful predictions with the standard target set and 18 with the enlarged set. In addition, if a different prediction was obtained when the enlarged target set was used, the prediction has a higher IS-score.
Table 1. Energies, IS-scores and contact fractions of PRISM predictions obtained using standard template set and enlarged target set.
| Difficult Case | Energy (kJ/mol) | IS-score | Rating | Real Contacts | Model Contacts | Common Contacts | fnat | fnon-nat |
|---|---|---|---|---|---|---|---|---|
| 1i | −21.85 | 0.2366 | Near Native | 17 | 27 | 16 | 0.94 | 0.41 |
| 2i | −21.69 | 0.0648 | Incorrect | 0 | 39 | 0 | 0.00 | 1.00 |
| 2ii | −63.70 | 0.0751 | Incorrect | 0 | 72 | 0 | 0.00 | 1.00 |
| 4 | −68.75 | 0.4041 | Near Native | 21 | 38 | 18 | 0.86 | 0.53 |
| 5 | −74.40 | 0.6640 | Near Native | 65 | 74 | 55 | 0.85 | 0.26 |
| 8 | −46.33 | 0.3775 | Near Native | 38 | 46 | 21 | 0.55 | 0.54 |
| 9 | −114.68 | 0.6956 | Near Native | 86 | 113 | 80 | 0.93 | 0.29 |
| 11i | −63.43 | 0.6691 | Near Native | 47 | 57 | 41 | 0.87 | 0.28 |
| 11ii | −64.65 | 0.6036 | Near Native | 38 | 46 | 34 | 0.89 | 0.26 |
| 12i | −57.47 | 0.4789 | Near Native | 15 | 29 | 15 | 1.00 | 0.48 |
| 12ii | −102.61 | 0.5916 | Near Native | 53 | 76 | 47 | 0.89 | 0.38 |
| 13 | −106.01 | 0.5890 | Near Native | 46 | 58 | 41 | 0.89 | 0.29 |
| 14 | −144.40 | 0.7976 | Near Native | 79 | 85 | 68 | 0.86 | 0.20 |
| 15 | −63.74 | 0.6283 | Near Native | 53 | 71 | 48 | 0.91 | 0.32 |
| 16 | −49.32 | 0.6967 | Near Native | 62 | 78 | 56 | 0.90 | 0.28 |
| 17 | −21.97 | 0.0659 | Incorrect | 0 | 32 | 0 | 0.00 | 1.00 |
| 18i | −20.13 | 0.4864 | Near Native | 19 | 23 | 18 | 0.95 | 0.22 |
| 18ii | −50.84 | 0.4942 | Near Native | 35 | 38 | 26 | 0.74 | 0.32 |
| 19 | −140.42 | 0.9363 | Near Native | 93 | 100 | 93 | 1.00 | 0.07 |
| 20 | −58.42 | 0.6201 | Near Native | 38 | 44 | 29 | 0.76 | 0.34 |
| 21 | −14.77 | 0.1770 | Near Native | 10 | 21 | 6 | 0.60 | 0.71 |
| 22 | −40.35 | 0.0696 | Incorrect | 0 | 50 | 0 | 0.00 | 1.00 |
| 23 | −42.70 | 0.0679 | Incorrect | 0 | 36 | 0 | 0.00 | 1.00 |
| 24 | −43.80 | 0.0700 | Incorrect | 9 | 43 | 0 | 0.00 | 1.00 |
PRISM was also run without using the data of the bound forms in the benchmark dataset to eliminate any bias. The results of this scenario (see Materials and Methods, section “Eliminating bias by discarding information on the bound form”) are given in Table S9. Enlarging the target set by including alternative conformations increased the number of energetically favorable predictions from 6 to 21 out of 28 interactions (5 of these 6 were predicted better, with lower global energy), with the cut-off energy set at −10 kJ/mol. Favorable interactions together with their templates and global energy values are listed in Tables S10a and S10b. IS-scores and contact fractions are provided in Tables S11a and S11b. According to the IS-score classification, all 6 predictions obtained using standard target sets were ‘near native’, and among predictions using enlarged target sets, 14 cases were ‘near native’ and 7 cases were ‘incorrect’. When we eliminated incorrect predictions, there were 6 successful predictions with the standard target set and 14 with the enlarged set. In addition, higher IS-scores were obtained for predictions with the enlarged target set (except for case 16, which had a better energy but a lower IS-score).
95% sequence homology was used to enlarge the target set more
Up to here, we considered 100% sequence homology for alternative conformations from the PDB. Analysis of the sequence homology between unbound and bound forms given in the benchmark dataset illustrates that some are not 100% similar (Table S12). However, reducing sequence homology much below 100% can lead to picking different proteins. To enlarge the target set, we reduced this value to 95% for cases in which we could not obtain successful predictions (16 interactions: 1ii, 2i, 2ii, 3, 4, 6, 7, 10, 14, 17, 18i, 20, 22, 23, 24 and 25; cases 4, 14, 18i and 20 were added to this list because they gave unsuccessful results when bound data were eliminated from the template and target sets). We used the same criteria to identify structurally different conformations as described in Materials and Methods, section “Enlarging the Target Dataset Using Different Conformations”. These additional alternative conformations are given in Table S13 (more than one chain name indicates that the molecule is similar to each chain). No additional conformations were detected for the unbound forms of cases 7, 10, 22, 23 and 24. Although additional favorable predictions were obtained for cases 1ii, 2i, 2ii, 3, 4, 17, 20 and 25 by using these conformations (Table S14), successful results were obtained only for cases 1ii, 3 and 14 (IS-score > 0.12). Energy values, IS-scores and contact fractions are given in Table S15.
To enlarge the target sets, we also tested 90% and 85% sequence homology. Lower sequence homology yields more structures; however, it increases the risk of picking a different protein. For target sets enlarged with 90% sequence homology, we tested all templates. Due to computational cost, for target sets enlarged with 85% sequence homology, we tested templates that successfully matched target proteins found with higher sequence homology. At 95% sequence homology, successful results were obtained for 3 cases; the 90% and 85% sequence homology cases failed to give additional successful predictions (data not shown). 95% sequence homology covers the sequence difference between unbound and bound forms in the benchmark dataset, where the lowest sequence similarity is 98.8% (Table S12). Therefore, we consider 95% sequence homology as appropriate to further enlarge the target set.
When we could use the structures of bound forms, the success was 18 out of 30 ‘difficult’ interactions at 100% sequence homology, and considering the additional successful results of cases 1ii and 3 increased the number to 20. When we did not use the data of the bound forms (see Materials and Methods, section “Eliminating bias by discarding information on the bound form”), the success was 14 out of 28 ‘difficult’ interactions at 100% sequence homology, and considering the additional successful results of cases 1ii and 14, it increased to 16. The final success rates are given in Table 2. We also present the ratio of successfully predicted results vs. favorable predictions in Table 3, which indicates the percentage of predictions that successfully match the bound forms in the benchmark dataset. When the standard target set was used together with the standard or modified template set, almost all predictions (except one obtained with the standard target and template sets) matched the bound forms in the benchmark dataset. When the target set was enlarged with alternative conformations, more than 76% of the favorable predictions matched the bound forms under both scenarios. Figure 7 presents the distribution of the predictions based on their energies and IS-scores. Predictions with relatively lower energies can have low or high IS-scores; predictions with relatively higher energies have high IS-scores. Predictions with favorable energies but low IS-scores (red triangles) might correspond to alternative binding modes of the proteins. The benchmark offers favorable interactions of protein pairs; yet, other conformations of the proteins may also interact in a different way than the bound forms given in the benchmark dataset, and we may capture such interactions. We checked whether those interactions can be biological using the machine learning tools NOXclass,69 DiMoVo70 and EPPIC71 (Table S16). 
Default parameters, interface area, interface area ratio and area-based amino acid composition, were used in NOXclass evaluations; cut-off value was chosen as 0.5 in the DiMoVo as indicated there; the EPPIC server directly indicates if the interface is biological or the outcome of crystal packing. NOXclass evaluated 13 interactions, DiMoVo evaluated 2 interactions and EPPIC evaluated 5 out of 18 interactions as biological. 2 interactions were evaluated as biological by all three tools and 13 interactions by at least one tool. Machine learning tools depend on their training datasets; however, evaluation of some interactions as biological indicates that these interactions might be biological although structurally they are not similar to the bound forms given in the benchmark dataset.
Table 2. The success of PRISM predictions.
| Cases | Total Interactions (Number) | Predicted Interactions Using Standard Target Set (Number) | Predicted Interactions Using Standard Target Set (Percentage) | Predicted Interactions Using Enlarged Target Set (Number) | Predicted Interactions Using Enlarged Target Set (Percentage) |
|---|---|---|---|---|---|
| Difficult cases (also using data of bound forms) | 30 | 8 | 26.7% | 20 | 66.7% |
| Difficult cases (not using data of bound forms) | 28 | 6 | 21.4% | 16 | 57.1% |
Table 3. Successful PRISM predictions in energetically favorable PRISM predictions.
| Predictions | Favorable PRISM Predictions (Energy < −10 kJ/mol) | Successful PRISM Predictions (IS-score > 0.12) | Percentage of favorable PRISM results matching bound forms |
|---|---|---|---|
| Standard Target Set, Standard Template Set | 9 | 8 | 88.9% |
| Enlarged Target Set, Standard Template Set | 26 | 20 | 76.9% |
| Standard Target Set, Unbiased Template Set | 6 | 6 | 100% |
| Enlarged Target Set, Unbiased Template Set | 21 | 16 | 76.2% |
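The percentages in Tables 2 and 3 follow directly from the reported counts. The short Python sketch below recomputes them as a consistency check; the counts are transcribed from the two tables, while the dictionary keys and the helper name `percentage` are our own illustrative choices.

```python
def percentage(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)

# Table 2: (total interactions, predicted with standard set, predicted with enlarged set)
table2 = {
    "bound-form data used": (30, 8, 20),
    "bound-form data not used": (28, 6, 16),
}

# Table 3: (favorable predictions, successful predictions)
table3 = {
    "standard target, standard template": (9, 8),
    "enlarged target, standard template": (26, 20),
    "standard target, unbiased template": (6, 6),
    "enlarged target, unbiased template": (21, 16),
}

# Success rate = successfully predicted interactions / total interactions
success_rates = {
    case: (percentage(std, total), percentage(enl, total))
    for case, (total, std, enl) in table2.items()
}

# Match ratio = successful predictions / energetically favorable predictions
match_ratios = {
    name: percentage(successful, favorable)
    for name, (favorable, successful) in table3.items()
}

for case, (std_rate, enl_rate) in success_rates.items():
    print(f"{case}: standard {std_rate}%, enlarged {enl_rate}%")
for name, ratio in match_ratios.items():
    print(f"{name}: {ratio}% of favorable predictions match the bound form")
```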
Figure 7. Distribution of predictions based on energy and IS-score.
Red triangles show incorrect predictions (IS-score < 0.12). The green square represents an acceptable prediction (0.12 < IS-score < 0.17). Blue diamonds show near-native predictions (IS-score > 0.17). The predictions are distributed in a triangular profile, in which predictions with higher energies do not have low IS-scores. We should note that the predictions with favorable energies but low IS-scores (red triangles) might correspond to alternative binding modes of the proteins.
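The thresholds used in Table 3 and Figure 7 can be combined into a simple labeling rule. The following Python sketch is one possible reading, not part of PRISM: the cut-off values (−10 kJ/mol, 0.12 and 0.17) are taken from the text, whereas the function name, the label strings and the handling of values that fall exactly on a threshold are assumptions made for illustration.

```python
ENERGY_CUTOFF = -10.0    # kJ/mol; predictions below this value are "favorable"
ACCEPTABLE_IS = 0.12     # IS-score above this value: acceptable prediction
NEAR_NATIVE_IS = 0.17    # IS-score above this value: near-native prediction

def classify_prediction(energy, is_score):
    """Label one prediction from its energy (kJ/mol) and IS-score.

    Boundary values are assigned to the lower category; this is an
    assumption, since the text does not specify how ties are handled.
    """
    if energy >= ENERGY_CUTOFF:
        return "unfavorable"
    if is_score > NEAR_NATIVE_IS:
        return "near-native"
    if is_score > ACCEPTABLE_IS:
        return "acceptable"
    # Favorable energy but low IS-score: possibly an alternative binding mode
    return "incorrect"

# Hypothetical (energy, IS-score) pairs, for illustration only
examples = [(-25.0, 0.30), (-25.0, 0.15), (-25.0, 0.05), (-5.0, 0.30)]
for energy, is_score in examples:
    print(energy, is_score, classify_prediction(energy, is_score))
```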
A case study: ERK interactions
We tested this method on the extracellular signal-regulated kinase (ERK) interactions. In the KEGG database,72 interactions among the proteins ERK, MEK1, MEK2, MP1, RSK2 and Mnk1/2 are given as in Figure 8. In this ‘classical’ edge-and-node representation, nodes are the proteins and edges are the interactions (Figures 8 and 9A). This protein-protein interaction network provides information on which proteins interact, but not on how they interact. Figure 9A presents an example: MEK1 and MEK2 interact with ERK; however, it is unclear whether these two proteins can (or cannot) interact with ERK simultaneously to form a trimer. In contrast, structural data provide the contacting residues and the interfaces of the interacting proteins, which allow distinguishing between distinct and overlapping protein-protein interactions.57 Panchenko’s group has also shown that the structurally inferred interaction network is more functionally coherent.73 We constructed the structural network of the ERK interactions, first using one conformation of each protein (Figure 9B), and then including alternative conformations (Figure 9C). Interacting conformations are represented by an edge and binding sites are shown by small rectangles on the nodes.74 As we show below, considering alternative conformations can improve the network. Alternative conformations were found in the PDB using 100% sequence homology. PDB IDs of the conformations are given in Table S17. The best results with respect to their energy values for the query proteins and their conformations are shown in Tables S18a and S18b (if the chain name is not indicated, it is the whole structure). In the first case, 5 predictions were obtained and 4 of them were energetically favorable when the cut-off energy value was −10 kJ/mol (Figure 9B, Table S18a). One of these was between MEK1 and MEK2, shown in Figure 10A.
However, in the second case, 7 interactions were predicted, all of them favorable, and better results (with lower energy values) were obtained for the previous predictions (Figure 9C, Table S18b). For example, the energy of the MEK1-MEK2 interaction prediction decreased from −23.84 kJ/mol to −78.99 kJ/mol when a different conformation of MEK1 (PDB ID 3e8n instead of 3eqc) was utilized. Compared to the previous conformation of MEK1, this conformation is structurally more complementary to MEK2 (Figure 10B), which indicates that utilizing alternative conformations can improve the predictions and thus the structural pathways.
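Selecting the reported prediction for a protein pair amounts to keeping the lowest-energy result among all conformation pairs that pass the energy cut-off. The Python sketch below illustrates this with the two MEK1-MEK2 predictions quoted in the text and in Figure 10; the `Prediction` record and the `best_prediction` helper are hypothetical names introduced here and are not part of PRISM.

```python
from collections import namedtuple

# One predicted interaction between two target chains (energy in kJ/mol)
Prediction = namedtuple("Prediction", ["target_a", "target_b", "energy"])

# MEK1-MEK2 predictions quoted in the text (MEK1 chain, MEK2 chain, energy)
mek1_mek2 = [
    Prediction("3eqcA", "1s9iA", -23.84),
    Prediction("3e8nA", "1s9iA", -78.99),
]

def best_prediction(predictions, cutoff=-10.0):
    """Return the lowest-energy prediction below the cut-off, or None."""
    favorable = [p for p in predictions if p.energy < cutoff]
    if not favorable:
        return None
    return min(favorable, key=lambda p: p.energy)

best = best_prediction(mek1_mek2)
print(best.target_a, best.target_b, best.energy)
```

With more conformations in the target set, the list of candidate pairs grows, and the minimum can only stay the same or decrease, which is consistent with the lower energies obtained in the second case.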
Figure 8. MAPK signaling pathway given in the KEGG database.
Nodes are the proteins and edges are the interactions. Gray lines indicate the membrane. The data is taken from the KEGG database. The focus is on ERK interactions with MEK1, MEK2, MP1, Mnk1/2 and RSK2.
Figure 9. Interactions of ERK protein as given in the KEGG database and predicted by PRISM.
A) ERK interactions with MEK1, MEK2, MP1, Mnk1/2 and RSK2. The data is taken from the KEGG database. B) PRISM predictions obtained by using one conformation of the ERK interacting proteins. Different colors are for different proteins. Binding sites are shown as little rectangles. If two interactions are through the same binding site, edges are connected to the same little box. C) PRISM predictions obtained by using alternative conformations of the ERK interacting proteins. Alternative conformations are found from the PDB using 100% sequence homology.
Figure 10. Predicted interactions of MEK1-MEK2.
A) Predicted interaction of MEK1-MEK2; PDB IDs are 3eqcA (MEK1) and 1s9iA (MEK2). PRISM calculated the energy as −23.84 kJ/mol. The red box focuses on the helices where the two structures are closest to each other. B) Predicted interaction of MEK1-MEK2; PDB IDs are 3e8nA (MEK1) and 1s9iA (MEK2). PRISM calculated the energy as −78.99 kJ/mol, lower than the energy calculated in part A. The red box focuses on the helices where the two structures are closest to each other. It shows that the helices are more complementary to each other than in the interaction in part A.
Some of the predicted interactions have been shown experimentally. MEK1 activates ERK1 via phosphorylation. MEK1 residues 33-393, which cover all interacting residues in our model, interact with ERK.75 Deletion of ERK residues 241-272, which in our model contribute 10 out of the 13 interacting residues, abolishes the interaction. However, this deletion does not affect the ERK-Mnk association, and in our model the interaction is not through this portion of the surface. MP1 is a MEK1 scaffolding protein which regulates cell spreading. In the MEK1-MP1 interaction, binding is through MEK1 residues 220-393, which in our model include all interacting residues (without contribution from MEK1 residues 1-219).76 No MEK2-MP1 interaction was detected in the two-hybrid system in agreement with our predictions. Kinase suppressor of Ras (KSR) is a scaffolding protein for the Raf-MEK-ERK complex. KSR2 promotes phosphorylation of MEK1 and the crystal structure of this heterodimer was obtained (PDB ID: 2y4i).77 Since the KEGG database does not include KSR2 in the MAPK pathway, we also did not include this protein in our predictions. However, PRISM could predict the KSR2-MEK1 interaction (data not shown). MEK1-MEK2 interaction determines the strength and duration of the ERK signal78 and the ERK-RSK interaction is observed in response to various stimuli, like growth factors, polypeptide hormones, neurotransmitters, and chemokines.79 MEK1 and MEK2 signaling stimulate ribosomal S6 kinase (RSK).80
Using a structural protein-protein interaction network, we are able to observe which interactions can (or cannot) occur simultaneously. If two proteins interact through different sites of a third protein and there is no spatial clash, the two interactions can occur simultaneously, forming a trimer. For example, a trimer of ERK-MEK1-MEK2 can form, as shown in Figure 11. The energetically most favorable predictions were 3mblA-2ojgA (PDB codes) for MEK1-ERK and 3e8nA-1s9iA for MEK1-MEK2. These formed via two different conformations of MEK1, 3mblA and 3e8nA. In the construction of the trimer, we considered one of these MEK1 conformations, 3mblA. The most favorable interaction of 3mblA with MEK2 was 3mblA-1s9iB (energy: −62.75 kJ/mol, template: 1s9iAB). Since ERK and MEK2 bind through different sites of MEK1 and their residues did not clash, a trimer could be obtained based on the MEK1-ERK and MEK1-MEK2 interactions. The trimer structure was modeled by superimposing MEK1 (3mblA) of MEK1-ERK (3mblA-2ojgA) and MEK1-MEK2 (3mblA-1s9iB). Figure 12 presents the most favorable interacting conformations of ERK, obtained by using alternative conformations. Different conformations of the same protein are given by different structures in the same color. Collectively, these show which conformation of a protein is more favorable for interacting with another protein, and indicate that the protein switches its conformation while switching its interacting partner, information that is valuable for figuring out pathway regulation.
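The simultaneity criterion described above (different binding sites on the shared protein and no spatial clash between the two partners) can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: binding sites are represented as sets of residue numbers on the shared protein, partners as lists of atom coordinates in a common frame after superimposition, and the 3.0 Å clash distance is a hypothetical threshold, not a value from this study.

```python
import math
from itertools import product

def sites_overlap(site_a, site_b):
    """True if the two binding sites share at least one residue of the hub protein."""
    return bool(set(site_a) & set(site_b))

def has_clash(coords_a, coords_b, min_distance=3.0):
    """True if any atom pair of the two partners is closer than min_distance (in Å).

    The default min_distance is an illustrative assumption.
    """
    return any(
        math.dist(p, q) < min_distance
        for p, q in product(coords_a, coords_b)
    )

def can_bind_simultaneously(site_a, site_b, coords_a, coords_b):
    """Two partners can form a trimer with the hub if their sites are
    distinct and their atoms do not clash."""
    return not sites_overlap(site_a, site_b) and not has_clash(coords_a, coords_b)

# Hypothetical residue numbers and coordinates, for illustration only
site_partner1 = {10, 11, 12}
site_partner2 = {200, 201}
coords_partner1 = [(0.0, 0.0, 0.0)]
coords_partner2 = [(10.0, 0.0, 0.0)]
print(can_bind_simultaneously(site_partner1, site_partner2,
                              coords_partner1, coords_partner2))
```

The all-pairs distance check is quadratic in the number of atoms; for full structures, a spatial index such as a k-d tree would be the usual substitute.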
Figure 11. Trimer structure of ERK, MEK1 and MEK2.
The trimer structure was obtained by superimposing the MEK1 protein (3mblA) of the MEK1-ERK (3mblA-2ojgA) and MEK1-MEK2 (3mblA-1s9iB) interactions predicted by PRISM. Blue: MEK1 protein, green: ERK protein and orange: MEK2 protein.
Figure 12. Interacting conformations of ERK interacting proteins.
Interactions are energetically most favorable PRISM predictions obtained by using alternative conformations. Different conformations of the same protein are given by different structures in the same color.
CONCLUSIONS
Identification of PPIs is a first step in the elucidation of the interactome-scale structural network. Toward this aim, a tool that helps in modeling the structural pathway can be very useful. Here, we investigated whether different conformations of the query proteins found in the PDB could increase the success of PPI identification by a motif-based, knowledge-based protein-protein interaction prediction tool. Knowledge-based methods can successfully predict an interaction between proteins and its three-dimensional structure if a physically and biologically meaningful interaction exists in the database. However, they neither consider the conformational space nor model conformational changes. Flexibility is considered only in the refinement process, which is the last step in the prediction, where side-chain and slight backbone re-orientations are carried out. In principle, the capability of knowledge-based methods like PRISM to consider protein flexibility is limited by the supply of different conformations to the target set. Adding different conformations to the target set and searching for interactions among these can be expected to improve the predictions. In this study, we tested such a strategy. Possible conformations of the query proteins were extracted from the PDB and treated as individual structures in the predictions. These structures were guaranteed to belong to the same protein by requiring a high percentage of sequence homology (first 100%, and then 95%). Different conformations were identified by structural similarity to eliminate redundancy.
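Eliminating redundant conformations by structural similarity can be illustrated with a generic greedy filter: a structure is kept only if it differs sufficiently from every structure already kept. The Python sketch below is not the procedure used in this study (which is described in Materials and Methods); the function name, the use of RMSD as the similarity measure and the 2.0 Å cut-off are assumptions made for illustration.

```python
def select_distinct_conformations(ids, rmsd, cutoff=2.0):
    """Greedily keep conformations that differ from all kept ones.

    ids    -- candidate structure identifiers, in order of preference
    rmsd   -- function (id_a, id_b) -> structural distance in Å
    cutoff -- minimum distance to count as a different conformation
              (hypothetical value)
    """
    selected = []
    for cid in ids:
        if all(rmsd(cid, kept) > cutoff for kept in selected):
            selected.append(cid)
    return selected

# Hypothetical pairwise distances between three candidate structures
pairwise = {
    frozenset(("c1", "c2")): 0.5,   # c2 is nearly identical to c1
    frozenset(("c1", "c3")): 3.1,
    frozenset(("c2", "c3")): 3.0,
}

def lookup_rmsd(a, b):
    """Symmetric lookup in the precomputed distance table."""
    return pairwise[frozenset((a, b))]

print(select_distinct_conformations(["c1", "c2", "c3"], lookup_rmsd))
```

The result of a greedy filter depends on the input order, so candidates would typically be sorted beforehand by a preference criterion such as resolution.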
Adding sufficiently different conformations of proteins to the target set may allow predicting the interaction of two proteins in any form. However, two factors limit the capability of template-based methods to correctly predict the native interaction: (i) the template set organization, and (ii) the availability of the ‘right’ conformations of the query proteins. For the first, it is important that the template structures represent the interactions in their corresponding clusters; otherwise, appropriate structural similarity cannot be found between the target and template structures. For the second, a major limitation of the prediction success is the coverage of the PDB. If the appropriate conformation exists in the PDB, it can be found and the interaction can be predicted successfully; in its absence the prediction will fail. The exponentially increasing number of structures in the PDB promises a high likelihood of finding the ‘right’ conformations of the query proteins. More successful predictions will be obtained as the PDB covers more structures. Even with these limitations, knowledge-based approaches offer successful predictions. A large-scale, systematic study40 indicates that knowledge-based approaches are more reliable than ‘blind’ docking strategies. Appropriate templates can be extracted to model nearly all complexes of structurally characterized proteins, although there are currently a limited number of protein-protein complexes in the PDB.
In the specific example here with PRISM, successful prediction was achieved for almost all the ‘easy’ cases (87 out of 88) of the docking benchmark dataset,59 and, considering other conformations in the PDB, the ‘difficult’ cases of this dataset were predicted with a 66.7% success rate. A success rate of 57.1% was obtained in unbiased predictions. Previous studies predicted interactions in signaling pathways (76% and 78% accuracies in the ubiquitination55 and apoptosis56 pathways, respectively) and constructed structural networks.57 Here, we modeled the structural network of ERK, first based on one conformation of each protein; then, by considering alternative conformations in the PDB, an improved structural network was obtained. Given the limited number of architectural motifs in single-chain proteins and, as we have shown earlier, in protein-protein interfaces,43, 81-83 and the difficulties facing large-scale applications of ‘blind’ docking methods, knowledge-based approaches are a reasonable avenue for proteome-scale cellular network constructions. Fast incorporation of experiment-based flexibility into these is advantageous. While vastly incomplete, and with limited sampling, it is a reasonable and reliable strategy to adopt.
ACKNOWLEDGMENT
Guray Kuzu is supported by a TUBITAK (The Scientific and Technological Research Council of Turkey) fellowship. This work has been supported by TUBITAK, Research Grant Number: 109T343. It has also been funded in whole or in part with Federal funds from the National Cancer Institute, National Institutes of Health, under contract number HHSN261200800001E. The content of this publication does not necessarily reflect the views or policies of the Department of Health and Human Services, nor does mention of trade names, commercial products, or organizations imply endorsement by the U.S. Government. This research was supported (in part) by the Intramural Research Program of the NIH, National Cancer Institute, Center for Cancer Research.
Footnotes
ASSOCIATED CONTENT
Table S1. Difficult cases of the docking benchmark dataset. Table S2. Other conformations of given molecules in difficult cases of the benchmark dataset, found using 100% sequence homology. Table S3. FiberDock energies of the bound forms given in the benchmark dataset. Table S4. Alternative structures as the substitute of the bound form. Table S5. Changes in template organization. Table S6. Energy values of PRISM results obtained using standard template set. Table S7a. Favorable PRISM predictions of difficult cases using standard template set and standard target set. Table S7b. Favorable PRISM predictions of difficult cases using standard template set and enlarged target set. Table S8. Energies, IS-scores and contact fractions of PRISM predictions obtained using standard template set and standard target set. Table S9. Energy values of PRISM results obtained using rearranged templates and target sets. Table S10a. Favorable PRISM predictions of difficult cases using rearranged templates and standard target set. Table S10b. Favorable PRISM predictions of difficult cases using rearranged templates and enlarged target set. Table S11a. Energies, IS-scores and contact fractions of PRISM predictions obtained using rearranged templates and standard target set. Table S11b. Energies, IS-scores and contact fractions of PRISM predictions obtained using rearranged templates and enlarged target set. Table S12. Sequence similarity of bound and unbound forms given in the benchmark dataset. Table S13. Additional conformations of the query molecules found when the sequence homology is lowered from 100% to 95%. Table S14. Favorable PRISM predictions of difficult cases obtained using additional conformations found by 95% sequence homology. Table S15. Energies, IS-scores and contact fractions of PRISM predictions obtained using additional conformations found by 95% sequence homology. Table S16. 
Biological/non-biological interaction test results of predictions with favorable energies but lower IS-scores. Table S17. Different conformations of ERK interacting proteins. Table S18a. Energetically most favorable PRISM predictions of the query ERK interacting proteins. Table S18b. Energetically most favorable PRISM predictions obtained using alternative conformations of the ERK interacting proteins.
REFERENCES
- (1).Song JM, Singh M. How and when should interactome-derived clusters be used to predict functional modules and protein function? Bioinformatics. 2009;25(23):3143–3150. doi: 10.1093/bioinformatics/btp551.
- (2).Glatter T, Wepf A, Aebersold R, Gstaiger M. An integrated workflow for charting the human interaction proteome: insights into the PP2A system. Molecular Systems Biology. 2009;5. doi: 10.1038/msb.2008.75.
- (3).Breitkreutz A, Choi H, Sharom JR, Boucher L, Neduva V, Larsen B, Lin ZY, Breitkreutz BJ, Stark C, Liu GM, Ahn J, Dewar-Darch D, Reguly T, Tang XJ, Almeida R, Qin ZS, Pawson T, Gingras AC, Nesvizhskii AI, Tyers M. A Global Protein Kinase and Phosphatase Interaction Network in Yeast. Science. 2010;328(5981):1043–1046. doi: 10.1126/science.1176495.
- (4).Baker CL, Kettenbach AN, Loros JJ, Gerber SA, Dunlap JC. Quantitative Proteomics Reveals a Dynamic Interactome and Phase-Specific Phosphorylation in the Neurospora Circadian Clock. Molecular Cell. 2009;34(3):354–363. doi: 10.1016/j.molcel.2009.04.023.
- (5).Nibbe RK, Markowitz S, Myeroff L, Ewing R, Chance MR. Discovery and Scoring of Protein Interaction Subnetworks Discriminative of Late Stage Human Colon Cancer. Molecular & Cellular Proteomics. 2009;8(4):827–845. doi: 10.1074/mcp.M800428-MCP200.
- (6).Kreeger PK, Lauffenburger DA. Cancer systems biology: a network modeling perspective. Carcinogenesis. 2010;31(1):2–8. doi: 10.1093/carcin/bgp261.
- (7).Bell R, Hubbard A, Chettier R, Chen D, Miller JP, Kapahi P, Tarnopolsky M, Sahasrabuhde S, Melov S, Hughes RE. A Human Protein Interaction Network Shows Conservation of Aging Processes between Human and Invertebrate Species. PLoS Genetics. 2009;5(3). doi: 10.1371/journal.pgen.1000414.
- (8).Fossum E, Friedel CC, Rajagopala SV, Titz B, Baiker A, Schmidt T, Kraus T, Stellberger T, Rutenberg C, Suthram S, Bandyopadhyay S, Rose D, von Brunn A, Uhlmann M, Zeretzke C, Dong YA, Boulet H, Koegl M, Bailer SM, Koszinowski U, Ideker T, Uetz P, Zimmer R, Haas J. Evolutionarily Conserved Herpesviral Protein Interaction Networks. PLoS Pathogens. 2009;5(9). doi: 10.1371/journal.ppat.1000570.
- (9).Wiles AM, Doderer M, Ruan JH, Gu TT, Ravi D, Blackman B, Bishop AJR. Building and analyzing protein interactome networks by cross-species comparisons. BMC Systems Biology. 2010;4. doi: 10.1186/1752-0509-4-36.
- (10).Chautard E, Thierry-Mieg N, Ricard-Blum S. Interaction networks: From protein functions to drug discovery. A review. Pathologie Biologie. 2009;57(4):324–333. doi: 10.1016/j.patbio.2008.10.004.
- (11).Barabasi AL, Gulbahce N, Loscalzo J. Network medicine: a network-based approach to human disease. Nature Reviews Genetics. 2011;12(1):56–68. doi: 10.1038/nrg2918.
- (12).West GM, Tucker CL, Xu T, Park SK, Han XM, Yates JR, Fitzgerald MC. Quantitative proteomics approach for identifying protein-drug interactions in complex mixtures using protein stability measurements. Proceedings of the National Academy of Sciences of the United States of America. 2010;107(20):9078–9082. doi: 10.1073/pnas.1000148107.
- (13).Bruckner A, Polge C, Lentze N, Auerbach D, Schlattner U. Yeast Two-Hybrid, a Powerful Tool for Systems Biology. International Journal of Molecular Sciences. 2009;10(6):2763–2788. doi: 10.3390/ijms10062763.
- (14).Pande J, Szewczyk MM, Grover AK. Phage display: Concept, innovations, applications and future. Biotechnology Advances. 2010;28(6):849–858. doi: 10.1016/j.biotechadv.2010.07.004.
- (15).Katz C, Levy-Beladev L, Rotem-Bamberger S, Rito T, Rudiger SGD, Friedler A. Studying protein-protein interactions using peptide arrays. Chemical Society Reviews. 2011;40(5):2131–2145. doi: 10.1039/c0cs00029a.
- (16).Kim EDH, Sabharwal A, Vetta AR, Blanchette M. Predicting direct protein interactions from affinity purification mass spectrometry data. Algorithms for Molecular Biology. 2010;5. doi: 10.1186/1748-7188-5-34.
- (17).Chen PY, Deane CM, Reinert G. Predicting and Validating Protein Interactions Using Network Structure. PLoS Computational Biology. 2008;4(7). doi: 10.1371/journal.pcbi.1000118.
- (18).Hart GT, Ramani AK, Marcotte EM. How complete are current yeast and human protein-interaction networks? Genome Biology. 2006;7(11). doi: 10.1186/gb-2006-7-11-120.
- (19).von Mering C, Krause R, Snel B, Cornell M, Oliver SG, Fields S, Bork P. Comparative assessment of large-scale data sets of protein-protein interactions. Nature. 2002;417(6887):399–403. doi: 10.1038/nature750.
- (20).Deane CM, Salwinski L, Xenarios I, Eisenberg D. Protein interactions - Two methods for assessment of the reliability of high throughput observations. Molecular & Cellular Proteomics. 2002;1(5):349–356. doi: 10.1074/mcp.m100037-mcp200.
- (21).Fourme R, Girard E, Kahn R, Prange T, Dhaussy AC, Mezouar M, Ascone I. High-resolution structures and properties of biomolecules under high pressures probed by X-ray crystallography. High Pressure Research. 2010;30(1):100–103.
- (22).O’Connell MR, Gamsjaeger R, Mackay JP. The structural analysis of protein-protein interactions by NMR spectroscopy. Proteomics. 2009;9(23):5224–5232. doi: 10.1002/pmic.200900303.
- (23).Zhou ZH. Atomic resolution cryo electron microscopy of macromolecular complexes. In: Ludtke SJ, Prasad BVV, editors. Advances in Protein Chemistry and Structural Biology, Vol 82: Recent Advances in Electron Cryomicroscopy, Pt B. 2011. pp. 1–35.
- (24).Lipfert J, Doniach S. Small-angle X-ray scattering from RNA, proteins, and protein complexes. Annual Review of Biophysics and Biomolecular Structure. 2007;36:307–327. doi: 10.1146/annurev.biophys.36.040306.132655.
- (25).Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN, Bourne PE. The Protein Data Bank. Nucleic Acids Research. 2000;28(1):235–242. doi: 10.1093/nar/28.1.235.
- (26).De Vries SJ, van Dijk M, Bonvin A. The HADDOCK web server for data-driven biomolecular docking. Nature Protocols. 2010;5(5):883–897. doi: 10.1038/nprot.2010.32.
- (27).Garzon JI, Lopez-Blanco JR, Pons C, Kovacs J, Abagyan R, Fernandez-Recio J, Chacon P. FRODOCK: a new approach for fast rotational protein-protein docking. Bioinformatics. 2009;25(19):2544–2551. doi: 10.1093/bioinformatics/btp447.
- (28).Chen R, Li L, Weng ZP. ZDOCK: An initial-stage protein-docking algorithm. Proteins-Structure Function and Genetics. 2003;52(1):80–87. doi: 10.1002/prot.10389.
- (29).Li L, Guo DC, Huang YY, Liu SY, Xiao Y. ASPDock: protein-protein docking algorithm using atomic solvation parameters model. BMC Bioinformatics. 2011;12. doi: 10.1186/1471-2105-12-36.
- (30).Chen HL, Skolnick J. M-TASSER: An algorithm for protein quaternary structure prediction. Biophysical Journal. 2008;94(3):918–928. doi: 10.1529/biophysj.107.114280.
- (31).Gao M, Skolnick J. iAlign: a method for the structural comparison of protein-protein interfaces. Bioinformatics. 2010;26(18):2259–2265. doi: 10.1093/bioinformatics/btq404.
- (32).Konc J, Janezic D. ProBiS algorithm for detection of structurally similar protein binding sites by local structural alignment. Bioinformatics. 2010;26(9):1160–1168. doi: 10.1093/bioinformatics/btq100.
- (33).Ogmen U, Keskin O, Aytuna AS, Nussinov R, Gursoy A. PRISM: protein interactions by structural matching. Nucleic Acids Research. 2005;33:W331–W336. doi: 10.1093/nar/gki585.
- (34).Sinha R, Kundrotas PJ, Vakser IA. Docking by structural similarity at protein-protein interfaces. Proteins-Structure Function and Bioinformatics. 2010;78(15):3235–3241. doi: 10.1002/prot.22812.
- (35).Tuncbag N, Gursoy A, Nussinov R, Keskin O. Predicting protein-protein interactions on a proteome scale by matching evolutionary and structural similarities at interfaces using PRISM. Nature Protocols. 2011;6(9):1341–1354. doi: 10.1038/nprot.2011.367.
- (36).Tsai CJ, Lin SL, Wolfson HJ, Nussinov R. A dataset of protein-protein interfaces generated with a sequence-order-independent comparison technique. Journal of Molecular Biology. 1996;260(4):604–620. doi: 10.1006/jmbi.1996.0424.
- (37).Tsai CJ, Lin SL, Wolfson HJ, Nussinov R. Protein-protein interfaces: Architectures and interactions in protein-protein interfaces and in protein cores. Their similarities and differences. Critical Reviews in Biochemistry and Molecular Biology. 1996;31(2):127–152. doi: 10.3109/10409239609106582.
- (38).Tuncbag N, Gursoy A, Guney E, Nussinov R, Keskin O. Architectures and functional coverage of protein-protein interfaces. Journal of Molecular Biology. 2008;381(3):785–802. doi: 10.1016/j.jmb.2008.04.071.
- (39).Keskin O, Nussinov R. Similar binding sites and different partners: Implications to shared proteins in cellular pathways. Structure. 2007;15(3):341–354. doi: 10.1016/j.str.2007.01.007.
- (40).Kundrotas PJ, Zhu ZW, Janin J, Vakser IA. Templates are available to model nearly all complexes of structurally characterized proteins. Proceedings of the National Academy of Sciences of the United States of America. 2012;109(24):9438–9441. doi: 10.1073/pnas.1200678109.
- (41).Bennett WS, Huber R. Structural and functional-aspects of domain motions in proteins. CRC Critical Reviews in Biochemistry. 1984;15(4):291–384. doi: 10.3109/10409238409117796.
- (42).Jacobs DJ, Rader AJ, Kuhn LA, Thorpe MF. Protein flexibility predictions using graph theory. Proteins-Structure Function and Genetics. 2001;44(2):150–165. doi: 10.1002/prot.1081.
- (43).Keskin O, Gursoy A, Ma B, Nussinov R. Principles of protein-protein interactions: What are the preferred ways for proteins to interact? Chemical Reviews. 2008;108(4):1225–1244. doi: 10.1021/cr040409x.
- (44).Ma BY, Shatsky M, Wolfson HJ, Nussinov R. Multiple diverse ligands binding at a single protein site: A matter of pre-existing populations. Protein Science. 2002;11(2):184–197. doi: 10.1110/ps.21302.
- (45).Dill KA. Polymer principles and protein folding. Protein Science. 1999;8(6):1166–1180. doi: 10.1110/ps.8.6.1166.
- (46).Frauenfelder H, Sligar SG, Wolynes PG. The energy landscapes and motions of proteins. Science. 1991;254(5038):1598–1603. doi: 10.1126/science.1749933.
- (47).Ozbabacan SEA, Gursoy A, Keskin O, Nussinov R. Conformational ensembles, signal transduction and residue hot spots: Application to drug discovery. Current Opinion in Drug Discovery & Development. 2010;13(5):527–537.
- (48).Tsai CJ, Del Sol A, Nussinov R. Protein allostery, signal transmission and dynamics: a classification scheme of allosteric mechanisms. Molecular Biosystems. 2009;5(3):207–216. doi: 10.1039/b819720b.
- (49).Ma BY, Kumar S, Tsai CJ, Nussinov R. Folding funnels and binding mechanisms. Protein Engineering. 1999;12(9):713–720. doi: 10.1093/protein/12.9.713.
- (50).Tsai CJ, Kumar S, Ma BY, Nussinov R. Folding funnels, binding funnels, and protein function. Protein Science. 1999;8(6):1181–1190. doi: 10.1110/ps.8.6.1181.
- (51).Tsai CJ, Ma BY, Nussinov R. Folding and binding cascades: Shifts in energy landscapes. Proceedings of the National Academy of Sciences of the United States of America. 1999;96(18):9970–9972. doi: 10.1073/pnas.96.18.9970.
- (52).Bakan A, Bahar I. Computational generation inhibitor-bound conformers of p38 map kinase and comparison with experiments. Pac Symp Biocomput. 2011:181–92. doi: 10.1142/9789814335058_0020.
- (53).Aytuna AS, Gursoy A, Keskin O. Prediction of protein-protein interactions by combining structure and sequence conservation in protein interfaces. Bioinformatics. 2005;21(12):2850–2855. doi: 10.1093/bioinformatics/bti443.
- (54).Hwang H, Vreven T, Janin J, Weng ZP. Protein-protein docking benchmark version 4.0. Proteins-Structure Function and Bioinformatics. 2010;78(15):3111–3114. doi: 10.1002/prot.22830.
- (55).Kar G, Keskin O, Nussinov R, Gursoy A. Human Proteome-scale Structural Modeling of E2-E3 Interactions Exploiting Interface Motifs. Journal of Proteome Research. 2012;11(2):1196–1207. doi: 10.1021/pr2009143.
- (56).Acuner Ozbabacan SE, Keskin O, Nussinov R, Gursoy A. Enriching the human apoptosis pathway by predicting the structures of protein-protein complexes. Journal of Structural Biology. 2012;179(3):338–346. doi: 10.1016/j.jsb.2012.02.002.
- (57).Kuzu G, Keskin O, Gursoy A, Nussinov R. Constructing structural networks of signaling pathways on the proteome scale. Current Opinion in Structural Biology. 2012;22(3):367–377. doi: 10.1016/j.sbi.2012.04.004.
- (58).Kar G, Gursoy A, Keskin O. Human Cancer Protein-Protein Interaction Network: A Structural Perspective. PLoS Computational Biology. 2009;5. doi: 10.1371/journal.pcbi.1000601.
- (59).Tuncbag N, Keskin O, Nussinov R, Gursoy A. Fast and accurate modeling of protein-protein interactions by combining template-interface-based docking with flexible refinement. Proteins. 2011. doi: 10.1002/prot.24022.
- (60).Gao M, Skolnick J. New benchmark metrics for protein-protein docking methods. Proteins-Structure Function and Bioinformatics. 2011;79(5):1623–1634. doi: 10.1002/prot.22987.
- (61).Pearson WR, Lipman DJ. Improved tools for biological sequence comparison. Proceedings of the National Academy of Sciences of the United States of America. 1988;85(8):2444–2448. doi: 10.1073/pnas.85.8.2444. [DOI] [PMC free article] [PubMed] [Google Scholar]
- (62).Hubbard SJ, Thornton JM. Naccess. University College; London: 1993. [Google Scholar]
- (63).Shatsky M, Nussinov R, Wolfson HJ. A method for simultaneous alignment of multiple protein structures. Proteins-Structure Function and Bioinformatics. 2004;56(1):143–156. doi: 10.1002/prot.10628. [DOI] [PubMed] [Google Scholar]
- (64).Mashiach E, Nussinov R, Wolfson HJ. FiberDock: a web server for flexible induced-fit backbone refinement in molecular docking. Nucleic Acids Research. 2010;38:W457–W461. doi: 10.1093/nar/gkq373. [DOI] [PMC free article] [PubMed] [Google Scholar]
- (65).Tuncbag N, Keskin O, Gursoy A. HotPoint: hot spot prediction server for protein interfaces. Nucleic Acids Research. 2010;38:W402–W406. doi: 10.1093/nar/gkq323. [DOI] [PMC free article] [PubMed] [Google Scholar]
- (66).Baglin TP, Carrell RW, Church FC, Esmon CT, Huntington JA. Crystal structures of native and thrombin-complexed heparin cofactor II reveal a multistep allosteric mechanism. Proceedings of the National Academy of Sciences of the United States of America. 2002;99(17):11079–11084. doi: 10.1073/pnas.162232399. [DOI] [PMC free article] [PubMed] [Google Scholar]
- (67).Tuncbag N, Kar G, Gursoy A, Keskin O, Nussinov R. Towards inferring time dimensionality in protein-protein interaction networks by integrating structures: the p53 example. Molecular Biosystems. 2009;5(12):1770–1778. doi: 10.1039/b905661k. [DOI] [PMC free article] [PubMed] [Google Scholar]
- (68).Lensink MF, Mendez R, Wodak SJ. Docking and scoring protein complexes: CAPRI 3rd Edition. Proteins-Structure Function and Bioinformatics. 2007;69(4):704–718. doi: 10.1002/prot.21804.
- (69).Zhu HB, Domingues FS, Sommer I, Lengauer T. NOXclass: prediction of protein-protein interaction types. BMC Bioinformatics. 2006;7 doi: 10.1186/1471-2105-7-27.
- (70).Bernauer J, Bahadur RP, Rodier F, Janin J, Poupon A. DiMoVo: a Voronoi tessellation-based method for discriminating crystallographic and biological protein-protein interactions. Bioinformatics. 2008;24(5):652–658. doi: 10.1093/bioinformatics/btn022.
- (71).Duarte JM, Srebniak A, Scharer MA, Capitani G. Protein interface classification by evolutionary analysis. BMC Bioinformatics. 2012;13:334. doi: 10.1186/1471-2105-13-334.
- (72).Kanehisa M, Goto S. KEGG: kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 2000;28(1):27–30. doi: 10.1093/nar/28.1.27.
- (73).Tyagi M, Hashimoto K, Shoemaker BA, Wuchty S, Panchenko AR. Large-scale mapping of human protein interactome using structural complexes. EMBO Reports. 2012;13(3):266–271. doi: 10.1038/embor.2011.261.
- (74).Gursoy A, Keskin O, Nussinov R. Topological properties of protein interaction networks from a structural perspective. Biochemical Society Transactions. 2008;36:1398–1403. doi: 10.1042/BST0361398.
- (75).Robinson FL, Whitehurst AW, Raman M, Cobb MH. Identification of novel point mutations in ERK2 that selectively disrupt binding to MEK1. Journal of Biological Chemistry. 2002;277(17):14844–14852. doi: 10.1074/jbc.M107776200.
- (76).Schaeffer HJ, Catling AD, Eblen ST, Collier LS, Krauss A, Weber MJ. MP1: A MEK binding partner that enhances enzymatic activation of the MAP kinase cascade. Science. 1998;281(5383):1668–1671. doi: 10.1126/science.281.5383.1668.
- (77).Brennan DF, Dar AC, Hertz NT, Chao WCH, Burlingame AL, Shokat KM, Barford D. A Raf-induced allosteric transition of KSR stimulates phosphorylation of MEK. Nature. 2011;472(7343):366–U134. doi: 10.1038/nature09860.
- (78).Catalanotti F, Reyes G, Jesenberger V, Galabova-Kovacs G, Simoes RD, Carugo O, Baccarini M. A Mek1-Mek2 heterodimer determines the strength and duration of the Erk signal. Nature Structural & Molecular Biology. 2009;16(3):294–303. doi: 10.1038/nsmb.1564.
- (79).Hauge C, Frodin M. RSK and MSK in MAP kinase signalling. Journal of Cell Science. 2006;119(15):3021–3023. doi: 10.1242/jcs.02950.
- (80).Carriere A, Cargnello M, Julien LA, Gao H, Bonneil E, Thibault P, Roux PP. Oncogenic MAPK signaling stimulates mTORC1 activity by promoting RSK-mediated Raptor phosphorylation. Current Biology. 2008;18(17):1269–1277. doi: 10.1016/j.cub.2008.07.078.
- (81).Tsai CJ, Lin SL, Wolfson HJ, Nussinov R. Protein-protein interfaces: architectures and interactions in protein-protein interfaces and in protein cores. Their similarities and differences. Crit Rev Biochem Mol Biol. 1996;31(2):127–52. doi: 10.3109/10409239609106582.
- (82).Keskin O, Nussinov R. Favorable scaffolds: proteins with different sequence, structure and function may associate in similar ways. Protein Engineering Design & Selection. 2005;18(1):11–24. doi: 10.1093/protein/gzh095.
- (83).Keskin O, Ma BY, Rogale K, Gunasekaran K, Nussinov R. Protein-protein interactions: organization, cooperativity and mapping in a bottom-up Systems Biology approach. Physical Biology. 2005;2(2):S24–S35. doi: 10.1088/1478-3975/2/2/S03.