Author manuscript; available in PMC: 2023 Jan 1.
Published in final edited form as: Proteins. 2021 Aug 3;90(1):83–95. doi: 10.1002/prot.26188

Benchmarking of Structure Refinement Methods for Protein Complex Models

Jacob Verburgt 1, Daisuke Kihara 1,2,3,*
PMCID: PMC8671191  NIHMSID: NIHMS1727640  PMID: 34309909

Abstract

Protein structure docking is the process in which the quaternary structure of a protein complex is predicted from the individual tertiary structures of the protein subunits. Protein docking is typically performed in two main steps. The subunits are first docked while keeping them rigid to form the complex, which is then followed by structure refinement. Structure refinement is crucial for practical use of computational protein docking models, as it aims to correct the conformations of interacting residues and atoms at the interface. Here, we benchmarked the performance of eight existing protein structure refinement methods in the refinement of protein complex models. We show that the fraction of native contacts between subunits is by far the most straightforward metric to improve. However, backbone-dependent metrics based on the Root Mean Square Deviation (RMSD) proved more difficult to improve via refinement.

Keywords: protein docking, protein-protein interaction, protein structure refinement, CAPRI, protein complexes

Introduction

Protein-protein interactions are fundamental to essentially all biological processes in living cells. To understand the molecular mechanisms of protein-protein interactions, tremendous effort has been devoted to solving the structures of these complexes by experimental methods, such as X-ray crystallography, Nuclear Magnetic Resonance (NMR), and cryogenic Electron Microscopy (cryo-EM). However, experimental approaches are resource intensive and often have difficulty in determining complex structures. To circumvent these limitations, computational protein docking can be used to predict protein complex structures1.

Various protein docking methods have been developed, such as LZerD2–4, ClusPro5, HADDOCK6, SwarmDock7, and FlexPepDock8, to name a few, and their modeling accuracy has improved in the past years9. A typical protein docking protocol has several key steps. It starts by exploring the protein docking space, generating many, often tens of thousands of, protein docking models. These models are usually clustered and ranked by a scoring function10,11,12. Finally, top-ranked models are subjected to structure refinement. In many protein-protein docking programs, the first step of docking model generation is performed with a rigid-body docking approach, which often leads to acceptable relative positioning of the ligand and receptor subunits. However, rigid-body docking is unable to account for small conformational changes that occur upon binding, particularly at the interface. This fine detail at the interface is crucial for elucidating the intricacies of the protein-protein interaction. In real-world applications, this information may include “hot-spot” residues that may be targeted in structure-based rational drug design for protein-protein interactions13, as well as in many structure-function studies14,15. Thus, refinement of rigidly docked protein structures is a crucial step if the structure is to be used for downstream applications.

The development of structure refinement protocols has been a focal point in the protein structure prediction field, as evidenced by the Critical Assessment of protein Structure Prediction (CASP)16, which now contains a dedicated target refinement subsection17. A diverse set of refinement techniques has been employed in CASP. These include genetic-algorithm-based methods, e.g. Rosetta Iterative-Hybridize18, molecular dynamics (MD)-based methods, such as Galaxy-Refine19 and PREFMD20, and machine-learning-based methods, such as RefineD21. Although these methods showed reasonable performance in refining models of monomeric proteins, they cannot be applied to protein complex models without care. In addition, there are very few refinement methods built specifically for docked protein complexes. The need for structure refinement has also been recognized in the protein docking field, as illustrated by the recently revised docking model evaluation system used in the Critical Assessment of PRediction of Interactions (CAPRI)22, which now rewards high-accuracy models with a higher score. One of the main concerns in the refinement of protein complexes is the risk of decreasing the quality of the structure, especially when the input structure is already expected to be of suitable quality.

In this study, we compared eight structure refinement methods on two datasets. The methods we tested can be split into two main categories. The first category is backbone-mobile methods, which allow movement of all atoms, including backbone atoms. The second category is backbone-fixed methods, in which backbone atoms are constrained and only side-chain atoms are mobile. The former category comprises HADDOCK6 (which we split into two protocols), Galaxy-Refine-Complex23 (from which we considered two protocols), and an in-house CHARMM relaxation protocol used in our group24. The latter category includes Rosetta Fastrelax25, SCWRL26, and OSCAR-star27. To evaluate refined models, we used standard metrics that quantify the proportion of correct contacts between subunits, as well as several metrics that measure the quality of the backbone trace both at the interface and in the orientation of the subunits.

We used two datasets. The first dataset consisted of optimally positioned, unbound individual subunits superimposed onto complexes from the ZDOCK protein docking benchmark set28. We constructed the dataset in this way because these optimally positioned models contain no bias from any single docking protocol. Optimally oriented models thus represent models created by an ideal docking protocol, in which the only refinements to be made are the conformational changes within each subunit that occur upon binding. With this dataset, we are specifically interested in the ability to produce high-quality CAPRI models, with refinement serving as a final step before model submission. Models of complexes that stay rigid upon binding are thus of high quality without refinement, in which case we set out to test the ability of refinement methods to retain the high quality of models. The second dataset was based on several model datasets from recent scoring rounds in CAPRI. The purpose of this dataset is to examine whether refinement protocols are beneficial in a practical docking modeling scenario.

We found that, in general, the methods tested here are most apt to improve side-chain positioning, and that proper refinement of backbone positioning is much more difficult. For backbone refinement, all the methods often move the backbone in the wrong direction, increasing the Root Mean Square Deviation (RMSD) at the interface. Due to the difficulty of moving the backbone toward the correct conformation, we observed that the more conservative refinement techniques retained high-quality docking models more consistently.

Materials and Methods

Docking Model Datasets

We used two datasets, one based on the ZDOCK benchmark set28 and another set from CAPRI scoring docking models29.

The ZDOCK benchmark set28 version 5 contains a collection of 230 experimentally determined complex structures, for which both the bound complexes and the unbound structures of the ligand and receptor subunits are available. The dataset is classified into three groups based on the conformational differences that occur upon binding (rigid, intermediate, and difficult). From the total of 230 complexes in the dataset, we removed 18 that had multiple chains present in the ligand subunit. To generate complexes from the unbound ligand and receptor subunits, we assumed ideal positioning (rotation and translation) of the subunits by superimposing each unbound structure onto the corresponding subunit of the bound complex. Such models are practical and resemble cases where predicted structures of the ligand and receptor are superimposed onto a known complex containing distant homologs of both structures. We note that models generated in this fashion represent an optimal case and do not capture the larger positional deviations that may be present in some docking protocols. From the remaining 212 complex models, we removed eight cases in which the constructed models contained entanglements between the ligand and receptor chains. Thus, we have a total of 204 complex models.

In addition, we used a dataset derived from CAPRI scoring models. The purpose of this dataset is to examine how beneficial refinement methods are in a practical docking scenario. From the recent targets in CAPRI rounds 38 to 4529, we considered eight protein docking targets (T122–T125, T131–T133, and T136), discarding protein-peptide and protein-oligosaccharide targets. These scoring models were available at the CAPRI website. In CAPRI, these models were used for the scoring category. For each target, there were about 100 to 100,000 models. These models were collected from participants of the modeling category, and thus each set contained models built by various methods.

Then, following our group’s docking protocol, we used our ranksum score30 and selected the 10 best-scoring models for each target. Among the 10 models selected by ranksum, we discarded “unrefinable” models that are so far from the native structure that no meaningful refinement could be expected. More concretely, a model was discarded if it did not meet at least one of the three criteria necessary for the acceptable CAPRI quality (see the Evaluation Criteria section below). This process left four targets, T122 (5mzv), T125 (5mgt), T133 (6ere), and T136 (6q6i) (the PDB IDs of the targets are shown in parentheses). For the other targets, all 10 selected models were unrefinable.

T125 and T133 were multi-chain complexes, and we considered all different pairwise interfaces in our evaluation. T125 is a six-chain complex with two subunits of LLT1 and four subunits of NKRP1A. Therefore, there are three different interface types, i.e. the interface between two LLT1 chains, the interface between LLT1 and NKRP1A, and the interface between two NKRP1A chains. For this target, a refinement protocol was applied to the entire complex model, and for each interface type, the best interface in the unrefined (starting) model, as judged by the fraction of native contacts (fnat; see the Evaluation Criteria section below), was subjected to the evaluation.

T136 is a homodecamer with three different interface types. Since the entire complex was too large for some of the refinement methods we tested, we extracted a tetramer that contains the three unique interfaces and then applied the refinement methods.

Overall, there were 61 refinable interfaces: two from T122, 19 from T125, 10 from T133, and 30 from T136.

The dataset used in this study is made openly available in Zenodo at http://doi.org/10.5281/zenodo.5026936.

Model Refinement Methods

We tested eight refinement methods, which can be classified into two categories. The first category consists of backbone-mobile methods, in which the backbone is allowed to move during refinement, although restraints may be added to limit the amount of movement. The second category consists of backbone-fixed methods, which do not move the backbone and only allow the reorientation of side-chains. We tested five backbone-mobile methods and three backbone-fixed methods, for a total of eight protein refinement methods.

Backbone-Mobile Methods

Galaxy-Refine-Complex

The Galaxy-Refine-Complex program is freely available as a binary in the Seok Lab’s Galaxy software suite23. The core of the Galaxy-Refine-Complex protocol consists of iterative steps of side-chain perturbation and restrained MD-based relaxation. With default parameters, the method yields ten models, the first five with only distance restraints, and the last five with both distance and position restraints. These restraints are applied to any Cα-Cα or N-O pairs within 10 Å of each other. Restraints between interface residues are weighted more weakly than those between non-interface pairs to allow for more deviation at the interface. Each target was submitted to the protocol with the default settings, after which the lowest-energy model from the first protocol with only distance restraints (denoted GRC1) and from the second protocol with both distance and position restraints (denoted GRC6) was extracted for model evaluation.

HADDOCK Explicit Solvent Refinement

The HADDOCK docking protocol is split into three distinct stages: it0, an initial rigid-body minimization; it1, semi-flexible simulated annealing; and itw, a refinement with explicit water6. The HADDOCK interface makes it possible to skip either the first stage or the first two stages, so that only refinement is performed on a pre-docked complex while keeping the initial orientations of the ligand and receptor. The refinement with explicit water (denoted “HADDOCK itw”) can be isolated by skipping the first two stages. It starts with 5 kcal mol⁻¹ Å⁻² positional restraints placed on side-chain atoms at the interface. The system is explicitly solvated with a TIP3P water shell and gradually heated to 300 K over 1500 steps (with a 2 femtosecond time step), followed by an additional 5000 steps at 300 K with the same restraints. A final cooling to 100 K was performed over 3000 steps, during which restraints were limited to non-interface backbone atoms. We followed this refinement protocol based on the refinement example in the HADDOCK repository, where a total of 20 refined structures were generated. Since the standard clustering procedure places these in a single cluster, we simply took the best model by the HADDOCK score. A combination of it1 and itw was also tested, which we denote as “HADDOCK it1”.
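As a quick check of the simulated time implied by the step counts in the itw stage, the following minimal sketch converts steps to picoseconds. It assumes that the 2 fs time step applies to all three phases (the text states it explicitly only for heating); the phase labels and function name are illustrative and not part of HADDOCK.

```python
# Hypothetical helper: converts MD step counts into simulated time.
TIME_STEP_FS = 2.0  # assumed to apply to every phase of the itw stage

# (phase label, number of steps) as described in the text
ITW_PHASES = [
    ("heating to 300 K", 1500),
    ("sampling at 300 K", 5000),
    ("cooling to 100 K", 3000),
]


def duration_ps(steps, time_step_fs=TIME_STEP_FS):
    """Simulated time in picoseconds for a given number of MD steps."""
    return steps * time_step_fs / 1000.0


durations = {label: duration_ps(steps) for label, steps in ITW_PHASES}
total_ps = sum(durations.values())

for label, ps in durations.items():
    print(f"{label}: {ps:.1f} ps")
print(f"total: {total_ps:.1f} ps")
```

Under this assumption the explicit-solvent refinement covers only a few tens of picoseconds, which is consistent with its role as a short, restrained relaxation rather than extended sampling.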

In-house CHARMM Relaxation

In previous rounds of CASP and CAPRI, we have subjected models to a short, restrained minimization and relaxation in the CHARMM MD engine using the CHARMM22 force field31 with the FACTS implicit solvent32. The minimization consists of applying 100 kcal mol⁻¹ Å⁻² harmonic restraints to all Cα atoms, an initial steepest-descent minimization for 500 steps, and a subsequent adopted basis Newton-Raphson minimization for 1000 steps. Following our earlier protocol24, the minimized and restrained model was then equilibrated for 20 ps with a time step of 2 fs at 100 K.

Backbone-Fixed Methods

SCWRL

SCWRL4 is a side-chain rebuilding method that uses a backbone-dependent rotamer library along with collision detection based on Discrete Oriented Polytopes26. To solve the combinatorial problem involving rotamers of all residues, SCWRL4 constructs a graph in which vertices are residues and edges between vertices represent possible interactions. Both edge decomposition and dead-end elimination are then applied to this graph. To use SCWRL4, the side-chains of all residues were removed and rebuilt from the backbone coordinates alone.

OSCAR-star

We benchmarked another side-chain optimization method, OSCAR-star27. OSCAR-star is a discrete version of the Optimized Side Chain Atomic Energy function33 that uses a modified distance-dependent term for handling rigid rotamers. Using this orientation dependent energy function, side-chains are rebuilt using a genetic algorithm, in which a pool of 20 structures with random initial side-chain rotamers are optimized over 30 cycles. The lowest energy structure out of the 20 models was chosen as the output.

Rosetta Fastrelax

We used the Fastrelax protocol in the Rosetta suite as a conservative method for resolving side-chain conflicts within the models25. The protocol we used allows all side-chain atoms to move by applying relax cycles that consist of rounds of repacking using rotamer libraries followed by gradient-based minimization in torsion angle space. During the refinement, main-chain atoms were fixed.

Evaluation Criteria

CAPRI Criteria

The standard CAPRI criteria for model quality assessment include the Interface Root Mean Square Deviation (I-RMSD), the Ligand Root Mean Square Deviation (L-RMSD), and the fraction of native contacts (fnat)34. fnat measures the fraction of native contacts (those found in the experimental complex) that are present in the predicted complex. Here, a contact is defined as an interchain residue pair where any heavy atom in the first residue is within 5 Å of any heavy atom in the second residue. The I-RMSD measures the RMSD of the Cα atoms of the interface residues to the experimental structure, where the interface is defined as any residue within 10 Å of another chain. The L-RMSD measures the RMSD of the docked ligand to the ligand of the experimental structure when the receptors of the predicted and experimental complexes are superimposed. The overall CAPRI quality is broken into four distinct categories based on the three metrics listed above34. High quality models have an fnat of more than 0.5 and either 1) an I-RMSD less than 1 Å or 2) an L-RMSD lower than 1 Å. Medium quality models have an fnat of more than 0.3 and either 1) an I-RMSD less than 2 Å or 2) an L-RMSD lower than 5 Å. Acceptable quality models have an fnat of more than 0.1 and either 1) an I-RMSD less than 4 Å or 2) an L-RMSD lower than 10 Å. Models that fail to meet even the acceptable quality criteria are placed in the incorrect category.
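The fnat definition and the quality thresholds above can be sketched as follows. This is a minimal illustration that follows the criteria as worded in this section, not the official CAPRI assessment code; the function names and the input layout (a dict mapping residue identifiers to lists of heavy-atom coordinates) are assumptions made for the example.

```python
from math import dist


def interchain_contacts(chain_a, chain_b, cutoff=5.0):
    """Residue pairs with any heavy-atom pair within `cutoff` angstroms.

    Each chain is a dict: residue id -> list of (x, y, z) heavy-atom coordinates
    (a hypothetical input layout chosen for this sketch).
    """
    contacts = set()
    for res_a, atoms_a in chain_a.items():
        for res_b, atoms_b in chain_b.items():
            if any(dist(p, q) <= cutoff for p in atoms_a for q in atoms_b):
                contacts.add((res_a, res_b))
    return contacts


def fnat(native_contacts, model_contacts):
    """Fraction of native contacts that are reproduced in the model."""
    if not native_contacts:
        raise ValueError("native complex has no interchain contacts")
    return len(native_contacts & model_contacts) / len(native_contacts)


def capri_quality(fnat_value, i_rmsd, l_rmsd):
    """Quality category using the thresholds as stated in the text."""
    if fnat_value > 0.5 and (i_rmsd < 1.0 or l_rmsd < 1.0):
        return "high"
    if fnat_value > 0.3 and (i_rmsd < 2.0 or l_rmsd < 5.0):
        return "medium"
    if fnat_value > 0.1 and (i_rmsd < 4.0 or l_rmsd < 10.0):
        return "acceptable"
    return "incorrect"


# Toy example: two native contacts, one of which is reproduced by the model.
native = {("A10", "B25"), ("A12", "B30")}
model = {("A10", "B25"), ("A40", "B31")}
print(fnat(native, model))             # 0.5
print(capri_quality(0.6, 1.5, 3.0))    # medium
```

Because the categories are checked from most to least stringent, a model is assigned the highest category whose fnat and RMSD conditions it satisfies.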

To summarize the CAPRI criteria with a single metric, we also used the DockQ score35, which combines the I-RMSD, L-RMSD, and fnat metrics into a single score ranging between 0 (worst) and 1 (best) by averaging fnat with scaled versions of the two RMSD metrics. Each RMSD metric is scaled into the range between 0 and 1, with 1 corresponding to an RMSD of 0 and values close to 0 corresponding to very large RMSD values. Intermediate RMSD values are scaled by two parameters, termed d1 and d2 for L-RMSD and I-RMSD, respectively, which were optimized to match the CAPRI quality ranges.
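A minimal sketch of the DockQ calculation is given below. The scaling form 1/(1 + (RMSD/d)²) and the parameter values d1 = 8.5 Å and d2 = 1.5 Å are taken from the DockQ publication rather than from this text, so they should be read as assumptions of the example; the function names are illustrative.

```python
def scaled_rmsd(rmsd, d):
    """Map an RMSD (angstroms) into (0, 1]; 1 at RMSD = 0, approaching 0 for large RMSD."""
    return 1.0 / (1.0 + (rmsd / d) ** 2)


def dockq(fnat_value, i_rmsd, l_rmsd, d1=8.5, d2=1.5):
    """Average of fnat, scaled L-RMSD (d1), and scaled I-RMSD (d2).

    d1 and d2 default to the values reported in the DockQ publication
    (an assumption here; they are not stated in this section).
    """
    return (fnat_value + scaled_rmsd(l_rmsd, d1) + scaled_rmsd(i_rmsd, d2)) / 3.0


# A perfect model scores 1; a model sitting exactly at d1 and d2 with fnat = 0.5 scores 0.5.
print(dockq(1.0, 0.0, 0.0))
print(dockq(0.5, 1.5, 8.5))
```

Since each RMSD term equals 0.5 when the RMSD equals its scaling parameter, d1 and d2 can be read as the RMSD values at which a model loses half of the credit available for that term.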

Other Criteria

We examined the ability of refinement methods to resolve clashes. We used a clash cutoff of 3.0 Å and summed all clashes between the ligand chain and all receptor chains for any given model. We also designed a custom, more stringent version of the CAPRI fnat criterion, which considers the deviation of a rotation angle that is defined by contacting side-chain pairs. We depict this angle, termed the μ angle, in Figure 1. This angle is the dihedral between the Cα-Cβ bonds of the two interacting residues and is thus only calculated for residues that contain Cβ atoms (all residues but glycine). The general distribution of μ angle deviations across all refined models, as well as the unrefined models, is shown in Figure 1B. We denote this new criterion as fnat_μx, where x is the tolerance in degrees. We used 10 degrees as the cutoff and thus evaluate fnat_μ10. For a residue contact to be considered correct for fnat_μ10, the deviation of the μ angle between the model and the reference must be less than the tolerance of 10 degrees, and the residue pair must be within the 5.0 Å contact cutoff. Of the contacts counted by the standard CAPRI fnat, 32.09% (across all models) remain after considering the μ angle. The μ angle is not dependent on any side-chain rotamers, but instead can only be satisfied by proper orientation between the residues making the contact. We considered the μ angle because this angle has been shown to be predicted well by deep learning in recent works36,37, and having correct μ angles can help in placing the two subunits in correct relative orientations.
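The fnat_μ criterion can be sketched as below. The sketch assumes that the μ angle is the dihedral defined by the atom sequence Cα(i), Cβ(i), Cβ(j), Cα(j), and that contacts involving glycine are simply absent from the inputs; both are assumptions of this example rather than details confirmed by the text, and all function names are illustrative.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral_deg(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) defined by four points."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(c / norm for c in b1)
    # Project b0 and b2 onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    return math.degrees(math.atan2(_dot(_cross(b1, v), w), _dot(v, w)))


def mu_angle(ca_i, cb_i, cb_j, ca_j):
    """Assumed atom order for the mu angle: CA(i)-CB(i)-CB(j)-CA(j)."""
    return dihedral_deg(ca_i, cb_i, cb_j, ca_j)


def angle_deviation(a, b):
    """Smallest absolute difference between two angles in degrees."""
    return abs((a - b + 180.0) % 360.0 - 180.0)


def fnat_mu(native_mu, model_mu, tolerance=10.0):
    """Fraction of native contacts present in the model with mu within tolerance.

    Both inputs map a contacting residue pair to its mu angle in degrees.
    """
    kept = sum(
        1 for pair, mu in native_mu.items()
        if pair in model_mu and angle_deviation(mu, model_mu[pair]) < tolerance
    )
    return kept / len(native_mu)


native = {("A10", "B25"): 30.0, ("A12", "B30"): -60.0}
model = {("A10", "B25"): 35.0, ("A12", "B30"): -100.0}
print(fnat_mu(native, model))  # 0.5: the second contact deviates by 40 degrees
```

The wrap-around handling in `angle_deviation` matters because μ is periodic: angles of 175 and -175 degrees differ by 10 degrees, not 350.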

Figure 1: μ Angle for torsional deviation restraints.

A, Diagram depicting contact torsion angle (μ) between a contacting pair of residues. The μ angle consists of the Cα and Cβ atoms within the contacting pair. B, Distribution of μ angle deviations from all tested refinement methods, and unrefined structures.

Results

Atom Clashes

To begin this analysis, we examined the atom clashes between ligand and receptor proteins before and after each refinement method was applied. As shown in Figure 2, using a 3.0 Å cutoff, even crystal structures of the complexes have some number of atom clashes. The number of clashes is roughly proportional to the size of the interface (i.e. the number of atoms at the interface) (Fig. 2A). The average number of clashes in the crystal structures was 6.70. On the other hand, the starting unrefined complex models often have more than 40 clashes (Fig. 2B). The average number of clashes in the unrefined models is 15.37, which is on average 2.30 times that of the corresponding crystal structures.
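The clash count used in this analysis can be sketched as a simple pairwise distance check. The example below is a brute-force illustration under the 3.0 Å cutoff; treating the cutoff as a strict inequality and the function names are assumptions of the sketch, and a production implementation would likely use a spatial index rather than an all-pairs loop.

```python
from math import dist


def count_clashes(ligand_atoms, receptor_atoms, cutoff=3.0):
    """Number of ligand-receptor heavy-atom pairs closer than `cutoff` angstroms."""
    return sum(
        1 for p in ligand_atoms for q in receptor_atoms if dist(p, q) < cutoff
    )


def percent_clashes_removed(before, after):
    """Percentage of clashes removed by refinement (negative if clashes increased)."""
    if before == 0:
        raise ValueError("no clashes in the starting model")
    return 100.0 * (before - after) / before


# Toy coordinates: the first ligand atom clashes with two receptor atoms.
ligand = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
receptor = [(2.0, 0.0, 0.0), (0.0, 2.5, 0.0), (20.0, 0.0, 0.0)]
print(count_clashes(ligand, receptor))   # 2
print(percent_clashes_removed(40, 10))   # 75.0
```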

Figure 2: Clashes present in experimental structures and unrefined models.

The number of heavy atom clashes between ligand and receptor chains in the 204 experimentally solved complexes and unrefined models in the benchmark dataset. Clashes are defined as atom pairs from the ligand and receptor within 3 Å from each other. Interface atoms are those which are within 5 Å to any heavy atom from the other protein. A, Atom clashes in the bound (target) structures relative to the number of heavy atoms at the interface. B, The number of atom clashes of unrefined models and the experimentally solved complexes (bound structures). The average number of clashes in the unrefined models is 15.37, and the average number in the bound models is 6.70. The line shown is y=x.

Figure 3 shows how much each method reduced the number of clashes. In almost all cases across all methods, the number of clashes was reduced. The percentage of clashes removed for a model was on average 31.69% for GRC1, 47.34% for GRC6, 36.37% for HADDOCK it1, 42.90% for HADDOCK itw, 42.02% for CHARMM, 22.94% for SCWRL, 28.49% for OSCAR-star, and 38.97% for Rosetta Fastrelax. The number of models for which refinement reduced the number of clashes was 162 for GRC1, 182 for GRC6, 166 for HADDOCK it1, 177 for HADDOCK itw, 187 for CHARMM, 158 for SCWRL, 176 for OSCAR-star, and 193 for Rosetta Fastrelax.

Figure 3: Clash count improvement of refinement methods.

The number of clashes between the receptor chain and all ligand chains was calculated for all structures before and after the refinement.

There is a noticeable difference between the backbone-fixed methods (SCWRL, OSCAR-star, and Rosetta Fastrelax) and the backbone-mobile methods. When the backbone-fixed methods were applied, the number of atom clashes was not significantly reduced, particularly in models with many clashes (>20 clashes). When the backbone-mobile methods were applied, the number of remaining clashes was less than 20 for the majority of the cases, comparable to the level of crystal structures in Figure 2A.

All methods had several models where the number of clashes was higher in the refined structure than in the unrefined one (data points above the diagonal lines in Figure 3). The number of models that had more clashes after refinement was 10, 5, 11, 7, 6, 15, 11, and 5 for GRC1, GRC6, HADDOCK it1, HADDOCK itw, CHARMM, SCWRL, OSCAR-star, and Rosetta Fastrelax, respectively. In Figure 4, we examined the distances of the remaining clashes in these models. Again, there is a clear difference between the backbone-mobile and the backbone-fixed methods. With the backbone-mobile methods, almost all remaining clashes were alleviated, having a distance of 2.5 Å or more, which is close to the cutoff value used in this analysis. In contrast, the backbone-fixed methods did not change the distance distribution, leaving severe clashes with very short distances.

Figure 4: Distributions of clash distances after refinement.

For each refinement method, the distribution of the distances between remaining clashing atoms is shown. A bin size of 0.01 Å was used for these histograms. Gray: before refinement; Black: after refinement.

fnat

Next, we examined the changes in fnat values after applying the refinement methods (Figure 5). The percentage of models improved by each method was 63.24%, 58.82%, 36.27%, 28.92%, 34.80%, 35.78%, 34.80%, and 31.86% for GRC1, GRC6, HADDOCK it1, HADDOCK itw, CHARMM, SCWRL, OSCAR-star, and Rosetta Fastrelax, respectively. We see a broad range of improvements, with the two GRC methods making significant improvements in many models; they are the only methods able to improve the fnat for the majority of models. The two HADDOCK methods, CHARMM, and all backbone-fixed methods improved a similar fraction of models, in the range of about 29% to 36%.

Figure 5. Improvement of fnat by the refinement methods.

fnat was calculated for all models before and after applying refinement methods. The CAPRI fnat cutoffs for high, medium, and acceptable quality models are shown as a dotted line (0.5), a dashed line (0.3), and a solid line (0.1), respectively.

We further examined fnat_μ10, which considers both contacts and the orientation of the interacting side-chains (see Methods) (Figure 6). Among the correct contacts in all the models refined by each method, the percentage with a μ angle within 10 degrees of the native was 26.88%, 31.31%, 28.38%, 29.94%, 31.44%, 34.86%, 35.59%, and 35.14% for GRC1, GRC6, HADDOCK it1, HADDOCK itw, CHARMM, SCWRL, OSCAR-star, and Rosetta Fastrelax, respectively. The values for the backbone-fixed methods are about 35%, a few percentage points larger than those for the backbone-mobile methods. Nevertheless, only about one-third of correct contacts remain when the μ angle is considered. The results indicate that correcting atom contacts (fnat) does not necessarily correct the orientation of the contacting residues. Since the hinge angles of Cβ-Cα-N and Cβ-Cα-C are almost constant in different rotamers, the μ angle is independent of the side-chain rotamers of contacting residues. Deviation of the μ angle is primarily caused by a shift of Cβ atoms. To improve fnat_μ10, backbone improvements that can correctly position Cα atoms are needed.

Figure 6. Improvement of fnat_μ10 by the refinement methods.

fnat_μ10 was calculated for all models before and after applying refinement methods. Although fnat_μ10 is not used in CAPRI, the three lines were drawn at the values of 0.5, 0.3, and 0.1 to be consistent with Figure 5.

I-RMSD and L-RMSD

We examine the improvement of I-RMSD and L-RMSD in this section. Since I-RMSD and L-RMSD concern backbone movement, we only discuss the five backbone-mobile methods (Figure 7 and Figure 8). As shown in the plots, these two metrics improved for only a small fraction of models with most methods, and the majority of the models deteriorated upon refinement. Out of 204 models, I-RMSD improved for only 31 (15.2%), 93 (45.6%), 17 (8.3%), 23 (11.3%), and 75 (36.8%) models by GRC1, GRC6, HADDOCK it1, HADDOCK itw, and CHARMM, respectively (Figure 7). In terms of L-RMSD, an improvement was observed for 5 (2.5%), 61 (29.9%), 6 (2.9%), 14 (6.9%), and 61 (29.9%) models by GRC1, GRC6, HADDOCK it1, HADDOCK itw, and CHARMM, respectively (Figure 8).

Figure 7: Improvement of I-RMSD by the refinement methods.

I-RMSD was calculated for all models before and after applying refinement methods. Following the CAPRI quality criteria, the three lines, a dotted line, a dashed line, and a solid line, were drawn at the values of 1.0, 2.0, and 4.0, respectively.

Figure 8: Improvement of L-RMSD by the refinement methods.

L-RMSD was calculated for all models before and after applying refinement methods. Following the CAPRI quality criteria, the three lines, a dotted line, a dashed line, and a solid line, were drawn at the values of 1.0, 5.0, and 10.0, respectively.

It is clear from the plots that the refinement methods follow a general trend where a smaller deviation in the structural change results in improvement of more targets. CHARMM and GRC6, the two methods that showed improvement in I-RMSD and L-RMSD for the largest number of targets, made the smallest deviations: The average changes of I-RMSD/L-RMSD by CHARMM were 0.026 Å/0.014 Å and those for GRC6 were 0.016 Å/0.186 Å. In contrast, HADDOCK it1 and GRC1, which made an L-RMSD improvement for only 6 and 5 targets, respectively, on average moved the L-RMSD by 1.06 Å and 2.79 Å, respectively, much larger than CHARMM and GRC6.

DockQ Score

Next, we look at the DockQ score, which serves as a summary metric for the three CAPRI metrics (Figure 9). The fractions of models with an improved DockQ score by each method were 19.61%, 59.31%, 4.90%, 17.16%, 39.71%, 35.29%, 34.80%, and 31.84% for GRC1, GRC6, HADDOCK it1, HADDOCK itw, CHARMM, SCWRL, OSCAR-star, and Rosetta Fastrelax, respectively. Similar to the results for the individual metrics, GRC6 showed the largest fraction and CHARMM the second largest.

Figure 9: Improvement of DockQ score by the refinement methods.

DockQ scores were calculated from the standard CAPRI metrics before and after applying the refinement methods to the models in the ZDOCK dataset.

CAPRI Quality

Finally, we summarize the refinement results in terms of the changes in the CAPRI quality categories34 of the docking models after refinement (Table 1). Note that the quality of the starting models used in this work was skewed toward the medium and high qualities, since the starting models were unbound subunits optimally superimposed onto the bound complex structure.

Table 1.

The summary of the CAPRI quality level changes by the refinement methods.

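| Input quality (n) | Refined quality | GRC1 | GRC6 | HADDOCK it1 | HADDOCK itw | CHARMM | SCWRL | OSCAR-star | Rosetta Fastrelax |
|---|---|---|---|---|---|---|---|---|---|
| Incorrect (2) | incorrect | 0 | 1 | 0 | 1 | 2 | 2 | 2 | 2 |
| | acceptable | 2 | 1 | 2 | 1 | 0 | 0 | 0 | 0 |
| | medium | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| | high | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Acceptable (11) | incorrect | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| | acceptable | 10 | 8 | 9 | 9 | 10 | 10 | 10 | 10 |
| | medium | 1 | 3 | 2 | 2 | 1 | 1 | 1 | 1 |
| | high | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Medium (94) | incorrect | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| | acceptable | 23 | 1 | 6 | 4 | 4 | 0 | 3 | 1 |
| | medium | 71 | 83 | 88 | 90 | 90 | 93 | 89 | 91 |
| | high | 0 | 10 | 0 | 0 | 0 | 1 | 2 | 2 |
| High (97) | incorrect | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| | acceptable | 10 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| | medium | 66 | 9 | 83 | 53 | 5 | 4 | 3 | 6 |
| | high | 21 | 88 | 14 | 44 | 92 | 93 | 94 | 91 |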

Of the 204 starting models, 2, 11, 94, and 97 were in the incorrect, acceptable, medium, and high quality categories, respectively. The table shows the number of models in each category after refinement by the eight methods.
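To make the table easier to read at a glance, the following sketch tallies, for each method, how many of the 204 models moved to a better category, stayed in the same category, or moved to a worse one. The counts are transcribed from Table 1; the data layout and function name are illustrative choices, not part of the original analysis.

```python
LEVELS = ["incorrect", "acceptable", "medium", "high"]

# TRANSITIONS[method][i][j]: number of models that started at LEVELS[i]
# and ended at LEVELS[j] after refinement (transcribed from Table 1).
TRANSITIONS = {
    "GRC1":              [[0, 2, 0, 0], [0, 10, 1, 0], [0, 23, 71, 0], [0, 10, 66, 21]],
    "GRC6":              [[1, 1, 0, 0], [0, 8, 3, 0],  [0, 1, 83, 10], [0, 0, 9, 88]],
    "HADDOCK it1":       [[0, 2, 0, 0], [0, 9, 2, 0],  [0, 6, 88, 0],  [0, 0, 83, 14]],
    "HADDOCK itw":       [[1, 1, 0, 0], [0, 9, 2, 0],  [0, 4, 90, 0],  [0, 0, 53, 44]],
    "CHARMM":            [[2, 0, 0, 0], [0, 10, 1, 0], [0, 4, 90, 0],  [0, 0, 5, 92]],
    "SCWRL":             [[2, 0, 0, 0], [0, 10, 1, 0], [0, 0, 93, 1],  [0, 0, 4, 93]],
    "OSCAR-star":        [[2, 0, 0, 0], [0, 10, 1, 0], [0, 3, 89, 2],  [0, 0, 3, 94]],
    "Rosetta Fastrelax": [[2, 0, 0, 0], [0, 10, 1, 0], [0, 1, 91, 2],  [0, 0, 6, 91]],
}


def summarize(matrix):
    """Count models whose category improved, stayed unchanged, or worsened."""
    n = len(matrix)
    cells = [(i, j) for i in range(n) for j in range(n)]
    return {
        "improved": sum(matrix[i][j] for i, j in cells if j > i),
        "unchanged": sum(matrix[i][j] for i, j in cells if j == i),
        "worsened": sum(matrix[i][j] for i, j in cells if j < i),
    }


for method, matrix in TRANSITIONS.items():
    print(method, summarize(matrix))
```

Summarized this way, the table shows that every method leaves most models in their starting category except for GRC1 and HADDOCK it1, for which a large share of the high-quality inputs drop by at least one level.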

Of all the unrefined input models, two were in the incorrect category. After refinement, four methods were able to push at least one model to a higher quality. Two methods, GRC1 and HADDOCK it1, were able to push both incorrect models to acceptable quality.

Of the 11 unrefined models that started as acceptable quality, no refinement method reduced the quality to incorrect. The best performance came from GRC6, which pushed three of the models to medium quality. Both HADDOCK methods pushed two models to medium quality, and each of the remaining methods pushed one.

Slightly less than half of the unrefined models (94 models) started in the medium quality category. GRC6 had the best results with these models, pushing 10 of them to high quality. SCWRL, OSCAR-star, and Rosetta Fastrelax follow, pushing 1, 2, and 2 models to high quality, respectively. GRC1 and HADDOCK it1 reduced 23 and 6 models, respectively, from medium quality to acceptable quality. All the methods except SCWRL deteriorated the quality of at least one model.

There are 97 unrefined models in the high quality category. With these models, we examined the ability of the refinement methods to avoid decreasing the quality. OSCAR-star, SCWRL, CHARMM, Rosetta Fastrelax, and GRC6 largely preserved the quality of these models, reducing only 3, 4, 5, 6, and 9 models to medium quality, respectively. The remaining three methods, HADDOCK itw, GRC1, and HADDOCK it1, reduced the quality of a substantial number of these models (53, 76, and 83, respectively), and GRC1 pushed 10 of them down to acceptable quality.

It is worthwhile to note that the three main-chain fixed methods, SCWRL, OSCAR-star, and Rosetta Fastrelax, were, for the most part, robust in keeping the starting quality. Thus, it is worthwhile to run these methods at the end of docking modeling, as the quality of the models is likely to either improve or stay the same. CHARMM was also robust in this regard, lowering the quality level in only nine cases.
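The per-method counts discussed above can be recomputed directly from the table. The following is a minimal Python sketch, assuming the table is transcribed as nested dictionaries; the names `TABLE1`, `METHODS`, `LEVELS`, and `tally` are illustrative and not part of any released code.

```python
# Counts transcribed from the table above: TABLE1[input][refined][method index].
METHODS = ["GRC1", "GRC6", "HADDOCK it1", "HADDOCK itw",
           "CHARMM", "SCWRL", "OSCAR-star", "Rosetta Fastrelax"]
LEVELS = ["incorrect", "acceptable", "medium", "high"]

TABLE1 = {
    "incorrect": {
        "incorrect":  [0, 1, 0, 1, 2, 2, 2, 2],
        "acceptable": [2, 1, 2, 1, 0, 0, 0, 0],
        "medium":     [0] * 8,
        "high":       [0] * 8,
    },
    "acceptable": {
        "incorrect":  [0] * 8,
        "acceptable": [10, 8, 9, 9, 10, 10, 10, 10],
        "medium":     [1, 3, 2, 2, 1, 1, 1, 1],
        "high":       [0] * 8,
    },
    "medium": {
        "incorrect":  [0] * 8,
        "acceptable": [23, 1, 6, 4, 4, 0, 3, 1],
        "medium":     [71, 83, 88, 90, 90, 93, 89, 91],
        "high":       [0, 10, 0, 0, 0, 1, 2, 2],
    },
    "high": {
        "incorrect":  [0] * 8,
        "acceptable": [10, 0, 0, 0, 0, 0, 0, 0],
        "medium":     [66, 9, 83, 53, 5, 4, 3, 6],
        "high":       [21, 88, 14, 44, 92, 93, 94, 91],
    },
}


def tally(table):
    """Return {method: (improved, unchanged, degraded)} summed over all inputs."""
    result = {}
    for m, name in enumerate(METHODS):
        improved = unchanged = degraded = 0
        for start, rows in table.items():
            for end, counts in rows.items():
                # Positive step: the refined model is in a higher category.
                step = LEVELS.index(end) - LEVELS.index(start)
                if step > 0:
                    improved += counts[m]
                elif step < 0:
                    degraded += counts[m]
                else:
                    unchanged += counts[m]
        result[name] = (improved, unchanged, degraded)
    return result


if __name__ == "__main__":
    for name, (up, same, down) in tally(TABLE1).items():
        print(f"{name:18s} improved={up:3d} unchanged={same:3d} degraded={down:3d}")
```

Under this tally, each method accounts for all 204 models, and CHARMM lowers the category of nine models, in agreement with the count given above.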

Evaluation on the CAPRI scoring model dataset

In addition to the ZDOCK benchmark derived dataset, we also tested the refinement methods on top scoring docking models used in the CAPRI scoring category. The dataset includes 61 interfaces to evaluate. The DockQ improvement for these models is shown in Figure 10, with improvements observed in 44.26%, 68.65%, 27.87%, 24.59%, 45.90%, 39.34%, 50.82%, and 29.51% of the models for GRC1, GRC6, HADDOCK it1, HADDOCK itw, CHARMM, SCWRL, OSCAR-star, and Rosetta Fastrelax, respectively. Compared to the DockQ results on the ZDOCK dataset (Fig. 9), all the methods except for Rosetta Fastrelax improved a larger fraction of the models. These values follow approximately the same trend as was seen in our ZDOCK-derived dataset, with the largest deviation in loosely restrained methods, such as GRC1. The larger backbone deviations that GRC1 makes are not nearly as detrimental in these models as they are in the ZDOCK dataset.

Figure 10: Improvement of DockQ score on CAPRI derived dataset.

The DockQ scores were calculated before and after applying the refinement methods to the CAPRI model derived dataset, which has 61 interfaces.
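The fraction of improved models is a per-interface comparison of DockQ before and after refinement. A minimal Python sketch follows, assuming the DockQ definition of Basu and Wallner (reference 35), which averages fnat with scaled I-RMSD and L-RMSD terms. The scaling constants of 1.5 Å and 8.5 Å are an assumption here, since this paper does not restate them, and the function names are hypothetical.

```python
def scaled_rmsd(rmsd, d):
    """Map an RMSD in angstroms onto (0, 1]; 1.0 means no deviation."""
    return 1.0 / (1.0 + (rmsd / d) ** 2)


def dockq(fnat, irmsd, lrmsd, d_interface=1.5, d_ligand=8.5):
    """DockQ as the mean of fnat and the two scaled RMSD terms.

    d_interface and d_ligand are the assumed scaling constants (angstroms).
    """
    return (fnat
            + scaled_rmsd(irmsd, d_interface)
            + scaled_rmsd(lrmsd, d_ligand)) / 3.0


def fraction_improved(before, after):
    """Fraction of interfaces whose DockQ is strictly higher after refinement."""
    if len(before) != len(after) or not before:
        raise ValueError("need two non-empty score lists of equal length")
    improved = sum(1 for b, a in zip(before, after) if a > b)
    return improved / len(before)


if __name__ == "__main__":
    # Hypothetical interface: refinement raises fnat but leaves RMSD unchanged.
    start = dockq(fnat=0.55, irmsd=1.8, lrmsd=4.0)
    refined = dockq(fnat=0.70, irmsd=1.8, lrmsd=4.0)
    print(f"DockQ {start:.3f} -> {refined:.3f}")
```

Because fnat enters the average directly, a gain in native contacts raises DockQ even when both RMSD terms stay fixed, which is one way a side-chain-only method can improve DockQ for many models without moving the backbone.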

In Table 2, we summarize whether refinement changed the CAPRI quality levels of the models. GRC6 improved six acceptable quality models to medium quality, while one high quality model deteriorated to medium. SCWRL and OSCAR-star moved one and two acceptable models, respectively, up to the medium level without deteriorating any model to a lower level. CHARMM and Rosetta Fastrelax did not change the quality classification summary. The other methods made the results worse.

Table 2.

The summary of the CAPRI quality level changes on the CAPRI scoring models dataset.

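| Model Quality | Input | GRC1 | GRC6 | HADDOCK it1 | HADDOCK itw | CHARMM | SCWRL | OSCAR-star | Rosetta Fastrelax |
|---|---|---|---|---|---|---|---|---|---|
| incorrect | 3 | 3 | 3 | 4 | 3 | 3 | 3 | 3 | 3 |
| acceptable | 17 | 28 | 11 | 23 | 20 | 17 | 16 | 15 | 17 |
| medium | 39 | 30 | 46 | 34 | 38 | 39 | 40 | 41 | 39 |
| high | 2 | 0 | 1 | 0 | 0 | 2 | 2 | 2 | 2 |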

The 61 starting interfaces fell into the incorrect, acceptable, medium, and high CAPRI quality categories in 3, 17, 39, and 2 cases, respectively, as shown in the Input column. The table shows the number of cases in each category after refinement by the eight methods. The numbers that show improvement over the input models are shown in bold.
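Table 2 reports only the category totals, so one simple way to compare methods is the change in each total relative to the Input column, together with a net shift in quality levels. The Python sketch below illustrates this; `TABLE2`, `net_change`, and `net_level_shift` are hypothetical names, and the net shift is our illustrative summary rather than a metric used in CAPRI.

```python
# Category totals from Table 2, ordered incorrect, acceptable, medium, high.
TABLE2 = {
    "Input":             [3, 17, 39, 2],
    "GRC1":              [3, 28, 30, 0],
    "GRC6":              [3, 11, 46, 1],
    "HADDOCK it1":       [4, 23, 34, 0],
    "HADDOCK itw":       [3, 20, 38, 0],
    "CHARMM":            [3, 17, 39, 2],
    "SCWRL":             [3, 16, 40, 2],
    "OSCAR-star":        [3, 15, 41, 2],
    "Rosetta Fastrelax": [3, 17, 39, 2],
}


def net_change(method):
    """Per-category change in model count relative to the input models."""
    return [r - i for r, i in zip(TABLE2[method], TABLE2["Input"])]


def net_level_shift(method):
    """Total quality levels gained minus levels lost across all interfaces.

    Each category is weighted by its rank (incorrect=0 ... high=3), so one
    model moving up one level contributes +1.
    """
    return sum(rank * delta for rank, delta in enumerate(net_change(method)))


if __name__ == "__main__":
    for method in TABLE2:
        if method != "Input":
            print(f"{method:18s} {net_change(method)} shift={net_level_shift(method):+d}")
```

For GRC6 the net shift is +5, which is consistent with six models moving up one level and one moving down one level. Totals alone cannot separate offsetting moves, so this summary should be read as a lower bound on the number of models that changed category.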

Discussion

We tested eight different protein structure refinement methods on optimally docked protein complexes for their ability to improve and retain the quality of the models. We investigated the changes in atom clashes, fnat, I-RMSD, and L-RMSD, the metrics used in CAPRI. We also introduced a more stringent criterion for fnat, fnat_μ10, which considers the orientation of residue contacts in our evaluation. Having correct residue interaction angles in protein docking would certainly help improve the relative orientation of two chains and form an accurate docking interface. The importance of residue interaction angles has also been demonstrated in statistical potentials for evaluating protein structure models38,39. As methods for protein docking and protein structure prediction continue to improve, it may be worthwhile to start routinely evaluating fnat with interaction angles, using fnat_μx, in docking model evaluation. Among the metrics we examined, atom clashes and fnat were improved for many cases, while I-RMSD and L-RMSD were difficult to improve.
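The interplay between fnat and the RMSD metrics can be made concrete with the CAPRI quality categories. The Python sketch below assumes the thresholds commonly stated in CAPRI assessments (see reference 34); they are not restated in this section, so the cutoffs and the function name `capri_class` should be treated as assumptions.

```python
def capri_class(fnat, irmsd, lrmsd):
    """Classify a docking model using the assumed CAPRI cutoffs.

    fnat is the fraction of native contacts; irmsd and lrmsd are in angstroms.
    Each level needs enough native contacts AND at least one RMSD under its cutoff.
    """
    if fnat >= 0.5 and (lrmsd <= 1.0 or irmsd <= 1.0):
        return "high"
    if fnat >= 0.3 and (lrmsd <= 5.0 or irmsd <= 2.0):
        return "medium"
    if fnat >= 0.1 and (lrmsd <= 10.0 or irmsd <= 4.0):
        return "acceptable"
    return "incorrect"


if __name__ == "__main__":
    # Hypothetical model: raising fnat alone does not change the category.
    print(capri_class(fnat=0.55, irmsd=1.5, lrmsd=4.0))  # medium
    print(capri_class(fnat=0.90, irmsd=1.5, lrmsd=4.0))  # medium
    # Only a lower RMSD moves it into the high category.
    print(capri_class(fnat=0.90, irmsd=0.9, lrmsd=4.0))  # high
```

Under these assumed cutoffs, a model whose fnat already exceeds 0.5 can reach the high category only by lowering I-RMSD or L-RMSD, which is why improvements in fnat alone often leave the CAPRI category unchanged.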

Although backbone movement is important for structure refinement, the current methods showed difficulty in producing favorable deviations in the backbone. We observed that the backbone-mobile refinement methods often moved main chains in wrong directions, deteriorating RMSD. These conclusions might be magnified by the nature of our dataset, in which many starting models were already of good quality, so that the analysis mainly examined whether the quality level could be further improved or maintained.

The common driving force of the current methods has been classical MD or coarse-grained dynamics simulation. As we observe a rapid development and improvement of deep-learning-based structure modeling methods and quality assessment methods in CASP36,37,40,41 and CAPRI11,42, we expect that new structure refinement strategies using deep learning will appear in the near future.

Data Availability Statement

The dataset used in this study is made openly available in Zenodo at http://doi.org/10.5281/zenodo.5026936.

Acknowledgements

This work was partly supported by the National Institutes of Health (R01GM133840, R01GM123055) and the National Science Foundation (DMS1614777, CMMI1825941, and MCB1925643). JV is supported by an NIGMS-funded predoctoral fellowship (T32 GM132024).

Footnotes

Conflict of Interest

The authors declare no conflict of interest.

References

  • 1. Aderinwale T, Christoffer CW, Sarkar D, Alnabati E, Kihara D. Computational structure modeling for diverse categories of macromolecular interactions. Curr Opin Struct Biol. 2020;64:1–8. doi: 10.1016/j.sbi.2020.05.017
  • 2. Li B, Kihara D. Protein docking prediction using predicted protein-protein interface. BMC Bioinformatics. 2012;13(1):7. doi: 10.1186/1471-2105-13-7
  • 3. Venkatraman V, Yang YD, Sael L, Kihara D. Protein-protein docking using region-based 3D Zernike descriptors. BMC Bioinformatics. 2009;10(1):407. doi: 10.1186/1471-2105-10-407
  • 4. Christoffer C, Chen S, Bharadwaj V, et al. LZerD webserver for pairwise and multiple protein-protein docking. Nucleic Acids Res. Published online May 8, 2021. doi: 10.1093/nar/gkab336
  • 5. Comeau SR, Gatchell DW, Vajda S, Camacho CJ. ClusPro: a fully automated algorithm for protein–protein docking. Nucleic Acids Res. 2004;32(suppl_2):W96–W99. doi: 10.1093/nar/gkh354
  • 6. Dominguez C, Boelens R, Bonvin AMJJ. HADDOCK: A Protein–Protein Docking Approach Based on Biochemical or Biophysical Information. J Am Chem Soc. 2003;125(7):1731–1737. doi: 10.1021/ja026939x
  • 7. Moal IH, Bates PA. SwarmDock and the Use of Normal Modes in Protein-Protein Docking. Int J Mol Sci. 2010;11(10):3623–3648. doi: 10.3390/ijms11103623
  • 8. Khramushin A, Marcu O, Alam N, et al. Modeling beta-sheet peptide-protein interactions: Rosetta FlexPepDock in CAPRI rounds 38–45. Proteins Struct Funct Bioinforma. 2020;88(8):1037–1049. doi: 10.1002/prot.25871
  • 9. Lensink MF, Velankar S, Kryshtafovych A, et al. Prediction of homoprotein and heteroprotein complexes by protein docking and template-based modeling: A CASP-CAPRI experiment. Proteins Struct Funct Bioinforma. 2016;84(S1):323–348. doi: 10.1002/prot.25007
  • 10. Huang S-Y, Zou X. An iterative knowledge-based scoring function for protein-protein recognition. Proteins. 2008;72(2):557–579. doi: 10.1002/prot.21949
  • 11. Wang X, Terashi G, Christoffer CW, Zhu M, Kihara D. Protein docking model evaluation by 3D deep convolutional neural networks. Bioinformatics. 2020;36(7):2113–2118. doi: 10.1093/bioinformatics/btz870
  • 12. Andreani J, Faure G, Guerois R. InterEvScore: a novel coarse-grained interface scoring function using a multi-body statistical potential coupled to evolution. Bioinformatics. 2013;29(14):1742–1749. doi: 10.1093/bioinformatics/btt260
  • 13. Shin W-H, Kumazawa K, Imai K, Hirokawa T, Kihara D. Current Challenges and Opportunities in Designing Protein-Protein Interaction Targeted Drugs. Advances and Applications in Bioinformatics and Chemistry. doi: 10.2147/AABC.S235542
  • 14. Du D, Wang-Kan X, Neuberger A, et al. Multidrug efflux pumps: structure, function and regulation. Nat Rev Microbiol. 2018;16(9):523–539. doi: 10.1038/s41579-018-0048-6
  • 15. Zheng N, Shabek N. Ubiquitin Ligases: Structure, Function, and Regulation. Annu Rev Biochem. 2017;86(1):129–157. doi: 10.1146/annurev-biochem-060815-014922
  • 16. Kryshtafovych A, Schwede T, Topf M, Fidelis K, Moult J. Critical assessment of methods of protein structure prediction (CASP)—Round XIII. Proteins Struct Funct Bioinforma. 2019;87(12):1011–1020. doi: 10.1002/prot.25823
  • 17. Read RJ, Sammito MD, Kryshtafovych A, Croll TI. Evaluation of model refinement in CASP13. Proteins Struct Funct Bioinforma. 2019;87(12):1249–1262. doi: 10.1002/prot.25794
  • 18. Park H, Ovchinnikov S, Kim DE, DiMaio F, Baker D. Protein homology model refinement by large-scale energy optimization. Proc Natl Acad Sci. 2018;115(12):3054–3059. doi: 10.1073/pnas.1719115115
  • 19. Heo L, Park H, Seok C. GalaxyRefine: protein structure refinement driven by side-chain repacking. Nucleic Acids Res. 2013;41(W1):W384–W388. doi: 10.1093/nar/gkt458
  • 20. Heo L, Feig M. PREFMD: a web server for protein structure refinement via molecular dynamics simulations. Bioinformatics. 2018;34(6):1063–1065. doi: 10.1093/bioinformatics/btx726
  • 21. Bhattacharya D. refineD: improved protein structure refinement using machine learning based restrained relaxation. Bioinformatics. 2019;35(18):3320–3328. doi: 10.1093/bioinformatics/btz101
  • 22. Janin J, Henrick K, Moult J, et al. CAPRI: A Critical Assessment of PRedicted Interactions. Proteins Struct Funct Bioinforma. 2003;52(1):2–9. doi: 10.1002/prot.10381
  • 23. Heo L, Lee H, Seok C. GalaxyRefineComplex: Refinement of protein-protein complex model structures driven by interface repacking. Sci Rep. 2016;6(1):32153. doi: 10.1038/srep32153
  • 24. Christoffer C, Terashi G, Shin W-H, et al. Performance and enhancement of the LZerD protein assembly pipeline in CAPRI 38–46. Proteins Struct Funct Bioinforma. 2020;88(8):948–961. doi: 10.1002/prot.25850
  • 25. Conway P, Tyka MD, DiMaio F, Konerding DE, Baker D. Relaxation of backbone bond geometry improves protein energy landscape modeling. Protein Sci. 2014;23(1):47–55. doi: 10.1002/pro.2389
  • 26. Krivov GG, Shapovalov MV, Dunbrack RL. Improved prediction of protein side-chain conformations with SCWRL4. Proteins Struct Funct Bioinforma. 2009;77(4):778–795. doi: 10.1002/prot.22488
  • 27. Liang S, Zheng D, Zhang C, Standley DM. Fast and accurate prediction of protein side-chain conformations. Bioinformatics. 2011;27(20):2913–2914. doi: 10.1093/bioinformatics/btr482
  • 28. Vreven T, Moal IH, Vangone A, et al. Updates to the Integrated Protein–Protein Interaction Benchmarks: Docking Benchmark Version 5 and Affinity Benchmark Version 2. J Mol Biol. 2015;427(19):3031–3041. doi: 10.1016/j.jmb.2015.07.016
  • 29. Lensink MF, Nadzirin N, Velankar S, Wodak SJ. Modeling protein-protein, protein-peptide, and protein-oligosaccharide complexes: CAPRI 7th edition. Proteins Struct Funct Bioinforma. 2020;88(8):916–938. doi: 10.1002/prot.25870
  • 30. Peterson LX, Shin W-H, Kim H, Kihara D. Improved performance in CAPRI round 37 using LZerD docking and template-based modeling with combined scoring functions. Proteins Struct Funct Bioinforma. 2018;86(S1):311–320. doi: 10.1002/prot.25376
  • 31. Brooks BR, Brooks CL, Mackerell AD, et al. CHARMM: The biomolecular simulation program. J Comput Chem. 2009;30(10):1545–1614. doi: 10.1002/jcc.21287
  • 32. Haberthür U, Caflisch A. FACTS: Fast analytical continuum treatment of solvation. J Comput Chem. 2008;29(5):701–715. doi: 10.1002/jcc.20832
  • 33. Liang S, Zhou Y, Grishin N, Standley DM. Protein Side Chain Modeling with Orientation Dependent Atomic Force Fields Derived by Series Expansions. J Comput Chem. 2011;32(8):1680–1686. doi: 10.1002/jcc.21747
  • 34. Méndez R, Leplae R, Maria LD, Wodak SJ. Assessment of blind predictions of protein–protein interactions: Current status of docking methods. Proteins Struct Funct Bioinforma. 2003;52(1):51–67. doi: 10.1002/prot.10393
  • 35. Basu S, Wallner B. DockQ: A Quality Measure for Protein-Protein Docking Models. PLOS ONE. 2016;11(8):e0161879. doi: 10.1371/journal.pone.0161879
  • 36. Yang J, Anishchenko I, Park H, Peng Z, Ovchinnikov S, Baker D. Improved protein structure prediction using predicted interresidue orientations. Proc Natl Acad Sci. 2020;117(3):1496–1503. doi: 10.1073/pnas.1914677117
  • 37. Jain A, Terashi G, Kagaya Y, Maddhuri Venkata Subramaniya SR, Christoffer C, Kihara D. Analyzing effect of quadruple multiple sequence alignments on deep learning based protein inter-residue distance prediction. Sci Rep. 2021;11(1):7574. doi: 10.1038/s41598-021-87204-z
  • 38. Zhou H, Skolnick J. GOAP: A Generalized Orientation-Dependent, All-Atom Statistical Potential for Protein Structure Prediction. Biophys J. 2011;101(8):2043–2052. doi: 10.1016/j.bpj.2011.09.012
  • 39. Xu G, Ma T, Zang T, Sun W, Wang Q, Ma J. OPUS-DOSP: A Distance- and Orientation-Dependent All-Atom Potential Derived from Side-Chain Packing. J Mol Biol. 2017;429(20):3113–3120. doi: 10.1016/j.jmb.2017.08.013
  • 40. AlQuraishi M. AlphaFold at CASP13. Bioinformatics. 2019;35(22):4862–4865. doi: 10.1093/bioinformatics/btz422
  • 41. Gao M, Zhou H, Skolnick J. DESTINI: A deep-learning approach to contact-driven protein structure prediction. Sci Rep. 2019;9(1):3514. doi: 10.1038/s41598-019-40314-1
  • 42. Wang X, Flannery ST, Kihara D. Protein Docking Model Evaluation by Graph Neural Networks. bioRxiv. Published online December 31, 2020:2020.12.30.424859. doi: 10.1101/2020.12.30.424859
