Efficient flexible backbone protein–protein docking for challenging targets

Nicholas A Marze; Shourya S Roy Burman; William Sheffler; Jeffrey J Gray

Bioinformatics. 2018 Apr 30;34(20):3461–3469. doi: 10.1093/bioinformatics/bty355

Editor: Alfonso Valencia

PMCID: PMC6184633 PMID: 29718115

Abstract

Motivation

Binding-induced conformational changes challenge current computational docking algorithms by exponentially increasing the conformational space to be explored. To restrict this search to relevant space, some computational docking algorithms exploit the inherent flexibility of the protein monomers to simulate conformational selection from pre-generated ensembles. As the ensemble size expands with increased flexibility, these methods struggle with efficiency and high false positive rates.

Results

Here, we develop and benchmark RosettaDock 4.0, which efficiently samples large conformational ensembles of flexible proteins and docks them using a novel, six-dimensional, coarse-grained score function. A strong discriminative ability allows an eight-fold higher enrichment of near-native candidate structures in the coarse-grained phase compared to RosettaDock 3.2. It adaptively samples 100 conformations each of the ligand and the receptor backbone while increasing computational time by only 20–80%. In local docking of a benchmark set of 88 proteins of varying degrees of flexibility, the expected success rate (defined as cases with ≥50% chance of achieving 3 near-native structures in the 5 top-ranked ones) for blind predictions after resampling is 77% for rigid complexes, 49% for moderately flexible complexes and 31% for highly flexible complexes. These success rates on flexible complexes are a substantial step forward from all existing methods. Additionally, for highly flexible proteins, we demonstrate that when a suitable conformer generation method exists, the method successfully docks the complex.

Availability and implementation

As a part of the Rosetta software suite, RosettaDock 4.0 is available at https://www.rosettacommons.org to all non-commercial users for free and to commercial users for a fee.

Supplementary information

Supplementary data are available at Bioinformatics online.

1 Introduction

Proteins bind each other in a highly specific and regulated manner. Often, a change in conformation from the unbound to the bound state forms the basis of the protein’s specificity and function in its interaction (Chu et al., 2013; Hwang et al., 2010; Vakser, 2014; Vreven et al., 2015). Since the beginning of the field (Janin and Wodak, 1978), conformational changes in proteins induced by binding have confounded protein–protein docking algorithms by greatly increasing the degrees of freedom to be sampled. While rotamer libraries have alleviated the sampling challenges for surface side chains (Krivov et al., 2009), backbone flexibility remains the principal challenge in protein–protein docking. Previous studies have found limited success by varying the backbone along a restricted set of coordinates (Mashiach et al., 2010; Moal and Bates, 2010; Venkatraman and Ritchie, 2012) or interface residues (Schindler et al., 2015; Wang et al., 2005) or by docking a small number of backbone conformations of the two partners (Chaudhury and Gray, 2008; Grünberg et al., 2004; Trellet et al., 2013; Zhang et al., 2017). The most recent rounds of the blind docking challenge, Critical Assessment of PRediction of Interactions (CAPRI), demonstrated that protein flexibility is still a community-wide weakness, with flexible target complexes eliciting no successful predictions from any method (Lensink et al., 2016, 2017).

Flexible-backbone docking, as well as other key remaining protein–protein docking challenges such as global docking and docking of large multi-domain complexes, demands more algorithmic complexity to explore a larger conformational search space than rigid-body docking of small proteins (Kuroda and Gray, 2016). Coarse-graining is commonly used to model longer time-scales and larger systems in a rapid, yet meaningful manner (Baaden and Marrink, 2013; Kmiecik et al., 2016). Score functions designed to navigate this reduced space smoothen the energy landscape to avoid getting stuck in local minima. While allowing orders-of-magnitude more conformational sampling, coarse-grained models are limited by their accuracy and typically require high-resolution refinement.

The consensus on the kinetic mechanism of many conformational changes is that the protein monomers exist in an equilibrium of multiple conformations from which the preferred conformations are selected during an initial encounter with the binding partner, and subsequently, localized structural rearrangements stimulated by the partner tighten the binding (Changeux and Edelstein, 2011; Vogt and Di Cera, 2012). The former mechanism is called conformational selection and the latter induced fit. Conformational selection lends itself to coarse-graining, as the discrete conformations can be individually sampled. However, large conformational ensembles of flexible proteins multiply the computational demand and increase the false positive rates. Previous studies have used experimental data to create a minimal ensemble that captures the observed flexibility (Xu and Lill, 2012), or have selected optimal conformations from a large ensemble with a priori knowledge of the native orientation (Pallara et al., 2016), but these data are seldom available. Thus, it is desirable to have a coarse-grained method that efficiently samples a sizeable ensemble while distinguishing spurious interfaces from the native interface. Smaller changes caused by induced fit are less suitable to be modeled at this resolution, but are more amenable to full-atom modeling.

RosettaDock has been among the top-performing methods for computational protein–protein docking (Chaudhury et al., 2007; Daily et al., 2005; Gray et al., 2003a; Kilambi et al., 2013; Sircar et al., 2010). Combining coarse-grained conformational selection with full-atom induced fit, RosettaDock 3.2 achieved successful docking predictions on a majority of rigid complexes (58%) in the Docking Benchmark 3.0 set (Chaudhury et al., 2011). On the more flexible targets, however, RosettaDock (like other methods) performed poorly, only achieving a successful docking prediction on 29% of the moderately flexible complexes and 14% of the highly flexible complexes. The performance in CAPRI rounds since the last advances mimicked the benchmark performance (Marze et al., 2017). For flexible docking, the current protocol relies on sampling a pre-generated ensemble of monomer backbone conformations (Chaudhury and Gray, 2008), but increasing the ensemble size beyond 20 conformers is computationally infeasible. Additionally, the ‘centroid’ score function used to discriminate near-native conformations from incorrect ones is not sufficiently accurate in the coarse-grained phase, where the search is the broadest (Zhang et al., 2013).

In this study, we pursued two avenues to address these computational limitations. First, to improve sampling efficiency, we developed a fast and scalable backbone sampling algorithm, Adaptive Conformer Selection (ACS), that modulates the frequency of conformer selection for each partner depending on the size and diversity of the ensemble. Second, to improve scoring efficiency, we developed a fast and accurate scoring method, Motif Dock Score (MDS), based on the residue-pair transform (RPX) score, which was recently developed to design hydrophobic symmetric protein interfaces (Fallas et al., 2017). RPX score evaluates residue pairs using the 6D transformation needed to superimpose the residues' N–C_α–C backbone atoms onto each other. In a single lookup, RPX score queries this transformation against a pre-tabulated database of aliphatic amino acid pairs and their corresponding geometries and full-atom Rosetta scores. The pair score and sequence of the best amino acid pair from the database are then assigned to the queried residue pair. We derived and optimized MDS from the RPX basis in the context of the RosettaDock protocol, expanding it to all twenty amino acids and selecting for enrichment of near-native candidate structures.

We tested RosettaDock 4.0, which contains both ACS and MDS enhancements, on a subset of Docking Benchmark 5.0 (Vreven et al., 2015) to evaluate the relative performance versus RosettaDock 3.2, and other commonly used docking protocols. The performance in both the full benchmark set and the three flexibility-based subsets (rigid, medium-flexible and highly flexible) showed significant improvements, most notably among previously intractable flexible-backbone complexes.

2 Materials and methods

2.1 Motif querying

To create the score tables for motif dock score, we culled the Protein Data Bank (Berman et al., 2000) for all crystal structures containing two or more interacting protein chains and a resolution of 3.0 Å or better (detailed in Supplementary Method S1). Each of the 154 955 protein–protein complex structures in the protein interface set was loaded into Rosetta and scored with a full-atom score function; the resultant energies were decomposed onto the set of interacting residue pairs. The system was queried for cross-chain pairs of residues with C_β atoms (C_α for glycine) within 10 Å of each other with a pair score below a constant energy cut-off (typically 0 kcal/mol; i.e. residue pairs that are net-attractive). For each residue pair in the filtered residue set, we calculated the six-dimensional transform needed to superimpose one amino acid backbone onto the other (three-dimensional Cartesian translation and three-dimensional Euler angle rotation). Each pair score was stored with its corresponding 6D-transform as a one-line motif.
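As an illustration of the six-dimensional transform described above, the following is a minimal sketch in plain Python. The frame convention (origin at C_α, x-axis along C_α→C, z-axis normal to the N–C_α–C plane), the ZYZ Euler convention and the function names `backbone_frame` and `six_d_transform` are assumptions made for this example; they are not taken from the Rosetta source and may differ from what the motif tables actually use.

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _unit(a):
    norm = math.sqrt(_dot(a, a))
    return tuple(x / norm for x in a)

def backbone_frame(n, ca, c):
    """Right-handed frame on one residue: origin at CA, x along CA->C,
    z normal to the N-CA-C plane (an assumed convention)."""
    x = _unit(_sub(c, ca))
    z = _unit(_cross(x, _sub(n, ca)))
    y = _cross(z, x)
    return ca, (x, y, z)

def six_d_transform(frame1, frame2):
    """Translation (in Å) and ZYZ Euler angles (in radians) of frame2 as seen from frame1."""
    origin1, axes1 = frame1
    origin2, axes2 = frame2
    shift = _sub(origin2, origin1)
    translation = tuple(_dot(shift, axis) for axis in axes1)
    # r[i][j] is the projection of frame2's axis j onto frame1's axis i.
    r = [[_dot(axes1[i], axes2[j]) for j in range(3)] for i in range(3)]
    beta = math.acos(max(-1.0, min(1.0, r[2][2])))
    if abs(math.sin(beta)) > 1e-9:
        alpha = math.atan2(r[1][2], r[0][2])
        gamma = math.atan2(r[2][1], -r[2][0])
    else:
        # Degenerate case: only a combination of alpha and gamma is defined, so fix gamma = 0.
        sign = 1.0 if r[2][2] > 0 else -1.0
        alpha = math.atan2(sign * r[1][0], sign * r[0][0])
        gamma = 0.0
    return translation + (alpha, beta, gamma)
```

Each stored motif would then pair one such six-number tuple with the decomposed pair score of the two residues.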

2.2 Score grid generation

A score grid is initialized with a translational and rotational grid size. One by one, motifs are analyzed. The motif 6D-transform is binned, and the corresponding bin in the score grid is queried. If the bin is empty, the motif score is saved as the bin score. If the bin is populated, the old bin score and the motif score are compared, and the lower of the two is saved as the new bin score (see Supplementary Method S4 for further details).
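The grid-filling rule above can be sketched as follows. This is a simplified illustration, not the Rosetta implementation: the dictionary-based grid, the function names `bin_key` and `build_score_grid`, the use of degrees for the angles and the default bin sizes (2 Å and 22.5°, the values reported as optimal in the Results) are all assumptions of this example.

```python
import math

def bin_key(transform, trans_step=2.0, rot_step=22.5):
    """Integer 6D bin for a transform (x, y, z in Å; three angles in degrees)."""
    x, y, z, a, b, c = transform
    trans_bins = tuple(math.floor(v / trans_step) for v in (x, y, z))
    rot_bins = tuple(math.floor((v % 360.0) / rot_step) for v in (a, b, c))
    return trans_bins + rot_bins

def build_score_grid(motifs, trans_step=2.0, rot_step=22.5):
    """Fill a sparse grid from (transform, score) motifs, keeping the lowest score per bin."""
    grid = {}
    for transform, score in motifs:
        key = bin_key(transform, trans_step, rot_step)
        # Empty bin: store the score. Populated bin: keep the lower of the two.
        if key not in grid or score < grid[key]:
            grid[key] = score
    return grid
```

Storing only populated bins keeps the table sparse, which matters because a dense 6D grid at this resolution would be mostly empty.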

2.3 Scoring with motif dock score

RosettaDock 4.0 uses the same algorithmic framework as RosettaDock 3.2 described previously (Gray et al., 2003b), with modernizations described thereafter (Chaudhury and Gray, 2008; Chaudhury et al., 2011; Marze et al., 2017). The standard low-resolution score function (interchain_cen) is replaced with a motif-based score function, called motif_dock_score. The score function consists of a new scoring term, motif_dock, and a clash penalty (interchain_vdw). The motif_dock term is a residue pair energy that acts only on cross-chain residue pairs with C_α atoms within 10 Å of each other. The residue pairs are scored by calculating their 6D-transform, converting this to the hash value of the corresponding 6D bin, querying the hash table and reporting the bin score. If the bin is empty (i.e. there are no matches for the hash), the pair score is either zero, if no penalty is used, or 0.5 kcal/mol, if a penalty is used.
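The lookup described above might be sketched as below. The function `motif_dock_term`, its arguments and the toy binning helper `coarse_bin` are hypothetical names introduced for illustration; the clash penalty (interchain_vdw) is omitted, and the real term operates on Rosetta pose objects rather than on precomputed tuples.

```python
def motif_dock_term(pairs, score_table, bin_of, empty_bin_penalty=0.0, ca_cutoff=10.0):
    """Sum table scores over cross-chain residue pairs.

    pairs: iterable of (ca_distance_in_angstrom, six_d_transform) tuples.
    score_table: dict mapping a bin key to the stored pair score.
    bin_of: callable turning a 6D transform into a bin key.
    empty_bin_penalty: 0.0 for no penalty, or 0.5 if a penalty is used.
    """
    total = 0.0
    for ca_distance, transform in pairs:
        if ca_distance > ca_cutoff:
            continue  # the term acts only on pairs within the cutoff
        total += score_table.get(bin_of(transform), empty_bin_penalty)
    return total

def coarse_bin(transform):
    """Toy binning for the example: 2 Å translational and 22.5 degree rotational bins."""
    trans = tuple(int(v // 2.0) for v in transform[:3])
    rot = tuple(int((v % 360.0) // 22.5) for v in transform[3:])
    return trans + rot
```

Because each pair costs one hash lookup, the cost of scoring grows with the number of cross-chain pairs inside the cutoff rather than with the number of atoms.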

2.4 Generation of backbone ensembles

To generate diversity in backbone conformations for the RosettaDock 4.0 runs, we used three conformer generation methods: perturbation of the backbones along the normal modes by 1 Å (Atilgan et al., 2001) [using RosettaScripts (Fleishman et al., 2011)], refinement using the Relax protocol in Rosetta (Tyka et al., 2011), and backbone flexing using the Rosetta Backrub protocol (Smith and Kortemme, 2008). (Detailed protocols and complete command lines are provided in Supplementary Method S5.) Since the normal mode analysis generated the largest deviations, we used 40 normal mode conformers, 30 Relax conformers and 30 Backrub conformers to comprise the ensemble of 100 conformers.

2.5 Local docking simulations

Docking simulations were performed using two versions of RosettaDock, viz. 3.2 (Chaudhury and Gray, 2008) and 4.0 (developed in this article). The sampling and scoring enhancements introduced in version 4.0 are confined to the low-resolution stage. The starting structure was generated by superimposing unbound monomers on the bound structure, moving them 15 Å apart, and rotating the smaller partner by 60° to scramble the interface. For each trajectory, a Gaussian random 3 Å and 8° perturbation provided different starting states. This allowed a broad local search. For motif dock score optimization and benchmarking runs, 10 000 and 5000 decoys were generated per target, respectively. (Detailed protocols and complete command lines are provided in Supplementary Method S6.) Global docking results are briefly discussed in Supplementary Result S1 and Figure S8.

2.6 Benchmark evaluation and success metrics

We evaluated the results of the docking benchmark runs using two types of metrics: a top-scoring near-native model count (N#) and near-native enrichment values (E_N%). We define N# as the number of near-native decoys among a set number (#) of top-scoring decoys after the high-resolution stage, analogous to the N5 metric used in previous studies (Chaudhury et al., 2011). Docking runs with N# values above a given threshold are categorized as ‘successful’. For N5, we define 3 near-native decoys as a success when evaluating docking protocols. To be counted as near-native, the high resolution models must meet the standard criteria for a CAPRI acceptable, medium-quality or high-quality model (elaborated in Supplementary Method S3). We also use N50, N100, N500 and N1000 (success thresholds of 15, 30, 75 and 150, respectively) to measure the sampling rates of near-native models in our top 1% and top 10% of models, respectively. Enrichment values are defined as:

E_{N\%} = \frac{\dfrac{\#\,\text{near-native in top } N\%}{\#\,\text{decoys in top } N\%}}{\dfrac{\#\,\text{near-native}}{\#\,\text{decoys}}}

We use E_1% and E_10% to measure the ability of our scoring methods to enrich a model set. We calculated the expected value of N# and E_N% metrics by bootstrapping, i.e. resampling with replacement from the available model set a number of models equal to the size of the set. This process was repeated 1000 times, and bootstrapped averages are denoted by ⟨·⟩.
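The metrics and the bootstrapping procedure can be sketched in a few lines of Python. This is an illustrative reading of the definitions above, not the authors' analysis script: the function names, the representation of a decoy as a `(score, is_near_native)` tuple and the choice to return 0.0 when a resampled set contains no near-native decoys are assumptions of this example.

```python
import random

def n_top(decoys, n):
    """N#: number of near-native decoys among the n lowest-scoring decoys."""
    ranked = sorted(decoys, key=lambda d: d[0])
    return sum(1 for _, near_native in ranked[:n] if near_native)

def enrichment(decoys, percent):
    """E_N%: near-native fraction in the top N% divided by the overall fraction."""
    ranked = sorted(decoys, key=lambda d: d[0])
    top_size = max(1, int(len(ranked) * percent / 100))
    total_near = sum(1 for _, near_native in ranked if near_native)
    if total_near == 0:
        return 0.0  # assumed convention: enrichment is otherwise undefined
    top_near = sum(1 for _, near_native in ranked[:top_size] if near_native)
    return (top_near / top_size) / (total_near / len(ranked))

def bootstrap_mean(decoys, metric, n_rounds=1000, seed=0):
    """Average a metric over resamples drawn with replacement, each the size of the set."""
    rng = random.Random(seed)
    values = []
    for _ in range(n_rounds):
        sample = rng.choices(decoys, k=len(decoys))
        values.append(metric(sample))
    return sum(values) / n_rounds
```

For example, `bootstrap_mean(decoys, lambda s: n_top(s, 5))` would give the bootstrapped N5 average for one target under these assumptions.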

3 Results

RosettaDock is a Monte Carlo-plus-minimization algorithm (Li and Scheraga, 1987) consisting of a low-resolution stage, which simulates conformer selection during the formation of the encounter complex, followed by a high-resolution stage, which simulates induced fit in the bound complex (Chaudhury and Gray, 2008; Gray et al., 2003b). To produce a variety of starting states for the different trajectories, the ligand (the smaller protein) is first randomly rotated and translated about the receptor (the larger protein). In the low-resolution stage, side chains are replaced by coarse-grained ‘pseudoatoms’, allowing the ligand to efficiently sample the interface by rigid-body movements in a smoothened energy landscape. These rigid-body moves are coupled with backbone conformation swaps where the current backbone conformations of the ligand and the receptor are swapped with different ones from a pre-generated ensemble of conformations. In the high-resolution stage, the side chains are reintroduced to the putative encounter complex and those at the interface are packed for tight binding. There is minimal rigid-body motion in this second stage.

3.1 Adaptive conformer selection

The previous version of RosettaDock, version 3.2, was optimized to handle small ensembles and hence had a fixed number of conformation swaps. This choice led to reduced sampling of near-bound conformations as the ensembles grew larger. In RosettaDock 4.0, we alleviate this problem by modulating the number of conformer swaps depending on the swap acceptance rate of the previous cycle. If the acceptance rate of the conformer swaps is under 30%, the ensemble is presumed to be large and diverse, and hence the probability of the conformer swap is increased by 25%; conversely, if the acceptance rate is 30% or more, the probability is reduced by 25%. This adjustment helps prevent unnecessary backbone sampling for small ensembles and those with similar backbones while increasing backbone sampling for diverse ensembles by up to 477% over the course of 8 cycles. We call this backbone variation method Adaptive Conformer Selection (ACS). Figure 1A shows the variation in conformer sampling frequencies for an example case of the ClpA chaperone: Clp protease adapter complex (PDB: 1R6Q), where the unbound to bound deviation of the C_α atoms at the interface is 1.4 Å for the chaperone and 2.0 Å for the protease. In this case, the protocol adapts to enable more trials of the protease backbone conformer swaps and, to a lesser extent, of the chaperone swaps.
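The update rule can be illustrated with a short sketch. It assumes that the 25% adjustment is multiplicative and applied once between consecutive cycles; the function names and the relative starting frequency of 1.0 are hypothetical, and the actual bookkeeping in RosettaDock may differ.

```python
def update_swap_frequency(frequency, acceptance_rate, threshold=0.30, step=0.25):
    """Raise the swap frequency by 25% after a low-acceptance cycle, otherwise lower it by 25%."""
    if acceptance_rate < threshold:
        return frequency * (1.0 + step)
    return frequency * (1.0 - step)

def swap_frequency_schedule(acceptance_rates, start=1.0):
    """Relative swap frequency per cycle, given the acceptance rate observed in each previous cycle."""
    frequencies = [start]
    for rate in acceptance_rates:
        frequencies.append(update_swap_frequency(frequencies[-1], rate))
    return frequencies
```

Under this reading, a diverse ensemble whose acceptance rate stays below 30% receives seven consecutive increases across eight cycles, which compounds to 1.25^7 ≈ 4.77 times the starting frequency. That appears consistent with the 477% figure quoted above if the figure is read as a ratio to the initial value.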

Fig. 1. Amount of backbone sampling in RosettaDock 4.0. (A) Modulation of backbone conformer swap trials in RosettaDock 4.0 for each of the first 8 cycles of Monte Carlo moves in the low-resolution search stage. The dashed line indicates the number of trials for each of the different moves in RosettaDock 3.2. Adaptive conformer selection in RosettaDock 4.0 ensures increased backbone swapping frequency for Clp protease adapter over ClpA chaperone, which is less flexible at the interface. (B) Comparison of the number of self-swaps versus swaps to other conformations in RosettaDock 3.2 versus RosettaDock 4.0 for the highly flexible CCS metallochaperone: superoxide dismutase complex. RosettaDock 4.0 has increased backbone sampling both in the number and fraction of other conformations sampled.

Previously, to determine which backbone was to be swapped in during conformer swapping, RosettaDock calculated the partition function of the entire ensemble of backbones superimposed along the protein–protein interface. The constraints of the interface, steric and otherwise, penalized conformations with backbone variations near the interface, creating a high probability for the existing backbone to be reselected during the conformer swap. In the case of superoxide dismutase (PDB: 1JK9), 36% of the backbone swaps were self-swaps (Fig. 1B). Moreover, if there are n₁ conformations of the receptor and n₂ conformations of the ligand, the partition function calculation required O(n₁•n₂) time, which meant that it required 10³ times longer for ensembles with 100 conformations each than for ensembles with 1 receptor conformation and 10 ligand conformations (Chaudhury and Gray, 2008). We replaced this expensive partition function calculation with random conformer swaps, speeding up the protocol by as much as 12-fold and reducing self-swapping to 8% (approximately the inverse of the size of the ensemble).
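The thousand-fold figure follows directly from the stated scaling, assuming the cost T of the partition function calculation is strictly proportional to the product of the two ensemble sizes (the symbol T is introduced here only for illustration):

```latex
\frac{T(n_1 = 100,\; n_2 = 100)}{T(n_1 = 1,\; n_2 = 10)}
  \approx \frac{100 \times 100}{1 \times 10} = 10^{3}
```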

We examined the importance of using large and varied ensembles and concluded that using efficient sampling of ensembles generated by different, diverse methods yielded higher quality docked structures for many complexes (Supplementary Result S2 and Fig. S9).

3.1.1 Efficiency of conformer selection

ACS made RosettaDock 4.0 marginally faster than RosettaDock 3.2 for simulations with small ensembles of 1 receptor and 10 ligand conformations (Supplementary Fig. S1). The speed-up was pronounced when the ensembles of both partners had 100 conformations each. For protein complexes larger than 1000 total residues, for example, the eEF2-ETA-bTAD complex (PDB: 1ZM4) with 204 residues in the ligand and 822 residues in the receptor, RosettaDock 4.0 with ACS was over 12 times faster than RosettaDock 3.2 (Fig. 2). Thus, the ACS method scales up practically for larger ensembles.

Fig. 2. Time comparison of the docking protocols for large ensembles. Average time per decoy for RosettaDock 3.2 (x) and 4.0 (+) with ensembles having 100 receptor and 100 ligand conformations for complexes ranging from 191 to 1026 total residues. Adaptive Conformer Selection makes RosettaDock 4.0 up to 12 times faster for cases with large interfaces.

3.2 Optimization and benchmarking of motif dock score

For the recognition of the native interface during the broad, low-resolution search, docking requires a score function with predictive accuracy close to that of the well-tested full-atom score function. In earlier versions of RosettaDock, the low-resolution ‘centroid’ score function relied on a single distance between potential interacting residues to score inter-chain contacts. This one-dimensional information was insufficient to represent the relative orientation of the two residues and consequently, their interaction. A statistical potential derived by using two inter-residue distances (C_α–C_α and C_β–C_β) showed remarkable accuracy on Bcl-2 affinity predictions (DeBartolo et al., 2012), suggesting that with more information on relative orientation, it could be possible to distinguish native interfaces without representing the side chain in full. With this idea in mind, we developed Motif Dock Score (MDS) based on the residue-pair transform (RPX) framework (Fallas et al., 2017) for interface design.

MDS calculates the 6-dimensional transform (3 rotations and 3 translations) needed to superimpose the backbone atoms of interacting residues, looks up the residue pair score from pre-generated tables, and sums scores over all such pairs. Each entry in these tables is the lowest full-atom score calculated for a pair of interface residues in the bin for the given relative backbone orientation. MDS depends on a discrete space tabulation of all-atom energies; therefore, we optimized the bin size of the scoring grid to 2 Å/22.5° (Supplementary Figs S2A, S3, S4). We also tested alternate underlying score functions to generate the residue pair motifs and recognized that the current Rosetta standard, REF15 (Alford et al., 2017; Park et al., 2016) had the highest average near-native enrichment of all score grids tested (Supplementary Figs S2B, S5, S6). Lastly, we added a van der Waals repulsive term to prevent protein partners from embedding in each other. (See Supplementary Method S4 for details on optimization.)

To evaluate the accuracy of local docking using MDS, we compared its performance against a baseline method, RosettaDock 3.2’s centroid low-resolution docking mode, on a representative, nine-target benchmark set (set 2, see Supplementary Method S2). For each of the two algorithms, we generated 10 000 candidate structures per complex. As examples, Figure 3 shows the Ras: RALGDS domain complex (PDB: 1LFD) and BET3: TPC6 complex (PDB: 2CFH) results. All candidate structures generated by the low-resolution phase of docking are plotted, comparing their low-resolution score to their RMSD from the experimental bound structure. For the baseline score function (Fig. 3A), the lowest-scoring models are nearly all incorrect with RMSD values from 7 to 22 Å, and few models under 6 Å are sampled at all. In contrast, with MDS (Fig. 3B), a clear ‘funnel’ can be seen in the plot, with the lowest-scoring models having low RMSD values from the native structure. The top-scoring structures are near-native, indicating successful discrimination. Further, if MDS were used to filter the candidate structures so that only the top 1 or 10% of low-resolution candidates were sent to the computationally intensive refinement stage, near-native structures would be included in the set. In contrast, filtering with the centroid score would eliminate the best structures. Docking results for the BET3: TPC6 complex (Fig. 3C and D) present a similar trend in that near-native models are lost when filtering on centroid score and can be retained by filtering on MDS. Supplementary Tables S2 and S3 present docking metrics for each of the nine complexes in the test set. Since this is a coarse-grained structure comparison, instead of the standard CAPRI metrics, we defined near-native as ligand RMSD_Cα < 6 Å. Significant improvements occur for all but the most flexible complexes.

Fig. 3. Low-resolution score versus RMSD from native plots for two examples, viz. Ras: RALGDS domain complex (A and B) and BET3: TPC6 complex (C and D). (A and C) 10 000 models generated by RosettaDock 3.2 using the centroid score function, and (B and D) 10 000 models generated by RosettaDock 4.0 using motif dock score (MDS) function. (A) Centroid score does not generate many near-native candidate structures, and it cannot distinguish them from incorrect models. All metrics indicate failure: N5 = 0, N100 = 0, N1000 = 23. (B) MDS generates a large number of near-native candidate structures, and discriminates them from incorrect models. All metrics indicate success: N5 = 5, N100 = 95, N1000 = 750. (C) N5 = 1 indicates discrimination failure, but N100 = 86 and N1000 = 673 indicate that the broader set is enriched in near-native structures. (D) All metrics indicate success: N5 = 5, N100 = 98, N1000 = 813.

To test whether MDS was unduly biased by existing structures of homologous interfaces in creating the score function, we removed all homologs of the proteins in Docking Benchmark 5.0 identified in the Dockground (Anishchenko et al., 2015) and the PIFACE (Cukuroglu et al., 2014) libraries before building the motif tables. Supplementary Table S4 demonstrates that the performance of MDS with tables built after removal of the 8126 homologs is similar to that with just the benchmark PDBs removed.

3.3 Evaluation of RosettaDock 4.0 on benchmark set

The ensemble generation methods used, viz. Rosetta Backrub, Rosetta Relax and NMA, have been shown to produce backbones that are between 1 and 4 Å RMSD from the unbound starting structure, with an average correlation of 0.4–0.5 to the experimentally determined displacements of the bound and unbound states (Kuroda and Gray, 2016). The extent of motion suggested that the ensembles generated using these methods could be used to dock moderately flexible proteins. Thus, we built a benchmark set enriched with moderately flexible proteins to evaluate the RosettaDock 4.0 protocol.

We evaluated the accuracy of RosettaDock 4.0 for 43 complexes classified as medium-flexible, as well as for 32 classified as flexible and 13 classified as rigid, for a total of 88 targets (set 3, see Supplementary Method S2). For each target, we pre-generated 100 conformations for both the ligand and the receptor ensembles. The three conformer generation methods produce motions in different directions and locations, and hence we increased the variability of the full ensembles by using 40 conformations made using NMA, 30 made using Backrub and 30 made using Relax. We then generated 5000 local docked models using the full RosettaDock 4.0 protocol for each target. We also ran control simulations using the RosettaDock 3.2 protocol, also generating 5000 candidate structures per target. For a fair comparison to the previously published accuracy metrics, we generated conformer ensembles for the control runs containing only 1 receptor conformation and 10 ligand conformations.

The ability of the two protocols to sample and discriminate near-native structures was evaluated using the bootstrapped N5 average, ⟨N5⟩, both after the low-resolution stage and for the final models after the high-resolution stage. To evaluate the enrichment in the low-resolution stage alone, which dictates how many trajectories need to be run, we used the ⟨E_1%⟩ metric. As summarized in Table 1, RosettaDock 4.0 shows significant performance gains over RosettaDock 3.2, particularly in the low-resolution phase. RosettaDock 4.0’s near-native enrichment is improved markedly, with a median ⟨E_1%⟩ value of 2.5, implying that its very low-scoring sets are significantly enriched with near-native structures relative to the bulk candidate set. RosettaDock 3.2’s median ⟨E_1%⟩ value is 0.0, indicating that the very low-scoring set is devoid of near-native structures. Figure 4 compares enrichments of RosettaDock 3.2 versus RosettaDock 4.0 for each target. The ⟨E_1%⟩ performance (Fig. 4A) improves for 62 complexes in RosettaDock 4.0, most of which had zero enrichment previously. The performance is worse for seven complexes, primarily due to favorable scoring of spurious interfaces. For the remaining 19 complexes, neither method was enriched in near-native decoys. RosettaDock 4.0 has an average low-resolution ⟨N5⟩ value of 1.3 across all targets, which implies that even after coarse-graining the side chains, more than one in the five top-scoring structures is near-native on average. This is approximately a ten-fold improvement over the corresponding average from RosettaDock 3.2. Our criterion for successful discrimination is that the ⟨N5⟩ value should be 3 or higher. We see a seven-fold improvement in the number of expected low-resolution discrimination successes across the benchmark set (16.8 versus 2.5 complexes). Pairwise target comparison (Fig. 4B) shows that only 2 success cases are lost from RosettaDock 3.2 to RosettaDock 4.0, while 13 additional successes are added.

Table 1.

Summary of performance of RosettaDock 3.2 versus RosettaDock 4.0 across an 88-target benchmark set

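Metric	Flexibility category	RosettaDock 3.2 Low-Res ⟨N5⟩	RosettaDock 3.2 High-Res ⟨N5⟩	RosettaDock 3.2 ⟨E_1%⟩	RosettaDock 4.0 Low-Res ⟨N5⟩	RosettaDock 4.0 High-Res ⟨N5⟩	RosettaDock 4.0 ⟨E_1%⟩
Average value	Rigid Body	0.0	2.7	0.0	2.2	3.5	9.0
Average value	Medium	0.3	1.8	0.0	1.0	2.4	3.6
Average value	Difficult	0.0	1.2	0.0	0.6	1.6	0.2
Average value	Difficult (Doped)	–	–	–	0.7	2.2	2.9
Average value	All	0.1	1.9	0.0	1.3	2.5	2.5
Average value	All (Doped)	–	–	–	1.3	2.7	3.7
Expected successes	Rigid Body	0.0	7.1	0.1	5.6	9.5	7.0
Expected successes	Medium	2.5	15.4	2.8	7.6	20.2	13.0
Expected successes	Difficult	0.0	7.4	0.3	3.6	9.8	5.0
Expected successes	Difficult (Doped)	–	–	–	4.2	13.7	4.7
Expected successes	All	2.5	29.9	3.2	16.8	39.6	25.1
Expected successes	All (Doped)	–	–	–	17.4	43.4	24.8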

Note: The ⟨N5⟩ values are the average bootstrapped N5 values, both after the low-resolution stage and after the high-resolution stage (full protocol), with averages calculated across all targets in each flexibility category, as well as across the entire set. ⟨E_1%⟩ is the median bootstrapped enrichment in the 1% top-scoring structures (after the low-resolution phase). Flexible target results include measurements with doped ensembles. The number of expected success cases, as calculated via bootstrapping, is defined as follows: for the N5 columns, ⟨N5⟩ ≥ 3; for the ⟨E_1%⟩ columns, ⟨N50⟩ ≥ 15.

Fig. 4. Comparison of performance metrics between RosettaDock 3.2 and RosettaDock 4.0 for individual complexes in the benchmark. Targets are represented by different symbols corresponding to their difficulty category (circle: rigid; triangle: medium; diamond: flexible). Points above the solid line represent better performance in RosettaDock 4.0, while points below the line represent better performance in RosettaDock 3.2. Comparison of (A) ⟨E_1%⟩ enrichment values between the two protocols on log-log axes. ⟨E_1%⟩ shows marked improvement in the vast majority of the complexes. Dashed lines demarcate regions where the low-scoring set is enriched in near-native structures. Comparison of ⟨N5⟩ values (B) after low-resolution stage, and (C) after high-resolution stage (full protocol). Dashed lines highlight the region in which the two protocols differ significantly, i.e. by more than one point in their ⟨N5⟩ values. After the full protocol, 23 of the 88 complexes are modeled significantly better and 7 complexes are modeled significantly worse.

While the low-resolution stage is improved using binned energy approximations, additional gains are possible in the high-resolution stage where all protein atoms are explicitly represented. After the full protocol with both low- and high-resolution stages, the average ⟨N5⟩ increases from 1.9 in RosettaDock 3.2, which represents a marginal failure, to 2.5 in RosettaDock 4.0, which represents the borderline for success. The expected number of successes in the benchmark set increases from 29.9 to 39.6 complexes, a 32% improvement. About half of the additional successes are gained from moderately-flexible complexes, with another quarter coming from flexible complexes, suggesting that RosettaDock 4.0 is better at capturing flexible backbones than RosettaDock 3.2. Additionally, although rigid complexes only comprise 15% of the benchmark set, they comprise 25% of the docking improvements, suggesting that in a more balanced benchmark set containing more rigid targets, the improvement in performance in RosettaDock 4.0 might be even larger. As shown in Figure 4C, while 23 complexes have full protocol ⟨N5⟩ values improved by 1 or more in the RosettaDock 4.0 simulations, 7 complexes have ⟨N5⟩ decreased by 1 or more. Detailed metrics for each target can be found in Supplementary Table S5 and Figures S12–S17.
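The percentages quoted above can be recovered from the expected-success values in Table 1 (High-Res ⟨N5⟩ columns). The arithmetic below is a reader's check rather than part of the original analysis; the three per-category gains sum to 9.6 rather than 9.7 because the table entries are rounded.

```latex
\begin{align*}
\text{overall gain:} \quad & \frac{39.6 - 29.9}{29.9} = \frac{9.7}{29.9} \approx 0.32 \\
\text{medium share:} \quad & \frac{20.2 - 15.4}{9.7} = \frac{4.8}{9.7} \approx 0.49 \\
\text{difficult share:} \quad & \frac{9.8 - 7.4}{9.7} = \frac{2.4}{9.7} \approx 0.25 \\
\text{rigid share:} \quad & \frac{9.5 - 7.1}{9.7} = \frac{2.4}{9.7} \approx 0.25 \\
\text{rigid fraction of the set:} \quad & \frac{13}{88} \approx 0.15
\end{align*}
```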

3.3.1 Ensembles doped with near-bound structures

We previously showed that when the RMSD gap between the closest conformation in the ensemble and the bound state exceeds 1 Å, induced fit methods are rarely able to access the binding funnel (Kuroda and Gray, 2016). We observed similar results for the Docking Benchmark 5.0 difficult targets (cases with interface RMSD_Cα > 2.2 Å). As none of the ensemble generation methods used move the backbone quite so far, neither RosettaDock 3.2 nor RosettaDock 4.0 performed well on difficult targets. For example, the complex of SRP GTPase with FtsY undergoes an interface conformational change of 2.67 Å RMSD (Supplementary Fig. S10), and the docking run creates only a few acceptable predictions and does not rank them highly (Fig. 5A). Both monomer backbones undergo about 3 Å of conformational change upon binding, but the ensembles created from the unbound state do not contain any conformations closer than 2.5 Å to the bound state (Fig. 5C).

Fig. 5. — Improvement in docking performance of RosettaDock 4.0 by doping the ensemble with near-bound decoys for SRP GTPase: FtsY complex. Score versus RMSD plot of runs with (A) backbone conformations generated using NMA, Backrub and Relax protocols, and (B) ensembles doped with 10% near-bound conformations. (A) Without the ensemble doping, the simulations did not generate medium- or high-quality docked structures, and the acceptable structures did not score low enough to be discriminated from incorrect structures. (B) Ensemble doping generated deep docking funnels with high-quality structures. Colored points indicate CAPRI-quality category for each decoy, and the blue points provide a reference energy of the refined, bound crystal structure. (C and D) Plot of the contact-residue RMSD_Cα from the bound conformation for the ligand and the receptor conformers selected after the docking simulation for (C) ensembles without near-native doping, and (D) ensembles with 10% near-bound conformations doped. The RMSD values of the unbound conformations are marked with a green line segment, and those of the near-bound conformations are marked in colors corresponding to the biasing constraint weight. (C) The conformer generation methods are unable to generate sub-Å contact-residue RMSD_Cα structures starting from the unbound ligand conformation (with RMSD_Cα of 3.57 Å) and the unbound receptor conformation (with RMSD_Cα of 2.92 Å). (D) Four of the biased conformations of the ligand and five of the receptor are within 1 Å RMSD_Cα from the bound state. RosettaDock 4.0 is able to recognize these close conformations, find the native-like interface and successfully dock the complex

For cases with large backbone variation, we wondered whether RosettaDock 4.0 could select a near-bound backbone if such structures were present in a large, diverse ensemble used in the conformer selection stage. Therefore, we tested docking using ensembles doped with near-native backbone structures. To generate near-bound structures, we used Rosetta’s Relax protocol with pairwise C_α-C_α distance constraints to bias the simulation towards the known bound state (detailed in Supplementary Method S7). Using different constraint weights, we generated 10 conformers that were progressively nearer to the bound state, with the closest 4 conformations ranging from 0.59 to 0.81 Å RMSD from the bound structure for both receptor and ligand. To complete the ensemble, we mixed these 10 structures with an unbiased set of 36 NMA structures, 27 Backrub structures and 27 Relax structures. For the SRP GTPase: FtsY complex (PDB: 2J7P), RosettaDock 4.0 produces structures using the full range of backbone conformations (Fig. 5D) after the full protocol. Furthermore, the lowest-scoring docked structures are near-native (Fig. 5B) and are chosen from the monomer backbones near the bound conformation (Fig. 5D). Remarkably, even with just four near-bound backbones present in an ensemble of a hundred conformations with widely differing interface structures, RosettaDock 4.0 correctly recognizes these close conformations and docks them successfully. Figure 5D shows the correlation between closer backbones and better docked structures. Similar results are seen for other targets, including the Pol III-ɛ: Hot complex (PDB: 2IDO), which has a 2.79 Å interface RMSD_Cα between the unbound and bound states (Supplementary Fig. S10). In all, the doping method added nearly 4 expected successes among the 32 difficult targets in the benchmark set. Detailed metrics for each target can be found in Supplementary Table S4 and Figs S18 and S19.
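The composition of a doped ensemble described above can be sketched as follows. This is only a bookkeeping illustration: the function `build_doped_ensemble` and the conformer identifiers are hypothetical placeholders, not Rosetta inputs or outputs, and only the counts (10 biased, 36 NMA, 27 Backrub, 27 Relax) are taken from the text.

```python
def build_doped_ensemble(near_bound, nma, backrub, relax):
    """Concatenate conformer sets, tagging each conformer with its origin."""
    return (
        [("near_bound", c) for c in near_bound]
        + [("nma", c) for c in nma]
        + [("backrub", c) for c in backrub]
        + [("relax", c) for c in relax]
    )


# Hypothetical conformer identifiers; the counts follow the text.
near_bound = [f"biased_{i:02d}" for i in range(10)]
nma = [f"nma_{i:02d}" for i in range(36)]
backrub = [f"backrub_{i:02d}" for i in range(27)]
relax = [f"relax_{i:02d}" for i in range(27)]

ensemble = build_doped_ensemble(near_bound, nma, backrub, relax)
doped_fraction = len(near_bound) / len(ensemble)
print(len(ensemble), doped_fraction)
```

With these counts the ensemble holds 100 conformers per partner, of which 10% come from the biased Relax runs.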

3.3.2 Improved efficiency for large ensembles

One of the principal aims was to create a protocol that scales well with increasing ensemble sizes. Figure 6 shows run time across the benchmark set. In 77 of the 88 complexes tested, for ensembles containing 100 conformations each, RosettaDock 4.0 requires only 20–80% more time than RosettaDock 3.2 with just 1 receptor and 10 ligand conformations. Time per structure scales as ∼ $N_{res}^{1.4}$ for both RosettaDock 4.0 and RosettaDock 3.2, where $N_{res}$ is the number of residues in the complex.
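A scaling exponent of this kind is commonly estimated by a least-squares line fit on log-transformed data. The sketch below shows that general technique on synthetic timings; it is not the authors' analysis, the function name `fit_power_law` is an assumption, and the data points are generated from an exact power law purely for illustration.

```python
import math


def fit_power_law(n_res, times):
    """Fit times ~ prefactor * n_res**exponent by least squares in log-log space."""
    xs = [math.log(n) for n in n_res]
    ys = [math.log(t) for t in times]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    exponent = sxy / sxx                       # slope of the log-log line
    prefactor = math.exp(mean_y - exponent * mean_x)
    return prefactor, exponent


# Synthetic data (not measurements): time = 0.01 * N_res**1.4.
sizes = [100, 200, 400, 800]
timings = [0.01 * n ** 1.4 for n in sizes]

prefactor, exponent = fit_power_law(sizes, timings)
print(f"prefactor = {prefactor:.4f}, exponent = {exponent:.2f}")
```

With real timings the points scatter around the fitted line, so the exponent should be reported with its uncertainty rather than as an exact value.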

Fig. 6. — Efficiency of RosettaDock 4.0 on large ensembles. Despite sampling 100 conformations each of the receptor and the ligand as compared to 1 receptor and 10 ligand conformations in RosettaDock 3.2, the time per decoy for RosettaDock 4.0 is 20-80% more in 77 of the 88 targets tested

4 Discussion and conclusions

We developed two key advances here to create RosettaDock 4.0. First, ACS now allows us to examine a variety of backbone motions introduced by different ensemble generation protocols. The protocol scales well with an increasing number of backbones by providing adequate sampling with a runtime overhead of merely 56% on average when testing 1000-times more backbone combinations. Second, the low-resolution scoring using MDS shows a marked improvement in accuracy over centroid scoring. MDS triples the number of targets in which the top 1% of models are significantly enriched with near-bound structures, and it is seven to nine times as effective for discriminating top models, as measured by the bootstrapped ⟨N5⟩ metrics. More generally, MDS captures nearly all of the discriminatory power of the full-atom score function upon which it is based, exhibiting similar low-resolution and high-resolution N5, N100 and N1000 metrics. Most importantly for a low-resolution score function, MDS achieves these gains in accuracy without sacrificing computational efficiency, running in roughly equivalent time to the centroid scoring method. It does require about 2 GB of additional memory to store the score table (requiring approximately 2.6 GB total compared to 0.6 GB for the baseline protocol). However, with modern computer architecture, this requirement is not prohibitive. With enhanced scoring and sampling, RosettaDock 4.0 can now select near-bound backbones in large, diverse ensembles for targets with significant changes at the interface.
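The factor of 1000 and the memory difference quoted above follow directly from the ensemble sizes and memory figures given in the text, as the short calculation below shows. The per-combination cost at the end is an illustrative derived quantity, not a number reported in the paper, and the variable names are assumptions.

```python
# Ensemble sizes as stated in the text.
combos_v32 = 1 * 10      # RosettaDock 3.2: 1 receptor x 10 ligand conformations
combos_v40 = 100 * 100   # RosettaDock 4.0: 100 receptor x 100 ligand conformations
ratio = combos_v40 // combos_v32

# Average runtime of 4.0 relative to 3.2 (56% overhead, from the text).
relative_runtime = 1.56

# Illustrative only: relative runtime per available backbone combination.
relative_cost_per_combo = relative_runtime / ratio

# Memory figures quoted in the text, in GB.
extra_memory_gb = round(2.6 - 0.6, 1)

print(ratio, relative_cost_per_combo, extra_memory_gb)
```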

RosettaDock 4.0 compares favorably to other docking protocols despite using more stringent success criteria. Table 2 summarizes recent published results from five leading docking methods: HADDOCK (Vangone et al., 2017), iATTRACT (Schindler et al., 2015), ClusPro (Kozakov et al., 2017), ZDOCK (Pierce et al., 2011) and RosettaDock 3.2 (Chaudhury et al., 2011). While the methods have different scopes and benchmarks, and report their results in different forms, we were able to assign an N# success metric (analogous to N5, N100 etc.) to each method. In general, current methods are good at docking rigid-body targets (∼50% accuracy or better), but they are all poor when the targets become more flexible (< ∼30% accuracy on medium flexibility targets, <∼15% on high flexibility targets). RosettaDock 4.0 maintains this level of accuracy for easy targets (77%) while showing dramatically improved accuracy for flexible targets, both among medium difficulty targets (49%) and high difficulty targets (31%). The performance of RosettaDock 4.0 on different success metrics is shown in Supplementary Table S6. To our knowledge, this is the first report of a protein docking protocol achieving ∼50% accuracy on targets with backbone flexibility between 1 Å and 2 Å RMSD. Thus, RosettaDock 4.0 marks a key step toward a paradigm shift in protein–protein docking where complexes with backbone flexibility become tractable, which has long been a goal in the community (Lensink et al., 2017; Wodak and Méndez, 2004).

Table 2. Comparison of five leading docking methods with RosettaDock 4.0

						Performance
Method	Description	Flexibility?	Benchmark Set	Docking Search	Success Metric	All Targets	Rigid Targets	Medium Targets^e	Flexible Targets^e
HADDOCK (2017)	Restraint-based docking, minimization	Yes	CASP-CAPRI^f	Mixed global/local	N10 = 1	16/25 (64%)	12/12 (100%)	4/13 (31%)
ClusPro (2017)	FFT docking, cluster evaluation	No	CAPRI Rds. 13 − 35	Mixed global/local	N10 = 1	19/42 (45%)	12.5^b/16 (78%)	6.5^b/26 (25%)
iATTRACT (2015)	Rigid-body docking, interface refinement	Yes	Docking Benchmark 4.0^g	Global^a	N200 = 30	64/166 (39%)	55/119 (46%)	9/28 (32%)	0/19 (0%)
ZDOCK (2011)	FFT docking, model evaluation	No	Docking Benchmark 4.0	Global	N100 = 1^c	65/176 (37%)	58/121 (48%)	7/30 (23%)	0/25 (0%)
RosettaDock 3.2 (2011)	Monte Carlo docking, model evaluation	Yes	Docking Benchmark 4.0	Local	N5 = 3	56/115 (49%)	49/84 (58%)	5/17 (29%)	2/14 (14%)
RosettaDock 4.0	Monte Carlo docking, model evaluation	Yes	Docking Benchmark 5.0^h	Local	N5 = 3^d	41/88 (47%)	10/13 (77%)	21/43 (49%)	10/32 (31%)

^a Nearest-native structures from rigid-body docking selected for refinement.

^b Half successes awarded for targets with multiple binding sites evaluated, where at least one but not all binding sites are captured.

^c 2.5 Å cutoff for near-native structures.

^d Cases where bootstrapping gives ≥50% chance of N5 ≥ 3 are considered successfully docked.

^e For CAPRI sets, medium and difficult targets are combined, comprising all targets without at least one high-quality prediction by any predictor.

^f Lensink et al. (2016).

^g Hwang et al. (2010).

^h Vreven et al. (2015).
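The percentages in Table 2 are the success counts divided by the number of targets, rounded to whole numbers. The check below reproduces the RosettaDock 4.0 row; the dictionary keys and the helper `success_percent` are names chosen for this sketch.

```python
# (successes, targets) pairs transcribed from the RosettaDock 4.0 row of Table 2.
rosettadock_4 = {
    "all": (41, 88),
    "rigid": (10, 13),
    "medium": (21, 43),
    "flexible": (10, 32),
}


def success_percent(successes, targets):
    """Success rate as a whole-number percentage."""
    return round(100 * successes / targets)


percentages = {name: success_percent(*pair) for name, pair in rosettadock_4.items()}
print(percentages)
```

The same helper applies to the other rows, including the half successes awarded for ClusPro (for example 12.5 of 16 rigid targets).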

The limiting factor in successfully docking protein complexes with greater flexibility is now the ability to generate conformers within 0.7 Å of the bound state, where MDS can start recognizing interfaces. Previously, our lab compared seven commonly used methods to generate ensembles from monomers; while ensembles from most methods had ∼50% directional overlap with the experimentally observed direction, the magnitudes of these motions were insufficient to reach the bound conformations (Kuroda and Gray, 2016). Diversifying ensembles by pushing them along their top principal components may help close the gap. Another possible solution for proteins that have been crystallized in different contexts or have structurally diverse homologs is a distance geometry-based conformer selection method, which has recently been shown to span relevant conformational space (Greener et al., 2017). Using energetic complementarity to the unbound partner as a means of generating and selecting conformers can also improve docking performance (Pallara et al., 2016).

While RosettaDock 4.0 makes large strides in conformer selection, the protocol still simulates induced fit only in the all-atom mode with small, rigid-body moves and side-chain packing at the interface. Other studies have shown significant contributions of induced fit, whether implemented via Cartesian minimization at the interface (Schindler et al., 2015) or through contact-specific normal mode analysis (Oliwa and Shen, 2015). Previous attempts to introduce flexibility at the interface in RosettaDock by varying backbone torsions resulted in 3-fold increased run times for the smallest targets (Wang et al., 2007). Doing so by minimizing along Cartesian coordinates can slow the protocol down by more than 10-fold (data not shown). These protocols were implemented in the high-resolution phase because the centroid score was not accurate enough for native discrimination. MDS might now enable induced fit methods in the low-resolution phase, adding further backbone conformer sampling. Additionally, the accuracy of MDS means that low-resolution output structures might be filtered such that only a small fraction are sent to the expensive high-resolution phase. As such, MDS will be a critical component of the future ability of the RosettaDock protocol to induce a fit at the interface.

Acknowledgements

Computations in this study have been performed in part on the Maryland Advanced Research Computing Center (MARCC) cluster. We thank David Baker for comments on the manuscript.

Funding

N.A.M. and J.J.G. thank the National Institutes of Health, USA (grant R01-GM078221). S.S.R.B. and J.J.G. thank the National Science Foundation, USA (grant 1507736). W.S. thanks the Department of Energy, USA, the Air Force Office of Scientific Research, USA, and the Howard Hughes Medical Institute.

Conflict of Interest: J.J.G. is an unpaid board member of the Rosetta Commons. Under institutional participation agreements between the University of Washington, acting on behalf of the Rosetta Commons, Johns Hopkins University may be entitled to a portion of revenue received on licensing Rosetta software, which includes the methods described in this paper. As a member of the Scientific Advisory Board of Cyrus Biotechnology, J.J.G. is granted stock options. Cyrus Biotechnology distributes the Rosetta software, which may include methods described in this paper.

References

Alford R.F. et al. (2017) The Rosetta all-atom energy function for macromolecular modeling and design. J. Chem. Theory Comput., 13, 3031–3048.
Anishchenko I. et al. (2015) Structural templates for comparative protein docking. Proteins, 83, 1563–1570.
Atilgan A.R. et al. (2001) Anisotropy of fluctuation dynamics of proteins with an elastic network model. Biophys. J., 80, 505–515.
Baaden M., Marrink S.J. (2013) Coarse-grain modelling of protein–protein interactions. Curr. Opin. Struct. Biol., 23, 878–886.
Berman H.M. et al. (2000) The Protein Data Bank. Nucleic Acids Res., 28, 235–242.
Changeux J.-P., Edelstein S. (2011) Conformational selection or induced fit? 50 years of debate resolved. F1000 Biol. Rep., 3, 19.
Chaudhury S. et al. (2011) Benchmarking and analysis of protein docking performance in Rosetta v3.2. PLoS One, 6, e22477.
Chaudhury S. et al. (2007) Incorporating biochemical information and backbone flexibility in RosettaDock for CAPRI rounds 6-12. Proteins, 69, 793–800.
Chaudhury S., Gray J.J. (2008) Conformer selection and induced fit in flexible backbone protein–protein docking using computational and NMR ensembles. J. Mol. Biol., 381, 1068–1087.
Chu X. et al. (2013) Quantifying the topography of the intrinsic energy landscape of flexible biomolecular recognition. Proc. Natl. Acad. Sci. USA, 110, E2342–E2351.
Cukuroglu E. et al. (2014) Non-redundant unique interface structures as templates for modeling protein interactions. PLoS One, 9, e86738.
Daily M.D. et al. (2005) CAPRI rounds 3-5 reveal promising successes and future challenges for RosettaDock. Proteins, 60, 181–186.
DeBartolo J. et al. (2012) Predictive Bcl-2 family binding models rooted in experiment or structure. J. Mol. Biol., 422, 124–144.
Fallas J.A. et al. (2017) Computational design of self-assembling cyclic protein homo-oligomers. Nat. Chem., 9, 353–360.
Fleishman S.J. et al. (2011) RosettaScripts: a scripting language interface to the Rosetta macromolecular modeling suite. PLoS One, 6, e20161.
Gray J.J. et al. (2003a) Protein–protein docking predictions for the CAPRI experiment. Proteins, 52, 118–122.
Gray J.J. et al. (2003b) Protein–protein docking with simultaneous optimization of rigid-body displacement and side-chain conformations. J. Mol. Biol., 331, 281–299.
Greener J.G. et al. (2017) Predicting protein dynamics and allostery using multi-protein atomic distance constraints. Structure, 25, 546–558.
Grünberg R. et al. (2004) Complementarity of structure ensembles in protein–protein binding. Structure, 12, 2125–2136.
Hwang H. et al. (2010) Protein–protein docking benchmark version 4.0. Proteins, 78, 3111–3114.
Janin J., Wodak S.J. (1978) Computer analysis of protein–protein interaction. J. Mol. Biol., 124, 323–342.
Kilambi K.P. et al. (2013) Extending RosettaDock with water, sugar, and pH for prediction of complex structures and affinities for CAPRI rounds 20–27. Proteins, 81, 2201–2209.
Kmiecik S. et al. (2016) Coarse-grained protein models and their applications. Chem. Rev., 116, 7898–7936.
Kozakov D. et al. (2017) The ClusPro web server for protein–protein docking. Nat. Protoc., 12, 255–278.
Krivov G.G. et al. (2009) Improved prediction of protein side-chain conformations with SCWRL4. Proteins, 77, 778–795.
Kuroda D., Gray J.J. (2016) Pushing the backbone in protein–protein docking. Structure, 24, 1821–1829.
Lensink M.F. et al. (2017) Modeling protein–protein and protein–peptide complexes: CAPRI 6th edition. Proteins, 85, 359–377.
Lensink M.F. et al. (2016) Prediction of homoprotein and heteroprotein complexes by protein docking and template-based modeling: a CASP-CAPRI experiment. Proteins, 84, 323–348.
Li Z., Scheraga H.A. (1987) Monte Carlo-minimization approach to the multiple-minima problem in protein folding. Proc. Natl. Acad. Sci. USA, 84, 6611–6615.
Marze N.A. et al. (2017) Modeling oblong proteins and water-mediated interfaces with RosettaDock in CAPRI rounds 28–35. Proteins, 85, 479–486.
Mashiach E. et al. (2010) FiberDock: a web server for flexible induced-fit backbone refinement in molecular docking. Nucleic Acids Res., 38, W457–W461.
Moal I.H., Bates P.A. (2010) SwarmDock and the use of normal modes in protein–protein docking. Int. J. Mol. Sci., 11, 3623–3648.
Oliwa T., Shen Y. (2015) cNMA: a framework of encounter complex-based normal mode analysis to model conformational changes in protein interactions. Bioinformatics, 31, i151–i160.
Pallara C. et al. (2016) Conformational heterogeneity of unbound proteins enhances recognition in protein–protein encounters. J. Chem. Theory Comput., 12, 3236–3249.
Park H. et al. (2016) Simultaneous optimization of biomolecular energy functions on features from small molecules and macromolecules. J. Chem. Theory Comput., 12, 6201–6212.
Pierce B.G. et al. (2011) Accelerating protein docking in ZDOCK using an advanced 3D convolution library. PLoS One, 6, e24657.
Schindler C.E.M. et al. (2015) iATTRACT: simultaneous global and local interface optimization for protein–protein docking refinement. Proteins Struct. Funct. Bioinf., 83, 248–258.
Sircar A. et al. (2010) A generalized approach to sampling backbone conformations with RosettaDock for CAPRI rounds 13-19. Proteins, 78, 3115–3123.
Smith C.A., Kortemme T. (2008) Backrub-like backbone simulation recapitulates natural protein conformational variability and improves mutant side-chain prediction. J. Mol. Biol., 380, 742–756.
Trellet M. et al. (2013) A Unified Conformational Selection and Induced Fit Approach to Protein-Peptide Docking. PLoS One, 8, e58769.
Tyka M.D. et al. (2011) Alternate states of proteins revealed by detailed energy landscape mapping. J. Mol. Biol., 405, 607–618.
Vakser I.A. (2014) Protein–protein docking: from interaction to interactome. Biophys. J., 107, 1785–1793.
Vangone A. et al. (2017) Sense and simplicity in HADDOCK scoring: lessons from CASP-CAPRI round 1. Proteins, 85, 417–423.
Venkatraman V., Ritchie D.W. (2012) Flexible protein docking refinement using pose-dependent normal mode analysis. Proteins, 80, 2262–2274.
Vogt A.D., Di Cera E. (2012) Conformational selection or induced fit? A critical appraisal of the kinetic mechanism. Biochemistry, 51, 5894–5902.
Vreven T. et al. (2015) Updates to the integrated protein–protein interaction benchmarks: docking benchmark version 5 and affinity benchmark version 2. J. Mol. Biol., 427, 3031–3041.
Wang C. et al. (2005) Improved side-chain modeling for protein–protein docking. Protein Sci., 14, 1328–1339.
Wang C. et al. (2007) Protein–protein docking with backbone flexibility. J. Mol. Biol., 373, 503–519.
Wodak S.J., Méndez R. (2004) Prediction of protein–protein interactions: the CAPRI experiment, its evaluation and implications. Curr. Opin. Struct. Biol., 14, 242–249.
Xu M., Lill M.A. (2012) Utilizing experimental data for reducing ensemble size in flexible-protein docking. J. Chem. Inf. Model., 52, 187–198.
Zhang Z. et al. (2017) Monte Carlo replica-exchange based ensemble docking of protein conformations. Proteins, 85, 924–937.
Zhang Z. et al. (2013) Replica exchange improves sampling in low-resolution docking stage of RosettaDock. PLoS One, 8, e72096.
