Macromolecular Docking Restrained by a Small Angle X-Ray Scattering Profile

Dina Schneidman-Duhovny; Michal Hammel; Andrej Sali

doi:10.1016/j.jsb.2010.09.023

Author manuscript; available in PMC: 2012 Mar 1.

Published in final edited form as: J Struct Biol. 2010 Oct 12;173(3):461–471. doi: 10.1016/j.jsb.2010.09.023


PMCID: PMC3040266 NIHMSID: NIHMS242645 PMID: 20920583

Abstract

While many structures of single protein components are becoming available, structural characterization of their complexes remains challenging. Methods for modeling assembly structures from individual components frequently suffer from large errors, due to protein flexibility and inaccurate scoring functions. However, when additional information is available, it may be possible to reduce the errors and compute near-native complex structures. One such type of information is a small angle X-ray scattering (SAXS) profile that can be collected in a high-throughput fashion from a small amount of sample in solution. Here, we present an efficient method for protein-protein docking with a SAXS profile (FoXSDock): generation of complex models by rigid global docking with PatchDock, filtering of the models based on the SAXS profile, clustering of the models, and refining the interface by flexible docking with FireDock. FoXSDock is benchmarked on 124 protein complexes with simulated SAXS profiles, as well as on 6 complexes with experimentally determined SAXS profiles. When induced fit is less than 1.5Å interface Cα RMSD and the fraction of residues missing from the component structures is less than 3%, FoXSDock can find a model close to the native structure within the top 10 predictions in 77% of the cases; in comparison, docking alone succeeds in only 34% of the cases. Thus, the integrative approach significantly improves on molecular docking alone. The improvement arises from an increased resolution of rigid docking sampling and more accurate scoring.

Keywords: Small Angle X-ray Scattering (SAXS), protein-protein docking, macromolecular assembly

Introduction

Many proteins are components of complexes, interacting with other proteins to deliver their functions, such as signal transduction, transport, and catalysis (Krogan et al., 2006; Robinson et al., 2007). Thus, structural description of protein complexes is important for understanding these processes. However, the number of solved complex structures remains relatively low, even while the number of experimentally solved single protein structures increases (Dutta and Berman, 2005). This gap can be bridged by hybrid or integrative methods (Alber et al., 2008; Alber et al., 2007; Steven and Baumeister, 2008). Integrative methods determine complex architectures by computationally combining information from different methods, such as X-ray crystallography or nuclear magnetic resonance (NMR) spectroscopy of component structures, electron microscopy of whole complexes, chemical cross-linking of components detected by mass spectrometry, and small angle X-ray scattering (SAXS) of complexes.

The computational docking problem, which aims to predict a binary complex starting from the structures of unbound components, has been studied for more than three decades (Katchalski-Katzir et al., 1992; Wodak and Janin, 1978). Docking methods can be classified into three classes based on the sampling algorithms (Ritchie, 2008; Vajda and Kozakov, 2009): global search methods using the fast Fourier transform (FFT) (Eisenstein and Katchalski-Katzir, 2004) or geometric shape matching (Schneidman-Duhovny et al., 2003), medium-range Monte Carlo methods (Fernandez-Recio et al., 2003; Gray et al., 2003), and restraint-guided methods (van Dijk et al., 2005). Each class of methods is suitable for a specific docking sub-problem. Global methods are required for an adequate coverage of the search space, medium-range methods are best for local search and refinement, and restraint-guided methods perform well when additional information is available and can be translated into spatial restraints.

Docking methods have been systematically and prospectively evaluated at Critical Assessment of PRedictions of Interactions (CAPRI), relying on target complexes without available structures at the time of prediction (Janin, 2005). It is clear that the state-of-the-art docking methods can successfully (within top 10 predictions) predict the complex structure of two components with limited conformational change upon binding (induced fit that involves rotations of a few side chains), a standard size interface area (change in solvent accessibility area upon complex formation is between 1400 Å² and 2000 Å²), and significant hydrophobic interaction (solvation free energy of complex formation is less than −4 kcal/mol) (Vajda, 2005). Predictions can also be accurate if additional experimental information about the interaction is available, such as mutations and cross-linking that help identify binding site residues. However, docking methods still suffer from a relatively high rate of incorrect prediction, due to protein flexibility and lack of a reliable scoring function (Lensink et al., 2007; Mendez et al., 2003; Mendez et al., 2005).

SAXS measurement is emerging as a rapid and effective way for obtaining low-resolution (10-30Å) structural information about macromolecular structures in solution (Petoukhov and Svergun, 2007; Putnam et al., 2007). The scattering curve resulting from the subtraction of the buffer from the sample, (SAXS profile, I(q)), is radially symmetric (isotropic) due to the randomly-oriented distribution of particles in solution. The profile can be converted into a radial distribution function of the molecule via a Fourier transform. Unlike electron microscopy, NMR spectroscopy, and X-ray crystallography, SAXS experiments can be performed under a wide variety of solution conditions, including near physiological conditions. The measurement is performed with ~1.0 mg/ml of a macromolecular sample in a ~15 μl volume, and usually takes only a few minutes on a well-equipped synchrotron beam line (Hura et al., 2009; Tsuruta and Irving, 2008).

Computational approaches for modeling a macromolecular structure based on its SAXS profile can be classified into ab initio and rigid body modeling methods (Putnam et al., 2007). On the one hand, the ab initio methods search for coarse shapes represented by dummy atoms (beads) that fit the experimental SAXS profile (Chacon et al., 1998; Svergun, 1999; Svergun et al., 2001). On the other hand, rigid body approaches search for an atomic model of the molecule with a computed SAXS profile that fits the experimental profile (Förster et al., 2008; Pelikan et al., 2009; Petoukhov and Svergun, 2005). Therefore, rigid body modeling can be used only if an approximate structure of the studied molecule or its components is available, as is the case in protein-protein docking.

There are several methods for rigid docking with a SAXS profile. DIMFOM, GLOBSYMM and SASREF (Petoukhov and Svergun, 2005) are based on the CRYSOL program (Svergun et al., 1995) for SAXS profile fitting with a simplified sampling algorithm, where the structure of one monomer is rolled over the surface of the other; however, no interface optimization is performed. In another method, the scoring function combines SAXS and simple interface complementarity terms, sampled by a local search method that requires a relatively accurate initial configuration (Förster et al., 2008); in the absence of the initial configuration, the method starts from 1000 random orientations. A number of analyses of specific biological systems relied on docking followed by filtering of models based on a fit to a SAXS profile (Covaceuszach et al., 2008; Filgueira de Azevedo et al., 2003; Sondermann et al., 2005).

Here, we present a hybrid approach that computes a model of a complex for two given component structures, by simultaneously satisfying physicochemical complementarity between the components as well as a fit to a SAXS profile. The SAXS profile makes it possible to increase the configurational sampling precision and decrease the number of inaccurate models with good scores. Moreover, while docking methods optimize interface shape complementarity, a SAXS profile provides information about the global complex shape. In many cases, especially if the proteins are elongated, small changes in the interface can lead to large changes in the global complex shape. Therefore, it is necessary to increase the sampling resolution to sample the complex accurately in terms of its interface as well as global shape. We test the method on 124 cases with simulated SAXS profiles and six cases with experimental SAXS profiles. The hybrid approach significantly improves on molecular docking alone: When induced fit is less than 1.5Å interface Cα RMSD and the fraction of residues missing from the component structures is less than 3%, FoXSDock can find a model close to the native structure within the top 10 predictions in 77% of the cases; in comparison, docking alone succeeds in only 34% of the cases.

Method

Method Outline

The method presented here addresses the docking problem restrained by a SAXS profile: Given two structures of molecules (referred to as a receptor and a ligand) and the SAXS profile of their complex, the goal is to find the complex structure; only minor conformational changes, such as side chain repacking, are explicitly modeled.

The docking protocol involves five steps (Figure 1):

1. Global search. Rigid docking is performed by a geometric shape-matching algorithm PatchDock, generating thousands of models. In this step, the flexibility is taken into account implicitly by allowing a small amount of steric clashes at the interface.
2. Coarse SAXS filtering. Radius of gyration predicted from the SAXS profile of the complex is used to filter out rigid docking models that do not agree with the SAXS measurements.
3. SAXS scoring. Each docking model is fitted against the SAXS profile of the complex.
4. Clustering. Remaining docking models are clustered by their interface Cα RMSD and the cluster representative with the best fit to the SAXS profile is selected.
5. Conformational refinement. Cluster representatives are refined and steric clashes are removed through optimization of side chain positions and relative protein orientations, with FireDock. The final models are scored and ranked by an energy-based score and the fit to the SAXS profile.

Global Search

Rigid Binary Docking

PatchDock is used for global rigid docking (Duhovny et al., 2002; Schneidman-Duhovny et al., 2005b). PatchDock is an efficient rigid docking method that maximizes geometric shape complementarity. To account for surface flexibility in real-life docking involving unbound component structures, the geometric shape complementarity scoring function allows a small amount of steric clashes at the interface. The molecular docking is similar to assembling a jigsaw puzzle. Given two molecules, their surfaces are divided into patches based on their shape: convex, flat, and concave. Once the patches are defined, a pair of neighboring patches on one molecule is superimposed with a pair of neighboring patches on the other molecule, using Geometric Hashing (Lamdan and Wolfson, 1988). Next, the resulting models are clustered, filtered for severe steric clashes, and scored by shape complementarity. The configurational sampling precision can be controlled by the resolution of the surface representation (minimal distance between surface points used to generate docking models) and clustering parameters. Usually, docking methods balance configurational sampling precision against the accuracy and efficiency of scoring function, with the goal of retaining a sufficiently accurate model within a sufficiently small fraction of the best scoring models.

Here, the configurational sampling precision is increased to ensure the complex is sampled accurately in terms of its interface as well as global shape. The final clustering of rigid docking models is performed with a 2Å cut-off on the ligand interface Cα RMSD (compared to the default of 4Å; the ligand interface Cα RMSD is computed using the ligand Cα atoms within 10Å from the receptor in the docked configuration) and the resolution of surface representation of the ligand is decreased by 0.5Å to 1Å. These changes result in an average of 1.7 × 10⁵ rigid docking models per complex, compared to 8.2 × 10³ for the default parameter values. In addition, near-native models (ligand Cα RMSD (L-RMSD) < 10Å or interface Cα RMSD (I-RMSD) < 4Å as defined below in the assessment criteria) are observed in 94% of the benchmark cases, compared to 80% for default parameters.

Rigid Multi-Body Symmetric Docking

Symmetric cases are docked with SymmDock (Schneidman-Duhovny et al., 2005a), a docking algorithm for the prediction of cyclically symmetric complexes (C_n) given the structure of its asymmetric unit and symmetry order n. SymmDock a priori restricts its transformational search space only to symmetric transformations, and thus gains both in efficiency and accuracy. In the case of dihedral symmetry (D₂ tetramer is a dimer of dimers), SymmDock is applied first to generate dimers. Next, D₂ tetramers are constructed by combining dimer pairs with perpendicular symmetry axes.

Coarse SAXS Filtering

For a SAXS profile, the radius of gyration (R_G^exp) is computed from the slope of the Guinier plot of the profile (Guinier and Fournet, 1955). For a protein structure, the radius of gyration (R_G^3D) is computed as $R_{G}^{3 D} = \sqrt{\frac{1}{N} \sum_{k = 1}^{N} {(r_{k} - r_{c})}^{2}}$ , where r_k is the position of atom k, N is the number of atoms, and r_c is the centroid of the structure.
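The two radius-of-gyration estimates can be sketched in a few lines of Python. This is a minimal illustration rather than the FoXSDock implementation: the function names are hypothetical, the Guinier fit is an unweighted least-squares line through ln I(q) versus q² (using the standard Guinier approximation ln I(q) ≈ ln I(0) − q²R_G²/3), and the caller is assumed to pass only points from the low-q Guinier region.

```python
import math

def radius_of_gyration_3d(coords):
    """R_G^3D: root-mean-square distance of the atoms from their centroid."""
    n = len(coords)
    centroid = [sum(c[k] for c in coords) / n for k in range(3)]
    sq_sum = sum(sum((c[k] - centroid[k]) ** 2 for k in range(3)) for c in coords)
    return math.sqrt(sq_sum / n)

def guinier_rg(q, intensity):
    """Estimate R_G from the slope of ln I(q) versus q^2 (slope = -R_G^2 / 3).

    Assumption: q and intensity contain only low-q points where the
    Guinier approximation holds; no experimental errors are used as weights.
    """
    x = [qi * qi for qi in q]
    y = [math.log(ii) for ii in intensity]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    slope = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
             / sum((xi - mean_x) ** 2 for xi in x))
    if slope >= 0:
        raise ValueError("Guinier slope must be negative")
    return math.sqrt(-3.0 * slope)
```

Both functions return a length in the units of the input (Å for coordinates, Å⁻¹ for q), so their outputs can be compared directly.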

A docking model is filtered out if its radius of gyration is more than 10% smaller or more than 4% larger than the radius of gyration computed from the SAXS profile; that is, only models with 0.9R_G^exp ≤ R_G^3D ≤ 1.04R_G^exp are retained. The larger tolerance for the lower bound results from ignoring the hydration layer in the radius of gyration calculation.
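As a small sketch of this filter (the function name and keyword arguments are illustrative, not part of FoXSDock), a model is kept only when its R_G^3D lies between 0.9 and 1.04 times R_G^exp:

```python
def passes_rg_filter(rg_model, rg_exp, lower=0.90, upper=1.04):
    """Return True if the model's R_G^3D lies within the coarse SAXS filter window."""
    return lower * rg_exp <= rg_model <= upper * rg_exp

# Example with the 1YEM values from Table 1 (R_G^3D = 25.70, R_G^exp = 27.40):
keep_native = passes_rg_filter(25.70, 27.40)
```

The asymmetric window reflects the text above: a model may be noticeably more compact than the SAXS estimate, but only slightly more extended.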

SAXS Profile Fitting

For a given structure or a model, the SAXS profile is computed by FoXS (Schneidman-Duhovny et al.), based on the Debye formula (Debye, 1915):

$$I(q) = \sum_{i = 1}^{N} \sum_{j = 1}^{N} f_{i}(q) f_{j}(q) \frac{\sin(q d_{ij})}{q d_{ij}} \qquad (1)$$

where the intensity, I(q), is a function of the momentum transfer, q = (4π sin θ) / λ; 2θ is the scattering angle and λ is the wavelength of the incident X-ray beam; f_i(q) is the form factor of an atom i, d_ij is the distance between atoms i and j, and N is the number of atoms in the system. In the FoXS model, the form factor f_i(q) takes into account the displaced solvent as well as the hydration layer:

$$f_{i}(q) = f_{v}(q) - c_{1} f_{s}(q) + c_{2} s_{i} f_{w}(q) \qquad (2)$$

where f_v(q) is the atomic form factor in vacuo (Svergun et al., 1995), f_s(q) is the form factor of the dummy atom that represents the displaced solvent (Fraser et al., 1978), s_i is the fraction of the solvent accessible surface of the atom i (Connolly, 1983), and f_w(q) is the water form factor. The parameter c₁ is used to adjust the total excluded volume of the atoms (default value is 1.0) and c₂ is used to adjust the density of the water in the hydration layer (default value is 0.0). In this work, the default values for c₁ and c₂ are used, because we want to rank docking models based on their SAXS fitting scores calculated under identical conditions.
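Equations 1 and 2 can be illustrated with a short, deliberately naive sketch. It is not the FoXS code: the function names are hypothetical, the form factors are passed in as plain numbers that do not vary with q (a simplifying assumption made only to keep the example small), and the double sum is evaluated directly in O(N²) time.

```python
import math

def effective_form_factor(f_v, f_s, f_w, s_i, c1=1.0, c2=0.0):
    """Equation 2 at a single q: vacuum term, minus displaced solvent, plus hydration layer."""
    return f_v - c1 * f_s + c2 * s_i * f_w

def debye_profile(coords, form_factors, q_values):
    """Equation 1 with q-independent form factors (illustrative assumption)."""
    n = len(coords)
    profile = []
    for q in q_values:
        total = 0.0
        for i in range(n):
            # Diagonal term: d_ii = 0 and sin(x)/x tends to 1 as x tends to 0.
            total += form_factors[i] ** 2
            for j in range(i + 1, n):
                x = q * math.dist(coords[i], coords[j])
                sinc = math.sin(x) / x if x > 1e-12 else 1.0
                # Each unordered pair (i, j) appears twice in the double sum.
                total += 2.0 * form_factors[i] * form_factors[j] * sinc
        profile.append(total)
    return profile
```

With the default c₁ = 1.0 and c₂ = 0.0 used in this work, the hydration term vanishes and the effective form factor reduces to f_v − f_s.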

The SAXS profile computed from the structure is fitted to the experimental SAXS profile by minimizing χ:

$$\chi = \sqrt{\frac{1}{M} \sum_{i = 1}^{M} \left(\frac{I_{\exp}(q_{i}) - c I(q_{i})}{\sigma(q_{i})}\right)^{2}} \qquad (3)$$

where I_exp(q) and I(q) are the experimental and computed profiles, respectively, σ(q) is the error of the experimental profile, M is the number of points in the profile, and c is the scaling factor.
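A minimal sketch of the χ calculation follows. The source states only that c is the scaling factor and that χ is minimized; the closed-form weighted least-squares expression for c used here is an assumption about how that minimization could be done, and the function name is hypothetical.

```python
import math

def fit_chi(i_exp, i_model, sigma):
    """Return (chi, c) for Equation 3, with c chosen by weighted least squares.

    Assumption: c = sum(I_exp * I / sigma^2) / sum(I^2 / sigma^2), which is
    the value of c that minimizes chi when c is the only free parameter.
    """
    num = sum(e * m / (s * s) for e, m, s in zip(i_exp, i_model, sigma))
    den = sum(m * m / (s * s) for m, s in zip(i_model, sigma))
    c = num / den
    m_points = len(i_exp)
    chi_sq = sum(((e - c * m) / s) ** 2
                 for e, m, s in zip(i_exp, i_model, sigma)) / m_points
    return math.sqrt(chi_sq), c
```

Because every docking model is scored against the same experimental profile and errors, the resulting χ values are directly comparable across models.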

For rigid binary docking, additional speed-up is achieved by pre-computing rigid body profiles (I_A, I_B), made possible by constant distances for atom pairs within a rigid body. Only the contribution of inter-rigid body distances to the complex profile (I_AB) is computed for each docking model by iterating over inter-molecular atom pairs in Equation 1. The profile of the docked complex is computed as the sum of three profiles: I_complex = I_A+ I_B+I_AB.
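The decomposition I_complex = I_A + I_B + I_AB can be checked numerically with a toy example. The sketch below uses hypothetical helper names and q-independent form factors (an assumption for brevity); I_AB is taken to include both orderings of each inter-molecular atom pair, which is what makes the three terms add up to the full double sum of Equation 1.

```python
import math

def _sinc(x):
    return math.sin(x) / x if x > 1e-12 else 1.0

def intra_profile(coords, f, q_values):
    """Debye double sum over all ordered atom pairs inside one rigid body."""
    n = len(coords)
    return [sum(f[i] * f[j] * _sinc(q * math.dist(coords[i], coords[j]))
                for i in range(n) for j in range(n))
            for q in q_values]

def inter_profile(coords_a, f_a, coords_b, f_b, q_values):
    """Cross term I_AB: one atom in A and one in B, counted in both orders."""
    return [2.0 * sum(fa * fb * _sinc(q * math.dist(a, b))
                      for a, fa in zip(coords_a, f_a)
                      for b, fb in zip(coords_b, f_b))
            for q in q_values]

def complex_profile(i_a, i_b, i_ab):
    """Sum the precomputed rigid-body profiles and the per-model cross term."""
    return [a + b + ab for a, b, ab in zip(i_a, i_b, i_ab)]
```

In a docking run, `intra_profile` would be evaluated once per component, while only `inter_profile` needs to be recomputed for each candidate placement of the ligand.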

For symmetric complexes, even higher speed-up can be achieved, because the symmetric complex contains multiple copies of the symmetry unit. For dihedral symmetry D₂, the profile is given by I_complex = 4I_A + 2I_AB + 2I_AC + 2I_AD (Figure 2a). For cyclic symmetry C_n, all distances between the symmetry units can be computed based on the distances between the first unit and n/2 other units in the complex. The complex profile is computed as $I_{complex} = n I_{u_{0}} + n \sum_{i = 1}^{n / 2 - 1} I_{u_{0} u_{i}} + c n I_{u_{0} u_{n / 2}}$ , where u_i is unit i in the symmetric complex, n/2 is rounded down when n is odd, c=1 if n is odd, and c=0.5 if n is even (Figure 2b).

Figure 2. SAXS profile computation for symmetric assemblies. Only the distances between the units marked with arrows are computed. (a) Tetramer with dihedral symmetry D₂. (b) Symmetric assembly with cyclic symmetry C_n.
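The cyclic-symmetry shortcut can be verified against a brute-force double sum on a toy ring. This sketch makes several assumptions for brevity: all atoms have unit form factors, the symmetry axis is the z axis, n/2 is taken as integer (floor) division, and the function names are hypothetical.

```python
import math

def _sinc(x):
    return math.sin(x) / x if x > 1e-12 else 1.0

def pair_sum(coords_a, coords_b, q):
    """Sum of sin(qd)/(qd) over ordered pairs (a in A, b in B), unit form factors."""
    return sum(_sinc(q * math.dist(a, b)) for a in coords_a for b in coords_b)

def rotate_z(coords, angle):
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y, z) for x, y, z in coords]

def cyclic_profile(unit, n, q):
    """I_complex for C_n symmetry using only pairs that involve unit u_0."""
    if n < 2:
        raise ValueError("cyclic symmetry needs n >= 2")
    units = [rotate_z(unit, 2.0 * math.pi * k / n) for k in range(n)]
    half = n // 2

    def inter(k):
        # Inter-unit term I_{u0 uk}, counting both orders of each atom pair.
        return 2.0 * pair_sum(units[0], units[k], q)

    total = n * pair_sum(units[0], units[0], q)
    total += n * sum(inter(k) for k in range(1, half))
    c = 1.0 if n % 2 else 0.5
    return total + c * n * inter(half)

def brute_force_profile(unit, n, q):
    """Reference: full Debye double sum over every atom in the assembly."""
    atoms = [a for k in range(n) for a in rotate_z(unit, 2.0 * math.pi * k / n)]
    return pair_sum(atoms, atoms, q)
```

The factor c = 0.5 for even n accounts for the fact that units related by a half turn form only n/2 distinct pairs, whereas every other separation occurs n times.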

Clustering

The models are clustered iteratively, as follows. The clustering starts with the docking model that has the lowest χ score. This model becomes a representative of the current cluster and the Cα atoms in the binding site of its ligand (i.e., the ligand Cα atoms within 10Å from the receptor in the docked configuration) provide the frame of reference for calculating the ligand interface Cα RMSD for each one of the remaining (unclustered) models. All models with a ligand interface Cα RMSD below 4Å are assigned to the current cluster. When the cluster can no longer be expanded, the docking model with the lowest χ score from the unclustered set of models initiates a new cluster.
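The greedy procedure described above can be sketched as follows. The function name and the `rmsd` callback are hypothetical: `rmsd(rep, m)` stands in for the ligand interface Cα RMSD of model `m` measured in the frame of the representative `rep`, which the caller is assumed to supply.

```python
def cluster_by_chi(chi_scores, rmsd, cutoff=4.0):
    """Greedy clustering: the best-chi unclustered model seeds each new cluster.

    Returns a list of clusters, each a list of model indices with the
    representative (lowest chi) first.
    """
    unclustered = sorted(range(len(chi_scores)), key=lambda i: chi_scores[i])
    clusters = []
    while unclustered:
        rep = unclustered[0]  # lowest chi among the remaining models
        members = [m for m in unclustered if m == rep or rmsd(rep, m) < cutoff]
        clusters.append(members)
        assigned = set(members)
        unclustered = [m for m in unclustered if m not in assigned]
    return clusters

# Toy usage: 1D positions stand in for docking poses, |x_i - x_j| for the RMSD.
positions = [0.0, 1.0, 10.0, 11.0, 3.5]
chis = [2.0, 1.0, 0.5, 3.0, 4.0]
toy_clusters = cluster_by_chi(chis, lambda i, j: abs(positions[i] - positions[j]))
```

Because each representative is the best-fitting model of its cluster, the list of representatives is already sorted by χ.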

Conformational Refinement

The steric clashes, introduced by PatchDock, are removed with FireDock (Andrusier et al., 2007; Mashiach et al., 2008) that refines side chain positions and relative protein orientations. After steric clashes are removed, an energy-like function is used to rank the docking models. This interface energy score is a weighted combination of softened van der Waals, desolvation, electrostatics, hydrogen bonding, disulfide bonding, π-stacking, aliphatic interactions, and rotamer preferences (Andrusier et al., 2007).

Composite Score and Ranking

The interface energy score and SAXS profile fitting scores (χ values) of the final docking models are rescaled independently to the [0, 1] interval and the composite score is computed as: S_Composite = S_Energy + 0.3S_SAXS, where S_Energy and S_SAXS are the rescaled scores and 0.3 is the weight of the SAXS term. This weight was determined by enumerating a range of weight values to maximize the number of cases with a near-native model among the 10 top scoring models. A randomly selected half of the Benchmark 1 cases was used to determine the weight and the other half was used for validation.
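A possible reading of the composite score is sketched below. The text does not specify how the scores are rescaled, so min-max rescaling is an assumption here, as are the function names; lower values are treated as better for both the energy and χ.

```python
def rescale(values):
    """Min-max rescaling to [0, 1] (assumed; the rescaling scheme is not given)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def composite_scores(energies, chis, saxs_weight=0.3):
    """S_Composite = S_Energy + 0.3 * S_SAXS on independently rescaled scores."""
    s_energy = rescale(energies)
    s_saxs = rescale(chis)
    return [e + saxs_weight * s for e, s in zip(s_energy, s_saxs)]

def rank_models(energies, chis, saxs_weight=0.3):
    """Model indices ordered from best (lowest composite score) to worst."""
    scores = composite_scores(energies, chis, saxs_weight)
    return sorted(range(len(scores)), key=lambda i: scores[i])
```

For instance, three models with energies −50, −30, −10 and χ values 2.0, 1.0, 3.0 receive composite scores of 0.15, 0.5 and 1.3 under these assumptions.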

Benchmark

We test the method with two types of data. First, each test case consists of unbound component structures and a simulated SAXS profile for their complex. Second, each test case consists of bound component structures and an experimentally obtained SAXS profile for their complex.

Benchmark 1 - Simulated SAXS profiles

Protein-protein docking benchmark 3.0 (Hwang et al., 2008) is used for method validation with computed SAXS profiles. This benchmark contains 124 unbound-unbound test cases, classified into 88 rigid-body cases (I-RMSD ≤ 1.5Å), 19 medium-difficulty cases (1.5Å < I-RMSD ≤ 2.2Å), and 17 difficult cases (I-RMSD > 2.2Å). The complexes are also classified into three biochemical categories: enzyme–inhibitor (35 cases), antigen–antibody (25 cases), and others (64 cases). A SAXS profile is simulated using the co-crystallized structure of the complex for a q range from 0 to 0.5Å⁻¹. For χ calculations involving only computed profiles, the relative error is calculated from the Poisson distribution with λ of 10 and bounded at 5%.

Benchmark 2 – Experimental SAXS profiles

Experimental SAXS profiles and associated relative errors for 6 complexes (Table 1, Figure 3 - left column) from the BIOISIS database are used (Hura et al., 2009). These cases include three symmetric dimers with cyclic symmetry, two tetramers with dihedral symmetry, and one decamer with dihedral symmetry. The dimers are docked with SymmDock starting from the monomer structure. The tetramers are also docked with SymmDock by exhaustive enumeration of C₂ symmetric models (Methods). For the decamer, we start with the dimer structure and apply SymmDock to build a pentamer of dimers. BIOISIS entries include structures with modeled missing residues. These residues are used for SAXS calculations, but not in docking.

Table 1.

Benchmark 2 cases: PDB codes, complex type, number of residues, fraction of missing residues, R_G^3D (radius of gyration of the complex structure), R_G^exp (radius of gyration computed from the experimental SAXS profile), χ value for the fit between the experimental and computed SAXS profiles with and without fitting parameters. Disordered regions were added in the BIOISIS structures (Hura et al., 2009) in cases marked with *.

PDB	Complex type	Residue number	Fraction of missing residues	R_G^3D	R_G^exp	χ
1YEM*	C₂ dimer	356	0	25.70	27.40	3.89 (7.81)
3F7L	C₂ dimer	302	0	20.02	20.66	3.23 (13.28)
2DVM*	C₂ dimer	948	0	32.98	32.83	2.88 (2.95)
2E2G	D₅ decamer	2382	1.6	50.97	51.24	7.72 (8.37)
2G4J	D₂ tetramer	1544	0	31.77	31.08	4.69 (14.93)
1DQK	D₂ tetramer	496	0	21.07	22.39	4.78 (21.35)


Figure 3. Benchmark 2 complexes and SAXS profile fitting scores. In the first column, experimental SAXS profiles (red) and the computed profiles (green) from complex structures (ribbons) for Benchmark 2 cases are shown. The plots in the second column display SAXS scores (χ values on the y-axis) as a function of Cα RMSD (x-axis) for all models from the rigid docking stage. The third column plots the SAXS profile fitting score versus R_G^3D for the same set of complexes. The color-coding reflects the Cα RMSD. The black line indicates the R_G^exp and the grey lines show the thresholds for filtering.

Assessment Criteria

An assessment criterion similar to that from CAPRI is used (Lensink et al., 2007). A docking model is considered acceptable (one star) if the ligand Cα RMSD (L-RMSD) after superposition of the receptor is below 10Å or the interface Cα RMSD (I-RMSD) is below 4Å. A docking model is of medium accuracy (two stars) if L-RMSD < 5Å or I-RMSD < 2Å, and of high accuracy (three stars) if L-RMSD < 1Å or I-RMSD < 1Å. A docking model of acceptable or better accuracy is referred to as near-native. For symmetric complexes, Cα RMSD is computed after least-squares-fit superposition of the model on the native complex. A symmetric docking model is considered near-native if its Cα RMSD is below 5Å.
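The star thresholds for binary docking models translate directly into a small classifier. The function names below are illustrative; the thresholds are those stated in the preceding paragraph, with both RMSD values given in Å.

```python
def accuracy_stars(l_rmsd, i_rmsd):
    """Return 3 (high), 2 (medium), 1 (acceptable) or 0 (incorrect)."""
    if l_rmsd < 1.0 or i_rmsd < 1.0:
        return 3
    if l_rmsd < 5.0 or i_rmsd < 2.0:
        return 2
    if l_rmsd < 10.0 or i_rmsd < 4.0:
        return 1
    return 0

def is_near_native(l_rmsd, i_rmsd):
    """A model of acceptable or better accuracy is near-native."""
    return accuracy_stars(l_rmsd, i_rmsd) >= 1
```

The checks run from the strictest category downward, so a model satisfying several criteria is assigned the highest applicable rating.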

Results

We begin by assessing the accuracy of the radius of gyration computed from a SAXS profile, followed by quantifying the match between an experimental SAXS profile and a SAXS profile computed for the native complex. Finally, we assess FoXSDock by its performance on the two benchmarks.

Accuracy of predicted radius of gyration

We first assess to what degree the radius of gyration (R_G^exp) computed from the SAXS profile of the complex fits the radius of gyration (R_G^3D) of the complex structure. This analysis is used to find the threshold values for the coarse SAXS filtering stage. In Benchmark 1 with simulated SAXS profiles, we compared the R_G^exp to the R_G^3D of the best possible docking models (Table S1). The best possible docking model is constructed by superposing unbound components to the complex structure. In Benchmark 2 with experimental SAXS profiles, the R_G^exp is compared with the R_G^3D of the complex structures (Table 1). The fractional difference in the number of residues in the complexes with bound and unbound structures is referred to as the fraction of missing residues. In Benchmark 1, the R_G^exp is predicted with an average error of 2.18% for cases with less than 3% missing residues (81 cases out of 124). We conclude that the R_G measure is not very sensitive to conformational changes upon binding. Thus, it is possible to compute an accurate R_G^3D even when using unbound components for docking. In Benchmark 2, the R_G^exp can be up to ~7% larger than the R_G^3D. One possible explanation is that the hydration layer of a protein is not taken into account when computing R_G^3D from the coordinates of protein atoms.

Based on the numbers above, the thresholds for coarse SAXS filtering by R_G^exp are set to 0.9R_G^exp and 1.04R_G^exp (i.e., a docking model is filtered out if its R_G^3D is more than 10% smaller or more than 4% larger than the R_G^exp). The R_G^3D of 119 (out of 124) complexes of Benchmark 1 and all the complexes of Benchmark 2 is within these thresholds (0.9R_G^exp ≤ R_G^3D ≤ 1.04R_G^exp). In the remaining 5 cases of Benchmark 1, the fraction of missing residues is more than 5% or large conformational changes are involved (I-RMSD > 8Å).

Accuracy of profile fit

For Benchmark 1, the profile computed from the complex structure is compared to the profile computed from the best possible docking model of unbound components (Table S1). The best possible docking model is constructed by superposing unbound components to the complex structure. The accuracy of the profile fit is assessed as a function of the fraction of missing residues and the I-RMSD between the bound and unbound component structures (Figure 4). As expected, χ increases with the increase in the fraction of missing residues and I-RMSD.

Figure 4. Impact on the SAXS profile fitting by induced fit and missing residues. (a) I-RMSD between complex structures consisting of bound and unbound components as a function of χ. (b) Fraction of missing residues as a function of χ.

For Benchmark 2, experimental SAXS profiles are compared with profiles computed from the complex structures. In all cases, except 1YEM and 2E2G, a good fit is observed (i.e., the experimental and computed profiles overlap for q < 0.2 Å⁻¹; Figure 3a). The difference between the experimental and computed profiles for 1YEM might be explained by the modeling error for the residues missing in the crystallographic structure as well as by the difference between the solution and crystal structures. For 2E2G, an additional possible cause for the profile mismatch is the difference between the experimental profile measured for PF1033 from P. furiosus and the profile computed from the homologous structure 2E2G (57% sequence identity).

Accuracy of FoXSDock based on Benchmark 1

Next, we assess each stage of the method to gain a better appreciation of the contribution of each stage to the final accuracy. The goal of each stage is to output as many good scoring near-native models as possible, while eliminating as many non-native models as possible. However, the emphasis on these two aspects changes with the progress through the flowchart. In the initial stages, the priority is to produce as many near-native models as possible, while in the later stages the priority is to rank them highly.