Author manuscript; available in PMC: 2013 Sep 12.
Published in final edited form as: J Am Chem Soc. 2012 Aug 29;134(36):14686–14689. doi: 10.1021/ja306359z

Contrast-Matched Small Angle X-ray Scattering from a Heavy Atom-Labeled Protein in Structure Determination: Application to a Lead-Substituted Calmodulin-Peptide Complex

Alexander Grishaev, Nicholas J Anthis, G Marius Clore
PMCID: PMC3442789  NIHMSID: NIHMS403338  PMID: 22908850

Abstract

The information content in one-dimensional solution X-ray scattering profiles is generally restricted to low-resolution shape and size information that, on its own, cannot lead to unique three-dimensional structures of biological macromolecules comparable to all-atom models derived from X-ray crystallography or NMR spectroscopy. Here we show that contrast-matched X-ray scattering data collected on a protein incorporating specific heavy atom labels in 65% aqueous sucrose buffer can dramatically enhance the power of conventional small and wide angle X-ray scattering (SAXS/WAXS) measurements. Under contrast-matching conditions the protein is effectively invisible and the main contribution to the X-ray scattering intensity arises from the heavy atoms, allowing direct extraction of pairwise distances between them. In combination with conventional aqueous SAXS/WAXS data, supplemented by NMR-derived residual dipolar couplings (RDCs) measured in a weakly aligning medium, we show that it is possible to position protein domains relative to one another within a precision of 1 Å. We demonstrate this approach with respect to the determination of domain positions in a complex between calmodulin, in which the four Ca2+ ions have been substituted by Pb2+, and a target peptide from myosin light chain kinase. The uniqueness of the resulting solution is established by an exhaustive search over all models compatible with the experimental data, and could not have been achieved using aqueous SAXS and RDC data alone. Moreover, we show that the correct structural solution can be recovered using only contrast-matched SAXS and aqueous SAXS/WAXS data.


Small and wide angle X-ray scattering (SAXS/WAXS) in solution yield one-dimensional profiles that are determined by the pairwise distances between all atoms in a molecule.1 Because each profile superimposes the contributions of all of these distances, it is not possible, outside of the low-q range, to relate features of the scattering profiles directly to a particular structure. Further, it is generally not feasible to derive unique three-dimensional structures from one-dimensional curves, as many models may be compatible with a given scattering profile. On the other hand, direct refinement against SAXS/WAXS data in combination with other constraints, such as those from NMR data, can be extremely powerful.2 In this communication, we investigate the utility of scattering arising from a few heavy atoms, such as Pb2+, under contrast-matched conditions in which the protein is rendered effectively invisible at low scattering angles by using a 65% aqueous sucrose solution as the solvent.3 The contrast-matched scattering profiles arise from only a small number of heavy atoms and should therefore effectively constrain the distances between them. We demonstrate the utility of contrast-matched SAXS using, as an example, a calmodulin (CaM)-myosin light chain kinase (MLCK) peptide complex in which the four coordinated Ca2+ ions have been substituted by Pb2+. We show that the combination of contrast-matched SAXS, aqueous SAXS/WAXS and NMR-derived residual dipolar couplings (RDCs)4 is sufficient to define the positions of the two domains of CaM within a precision of 1 Å, and that the contrast-matched SAXS data are critical for discriminating between several models that are fully consistent with the aqueous SAXS/WAXS and RDC data.
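Why a one-dimensional profile cannot, on its own, fix a three-dimensional structure can be seen from the Debye formula, I(q) = Σi Σj fi fj sin(q rij)/(q rij): for identical scatterers the profile depends only on the multiset of interatomic distances. The sketch below is illustrative only. It assumes point scatterers with a constant form factor, ignores solvent and hydration-layer terms, and uses a classic "homometric" pair of one-dimensional point sets, {0, 1, 4, 10, 12, 17} and {0, 1, 8, 11, 13, 17}, which share the same set of pairwise separations without being translations or mirror images of one another. The helper names are hypothetical.

```python
import numpy as np


def debye_intensity(coords, q, f=1.0):
    """Debye formula for N identical point scatterers with form factor f:
    I(q) = f^2 * sum_i sum_j sin(q r_ij) / (q r_ij)."""
    coords = np.asarray(coords, dtype=float)
    r = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)  # N x N distances
    qr = q[:, None, None] * r[None, :, :]
    # np.sinc(x) = sin(pi x)/(pi x), so np.sinc(qr/pi) = sin(qr)/qr (equal to 1 when r = 0)
    return f ** 2 * np.sinc(qr / np.pi).sum(axis=(1, 2))


def on_line(positions, spacing=2.0):
    """Place scatterers along the x axis at integer multiples of `spacing` (Å)."""
    return np.array([[spacing * p, 0.0, 0.0] for p in positions])


# Two different arrangements with identical multisets of pairwise distances
set_a = [0, 1, 4, 10, 12, 17]
set_b = [0, 1, 8, 11, 13, 17]

q = np.linspace(0.01, 0.5, 50)  # Å^-1
i_a = debye_intensity(on_line(set_a), q)
i_b = debye_intensity(on_line(set_b), q)

print(np.allclose(i_a, i_b))                   # True: the scattering profiles coincide
print(sorted(17 - p for p in set_a) == set_b)  # False: set_b is not the mirror image of set_a
```

The same degeneracy, in far richer form, affects real proteins, which is why the text combines scattering with independent restraints.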

Examination of Ca2+-loaded CaM-peptide complexes solved by NMR and crystallography reveals considerable variability in the relative positions of the N- and C-terminal domains (Fig. 1). While some of this variability can be attributed to differences in the target peptide sequences leading to distinct binding modes,5 even complexes that share the same binding mode exhibit substantial structural differences. Further, none of these structures fits the X-ray scattering data within experimental error (Figs. S4 and S5 and Table S1), and only in a handful of cases are the fits of the RDC data (measured in a weakly aligning medium of phage pf1)6 to the whole complex comparable to those for the individual domains (Table S1).

Figure 1.


Distribution of the relative positions of the N- and C-domains of CaM in complexes with target peptides. Alignments on the N (blue) and C (red) domains are shown in (A) and (B), respectively, with the aligned domain represented by a ribbon of a single structure (1MXE)9 and the non-aligned domain by backbone traces of the crystal structures listed in Table S1. The peptide from one of the structures (1MXE)9 is shown as a green ribbon to illustrate how the two domains clamp the peptide. The Ca2+ ions of the N- and C-domains are shown in panels A and B, respectively, as grey spheres.

Substitution of Ca2+ by Pb2+ does not significantly perturb the structure of CaM (as judged by crystallography,7 NMR and SAXS/WAXS), its function,8 or its binding affinity for the MLCK peptide (see Supporting Information). Similarly, the structure of the CaM-MLCK complex appears unperturbed by the addition of sucrose, as judged by NMR (see Supporting Information).

Rather than refine directly against all available SAXS/WAXS and NMR data by simulated annealing starting from a limited set of initial coordinates, as is commonly done,2 we chose to evaluate the agreement with the experimental data by exhaustively sampling all possible, stereochemically feasible relative geometries of the two domains of CaM. Exhaustive sampling guarantees that the best-fitting solution corresponds to the global minimum of the target function (to within the resolution of the sampling grid) and is not simply one of many possible solutions, thereby removing the issue of model degeneracy, which is widely recognized as one of the main pitfalls of biomolecular SAXS.
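A toy version of this exhaustive strategy is sketched below. It is not the authors' code: the ZYZ Euler-angle parametrization, the coarse grids and the `score` callback are assumptions, and the two filters applied (a 2.5 Å inter-domain clash cutoff and a radius-of-gyration window, defaulting to the 13–19 Å range used in the next paragraph) stand in for the fuller filtering described there. Every grid point is visited, so the lowest-scoring survivor is the global minimum on that grid, which is the property the text relies on.

```python
import itertools

import numpy as np


def rot_z(t):
    """Rotation by angle t (radians) about the z axis."""
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def rot_y(t):
    """Rotation by angle t (radians) about the y axis."""
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])


def radius_of_gyration(xyz):
    """Unweighted radius of gyration of an N x 3 coordinate array."""
    centered = xyz - xyz.mean(axis=0)
    return np.sqrt((centered ** 2).sum(axis=1).mean())


def grid_search(dom_a, dom_b, alphas, betas, gammas, shifts, score,
                clash=2.5, rg_window=(13.0, 19.0)):
    """Place dom_b at every grid rotation (ZYZ Euler angles, degrees) and every
    translation whose Cartesian components are drawn from `shifts`; drop placements
    that clash with dom_a or fall outside the Rg window; rank the rest by `score`."""
    survivors = []
    for a, b, g in itertools.product(alphas, betas, gammas):
        rot = rot_z(np.radians(a)) @ rot_y(np.radians(b)) @ rot_z(np.radians(g))
        rotated = dom_b @ rot.T
        for t in itertools.product(shifts, repeat=3):
            placed = rotated + np.asarray(t)
            gaps = np.linalg.norm(dom_a[:, None, :] - placed[None, :, :], axis=-1)
            if gaps.min() < clash:
                continue  # steric clash between the two domains
            both = np.vstack([dom_a, placed])
            if not rg_window[0] <= radius_of_gyration(both) <= rg_window[1]:
                continue  # outside the allowed radius-of-gyration range
            survivors.append((score(both), (a, b, g), t))
    survivors.sort(key=lambda item: item[0])  # lowest (best) score first
    return survivors
```

A uniform grid in Euler angles oversamples orientations near β = 0, so a production search would probably use a more even sampling of rotation space; the sketch keeps Euler angles only for readability.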

Fits of the RDC data to the individual domains of CaM from a large array of crystal structures indicated that some of the most accurate representations of the N- (residues 1–75) and C- (residues 82–148) domains in solution were those from the 1.7 Å resolution structure of a CaM-peptide complex from CaM-dependent protein kinase I (PDB accession code 1MXE),9 with RDC R-factors of 17% (Table S1). We therefore used the N- and C-domain coordinates from the 1MXE structure throughout this study.
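Fitting RDCs to a fixed domain structure is commonly done by a linear least-squares (SVD) fit of the five independent elements of a traceless, symmetric alignment tensor A, with D = uᵀAu for each unit bond vector u. The text does not spell out its fitting procedure, so the sketch below shows that standard approach under stated assumptions: unit N–H vectors from a rigid domain, no internal dynamics, and A expressed directly in Hz. The R-factor follows the definition given in the Table 1 footnote; the function names are hypothetical.

```python
import numpy as np


def fit_alignment_tensor(vectors, d_obs):
    """SVD-based least-squares fit of a traceless symmetric tensor A (Hz) such that
    D_i = u_i^T A u_i, where u_i are the normalized bond vectors."""
    u = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    x, y, z = u.T
    # Five independent elements: A_yy, A_zz, A_xy, A_xz, A_yz (A_xx = -A_yy - A_zz)
    design = np.column_stack([y * y - x * x, z * z - x * x, 2 * x * y, 2 * x * z, 2 * y * z])
    params, *_ = np.linalg.lstsq(design, d_obs, rcond=None)
    a_yy, a_zz, a_xy, a_xz, a_yz = params
    tensor = np.array([[-a_yy - a_zz, a_xy, a_xz],
                       [a_xy, a_yy, a_yz],
                       [a_xz, a_yz, a_zz]])
    return tensor, design @ params  # tensor and back-calculated RDCs


def rdc_r_factor(d_obs, d_calc):
    """R = [<(D_obs - D_calc)^2> / (2 <D_obs^2>)]^(1/2)."""
    return np.sqrt(np.mean((d_obs - d_calc) ** 2) / (2.0 * np.mean(d_obs ** 2)))


def axial_component(tensor):
    """D_a = A_zz / 2, with A_zz the eigenvalue of largest magnitude, so that
    D = D_a (3 cos^2(theta) - 1) for an axially symmetric tensor."""
    eigvals = np.linalg.eigvalsh(tensor)
    return eigvals[np.argmax(np.abs(eigvals))] / 2.0
```

With real data, each domain would be fitted separately and the resulting R-factors compared across candidate crystal structures, as described above.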

The rotational and translational parameters (three of each) describing the position of the two domains of CaM were systematically sampled in 2° and 1 Å steps, respectively, to generate an initial set of ~4.5 × 10¹¹ geometries with a spatial resolution of ~1 Å. Structures were removed that (i) had steric clashes <2.5 Å between backbone or Cβ atoms of the two domains; (ii) exhibited an increase of >10% in the RDC R-factor (equivalent to ~10° orientational uncertainty) relative to the minimum value obtained by rigid-body minimization of the relative domain orientations; or (iii) had a radius of gyration (Rgyr) outside the range 13–19 Å. (Note that the experimental Rgyr obtained from the aqueous SAXS data via Guinier and P(r) analyses is 17.8±0.4 Å for the CaM-MLCK complex with Ca2+ and 18.3±0.4 Å for the Pb2+-substituted complex.) For relative domain geometries that passed these filters, the rotation and translation of the MLCK peptide, taken from the solution structure of the CaM-MLCK complex (2BBM),10 were obtained by minimization of a target function comprising the intermolecular NOE distance restraints from the 2BBM deposition and a repulsion term between the heavy atoms of the two CaM domains and MLCK to prevent atomic overlap. Additional geometries were then excluded if they had steric clashes <2.5 Å between the heavy atoms of MLCK and the two domains of CaM or NOE distance restraint violations >3 Å. Finally, the linker (residues 76–81) was built from a PDB database of 6250 non-redundant protein chains encompassing a total of 2.2 × 10⁶ residues. All contiguous 8-residue stretches that contained neither Gly nor Pro (consistent with the composition of the CaM linker) were selected. The terminal residues of these 8-residue fragments were best-fitted to the backbone atoms of Lys75 of the N-domain and Glu82 of the C-domain for each CaM domain geometry.
All linkers exhibiting backbone rms differences <1 Å relative to the coordinates of Lys75 and Glu82 were retained. The backbone linker geometries that did not exhibit steric clashes <2.5 Å with the two CaM domains and the MLCK peptide were processed further, and the best-fitting six-residue backbone segment corresponding to Met76-Ser81 was decorated with the appropriate side chains using residue-specific rotamers,11 avoiding steric clashes with other atoms of the complex.
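The linker-acceptance step (best-fitting the terminal residues of each fragment onto the Lys75 and Glu82 backbone atoms and keeping fits with rms < 1 Å) can be sketched with a Kabsch least-squares superposition. The text says only "best-fitted", so the choice of the Kabsch algorithm, the atom selection passed in, and the helper `accept_linker` are assumptions made for illustration.

```python
import numpy as np


def superpose(mobile, target):
    """Kabsch least-squares superposition of `mobile` onto `target` (both N x 3).
    Returns the superimposed coordinates and the rms difference (Å)."""
    mob_c, tgt_c = mobile.mean(axis=0), target.mean(axis=0)
    P, Q = mobile - mob_c, target - tgt_c
    U, _, Vt = np.linalg.svd(P.T @ Q)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # avoid an improper rotation (reflection)
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    fitted = P @ R.T + tgt_c
    rms = np.sqrt(((fitted - target) ** 2).sum(axis=1).mean())
    return fitted, rms


def accept_linker(fragment_termini, anchor_atoms, cutoff=1.0):
    """Hypothetical filter: keep a fragment if its terminal backbone atoms superimpose
    on the anchor atoms (e.g. Lys75 and Glu82 backbone) with rms below `cutoff` Å."""
    _, rms = superpose(fragment_termini, anchor_atoms)
    return rms < cutoff
```

Fragments passing this test would then go on to the clash check and side-chain building described above.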

The calculated SAXS/WAXS curves for the resulting 75,000 models of the CaM-MLCK complex were best-fitted to the experimental data (recorded on beamlines 12-IDC and 12-IDB, APS) using the AXES formalism, which uses explicit water molecules to model the solvent boundary layer.12 The scattering intensity from the Pb-substituted sample in 65% sucrose was predicted via the Debye formula from the coordinates of the Pb sites. Agreement of the aqueous SAXS/WAXS and contrast-matched SAXS data sets with the structural models was evaluated using χ² statistics.
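Because the protein is effectively invisible under contrast matching, the predicted profile reduces to a Debye sum over the four Pb sites. The sketch below computes that sum and a χ² after applying the least-squares optimal scale factor. Assumptions: each Pb is a point scatterer with a q-independent form factor (the real form factor decreases with q), there is no background term, and χ² is normalized by the number of data points; the paper's exact χ² definition is not given here, and the AXES treatment of the aqueous data is not reproduced.

```python
import numpy as np


def pb_profile(pb_xyz, q):
    """Debye sum over heavy-atom sites only:
    I(q) = sum_i sum_j sin(q r_ij) / (q r_ij), with a form factor of 1 per site."""
    pb_xyz = np.asarray(pb_xyz, dtype=float)
    r = np.linalg.norm(pb_xyz[:, None, :] - pb_xyz[None, :, :], axis=-1)
    # np.sinc(x) = sin(pi x)/(pi x), hence the division by pi
    return np.sinc(q[:, None, None] * r[None, :, :] / np.pi).sum(axis=(1, 2))


def chi2_with_scale(i_obs, sigma, i_calc):
    """Return (chi2, c): the mean of ((I_obs - c I_calc)/sigma)^2 using the scale
    factor c = sum(w I_obs I_calc) / sum(w I_calc^2), with w = 1/sigma^2."""
    w = 1.0 / sigma ** 2
    c = np.sum(w * i_obs * i_calc) / np.sum(w * i_calc ** 2)
    return np.mean(((i_obs - c * i_calc) / sigma) ** 2), c
```

The scale factor has a closed form because χ² is quadratic in c, so no iterative fit is needed per model, which matters when scoring tens of thousands of candidates.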

As is clear from Fig. 2, the discriminating power of the Pb-substituted contrast-matched SAXS data is essential for selecting between the candidate solutions. Three clusters of structures fit the aqueous SAXS/WAXS data equally well (Figs. 2A and 3A) and fit the RDCs (Fig. 3C) to within a few percentage points of their minimum values (Table 1). These clusters, depicted in blue (Cluster I), green (Cluster II) and red (Cluster III) in Figs. 2–4, also satisfy the NOE distance restraints between CaM and the bound MLCK peptide (Fig. 2B), indicating that all three capture the placement of the MLCK peptide in the complex within experimental error. The only observable that discriminates between the three clusters is the Pb-substituted contrast-matched SAXS data, and only one cluster, namely Cluster I, satisfies these data within experimental error (Figs. 2A, 3B and Table 1).

Figure 2.


Agreement with SAXS data. (A) Correlation of the quality of fit to the aqueous SAXS/WAXS data with that to the contrast-matched SAXS data for the Pb-substituted CaM-MLCK complex in 65% (w/v) sucrose for the 75,000 structural models generated as described in the text. (B) Correlation of the normalized SAXS fitness score with the intermolecular NOE distance restraint violations between CaM and the MLCK peptide. The three clusters discriminated by the contrast-matched SAXS data that fit equally well to the aqueous SAXS/WAXS, RDC and NOE data are highlighted by the blue, green and red ovals. The normalized SAXS fitness score is given by $\{[(\chi^2_{\mathrm{SAXS,water}}/\chi^2_{\mathrm{SAXS,water,min}})^2 + (\chi^2_{\mathrm{SAXS,sucrose}}/\chi^2_{\mathrm{SAXS,sucrose,min}})^2]/2\}^{1/2}$.
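The normalized SAXS fitness score defined in this legend is the root-mean-square of each model's two χ² values, each divided by the best (minimum) χ² over the ensemble for that data set. A minimal sketch, with a hypothetical function name; the example applies it, purely for illustration, to the cluster-mean χ² values of Table 1 as if they were the whole ensemble, which is not how the published scores were computed.

```python
import numpy as np


def normalized_saxs_fitness(chi2_water, chi2_sucrose):
    """{[(chi2_w / min chi2_w)^2 + (chi2_s / min chi2_s)^2] / 2}^(1/2),
    with both minima taken over the ensemble of models."""
    cw = np.asarray(chi2_water, dtype=float)
    cs = np.asarray(chi2_sucrose, dtype=float)
    return np.sqrt(((cw / cw.min()) ** 2 + (cs / cs.min()) ** 2) / 2.0)


scores = normalized_saxs_fitness([1.46, 1.46, 1.46], [0.69, 1.25, 3.42])
print(scores)  # a model that attains both ensemble minima scores exactly 1
```

Normalizing by the ensemble minima puts the aqueous and contrast-matched data sets on a common scale, so neither dominates the combined score simply because its χ² values are larger.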

Figure 3.


Agreement between observed and calculated SAXS and RDC data for the structures closest to the mean of each cluster. (A) SAXS/WAXS in water. (B) Contrast-matched SAXS in 65% (w/v) sucrose, and (C) RDCs for the N- (left panel) and C- (right panel) domains. Cluster I is depicted in blue, cluster II in green and cluster III in red.

Table 1.

Parameters of the three clusters selected on the basis of χ²SAXS,water < 1.5 and grouped according to χ²SAXS,sucrose.ᵃ

                          Cluster I     Cluster II    Cluster III
χ²SAXS,sucrose            0.69±0.02     1.25±0.22     3.42±0.53
χ²SAXS,water              1.46±0.03     1.46±0.03     1.46±0.03
RDC R-factor              0.18±0.00     0.19±0.00     0.18±0.01
Da^NH (Hz)                −17.6±0.1     −17.6±0.2     −17.7±0.1
η                         0.35±0.01     0.35±0.01     0.35±0.01
r Pb(a)–Pb(a′) (Å)        30.9±0.2      31.9±0.3      33.3±0.3
r Pb(a)–Pb(b′) (Å)        34.3±0.2      35.6±0.9      37.4±0.4
r Pb(b)–Pb(a′) (Å)        36.9±0.2      37.5±0.9      38.2±0.3
r Pb(b)–Pb(b′) (Å)        39.7±0.2      40.7±0.3      41.8±0.2
ᵃ The RDC R-factor is given by $[\langle (D_{obs} - D_{calc})^2 \rangle / (2\langle D_{obs}^2 \rangle)]^{1/2}$, where $D_{obs}$ and $D_{calc}$ are the observed and calculated RDCs, respectively. Da^NH and η are the magnitude of the principal component of the alignment tensor and the rhombicity, respectively. There are 115 RDCs, comprising 61 and 54 for the N- and C-domains, respectively. The correspondence of the Pb2+ ions to their locations in the structure is shown in Fig. 4.

Figure 4.


Structural comparison of Clusters I, II and III. The N-domain (gray) of the three clusters is superimposed, illustrating the relative displacements of the C-domain of Clusters I (blue), II (green) and III (red). The backbone of CaM is displayed as a ribbon diagram, the backbone of the MLCK is shown as a Cα trace, and the positions of the Pb2+ ions as spheres. The two views shown in (A) and (B) are approximately orthogonal to one another. Only Cluster I accounts for the Pb-substituted contrast-matched SAXS data. The backbone rms displacement of the C-domain of Cluster I relative to the 1MXE coordinates9 is 2.5 Å.

A structural comparison of the three clusters is shown in Fig. 4, and the rms differences within and between the clusters are provided in Table 2. For Cluster I, the relative position of the two CaM domains is defined with a precision of 1 Å. When best-fitted to the N-domain, the backbone rms displacements of the C-domains of Clusters II and III relative to Cluster I are much larger, 1.8 and 5.4 Å, respectively, well outside the coordinate precision of Cluster I. The structural differences between the three clusters reflect a systematic lengthening of the Pb-Pb distances from Cluster I to Clusters II and III (Fig. 4 and Table 1).

Table 2.

Backbone rms difference between the C-terminal domains of clusters I, II and III, when best-fitted to the N-domain. The diagonal elements in the table correspond to the average rms difference within a cluster to the structure closest to the mean coordinate positions of the cluster, and the off-diagonal elements to the rms differences between the structures closest to the means of the respective clusters.

Backbone rms difference (Å)

Cluster    I      II     III
I          1.1    1.8    5.4
II                2.4    3.9
III                      1.4

Given the discriminating power of the Pb-substituted contrast-matched SAXS data, can such data be used directly to extract accurate Pb-Pb distances? To assess this, we carried out fits to the contrast-matched SAXS data using a Monte Carlo-generated random sampling of 4-atom geometries to represent the Pb sites within CaM (see Supporting Information). We consider three cases: 4 variable interdomain Pb-Pb distances, with the two intradomain distances fixed to the value measured from the domain coordinates (11.7 Å); 5 variables, comprising the 4 interdomain Pb-Pb distances plus the two intradomain distances constrained to have the same value; and 6 variables, in which all four interdomain and both intradomain Pb-Pb distances are allowed to vary. The results are summarized in Table 3. For the 4-variable case, corresponding to knowledge of the atomic structures of the two CaM domains, the average interdomain Pb-Pb distances derived from the contrast-matched SAXS data have uncertainties of only ~1 Å and are in excellent agreement with the corresponding Pb-Pb distances in Cluster I (cf. Table 1). Introducing an additional variable, with the assumption that the two intradomain Pb-Pb distances are the same, results in larger uncertainties, but agreement with the Cluster I Pb-Pb distances is still excellent and within the uncertainties of the distance estimates. This case could be modeled on the basis of the high sequence similarity of the N- and C-domains of CaM if their atomic structures were unknown. Finally, the completely unrestrained 6-variable simulation still results in reasonable agreement with the longer, interdomain Pb-Pb distances from the all-atom model but exhibits only marginal agreement with the correct intradomain Pb-Pb separations. These calculations therefore indicate that up to 5 of the 6 Pb-Pb distances can be extracted both precisely and accurately from a single contrast-matched SAXS curve arising from 4 Pb labels. The faster loss of precision for the shorter separations is likely due to the larger experimental uncertainties at wider angles and the limited qmax of the fitted data, since the shorter Pb-Pb distances make their impact felt increasingly at higher q values.
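The flavor of the unrestrained (6-variable) Monte Carlo analysis can be conveyed by the sketch below, but its details are assumptions rather than the procedure in the Supporting Information: geometries are drawn uniformly in a cube of side `box`, each Pb is a unit point scatterer, and the "accepted" set is simply the `n_keep` geometries with the lowest scaled χ². Distances are reported in ascending order, as in Table 3. The 4- and 5-variable cases would additionally require sampling with the intradomain distances fixed or tied, which is not shown. All function names are hypothetical.

```python
import numpy as np


def debye_points(xyz, q):
    """Debye intensity for identical point scatterers (form factor 1)."""
    r = np.linalg.norm(xyz[:, None, :] - xyz[None, :, :], axis=-1)
    return np.sinc(q[:, None, None] * r[None, :, :] / np.pi).sum(axis=(1, 2))


def scaled_chi2(i_obs, sigma, i_calc):
    """Mean squared normalized residual after least-squares scaling of i_calc."""
    w = 1.0 / sigma ** 2
    c = np.sum(w * i_obs * i_calc) / np.sum(w * i_calc ** 2)
    return np.mean(((i_obs - c * i_calc) / sigma) ** 2)


def sorted_distances(xyz):
    """All pairwise distances (6 for 4 sites), in ascending order."""
    i, j = np.triu_indices(len(xyz), k=1)
    return np.sort(np.linalg.norm(xyz[i] - xyz[j], axis=1))


def mc_distance_estimate(q, i_obs, sigma, n_trials=5000, box=45.0, n_keep=50, seed=0):
    """Draw random 4-site geometries, score each against the measured curve, and
    return the mean and standard deviation of the sorted distances of the n_keep
    best-fitting geometries, together with the lowest chi2 found."""
    rng = np.random.default_rng(seed)
    trials = rng.uniform(0.0, box, size=(n_trials, 4, 3))
    chi2 = np.array([scaled_chi2(i_obs, sigma, debye_points(t, q)) for t in trials])
    best = trials[np.argsort(chi2)[:n_keep]]
    dists = np.array([sorted_distances(t) for t in best])
    return dists.mean(axis=0), dists.std(axis=0), chi2.min()
```

Pure random sampling is inefficient; a practical implementation would likely refine promising geometries or sample distances directly, and the spread of the accepted set, rather than the single best fit, is what conveys the distance uncertainties.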

Table 3.

Pb-Pb distances determined directly from the Pb-substituted contrast-matched SAXS data.

Number of unknown distances    Pb-Pb distances (Å)
4ᵃ                             31.5±1.0   33.8±0.9   36.5±1.1   39.6±1.1
5ᵇ                             11.1±2.9   30.2±2.3   33.9±1.3   36.3±1.3   40.6±1.5
6                              6.4±3.6    14.7±3.1   29.9±1.9   32.3±1.9   38.1±1.8   41.1±1.3

ᵃ Only the 4 interdomain Pb-Pb distances are varied; the two intradomain Pb-Pb distances are fixed to their known value of 11.7 Å.

ᵇ The two intradomain Pb-Pb distances are varied but constrained to be equal, and the 4 interdomain Pb-Pb distances are also varied.

In light of the above results, we investigated whether the contrast-matched SAXS and aqueous SAXS/WAXS data alone are sufficient to arrive at the correct solution without recourse to filtering by RDCs and intermolecular NOE distance violations. Using the same grid procedure, structures were filtered by their fits to the X-ray scattering data (cutoffs of χ²SAXS,water < 1.69 and χ²SAXS,sucrose < 0.72), the absence of steric clashes, an Rgyr of 13–19 Å, and the ability to form a linkage between the N- and C-domains. The results in Fig. 5 indicate that the cluster with the lowest normalized SAXS fitness score yields solutions with a coordinate accuracy of better than 1 Å relative to Cluster I.

Figure 5.


Accuracy of best-fitting models (χ²SAXS,water < 1.69 and χ²SAXS,sucrose < 0.72) obtained from the grid search procedure using aqueous and contrast-matched SAXS data as the only experimental restraints. The normalized SAXS fitness score is defined in the legend to Fig. 2. The best Cluster I structure is the structure closest to the mean coordinate positions of the Cluster I structures.

While this work capitalizes on the ability of CaM to specifically bind Pb2+ in place of Ca2+, applications of this approach can be readily extended beyond metal-binding proteins by incorporating heavy atom ions such as Pb2+ or Hg2+ into EDTA moieties conjugated via disulfide bonds to engineered surface cysteines as routinely done in NMR paramagnetic relaxation enhancement studies.13 Although the EDTA-metal moiety samples a large region of conformational space, the metal-metal separations measured by contrast-matched SAXS are simple linear averages of all the conformations present in solution and each metal atom can therefore be represented by a single average position.

Although similar information has, in principle, been obtained from neutron scattering of ²⁴⁰Pu,14 or X-ray scattering of DNA with attached gold nanoclusters,15 the present approach avoids the lower signal-to-noise ratio of small angle neutron scattering, the complications of handling a highly radioactive isotope, and the need to decompose the observed data into individual scattering functions from measurements on a series of samples.

Finally, the results obtained without recourse to filtering by NMR data suggest that accurate ‘triangulation-driven’ assembly of multi-component protein architectures, based only on a combination of aqueous and contrast-matched/heavy-atom-labeled SAXS data, is feasible, provided that the structures of the individual subunits are known.


Acknowledgements

We thank A. Bax for useful discussions, and S. Seifert and X. Zuo for assistance with SAXS data collection. This work was supported by the intramural program of NIDDK and the Intramural AIDS Antiviral Program of the Office of the Director of the NIH (G.M.C.). Use of the Advanced Photon Source (D.O.E.) and the shared scattering beamline resource (PUP-77 agreement between NCI, NIH and Argonne National Laboratory) is acknowledged.

Footnotes

Supporting Information: Experimental procedures, data tables and figures. Coordinates of the best-fitting Cluster I model and experimental data (PDB 2LV6) have been deposited in the Protein Data Bank (http://www.rcsb.org). This material is available free of charge via the Internet at http://pubs.acs.org.

References

(1) (a) Koch MH, Vachette P, Svergun DI. Q. Rev. Biophys. 2003;36:147–227. doi: 10.1017/s0033583503003871; (b) Putnam CD, Hammel M, Hura GL, Tainer JA. Q. Rev. Biophys. 2007;40:191–285. doi: 10.1017/S0033583507004635; (c) Hura, et al. Nature Methods. 2009;6:606–612. doi: 10.1038/nmeth.1353.
(2) (a) Grishaev A, Wu J, Trewhella J, Bax A. J. Am. Chem. Soc. 2005;127:16621–16628. doi: 10.1021/ja054342m; (b) Schwieters CD, Clore GM. Biochemistry. 2007;46:1152–1166. doi: 10.1021/bi061943x; (c) Grishaev A, Tugarinov V, Kay LE, Trewhella J, Bax A. J. Biomol. NMR. 2008;40:95–106. doi: 10.1007/s10858-007-9211-5; (d) Schwieters CD, Suh JY, Grishaev A, Ghirlando R, Takayama Y, Clore GM. J. Am. Chem. Soc. 2010;132:13026–13045. doi: 10.1021/ja105485b; (e) Takayama Y, Schwieters CD, Grishaev A, Ghirlando R, Clore GM. J. Am. Chem. Soc. 2011;133:424–427. doi: 10.1021/ja109866w.
(3) Lipfert J, Doniach S. Ann. Rev. Biophys. Biophys. Chem. 2007;36:307–327. doi: 10.1146/annurev.biophys.36.040306.132655.
(4) Bax A, Grishaev A. Curr. Opin. Struct. Biol. 2005;15:563–570. doi: 10.1016/j.sbi.2005.08.006.
(5) (a) Hoeflich KP, Ikura M. Cell. 2002;108:739–742. doi: 10.1016/s0092-8674(02)00682-7; (b) Maximciuc AA, Putkey JA, Shamoo Y, Mackenzie KR. Structure. 2006;14:1547–1556. doi: 10.1016/j.str.2006.08.011.
(6) Clore GM, Starich MR, Gronenborn AM. J. Am. Chem. Soc. 1998;120:10571–10572.
(7) Kursula P, Majava V. Acta Cryst. 2007;F63:653–656. doi: 10.1107/S1744309107034525.
(8) Fullmer CS, Edelstein S, Wasserman RH. J. Biol. Chem. 1985;260:6816–6819.
(9) Clapperton JA, Martin SR, Smerdon SJ, Gamblin SJ, Bayley PM. Biochemistry. 2002;41:14669–14679. doi: 10.1021/bi026660t.
(10) Ikura M, Clore GM, Gronenborn AM, Zhu G, Klee CB, Bax A. Science. 1992;256:632–638. doi: 10.1126/science.1585175.
(11) Krivov GG, Shapovalov MV, Dunbrack RL Jr. Proteins. 2009;77:778–795. doi: 10.1002/prot.22488.
(12) Grishaev A, Guo L, Irving T, Bax A. J. Am. Chem. Soc. 2010;132:15484–15486. doi: 10.1021/ja106173n.
(13) Clore GM, Iwahara J. Chem. Rev. 2009;109:4108–4139. doi: 10.1021/cr900033p.
(14) Seeger PA, Rokop SE, Palmer PD, Henderson SJ, Hobart DE, Trewhella J. J. Am. Chem. Soc. 1997;119:5118–5125.
(15) Mathew-Fenn RS, Das R, Silverman JA, Walker P, Harbury PAB. PLoS ONE. 2008;3:e3229. doi: 10.1371/journal.pone.0003229.
