The Journal of Chemical Physics. 2013 Jan 11;138(2):024112. doi: 10.1063/1.4774148

Fast-SAXS-pro: A unified approach to computing SAXS profiles of DNA, RNA, protein, and their complexes

Krishnakumar M Ravikumar 1, Wei Huang 1, Sichun Yang 1,a)
PMCID: PMC5942439  PMID: 23320673

Abstract

A generalized method, termed Fast-SAXS-pro, for computing small angle x-ray scattering (SAXS) profiles of proteins, nucleic acids, and their complexes is presented. First, effective coarse-grained structure factors of DNA nucleotides are derived using a simplified two-particle-per-nucleotide representation. Second, SAXS data of an 18-bp double-stranded DNA are measured and used to calibrate the scattering contribution from excess electron density in the DNA solvation layer. An additional test on a 25-bp DNA duplex validates this SAXS computational method and suggests that the hydration surface of DNA contributes differently to the total scattering than that of RNA or protein. To account for this difference, a sigmoidal function is implemented for the treatment of non-uniform electron density across the surface of a protein/nucleic-acid complex. This treatment allows differential scattering from the solvation layer surrounding protein/nucleic-acid complexes. Finally, applications of the Fast-SAXS-pro method are demonstrated for protein/DNA and protein/RNA complexes.

I. INTRODUCTION

Small angle X-ray scattering (SAXS) can serve as a complementary technique for structural studies of large biomolecular complexes.1–8 It provides useful information under physiological conditions and, more importantly, offers a rather sensitive measure of large spatial separation, e.g., between the subunits of a multi-component complex formed by proteins and nucleic acids. Given the low-resolution nature of SAXS data, structural interpretation using SAXS can be facilitated by computational approaches.9–27 Computing theoretical SAXS data for a given structural model of a biomolecular complex, e.g., one generated from molecular simulations, is thus critical for interpreting measured SAXS data.

There are quite a few algorithms available for computing SAXS profiles. They differ in two main aspects: one is the molecular representation of the macromolecule itself and the other is the treatment of its solvation layer, which can contribute to the total scattering due to its excess electron density relative to the bulk solvent.28 In the former, an atomistically detailed representation is widely used,13,23,29–31 although simplified bead-like representations have been recently implemented14,20,32–34 by taking advantage of the low-resolution nature of SAXS data. In the latter, a key consideration is the treatment of water molecules in the solvation layers, either implicitly or explicitly. Previous studies show that both types of treatment are quite accurate in calculating the scattering of proteins as well as nucleic acids.13,23,29,30,32,33,35 It should be noted that in both explicit and implicit treatments, a uniform excess electron density in the solvation layer is often assumed across the boundaries of proteins and nucleic acids. However, it is recognized that the solvation layers surrounding proteins and nucleic acids can have different scattering contributions due to their different chemical and ionic characters,36–42 as acknowledged by previous studies on protein and RNA scattering.32,33 Thus, it is desirable to have SAXS computational methods that take this differential scattering from the solvation layer into account, especially for large biomolecular complexes formed by protein, RNA, and/or DNA.

To address this proposition, we present a generalized method termed Fast-SAXS-pro, for computing SAXS profiles of protein/nucleic-acid complexes. First, a simplified two-particle-per-nucleotide representation of DNA nucleotides is introduced and their structure factors are derived in a similar fashion to the recently developed model for RNA.33 The model is calibrated and further validated using experimental SAXS data. Second, the stand-alone coarse-grained models for protein and nucleic acid are combined and extended for computing scattering profiles of protein/nucleic-acid complexes. In particular, the differential scattering contribution from the solvation layer is taken into consideration. Finally, the applications of this Fast-SAXS-pro method are demonstrated for two different protein/nucleic-acid complexes.

II. METHODS AND DETAILS

A. Derivation of DNA structure factors

Following Yang et al.,33 a coarse-grained representation was implemented to calculate the DNA solution scattering. Each DNA nucleotide is represented by two pseudo-particles or beads: one bead representing the backbone sugar-phosphate group and the other representing the base side-chain group. Structure factors for all the resulting beads of DNA nucleotides are derived via the Debye formula by

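$$F^{\mathrm{CG}}(q) = \left\langle \sum_{i,j=1}^{m} f'_i(q)\, f'_j(q)\, \frac{\sin(q r_{ij})}{q r_{ij}} \right\rangle_{\mathrm{PDB}}^{1/2}, \qquad (1)$$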

where q is the magnitude of the momentum transfer, q = 4π sin(θ)/λ (2θ is the scattering angle and λ is the x-ray wavelength), rij is the distance between atoms i and j, and m is the number of atoms within the backbone (or the base group). The ⟨...⟩PDB indicates that the scattering factor is averaged over a set of 100 high-resolution (<2 Å) crystal structures taken from the Protein Data Bank (PDB).43 The missing hydrogen atoms in the structure files were added using the Reduce software.44 f′(q) is the corrected atomic form factor after the excluded volume effect is taken into account,45

$$f'(q) = f(q) - v \rho_s \exp\left(-\pi v^{2/3} q^2\right), \qquad (2)$$

where f(q) is the atomic scattering amplitude in vacuum, whose analytical expressions have been provided for the atom types C, N, O, H, P, and S using the Cromer-Mann scattering-factor coefficients.46 v is the experimentally observed volume of each atom45 and ρs ≈ 0.334 e/Å³ is used as the bulk solvent electron density at 20 °C.
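The two formulas above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' code: the function names are hypothetical, the exponent is taken exactly as printed in Eq. (2) (its q convention is worth checking against Ref. 45), the average over the PDB set in Eq. (1) is left out so that only a single structure is handled, and the vacuum form factors are assumed to be supplied by the caller (for instance from Cromer-Mann coefficients).

```python
import numpy as np

RHO_S = 0.334  # bulk solvent electron density in e/Å^3, as quoted in the text


def corrected_form_factor(f_vac, volume, q, rho_s=RHO_S):
    """Eq. (2): vacuum form factor minus the scattering of the displaced solvent.

    f_vac  : vacuum form factor f(q) evaluated at q
    volume : atomic volume v in Å^3
    q      : momentum transfer in 1/Å
    """
    q = np.asarray(q, dtype=float)
    gaussian = np.exp(-np.pi * volume ** (2.0 / 3.0) * q ** 2)  # form as printed
    return np.asarray(f_vac, dtype=float) - volume * rho_s * gaussian


def debye_structure_factor(coords, form_factors, q):
    """Square root of the Debye double sum of Eq. (1) for ONE structure, one q.

    coords       : (m, 3) atomic coordinates of a backbone or base group in Å
    form_factors : (m,) corrected form factors f'_i(q) at this q
    """
    coords = np.asarray(coords, dtype=float)
    f = np.asarray(form_factors, dtype=float)
    r = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    # np.sinc(x) = sin(pi x)/(pi x), so sinc(q r / pi) = sin(q r)/(q r), with 1 at r = 0
    kernel = np.sinc(q * r / np.pi)
    return np.sqrt(f @ kernel @ f)
```

As a quick check at q = 0, a carbon atom (f(0) = 6 electrons) with an illustrative volume of 16.44 Å³ gives f′(0) = 6 − 16.44 × 0.334 ≈ 0.509, which shows how strongly the displaced solvent reduces the contrast of a light atom.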

The positions of the DNA beads were determined as follows. For each bead, the center-of-scattering (RCOS) is calculated by

$$\mathbf{R}_{\mathrm{COS}} = \frac{\sum_{i=1}^{m} \mathbf{r}_i\, |f'_i(0)|^2}{\sum_{i=1}^{m} |f'_i(0)|^2}, \qquad (3)$$

where f′i(0) is the solvent-corrected atomic form factor f′(q) at q = 0 (Eq. (2)) and ri is the coordinate of the ith atom of the backbone or the base group. The atom closest to the center-of-scattering RCOS is selected to represent its corresponding bead.
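The bead placement described above amounts to a weighted centroid followed by a nearest-atom lookup. The sketch below assumes the solvent-corrected form factors at q = 0 are already available; the function names are hypothetical and chosen only for illustration.

```python
import numpy as np


def center_of_scattering(coords, f0):
    """Eq. (3): centroid of the atoms weighted by |f'_i(0)|^2."""
    coords = np.asarray(coords, dtype=float)
    weights = np.abs(np.asarray(f0, dtype=float)) ** 2
    return (weights[:, None] * coords).sum(axis=0) / weights.sum()


def representative_atom(coords, f0):
    """Index of the atom closest to the center-of-scattering of its group."""
    coords = np.asarray(coords, dtype=float)
    r_cos = center_of_scattering(coords, f0)
    return int(np.argmin(np.linalg.norm(coords - r_cos, axis=1)))
```

Using an existing atom rather than the centroid itself keeps the coarse-grained bead tied to a coordinate that is present in every structure file.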

B. SAXS computation with an explicit solvation layer

A solvation layer was used for calculating DNA solution scattering, where explicit water molecules were placed around the DNA surface to account for the excess electron density. A pre-equilibrated TIP3P47 water box was used to add this solvation layer, where each water molecule is represented by a single bead at its oxygen position.32 Similar to previous treatments of protein and RNA,32,33 a 3-Å-thick solvation layer was obtained by retaining water molecules within the distance range of 3.5-6.5 Å from the DNA. The effective structure factor of a simplified water molecule is given by

$$F^{\mathrm{CG}}(q) = w_{\mathrm{DNA}} \left[ \sum_{i,j=1}^{3} f_i(q)\, f_j(q)\, \frac{\sin(q r_{ij})}{q r_{ij}} \right]^{1/2}, \qquad (4)$$

where the summation is over the three water atoms and wDNA is the weighting factor used to calibrate the scattering contribution from the DNA solvation layer.
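The construction of the explicit solvation layer can be sketched as a distance filter on the oxygen positions of a water box. This is only an illustration under stated assumptions: the text does not specify whether the distance "from the DNA" is measured to atoms or to coarse-grained beads, so the solute coordinates here are simply whatever array the caller passes in, and the function name is hypothetical.

```python
import numpy as np


def solvation_shell(water_xyz, solute_xyz, r_min=3.5, r_max=6.5):
    """Keep water beads whose minimum distance to the solute lies in [r_min, r_max] Å.

    water_xyz  : (n_w, 3) oxygen positions from a pre-equilibrated water box
    solute_xyz : (n_s, 3) solute coordinates (atoms or beads; an assumption here)
    """
    water_xyz = np.asarray(water_xyz, dtype=float)
    solute_xyz = np.asarray(solute_xyz, dtype=float)
    # Full distance matrix; adequate for small systems, memory-heavy for large ones
    d = np.linalg.norm(water_xyz[:, None, :] - solute_xyz[None, :, :], axis=-1)
    d_min = d.min(axis=1)
    return water_xyz[(d_min >= r_min) & (d_min <= r_max)]
```

For large complexes the dense distance matrix could be replaced by a spatial index such as `scipy.spatial.cKDTree`; the selection rule itself would stay the same.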

Finally, the total SAXS intensity of DNA is calculated by

$$I_{\mathrm{cal}}(q) = \sum_{i,j=1}^{2N+M} F_i^{\mathrm{CG}}(q)\, F_j^{\mathrm{CG}}(q)\, \frac{\sin(q r_{ij})}{q r_{ij}}, \qquad (5)$$

where N is the number of nucleotides, M is the number of water molecules in the solvation layer, and FiCG(q) is the derived structure factor of a nucleotide bead or a water molecule (Eqs. (1) and (4)).
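A direct evaluation of the Debye sum in Eq. (5) can be written as follows. The sketch assumes the structure factors of all beads (nucleotide beads and weighted water beads) have already been tabulated on the q grid; the function name and array layout are illustrative choices, not part of the published method.

```python
import numpy as np


def debye_intensity(coords, factors, q_values):
    """Eq. (5): I(q) = sum_ij F_i(q) F_j(q) sin(q r_ij) / (q r_ij).

    coords   : (n, 3) positions of all beads (2N nucleotide beads + M waters) in Å
    factors  : (n, n_q) structure factors F_i(q) tabulated on q_values
    q_values : (n_q,) momentum transfer values in 1/Å
    """
    coords = np.asarray(coords, dtype=float)
    factors = np.asarray(factors, dtype=float)
    r = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    intensity = np.empty(len(q_values))
    for k, q in enumerate(q_values):
        # sinc(q r / pi) equals sin(q r)/(q r) and handles r = 0 and q = 0 safely
        kernel = np.sinc(q * r / np.pi)
        intensity[k] = factors[:, k] @ kernel @ factors[:, k]
    return intensity
```

The cost grows with the square of the number of beads, which is why reducing each nucleotide to two beads and each water to one bead makes the calculation considerably cheaper than an all-atom sum.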

C. Details of SAXS data collection

SAXS data of an 18-bp double-stranded DNA molecule were collected at the BioCAT-18ID beamline at the Advanced Photon Source. This DNA duplex, whose crystal structure is available (PDB entry 1HCQ),48 consists of two strands: 5′-CCAGGTCACAGTGACCTG-3′ and 5′-CCAGGTCACTGTGACCTG-3′ (purchased from Integrated DNA Technologies with HPLC grade). These two strands were mixed in a 1:1 ratio in 10 mM HEPES (pH = 7.4), 100 mM NaCl, 100 mM KCl, 4 mM MgCl2, and annealed by heating at 95 °C for 5 min and slowly cooling to room temperature. The resultant DNA sample with a concentration of 1 mg/ml was exposed to a 12-keV x-ray beam using a flow-cell setup kept at a temperature of 10 °C. The final scattering intensity was averaged over a series of 1-s exposure images.
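To connect the beam energy quoted above to the momentum transfer q = 4π sin(θ)/λ used throughout, the following small sketch converts a photon energy to a wavelength and a scattering angle to q. The helper names are hypothetical; the only physical input is the standard relation λ = hc/E with hc ≈ 12.398 keV·Å.

```python
import math

HC_KEV_ANGSTROM = 12.39842  # Planck constant times speed of light, in keV·Å


def wavelength_angstrom(energy_kev):
    """X-ray wavelength in Å for a photon energy given in keV."""
    return HC_KEV_ANGSTROM / energy_kev


def momentum_transfer(two_theta_deg, wavelength):
    """q = 4*pi*sin(theta)/lambda, with the scattering angle 2*theta in degrees."""
    theta = math.radians(two_theta_deg) / 2.0
    return 4.0 * math.pi * math.sin(theta) / wavelength
```

For a 12-keV beam this gives a wavelength of about 1.033 Å.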

III. RESULTS AND DISCUSSION

The method for computing SAXS profiles of protein/nucleic-acid complexes is developed on the basis of previous SAXS computing methods for protein and RNA molecules.32,33 A similar method is presented below for DNA molecules. First, a DNA molecule is simplified into a two-bead-per-nucleotide representation and the effective structure factors of the resulting beads are derived. Second, the model calibration, which includes the scattering contribution from the DNA solvation layer, is achieved using experimentally measured SAXS data of an 18-bp DNA duplex. The model is further validated on a 25-bp DNA duplex. Finally, based on this DNA model, a method for computing the scattering of protein/nucleic-acid complexes is developed, and its applications to protein/DNA and protein/RNA complexes are demonstrated.

A. Structure factors of DNA nucleotides

A coarse-grained representation is introduced for DNA molecules, where each nucleotide is simplified into two pseudo-particles or beads: one representing the sugar-phosphate backbone and the other representing the base group. For each bead, an effective structure factor is derived using the Debye formula (see Sec. II A). Figure 1 (left panel) shows the resultant coarse-grained structure factors FCG(q), derived using Eq. (1), for the backbone and base beads of the four DNA nucleotides. Note that the excluded volume effect is inherently taken into account in these coarse-grained structure factors (Eq. (2)). It is not surprising that the structure factors for the backbone groups of adenine (A), cytosine (C), guanine (G), and thymine (T) are nearly identical, since the configuration and the chemical composition of the backbone group are similar. On the other hand, the structure factors for the base beads clearly differ due to their different chemical composition. Given the structural similarity between DNA and RNA nucleotides, their FCG(q) curves are also compared. Figure 1 (right panel) shows the structure factors of RNA beads, which have been previously reported33 but are re-calculated here in the same fashion as for DNA. Note that RNA nucleotides have an additional hydroxyl group in their backbone sugar group compared to DNA, and the resulting structure factor of the RNA backbone is different from that of DNA. In contrast, the structure factors of the bases are similar between DNA and RNA. Thus, the coarse-grained structure factors of DNA nucleotides are derived.

FIG. 1.

Derivation of coarse-grained DNA structure factors. (Left) A simplified two-bead representation of each DNA nucleotide, where one bead (in blue) represents the sugar-phosphate backbone group and the other (in red) the base side-chain group. The atoms colored in blue and red are selected to represent the positioning of each simplified group, based on their center-of-scattering as defined in Eq. (3). Their structure factors are shown on the bottom for both backbone and base groups. For comparison, structure factors and the positions of RNA nucleotides are also shown on the right.

The positions of these DNA beads are assigned based on the calculations of the center-of-scattering of the backbone and base groups using Eq. (3). Given the calculated positions, the atom closest to the center-of-scattering of the backbone or the base is chosen to represent its corresponding bead. Figure 1 (top) illustrates the selected atoms for all the beads, where the O5′ atom is picked (and marked in blue) for all the sugar-phosphate backbones, while different atoms (marked in red) are selected to represent the corresponding base beads. Thus, the positioning of the DNA coarse-grained beads is determined.

B. Calibration of the solvation layer scattering

To account for the scattering from the DNA solvation layer, a layer of explicit water molecules is placed around the DNA surface. This is achieved by using a pre-equilibrated TIP3P water box (see Sec. II B). Note that water molecules in this box are used only as dummy atoms to account for excess electron density relative to bulk solvent, mainly due to water hydration and ion association in its solvation layer. In this case, the excess electron density is accounted for by assigning a weighting factor (wDNA) to the scattering of these water molecules (Eq. (4)). This treatment is similar to the previously developed SAXS computational methods for protein and RNA.32,33 Using this explicit solvation representation, theoretical SAXS profiles can be computed using the Debye formula described in Eq. (5), although the factor of wDNA remains to be determined.

To calibrate the weighting factor wDNA, an 18-bp double-stranded DNA is used as a model system (PDB entry 1HCQ),48 whose coarse-grained representation is shown in Fig. 2(a). Experimental SAXS data (Iexp(q)) of this DNA duplex were collected (see details in Sec. II C) and are shown in Fig. 2(b). The value of wDNA is optimized by comparing the theoretical and experimental SAXS profiles. This comparison is carried out by measuring the similarity between Iexp(q) and Ical(q) via

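$$\chi^2 = \sum_{q_{\min}}^{q_{\max}} \frac{1}{\delta I_{\log}^2(q)} \left( \log I_{\mathrm{cal}}(q) - \log I_{\mathrm{exp}}(q) - \Delta_{\mathrm{offset}} \right)^2, \qquad (6)$$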

where Ical(q), the theoretical SAXS profile, is calculated using different wDNA values. The values of qmin and qmax are the lower and upper limits of the q-range of the measured SAXS data. δIlog(q) is the experimental uncertainty of log Iexp(q) and Δoffset is the offset between log Ical(q) and log Iexp(q) at q = qmin. Note that in this similarity measure, a lower χ² value indicates a better match between theoretical and experimental SAXS data.
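The similarity measure of Eq. (6) can be sketched as below. Two points are assumptions rather than statements from the text: the natural logarithm is used (the base is not specified), and the arrays are taken to be sorted by increasing q so that the first element corresponds to qmin. The function name is hypothetical.

```python
import numpy as np


def chi2_log(i_cal, i_exp, sigma_log):
    """Eq. (6): chi-square between log-intensities after removing the offset at q_min.

    i_cal, i_exp : calculated and measured intensities on the same q grid
    sigma_log    : uncertainty of log I_exp(q); for natural logs, roughly sigma_I / I_exp
    """
    log_cal = np.log(np.asarray(i_cal, dtype=float))
    log_exp = np.log(np.asarray(i_exp, dtype=float))
    sigma_log = np.asarray(sigma_log, dtype=float)
    offset = log_cal[0] - log_exp[0]  # difference of the two curves at q = q_min
    return float(np.sum(((log_cal - log_exp - offset) / sigma_log) ** 2))
```

Because the offset is subtracted in log space, an overall scale factor between the calculated and measured curves does not contribute to χ²; only differences in the shape of the profile are penalized.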

FIG. 2.

Calibration of the computational method for DNA scattering. (a) Coarse-grained representation of an 18-bp double-stranded DNA from nuclear estrogen receptor alpha (PDB entry 1HCQ). Its backbone beads are colored in blue and its base beads in red. The blue dots represent explicit water molecules surrounding the DNA molecule. (b) Measured SAXS profile (black dotted line) and calculated SAXS profile (red solid line) using Eq. (5) with the weighting factor wDNA = 7%. Experimental uncertainties are shown as vertical bars. (c) A plot of χ² as a function of wDNA.

Figure 2(c) shows the plot of χ² versus wDNA, where wDNA = 7% yields the minimum χ² value, i.e., the optimal match between Ical(q) and Iexp(q). The best-fit Ical(q) calculated using wDNA = 7% is also plotted in Fig. 2(b). The good agreement with Iexp(q) suggests that a wDNA of 7% best reproduces the experimental SAXS profile of this DNA duplex.
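A scan of χ² over candidate weighting factors can be sketched as follows. The sketch relies on a rearrangement that follows from Eqs. (4) and (5) but is not described in the text: because each water structure factor is proportional to w, the Debye sum splits into a solute-solute term, a solute-water cross term, and a water-water term, so that I(q; w) = I_ss(q) + 2w I_sw(q) + w² I_ww(q), with the last two terms computed once using unweighted water factors. All names are hypothetical and the natural logarithm is an assumption.

```python
import numpy as np


def scan_weight(i_ss, i_sw, i_ww, i_exp, sigma_log, weights):
    """Return (best_w, best_chi2) over a grid of candidate weighting factors.

    i_ss, i_sw, i_ww : solute-solute, solute-water and water-water Debye terms,
                       the latter two computed with unweighted water factors
    i_exp, sigma_log : measured intensities and uncertainties of log I_exp
    weights          : iterable of candidate w values (e.g. 0.07 for 7%)
    """
    i_ss, i_sw, i_ww = (np.asarray(a, dtype=float) for a in (i_ss, i_sw, i_ww))
    log_exp = np.log(np.asarray(i_exp, dtype=float))
    sigma_log = np.asarray(sigma_log, dtype=float)
    best_w, best_chi2 = None, np.inf
    for w in weights:
        # Water factors scale linearly with w, so I(q) is quadratic in w
        log_cal = np.log(i_ss + 2.0 * w * i_sw + w ** 2 * i_ww)
        offset = log_cal[0] - log_exp[0]  # offset at q_min, as in Eq. (6)
        chi2 = float(np.sum(((log_cal - log_exp - offset) / sigma_log) ** 2))
        if chi2 < best_chi2:
            best_w, best_chi2 = float(w), chi2
    return best_w, best_chi2
```

With this decomposition the expensive pairwise sums are evaluated only once, and each additional value of w costs a single pass over the q grid.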

To test its validity for other DNA molecules, this factor of wDNA = 7% is used for the SAXS calculations of a 25-bp DNA duplex (Fig. 3(a)), whose experimental SAXS data are available in the literature.49 Figure 3(b) shows both experimental and calculated SAXS curves using wDNA = 7%, where a good match is observed between them. This validity test suggests that the method described is capable of computing SAXS profiles of DNA molecules.

FIG. 3.

Validation on a 25-bp DNA duplex. (a) A two-bead-per-nucleotide representation. The blue dots around DNA represent the explicit water molecules in the solvation layer. (b) Experimental and calculated SAXS profiles using wDNA=7%. Experimental uncertainties are shown as the vertical bars. The experimental SAXS data used here are taken from Fig. 1 in Ref. 49.

It should be noted that the scattering intensity can vary depending on salt concentration. However, in the case of this 25-bp DNA duplex, it appears that similar SAXS profiles are observed under a wide range of salt concentrations including 150 mM NaCl (at a low DNA concentration).50 Given that a similar salt concentration is also used for the 18-bp DNA, it appears that this wDNA value is applicable at a physiological salt concentration, although its applications to DNA under high salt conditions remain to be examined on a case-by-case basis.

The characterization of the solvation layer of DNA can be compared with that of protein and RNA, which have been obtained previously.32,33 In a similar fashion to DNA, a broad range of weighting factors is re-examined here for RNA and proteins using two model systems: tRNA (PDB entry 2K4C)11 and lysozyme (PDB entry 6LYZ).51 Figure 4 shows the plots of χ² as a function of the weighting factor for these model systems. The results show the differential contribution of the solvation layer to the total scattering, where wRNA = 13% and wprot = 4% are the optimized weighting factors for tRNA and lysozyme, respectively. Taken together with wDNA = 7% for DNA, these results suggest that excess electron density in the solvation layer can contribute differently to the total scattering intensities of protein and nucleic acid.

FIG. 4.

Plots of χ² as a function of weighting factors for RNA, DNA, and proteins. tRNA-val (PDB entry 2K4C) and lysozyme (PDB entry 6LYZ) were used as model systems. Their SAXS data were taken from Refs. 11 and 29, respectively. The best match with experimental SAXS data, measured by the lowest χ², is at wRNA = 13% for tRNA and wprot = 4% for lysozyme. For comparison, the plot for DNA is shown again in the middle panel.

We emphasize that the introduced weighting factor represents a measure of excess electron density in the solvation layer compared to the bulk buffer solution. For example, the value of wRNA = 13% indicates that the electron density in the RNA solvation layer is 113% of that of the bulk buffer, i.e., an excess of 13%. Accordingly, the relation wRNA > wDNA > wprot suggests that the effective electron density surrounding RNA is higher than that surrounding DNA, which in turn is higher than that surrounding protein. Compared with wprot, the higher wDNA value is also consistent with a higher water density on the DNA surface as suggested previously.38 Our results also suggest that RNA has a higher excess electron density, in accordance with previous simulation results, which show stronger coordination of water hydration and ion association in the RNA solvation layer than in that of DNA.37,39–42
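As a rough worked example of this interpretation, the weighting factors can be converted into effective solvation-layer electron densities. The sketch assumes that w is read as a fractional excess over the bulk value ρs ≈ 0.334 e/Å³ quoted in Sec. II A; the actual buffer density may differ slightly from this value.

```latex
\rho_{\mathrm{layer}} = (1 + w)\,\rho_s, \qquad \rho_s \approx 0.334\ e/\text{\AA}^3
\\[4pt]
\text{RNA: } 1.13 \times 0.334 \approx 0.377\ e/\text{\AA}^3, \quad
\text{DNA: } 1.07 \times 0.334 \approx 0.357\ e/\text{\AA}^3, \quad
\text{protein: } 1.04 \times 0.334 \approx 0.347\ e/\text{\AA}^3
```

Under this reading, the excess densities themselves are about 0.043, 0.023, and 0.013 e/Å³ for RNA, DNA, and protein, respectively.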

C. SAXS computing: Protein/nucleic-acid complexes

The development of DNA scattering models enables further extension to biomolecular complexes formed by proteins and nucleic acids. Traditional methods for SAXS calculations of such complexes assume a uniform excess electron density across the surfaces of both proteins and nucleic acids. Given that the solvation layers of proteins and nucleic acids contribute differently to the scattering, it is reasonable to use different weighting factors for the solvation water molecules in the case of protein/nucleic-acid complexes.

To compute SAXS profiles of protein/nucleic-acid complexes, we assign different weighting factors based on the proximity of water molecules to protein and nucleic acid (Fig. 5(a)). This proximity is measured by the parameter Δ = d1 − d2, where d1 and d2 are the minimum distances of a water molecule from nucleic acids and proteins, respectively. If Δ ≫ 0, the water molecule is closer to the protein and is assigned a protein-like weighting factor w = wprot. If Δ ≪ 0, it is assigned a nucleic-acid-like weighting factor w = wNA, where wNA is wDNA or wRNA depending on the composition of the complex. For those water molecules at the interface of the two solvation layers, a smooth sigmoidal function is used to allow the transition between wNA and wprot as

$$w = w_{\mathrm{prot}} + \frac{w_{\mathrm{NA}} - w_{\mathrm{prot}}}{1 + \exp(\Delta / l)}, \qquad (7)$$

where l is the characteristic length parameter that sets the width of the transition from wNA to wprot. Figures 5(b) and 5(c) show such transitions as a function of Δ for both protein/DNA and protein/RNA complexes at different l values (l = 0.5, 1, 2 Å), where a smaller l value gives a sharper transition and vice versa.
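The assignment of water weighting factors in Eq. (7) can be sketched as follows, using the calibrated values wprot = 4%, wDNA = 7%, and wRNA = 13% reported above. The function names are hypothetical, and measuring the minimum distances to whatever coordinate arrays are passed in (atoms or beads) is an assumption of this sketch.

```python
import numpy as np

W_PROT, W_DNA, W_RNA = 0.04, 0.07, 0.13  # calibrated weighting factors from the text


def min_distance(points, group):
    """Minimum distance from each point to any member of a coordinate group."""
    d = np.linalg.norm(points[:, None, :] - group[None, :, :], axis=-1)
    return d.min(axis=1)


def water_weights(water_xyz, na_xyz, prot_xyz, w_na, w_prot=W_PROT, length=1.0):
    """Eq. (7): sigmoidal switch between nucleic-acid-like and protein-like weights.

    delta = d1 - d2, where d1 and d2 are the minimum distances of each water
    from the nucleic acid and from the protein; length is l in Å.
    """
    water_xyz = np.asarray(water_xyz, dtype=float)
    na_xyz = np.asarray(na_xyz, dtype=float)
    prot_xyz = np.asarray(prot_xyz, dtype=float)
    delta = min_distance(water_xyz, na_xyz) - min_distance(water_xyz, prot_xyz)
    return w_prot + (w_na - w_prot) / (1.0 + np.exp(delta / length))
```

A water molecule equidistant from the two components (Δ = 0) receives the average of the two weights, for example (0.04 + 0.13)/2 = 0.085 in a protein/RNA complex, while waters far from the interface recover wNA or wprot.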

FIG. 5.

Computing SAXS profiles for protein/nucleic-acid complexes. (a) A cartoon representation of the solvation layer. d1 and d2 are the minimum distances of a water molecule from nucleic acid and protein, respectively. (b) A sigmoidal function is used to model the weighting factor w of water molecules as a function of Δ = d1 − d2 (Eq. (7)). Both protein/DNA and protein/RNA complexes are illustrated. The characteristic length l reflects the transition at the boundary using wDNA = 7% for DNA, wRNA = 13% for RNA, and wprot = 4% for protein.

To check how different l values can affect SAXS results, two sets of SAXS calculations are performed on a protein/DNA complex (PDB entry 1T2K)52 and a protein/RNA complex (PDB entry 3MOJ).53 Figure 6 shows the calculated SAXS results using Eqs. (5) and (7) with different values of l = 0.5, 1, 2 Å. For each system, the resultant SAXS curves are virtually identical using different l values, suggesting that this method is quite robust and not sensitive to the change in l. This is not surprising since only a small percentage of water molecules are affected in their contribution to the total scattering of the entire complex. Therefore, the value of l = 1 Å is chosen for mathematical convenience throughout the rest of SAXS calculations. Here, this SAXS computational method for protein/nucleic-acid complexes is termed Fast-SAXS-pro.

FIG. 6.

Theoretical SAXS profiles using different l values for (a) a protein/DNA (PDB entry 1T2K) complex and (b) a protein/RNA complex (PDB entry 3MOJ). The inset shows the corresponding structure, where nucleic acid is colored in red and protein in yellow. Calculated SAXS profiles using different l values are nearly identical in both systems.

We also illustrate the difference between Fast-SAXS-pro and other methods that use a uniform excess density around the surface of protein/nucleic-acid complexes. For this purpose, the widely used CRYSOL program is used,29 which adopts an implicit solvation layer with a uniform excess electron density of 10% (by default) relative to the bulk. Figure 7 shows the calculated SAXS results for the same two model systems as used earlier. In the case of the protein/DNA complex, the SAXS profiles calculated using Fast-SAXS-pro and CRYSOL are similar (Fig. 7(a)). This suggests that assigning different weighting factors may not significantly improve the accuracy of protein/DNA scattering, which is not surprising since the values of wprot and wDNA are relatively close. On the other hand, for the protein/RNA complex, the resultant SAXS profiles are noticeably different (Fig. 7(b)), e.g., in the range of q > 0.15 Å⁻¹. This difference, mainly due to the large gap between wprot and wRNA, suggests that the differential treatment of the solvation layer can play a non-trivial role in SAXS computations. It should be noted that a direct comparison to experimentally measured SAXS data is currently not possible because of the lack of a model protein/nucleic-acid complex for which both a high-resolution structure and SAXS data are available, although such a comparison will be pursued once relevant data become available. Nonetheless, it appears that the Fast-SAXS-pro method is capable of accounting for a physical consideration in the treatment of solvation layers, thus making a step forward towards accurate SAXS computation of protein/nucleic-acid complexes.

FIG. 7.

Comparison of calculated SAXS profiles using Fast-SAXS-pro and CRYSOL. (a) The protein/DNA (PDB entry 1T2K) complex and (b) the protein/RNA complex (PDB entry 3MOJ) are the same as in Fig. 6.

IV. CONCLUSION

We have developed a unified Fast-SAXS-pro method for computing the SAXS profiles of DNA, RNA, protein, and their complexes. For DNA molecules, the contribution to scattering from the solvation layer was calibrated and validated against experimental SAXS data. The scattering contribution from the solvation layer of DNA is different from that of proteins and RNA, and this differential contribution is taken into account in Fast-SAXS-pro for computing the SAXS profiles of protein/nucleic-acid complexes. Comparison of the results from Fast-SAXS-pro with those from CRYSOL shows a good match for a protein/DNA complex, but a noticeable difference in the high-q range for a protein/RNA complex. Finally, given the increasing number of SAXS studies, this Fast-SAXS-pro method can be applied to a broad range of large protein/nucleic-acid complexes.

ACKNOWLEDGMENTS

We thank Dr. Lois Pollack and Dr. Xiangyun Qiu for providing us with the structural coordinates and the experimental SAXS data for the 25-bp DNA used in this work. Allocation of SAXS beamtime was supported by the BioCAT-18ID beamline at the Advanced Photon Source and by the NSLS-X9 beamline at the National Synchrotron Light Source.

REFERENCES

