Abstract
In this paper, a new method for calculating effective atomic radii within the generalized Born (GB) model of implicit solvation is proposed, for use in computer simulations of biomolecules. First, a new formulation for the GB radii is developed, in which smooth kernels are used to eliminate the divergence in volume integrals intrinsic to the model. Next, the Fast Fourier Transform (FFT) algorithm is applied to integrate the smoothed functions, taking advantage of the rapid spectral decay provided by the smoothing. The total cost of the proposed algorithm scales as O(N³ log N + M), where M is the number of atoms in the molecule and N is the number of FFT grid points in one dimension; N depends only on the geometry of the molecule and the spectral decay of the smooth kernel, not on M. To validate our algorithm, numerical tests are performed for three solute models: one spherical object for which exact solutions exist and two protein molecules of differing size. The tests show that our algorithm reaches the accuracy of other existing GB implementations at a much lower computational cost.
Keywords: Generalized Born radii, Poisson-Boltzmann Model, Implicit Solvation Model, Bio-molecules, Fast Fourier Transform
1 Introduction
Accuracy and speed are two primary objectives in developing computational techniques for modeling biomolecular systems [1] in aqueous environments, where electrostatic interactions play an important role. Explicit solvent methods adopt microscopic representations of both solute and solvent molecules and offer an accurate description of the molecular system. However, these methods usually entail a high computational cost. In contrast, implicit solvent simulations characterize the solvent in terms of macroscopic physical quantities such as dielectric constants and Debye lengths; they are considerably faster in practical calculations while still providing a realistic description of the solvent environment.
In implicit solvent models [2], the degrees of freedom pertaining to water are integrated out and replaced by an effective potential energy term, ΔG, acting on the degrees of freedom of the solvated molecule only. From a thermodynamic point of view, this effective energy term is the free energy associated with transferring a solute molecule from vacuum to solvent, and is therefore referred to as the solvation free energy. In practical calculations, ΔG is most conveniently decomposed into two parts, ΔGpol and ΔGnp, referred to as the polar and nonpolar solvation energy, respectively [3]. The nonpolar part, ΔGnp, is associated with the first step of the insertion process, in which a cavity is created inside the solvent and filled with solute atoms whose charges are switched off. The polar solvation energy, ΔGpol, is the free energy associated with charging the atoms of the neutral solute, immersed in solvent, to their actual values.
The polar solvation energy is the more computationally expensive part of ΔG and represents the bottleneck in computer simulations of biomolecules. In principle, ΔGpol can be found exactly in the framework of continuum electrostatics in inhomogeneous media. Depending on whether free ions are absent from or present in the aqueous solution, ΔGpol is given by the solution of the Poisson or the Poisson-Boltzmann (PB) equation, respectively, in which the solute is treated as a medium of low dielectric constant, typically ε = 1 to 10, while the solvent is assigned a high dielectric constant, ε = 80 [4]. As exact analytical solutions of the PB equation exist only for a few solute geometries, for arbitrarily shaped solutes this equation has to be solved numerically. Apart from more fundamental issues concerning the applicability of a continuum solvent representation at molecular length scales, the need for a numerical solution severely limits the range of problems to which the PB model can be applied. Although much progress has been made recently [5, 6] in developing efficient algorithms for solving the PB equation, such as finite difference and boundary element methods, these methods are still generally considered too slow to be applied directly in molecular dynamics simulations of macromolecular systems. More commonly, numerical solutions of the PB equation are used as benchmarks in developing other, faster models of electrostatic solvation.
The generalized Born (GB) theory [7, 8] is one of the most successful fast, though approximate, approaches to electrostatic solvation in biomolecular systems. The solvation energy ΔGpol in the GB model is represented as a pairwise summation over all atoms of the solute molecule, and is therefore relatively cheap to compute compared with solving the PB equation numerically. A critical ingredient of this model is the effective atomic radius, the so-called Born radius, whose physical meaning derives from the Born solvation model of spherical ions. The accuracy of the GB model has been critically tested against the PB theory, which it was designed to approximate. Remarkably, very good agreement between PB and GB results was reported for model solutes of various shapes [9] as well as for a large number of biomolecules of practical interest, including proteins [10–12].
Although the generalized Born formula for electrostatic solvation is not computationally expensive per se, provided that the effective Born radii are given, evaluating these radii presents a major computational challenge. The calculation of the Born radii, which requires an integration over the exterior domain or over the molecular volume for each atom, remains the most expensive part of the GB theory. Both analytical and non-analytical approaches exist for obtaining the Born radii. The analytical pairwise GB models [13–17] approximate the volume integral by a sum of spherical integrals centered at each atom, in which the overlap regions are compensated through an empirical correction; the spherical integrals can then be computed analytically. Non-analytical approaches, however, usually give more accurate results, because of deficiencies in the empirical corrections of the analytical models. In one of the earliest and most widely accepted studies of the GB model, by Still et al. [18], the Born radii were obtained by constructing a set of concentric spherical shells and summing the fractional areas of the shells inside the volume. In other papers, the Born radii were calculated by grid-based numerical methods, such as volume integration over the molecule on a cubic grid [19], use of the Green's function to convert the volume integral into a surface integral [20], or the numerical quadrature techniques used in density functional theory [21, 22].
Despite much recent improvement, most grid-based calculations of Born radii available today are not sufficiently fast to be applied in large macromolecular simulations [8]. As these methods usually compute the integral separately for each atom, on a grid of O(N³) points in the computational domain containing the molecule, a direct calculation of the volume integrals has a computational complexity of O(MN³), where M is the total number of atoms in the molecule. Even with the volume integrals reduced to surface integrals, at a complexity of O(MN²), the computational cost remains too high for large M in practical simulations. In this paper, we propose a new method to compute Born radii using the fast Fourier transform (FFT) algorithm. Our method relies on a new formulation of the Born radius in which the singularity of the kernel function inside the exclusion sphere of each charge is removed by a smoothing function, which makes the FFT calculations both efficient and accurate. The Born radii for atoms located off the grid sites are obtained by interpolation from nearby grid points. The overall complexity of our algorithm, O(N³ log N + M), is linear (apart from a logarithmic factor) in both the number of atoms in the molecule, M, and the number of grid points, N³, used in the FFT. The grid parameter N is independent of M and is chosen such that sufficient accuracy is achieved in the volume integrals over the biomolecules of interest. By changing the smoothness of the kernel function, one can control the rate of its spectral decay, and therefore keep the number of grid points to a minimum. Because M enters only through the O(M) interpolation instead of multiplying the grid cost, our FFT-based method has a clear advantage over other grid-based methods; as we demonstrate later by direct comparison, this advantage becomes more pronounced for systems with a large number of atoms.
The remainder of the paper is organized as follows. In Section 2, we review the approximate generalized Born theory. Section 3 contains a new formulation of the generalized Born radius with smooth kernels. Then, a new FFT based fast algorithm is proposed in Section 4 while Section 5 contains numerical validation of the proposed algorithm for one model molecule and two protein molecules. Finally, Section 6 summarizes this paper with conclusions.
2 Generalized Born theory
In the implicit description of biomolecules in solvent environments, we approximate the solvent as a uniform dielectric medium, and the solute molecule as a region of lower dielectric constant carrying partial charges at the atomic centers. The electrostatic potential therefore satisfies the Poisson equation
$$-\nabla\cdot\big(\varepsilon(\mathbf r)\,\nabla\phi(\mathbf r)\big)=4\pi\rho(\mathbf r), \tag{2.1}$$
where ε(r) is the dielectric constant taking value εin (εin = 1 in most cases) inside the molecule and εex outside the solute molecule (εex = 80 for water), and ρ(r) is the charge distribution
$$\rho(\mathbf r)=\sum_{i=1}^{M}q_i\,\delta(\mathbf r-\mathbf r_i), \tag{2.2}$$
with ri being the position of the ith point charge, of strength qi (of either sign), for i = 1, …, M. If ionic salt effects are to be considered, the linearized Poisson-Boltzmann (PB) equation is used in place of (2.1) (its right-hand side then contains, in the exterior region outside the molecule, an additional term linear in ϕ(r)). Numerical methods for the Poisson equation for molecules of arbitrary shape include finite difference methods [23], finite element methods [24] and boundary element methods [25, 26]. Among these, the boundary element method, when combined with fast multipole methods [27], is known as one of the most effective PB solvers. To compute the electrostatic solvation energy, we calculate the potential inside the molecule for two exterior dielectric environments, εex = 1 (vacuum) and εex = 80 (water), and denote the corresponding potentials by ϕvac and ϕsol, respectively. The difference of these two potentials gives the reaction field, ϕre = ϕsol − ϕvac, in terms of which the electrostatic solvation energy is defined as [7]
$$\Delta G_{\mathrm{pol}}=\frac12\sum_{i=1}^{M}q_i\,\phi_{\mathrm{re}}(\mathbf r_i). \tag{2.3}$$
Because of the high computational cost of solving the PB equation directly, much effort has been devoted to finding equivalent models of reduced cost. The generalized Born (GB) theory is the most popular substitute for the Poisson equation; it approximates the solution of the Poisson equation with a relatively simple formula. The starting point is the well-known Born formula [28] for the solvation energy of a single ion of charge q and radius R immersed in a solvent with dielectric constant ε,
$$\Delta G=-\frac{q^{2}}{2R}\left(1-\frac{1}{\varepsilon}\right), \tag{2.4}$$
which can be obtained analytically from the Poisson equation. The concept of the atomic radius R in the Born model is extended to polyatomic molecules, in which, for computational efficiency, it is assumed that the solvation energy is represented as a sum over the pairs of all atoms [18, 29, 30]
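$$\Delta G_{\mathrm{pol}}=-\frac12\left(1-\frac{1}{\varepsilon}\right)\sum_{i,j}\frac{q_iq_j}{f_{\mathrm{GB}}(r_{ij},R_i,R_j)}. \tag{2.5}$$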
Here fGB is a function of the distance rij between atoms i and j and of their 'effective Born radii' Ri and Rj. To be consistent with the Born result for one particle and for two particles at a large separation, fGB must reduce to Ri as rij → 0 (the self term, i = j) and to rij as rij → ∞. In this paper, we use the expression of Still et al. [18], which satisfies both conditions. The resulting solvation energy
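$$\Delta G_{\mathrm{pol}}=-\frac12\left(1-\frac{1}{\varepsilon}\right)\sum_{i,j}\frac{q_iq_j}{\sqrt{r_{ij}^{2}+R_iR_j\exp\!\left(-r_{ij}^{2}/(4R_iR_j)\right)}} \tag{2.6}$$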
contains atomic positions, atomic charges and effective Born radii as input. Still's formula for the GB solvation energy was shown to be very accurate in test calculations for a large number of model solutes [9, 10]. While positions and charges are readily available for a given configuration of the solute molecule, the Born radii have to be calculated separately. The effective Born radius Ri of atom i in a solute molecule is the radius of a sphere, centered at this atom, whose Born solvation energy equals the solvation energy the molecule would have if all its charges except the charge on atom i were switched off. With this definition, Born radii can be computed directly by solving the PB equation numerically for a solute of arbitrary shape. As noted earlier, however, numerical solutions entail significant computational cost, which makes the direct evaluation of Ri impractical.
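As a minimal sketch of how the pairwise GB energy with Still's interpolation function is evaluated once the Born radii are known, the Python functions below perform the double sum over all atom pairs, including the self terms i = j. They assume units in which the Coulomb constant is 1, as in the Born formula above; the function names are illustrative and not taken from any package.

```python
import math

def f_gb_still(r_ij, R_i, R_j):
    """Still et al. interpolation: f_GB = sqrt(r^2 + R_i R_j exp(-r^2 / (4 R_i R_j)))."""
    return math.sqrt(r_ij**2 + R_i * R_j * math.exp(-r_ij**2 / (4.0 * R_i * R_j)))

def gb_energy(positions, charges, radii, eps_solvent=80.0):
    """Polar solvation energy as a double sum over all pairs (i, j), self terms included.

    Units: Coulomb constant = 1 (assumption), interior dielectric = 1.
    """
    total = 0.0
    n = len(charges)
    for i in range(n):
        for j in range(n):
            r_ij = math.dist(positions[i], positions[j])   # r_ii = 0 gives f_GB = R_i
            total += charges[i] * charges[j] / f_gb_still(r_ij, radii[i], radii[j])
    return -0.5 * (1.0 - 1.0 / eps_solvent) * total
```

For a single ion the sum reduces to the Born formula, −q²/(2R)(1 − 1/ε), and for two distant ions the cross terms approach the screened Coulomb interaction, which are the two limits f_GB is designed to reproduce.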
Much effort has recently been invested in developing approximate analytical formulations for Born radii that avoid solving the PB equation directly [10, 18, 21, 31]. Historically, the first such formulation approximated the electric displacement created by charge i inside a solute molecule of arbitrary shape by that of a point charge in a homogeneous medium, i.e., as purely Coulombic in form [7, 15]. Accordingly termed the Coulomb-field approximation (CFA), this GB formulation gives a simple expression for the Born radii as a volume integral
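$$\frac{1}{R_i}=\frac{1}{4\pi}\int_{\Omega_{\mathrm{ex}}}\frac{d\mathbf r}{|\mathbf r-\mathbf r_i|^{4}}, \tag{2.7}$$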
where Ωex denotes the exterior domain outside the molecule, and ri is the location of atom i within the molecule. The CFA is known to overestimate the Born radii of atoms, especially those located near the surface of the solute [9, 21], and much better models have been introduced recently [9, 10]. Nevertheless, we use this approximation in the present work, as it is the most widely accepted in the literature and is available in software packages [8]. The main purpose of this paper is to demonstrate that an algorithm based on the FFT can be adapted for use in the GB theory of solvation. Once a methodology is available to compute Born radii within the CFA, extending it to more accurate approximations is a technical matter. Therefore, our starting point here is Eq. (2.7).
Typically, the integral over the exterior of a macromolecule in Eq. (2.7) is rewritten as an integral over the interior domain Ωin excluding a small sphere Si with a radius ai centered at the position of the charge (see Bashford and Case [7]),
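$$\frac{1}{R_i}=\frac{1}{a_i}-\frac{1}{4\pi}\int_{\Omega_{\mathrm{in}}\setminus S_i}\frac{d\mathbf r}{|\mathbf r-\mathbf r_i|^{4}}, \tag{2.8}$$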
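where we have used the closed-form value of the integral over the whole space outside the sphere, $\frac{1}{4\pi}\int_{\mathbb R^3\setminus S_i}|\mathbf r-\mathbf r_i|^{-4}\,d\mathbf r=\frac{1}{4\pi}\int_{a_i}^{\infty}\frac{4\pi r^{2}}{r^{4}}\,dr=\frac{1}{a_i}$. Because molecules have arbitrary shapes, there is no analytical formula for Ri in general, and a numerical integration or an approximate analytical formula is required.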
3 A new formulation for the generalized Born radius with smooth kernels
In this section, we introduce a new method to calculate the generalized Born radius where the singularity of the kernel around the atom site in (2.8) is replaced by a smoothing function, which will be called “a smoother”. We rewrite Eq.(2.7) in the following form
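$$\frac{1}{R_i}=\frac{1}{4\pi}\int_{\mathbb R^3}G(\mathbf r-\mathbf r_i)\,d\mathbf r-\frac{1}{4\pi}\int_{\Omega_{\mathrm{in}}}G(\mathbf r-\mathbf r_i)\,d\mathbf r, \tag{3.1}$$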
where we assume that the excluded sphere Si, embedded inside the molecule (Fig. 1(a)), has a common radius ai = a for every atom i. G is a smoothed version of the function 1/r⁴ inside the excluded sphere Si, i.e.,
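$$G(\mathbf r)=\begin{cases}\dfrac{1}{|\mathbf r|^{4}}, & |\mathbf r|\ge a,\\[4pt] g_n(|\mathbf r|), & |\mathbf r|<a,\end{cases} \tag{3.2}$$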
where the smoother gn(r) makes G(r) Cn-continuous (continuous together with its first n derivatives) at r = a. For example,
(3.3) |
(3.4) |
(3.5) |
Note that a larger n leads to a faster decay of the spectrum of G(r) in the Fourier frequency domain; this fast decay is an important factor in the efficiency of the proposed FFT method for calculating the Born radius. The first integral on the right-hand side of (3.1) can be calculated analytically as
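$$\int_{\mathbb R^3}G(\mathbf r-\mathbf r_i)\,d\mathbf r=\int_{|\mathbf r|\ge a}\frac{d\mathbf r}{|\mathbf r|^{4}}+\int_{|\mathbf r|<a}g_n(|\mathbf r|)\,d\mathbf r. \tag{3.6}$$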
In Eq. (3.6), the first term on the right is the integral of 1/r⁴ over |r| ≥ a, which equals 4π/a, while the second term is the integral of the smoother inside Si, a constant that depends only on a and on the choice of smoother (n = 1, 2 or 3).
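The smoothers (3.3) to (3.5) are not reproduced here. As a hypothetical stand-in that satisfies the same Cn requirement, one can take an even polynomial g(r) = Σ c_k r^(2k), k = 0..n, and solve for the coefficients so that g and its first n derivatives match r⁻⁴ at r = a; its ball integral then gives the second term of (3.6). This is a sketch under that assumption, not the authors' choice of smoother.

```python
from math import factorial
import numpy as np

def smoother_coefficients(n, a):
    """Coefficients c_0..c_n of g(r) = sum_k c_k r^(2k) whose value and first n
    derivatives match those of r^-4 at r = a (hypothetical C^n smoother)."""
    A = np.zeros((n + 1, n + 1))
    rhs = np.zeros(n + 1)
    for m in range(n + 1):                          # condition on the m-th derivative
        # d^m/dr^m r^-4 = (-1)^m (m+3)!/3! r^(-4-m), evaluated at r = a
        rhs[m] = (-1) ** m * factorial(m + 3) / 6.0 * a ** (-4 - m)
        for k in range(n + 1):
            p = 2 * k                               # exponent of the k-th monomial
            if p >= m:
                A[m, k] = factorial(p) / factorial(p - m) * a ** (p - m)
    return np.linalg.solve(A, rhs)

def smoother_value(c, r):
    """Evaluate g(r) = sum_k c_k r^(2k)."""
    return sum(ck * r ** (2 * k) for k, ck in enumerate(c))

def smoother_ball_integral(c, a):
    """(1/4pi) * integral of g over |r| < a, i.e. sum_k c_k a^(2k+3) / (2k+3)."""
    return sum(ck * a ** (2 * k + 3) / (2 * k + 3) for k, ck in enumerate(c))
```

For n = 1 this gives g(r) = (3a² − 2r²)/a⁶, whose ball integral is 4π · 3/(5a), so that, for this particular smoother, the whole-space integral in (3.6) equals (4π/a)(1 + 3/5).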
If the smoother radius a is taken as the atomic radius, for instance the van der Waals radius used in the literature, then the sphere Si lies completely inside Ωin. In fact, the radius a can be chosen arbitrarily. If the sphere Si is not completely inside Ωin, as illustrated in Fig. 1(b), Eq. (2.7) is rewritten as
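$$\frac{1}{R_i}=\frac{1}{4\pi}\int_{\mathbb R^3}G(\mathbf r-\mathbf r_i)\,d\mathbf r-\frac{1}{4\pi}\int_{\Omega_{\mathrm{in}}}G(\mathbf r-\mathbf r_i)\,d\mathbf r+\frac{1}{4\pi}\int_{A_i}\left(\frac{1}{|\mathbf r-\mathbf r_i|^{4}}-G(\mathbf r-\mathbf r_i)\right)d\mathbf r, \tag{3.7}$$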
in which Ai is the portion of Si outside Ωin. Since the center of Si is inside Ωin, the integration over the region Ai is not singular, and can be calculated by a numerical quadrature or by an approximate analytical formula to be discussed below.
3.1 Analytical formula for the integration over Ai
We consider the integral in Ai for Eq. (3.7), which can be written in the form of a local spherical coordinate system with the origin located at ri as
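$$\int_{A_i}\left(\frac{1}{r^{4}}-G(r)\right)d\mathbf r=\int_0^{a}\!\int_0^{\pi}\!\int_0^{2\pi}\chi_{A_i}(r,\theta,\varphi)\left(\frac{1}{r^{4}}-g_n(r)\right)r^{2}\sin\theta\,d\varphi\,d\theta\,dr, \tag{3.8}$$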
where χAi = 1 when r is in Ai and zero elsewhere. If we assume that the sphere Si is much smaller than the molecule, so that the portion of the molecular surface inside Si can be approximated by a plane, then Ai is a spherical cap of Si. This assumption reduces the three-dimensional integral to a one-dimensional one
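$$\int_{A_i}\left(\frac{1}{r^{4}}-G(r)\right)d\mathbf r=2\pi\int_s^{a}\big(1-\cos\theta_r\big)\left(\frac{1}{r^{4}}-g_n(r)\right)r^{2}\,dr, \tag{3.9}$$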
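where s (s < a) is the shortest distance from ri to the plane approximating the portion of the molecular surface inside Si, and cos θr = s/r, θr being the polar half-angle of the cap cut by that plane from the sphere of radius r. We have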
(3.10) |
and
(3.11) |
(3.12) |
(3.13) |
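Under the planar-cap assumption, the correction term in (3.7) is the one-dimensional integral 2π ∫ₛᵃ (1 − s/r)(r⁻⁴ − g(r)) r² dr, because the cap cut by a plane at distance s from the center of a sphere of radius r has solid angle 2π(1 − s/r). The sketch below evaluates it with composite Simpson quadrature for any smoother g supplied by the caller; the r⁻⁴ part alone integrates in closed form to (a − s)²/(4sa²) after division by 4π (our own derivation), which serves as a check. The C¹ polynomial used in the example is a hypothetical smoother, not one of (3.3) to (3.5).

```python
import numpy as np

def simpson(y, x):
    """Composite Simpson rule on an odd number of equally spaced samples."""
    h = x[1] - x[0]
    return h / 3.0 * (y[0] + y[-1] + 4.0 * y[1:-1:2].sum() + 2.0 * y[2:-1:2].sum())

def cap_correction(s, a, g, num=2001):
    """(1/4pi) * integral over the planar cap A_i of (r^-4 - g(r)), for 0 < s < a.

    Uses 2pi * int_s^a (1 - s/r) (r^-4 - g(r)) r^2 dr; num must be odd for Simpson."""
    r = np.linspace(s, a, num)
    integrand = 2.0 * np.pi * (1.0 - s / r) * (r**-4 - g(r)) * r**2
    return simpson(integrand, r) / (4.0 * np.pi)

def c1_smoother(a):
    """Hypothetical C^1 smoother g(r) = (3a^2 - 2r^2)/a^6, matching r^-4 and its slope at a."""
    return lambda r: (3.0 * a**2 - 2.0 * r**2) / a**6
```

Because r⁻⁴ exceeds this smoother everywhere inside the sphere, the correction with the smoother subtracted is positive but smaller than the pure r⁻⁴ value.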
4 An FFT-based Algorithm for the Born Radii
In this section, we present a new FFT-based algorithm to calculate the GB radii. The main tool will be the FFT for the evaluation of the second integral on right hand sides of Eqs. (3.1) and (3.7), which takes on the form,
$$\Phi(\mathbf r)=\int_{\Omega_{\mathrm{in}}}G(\mathbf r'-\mathbf r)\,d\mathbf r', \tag{4.1}$$
Once Φ(r) is calculated on the lattice points, the value Φ(ri) for the ith atom, which in general lies off the lattice, can be obtained by simple interpolation from the data on the lattice sites surrounding the atom. In order to use the FFT, we define the indicator function of the molecular volume Ωin
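$$f(\mathbf r)=\begin{cases}1, & \mathbf r\in\Omega_{\mathrm{in}},\\ 0, & \text{otherwise},\end{cases} \tag{4.2}$$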
then, the integral in (4.1) can be extended to the full space as
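$$\Phi(\mathbf r)=\int_{\mathbb R^3}f(\mathbf r')\,G(\mathbf r-\mathbf r')\,d\mathbf r'=(f*G)(\mathbf r), \tag{4.3}$$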
where we have used that G is even, G(−r) = G(r). This is a convolution, well suited to evaluation by the Fourier transform, since $\widehat{f*G}=\hat f\,\hat G$. We will show below how the transform can be implemented by the discrete fast Fourier transform (FFT) in the 1D case (the 3D case is handled dimension by dimension).
The proposed FFT-based method gives Φ(rijk) on the lattice sites rijk = (xi, yj, zk), 0 ≤ i, j, k ≤ N, at a cost of O(N³ log N). Then Φ(rα) for the α-th atom, located off the lattice, can be obtained by interpolation from Φ(rijk) at a cost of O(M) for M atoms (about 8M operations for a stencil of eight grid points, as in linear interpolation). In this paper, we use a weighted average over the eight surrounding grid points, with the inverse squared distance as the weight, which gives better accuracy than linear interpolation, namely
$$\Phi(\mathbf r_\alpha)\approx\frac{\sum_{p=1}^{8}w_p\,\Phi(\mathbf r_p)}{\sum_{p=1}^{8}w_p}, \tag{4.4}$$
where
$$w_p=\frac{1}{|\mathbf r_\alpha-\mathbf r_p|^{2}+\delta},\qquad p=1,\dots,8, \tag{4.5}$$
rp (p = 1, …, 8) are the lattice points surrounding rα, and δ is a small positive number that avoids division by zero. Here N is independent of M and depends only on the shape of the molecule Ωin, i.e., the lattice should be fine enough to resolve the boundary of the molecule to a prescribed accuracy. Therefore, the total complexity is O(N³ log N + M).
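The eight-point inverse-square-distance average can be sketched as follows. The sketch assumes a uniform lattice with spacing h whose node (i, j, k) sits at origin + h·(i, j, k), and a query point strictly inside the lattice so that all eight corners of its enclosing cell exist; the function name is ours.

```python
import numpy as np

def interpolate_inverse_square(phi, origin, h, point, delta=1e-12):
    """Weighted average of phi over the 8 lattice nodes of the cell containing `point`,
    with weights 1 / (|point - node|^2 + delta).

    phi[i, j, k] is the value at origin + h * (i, j, k); `point` must lie strictly
    inside the lattice (assumption) so that all 8 surrounding nodes exist."""
    pt = np.asarray(point, dtype=float)
    origin = np.asarray(origin, dtype=float)
    base = np.floor((pt - origin) / h).astype(int)     # lower corner of the cell
    num, den = 0.0, 0.0
    for offset in np.ndindex(2, 2, 2):                 # the 8 corners of the cell
        idx = base + np.array(offset)
        node = origin + h * idx
        w = 1.0 / (np.sum((pt - node) ** 2) + delta)
        num += w * phi[tuple(idx)]
        den += w
    return num / den
```

Because the weights are normalized, a constant field is reproduced exactly, and when the query point coincides with a node the 1/δ weight makes the result collapse onto that node's value.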
4.1 Decay conditions for the smoother's spectrum
The decay conditions of the smoother G and indicator function f in the Fourier frequency space will affect the cost of calculating the convolution (4.3) by the Fast Fourier transforms. Let us consider the 1D analog of (4.1) for the evaluation of
$$\Phi(x)=\int_{V}G(x-y)\,dy \tag{4.6}$$
for x ∈ V = (−b, b) and
$$G(x)=\begin{cases}x^{-4}, & |x|\ge a,\\ g_n(|x|), & |x|<a.\end{cases} \tag{4.7}$$
Let f(x) be the indicator function of V, defined as in (4.2); then the one-dimensional convolution corresponding to (4.1) is
$$\Phi(x)=\int_{-\infty}^{\infty}f(y)\,G(x-y)\,dy. \tag{4.8}$$
Applying the Fourier transform, we have
$$\hat\Phi(\xi)=\hat f(\xi)\,\hat G(\xi), \tag{4.9}$$
with
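$$\hat f(\xi)=\int_{-\infty}^{\infty}f(x)\,e^{-ix\xi}\,dx, \tag{4.10}$$
$$\hat G(\xi)=\int_{-\infty}^{\infty}G(x)\,e^{-ix\xi}\,dx, \tag{4.11}$$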
and, then, using the inverse Fourier transform, we have
$$\Phi(x)=\frac{1}{2\pi}\int_{-\infty}^{\infty}\hat f(\xi)\,\hat G(\xi)\,e^{ix\xi}\,d\xi. \tag{4.12}$$
Because f(x) is discontinuous at x = ±b, and G(x) is Cn-continuous at x = ±a with the smoother gn, the transforms $\hat f$ and $\hat G$ satisfy the decay conditions
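$$|\hat f(\xi)|=O\!\left(|\xi|^{-1}\right), \tag{4.13}$$
$$|\hat G(\xi)|=O\!\left(|\xi|^{-(n+2)}\right),\qquad |\xi|\to\infty. \tag{4.14}$$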
If we further smooth the indicator function f(x) near x = ±b, for example by replacing its jump with an arctan profile of steepness λ, where λ is a large real number, a faster decay of $\hat f$ can also be obtained.
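Decay rates of this kind follow from a standard integration-by-parts argument. A short sketch for the 1D case (our own derivation, assuming that G^(n+1) is piecewise smooth with jumps only at ±a and that all derivatives of G decay at infinity):

$$\hat f(\xi)=\int_{-b}^{b}e^{-ix\xi}\,dx=\frac{2\sin(b\xi)}{\xi},\qquad |\hat f(\xi)|\le\frac{2}{|\xi|},$$

$$\hat G(\xi)=\frac{1}{(i\xi)^{n+1}}\int_{-\infty}^{\infty}G^{(n+1)}(x)\,e^{-ix\xi}\,dx=\frac{1}{(i\xi)^{n+2}}\Big(\sum_{x_0=\pm a}\big[G^{(n+1)}\big]_{x_0}e^{-ix_0\xi}+\int_{-\infty}^{\infty}G^{(n+2)}(x)\,e^{-ix\xi}\,dx\Big)=O\big(|\xi|^{-(n+2)}\big),$$

where $[h]_{x_0}=h(x_0^+)-h(x_0^-)$ is the jump at $x_0$. The first n + 1 integrations by parts produce no boundary terms because G, G′, …, G^(n) are continuous; the jumps of G^(n+1) at ±a are what limit the decay.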
4.2 Calculation of Φ(xj) in one dimensional case
We will follow two steps in the calculation of (4.8).
Step 1: Using the decay condition of the Fourier transform $\hat G$ of the smoothed kernel, the integral in (4.12) can be truncated to a finite interval [−Ωπ, Ωπ], and the resulting integral is approximated by an N-point quadrature rule using the values $\hat f(\xi_k)\hat G(\xi_k)$.
Step 2: Each $\hat f(\xi_k)$ and $\hat G(\xi_k)$, k = 0, …, N − 1, defined by (4.10) and (4.11), involves only an integral over a finite interval, owing to the compact support of f(x) and the rapid decay of G(x), and is approximated by another N-point quadrature.
Both steps can be implemented by the FFT, which evaluates the following two transforms between data uj and discrete Fourier coefficients ûk with O(N log N) complexity:
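$$\hat u_k=\sum_{j=0}^{N-1}u_j\,e^{-2\pi ijk/N},\qquad k=0,\dots,N-1, \tag{4.15}$$
$$u_j=\frac{1}{N}\sum_{k=0}^{N-1}\hat u_k\,e^{2\pi ijk/N},\qquad j=0,\dots,N-1. \tag{4.16}$$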
Let ε be an error tolerance of the whole algorithm, against which we truncate the integral over ξ ∈ (−∞, +∞), i.e.,
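$$\Phi(x)\approx\frac{1}{2\pi}\int_{-\Omega\pi}^{\Omega\pi}\hat f(\xi)\,\hat G(\xi)\,e^{ix\xi}\,d\xi, \tag{4.17}$$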
with the truncation parameter Ω defined as follows, based on the decay conditions (4.13) and (4.14):
(4.18) |
An N-point rectangle quadrature rule applied to the integral in (4.17) yields
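$$\Phi(x)\approx\frac{\Delta\xi}{2\pi}\sum_{k=0}^{N-1}\hat f(\xi_k)\,\hat G(\xi_k)\,e^{ix\xi_k}, \tag{4.19}$$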
where ξk = −Ωπ + kΔξ and Δξ = 2Ωπ/N, and N is selected based on the Shannon sampling rate of $e^{ix\xi}$, x ∈ [−L, L], in the ξ variable (i.e., Δξ ≤ π/L), which gives
$$N=2\Omega L. \tag{4.20}$$
Remark 1 In principle, the selection of N should also depend on the oscillatory behavior of the spectral function $\hat f(\xi)\hat G(\xi)$. If a larger N is needed to resolve the oscillations in $\hat f\hat G$, this can be achieved by increasing the size of L.
4.2.1 Computing Φ(xj)
Next, we calculate the value of Φ(xj) at N points inside the interval [−L, L]. Again, N is chosen from the Shannon sampling rate of the functions e^{±iΩπx} in the x variable, which again gives
$$N=2\Omega L. \tag{4.21}$$
Let
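$$x_j=-L+j\,\Delta x,\quad \Delta x=\frac{2L}{N}=\frac{1}{\Omega},\qquad \tilde F_k=(-1)^k\,\hat f(\xi_k)\,\hat G(\xi_k), \tag{4.22}$$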
then
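$$\Phi(x_j)\approx\frac{(-1)^j\,e^{i\pi N/2}}{2L}\sum_{k=0}^{N-1}\tilde F_k\,e^{2\pi ijk/N},\qquad j=0,\dots,N-1, \tag{4.23}$$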
which can be evaluated by one FFT at a cost of O(N log N).
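The phase factors that turn the rectangle-rule sums into single FFTs come from x_jξ_k = πN/2 − π(j + k) + 2πjk/N on the grids x_j = −L + j/Ω and ξ_k = −Ωπ + kπ/L with N = 2ΩL. The NumPy sketch below checks this bookkeeping numerically by comparing the direct O(N²) sums with their one-FFT counterparts, for both the inverse sum over k and the forward sum over j. It assumes that 2ΩL is an integer; the function names are ours.

```python
import numpy as np

def grids(L, Omega):
    """x_j = -L + j/Omega and xi_k = -Omega*pi + k*pi/L with N = 2*Omega*L (assumed integer)."""
    N = int(round(2 * Omega * L))
    j = np.arange(N)
    return N, -L + j / Omega, -Omega * np.pi + j * np.pi / L

def phi_direct(F, x, xi, L):
    """Rectangle rule (dxi / 2pi) * sum_k F_k exp(i x_j xi_k), with dxi = pi / L."""
    return np.exp(1j * np.outer(x, xi)) @ F / (2.0 * L)

def phi_fft(F, L):
    """Same sums with one inverse FFT (numpy's ifft includes the 1/N factor)."""
    N = F.size
    sign = (-1.0) ** np.arange(N)
    return np.exp(1j * np.pi * N / 2) * sign * N * np.fft.ifft(sign * F) / (2.0 * L)

def fhat_direct(fvals, x, xi, Omega):
    """Rectangle rule dx * sum_j f(x_j) exp(-i x_j xi_k), with dx = 1 / Omega."""
    return np.exp(-1j * np.outer(xi, x)) @ fvals / Omega

def fhat_fft(fvals, Omega):
    """Same sums with one forward FFT."""
    N = fvals.size
    sign = (-1.0) ** np.arange(N)
    return np.exp(-1j * np.pi * N / 2) * sign * np.fft.fft(sign * fvals) / Omega
```

The identity holds for odd as well as even N; only the global phase e^{±iπN/2} changes.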
4.2.2 Calculation of $\hat f(\xi_k)$
As supp(f) ⊂ [−L, L], we have
$$\hat f(\xi_k)=\int_{-L}^{L}f(x)\,e^{-ix\xi_k}\,dx, \tag{4.24}$$
which will also be approximated by an N−point rectangle quadrature rule
$$\hat f(\xi_k)\approx\Delta x\sum_{j=0}^{N-1}f(x_j)\,e^{-ix_j\xi_k}, \tag{4.25}$$
where xj = −L + jΔx and Δx = 2L/N.
As N = 2ΩL, we have, for k = 0, …, N − 1,
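$$\hat f(\xi_k)\approx\frac{(-1)^k\,e^{-i\pi N/2}}{\Omega}\sum_{j=0}^{N-1}(-1)^j f(x_j)\,e^{-2\pi ijk/N}, \tag{4.26}$$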
which can be evaluated by one FFT at a cost of O(N log N).
The values $\hat G(\xi_k)$ can be obtained from analytical formulae, as shown later.
4.2.3 Algorithm I - 1D case
The following steps form the flow of the algorithm in the 1-D case.
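The 1-D computation can be sketched compactly in NumPy. As a simplification (our assumption, not the paper's Algorithm I), Ĝ is obtained here from an FFT of G sampled on the lattice instead of from an analytical formula, and G uses the hypothetical C¹ smoother g(x) = (3a² − 2x²)/a⁶. For this smoother the exact center value Φ(0) = ∫₋ᵦᵇ G(y) dy = 2[8/(3a³) − 1/(3b³)] (valid for b > a) provides a check.

```python
import numpy as np

def G1d(x, a):
    """1-D kernel with a hypothetical C^1 smoother g(x) = (3a^2 - 2x^2)/a^6 for |x| < a."""
    ax = np.abs(x)
    return np.where(ax < a, (3.0 * a**2 - 2.0 * ax**2) / a**6, np.maximum(ax, a) ** -4.0)

def phi_1d_fft(b, a, L, N):
    """Phi(x_j) = int_{-b}^{b} G(x_j - y) dy on x_j = -L + j h, via one FFT convolution.

    N must be even and L >= 2b, so that periodic images of f and G do not overlap."""
    h = 2.0 * L / N
    x = -L + h * np.arange(N)
    f = (np.abs(x) <= b).astype(float)       # indicator of V = [-b, b] on the lattice
    g = G1d(np.fft.ifftshift(x), a)          # G at displacements 0, h, ..., -h (FFT order)
    phi = h * np.fft.irfft(np.fft.rfft(f) * np.fft.rfft(g), n=N)
    return x, phi

def phi_center_exact(b, a):
    """Exact int_{-b}^{b} G(y) dy for the smoother above (requires b > a)."""
    return 2.0 * (8.0 / (3.0 * a**3) - 1.0 / (3.0 * b**3))
```

Sampling G directly keeps the sketch short; the analytical Ĝ of Section 4.3.1 avoids the aliasing of the sampled kernel and lets Ω be tied to the decay estimate.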
4.3 Calculation of Φ(xm, yn, zl) in three dimensional case
4.3.1 Analytical expression for $\hat G(\boldsymbol\xi)$
The 3-D Fourier transform can be found analytically. We consider the three-dimensional Fourier transform of G(r) defined by
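$$\hat G(\boldsymbol\xi)=\int_{\mathbb R^3}G(\mathbf r)\,e^{-i\mathbf r\cdot\boldsymbol\xi}\,d\mathbf r, \tag{4.29}$$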
which is a spherically symmetric function of ξ because G(r) is spherically symmetric in the spatial domain. Therefore, the Fourier transform at a radial distance ρ (taking ξ = (0, 0, ρ)) is
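$$\hat G(\rho)=\int_0^{\infty}\!\int_0^{2\pi}\!\int_0^{\pi}G(r)\,e^{-i\rho r\cos\psi}\,r^{2}\sin\psi\,d\psi\,d\theta\,dr, \tag{4.30}$$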
where ρ = |ξ|, and (r, θ, ψ) is the spatial spherical coordinate system with x = r cos θ sin ψ, y = r sin θ sin ψ and z = r cos ψ. Integration in ψ and substitution of the piecewise definition of G(r) yields
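$$\hat G(\rho)=\frac{4\pi}{\rho}\int_0^{a}r\,g_n(r)\sin(\rho r)\,dr+\frac{4\pi}{\rho}\int_a^{\infty}\frac{\sin(\rho r)}{r^{3}}\,dr\equiv I_n+II. \tag{4.31}$$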
The second term II can be integrated to give
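$$II=\frac{4\pi}{a}\left[\frac{\sin\mu}{2\mu}+\frac{\cos\mu}{2}-\frac{\pi\mu}{4}+\frac{\mu^{2}}{2}\,{}_1F_2\!\left(\tfrac12;\tfrac32,\tfrac32;-\tfrac{\mu^{2}}{4}\right)\right], \tag{4.32}$$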
where μ = aρ, and 1F2(α; β, γ; x) is the hypergeometric function
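$${}_1F_2(\alpha;\beta,\gamma;x)=\sum_{k=0}^{\infty}\frac{(\alpha)_k}{(\beta)_k\,(\gamma)_k}\,\frac{x^{k}}{k!},$$
with (α)k = α(α + 1)⋯(α + k − 1), (α)0 = 1, as the rising factorial.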
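The tail term II = (4π/ρ)∫ₐ^∞ sin(ρr)/r³ dr can be written in closed form by integrating by parts twice, II = (4π/a)[sin μ/(2μ) + cos μ/2 − πμ/4 + μ Si(μ)/2] with μ = aρ, and the sine integral is Si(μ) = μ·₁F₂(1/2; 3/2, 3/2; −μ²/4). The sketch evaluates ₁F₂ by its truncated power series, which is adequate in double precision for moderate μ (up to about 20 with 60 terms); for large μ an asymptotic expansion of Si would be preferable. As μ → 0 the bracket tends to 1, recovering ∫_{|r|>a} r⁻⁴ dr = 4π/a.

```python
import numpy as np

def hyp1f2(alpha, beta, gamma, x, terms=60):
    """Truncated series of 1F2(alpha; beta, gamma; x) = sum_k (alpha)_k x^k / ((beta)_k (gamma)_k k!)."""
    term, total = 1.0, 1.0
    for k in range(terms):
        # ratio of consecutive series terms
        term *= (alpha + k) * x / ((beta + k) * (gamma + k) * (k + 1))
        total += term
    return total

def tail_term_II(rho, a):
    """II = (4pi/rho) * int_a^inf sin(rho r)/r^3 dr in closed form, with mu = a*rho > 0."""
    mu = a * rho
    si = mu * hyp1f2(0.5, 1.5, 1.5, -mu**2 / 4.0)        # sine integral Si(mu)
    return 4.0 * np.pi / a * (np.sin(mu) / (2.0 * mu) + np.cos(mu) / 2.0
                              - np.pi * mu / 4.0 + mu * si / 2.0)
```

The linear term −πμ/4 reflects the |ξ|-like cusp of the transform of the r⁻⁴ tail at the origin, which is why extrapolation toward μ = 0 needs care.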
For the first integral in (4.31), we have, for n = 1, 2 and 3,
(4.33) |
(4.34) |
(4.35) |
Figure 2 plots the spectra of the smoothed kernels obtained with different choices of smoother, where Gn denotes the kernel with Cn continuity. We remark that the closed-form expression for In cannot be evaluated directly at μ = 0, since its denominator vanishes to higher order than its numerator there (an indeterminate form). Therefore, In(0) should be computed by extrapolation from small positive μ, for example by a first-order accurate extrapolation.
4.3.2 Algorithm II - 3D case
Let the molecule be contained in a rectangular box [−Lx, Lx] × [−Ly, Ly] × [−Lz, Lz]. If the smallest box containing the molecule is [−dx, dx] × [−dy, dy] × [−dz, dz], then, because the FFT is periodic, the computational box should be chosen such that Lx ≥ 2dx, Ly ≥ 2dy and Lz ≥ 2dz, to avoid overlap between the periodic images of f and G.
The following steps form the flow of the algorithm in the 3-D case.
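- Step 1. For an n-th order smoother in (3.2) and an error tolerance ε > 0, choose the truncation parameter Ω by Eq. (4.36), and set
$$N_x=2\Omega L_x,\qquad N_y=2\Omega L_y,\qquad N_z=2\Omega L_z. \tag{4.37}$$
- Step 2. Compute $\hat f(\boldsymbol\xi_{\mathbf k})$ using one 3-D FFT for the following sums, at a cost of O(NxNyNz log(NxNyNz)):
$$\hat f(\boldsymbol\xi_{\mathbf k})\approx\Delta x\,\Delta y\,\Delta z\sum_{\mathbf j}f(\mathbf r_{\mathbf j})\,e^{-i\,\mathbf r_{\mathbf j}\cdot\boldsymbol\xi_{\mathbf k}}, \tag{4.38}$$
where rj = (−Lx + j1Δx, −Ly + j2Δy, −Lz + j3Δz) with j = (j1, j2, j3), Δx = Δy = Δz = 1/Ω, and ξk = (−Ωπ + k1π/Lx, −Ωπ + k2π/Ly, −Ωπ + k3π/Lz) with k = (k1, k2, k3).
- Step 3. Compute Φ(rj) using one 3-D FFT for the following sums, at a cost of O(NxNyNz log(NxNyNz)):
$$\Phi(\mathbf r_{\mathbf j})\approx\frac{\Delta\xi\,\Delta\eta\,\Delta\zeta}{(2\pi)^{3}}\sum_{\mathbf k}\hat f(\boldsymbol\xi_{\mathbf k})\,\hat G(|\boldsymbol\xi_{\mathbf k}|)\,e^{i\,\mathbf r_{\mathbf j}\cdot\boldsymbol\xi_{\mathbf k}}, \tag{4.39}$$
with Δξ = π/Lx, Δη = π/Ly and Δζ = π/Lz, and $\hat G$ given analytically in Section 4.3.1.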
Remark 2 In the 3-D case, f(x, y, z) is the indicator function of the solute molecule; therefore, Nx, Ny and Nz should be large enough that the boundary of the solute molecule is well resolved on the Nx × Ny × Nz lattice, to ensure a prescribed accuracy in the Fourier transform (4.38).
Remark 3 Only the simplest rectangle quadrature rules are used in (4.38) and (4.39). Higher-order, equally spaced Newton-Cotes formulas could be used instead, and the FFT would still be applicable.
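The 3-D algorithm can likewise be sketched for the unit-sphere test of Section 5.1. This is a simplified illustration under stated assumptions, not the paper's implementation: Ĝ is obtained from an FFT of G sampled on the lattice (instead of the analytical transform of Section 4.3.1); the smoother is the hypothetical C¹ polynomial g(r) = (3a² − 2r²)/a⁶, for which ∫ G over all space equals (4π/a)(1 + 3/5); and 1/R is returned only on lattice points, so it is meaningful only where the sphere Sᵢ lies inside the solute (no cap correction, no interpolation). The exact CFA value for the unit sphere, 1/R = 1/(2(1 − d²)) + ln((1 + d)/(1 − d))/(4d), is included for comparison.

```python
import numpy as np

def kernel_G(r, a):
    """Smoothed kernel: hypothetical C^1 smoother (3a^2 - 2r^2)/a^6 inside r < a, r^-4 outside."""
    inner = (3.0 * a**2 - 2.0 * r**2) / a**6
    return np.where(r < a, inner, np.maximum(r, a) ** -4.0)

def inverse_born_radii_unit_sphere(a=0.2, L=2.0, N=96):
    """1/R on the lattice x_j = -L + j h for a unit-sphere solute, using
    1/R = (1/4pi) [int_{R^3} G - Phi],  Phi = f * G computed with one real FFT pair."""
    h = 2.0 * L / N
    x = -L + h * np.arange(N)
    r2 = x[:, None, None]**2 + x[None, :, None]**2 + x[None, None, :]**2
    f = (r2 < 1.0).astype(float)                       # indicator of the unit sphere
    d = np.fft.ifftshift(x)                            # displacements, zero first
    g = kernel_G(np.sqrt(d[:, None, None]**2 + d[None, :, None]**2 + d[None, None, :]**2), a)
    # Periodic discrete convolution; L >= 2 keeps the periodic images from overlapping.
    phi = h**3 * np.fft.irfftn(np.fft.rfftn(f) * np.fft.rfftn(g), s=f.shape)
    total = 4.0 * np.pi * (1.0 / a + 3.0 / (5.0 * a))  # whole-space integral of G
    return x, (total - phi) / (4.0 * np.pi)

def exact_inverse_radius(dist):
    """Exact CFA 1/R for an atom at distance dist (0 <= dist < 1) from the sphere center."""
    if dist == 0.0:
        return 1.0
    return 1.0 / (2.0 * (1.0 - dist**2)) + np.log((1.0 + dist) / (1.0 - dist)) / (4.0 * dist)
```

Refining the lattice (larger N) should reduce the discretization error, in line with the trends reported for the FFT method in Section 5.1.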
5 Validation of the FFT-based Algorithm
5.1 A model molecule - numerical accuracy
To test the accuracy and speed of the proposed method, we first consider a test example in which a dimensionless unit sphere, Ωin = {r : |r| < 1}, is taken as the interior region of the molecule. A window function is used as the smoother in (4.7). For the spherical geometry, the 3D exterior integral (2.7) can be computed exactly owing to the radial symmetry of the sphere, i.e.,
$$\frac{1}{R_i}=\frac{1}{2\left(1-r_i^{2}\right)}+\frac{1}{4r_i}\ln\frac{1+r_i}{1-r_i}, \tag{5.1}$$
where ri (0 < ri < 1) is the distance between the ith atom and the origin; as ri → 0, the exact value tends to 1/Ri = 1.
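The exact result for the unit sphere can be cross-checked by reducing (2.7) to a radial integral: averaging |r − rᵢ|⁻⁴ over angles gives 1/Rᵢ = ∫₁^∞ r²/(r² − d²)² dr with d = |rᵢ|, and the substitution r = 1/t turns this into the smooth integral ∫₀¹ (1 − d²t²)⁻² dt. The sketch below (our derivation, standard library only) compares the closed form with composite Simpson quadrature of that integral.

```python
import math

def inverse_born_radius_exact(d):
    """Closed-form CFA 1/R for an atom at distance d (0 <= d < 1) from the center of a unit sphere."""
    if d == 0.0:
        return 1.0
    return 1.0 / (2.0 * (1.0 - d * d)) + math.log((1.0 + d) / (1.0 - d)) / (4.0 * d)

def inverse_born_radius_quadrature(d, n=2000):
    """Same quantity from int_0^1 dt / (1 - d^2 t^2)^2 by composite Simpson (n even)."""
    h = 1.0 / n
    F = lambda t: 1.0 / (1.0 - d * d * t * t) ** 2
    s = F(0.0) + F(1.0)
    s += 4.0 * sum(F((2 * k - 1) * h) for k in range(1, n // 2 + 1))   # odd nodes
    s += 2.0 * sum(F(2 * k * h) for k in range(1, n // 2))             # interior even nodes
    return s * h / 3.0
```

Both expressions grow without bound as d → 1, reflecting the divergence of the CFA integral for an atom approaching the molecular surface.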
We test the accuracy of the FFT-based algorithm by computing the relative error in Ri for 100 charges uniformly distributed along a radial direction. For charges whose sphere Si lies partly outside the molecule, the height of the spherical cap Ai in (3.7) and the distance between the charge and the molecular surface can be obtained analytically. In practical calculations, this can be done by defining a level set function C(r) [32, 33], as in the next section, which is a signed distance function (greater than 0 when r is inside the molecule and less than 0 when r is outside). The distance function C(r) on a 3-D grid can be evaluated by a fast sweeping algorithm [34] with complexity O(NxNyNz) for NxNyNz grid points. As the interior region of this model molecule can be parameterized, the exact normal direction to the surface is known, and it is therefore straightforward to compute the distance along that direction for the charges close to the boundary.
We summarize our results in Table 1, which shows the maximum and average errors for different window sizes and grids in the spatial region [−L, L]³ with L = 2. There are several sources of error in the calculation of the Born radius Ri. The first is the error of the numerical quadrature in (4.23) (Step 3 of the FFT-based algorithms). The second comes from interpolating the Born radii of atoms at off-lattice sites from the neighboring grid points. The last results from the treatment of atoms whose small sphere has a portion Ai outside the molecule, for which an approximation of the shape of Ai is used in (3.7). These three sources of error depend on the parameters of our algorithm in a non-trivial way, and we investigate their effects in detail below. For comparison, we also implemented the grid-based GB method [21] on a Cartesian grid over [−1, 1]³, which sums the integrand directly at the centers of the grid boxes for the numerical integration. This method is well known for its high accuracy, especially for solutes with complicated surface geometry.
Table 1. Maximum (max.) and average (ave.) relative errors in Ri for the unit sphere, FFT-based algorithm.

Grid size h | a = 0.25, max. | a = 0.25, ave. | a = 0.2, max. | a = 0.2, ave. | a = 0.15, max. | a = 0.15, ave. | a = 0.1, max. | a = 0.1, ave. |
---|---|---|---|---|---|---|---|---|
0.125 | 7.16% | 1.95% | 9.15% | 2.16% | 11.1% | 2.50% | 12.8% | 3.19% |
0.0625 | 4.18% | 0.86% | 3.77% | 0.82% | 4.20% | 0.95% | 6.15% | 1.34% |
0.03125 | 4.00% | 0.76% | 3.53% | 0.56% | 2.98% | 0.49% | 2.32% | 0.77% |
Tables 1 and 2 contain the numerical errors of our FFT-based method and of the original grid-based method, respectively. At the largest grid spacing, h = 0.125, the error is large for all smoothing window radii a considered: both FFT and grid-based integration yield maximum errors well over 1%. As a increases, the error decreases. In the FFT-based calculations, this decrease is due to the faster spectral decay of the window function at larger a. In the grid-based integration, a small window function cannot be resolved efficiently by a volume integration on a grid of comparable spacing h ∼ a. Note that for the smallest value, a = 0.1, the FFT calculation yields an average error of about 3%, a significant improvement over the 68% average error of the grid-based calculation. Comparing the FFT and grid-based results for h = 0.125, it is clear that the main source of error is the discretization of the spatial integral, and that the error is unacceptably large.
Table 2. Maximum (max.) and average (ave.) relative errors in Ri for the unit sphere, grid-based method [21].

Grid size h | a = 0.25, max. | a = 0.25, ave. | a = 0.2, max. | a = 0.2, ave. | a = 0.15, max. | a = 0.15, ave. | a = 0.1, max. | a = 0.1, ave. |
---|---|---|---|---|---|---|---|---|
0.125 | 3.66% | 1.13% | 5.11% | 1.76% | 35.9% | 11.4% | 125.3% | 68.03% |
0.0625 | 3.82% | 0.61% | 3.22% | 0.63% | 2.57% | 0.74% | 10.37% | 3.31% |
0.03125 | 3.96% | 0.38% | 3.35% | 0.28% | 2.66% | 0.19% | 1.88% | 0.30% |
As the grid spacing h is reduced, the Born radii computed with our algorithm become more accurate. At h = 0.0625, both FFT and grid-based errors tend to decrease with increasing a, echoing the results for h = 0.125. On the finer grid, h = 0.03125, that tendency is reversed for the largest a considered. In the FFT calculation, the average error rises from 0.56% at a = 0.2 to 0.76% at a = 0.25; in the grid-based calculation, it rises similarly from 0.28% to 0.38%. This non-monotonic dependence on a signals that the quadrature error, which dominates at larger h, has become comparable to the error from the treatment of surface atoms at the smaller h = 0.03125. For the surface atoms, the approximation involved in calculating the volume of the spheres overlapping with the solute worsens for larger a; consequently, the total error in Ri grows. From the last rows of Tables 1 and 2 we conclude that, on a sufficiently fine grid, it is possible to find a window size a for which the cumulative error in Ri reaches a minimum. For both the FFT and grid-based algorithms, the minimum average error is well below 1%. It is higher in the FFT algorithm, 0.49%, than in its grid-based counterpart, 0.19%, highlighting the importance of intrinsic numerical errors in the former algorithm that go beyond spatial discretization. Nevertheless, Tables 1 and 2 clearly show that the new algorithm proposed in this paper is sufficiently accurate to yield reliable Born radii for a spherical solute.
To evaluate the cost-efficiency of our FFT algorithm, time-performance tests were carried out in comparison with the grid-based method. The results are summarized in Fig. 3, which shows the CPU times of both methods on a notebook PC with a 1.90 GHz dual-core AMD processor and 2 GB of memory. As expected, the FFT-based algorithm incurs a much lower computational cost. Figure 3 shows that for grid size h = 0.0625, where the average error in Ri is below 1% (see Table 1), the FFT algorithm becomes faster than the grid-based method once the number of atoms in the solute molecule reaches 600. Proteins of moderate size typically contain more than 600 atoms, so for such proteins the new method is advantageous. For a system of about 4000 atoms, the speed-up of our algorithm reaches one order of magnitude, and for larger systems the improvement is even more dramatic. For instance, from approximately 5000 atoms on, FFT calculations on the finer mesh h = 0.03125, with much improved accuracy, become faster than grid-based calculations on the coarser mesh h = 0.0625. We conclude from this analysis that the FFT algorithm performs much better than the grid-based methods and should therefore become the method of choice in GB calculations for large solute molecules.
In the example of a spherical solute, the Born radii are known exactly, so in principle there is no need to evaluate Ri with the grid-based approach; the FFT results could be compared with the exact solutions directly. For an arbitrarily shaped solute, however, the exact solution is not known, and it is the grid-based solution that serves as the reference for the FFT algorithm. It therefore makes sense to compare the FFT and grid-based solutions for a spherical solute, to estimate the level of agreement one can expect between the two methods for an arbitrarily shaped solute. Figure 4 presents such an analysis for the smoothing-function radius a = 0.2. The Born radii plotted in this figure show that the results of the FFT algorithm agree well with those of the direct grid summation, and the agreement improves as the grid spacing h decreases. To characterize the agreement between the two data sets quantitatively, we use the standard correlation coefficient, defined as
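$$\rho = \frac{E\big[(X - E[X])\,(Y - E[Y])\big]}{\sqrt{E\big[(X - E[X])^2\big]\;E\big[(Y - E[Y])^2\big]}} \qquad (5.2)$$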
where E is the mathematical expectation, and X and Y are the FFT and grid-based radii, respectively. The correlation coefficients we observe are very high: ρ = 0.9936, 0.9994 and 0.9999 for h = 0.125, 0.0625 and 0.03125, respectively.
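Eq. (5.2) can be evaluated directly by replacing the expectation E with an average over atoms. The following minimal sketch does this in plain Python; the function name `correlation` is hypothetical, and the 1/n normalization of the averages is an assumption that cancels in the ratio anyway.

```python
import math


def correlation(xs, ys):
    """Correlation coefficient of Eq. (5.2), with each expectation E[.]
    replaced by the arithmetic mean over all atoms."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long samples with at least two entries")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    var_x = sum((x - mean_x) ** 2 for x in xs) / n
    var_y = sum((y - mean_y) ** 2 for y in ys) / n
    if var_x == 0.0 or var_y == 0.0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)


# Usage with hypothetical radii, listed in the same atom order for both methods.
fft_radii = [1.52, 2.10, 3.05, 4.40]
grid_radii = [1.50, 2.12, 3.00, 4.45]
rho = correlation(fft_radii, grid_radii)
```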
5.2 Effective Born Radii for Proteins
A thorough investigation of the performance of the new FFT-based algorithm on a large set of proteins is underway, and its results will be reported in a separate publication. Here we present results for two proteins, the immunoglobulin-binding protein [35] (PDB code 3GB1) and human cyclophilin A [36] (PDB code 1OCA). These two proteins differ significantly in size, containing 56 amino acids (855 atoms) and 165 amino acids (2503 atoms), respectively, and are thus well suited to critically testing the performance of our algorithm. The indicator function that defines the molecular domain is generated from the solvent-accessible surface with a probe radius of 1.5 Å. The fast sweeping method [34], which requires 8N³ operations, is used to compute the distance function from the grid points to the surface. To smooth the step-like discontinuity of the indicator function across the surface, we employ a smoothing procedure based on the time-dependent heat equation [37], with the indicator function as the initial condition, so that 2 to 4 grid points lie across the molecular boundary. We remark that smoothing is also often used in generalized Born models [22] and to avoid numerical instability in finite-difference Poisson-Boltzmann solvers [38]. In our FFT implementation, we used a cubic box of edge length L = 32 Å for protein 3GB1 and L = 40 Å for protein 1OCA. In Figs. 5 and 6, we plot the FFT results for different window sizes a against reference effective Born radii calculated with the macromolecular modeling package CHARMM [39]. In the reference computations, we used the grid-based version of the molecular volume GB formulation (GB/MV) of Brooks and colleagues [21], with a grid spacing of 0.2 Å. Atomic charges and radii were taken from the PARAM22 version of the CHARMM force field [40], and all calculations were done in the Coulomb-field approximation.
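The two preprocessing steps described above, a solvent-accessible indicator function followed by heat-equation smoothing, can be sketched on a small periodic grid as follows. This is an illustration under stated assumptions, not the implementation used here: the indicator is built by testing each grid point directly against every atom radius plus the 1.5 Å probe (cost O(MN³)) instead of through the fast sweeping distance function [34], the smoothing uses explicit forward-Euler steps of u_t = Δu with periodic boundaries, and the step count, the ratio dt/h² = 1/8 and both function names are hypothetical choices.

```python
def sas_indicator(atoms, probe, n, h, origin=(0.0, 0.0, 0.0)):
    """Return a flat list chi with chi = 1.0 at grid points lying within
    r + probe of at least one atom centre (solvent-accessible volume) and
    0.0 elsewhere. atoms: iterable of (x, y, z, r); grid point (i, j, k)
    sits at origin + h * (i, j, k) and is stored at index (i*n + j)*n + k."""
    chi = [0.0] * n ** 3
    for i in range(n):
        x = origin[0] + i * h
        for j in range(n):
            y = origin[1] + j * h
            for k in range(n):
                z = origin[2] + k * h
                for ax, ay, az, r in atoms:
                    if (x - ax) ** 2 + (y - ay) ** 2 + (z - az) ** 2 <= (r + probe) ** 2:
                        chi[(i * n + j) * n + k] = 1.0
                        break
    return chi


def heat_smooth(u, n, steps, lam=1.0 / 8.0):
    """Apply `steps` explicit forward-Euler steps of u_t = Laplacian(u) on a
    periodic n**3 grid. lam = dt / h**2 must not exceed 1/6 for stability;
    then an indicator input stays within [0, 1] and its sum is conserved."""
    if lam > 1.0 / 6.0:
        raise ValueError("lam = dt / h**2 must be <= 1/6")

    def at(i, j, k):
        # Periodic wrap-around index into the flat array.
        return ((i % n) * n + (j % n)) * n + (k % n)

    for _ in range(steps):
        new = u[:]
        for i in range(n):
            for j in range(n):
                for k in range(n):
                    c = u[at(i, j, k)]
                    lap = (u[at(i + 1, j, k)] + u[at(i - 1, j, k)]
                           + u[at(i, j + 1, k)] + u[at(i, j - 1, k)]
                           + u[at(i, j, k + 1)] + u[at(i, j, k - 1)] - 6.0 * c)
                    new[at(i, j, k)] = c + lam * lap
        u = new
    return u


# Usage: one hypothetical atom (radius 1.7 A) in a 12^3 box with h = 1 A.
chi = sas_indicator([(6.0, 6.0, 6.0, 1.7)], probe=1.5, n=12, h=1.0)
smooth = heat_smooth(chi, n=12, steps=3)
```

Each explicit step spreads the jump by roughly one grid spacing, which is why a handful of steps is enough to obtain a transition layer a few grid points wide.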
Overall, the agreement between the Born radii obtained by our FFT-based algorithm and those from CHARMM is good. Some scatter is seen in Figs. 5 and 6, but the two methods clearly produce well-correlated data. As mentioned previously, several factors contribute to the numerical error in Ri, one of them being the approximate treatment of atoms close to the molecular surface. A direct consequence of this approximation is that large window radii a should lead to large errors for surface atoms, which is exactly the trend we observe for both proteins in Figs. 5 and 6. As the window radius a is increased, the atoms with small Ri, that is, those close to the surface, deviate more from the diagonal than the atoms buried inside the protein, whose Born radii are large. The surface atoms therefore place an upper limit on how large a can be while still yielding accurate values of Ri, and the optimal a will depend largely on the geometry of the solute of interest. The correlation between the FFT and CHARMM results is summarized in Tab. 3 for protein 3GB1 and in Tab. 4 for protein 1OCA. For the smaller protein 3GB1 on the fine integration mesh h = 0.5 Å, the correlation coefficient exceeds 0.95 for all but the smallest window radius considered, a = 3 Å, which is comparable to the values reported in the literature for other GB implementations [21]. The correlation is even better for the larger protein 1OCA, where ρ > 0.96 for all values of a tested; the cause of this improvement is not yet clear. Importantly, Tabs. 3 and 4 show that ρ is not very sensitive to a as long as a ≥ 4 Å. This suggests that a value of a between 4 and 6 Å may be acceptable in our planned systematic study of the FFT algorithm on a large set of proteins.
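Table 3. Protein 3GB1: correlation coefficients between the FFT and CHARMM (GB/MV) Born radii, and CPU times of the FFT and grid-based calculations, for grid spacings h = 1 Å and h = 0.5 Å.

| Method | Corr. coef., h = 1 Å | Corr. coef., h = 0.5 Å | CPU time (s), h = 1 Å | CPU time (s), h = 0.5 Å |
|---|---|---|---|---|
| FFT, a = 3.0 Å | 0.945 | 0.931 | 4.64 | 38.78 |
| FFT, a = 4.0 Å | 0.950 | 0.953 | 4.59 | 39.79 |
| FFT, a = 4.5 Å | 0.949 | 0.956 | 4.67 | 38.82 |
| FFT, a = 5.0 Å | 0.948 | 0.957 | 4.68 | 40.01 |
| Grid-based | n/a | n/a | 22.11 | 191.04 |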
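Table 4. Protein 1OCA: correlation coefficients between the FFT and CHARMM (GB/MV) Born radii, and CPU times of the FFT and grid-based calculations, for grid spacings h = 1.25 Å and h = 0.625 Å.

| Method | Corr. coef., h = 1.25 Å | Corr. coef., h = 0.625 Å | CPU time (s), h = 1.25 Å | CPU time (s), h = 0.625 Å |
|---|---|---|---|---|
| FFT, a = 4.0 Å | 0.969 | 0.962 | 4.59 | 39.99 |
| FFT, a = 5.0 Å | 0.974 | 0.973 | 4.57 | 40.04 |
| FFT, a = 5.5 Å | 0.974 | 0.975 | 4.57 | 40.00 |
| FFT, a = 6.0 Å | 0.973 | 0.975 | 4.61 | 40.09 |
| Grid-based | n/a | n/a | 64.13 | 529.73 |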
Finally, to evaluate the time performance of our FFT-based algorithm, we carried out grid-based calculations with the same set of grid parameters for comparison (a timing comparison with CHARMM would not be appropriate, since CHARMM does more than just calculate Ri). The results, presented in Tabs. 3 and 4, show that the FFT approach is about 5 times faster than the grid-based approach for protein 3GB1 and 13 to 14 times faster for protein 1OCA. These speed improvements are in line with those observed for the spherical solute, and they clearly demonstrate the advantage of the proposed FFT-based algorithm for computing Born radii in GB solvation energy calculations of large proteins.
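The speedups quoted above, and the fact that the FFT times in Tabs. 3 and 4 hardly differ between the two proteins, can be checked with a few lines of arithmetic. Both boxes were discretized with the same number of grid points per dimension (N = L/h = 32 or 64), so the O(N³ log N) grid cost is the same for both proteins. The snippet uses the first FFT row of each table together with the grid-based row; the variable names are illustrative only.

```python
import math

# Box edge L and grid spacings h (Angstrom), as quoted in Sec. 5.2.
BOXES = {"3GB1": (32.0, (1.0, 0.5)), "1OCA": (40.0, (1.25, 0.625))}
# CPU times in seconds from Tabs. 3 and 4 (first FFT row, and the grid-based row).
FFT_TIME = {"3GB1": (4.64, 38.78), "1OCA": (4.59, 39.99)}
GRID_TIME = {"3GB1": (22.11, 191.04), "1OCA": (64.13, 529.73)}
# Growth of N^3 log N when N doubles from 32 to 64: 8 * log(64) / log(32) = 9.6.
MODEL_RATIO = (64 ** 3 * math.log(64)) / (32 ** 3 * math.log(32))


def summarize():
    """Return (protein, h, N = L/h, grid-based time / FFT time) per column."""
    rows = []
    for name, (L, spacings) in BOXES.items():
        for col, h in enumerate(spacings):
            speedup = GRID_TIME[name][col] / FFT_TIME[name][col]
            rows.append((name, h, round(L / h), speedup))
    return rows


for name, h, N, speedup in summarize():
    print(f"{name}: h = {h} A, N = {N}, speedup = {speedup:.1f}x")
```

This gives speedups of about 4.8 and 4.9 for 3GB1 and about 14.0 and 13.2 for 1OCA. Doubling N from 32 to 64 multiplies N³ log N by 9.6, while the measured FFT times grow by roughly 8.4 to 8.7; since this ignores the O(M) interpolation and fixed overheads, it is only a rough consistency check.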
6 Conclusions
In this paper, we have proposed a fast FFT-based algorithm to calculate the effective Born radii in the generalized Born model of implicit solvation. The algorithm relies on a new formulation in which the GB radii are defined through a convolution integral with a smooth kernel. Exploiting the fast spectral decay of the smooth kernel in Fourier space and using the FFT to evaluate the convolution integral on a grid, we obtain the GB radii at the grid lattice sites at a cost of O(N³ log N), where N is independent of the number of atoms M in the molecule and depends only on the geometry of the biomolecule and on the spectral decay of the kernel through (4.37) (see also Remark 2). The GB radii at off-grid atom positions are then obtained by simple interpolation at a cost of O(M). The total cost of computing the GB radii of M atoms on an N³ grid is therefore O(N³ log N + M), which yields a significant speed improvement over traditional grid-based methods, as demonstrated by numerical tests on a model spherical solute and on protein molecules.
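The O(N³ log N + M) structure summarized above, a grid convolution evaluated with the FFT followed by interpolation at the atom positions, is illustrated below by a one-dimensional analogue in plain Python. This is a sketch, not the paper's implementation: the smooth kernel of (4.37) is not reproduced, and the Gaussian kernel, the linear interpolation, the periodic box and all function names are assumptions made for illustration.

```python
import cmath
import math


def fft(a, inverse=False):
    """Recursive radix-2 Cooley-Tukey FFT; len(a) must be a power of two.
    The inverse transform is returned without the 1/n normalization."""
    n = len(a)
    if n == 1:
        return list(a)
    if n % 2:
        raise ValueError("length must be a power of two")
    even = fft(a[0::2], inverse)
    odd = fft(a[1::2], inverse)
    sign = 1.0 if inverse else -1.0
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def periodic_convolution(f, g, h):
    """Approximate (f * g)(x_j) = integral f(y) g(x_j - y) dy on a periodic
    grid of spacing h in O(N log N) via the convolution theorem. g[m] holds
    the kernel at offset m*h, with negative offsets wrapped to the end."""
    n = len(f)
    F, G = fft(f), fft(g)
    prod = fft([a * b for a, b in zip(F, G)], inverse=True)
    return [h * z.real / n for z in prod]


def interpolate(values, h, x):
    """Linear interpolation of periodic grid data at an off-grid point x."""
    n = len(values)
    s = x / h
    j = math.floor(s)
    w = s - j
    return (1.0 - w) * values[j % n] + w * values[(j + 1) % n]


# Usage: 1 outside a hypothetical solute slab, convolved with a Gaussian kernel.
n, h = 64, 0.25
solvent = [0.0 if 24 <= j < 40 else 1.0 for j in range(n)]
kernel = [math.exp(-(min(j, n - j) * h) ** 2) for j in range(n)]
field = periodic_convolution(solvent, kernel, h)           # O(N log N) grid step
atom_values = [interpolate(field, h, x) for x in (7.9, 8.3)]  # O(M) step
```

The grid step is done once regardless of the number of atoms, and each atom then costs a constant amount of work, which mirrors the N³ log N + M split in the cost estimate.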
Acknowledgement
The authors would like to thank Prof. Michael Feig for bringing this interesting problem to their attention and for later discussions. The authors also thank Prof. Guowei Wei for sharing his molecular surface code during this work. Financial support for this work was provided by the National Institutes of Health (grant number: 1R01GM083600-01). Z. Xu is also partially supported by the Charlotte Research Institute through a Duke Postdoctoral Fellowship.
Footnotes
Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
References
- 1. Leach AR. Molecular Modelling: Principles and Applications. 2nd Edition. Prentice Hall; 2001.
- 2. Cramer CJ, Truhlar DG. Implicit solvation models: Equilibria, structure, spectra and dynamics. Chem. Rev. 1999;99:2161–2200. doi: 10.1021/cr960149m.
- 3. Roux B. Implicit solvent models. In: Computational Biochemistry and Biophysics. Marcel Dekker; 2001.
- 4. Baker NA. Improving implicit solvent simulations: a Poisson-centric view. Curr. Opin. Struct. Biol. 2005;15:137–143. doi: 10.1016/j.sbi.2005.02.001.
- 5. Koehl P. Electrostatics calculations: latest methodological advances. Curr. Opin. Struct. Biol. 2006;16:142–151. doi: 10.1016/j.sbi.2006.03.001.
- 6. Lu BZ, Zhou YC, Holst MJ, McCammon JA. Recent progress in numerical methods for the Poisson-Boltzmann equation in biophysical applications. Commun. Comput. Phys. 2008;3:973–1009.
- 7. Bashford D, Case DA. Generalized Born models of macromolecular solvation effects. Annu. Rev. Phys. Chem. 2000;51:129–152. doi: 10.1146/annurev.physchem.51.1.129.
- 8. Feig M, Brooks CL III. Recent advances in the development and application of implicit solvent models in biomolecule simulations. Curr. Opin. Struct. Biol. 2004;14:217–224. doi: 10.1016/j.sbi.2004.03.009.
- 9. Grycuk T. Deficiency of the Coulomb-field approximation in the generalized Born model: An improved formula for Born radii evaluation. J. Chem. Phys. 2003;119:4817–4826.
- 10. Mongan J, Svrcek-Seiler WA, Onufriev A. Analysis of integral expressions for effective Born radii. J. Chem. Phys. 2007;127:185101-1–185101-9. doi: 10.1063/1.2783847.
- 11. Onufriev A, Case DA, Bashford D. Effective Born radii in the generalized Born approximation: The importance of being perfect. J. Comput. Chem. 2002;23:1297–1304. doi: 10.1002/jcc.10126.
- 12. Feig M, Onufriev A, Lee MS, Im W, Case DA, Brooks CL III. Performance comparison of generalized Born and Poisson methods in calculation of electrostatic solvation energies for protein structures. J. Comput. Chem. 2004;25:265–284. doi: 10.1002/jcc.10378.
- 13. Hawkins GD, Cramer CJ, Truhlar DG. Pairwise solute descreening of solute charges from a dielectric medium. Chem. Phys. Lett. 1995;246:122–129.
- 14. Schaefer M, Froemmel C. A precise analytical method for calculating the electrostatic energy of macromolecules in aqueous solution. J. Mol. Biol. 1990;216(4):1045–1066. doi: 10.1016/S0022-2836(99)80019-9.
- 15. Qiu D, Shenkin PS, Hollinger FP, Still WC. The GB/SA continuum model for solvation. A fast analytical method for the calculation of approximate Born radii. J. Phys. Chem. A. 1997;101(16):3005–3014.
- 16. Dominy BN, Brooks CL III. Development of a generalized Born model parametrization for proteins and nucleic acids. J. Phys. Chem. B. 1999;103(18):3765–3773.
- 17. Gallicchio E, Levy RM. AGBNP: An analytic implicit solvent model suitable for molecular dynamics simulations and high-resolution modeling. J. Comput. Chem. 2004;25:479–499. doi: 10.1002/jcc.10400.
- 18. Still WC, Tempczyk A, Hawley RC, Hendrickson T. Semianalytical treatment of solvation for molecular mechanics and dynamics. J. Am. Chem. Soc. 1990;112:6127.
- 19. Scarsi M, Apostolakis J, Caflisch A. Continuum electrostatic energies of macromolecules in aqueous solutions. J. Phys. Chem. A. 1997;101:8098–8106.
- 20. Ghosh A, Rapp CS, Friesner RA. Generalized Born model based on a surface integral formulation. J. Phys. Chem. B. 1998;102(52):10983–10990.
- 21. Lee MS, Salsbury FR, Brooks CL III. Novel generalized Born methods. J. Chem. Phys. 2002;116(24):10606–10614.
- 22. Im W, Lee MS, Brooks CL III. Generalized Born model with a simple smoothing function. J. Comput. Chem. 2003;24:1691–1702. doi: 10.1002/jcc.10321.
- 23. Rocchia W, Sridharan S, Nicholls A, Alexov E, Chiabrera A, Honig B. Rapid grid-based construction of the molecular surface and the use of induced surface charge to calculate reaction field energies: Applications to the molecular systems and geometric objects. J. Comput. Chem. 2002;23(1):128–137. doi: 10.1002/jcc.1161.
- 24. Baker NA, Holst MJ, Wang F. Adaptive multilevel finite element solution of the Poisson-Boltzmann equation II. Refinement at solvent-accessible surfaces in biomolecular systems. J. Comput. Chem. 2000;21(15):1343–1352.
- 25. Zhou HX. Boundary-element solution of macromolecular electrostatics - interaction energy between 2 proteins. Biophysical J. 1993;65(2):955–963. doi: 10.1016/S0006-3495(93)81094-4.
- 26. Juffer AH, Botta EFF, Vankeulen BAM, Vanderploeg A, Berendsen HJC. The electric-potential of a macromolecule in a solvent - a fundamental approach. J. Comput. Phys. 1991;97(1):144–171.
- 27. Lu BZ, Cheng XL, Huang JF, McCammon JA. Order N algorithm for computation of electrostatic interactions in biomolecular systems. Proc. Natl. Acad. Sci. USA. 2006;103(51):19314–19319. doi: 10.1073/pnas.0605166103.
- 28. Born M. Volumes and heats of hydration of ions. Z. Phys. 1920;1:45–48.
- 29. Constanciel T, Contreras R. Self-consistent field theory of solvent effects representation by continuum models: Introduction of desolvation contribution. Theoret. Chim. Acta. 1984;65:1–11.
- 30. Contreras R, Aizman A. On the SCF theory of continuum solvent effects representation: Introduction of local dielectric effects. Int. J. Quant. Chem. 1985;27:293–301.
- 31. Grant JA, Pickup BT, Sykes MJ, Kitchen CA, Nicholls A. The Gaussian generalized Born model: application to small molecules. Phys. Chem. Chem. Phys. 2007;9:4913–4922. doi: 10.1039/b707574j.
- 32. Sethian JA. Level Set Methods and Fast Marching Methods. Cambridge University Press; 1999.
- 33. Osher S, Fedkiw R. Level Set Methods and Dynamic Implicit Surfaces. Springer; 2003.
- 34. Tsai YHR, Cheng LT, Osher S, Zhao HK. Fast sweeping algorithms for a class of Hamilton-Jacobi equations. SIAM J. Numer. Anal. 2003;41(2):673–694.
- 35. Kuszewski J, Gronenborn AM, Clore GM. Improving the packing and accuracy of NMR structures with a pseudopotential for the radius of gyration. J. Am. Chem. Soc. 1999;121:2337–2338.
- 36. Ottiger M, Zerbe O, Guntert P, Wuthrich K. The NMR solution conformation of unligated human cyclophilin A. J. Mol. Biol. 1997;272:64–81. doi: 10.1006/jmbi.1997.1220.
- 37. Gustafsson B, Kreiss H-O, Oliger J. Time Dependent Problems and Difference Methods. John Wiley & Sons, Inc.; 1995.
- 38. Im W, Beglov D, Roux B. Continuum solvation model: computation of electrostatic forces from numerical solutions to the Poisson-Boltzmann equation. Comput. Phys. Commun. 1998;111:59–75.
- 39. Brooks BR, Bruccoleri R, Olafson B, States D, Swaminathan S, Karplus M. CHARMM: a program for macromolecular energy minimization and dynamics calculations. J. Comput. Chem. 1983;4:187–200.
- 40. MacKerell AD Jr, Bashford D, Bellott M, Dunbrack RL Jr, Evanseck JD, Field MJ, Fischer S, Gao J, Guo H, Ha S, Joseph-McCarthy D, Kuchnir L, Kuczera K, Lau FTK, Mattos C, Michnick S, Ngo T, Nguyen DT, Prodhom B, Reiher WE III, Roux B, Schlenkrich M, Smith JC, Stote R, Straub J, Watanabe M, Wiórkiewicz-Kuczera J, Yin D, Karplus M. All-atom empirical potential for molecular modeling and dynamics studies of proteins. J. Phys. Chem. B. 1998;102:3586–3616. doi: 10.1021/jp973084f.