Abstract
Geometry plays a major role in our attempt to understand the activity of large molecules. For example, surface area and volume are used to quantify the interactions between these molecules and the water surrounding them in implicit solvent models. In addition, the detection of pockets serves as a starting point for predictive studies of biomolecule-ligand interactions. The alpha shape theory provides an exact and robust method for computing these geometric measures. Several implementations of this theory are currently available. We show however that these implementations fail on very large macromolecular systems. We show that these difficulties are not theoretical; rather, they are related to the architecture of current computers that rely on the use of cache memory to speed up calculation. By rewriting the algorithms that implement the different steps of the alpha shape theory such that we enforce locality, we show that we can remedy these cache problems; the corresponding code, UnionBall, has an apparent O(n) behavior over a large range of values of n (up to tens of millions), where n is the number of atoms. As an example, it takes 136 seconds with UnionBall to compute the contribution of each atom to the surface area and volume of a viral capsid with more than five million atoms on a commodity PC. UnionBall includes functions for computing the surface area and volume of the intersection of two, three and four spheres that are fully detailed in an appendix. UnionBall is available as open source software.
Keywords: space-filling diagrams, surface area, volume, pockets, macromolecules
1 Introduction
Cellular functions rest mostly on the activity of two types of large biomolecules, namely proteins and nucleic acids. A fundamental finding that has shaped the last forty years of studies of large molecules is that geometry plays a major role in our attempt to understand their activities. This paper emphasizes this connection between geometry and chemistry. In particular we focus on measuring the shapes of molecules and detecting their cavities and pockets.
Significance of shape
The idea that shape defines function is a general concept from physical chemistry. Molecular structure or shape and chemical reactivity are highly correlated as the latter depends on the positions of the nuclei and electrons within the molecule. Indeed, chemists have long used three-dimensional plastic and metal models to understand the many subtle effects of structure on reactivity and have invested in experimentally determining the structure of important molecules. A common concrete model representing molecular shape is a union of balls, in which each ball corresponds to an atom. Properties of the molecule are then expressed in terms of properties of the union. For example, the putative active sites of an enzyme are detected as cavities and the interaction between a protein and its environment is quantified through the surface area and/or volume of the union of balls [1–4]. The most common use of molecular shape however is found in the quantification of the hydrophobic effect. For this, Lee and Richards introduced the concept of the solvent-accessible surface [5]. They computed the accessible area of each atom in both the folded and extended state of a protein, and found that the decrease in accessible area between the two states is greater for hydrophobic than for hydrophilic atoms. These ideas were refined by Eisenberg and McLachlan [1], who introduced the concept of a solvation free energy for large biomolecules, computed as a weighted sum of the accessible areas of all their atoms. It is not clear, however, which surface area should be used to compute this solvation energy [6–8]. There is also some evidence that for small solutes, the hydrophobic term is not proportional to the surface area [8], but rather to the solvent excluded volume of the molecule [9]. Current models for the non-polar part of the solvation energy include both a surface-based term and a volume-based term [10].
Within this debate on the exact form of the solvation energy, there is however a consensus that it depends on the geometry of the biomolecule under study, more specifically on its volume and surface area. In what follows, we discuss how these geometric measures are usually computed for a union of balls.
Geometric measures of biomolecules
The original approach of Lee and Richards [5] computed the accessible surface area by first cutting the molecule with a set of parallel planes. The intersection of a plane with an atomic ball, if it exists, is a circle which can be partitioned into accessible arcs on the boundary and occluded arcs in the interior of the union. The accessible surface area of atom i is the sum of the contributions of all its accessible arcs, computed approximately as the product of the arc length and the spacing between the planes. This method was originally implemented in the program ACCESS [5]. Shrake and Rupley [11] refined Lee and Richards’ method and proposed a Monte Carlo numerical integration of the accessible surface area. Their method placed 92 points on each atomic sphere, and determined which points were accessible to solvent (not inside any other sphere). Efficient implementations of this method include applications of look-up tables [12], of vectorized algorithms [13] and of parallel algorithms [14]. Similar numerical methods have been developed for computing the volume of a union of balls [15–18].
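To make the point-sampling idea concrete, here is a minimal sketch in Python. It is not the code of [11]: the points are placed with a deterministic golden-spiral rule instead of the original point set, and the function names, the default probe radius of 1.4 Å and the brute-force neighbor search are all assumptions made for illustration.

```python
import math

def sphere_points(n):
    """Return n roughly uniform unit vectors (golden-spiral rule, an assumption)."""
    golden = math.pi * (3.0 - math.sqrt(5.0))
    pts = []
    for k in range(n):
        z = 1.0 - 2.0 * (k + 0.5) / n
        rho = math.sqrt(max(0.0, 1.0 - z * z))
        pts.append((rho * math.cos(k * golden), rho * math.sin(k * golden), z))
    return pts

def accessible_areas(centers, radii, probe=1.4, n_points=92):
    """Estimate the solvent-accessible area of each ball by point sampling."""
    unit = sphere_points(n_points)
    # The accessible surface is the boundary of the balls expanded by the probe.
    expanded = [r + probe for r in radii]
    areas = []
    for i, (ci, ri) in enumerate(zip(centers, expanded)):
        exposed = 0
        for u in unit:
            p = (ci[0] + ri * u[0], ci[1] + ri * u[1], ci[2] + ri * u[2])
            # A sample point is buried if it lies strictly inside another ball.
            buried = any(
                j != i and math.dist(p, cj) < rj
                for j, (cj, rj) in enumerate(zip(centers, expanded))
            )
            if not buried:
                exposed += 1
        # Each sample point stands for an equal share of the sphere area.
        areas.append(4.0 * math.pi * ri * ri * exposed / n_points)
    return areas
```

The estimate is only as good as the sampling density, and it changes in jumps as atoms move, which is the lack of accuracy and differentiability discussed next.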
The surface area and/or volume computed by numerical integration over a set of points, even if closely spaced, is not accurate and cannot be readily differentiated. To improve upon the numerical methods, analytical approximations to the accessible surface area have been developed, which either treat multiple overlapping balls probabilistically [19–21] or ignore them altogether [22, 23]. While these approaches are approximate, they are fast and lead to differentiable geometric measures. In addition, they are well suited for hardware acceleration on graphics processing units [24].
Better analytical methods describe the molecule as a union of pieces of balls, each defined by their center, radius, and arcs forming their boundary, and subsequently apply analytical geometry to compute the surface area and volume [25–29]. For example, Pavani and Ranghino [16] proposed a method for computing the volume of a molecule by inclusion-exclusion. In their implementation, only intersections of up to three balls were considered. Petitjean however noticed that practical situations for proteins frequently involve simultaneous overlaps of up to six balls [28]. Subsequently, Pavani and Ranghino’s idea was generalized to any number of simultaneous overlaps by Gibson and Scheraga [30] and by Petitjean [28], applying a theorem that states that higher-order overlaps can always be reduced to lower-order overlaps [31]. Doing the reduction correctly remains however computationally difficult and expensive. The Alpha Shape Theory solves this problem using Delaunay triangulations and their filtrations, as described by Edelsbrunner [32].
The distinction between approximate and exact computation also applies to existing methods for computing the derivatives of the volume and surface area of a molecule with respect to its atomic coordinates [33–36]. In the case of the derivatives of the surface area, computationally efficient methods were implemented in the MSEED software by Perrot et al. [37] and in the SASAD software by Sridharan et al. [38]. All these methods introduce approximations to deal with singularities caused by numerical errors or by discontinuities in the derivatives [35, 37, 39].
Note that the complexity of the computation of the area and volume of a union of balls, the problems of singularities encountered when computing their derivatives, and the inherent existence of discontinuities have led to the development of alternative geometric representations of molecules. We cite for example the Gaussian description of molecular shape, that allows for easy analytical computation of surface area, volume and derivatives [40, 41], as well as the molecular skin that provides a smooth definition of the surface of a molecule [42].
Detecting pockets and cavities in biomolecular structure
The problem of detecting and measuring internal cavities of biomolecules has received much attention, as these cavities often correspond to putative binding sites and therefore serve as leads for drug design. Most solutions to this problem rely heavily on geometry. They can be divided into three categories: (i) the grid-based methods, (ii) the probe sphere methods and (iii) the analytical methods.
In the grid based method, the molecule is positioned in a 3D Cartesian grid whose vertices are then sorted into two groups: those that are covered by a protein atom and those that are not. The latter are further characterized as being inside a pocket if they satisfy some geometric conditions (such as being inside and at a distance greater than the radius of a water molecule from the convex hull of the biomolecule). The measures of these pockets (volume and surface area) are then computed by Monte Carlo integration over their corresponding grid points. POCKET [43], LIGSITE [44], LigandFit [45], PocketPicker [46], and McVol [18] are cavity-detecting programs that implement the grid-based method.
The probe sphere method proceeds by placing probe spheres that are tangent to the surfaces of two atoms of the biomolecules and then reducing their radii to eliminate overlaps with neighboring atoms; all remaining spheres whose radii exceed a minimal cutoff value (usually 1 Å) are used to define the pockets and cavities. This method was originally implemented in the program SURFNET [47] and later modified in the programs PASS [48] and PHECOM [49]. Interestingly, the grid and probe sphere methods were recently combined in the program POCASA [50].
The alpha shape theory combined with the discrete flow concept was the first analytical method proposed for detecting and measuring inaccessible cavities [4] as well as pockets [51, 52] in biomolecules. It has since been extended to detect channels between inner cavities and the outside [53]. The program CAVE implements a complementary approach in which the boundary of the pockets is directly triangulated, forming the so-called enveloping triangulation [54].
This work
Edelsbrunner and colleagues have developed analytical methods based on the alpha shape theory for computing the metrics of a union of balls, including surface area, volume, their derivatives with respect to the Cartesian coordinates of the centers of the balls and the detection and measurement of pockets [32, 55–57]. These methods have been implemented in different software packages, such as AlphaShape, CASTp [52, 58] and AlphaVol [57]. Most of these packages however have not been recently updated; in addition, they have been written using generic algorithms that work fine for small molecules but have not been tested on very large molecular systems (i.e. with more than one million atoms). In this paper, we show that these algorithms lead to inefficient programs for these very large systems. In response, we describe a new efficient implementation of these methods in an open source software package, UnionBall; this new implementation allows us to characterize and quantify the geometry of a molecular system with more than sixteen million atoms in less than eight minutes of CPU time on a single processor running at 3.15 GHz. We also propose new geometric derivations of the equations that give the surface areas and volumes of the intersection of two, three and four balls, as well as their derivatives with respect to inter-atomic distances. The paper is organized as follows. The next section provides a brief description of the alpha shape theory and its application to measuring a union of balls. The following section describes our implementation of this theory in the program UnionBall; it includes testing on a set of large virus capsids. Appendix A covers the geometry of the intersections of two, three and four balls while appendix B describes the geometry of a tetrahedron.
2 Measuring Union of Balls
2.1 Surface area and volume of a union of balls
Given a collection P = {P1, …, PN} of N three-dimensional sets, the volume and the surface area of the union ∪P can be computed using the principle of inclusion-exclusion. That is, each of these measures of the union can be expressed as an alternating sum of the corresponding measures of the common intersections of the subsets of P,
| $$\mathcal{M}\Big(\bigcup P\Big) = \sum_{\emptyset \neq T \subseteq P} (-1)^{|T|-1}\, \mathcal{M}\Big(\bigcap T\Big)$$ | (1) |
where $\mathcal{M}$ stands for the volume of the union of sets or the area of its boundary. There are two issues that need to be solved to make these equations computationally tractable. Firstly, we need a consistent way to reduce significantly the number of terms in the inclusion-exclusion formula; brute force application of this formula would lead to an algorithm with exponential running time, as the total number of terms in the formula is 2^N − 1, with each term corresponding to the measure of the intersection of at most N sets. Secondly, we need analytical formulas for computing the measures of the non-empty intersections of sets. The next two sections give an overview of solutions to these two issues when the sets are 3D balls.
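Both issues can be seen in the smallest possible case. The sketch below applies inclusion-exclusion to two balls, using the standard closed form for the volume of the lens-shaped intersection of two spheres; this textbook formula is used here for illustration and is not necessarily the derivation given in appendix A. The function names are ours.

```python
import math

def ball_volume(r):
    """Volume of a ball of radius r."""
    return 4.0 / 3.0 * math.pi * r ** 3

def lens_volume(r1, r2, d):
    """Volume of the intersection of two balls whose centers are d apart."""
    if d >= r1 + r2:
        return 0.0                      # disjoint or tangent balls
    if d <= abs(r1 - r2):
        return ball_volume(min(r1, r2)) # one ball contains the other
    return (math.pi * (r1 + r2 - d) ** 2
            * (d * d + 2.0 * d * (r1 + r2) - 3.0 * (r1 - r2) ** 2) / (12.0 * d))

def union_volume_two(r1, r2, d):
    """Inclusion-exclusion for N = 2: vol(B1) + vol(B2) - vol(B1 and B2)."""
    return ball_volume(r1) + ball_volume(r2) - lens_volume(r1, r2, d)

def n_terms(n):
    """Number of terms in the brute-force inclusion-exclusion sum over n sets."""
    return 2 ** n - 1
```

For N = 2 the sum has only three terms; `n_terms(50)` already exceeds 10^15, which is why a principled reduction of the number of terms is needed before the formula can be applied to molecules.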
2.1.1 A simplified inclusion-exclusion formula for union of balls
The Alpha Shape Theory provides a method for reducing significantly the number of terms in the inclusion-exclusion formula applied to unions of balls. It is based on the concept of Voronoi decompositions and Delaunay triangulations and their filtrations, as described by Edelsbrunner [32]. Note that the concept of using the Voronoi decomposition and Delaunay triangulation to simplify the inclusion-exclusion formula was originally introduced by Naiman and Wynn [59].
Voronoi decomposition and dual complex
Let us consider a finite set of spheres Si with centers zi and radii ri and let Bi be the ball bounded by Si. We define the square distance between a point x and a sphere Si as πi(x) = ‖x − zi‖² − ri². This distance definition allows for varying radii for the spheres.
The Voronoi region of Si consists of all points x at least as close to Si as to any other sphere: Vi = {x ∈ ℝ3 | πi(x) ≤ πj(x) for all j}. The Voronoi region of Si is a convex polyhedron obtained as the common intersection of finitely many closed half-spaces, one per sphere Sj ≠ Si. These half-spaces are defined as follows. If Si and Sj intersect in a circle then the plane bounding the corresponding half-spaces passes through that circle. The union of all Voronoi regions Vi defines the Voronoi diagram of the union of spheres; this union covers the whole space. The intersection of the Voronoi diagram with the union of balls Bi decomposes this union into convex regions of the form Bi ∩ Vi, as illustrated in figure 1. The boundary of each such region consists of spherical patches on Si and planar patches on the boundary of Vi. The spherical patches separate the inside from the outside and the planar patches decompose the inside of the union.
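As a small illustration of these definitions, the following sketch assumes the usual power distance ‖x − zi‖² − ri² for πi and finds, by brute force, which Voronoi region contains a given point. It also locates the plane πi = πj that separates two regions. The function names are ours; a real implementation would never scan all spheres for every query.

```python
import math

def power_distance(x, center, radius):
    """Square distance pi_i(x) = |x - z_i|^2 - r_i^2 from point x to a sphere."""
    return sum((a - b) ** 2 for a, b in zip(x, center)) - radius ** 2

def owner(x, centers, radii):
    """Index i of the Voronoi region V_i that contains x (brute-force scan)."""
    return min(range(len(centers)),
               key=lambda i: power_distance(x, centers[i], radii[i]))

def radical_plane_offset(ci, ri, cj, rj):
    """Signed distance from z_i, measured toward z_j, to the plane pi_i = pi_j."""
    d = math.dist(ci, cj)
    return (d * d + ri * ri - rj * rj) / (2.0 * d)
```

With equal radii the separating plane is the perpendicular bisector of the two centers; increasing the radius of Sj pushes the plane toward zi, so the larger sphere claims more space.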
Figure 1. Voronoi decomposition and dual complex.
Given a finite set of disks, the Voronoi diagram decomposes the plane into regions, one per disk, such that any point in the region assigned to disk Si is closer to that disk than to any other disk, where the distance to Si is defined as πi(x) = ‖x − zi‖² − ri². In the drawing, we restrict the Voronoi diagram (dashed lines) to within the portion of the plane covered by the disks (magenta) and get a decomposition of the union into convex regions. The dual Delaunay triangulation is obtained by drawing edges between circle centers of neighboring Voronoi regions. The dual complex is a subset of the Delaunay triangulation, limited to the edges (in blue) and triangles (light green) whose corresponding Voronoi regions intersect within the union of disks.
The Delaunay triangulation is the dual of the Voronoi diagram, obtained by drawing an edge between the centers of Si and Sj if the two corresponding Voronoi regions share a common face. Furthermore, we draw a triangle connecting zi, zj and zk if Vi, Vj and Vk intersect in a common line segment, and we draw a tetrahedron connecting zi, zj, zk and zℓ if Vi, Vj, Vk and Vℓ meet at a common point. Assuming general position of the spheres, there are no other cases to be considered. We refer to this as the generic case; it is important to mention that it is rare in practice because of limited precision. Nevertheless, it is possible to simulate a perturbation of the union of balls that restores the generic case [60]. This method, referred to as simulation of simplicity, consistently unfolds potentially complicated degenerate cases to non-degenerate ones.
Let us limit the construction of the Delaunay triangulation to within the union of balls. In other words, we draw a dual edge between the two vertices zi and zj only if Bi ∩ Vi and Bj ∩ Vj share a common face, and similarly for triangles and tetrahedra. The result is a sub-complex of the Delaunay triangulation which we refer to as the dual complex K of the set of spheres.
Area and volume formulas
A simplex τ in the dual complex can be interpreted abstractly as a collection of balls, one ball if it is a vertex, two if it is an edge, etc. In this interpretation, the dual complex is a system of sets of balls. We write vol ∩ τ for the volume of the intersection of the balls in τ. This is exactly the term we would see in an inclusion-exclusion formula for the volume of the union of balls, ∪i Bi. As proved in [32, 59], the inclusion-exclusion formula that corresponds to the dual complex gives the correct volume of a union of balls, as well as the correct area of its boundary.
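In symbols, the unweighted statement can be sketched as follows. The notation is our own shorthand for this illustration and is not copied from [32, 59]: dim τ is the dimension of the simplex (0 for a vertex, 3 for a tetrahedron), ∩τ is the common intersection of the balls in τ, and area(∩τ) denotes the area of the boundary of that intersection.

```latex
\mathrm{vol}\Big(\bigcup_i B_i\Big)
  = \sum_{\tau \in K} (-1)^{\dim \tau}\, \mathrm{vol}\Big(\bigcap \tau\Big)
  = \sum_{\tau_i \in K} \mathrm{vol}(B_i)
  - \sum_{\tau_{ij} \in K} \mathrm{vol}(B_i \cap B_j)
  + \sum_{\tau_{ijk} \in K} \mathrm{vol}(B_i \cap B_j \cap B_k)
  - \sum_{\tau_{ijkl} \in K} \mathrm{vol}(B_i \cap B_j \cap B_k \cap B_l),
\qquad
\mathrm{area}\Big(\partial \bigcup_i B_i\Big)
  = \sum_{\tau \in K} (-1)^{\dim \tau}\, \mathrm{area}\Big(\bigcap \tau\Big).
```

Compared with equation 1, the sum never involves more than four balls at a time and runs only over the simplices of K, whose number is bounded by the size of the Delaunay triangulation rather than by 2^N − 1.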
We state the corresponding theorems for the case in which the contribution of each ball Bi is weighted by a constant αi, yielding the weighted volume of the union of balls and the weighted area of its boundary. When the coefficients αi correspond to atomic solvation parameters, these two terms estimate the solvation free energy of the molecule represented by the union of balls. Let τi be the simplex corresponding to the ball Bi, τij the simplex formed by the edge between the balls Bi and Bj, τijk the triangle corresponding to the three balls Bi, Bj and Bk, and finally τijkl the tetrahedron defined by the four balls Bi, Bj, Bk and Bl. Then:
Weighted Area Theorem
| (2) |
and
Weighted Volume Theorem
| (3) |
In the volume formula, the single-ball term is the volume of the ball Bi, the two-ball term is the contribution of Bi to the volume of the intersection of the balls Bi and Bj, etc. Similar definitions are used for the surface areas.
These results are direct extensions of the Area and Volume Theorems derived by Edelsbrunner [32, 57]; they overcome past difficulties by implicitly reducing higher-order to lower-order overlaps. An added advantage of these formulas is that the balls in each term form a unique geometric configuration so that the analytic calculation of the volume can be done without case analysis [32, 57].
As a side note, it is interesting that the dual complex is not the only simplicial complex that leads to a minimal inclusion-exclusion formula: Attali and Edelsbrunner have shown that it is possible to construct a family of such complexes, which are characterized by the independence of their simplices and by geometric realizations with the same underlying space as the dual complex [61].
2.1.2 Angle weighted inclusion-exclusion formula for union of balls
Even though the equations described above are minimal, it is possible to find even shorter expressions for the weighted areas and volumes if non-integer coefficients are considered. This is referred to as the short inclusion-exclusion method and is described in detail in [32]. In this method, the area and volume are expressed as the sums of the contributions of intersections of at most three balls, with angular coefficients. The corresponding expressions for the weighted areas and weighted volumes are:
Short Weighted Area Formula
| (4) |
and
Short Weighted Volume Formula
| (5) |
where Fi is the fraction of the Voronoi region of Si delimited by the planes defined by the triangles Δzizjzk, Δzizjzl and Δzizkzl. We show in appendix A how to compute the volume of Fi.
The coefficients γ are the normalized exposed angles of the simplices [57]; they are given by:
| (6) |
| (7) |
| (8) |
where Ωi is the solid angle at vertex zi and φij is the dihedral angle associated with the edge zizj in the tetrahedron defined by zi, zj, zk and zl. These coefficients can be interpreted as the fraction of solid angle (for a vertex), of dihedral angle (for an edge) or of the faces of a triangle that remains accessible in the dual complex. All edges and triangles that are fully buried have zero contribution in equations 4 and 5. In parallel, tetrahedra in the dual complex that are fully buried do not contribute to the area, and only contribute their volume (which is easier to compute than the volume of the intersection of balls) in the volume formula.
Note that in the special case that the weights of all atoms are equal to 1, these equations give the surface area and volume of a union of balls and can be written:
| (9) |
and
| (10) |
2.1.3 Area and volume derivatives
We are interested in the derivatives of the area and the volume of a union of N balls with respect to their positions. We have recently derived expressions for these derivatives with respect to the Cartesian coordinates of the centers of the balls [55, 56]. We revisit this problem here and propose new expressions for the derivatives with respect to the distances between the centers of these balls; these distances represent internal coordinates for the system that are rotationally invariant.
Derivatives with respect to internal distances
The volume of a union of balls and the area of its boundary are fully characterized by the simplified, angle-weighted inclusion-exclusion equations 5 and 4, respectively. In appendix A, we show that all terms included in these two formulas can be expressed as functions of the radii of the balls and the distances between their centers. We compute the derivatives of the volume and area with respect to these distances algebraically. Note that the derivative with respect to the distance rab between the centers za and zb of the two balls Ba and Bb is nonzero if and only if the edge zazb belongs to the dual complex. We get:
Weighted Area Derivative Theorem
| (11) |
and
Weighted Volume Derivative Theorem
| (12) |
In the specific case that all weights are equal to 1, the derivative of the volume is:
Volume Derivative Theorem
| (13) |
Note that there are no terms involving the derivatives of γabi as those are independent of distances.
Formulas for the derivatives of the different terms of these equations and of vol (Fi) are straightforward from their analytical expressions (see appendix A). The angular coefficient γi of a vertex zi is computed over all tetrahedra of K that contain i. If zi is such that it belongs to at least one tetrahedron of K that also contains za and zb, then:
In all other cases, . Similarly,
if τijab ∈ K, and 0 otherwise.
The derivatives of the dihedral angles φij of a tetrahedron with respect to its edge lengths are given in appendix B. Finally, the volume derivative formula (13) also includes the derivatives of the volume of a tetrahedron with respect to its edge lengths, whose expressions are also given in appendix B.
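The simplest instance of a derivative with respect to an internal distance is the union of two overlapping balls. The sketch below is an illustration only, not the general formula 13: for two balls the derivative of the union volume with respect to the center distance equals the area of the disk spanned by the intersection circle, which can be checked against a finite difference. The closed form for the two-ball intersection is the standard textbook one and the function names are ours.

```python
import math

def union_volume(r1, r2, d):
    """Volume of the union of two partially overlapping balls.

    Assumes |r1 - r2| < d < r1 + r2, so that the spheres meet in a circle.
    """
    lens = (math.pi * (r1 + r2 - d) ** 2
            * (d * d + 2.0 * d * (r1 + r2) - 3.0 * (r1 - r2) ** 2) / (12.0 * d))
    return 4.0 / 3.0 * math.pi * (r1 ** 3 + r2 ** 3) - lens

def union_volume_derivative(r1, r2, d):
    """d(volume)/d(distance): area of the disk bounded by the intersection circle."""
    h = (d * d + r1 * r1 - r2 * r2) / (2.0 * d)  # from center 1 to the circle's plane
    return math.pi * (r1 * r1 - h * h)

def finite_difference(r1, r2, d, eps=1e-6):
    """Central-difference estimate of the same derivative, for checking."""
    return (union_volume(r1, r2, d + eps) - union_volume(r1, r2, d - eps)) / (2.0 * eps)
```

The derivative vanishes continuously when the intersection circle shrinks to a point, which is consistent with the remark above that only edges of the dual complex contribute.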
Derivatives with respect to Cartesian coordinates
Once the derivatives with respect to internal coordinates are available, derivatives with respect to Cartesian coordinates are easily computed using the chain rule:
Cartesian Derivative Theorem The gradients a and v ∈ ℝ3n of the area and volume derivatives are
where uij = (zi − zj)/rij is the unit vector in the direction of the edge zizj.
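The chain rule step can be sketched as follows. Since rij = ‖zi − zj‖, the derivative of rij with respect to zi is the unit vector uij, so each edge contributes its distance derivative times uij to the gradient at zi and the opposite vector at zj. The function name and the dictionary layout for the distance derivatives are assumptions made for this example.

```python
import math

def cartesian_gradient(coords, dF_dr):
    """Convert derivatives with respect to distances into a Cartesian gradient.

    coords : list of 3-tuples, the centers z_i
    dF_dr  : dict {(i, j): dF/dr_ij}, one entry per edge (hypothetical layout)
    """
    grad = [[0.0, 0.0, 0.0] for _ in coords]
    for (i, j), g in dF_dr.items():
        r = math.dist(coords[i], coords[j])
        for k in range(3):
            u = (coords[i][k] - coords[j][k]) / r  # k-th component of u_ij
            grad[i][k] += g * u                    # d r_ij / d z_i = +u_ij
            grad[j][k] -= g * u                    # d r_ij / d z_j = -u_ij
    return grad
```

Because every edge adds equal and opposite vectors to its two endpoints, the gradient components sum to zero, as expected for a quantity that is invariant under translation.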
2.2 Voids and Pockets
A full description of how to detect and measure pockets in a union of balls based on the alpha shape theory is available in [51]. Briefly, the concept of pockets is ultimately connected to the notion of a continuous flow field defined on the Delaunay triangulation of these balls. Let T be the set of tetrahedra in the Delaunay triangulation and T′ = T ∪ {τ∞}, where τ∞ is a dummy element representing the complement of the triangulation in ℝ3. The flow relation ’➢’ with τ ➢ σ is defined by:
τ and σ share a common triangle Δ, and
The interior of τ and the orthogonal center zτ of τ lie on different sides of the plane defined by Δ.
where the orthogonal center zτ is the center of the smallest ball that is orthogonal to all four balls whose centers are the vertices of τ.
If τ ➢ σ, τ is a predecessor of σ and σ is a successor of τ. σ ∈ T is a sink if it has no successors; in other words, a tetrahedron is a sink if and only if it contains its orthogonal center. Sinks are important since they are responsible for the formation of voids: if H is a void of the union of balls then at least one tetrahedron in H is a sink.
By definition, pockets consist of the Delaunay tetrahedra that do not belong to the dual complex K and are not ancestors of τ∞, i.e. that do not reach τ∞ by following the flow relation through a chain of successors. The only type of pockets without connection to the outside are the voids. All other pockets connect to the outside at one or more places, called mouths. Figure 2 illustrates these concepts on a simple two-dimensional example.
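Once the flow relation is known, the classification is a graph traversal. The sketch below assumes that the successors of every tetrahedron outside the dual complex have already been computed from the orthogonal centers; the dictionary layout, the labels and the function name are hypothetical. It walks the flow backwards from τ∞ to find its ancestors and keeps what is left as pocket tetrahedra.

```python
from collections import deque

INFINITY = "inf"  # hypothetical label standing for the dummy element tau_infinity

def classify_empty_tetrahedra(successors):
    """Split empty tetrahedra into sinks, pocket cells and cells flowing outside.

    successors : dict mapping each tetrahedron outside the dual complex to the
                 list of its successors, which may include INFINITY.
    """
    # Reverse the relation so the flow can be followed backwards.
    predecessors = {}
    for tau, succs in successors.items():
        for sigma in succs:
            predecessors.setdefault(sigma, []).append(tau)
    # Breadth-first search from infinity collects all of its ancestors.
    flows_out = set()
    queue = deque([INFINITY])
    while queue:
        sigma = queue.popleft()
        for tau in predecessors.get(sigma, []):
            if tau not in flows_out:
                flows_out.add(tau)
                queue.append(tau)
    sinks = {tau for tau, succs in successors.items() if not succs}
    pocket_cells = set(successors) - flows_out
    return sinks, pocket_cells, flows_out
```

Grouping the pocket cells into individual pockets and finding their mouths additionally requires the face adjacency of the triangulation, which is not part of this sketch.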
Figure 2. Illustration of the discrete flow and pockets in a union of disks.
The dual complex of the union of disks is shown in red; all triangles in the Delaunay complex that do not belong to the dual complex are referred to as empty. Acute empty triangles contain their orthocenters: they correspond to sinks. We identify them with large blue dots to mark the position of the orthocenter. The obtuse empty triangles either flow to these acute triangles or to the outside (”infinity”). Triangles III, IV and V (shown in light blue) for example flow to infinity: they do not define pockets. The remaining triangles can be partitioned into two groups: region I is completely surrounded by the union of disks and therefore defines a void, while region II is connected to the outside by one mouth, and is referred to as a pocket.
The surface area and volume of a pocket are easily computed by first identifying their tetrahedra and their faces that belong to the dual complex followed by the application of simplified inclusion-exclusion formula similar to those used for measuring the dual complex (see [57] for details).
3 Algorithm & Implementation
AlphaVol is our original software package that implemented the Alpha Shape theory for measuring biomolecules [57]; its origins lie in the Alpha Shape package [62]. AlphaVol takes as input a set of balls in ℝ3, each specified by the coordinates of its center and the radius. In the case of biomolecules, this set is extracted from the corresponding PDB file using one of several standard sets of van der Waals radii. The computation is performed through four successive tasks:
Step 1. Construct the Delaunay triangulation.
Step 2. Extract the dual complex.
Step 3. Measure the union using inclusion-exclusion.
Step 4. Detect and measure the pockets.
AlphaVol uses a standard algorithm from computational geometry for each of these tasks: the incremental flipping algorithm from Edelsbrunner and Shah [63] for computing the Delaunay triangulation, an algorithm based on the primitives described by Edelsbrunner [64] for computing the dual complex, our own algorithm for implementing the inclusion-exclusion formula [55, 56] and the algorithm of Edelsbrunner, Facello and Liang [51] for computing pockets. We made a few small modifications to these algorithms as our interest is mostly in measuring biomolecules. For example, we only compute the dual complex and not the full filtration of the Delaunay complex.
AlphaVol showed good performance on a set of small to medium-sized proteins [57]; table 1 illustrates however that the algorithms we have implemented fail, or at least become very slow, for very large systems. This is especially true for computing the Delaunay triangulation (the capsid corresponding to 2dum is nearly eight times larger than the capsid for 1ihm, yet it takes more than 100 times longer to compute its Delaunay triangulation, which is not on par with an expected O(n log(n)) time complexity) and even more so for detecting and measuring pockets. We have written a new version of the AlphaVol software [57] in which the original algorithms have been either modified or fully rewritten to alleviate these severe drawbacks. In the following we describe these modified algorithms and their implementations. The new program is called UnionBall.
Table 1.
CPU times for measuring biomolecules using AlphaVol
| Molecule (a) | Number of atoms | Delaunay (s) | Dual complex (s) | Volume (s) | Pockets (s) |
|---|---|---|---|---|---|
| 1TIM | 2288 | 0.02 | 0.01 | 0.01 | 0.02 |
| GroEL | 66136 | 1.76 | 0.35 | 0.80 | 23.16 |
| 1ihm | 677040 | 78.27 | 9.39 | 5.87 | 2810.00 |
| 2dum | 5214540 | 8577.00 | 99.00 | 59.30 | 205818.00 |
(a) The four different molecules are: the chicken triose phosphate isomerase (PDB code 1TIM), the GroEL chaperonin (PDB code 1SX3), a Norwalk virus (PDB code 1ihm; we use the fully reconstructed capsid available at the viperdb web site [65]), and the full capsid of a human adenovirus, PDB code 2dum, available at viperdb. For each molecule, we compute the accessible surface area, the corresponding volume inaccessible to solvent, and identify and measure all pockets. Calculations were performed on a single Intel processor running at 3.16 GHz, with 6 MB of cache memory and 8 GB of RAM. Computing times are reported in seconds.
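The gap between the observed and the expected scaling of the Delaunay step can be read directly from Table 1. The short calculation below compares the growth in running time between 1ihm and 2dum with the growth an O(n log(n)) algorithm would show; the dictionary and function names are ours, and the numbers are the table entries.

```python
import math

# (number of atoms, Delaunay time in seconds), copied from Table 1
TABLE1 = {"1ihm": (677040, 78.27), "2dum": (5214540, 8577.00)}

def scaling_report(small, large):
    """Return (size ratio, expected n log n time ratio, observed time ratio)."""
    (n1, t1), (n2, t2) = TABLE1[small], TABLE1[large]
    size_ratio = n2 / n1
    expected = (n2 * math.log(n2)) / (n1 * math.log(n1))
    observed = t2 / t1
    return size_ratio, expected, observed

size_ratio, expected, observed = scaling_report("1ihm", "2dum")
print(f"size x{size_ratio:.1f}, expected time x{expected:.1f}, observed x{observed:.1f}")
```

The input grows by a factor of about 7.7, for which n log(n) predicts a slowdown of about 9, whereas the measured slowdown is about 110.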
We compare the original and modified algorithms as implemented in the corresponding programs AlphaVol and UnionBall for all four steps defined above on a dataset of 285 virus capsids varying in size from sixty thousand atoms to sixteen million atoms; the structures for these capsids were downloaded from the web enabled relational database VIPERdb2 [65]. Note that these capsids are highly symmetric (icosahedral) and repetitive; we do not however make use of these symmetries. For all capsids, we compute the accessible surface area, the corresponding solvent-excluded volume, we detect all pockets, and we compute their volume and surface area. Figure 3 illustrates this process on the Sindbis virus.
Figure 3. Characterizing the geometry of the Sindbis virus.
The Sindbis virus is an RNA virus, member of the alphaviruses; it is transmitted by mosquitoes and is responsible for the Sindbis fever, most common in South and East Africa, the Philippines and Australia. The structure of the full virion consists of two protein capsids (the outer capsid made of glycoproteins and the inner nucleocapsid), a lipid bilayer sandwiched between the two capsids, a set of transmembrane domains that cross the lipid bilayer and connect the two capsids, and the single-stranded RNA that occupies the cavity inside the nucleocapsid; it was determined by a combination of X-ray crystallography on individual proteins of the capsid and cryoelectron microscopy [78]. All images shown here are based on the complete structure of the capsids obtained from the VIPERdb2 database (file 1ld4.vdb); note that this file only includes the proteins (capsids and transmembrane domains). A. Surface of the virus: the outer glycoprotein capsid. B. Cross section through the capsid, showing the outer capsid, the inner nucleocapsid and the transmembrane domains. C. Cross section of the dual complex corresponding to the two capsids and transmembrane domains. The simplices of this complex define all the terms of the inclusion-exclusion formula needed to compute the volume and surface area of the virus. D. The two main pockets identified by UnionBall: the central pocket (in green) occupies the whole region in the center; it includes many large tetrahedra that are cut by the cross section. The second largest pocket (shown in pink) between the two capsids corresponds to the region where the lipid bilayer is found.
3.1 Improved Delaunay computations for large molecular systems
Our implementation of the Delaunay triangulation in the original program AlphaVol was based on the randomized incremental algorithm described in [63]. In this algorithm, the triangulation is constructed incrementally, by adding one sphere at a time. Before starting the construction, the spheres are re-indexed such that S1, S2, …, Sn is a random permutation of the spheres as they appear in the input file. Four dummy additional spheres with their centers at infinity are added so that all input spheres are contained in the tetrahedron they define. Let Di be the Delaunay triangulation of the four spheres at infinity together with S1, S2, …, Si. The algorithm proceeds by iterating three steps:
| for i = 1 to n do | |
| 1 | find tetrahedron τ ∈ Di−1 that contains zi; |
| 2 | add zi to decompose τ into four tetrahedra; |
| 3 | flip locally non-Delaunay triangles attached to zi |
| endfor. |
The randomization preprocessing in this algorithm guarantees an expected running time of O(n log n + n²) in the worst case [63]. In practice however, a very different behavior is observed for very large datasets, as illustrated in figure 4. This is unfortunately a known problem related to memory access on a computer. By their very nature, randomized algorithms access the data structures they maintain randomly, and random access works poorly with the memory hierarchies available on modern computers. Virtual memory operating systems cache recently used data in memory, under the assumption that they are more likely to be used again soon. This assumption is violated by randomized algorithms, which consequently perform poorly once the data structure exceeds the cache size.
Figure 4.
The running times of AlphaVol (with randomization) and UnionBall (without randomization) for computing the regular Delaunay triangulation.
A simple solution is to insert points in an order that improves locality. Amenta, Choi and Rote [66] developed such a scheme while at the same time maintaining enough randomness so that the algorithm remains theoretically optimal. Their Biased Randomized Insertion Order (BRIO) method was shown to significantly improve performance. Later, Liu and Snoeyink [67] proposed a different method for ordering the points, based on a space-filling curve. In their method, all input points are placed into a 3D grid of N³ bins, which are then visited in Hilbert curve order; this method was found to speed up step 1 (point location). Both BRIO and this method require a preprocessing of the data that comes with a computational cost.
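The grid-based ordering described above can be sketched in a few lines of Python. This is only an illustration, not the code of [67]: for brevity it visits the bins along a Z-order (Morton) curve, a simpler space-filling curve than the Hilbert curve, and the function names `morton_key`, `grid_order` and `mean_step` are hypothetical.

```python
import random

def morton_key(ix, iy, iz, bits):
    """Interleave the bits of three bin indices into one Z-order key."""
    key = 0
    for b in range(bits):
        key |= ((ix >> b) & 1) << (3 * b)
        key |= ((iy >> b) & 1) << (3 * b + 1)
        key |= ((iz >> b) & 1) << (3 * b + 2)
    return key

def grid_order(points, bits=4):
    """Return point indices sorted by the Z-order key of their bin.

    The bounding box is cut into N^3 bins with N = 2**bits.
    """
    n_bins = 1 << bits
    lo = [min(p[d] for p in points) for d in range(3)]
    hi = [max(p[d] for p in points) for d in range(3)]

    def cell(p):
        idx = []
        for d in range(3):
            span = (hi[d] - lo[d]) or 1.0  # avoid division by zero
            idx.append(min(n_bins - 1, int((p[d] - lo[d]) / span * n_bins)))
        return idx

    return sorted(range(len(points)),
                  key=lambda i: morton_key(*cell(points[i]), bits))

def mean_step(points, order):
    """Mean Euclidean distance between consecutive points in an ordering."""
    total = 0.0
    for a, b in zip(order, order[1:]):
        total += sum((points[a][d] - points[b][d]) ** 2 for d in range(3)) ** 0.5
    return total / (len(order) - 1)

# Synthetic test data: points in random order inside the unit cube.
rng = random.Random(0)
points = [(rng.random(), rng.random(), rng.random()) for _ in range(4000)]
input_order = list(range(len(points)))
binned_order = grid_order(points)
```

Comparing `mean_step(points, binned_order)` with `mean_step(points, input_order)` gives a crude measure of the locality gained; the sort itself is the preprocessing cost mentioned above.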
Interestingly, the order in which data points are stored in a PDB file is inherently local. In most cases, two consecutive atoms either belong to the same amino acid or to two sequential amino acids that are in contact. Breaks occur for missing data and/or between chains in the case of multimeric structures. These breaks may lead to non-locality; they are however the exception and are not expected to play a significant role. We tested the effect of using the locality provided by the input PDB file by simply removing the randomization step in the algorithm described above. As illustrated in figure 4, this resulted in a significant improvement in performance. Removing the randomization leads to an observed linear dependence of the computing time with respect to the number of weighted spheres considered. We implemented this modification in UnionBall. Note that our initial attempts to implement BRIO and the Hilbert curve ordering did not lead to improved performance (data not shown).
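The locality of a chain-like input order can be checked numerically. The sketch below does not read a PDB file; it assumes a synthetic chain (a random walk with a fixed step of 1.5, a hypothetical stand-in for bonded atoms) and compares the mean distance between consecutive points in file order with the same quantity after shuffling. The helper names are illustrative only.

```python
import math
import random

def synthetic_chain(n_atoms, bond=1.5, seed=0):
    """Random walk with fixed step length, mimicking atoms stored along a chain."""
    rng = random.Random(seed)
    pts = [(0.0, 0.0, 0.0)]
    for _ in range(n_atoms - 1):
        v = [rng.gauss(0.0, 1.0) for _ in range(3)]
        norm = math.sqrt(sum(c * c for c in v))
        x, y, z = pts[-1]
        pts.append((x + bond * v[0] / norm,
                    y + bond * v[1] / norm,
                    z + bond * v[2] / norm))
    return pts

def mean_consecutive_distance(pts):
    """Average distance between points that are neighbors in the list."""
    return sum(math.dist(a, b) for a, b in zip(pts, pts[1:])) / (len(pts) - 1)

chain = synthetic_chain(2000)
file_mean = mean_consecutive_distance(chain)        # order as stored

shuffled = chain[:]
random.Random(1).shuffle(shuffled)                  # randomized insertion order
shuffled_mean = mean_consecutive_distance(shuffled)
```

In file order every consecutive pair is one step apart, whereas after shuffling consecutive points are typically separated by a distance comparable to the size of the whole chain, which is the situation in which point location and cache reuse degrade.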
3.2 Improved dual complex construction
Changing the Delaunay algorithm leads to a different ordering of the tetrahedra in the geometric data structure that stores the triangulation; this by itself is expected to result in faster construction of the dual complex. There is however another step that can be improved.
Given the Delaunay triangulation D of the input spheres, we construct the dual complex K ⊆ D by labeling the Delaunay simplices. Specifically, for each simplex τ ∈ D there is a threshold ατ such that τ ∈ K if and only if ατ² ≤ 0. To label the Delaunay simplices we therefore need to decide the signs of their square thresholds. This test can be expressed in terms of the signs of the determinants of small matrices whose entries are center coordinates and square radii of the input spheres. Detailed expressions for these tests can be found in [62, 64]. An important ingredient in this context is the treatment of singularities. Inexact versions of the numerical tests are vulnerable to roundoff errors and can lead to wrong output. Following work in computational geometry [68], we implemented these tests using a so-called floating-point filter that first evaluates the tests approximately, using floating-point arithmetic, and, if the results cannot be trusted, switches to exact arithmetic. The difficult part in implementing such a filter lies in the definition of "trust". Let us consider for example the determinant:
that is needed for computing for the tetrahedron τ defined by the four vertices zi, zj, zk, and zl. An upper bound on the error in computing D is given by:
where C is a constant that depends on the number of terms in the expansion of D, Cmax is the maximum absolute value of all coordinates of the four vertices, and ε0 is the IEEE machine precision, equal to 2⁻⁵³. For large molecules, Cmax can be quite large (on the order of several thousands), leading to large values for ε(D). This value however is unnecessarily large for the predicates involved in the construction of the dual complex. A tetrahedron belongs to the dual complex if the four spheres it represents have a common intersection. As such, the distance between any two of these centers cannot be larger than the sum of the radii of the two spheres, typically lower than 10 in the case of molecules, i.e. much smaller than the absolute coordinates of the points. We can use this fact by first centering the simplex under consideration on its orthocenter; the corresponding Cmax value is consequently much smaller, leading to a smaller ε(D) and consequently to a smaller number of switches to exact arithmetic. We have implemented this modification in UnionBall.
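The effect of recentering on such a filter can be illustrated with a minimal Python sketch. It is not the UnionBall implementation and rests on several assumptions: the predicate is a plain 4×4 orientation determinant with rows (x, y, z, 1) rather than the determinant used for the dual complex; the error bound is taken to be of the form C·Cmax³·ε0; the constant `C_ROUGH = 192` is a rough value worked out for this particular expansion; and the simplex is translated to its first vertex rather than to its orthocenter. All function names are hypothetical.

```python
from fractions import Fraction

EPS0 = 2.0 ** -53   # machine precision quoted in the text
C_ROUGH = 192.0     # assumed constant for this expansion, not the paper's value

def det3(p, q, r):
    """3x3 determinant with rows p, q, r (works for floats and Fractions)."""
    return (p[0] * (q[1] * r[2] - q[2] * r[1])
            - p[1] * (q[0] * r[2] - q[2] * r[0])
            + p[2] * (q[0] * r[1] - q[1] * r[0]))

def det4_ones(a, b, c, d):
    """Determinant of the 4x4 matrix whose rows are (x, y, z, 1)."""
    return -det3(b, c, d) + det3(a, c, d) - det3(a, b, d) + det3(a, b, c)

def filtered_sign(pts):
    """Return (sign, used_exact) for the determinant of four points."""
    approx = det4_ones(*pts)
    c_max = max(abs(x) for p in pts for x in p)
    bound = C_ROUGH * c_max ** 3 * EPS0
    if abs(approx) > bound:
        # The floating-point value is larger than its error bound: trust it.
        return (1 if approx > 0 else -1), False
    # Otherwise redo the computation exactly with rational arithmetic.
    exact = det4_ones(*[[Fraction(x) for x in p] for p in pts])
    return (exact > 0) - (exact < 0), True

def recenter(pts):
    """Translate the simplex so that its first vertex sits at the origin.

    Assumption: the subtraction is exact here because the vertices are close
    to each other; in general it may introduce its own rounding.
    """
    origin = pts[0]
    return [tuple(x - o for x, o in zip(p, origin)) for p in pts]

# A thin tetrahedron located far from the origin.
far = [(5000.0, 5000.0, 5000.0),
       (5001.0, 5000.0, 5000.0),
       (5000.0, 5001.0, 5000.0),
       (5000.0, 5000.0, 5000.000001)]

sign_far, exact_far = filtered_sign(far)
sign_near, exact_near = filtered_sign(recenter(far))
```

With the raw coordinates the bound scales with 5001³ and the filter must fall back on exact arithmetic; after recentering Cmax is 1 and the floating-point result is accepted, with the same sign in both cases.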
Figure 5 compares the computing times required to construct the dual complex for all virus capsids in our dataset by AlphaVol and UnionBall; the improvement is not as drastic as for computing the Delaunay triangulation but still significant.
Figure 5.
The running times of AlphaVol and UnionBall (with a new floating point filter) for constructing the dual complex.
Weighted surface areas, volumes, and their derivatives
In UnionBall, we compute the weighted surface area and volume of the union of balls using the short weighted formulas given by equations 4 and 5 as well as the formulas for computing the intersections of one, two and three balls given in appendix A. Note that these formulas only depend on the distances between the centers of the balls, and not on their Cartesian coordinates. All the distances can be precomputed, resulting in a significant speedup. Figure 6 compares the computing times required to measure the dual complex for all virus capsids in our dataset by AlphaVol and UnionBall; we believe that most of the improvement comes from the better ordering of the tetrahedra resulting from the modified algorithm used for computing the Delaunay triangulation.
Figure 6.
The running times of AlphaVol and UnionBall for computing the volume and surface area of the dual complex.
The derivatives of the weighted surface area and volume are computed using equations 11, 12, and 14.
Detecting and measuring voids
As shown in table 1, the detection of voids and cavities as implemented in AlphaVol is the most inefficient step in characterizing the geometry of a very large union of balls, with a near N² dependence with respect to the number of balls. The corresponding algorithm was originally designed for generic unions of balls. It starts from the master list of tetrahedra in the Delaunay complex, stored in the order in which they belong to the alpha complex, and proceeds in two steps (see [51] for more details). Firstly, it computes a depth for each tetrahedron, which is the index of its largest successor based on the discrete flow relationship. Secondly, pockets are constructed as sets of tetrahedra represented by a union-find data structure. The initial list of pockets is empty. The process then scans the tetrahedra that do not belong to the dual complex as they appear in the master list; when it reaches the tetrahedron with index j, all tetrahedra with depth j are added to the union-find structure, each as an individual pocket. When a tetrahedron is added, however, the algorithm checks its four direct neighbours; if one of these belongs to an existing pocket and the face between the two tetrahedra is not in the dual complex, the two corresponding pockets are joined. The algorithm stops when all tetrahedra have been processed. Each set of tetrahedra in the final union-find structure is deemed a pocket, with the exception of the set containing the dummy tetrahedron τ∞ which represents the outside.
This algorithm is theoretically optimal; in practice however, it suffers from the same problem of lack of locality when accessing data in memory, leading to significant thrashing. To circumvent this problem, we propose a different approach that is geared towards improved locality. Similar to the original approach, we compute the depth of each tetrahedron in the Delaunay complex; this step is fast as it is local by nature. From this knowledge, we can isolate the tetrahedra that flow to the outside. Each tetrahedron in the Delaunay complex is then assigned a flag, visited, initially set to one if it belongs to the dual complex or flows to the outside and zero otherwise. The algorithm then proceeds as follows:
for all tetrahedra σ in del(B) do
    if visited(σ) = 0 then
        Define new pocket P = {σ}; Define L(P) = {σ}; visited(σ) = 1;
        while |L(P)| ≠ 0 do
            τ = pop(L(P))
            for all φ ∈ N(τ) with visited(φ) = 0 do
                let T be the triangle shared by τ and φ
                if T ∉ K then
                    P = P ∪ {φ}; L(P) = L(P) ∪ {φ}; visited(φ) = 1
                endif
            endfor
        endwhile
    endif
endfor.
where N(τ) is the list of (up to) four neighbors of the tetrahedron τ. As written, this algorithm detects the pockets in the union of balls; it can easily be extended to compute their volume and surface area (as a tetrahedron is added to a pocket we also compute its contribution to the geometric measures).
The main element of this algorithm is the list L(P): it enforces spatial locality, which to a first approximation matches locality in the list of tetrahedra, resulting in a much better usage of cache; figure 7 illustrates this improvement.
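The traversal above translates almost line by line into Python. The sketch below is an illustration on a toy adjacency structure, not the UnionBall (Fortran) code: the representation of the triangulation as a list of (neighbor, face) pairs, the face labels and the function name `find_pockets` are all assumptions made for the example.

```python
def find_pockets(n_tets, neighbors, in_dual, faces_in_dual, flows_outside):
    """Group tetrahedra into pockets by a local flood fill.

    neighbors[t] lists (neighbor index, shared face label) pairs;
    in_dual and flows_outside are sets of tetrahedron indices;
    faces_in_dual is the set of triangles that belong to the dual complex K.
    """
    # visited = 1 for tetrahedra in K or flowing to the outside, 0 otherwise.
    visited = [t in in_dual or t in flows_outside for t in range(n_tets)]
    pockets = []
    for sigma in range(n_tets):
        if visited[sigma]:
            continue
        pocket = [sigma]      # P
        stack = [sigma]       # L(P)
        visited[sigma] = True
        while stack:
            tau = stack.pop()
            for phi, face in neighbors[tau]:
                # Grow the pocket only through triangles that are not in K.
                if not visited[phi] and face not in faces_in_dual:
                    visited[phi] = True
                    pocket.append(phi)
                    stack.append(phi)
        pockets.append(pocket)
    return pockets

# Toy example: six tetrahedra in a row, 0 - 1 - 2 - 3 - 4 - 5.
neighbors = {
    0: [(1, "f01")],
    1: [(0, "f01"), (2, "f12")],
    2: [(1, "f12"), (3, "f23")],
    3: [(2, "f23"), (4, "f34")],
    4: [(3, "f34"), (5, "f45")],
    5: [(4, "f45")],
}
pockets = find_pockets(
    n_tets=6,
    neighbors=neighbors,
    in_dual={5},                    # tetrahedron 5 belongs to K
    faces_in_dual={"f23", "f45"},   # f23 is a wall between two pockets
    flows_outside={0},              # tetrahedron 0 flows to the outside
)
```

Each tetrahedron is pushed on the list at most once, so the work is linear in the number of tetrahedra, and the tetrahedra handled consecutively are spatial neighbors.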
Figure 7.
The running times of AlphaVol and UnionBall for detecting and measuring voids in large biomolecules.
Characterizing the geometry of large biomolecules
UnionBall incorporates all the modifications presented above. With this new program, it takes 18.4 s and 136.5 s to fully characterize the two viruses 1ihm and 2dum, respectively; these numbers are significantly better than the corresponding computing times for AlphaVol, namely 2903 s for 1ihm and 214553 s for 2dum (see table 1).
The plots in Figure 8 show the running times per point, in milliseconds, for all four steps performed by UnionBall as a function of the virus size. All four steps show nearly constant behavior over the whole range of virus sizes. Constructing the regular triangulation is the slowest step, with an average running time of 13 μs per point. This running time compares favorably with those reported for five popular codes that compute 3D Delaunay tessellations (figures 3 and 4 in Liu and Snoeyink [67]), even after correcting for the difference in processor speed. Among these five codes, Tess3 is the fastest, with an average running time of 20 μs per point, which would correspond to 10 μs on the processor we have used; it should be mentioned however that Tess3 performs all calculations using floating-point arithmetic, resulting in some topological mistakes in a few rare cases. Filtering the regular triangulation to construct the dual complex, computing the accessible surface area and volume of the virus, and detecting and measuring the pockets require on average 3 μs, 7 μs and 5 μs, respectively.
Figure 8.
The running time of the different steps performed by UnionBall. The timings are computed on a single Intel Xeon processor running at 3.16 GHz with 6 MB of cache memory and 8 GB of RAM. UnionBall is written in Fortran, except for all calculations in arbitrary-precision arithmetic, which are performed with the C library libgmp. The program is compiled with the Intel Fortran and C compilers. Lines correspond to the averages of five runs.
4 Conclusion
The alpha shape theory provides an accurate and robust method for computing the geometric measures of a biomolecule. Among these measures, surface area and volume are used to quantify the interactions between such a biomolecule and the water surrounding it in implicit solvent models. In addition, the detection of pockets within a biomolecule and the determination of their sizes serve as a starting point for predictive studies of biomolecule-ligand interactions. Several implementations of the alpha shape theory exist, including our own, AlphaVol [57]; these implementations have mostly been tested on globular proteins or medium-size nucleic acids. In the last few years however, spectacular advances in structural biology have produced an abundance of data on large macromolecular complexes, such as the RNA polymerase transcription complexes [69], the ribosome complexes [70, 71], and full size viruses [65] that contain several millions of atoms. Modeling these large systems is as important as modeling smaller proteins or nucleic acids. We have shown in this paper that the standard implementations of the alpha shape theory fail on such large systems, or at least become impractical as their running times are then unrealistically large. We have shown also that these difficulties are not theoretical; rather, they are related to the architecture of current computers that rely on the use of cache memory based on the principle of locality to speed up calculation. By rewriting the algorithms that implement the different steps of the alpha shape theory such that we improve locality, we have shown that we can remediate these cache problems; the corresponding code, UnionBall, has an apparent O(n) behavior over a large range of values of n (up to tens of millions), where n is the number of atoms in the macromolecule.
The two critical steps for which the largest improvements are observed are the construction of the Delaunay tessellation and the detection and measurement of the pockets.
The key to improving the construction of the Delaunay complex for large sets of points was to recognize that a fully randomized algorithm is impractical as it breaks the locality principle. We have remediated this problem by biasing the order in which the points are inserted, using the simplest possible scheme, i.e. the order in which the points appear in the PDB structure file. Since biomolecules are basically long chains of monomers that are stored sequentially as they appear along the chain, this natural ordering ensures locality. Obviously, this "trick" is specific to biomolecular structural data and would not apply to generic data sets. The concept of biasing the order of insertion for constructing the Delaunay complex is however more general and has been implemented before [66, 67].
Further improvements might still be possible if we take into account the nature of the data even more. For example, all the viral capsids that were used as a test set in this study have a large empty cavity in their center (see figure 3). In the process of building the Delaunay complex for such a capsid, the algorithm will cover this cavity with a large number of complicated, elongated tetrahedra. It might be possible to generate a more regular tessellation of this cavity by introducing dummy points in the cavity, following the idea of adding points in a tetrahedral mesh to improve its quality [72]. We are currently exploring this idea.
We conclude this paper by mentioning that UnionBall is available as OpenSource software by contacting P.K.
Acknowledgments
This work derives from a long standing collaboration between P.K. and Prof. Herbert Edelsbrunner; we thank him for his mentorship and guidance. P.K. acknowledges current support from the NIH under contract GM080399.
Appendix A: Measuring the intersections of two, three and four balls
Several formulas have been presented for the volume and surface areas of the intersection of two, three and four spheres with unequal radii (see for example [30, 73, 74]). Here we propose new geometric derivations of these formulas that satisfy a specific constraint, namely we need expressions for the intersections that only depend on the radii of the spheres and the distances between their centers.
Notation
We consider up to four balls Bi, Bj, Bk and Bl whose boundaries are the spheres Si, Sj, Sk and Sl, respectively. Let zi and ri be the center and radius of ball Bi and let rij be the distance between zi and zj. The intersection between the two balls Bi and Bj is the union of two caps Ci;j and Cj;i, illustrated in red and blue respectively in figure A.1. These two caps are connected at the level of the plane that separates the Voronoi cells of Si and Sj; this plane cuts the two spheres in a circle with center yi;j and radius ri;j. We also define the height of the spherical cap Ci;j as hi;j. It can easily be shown that:
| (A.1) |
| (A.2) |
Figure A1.

Intersection of two disks.
As above,
is the surface area of the sphere Si;
,
and
are the contributions of Si to the surface areas of the intersections of Si and Sj, of Si, Sj and Sk, and of Si, Sj, Sk and Sl, respectively:
Similar expressions are used for volumes.
Intersection of two balls
Lemma 1
The intersection between two balls is illustrated in figure A.1. We have:
| (A.3) |
| (A.4) |
with hi;j defined in equation A.1.
Proof
Computing the volume and surface area of a spherical cap is a standard textbook problem that has been solved in many forms. Proofs for the formula given above can be found for example at the MathWorld web site (http://mathworld.wolfram.com/SphericalCap.html).
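As a numerical companion to Lemma 1, the sketch below evaluates the textbook spherical cap formulas (area 2πrh and volume πh²(3r − h)/3) for two intersecting balls, using only the two radii and the distance between the centers. The expression used for the cap height, r_i − (r_i² − r_j² + d²)/(2d), follows from placing the separating plane at the radical plane of the two spheres; whether equations A.1 to A.4 are written in exactly this form is an assumption, and the function names are hypothetical.

```python
import math

def cap_height(r_i, r_j, d):
    """Height of the cap of ball i cut off by the radical plane with ball j."""
    return r_i - (r_i ** 2 - r_j ** 2 + d ** 2) / (2.0 * d)

def two_ball_intersection(r_i, r_j, d):
    """Return (area on S_i, area on S_j, volume) of the intersection of two balls.

    Only the radii and the center-to-center distance d are needed.
    """
    if d >= r_i + r_j:
        return 0.0, 0.0, 0.0        # disjoint or tangent balls
    if d <= abs(r_i - r_j):
        raise ValueError("one ball contains the other; not handled in this sketch")
    h_i = cap_height(r_i, r_j, d)
    h_j = cap_height(r_j, r_i, d)
    area_i = 2.0 * math.pi * r_i * h_i
    area_j = 2.0 * math.pi * r_j * h_j
    volume = (math.pi * h_i ** 2 * (3.0 * r_i - h_i) / 3.0
              + math.pi * h_j ** 2 * (3.0 * r_j - h_j) / 3.0)
    return area_i, area_j, volume

# Two unit balls whose centers are one radius apart.
area_i, area_j, lens_volume = two_ball_intersection(1.0, 1.0, 1.0)

# Inclusion-exclusion for the union of the two balls.
union_volume = 2 * (4.0 / 3.0) * math.pi - lens_volume
```

For two unit balls at distance 1 the lens volume is 5π/12, so the union has volume 9π/4.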
Intersection of three balls
The contribution of Bi to the surface area and volume of the intersection of the three balls Bi, Bj and Bk is defined by the intersection of the two caps Ci;j and Ci;k, illustrated in red and blue respectively in panel A of figure A.2. The three spheres Si, Sj and Sk intersect in two points Pijk and Pikj. We consider the tetrahedron T3 formed by the centers of the three balls and Pijk (see panel B of figure A.2). The faces of T3 are labeled zizjzk, zizjPijk, zizkPijk and zjzkPijk with areas sP, sk, sj and si, respectively. The areas are computed using Heron’s formula (see appendix B). The dihedral angles corresponding to the edges zizj and zizk are denoted as θij;k and θik;j, respectively, while ψi is the dihedral angle corresponding to the edge ziPijk.
Figure A2.
A. Intersection of three balls. B. The core tetrahedron T that defines the intersection of the three balls. zi, zj and zk are the centers of the spheres. Pijk is one of the two points common to the three spheres; as such, it is located at distances ri, rj and rk from zi, zj and zk, respectively.
Lemma 2
The contributions of Si and Bi to the surface area and volume of the triple intersection are given by:
| (A.5) |
| (A.6) |
where the dihedral angles are computed from the edge lengths of the tetrahedron T3 (see appendix B). Formulas for the contributions of Bj and Bk to the intersection are easily deduced by index permutation on these equations.
Proof
We focus on the geometric proofs of equations A.5 and A.6.
Surface area
Let zi;j and zi;k be the points of intersection of the sphere Si bounding the ball Bi with the lines zizj and zizk, respectively; these two points can be seen as "centers" of the two caps. Pijk and Pikj are the two points that are common to all three spheres. These four points form a spherical quadrangle, with spherical angles βij;k, βik;j, αijk and αikj (see figure A.3). Note that this quadrangle is symmetric with respect to the plane formed by the centers of the three balls, which is also the plane passing through the three points zi, zi;j and zi;k. Consequently, αijk = αikj.
Figure A3.
Intersection of three spheres Bi, Bj and Bk viewed on the flattened surface of Bi. Key to our approach is the spherical quadrangle formed by the two points Pijk and Pikj that are common to the three spheres and by the ”centers” of the caps, zi;j and zi;k.
The spherical angle βij;k is the dihedral angle between the plane Δzizi;jPijk and the plane Δzizi;jPikj. Because of the symmetry with respect to the plane containing the three centers, and because zi;j belongs to the line zizj, we find βij;k = 2θij;k.
Similarly, βik;j = 2θik;j and αijk = φi.
We compute the surface area Q of this spherical quadrangle in two different ways. First, we use the formula for the area of a polygon on a sphere (R² (θ1 + … + θn − (n − 2)π), where R is the radius of the sphere, n the number of vertices in the polygon, and θi the internal angle at vertex i):
| Q = ri² (βij;k + βik;j + 2αijk − 2π) | (A.7) |
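The spherical polygon formula used here is the classical angle-excess formula, area = R²(Σθ − (n − 2)π). A short Python sketch makes it easy to check on simple cases; the function names are hypothetical, and `quadrangle_area` simply applies the formula to a symmetric quadrangle with angles β1, β2, α and α, as in the construction above.

```python
import math

def spherical_polygon_area(radius, angles):
    """Area of a spherical polygon from its internal angles (angle excess)."""
    n = len(angles)
    return radius ** 2 * (sum(angles) - (n - 2) * math.pi)

def quadrangle_area(radius, beta_1, beta_2, alpha):
    """Symmetric spherical quadrangle with angles beta_1, alpha, beta_2, alpha."""
    return spherical_polygon_area(radius, [beta_1, alpha, beta_2, alpha])

# One octant of the unit sphere: a triangle with three right angles.
octant = spherical_polygon_area(1.0, [math.pi / 2] * 3)

# A lune of opening 0.3 rad on a sphere of radius 2, seen as a degenerate
# quadrangle whose two extra vertices have a flat angle of pi.
lune = quadrangle_area(2.0, 0.3, 0.3, math.pi)
```

The octant has area π/2 (one eighth of 4π), and the lune has area 2θR² = 2.4.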
Second, we observe that the area of the quadrangle can be decomposed as:
+ the area A1 of the sector of the cap Ci;j that is delimited by the two arcs zi;jPijk and zi;jPikj,
+ the area A2 of the sector of the cap Ci;k that is delimited by the two arcs zi;kPijk and zi;kPikj,
− the area of the intersection of the two caps, as it appears twice.
Therefore
| (A.8) |
The surface areas A1 and A2 are the fractions of the surface areas of the caps Ci;j and Ci;k covered by the angles 2θij;k and 2θik;j:
| (A.9) |
where hi;j and hi;k are the heights of the two caps.
Combining equations A.7, A.8 and A.9, we validate equation A.5.
Volume
To compute the contribution of ball Bi to the volume of the intersection of the three balls, we consider the sector of Bi that joins its center zi to the portion of the sphere Si that bounds this intersection, i.e. the contribution of Si to the surface area of the triple intersection (equation A.5). The volume Vs of this sector can be computed in two different ways:
- First, the volume Vs of a sector is given as riA/3, where ri is the radius of the ball and A is the area of the sector on the surface of the ball:
| (A.10) |
- Second, the same sector can be divided into three parts (see panel A in figure A.4): two fractions of cones, Fij;k and Fik;j (filled in blue and red, respectively), and the region Bi;jk (shown in green), whose volume is the contribution we seek:
| (A.11) |
The volume of Fij;k is:
| (A.12) |
where ASij;k is the area of the base of Fij;k, i.e. the area of the disk of intersection between Bi and Bj covered by the cap Ci;k (see panel B in figure A.4). ASij;k is computed as the difference between the area of the disk sector covered by 2θik and the area of the triangle Δyi;jPijkPikj:
| (A.13) |
where ri;j is the radius of the disk (see equation A.2). Note that this formula is valid even if the disk sector covers the disk center. Similar expressions are derived for the volume of Fik;j.
Figure A4. Computing the volume of the intersection of 3 balls.
A The plane passing through the centers zi, zj and zk of the three balls. vi;j and vi;k are the distances between the center zi and the Voronoi planes separating i and j, and i and k, respectively, while yi;j and yi;k are the points of intersection between the edges zizj and zizk with these two planes. The contribution Bi;jk of Bi to the intersection of the three balls is shown in green. The sector joining zi to Bi;jk is the key to computing its volume. This sector can be divided into three parts: Bi;jk itself, and two fractions of cones Fij;k and Fik;j, filled in blue and red, respectively. B. Projected view on the plane identified with arrows on panel A, i.e. the Voronoi plane between balls Bi and Bj. The base of the fraction of cone Fij;k is shown filled in blue.
Combining equations A.10 to A.13, we validate equation A.6.
Intersection of four balls
Let Bi, Bj, Bk and Bl be the four balls with a common intersection. Their centers define a tetrahedron T4 with faces Ti, Tj, Tk and Tl and corresponding areas si, sj, sk and sl, respectively, defined such that za ∉ Ta for all a = i, j, k, l. We denote the dihedral angle of T4 between the two faces that share the edge zizj as φij. We also define the solid angle subtended by the vertex zi as Ωi. These angles can be computed from the edge lengths of the tetrahedron (see appendix B). Note that:
Lemma 3
The contribution of Bi to the surface area and volume of the intersection of the four balls is defined by the intersection of the three caps Ci;j, Ci;k and Ci;l:
| (A.14) |
| (A.15) |
where
| (A.16) |
and the angles θ have been defined above for the different intersections of three balls.
Proof
We focus on the geometric proofs of equations A.14 and A.15.
Surface area
Let us consider the spherical triangle ST = {zi;jzi;kzi;l} whose vertices are the intersections of the edges zizj, zizk and zizl with the sphere Si. The spherical angle at vertex zi;j is the dihedral angle between the planes defined by zizi;jzk and zizi;jzl; it is therefore the dihedral angle between the planes zizjzk and zizjzl, i.e. φij. Similarly, the spherical angles at vertices zi;k and zi;l are φik and φil, respectively. We compute the area Atot of the spherical triangle ST using two approaches:
- Firstly, we use the formula for the area of a spherical polygon:
| (A.17) |
which involves the surface area of sphere Si and Ωi, the solid angle subtended by the vertex zi of T4.
- Secondly, Atot is computed using an inclusion-exclusion formula: it is the sum of the surface areas Aj, Ak and Al of sectors of the three spherical caps Ci;j, Ci;k and Ci;l, minus the surface areas Ajk, Ajl and Akl of the intersections of these sectors, plus the surface area of their triple intersection (see panel B in figure A.5). The surface areas of the sectors are:
| (A.18) |
By noticing that the arc of circle zi;azi;b cuts the intersection between the two caps Ci;a and Ci;b in half for all a ≠ b = j, k, l, we get:
| (A.19) |
Finally, the surface area of the triple intersection is the contribution of Si to the surface area of the intersection of the four balls. Therefore,
| (A.20) |
Figure A5. Computing the surface area of the intersection of four spheres.
A. Projected view on the flattened surface of Si. Key to our approach is the spherical triangle ST formed by the "centers" of the caps, zi;j, zi;k and zi;l, corresponding to the points of intersection of Si with the lines zizj, zizk and zizl, respectively. B. The surface area of ST is computed using an inclusion-exclusion formula.
Combining the two equations A.17 and A.20, we validate equation A.14.
Volume
Let us now consider the sector of the ball Bi obtained by joining its center zi to the spherical triangle ST = {zi;jzi;kzi;l} (see panel A of figure A.6). Similar to the computation of the surface area, we compute the volume Vtot of this sector using two approaches:
Figure A6.
A. We consider the sector of Bi whose apex zi joins the spherical triangle ST. This sector can be divided into two parts: the region Fi (shown in grey) corresponding to the intersection of the tetrahedron T4 formed by the centers of the four balls and the three Voronoi planes that separate Bi from the three other balls, and the region Gi (light grey) that sits between the sphere Si and these three planes. The volume of Gi is computed by inclusion-exclusion (see text). Fi is the union of three pyramids, all three with apex zi; their bases lie in the three Voronoi planes; for example, the base of the pyramid corresponding to Bi and Bj is labeled A1. B. A1 is the quadrilateral defined by yi;j (the center of the disk of intersection between Bi and Bj), Dk and Dl (the projections of yi;j on the Voronoi planes between Bi and Bk and Bl, respectively), and Pijkl the Voronoi vertex dual to the tetrahedron T4.
- Firstly, the volume of a ball sector is equal to 1/3 times the radius of the ball times the surface area of the sector:
| (A.21) |
- Secondly, we divide the sector into two regions: the region Fi delimited by the tetrahedron T4 and the three Voronoi planes that separate Bi from Bj, Bk and Bl, and the region Gi delimited by these three planes and the sphere Si (panel A of figure A.6). The volume of Gi is computed using the same inclusion-exclusion formula that was used for the surface area of ST:
| (A.22) |
therefore,
| (A.23) |
Combining the two equations A.21 and A.23, we validate equation A.15.
The volume of Fi is computed as the sum of the volumes of the three pyramids with apex zi and bases on the Voronoi planes relative to Bi (see figure A.6):
| (A.24) |
The surface area of the base A1 is computed as the difference between the area of the triangles Δyi;jDlD and ΔDkDPijkl (see panel B in figure A.6):
| (A.25) |
where dk = ri;j cos θij;k and dl = ri;j cosθij;l (see figure A.4). Similar equations are derived for the areas of A2 and A3. Combining these equations with equation A.24 validates equation A.16.
Note that:
| (A.26) |
Angle weighted inclusion-exclusion formula for union of balls
Equation A.14 implies that the surface area of the intersection of four spheres is a linear combination of the surface areas of the individual spheres and of the intersections of two and three spheres, where the linear coefficients are related to the six dihedral angles of the tetrahedron formed by the centers of the four spheres. The same relationship exists for the volume of the intersection of four balls, with additional terms for the intersection of the tetrahedron formed by the centers of the four balls and their Voronoi cells. Replacing the corresponding equations (A.14 and A.15) in the Weighted Volume and Weighted Area Theorems, we get the simplified inclusion-exclusion formulas 4 and 5.
Appendix B: The geometry of a tetrahedron
Let us consider the tetrahedron T defined by the four vertices P1, P2, P3 and P4. The four faces of this tetrahedron are T1 = ΔP2P3P4, T2 = ΔP1P3P4, T3 = ΔP1P2P4, and T4 = ΔP1P2P3 and their areas are s1, s2, s3 and s4, respectively. We denote the dihedral angle between the faces Ti and Tj, for i ≠ j = 1, 2, 3, 4, as θij. The edge between Pi and Pj has length lij, for i ≠ j = 1, 2, 3, 4.
Surface area and volume
The Cayley-Menger matrix M associated with T is given by:
| (B.1) |
We also define the submatrix Mi,j of M obtained by deleting its i-th row and j-th column.
The volume of the tetrahedron T and the surface areas of its faces can be expressed in terms of the determinants of these matrices:
| (B.2) |
| (B.3) |
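For readers who want to experiment with these expressions, the sketch below computes the volume of a tetrahedron and the area of a triangle from edge lengths alone, using the standard bordered Cayley-Menger determinants (288 V² = det for a tetrahedron, −16 s² = det for a triangle). The matrix layout, with the row and column of ones placed first, is the textbook convention and is assumed here; it may differ from the arrangement of M in equation B.1. The function names are hypothetical.

```python
import math

def det(m):
    """Determinant by Laplace expansion along the first row (fine up to 5x5)."""
    if len(m) == 1:
        return m[0][0]
    total = 0.0
    for col, value in enumerate(m[0]):
        minor = [row[:col] + row[col + 1:] for row in m[1:]]
        total += (-1) ** col * value * det(minor)
    return total

def tetra_volume(l12, l13, l14, l23, l24, l34):
    """Volume of a tetrahedron from its six edge lengths."""
    a, b, c, d, e, f = (x * x for x in (l12, l13, l14, l23, l24, l34))
    cm = [[0, 1, 1, 1, 1],
          [1, 0, a, b, c],
          [1, a, 0, d, e],
          [1, b, d, 0, f],
          [1, c, e, f, 0]]
    # max(...) guards against tiny negative values caused by rounding.
    return math.sqrt(max(det(cm), 0.0) / 288.0)

def triangle_area(l12, l13, l23):
    """Area of a triangle from its three edge lengths (equivalent to Heron)."""
    a, b, d = (x * x for x in (l12, l13, l23))
    cm = [[0, 1, 1, 1],
          [1, 0, a, b],
          [1, a, 0, d],
          [1, b, d, 0]]
    return math.sqrt(max(-det(cm), 0.0) / 16.0)

regular = tetra_volume(1, 1, 1, 1, 1, 1)        # regular tetrahedron, edge 1
s2 = math.sqrt(2.0)
corner = tetra_volume(1, 1, 1, s2, s2, s2)      # corner of the unit cube
right_triangle = triangle_area(3, 4, 5)
```

Because only edge lengths enter, the same routines can be applied directly to precomputed inter-center distances.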
Dihedral angles
The well-known relationship between the volume of a tetrahedron and any of its dihedral angles [75]
| (B.4) |
cannot be used directly to compute the latter as it does not distinguish if the angle is obtuse or not. We use instead a result referred to as the law of cosine of dihedrals [76, 77]:
| (B.5) |
for 1 ≤ i < j ≤ 4. Combining these two equations, we get:
| (B.6) |
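The need for a cosine-based expression can be seen on a small example. One standard form of the volume relation is V = 2 s_a s_b sin θ / (3 l), where s_a and s_b are the areas of the two faces sharing an edge of length l and θ is the dihedral angle at that edge; whether equation B.4 is written exactly this way is an assumption. The Python sketch below, with hypothetical helper names, builds a tetrahedron with an obtuse dihedral angle from coordinates and shows that inverting the sine returns the supplementary angle.

```python
import math

def sub(p, q):
    return tuple(a - b for a, b in zip(p, q))

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def norm(u):
    return math.sqrt(dot(u, u))

def dihedral_report(p1, p2, p3, p4):
    """Dihedral angle at edge p1p2, volume, the two face areas and edge length."""
    e, u, w = sub(p2, p1), sub(p3, p1), sub(p4, p1)
    n_a = cross(e, u)                 # normal of face p1 p2 p3
    n_b = cross(e, w)                 # normal of face p1 p2 p4
    cos_theta = dot(n_a, n_b) / (norm(n_a) * norm(n_b))
    theta = math.acos(max(-1.0, min(1.0, cos_theta)))
    volume = abs(dot(e, cross(u, w))) / 6.0
    return theta, volume, norm(n_a) / 2.0, norm(n_b) / 2.0, norm(e)

# Tetrahedron whose dihedral angle at edge p1p2 is 135 degrees.
theta, volume, s_a, s_b, length = dihedral_report(
    (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, -1.0, 1.0))

# Inverting the sine relation alone cannot tell 135 degrees from 45 degrees.
theta_from_sine = math.asin(3.0 * length * volume / (2.0 * s_a * s_b))
```

Here `theta` is 3π/4 while `theta_from_sine` is π/4, which is why the sign of the cosine is needed to fix the angle.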
Derivatives of the volume of a tetrahedron
Lemma 4
Let T be a non degenerate tetrahedron whose volume is V. The derivative of V with respect to the length of the edge PaPb is given by:
| (B.7) |
Proof
The Cayley-Menger matrix M of a non degenerate tetrahedron T is invertible (if it is not invertible, its determinant is 0 and the volume of the tetrahedron is 0). Let us call M−1 the inverse of M. Using Jacobi’s formula for the differential of a determinant, we get:
| (B.8) |
where (M−1)ab is the element of the matrix M−1 at row a and column b: this element is the co-factor of M corresponding to the positions (a, b), i.e.:
| (B.9) |
Therefore,
| (B.10) |
Then we have:
| (B.11) |
Combining this equation with the equations B.4 and B.5, we validate equation B.7.
Derivatives of the surface areas of the faces of a tetrahedron
Jacobi’s formula can also be used to compute the derivatives of the surface areas of the faces of a tetrahedron, based on equation B.3. It is easier however to expand the determinant:
| (B.12) |
Then:
| (B.13) |
and
| (B.14) |
if a = i or b = i.
Derivatives of the dihedral angles
Differentiating equation B.6 with respect to the length lab of the edge PaPb, we get:
| (B.15) |
where δij;ab is 1 if the pair (i, j) is equal to the pair (a, b), and 0 otherwise.
All terms in this equation are known except for the derivatives of det (Mij). While we could use Jacobi’s formula to compute these derivatives, it is easier to expand the determinant:
| (B.16) |
Its derivatives with respect to each edge length are then straightforward.
Contributor Information
Paul Mach, Email: mach@math.ucdavis.edu, Graduate Group in Applied Mathematics, University of California, Davis, CA 95616.
Patrice Koehl, Email: koehl@cs.ucdavis.edu, Department of Computer Science and Genome Center, University of California, Davis, CA 95616.
References
- 1. Eisenberg D, McLachlan AD. Nature (London) 1986;319:199–203. doi: 10.1038/319199a0.
- 2. Ooi T, Oobatake M, Nemethy G, Scheraga HA. Proc Natl Acad Sci (USA) 1987;84:3086–3090. doi: 10.1073/pnas.84.10.3086.
- 3. Liang J, Edelsbrunner H, Fu P, Sudhakar PV, Subramaniam S. Proteins: Struct Func Genet. 1998;33:1–17.
- 4. Liang J, Edelsbrunner H, Fu P, Sudhakar PV, Subramaniam S. Proteins: Struct Func Genet. 1998;33:18–29.
- 5. Lee B, Richards FM. J Mol Biol. 1971;55:379–400. doi: 10.1016/0022-2836(71)90324-x.
- 6. Wood RH, Thompson PT. Proc Natl Acad Sci (USA) 1990;87:946–949. doi: 10.1073/pnas.87.3.946.
- 7. Tunon I, Silla E, Pascual-Ahuir JL. Protein Eng. 1992;5:715–716. doi: 10.1093/protein/5.8.715.
- 8. Simonson T, Brünger AT. J Phys Chem. 1994;98:4683–4694.
- 9. Lum K, Chandler D, Weeks JD. J Phys Chem B. 1999;103:4570–4577.
- 10. Wagoner J, Baker N. Proc Natl Acad Sci (USA) 2006;103:8331–8336. doi: 10.1073/pnas.0600118103.
- 11. Shrake A, Rupley JA. J Mol Biol. 1973;79:351–371. doi: 10.1016/0022-2836(73)90011-9.
- 12. Legrand SM, Merz KM. J Comp Chem. 1993;14:349–352.
- 13. Wang H, Levinthal C. J Comp Chem. 1991;12:868–871.
- 14. Futamura N, Alura S, Ranjan D, Hariharan B. IEEE Trans Parallel Dist Syst. 2004;13:544–555.
- 15. Rowlinson JS. Mol Phys. 1963;6:517–524.
- 16. Pavani R, Ranghino G. Computers and Chemistry. 1982;6:133–135.
- 17. Gavezzotti A. J Am Chem Soc. 1983;105:5220–5225.
- 18. Till M, Ullmann GM. J Mol Model. 2010;16:419–429. doi: 10.1007/s00894-009-0541-y.
- 19. Wodak SJ, Janin J. Proc Natl Acad Sci (USA) 1980;77:1736–1740. doi: 10.1073/pnas.77.4.1736.
- 20. Hasel W, Hendrikson TF, Still WC. Tetrahed Comp Method. 1988;1:103–106.
- 21. Cavallo LJK, Fraternali F. Nucl Acids Res. 2003;31:3364–3366. doi: 10.1093/nar/gkg601.
- 22. Street AG, Mayo SL. Folding & Design. 1998;3:253–258. doi: 10.1016/S1359-0278(98)00036-4.
- 23. Weiser J, Shenkin PS, Still WC. J Comp Chem. 1999;20:217–230.
- 24. Dynerman D, Butzlaff E, Mitchell J. J Comput Biol. 2009;16:523–537. doi: 10.1089/cmb.2008.0157.
- 25. Richmond TJ. J Mol Biol. 1984;178:63–89. doi: 10.1016/0022-2836(84)90231-6.
- 26. Connolly ML. J Am Chem Soc. 1985;107:1118–1124.
- 27. Dodd LR, Theodorou DN. Mol Phys. 1991;72:1313–45.
- 28. Petitjean M. J Comp Chem. 1994;15:507–523.
- 29. Irisa M. Comp Phys Comm. 1996;98:317–338.
- 30. Gibson KD, Scheraga HA. Mol Phys. 1987;62:1247–1265.
- 31. Kratky KW. J Phys A: Math Gen. 1978;11:1017–1024.
- 32. Edelsbrunner H. Discrete Comput Geom. 1995;13:415–440.
- 33. Kundrot CE, Ponder JW, Richards FM. J Comp Chem. 1991;12:402–409.
- 34. Gogonea V, Osawa E. J Mol Struct (Theochem) 1994;311:305–324.
- 35. Gogonea V, Osawa E. J Comp Chem. 1995;16:817–842.
- 36. Cossi M, Mennucci B, Cammi R. J Comp Chem. 1996;17:57–73.
- 37. Perrot G, Cheng B, Gibson KD, Vila J, Palmer KA, Nayeem A, et al. J Comp Chem. 1992;13:1–11.
- 38. Sridharan S, Nicholls A, Sharp KA. J Comp Chem. 1994;16:1038–1044.
- 39. Wawak RJ, Gibson KD, Scheraga HA. J Math Chem. 1994;15:207–232.
- 40. Grant JA, Pickup BT. J Phys Chem. 1995;99:3503–3510.
- 41. Weiser J, Shenkin PS, Still WC. J Comp Chem. 1999;20:688–703. doi: 10.1002/(SICI)1096-987X(199905)20:7<688::AID-JCC4>3.0.CO;2-F.
- 42. Edelsbrunner H. Discrete Comput Geom. 1999;21:87–115.
- 43. Levitt D, Banaszak L. J Mol Graph. 1992;10:229–234. doi: 10.1016/0263-7855(92)80074-n.
- 44. Hendlich M, Rippmann F, Barnickel G. J Mol Graph Model. 1997;15:359–363. doi: 10.1016/s1093-3263(98)00002-3.
- 45. Venkatachalam C, Jiang X, Oldfield T, Waldman M. J Mol Graph Model. 2003;21:289–307. doi: 10.1016/s1093-3263(02)00164-x.
- 46. Weisel M, Proschak E, Schneider G. Chem Central J. 2007;1:7. doi: 10.1186/1752-153X-1-7.
- 47. Laskowski R. J Mol Graph. 1995;13:323–330. doi: 10.1016/0263-7855(95)00073-9.
- 48. Brady G, Stouten P. J Comput Aided Mol Des. 2000;14:383–401. doi: 10.1023/a:1008124202956.
- 49. Kawabata T, Go N. Proteins: Struct Func Genet. 2007;68:516–529. doi: 10.1002/prot.21283.
- 50. Yu J, Zhou Y, Tanaka I, Yao M. Bioinformatics. 2010;26:46–52. doi: 10.1093/bioinformatics/btp599.
- 51. Edelsbrunner H, Facello MA, Liang J. Discrete Appl Math. 1998;88:83–102.
- 52. Liang J, Edelsbrunner H, Woodward C. Prot Sci. 1998;7:1884–1897. doi: 10.1002/pro.5560070905.
- 53. Yaffe E, Fishelovitch D, Wolfson H, Halperin D, Nussinov R. Nucl Acids Res. 2008;36:W210–W215. doi: 10.1093/nar/gkn223.
- 54. Busa J, Hayryan S, Hu C-K, Skrivanek J, Wu M-C. J Comp Chem. 2009;30:346–357. doi: 10.1002/jcc.21060.
- 55. Edelsbrunner H, Koehl P. Proc Natl Acad Sci (USA) 2003;100:2203–2208. doi: 10.1073/pnas.0537830100.
- 56. Bryant R, Edelsbrunner H, Koehl P, Levitt M. Discrete Comput Geom. 2004
- 57. Edelsbrunner H, Koehl P. Discrete and Computational Geometry (MSRI Publications) 2005;52:243–275.
- 58. Dundas J, Ouyang Z, Tseng J, Binkowski A, Turpaz Y, Liang J. Nucl Acids Res. 2006;34:W116–W118. doi: 10.1093/nar/gkl282.
- 59. Naiman D, Wynn H. Annals of Stat. 1992:43–76.
- 60. Edelsbrunner H, Mücke EP. ACM Trans Graphics. 1990;9:66–104.
- 61. Attali D, Edelsbrunner H. Discrete Comput Geom. 2007;37:59–77.
- 62. Edelsbrunner H, Mücke EP. ACM Trans Graphics. 1994;13:43–72.
- 63. Edelsbrunner H, Shah NR. Algorithmica. 1996;15:223–241.
- 64. Edelsbrunner H. Weighted alpha shapes Technical Report UIUC-CS-R-92-1760. Comput. Sci. Dept., Univ. Illinois; Urbana, Illinois: 1992.
- 65. Carrillo-Tripp M, Shephered C, Borelli I, Venkataram S, Lander G, Natarajan P, Johnson J, III, CB, Reddy V. Nucl Acids Res. 2009;37:D436–D442. doi: 10.1093/nar/gkn840.
- 66. Amenta N, Choi S, Rote G. Proc. 19th ACM Sympos. Comput. Geom; 2003. pp. 211–219.
- 67. Liu Y, Snoeyink J. Discrete and Computational Geometry (MSRI Publications) 2005;52:439–458.
- 68. Fortune S, VanWyk CJ. ACM Trans Graph. 1996;15:223–248.
- 69. Cramer P, Bushnell DA, Kornberg RD. Science. 2001;292:1863–1876. doi: 10.1126/science.1059493.
- 70. Wimberly BT, Brodersen DE, Clemons WM, Jr, Morgan-Warren RJ, Carter AP, Vonrhein C, et al. Nature (London) 2000;407:327–39. doi: 10.1038/35030006.
- 71. Ban N, Nissen P, Hansen J, Moore PB, Steitz TA. Science. 2002;289:905–20. doi: 10.1126/science.289.5481.905.
- 72. Shewchuk J. Proc. 14th Ann. Sympos. Comput. Geom; 1998. pp. 86–95.
- 73. Gibson KD, Scheraga HA. Mol Phys. 1988;64:641–644.
- 74. Edelsbrunner H, Fu P. Measuring space filling diagrams and voids Technical Report UIUC-BI-MB-94-01. Beckman Inst., Univ. Illinois; Urbana, Illinois: 1994.
- 75. Lee J. J Korea Soc Math Educ Ser B: Pure Appl Math. 1997;4:1–6.
- 76. Yang L, Zhang J. Metric equations in geometry and their applications Technical Report IC/89/281. International Center for Theoretical Physics; Trieste, Italy: 1989.
- 77. Yang L, Zeng Z. In: Proc ADG2006, LNAI. Botana F, Recio T, editors. Vol. 4869. 2007. pp. 203–211.
- 78. Zhang W, Mukhopadhyay S, Pletnev S, Baker T, Kuhn R, Rossmann M. J Virol. 2002;76:11645–11658. doi: 10.1128/JVI.76.22.11645-11658.2002.