Computing the Volume, Surface Area, Mean, and Gaussian Curvatures of Molecules and Their Derivatives

Patrice Koehl; Arseniy Akopyan; Herbert Edelsbrunner

doi:10.1021/acs.jcim.2c01346

2023 Jan 13;63(3):973–985. doi: 10.1021/acs.jcim.2c01346

PMCID: PMC9930125 PMID: 36638318

Abstract

Geometry is crucial in our efforts to comprehend the structures and dynamics of biomolecules. For example, volume, surface area, and integrated mean and Gaussian curvature of the union of balls representing a molecule are used to quantify its interactions with the water surrounding it in the morphometric implicit solvent models. The Alpha Shape theory provides an accurate and reliable method for computing these geometric measures. In this paper, we derive homogeneous formulas for the expressions of these measures and their derivatives with respect to the atomic coordinates, and we provide algorithms that implement them in a new software package, AlphaMol. The only variables in these formulas are the interatomic distances, making them insensitive to translations and rotations. AlphaMol includes a sequential algorithm and a parallel algorithm. In the parallel version, we partition the atoms of the molecule of interest into 3D rectangular blocks, using a kd-tree algorithm. We then apply the sequential algorithm of AlphaMol to each block, augmented by a buffer zone to account for atoms whose ball representations may partially cover the block. The current parallel version of AlphaMol leads to a 20-fold speed-up compared to an independent serial implementation when using 32 processors. For instance, it takes 31 s to compute the geometric measures and derivatives of each atom in a viral capsid with more than 26 million atoms on 32 Intel processors running at 2.7 GHz. The presence of the buffer zones, however, leads to redundant computations, which ultimately limit the impact of using multiple processors. AlphaMol is available as open-source software.

Introduction

Biological nanomachines, such as proteins and nucleic acids, are essential for all cellular functions due to their abilities to store information, to provide transport into and out of the cell, to catalyze chemical reactions, and to recognize and interact with ligands, among other things. Their functions are believed to be intimately related to their shapes (referred to as structures), as well as to the dynamics of these shapes. Our current knowledge of the structures and dynamics of large biomolecules remains inadequate. This is because only a few experimental techniques have the ability to gather structural data that are resolved in time, and those that can are typically constrained to small length scales and to short time windows. Recently, new algorithms have been proposed for predicting the structures of proteins that have reached significant success. However, predicting and analyzing the dynamics of such structures are tasks that are still limited in scope, both with respect to time scales (usually microseconds to milliseconds) and length scales (several nanometers for systems of up to a hundred thousand atoms).

With the success of AlphaFold¹ and its successor AlphaFold2,² artificial intelligence has stormed into structural molecular biology in recent years.^3,4 AlphaFold is software designed by the company DeepMind to predict the structure of a protein based on its sequence only. AlphaFold has refined numerous deep learning techniques to predict these structures at near experimental-scale resolution, inspiring experimental structural biologists to rethink the way they study the function and evolution of proteins, as well as their impact on diseases.⁵⁻⁷ AlphaFold’s achievement has been made possible by the wealth of information present in the Protein Data Bank,⁸ the database of experimentally determined protein structures (approximately 200,000 as of October 2022). In return, AlphaFold allowed for the prediction of millions of previously unknown protein structures, all available in an open database.⁹ There are, however, limitations to AlphaFold,⁷ which only generates single-ranked conformations for a protein. As such, it is currently unable to provide information on ensembles of conformations for a protein, which may arise if this protein is intrinsically disordered. AlphaFold does not solve the protein folding problem as it is inherently static. It does not capture conformational mechanisms such as allostery. The study of these mechanisms still relies on simulations of molecular dynamics.

The standard approach to simulating the dynamics of a biomolecule is to solve numerically the Newton equations associated with all its atoms. The step size in time required for finding accurate solutions to those differential equations is extremely small (on the order of a femtosecond), leading to the need to compute the energy of the molecular system under study a large number of times. One evaluation of the energy is of order O(N log N), with N being the total number of atoms in the system (including the water molecules in the environment of the biomolecule). For large values of N, say in the millions, such a calculation and more importantly its repeats become computationally prohibitive. While it is possible to design hardware that is specific to such calculations and while many efforts are underway to improve the software that implements them¹⁰⁻¹⁶ (currently allowing for molecular dynamics simulations of systems with up to 100 million atoms¹⁷⁻²¹), parallel efforts are put into developing simplified models in which the number of atoms is reduced to make the calculation more tractable. Of particular interest is to replace the explicit solvent with a potential of mean force that mimics its effect on the molecule. This is akin to deriving and computing a solvation free energy, W_sol, for the biomolecule. How to compute the nonpolar component of this solvation free energy is the topic of this paper. It is noteworthy that current versions of AlphaFold focus on the conformation of the protein alone, independent of its environment. Inclusion of solvation free energy to further refine the prediction is likely to be an addition in newer versions of the software, reinforcing the need to derive accurate and robust methods for computing such solvation free energies.

A Morphometric Approach to the Nonpolar Solvation Free Energy

The solvation free energy W_sol(X) of a biomolecule with conformation X is set to capture the presence of a cavity within the solvent that enables it to accommodate the biomolecule and the vdW interactions between the water molecules and the atoms at the surface of the biomolecule, as well as the interactions between the charged atoms of the biomolecule in the presence of water. The first two contributions define the nonpolar effect, W_np, while the third one captures the polar effect, W_pol. These effects are additive, namely, W_sol = W_np + W_pol. They can be computed individually with the help of a thermodynamic cycle, as illustrated in Figure 1.

Figure 1. Computing the solvation free energy of a biomolecule. The solvation free energy, W_sol, is defined as a mean force potential that quantifies the energy that is required to solvate a molecule. It consists of two parts: (i) a polar contribution, W_pol, which accounts for the effects of the solvent on the charges of the biomolecule and (ii) a nonpolar contribution, W_np, which accounts for the formation of a hole within the solvent so that it can fit the biomolecule as well as for the vdW interactions between the biomolecule and the solvent (at the surface of the biomolecule). These two parts are best described with a thermodynamic cycle. First, the charges on the biomolecule (symbolized by the red balls) are neutralized in vacuo. The corresponding free energy cost is referred to as W_ch^vac. Second, the corresponding neutral molecule is solvated, with a cost W_np. Finally, the charges are added back to the molecule, now in solution, with an energetic cost W_ch. The solvation free energy is the sum of those three contributions (red arrow).

Eisenberg and McLachlan²² proposed an atomic breakdown of the computation of the nonpolar part of the solvation free energy of a biomolecule. In their model, each atom i is represented with its accessible surface area, ASA_i,²³ which is then scaled with a surface tension factor referred to as atomic solvation parameter, or ASP_i, such that

$$W_{np} = \sum_{i} \mathrm{ASP}_i \, \mathrm{ASA}_i \qquad (1)$$

The ASP is a signed number, positive for nonpolar atoms (large accessible surface areas are then penalized for such nonpolar atoms) and negative for polar atoms (i.e., favoring large accessible surface areas for them). This surface-only model, referred to as SA, is supported indirectly with the observation that the Gibbs free energy for transferring small compounds from nonaqueous liquids to water is linearly related to their accessible surface area. SA has become the preferred approach for studying the dynamics of a biomolecule with an implicit solvent, in conjunction with Poisson–Boltzmann (PBSA) or generalized Born (GBSA).²⁴ It is worth recalling, however, that W_np accounts for two effects, namely, the hole formation in the solvent and the vdW interactions between the atoms of the biomolecule and the solvent molecules. While the latter occurs near the boundary between the biomolecule and the solvent and is therefore proportional to the accessible surface area of the molecule, the former is proportional to the volume of that molecule. This apparent contradiction between a surface-area-only model and the fact that W_np includes a volume-based contribution currently fuels a debate on the geometric nature of W_np. Lum, Chandler, and Weeks, for example, have shown that W_np scales with the volume of the solute for small solutes, is proportional to the surface area for very large solutes, and should consider both geometric measures in between.²⁵ This idea that W_np for a molecular system depends on surface area and volume is derived from scaled particle theory.²⁶⁻²⁸ It was shown to be a better representation of solvent effects than a surface-based solvation free energy.^29,30 However, even a combined surface area and volume representation for W_np seems insufficient to represent the length-scale dependence of this energy.³¹ More recently, W_np has been expressed as a linear combination of the four morphometric measures of the molecule:³²

$$W_{np} = pV + \sigma A + k_1 M + k_2 G \qquad (2)$$

In this equation, V, A, M, and G are the volume, surface area, mean curvature, and Gaussian curvature of the molecular system, while p, σ, k₁, and k₂ are the pressure, the surface tension, and the two bending rigidity parameters. This is under the assumption that the solvation free energy satisfies (i) motion invariance, namely, independence with respect to the location and orientation of the molecular system in space, (ii) continuity, basically that thermodynamics can be expressed in terms of geometry (a condition that is only violated for systems whose size is similar to the size of the solvent), and (iii) additivity, i.e., that the energy of the union of two domains is the sum of the energies of the single domains minus the energy of their intersection. This model has proved useful to study the solvation of proteins and their ligands.³³⁻³⁵

All the solvation models presented above consistently associate the nonpolar contribution to the solvation free energy of a molecule with the geometry of this molecule (the continuity assumption). In what follows, we discuss how different measures of this geometry, more specifically volume, surface area, mean curvature, and Gaussian curvature, and their positional derivatives can be computed efficiently, even for very large molecular systems.

Computing Geometric Measures of Biomolecules

A union of balls, with each ball representing an atom, is a typical geometric representation of a molecule. Lee and Richards³⁶ developed the first approach to computing the accessible surface area of a protein represented by such a union by first cutting it with a set of parallel planes. Shrake and Rupley³⁷ proposed instead a Monte Carlo numerical integration method to compute regions of the surfaces of atoms that are accessible. Many efficient implementations of this method have been proposed, including the use of look-up tables,³⁸ as well as of algorithms that make use of the parallel architecture of computer central processing units (CPU).^39,40 All those methods have been expanded to also compute the volume of a union of balls.⁴¹⁻⁴⁴
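To make the numerical integration idea concrete, the following is a minimal sketch in the spirit of the point-sampling method of Shrake and Rupley. It is not the implementation of any of the cited packages: the function names, the golden-spiral point layout (used here in place of random sampling), and the default probe radius and point count are all assumptions made for illustration.

```python
import math

def sphere_points(n):
    """Return n roughly evenly spaced unit vectors (golden-spiral layout)."""
    golden = math.pi * (3.0 - math.sqrt(5.0))
    pts = []
    for k in range(n):
        z = 1.0 - (2.0 * k + 1.0) / n
        rho = math.sqrt(max(0.0, 1.0 - z * z))
        phi = k * golden
        pts.append((rho * math.cos(phi), rho * math.sin(phi), z))
    return pts

def accessible_areas(centers, radii, probe=1.4, n_points=500):
    """Estimate the accessible surface area of each ball by point sampling.

    Each radius is enlarged by the probe radius; a test point on ball i is
    counted as exposed if it lies strictly inside no other enlarged ball.
    The cost is quadratic in the number of balls (no neighbor list).
    """
    unit = sphere_points(n_points)
    big = [r + probe for r in radii]
    areas = []
    for i, (ci, ri) in enumerate(zip(centers, big)):
        exposed = 0
        for u in unit:
            p = (ci[0] + ri * u[0], ci[1] + ri * u[1], ci[2] + ri * u[2])
            buried = any(
                j != i and math.dist(p, centers[j]) < big[j]
                for j in range(len(centers))
            )
            if not buried:
                exposed += 1
        areas.append(4.0 * math.pi * ri * ri * exposed / n_points)
    return areas
```

The estimate converges only slowly as the number of test points grows, and it is not differentiable with respect to the centers, which is the limitation that motivates analytical methods.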

Numerical integration techniques, while practical, are not accurate, and more importantly do not easily provide derivatives for the quantities that they compute. This is true for the computations of surface area of a union of balls described above. Analytical alternatives have been proposed, although computing geometric measures of overlapping balls is not an easy task. Approximations have been proposed that treat overlapping balls using a probabilistic model⁴⁵⁻⁴⁷ or by fully ignoring them.^48,49 Such approximations are ideally suited for the parallel architecture of graphics processing units (GPU).⁵⁰ They remain approximations, however, that are prone to singularities introduced by numerical errors or by discontinuities in the derivatives.^51,52 Better analytical methods refine the geometric representation of the molecule, considered as a union of pieces of balls.⁵³⁻⁵⁸

This Work

To model the nonpolar contribution to the solvation energies of a biomolecular system, in support of the morphometric model described by eq 2, we consider weighted versions of four measures of the geometry of the union of balls, namely, the volume, surface area, integrated mean curvature, and integrated Gaussian curvature, as well as their derivatives with respect to the positions of the ball centers. This paper presents an extension of a large body of work in which these measures and their derivatives have been characterized before in the context of the Voronoi decomposition of a space filling diagram.⁵⁹⁻⁶⁴ Its contributions are two-fold. First, we present comprehensive and consistent sets of equations for the expression of the four measures and their derivatives in intrinsic geometry, i.e., as functions of the distances between the centers of the atoms only. Second, we establish parallel algorithms for computing the measures and their derivatives, targeting very large molecular systems with millions of atoms. We use viral capsids to illustrate the performances of these algorithms. Information about the structures of such capsids was recovered from the Protein Data Bank⁸ or from the VIPER database.⁶⁵

Outline

The next section, Measuring Union of Balls, provides a brief description of the Alpha Shape theory and its application to measuring a union of balls. It includes subsections that provide expressions for the weighted volume, surface area, mean curvature, and Gaussian curvature and their derivatives, all expressed in terms of the distances between the centers of the balls. The explicit constituents for those expressions are provided in the Supporting Information, parts A–D. The following section, Algorithm and Implementation, describes our parallel implementation of this theory. It includes testing on a set of large virus capsids. The last section concludes the paper.

Measuring Union of Balls

Given a collection of N 3-dimensional bodies, P_i, any geometric measure of the union of the P_i can be derived from the principle of inclusion–exclusion. That is, a measure of the union, ∪_iP_i, is expressed as an alternating sum of the measures of the intersections of the P_i. To make the inclusion–exclusion formula amenable to computation, however, two issues need to be solved. First, we need to reduce significantly the number of terms it includes, because a brute-force application of the formula leads to an algorithm with exponential running time: the total number of potential intersections of the P_i is 2^N – 1. Second, we need analytical formulas for computing the measures of those intersections of bodies. The next three subsections provide solutions to these two issues when the bodies are 3D balls.
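As a small worked illustration of the inclusion–exclusion principle, consider the simplest nontrivial case of two balls, for which the alternating sum has only three terms: vol(B_1 ∪ B_2) = vol(B_1) + vol(B_2) − vol(B_1 ∩ B_2). The sketch below uses the classical closed form for the volume of the lens shared by two balls; the function names are hypothetical and are not taken from AlphaMol.

```python
import math

def ball_volume(r):
    """Volume of a ball of radius r."""
    return 4.0 / 3.0 * math.pi * r ** 3

def lens_volume(r1, r2, d):
    """Volume of the intersection of two balls whose centers are d apart."""
    if d >= r1 + r2:            # disjoint (or tangent) balls
        return 0.0
    if d <= abs(r1 - r2):       # one ball contains the other
        return ball_volume(min(r1, r2))
    return (math.pi * (r1 + r2 - d) ** 2
            * (d * d + 2.0 * d * (r1 + r2) - 3.0 * (r1 - r2) ** 2)
            / (12.0 * d))

def union_volume_two_balls(r1, r2, d):
    """Inclusion-exclusion with a single pairwise intersection term."""
    return ball_volume(r1) + ball_volume(r2) - lens_volume(r1, r2, d)
```

Note that the result depends only on the two radii and the distance between the centers, which is the kind of intrinsic formulation pursued in the rest of this section.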

Background on Voronoi Decompositions and Dual Complexes

Let us consider a finite set of closed balls, B_i, with centers z_i and radii r_i, and let S_i be the sphere that is the boundary of B_i. We define the power distance between a point x and a ball B_i as $\pi_i(x) = \lVert x - z_i \rVert^2 - r_i^2$. The Voronoi region of B_i includes all points x that are at least as close to B_i as to any other ball: $V_i = \{ x \in \mathbb{R}^3 \mid \pi_i(x) \le \pi_j(x) \text{ for all } j \}$. It is a convex polyhedron obtained as the common intersection of finitely many closed half-spaces, one per ball B_j ≠ B_i. The collection of all Voronoi regions, V_i, is the Voronoi diagram of the balls. Note that their union covers the entire space. The intersection of the Voronoi diagram with the union of balls B_i decomposes this union into convex regions, as shown in Figure 2A.
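The following short sketch illustrates these definitions, assuming the standard form of the power distance, namely the squared Euclidean distance to the center minus the squared radius. The function names are hypothetical; a brute-force scan over all balls is used purely for clarity.

```python
def power_distance(x, center, radius):
    """Squared distance from x to the ball center, minus the squared radius.

    Negative inside the ball, zero on its bounding sphere, positive outside.
    """
    return sum((a - b) ** 2 for a, b in zip(x, center)) - radius ** 2

def voronoi_owner(x, centers, radii):
    """Index of the ball whose (power) Voronoi region contains x.

    Ties are broken in favor of the lowest index.
    """
    return min(range(len(centers)),
               key=lambda i: power_distance(x, centers[i], radii[i]))
```

For two balls of radii 2 and 1 centered at (0, 0, 0) and (3, 0, 0), the point (1.8, 0, 0) is closer to the second center in the Euclidean sense, yet it belongs to the Voronoi region of the first ball because the larger radius shifts the separating plane.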

Figure 2. Voronoi decomposition and dual complex of a union of disks. (A) Given a finite set of disks, the Voronoi diagram corresponds to a decomposition of the whole plane into regions, one for each disk, such that any point that belongs to the region corresponding to disk D_i is closer to that disk than to any other disk, with the distance to D_i being the power distance (see text for details). In the graphics, we have restricted the Voronoi diagram to the region covered by the disks. This defines a decomposition of the union of disks into convex regions. (B) The Delaunay triangulation is the dual of the Voronoi diagram that is constructed by defining edges between disk centers of neighboring Voronoi regions. (C) The dual complex is a subset of the Delaunay triangulation, limited to the edges and triangles (dark red) whose corresponding Voronoi regions fully intersect within the union of disks.

The Delaunay triangulation is the dual of the Voronoi diagram. It is obtained by defining an edge between the centers of the balls B_i and B_j if and only if the two corresponding Voronoi regions share a common face. In addition, we generate a triangle connecting z_i, z_j, z_k if V_i, V_j, V_k intersect in a common line segment, and we generate a tetrahedron connecting z_i, z_j, z_k, z_l if V_i, V_j, V_k, V_l meet at a common point. A 2D version of the Delaunay triangulation is illustrated in Figure 2B. Assuming general positions of the balls, those are the only cases to be considered. We call this the generic case. This generic case is rare in practical implementations because of finite precision for the computer representations of the coordinates and radii. It is, however, possible to simulate a perturbation of the union of balls that always restores the generic case.⁶⁶

Next, we limit the construction of the Delaunay triangulation to within the union of balls. In other words, we draw a dual edge between the two vertices, z_i and z_j, only if B_i∩V_i and B_j∩V_j share a common face, and similarly for triangles and tetrahedra. The result is a subcomplex of the Delaunay triangulation, which is referred to as the dual complex of the set of balls (see Figure 2C). Our objective is to use the dual complex of the union of balls X = ∪_iB_i corresponding to a biomolecule to compute its nonpolar solvation free energy, which is expressed in a general form as

$$W_{np}(X) = V_w + A_w + M_w + G_w = \sum_i \left( a_i V_i + b_i A_i + c_i M_i + d_i G_i \right) \qquad (3)$$

Here, V_w, A_w, M_w, and G_w are the total weighted volume, weighted surface area, integrated weighted mean curvature, and integrated weighted Gaussian curvature of the union of balls. The a_i, b_i, c_i, and d_i are weights, while the V_i, A_i, M_i, and G_i are the contributions of ball i to the total corresponding measure of the union of balls. The sum extends over all atoms of this union. The Voronoi decomposition of the union of balls described above allows us to compute the different terms in this equation based on intersections of up to four balls only.
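Once the per-ball contributions are known, assembling the weighted measures is a plain weighted sum. The sketch below illustrates this bookkeeping only; it does not compute the contributions of overlapping balls. The function names are hypothetical, and the values used for an isolated ball assume the convention in which the mean curvature of a sphere of radius r is 1/r (other normalizations differ by constant factors).

```python
import math

def isolated_ball_measures(r):
    """(V, A, M, G) of a single ball that overlaps no other ball.

    Assumed convention: mean curvature 1/r and Gaussian curvature 1/r^2,
    integrated over the sphere of area 4*pi*r^2.
    """
    return (4.0 / 3.0 * math.pi * r ** 3,  # volume
            4.0 * math.pi * r ** 2,        # surface area
            4.0 * math.pi * r,             # integrated mean curvature
            4.0 * math.pi)                 # integrated Gaussian curvature

def nonpolar_energy(contribs, weights):
    """Weighted sum of per-ball measures.

    contribs[i] = (V_i, A_i, M_i, G_i); weights[i] = (a_i, b_i, c_i, d_i).
    Returns the total and the tuple (V_w, A_w, M_w, G_w).
    """
    totals = [0.0, 0.0, 0.0, 0.0]
    for measures, w in zip(contribs, weights):
        for k in range(4):
            totals[k] += w[k] * measures[k]
    return sum(totals), tuple(totals)
```

Setting every weight tuple to the same values (p, σ, k₁, k₂) recovers the unweighted morphometric form, in which the four parameters multiply the total volume, area, and integrated curvatures.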

Area and Volume Formulas

Write K for the dual complex. A simplex, s, in K can be understood abstractly as a collection of balls: one ball if it is a vertex, two if it is an edge, three if it is a triangle, and four if it is a tetrahedron. As proved in ref (59), the inclusion–exclusion formula that corresponds to the dual complex gives the correct volume and surface area of a union of balls. Let s_i be the vertex corresponding to the ball B_i, s_ij the edge of balls B_i and B_j, s_ijk the triangle of balls B_i, B_j, B_k, and finally, s_ijkl the tetrahedron of four balls, B_i, B_j, B_k, and B_l. Then:

Proposition 1: (Area)

Proposition 2: (Volume)

Here, the terms denote, in order, the volume of the ball B_i, the contribution of B_i to the volume of the intersection of the balls B_i and B_j, etc. Similar definitions are used for the surface areas.

Note that even though the eqs 4 and 6 for the surface area and volume are minimal as they only consider up to four levels in the inclusion–exclusion formula, it is possible to find even shorter expressions if noninteger coefficients are considered. Those expressions correspond to the short inclusion–exclusion method; it is described in detail in ref (59). In this method, the areas and volumes are expressed as the sums of the contributions of intersections of at most three balls, with angular coefficients γ_i, γ_ij, and γ_ijk, with the exception of the term vol F_i;jkl (a fraction of the Voronoi region of B_i; see Supporting Information, part A). These coefficients γ are the normalized exposed angles of the simplices;⁶⁰ they integrate the contributions of the tetrahedra of the dual complex. For vertices and edges, these angles can be expressed as fractions of solid and dihedral angles inside tetrahedra. If we define Ω_i;jkl as the solid angle at vertex z_i and ϕ_ij;kl as the dihedral angle at the edge z_iz_j in the tetrahedron defined by z_i, z_j, z_k, z_l, the coefficients are

Expressions for the derivatives with respect to the Cartesian coordinates of the centers of the balls are available for the surface area⁶¹ and for the volume.⁶⁰ Alternate expressions are available for the same derivatives with respect to the distances between the centers of these balls.⁶⁷ Note that these distances define internal coordinates for the system, which are invariant under rigid body transformations (rotations and translations). We recall those derivatives here:

Proposition 3: (Area derivative)

Proposition 4: (Volume derivative)

The derivatives of the surface area and volume are expressed in eqs 11 and 12, respectively. They are derived from the corresponding simplified, angle-weighted inclusion–exclusion eqs 5 and 7, respectively. Note that there are no derivatives of the volume and the surface area of a full ball B_i, which are constant, and that there are no terms involving the derivatives of γ_ijk: these derivatives are piecewise zero because the γ_ijk are piecewise constant. Their values change at nongeneric states, where their derivatives are not defined.^60,61 Finally, we note that the derivatives of A_i and V_i with respect to the distance r_ab between the centers z_a and z_b of the two balls B_a and B_b are nonzero if and only if i, a, and b belong to a simplex of K.

Proofs of eqs 4, 5, 6, 7, 11, and 12 and additional formulas are provided in refs (59−61 and 67). We summarize them in Supporting Information, part A, for the sake of completeness.

Mean Curvature Formulas

Akopyan and Edelsbrunner recently derived theorems for computing the integrated mean curvature over the surface of a union of balls using the dual complex.⁶³ They distinguish between two terms: the contribution of the spherical patches and the contribution of the accessible circular arcs at the intersections of two spheres. Along these circular arcs, the mean curvature is partitioned equally between the two spheres involved. This leads to the following formula for the mean curvature and its derivatives in terms of the edge lengths in the dual complex:

Proposition 5: (Mean curvature)

Proposition 6: (Mean curvature derivative)

In addition to the contribution A_i of the sphere S_i to the total surface area of the union of balls, these equations involve three new terms, r_ij, α_ij, σ_ij, all associated with two balls B_i and B_j that form a simplex of the dual complex. The spheres S_i and S_j that bound those balls intersect at a circle S_ij; r_ij is the radius of this circle, and α_ij is the angle between the unit normals of the spheres at any point of S_ij; see Figure 3. Finally, σ_ij is the fraction of the length of S_ij that is at the boundary of the union of balls (i.e., not covered by other balls). Akopyan and Edelsbrunner⁶³ established formulas for these three terms as functions of the Cartesian coordinates of the centers of the balls in the union. In Supporting Information, part B, we revisit these formulas using internal coordinates (namely, the distances between the centers of the balls) instead.
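As an illustration of what expressing such quantities in internal coordinates means, the sketch below computes the radius of the circle of intersection of two spheres and the angle between their outward unit normals along that circle, using only the two radii and the distance d between the centers. These are elementary consequences of the law of cosines; they are not copied from the Supporting Information, which may organize the expressions differently, and the function name is hypothetical.

```python
import math

def intersection_circle(ri, rj, d):
    """Circle shared by two spheres of radii ri, rj with centers d apart.

    Returns (circle_radius, alpha), where alpha is the angle between the
    outward unit normals of the two spheres at any point of the circle,
    or None if the spheres do not cross.
    """
    if d >= ri + rj or d <= abs(ri - rj):
        return None
    # Signed distance from the first center to the plane of the circle.
    t = (d * d + ri * ri - rj * rj) / (2.0 * d)
    circle_radius = math.sqrt(max(0.0, ri * ri - t * t))
    # Law of cosines in the triangle (z_i, z_j, point on the circle).
    cos_alpha = (ri * ri + rj * rj - d * d) / (2.0 * ri * rj)
    alpha = math.acos(max(-1.0, min(1.0, cos_alpha)))
    return circle_radius, alpha
```

For two unit spheres whose centers are one unit apart, this gives a circle of radius √3/2 and an angle of π/3. The fraction σ_ij of the circle that is exposed depends on the other balls in the union and is not covered by this sketch.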

Gaussian Curvature Formulas

In parallel to the mean curvature formula, Akopyan and Edelsbrunner established a formula for the Gaussian curvature that distinguishes between three terms: the contribution of the spherical patches, the contribution of the circular arcs at the intersections of two spheres, and the contribution of the accessible corners at the intersection of three spheres.⁶⁴ This leads to the following formula for the Gaussian curvature and its derivatives in terms of edge lengths in the dual complex:

Proposition 7: (Gaussian curvature)

Proposition 8: (Gaussian curvature derivative)

All variables have been defined above, except for λ_ij and σ_i;jk, associated with two and three spheres, respectively. Two spheres, S_i and S_j, with centers z_i and z_j that form an edge in K intersect at a circle S_ij. λ_ij is the combined length of the unit normals of the spheres at any point of S_ij after projection on the line passing through z_i and z_j; see Figure 3. Three spheres, S_i, S_j, and S_k, that form a triangle in K intersect in two points, P_ijk and P_ikj. σ_i;jk is the fraction assigned to i of the solid angle spanned by the unit normals of S_i, S_j, and S_k at one of those points. Akopyan and Edelsbrunner⁶⁴ established formulas for those two terms. In Supporting Information, part C, we revisit those formulas using internal coordinates.

The Nonpolar Solvation Free Energy W_np

Recall that the nonpolar contribution to the solvation free energy of a union of balls, X, is

$$W_{np}(X) = \sum_i \left( a_i V_i + b_i A_i + c_i M_i + d_i G_i \right)$$

in which V_i, A_i, M_i, and G_i are the contributions of ball i to the total volume, surface area, integrated mean curvature, and integrated Gaussian curvature of X, respectively, and a_i, b_i, c_i, and d_i are the coefficients corresponding to pressure, surface tension, and bending rigidities. In the previous subsections, we have established formulas for those contributions, as well as for their derivatives with respect to internal coordinates. For any given pair of balls, B_a and B_b, that belong to the dual complex, K, of X, we have

$$\frac{\partial W_{np}}{\partial r_{ab}} = \sum_i \left( a_i \frac{\partial V_i}{\partial r_{ab}} + b_i \frac{\partial A_i}{\partial r_{ab}} + c_i \frac{\partial M_i}{\partial r_{ab}} + d_i \frac{\partial G_i}{\partial r_{ab}} \right)$$

in which $\partial V_i / \partial r_{ab}$, $\partial A_i / \partial r_{ab}$, $\partial M_i / \partial r_{ab}$, and $\partial G_i / \partial r_{ab}$ are given by eqs 12, 11, 14, and 16, respectively, with all details given in the Supporting Information. Once the derivatives in terms of internal coordinates are available, derivatives with respect to Cartesian coordinates are easily computed using the chain rule:

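Proposition 9: (Derivative of W_np) The gradient of the nonpolar solvation free energy with respect to the center z_i of ball B_i is

$$\nabla_{z_i} W_{np} = \sum_{j \,:\, s_{ij} \in K} \frac{\partial W_{np}}{\partial r_{ij}} \, u_{ij}$$

in which r_ij here denotes the distance between the centers z_i and z_j, and u_ij = (z_i – z_j)/r_ij is the unit vector along the edge from z_j to z_i.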
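The chain rule step can be sketched as follows, under the assumption that the derivatives of the energy with respect to the center-to-center distances have already been computed and are stored per edge of the dual complex. The data layout (a dictionary keyed by index pairs) and the function name are hypothetical choices made for this illustration.

```python
import math

def cartesian_gradient(centers, dW_dr):
    """Convert distance derivatives into Cartesian gradients.

    centers: list of (x, y, z) ball centers.
    dW_dr:   dict mapping an edge (i, j) to dW/dr_ij, where r_ij is the
             distance between centers i and j (each edge listed once).
    Returns one 3-vector per ball.
    """
    grad = [[0.0, 0.0, 0.0] for _ in centers]
    for (i, j), g in dW_dr.items():
        diff = [a - b for a, b in zip(centers[i], centers[j])]
        r = math.sqrt(sum(c * c for c in diff))
        for k in range(3):
            u = diff[k] / r          # component of u_ij = (z_i - z_j) / r_ij
            grad[i][k] += g * u      # d r_ij / d z_i = +u_ij
            grad[j][k] -= g * u      # d r_ij / d z_j = -u_ij
    return grad
```

Because each edge contributes equal and opposite vectors to its two endpoints, the gradients sum to zero over all balls, which reflects the translation invariance of an energy that depends on distances only.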

Algorithm and Implementation

Our software for computing geometric measures of biomolecules has gone through successive revisions. AlphaVol was our original software package, which implemented the Alpha Shape theory for the volumes and surface areas of biomolecules;⁶² its origins can be traced to the Alpha Shape package.⁶⁸ AlphaVol was partially redesigned into a new package, UnionBall, with the modifications needed to deal with large molecular systems.⁶⁷ We have now completely redesigned UnionBall into a new package, AlphaMol, written in C++. AlphaMol implements all four intrinsic volumes, as well as their derivatives with respect to atomic coordinates. Each of these measures is possibly weighted, i.e., the contribution of each atom is weighted by a constant provided as input to the software, with a different constant for each of the intrinsic volumes. AlphaMol takes as input a set of balls in ℝ³, each specified by the coordinates of its center and the radius, as well as by its four weights. In the case of biomolecules, the coordinates of the centers of the balls are extracted from the corresponding PDB file, while the radii are defined according to the chemical nature of the atoms, using one of several standard sets of radii (in the following we use the OPLS force field⁶⁹). These radii may be enlarged by the radius of a water probe (usually 1.4 Å), should the measures correspond to the accessible surface of the molecule. The algorithm includes three steps:

Step 1. Build the Delaunay triangulation.

Step 2. Extract the dual complex from the Delaunay triangulation.

Step 3. Compute the geometric measures of the union of balls using the dual complex.

Just like AlphaVol and UnionBall, AlphaMol uses standard algorithms from computational geometry for the first two tasks.^70,71 We have designed our own algorithm for measuring the union (step 3).^60,61,63,64 We have made modifications to these algorithms compared to AlphaVol and UnionBall, as our interest is mostly in measuring biomolecules, with a focus on scalability, namely, the ability to deal with very large biomolecules. We describe those modifications in the following subsections, for the sequential and parallel versions of AlphaMol, respectively.

A Sequential Algorithm for Measuring Biomolecules

We implemented the randomized incremental algorithm from Edelsbrunner and Shah⁷⁰ to construct the Delaunay triangulation of a union of balls. In this algorithm, the triangulation is built incrementally, by adding one ball at a time. The input balls are preprocessed with a random permutation. Four dummy balls whose centers lie at infinity are added so that all input balls are contained in the tetrahedron they define. Let DT_i be the Delaunay triangulation at step i of the construction (DT_i contains the four balls at infinity as well as B₁, B₂, ..., B_i). The algorithm proceeds by iterating three steps:

for i = 1 to N do
    (1) Identify the tetrahedron t ∈ DT_{i–1} that contains z_i.
    (2) Add z_i as a vertex and decompose t into four tetrahedra.
    (3) Flip locally all non-Delaunay triangles attached to z_i.
endfor

The randomization guarantees a theoretical expected running time of O(N log N) with an additional linear term in the number of simplices in the Delaunay triangulation.⁷⁰ In ℝ³, the number of simplices can be as large as a constant times N². However, for well-packed data, which is typical for biomolecules, this number is at most a constant times N, leading to an expected running time of O(N log N).

In practice, a different behavior is observed for very large molecules (such as macromolecular assemblies with millions of atoms, for example virus capsids). This is unfortunately a known problem. Virtual memory operating systems cache recently used data in memory, under the assumption that they are more likely to be used again soon. If the new ball to be inserted in step 1 is not included in one of the recent tetrahedra, the cache will not be useful. This scenario is likely if the order in which the balls are inserted is random. A possible solution is to create 3D locality, by ordering the balls first such that a ball at position i is mostly local with respect to the previous balls in the ordering. Interestingly, the order in which atoms are stored in a PDB file is inherently local.⁶⁷ UnionBall implemented this idea. Instead of randomizing the balls, as in AlphaVol, it keeps the ordering provided by the input PDB file. The effect was significant: UnionBall is substantially faster than AlphaVol, especially for large molecular systems.⁶⁷ With this simple trick, however, there is no guarantee for the expected running time.

Amenta, Choi, and Rote⁷² developed a scheme for ordering points before computing the Delaunay triangulation that maintains enough randomness so that the theoretical complexity of the algorithm is conserved. Their Biased Randomized Insertion Order (BRIO) method was shown to significantly improve the running time of Delaunay triangulation for large numbers of points.⁷² Later, Liu and Snoeyink⁷³ proposed a different method for ordering the points based on the Hilbert curve. They showed that reordering points using such a Hilbert curve significantly sped up the point location in step 1 of the Delaunay triangulation algorithm.⁷³

We implemented our own version of BRIO in AlphaMol as an option. BRIO proceeds by organizing the points randomly into rounds, using a logarithmic scheme.⁷² Within each round, points can be inserted in any order, allowing for locality. In the original BRIO, within a round, the points were ordered using a kd-tree; we used the Hilbert curve instead. We added another option to AlphaMol, in which the ordering follows the Hilbert curve directly (this is equivalent to BRIO with a single round). We compared our versions of BRIO and of the Hilbert curve ordering with the predefined ordering imposed by the PDB file (as implemented in UnionBall) and with randomized ordering (as implemented in AlphaVol) on a set of 68 virus capsids (see Supporting Information, part E, for a full list). These virus capsids vary in size from 400,000 atoms to 26,000,000 atoms, representing a broad range of sizes for large biomolecular systems. Results of the comparisons of the different ordering schemes are illustrated in Figure 4.
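The two orderings described above can be sketched as follows. This is an illustration under stated assumptions, not AlphaMol's code: the Hilbert index is computed with Skilling's transpose algorithm on coordinates quantized to a 2^bits grid, each point falls in the last round with probability 1/2, in the one before with probability 1/4, and so on, and the names `hilbert_index`, `hilbert_keys`, and `brio_hilbert_order` as well as the default parameters are hypothetical.

```python
import math
import random


def hilbert_index(cell, bits):
    """Position of an integer grid cell along the Hilbert curve.

    Uses Skilling's transpose algorithm; works in any dimension.
    """
    x = list(cell)
    n = len(x)
    top = 1 << (bits - 1)
    # Undo the rotations and reflections applied at each level of the curve.
    q = top
    while q > 1:
        p = q - 1
        for i in range(n):
            if x[i] & q:
                x[0] ^= p
            else:
                t = (x[0] ^ x[i]) & p
                x[0] ^= t
                x[i] ^= t
        q >>= 1
    # Gray encode.
    for i in range(1, n):
        x[i] ^= x[i - 1]
    t = 0
    q = top
    while q > 1:
        if x[n - 1] & q:
            t ^= q - 1
        q >>= 1
    x = [v ^ t for v in x]
    # Interleave the bits of the transposed coordinates into one integer.
    index = 0
    for bit in range(bits - 1, -1, -1):
        for v in x:
            index = (index << 1) | ((v >> bit) & 1)
    return index


def hilbert_keys(points, bits=10):
    """Quantize 3D points to a 2**bits grid and return their Hilbert indices."""
    lo = [min(p[d] for p in points) for d in range(3)]
    hi = [max(p[d] for p in points) for d in range(3)]
    span = max(h - l for l, h in zip(lo, hi)) or 1.0
    cells = (1 << bits) - 1
    keys = []
    for p in points:
        cell = [min(cells, int((p[d] - lo[d]) / span * cells)) for d in range(3)]
        keys.append(hilbert_index(cell, bits))
    return keys


def brio_hilbert_order(points, rounds=None, bits=10, seed=0):
    """Insertion order: random rounds, Hilbert order within each round.

    With rounds=1 this reduces to a plain Hilbert curve ordering.
    """
    n = len(points)
    if n == 0:
        return []
    if rounds is None:
        rounds = max(1, int(math.log2(n)))
    rng = random.Random(seed)
    keys = hilbert_keys(points, bits)

    def pick_round():
        # Start in the last round; move one round earlier with probability 1/2.
        r = rounds - 1
        while r > 0 and rng.random() < 0.5:
            r -= 1
        return r

    round_of = [pick_round() for _ in range(n)]
    return sorted(range(n), key=lambda i: (round_of[i], keys[i]))
```

The returned list gives the indices of the points in the order in which they would be handed to the incremental construction. Early rounds are small and random, which preserves the randomized analysis, while the later, larger rounds are spatially coherent.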

(A) The running times of AlphaMol for measuring biomolecules (the 68 virus capsids in our database; see Supporting Information, part E), when the atoms are inserted randomly (red line) or based on the order provided by the PDB file (black line). (B) Comparing the running times of AlphaMol when atoms are inserted based on the PDB order (black line), or randomly inserted, followed by ordering based on the Hilbert curve (red line) or followed by BRIO-Hilbert ordering (blue line). In both (A) and (B), we show the mean and standard deviation over five random trials. Computations were performed on a single core of an iMac computer with a 3.8 GHz 8-core 10th-generation Intel Core i7.

As illustrated in Figure 4A, not randomizing the order in which points are inserted resulted in a significant improvement in performance. This was already observed with UnionBall.⁶⁷ Removing the randomization leads to an observed linear dependence of the running time on the number of weighted spheres considered. Randomization followed by ordering based on a space-filling curve, or on our modified BRIO method, leads to further speed-up, albeit small; see Figure 4B. Note that the differences in running time between the Hilbert curve ordering and BRIO are not significant. In the following, we use the Hilbert curve ordering when necessary. We note that both BRIO and the Hilbert curve method require preprocessing of the data that comes with its own computational cost. This cost, however, is minimal, representing 1.7% of the total computing time, on average; see Figure 5. The bulk of the calculation comes from computing the Delaunay triangulation (52.4% on average), followed by the extraction of the dual complex (27.4%), and the computations of the intrinsic volumes and their derivatives (18.5%).

Fraction of running times, in percent, of the different components of AlphaMol for measuring the 68 virus capsids in our database (see Supporting Information, part E): black, Hilbert curve ordering of the atoms (average 1.7%); red, weighted Delaunay triangulation of the ball centers (average 52.4%); green, filtering the Delaunay triangulation to generate the dual complex (average 27.4%); and blue, computing the intrinsic volumes based on the dual complex (average 18.5%).

A Parallel Algorithm for Measuring Biomolecules

As described above, measuring a biomolecule represented by a union of balls involves three steps: computing the Delaunay triangulation of the centers of the balls weighted by their radii, filtering the simplices of the Delaunay triangulation to build the dual complex, and building the inclusion–exclusion formulas for the intrinsic volumes of the union of balls. Of these three steps, the second and the third can be easily parallelized, as they basically involve iterating over simplices. Unfortunately, parallelizing the computation of the Delaunay triangulation is a difficult task that remains a hot topic in research.⁷⁴⁻⁸⁰ Considering that computing the Delaunay triangulation is more than 50% of the total computing cost of AlphaMol (see Figure 5), this is a concern.
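The reason the third step parallelizes easily is that an inclusion-exclusion formula is a signed sum of independent per-simplex terms. A minimal sketch for the volume, truncated to the vertices and edges of the dual complex, is given below. It is exact only when the dual complex contains no triangles or tetrahedra; the full formula adds triple and quadruple intersection terms, which are omitted here. The function names and the (x, y, z, r) ball representation are assumptions made for this illustration.

```python
import math


def ball_volume(r):
    """Volume of a ball of radius r."""
    return 4.0 * math.pi * r ** 3 / 3.0


def lens_volume(r1, r2, d):
    """Volume of the intersection of two balls whose centers are d apart."""
    if d >= r1 + r2:
        return 0.0  # disjoint balls
    if d <= abs(r1 - r2):
        return ball_volume(min(r1, r2))  # one ball contains the other
    return (math.pi * (r1 + r2 - d) ** 2
            * (d * d + 2.0 * d * (r1 + r2) - 3.0 * (r1 - r2) ** 2)
            / (12.0 * d))


def union_volume_up_to_edges(balls, edges):
    """Inclusion-exclusion over vertices (+) and edges (-) of the dual complex.

    balls : list of (x, y, z, r)
    edges : list of index pairs (i, j), one per edge of the dual complex
    """
    total = sum(ball_volume(b[3]) for b in balls)
    for i, j in edges:
        d = math.dist(balls[i][:3], balls[j][:3])
        total -= lens_volume(balls[i][3], balls[j][3], d)
    return total
```

Since every term depends only on its own simplex, the list of simplices can be cut into chunks, each chunk summed by a different worker, and the partial sums added at the end.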

Many parallel Delaunay triangulation algorithms have been proposed. Most focus on partitioning the domain that contains the points so that each partition can be triangulated separately and in parallel with the others.⁷⁴⁻⁸⁰ The bottleneck of this approach, however, is the merging of the triangulations from the different partitions to generate the complete triangulation.⁷⁶ The tetrahedra at the borders of a partition have to be connected to tetrahedra in adjacent partitions, often leading to local retriangulations. This merging step is sometimes referred to as stitching and is known to be difficult. We note, however, that the goal of AlphaMol is to compute the contributions of all atoms to the intrinsic volumes of the biomolecule of interest and not generate the overall Delaunay triangulation. We propose a different parallel strategy for computing these contributions, that still uses the concept of partitioning the whole domain, but with a focus on the volumes, and not the Delaunay triangulation. It is explained in Figure 6 and illustrated for a virus capsid in Figure 7.

Splitting the computation of the intrinsic volumes of a union of balls. Let us consider the union of balls represented in panel A (this is the same union as in Figure 2, in which we show the dual complex overlaid with the restriction of the Voronoi diagram to within the portion of the plane covered by the balls). In panel B, we limit this union to those balls whose centers belong to the rectangular block shown with solid sides; those balls are colored rose. We expand this block with a buffer zone, delimited by the dashed, magenta sides. Balls B₂ and B₃, whose centers are within this buffer zone, interact with balls from the rectangular block. All other balls (here B₁) are ignored. Applying AlphaMol to the balls in the block and in the buffer zone leads to the correct computation of the intrinsic volumes for the balls in the block. The procedure is repeated in parallel for all blocks covering the union.

Splitting the computation of the intrinsic volumes of the murine polyomavirus (PDB code 1sid) over four processors. The 1,020,180 atoms are divided into four partitions of approximately equal size, using a kd-tree algorithm. Each partition corresponds to a rectangular block. Each block is complemented with a buffer zone, such that atoms in this buffer zone (shown in magenta) may interact with the atoms of the block. Each block together with its buffer zone is assigned to one processor, which then runs the full AlphaMol algorithm. As each block with buffer zone includes approximately one-quarter of the atoms, and the computations on the blocks are run in parallel, it is expected that the total computation time be reduced by a factor of 4.

Briefly, we partition all atoms of the biomolecule into 3D rectangular blocks, using a kd-tree algorithm. Our goal is to apply the sequential algorithm of AlphaMol to each block, using a different processor for each block. We note, however, that atoms at the edges of a block may interact with atoms from neighboring blocks. If those atoms are not included in the calculation, the intrinsic volumes found for the atoms in the block will be inexact. We therefore expand each block with a buffer zone to include the atoms neighboring the block. The width of this buffer zone is set to 2R_max, in which R_max is the radius of the largest ball in the union of balls representing the biomolecule. This value ensures that all atoms that potentially interact with the atoms in the block will be accounted for. We then apply the AlphaMol algorithm to all atoms in the block and in its buffer zone. We finally retain only the intrinsic volumes of the atoms in the block. The choice of the buffer zone guarantees that simplices sharing a vertex in the block are exactly the simplices sharing this vertex in the complete Delaunay triangulation, a property that does not hold for the vertices in the buffer zone. It follows that the intrinsic volumes of the balls whose centers lie in the block are computed correctly, while those of the balls whose centers lie in the buffer zone may be incorrect. Each block, together with its buffer zone, is handled on a different processor, and as the partitioning of the atoms is balanced, we expect a speed-up proportional to the number of blocks.
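The block-and-buffer scheme can be sketched in a few functions. The sketch below makes several assumptions that are not taken from AlphaMol: blocks are produced by median cuts along the axis of largest extent, the box of a block is taken to be the bounding box of its atom centers, and `measure` is a hypothetical callable that stands in for the sequential algorithm and returns one value per input atom. The loop runs serially here; each iteration is independent and could be dispatched to its own worker.

```python
def kd_blocks(coords, levels):
    """Split atom indices into 2**levels balanced blocks by median cuts."""
    if len(coords) < 2 ** levels:
        raise ValueError("need at least one atom per block")
    blocks = [list(range(len(coords)))]
    for _ in range(levels):
        next_blocks = []
        for block in blocks:
            # Cut along the axis on which this block is widest.
            axis = max(range(3), key=lambda a: (max(coords[i][a] for i in block)
                                                - min(coords[i][a] for i in block)))
            ordered = sorted(block, key=lambda i: coords[i][axis])
            mid = len(ordered) // 2
            next_blocks += [ordered[:mid], ordered[mid:]]
        blocks = next_blocks
    return blocks


def buffer_atoms(block, coords, radii):
    """Atoms outside `block` whose centers lie within 2*R_max of its box."""
    width = 2.0 * max(radii)
    lo = [min(coords[i][a] for i in block) - width for a in range(3)]
    hi = [max(coords[i][a] for i in block) + width for a in range(3)]
    inside = set(block)
    return [i for i in range(len(coords))
            if i not in inside
            and all(lo[a] <= coords[i][a] <= hi[a] for a in range(3))]


def measure_in_blocks(coords, radii, levels, measure):
    """Apply `measure` to each block plus buffer; keep values for block atoms only."""
    results = {}
    for block in kd_blocks(coords, levels):
        local = block + buffer_atoms(block, coords, radii)
        values = measure([coords[i] for i in local], [radii[i] for i in local])
        # The first len(block) values belong to the block; buffer values are dropped.
        for i, value in zip(block, values):
            results[i] = value
    return results
```

Any per-atom quantity that depends only on overlapping neighbors is reproduced exactly by this scheme, because two balls can overlap only if their centers are less than 2R_max apart.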

Figure 8 shows the wall time for the execution of AlphaMol on the capsid of faustovirus (PDB code 5j7v,⁸¹ 26 million atoms), with different numbers of threads requested by the software. The virus capsid is divided into m partitions based on a kd-tree, and each partition is handled separately on a different thread. Computations were performed on two different multicore computers, one with Intel Xeon multicore CPUs running at 2.70 GHz, with 96 cores (192 threads), and a second with an AMD Threadripper multicore CPU running at 2.2 GHz, with 32 cores (64 threads). The AMD computer is more recent than the Intel computer. As expected, we observe a significant speed-up when AlphaMol is run on multiple threads. It takes 605 s with a single Intel Xeon thread and 321 s with a single AMD thread to compute the weighted volumes and their derivatives for the capsid of faustovirus. In comparison, it takes 161 or 82.3 s on four Intel threads or four AMD threads, respectively, and it only takes 30.9 or 21.2 s to do the same calculation using 32 Intel threads or 32 AMD threads, respectively. The speed-up is nearly linear between 1 and 32 threads for both types of processors. We note, however, that the speed-up is marginal between 32 threads and 64 threads for the AMD processor and between 64 threads and 128 threads for the Intel processor (see Figure 8). We believe that this behavior reflects the computer architecture, and not the software itself. The AMD processor has 64 threads, but only 32 physical cores. We do not expect that two threads on a single core will be significantly faster than a single thread on that core, as those two threads share the same resources. The Intel processor has 96 physical cores. However, those cores are spread equally over four sockets, each with 24 cores. As such, we also expect the speed-up to level off for large numbers of threads that need to share the same memory resources.
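The speed-ups implied by these wall times follow from a simple ratio. The short script below works through the arithmetic using only the timings quoted above; the dictionary layout and the function name `speed_up` are choices made for this illustration.

```python
# Wall times in seconds for the faustovirus capsid, as quoted in the text.
wall_seconds = {
    "Intel Xeon": {1: 605.0, 4: 161.0, 32: 30.9},
    "AMD Threadripper": {1: 321.0, 4: 82.3, 32: 21.2},
}


def speed_up(times):
    """Speed-up relative to the single-thread run on the same machine."""
    serial = times[1]
    return {threads: serial / t for threads, t in times.items()}


for machine, times in wall_seconds.items():
    for threads, s in speed_up(times).items():
        print(f"{machine}: {threads:2d} threads -> speed-up {s:.2f}")
```

On the Intel computer this gives about 3.76 for four threads and 19.58 for 32 threads; on the AMD computer, about 3.90 and 15.14, respectively.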

Execution (wall) time of AlphaMol as a function of the number of threads requested, for computing the intrinsic volumes and their derivatives of the capsid of the faustovirus (PDB code 5j7v), which consists of approximately 26 million atoms. Computations were performed on an Intel Xeon processor at 2.70 GHz, with 96 cores (192 threads) (red), or an AMD Threadripper processor at 2.2 GHz, with 32 cores (64 threads).

To reduce the risk that our conclusions are anecdotal, we repeated the calculations on all 68 virus capsids in our database. We report results for the Intel computer only; results on the AMD computer were similar. Figure 9 shows the wall time as well as the speed-up for the execution of AlphaMol on the virus capsids, with different numbers of threads. Just like for the faustovirus, we observe a significant speed-up when multiple threads are added. This speed-up is nearly linear up to 32 cores, and then levels off. There are two reasons for this leveling off as more threads are added, namely, the architecture of the computer and progressively more redundant computations for larger numbers of threads. To illustrate the latter, we use two ways to calculate the speed-up. First, we compare the wall time with the total CPU time, represented as CLOCK/CPU. Based on this notion, the speed-up is nearly linear up to 32 processors, and grows more slowly thereafter, still reaching 40 for 64 processors. This shows that there is little communication overhead between the master processor and the other processors, at least up to 32 processors, indicating that the parallelism is effective. Second, we compare the wall time when running on k processors with the running time from an independent serial (i.e., one-processor) calculation. Based on this second notion, the speed-up is less effective, with 64 processors not being significantly more effective than 32 processors. The two notions should give similar results if the splitting is balanced so that each processor deals with about N/k atoms, N being the total number of atoms in the biomolecule. This is not the case, however, as the algorithm adds a buffer zone to each partition to make sure that the computation is correct. Each atom in the buffer zones will be considered at least twice, since it also belongs to one of the partitions. As expected, the number of atoms treated at least twice increases with the number of processors; see Figure 10. This leads to redundant computations whose importance increases when the number of processors increases, ultimately leading to a plateau in the speed-up brought by the parallelism. It remains that we do reach a significant speed-up through this parallelism, 20-fold on average with 32 processors.

(A) Execution (wall) time of AlphaMol as a function of the number of atoms for different numbers of processors. (B) The speed-up (computed as the ratio of total CPU time over wall time) is plotted against the number of threads used by the program. Computations were performed on an Intel Xeon processor at 2.70 GHz, with 96 cores (192 threads).

Percentages of atoms in the buffer zones as a fraction of the total number of atoms for the parallel version of AlphaMol as a function of the number of processors. An atom is said to belong to the buffer zones if it belongs to at least one of the buffer zones. The percentage increases with the number of processors, leading to progressively more redundant computations that ultimately limit the impact of using multiple processors.

Conclusion

The Alpha Shape theory provides an accurate and robust method for computing the geometric measures of a biomolecule.⁵⁹⁻⁶⁴ Among these measures, the intrinsic volumes are used to quantify the interaction between a biomolecule and surrounding water in the so-called morphometric model.³³⁻³⁵ Several implementations of the Alpha Shape theory for measuring biomolecules exist, including our own, AlphaVol⁶² and UnionBall.⁶⁷ These implementations, however, were limited to computing the volumes and surface areas of biomolecules.

In this paper, we have derived homogeneous formulas for the expressions of all four intrinsic volumes and their derivatives and implemented them in a new package, AlphaMol. The only variables in these formulas are the interatomic distances, making them insensitive to translations and rotations. Recent spectacular advances in structural biology have produced an abundance of data on large macromolecular complexes, such as full-size virus capsids⁸² that contain several millions of atoms. Modeling these large systems is as important as modeling smaller proteins or nucleic acids; see for example the simulations of the HIV capsid that include over 60 million atoms.¹⁷ To make sure that AlphaMol remains practical, we have adapted its underlying algorithms in two ways. First, we have included an ordering scheme based on Hilbert curves to improve the locality of the atoms as they are introduced sequentially for generating the Delaunay triangulation of the atom centers, which is at the core of the Alpha Shape theory. Second, we have introduced a parallel version of AlphaMol, which partitions the atoms of the biomolecule of interest into 3D rectangular blocks, using a kd-tree algorithm. We then apply the sequential algorithm of AlphaMol to each block, augmented by a buffer zone to account for atoms that may overlap with atoms in the block. The presence of the buffer zones, however, leads to redundant computations that ultimately limit the impact of using multiple processors. In our current version, 32 processors led to a significant speed-up (20 times on average for 68 virus capsids ranging from 400,000 atoms to 26 million atoms), with marginal improvements for a higher number of processors.

Ultimately, we would like to push the parallelism to hundreds of processors, such as those that are available on a GPU. Recently, there was an attempt to do so for computing the Alpha Shape of a molecule,⁸³ using a mixed CPU-GPU algorithm showing speed-ups on the order of 25, which is similar to what we observed for multiple CPUs. We do see, however, some roadblocks for a GPU-only implementation of AlphaMol. To our knowledge, all current GPU implementations of the 3D Delaunay triangulation compute a near-Delaunay structure on the GPU, followed by transformations on the CPU to generate a valid Delaunay triangulation.⁸⁴ In addition, computing a valid Delaunay triangulation requires robust geometric predicates to account for possible degeneracies. In our implementation, we rely on the Simulation of Simplicity (SoS) to remove those degeneracies.⁶⁶ The SoS method is based on arbitrary-precision arithmetic. There are currently only limited GPU implementations of arbitrary-precision arithmetic, and they are still under development. Developing a GPU-based computation of intrinsic volumes of biomolecules remains, however, a priority for molecular simulations of very large biomolecular systems.

Acknowledgments

P.K. acknowledges support from the University of California Multicampus Research Programs and Initiatives (Grant No. M21PR3267) and from the NSF (Grant No. 1760485). H.E. acknowledges support from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation program, Grant No. 788183, from the Wittgenstein Prize, Austrian Science Fund (FWF), Grant No. Z 342-N31, and from the DFG Collaborative Research Center TRR 109, ‘Discretization in Geometry and Dynamics’, Austrian Science Fund (FWF), Grant No. I 02979-N35.

Data Availability Statement

All PDB files for the virus capsid structures are available from the Protein Data Bank, http://www.rcsb.org. The sequential version of AlphaMol is available as OpenSource software on GitHub (https://github.com/pkoehl/AlphaMol).

Supporting Information Available

The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/acs.jcim.2c01346.

Proofs of the propositions in the main text of the document: surface areas and volumes of ball intersections and derivatives; integrated mean curvature and derivatives; integrated Gaussian curvature and derivatives; and geometry of a tetrahedron. Table S.1: Database of virus capsid structures that are used in this study. (PDF)

Open Access is funded by the Austrian Science Fund (FWF).

The authors declare no competing financial interest.


References

Senior A.; Evans R.; Jumper J.; Kirkpatrick J.; Sifre L.; Green T.; Qin C.; Žídek A.; Nelson A.; Bridgland A.; Penedones H.; Petersen S.; Simonyan K.; Crossan S.; Kohli P.; Jones D.; Silver D.; Kavukcuoglu K.; Hassabis D. Improved protein structure prediction using potentials from deep learning. Nature 2020, 577, 706–710. 10.1038/s41586-019-1923-7.
Jumper J.; Evans R.; Pritzel A.; Green T.; Figurnov M.; Ronneberger O.; Tunyasuvunakool K.; Bates R.; Zidek A.; Potapenko A.; Bridgland A.; Meyer C.; Kohl S. A. A.; Ballard A. J.; Cowie A.; Romera-Paredes B.; Nikolov S.; Jain R.; Adler J.; Back T.; Petersen S.; Reiman D.; Clancy E.; Zielinski M.; Steinegger M.; Pacholska M.; Berghammer T.; Bodenstein S.; Silver D.; Vinyals O.; Senior A. W.; Kavukcuoglu K.; Kohli P.; Hassabis D. Highly accurate protein structure prediction with AlphaFold. Nature 2021, 596, 583–589. 10.1038/s41586-021-03819-2.
Hassoun S.; Jefferson F.; Shi X.; Stucky B.; Wang J.; Rosa E. Jr Artificial intelligence for biology. Integrative and Comparative Biology 2022, 61, 2267–2275. 10.1093/icb/icab188.
Xu Y.; Liu X.; Cao X.; Huang C.; Liu E.; Qian S.; Liu X.; Wu Y.; Dong F.; Qiu C.-W.; Qiu J.; Hua K.; Su W.; Wu J.; Xu H.; Han Y.; Fu C.; Yin Z.; Liu M.; Roepman R.; Dietmann S.; Virta M.; Kengara F.; Zhang Z.; Zhang L.; Zhao T.; Dai J.; Yang J.; Lan L.; Luo M.; Liu Z.; An T.; Zhang B.; He X.; Cong S.; Liu X.; Zhang W.; Lewis J. P.; Tiedje J. M.; Wang Q.; An Z.; Wang F.; Zhang L.; Huang T.; Lu C.; Cai Z.; Wang F.; Zhang J. Artificial intelligence: A powerful paradigm for scientific research. Innovation 2021, 2, 100179. 10.1016/j.xinn.2021.100179.
Callaway E. “It will change everything”: DeepMind’s AI makes gigantic leap in solving protein structures. Nature 2020, 588, 203–205. 10.1038/d41586-020-03348-4.
Jones D.; Thornton J. The impact of AlphaFold2 one year on. Nat. Methods 2022, 19, 15–20. 10.1038/s41592-021-01365-3.
Nussinov R.; Zhang M.; Liu Y.; Jang H. AlphaFold, Artificial Intelligence, and Allostery. J. Phys. Chem. B 2022, 126, 6372–6383. 10.1021/acs.jpcb.2c04346.
Berman H. M.; Westbrook J.; Feng Z.; Gilliland G.; Bhat T. N.; Weissig H.; Shindyalov I.; Bourne P. The Protein Data Bank. Nucl. Acids. Res. 2000, 28, 235–242. 10.1093/nar/28.1.235.
Varadi M.; Anyango S.; Deshpande M.; Nair S.; Natassia C.; Yordanova G.; Yuan D.; Stroe O.; Wood G.; Laydon A.; Zidek A.; Green T.; Tunyasuvunakool K.; Petersen S.; Jumper J.; Clancy E.; Green R.; Vora A.; Lutfi M.; Figurnov M.; Cowie A.; Hobbs N.; Kohli P.; Kleywegt G.; Birney E.; Hassabis D.; Velankar S. AlphaFold Protein Structure Database: massively expanding the structural coverage of protein-sequence space with high-accuracy models. Nucleic Acids Res. 2022, 50, D439–D444. 10.1093/nar/gkab1061.
Shaw D.; Deneroff M.; Dror R.; Kuskin J.; Larson R.; Salmon J.; Young C.; Batson B.; Bowers K.; Chao J.; Eastwood M.; Gagliardo J.; Grossman J.; Ho C.; Ierardi D.; Kolossváry I.; Klepeis J.; Layman T.; McLeavey C.; Moraes M.; Mueller R.; Priest E.; Shan Y.; Spengler J.; Theobald M.; Towles B.; Wang S. Anton, a Special-purpose Machine for Molecular Dynamics Simulation. Commun. ACM 2008, 51, 91–97. 10.1145/1364782.1364802.
Stone J.; Hardy D.; Ufimtsev I.; Schulten K. GPU-accelerated molecular modeling coming of age. J. Mol. Graph. Modelling 2010, 29, 116–125. 10.1016/j.jmgm.2010.06.010.
Wang Y.; Harrison C.; Schulten K.; McCammon J. Implementation of accelerated molecular dynamics in NAMD. Comput. Sci. Discovery 2011, 4, 015002. 10.1088/1749-4699/4/1/015002.
Pierce L. C.T.; Salomon-Ferrer R.; de Oliveira C. A. F.; McCammon J. A.; Walker R. C. Routine access to millisecond time scale events with accelerated molecular dynamics. J. Chem. Theory Comput. 2012, 8, 2997–3002. 10.1021/ct300284c.
Sweet J.; Nowling R.; Cickovski T.; Sweet C.; Pande V.; Izaguirre J. Long Timestep Molecular Dynamics on the Graphical Processing Unit. J. Chem. Theory Comput. 2013, 9, 3267–3281. 10.1021/ct400331r.
Shaw D. E.; Grossman J. P.; Bank J. A.; Batson B.; Butts J. A.; Chao J. C.; Deneroff M. M.; Dror R. O.; Even A.; Fenton C. H.; Forte A.; Gagliardo J.; Gill G.; Greskamp B.; Ho C. R.; Ierardi D. J.; Iserovich L.; Kuskin J. S.; Larson R. H.; Layman T.; Lee L. S.; Lerer A. K.; Li C.; Killebrew D.; Mackenzie K. M.; Mok S. Y. H.; Moraes M. A.; Mueller R.; Nociolo L. J.; Peticolas J. L.; Quan T.; Ramot D.; Salmon J. K.; Scarpazza D. P.; Schafer U. B.; Siddique N.; Snyder C. W.; Spengler J.; Tang P. T. P.; Theobald M.; Toma H.; Towles B.; Vitale B.; Wang S. C.; Young C. Anton 2: Raising the Bar for Performance and Programmability in a Special-Purpose Molecular Dynamics Supercomputer. In SC '14: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis; IEEE, 2014; pp 41–53.
Eastman P.; Pande V. OpenMM: A Hardware Independent Framework for Molecular Simulations. Comput. Sci. Eng. 2015, 12, 34–39. 10.1109/MCSE.2010.27.
Zhao G.; Perilla J.; Yufenyuy E.; Meng X.; Chen B.; Ning J.; Ahn J.; Gronenborn A.; Schulten K.; Aiken C.; Zhang P. Mature HIV-1 capsid structure by cryo-electron microscopy and all-atom molecular dynamics. Nature 2013, 497, 643–646. 10.1038/nature12162.
Sener M.; Strumpfer J.; Singharoy A.; Hunter C.; Schulten K. Overall energy conversion efficiency of a photosynthetic vesicle. eLife 2016, 5, na. 10.7554/eLife.09541.
Phillips J.; Hardy D.; Maia J.; Stone J.; Ribeiro J.; Bernardi R.; Buch R.; Fiorin G.; Hénin J.; Jiang W.; McGreevy R.; Melo M.; Radak B.; Skeel R.; Singharoy A.; Wang Y.; Roux B.; Aksimentiev A.; Luthey-Schulten Z.; Kalé L.; Schulten K.; Chipot C.; Tajkhorshid E. Scalable molecular dynamics on CPU and GPU architectures with NAMD. J. Chem. Phys. 2020, 153, 044130. 10.1063/5.0014475.
Jung J.; Kobayashi C.; Kasahara K.; Tan C.; Kuroda A.; Minami K.; Ishiduki S.; Nishiki T.; Inoue H.; Ishikawa Y.; Feig M.; Sugita Y. New parallel computing algorithm of molecular dynamics for extremely huge scale biological systems. J. Comput. Chem. 2021, 42, 231–241. 10.1002/jcc.26450.
Gupta C.; Sarkar D.; Tieleman D.; Singharoy A. The ugly, bad, and good stories of large-scale biomolecular simulations. Curr. Opin. Struct. Biol. 2022, 73, 102338. 10.1016/j.sbi.2022.102338.
Eisenberg D.; McLachlan A. D. Solvation energy in protein folding and binding. Nature (London) 1986, 319, 199–203. 10.1038/319199a0.
Richards F. M. Areas, volumes, packing, and protein-structure. Annu. Rev. Biophys. Bioeng. 1977, 6, 151–176. 10.1146/annurev.bb.06.060177.001055.
Kollman P.; Massova I.; Reyes C.; Kuhn B.; Huo S.; Chong L.; Lee M.; Lee T.; Duan Y.; Wang W.; Donini O.; Cieplak P.; Srinivasan J.; Case D.; Cheatham T. Calculating structures and free energies of complex molecules: combining molecular mechanics and continuum models. Acc. Chem. Res. 2000, 33, 889–897. 10.1021/ar000033j.
Lum K.; Chandler D.; Weeks J. D. Hydrophobicity at small and large length scales. J. Phys. Chem. B 1999, 103, 4570–4577. 10.1021/jp984327m.
Reiss H.; Frisch H.; Lebowitz J. Statistical mechanics of rigid spheres. J. Chem. Phys. 1959, 31, 369–380. 10.1063/1.1730361.
Lebowitz J.; Helfand E.; Praestgaard E. Scaled particle theory of fluid mixtures. J. Chem. Phys. 1965, 43, 774–779. 10.1063/1.1696842.
Rosenfeld Y. Scaled field particle theory of the structure and the thermodynamics of isotropic hard particle fluids. J. Chem. Phys. 1988, 89, 4272–4287. 10.1063/1.454810.
Levy R.; Zhang L.; Gallicchio E.; Felts A. On the nonpolar hydration free energy of proteins: surface area and continuum solvent models for the solute-solvent interaction energy. J. Am. Chem. Soc. 2003, 125, 9523–9530. 10.1021/ja029833a.
Wagoner J.; Baker N. Assessing implicit models for nonpolar mean solvation forces: the importance of dispersion and volume terms. Proc. Natl. Acad. Sci. (USA) 2006, 103, 8331–8336. 10.1073/pnas.0600118103.
Chen J.; Brooks C. III Implicit modeling of nonpolar solvation for simulating protein folding and conformational transitions. Phys. Chem. Chem. Phys. 2008, 10, 471–481. 10.1039/B714141F.
König P.-M.; Roth R.; Mecke K. Morphological thermodynamics of fluids: shape dependence of free energies. Phys. Rev. Lett. 2004, 93, 160601. 10.1103/PhysRevLett.93.160601.
Roth R.; Harano Y.; Kinoshita M. Morphometric approach to the solvation free energy of complex molecules. Phys. Rev. Lett. 2006, 97, 078101. 10.1103/PhysRevLett.97.078101.
Hansen-Goos H.; Roth R.; Mecke K.; Dietrich S. Solvation of proteins: linking thermodynamics to geometry. Phys. Rev. Lett. 2007, 99, 128101. 10.1103/PhysRevLett.99.128101.
Harano Y.; Roth R.; Chiba S. A morphometric approach to the accurate solvation thermodynamics of proteins and ligands. J. Comput. Chem. 2013, 34, 1969–1974. 10.1002/jcc.23348.
Lee B.; Richards F. M. Interpretation of protein structures: estimation of static accessibility. J. Mol. Biol. 1971, 55, 379–400. 10.1016/0022-2836(71)90324-X.
Shrake A.; Rupley J. A. Environment and exposure to solvent of protein atoms. Lysozyme and insulin. J. Mol. Biol. 1973, 79, 351–371. 10.1016/0022-2836(73)90011-9.
Legrand S. M.; Merz K. M. Rapid approximation to molecular-surface area via the use of boolean logic and look-up tables. J. Comput. Chem. 1993, 14, 349–352. 10.1002/jcc.540140309.
Wang H.; Levinthal C. A vectorized algorithm for calculating the accessible surface area of macromolecules. J. Comput. Chem. 1991, 12, 868–871. 10.1002/jcc.540120712.
Futamura N.; Aluru S.; Ranjan D.; Hariharan B. Efficient parallel algorithms for solvent accessible surface area of proteins. IEEE Trans. Parallel Dist. Syst. 2002, 13, 544–555. 10.1109/TPDS.2002.1011399.
Rowlinson J. S. The triplet distribution function in a fluid of hard spheres. Mol. Phys. 1963, 6, 517–524. 10.1080/00268976300100581. [DOI] [Google Scholar]
Pavani R.; Ranghino G. A method to compute the volume of a molecule. Computers and Chemistry 1982, 6, 133–135. 10.1016/0097-8485(82)80006-5. [DOI] [Google Scholar]
Gavezzotti A. The calculation of molecular volumes and the use of volume analysis in the investigation of structured media and of solid-state organic reactivity. J. Am. Chem. Soc. 1983, 105, 5220–5225. 10.1021/ja00354a007. [DOI] [Google Scholar]
Till M.; Ullmann G. M. McVol - A program for calculating protein volumes and identifying cavities by a Monte Carlo algorithm. J. Mol. Model. 2010, 16, 419–429. 10.1007/s00894-009-0541-y. [DOI] [PubMed] [Google Scholar]
Wodak S. J.; Janin J. Analytical approximation to the accessible surface-area of proteins. Proc. Natl. Acad. Sci. (USA) 1980, 77, 1736–1740. 10.1073/pnas.77.4.1736. [DOI] [PMC free article] [PubMed] [Google Scholar]
Hasel W.; Hendrickson T. F.; Still W.C. A rapid approximation to the solvent accessible surface areas of atoms. Tetrahed. Comp. Method. 1988, 1, 103–116. 10.1016/0898-5529(88)90015-2. [DOI] [Google Scholar]
Cavallo L. POPS: a fast algorithm for solvent accessible surface areas at atomic and residue level. Nucl. Acids. Res. 2003, 31, 3364–3366. 10.1093/nar/gkg601. [DOI] [PMC free article] [PubMed] [Google Scholar]
Street A. G.; Mayo S. L. Pairwise calculation of protein solvent-accessible surface areas. Folding & Design 1998, 3, 253–258. 10.1016/S1359-0278(98)00036-4. [DOI] [PubMed] [Google Scholar]
Weiser J.; Shenkin P. S.; Still W. C. Approximate atomic surfaces from linear combinations of pairwise overlaps (LCPO). J. Comput. Chem. 1999, 20, 217–230.
Dynerman D.; Butzlaff E.; Mitchell J. CUSA and CUDE: GPU-accelerated methods for estimating solvent accessible surface area and desolvation. J. Comput. Biol. 2009, 16, 523–537. 10.1089/cmb.2008.0157.
Wawak R. J.; Gibson K. D.; Scheraga H. A. Gradient discontinuities in calculations involving molecular-surface area. J. Math. Chem. 1994, 15, 207–232. 10.1007/BF01277561.
Gogonea V.; Osawa E. An improved algorithm for the analytical computation of solvent-excluded volume. The treatment of singularities in solvent-accessible surface-area and volume functions. J. Comput. Chem. 1995, 16, 817–842. 10.1002/jcc.540160703.
Richmond T. J. Solvent accessible surface-area and excluded volume in proteins. Analytical equations for overlapping spheres and implications for the hydrophobic effect. J. Mol. Biol. 1984, 178, 63–89. 10.1016/0022-2836(84)90231-6.
Connolly M. L. Computation of molecular volume. J. Am. Chem. Soc. 1985, 107, 1118–1124. 10.1021/ja00291a006.
Dodd L. R.; Theodorou D. N. Analytical treatment of the volume and surface area of molecules formed by an arbitrary collection of unequal spheres intersected by planes. Mol. Phys. 1991, 72, 1313–1345. 10.1080/00268979100100941.
Irisa M. An elegant algorithm of the analytical calculation for the volume of fused spheres with different radii. Comput. Phys. Commun. 1996, 98, 317–338. 10.1016/0010-4655(96)00082-3.
Vassetti D.; Civalleri B.; Labat F. Analytical calculation of the solvent-accessible surface area and its nuclear gradients by stereographic projection: A general approach for molecules, polymers, nanotubes, helices, and surfaces. J. Comput. Chem. 2020, 41, 1464–1479. 10.1002/jcc.26191.
Duan X.; Quan C.; Stamm B. A boundary-partition-based Voronoi diagram of d-dimensional balls: definition, properties, and applications. Adv. Comput. Math. 2020, 46 (44), 1–25. 10.1007/s10444-020-09765-3.
Edelsbrunner H. The union of balls and its dual shape. Discrete Comput. Geom. 1995, 13, 415–440. 10.1007/BF02574053.
Edelsbrunner H.; Koehl P. The weighted-volume derivative of a space-filling diagram. Proc. Natl. Acad. Sci. (USA) 2003, 100, 2203–2208. 10.1073/pnas.0537830100.
Bryant R.; Edelsbrunner H.; Koehl P.; Levitt M. The area derivative of a space-filling diagram. Discrete Comput. Geom. 2004, 32, 293–308. 10.1007/s00454-004-1099-1.
Edelsbrunner H.; Koehl P. The geometry of biomolecular solvation. Combinatorial and Computational Geometry 2005, 52, 243–275.
Akopyan A.; Edelsbrunner H. The Weighted Mean Curvature Derivative of a Space-Filling Diagram. Comput. Math. Biophys. 2020, 8, 51–67. 10.1515/cmb-2020-0100.
Akopyan A.; Edelsbrunner H. The Weighted Gaussian Curvature Derivative of a Space-Filling Diagram. Comput. Math. Biophys. 2020, 8, 74–88. 10.1515/cmb-2020-0101.
Carrillo-Tripp M.; Shepherd C. M.; Borelli I. A.; Venkataraman S.; Lander G.; Natarajan P.; Johnson J. E.; Brooks C. L.; Reddy V. S. VIPERdb2: an enhanced and web API enabled relational database for structural virology. Nucl. Acids. Res. 2009, 37, D436–D442. 10.1093/nar/gkn840.
Edelsbrunner H.; Mücke E. P. Simulation of simplicity: a technique to cope with degenerate cases in geometric algorithms. ACM Trans. Graphics 1990, 9, 66–104. 10.1145/77635.77639.
Mach P.; Koehl P. Geometric measures of large biomolecules: surface, volume, and pockets. J. Comput. Chem. 2011, 32, 3023–3038. 10.1002/jcc.21884.
Edelsbrunner H.; Mücke E. P. Three-dimensional Alpha Shapes. ACM Trans. Graphics 1994, 13, 43–72. 10.1145/174462.156635.
Kaminski G. A.; Friesner R. A.; Tirado-Rives J.; Jorgensen W. L. Evaluation and reparametrization of the OPLS-AA force field for proteins via comparison with accurate quantum chemical calculations on peptides. J. Phys. Chem. B 2001, 105, 6474–6487. 10.1021/jp003919d.
Edelsbrunner H.; Shah N. R. Incremental topological flipping works for regular triangulations. Algorithmica 1996, 15, 223–241. 10.1007/BF01975867.
Edelsbrunner H. Weighted Alpha Shapes; Technical Report UIUC-CS-R-92-1760; University of Illinois, Urbana, IL, 1992.
Amenta N.; Choi S.; Rote G. Incremental constructions con BRIO. In SoCG03: Annual ACM Symposium on Computational Geometry; ACM, 2003; pp 211–219.
Liu Y.; Snoeyink J. A comparison of five implementations of 3D Delaunay tessellation. Combinatorial and Computational Geometry 2005, 52, 439–458.
Cignoni P.; Montani C.; Perego R.; Scopigno R. Parallel 3D Delaunay triangulation. Computer Graphics Forum 1993, 12, 129–142. 10.1111/1467-8659.1230129.
Hardwick J. C. Implementation and Evaluation of an Efficient Parallel Delaunay Triangulation Algorithm. In Ninth Annual ACM Symposium on Parallel Algorithms and Architectures; ACM, 1997; pp 239–248.
Chen M.-B. The merge phase of parallel divide-and-conquer scheme for 3D Delaunay triangulation. In International Symposium on Parallel and Distributed Processing with Applications (ISPA); IEEE, 2010; pp 224–230.
Fuetterling V.; Lojewski C.; Pfreundt F.-J. High-Performance Delaunay Triangulation for Many-Core Computers. In Proceedings of High Performance Graphics; ACM, 2014; pp 97–104.
Lo S. 3D Delaunay triangulation of 1 billion points on a PC. Finite Elements in Analysis and Design 2015, 102–103, 65–73. 10.1016/j.finel.2015.05.003.
Lin J.; Chen R.; Yang C.; Shu Z.; Wang C.; Lin Y.; Wu L. Distributed and Parallel Delaunay Triangulation Construction with Balanced Binary-tree Model in Cloud. In 2016 15th International Symposium on Parallel and Distributed Computing; IEEE, 2016; pp 107–113.
Nguyen C.; Rhodes P. Delaunay triangulation of large-scale datasets using two-level parallelism. Parallel Computing 2020, 98, 102672. 10.1016/j.parco.2020.102672.
Klose T.; Reteno D.; Benamar S.; Hollerbach A.; Colson P.; La Scola B.; Rossmann M. Structure of faustovirus, a large dsDNA virus. Proc. Natl. Acad. Sci. (USA) 2016, 113, 6206–6211. 10.1073/pnas.1523999113.
Montiel-Garcia D.; Santoyo-Rivera N.; Ho P.; Carrillo-Tripp M.; Brooks C. L. III; Johnson J. E.; Reddy V. S. VIPERdb v3.0: a structure-based data analytics platform for viral capsids. Nucl. Acids. Res. 2021, 49, D809–D816. 10.1093/nar/gkaa1096.
Masood T. B.; Ray T.; Natarajan V. Parallel computation of alpha complexes for biomolecules. Comput. Geom. 2020, 90, 101651. 10.1016/j.comgeo.2020.101651.
Cao T.-T.; Nanjappa A.; Gao M.; Tan T.-S. A GPU Accelerated Algorithm for 3D Delaunay Triangulation. In Proceedings of the 18th Meeting of the ACM SIGGRAPH Symposium on Interactive 3D Graphics and Games; ACM, 2014; pp 47–54.

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

ci2c01346_si_001.pdf (289.4 KB, pdf)

