DiSCaMB: a software library for aspherical atom model X-ray scattering factor calculations with CPUs and GPUs

Michał L Chodkiewicz; Szymon Migacz; Witold Rudnicki; Anna Makal; Jarosław A Kalinowski; Nigel W Moriarty; Ralf W Grosse-Kunstleve; Pavel V Afonine; Paul D Adams; Paulina Maria Dominiak

2018 Feb 1;51(Pt 1):193–199. doi: 10.1107/S1600576717015825


PMCID: PMC5822993 PMID: 29507550

A C++ library for calculation of structure factors from the Hansen–Coppens multipolar model of crystal electron density, featuring parallel implementation for multi-core processors and graphics processing units, is presented.

Keywords: structure factors, refinement, GPU, multipole model

Abstract

It has been recently established that the accuracy of structural parameters from X-ray refinement of crystal structures can be improved by using a bank of aspherical pseudoatoms instead of the classical spherical model of atomic form factors. This comes, however, at the cost of increased complexity of the underlying calculations. In order to facilitate the adoption of this more advanced electron density model by the broader community of crystallographers, a new software implementation called DiSCaMB, ‘densities in structural chemistry and molecular biology’, has been developed. It addresses the challenge of providing for high performance on modern computing architectures. With parallelization options for both multi-core processors and graphics processing units (using CUDA), the library features calculation of X-ray scattering factors and their derivatives with respect to structural parameters, gives access to intermediate steps of the scattering factor calculations (thus allowing for experimentation with modifications of the underlying electron density model), and provides tools for basic structural crystallographic operations. Permissively (MIT) licensed, DiSCaMB is an open-source C++ library that can be embedded in both academic and commercial tools for X-ray structure refinement.

1. Introduction

The model of atomic electron density most widely applied in crystal structure refinement (the independent atom model, IAM) assumes spherical symmetry of the atomic density. It therefore cannot account for many effects resulting from the influence of the chemical environment on atomic density – e.g. chemical bonds, charge transfer, lone pairs, intermolecular interactions etc. – and the parameters obtained in the corresponding refinement are consequently subject to systematic errors. Advances in measurements and computational techniques of X-ray diffraction have made most routine diffraction measurements on single crystals of small molecules (i.e. measurements to a resolution of d_min ≤ 5/6 Å, as recommended by the International Union of Crystallography; IUCr, 2012) sufficient to reveal the deficiencies of the IAM.

More realistic models of the electron density, with refinable parameterization, have been proposed (Hirshfeld, 1971; Stewart et al., 1975; Hansen & Coppens, 1978). Owing to the large number of refined parameters (compared to IAM) such models usually require X-ray diffraction data of subatomic resolution (d_min ≤ 0.5 Å). However, even if no high-resolution data are available, the multipole model can still be applied using a so-called transferable aspherical atom model (TAAM). In this case parameters of atomic electron density are not refined but instead they are constrained to values typical for the corresponding atom type. Such parameters of atomic electron densities and algorithms for atom type assignment have been incorporated into databanks of pseudoatom parameters such as ELMAM (Pichon-Pesme et al., 1995; Zarychta et al., 2007; Domagała et al., 2012), Invariom (Dittrich et al., 2004, 2013; Dittrich, Hübschle et al., 2006) and UBDB (Volkov et al., 2004, 2007; Jarzembska & Dominiak, 2012). A wealth of publications indicates that TAAM refinement gives superior results over IAM in terms of figures of merit, atomic positions and atomic displacement parameters (e.g. Jelsch et al., 1998; Dittrich, Hübschle et al., 2006; Dittrich, Munshi & Spackman, 2006; Volkov et al., 2007; Dittrich et al., 2008; Bąk et al., 2011; Sanjuan-Szklarz et al., 2016). In addition, TAAM provides access to quantitative estimation of the electron density and properties derived from it (dipole moments, electrostatic potentials etc.) for molecules and crystals.

In a similar way to the case of TAAM, the idea of transferability has also been applied to the construction of a databank of extremely localized molecular orbitals (ELMO; Meyer, Guillot, Ruiz-Lopez & Genoni, 2016; Meyer, Guillot, Ruiz-Lopez, Jelsch & Genoni, 2016). Application in crystallographic refinement is one of the intended uses of the databank, but no practical application for this purpose has been reported yet.

Recently, the Hirshfeld atom refinement procedure has been introduced. In this approach aspherical atom density is obtained by partitioning quantum mechanical electron density into atomic contributions (Jayatilaka & Dittrich, 2008). This method allows for very accurate determination of the hydrogen atom positions and atomic displacement parameters (Capelli et al., 2014; Woińska et al., 2016), yet it is computationally more demanding than the other methods.

The direct approach to structure factor calculation involves summation of contributions from each scattering center (an atom) at each reciprocal lattice point within a desired resolution limit. It has an approximately quadratic computational complexity with respect to the size of the system studied, resulting from the fact that both the number of atoms and the number of reciprocal lattice points within a fixed resolution limit scale approximately linearly with the unit-cell volume. With the major advantage of simplicity, direct summation is the method of choice in small-molecule crystallography, that is when computational cost is not a primary factor.
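The nested-loop structure behind this scaling can be sketched in a few lines of C++. The sketch below is a simplified illustration and not DiSCaMB code: the types `Atom` and `Hkl` and the function `directSummation` are hypothetical, the space group is assumed to be P1, displacement parameters are ignored and each atom carries a constant, spherical scattering factor.

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical toy types for illustration only; they are not DiSCaMB classes.
struct Atom {
    double x, y, z;  // fractional coordinates
    double f;        // constant, spherical scattering factor (toy value)
};

struct Hkl {
    int h, k, l;     // Miller indices
};

// Direct summation for space group P1 without displacement parameters:
// F(h) = sum_a f_a * exp(2*pi*i * h.r_a).
// The nested loops make the cost proportional to (reflections x atoms).
std::vector<std::complex<double>> directSummation(
    const std::vector<Atom>& atoms, const std::vector<Hkl>& hkls)
{
    const double twoPi = 2.0 * std::acos(-1.0);
    std::vector<std::complex<double>> F(hkls.size());
    for (std::size_t i = 0; i < hkls.size(); ++i) {        // loop over reflections
        std::complex<double> sum(0.0, 0.0);
        for (const Atom& a : atoms) {                       // loop over atoms
            const double phase =
                twoPi * (hkls[i].h * a.x + hkls[i].k * a.y + hkls[i].l * a.z);
            sum += a.f * std::complex<double>(std::cos(phase), std::sin(phase));
        }
        F[i] = sum;
    }
    return F;
}
```

Because both loop lengths grow roughly linearly with the unit-cell volume at fixed resolution, doubling the cell roughly quadruples the number of inner-loop iterations.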

In the case of macromolecules, though, structure factor calculations using direct summation can be very time consuming. Fast Fourier transform based approaches with N log N scaling can remedy that problem. Unfortunately, these cannot be applied in a straightforward manner to the Hansen–Coppens model, because this would require an efficient implementation (not presently available) of a ‘dynamical’ atomic electron density calculation for the model. These dynamical densities are convolutions of the usual atomic density functions and atomic thermal displacement functions, which in the case of the Hansen–Coppens model involve, respectively, Slater and Gaussian radial functions. Alternative methodologies for dealing with aspherical atomic densities at scale have been introduced: polarizable Cartesian Gaussian multipoles (Schnieders et al., 2009) and spherical interatomic scatterers (Afonine et al., 2007). Nevertheless, it is still the Hansen–Coppens model that has been most extensively studied, in particular in macromolecular refinement, either with optimization of the multipolar parameters for ultra-high-resolution data (e.g. Jelsch et al., 2000; Guillot et al., 2008; Hirano et al., 2016) or more commonly via TAAM refinement (e.g. Malinska & Dauter, 2016; Muzet et al., 2003; Pröpper et al., 2013; Schmidt et al., 2003; Held & van Smaalen, 2014; Howard et al., 2016). As TAAM seems to be a good candidate for replacing the IAM in routine refinement for molecular crystals (Sanjuan-Szklarz et al., 2016; Dittrich et al., 2013), availability of software implementations becomes a prominent issue.

There are a few packages that provide multipolar/TAAM refinement, including MoPro (Jelsch et al., 2005), MOLLY (Hansen & Coppens, 1978), Jana2006 (Petříček et al., 2014) and XD2016 (Volkov et al., 2016). However, each of them is a highly specialized entity, requires very distinct data formats and, apart from MoPro, is hardly applicable to larger (macromolecular) cases. We therefore decided to build a new toolbox of software libraries, intended to facilitate integration of the aspherical atom model into a wide range of refinement programs commonly used in X-ray crystallography. To achieve this goal the software toolbox must provide easily embeddable components under a permissive license, and so we have used the MIT open-source license for the new DiSCaMB library described here. It is designed for use in structure refinement with X-ray diffraction data in both small-molecule and macromolecular crystallography. In the latter case, as N log N methods are currently not feasible for the Hansen–Coppens model, the implementation heavily exploits parallelization for both CPUs (central processing units – computer processors) and GPUs (graphics processing units – electronic circuits originally designed for graphics-related computations) to speed up calculations.

2. General considerations

One of the main goals of the development of the DiSCaMB library is to use it in software for X-ray structure refinement. Commonly, this kind of software needs to calculate both structure factors and related derivatives, and this is the task the library was intended for. To ensure general applicability of the library (reusability in multiple programs) it is crucial to define where and how it will fit into the structure common for X-ray refinement programs. The general structure of such a program and the way the library can be incorporated into it is illustrated in Fig. 1. The part specific to aspherical models of atomic electron density is shown as a separate component of the refinement program (ASPHERICAL ATOM CODE). Tasks related to this component involve assignment of the parameters of the aspherical atom model and calculation of structure factors and related derivatives. In the case of TAAM refinement the assignment of parameters corresponds to assignment of atom types and connecting the types with parameters stored in an aspherical atom bank. Atom type assignment is specific to a particular pseudoatom databank. In contrast, the structure factor calculation implemented in the library is universal and can be shared by all banks using the Hansen–Coppens model of electron density. The code responsible for communication between the aspherical atom code and refinement code is denoted in the diagram as INTERFACE. Ideally, it will be possible to replace code on one side of the interface without the need to change the code on the other side, e.g. easily exchange the ‘aspherical atom code’ with another code providing similar functionality.

Figure 1. Structure of an X-ray refinement program using a separate module for the aspherical atom model calculation with pseudoatom databank parameterization.

It is assumed that there are three main tasks in the refinement code related to multipolar model scattering factor calculations:

(1) Assignment of the multipolar model parameters to atoms.

(2) Transferring the parameters to the DiSCaMB library code – at this point some data which could be subsequently used in multiple structure factor calculations are precalculated and stored.

(3) Transferring current structural parameters to the DiSCaMB library code for calculation of structure factors and their derivatives.

The DiSCaMB library alone does not have the functionality required to perform the first task, since it is expected that the code specific to the pseudoatom databank can perform this task. It is, however, able to read multipolar model parameters from files in the XD format generated by LSDB (Volkov et al., 2004; Jarzembska & Dominiak, 2012). The DiSCaMB library can be involved in the next two tasks, being responsible for storing multipolar model parameters, calculation and storage of additional data, and using the data in (multiple) structure factor calculations. In the case of gradient calculations in the third task, two groups of algorithms are provided. One of them typically would be used by small-molecule refinement programs which usually apply nonlinear least-squares procedures. The construction of the matrices used in the procedure can be realized using various strategies and can optionally involve direct inclusion of constraints (Bourhis et al., 2015). All of the strategies require access to derivatives of structure factors with respect to structural parameters at a given scattering vector (h) – this functionality is provided by the library. Such derivatives are then processed in different ways depending on the particular implementation. In the case of macromolecules, structural parameters are usually optimized using gradient methods. Here, derivatives of the target function with respect to the real and imaginary components of the structure factor are expected to be provided by the refinement code. These are subsequently used for calculation of the derivatives of the target function with respect to structural parameters using the chain rule. This macromolecular version of the structure factor and gradient calculation was parallelized since computational performance can be very important in this case.
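The chain-rule step used in the macromolecular case can be illustrated with a short C++ sketch. It is not the DiSCaMB interface: the types and the function `targetGradient` are hypothetical, and the model is deliberately reduced to space group P1 with constant spherical scattering factors and positional parameters only. With F = A + iB, the gradient of the target T with respect to a parameter p is the sum over reflections of (∂T/∂A)(∂A/∂p) + (∂T/∂B)(∂B/∂p).

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical toy types for illustration only; they are not DiSCaMB classes.
struct Atom { double x, y, z; double f; };  // fractional coordinates, toy scattering factor
struct Hkl { int h, k, l; };
struct PositionGradient { double dx = 0.0, dy = 0.0, dz = 0.0; };

// dT_dA[i] and dT_dB[i] are the derivatives of the target function with respect to
// the real and imaginary parts of F for reflection i (supplied by the refinement code).
std::vector<PositionGradient> targetGradient(
    const std::vector<Atom>& atoms, const std::vector<Hkl>& hkls,
    const std::vector<double>& dT_dA, const std::vector<double>& dT_dB)
{
    const double twoPi = 2.0 * std::acos(-1.0);
    std::vector<PositionGradient> grad(atoms.size());
    for (std::size_t i = 0; i < hkls.size(); ++i) {
        for (std::size_t a = 0; a < atoms.size(); ++a) {
            const Atom& atom = atoms[a];
            const double phase =
                twoPi * (hkls[i].h * atom.x + hkls[i].k * atom.y + hkls[i].l * atom.z);
            // Atomic contribution f*(cos + i sin): dA/dx = -2*pi*h*f*sin, dB/dx = 2*pi*h*f*cos.
            const double common =
                twoPi * atom.f * (-dT_dA[i] * std::sin(phase) + dT_dB[i] * std::cos(phase));
            grad[a].dx += hkls[i].h * common;
            grad[a].dy += hkls[i].k * common;
            grad[a].dz += hkls[i].l * common;
        }
    }
    return grad;
}
```

The refinement code only has to supply the two arrays of target derivatives; the structure factor derivatives themselves never need to be stored per reflection, which keeps the memory footprint small for large structures.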

3. Multipolar model structure factors – theory and implementation

The expression for the aspherical atom model structure factor calculation is similar to that for the IAM model with the difference that the asphericity of atomic form factors has to be taken into account (see Appendix B). The general strategy of calculation is also analogous [see e.g. Bourhis et al. (2015) for details of the IAM model]. Here we briefly describe the atomic form factor calculations. Atomic electron density in the Hansen–Coppens multipolar model (Hansen & Coppens, 1978; Coppens, 1997) can be expressed as a linear combination of terms [$\rho_{nlm}(\mathbf{r})$] of the following type:

$$\rho_{nlm}(\mathbf{r}) = R_n(\zeta, r)\, d_{lm}(\hat{\mathbf{r}}) \qquad (1)$$

Here, $d_{lm}$ is a density-normalized spherical harmonic function (see Appendix B), a vector with a circumflex (‘hat’) represents a versor (i.e. $\hat{\mathbf{r}} = \mathbf{r}/|\mathbf{r}|$) and $R_n(\zeta, r)$ is a Slater-type radial function:

$$R_n(\zeta, r) = r^{n} \exp(-\zeta r) \qquad (2)$$

The corresponding scattering factor for the term $\rho_{nlm}$ is given as

$$f_{nlm}(\mathbf{h}) = 4\pi\, i^{l}\, g_{nl}(\zeta, h)\, d_{lm}(\hat{\mathbf{h}}) \qquad (3)$$

The function g is the three-dimensional Fourier–Bessel transform of the radial function:

$$g_{nl}(\zeta, h) = \int_0^{\infty} r^{n+2} \exp(-\zeta r)\, j_l(2\pi h r)\, \mathrm{d}r \qquad (4)$$

where $j_l$ is the lth-order spherical Bessel function. Analytical formulas for the g functions for given l and n have been tabulated (Avery & Watson, 1977; Su & Coppens, 1990). They can also be obtained recursively (Avery & Watson, 1977; Deutsch, 1993) or evaluated from general analytical expressions (Restori, 1990). DiSCaMB provides functionality for calculation of both functions, g and spherical harmonics (for a limited range of $l$), using tabulated formulas. This is sufficient for straightforward calculation of the terms [equation (3)] contributing to atomic form factors, but to calculate the form factor itself one must also find the values of the coefficients for these terms. For this purpose it is convenient to refer to the pseudoatom electron density in the Hansen–Coppens model in its usual representation and explicitly express its dependence on the model parameters:
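As a small worked check of the Fourier–Bessel transform, the sketch below evaluates the l = 0 case numerically and compares it with a closed form. It rests on stated assumptions rather than on the DiSCaMB source: the radial function is taken as r^n exp(−ζr), the transform as the integral of r^(n+2) exp(−ζr) j_0(2πhr) over r from 0 to infinity, and the function names are hypothetical. For n = 0 the integral has the elementary closed form 2ζ/(ζ² + k²)² with k = 2πh.

```cpp
#include <cmath>

// Zeroth-order spherical Bessel function j0(x) = sin(x)/x, with the limit j0(0) = 1.
double sphericalBesselJ0(double x)
{
    return std::abs(x) < 1e-8 ? 1.0 : std::sin(x) / x;
}

// Numerical estimate of int_0^rMax r^(n+2) exp(-zeta*r) j0(2*pi*h*r) dr using the
// composite Simpson rule. rMax must be large enough for exp(-zeta*rMax) to be
// negligible, and steps must be even.
double gNumericL0(int n, double zeta, double h, double rMax = 40.0, int steps = 200000)
{
    const double k = 2.0 * std::acos(-1.0) * h;
    const double dr = rMax / steps;
    auto integrand = [&](double r) {
        return std::pow(r, n + 2) * std::exp(-zeta * r) * sphericalBesselJ0(k * r);
    };
    double sum = integrand(0.0) + integrand(rMax);
    for (int i = 1; i < steps; ++i)
        sum += (i % 2 ? 4.0 : 2.0) * integrand(i * dr);
    return sum * dr / 3.0;
}

// Closed form for n = 0, l = 0: 2*zeta / (zeta^2 + k^2)^2 with k = 2*pi*h.
double gClosedFormN0L0(double zeta, double h)
{
    const double k = 2.0 * std::acos(-1.0) * h;
    const double d = zeta * zeta + k * k;
    return 2.0 * zeta / (d * d);
}
```

Tabulated closed forms of this kind avoid any quadrature at run time, which is why they are attractive when the same function must be evaluated for every atom type at every reflection.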

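$$\rho(\mathbf{r}) = P_c\, \rho_{\mathrm{core}}(r) + P_v\, \kappa^3 \rho_{\mathrm{val}}(\kappa r) + \sum_{l=0}^{l_{\max}} \kappa'^3 N_l\, (\kappa' r)^{n_l} \exp(-\kappa' \xi r) \sum_{m=-l}^{l} P_{lm}\, d_{lm}(M\hat{\mathbf{r}}) \qquad (5)$$

where $\rho_{\mathrm{core}}$ and $\rho_{\mathrm{val}}$ are the theoretically derived, spherically averaged free-atom core and valence electron densities, respectively, normalized to one electron. These two terms evaluate to a linear combination of Slater s-type functions: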

(analogously for $\rho_{\mathrm{val}}$). The expression involves the occupation factors of the core orbitals, the parameters of the orbitals of the isolated reference atom, a normalization factor N and spherical symmetrization (see Appendix B). The occupation factors can be either user defined or set to default values corresponding to filling the lowest-energy orbitals. The parameters of the resulting linear combination depend on the parameters of the orbitals and their occupancies (for the sake of clarity this dependence is not explicitly shown in the text). There are several sets of Slater-type atomic wavefunctions (Clementi & Roetti, 1974; Bunge et al., 1993; Su & Coppens, 1998; Macchi & Coppens, 2001; Volkov & Macchi, unpublished work) in use for $\rho_{\mathrm{core}}$ and $\rho_{\mathrm{val}}$ parameterization. Any of them can, in principle, be used with the DiSCaMB library, and Clementi & Roetti wavefunction data are provided (in the case of a hydrogen atom an analytical wavefunction is used).

The last term in equation (5) represents the deformation valence electron density {$N_l$ being normalization factors given by $\xi^{n_l+3}/(n_l+2)!$}. The values of the parameters ξ and $n_l$ are predefined; their default values for a specific chemical element in a given electronic configuration may vary from program to program. The DiSCaMB code accepts either user-provided parameters or those used in UBDB (Jarzembska & Dominiak, 2012). Parameter ξ can, in principle, be l dependent, but the Hansen–Coppens model is mostly used in an l-independent fashion, as also coded in DiSCaMB. The matrix M transforms the position vector into the local coordinate system associated with the atom, and r is the distance from the nucleus. DiSCaMB provides tools for defining local coordinate systems in a similar manner to the one implemented in the XD package (Volkov et al., 2016). The populations $P_v$ and $P_{lm}$, and the dimensionless expansion–contraction parameters κ and κ′, can be, in the Hansen–Coppens model, refined against experimental data. However, they are kept fixed during TAAM refinement, and for that reason DiSCaMB does not currently provide functionality for the calculation of derivatives with respect to these parameters. Therefore, it cannot currently be directly used for all-parameter multipolar refinement. A naïve TAAM refinement implementation is included in the DiSCaMB distribution as an example.

The atomic form factor corresponding to the pseudoatom electron density [equation (5)] is given as

$$f(\mathbf{h}) = P_c\, f_{\mathrm{core}}(h) + P_v\, f_{\mathrm{val}}(h/\kappa) + f_{\mathrm{def}}(\mathbf{h}) \qquad (7)$$

where $f_{\mathrm{core}}$ and $f_{\mathrm{val}}$ are terms corresponding to the scattering from the core and the spherically averaged valence electron densities of the following type:

$$f_{\mathrm{core}}(h) = 4\pi \int_0^{\infty} \rho_{\mathrm{core}}(r)\, j_0(2\pi h r)\, r^2\, \mathrm{d}r \qquad (8)$$

The deformation valence electron density contribution to scattering is given by

$$f_{\mathrm{def}}(\mathbf{h}) = 4\pi \sum_{l=0}^{l_{\max}} i^{l}\, N_l\, g_{n_l l}(\xi, h/\kappa') \sum_{m=-l}^{l} P_{lm}\, d_{lm}(M\hat{\mathbf{h}}) \qquad (9)$$

DiSCaMB exposes functionality allowing for performing the intermediate steps in the calculation of scattering factors, including those expressed in equations (3)–(9). This, in principle, may be useful for the construction of an alternative to the Hansen–Coppens model. DiSCaMB makes use of the notion of pseudoatom types. Atoms of the same type have the same values of $f_{\mathrm{core}}$, $f_{\mathrm{val}}$ and prefactors of the sum over m in $f_{\mathrm{def}}$ – this feature is employed to speed up calculations. A common approach in the $f_{\mathrm{core}}$ and $f_{\mathrm{val}}$ calculation is to precalculate their values at a relatively small set of points and later use these data to interpolate values of $f_{\mathrm{core}}$ and $f_{\mathrm{val}}$. This approach was not coded in DiSCaMB, as we observed that the use of atom types provides for efficient calculation times in tests for large molecules.
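The benefit of atom types can be illustrated with a minimal C++ sketch. It is an assumption-laden toy and not the DiSCaMB implementation: `TypedAtom`, `structureFactorWithTypes` and the callback `typeFormFactor` are hypothetical names, the space group is P1, and the type-dependent quantity is reduced to a single real number per reflection (standing in for the orientation-independent parts that atoms of one type share).

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical toy type for illustration only; it is not a DiSCaMB class.
struct TypedAtom {
    double x, y, z;    // fractional coordinates
    std::size_t type;  // index of the pseudoatom type assigned to this atom
};

// Structure factor for one reflection in P1. The type-dependent part of the form
// factor is evaluated once per atom type and then reused for every atom of that
// type, so the expensive call count scales with the number of types, not atoms.
std::complex<double> structureFactorWithTypes(
    const std::vector<TypedAtom>& atoms, std::size_t nTypes,
    const std::function<double(std::size_t)>& typeFormFactor,
    int h, int k, int l)
{
    std::vector<double> cached(nTypes);
    for (std::size_t t = 0; t < nTypes; ++t)
        cached[t] = typeFormFactor(t);  // expensive part: once per type

    const double twoPi = 2.0 * std::acos(-1.0);
    std::complex<double> F(0.0, 0.0);
    for (const TypedAtom& a : atoms) {
        const double phase = twoPi * (h * a.x + k * a.y + l * a.z);
        F += cached[a.type] * std::complex<double>(std::cos(phase), std::sin(phase));
    }
    return F;
}
```

In Table 1 the number of atom types is between roughly one and two orders of magnitude smaller than the number of atoms, which indicates how much repeated work such caching can remove.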

4. Speeding up calculations with parallel computing on CPUs and GPUs

In recent years computers capable of performing parallel calculations have become very common because of the widespread use of multi-core processors in new computers and application of GPUs in general purpose programming (GPGPU). The DiSCaMB library provides an implementation of parallel code for calculation of the structure factors and related derivatives with both CPUs and GPUs. In terms of parallel computing the structure factor calculation may be classified as an embarrassingly parallel problem, which means that it is separable into a number of mostly independent tasks. Parallelization is performed by splitting the input set of h vectors into subsets and performing calculations for each subset as separate parallel tasks. The results from each task are then collected (in the case of the structure factors) and summed (in the case of the derivatives). The strategy for parallelization is more involved in the case of the code for GPUs, as described in Appendix C . The parallelization was performed for shared memory systems (e.g. single computer node) using OpenMP (http://www.openMP.org) for CPUs and CUDA (created by Nvidia, for Nvidia GPUs, http://nvidia.com/cuda) for GPUs.
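The collect-versus-sum distinction can be sketched for the CPU case with OpenMP directives in C++. This is a hedged illustration, not code from the library: the one-dimensional model with unit scatterers, the `Result` type and the function `parallelOverReflections` are all hypothetical. If OpenMP is not enabled at compile time the pragmas are ignored and the same code runs serially.

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical result type for illustration only.
struct Result {
    std::vector<std::complex<double>> F;  // one value per reflection (collected)
    std::vector<double> grad;             // dT/dx per atom (summed over reflections)
};

// One-dimensional toy model with unit scatterers: F_h = sum_a exp(2*pi*i*h*x_a).
// dT_dA and dT_dB hold the target derivatives with respect to Re(F_h) and Im(F_h).
Result parallelOverReflections(const std::vector<double>& x,
                               const std::vector<int>& h,
                               const std::vector<double>& dT_dA,
                               const std::vector<double>& dT_dB)
{
    const double twoPi = 2.0 * std::acos(-1.0);
    const long nRefl = static_cast<long>(h.size());
    Result out{std::vector<std::complex<double>>(h.size()),
               std::vector<double>(x.size(), 0.0)};

    #pragma omp parallel
    {
        std::vector<double> local(x.size(), 0.0);  // per-thread partial derivative sums

        #pragma omp for
        for (long i = 0; i < nRefl; ++i) {
            std::complex<double> sum(0.0, 0.0);
            for (std::size_t a = 0; a < x.size(); ++a) {
                const double phase = twoPi * h[i] * x[a];
                const double c = std::cos(phase), s = std::sin(phase);
                sum += std::complex<double>(c, s);
                local[a] += twoPi * h[i] * (-dT_dA[i] * s + dT_dB[i] * c);
            }
            out.F[i] = sum;  // each reflection is written by exactly one thread
        }

        // Derivative contributions from different subsets of reflections are summed.
        #pragma omp critical
        for (std::size_t a = 0; a < x.size(); ++a)
            out.grad[a] += local[a];
    }
    return out;
}
```

Structure factors need no synchronization because every reflection has its own output slot, whereas the derivatives require a reduction because every reflection contributes to every atom.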

The computational efficiency of the code was tested on structures ranging from a ten-residue peptide to a relatively large protein (∼380 amino acids) (see Table 1). The time necessary for a single calculation of the structure factors and their derivatives at the 5/6 Å resolution limit with a single CPU core varied from 0.5 s to about 2 h. Fortunately, it can be efficiently reduced by running the calculations in parallel. The test shows a good scaling efficiency (E) of the code: $E = t_1/(N t_N)$, where $t_k$ is execution time using k cores and N is the total number of cores. E exceeds 0.97 for all the cases in the table, which translates to a reduction of the execution time nearly by the number of processor cores used. This also suggests that the problem is well suited for further parallelization for distributed memory systems (e.g. multi-node machines like computational clusters). The use of a GPU allowed for a further reduction of the computational time compared to a 14 core processor by a factor of about 2–3 in the case of the largest systems studied.
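The efficiency figures can be recomputed from the 1 core and 14 core wall times in Table 1. The short C++ sketch below takes E as the single-core time divided by the product of the core count and the multi-core time, which is the usual definition of parallel efficiency; the names `Timing`, `kTable1`, `scalingEfficiency` and `minEfficiency` are hypothetical, and all times have been converted to seconds.

```cpp
#include <algorithm>
#include <vector>

// Wall times from Table 1 converted to seconds (1 core and 14 cores).
struct Timing {
    const char* id;
    double t1;   // single-core wall time, s
    double t14;  // 14-core wall time, s
};

const std::vector<Timing> kTable1 = {
    {"QANQIG", 0.515, 0.037},
    {"3p4j", 8.7, 0.63},
    {"1ejg", 10.1, 0.74},
    {"4q4g", 310.0, 22.0},
    {"1f8b", 6321.0, 463.0},  // 105 min 21 s and 7 min 43 s
};

// Parallel efficiency: single-core time divided by (number of cores x parallel time).
double scalingEfficiency(double t1, double tN, int nCores)
{
    return t1 / (nCores * tN);
}

// Smallest efficiency over all rows, for comparison with a quoted lower bound.
double minEfficiency(const std::vector<Timing>& rows, int nCores)
{
    double best = 1.0e300;
    for (const Timing& row : rows)
        best = std::min(best, scalingEfficiency(row.t1, row.t14, nCores));
    return best;
}
```

With the rounded times printed in the table, one entry (4q4g) comes out marginally above 1, which reflects rounding of the tabulated values rather than superlinear scaling.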

Table 1. Performance test – the wall time of a single calculation of structure factors and their derivatives at 5/6 Å resolution.

Structure ID code^†               QANQIG     3p4j       1ejg     4q4g       1f8b
Space group                       P2₁2₁2₁    P2₁2₁2₁    P2₁      P2₁2₁2₁    I432
No. of atoms                      169        586        942      3461       5916
No. of atom types^‡               20         42         40       62         54
No. of reflections^§              6490       23837      31597    153346     458685
Relative computational cost^¶     4.4        55.9       59.5     2122.9     65125.9

Execution wall time
Time units                        ms         s          s        s          min:s
Intel Xeon E5-2697 v3, 1 core     515        8.7        10.1     310        105:21
Intel Xeon E5-2697 v3, 4 cores    126        2.18       2.57     78         26:24
Intel Xeon E5-2697 v3, 14 cores   37         0.63       0.74     22         7:43
NVIDIA Tesla K20                  100        0.45       1.28     10.6       3:59
NVIDIA Tesla K80                  60         0.3        2.97     7.3        2:53


^† The CSD refcode in the case of QANQIG and the PDB ID in the other cases. QANQIG – 10 residue peptide (Aravinda et al., 2004); 3p4j – Z-DNA hexamer duplex d(CGCGCG)₂ (Brzezinski et al., 2011); 1ejg – crambin (Jelsch et al., 2000); 4q4g – peptidoglycan endopeptidase RipA (Squeglia et al., 2014); 1f8b – complex of native influenza virus neuraminidase (carbohydrate chains and solvent water molecules removed; Smith et al., 2001).

^‡ Structures were parametrized with the use of the UBDB bank (Jarzembska & Dominiak, 2012) and the LSDB program (Volkov et al., 2004) supported by manual interventions in the case of disordered parts.

^§ The lists of reflection indices were generated with the help of cctbx (Grosse-Kunstleve et al., 2002).

^¶ Equal to the number of symmetry operations used in calculations times the number of reflections times the number of atoms (in millions).
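The relative computational cost row of Table 1 can be reproduced with a one-line function. The C++ sketch below uses a hypothetical function name, and the symmetry operation counts are assumptions inferred from the space groups: 4 for P2₁2₁2₁, 2 for P2₁ and 24 for I432. The last value is the one that reproduces the tabulated number, which suggests that the centring translation is not counted among the operations used; this is an inference from the table, not a statement made in the text.

```cpp
// Relative computational cost as defined in the table footnote:
// (symmetry operations) x (reflections) x (atoms), expressed in millions.
// Floating-point arithmetic avoids integer overflow for the largest structure.
double relativeCost(int nSymmetryOps, long long nReflections, long long nAtoms)
{
    return static_cast<double>(nSymmetryOps) * nReflections * nAtoms / 1.0e6;
}
```

For example, QANQIG gives 4 × 6490 × 169 = 4 387 240, i.e. about 4.4 million, in agreement with the table.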

5. Summary and outlook

We have presented the open-source C++ library DiSCaMB for the calculation of X-ray scattering factors from the Hansen–Coppens multipolar model of crystal electron density. The library is intended to be used in structure refinement with fixed multipolar parameters – e.g. refinement using a pseudoatom databank parameterization. It also provides access to intermediate steps in the calculations, allowing for modifications of the underlying electron density model. The library is designed for both small and large molecules. In the latter case the code is parallelized for both CPUs and GPUs. Performance tests show good scalability.

Future development plans include completing the aspherical atom code part in Fig. 1 by providing functionality that performs assignment of atom types and the corresponding atomic form factor parameters. Other planned developments include writing code for structure factor calculation with arbitrary externally provided aspherical atomic form factors (this would facilitate, for example, the inclusion of Hirshfeld atom partition based form factors in the refinement code). We also plan to speed up the calculations by further parallelization (targeting distributed memory systems and the utilization of the vector processing capabilities of modern CPUs) and development of algorithms for optimized calculations of the structure factors.

Acknowledgments

Professor Philip Coppens (University at Buffalo) is acknowledged for his invaluable contribution to the project in its early stages and Dr Dominik Gront (University of Warsaw) for his support in the GPU part of the project. This work has utilized computing resources at the PL-Grid Infrastructure and at the University of Białystok in Poland.

Appendix A. Summary of features

Main features. Calculation of multipolar structure factors and their derivatives with respect to structural parameters (position, occupancy and atomic displacement parameters) – various versions suitable for large and small molecules. Parallelized for use with multi-core processors and graphics processing units. Easy access to computational steps required to calculate the structure factors. Algorithms aware of pseudoatom types.

Structure factor calculations. Access to atomic wavefunction data. Conversion of the atomic wavefunction into corresponding spherically averaged density (linear combination of Slaters) for a given electronic configuration. Calculation of the multipolar model form factor with possibility for separate calculation of its individual components. Calculation of IAM form factors and corresponding structure factors [with Waasmaier and Kirfel parameterization (Waasmaier & Kirfel, 1995) and H-atom parameters taken from cctbx (Grosse-Kunstleve et al., 2002)]. Local coordinate system calculation.

Structural data handling. Symmetry operations – multiplication, application to vectors, string and matrix notation. Conversions between fractional and Cartesian coordinate systems in direct and reciprocal space, conversions of atomic displacement parameters and positional parameter derivatives.

Utilities. Mathematical utilities – basic three-dimensional algebra and calculations involving spherical harmonics. Utilities for string operations, error handling, performance measurement. Reading Hansen–Coppens model parameters from XD input files generated by LSDB (Volkov et al., 2004; Jarzembska & Dominiak, 2012).

Project organization. Build process managed with CMake (Martin & Hoffman, 2015) to support the compilation environment of the user's choice. Code documentation includes Doxygen-generated (van Heesch, 2016) documentation. Extensive test set [testing against data generated with XD (Volkov et al., 2016) and cctbx (Grosse-Kunstleve et al., 2002; Gildea et al., 2011)]. Examples illustrating usage, including a very basic implementation of refinement with the BFGS optimizer as implemented in ALGLIB (Bochkanov, 2017).

Availability. DiSCaMB is released under the MIT license. The source code is freely available at the project web site, http://crystal.chem.uw.edu.pl.

Appendix B. Additional information on multipolar structure factor calculation

The expression for the aspherical atom model structure factor can be written in the following way:

The first summation runs over atoms in the asymmetric unit, occ_a is the atomic occupancy and Inline graphic is the multiplicity. The second summation runs over the symmetry operations () which would generate all symmetry-equivalent atoms in the unit cell for an atom in a general position. stands for the temperature factor and for the aspherical form factor.

It is a commonly applied convention in the Hansen–Coppens model to use density-normalized real spherical harmonics indexed with three indices [$d_{lmp}$], with the p index being either + or − for m > 0 and no p index for m = 0. Then the sum over m in the deformation valence part is usually written as $\sum_{m=0}^{l} P_{lm\pm}\, d_{lm\pm}$. We have used the equivalent two-index notation [$d_{lm}$] with the sign index (p) being directly incorporated into the m index, which now takes also negative values. Formulas for $d_{lm}$ have been calculated and tabulated (Hansen & Coppens, 1978; Paturle & Coppens, 1988; Coppens, 1997; Michael & Volkov, 2015).

Free-atom core and valence electron densities are combinations of orbital-related electron densities. Such orbitals in a basis of Slater-type functions take the following form:

where $Y_{lm}$ are spherical harmonics. Spherical symmetrization of the corresponding electron density consists of its projection onto the space of the fully symmetric function defined on the surface of a sphere, i.e. the space spanned by spherical harmonic $Y_{00}$:

Appendix C. Details of the implementation for GPU

The GPU implementation is parallelized over reflections (h vectors). Each CUDA thread gets one reflection and computes the inner loops over atoms and symmetry operations. At the end of computation each thread stores its own structure factor F into global memory.

Computation of derivatives (occupancy derivative, three positional derivatives and six atomic displacement parameter derivatives) requires a reduction over all GPU threads to sum the contributions from all reflections (threads). This reduction could be handled in many different ways; for example, we could store all partial results from each thread into global memory and perform the reduction in a separate kernel. This solution requires a lot of global memory and a lot of global memory bandwidth.

A second straightforward approach is to divide this reduction into two steps: first perform partial reduction in each GPU block using shared memory and after that produce the final result for each atom in a separate kernel, adding up all the partial contributions from the blocks. This solution forces us to use synchronization [__syncthreads() function] inside the loop that performs the main computation, which significantly slows the execution. In our implementation we have chosen a different solution which is in between the two approaches described above. We perform a partial reduction for each warp (currently one warp is 32 consecutive threads). This reduction is done in registers using the warp shuffle instructions available in the NVIDIA Kepler architecture. After the reduction only one thread from each warp stores the partial result in global memory. This significantly reduces (32 times) the quantity of data that need to be transferred to global memory and we are not required to synchronize threads within blocks. The final reduction (adding up the contributions from each warp) is performed in a separate kernel and takes approximately 0.5% of the total execution time.
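The two-stage reduction can be emulated on a CPU to show the data flow. The C++ sketch below is only an emulation under stated assumptions and is not the GPU kernel: the function names are hypothetical, a warp is modelled as an array of 32 values, and the in-register exchange that a GPU would perform with warp shuffle intrinsics (for example `__shfl_down_sync` in current CUDA) is replaced by plain array indexing.

```cpp
#include <array>
#include <cstddef>
#include <numeric>
#include <vector>

constexpr std::size_t kWarpSize = 32;

// CPU emulation of a shuffle-down tree reduction within one warp: in each step,
// lane i adds the value held by lane i + offset, and the offset is halved.
// After log2(32) = 5 steps lane 0 holds the sum of all 32 lanes.
double warpReduceSum(std::array<double, kWarpSize> lanes)
{
    for (std::size_t offset = kWarpSize / 2; offset > 0; offset /= 2)
        for (std::size_t lane = 0; lane < offset; ++lane)
            lanes[lane] += lanes[lane + offset];
    return lanes[0];
}

// Stage 1: each group of 32 consecutive per-thread values is reduced to a single
// partial sum (the value that one thread per warp would write to global memory).
std::vector<double> perWarpPartials(const std::vector<double>& perThread)
{
    std::vector<double> partials;
    for (std::size_t base = 0; base < perThread.size(); base += kWarpSize) {
        std::array<double, kWarpSize> lanes{};  // zero padding for an incomplete warp
        for (std::size_t lane = 0; lane < kWarpSize && base + lane < perThread.size(); ++lane)
            lanes[lane] = perThread[base + lane];
        partials.push_back(warpReduceSum(lanes));
    }
    return partials;
}

// Stage 2: the final reduction over the per-warp partial sums (a separate kernel on a GPU).
double finalReduction(const std::vector<double>& partials)
{
    return std::accumulate(partials.begin(), partials.end(), 0.0);
}
```

The first stage shrinks the amount of data passed to the final reduction by a factor of 32, which matches the reduction in global memory traffic described above.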

Coefficients Inline graphic for each atom type are stored in the constant memory of a GPU to improve the performance and reduce the required global memory bandwidth. The limited capacity of constant memory (64 kB) allows us to store the required coefficients for 250 atom types.

Funding Statement

This work was funded by Foundation of Polish Science grant POMOST/2010-2/3; NIH grant 1P01 GM063210; LBNL grant; US Department of Energy grant DE-AC02-05CH11231; Polish National Center for Science grant DEC-2012/04/A/ST5/00609.

References

Afonine, P. V., Grosse-Kunstleve, R. W., Adams, P. D., Lunin, V. Y. & Urzhumtsev, A. (2007). Acta Cryst. D63, 1194–1197. [DOI] [PMC free article] [PubMed]
Aravinda, S., Datta, S., Shamala, N. & Balaram, P. (2004). Angew. Chem. Int. Ed. 43, 6728–6731. [DOI] [PubMed]
Avery, J. & Watson, K. J. (1977). Acta Cryst. A33, 679–680.
Bąk, J. M., Domagała, S., Hübschle, C., Jelsch, C., Dittrich, B. & Dominiak, P. M. (2011). Acta Cryst. A67, 141–153. [DOI] [PubMed]
Bochkanov, S. (2017). ALGLIB, http://www.alglib.net.
Bourhis, L. J., Dolomanov, O. V., Gildea, R. J., Howard, J. A. K. & Puschmann, H. (2015). Acta Cryst. A71, 59–75. [DOI] [PMC free article] [PubMed]
Brzezinski, K. B. A., Brzuszkiewicz, A., Dauter, M., Kubicki, M., Jaskolski, M. & Dauter, Z. (2011). Nucleic Acids Res. 39, 6238–6248. [DOI] [PMC free article] [PubMed]
Bunge, C. F., Barrientos, J. A. & Bunge, A. V. (1993). At. Data Nucl. Data Tables, 53, 113–162.
Capelli, S. C., Bürgi, H.-B., Dittrich, B., Grabowsky, S. & Jayatilaka, D. (2014). IUCrJ, 1, 361–379. [DOI] [PMC free article] [PubMed]
Clementi, E. & Roetti, C. (1974). At. Data Nucl. Data Tables, 14, 177–478.
Coppens, P. (1997). X-ray Charge Densities and Chemical Bonding. Oxford: International Union of Crystallography/Oxford University Press.
Deutsch, M. (1993). J. Appl. Cryst. 26, 683–686.
Dittrich, B., Hübschle, C. B., Luger, P. & Spackman, M. A. (2006). Acta Cryst. D62, 1325–1335. [DOI] [PubMed]
Dittrich, B., Hübschle, C. B., Pröpper, K., Dietrich, F., Stolper, T. & Holstein, J. J. (2013). Acta Cryst. B69, 91–104. [DOI] [PubMed]
Dittrich, B., Koritsánszky, T. & Luger, P. (2004). Angew. Chem. Int. Ed. 43, 2718–2721. [DOI] [PubMed]
Dittrich, B., McKinnon, J. J. & Warren, J. E. (2008). Acta Cryst. B64, 750–759. [DOI] [PubMed]
Dittrich, B., Munshi, P. & Spackman, M. A. (2006). Acta Cryst. C62, o633–o635. [DOI] [PubMed]
Domagała, S., Fournier, B., Liebschner, D., Guillot, B. & Jelsch, C. (2012). Acta Cryst. A68, 337–351. [DOI] [PubMed]
Gildea, R. J., Bourhis, L. J., Dolomanov, O. V., Grosse-Kunstleve, R. W., Puschmann, H., Adams, P. D. & Howard, J. A. K. (2011). J. Appl. Cryst. 44, 1259–1263. [DOI] [PMC free article] [PubMed]
Grosse-Kunstleve, R. W., Sauter, N. K., Moriarty, N. W. & Adams, P. D. (2002). J. Appl. Cryst. 35, 126–136.
Guillot, B., Jelsch, C., Podjarny, A. & Lecomte, C. (2008). Acta Cryst. D64, 567–588. [DOI] [PubMed]
Hansen, N. K. & Coppens, P. (1978). Acta Cryst. A34, 909–921.
Heesch, D. van (2016). Doxygen, http://www.doxygen.org.
Held, J. & van Smaalen, S. (2014). Acta Cryst. D70, 1136–1146. [DOI] [PMC free article] [PubMed]
Hirano, Y., Takeda, K. & Miki, K. (2016). Nature, 534, 281–284. [DOI] [PubMed]
Hirshfeld, F. L. (1971). Acta Cryst. B27, 769–781.
Howard, E. I., Guillot, B., Blakeley, M. P., Haertlein, M., Moulin, M., Mitschler, A., Cousido-Siah, A., Fadel, F., Valsecchi, W. M., Tomizaki, T., Petrova, T., Claudot, J. & Podjarny, A. (2016). IUCrJ, 3, 115–126. [DOI] [PMC free article] [PubMed]
International Union of Crystallography (2012). Acta Cryst. C68, e3–e11.
Jarzembska, K. N. & Dominiak, P. M. (2012). Acta Cryst. A68, 139–147. [DOI] [PubMed]
Jayatilaka, D. & Dittrich, B. (2008). Acta Cryst. A64, 383–393. [DOI] [PubMed]
Jelsch, C., Guillot, B., Lagoutte, A. & Lecomte, C. (2005). J. Appl. Cryst. 38, 38–54.
Jelsch, C., Pichon-Pesme, V., Lecomte, C. & Aubry, A. (1998). Acta Cryst. D54, 1306–1318. [DOI] [PubMed]
Jelsch, C., Teeter, M. M., Lamzin, V., Pichon-Pesme, V., Blessing, R. H. & Lecomte, C. (2000). Proc. Natl Acad. Sci. USA, 97, 3171–3176. [DOI] [PMC free article] [PubMed]
Macchi, P. & Coppens, P. (2001). Acta Cryst. A57, 656–662. [DOI] [PubMed]
Malinska, M. & Dauter, Z. (2016). Acta Cryst. D72, 770–779. [DOI] [PMC free article] [PubMed]
Martin, K. & Hoffman, B. (2015). Mastering CMake: A Cross-Platform Build System. New York: Kitware.
Meyer, B., Guillot, B., Ruiz-Lopez, M. F. & Genoni, A. (2016). J. Chem. Theory Comput. 12, 1052–1067. [DOI] [PubMed]
Meyer, B., Guillot, B., Ruiz-Lopez, M. F., Jelsch, C. & Genoni, A. (2016). J. Chem. Theory Comput. 12, 1068–1081. [DOI] [PubMed]
Michael, J. R. & Volkov, A. (2015). Acta Cryst. A71, 245–249. [DOI] [PubMed]
Muzet, N., Guillot, B., Jelsch, C., Howard, E. & Lecomte, C. (2003). Proc. Natl Acad. Sci. USA, 100, 8742–8747. [DOI] [PMC free article] [PubMed]
Paturle, A. & Coppens, P. (1988). Acta Cryst. A44, 6–7.
Petrícek, V., Dusek, M. & Palatinus, L. (2014). Z. Kristallogr. 229, 345–352.
Pichon-Pesme, V., Lecomte, C. & Lachekar, H. (1995). J. Phys. Chem. 99, 6242–6250.
Pröpper, K., Holstein, J. J., Hübschle, C. B., Bond, C. S. & Dittrich, B. (2013). Acta Cryst. D69, 1530–1539. [DOI] [PubMed]
Restori, R. (1990). Acta Cryst. A46, 150–151.
Sanjuan-Szklarz, W. F., Hoser, A. A., Gutmann, M., Madsen, A. Ø. & Woźniak, K. (2016). IUCrJ, 3, 2052–2525. [DOI] [PMC free article] [PubMed]
Schmidt, A., Jelsch, C., Østergaard, P., Rypniewski, W. & Lamzin, V. S. (2003). J. Biol. Chem. 278, 43357–43362. [DOI] [PubMed]
Schnieders, M. J., Fenn, T. D., Pande, V. S. & Brunger, A. T. (2009). Acta Cryst. D65, 952–965. [DOI] [PMC free article] [PubMed]
Smith, B. J., Colman, P. M., Von Itzstein, M., Danylec, B. & Varghese, J. N. (2001). Protein Sci. 10, 689–696. [DOI] [PMC free article] [PubMed]
Squeglia, F., Ruggiero, A., Romano, M., Vitagliano, L. & Berisio, R. (2014). Acta Cryst. D70, 2295–2300. [DOI] [PubMed]
Stewart, R. F., Bentley, J. & Goodman, B. (1975). J. Chem. Phys. 63, 3786–3793.
Su, Z. & Coppens, P. (1990). J. Appl. Cryst. 23, 71–73.
Su, Z. & Coppens, P. (1998). Acta Cryst. A54, 646–652.
Volkov, A., Li, X., Koritsanszky, T. & Coppens, P. (2004). J. Phys. Chem. A, 108, 4283–4300.
Volkov, A., Macchi, P., Farrugia, L. J., Gatti, C., Mallinson, P., Richter, T. & Koritsanszky, T. (2016). XD2016 – A Computer Program Package for Multipole Refinement, Topological Analysis of Charge Densities and Evaluation of Intermolecular Energies from Experimental and Theoretical Structure Factors. http://www.chem.gla.ac.uk/~louis/xd-home/.
Volkov, A., Messerschmidt, M. & Coppens, P. (2007). Acta Cryst. D63, 160–170. [DOI] [PubMed]
Waasmaier, D. & Kirfel, A. (1995). Acta Cryst. A51, 416–431.
Woińska, M., Grabowsky, S., Dominiak, P. M., Woźniak, K. & Jayatilaka, D. (2016). Sci. Adv. 2, e1600192. [DOI] [PMC free article] [PubMed]
Zarychta, B., Pichon-Pesme, V., Guillot, B., Lecomte, C. & Jelsch, C. (2007). Acta Cryst. A63, 108–125. [DOI] [PubMed]

