Efficient evaluation of three-center Coulomb integrals

Gyula Samu; Mihály Kállay

doi:10.1063/1.4983393

2017 May 22;146(20):204101. doi: 10.1063/1.4983393

PMCID: PMC5440237 PMID: 28571354

Abstract

In this study we pursue the most efficient paths for the evaluation of three-center electron repulsion integrals (ERIs) over solid harmonic Gaussian functions of various angular momenta. First, the adaptation of the well-established techniques developed for four-center ERIs, such as the Obara–Saika, McMurchie–Davidson, Gill–Head-Gordon–Pople, and Rys quadrature schemes, and the combinations thereof for three-center ERIs is discussed. Several algorithmic aspects, such as the order of the various operations and primitive loops as well as prescreening strategies, are analyzed. Second, the number of floating point operations (FLOPs) is estimated for the various algorithms derived, and based on these results the most promising ones are selected. We report the efficient implementation of the latter algorithms invoking automated programming techniques and also evaluate their practical performance. We conclude that the simplified Obara–Saika scheme of Ahlrichs is the most cost-effective one in the majority of cases, but the modified Gill–Head-Gordon–Pople and Rys algorithms proposed herein are preferred for particular shell triplets. Our numerical experiments also show that even though the solid harmonic transformation and the horizontal recurrence require significantly fewer FLOPs if performed at the contracted level, this approach does not improve the efficiency in practical cases. Instead, it is more advantageous to carry out these operations at the primitive level, which allows for more efficient integral prescreening and memory layout.

I. INTRODUCTION

Electron repulsion integrals (ERIs), which describe the Coulomb interaction of two charge distributions, are one of the basic quantities in quantum chemistry. In conventional formulations, these are four-center integrals defined as

(ϕ_{A} ϕ_{B} | ϕ_{C} ϕ_{D}) = \int \int \frac{ϕ_{A} (𝐫_{𝟏}) ϕ_{B} (𝐫_{𝟏}) ϕ_{C} (𝐫_{𝟐}) ϕ_{D} (𝐫_{𝟐})}{| 𝐫_{𝟏} - 𝐫_{𝟐} |} d 𝐫_{𝟏} d 𝐫_{𝟐}

(1)

for basis functions $ϕ_{A}$ , $ϕ_{B}$ , $ϕ_{C}$ , and $ϕ_{D}$ with r₁ and r₂ being the coordinates of the electrons. The evaluation of such integrals is often the limiting step for Hartree–Fock (HF) and density functional theory (DFT) calculations, while their transformation from the atomic orbital (AO) to the molecular orbital (MO) basis can be a bottleneck for correlated methods. The computational requirements for both of these tasks can be efficiently reduced by invoking the density fitting (DF) approximation, which is equivalent to the resolution of identity technique if the so-called Coulomb metric is used.^1–5 In this approach, the generalized electron densities given by the product of two basis functions are expanded in an auxiliary (fitting) basis in a manner that minimizes the error of the electric field generated by the charge distributions^3,4 as

(ϕ_{A} ϕ_{B} | ϕ_{C} ϕ_{D}) \approx \sum_{Q, R} (ϕ_{A} ϕ_{B} | ρ_{Q}) V_{Q R}^{- 1} (ρ_{R} | ϕ_{C} ϕ_{D}),

(2)

where $ρ_{Q}$ and $ρ_{R}$ denote functions from the fitting basis, $V_{Q R}^{- 1}$ is the element of the inverse of the matrix containing the two-center ERIs $(ρ_{Q} | ρ_{R})$ , and $(ϕ_{A} ϕ_{B} | ρ_{Q})$ is a three-center Coulomb integral. The main advantage of applying this approximation is that the $O (N^{4})$ scaling of evaluating and processing the ERIs is reduced to $O (N^{2} M)$ with N (M) being the size of the AO (auxiliary) basis, and the calculation of these integrals over a reduced number of Gaussian basis functions is also considerably simpler than that for the four-center ones. When dealing with large systems, even the necessary three-center ERIs become too numerous to store on disk, or it is more advantageous to recalculate them since the sparsity of the integrals can be efficiently utilized with prescreening techniques. These observations have led to the development of integral-direct algorithms, where the ERIs are recalculated whenever they are needed, e.g., in each cycle of a direct self-consistent field (SCF) procedure^6–8 or for the overlapping domains in a local correlation calculation.⁹ The efficiency of such algorithms obviously depends on the speed of the integral evaluation.
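As an illustration of Eq. (2), the following minimal sketch assembles approximate four-center integrals from three-center ones. The numbers are synthetic toy data rather than real integrals, and the function names (`cholesky`, `forward_solve`, `df_four_center`) are hypothetical. Instead of forming $V^{-1}$ explicitly, the sketch factorizes $V = L L^{T}$; this choice is an assumption of the example and not a statement about any particular implementation.

```python
import math

def cholesky(v):
    """Lower-triangular L with L L^T = v for a symmetric positive definite v."""
    n = len(v)
    l = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = v[i][j] - sum(l[i][k] * l[j][k] for k in range(j))
            l[i][j] = math.sqrt(s) if i == j else s / l[j][j]
    return l

def forward_solve(l, rhs):
    """Solve L y = rhs for lower-triangular L."""
    y = []
    for i in range(len(l)):
        y.append((rhs[i] - sum(l[i][k] * y[k] for k in range(i))) / l[i][i])
    return y

def df_four_center(three_center, v):
    """Eq. (2): sum_QR (ab|Q) V^-1_QR (R|cd), one row of three_center per pair ab."""
    l = cholesky(v)
    # half[ab] = L^-1 (ab|.), so that half[ab] . half[cd] = (ab|.) V^-1 (.|cd)
    half = [forward_solve(l, row) for row in three_center]
    return [[sum(x * y for x, y in zip(hi, hj)) for hj in half] for hi in half]

# Synthetic data: 3 orbital pairs (rows) and 2 fitting functions (columns).
three_center = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
v = [[2.0, 1.0], [1.0, 2.0]]
approx = df_four_center(three_center, v)
```

The work per element of `approx` is proportional to the number of fitting functions M, while only the $N^{2} M$ three-center quantities have to be stored.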

For the evaluation of four-center ERIs, several efficient schemes have been constructed. The oldest of the still popular methods is the one developed by King, Dupuis, and Rys,^10–16 commonly referred to as the Rys quadrature scheme, which is a Gaussian quadrature based technique for the evaluation of integrals containing functions with arbitrary angular momenta. Other methods are mainly based on recurrence relations using scaled Boys functions¹⁷ as their starting values. The scheme of McMurchie and Davidson¹⁸ (MD) utilizes the fact that Cartesian Gaussian overlap distributions can be written in terms of Hermite Gaussian functions and also that the two-center Hermite integrals necessary for this expansion can be reduced to one-center ones. Later, Obara and Saika^19,20 (OS) presented their method based on recurrence relations connecting auxiliary integrals of various angular momenta. Their scheme arguably remains the most widely used one, due to the subsequent introduction of the horizontal recurrence relation^21–23 (HRR) by Head-Gordon and Pople and the electron transfer relation²⁴ (ETR) by Hamilton and Schaefer. The latter recurrence was also presented by Lindh, Riu, and Liu utilizing the close relationship between the OS and the Rys quadrature schemes, and these authors also developed the reduced multiplication scheme by combining the Rys quadrature approach with the ETR and the HRR.^25,26 Gill and co-workers,^27–34 amongst other contributions, achieved a synthesis of the OS and the MD methods by moving the transformation of Hermite integrals into integrals over Cartesian overlap distributions to the contracted level by properly scaling the intermediate one-center integrals, resulting in a scheme that is very efficient for integrals over highly contracted functions.

Concerning the evaluation of three-center integrals, fewer studies can be found in the literature. Köster, exploiting the uncontracted nature of the auxiliary basis sets, combined the OS, MD, and Gill–Head-Gordon–Pople (GHP) algorithms for three-center ERIs over Cartesian Gaussians.³⁵ Later he also proposed the use of Hermite Gaussian auxiliary functions,^36,37 which saves the transformation from Hermite to Cartesian functions in an MD scheme. Reine, Tellgren, and Helgaker^38,39 showed that Hermite Gaussians transform into solid harmonic ones exactly the same way as Cartesian Gaussians do, and utilizing this finding these authors also put forward a scheme for the evaluation of three-center integrals over solid harmonic Gaussians which avoids the Hermite to Cartesian transformation. A remarkable improvement on the OS scheme for solid harmonic three-center ERIs was achieved by Ahlrichs,⁴⁰ who realized that the recurrence relation for the build-up of angular momentum on the fitting function greatly simplifies for three-center integrals. Efficient three-center ERI implementations can be found in the Libint library of Valeev^41,42 and the adaptive integral core code of Knizia,⁴³ who both applied the results of Ahlrichs.

It is also important to mention here that there exist several approaches that employ the DF approximation but at least partly avoid the explicit construction of the three-center ERI lists. The so-called J-engine and the related schemes exploit the structure of the Coulomb term in a direct SCF calculation, and instead of performing the relatively expensive recursions and transformations for the ERIs the reverse operations are carried out for the quantities by which the ERIs are multiplied.^36,44,45 These algorithms are particularly useful for Kohn–Sham SCF calculations where the significantly more costly exact exchange term is not computed, but efficient DF HF and hybrid DFT algorithms can also be designed if the J-engine approaches are combined with low-cost schemes for the evaluation of the Fock exchange.^9,46–51 A further possibility for the reduction of the costs of DF SCF calculations is to approximate far-field ERIs invoking asymptotic or multipole expansions and to evaluate only the near-field integrals analytically.^52–54 Nonetheless, there are numerous applications where the explicit evaluation of the three-center ERIs cannot be avoided. For the evaluation of the Fock exchange in a DF SCF calculation or for any correlated calculation employing the DF approximation, at least one AO index of the three-center integrals must be transformed to the MO basis, and, to the best of our knowledge, there exist no algorithms that use similar tricks as the J-engine scheme. In the above cases, at least the near-field three-center integrals must be computed, which requires a considerable computation time, especially with basis sets including functions of high angular momentum. Thus, the cost-effective evaluation of three-center Coulomb integrals is of utmost importance for DF methods.

The aim of this paper is to find the most efficient route for the evaluation of three-center ERIs over solid harmonic Gaussian functions of various angular momenta. We compare the OS, MD, GHP, and Rys quadrature schemes and their combinations and discuss several algorithmic aspects for the evaluation of three-center ERIs. In Sec. II the adaptation of the aforementioned methods for the evaluation of three-center ERIs is presented. The given equations form the basis for the estimation of the floating point operations (FLOPs) required by the various approaches, detailed in Sec. III. The implementation of the schemes with the lowest theoretical FLOP counts along with various prescreening strategies and orders of the operations is discussed in Sec. IV, and the comparison of practical performances is done in Sec. V. Finally, in Sec. VI the efficiency of our implementation is demonstrated by calculating the ERIs for medium to large systems.

II. THEORY

A. Three-center Coulomb integrals

In this work we are concerned with the evaluation of three-center ERIs over contracted solid harmonic Gaussian basis functions, which are gained by linear transformations of integrals over unnormalized primitive Cartesian Gaussian basis functions. These functions are defined as

G_{𝐼𝐽𝐾} (𝐫, a, 𝐀) = x_{A}^{I} y_{A}^{J} z_{A}^{K} \exp (- a r_{A}^{2}),

(3)

where $𝐫$ denotes the position vector of the electron, $𝐀$ is the position of the nucleus on which the function is centered, a is a constant Gaussian exponent, and $r_{A}$ is the magnitude of the vector $𝐫_{A} = 𝐫 - 𝐀$ with $x_{A}$ being the x component of $𝐫_{A}$ . $L = I + J + K$ will be called the angular momentum of $G_{𝐼𝐽𝐾}$ , and the vector $𝑳 = (I, J, K)$ will be referred to as the angular momentum vector of $G_{𝐼𝐽𝐾}$ . Functions with the same center, exponent, and angular momentum constitute a shell with (L + 1)(L + 2)/2 components. The primitive Gaussians are separable in the three Cartesian directions, that is, $G_{𝐼𝐽𝐾} = G_{𝐼} G_{𝐽} G_{𝐾}$ , where, for instance, $G_{𝐼} = x_{A}^{I} \exp (- a x_{A}^{2})$ . They also obey the following recurrence relation for differentiation with respect to a nuclear coordinate (given here for the x direction only):

\frac{\partial G_{𝐼}}{\partial A_{x}} = 2 a G_{𝐼 + 1} - 𝐼 G_{𝐼 - 1},

(4)

where A_x is the x component of A.
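Eq. (4) can be checked numerically with a short sketch that compares its right-hand side with a central finite difference in $A_{x}$ . The function names and the step size `h` are assumptions chosen for this illustration.

```python
import math

def gaussian_1d(i, x, a, center):
    """x factor G_I = x_A^I exp(-a x_A^2) of a primitive Cartesian Gaussian."""
    if i < 0:
        return 0.0
    xa = x - center
    return xa**i * math.exp(-a * xa * xa)

def derivative_by_recurrence(i, x, a, center):
    """Right-hand side of Eq. (4): 2a G_{I+1} - I G_{I-1}."""
    return (2.0 * a * gaussian_1d(i + 1, x, a, center)
            - i * gaussian_1d(i - 1, x, a, center))

def derivative_by_finite_difference(i, x, a, center, h=1e-5):
    """Central difference of G_I with respect to the center coordinate A_x."""
    return (gaussian_1d(i, x, a, center + h)
            - gaussian_1d(i, x, a, center - h)) / (2.0 * h)
```

For example, `derivative_by_recurrence(2, 0.7, 1.3, -0.2)` and `derivative_by_finite_difference(2, 0.7, 1.3, -0.2)` should agree up to the finite-difference error.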

For solid harmonic Gaussian functions, one needs to combine functions with the same exponent, angular momentum, and center, but different angular momentum vectors as

G_{𝐿𝑚} (𝐫, a, 𝐀) = \sum_{I+J+K = L} 𝐶_{IJK}^{Lm} G_{𝐼𝐽𝐾} (𝐫, a, 𝐀) .

(5)

A shell of solid harmonic Gaussians consists of functions with $0 \leq | m | \leq L$ , having 2L + 1 components. The $𝐶_{IJK}^{Lm}$ coefficients in Eq. (5) only depend on the angular momentum vector and the value of L and m.¹⁷

We obtain contracted Gaussians by linearly combining functions with different exponents a but the same angular momentum vector and center,

χ_{A L m} (𝐫, 𝐀) = \sum_{a} G_{L m} (𝐫, a, 𝐀) d_{a χ_{A}},

(6)

where the contraction coefficients $d_{a χ_{A}}$ also include the norm of the solid harmonic Gaussian function and are the same for a given shell. Of course, the transformation given by Eq. (5) can also be applied to integrals in the contracted basis and the one defined by Eq. (6) to the integrals in the primitive Cartesian basis as well.

Three-center ERIs over primitive Gaussian functions are defined as

(𝑳_{a} 𝑳_{b} | 𝐿_{c}) = \int \int \frac{G_{I_{a} J_{a} K_{a}} (𝐫_{𝟏}, a, 𝐀) G_{I_{b} J_{b} K_{b}} (𝐫_{𝟏}, b, 𝐁) G_{I_{c} J_{c} K_{c}} (𝐫_{𝟐}, c, 𝐂)}{| 𝐫_{𝟏} - 𝐫_{𝟐} |} d 𝐫_{𝟏} d 𝐫_{𝟐},

(7)

where $𝑳_{a} = (I_{a}, J_{a}, K_{a})$ stands for the angular momentum vector and $L_{a} = I_{a} + J_{a} + K_{a}$ is the angular momentum of the function with exponent a. From these, integrals over solid harmonic contracted Gaussians are computed by applying Eqs. (5) and (6) in an arbitrary order on the three centers. We will refer to primitive integrals sharing angular momenta $L_{a}$ , $L_{b}$ , and $L_{c}$ , centers $𝐀$ , $𝐁$ , and $𝐂$ , and exponents a, b, and c as a primitive class, e.g., the class (11|1) consists of 27 primitive integrals. Similarly, the members of contracted classes are integrals over contracted Gaussians of the same angular momenta and centers. A shell triplet will refer to all the integrals over solid harmonic Gaussians belonging to centers $𝐀$ , $𝐁$ , and $𝐂$ and angular momenta $L_{a}$ , $L_{b}$ , and $L_{c}$ .
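The component counts quoted above can be reproduced with a few lines. This is only a bookkeeping sketch; the function names and the ordering of the Cartesian components are assumptions, and `shell_triplet_size` counts one solid harmonic function per shell and magnetic quantum number.

```python
def cartesian_components(l):
    """All angular momentum vectors (I, J, K) with I + J + K = l."""
    return [(i, j, l - i - j)
            for i in range(l, -1, -1)
            for j in range(l - i, -1, -1)]

def n_cartesian(l):
    """Number of Cartesian components of a shell: (L + 1)(L + 2)/2."""
    return (l + 1) * (l + 2) // 2

def n_solid_harmonic(l):
    """Number of solid harmonic components of a shell: 2L + 1."""
    return 2 * l + 1

def primitive_class_size(la, lb, lc):
    """Number of primitive Cartesian integrals in the class (la lb|lc)."""
    return n_cartesian(la) * n_cartesian(lb) * n_cartesian(lc)

def shell_triplet_size(la, lb, lc):
    """Number of solid harmonic integrals for angular momenta la, lb, lc."""
    return n_solid_harmonic(la) * n_solid_harmonic(lb) * n_solid_harmonic(lc)
```

For instance, `primitive_class_size(1, 1, 1)` gives the 27 integrals of the class (11|1), whereas a (dd|d) class has 216 Cartesian primitives but only 125 solid harmonic integrals.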

An important special case is the primitive integral where L_a = L_b = L_c = 0, the value of which can be expressed directly⁴⁰ as

{(𝟎𝟎 | 𝟎)}^{(0)} = (𝟎𝟎 | 𝟎) = θ_{p c} κ_{a b} F_{0} (α R_{PC}^{2}),

(8)

with

\begin{matrix} κ_{a b} & = & \exp (- μ R_{AB}^{2}), \\ μ & = & \frac{a b}{a + b}, \\ 𝐑_{AB} & = & 𝐀 - 𝐁, \\ p & = & a + b, \end{matrix} \begin{matrix} θ_{p c} & = & \frac{2 π^{5 / 2}}{p c \sqrt{p + c}}, \\ 𝐏 & = & \frac{a 𝐀 + b 𝐁}{p}, \\ 𝐑_{PC} & = & 𝐏 - 𝐂, \\ α & = & \frac{p c}{p + c}, \end{matrix}

and F_n being the Boys function of order n, defined as

F_{n} (x) = \int_{0}^{1} t^{2 n} \exp (- x t^{2}) d t .

(9)

The integral in Eq. (8) and also other auxiliary integrals where the order of the Boys function is greater than 0 are the starting points for the OS,¹⁹ MD,¹⁸ and GHP²⁷ schemes for the evaluation of the integrals in Eq. (7) for arbitrary angular momenta.
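A minimal sketch of Eqs. (8) and (9) is given below. The Boys function is evaluated with the standard series $F_{n} (x) = e^{- x} \sum_{k \geq 0} (2 x)^{k} / [(2 n + 1) (2 n + 3) \cdots (2 n + 2 k + 1)]$ , which is an assumption of this example and is only adequate for small to moderate arguments; the function names are hypothetical.

```python
import math

def boys(n, x, max_terms=400):
    """Boys function F_n(x) of Eq. (9) from its power series (moderate x only)."""
    term = 1.0 / (2 * n + 1)
    total = term
    for k in range(1, max_terms):
        term *= 2.0 * x / (2 * n + 2 * k + 1)
        total += term
        if term <= 1e-17 * total:
            break
    return math.exp(-x) * total

def ssss_aux(n, a, A, b, B, c, C):
    """theta_pc kappa_ab F_n(alpha R_PC^2); n = 0 gives the (00|0) integral of Eq. (8)."""
    p = a + b
    mu = a * b / p
    alpha = p * c / (p + c)
    P = [(a * u + b * v) / p for u, v in zip(A, B)]
    r_ab2 = sum((u - v) ** 2 for u, v in zip(A, B))
    r_pc2 = sum((u - v) ** 2 for u, v in zip(P, C))
    kappa = math.exp(-mu * r_ab2)
    theta = 2.0 * math.pi ** 2.5 / (p * c * math.sqrt(p + c))
    return theta * kappa * boys(n, alpha * r_pc2)
```

Two simple consistency checks are the closed form $F_{0} (x) = \frac{1}{2} \sqrt{π / x} \, erf (\sqrt{x})$ and the downward recursion $F_{n} (x) = [2 x F_{n + 1} (x) + e^{- x}] / (2 n + 1)$ .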

B. Obara–Saika recursion

The OS scheme utilizes recurrence relations of auxiliary intermediate integrals to construct the true ERIs with the desired angular momenta. An efficient application of this method to three-center ERIs was presented by Ahlrichs.⁴⁰ This approach will be referred to as OS1. The first step here is to evaluate the required auxiliary integrals

{(𝟎𝟎 | 𝟎)}^{(n)} = θ_{p c} κ_{a b} F_{n} (α R_{PC}^{2})

(10)

for $L_{c} \leq n \leq L_{a} + L_{b} + L_{c}$ . Then the vertical recurrence relation¹⁹ (VRR) is used to increment the angular momentum of the first function on the bra side (given here for the x direction) as

{([𝒍_{a} + 𝟏_{x}] 𝟎 | 𝟎)}^{(n)} = X_{PA} {(𝒍_{a} 𝟎 | 𝟎)}^{(n)} - \frac{α}{p} X_{PC} {(𝒍_{a} 𝟎 | 𝟎)}^{(n + 1)} + \frac{i_{a}}{2 p} ({([𝒍_{a} - 𝟏_{x}] 𝟎 | 𝟎)}^{(n)} - \frac{α}{p} {([𝒍_{a} - 𝟏_{x}] 𝟎 | 𝟎)}^{(n + 1)}),

(11)

where $X_{PA}$ is the x component of the vector $𝐑_{PA} = 𝐏 - 𝐀$ , and generally $𝟏_{σ} = (δ_{σ, x}, δ_{σ, y}, δ_{σ, z})$ for $σ = x, y, z$ . Here and later, $l_{a}$ , $i_{a}$ , and $𝒍_{a}$ refer to the angular momentum, its x component, and the angular momentum vector of the first Gaussian in the intermediate integrals, respectively, and a similar notation will be used for the angular momenta of the second and third functions and their components. With Eq. (11), the classes ${(𝒍_{a} 𝟎 | 𝟎)}^{(L_{c})}$ are calculated for $max (1, L_{a} - L_{c}) \leq l_{a} \leq L_{a} + L_{b}$ . Next, in the case where solid harmonic basis functions are supposed to be on the ket side, $𝒍_{c}$ can be built up by a two-term VRR,⁴⁰

{(𝒍_{a} 𝟎 | [𝒍_{c} + 𝟏_{x}])}^{(n)} = \frac{α}{c} X_{PC} {(𝒍_{a} 𝟎 | 𝒍_{c})}^{(n + 1)} + \frac{i_{a}}{2 (p + c)} {([𝒍_{a} - 𝟏_{x}] 𝟎 | 𝒍_{c})}^{(n + 1)} .

(12)

Eq. (12) is used to produce the ${(𝒍_{a} 𝟎 | 𝑳_{c})}^{(0)}$ classes for $L_{a} \leq l_{a} \leq L_{a} + L_{b}$ . From here on, the superscript (n) will be dropped when it is equal to 0. The last step is to increment $𝒍_{b}$ , which is efficiently done by the HRR of Head-Gordon and Pople,²¹

(𝒍_{a} [𝒍_{b} + 𝟏_{x}] | 𝒍_{c}) = ([𝒍_{a} + 𝟏_{x}] 𝒍_{b} | 𝒍_{c}) + X_{AB} (𝒍_{a} 𝒍_{b} | 𝒍_{c}) .

(13)
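The smallest nontrivial instance of the OS1 route can be sketched as follows: the $(p_{x} s | s)$ integral is obtained from Eq. (11) with $𝒍_{a} = 𝟎$ , and $(s p_{x} | s)$ follows from the HRR, Eq. (13). As an independent reference, Eq. (4) with I = 0 implies that these integrals equal the derivatives of $(s s | s)$ with respect to $A_{x}$ and $B_{x}$ divided by 2a and 2b, respectively, which is evaluated here by finite differences. All function names, the series used for the Boys function, and the step size are assumptions of this example.

```python
import math

def boys(n, x, max_terms=400):
    """Boys function from its power series (adequate for moderate x only)."""
    term = 1.0 / (2 * n + 1)
    total = term
    for k in range(1, max_terms):
        term *= 2.0 * x / (2 * n + 2 * k + 1)
        total += term
        if term <= 1e-17 * total:
            break
    return math.exp(-x) * total

def aux_integrals(nmax, a, A, b, B, c, C):
    """(00|0)^(n) of Eq. (10) for n = 0..nmax, plus p, alpha, and P."""
    p = a + b
    alpha = p * c / (p + c)
    P = [(a * u + b * v) / p for u, v in zip(A, B)]
    r_ab2 = sum((u - v) ** 2 for u, v in zip(A, B))
    r_pc2 = sum((u - v) ** 2 for u, v in zip(P, C))
    pref = (2.0 * math.pi ** 2.5 / (p * c * math.sqrt(p + c))
            * math.exp(-a * b / p * r_ab2))
    return [pref * boys(n, alpha * r_pc2) for n in range(nmax + 1)], p, alpha, P

def ps_s_and_sp_s(a, A, b, B, c, C):
    """(p_x s|s) by the VRR of Eq. (11) and (s p_x|s) by the HRR of Eq. (13)."""
    s, p, alpha, P = aux_integrals(1, a, A, b, B, c, C)
    x_pa, x_pc, x_ab = P[0] - A[0], P[0] - C[0], A[0] - B[0]
    ps_s = x_pa * s[0] - alpha / p * x_pc * s[1]   # Eq. (11) with l_a = 0, n = 0
    sp_s = ps_s + x_ab * s[0]                      # Eq. (13) with l_a = l_b = 0
    return ps_s, sp_s

def finite_difference_reference(a, A, b, B, c, C, h=1e-5):
    """Same two integrals from central differences of (ss|s), using Eq. (4)."""
    def ss_s(A_, B_):
        return aux_integrals(0, a, A_, b, B_, c, C)[0][0]
    shift = lambda v, d: [v[0] + d, v[1], v[2]]
    ps_s = (ss_s(shift(A, h), B) - ss_s(shift(A, -h), B)) / (2.0 * h) / (2.0 * a)
    sp_s = (ss_s(A, shift(B, h)) - ss_s(A, shift(B, -h))) / (2.0 * h) / (2.0 * b)
    return ps_s, sp_s
```

The two routines should agree to within the finite-difference error for any choice of exponents and centers.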

Besides the above algorithm, there are at least three other possibilities to get the target integrals with OS-type recursions. The first one, labeled as OS2, evaluates the same auxiliary integrals with Eq. (10) as in OS1 and then applies the VRR to the ket side first as

{(𝟎𝟎 | [𝒍_{c} + 𝟏_{x}])}^{(n)} = \frac{α}{c} X_{PC} {(𝟎𝟎 | 𝒍_{c})}^{(n + 1)}

(14)

to construct the classes ${(𝟎𝟎 | 𝒍_{c})}^{(n)}$ for $\max (1, L_{c} - L_{a} - L_{b}) \leq l_{c} \leq L_{c}$ and $L_{c} - l_{c} \leq n \leq L_{a} + L_{b}$ . This is followed by building up the angular momentum of the first function on the bra side as

{([𝒍_{a} + 𝟏_{x}] 𝟎 | 𝒍_{c})}^{(n)} = X_{PA} {(𝒍_{a} 𝟎 | 𝒍_{c})}^{(n)} - \frac{α}{p} X_{PC} {(𝒍_{a} 𝟎 | 𝒍_{c})}^{(n + 1)} + \frac{i_{a}}{2 p} ({([𝒍_{a} - 𝟏_{x}] 𝟎 | 𝒍_{c})}^{(n)} - \frac{α}{p} {([𝒍_{a} - 𝟏_{x}] 𝟎 | 𝒍_{c})}^{(n + 1)}) + \frac{i_{c}}{2 (p + c)} {(𝒍_{a} 𝟎 | [𝒍_{c} - 𝟏_{x}])}^{(n + 1)},

(15)

to compute $(𝒍_{a} 𝟎 | 𝑳_{c})$ for $L_{a} \leq l_{a} \leq L_{a} + L_{b}$ , and finally the algorithm is finished with Eq. (13).
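That the OS1 and OS2 orderings lead to the same result can be illustrated for the $(p_{x} s | p_{x})$ integral. In the sketch below, the auxiliary integrals ${(𝟎𝟎 | 𝟎)}^{(n)}$ of Eq. (10) are passed in as a list `aux`; since the agreement of the two routes is purely algebraic, any numbers can be used for testing. Only the x coordinates of the centers enter, and all names are hypothetical.

```python
def geometry(a, b, c, ax, bx, cx):
    """Return p, alpha, X_PA, and X_PC for exponents a, b, c and x coordinates."""
    p = a + b
    alpha = p * c / (p + c)
    px = (a * ax + b * bx) / p
    return p, alpha, px - ax, px - cx

def ps_p_os1(a, b, c, ax, bx, cx, aux):
    """(p_x s|p_x) along the OS1 route: Eq. (11) first, then Eq. (12)."""
    p, alpha, x_pa, x_pc = geometry(a, b, c, ax, bx, cx)
    # Eq. (11) with l_a = 0 for n = 1: ([1_x] 0|0)^(1)
    ps_s_1 = x_pa * aux[1] - alpha / p * x_pc * aux[2]
    # Eq. (12) with l_c = 0 and i_a = 1
    return alpha / c * x_pc * ps_s_1 + aux[1] / (2.0 * (p + c))

def ps_p_os2(a, b, c, ax, bx, cx, aux):
    """(p_x s|p_x) along the OS2 route: Eq. (14) first, then Eq. (15)."""
    p, alpha, x_pa, x_pc = geometry(a, b, c, ax, bx, cx)
    # Eq. (14) with l_c = 0: (00|[1_x])^(n) for n = 0 and n = 1
    ss_p = [alpha / c * x_pc * aux[n + 1] for n in (0, 1)]
    # Eq. (15) with l_a = 0 (so i_a = 0) and i_c = 1
    return (x_pa * ss_p[0] - alpha / p * x_pc * ss_p[1]
            + aux[1] / (2.0 * (p + c)))
```

Both functions need ${(𝟎𝟎 | 𝟎)}^{(n)}$ for n = 1 and 2 only, in line with the ranges of n stated for the two schemes when $L_{a} = L_{c} = 1$ and $L_{b} = 0$ .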

Apart from the VRR, another way to build up l_a or l_c is to use the ETR²⁴ arising from the translational invariance of integrals, and also Eqs. (4) and (13). For three-center ERIs, the ETR has the form

([𝒍_{a} + 𝟏_{x}] 𝟎 | 𝒍_{c}) = - \frac{b}{p} X_{AB} (𝒍_{a} 𝟎 | 𝒍_{c}) + \frac{i_{a}}{2 p} ([𝒍_{a} - 𝟏_{x}] 𝟎 | 𝒍_{c}) - \frac{c}{p} (𝒍_{a} 𝟎 | [𝒍_{c} + 𝟏_{x}]) + \frac{i_{c}}{2 p} (𝒍_{a} 𝟎 | [𝒍_{c} - 𝟏_{x}])

(16)

for the $𝒍_{c} \to 𝑙_{a}$ conversion, and

(𝒍_{a} 𝟎 | [𝒍_{c} + 𝟏_{x}]) = - \frac{b}{c} X_{AB} (𝒍_{a} 𝟎 | 𝒍_{c}) + \frac{i_{a}}{2 c} ([𝒍_{a} - 𝟏_{x}] 𝟎 | 𝒍_{c}) - \frac{p}{c} ([𝒍_{a} + 𝟏_{x}] 𝟎 | 𝒍_{c})

(17)

for the $𝑙_{a} \to 𝑙_{c}$ transfer. We note that, in principle, Eq. (17) also contains a fourth term on the right-hand side, $\frac{i_{c}}{2 c} (𝒍_{a} 𝟎 | [𝒍_{c} - 𝟏_{x}])$ , but this term is canceled for the same reasons as discussed by Ahlrichs for the VRR⁴⁰ when transforming to the solid harmonic basis. This cancellation also takes place for the third and fourth terms in both Eqs. (11) and (15), and the second term in Eq. (16), but only in the case when $L_{b} = 0$ . It should be noted that the numerical instability in the ETR associated with the addition $p X_{PA} / (c + d) + X_{QC}$ ⁵⁵ (where, in the four-center case, d is the exponent of the fourth Gaussian and $𝐐 = (c 𝐂 + d 𝐃) / (c + d)$ , $𝐃$ being the center of the fourth function) does not appear here. This is because in the absence of the fourth center, Eq. (13) only has to be applied to the bra side, reducing the aforementioned sum to $(p / c) X_{PA} = - (b / c) X_{AB}$ . If we wish to build up the integrals necessary for Eq. (13) with Eq. (16), we cannot use Eq. (14) for the construction of the $(𝟎𝟎 | 𝒍_{c})$ type classes; instead, we have to employ the full vertical recurrence,⁴⁰

{(𝟎𝟎 | [𝒍_{c} + 𝟏_{x}])}^{(n)} = \frac{α}{c} X_{PC} {(𝟎𝟎 | 𝒍_{c})}^{(n + 1)} + \frac{i_{c}}{2 c} ({(𝟎𝟎 | [𝒍_{c} - 𝟏_{x}])}^{(n)} - \frac{α}{c} {(𝟎𝟎 | [𝒍_{c} - 𝟏_{x}])}^{(n + 1)}),

(18)

for the ket side. The terms corresponding to the ones in the big parentheses in Eq. (18) vanish in Eqs. (12) and (14) during the solid harmonic transformation⁴⁰ of the ket side; however, with Eq. (16) terms belonging to angular momenta other than l_c get built into the integrals to be transformed, and these will not cancel. The scheme where we first employ Eq. (18) to build up $𝒍_{c}$ and then Eq. (16) for $𝒍_{a}$ will be referred to as OS3. In this route, we first use Eq. (10) to calculate the ${(𝟎𝟎 | 𝟎)}^{(n)}$ integrals for $[L_{c} mod 2] \leq n \leq L_{a} + L_{b} + L_{c}$ , then Eq. (18) for the classes $(𝟎𝟎 | 𝒍_{c})$ with $\max (L_{c} - L_{a} - L_{b}, 1) \leq l_{c} \leq L_{a} + L_{b} + L_{c}$ , and thereafter we apply Eq. (16) to get the $(𝒍_{a} 𝟎 | 𝑳_{c})$ classes for $L_{a} \leq l_{a} \leq L_{a} + L_{b}$ . Finally, in the algorithm denoted as OS4, $𝒍_{a}$ is built up by Eq. (11), and $𝒍_{c}$ is incremented by the ETR, Eq. (17). Here the necessary ${(𝟎𝟎 | 𝟎)}^{(n)}$ integrals are in the range $0 \leq n \leq L_{a} + L_{b} + L_{c}$ and are used to calculate the $(𝒍_{a} 𝟎 | 𝟎)$ classes for $\max (L_{a} - L_{c}, 1) \leq l_{a} \leq L_{a} + L_{b} + L_{c}$ .
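The consistency of the electron transfer relations with the vertical recurrences can be checked in the lowest-order case. With $𝒍_{a} = 𝒍_{c} = 𝟎$ , Eq. (16) should reproduce the $(p_{x} s | s)$ integral of Eq. (11), and Eq. (17) should reproduce the $(s s | p_{x})$ integral of Eq. (14). In the sketch below, `aux` holds the n = 0 and n = 1 auxiliary integrals of Eq. (10); the identities are algebraic, so the test values are arbitrary, and all names are hypothetical.

```python
def etr_consistency(a, b, c, ax, bx, cx, aux):
    """Return ((ps|s) by VRR, by ETR) and ((ss|p) by VRR, by ETR) along x."""
    p = a + b
    alpha = p * c / (p + c)
    px = (a * ax + b * bx) / p
    x_pa, x_pc, x_ab = px - ax, px - cx, ax - bx

    ps_s_vrr = x_pa * aux[0] - alpha / p * x_pc * aux[1]   # Eq. (11), l_a = 0
    ss_p_vrr = alpha / c * x_pc * aux[1]                   # Eq. (14), l_c = 0

    # Eq. (16) with l_a = l_c = 0: the terms with i_a and i_c vanish
    ps_s_etr = -b / p * x_ab * aux[0] - c / p * ss_p_vrr
    # Eq. (17) with l_a = l_c = 0: the term with i_a vanishes
    ss_p_etr = -b / c * x_ab * aux[0] - p / c * ps_s_vrr
    return (ps_s_vrr, ps_s_etr), (ss_p_vrr, ss_p_etr)
```

The agreement rests on $X_{PA} = - (b / p) X_{AB}$ , the same identity that removes the numerically unstable sum in the three-center case.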

C. McMurchie–Davidson scheme

The strategy of the MD method is to expand ERIs over Gaussian overlap distributions arising from multiplying $G_{I_{a} J_{a} K_{a}} (𝐫_{𝟏}, a, 𝐀)$ and $G_{I_{b} J_{b} K_{b}} (𝐫_{𝟏}, b, 𝐁)$ into integrals over Hermite Gaussian functions centered on P, defined as

H_{{\bar{I}}_{p} {\bar{J}}_{p} {\bar{K}}_{p}} (𝐫_{𝟏}, p, 𝐏) = \frac{\partial^{{\bar{L}}_{p}} \exp (- p r_{P}^{2})}{\partial P_{x}^{{\bar{I}}_{p}} \partial P_{y}^{{\bar{J}}_{p}} \partial P_{z}^{{\bar{K}}_{p}}},

(19)

where the bars over the total angular momentum and its components are used to distinguish from the corresponding Cartesian Gaussians. In this scheme, one has to evaluate two-center Coulomb integrals over Hermite Gaussians centered on P and C, which, exploiting translational invariance (that is, $\partial / \partial P_{x} = - \partial / \partial C_{x}$ ), can be written as¹⁸

\begin{aligned} ({\bar{𝑙}}_{p} | {\bar{𝑙}}_{c}) & = θ_{p c} {(2 c)}^{- {\bar{l}}_{c}} \frac{\partial^{{\bar{l}}_{p} + {\bar{l}}_{c}} F_{0} (α R_{PC}^{2})}{\partial P_{x}^{{\bar{i}}_{p}} \partial P_{y}^{{\bar{j}}_{p}} \partial P_{z}^{{\bar{k}}_{p}} \partial C_{x}^{{\bar{i}}_{c}} \partial C_{y}^{{\bar{j}}_{c}} \partial C_{z}^{{\bar{k}}_{c}}} \\ & = θ_{p c} {(- 2 c)}^{- {\bar{l}}_{c}} \frac{\partial^{{\bar{l}}_{p} + {\bar{l}}_{c}} F_{0} (α R_{PC}^{2})}{\partial P_{x}^{{\bar{i}}_{p} + {\bar{i}}_{c}} \partial P_{y}^{{\bar{j}}_{p} + {\bar{j}}_{c}} \partial P_{z}^{{\bar{k}}_{p} + {\bar{k}}_{c}}} = ({\bar{𝑙}}_{u}) \end{aligned}

(20)

with ${\bar{𝑙}}_{u} = {\bar{𝑙}}_{p} + {\bar{𝑙}}_{c}$ . The scaling with ${(2 c)}^{- {\bar{l}}_{c}}$ is applied since for the Hermite Gaussian in the ket we follow the definition of Reine and co-workers,³⁸ which will allow us to transform the ket side into the solid harmonic Gaussian basis without the transformation into Cartesian Gaussians first [note that this is not necessary for $({\bar{𝑙}}_{p} |$ ]. The one-center integrals on the far right of Eq. (20) can be computed by the two-term recursion¹⁸

{({\bar{𝑙}}_{u} + 𝟏_{x})}^{(n)} = X_{PC} {({\bar{𝑙}}_{u})}^{(n + 1)} + {\bar{i}}_{u} {({\bar{𝑙}}_{u} - 𝟏_{x})}^{(n + 1)}

(21)

with

{(\bar{𝟎})}^{(n)} = {(- 2 α)}^{n} κ_{a b} θ_{p c} {(- 2 c)}^{- {\bar{L}}_{c}} F_{n} (α R_{PC}^{2}) .

(22)
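The recursion of Eq. (21) can be sketched for derivatives along x only. The routine below takes the starting values ${(\bar{𝟎})}^{(n)}$ of Eq. (22) as a list `base` and returns the one-center integrals $(t 𝟏_{x})$ for t = 0, ..., `t_max`. The function name and the restriction to a single Cartesian direction are assumptions made to keep the example short.

```python
def hermite_one_center_x(t_max, x_pc, base):
    """One-center Hermite integrals (t 1_x)^(0), t = 0..t_max, by Eq. (21).

    base[n] must hold the starting values (0)^(n) of Eq. (22) for n = 0..t_max.
    """
    if len(base) < t_max + 1:
        raise ValueError("base must contain t_max + 1 starting values")
    # table[t][n] holds (t 1_x)^(n); each new row needs one order of n less
    table = [list(base[:t_max + 1])]
    for t in range(t_max):
        row = []
        for n in range(t_max - t):
            value = x_pc * table[t][n + 1]         # first term of Eq. (21)
            if t > 0:
                value += t * table[t - 1][n + 1]   # second term, i_u = t
            row.append(value)
        table.append(row)
    return [table[t][0] for t in range(t_max + 1)]
```

Unrolling the recursion by hand gives, for example, $(2 \cdot 𝟏_{x}) = X_{PC}^{2} {(\bar{𝟎})}^{(2)} + {(\bar{𝟎})}^{(1)}$ and $(3 \cdot 𝟏_{x}) = X_{PC}^{3} {(\bar{𝟎})}^{(3)} + 3 X_{PC} {(\bar{𝟎})}^{(2)}$ , which the routine reproduces.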

From the one-center integrals, three-center ERIs with two Cartesian Gaussians in the bra and a Hermite Gaussian in the ket are evaluated as¹⁸

(𝐿_{a} 𝐿_{b} | {\bar{𝐿}}_{c}) = \sum_{{\bar{i}}_{p} = 0}^{I_{a} + I_{b}} E_{{\bar{i}}_{p}}^{I_{a}, I_{b}} \sum_{{\bar{j}}_{p} = 0}^{J_{a} + J_{b}} E_{{\bar{j}}_{p}}^{J_{a}, J_{b}} \sum_{{\bar{k}}_{p} = 0}^{K_{a} + K_{b}} E_{{\bar{k}}_{p}}^{K_{a}, K_{b}} ({\bar{𝑙}}_{p} + {\bar{𝐿}}_{c}) .

(23)

The E expansion coefficients appearing in Eq. (23) can be constructed by a set of recurrence relations,¹⁷

E_{\bar{0}}^{i_{a} + 1, 0} = X_{PA} E_{\bar{0}}^{i_{a}, 0} + E_{\bar{1}}^{i_{a}, 0},

(24)

E_{\bar{0}}^{i_{a}, i_{b} + 1} = X_{PB} E_{\bar{0}}^{i_{a}, i_{b}} + E_{\bar{1}}^{i_{a}, i_{b}},

(25)

E_{{\bar{i}}_{p} + 1}^{i_{a}, i_{b}} = \frac{1}{2 p ({\bar{i}}_{p} + 1)} (i_{a} E_{{\bar{i}}_{p}}^{i_{a} - 1, i_{b}} + i_{b} E_{{\bar{i}}_{p}}^{i_{a}, i_{b} - 1}), {\bar{i}}_{p} \geq 0,

(26)

with $E_{\bar{0}}^{0,0} = 1$ .
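A possible one-dimensional realization of Eqs. (24)–(26) is sketched below, together with a pointwise check of the expansion: with the Hermite Gaussians defined as in Eq. (19), $\sum_{t} E_{t}^{i, j} Λ_{t} (x)$ should equal $x_{A}^{i} x_{B}^{j} \exp (- p x_{P}^{2})$ , where $Λ_{t} = \partial^{t} \exp (- p x_{P}^{2}) / \partial P_{x}^{t}$ . The names `hermite_coefficients`, `hermite_polynomials`, and `expand` are hypothetical, and the common exponential factor is divided out in the check.

```python
def hermite_coefficients(imax, jmax, p, x_pa, x_pb):
    """E[i][j][t] from Eqs. (24)-(26), starting from E[0][0][0] = 1."""
    E = [[[0.0] * (imax + jmax + 2) for _ in range(jmax + 1)]
         for _ in range(imax + 1)]
    E[0][0][0] = 1.0
    for s in range(1, imax + jmax + 1):          # s = i + j, built in increasing order
        for i in range(max(0, s - jmax), min(imax, s) + 1):
            j = s - i
            # t >= 1 by Eq. (26), from coefficients with i + j = s - 1
            for t in range(1, s + 1):
                lower_i = i * E[i - 1][j][t - 1] if i > 0 else 0.0
                lower_j = j * E[i][j - 1][t - 1] if j > 0 else 0.0
                E[i][j][t] = (lower_i + lower_j) / (2.0 * p * t)
            # t = 0 by Eq. (24) when j = 0, otherwise by Eq. (25)
            if j == 0:
                E[i][j][0] = x_pa * E[i - 1][0][0] + E[i - 1][0][1]
            else:
                E[i][j][0] = x_pb * E[i][j - 1][0] + E[i][j - 1][1]
    return E

def hermite_polynomials(tmax, p, x_p):
    """Lambda_t / exp(-p x_P^2) for t = 0..tmax, with Lambda_t as in Eq. (19)."""
    lam = [1.0, 2.0 * p * x_p]
    for t in range(1, tmax):
        lam.append(2.0 * p * (x_p * lam[t] - t * lam[t - 1]))
    return lam[:tmax + 1]

def expand(i, j, x, a, ax, b, bx):
    """Hermite expansion of x_A^i x_B^j evaluated at the point x."""
    p = a + b
    px = (a * ax + b * bx) / p
    E = hermite_coefficients(i, j, p, px - ax, px - bx)
    lam = hermite_polynomials(i + j, p, x - px)
    return sum(E[i][j][t] * lam[t] for t in range(i + j + 1))
```

For example, `expand(2, 1, 0.37, 0.9, -0.3, 1.4, 0.8)` should coincide with `(0.37 + 0.3)**2 * (0.37 - 0.8)` up to rounding.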

The expansion defined by Eq. (23) can be applied to produce various types of three-center ERIs. In the MD1 algorithm, for example, we get the $(𝐿_{a} 𝐿_{b} | {\bar{𝐿}}_{c})$ classes directly. First the expansion coefficients are computed; e.g., in the x direction $E_{{\bar{i}}_{p}}^{i_{a}, i_{b}}$ values are needed for $0 \leq i_{a} \leq L_{a}$ , $0 \leq i_{b} \leq L_{b}$ , and $0 \leq {\bar{i}}_{p} \leq i_{a} + i_{b}$ . This is followed by the calculation of the ${(\bar{𝟎})}^{(n)}$ integrals for $⌈ {\bar{L}}_{c} / 2 ⌋ + [{\bar{L}}_{c} mod 2] \leq n \leq L_{a} + L_{b} + {\bar{L}}_{c}$ with $⌈ x ⌋$ denoting the integer part of x. The one-center integrals $({\bar{𝒍}}_{u})$ for ${\bar{L}}_{c} \leq {\bar{l}}_{u} \leq L_{a} + L_{b} + {\bar{L}}_{c}$ are built up by Eq. (21), from which the target integrals are readily assembled by Eq. (23). The work done in this assembly step can be reduced by performing it at an earlier stage to construct intermediate classes and using OS-type recursions for the evaluation of the target integrals. In the MD2 scheme, the $(𝑙_{a} 𝟎 | {\bar{𝐿}}_{c})$ classes for $L_{a} \leq l_{a} \leq L_{a} + L_{b}$ are evaluated with Eq. (23). Here the necessary expansion coefficients are in the range of $0 \leq i_{a} \leq L_{a} + L_{b}$ , $i_{b} = 0$ , and $0 \leq {\bar{i}}_{p} \leq i_{a}$ , and the required one-center integrals are the same as in MD1. After the assembly, the final integrals are computed by Eq. (13). A third option (MD3) is to obtain the ${(𝑙_{a} 𝟎 | 𝟎)}^{({\bar{L}}_{c})}$ type intermediates for $max (1, L_{a} - {\bar{L}}_{c}) \leq l_{a} \leq L_{a} + L_{b}$ with Eq. (23), then to build up ${\bar{𝑙}}_{c}$ with Eq. (12), and to finish with Eq. (13). Here the ${(- 1)}^{{\bar{L}}_{c}}$ scaling factor is absent from Eq. (22), and the required ${(\bar{𝟎})}^{(n)}$ values are in the range of ${\bar{L}}_{c} \leq n \leq L_{a} + L_{b} + {\bar{L}}_{c}$ and are used for calculating the ${({\bar{𝑙}}_{u})}^{({\bar{L}}_{c})}$ integrals for $0 \leq {\bar{l}}_{u} \leq L_{a} + L_{b}$ . The index range for the expansion coefficients is the same as in the MD2 scheme.

An alternative method for transforming the Hermite integrals into ones over Cartesian overlaps is the use of the

Ω_{𝑙_{a}, 𝑙_{b}}^{{\bar{𝑙}}_{p}} = x_{A}^{i_{a}} y_{A}^{j_{a}} z_{A}^{k_{a}} x_{B}^{i_{b}} y_{B}^{j_{b}} z_{B}^{k_{b}} \frac{\partial^{{\bar{l}}_{p}} \exp (- p r_{P}^{2})}{\partial P_{x}^{{\bar{i}}_{p}} \partial P_{y}^{{\bar{j}}_{p}} \partial P_{z}^{{\bar{k}}_{p}}}

(27)

hybrid functions¹⁷ on the bra side. As is clear from Eq. (27), these functions reduce to Hermite Gaussians if $𝒍_{a} = 𝒍_{b} = 𝟎$ and to Cartesian overlap distributions centered on P without the $κ_{a b}$ factor if ${\bar{𝑙}}_{p} = 𝟎$ . Introducing the notation for the auxiliary integrals over hybrid bras and Hermite kets as $(Ω_{𝑙_{a}, 𝑙_{b}}^{{\bar{𝑙}}_{p}} | {\bar{𝑙}}_{c})$ and applying the recurrence relations¹⁷ for the functions in Eq. (27) we can write

(Ω_{𝒍_{a} + 𝟏_{x}, 𝒍_{b}}^{{\bar{𝒍}}_{p}} | {\bar{𝒍}}_{c}) = {\bar{i}}_{p} (Ω_{𝒍_{a}, 𝒍_{b}}^{{\bar{𝒍}}_{p} - 𝟏_{x}} | {\bar{𝒍}}_{c}) + X_{PA} (Ω_{𝒍_{a}, 𝒍_{b}}^{{\bar{𝒍}}_{p}} | {\bar{𝒍}}_{c}) + \frac{1}{2 p} (Ω_{𝒍_{a}, 𝒍_{b}}^{{\bar{𝒍}}_{p} + 𝟏_{x}} | {\bar{𝒍}}_{c})

(28)

and

(Ω_{𝒍_{a}, 𝒍_{b} + 𝟏_{x}}^{{\bar{𝒍}}_{p}} | {\bar{𝒍}}_{c}) = {\bar{i}}_{p} (Ω_{𝒍_{a}, 𝒍_{b}}^{{\bar{𝒍}}_{p} - 𝟏_{x}} | {\bar{𝒍}}_{c}) + X_{PB} (Ω_{𝒍_{a}, 𝒍_{b}}^{{\bar{𝒍}}_{p}} | {\bar{𝒍}}_{c}) + \frac{1}{2 p} (Ω_{𝒍_{a}, 𝒍_{b}}^{{\bar{𝒍}}_{p} + 𝟏_{x}} | {\bar{𝒍}}_{c}) .

(29)

Relying on these relations, one can start from the two-center Hermite integrals $(Ω_{0,0}^{{\bar{𝑙}}_{p}} | {\bar{𝐿}}_{c})$ , which are, by Eq. (20), practically scaled one-center $({\bar{𝑙}}_{p} + {\bar{𝐿}}_{c})$ integrals, and, through hybrid intermediates, convert these into the target $(Ω_{𝐿_{a}, 𝐿_{b}}^{\bar{𝟎}} | {\bar{𝐿}}_{c}) = (𝐿_{a} 𝐿_{b} | {\bar{𝐿}}_{c})$ classes with a purely Cartesian bra side. In the MD4, MD5, and MD6 schemes, we proceed the same way as in the MD1, MD2, and MD3 cases, respectively, with the difference that the calculation of the expansion coefficients is omitted, and instead of Eq. (23) we apply Eqs. (28) and (29) for the transformation of the bra side.

D. Gill–Head-Gordon–Pople algorithm

Here we consider the original algorithm of Gill, Head-Gordon, and Pople²⁷ with the modifications needed for three-center ERIs. In this method, the procedure is very similar to the MD5 scheme. The difference lies in the introduction of the $β, ζ$ -scaled auxiliary integrals defined as

{(Ω_{𝒍_{a}, 𝒍_{b}}^{{\bar{𝒍}}_{p}} | {\bar{𝒍}}_{c})}_{β, ζ} = \frac{{(2 b)}^{β}}{{(2 p)}^{ζ}} (Ω_{𝒍_{a}, 𝒍_{b}}^{{\bar{𝒍}}_{p}} | {\bar{𝒍}}_{c}),

(30)

where β and ζ are positive integers. With these quantities, substituting $X_{PA} = - \frac{2 b}{2 p} X_{AB}$ , Eq. (28) can be rewritten as²⁷

{(Ω_{𝒍_{a} + 𝟏_{x}, 𝒍_{b}}^{{\bar{𝒍}}_{p}} | {\bar{𝒍}}_{c})}_{β, ζ} = {\bar{i}}_{p} {(Ω_{𝒍_{a}, 𝒍_{b}}^{{\bar{𝒍}}_{p} - 𝟏_{x}} | {\bar{𝒍}}_{c})}_{β, ζ} - X_{AB} {(Ω_{𝒍_{a}, 𝒍_{b}}^{{\bar{𝒍}}_{p}} | {\bar{𝒍}}_{c})}_{β + 1, ζ + 1} + {(Ω_{𝒍_{a}, 𝒍_{b}}^{{\bar{𝒍}}_{p} + 𝟏_{x}} | {\bar{𝒍}}_{c})}_{β, ζ + 1},

(31)

which is a relation that does not depend explicitly on the Gaussian exponents and therefore can be applied to the $β, ζ$ -scaled auxiliary integrals transformed to the contracted basis.

The strategy of the GHP scheme for three-center ERIs is thus the following. First, the necessary Hermite integrals $(Ω_{0,0}^{{\bar{𝑙}}_{p}} | {\bar{𝐿}}_{c}) = ({\bar{𝑙}}_{p} + {\bar{𝐿}}_{c})$ are computed for $0 \leq {\bar{l}}_{p} \leq L_{a} + L_{b}$ . Then, all the scaled classes of these integrals required to compute the ${(Ω_{𝑙_{a}, 𝟎}^{\bar{𝟎}} | {\bar{𝐿}}_{c})}_{0,0}$ classes with Eq. (31) for $L_{a} \leq l_{a} \leq L_{a} + L_{b}$ are produced. For each of these classes, we need to start from the ${(Ω_{0,0}^{{\bar{𝑙}}_{p}} | {\bar{𝐿}}_{c})}_{β, ζ}$ scaled Hermite intermediates for $0 \leq {\bar{l}}_{p} \leq l_{a}$ . To determine the $β, ζ$ -scaled classes needed for each $(Ω_{0,0}^{{\bar{𝑙}}_{p}} | {\bar{𝐿}}_{c})$ that will be used for the calculation of a given ${(Ω_{𝑙_{a}, 𝟎}^{\bar{𝟎}} | {\bar{𝐿}}_{c})}_{0,0}$ , we have to trace back the recursion defined by Eq. (31). As each recursion step increments $𝒍_{a}$ by $𝟏_{σ}$ , there are $l_{a}$ steps. By analyzing the positions where ${(Ω_{0,0}^{{\bar{𝒍}}_{p}} | {\bar{𝐿}}_{c})}_{β, ζ}$ and the intermediates connected to it can appear in Eq. (31) during the recursion, we see that such intermediates have to be the third term at least ${\bar{l}}_{p}$ times to reduce ${\bar{𝒍}}_{p}$ to $\bar{𝟎}$ . In the additional $l_{a} - {\bar{l}}_{p}$ steps, these intermediates have to appear at the first and the third positions an equal number of times if ${\bar{𝒍}}_{p}$ is to stay equal to $\bar{𝟎}$ , and in the remaining steps they have to be the second term. From this it follows that for each $l_{a}, {\bar{l}}_{p}$ pair there are $⌈ (l_{a} - {\bar{l}}_{p}) / 2 ⌋ + 1$ different scalings to consider. The β and ζ for these can be obtained by looking at how the changes in these values depend on the position the intermediates take in Eq. (31). The scaling indices are determined by how many times the connected intermediates take the second or third position. For example, in the case they take the third place ${\bar{l}}_{p}$ times and the second position in the remaining $l_{a} - {\bar{l}}_{p}$ steps, the values of β and ζ are $l_{a} - {\bar{l}}_{p}$ and $l_{a}$ , respectively. Another example is when the intermediate takes the second position in two fewer steps in the recursion, and both the first and the third places are taken one more time than in the former example, making the scaling indices $β = l_{a} - {\bar{l}}_{p} - 2$ and $ζ = l_{a} - 1$ . Let us denote the scaled class in the first example as class 1 and that in the second example as class 2. In general, class n can be defined for the scaling indices $β = l_{a} - {\bar{l}}_{p} - 2 (n - 1)$ and $ζ = l_{a} - (n - 1)$ . After these classes for $1 \leq n \leq ⌈ (l_{a} - {\bar{l}}_{p}) / 2 ⌋ + 1$ have been calculated for all the primitive classes, the scaled one-center integrals are transformed to the contracted basis by Eq. (6). When using segmented basis sets, the multiplication work in this contraction step can be reduced to simply multiplying Eq. (22) with the appropriate $d_{a χ_{A}}$ coefficients. Following the contraction, Eq. (31) is applied, and lastly Eq. (13) is used to build up $𝒍_{b}$ .
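The bookkeeping of the scaled classes can be summarized in a few lines. The first function below simply evaluates the closed formulas for β and ζ of class n; the second one recovers the same pairs by counting how many times an intermediate enters Eq. (31) as the first ($n_{1}$), second ($n_{2}$), or third ($n_{3}$) term, using that the second term raises both β and ζ by one and the third term raises only ζ. The function names and the counting variables are assumptions of this sketch.

```python
def scaled_classes(l_a, l_p):
    """(beta, zeta) of class n = 1, 2, ... for given l_a and Hermite order l_p."""
    if not 0 <= l_p <= l_a:
        raise ValueError("expected 0 <= l_p <= l_a")
    count = (l_a - l_p) // 2 + 1
    return [(l_a - l_p - 2 * (n - 1), l_a - (n - 1)) for n in range(1, count + 1)]

def scaled_classes_by_counting(l_a, l_p):
    """Same pairs from the step counts n1, n2, n3 of the traced-back recursion.

    The l_a steps must satisfy n1 + n2 + n3 = l_a and n3 - n1 = l_p, and the
    resulting scaling indices are beta = n2 and zeta = n2 + n3.
    """
    pairs = []
    for n1 in range(l_a + 1):
        n3 = l_p + n1
        n2 = l_a - n1 - n3
        if n2 >= 0:
            pairs.append((n2, n2 + n3))
    return pairs
```

For $l_{a} = 2$ and ${\bar{l}}_{p} = 0$ , both functions give the two scalings (β, ζ) = (2, 2) and (0, 1), corresponding to class 1 and class 2.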

E. Rys quadrature method

The algorithms discussed before are all based on calculating scaled Boys functions of various orders and using them as starting values for a recursive procedure. Inspecting these methods and utilizing Eq. (9) it is evident that the target integral can be expressed as

\begin{aligned} (𝑳_{a} 𝑳_{b} | 𝑳_{c}) & = \sum_{n = 0}^{L_{a} + L_{b} + L_{c}} Z_{n} F_{n} (α R_{PC}^{2}) \\ & = \int_{0}^{1} \sum_{n = 0}^{L_{a} + L_{b} + L_{c}} Z_{n} t^{2 n} \exp (- α R_{PC}^{2} t^{2}) d t, \end{aligned}

(32)

where the values of the coefficients Z_n can be obtained by, for example, backtracking the OS recursions until the integral is only expanded in Boys functions. Eq. (32) is an integral over a polynomial $f (t^{2}) = \sum_{n = 0}^{L_{a} + L_{b} + L_{c}} Z_{n} t^{2 n}$ multiplied by a weight function $W (T, t^{2}) = \exp (- T t^{2})$ with $T = α R_{PC}^{2}$ . According to the theory of Gauss–Rys quadrature,^10,17 these integrals can be evaluated exactly as

\int_{0}^{1} f (t^{2}) W (T, t^{2}) d t = \sum_{n = 1}^{N_{r t s}} f (t_{n}^{2}) w_{n}

(33)

where N_rts is an integer satisfying $N_{r t s} > ⌊ (L_{a} + L_{b} + L_{c}) / 2 ⌋$ and $t_{n}^{2}$ is the square of the nth positive root of the (2N_rts)th order Rys polynomial in t. These polynomials are defined to be orthonormal on the interval [0,1] with the weight function W(T, t²), and $w_{n}$ is the T-dependent weight factor of the quadrature associated with $t_{n}^{2}$ . For the calculation of the roots of the Rys polynomials and the weight factors, we followed the approach of King and Dupuis¹⁰ for the $N_{r t s} \leq 5$ cases and the work of Flocke and Lotrich⁵⁶ for $N_{r t s} > 5$ .
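The exactness property of Eq. (33) can be checked numerically with a short Python sketch. This is not the King–Dupuis or Flocke–Lotrich procedure referred to above: as an assumption made only for illustration, the roots and weights are obtained from the moments $F_{k} (T)$ of the weight function in the variable $x = t^{2}$, a construction that is simple but becomes ill-conditioned for larger N_rts. The function names `boys` and `rys_roots_weights` are hypothetical.

```python
import numpy as np


def boys(n_max: int, T: float, n_grid: int = 100) -> np.ndarray:
    """F_n(T) = int_0^1 t^(2n) exp(-T t^2) dt for n = 0..n_max (Gauss-Legendre)."""
    x, w = np.polynomial.legendre.leggauss(n_grid)
    t = 0.5 * (x + 1.0)  # map the nodes from [-1, 1] to [0, 1]
    w = 0.5 * w
    return np.array(
        [np.sum(w * t ** (2 * n) * np.exp(-T * t * t)) for n in range(n_max + 1)]
    )


def rys_roots_weights(n_rts: int, T: float):
    """Squared roots t_n^2 and weights w_n built from the moments F_0..F_(2N-1)."""
    m = boys(2 * n_rts - 1, T)
    # Monic polynomial of degree N in x = t^2, orthogonal to 1, x, ..., x^(N-1).
    hankel = np.array([[m[j + k] for k in range(n_rts)] for j in range(n_rts)])
    rhs = -np.array([m[j + n_rts] for j in range(n_rts)])
    c = np.linalg.solve(hankel, rhs)  # coefficients of x^0 .. x^(N-1)
    roots = np.sort(np.roots(np.concatenate(([1.0], c[::-1]))).real)
    # The weights are fixed by requiring that the first N moments are reproduced.
    vander = np.vander(roots, n_rts, increasing=True).T
    weights = np.linalg.solve(vander, m[:n_rts])
    return roots, weights


# A polynomial of degree 5 in t^2 needs N_rts = 3 roots.
roots, weights = rys_roots_weights(3, 2.0)
exact = boys(5, 2.0)
for k in range(6):
    print(k, np.sum(weights * roots**k), exact[k])
```

With N_rts roots the quadrature reproduces all moments up to $x^{2 N_{r t s} - 1}$, which is why a polynomial of degree $L_{a} + L_{b} + L_{c}$ in $t^{2}$ is integrated exactly once N_rts exceeds the integer part of half that degree.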

Substituting the identity

\frac{1}{| 𝐫_{𝟏} - 𝐫_{𝟐} |} = \frac{2}{π^{1 / 2}} \int_{0}^{\infty} \exp (- {| 𝐫_{𝟏} - 𝐫_{𝟐} |}^{2} u^{2}) d u

(34)

into Eq. (7) and changing the order of integration, we get

(𝑳_{a} 𝑳_{b} | 𝑳_{c}) = \frac{2}{π^{1 / 2}} \int_{0}^{\infty} [\int \int G_{I_{a} J_{a} K_{a}} (𝐫_{𝟏}, a, 𝐀) G_{I_{b} J_{b} K_{b}} (𝐫_{𝟏}, b, 𝐁) \times \exp (- | 𝐫_{𝟏} - 𝐫_{𝟐} |^{2} u^{2}) G_{I_{c} J_{c} K_{c}} (𝐫_{𝟐}, c, 𝐂) d 𝐫_{𝟏} d 𝐫_{𝟐}] d u .

(35)

It is possible to factorize the bracketed integrand in Eq. (35) into three two-dimensional (2D) integrals associated with the three Cartesian directions¹² to get

(𝑳_{a} 𝑳_{b} | 𝑳_{c}) = \frac{2}{π^{1 / 2}} \int_{0}^{\infty} {\underline{Θ}}_{x}^{I_{a}, I_{b}, I_{c}} (u^{2}) {\underline{Θ}}_{y}^{J_{a}, J_{b}, J_{c}} (u^{2}) {\underline{Θ}}_{z}^{K_{a}, K_{b}, K_{c}} (u^{2}) d u,

(36)

where

{\underline{Θ}}_{x}^{I_{a}, I_{b}, I_{c}} (u^{2}) = \int \int x_{A}^{I_{a}} x_{B}^{I_{b}} x_{C}^{I_{c}} \exp [- μ X_{AB}^{2} - p x_{P}^{2} - c x_{C}^{2} - u^{2} {| x_{1} - x_{2} |}^{2}] d x_{1} d x_{2} .

(37)

By making a change of variable from u to t as

u^{2} = \frac{α t^{2}}{1 - t^{2}},

(38)

d u = d t α^{1 / 2} {(\frac{1}{1 - t^{2}})}^{3 / 2},

(39)

defining the modified 2D integrals as

Θ_{x}^{I_{a}, I_{b}, I_{c}} (t^{2}) = {\underline{Θ}}_{x}^{I_{a}, I_{b}, I_{c}} (u^{2}) \exp (α X_{PC}^{2} t^{2}) {(1 - t^{2})}^{- 1 / 2},

(40)

and also noting that as u varies from 0 to infinity, t varies from 0 to 1, we can rewrite Eq. (35) as

(𝑳_{a} 𝑳_{b} | 𝑳_{c}) = 2 {(\frac{α}{π})}^{1 / 2} \int_{0}^{1} Θ_{x}^{I_{a}, I_{b}, I_{c}} (t^{2}) Θ_{y}^{J_{a}, J_{b}, J_{c}} (t^{2}) Θ_{z}^{K_{a}, K_{b}, K_{c}} (t^{2}) \times W (T, t^{2}) d t .

(41)

From Eq. (41) it is clear that f(t²) can be written as

f (t^{2}) = 2 {(\frac{α}{π})}^{1 / 2} Θ_{x}^{I_{a}, I_{b}, I_{c}} (t^{2}) Θ_{y}^{J_{a}, J_{b}, J_{c}} (t^{2}) Θ_{z}^{K_{a}, K_{b}, K_{c}} (t^{2}),

(42)

and, since the 2D integrals are polynomials in t²,¹² Eq. (33) takes the form

\int_{0}^{1} f (t^{2}) W (T, t^{2}) d t = 2 {(\frac{α}{π})}^{1 / 2} \sum_{n = 1}^{N_{r t s}} Θ_{x}^{I_{a}, I_{b}, I_{c}} (t_{n}^{2}) Θ_{y}^{J_{a}, J_{b}, J_{c}} (t_{n}^{2}) \times Θ_{z}^{K_{a}, K_{b}, K_{c}} (t_{n}^{2}) w_{n} .

(43)

The value of $Θ_{x}^{I_{a}, I_{b}, I_{c}} (t_{n}^{2})$ can be calculated recursively¹⁷ (and similarly for the y and z directions) as

Θ_{x}^{i_{a} + 1, 0,0} (t_{n}^{2}) = (X_{PA} - \frac{α}{p} X_{PC} t_{n}^{2}) Θ_{x}^{i_{a}, 0,0} (t_{n}^{2}) + \frac{i_{a}}{2 p} (1 - \frac{α}{p} t_{n}^{2}) Θ_{x}^{i_{a} - 1, 0,0} (t_{n}^{2})

(44)

for i_a and as

Θ_{x}^{i_{a}, 0, i_{c} + 1} (t_{n}^{2}) = \frac{α}{c} X_{PC} t_{n}^{2} Θ_{x}^{i_{a}, 0, i_{c}} (t_{n}^{2}) + \frac{i_{a} t_{n}^{2}}{2 (p + c)} Θ_{x}^{i_{a} - 1, 0, i_{c}} (t_{n}^{2})

(45)

for i_c. Finally, i_b is built up by

Θ_{x}^{i_{a}, i_{b} + 1, i_{c}} (t_{n}^{2}) = Θ_{x}^{i_{a} + 1, i_{b}, i_{c}} (t_{n}^{2}) + X_{AB} Θ_{x}^{i_{a}, i_{b}, i_{c}} (t_{n}^{2}) .

(46)

We note that, in the general case, Eq. (45) contains a third term which can be neglected if the ket side is to be transformed to the solid harmonic Gaussian basis. The derivation of Eq. (45) is given in Appendix A. Instead of performing the assembly step as it is defined by Eq. (43) and starting the recursion of Eq. (44) with $Θ_{x}^{0,0,0} (t_{n}^{2}) = π \exp (- μ X_{AB}^{2}) (1 / \sqrt{p c})$ (and analogously for the other directions),¹¹ it is more beneficial to start with $Θ_{z}^{0,0,0} (t_{n}^{2}) = θ_{a b} κ_{a b} w_{n}$ and $Θ_{x}^{0,0,0} (t_{n}^{2}) = Θ_{y}^{0,0,0} (t_{n}^{2}) = 1$ , making the equation for the assembly

\int_{0}^{1} f (t^{2}) W (T, t^{2}) d t = \sum_{n = 1}^{N_{r t s}} Θ_{x}^{I_{a}, I_{b}, I_{c}} (t_{n}^{2}) Θ_{y}^{J_{a}, J_{b}, J_{c}} (t_{n}^{2}) Θ_{z}^{K_{a}, K_{b}, K_{c}} (t_{n}^{2}) .

(47)
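The recursions of Eqs. (44)–(46) for one Cartesian direction and one quadrature root can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `theta_1d` and its argument list are hypothetical, the geometric quantities X_PA, X_PC, and X_AB are taken as given inputs with the sign conventions of the equations above, $α = p c / (p + c)$ is assumed, and Eq. (45) is used in its two-term form, which presumes that the ket is later transformed to the solid harmonic basis.

```python
def theta_1d(L_a, L_b, L_c, t2, X_PA, X_PC, X_AB, p, c, theta000=1.0):
    """2D integrals th[i_a][i_b][i_c] at one root t2 = t_n^2 for one direction."""
    alpha = p * c / (p + c)
    n_a = L_a + L_b  # the HRR of Eq. (46) consumes extra quanta on the first index
    th = [[[0.0] * (L_c + 1) for _ in range(L_b + 1)] for _ in range(n_a + 1)]
    th[0][0][0] = theta000

    # Eq. (44): build up i_a with i_b = i_c = 0
    for i in range(n_a):
        th[i + 1][0][0] = (X_PA - alpha / p * X_PC * t2) * th[i][0][0]
        if i > 0:
            th[i + 1][0][0] += i / (2 * p) * (1 - alpha / p * t2) * th[i - 1][0][0]

    # Eq. (45), two-term form: build up i_c for every i_a
    for k in range(L_c):
        for i in range(n_a + 1):
            th[i][0][k + 1] = alpha / c * X_PC * t2 * th[i][0][k]
            if i > 0:
                th[i][0][k + 1] += i * t2 / (2 * (p + c)) * th[i - 1][0][k]

    # Eq. (46): shift quanta from the first to the second index
    for j in range(L_b):
        for i in range(n_a - j):
            for k in range(L_c + 1):
                th[i][j + 1][k] = th[i + 1][j][k] + X_AB * th[i][j][k]
    return th


# Example with arbitrary (hypothetical) input values
th = theta_1d(1, 1, 1, t2=0.3, X_PA=0.2, X_PC=-0.5, X_AB=0.7, p=1.5, c=0.8)
print(th[1][1][1])
```

The assembly of Eq. (47) then amounts to summing, over the roots, the product of the x, y, and z tables evaluated at the required indices, with the prefactor and the weight already folded into the starting value of one direction.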

For the four-center ERIs, it has also been shown²⁵ that the direct evaluation of the target integrals from the 2D integrals is not the only possibility, but it can be advantageous to construct intermediate integrals from the 2D ones and use OS-type recursions to get the target integral.

Here we will investigate three possibilities for the three-center ERIs. In the RYS1 algorithm, we evaluate the (L_aL_b|L_c) integrals directly by Eq. (47). For this purpose, we have to compute $Θ_{x}^{i_{a}, i_{b}, i_{c}} (t_{n}^{2})$ for $1 \leq i_{a} \leq L_{a}$ , $1 \leq i_{b} \leq L_{b}$ , and $1 \leq i_{c} \leq L_{c}$ for the N_rts roots and for the three directions. In the RYS2 scheme, (l_a0|L_c) classes are calculated on the quadrature for $L_{a} \leq l_{a} \leq L_{a} + L_{b}$ , then the OS-type HRR, Eq. (13), is applied. The indices of the necessary 2D integrals here are in the range of $1 \leq i_{a} \leq L_{a} + L_{b}$ , i_b = 0, and $1 \leq i_{c} \leq L_{c}$ . We also explored here a completely different strategy which has not yet been considered in the literature even for four-center integrals. We utilize that it is also possible to construct the ${(𝒍_{a} 𝟎 | 𝟎)}^{(L_{c})}$ auxiliary integrals as

\begin{aligned} {(𝒍_{a} 𝟎 | 𝟎)}^{(L_{c})} & = \sum_{n = 0}^{l_{a}} Z_{n} F_{n + L_{c}} (α R_{PC}^{2}) \\ & = \int_{0}^{1} \sum_{n = 0}^{l_{a}} Z_{n} t^{2 (n + L_{c})} \exp (- α R_{PC}^{2} t^{2}) d t . \end{aligned}

(48)

In this case, the value of the polynomial f of $t_{n}^{2}$ can be written as

f (t_{n}^{2}) = t_{n}^{2 L_{c}} 2 {(α / π)}^{1 / 2} Θ_{x}^{i_{a}, 0,0} (t_{n}^{2}) Θ_{y}^{j_{a}, 0,0} (t_{n}^{2}) Θ_{z}^{k_{a}, 0,0} (t_{n}^{2}) .

(49)

The extra multiplication with $t_{n}^{2 L_{c}}$ can also be built into $Θ_{z}^{0,0,0} (t_{n}^{2})$ . In this algorithm (RYS3), the needed 2D integrals are $Θ_{x}^{i_{a}, 0,0} (t_{n}^{2})$ for $1 \leq i_{a} \leq L_{a} + L_{b}$ for all the roots and directions, the ${(𝒍_{a} 𝟎 | 𝟎)}^{(L_{c})}$ classes are constructed for $max (0, L_{a} - L_{c}) \leq l_{a} \leq L_{a} + L_{b}$ by Eq. (47), and the target integrals are built up via Eqs. (12) and (13).
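The bookkeeping of the three Rys variants can be summarized in a few lines of Python. The helper names below are hypothetical; `n_rys_roots` returns the smallest number of roots compatible with the condition stated after Eq. (33), assuming that the relevant integer bound is the integer part of $(L_{a} + L_{b} + L_{c}) / 2$, and `rys3_quadrature_classes` lists the l_a values of the ${(𝒍_{a} 𝟎 | 𝟎)}^{(L_{c})}$ classes that RYS3 evaluates on the quadrature.

```python
def n_rys_roots(L_a: int, L_b: int, L_c: int) -> int:
    """Smallest N_rts that exceeds the integer part of (L_a + L_b + L_c) / 2."""
    return (L_a + L_b + L_c) // 2 + 1


def rys3_quadrature_classes(L_a: int, L_b: int, L_c: int) -> list[int]:
    """l_a values of the (l_a 0|0)^(L_c) classes computed by quadrature in RYS3."""
    return list(range(max(0, L_a - L_c), L_a + L_b + 1))


# (ds|s) and (dp|s) both need two roots; a (dp|p) triplet needs three
for triplet in [(2, 0, 0), (2, 1, 0), (2, 1, 1)]:
    print(triplet, n_rys_roots(*triplet), rys3_quadrature_classes(*triplet))
```

The lower limit of the l_a range decreases as L_c grows, so a higher angular momentum in the ket enlarges the set of classes that RYS3 has to evaluate on the quadrature.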

F. Algorithmic considerations

Since its introduction, the HRR equation, Eq. (13), has been a standard tool for evaluating molecular integrals over Gaussian functions. In addition to being a simple two-term recurrence relation, it is also independent of the basis set exponents, making it possible to apply it to contracted integrals instead of primitive ones, which (usually) means that a smaller number of integrals are to be treated. The same is true for the transformation to the solid harmonic Gaussian basis, and it has also been proposed that these two operations for one side (bra or ket) can be efficiently combined into a single matrix multiplication.⁵⁶ On the other hand, if we choose to use Eqs. (13) and (5) at the contracted level, we have to first contract the components of the classes (l_a0|L_c) for $L_{a} \leq l_{a} \leq L_{a} + L_{b}$ , which consist of [(L_a + L_b + 1) (L_a + L_b + 2) (L_a + L_b + 3)/6 − L_a(L_a + 1) (L_a + 2)/6](L_c + 1) (L_c + 2)/2 integrals for every final class of (L_aL_b|L_c). If we perform the HRR and the solid harmonic transformation at the primitive level instead, this number becomes (2L_a + 1) (2L_b + 1) (2L_c + 1), which is smaller in all the cases. This affects not only the operation count of the contraction step but also the memory use of the code. For example, if we apply the nested loop structure shown in Algorithm 1, the arrays storing the partially and fully contracted integrals will be the largest ones used in the process of evaluating all (L_aL_b|L_c) ERIs for three given centers. This means that we can expect the most data cache-miss events (meaning that the copy of the data stored at a referenced memory address cannot be found in the cache memory of the central processing unit (CPU)) to happen at this stage of the algorithm. 
Since the fetching of data from main memory is about an order of magnitude slower than from the cache (two orders of magnitude if the data reside in the first level of the cache), such misses can have a considerable effect on the performance of the code, and fewer misses are expected for a smaller array. Thus we see that it is not a trivial decision where Eqs. (13) and (5) should be applied. The schemes where the HRR and the solid harmonic transformation are done at the primitive level will be denoted as IN, while the ones where these two steps are performed at the contracted level will be labeled as OUT.
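The number of integrals that have to be contracted per final class in the two schemes can be compared with a short Python sketch. The function names are hypothetical; the OUT count is obtained by summing the Cartesian components of the (l_a0|L_c) classes explicitly, and the IN count is the number of solid harmonic components of the target class.

```python
def n_cart(l: int) -> int:
    """Number of Cartesian components of a shell with angular momentum l."""
    return (l + 1) * (l + 2) // 2


def n_sph(l: int) -> int:
    """Number of solid harmonic components of a shell with angular momentum l."""
    return 2 * l + 1


def n_contracted_out(L_a: int, L_b: int, L_c: int) -> int:
    """Integrals in the (l_a 0|L_c) classes, L_a <= l_a <= L_a + L_b (algorithm OUT)."""
    return sum(n_cart(l) for l in range(L_a, L_a + L_b + 1)) * n_cart(L_c)


def n_contracted_in(L_a: int, L_b: int, L_c: int) -> int:
    """Integrals in the solid harmonic (L_a L_b|L_c) class (algorithm IN)."""
    return n_sph(L_a) * n_sph(L_b) * n_sph(L_c)


# (dd|d): 186 integrals for OUT versus 125 for IN
print(n_contracted_out(2, 2, 2), n_contracted_in(2, 2, 2))
```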

Algorithm 1.

abc primitive loop order.

Loop over a
    Loop over b
        Algorithm pPRE2: estimate (00|0) for the smallest c
        Loop over c
            Algorithm pPRE1: estimate (00|0)
            Algorithm OUT: Build up (l_a0|L_c) for $L_{a} \leq l_{a} \leq L_{a} + L_{b}$ in the Cartesian Gaussian basis
            Algorithm IN: Build up (L_aL_b|L_c) in the solid harmonic Gaussian basis
        End loop
        Contract the third function for all classes with exponents a and b
    End loop
    Contract the second function for all classes with exponent a
End loop
Contract the first function for all classes
Loop over $χ_{A}$ (executed only in the case of algorithm OUT)
    Loop over $χ_{B}$
        Algorithm cPRE2: Look up the integral of highest absolute value in the contracted (l_a0|L_c) classes needed for the contracted (L_aL_b|L_c) class with the smallest c
        Algorithm cPRE3: Estimate the integral of highest absolute value in the contracted (l_a0|L_c) classes needed for the contracted (L_aL_b|L_c) class with the smallest c
        Loop over $χ_{C}$
            Algorithm cPRE1: Look up the integral of highest absolute value in the contracted (l_a0|L_c) classes needed for the contracted (L_aL_b|L_c) class
            Algorithm OUT: perform HRR to get (L_aL_b|L_c), perform solid harmonic transformation
End loop


Our contraction procedure distinguishes between contracted and uncontracted functions for all three centers, especially because there can be a significant number of uncontracted functions in generally contracted basis sets, e.g., in the cc-pVXZ bases.^57,58 For example, in the cc-pVTZ basis for elements Li to Ne all the d and f functions are uncontracted, and out of the four s and three p functions only two and one are contracted, respectively, and all the functions in the corresponding fitting basis,⁵⁹ cc-pVTZ-RI, are uncontracted. For the integrals that are evaluated over primitives which contribute to an uncontracted function, the quantity $θ_{p c} κ_{a b}$ is multiplied by the norm factor of the function which is otherwise absorbed into the contraction coefficients, and the integrals are written directly into the array that stores the contracted integrals; therefore, both the floating-point and memory operations for the contraction are saved. In the case these primitives also contribute to other, contracted functions, the coefficients of the affected primitives for these contracted functions in Eq. (6) are divided by the above mentioned norm. Further notes on the efficient treatment of integral contraction will be discussed in Sec. IV.

The sizes of the arrays for integral contraction can be further reduced when the auxiliary basis set used for the density fitting approximation is uncontracted even if the functions on centers A and B are contracted. If we change the order of loops from a, b, c to c, a, b as it is shown in Algorithm 2, the sizes of the arrays for the contraction of the first and second functions reduce by a factor of the number of the contracted functions on the third center. Here the loop over the exponents of the ket side is also the loop over the contracted functions on C, and all calculations are performed inside this loop. This scheme, however, has the disadvantage that we have to precalculate the a- and b-dependent quantities in a separate loop to avoid unnecessary recalculations. Schemes with the a,b,c primitive loop structure will be referred to as abc, while the ones with c,a,b order will be denoted by cab.

Algorithm 2.

cab primitive loop order.

Loop over a
    Loop over b
        Calculate the quantities depending on functions in the bra
End loop
Loop over c
    Loop over a
        Algorithm pPRE2: estimate (00|0) for the smallest b
        Loop over b
            Algorithm pPRE1: estimate (00|0)
            Algorithm OUT: Build up (l_a0|L_c) for $L_{a} \leq l_{a} \leq L_{a} + L_{b}$ in the Cartesian Gaussian basis
            Algorithm IN: Build up (L_aL_b|L_c) in the solid harmonic Gaussian basis
        End loop
        Contract the second function for all classes with exponents c and a
    End loop
    Contract the first function for all classes with exponent c
    Loop over $χ_{A}$ (executed only in the case of algorithm OUT)
        Loop over $χ_{B}$
            Algorithm cPRE1: Look up the integral of highest absolute value in the contracted (l_a0|L_c) classes needed for the contracted (L_aL_b|L_c) class
            Algorithm cPRE4: Estimate the integral of highest absolute value in the contracted (l_a0|L_c) classes needed for the contracted (L_aL_b|L_c) class
            Algorithm OUT: perform HRR to get (L_aL_b|L_c), perform solid harmonic transformation
End loop


Another aspect that can have a strong effect on the performance is the prescreening of integrals which are lower in absolute value than a user-defined threshold, hereafter denoted by ε. In our code, as usual, the entire shell triplets are prescreened invoking the Schwarz inequality,⁷ and we also employ the distance-dependent estimator of Valeev and co-workers.⁶⁰ In addition, screening of the primitive integrals is implemented. For the latter, the threshold is also tied to ε by dividing it by the maximal level of contraction, that is, the product of the number of primitive functions on each center. Exceptions to this rule are integrals that contain a primitive (centered on, for example, A) which contributes to only one contracted function $χ_{A}$ . Then, ε is not divided by the number of primitives on A but rather by the level of contraction for $χ_{A}$ , making the threshold for primitive prescreening higher. For the estimation of the magnitude of the primitive integrals, we will use the value of the (00|0) ERI evaluated with the exponents of the functions of higher angular momentum. Instead of directly calculating (00|0) according to Eq. (8), we can use the upper bound for the zeroth-order Boys function,¹⁷ from which we get

(𝟎𝟎 | 𝟎) = θ_{p c} κ_{a b} F_{0} (α R_{PC}^{2}) \leq θ_{p c} κ_{a b} min (1, \sqrt{\frac{π}{4 α R_{PC}^{2}}}) .

(50)

The minimum criterion appears since the approximation used in Eq. (50) is only accurate for high values of $α R_{PC}^{2}$ (greater than about 74), and for smaller arguments it can give results greater than 1, which is the highest value the zeroth-order Boys function can take (when $α R_{PC}^{2} = 0$ ). In actual calculations, it is more beneficial to use the square of the rightmost side of Eq. (50) for screening, so the expensive square root calculation only has to be done for classes with small $α R_{PC}^{2}$ that survive the prescreening. In this method (algorithm pPRE1), the estimate for |(00|0)|² is compared to the square of the threshold, and if the former value is greater, the class is evaluated. This is not an exact screening since Eq. (50) is not a rigorous upper bound for the target ERIs. Instead, this approach is related to the one proposed by Almlöf and co-workers,⁶ who used the common factor (in our case $κ_{a b} θ_{p c}$ ) by which all the integrals in a class are multiplied to gain an estimate for the magnitude of the primitive integrals in a given class. In our scheme, this value is multiplied by a number smaller than 1, resulting in a less precise but more efficient screening method. In practice, we found that it can be more efficient to screen a batch of primitive exponent triplets than each individual one. Here we make use of the fact that the value of the right-hand side of Eq. (50) increases as the Gaussian exponent c of the ket side decreases. This can be seen by noting that $\partial α / \partial c = p^{2} / {(p + c)}^{2}$ is always a positive number. Hence, we only need to estimate the (00|0) integral with the smallest c in an abc scheme before the innermost loop (algorithm pPRE2). One could proceed the same way in a cab scheme estimating the integral with the smallest b before the loop over b, but we found this choice to be inefficient, as will be discussed in Sec. V. 
The accuracy of the pPRE2 screening method and the effect of its inexact nature on HF energies are discussed in the supplementary material. The derivation of an exact but less efficient prescreening method based on the Schwarz inequality⁷ is presented in Appendix B.
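The squared form of the estimate in Eq. (50) can be sketched in Python as shown below. The function names are hypothetical, and the common factor $θ_{p c} κ_{a b}$ is assumed to be available as the input `prefactor`; the sketch only illustrates how comparing squares avoids the square root and is not the authors' implementation.

```python
import math


def boys0(T: float) -> float:
    """Zeroth-order Boys function F_0(T), used here only to check the bound."""
    if T < 1e-12:
        return 1.0
    return 0.5 * math.sqrt(math.pi / T) * math.erf(math.sqrt(T))


def estimate_00_0_squared(prefactor: float, alpha: float, R2_PC: float) -> float:
    """Square of the right-hand side of Eq. (50); no square root is evaluated."""
    T = alpha * R2_PC
    # min(1, sqrt(pi / (4 T)))^2 equals 1 whenever 4 T <= pi
    bound_sq = 1.0 if 4.0 * T <= math.pi else math.pi / (4.0 * T)
    return prefactor * prefactor * bound_sq


def keep_primitive_class(prefactor, alpha, R2_PC, eps, n_prim_product):
    """pPRE1-like test: compare the squared estimate with the squared threshold."""
    threshold = eps / n_prim_product  # eps divided by the level of contraction
    return estimate_00_0_squared(prefactor, alpha, R2_PC) > threshold * threshold


# Hypothetical input values
print(keep_primitive_class(1e-3, 1.0, 4.0, eps=1e-10, n_prim_product=6))
```

In a pPRE2-like variant the same test would be called once per batch with the quantities belonging to the smallest c, before the innermost loop.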

The primitive prescreening described above does not reduce the work of the HRR and the solid harmonic transformation steps if these are performed at the contracted level (algorithm OUT). The simplest option in this case is, for each combination of the contracted functions, to check if the largest value out of the contracted (l_a0|L_c) classes needed for a class of (L_aL_b|L_c) is greater than the threshold before applying Eqs. (13) and (5) to get the given class (algorithm cPRE1). We can also choose to screen a bigger batch of contracted classes instead by performing the search for the integral of highest absolute value before the loop over $χ_{C}$ in an abc scheme or $χ_{B}$ in a cab scheme. This is advantageous when the fitting basis is uncontracted and an abc scheme is applied (see Algorithm 1). In these cases, we will work with the assumption that the integrals involving the most diffuse functions (that is, the smallest c) on the ket side will have higher absolute values than those containing higher c exponents, and therefore screening for the classes with the smallest c is enough to see if any of the integrals in the batch will reach the threshold (algorithm cPRE2). Like the pPRE1 and pPRE2 methods, this is not a rigorous screening, but its accuracy is demonstrated in the supplementary material. An alternative method is to estimate the integral with the highest absolute value out of the screened batch. For this purpose, we save the estimates of the (00|0) integrals made by Eq. (50). Then, an estimated upper bound for the integral of highest value of a contracted class is gained by taking the (00|0) estimate calculated from the smallest a, b, and c exponents which contribute to the contracted functions in question and multiplying it by both the degree of contraction (product of the number of primitives for the three functions) and the maximal contraction coefficient used for each contracted function. 
This estimation can also be done before the loop over $χ_{C}$ for the class with the smallest c (algorithm cPRE3) in an abc scheme (see Algorithm 1) when the fitting basis is uncontracted. With a cab loop order (Algorithm 2) we cannot assume which contracted class contains the integrals of highest absolute value; therefore, the estimation is performed for each class inside the loop over $χ_{B}$ (algorithm cPRE4).
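The estimate used for the contracted classes can be written down in a few lines of Python. The function name `contracted_class_estimate` and its arguments are hypothetical; the sketch assumes that the saved (00|0) estimate for the smallest contributing exponents, the numbers of primitives, and the largest contraction coefficients of the three contracted functions are available.

```python
import math


def contracted_class_estimate(est_00_0, n_primitives, max_abs_coeffs):
    """Estimated bound for the largest integral of a contracted class.

    est_00_0       : (00|0) estimate from the smallest a, b, and c exponents
    n_primitives   : numbers of primitives of the three contracted functions
    max_abs_coeffs : largest contraction coefficient of each contracted function
    """
    degree_of_contraction = math.prod(n_primitives)
    coefficient_factor = math.prod(abs(d) for d in max_abs_coeffs)
    return est_00_0 * degree_of_contraction * coefficient_factor


# Hypothetical numbers: the class is processed only if the estimate exceeds eps
eps = 1e-10
estimate = contracted_class_estimate(1e-6, (3, 2, 1), (0.5, 2.0, 1.0))
print(estimate, estimate > eps)
```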

Finally, from the recursive formulas for the calculation of six-dimensional integrals given in Secs. II B–II D it is evident that an integral can be constructed in numerous ways by such recursions, depending on which of the x,y,z components of the angular momentum is raised in the various recursion steps. A well-known consequence of this is that not all components of the intermediate classes have to be calculated and that different paths in the recursion have different operation counts.^22,28 In our algorithms, the related tree-search problems were treated utilizing the ideas of Ryu and co-workers.²²

III. FLOATING POINT OPERATION COUNTS

The FLOP requirements of the discussed schemes were estimated by a simple program developed for this purpose. The considered operations include the calculation of the primitive integrals and the transformation into the solid harmonic Gaussian and contracted bases. Estimations for the evaluation of Boys functions and the roots and weights for the Rys quadratures are omitted because the computational requirements of both steps depend heavily on the actual values of $α R_{PC}^{2}$ . Nevertheless, we found that the computation time spent on the two operations is rather similar, thus the neglect of their FLOP counts is not expected to influence our conclusions. Prescreening of the integrals is also not taken into account since this is also strongly system-dependent. The program counts the FLOP requirements of the schemes according to the equations given in Sec. II supposing that reusable compound quantities, such as $(α / p) X_{PC}$ in Eq. (11), are precalculated and treated as single variables. The sparsity of the transformation matrices for the solid harmonic Gaussian transformation and the primitive contraction is taken into consideration. The abc primitive loop structure was used and the solid harmonic transformation and the HRR were performed at the contracted level since this is the most conventional approach, but this does not change the theoretical order of efficiency for the investigated schemes. In the calculations presented in the following, a model system of three carbon atoms was chosen, and the number of FLOPs needed to evaluate all the ERIs over three separate centers was estimated for Dunning’s⁵⁷ correlation consistent cc-pVXZ (X = D,T,Q,5) basis sets (XZ for short) for the bra side and the corresponding auxiliary basis sets of Weigend⁵⁹ (cc-pVXZ-RI) for the ket side.

The overall FLOP counts for all the shell triplets for the various algorithms are presented in Table I. Figures that show the theoretical performance of the other algorithms relative to the OS1 scheme can be found in the supplementary material. It can be seen that out of the OS-based schemes the OS1 algorithm shows the best theoretical performance. In the OS2 and OS3 schemes, the more expensive recursion for l_a takes place after the build up of l_c, which makes these algorithms perform progressively worse with basis sets of higher cardinal number compared to OS1. In the OS4 route, the extra work introduced on the bra side with the use of the ETR becomes less and less significant with higher angular momenta in the bra, making the relative performance of OS4 better with bigger bases. Nevertheless, the OS1 scheme provides the lowest FLOP counts for each shell triplet. For the MD-based algorithms, the introduction of both the HRR for the bra (MD2 and MD5) and the VRR for the ket side (MD3 and MD6) improves the performance with respect to the MD1 and MD4 schemes, and increasingly so with the growth of L_b and L_c, respectively. None of the MD routes perform better than the OS1 for any shell triplets except for (ss|p), where the MD1, MD2, MD4, and MD5 schemes are slightly cheaper since the additional calculation of $\frac{α}{c}$ from Eq. (12) is not necessary. Looking at the best performing MD3 and MD6 schemes, we see that the use of Eqs. (28) and (29) is preferred to the assembly of Eq. (23), except when L_b = 0. The GHP scheme performs better than the OS1 when the bra side is (ps| since the extra contraction work for the scaled Hermite classes needed for Eq. (31) is negligible in these cases [except for very high angular momenta in the ket, see, for example, the (ps|i) shell triplet] and the s and p shells are contracted in all the investigated basis sets. The (ss|p) shell triplet also performs better, for the same reason as with the MD schemes. 
For higher angular momenta Eq. (31) becomes inefficient, hence the GHP scheme is only competitive for the DZ basis. As in the MD cases, the HRR for RYS2 and the ket-side VRR for RYS3 improve the FLOP counts. The RYS2 and RYS3 algorithms outmatch the OS1 in most of the cases when L_c = 0. For example, the OS1 scheme is better for (ds|s), but not for (dp|s). This is because the two-point quadrature is more costly than Eq. (11) for the former case, but it is cheaper for the latter. The RYS1 is the worst performing one of the Rys-based algorithms, but it is still superior to OS1 for particular shell triplets, for example, for (fd|s). The RYS3 scheme can be better than the OS1 for p kets if the change from s to p does not increase the number of quadrature points. However, since from Eq. (12) the ${(𝒍_{a} 𝟎 | 𝟎)}^{(L_{c})}$ integral classes that have to be calculated with quadrature for RYS3 are in the range of $max (0, L_{a} - L_{c}) \leq l_{a} \leq L_{a} + L_{b}$ , the growth of L_c also increases the work in the quadrature step, so this is only the case for higher angular momentum bras. All in all, there is only a small difference between the overall estimates for the best performing OS1 and RYS3 algorithms. Because of this, and also because the FLOP counts of the Boys functions and the roots and weights of the Rys quadratures are not estimated, these two schemes were implemented efficiently using automated code generation and wall time measurements were carried out, as will be discussed in Secs. IV and V, to decide which of the two is the most efficient scheme. The GHP algorithm for the (ps|s)–(ps|g) integrals has also been implemented “by hand” because the FLOP counts with this scheme are the lowest for these triplets.

TABLE I.

FLOP counts for the various algorithms with the cc-pVXZ basis sets.

Algorithm	X = D	X = T	X = Q	X = 5
OS1	445 777	2 231 707	14 074 904	71 407 908
OS2	545 297	2 967 883	19 981 747	106 671 377
OS3	632 210	3 465 805	22 599 746	116 871 757
OS4	754 037	3 587 118	21 812 481	106 908 381
MD1	599 215	3 801 560	30 617 263	198 278 829
MD2	555 359	3 165 292	22 249 286	125 117 638
MD3	474 978	2 473 785	15 766 358	80 931 165
MD4	616 235	3 824 184	29 178 035	173 497 467
MD5	570 267	3 272 596	22 532 400	121 220 098
MD6	470 050	2 420 151	15 243 362	77 170 230
GHP	499 430	3 188 703	25 032 932	152 491 888
RYS1	622 518	3 181 603	20 929 684	112 060 719
RYS2	585 778	2 902 749	18 203 750	92 155 512
RYS3	467 187	2 308 256	14 413 073	72 659 045
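The totals of Table I can be put in relation to the OS1 scheme with a few lines of Python. The numbers are copied from the table; the variable names are chosen only for this illustration.

```python
# Total FLOP counts from Table I for the cc-pVXZ basis sets (X = D, T, Q, 5)
basis = ("D", "T", "Q", "5")
flops = {
    "OS1": (445_777, 2_231_707, 14_074_904, 71_407_908),
    "OS2": (545_297, 2_967_883, 19_981_747, 106_671_377),
    "OS3": (632_210, 3_465_805, 22_599_746, 116_871_757),
    "OS4": (754_037, 3_587_118, 21_812_481, 106_908_381),
    "MD1": (599_215, 3_801_560, 30_617_263, 198_278_829),
    "MD2": (555_359, 3_165_292, 22_249_286, 125_117_638),
    "MD3": (474_978, 2_473_785, 15_766_358, 80_931_165),
    "MD4": (616_235, 3_824_184, 29_178_035, 173_497_467),
    "MD5": (570_267, 3_272_596, 22_532_400, 121_220_098),
    "MD6": (470_050, 2_420_151, 15_243_362, 77_170_230),
    "GHP": (499_430, 3_188_703, 25_032_932, 152_491_888),
    "RYS1": (622_518, 3_181_603, 20_929_684, 112_060_719),
    "RYS2": (585_778, 2_902_749, 18_203_750, 92_155_512),
    "RYS3": (467_187, 2_308_256, 14_413_073, 72_659_045),
}

# FLOP count of each scheme relative to OS1, per basis set
ratios = {
    name: tuple(v / ref for v, ref in zip(values, flops["OS1"]))
    for name, values in flops.items()
}
# Scheme with the lowest total for each basis set
best = {x: min(flops, key=lambda k: flops[k][i]) for i, x in enumerate(basis)}

for name, r in ratios.items():
    print(f"{name:5s}" + "".join(f"{v:8.3f}" for v in r))
print(best)
```

The ratios make the small gap between OS1 and RYS3 explicit and show how the relative cost of the GHP scheme grows with the cardinal number.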


The FLOP counts for the four different possible combinations of the IN-OUT and abc-cab schemes for the OS1 algorithm are shown in Table II. The conclusions are also true for the RYS3 algorithm since the OS1 and RYS3 schemes do not differ in any part that is affected by varying these four algorithmic approaches. The estimates for the abc and cab cases are essentially the same; the small difference comes from the fact that for the abc schemes the additional costs of the pPRE2 type primitive prescreening are also counted because additional calculations are needed here before the loop over c. The differences between the IN and OUT algorithms are more significant, and as expected, performing the HRR and the solid harmonic transformation at the contracted level is theoretically more efficient in every case when at least one of the functions is contracted. The difference becomes less pronounced with higher basis sets because d and higher shells are uncontracted in the investigated bases. These results, however, do not provide information about the difference in performance that could arise from the different memory layouts and prescreening strategies of the schemes. Hence, to assess the wall time performances as well as cache-miss rates these four variations have also been efficiently implemented for both the OS1 and RYS3 algorithms, and the abc and cab versions of the GHP schemes were also programmed.

TABLE II.

FLOP counts for the four different OS1 algorithms with the cc-pVXZ basis sets.

Algorithm	X = D	X = T	X = Q	X = 5
IN-abc	566 748	2 664 883	15 919 037	79 233 985
IN-cab	565 054	2 662 374	15 916 043	79 232 960
OUT-abc	445 777	2 231 707	14 074 904	71 407 908
OUT-cab	443 201	2 227 609	14 069 600	71 404 912


IV. IMPLEMENTATION

The four combinations of the IN-OUT and abc-cab schemes for the OS1 and RYS3 algorithms together with the prescreening approaches discussed in Sec. II F have been implemented in the Mrcc program suite⁶¹ by means of automated code generation. The abc and cab variants of the GHP algorithm for the (ps|s)–(ps|g) triplets have been implemented in the conventional way. An individual Fortran 95 subroutine was created for every shell triplet up to (hh|i). The subroutines contain the loops over the primitive and the contracted Gaussians, the calculation of the necessary exponent- and center-dependent quantities, the evaluation of Boys functions (or the roots and weights for the Rys quadrature), the recursive build-up of angular momenta (or the quadrature for l_a), and the transformations to the solid harmonic and contracted bases. The code generation based implementation is particularly useful for the exploitation of the fact that not all the intermediate integrals are needed for a given class when using the 6D recurrences of Eqs. (11)–(13) and the 3D recurrence of Eq. (21), and the statements for calculating the unnecessary integrals are simply omitted from the code. For the 2D recursions of the RYS3 scheme, this does not apply since the recursions for the x, y, and z directions are performed separately and all the components are needed in the recursion defined by Eq. (44). The calculation of the 2D integrals is vectorized for the roots of the Rys polynomials, and the quadrature for the ${(𝒍_{a} 𝟎 | 𝟎)}^{(L_{c})}$ classes has been implemented utilizing the reduced multiplication scheme of Lindh and co-workers.²⁵ All the intermediate and target integrals are stored in one-index arrays. 
The build-up of angular momenta and the solid harmonic transformation are performed for one class at a time, which means that the arrays for storing the intermediates of these tasks are of fixed length and the indices can be explicitly generated, eliminating the integer and memory operations for the calculation of indices.

A significant amount of vectorization can be achieved for the HRR and the solid harmonic transformation provided that the data are stored in the appropriate order. The HRR can be trivially vectorized for the components of L_c since Eq. (13) does not depend on the function in the ket. Systematic vectorization for the components of l_a is also possible if the component of l_b is the slowest changing property in the array. If the ordering of Cartesian components is as it is shown in Fig. 1, then the components of l_a can only be partially vectorized if z or y is raised in the angular momentum of l_b and fully if x is incremented; therefore, whenever it is possible, x should be raised by the HRR. For the GHP algorithm with a (ps| bra, where the target integrals are calculated directly from the one-center ones, Eq. (31) was vectorized in the same manner for the components of L_c. For the solid harmonic transformation of one of the functions, the loops over all the (Cartesian or solid harmonic) components of the other two functions can only be vectorized if the components of the transformed function change most slowly. We found it to be efficient to rearrange the ordering of integrals before these highly vectorizable tasks. The sparsity of the solid harmonic transformation is fully exploited in our implementation, and the values of the coefficients in Eq. (5) are explicitly generated into the code. We have also considered the approach where the HRR and the solid harmonic transformation for the bra are treated as one matrix multiplication by precalculating the combined transformation matrix,⁵⁶ storing it in compressed sparse column format for a given bra, and reusing this matrix with a sparse matrix multiplication routine for the transformation of integrals. 
It was our experience that performing the HRR separately step by step for each l_b with the vectorization scheme described above and exploiting that some components are unnecessary for the recursion is a more beneficial strategy. It should also be mentioned that the solid harmonic transformation of the ket side is always performed before the HRR since this makes the latter step less expensive.
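The ket-independence of the HRR can be illustrated with a small NumPy sketch. It is a one-dimensional toy model, not the generated Fortran code discussed here: the function `hrr_step`, the array layout, and the model "integrals" (grid sums of polynomial factors times arbitrary ket functions) are assumptions made for illustration, and the recurrence is taken in its usual Head-Gordon–Pople form, (a, b+1 | c) = (a+1, b | c) + X_AB (a, b | c) with X_AB = A_x − B_x, which is assumed to correspond to Eq. (13).

```python
import numpy as np

def hrr_step(ab_c, x_ab):
    """Raise b by one along a single Cartesian axis.

    ab_c[i, :] holds (a=i, b | c) for a fixed b; each row runs over all
    ket components c. The result holds (a=i, b+1 | c) for i = 0 .. n_a - 2.
    """
    # One array expression handles every ket component at once, because the
    # recurrence coefficients do not depend on the ket.
    return ab_c[1:, :] + x_ab * ab_c[:-1, :]

# Toy model: "integrals" are grid sums of (x - A)^a (x - B)^b g_c(x), for
# which the recurrence is an exact algebraic identity.
A, B = 0.3, -0.8
x = np.linspace(-6.0, 6.0, 2001)
kets = np.stack(
    [
        np.exp(-0.5 * (x - 1.0) ** 2),
        x * np.exp(-0.9 * x ** 2),
        np.exp(-0.2 * (x + 0.5) ** 2),
    ],
    axis=1,
)  # shape (n_grid, n_ket)

def toy_integrals(a_max, b):
    """Directly evaluated model integrals (a b | c) for a = 0 .. a_max."""
    powers = np.stack([(x - A) ** a * (x - B) ** b for a in range(a_max + 1)])
    return powers @ kets  # shape (a_max + 1, n_ket)

start = toy_integrals(4, 0)    # (a 0 | c), a = 0 .. 4
b1 = hrr_step(start, A - B)    # (a 1 | c), a = 0 .. 3
b2 = hrr_step(b1, A - B)       # (a 2 | c), a = 0 .. 2
print(np.allclose(b2, toy_integrals(2, 2)))
```

Storing the ket index as the fastest-running one is what lets each step be written as a single contiguous array operation; the real three-dimensional case additionally has to choose which Cartesian direction to raise, as discussed above.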

FIG. 1. — Two possible ways of calculating integrals with a $(𝒅𝒅 |$ bra side and l_b = (1, 0, 1) are by (a) incrementing z and (b) incrementing x in l_b. The indices for the Cartesian components increase as we proceed from top to bottom in the columns for the f and d shells above. The operations which can be vectorized are highlighted by boxes of various colors. In our implementation, incrementing x is always better suited for vectorization. The ket side of the integrals is not shown since the HRR equation is invariant to the function in the ket.

The contraction of primitives can be treated in a vectorized manner without the rearrangement of data. For generally contracted functions, the multiplication with the coefficients in Eq. (6) is vectorized for all the necessary classes, e.g., for the construction of the integrals over all components of one of the $χ_{B}$ functions in an abc scheme, $N_{χ_{C}} N_{S}$ integrals are treated simultaneously, where $N_{χ_{C}}$ is the number of contracted functions centered on C and N_S is the number of integrals in the class (for algorithm IN) or in the necessary (l_a0|L_c) classes (for algorithm OUT). For example, for a (dd|d) class $N_{S} = 5 \times 5 \times 5 = 125$ for algorithm IN and $N_{S} = 6 \times 1 \times 6 + 10 \times 1 \times 6 + 15 \times 1 \times 6 = 186$ for algorithm OUT because here we need the (ds|d), (fs|d), and (gs|d) classes for the HRR. It is also noteworthy that, at the contraction of the functions centered on B, instead of performing the summation of Eq. (6) in the $N_{a} N_{χ_{B}} N_{χ_{C}} N_{S}$ long array used to store these partially contracted integrals (where N_a is the number of primitives centered on A), it is more cache-friendly to do the summation in a buffer array of size $N_{χ_{C}} N_{S}$ and then to copy the data into the array that will be used for the contraction of primitives centered on A.
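The class sizes quoted for (dd|d) can be reproduced with a few lines of Python. This is only a counting sketch; the function names are hypothetical, and the ket is counted in Cartesian components for algorithm OUT simply because that is how the example in the text counts it.

```python
def n_cartesian(l):
    """Number of Cartesian Gaussian components of a shell with angular momentum l."""
    return (l + 1) * (l + 2) // 2

def n_solid_harmonic(l):
    """Number of solid harmonic components of a shell with angular momentum l."""
    return 2 * l + 1

def n_s_in(l_a, l_b, l_c):
    """N_S for algorithm IN: the final solid harmonic class is contracted."""
    return n_solid_harmonic(l_a) * n_solid_harmonic(l_b) * n_solid_harmonic(l_c)

def n_s_out(l_a, l_b, l_c):
    """N_S for algorithm OUT: all (l s | L_c) classes with L_a <= l <= L_a + L_b."""
    n_ket = n_cartesian(l_c)
    return sum(n_cartesian(l) * 1 * n_ket for l in range(l_a, l_a + l_b + 1))

print(n_s_in(2, 2, 2))   # 125
print(n_s_out(2, 2, 2))  # 186
```

The OUT count grows faster with L_b than the IN count because every intermediate (l s| class up to l = L_a + L_b has to be carried through the contraction before the HRR can be applied.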

The implementation of ERIs also utilizes a coarse-grained OpenMP parallelization for the innermost atomic loop. A figure showing the performance of the parallelization can be found in the supplementary material. We also note that, to demonstrate the efficiency of the generated implementation, we have also coded a subroutine that uses the OS1 scheme for arbitrary angular momenta. Here, the recursions of Eqs. (11) and (12) are performed by general loops, and the intermediates are stored in a two-index array. The HRR and the solid harmonic transformation steps are done at the contracted level with a sparse matrix multiplication routine, which is applied to the solid harmonic transformation of the ket and the combined HRR and solid harmonic transformation of the bra⁵⁶ as described above.

V. PERFORMANCE TESTS

In this section, we present the wall time performances of the implemented algorithms measured using a single core of a 2-core 3.00 GHz Intel Xeon E3110 CPU. The generated subroutines were compiled with the Intel Fortran compiler using the highest level of optimization. Measurements were carried out for penicillin⁶² (PEN) and two DNA systems with one (DNA₁) and two (DNA₂) adenine-thymine base pairs.⁶³ The threshold ε for contracted integrals was set to $10^{- 10} E_{h}$ in all of the calculations. Only the results for DNA₂ with the cc-pVTZ basis set are presented here. The results for the other measurements, which show that the conclusions gained hold for all the investigated systems, can be found in the supplementary material. Cache simulations were performed for hydrogen peroxide (R_OO = 2.7514 bohrs, R_HO = 1.8274 bohrs, $∢_{HOO} = {102.32}^{°}$ , dihedral angle $= {115.89}^{°}$ ) with the Valgrind program package⁶⁴ supposing a three-level CPU cache structure which is common these days: 64 kB of level 1 (L1, 32 kB for both data and instructions), 256 kB of level 2, and 4 MB of level 3 (last level, LL) cache. In the simulations, an L1 miss means that the data or instructions have not been found in the first level, while an LL miss indicates that no copy of the requested information can be found in the cache at all. Note that the number of L1 misses also contains the LL misses.

Fig. 2 shows the difference between the pPRE1 and pPRE2 primitive prescreening schemes in the case of the IN-abc algorithm. The pPRE2 method saves entering the loop over c and the prescreening for each c at the price that classes containing integrals of insignificant absolute values that would be screened out with the pPRE1 scheme are also computed. With the abc loop order, the pPRE2 approach is clearly more efficient. The difference between the performance of the two prescreening schemes, as well as the significance of primitive prescreening, shrinks with the decrease in the number of primitive functions. On the other hand, from Fig. 3 we see that the pPRE1 prescreening is more economical in the case of a cab scheme since the Schwarz screening already throws out most of the shell pairs where no b gives a significant contribution. The figures presenting the timings for the various cPRE algorithms can be found in the supplementary material. The cPRE type of screening has less effect, and for triplets that do not require either the HRR or the solid harmonic transformation, it merely saves the writing of integrals into their final storing array. As the former two tasks become more significant, the cPRE screening gets more beneficial, especially with higher basis sets, where there are more contracted functions for higher angular momenta. For the OUT-abc scheme, the lookup of the integrals of highest absolute value (cPRE1 and cPRE2) is preferred over the estimation of this quantity (cPRE3). The cPRE1 and cPRE2 schemes have very similar performance, with cPRE2 being slightly more efficient. The same tendencies can be observed with the OUT-cab algorithm, where cPRE1 is the more efficient method. We conclude that for the abc primitive loop order, the pPRE2 and cPRE2 are the prescreening schemes of choice, while for the cab algorithms the pPRE1 and cPRE1 screenings are preferred.

FIG. 3. — Wall times measured in seconds obtained by calculating all three-center ERIs of the DNA₂ molecule with the cc-pVTZ basis set by applying the OS1-IN-cab algorithm with various prescreening strategies.

The wall times measured for the shell triplets with the four variants of the OS1 algorithm, using the most efficient prescreening methods, are shown in Fig. 4. For triplets containing small angular momenta, the cab schemes are inefficient, even without primitive prescreening (see also Figs. 2 and 3). The reason for this is that the arrays that become smaller with a cab algorithm are already too short in these cases. For example, the length of the buffer array used for the contraction of functions centered on B for (ss|s) is $N_{χ_{C}}$ and 1 using an abc and a cab scheme, respectively. Here, applying the cab loop order ruins the vectorization for the primitive contraction. This effect loses its importance with the growth of L_c since N_S becomes bigger and $N_{χ_{C}}$ becomes smaller. The difference between the abc and cab schemes grows when using basis sets of higher cardinal number because of the higher number of contracted functions. The IN algorithms generally perform better than the OUT ones. One of the reasons is the apparent superiority of the pPRE-type screening, which lessens the amount of work for the HRR and solid harmonic transformation steps using the IN schemes. We must note, however, that only the s and p shells are contracted in the considered basis sets, making the OUT route theoretically more efficient only in shell triplets containing at least one such shell.

FIG. 4. — Wall times measured in seconds obtained by calculating all three-center ERIs of the DNA₂ molecule with the cc-pVTZ basis set by applying the four OS1 algorithms with the most efficient prescreening strategies.

The timings can be better interpreted inspecting the results of the cache performance simulations. The cumulated results for all the shell triplets in the TZ basis are presented in Table III, while the results with the other basis sets can be found in the supplementary material. We see that the number of level 1 instruction fetch misses (L1Is) is lower for the OUT-abc scheme than for the IN-abc, but a higher percentage of these is also last level misses. This is because with an IN algorithm the calculation of primitive integrals and the conversion into the solid harmonic Gaussian basis are done continuously step by step inside the primitive loops, while in the OUT case this procedure is divided into two parts with two separate loop structures, making it more friendly for the instruction cache for higher angular momenta, where the generated codes are lengthy. This effect is more pronounced with basis sets of higher cardinal number, where the angular momenta are higher and the loops over primitive and contracted functions perform more cycles. With the QZ and 5Z bases, we can observe the same for OUT-cab: the number of L1Is is smaller than for the IN schemes, but higher than for the OUT-abc since all the calculations take place in the loop over c, making the reuse of instructions less temporally local (that is, the same tasks are not performed as frequently as they would be with the loop over c being the innermost one). For this reason, the abc schemes are always more friendly to the instruction cache. This aspect of the performance is the reason why the OUT schemes are sometimes more efficient for shell triplets we would not expect theoretically, for example, for the (fd|f) and (ff|d) cases with the TZ basis, and also explains why the performance of this approach improves with higher basis sets. 
As anticipated from the sizes of the arrays used for the primitive contraction, the IN algorithms produce fewer data misses of both the read and write kind, and the cab loop order is beneficial in this aspect. This difference also grows with the cardinal number of the basis sets and is more significant for write misses since the read operations are usually carried out from arrays that have been written in a previous calculation step.

TABLE III.

Cache performance simulation results for H₂O₂ with the cc-pVTZ basis set.

	Algorithm
Event	IN-abc	IN-cab	OUT-abc	OUT-cab
L1 instruction fetch miss	687 995	720 295	665 954	820 972
LL instruction fetch miss	576 727	582 575	641 900	666 989
L1 data read miss	219 741	192 668	252 112	202 708
LL data read miss	199 655	191 036	200 703	199 053
L1 data write miss	484 132	385 047	552 062	407 074
LL data write miss	482 194	383 336	544 077	404 681
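The instruction-cache comparison made in the text can be checked directly from the numbers of Table III. The short Python sketch below only re-expresses the tabulated counts as ratios; the dictionary names are chosen here for illustration.

```python
# Instruction fetch misses from Table III (H2O2, cc-pVTZ).
l1_instr = {"IN-abc": 687_995, "IN-cab": 720_295, "OUT-abc": 665_954, "OUT-cab": 820_972}
ll_instr = {"IN-abc": 576_727, "IN-cab": 582_575, "OUT-abc": 641_900, "OUT-cab": 666_989}

# L1 data write misses from the same table.
l1_write = {"IN-abc": 484_132, "IN-cab": 385_047, "OUT-abc": 552_062, "OUT-cab": 407_074}

# Fraction of L1 instruction misses that also miss the last-level cache.
ll_share = {alg: ll_instr[alg] / l1_instr[alg] for alg in l1_instr}

for alg in l1_instr:
    print(f"{alg:8s} L1I misses: {l1_instr[alg]:>8d}   LL share: {100 * ll_share[alg]:5.1f}%")

# IN versus OUT for the data write misses, for each primitive loop order.
for order in ("abc", "cab"):
    ratio = l1_write[f"OUT-{order}"] / l1_write[f"IN-{order}"]
    print(f"{order}: OUT/IN L1 data write miss ratio = {ratio:.2f}")
```

OUT-abc has the fewest L1 instruction misses but the largest share of them reaching the last level, while both IN variants show fewer data write misses than their OUT counterparts, in line with the discussion above.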


Fig. 5 compares the efficiency of the OS1 and RYS3 schemes. For each shell triplet, the selected algorithmic approach was the one that best performed according to Fig. 4, keeping in mind that the most efficient combination of the IN-OUT and abc-cab approaches for the OS1 scheme is also the most efficient one for the RYS3 since the OS1 and RYS3 schemes do not differ in any part that depends on using the IN-OUT or abc-cab approaches. While the performances fall close, the OS1 scheme is superior in almost every case. The differences are more pronounced for the shell triplets with small angular momenta in the bra. The advantage of using OS1 becomes larger for the shell triplets where the number of Rys quadrature points is over 5. In these cases, the roots and weights are calculated by applying Wheeler’s algorithm⁶⁵ and Golub’s matrix method,⁶⁶ while otherwise the less expensive schemes proposed by King and Dupuis¹⁰ are employed. The disagreement between the timings and the FLOP estimates must come from the task that is not estimated by the operation counts, that is, the evaluation of Boys functions and the roots and weights for the Rys quadrature. In some cases, the RYS3 scheme is still slightly more efficient, e.g., for the (fd|p) and (gd|p) shell triplets. The GHP scheme is competitive for the implemented cases (see Sec. IV) with the 5Z basis, where the degree of contraction is the highest. For smaller basis sets, for the (ps|p) triplet, GHP performs slightly better than OS1 since here the number of integrals to be contracted, that is, the number of integrals included in the scaled classes ${(Ω_{0,0}^{\bar{𝟎}} | \bar{𝟏})}_{1,1}$ and ${(Ω_{0,0}^{\bar{𝟏}} | \bar{𝟏})}_{0,1}$ needed for Eq. (31), is the same as the number of integrals to be contracted in the OS1 scheme, and all of the functions are contracted. 
The application of the cab loop order on the (ps|g) and (ps|f) triplets makes the GHP algorithm perform better for these cases than the other ones with the TZ and the QZ bases, respectively.
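To see which shell triplets fall into the regime with more than five Rys quadrature points, the following Python sketch can be used. It assumes the standard rule that a class of total angular momentum L requires ⌊L/2⌋ + 1 roots; this rule is not restated in this section, so it is an assumption here, and the helper names are hypothetical.

```python
SHELL = {"s": 0, "p": 1, "d": 2, "f": 3, "g": 4, "h": 5, "i": 6}

def n_rys_roots(bra, ket):
    """Number of Rys roots for a triplet such as ("fd", "p").

    Assumes the usual rule n = floor(L / 2) + 1 with L = L_a + L_b + L_c.
    """
    total_l = sum(SHELL[s] for s in bra) + SHELL[ket]
    return total_l // 2 + 1

for bra, ket in [("ps", "p"), ("fd", "p"), ("gd", "p"), ("gg", "g"), ("hh", "i")]:
    n = n_rys_roots(bra, ket)
    scheme = "Wheeler/Golub" if n > 5 else "King-Dupuis"
    print(f"({bra}|{ket}): {n} roots, {scheme} regime")
```

Under this assumption the (fd|p) and (gd|p) triplets mentioned above need only four roots and so stay in the cheaper regime, whereas triplets with L_a + L_b + L_c of at least 10 need more than five.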

As was pointed out, the relative performances of the discussed approaches depend on the number of functions and the degree of contraction, and therefore on the applied basis set itself. For the three test molecules we investigated, it was our experience that the best algorithm for a given shell triplet with a given basis is mostly independent of the calculated system. Based on our measurements with the cc-pVXZ bases for first row elements, in Table IV we present our recommendations for the algorithms for the shell triplets up to (hh|i). The list compiled in Table IV was composed by selecting the schemes that are the most beneficial ones for the TZ and the QZ basis sets because such bases are used most frequently in DF calculations. The best algorithm for the triplets is the same with both basis sets for most of the cases. As we can see, even though the considered basis sets have the similarity that only the s and p shells are contracted, the increase of the number of functions and the level of contraction makes the cab and OUT schemes more beneficial with the bigger bases.

TABLE IV.

Recommended algorithms for the various shell triplets.

Shell triplet	Algorithm	Shell triplet	Algorithm	Shell triplet	Algorithm
(ss|s)	OS1-IN-abc	(fp|s)	OS1-IN-abc	(gg|s)	OS1-OUT-abc
(ss|p)	OS1-IN-abc	(fp|p)	OS1-OUT-abc	(gg|p)	OS1-IN-cab
(ss|d)	OS1-IN-abc	(fp|d)	OS1-IN-cab	(gg|d)	OS1-IN-cab
(ss|f)	OS1-IN-abc	(fp|f)	OS1-IN-cab	(gg|f)	OS1-IN-cab
(ss|g)	OS1-IN-cab	(fp|g)	OS1-IN-cab	(gg|g)	OS1-IN-cab
(ss|h)	OS1-IN-cab	(fp|h)	OS1-OUT-cab	(gg|h)	OS1-OUT-cab
(ss|i)	OS1-IN-cab	(fp|i)	OS1-OUT-cab	(gg|i)	OS1-OUT-abc
(ps|s)	OS1-IN-abc	(fd|s)	OS1-IN-cab	(hs|s)	RYS3-IN-abc
(ps|p)	GHP-abc	(fd|p)	RYS3-IN-cab	(hs|p)	OS1-IN-abc
(ps|d)	OS1-IN-abc	(fd|d)	OS1-IN-cab	(hs|d)	OS1-IN-abc
(ps|f)	OS1-IN-abc	(fd|f)	OS1-OUT-abc	(hs|f)	OS1-IN-abc
(ps|g)	GHP-cab	(fd|g)	OS1-OUT-abc	(hs|g)	OS1-IN-abc
(ps|h)	OS1-IN-cab	(fd|h)	OS1-IN-abc	(hs|h)	OS1-IN-cab
(ps|i)	OS1-IN-cab	(fd|i)	OS1-OUT-abc	(hs|i)	OS1-IN-cab
(pp|s)	OS1-IN-abc	(ff|s)	OS1-IN-cab	(hp|s)	RYS3-IN-cab
(pp|p)	OS1-OUT-abc	(ff|p)	OS1-OUT-abc	(hp|p)	OS1-IN-cab
(pp|d)	OS1-IN-cab	(ff|d)	OS1-OUT-abc	(hp|d)	OS1-IN-abc
(pp|f)	OS1-IN-cab	(ff|f)	OS1-IN-cab	(hp|f)	OS1-OUT-abc
(pp|g)	OS1-IN-cab	(ff|g)	OS1-IN-cab	(hp|g)	OS1-OUT-abc
(pp|h)	OS1-IN-cab	(ff|h)	OS1-IN-cab	(hp|h)	OS1-IN-cab
(pp|i)	OS1-IN-cab	(ff|i)	OS1-OUT-cab	(hp|i)	OS1-IN-cab
(ds|s)	OS1-IN-abc	(gs|s)	OS1-IN-abc	(hd|s)	RYS3-IN-cab
(ds|p)	OS1-IN-abc	(gs|p)	OS1-IN-abc	(hd|p)	OS1-OUT-abc
(ds|d)	OS1-IN-abc	(gs|d)	OS1-IN-abc	(hd|d)	OS1-OUT-cab
(ds|f)	OS1-IN-abc	(gs|f)	OS1-IN-abc	(hd|f)	OS1-OUT-cab
(ds|g)	OS1-IN-abc	(gs|g)	OS1-IN-abc	(hd|g)	OS1-OUT-abc
(ds|h)	OS1-IN-cab	(gs|h)	OS1-IN-cab	(hd|h)	OS1-OUT-cab
(ds|i)	OS1-IN-cab	(gs|i)	OS1-IN-cab	(hd|i)	OS1-OUT-cab
(dp|s)	OS1-IN-abc	(gp|s)	OS1-IN-cab	(hf|s)	OS1-OUT-abc
(dp|p)	OS1-OUT-abc	(gp|p)	OS1-IN-cab	(hf|p)	OS1-OUT-abc
(dp|d)	OS1-IN-cab	(gp|d)	OS1-IN-cab	(hf|d)	OS1-IN-cab
(dp|f)	OS1-IN-cab	(gp|f)	OS1-OUT-abc	(hf|f)	OS1-IN-cab
(dp|g)	OS1-IN-cab	(gp|g)	OS1-IN-cab	(hf|g)	OS1-IN-cab
(dp|h)	OS1-IN-cab	(gp|h)	OS1-IN-cab	(hf|h)	OS1-OUT-cab
(dp|i)	OS1-OUT-cab	(gp|i)	OS1-OUT-abc	(hf|i)	OS1-OUT-abc
(dd|s)	OS1-IN-abc	(gd|s)	OS1-IN-cab	(hg|s)	OS1-OUT-abc
(dd|p)	OS1-IN-cab	(gd|p)	OS1-IN-cab	(hg|p)	OS1-IN-cab
(dd|d)	OS1-IN-cab	(gd|d)	OS1-OUT-abc	(hg|d)	OS1-IN-cab
(dd|f)	OS1-IN-cab	(gd|f)	OS1-OUT-abc	(hg|f)	OS1-OUT-cab
(dd|g)	OS1-OUT-abc	(gd|g)	OS1-OUT-abc	(hg|g)	OS1-OUT-cab
(dd|h)	OS1-OUT-cab	(gd|h)	OS1-IN-cab	(hg|h)	OS1-OUT-cab
(dd|i)	OS1-OUT-cab	(gd|i)	OS1-OUT-cab	(hg|i)	OS1-OUT-abc
(fs|s)	OS1-IN-abc	(gf|s)	RYS3-IN-cab	(hh|s)	OS1-IN-cab
(fs|p)	OS1-OUT-abc	(gf|p)	OS1-OUT-abc	(hh|p)	OS1-IN-cab
(fs|d)	OS1-IN-abc	(gf|d)	OS1-OUT-abc	(hh|d)	OS1-IN-cab
(fs|f)	OS1-IN-abc	(gf|f)	OS1-OUT-abc	(hh|f)	OS1-OUT-cab
(fs|g)	OS1-IN-cab	(gf|g)	OS1-IN-cab	(hh|g)	OS1-OUT-cab
(fs|h)	OS1-IN-cab	(gf|h)	OS1-IN-abc	(hh|h)	OS1-OUT-abc
(fs|i)	OS1-IN-cab	(gf|i)	OS1-OUT-abc	(hh|i)	OS1-OUT-cab


VI. BENCHMARK CALCULATIONS

To demonstrate the efficiency of our implementation based on the above recommendation, in Table V we present the wall times measured for the evaluation of three-center ERIs for test systems of various size, namely, penicillin,⁶² DNA fragments containing 1 and 4 adenine-thymine base pairs⁶⁷ (DNA₁ and DNA₄, respectively), indinavir,⁶⁸ angiotensin II,⁶⁹ and a halloysite clay structure.⁷⁰ The measurements were carried out using 8 cores of a 3.00 GHz Intel Xeon E5-1660 CPU. The results are close to quadratic scaling with the total number of basis functions due to the various integral screenings, and the prefactor is kept small by the efficient implementation. We have also experienced a constant speedup of about 3 compared to our general purpose routine using the OS1 scheme, which shows that we can gain an efficient implementation optimized for each shell triplet separately. We note that three-center ERIs can also be easily computed with the algorithms developed for four-center ones constraining two of the four centers to be coincident. Since many quantum chemistry software packages evaluate three-center Coulomb integrals in this way, it is instructive to compare the speed of an explicitly three-center code to that of a four-center one for three-center ERIs. Therefore, we compared our three-center code to our previous OS-based four-center integral program⁷¹ and have found that the former program is roughly 3.5 times faster than the latter one. We also note that the efficiency of our integral code has been recently demonstrated also in the case of the integral-direct local correlation approach of Ref. 9, where roughly one-third of the entire computation time is spent on the calculation of three-center ERIs.

TABLE V.

Wall times of three-center ERI calculations in minutes measured for various test systems with the cc-pVXZ basis sets. N + M denotes the total number of ordinary basis functions and fitting functions.

	X
	D		T		Q		5
Test system	Time	N + M	Time	N + M	Time	N + M	Time	N + M
Penicillin	0.008	430 + 2 136	0.022	946 + 2 478	0.088	1 864 + 3 504	0.372	3 178 + 5 033
DNA₁	0.016	625 + 3 071	0.049	1 428 + 3 575	0.201	2 735 + 5 087	0.883	4 670 + 7 351
Indinavir	0.033	865 + 4 231	0.118	2 008 + 4 965	0.492	3 885 + 7 167	2.251	6 680 + 10 471
Angiotensin II	0.104	1 405 + 6 883	0.380	3 244 + 8 055	1.609	6 255 + 11 571	7.245	10 730 + 16 843
DNA₄	0.474	2 746 + 19 820	1.777	6 192 + 15 794	8.307	11 774 + 22 202	33.174	20 012 + 31 744
Halloysite	1.306	3 700 + 19 820	4.607	7 970 + 22 435	19.854	14 855 + 30 280	68.447	24 985 + 41 510


VII. CONCLUSIONS

We have compared the Obara–Saika, McMurchie–Davidson, Gill–Head-Gordon–Pople, and Rys quadrature schemes as well as their combinations for the evaluation of three-center Coulomb integrals. Various algorithmic considerations, such as the order of loops for primitive functions, the application of the horizontal recurrence relation, and the solid harmonic transformation at the primitive or contracted level, and several prescreening strategies have also been investigated. Based on estimations for the number of necessary floating point operations for a simple model system, we concluded that the Obara–Saika scheme, utilizing the vertical recurrence relation of Ahlrichs,⁴⁰ is the most efficient choice, with the Gill–Head-Gordon–Pople algorithm and the combination of the Rys quadrature and the Obara–Saika schemes being competitive for a few special cases. The most promising algorithms were implemented via automated code generation for all shell triplets up to (hh|i) along with the discussed algorithmic approaches. Wall time measurements for medium sized molecules also showed the Obara–Saika scheme to be superior, and the most effective prescreening technique was determined for each algorithmic approach. Even though the floating point operation counts suggested that the horizontal recurrence relation and the solid harmonic transformation are significantly more efficient when applied to contracted integrals, this does not seem to be the case for the majority of shell triplets encountered in practical calculations. The reason for this is that performing these two tasks on primitive integrals allows for the use of more effective prescreening and memory layout. Based on our investigations, we have presented a recommendation for the algorithms to be used for the various shell triplets, favoring the ones that perform the best with triple- and quadruple-zeta basis sets.

SUPPLEMENTARY MATERIAL

See supplementary material for the analysis of the prescreening schemes presented in Sec. II F, for the relative theoretical performances of the investigated algorithms referred to in Sec. III, for the wall time measurement and cache simulation results discussed in Sec. V, for the performance of the ERI calculation on multiple CPU cores, and for the geometries of the molecules used in the performance tests and benchmark calculations.

ACKNOWLEDGMENTS

The authors are indebted to Professor Reinhart Ahlrichs and Dr. Gerald Knizia for useful discussions. The computing time granted on the Hungarian HPC Infrastructure at NIIF Institute, Hungary, is gratefully acknowledged.

APPENDIX A: IMPROVED RECURRENCE RELATION FOR THE 2D INTEGRALS OF THE RYS SCHEME

In the general case, Eq. (45) contains a third term and has the form¹⁷

Θ_{x}^{i_{a}, 0, i_{c} + 1} (t_{n}^{2}) = \frac{α}{c} X_{PC} t_{n}^{2} Θ_{x}^{i_{a}, 0, i_{c}} (t_{n}^{2}) + \frac{i_{a} t_{n}^{2}}{2 (p + c)} Θ_{x}^{i_{a} - 1, 0, i_{c}} (t_{n}^{2}) + \frac{i_{c}}{2 c} (1 - \frac{α}{c} t_{n}^{2}) Θ_{x}^{i_{a}, 0, i_{c} - 1} (t_{n}^{2}) .

(A1)

With the help of Eq. (12) we can show that, if the ket side is transformed into the solid harmonic Gaussian basis, the third term on the right-hand side of Eq. (A1) can be omitted. To see this, we first notice from backtracking the recursion defined by Eq. (12) that an integral ${(𝒍_{a}^{#} 𝟎 | 𝒍_{c}^{#})}^{(m + n)}$ contributes to (l_a0|L_c)^(m) only if

l_{c}^{#} = l_{c} - n

(A2)

since each recursion step decreases n and increases $𝑙_{c}^{#}$ by one. Then, let us express (l_a0|L_c)^(m) as

{(𝒍_{a} 𝟎 | 𝒍_{c})}^{(m)} = \sum_{n = 1}^{N_{r t s}} t_{n}^{2 m} Θ_{x}^{i_{a}, 0, i_{c}} (t_{n}^{2}) Θ_{y}^{j_{a}, 0, j_{c}} (t_{n}^{2}) Θ_{z}^{k_{a}, 0, k_{c}} (t_{n}^{2}) .

(A3)

Substituting Eq. (A1) into Eq. (A3) we get

\begin{matrix} {(𝒍_{a} 𝟎 | 𝒍_{c})}^{(m)} & = \sum_{n = 1}^{N_{r t s}} t_{n}^{2 m} [\frac{α}{c} X_{PC} t_{n}^{2} Θ_{x}^{i_{a}, 0, i_{c} - 1} (t_{n}^{2}) + \frac{i_{a} t_{n}^{2}}{2 (p + c)} Θ_{x}^{i_{a} - 1, 0, i_{c} - 1} (t_{n}^{2}) + \frac{i_{c}}{2 c} (1 - \frac{α}{c} t_{n}^{2}) Θ_{x}^{i_{a}, 0, i_{c} - 2} (t_{n}^{2})] \\ & \times [\frac{α}{c} Y_{PC} t_{n}^{2} Θ_{y}^{j_{a}, 0, j_{c} - 1} (t_{n}^{2}) + \frac{j_{a} t_{n}^{2}}{2 (p + c)} Θ_{y}^{j_{a} - 1, 0, j_{c} - 1} (t_{n}^{2}) + \frac{j_{c}}{2 c} (1 - \frac{α}{c} t_{n}^{2}) Θ_{y}^{j_{a}, 0, j_{c} - 2} (t_{n}^{2})] \\ & \times [\frac{α}{c} Z_{PC} t_{n}^{2} Θ_{z}^{k_{a}, 0, k_{c} - 1} (t_{n}^{2}) + \frac{k_{a} t_{n}^{2}}{2 (p + c)} Θ_{z}^{k_{a} - 1, 0, k_{c} - 1} (t_{n}^{2}) + \frac{k_{c}}{2 c} (1 - \frac{α}{c} t_{n}^{2}) Θ_{z}^{k_{a}, 0, k_{c} - 2} (t_{n}^{2})] . \end{matrix}

(A4)

Each of the terms arising by performing the multiplications amongst the brackets can contribute to an integral determined by the indices of the 2D integrals. For example, the term arising from multiplying the first terms of the brackets contributes to a scaled version of ${(l_{a}^{#} 0 | l_{c}^{#})}^{(m + 3)}$ with $l_{a}^{#} = (i_{a}, j_{a}, k_{a})$ and $l_{c}^{#} = (i_{c} - 1, j_{c} - 1, k_{c} - 1)$ through Eq. (A3), which is used in the expansion of (l_a0|L_c)^(m) by Eq. (12) if we go three steps back in the recursion. The terms containing the third 2D integral from one or more brackets in Eq. (A4) are used to build the ${(l_{a}^{#} 0 | l_{c}^{#})}^{(m + n)}$ classes with Eq. (A3) where $0 \leq n \leq 3$ (because the third 2D integral can be multiplied by a quantity that does or does not contain $t_{n}^{2}$ ), $l_{a} - 2 \leq l_{a}^{#} \leq l_{a}$ (because the product can contain a maximum of two of the second 2D integrals which each reduce $l_{a}^{#}$ by one), and $l_{c} - 6 \leq l_{c}^{#} \leq l_{c} - 4$ (because the first two 2D integrals reduce $l_{c}^{#}$ by one, while the third does so by two). Since none of these satisfy Eq. (A2), the contributions containing the third terms in the brackets in Eq. (A4) will be canceled during the solid harmonic transformation and can be taken to be zero, which means that Eq. (A1) reduces to Eq. (45). The same reasoning applies to the second term in Eq. (44) in the case of L_b = 0, when the third and fourth terms in Eq. (11) vanish.
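The counting argument above can be verified by brute force. The Python sketch below enumerates the 27 products obtained by picking one term from each bracket of Eq. (A4) and records, for each product, the total reduction of l_c and the possible powers of t_n^2 beyond t_n^{2m}. The bookkeeping (names `TERMS`, `contributions`, `kept`, `dropped`) is introduced here for illustration only.

```python
from itertools import product

# For each of the three terms in a bracket of Eq. (A4):
# (reduction of l_c, possible extra powers of t_n^2 carried by the coefficient)
TERMS = {
    1: (1, (1,)),    # (alpha/c) X_PC t^2 * Theta^{i_a, 0, i_c - 1}
    2: (1, (1,)),    # i_a t^2 / (2(p + c)) * Theta^{i_a - 1, 0, i_c - 1}
    3: (2, (0, 1)),  # i_c/(2c) (1 - (alpha/c) t^2) * Theta^{i_a, 0, i_c - 2}
}

def contributions(choice):
    """Yield (n, dl_c): extra power n of t^2 and total reduction of l_c."""
    dl_c = sum(TERMS[t][0] for t in choice)
    for powers in product(*(TERMS[t][1] for t in choice)):
        yield sum(powers), dl_c

kept, dropped = [], []
for choice in product((1, 2, 3), repeat=3):  # one term from each of x, y, z
    # Eq. (A2): a contribution survives only if l_c^# = l_c - n, i.e. dl_c == n.
    survives = any(n == dl_c for n, dl_c in contributions(choice))
    (kept if survives else dropped).append(choice)

print(len(kept), "products satisfy Eq. (A2);", len(dropped), "do not")
print("every dropped product contains a third term:", all(3 in c for c in dropped))
```

Every product containing at least one third term lowers l_c by at least four while supplying at most three powers of t_n^2, so it can never satisfy Eq. (A2); the eight products built only from the first and second terms do satisfy it.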

APPENDIX B: A RIGOROUS UPPER BOUND FOR PRIMITIVE THREE-CENTER ERIs

It is possible to construct an exact prescreening scheme for the primitive integrals based on the Schwarz inequality,

| (𝑳_{a} 𝑳_{b} | 𝑳_{c}) |^{2} \leq | (𝑳_{a} 𝑳_{b} | 𝑳_{a} 𝑳_{b}) | | (𝑳_{c} | 𝑳_{c}) |,

(B1)

by giving upper bounds to the integrals on the right-hand side of Eq. (B1). In fact, the exact value of (L_c|L_c) can be simply calculated by using Eq. (12) and noting that in this special case R_PC = 0, which gives

(𝑳_{c} | 𝑳_{c}) = \frac{L_{c}!}{{(4 c)}^{L_{c}}} {(𝟎 | 𝟎)}^{(L_{c})} = \frac{L_{c}!}{{(4 c)}^{L_{c}}} θ_{c c} \frac{1}{2 L_{c} + 1},

(B2)

where it was also exploited that F_n(0) = 1/(2n + 1).¹⁷ To gain an upper bound for |(L_aL_b|L_aL_b)|, we have to track back the recursions necessary to build up this integral. Let us first define the maximum absolute value component of R_AB as

m R_{AB} = max (| X_{AB} |, | Y_{AB} |, | Z_{AB} |) .

(B3)

Then, by Eq. (13), an upper bound for |(L_aL_b|L_aL_b)| is

| (𝑳_{a} 𝑳_{b} | 𝑳_{a} 𝑳_{b}) | \leq [\sum_{l_{b} = 0}^{L_{b}} (\begin{matrix} L_{b} \\ l_{b} \end{matrix}) m R_{AB}^{l_{b}}] M_{L_{a} L_{b} l_{b}} = U_{H R R} M_{L_{a} L_{b} l_{b}},

(B4)

where $M_{L_{a} L_{b} l_{b}}$ is a value that is greater than the absolute value of any of the integrals (L_aL_b|l_b0) for $L_{a} \leq l_{b} \leq L_{a} + L_{b}$ . Proceeding in the same manner for the bra side, we get

| (𝑳_{a} 𝑳_{b} | 𝑳_{a} 𝑳_{b}) | \leq U_{H R R}^{2} M_{l_{a} l_{b}},

(B5)

where, similarly, $M_{l_{a} l_{b}}$ is an upper bound for |(l_a0|l_b0)| with $L_{a} \leq l_{a} \leq L_{a} + L_{b}$ and $L_{a} \leq l_{b} \leq L_{a} + L_{b}$ . To get an upper bound for these types of integrals, we inspect the VRR for four-center ERIs¹⁹

{(𝒍_{a} 𝟎 | [𝒍_{b} + 𝟏_{x}] 𝟎)}^{(n)} = X_{PA} {(𝒍_{a} 𝟎 | 𝒍_{b} 𝟎)}^{(n)} + \frac{i_{b}}{2 p} {(𝒍_{a} 𝟎 | 𝒍_{b} - 𝟏_{x} 𝟎)}^{(n)} - \frac{i_{b}}{4 p} {(𝒍_{a} 𝟎 | 𝒍_{b} - 𝟏_{x} 𝟎)}^{(n + 1)} + \frac{i_{a}}{4 p} {(𝒍_{a} - 𝟏_{x} 𝟎 | 𝒍_{b} 𝟎)}^{(n + 1)},

(B6)

which can be used to expand (l_a0|l_b0) type ERIs in (l_a0|00) type ones. The highest number of terms in this expansion, N_VRR1, will belong to ([L_a + L_b]0|[L_a + L_b]0). We can then write

| (𝑳_{a} 𝑳_{b} | 𝑳_{a} 𝑳_{b}) | \leq U_{H R R}^{2} N_{V R R 1} U_{V R R} M_{l_{a}},

(B7)

where

U_{V R R} = max [m R_{PA}^{L_{a} + L_{b}}, {(\frac{L_{a} + L_{b}}{2 p})}^{⌊ \frac{L_{a} + L_{b}}{2} ⌉}, 1]

(B8)

is the biggest recursion coefficient that can occur, and $M_{l_{a}}$ is an upper bound for |(l_a0|00)| with $0 \leq l_{a} \leq L_{a} + L_{b}$ . N_VRR1 can be given as

N_{V R R 1} = \sum_{m = 0}^{⌊ \frac{L_{a} + L_{b}}{2} ⌉} (\begin{matrix} L_{a} + L_{b} - m \\ m \end{matrix}) 2^{L_{a} + L_{b} - m} .

(B9)

It only remains to give an appropriate value of $M_{l_{a}}$ , for which we use the VRR

{(𝒍_{a} + 𝟏_{x} 𝟎 | 𝟎𝟎)}^{(n)} = X_{PA} {(𝒍_{a} 𝟎 | 𝟎𝟎)}^{(n)} + \frac{i_{a}}{2 p} {(𝒍_{a} - 𝟏_{x} 𝟎 | 𝟎𝟎)}^{(n)} - \frac{i_{a}}{4 p} {(𝒍_{a} - 𝟏_{x} 𝟎 | 𝟎𝟎)}^{(n + 1)}

(B10)

to expand ([L_a + L_b]0|00) in N_VRR2 (00|00)⁽ⁿ⁾ type integrals, the greatest of which will be $κ_{a b}^{2} θ_{p p} F_{0} (0) = κ_{a b}^{2} θ_{p p}$ . We then get

| (𝑳_{a} 𝑳_{b} | 𝑳_{a} 𝑳_{b}) | \leq U_{H R R}^{2} N_{V R R 1} N_{V R R 2} U_{V R R}^{2} κ_{a b}^{2} θ_{p p}

(B11)

with

N_{V R R 2} = \sum_{m = 0}^{⌊ \frac{L_{a} + L_{b}}{2} ⌉} (\begin{matrix} L_{a} + L_{b} - m \\ m \end{matrix}) 2^{m} .

(B12)

Note that U_HRR depends only on the inter-nuclear distances in the bra and on L_b, that N_VRR1 and N_VRR2 depend only on L_a + L_b, and that $m R_{PA} = (b / p) m R_{AB}$. If desired, a bound for integrals over spherical harmonic Gaussians can be given by multiplying the screening value by (2L_a + 1) (2L_b + 1) (2L_c + 1) and the maximal coefficients in Eq. (5) for the three shells. In our experience, if we neglect this, the integrals that are falsely discarded have the same magnitude as the tolerance. Applying the scheme described above, roughly an extra 5% and 10% of the integrals are calculated with respect to the approaches presented in Sec. II F for the TZ and QZ bases, respectively, and the wall times increase by about 10%.
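The combinatorial factors of the bound are cheap to tabulate in advance. The following Python sketch evaluates Eqs. (B4), (B9), and (B12) as written; the function names are hypothetical. The upper summation limit is taken as the integer part of (L_a + L_b)/2, which is an assumption about the bracket notation, but the choice does not affect the sums because the binomial coefficient vanishes for m > (L_a + L_b)/2. The factor U_VRR of Eq. (B8) is left out since its exponent does depend on that convention.

```python
from math import comb

def u_hrr(l_b, m_r_ab):
    """Bracketed sum in Eq. (B4) for a given L_b and mR_AB."""
    return sum(comb(l_b, l) * m_r_ab ** l for l in range(l_b + 1))

def n_vrr1(l_sum):
    """Eq. (B9), with l_sum = L_a + L_b."""
    return sum(comb(l_sum - m, m) * 2 ** (l_sum - m) for m in range(l_sum // 2 + 1))

def n_vrr2(l_sum):
    """Eq. (B12), with l_sum = L_a + L_b."""
    return sum(comb(l_sum - m, m) * 2 ** m for m in range(l_sum // 2 + 1))

for l_sum in range(6):
    print(l_sum, n_vrr1(l_sum), n_vrr2(l_sum))

# By the binomial theorem the sum in Eq. (B4) equals (1 + mR_AB)^{L_b}.
print(abs(u_hrr(3, 0.7) - (1.0 + 0.7) ** 3) < 1e-12)
```

Since these quantities depend only on L_b, L_a + L_b, and the bra geometry, they can be evaluated outside the primitive loops, which keeps the cost of the screening test itself small.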

We note that an upper bound can also be derived directly for the (L_aL_b|L_c) integrals in a way similar to the one outlined here for (L_aL_b|L_aL_b), but the resulting scheme is less efficient due to the increased number of FLOPs and logical operations necessary inside the primitive loops.

REFERENCES

1. Boys S. F. and Shavitt I., University of Wisconsin Naval Research Laboratory Report No. WIS-AF-13, 1959.
2. Baerends E. J., Ellis D. E., and Ros P., Chem. Phys. 2, 41 (1973). doi:10.1016/0301-0104(73)80059-x
3. Whitten J. L., J. Chem. Phys. 58, 4496 (1973). doi:10.1063/1.1679012
4. Dunlap B. I., Connolly J. W. D., and Sabin J. R., J. Chem. Phys. 71, 3396 (1979). doi:10.1063/1.438728
5. Dunlap B. I., Phys. Chem. Chem. Phys. 2, 2113 (2000). doi:10.1039/b000027m
6. Almlöf J., Fægri K., Jr., and Korsell K., J. Comput. Chem. 3, 385 (1982). doi:10.1002/jcc.540030314
7. Häser M. and Ahlrichs R., J. Comput. Chem. 10, 104 (1989). doi:10.1002/jcc.540100111
8. Weigend F., Phys. Chem. Chem. Phys. 4, 4285 (2002). doi:10.1039/b204199p
9. Nagy P. R., Samu G., and Kállay M., J. Chem. Theory Comput. 12, 4897 (2016). doi:10.1021/acs.jctc.6b00732
10. King H. F. and Dupuis M., J. Comput. Phys. 21, 144 (1976). doi:10.1016/0021-9991(76)90008-5
11. Dupuis M., Rys J., and King H. F., J. Chem. Phys. 65, 111 (1976). doi:10.1063/1.432807
12. Rys J., Dupuis M., and King H. F., J. Comput. Chem. 4, 154 (1983). doi:10.1002/jcc.540040206
13. Komornicki A. and King H. F., J. Chem. Phys. 134, 244115 (2011). doi:10.1063/1.3600745
14. King H. F., J. Phys. Chem. A 120, 9348 (2016). doi:10.1021/acs.jpca.6b10004
15. Dupuis M. and Marquez A., J. Chem. Phys. 114, 2067 (2001). doi:10.1063/1.1336541
16. Dupuis M., Comput. Phys. Commun. 134, 150 (2001). doi:10.1016/s0010-4655(00)00195-8
17. Helgaker T., Jørgensen P., and Olsen J., Molecular Electronic Structure Theory (Wiley, Chichester, 2000).
18. McMurchie L. E. and Davidson E. R., J. Comput. Phys. 26, 218 (1978). doi:10.1016/0021-9991(78)90092-x
19. Obara S. and Saika A., J. Chem. Phys. 84, 3963 (1986). doi:10.1063/1.450106
20. Honda H., Yamaki T., and Obara S., J. Chem. Phys. 117, 1457 (2002). doi:10.1063/1.1485958
21. Head-Gordon M. and Pople J. A., J. Chem. Phys. 89, 5777 (1988). doi:10.1063/1.455553
22. Ryu U., Lee Y. S., and Lindh R., Chem. Phys. Lett. 185, 562 (1991). doi:10.1016/0009-2614(91)80260-5
23. Johnson B. G., Gill P. M. W., and Pople J. A., Chem. Phys. Lett. 206, 229 (1992). doi:10.1016/0009-2614(93)85546-z
24. Hamilton T. P. and Schaefer H. F. III, Chem. Phys. 150, 163 (1991). doi:10.1016/0301-0104(91)80126-3
25. Lindh R., Ryu U., and Liu B., J. Chem. Phys. 95, 5889 (1991). doi:10.1063/1.461610
26. Lindh R., Theor. Chim. Acta 85, 423 (1993). doi:10.1007/bf01112982
27. Gill P. M. W., Head-Gordon M., and Pople J. A., Int. J. Quantum Chem. 36, 269 (1989). doi:10.1002/qua.560360831
28. Johnson B. G., Gill P. M. W., and Pople J. A., Int. J. Quantum Chem. 40, 809 (1991). doi:10.1002/qua.560400610
29. Gill P. M. W., Johnson B. G., and Pople J. A., Chem. Phys. Lett. 217, 65 (1994). doi:10.1016/0009-2614(93)e1340-m
30. Johnson B. G., Gill P. M. W., Pople J. A., and Fox D. J., Chem. Phys. Lett. 206, 239 (1993). doi:10.1016/0009-2614(93)85547-2
31. Gill P. M. W., Head-Gordon M., and Pople J. A., J. Phys. Chem. 94, 5564 (1990). doi:10.1021/j100377a031
32. Gill P. M. W. and Pople J. A., Int. J. Quantum Chem. 40, 753 (1991). doi:10.1002/qua.560400605
33. Gill P. M. W. and Johnson B. G., Int. J. Quantum Chem. 40, 745 (1991). doi:10.1002/qua.560400604
34. Gill P. M. W., Adv. Quantum Chem. 25, 141 (1994). doi:10.1016/s0065-3276(08)60019-2
35. Köster A. M., J. Chem. Phys. 104, 4114 (1996). doi:10.1063/1.471224
36. Köster A. M., J. Chem. Phys. 118, 9943 (2003). doi:10.1063/1.1571519
37. Calaminici P., Domínguez-Soria V. D., Geudtner G., Hernández-Marín E., and Köster A. M., Theor. Chem. Acc. 115, 221 (2006). doi:10.1007/s00214-005-0005-0
38. Reine S., Tellgren E., and Helgaker T., Phys. Chem. Chem. Phys. 9, 4771 (2007). doi:10.1039/b705594c
39. Reine S., Helgaker T., and Lindh R., Wiley Interdiscip. Rev.: Comput. Mol. Sci. 2, 290 (2012). doi:10.1002/wcms.78
40. Ahlrichs R., Phys. Chem. Chem. Phys. 6, 5119 (2004). doi:10.1039/b413539c
41. Valeev E. F., Libint: A library for the evaluation of molecular integrals of many-body operators over Gaussian functions, http://libint.valeyev.net/.
42. Valeev E. F. and Janssen C. L., J. Chem. Phys. 121, 1214 (2004). doi:10.1063/1.1759319
43. Werner H.-J., Knizia G., and Manby F. R., Mol. Phys. 109, 407 (2011). doi:10.1080/00268976.2010.526641
44. Shao Y. and Head-Gordon M., Chem. Phys. Lett. 323, 425 (2000). doi:10.1016/s0009-2614(00)00524-8
45. Sodt A., Subotnik J. E., and Head-Gordon M., J. Chem. Phys. 125, 194109 (2006). doi:10.1063/1.2370949
46. Polly R., Werner H.-J., Manby F. R., and Knowles P. J., Mol. Phys. 102, 2311 (2004). doi:10.1080/0026897042000274801
47. Reine S., Tellgren E., Krapp A., Kjærgaard T., Helgaker T., Jansik B., Høst S., and Salek P., J. Chem. Phys. 129, 104101 (2008). doi:10.1063/1.2956507
48. Manzer S. F., Epifanovsky E., and Head-Gordon M., J. Chem. Theory Comput. 11, 518 (2014). doi:10.1021/ct5008586
49. Mejía-Rodríguez D. and Köster A. M., J. Chem. Phys. 141, 124114 (2014). doi:10.1063/1.4896199
50. Mejía-Rodríguez D., Huang X., del Campo J. M., and Köster A. M., Adv. Quantum Chem. 71, 41 (2015). doi:10.1016/bs.aiq.2015.03.009
51. Köppl C. and Werner H.-J., J. Chem. Theory Comput. 12, 3122 (2016). doi:10.1021/acs.jctc.6b00251
52. Sierka M., Hogekamp A., and Ahlrichs R., J. Chem. Phys. 118, 9136 (2003). doi:10.1063/1.1567253
53. Alvarez-Ibarra A. and Köster A. M., J. Chem. Phys. 139, 024102 (2013). doi:10.1063/1.4812183
54. Alvarez-Ibarra A. and Köster A. M., Mol. Phys. 113, 3128 (2015). doi:10.1080/00268976.2015.1078009
55. Ishida K., J. Chem. Phys. 98, 2176 (1993). doi:10.1063/1.464196
56. Flocke N. and Lotrich V., J. Comput. Chem. 29, 2722 (2008). doi:10.1002/jcc.21018
57. Dunning T. H., Jr., J. Chem. Phys. 90, 1007 (1989). doi:10.1063/1.456153
58. Woon D. E. and Dunning T. H., Jr., J. Chem. Phys. 98, 1358 (1993). doi:10.1063/1.464303
59. Weigend F., Köhn A., and Hättig C., J. Chem. Phys. 116, 3175 (2002). doi:10.1063/1.1445115
60. Hollman D. S., Schaefer H. F. III, and Valeev E. F., J. Chem. Phys. 142, 154106 (2015). doi:10.1063/1.4917519
61. MRCC, a quantum chemical program suite written by Kállay M., Rolik Z., Csontos J., Ladjánszki I., Szegedy L., Ladóczki B., Samu G., Petrov K., Farkas M., Nagy P., Mester D., and Hégely B., see also Ref. 71 as well as http://www.mrcc.hu/.
62. Neese F., Hansen A., and Liakos D. G., J. Chem. Phys. 131, 064103 (2009). doi:10.1063/1.3173827
63. Helgaker T., Gauss J., Jørgensen P., and Olsen J., J. Chem. Phys. 106, 6430 (1997). doi:10.1063/1.473634
64. Weidendorfer J., Kowarschik M., and Trinitis C., in Proceedings of the 4th International Conference on Computational Science (ICCS 2004), Krakow, Poland, 2004.
65. Wheeler J. C., Rocky Mt. J. Math. 4, 287 (1974). doi:10.1216/rmj-1974-4-2-287
66. Golub H. and Welsch J. H., Math. Comput. 23, 221 (1969). doi:10.1090/s0025-5718-69-99647-1
67. Doser B., Lambrecht D. S., Kussmann J., and Ochsenfeld C., J. Chem. Phys. 130, 064107 (2009). doi:10.1063/1.3072903
68. Schütz M., Hetzer G., and Werner H.-J., J. Chem. Phys. 111, 5691 (1999). doi:10.1063/1.479957
69. Eshuis H., Yarkony J., and Furche F., J. Chem. Phys. 132, 234114 (2010). doi:10.1063/1.3442749
70. Hári J., Polyák P., Mester D., Mitušík M., Omastová M., Kállay M., and Pukánszky B., Appl. Clay Sci. 132, 167 (2016). doi:10.1016/j.clay.2016.06.001
71. Rolik Z., Szegedy L., Ladjánszki I., Ladóczki B., and Kállay M., J. Chem. Phys. 139, 094105 (2013). doi:10.1063/1.4819401
