The Journal of Chemical Physics. 2017 Jun 1;146(21):214106. doi: 10.1063/1.4984322

Optimization of the linear-scaling local natural orbital CCSD(T) method: Redundancy-free triples correction using Laplace transform

Péter R Nagy 1,a), Mihály Kállay 1,b)
PMCID: PMC5453808  PMID: 28576082

Abstract

An improved algorithm is presented for the evaluation of the (T) correction as a part of our local natural orbital (LNO) coupled-cluster singles and doubles with perturbative triples [LNO-CCSD(T)] scheme [Z. Rolik et al., J. Chem. Phys. 139, 094105 (2013)]. The new algorithm is an order of magnitude faster than our previous one and removes the bottleneck related to the calculation of the (T) contribution. First, a numerical Laplace transformed expression for the (T) fragment energy is introduced, which requires on average 3 to 4 times fewer floating point operations with negligible compromise in accuracy eliminating the redundancy among the evaluated triples amplitudes. Second, an additional speedup factor of 3 is achieved by the optimization of our canonical (T) algorithm, which is also executed in the local case. These developments can also be integrated into canonical as well as alternative fragmentation-based local CCSD(T) approaches with minor modifications. As it is demonstrated by our benchmark calculations, the evaluation of the new Laplace transformed (T) correction can always be performed if the preceding CCSD iterations are feasible, and the new scheme enables the computation of LNO-CCSD(T) correlation energies with at least triple-zeta quality basis sets for realistic three-dimensional molecules with more than 600 atoms and 12 000 basis functions in a matter of days on a single processor.

I. INTRODUCTION

Vast knowledge is available about which effects need to be taken into account for accurate modeling of molecular systems, on the basis of which our models are constantly improved aiming at better accuracy and lower computational cost. Density functional theory (DFT) is a cost-efficient and frequently applied compromise, but there is a great need for alternative methods with which chemical accuracy can be systematically approached. These methods are not only useful for the accurate calculation of molecular properties but also enable benchmark calculations to assess the performance of DFT or other reduced cost models on realistic sizable chemical problems. Wave function based approaches, especially the coupled cluster (CC) hierarchy of methods,1 combine many desirable features for that purpose, such as proven convergence to the exact energy, size-extensivity, and extensions to molecular properties and excited states;2,3 however, the steep scaling and large atomic orbital (AO) basis set requirement of these models limit their use to systems with a few dozen atoms. The CC model with single and double (CCSD) excitations extended with perturbative triples [CCSD(T)]4 correction, considered as one of the most cost-effective and yet highly accurate members of the CC hierarchy,5 still scales as $O(N^7)$ with $N$ characterizing the system size, and hence in its canonical formulation using at least triple-ζ quality AO basis sets CCSD(T) can only be applied to small systems with up to 20–30 atoms.

This limit can only slightly be increased by utilizing highly optimized canonical CCSD and CCSD(T) implementations, exploiting, e.g., parallelization6–14 or graphical processing units,14–17 or by relying on small AO basis sets and estimating the effects of the basis set incompleteness with more cost-effective methods.18–23 Compression of the single particle basis utilizing orbital transformation techniques, such as optimized virtual orbitals24,25 or natural orbitals (NOs)26,27 obtained from correlated wave functions, is also a powerful cost reduction approach, which is frequently employed in both canonical28–33 and local34–37 CC calculations.

The $O(N^7)$ scaling of CCSD(T) can, in principle, be brought down to $O(N^6)$ by the factorization of the energy denominator using either Cholesky-decomposition38,39 or Laplace transform (LT),40,41 the latter also playing the central role in the developments presented here. Scaling reduction ideas on the basis of denominator factorization, introduced first by Almlöf and Häser,42–44 have also been successfully employed in multiple areas of quantum chemistry. Reduced or even asymptotically linear scaling second-order Møller–Plesset (MP2) implementations were reported by Ayala and Scuseria45,46 and later by Ochsenfeld and co-workers47–49 utilizing the distance decay of various contributions within the AO basis. Kobayashi and Nakai50 implemented the LT based MP2 energy expression of Surján,51 which is expressed in terms of the Hartree–Fock (HF) density matrix. LT plays a pivotal role in reduced scaling scaled-opposite-spin MP2,52,53 second-order CC (CC2),54 periodic MP2,55 and random phase approximation56 implementations.

The orbital invariant property of the denominator-free Laplace transformed perturbation energy expressions can be exploited in not only the AO but in localized molecular orbital (LMO) basis as well, which is a key idea in the present and many previous local correlation schemes.36,57–65 The short-range character of the electron correlation was first exploited in the pioneering work of Pulay66–69 via the use of LMOs and local approximations. For each occupied LMO pair, a domain of spatially close projected atomic orbitals (PAOs) was assembled, which identified the most important excitations required for correlating the corresponding electrons. At the same time, Förner and co-workers70 developed a CC doubles method using local orbitals and limited excitations occurring only within a subunit of the entire system. On the basis of these foundations, two groups of local correlation methods were developed. Fragmentation approaches decompose the system into parts of manageable size and obtain the correlation energy as the sum of fragment (and interfragment interaction) contributions. Methods in the second class, also referred to as direct methods, avoid fragmentation and introduce local approximations into the equations corresponding to the entire system.

Building on the ideas of Pulay, Werner, Schütz, and co-workers contributed significantly to the development of the direct type of local correlation methods71–79 by introducing local density fitting,80–85 explicit correlation,83,86–88 and extensions to CC methods up to CCSD(T)74,75,82 and beyond,89 as well as to molecular properties84,85,90–92 and excited states.78,79,93 However, the large size of the domains composed of PAOs soon became a limiting factor, which was lifted via the introduction of MP2 NOs to compress the virtual subspace. Leading the way in this respect, Neese and co-workers utilized the benefits of pair natural orbitals (PNOs),35,94–96 which were later taken over also by the Werner37,88,97,98 and Hättig36,99–101 groups. The success of the PNO based methods motivated the development of other MP2 NO based correlation orbitals, such as the orbital specific virtual (OSV) orbitals102–105 of Manby and co-workers or the local NOs (LNOs) proposed by us.34,106

The fragmentation-based class of approaches also collects a wide variety of methods, many of which have been developed up to the CC level.70,107–113 CCSD(T) implementations are available for the incremental method proposed by Stoll114,115 and significantly extended by Friedrich,109,116 the divide-and-conquer (DC) method of Li and Li117 and Kobayashi and Nakai,110,118 the divide-expand-consolidate (DEC) approach of Jørgensen, Kristensen, Kjærgaard, and their co-workers,112,119,120 and the cluster-in-molecule (CIM) approach developed by Li, Piecuch, Gordon, and their co-workers.113,121,122 Fragmentation-based strategies offer a straightforward route towards not only linear scaling computation time or efficient parallelism but also asymptotically constant memory and, if needed at all, disk requirement. Additionally, highly optimized canonical algorithms and implementations can be taken advantage of with minor modifications. On the other hand, sufficient accuracy can only be achieved with large, overlapping fragments. The resulting redundancy poses a challenge for such CC implementations.

In our previous studies, we decreased the redundancy of the fragment construction using an ansatz that combines the merits of the CIM and the incremental schemes.63,64 The prefactor of our fragment CC calculations has also been drastically reduced by transforming the fragment MOs to the LNO basis.34 The combination of these improvements led to an efficient and accurate local CCSD(T) implementation,123 in which, however, the calculation of the fragment (T) energies took about 60%–70% of the total computation time due to the remaining overlap of the domains used in the fragment CC calculation. A similar overlap issue has been resolved in our local MP2 approaches,63,64 where the redundancy in the MP2 amplitude evaluation step has been completely removed by switching to a Laplace transformed energy formula. We generalize this idea in the present contribution and eliminate the redundancy in the triples amplitude evaluation by introducing a Laplace transformed (T) [LT (T)] fragment energy expression. In other words, the use of Laplace transformed (T) energy expression allows us to evaluate each triples amplitude of the entire molecule at most once, even if the corresponding orbital domains are overlapping.

This paper is organized as follows. Sections II A and II B introduce the theory of the canonical (T) correction with a slightly improved algorithm that might be useful in its own right for specific applications. Sections II C and II D summarize our latest fragment construction scheme and the previous (T) fragment energy calculation, while we discuss the new LT (T) energy expression and algorithm in Secs. II E–II G. Results of benchmark calculations for numerical accuracy and efficiency of the LT approximation are collected in Sec. III.

II. THEORY

In this paper, we will suppose a closed-shell reference determinant composed of spatial orbitals. In the following theoretical considerations, several different orbital sets will appear, for which the indexing notations are collected in Table I. First, we will consider the (T) correction of the conventional CCSD(T) energy computed in a canonical HF basis. Then we will focus on the (T) term of our local LNO-CCSD(T) ansatz,34 which was previously obtained using a local subset of quasi-canonicalized MP2 natural orbitals (LNOs). Indices $i, j, k, \ldots$ ($a, b, c, \ldots$) will refer to the occupied (virtual) (quasi-)canonical orbitals for both the canonical and local approaches, and $n_o$ ($n_v$) will denote the dimension of the corresponding occupied (virtual) subspace.

TABLE I.

Summary of index notations. See Sec. II for the definition of various orbital sets.

$i, j, k, l, \ldots$    (Quasi-)canonical occupied orbitals
$a, b, c, d, \ldots$    (Quasi-)canonical virtual orbitals
$P, Q, \ldots$    (Natural) auxiliary functions for the DF approximation
$i', j', k', \ldots$    Localized occupied molecular orbitals (LMOs)
$I, J, K, \ldots$    Semi-canonical Gram–Schmidt–Löwdin (scGSL) basis
$\bar i, \ldots, \bar a, \ldots$    Orbital multiplied with the corresponding Laplace factor
$\tilde i, \ldots, \tilde a, \ldots$    Orbital divided by the corresponding Laplace factor

A. Canonical (T) correlation energy

The closed shell (T) correction can be written in the canonical orbital basis as4,124

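$$E^{(\mathrm{T})} = \frac{1}{3}\sum_{ijk}\sum_{abc}\frac{\left(4W_{ijk}^{abc}+W_{ijk}^{bca}+W_{ijk}^{cab}\right)\left(V_{ijk}^{abc}-V_{ijk}^{cba}\right)}{D_{ijk}^{abc}}, \tag{1}$$

where $D_{ijk}^{abc} = f_{ii} + f_{jj} + f_{kk} - f_{aa} - f_{bb} - f_{cc}$ is the usual energy denominator, with $f_{ii}$ denoting the diagonal elements of the Fock matrix, and the intermediates $W_{ijk}^{abc}$ and $V_{ijk}^{abc}$ are defined as

$$W_{ijk}^{abc} = P_{ijk}^{abc}\left[\sum_{d}(bd|ai)\,t_{kj}^{cd} - \sum_{l}(ck|jl)\,t_{il}^{ab}\right] \tag{2}$$

and

$$V_{ijk}^{abc} = W_{ijk}^{abc} + (bj|ck)\,t_{i}^{a} + (ai|ck)\,t_{j}^{b} + (ai|bj)\,t_{k}^{c}. \tag{3}$$

Here $P_{ijk}^{abc}$ permutes the indices of the above tensors as

$$P_{ijk}^{abc}\binom{abc}{ijk} = \binom{abc}{ijk}+\binom{acb}{ikj}+\binom{cab}{kij}+\binom{cba}{kji}+\binom{bca}{jki}+\binom{bac}{jik}. \tag{4}$$

Additionally, $(pq|rs)$ denotes a two-electron integral using the Mulliken notation for arbitrary orbitals $p, q, \ldots$, and $t_{i}^{a}$ and $t_{ij}^{ab}$ are the cluster amplitudes for single and double excitations, respectively.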

The sixfold permutational symmetry of the W and V tensors can be utilized in two alternative ways.124–126 In the “ijkabc” algorithm,125,126 the outermost loops run over the occupied indices (see, e.g., in Algorithm 1), and the W and V tensors are evaluated only for a single permutation of each occupied index triplet, say ijk. The corresponding energy expression with the appropriate index restrictions reads as

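$$E^{(\mathrm{T})} = 2\sum_{i\ge j\ge k}\;\sum_{a\ge b\ge c}\frac{\left(Y_{abc}^{ijk}-2Z_{abc}^{ijk}\right)\left(W_{abc}^{ijk}+W_{bca}^{ijk}+W_{cab}^{ijk}\right)+\left(Z_{abc}^{ijk}-2Y_{abc}^{ijk}\right)\left(W_{acb}^{ijk}+W_{bac}^{ijk}+W_{cba}^{ijk}\right)+3X_{abc}^{ijk}}{D_{abc}^{ijk}\left(1+\delta_{ij}+\delta_{jk}\right)\left(1+\delta_{ab}+\delta_{bc}\right)}, \tag{5}$$

with

$$X_{abc}^{ijk} = W_{abc}^{ijk}V_{abc}^{ijk}+W_{bca}^{ijk}V_{bca}^{ijk}+W_{cab}^{ijk}V_{cab}^{ijk}+W_{acb}^{ijk}V_{acb}^{ijk}+W_{bac}^{ijk}V_{bac}^{ijk}+W_{cba}^{ijk}V_{cba}^{ijk}, \tag{6}$$
$$Y_{abc}^{ijk} = V_{abc}^{ijk}+V_{bca}^{ijk}+V_{cab}^{ijk}, \tag{7}$$
$$Z_{abc}^{ijk} = V_{acb}^{ijk}+V_{bac}^{ijk}+V_{cba}^{ijk}. \tag{8}$$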
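The equivalence of the unrestricted sum of Eq. (1) and the index-restricted form of Eq. (5) can be checked numerically. The following is a minimal sketch, not the authors' implementation: it assumes random model tensors W and V that merely share the sixfold permutational symmetry, hypothetical orbital energies, ordered triples $i \ge j \ge k$ and $a \ge b \ge c$, and degeneracy weights $1/(1+\delta_{ij}+\delta_{jk})$ and $1/(1+\delta_{ab}+\delta_{bc})$. All function and variable names are illustrative.

```python
import itertools
import numpy as np

def six_fold_symmetrize(x):
    """Enforce X[ijk,abc] = X[ikj,acb] = ... by summing simultaneous permutations."""
    out = np.zeros_like(x)
    for p in itertools.permutations(range(3)):
        out += np.transpose(x, p + tuple(3 + q for q in p))
    return out

def perm(x, order):
    """Return y with y[a,b,c] = x[order], e.g. order='bca' gives x[b,c,a]."""
    return np.einsum(f"{order}->abc", x)

def energy_eq1(W, V, D):
    """Unrestricted sum over all i,j,k and a,b,c as in Eq. (1)."""
    lhs = 4 * W + np.einsum("ijkbca->ijkabc", W) + np.einsum("ijkcab->ijkabc", W)
    rhs = V - np.einsum("ijkcba->ijkabc", V)
    return np.sum(lhs * rhs / D) / 3.0

def energy_eq5(W, V, D):
    """Restricted sum over i >= j >= k and a >= b >= c with degeneracy weights."""
    no, nv = W.shape[0], W.shape[3]
    a, b, c = np.meshgrid(*(np.arange(nv),) * 3, indexing="ij")
    wt_abc = ((a >= b) & (b >= c)) / (1.0 + (a == b) + (b == c))
    orders = ("abc", "bca", "cab", "acb", "bac", "cba")
    energy = 0.0
    for i in range(no):
        for j in range(i + 1):
            for k in range(j + 1):
                w, v = W[i, j, k], V[i, j, k]
                w_even = sum(perm(w, o) for o in orders[:3])
                w_odd = sum(perm(w, o) for o in orders[3:])
                Y = sum(perm(v, o) for o in orders[:3])
                Z = sum(perm(v, o) for o in orders[3:])
                X = sum(perm(w, o) * perm(v, o) for o in orders)
                bracket = (Y - 2 * Z) * w_even + (Z - 2 * Y) * w_odd + 3 * X
                wt_ijk = 1.0 / (1 + (i == j) + (j == k))
                energy += 2.0 * wt_ijk * np.sum(wt_abc * bracket / D[i, j, k])
    return energy

rng = np.random.default_rng(0)
no, nv = 3, 4  # tiny, hypothetical dimensions
W = six_fold_symmetrize(rng.standard_normal((no,) * 3 + (nv,) * 3))
V = six_fold_symmetrize(rng.standard_normal((no,) * 3 + (nv,) * 3))
f_occ = -rng.uniform(0.5, 1.0, no)   # hypothetical occupied orbital energies
f_virt = rng.uniform(0.5, 1.5, nv)   # hypothetical virtual orbital energies
occ = f_occ[:, None, None] + f_occ[None, :, None] + f_occ[None, None, :]
vir = f_virt[:, None, None] + f_virt[None, :, None] + f_virt[None, None, :]
D = occ[:, :, :, None, None, None] - vir[None, None, None, :, :, :]

e_full = energy_eq1(W, V, D)
e_restricted = energy_eq5(W, V, D)
print(e_full, e_restricted)
```

The restricted form touches each occupied triplet only once, which is the property the "ijkabc" algorithm relies on; the explicit Python loops are for clarity only.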

Alternatively, in the “abcijk” algorithm,124,126 the W and V tensors are constructed for all occupied index triplets and only a single permutation of each virtual index triplet, say abc. More details are given in Appendix A.

Algorithm 1.

“1permutation-ijkabc” algorithm.

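if (1permutation) $R^{ij}_{ba} = T^{ij}_{ab}$
for $k = 1, n_o$
  if (2permutation) $R^{ik}_{ba} = T^{ik}_{ab}$
  for $j \ge k$
    for $i \ge j$
      $w_{abc} = T^{ik}_{a,d}\,I^{j}_{d,bc} + I^{j}_{ab,d}\,T^{ik}_{d,c} + T^{ij}_{a,d}\,(I^{k}_{bc,d})^{\top}$
      $v_{cab} = (I^{k}_{d,ca})^{\top}\,T^{ij}_{d,b} + T^{kj}_{c,d}\,I^{i}_{d,ab} + I^{i}_{ca,d}\,T^{kj}_{d,b} - T^{i}_{ca,l}\,(I^{jk}_{b,l})^{\top} - R^{k}_{ca,l}\,(I^{ji}_{b,l})^{\top}$
      $w_{abc} \leftarrow w_{abc} + v_{cab}$
      if (1permutation) $w_{abc} \leftarrow w_{abc} - R^{i}_{ab,l}\,(I^{kj}_{c,l})^{\top} - I^{ik}_{a,l}\,(R^{j}_{bc,l})^{\top}$
      if (2permutation) $v_{cba} = -I^{jk}_{c,l}\,(T^{i}_{ba,l})^{\top} - T^{j}_{cb,l}\,(I^{ik}_{a,l})^{\top}$; $w_{abc} \leftarrow w_{abc} + v_{cba}$
      $w_{abc} \leftarrow w_{abc} - T^{j}_{ab,l}\,(I^{ki}_{c,l})^{\top} - I^{ij}_{a,l}\,(T^{k}_{bc,l})^{\top}$
      $v_{abc} = w_{abc} + I^{jk}_{bc}\,T^{i}_{a} + I^{ik}_{ac}\,T^{j}_{b} + I^{ij}_{ab}\,T^{k}_{c}$
      Calculate energy contribution according to Eq. (5)
    end for
  end for
end for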

B. Canonical (T) algorithm

Tremendous effort has been devoted to the development of better algorithms for the evaluation of the canonical (T) correction. The majority of these studies aim for systems with an increasingly large number of orbitals and focus on efficient parallelization of the conventional algorithms.7–12,16,17 A markedly different situation is relevant, however, for the case of the present LNO-CCSD(T) scheme. Namely, we need to evaluate the (T) expressions with a relatively small number of orbitals ($n_o = 20$–$40$ and $n_v = 100$–$200$ on average) many times, for each orbital domain of the entire system. Additionally, for such domain calculations, the usual assumptions (that is, $n_v \gg n_o$, and that the double precision general matrix-matrix multiplication (dgemm) operations required in Eq. (2) are by far the most time consuming) are not fulfilled.

Therefore we optimized our existing (T) implementation34,127 for the above MO basis sizes relying on the “ijkabc” algorithms available in the literature.10,16,31,124 A common characteristic of these implementations is that the two types of terms of Eq. (2) are accumulated into a three-index array for a given index triplet ijk at a time via efficient matrix multiplications, e.g., as

$$w_{bac} \leftarrow I^{i}_{ba,d}\,(T^{kj}_{c,d})^{\top} - I^{jk}_{b,l}\,(T^{i}_{ac,l})^{\top}, \tag{9}$$

where, just as in the following expressions using array notations, summation over repeated indices is assumed. Array I stores the integral list according to the Dirac notation, i.e., $I^{i}_{ba,d} = \langle ba|di\rangle$, and the CCSD doubles amplitudes are kept in array T, e.g., $T^{kj}_{c,d} = t_{kj}^{cd}$. Superscripts used for the array quantities denote fixed indices for a given $ijk$ triplet, and transpose operations interchange the (hyper-)index couples separated by a comma, e.g., $(T^{i}_{ac,l})^{\top} = T^{i}_{l,ac}$. The remaining five permutations are evaluated analogously, e.g., as

$$w_{cab} \leftarrow (I^{k}_{d,ca})^{\top}\,(T^{ji}_{b,d})^{\top} - I^{kj}_{c,l}\,(T^{i}_{ab,l})^{\top}. \tag{10}$$

Before each of these five additional contributions can be accumulated into w, its index order has to be permuted according to the corresponding term of $P_{ijk}^{abc}$; e.g., array w, having $bac$ index order after the evaluation of Eq. (9), is rearranged to have $cab$ index order before the $w_{cab}$ contribution of Eq. (10) can be added to it in a vectorized way. Alternatively, the contributions from each of the five permutations can be collected into a separate array, say v, and then v can be added to w with an appropriate index order using the daxpy (double precision ax + y) operation, e.g., for the above terms in Eqs. (9) and (10): $w_{bac} \leftarrow w_{bac} + v_{cab}$. We will refer to this solution as the “5permutation” algorithm. (Here we restrict our discussion to Refs. 10, 16, 31, and 124 because we were not able to find sufficiently detailed documentation for the rest of the presently available (T) implementations. For this reason, probably, realizations in other program packages that are not designed for multi-node parallelism7,126 should not differ significantly from the above “5permutation” algorithm.)
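The two kinds of operations discussed above, the matrix multiplications of Eq. (9) and the permuted accumulation of a second array, can be sketched with NumPy as follows. This is a minimal illustration with random stand-in arrays for one fixed $ijk$ triplet, not the production code: the array names are hypothetical, the second term is taken with the minus sign of Eq. (2), and the reshape/transpose calls play the role of the hyper-index handling and of the permutation/daxpy step.

```python
import numpy as np

rng = np.random.default_rng(1)
no, nv = 3, 5  # tiny, hypothetical dimensions

# Stand-in arrays for one fixed (i, j, k) triplet; names are illustrative only.
I_ba_d = rng.standard_normal((nv, nv, nv))  # I^{i}_{ba,d}
T_c_d = rng.standard_normal((nv, nv))       # T^{kj}_{c,d}
I_b_l = rng.standard_normal((nv, no))       # I^{jk}_{b,l}
T_ac_l = rng.standard_normal((nv, nv, no))  # T^{i}_{ac,l}

# Eq. (9) as two matrix-matrix multiplications (dgemm-like steps).
# First term: (ba) x d times d x c gives an array with index order b, a, c.
w_bac = (I_ba_d.reshape(nv * nv, nv) @ T_c_d.T).reshape(nv, nv, nv)
# Second term: b x l times l x (ac), subtracted following the sign of Eq. (2).
w_bac -= (I_b_l @ T_ac_l.reshape(nv * nv, no).T).reshape(nv, nv, nv)

# Reference contraction written index by index.
w_ref = (np.einsum("bad,cd->bac", I_ba_d, T_c_d)
         - np.einsum("bl,acl->bac", I_b_l, T_ac_l))

# Permuted accumulation: an array stored in c, a, b order is added to the
# array stored in b, a, c order (the permutation/daxpy step of the text).
v_cab = rng.standard_normal((nv, nv, nv))
w_total = w_bac + v_cab.transpose(2, 1, 0)  # element [b,a,c] receives v[c,a,b]
```

The permuted addition is a strided, memory-bound operation, whereas the two contractions map onto dense matrix products; this difference is what the timing analysis in the text is concerned with.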

We analyzed the wall times measured on a 6-core processor (3.5 GHz Intel Xeon E5-1650) required for each step of our “5permutation-ijkabc” code using OpenMP parallel Basic Linear Algebra Subprogram (BLAS) routines and recognized that the relatively inefficient permutations/daxpy operations take time comparable to the more demanding dgemm calls, in spite of the fact that $f = \frac{6}{5}(n_v + n_o)$ times more operations are needed for the latter. Since vector level operations, such as daxpy, cannot utilize the faster, lower level memory cache as efficiently as dgemm, they often exhibit more than an order of magnitude smaller floating point operations rate.10 Moreover, due to this memory access bottleneck, these vector level operations can hardly be accelerated by using multiple cores of the same processor, while the dgemm operation scales almost perfectly with the number of cores (given that the matrix sizes are sufficiently large). As a consequence of these two factors, the dgemm operations are by a factor of 100 to 200 more efficient, which, taking into account the $f \approx 100$–$200$ operation count ratio of the dgemm and permutation/daxpy calls characteristic for our average case, explains our measurements.

In order to optimize our implementation for these moderate numbers of orbitals and improve the efficiency of the OpenMP parallelization, we designed an alternative algorithm that requires only one permutation/daxpy call for each $ijk$ index triplet. To this end, the permutational symmetry of the Coulomb integrals (e.g., $\langle ba|di\rangle = \langle da|bi\rangle$) and of the doubles amplitudes ($t_{ij}^{ab} = t_{ji}^{ba}$) are exploited. The resulting “1permutation-ijkabc” algorithm is presented in Algorithm 1, while the corresponding “1permutation-abcijk” algorithm is shown in Appendix A. Note that the “1permutation-ijkabc” approach requires the storage of an additional array (R) of the size of the doubles amplitude matrix, which does not pose any problem for such a moderate number of orbitals. Nevertheless, if necessary, the memory requirement can be kept at the level of the original “5permutation” case at the cost of an additional permutation/daxpy call (see the “2permutation” alternative in Algorithm 1).

We measure around 20%–25% improvement in wall times (using either 1 or 6 cores) when switching to the “1permutation” algorithm (for $n_o = 20$–$50$ and $n_v \approx 5n_o$). Even larger speedup is expected with more OpenMP threads, but the relative gain is probably smaller for a larger number of orbitals. Therefore the new algorithm could be most suitable for similar, fragmentation-based methods, such as the alternative CIM implementations,113,121 the DEC scheme,112 the DC,110 or the incremental method.109 Moreover, canonical CCSD(T) computations with compressed virtual subspace and smaller $n_v/n_o$ ratios or calculations carried out using many OpenMP threads could also benefit from the cost reduction offered by Algorithm 1.

Additional optimization was performed on our previous preliminary OpenMP parallel implementation, namely, the parallelization of the energy expression and the W and V tensor evaluation [see Eqs. (2)–(5)] was improved. These optimizations together with the algorithmic improvements described above result in a speedup factor of about 3 on 6 cores if the new “1permutation” code is compared to our previous “5permutation” one.

C. Construction of the orbital domains

In our presently applied fragmentation scheme,34,63,64,106,123 which is motivated by the CIM approach113,121,128 and the incremental method,115,116 the correlation energy expression is given as the sum of contributions from individual localized occupied orbitals (fragment energies) and the interaction energy thereof. The (T) correlation energy term of the LNO-CCSD(T) energy reads as34

$$E^{(\mathrm{T})} = \sum_{k'}\delta E_{k'}^{(\mathrm{T})}, \tag{11}$$

where $\delta E_{k'}^{(\mathrm{T})}$, the contribution of occupied LMO $k'$, is evaluated in LMO-specific orbital domains (local interaction subspaces, LISs, denoted as $\mathcal{P}_{k'}$). In order to arrive at a LIS, first, extended domains (EDs), $\mathcal{E}_{k'}$, are constructed for each LMO $k'$. The procedure is discussed in detail in Ref. 64. In brief, each LMO is selected as the central LMO of its ED and becomes the first occupied MO of its ED with index 1. Then a list of strong pair LMOs is determined based on approximate MP2 pair correlation energies, and each LMO that forms a strong pair with LMO $k'$ is also added to $\mathcal{E}_{k'}$. The virtual and auxiliary basis sets of the EDs are constituted of those PAOs and auxiliary functions that are required to compute accurate MP2 amplitudes in the EDs. Next, using these MP2 amplitudes, the occupied-occupied and virtual-virtual blocks of MP2 fragment density matrices are constructed and diagonalized, yielding a set of occupied and virtual LNOs in each ED.34,63 To save computational resources, only LNOs having occupation numbers above a certain threshold are included in the LISs and carried forward to the CC calculation. Finally, the LNOs kept are quasi-canonicalized, and the integral lists are transformed to this quasi-canonical LNO basis. The conventional auxiliary basis of the ED is also compressed to a much smaller, LIS-specific fitting set that consists of the so-called natural auxiliary functions (NAFs).129 The NAF basis is obtained from a partial singular value decomposition of the three-center density fitting integrals of the LIS.63,64,129

With the necessary integral lists transformed to the LNO and NAF bases at hand, CCSD fragment energies and amplitudes are obtained in each LIS using a conventional CCSD implementation.34

D. (T) fragment energy expression

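The (T) fragment energy contribution of LMO $k'$ is expressed with the quasi-canonical LNOs of $\mathcal{P}_{k'}$ as34

$$\delta E_{k'}^{(\mathrm{T})} = \frac{1}{3}\sum_{ij\in\mathcal{P}_{k'}}\;\sum_{abc\in\mathcal{P}_{k'}}\left(4w_{ijk'}^{abc}+w_{ijk'}^{bca}+w_{ijk'}^{cab}\right)\left(v_{ijk'}^{abc}-v_{ijk'}^{cba}\right). \tag{12}$$

To arrive at the above form, first, the energy denominators are absorbed into the W and V tensors as

$$w_{ijk}^{abc} = \frac{W_{ijk}^{abc}}{\sqrt{D_{ijk}^{abc}}} \tag{13}$$

and

$$v_{ijk}^{abc} = \frac{V_{ijk}^{abc}}{\sqrt{D_{ijk}^{abc}}}, \tag{14}$$

and then the resulting formally orbital invariant expression allows the required transformation of one of the occupied indices to LMO $k'$ as

$$w_{ijk'}^{abc} = \sum_{l\in\mathcal{P}_{k'}} w_{ijl}^{abc}\,U_{lk'}. \tag{15}$$

Matrix U transforms the domain specific quasi-canonical MOs of LIS $\mathcal{P}_{k'}$ to the LMO basis.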

Note that the full sixfold permutational symmetry of W and V is reduced to a twofold symmetry due to the partial index transformation. It was shown in our previous work34 that the full sixfold symmetry of $W_{ijk}^{abc}$ and $V_{ijk}^{abc}$ and the remaining twofold symmetry of $w_{ijk'}^{abc}$ and $v_{ijk'}^{abc}$ can still be utilized in a slightly modified “abcijk”-type algorithm. In brief, an energy expression analogous to Eqs. (5) and (A1) was derived [see Eq. (34) of Ref. 34] utilizing the above symmetries, which depends on all independent types of the above partially transformed tensors, that is, on $w_{i'jk}^{abc}$, $w_{ij'k}^{abc}$, $w_{ijk'}^{abc}$, $v_{i'jk}^{abc}$, $v_{ij'k}^{abc}$, and $v_{ijk'}^{abc}$. For the construction of the required partially transformed tensors, it was necessary to evaluate all the different elements of W and V in the quasi-canonical basis and then to perform their transformation according to Eq. (15). This scheme brings down the number of required tensor elements to $6n_o^2n_v^3$; however, the scaling of the number of operations for the construction of W and V remains $n_o^3n_v^4$, as for the canonical (T) expression.

It is obvious from the above argument and from Eqs. (2) and (12) that a much better $n_o^2n_v^4$ scaling algorithm could be designed if we were able to construct W and V directly with their first occupied index in the LMO basis. The simplest solution for that would be to work directly in the occupied LMO basis and to employ the popular T0 semi-canonical approximation,35,74,75,82,95,105 where the canonical expressions [such as Eq. (2)] are applied in a non-canonical basis as well, assuming the Fock matrix to be diagonal. This option will be further investigated in Sec. III. As an alternative to the semi-canonical T0 approximation, Sec. II E introduces a more accurate approach utilizing the LT of the energy denominator.

E. Laplace transformed (T) energy expression

The (T) energy expression can be rewritten in an orbital invariant form via numerical LT of the energy denominator36,40–42

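$$\frac{1}{D_{ijk}^{abc}} = -\int_{0}^{\infty}e^{D_{ijk}^{abc}s}\,ds \approx -\sum_{q=1}^{n_q}\omega_q\,e^{D_{ijk}^{abc}s_q}, \tag{16}$$

where the $\omega_q$ are the quadrature weights corresponding to the $n_q$ quadrature points $s_q$.130 Since the above $e^{D_{ijk}^{abc}s_q}$ factors factorize completely, the square root of the individual Laplace factors (e.g., $\sqrt[12]{\omega_q}\,e^{f_{ii}s_q/2}$) can be conveniently absorbed into the corresponding elements of the W and V tensors as

$$W_{\bar i\bar j\bar k,q}^{\bar a\bar b\bar c} = P_{\bar i\bar j\bar k}^{\bar a\bar b\bar c}\left[\sum_{d}(\bar b d|\bar a\bar i)_q\,t_{\bar k\bar j,q}^{\bar c d} - \sum_{l}(\bar c\bar k|\bar j l)_q\,t_{\bar i l,q}^{\bar a\bar b}\right], \tag{17}$$

with the transformed integrals, e.g.,

$$(\bar b d|\bar a\bar i)_q = \sqrt[4]{\omega_q}\,(bd|ai)\,e^{-(f_{aa}+f_{bb}-f_{ii})s_q/2} \tag{18}$$

and amplitudes, e.g.,

$$t_{\bar k\bar j,q}^{\bar c d} = \sqrt[4]{\omega_q}\,t_{kj}^{cd}\,e^{-(f_{cc}-f_{jj}-f_{kk})s_q/2}. \tag{19}$$

The modified tensor V as well as the remaining $(\bar c\bar k|\bar j l)_q$ and $(\bar a\bar i|\bar b\bar j)_q$ types of integrals and the $t_{\bar i,q}^{\bar a}$ amplitudes are defined analogously. Finally, the substitution of Eqs. (16)–(19) into Eq. (1) yields the LT (T) energy expression

$$E^{(\mathrm{T})} \approx -\frac{1}{3}\sum_{q}\sum_{\bar i\bar j\bar k}\sum_{\bar a\bar b\bar c}\left(4W_{\bar i\bar j\bar k,q}^{\bar a\bar b\bar c}+W_{\bar i\bar j\bar k,q}^{\bar b\bar c\bar a}+W_{\bar i\bar j\bar k,q}^{\bar c\bar a\bar b}\right)\left(V_{\bar i\bar j\bar k,q}^{\bar a\bar b\bar c}-V_{\bar i\bar j\bar k,q}^{\bar c\bar b\bar a}\right). \tag{20}$$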
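The quadrature of Eq. (16) and the factorization of the Laplace factors over the six orbital indices can be illustrated with a short numerical sketch. It is only an illustration under stated assumptions: a rescaled Gauss–Laguerre rule is swapped in for the quadrature actually used in the paper (Ref. 130), the orbital energies are made-up numbers, and the scaling parameter `alpha` and the helper `laplace_grid` are hypothetical names introduced here.

```python
import numpy as np

def laplace_grid(n_q, alpha):
    """Rescaled Gauss-Laguerre rule: 1/x ~ sum_q omega_q * exp(-x * s_q).

    The rule is exact for x = alpha and accurate for x close to alpha.
    This is a stand-in for the quadrature of the paper, not the same grid.
    """
    t, w = np.polynomial.laguerre.laggauss(n_q)
    return t / alpha, w * np.exp(t) / alpha

# Hypothetical diagonal Fock elements (arbitrary units) for one i,j,k / a,b,c set.
f_occ = np.array([-0.6, -0.5, -0.4])   # f_ii, f_jj, f_kk
f_virt = np.array([0.3, 0.4, 0.5])     # f_aa, f_bb, f_cc
D = f_occ.sum() - f_virt.sum()         # negative energy denominator

s, omega = laplace_grid(n_q=10, alpha=2.0)

# Eq. (16): 1/D is approximated by minus the quadrature sum.
approx = -np.sum(omega * np.exp(D * s))
exact = 1.0 / D

# Square roots of the individual Laplace factors, one per orbital index.
occ_fac = omega[:, None] ** (1.0 / 12.0) * np.exp(0.5 * np.outer(s, f_occ))
vir_fac = omega[:, None] ** (1.0 / 12.0) * np.exp(-0.5 * np.outer(s, f_virt))

# Each index appears once in W and once in V, so the product of all six
# factors, squared, must reproduce omega_q * exp(D * s_q) at every point.
per_point = (occ_fac.prod(axis=1) * vir_fac.prod(axis=1)) ** 2
print(approx, exact)
```

Because each factor depends on a single orbital index, it can be attached to the integrals and amplitudes before any contraction is carried out, which is what makes the resulting expression invariant to rotations among the occupied orbitals.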

F. Laplace transformed (T) fragment energy

Utilizing the orbital invariant property of the Laplace transformed energy expression of Eq. (20), the corresponding (T) fragment energies are straightforwardly defined as

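$$\delta E_{k'}^{(\mathrm{T})} = -\frac{1}{3}\sum_{q}\;\sum_{\bar i\bar j\in\mathcal{P}_{k'}}\;\sum_{\bar a\bar b\bar c\in\mathcal{P}_{k'}}\left(4W_{\bar i\bar j\bar k',q}^{\bar a\bar b\bar c}+W_{\bar i\bar j\bar k',q}^{\bar b\bar c\bar a}+W_{\bar i\bar j\bar k',q}^{\bar c\bar a\bar b}\right)\left(V_{\bar i\bar j\bar k',q}^{\bar a\bar b\bar c}-V_{\bar i\bar j\bar k',q}^{\bar c\bar b\bar a}\right). \tag{21}$$

The advantage of the Laplace transformed expression compared to the previous one of Eq. (12) is that it is no longer necessary to construct all the $n_o^3n_v^3$ elements of W and V in the quasi-canonical basis. Instead, after the transformed integrals and amplitudes [for instance, in Eqs. (18) and (19)] are computed in the quasi-canonical basis, one of their occupied indices can be transformed to the local basis, yielding, e.g., $(\bar b d|\bar a\bar i')_q$ or $t_{\bar i'\bar l,q}^{\bar a\bar b}$ type of intermediates. Consequently, it is now possible to evaluate the $W_{\bar i\bar j\bar k',q}^{\bar a\bar b\bar c}$ and $V_{\bar i\bar j\bar k',q}^{\bar a\bar b\bar c}$ tensors only for the single central LMO index $k'$, which reduces the number of operations for Eq. (17) to be proportional to $n_o^2n_v^4+n_o^3n_v^3$.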

The permutational symmetry can again be exploited. For that purpose it is beneficial to use the same basis set for all three occupied indices of W and V, which restores their sixfold symmetry. (The quasi-canonical virtual LNO basis will be kept unchanged.) In our case, there is a large degree of freedom in the choice of this new, LIS-specific occupied basis. The only requirements are that the central LMO of each domain has to be the first function and the new orbital set has to span the same subspace as the original occupied LNOs.

A possible solution is to construct localized MOs for each LIS. With this choice, both the Laplace transformed and the semi-canonical T0 approximations can be employed and compared in the localized basis. Since in each domain, the original LMOs are mixed together to form LNOs, and some of the LNOs with negligible occupation numbers are dropped, we cannot simply transform back to the original LMO basis of the complete system from the much smaller LNO bases of the LISs. Therefore, LIS-specific localized basis sets are constructed by, first, selecting the central LMO as the first function of the new basis, then forming an orthonormal no − 1 dimensional orbital set, which is orthogonal to the central LMO. For that we project out the central LMO from all the LNOs of the LIS via a Gram–Schmidt step and then perform Löwdin orthogonalization on the resulting functions. This procedure yields an orthonormal set—which we refer to as the non-canonical Gram–Schmidt–Löwdin (ncGSL) basis—that spans the same space as the no dimensional occupied LNO basis. The LNO to ncGSL transformation matrix can be written in a closed form131,132

$$U_{kl} = \delta_{l1}u_k + (1-\delta_{l1})\left[(1-\delta_{k1})\left(\delta_{kl}-\frac{u_ku_l}{1+u_1}\right)-\delta_{k1}u_l\right], \tag{22}$$

with u storing the expansion coefficients of the central LMO on the occupied LNO basis as defined in Eq. (15), i.e., ul = Ul1. Finally, the functions of the ncGSL basis are localized (except for the fixed central MO), which leads to the LMO basis of the LIS. For the sake of simplicity, the same primed indices will be employed for the LIS-specific localized MOs and the original LMOs of the complete system. The semi-canonical approximation introduced in this LIS-specific LMO basis will be labelled as T0 because of the close relation to the T0 approximation applied in the LMO basis of the entire system in the case of direct local correlation methods.35,74,75,82,95,105
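The closed form of the LNO to ncGSL transformation can be checked numerically. The sketch below assumes the reading of Eq. (22) in which the first column is u, the block with $k, l \neq 1$ is $\delta_{kl} - u_ku_l/(1+u_1)$, and the remaining first-row elements are $-u_l$; it further assumes that u is normalized and that $u_1 > -1$. The function name `lno_to_ncgsl` and the random test vector are illustrative only.

```python
import numpy as np

def lno_to_ncgsl(u):
    """Build the LNO -> ncGSL matrix from the central-LMO coefficients u.

    Assumes u is normalized and 1 + u[0] is not close to zero.
    Index 0 here corresponds to index 1 of the text.
    """
    u = np.asarray(u, dtype=float)
    n = u.size
    U = np.zeros((n, n))
    U[:, 0] = u                                   # l = 1: central LMO itself
    U[1:, 1:] = np.eye(n - 1) - np.outer(u[1:], u[1:]) / (1.0 + u[0])
    U[0, 1:] = -u[1:]                             # k = 1, l != 1
    return U

rng = np.random.default_rng(3)
u = rng.standard_normal(6)
u[0] = abs(u[0])            # keep 1 + u_1 safely away from zero
u /= np.linalg.norm(u)      # central LMO is normalized in the LNO basis

U = lno_to_ncgsl(u)
overlap = U.T @ U           # should be the identity for an orthonormal set
```

Orthonormality of the columns confirms that the new set spans the same space as the occupied LNOs while keeping the central LMO as its first function, which are the two requirements stated in the text.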

Alternatively, one may attempt to minimize the error of the T0 approximation by choosing to keep only the central MO in the local basis and transforming the remaining $n_o - 1$ orbitals to a quasi-canonical basis. This second set, referred to as the semi-canonical GSL (scGSL) basis, is obtained from the ncGSL one by keeping the central LMO fixed and diagonalizing the Fock matrix in the remaining $(n_o - 1)$-dimensional subspace. Quantities in the scGSL basis will be indexed with capital letters (e.g., $I, J, \ldots$). For instance, the elements of the LNO to scGSL transformation matrix are denoted as $U_{jI}$; this matrix is the product of the LNO to ncGSL transformation matrix of Eq. (22) and the ncGSL to scGSL transformation matrix. Finally, note that in this scGSL basis, the off-diagonal elements in the first row and column of the Fock matrix ($F_{I1}$ and $F_{1J}$) are still nonzero. These remaining nonzero elements can also be neglected to arrive at an approximation that is analogous to the T0 one and will be labeled as T0′.

In our previous approach, an expression with all possible summation restrictions was derived for the efficient evaluation of the δEk(T) contributions [see Eq. (34) of Ref. 34]. In that derivation, the use of a modified “abcijk” algorithm had to be assumed because of, for instance, memory restrictions. However, the present Laplace transformed scheme can be combined with the more advantageous “ijkabc” algorithm. (Additional arguments for “ijkabc” will be given in Sec. II G.) Therefore, assuming the use of the “ijkabc” algorithm, the derivation of Ref. 34 was repeated leading to analogous summation restrictions. The resulting Laplace transformed fragment energy expression in the scGSL basis reads as133

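$$\begin{aligned}\delta E_{k'}^{(\mathrm{T})} = -\sum_{q}\Bigg\{&\sum_{\bar I>\bar J\in\mathcal{P}_{k'}}\;\sum_{\bar a\ge\bar b\ge\bar c\in\mathcal{P}_{k'}}\frac{1}{1+\delta_{\bar a\bar b}+\delta_{\bar b\bar c}}\Big[W_{\bar I\bar J\bar K,q}^{\bar a\bar b\bar c}\left(2V_{\bar I\bar J\bar K,q}^{\bar a\bar b\bar c}-V_{\bar I\bar J\bar K,q}^{\bar a\bar c\bar b}-V_{\bar I\bar J\bar K,q}^{\bar c\bar b\bar a}\right)\\&+W_{\bar I\bar J\bar K,q}^{\bar b\bar c\bar a}\left(2V_{\bar I\bar J\bar K,q}^{\bar b\bar c\bar a}-V_{\bar I\bar J\bar K,q}^{\bar a\bar c\bar b}-V_{\bar I\bar J\bar K,q}^{\bar b\bar a\bar c}\right)+W_{\bar I\bar J\bar K,q}^{\bar c\bar a\bar b}\left(2V_{\bar I\bar J\bar K,q}^{\bar c\bar a\bar b}-V_{\bar I\bar J\bar K,q}^{\bar b\bar a\bar c}-V_{\bar I\bar J\bar K,q}^{\bar c\bar b\bar a}\right)\\&+W_{\bar I\bar J\bar K,q}^{\bar a\bar c\bar b}\left(2V_{\bar I\bar J\bar K,q}^{\bar a\bar c\bar b}-V_{\bar I\bar J\bar K,q}^{\bar b\bar c\bar a}-V_{\bar I\bar J\bar K,q}^{\bar a\bar b\bar c}\right)+W_{\bar I\bar J\bar K,q}^{\bar b\bar a\bar c}\left(2V_{\bar I\bar J\bar K,q}^{\bar b\bar a\bar c}-V_{\bar I\bar J\bar K,q}^{\bar c\bar a\bar b}-V_{\bar I\bar J\bar K,q}^{\bar b\bar c\bar a}\right)\\&+W_{\bar I\bar J\bar K,q}^{\bar c\bar b\bar a}\left(2V_{\bar I\bar J\bar K,q}^{\bar c\bar b\bar a}-V_{\bar I\bar J\bar K,q}^{\bar a\bar b\bar c}-V_{\bar I\bar J\bar K,q}^{\bar c\bar a\bar b}\right)\\&+\frac{2}{3}\left(W_{\bar I\bar J\bar K,q}^{\bar a\bar b\bar c}+W_{\bar I\bar J\bar K,q}^{\bar b\bar c\bar a}+W_{\bar I\bar J\bar K,q}^{\bar c\bar a\bar b}-W_{\bar I\bar J\bar K,q}^{\bar a\bar c\bar b}-W_{\bar I\bar J\bar K,q}^{\bar b\bar a\bar c}-W_{\bar I\bar J\bar K,q}^{\bar c\bar b\bar a}\right)\\&\quad\times\left(V_{\bar I\bar J\bar K,q}^{\bar a\bar b\bar c}+V_{\bar I\bar J\bar K,q}^{\bar b\bar c\bar a}+V_{\bar I\bar J\bar K,q}^{\bar c\bar a\bar b}-V_{\bar I\bar J\bar K,q}^{\bar a\bar c\bar b}-V_{\bar I\bar J\bar K,q}^{\bar b\bar a\bar c}-V_{\bar I\bar J\bar K,q}^{\bar c\bar b\bar a}\right)\Big]\\&+\sum_{\bar J\in\mathcal{P}_{k'}}\;\sum_{\bar a\ge\bar b\ge\bar c\in\mathcal{P}_{k'}}\frac{1}{1+\delta_{\bar a\bar b}+\delta_{\bar b\bar c}}\Big[W_{\bar J\bar J\bar K,q}^{\bar a\bar b\bar c}\left(3V_{\bar J\bar J\bar K,q}^{\bar a\bar b\bar c}-Y_{\bar J\bar K,q}^{\bar a\bar b\bar c}\right)+W_{\bar J\bar J\bar K,q}^{\bar a\bar c\bar b}\left(3V_{\bar J\bar J\bar K,q}^{\bar a\bar c\bar b}-Y_{\bar J\bar K,q}^{\bar a\bar b\bar c}\right)\\&\quad+W_{\bar J\bar J\bar K,q}^{\bar b\bar c\bar a}\left(3V_{\bar J\bar J\bar K,q}^{\bar b\bar c\bar a}-Y_{\bar J\bar K,q}^{\bar a\bar b\bar c}\right)\Big]\Bigg\},\end{aligned} \tag{23}$$

with

$$Y_{\bar J\bar K,q}^{\bar a\bar b\bar c} = V_{\bar J\bar J\bar K,q}^{\bar a\bar b\bar c}+V_{\bar J\bar J\bar K,q}^{\bar a\bar c\bar b}+V_{\bar J\bar J\bar K,q}^{\bar b\bar c\bar a}. \tag{24}$$

It is again highlighted that the above expression is evaluated only for $k' = K = 1$, i.e., only with the central LMO as the third occupied index of W and V. Note that Eq. (23) yields $\delta E_{K}^{(\mathrm{T})}$ because summation of the $\bar K$-dependent terms over the quadrature points results in the corresponding unbarred quantities divided by the energy denominator, and $\delta E_{K}^{(\mathrm{T})} = \delta E_{k'}^{(\mathrm{T})}$.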

G. Laplace transformed local (T) algorithm

The above “1permutation-ijkabc” algorithm cannot be applied right away for the evaluation of WI¯J¯K¯,qāb¯c¯ and VI¯J¯K¯,qāb¯c¯, which are required for the LT (T) fragment energy of Eq. (23) because the integrals and amplitudes are not symmetric any more due to the asymmetric arrangement of the Laplace factors [see Eqs. (18) and (19)]. The symmetry of either the amplitudes or the integrals can be restored by multiplying each term of Eq. (17) with 1 in the form of 1ωq12efddsq2ωq12efddsq2 since, for instance,

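$$
(\bar b d|\bar a\bar i)_q\,t^{\bar c d}_{\bar k\bar j,q}
=(\bar b d|\bar a\bar i)_q\,\sqrt[12]{\omega_q}\,e^{-f_{dd}s_q/2}\;t^{\bar c d}_{\bar k\bar j,q}\,\frac{1}{\sqrt[12]{\omega_q}\,e^{-f_{dd}s_q/2}}
=(\bar b\bar d|\bar a\bar i)_q\,t^{\bar c\tilde d}_{\bar k\bar j,q}, \tag{25}
$$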

where we introduce tilded indices when the inverse of the Laplace factors is absorbed into the integrals or amplitudes. Most of the beneficial properties of the “1permutation-ijkabc” algorithm can then be exploited again if the symmetric $(\bar b\bar d|\bar a\bar i)_q$ integrals are combined with amplitudes $t^{\bar c\tilde d}_{\bar k\bar j,q}$ in the first type of terms, and “fully barred” amplitudes $t^{\bar b\bar a}_{\bar l\bar i,q}$ are multiplied with the non-symmetric $(\bar c\bar k|\tilde l\bar j)_q$ integrals in the second kind of terms (see later in Algorithm 2).
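The insertion of unity in Eq. (25) can be checked numerically. The sketch below is only an illustration: it assumes that the Laplace factor of a virtual index d has the form ω_q^(1/12) exp(−f_dd s_q/2), and all array names, dimensions, and numerical values (`integrals`, `amplitudes`, `weight`, `s_q`) are hypothetical. It shows that scaling the contracted index of one array and inversely scaling the same index of the other leaves the contraction unchanged.

```python
import math
import random

def laplace_factor(weight, f_dd, s_q):
    # Assumed form of the virtual-index Laplace factor: w_q^(1/12) * exp(-f_dd * s_q / 2).
    return weight ** (1.0 / 12.0) * math.exp(-0.5 * f_dd * s_q)

def contract(left, right):
    # result[b][c] = sum_d left[b][d] * right[c][d]
    return [[sum(x * y for x, y in zip(row_l, row_r)) for row_r in right]
            for row_l in left]

def scale_last_index(matrix, factors, inverse=False):
    # Multiply (or divide, if inverse=True) element [x][d] by factors[d].
    return [[m / f if inverse else m * f for m, f in zip(row, factors)]
            for row in matrix]

random.seed(0)
n_b, n_c, n_d = 3, 4, 5                       # illustrative dimensions only
weight, s_q = 0.7, 0.3                        # hypothetical quadrature weight and point
f_virt = [random.uniform(0.1, 2.0) for _ in range(n_d)]
factors = [laplace_factor(weight, f, s_q) for f in f_virt]

integrals = [[random.uniform(-1, 1) for _ in range(n_d)] for _ in range(n_b)]
amplitudes = [[random.uniform(-1, 1) for _ in range(n_d)] for _ in range(n_c)]

# Plain contraction versus the one with "barred" integrals and "tilded" amplitudes.
plain = contract(integrals, amplitudes)
factored = contract(scale_last_index(integrals, factors),
                    scale_last_index(amplitudes, factors, inverse=True))
max_deviation = max(abs(p - f) for row_p, row_f in zip(plain, factored)
                    for p, f in zip(row_p, row_f))
```

The two results agree to rounding error, which is why the factors can be moved freely between the integrals and the amplitudes to restore the symmetry of one of them.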

Algorithm 2.

Laplace transformed local “1permutation-ijkabc” algorithm.

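for q = 1, nq
  Perform transformation and assembly steps of Eqs. (27)–(31)
  K = 1; $R^{\bar I\bar J}_{\bar b\bar a}=T^{\bar I\bar J}_{\bar a\bar b}$
  for J = 1, no
    for I ≥ J
      $w_{\bar a\bar b\bar c,q}=T^{\bar I\bar K}_{\bar a,\tilde d}I^{\bar J}_{\bar d,\bar b\bar c}+I^{\bar J}_{\bar a\bar b,\bar d}\big(T^{\bar K\bar I}_{\bar c,\tilde d}\big)^{\top}+T^{\bar I\bar J}_{\bar a,\tilde d}\big(I^{\bar K}_{\bar b\bar c,\bar d}\big)^{\top}$
      $v_{\bar c\bar a\bar b,q}=\big(I^{\bar K}_{\bar d,\bar c\bar a}\big)^{\top}\big(T^{\bar J\bar I}_{\bar b,\tilde d}\big)^{\top}+T^{\bar K\bar J}_{\bar c,\tilde d}I^{\bar I}_{\bar d,\bar a\bar b}+I^{\bar I}_{\bar c\bar a,\bar d}\big(T^{\bar J\bar K}_{\bar b,\tilde d}\big)^{\top}+T^{\bar I}_{\bar c\bar a,\bar L}\big(I^{\bar J\bar K}_{\bar b,\tilde L}\big)^{\top}+R^{\bar K}_{\bar c\bar a,\bar L}\big(I^{\bar J\bar I}_{\bar b,\tilde L}\big)^{\top}$
      $w_{\bar a\bar b\bar c,q}\leftarrow v_{\bar c\bar a\bar b,q}$
      $w_{\bar a\bar b\bar c,q}\leftarrow R^{\bar I}_{\bar a\bar b,\bar L}\big(I^{\bar K\bar J}_{\bar c,\tilde L}\big)^{\top}+I^{\bar I\bar K}_{\bar a,\tilde L}\big(R^{\bar J}_{\bar b\bar c,\bar L}\big)^{\top}+T^{\bar J}_{\bar a\bar b,\bar L}\big(I^{\bar K\bar I}_{\bar c,\tilde L}\big)^{\top}+I^{\bar I\bar J}_{\bar a,\tilde L}\big(T^{\bar K}_{\bar b\bar c,\bar L}\big)^{\top}$
      $v_{\bar a\bar b\bar c,q}=w_{\bar a\bar b\bar c,q}+I^{\bar J\bar K}_{\bar b\bar c}T^{\bar I}_{\bar a}+I^{\bar I\bar K}_{\bar a\bar c}T^{\bar J}_{\bar b}+I^{\bar I\bar J}_{\bar a\bar b}T^{\bar K}_{\bar c}$
      Calculate energy contribution according to Eq. (23)
    end for
  end for
end for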

After the CCSD iteration is finished, the amplitudes and integrals are available only in the quasi-canonical LNO basis. In order to minimize the overhead of the Laplace transformed ansatz, orbital transformations and multiplications with Laplace factors are carried out in a single step utilizing the density fitting form of the Coulomb integrals, e.g.,

$$
(bd|ai)=\sum_P J^{P}_{bd}\,J^{P}_{ai}, \tag{26}
$$

where index P refers to (natural) auxiliary functions. First, virtual indices are simply multiplied with the corresponding Laplace factors, while the occupied orbitals are transformed to the scGSL (or LMO) basis with

$$
U_{\bar jJ,q}=\sqrt[12]{\omega_q}\,e^{f_{jj}s_q/2}\,U_{jJ} \tag{27}
$$

yielding

$$
J^{P}_{\bar a\bar J,q}=\sum_{j}J^{P}_{aj}\,U_{\bar jJ,q}\,\sqrt[12]{\omega_q}\,e^{-f_{aa}s_q/2}. \tag{28}
$$

Integrals $J^{P}_{\tilde I\bar J,q}$ and $J^{P}_{\bar a\bar b,q}$ and the amplitudes $t^{\bar a}_{\bar I}$ and $t^{\bar a\tilde b}_{\bar I\bar J}$ are obtained via analogous transformations. Finally, the necessary four-center integrals are assembled as

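$$
I^{\bar I\bar J}_{\bar a\tilde L}=\langle\bar a\tilde L|\bar I\bar J\rangle_q=\sum_P J^{P}_{\bar a\bar I,q}\,J^{P}_{\tilde L\bar J,q}, \tag{29}
$$
$$
I^{\bar I\bar J}_{\bar a\bar b}=\langle\bar a\bar b|\bar I\bar J\rangle_q=\sum_P J^{P}_{\bar a\bar I,q}\,J^{P}_{\bar b\bar J,q}, \tag{30}
$$
$$
I^{\bar I}_{\bar a\bar b,\bar d}=\langle\bar a\bar b|\bar d\bar I\rangle_q=\sum_P J^{P}_{\bar a\bar d,q}\,J^{P}_{\bar b\bar I,q}. \tag{31}
$$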

Note that the quadrature index of the transformed amplitudes and integrals stored in arrays T and I is omitted in order to simplify the notation.
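The transformation and assembly steps of Eqs. (27)–(31) can be sketched in a few lines. The code below is a minimal, unoptimized illustration rather than the actual implementation: it assumes the Laplace factors ω_q^(1/12) exp(+f_jj s_q/2) for occupied and ω_q^(1/12) exp(−f_aa s_q/2) for virtual indices, stores the three-center integrals as nested lists `j_three[P][a][j]` (a hypothetical layout), and uses made-up numbers for a system with one auxiliary function.

```python
import math

def scaled_occupied_transform(u, f_occ, weight, s_q):
    # Eq. (27) as read here: U_bar[j][J] = w^(1/12) * exp(+f_jj * s_q / 2) * U[j][J]
    root = weight ** (1.0 / 12.0)
    return [[root * math.exp(0.5 * f_occ[j] * s_q) * u_jJ for u_jJ in u[j]]
            for j in range(len(u))]

def transform_three_center(j_three, u_bar, f_virt, weight, s_q):
    # Eq. (28) as read here, for an array j_three[P][a][j]:
    # J_bar[P][a][J] = sum_j J[P][a][j] * U_bar[j][J] * w^(1/12) * exp(-f_aa * s_q / 2)
    root = weight ** (1.0 / 12.0)
    n_occ, n_loc = len(u_bar), len(u_bar[0])
    return [[[root * math.exp(-0.5 * f_virt[a] * s_q)
              * sum(j_p[a][j] * u_bar[j][J] for j in range(n_occ))
              for J in range(n_loc)]
             for a in range(len(j_p))]
            for j_p in j_three]

def assemble_four_center(left, right):
    # Pattern of Eqs. (29)-(31): I[a][x][b][y] = sum_P left[P][a][x] * right[P][b][y]
    n_aux = len(left)
    return [[[[sum(left[p][a][x] * right[p][b][y] for p in range(n_aux))
               for y in range(len(right[0][0]))]
              for b in range(len(right[0]))]
             for x in range(len(left[0][0]))]
            for a in range(len(left[0]))]

# Tiny illustrative input: one auxiliary function, two virtuals, two occupieds.
j_three = [[[1.0, 2.0], [3.0, 4.0]]]
identity = [[1.0, 0.0], [0.0, 1.0]]
u_bar = scaled_occupied_transform(identity, [-1.0, -1.0], 1.0, 2.0)
j_bar = transform_three_center(j_three, u_bar, [1.0, 1.0], 1.0, 2.0)
four_center = assemble_four_center(j_bar, j_bar)
```

In a production code the same steps would be expressed as dense matrix multiplications, but the index pattern is the one shown here: the Laplace factors are applied once to the three-center quantities, and the four-center lists are then obtained by a single contraction over the auxiliary index P.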

Using the above defined intermediates, the Laplace transformed version of the “1permutation-ijkabc” algorithm for the local (T) energy is given in Algorithm 2. The additional memory requirement for array R can again be spared using the analogue of the “2permutation” algorithm. Algorithm 2 also offers a highly efficient way for the direct construction of the $\langle\bar a\bar b|\bar d\bar I\rangle_q$ integrals for a given I if the full list does not fit into the main memory (see Appendix B for details).

We note that, in principle, an “abcijk” algorithm could also be designed for the Laplace transformed scheme. While the above Laplace transformed “ijkabc” algorithm extensively utilizes the optimized dgemm operations, its “abcijk” alternative is much less suitable for a high-performance implementation. For example, the construction of the $w_{\bar I\bar J\bar K,q}$, $w_{\bar I\bar K\bar J,q}$, and $w_{\bar K\bar I\bar J,q}$ intermediates would require the multiplication of much smaller matrices of various sizes and dimensions (e.g., $I^{\bar K\bar c\bar b}_{d}\,T^{\bar a}_{d,\bar J\bar I}$ or $T^{\bar K\bar c\bar b}_{L}\,I^{\bar a}_{L,\bar I\bar J}$), which is highly challenging to implement efficiently. For this reason, the Laplace transformed “abcijk” algorithm for the fragment energies is not considered further.

We note that we investigated a couple of additional optimization ideas with potential for further computation time reduction. However, these approaches were not implemented in the present study because, according to our estimation, they would lead to marginal, if any, speedup compared to the above algorithm in the context of our LNO-CCSD(T) scheme. Appendix C provides a brief summary of our efforts in this direction.

III. BENCHMARK CALCULATIONS

A. Computational details

The LT (T) approach proposed in this paper has been implemented in the mrcc suite of quantum chemical programs as part of the latest version of our LNO-CCSD(T) method and will be available in the next release of the package.127

The accuracy of the LT (T) expression is assessed on correlation energies and reaction energies evaluated for the test set of Neese, Wennmohs, and Hansen (NWH). The NWH set is assembled from 23 reactions also including molecules of 36 atoms.134 Furthermore, correlation energies and reaction energies of medium-sized and larger systems of up to 146 atoms were also investigated, including the androstendion and AuAmin reactions of Ref. 37 (see Fig. 1), the angiotensin molecule taken from Ref. 47, and the crambin protein of Ref. 96. For the performance analysis of the LT (T) approximation, the maximum absolute error (MAX) and the mean absolute error (MAE) measures were applied.

FIG. 1.

AuAmin (top) and androstendion (bottom) reactions taken from Ref. 37. Mes denotes the mesityl group.

In the calculations presented here, Dunning’s (augmented) correlation-consistent polarized valence X-tuple-ζ basis sets [(aug-)cc-pVXZ, X = T or Q],135–137 as well as the def2-TZVP triple-ζ valence basis set of Weigend and Ahlrichs,138 were used. The cc-pVTZ basis was applied to the atoms of the reactants and products of the androstendion and AuAmin reactions, except for the gold atom, for which the augmented correlation consistent valence triple-ζ pseudopotential (aug-cc-pVTZ-PP) basis set of Peterson139 was employed together with the corresponding effective core potential for the inner 60 electrons of the atom.140 For all the AO basis sets, the corresponding auxiliary bases of Weigend and co-workers were applied.141–143

The core electrons were kept frozen in all the correlation calculations. The Boys localization144 was used for the construction of occupied LMOs.

Quadrature points, sq, and the corresponding weights, ωq, of Eq. (16) are determined according to the minimax algorithm of Ref. 130, but other similarly accurate schemes can also be chosen.145,146 The number of quadrature points and the corresponding {ωq} and {sq} sets are selected so that the Chebyshev norm of the error function,

$$
\sum_{q=1}^{n_q}\omega_q\,e^{-xs_q}-\frac{1}{x}, \tag{32}
$$

be below a threshold, TLT, for $x\in[3(f_{n_o+1,n_o+1}-f_{n_o,n_o}),\,3(f_{n_o+n_v,n_o+n_v}-f_{1,1})]$. A second criterion is set for the minimum of nq in the form of nq > |log(TLT)|, which, according to our numerical experience, helps to balance the accuracy of the LT in different LISs of the same system. For instance, in the case of TLT = 10−2, nq = 3 is set for the majority of LISs by the TLT threshold alone, and the minimum criterion prevents the use of nq = 2 in the remaining ones. Although the maximum of nq is not limited, we find that the minimum of nq is usually sufficient to fulfill the first error criterion. Consequently, the actual number of quadrature points equals the minimum in the LISs of the systems that we looked at so far. The numerical values of TLT will be given in atomic units throughout this paper.
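The two selection criteria can be illustrated with a short sketch. It is not the minimax procedure of Ref. 130: the quadrature weights and points are taken as given input, the Chebyshev norm is approximated by scanning a uniform grid (an assumption of this sketch), and the logarithm in the second criterion is taken to be base 10, which is consistent with the TLT and nq pairs listed in Table II. All function names are hypothetical.

```python
import math

def laplace_error(x, weights, points):
    # Error function of Eq. (32): sum_q w_q * exp(-x * s_q) - 1/x
    return sum(w * math.exp(-x * s) for w, s in zip(weights, points)) - 1.0 / x

def chebyshev_norm(weights, points, x_min, x_max, n_grid=2001):
    # Largest absolute error on a uniform grid over [x_min, x_max];
    # a grid scan only approximates the true maximum norm.
    step = (x_max - x_min) / (n_grid - 1)
    return max(abs(laplace_error(x_min + i * step, weights, points))
               for i in range(n_grid))

def minimum_nq(t_lt):
    # Smallest integer n_q with n_q > |log10(T_LT)|; the small shift guards
    # against rounding noise in log10 for exact powers of ten.
    return math.floor(abs(math.log10(t_lt)) + 1e-9) + 1

def quadrature_is_acceptable(weights, points, x_min, x_max, t_lt):
    # Both criteria: error norm below T_LT and at least minimum_nq points.
    return (len(weights) >= minimum_nq(t_lt)
            and chebyshev_norm(weights, points, x_min, x_max) < t_lt)

# Trivial one-point "quadrature" (w = 1, s = 0), used only to exercise the code.
trial_norm = chebyshev_norm([1.0], [0.0], 1.0, 2.0)
```

For TLT = 10−2 the function `minimum_nq` returns 3, and for TLT = 10−5 it returns 6, in line with the nq values reported for these thresholds.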

Thresholds applied at the construction of the domains and LISs in this study are listed here only in brief, while the notations and the function of these thresholds are explained in detail in Refs. 34, 63, and 64. Extended domains are determined according to the tight threshold set introduced for our LMP2 method,64 e.g., TEDo = 0.9999 and εw = 10−5 Eh. The occupied and virtual LNOs and NAF bases of the LISs are defined using εo = 2 × 10−5, εv = 10−6, and εNAF = 0.01 a.u., respectively. These thresholds are applied also in the latest version of our LNO-CCSD(T) method and will be extensively benchmarked in a forthcoming publication.

The reported computation times are wall-clock times determined on a machine with 128 GB of main memory and a 6-core 3.5 GHz Intel Xeon E5-1650 processor.

B. Accuracy of correlation energies

In the following, we investigate the accuracy of the Laplace transformed and the semi-canonical triples corrections (T0 and T0′) in the context of the present LNO-CCSD(T) approach on correlation and reaction energies of medium-sized and large systems and with basis sets including cc-pVTZ, aug-cc-pVTZ, and cc-pVQZ. Reference values were obtained using our previous local CCSD(T) code,34,123 with which we compute CCSD(T) energies for the same LISs defined by the same truncation thresholds, but without relying on the LT approximation. To avoid confusion, the reference scheme without the LT approach will be referred to as previous LNO-CCSD(T), and the LNO-CCSD(T) acronym will be reserved to the present Laplace transformed version.

First, the correlation energy error introduced by the Laplace transformed form is collected in Table II as a function of the number of grid points for the example of the angiotensin molecule. Rapid convergence is observed up to μEh accuracy with 6 grid points, while the relative errors in E(T) and ECCSD(T), being only 0.067% and 0.0029%, respectively, with nq = 3, are already sufficiently small for our purposes. If one aims to recover more than 99.99% of the canonical correlation energy, nq = 4 is recommended, providing an order of magnitude smaller error due to the LT approximation. Although the computational effort is 3 times less in the cases of nq = 1, T0, or T0′ compared to nq = 3, the introduction of an error of 0.1%–0.3% into ECCSD(T) originating solely from the approximation of the (T) term cannot be afforded. The T0′ version is only slightly better than T0, and both of them are markedly outperformed by LT with a crude quadrature containing 2 grid points.

TABLE II.

Accuracy of the LT (T) correlation energy for angiotensin with the cc-pVTZ basis set. The reference values were obtained with our previous LNO-CCSD(T) code34,123 using the same truncation thresholds and LISs. The third and fourth columns show the absolute and relative error of the LT (T) correlation energy contribution, respectively, introduced by the LT approximation as the function of the number of quadrature points. The relative magnitude of this error compared to the reference total LNO-CCSD(T) correlation energy (obtained with the previous implementation) is collected in the last column. See Sec. III B for further details.

TLT nq E(T) error (mEh) E(T) error (%) ECCSD(T) error (%)
10−5 6 1.1 × 10−3 −1.7 × 10−4 −7.5 × 10−6
10−4 5 2.3 × 10−3 −3.6 × 10−4 −1.5 × 10−5
10−3 4 −3.3 × 10−2 5.3 × 10−3 2.3 × 10−4
10−2 3 0.42 −6.7 × 10−2 −2.9 × 10−3
10−1 2 −1.18 0.19 8.0 × 10−3
1 43.3 −6.9 −0.30
T0′ 13.6 −2.2 −0.09
T0 16.1 −2.6 −0.11

The same error measures are given in Table III for the largest species of the androstendion and the AuAmin reactions indicating that about 0.03%–0.1% and 0.001%–0.004% relative errors can be expected in E(T) and ECCSD(T), respectively, for large molecules and triple-ζ quality basis sets. Again, μEh accuracy is obtained with nq = 6, and large errors of up to about 0.1% of ECCSD(T) are found with T0′ for both systems.

TABLE III.

Accuracy of the LT (T) correlation energy for the androstendion precursor and the AuAmin molecules with the cc-pVTZ (aug-cc-pVTZ-PP for the gold atom) basis set. See the caption of Table II for the definition of the quantities in the columns.

TLT nq E(T) error (mEh) E(T) error (%) ECCSD(T) error (%)
Androstendion precursor
10−5 6 3.1 × 10−3 −1.4 × 10−3 −5.7 × 10−5
10−2 3 6.8 × 10−2 −3.0 × 10−2 −1.2 × 10−3
T0′ 4.5 −2.0 −0.08
AuAmin
10−5 6 −4.5 × 10−3 1.1 × 10−3 5.1 × 10−5
10−2 3 0.38 −9.1 × 10−2 −4.2 × 10−3
T0′ 12.2 −2.9 −0.14

Relative errors for the 47 molecules of the NWH test set collected in Table IV show similar accuracy on an average in the case of the cc-pVTZ basis set and TLT = 10−2. The maximum deviation of 0.0099% relative to ECCSD(T), though being about 3 times larger than the average, is still acceptable but indicates that somewhat larger errors could appear for these smaller molecules. This can be understood if we look at the signed relative error in the (T) energy contribution of individual LISs. In the case of molecules exhibiting the largest relative error in the NWH set, the sign of the LT approximation error is negative for almost all the LISs, while for larger systems containing more occupied MOs, and hence more LISs, relative deviations of the opposite sign appear as well leading to the cancellation of errors. Compared to nq = 3, an order of magnitude improvement is observed with nq = 4, which is recommended in combination with tighter thresholds for the domain and LNO approximations. It is also satisfactory that the LT approximation performs slightly better with the aug-cc-pVTZ basis and more than twice as well with cc-pVQZ than with cc-pVTZ for both nq = 3 and nq = 4. Finally, we note again that the work required for 3 (4) quadrature points cannot be saved since the relative error of the correlation energy evaluated with the T0′ approach compared to ECCSD(T) is nearly 0.1% on an average for all the three basis sets.

TABLE IV.

Accuracy of the LT (T) correlation energy for the NWH test set. See the caption of Table II for the definition of the quantities in the columns.

E(T) error (%) ECCSD(T) error (%)
Basis set TLT MAE MAX MAE MAX
cc-pVTZ 10−3 5.8 × 10−3 2.0 × 10−2 2.6 × 10−4 1.1 × 10−3
10−2 6.1 × 10−2 0.18 2.7 × 10−3 9.9 × 10−3
T0′ 2.0 3.7 0.087 0.21
aug-cc-pVTZ 10−3 5.1 × 10−3 2.1 × 10−2 2.6 × 10−4 1.4 × 10−3
10−2 5.2 × 10−2 0.16 2.4 × 10−3 9.3 × 10−3
T0′ 2.1 3.7 0.090 0.21
cc-pVQZ 10−3 2.7 × 10−3 8.7 × 10−3 1.1 × 10−4 4.7 × 10−4
10−2 2.1 × 10−2 6.6 × 10−2 9.1 × 10−4 3.4 × 10−3
T0′ 2.0 3.6 0.085 0.20

C. Accuracy of reaction energies

Absolute errors in the (T) contribution to reaction energies (ΔE(T)) originating solely from the LT (T) approximation are given in Table V for the two reactions containing medium-sized/large molecules. Considering the sizable reaction energies of about 35 kJ/mol and 198 kJ/mol [at the LNO-CCSD(T)/cc-pVTZ level] for the androstendion and AuAmin reactions, respectively, the LT approximation with 3 grid points introduces negligible errors of about 0.15 kJ/mol into the studied energy differences. For the androstendion reaction, this result is expected from the accurately recovered correlation energies as well, e.g., an error of only 0.068 mEh = 0.18 kJ/mol (see Table III) was observed in the (T) term for the androstendion precursor, which is comparable to the 0.16 kJ/mol error in the reaction energy. A roughly 5 times larger error compensation is found in the case of the AuAmin reaction with nq = 3, which still does not compare to the surprisingly large, fortunate error compensation occurring in the case of T0′.
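The unit conversion used in this comparison is easy to reproduce. The snippet below is a small illustration that assumes the approximate conversion factor 1 Eh ≈ 2625.4996 kJ/mol; the function name is hypothetical.

```python
HARTREE_TO_KJ_PER_MOL = 2625.4996  # approximate value of 1 Eh in kJ/mol (assumed here)

def millihartree_to_kj_per_mol(value_meh):
    # Convert an energy given in mEh to kJ/mol.
    return value_meh * 1.0e-3 * HARTREE_TO_KJ_PER_MOL

# E(T) error of the androstendion precursor with nq = 3 (Table III): 0.068 mEh
androstendion_error = millihartree_to_kj_per_mol(0.068)
```

The result rounds to the 0.18 kJ/mol quoted in the text.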

TABLE V.

Absolute errors (in kJ/mol) in the (T) contribution to the reaction energies (ΔE(T)) for the androstendion and the AuAmin reactions.

TLT 10−3 10−2 T0′
Androstendion 0.0013 0.16 0.17
AuAmin 0.0008 0.15 0.28

The good performance of T0′ in the above cases can be misleading. Closer inspection (see Table VI) of the same errors in ΔE(T) for the 23 reactions of the NWH test set reveals that the T0′ errors are, on average, about 5–20 times larger than the LT errors with 3 grid points depending on the basis set and can even reach 2 kJ/mol. On the other hand, the average (maximum) LT approximation errors with the LNO-CCSD(T) scheme and TLT = 10−2, being 0.02–0.07 kJ/mol (0.06–0.4 kJ/mol) for the three basis sets, are small and compare very favorably to T0′. Again, an order of magnitude better performance is achieved if one more quadrature point is added (nq = 4), which is beneficial if one wishes to recover the canonical CCSD(T) reaction energy with sub-kJ/mol accuracy.

TABLE VI.

Mean absolute and maximum errors (in kJ/mol) in the (T) contribution to the reaction energies (ΔE(T)) for the NWH test set compared to our previous LNO-CCSD(T) results used as reference.

Basis set TLT MAE MAX
cc-pVTZ 10−3 0.0073 0.040
10−2 0.073 0.40
T0′ 0.39 1.9
aug-cc-pVTZ 10−3 0.0070 0.030
10−2 0.054 0.34
T0′ 0.41 2.0
cc-pVQZ 10−3 0.0040 0.025
10−2 0.020 0.058
T0′ 0.40 1.9

D. Comparison with previous studies

Although LT (T) energies are evaluated here in the special context of our LNO-CC method using a compressed LNO basis, it is also worthwhile to make comparisons with closely related previous studies. For instance, Scuseria and co-workers40,41 also found nq = 3 sufficient when assessing the accuracy of their canonical, factorized, Laplace transformed (T) implementation; however, at that time their investigations had to be restricted to small molecules and double-ζ or triple-ζ quality basis sets. Related studies of Koch et al. performed with Cholesky-decomposed denominators arrived at similar conclusions regarding the accuracy of the triples correction with 3–4 Cholesky vectors.38,39 Most recently, Schmitz and Hättig reported that, in the context of their Laplace transformed PNO-CCSD(T) scheme, 3–4 grid points are sufficient to reach converged results.36 The authors studied 11 reactions containing organic molecules of small and medium size and showed that the use of 3–4 grid points introduces at most 0.1 kJ/mol error for the test reactions with triple-ζ basis sets, which agrees with our maximum errors being in the range of 0.04–0.4 kJ/mol for a similar number of grid points (cf. Table VI).

Additionally, Schmitz and Hättig concluded that on an average, the semi-canonical T0 approximation leads to an additional error of 5% in the (T) correlation energy contribution on top of the PNO-truncation and other errors, and the (T0) correlation energy errors computed with PNO-CCSD amplitudes are about 3 times larger than the corresponding Laplace transformed (T) errors. In the case of energy differences, the authors found discrepancies up to 3.6 kJ/mol between the Laplace transformed PNO-CCSD(T) and the PNO-CCSD(T0) reaction energies, which is even higher than the 2 kJ/mol observed by us. Since the T0 approximation was the leading error source in their PNO-CCSD(T0) scheme in most of the cases, its replacement with the LT (T) form was suggested in the context of direct PNO-based local correlation methods. In related studies, Werner and Schütz74,75 compared the semi-canonical T0 approximation with the exact iteratively solved (T) counterpart, while Riplinger et al.95 assessed their DLPNO-CCSD(T) method against both canonical CCSD(T) and CCSD(T0). Regarding the accuracy of the T0 approximation, both groups arrived at similar conclusions as Schmitz and Hättig, which are also in line with our findings.

E. Timings

The major part of our LT (T) implementation utilizes OpenMP-parallelized BLAS subroutines. OpenMP parallelization was also implemented for the remaining time-consuming parts, e.g., for the LT (T) energy expression of Eq. (23). A speedup of about a factor of 5 is measured in the wall times when switching from one to six cores.

The efficiency of our LT (T) approach is illustrated on three large, three-dimensional test systems in Table VII. Wall times measured for the evaluation of the LT (T) correction are compared to those required for the (T) calculation with our previous LNO-CCSD(T) code. Consistent and large speedup values of 9–11 are measured, showing the superior performance of the present LT (T) implementation. This order of magnitude gain in wall time is the cumulative result of the two separate developments presented here. First, due to the O(N^6) scaling LT (T) algorithm, no/(3nq) times fewer operations are required in each LIS. This is already about a factor of 3 considering an average LIS with no = 25–30 and nq = 3. The additional benefit is that even higher speedup factors are obtained for the largest LISs. This makes highly accurate calculations with tighter LNO truncation thresholds much more affordable and helps to balance the computation time needed for LISs of different size, the latter being very advantageous from the perspective of a parallel implementation. The second factor of 3 comes from the introduction of the “1permutation-ijkabc” algorithm and further optimization of the OpenMP parallel efficiency, as discussed in Sec. II B.
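As a quick illustration of the first factor, the operation-count ratio no/(3nq) can be tabulated for a few LIS sizes. This is only a sketch of the formal estimate quoted above; the function name and the chosen no values are hypothetical.

```python
def lt_operation_reduction(n_occ, n_q):
    # Formal reduction in operation count of the LT (T) formula: n_o / (3 * n_q)
    return n_occ / (3.0 * n_q)

# nq = 3 corresponds to the default TLT = 10^-2
reduction = {n_occ: lt_operation_reduction(n_occ, 3) for n_occ in (25, 30, 60)}
```

With nq = 3, the estimate is about 2.8 to 3.3 for no = 25–30 and grows to about 6.7 for a large LIS with no = 60, which is why the largest LISs benefit most.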

TABLE VII.

Wall-clock times in minutes for LT (T) computations of large molecules (using the default TLT = 10−2, hence nq = 3 in all LISs) compared to our previous LNO-CCSD(T) approach.

AuAmin angiotensin crambin
No. of atoms 92 146 644
Basis set cc-pVTZ cc-pVTZ def2-TZVP
No. of basis functions 2102 3244 12 075
(T) of previous LNO-CCSD(T) 2381 965 5 736
LT (T) of present LNO-CCSD(T) 217 104 582
Speedup 11.0 9.3 9.9
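The speedup row of Table VII follows directly from the two rows of wall times, as the following check illustrates (the dictionary layout is merely a convenient choice for this sketch).

```python
# Wall times in minutes from Table VII: (previous (T), present LT (T))
wall_times_min = {
    "AuAmin": (2381, 217),
    "angiotensin": (965, 104),
    "crambin": (5736, 582),
}

# Speedup = previous wall time / present wall time, rounded to one decimal
speedups = {name: round(previous / present, 1)
            for name, (previous, present) in wall_times_min.items()}
```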

The order of magnitude improvement in the efficiency of the (T) part also has a significant effect on the total wall time of the LNO-CCSD(T) calculation. Previously the (T) part of the calculation took about 60%–70% of the total LNO-CCSD(T) correlation energy calculation, and this share was even higher with tighter LNO truncation thresholds. Now the cost of the (T) calculation has been brought down to the magnitude of the integral transformation and CCSD iterations, which ensures that the (T) correction can be evaluated if the preceding CCSD calculation is feasible. For instance, this allowed us to perform an LNO-CCSD(T) calculation on the crambin protein containing 644 atoms and 12 075 basis functions in a matter of days on a single CPU with 6 cores. Moreover, the full LT (T) calculation was carried out in memory without resorting to any disk I/O. The low-memory algorithm discussed in Appendix B requires less than 5 times as much memory as the size of the CCSD amplitude array, which amounts to about 2.5 GB for the largest LIS in the crambin calculation. In the actual run, the entire $\langle\bar a\bar b|\bar c\bar I\rangle_q$ list was also stored since this required only 2.7 GB of additional memory.

IV. CONCLUSIONS AND OUTLOOK

We presented an improved alternative to evaluate the perturbative triples correction in our linear-scaling LNO-CCSD(T) method utilizing the orbital invariant property of a Laplace transformed (T) fragment energy expression. While the accuracy of the LNO-CCSD(T) correlation energies and energy differences is affected negligibly by using 3–4 quadrature points in the numerical Laplace transform, compared to the previous implementation, an order of magnitude speedup is measured originating from two separate sources. On average, about a factor of 3–4 fewer operations are required for the new redundancy-free LT (T) formula, while the significant optimization of the underlying canonical (T) algorithm and implementation used for the computation of the triples amplitude related quantities results in an additional three-times speedup. These developments can, in principle, be integrated into canonical and other fragmentation based CCSD(T) approaches with minor modifications, and the LT idea can also be generalized to design efficient perturbative quadruples, etc., corrections. The option to save the costs of the summation over the 3–4 quadrature points using semi-canonical T0-type approximations was also examined. In accordance with previous studies, we found that the accuracy of the T0 approximation in its present form is not sufficient for our goals, but it might be useful to perform quick exploratory computations in combination with loose truncation settings.

The algorithm presented here brings closer the computational expenses of the smaller- and larger-domain calculations and can be executed in memory without relying on disk I/O. Both properties fit perfectly into our line of development working towards an LNO-CCSD(T) implementation that runs similarly efficiently on single workstations and large computer clusters. In order to fully exploit the efficiency of the new (T) approach, development is in progress on the integral transformation and CCSD iteration steps, which are currently the rate-determining operations. Already at the present stage LNO-CCSD(T) calculations with at least triple-ζ quality basis sets can be performed on systems with a few hundred atoms and more than 10 000 orbitals within days on a single processor, illustrating the large potential of the LNO-CCSD(T) method in molecular modeling.

It is important to highlight that the fragmentation and domain construction strategy of the LNO-CCSD(T) method are completely automatic and free from heuristic truncations. Furthermore, pre-defined threshold sets allow us to perform simple single point LNO-CCSD(T) calculations in a “black box” manner. The latter aspect will be elaborated on in our forthcoming publication.

ACKNOWLEDGMENTS

The authors gratefully acknowledge the computing time granted on the Hungarian HPC Infrastructure at NIIF Institute, Hungary.

APPENDIX A: “abcijk” ALGORITHM

The sixfold permutational symmetry of the W and V tensors can also be exploited if they are evaluated only for a given virtual index triplet. The energy expression with the corresponding summation restrictions reads as124,125

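$$
E^{(\mathrm{T})}=2\sum_{a\ge b\ge c}\ \sum_{i\ge j\ge k}
\frac{\big(Y^{abc}_{ijk}-2Z^{abc}_{ijk}\big)\big(W^{abc}_{ijk}+W^{abc}_{kij}+W^{abc}_{jki}\big)
+\big(Z^{abc}_{ijk}-2Y^{abc}_{ijk}\big)\big(W^{abc}_{ikj}+W^{abc}_{jik}+W^{abc}_{kji}\big)
+3X^{abc}_{ijk}}
{D^{abc}_{ijk}\,\big(1+\delta_{ij}+\delta_{jk}\big)\big(1+\delta_{ab}+\delta_{bc}\big)}, \tag{A1}
$$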

where the X, Y, and Z intermediates are defined analogously to their counterparts in Eqs. (6)–(8), that is, the index order of each term of Eqs. (6)–(8) is permuted so that all terms have the abc virtual index order instead of the original ijk occupied index order suitable for the “ijkabc” scheme.

The main advantage of the “abcijk” algorithm over “ijkabc” is its much smaller memory requirement: only two intermediate arrays of size $n_o^3$ and a small subset of the $\langle ab|ci\rangle$ integrals are needed at a given point. Due to the relevance of the conventional “abcijk” algorithm for highly memory demanding cases, its “1permutation” variant is presented in Algorithm 3. An even more memory-economical “2permutation-abcijk” variant can also be designed in exactly the same way as shown for the “2permutation-ijkabc” case in Algorithm 1.

Algorithm 3.

“1permutation-abcijk” algorithm.

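$R^{ba}_{ij}=T^{ab}_{ij}$
for c = 1, nv
  for b ≥ c
    for a ≥ b
      $w_{ijk}=I^{ab}_{i,d}\big(T^{c}_{jk,d}\big)^{\top}+T^{b}_{ij,d}\big(I^{ca}_{k,d}\big)^{\top}+I^{ac}_{i,d}\big(R^{b}_{jk,d}\big)^{\top}+R^{a}_{ij,d}\big(I^{cb}_{k,d}\big)^{\top}$
      $v_{kij}=T^{a}_{ki,d}\big(I^{bc}_{j,d}\big)^{\top}+R^{c}_{ki,d}\big(I^{ba}_{j,d}\big)^{\top}$
      $w_{ijk}\leftarrow T^{ac}_{i,l}I^{b}_{l,jk}+I^{b}_{ij,l}T^{ac}_{l,k}+T^{ab}_{i,l}\big(I^{c}_{jk,l}\big)^{\top}$
      $v_{kij}\leftarrow\big(I^{c}_{l,ki}\big)^{\top}T^{ab}_{l,j}+I^{a}_{ki,l}T^{cb}_{l,j}+T^{cb}_{k,l}I^{a}_{l,ij}$
      $w_{ijk}\leftarrow v_{kij}$
      $v_{ijk}=w_{ijk}+V^{bc}_{jk}T^{a}_{i}+V^{ac}_{ik}T^{b}_{j}+V^{ab}_{ij}T^{c}_{k}$
      Calculate energy contribution according to Eq. (A1)
    end for
  end for
end for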

APPENDIX B: LOW-MEMORY LAPLACE TRANSFORMED “ijkabc” ALGORITHM

For the largest domains with no ≈ 60–80 and nv ≈ 300–400, appearing in calculations performed with very tight truncation settings, the size of the entire $\langle\bar a\bar b|\bar c\bar I\rangle_q$ list is about 10–40 GB, which might be problematic on computers equipped with less memory. For such cases, a low-memory variant of the Laplace transformed “ijkabc” algorithm is designed, where only a block of $\langle\bar a\bar b|\bar c\bar I\rangle_q$ integrals is stored at a time for all the virtual indices and for nB occupied indices. The low-memory algorithm is shown in Algorithm 4.
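The quoted memory range can be reproduced with a back-of-the-envelope estimate. The sketch below assumes that the list holds no·nv^3 double-precision numbers without exploiting any symmetry and that 1 GB = 10^9 bytes; both are assumptions of this illustration, and the function name is hypothetical.

```python
def integral_list_size_gb(n_occ, n_virt, bytes_per_element=8):
    # Storage of an n_o * n_v^3 list of double-precision integrals, in GB (10^9 bytes).
    return n_occ * n_virt ** 3 * bytes_per_element / 1.0e9

lower_estimate = integral_list_size_gb(60, 300)   # about 13 GB
upper_estimate = integral_list_size_gb(80, 400)   # about 41 GB
```

Both ends of the range are consistent with the 10–40 GB mentioned above, and storing only nB occupied indices at a time reduces the requirement by a factor of no/nB.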

Algorithm 4.

Low-memory Laplace transformed “ijkabc” algorithm.

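for q = 1, nq
  Perform transformation and assembly steps of Eqs. (27)–(30), but not Eq. (31)
  $I^{\bar K}_{\bar a\bar b,\bar c}=J^{P}_{\bar a\bar c}J^{P}_{\bar b\bar K}$, only for K = 1
  for J = 1, no
    if (new block begins) $I^{\bar L}_{\bar a\bar b,\bar c}=J^{P}_{\bar a\bar c}J^{P}_{\bar b\bar L}$ for J ≤ L < J + nB
    if (J + nB − 1 < no) $A^{\bar J\bar K}_{\bar b\bar c,P}=T^{\bar J\bar K}_{\bar b,\tilde d}\big(J^{P}_{\bar c,\bar d}\big)^{\top}+J^{P}_{\bar b,\bar d}\big(T^{\bar K\bar J}_{\bar c,\tilde d}\big)^{\top}$
    for I ≥ J
      if (I < J + nB) $v_{cab}\leftarrow T^{\bar K\bar J}_{\bar c,\tilde d}I^{\bar I}_{\bar d,\bar a\bar b}+I^{\bar I}_{\bar c\bar a,\bar d}\big(T^{\bar J\bar K}_{\bar b,\tilde d}\big)^{\top}$
      else $w_{abc}\leftarrow J^{P}_{\bar a\bar I}A^{\bar J\bar K}_{\bar b\bar c,P}$
      Evaluate the remaining 10 terms, which do not include the above two terms [cf. Eq. (B1)], as shown in Algorithm 2
    end for
  end for
end for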

Similarly to the case in Algorithm 2, for each IJK index triplet, the $\langle\bar a\bar b|\bar c\bar L\rangle_q$ integrals are needed for L = I, J, and K = 1 at the same time. Therefore, first, the $\langle\bar a\bar b|\bar c\bar K\rangle_q$ list is assembled only for K = 1 and kept in memory throughout all the steps for the given quadrature point, q. Then the remaining memory space is filled with a block of $\langle\bar a\bar b|\bar c\bar J\rangle_q$ integrals so that they will be available in the innermost loop for the actual J index and for I < J + nB. So far the computational cost is not increased compared to Algorithm 2 since each $\langle\bar a\bar b|\bar c\bar L\rangle_q$ integral is assembled only once. The last missing quantities are those $\langle\bar a\bar b|\bar c\bar I\rangle_q$ integrals that are not available in the case of I ≥ J + nB. Instead of performing the redundant assembly of the missing $\langle\bar a\bar b|\bar c\bar I\rangle_q$ integrals in the innermost loop, their contribution is rearranged as

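$$
I^{\bar I}_{\bar c\bar a,\bar d}\big(T^{\bar J\bar K}_{\bar b,\tilde d}\big)^{\top}+T^{\bar K\bar J}_{\bar c,\tilde d}\,I^{\bar I}_{\bar d,\bar a\bar b}
=J^{P}_{\bar a\bar I}\Big[T^{\bar J\bar K}_{\bar b,\tilde d}\big(J^{P}_{\bar c,\bar d}\big)^{\top}+J^{P}_{\bar b,\bar d}\big(T^{\bar K\bar J}_{\bar c,\tilde d}\big)^{\top}\Big]
=J^{P}_{\bar a\bar I}\,A^{\bar J\bar K}_{\bar b\bar c,P}. \tag{B1}
$$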

We recognize that intermediate $A^{\bar J\bar K}_{\bar b\bar c,P}$ does not depend on I and can be constructed outside the loop for I with a negligible cost proportional to $2n_qn_on_v^3(3n_v)$, assuming that the number of auxiliary functions is around $3n_v$. The contraction of $A^{\bar J\bar K}_{\bar b\bar c,P}$ with $J^{P}_{\bar a\bar I}$ scales as $n_qn_o^2n_v^3(3n_v)/2$, which should be compared to the cost of the corresponding terms in Algorithm 2, that is, $n_qn_o^2n_v^4$. Consequently, in the worst case scenario, when nB = 1, the low-memory algorithm requires only about 14% more operations. The upside is that the full local (T) computation can be performed without resorting to disk I/O (even for these large domains), which is especially advantageous if a local hard disk is not available for each node, as in many of today’s computer clusters.
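The size of this overhead can be estimated with a simple operation-count model. The model below is an assumption of this sketch and not taken from the implementation: it counts the twelve contraction terms of Algorithm 2 per index pair, six over a virtual index (cost nv^4 each) and six over an occupied index (cost no·nv^3 each), uses no^2/2 index pairs, neglects the construction of the A intermediate, and takes the number of auxiliary functions as 3nv. The number of quadrature points cancels in the ratio.

```python
def low_memory_overhead(n_occ, n_virt, aux_per_virt=3.0):
    # Relative extra operation count of the low-memory variant in the worst
    # case (nB = 1), within the assumed cost model described above.
    pairs = n_occ ** 2 / 2.0
    # All twelve terms of Algorithm 2 per quadrature point (assumed model).
    reference = pairs * (6.0 * n_virt ** 4 + 6.0 * n_occ * n_virt ** 3)
    # The two terms that are replaced by the rearranged form of Eq. (B1).
    replaced = pairs * 2.0 * n_virt ** 4
    # Contraction of the A intermediate with the three-center integrals.
    rearranged = pairs * n_virt ** 3 * (aux_per_virt * n_virt)
    return (rearranged - replaced) / reference

overhead = low_memory_overhead(70, 350)   # a large domain with no/nv = 0.2
```

Within this model the overhead reduces to 1/[6(1 + no/nv)], that is, about 14% for no/nv = 0.2 and at most about 17% in the limit of very small no/nv.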

APPENDIX C: FURTHER ALGORITHMIC CONSIDERATIONS

Additional optimization paths with operation number reduction potential are considered in this section that are not exploited presently but might be beneficial in the context of alternative fragmentation based methods.

In the first approach, the factorization of the orbital energy denominators via Laplace transform40,41 or Cholesky decomposition38,39 can be exploited to evaluate the (T) energy in an $n_qn_on_v^5$ scaling algorithm without ever constructing any T3 amplitude or W and V elements explicitly. Although these alternative algorithms offer a factor of $n_o^2/(n_qn_v)$ speedup over the conventional “ijkabc” scheme, their much larger prefactor and the usually unfavorable no/nv ratio have so far prevented their widespread application. Moreover, if these alternative factorizations were applied in the domain calculations, the $n_qn_on_v^5$ scaling would remain, which compares much less favorably to the $n_qn_o^2n_v^4$ scaling “ijkabc” approach.
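The two scaling ratios mentioned here can be made concrete with a small calculation. The sketch considers formal scaling only and ignores the prefactors, which the text identifies as a major practical drawback of the factorized schemes; the function names and the example dimensions are hypothetical.

```python
def factorized_vs_canonical(n_occ, n_virt, n_q):
    # Ratio of n_o^3 n_v^4 (conventional "ijkabc") to n_q n_o n_v^5 (factorized).
    return n_occ ** 2 / (n_q * n_virt)

def factorized_vs_domain_lt(n_occ, n_virt):
    # Ratio of n_q n_o^2 n_v^4 (LT "ijkabc" in a domain) to n_q n_o n_v^5 (factorized).
    return n_occ / n_virt

canonical_ratio = factorized_vs_canonical(30, 300, 3)
domain_ratio = factorized_vs_domain_lt(30, 300)
```

For the illustrative values no = 30, nv = 300, and nq = 3, the formal ratio is 1 in the canonical case and 0.1 in the domain case, so within a domain the factorized scheme would formally require about ten times more operations than the LT “ijkabc” approach.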

Another idea for the operation count reduction of the canonical (T) algorithm was put forward by Noga and co-workers.11,147 The authors recast Eq. (2) by merging the sum of the two types of terms into a supermatrix, and, by the clever rearrangement of the terms, they were able to evaluate the sum of the six permutations in a cost proportional to $n_o^2n_v^3(n_o+n_v)(3n_o+6)$, which is by approximately a factor of 1.8 smaller than the conventional $6n_o^3n_v^3(n_o+n_v)$ value. The application of the idea of Noga et al. to our LT (T) fragment calculation is, unfortunately, much less favorable. Taking into account only the most demanding steps, the formal speedup would be 56+4no1 without considering numerous additional complications arising from the introduction of the LT into this elaborate term rearranging scheme.
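The factor of about 1.8 follows from the ratio of the two canonical operation counts quoted above, in which the common factor no^2·nv^3·(no + nv) cancels. A minimal sketch (the function name is hypothetical):

```python
def noga_reduction(n_occ):
    # 6 n_o^3 n_v^3 (n_o + n_v) divided by n_o^2 n_v^3 (n_o + n_v) (3 n_o + 6)
    return 6.0 * n_occ / (3.0 * n_occ + 6.0)

ratio_18 = noga_reduction(18)
```

The ratio depends only on no, equals 1.8 for no = 18, and approaches 2 for large no.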

We also attempted to reduce the operation count further via screening by looking at the magnitude of the correlation energy contribution of individual occupied index triplets, e.g., $\delta E^{(\mathrm{T})}_{IJk,q}$, where $\delta E^{(\mathrm{T})}_{k}=\sum_q\sum_{I>J}\delta E^{(\mathrm{T})}_{IJk,q}$ [cf. Eq. (23)]. We found that the $\delta E^{(\mathrm{T})}_{IJk,q}/\delta E^{(\mathrm{T})}_{k}$ ratio hardly ever goes below 0.0001. This is in accord with our LIS construction strategy because only those LMOs are correlated at the CC level that form strong pairs with the central LMO. Therefore at least two of the three pairs in the index triplet, namely, Ik and Jk, are strong pairs, which is very similar to the index triplet pre-selection procedure used in other local CCSD(T) methods.82,95,105 Even if a good formula was available for the low-cost estimation of the $\delta E^{(\mathrm{T})}_{IJk,q}$ values, such large contributions could not be simply dropped. A markedly different situation was found by Schmitz and Hättig in the context of their PNO-CCSD(T) scheme.36 Since there is no pre-selection on the basis of strong pair lists in their method, occupied index triplets with marginal contributions are still present and can be screened. For that purpose they constructed a couple of empirical expressions estimating the correlation energy contribution of an index triplet, but we did not attempt to implement those in the present empirical parameter free approach.

REFERENCES

  • 1. Bartlett R. J. and Musiał M., Rev. Mod. Phys. 79, 291 (2007). doi: 10.1103/revmodphys.79.291
  • 2. Kállay M. and Gauss J., J. Chem. Phys. 120, 6841 (2004). doi: 10.1063/1.1668632
  • 3. Kállay M. and Gauss J., J. Chem. Phys. 121, 9257 (2004). doi: 10.1063/1.1805494
  • 4. Raghavachari K., Trucks G. W., Pople J. A., and Head-Gordon M., Chem. Phys. Lett. 157, 479 (1989). doi: 10.1016/s0009-2614(89)87395-6
  • 5. Stanton J. F., Chem. Phys. Lett. 281, 130 (1997). doi: 10.1016/s0009-2614(97)01144-5
  • 6. Deegan M. J. O. and Knowles P. J., Chem. Phys. Lett. 227, 321 (1994). doi: 10.1016/0009-2614(94)00815-9
  • 7. Kobayashi R. and Rendell A. P., Chem. Phys. Lett. 265, 1 (1997). doi: 10.1016/s0009-2614(96)01387-5
  • 8. Anisimov V. M., Bauer G. H., Chadalavada K., Olson R. M., Glenski J. W., Kramer W. T. C., Aprà E., and Kowalski K., J. Chem. Theory Comput. 10, 4307 (2014). doi: 10.1021/ct500404c
  • 9. Harding M. E., Metzroth T., Gauss J., and Auer A. A., J. Chem. Theory Comput. 4, 64 (2008). doi: 10.1021/ct700152c
  • 10. Janowski T. and Pulay P., J. Chem. Theory Comput. 4, 1585 (2008). doi: 10.1021/ct800142f
  • 11. Pitoňák M., Aquilante F., Hobza P., Neogrády P., Noga J., and Urban M., Collect. Czech. Chem. Commun. 76, 713 (2011). doi: 10.1135/cccc2011048
  • 12. Deumens E., Lotrich V. F., Perera A., Ponton M. J., Sanders B. A., and Bartlett R. J., Wiley Interdiscip. Rev.: Comput. Mol. Sci. 1, 895 (2011). doi: 10.1002/wcms.77
  • 13. Peng C., Calvin J. A., Pavošević F., Zhang J., and Valeev E. F., J. Phys. Chem. A 120, 10231 (2016). doi: 10.1021/acs.jpca.6b10150
  • 14. Kaliman I. A. and Krylov A. I., J. Comput. Chem. 38, 842 (2017). doi: 10.1002/jcc.24713
  • 15. DePrince A. E. III, Kennedy M. R., Sumpter B. G., and Sherrill C. D., Mol. Phys. 112, 844 (2014). doi: 10.1080/00268976.2013.874599
  • 16. Asadchev A. and Gordon M. S., J. Chem. Theory Comput. 9, 3385 (2013). doi: 10.1021/ct400054m
  • 17. Eriksen J. J., “Efficient and portable acceleration of quantum chemical many-body methods in mixed floating point precision using OpenACC compiler directives,” Mol. Phys. (published online, 2016). doi: 10.1080/00268976.2016.1271155
  • 18. Langner K. M., Janowski T., Góra R. W., Dziekoński P., Sokalski W. A., and Pulay P., J. Chem. Theory Comput. 7, 2600 (2011). doi: 10.1021/ct200121f
  • 19. Pitoňák M., Janowski T., Neogrády P., Pulay P., and Hobza P., J. Chem. Theory Comput. 5, 1761 (2009). doi: 10.1021/ct900126q
  • 20. Janowski T., Pulay P., Karunarathna A. S., Sygula A., and Saebø S., Chem. Phys. Lett. 512, 155 (2011). doi: 10.1016/j.cplett.2011.07.030
  • 21. Janowski T. and Pulay P., Theor. Chem. Acc. 130, 419 (2011). doi: 10.1007/s00214-011-1009-6
  • 22. Sedlak R., Janowski T., Pitoňák M., Řezáč J., Pulay P., and Hobza P., J. Chem. Theory Comput. 9, 3364 (2013). doi: 10.1021/ct400036b
  • 23. Janowski T. and Pulay P., J. Am. Chem. Soc. 134, 17520 (2012). doi: 10.1021/ja303676q
  • 24. Adamowicz L. and Bartlett R. J., J. Chem. Phys. 86, 6314 (1987). doi: 10.1063/1.452468
  • 25. Neogrády P., Pitoňák M., and Urban M., Mol. Phys. 103, 2141 (2005). doi: 10.1080/00268970500096251
  • 26. Meyer W., J. Chem. Phys. 58, 1017 (1973). doi: 10.1063/1.1679283
  • 27. Ahlrichs R., Lischka H., Staemmler V., and Kutzelnigg W., J. Chem. Phys. 62, 1225 (1975). doi: 10.1063/1.430637
  • 28. Klopper W., Noga J., Koch H., and Helgaker T., Theor. Chem. Acc. 97, 164 (1996). doi: 10.1007/s002140050250
  • 29. Rolik Z. and Kállay M., J. Chem. Phys. 134, 124111 (2011). doi: 10.1063/1.3569829
  • 30. Taube A. G. and Bartlett R. J., J. Chem. Phys. 130, 144112 (2009). doi: 10.1063/1.3115467
  • 31. DePrince A. E. and Sherrill C. D., J. Chem. Theory Comput. 9, 293 (2012). doi: 10.1021/ct300780u
  • 32. DePrince A. E. and Sherrill C. D., J. Chem. Theory Comput. 9, 2687 (2013). doi: 10.1021/ct400250u
  • 33. Epifanovsky E., Zuev D., Feng X., Khistyaev K., Shao Y., and Krylov A. I., J. Chem. Phys. 139, 134105 (2013). doi: 10.1063/1.4820484
  • 34. Rolik Z., Szegedy L., Ladjánszki I., Ladóczki B., and Kállay M., J. Chem. Phys. 139, 094105 (2013). doi: 10.1063/1.4819401
  • 35. Riplinger C., Pinski P., Becker U., Valeev E. F., and Neese F., J. Chem. Phys. 144, 024109 (2016). doi: 10.1063/1.4939030
  • 36. Schmitz G. and Hättig C., J. Chem. Phys. 145, 234107 (2016). doi: 10.1063/1.4972001
  • 37. Schwilk M., Usvyat D., and Werner H.-J., J. Chem. Phys. 142, 121102 (2015). doi: 10.1063/1.4916316
  • 38. Koch H. and Sánchez de Merás A. M., J. Chem. Phys. 113, 508 (2000). doi: 10.1063/1.481910
  • 39. Cacheiro J. L., Pedersen T. B., Fernández B., Sánchez de Merás A., and Koch H., Int. J. Quantum Chem. 111, 349 (2011). doi: 10.1002/qua.22582
  • 40. Constans P., Ayala P. Y., and Scuseria G. E., J. Chem. Phys. 113, 10451 (2000). doi: 10.1063/1.1324989
  • 41. Constans P. and Scuseria G. E., Collect. Czech. Chem. Commun. 68, 357 (2003). doi: 10.1135/cccc20030357
  • 42. Almlöf J., Chem. Phys. Lett. 181, 319 (1991). doi: 10.1016/0009-2614(91)80078-c
  • 43. Häser M., Theor. Chim. Acta 87, 147 (1993). doi: 10.1007/bf01113535
  • 44. Häser M. and Almlöf J., J. Chem. Phys. 96, 489 (1992). doi: 10.1063/1.462485
  • 45. Ayala P. Y. and Scuseria G. E., J. Chem. Phys. 110, 3660 (1999). doi: 10.1063/1.478256
  • 46. Scuseria G. E. and Ayala P. Y., J. Chem. Phys. 111, 8330 (1999). doi: 10.1063/1.480174
  • 47. Doser B., Lambrecht D. S., Kussmann J., and Ochsenfeld C., J. Chem. Phys. 130, 064107 (2009). doi: 10.1063/1.3072903
  • 48. Maurer S. A., Lambrecht D. S., Kussmann J., and Ochsenfeld C., J. Chem. Phys. 138, 014101 (2013). doi: 10.1063/1.4770502
  • 49. Maurer S. A., Lambrecht D. S., Flaig D., and Ochsenfeld C., J. Chem. Phys. 136, 144107 (2012). doi: 10.1063/1.3693908
  • 50. Kobayashi M. and Nakai H., Chem. Phys. Lett. 420, 250 (2006). doi: 10.1016/j.cplett.2005.12.088
  • 51. Surján P. R., Chem. Phys. Lett. 406, 318 (2005). doi: 10.1016/j.cplett.2005.03.024
  • 52. Jung Y., Lochan R. C., Dutoi A. D., and Head-Gordon M., J. Chem. Phys. 121, 9793 (2004). doi: 10.1063/1.1809602
  • 53. Grimme S., Goerigk L., and Fink R. F., Wiley Interdiscip. Rev.: Comput. Mol. Sci. 2, 886 (2012). doi: 10.1002/wcms.1110
  • 54. Winter N. O. C. and Hättig C., J. Chem. Phys. 134, 184101 (2011). doi: 10.1063/1.3584177
  • 55. Schäfer T., Ramberger B., and Kresse G., J. Chem. Phys. 146, 104101 (2017). doi: 10.1063/1.4976937
  • 56. Kaltak M., Klimeš J., and Kresse G., J. Chem. Theory Comput. 10, 2498 (2014). doi: 10.1021/ct5001268
  • 57. Maslen P. E., Dutoi A. D., Lee M. S., Shao Y., and Head-Gordon M., Mol. Phys. 103, 425 (2005). doi: 10.1080/00268970412331319227
  • 58. Nakajima T. and Hirao K., Chem. Phys. Lett. 427, 225 (2006). doi: 10.1016/j.cplett.2006.06.059
  • 59. Kats D. and Schütz M., J. Chem. Phys. 131, 124117 (2009). doi: 10.1063/1.3237134
  • 60. Freundorfer K., Kats D., Korona T., and Schütz M., J. Chem. Phys. 133, 244110 (2010). doi: 10.1063/1.3506684
  • 61. Ledermüller K., Kats D., and Schütz M., J. Chem. Phys. 139, 084111 (2013). doi: 10.1063/1.4818586
  • 62. Ledermüller K. and Schütz M., J. Chem. Phys. 140, 164113 (2014). doi: 10.1063/1.4872169
  • 63. Kállay M., J. Chem. Phys. 142, 204105 (2015). doi: 10.1063/1.4921542
  • 64. Nagy P. R., Samu G., and Kállay M., J. Chem. Theory Comput. 12, 4897 (2016). doi: 10.1021/acs.jctc.6b00732
  • 65. Kjærgaard T., J. Chem. Phys. 146, 044103 (2017). doi: 10.1063/1.4973710
  • 66. Pulay P., Chem. Phys. Lett. 100, 151 (1983). doi: 10.1016/0009-2614(83)80703-9
  • 67. Pulay P. and Saebø S., Theor. Chim. Acta 69, 357 (1986). doi: 10.1007/bf00526697
  • 68. Saebø S. and Pulay P., J. Chem. Phys. 86, 914 (1987). doi: 10.1063/1.452293
  • 69. Saebø S. and Pulay P., Chem. Phys. Lett. 113, 13 (1985). doi: 10.1016/0009-2614(85)85003-x
  • 70. Förner W., Ladik J., Otto P., and Čížek J., Chem. Phys. 97, 251 (1985). doi: 10.1016/0301-0104(85)87035-x
  • 71. Hampel C. and Werner H.-J., J. Chem. Phys. 104, 6286 (1996). doi: 10.1063/1.471289
  • 72. Schütz M., Hetzer G., and Werner H.-J., J. Chem. Phys. 111, 5691 (1999). doi: 10.1063/1.479957
  • 73. Schütz M. and Werner H.-J., J. Chem. Phys. 114, 661 (2001). doi: 10.1063/1.1330207
  • 74. Schütz M. and Werner H.-J., Chem. Phys. Lett. 318, 370 (2000). doi: 10.1016/s0009-2614(00)00066-x
  • 75. Schütz M., J. Chem. Phys. 113, 9986 (2000). doi: 10.1063/1.1323265
  • 76. Hetzer G., Schütz M., Stoll H., and Werner H.-J., J. Chem. Phys. 113, 9443 (2000). doi: 10.1063/1.1321295
  • 77. Schütz M., Phys. Chem. Chem. Phys. 4, 3941 (2002). doi: 10.1039/b203994j
  • 78. Kats D., Korona T., and Schütz M., J. Chem. Phys. 125, 104106 (2006). doi: 10.1063/1.2339021
  • 79. Kats D., Korona T., and Schütz M., J. Chem. Phys. 127, 064107 (2007). doi: 10.1063/1.2755778
  • 80. Werner H.-J., Manby F. R., and Knowles P. J., J. Chem. Phys. 118, 8149 (2003). doi: 10.1063/1.1564816
  • 81. Schütz M. and Manby F. R., Phys. Chem. Chem. Phys. 5, 3349 (2003). doi: 10.1039/b304550a
  • 82. Werner H.-J. and Schütz M., J. Chem. Phys. 135, 144116 (2011). doi: 10.1063/1.3641642
  • 83. Adler T. B., Werner H.-J., and Manby F. R., J. Chem. Phys. 130, 054106 (2009). doi: 10.1063/1.3040174
  • 84. Schütz M., Werner H.-J., Lindh R., and Manby F. R., J. Chem. Phys. 121, 737 (2004). doi: 10.1063/1.1760747
  • 85. Loibl S. and Schütz M., J. Chem. Phys. 137, 084107 (2012). doi: 10.1063/1.4744102
  • 86. Werner H.-J. and Manby F. R., J. Chem. Phys. 124, 054114 (2006). doi: 10.1063/1.2150817
  • 87. Adler T. B. and Werner H.-J., J. Chem. Phys. 135, 144117 (2011). doi: 10.1063/1.3647565
  • 88. Krause C. and Werner H.-J., Phys. Chem. Chem. Phys. 14, 7591 (2012). doi: 10.1039/c2cp40231a
  • 89. Schütz M., J. Chem. Phys. 116, 8772 (2002). doi: 10.1063/1.1470497
  • 90. El Azhary A., Rauhut G., Pulay P., and Werner H.-J., J. Chem. Phys. 108, 5185 (1998). doi: 10.1063/1.475955
  • 91. Rauhut G. and Werner H.-J., Phys. Chem. Chem. Phys. 3, 4853 (2001). doi: 10.1039/b105126c
  • 92. Gauss J. and Werner H.-J., Phys. Chem. Chem. Phys. 2, 2083 (2000). doi: 10.1039/b000024h
  • 93. Korona T. and Werner H.-J., J. Chem. Phys. 118, 3006 (2003). doi: 10.1063/1.1537718
  • 94. Riplinger C. and Neese F., J. Chem. Phys. 138, 034106 (2013). doi: 10.1063/1.4773581
  • 95. Riplinger C., Sandhoefer B., Hansen A., and Neese F., J. Chem. Phys. 139, 134101 (2013). doi: 10.1063/1.4821834
  • 96. Pinski P., Riplinger C., Valeev E. F., and Neese F., J. Chem. Phys. 143, 034108 (2015). doi: 10.1063/1.4926879
  • 97. Werner H.-J., Knizia G., Krause C., Schwilk M., and Dornbach M., J. Chem. Theory Comput. 11, 484 (2015). doi: 10.1021/ct500725e
  • 98. Ma Q. and Werner H.-J., J. Chem. Theory Comput. 11, 5291 (2015). doi: 10.1021/acs.jctc.5b00843
  • 99. Hättig C., Tew D. P., and Helmich B., J. Chem. Phys. 136, 204105 (2012). doi: 10.1063/1.4719981
  • 100. Schmitz G., Helmich B., and Hättig C., Mol. Phys. 111, 2463 (2013). doi: 10.1080/00268976.2013.794314
  • 101. Schmitz G., Hättig C., and Tew D. P., Phys. Chem. Chem. Phys. 16, 22167 (2014). doi: 10.1039/c4cp03502j
  • 102. Yang J., Kurashige Y., Manby F. R., and Chan G. K.-L., J. Chem. Phys. 134, 044123 (2011). doi: 10.1063/1.3528935
  • 103. Kurashige Y., Yang J., Chan G. K.-L., and Manby F. R., J. Chem. Phys. 136, 124106 (2012). doi: 10.1063/1.3696962
  • 104. Yang J., Chan G. K.-L., Manby F. R., Schütz M., and Werner H.-J., J. Chem. Phys. 136, 144105 (2012). doi: 10.1063/1.3696963
  • 105. Schütz M., Yang J., Chan G. K.-L., Manby F. R., and Werner H.-J., J. Chem. Phys. 138, 054109 (2013). doi: 10.1063/1.4789415
  • 106. Rolik Z. and Kállay M., J. Chem. Phys. 135, 104111 (2011). doi: 10.1063/1.3632085
  • 107. Raghavachari K. and Saha A., Chem. Rev. 115, 5643 (2015). doi: 10.1021/cr500606e
  • 108. Flocke N. and Bartlett R. J., J. Chem. Phys. 121, 10935 (2004). doi: 10.1063/1.1811606
  • 109. Friedrich J. and Dolg M., J. Chem. Theory Comput. 5, 287 (2009). doi: 10.1021/ct800355e
  • 110. Kobayashi M. and Nakai H., J. Chem. Phys. 131, 114108 (2009). doi: 10.1063/1.3211119
  • 111. Mochizuki Y., Yamashita K., Nakano T., Okiyama Y., Fukuzawa K., Taguchi N., and Tanaka S., Theor. Chem. Acc. 130, 515 (2011). doi: 10.1007/s00214-011-1036-3
  • 112. Eriksen J. J., Baudin P., Ettenhuber P., Kristensen K., Kjærgaard T., and Jørgensen P., J. Chem. Theory Comput. 11, 2984 (2015). doi: 10.1021/acs.jctc.5b00086
  • 113. Li W., Piecuch P., Gour J. R., and Li S., J. Chem. Phys. 131, 114109 (2009). doi: 10.1063/1.3218842
  • 114. Stoll H., Phys. Rev. B 46, 6700 (1992). doi: 10.1103/physrevb.46.6700
  • 115. Rościszewski K., Doll K., Paulus B., Fulde P., and Stoll H., Phys. Rev. B 57, 14667 (1998). doi: 10.1103/physrevb.57.14667
  • 116. Friedrich J. and Walczak K., J. Chem. Theory Comput. 9, 408 (2013). doi: 10.1021/ct300938w
  • 117. Li W. and Li S., J. Chem. Phys. 121, 6649 (2004). doi: 10.1063/1.1792051
  • 118. Kobayashi M. and Nakai H., J. Chem. Phys. 129, 044103 (2008). doi: 10.1063/1.2956490
  • 119. Ziółkowski M., Jansík B., Kjærgaard T., and Jørgensen P., J. Chem. Phys. 133, 014107 (2010). doi: 10.1063/1.3456535
  • 120. Kristensen K., Høyvik I.-M., Jansík B., Jørgensen P., Kjærgaard T., Reine S., and Jakowski J., Phys. Chem. Chem. Phys. 14, 15706 (2012). doi: 10.1039/c2cp41958k
  • 121. Li W., Ni Z., and Li S., Mol. Phys. 114, 1447 (2016). doi: 10.1080/00268976.2016.1139755
  • 122. Findlater A. D., Zahariev F., and Gordon M. S., J. Phys. Chem. A 119, 3587 (2015). doi: 10.1021/jp509266g
  • 123. Hégely B., Nagy P. R., Ferenczy G. G., and Kállay M., J. Chem. Phys. 145, 064107 (2016). doi: 10.1063/1.4960177
  • 124. Rendell A. P., Lee T. J., and Komornicki A., Chem. Phys. Lett. 178, 462 (1991). doi: 10.1016/0009-2614(91)87003-t
  • 125. Lee T. J., Rendell A. P., and Taylor P. R., J. Phys. Chem. 94, 5463 (1990). doi: 10.1021/j100377a008
  • 126. Rendell A. P., Lee T. J., Komornicki A., and Wilson S., Theor. Chim. Acta 84, 271 (1993). doi: 10.1007/bf01113267
  • 127. Kállay M., Rolik Z., Csontos J., Ladjánszki I., Szegedy L., Ladóczki B., Samu G., Petrov K., Farkas M., Nagy P., Mester D., and Hégely B., MRCC, a quantum chemical program suite, release date April 12, 2017, see also Ref. 34 as well as http://www.mrcc.hu/.
  • 128. Li W., Piecuch P., and Gour J. R., “Linear scaling local correlation extensions of the standard and renormalized coupled-cluster methods,” in Advances in the Theory of Atomic and Molecular Systems: Conceptual and Computational Advances in Quantum Chemistry, edited by Piecuch P., Maruani J., Delgado-Barrio G., and Wilson S. (Springer Netherlands, Dordrecht, 2009), pp. 131–195.
  • 129. Kállay M., J. Chem. Phys. 141, 244113 (2014). doi: 10.1063/1.4905005
  • 130. Takatsuka A., Ten-no S., and Hackbusch W., J. Chem. Phys. 129, 044112 (2008). doi: 10.1063/1.2958921
  • 131. Nagy P. R., Surján P. R., and Szabados Á., Theor. Chem. Acc. 132, 1109 (2012). doi: 10.1007/s00214-012-1109-y
  • 132. Tóth Z., Nagy P. R., Jeszenszki P., and Szabados Á., Theor. Chem. Acc. 134, 100 (2015). doi: 10.1007/s00214-015-1703-x
  • 133. We note that there is a typo in the analogous Eqs. (30) and (34) of Ref. 34: the factor of 1/3 in front of the third summation sign should be 1.
  • 134. Neese F., Wennmohs F., and Hansen A., J. Chem. Phys. 130, 114108 (2009). doi: 10.1063/1.3086717
  • 135. Dunning T. H., Jr., J. Chem. Phys. 90, 1007 (1989). doi: 10.1063/1.456153
  • 136. Kendall R. A., Dunning T. H., Jr., and Harrison R. J., J. Chem. Phys. 96, 6796 (1992). doi: 10.1063/1.462569
  • 137. Woon D. E. and Dunning T. H., Jr., J. Chem. Phys. 98, 1358 (1993). doi: 10.1063/1.464303
  • 138. Weigend F. and Ahlrichs R., Phys. Chem. Chem. Phys. 7, 3297 (2005). doi: 10.1039/b508541a
  • 139. Peterson K. A. and Puzzarini C., Theor. Chem. Acc. 114, 283 (2005). doi: 10.1007/s00214-005-0681-9
  • 140. Figgen D., Rauhut G., Dolg M., and Stoll H., Chem. Phys. 311, 227 (2005). doi: 10.1016/j.chemphys.2004.10.005
  • 141. Weigend F., J. Comput. Chem. 29, 167 (2008). doi: 10.1002/jcc.20702
  • 142. Weigend F., Häser M., Patzelt H., and Ahlrichs R., Chem. Phys. Lett. 294, 143 (1998). doi: 10.1016/s0009-2614(98)00862-8
  • 143. Weigend F., Köhn A., and Hättig C., J. Chem. Phys. 116, 3175 (2002). doi: 10.1063/1.1445115
  • 144. Foster J. M. and Boys S. F., Rev. Mod. Phys. 32, 300 (1960). doi: 10.1103/revmodphys.32.300
  • 145. Helmich-Paris B. and Visscher L., J. Comput. Phys. 321, 927 (2016). doi: 10.1016/j.jcp.2016.06.011
  • 146. Kats D., Usvyat D., Loibl S., Merz T., and Schütz M., J. Chem. Phys. 130, 127101 (2009). doi: 10.1063/1.3092982
  • 147. Noga J. and Valiron P., Mol. Phys. 103, 2123 (2005). doi: 10.1080/00268970500131140

Articles from The Journal of Chemical Physics are provided here courtesy of American Institute of Physics
