Abstract
Advances in computational chemistry create an ongoing need for larger and higher-quality datasets that characterize noncovalent molecular interactions. We present three benchmark collections of quantum mechanical data, covering approximately 3,700 distinct types of interacting molecule pairs. The first collection, which we refer to as DES370K, contains interaction energies for more than 370,000 dimer geometries. These were computed using the coupled-cluster method with single, double, and perturbative triple excitations [CCSD(T)], which is widely regarded as the gold-standard method in electronic structure theory. Our second benchmark collection, a core representative subset of DES370K called DES15K, is intended for more computationally demanding applications of the data. Finally, DES5M, our third collection, comprises interaction energies for nearly 5,000,000 dimer geometries; these were calculated using SNS-MP2, a machine learning approach that provides results with accuracy comparable to that of our coupled-cluster training data. These datasets may prove useful in the development of density functionals, empirically corrected wavefunction-based approaches, semi-empirical methods, force fields, and models trained using machine learning methods.
Subject terms: Scientific data, Quantum chemistry, Computational models, Quantum mechanics, Computational biophysics
Measurement(s) | Molecular Interaction Process • interaction energy • energy |
Technology Type(s) | ab initio quantum chemistry computational method |
Factor Type(s) | molecular entity |
Machine-accessible metadata file describing the reported data: 10.6084/m9.figshare.13521638
Background & Summary
Noncovalent interactions are essential determinants of the properties of molecular liquids and crystals, solvation effects, and the structure and function of biomolecules. Experimental means of quantifying individual noncovalent interactions are limited to small systems with relatively rigid intramolecular degrees of freedom1, and computer simulations offer a much-needed alternative; quantum mechanical (QM) calculations, for example, enable the characterization of noncovalent interactions with high accuracy. Among QM-based approaches, the use of coupled-cluster singles and doubles with perturbative triples [CCSD(T)]2–4 at the complete basis set (CBS) limit is widely recognized as the gold-standard method for noncovalent interactions4.
High-accuracy QM methods come with an intrinsically high cost; CCSD(T), for example, scales as $O(N^7)$ with system size. Publicly available databases5–14 offer a way to amortize this cost over a large user community, thus reducing the burden on individual researchers. Such databases serve as recognized benchmarks, and are indispensable resources for both accuracy assessment and parameterization of more affordable QM approximations such as exchange-correlation functionals12–17 in the density functional theory framework, empirically corrected wavefunction-based approaches18–23, and semi-empirical methods24–27 (for a comprehensive review, see summary works28,29). Benchmark-quality QM data, often in combination with experimental data, also feature prominently in the development of many empirical molecular mechanics–based models (so-called “force fields”)30–34. Diverse, extensive, and consistent collections of high-quality data, moreover, can enable powerful machine learning approaches to be leveraged for molecular modeling35–41.
Here we present three benchmark databases of quantum chemical data, including the full Cartesian coordinates of the associated geometries42. The first is DES370K, a database of dimer interaction energies computed at the CCSD(T)/CBS level of theory. This database features 370,959 unique geometries for 3,691 distinct dimers, which represent 392 closed-shell chemical species (both neutral molecules and ions) including, but not limited to, water and the functional groups found in proteins. An important subset of the data in the DES370K collection consists of QM-optimized dimer structures, which were used as starting points to generate additional structures along one-dimensional radial profiles. To enhance orientational diversity and ensure adequate sampling of the internal degrees of freedom in the larger chemical species, the dataset also includes a large ensemble of structures (and corresponding radial profiles) obtained from molecular dynamics (MD) simulations (Table 1). Because many potential applications of the presented data, such as parameterizing a new exchange-correlation functional, are computationally demanding, we additionally compiled DES15K, a core subset of the most representative structures from DES370K that largely retains the chemical and orientational diversity of DES370K, but with reduced resolution of scan points in the radial profiles (Table 1).
Table 1.
Database | Protocol | Monomers | Dimers | Groups | Dimer geometries |
---|---|---|---|---|---|
DES370K | Dimer scans based on QM optimization (a) | 166 | 3,436 | 3,476 | 97,368 |
DES370K | Dimer scans based on MD configurations (b) | 382 | 466 | 6,133 | 166,914 |
DES370K | Homodimer single points based on MD configurations (c) | 91 | 91 | 910 | 42,201 |
DES370K | Heterodimer single points based on MD configurations (d) | 261 | 261 | 2,150 | 64,476 |
DES370K | Total | 392 | 3,691 | 12,669 | 370,959 |
DES15K (subset of DES370K) | Dimer scans based on QM optimization (a) | 159 | 3,052 | 3,052 | 12,183 |
DES15K (subset of DES370K) | Dimer scans based on MD configurations (b) | 137 | 206 | 1,929 | 2,468 |
DES15K (subset of DES370K) | Total | 159 | 3,052 | 4,981 | 14,651 |
DES5M | Dimer scans based on QM optimization (a) | 153 | 2,826 | 71,847 | 2,404,926 |
DES5M | Dimer scans based on MD configurations (b) | 159 | 328 | 47,648 | 1,646,832 |
DES5M | Homodimer single points based on MD configurations (c) | 138 | 138 | 12,983 | 464,951 |
DES5M | Heterodimer single points based on MD configurations (d) | 163 | 163 | 14,641 | 439,229 |
DES5M | Total | 206 | 2,967 | 147,119 | 4,955,938 |
For each database, we list the protocols employed to generate particular subsets of the data, counts associated with those subsets, and the total count across subsets. The counts shown are the number of chemically distinct monomer types (“Monomers”); the number of chemically distinct dimer types (“Dimers”); the number of groups (“Groups”), where a group is a set of connected calculations, such as those from a radial profile under a dimer-scan protocol or those from a single MD frame under a single-point protocol; and the total number of dimer calculations (i.e., entries in the database) (“Dimer geometries”).
(a) Reference dimer geometries were identified using QM optimization and used to construct a group of radial scan–based geometries.
(b) Reference dimer geometries were extracted from MD simulations of neat liquids and solvated monomers and used to construct a group of radial scan–based geometries.
(c) Reference multimer geometries were extracted from MD simulations of neat liquids and decomposed into a group of single-point dimer geometries.
(d) Reference multimer geometries were extracted from MD simulations of solvated monomers and decomposed into a group of single-point dimer geometries.
The DES370K collection was the source of both training and test data for a machine learning method, SNS-MP2, which we have described in full detail elsewhere39. Briefly, the SNS-MP2 approach combines the spin-component-scaled second-order Møller-Plesset perturbation theory (MP2) method43 with a neural network to predict per-conformer same-spin and opposite-spin energy scaling coefficients. We found39 that for dimer interaction energies, the SNS-MP2 method offers—at a greatly reduced cost—accuracy comparable to that of the CCSD(T)/CBS approach used to obtain the benchmark data in DES370K. The SNS-MP2 neural network also provides per-conformer confidence intervals for the predicted interaction energies39.
Using the SNS-MP2 approach, we generated DES5M, a database of predicted gold-standard dimer interaction energies and their associated confidence intervals (Table 1). The DES5M collection contains 4,955,938 additional unique geometries originating from the same two sources as were used for DES370K: radial profiles starting from a set of QM-optimized conformers and dimer geometries extracted from MD simulations. Both the DES5M and DES370K databases also include the full set of MP2-based QM observables that serve as inputs to the SNS-MP2 procedure39, thereby allowing for the parameterization and evaluation of other SNS-MP2-like models.
We expect that these three databases will serve as valuable benchmarks for a variety of approximate methods in computational chemistry.
Methods
Generation of monomer geometries
Input monomers were specified in the simplified molecular-input line-entry system (SMILES) string format44. Hydrogen atoms were added and initial three-dimensional (3D) conformations were generated using the Open Babel45 software package. The geometry was then optimized using the OPLS_2005 force field46, starting from a large number of perturbed initial structures (with dihedral angles sampled randomly from a uniform distribution over the range ±180° and out-of-plane angles sampled randomly from a uniform distribution over the range ±30°), to identify a set of unique stable conformers for each monomer.
Our intent was to use the QM data to fit force fields for MD simulation, and so we followed the common practice of constraining bonds to hydrogen atoms and valence angles involving two hydrogens to predefined target values. These constraints lead to more stable MD simulations, thus enabling the use of larger time steps. For a bond length, the target value was derived as a sum of three contributions: the equilibrium distance ($R_e$); a vibrational correction, which accounts for the anharmonicity of the stretch potential; and a correction to account for condensed-phase effects in water. The vibrational correction was estimated by approximating the monomer energy with a Morse potential47 as a function of the bond length, $E(R) = D_e\left[1 - e^{-\alpha (R - R_e)}\right]^2$, where $D_e$ and $\alpha$ are fitted parameters that control the well depth and width of the potential, respectively. The equation for the Morse potential leads to the following relationship between the equilibrium ($R_e$) and vibrationally averaged ($R_g$) bond lengths: $R_g = R_e + \frac{3\hbar}{4\sqrt{2\mu D_e}}$, where $\hbar$ denotes the reduced Planck constant and $\mu$ the reduced mass of the two bonded atoms. The condensed-phase effects were estimated from the energy derivative along the stretch coordinate in a system containing the molecule of interest surrounded by a 4-Å-thick shell of solvent molecules (typically consisting of 16–26 waters). Constraint targets for valence angles were derived from their equilibrium values corrected for the condensed-phase effect. We omitted angle vibrational corrections, which we expected to have only a small impact on intermolecular interactions.
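As a rough illustration of the vibrational correction, the sketch below evaluates a Morse potential and the leading-order ground-state shift $R_g - R_e = 3\hbar / (4\sqrt{2\mu D_e})$. This closed form is the standard first-order perturbative result for a Morse oscillator and is assumed here rather than taken from the source; the function names, the unit conventions, and the example well depth are all hypothetical.

```python
import math

HBAR = 1.054571817e-34                    # reduced Planck constant, J s
AMU = 1.66053906660e-27                   # atomic mass unit, kg
KCAL_MOL_TO_J = 4184.0 / 6.02214076e23    # kcal/mol -> J per molecule


def morse_energy(r, r_e, d_e, alpha):
    """Morse potential E(R) = D_e * (1 - exp(-alpha * (R - R_e)))**2."""
    return d_e * (1.0 - math.exp(-alpha * (r - r_e))) ** 2


def vibrational_shift_angstrom(d_e_kcal_mol, m1_amu, m2_amu):
    """Leading-order shift R_g - R_e (assumed form), returned in angstroms."""
    mu = (m1_amu * m2_amu) / (m1_amu + m2_amu) * AMU   # reduced mass, kg
    d_e = d_e_kcal_mol * KCAL_MOL_TO_J                 # well depth, J
    shift_m = 3.0 * HBAR / (4.0 * math.sqrt(2.0 * mu * d_e))
    return shift_m * 1.0e10


# Illustrative C-H bond with a made-up well depth of 110 kcal/mol.
shift_ch = vibrational_shift_angstrom(110.0, 12.000, 1.008)
```

In this approximation the shift depends only on the well depth and the reduced mass, so replacing hydrogen with deuterium gives a smaller correction.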
We subjected our set of force field–derived monomer conformations to QM-geometry optimization, applying constraints to hydrogen-containing bond lengths and angles, at the MP2 level of theory using the density-fitting, local, and frozen-core approximations (DF-LMP2)48–58 in the MOLPRO 2012 quantum software package (http://www.molpro.net)59 with a triple-zeta, correlation-consistent basis set (aVTZ). (A detailed description of this basis set and of all other basis sets used in this study, including the double-zeta (aVDZ) and quadruple-zeta (aVQZ) variants of aVTZ, is provided in the Supplementary Information.)60–76 The resulting set of unique monomer conformations was the starting point for the generation of QM-based dimer geometries.
Generation of QM-based dimer geometries
Dimer geometries were initially optimized with the OPLS_200546 force field starting from randomly generated relative monomer positions and orientations; the monomer conformations themselves were randomly selected from the corresponding set of QM-optimized structures (described above). The monomers were kept rigid during both this step and all subsequent QM dimer optimization steps. We identified unique dimer minima from this set and then optimized them using a two-step QM procedure: first at the relatively inexpensive DF-LMP2/aVDZ level of theory, then at the DF-MP2/aVTZ level (the convergence threshold for rigid-body optimization was $10^{-4}$ a.u. in both the center-of-mass gradient and torque). We note that because these minima are seeded from an empirical force field, we do not expect to necessarily recapitulate the global minimum as captured by a higher-level QM method. The set of unique QM-optimized dimer geometries served as starting points for one-dimensional radial scans along an intermolecular axis in 0.1-Å steps, probing separations that were either more compact (i.e., with the shortest intermolecular contact reaching ~1 Å) or more distant (i.e., up to 5 Å more distant than the reference). The internal monomer geometries were preserved when constructing these scans. The intermolecular axis was defined as the line connecting weighted atomic centers of the two molecules, with the weight for each atom defined as $C/R^6$, where $R$ is the distance to the nearest atom from the other molecule (coefficient $C$ is 1.0 for heavy atoms and 0.1 for hydrogens). Such a definition successfully reproduces, in an automated way, intuitively expected dissociation directions for both nonpolar complexes and hydrogen-bonded dimers; for example, in the latter case the two monomer centers reside in the vicinity of the donor and acceptor atoms.
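The axis definition can be sketched in a few lines. The code below is a minimal illustration, not the authors' implementation: it assumes each molecule is given as a list of (element, (x, y, z)) tuples in ångströms, and the function names are hypothetical. Each atom is weighted by $C/R^6$, with $R$ the distance to the nearest atom of the other molecule and $C$ equal to 1.0 for heavy atoms and 0.1 for hydrogens.

```python
import math


def weighted_center(mol, other):
    """Center of `mol`, with each atom weighted by C / R**6."""
    total_w = 0.0
    center = [0.0, 0.0, 0.0]
    for elem, pos in mol:
        # R: distance to the nearest atom of the other molecule
        r_min = min(math.dist(pos, q) for _, q in other)
        w = (0.1 if elem == "H" else 1.0) / r_min ** 6
        total_w += w
        for k in range(3):
            center[k] += w * pos[k]
    return [c / total_w for c in center]


def scan_axis(mol_a, mol_b):
    """Unit vector pointing from the weighted center of A to that of B."""
    ca = weighted_center(mol_a, mol_b)
    cb = weighted_center(mol_b, mol_a)
    d = [b - a for a, b in zip(ca, cb)]
    norm = math.sqrt(sum(x * x for x in d))
    return [x / norm for x in d]


def displace(mol, axis, shift):
    """Rigidly translate `mol` by `shift` angstroms along `axis`."""
    return [(e, tuple(p[k] + shift * axis[k] for k in range(3))) for e, p in mol]
```

A radial scan then amounts to calling `displace` on one monomer with shifts in multiples of 0.1 Å while leaving both internal geometries unchanged.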
Generation of MD-based dimer geometries
To more closely mimic biologically relevant physical conditions, we derived a large set of dimer geometries from condensed-phase MD simulations. For a given molecule, two types of simulations were run (both with the OPLS_2005 force field and MD sampling under the NVT ensemble using the Desmond software package)77,78: First, a neat liquid was simulated at the temperature closest to 298 K under which the system remains a liquid at atmospheric pressure, with the density set to the experimentally determined value for that liquid; second, a single solute molecule solvated in a cubic water box (30 Å × 30 Å × 30 Å) was simulated at 298 K and a pre-solute density of 0.997 g cm⁻³. Dimer configurations were extracted from the MD simulation frames and clustered as follows: (i) randomly select an MD dimer configuration as a center of the first cluster and remove from the ensemble M/N structures closest to the center, where M is the number of MD configurations and N is the desired number of clusters; (ii) select as a center of the second cluster the configuration most distant from the first center and remove from the remaining unassigned ensemble M/N structures closest to the second center; (iii) repeat step (ii) until N centers are selected based on the largest distance to the closest previously selected center. The distance between two conformers is defined according to the bag-of-bonds79 approach. Such a procedure achieves the twin objective of obtaining samples that are both representative and diverse. These dimer configurations were then used to generate radial scans following the same protocol as for QM-optimized conformers.
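Steps (i) to (iii) can be sketched as a farthest-point selection with neighbor removal. The version below is an illustrative reading of the procedure, not the authors' code: the distance function is supplied by the caller (the source uses a bag-of-bonds distance, which is not reproduced here), the center itself is assumed to count among the M/N removed structures, and the function name and `seed` parameter are hypothetical.

```python
import random


def select_cluster_centers(items, n_clusters, distance, seed=0):
    """Return indices of up to n_clusters diverse, representative items."""
    rng = random.Random(seed)
    remaining = list(range(len(items)))
    per_cluster = max(1, len(items) // n_clusters)   # M / N
    centers = []
    current = rng.choice(remaining)                  # step (i): random first center
    while True:
        centers.append(current)
        # Remove the M/N unassigned structures closest to the new center
        # (the center itself has distance zero, so it is removed too).
        remaining.sort(key=lambda i: distance(items[current], items[i]))
        remaining = remaining[per_cluster:]
        if len(centers) == n_clusters or not remaining:
            break
        # Steps (ii)-(iii): next center is the unassigned structure whose
        # closest previously selected center is farthest away.
        current = max(
            remaining,
            key=lambda i: min(distance(items[c], items[i]) for c in centers),
        )
    return centers
```

Removing each center's nearest neighbors keeps later centers out of regions that are already represented, while the max-min rule pushes them toward unexplored configurations.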
Multimer configurations were typically extracted from the same MD simulation frames. The multimer configurations extracted from neat liquid simulations were decomposed into the set of all possible homodimer geometries, and those extracted from the water-solvated monomer simulations were decomposed into the set of all possible heterodimer geometries (unless water was both the solute and solvent, in which case water dimer geometries were generated). These multimer-derived dimer configurations were used in single-point QM calculations (not used to seed radial scans).
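The decomposition step is simple pair enumeration. The sketch below assumes each molecule in a multimer is represented as a (name, coordinates) tuple, which is a hypothetical data layout chosen for illustration; the function name is likewise an assumption.

```python
from itertools import combinations


def decompose_multimer(multimer, kind):
    """Enumerate dimers from a multimer of (name, coords) molecules.

    kind="homo" keeps pairs of identical species (neat-liquid case);
    kind="hetero" keeps pairs of different species (solvated-monomer case).
    """
    if kind not in ("homo", "hetero"):
        raise ValueError("kind must be 'homo' or 'hetero'")
    pairs = combinations(multimer, 2)
    if kind == "homo":
        return [(a, b) for a, b in pairs if a[0] == b[0]]
    return [(a, b) for a, b in pairs if a[0] != b[0]]
```

A neat-liquid multimer of n molecules yields n(n-1)/2 homodimers, whereas a solute surrounded by k waters yields k heterodimers.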
QM calculation of dimer interaction energies
For all dimer geometries (including at every point along each radial scan), the interaction energy was computed at the DF-MP2/aVQZ level of theory and counterpoise-corrected for basis-set-superposition error (BSSE)80. The resulting MP2 interaction energies form the basis of all datasets presented herein.
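For reference, the counterpoise scheme is conventionally written as below. The notation is ours rather than the source's: subscripts label the system (dimer AB or monomer A or B) and the superscript indicates that every term is evaluated in the full dimer basis, with each monomer held in the geometry it adopts in the dimer.

```latex
E_{\mathrm{int}}^{\mathrm{CP}}
  = E_{AB}^{AB} - E_{A}^{AB} - E_{B}^{AB}
```

Because the monomer energies are computed with ghost functions on the partner's atoms, the artificial stabilization from borrowed basis functions cancels between the dimer and monomer terms.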
The DES370K dataset, which includes CCSD(T) interaction energies, was constructed using the QM- and MD-based protocols for generating dimer geometries described above, but with a more limited set of conformers. QM-derived dimer configurations were restricted to the scans containing the most stable dimer structure for each chemically distinct dimer type. In the case of MD-derived dimer configurations, the number of scans and multimers was limited to ~10 for each chemically distinct dimer type included in the dataset. We excluded the most compact, and thus very repulsive, conformers from all scans.
For each dimer in the DES370K dataset, we calculated a benchmark CCSD(T) interaction energy by using the “gold-standard” method of combining canonical MP2 energies extrapolated to the CBS limit with the difference between the CCSD(T) and MP2 energy estimated in a smaller basis set5. For MP2/CBS extrapolation, we used a two-point extrapolation81 of DF-MP2/aVTZ and DF-MP2/aVQZ counterpoise-corrected interaction energies. The post-MP2 interaction energy correction (denoted ΔCCSD(T)) was estimated by the difference between counterpoise-corrected CCSD(T) and MP2 interaction energies in the largest basis set that we could afford; this basis set varied from aVQZ for the smallest systems (e.g., a water dimer) to aVDZ for the largest (e.g., a phenol dimer).
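The composite scheme can be sketched as follows. The text above does not spell out the two-point formula, so the widely used $X^{-3}$ form with cardinal numbers 3 (aVTZ) and 4 (aVQZ) is assumed here, and it is applied directly to the supplied interaction energies; the function names are hypothetical.

```python
def extrapolate_two_point(e_small, e_large, x_small=3, x_large=4):
    """Two-point X**-3 extrapolation to the CBS limit (assumed form)."""
    w_small, w_large = x_small ** 3, x_large ** 3
    return (w_large * e_large - w_small * e_small) / (w_large - w_small)


def composite_ccsdt_cbs(mp2_tz, mp2_qz, ccsdt_small, mp2_small):
    """MP2/CBS plus the post-MP2 correction from a smaller basis set.

    All inputs are counterpoise-corrected interaction energies in the
    same units; ccsdt_small and mp2_small must share one basis set.
    """
    delta_ccsdt = ccsdt_small - mp2_small
    return extrapolate_two_point(mp2_tz, mp2_qz) + delta_ccsdt
```

The correction term converges faster with basis-set size than the MP2 energy itself, which is why it can be evaluated in a smaller basis than the one used for the extrapolation.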
Figure 1 shows heatmaps of dimer counts and ΔCCSD(T) for DES370K, grouped according to the molecule class of the two monomers. (A full list of SMILES strings assigned to each molecule class is provided in the Supplementary Information.)
SNS-MP2 predictions
For every dimer included in DES5M, the interaction energy was computed using our SNS-MP2 approach, described in full detail elsewhere39. In addition to predicting an energy value, SNS-MP2 quantifies the uncertainty of that prediction: Each SNS-MP2 energy is accompanied by the upper and lower bounds of a 90% confidence interval associated with that prediction.
Calculation of other QM quantities
Beyond MP2 energies, SNS-MP2 requires additional features that encode the interaction in a geometry-independent manner, leveraging commonalities between chemically disparate dimers. An account of all inputs to the neural network can be found elsewhere39; here we describe only these additional quantities, which are included as entries in all three datasets. All of the quantities below are calculated automatically when using the SNS-MP2 plugin39 (https://github.com/DEShawResearch/sns-mp2), which relies on the Psi4 quantum chemistry software package82.
For each monomer in our set of dimers, we calculated the Hartree-Fock (HF) wavefunction (and thus density matrix) and the MP2 density matrix in the monomer basis. From these quantities, we calculated three properties of the dimer interaction: the classical electrostatic interaction energy, the Heitler-London energy, and the density-matrix overlap.
For each dimer, we calculated the following SAPT0/aVTZ energy components82–86: the second-order dispersion, induction, exchange-dispersion, and exchange-induction energies; the same-spin component of the second-order dispersion and exchange-dispersion energies; the first-order electrostatic and exchange energies; and the first-order exchange energy computed in the $S^2$ approximation.
Figure 2 portrays a SAPT interaction energy analysis of the DES370K and DES15K datasets.
Core subset DES15K
DES15K is a core subset of the most representative structures from DES370K, and was assembled with a focus on retaining the chemical and orientational diversity of DES370K. DES15K consists of dimer configurations from both QM- and MD-derived scans, though with reduced resolution of scan points in the radial profiles. In the case of QM-optimized dimers, up to four conformers were selected, all from the radial scan corresponding to the most stable QM-optimized structure. These conformers correspond to the minimum, a point less compact than the minimum, the zero-crossing point at a more compact structure (if the minimum is not too deep), and a point representative of the repulsive wall at short distances. The exact definition of these points is specified in Fig. 3, which shows a typical dimer interaction energy profile, highlighting points along the scan that are included in DES15K. The DES15K dimers extracted from MD simulations include up to 10 conformers and, in addition to the MD-observed dimer, the minima along the corresponding radial scans that are at least half as deep as the most stable QM-optimized configuration. Based on these selection criteria, the MD-based component of DES15K includes only dimers for which we have the corresponding QM-optimized structure and sufficiently attractive scans. DES15K thus features a smaller set of monomers than DES370K, though most of the removed monomers are alkylated forms of the monomers still included in DES15K (and so the chemical diversity of the dimers, that is, at the level of functional-group interactions, is largely maintained).
Data Records
Datasets are provided as CSV files (one file each for DES370K, DES15K, DES5M, DESS66, and DESS66x8) in a Figshare data repository42. A table providing column names and a description of the contents of each column can be found in the Supplementary Information. The Pandas87, SciPy, and NumPy88 packages were used in data processing and packaging for the CSV files.
Technical Validation
Validation of monomer geometries
Employing an established strategy89, we used molecular graph connectivity to confirm that the final set of dimer conformers had not undergone extreme or unintended changes in geometry during any stage of the protocols used to generate those geometries. Connectivity for each dimer, based on the original SMILES string, was compared against the graph assigned by Open Babel45 from the final atom positions of each conformer. Bond order, formal charge, and stereochemistry were ignored in order to avoid ambiguities in molecular graph construction. That is, we verified that the molecular graphs are isomorphic (i.e., they have identical edges between nodes labeled by element). We also calculated monomer energies with the OPLS_2005 force field46 and rejected dimers that included any monomer with excitation energy >30 kcal mol⁻¹.
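The isomorphism test can be illustrated with a small backtracking search over element-labeled graphs. This is a self-contained sketch, not the tooling used in the study: it assumes the bond lists have already been perceived (the source relies on Open Babel for that step), and the function names are hypothetical.

```python
def _adjacency(n_atoms, bonds):
    """Build a list of neighbor sets from (i, j) bond index pairs."""
    adj = [set() for _ in range(n_atoms)]
    for i, j in bonds:
        adj[i].add(j)
        adj[j].add(i)
    return adj


def graphs_isomorphic(elements_a, bonds_a, elements_b, bonds_b):
    """True if the two element-labeled molecular graphs are isomorphic."""
    n = len(elements_a)
    if n != len(elements_b) or len(bonds_a) != len(bonds_b):
        return False
    if sorted(elements_a) != sorted(elements_b):
        return False
    adj_a, adj_b = _adjacency(n, bonds_a), _adjacency(n, bonds_b)
    mapping, used = {}, set()

    def extend(i):
        if i == n:
            return True
        for j in range(n):
            if j in used or elements_a[i] != elements_b[j]:
                continue
            if len(adj_a[i]) != len(adj_b[j]):
                continue
            # Atom i is bonded to an already-mapped atom k exactly when
            # their images are bonded in the second graph.
            if all((k in adj_a[i]) == (mapping[k] in adj_b[j]) for k in range(i)):
                mapping[i] = j
                used.add(j)
                if extend(i + 1):
                    return True
                del mapping[i]
                used.discard(j)
        return False

    return extend(0)
```

Bond order, charge, and stereochemistry do not enter the comparison, mirroring the simplification described above; for large molecules a dedicated graph library would be a better choice than this exhaustive search.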
Comparison to the S66 and S66x8 datasets
We applied the present protocol for estimating CCSD(T)/CBS dimer interaction energies to all 66 conformers from the S66 dataset9 and to all 528 conformers from the S66x8 dataset8, using the reference geometries in both cases. The results of these calculations are provided in the DESS66 and DESS66x8 datasets, respectively. We found good correspondence between our estimates and the values reported in Tables 6 and 7 of Kesharwani et al.90: mean unsigned errors of 0.07 kcal mol⁻¹ (S66) and 0.05 kcal mol⁻¹ (S66x8) and mean signed errors of −0.02 kcal mol⁻¹ (S66) and −0.01 kcal mol⁻¹ (S66x8).
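The two summary statistics are straightforward to compute. In the sketch below the sign convention (estimate minus reference) is an assumption, since the text does not state it, and the function names and sample values are purely illustrative.

```python
def mean_signed_error(estimates, references):
    """Average of (estimate - reference); the sign convention is assumed."""
    diffs = [e - r for e, r in zip(estimates, references)]
    return sum(diffs) / len(diffs)


def mean_unsigned_error(estimates, references):
    """Average of |estimate - reference|."""
    diffs = [abs(e - r) for e, r in zip(estimates, references)]
    return sum(diffs) / len(diffs)


# Made-up interaction energies in kcal/mol, for illustration only.
example_estimates = [-4.9, -2.1, -0.5]
example_references = [-5.0, -2.0, -0.5]
example_mse = mean_signed_error(example_estimates, example_references)
example_mue = mean_unsigned_error(example_estimates, example_references)
```

A signed error near zero alongside a larger unsigned error indicates deviations that scatter in both directions rather than a systematic offset.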
Comparison of bond-length constraints to published experimental data
To validate the present protocol for determining the values for bond constraints involving hydrogen atoms, we compared the values to published experimental data91. For the methyl (CH3) and methylene (CH2) functional groups, our values for the CH bond length—1.101 Å and 1.107 Å, respectively—compare favorably with the experimental distribution, which features a median value of 1.107 Å. We observed similarly good agreement between computed (1.098 Å) and experimental (median of 1.094 Å) bond lengths between hydrogen and aromatic carbon. The present approach yields values for NH bond lengths in primary and secondary amines of 1.026 Å and 1.028 Å, respectively, which are close to the experimentally measured length of 1.021 (±0.006) Å91. For bonds between hydrogen and oxygen, experimental uncertainties are even larger. For alcohols, our protocol yielded a bond length of 0.976 Å before the condensed-phase correction and 0.980 Å after the correction; the former value compares very well with the experimentally determined bond length of 0.975 (±0.010) Å in methanol91. In carboxylic acids, our approach yields a bond length of 0.983 Å before the condensed-phase correction and 0.996 Å after the correction; the former is comparable to the experimentally measured value of 0.981 (±0.003) Å for formic acid.
Ionic system tests
The DES370K and DES5M collections include two types of ionic systems: dimers with only one of the monomers carrying a charge (−1, +1, or +2) and the other monomer neutral, and salts composed of a monovalent cation (+1) and a monovalent anion (−1). At large separations and in the absence of solvent, the desired biologically relevant monomer charges frequently do not represent the ground state. As a precautionary step, we clipped radial scans containing salts or divalent cations to separations between −0.5 and 0.5 Å from the reference structure used to seed the scan. We performed a natural population analysis (NPA)92, implemented in Molpro 2015.1 (http://www.molpro.net)59, to confirm that the interaction energies corresponded to the desired charges. We required that the NPA charges of each monomer differed from the target by <0.3 electrons, were smooth (as described in the next section), and approached the correct asymptotic limit.
Smoothness and curvature tests for radial scans
For each radial scan, we imposed two additional conditions, requiring both that the interaction energy and all components were smooth functions of the separation and that they asymptotically converged to zero at large distances. The smoothness was validated by fitting each radial profile with a cubic spline and assessing the impact of individually removing each data point from the fit. In addition, we ensured that along a given scan, the total interaction energy featured no more than one local minimum. Scans without a local minimum were considered valid only if the interaction energy was strictly positive. Scans with a negative local minimum were allowed to exhibit at most one local maximum with a positive interaction energy.
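The curvature rules lend themselves to a short check on the list of interaction energies along a scan. The sketch below covers only those rules, not the spline-based smoothness test. It assumes energies are ordered by separation, treats only interior points as candidate extrema, and applies the at-most-one-positive-maximum rule to any scan with a single minimum, which goes slightly beyond what the text specifies for minima that are not negative. Function names are hypothetical.

```python
def local_extrema(energies):
    """Indices of strict interior local minima and maxima."""
    minima, maxima = [], []
    for i in range(1, len(energies) - 1):
        left, mid, right = energies[i - 1], energies[i], energies[i + 1]
        if mid < left and mid < right:
            minima.append(i)
        elif mid > left and mid > right:
            maxima.append(i)
    return minima, maxima


def scan_passes_curvature_rules(energies):
    """Apply the minimum/maximum rules to one radial scan."""
    minima, maxima = local_extrema(energies)
    if len(minima) > 1:
        return False                      # more than one local minimum
    if not minima:
        # No minimum: valid only if the whole scan is strictly repulsive.
        return all(e > 0.0 for e in energies)
    # One minimum: allow at most one local maximum with positive energy.
    positive_maxima = [i for i in maxima if energies[i] > 0.0]
    return len(positive_maxima) <= 1
```

A typical attractive profile (repulsive wall, single well, decay toward zero) passes, as does a purely repulsive scan, while a scan with two wells is rejected.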
Acknowledgements
The authors thank Jessica McGillen and Berkman Frank for editorial assistance.
Author contributions
A.G.D. and D.E.S. designed the project; A.G.D., A.G.T., E.D., C.H., R.T.M., K.-H.L., B.A.G., J.-L.L., K.P., K.S., M.B. and J.L.K. developed and applied the methods for dataset generation; E.D. and K.-H.L. prepared the data for release; E.D. formatted the data for release; E.D. and A.G.D. generated the figures; A.G.D., A.G.T., E.D., J.L.K. and D.E.S. wrote the paper.
Competing interests
The authors declare no competing interests.
Contributor Information
Alexander G. Donchev, Email: Alexander.Donchev@DEShawResearch.com
David E. Shaw, Email: David.Shaw@DEShawResearch.com
Supplementary information
The online version contains supplementary material available at 10.1038/s41597-021-00833-x.
References
- 1.Hobza P, Zahradník R, Müller-Dethlefs K. The world of non-covalent interactions: 2006. Collect. Czech. Chem. Commun. 2006;71:443–531. doi: 10.1135/cccc20060443. [DOI] [Google Scholar]
- 2.Raghavachari K, Trucks GW, Pople JA, Head-Gordon M. A fifth-order perturbation comparison of electron correlation theories. Chem. Phys. Lett. 1989;157:479–483. doi: 10.1016/S0009-2614(89)87395-6. [DOI] [Google Scholar]
- 3.Urban M, Noga J, Cole SJ, Bartlett RJ. Towards a full CCSDT model for electron correlation. J. Chem. Phys. 1985;83:4041–4046. doi: 10.1063/1.449067. [DOI] [Google Scholar]
- 4.Bartlett RJ, Musial M. Coupled-cluster theory in quantum chemistry. Rev. Mod. Phys. 2007;79:291–352. doi: 10.1103/RevModPhys.79.291. [DOI] [Google Scholar]
- 5.Řezáč J, Hobza P. Describing noncovalent interactions beyond the common approximations: how accurate is the “gold standard,” CCSD(T) at the complete basis set limit? J. Chem. Theory Comput. 2013;9:2151–2155. doi: 10.1021/ct400057w. [DOI] [PubMed] [Google Scholar]
- 6.Jurečka P, Šponer J, Cerný J, Hobza P. Benchmark database of accurate (MP2 and CCSD(T) complete basis set limit) interaction energies of small model complexes, DNA base pairs, and amino acid pairs. Phys. Chem. Chem. Phys. 2006;8:1985–1993. doi: 10.1039/B600027D. [DOI] [PubMed] [Google Scholar]
- 7.Marshall, M. S., Burns, L. A. & Sherrill, C. D. Basis set convergence of the coupled-cluster correction, : best practices for benchmarking non-covalent interactions and the attendant revision of the S22, NBC10, HBC6, and HSG databases. J. Chem. Phys.135, 194102 (2011). [DOI] [PubMed]
- 8.Brauer B, Kesharwani MK, Kozuch S, Martin JML. The S66x8 benchmark for noncovalent interactions revisited: explicitly correlated ab initio methods and density functional theory. Phys. Chem. Chem. Phys. 2016;18:20905–20925. doi: 10.1039/C6CP00688D. [DOI] [PubMed] [Google Scholar]
- 9.Řezáč J, Riley KE, Hobza P. S66: A well-balanced database of benchmark interaction energies relevant to biomolecular structures. J. Chem. Theory Comput. 2011;7:2427–2438. doi: 10.1021/ct2002946. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 10.Řezáč J, Riley KE, Hobza P. Benchmark calculations of noncovalent interactions of halogenated molecules. J. Chem. Theory Comput. 2012;8:4285–4292. doi: 10.1021/ct300647k. [DOI] [PubMed] [Google Scholar]
- 11.Burns LA, et al. The Bio-Fragment Database (BFDb): An open-data platform for computational chemistry analysis of noncovalent interactions. J. Chem. Phys. 2017;147:161727. doi: 10.1063/1.5001028. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 12.Schneebeli ST, Bochevarov AD, Friesner RA. Parameterization of a B3LYP specific correction for noncovalent interactions and basis set superposition error on a gigantic dataset of CCSD(T) quality noncovalent interaction energies. J. Chem. Theory Comput. 2011;7:658–668. doi: 10.1021/ct100651f. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 13. Mardirossian N, Head-Gordon M. ωB97M-V: A combinatorially optimized, range-separated hybrid, meta-GGA density functional with VV10 nonlocal correlation. J. Chem. Phys. 2016;144:214110. doi: 10.1063/1.4952647.
- 14. Smith DGA, Burns LA, Patkowski K, Sherrill CD. Revised damping parameters for the D3 dispersion correction to density functional theory. J. Phys. Chem. Lett. 2016;7:2197–2203. doi: 10.1021/acs.jpclett.6b00780.
- 15. Yu HS, He X, Truhlar DG. MN15-L: A new local exchange-correlation functional for Kohn-Sham density functional theory with broad accuracy for atoms, molecules, and solids. J. Chem. Theory Comput. 2016;12:1280–1293. doi: 10.1021/acs.jctc.5b01082.
- 16. Tkatchenko A, DiStasio RA, Jr, Car R, Scheffler M. Accurate and efficient method for many-body van der Waals interactions. Phys. Rev. Lett. 2012;108:236402. doi: 10.1103/PhysRevLett.108.236402.
- 17. Goerigk L, Kruse H, Grimme S. Benchmarking density functional methods against the S66 and S66x8 datasets for non-covalent interactions. ChemPhysChem. 2011;12:3421–3433. doi: 10.1002/cphc.201100826.
- 18. DiStasio RA, Jr, Head-Gordon M. Optimized spin-component scaled second-order Møller-Plesset perturbation theory for intermolecular interaction energies. Mol. Phys. 2007;105:1073–1083. doi: 10.1080/00268970701283781.
- 19. Marchetti O, Werner H-J. Accurate calculations of intermolecular interaction energies using explicitly correlated coupled cluster wave functions and a dispersion-weighted MP2 method. J. Phys. Chem. A. 2009;113:11580–11585. doi: 10.1021/jp9059467.
- 20. Takatani T, Hohenstein EG, Sherrill CD. Improvement of the coupled-cluster singles and doubles method via scaling same- and opposite-spin components of the double excitation correlation energy. J. Chem. Phys. 2008;128:124111. doi: 10.1063/1.2883974.
- 21. Pitoňák M, Neogrady P, Černý J, Grimme S, Hobza P. Scaled MP3 non-covalent interaction energies agree closely with accurate CCSD(T) benchmark data. ChemPhysChem. 2009;10:282–289. doi: 10.1002/cphc.200800718.
- 22. Hesselmann A. Improved supermolecular second order Møller-Plesset intermolecular interaction energies using time-dependent density functional response theory. J. Chem. Phys. 2008;128:144112. doi: 10.1063/1.2905808.
- 23. Burns LA, Marshall MS, Sherrill CD. Appointing silver and bronze standards for noncovalent interactions: a comparison of spin-component-scaled (SCS), explicitly correlated (F12), and specialized wavefunction approaches. J. Chem. Phys. 2014;141:234111. doi: 10.1063/1.4903765.
- 24. McNamara JP, Hillier IH. Semi-empirical molecular orbital methods including dispersion corrections for the accurate prediction of the full range of intermolecular interactions in biomolecules. Phys. Chem. Chem. Phys. 2007;9:2362–2370. doi: 10.1039/b701890h.
- 25. Řezáč J, Hobza P. Advanced corrections of hydrogen bonding and dispersion for semiempirical quantum mechanical methods. J. Chem. Theory Comput. 2011;8:141–151. doi: 10.1021/ct200751e.
- 26. Christensen AS, Elstner M, Cui Q. Improving intermolecular interactions in DFTB3 using extended polarization from chemical-potential equalization. J. Chem. Phys. 2015;143:084123. doi: 10.1063/1.4929335.
- 27. Christensen AS, Kubař T, Cui Q, Elstner M. Semiempirical quantum mechanical methods for noncovalent interactions for chemical and biochemical applications. Chem. Rev. 2016;116:5301–5337. doi: 10.1021/acs.chemrev.5b00584.
- 28. Patkowski K. Benchmark databases of intermolecular interaction energies: design, construction, and significance. Annu. Rep. Comput. Chem. 2017;13:3–91. doi: 10.1016/bs.arcc.2017.06.004.
- 29. Řezáč J, Hobza P. Benchmark calculations of interaction energies in noncovalent complexes and their applications. Chem. Rev. 2016;116:5038–5071. doi: 10.1021/acs.chemrev.5b00526.
- 30. Wang L-P, et al. Building a more predictive protein force field: A systematic and reproducible route to AMBER-FB15. J. Phys. Chem. B. 2017;121:4023–4039. doi: 10.1021/acs.jpcb.7b02320.
- 31. Lopes PEM, et al. Polarizable force field for peptides and proteins based on the classical Drude oscillator. J. Chem. Theory Comput. 2013;9:5430–5449. doi: 10.1021/ct400781b.
- 32. Laury ML, Wang L-P, Pande VS, Head-Gordon T, Ponder JW. Revised parameters for the AMOEBA polarizable atomic multipole water model. J. Phys. Chem. B. 2015;119:9423–9437. doi: 10.1021/jp510896n.
- 33. Piana S, Donchev AG, Robustelli P, Shaw DE. Water dispersion interactions strongly influence simulated structural properties of disordered protein states. J. Phys. Chem. B. 2015;119:5113–5123. doi: 10.1021/jp508971m.
- 34. Harder E, et al. OPLS3: a force field providing broad coverage of drug-like small molecules and proteins. J. Chem. Theory Comput. 2015;12:281–296. doi: 10.1021/acs.jctc.5b00864.
- 35. Bereau T, Andrienko D, von Lilienfeld OA. Transferable atomic multipole machine learning models for small organic molecules. J. Chem. Theory Comput. 2015;11:3225–3233. doi: 10.1021/acs.jctc.5b00301.
- 36. Gao T, et al. A machine learning correction for DFT non-covalent interactions based on the S22, S66 and X40 benchmark databases. J. Cheminformatics. 2016;8:24. doi: 10.1186/s13321-016-0133-7.
- 37. Rupp M, Tkatchenko A, Müller K-R, von Lilienfeld OA. Fast and accurate modeling of molecular atomization energies with machine learning. Phys. Rev. Lett. 2012;108:058301. doi: 10.1103/PhysRevLett.108.058301.
- 38. Smith JS, Isayev O, Roitberg AE. ANI-1: an extensible neural network potential with DFT accuracy at force field computational cost. Chem. Sci. 2017;8:3192–3203. doi: 10.1039/C6SC05720A.
- 39. McGibbon RT, et al. Improving the accuracy of Møller-Plesset perturbation theory with neural networks. J. Chem. Phys. 2017;147:161725. doi: 10.1063/1.4986081.
- 40. Faber FA, et al. Prediction errors of molecular machine learning models lower than hybrid DFT error. J. Chem. Theory Comput. 2017;13:5255–5264. doi: 10.1021/acs.jctc.7b00577.
- 41. Hegde G, Bowen RC. Machine-learned approximations to density functional theory Hamiltonians. Sci. Rep. 2017;7:42669. doi: 10.1038/srep42669.
- 42. Donchev AG, 2021. Quantum chemical benchmark databases of gold-standard dimer interaction energies. figshare.
- 43. Grimme S. Improved second-order Møller-Plesset perturbation theory by separate scaling of parallel- and antiparallel-spin pair correlation energies. J. Chem. Phys. 2003;118:9095–9102. doi: 10.1063/1.1569242.
- 44. Weininger D. SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules. J. Chem. Inf. Comput. Sci. 1988;28:31–36. doi: 10.1021/ci00057a005.
- 45. O’Boyle NM, et al. Open Babel: an open chemical toolbox. J. Cheminformatics. 2011;3:33–47. doi: 10.1186/1758-2946-3-33.
- 46. Banks JL, et al. Integrated modeling program, applied chemical theory (IMPACT). J. Comput. Chem. 2005;26:1752–1780. doi: 10.1002/jcc.20292.
- 47. Morse PM. Diatomic molecules according to the wave mechanics. II. Vibrational levels. Phys. Rev. 1929;34:57–64.
- 48. Polly R, Werner H-J, Manby FR, Knowles PJ. Fast Hartree-Fock theory using local density fitting approximations. Mol. Phys. 2004;102:2311–2321. doi: 10.1080/0026897042000274801.
- 49. Köppl C, Werner H-J. Parallel and low-order scaling implementation of Hartree-Fock exchange using local density fitting. J. Chem. Theory Comput. 2016;12:3122–3134. doi: 10.1021/acs.jctc.6b00251.
- 50. Pipek J, Mezey PG. A fast intrinsic localization procedure applicable for ab initio and semiempirical linear combination of atomic orbital wave functions. J. Chem. Phys. 1989;90:4916–4926. doi: 10.1063/1.456588.
- 51. El Azhary A, Rauhut G, Pulay P, Werner H-J. Analytical energy gradients for local second-order Møller-Plesset perturbation theory. J. Chem. Phys. 1998;108:5185–5193. doi: 10.1063/1.475955.
- 52. Schütz M, Werner H-J, Lindh R, Manby FR. Analytical energy gradients for local second-order Møller-Plesset perturbation theory using density fitting approximations. J. Chem. Phys. 2004;121:737–750. doi: 10.1063/1.1760747.
- 53. Hetzer G, Pulay P, Werner H-J. Multipole approximation of distant pair energies in local MP2 calculations. Chem. Phys. Lett. 1998;290:143–149. doi: 10.1016/S0009-2614(98)00491-6.
- 54. Schütz M, Hetzer G, Werner H-J. Low-order scaling local electron correlation methods. I: Linear scaling local MP2. J. Chem. Phys. 1999;111:5691–5705. doi: 10.1063/1.479957.
- 55. Hetzer G, Schütz M, Stoll H, Werner H-J. Low-order scaling local correlation methods. II: Splitting the Coulomb operator in linear scaling local second-order Møller-Plesset perturbation theory. J. Chem. Phys. 2000;113:9443–9455. doi: 10.1063/1.1321295.
- 56. Werner H-J, Manby FR, Knowles PJ. Fast linear scaling second-order Møller-Plesset perturbation theory (MP2) using local and density fitting approximations. J. Chem. Phys. 2003;118:8149–8160. doi: 10.1063/1.1564816.
- 57. Lindh R, Bernhardsson A, Karlström G, Malmqvist P-Å. On the use of a Hessian model function in molecular geometry optimizations. Chem. Phys. Lett. 1995;241:423–428. doi: 10.1016/0009-2614(95)00646-L.
- 58. Lindh R, Bernhardsson A, Schütz M. Force-constant weighted redundant coordinates in molecular geometry optimizations. Chem. Phys. Lett. 1999;303:567–575. doi: 10.1016/S0009-2614(99)00247-X.
- 59. Werner H-J, Knowles PJ, Knizia G, Manby FR, Schütz M. Molpro: a general-purpose quantum chemistry program package. Wiley Interdiscip. Rev.: Comput. Mol. Sci. 2012;2:242–253.
- 60. Dunning TH, Jr. Gaussian basis sets for use in correlated molecular calculations. I. The atoms boron through neon and hydrogen. J. Chem. Phys. 1989;90:1007–1023. doi: 10.1063/1.456153.
- 61. Woon DE, Dunning TH, Jr. Gaussian basis sets for use in correlated molecular calculations. IV. Calculation of static electrical response properties. J. Chem. Phys. 1994;100:2975–2988. doi: 10.1063/1.466439.
- 62. Kendall RA, Dunning TH, Jr, Harrison RJ. Electron affinities of the first-row atoms revisited. Systematic basis sets and wave functions. J. Chem. Phys. 1992;96:6796–6806. doi: 10.1063/1.462569.
- 63. Woon DE, Dunning TH, Jr. Gaussian basis sets for use in correlated molecular calculations. III. The atoms aluminum through argon. J. Chem. Phys. 1993;98:1358–1371. doi: 10.1063/1.464303.
- 64. Dunning TH, Jr, Peterson KA, Wilson AK. Gaussian basis sets for use in correlated molecular calculations: X. The atoms aluminum through argon revisited. J. Chem. Phys. 2001;114:9244–9253. doi: 10.1063/1.1367373.
- 65. Peterson KA, Dunning TH, Jr. Accurate correlation consistent basis sets for molecular core–valence correlation effects: The second row atoms Al–Ar, and the first row atoms B–Ne revisited. J. Chem. Phys. 2002;117:10548–10560. doi: 10.1063/1.1520138.
- 66. Prascher B, Woon DE, Peterson KA, Dunning TH, Jr, Wilson AK. Gaussian basis sets for use in correlated molecular calculations. VII. Valence, core-valence, and scalar relativistic basis sets for Li, Be, Na, and Mg. Theor. Chem. Acc. 2011;128:69–82. doi: 10.1007/s00214-010-0764-0.
- 67. Koput J, Peterson KA. Ab initio potential energy surface and vibrational-rotational energy levels of X ²Σ⁺ CaOH. J. Phys. Chem. A. 2002;106:9595–9599. doi: 10.1021/jp026283u.
- 68. Lim IS, Schwerdtfeger P, Metz B, Stoll H. All-electron and relativistic pseudopotential studies for the group 1 element polarizabilities from K to element 119. J. Chem. Phys. 2005;122:104103. doi: 10.1063/1.1856451.
- 69. Lim IS, Stoll H, Schwerdtfeger P. Relativistic small-core energy-consistent pseudopotentials for the alkaline-earth elements from Ca to Ra. J. Chem. Phys. 2006;124:034107. doi: 10.1063/1.2148945.
- 70. Peterson KA, Yousaf KE. Molecular core-valence correlation effects involving the post-d elements Ga-Rn: benchmarks and new pseudopotential-based correlation consistent basis sets. J. Chem. Phys. 2010;133:174116. doi: 10.1063/1.3503659.
- 71. Peterson KA, Shepler BC, Figgen D, Stoll H. On the spectroscopic and thermochemical properties of ClO, BrO, IO, and their anions. J. Phys. Chem. A. 2006;110:13877–13883. doi: 10.1021/jp065887l.
- 72. Peterson KA, Figgen D, Goll E, Stoll H, Dolg M. Systematically convergent basis sets with relativistic pseudopotentials. II. Small-core pseudopotentials and correlation consistent basis sets for the post-d group 16–18 elements. J. Chem. Phys. 2003;119:11113–11123. doi: 10.1063/1.1622924.
- 73. Wilson AK, Woon DE, Peterson KA, Dunning TH, Jr. Gaussian basis sets for use in correlated molecular calculations. IX. The atoms gallium through krypton. J. Chem. Phys. 1999;110:7667–7676.
- 74. DeYonker NJ, Peterson KA, Wilson AK. Systematically convergent correlation consistent basis sets for molecular core-valence correlation effects: the third-row atoms gallium through krypton. J. Phys. Chem. A. 2007;111:11383–11393. doi: 10.1021/jp0747757.
- 75. Weigend F. A fully direct RI-HF algorithm: Implementation, optimized auxiliary basis sets, demonstration of accuracy and efficiency. Phys. Chem. Chem. Phys. 2002;4:4285–4291. doi: 10.1039/b204199p.
- 76. Weigend F. Hartree–Fock exchange fitting basis sets for H to Rn. J. Comput. Chem. 2008;29:167–175. doi: 10.1002/jcc.20702.
- 77. Bowers KJ, et al. Scalable algorithms for molecular dynamics simulations on commodity clusters. Proc. ACM/IEEE Conf. Supercomput. (ACM, 2006).
- 78. Bergdorf M, Baxter S, Rendleman CA, Shaw DE. Desmond/GPU performance as of November 2016. D. E. Shaw Research Technical Report DESRES/TR—2016-01. (2016).
- 79. Hansen K, et al. Machine learning predictions of molecular properties: accurate many-body potentials and nonlocality in chemical space. J. Phys. Chem. Lett. 2015;6:2326–2331. doi: 10.1021/acs.jpclett.5b00831.
- 80. Boys SF, Bernardi F. The calculation of small molecular interactions by the differences of separate total energies. Some procedures with reduced errors. Mol. Phys. 1970;19:553–566. doi: 10.1080/00268977000101561.
- 81. Halkier A, Helgaker T, Jørgensen P, Klopper W, Olsen J. Basis-set convergence of the energy in molecular Hartree–Fock calculations. Chem. Phys. Lett. 1999;302:437–446. doi: 10.1016/S0009-2614(99)00179-7.
- 82. Turney JM, et al. Psi4: An open-source ab initio electronic structure program. Wiley Interdiscip. Rev.: Comput. Mol. Sci. 2012;2:556–565.
- 83. Jeziorski B, Moszynski R, Szalewicz K. Perturbation theory approach to intermolecular potential energy surfaces of van der Waals complexes. Chem. Rev. 1994;94:1887–1930. doi: 10.1021/cr00031a008.
- 84. Hohenstein EG, Sherrill CD. Wavefunction methods for noncovalent interactions. Wiley Interdiscip. Rev.: Comput. Mol. Sci. 2012;2:304–326.
- 85. Hohenstein EG, Sherrill CD. Density fitting and Cholesky decomposition approximations in symmetry-adapted perturbation theory: implementation and application to probe the nature of π-π interactions in linear acenes. J. Chem. Phys. 2010;132:184111. doi: 10.1063/1.3426316.
- 86. Hohenstein EG, Parrish RM, Sherrill CD, Turney JM, Schaefer HF. Large-scale symmetry-adapted perturbation theory computations via density fitting and Laplace transformation techniques: investigating the fundamental forces of DNA-intercalator interactions. J. Chem. Phys. 2011;135:174107. doi: 10.1063/1.3656681.
- 87. McKinney W. Data structures for statistical computing in Python. Proceedings of the 9th Python in Science Conference, 56–61 (2010).
- 88. Oliphant TE. Python for scientific computing. Comput. Sci. Eng. 2007;9:10–20. doi: 10.1109/MCSE.2007.58.
- 89. Ramakrishnan R, Dral PO, Rupp M, von Lilienfeld OA. Quantum chemistry structures and properties of 134 kilo molecules. Sci. Data. 2014;1:140022. doi: 10.1038/sdata.2014.22.
- 90. Kesharwani MK, Karton A, Sylvetsky N, Martin JML. The S66 non-covalent interactions benchmark reconsidered using explicitly correlated methods near the basis set limit. Aust. J. Chem. 2018;71:238–248. doi: 10.1071/CH17588.
- 91. Ma B, Lii J-H, Schaefer HF, Allinger NL. Systematic comparison of experimental, quantum mechanical, and molecular mechanical bond lengths for organic molecules. J. Phys. Chem. 1996;100:8763–8769. doi: 10.1021/jp953630+.
- 92. Reed AE, Weinstock RB, Weinhold F. Natural population analysis. J. Chem. Phys. 1985;83:735–746. doi: 10.1063/1.449486.
- 93. Hunter JD. Matplotlib: A 2D graphics environment. Comput. Sci. Eng. 2007;9:90–95. doi: 10.1109/MCSE.2007.55.
- 94. Waskom M, 2018. mwaskom/seaborn: v0.9.0 (July 2018). Zenodo.
- 95. Marc, 2018. marcharper/python-ternary: Corner label functions. Zenodo.