2023 May 11;19(11):3251–3275. doi: 10.1021/acs.jctc.3c00039

Table 3. List of QM Data Sets of Optimized Geometries and 1D Torsion Scans, Curated and Used for Training One or More of the Force Fields Discussed Here, as Referenced on MolSSI's Publicly Accessible Repository QCArchive^a

| Generation | TorsionDrive data set | Optimization data set (each set has a corresponding basic data set) |
|---|---|---|
| Generation 1 training sets (<Parsley 1.2.0), 620 unique molecules | OpenFF Group 1 Torsions (820 1D scans) | OpenFF Optimization Set 1 (937 conformers) |
| | SMIRNOFF Coverage Torsion Set 1 (585 1D scans) | SMIRNOFF Coverage Set 1 (1132 conformers) |
| | OpenFF Group 1 Torsions 2 (19 1D scans) | |
| | OpenFF Group 1 Torsions 3 (6 1D scans) | |
| Generation 2 training sets (≥Parsley 1.2.0), 1526 unique molecules | OpenFF Gen 2 Torsion Set 1 Roche 2 (142 1D scans) | OpenFF Gen 2 Opt Set 1 Roche (298 conformers) |
| | OpenFF Gen 2 Torsion Set 2 Coverage 2 (157 1D scans) | OpenFF Gen 2 Opt Set 2 Coverage (373 conformers) |
| | OpenFF Gen 2 Torsion Set 3 Pfizer Discrepancy 2 (82 1D scans) | OpenFF Gen 2 Opt Set 3 Pfizer Discrepancy (197 conformers) |
| | OpenFF Gen 2 Torsion Set 4 eMolecules Discrepancy 2 (272 1D scans) | OpenFF Gen 2 Opt Set 4 eMolecules Discrepancy (2201 conformers) |
| | OpenFF Gen 2 Torsion Set 5 Bayer 2 (219 1D scans) | OpenFF Gen 2 Opt Set 5 Bayer (1850 conformers) |
| | OpenFF Gen 2 Torsion Set 6 supplemental 2 (22 1D scans) | |
^a As discussed in the text, Generation 1 data sets were the first to be generated with coverage of all parameters as the main objective, whereas Generation 2 data sets were generated to increase chemical diversity. Hessian data sets (termed "basic data sets") for the equilibrium geometries of all the optimization data sets listed here are also available on QCArchive. Each Hessian data set has exactly the same name as its corresponding optimization data set but contains Hessians computed at the final optimized geometries. A complete list of OpenFF data sets, including those not used in fitting here, can be found at https://github.com/openforcefield/qca-dataset-submission#dude-wheres-my-dataset.
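To make the relative size of the two generations easier to compare, the per-set counts in Table 3 can be summed. The Python sketch below is illustrative only: the dictionary names and the `totals` helper are hypothetical, and the numbers are transcribed by hand from the table. The sums count 1D scans and conformers, not unique molecules. Several conformers and scans can share one molecule, so these totals cannot reproduce the 620 and 1526 unique-molecule counts.

```python
# Hypothetical transcription of Table 3: (data set name, count) per generation.
# TorsionDrive counts are numbers of 1D scans; optimization counts are conformers.
TORSION_SETS = {
    "Generation 1": [
        ("OpenFF Group 1 Torsions", 820),
        ("SMIRNOFF Coverage Torsion Set 1", 585),
        ("OpenFF Group 1 Torsions 2", 19),
        ("OpenFF Group 1 Torsions 3", 6),
    ],
    "Generation 2": [
        ("OpenFF Gen 2 Torsion Set 1 Roche 2", 142),
        ("OpenFF Gen 2 Torsion Set 2 Coverage 2", 157),
        ("OpenFF Gen 2 Torsion Set 3 Pfizer Discrepancy 2", 82),
        ("OpenFF Gen 2 Torsion Set 4 eMolecules Discrepancy 2", 272),
        ("OpenFF Gen 2 Torsion Set 5 Bayer 2", 219),
        ("OpenFF Gen 2 Torsion Set 6 supplemental 2", 22),
    ],
}

OPTIMIZATION_SETS = {
    "Generation 1": [
        ("OpenFF Optimization Set 1", 937),
        ("SMIRNOFF Coverage Set 1", 1132),
    ],
    "Generation 2": [
        ("OpenFF Gen 2 Opt Set 1 Roche", 298),
        ("OpenFF Gen 2 Opt Set 2 Coverage", 373),
        ("OpenFF Gen 2 Opt Set 3 Pfizer Discrepancy", 197),
        ("OpenFF Gen 2 Opt Set 4 eMolecules Discrepancy", 2201),
        ("OpenFF Gen 2 Opt Set 5 Bayer", 1850),
    ],
}


def totals(sets):
    """Sum the per-set counts within each generation (hypothetical helper)."""
    return {gen: sum(count for _, count in entries) for gen, entries in sets.items()}


if __name__ == "__main__":
    print("1D torsion scans:", totals(TORSION_SETS))
    print("Optimized conformers:", totals(OPTIMIZATION_SETS))
```

Running the script prints 1430 and 894 torsion scans, and 2069 and 4919 optimized conformers, for Generations 1 and 2 respectively. Generation 2 therefore contributes fewer torsion scans than Generation 1 but more than twice as many optimized conformers.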