J Chem Theory Comput. Author manuscript; available in PMC: 2020 Jul 9.
Published in final edited form as: J Chem Theory Comput. 2019 Jun 12;15(7):4113–4121. doi: 10.1021/acs.jctc.9b00001

Table 1.

Datasets^a used for machine learning model development and evaluation.

Dataset   | Geometry           | Energy             | #Molecules^b        | #Conformations^b         | #Heavy Atoms (range)
QM9       | B3LYP/6-31G(2df,p) | B3LYP/6-31G(2df,p) | 99,000/1,000/33,885 | 99,000/1,000/33,885      | [1, 9]
QM9M      | MMFF94             | B3LYP/6-31G(2df,p) | 99,000/1,000/33,885 | 99,000/1,000/33,885      | [1, 9]
eMol9_CM  | MMFF94             | B3LYP/6-31G*       | 8,111/500/1,348     | ~66,000/~6,000/~16,000^c | [1, 9]
Plati_CM  | MMFF94             | B3LYP/6-31G*       | 0/0/74              | 0/0/4,076                | [10, 12]
^a The QM9 dataset was generated by Ramakrishnan et al.^53; the other three datasets were prepared by the authors.

^b Numbers of molecules and conformations are given for the training/validation/test sets, respectively.

^c eMol9_CM was randomly split by molecule into training/validation/test sets using five different random seeds. Because the split is made at the molecule level, the number of conformations in each set differs from split to split; approximate values are shown.
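Footnote c describes a molecule-level random split repeated with several seeds. The Python sketch below illustrates one way such a grouped split can be done; it is not the authors' code, and the record format `(molecule_id, conformation_id)`, the function name `split_by_molecule`, and the toy data are hypothetical, chosen only for illustration.

```python
import random
from collections import defaultdict


def split_by_molecule(conformations, n_train, n_val, seed):
    """Hypothetical helper: split conformation records into train/val/test
    so that all conformations of one molecule land in the same set.

    conformations: list of (molecule_id, conformation_id) tuples (assumed format).
    n_train, n_val: number of *molecules* in the training and validation sets;
    the remaining molecules form the test set.
    """
    # Group conformations by the molecule they belong to.
    by_mol = defaultdict(list)
    for mol_id, conf_id in conformations:
        by_mol[mol_id].append((mol_id, conf_id))

    # Sort first so the shuffle depends only on the seed, then shuffle molecules.
    mol_ids = sorted(by_mol)
    rng = random.Random(seed)
    rng.shuffle(mol_ids)

    groups = {
        "train": mol_ids[:n_train],
        "val": mol_ids[n_train:n_train + n_val],
        "test": mol_ids[n_train + n_val:],
    }
    # Expand each molecule back into all of its conformations.
    return {name: [c for m in ids for c in by_mol[m]] for name, ids in groups.items()}


if __name__ == "__main__":
    # Toy data (illustrative only): molecule i has i + 1 conformations.
    toy = [(f"mol{i}", j) for i in range(10) for j in range(i + 1)]
    for seed in range(5):
        splits = split_by_molecule(toy, n_train=6, n_val=2, seed=seed)
        n_mols = {k: len({m for m, _ in v}) for k, v in splits.items()}
        n_confs = {k: len(v) for k, v in splits.items()}
        print(seed, n_mols, n_confs)
```

Because whole molecules are assigned to each set, the molecule counts per set stay fixed for every seed, while the conformation counts depend on which molecules are drawn. This matches why Table 1 reports exact molecule counts but only approximate conformation counts for eMol9_CM.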