Table 1.
Dataset | Geometry | Energy | #Molecules (b) | #Conformations (b) | #Heavy Atoms |
---|---|---|---|---|---|
QM9 | B3LYP/6–31G(2df,p) | B3LYP/6–31G(2df,p) | 99,000/1,000/33,885 | 99,000/1,000/33,885 | [1, 9] |
QM9M | MMFF94 | B3LYP/6–31G(2df,p) | 99,000/1,000/33,885 | 99,000/1,000/33,885 | [1, 9] |
eMol9_CM | MMFF94 | B3LYP/6–31G* | 8,111/500/1,348 | ~66,000/~6,000/~16,000 (c) | [1, 9] |
Plati_CM | MMFF94 | B3LYP/6–31G* | 0/0/74 | 0/0/4,076 | [10, 12] |
(a) The QM9 dataset was generated by Ramakrishnan et al. [53]; the other three datasets were prepared by us.
(b) The numbers of molecules and conformations are given for the training/validation/test sets, in that order.
(c) eMol9_CM was randomly split into training/validation/test sets by molecule type, using five different random seeds. Because molecules differ in how many conformations they have, the number of conformations in each set varies from split to split, so the values shown are approximate.
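
To illustrate why a molecule-level split gives different conformation counts for different seeds, here is a minimal sketch. It is not the authors' code: the function name `split_by_molecule`, the `(molecule_id, conformation_id)` record layout, and the toy data are all assumptions made for illustration. Molecules are shuffled with a seeded generator, and all conformations of a molecule go to the same set.

```python
import random
from collections import defaultdict


def split_by_molecule(conformations, n_val, n_test, seed):
    """Split (molecule_id, conformation_id) pairs so that every
    conformation of a given molecule lands in exactly one set.

    Hypothetical helper for illustration only.
    """
    # Group conformations by their parent molecule.
    by_molecule = defaultdict(list)
    for mol_id, conf_id in conformations:
        by_molecule[mol_id].append(conf_id)

    # Shuffle the molecule IDs (not the conformations) with a fixed seed.
    mol_ids = sorted(by_molecule)
    random.Random(seed).shuffle(mol_ids)

    test_ids = mol_ids[:n_test]
    val_ids = mol_ids[n_test:n_test + n_val]
    train_ids = mol_ids[n_test + n_val:]

    def gather(ids):
        return [(m, c) for m in ids for c in by_molecule[m]]

    return gather(train_ids), gather(val_ids), gather(test_ids)


# Toy data (assumed): 10 molecules, molecule i has i + 1 conformations.
toy = [(f"mol{i}", j) for i in range(10) for j in range(i + 1)]

for seed in range(5):
    train, val, test = split_by_molecule(toy, n_val=2, n_test=2, seed=seed)
    # Molecule counts are fixed (6/2/2); conformation counts depend on the seed.
    print(seed, len(train), len(val), len(test))
```

The number of molecules per set is fixed by `n_val` and `n_test`, while the number of conformations per set depends on which molecules the shuffle assigns to it.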