Author manuscript; available in PMC: 2020 Jul 9.

Published in final edited form as: J Chem Theory Comput. 2019 Jun 12;15(7):4113–4121. doi: 10.1021/acs.jctc.9b00001

Table 1.

Datasets^a used for machine learning model development and evaluation.

Dataset	Geometry	Energy	#Molecules^b	#Conformations^b	#Heavy Atoms
QM9	B3LYP/6-31G(2df,p)	B3LYP/6-31G(2df,p)	99,000/1,000/33,885	99,000/1,000/33,885	[1, 9]
QM9_M	MMFF94	B3LYP/6-31G(2df,p)	99,000/1,000/33,885	99,000/1,000/33,885	[1, 9]
eMol9_C_M	MMFF94	B3LYP/6-31G*	8,111/500/1,348	~66,000/~6,000/~16,000^c	[1, 9]
Plati_C_M	MMFF94	B3LYP/6-31G*	0/0/74	0/0/4,076	[10, 12]

^a The QM9 dataset was generated by Ramakrishnan et al.⁵³ The other three datasets were prepared by the authors.

^b Numbers of molecules and conformations are given for the training/validation/test sets, respectively.

^c eMol9_C_M was randomly split into training/validation/test sets at the molecule level, using five different random seeds. The numbers of conformations in the training/validation/test sets therefore differ from split to split.
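
The molecule-level split described for eMol9_C_M can be illustrated with a minimal sketch. It is not the authors' code: the function name `split_by_molecule`, the toy molecule IDs, and the use of Python's `random` module are all assumptions made for illustration. The sketch assumes that every conformation carries the ID of its parent molecule and that whole molecules, not individual conformations, are assigned to a set.

```python
import random


def split_by_molecule(mol_ids, n_valid, n_test, seed):
    """Assign conformation indices to train/valid/test by parent molecule.

    mol_ids: one molecule ID per conformation (hypothetical input format).
    n_valid, n_test: number of molecules (not conformations) to hold out.
    """
    # Shuffle the unique molecules reproducibly for the given seed.
    unique = sorted(set(mol_ids))
    rng = random.Random(seed)
    rng.shuffle(unique)

    test_mols = set(unique[:n_test])
    valid_mols = set(unique[n_test:n_test + n_valid])

    # Every conformation follows its molecule into exactly one set.
    split = {"train": [], "valid": [], "test": []}
    for idx, mol in enumerate(mol_ids):
        if mol in test_mols:
            split["test"].append(idx)
        elif mol in valid_mols:
            split["valid"].append(idx)
        else:
            split["train"].append(idx)
    return split


# Toy example: four molecules with 2, 1, 3 and 1 conformations.
toy_ids = ["m0", "m0", "m1", "m2", "m2", "m2", "m3"]
for seed in range(5):
    s = split_by_molecule(toy_ids, n_valid=1, n_test=1, seed=seed)
    print(seed, {name: len(idx) for name, idx in s.items()})
```

The number of molecules per set is fixed, but molecules have different numbers of conformations, so the per-set conformation counts can change with the seed. This is consistent with the approximate conformation counts reported for eMol9_C_M in the table.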