Table 1. Dataset details: number of compounds and tasks, recommended splits and metrics.
Category | Dataset | Data type | Tasks | Task type | Compounds | Recommended split | Recommended metric
Quantum mechanics | QM7 | SMILES, 3D coordinates | 1 | Regression | 7165 | Stratified | MAE
Quantum mechanics | QM7b | 3D coordinates | 14 | Regression | 7211 | Random | MAE
Quantum mechanics | QM8 | SMILES, 3D coordinates | 12 | Regression | 21 786 | Random | MAE
Quantum mechanics | QM9 | SMILES, 3D coordinates | 12 | Regression | 133 885 | Random | MAE
Physical chemistry | ESOL | SMILES | 1 | Regression | 1128 | Random | RMSE
Physical chemistry | FreeSolv | SMILES | 1 | Regression | 643 | Random | RMSE
Physical chemistry | Lipophilicity | SMILES | 1 | Regression | 4200 | Random | RMSE
Biophysics | PCBA | SMILES | 128 | Classification | 439 863 | Random | PRC-AUC
Biophysics | MUV | SMILES | 17 | Classification | 93 127 | Random | PRC-AUC
Biophysics | HIV | SMILES | 1 | Classification | 41 913 | Scaffold | ROC-AUC
Biophysics | PDBbind | SMILES, 3D coordinates | 1 | Regression | 11 908 | Time | RMSE
Biophysics | BACE | SMILES | 1 | Classification | 1522 | Scaffold | ROC-AUC
Physiology | BBBP | SMILES | 1 | Classification | 2053 | Scaffold | ROC-AUC
Physiology | Tox21 | SMILES | 12 | Classification | 8014 | Random | ROC-AUC
Physiology | ToxCast | SMILES | 617 | Classification | 8615 | Random | ROC-AUC
Physiology | SIDER | SMILES | 27 | Classification | 1427 | Random | ROC-AUC
Physiology | ClinTox | SMILES | 2 | Classification | 1491 | Random | ROC-AUC