2017 Oct 31;9(2):513–530. doi: 10.1039/c7sc02664a

Table 1. Dataset details: number of compounds and tasks, recommended splits and metrics.

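| Category | Dataset | Data type | Tasks | Task type | Compounds | Recommended split | Recommended metric |
|---|---|---|---|---|---|---|---|
| Quantum mechanics | QM7 | SMILES, 3D coordinates | 1 | Regression | 7165 | Stratified | MAE |
| Quantum mechanics | QM7b | 3D coordinates | 14 | Regression | 7211 | Random | MAE |
| Quantum mechanics | QM8 | SMILES, 3D coordinates | 12 | Regression | 21 786 | Random | MAE |
| Quantum mechanics | QM9 | SMILES, 3D coordinates | 12 | Regression | 133 885 | Random | MAE |
| Physical chemistry | ESOL | SMILES | 1 | Regression | 1128 | Random | RMSE |
| Physical chemistry | FreeSolv | SMILES | 1 | Regression | 643 | Random | RMSE |
| Physical chemistry | Lipophilicity | SMILES | 1 | Regression | 4200 | Random | RMSE |
| Biophysics | PCBA | SMILES | 128 | Classification | 439 863 | Random | PRC-AUC |
| Biophysics | MUV | SMILES | 17 | Classification | 93 127 | Random | PRC-AUC |
| Biophysics | HIV | SMILES | 1 | Classification | 41 913 | Scaffold | ROC-AUC |
| Biophysics | PDBbind | SMILES, 3D coordinates | 1 | Regression | 11 908 | Time | RMSE |
| Biophysics | BACE | SMILES | 1 | Classification | 1522 | Scaffold | ROC-AUC |
| Physiology | BBBP | SMILES | 1 | Classification | 2053 | Scaffold | ROC-AUC |
| Physiology | Tox21 | SMILES | 12 | Classification | 8014 | Random | ROC-AUC |
| Physiology | ToxCast | SMILES | 617 | Classification | 8615 | Random | ROC-AUC |
| Physiology | SIDER | SMILES | 27 | Classification | 1427 | Random | ROC-AUC |
| Physiology | ClinTox | SMILES | 2 | Classification | 1491 | Random | ROC-AUC |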
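
The two regression metrics recommended in Table 1, MAE (mean absolute error) and RMSE (root-mean-square error), can be illustrated with a minimal sketch. This is not the benchmark's own implementation: the function names `mae` and `rmse` and the toy values are assumptions chosen for illustration, using only the standard definitions of the two metrics.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error: the average of |y - y_hat| over all samples."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root-mean-square error: square root of the mean squared residual."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse)


# Toy values (hypothetical, not taken from any dataset in Table 1):
# three exact predictions and one prediction that is off by 4.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 2.0, 3.0, 8.0]

print(mae(y_true, y_pred))   # 1.0
print(rmse(y_true, y_pred))  # 2.0
```

On the same residuals, RMSE weights the single large error more heavily than MAE does, so the two metrics can rank models differently.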