Table 4. Summary Statistics of the Public Data Sets Used in This Paper
data set | no. of tasks | task type | no. of compounds | metric |
---|---|---|---|---|
QM7 | 1 | regression | 6,830 | MAE |
QM8 | 12 | regression | 21,786 | MAE |
QM9 | 12 | regression | 133,885 | MAE |
ESOL | 1 | regression | 1,128 | RMSE |
FreeSolv | 1 | regression | 642 | RMSE |
Lipophilicity | 1 | regression | 4,200 | RMSE |
PDBbind-F | 1 | regression | 9,880 | RMSE |
PDBbind-C | 1 | regression | 168 | RMSE |
PDBbind-R | 1 | regression | 3,040 | RMSE |
PCBA | 128 | classification | 437,929 | PRC-AUC |
MUV | 17 | classification | 93,087 | PRC-AUC |
HIV | 1 | classification | 41,127 | ROC-AUC |
BACE | 1 | classification | 1,513 | ROC-AUC |
BBBP | 1 | classification | 2,039 | ROC-AUC |
Tox21 | 12 | classification | 7,831 | ROC-AUC |
ToxCast | 617 | classification | 8,576 | ROC-AUC |
SIDER | 27 | classification | 1,427 | ROC-AUC |
ClinTox | 2 | classification | 1,478 | ROC-AUC |
ChEMBL | 1,310 | classification | 456,331 | ROC-AUC |
Note: PDBbind-F, PDBbind-C, and PDBbind-R refer to the full, core, and refined PDBbind data sets from Wu et al. (ref 2).
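
For readers unfamiliar with the metric column, the following is a minimal sketch of three of the listed metrics (MAE, RMSE, and ROC-AUC) using their standard textbook definitions. It is an illustration only, not the evaluation code used in this paper; the function names `mae`, `rmse`, and `roc_auc` are hypothetical. PRC-AUC is omitted because its value depends on the interpolation convention, which the table does not specify.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error: average of |target - prediction|."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root-mean-square error: square root of the mean squared residual."""
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse)


def roc_auc(labels, scores):
    """ROC-AUC as the probability that a random positive outranks a random negative.

    Ties count as half a win. This O(n_pos * n_neg) form is only meant for
    small illustrative inputs (an assumption of this sketch).
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy values, not taken from any data set in the table.
    print(mae([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
    print(rmse([0.0, 0.0], [3.0, 4.0]))
    print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

For the multitask data sets in the table, a common convention is to compute the metric per task and average across tasks; whether that applies here should be checked against the main text.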