J Chem Inf Model. 2019 Jul 30;59(8):3370–3388. doi: 10.1021/acs.jcim.9b00237

Table 4. Summary Statistics of the Public Data Sets Used in This Paper (a)

data set        no. of tasks   task type        no. of compounds   metric
QM7             1              regression       6,830              MAE
QM8             12             regression       21,786             MAE
QM9             12             regression       133,885            MAE
ESOL            1              regression       1,128              RMSE
FreeSolv        1              regression       642                RMSE
Lipophilicity   1              regression       4,200              RMSE
PDBbind-F       1              regression       9,880              RMSE
PDBbind-C       1              regression       168                RMSE
PDBbind-R       1              regression       3,040              RMSE
PCBA            128            classification   437,929            PRC-AUC
MUV             17             classification   93,087             PRC-AUC
HIV             1              classification   41,127             ROC-AUC
BACE            1              classification   1,513              ROC-AUC
BBBP            1              classification   2,039              ROC-AUC
Tox21           12             classification   7,831              ROC-AUC
ToxCast         617            classification   8,576              ROC-AUC
SIDER           27             classification   1,427              ROC-AUC
ClinTox         2              classification   1,478              ROC-AUC
ChEMBL          1310           classification   456,331            ROC-AUC
(a) Note: PDBbind-F, PDBbind-C, and PDBbind-R refer to the full, core, and refined PDBbind data sets from Wu et al. (ref 2).
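
For readers less familiar with the four metrics named in Table 4, the following is a minimal pure-Python sketch of how each can be computed for a single task. It is illustrative only and is not the evaluation code used in the paper. The function names are hypothetical, and `average_precision` is a common step-wise estimate of the area under the precision-recall curve, which may differ from the exact PRC-AUC computation used by the authors.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error (lower is better)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root-mean-square error (lower is better)."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def roc_auc(labels, scores):
    """ROC-AUC as the probability that a random positive outranks a random
    negative; tied scores count as half a win. Quadratic in the number of
    compounds, so suitable for illustration only."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Assumption: step-wise estimate of the area under the precision-recall
    curve, i.e. the mean of the precision at the rank of each positive.
    Tied scores are kept in input order (the sort is stable)."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank
    if hits == 0:
        raise ValueError("average precision needs at least one positive")
    return total / hits
```

A small usage example with made-up labels and scores (not drawn from any data set in the table):

```python
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(labels, scores))            # 0.75
print(average_precision(labels, scores))  # about 0.833
```

For data sets with more than one task, a single reported number requires combining per-task scores; averaging over tasks is one common choice, stated here as an assumption rather than taken from the table.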