Table 4. Summary Statistics of the Public Data Sets Used in This Paper.
| data set | no. of tasks | task type | no. of compounds | metric |
|---|---|---|---|---|
| QM7 | 1 | regression | 6,830 | MAE |
| QM8 | 12 | regression | 21,786 | MAE |
| QM9 | 12 | regression | 133,885 | MAE |
| ESOL | 1 | regression | 1,128 | RMSE |
| FreeSolv | 1 | regression | 642 | RMSE |
| Lipophilicity | 1 | regression | 4,200 | RMSE |
| PDBbind-F | 1 | regression | 9,880 | RMSE |
| PDBbind-C | 1 | regression | 168 | RMSE |
| PDBbind-R | 1 | regression | 3,040 | RMSE |
| PCBA | 128 | classification | 437,929 | PRC-AUC |
| MUV | 17 | classification | 93,087 | PRC-AUC |
| HIV | 1 | classification | 41,127 | ROC-AUC |
| BACE | 1 | classification | 1,513 | ROC-AUC |
| BBBP | 1 | classification | 2,039 | ROC-AUC |
| Tox21 | 12 | classification | 7,831 | ROC-AUC |
| ToxCast | 617 | classification | 8,576 | ROC-AUC |
| SIDER | 27 | classification | 1,427 | ROC-AUC |
| ClinTox | 2 | classification | 1,478 | ROC-AUC |
| ChEMBL | 1,310 | classification | 456,331 | ROC-AUC |
Note: PDBbind-F, PDBbind-C, and PDBbind-R refer to the full, core, and refined PDBbind data sets from Wu et al.²
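
For readers unfamiliar with the metric column, the following is a minimal pure-Python sketch of three of the four metrics listed in the table (MAE, RMSE, ROC-AUC). The function names and the `METRICS` lookup are illustrative assumptions, not code from the paper. PRC-AUC is left out because its value depends on how the precision-recall curve is interpolated, which the table does not specify.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error; lower is better."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root-mean-square error; lower is better."""
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


def roc_auc(labels, scores):
    """Area under the ROC curve for binary labels (0 or 1); higher is better.

    Computed as the fraction of (positive, negative) pairs in which the
    positive example receives the higher score, with ties counted as 0.5.
    This is quadratic in the number of examples, so it is only meant for
    small illustrative inputs.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical lookup from the metric names used in the table to the
# functions above (PRC-AUC intentionally omitted, see the note above).
METRICS = {"MAE": mae, "RMSE": rmse, "ROC-AUC": roc_auc}
```

For multitask data sets such as Tox21 or QM8, a common convention is to compute the metric per task and average the results; whether that matches the exact aggregation used for this table is an assumption.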