Author manuscript; available in PMC: 2022 Aug 9.
Published in final edited form as: J Phys Chem Lett. 2021 Nov 1;12(44):10793–10801. doi: 10.1021/acs.jpclett.1c03058

Table 1.

Three Pretraining Datasets and 10 Datasets Used for Benchmarking Our Platform

| data set | task type | compounds | split | metric |
|---|---|---|---|---|
| ChEMBL (C) [23] | pretrain | 1 941 410 | | accuracy |
| ChEMBL and PubChem (CP) [24] | pretrain | 103 395 400 | | accuracy |
| ChEMBL, PubChem, and ZINC (CPZ) [25] | pretrain | 775 007 514 | | accuracy |
| Ames mutagenicity (Ames) [34] | classification | 6 512 | 8:1:1 | ROC-AUC |
| β-secretase 1 inhibition (bace) [35] | classification | 1 513 | 8:1:1 | ROC-AUC |
| blood−brain barrier penetration (bbbp) [36] | classification | 2 039 | 8:1:1 | ROC-AUC |
| toxicity in honeybees (beet) [37] | classification | 254 | 8:1:1 | ROC-AUC |
| ClinTox (clinical trial results) [38] | classification | 1 478 | 8:1:1 | ROC-AUC |
| aqueous solubility (ESOL) [39] | regression | 1 128 | 8:1:1 | R² |
| lipophilicity (Lipop) [23] | regression | 4 200 | 8:1:1 | R² |
| free solvation database (FreeSolv) [40] | regression | 642 | 8:1:1 | R² |
| LogS [41] | regression | 4 801 | 8:1:1 | R² |
| DPP-4 inhibitors (DPP4) [42] | regression | 3 933 | 8:1:1 | R² |
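
The 8:1:1 split listed for the benchmark data sets can be illustrated with a minimal sketch. The table gives only the ratio, so several things here are assumptions rather than the authors' procedure: that the split is a random shuffle, that the three parts are used for training, validation, and testing, and the hypothetical names `split_8_1_1` and `seed`. The example uses the size of the smallest data set in the table (beet, 254 compounds).

```python
import random

def split_8_1_1(items, seed=0):
    """Shuffle items and cut them into three parts in an 8:1:1 ratio.

    Assumption: a seeded random shuffle; the source states only the ratio.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    n_first = n * 8 // 10   # 80% of the items, rounded down
    n_second = n // 10      # 10% of the items, rounded down
    first = shuffled[:n_first]
    second = shuffled[n_first:n_first + n_second]
    third = shuffled[n_first + n_second:]  # remainder, so nothing is dropped
    return first, second, third

# 254 placeholder indices standing in for the beet compounds.
train_part, valid_part, test_part = split_8_1_1(range(254))
print(len(train_part), len(valid_part), len(test_part))  # 203 25 26
```

Integer arithmetic is used for the cut points so that the part sizes do not depend on floating-point rounding; any leftover items go to the last part.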