Table 4. Descriptions of the 10 Data Sets Used in This Studyᵃ
category | data set | # molecules | # tasks | task type |
---|---|---|---|---|
physical chemistry | ESOL | 1128 | 1 | regression |
physical chemistry | FreeSolv | 642 | 1 | regression |
physical chemistry | Lipo | 4200 | 1 | regression |
biophysics | HIV | 41 127 | 1 | classification |
biophysics | BACE | 1513 | 1 | classification |
physiology | BBBP | 2039 | 1 | classification |
physiology | Tox21 | 7831 | 12 | classification |
physiology | ToxCast | 8576 | 617 | classification |
physiology | SIDER | 1427 | 27 | classification |
physiology | ClinTox | 1478 | 2 | classification |
ᵃThe columns “# molecules” and “# tasks” give the number of molecule samples and the number of tasks in each data set. The column “task type” indicates whether the data set poses a regression or a classification problem. Further details on the data sets are available in Wu et al. (ref 19).
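
For readers who want to work with these figures programmatically, the table can be transcribed into a small record structure. The following is only an illustrative sketch: the names `DataSet`, `DATA_SETS`, and `summarize` are hypothetical and are not taken from the study's code. The values are copied from Table 4 as printed.

```python
from collections import Counter
from typing import NamedTuple


class DataSet(NamedTuple):
    """One row of Table 4 (hypothetical container, not from the study)."""
    category: str
    name: str
    n_molecules: int
    n_tasks: int
    task_type: str


# Values transcribed from Table 4.
DATA_SETS = [
    DataSet("physical chemistry", "ESOL", 1128, 1, "regression"),
    DataSet("physical chemistry", "FreeSolv", 642, 1, "regression"),
    DataSet("physical chemistry", "Lipo", 4200, 1, "regression"),
    DataSet("biophysics", "HIV", 41127, 1, "classification"),
    DataSet("biophysics", "BACE", 1513, 1, "classification"),
    DataSet("physiology", "BBBP", 2039, 1, "classification"),
    DataSet("physiology", "Tox21", 7831, 12, "classification"),
    DataSet("physiology", "ToxCast", 8576, 617, "classification"),
    DataSet("physiology", "SIDER", 1427, 27, "classification"),
    DataSet("physiology", "ClinTox", 1478, 2, "classification"),
]


def summarize(data_sets):
    """Count data sets per task type and per category."""
    by_type = Counter(d.task_type for d in data_sets)
    by_category = Counter(d.category for d in data_sets)
    return by_type, by_category


by_type, by_category = summarize(DATA_SETS)
total_tasks = sum(d.n_tasks for d in DATA_SETS)

print(dict(by_type))
print(dict(by_category))
print(total_tasks)
```

Grouping by `task_type` makes the split between the three regression sets and the seven classification sets explicit, and summing `n_tasks` shows how strongly ToxCast dominates the overall task count.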