TABLE 1.
Summary of the applied datasets.
| Dataset | Original: dataset size | Original: active molecules | Original: % of actives | Final: dataset size | Final: internal set | Final: external set | Reference |
|---|---|---|---|---|---|---|---|
| Mutagenicity | 6,512 | 3,503 | 54 | 6,190 | 4,952 | 1,238 | Hansen et al (2009) |
| P-glycoprotein | 1,275 | 666 | 52 | 1,180 | 944 | 236 | Broccatelli et al (2011) |
| hERG | 4,787 | 2,749 | 57 | 4,612 | 3,690 | 922 | Alves et al (2018) |
| Hepatotoxicity | 2,476 | 619 | 25 | 2,414 | 1,932 | 482 | Wu et al (2019) |
| BBB | 1,864 | 1,438 | 77 | 1,750 | 1,400 | 350 | Roy et al (2019) |
| CYP 2C9 | 12,776 | 5,800 | 45 | 12,379 | 9,904 | 2,475 | PubChem (2021) |