2016 Oct 31;8:60. doi: 10.1186/s13321-016-0173-z

Table 5.

The 76 datasets used for our model building experiments

Type               Dataset/group  No. of datasets  Compounds  Active  Inactive  Source
Balanced           AMES           1                4337       2401    1936      [47]
Balanced           CPDBAS         5                1102.6     545.8   556.8     [48]
Balanced           NCTRER         1                217        126     91        [33]
Virtual-screening  ChEMBL         50               10,100     100     10,000    [6, 49]
Virtual-screening  DUD            3                1822.3     42      1780.3    [6, 50]
Virtual-screening  MUV            16               15,026.8   30      14,996.8  [6, 51]

A compound that occurs multiple times in a dataset is included only once; for example, some of the original 15,000 decoys in each MUV dataset are removed. If multiple occurrences of a compound have differing endpoint values, the compound is omitted. Only 5 of the 7 endpoints from the CPDBAS dataset could be used for this study, as two endpoints (Hamster and Dog/Primates) are too small and yield fewer than 1024 ECFP4 fragments. A more detailed list of datasets is provided in Additional file 2.
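The duplicate-handling rule in this footnote can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `deduplicate`, the string compound identifiers (which could be, for instance, canonical structure strings), and the boolean activity labels are assumptions made for illustration only.

```python
from collections import defaultdict


def deduplicate(records):
    """Collapse repeated compounds and drop those with conflicting labels.

    `records` is an iterable of (compound_id, is_active) pairs.
    Both the identifier format and the boolean label are hypothetical.
    """
    labels = defaultdict(set)
    for compound_id, is_active in records:
        # Collect every endpoint value seen for this compound.
        labels[compound_id].add(is_active)
    # Keep a compound only if all of its occurrences agree on the endpoint.
    return {
        compound_id: next(iter(values))
        for compound_id, values in labels.items()
        if len(values) == 1
    }


# Hypothetical example data: c1 is a consistent duplicate, c3 is conflicting.
records = [
    ("c1", True),
    ("c1", True),
    ("c2", False),
    ("c3", True),
    ("c3", False),
]
cleaned = deduplicate(records)
print(cleaned)
```

In this example, `c1` is kept once, `c2` is kept unchanged, and `c3` is omitted because its two occurrences disagree on the endpoint value.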