Table 5.
Type | Dataset/group | No. of datasets | Compounds | Active | Inactive | Source |
---|---|---|---|---|---|---|
Balanced | AMES | 1 | 4337 | 2401 | 1936 | [47] |
Balanced | CPDBAS | 5 | 1102.6 | 545.8 | 556.8 | [48] |
Balanced | NCTRER | 1 | 217 | 126 | 91 | [33] |
Virtual-screening | ChEMBL | 50 | 10,100 | 100 | 10,000 | [6, 49] |
Virtual-screening | DUD | 3 | 1822.3 | 42 | 1780.3 | [6, 50] |
Virtual-screening | MUV | 16 | 15,026.8 | 30 | 14,996.8 | [6, 51] |
Multiple occurrences of the same compound are included only once. For example, some of the original 15,000 decoys of each MUV dataset were removed for this reason. If multiple occurrences of a compound have differing endpoint values, the compound is omitted. Only 5 of the 7 endpoints of the CPDBAS dataset could be used in this study, because two endpoints (Hamster and Dog/Primates) are too small and yield fewer than 1024 ECFP4 fragments. For groups comprising several datasets, the numbers of compounds, actives, and inactives are means per dataset. A more detailed list of datasets is provided in Additional file 2.
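The two preprocessing rules described above (keep each compound once, drop compounds with conflicting endpoint values, and discard endpoints that yield too few ECFP4 fragments) can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes that duplicates are recognised via some canonical compound key (e.g. a canonical SMILES string), that endpoint values are binary labels, and that "fragments" means distinct ECFP4 identifiers across all compounds of an endpoint. The function names `deduplicate` and `has_enough_fragments` are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Hashable, Iterable


def deduplicate(records: Iterable[tuple[Hashable, int]]) -> dict[Hashable, int]:
    """Keep each compound once; drop compounds whose occurrences disagree.

    `records` are (compound_key, endpoint_label) pairs. The compound key is
    assumed to be a canonical identifier (hypothetical, e.g. canonical SMILES).
    """
    labels: dict[Hashable, set[int]] = defaultdict(set)
    for compound, label in records:
        labels[compound].add(label)
    # A compound survives only if all of its occurrences share one label.
    return {c: next(iter(ls)) for c, ls in labels.items() if len(ls) == 1}


def has_enough_fragments(
    fragment_sets: Iterable[Iterable[Hashable]], minimum: int = 1024
) -> bool:
    """Return True if the dataset yields at least `minimum` distinct fragments.

    Each element of `fragment_sets` holds the (assumed precomputed) ECFP4
    fragment identifiers of one compound.
    """
    distinct: set[Hashable] = set()
    for fragments in fragment_sets:
        distinct.update(fragments)
    return len(distinct) >= minimum


if __name__ == "__main__":
    records = [("CCO", 1), ("CCO", 1), ("c1ccccc1", 0), ("CCN", 1), ("CCN", 0)]
    # "CCO" is kept once, "CCN" is dropped because its labels conflict.
    print(deduplicate(records))
    print(has_enough_fragments([[1, 2], [2, 3]], minimum=3))
```

With the toy input, the duplicated `CCO` entries collapse into one, while `CCN` is removed because its two occurrences carry different labels.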