Table 2. Datasets Used for Bayesian Models Created for Use by MMDS, with ECFP6 Fingerprints^a
model | datasets used and refs | cutoff for active | no. of molecules | five-fold cross-validation ROC AUC |
---|---|---|---|---|
solubility | ref (145) | Log solubility = −5 | 1144 actives, 155 inactives | 0.86 |
probe-like | ref (148) | described in ref (148) | 253 actives, 69 inactives | 0.76 |
hERG | ref (149) | described in ref (149) | 373 actives, 433 inactives | 0.85 |
KCNQ1 | PubChem BioAssay: AID 2642 (ref 150) | using actives assigned in PubChem | 301,737 actives, 3878 inactives | 0.84 |
bubonic plague (Yersinia pestis) | PubChem single-point screen BioAssay: AID 898 | active when inhibition ≥50% | 223 actives, 139,710 inactives | 0.81 |
Chagas disease (Trypanosoma cruzi) | PubChem BioAssay: AID 2044 | active when EC50 <1 μM and >10-fold difference in cytotoxicity | 1692 actives, 2363 inactives | 0.8 |
TB (Mycobacterium tuberculosis) | in vitro bioactivity and cytotoxicity data from MLSMR, CB2, kinase, and ARRA datasets (ref 110) | Mtb activity and acceptable Vero cell cytotoxicity: selectivity index = CC50/(MIC or IC90) ≥10 | 1434 actives, 5789 inactives | 0.73 |
malaria (Plasmodium falciparum) | CDD Public datasets (MMV, St. Jude, Novartis, and TCAMS) (refs 127−129) | 3D7 EC50 <10 nM | 175 actives, 19,604 inactives | 0.98 |
^a All eight models use ECFP6 fingerprints folded into 32,768 slots.
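
The folding mentioned in the table footnote can be illustrated with a minimal Python sketch. It assumes folding is done by taking each hashed feature identifier modulo the number of slots; the table does not state which scheme was actually used. The function name `fold_features` and the example feature values are hypothetical and chosen only for illustration.

```python
from typing import Iterable, Set

N_SLOTS = 32_768  # 2**15, the folded length stated in the table footnote


def fold_features(feature_ids: Iterable[int], n_slots: int = N_SLOTS) -> Set[int]:
    """Map arbitrary integer feature IDs onto slot indices in [0, n_slots)."""
    # Assumption: modulo folding. Python's % returns a non-negative result
    # for a positive modulus, so negative (signed) hashes are handled too.
    return {fid % n_slots for fid in feature_ids}


# Hypothetical hashed ECFP6 feature identifiers for one molecule (made-up values).
example_features = [5, 32_773, 1_000_000, -3]
folded = fold_features(example_features)
print(sorted(folded))
```

In this example the identifiers 5 and 32,773 land in the same slot, so four features occupy only three slots. Such collisions are the price of a fixed-length representation, and a larger slot count makes them less frequent.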