Table 1: iSIM JT, iSIM-JT, and Pairwise values for 10% samples of the ChEMBL33 natural products (whole dataset: n = 64086, iSIM = 0.2745, iSIM-) drawn with different sampling methods. Molecules were represented with RDKit fingerprints (2048 bits).
| Sampling method | iSIM JT | iSIM-JT | Pairwise |
|---|---|---|---|
| Medoid | 0.6031 | 0.0747 | 0.0763 |
| Outlier | 0.0856 | 0.0687 | 0.0807 |
| Extremes | 0.2800 | 0.2742 | 0.2748 |
| Quota | 0.3113 | 0.1913 | 0.1890 |
| Stratified | 0.2745 | 0.1329 | 0.1302 |
| MaxMin | 0.1764 | 0.0936 | 0.0900 |
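The comparison between an iSIM-type value and a pairwise average can be illustrated with a minimal sketch. It assumes the column-sum form of the Jaccard-Tanimoto index (matches counted from per-bit sums rather than from explicit pairs). It uses toy random bit vectors rather than ChEMBL33 fingerprints, and plain uniform sampling rather than any of the sampling methods listed in the table. The function names `isim_jt` and `mean_pairwise_jt` are hypothetical and chosen only for this illustration.

```python
import random
from itertools import combinations


def isim_jt(fps):
    """Column-sum Jaccard-Tanimoto over a set of binary fingerprints.

    Assumed form: for each bit with column sum k over n molecules,
    k*(k-1)/2 pairs share the bit and k*(n-k) pairs differ on it.
    Cost grows linearly with the number of molecules.
    """
    n = len(fps)
    col_sums = [sum(col) for col in zip(*fps)]
    shared = sum(k * (k - 1) // 2 for k in col_sums)
    mismatched = sum(k * (n - k) for k in col_sums)
    denom = shared + mismatched
    return shared / denom if denom else 0.0


def mean_pairwise_jt(fps):
    """Average Tanimoto over all explicit pairs (quadratic cost)."""
    total, count = 0.0, 0
    for x, y in combinations(fps, 2):
        both = sum(i & j for i, j in zip(x, y))
        either = sum(i | j for i, j in zip(x, y))
        total += both / either if either else 0.0
        count += 1
    return total / count


random.seed(0)
# Toy data: 50 random 64-bit vectors standing in for real fingerprints.
fps = [[1 if random.random() < 0.3 else 0 for _ in range(64)] for _ in range(50)]
# Uniform 10% sample (an assumption; not one of the methods in the table).
sample = random.sample(fps, k=len(fps) // 10)

print("whole set:", isim_jt(fps), mean_pairwise_jt(fps))
print("10% sample:", isim_jt(sample), mean_pairwise_jt(sample))
```

In this sketch the column-sum value is a ratio of sums while the pairwise value is a mean of ratios, so the two generally agree closely but not exactly; for a set of exactly two fingerprints they coincide.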