2013 Jul 15;53(8):1990–2000. doi: 10.1021/ci400213d

Figure 5.

Cumulative positive predictive values (PPVs) are shown as a function of distance to models (DMs) for two property-based similarities (ref 32), Consensus and Bagging STD, and for Leverage, which was calculated using ECFP4 descriptors (ref 28). The same Consensus model was analyzed in all three cases, so the PPV100% values are identical across the plots. The PPV values for the first 500 compounds were averaged to reduce chance fluctuations caused by the small sample size. All plots show that prediction accuracy decreases for molecules with large DMs. The property-based similarities identify molecules with correct predictions better than Leverage does and therefore give higher PPVs. Consequently, acquiring molecules from the regions with the most accurate predictions, e.g., 10% of compounds, would yield a smaller fraction of insoluble molecules when the selection is based on property-based similarities than when it is based on Leverage.
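
The curves described in this caption can be illustrated with a minimal sketch, which is not the authors' code. It assumes that compounds predicted as positive are ranked by increasing DM and that the cumulative PPV at each rank is the fraction of confirmed predictions so far. The function name `cumulative_ppv`, its parameters, and the way the first `n_smooth` values are replaced by their mean are hypothetical choices made for illustration; the caption does not specify how the averaging over the first 500 compounds was performed.

```python
def cumulative_ppv(distances, is_correct, n_smooth=0):
    """Cumulative PPV over predicted-positive compounds ordered by increasing DM.

    distances:  DM value for each predicted-positive compound (lower = closer to the model)
    is_correct: True where the positive prediction was confirmed experimentally
    n_smooth:   assumption, not from the source: the first n_smooth cumulative
                values are replaced by their mean to damp small-sample noise
    """
    if len(distances) != len(is_correct):
        raise ValueError("distances and is_correct must have the same length")

    # Rank compounds from the smallest DM to the largest.
    order = sorted(range(len(distances)), key=lambda i: distances[i])

    ppv = []
    hits = 0
    for rank, i in enumerate(order, start=1):
        hits += 1 if is_correct[i] else 0
        ppv.append(hits / rank)  # fraction of correct predictions among the first `rank`

    # Optional smoothing of the noisy head of the curve (illustrative assumption).
    k = min(n_smooth, len(ppv))
    if k > 0:
        head_mean = sum(ppv[:k]) / k
        ppv[:k] = [head_mean] * k
    return ppv


# Toy data: four predicted-positive compounds with hypothetical DM values.
curve = cumulative_ppv([0.3, 0.1, 0.2, 0.4], [False, True, True, False])
```

Without smoothing, the last value of the curve depends only on the total number of correct predictions and not on the ranking, which is consistent with the caption's remark that the PPV100% values coincide when the same model is ranked by different DMs.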