Table 2.
Statistical evaluations of predictions for the Extra Trees, Random Forest and Bagging predictors and for the Vox Machinarum consensus classifier over our sourced literature log S values (see Table A1 for references) for 89 compounds from the 2019 Solubility Challenge tight test set of 100 molecules. The Vox Machinarum prediction reported here for each compound is the median of the other three classifiers’ predictions for that compound. The standard deviation of the 89 compounds’ log S values is 1.102. Here and in subsequent tables, SD values are calculated using the denominator N for consistency with the definition of RMSE. This is equivalent to calculating the standard deviation of the small set of solubilities itself, rather than using the Bessel correction (denominator N − 1) to emulate the properties of the notional larger distribution from which they might be drawn.
| Method | RMSE | RMSE/SD | AAE | R² | Err < 0.5 | Err < 1.0 |
|---|---|---|---|---|---|---|
| Extra Trees | 0.897 | 0.814 | 0.670 | 0.363 | 46 (52%) | 70 (79%) |
| Random Forest | 0.958 | 0.869 | 0.739 | 0.305 | 40 (45%) | 67 (75%) |
| Bagging | 1.009 | 0.915 | 0.785 | 0.277 | 35 (39%) | 59 (66%) |
| Vox Machinarum | 0.945 | 0.858 | 0.726 | 0.319 | 41 (46%) | 67 (75%) |
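
The column definitions used in Table 2 can be illustrated with a minimal sketch. It is not the authors' code: the function names `evaluate` and `consensus` and the four-compound toy data are hypothetical and chosen only for illustration. The sketch assumes that AAE is the mean absolute error and that "Err < 0.5" and "Err < 1.0" count compounds whose absolute error falls below the threshold. R² is omitted because its exact definition is not given in this caption.

```python
import math
import statistics


def consensus(*prediction_lists):
    """Per-compound median of several predictors' log S predictions."""
    return [statistics.median(values) for values in zip(*prediction_lists)]


def evaluate(observed, predicted):
    """Compute RMSE, SD (denominator N), RMSE/SD, AAE and error counts."""
    errors = [p - o for o, p in zip(observed, predicted)]
    n = len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    # pstdev divides by N (no Bessel correction), matching the caption.
    sd = statistics.pstdev(observed)
    aae = sum(abs(e) for e in errors) / n
    return {
        "rmse": rmse,
        "sd": sd,
        "rmse_over_sd": rmse / sd,
        "aae": aae,
        "within_0.5": sum(abs(e) < 0.5 for e in errors),
        "within_1.0": sum(abs(e) < 1.0 for e in errors),
    }


# Hypothetical toy data (NOT values from the study).
observed = [-2.0, -3.0, -4.0, -5.0]
pred_a = [-2.2, -3.9, -4.1, -6.5]
pred_b = [-1.8, -3.4, -3.0, -5.2]
pred_c = [-2.6, -2.9, -4.4, -5.9]

median_pred = consensus(pred_a, pred_b, pred_c)
stats = evaluate(observed, median_pred)
print(median_pred)
print(stats)
```

Using `statistics.pstdev` rather than `statistics.stdev` keeps the SD on the same denominator N as the RMSE, so RMSE/SD compares like with like.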