Table 2.
Method name | Category | RMSE | MAE | R² | Kendall’s Tau |
---|---|---|---|---|---|
TFE MLR [87] | Empirical | 0.58 [0.34, 0.83] | 0.41 [0.26, 0.60] | 0.43 [0.06, 0.80] | 0.56 [0.23, 0.83] |
Chemprop [88] | Empirical | 0.66 [0.39, 0.89] | 0.48 [0.30, 0.69] | 0.41 [0.11, 0.76] | 0.54 [0.25, 0.82] |
ClassicalGSG DB3 [84–86] | Empirical | 0.77 [0.57, 0.96] | 0.62 [0.43, 0.82] | 0.51 [0.18, 0.77] | 0.48 [0.14, 0.75] |
COSMO-RS [89] | Physical (QM) | 0.78 [0.49, 1.01] | 0.57 [0.36, 0.80] | 0.49 [0.17, 0.80] | 0.53 [0.25, 0.78] |
TFE-NHLBI-TZVP-QM | Physical (QM) | 1.55 [1.19, 1.87] | 1.34 [1.02, 1.76] | 0.52 [0.19, 0.78] | 0.51 [0.19, 0.78] |
Submissions were ranked according to RMSE, MAE, R², and Kendall’s Tau. Many top methods are statistically indistinguishable once the uncertainties of their error metrics are taken into account. Moreover, the ranking of methods depends strongly on which metric is chosen. To determine which ranked log P prediction methods were consistently the best according to all four statistical metrics, we assessed the top 10 methods under each metric. This yielded a set of five consistently well-performing methods: three empirical methods and two QM-based physical methods. Performance statistics are reported as the mean followed by the 95% confidence interval in brackets. Correlation plots of the best-performing methods and of one average-performing method are shown in Figure 5. Additional statistics are available in Table S1.
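The evaluation described above (four metrics, bootstrap-style confidence intervals, and a consistency check across per-metric top-k lists) can be sketched as follows. This is a minimal, illustrative sketch, not the analysis code used for the challenge. Several choices are assumptions: R² is taken to be the squared Pearson correlation (consistent with a high-RMSE method in Table 2 still having R² = 0.52), Kendall's Tau is the tie-corrected tau-b, intervals come from a percentile bootstrap over molecules, and "consistently well-performing" is read as appearing in every per-metric top-k list. All function names (`rmse`, `mae`, `r_squared`, `kendall_tau`, `bootstrap_ci`, `consistently_top`) and the `HIGHER_IS_BETTER` table are hypothetical.

```python
import math
import random
from typing import Callable, Dict, List, Sequence, Tuple

# Assumption: illustrative metric definitions, not the challenge's exact code.
Metric = Callable[[Sequence[float], Sequence[float]], float]


def rmse(pred: Sequence[float], exp: Sequence[float]) -> float:
    """Root-mean-square error between predicted and experimental log P."""
    return math.sqrt(sum((p - e) ** 2 for p, e in zip(pred, exp)) / len(exp))


def mae(pred: Sequence[float], exp: Sequence[float]) -> float:
    """Mean absolute error between predicted and experimental log P."""
    return sum(abs(p - e) for p, e in zip(pred, exp)) / len(exp)


def r_squared(pred: Sequence[float], exp: Sequence[float]) -> float:
    """Squared Pearson correlation (assumed meaning of R²); NaN if constant input."""
    n = len(exp)
    mp, me = sum(pred) / n, sum(exp) / n
    cov = sum((p - mp) * (e - me) for p, e in zip(pred, exp))
    vp = sum((p - mp) ** 2 for p in pred)
    ve = sum((e - me) ** 2 for e in exp)
    if vp == 0 or ve == 0:
        return math.nan
    return cov * cov / (vp * ve)


def kendall_tau(pred: Sequence[float], exp: Sequence[float]) -> float:
    """Kendall's tau-b (tie-corrected, an assumption); NaN if one input is constant."""
    conc = disc = tie_p = tie_e = 0
    n = len(exp)
    for i in range(n):
        for j in range(i + 1, n):
            dp, de = pred[i] - pred[j], exp[i] - exp[j]
            if dp == 0 and de == 0:
                continue  # tied in both: excluded from the tau-b denominator
            if dp == 0:
                tie_p += 1
            elif de == 0:
                tie_e += 1
            elif (dp > 0) == (de > 0):
                conc += 1
            else:
                disc += 1
    denom = math.sqrt((conc + disc + tie_p) * (conc + disc + tie_e))
    return (conc - disc) / denom if denom else math.nan


def bootstrap_ci(metric: Metric, pred: Sequence[float], exp: Sequence[float],
                 n_boot: int = 1000, alpha: float = 0.05,
                 seed: int = 0) -> Tuple[float, float, float]:
    """Full-sample estimate plus a percentile bootstrap CI, resampling molecules."""
    rng = random.Random(seed)
    n = len(exp)
    samples = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        value = metric([pred[i] for i in idx], [exp[i] for i in idx])
        if not math.isnan(value):  # skip degenerate resamples
            samples.append(value)
    samples.sort()
    lo = samples[int(alpha / 2 * (len(samples) - 1))]
    hi = samples[int((1 - alpha / 2) * (len(samples) - 1))]
    return metric(pred, exp), lo, hi


# Errors are better when lower; correlations are better when higher.
HIGHER_IS_BETTER = {"RMSE": False, "MAE": False, "R2": True, "Kendall_tau": True}


def consistently_top(scores: Dict[str, Dict[str, float]], k: int = 10) -> List[str]:
    """Methods that appear in the top-k list under every metric."""
    common = None
    for metric, higher in HIGHER_IS_BETTER.items():
        ranked = sorted(scores, key=lambda m: scores[m][metric], reverse=higher)
        top = set(ranked[:k])
        common = top if common is None else common & top
    return sorted(common)
```

With `k=10` and one score dictionary per submitted method, `consistently_top` returns the methods that no single metric penalizes heavily. Because the intersection is taken across metrics, a method that is excellent under RMSE but poorly ranked under R² is excluded, which mirrors the observation that the ordering of methods depends on the metric chosen.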