Separately, on page 3372 the learned matrix … should be …, where … .
Figure 1. (Random split, higher = better) Comparison to baselines on public data sets with original (left) and fixed (right) random forest numbers using a random split.
Table 1. (Random Split, Higher = Better) Comparison to Baselines on Public Datasets with Original and Fixed Random Forest Numbers Using a Random Split.
| data set | metric | D-MPNN | D-MPNN ensemble | RF on Morgan (original) | RF on Morgan (fixed) |
|---|---|---|---|---|---|
| HIV | ROC-AUC | 0.816 ± 0.023 | 0.836 ± 0.020 (+2.40% p = 0.01) | 0.641 ± 0.022 (−21.45% p = 0.00) | 0.819 ± 0.025 (+0.31% p = 0.97) |
| BACE | ROC-AUC | 0.878 ± 0.032 | 0.898 ± 0.034 (+2.31% p = 0.00) | 0.825 ± 0.039 (−6.08% p = 0.00) | 0.898 ± 0.031 (+2.26% p = 1.00) |
| BBBP | ROC-AUC | 0.913 ± 0.026 | 0.925 ± 0.036 (+1.23% p = 0.01) | 0.788 ± 0.038 (−13.77% p = 0.00) | 0.909 ± 0.028 (−0.42% p = 0.19) |
| Tox21 | ROC-AUC | 0.845 ± 0.015 | 0.861 ± 0.012 (+1.95% p = 0.00) | 0.619 ± 0.015 (−26.75% p = 0.00) | 0.819 ± 0.017 (−3.06% p = 0.00) |
| SIDER | ROC-AUC | 0.646 ± 0.016 | 0.664 ± 0.021 (+2.79% p = 0.01) | 0.572 ± 0.007 (−11.38% p = 0.00) | 0.687 ± 0.014 (+6.35% p = 1.00) |
| ClinTox | ROC-AUC | 0.894 ± 0.027 | 0.906 ± 0.043 (+1.33% p = 0.05) | 0.544 ± 0.031 (−39.13% p = 0.00) | 0.759 ± 0.060 (−15.12% p = 0.00) |
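The parenthetical percentages in the baseline columns appear to be the change in mean ROC-AUC relative to the single D-MPNN column. For example, for Tox21 the original random forest gives (0.619 − 0.845) / 0.845 ≈ −26.75%. The minimal Python sketch below recomputes these percentages from the rounded means in Table 1, assuming that definition. The published percentages were presumably computed from unrounded means, so the recomputed values can differ in the first or second decimal place. The names `TABLE1_MEANS`, `relative_change`, and `table_of_changes` are hypothetical and used only for illustration.

```python
# Mean ROC-AUC values copied from Table 1 (random split), rounded to three decimals.
# Assumption: the reported percentages are relative to the single D-MPNN mean.
TABLE1_MEANS = {
    "HIV": {"D-MPNN": 0.816, "D-MPNN ensemble": 0.836,
            "RF on Morgan (original)": 0.641, "RF on Morgan (fixed)": 0.819},
    "BACE": {"D-MPNN": 0.878, "D-MPNN ensemble": 0.898,
             "RF on Morgan (original)": 0.825, "RF on Morgan (fixed)": 0.898},
    "BBBP": {"D-MPNN": 0.913, "D-MPNN ensemble": 0.925,
             "RF on Morgan (original)": 0.788, "RF on Morgan (fixed)": 0.909},
    "Tox21": {"D-MPNN": 0.845, "D-MPNN ensemble": 0.861,
              "RF on Morgan (original)": 0.619, "RF on Morgan (fixed)": 0.819},
    "SIDER": {"D-MPNN": 0.646, "D-MPNN ensemble": 0.664,
              "RF on Morgan (original)": 0.572, "RF on Morgan (fixed)": 0.687},
    "ClinTox": {"D-MPNN": 0.894, "D-MPNN ensemble": 0.906,
                "RF on Morgan (original)": 0.544, "RF on Morgan (fixed)": 0.759},
}


def relative_change(value: float, reference: float) -> float:
    """Percentage change of `value` relative to `reference` (positive means higher)."""
    return 100.0 * (value - reference) / reference


def table_of_changes(means: dict) -> dict:
    """For each data set, compute every column's change relative to the D-MPNN mean."""
    changes = {}
    for dataset, row in means.items():
        reference = row["D-MPNN"]
        changes[dataset] = {
            model: relative_change(score, reference)
            for model, score in row.items()
            if model != "D-MPNN"
        }
    return changes


if __name__ == "__main__":
    for dataset, row in table_of_changes(TABLE1_MEANS).items():
        cells = ", ".join(f"{model}: {pct:+.2f}%" for model, pct in row.items())
        print(f"{dataset}: {cells}")
```

Because the inputs are rounded to three decimals, a tolerance of roughly 0.2 percentage points is needed when comparing the recomputed values with the published ones.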
Figure 2. (Scaffold split, higher = better) Comparison to baselines on public data sets with original (left) and fixed (right) random forest numbers using a scaffold split.
Table 2. (Scaffold Split, Higher = Better) Comparison to Baselines on Public Datasets with Original and Fixed Random Forest Numbers Using a Scaffold Split.
| data set | metric | D-MPNN | D-MPNN ensemble | RF on Morgan (original) | RF on Morgan (fixed) |
|---|---|---|---|---|---|
| HIV | ROC-AUC | 0.794 ± 0.016 | 0.817 ± 0.013 (+2.94% p = 0.00) | 0.583 ± 0.034 (−26.59% p = 0.00) | 0.821 ± 0.020 (+3.42% p = 0.99) |
| BACE | ROC-AUC | 0.838 ± 0.056 | 0.871 ± 0.041 (+3.89% p = 0.00) | 0.804 ± 0.035 (−4.04% p = 0.01) | 0.884 ± 0.026 (+5.43% p = 1.00) |
| BBBP | ROC-AUC | 0.888 ± 0.029 | 0.902 ± 0.024 (+1.56% p = 0.01) | 0.722 ± 0.049 (−18.68% p = 0.00) | 0.880 ± 0.034 (−0.88% p = 0.45) |
| Tox21 | ROC-AUC | 0.791 ± 0.047 | 0.814 ± 0.047 (+2.89% p = 0.00) | 0.582 ± 0.031 (−26.42% p = 0.00) | 0.747 ± 0.040 (−5.54% p = 0.00) |
| SIDER | ROC-AUC | 0.593 ± 0.032 | 0.612 ± 0.047 (+3.31% p = 0.03) | 0.540 ± 0.013 (−8.79% p = 0.00) | 0.632 ± 0.043 (+6.75% p = 1.00) |
| ClinTox | ROC-AUC | 0.870 ± 0.072 | 0.895 ± 0.050 (+2.86% p = 0.01) | numerically unstable | 0.711 ± 0.123 (−18.24% p = 0.00) |
Figure 3. (Time split, higher = better) Comparison to baselines on the Amgen data set with original (left) and fixed (right) random forest numbers using a time split.
Table 3. (Time Split, Higher = Better) Comparison to Baselines on Amgen Dataset with Original and Fixed Random Forest Numbers Using a Time Split.
| data set | metric | D-MPNN | D-MPNN ensemble | RF on Morgan (original) | RF on Morgan (fixed) |
|---|---|---|---|---|---|
| hPXR (class) | ROC-AUC | 0.842 ± 0.008 | 0.858 ± 0.002 (+1.95%) | 0.598 ± 0.004 (−28.98%) | 0.869 ± 0.007 (+3.28%) |
Table 4. Number of Public Datasets Where D-MPNN is Statistically Significantly Better than, Equivalent to, or Worse than Random Forest.
| baseline | D-MPNN is better | D-MPNN is the same | D-MPNN is worse | no. data sets |
|---|---|---|---|---|
| RF on Morgan (original) | 14 | 0 | 1 | 15 |
| RF on Morgan (fixed) | 9 | 1 | 4 | 15 |



