Performance of DeepReac and other models on regression prediction for three benchmark datasets.ᵈ
| Model | Dataset A, RMSE | Dataset A, R² | Dataset B, RMSE | Dataset B, R² | Dataset C, MAE | Dataset C, R² |
|---|---|---|---|---|---|---|
| Mean | 0.273 ± 0.002 | —ᵇ | 0.290 ± 0.004 | —ᵇ | 0.558 ± 0.035 | —ᵇ |
| Median | 0.276 ± 0.003 | —ᵇ | 0.303 ± 0.006 | —ᵇ | 0.557 ± 0.036 | —ᵇ |
| Previous work [9, 83, 84]ᵃ | 0.073 ± 0.004 | 0.919 ± 0.010 | 0.180 ± 0.004 | 0.354 ± 0.034 | 0.186 ± 0.010 | 0.822 ± 0.020 |
| MFF + RF [33]ᵃ | 0.071 ± 0.004 | 0.924 ± 0.009 | —ᶜ | —ᶜ | 0.132 ± 0.010 | 0.912 ± 0.012 |
| DeepReac | **0.053 ± 0.004** | **0.960 ± 0.006** | **0.088 ± 0.006** | **0.901 ± 0.013** | **0.096 ± 0.018** | **0.956 ± 0.012** |
| DeepReac_noG | 0.134 ± 0.011 | 0.674 ± 0.067 | 0.171 ± 0.008 | 0.467 ± 0.072 | 0.178 ± 0.021 | 0.852 ± 0.026 |
| DeepReac_noC | 0.061 ± 0.003 | 0.949 ± 0.005 | 0.096 ± 0.001 | 0.884 ± 0.003 | 0.185 ± 0.011 | 0.847 ± 0.025 |
| DeepReac_noGC | 0.150 ± 0.004 | 0.568 ± 0.007 | 0.200 ± 0.004 | 0.114 ± 0.068 | 0.198 ± 0.014 | 0.837 ± 0.017 |
ᵃ Because our validation method differs from that of the original studies, we retrained and retested these models. Note that the retrained models perform slightly worse than originally reported.
ᵇ The R² values for the mean and median models were all negative and therefore not meaningful, so they are omitted.
ᶜ Because MFF does not specify how to encode inorganic compounds, which are present in Dataset B, we did not train the MFF + RF model on this dataset.
ᵈ Values are the mean ± standard deviation of the CV results. The best results are given in bold. RMSE, root-mean-square error; MAE, mean absolute error (in kcal mol⁻¹); R², coefficient of determination; MFF, multiple fingerprint feature; RF, random forest. See also Fig. S12–S26.
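
As an illustration of why the mean baseline cannot reach a positive R², the following minimal sketch scores a constant predictor under cross-validation and summarizes the folds as mean ± standard deviation. It is not the evaluation code used for the table: the synthetic targets, the fold count, the shuffling seed, the use of the sample standard deviation, and the helper names (`r2_score`, `rmse`, `baseline_cv`, `summarize`) are all assumptions made for the example.

```python
import random
import statistics


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = statistics.fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root-mean-square error."""
    n = len(y_true)
    return (sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n) ** 0.5


def baseline_cv(y, n_folds=10, seed=0):
    """Score a 'predict the training mean' baseline on each CV fold.

    n_folds and seed are illustrative assumptions, not values from the paper.
    """
    rng = random.Random(seed)
    idx = list(range(len(y)))
    rng.shuffle(idx)
    folds = [idx[i::n_folds] for i in range(n_folds)]

    rmses, r2s = [], []
    for k, test_idx in enumerate(folds):
        test = [y[i] for i in test_idx]
        train = [y[i] for j, fold in enumerate(folds) if j != k for i in fold]
        constant = statistics.fmean(train)  # the baseline's single prediction
        pred = [constant] * len(test)
        rmses.append(rmse(test, pred))
        r2s.append(r2_score(test, pred))
    return rmses, r2s


def summarize(values):
    """Return (mean, sample standard deviation) over the folds."""
    return statistics.fmean(values), statistics.stdev(values)


# Synthetic targets in [0, 1], standing in for scaled reaction outcomes.
data_rng = random.Random(42)
y = [data_rng.random() for _ in range(200)]

fold_rmse, fold_r2 = baseline_cv(y)
print("RMSE: %.3f ± %.3f" % summarize(fold_rmse))
print("R2:   %.3f ± %.3f" % summarize(fold_r2))
```

For a constant prediction c, SS_res = SS_tot + n(ȳ_test − c)², so R² on a held-out fold is at most zero and is strictly negative whenever the training mean differs from the test-fold mean. For the mean baseline this follows from the definition of R² and carries no information about model quality.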