Results for the three reaction datasets. For each dataset, every model was trained on 30 random data splits; each entry reports the mean ± standard deviation of the test-set R² and, in parentheses, of the test-set MAE.
Method | Suzuki–Miyaura [HTE] (ref. 18) | Buchwald–Hartwig [HTE] (ref. 10) | Buchwald–Hartwig [ELN] |
---|---|---|---|
RFᵃ | 0.828 ± 0.008 (0.082 ± 0.002) | 0.913 ± 0.008 (0.054 ± 0.002) | 0.266 ± 0.037 (0.202 ± 0.007) |
RFᵇ | 0.796 ± 0.011 (0.09 ± 0.002) | 0.917 ± 0.008 (0.054 ± 0.002) | 0.262 ± 0.029 (0.205 ± 0.007) |
BERT (ref. 22) | 0.81 ± 0.01 (0.078 ± 0.004) | 0.951 ± 0.005 (0.054 ± 0.003) | −0.006 ± 0.105 (0.253 ± 0.01) |
Lassoᵃ | 0.798 ± 0.001 (0.167 ± 0.002) | 0.699 ± 0.011 (0.120 ± 0.002) | ᶜ |
SVMᵃ | 0.798 ± 0.009 (0.100 ± 0.002) | 0.848 ± 0.009 (0.082 ± 0.001) | 0.222 ± 0.057 (0.209 ± 0.008) |
KNNᵃ | 0.568 ± 0.011 (0.148 ± 0.002) | 0.530 ± 0.019 (0.152 ± 0.003) | 0.067 ± 0.04 (0.241 ± 0.008) |
One-hot encoding | 0.816 ± 0.008 (0.086 ± 0.002) | 0.831 ± 0.002 (0.081 ± 0.002) | 0.144 ± 0.072 (0.105 ± 0.004) |
Shuffleᵃ | −0.055 ± 0.013 (0.257 ± 0.003) | −0.066 ± 0.017 (0.241 ± 0.005) | −0.159 ± 0.060 (0.247 ± 0.011) |

ᵃ With RDKit features.
ᵇ Without RDKit features.
ᶜ Overfitted.
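
The aggregation described in the caption (mean ± standard deviation of test-set R² and MAE over 30 random splits) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the synthetic data, the 30% test fraction, the one-feature least-squares model `fit_predict_line`, and the helper names are all assumptions introduced here. The `with_shuffled_labels` wrapper assumes that the "Shuffle" row is a label-permutation baseline, which the table itself does not state.

```python
import random
import statistics


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = statistics.fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def mean_absolute_error(y_true, y_pred):
    """Mean of the absolute prediction errors."""
    return statistics.fmean([abs(t - p) for t, p in zip(y_true, y_pred)])


def fit_predict_line(x_train, y_train, x_test):
    """Hypothetical stand-in model: ordinary least squares on one feature."""
    mx = statistics.fmean(x_train)
    my = statistics.fmean(y_train)
    sxx = sum((x - mx) ** 2 for x in x_train)
    sxy = sum((x - mx) * (y - my) for x, y in zip(x_train, y_train))
    slope = sxy / sxx
    intercept = my - slope * mx
    return [intercept + slope * x for x in x_test]


def with_shuffled_labels(fit_predict, seed=0):
    """Wrap a model so that it is trained on permuted labels (assumed baseline)."""
    rng = random.Random(seed)

    def wrapped(x_train, y_train, x_test):
        y_perm = list(y_train)
        rng.shuffle(y_perm)
        return fit_predict(x_train, y_perm, x_test)

    return wrapped


def evaluate_over_splits(xs, ys, fit_predict, n_splits=30, test_fraction=0.3, seed=0):
    """Return (mean, std) of test-set R2 and MAE over repeated random splits."""
    rng = random.Random(seed)
    indices = list(range(len(xs)))
    n_test = max(2, round(test_fraction * len(xs)))  # assumed split ratio
    r2_values, mae_values = [], []
    for _ in range(n_splits):
        rng.shuffle(indices)
        test_idx, train_idx = indices[:n_test], indices[n_test:]
        y_test = [ys[i] for i in test_idx]
        y_pred = fit_predict(
            [xs[i] for i in train_idx],
            [ys[i] for i in train_idx],
            [xs[i] for i in test_idx],
        )
        r2_values.append(r2_score(y_test, y_pred))
        mae_values.append(mean_absolute_error(y_test, y_pred))
    return {
        "r2": (statistics.fmean(r2_values), statistics.stdev(r2_values)),
        "mae": (statistics.fmean(mae_values), statistics.stdev(mae_values)),
    }


# Synthetic "yield" data in roughly [0, 1]; purely illustrative.
data_rng = random.Random(42)
xs = [data_rng.random() for _ in range(200)]
ys = [0.2 + 0.6 * x + data_rng.gauss(0.0, 0.05) for x in xs]

for name, model in [
    ("line", fit_predict_line),
    ("shuffle", with_shuffled_labels(fit_predict_line)),
]:
    res = evaluate_over_splits(xs, ys, model)
    (r2_m, r2_s), (mae_m, mae_s) = res["r2"], res["mae"]
    print(f"{name}: {r2_m:.3f} ± {r2_s:.3f} ({mae_m:.3f} ± {mae_s:.3f})")
```

Each printed line has the same layout as a table cell: R² mean ± standard deviation, followed by the MAE in parentheses. Because R² compares a model against predicting the test-set mean, a model trained on permuted labels typically lands at or slightly below zero, which is consistent with the negative values in the Shuffle row.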