. 2023 Mar 13;14(19):4997–5005. doi: 10.1039/d2sc06041h

Results for the three reaction datasets. For each dataset, the test-set R² and MAE (in parentheses) are reported as the mean ± standard deviation over 30 models, each trained and evaluated on a different random data split.

| Method | Suzuki–Miyaura [HTE] (ref. 18) | Buchwald–Hartwig [HTE] (ref. 10) | Buchwald–Hartwig [ELN] |
|---|---|---|---|
| RF^a | 0.828 ± 0.008 (0.082 ± 0.002) | 0.913 ± 0.008 (0.054 ± 0.002) | 0.266 ± 0.037 (0.202 ± 0.007) |
| RF^b | 0.796 ± 0.011 (0.09 ± 0.002) | 0.917 ± 0.008 (0.054 ± 0.002) | 0.262 ± 0.029 (0.205 ± 0.007) |
| BERT (ref. 22) | 0.81 ± 0.01 (0.078 ± 0.004) | 0.951 ± 0.005 (0.054 ± 0.003) | −0.006 ± 0.105 (0.253 ± 0.01) |
| Lasso^a | 0.798 ± 0.001 (0.167 ± 0.002) | 0.699 ± 0.011 (0.120 ± 0.002) | –^c |
| SVM^a | 0.798 ± 0.009 (0.100 ± 0.002) | 0.848 ± 0.009 (0.082 ± 0.001) | 0.222 ± 0.057 (0.209 ± 0.008) |
| KNN^a | 0.568 ± 0.011 (0.148 ± 0.002) | 0.530 ± 0.019 (0.152 ± 0.003) | 0.067 ± 0.04 (0.241 ± 0.008) |
| One-hot encoding | 0.816 ± 0.008 (0.086 ± 0.002) | 0.831 ± 0.002 (0.081 ± 0.002) | 0.144 ± 0.072 (0.105 ± 0.004) |
| Shuffle^a | −0.055 ± 0.013 (0.257 ± 0.003) | −0.066 ± 0.017 (0.241 ± 0.005) | −0.159 ± 0.060 (0.247 ± 0.011) |
^a With RDKit features.
^b Without RDKit features.
^c Overfitted.
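The evaluation protocol in the caption (test-set R² and MAE, summarised as mean ± standard deviation over 30 random splits) can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the 30 % test fraction, the use of the sample (n − 1) standard deviation, the one-feature least-squares model and the synthetic data are all assumptions standing in for the actual models (RF, BERT, etc.) and reaction datasets. The `shuffled` wrapper assumes the "Shuffle" row is a y-scrambling control (targets permuted before training); the table does not state this explicitly.

```python
import random
import statistics


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = statistics.fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def mae(y_true, y_pred):
    """Mean absolute error."""
    return statistics.fmean(abs(t - p) for t, p in zip(y_true, y_pred))


def linear_fit_predict(x_train, y_train, x_test):
    """Hypothetical toy model: one-feature ordinary least squares."""
    mx, my = statistics.fmean(x_train), statistics.fmean(y_train)
    sxx = sum((a - mx) ** 2 for a in x_train)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x_train, y_train))
    slope = sxy / sxx
    return [my + slope * (a - mx) for a in x_test]


def shuffled(fit_predict, seed=0):
    """Assumed y-scrambling control: permute training targets before fitting."""
    rng = random.Random(seed)

    def wrapped(x_train, y_train, x_test):
        y_perm = list(y_train)
        rng.shuffle(y_perm)
        return fit_predict(x_train, y_perm, x_test)

    return wrapped


def evaluate_over_splits(x, y, fit_predict, n_splits=30, test_frac=0.3, seed=0):
    """Return ((R2 mean, R2 std), (MAE mean, MAE std)) over random train/test splits.

    test_frac=0.3 is an assumed value; std is the sample standard deviation.
    """
    rng = random.Random(seed)
    idx = list(range(len(y)))
    n_test = int(round(test_frac * len(y)))
    r2s, maes = [], []
    for _ in range(n_splits):
        rng.shuffle(idx)  # a fresh random split each iteration
        test, train = idx[:n_test], idx[n_test:]
        y_pred = fit_predict([x[i] for i in train], [y[i] for i in train],
                             [x[i] for i in test])
        y_test = [y[i] for i in test]
        r2s.append(r2_score(y_test, y_pred))
        maes.append(mae(y_test, y_pred))
    return ((statistics.fmean(r2s), statistics.stdev(r2s)),
            (statistics.fmean(maes), statistics.stdev(maes)))


# Synthetic "yields" in roughly [0, 1]; purely illustrative data.
data_rng = random.Random(42)
x = [data_rng.random() for _ in range(200)]
y = [0.8 * a + data_rng.gauss(0, 0.05) for a in x]

for name, model in [("model", linear_fit_predict),
                    ("shuffle", shuffled(linear_fit_predict))]:
    (r2_m, r2_s), (mae_m, mae_s) = evaluate_over_splits(x, y, model)
    print(f"{name}: R2 = {r2_m:.3f} ± {r2_s:.3f} (MAE {mae_m:.3f} ± {mae_s:.3f})")
```

Using the same `seed` for both runs evaluates the real and scrambled models on identical splits, so the gap between them reflects the model rather than split-to-split variation. A scrambled control should give R² near or below zero, as the Shuffle row does in every column.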