2023 Mar 13;14(19):4997–5005. doi: 10.1039/d2sc06041h

Results for the three reaction datasets. For each dataset, every model was trained on 30 random data splits; each entry reports the test-set R² as mean ± standard deviation, with the corresponding MAE (mean ± standard deviation) in parentheses.

Method	Suzuki–Miyaura [HTE]¹⁸	Buchwald–Hartwig [HTE]¹⁰	Buchwald–Hartwig [ELN]
RF^a	0.828 ± 0.008 (0.082 ± 0.002)	0.913 ± 0.008 (0.054 ± 0.002)	0.266 ± 0.037 (0.202 ± 0.007)
RF^b	0.796 ± 0.011 (0.09 ± 0.002)	0.917 ± 0.008 (0.054 ± 0.002)	0.262 ± 0.029 (0.205 ± 0.007)
BERT²²	0.81 ± 0.01 (0.078 ± 0.004)	0.951 ± 0.005 (0.054 ± 0.003)	−0.006 ± 0.105 (0.253 ± 0.01)
Lasso^a	0.798 ± 0.001 (0.167 ± 0.002)	0.699 ± 0.011 (0.120 ± 0.002)	^c
SVM^a	0.798 ± 0.009 (0.100 ± 0.002)	0.848 ± 0.009 (0.082 ± 0.001)	0.222 ± 0.057 (0.209 ± 0.008)
KNN^a	0.568 ± 0.011 (0.148 ± 0.002)	0.530 ± 0.019 (0.152 ± 0.003)	0.067 ± 0.04 (0.241 ± 0.008)
One-hot encoding	0.816 ± 0.008 (0.086 ± 0.002)	0.831 ± 0.002 (0.081 ± 0.002)	0.144 ± 0.072 (0.105 ± 0.004)
Shuffle^a	−0.055 ± 0.013 (0.257 ± 0.003)	−0.066 ± 0.017 (0.241 ± 0.005)	−0.159 ± 0.060 (0.247 ± 0.011)

^a With RDKit features.

^b Without RDKit features.

^c Overfitted.
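
The reporting scheme used in the table (test-set R² and MAE aggregated over 30 random splits) can be sketched as follows. This is only an illustration under stated assumptions, not the authors' pipeline: the data are synthetic, a one-feature least-squares line stands in for the real models, the 30% test fraction and the use of the sample standard deviation are guesses, and the `shuffle_labels` option is one plausible reading of the Shuffle control (permuting training labels). All function names are hypothetical.

```python
import random
import statistics


def r2_score(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot.

    Negative values mean the predictions are worse than always
    predicting the mean of y_true.
    """
    mean_y = statistics.fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between targets and predictions."""
    return statistics.fmean([abs(t - p) for t, p in zip(y_true, y_pred)])


def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept (stand-in model)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def evaluate_over_splits(xs, ys, n_splits=30, test_fraction=0.3,
                         shuffle_labels=False, seed=0):
    """Return per-split test R² and MAE lists for random train/test splits."""
    rng = random.Random(seed)
    r2s, maes = [], []
    for _ in range(n_splits):
        idx = list(range(len(xs)))
        rng.shuffle(idx)
        n_test = int(len(xs) * test_fraction)  # assumed split ratio
        test_idx, train_idx = idx[:n_test], idx[n_test:]
        x_train = [xs[i] for i in train_idx]
        y_train = [ys[i] for i in train_idx]
        if shuffle_labels:
            # Assumed control: break the feature/label pairing in training.
            rng.shuffle(y_train)
        slope, intercept = fit_line(x_train, y_train)
        y_test = [ys[i] for i in test_idx]
        y_pred = [slope * xs[i] + intercept for i in test_idx]
        r2s.append(r2_score(y_test, y_pred))
        maes.append(mean_absolute_error(y_test, y_pred))
    return r2s, maes


def format_cell(r2s, maes):
    """Format results like a table cell: R² mean ± sd (MAE mean ± sd)."""
    return (f"{statistics.fmean(r2s):.3f} ± {statistics.stdev(r2s):.3f} "
            f"({statistics.fmean(maes):.3f} ± {statistics.stdev(maes):.3f})")


# Synthetic "yields" on a 0-1 scale; purely illustrative data.
data_rng = random.Random(42)
xs = [data_rng.random() for _ in range(200)]
ys = [0.8 * x + 0.1 + data_rng.gauss(0.0, 0.05) for x in xs]

print("model  ", format_cell(*evaluate_over_splits(xs, ys)))
print("shuffle", format_cell(*evaluate_over_splits(xs, ys, shuffle_labels=True)))
```

Because R² is defined relative to a mean predictor, a model that learns nothing useful scores at or slightly below zero, which is the pattern the Shuffle row and the BERT entry for the ELN dataset show in the table.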