2020 Mar 3;11(12):3355–3364. doi: 10.1039/c9sc03666k

Breakdown of the grammatically invalid SMILES rate by beam size.ᵃ

Model (dataset)                               Invalid SMILES rate (%) at beam size
                                                    1      3      5     10
Liu et al. (ref. 29) LSTM +class (USPTO_50K)     12.2   15.3   18.4   22.0
Our Transformer +token (USPTO_50K)                2.2    3.7    4.8    7.8
Our Transformer +token +class (USPTO_50K)         2.3    4.9    7.0   12.1
Our Transformer +char (USPTO_50K)                 2.1    3.5    4.7    8.3
Our Transformer +char +class (USPTO_50K)          2.4    4.4    6.4   12.6
Our Transformer +char +class (USPTO_MIT)          0.4    1.5    2.9    8.6
ᵃ Key: “+class” means that reaction class information is added to the model; “+token” means that token-based preprocessing is applied; “+char” means that character-based preprocessing is applied.
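To make the difference between the “+char” and “+token” settings concrete, the sketch below contrasts the two ways of splitting a SMILES string. It is an illustration under stated assumptions, not the authors' preprocessing code: the token-level split uses the atom-level regular expression popularized by the Molecular Transformer (Schwaller et al.), assumed here as a stand-in for the paper's token scheme, and `add_class_token` with its `<RX_n>` spelling is a hypothetical way of encoding the “+class” information as an extra source token.

```python
import re

# Atom-level SMILES regex popularized by the Molecular Transformer (Schwaller et al.).
# Assumed stand-in for the "+token" scheme; the paper's exact tokenizer may differ.
SMILES_TOKEN_RE = re.compile(
    r"(\[[^\]]+]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+|\\|\/|:|~|@|\?|>|\*|\$|%[0-9]{2}|[0-9])"
)


def char_tokenize(smiles: str) -> list[str]:
    """'+char': every character is its own token, so 'Cl' becomes 'C', 'l'."""
    return list(smiles)


def token_tokenize(smiles: str) -> list[str]:
    """'+token': multi-character atoms ('Cl', 'Br', '[NH4+]') stay whole."""
    tokens = SMILES_TOKEN_RE.findall(smiles)
    # findall silently skips characters the regex cannot match; refuse to lose them.
    if "".join(tokens) != smiles:
        raise ValueError(f"untokenizable characters in {smiles!r}")
    return tokens


def add_class_token(tokens: list[str], reaction_class: int) -> list[str]:
    """Hypothetical '+class' encoding: prepend a reaction-class token to the source."""
    return [f"<RX_{reaction_class}>"] + tokens


if __name__ == "__main__":
    print(char_tokenize("CC(=O)Cl"))
    print(token_tokenize("CC(=O)Cl"))
    print(add_class_token(token_tokenize("[NH4+].Br"), 3))
```

With character-level splitting the model must learn that `C` followed by `l` forms one atom and that a bracket atom is only complete once its `]` is emitted, which gives it more places to produce an ill-formed string; token-level splitting removes those cases from the vocabulary.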
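The table does not spell out how the rate is computed. As a rough sketch, assuming the rate at beam size k is the percentage of all top-k candidates (number of inputs × k) that fail a grammar check, it could be computed as below. The checker `crude_smiles_syntax_ok` is a hypothetical helper that only inspects branch parentheses, bracket atoms and ring-closure pairing; it is not a full SMILES parser, and a real evaluation would rather use a parser such as RDKit's `Chem.MolFromSmiles`, which returns `None` for strings it cannot parse.

```python
from typing import Callable, Sequence


def crude_smiles_syntax_ok(smiles: str) -> bool:
    """Hypothetical, very rough grammar check (NOT a full SMILES parser):
    branches '(' ')' balanced, bracket atoms '[...]' closed and not nested,
    and every ring-closure label used an even number of times."""
    if not smiles:
        return False
    depth = 0
    ring_counts: dict[str, int] = {}
    i = 0
    while i < len(smiles):
        c = smiles[i]
        if c == "[":
            # Skip the whole bracket atom; digits inside it are not ring labels.
            j = smiles.find("]", i + 1)
            if j == -1 or "[" in smiles[i + 1:j]:
                return False
            i = j + 1
            continue
        if c == "]":
            return False  # closing bracket without an opening one
        if c == "(":
            depth += 1
        elif c == ")":
            depth -= 1
            if depth < 0:
                return False
        elif c == "%":
            # Two-digit ring-closure label such as %10.
            label = smiles[i + 1:i + 3]
            if len(label) != 2 or not label.isdigit():
                return False
            ring_counts[label] = ring_counts.get(label, 0) + 1
            i += 3
            continue
        elif c.isdigit():
            ring_counts[c] = ring_counts.get(c, 0) + 1
        i += 1
    return depth == 0 and all(n % 2 == 0 for n in ring_counts.values())


def invalid_rate_at_k(
    beams: Sequence[Sequence[str]],
    k: int,
    is_valid: Callable[[str], bool] = crude_smiles_syntax_ok,
) -> float:
    """Percentage of invalid strings among the top-k candidates of every input.
    Assumes the denominator is (number of inputs) x k."""
    candidates = [smi for beam in beams for smi in beam[:k]]
    if not candidates:
        raise ValueError("no candidates to score")
    invalid = sum(1 for smi in candidates if not is_valid(smi))
    return 100.0 * invalid / len(candidates)


if __name__ == "__main__":
    # Toy beams for two inputs, ranked best-first (made-up strings, not model output).
    beams = [["CCO", "CC(O", "c1ccccc1"], ["CC(=O)O", "C1CC", "[Na+].[Cl-]"]]
    for k in (1, 3):
        print(k, invalid_rate_at_k(beams, k))
```

Passing the checker as a parameter keeps the metric independent of the parser, so a stricter check (for instance `lambda s: Chem.MolFromSmiles(s, sanitize=False) is not None` with RDKit, which tests syntax without valence sanitization) can be swapped in without touching the rate computation.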