2020 Mar 3;11(12):3355–3364. doi: 10.1039/c9sc03666k

Breakdown of the grammatically invalid SMILES error for different beam sizes^a.

Invalid SMILES rate (%) by beam size
Model (dataset)	1	3	5	10
Liu et al. LSTM +class (USPTO_50K)²⁹	12.2	15.3	18.4	22
Our Transformer +token (USPTO_50K)	2.2	3.7	4.8	7.8
Our Transformer +token +class (USPTO_50K)	2.3	4.9	7.0	12.1
Our Transformer +char (USPTO_50K)	2.1	3.5	4.7	8.3
Our Transformer +char +class (USPTO_50K)	2.4	4.4	6.4	12.6
Our Transformer +char +class (USPTO_MIT)	0.4	1.5	2.9	8.6

^a Key: “+class” means that reaction class information is added to the model; “+token” means that token-based preprocessing is applied; “+char” means that character-based preprocessing is applied.
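
To illustrate the difference between the “+char” and “+token” preprocessing named in the key, the following is a minimal sketch and not the authors' implementation. The regular expression `SMILES_TOKEN_PATTERN` and the helper names `char_tokenize` and `token_tokenize` are hypothetical, chosen here for illustration; the pattern assumes that bracket atoms, the two-letter halogens Cl and Br, and two-digit ring closures should each stay a single token.

```python
import re

# Hypothetical atom-level pattern (an assumption, not taken from the paper):
# bracket atoms such as [NH4+], the halogens Br and Cl, and %NN ring
# closures are kept whole; anything else falls back to one character.
SMILES_TOKEN_PATTERN = re.compile(r"\[[^\]]+\]|Br|Cl|%\d{2}|.")


def char_tokenize(smiles):
    """Character-based preprocessing: every character is its own symbol."""
    return list(smiles)


def token_tokenize(smiles):
    """Token-based preprocessing: multi-character atoms stay together."""
    return SMILES_TOKEN_PATTERN.findall(smiles)


if __name__ == "__main__":
    for smiles in ("CC(=O)Cl", "C[C@H](N)C(=O)O"):
        print(smiles)
        print("  char :", " ".join(char_tokenize(smiles)))
        print("  token:", " ".join(token_tokenize(smiles)))
```

In this sketch both schemes are lossless, since joining the symbols restores the input string. They differ only in sequence length and in whether a multi-character atom such as `Cl` or `[C@H]` is seen by a model as one symbol or several.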
