Breakdown of grammatically invalid SMILES errors for different beam sizes. Values are invalid SMILES rates (%); column headers give the beam size.

Model (dataset) | 1 | 3 | 5 | 10 |
---|---|---|---|---|
Liu et al. LSTM +class (USPTO_50K) [29] | 12.2 | 15.3 | 18.4 | 22 |
Our Transformer +token (USPTO_50K) | 2.2 | 3.7 | 4.8 | 7.8 |
Our Transformer +token +class (USPTO_50K) | 2.3 | 4.9 | 7.0 | 12.1 |
Our Transformer +char (USPTO_50K) | 2.1 | 3.5 | 4.7 | 8.3 |
Our Transformer +char +class (USPTO_50K) | 2.4 | 4.4 | 6.4 | 12.6 |
Our Transformer +char +class (USPTO_MIT) | 0.4 | 1.5 | 2.9 | 8.6 |
Key: “+class” means that reaction class information is added to the model; “+token” means that token-based preprocessing is applied; “+char” means that char-based preprocessing is applied.
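
To make the quantity in the table concrete, here is a minimal sketch of how an invalid SMILES rate per beam size might be computed. It assumes the rate is pooled over all top-k candidates of every reaction, which the table does not state explicitly. The names `invalid_rate`, `toy_is_valid`, and `example_beams` are hypothetical, and the toy validator only checks balanced parentheses, so it is a stand-in for a real SMILES parser rather than a grammar check.

```python
from typing import Callable, Sequence


def invalid_rate(
    beams: Sequence[Sequence[str]],
    is_valid: Callable[[str], bool],
    k: int,
) -> float:
    """Percentage of invalid strings among the top-k candidates of every beam.

    Assumption: the rate is pooled over all top-k predictions; the exact
    definition behind the table may differ.
    """
    top_k = [s for beam in beams for s in beam[:k]]
    if not top_k:
        raise ValueError("no predictions to score")
    invalid = sum(1 for s in top_k if not is_valid(s))
    return 100.0 * invalid / len(top_k)


def toy_is_valid(smiles: str) -> bool:
    """Toy stand-in for a SMILES parser: non-empty with balanced parentheses."""
    depth = 0
    for ch in smiles:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # closing bracket with no matching opener
                return False
    return depth == 0 and bool(smiles)


# Hypothetical beam-search output: candidates ranked best-first per reaction.
example_beams = [
    ["CC(=O)O", "CC(=O", "CC=O)"],
    ["c1ccccc1", "CCO", "C(C"],
]

for k in (1, 2, 3):
    print(k, invalid_rate(example_beams, toy_is_valid, k))
```

In practice `is_valid` would wrap a real parser. With RDKit, for instance, one option is `lambda s: Chem.MolFromSmiles(s) is not None`, since `Chem.MolFromSmiles` returns `None` for strings it cannot parse.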