2023 Mar 23;63(7):1914–1924. doi: 10.1021/acs.jcim.2c01407

Table 1. Top-1 Accuracy Using Different Molecule Formats, Tokenization Schemes, and Embedding Strategies (a)

Molecule format                         Atom-level      BPE
                                        FS      PT      FS      PT
Product Prediction (with Reagents)
  SMILES                                0.879   0.865   0.854   0.512
  SELFIES                               0.768   0.721   0.654   0.313
Product Prediction (without Reagents)
  SMILES                                0.837   0.827   0.807   0.589
  SELFIES                               0.745   0.695   0.623   0.379
Reactant Prediction (with Reagents)
  SMILES                                0.678   0.643   0.660   0.421
  SELFIES                               0.610   0.545   0.540   0.301
Reactant Prediction (without Reagents)
  SMILES                                0.525   0.504   0.514   0.401
  SELFIES                               0.472   0.449   0.427   0.311
Reagent Prediction
  SMILES                                0.196   0.135   0.183   0.211
  SELFIES                               0.187   0.122   0.174   0.196
(a) FS: input embeddings trained from scratch; PT: pre-trained input embeddings.