2023 Mar 23;63(7):1914–1924. doi: 10.1021/acs.jcim.2c01407

Table 1. Top-1 Accuracy Using Different Molecule Formats, Tokenization Schemes, and Embeddings Strategies^a.

	Atom-level		BPE
	FS	PT	FS	PT
Product Prediction (with Reagents)
SMILES	0.879	0.865	0.854	0.512
SELFIES	0.768	0.721	0.654	0.313
Product Prediction (without Reagents)
SMILES	0.837	0.827	0.807	0.589
SELFIES	0.745	0.695	0.623	0.379
Reactant Prediction (with Reagents)
SMILES	0.678	0.643	0.660	0.421
SELFIES	0.610	0.545	0.540	0.301
Reactant Prediction (without Reagents)
SMILES	0.525	0.504	0.514	0.401
SELFIES	0.472	0.449	0.427	0.311
Reagent Prediction
SMILES	0.196	0.135	0.183	0.211
SELFIES	0.187	0.122	0.174	0.196

^a FS: input embeddings trained from scratch; PT: pre-trained input embeddings.
