2023 May 29;15:55. doi: 10.1186/s13321-023-00725-9

Table 5.

Performance analysis of tokenization schemes for molecular property prediction using the MoleculeNet benchmark suite

Dataset     SMILES   DeepSMILES   SELFIES   SmilesPE   AIS

Regression datasets: RMSE
ESOL        0.628    0.631        0.675     0.689      0.553
FreeSolv    0.545    0.544        0.564     0.761      0.441
Lip         0.924    0.895        0.938     0.800      0.683

Classification datasets: ROC-AUC
BBBP        0.758    0.777        0.799     0.847      0.885
BACE        0.740    0.774        0.746     0.837      0.835
HIV         0.649    0.648        0.653     0.739      0.729

Comparison of Random Forest regression and classification models with 5-fold cross-validation. Lower RMSE and higher ROC-AUC indicate better performance. Bold emphasis denotes the best-performing approach.
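
The best-performing scheme for each dataset can be recomputed directly from the values in Table 5. The following is a minimal sketch, not code from the paper: the names SCHEMES, TABLE_5, best_scheme, and BEST are hypothetical, the numbers are transcribed from the table, and the selection rule assumes that lower RMSE is better for the regression datasets and higher ROC-AUC is better for the classification datasets.

```python
# Illustrative sketch only: names below are hypothetical, values are
# transcribed from Table 5.
SCHEMES = ["SMILES", "DeepSMILES", "SELFIES", "SmilesPE", "AIS"]

# dataset -> (metric, scores listed in SCHEMES order)
TABLE_5 = {
    "ESOL": ("RMSE", [0.628, 0.631, 0.675, 0.689, 0.553]),
    "FreeSolv": ("RMSE", [0.545, 0.544, 0.564, 0.761, 0.441]),
    "Lip": ("RMSE", [0.924, 0.895, 0.938, 0.800, 0.683]),
    "BBBP": ("ROC-AUC", [0.758, 0.777, 0.799, 0.847, 0.885]),
    "BACE": ("ROC-AUC", [0.740, 0.774, 0.746, 0.837, 0.835]),
    "HIV": ("ROC-AUC", [0.649, 0.648, 0.653, 0.739, 0.729]),
}


def best_scheme(metric, scores):
    """Return (scheme, score) for the best entry in a row.

    Assumption: RMSE is an error (lower is better); any other metric
    here is ROC-AUC (higher is better).
    """
    pick = min if metric == "RMSE" else max
    idx = pick(range(len(scores)), key=scores.__getitem__)
    return SCHEMES[idx], scores[idx]


# Best scheme per dataset, keyed by dataset name.
BEST = {
    name: best_scheme(metric, scores)
    for name, (metric, scores) in TABLE_5.items()
}

for name, (scheme, score) in BEST.items():
    metric = TABLE_5[name][0]
    print(f"{name:<10}{metric:<9}{scheme:<10}{score:.3f}")
```

Under these assumptions, the sketch selects AIS for ESOL, FreeSolv, Lip, and BBBP, and SmilesPE for BACE and HIV, where SmilesPE leads AIS by 0.002 and 0.010 ROC-AUC respectively.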