2025 Aug 14;4(10):2752–2764. doi: 10.1039/d5dd00028a

Fig. 2. Syntactic validity of SMILES across augmentation strategies and augmentation folds. Two augmentation folds (three- and ten-fold) were analyzed across five training set sizes (1000, 2500, 5000, 7500, and 10 000 SMILES). For each setup, 1000 SMILES strings were generated across four repetitions for the analysis. The highest validity obtained with SMILES enumeration and without any augmentation is shown as a solid and a dashed line, respectively. Statistically significant differences (one-sided Wilcoxon rank-sum test, p < 0.05) between the new augmentation approaches and SMILES enumeration (10×) are marked with asterisks.
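To make the significance marking in the caption concrete, the following is a minimal sketch of an exact one-sided Wilcoxon rank-sum test on two small groups of validity scores. It is not the authors' code: the function name `rank_sum_p_greater` and the validity values are hypothetical, the group size of four simply mirrors the four repetitions mentioned in the caption, and the exact enumeration used here may differ from the implementation used for the figure (for example, a normal approximation).

```python
from itertools import combinations


def rank_sum_p_greater(new, baseline):
    """Exact one-sided p-value for H1: values in `new` tend to exceed `baseline`."""
    values = list(new) + list(baseline)
    pooled = sorted(values)

    # Assign midranks so that tied values share the average of their ranks.
    rank_of = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        i = j

    ranks = [rank_of[v] for v in values]
    n_new = len(new)
    observed = sum(ranks[:n_new])

    # Enumerate every way of labelling n_new observations as "new" and count
    # the labellings whose rank sum is at least as large as the observed one.
    total = 0
    extreme = 0
    for idx in combinations(range(len(ranks)), n_new):
        total += 1
        if sum(ranks[k] for k in idx) >= observed:
            extreme += 1
    return extreme / total


# Hypothetical validity fractions (made up for illustration, not from the paper).
new_strategy = [0.95, 0.96, 0.97, 0.98]
enumeration_10x = [0.90, 0.91, 0.92, 0.93]

p_value = rank_sum_p_greater(new_strategy, enumeration_10x)
print(f"one-sided p = {p_value:.4f}")
```

With four values per group, the smallest p-value this exact test can return is 1/70 (about 0.014), reached only when every value in one group exceeds every value in the other. Under these assumptions, a comparison could cross the p < 0.05 threshold only with little or no overlap between the two groups.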
