2025 Aug 14;4(10):2752–2764. doi: 10.1039/d5dd00028a

Fig. 3. Distribution learning after fine-tuning. The Kolmogorov–Smirnov (KS) distance for eight selected descriptors was calculated between 3000 designs and the respective fine-tuning sets (the lower the KS, the better). (a) KS distances grouped by fine-tuning set similarity (high/low) and number of fine-tuning molecules (10, 100). Statistically significant differences (Wilcoxon signed-rank test, p < 0.05) between the new augmentation approaches and no augmentation or SMILES enumeration are marked with asterisks. (b–e) Principal component analysis (PCA) performed on the KS values for different dataset sizes (b and d: 10; c and e: 100) and similarity levels (b and c: high; d and e: low). ‘Best’ and ‘Worst’ indicate the lowest and highest KS values obtained across experiments, and the connecting line indicates the direction of average performance variation from best to worst.
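As a rough illustration of the analysis described in the caption, the sketch below shows how per-descriptor KS distances, the paired Wilcoxon signed-rank comparison between augmentation strategies, and a PCA on the resulting KS values might be computed with SciPy and scikit-learn. It is not the authors' code: the descriptor names, array sizes, and random placeholder values are hypothetical, and descriptor values would in practice be computed from the generated and fine-tuning molecules (e.g. with RDKit).

    # Minimal sketch, assuming precomputed descriptor values; all data here are placeholders.
    import numpy as np
    from scipy.stats import ks_2samp, wilcoxon
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(0)
    descriptors = ["MolWt", "LogP", "TPSA", "HBD", "HBA", "RotBonds", "Rings", "FractionCsp3"]

    # Hypothetical descriptor values: 3000 generated designs vs. a 100-molecule fine-tuning set.
    designs = {d: rng.normal(0.0, 1.0, 3000) for d in descriptors}
    finetune = {d: rng.normal(0.1, 1.1, 100) for d in descriptors}

    # KS distance per descriptor (lower = generated distribution closer to the fine-tuning set).
    ks = np.array([ks_2samp(designs[d], finetune[d]).statistic for d in descriptors])

    # Paired Wilcoxon signed-rank test comparing KS values of two strategies
    # (e.g. a new augmentation approach vs. SMILES enumeration) over matched descriptors/runs.
    ks_baseline = ks + rng.normal(0.05, 0.02, len(ks))  # placeholder baseline KS values
    stat, p = wilcoxon(ks, ks_baseline)
    print(f"Wilcoxon signed-rank p-value: {p:.3g}")

    # PCA on a matrix of KS values (rows = experiments, columns = descriptors), as in panels (b-e).
    ks_matrix = rng.random((20, len(descriptors)))  # placeholder: 20 experiments
    coords = PCA(n_components=2).fit_transform(ks_matrix)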
