2025 Aug 14;4(10):2752–2764. doi: 10.1039/d5dd00028a

Table 2. Scaffold diversity and novelty. Metrics were measured after fine-tuning on bioactive molecules for three targets (PPARδ, PIM1, and JAK2) using 10 and 100 molecules selected with (a) high similarity and (b) low similarity. Scaffold diversity and novelty relative to the fine-tuning sets (FT) and pre-training sets (PT) are reported as the mean ± standard deviation for 10 fine-tuning molecules. For each experimental setup and each metric, the best and second best values are reported in boldface and italics, respectively.

| Similarity | Target | Augmentation | 10 fine-tuning molecules: scaffold diversity | 10 fine-tuning molecules: scaffold novelty (FT) | 10 fine-tuning molecules: scaffold novelty (PT) | 100 fine-tuning molecules: scaffold diversity | 100 fine-tuning molecules: scaffold novelty (FT) | 100 fine-tuning molecules: scaffold novelty (PT) |
|---|---|---|---|---|---|---|---|---|
| High | PPARδ | Enumeration | 67 ± 2 | 67 ± 2 | 35 ± 1 | 60 ± 1 | 56 ± 1 | 42 ± 1 |
| High | PPARδ | Token deletion (random) | 78 ± 1 | 78 ± 1 | 46 ± 1 | 70 ± 2 | 62 ± 2 | 41.8 ± 0.9 |
| High | PPARδ | Token deletion (validity) | 77.0 ± 0.9 | 76.3 ± 0.9 | 45 ± 1 | 52 ± 2 | 48 ± 2 | 35 ± 2 |
| High | PPARδ | Token deletion (protected) | 82 ± 1 | 81 ± 1 | 46 ± 3 | 64.5 ± 0.8 | 56.3 ± 0.8 | 37 ± 3 |
| High | PPARδ | Atom masking (random) | 77 ± 3 | 75 ± 2 | 44 ± 1 | 61.3 ± 0.8 | 53 ± 1 | 35 ± 2 |
| High | PPARδ | Atom masking (funct. group) | 81 ± 1 | 80 ± 1 | 45 ± 1 | 63 ± 4 | 55 ± 4 | 35 ± 2 |
| High | PPARδ | Bioisosteric substitution | 80 ± 1 | 80 ± 1 | 51 ± 3 | 54 ± 2 | 51 ± 2 | 35.2 ± 0.3 |
| High | PPARδ | Self-training | 83 ± 1 | 83 ± 1 | 40.1 ± 0.9 | 55.3 ± 0.9 | 53.8 ± 0.9 | 25 ± 1 |
| High | PPARδ | No augmentation | 93 ± 1 | 93 ± 1 | 52 ± 3 | 77 ± 1 | 75 ± 1 | 44 ± 1 |
| High | PIM1 | Enumeration | 93.6 ± 0.3 | 93.2 ± 0.2 | 58.2 ± 0.8 | 91 ± 2 | 88 ± 2 | 77 ± 1 |
| High | PIM1 | Token deletion (random) | 94.1 ± 0.1 | 93.6 ± 0.1 | 59.9 ± 0.1 | 92 ± 2 | 85 ± 2 | 75 ± 2 |
| High | PIM1 | Token deletion (validity) | 93.4 ± 0.9 | 92.9 ± 1.0 | 57 ± 2 | 83 ± 2 | 77 ± 2 | 71 ± 2 |
| High | PIM1 | Token deletion (protected) | 94.4 ± 0.2 | 93.9 ± 0.1 | 60 ± 2 | 92 ± 2 | 86 ± 2 | 74.5 ± 0.8 |
| High | PIM1 | Atom masking (random) | 90.5 ± 0.4 | 89.8 ± 0.5 | 56 ± 2 | 77.1 ± 0.9 | 68.3 ± 0.4 | 57 ± 2 |
| High | PIM1 | Atom masking (funct. group) | 86.1 ± 0.4 | 85.3 ± 0.4 | 51 ± 1 | 80.1 ± 0.6 | 72.6 ± 0.9 | 55 ± 2 |
| High | PIM1 | Bioisosteric substitution | 92.8 ± 0.8 | 92.5 ± 0.8 | 53.5 ± 1.0 | 83 ± 2 | 81 ± 2 | 56.7 ± 0.5 |
| High | PIM1 | Self-training | 86.5 ± 0.8 | 86.5 ± 0.8 | 45 ± 1 | 85.7 ± 0.4 | 84.0 ± 0.5 | 48 ± 1 |
| High | PIM1 | No augmentation | 94.7 ± 0.7 | 94.5 ± 0.7 | 54.6 ± 0.6 | 93 ± 2 | 91 ± 1 | 56.5 ± 0.7 |
| High | JAK2 | Enumeration | 93.2 ± 0.4 | 93.2 ± 0.4 | 63.8 ± 0.2 | 87.1 ± 0.8 | 85 ± 1 | 73 ± 2 |
| High | JAK2 | Token deletion (random) | 96.8 ± 0.9 | 96.3 ± 0.9 | 70 ± 2 | 91 ± 1 | 87.2 ± 0.4 | 72 ± 3 |
| High | JAK2 | Token deletion (validity) | 95.8 ± 0.5 | 95.1 ± 0.4 | 75 ± 2 | 78.2 ± 0.7 | 76.3 ± 0.7 | 68 ± 1 |
| High | JAK2 | Token deletion (protected) | 97.3 ± 0.4 | 96.7 ± 0.4 | 76.1 ± 0.6 | 89 ± 2 | 85 ± 2 | 72 ± 4 |
| High | JAK2 | Atom masking (random) | 93 ± 1 | 92 ± 1 | 68 ± 1 | 76.8 ± 0.7 | 72 ± 1 | 55.5 ± 0.8 |
| High | JAK2 | Atom masking (funct. group) | 93.8 ± 0.6 | 92.7 ± 0.7 | 68 ± 2 | 81.0 ± 0.3 | 77.7 ± 0.9 | 60 ± 2 |
| High | JAK2 | Bioisosteric substitution | 92.8 ± 0.2 | 92.0 ± 0.2 | 67.5 ± 0.2 | 75.1 ± 1.0 | 71.4 ± 1.0 | 59 ± 1 |
| High | JAK2 | Self-training | 87 ± 1 | 87 ± 1 | 49 ± 2 | 86.0 ± 0.4 | 85.2 ± 0.3 | 51 ± 1 |
| High | JAK2 | No augmentation | 94.8 ± 0.4 | 94.8 ± 0.4 | 58.4 ± 0.3 | 93 ± 1 | 91 ± 1 | 61 ± 1 |
| Low | PPARδ | Enumeration | 76.9 ± 0.3 | 76.6 ± 0.3 | 40.1 ± 0.8 | 86.1 ± 0.6 | 84.7 ± 0.8 | 58 ± 2 |
| Low | PPARδ | Token deletion (random) | 85.6 ± 1.0 | 85.0 ± 1.0 | 47 ± 2 | 81.2 ± 0.7 | 77 ± 1 | 42 ± 1 |
| Low | PPARδ | Token deletion (validity) | 91.0 ± 0.9 | 90.5 ± 1.0 | 48 ± 3 | 84.5 ± 0.5 | 82.4 ± 0.4 | 46.1 ± 0.9 |
| Low | PPARδ | Token deletion (protected) | 91.5 ± 0.6 | 91.3 ± 0.6 | 52 ± 2 | 86 ± 1 | 84 ± 1 | 46 ± 1 |
| Low | PPARδ | Atom masking (random) | 90.0 ± 0.6 | 89.7 ± 0.6 | 49 ± 1 | 83 ± 3 | 80 ± 2 | 42 ± 2 |
| Low | PPARδ | Atom masking (funct. group) | 83 ± 2 | 82 ± 2 | 44 ± 2 | 82.3 ± 0.8 | 79.4 ± 0.9 | 44 ± 1 |
| Low | PPARδ | Bioisosteric substitution | 90.0 ± 1.0 | 89.7 ± 0.9 | 51 ± 2 | 91 ± 1 | 89 ± 1 | 55.0 ± 0.8 |
| Low | PPARδ | Self-training | 89 ± 1 | 89 ± 1 | 46.2 ± 0.4 | 71 ± 1 | 71 ± 1 | 33 ± 1 |
| Low | PPARδ | No augmentation | 94 ± 2 | 94 ± 2 | 54 ± 2 | 88.9 ± 0.8 | 88.0 ± 0.9 | 46 ± 3 |
| Low | PIM1 | Enumeration | 90.0 ± 0.3 | 89.8 ± 0.2 | 47.6 ± 0.7 | 93.5 ± 0.3 | 91.6 ± 0.5 | 63 ± 1 |
| Low | PIM1 | Token deletion (random) | 94.6 ± 0.6 | 93.9 ± 0.6 | 55.6 ± 0.7 | 92.6 ± 0.1 | 90.0 ± 0.4 | 54.0 ± 1.0 |
| Low | PIM1 | Token deletion (validity) | 94.9 ± 0.2 | 94.8 ± 0.2 | 56 ± 1 | 94.9 ± 0.5 | 93.0 ± 0.7 | 55 ± 2 |
| Low | PIM1 | Token deletion (protected) | 95.5 ± 0.4 | 95.4 ± 0.4 | 57.4 ± 0.5 | 94.5 ± 0.4 | 92.5 ± 0.7 | 55 ± 1 |
| Low | PIM1 | Atom masking (random) | 96.0 ± 0.4 | 95.6 ± 0.3 | 54 ± 2 | 93.9 ± 0.3 | 91.8 ± 0.7 | 52.4 ± 0.9 |
| Low | PIM1 | Atom masking (funct. group) | 94.7 ± 0.3 | 94.2 ± 0.3 | 55 ± 1 | 91.5 ± 0.3 | 88.3 ± 0.2 | 49.6 ± 0.4 |
| Low | PIM1 | Bioisosteric substitution | 95.2 ± 0.6 | 95.0 ± 0.5 | 55 ± 2 | 94.4 ± 0.6 | 93.4 ± 0.6 | 55 ± 1 |
| Low | PIM1 | Self-training | 88.6 ± 0.5 | 88.6 ± 0.5 | 46.5 ± 0.9 | 87.9 ± 0.9 | 87.5 ± 0.8 | 42.2 ± 0.4 |
| Low | PIM1 | No augmentation | 94.8 ± 0.9 | 94.7 ± 0.8 | 54 ± 2 | 94.5 ± 0.7 | 93.8 ± 0.6 | 53 ± 2 |
| Low | JAK2 | Enumeration | 83 ± 1 | 83 ± 1 | 39.8 ± 0.3 | 94.3 ± 0.9 | 93 ± 1 | 66 ± 1 |
| Low | JAK2 | Token deletion (random) | 91.9 ± 0.5 | 91.3 ± 0.4 | 51 ± 1 | 95 ± 1 | 93 ± 2 | 59 ± 1 |
| Low | JAK2 | Token deletion (validity) | 93.1 ± 0.6 | 92.8 ± 0.6 | 51.8 ± 0.4 | 95.0 ± 0.8 | 93.5 ± 1.0 | 56 ± 2 |
| Low | JAK2 | Token deletion (protected) | 91.3 ± 0.7 | 90.6 ± 0.6 | 51 ± 1 | 96.7 ± 0.9 | 96 ± 1 | 59 ± 1 |
| Low | JAK2 | Atom masking (random) | 90.8 ± 0.4 | 90.3 ± 0.4 | 52.3 ± 0.9 | 93.5 ± 0.4 | 91.1 ± 0.5 | 55 ± 2 |
| Low | JAK2 | Atom masking (funct. group) | 92.9 ± 0.3 | 92.4 ± 0.2 | 54 ± 2 | 94.1 ± 0.2 | 92.2 ± 0.5 | 56 ± 2 |
| Low | JAK2 | Bioisosteric substitution | 94.5 ± 0.2 | 94.3 ± 0.1 | 54.3 ± 0.9 | 94.7 ± 0.6 | 94.2 ± 0.7 | 57.5 ± 0.9 |
| Low | JAK2 | Self-training | 87.6 ± 0.7 | 87.5 ± 0.6 | 44.8 ± 0.7 | 87 ± 1 | 86.6 ± 0.9 | 41.4 ± 0.9 |
| Low | JAK2 | No augmentation | 94 ± 1 | 94 ± 1 | 53 ± 1 | 96 ± 1 | 96 ± 1 | 53.3 ± 0.3 |
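
For readers who want to reproduce numbers of this kind, the following is a minimal sketch of how scaffold diversity and scaffold novelty might be computed and then summarized as mean ± standard deviation over repeated runs. The definitions used here are assumptions, not taken from this table: diversity is taken as the percentage of unique scaffolds among the generated molecules, and novelty as the percentage of unique generated scaffolds absent from a reference set (the fine-tuning or pre-training set). The function names are hypothetical, the scaffold strings are toy data, and scaffolds are assumed to have been extracted beforehand as canonical strings.

```python
from statistics import mean, stdev


def scaffold_diversity(scaffolds):
    """Percentage of unique scaffolds among generated molecules (assumed definition)."""
    if not scaffolds:
        return 0.0
    return 100.0 * len(set(scaffolds)) / len(scaffolds)


def scaffold_novelty(scaffolds, reference_scaffolds):
    """Percentage of unique generated scaffolds not in the reference set (assumed definition)."""
    unique = set(scaffolds)
    if not unique:
        return 0.0
    return 100.0 * len(unique - set(reference_scaffolds)) / len(unique)


def summarize(values):
    """Return (mean, sample standard deviation) across repeated runs."""
    return mean(values), stdev(values)


# Toy data: one list of scaffold strings per repeated run (hypothetical values).
runs = [
    ["c1ccccc1", "c1ccccc1", "c1ccncc1", "C1CCCCC1"],
    ["c1ccccc1", "c1ccncc1", "c1ccncc1", "c1ccoc1"],
    ["c1ccccc1", "c1ccccc1", "c1ccccc1", "c1ccncc1"],
]
fine_tuning_scaffolds = {"c1ccccc1"}  # hypothetical reference (FT) set

diversity = [scaffold_diversity(run) for run in runs]
novelty_ft = [scaffold_novelty(run, fine_tuning_scaffolds) for run in runs]

div_mean, div_std = summarize(diversity)
nov_mean, nov_std = summarize(novelty_ft)
print(f"Scaffold diversity: {div_mean:.1f} ± {div_std:.1f}")
print(f"Scaffold novelty (FT): {nov_mean:.1f} ± {nov_std:.1f}")
```

Swapping the reference set for the pre-training scaffolds would give the PT variant of the novelty column under the same assumed definition.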