2024 Apr 22;15:3408. doi: 10.1038/s41467-024-47613-w

Table 1.

Comparison of DRAGONFLY with a fine-tuned recurrent neural network (RNN) approach, showing the percentage of generated molecules that meet each of the following criteria: (i) unique and novel, (ii) novelty score ≥ 0.65, (iii) retrosynthetic accessibility score (RAScore) ≥ 0.5, (iv) QSAR score ≤ 1 μM, and (v) all four criteria combined.

| Template | Method | Unique and novel (%) | Novelty score ≥ 0.65 (%) | RAScore ≥ 0.5 (%) | QSAR score ≤ 1 μM (%) | All criteria (%) |
|---|---|---|---|---|---|---|
| PPARγ | RNN-SMILES | 75.4 (± 2.7) | 28.7 (± 1.1) | 67.9 (± 2.3) | 29.6 (± 2.4) | 5.1 (± 0.2) |
| | DRAGONFLY-SMILES | 91.8 (± 0.3) | 47.9 (± 1.4) | **86.0 (± 0.3)** | **34.7 (± 0.3)** | 9.4 (± 0.0) |
| | DRAGONFLY-SELFIES | **99.8 (± 0.1)** | **77.4 (± 0.1)** | 82.2 (± 0.2) | 31.9 (± 0.1) | **13.3 (± 0.0)** |
| LXRβ | RNN-SMILES | 92.4 (± 2.5) | 65.9 (± 2.6) | 87.9 (± 2.8) | 28.6 (± 0.9) | 11.3 (± 0.4) |
| | DRAGONFLY-SMILES | 94.3 (± 0.5) | 80.2 (± 1.2) | **89.1 (± 0.5)** | 26.2 (± 0.2) | **11.8 (± 0.1)** |
| | DRAGONFLY-SELFIES | **100 (± 0.0)** | **91.3 (± 0.5)** | 84.2 (± 0.3) | **27.9 (± 0.2)** | 11.1 (± 0.1) |
| RARα | RNN-SMILES | 69.7 (± 5.9) | 41.9 (± 3.3) | 57.2 (± 4.3) | 30.1 (± 1.8) | 11.1 (± 0.7) |
| | DRAGONFLY-SMILES | 92.2 (± 0.4) | 62.4 (± 0.7) | 75.6 (± 0.5) | **32.4 (± 0.7)** | 12.7 (± 0.2) |
| | DRAGONFLY-SELFIES | **99.8 (± 0.0)** | **87.5 (± 0.3)** | **77.1 (± 0.2)** | 29.6 (± 0.3) | **14.0 (± 0.1)** |
| BRAF | RNN-SMILES | 89.2 (± 3.5) | 35.1 (± 3.1) | 85.9 (± 3.0) | 35.0 (± 1.3) | 6.7 (± 0.3) |
| | DRAGONFLY-SMILES | 87.9 (± 0.6) | 46.0 (± 0.8) | **80.9 (± 0.5)** | **42.9 (± 0.5)** | 10.7 (± 0.1) |
| | DRAGONFLY-SELFIES | **99.7 (± 0.1)** | **81.1 (± 0.6)** | 77.3 (± 0.4) | 34.3 (± 0.1) | **12.4 (± 0.0)** |
| BTK | RNN-SMILES | 82.0 (± 4.4) | 64.5 (± 4.1) | 61.9 (± 4.7) | 20.7 (± 1.8) | 4.5 (± 0.2) |
| | DRAGONFLY-SMILES | 88.9 (± 0.7) | 53.2 (± 0.4) | 69.6 (± 0.9) | **36.3 (± 0.7)** | **8.8 (± 0.1)** |
| | DRAGONFLY-SELFIES | **100 (± 0.0)** | **85.8 (± 0.7)** | **68.2 (± 1.0)** | 25.8 (± 0.1) | 5.8 (± 0.0) |
| JAK2 | RNN-SMILES | 88.8 (± 3.9) | 60.2 (± 4.2) | 79.9 (± 3.4) | 35.0 (± 2.2) | 14.5 (± 0.8) |
| | DRAGONFLY-SMILES | 84.8 (± 1.0) | 39.4 (± 0.9) | 69.0 (± 1.0) | **55.9 (± 1.5)** | 14.8 (± 0.2) |
| | DRAGONFLY-SELFIES | **99.2 (± 0.0)** | **73.3 (± 0.8)** | **70.5 (± 0.5)** | 50.5 (± 1.0) | **18.3 (± 0.2)** |

Bold indicates which of the SELFIES- and SMILES-based DRAGONFLY models achieves the higher value for the investigated property in both structure- and ligand-based models. Values are presented as mean (± standard deviation) over three runs (N = 3), each sampling 2000 SMILES strings. The complete list of 20 investigated targets is given in Tables S2–S6. JAK Janus kinase, PPAR Peroxisome proliferator-activated receptor, BRAF Serine/threonine-protein kinase B-Raf (rapidly accelerated fibrosarcoma), BTK Bruton’s tyrosine kinase, RAR Retinoic acid receptor, LXR Liver X receptor, QSAR quantitative structure–activity relationship, SMILES simplified molecular-input line-entry system, SELFIES self-referencing embedded strings.
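Each percentage in the table combines per-molecule pass/fail checks with an average over repeated sampling runs. The Python sketch below shows one way such a summary could be computed; it is not the authors' code. The `Molecule` record, its field names, and the helper functions are hypothetical. The scores are assumed to be precomputed by external tools (a novelty scorer, the RAScore model, and a QSAR model), every percentage is assumed to use all sampled molecules as its denominator, and the spread is assumed to be the sample standard deviation (N - 1). The paper may have used different conventions.

```python
from dataclasses import dataclass
from statistics import mean, stdev


@dataclass
class Molecule:
    """Hypothetical per-molecule record; all scores are assumed precomputed elsewhere."""
    unique_and_novel: bool   # unique within its run and absent from the training data
    novelty_score: float     # novelty score (higher = more novel)
    ra_score: float          # retrosynthetic accessibility score (RAScore)
    qsar_um: float           # QSAR-predicted value in μM (lower = more potent)


def criteria(m: Molecule) -> dict[str, bool]:
    """Evaluate criteria (i)-(iv) for one molecule, plus (v) = all four at once."""
    flags = {
        "unique_and_novel": m.unique_and_novel,
        "novelty>=0.65": m.novelty_score >= 0.65,
        "rascore>=0.5": m.ra_score >= 0.5,
        "qsar<=1uM": m.qsar_um <= 1.0,
    }
    flags["all"] = all(flags.values())  # evaluated before the "all" key is added
    return flags


def run_percentages(sample: list[Molecule]) -> dict[str, float]:
    """Percentage of all sampled molecules in one run that meet each criterion."""
    counts: dict[str, int] = {}
    for m in sample:
        for name, ok in criteria(m).items():
            counts[name] = counts.get(name, 0) + int(ok)
    return {name: 100.0 * c / len(sample) for name, c in counts.items()}


def summarize(runs: list[list[Molecule]]) -> dict[str, tuple[float, float]]:
    """Mean and sample standard deviation of each percentage across runs."""
    per_run = [run_percentages(r) for r in runs]
    summary = {}
    for name in per_run[0]:
        values = [p[name] for p in per_run]
        summary[name] = (mean(values), stdev(values))
    return summary


# Toy usage with three tiny "runs" (the table is based on three runs of 2000 samples each).
good = Molecule(True, 0.80, 0.70, 0.5)   # passes all four checks
weak = Molecule(True, 0.40, 0.90, 5.0)   # fails the novelty and QSAR thresholds
runs = [[good, weak], [good, good], [weak, weak]]
for name, (mu, sd) in summarize(runs).items():
    print(f"{name}: {mu:.1f} (± {sd:.1f})")
```

With these toy inputs the loop prints `100.0 (± 0.0)` for the unique/novel and RAScore criteria and `50.0 (± 50.0)` for the novelty, QSAR, and combined criteria. Criterion (v) is evaluated per molecule rather than derived from the four marginal percentages because the criteria are not independent of one another.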