2024 Apr 22;15:3408. doi: 10.1038/s41467-024-47613-w

Table 2.

Comparison of the four Dragonfly methods: ligand-SMILES, ligand-SELFIES, structure-SMILES, and structure-SELFIES

| DRAGONFLY method | Valid and unique molecules / % | Valid, unique, and novel molecules / % | RAScore ≥ 0.5 / % | Average Jaccard distance to other molecules |
|---|---|---|---|---|
| Ligand-SMILES | 93.3 (± 0.4) | 92.2 (± 0.4) | **93.4 (± 0.6)** | 0.778 (± 0.001) |
| Ligand-SELFIES | **99.9 (± 0.1)** | **99.7 (± 0.1)** | 84.0 (± 1.0) | **0.805 (± 0.002)** |
| Structure-SMILES | 90.2 (± 0.8) | 87.4 (± 0.9) | **90.0 (± 1.0)** | 0.773 (± 0.004) |
| Structure-SELFIES | **99.9 (± 0.1)** | **99.6 (± 0.1)** | 78.0 (± 2.0) | **0.811 (± 0.003)** |

| DRAGONFLY method | Unique atom scaffolds / % | Unique and novel atom scaffolds / % | Unique carbon scaffolds / % | Unique and novel carbon scaffolds / % |
|---|---|---|---|---|
| Ligand-SMILES | 85.0 (± 0.1) | 53.0 (± 0.2) | 98.4 (± 0.3) | 58.0 (± 0.2) |
| Ligand-SELFIES | **96.9 (± 0.4)** | **86.0 (± 0.1)** | **99.8 (± 0.1)** | **83.0 (± 0.1)** |
| Structure-SMILES | 84.0 (± 0.1) | 55.0 (± 0.3) | 98.3 (± 0.3) | 56.0 (± 0.2) |
| Structure-SELFIES | **96.0 (± 0.1)** | **81.0 (± 0.1)** | **99.9 (± 0.1)** | **83.0 (± 0.2)** |

Bold indicates which representation, SELFIES or SMILES, achieves the higher value for a given property in both the structure- and ligand-based models. The columns report: (i) the percentage of valid and unique molecules, (ii) the percentage of valid, unique, and novel molecules, (iii) the percentage of molecules with an RAScore ≥ 0.5, (iv) the average Jaccard distance to other generated molecules from the same run (indicating diversity), and (v)–(viii) scaffold metrics, namely the percentages of unique and of unique and novel atom and carbon scaffolds. Values are presented as mean and standard deviation over three Dragonfly runs (N = 3), each sampling 2000 SMILES strings.
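As an illustration of how metrics of this kind can be computed, the following is a minimal Python sketch, not the authors' implementation. It assumes that each sampled molecule has already been validated and reduced to a canonical string, and that each molecule is represented by a set of feature identifiers (for example, the "on" bits of a fingerprint). The function names, the toy inputs, and the use of the sample standard deviation are all assumptions made for this example.

```python
from itertools import combinations
from statistics import mean, stdev


def jaccard_distance(a, b):
    """Jaccard distance 1 - |A ∩ B| / |A ∪ B| between two feature sets."""
    union = a | b
    if not union:
        return 0.0  # assumption: two empty sets are treated as identical
    return 1.0 - len(a & b) / len(union)


def average_pairwise_distance(feature_sets):
    """Mean Jaccard distance over all unordered pairs of molecules in one run."""
    pairs = combinations(feature_sets, 2)
    return mean(jaccard_distance(a, b) for a, b in pairs)


def percent_unique_and_novel(samples, training_set):
    """Return (% unique, % unique and novel), relative to all samples drawn.

    `samples` is a list of canonical strings for the molecules that passed
    validation; `training_set` holds the canonical strings seen in training.
    """
    unique = set(samples)
    novel = unique - set(training_set)
    n = len(samples)
    return 100.0 * len(unique) / n, 100.0 * len(novel) / n


def summarize(values):
    """Mean and sample standard deviation of one metric across runs."""
    return mean(values), stdev(values)


if __name__ == "__main__":
    # Hypothetical toy data: three runs, each with three feature sets.
    runs = [
        [{1, 2, 3}, {2, 3, 4}, {5, 6}],
        [{1, 2}, {2, 3}, {1, 2}],
        [{1}, {2}, {1, 2}],
    ]
    per_run = [average_pairwise_distance(run) for run in runs]
    avg, sd = summarize(per_run)
    print(f"average Jaccard distance: {avg:.3f} (± {sd:.3f})")
```

Computing each metric per run and only then aggregating gives one value per run, so the reported spread reflects run-to-run variation rather than variation between molecules.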