2021 Apr 20;12(20):7079–7090. doi: 10.1039/d1sc00231g

Number and percentage of unique molecules obtained within different fingerprint-based similarity thresholds (δ) of the starting structures. The molecules in each experiment were generated from 250 000 random string mutations of the starting structures. For celecoxib, we also formed the local chemical space with a scaffold constraint.

The three δ columns give the number of molecules (and percentage).

| Starting structure (method) | Fingerprint | δ > 0.75 | δ > 0.60 | δ > 0.40 |
|---|---|---|---|---|
| Aripiprazole (SELFIES, random) | ECFP4 | 513 (0.25%) | 4206 (2.15%) | 34 416 (17.66%) |
| Albuterol (SELFIES, random) | FCFP4 | 587 (0.32%) | 4156 (2.33%) | 16 977 (9.35%) |
| Mestranol (SELFIES, random) | AP | 478 (0.22%) | 4079 (1.90%) | 45 594 (21.66%) |
| Celecoxib (SELFIES, random) | ECFP4 | 198 (0.10%) | 1925 (1.00%) | 18 045 (9.44%) |
| Celecoxib (SELFIES, terminal 10%) | ECFP4 | 864 (2.02%) | 9407 (21.99%) | 34 187 (79.91%) |
| Celecoxib (SELFIES, central 10%) | ECFP4 | 111 (0.08%) | 1767 (1.32%) | 15 348 (11.45%) |
| Celecoxib (SELFIES, initial 10%) | ECFP4 | 368 (0.53%) | 7345 (10.53%) | 34 702 (49.74%) |
| Celecoxib (SMILES, random) | ECFP4 | 122 (18.43%) | 515 (77.49%) | 662 (100.00%) |
| Celecoxib (SMILES, terminal 10%) | ECFP4 | 90 (20.79%) | 368 (84.99%) | 433 (100.00%) |
| Celecoxib (SMILES, central 10%) | ECFP4 | 114 (22.18%) | 419 (81.52%) | 514 (100.00%) |
| Celecoxib (SMILES, initial 10%) | ECFP4 | 122 (19.71%) | 490 (79.16%) | 619 (100.00%) |
| Celecoxib (DeepSMILES, random) | ECFP4 | 132 (4.43%) | 953 (31.99%) | 2793 (93.76%) |
| Celecoxib (DeepSMILES, terminal 10%) | ECFP4 | 106 (9.73%) | 513 (47.11%) | 1083 (99.45%) |
| Celecoxib (DeepSMILES, central 10%) | ECFP4 | 53 (6.54%) | 162 (19.98%) | 658 (81.13%) |
| Celecoxib (DeepSMILES, initial 10%) | ECFP4 | 105 (9.28%) | 609 (53.80%) | 1106 (97.70%) |
| Celecoxib (SELFIES, scaffold constraint) | ECFP4 | 354 (0.44%) | 6311 (7.79%) | 53 479 (66.07%) |
| Celecoxib (CReM, ChEMBL: SCScore ≤ 2.5) | ECFP4 | 239 (0.58%) | 5547 (13.47%) | 14 887 (36.14%)  |
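As a rough illustration of how counts and percentages of this kind could be tallied, the sketch below counts unique molecules whose similarity to a starting structure exceeds each threshold δ. It rests on several assumptions not stated in this excerpt: that the similarity measure is the Tanimoto coefficient, that each fingerprint can be represented as a set of "on" bit indices (a stand-in for ECFP4, FCFP4 or AP fingerprints), and that percentages are taken relative to the number of unique molecules generated. The function names and the toy data are hypothetical.

```python
from typing import Dict, FrozenSet, Tuple

# Assumption: a fingerprint is modelled as the set of its "on" bit indices.
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto coefficient between two bit sets (assumed similarity measure)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def count_within_thresholds(
    reference: Fingerprint,
    candidates: Dict[str, Fingerprint],
    thresholds: Tuple[float, ...] = (0.75, 0.60, 0.40),
) -> Dict[float, Tuple[int, float]]:
    """Return {threshold: (count, percentage)} for similarity > threshold.

    `candidates` maps a canonical molecule string to its fingerprint, so the
    dictionary keys already enforce uniqueness. The percentage is relative to
    the number of unique candidates (an assumption about the denominator).
    """
    total = len(candidates)
    result = {}
    for t in thresholds:
        n = sum(1 for fp in candidates.values() if tanimoto(reference, fp) > t)
        pct = 100.0 * n / total if total else 0.0
        result[t] = (n, pct)
    return result


# Toy data (hypothetical bit sets, not real molecular fingerprints).
reference = frozenset({1, 2, 3, 4})
candidates = {
    "A": frozenset({1, 2, 3, 4, 5}),  # similarity 4/5
    "B": frozenset({1, 2, 3, 5, 6}),  # similarity 3/6
    "C": frozenset({1, 7, 8, 9}),     # similarity 1/7
    "D": frozenset({1, 2, 3}),        # similarity 3/4 (not strictly above 0.75)
}

for threshold, (n, pct) in count_within_thresholds(reference, candidates).items():
    print(f"delta > {threshold:.2f}: {n} ({pct:.2f}%)")
```

In practice the bit sets would come from a cheminformatics toolkit, and the candidate dictionary would hold the canonicalized valid molecules produced by the string mutations.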