2019 Dec 3;11:74. doi: 10.1186/s13321-019-0397-9

Table 3.

Metrics obtained from a sample of 50,000 SMILES from each of the trained models

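| Dataset | Arch. | Valid (%) | Unique (%) | Novel (%) | Active (%) | Recovered actives/total actives (%) | Recovered neighbors |
|---------|-------|-----------|------------|-----------|------------|-------------------------------------|---------------------|
| EGFR    | GAN   | 86        | 56         | 97        | 71         | 5.26                                | 196                 |
| EGFR    | RNN   | 96        | 46         | 95        | 65         | 7.74                                | 238                 |
| HTR1A   | GAN   | 86        | 66         | 95        | 71         | 5.05                                | 284                 |
| HTR1A   | RNN   | 96        | 50         | 90        | 81         | 7.28                                | 384                 |
| S1PR1   | GAN   | 89        | 31         | 98        | 44         | 0.93                                | 24                  |
| S1PR1   | RNN   | 97        | 35         | 97        | 65         | 3.72                                | 43                  |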

Dataset: dataset used; Arch.: architecture used; Valid: percentage of valid molecules in the sampled set; Unique: percentage of valid unique compounds; Novel: percentage of unique novel compounds (not present in the training set); Active: percentage of unique active compounds; Recovered actives/total actives: actives recovered from the test set as a percentage of all actives in the test set; Recovered neighbors: neighbors of active compounds recovered, using the FCFP6 fingerprint with 2048 bits and a Tanimoto similarity threshold of 0.7.
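The definitions in the legend can be illustrated with a minimal Python sketch. This is not the authors' code, and several choices are assumptions: each percentage is taken relative to the preceding subset (unique over valid, novel over unique), `canonicalize` is a hypothetical callable that returns a canonical SMILES string or `None` for an invalid one, fingerprints are represented as sets of on-bit indices rather than computed FCFP6 vectors, a neighbor is counted when the similarity is at or above the threshold, and "recovered neighbors" is read as the number of generated molecules with at least one active neighbor. The Active column needs an activity model and is left out.

```python
from typing import Callable, Dict, Iterable, Optional, Set


def pct(part: int, whole: int) -> float:
    """Percentage of part in whole; 0.0 when whole is empty."""
    return 100.0 * part / whole if whole else 0.0


def sample_metrics(
    sampled: Iterable[str],
    training: Set[str],
    test_actives: Set[str],
    canonicalize: Callable[[str], Optional[str]],
) -> Dict[str, float]:
    """Valid/Unique/Novel/Recovered percentages for one sampled set.

    Assumes training and test_actives already hold canonical SMILES
    produced by the same (hypothetical) canonicalize function.
    """
    canon = [canonicalize(s) for s in sampled]
    valid = [c for c in canon if c is not None]   # parsable molecules
    unique = set(valid)                           # duplicates removed
    novel = unique - training                     # not seen in training
    recovered = unique & test_actives             # test actives regenerated
    return {
        "valid_pct": pct(len(valid), len(canon)),
        "unique_pct": pct(len(unique), len(valid)),    # assumed denominator
        "novel_pct": pct(len(novel), len(unique)),     # assumed denominator
        "recovered_actives_pct": pct(len(recovered), len(test_actives)),
    }


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto similarity of two fingerprints given as sets of on bits."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def count_recovered_neighbors(
    generated_fps: Iterable[Set[int]],
    active_fps: Iterable[Set[int]],
    threshold: float = 0.7,
) -> int:
    """Count generated fingerprints with at least one active neighbor.

    Assumption: similarity >= threshold counts as a neighbor.
    """
    actives = list(active_fps)
    return sum(
        1
        for g in generated_fps
        if any(tanimoto(g, a) >= threshold for a in actives)
    )
```

In practice the fingerprints and the canonicalization step would come from a cheminformatics toolkit; the set-based representation here only keeps the sketch free of external dependencies.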