2019 Dec 3;11:74. doi: 10.1186/s13321-019-0397-9

Table 3.

Metrics computed on a sample of 50,000 SMILES from each trained model

Dataset	Arch.	Valid (%)	Unique (%)	Novel (%)	Active (%)	Recovered actives/total actives (%)	Recovered neighbors
EGFR	GAN	86	56	97	71	5.26	196
EGFR	RNN	96	46	95	65	7.74	238
HTR1A	GAN	86	66	95	71	5.05	284
HTR1A	RNN	96	50	90	81	7.28	384
S1PR1	GAN	89	31	98	44	0.93	24
S1PR1	RNN	97	35	97	65	3.72	43

Dataset used (Dataset); architecture used (Arch.); percentage of valid molecules in the sampled set (Valid); percentage of valid molecules that are unique (Unique); percentage of unique compounds that are novel, i.e., not present in the training set (Novel); percentage of unique compounds that are active (Active); percentage of all test-set actives that were recovered in the sampled set (Recovered actives/total actives); number of recovered neighbors of active compounds, computed with 2048-bit FCFP6 fingerprints and a Tanimoto similarity threshold of 0.7 (Recovered neighbors).
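The legend defines nested ratios: Unique is relative to the valid set, while Novel and Active are relative to the unique set. The Python sketch that follows shows one way such metrics could be computed; it is not the authors' code. Assumptions: `canonicalize` and `is_active` are hypothetical caller-supplied callables standing in for a SMILES toolkit and an activity model; fingerprints are passed as sets of on-bit indices (for example 2048-bit FCFP6 fingerprints produced by a cheminformatics toolkit, not shown); and a "recovered neighbor" is taken to be a candidate whose Tanimoto similarity to at least one reference active is at least 0.7, which is one reading of the legend rather than a definition from the source.

```python
from __future__ import annotations

from typing import Callable, Iterable, Optional


def pct(part: int, whole: int) -> float:
    """Percentage of `whole`, returning 0.0 for an empty denominator."""
    return 100.0 * part / whole if whole else 0.0


def tanimoto(a: frozenset[int], b: frozenset[int]) -> float:
    """Tanimoto similarity |A ∩ B| / |A ∪ B| of two on-bit index sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def sample_metrics(
    sampled: list[str],
    training: Iterable[str],
    test_actives: Iterable[str],
    canonicalize: Callable[[str], Optional[str]],
    is_active: Callable[[str], bool],
) -> dict[str, float]:
    """Valid/Unique/Novel/Active/Recovered percentages for one sampled set.

    `canonicalize` (hypothetical) returns a canonical SMILES, or None if the
    string does not parse. `training` and `test_actives` are assumed to be
    canonical already.
    """
    valid = [c for c in map(canonicalize, sampled) if c is not None]
    unique = set(valid)
    novel = unique - set(training)
    active = {s for s in unique if is_active(s)}
    test_set = set(test_actives)
    recovered = unique & test_set  # test-set actives that were regenerated
    return {
        "valid": pct(len(valid), len(sampled)),    # share of all samples
        "unique": pct(len(unique), len(valid)),    # share of valid samples
        "novel": pct(len(novel), len(unique)),     # share of unique compounds
        "active": pct(len(active), len(unique)),   # share of unique compounds
        "recovered_actives": pct(len(recovered), len(test_set)),
    }


def count_neighbors(
    candidates: Iterable[frozenset[int]],
    references: list[frozenset[int]],
    threshold: float = 0.7,
) -> int:
    """Count candidates whose similarity to any reference is >= threshold.

    The >= comparison (rather than >) is an assumption.
    """
    return sum(
        1 for c in candidates if any(tanimoto(c, r) >= threshold for r in references)
    )
```

Under the same nested reading, the EGFR GAN row implies roughly 0.86 × 50,000 = 43,000 valid SMILES, about 0.56 × 43,000 ≈ 24,080 unique compounds, and about 0.97 × 24,080 ≈ 23,358 novel ones. These counts are only approximate, because the table reports rounded percentages.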