2023 Feb 8;15:18. doi: 10.1186/s13321-023-00686-z

Table 1.

AUROC values for the different models trained on a small dataset of 10,000 compounds

Similarity threshold   Vanilla transformer   Triplet loss   Similarity loss
0.45                   0.68 ± 0.17           0.73 ± 0.17    0.82 ± 0.18
0.50                   0.69 ± 0.18           0.75 ± 0.16    0.86 ± 0.17
0.55                   0.75 ± 0.18           0.80 ± 0.15    0.92 ± 0.08
0.60                   0.76 ± 0.18           0.81 ± 0.15    0.91 ± 0.11
0.65                   0.80 ± 0.17           0.85 ± 0.13    0.94 ± 0.09
0.70                   0.84 ± 0.18           0.89 ± 0.12    0.96 ± 0.07
0.75                   0.87 ± 0.16           0.91 ± 0.12    0.97 ± 0.07
0.80                   0.90 ± 0.14           0.94 ± 0.09    0.98 ± 0.07
0.85                   0.92 ± 0.14           0.96 ± 0.08    0.98 ± 0.07
0.90                   0.94 ± 0.14           0.98 ± 0.05    0.98 ± 0.08
0.95                   0.97 ± 0.09           0.99 ± 0.04    1.00 ± 0.01

The vanilla transformer model was trained using only a reconstruction loss, whereas the other two models (triplet loss and similarity loss) were trained with an additional loss term that explicitly enforces the conservation of ground-truth similarities in the latent space.
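
The idea of adding a similarity-preserving term to the reconstruction loss can be illustrated with a minimal sketch. The exact form of the authors' loss is not given in this excerpt, so everything below is an assumption made for illustration: latent similarity is measured by cosine similarity, the penalty is a mean squared error against the ground-truth similarities, and the names `similarity_loss`, `total_loss`, and `weight` are hypothetical.

```python
import numpy as np


def cosine_similarity_matrix(latent):
    """Pairwise cosine similarities between the rows of `latent`."""
    unit = latent / np.linalg.norm(latent, axis=1, keepdims=True)
    return unit @ unit.T


def similarity_loss(latent, target_sim):
    """Mean squared error between latent and ground-truth pair similarities.

    Assumption: cosine similarity in latent space and an MSE penalty;
    the paper's actual formulation may differ.
    """
    pred_sim = cosine_similarity_matrix(latent)
    # Use each unordered pair once and skip the trivial diagonal.
    upper = np.triu_indices(latent.shape[0], k=1)
    return float(np.mean((pred_sim[upper] - target_sim[upper]) ** 2))


def total_loss(reconstruction_loss, latent, target_sim, weight=1.0):
    """Reconstruction loss plus a weighted similarity term (hypothetical weighting)."""
    return reconstruction_loss + weight * similarity_loss(latent, target_sim)


# Toy example: three latent vectors and two candidate target matrices.
latent = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
matching_target = cosine_similarity_matrix(latent)  # targets already satisfied
mismatched_target = np.eye(3)                       # all pairs labelled dissimilar

print(similarity_loss(latent, matching_target))
print(similarity_loss(latent, mismatched_target))
print(total_loss(2.0, latent, mismatched_target, weight=0.5))
```

In this sketch the extra term vanishes when latent similarities already match the targets, so a model trained with it is pushed to arrange compounds in latent space according to their ground-truth similarity, while a model trained on reconstruction alone receives no such signal.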