Table 1.
Similarity threshold | Vanilla transformer | Triplet loss | Similarity loss |
---|---|---|---|
0.45 | 0.68 ± 0.17 | 0.73 ± 0.17 | 0.82 ± 0.18 |
0.50 | 0.69 ± 0.18 | 0.75 ± 0.16 | 0.86 ± 0.17 |
0.55 | 0.75 ± 0.18 | 0.80 ± 0.15 | 0.92 ± 0.08 |
0.60 | 0.76 ± 0.18 | 0.81 ± 0.15 | 0.91 ± 0.11 |
0.65 | 0.80 ± 0.17 | 0.85 ± 0.13 | 0.94 ± 0.09 |
0.70 | 0.84 ± 0.18 | 0.89 ± 0.12 | 0.96 ± 0.07 |
0.75 | 0.87 ± 0.16 | 0.91 ± 0.12 | 0.97 ± 0.07 |
0.80 | 0.90 ± 0.14 | 0.94 ± 0.09 | 0.98 ± 0.07 |
0.85 | 0.92 ± 0.14 | 0.96 ± 0.08 | 0.98 ± 0.07 |
0.90 | 0.94 ± 0.14 | 0.98 ± 0.05 | 0.98 ± 0.08 |
0.95 | 0.97 ± 0.09 | 0.99 ± 0.04 | 1.00 ± 0.01 |
The vanilla transformer model was trained using only a reconstruction loss, whereas the other two models were trained with an additional loss term that explicitly enforces the conservation of ground-truth similarities in the latent space.
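As a rough illustration of how an additional loss term of this kind might be combined with a reconstruction loss, the sketch below uses NumPy. The exact loss definitions are not given in this section, so everything here is an assumption: the function names (`similarity_loss`, `triplet_loss`, `total_loss`), the use of cosine similarity in the latent space, the mean-squared-error form of the similarity term, the margin, and the weighting factor are all hypothetical choices made for illustration only.

```python
import numpy as np


def cosine_similarity_matrix(z):
    """Pairwise cosine similarities between the rows of z (shape: n x d)."""
    norms = np.linalg.norm(z, axis=1, keepdims=True)
    z_unit = z / np.clip(norms, 1e-12, None)  # guard against zero-length vectors
    return z_unit @ z_unit.T


def similarity_loss(z, target_sim):
    """Assumed form: mean squared gap between latent cosine similarities
    and the ground-truth similarity matrix (shape: n x n)."""
    return float(np.mean((cosine_similarity_matrix(z) - target_sim) ** 2))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Standard margin-based triplet loss on Euclidean distances.
    The margin value is an assumption, not taken from the source."""
    d_pos = np.linalg.norm(anchor - positive, axis=1)
    d_neg = np.linalg.norm(anchor - negative, axis=1)
    return float(np.mean(np.maximum(d_pos - d_neg + margin, 0.0)))


def total_loss(reconstruction_loss, auxiliary_loss, weight=1.0):
    """Reconstruction loss plus a weighted auxiliary term (weight is hypothetical)."""
    return reconstruction_loss + weight * auxiliary_loss
```

Under these assumptions, the vanilla model would correspond to `weight=0.0`, while the other two models would pass either the triplet term or the similarity term as `auxiliary_loss`.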