Table 2.
Comparison of strategies for obtaining global, sequence-length-independent representations on three downstream tasks [5].
| Strategy | Stability (Corr.) | Fluorescence (Corr.) | Homology (Acc.) |
|---|---|---|---|
| Mean | 0.42 | 0.19 | 0.27 |
| Attention | 0.65 | 0.23 | 0.27 |
| Light Att. | 0.66 | 0.23 | 0.27 |
| Maximum | 0.02 | 0.02 | 0.28 |
| MeanMax | 0.37 | 0.15 | 0.26 |
| KMax | 0.10 | 0.11 | 0.27 |
| Concat | 0.74 | 0.69 | 0.34 |
| Bottleneck | **0.79** | **0.78** | **0.41** |
The first six are variants of averaging used in the literature, using uniform weights (Mean), some variant of learned attention weights (Attention [5], Light Attention [30]), or averages of the local representations with the highest attention weights (Maximum, MeanMax, KMax (K = 5)). They all use the same pre-trained Resnet backbone, while the last two entries use modified Resnet architectures with either a very low-dimensional local feature representation (Concat) or an autoencoder-like structure to downsample the representation length (Bottleneck). In all cases, training proceeded without fine-tuning. The results demonstrate that simple alternatives, such as concatenating smaller local representations (Concat) or changing the model to directly learn a global representation (Bottleneck), can have a substantial impact on performance (best results in bold).
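To make the pooling variants concrete, the following is a minimal PyTorch sketch of the Mean, Maximum, Attention, and Concat strategies, assuming local representations of shape (batch, length, dim) produced by a frozen backbone; the module and variable names are illustrative and are not taken from the paper's implementation.

```python
# Minimal sketch of four pooling strategies from Table 2, assuming local
# representations h of shape (batch, length, dim) from a frozen backbone.
# All names and dimensions are illustrative, not the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F


def mean_pool(h: torch.Tensor) -> torch.Tensor:
    """Mean: uniform average over sequence positions."""
    return h.mean(dim=1)


def max_pool(h: torch.Tensor) -> torch.Tensor:
    """Maximum: element-wise max over sequence positions."""
    return h.max(dim=1).values


class AttentionPool(nn.Module):
    """Attention: one learned scalar score per position, softmax-normalised,
    used as weights in a weighted average of the local representations."""

    def __init__(self, dim: int):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        w = F.softmax(self.score(h), dim=1)   # (batch, length, 1)
        return (w * h).sum(dim=1)             # (batch, dim)


class ConcatPool(nn.Module):
    """Concat (illustrative): project each position to a very low-dimensional
    feature and concatenate across a fixed, padded sequence length."""

    def __init__(self, dim: int, low_dim: int, max_len: int):
        super().__init__()
        self.down = nn.Linear(dim, low_dim)
        self.max_len = max_len

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h is assumed padded/truncated to (batch, max_len, dim)
        z = self.down(h)                      # (batch, max_len, low_dim)
        return z.flatten(start_dim=1)         # (batch, max_len * low_dim)


# Usage: pool a batch of 8 sequences of length 100 with 512-dim features.
h = torch.randn(8, 100, 512)
print(mean_pool(h).shape)                               # torch.Size([8, 512])
print(AttentionPool(dim=512)(h).shape)                  # torch.Size([8, 512])
print(ConcatPool(512, low_dim=8, max_len=100)(h).shape) # torch.Size([8, 800])
```

The length-independence of the first three strategies comes from reducing over the sequence axis, whereas Concat relies on padding to a fixed maximum length and compensates by keeping the per-position features very small.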