Table 1.
Model | SeqVec | ProtBert | ProtT5 | ESM-1b | UniRep | BB |
---|---|---|---|---|---|---|
Parameters | 93M | 420M | 3B | 650M | 18.2M | 90M* |
Dataset | UniRef50 | BFD | BFD | UniRef50 | UniRef50 | Pfam |
Sequences | 33M | 2.1B | 2.1B | 27M | 27M | 21M |
Embed time (s) | 0.03 | 0.06 | 0.1 | 0.09 | 2.1 | 0.1 |
Attention heads | 0 | 16 | 32 | 20 | 0 | 0 |
Bits per float | 32 | 32 | 16 | 32 | 32 | 32 |
Size (GB) | 0.35 | 1.6 | 3.6 | 7.3 | 0.06 | 0.12 |
Notes: Estimates are marked by *. Differences in the number of proteins (Sequences) for the same set (Dataset) originated from versioning. The embedding time (in seconds) was averaged over 10 000 proteins taken from the PDB (Berman et al., 2000), using the embedding models provided by bio-embeddings (Dallago et al., 2021).
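
The averaging behind the Embed time row can be sketched as follows. This is a minimal illustration, not the authors' benchmarking code: `mean_embed_time` and `dummy_embed` are hypothetical names, and `dummy_embed` is a placeholder standing in for a real embedder such as those shipped with bio-embeddings.

```python
import time
from statistics import mean
from typing import Callable, Iterable, Sequence


def mean_embed_time(embed: Callable[[str], Sequence[float]],
                    sequences: Iterable[str]) -> float:
    """Return the mean wall-clock time (seconds) to embed one sequence."""
    durations = []
    for seq in sequences:
        start = time.perf_counter()
        embed(seq)  # the result is discarded; only the elapsed time matters
        durations.append(time.perf_counter() - start)
    return mean(durations)


def dummy_embed(seq: str) -> Sequence[float]:
    """Placeholder embedder (assumption): returns a length-1 vector."""
    return [float(len(seq))]


# Toy input; the table reports an average over 10 000 PDB proteins.
toy_sequences = ["MKTAYIAKQR", "ACDEFGHIK", "MSE"]
avg_seconds = mean_embed_time(dummy_embed, toy_sequences)
```

Swapping `dummy_embed` for a real model's embedding call would yield a per-protein figure comparable in kind to the table, although absolute values depend on hardware and batching.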