2021 Nov 19;1(1):vbab035. doi: 10.1093/bioadv/vbab035

Table 1.

‘Implementation’ details for SeqVec (Heinzinger et al., 2019), ProtBert (Elnaggar et al., 2021), ProtT5 (Elnaggar et al., 2021), ESM-1b (Rives et al., 2021), UniRep (Alley et al., 2019) and BB (Bepler and Berger, 2019)

                 SeqVec    ProtBert  ProtT5  ESM-1b    UniRep    BB
Parameters       93M       420M      3B      650M      18.2M     90M*
Dataset          UniRef50  BFD       BFD     UniRef50  UniRef50  Pfam
Sequences        33M       2.1B      2.1B    27M       27M       21M
Embed time (s)   0.03      0.06      0.1     0.09      2.1       0.1
Attention heads  0         16        32      20        0         0
Bits per float   32        32        16      32        32        32
Size (GB)        0.35      1.6       3.6     7.3       0.06      0.12

Notes: Estimates are marked by *. Differences in the number of proteins (Sequences) for the same set (Dataset) stem from different versions of that dataset. The embedding time (in seconds) was averaged over 10 000 proteins taken from the PDB (Berman et al., 2000), using the embedding models provided by bio-embeddings (Dallago et al., 2021).
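The averaging behind the "Embed time (s)" row can be illustrated with a minimal Python sketch. It is not the authors' benchmarking code: `mean_embed_time` and `toy_embed` are hypothetical names introduced here, the toy embedder merely stands in for a real model, and details such as hardware, batching or warm-up runs are not stated in the table and are therefore not modelled.

```python
import time
from typing import Callable, Iterable, List


def mean_embed_time(embed: Callable[[str], object], sequences: Iterable[str]) -> float:
    """Return the mean wall-clock seconds per sequence spent inside `embed`."""
    durations: List[float] = []
    for seq in sequences:
        start = time.perf_counter()
        embed(seq)  # result is discarded; only the elapsed time matters here
        durations.append(time.perf_counter() - start)
    if not durations:
        raise ValueError("need at least one sequence to average over")
    return sum(durations) / len(durations)


def toy_embed(seq: str) -> list:
    """Hypothetical stand-in for a real embedder: one 2-d vector per residue."""
    return [[float(ord(aa)), 1.0] for aa in seq]


if __name__ == "__main__":
    # Three made-up sequences; the table's figures used 10 000 PDB proteins.
    demo = ["MKTAYIAKQR", "GSHMLEDPVA", "MSEQNNTEMT"]
    print(f"{mean_embed_time(toy_embed, demo):.6f} s per protein")
```

Any callable that maps a sequence string to an embedding could be passed in place of `toy_embed`, so the same helper would work for each model being compared.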