Table 1.
Embeddings | Language models | Layers | Parameters | Training databases |
---|---|---|---|---|
ProteinBERT | Modified BERT | 6 | 16M | UniRef90 (106M seqs) |
esm2_t30_150M_UR50D | BERT | 30 | 150M | UniRef50/D (2021_04) (50M seqs) |
esm2_t33_650M_UR50D | BERT | 33 | 650M | UniRef50/D (2021_04) (50M seqs) |
ProtT5-XL | T5 | 24 | 3B | BFD100 (2B seqs) + UniRef50 (45M seqs) |
ProtBert | BERT | 30 | 420M | BFD100 (2B seqs) |