Table 1.
Model | Architecture | Max len | Dim | # params | Pretraining data source | Pretraining data size | Date |
---|---|---|---|---|---|---|---|
Local models | | | | | | | |
Profile HMMs [37] | Hidden Markov | – | – | – | MSAs | – | Oct 2012 |
EvMutation [38] | Potts models | – | – | – | MSAs | – | Jan 2017 |
MSA transformer [39] | Transformer | 1024 | 768 | 100M | UniRef50 [14] | 26M | Feb 2021 |
DeepSequence [22] | VAEs | – | – | – | MSAs | – | Dec 2017 |
EVE [40] | Bayesian VAEs | – | – | – | MSAs | – | Oct 2021 |
Global models | | | | | | | |
TAPE ResNet [41] | ResNet | 1024 | 256 | 38M | Pfam [36] | 31M | Jun 2019 |
TAPE LSTM [41] | LSTM | 1024 | 2048 | 38M | Pfam [36] | 31M | Jun 2019 |
TAPE transformer [41] | Transformer | 1024 | 512 | 38M | Pfam [36] | 31M | Jun 2019 |
Bepler [42] | LSTM | 512 | 100 | 22M | Pfam [36] | 31M | Feb 2019 |
UniRep [21] | LSTM | 512 | 1900 | 18M | UniRef50 [14] | 24M | Mar 2019 |
eUniRep [43] | LSTM | 512 | 1900 | 18M | UniRef50 [14]; MSAs | 24M | Jan 2020 |
ESM-1b [23] | Transformer | 1024 | 1280 | 650M | UniRef50 [14] | 250M | Dec 2020 |
ESM-1v [44] | Transformer | 1024 | 1280 | 650M | UniRef90 [14] | 98M | Jul 2021 |
ESM-IF1 [45] | Transformer | – | 512 | 124M | UniRef50 [14]; CATH [46] | 12M sequences; 16K structures | Sep 2022 |
ProGen [47] | Transformer | 512 | – | 1.2B | UniParc [14]; UniProtKB [14]; Pfam [36]; NCBI taxonomy [48] | 281M | Jul 2021 |
ProteinBERT [49] | Transformer | 1024 | – | 16M | UniRef90 [14] | 106M | May 2021 |
Tranception [15] | Transformer | 1024 | 1280 | 700M | UniRef100 [14] | 250M | May 2022 |
ESM-2 [50] | Transformer | 1024 | 5120 | 15B | UniRef90 [14] | 65M | Oct 2022 |