2023 Aug 14;24(5):bbad289. doi: 10.1093/bib/bbad289

Table 1.

Summary of protein language models. # para: number of parameters, provided only for deep learning models. Max len: maximum length of the input sequence. Dim: latent space dimension. Size: pre-trained data size, given as the number of sequences unless otherwise specified; for the MSA transformer it additionally includes 26 million MSAs. K: thousands; M: millions; B: billions. Time: date of the first preprint. Input data size, hidden layer dimension, and number of parameters are only provided for global models.

| Model | Architecture | Max len | Dim | # para | Pre-trained data: source | Pre-trained data: size | Time |
|---|---|---|---|---|---|---|---|
| Local models | | | | | | | |
| Profile HMMs [37] | Hidden Markov models | – | – | – | MSAs | – | Oct 2012 |
| EvMutation [38] | Potts models | – | – | – | MSAs | – | Jan 2017 |
| MSA transformer [39] | Transformer | 1024 | 768 | 100M | UniRef50 [14] | 26M | Feb 2021 |
| DeepSequence [22] | VAEs | – | – | – | MSAs | – | Dec 2017 |
| EVE [40] | Bayesian VAEs | – | – | – | MSAs | – | Oct 2021 |
| Global models | | | | | | | |
| TAPE ResNet [41] | ResNet | 1024 | 256 | 38M | Pfam [36] | 31M | Jun 2019 |
| TAPE LSTM [41] | LSTM | 1024 | 2048 | 38M | Pfam [36] | 31M | Jun 2019 |
| TAPE transformer [41] | Transformer | 1024 | 512 | 38M | Pfam [36] | 31M | Jun 2019 |
| Bepler [42] | LSTM | 512 | 100 | 22M | Pfam [36] | 31M | Feb 2019 |
| UniRep [21] | LSTM | 512 | 1900 | 18M | UniRef50 [14] | 24M | Mar 2019 |
| eUniRep [43] | LSTM | 512 | 1900 | 18M | UniRef50 [14]; MSAs | 24M | Jan 2020 |
| ESM-1b [23] | Transformer | 1024 | 1280 | 650M | UniRef50 [14] | 250M | Dec 2020 |
| ESM-1v [44] | Transformer | 1024 | 1280 | 650M | UniRef90 [14] | 98M | Jul 2021 |
| ESM-IF1 [45] | Transformer | – | 512 | 124M | UniRef50 [14]; CATH [46] | 12M sequences; 16K structures | Sep 2022 |
| ProGen [47] | Transformer | 512 | – | 1.2B | UniParc [14]; UniProtKB [14]; Pfam [36]; NCBI taxonomy [48] | 281M | Jul 2021 |
| ProteinBERT [49] | Transformer | 1024 | – | 16M | UniRef90 [14] | 106M | May 2021 |
| Tranception [15] | Transformer | 1024 | 1280 | 700M | UniRef100 [14] | 250M | May 2022 |
| ESM-2 [50] | Transformer | 1024 | 5120 | 15B | UniRef90 [14] | 65M | Oct 2022 |
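
As a concrete illustration of the Dim and # para columns, the sketch below loads one of the global models in the table (ESM-1b [23]) and checks its parameter count and per-residue embedding dimension. This is a minimal sketch assuming the fair-esm Python package (`pip install fair-esm`) and PyTorch; the example sequence is arbitrary and chosen only for illustration.

```python
import torch
import esm

# Load the pre-trained ESM-1b model (33-layer Transformer) and its alphabet (tokenizer)
model, alphabet = esm.pretrained.esm1b_t33_650M_UR50S()
model.eval()

# Roughly 650M parameters, matching the "# para" column for ESM-1b
n_params = sum(p.numel() for p in model.parameters())
print(f"parameters: {n_params / 1e6:.0f}M")

# Convert a (label, sequence) pair into token indices
batch_converter = alphabet.get_batch_converter()
data = [("example", "MKTVRQERLKSIVRILERSKEPVSGAQLAEELSVSRQVIVQDIAYLRSLGYNIVAT")]
_, _, tokens = batch_converter(data)

# Extract per-residue representations from the final (33rd) layer
with torch.no_grad():
    out = model(tokens, repr_layers=[33])
embeddings = out["representations"][33]

# The last axis is 1280, matching the "Dim" column for ESM-1b
print(embeddings.shape)  # torch.Size([1, seq_len + 2, 1280]); +2 for BOS/EOS tokens
```

The same pattern applies to the other ESM-family entries in the table (e.g. swapping in `esm.pretrained.esm2_t36_3B_UR50D` or an ESM-1v checkpoint changes only the layer index and embedding dimension).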