TABLE 1.
Methods | QnormTop1 | QrawTop1 |
---|---|---|
Combined method | 62.0% ± 1.4% | 73.0% ± 0.8% |
ProtT5 | 57.5% ± 1.6% | 70.9% ± 0.9% |
ProtT5 BFD | 54.3% ± 1.8% | 70.8% ± 0.9% |
ESM1b | 47.9% ± 2.0% | 68.5% ± 0.9% |
ESM | 43.5% ± 2.0% | 65.2% ± 0.9% |
MMseqs2 | 34.5% ± 1.2% | 40.3% ± 1.0% |
MMseqs2 E < 0.01 | 26.1% ± 0.9% | 28.2% ± 1.0% |
ProtAlbert BFD | 20.2% ± 1.3% | 34.7% ± 0.9% |
SeqVec LSTM1 | 18.6% ± 1.2% | 37.4% ± 0.9% |
SeqVec Sum | 18.2% ± 1.4% | 37.5% ± 0.9% |
PLUS | 17.7% ± 1.3% | 36.0% ± 0.9% |
SeqVec LSTM2 | 17.6% ± 1.3% | 36.7% ± 0.9% |
ProtXLNet UniRef100 | 15.4% ± 1.2% | 34.2% ± 0.9% |
ProtBert BFD | 12.7% ± 0.9% | 21.0% ± 0.8% |
UniRep | 9.1% ± 0.9% | 22.4% ± 0.8% |
SeqVec CharCNN | 2.7% ± 0.4% | 4.2% ± 0.4% |
AA composition | 2.5% ± 0.3% | 4.0% ± 0.4% |
CPCProt | 2.1% ± 0.4% | 3.9% ± 0.4% |
Data set: CATH20 (redundancy reduced at PIDE ≤ 20%); performance measures (columns): QrawTop1 (Eq. 1) is the percentage of queries for which the first hit was correct (same CATH identifier), while QnormTop1 (Eq. 2) normalizes this value by family size; methods (rows, sorted by QnormTop1): ProtTrans (ProtT5, ProtT5 BFD, ProtBert BFD, ProtAlbert BFD, ProtXLNet UniRef100) (Elnaggar et al., 2021), ESM (Rives et al., 2021), MMseqs2 (Steinegger and Söding, 2017), SeqVec (Heinzinger et al., 2019), UniRep (Alley et al., 2019), CPCProt (Lu et al., 2020); combined method: MMseqs2 E < 0.01 + ProtT5 UniRef50; error estimates: the ± values give the half-width of the 95% confidence interval, corresponding to 1.96 standard errors; bold letters highlight the comparison between embedding-based and alignment-based lookup.
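
The two measures and the ± values in the legend can be illustrated with a minimal sketch. Eq. 1 and Eq. 2 are not reproduced in this table, so the code below rests on assumptions: QrawTop1 is taken as the fraction of all queries whose first hit is correct, QnormTop1 as the unweighted mean of the per-family fractions, and the standard error as the binomial estimate, which may differ from the procedure actually used. The function names and the toy labels `famA` and `famB` are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def q_raw_top1(hits):
    """Fraction of queries whose first hit is correct.

    hits: list of (family_label, first_hit_correct) pairs, one per query.
    """
    return sum(ok for _, ok in hits) / len(hits)


def q_norm_top1(hits):
    """Mean of per-family top-1 accuracies (assumed reading of Eq. 2).

    Each family contributes equally, regardless of its number of queries.
    """
    per_family = defaultdict(list)
    for family, ok in hits:
        per_family[family].append(ok)
    accuracies = [sum(v) / len(v) for v in per_family.values()]
    return sum(accuracies) / len(accuracies)


def ci95_half_width(p, n):
    """1.96 standard errors, using a binomial standard error (assumption)."""
    return 1.96 * sqrt(p * (1 - p) / n)


# Toy data: a large family that is always hit correctly and a small one that is not.
hits = [("famA", True), ("famA", True), ("famA", True), ("famB", False)]

raw = q_raw_top1(hits)    # 3 of 4 queries correct
norm = q_norm_top1(hits)  # mean of 1.0 (famA) and 0.0 (famB)
half_width = ci95_half_width(raw, len(hits))
print(f"QrawTop1 = {raw:.2f} ± {half_width:.2f}, QnormTop1 = {norm:.2f}")
```

In this toy case the raw score is higher than the normalized one because the larger family dominates the raw count. The same direction of difference appears in every row of the table, where QrawTop1 is at least as high as QnormTop1.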