2022 Nov 17;2:1033775. doi: 10.3389/fbinf.2022.1033775

TABLE 1.

Performance on CATH20ᵃ.

Method              | QnormTop1    | QrawTop1
Combined method     | 62.0% ± 1.4% | 73.0% ± 0.8%
ProtT5              | 57.5% ± 1.6% | 70.9% ± 0.9%
ProtT5 BFD          | 54.3% ± 1.8% | 70.8% ± 0.9%
ESM1b               | 47.9% ± 2.0% | 68.5% ± 0.9%
ESM                 | 43.5% ± 2.0% | 65.2% ± 0.9%
MMseqs2             | 34.5% ± 1.2% | 40.3% ± 1.0%
MMseqs2 E < 0.01    | 26.1% ± 0.9% | 28.2% ± 1.0%
ProtAlbert BFD      | 20.2% ± 1.3% | 34.7% ± 0.9%
SeqVec LSTM1        | 18.6% ± 1.2% | 37.4% ± 0.9%
SeqVec Sum          | 18.2% ± 1.4% | 37.5% ± 0.9%
PLUS                | 17.7% ± 1.3% | 36.0% ± 0.9%
SeqVec LSTM2        | 17.6% ± 1.3% | 36.7% ± 0.9%
ProtXLNet UniRef100 | 15.4% ± 1.2% | 34.2% ± 0.9%
ProtBert BFD        | 12.7% ± 0.9% | 21.0% ± 0.8%
UniRep              |  9.1% ± 0.9% | 22.4% ± 0.8%
SeqVec CharCNN      |  2.7% ± 0.4% |  4.2% ± 0.4%
AA composition      |  2.5% ± 0.3% |  4.0% ± 0.4%
CPCProt             |  2.1% ± 0.4% |  3.9% ± 0.4%
ᵃ Data set: CATH20 (redundancy reduced at PIDE ≤ 20%). Performance measures (columns): QrawTop1 (Eq. 1) is the percentage of queries for which the first hit was correct (same CATH identifier), while QnormTop1 (Eq. 2) normalizes this by family size. Methods (rows, sorted by QnormTop1): ProtTrans (ProtT5, ProtT5 BFD, ProtBert BFD, ProtAlbert BFD, ProtXLNet UniRef100) (Elnaggar et al., 2021), ESM (Rives et al., 2021), MMseqs2 (Steinegger and Söding, 2017), SeqVec (Heinzinger et al., 2019), UniRep (Alley et al., 2019), CPCProt (Lu et al., 2020); combined method: MMseqs2 E < 0.01 + ProtT5 UniRef50. Error estimates: the ± values give the half-width of the 95% confidence interval, corresponding to 1.96 standard errors. Bold letters: highlight the comparison between embedding-based and alignment-based lookup.
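
To make the two performance measures concrete, the following minimal Python sketch computes a raw and a family-normalized top-1 accuracy on toy data. Eq. 1 and Eq. 2 are not reproduced in this table, so the normalization is an assumption: accuracy is computed per family and then averaged over families, so that large families do not dominate. The bootstrap used for the standard error is likewise an assumption, since the footnote only states that the ± values correspond to 1.96 standard errors. All function names and labels are hypothetical.

```python
from collections import defaultdict
import random
import statistics


def q_raw_top1(true_labels, predicted_labels):
    """Percentage of queries whose first hit carries the correct label."""
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return 100.0 * correct / len(true_labels)


def q_norm_top1(true_labels, predicted_labels):
    """Per-family accuracy averaged over families (ASSUMED reading of Eq. 2)."""
    hits = defaultdict(int)
    sizes = defaultdict(int)
    for t, p in zip(true_labels, predicted_labels):
        sizes[t] += 1
        hits[t] += int(t == p)
    return 100.0 * sum(hits[f] / sizes[f] for f in sizes) / len(sizes)


def ci95_halfwidth(metric, true_labels, predicted_labels, n_boot=1000, seed=0):
    """1.96 x bootstrap standard error (ASSUMPTION: queries are resampled)."""
    rng = random.Random(seed)
    n = len(true_labels)
    values = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        values.append(metric([true_labels[i] for i in idx],
                             [predicted_labels[i] for i in idx]))
    return 1.96 * statistics.stdev(values)


# Toy example: family "A" has 4 queries, "B" has 2, "C" has 1.
true_labels = ["A", "A", "A", "A", "B", "B", "C"]
predicted = ["A", "A", "A", "B", "B", "C", "C"]

raw = q_raw_top1(true_labels, predicted)    # 5 of 7 correct, about 71.4
norm = q_norm_top1(true_labels, predicted)  # (3/4 + 1/2 + 1/1) / 3 = 75.0
raw_ci = ci95_halfwidth(q_raw_top1, true_labels, predicted)
print(f"QrawTop1  = {raw:.1f}% +/- {raw_ci:.1f}%")
print(f"QnormTop1 = {norm:.1f}%")
```

In this toy case the two measures differ because the largest family contributes four of the seven queries to the raw score but only one third of the normalized score.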