2024 Oct 26;40(11):btae618. doi: 10.1093/bioinformatics/btae618

Table 3.

Perplexity comparison between the protein language model (LM) ESM-2 (Lin et al. 2023), the antibody-specific LMs AntiBERTy (Ruffolo et al. 2021) and AbLang-1 (Olsen et al. 2022b), and our new selection of antibody-specific LMs (see Section 2.4).^a

	Germline residues				Nongermline residues
	Heavy		Light		Heavy			Light
	FWR	CDR1/2	FWR	CDR1/2	FWR	CDR1/2	CDR3	FWR	CDR1/2	CDR3
ESM-2	1.91	4.12	2.54	6.11	32.03	24.36	20.85	23.20	19.37	24.29
AntiBERTy	1.05	1.10	1.17	1.28	29.64	21.51	18.44	40.14	21.75	16.95
AbLang-1	1.03	1.08	1.07	1.16	25.80	17.73	14.47	52.14	25.72	16.75
Ab-Unpaired	1.02	1.07	1.01	1.05	26.81	18.95	14.42	37.60	19.37	17.25
Ab-Paired	1.02	1.06	1.02	1.05	27.24	18.70	14.23	38.95	19.25	16.98
Ab-FL	1.10	1.17	1.09	1.16	10.33	11.18	12.69	10.82	10.24	11.04
Ab-ModMask	1.11	1.18	1.09	1.17	10.26	11.13	13.18	10.78	10.19	11.42
Ab-FT	1.11	1.18	1.10	1.18	10.88	11.91	13.67	11.25	10.63	12.29
AbLang-2	1.10	1.17	1.09	1.16	9.92	11.13	12.47	10.09	9.54	10.77

^a While most of the models are near perfect at predicting masked germline residues, predictions for nongermline (NGL) residues show substantially higher perplexities. For ESM-2, AntiBERTy, AbLang-1, Ab-Unpaired, and Ab-Paired, NGL perplexities are close to or worse than that of a random prediction (a uniform guess over the 20 standard amino acids gives a perplexity of 20). The largest improvement in NGL prediction came from switching to focal loss. Scaling up the model also improved performance, as seen in AbLang-2’s performance compared with Ab-FT. The best perplexity for each region is shown in bold.
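To make the two quantities in this note concrete, the following is a minimal sketch rather than the authors' implementation. It assumes perplexity is the exponential of the mean negative log-probability that the model assigns to the true residue at each masked position, and it uses the standard focal loss form with a focusing parameter `gamma`. The function names and the default `gamma=2.0` are illustrative assumptions; the value used to train the models in the table is not stated here.

```python
import math


def perplexity(true_residue_probs):
    """Perplexity over a set of masked positions.

    `true_residue_probs` holds, for each masked position, the probability
    the model assigned to the correct residue (hypothetical input format).
    """
    nll = [-math.log(p) for p in true_residue_probs]
    return math.exp(sum(nll) / len(nll))


def focal_loss(p_true, gamma=2.0):
    """Focal loss for one position: -(1 - p)^gamma * log(p).

    gamma=2.0 is an assumed default, not a value taken from the paper.
    With gamma=0 this reduces to ordinary cross-entropy.
    """
    return -((1.0 - p_true) ** gamma) * math.log(p_true)


# A uniform guess over the 20 standard amino acids gives perplexity 20,
# which is the "random prediction" baseline referred to in the note.
uniform_ppl = perplexity([1.0 / 20] * 100)

# Focal loss shrinks the contribution of confidently predicted positions
# (germline-like, p close to 1) far more than that of poorly predicted
# positions (nongermline-like, low p).
easy_weight = (1.0 - 0.99) ** 2.0  # scaling factor at p = 0.99
hard_weight = (1.0 - 0.05) ** 2.0  # scaling factor at p = 0.05
```

Under these assumptions, a position predicted with probability 0.99 has its cross-entropy term scaled by roughly 0.0001, while a position predicted with probability 0.05 keeps about 90% of it. This is one plausible reading of why focal loss shifts training effort toward nongermline residues.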