Author manuscript; available in PMC 2024 Dec 1.
Published in final edited form as: Curr Opin Biomed Eng. 2023 May 26;28:100473. doi: 10.1016/j.cobme.2023.100473

Table 1: Overview of deep learning models applied to antibody prediction and design tasks.

Structure prediction, fixed backbone design, sequence generation, and sequence and structure co-design models are included. For brevity, performance metrics are reported primarily for CDR3 and H3. Note that performance metrics are taken directly from published results and do not guarantee that methods within a particular subheader were evaluated on the same set of antibodies.

Fv structure prediction

| Model | Architecture | Dataset | Performance (H3 RMSD) |
| --- | --- | --- | --- |
| AbLooper [10] | Five E(n)-EGNNs | 3k structures from SAbDab | 3.20 Å |
| DeepAb [11] | LSTM + residual NN | 118k sequences from OAS for the LSTM, 1.7k structures from SAbDab | 3.28 Å |
| IgFold [14]** | AntiBERTy + graph transformer + IPA | 4k crystal structures from SAbDab, 38k AlphaFold-predicted structures | 2.99 Å |
| ABodyBuilder2 [18] | AlphaFold-Multimer | 4k crystal structures from SAbDab, 22k unpaired AlphaFold-predicted structures | 2.81 Å |
| tFold-Ab [16]** | AlphaFold-Multimer | 7k paired crystal structures, 1k heavy-only, 500 light-only from SAbDab | 2.74 Å |
| xTrimoAbFold [17] | AlphaFold-Multimer | 18k BCR chains from PDB | 1.25 Å (CDR3) |
| RaptorX-Single [20] | Sequence embedding module + Evoformer + IPA layers | 340k structures from PDB, fine-tuned on 5k heavy and light antibody chain structures from SAbDab | 2.65 Å |
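The H3 RMSD values above are root-mean-square deviations over the CDR-H3 backbone, computed after superimposing the predicted and experimental structures. As a minimal Python sketch of the underlying calculation, assuming Cα-only coordinates and Kabsch superposition over the loop itself (published protocols differ; some align on the framework instead):

```python
import numpy as np

def kabsch_rmsd(pred: np.ndarray, true: np.ndarray) -> float:
    """RMSD between two (N, 3) coordinate arrays after optimal superposition."""
    p = pred - pred.mean(axis=0)              # center both coordinate sets
    q = true - true.mean(axis=0)
    u, _, vt = np.linalg.svd(p.T @ q)         # SVD of the covariance matrix
    d = np.sign(np.linalg.det(u @ vt))        # guard against reflections
    rot = u @ np.diag([1.0, 1.0, d]) @ vt     # optimal rotation (Kabsch)
    return float(np.sqrt(((p @ rot - q) ** 2).sum() / len(p)))
```

Whether the superposition is done over the loop or over the framework materially changes the reported number, which is one more reason cross-paper comparisons in this table are approximate.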
Fixed backbone sequence design

| Model | Architecture | Dataset | Performance (H3 AAR) |
| --- | --- | --- | --- |
| Fv Hallucinator [55] | DeepAb | 11k immunoglobulin domains from the antibody structure database AbDb/abYbank | 51% |
Representation learning

| Model | Architecture | Dataset | Performance (H3 AAR) |
| --- | --- | --- | --- |
| AntiBERTy [15] | BERT | 558M non-redundant sequences from OAS | 26.0% |
| AbLang [39] | RoBERTa | 14M heavy and 187k light sequences from OAS at 70% identity | 33.6% |
| AntiBERTa [40] | RoBERTa | 52.89M unpaired heavy and 19.09M unpaired light chains from OAS | - |
| PARA | DeBERTa | 13.5M heavy and 4.5M light chains from OAS | 34.2% |
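AAR (amino acid recovery) in the two tables above is the fraction of positions at which the designed or infilled sequence reproduces the native residue. A minimal sketch, assuming pre-aligned sequences of equal length (the example sequences are hypothetical):

```python
def aar(designed: str, native: str) -> float:
    """Amino acid recovery: fraction of aligned positions where the
    designed sequence matches the native residue."""
    if len(designed) != len(native):
        raise ValueError("sequences must be aligned to equal length")
    return sum(d == n for d, n in zip(designed, native)) / len(native)

# Hypothetical CDR-H3 sequences for illustration only.
print(f"H3 AAR: {aar('ARDRGYSSGWYFDY', 'ARDAGYSSGYYFDV'):.1%}")  # 78.6%
```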
Sequence generation

| Model | Architecture | Dataset | Performance (H3 perplexity) |
| --- | --- | --- | --- |
| ProGen2-OAS [34] | Transformer decoder | 554M sequences from OAS (clustered at 85% sequence identity) | - |
| Manifold Sampler [33,35] | DAE | 20M sequences from Pfam for the DAE, 0.5M from Pfam for the function predictor | - |
| IgLM [37]** | Transformer decoder | 558M OAS sequences at 95% sequence identity | 4.653 |
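Perplexity, the metric in the table above, is the exponential of the mean negative log-likelihood the model assigns to each residue of a held-out sequence; lower values mean the model finds natural antibody sequences more predictable. A minimal sketch:

```python
import math

def perplexity(residue_log_probs: list[float]) -> float:
    """exp of the mean negative log-likelihood per residue.
    `residue_log_probs` holds the model's natural-log probability of
    each observed residue in the scored region (e.g. CDR-H3)."""
    return math.exp(-sum(residue_log_probs) / len(residue_log_probs))

# A model giving every residue ~10% probability scores perplexity 10;
# IgLM's reported 4.653 corresponds to ~21% average per-residue probability.
print(perplexity([math.log(0.1)] * 12))  # -> 10.0
```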
Sequence and structure co-design

| Model | Architecture | Dataset | Performance |
| --- | --- | --- | --- |
| RefineGNN [58]** | GNN | 4,994 antibody CDR-H loops from SAbDab | H3 RMSD: 2.50 Å; CDR AAR: 35.37%; PPL: H1 6.09, H2 6.58, H3 8.38 |
| AbDockGen [59] | HERN | 3k Ab-Ag complexes from SAbDab after filtering structures without antigens and removing duplicates | AAR: 34.1%; contact AAR: 20.8%; designs with improved Edesign: 11.6% |
| MEAN [60] | E(3)-EGNNs | 3k complexes from SAbDab | H3 RMSD: 1.81 Å; AAR: 36.77%; ΔΔG: −5.33 |
| HMPN [61] | E(3)-EGNNs | 50M OAS Fv sequences for pretraining | H3 RMSD: 2.38 Å; H3 AAR: 31.08%; H3 PPL: 6.323 |
| DiffAb [70]** | DDPM | SAbDab structures at resolution better than 4 Å, H3 clustered at 50% sequence identity | H3 RMSD: 3.597 Å; H3 AAR: 26.78%; H3 ΔΔG: 23.63% |
| DockGPT [73] | Transformer encoder/decoder | 37k single chains from the BC40 dataset, 33k general protein complexes from DIPS, 3k Ab-Ag complexes from SAbDab at <40% sequence identity | H3 RMSD: 1.88 Å; H3 PPL: 10.68 |
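Some co-design metrics above restrict standard measures to the binding interface: AbDockGen's contact AAR, for instance, computes recovery only at antigen-contacting positions. A minimal sketch, assuming one representative atom per CDR residue and a 5 Å distance cutoff (the cutoff and atom selection are illustrative; definitions of a contact vary across papers):

```python
import numpy as np

def contact_aar(designed: str, native: str,
                cdr_coords: np.ndarray, antigen_coords: np.ndarray,
                cutoff: float = 5.0) -> float:
    """AAR restricted to antigen-contacting CDR positions.
    cdr_coords: (L, 3), one representative atom per CDR residue.
    antigen_coords: (M, 3) antigen atoms. `cutoff` (Å) is an assumption."""
    # Distance from every CDR residue to every antigen atom.
    dists = np.linalg.norm(
        cdr_coords[:, None, :] - antigen_coords[None, :, :], axis=-1)
    contacts = (dists < cutoff).any(axis=1)   # residues touching the antigen
    if not contacts.any():
        return float("nan")                   # no contacts under this cutoff
    matches = np.fromiter((d == n for d, n in zip(designed, native)), bool)
    return float(matches[contacts].mean())
```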