Table 1: Overview of deep learning models applied to antibody prediction and design tasks.
Fv structure prediction | |||
---|---|---|---|
Model | Architecture | Dataset | Performance (H3 RMSD) |
AbLooper [10] | Five E(n)-EGNNs | 3k structures from SAbDab | 3.20 Å |
DeepAb [11] | LSTM + residual NN | 118k sequences from OAS for LSTM, 1.7k structures from SAbDab | 3.28 Å |
IgFold [14]** | AntiBERTy + Graph transformer + IPA | 4k crystals from SAbDab, 38k structures from AF | 2.99 Å |
ABodyBuilder2 [18] | AlphaFold-Multimer | 4k crystals from SAbDab, 22k unpaired structures from AlphaFold | 2.81 Å |
tFold-Ab [16]** | AlphaFold-Multimer | 7k paired crystals, 1k heavy only, 500 light only from SAbDab | 2.74 Å |
xTrimoAbFold [17] | AlphaFold-Multimer | 18k BCR chains from PDB | 1.25 Å (for CDR3) |
RaptorX-Single [20] | Sequence embedding module + Evoformer + IPA layers | 340k structures from PDB, fine-tuned on 5k heavy and light antibody chain structures from SAbDab | 2.65 Å |
Fixed backbone sequence design | |||
Model | Architecture | Dataset | Performance (H3 AAR) |
Fv Hallucinator [55] | DeepAb | 11k immunoglobulin domains from antibody structure database AbDb/abYbank | 51% |
Representation learning | |||
Model | Architecture | Dataset | Performance (H3 AAR) |
AntiBERTy [15] | BERT | 558M non-redundant sequences from OAS | 26.0% |
AbLang [39] | RoBERTa | 14M heavy and 187k light sequences from OAS with 70% identity | 33.6% |
AntiBERTa [40] | RoBERTa | 52.89M unpaired heavy and 19.09M unpaired light chains from OAS database | - |
PARA | DeBERTa | 13.5M heavy and 4.5M light chains from OAS | 34.2% |
Sequence generation | |||
Model | Architecture | Dataset | Performance (H3 perplexity) |
ProGen2-OAS [34] | Transformer decoder | 554M sequences from OAS (clustered at 85% sequence identity) | - |
Manifold Sampler [33,35] | DAE | 20M from Pfam for DAE, 0.5M from Pfam for function predictor | - |
IgLM [37]** | Transformer decoder | 558M sequences at 95% sequence identity | 4.653 |
Sequence and structure co-design | |||
Model | Architecture | Dataset | Performance |
RefineGNN [58]** | GNN | 4994 antibody CDR H loops from SAbDab | H3 RMSD: 2.50 Å; CDR AAR: 35.37%; PPL: H1: 6.09, H2: 6.58, H3: 8.38 |
AbDockGen [59] | HERN | 3k Ab-Ag complexes from SAbDab after filtering structures without antigens and removing duplicates | AAR: 34.1%; Contact AAR: 20.8%; Designs with improved Edesign: 11.6% |
MEAN [60] | E(3)-eGNNs | 3k complexes from SAbDab | H3 RMSD: 1.81 Å; AAR: 36.77%; ΔΔG: −5.33 |
HMPN [61] | E(3)-eGNNs | 50M OAS Fv sequences for pretraining | H3 RMSD: 2.38 Å; H3 AAR: 31.08%; H3 PPL: 6.323 |
DiffAb [70]** | DDPM | SAbDab structures with resolution better than 4 Å, H3 at 50% sequence identity | H3 RMSD: 3.597 Å; H3 AAR: 26.78%; H3 ΔΔG: 23.63% |
DockGPT [73] | Transformer encoder/decoder | 37k single chains from BC40 dataset, 33k general protein complexes from DIPS, 3k Ab-Ag complexes from SAbDab with < 40% sequence identity | H3 RMSD: 1.88 Å; H3 PPL: 10.68 |
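
The performance columns in Table 1 use three recurring metrics: root-mean-square deviation (RMSD) of the CDR H3 loop, amino acid recovery (AAR), and perplexity (PPL). The sketch below illustrates how such quantities are commonly computed. It is a minimal illustration under stated assumptions, not the evaluation protocol of any listed model: the function names are hypothetical, the RMSD assumes the two loops have already been superimposed (for example on framework residues), and the choice of atoms, alignment, and test set differs between the cited studies, so values from different rows are not strictly comparable.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equally sized lists of (x, y, z) points.

    Assumption: the structures are already superimposed; no alignment
    is performed here. The result is in the units of the input (e.g. Å).
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (a - b) ** 2
        for p, q in zip(coords_a, coords_b)
        for a, b in zip(p, q)
    )
    return math.sqrt(squared / len(coords_a))


def amino_acid_recovery(designed, native):
    """Fraction of positions where the designed residue matches the native one."""
    if len(designed) != len(native) or not native:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(d == n for d, n in zip(designed, native))
    return matches / len(native)


def perplexity(log_probs):
    """Exponential of the mean negative natural-log likelihood per residue."""
    if not log_probs:
        raise ValueError("log_probs must be non-empty")
    return math.exp(-sum(log_probs) / len(log_probs))
```

For instance, `amino_acid_recovery("ARDYW", "ARGYW")` compares two made-up five-residue loops that differ at one position, and a model that assigns probability 0.25 to every native residue has a perplexity of about 4. Lower RMSD and perplexity are better, while higher AAR is better.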