Table 1: Overview of deep learning models applied to antibody prediction and design tasks.
Structure prediction, fixed-backbone design, sequence generation, and sequence and structure co-design models are included. For brevity, performance metrics are reported primarily for CDR3 and H3. Note that performance metrics are taken directly from published results; methods listed under the same subheader are not guaranteed to have been tested on the same set of antibodies.
Fv Structure prediction | |||
---|---|---|---|
Model | Architecture | Dataset | Performance (H3 RMSD) |
AbLooper [10] | Five E(n)-EGNNs | 3k structures from SAbDab | 3.20 Å |
DeepAb [11] | LSTM + residual NN | 118k sequences from OAS for LSTM, 1.7k structures from SAbDab | 3.28 Å |
IgFold [14]** | AntiBERTy + Graph transformer + IPA | 4k crystals from SAbDab, 38k structures from AF | 2.99 Å |
ABodyBuilder2 [18] | AlphaFold-Multimer | 4k crystals from SAbDab, 22k unpaired structures from AlphaFold | 2.81 Å |
tFold-Ab [16]** | AlphaFold-Multimer | 7k paired crystals, 1k heavy only, 500 light only from SAbDab | 2.74 Å |
xTrimoAbFold [17] | AlphaFold-Multimer | 18k BCR chains from PDB | 1.25 Å (for CDR3) |
RaptorX-Single [20] | Sequence embedding module + Evoformer + IPA layers | 340k structures from PDB, fine-tuned on 5k heavy and light antibody chain structures from SAbDab | 2.65 Å |
Fixed backbone sequence design | |||
Model | Architecture | Dataset | Performance (H3 AAR) |
Fv Hallucinator [55] | DeepAb | 11k immunoglobulin domains from antibody structure database AbDb/abYbank | 51% |
Representation learning | |||
Model | Architecture | Dataset | Performance (H3 AAR) |
AntiBERTy [15] | BERT | 558M non-redundant sequences from OAS | 26.0% |
AbLang [39] | RoBERTa | 14M heavy and 187k light sequences from OAS with 70% identity | 33.6% |
AntiBERTa [40] | RoBERTa | 52.89M unpaired heavy and 19.09M unpaired light chains from OAS database | - |
PARA | DeBERTa | 13.5M heavy and 4.5M light chains from OAS | 34.2% |
Sequence generation | |||
Model | Architecture | Dataset | Performance (H3 perplexity) |
ProGen2-OAS [34] | Transformer decoder | 554M sequences from OAS (clustered at 85% sequence identity) | - |
Manifold Sampler [33,35] | DAE | 20M from Pfam for DAE, 0.5M from Pfam for function predictor | - |
IgLM [37]** | Transformer decoder | 558M sequences at 95% sequence identity | 4.653 |
Sequence and structure co-design | |||
Model | Architecture | Dataset | Performance |
RefineGNN [58]** | GNN | 4994 antibody CDR H loops from SAbDab | H3 RMSD: 2.50 Å; CDR AAR: 35.37%; PPL: H1: 6.09, H2: 6.58, H3: 8.38 |
AbDockGen [59] | HERN | 3k Ab-Ag complexes from SAbDab after filtering structures without antigens and removing duplicates | AAR: 34.1%; Contact AAR: 20.8%; Designs with improved Edesign: 11.6% |
MEAN [60] | E(3)-eGNNs | 3k complexes from SAbDab | H3 RMSD: 1.81 Å; AAR: 36.77%; ΔΔG: −5.33 |
HMPN [61] | E(3)-eGNNs | 50M OAS Fv sequences for pretraining | H3 RMSD: 2.38 Å; H3 AAR: 31.08%; H3 PPL: 6.323 |
DiffAb [70]** | DDPM | SAbDab structures with resolution better than 4 Å, H3 at 50% sequence identity | H3 RMSD: 3.597 Å; H3 AAR: 26.78%; H3 ΔΔG: 23.63% |
DockGPT [73] | Transformer encoder/decoder | 37k single chains from BC40 dataset, 33k general protein complexes from DIPS, 3k Ab-Ag complexes from SAbDab with < 40% sequence identity | H3 RMSD: 1.88 Å; H3 PPL: 10.68 |
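
The three metric types reported in Table 1 (RMSD, amino acid recovery, and perplexity) can be illustrated with a minimal Python sketch. This is an illustration under stated assumptions, not the evaluation protocol of any listed method: the function names are hypothetical, the RMSD assumes the two coordinate sets are already superimposed and atom-matched, and perplexity is taken as the exponential of the mean negative natural-log likelihood per residue. Published values may differ in which atoms are compared, how structures are aligned, and how loop boundaries are defined.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation (same units as the input, e.g. Å).

    Assumes both inputs are equal-length lists of (x, y, z) tuples that are
    already superimposed and listed in the same atom order.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


def amino_acid_recovery(native, designed):
    """Fraction of positions where the designed residue matches the native one.

    Assumes the two sequences cover the same region (e.g. CDR H3) and have
    equal length, so positions can be compared one to one.
    """
    if len(native) != len(designed) or not native:
        raise ValueError("sequences must be non-empty and equal length")
    matches = sum(a == b for a, b in zip(native, designed))
    return matches / len(native)


def perplexity(log_probs):
    """Perplexity from per-residue natural-log probabilities.

    Computed as exp(-mean(log p)); lower values mean the model assigns
    higher probability to the observed residues.
    """
    if not log_probs:
        raise ValueError("log_probs must be non-empty")
    return math.exp(-sum(log_probs) / len(log_probs))
```

For example, `amino_acid_recovery("ARDYW", "ARGYW")` compares five positions with one mismatch, and a model that assigns probability 0.25 to every residue has a perplexity of about 4.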