Bioinformatics. 2025 Aug 9;41(8):btaf446. doi: 10.1093/bioinformatics/btaf446

Sequence-only prediction of binding affinity changes: a robust and interpretable model for antibody engineering

Chen Liu 1, Mingchen Li 2, Yang Tan 3, Wenrui Gou 4, Guisheng Fan 5, Bingxin Zhou 6
Editor: Arne Elofsson
PMCID: PMC12371331  PMID: 40795828

Abstract

Motivation

A pivotal area of research in antibody engineering is to find effective modifications that enhance antibody-antigen binding affinity. Traditional wet-lab experiments assess mutants in a costly and time-consuming manner. Emerging deep learning solutions offer an alternative by modeling antibody structures to predict binding affinity changes. However, they heavily depend on high-quality complex structures, which are frequently unavailable in practice. Therefore, we propose ProtAttBA, a deep learning model that predicts binding affinity changes based solely on the sequence information of antibody-antigen complexes.

Results

ProtAttBA employs a pre-training phase to learn protein sequence patterns, followed by a supervised training phase that uses labeled antibody-antigen complex data to train a cross-attention-based regressor for predicting binding affinity changes. We evaluated ProtAttBA on three open benchmarks under different conditions. Compared to both sequence- and structure-based prediction methods, our approach achieves competitive performance and demonstrates notable robustness, especially when complex structures are uncertain. The attention mechanism also makes the model interpretable: we show that the learned attention scores can identify critical residues that affect binding affinity. This work introduces a rapid and cost-effective computational tool for antibody engineering, with the potential to accelerate the development of novel therapeutic antibodies.

Availability and implementation

Source code and data are available at https://github.com/code4luck/ProtAttBA.

1 Introduction

Antibodies are vital components of the immune system. They induce responses through specific interactions with antigens characterized by binding affinities, which are central to antibody function and efficacy (Liu et al. 2021, Wang et al. 2023). Recent research highlights the effectiveness of antibody-based biotherapeutics, particularly in combating emerging infectious diseases (Zhang et al. 2021).

Despite their intrinsic ability to interact with antigens, most therapeutic antibodies are not directly derived from nature but undergo laboratory screening and optimization to enhance binding affinity and achieve the desired therapeutic efficacy (Beck et al. 2010, Brustad and Arnold 2011). This process typically requires extensive efforts in biological experiments on a massive number of antibody mutants. However, the time-consuming and labor-intensive nature of experimental determination for antibody mutants makes it infeasible to conduct exhaustive exploration (Kouba et al. 2023).

Alternatively, computational approaches provide rapid simulations and predictions of how mutations impact binding affinity. Existing assessments fall into two categories: implicit scoring and explicit scoring. Implicit scoring methods include many zero-shot protein deep models, which predict mutation effects of wild-type proteins without requiring training on labeled mutant data (Hsu et al. 2022, Li et al. 2024a). These methods commonly use self-supervised learning to derive representations from protein sequences or structures. They enable fitness scoring for mutants without additional supervised training, where the fitness score is assumed to correlate positively with various protein properties, such as enzymatic activity, binding, and stability (Zhou et al. 2024a,c; Cuturello et al. 2024). Due to their independence from prior knowledge about specific proteins or assays, these methods are considered robust and well-suited for cold-start scenarios with limited or no experimental labels. However, their performance in scoring antibody-antigen binding affinity is suboptimal, possibly because these models do not account for antigen information.

In contrast, explicit scoring methods calculate the binding affinity changes (e.g., ΔΔG) of antibody mutants relative to the wild type. Two major approaches have been developed for evaluating these changes in antibody-antigen complexes: energy function calculations and data-driven prediction methods. Energy function-based methods leverage protein structural information, integrating molecular dynamics simulations and physical computations to evaluate complex interactions and affinities (Schymkowitz et al. 2005, Dehouck et al. 2013, Pires and Ascher 2016). These approaches offer mechanistic insights and are grounded in molecular physics, but they face limitations in processing high-throughput data and achieving high predictive accuracy. On the other hand, machine learning-based prediction methods utilize large-scale data to learn implicit patterns in protein construction and make predictions about binding affinity changes (Wang et al. 2020, Yu et al. 2024). Binding is typically considered to have a stronger correlation with protein structures (Huang et al. 2024). Consequently, various models incorporate structural information and achieve strong performance in standard evaluations on open benchmarks (Shan et al. 2022, Rana and Nguyen 2023). However, these methods heavily rely on high-quality structural data inputs. Unlike other proteins, antibodies often lack accurate structural data, with relatively low prediction accuracy and confidence for inferred structures. When only sequence information is available (a common scenario in antibody engineering), structure-based methods demonstrate limited robustness, making their predictions less reliable. Some sequence-based methods address this issue by incorporating multiple sequence alignments (MSA) to capture amino acid co-evolutionary relationships and reduce dependence on structural data (Jin et al. 2024). However, reliable antibody MSAs are often challenging to obtain, and the MSA search required during training and inference makes these models slower and unsuited for high-throughput screening (Misra et al. 2024, Tan et al. 2025).

The limitations of existing methods and the critical role of antibody engineering underscore the need for a robust and efficient tool to predict binding affinity changes in antibody-antigen complexes. Such a tool is expected to demonstrate resilience to uncertainties in input data, such as antibody structures, and efficiency in both training and inference. To address these challenges, in this study, we present ProtAttBA, a novel sequence-only method leveraging a cross-Attention mechanism for Binding Affinity change prediction. As depicted in Fig. 1, ProtAttBA consists of three key components: the embedding module, the attention module, and the prediction module. (i) The embedding module processes the wild-type and mutant sequences of both antibodies and antigens, generating residue-level latent representations using pre-trained protein language models. (ii) Next, the cross-attention module refines these latent representations by emphasizing information-rich features and maintaining contextual dependencies through feature transformation and integration. This step is pivotal for capturing the intricate interactions within antigen-antibody complexes, which form the foundation for the precise prediction of binding affinity changes. This module also provides interpretability for ProtAttBA by identifying and highlighting molecular interaction patterns that significantly influence binding affinities. (iii) The final prediction module integrates these interaction-informed features and produces binding affinity change predictions via learnable regression heads. As demonstrated in the Results section, ProtAttBA serves as a robust and interpretable solution for predicting binding affinity changes in antibody-antigen mutants, thus fulfilling the critical demands of antibody engineering.

Figure 1.


Overview of the ProtAttBA architecture. The model predicts changes in antigen–antibody binding affinity (ΔΔG) caused by amino acid mutations. Given wild-type and mutant sequence pairs, ProtAttBA first encodes antibody and antigen sequences using a frozen pre-trained protein language model to generate contextualized residue embeddings $\{H_{ab}^{wt}, H_{ag}^{wt}, H_{ab}^{mt}, H_{ag}^{mt}\}$. The attention module then applies convolutional neural networks with dual multi-head cross-attention to yield refined representations of these four embeddings and the corresponding pooled feature vectors $\{f_{ab}^{wt}, f_{ag}^{wt}, f_{ab}^{mt}, f_{ag}^{mt}\}$ (see Sections 2.2.1 and 2.2.2). Finally, the prediction module concatenates wild-type and mutant features and regresses the ΔΔG value.

2 Materials and methods

2.1 Datasets

Binding affinity is a biophysical parameter that quantitatively describes the strength of interaction between two proteins. It is used to calculate the binding Gibbs free energy (ΔG), expressed as $\Delta G = -RT\ln(1/K_d)$, with $R$ representing the universal gas constant, $T$ the absolute temperature in Kelvin, and $K_d$ the dissociation equilibrium constant of the complex. The change in binding affinity upon mutation is calculated as the difference in free energy, $\Delta\Delta G_{bind} = \Delta G_{mut} - \Delta G_{wild}$. To evaluate both baseline methods and ProtAttBA on predicting affinity changes, we use three open benchmark datasets: AB645 (Wang et al. 2020) and S1131 (Xiong et al. 2017), which consist of single-site mutations, and AB1101 (Wang et al. 2020), which includes mutations across 1 to 7 residues. Details of the three datasets are listed in Table 1, available as supplementary data at Bioinformatics online.
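As a worked illustration of these two definitions, the following sketch converts dissociation constants into binding free energies and takes their difference. The function names, the temperature of 298.15 K, the choice of kcal/mol units, and the example Kd values are assumptions made for illustration; they are not taken from the benchmark datasets.

```python
import math

R_KCAL = 1.987204e-3  # universal gas constant in kcal/(mol*K)

def binding_free_energy(kd_molar: float, temperature_k: float = 298.15) -> float:
    """Return dG = -RT ln(1/Kd) = RT ln(Kd) in kcal/mol, with Kd in mol/L."""
    if kd_molar <= 0:
        raise ValueError("Kd must be positive")
    return -R_KCAL * temperature_k * math.log(1.0 / kd_molar)

def ddg_bind(kd_mutant: float, kd_wild: float, temperature_k: float = 298.15) -> float:
    """Return ddG_bind = dG_mut - dG_wild in kcal/mol."""
    return (binding_free_energy(kd_mutant, temperature_k)
            - binding_free_energy(kd_wild, temperature_k))

# Hypothetical example: a mutation weakens binding tenfold (Kd 1 nM -> 10 nM).
print(round(ddg_bind(kd_mutant=1e-8, kd_wild=1e-9), 2))  # 1.36
```

Under this sign convention a positive ΔΔG corresponds to a larger Kd for the mutant, that is, weaker binding than the wild type.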

The experimental values for S1131 are sourced from the SKEMPI database (Moal and Fernández-Recio 2012), whereas those for AB645 and AB1101 originate from AB-bind (Sirin et al. 2016). SKEMPI compiles 3047 mutation-induced binding free energy changes in protein-protein heterodimeric complexes with experimentally determined structures. After redundancy removal, Xiong et al. (2017) curated S1131 with 1131 interface single-point mutations. AB-bind, in contrast, includes 32 complexes with 7 to 246 variants per complex, all measured using consistent experimental techniques to minimize discrepancies caused by variations in experimental conditions. We use the datasets as processed by Jin et al. (2024). All three benchmarks use $\Delta\Delta G_{bind}$ as the prediction target, but they exhibit distinct characteristics, such as variations in mutation sites and label distributions. The dataset-wise distributions of $\Delta\Delta G_{bind}$ are compared in Fig. 1, available as supplementary data at Bioinformatics online. These differences provide a comprehensive basis for evaluating model performance across different data patterns.

2.2 Model architecture

Figure 1 presents the architecture of ProtAttBA. It processes sequences of wild-type and mutant antigen-antibody complexes as input to predict the resulting change in binding affinity. The model’s architecture is organized into three principal modules, operating across two conceptual phases: a pre-trained representation phase, followed by a trainable interaction and prediction phase. These modules are: an embedding module for efficient sequence representation; an attention module designed to capture high-dimensional interactions within the input complex; and a prediction module that integrates features from the preceding modules to generate the final predictions. Next, we explain each of these modules in detail.

2.2.1 Embedding module

The embedding module takes four protein sequences as input: the wild-type antibody, the wild-type antigen, the mutated antibody, and the mutated antigen. It generates latent representations for these sequences, denoted $\{H_{ab}^{wt}, H_{ag}^{wt}, H_{ab}^{mt}, H_{ag}^{mt}\}$, using a pre-trained protein language model. Protein language models have demonstrated enhanced scalability and stability (Bai et al. 2021) when applied to protein sequences. Common protein language models include BERT-style models (Elnaggar et al. 2022, Li et al. 2024b), which are better suited for predictive tasks, and GPT-style models (Xu et al. 2024, Xiao et al. 2024), which are more appropriate for generative tasks. Here we opted for a BERT-style model, i.e., a masked language model (MLM), which learns to infer the probability distribution of amino acids at masked positions based on the surrounding context; the resulting representations serve as high-dimensional features of the protein. In empirical evaluations, we implemented four popular open-source protein language models to extract embeddings: ProtBert (Elnaggar et al. 2022), ESM1b (Rives et al. 2021), ESM2 (Lin et al. 2023), and Ankh (Elnaggar et al. 2023).

2.2.2 Attention module

The attention module processes the hidden representations of antibody-antigen complexes. Overall, the module projects the residue-level matrix representation of protein sequences to vector representations, i.e.

$$\{H_{ab}^{wt}, H_{ag}^{wt}, H_{ab}^{mt}, H_{ag}^{mt}\} \rightarrow \{f_{ab}^{wt}, f_{ag}^{wt}, f_{ab}^{mt}, f_{ag}^{mt}\}. \tag{1}$$

The attention module comprises three core components: a 1D convolutional operation to capture local features by modeling interactions between sequentially connected residues, a dual multi-head cross-attention to incorporate global contextual information between antibody and antigen pairs for both wild-type and mutant sequences, and a final convolutional pooling layer to compress the matrix representation into a vector representation. By integrating these components, the attention module effectively propagates both local and global interactions, ensuring a robust and comprehensive representation of antibody-antigen complex dynamics. The following sections provide detailed descriptions of each submodule. For simplicity and clarity, in this subsection we use H without superscripts and subscripts to denote the hidden representation of an arbitrary sequence.

1D convolutional operation: The first 1D convolutional operation learns the local patterns within the input representation space. Define

$$H = \mathrm{softmax}\big(\mathrm{Conv1D}(\mathrm{LayerNorm}(H))\big) \odot \mathrm{LayerNorm}(H), \tag{2}$$

where LayerNorm(·) denotes layer normalization to ensure numerical stability, $\odot$ represents element-wise multiplication, softmax(·) is the softmax activation function, and $H \in \mathbb{R}^{L \times d}$ is the $d$-dimensional embedding representation of the protein sequence with $L$ residues derived from the pre-trained language model. The Conv1D(·) operator, with a kernel size of 1, calculates spatial weights for each position in the sequence. These weights are multiplied element-wise with the normalized representation, allowing for adaptive feature weighting and enhancing the model’s sensitivity to potentially significant positions within the sequence. The same operation in (2) applies to all four representations in parallel.
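The gating step in Equation (2) can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the softmax is assumed to run over the L sequence positions separately for each channel (the text does not state the axis), layer normalization is applied without learnable scale and shift, the convolution has no bias, and the function names and toy inputs are hypothetical.

```python
import math

def layer_norm(row, eps=1e-5):
    """Normalize one residue embedding to zero mean and unit variance."""
    mean = sum(row) / len(row)
    var = sum((x - mean) ** 2 for x in row) / len(row)
    return [(x - mean) / math.sqrt(var + eps) for x in row]

def softmax(values):
    """Numerically stable softmax over a list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def gated_local_features(H, W):
    """Sketch of Eq. (2): softmax(Conv1D(LayerNorm(H))) * LayerNorm(H).

    H: L x d list of residue embeddings.
    W: d x d weights of a kernel-size-1 convolution, which is a linear map
       applied independently at every sequence position.
    ASSUMPTION: softmax is taken over the L positions, per channel.
    """
    normed = [layer_norm(row) for row in H]
    d = len(W)
    # Kernel-size-1 convolution: scores[l][j] = sum_k normed[l][k] * W[k][j]
    scores = [[sum(row[k] * W[k][j] for k in range(d)) for j in range(d)]
              for row in normed]
    # One softmax per channel j across all positions l.
    weights = [softmax([scores[l][j] for l in range(len(H))]) for j in range(d)]
    return [[weights[j][l] * normed[l][j] for j in range(d)]
            for l in range(len(H))]

# Toy input: 2 residues with 3-dimensional embeddings, identity convolution weights.
H_toy = [[1.0, 2.0, 3.0], [2.0, 0.0, 1.0]]
W_toy = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
out = gated_local_features(H_toy, W_toy)
print(len(out), len(out[0]))  # 2 3
```

The output keeps the L x d shape of the input, so the same function can be applied to each of the four representations independently.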

Dual multi-head cross-attention: This submodule implements multi-head cross-attention to model interactions between antibody-antigen complex pairs. At this stage, following the 1D convolutional operation in Equation (2), each processed representation $H$ corresponds to an attention score matrix $\mathrm{Attn}(H)$. To compute the attention scores, we first define the query, key, and value matrices $Q \in \mathbb{R}^{L \times d}$, $K \in \mathbb{R}^{L \times d}$, and $V \in \mathbb{R}^{L \times d}$, which are parameterized by learnable weight matrices $W_q$, $W_k$, and $W_v$, respectively:

$$Q = \mathrm{RoPE}(W_q H), \quad K = \mathrm{RoPE}(W_k H), \quad V = W_v H. \tag{3}$$

Rotary position embedding (RoPE) (Su et al. 2024) is applied to enhance sensitivity to spatial relationships between residues. The same computations are applied in parallel to all four input embeddings.
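To make the role of RoPE concrete, the sketch below rotates consecutive feature pairs of each residue vector by a position-dependent angle. It is an illustrative implementation under assumptions: the pairing of adjacent features (2i, 2i+1) and the base of 10000 follow a common formulation of RoPE and may differ from the layout used in ProtAttBA; the function name is hypothetical.

```python
import math

def rope(X, base=10000.0):
    """Apply rotary position embedding to an L x d matrix (d must be even).

    The vector at position m has each feature pair (2i, 2i+1) rotated by the
    angle m * base**(-2i/d).
    """
    d = len(X[0])
    if d % 2:
        raise ValueError("feature dimension must be even")
    out = []
    for m, row in enumerate(X):
        rotated = []
        for i in range(d // 2):
            angle = m * base ** (-2 * i / d)
            x, y = row[2 * i], row[2 * i + 1]
            rotated += [x * math.cos(angle) - y * math.sin(angle),
                        x * math.sin(angle) + y * math.cos(angle)]
        out.append(rotated)
    return out

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# The same query/key vectors placed at several positions (toy values).
q, k = [1.0, 2.0, 3.0, 4.0], [0.5, -1.0, 2.0, 0.0]
Q, K = rope([q] * 6), rope([k] * 6)
# Scores depend only on the relative offset between positions (here 2).
print(abs(dot(Q[1], K[3]) - dot(Q[2], K[4])) < 1e-9)  # True
```

Because the rotation preserves vector norms and makes the query-key product a function of the position offset, attention scores become sensitive to how far apart two residues are in sequence.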

The next step computes cross-attention for antibody-antigen pairs. The calculation is performed separately for the wild-type and mutated pairs. For each pair, a symmetric operation is applied to the respective antibody and antigen components. The overall cross-attention mechanism is defined as follows:

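$$\mathrm{Attn}(H_{ab}) = \mathrm{softmax}\left(\frac{Q_{ab}(K_{ag})^{\top}}{\sqrt{d}}\right) V_{ag}, \tag{4}$$
$$\mathrm{Attn}(H_{ag}) = \mathrm{softmax}\left(\frac{Q_{ag}(K_{ab})^{\top}}{\sqrt{d}}\right) V_{ab}. \tag{5}$$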

Here, d denotes the projection dimension associated with K and Q. By integrating this symmetric cross-attention mechanism, the model facilitates effective communication between antibody and antigen sequences, thus capturing interactions in antibody-antigen complexes more comprehensively. A multi-head attention is applied to capture a rich pattern representation:

$$H^{o} = W_o\,\mathrm{concat}([\mathrm{Attn}(H_1), \mathrm{Attn}(H_2), \ldots, \mathrm{Attn}(H_N)]). \tag{6}$$

Here, $H_i$ denotes the vector representation of the $i$th attention head, and $W_o$ represents the learnable linear projection matrix. As before, this procedure is applied to all four protein representations in parallel.
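The symmetric cross-attention of Equations (4) and (5) can be illustrated for a single head as follows. This sketch assumes the query, key, and value matrices are already given, so it omits the learned projections, RoPE, and the multi-head concatenation of Equation (6); the function name and the toy matrices are hypothetical.

```python
import math

def softmax(values):
    """Numerically stable softmax over a list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(Q, K, V):
    """Compute softmax(Q K^T / sqrt(d)) V for one attention head.

    Q: L_q x d queries from one chain.
    K, V: L_k x d keys and values from the partner chain.
    Returns (output, weights): the L_q x d output and L_q x L_k attention weights.
    """
    d = len(Q[0])
    weights = []
    for q in Q:
        logits = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in K]
        weights.append(softmax(logits))
    out = [[sum(w * v[j] for w, v in zip(row, V)) for j in range(len(V[0]))]
           for row in weights]
    return out, weights

# Toy antibody (3 residues) and antigen (2 residues) with d = 2; values are made up.
Q_ab = K_ab = V_ab = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
Q_ag = K_ag = V_ag = [[2.0, 0.0], [0.0, 2.0]]

attn_ab, w_ab = cross_attention(Q_ab, K_ag, V_ag)  # Eq. (4): antibody attends to antigen
attn_ag, w_ag = cross_attention(Q_ag, K_ab, V_ab)  # Eq. (5): antigen attends to antibody
print(len(w_ab), len(w_ab[0]))  # 3 2
```

Each row of the weight matrix sums to one and records how strongly one residue of the querying chain attends to every residue of the partner chain.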

Convolutional pooling: The transformed representations $H^{o}$ undergo a convolutional pooling and a weighted summation to derive the vector representation of each antibody or antigen:

$$f = \sum_{l=1}^{L} \Big(\mathrm{softmax}\big(W_c\,\mathrm{LayerNorm}(H^{o}_{l})\big) \odot \mathrm{LayerNorm}(H^{o}_{l})\Big), \tag{7}$$

where $H^{o}_{l}$ denotes the $l$th column in $H^{o}$, i.e., the $l$th position of the protein. The same operation applies to the matrix representations of all four proteins in (6). After the final pooling step in (7), we obtain four protein-level vector representations $\{f_{ab}^{wt}, f_{ag}^{wt}, f_{ab}^{mt}, f_{ag}^{mt}\}$, which are passed to the prediction module.

2.2.3 Prediction module

The final prediction module integrates the joint representations of wild-type and mutant complexes to predict the binding affinity changes through fully connected layers. Based on the output of the previous step, the joint vector representation $f$ is obtained by summing the antibody and antigen features within each complex and concatenating the wild-type and mutant results, $f = \mathrm{concat}(f_{ab}^{wt} + f_{ag}^{wt},\, f_{ab}^{mt} + f_{ag}^{mt})$. The representation $f$ is then passed through three fully connected layers to predict ΔΔG induced by the mutation, i.e.,

$$\hat{y} = W_3 \cdot \mathrm{Tanh}\big(W_2 \cdot \mathrm{ReLU}(\mathrm{dropout}(W_1 \cdot f))\big), \tag{8}$$

where $\{W_1, W_2, W_3\}$ are learnable parameters, Tanh(·) and ReLU(·) are activation functions, dropout(·) denotes the dropout operation, and $\hat{y}$ is the final numerical prediction, which, in our case, is ΔΔG of the antigen-antibody complex before and after the mutation.
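The feature fusion and the regression head of Equation (8) can be traced with a small inference-mode sketch. It assumes bias-free layers as written in Equation (8) and treats dropout as the identity, which is its behavior at inference time; the function names and the toy weights are hypothetical and chosen only so the arithmetic is easy to follow.

```python
import math

def matvec(W, x):
    """Multiply matrix W (rows x cols) by vector x (length cols)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def predict_ddg(f_ab_wt, f_ag_wt, f_ab_mt, f_ag_mt, W1, W2, W3):
    """Inference-mode sketch of Eq. (8); dropout acts as the identity here."""
    wt = [a + g for a, g in zip(f_ab_wt, f_ag_wt)]  # wild-type complex feature
    mt = [a + g for a, g in zip(f_ab_mt, f_ag_mt)]  # mutant complex feature
    f = wt + mt                                     # concatenation of the two
    h1 = [max(0.0, v) for v in matvec(W1, f)]       # ReLU(W1 f)
    h2 = [math.tanh(v) for v in matvec(W2, h1)]     # Tanh(W2 h1)
    return matvec(W3, h2)[0]                        # scalar prediction

# Toy weights for 2-dimensional pooled features (so f has length 4).
W1 = [[1.0, 0.0, -1.0, 0.0], [0.0, 1.0, 0.0, -1.0]]  # wild-type minus mutant
W2 = [[1.0, 0.0], [0.0, 1.0]]
W3 = [[2.0, 0.0]]

y_hat = predict_ddg([1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [0.0, 1.0], W1, W2, W3)
print(abs(y_hat - 2 * math.tanh(1.0)) < 1e-12)  # True
```

With these particular toy weights, identical wild-type and mutant features give a prediction of zero; a trained model does not guarantee this property, since its weights are learned freely.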

3 Results

3.1 Training and evaluation protocol

Our model was trained and evaluated on the three open benchmarks introduced in Section 2.1 Datasets. For efficiency, we opted to freeze the pre-trained embedding module. The details of the four employed pre-trained models can be found in Table 2, available as supplementary data at Bioinformatics online. Model optimization was performed using AdamW (Loshchilov and Hutter 2019) with a learning rate of $3 \times 10^{-5}$. The optimization objective was to minimize the mean squared error (MSE) between the predictions and the ground truth values. Early stopping was used to avoid overfitting. All experiments were conducted on a single NVIDIA RTX-3090 GPU, and the program was based on PyTorch-2.1.2.
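The early-stopping rule mentioned above can be sketched as a small helper that monitors the validation loss. The paper does not report the patience or the monitored quantity, so the patience of 2, the `min_delta` parameter, the class name, and the loss values below are all assumptions for illustration.

```python
class EarlyStopping:
    """Signal a stop when validation loss has not improved for `patience` epochs."""

    def __init__(self, patience: int = 5, min_delta: float = 0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss: float) -> bool:
        """Record one epoch's validation loss; return True if training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience

# Made-up validation MSE curve: improves for three epochs, then plateaus.
stopper = EarlyStopping(patience=2)
history = [2.0, 1.5, 1.4, 1.45, 1.41, 1.39]
stopped_at = next((epoch for epoch, loss in enumerate(history) if stopper.step(loss)), None)
print(stopped_at)  # 4
```

In a full training loop the model weights from the best epoch (here epoch 2, with loss 1.4) would typically be restored before evaluation.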

Table 2.

Performance comparison on three open benchmarks with sequence identity split and mutation depth split. (a)

| Model | AB645 RMSE | AB645 PCC | AB645 ρ | S1131 RMSE | S1131 PCC | S1131 ρ | AB1101 RMSE | AB1101 PCC | AB1101 ρ | AB1101-MutDepth RMSE | AB1101-MutDepth PCC | AB1101-MutDepth ρ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| RF Regressor | 1.99 | −0.05 | −0.40 | 1.96 | 0.61 | 0.65 | 2.35 | 0.21 | 0.47 | 2.39 | 0.46 | 0.30 |
| GB Regressor | 5.10 | −0.32 | −0.37 | 2.02 | 0.49 | 0.56 | 3.41 | −0.12 | 0.09 | 5.57 | −0.01 | 0.02 |
| AttABseq | 1.34 | 0.26 | 0.29 | 2.33 | 0.05 | 0.11 | 2.46 | 0.04 | −0.01 | 2.64 | 0.13 | 0.20 |
| FoldX-PDB | 1.61 | 0.41 | 0.35 | 1.89 | 0.64 | 0.61 | 3.86 | 0.41 | 0.48 | 4.23 | 0.34 | 0.27 |
| FoldX-AF2 | 2.38 | 0.16 | 0.15 | 2.03 | 0.61 | 0.60 | 5.06 | 0.29 | 0.31 | 4.83 | 0.22 | 0.09 |
| FoldX-ESM | 1.87 | 0.11 | 0.08 | 2.64 | 0.02 | 0.01 | 4.96 | 0.27 | 0.24 | 4.42 | 0.21 | 0.10 |
| DDGPred-PDB | 2.26 | 0.08 | 0.01 | 1.51 | 0.76 | 0.76 | 2.73 | 0.13 | 0.37 | 3.01 | 0.50 | 0.33 |
| DDGPred-AF2 | 3.04 | −0.09 | −0.25 | 6.60 | 0.50 | 0.59 | 2.85 | −0.12 | 0.02 | 6.38 | 0.27 | 0.35 |
| DDGPred-ESM | 2.07 | −0.04 | 0.01 | 2.11 | 0.51 | 0.54 | 2.82 | −0.02 | 0.07 | 6.60 | 0.26 | 0.26 |
| ProtAttBA-ESM2 | 1.44 | 0.41 | 0.49 | 1.70 | 0.72 | 0.77 | 2.11 | 0.43 | 0.39 | 2.10 | 0.55 | 0.45 |

(a) Bold and underline indicate the best and second-best results, respectively.

For evaluation, we adopted a comprehensive assessment protocol with three split strategies: (i) K-fold cross-validation: we followed the widely used protocol established by Jin et al. (2024), comparing average validation performance via 10-fold cross-validation for AB645 and S1131, and 5-fold cross-validation for AB1101. (ii) Sequence identity split: sequences were clustered using MMseqs2 at 30% identity and then divided into training and test sets with an 8:2 ratio. This setup evaluates the model’s extrapolation to sequences with low homology. (iii) Mutation depth split: the AB1101 dataset was split by mutation order, using single-point mutations for training and multi-point mutations for testing, to assess the model’s ability to generalize from low- to high-order mutational effects. In the subsequent analysis, we name this test case AB1101-MutDepth. For all three split settings, we assess the model performance with root mean square error (RMSE), coefficient of determination (R²), Pearson correlation coefficient (PCC), and Spearman coefficient (ρ).
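For reference, the four evaluation metrics can be computed with the standard definitions below. This is a plain-Python sketch rather than the evaluation code used in the study; the function names and the toy ΔΔG values are hypothetical, and Spearman's ρ is computed as the Pearson correlation of average ranks.

```python
import math

def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def pearson(x, y):
    """Pearson correlation coefficient."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def ranks(values):
    """Average 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result

def spearman(x, y):
    """Spearman rank correlation as the Pearson correlation of ranks."""
    return pearson(ranks(x), ranks(y))

# Made-up measured and predicted ddG values (kcal/mol).
y_true = [0.5, -1.0, 2.0, 1.0]
y_pred = [0.4, -0.5, 1.5, 1.2]
print(round(rmse(y_true, y_pred), 3), round(r2(y_true, y_pred), 3))
print(round(pearson(y_true, y_pred), 3), round(spearman(y_true, y_pred), 3))
```

Note that R² defined this way becomes negative whenever a model's squared error exceeds that of always predicting the mean of the ground truth, which is why negative R² values can appear for some baselines.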

3.2 Numerical comparison with baseline models

We compared the performance of ProtAttBA with a range of sequence- and structure-based baseline models to evaluate the prediction accuracy and robustness. Sequence-based methods include: Random Forest Regressor (RF Regressor), Gradient Boosting Regressor (GB Regressor), DeepPE-PPI (Yao et al. 2019), LSTM-PHV (Tsukiyama et al. 2021), PIPR (Chen et al. 2019), TransPPI (Yang et al. 2021), and AttABseq (Jin et al. 2024). Structure-based methods include: BeAtMuSiC (Dehouck et al. 2013), FoldX (Schymkowitz et al. 2005), and DDGPred (Shan et al. 2022).

The performance comparison under the first cross-validation setup is presented in Table 1, where we report the average and standard deviation across four evaluation metrics. ProtAttBA outperforms every baseline except DDGPred with crystal structures (DDGPred-PDB), which scores higher on AB645 and S1131. The advantage is most pronounced on the AB1101 dataset, which includes both single-site and multi-site mutation cases and on which all four ProtAttBA variants achieve better results than every competing model. The strong performance across four different pre-trained protein language models (ESM2, ESM-1b, ProtBert, and Ankh), as shown in the last four rows of the table, highlights the flexibility and compatibility of our framework. Among them, the ESM2-based variant achieves the highest overall performance. This may be attributed to ESM2’s capacity to implicitly capture evolutionary patterns in protein sequences, offering insights that are typically derived from structural data without relying on potentially unreliable predicted structures.

Table 1.

Performance comparison on three open benchmarks with K-fold validation. (a)

| Model | AB645 RMSE | AB645 R² | AB645 PCC | AB645 ρ | S1131 RMSE | S1131 R² | S1131 PCC | S1131 ρ | AB1101 RMSE | AB1101 R² | AB1101 PCC | AB1101 ρ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Sequence-based Methods | | | | | | | | | | | | |
| DeepEP-PPI (b) | | 0.09 | 0.41 | | | 0.03 | 0.21 | | | 0.28 | 0.54 | |
| LSTM-PHV (b) | | 0.07 | 0.17 | | | 0.19 | 0.39 | | | 0.05 | 0.16 | |
| PIPR (b) | | 0.10 | 0.20 | | | 0.21 | 0.33 | | | 0.19 | 0.37 | |
| TransPPI (b) | | 0.07 | 0.18 | | | 0.19 | 0.38 | | | 0.12 | 0.22 | |
| AttABseq (b) | 1.75 | 0.17 | 0.44 | | 1.82 | 0.37 | 0.66 | | 1.72 | 0.34 | 0.59 | |
| Structure-based Methods | | | | | | | | | | | | |
| BeAtMuSiC | 1.98±0.23 | −0.03±0.11 | 0.26±0.13 | 0.38±0.08 | 2.37±0.41 | 0.05±0.10 | 0.29±0.13 | 0.36±0.09 | | | | |
| FoldX-PDB | 2.51±0.78 | −0.83±1.11 | 0.31±0.20 | 0.29±0.12 | 2.65±0.50 | −0.22±0.36 | 0.43±0.08 | 0.47±0.09 | 3.40±0.48 | −1.66±0.78 | 0.28±0.11 | 0.26±0.14 |
| FoldX-AF2 | 3.04±1.41 | −2.02±3.33 | 0.13±0.14 | 0.14±0.11 | 3.14±0.69 | −0.85±0.94 | 0.39±0.08 | 0.49±0.08 | 3.96±0.79 | −2.65±1.27 | 0.14±0.07 | 0.08±0.05 |
| FoldX-ESM | 3.32±1.34 | −2.27±2.58 | 0.04±0.13 | 0.05±0.13 | 2.72±0.32 | −0.28±0.12 | 0.08±0.07 | 0.09±0.07 | 3.89±0.96 | −2.49±1.49 | 0.09±0.08 | 0.04±0.06 |
| DDGPred-PDB | 1.69±0.51 | 0.25±0.23 | 0.54±0.16 | 0.62±0.11 | 0.95±0.13 | 0.84±0.04 | 0.92±0.02 | 0.85±0.02 | 1.79±0.16 | 0.28±0.03 | 0.59±0.02 | 0.53±0.02 |
| DDGPred-AF2 | 2.19±0.29 | −0.34±0.33 | 0.21±0.13 | 0.23±0.14 | 1.63±0.13 | 0.52±0.14 | 0.76±0.07 | 0.60±0.08 | 2.37±0.11 | −0.29±0.18 | 0.17±0.06 | 0.13±0.04 |
| DDGPred-ESM | 2.01±0.48 | −0.08±0.17 | 0.37±0.19 | 0.43±0.12 | 2.39±0.30 | 0.02±0.11 | 0.39±0.11 | 0.37±0.08 | 2.04±0.23 | 0.06±0.05 | 0.48±0.02 | 0.43±0.05 |
| Ours | | | | | | | | | | | | |
| ProtAttBA-ESM2 | 1.70±0.25 | 0.20±0.09 | 0.47±0.11 | 0.48±0.12 | 1.31±0.09 | 0.69±0.09 | 0.84±0.05 | 0.75±0.06 | 1.61±0.07 | 0.42±0.06 | 0.65±0.04 | 0.63±0.04 |
| ProtAttBA-ESM1b | 1.71±0.26 | 0.19±0.11 | 0.47±0.10 | 0.47±0.11 | 1.36±0.14 | 0.65±0.12 | 0.82±0.06 | 0.70±0.07 | 1.62±0.16 | 0.41±0.11 | 0.64±0.08 | 0.63±0.09 |
| ProtAttBA-Ankh | 1.72±0.27 | 0.18±0.13 | 0.46±0.11 | 0.47±0.10 | 1.29±0.12 | 0.69±0.11 | 0.84±0.06 | 0.76±0.06 | 1.52±0.09 | 0.48±0.06 | 0.69±0.04 | 0.66±0.03 |
| ProtAttBA-ProtBert | 1.73±0.28 | 0.18±0.11 | 0.47±0.09 | 0.49±0.13 | 1.37±0.13 | 0.64±0.16 | 0.81±0.10 | 0.71±0.09 | 1.62±0.07 | 0.41±0.07 | 0.65±0.05 | 0.62±0.04 |

(a) Bold and underline indicate the best and second-best results, respectively.

(b) Results are obtained from Jin et al. (2024).

Notably, while the two structure-based methods, FoldX and DDGPred, achieve promising performance in some cases, their effectiveness is highly sensitive to input structures. PDB structures are generally considered the most accurate, followed by predicted structures from models like AlphaFold2 (Jumper et al. 2021) and ESMFold (Townshend et al. 2019). However, most proteins lack experimentally determined crystal structures. As a result, structure prediction methods like AlphaFold2 have become mainstream in protein property prediction (Zhou et al. 2024b; Li et al. 2025). Therefore, it is crucial for structure-based methods to be robust to input quality and still provide reliable predictions when crystal structures are unavailable. In Table 1, we differentiate the performance of FoldX and DDGPred based on structure sources using the suffixes PDB/AF2/ESM, indicating crystal structures (from the dataset), AlphaFold2 predictions (via ColabFold), and ESMFold predictions, respectively. As shown in the table, although these methods perform well on crystal structures (e.g., DDGPred-PDB ranks top on the first two datasets), their performance declines sharply with predicted structures. This observation aligns with our discussion in the introduction: while structure-based models are often considered the first choice for binding-related tasks, they may be less robust than sequence-based methods in real-world applications.

The effectiveness of ProtAttBA’s attention module (introduced in Section 2.2.2) is evaluated and reported in Fig. 2. We first directly remove attention layers and, second, replace them with a Multi-Layer Perceptron (MLP) of comparable parameter count, while keeping all other architectural components and hyperparameters unchanged. As observed in the figure, both the MLP-based version and the model with directly removed attention layers exhibited a consistent performance degradation. This indicates that the attention module is crucial for enhancing the model’s representation learning capabilities. This can be attributed to its ability to capture the complex interactions between antibodies and antigens, which are essential for predicting antibody-antigen binding affinity.

Figure 2.


Ablative comparison of ProtAttBA by PCC and R² performance on the three benchmark datasets.

We further evaluated the models’ extrapolation capabilities using the sequence identity and mutation depth splits. Table 2 presents the performance comparison under these two evaluation protocols. We exclude the comparison of R² scores in extrapolation settings, as this metric becomes less reliable when the variance in the ground truth values increases and the prediction errors are amplified. Under the sequence identity split, most deep learning-based methods experienced a noticeable decline in performance compared to their results in the cross-validation setup shown in Table 1. FoldX exhibited relatively stable and competitive performance, which can be attributed to its physics-based energy calculations that do not directly rely on learned sequence patterns from the training data. However, its strong dependence on structural accuracy remains a limitation. As observed in Table 2, FoldX’s performance fluctuates when different types of structural inputs are used, for example on AB645. Meanwhile, the DDGPred variants suffered a more substantial performance drop, likely due to greater structural dissimilarity between training and testing sets after stringent sequence-based clustering. This underscores their sensitivity to both training data distribution and structural input quality. In contrast, ProtAttBA maintained relatively stable and reliable performance across different splits, suggesting that our framework generalizes more effectively to unseen proteins and is more robust to distributional shifts.

In the last four columns of Table 2, we present the performance comparison under the mutation depth split on AB1101, the only dataset containing multi-site mutations. This evaluation aims to examine ProtAttBA’s capability in handling complex mutational effects and its ability to generalize from simpler to more complex mutations. Overall, baseline models did not show improved performance on high-depth mutations compared to the sequence identity split. In fact, several models experienced a significant performance drop, such as FoldX, especially when relying on less accurate predicted structures as input. In contrast, ProtAttBA consistently maintained strong predictive accuracy and outperformed all baseline methods under this more challenging extrapolation setting.

In summary, across diverse evaluation scenarios we examined, ProtAttBA consistently outperformed baseline methods. The performance variability observed in structure-dependent approaches under different structure qualities and data splits highlights the robustness and broader applicability of our sequence-only framework. These results underscore ProtAttBA’s strong generalization capability and its potential as a reliable tool to predict mutation-induced changes in binding affinity, particularly in settings where structural data is unavailable or inconsistent, or when evaluating mutations in proteins with low sequence similarity.

3.3 Model interpretability with attention scores

An advantage of ProtAttBA is its ability to provide residue-level analysis of proteins, visualizing the impact of residue mutations on prediction outcomes. We randomly selected two complexes from the AB-bind and S1131 datasets for analysis as a case study, employing visualization techniques to examine the distribution of attention weights, focusing on mutation sites.

Figure 3a–d illustrates the hydrogen bond network alterations at mutation sites for two complexes, pre- and post-mutation. Figure 3a and b depict the arginine (R) to glutamine (Q) mutation at position 53 of the antibody chain in 1IAR, while Fig. 3c and d show the serine (S) to alanine (A) mutation at position 91 of the antibody chain in 1DQJ. Both mutants exhibit a marked reduction in hydrogen bonds post-mutation, potentially leading to changes in antibody-antigen binding affinity. The visualization of attention weights in Fig. 3e and f reveals that ProtAttBA identifies key amino acid positions within the hydrogen bond network. For the 1IAR position 53 mutation, strong interactions are observed with phenylalanine-41 (F-41) and leucine-42 (L-42) of the antigen chain. Similarly, for the 1DQJ position 91 mutation, significant interactions are noted with arginine-21 (R-21) and glycine-22 (G-22) of the antigen chain. These findings support the reliability of the attention-enhanced model in predicting affinity changes induced by point mutations.

Figure 3.


Illustrative examples of protein structure visualization for interpretability analysis. Panels a and b, respectively, depict localized structural views of the mutation site in the 1IAR (Chain A, Arginine (R) at position 53 mutated to Glutamine (Q)) before and after mutation. Panels c and d show similar localized views for the 1DQJ complex (Chain A, Serine (S) at position 91 mutated to Alanine (A)). The antigen chain is highlighted in green. Panels e and f illustrate the attention weight matrices learned by the model, corresponding to the mutations shown in panels a-b and c-d, respectively. The cooler colors (tending towards blue) indicate regions where the model assigns higher importance.

4 Conclusion

This study addresses the critical task of predicting binding affinity changes in antibody–antigen complexes upon mutation, which plays a central role in antibody engineering. We introduce ProtAttBA, a novel deep learning framework that combines pre-trained protein language models and attention mechanisms to model protein features and interaction contexts from sequences alone. Unlike traditional structure-based methods, ProtAttBA avoids reliance on structural inputs, which are often unavailable or of uncertain quality, thus enhancing its robustness and real-world applicability.

We conducted a comprehensive evaluation of ProtAttBA under various experimental settings, including standard cross-validation, sequence identity-based splits, and mutation depth-based extrapolation. While structure-based methods demonstrate strong performance when high-quality crystal structures are available, their predictive accuracy drops significantly when fed predicted structures from folding models such as AlphaFold2 or ESMFold. This sensitivity to structural inputs limits their reliability in practical scenarios where such ideal data is rarely available (especially for antibodies and protein complexes). In contrast, ProtAttBA maintained stable performance across all settings, demonstrating strong generalization capabilities even in extrapolative scenarios, such as predicting the effects of multi-site mutations or mutations in proteins with low sequence similarity to the training data. Moreover, ProtAttBA offers potential interpretability with residue-level attention scores, allowing users to identify amino acid positions with a strong influence on binding affinity changes. These attention patterns show promising alignment with known functional sites, providing mechanistic insights and enhancing the model’s transparency.

While this study emphasizes the strengths of sequence-based approaches, we do not discount the value of structural modeling. Structure-based methods remain essential, particularly for tasks where spatial configuration plays a dominant role. However, our findings highlight the need for more robust integration of structural, sequence, and evolutionary information to mitigate sensitivity to imperfect structure inputs. Future research could explore hybrid models that incorporate predicted or partial structural features, binding site annotations, or contrastive learning strategies focused on antibody–antigen interfaces to further enhance model reliability.

Contributor Information

Chen Liu, School of Information Science and Engineering, East China University of Science and Technology, Shanghai 200237, China.

Mingchen Li, School of Information Science and Engineering, East China University of Science and Technology, Shanghai 200237, China.

Yang Tan, School of Information Science and Engineering, East China University of Science and Technology, Shanghai 200237, China.

Wenrui Gou, School of Information Science and Engineering, East China University of Science and Technology, Shanghai 200237, China.

Guisheng Fan, School of Information Science and Engineering, East China University of Science and Technology, Shanghai 200237, China.

Bingxin Zhou, Institute of Natural Sciences, Shanghai Jiao Tong University, Shanghai 200240, China.

Author contributions

Chen Liu (conceptualization [equal], data curation [lead], formal analysis [equal], methodology [lead], validation [equal], visualization [equal], writing—original draft [equal]), Yang Tan (methodology [supporting], writing—review & editing [supporting]), Mingchen Li (writing—review & editing [supporting]), Wenrui Gou (data curation [equal]), Guisheng Fan (funding acquisition [supporting], supervision [supporting]), and Bingxin Zhou (conceptualization [equal], formal analysis [equal], funding acquisition [lead], investigation [lead], methodology [supporting], project administration [lead], resources [lead], supervision [lead], visualization [equal], writing—original draft [equal], writing—review & editing [lead])

Supplementary data

Supplementary data are available at Bioinformatics online.

Conflict of interest: No conflict of interest is declared.

Funding

This work was supported by grants from the National Science Foundation of China (grant number: 62302291).

Data availability

The data underlying this article are available in the GitHub repository ProtAttBA, at https://github.com/code4luck/ProtAttBA.

References

  1. Bai Y, Mei J, Yuille AL et al. Are transformers more robust than CNNs? Adv Neural Info Proc Syst 2021;34:26831–843.
  2. Beck A, Wurch T, Bailly C et al. Strategies and challenges for the next generation of therapeutic antibodies. Nat Rev Immunol 2010;10:345–52.
  3. Brustad EM, Arnold FH. Optimizing non-natural protein function with directed evolution. Curr Opin Chem Biol 2011;15:201–10.
  4. Chen M, Ju CJ-T, Zhou G et al. Multifaceted protein–protein interaction prediction based on siamese residual RCNN. Bioinformatics 2019;35:i305–14.
  5. Cuturello F, Celoria M, Ansuini A et al. Enhancing predictions of protein stability changes induced by single mutations using MSA-based language models. Bioinformatics 2024;40:btae447.
  6. Dehouck Y, Kwasigroch JM, Rooman M et al. Beatmusic: prediction of changes in protein–protein binding affinity on mutations. Nucleic Acids Res 2013;41:W333–9.
  7. Elnaggar A, Essam H, Salah-Eldin W et al. Ankh: optimized protein language model unlocks general-purpose modelling. arXiv, 2023, preprint: not peer reviewed.
  8. Elnaggar A, Heinzinger M, Dallago C et al. ProtTrans: toward understanding the language of life through self-supervised learning. IEEE Trans Pattern Anal Mach Intell 2022;44:7112–27.
  9. Hsu C, Verkuil R, Liu J et al. Learning inverse folding from millions of predicted structures. In: International Conference on Machine Learning. Baltimore, USA: PMLR, 2022, 8946–70.
  10. Huang J, Sun C, Li M et al. Structure-inclusive similarity based directed gnn: a method that can control information flow to predict drug–target binding affinity. Bioinformatics 2024;40:btae563.
  11. Jin R, Ye Q, Wang J et al. AttABseq: an attention-based deep learning prediction method for antigen–antibody binding affinity changes based on protein sequences. Brief Bioinform 2024;25:bbae304.
  12. Jumper J, Evans R, Pritzel A et al. Highly accurate protein structure prediction with alphafold. Nature 2021;596:583–9.
  13. Kouba P, Kohout P, Haddadi F et al. Machine learning-guided protein engineering. ACS Catal 2023;13:13863–95.
  14. Li M, Shi Y, Hu S et al. MVSF-AB: accurate antibody-antigen binding affinity prediction via multi-view sequence feature learning. Bioinformatics 2024a;41:btae579.
  15. Li M, Tan Y, Ma X et al. ProSST: protein language modeling with quantized structure and disentangled attention. In: The Thirty-eighth Annual Conference on Neural Information Processing Systems. Vancouver, Canada, 2024b.
  16. Li S, Tan Y, Ke S et al. Immunogenicity prediction with dual attention enables vaccine target selection. In: The Thirteenth International Conference on Learning Representations. Singapore, 2025.
  17. Lin Z, Akin H, Rao R et al. Evolutionary-scale prediction of atomic-level protein structure with a language model. Science 2023;379:1123–30.
  18. Liu X, Luo Y, Li P et al. Deep geometric representations for modeling effects of mutations on protein-protein binding affinity. PLoS Comput Biol 2021;17:e1009284.
  19. Loshchilov I, Hutter F. Decoupled weight decay regularization. In: International Conference on Learning Representations. New Orleans, United States, 2019.
  20. Misra M, Jeffy J, Liao C et al. Hiresist: a database of HIV-1 resistance to broadly neutralizing antibodies. Bioinformatics 2024;40:btae103.
  21. Moal IH, Fernández-Recio J. Skempi: a structural kinetic and energetic database of mutant protein interactions and its use in empirical models. Bioinformatics 2012;28:2600–7.
  22. Pires DE, Ascher DB. MCSM-AB: a web server for predicting antibody–antigen affinity changes upon mutation with graph-based signatures. Nucleic Acids Res 2016;44:W469–73.
  23. Rana MM, Nguyen DD. Geometric graph learning to predict changes in binding free energy and protein thermodynamic stability upon mutation. J Phys Chem Lett 2023;14:10870–9.
  24. Rives A, Meier J, Sercu T et al. Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences. Proc Natl Acad Sci USA 2021;118:e2016239118.
  25. Schymkowitz J, Borg J, Stricher F et al. The foldx web server: an online force field. Nucleic Acids Res 2005;33:W382–8.
  26. Shan S, Luo S, Yang Z et al. Deep learning guided optimization of human antibody against sars-cov-2 variants with broad neutralization. Proc Natl Acad Sci USA 2022;119:e2122954119.
  27. Sirin S, Apgar JR, Bennett EM et al. Ab-bind: antibody binding mutational database for computational affinity predictions. Protein Sci 2016;25:393–409.
  28. Su J, Ahmed M, Lu Y et al. Roformer: enhanced transformer with rotary position embedding. Neurocomputing 2024;568:127063.
  29. Tan Y, Wang R, Wu B et al. From high-throughput evaluation to wet-lab studies: advancing mutation effect prediction with a retrieval-enhanced model. Bioinformatics 2025;41:i401–i409.
  30. Townshend R, Bedi R, Suriana P et al. End-to-end learning on 3D protein structure for interface prediction. In: Advances in Neural Information Processing Systems. Vancouver, Canada, 2019.
  31. Tsukiyama S, Hasan MM, Fujii S et al. Lstm-phv: prediction of human-virus protein–protein interactions by lstm with word2vec. Brief Bioinform 2021;22:bbab228.
  32. Wang G, Liu X, Wang K et al. Deep-learning-enabled protein–protein interaction analysis for prediction of SARS-CoV-2 infectivity and variant evolution. Nat Med 2023;29:2007–18.
  33. Wang M, Cang Z, Wei G-W. A topology-based network tree for the prediction of protein–protein binding affinity changes following mutation. Nat Mach Intell 2020;2:116–23.
  34. Xiao Y, Sun E, Jin Y et al. Proteingpt: multimodal llm for protein property prediction and structure understanding. In: ICLR 2025 Workshop on Machine Learning for Genomics Explorations. Singapore, 2024.
  35. Xiong P, Zhang C, Zheng W et al. Bindprofx: assessing mutation-induced binding affinity change by protein interface profiles with pseudo-counts. J Mol Biol 2017;429:426–34.
  36. Xu X, Xu C, He W et al. HELM-GPT: de novo macrocyclic peptide design using generative pre-trained transformer. Bioinformatics 2024;40:btae364.
  37. Yang X, Yang S, Lian X et al. Transfer learning via multi-scale convolutional neural layers for human–virus protein–protein interaction prediction. Bioinformatics 2021;37:4771–8.
  38. Yao Y, Du X, Diao Y et al. An integration of deep learning with feature embedding for protein–protein interaction prediction. PeerJ 2019;7:e7126.
  39. Yu G, Zhao Q, Bi X et al. Ddaffinity: predicting the changes in binding affinity of multiple point mutations using protein 3D structure. Bioinformatics 2024;40:i418–i427.
  40. Zhang J, Wu Q, Liu Z et al. Spike-specific circulating t follicular helper cell and cross-neutralizing antibody responses in COVID-19-convalescent individuals. Nat Microbiol 2021;6:51–8.
  41. Zhou B, Tan Y, Hu Y et al. Protein engineering in the deep learning era. mLife 2024a;3:477–91.
  42. Zhou B, Zheng L, Wu B et al. A conditional protein diffusion model generates artificial programmable endonuclease sequences with enhanced activity. Cell Discov 2024b;10:95.
  43. Zhou B, Zheng L, Wu B et al. Protein engineering with lightweight graph denoising neural networks. J Chem Inf Model 2024c;64:3650–61.


Articles from Bioinformatics are provided here courtesy of Oxford University Press
