Abstract
Protein–protein interactions (PPIs) are central to cellular signaling and regulation, and their dysregulation underlies many diseases. Predicting the impact of mutations on PPI stability, quantified as ΔΔG, is essential for understanding disease mechanisms and guiding protein engineering. Here, we first present MutPPI, a graph-based deep-learning model that encodes full-residue structural features of protein–protein complexes and employs a shared GIN-GAT feature extractor for wild-type and mutant complexes. MutPPI outperforms 12 existing methods on an antibody–antigen single-point mutation dataset (S645). By integrating evolutionary information from protein language models, we further develop MutPPI-plus, achieving enhanced predictive performance. Second, we propose a mutation-path-based data augmentation strategy, which enriches input modalities and improves generalization of both MutPPI and MutPPI-plus. After data augmentation, MutPPI-plus demonstrates state-of-the-art performance on S645 and three additional multi-point mutation datasets (SM_ZEMu, SM595, SM1124), substantially surpassing DDMut-PPI. Our analyses highlight the benefits of the multimodal framework and the physically informed data augmentation method. Together, these results provide a versatile computational tool for accurate ΔΔG prediction, advancing rational protein design.
Keywords: protein–protein interaction, binding free energy change, mutation, deep learning, data augmentation
Introduction
Protein–protein interactions (PPIs) underpin a wide range of biological processes, including signal transduction and metabolic regulation, and their dysregulation is frequently implicated in disease [1]. Although many disease-associated proteins were once considered ‘undruggable’ targets, advances over the past decades have increasingly rendered them tractable to therapeutic intervention, with PPI-focused studies playing a pivotal role in driving this progress [1–3]. A key parameter describing the strength of PPIs is binding affinity, often expressed as the dissociation constant (Kd) or the binding free energy (ΔG). Mutations within protein–protein complexes can perturb ΔG, and the resulting change—denoted as ΔΔG—quantifies the effect of mutations on binding stability. Accurate prediction of ΔΔG is therefore critical for elucidating how mutations rewire protein function and interaction networks, thereby offering valuable insights into disease mechanisms and guiding both drug discovery and protein engineering [4, 5].
In recent years, a surge of pioneering studies has highlighted the rapid progress of protein design powered by deep learning, sparking growing interest in ΔΔG prediction. Breakthroughs in protein structure prediction, exemplified by AlphaFold2 [6] and RoseTTAFold [7], have ushered protein science into a new era. For instance, Watson et al. expanded upon RoseTTAFold to develop RFdiffusion, a framework that facilitates the design of monomers, binders, and symmetric oligomers [8]. Building upon these advances in protein design, predictive models of biophysical attributes (e.g. ΔG and ΔΔG) can further offer a complementary perspective, guiding mutation and optimization strategies to refine de novo protein candidates toward specific design objectives [9–14].
Current computational strategies for predicting ΔΔG can be broadly classified into physics-based simulations, traditional machine-learning approaches, and deep-learning methods [4, 15]. Physics-based simulations estimate binding free-energy changes directly from molecular mechanics force fields or statistical potentials [16–18]. For example, FoldX [16] employs an empirical force field to assess the impact of mutations on protein stability and binding affinity, offering mechanistic clarity and interpretability. However, despite their utility, physics-based simulations are typically computationally intensive and largely CPU-bound, which makes large-scale mutation screening or high-throughput applications challenging in practice. As a result, their applicability is often limited when evaluating a large number of variants or protein–protein complexes [19, 20]. Traditional machine-learning approaches instead rely on handcrafted features to train regression or classification models [21–23]. For instance, mCSM-AB [21] encodes the local structural environment of mutated residues as graph-based descriptors, which are then evaluated using Gaussian processes to predict ΔΔG. Traditional machine-learning methods, meanwhile, are constrained by their dependence on feature engineering and dataset biases, limiting their ability to generalize to unseen mutations [24]. By contrast, deep-learning approaches have emerged over the past five years as the most rapidly advancing and effective class of computational methods [25–36].
Deep-learning approaches for ΔΔG prediction can be further categorized by their model architectures, with graph neural networks (GNNs) and topological deep-learning frameworks being two prominent directions. Among GNN-based methods, Graphinity [34] leverages the equivariant GNN to capture local geometric and pairwise interaction patterns in protein–protein complexes. In contrast, topological deep-learning methods, represented by TopNetTree [25], extend beyond pairwise connectivity by integrating convolutional and gradient-boosting models to extract global, high-dimensional topological features. More recently, researchers have sought to combine protein language models or AlphaFold-generated structures with GNNs or topological approaches. For instance, the state-of-the-art method DDMut-PPI [28] embeds residue representations derived from the ProtT5 [37] language model into its graph convolutional network, while Wee et al. [32] integrated AlphaFold’s predictions with the topological deep-learning (MT-TopLap [38]) framework to extend applicability to sequence data. Nonetheless, a fundamental challenge shared by all deep-learning approaches in this domain is the limited availability of experimentally measured mutation data for protein–protein complexes.
Mutation ΔΔG data describe changes in binding free energy induced by sequence alterations. Representative datasets include SKEMPI v2.0 [39], which compiles 7085 entries of affinity changes in PPI complexes, and the AB-Bind dataset [40], which contains 1101 ΔΔG measurements across 32 antibody–antigen complexes. Overall, publicly available datasets remain limited in scale, with a pronounced scarcity of high-quality experimental ΔΔG data [34]. Issues such as sample imbalance and dataset bias further constrain deep-learning models’ training and generalization, underscoring two main challenges: (i) developing better neural architectures that reduce reliance on large training sets, and (ii) expanding available data through new collection efforts or data-augmentation [41, 42] strategies to better leverage existing resources.
To address the above challenges, we developed approaches targeting both model architecture and data augmentation. First, at the model level, we proposed a multimodal framework named MutPPI-plus for predicting mutation effects on PPIs. The backbone of MutPPI-plus, MutPPI, uses structural graphs comprising all amino acid residues as input and achieved an average Pearson’s correlation coefficient (PCC) of 0.663 on the S645 dataset, outperforming DDMut-PPI (PCC = 0.597) and 11 other approaches. After incorporating evolutionary information from protein language models (ESM-2 [43]), MutPPI-plus improved performance to an average PCC of 0.670 on S645. Second, at the data level, we introduced a mutation-path-based data augmentation strategy. With this approach, MutPPI-plus achieved a record-high PCC of 0.691 on S645. To provide a more rigorous assessment, we further benchmarked MutPPI-plus against existing methods on three multi-point mutation datasets (SM_ZEMu [44], SM595 [28], and SM1124 [28]) under the additivity assumption. After data augmentation, MutPPI-plus achieved average PCCs of 0.863, 0.801, and 0.880 on SM_ZEMu, SM595, and SM1124, respectively—substantially outperforming DDMut-PPI [28] (0.723, 0.721, and 0.831). The final MutPPI-plus model exhibited strong performance when applied to nanobody-associated mutation data.
Materials and methods
Mutation data
In this study, we employed two single-point mutation datasets and three multiple-point mutation datasets, all derived from SKEMPI 2.0 [39]. The partitioning of training and test sets followed the same setting as DDMut-PPI [28]. Specifically, the single-point mutation dataset S4169 was used for model training, encompassing 319 distinct protein–protein complexes and including both non-immune PPIs and antibody–antigen interactions. Following the commonly adopted reverse mutation strategy from previous studies [28, 34], we generated an additional 4169 mirrored data points by inverting each mutant and its wild-type counterpart. This procedure not only augmented the training set but also balanced the proportion of positive and negative ΔΔG values. Consequently, the final training set, denoted S8338, comprised 8338 entries. The definition of ΔΔG follows previously established principles:
ΔΔG = ΔG_mut − ΔG_wt    (1)
where a positive ΔΔG indicates a destabilizing mutation (weaker binding affinity) and a negative ΔΔG indicates a stabilizing mutation (stronger binding affinity). This sign convention is used consistently across all datasets, experiments, and evaluations in the manuscript. For datasets in which binding affinities were originally reported as dissociation constants (Kd), ΔΔG values were computed using the standard thermodynamic relation:
ΔG = RT ln(K_d)    (2)
with R being the gas constant and T the absolute temperature. Regarding the reverse-mutation-based data augmentation strategy, we confirm that it strictly follows the same ΔΔG sign convention. Specifically, for a mutation A→B with ΔΔG = 𝑥, the reverse mutation B→A is assigned a ΔΔG of −𝑥.
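The reverse-mutation augmentation can be sketched as follows; this is an illustrative implementation only, and the record fields are hypothetical rather than the study's actual data schema:

```python
# Sketch of the reverse-mutation augmentation: for each mutation A -> B with
# ddG = x, add the mirrored entry B -> A with ddG = -x. Field names are
# illustrative, not the authors' actual schema.
def augment_with_reverse(records):
    augmented = list(records)
    for rec in records:
        augmented.append({
            "pdb": rec["pdb"],
            "wild_res": rec["mut_res"],   # swap wild-type and mutant identities
            "mut_res": rec["wild_res"],
            "position": rec["position"],
            "ddg": -rec["ddg"],           # sign convention: reversing flips ddG
        })
    return augmented

data = [{"pdb": "1E50", "wild_res": "A", "mut_res": "G", "position": 55, "ddg": 1.2}]
doubled = augment_with_reverse(data)   # 1 original + 1 mirrored entry
```

Applied to S4169, this doubling yields the S8338 training set while balancing stabilizing and destabilizing labels.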
For single-point mutation ΔΔG prediction, we placed particular emphasis on assessing the model’s utility in antibody–antigen interactions. To this end, the S645 dataset was adopted as the test set. The S645 dataset, originating from the AB-Bind database, consists of 32 antibody–antigen complexes. When benchmarking against the state-of-the-art ΔΔG predictor Graphinity [34], we also followed its experimental setting and performed ten-fold cross-validation on the S645 dataset. This experiment was presented as supplementary results to further underscore the advances of the MutPPI model.
For multiple-point mutation ΔΔG prediction, we employed three test datasets from SKEMPI 2.0: SM_ZEMu, SM595, and SM1124. The SM_ZEMu dataset comprises 270 instances of multiple-point mutations, with the number of mutations per instance ranging from 2 to 11. The SM595 dataset contains 595 instances, with mutation counts ranging from 4 to 14, while the SM1124 dataset includes 1124 instances, with mutation counts ranging from 2 to 3. It should be noted that in SM_ZEMu, SM595, and SM1124, there are 1, 11, and 1 entries, respectively, for which the residues in the corresponding PDB structures were renumbered. As the renumbered PDB files were not available in the literature [28], we excluded these entries (accounting for only ~0.65% of the data) to ensure data integrity and the reliability of comparative analyses. Consequently, the final datasets used for model evaluation consisted of 269, 584, and 1123 instances for SM_ZEMu, SM595, and SM1124, respectively.
MutPPI: efficient encoding of structural information for predicting ΔΔG
Protein structure exerts a direct influence on its function, making structural information critical for modeling and predicting the ΔΔG of single-point mutations. At the data representation level, the three-dimensional structure of a protein–protein complex was abstracted into a residue-level graph: each amino acid residue was represented as a node encoded by a one-hot vector of its amino acid type, while spatial proximity between residues was captured by pairwise distances calculated from Cα atomic coordinates. Only Cα atom coordinates were used to represent the complex geometry. An edge was defined when the inter-residue distance was less than 7 Å, with its weight determined by the corresponding geometric distance. In this way, the spatial conformation of the protein–protein complex was transformed into a graph amenable to GNN processing. On this basis, both mutant and wild-type protein–protein complexes were mapped to the graph framework and paired with experimentally determined ΔΔG values, allowing the model to learn the subtle effects of mutations on the stability of PPIs. Notably, the experimentally resolved wild-type complex structures were directly used to represent both wild-type and mutant structures. No energy minimization was applied to mutant structures, as our goal is to develop a fully end-to-end deep learning framework for ΔΔG prediction without relying on external physics-based preprocessing that could limit scalability and inference efficiency. Overall, this representation strategy provides a natural transition from molecular three-dimensional coordinates to a learnable graph structure.
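The graph construction above can be sketched in a few lines of NumPy; this is a minimal illustration of the stated recipe (one-hot nodes, 7 Å Cα distance cutoff, distance-valued edge weights), with all variable names our own:

```python
import numpy as np

# Minimal sketch of the residue-graph construction: nodes are residues
# (one-hot amino-acid type), edges connect Cα pairs closer than 7 Å, and the
# geometric distance serves as the edge weight.
AA = "ACDEFGHIKLMNPQRSTVWY"

def build_residue_graph(ca_coords, sequence, cutoff=7.0):
    n = len(sequence)
    x = np.zeros((n, 20))
    for i, aa in enumerate(sequence):
        x[i, AA.index(aa)] = 1.0                      # one-hot node features
    d = np.linalg.norm(ca_coords[:, None, :] - ca_coords[None, :, :], axis=-1)
    src, dst = np.where((d < cutoff) & (d > 0))       # exclude self-loops
    edge_index = np.stack([src, dst])
    edge_weight = d[src, dst]
    return x, edge_index, edge_weight

# Toy example: residues 0 and 1 are 3.8 Å apart; residue 2 is far away.
coords = np.array([[0.0, 0, 0], [3.8, 0, 0], [20.0, 0, 0]])
x, ei, ew = build_residue_graph(coords, "AGK")
```

The resulting arrays map directly onto the node-feature, edge-index, and edge-weight tensors a GNN library expects.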
At the model architectural level, we propose MutPPI (Fig. 1a and Fig. S1a), designed to capture differences between wild-type and mutant protein complexes and thereby predict the associated change in ΔΔG. During graph feature extraction, node embeddings were first encoded using two Graph Isomorphism Network (GIN) layers, which effectively learn local topological information. Subsequently, a Graph Attention Network (GAT) layer was introduced to perform weighted aggregation of neighboring features through a multi-head attention mechanism, allowing the model to emphasize critical residues while preserving global context. Distinct from the baseline model (Fig. S2), MutPPI applies the same GIN–GAT architecture to both wild-type and mutant structures. This design is motivated by two considerations. First, given the limited availability of ΔΔG training data, parameter sharing reduces model complexity, mitigates overfitting, and improves generalization. Second, as single-point mutations typically exert minimal changes on the protein backbone, the key differences between mutant and wild-type complexes largely stem from the altered amino acid identity at the mutation site. Employing a shared GIN-GAT feature extractor thus accentuates these local variations. The shared GIN–GAT structure encoder synergistically combines GIN’s powerful aggregation of local neighborhood composition and geometry with GAT’s adaptive attention over residues, enabling the model to selectively emphasize a small number of critical interfacial contacts that dominate mutation-induced ΔΔG changes (Table S1). At the readout stage, MutPPI applies global add pooling to compress the graph-level representation into a vector, which is further projected into a compact latent space through fully connected and normalization layers. For wild-type and mutant complexes, global embeddings are computed separately and then concatenated before being fed into a regression head to predict ΔΔG.
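The weight-sharing and readout logic can be illustrated with a toy NumPy encoder; this sketches the design (one shared encoder applied to both graphs, global add pooling, concatenation before regression), not the actual GIN-GAT implementation:

```python
import numpy as np

# Toy sketch of the shared-encoder design: the *same* weights encode both the
# wild-type and the mutant graph, graph embeddings are sum-pooled (global add
# pooling) and concatenated for the regression head. A single random linear
# layer stands in for the GIN-GAT stack (an assumption for illustration).
rng = np.random.default_rng(0)
W = rng.standard_normal((20, 8))              # one shared projection, used twice

def encode(node_feats, adj):
    h = np.maximum(adj @ node_feats @ W, 0)   # one message-passing step + ReLU
    return h.sum(axis=0)                      # global add pooling -> graph vector

adj = np.eye(3) + np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])  # toy adjacency
wt = np.eye(3, 20)                            # 3 residues, one-hot features
mut = wt.copy(); mut[1] = 0; mut[1, 5] = 1    # single-point mutation at residue 1

z = np.concatenate([encode(wt, adj), encode(mut, adj)])  # regressor input
```

Because the encoder is shared, any difference between the two halves of `z` is attributable to the mutated residue and its propagated neighborhood effects.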
Figure 1.
Study design overview. (a) Protein–protein complex structures were abstracted as graph representations, and a GNN module was used to extract features from mutant and wild-type structures for ΔΔG prediction. (b) Sequence information of the mutant protein was integrated with structural features for ΔΔG prediction. (c) The ΔΔG of multiple mutations can be expressed as the sum of ΔΔG values along the mutational path of individual substitutions, or approximated by assuming additive effects. This provides a basis for extending single-mutation ΔΔG prediction models to multiple mutations. (d) Based on the reverse data augmentation strategy, a mutation-path-based data augmentation approach was further introduced for training ΔΔG prediction models.
MutPPI-plus: integrating structural and sequence information for ΔΔG prediction
Inspired by our previous work [45], we further integrated protein language models and ensemble deep learning into ΔΔG prediction. On top of MutPPI, we developed an extended framework, MutPPI-plus (Fig. 1b and Fig. S1b). The hallmark of MutPPI-plus lies in combining structural graph features of protein complexes with sequence-derived evolutionary representations to more precisely model the impact of mutations on binding free energy. At its core, MutPPI-plus leverages the pretrained protein language model ESM2 to extract context-aware representations from amino acid sequences. In this study, all sequence information was derived directly from the protein chains present in the PDB complexes; no additional residues outside the resolved PDB regions were included in either training or inference. Wild-type and mutant sequences are first encoded by the same ESM2 encoder into high-dimensional hidden states, after which a residue-level difference embedding is computed to capture mutation-induced semantic perturbations, directly reflecting subtle sequence-level effects. The resulting difference embedding is then aggregated across multiple scales (mean, min, and max pooling) to derive global statistical features of mutational impact, which are subsequently integrated with the three-dimensional structural representations extracted by the GNN. Through this fusion strategy, MutPPI-plus achieves complementary modeling of sequence semantics and structural conformations within a unified framework. The final ΔΔG prediction is obtained as the average of the outputs of the three branches; during training, the loss is computed separately between each branch's output and the ground truth.
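The difference-embedding and multi-scale pooling step can be sketched as follows; a random lookup table stands in for the real ESM2 encoder (an explicit assumption, for illustration only):

```python
import numpy as np

# Sketch of the sequence branch: the same encoder maps wild-type and mutant
# sequences to per-residue embeddings, their difference is pooled at multiple
# scales (mean, min, max), and the statistics are concatenated. A random
# embedding table is a stand-in for ESM2 hidden states.
rng = np.random.default_rng(1)
embed = rng.standard_normal((20, 16))           # 20 amino acids x 16-dim toy space

def encode_seq(seq, alphabet="ACDEFGHIKLMNPQRSTVWY"):
    idx = [alphabet.index(a) for a in seq]
    return embed[idx]                            # (length, dim) residue embeddings

def mutation_features(wt_seq, mut_seq):
    diff = encode_seq(mut_seq) - encode_seq(wt_seq)   # residue-level difference
    return np.concatenate([diff.mean(0), diff.min(0), diff.max(0)])

feats = mutation_features("AGKL", "AGRL")        # single-point mutation K -> R
```

Unmutated positions contribute exactly zero to the difference embedding, so the pooled statistics concentrate on the perturbation introduced at the mutation site.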
MutPPI-plus is originally designed for predicting ΔΔG of single-point mutations. To apply it to multi-point mutations, an additivity assumption is typically required. As shown in Fig. 1c, assuming that three mutations (mutation 1, mutation 2, and mutation 3) occur simultaneously from the wild-type, the corresponding ΔΔG is denoted as ΔΔG6. The ΔΔG values for individual mutations 1, 2, and 3 are denoted as ΔΔG1, ΔΔG2, and ΔΔG3, respectively. The ΔΔG for introducing mutation 2 on top of mutation 1 is denoted as ΔΔG4, and the ΔΔG for further introducing mutation 3 is denoted as ΔΔG5. The formula for the additive effect can be expressed as:
ΔΔG6 ≈ ΔΔG1 + ΔΔG2 + ΔΔG3    (3)
However, based on the state-function property of Gibbs free energy, a more accurate formula for describing ΔΔG6 is:
ΔΔG6 = ΔΔG1 + ΔΔG4 + ΔΔG5    (4)
A key challenge in extending MutPPI-plus to multi-mutation ΔΔG prediction using Equation (4) lies in the absence of structural information for intermediate mutant complexes.
Data augmentation based on the mutation-path strategy
In terms of training strategy, we designed a mutation-path-based data augmentation scheme to strengthen the robustness of mutational effect representation. Specifically, rather than solely predicting ΔΔG from paired wild-type and mutant graphs, we additionally introduced an intermediate mutant state. As illustrated in Fig. 1d, the original single-point mutation corresponds to ΔΔG3, the change from the wild type to the mutant. Building on this, we inserted an intermediate mutant, thereby forming a mutation path: ΔΔG1 from the wild type to the intermediate mutant, and ΔΔG2 from the intermediate mutant to the target mutant. The relationship between ΔΔG1, ΔΔG2, and ΔΔG3 is as follows:
ΔΔG3 = ΔΔG1 + ΔΔG2    (5)
The augmentation strategy requires the model to predict ΔΔG1 and ΔΔG2 separately, and enforces their sum to approximate the overall ΔΔG3, with the discrepancy incorporated into the training loss. These intermediate mutants are not intended to represent physically realized mutation states, but are introduced as conceptual constructs to enforce a thermodynamic additivity constraint during training.
Intermediate mutants can, in principle, be chosen as random variants of the same complex or even as variants from other protein–protein complexes. In this study, we adopted a simple yet effective procedure: during training, for each batch of wild-type–mutant pairs, wild-type samples were randomly shuffled to construct ‘rearranged’ protein–protein complexes, which served as intermediate mutants. This approach offers two key advantages: first, all intermediate complexes retain valid protein sequences and experimentally resolved structures drawn from the original dataset, without introducing artificial geometries or coordinates; second, the strategy is straightforward to implement and broadly applicable, acting as a regularization mechanism during training. Overall, this mutation-path-based augmentation not only enlarges the effective distribution of training samples but also introduces an additional consistency constraint, compelling the model to learn more generalizable mutation–energy relationships.
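The batch-level shuffling procedure can be sketched as follows; sample identifiers are illustrative, and the predicted-ΔΔG consistency term described above would be computed on the resulting triples:

```python
import random

# Sketch of the mutation-path augmentation: within a batch, wild-type samples
# are shuffled to serve as intermediate mutants, yielding path triples
# (wild type -> intermediate -> mutant). During training, the model predicts
# ddG1 (wt -> inter) and ddG2 (inter -> mut), and |ddG1 + ddG2 - ddG3| is
# added to the loss.
def make_path_triples(batch, seed=0):
    wts = [b["wt"] for b in batch]
    intermediates = wts[:]
    random.Random(seed).shuffle(intermediates)   # 'rearranged' complexes
    return [
        {"wt": b["wt"], "inter": inter, "mut": b["mut"], "ddg": b["ddg"]}
        for b, inter in zip(batch, intermediates)
    ]

batch = [{"wt": f"wt{i}", "mut": f"mut{i}", "ddg": float(i)} for i in range(4)]
triples = make_path_triples(batch)
```

Because the intermediates are simply a permutation of the batch's wild-type samples, no artificial structures are created, yet each training pair is routed through a different path at every epoch.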
Training process
In this study, we employed multiple training strategies to optimize deep learning models. We first applied the reverse-mutation data augmentation strategy to expand the S4169 dataset, yielding the augmented S8338 training set. On this S8338 dataset, we sequentially trained the baseline model, the MutPPI model, and the MutPPI-plus model. Both the baseline and MutPPI models were trained from scratch for 1000 epochs. To assess the contribution of structural information, we evaluated three different input schemes for protein–protein complexes: all residues, interface residues only, and residues surrounding the mutation site. Both the baseline and MutPPI models were trained under these three input settings. However, training of the MutPPI-plus model was conducted exclusively under the all-residue input scheme.
It is worth noting that the MutPPI-plus model was not trained entirely from scratch. For sequence representation, we adopted the officially released pretrained weights of ESM-2 (650 M), fine-tuning only its final hidden layer during training. For structural representation, the feature extraction module (GIN-GAT) was initialized from the MutPPI model checkpoint at epoch 500, and the entire structural encoder was subsequently fine-tuned. As a result, only the fully connected layers in the final three branches of MutPPI-plus were randomly initialized. All unfrozen components were trained on the S8338 dataset for 150 epochs in total.
Finally, we introduced a mutation-path-based data augmentation strategy, under which both the MutPPI and MutPPI-plus models were further fine-tuned on the S8338 dataset. For fine-tuning, the initial weights of the MutPPI and MutPPI-plus models were taken from their original training checkpoints at the 500th and 100th epochs, respectively, and each was subsequently fine-tuned for only 20 epochs. This design ensured that the overall training duration remained comparable to the original training strategy. The mean squared error (MSE) was used as the loss function.
To ensure a rigorous comparison across different model architectures and training strategies, all above experiments were independently repeated five times using distinct random seeds for model initialization and data loading. In addition, training details when compared with the Graphinity algorithm can be found in Supplementary Note 1.
MutPPI and MutPPI-plus were trained using the AdamW optimizer with a learning rate of 0.0005, a weight decay of 0.01, and a batch size of 32. All experiments were conducted on a single NVIDIA A100 GPU with 40 GB of memory. Model performance across five random seeds is reported as mean ± standard deviation. The software environment on our computing cluster included Miniforge3/24.1, CUDA 11.8, cuDNN 8.4, and GCC 9.3.
Performance evaluation metrics
This study utilizes four evaluation metrics: the PCC, the Kendall correlation coefficient, the mean absolute error (MAE), and the root mean squared error (RMSE). Their computational formulas are provided below:
PCC = \frac{\sum_{i=1}^{n}(\hat{y}_i-\bar{\hat{y}})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(\hat{y}_i-\bar{\hat{y}})^2}\,\sqrt{\sum_{i=1}^{n}(y_i-\bar{y})^2}}    (6)

\tau = \frac{n_c-n_d}{n(n-1)/2}    (7)

MAE = \frac{1}{n}\sum_{i=1}^{n}\lvert \hat{y}_i-y_i \rvert    (8)

RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(\hat{y}_i-y_i)^2}    (9)

where \hat{y}_i is the ith prediction, y_i is the ith ground truth, n is the sample size, and n_c and n_d denote the numbers of concordant and discordant prediction–ground-truth pairs, respectively.
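These metrics can be implemented directly; below is a plain-NumPy sketch, with Kendall's τ in the simple pair-counting form without tie corrections (an assumption, since the exact variant used is not specified in the text):

```python
import numpy as np

# Plain-NumPy versions of the four evaluation metrics (Equations 6-9).
def pcc(pred, true):
    p, t = np.asarray(pred, float), np.asarray(true, float)
    return float(np.corrcoef(p, t)[0, 1])

def kendall_tau(pred, true):
    # O(n^2) pair counting: concordant minus discordant over all pairs.
    n, nc, nd = len(pred), 0, 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (pred[i] - pred[j]) * (true[i] - true[j])
            nc += s > 0
            nd += s < 0
    return (nc - nd) / (n * (n - 1) / 2)

def mae(pred, true):
    return float(np.mean(np.abs(np.asarray(pred, float) - np.asarray(true, float))))

def rmse(pred, true):
    return float(np.sqrt(np.mean((np.asarray(pred, float) - np.asarray(true, float)) ** 2)))
```

In practice, library routines such as `scipy.stats.kendalltau` would typically be preferred for large datasets.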
Results
Encoding protein–protein complex structures for efficient prediction of ΔΔG upon single-point mutations
Protein structures exert a direct influence on the binding affinity between proteins. To predict ΔΔG upon single-point mutations, we first proposed a baseline model (Fig. S2), which utilized only structural information as input. As illustrated in Fig. 2a, we take the protein–protein complex with Protein Data Bank (PDB) ID 1E50 as an example to demonstrate the representation of structural information. A protein–protein complex generally consists of at least two interacting monomeric chains. Since our study first focuses on single-point mutations, we define the mutated chain as the mutant protein and all other chains without mutations as the target protein. In the example shown in Fig. 2a, chain A is designated as the mutant protein, and chain B as the target protein. To construct structural input representations, we adopted three alternative strategies: (i) inputting all residues; (ii) inputting only residues located at the binding interface (within 10 Å between the mutant and target proteins); and (iii) inputting only the 20 residues proximal to the mutation site in the mutant protein together with the 20 proximal residues in the target protein. For comparison, input representations for two additional mutation cases are shown in Fig. S3. For each of the three input strategies, we generated graph-based structural data as input for ΔΔG prediction. Model training was performed on the S8338 dataset, with evaluation on the S645 dataset. As shown in Fig. 2b, the baseline model exhibited substantial performance differences across the three structural input strategies. The most effective approach was the local neighborhood strategy (K-nearest residues around the mutation site, KNN), under which the baseline model initialized with five different random seeds achieved both higher and more stable predictive performance. This finding indicates that the local structural environment surrounding the mutation site exerts a direct impact on ΔΔG prediction. The training dynamics shown in Fig. S4a further corroborate this observation.
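The third input strategy (K nearest residues around the mutation site) reduces to a simple distance sort; a minimal sketch with illustrative names, applied per chain:

```python
import numpy as np

# Sketch of the local-neighborhood input strategy: keep the K residues whose
# Cα atoms lie closest to the mutation site (K = 20 per chain in the study).
def knn_residues(ca_coords, mutation_idx, k=20):
    d = np.linalg.norm(ca_coords - ca_coords[mutation_idx], axis=1)
    return np.argsort(d)[:k]          # indices of the k nearest residues

# Toy chain of 10 residues spaced 1 Å apart along the x-axis.
coords = np.array([[float(i), 0.0, 0.0] for i in range(10)])
nearest = knn_residues(coords, mutation_idx=0, k=3)
```

The same selection would be run once on the mutant chain (around the mutated residue) and once on the target chain (around its residue closest to the mutation site).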
Figure 2.
Comparison of MutPPI with other models. (a) Three strategies for representing protein–protein complex structural features as graph inputs: all Cα atom coordinates, interface-residue Cα atom coordinates, and Cα atom coordinates of residues nearest to the mutation site. (b) Performance comparison on the S645 test set across three input strategies combined with two model architectures. Each scatter point represents the model’s test performance on the S645 dataset after initializing training with different random seeds. For the same model, five distinct random seeds were used for initialization under various structural input strategies to assess the stability of model performance. (c) Key distinctions between MutPPI and the baseline model. (d) Predictive performance of MutPPI versus 12 existing methods on the S645 test set, with benchmark results taken from DDMut-PPI [28]. MutPPI was initialized and trained with five random seeds; error bars indicate standard deviation across the five trials. (e) Performance comparison between Graphinity and MutPPI on the S645 dataset using tenfold cross-validation. Error bars here represent the standard deviation across the ten folds.
We suspect that the baseline model’s unstable performance may be due to an excessive number of parameters. As shown in Fig. 2c, on the basis of the baseline model, we further introduced MutPPI (Fig. S1a) by sharing the weights of the structural feature extraction modules between wild-type and mutant complexes to reduce parameters. Across the three structural input strategies (Fig. 2b), MutPPI consistently outperformed the baseline model, achieving higher PCCs and exhibiting more stable performance across different random seeds. Notably, MutPPI achieved its best results when provided with the full set of amino acid residues, indicating that with a more effective model architecture, incorporating richer structural information leads to the best predictive accuracy. We also benchmarked MutPPI against existing ΔΔG prediction methods, including DDMut-PPI [28] and FoldX [16]. As shown in Fig. 2d, the best-performing existing method was DDMut-PPI (PCC = 0.597). By contrast, MutPPI achieved a maximum PCC of 0.676 (seed = 42), with an average performance of PCC = 0.663 across five random seeds. These results demonstrate that MutPPI effectively captures the mapping from single-point mutations and complex structures to ΔΔG values in an end-to-end fashion, without relying on any force field–based energy calculation.
To further evaluate the advantages of the MutPPI architecture, we compared it with the recently published Graphinity algorithm under the identical experimental settings reported in its original study [34]. Specifically, we conducted ten-fold cross-validation on the S645 dataset and examined four levels of train–validation redundancy thresholds. As shown in Fig. 2e, without redundancy reduction, MutPPI achieved an average PCC of 0.994 on the validation set, substantially higher than Graphinity (average PCC = 0.857). Although the performance of MutPPI decreased as the redundancy threshold became stricter, it still consistently outperformed Graphinity across all settings. At thresholds of 100, 90, and 70, MutPPI reached average PCC values of 0.640, 0.553, and 0.500, respectively, compared with 0.264, 0.185, and 0.139 for Graphinity.
A multimodal framework for predicting ΔΔG
We further tested whether incorporating evolutionary information from amino acid sequences could boost ΔΔG prediction performance (Table S2). On the basis of MutPPI, we proposed the MutPPI-plus model. As shown in Fig. 3a, MutPPI-plus achieved an average PCC of 0.670 on the S645 test dataset, representing a ~ 1.0% improvement over MutPPI (average PCC = 0.663). For individual runs, MutPPI reached its best performance with a PCC of 0.676 (seed = 42), corresponding to a Kendall’s τ of 0.619 (Fig. 3b). By contrast, MutPPI-plus obtained a maximum PCC of 0.681 (seed = 1998) and a Kendall’s τ of 0.629 (Fig. S5). Across all evaluations, MutPPI-plus consistently outperformed MutPPI.
Figure 3.
Comparison of MutPPI-plus with MutPPI. (a) Test performance of MutPPI-plus and MutPPI on the S645 dataset under five different random seeds. Error bars indicate standard deviation across the five trials. (b) Scatter plots of predicted versus observed values on the S645 dataset for DDMut-PPI and the two best-performing MutPPI and MutPPI-plus models. (c) Test performance of MutPPI and MutPPI-plus on the redundancy-reduced S415 dataset under five different random seeds. (d) Scatter plots of predicted versus observed values on the redundancy-reduced S415 dataset for DDMut-PPI and the two best-performing MutPPI and MutPPI-plus models.
It is important to note that the S645 test dataset used in the evaluation contains a certain degree of redundancy with the S8338 training set [28]. Our analysis identified 230 redundant samples within S645. To ensure fairness in evaluation, we removed these overlapping entries and constructed a redundancy-free test set, denoted as S415. Comparative results of MutPPI-plus, MutPPI, and DDMut-PPI on S415 are shown in Fig. 3c. On the S415 dataset, MutPPI-plus achieved an average PCC of 0.583, compared with 0.578 for MutPPI, corresponding to a ~ 0.8% relative improvement (Fig. 3c). As shown in Fig. 3d and Fig. S6, MutPPI still achieved its best performance with seed = 42 (Kendall’s τ = 0.497), while MutPPI-plus achieved its best performance with seed = 1998 (Kendall’s τ = 0.507). Both models consistently outperformed DDMut-PPI (Kendall’s τ = 0.420). Furthermore, comparative results on test cases involving long mutant proteins (Fig. S7 and Table S3) and under conditions of increased structural noise (Table S4) demonstrate the performance advantages of MutPPI-plus over MutPPI.
Extending ΔΔG prediction from single to multiple mutations
The ΔΔG of a multi-point mutation can relate to the ΔΔG values of its constituent single-point mutations synergistically, antagonistically, or additively. A previous study [46] suggested that additivity is more common than either synergy or antagonism. Figure 4a illustrates the additive effect, whereby the ΔΔG of a multiple-point mutation is approximately equal to the sum of the ΔΔG values of its constituent single-point mutations. To examine this phenomenon, we analyzed both single-point and multiple-point mutation data from the SKEMPI 2.0 database. We identified 807 multiple-point mutation cases in which all corresponding single-point ΔΔG values had been experimentally measured, and assessed the extent of additivity in this set. As shown in Fig. 4b, the PCC between the summed single-mutation ΔΔG values and the experimentally determined multiple-point ΔΔG values was as high as 0.9034 (P ≪ .05), indicating a strong additive effect. Moreover, the scatter points in Fig. 4b are color-coded by the number of mutations per sample; no systematic deviation from the identity line was observed for any mutation count, suggesting that additive effects are consistently present across multiple-point mutations of varying sizes.
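The additivity check above can be sketched in a few lines: for each multi-point case, sum the measured single-point ΔΔG values of its constituent mutations, then correlate the sums with the measured multi-point ΔΔG. Field names (`mutations`, `ddg`) are illustrative:

```python
# Sketch of the additivity quantification (cf. Fig. 4b), with hypothetical
# record fields; single_ddg maps one mutation label to its measured ddG.
from scipy.stats import pearsonr

def additivity_pcc(multi_cases, single_ddg):
    sums = [sum(single_ddg[m] for m in case["mutations"]) for case in multi_cases]
    measured = [case["ddg"] for case in multi_cases]
    return pearsonr(sums, measured)[0]
```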
Figure 4.
Results of testing single-point ΔΔG models on multiple-point mutation ΔΔG datasets. (a) Illustration of the additive effect principle. (b) Quantification of additive effects in the SKEMPI 2.0 dataset. (c) Extension of MutPPI-plus to three multiple-mutation ΔΔG datasets based on additive effects, benchmarked against DDMut-PPI and mmCSM-PPI. (d) Illustration of the mutation-path principle. (e) Extension of MutPPI-plus to three multiple-mutation ΔΔG datasets based on mutation paths, compared with results derived from additive effects.
Building on the above analysis, we next applied the additive-effect assumption to extend MutPPI-plus to predicting the ΔΔG values of multiple-point mutations. We evaluated performance on three benchmark datasets—SM_ZEMu, SM595, and SM1124—described in detail in the Methods and Fig. S8. Comparative baselines included DDMut-PPI and mmCSM-PPI. As shown in Fig. 4c, on the SM_ZEMu dataset, mmCSM-PPI achieved the best performance among existing methods (PCC = 0.738), whereas MutPPI-plus reached a maximum PCC of 0.866 (seed = 2025) and an average PCC of 0.856, a relative improvement of up to 17.3% over mmCSM-PPI. On the SM595 dataset, DDMut-PPI achieved a PCC of 0.721, while MutPPI-plus reached a maximum PCC of 0.786 (seed = 1998), a gain of up to 9.0% over DDMut-PPI. On the SM1124 dataset, DDMut-PPI achieved a PCC of 0.831, whereas MutPPI-plus reached an average PCC of 0.882, a 6.3% improvement over DDMut-PPI. Additional comparisons of mean absolute error (MAE) and Kendall’s τ across the three datasets are presented in Fig. S9, and results for MutPPI-plus under different random seeds are shown in Fig. S10. These results establish MutPPI-plus as a state-of-the-art approach for ΔΔG prediction of multiple-point mutations.
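The additive-effect extension reduces to a one-line wrapper around the single-point model: the multi-point ΔΔG is the sum of the model’s single-point predictions. A minimal sketch, where `predict_single` is a hypothetical stand-in for a trained MutPPI-plus scorer:

```python
# Additive-effect extension of a single-point ddG predictor to multi-point
# mutations; `predict_single(complex_id, mutation)` is a placeholder interface.
def predict_multi_additive(predict_single, complex_id, mutations):
    return sum(predict_single(complex_id, m) for m in mutations)
```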
Although extending MutPPI-plus with the additivity assumption yielded promising results for predicting the ΔΔG values of multiple-point mutations (Table S5), the additive effect lacks a firm theoretical justification. To address this, we introduced a mutation-path-based prediction strategy. As illustrated in Fig. 4d, a multiple-point mutation can be decomposed into a sequence of single-point mutations, such that at least one mutation path exists that gradually transforms the wild type into the final multiple-point mutant. Importantly, while the sequences of all intermediate mutants are known, their structures are not. To enable MutPPI-plus to make path-based predictions, we assumed that the structure of each intermediate mutant complex can be approximated by that of the wild type. As shown in Fig. 4e, evaluating MutPPI-plus with the mutation-path strategy across the three multiple-mutation test datasets revealed that the MAE was consistently improved compared with the additivity-based approach. However, on the SM595 dataset, the mutation-path strategy yielded a lower PCC (Fig. S11 and Fig. S12). A likely explanation is that SM595 contains a higher proportion of samples with many mutated sites; under such conditions, our assumption that intermediate mutant structures can be approximated by the wild-type structure is more likely to break down.
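The mutation-path strategy can be sketched as accumulating single-step predictions along an ordered path of mutations. In this hedged sketch, `predict_step(seq, mutation)` stands in for MutPPI-plus scoring one mutation on an intermediate sequence (whose structure is approximated by the wild type), and averaging over a few path orders is an illustrative choice rather than the paper’s exact procedure:

```python
# Path-based multi-point ddG sketch: walk a path of single mutations from the
# wild type to the full mutant, summing per-step predictions; average over a
# bounded number of path orders (an assumption of this sketch).
import itertools, statistics

def predict_multi_path(predict_step, wt_seq, mutations, apply_mut, max_paths=6):
    totals = []
    for order in itertools.islice(itertools.permutations(mutations), max_paths):
        seq, total = wt_seq, 0.0
        for m in order:
            total += predict_step(seq, m)  # ddG of this single step
            seq = apply_mut(seq, m)        # advance to the next intermediate
        totals.append(total)
    return statistics.mean(totals)
```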
Data augmentation via mutation-based pathways
A major challenge in modeling the ΔΔG of PPIs is the scarcity of available data. Inspired by mutation pathways, we developed a data augmentation strategy. As illustrated in Fig. 5a, conventional augmentation simply swaps the mutant and the wild type, assigning the opposite sign to ΔΔG. In contrast, our mutation-path-based approach introduces intermediate states between the wild type and the mutant (see Methods for details). We retrained both MutPPI and MutPPI-plus with this augmentation and evaluated them on the S645 dataset. As shown in Fig. 5b, without augmentation, the average PCCs were 0.663 for MutPPI and 0.670 for MutPPI-plus; with mutation-path-based augmentation, they improved to 0.666 and 0.678, respectively. Notably, MutPPI-plus also achieved the highest performance observed on S645 (PCC = 0.691, seed = 1998) under the new augmentation scheme. When AlphaFold3-predicted complex structures were used instead of experimentally resolved ones on the S645 dataset, the predictive performance of MutPPI and MutPPI-plus decreased as expected, but MutPPI-plus remained consistently more accurate and exhibited only a moderate performance drop, indicating reasonable robustness to structural prediction errors (Table S6).
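Because Gibbs free energy is a state function, any two measured variants of the same complex define a derived training sample: mutating from variant A to variant B carries ΔΔG(B) − ΔΔG(A), with conventional reversal (wild type ↔ mutant) as a special case. A hedged sketch of this idea; the exact pairing rule in the paper’s Methods may differ:

```python
# Hypothetical path-based augmentation: pair variants of the same complex and
# derive the ddG of mutating between them via the state-function property.
def augment_paths(records):
    """records: list of (variant_id, ddg) measured against the same wild type."""
    derived = []
    for a_id, a_ddg in records:
        for b_id, b_ddg in records:
            if a_id != b_id:
                derived.append((a_id, b_id, b_ddg - a_ddg))
    return derived
```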
Figure 5.
Performance evaluation with mutation-path-based data augmentation. (a) Conceptual illustration of conventional reversal-based data augmentation versus mutation-path-based data augmentation. (b) Performance of MutPPI and MutPPI-plus on the single-mutation ΔΔG dataset S645, with and without mutation-path-based data augmentation. (c) Performance of MutPPI-plus on three multiple-mutation ΔΔG datasets, with and without mutation-path-based data augmentation.
Beyond single-point mutations, we further assessed the effect of mutation-path-based data augmentation on MutPPI-plus’s performance across the three multi-mutation datasets, using the additive-effect assumption for multi-point ΔΔG prediction. As shown in Fig. 5c and Fig. S13, on the SM_ZEMu dataset, MutPPI-plus achieved an average PCC of 0.856, which increased slightly to 0.863 with augmentation. On the SM595 dataset, the correlation improved from 0.773 to 0.801, a ~3.6% relative gain. On the SM1124 dataset, performance remained comparable, with correlations of 0.882 and 0.880 before and after augmentation, respectively. These results indicate that mutation-path-based augmentation can enhance model generalization to multi-point ΔΔG prediction (Fig. S14 and Table S7), particularly where baseline performance is more limited.
Comparison on the nanobody database
We curated a set of nanobody data from a previous study [47] to evaluate the final MutPPI-plus model. As shown in Fig. 6a and b, the nanobody–antigen dataset comprises two complexes: VHH2–TNF-α (PDB ID: 5M2J) and 37D5–IL-23 (PDB ID: 4GRW). In both complexes, several mutations were introduced into the nanobodies, and the corresponding changes in binding affinity were experimentally measured, as summarized in Table S8. We employed the final data-augmented MutPPI-plus models to predict the ΔΔG induced by these mutations. As illustrated in Fig. 6c, the ensemble model of MutPPI-plus, averaged over five random seeds, achieved a PCC of 0.824, substantially outperforming existing methods such as DDMut-PPI and mmCSM-PPI. These results demonstrate the superior predictive performance of MutPPI-plus on these nanobody–antigen mutation cases.
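The seed ensemble used here is a simple per-sample average of the predictions from models trained under different random seeds. A minimal sketch, where `models` is a hypothetical list of per-seed predict functions:

```python
# Seed-ensemble prediction: average the per-seed model outputs per sample.
def ensemble_predict(models, samples):
    return [sum(m(s) for m in models) / len(models) for s in samples]
```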
Figure 6.
Performance comparison on the nanobody test dataset. (a) Mutation sites in the VHH2–TNF-α complex. (b) Mutation sites in the 37D5–IL-23 complex. (c) Comparison of MutPPI-plus with existing methods on the nanobody ΔΔG data. This test evaluates the MutPPI-plus model after data augmentation. The five scatter points represent the performance of MutPPI-plus models trained with five different random seeds, and the MutPPI-plus bar shows the ensemble performance (the average of the per-seed predictions).
Conclusion
This study comprises five components. First, we propose MutPPI, a single-point mutation ΔΔG prediction model built on graph isomorphism and graph attention networks. MutPPI encodes protein–protein complexes as Cα-atom graphs and shares a structural encoder between wild-type and mutant forms. On the antibody–antigen dataset S645, MutPPI outperformed 12 existing models and surpassed Graphinity under tenfold cross-validation. Second, by adding evolutionary features from mutant sequences, we built MutPPI-plus, a multimodal framework. Its improved results on S645 show that multimodal integration enhances ΔΔG prediction. Third, assuming additivity in multi-point ΔΔG data, we applied MutPPI-plus to three datasets (SM_ZEMu, SM595, SM1124), where it outperformed DDMut-PPI and mmCSM-PPI. Extending it with a mutation-path method slightly lowered the PCC on SM595 but improved MAE and RMSE overall. Fourth, we developed a mutation-path-based data augmentation strategy that improved both MutPPI and MutPPI-plus, enhancing generalization and making better use of limited ΔΔG data. Fifth, we curated a dataset of nanobody mutations, on which the final MutPPI-plus model achieved strong predictive performance, demonstrating good applicability to nanobody-associated data.
Despite these contributions, our work has limitations that warrant future refinement. In MutPPI-plus, mutant atomic coordinates were approximated by wild-type ones, and only backbone (Cα) atoms were modeled, omitting side-chain conformations. Incorporating accurate mutant structures or side-chain details could further improve ΔΔG prediction. Furthermore, additional approaches could be explored for constructing sequence and structural features, as well as for integrating the two modalities. For example, within the sequence modality, PLM-interact [48] could be used directly to extract holistic features of PPI sequences, moving beyond the conventional approach of using protein language models to represent each interacting protein separately. Regarding the fusion of sequence and structural features, alternative strategies might also be considered [49]. One option is to use residue-level representations derived from ESM-2 as node features in a GNN, which might facilitate better alignment between the two modalities. When extending single-point models to multi-point mutations, we introduced a mutation-path-based approach as an alternative to the additivity assumption. However, it reduced the PCC on the SM595 dataset, implying that substituting wild-type structures for intermediate mutants may be unreliable. Explicit modeling of intermediate mutants may address this issue and enhance prediction accuracy. For data augmentation, the mutation-path strategy exploits the physical properties of ΔΔG (e.g. the state-function property of Gibbs free energy), providing more realistic constraints. Still, expanding and curating high-quality ΔΔG datasets will likely yield the greatest improvements in predictive performance.
Key Points
Representing protein–protein complexes as residue-level graphs with Cα atoms as nodes, MutPPI employs a shared GIN–GAT encoder for wild-type and mutant structures, achieving superior performance to 12 existing methods and Graphinity on the S645 dataset.
Integrating evolutionary context from the ESM-2 protein language model, the multimodal framework MutPPI-plus captures subtle sequence-level effects of mutations and attains better predictive performance, outperforming MutPPI and DDMut-PPI.
Extending the prediction model from single to multiple mutations, MutPPI-plus generalizes well across three test datasets (SM_ZEMu, SM595, and SM1124) and introduces a mutation-path-based approach for flexible multi-mutation ΔΔG prediction.
Introducing a mutation-path-based data augmentation strategy, MutPPI-plus further enhances training diversity and achieves improved predictive performance across both single- and multi-mutation datasets.
Collecting a curated set of nanobody-associated mutation data, MutPPI-plus achieves markedly improved predictive performance, substantially outperforming DDMut-PPI and mmCSM-PPI.
Acknowledgements
This work was supported by the National Science Foundation of China (Grant No. 62173204) and the Fundamental Research Funds for the Central Universities, China (buctrc202337).
Contributor Information
Juntao Deng, Department of Automation, Tsinghua University, Shuangqing Road 30, Haidian District, Beijing, 100084, China.
Miao Gu, Department of Automation, Tsinghua University, Shuangqing Road 30, Haidian District, Beijing, 100084, China.
Pengyan Zhang, Department of Automation, Tsinghua University, Shuangqing Road 30, Haidian District, Beijing, 100084, China.
Tao Liu, Department of Automation, Tsinghua University, Shuangqing Road 30, Haidian District, Beijing, 100084, China.
Guansong Hu, Institute for Healthcare Artificial Intelligence Application, Guangdong Second Provincial General Hospital, Xingang Middle Road 466, Haizhu District, Guangzhou, 510317, China; Department of Nuclear Medicine, Jinan University, Huangpu Avenue West 601, Tianhe District, Guangzhou, 510632, China.
Mingyu Dong, Department of Automation, Tsinghua University, Shuangqing Road 30, Haidian District, Beijing, 100084, China.
Yabin Zhang, Department of Automation, Tsinghua University, Shuangqing Road 30, Haidian District, Beijing, 100084, China.
Yizhen Song, College of Information Science & Technology, Beijing University of Chemical Technology, Beisanhuan East Road 15, Chaoyang District, Beijing, 100029, China.
Yunfan Zhang, Department of Automation, Tsinghua University, Shuangqing Road 30, Haidian District, Beijing, 100084, China.
Min Liu, Department of Automation, Tsinghua University, Shuangqing Road 30, Haidian District, Beijing, 100084, China; Institute for Healthcare Artificial Intelligence Application, Guangdong Second Provincial General Hospital, Xingang Middle Road 466, Haizhu District, Guangzhou, 510317, China.
Junzhang Tian, Institute for Healthcare Artificial Intelligence Application, Guangdong Second Provincial General Hospital, Xingang Middle Road 466, Haizhu District, Guangzhou, 510317, China; Department of Nuclear Medicine, Jinan University, Huangpu Avenue West 601, Tianhe District, Guangzhou, 510632, China.
Weibin Cheng, Institute for Healthcare Artificial Intelligence Application, Guangdong Second Provincial General Hospital, Xingang Middle Road 466, Haizhu District, Guangzhou, 510317, China; Department of Nuclear Medicine, Jinan University, Huangpu Avenue West 601, Tianhe District, Guangzhou, 510632, China.
Data availability
The mutation data for training and testing used in this study, along with the corresponding benchmark results, were obtained from https://biosig.lab.uq.edu.au/ddmut_ppi/datasets. The protein structure files for the S4169 training set and the three multi-mutation ΔΔG test datasets (SM_ZEMu, SM595, and SM1124) were downloaded from the Protein Data Bank (PDB) at https://www.rcsb.org/. The protein structure files for the S645 test dataset were sourced from https://doi.org/10.24433/CO.0537487.v1. The code repository of this study, which includes the mutation data, trained models and the evaluation pipeline, is freely available at https://github.com/ddd9898/MutPPI.
References
- 1. Nada H, Choi Y, Kim S et al. New insights into protein-protein interaction modulators in drug discovery and therapeutic advance. Signal Transduct Target Ther 2024;9:341. 10.1038/s41392-024-02036-3
- 2. Wu K, Jiang H, Hicks DR et al. Design of intrinsically disordered region binding proteins. Science 2025;389:eadr8063. 10.1126/science.adr8063
- 3. Wang ZZ, Shi XX, Huang GY et al. Fragment-based drug discovery supports drugging 'undruggable' protein-protein interactions. Trends Biochem Sci 2023;48:539–52. 10.1016/j.tibs.2023.01.008
- 4. Geng C, Xue LC, Roel-Touris J et al. Finding the ΔΔG spot: are predictors of binding affinity changes upon mutations in protein–protein interactions ready for it? WIREs Comput Mol Sci 2019;9:e1410. 10.1002/wcms.1410
- 5. Sora V, Laspiur AO, Degn K et al. RosettaDDGPrediction for high-throughput mutational scans: from stability to binding. Protein Sci 2023;32:e4527. 10.1002/pro.4527
- 6. Jumper J, Evans R, Pritzel A et al. Highly accurate protein structure prediction with AlphaFold. Nature 2021;596:583–9. 10.1038/s41586-021-03819-2
- 7. Baek M, DiMaio F, Anishchenko I et al. Accurate prediction of protein structures and interactions using a three-track neural network. Science 2021;373:871–6. 10.1126/science.abj8754
- 8. Watson JL, Juergens D, Bennett NR et al. De novo design of protein structure and function with RFdiffusion. Nature 2023;620:1089–100. 10.1038/s41586-023-06415-8
- 9. Lu L, Gou X, Tan SK et al. De novo design of drug-binding proteins with predictable binding energy and specificity. Science 2024;384:106–12. 10.1126/science.adl5364
- 10. Jiang K, Yan Z, di Bernardo M et al. Rapid in silico directed evolution by a protein language model with EVOLVEpro. Science 2024;387:eadr6006. 10.1126/science.adr6006
- 11. Cai H, Zhang Z, Wang M et al. Pretrainable geometric graph neural network for antibody affinity maturation. Nat Commun 2024;15:7785. 10.1038/s41467-024-51563-8
- 12. Sarma S, Herrera SM, Xiao X et al. Computational design and experimental validation of ACE2-derived peptides as SARS-CoV-2 receptor binding domain inhibitors. J Phys Chem B 2022;126:8129–39. 10.1021/acs.jpcb.2c03918
- 13. Sarma S, Catella CM, San Pedro ET et al. Design of 8-mer peptides that block Clostridioides difficile toxin A in intestinal cells. Commun Biol 2023;6:878. 10.1038/s42003-023-05242-x
- 14. Wang C, Zou Q. A machine learning method for differentiating and predicting human-infective coronavirus based on physicochemical features and composition of the spike protein. Chinese J Electron 2021;30:815–23. 10.1049/cje.2021.06.003
- 15. Zhang Y, Dong M, Deng J et al. Graph masked self-distillation learning for prediction of mutation impact on protein-protein interactions. Commun Biol 2024;7:1400. 10.1038/s42003-024-07066-9
- 16. Schymkowitz J, Borg J, Stricher F et al. The FoldX web server: an online force field. Nucleic Acids Res 2005;33:W382–8. 10.1093/nar/gki387
- 17. Huang X, Zheng W, Pearce R et al. SSIPe: accurately estimating protein-protein binding affinity change upon mutations using evolutionary profiles in combination with an optimized physical energy function. Bioinformatics 2020;36:2429–37. 10.1093/bioinformatics/btz926
- 18. Barlow KA, Ó Conchúir S, Thompson S et al. Flex ddG: Rosetta ensemble-based estimation of changes in protein-protein binding affinity upon mutation. J Phys Chem B 2018;122:5389–99. 10.1021/acs.jpcb.7b11367
- 19. Gipson B, Hsu D, Kavraki LE et al. Computational models of protein kinematics and dynamics: beyond simulation. Annu Rev Anal Chem (Palo Alto Calif) 2012;5:273–91. 10.1146/annurev-anchem-062011-143024
- 20. Gong X, Zhang Y, Chen J. Advanced sampling methods for multiscale simulation of disordered proteins and dynamic interactions. Biomolecules 2021;11:1416. 10.3390/biom11101416
- 21. Myung Y, Pires DEV, Ascher DB. mmCSM-AB: guiding rational antibody engineering through multiple point mutations. Nucleic Acids Res 2020;48:W125–31. 10.1093/nar/gkaa389
- 22. Rodrigues CHM, Pires DEV, Ascher DB. mmCSM-PPI: predicting the effects of multiple point mutations on protein-protein interactions. Nucleic Acids Res 2021;49:W417–24. 10.1093/nar/gkab273
- 23. Zhang N, Chen Y, Lu H et al. MutaBind2: predicting the impacts of single and multiple mutations on protein-protein interactions. iScience 2020;23:100939. 10.1016/j.isci.2020.100939
- 24. Tsishyn M, Pucci F, Rooman M. Quantification of biases in predictions of protein-protein binding affinity changes upon mutations. Brief Bioinform 2023;25:bbad491. 10.1093/bib/bbad491
- 25. Wang M, Cang Z, Wei GW. A topology-based network tree for the prediction of protein-protein binding affinity changes following mutation. Nat Mach Intell 2020;2:116–23. 10.1038/s42256-020-0149-6
- 26. Jin R, Ye Q, Wang J et al. AttABseq: an attention-based deep learning prediction method for antigen-antibody binding affinity changes based on protein sequences. Brief Bioinform 2024;25:bbae304. 10.1093/bib/bbae304
- 27. Yu G, Zhao Q, Bi X et al. DDAffinity: predicting the changes in binding affinity of multiple point mutations using protein 3D structure. Bioinformatics 2024;40:i418–27. 10.1093/bioinformatics/btae232
- 28. Zhou Y, Myung YC, Rodrigues CHM et al. DDMut-PPI: predicting effects of mutations on protein-protein interactions using graph-based deep learning. Nucleic Acids Res 2024;52:W207–14. 10.1093/nar/gkae412
- 29. Liu X, Luo Y, Li P et al. Deep geometric representations for modeling effects of mutations on protein-protein binding affinity. PLoS Comput Biol 2021;17:e1009284. 10.1371/journal.pcbi.1009284
- 30. Mohseni Behbahani Y, Laine E, Carbone A. Deep local analysis deconstructs protein-protein interfaces and accurately estimates binding affinity changes upon mutation. Bioinformatics 2023;39:i544–52. 10.1093/bioinformatics/btad231
- 31. Jiang Y, Quan L, Li K et al. DGCddG: deep graph convolution for predicting protein-protein binding affinity changes upon mutations. IEEE/ACM Trans Comput Biol Bioinform 2023;20:2089–100. 10.1109/TCBB.2022.3233627
- 32. Wee J, Wei GW. Evaluation of AlphaFold 3's protein-protein complexes for predicting binding free energy changes upon mutation. J Chem Inf Model 2024;64:6676–83. 10.1021/acs.jcim.4c00976
- 33. Zhang C, Sun Y, Hu P. An interpretable deep geometric learning model to predict the effects of mutations on protein-protein interactions using large-scale protein language model. J Cheminform 2025;17:35. 10.1186/s13321-025-00979-5
- 34. Hummer AM, Schneider C, Chinery L et al. Investigating the volume and diversity of data needed for generalizable antibody-antigen ΔΔG prediction. Nat Comput Sci 2025;5:635–47. 10.1038/s43588-025-00823-8
- 35. Ridha F, Gromiha MM. MPA-MutPred: a novel strategy for accurately predicting the binding affinity change upon mutation in membrane protein complexes. Brief Bioinform 2024;25:bbae598. 10.1093/bib/bbae598
- 36. Yue Y, Li S, Wang L et al. MpbPPI: a multi-task pre-training-based equivariant approach for the prediction of the effect of amino acid mutations on protein-protein interactions. Brief Bioinform 2023;24:bbad310. 10.1093/bib/bbad310
- 37. Elnaggar A, Heinzinger M, Dallago C et al. ProtTrans: towards cracking the language of life's code through self-supervised deep learning and high performance computing. IEEE Trans Pattern Anal Mach Intell 2022;44:7112–27. 10.1109/TPAMI.2021.3095381
- 38. Wee J, Chen J, Xia K et al. Integration of persistent Laplacian and pre-trained transformer for protein solubility changes upon mutation. Comput Biol Med 2024;169:107918. 10.1016/j.compbiomed.2024.107918
- 39. Jankauskaite J, Jiménez-García B, Dapkunas J et al. SKEMPI 2.0: an updated benchmark of changes in protein-protein binding energy, kinetics and thermodynamics upon mutation. Bioinformatics 2019;35:462–9. 10.1093/bioinformatics/bty635
- 40. Sirin S, Apgar JR, Bennett EM et al. AB-bind: antibody binding mutational database for computational affinity predictions. Protein Sci 2016;25:393–409. 10.1002/pro.2829
- 41. Wu L, Liu Y, Lin H et al. A simple yet effective ΔΔG predictor is an unsupervised antibody optimizer and explainer. In: The Thirteenth International Conference on Learning Representations (ICLR 2025); 2025.
- 42. Barducci G, Rossi I, Codicè F et al. JanusDDG: a physics-informed neural network for sequence-based protein stability via two-fronts attention. Commun Biol 2026. 10.1038/s42003-026-09632-9
- 43. Lin Z, Akin H, Rao R et al. Evolutionary-scale prediction of atomic-level protein structure with a language model. Science 2023;379:1123–30. 10.1126/science.ade2574
- 44. Dourado DF, Flores SC. A multiscale approach to predicting affinity changes in protein-protein interfaces. Proteins 2014;82:2681–90. 10.1002/prot.24634
- 45. Deng J, Gu M, Zhang P et al. Nanobody–antigen interaction prediction with ensemble deep learning and prompt-based protein language models. Nat Mach Intell 2024;6:1594–604. 10.1038/s42256-024-00940-5
- 46. Heyne M, Shirian J, Cohen I et al. Climbing up and down binding landscapes through deep mutational scanning of three homologous protein-protein complexes. J Am Chem Soc 2021;143:17261–75. 10.1021/jacs.1c08707
- 47. Bai Z, Wang J, Li J et al. Design of nanobody-based bispecific constructs by in silico affinity maturation and umbrella sampling simulations. Comput Struct Biotechnol J 2023;21:601–13. 10.1016/j.csbj.2022.12.021
- 48. Liu D, Young F, Lamb KD et al. PLM-interact: extending protein language models to predict protein-protein interactions. Nat Commun 2025;16:9012. 10.1038/s41467-025-64512-w
- 49. Tao T, Zhang X, Liu Y et al. Machine learning on protein–protein interaction prediction: models, challenges and trends. Brief Bioinform 2023;24:bbad076. 10.1093/bib/bbad076