Briefings in Bioinformatics. 2025 Oct 1;26(5):bbaf517. doi: 10.1093/bib/bbaf517

Dual-protein embedding-based graph model with dynamic attention for interaction prediction

Shunpeng Pang, Mingjian Jiang, Shugang Zhang, Shuang Wang, Zhen Li, Jing Sun, Yuanyuan Zhang, Li Guo
PMCID: PMC12486248  PMID: 41031876

Abstract

Protein–protein interactions (PPIs) are fundamental to biological processes, yet experimental determination of PPIs remains costly and labor-intensive. While computational methods have emerged as promising alternatives, sequence-based approaches face two critical challenges: (1) effectively capturing long-range dependencies and critical biochemical patterns in variable-length sequences, and (2) balancing computational efficiency with sensitivity to subtle residue-level interactions. Here, we present the Dual Protein Embedding-based Graph Model (DPEG), which leverages dynamic graph attention networks for robust sequence-driven PPI prediction. Unlike structure-dependent methods, DPEG operates solely on sequence data and requires no structural or domain annotations. Specifically, we employ ESM-2 to transform sequences into residue-level graphs, preserving evolutionary and physicochemical context. To handle variable sequence lengths, we design a module that represents protein sequences of arbitrary length as graphs at the amino acid level. Further, a gated attention mechanism adaptively refines residue representations, and a dynamic attention mechanism prioritizes functionally critical motifs within the graph. Evaluated on four diverse PPI datasets spanning different species and interaction types, DPEG achieves state-of-the-art performance and demonstrates strong cross-dataset generalizability. By integrating deep sequence semantics with graph-based interaction modeling, DPEG advances sequence-only PPI prediction, offering a scalable and biologically plausible framework for proteome-wide studies.

Keywords: protein–protein interactions, graph neural network, deep learning

Introduction

Protein–protein interactions (PPIs) have long been a central focus of bioinformatics research because of their crucial roles in processes such as signal transduction, regulation, and binding, and understanding them provides valuable insights for drug development. PPI prediction uses computational methods to determine whether two target proteins interact, which deepens our understanding of biological activities and supports protein engineering.

With advances in computing and artificial intelligence, PPI prediction has progressed rapidly [1, 2]. PPI prediction methods are typically categorized into three types: network-based, structure-based, and sequence-based methods. Network-based methods use information from protein networks to predict interactions and benefit from considering the relationships among proteins. For example, Kovács et al. [3], Liu et al. [4], NECARE [5], and Lun et al. [6] developed network-based approaches, and Wang et al. [7] reported a collective initiative led by the International Network Medicine Consortium that evaluated 26 representative network-based methods for PPI prediction. DSSGNN-PPI [8] introduced a hierarchical graph architecture that integrates 3D structural graph learning with sequence-structure fusion through gated attention mechanisms. However, network-based methods often have to work with incomplete protein networks, and they may be of limited use for newly discovered proteins.

Structure-based methods use the 3D structure of proteins to predict PPIs. Their advantage is that they capture the real structure of proteins, which can yield higher prediction accuracy than other methods, e.g. Sijbesma et al. [9], Zheng et al. [10], and Chessari et al. [11]. Bryant et al. [12] and Mischley et al. [13] applied AlphaFold2 [14] with optimized multiple sequence alignments to predict heterodimeric protein complexes. Wang et al. [15] developed a motif-matching algorithm that identifies proteins containing motifs sequentially or structurally similar to a given query motif. However, structure-based methods often require known structures of the target proteins, and the prediction process is complex and time-consuming; this cost makes large-scale screening difficult.

Sequence-based methods, on the other hand, use protein sequence information to predict PPIs. They are fast and inexpensive, and because sequences are easy to obtain even for uncharacterized proteins, they enable large-scale PPI prediction and screening. Sequence-based prediction involves representing protein sequences and mining information from them, and numerous such methods exist. LSTM-PHV [16] first combined an LSTM model with word2vec to predict human–virus PPIs. The DL-PPI framework [17] proposed a dual-module architecture that combines multi-scale representation learning with graph-based relational modeling to improve both prediction accuracy and generalizability to unseen data. PPI-Detect [18] introduced a general-purpose numerical codification of polypeptides and used a support vector machine to predict PPIs. DeepFE-PPI [19] proposed a residue representation method named Res2vec, with a maximum protein input length of 850 residues. PIPR [20] used a deep residual recurrent convolutional neural network to extract both local features and contextualized information, zero-padding shorter sequences to the maximum sequence length in the dataset. DeepTrio [21] employed masked parallel convolutional neural networks, but its protein input length is strictly limited to 1500 residues. Thus, many existing sequence-based methods still struggle to achieve high prediction accuracy and remain constrained by strict input length limits. They also struggle to capture the spatial structure of proteins and the interactions between residues, so they reflect protein structure and function only incompletely. Moreover, some methods truncate excessively long proteins, which constrains the applicability of their predictions.

Sequence-based methods are widely used for large-scale PPI prediction and screening due to their simplicity, low cost, and fast computation. However, many existing methods struggle to effectively extract the evolutionary and sequential information inherent in protein sequences. Additionally, current sequence-based approaches often face challenges when dealing with very long protein sequences due to limitations in their model architecture, which restrict input length. This constraint poses significant practical limitations in real-world applications. To address these issues, this paper proposes a novel method for PPI prediction that comprehensively characterizes protein structure and residue interactions. This method achieves PPI prediction by constructing a protein graph that reflects protein structure through the contact map. Further feature extraction is performed using graph neural networks, leading to more accurate predictions of PPIs. The innovations presented in this paper include the following:

  • (i) a node feature representation method for protein graphs that selectively filters and emphasizes relevant biochemical features at each amino acid residue position, dynamically focusing on the most critical information for each residue;

  • (ii) a simple, efficient, and accurate PPI prediction method that is not restricted by protein sequence length and can be applied to proteins of any size, which reduces the loss of protein information caused by sequence truncation; and

  • (iii) the integration of protein graphs with graph attention networks to explore deeper protein information, enabling more accurate information extraction from sequences.

Materials and methods

Data sets

Biological General Repository for Interaction Datasets (BioGRID) datasets. BioGRID [22] is a publicly accessible database that compiles experimentally validated protein–protein and genetic interactions for many organisms, providing comprehensive and reliable interaction networks for biological research. Because Saccharomyces cerevisiae (yeast) and Homo sapiens (human) data are frequently used to assess PPI prediction models [19–21, 23, 24], we use the multi-validated physical interaction datasets of human and yeast from BioGRID as benchmarks for both training and evaluation. Following the processing strategy of DeepTrio, the yeast dataset comprises 13 462 positive and 13 462 negative entries derived from 17 015 proteins, while the human dataset includes 31 164 positive and 31 164 negative entries sourced from 38 869 proteins. The protein length distributions of the two datasets are shown in Figures S1 and S2, respectively.

Saccharomyces cerevisiae core dataset. The S. cerevisiae core dataset, a widely used benchmark, comprises 11 188 PPI cases: 5594 positive cases proposed by Guo et al. [23] and an equal number of negative cases derived from various sources. Positive instances come from the DIP database [25, 26]; proteins shorter than 50 amino acids and those with more than 40% sequence identity were excluded. We used the version of this dataset from DeepFE-PPI [19], which contains 2438 proteins forming 5271 positive entries and 5266 negative entries; its protein length distribution is shown in Figure S3.

Multi-species dataset. Chen et al. [20] combined data for Caenorhabditis elegans, Escherichia coli, and Drosophila melanogaster into a multi-species dataset. Proteins were pre-filtered at the dataset construction stage using different sequence identity thresholds (40%, 25%, 10%, and 1%), in addition to an unfiltered full dataset. Our five-fold cross-validation was performed on these pre-filtered subsets, without any additional sequence identity filtering during cross-validation. The 1%, 10%, 25%, and 40% subsets and the full dataset contain 10 747, 12 641, 19 458, 25 916, and 32 959 positive entries and 8065, 9819, 15 827, 22 012, and 32 959 negative entries, respectively. The protein length distribution of the full dataset is shown in Figure S4.

Virus–human interaction dataset. The virus–human interaction dataset [27] consists of 8929 positive and 8929 negative entries from 12 652 proteins; its protein length distribution is shown in Figure S5. In our experiments, this dataset serves as an independent test set for evaluating DPEG and the baseline approaches.

Structure-based dataset. After duplicates were removed, the structure-based interaction dataset from Struct2Graph [28] consists of 4853 positive and 4489 negative entries from 3621 proteins; its protein length distribution is shown in Figure S6. This dataset is used mainly to compare DPEG with structure-based PPI prediction models.

Dual-protein activity prediction model

DPEG takes the sequences of two proteins as input and predicts whether they interact. As illustrated in Fig. 1a, the framework first uses the ESM-2 model to predict a contact map from each protein sequence. The contact map is the basis of a protein graph that encodes the spatial and interaction features of each protein.

Figure 1.


The overarching framework of DPEG. (a) The basic workflow of DPEG includes protein graph construction, graph structural feature processing, and final PPI prediction. The CMP Block and SSSM Block are introduced to enable contact map prediction for sequences of any length and to enhance graph node features. (b) Details of the CMP Block, which uses segmentation and assembly strategies to enable contact map prediction for proteins of any length by processing overlapping sub-sequences. (c) Details of the SSSM Block, which utilizes gating mechanisms and attention mechanisms to adjust node representations in the protein graph.

To handle protein sequences of arbitrary length, we introduce the CMP block (Fig. 1b), which enables contact map prediction for sequences of any length. After the protein graph is constructed, the SSSM block (Fig. 1c) amplifies relevant physicochemical properties, such as hydrophobicity and charge, while suppressing noisy features. This block is designed to model local interactions between amino acid residues, particularly the short-range interactions found in structural motifs such as α-helices and β-sheets.

Subsequently, we implement a dynamic graph attention mechanism that aggregates information from neighboring nodes, capturing both local and global structural patterns within the protein graphs. Finally, the output features from the two protein graphs are concatenated, and after passing through a fully connected layer, the model produces the predicted interaction scores. This end-to-end training approach enables efficient learning from the protein sequence input to the final prediction output.

Adaptive contact map prediction for arbitrary-length sequences

To predict PPIs, the initial step involves extracting structural information from the protein sequence, which can be represented by a contact map. This map, a matrix capturing pairwise interaction probabilities between residues, serves as the adjacency matrix for constructing a protein graph. Here, residues are nodes, and their interactions form edges.

We use the pretrained ESM-2 model esm2_t33_650M_UR50D [29] to extract contextual representations of amino acids, which serve as the basis for contact map prediction. The model takes the raw amino acid sequence as input and processes it through 33 transformer layers to generate per-residue hidden state embeddings. For contact prediction, the pairwise relationships captured by the model's self-attention maps are mapped by a lightweight prediction head (in ESM-2, a logistic regression over the symmetrized, APC-corrected attention maps of all layers and heads) to a contact probability matrix $P$, whose entry $P_{ij}$ is the predicted probability that residues $i$ and $j$ are in contact. Applying a threshold $\theta$ to $P$ determines the presence or absence of each interaction and yields an interaction matrix $C$ of shape $L \times L$, where $L$ is the sequence length. Each element $C_{ij}$ is defined as

$$C_{ij} = \begin{cases} 1, & P_{ij} \ge \theta \\ 0, & \text{otherwise} \end{cases} \qquad (1)$$

ESM-2 uses self-attention with rotary position embeddings, so it imposes no fixed maximum input length and captures long-range dependencies across the whole sequence; the practical limit is memory, which grows quadratically with sequence length and is handled by the segmentation strategy described below. The interaction matrix C then serves as the adjacency matrix of the protein graph.
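Eq. (1) is an element-wise threshold, after which the nonzero entries can be read off as graph edges. The sketch below is a minimal pure-Python illustration, assuming the contact probabilities are already available as a nested list (with the fair-esm package, such a map can be taken from the output of a model call made with `return_contacts=True`). The function name is hypothetical, and the threshold of 0.5 in the usage is illustrative, not the value used by DPEG.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def contact_map_to_graph(probs: Matrix, theta: float) -> Tuple[List[List[int]], List[Tuple[int, int]]]:
    """Binarize an L x L contact-probability matrix and return (adjacency, edge list).

    theta is the contact threshold; callers must supply it.
    """
    n = len(probs)
    if any(len(row) != n for row in probs):
        raise ValueError("contact map must be square (L x L)")
    # C_ij = 1 when the predicted contact probability reaches the threshold
    adj = [[1 if probs[i][j] >= theta else 0 for j in range(n)] for i in range(n)]
    # (source, target) pairs, the edge format most GNN libraries expect
    edges = [(i, j) for i in range(n) for j in range(n) if adj[i][j]]
    return adj, edges


# Usage with a toy 3-residue map and an illustrative threshold of 0.5
probs = [[0.9, 0.7, 0.1],
         [0.7, 0.95, 0.6],
         [0.1, 0.6, 0.9]]
adj, edges = contact_map_to_graph(probs, theta=0.5)
print(adj)    # [[1, 1, 0], [1, 1, 1], [0, 1, 1]]
print(edges)  # [(0, 0), (0, 1), (1, 0), (1, 1), (1, 2), (2, 1), (2, 2)]
```

Whether the diagonal (self-contacts) is kept as self-loops is a modeling choice the sketch leaves to the input map.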

To extract protein features with a graph neural network, the node features must be defined in addition to the edges. Because residues are the nodes of the protein graph, the node features should reflect the differences between residues. All residues share a common backbone consisting of an amino group, a carboxyl group, and a central α-carbon; the differences between residues arise mainly from their distinct side chains (R groups), which determine properties such as polarity, charge, and aromaticity. We selected a total of 33 features to describe residue nodes, as detailed in Table S1. For a protein of sequence length L, the node feature matrix therefore has shape L × 33.
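Building the node feature matrix amounts to a per-residue table lookup that yields one row per residue. The sketch below uses only three illustrative descriptors (Kyte-Doolittle hydropathy, net side-chain charge near pH 7 with histidine treated as neutral, and an aromaticity flag for F, W, and Y); these are an assumption for illustration and are NOT the 33 features of Table S1. The function name is hypothetical.

```python
from typing import Dict, List

# Illustrative descriptors only (not the 33 features of Table S1)
HYDROPATHY: Dict[str, float] = {  # Kyte-Doolittle scale
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
CHARGE: Dict[str, float] = {"D": -1.0, "E": -1.0, "K": 1.0, "R": 1.0}  # His treated as neutral
AROMATIC = {"F", "W", "Y"}


def node_features(sequence: str) -> List[List[float]]:
    """Return an L x 3 node feature matrix, one row per residue."""
    rows = []
    for aa in sequence.upper():
        if aa not in HYDROPATHY:
            raise ValueError(f"unsupported residue: {aa!r}")
        rows.append([HYDROPATHY[aa], CHARGE.get(aa, 0.0), 1.0 if aa in AROMATIC else 0.0])
    return rows


x = node_features("KDF")
print(x)  # [[-3.9, 1.0, 0.0], [-3.5, -1.0, 0.0], [2.8, 0.0, 1.0]]
```

Extending the table to the full feature set only widens each row; the matrix keeps one row per residue, so its height always matches the sequence length L.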

Feeding the adjacency matrix and the node feature matrix into the graph neural network yields a feature vector representation of the protein. However, for long proteins the device may not have enough memory to predict the full contact map, which leads to an out-of-memory error. Residues that are close in sequence are more likely to be in contact, so the largest contact probabilities lie near the diagonal of the contact map. Therefore, when device memory is insufficient, the contact map of a large protein can be obtained by segmentation and assembly. For a protein of length L, a segment length w and a step size s are chosen, which produce several overlapping sub-sequences. The contact map of each sub-sequence is predicted, and the sub-maps are assembled into the complete contact map of the protein; in overlapping regions, the interaction probability is the average of the predicted values. This process is illustrated in Fig. 1b. The segment length and step size depend on the memory of the device; in our experiments, the segment length was set to Inline graphic and the step size to Inline graphic.
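The segmentation-and-assembly strategy can be sketched as follows. This is a minimal pure-Python version, assuming that `predict` is any function returning an m × m contact-probability matrix for a sub-sequence of length m (for example, a wrapper around ESM-2). The helper names are hypothetical, and residue pairs that never fall in the same window are assumed to default to 0, consistent with contacts concentrating near the diagonal. The window and step values in the usage are toy values, not those used in the experiments.

```python
from typing import Callable, List

Matrix = List[List[float]]


def window_starts(length: int, window: int, step: int) -> List[int]:
    """Start offsets of overlapping windows that together cover positions 0..length-1."""
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    if length <= window:
        return [0]  # short proteins are predicted in one pass
    starts = list(range(0, length - window + 1, step))
    if starts[-1] + window < length:
        starts.append(length - window)  # extra window so the C-terminal residues are covered
    return starts


def assemble_contact_map(seq: str, predict: Callable[[str], Matrix], window: int, step: int) -> Matrix:
    """Predict contact maps on overlapping windows and average them into one L x L map.

    Residue pairs that never share a window (far from the diagonal) are left at 0.
    """
    n = len(seq)
    total = [[0.0] * n for _ in range(n)]
    count = [[0] * n for _ in range(n)]
    for s in window_starts(n, window, step):
        sub = predict(seq[s:s + window])  # m x m probabilities for this segment
        for i, row in enumerate(sub):
            for j, p in enumerate(row):
                total[s + i][s + j] += p
                count[s + i][s + j] += 1
    # Average overlapping predictions; untouched pairs default to 0.0
    return [[total[i][j] / count[i][j] if count[i][j] else 0.0 for j in range(n)] for i in range(n)]


# Usage with a stand-in predictor that returns a constant map per call (1.0, then 2.0, ...)
calls: List[str] = []


def fake_predict(sub: str) -> Matrix:
    calls.append(sub)
    value = float(len(calls))
    return [[value] * len(sub) for _ in sub]


cmap = assemble_contact_map("ACDEFG", fake_predict, window=4, step=2)
print(cmap[2][3], cmap[0][5])  # 1.5 0.0
```

Residues 2 and 3 fall in both windows, so their entry is the mean of 1.0 and 2.0; residues 0 and 5 never share a window, so their entry stays 0.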

Graph node feature representation

Following the construction of the protein graph using the CMP block, we developed a novel neural architecture to enhance node feature representation in graph-structured data. The proposed module SSSM block employs a three-component framework: (i) adaptive gating mechanisms for feature selection, (ii) GRU-inspired update dynamics for temporal state propagation, and (iii) an additive attention layer for context-aware feature refinement. Given an input feature matrix Inline graphic, where Inline graphic is the number of nodes and Inline graphic is the feature dimension (Inline graphic in this paper), SSSM block (Fig. 1c) first expands the input to a higher dimensional space through learnable projections, subsequently applying hierarchical transformations to capture multi-scale structural patterns.

The gating mechanism regulates information flow through parametric state updates. The gate vector Inline graphic is computed as

graphic file with name DmEquation2.gif (2)

where Inline graphic denotes the current node features, Inline graphic represents the previous hidden state, Inline graphic and Inline graphic are learnable parameters, and Inline graphic is the sigmoid activation function. This adaptive gating enables context-dependent feature modulation, amplifying task-relevant signals while attenuating noisy components.

The candidate hidden state Inline graphic is computed through gated integration of temporal dynamics:

graphic file with name DmEquation3.gif (3)

where Inline graphic denotes element-wise multiplication, and Inline graphic represents the updated hidden state. The GRU cell captures sequential patterns in the data, making it suitable for tasks involving dynamic or sequential node features.

The attention mechanism computes the attention score Inline graphic and the attention weights Inline graphic as follows:

graphic file with name DmEquation4.gif (4)
graphic file with name DmEquation5.gif (5)

where Inline graphic and Inline graphic are learnable parameters. The final hidden state integrates these attention weights through feature-wise scaling:

graphic file with name DmEquation6.gif (6)

Finally, the output of the SSSM block is Inline graphic. This step is intended to make the model focus on the most informative feature dimensions.
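The three components of the SSSM block can be illustrated with a small stand-in. This is a generic sketch under stated assumptions, not the authors' exact Eqs. (2) to (6): it uses an update gate, a minimal GRU-style state update without a reset gate carried along the residue order, and additive-style scores per feature dimension whose softmax weights rescale the hidden state. The class name, dimensions, and random initialization are hypothetical.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def add(*vs: Vector) -> Vector:
    return [sum(t) for t in zip(*vs)]


def sigmoid(v: Vector) -> Vector:
    return [1.0 / (1.0 + math.exp(-t)) for t in v]


def softmax(v: Vector) -> Vector:
    m = max(v)  # subtract the max for numerical stability
    e = [math.exp(t - m) for t in v]
    s = sum(e)
    return [t / s for t in e]


def rand_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class GatedFeatureBlock:
    """Hypothetical SSSM-like block: update gate, GRU-style state, feature-wise attention."""

    def __init__(self, in_dim: int, hidden_dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim
        # Projections from the input (in_dim) to the expanded hidden space (hidden_dim)
        self.w_z, self.u_z = rand_matrix(hidden_dim, in_dim, rng), rand_matrix(hidden_dim, hidden_dim, rng)
        self.w_h, self.u_h = rand_matrix(hidden_dim, in_dim, rng), rand_matrix(hidden_dim, hidden_dim, rng)
        self.w_a = rand_matrix(hidden_dim, hidden_dim, rng)
        self.b_z, self.b_h, self.b_a = [0.0] * hidden_dim, [0.0] * hidden_dim, [0.0] * hidden_dim

    def forward(self, nodes: Matrix) -> Matrix:
        h = [0.0] * self.hidden_dim  # previous hidden state, carried along residue order
        out = []
        for x in nodes:
            # Update gate: how much of the new candidate replaces the previous state
            z = sigmoid(add(matvec(self.w_z, x), matvec(self.u_z, h), self.b_z))
            cand = [math.tanh(t) for t in add(matvec(self.w_h, x), matvec(self.u_h, h), self.b_h)]
            h = [(1.0 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]
            # Additive-style scores per feature dimension, normalized with softmax
            alpha = softmax([math.tanh(t) for t in add(matvec(self.w_a, h), self.b_a)])
            out.append([a * hi for a, hi in zip(alpha, h)])  # feature-wise scaling
        return out


block = GatedFeatureBlock(in_dim=3, hidden_dim=8, seed=1)
out = block.forward([[0.1, -0.2, 0.3], [0.5, 0.0, -0.1], [1.0, 1.0, 1.0]])
print(len(out), len(out[0]))  # 3 8
```

Because the attention weights sum to 1 across feature dimensions, dimensions with higher scores keep more of their magnitude while the rest are attenuated, which is one simple way to realize "focusing on informative feature dimensions".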

Dynamic graph attention

In graph neural networks, the classic graph attention network (GAT) relies on a static attention mechanism: the ranking of attention scores over neighbor nodes (keys) is the same for every query node. For instance, in a PPI network, if GAT ranks Protein j above Protein k when Protein i is the query, it ranks j above k for any other query as well. This rigid, query-independent ranking fails to capture the context-specific interactions inherent in PPIs: different proteins often prioritize distinct interaction partners depending on their functional states, yet static attention cannot model such preferences.

To address this limitation, we adopt multi-head GATv2 [30], a graph attention network with dynamic attention. GATv2 applies the attention vector after the nonlinearity, so the attention score is a joint nonlinear function of query and key features, and the variant used here transforms central (query) and neighbor (key) nodes with separate weight matrices. As a result, attention scores adapt to the characteristics of each query. The update rules are as follows:

$$h_i'^{(k)} = \sum_{j \in \mathcal{N}(i) \cup \{i\}} \alpha_{ij}^{(k)} W_t^{(k)} h_j \qquad (7)$$

where $h_j$ is the input embedding of node $j$, $\mathcal{N}(i)$ is the set of neighbors of node $i$, $W_t^{(k)}$ is the weight matrix that transforms the features of neighbor node $j$ in the $k$th attention head, and the attention coefficient of the $k$th head is

$$\alpha_{ij}^{(k)} = \frac{\exp\left(a^{(k)\top} \sigma\left(W_s^{(k)} h_i + W_t^{(k)} h_j\right)\right)}{\sum_{l \in \mathcal{N}(i) \cup \{i\}} \exp\left(a^{(k)\top} \sigma\left(W_s^{(k)} h_i + W_t^{(k)} h_l\right)\right)} \qquad (8)$$

where $a^{(k)\top}$ is the transpose of the learnable attention vector $a^{(k)}$, $\sigma$ is the LeakyReLU activation, whose nonzero negative slope keeps gradients flowing for negative inputs, and $W_s^{(k)}$ is the weight matrix that transforms the features of the central node $i$ in the $k$th attention head.

By transforming central and neighbor nodes with separate weights and fusing the two through the attention mechanism, GATv2 captures the associations between nodes.
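The difference between static and dynamic attention can be checked numerically. The sketch below is a minimal pure-Python illustration with random weights; all names are hypothetical, and the LeakyReLU slope of 0.2 is a typical choice assumed here. The original GAT score is LeakyReLU(a_src · W h_i + a_dst · W h_j): for a fixed query the first term is a constant and LeakyReLU is strictly increasing, so the top-ranked neighbor is the same for every query. The GATv2-style score a · LeakyReLU(W_s h_i + W_t h_j) applies the attention vector after the nonlinearity, so the ranking can change with the query.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def leaky_relu(t: float, slope: float = 0.2) -> float:
    return t if t > 0 else slope * t


def softmax(v: Vector) -> Vector:
    m = max(v)
    e = [math.exp(t - m) for t in v]
    s = sum(e)
    return [t / s for t in e]


def gat_scores(h_i: Vector, neighbors: Matrix, w: Matrix, a_src: Vector, a_dst: Vector) -> Vector:
    """Original GAT (static): e_ij = LeakyReLU(a_src . W h_i + a_dst . W h_j)."""
    q = dot(a_src, matvec(w, h_i))
    return [leaky_relu(q + dot(a_dst, matvec(w, h_j))) for h_j in neighbors]


def gatv2_scores(h_i: Vector, neighbors: Matrix, w_s: Matrix, w_t: Matrix, a: Vector) -> Vector:
    """GATv2 with separate central/neighbor weights: e_ij = a . LeakyReLU(W_s h_i + W_t h_j)."""
    q = matvec(w_s, h_i)
    return [dot(a, [leaky_relu(p + k) for p, k in zip(q, matvec(w_t, h_j))]) for h_j in neighbors]


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def argmax(v: Vector) -> int:
    return max(range(len(v)), key=v.__getitem__)


rng = random.Random(0)
nodes = random_matrix(6, 4, rng)  # six node embeddings; each node also attends to itself
w = random_matrix(5, 4, rng)
a_src, a_dst = random_matrix(1, 5, rng)[0], random_matrix(1, 5, rng)[0]
w_s, w_t = random_matrix(5, 4, rng), random_matrix(5, 4, rng)
a = random_matrix(1, 5, rng)[0]

gat_leaders = {argmax(gat_scores(q, nodes, w, a_src, a_dst)) for q in nodes}
gatv2_leaders = {argmax(gatv2_scores(q, nodes, w_s, w_t, a)) for q in nodes}
print(len(gat_leaders))  # 1: static attention picks the same top neighbor for every query
print(gatv2_leaders)     # may contain several nodes, depending on the random weights
```

Applying `softmax` to either list of scores gives the attention coefficients over the node's neighborhood.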

PPI prediction

We use a three-layer GATv2. The first layer employs two multi-head attention mechanisms, where Inline graphic, and outputs a 132D vector. The second and third layers use single-head attention, where Inline graphic. The multiple heads of the first layer can attend to different local contacts, allowing the model to pick up subtle residue-level interactions. Stacking three layers provides multi-hop message passing, which lets information propagate between distant residues and may help trace communication pathways such as allosteric routes. Because the node features encode physicochemical properties such as charge, the learned attention can also reflect electrostatic complementarity between residues. Together, these properties are intended to give a comprehensive representation of each protein and to improve the accuracy of interaction prediction.

After global mean pooling Inline graphic, interactions are predicted through:

graphic file with name DmEquation9.gif (9)
graphic file with name DmEquation10.gif (10)
graphic file with name DmEquation11.gif (11)
graphic file with name DmEquation12.gif (12)

where Inline graphic and Inline graphic. The architecture diagram of the entire above-mentioned process is shown in Fig. 2.
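The readout can be illustrated with a simplified sketch: global mean pooling of each protein graph, concatenation of the two pooled vectors, and a single linear layer with a sigmoid. The single layer is an assumption made for brevity; DPEG's prediction head in Eqs. (9) to (12) may stack more layers. The function names and the toy weights are hypothetical.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def mean_pool(node_embeddings: Matrix) -> Vector:
    """Global mean pooling: average node embeddings into one graph-level vector."""
    n = len(node_embeddings)
    if n == 0:
        raise ValueError("graph has no nodes")
    return [sum(col) / n for col in zip(*node_embeddings)]


def interaction_probability(graph_a: Matrix, graph_b: Matrix, weights: Vector, bias: float) -> float:
    """Concatenate the pooled embeddings of both proteins and apply one linear layer + sigmoid."""
    z = mean_pool(graph_a) + mean_pool(graph_b)  # list concatenation, not addition
    if len(weights) != len(z):
        raise ValueError("weights must match the concatenated embedding size")
    logit = sum(w * x for w, x in zip(weights, z)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


a = [[1.0, 3.0], [3.0, 5.0]]  # protein A: 2 nodes with 2-dim embeddings
b = [[0.0, 2.0]]              # protein B: 1 node
print(mean_pool(a))  # [2.0, 4.0]
print(interaction_probability(a, b, weights=[0.5, 0.0, 0.0, 0.0], bias=0.0))  # sigmoid(1.0), about 0.731
```

Because the two pooled vectors are concatenated in a fixed order, swapping the proteins can change the score; averaging the scores of both orders is one common way to make such a pairwise predictor symmetric.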

Figure 2.


The architecture of DPEG.

Results

We evaluate the performance of DPEG and compare it with other approaches across three distinct PPI datasets. Additionally, we assess DPEG’s performance on a multi-species dataset, where proteins are filtered based on varying sequence identity thresholds. We further validate the model’s generalization ability through testing on an independent dataset.

Comparative performance of DPEG and other approaches

The primary task of DPEG is to estimate the interaction probability between a given protein pair based on their sequences. We compare DPEG with several state-of-the-art PPI prediction methods, including SVM-AC [23], SVM-MCD [31], DPPI [24], PIPR [20], DeepFE-PPI [19], DeepTrio [21], and DeepDuo. DeepDuo is a simplified variant of DeepTrio that shares the same learning architecture but is not trained on the single-protein dataset. These methods are evaluated on a variety of benchmark datasets.

BioGRID multi-validated physical interaction data

Following DeepTrio, we conduct five-fold cross-validation on the BioGRID human and yeast datasets. Each fold preserves the class distribution of interacting pairs (labeled “1”) and noninteracting pairs (labeled “0”) in both the training and test sets. We evaluate predictive performance using six metrics: accuracy, precision, sensitivity (recall), specificity, Matthews correlation coefficient (MCC), and F1-score; higher values indicate better performance for all of them.
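The six metrics follow their standard definitions from the binary confusion matrix (true positives TP, false positives FP, true negatives TN, false negatives FN). A minimal sketch; the function name is hypothetical, and returning 0 when a ratio is undefined is a convention chosen here.

```python
import math
from typing import Dict


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Accuracy, precision, sensitivity, specificity, MCC, and F1 from a confusion matrix."""
    def ratio(num: float, den: float) -> float:
        return num / den if den else 0.0  # convention: 0 when undefined

    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "precision": ratio(tp, tp + fp),
        "sensitivity": ratio(tp, tp + fn),  # recall
        "specificity": ratio(tn, tn + fp),
        "mcc": ratio(tp * tn - fp * fn, mcc_den),
        "f1": ratio(2 * tp, 2 * tp + fp + fn),  # harmonic mean of precision and recall
    }


m = classification_metrics(tp=40, fp=10, tn=45, fn=5)
print(round(m["accuracy"], 4), round(m["f1"], 4))  # 0.85 0.8421
```

Note that the F1-score, as a harmonic mean, always lies between precision and sensitivity, which is a quick sanity check when reading result tables.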

As shown in Tables 1 and 2, DeepTrio, with its masked multiple parallel convolutional neural networks, performs remarkably well and achieves the highest specificity and precision on both the human and yeast datasets. However, DPEG achieves the best accuracy, sensitivity, MCC, and F1-score, which we attribute to the SSSM block and the graph attention network capturing the complex relationships within protein sequences. For instance, DPEG outperforms DeepTrio by 0.94 and 1.03 percentage points in MCC, and by 2.52 and 1.79 percentage points in sensitivity, on the yeast and human datasets, respectively. Tables S2 and S4 give the data distribution for each fold of the five-fold cross-validation, including the sizes of the training and test sets and the numbers of positive and negative samples; stratified sampling was used to balance positive and negative samples in each fold. Tables S3 and S5 show the detailed test results for each fold. Together with Tables 1 and 2, they show that the performance of DPEG varies little across folds, indicating stable overall performance.
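Stratified five-fold splitting, as used above, can be sketched in a few lines: each class is shuffled and dealt round-robin across the folds, so every fold keeps the class ratio of the whole dataset. This is a minimal pure-Python version with a hypothetical function name; in practice a library routine such as scikit-learn's `StratifiedKFold` serves the same purpose.

```python
import random
from typing import List, Sequence, Tuple


def stratified_kfold(labels: Sequence[int], k: int = 5, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Split sample indices into k (train, test) folds with per-class proportions preserved."""
    rng = random.Random(seed)
    folds: List[List[int]] = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)  # deal each class round-robin across folds
    splits = []
    for f in range(k):
        test = sorted(folds[f])
        test_set = set(test)
        train = [i for i in range(len(labels)) if i not in test_set]
        splits.append((train, test))
    return splits


labels = [1] * 13 + [0] * 13  # toy balanced dataset: 13 interacting, 13 noninteracting pairs
for train, test in stratified_kfold(labels, k=5, seed=1):
    print(len(train), len(test), sum(labels[i] for i in test))
# 20 6 3 for the first three folds, 22 4 2 for the last two
```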

Table 1.

Five-fold cross-validation performance of PPI prediction on the BioGRID S.cerevisiae dataset

Methods | Accuracy | Precision | Sensitivity | Specificity | MCC | F1-score
DeepFE-PPI^a | 85.24 ± 0.52 | 85.49 ± 1.41 | 84.99 ± 2.77 | 85.49 ± 2.11 | 70.57 ± 1.06 | 85.19 ± 0.79
PIPR^a | 95.76 ± 0.25 | 94.61 ± 0.53 | 97.06 ± 0.41 | 94.47 ± 0.55 | 91.56 ± 0.48 | 95.82 ± 0.24
DeepDuo^a | 97.06 ± 0.28 | 98.06 ± 0.51 | 96.02 ± 0.35 | 98.10 ± 0.50 | 94.14 ± 0.57 | 97.02 ± 0.30
DeepTrio^a | 97.55 ± 0.38 | 98.95 ± 0.20 | 96.12 ± 0.74 | 98.98 ± 0.21 | 95.15 ± 0.74 | 97.52 ± 0.40
DPEG^a | 98.04 ± 0.32 | 97.47 ± 0.66 | 98.64 ± 0.19 | 97.44 ± 0.69 | 96.09 ± 0.63 | 98.05 ± 0.31

Note: Values are means ± standard deviations (%) on the test sets. ^a These models were retrained on the same data.

Table 2.

Five-fold cross-validation performance of PPI prediction on the BioGRID H.sapiens dataset

Methods | Accuracy | Precision | Sensitivity | Specificity | MCC | F1-score
DeepFE-PPI^a | 87.66 ± 0.57 | 89.42 ± 1.05 | 85.47 ± 2.27 | 89.85 ± 1.4 | 75.44 ± 1.09 | 87.37 ± 0.78
PIPR^a | 97.6 ± 0.08 | 97.57 ± 0.35 | 97.63 ± 0.44 | 97.56 ± 0.36 | 95.2 ± 0.15 | 97.6 ± 0.1
DeepDuo^a | 98.04 ± 0.05 | 98.83 ± 0.28 | 97.23 ± 0.28 | 98.85 ± 0.27 | 96.09 ± 0.1 | 98.02 ± 0.05
DeepTrio^a | 98.12 ± 0.12 | 99.0 ± 0.17 | 97.23 ± 0.28 | 99.01 ± 0.17 | 96.26 ± 0.23 | 98.11 ± 0.13
DPEG^a | 98.64 ± 0.13 | 98.28 ± 0.31 | 99.02 ± 0.18 | 98.26 ± 0.32 | 97.29 ± 0.25 | 98.65 ± 0.13

Note: Values are means ± standard deviations (%) on the test sets. ^a These models were retrained on the same data.

Saccharomyces cerevisiae core data

Following DeepTrio, we used the S. cerevisiae dataset from DeepFE-PPI to evaluate DPEG. The positive set of this dataset is identical to that of [32]. Note that DeepTrio removed proteins longer than 1500 amino acids from this dataset. As Table 3 shows, DeepFE-PPI performs very well on its own dataset, achieving the highest accuracy, sensitivity, and MCC among the compared methods. Compared with DeepTrio, DPEG achieves almost the same accuracy (0.04 percentage points lower) with a much smaller standard deviation (0.32 versus 0.63), suggesting more stable performance across test folds. DPEG also has a clearly higher sensitivity than DeepTrio (90.66 versus 88.53) and a marginally higher F1-score (92.31 versus 92.26), whereas DeepTrio attains higher precision, specificity, and MCC; DPEG therefore trades some precision for recall while keeping a comparable balance between the two. The data distribution for each fold of the five-fold cross-validation and the detailed test results for each fold are presented in Tables S6 and S7, respectively.

Table 3.

Five-fold cross-validation performance of PPI prediction on the S. cerevisiae dataset

Methods | Accuracy | Precision | Sensitivity | Specificity | MCC | F1-score
SVM-AC | 87.35 ± 1.38 | 87.82 ± 4.84 | 87.3 ± 5.23 | 87.41 ± 6.33 | 75.09 ± 2.51 | 87.34 ± 1.33
SVM-MCD | 91.36 ± 0.4 | 91.94 ± 0.69 | 90.67 ± 0.77 | NA | 84.21 ± 0.59 | 91.3 ± 0.73
DeepFE-PPI | 94.78 ± 0.61 | 96.45 ± 0.87 | 92.99 ± 0.66 | NA | 89.62 ± 1.23 | NA
PIPR^a | 92.16 ± 0.55 | 96.57 ± 1.22 | 87.46 ± 1.46 | 96.83 ± 1.27 | 84.71 ± 1.1 | 91.78 ± 0.59
DeepDuo^a | 92.26 ± 0.44 | 94.17 ± 0.65 | 90.11 ± 0.56 | 94.42 ± 0.56 | 84.6 ± 0.89 | 92.09 ± 0.53
DeepTrio^a | 92.57 ± 0.63 | 96.33 ± 0.88 | 88.53 ± 1.19 | 96.62 ± 0.83 | 85.43 ± 1.22 | 92.26 ± 0.65
DPEG^a | 92.53 ± 0.32 | 94.19 ± 0.50 | 90.66 ± 1.14 | 94.32 ± 0.62 | 85.13 ± 0.63 | 92.31 ± 0.42

Note: Most baseline values are taken from Hu et al. [21], and NA denotes metrics not reported in the original studies. Values are means ± standard deviations (%) on the test sets. ^a These models were retrained on the same data.

Multi-species (C. elegans, D. melanogaster, and E. coli) dataset

Following PIPR, we report the five-fold cross-validation performance of DPEG on variants of the multi-species dataset, which includes C. elegans, E. coli, and D. melanogaster. In these variants, proteins are excluded based on different sequence identity thresholds (40%, 25%, 10%, or 1%). Table 4 shows that DPEG outperforms PIPR in both accuracy and F1-score at every threshold. Because of its input length limit, DeepTrio removed proteins longer than 1500 amino acids, and we applied the same filter for comparison. After filtering, the numbers of proteins match those reported by DeepTrio, but the numbers of positive and negative pairs differ slightly, as shown in Table 5. Table 5 also shows that DPEG outperforms DeepTrio on all metrics except for a slightly lower precision on the unfiltered (“any”) dataset. More detailed data distributions for each fold and additional metric values are given in Tables S8 to S11.

Table 4.

Comparison of DPEG with PIPR on the multispecies dataset

Seq. identity Method Proteins Positive pairs Negative pairs Accuracy (%) Precision (%) Sensitivity (%) F1-score (%)
Any PIPR 11529 32959 32959 98.19 NA NA 98.17
DPEG 11529 32959 32959 98.54 99.4 97.66 98.52
≤40% PIPR 9739 25916 22012 98.29 NA NA 98.28
DPEG 9739 25916 22012 98.43 99.42 97.63 98.53
≤25% PIPR 7790 19458 15827 97.91 NA NA 98.08
DPEG 7790 19458 15827 98.38 99.42 97.63 98.52
≤10% PIPR 5769 12641 9819 97.54 NA NA 97.79
DPEG 5769 12641 9819 97.9 99.03 96.99 98.12
≤1% PIPR 5171 10747 8065 97.51 NA NA 97.8
DPEG 5171 10747 8065 97.74 99.03 96.99 98

Note: Performance values for most baseline approaches are taken from Chen et al. [20]; NA denotes metrics not reported in the original studies. We report mean values and standard deviations on the test sets.

Table 5.

Comparison of DPEG with DeepTrio on the multispecies dataset (proteins longer than 1500 residues removed)

Seq. identity Method Proteins Positive pairs Negative pairs Accuracy (%) Precision (%) Sensitivity (%) F1-score (%)
Any DeepTrio 11108 31227 30368 98.2 99.51 96.92 98.27
DPEG 11108 31228 30839 98.54 99.43 97.66 98.53
≤40% DeepTrio 9354 24406 20461 97.83 99.23 96.77 97.64
DPEG 9354 24407 20510 98.35 99.42 97.53 98.47
≤25% DeepTrio 7454 18193 14485 97.52 98.78 96.74 97.85
DPEG 7454 18194 14526 98.15 99.36 97.3 98.32
≤10% DeepTrio 5478 11777 8839 97.32 98.87 96.42 97.67
DPEG 5478 11778 8869 97.88 99.11 98.84 98.13
≤1% DeepTrio 4932 10110 7284 97.11 98.89 96.1 97.62
DPEG 4932 10111 7314 97.74 99.03 96.99 98.02

Note: Performance values for most baseline approaches are taken from Hu et al. [21]. Because the original papers did not report F1 scores, we calculated the F1 scores in this table from the precision and sensitivity values reported in those studies. We report mean values and standard deviations on the test sets.
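Since F1 is the harmonic mean of precision and sensitivity, F1 = 2PR/(P + R), a missing F1 value can be recomputed from the two reported metrics. The Python sketch below shows that calculation; the function name is illustrative. Applying the formula to fold-averaged precision and sensitivity need not reproduce the mean of per-fold F1 scores exactly.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (sensitivity), both given in percent."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Illustrative values, not taken from the tables
print(round(f1_from_precision_recall(90.0, 80.0), 2))  # 84.71
```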

Tables 4 and 5 show that, at every identity threshold (40%, 25%, 10%, 1%), the positive and negative samples are imbalanced, with positive pairs consistently outnumbering negative ones. To examine how class balance affects performance, we used random undersampling to equalize the numbers of positive and negative pairs and retrained the model. As Table 6 shows, the model trained on the balanced data improved by 0.14%–0.83% in accuracy and 0.15%–0.62% in precision relative to the model trained on the imbalanced data. Despite the marked difference in class distribution, performance changed only slightly, suggesting that the model is robust to this level of class imbalance.
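Random undersampling keeps the whole minority class and draws an equally sized random subset, without replacement, from the majority class. A minimal Python sketch, assuming each pair is a (protein_a, protein_b, label) tuple; the helper name and the fixed seed are illustrative, not taken from the DPEG code.

```python
import random


def undersample(pairs, seed=0):
    """Randomly drop majority-class pairs until both classes have equal size.

    Assumed record layout: (protein_a, protein_b, label) with label 1 or 0.
    """
    rng = random.Random(seed)
    pos = [p for p in pairs if p[2] == 1]
    neg = [p for p in pairs if p[2] == 0]
    n = min(len(pos), len(neg))
    # Sampling n from the minority class simply keeps all of it (in random order).
    balanced = rng.sample(pos, n) + rng.sample(neg, n)
    rng.shuffle(balanced)
    return balanced


data = [(f"A{i}", f"B{i}", 1) for i in range(10)] + [(f"C{i}", f"D{i}", 0) for i in range(4)]
balanced = undersample(data)
print(len(balanced), sum(label for _, _, label in balanced))  # 8 4
```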

Table 6.

The performance of DPEG on balanced samples

Seq. identity Accuracy (%) Precision (%) Sensitivity (%) F1-score (%)
≤40% 98.57 99.65 97.48 98.55
≤25% 98.43 99.60 97.25 98.41
≤10% 98.42 99.57 97.26 98.40
≤1% 98.22 99.59 96.84 98.20

Note: All proteins in the multispecies dataset were used; proteins longer than 1500 residues were not excluded.

Figure 3 compares the accuracy of the methods, with DPEG evaluated both with (DPEG-all) and without (DPEG-remove) long protein sequences. Without homology restriction ("Any" identity), DPEG-all outperforms PIPR by 0.35 percentage points in accuracy (98.54% versus 98.19%) on identical data (11 529 proteins; 32 959 positive and 32 959 negative pairs). The advantage is largest at the ≤25% identity level, where DPEG-all is 0.47 points more accurate (98.38% versus 97.91%) on the same 7790 proteins. Averaged over all thresholds, the accuracy gain of DPEG-all over PIPR is 0.31 points (SD = 0.12), indicating a consistent improvement.
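The per-threshold gains and their average can be reproduced from Table 4. The short Python script below is a sketch using the accuracy values copied from that table. With only five thresholds, the SD depends noticeably on whether the sample (n − 1) or population (n) convention is used, so the script prints both.

```python
import statistics

levels = ["Any", "<=40%", "<=25%", "<=10%", "<=1%"]
dpeg_all = [98.54, 98.43, 98.38, 97.90, 97.74]  # DPEG accuracy (%), Table 4
pipr = [98.19, 98.29, 97.91, 97.54, 97.51]      # PIPR accuracy (%), Table 4

gains = [d - p for d, p in zip(dpeg_all, pipr)]  # absolute gains in percentage points
for level, g in zip(levels, gains):
    print(f"{level:>6}: {g:+.2f}")
print(f"mean = {statistics.mean(gains):.2f}")
print(f"SD (sample, n-1) = {statistics.stdev(gains):.3f}")
print(f"SD (population, n) = {statistics.pstdev(gains):.3f}")
```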

Figure 3.

Line graph of the accuracy (%) of four methods (PIPR, DPEG-all, DeepTrio, DPEG-remove) across sequence identity levels (from "any" to "≤1%"). DPEG-all and DPEG-remove stay more accurate as identity drops, while DeepTrio declines sharply.

Accuracy of Methods With/Without Long Protein Sequences (>1500 Residues).

The comparison with DeepTrio highlights DPEG's strengths in low-homology settings. At the most stringent threshold (≤1% identity), DPEG-remove outperforms DeepTrio by 0.63 percentage points in accuracy (97.74% versus 97.11%) with nearly identical training data (DPEG-remove uses 30 more negative pairs, 7314 versus 7284, about 0.41% more). The gap generally widens as sequence identity decreases, from 0.34 points at the "Any" level to 0.63 points at ≤25% and ≤1%, and averages 0.54 points (SD = 0.12) across all homology levels. At ≤25% identity, the 0.63-point difference (98.15% versus 97.52%) suggests that DPEG-remove is better at capturing interaction patterns between distantly related proteins.
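The DeepTrio comparison can be checked the same way. The Python script below, with values copied from Table 5, prints the per-threshold accuracy gaps, their mean and sample SD, and the relative difference in negative pairs at the ≤1% level. It also makes visible that the gap is not strictly monotonic: it peaks at ≤25% and ≤1% with a slight dip at ≤10%.

```python
import statistics

levels = ["Any", "<=40%", "<=25%", "<=10%", "<=1%"]
dpeg_remove = [98.54, 98.35, 98.15, 97.88, 97.74]  # DPEG accuracy (%), Table 5
deeptrio = [98.20, 97.83, 97.52, 97.32, 97.11]     # DeepTrio accuracy (%), Table 5

gaps = [round(d - t, 2) for d, t in zip(dpeg_remove, deeptrio)]
print(dict(zip(levels, gaps)))
print("mean gap:", round(statistics.mean(gaps), 2), "| sample SD:", round(statistics.stdev(gaps), 2))

# Relative difference in negative training pairs at <=1% identity (Table 5)
print(f"negative pairs: {(7314 - 7284) / 7314:.2%} more for DPEG-remove")
```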

The F1 scores in Fig. 4 show the same pattern: across all homology thresholds, DPEG-all scores higher than PIPR and DPEG-remove scores higher than DeepTrio, indicating a better balance between precision and sensitivity.

Figure 4.

Line graph of the F1 scores (%) of four methods (PIPR, DPEG-all, DeepTrio, DPEG-remove) across sequence identity levels ("any", "≤40%", "≤25%", "≤10%", and "≤1%"). DPEG-all and DPEG-remove retain relatively high F1 scores as sequence identity decreases, whereas DeepTrio's score declines sharply.

F1-score of Methods With/Without Long Protein Sequences (>1500 Residues).

Comparative analysis of model performance with integrated datasets

In practice, multiple data sources are often available, and the most effective approach is frequently to train on all accessible data. The preceding sections evaluated our model on individual datasets, which gives only a partial picture of its capabilities. To better reflect real-world practice and assess the model's full potential, we ran an additional set of experiments in which the BioGRID S. cerevisiae, BioGRID H. sapiens, S. cerevisiae, multispecies, and human–virus datasets were integrated for training. The integrated dataset contains 67 651 proteins, with 78 797 positive and 80 249 negative pairs.
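The text does not say how duplicate or conflicting pairs were handled when the five datasets were merged. The Python sketch below shows one plausible approach, stated as an assumption: pairs are treated as unordered, duplicates are collapsed, and pairs labeled positive in one source and negative in another are set aside. The function name and record layout are hypothetical.

```python
from collections import defaultdict


def integrate_datasets(*datasets):
    """Merge PPI datasets of (protein_a, protein_b, label) records (assumed layout).

    Assumption: (A, B) and (B, A) are the same unordered pair. Returns the merged
    records and the set of pairs whose labels disagree across sources.
    """
    labels = defaultdict(set)
    for dataset in datasets:
        for a, b, label in dataset:
            labels[frozenset((a, b))].add(label)
    merged, conflicts = [], set()
    for pair, seen in labels.items():
        if len(seen) > 1:
            conflicts.add(pair)           # contradictory labels: needs a resolution rule
            continue
        # a frozenset of a self-interaction (A, A) has a single element
        a, b = sorted(pair) if len(pair) == 2 else (next(iter(pair)),) * 2
        merged.append((a, b, seen.pop()))
    return merged, conflicts


d1 = [("A", "B", 1), ("C", "D", 0)]
d2 = [("B", "A", 1), ("C", "E", 1), ("D", "C", 1)]
merged, conflicts = integrate_datasets(d1, d2)
print(sorted(merged), conflicts)
```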

As shown in Table 7, the model trained on the integrated multisource data performs well across all metrics (accuracy 96.94%, MCC 93.88, F1-score 96.91%). Several individual datasets yield higher scores (e.g. the accuracy on BioGRID S. cerevisiae and BioGRID H. sapiens), but the integrated dataset is more heterogeneous and closer to real-world, multisource use, so we regard its results as a more realistic indicator of cross-distribution generalization. These results support training on integrated multisource data, both to assess a model's potential comprehensively and to prepare it for complex practical scenarios. The per-fold data distribution and test results for the five-fold cross-validation are given in Tables S12 and S13, respectively.

Table 7.

Model performance comparison between integrated datasets and individual datasets

Datasets Accuracy (%) Precision (%) Sensitivity (%) Specificity (%) MCC F1-score (%)
BioGRID S. cerevisiae 98.04±0.32 97.47±0.66 98.64±0.19 97.44±0.69 96.09±0.63 98.05±0.31
BioGRID H. sapiens 98.64±0.13 98.28±0.31 99.02±0.18 98.26±0.32 97.29±0.25 98.65±0.13
S. cerevisiae 92.53±0.32 94.19±0.50 90.66±1.14 94.32±0.62 85.13±0.63 92.31±0.42
Multispecies (any) 98.54±0.03 99.40±0.14 97.66±0.18 99.41±0.14 97.09±0.06 98.52±0.03
Multispecies (≤40%) 98.43±0.18 99.43±0.14 97.65±0.19 99.34±0.17 96.86±0.35 98.53±0.17
Multispecies (≤25%) 98.38±0.11 99.42±0.09 97.63±0.22 99.30±0.12 96.76±0.21 98.52±0.10
Multispecies (≤10%) 97.90±0.23 99.08±0.23 97.18±0.47 98.84±0.30 95.78±0.45 98.12±0.21
Multispecies (≤1%) 97.74±0.42 99.03±0.41 96.99±0.90 98.74±0.55 95.43±0.82 98.00±0.38
Integrated dataset 96.94±0.23 97.05±0.31 96.76±0.25 97.12±0.31 93.88±0.47 96.91±0.22

Note: We report the mean values and standard deviations for the test sets.

Comparison with a structure-based state-of-the-art model

We benchmark DPEG against Struct2Graph, a structure-based GNN model for PPI prediction, to contrast sequence-driven and structure-driven approaches, evaluate their performance trade-offs, and assess DPEG's utility in settings where structural data are scarce.

As shown in Fig. 5, Struct2Graph holds small advantages in metrics such as accuracy, precision, and MCC, whereas DPEG remains competitive while using sequence information alone. This suggests that DPEG is particularly useful where structural data are scarce and methods that depend on explicit structural inputs, such as Struct2Graph, are constrained, thereby broadening the toolkit for PPI prediction across data contexts.

Figure 5.

Grouped bar chart comparing Struct2Graph (blue) and DPEG (orange) performance (%) on a structure-based dataset across six metrics (Accuracy, Precision, etc.), with Struct2Graph generally higher.

Performance comparison of DPEG and Struct2Graph.

Performance evaluation using clustering-based cross-validation

We used the pretrained model "esm2_t33_650M_UR50D" to extract embedding features from the protein sequences. After standardizing these features, we applied the KMeans algorithm to partition the proteins into five clusters. Principal Component Analysis (PCA) was then used to project the embeddings onto two dimensions for visualization while preserving the main characteristics of their distribution (Fig. 6). Figure 7 shows the number of proteins and protein pairs in each cluster. We performed cross-validation over these five clusters and, for each method, averaged every metric across the five clusters. Results on the BioGRID S. cerevisiae dataset are shown in Fig. 8 and those on the BioGRID H. sapiens dataset in Fig. 9.
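The standardize, cluster, and project pipeline can be sketched in NumPy. This is an illustration under stated assumptions, not the authors' code: it assumes the per-residue ESM-2 embeddings are mean-pooled into one 1280-dimensional vector per protein (1280 is the embedding width of esm2_t33_650M_UR50D), and synthetic vectors stand in for real embeddings. The hand-written k-means uses a deterministic farthest-first initialization as a simplification; scikit-learn's StandardScaler, KMeans, and PCA are the usual library route. How protein pairs are assigned to clusters is not specified in the text and is not shown.

```python
import numpy as np


def standardize(x):
    """Z-score every embedding dimension (zero mean, unit variance)."""
    std = x.std(axis=0)
    std[std == 0] = 1.0                   # leave constant dimensions unscaled
    return (x - x.mean(axis=0)) / std


def kmeans(x, k=5, n_iter=100):
    """Lloyd's k-means with deterministic farthest-first initialisation."""
    centroids = [x[0]]
    for _ in range(1, k):
        # squared distance from each point to its nearest chosen centroid
        d = np.min([((x - c) ** 2).sum(axis=1) for c in centroids], axis=0)
        centroids.append(x[np.argmax(d)])
    centroids = np.array(centroids)
    labels = None
    for _ in range(n_iter):
        dists = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        new_labels = dists.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break                         # assignments stable: converged
        labels = new_labels
        for j in range(k):
            if np.any(labels == j):       # keep the old centroid if a cluster empties
                centroids[j] = x[labels == j].mean(axis=0)
    return labels


def pca_2d(x):
    """Project centred data onto its first two principal components (via SVD)."""
    xc = x - x.mean(axis=0)
    _, _, vt = np.linalg.svd(xc, full_matrices=False)
    return xc @ vt[:2].T


# Synthetic stand-in for assumed mean-pooled ESM-2 embeddings: 5 groups x 40 proteins, 1280-D
rng = np.random.default_rng(0)
centers = rng.normal(0.0, 5.0, size=(5, 1280))
emb = np.vstack([c + rng.normal(0.0, 1.0, size=(40, 1280)) for c in centers])

z = standardize(emb)
labels = kmeans(z, k=5)
coords = pca_2d(z)                        # 2-D points for a scatter plot like Fig. 6
print(np.bincount(labels), coords.shape)
```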

Figure 6.

Two-panel scatter plots (a and b) showing PCA results, with axes "pca 1" (horizontal) and "pca 2" (vertical). Data points are colored by five clusters: Cluster 0 (blue), Cluster 1 (orange), Cluster 2 (green), Cluster 3 (red), and Cluster 4 (purple), displaying distinct distribution patterns in each panel.

PCA visualization of protein clusters for (a) the BioGRID S. cerevisiae dataset and (b) the BioGRID H. sapiens dataset.

Figure 7.

Grouped stacked bar chart displaying the distribution of proteins (left, primary vertical axis) and PPI pairs (right, secondary vertical axis) across clusters (color-coded: cluster 0 in blue, cluster 1 in orange, cluster 2 in green, cluster 3 in red, cluster 4 in purple) for two datasets: S. cerevisiae and H. sapiens.

Cluster distribution of proteins and pairs. The two leftmost bars represent the number of proteins in each cluster across the two datasets, while the right bars represent the number of PPI pairs in each cluster across the two datasets.

Figure 8.

Grouped bar chart showing performance of four PPI prediction methods (DeepFE-PPI, PIPR, DeepTrio, DPEG) on the BioGRID S. cerevisiae dataset, with scores (%) for five metrics (accuracy, precision, recall, specificity, f1_score) color-coded; DPEG achieves the highest scores in most metrics.

Performance on the BioGRID S.cerevisiae dataset.

Figure 9.

Grouped bar chart showing performance of four PPI prediction methods (DeepFE-PPI, PIPR, DeepTrio, DPEG) on the BioGRID H. sapiens dataset, with scores (%) for five metrics (accuracy, precision, recall, specificity, f1_score) color-coded.

Performance on the BioGRID H. sapiens dataset.

On the BioGRID S. cerevisiae dataset, the comparison across the five clusters (489–2382 protein pairs each) reveals marked performance differences. DPEG achieves consistently high scores on all metrics, indicating good generalization across clusters of different sizes. The baseline methods perform less consistently: their precision and recall are imbalanced and their specificity is markedly lower, suggesting that they adapt poorly to variation in cluster size.

On the BioGRID H. sapiens dataset (cluster sizes of 1324–5748 pairs), DPEG again performs consistently well, with balanced precision and recall and high specificity. Its stability on both BioGRID datasets, despite differences in cluster size and biological complexity, indicates strong generalization. By contrast, the baseline methods show severe drops in specificity and unstable predictions on this dataset, failing to maintain balanced performance across species and biological contexts.

PPI prediction performance on an independent test set

To further assess the model's generalization, we evaluated it in three scenarios, all tested on an independent virus–human interaction dataset [27]: (1) training different methods on the same data (BioGRID H. sapiens), following the benchmark convention; (2) training a single model (DPEG) on different datasets; and (3) a cold-start experiment, in which proteins shared between the training and test sets are removed before the model is evaluated. Table 8 lists the number of proteins shared between each training set and the test set.
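A cold-start split like scenario (3) can be sketched as follows. The text does not specify whether overlapping proteins were removed from the training side or the test side, or whether overlap was detected by identifier or by exact sequence. This Python sketch assumes the independent test set stays fixed and that training pairs involving any shared protein ID are dropped, which leaves zero repeat proteins, as reported for DPEG-bhr in Table 8. The function name is hypothetical.

```python
def cold_start_filter(train_pairs, test_pairs):
    """Drop training pairs that involve any protein also present in the test set.

    Assumptions: pairs are (protein_a, protein_b, label) tuples keyed by protein ID,
    and only the training side is filtered. Returns the kept training pairs and the
    number of shared ("repeat") proteins.
    """
    train_proteins = {p for a, b, _ in train_pairs for p in (a, b)}
    test_proteins = {p for a, b, _ in test_pairs for p in (a, b)}
    shared = train_proteins & test_proteins
    kept = [(a, b, y) for a, b, y in train_pairs if a not in shared and b not in shared]
    return kept, len(shared)


train = [("P1", "P2", 1), ("P3", "P4", 0), ("P5", "P6", 1)]
test = [("P2", "V1", 1), ("V2", "P6", 0)]
print(cold_start_filter(train, test))  # ([('P3', 'P4', 0)], 2)
```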

Table 8.

Repeat protein overlap between training and test datasets across models

Model Training dataset Repeat proteins Identity ≥0.4
DPEG-bh BioGRID H.sapiens 2328 20.85%
DPEG-bhr BioGRID H.sapiens 0 4.12%
DeepTrio BioGRID H.sapiens 2328 20.85%
PIPR BioGRID H.sapiens 2328 20.85%
DeepFE-PPI BioGRID H.sapiens 2328 20.85%
DPEG-bs BioGRID S.cerevisiae 0 2.61%
DPEG-msf Multi-species (any) 3 7.11%
DPEG-sc S.cerevisiae core 0 2.32%

Note: Identity is the percentage of test-set proteins whose sequence identity to the corresponding training set is at least 0.4, calculated by us with CD-HIT-2D [33].

Figure 10 shows that, among the methods trained on the BioGRID H. sapiens dataset (which shares 2328 proteins with the test set), DPEG-bh outperforms DeepTrio, PIPR, and DeepFE-PPI by a substantial margin. Because all four methods face the same 2328 overlapping proteins, data leakage cannot explain this difference: DPEG extracts sequence-driven interaction motifs more effectively than the other sequence-based approaches. The balanced AP scores of DPEG-bh for the interacting and noninteracting classes further indicate that it discriminates both kinds of pairs well, pointing to genuine generalization rather than reliance on memorized protein information.

Figure 10.

Three-panel plot comparing performance of methods (DPEG variants, DeepTrio, PIPR, DeepFE-PPI) on an independent test set: (a) ROC curves with AUC values, (b) Precision-Recall curve for the interacting class with AP values, (c) Precision-Recall curve for the noninteracting class with AP values.

Performance comparison of DPEG, DeepTrio, PIPR, and DeepFE-PPI on an independent test set. This includes (a) the area under the receiver operating characteristic curve (AUC), (b) average precision (AP) for the interacting class, and (c) average precision (AP) for the noninteracting class.
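Figure 10 reports AUC together with AP for each class. AP for the noninteracting class is obtained by treating label 0 as the positive class and ranking by 1 − score. A minimal pure-Python sketch, assuming predicted interaction probabilities in [0, 1]: AP follows the step-wise definition used by scikit-learn's `average_precision_score` (the sum of (R_n − R_{n−1})·P_n over thresholds), and AUC uses the pairwise Mann–Whitney formulation. The function names are illustrative.

```python
def roc_auc(labels, scores):
    """Probability that a random positive is scored above a random negative (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if sp > sn else 0.5 if sp == sn else 0.0 for sp in pos for sn in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Step-wise AP: sum over thresholds of (R_n - R_{n-1}) * P_n, tied scores grouped."""
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    total_pos = sum(labels)
    tp = fp = 0
    ap, prev_recall, i = 0.0, 0.0, 0
    while i < len(ranked):
        threshold = ranked[i][0]
        while i < len(ranked) and ranked[i][0] == threshold:  # consume a tie group
            tp += ranked[i][1]
            fp += 1 - ranked[i][1]
            i += 1
        recall = tp / total_pos
        ap += (recall - prev_recall) * tp / (tp + fp)
        prev_recall = recall
    return ap


y = [1, 0, 1, 0]                 # 1 = interacting, 0 = noninteracting
p = [0.9, 0.8, 0.7, 0.1]         # predicted interaction probabilities
print(roc_auc(y, p))                                               # 0.75
print(average_precision(y, p))                                     # AP, interacting class
print(average_precision([1 - v for v in y], [1 - v for v in p]))  # AP, noninteracting class
```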

Training a single model (DPEG) on different datasets reveals a trade-off between data diversity and homogeneity. Evolutionary divergence (e.g. moving from yeast interactions to virus–human interactions) limits transferability, yet curated datasets such as BioGRID S. cerevisiae still provide measurable benefit. This suggests that future work on generalizable PPI models should balance biological diversity against data quality.

In the cold-start test, DPEG-bhr scores slightly lower than DPEG-bh on some metrics, which is expected because proteins shared with the test set were removed. Even under this condition, the results indicate good generalization and robustness. DPEG-bhr also outperforms DeepTrio, PIPR, and DeepFE-PPI on most metrics, even though these models were trained without removing the 2328 shared proteins, which further supports the model's effectiveness in the cold-start setting.

We found no positive correlation between training–test sequence similarity (Table 8) and model performance in the human–virus PPI prediction task; some training sets with relatively high similarity (e.g. the multispecies dataset) even yield relatively poor performance. We speculate that there are two main reasons. First, the way negative samples are constructed strongly affects how efficiently the model learns. In datasets whose negatives are generated by shuffling one sequence of a positive pair while preserving its 2-let (dipeptide) counts (e.g. BioGRID S. cerevisiae and BioGRID H. sapiens), the negative samples stay close to the background of the positive examples, which forces the model to capture key discriminative features. In contrast, datasets whose negatives are generated by randomly pairing proteins without evidence of interaction (e.g. the multispecies and S. cerevisiae core datasets) tend to introduce false negatives or nonspecific differences, which can undermine feature learning. Second, the similarity of interaction mechanisms between the training and test sets matters more than sequence similarity. For instance, the BioGRID H. sapiens dataset performs well because it shares human protein interaction patterns with the human–virus PPIs, whereas the multispecies dataset includes distantly related species such as prokaryotes, whose interaction mechanisms differ substantially from virus–host interactions.
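A 2-let (dipeptide-preserving) shuffle can be illustrated with the Altschul–Erickson doublet-shuffling idea: the sequence is treated as a path through a directed multigraph of residue transitions, and a random Eulerian path with the same start and end residue is drawn. This Python sketch shows that standard technique; it is not necessarily the tool used to build the BioGRID negative sets, and the function names are hypothetical. Every dipeptide count, as well as the first and last residue, is preserved.

```python
import random
from collections import Counter, defaultdict


def _reaches_root(v, root, last_edge):
    """Follow chosen last edges from v; True if the chain reaches root without a cycle."""
    seen = set()
    while v != root:
        if v in seen:
            return False
        seen.add(v)
        v = last_edge[v]
    return True


def doublet_shuffle(seq: str, rng: random.Random) -> str:
    """Shuffle seq while preserving every 2-let (dipeptide) count.

    Altschul-Erickson style: each adjacent pair is a directed edge; a random
    'last exit' edge is chosen per residue so that these edges form a tree rooted
    at the final residue; the other edges are shuffled; an Eulerian path is walked.
    """
    if len(seq) < 3:
        return seq
    succ = defaultdict(list)                  # residue -> residues that follow it
    for a, b in zip(seq, seq[1:]):
        succ[a].append(b)
    root = seq[-1]
    while True:                               # rejection-sample the last-exit edges
        last_edge = {v: rng.choice(nxt) for v, nxt in succ.items() if v != root}
        if all(_reaches_root(v, root, last_edge) for v in last_edge):
            break
    order = {}
    for v, nxt in succ.items():
        rest = list(nxt)
        if v in last_edge:
            rest.remove(last_edge[v])         # hold back the tree edge ...
        rng.shuffle(rest)
        if v in last_edge:
            rest.append(last_edge[v])         # ... and use it last
        order[v] = rest
    used = {v: 0 for v in order}
    out, cur = [seq[0]], seq[0]
    for _ in range(len(seq) - 1):
        step = order[cur][used[cur]]
        used[cur] += 1
        out.append(step)
        cur = step
    return "".join(out)


s = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"      # arbitrary example sequence
t = doublet_shuffle(s, random.Random(7))
print(t, Counter(zip(t, t[1:])) == Counter(zip(s, s[1:])))  # shuffled sequence, True
```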

Ablation study

To quantify the contributions of the SSSM block and dynamic graph attention (GATv2), we systematically evaluated four model variants across multiple datasets (Fig. 11). The full DPEG model (SSSM + GATv2) consistently outperformed all ablated versions, demonstrating synergistic effects between its core components. Specifically:

Figure 11.

Eight-panel grouped bar chart comparing performance (percentage) of four model variants across diverse datasets: (a) BioGRID S. cerevisiae, (b) BioGRID H. sapiens, (c) S. cerevisiae (DeepFE-PPI), (d) multispecies (any sequence identity), (e) multispecies (less than or equal to 0.40), (f) multispecies (less than or equal to 0.25), (g) multispecies (less than or equal to 0.10), (h) multispecies (less than or equal to 0.01). Each panel shows metrics: Accuracy, Precision, Sensitivity, Specificity, MCC, F1-score.

Performance comparison of four model variants across heterogeneous datasets: (a) BioGRID S. cerevisiae dataset, (b) BioGRID H. sapiens dataset, (c) S. cerevisiae dataset on DeepFE-PPI, (d) multispecies dataset (any sequence identity), (e) multispecies subset with sequence identity ≤0.40, (f) multispecies subset with sequence identity ≤0.25, (g) multispecies subset with sequence identity ≤0.10, and (h) multispecies subset with sequence identity ≤0.01.

Dynamic graph attention (GATv2 versus static GAT): replacing static GAT with GATv2 in SSSM-integrated models (DPEG versus DPEG-SG) improves MCC across datasets. For example, on BioGRID S. cerevisiae, DPEG (MCC = 96.09) outperforms DPEG-SG (MCC = 95.25). In low sequence identity scenarios (e.g. the multispecies subset with sequence identity ≤0.01), DPEG achieves a 1.84-point higher MCC (95.43 versus 93.59). This suggests that, when combined with SSSM, GATv2 models adaptive residue-level interactions in evolutionarily distant proteins more effectively.
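The difference between static and dynamic attention lies in where the nonlinearity sits. In GAT the score is e_ij = LeakyReLU(aᵀ[W h_i ‖ W h_j]), which splits into a term for i plus a term for j, so every node ranks its neighbors in the same order. GATv2 applies the nonlinearity before the attention vector, e_ij = aᵀ LeakyReLU(W[h_i ‖ h_j]), so the ranking can depend on the query node. The NumPy sketch below implements both single-head scoring functions on a small, fully connected toy graph with random weights. It illustrates the two published formulations and is not DPEG's layer; a real residue graph would mask non-edges, and in practice a library layer such as PyTorch Geometric's GATv2Conv would be used.

```python
import numpy as np


def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)


def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)


def gat_scores(h, W, a):
    """Static GAT: e_ij = LeakyReLU(a^T [W h_i || W h_j]) on a fully connected toy graph."""
    wh = h @ W.T                               # (n, d) transformed node features
    d = wh.shape[1]
    return leaky_relu((wh @ a[:d])[:, None] + (wh @ a[d:])[None, :])


def gatv2_scores(h, W, a):
    """Dynamic GATv2: e_ij = a^T LeakyReLU(W [h_i || h_j]) on a fully connected toy graph."""
    n = h.shape[0]
    hi = np.repeat(h[:, None, :], n, axis=1)   # query node i along rows
    hj = np.repeat(h[None, :, :], n, axis=0)   # neighbour j along columns
    return leaky_relu(np.concatenate([hi, hj], axis=-1) @ W.T) @ a


rng = np.random.default_rng(0)
n, f, d = 6, 8, 4                              # nodes, input features, hidden size
h = rng.normal(size=(n, f))
alpha_gat = softmax(gat_scores(h, rng.normal(size=(d, f)), rng.normal(size=2 * d)))
alpha_v2 = softmax(gatv2_scores(h, rng.normal(size=(d, 2 * f)), rng.normal(size=d)))
print(alpha_gat.argmax(axis=1))  # GAT: the same top-ranked neighbour for every node
print(alpha_v2.argmax(axis=1))   # GATv2: the top neighbour may differ per node
```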

SSSM block integration: adding the SSSM block to GATv2 (DPEG versus DPEG-DG) further improved performance, particularly sensitivity (0.74% increase) and specificity (0.38% increase) on the human and yeast BioGRID datasets, suggesting that it helps capture conserved biochemical motifs. The SSSM also stabilized models that use static GAT: DPEG-SG (SSSM + static GAT) outperformed DPEG-NG (static GAT only) by 0.16%–0.30% in MCC across all test conditions.

These results indicate that both dynamic graph learning and sequence-structure motif mining are essential for robust PPI prediction, especially for proteins with ultra-low homology (≤0.25 sequence identity).

For the feature representation of protein sequences, we also tested one-hot encoding (DPEG-OH, 21D amino acid type information) and ProstT5 embedding (DPEG-P5, 21D amino acid type information plus 6D physicochemical properties). DPEG itself combines 21D amino acid type information with 11D physicochemical properties. As expected, DPEG performs better on real data, which we attribute to its more comprehensive physicochemical features. DPEG-OH, which contains only amino acid type information and no physicochemical properties, has a lower mean performance, further supporting the contribution of physicochemical information to model performance.
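A 21-dimensional one-hot encoding of the kind used by DPEG-OH can be sketched in Python as below. The text does not define the 21st dimension; this sketch assumes it is a catch-all for nonstandard or unknown residues (for example X), and the constant and function names are illustrative.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"   # the 20 standard residues
UNKNOWN = len(AMINO_ACIDS)             # assumed 21st slot for any other symbol (X, B, Z, U, ...)
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot(seq: str) -> list[list[int]]:
    """Encode a protein sequence as an L x 21 one-hot matrix (one row per residue)."""
    rows = []
    for residue in seq.upper():
        row = [0] * (len(AMINO_ACIDS) + 1)
        row[INDEX.get(residue, UNKNOWN)] = 1
        rows.append(row)
    return rows


m = one_hot("MKX")
print(len(m), len(m[0]), m[2].index(1))  # 3 21 20
```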

Training stability and hyperparameter sensitivity

To evaluate hyperparameter sensitivity, we designed a systematic experiment with seven groups of hyperparameter configurations: A-1, A-2, and A-3 (varying batch size); B-1, B-2, and A-3 (varying learning rate); C-1, C-2, and A-3 (varying the number of GNN layers); D-1, D-2, and A-3 (varying dropout rate); E-1, A-2, and B-2 (combinations of batch size and learning rate); F-1, C-2, and D-3 (combinations of GNN layer count and dropout rate); and G-1, G-2, and G-3 (varying optimizers). The hyperparameter settings for each group are listed in Table S14, and the test results for each group in Table S15.

DPEG was trained with hyperparameter configuration A-3 for up to 1000 epochs with early stopping (patience = 15). A composite performance metric was computed every five epochs, and training was terminated if it failed to improve for 15 consecutive evaluations (i.e. 75 epochs). The convergence curves of the loss function and of the performance metrics on each dataset are shown in Figures S7–S14 and S15–S22, respectively.
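Because patience is counted in evaluations and the metric is computed every five epochs, training stops after 15 × 5 = 75 epochs without improvement. The Python sketch below shows this schedule; `evaluate` is a hypothetical callback standing in for the composite metric, which is assumed to be higher-is-better, and the training step itself is omitted.

```python
def train_with_early_stopping(evaluate, max_epochs=1000, eval_every=5, patience=15):
    """Return (best_epoch, stop_epoch) under patience-based early stopping.

    `evaluate(epoch)` is a hypothetical callback returning the composite metric
    (assumed higher is better); the training step itself is omitted.
    """
    best_metric, best_epoch, stale = float("-inf"), 0, 0
    for epoch in range(1, max_epochs + 1):
        # ... one training epoch would run here ...
        if epoch % eval_every:
            continue                      # only evaluate every `eval_every` epochs
        metric = evaluate(epoch)
        if metric > best_metric:
            best_metric, best_epoch, stale = metric, epoch, 0
        else:
            stale += 1                    # patience counts evaluations, not epochs
            if stale >= patience:
                return best_epoch, epoch
    return best_epoch, max_epochs


# Metric improves until epoch 100, then plateaus: 15 stale evaluations = 75 epochs.
print(train_with_early_stopping(lambda epoch: min(epoch, 100)))  # (100, 175)
```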

Human–virus PPI prediction network

We constructed a PPI prediction network to visualize potential interactions between human and virus proteins, giving researchers an intuitive view for exploring molecular mechanisms and underlying biological functions. DPEG generated the network from the virus–human interaction dataset; because the dataset is large, Fig. 12 shows only the top 100 proteins. In the network, blue nodes represent human proteins and green nodes represent virus proteins. Black edges indicate correct predictions (solid lines: true positives, i.e. experimentally validated interactions that were predicted; dashed lines: true negatives). Red edges indicate incorrect predictions (solid lines: false negatives; dashed lines: false positives). This visualization helps to identify key interaction hubs quickly and to assess prediction reliability systematically.
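The edge encoding in Fig. 12 can be expressed as a small lookup from the true and predicted labels, following the figure legend: edges are gray/black when the prediction is correct and red when it is wrong, solid for experimentally known interactions and dashed for non-interacting pairs. This Python sketch is illustrative only; the pair records and the 0.5 decision threshold are assumptions.

```python
def edge_style(true_label: int, predicted_label: int) -> tuple[str, str, str]:
    """Map one human-virus pair to (outcome, colour, line style) as in the Fig. 12 legend."""
    outcome = {(1, 1): "TP", (0, 0): "TN", (1, 0): "FN", (0, 1): "FP"}[(true_label, predicted_label)]
    colour = "black" if true_label == predicted_label else "red"  # correct vs. incorrect prediction
    style = "solid" if true_label == 1 else "dashed"              # known interaction vs. non-interaction
    return outcome, colour, style


# Hypothetical pairs: (human protein, virus protein, true label, predicted probability)
pairs = [("H1", "V1", 1, 0.93), ("H2", "V1", 0, 0.12), ("H3", "V2", 1, 0.31), ("H4", "V2", 0, 0.77)]
for human, virus, label, prob in pairs:
    print(human, virus, edge_style(label, int(prob >= 0.5)))  # assumed 0.5 decision threshold
```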

Figure 12.

Network diagram of the top 100 predicted human–virus PPIs, with nodes color-coded: green for virus proteins, light blue for human proteins. Edges are styled to indicate interaction types: solid gray (True Positive), red (False Negative), dashed gray (True Negative), and dashed red (False Positive).

Human–virus PPI prediction network (top 100 proteins).

Conclusion

In this paper, we propose DPEG, a novel sequence-driven framework for PPI prediction that overcomes the limitations of existing methods in capturing long-range dependencies, residue-level interactions, and variable-length sequence modeling. By integrating protein language models with dynamic graph attention networks, DPEG eliminates reliance on structural annotations and enables scalable, sequence-only PPI analysis. The framework leverages ESM-2 to encode sequences into residue-level graphs enriched with evolutionary and physicochemical semantics, while a flexible graph construction module adapts to arbitrary sequence lengths. To enhance sensitivity and interpretability, DPEG introduces two key components: a gated attention mechanism to refine noisy residue representations and a dynamic graph attention mechanism to prioritize functionally critical interaction motifs, ensuring computational efficiency without sacrificing granularity.

Comprehensive evaluations on four diverse PPI datasets demonstrate that DPEG achieves state-of-the-art performance across species and interaction types, outperforming sequence-based baselines. The framework exhibits strong cross-dataset generalizability, validating its robustness in real-world scenarios where structural data are absent. Its graph architecture ensures scalability to proteome-wide studies while maintaining sensitivity to subtle residue-level patterns, addressing a critical gap in computational PPI prediction.

While DPEG performs strongly in PPI prediction, a primary limitation is its computational complexity, which grows superlinearly with sequence length because the number of graph edges capturing spatial interactions scales quadratically (O(L²) for a sequence of length L). To address this, future work will prioritize sparse graph construction strategies (e.g. K-nearest-neighbor graphs or physics-based functional graphs) that reduce edge complexity to near-linear (e.g. O(kL) for a K-nearest-neighbor graph with k neighbors per residue), enabling efficient handling of longer sequences.
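To illustrate the sparsification idea, the NumPy sketch below builds a K-nearest-neighbor edge list over hypothetical per-residue feature vectors, giving k·L directed edges instead of the L(L − 1) of a complete graph. It is not DPEG's graph construction: the feature vectors, the Euclidean metric, and k are assumptions. This exact version still computes all pairwise distances, so approximate nearest-neighbor search would be needed to make the construction itself sub-quadratic.

```python
import numpy as np


def knn_edges(x: np.ndarray, k: int) -> np.ndarray:
    """Directed k-nearest-neighbour edges over per-residue feature vectors.

    Returns a (2, L*k) array of (source, target) indices. This exact version still
    computes all pairwise distances (O(L^2) time and memory); only the number of
    edges passed to the GNN drops from L*(L-1) to L*k.
    """
    sq = (x ** 2).sum(axis=1)
    dist = sq[:, None] + sq[None, :] - 2.0 * x @ x.T  # squared Euclidean distances
    np.fill_diagonal(dist, np.inf)                     # no self-loops
    nbrs = np.argsort(dist, axis=1)[:, :k]             # k closest residues per residue
    src = np.repeat(np.arange(x.shape[0]), k)
    return np.stack([src, nbrs.ravel()])


L, k = 300, 8
feats = np.random.default_rng(0).normal(size=(L, 32))  # hypothetical residue features
edges = knn_edges(feats, k)
print(edges.shape[1], "kNN edges vs", L * (L - 1), "edges in a complete graph")
```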

Beyond this, we plan to extend DPEG to model multi-protein complexes and integrate uncertainty quantification for confidence scoring, which will enhance its utility in high-throughput screening. Enhancing the dynamic attention mechanism to pinpoint critical interaction sites (e.g. binding domains) will also improve biological interpretability, supporting experimental validation. Additionally, exploring transfer learning across species and low-resource organisms will broaden the model’s applicability, while its extensible architecture may be adapted to other biomolecular tasks, such as RNA–protein interaction prediction or ligand binding affinity estimation. By unifying sequence semantics with graph-based reasoning, DPEG paves the way for scalable, resource-efficient analyses of protein interactomes and beyond.

Key Points

  • DPEG employs dynamic graph attention with adaptive gating to prioritize critical functional motifs and model residue-level interactions, achieving high sensitivity without relying on protein structural data.

  • DPEG adapts to variable-length sequences through ESM-2-based residue graphs, preserving full biochemical context and eliminating information loss from sequence truncation.

  • DPEG integrates evolutionary sequence semantics with hierarchical graph neural networks, enabling state-of-the-art cross-dataset PPI prediction accuracy and robust generalizability.

Supplementary Material

Supplementary_File_bbaf517

Contributor Information

Shunpeng Pang, School of Computer Engineering, WeiFang University, 5147 East Dongfeng Road, Kuiwen District, Weifang, Shandong 261061, China.

Mingjian Jiang, School of Information and Control Engineering, Qingdao University of Technology, 777 East Jialingjiang Road, Huangdao District, Qingdao, Shandong 266525, China.

Shugang Zhang, College of Computer Science and Technology, Ocean University of China, 1299 Sansha Road, Huangdao District, Qingdao, Shandong 266100, China.

Shuang Wang, College of Computer Science and Technology, China University of Petroleum, 66 West Changjiang Road, Huangdao District, Qingdao, Shandong 266580, China.

Zhen Li, College of Computer Science and Technology, Qingdao University, 308 Ningxia Road, Laoshan District, Qingdao, Shandong 266071, China.

Jing Sun, School of Computer Engineering, WeiFang University, 5147 East Dongfeng Road, Kuiwen District, Weifang, Shandong 261061, China.

Yuanyuan Zhang, School of Information and Control Engineering, Qingdao University of Technology, 777 East Jialingjiang Road, Huangdao District, Qingdao, Shandong 266525, China.

Li Guo, Peking University Institute of Advanced Agricultural Sciences, Shandong Laboratory of Advanced Agricultural Sciences in Weifang, Weifang Key Laboratory of Grapevine Improvement and Utilization, 699 Binhu Road, Xiashan District, Weifang, Shandong 261325, China.

Author contributions

Shunpeng Pang (Conceptualization, Methodology, Investigation, Writing—Original Draft, Writing—Review & Editing), Mingjian Jiang (Conceptualization, Methodology, Writing—Review & Editing, Supervision), Shugang Zhang (Methodology, Investigation, Writing—Review & Editing), Shuang Wang (Investigation, Writing—Review & Editing), Zhen Li (Investigation, Writing—Review & Editing), Jing Sun (Investigation, Writing—Review & Editing), Yuanyuan Zhang (Investigation, Writing—Review & Editing), and Li Guo (Investigation, Writing—Review & Editing)

Conflict of interest: The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Funding

This work is supported by the Natural Science Foundation of Shandong Province (No. ZR2022QF111) and the Scientific Research Foundation for Ph.D (No. WFU2023BS47).

Data availability

The complete source code, data sets, and models used in this study are made publicly available on GitHub at: https://github.com/Bio-Joint-Lab/DPEG.

References

  • 1. Masumshah R, Eslahchi C. DPSP: a multimodal deep learning framework for polypharmacy side effects prediction. Bioinforma Adv 2023;3:vbad110. 10.1093/bioadv/vbad110
  • 2. Masumshah R, Aghdam R, Eslahchi C. A neural network-based method for polypharmacy side effects prediction. BMC Bioinformatics 2021;22:385. 10.1186/s12859-021-04298-y
  • 3. Kovács IA, Luck K, Spirohn K. et al. Network-based prediction of protein interactions. Nat Commun 2019;10:1240. 10.1038/s41467-019-09177-y
  • 4. Liu L, Zhu X, Ma Y. et al. Combining sequence and network information to enhance protein–protein interaction prediction. BMC Bioinformatics 2020;21:1–13. 10.1186/s12859-020-03896-6
  • 5. Qiu J, Chen K, Zhong C. et al. Network-based protein-protein interaction prediction method maps perturbations of cancer interactome. PLoS Genet 2021;17:e1009869. 10.1371/journal.pgen.1009869
  • 6. Lun H, Wang X, Huang Y-A. et al. A novel network-based algorithm for predicting protein–protein interactions using gene ontology. Front Microbiol 2021;12:735329. 10.3389/fmicb.2021.735329
  • 7. Wang X-W, Madeddu L, Spirohn K. et al. Assessment of community efforts to advance network-based prediction of protein–protein interactions. Nat Commun 2023;14:1582. 10.1038/s41467-023-37079-7
  • 8. Zhang F, Chang S, Wang B. et al. DSSGNN-PPI: a protein–protein interactions prediction model based on double structure and sequence graph neural networks. Comput Biol Med 2024;177:108669. 10.1016/j.compbiomed.2024.108669
  • 9. Sijbesma E, Visser E, Plitzko K. et al. Structure-based evolution of a promiscuous inhibitor to a selective stabilizer of protein–protein interactions. Nat Commun 2020;11:3954. 10.1038/s41467-020-17741-0
  • 10. Zheng C, Liu Y, Sun F. et al. Predicting protein–protein interactions between rice and blast fungus using structure-based approaches. Front Plant Sci 2021;12:690124. 10.3389/fpls.2021.690124
  • 11. Chessari G, Hardcastle IR, Ahn JS. et al. Structure-based design of potent and orally active isoindolinone inhibitors of MDM2-p53 protein–protein interaction. J Med Chem 2021;64:4071–88. 10.1021/acs.jmedchem.0c02188
  • 12. Bryant P, Pozzati G, Elofsson A. Improved prediction of protein-protein interactions using AlphaFold2. Nat Commun 2022;13:1265. 10.1038/s41467-022-28865-w
  • 13. Mischley V, Maier J, Chen J. et al. PPIscreenML: structure-based screening for protein-protein interactions using AlphaFold. eLife 2024;13:RP98179. 10.7554/eLife.98179.1
  • 14. Jumper J, Evans R, Pritzel A. et al. Highly accurate protein structure prediction with AlphaFold. Nature 2021;596:583–9. 10.1038/s41586-021-03819-2
  • 15. Wang L, Li F-l, Ma X-y. et al. PPI-Miner: a structure and sequence motif co-driven protein–protein interaction mining and modeling computational method. J Chem Inf Model 2022. 10.1021/acs.jcim.2c01033
  • 16. Tsukiyama S, Hasan MM, Fujii S. et al. LSTM-PHV: prediction of human–virus protein–protein interactions by LSTM with word2vec. Brief Bioinform 2021;22:bbab228. 10.1093/bib/bbab228
  • 17. Wu J, Liu B, Zhang J. et al. DL-PPI: a method on prediction of sequenced protein–protein interaction based on deep learning. BMC Bioinformatics 2023;24:473. 10.1186/s12859-023-05594-5
  • 18. Romero-Molina S, Ruiz-Blanco YB, Harms M. et al. PPI-detect: a support vector machine model for sequence-based prediction of protein–protein interactions. J Comput Chem 2019;40:1233–42. 10.1002/jcc.25780
  • 19. Yao Y, Du X, Diao Y. et al. An integration of deep learning with feature embedding for protein–protein interaction prediction. PeerJ 2019;7:e7126. 10.7717/peerj.7126
  • 20. Chen M, Ju CJ-T, Zhou G. et al. Multifaceted protein–protein interaction prediction based on Siamese residual RCNN. Bioinformatics 2019;35:i305–14. 10.1093/bioinformatics/btz328
  • 21. Hu X, Feng C, Zhou Y. et al. DeepTrio: a ternary prediction system for protein–protein interaction using mask multiple parallel convolutional neural networks. Bioinformatics 2022;38:694–702. 10.1093/bioinformatics/btab737
  • 22. Oughtred R, Stark C, Breitkreutz B-J. et al. The BioGRID interaction database: 2019 update. Nucleic Acids Res 2019;47:D529–41. 10.1093/nar/gky1079
  • 23. Guo Y, Yu L, Wen Z. et al. Using support vector machine combined with auto covariance to predict protein–protein interactions from protein sequences. Nucleic Acids Res 2008;36:3025–30. 10.1093/nar/gkn159
  • 24. Hashemifar S, Neyshabur B, Khan AA. et al. Predicting protein–protein interactions through sequence-based deep learning. Bioinformatics 2018;34:i802–10. 10.1093/bioinformatics/bty573
  • 25. Salwinski L, Miller CS, Smith AJ. et al. The database of interacting proteins: 2004 update. Nucleic Acids Res 2004;32:D449–51. 10.1093/nar/gkh086
  • 26. Xenarios I, Salwínski L, Duan XJ. et al. DIP, the database of interacting proteins: a research tool for studying cellular networks of protein interactions. Nucleic Acids Res 2002;30:303–5. 10.1093/nar/30.1.303
  • 27. Liu-Wei W, Kafkas Ş, Chen J. et al. DeepViral: prediction of novel virus–host interactions from protein sequences and infectious disease phenotypes. Bioinformatics 2021;37:2722–9. 10.1093/bioinformatics/btab147
  • 28. Baranwal M, Magner A, Saldinger J. et al. Struct2Graph: a graph attention network for structure based predictions of protein–protein interactions. BMC Bioinformatics 2022;23:370. 10.1186/s12859-022-04910-9
  • 29. Hie B, Candido S, Lin Z. et al. A high-level programming language for generative protein design. bioRxiv 2022. 10.1101/2022.12.21.521526
  • 30. Brody S, Alon U, Yahav E. How attentive are graph attention networks? International Conference on Learning Representations 2022. https://openreview.net/forum?id=F72ximsx7C1
  • 31. You Z-H, Zhu L, Zheng C-H. et al. Prediction of protein–protein interactions from amino acid sequences using a novel multi-scale continuous and discontinuous feature set. BMC Bioinformatics 2014;15:S9. 10.1186/1471-2105-15-S15-S9
  • 32. You Z-H, Chan KCC, Hu P. Predicting protein–protein interactions from primary protein sequences using a novel multi-scale local feature representation scheme and the random forest. PLoS One 2015;10:e0125811. 10.1371/journal.pone.0125811
  • 33. Li W, Godzik A. Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics 2006;22:1658–9. 10.1093/bioinformatics/btl158

Supplementary Materials

Supplementary_File_bbaf517


Articles from Briefings in Bioinformatics are provided here courtesy of Oxford University Press
