Abstract
Background
Accurate prediction of drug–target interactions (DTIs) is essential for advancing drug discovery. Although numerous computational methods have been proposed, many exhibit limited generalization, particularly when dealing with unseen drugs or targets.
Results
To address this challenge, we introduce GPS-DTI, a deep learning framework designed to capture both local and global features of drugs and proteins, thereby enhancing predictive robustness. Specifically, GPS-DTI employs a graph isomorphism network with edge features (GINE)–based graph neural network, combined with a multi-head attention mechanism (MHAM), to effectively model the structural characteristics of drug molecules. For proteins, representations are derived from the pre-trained Evolutionary Scale Modeling (ESM-2) model and further refined through convolutional neural networks (CNNs), yielding rich feature embeddings. A cross-attention module integrates drug and protein features, uncovering biologically meaningful interactions and improving model interpretability.
Conclusions
Comprehensive benchmarking across in-domain and cross-domain DTI prediction tasks demonstrates that GPS-DTI outperforms existing methods, underscoring its strong generalization capability. Notably, the model achieves state-of-the-art performance on drug–target affinity (DTA) tasks and shows robust adaptability when evaluated on an independent Coronavirus Disease 2019 (COVID-19)–related test set. Furthermore, visualization of cross-attention maps offers interpretable insights into key molecular interactions, highlighting the potential of GPS-DTI in real-world drug discovery applications.
Supplementary Information
The online version contains supplementary material available at 10.1186/s12915-025-02456-9.
Keywords: Drug–target interaction, Graph neural networks, Cross-attention
Background
Predicting drug–target interaction (DTI) is essential for drug discovery [1–4]. While conventional in vitro assays are reliable, they are costly and time-consuming, making them unsuitable for large-scale data applications [5]. In contrast, computational approaches offer an efficient alternative for identifying potential DTI, thereby accelerating the drug discovery process and reducing development costs [6, 7]. As a result, computer-aided DTI prediction has garnered significant attention, offering new opportunities for advancement in the biomedical field [8, 9].
Computational methods for DTI prediction typically fall into three categories. The first category comprises ligand-based approaches, which infer interactions by measuring the similarity between candidate drugs and known ligands of target proteins [10]. However, their efficacy is limited when few ligands are known for a given target [11]. The second category includes methods that utilize the three-dimensional (3D) structures of drugs and target proteins to model interactions [12–14]. These approaches require accurate structural data, making them unsuitable for targets without experimentally resolved 3D conformations [15]. The third category consists of chemogenomics-based methods that combine different types of data, including genome sequences and chemical structures, to predict interactions. Due to the limitations of the first two categories in terms of data availability and applicability, recent research has increasingly focused on chemogenomics-based approaches. These methods often leverage machine learning and deep learning techniques to enhance prediction accuracy and efficiency.
The effectiveness of DTI prediction has been significantly enhanced with the widespread application of machine learning and deep learning techniques in the third category of chemogenomics-based methods. Traditional machine learning techniques, including Random Forests (RFs) [16, 17] and Support Vector Machines (SVMs) [18–20], have been widely utilized in DTI prediction [21]. For instance, Fu long et al. [22] utilized molecular and reaction characteristics as inputs for DTI prediction via SVM kernel functions, while Wang et al. [23] employed features selected by Boruta’s algorithm as inputs to a Random Forest model for a similar task. However, these approaches have limitations when dealing with large-scale datasets as they fail to capture complex patterns effectively. These limitations have spurred the development of end-to-end deep learning models, which demonstrate significant potential in DTI prediction. These models can be classified according to their input data types, including sequence-based, graph-based, and 3D structural data. Sequence-based models represent drugs and targets as linear sequences and use architectures like convolutional neural networks (CNNs) [24, 25] or Transformers [26, 27] to predict interactions, with notable examples including DeepDTA [3] and DeepConv-DTI [24]. Graph-based models depict drugs as molecular graphs [28] and proteins as two-dimensional (2D) maps [29, 30], facilitating the use of Graph Convolutional Networks (GCNs) to extract structural information. Notable models include GraphDTA [31] and DGraphDTA [32]. Additionally, some approaches further explore the complex relationships between drugs and targets by constructing heterogeneous graphs, such as HGTDR [33] and GSRF-DTI [34]. Lastly, 3D structure-based models leverage structural information from protein pockets [35] or molecular dynamics simulations to improve predictions [36].
Although deep learning models have made significant advances in DTI prediction, their performance in cross-domain tasks often remains unsatisfactory. Bai et al. [25] demonstrated that many models perform well within a domain but experience a marked drop in effectiveness across domains, indicating poor generalization to diverse data distributions. A major factor limiting model generalization lies in the challenges of feature extraction, particularly in representing drug and protein characteristics. Conventional methods, such as graph neural networks (GNNs) and GCNs, excel at capturing local structural information but are less effective in modeling long-range dependencies, which are critical for understanding complex drug molecules. Transformer-based architectures, such as CFSSynergy [37], have recently been introduced to enhance global feature extraction, particularly for long-chain molecules. Nevertheless, although graph transformers are effective in modeling global dependencies, they may fall short in capturing fine-grained local patterns, making them less suitable for representing short-chain molecules compared to locally focused models like CNNs. These limitations highlight the need for feature extraction approaches that balance local and global representation to improve model generalization across diverse domains.
To address these challenges, we propose GPS-DTI, a novel deep learning framework for DTI prediction that integrates both local and global feature learning. Specifically, GPS-DTI combines the Graph Isomorphism Network with Edge features (GINE) and a multi-head attention mechanism (MHAM) to comprehensively capture drug molecule characteristics. The GINE layer aggregates local structural information from neighboring nodes and edges, while the MHAM models global dependencies among atoms, thereby enhancing the model’s capacity to learn complex molecular representations. Protein features are extracted using the pre-trained Evolutionary Scale Modeling (ESM-2) model [38], which captures evolutionary information encoded in protein sequences. These features are further refined by a CNN to identify relevant local patterns. To effectively integrate drug and protein representations, we introduce a cross-attention mechanism (CAM) that dynamically highlights key interaction regions, enabling more interpretable predictions. Extensive evaluations under both in-domain and cross-domain settings demonstrate that GPS-DTI consistently outperforms state-of-the-art methods, offering superior performance and interpretable insights into drug–target interactions.
The key contributions of this study are outlined below:
To effectively capture the geometric information of drugs, we integrate GINE with MHAM layers, enabling the extraction of both local structural details and global dependencies within drug molecules.
To characterize the interaction conformation between drugs and targets, we introduce a CAM that dynamically relates drug atom features to target amino acid features, highlighting the regions most relevant to their interaction and improving interpretability.
Extensive in-domain and cross-domain evaluations demonstrate that GPS-DTI consistently outperforms state-of-the-art DTI prediction methods, showcasing superior performance and generalization ability.
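The combination of local GINE aggregation and global multi-head attention described above can be written compactly. The following is a sketch under stated assumptions, not the exact GPS-DTI formulation: the GINE update is the standard one, while the additive fusion of the two branches follows the general GPS-style layer design, and residual connections, normalization, and dropout are omitted. The symbols $\mathbf{h}_i$ (atom features), $\mathbf{e}_{ji}$ (bond features), $\mathcal{N}(i)$ (neighbors of atom $i$), and $\epsilon$ are notation introduced here for illustration.

```latex
\begin{aligned}
\mathbf{h}_i^{\mathrm{local}}  &= \mathrm{MLP}_{\mathrm{GINE}}\Big((1+\epsilon)\,\mathbf{h}_i
      + \sum_{j \in \mathcal{N}(i)} \mathrm{ReLU}\big(\mathbf{h}_j + \mathbf{e}_{ji}\big)\Big), \\
\mathbf{H}^{\mathrm{global}}   &= \mathrm{MHAM}(\mathbf{H}), \qquad
      \mathbf{H} = [\mathbf{h}_1, \ldots, \mathbf{h}_N]^{\top}, \\
\mathbf{h}_i'                  &= \mathrm{MLP}\big(\mathbf{h}_i^{\mathrm{local}} + \mathbf{h}_i^{\mathrm{global}}\big).
\end{aligned}
```

The first line only sees bonded neighbors, while the second lets every atom attend to every other atom, which is why the two branches are complementary for short-range and long-range structure.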
Results
Baseline methods
We evaluate GPS-DTI against the following baselines:
GraphDTA [31] is a GNN-based method that encodes drug molecule graphs with various GNN models and uses a CNN for protein sequence encoding.
DeepConvDTI [24] is a CNN-based model that captures local patterns from protein amino acid sequences for DTI prediction.
MolTrans [39] integrates a Transformer encoder with substructure pattern mining to improve the precision of DTI prediction and enhance interpretability.
HyperAttentionDTI [40] employs one-dimensional (1D) convolution layers and a HyperAttention module to derive feature representations from protein sequences and drugs, modeling amino acid-atom interactions for DTI prediction.
TransformerCPI [26] employs a Transformer-based encoder for sequence-only DTI prediction and introduces label reversal experiments to mitigate data bias and improve interpretability.
DrugBAN [25] uses a bilinear attention mechanism to create joint representations of drugs and proteins, providing interpretability, and incorporates the CDAN module to enhance cross-domain generalization.
CAT-DTI [41] utilizes a CNN-Transformer encoder to generate protein features and applies a CAM to merge drug and target features for DTI prediction.
Evaluation strategies and metrics
To thoroughly evaluate the performance of the model, we employ a range of evaluation strategies, categorized into intra-domain and cross-domain evaluations.
In the intra-domain evaluation, we use five-fold cross-validation. Specifically, for each benchmark dataset, 10% of the data is first held out as an independent test set, and the remaining 90% is divided into five folds for cross-validation. Additionally, we conducted three types of cold-start experiments: drug-cold, protein-cold, and drug–target pair-cold. In these experiments, 70% of drugs, targets, or drug–target pairs were randomly selected for training, while the remaining 30% were allocated to the validation and test sets in a 3:7 ratio. This ensures that the drugs, targets, or pairs in the test set are unseen by the model during training.
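The two intra-domain protocols can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' code: the function names `holdout_then_kfold` and `cold_split` are hypothetical, samples are assumed to be `(drug, protein, label)` tuples, and only the entity-level drug-cold and protein-cold cases are covered (the pair-cold case needs additional filtering).

```python
import random


def holdout_then_kfold(n_samples, test_frac=0.1, k=5, seed=42):
    """Hold out a test set, then split the remaining indices into k folds."""
    rng = random.Random(seed)
    idx = list(range(n_samples))
    rng.shuffle(idx)
    n_test = int(round(n_samples * test_frac))
    test, rest = idx[:n_test], idx[n_test:]
    # Round-robin assignment keeps fold sizes within one sample of each other.
    folds = [rest[i::k] for i in range(k)]
    return test, folds


def cold_split(pairs, key, train_frac=0.7, val_share=0.3, seed=42):
    """Entity-level cold split: held-out entities never appear in training.

    `key` maps a sample to the entity to hold out, e.g. the drug or the protein.
    The held-out entities are divided between validation and test (3:7 by default).
    """
    rng = random.Random(seed)
    entities = sorted({key(p) for p in pairs})
    rng.shuffle(entities)
    n_train = int(round(len(entities) * train_frac))
    train_ents = set(entities[:n_train])
    held_out = entities[n_train:]
    val_ents = set(held_out[: int(round(len(held_out) * val_share))])
    train = [p for p in pairs if key(p) in train_ents]
    val = [p for p in pairs if key(p) in val_ents]
    test = [p for p in pairs if key(p) not in train_ents and key(p) not in val_ents]
    return train, val, test
```

For a drug-cold split one would call `cold_split(pairs, key=lambda p: p[0])`; passing `key=lambda p: p[1]` gives the protein-cold variant.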
For the cross-domain evaluation, we adopted the segmentation method from DrugBAN [25], which utilizes extended connectivity fingerprint, up to four bonds (ECFP4) and pseudo-amino acid composition (PSC) algorithms for clustering drugs and targets. We randomly selected 60% of the clusters as source domain data and the remaining 40% as target domain data. This approach ensures that the source and target domain data come from different distributions, providing a more accurate assessment of the model’s generalization ability.
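A cluster-based split of this kind can be sketched as follows. The sketch assumes that cluster assignments have already been computed (for example from ECFP4 and PSC features) and are passed in as plain dictionaries; the function name `cross_domain_split` is hypothetical, and the choice to drop pairs whose drug and protein fall in different domains is an assumption, not something stated in this section.

```python
import random


def _pick_source(cluster_ids, frac, rng):
    """Randomly choose a fraction of cluster ids to form the source domain."""
    ids = sorted(set(cluster_ids))
    rng.shuffle(ids)
    return set(ids[: int(round(len(ids) * frac))])


def cross_domain_split(pairs, drug_cluster, protein_cluster, source_frac=0.6, seed=42):
    """Split (drug, protein, label) pairs into source and target domains by cluster.

    drug_cluster / protein_cluster map each entity to a precomputed cluster id.
    """
    rng = random.Random(seed)
    src_drug = _pick_source(drug_cluster.values(), source_frac, rng)
    src_prot = _pick_source(protein_cluster.values(), source_frac, rng)
    source, target = [], []
    for drug, protein, label in pairs:
        d_in = drug_cluster[drug] in src_drug
        p_in = protein_cluster[protein] in src_prot
        if d_in and p_in:
            source.append((drug, protein, label))
        elif not d_in and not p_in:
            target.append((drug, protein, label))
        # Assumption: mixed pairs (one entity per domain) are discarded.
    return source, target
```

Because whole clusters are assigned to one side, no drug or protein in the target domain shares a cluster with anything seen in the source domain.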
Since DTI prediction is a classification task, we selected well-established evaluation metrics: area under the receiver operating characteristic curve (AUROC), area under the precision–recall curve (AUPRC), and F1 score. The F1 score is defined as:
$$\mathrm{F1} = \frac{2 \times \mathrm{TP}}{2 \times \mathrm{TP} + \mathrm{FP} + \mathrm{FN}} \tag{1}$$
where true positives (TP) and true negatives (TN) include the counts of correctly identified interacting and non-interacting drug–target pairs, respectively. False positives (FP) and false negatives (FN) refer to the counts of drug–target pairs mistakenly classified as interacting and non-interacting, respectively. Given that AUROC serves as a comprehensive measure of model performance across various decision thresholds, we retain the model with the highest AUROC on the validation set and subsequently evaluate it on the test set to report the performance metrics.
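As a small worked illustration of these metrics, the sketch below computes the confusion counts, the F1 score, and a pairwise AUROC in plain Python. The function names are hypothetical and the code is for illustration only; in practice a library implementation would normally be used.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) for binary labels in {0, 1}."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def f1_score(y_true, y_pred):
    """F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when the denominator is 0."""
    tp, _, fp, fn = confusion_counts(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def auroc(y_true, scores):
    """Probability that a random positive is scored above a random negative.

    Ties count as 0.5. Quadratic in the number of samples, so only for small inputs.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Note that TN does not enter the F1 score, which is why F1 is informative when non-interacting pairs greatly outnumber interacting ones.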
Experimental setup
Our proposed method is implemented in PyTorch. For all experiments, the model was trained for 50 epochs with a batch size of 64 and a fixed random seed of 42. We used the AdamW optimizer, setting the learning rate to 1e-4 and weight decay to 2e-5 for in-domain tasks, while a learning rate of 5e-5 was used for cross-domain tasks.
Intra-domain evaluation
To evaluate the performance of GPS-DTI, we compared it against existing models, including SVM [18], RF [16], GraphDTA [31], DeepConvDTI [24], TransformerCPI [26], MolTrans [39], HyperAttDTI [40], DrugBAN [25], and CAT-DTI [41] on five benchmark datasets under a five-fold cross-validation setting. As shown in Fig. 1, GPS-DTI consistently achieves superior performance across most datasets and evaluation metrics, demonstrating its effectiveness in drug–target interaction prediction.
Fig. 1.
Five-fold cross-validation performance of GPS-DTI across five datasets
Specifically, on the DrugBank and BioSNAP datasets, GPS-DTI outperforms all baseline models across all metrics, demonstrating strong predictive power and stability. On the BindingDB dataset, GPS-DTI performs slightly below DrugBAN but remains robust on this challenging dataset. On the Human dataset, GPS-DTI achieves the highest F1 score, AUROC, and AUPRC among all benchmark models. Similarly, on the C.elegans dataset, GPS-DTI leads across all metrics. Statistical analyses of the model’s performance, including significance and stability tests, are provided in Additional file 1: Figs. S1–S5 and Table S1. A detailed comparison of model complexity and computational resource consumption, including the number of parameters, floating-point operations (FLOPs), memory usage, and training and inference efficiency, is presented in Additional file 2: Table S2, which helps assess the practicality of GPS-DTI for real-world deployment.
While most baseline models achieve high scores in random segmentation settings, these results may not fully reflect real-world scenarios. To further assess model robustness, we conducted cold-start experiments, including drug-based, protein-based, and drug-protein pair-based settings, on the BindingDB and BioSNAP datasets. These scenarios simulate realistic conditions by testing model performance on unseen drugs, proteins, or their combinations.
As shown in Fig. 2, all models exhibited performance declines under cold-start conditions, underscoring the increased complexity of these scenarios. However, GPS-DTI consistently achieved the best performance across all three cold-start settings, demonstrating its robustness and adaptability.
Fig. 2.
Comparison of model performance under three cold-start conditions on BindingDB and BioSNAP
In the Target-cold setting, GPS-DTI showed significant improvements over other models. On the BindingDB dataset, it achieved gains of 4.6%, 4.7%, and 1.5% in AUROC, AUPRC, and F1, respectively, compared to the next-best model. On the BioSNAP dataset, these improvements were even more pronounced, with gains of 6.7%, 5.0%, and 6.6%, respectively. In the Drug-cold setting, GPS-DTI maintained high performance across all metrics, consistently outperforming other models on both datasets. These results highlight the model’s ability to generalize effectively to unseen drugs. In the most challenging Pair-cold setting, where neither drugs nor targets from the training set are present in the test set, GPS-DTI demonstrated strong stability and achieved superior performance across all metrics. It outperformed all benchmark models on both datasets, showcasing its capacity to capture complex interactions even in highly novel scenarios.
Overall, these results validate the robustness, adaptability, and generalization capability of GPS-DTI. Its strong performance across random segmentation and cold-start scenarios establishes it as a reliable tool for drug–target interaction prediction in realistic and challenging settings.
Cross-domain evaluation
To evaluate the generalizability of our model in cross-domain scenarios, we conducted experiments using the BindingDB and BioSNAP datasets, as shown in Fig. 3. Compared to intra-domain experiments, most models exhibit a notable decline in performance when tested on data drawn from a different distribution. However, GPS-DTI consistently demonstrates superior performance in this challenging cross-domain setting.
Fig. 3.
Results of cross-domain experiments on different datasets (a, b)
On the BioSNAP dataset, GPS-DTI achieved the highest AUROC, AUPRC, and F1 scores, outperforming all baseline models. Specifically, GPS-DTI demonstrated significant improvements in AUPRC and F1 scores compared to the second-best models, highlighting its ability to generalize effectively in cross-domain scenarios. On the BindingDB dataset, GPS-DTI also outperformed all other models, achieving the highest AUROC (0.699), AUPRC (0.599), and F1 score (0.680). While the overall performance on BindingDB was lower than on BioSNAP, GPS-DTI maintained a clear advantage over baseline models, emphasizing its robustness across datasets.
These results confirm that GPS-DTI effectively handles cross-domain variability, maintaining strong predictive accuracy and demonstrating its potential as a reliable model for drug–target interaction prediction in diverse datasets.
Performance evaluation on molecules of different sizes
To further evaluate the model’s ability to capture long-range dependencies, molecules were categorized into four groups based on their number of atoms: < 10, 10–20, 20–30, and > 30. Performance was assessed using AUPRC, AUROC, and F1 metrics across the Human, C.elegans, BioSNAP, and BindingDB datasets, as shown in Table 1. For molecules with fewer than 10 atoms, the model consistently achieved high performance metrics across all datasets, with values nearing 1.0. This demonstrates that the model is well-suited for extracting both local and global features in simpler molecules, where long-range dependencies are minimal or unnecessary. For molecules with 10–20 atoms, a slight decline in F1 scores is observed in datasets like C.elegans and BindingDB, potentially reflecting the increased complexity introduced by moderately sized molecules. However, AUPRC and AUROC metrics remained consistently high, underscoring the model's robustness in adapting to more intricate molecular structures despite the added challenge of longer-range interactions. As molecular size increased to 20–30 atoms, the model showed a marked improvement in performance, particularly in the BioSNAP dataset, where all metrics displayed a clear upward trend. This suggests that the model effectively captures long-range dependencies as molecular structures become more complex, leveraging its architecture to analyze intricate patterns.
Table 1.
Performance of GPS-DTI on drugs with varying atom counts
| Dataset | Atom range | AUROC | AUPRC | F1 |
|---|---|---|---|---|
| Human | < 10 | 0.9997 | 0.9997 | 0.9973 |
| Human | 10–20 | 0.9916 | 0.9913 | 0.9779 |
| Human | 20–30 | 0.9986 | 0.9986 | 0.9911 |
| Human | > 30 | 0.9996 | 0.9991 | 0.9880 |
| C.elegans | < 10 | 0.9965 | 0.9972 | 0.9814 |
| C.elegans | 10–20 | 0.9924 | 0.9903 | 0.9611 |
| C.elegans | 20–30 | 0.9987 | 0.9993 | 0.9809 |
| C.elegans | > 30 | 0.9987 | 0.9990 | 0.9919 |
| BioSNAP | < 10 | 0.9742 | 0.9777 | 0.9353 |
| BioSNAP | 10–20 | 0.9870 | 0.9870 | 0.9576 |
| BioSNAP | 20–30 | 0.9840 | 0.9820 | 0.9577 |
| BioSNAP | > 30 | 0.9863 | 0.9853 | 0.9600 |
| BindingDB | < 10 | 1 | 1 | 1 |
| BindingDB | 10–20 | 0.9749 | 0.9589 | 0.9538 |
| BindingDB | 20–30 | 0.9877 | 0.9824 | 0.9612 |
| BindingDB | > 30 | 0.9915 | 0.9894 | 0.9694 |
These results highlight the model’s robustness and scalability across varying molecular sizes. While smaller molecules require less reliance on long-range dependencies, the model excels in handling larger molecules, demonstrating its capacity to learn and adapt to molecular complexity. Notably, the model’s recovery and high performance for molecules with 20–30 atoms underscore its capability to manage the transition from moderate to high complexity.
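The size-stratified analysis amounts to bucketing each drug by its atom count and scoring each bucket separately. A minimal sketch of the bucketing step is given below. The function names are hypothetical, atom counts are assumed to be precomputed, and the handling of the shared boundaries (20 and 30 atoms) is an assumption because the ranges as reported overlap at their endpoints.

```python
from collections import defaultdict


def atom_bin(n_atoms):
    """Map an atom count to one of the four size groups.

    Assumption: boundary values 20 and 30 fall into the lower group.
    """
    if n_atoms < 10:
        return "<10"
    if n_atoms <= 20:
        return "10-20"
    if n_atoms <= 30:
        return "20-30"
    return ">30"


def group_by_atom_bin(samples):
    """Group (n_atoms, label, score) samples by size bin for per-bin evaluation."""
    groups = defaultdict(list)
    for n_atoms, label, score in samples:
        groups[atom_bin(n_atoms)].append((label, score))
    return dict(groups)
```

Each group's labels and scores can then be passed to the same AUROC, AUPRC, and F1 routines used for the full test set.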
Performance evaluation of drug–target affinity prediction
To further evaluate the performance of our model across different drug–target affinity prediction tasks, we conducted experiments on the DAVIS dataset under three cold-start conditions: drug-, target-, and drug–target pair-based cold-starts. Following the dataset division from AttentionMGT-DTA [42], we compared GPS-DTI against baseline models. Model performance was assessed using three standard regression metrics: Mean Squared Error (MSE), Concordance Index (CI), and modified coefficient of determination ($r_m^2$), which respectively evaluate prediction accuracy, ranking consistency, and correlation between predicted and true affinity values. The results, summarized in Fig. 4, demonstrate the superior performance of GPS-DTI across all conditions.
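The three regression metrics can be sketched in plain Python as follows. This is an illustrative sketch: the function names are hypothetical, and the modified coefficient of determination uses the commonly cited form $r_m^2 = r^2\,(1 - \sqrt{|r^2 - r_0^2|})$, where $r_0^2$ comes from a regression through the origin. Published DTA codebases differ slightly in how they implement this term, so the exact variant used for the reported numbers is not assumed here.

```python
import math


def mse(y_true, y_pred):
    """Mean squared error between true and predicted affinities."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def concordance_index(y_true, y_pred):
    """Fraction of comparable pairs whose predicted order matches the true order.

    Pairs with equal true values are skipped; tied predictions count as 0.5.
    """
    num, den = 0.0, 0
    for i in range(len(y_true)):
        for j in range(i + 1, len(y_true)):
            if y_true[i] == y_true[j]:
                continue
            den += 1
            hi, lo = (i, j) if y_true[i] > y_true[j] else (j, i)
            if y_pred[hi] > y_pred[lo]:
                num += 1.0
            elif y_pred[hi] == y_pred[lo]:
                num += 0.5
    return num / den if den else float("nan")


def rm2(y_true, y_pred):
    """Modified r^2: r^2 * (1 - sqrt(|r^2 - r0^2|)); assumes non-constant inputs."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    r2 = cov * cov / (var_t * var_p)  # squared Pearson correlation
    # Slope of the least-squares line through the origin, y_true ~ k * y_pred.
    k = sum(t * p for t, p in zip(y_true, y_pred)) / sum(p * p for p in y_pred)
    r0_2 = 1.0 - sum((t - k * p) ** 2 for t, p in zip(y_true, y_pred)) / var_t
    return r2 * (1.0 - math.sqrt(abs(r2 - r0_2)))
```

Lower MSE is better, while CI and $r_m^2$ are better when higher; a CI of 0.5 corresponds to random ranking.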
Fig. 4.
Performance comparison results under three cold-start conditions for the DTA task
Under the drug cold-start condition, GPS-DTI achieved a high CI (0.694) and $r_m^2$ (0.154) with the lowest MSE (0.570), highlighting its strong predictive capability in scenarios with unseen drugs. In the target cold-start condition, GPS-DTI outperformed other models, achieving the highest CI (0.8461), lowest MSE (0.2854), and a significantly improved $r_m^2$ (0.5211), indicating the model’s ability to effectively learn protein features. For the pair cold-start condition, where both drugs and targets are unseen, all models exhibited performance degradation due to the increased complexity of the task. Despite this, GPS-DTI still achieved the best overall results, showcasing its adaptability to challenging scenarios.
Compared to other models, AttentionMGT-DTA and ELECTRA-DTA demonstrated relatively stable performance across all conditions, whereas DeepDTA and GraphDTA struggled, particularly under Pair cold-start conditions. This performance gap highlights the importance of feature extraction methods in generalization. The strong performance of GPS-DTI under Target-based conditions, coupled with its slightly weaker results in Drug-based scenarios, suggests that existing models are more adept at capturing protein features than drug features. This underscores the need for further advancements in drug feature representation to enhance model performance.
In summary, GPS-DTI demonstrates excellent performance in drug–target affinity prediction, consistently outperforming baseline models across cold-start scenarios. Its adaptability to new data and robust handling of unseen drugs and targets highlight its potential as a valuable tool in computational drug discovery.
Performance validation on COVID-19 data
To further validate the practical efficacy of our model, we curated drug–target interaction data from the Coronavirus Disease 2019 (COVID-19) dataset in the Therapeutic Target Database (TTD) [43]. After filtering for drugs with Simplified Molecular Input Line Entry System (SMILES) sequences, we obtained 101 drug–target interaction entries, encompassing 50 proteins and 69 drugs, for subsequent model validation and analysis.
Negative samples were constructed by excluding positive interactions and randomly selecting negatives at ratios of 1 × , 5 × , 10 × , 15 × , and 20 × relative to the positive samples. As shown in Fig. 5, GPS-DTI consistently achieves the highest AUROC across all positive-to-negative ratios, significantly outperforming baseline models (Fig. 5a). Notably, under severe class imbalance (1:5 to 1:20), GPS-DTI maintains a consistently high F1 score (Fig. 5c), demonstrating its robustness in distinguishing positive and negative samples. Although AUPRC decreases with increasing imbalance (Fig. 5b), GPS-DTI remains competitive and achieves balanced performance in precision and recall. Evaluation results on an independent test set further confirm GPS-DTI's real-world applicability. The model excels in predicting known drug–target interactions and demonstrates strong potential for drug discovery in emerging diseases such as COVID-19. These results highlight the practical utility and adaptability of GPS-DTI, offering a robust tool for advancing drug discovery efforts.
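The negative-sampling procedure can be sketched as below. This is a sketch under stated assumptions: the function name `sample_negatives` is hypothetical, and the candidate pool is assumed to be every drug–protein combination that is not a known positive, which is one natural reading of "excluding positive interactions".

```python
import random


def sample_negatives(positives, drugs, proteins, ratio, seed=42):
    """Draw ratio * len(positives) distinct non-interacting (drug, protein) pairs.

    Assumption: any combination not listed as positive is treated as a negative.
    """
    rng = random.Random(seed)
    pos = set(positives)
    candidates = [(d, p) for d in drugs for p in proteins if (d, p) not in pos]
    n_neg = ratio * len(pos)
    if n_neg > len(candidates):
        raise ValueError("not enough candidate pairs for the requested ratio")
    # random.sample draws without replacement, so negatives are unique.
    return rng.sample(candidates, n_neg)
```

With 69 drugs, 50 proteins, and 101 positives, the pool holds 69 × 50 − 101 = 3349 candidate pairs, which is enough for the largest setting (20 × 101 = 2020 negatives).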
Fig. 5.
Comparison of model performance under different sample ratios on the COVID-19 test set. a, b, and c The results of the performance metrics with respect to the proportions for AUROC, AUPRC, and F1, respectively
Ablation study
To validate the efficacy of the GPS layer, we compared the GPS-DTI model with its four variants (Graph Attention Network (GAT), GCN, GINE, and Graph Isomorphism Network (GIN) as drug feature encoders, respectively) in a clustering-based partitioning scenario. As shown in Fig. 6, GPS consistently achieved the best predictive performance on both the BindingDB and BioSNAP datasets, confirming the effectiveness of the GPS layer in capturing drug features.
Fig. 6.
Comparison of different drug encoding methods on the BindingDB and BioSNAP datasets. a and b represent the AUROC and AUPRC results, respectively
Interestingly, a distinct trend was observed across the datasets: on the BioSNAP dataset, GCN outperformed GAT, whereas on the BindingDB dataset, the opposite was true. This discrepancy can be attributed to the differing characteristics of the datasets. The self-attention mechanism in GAT allows it to dynamically adjust the weights of neighboring nodes, enabling it to better capture local key features and handle data imbalances, such as the larger diversity of drug types compared to protein types in BindingDB. Conversely, the smoothing property of GCN facilitates the integration of global structural features, making it more effective on datasets like BioSNAP, where drug and protein types are more balanced.
Overall, GPS combines the detailed local feature extraction capability of GINE with the global dependency modeling strength of the multi-head attention mechanism. This synergy enables GPS to provide a more comprehensive and effective feature representation, outperforming traditional graph neural network models in drug–target interaction prediction.
Building on this, to analyze the specific roles of GPS-DTI’s constituent modules under the cluster-based split scenario, we designed four ablation models (M-1 to M-4) that include or exclude GINE (local feature extraction module), MHAM (global feature extraction module), and CAM, as summarized in Table 2:
M-1: Includes GINE and CAM but excludes MHAM.
M-2: Includes MHAM and CAM but excludes GINE.
M-3: Includes GINE and MHAM but excludes CAM.
M-4: Integrates all three components (GINE, MHAM, and CAM).
Table 2.
Ablation study results (AUROC) in the cross-domain setting
| Model | GINE | MHAM | CAM | BioSNAP | BindingDB |
|---|---|---|---|---|---|
| M-1 | √ | | √ | 0.732 | 0.634 |
| M-2 | | √ | √ | 0.765 | 0.658 |
| M-3 | √ | √ | | 0.759 | 0.635 |
| M-4 | √ | √ | √ | **0.781** | **0.679** |
Note: Bold values indicate the best performance across all model variants for each dataset
The results demonstrate that M-4 consistently achieves the best performance across both datasets, with AUROC values of 0.781 on BioSNAP and 0.679 on BindingDB. This highlights the synergistic effect of combining all three components, which together enable the model to capture the complexity of drug-protein interactions in cross-domain settings. Comparative analysis of the other variants further underscores the importance of each component. M-2 (MHAM + CAM) outperforms M-1 (GINE + CAM), reflecting the importance of MHAM for global feature extraction. M-3 (GINE + MHAM) also achieves higher performance than M-1 but falls short of M-4, which adds CAM, indicating that CAM contributes to capturing dynamic interactions between drugs and proteins. Neither M-2 nor M-3 matches the performance of M-4, confirming that local (GINE) and global (MHAM) feature extraction, along with CAM, are all needed for optimal model performance.
In summary, the ablation study results validate that GINE provides critical local feature extraction, MHAM enhances global feature representation, and CAM facilitates dynamic interactions, collectively enabling GPS-DTI to achieve superior predictive accuracy in cross-domain tasks.
Interpretability with cross-attention visualization
To demonstrate the interpretability of GPS-DTI, we conducted binding visualization analyses on two complexes from the Protein Data Bank (PDB) database [44], specifically 3S75 [45] and 2XFK [46], as shown in Fig. 7 (details of the visualization process are provided in Additional file 3). In these visualizations, proteins are depicted in white, drugs in sky blue, and hydrogen bonds are represented by blue dashed lines. Key protein residues and drug atoms identified by the model are highlighted in red and light orange, respectively. The lower section of the protein attention heatmap indicates the residues most critical for binding (see Additional file 4 and Additional file 5 of the Supplementary Material for full heatmaps). In the drug-molecule attention map, darker orange regions signify areas of greater model attention.
Fig. 7.
The attention visualization for DTI. a Attention visualization for 3S75. b Attention visualization for 2XFK
For PDB structure 3S75, involving furan-2-sulfonamide binding to human carbonic anhydrase 2 (CA2), GPS-DTI accurately identified the importance of the sulfonamide group in the interaction. The oxygen atom in the sulfonamide group was correctly identified as a hydrogen bond acceptor interacting with the backbone of Thr199, while the amine group served as a hydrogen bond donor to the side chain of Thr199. Additionally, the model accurately predicted the hydrogen bond between the oxygen atom in the furan ring and Thr200, although minor inaccuracies were observed for some other atoms.
For PDB structure 2XFK, involving Beta-secretase 1 binding to VTP-27999, the model further validated its interpretability by identifying critical interactions. These included the sulfonamide group acting as a hydrogen bond acceptor with Thr293 and Asn294, as well as the amine group forming a hydrogen bond with Thr293. Highlighted regions in the drug molecule, including the sulfonamide, amine, and hydroxymethyl groups, were all experimentally confirmed to participate in binding.
These findings highlight GPS-DTI’s ability to provide interpretable predictions by effectively identifying critical interactions in drug–protein binding. The biologically relevant insights offered by the model demonstrate its potential as a valuable tool in drug discovery and development.
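The way attention weights are turned into a ranking of residues can be illustrated with a small sketch. It is not the GPS-DTI implementation: the function names are hypothetical, learned query and key projections are replaced by the raw feature vectors, a single attention head is used, and residues are ranked by the mean attention they receive across drug atoms, which is only one plausible aggregation.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention_weights(drug_feats, protein_feats):
    """Scaled dot-product attention of each drug atom (query) over residues (keys).

    Returns a matrix with one row per atom; each row sums to 1.
    Assumption: identity projections instead of learned W_Q and W_K.
    """
    scale = math.sqrt(len(drug_feats[0]))
    return [
        softmax([sum(a * b for a, b in zip(q, k)) / scale for k in protein_feats])
        for q in drug_feats
    ]


def top_residues(weights, k=3):
    """Indices of the k residues with the highest mean attention across atoms."""
    n_res = len(weights[0])
    mean_attn = [sum(row[j] for row in weights) / len(weights) for j in range(n_res)]
    return sorted(range(n_res), key=lambda j: mean_attn[j], reverse=True)[:k]
```

Reading the matrix column-wise highlights residues, as in the protein heatmap, while reading it row-wise shows which atoms concentrate their attention on a few residues.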
Discussion
This study demonstrates the superior performance of GPS-DTI across diverse experimental settings, consistently outperforming existing models. We attribute this success to GPS-DTI’s comprehensive feature extraction capabilities for both proteins and drugs, as well as its stable predictive performance across varying data distributions—particularly under cold-start scenarios, where it exhibits notable adaptability and robustness. Specifically, we integrate a large-scale language model (ESM-2) with CNNs to capture both global and local protein features, and combine GINE layers with MHAM to extract local and global features from drug molecules. The synergy between the language model and CNN facilitates the rich extraction of biological and chemical information from proteins, thereby enhancing the model’s generalization to unseen data. Meanwhile, the integration of GINE and MHAM enables GPS-DTI to learn both structural details and long-range dependencies in drug molecules, further improving its robustness and adaptability.
In comparison with sequence-based models (such as MolTrans, TransformerCPI, and HyperAttDTI), we observed that these models performed relatively poorly, which we attribute mainly to their inability to fully exploit the chemical structural information of drugs. Sequence models typically represent drugs as SMILES strings, a linear encoding that does not explicitly expose molecular topology, which limits their use in complex DTI prediction. In contrast, graph-based models (such as DrugBAN and CAT-DTI) represent the chemical structure of drugs as molecular graphs, but they focus primarily on local neighborhood information and neglect the global structural features of the graph. Our graph-based comparative experiments further show that GPS-DTI outperforms conventional graph neural networks in drug feature extraction. This result highlights the advantage of the GPS layer, which integrates the local representation power of GINE with the global feature extraction capability of MHAM.
Moreover, GPS-DTI’s performance across different ranges of drug atom counts further demonstrates its ability to capture both long-range and short-range dependencies between atoms, particularly in complex molecular structures. The model learns local and global features for molecules of different sizes, indicating that it handles not only small molecules but also larger and more complex ones while maintaining strong DTI prediction performance.
Experimental results on the COVID-19 dataset also demonstrate that GPS-DTI maintains strong predictive performance in real-world scenarios, highlighting its generalization ability. Moreover, the incorporation of the CAM enhances the biological interpretability of the model by enabling the identification of key protein residues and drug atoms. This allows GPS-DTI to reveal meaningful binding regions and interaction patterns, offering valuable insights into drug–target binding mechanisms.
Despite these promising results, the present work also reveals several challenges that warrant further investigation. First, the scarcity and limited diversity of existing training datasets restrict the model’s ability to capture the full range of DTIs encountered in real-world biological systems, emphasizing the need for assembling larger and more heterogeneous datasets. Second, while the current approach primarily relies on two-dimensional chemical representations and protein sequence information, integrating three-dimensional structural features, such as spatial conformations, could significantly enhance the predictive accuracy and biological relevance of the model. Moreover, due to the lack of publicly available datasets related to certain complex DTI patterns (e.g., allosteric regulation), our model has not yet been thoroughly evaluated under such scenarios and may exhibit limitations in handling these specific mechanisms. Finally, although GPS-DTI demonstrates strong cross-domain adaptability, future research should focus on devising more powerful domain adaptation strategies to address major shifts in chemical or biological space, thereby ensuring robust performance against novel molecular scaffolds and unconventional protein targets.
Conclusions
In this study, we introduced GPS-DTI, a novel framework for DTI prediction with enhanced generalization in both cross-domain settings and when confronted with previously unseen drugs or targets. By leveraging a pre-trained ESM-2 model and convolutional neural networks for protein feature extraction, alongside a GPS layer that integrates GINE with MHAM to model drug features, GPS-DTI captures both local and global structural information critical to accurate DTI prediction. A CAM further enriches interpretability by highlighting biologically relevant interactions, thus offering greater insight into the underlying mechanisms governing drug–target binding. Extensive benchmarking experiments demonstrate that GPS-DTI achieves state-of-the-art performance across diverse datasets, excelling in both in-domain and cross-domain settings. Notably, the model maintains robust predictive capabilities on an independent COVID-19-related test set, underscoring its potential as a transformative tool in the early stages of drug discovery and development.
Methods
Benchmark dataset
In this study, we selected five public DTI datasets for our experiments: the Human, C.elegans, BioSNAP, BindingDB, and DrugBank datasets. The first two datasets were introduced in Ref. [29], while the BioSNAP dataset was constructed in Ref. [39]. The BindingDB dataset used in this study is a low-bias version, as described in Ref. [47]. The DrugBank dataset is a balanced dataset based on the DrugBank database [48], containing equal numbers of positive and negative samples. Table 3 presents the basic statistical information for each dataset, and Fig. 8 provides a detailed visualization of the distribution of protein sequence lengths, drug SMILES string lengths, and drug molecular weights across these datasets. The data reveal significant differences not only in the number of drugs and proteins but also in the distributions of drug characteristics, such as molecular weight and sequence length.
Table 3.
Summary of drug–protein interaction datasets with positive and negative samples
| Dataset | Drugs | Proteins | Interactions | Positives |
|---|---|---|---|---|
| BindingDB | 14,643 | 2,623 | 49,199 | 20,674 |
| Human | 2,726 | 2,001 | 6,728 | 3,364 |
| BioSNAP | 4,505 | 2,181 | 27,464 | 13,830 |
| C.elegans | 1,767 | 1,876 | 7,786 | 3,893 |
| DrugBank | 6,553 | 4,293 | 34,518 | 17,259 |
Fig. 8.
Distribution of protein lengths, drug SMILES lengths, and drug molecular weights in five DTI datasets
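As a quick check on class balance, the share of positive pairs in each dataset can be computed directly from the counts in Table 3. The snippet below is only an illustrative calculation; the dictionary and function names are ours and are not part of the GPS-DTI code.

```python
# Interaction and positive counts copied from Table 3.
TABLE3 = {
    "BindingDB": (49199, 20674),
    "Human": (6728, 3364),
    "BioSNAP": (27464, 13830),
    "C.elegans": (7786, 3893),
    "DrugBank": (34518, 17259),
}


def positive_fraction(interactions, positives):
    """Share of labelled drug-protein pairs that are positive interactions."""
    return positives / interactions


fractions = {name: positive_fraction(*counts) for name, counts in TABLE3.items()}

for name, frac in fractions.items():
    print(f"{name}: {frac:.3f}")
```

Under these counts, Human, C.elegans, and DrugBank are exactly balanced, BioSNAP is close to balanced, and BindingDB contains roughly 42% positives.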
GPS-DTI architecture
GPS-DTI is a deep learning framework designed to accurately predict DTIs by integrating the graph structure of drug molecules with the sequence information of target proteins. As illustrated in Fig. 9a, the model takes the SMILES representation of a drug and the amino acid sequence of a protein as input. For the protein branch, high-level feature representations capturing evolutionary and context-dependent information are first extracted using the pre-trained ESM-2 model (Fig. 9b). These features are then processed by a three-layer CNN to capture local patterns, yielding the final protein representation. The drug branch converts the input into a molecular graph, where atoms and chemical bonds correspond to nodes and edges. Initial node and edge features are fed into the GPS layer (Fig. 9c), which integrates a message-passing neural network (MPNN) and an MHAM to capture both local structures and global dependencies. Specifically, the MPNN aggregates information from neighboring nodes and edges, modeling local connectivity, while the MHAM allows long-range interactions across the entire molecule. Within each GPS layer, the fused local and global features are passed through a multilayer perceptron (MLP) to update node and edge representations. This process is repeated across multiple layers, and the node features from the last layer are used as the drug representation. A CAM then integrates the drug and protein representations, highlighting crucial interaction regions and generating a joint representation, which is passed through a fully connected layer to predict the likelihood of drug–target interaction. By leveraging the complementary characteristics of molecular graphs and protein sequences, GPS-DTI demonstrates strong predictive performance and offers enhanced biological interpretability.
Fig. 9.
Overview of the GPS-DTI framework. a Main figure: The GPS-DTI framework takes the SMILES string of a drug and the protein sequence as input. The drug is first converted into a molecular graph through a graph construction process, which generates node and edge features that are then processed by the GPS layer. Protein features are extracted using a pre-trained ESM-2 model and further refined by a three-layer CNN. The drug and protein features are fused through a cross-attention module to generate updated representations, which are then max-pooled and concatenated to form a joint representation that a fully connected layer uses for drug–target interaction prediction. b The protein encoding module utilizes the pre-trained ESM-2 model to extract evolutionary and contextual information from the protein sequences. c The drug encoding module applies the GPS layer, which processes the drug’s node and edge features. Local feature extraction is achieved via an MPNN, while global feature learning is performed using a multi-head attention mechanism, enabling comprehensive drug representation
Drug molecule graph construction
To capture more chemical and topological information, we represent drug molecules as graphs. Each drug is modeled as a two-dimensional undirected graph $G = (V, E)$, where the vertices represent atoms and the edges represent bonds. In this graph representation, $V$ refers to the set of atom nodes, each associated with a feature vector, while $E$ consists of the edges, each characterized by its own feature vector. Our approach uses node features and edge features to construct the drug graph, as detailed in Additional file 6: Table S3 [49, 50].
We employ Laplacian eigenvectors to enhance node representations, thereby better capturing the graph's structural information. Specifically, for each drug graph , we first extract node features and edge features . These features are then transformed into hidden representations and via a linear projection layer, as defined below:
| 2 |
| 3 |
where and are learnable weight parameters, and are the corresponding bias terms.
To incorporate positional encoding into the node features, we follow the approach described in [42]. We first compute the Laplacian matrix $\Delta$ of the graph and its eigendecomposition as:
$$\Delta = I - D^{-1/2} A D^{-1/2} = U^{\top} \Lambda U \tag{4}$$
where $A$ is the adjacency matrix, $D$ is the degree matrix, $I$ is the identity matrix, $\Lambda$ is the diagonal matrix of eigenvalues, and $U$ is the matrix of eigenvectors. We then select the $k$ smallest non-trivial eigenvectors for node $i$ as its positional encoding, denoted $\lambda_i$. These are linearly transformed as:
| 5 |
| 6 |
where is a learnable weight matrix, and is a bias term. The final node representation integrates both the original node features and the structural position encoded by the Laplacian eigenvectors.
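To make the positional encoding concrete, the following worked example is a sketch for a toy three-atom chain (atoms 1–2–3). It assumes the symmetric normalized Laplacian, the form commonly used for Laplacian positional encodings; the toy graph and the choice $k = 2$ are ours and serve only as illustration.

```latex
A = \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix}, \quad
D = \mathrm{diag}(1, 2, 1), \quad
\Delta = I - D^{-1/2} A D^{-1/2}
= \begin{pmatrix}
1 & -\tfrac{1}{\sqrt{2}} & 0 \\
-\tfrac{1}{\sqrt{2}} & 1 & -\tfrac{1}{\sqrt{2}} \\
0 & -\tfrac{1}{\sqrt{2}} & 1
\end{pmatrix}

% Eigenpairs (unit-length eigenvectors):
\lambda_0 = 0:\ u_0 = \tfrac{1}{2}\,(1, \sqrt{2}, 1)^{\top} \quad \text{(trivial, discarded)}
\lambda_1 = 1:\ u_1 = \tfrac{1}{\sqrt{2}}\,(1, 0, -1)^{\top}
\lambda_2 = 2:\ u_2 = \tfrac{1}{2}\,(1, -\sqrt{2}, 1)^{\top}

% Positional encoding of each atom for k = 2 (its entries in u_1 and u_2):
\mathrm{PE}(1) = \left(\tfrac{1}{\sqrt{2}}, \tfrac{1}{2}\right), \quad
\mathrm{PE}(2) = \left(0, -\tfrac{1}{\sqrt{2}}\right), \quad
\mathrm{PE}(3) = \left(-\tfrac{1}{\sqrt{2}}, \tfrac{1}{2}\right)
```

The central atom receives an encoding distinct from both terminal atoms, and the two terminal atoms differ in the sign of the first component. Because each eigenvector is defined only up to sign, these encodings are not unique, which implementations usually have to account for.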
Graph representation learning for drug molecules
To effectively extract comprehensive drug features, we adopt a hybrid framework based on the GPS layer [51], which integrates an MPNN with an MHAM. This framework captures both local and global structural information in molecular graphs. For local structural learning, we evaluated three edge-aware MPNN architectures: GINE, GAT, and Generalized Graph Convolution (GEN). Among them, GINE outperformed the others in capturing the structural complexity and isomorphism of drug molecules by explicitly integrating edge features into message passing. In contrast, GAT leverages attention mechanisms to dynamically weight neighboring nodes, emphasizing chemically significant patterns, while GEN combines diverse propagation approaches to enhance model generalization. Experimental results (Additional file 7: Fig. S6) demonstrated GINE's superior performance, leading to its adoption as the MPNN component in our GPS framework.
In each GINE layer, during the message construction phase, each node constructs a message based on its features, the features of neighboring nodes, and the edge features. This process is implemented through the following message function:
$$m_{ij}^{(l)} = M\left(h_i^{(l)}, h_j^{(l)}, e_{ij}^{(l)}\right) \tag{7}$$
where $m_{ij}^{(l)}$ denotes the message sent from node $j$ to node $i$, $h_i^{(l)}$ and $h_j^{(l)}$ represent the features of node $i$ and node $j$ at layer $l$, respectively, and $e_{ij}^{(l)}$ is the edge feature at layer $l$. The function $M$ is the message passing function.
Subsequently, node $i$ aggregates all the messages sent by its neighboring nodes as follows:
$$m_i^{(l)} = \sum_{j \in \mathcal{N}(i)} m_{ij}^{(l)} \tag{8}$$
where $m_i^{(l)}$ represents the aggregated message, and $\mathcal{N}(i)$ denotes the set of neighboring nodes defined by the adjacency matrix $A$.
The aggregated message is then combined with the current feature of node $i$ and passed through the update function:
$$h_i^{(l+1)} = U_h\left(h_i^{(l)}, m_i^{(l)}\right) \tag{9}$$
where $U_h$ represents the node feature update function, and $h_i^{(l+1)}$ is the updated feature of node $i$ at layer $l+1$.
Similarly, the edge feature is also updated based on the features of the two endpoint nodes and the current edge feature:
$$e_{ij}^{(l+1)} = U_e\left(h_i^{(l)}, h_j^{(l)}, e_{ij}^{(l)}\right) \tag{10}$$
where $U_e$ denotes the edge feature update function, and $e_{ij}^{(l+1)}$ is the updated edge feature.
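The message, aggregation, and node-update steps described above can be sketched in a few lines of plain Python. This is a minimal illustration that assumes the standard GINE update, in which each message is ReLU(neighbor feature + edge feature), messages are summed, and the result is added to (1 + eps) times the node's own feature. The function names and the toy graph are hypothetical, the learned MLP is replaced by an optional callable, and the edge-feature update is omitted, so this should not be read as the authors' implementation.

```python
def relu(vec):
    """Element-wise ReLU on a list of floats."""
    return [max(0.0, x) for x in vec]


def gine_update(node_feats, edge_feats, eps=0.0, mlp=None):
    """One GINE-style node update.

    node_feats: list of equal-length feature vectors, one per atom.
    edge_feats: dict mapping a directed pair (src, dst) to an edge vector
                of the same length as the node vectors.
    mlp: optional callable applied to each combined vector (identity if None).
    """
    dim = len(node_feats[0])
    aggregated = [[0.0] * dim for _ in node_feats]
    for (src, dst), edge in edge_feats.items():
        # Message construction: ReLU(h_src + e_(src, dst)).
        message = relu([h + e for h, e in zip(node_feats[src], edge)])
        # Sum aggregation at the receiving node.
        aggregated[dst] = [a + m for a, m in zip(aggregated[dst], message)]
    # Node update: (1 + eps) * h_i + aggregated messages, then the optional MLP.
    combined = [
        [(1.0 + eps) * h + a for h, a in zip(node_feats[i], aggregated[i])]
        for i in range(len(node_feats))
    ]
    return [mlp(vec) for vec in combined] if mlp else combined


# Toy three-atom chain 0-1-2; each undirected bond is stored in both directions.
nodes = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
bonds = {
    (0, 1): [0.5, -2.0], (1, 0): [0.5, -2.0],
    (1, 2): [0.0, 0.0], (2, 1): [0.0, 0.0],
}
updated = gine_update(nodes, bonds)
print(updated)
```

Because every atom only sees its direct neighbors in one such step, information needs several stacked layers to travel across a molecule, which is the motivation for pairing this local update with a global attention branch.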
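To enhance the model’s ability to capture long-range dependencies and interactions within the graph, an MHAM is employed. This mechanism models global information by attending to all nodes in the graph, as shown in the following formula:
$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V \tag{11}$$
where $Q$ (query), $K$ (key), and $V$ (value) are obtained from node features through linear transformation, and $d_k$ is the dimension of the key vectors.
The computation for each head is given by:
$$\mathrm{head}_i = \mathrm{Attention}\left(Q W_i^{Q}, K W_i^{K}, V W_i^{V}\right) \tag{12}$$
where $W_i^{Q}$, $W_i^{K}$, and $W_i^{V}$ are the query, key, and value transformation matrices for the $i$-th attention head, respectively.
The combined output from all heads is:
$$\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}\left(\mathrm{head}_1, \ldots, \mathrm{head}_h\right) W^{O} \tag{13}$$
where $W^{O}$ is the output projection matrix.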
The global node features are then computed as:
| 14 |
where is the output, which is the new global feature representation for the nodes after the attention operation at layer .
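The global branch can be illustrated with a single attention head in plain Python. The sketch below implements only scaled dot-product attention over all atoms of a molecule; the learned query, key, value, and output projections are omitted (treated as identity), which is a simplifying assumption, and the function names and toy features are hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [x / total for x in exps]


def scaled_dot_product_attention(queries, keys, values):
    """Return (outputs, weights) for one attention head."""
    d_k = len(keys[0])
    weights = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights.append(softmax(scores))
    # Each output row is a weighted average of the value vectors.
    outputs = [
        [sum(w * v[c] for w, v in zip(row, values)) for c in range(len(values[0]))]
        for row in weights
    ]
    return outputs, weights


# Self-attention over three atoms: every atom attends to every other atom,
# regardless of how many bonds separate them.
atoms = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = scaled_dot_product_attention(atoms, atoms, atoms)
```

In contrast to message passing, the attention weights here are dense: each row of `weights` spans all atoms, so distant atoms can exchange information within a single layer.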
At each layer, we aggregate features from the GINE layer and the MHAM and generate the final node and edge feature representations using an MLP. The feature update mechanism is as follows:
| 15 |
where is the node feature updated by the GINE module, and is the node feature updated by the multi-head attention mechanism. Both are combined and passed through an MLP to generate the final updated node features .
The output of each layer except the last serves as the input to the next layer. The node features produced by the final layer are taken as the drug feature representation, which is used in the subsequent cross-attention and prediction steps.
Sequence representation learning for proteins
In this study, we employed ESM-2 to embed protein sequences. Developed by Facebook AI Research, ESM-2 is a deep learning model tailored for processing protein sequences [52]. Built on the Transformer architecture, it was trained on a substantial corpus of unlabeled protein sequences, allowing it to capture long-range dependencies and evolutionary information within the sequences. Since Transformers excel at capturing global information and CNNs are effective at capturing local patterns, we first obtain high-level embeddings of the protein sequences using the ESM-2 model, giving an embedding matrix with one row per sequence position and one column per embedding dimension.
Before inputting protein sequences into the model, we standardized their lengths to ensure consistency. Specifically, the length of each protein sequence was set to 1000. For sequences longer than 1000 amino acids, the first 1000 residues were retained, while shorter sequences were padded with a special token <PAD> to reach the fixed length. This preprocessing step ensures that all protein sequences have a uniform length upon entering the model, so that the embeddings have consistent dimensionality. We then process the obtained embeddings with a three-layer CNN to extract higher-level features. The output of each CNN layer serves as the input to the next.
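The length standardization described above amounts to truncating or right-padding each sequence to 1000 positions. The following sketch shows that rule on residue-level tokens; the function and constant names are hypothetical, and where exactly padding is applied in the authors' pipeline (for example, inside the ESM-2 tokenizer) is not specified here.

```python
MAX_LEN = 1000
PAD = "<PAD>"


def standardize_length(sequence, max_len=MAX_LEN, pad_token=PAD):
    """Keep the first max_len residues, then right-pad with pad_token."""
    tokens = list(sequence[:max_len])
    tokens.extend([pad_token] * (max_len - len(tokens)))
    return tokens


short = standardize_length("MKTAYIAKQR")   # 10 residues followed by 990 pads
long = standardize_length("A" * 1200)      # truncated to the first 1000 residues
print(len(short), len(long))
```

Note that truncation discards any residues beyond position 1000, so binding sites located near the C-terminus of very long proteins would not be visible to the model under this rule.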
For the $l$-th CNN layer, the update of the protein hidden representations can be expressed as:
$$H_p^{(l+1)} = \sigma\left(\mathrm{CNN}\left(W_c^{(l)}, b_c^{(l)}, H_p^{(l)}\right)\right) \tag{16}$$
where $W_c^{(l)}$ denotes the learnable weight matrix and $b_c^{(l)}$ the bias vector of that layer; $H_p^{(l)}$ represents the hidden representations of the protein at that layer; the activation function $\sigma(\cdot)$ is implemented as the ReLU function; and $H_p^{(0)}$ is the original protein embedding, which is the input to the first CNN layer. The output of the final layer gives the desired protein features.
Cross-attention mechanism
After obtaining the feature mappings for drugs and targets, we implemented a CAM to enhance DTI prediction precision. This mechanism facilitates bidirectional communication between drug and protein features by dynamically adjusting and optimizing their mappings. The cross-attention module improves prediction reliability by updating the attentional distributions between drug and protein features, capturing subtle interactions.
For the drug features , the query vectors are computed via a linear transformation. Key vectors and value vectors are computed from the protein feature map . Specifically,
| 17 |
where , , and are learned weight matrices. Here, represents the number of nodes in the drug molecular graph, is the length of the protein sequence, is the embedding dimension, is the dimension of each attention head, and ranges from to , denoting the number of attention heads.
Similarly, the protein feature map is processed to generate its corresponding query , key , and value vectors, with the drug feature map used for the calculation of key and value vectors.
The drug attention matrix is computed by multiplying the drug query with the transpose of the drug key , followed by applying the softmax function for normalization:
| 18 |
the attention matrix reflects the relevance of protein feature sections to the drug feature map.
The protein attention matrix is calculated in the same manner, using the protein query and the drug key :
| 19 |
The revised drug feature map for each head is derived by multiplying the attention matrix with the corresponding value vector , followed by concatenating the results from all heads along the channel dimension. A final linear transformation is applied to produce the updated drug feature map :
| 20 |
where is a shared weight matrix. The protein feature map is updated similarly, using the protein attention matrix and value vector .
The resulting feature maps are integrated with the original ones to produce the drug feature map and protein feature map :
| 21 |
| 22 |
Finally, global max pooling is applied to reduce the dimensionality of the drug and protein feature maps. The resulting features are then concatenated to create the joint feature representation :
| 23 |
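The overall flow of this module (bidirectional attention, integration with the original features, global max pooling, and concatenation) can be sketched as follows. This is a simplified single-head illustration under stated assumptions: the learned projection matrices are omitted, and the attended features are integrated with the original ones by a plain element-wise sum, which is our guess at one reasonable form rather than the authors' exact rule. All names and toy inputs are hypothetical.

```python
import math


def softmax(row):
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [x / total for x in exps]


def attend(queries, keys, values):
    """Scaled dot-product attention: one output row per query."""
    d_k = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in keys]
        w = softmax(scores)
        out.append([sum(wj * v[c] for wj, v in zip(w, values))
                    for c in range(len(values[0]))])
    return out


def cross_attention_fuse(drug, protein):
    """drug: N x d atom features; protein: L x d residue features."""
    # Drug atoms query the protein residues, and residues query the atoms.
    drug_att = attend(drug, protein, protein)
    prot_att = attend(protein, drug, drug)
    # Assumed integration step: add attended features to the originals.
    drug_new = [[a + b for a, b in zip(r, s)] for r, s in zip(drug, drug_att)]
    prot_new = [[a + b for a, b in zip(r, s)] for r, s in zip(protein, prot_att)]
    # Global max pooling over atoms / residues, then concatenation.
    pooled_drug = [max(col) for col in zip(*drug_new)]
    pooled_prot = [max(col) for col in zip(*prot_new)]
    return pooled_drug + pooled_prot


drug = [[1.0, 0.0], [0.0, 1.0]]                   # two atoms
protein = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]    # three identical residues
joint = cross_attention_fuse(drug, protein)
print(joint)
```

The joint vector has twice the feature dimension regardless of how many atoms or residues the inputs contain, which is what allows a fixed-size classifier to follow. The intermediate attention weights are the quantities that would be visualized to inspect which atoms and residues the model pairs.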
Prediction classifier
We designed a deep learning classifier to predict the probability of DTI. The classifier transforms the joint feature representation into the output space using a fully connected layer. The output layer consists of a single unit with weights and bias learned during training. The probability of DTI is calculated as follows:
| 24 |
During training, we use the AdamW optimizer to minimize the cross-entropy loss function with L2 regularization, optimizing the model parameters. The loss function is defined as:
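$$\mathcal{L} = -\sum_{i} \left( y_i \log p_i + (1 - y_i) \log (1 - p_i) \right) + \frac{\lambda}{2} \lVert \Theta \rVert_2^2 \tag{25}$$
where $y_i$ denotes the true label of the $i$-th sample, $p_i$ represents the corresponding predicted probability, $\Theta$ is the collection of all learnable parameters, and $\lambda$ indicates the L2 regularization strength.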
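The prediction and training objective described above (a single sigmoid output unit, binary cross-entropy, and an L2 penalty) can be sketched in plain Python. The summed form of the cross-entropy and the factor of one half on the penalty are assumptions, as are all function names and toy numbers. Whether the authors add the penalty to the loss explicitly or rely on the weight decay built into AdamW is not stated in the text.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_probability(joint, weights, bias):
    """Single output unit: sigmoid of a weighted sum of the joint features."""
    return sigmoid(sum(w * f for w, f in zip(weights, joint)) + bias)


def regularized_bce(labels, probs, params, lam):
    """Summed binary cross-entropy plus (lam / 2) * squared L2 norm of params."""
    cross_entropy = -sum(
        y * math.log(p) + (1 - y) * math.log(1 - p)
        for y, p in zip(labels, probs)
    )
    l2_penalty = 0.5 * lam * sum(w * w for w in params)
    return cross_entropy + l2_penalty


p = predict_probability([0.0, 0.0], [1.0, 1.0], 0.0)
loss = regularized_bce([1, 0], [0.5, 0.5], [1.0, 2.0], lam=0.1)
print(p, loss)
```

With an uninformative prediction of 0.5 for every pair, each sample contributes log 2 to the cross-entropy, which gives a convenient baseline against which training progress can be judged.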
Supplementary Information
Additional file 1. Statistical Analysis of Model Performance. This file contains the supplementary materials supporting the main text, including statistical analyses of model performance across different datasets, presented in additional figures and tables. Figs. S1–S5. Statistical analyses of AUROC performance between GPS-DTI and baseline models on different datasets: Human (Fig. S1), C.elegans (Fig. S2), BioSNAP (Fig. S3), BindingDB (Fig. S4), and DrugBank (Fig. S5). Table S1. Statistical analysis of AUROC performance across folds for GPS-DTI on all benchmark datasets.
Additional file 2. Model Complexity and Resource Utilization Analysis. This file provides supplementary information related to the computational complexity and resource utilization of the proposed model. Table S2. Comparison of model complexity and resource utilization among GPS-DTI and baseline methods, including parameter counts, training time, and GPU memory consumption.
Additional file 3. Detailed Explanation of Cross-Attention and 3D Structure Visualization. This file provides detailed implementation information for the “Interpretability with Cross-Attention Visualization” section, including the methods used to generate attention heatmaps and the procedures for visualizing the 3D structure. The content aims to enhance the reproducibility and interpretability of the GPS-DTI model.
Additional file 4. Complete Heatmap of Protein 3S75 Complex (.fig file). This file contains the complete attention heatmap of proteins within the 3S75 complex.
Additional file 5. Complete Heatmap of Protein 2XFK Complex (.fig file). This file contains the complete attention heatmap of proteins within the 2XFK complex.
Additional file 6. Features for Constructing Drug Graphs. This file contains Table S3, which lists the atom- and bond-level features used to construct molecular graphs for the GPS-DTI model.
Additional file 7. Comparison of MPNN Variants. This file contains Fig. S6, which shows the performance of different MPNN variants (GEN, GAT, GINE) on the BindingDB and BioSNAP datasets.
Acknowledgements
We sincerely acknowledge the contributions of all authors who participated in this study, whose collaboration and efforts were essential for the completion of this research.
Abbreviations
- 1D
One-dimensional
- 2D
Two-dimensional
- 3D
Three-dimensional
- AUPRC
Area under the precision–recall curve
- AUROC
Area under the receiver operating characteristic curve
- CA2
Human carbonic anhydrase 2
- CAM
Cross-attention mechanism
- CI
Concordance Index
- CNN
Convolutional neural network
- COVID-19
Coronavirus Disease 2019
- DTA
Drug–target affinity
- DTI
Drug–target interaction
- ECFP4
Extended connectivity fingerprint, up to four bonds
- ESM
Evolutionary Scale Model
- FLOPs
Floating-point operations
- FN
False negatives
- FP
False positives
- GCN
Graph Convolutional Network
- GAT
Graph Attention Network
- GINE
Graph Isomorphism Network with Edge features
- GNN
Graph neural network
- GIN
Graph Isomorphism Network
- GEN
Generalized Graph Convolution
- MHAM
Multi-Head Attention Mechanism
- MSE
Mean Squared Error
- PDB
Protein Data Bank
- PSC
Pseudo-amino acid composition
- RCSB
Research Collaboratory for Structural Bioinformatics
- RF
Random Forest
- $r_m^2$
Modified Coefficient of Determination
- SVM
Support Vector Machine
- SMILES
Simplified Molecular Input Line Entry System
- TP
True positives
- TN
True negatives
- TTD
Therapeutic Target Database
Authors’ contributions
An Xiong: Conceptualization, Investigation, Methodology, Writing – original draft, Writing – review & editing. Zhenjie Luo: Investigation, Writing – review & editing. Yan Xia: Data curation, Writing – review & editing. Quan Zou: Supervision, Writing – review & editing. Leyi Wei: Supervision, Writing – review & editing. Zilong Zhang: Conceptualization, Resources, Supervision, Writing – review & editing. Tao Wang: Investigation, Resources, Writing – review & editing. Lesong Wei: Conceptualization, Writing – review & editing. Feifei Cui: Conceptualization, Resources, Supervision, Writing – review & editing. All authors read and approved the final manuscript.
Funding
This work was supported by the National Natural Science Foundation of China (No. 62262015), the Science and Technology special fund of Hainan Province (ZDYF2024GXJS018), the Innovation Platform for “New Star of South China Sea” of Hainan Province (No. NHXXRCXM202306), the Hainan Provincial Natural Science Foundation of China (324MS009), the Qiqihar Medical Institute Foundation (QMSI2024Z-01), the Education Department Foundation of Heilongjiang Province (2024-KYYWF-0339), and the Scientific Technology Project of Qiqihar City (LSFGG-2024099).
Data availability
All datasets used in this study are publicly available. The BioSNAP and BindingDB datasets were obtained from the DrugBAN [53] project repository (https://github.com/peizhenbai/DrugBAN/tree/main/datasets). According to the DrugBAN paper, these datasets were originally derived from publicly available sources, including the BindingDB [54] database (https://www.bindingdb.org/rwd/bind/index.jsp) and the BioSNAP [55] dataset (https://github.com/kexinhuang12345/MolTrans/tree/master/dataset/BIOSNAP/full_data). The Human and C.elegans datasets were obtained from Ref. [56], available at https://github.com/masashitsubaki/CPI_prediction/tree/master/dataset/human (Human) and https://github.com/masashitsubaki/CPI_prediction/tree/master/dataset/celegans (C.elegans). The DrugBank dataset is a balanced dataset derived from the DrugBank [57] database (https://go.drugbank.com/). The DAVIS dataset was obtained from the AttentionMGT-DTA [58] project repository (https://github.com/JK-Liu7/AttentionMGT-DTA/tree/main/data/Davis). The COVID-19 drug–target interaction dataset was curated from the TTD [59] (https://db.idrblab.net/ttd/). The source code supporting the findings of this study is publicly available on GitHub: https://github.com/xa-123955/GPS-DTI.
Declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare no competing interests.
Footnotes
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Contributor Information
Zilong Zhang, Email: zhangzilong@hainanu.edu.cn.
Tao Wang, Email: wangtao@qmu.edu.cn.
Lesong Wei, Email: lesongwei@csj.uestc.edu.cn.
Feifei Cui, Email: feifeicui@hainanu.edu.cn.
References
- 1.Agamah FE, Mazandu GK, Hassan R, Bope CD, Thomford NE, Ghansah A, et al. Computational/in silico methods in drug target and lead prediction. Brief Bioinform. 2020;21(5):1663–75. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 2.Luo Y, Zhao X, Zhou J, Yang J, Zhang Y, Kuang W, et al. A network integration approach for drug-target interaction prediction and computational drug repositioning from heterogeneous information. Nat Commun. 2017;8(1):573. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 3.Öztürk H, Özgür A, Ozkirimli E. Deepdta: deep drug-target binding affinity prediction. Bioinformatics. 2018;34(17):i821-9. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 4.Yamanishi Y, Araki M, Gutteridge A, Honda W, Kanehisa M. Prediction of drug–target interaction networks from the integration of chemical and genomic spaces. Bioinformatics. 2008;24(13):i232–40. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 5.Pan X, Lin X, Cao D, Zeng X, Yu PS, He L, et al. Deep learning for drug repurposing: methods, databases, and applications. WIREs Comput Mol Sci. 2022. 10.1002/wcms.1597. [Google Scholar]
- 6.Bagherian M, Sabeti E, Wang K, Sartor MA, Nikolovska-Coleska Z, Najarian K. Machine learning approaches and databases for prediction of drug-target interaction: a survey paper. Brief Bioinform. 2021;22(1):247–69. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 7.Jie G, Qiming F, Jiacheng S, Yunzhe W, Youbing X, You L, et al. QLDTI: a novel reinforcement learning-based prediction model for drug-target interaction. Curr Bioinform. 2024;19(4):352–74. [Google Scholar]
- 8.Wen M, Zhang Z, Niu S, Sha H, Yang R, Yun Y, et al. Deep-learning-based drug-target interaction prediction. J Proteome Res. 2017;16(4):1401–9. [DOI] [PubMed] [Google Scholar]
- 9.Liu J, Lu Y, Guan S, Jiang T, Ding Y, Fu Q, et al. Drug-target interaction prediction by combining transformer and graph neural networks. Curr Bioinform. 2024;19(4):316–26. [Google Scholar]
- 10.Hendrickson JB. Concepts and applications of molecular similarity. Science. 1991;252(5009):1189–90. [Google Scholar]
- 11.Jacob L, Vert JP. Protein-ligand interaction prediction: an improved chemogenomics approach. Bioinformatics. 2008;24(19):2149–56. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 12.Chen YZ, Zhi DG. Ligand-protein inverse docking and its potential use in the computer search of protein targets of a small molecule. Proteins. 2001;43(2):217–26. [DOI] [PubMed] [Google Scholar]
- 13.Kitchen DB, Decornez H, Furr JR, Bajorath J. Docking and scoring in virtual screening for drug discovery: methods and applications. Nat Rev Drug Discov. 2004;3(11):935–49. [DOI] [PubMed] [Google Scholar]
- 14.Zhang R, Zhu B, Jiang T, Cui Z, Wu H. Enhancing drug-target binding affinity prediction through deep learning and protein secondary structure integration. Curr Bioinform. 2024;19(10):943–52. [Google Scholar]
- 15.Yildirim MA, Goh K-I, Cusick ME, Barabasi A-L, Vidal M. Drug-target network. Nat Biotechnol. 2007;25(10):1119–26. [DOI] [PubMed] [Google Scholar]
- 16.Breiman L. Random forests. Mach Learn. 2001;45:5–32. [Google Scholar]
- 17.Shi C, He J, Pundlik S, Zhou X, Wu N, Luo G. Low-cost real-time VLSI system for high-accuracy optical flow estimation using biological motion features and random forests. Sci China Inf Sci. 2023;66(5):159401. [Google Scholar]
- 18.Cortes C, Vapnik V. Support-vector networks. Mach Learn. 1995;20:273–97. [Google Scholar]
- 19.Wang Y, Zhai Y, Ding Y, Zou Q. SBSM-Pro: support bio-sequence machine for proteins. Sci China Inf Sci. 2024;67(11):212106.
- 20.Kumar Meher P, Hati S, Sahu T, Pradhan U, Gupta A, Rath S. SVM-root: identification of root-associated proteins in plants by employing the support vector machine with sequence-derived features. Curr Bioinform. 2024;19(1):91–102. [Google Scholar]
- 21.Ballester PJ, Mitchell JB. A machine learning approach to predicting protein-ligand binding affinity with applications to molecular docking. Bioinformatics. 2010;26(9):1169–75. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 22.Faulon JL, Misra M, Martin S, Sale K, Sapra R. Genome scale enzyme-metabolite and drug-target interaction predictions using the signature molecular descriptor. Bioinformatics. 2008;24(2):225–33. [DOI] [PubMed] [Google Scholar]
- 23.Wang XR, Cao TT, Jia CM, Tian XM, Wang Y. Quantitative prediction model for affinity of drug-target interactions based on molecular vibrations and overall system of ligand-receptor. BMC Bioinformatics. 2021;22(1):497.
- 24.Lee I, Keum J, Nam H. DeepConv-DTI: prediction of drug-target interactions via deep learning with convolution on protein sequences. PLoS Comput Biol. 2019;15(6):e1007129.
- 25.Bai P, Miljkovic F, John B, Lu H. Interpretable bilinear attention network with domain adaptation improves drug-target prediction. Nat Mach Intell. 2023;5(2):126–36.
- 26.Chen L, Tan X, Wang D, Zhong F, Liu X, Yang T, et al. TransformerCPI: improving compound-protein interaction prediction by sequence-based deep learning with self-attention mechanism and label reversal experiments. Bioinformatics. 2020;36(16):4406–14.
- 27.Huang L, Lin J, Liu R, Zheng Z, Meng L, Chen X, et al. CoaDTI: multi-modal co-attention based framework for drug-target interaction annotation. Brief Bioinform. 2022. 10.1093/bib/bbac446.
- 28.Qian Y, Wu J, Zhang Q. CAT-CPI: combining CNN and transformer to learn compound image features for predicting compound-protein interactions. Front Mol Biosci. 2022;9:963912.
- 29.Tsubaki M, Tomii K, Sese J. Compound-protein interaction prediction with end-to-end learning of neural networks for graphs and sequences. Bioinformatics. 2019;35(2):309–18.
- 30.Fu X, Yuan Y, Qiu H, Suo H, Song Y, Li A, et al. AGF-PPIS: a protein–protein interaction site predictor based on an attention mechanism and graph convolutional networks. Methods. 2024;222:142–51.
- 31.Nguyen T, Le H, Quinn TP, Nguyen T, Le TD, Venkatesh S. GraphDTA: predicting drug-target binding affinity with graph neural networks. Bioinformatics. 2021;37(8):1140–7.
- 32.Jiang M, Li Z, Zhang S, Wang S, Wang X, Yuan Q, et al. Drug–target affinity prediction using graph neural network and contact maps. RSC Adv. 2020;10(35):20701–12.
- 33.Gharizadeh A, Abbasi K, Ghareyazi A, Mofrad MRK, Rabiee HR. HGTDR: advancing drug repurposing with heterogeneous graph transformers. Bioinformatics. 2024. 10.1093/bioinformatics/btae349.
- 34.Zhu Y, Ning C, Zhang N, Wang M, Zhang Y. GSRF-DTI: a framework for drug-target interaction prediction based on a drug-target pair network and representation learning on a large graph. BMC Biol. 2024;22(1):156.
- 35.Yazdani-Jahromi M, Yousefi N, Tayebi A, Kolanthai E, Neal CJ, Seal S, et al. AttentionSiteDTI: an interpretable graph-based model for drug-target interaction prediction using NLP sentence-level relation classification. Brief Bioinform. 2022. 10.1093/bib/bbac272.
- 36.Stepniewska-Dziubinska MM, Zielenkiewicz P, Siedlecki P. Development and evaluation of a deep learning model for protein-ligand binding affinity prediction. Bioinformatics. 2018;34(21):3666–74.
- 37.Rafiei F, Zeraati H, Abbasi K, Razzaghi P, Ghasemi JB, Parsaeian M, et al. CFSSynergy: combining feature-based and similarity-based methods for drug synergy prediction. J Chem Inf Model. 2024;64(7):2577–85.
- 38.Lin Z, Akin H, Rao R, Hie B, Zhu Z, Lu W, et al. Evolutionary-scale prediction of atomic-level protein structure with a language model. Science. 2023;379(6637):1123–30.
- 39.Huang K, Xiao C, Glass LM, Sun J. MolTrans: molecular interaction transformer for drug-target interaction prediction. Bioinformatics. 2021;37(6):830–6.
- 40.Zhao Q, Zhao H, Zheng K, Wang J. HyperAttentionDTI: improving drug-protein interaction prediction by sequence-based deep learning with attention mechanism. Bioinformatics. 2022;38(3):655–62.
- 41.Zeng X, Chen W, Lei B. CAT-DTI: cross-attention and transformer network with domain adaptation for drug-target interaction prediction. BMC Bioinformatics. 2024;25(1):141.
- 42.Wu H, Liu J, Jiang T, Zou Q, Qi S, Cui Z, et al. AttentionMGT-DTA: a multi-modal drug-target affinity prediction using graph transformer and attention mechanism. Neural Netw. 2024;169:623–36.
- 43.Zhou Y, Zhang Y, Zhao D, Yu X, Shen X, Zhou Y, et al. TTD: therapeutic target database describing target druggability information. Nucleic Acids Res. 2024;52(D1):D1465–77.
- 44.Burley SK, Berman HM, Bhikadiya C, Bi C, Chen L, Di Costanzo L, et al. RCSB protein data bank: biological macromolecular structures enabling research and education in fundamental biology, biomedicine, biotechnology and energy. Nucleic Acids Res. 2019;47(D1):D464–74.
- 45.Snyder PW, Mecinović J, Moustakas DT, Thomas SW III, Harder M, Mack ET, et al. Mechanism of the hydrophobic effect in the biomolecular recognition of arylsulfonamides by carbonic anhydrase. Proc Natl Acad Sci U S A. 2011;108(44):17889–94.
- 46.Clarke B, Cutler L, Demont E, Dingwall C, Dunsdon R, Hawkins J, et al. BACE-1 hydroxyethylamine inhibitors using novel edge-to-face interaction with Arg-296. Bioorg Med Chem Lett. 2010;20(15):4639–44.
- 47.Liu T, Lin Y, Wen X, Jorissen RN, Gilson MK. BindingDB: a web-accessible database of experimentally determined protein–ligand binding affinities. Nucleic Acids Res. 2007;35(suppl_1):D198-201.
- 48.Wishart DS, Knox C, Guo AC, Shrivastava S, Hassanali M, Stothard P, et al. DrugBank: a comprehensive resource for in silico drug discovery and exploration. Nucleic Acids Res. 2006;34(suppl_1):D668-D72.
- 49.Wu Z, Jiang D, Wang J, Hsieh C-Y, Cao D, Hou T. Mining toxicity information from large amounts of toxicity data. J Med Chem. 2021;64(10):6924–36.
- 50.Jiang D, Hsieh C-Y, Wu Z, Kang Y, Wang J, Wang E, et al. InteractionGraphNet: a novel and efficient deep graph representation learning framework for accurate protein–ligand interaction predictions. J Med Chem. 2021;64(24):18209–32.
- 51.Rampášek L, Galkin M, Dwivedi VP, Luu AT, Wolf G, Beaini D. Recipe for a general, powerful, scalable graph transformer. Adv Neural Inf Process Syst. 2022;35:14501–15.
- 52.Xiao C, Zhou Z, She J, Yin J, Cui F, Zhang Z. PEL-PVP: application of plant vacuolar protein discriminator based on PEFT ESM-2 and bilayer LSTM in an unbalanced dataset. Int J Biol Macromol. 2024;277:134317.
- 53.Bai P, Miljkovic F, John B, Lu H. Interpretable bilinear attention network with domain adaptation improves drug-target prediction. GitHub: https://github.com/peizhenbai/DrugBAN/; 2023.
- 54.Liu T, Lin Y, Wen X, Jorissen RN, Gilson MK. BindingDB: a web-accessible database of experimentally determined protein–ligand binding affinities. Nucleic Acids Res. 2007. 10.1093/nar/gkl999.
- 55.Huang K, Xiao C, Glass LM, Sun J. MolTrans: molecular interaction transformer for drug-target interaction prediction. GitHub: https://github.com/kexinhuang12345/MolTrans/; 2021.
- 56.Tsubaki M, Tomii K, Sese J. Compound-protein interaction prediction with end-to-end learning of neural networks for graphs and sequences. GitHub: https://github.com/masashitsubaki/CPI_prediction/; 2019.
- 57.Wishart DS, Knox C, Guo AC, Shrivastava S, Hassanali M, Stothard P, et al. DrugBank: a comprehensive resource for in silico drug discovery and exploration. DrugBank: https://go.drugbank.com/; 2006.
- 58.Wu H, Liu J, Jiang T, Zou Q, Qi S, Cui Z, et al. AttentionMGT-DTA: a multi-modal drug-target affinity prediction using graph transformer and attention mechanism. GitHub: https://github.com/JK-Liu7/AttentionMGT-DTA/tree/main/data/Davis; 2024.
- 59.Zhou Y, Zhang Y, Zhao D, Yu X, Shen X, Zhou Y, et al. TTD: therapeutic target database describing target druggability information. Nucleic Acids Res. 2024;52(D1). 10.1093/nar/gkad751.
Associated Data
Supplementary Materials
Additional file 1. Statistical Analysis of Model Performance. This file contains the supplementary materials supporting the main text, including statistical analyses of model performance across different datasets, presented in additional figures and tables. Figs. S1–S5. Statistical analyses of AUROC performance between GPS-DTI and baseline models on different datasets: Human (Fig. S1), C. elegans (Fig. S2), BioSNAP (Fig. S3), BindingDB (Fig. S4), and DrugBank (Fig. S5). Table S1. Statistical analysis of AUROC performance across folds for GPS-DTI on all benchmark datasets.
Additional file 2. Model Complexity and Resource Utilization Analysis. This file provides supplementary information related to the computational complexity and resource utilization of the proposed model. Table S2. Comparison of model complexity and resource utilization among GPS-DTI and baseline methods, including parameter counts, training time, and GPU memory consumption.
Additional file 3. Detailed Explanation of Cross-Attention and 3D Structure Visualization. This file provides detailed implementation information for the “Interpretability with Cross-Attention Visualization” section, including the methods used to generate attention heatmaps and the procedures for visualizing the 3D structure. The content aims to enhance the reproducibility and interpretability of the GPS-DTI model.
Additional file 4. Complete Heatmap of Protein 3S75 Complex (.fig file). This file contains the complete attention heatmap of proteins within the 3S75 complex.
Additional file 5. Complete Heatmap of Protein 2XFK Complex (.fig file). This file contains the complete attention heatmap of proteins within the 2XFK complex.
Additional file 6. Features for Constructing Drug Graphs. This file contains Table S3, which lists the atom- and bond-level features used to construct molecular graphs for the GPS-DTI model.
Additional file 7. Comparison of MPNN Variants. This file contains Fig. S6, which shows the performance of different MPNN variants (GEN, CAT, GINE) on the BindingDB and BioSNAP datasets.
Data Availability Statement
All datasets used in this study are publicly available. The BioSNAP and BindingDB datasets were obtained from the DrugBAN [53] project repository (https://github.com/peizhenbai/DrugBAN/tree/main/datasets). According to the DrugBAN paper, these datasets were originally derived from publicly available sources, including the BindingDB [54] database (https://www.bindingdb.org/rwd/bind/index.jsp) and the BioSNAP [55] dataset (https://github.com/kexinhuang12345/MolTrans/tree/master/dataset/BIOSNAP/full_data). The Human and C. elegans datasets were obtained from Ref. [56] and are available at https://github.com/masashitsubaki/CPI_prediction/tree/master/dataset/human (Human) and https://github.com/masashitsubaki/CPI_prediction/tree/master/dataset/celegans (C. elegans). The DrugBank dataset is a balanced dataset derived from the DrugBank [57] database (https://go.drugbank.com/). The DAVIS dataset was obtained from the AttentionMGT-DTA [58] project repository (https://github.com/JK-Liu7/AttentionMGT-DTA/tree/main/data/Davis). The COVID-19 drug–target interaction dataset was curated from the TTD [59] (https://db.idrblab.net/ttd/). The source code supporting the findings of this study is publicly available on GitHub: https://github.com/xa-123955/GPS-DTI









