Abstract
Identifying drug–drug interactions (DDIs) is a critical task in pharmaceutical research and clinical applications, as these interactions can pose serious medical risks. Deep learning models, known for their ability to accurately predict DDIs, have become powerful tools for enhancing prediction accuracy and efficiency. However, many existing approaches fail to fully incorporate chemical information and lack interpretability when exploring DDI mechanisms. In this work, we propose TRACE, a transformer-based graph representation learning framework that integrates chemical knowledge into DDI prediction. Extensive experiments demonstrate that TRACE outperforms state-of-the-art baseline models under both in-distribution and out-of-distribution settings, highlighting its strong predictive performance and generalization ability. In terms of interpretability, TRACE leverages its attention mechanism to effectively identify high-risk substructures that may trigger DDIs. In summary, TRACE not only provides new perspectives for elucidating the underlying causes of DDIs through interpretable substructure analysis but also offers robust predictive performance to support drug development and combination therapy.
Keywords: drug–drug interaction, molecular graph, chemical knowledge
Introduction
As the global population ages, the incidence of complex diseases such as cardiovascular disease and cancer continues to rise. Against this backdrop, combination drug therapies have become widely used in clinical settings [1]. However, the popularity of these treatment methods has also led to an increase in drug–drug interactions (DDIs), which can trigger serious adverse reactions [2, 3]. Additionally, in the drug development process, safety concerns caused by DDIs are a significant reason for drug withdrawals [4]. Verifying DDIs through traditional in vitro and in vivo experiments requires a substantial investment of time, money, and human resources [5]. Therefore, developing efficient and accurate methods for predicting DDIs has become a crucial and challenging task. With the accumulation of extensive drug data, computational methods based on machine learning and deep learning are increasingly becoming vital tools for supporting clinical decision-making [5–8].
In recent years, the capability of graph structures to represent complex data has been widely acknowledged. This has promoted the application of graph neural networks (GNNs) in learning the chemical structure features of drugs and predicting DDIs, making it a prominent research topic [9–13]. Many deep learning methods based on molecular graphs have achieved remarkable results. For example, Li et al. [13] proposed a dual-view learning approach that utilizes both local and global representation modules to learn the connections between molecular graphs more comprehensively. Lee et al. [12] further revealed the relationships between molecular graphs by constructing a structural causal model. However, most existing methods rely on traditional GNNs or their variants for drug molecular graph modeling. Due to the limitations of GNNs in structural representation, these methods usually aggregate information only from local neighborhoods. As a result, they struggle to capture complex structural features and long-range dependencies in molecular graphs [14]. This poses a bottleneck for GNNs in representing complex molecular structures [15–17]. To address this issue, we introduce a graph Transformer architecture in this study [18, 19]. Its global self-attention mechanism enables the exploration of more complex relationships within molecular graphs, facilitating more comprehensive molecular structure learning and potentially leading to more accurate DDI prediction.
In addition, most current deep learning methods based on molecular graphs are data-driven and lack the integration of chemical domain knowledge [9, 11, 20]. Specifically, these methods generally consider only the atom–atom relationships defined by chemical bonds, which limits the exploration of potential chemical semantics within molecules. Moreover, models that depend solely on structural features often exhibit limited interpretability, as most GNN-based approaches function as black-box models, making it difficult to attribute predictions to specific molecular substructures or chemically meaningful patterns [21, 22]. Incorporating external chemical knowledge as a prior in the construction of drug molecular graphs can enhance the model's understanding and interpretability of chemical semantics in drug interactions [23]. In this work, we address this limitation by introducing external chemical knowledge during the construction of drug molecular graphs, thereby enriching the information represented between atoms and improving DDI prediction performance.
A wide range of computational approaches have been explored for DDI prediction. For binary interaction prediction, molecular graph–based models such as MPNN [24], GAT [25], GCN [26], DSN-DDI [13], MIRACLE [27], SSI-DDI [28], CMRL [12], and CGIB [11] learn structural or relational representations directly from molecular graphs. Beyond binary tasks, several studies address multi-type DDI event prediction. Similarity- and feature-based approaches such as DeepDDI and DeepWalk generate drug-level descriptors for multi-class classification, while knowledge-graph–enhanced models including KGDDI [29], KGNN [30], and MUFFIN [31] leverage biomedical KGs to incorporate relational information between drugs and biological entities. Other models, such as MDF-SA-DDI [32], MDDI-SCL [33], and MATT-DDI [34], integrate heterogeneous drug descriptors using deep feature fusion, contrastive learning, or attention mechanisms for multi-type DDI prediction.
These methods demonstrate the breadth of DDI predictive modeling across molecular, drug-feature, and KG levels. TRACE differs from these approaches by directly modeling molecular graphs augmented with chemical knowledge embeddings, providing a structurally grounded and interpretable framework for binary and multi-class DDI prediction.
To address these challenges, we introduce TRACE, a new DDI prediction framework that combines transformer-based molecular graph learning with atom-level chemical knowledge integration. TRACE is the first method to embed element-oriented knowledge graphs directly into atom representations, enabling the model to capture chemical reactivity patterns beyond molecular topology. In addition, TRACE couples self-attention with a substructure decomposition procedure to provide mechanistic, chemistry-grounded interpretability, identifying metabolic "hot spots" that contribute to DDIs. Extensive experiments further demonstrate TRACE's strong inductive generalization and its ability to uncover dataset-wide chemically meaningful substructures.
Results
Overview of TRACE
In this paper, we propose TRACE, a new DDI prediction method, termed Transformer-based Graph Representation LeArning with Chemical Embedding. TRACE is founded on graph transformers and augmented with chemical knowledge. Fig. 1 provides an overview of TRACE.
Figure 1.
Overview of TRACE, illustrating the integration of drug molecular properties and knowledge graph embeddings within a Graph Transformer framework for drug representation learning and DDI prediction, including (a) the overall TRACE workflow for feature extraction and model training, (b) the knowledge graph–enhanced molecular representation module based on ElementKG element embeddings, and (c) the Graph Transformer and training module that derives drug representations via attention and feeds concatenated drug-pair representations into an MLP for prediction.
The interactions between drug molecules (DDIs) are influenced not only by the topological structures of the molecules but also by the chemical elements and functional groups they contain [35]. The properties and distribution of these elements largely determine the physicochemical characteristics and bioactivity of drug molecules, thereby affecting their interactions [36–38]. The Element Knowledge Graph (ElementKG) systematically integrates multidimensional properties and relationships of chemical elements and functional groups [23], providing a structured knowledge foundation for a deeper understanding of drug characteristics and molecular interaction mechanisms. During the construction of molecular graphs, we embed the elemental information from ElementKG into the original drug molecular graphs, resulting in KG-enhanced molecular graphs that combine both structural features and chemical domain knowledge. This enhanced representation not only expands the descriptive dimension of molecules but also provides a richer and deeper information basis for subsequent feature extraction and DDI prediction.
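The KG-enhancement step described above can be sketched as a simple feature augmentation: each atom's structural feature vector is extended with the pretrained ElementKG embedding of its chemical element. The embedding values, dimensions, and feature layout below are toy assumptions, not the actual ElementKG vectors.

```python
# Minimal sketch (hypothetical values): augmenting atom features with
# ElementKG element embeddings to form a KG-enhanced molecular graph.

# Pretrained ElementKG embeddings, keyed by element symbol (toy 4-d vectors).
elementkg_emb = {
    "C": [0.12, -0.30, 0.55, 0.08],
    "N": [0.40, 0.22, -0.10, 0.31],
    "O": [-0.25, 0.18, 0.47, -0.12],
}

def kg_enhanced_atom_features(symbols, base_features):
    """Concatenate each atom's structural features with its element embedding."""
    enhanced = []
    for sym, feat in zip(symbols, base_features):
        emb = elementkg_emb.get(sym, [0.0] * 4)  # zero vector for unknown elements
        enhanced.append(feat + emb)
    return enhanced

# Toy molecule: three atoms with 2-d structural features (e.g. degree, charge).
atoms = ["C", "N", "O"]
feats = [[4.0, 0.0], [3.0, 0.0], [2.0, 0.0]]
enhanced = kg_enhanced_atom_features(atoms, feats)
# Each atom vector now has 2 structural + 4 knowledge dimensions = 6 entries.
```

In the full model, these augmented node vectors (rather than topology-only features) are what the Graph Transformer consumes.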
After obtaining the KG-enhanced molecular graphs, we input them into the Graph Transformer module to further extract high-level representations of drug molecules. The Graph Transformer leverages self-attention mechanisms to dynamically aggregate local and global information within the graph, effectively capturing complex structural and chemical patterns critical for DDI prediction. The representations of two drug molecules are then concatenated and fed into a downstream multilayer perceptron (MLP), which is trained to predict DDIs.
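The pipeline in this paragraph, self-attention over atom nodes, pooling into a drug vector, and concatenation of the two drug vectors for the MLP, can be illustrated as follows. This is a hedged sketch with toy dimensions and random weights, not the authors' exact architecture (which uses multiple layers and learned parameters).

```python
import numpy as np

# Illustrative sketch: one scaled dot-product self-attention layer over the
# atom nodes of a molecule, mean pooling into a drug vector, then
# concatenation of two drug vectors as the downstream MLP input.
rng = np.random.default_rng(0)
d = 8  # node feature dimension (toy value)

def self_attention(X):
    """Scaled dot-product self-attention over one molecule's node set."""
    Wq, Wk, Wv = (rng.standard_normal((d, d)) * 0.1 for _ in range(3))
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(d)
    attn = np.exp(scores - scores.max(axis=-1, keepdims=True))
    attn /= attn.sum(axis=-1, keepdims=True)  # row-stochastic attention weights
    return attn @ V

def drug_representation(X):
    return self_attention(X).mean(axis=0)  # mean-pool node states to one vector

Xa = rng.standard_normal((5, d))  # drug A: 5 atoms
Xb = rng.standard_normal((7, d))  # drug B: 7 atoms
pair = np.concatenate([drug_representation(Xa), drug_representation(Xb)])
# `pair` (length 2*d) would be fed to the MLP that outputs the DDI score.
```

Because every node attends to every other node, the pooled representation reflects global molecular context rather than only local neighborhoods.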
TRACE boosts the performance of drug–drug interaction prediction
To comprehensively evaluate the effectiveness of TRACE, we conducted experiments on three benchmark datasets of varying sizes for the DDI prediction task: the small-scale ZhangDDI [39], the medium-scale ChCh-Miner [40], and the large-scale DeepDDI [41]. Each dataset contains diverse and abundant experimental data, ensuring the representativeness of our evaluation. In our experiments, each drug is encoded as a molecular graph, and the process of feature extraction is described in detail in the Methods section.
We first assessed the performance of TRACE under the commonly adopted transductive setting, where drugs in the test set are also present in the training set. As shown in Table 1, Table 2, and Table 3, TRACE consistently outperforms all baseline models across all four evaluation metrics (AUROC, AUPR, ACC, and F1). While several recent state-of-the-art algorithms already achieve strong performance on these datasets, TRACE still provides the most balanced and reliable results overall.
Table 1.
Binary DDI prediction performance on the ZhangDDI dataset (AUROC, AUPR (AP), ACC, and F1).
| Baseline model | AUROC | AP | ACC | F1 |
|---|---|---|---|---|
| GCN | | | | |
| GAT | | | | |
| MPNN | | | | |
| MIRACLE | | | | |
| DSN-DDI | | | | |
| SSI-DDI | | | | |
| CMRL | | | | |
| CGIB | | | | |
| TRACE | | | | |
Note: The report includes the average and standard deviation (in percentage) of AUROC, AP (AUPR), ACC, and F1 from five independent runs.
Table 2.
Binary DDI prediction performance on the ChChMiner dataset (AUROC, AUPR (AP), ACC, and F1).
| Baseline model | AUROC | AP | ACC | F1 |
|---|---|---|---|---|
| GCN | | | | |
| GAT | | | | |
| MPNN | | | | |
| MIRACLE | | | | |
| DSN-DDI | | | | |
| SSI-DDI | | | | |
| CMRL | | | | |
| CGIB | | | | |
| TRACE | | | | |
Note: The report includes the average and standard deviation (in percentage) of AUROC, AP (AUPR), ACC, and F1 from five independent runs.
Table 3.
Binary DDI prediction performance on the DeepDDI dataset (AUROC, AUPR (AP), ACC, and F1).
| Baseline model | AUROC | AP | ACC | F1 |
|---|---|---|---|---|
| GCN | | | | |
| GAT | | | | |
| MPNN | | | | |
| MIRACLE | | | | |
| DSN-DDI | | | | |
| SSI-DDI | | | | |
| CMRL | | | | |
| CGIB | | | | |
| TRACE | | | | |
Note: The report includes the average and standard deviation (in percentage) of AUROC, AP (AUPR), ACC, and F1 from five independent runs.
In particular, TRACE surpasses the second-best method across all four metrics on every dataset, demonstrating both superior ranking capability (AUROC, AUPR) and more accurate classification of interacting drug pairs (ACC, F1). The performance gains are especially notable on the large-scale DeepDDI dataset, where TRACE improves ACC by 1.09% and achieves the highest AUROC (99.69%) and AUPR (99.46%) among all models. Similar improvements are observed on ZhangDDI and ChChMiner, where TRACE achieves top or near-top performance across all metrics. These results indicate that TRACE delivers near-perfect DDI prediction for existing drugs under the transductive setting.
Table 4.
Binary DDI prediction performance evaluation of TRACE and state-of-the-art algorithms for the inductive setting on DeepDDI dataset.
| | S1 (single-unseen drug) | | S2 (double-unseen drug) | |
|---|---|---|---|---|
| Baseline model | AUROC | ACC | AUROC | ACC |
| GCN | | | | |
| GAT | | | | |
| MPNN | | | | |
| MIRACLE | | | | |
| DSN-DDI | | | | |
| SSI-DDI | | | | |
| CMRL | | | | |
| CGIB | | | | |
| TRACE | | | | |
Note: The report includes the average and standard deviation (in percentage) of the test ROC-AUC and ACC from five independent runs.
To further evaluate the generalization capability of TRACE to unseen drugs, we performed experiments under the more challenging inductive setting. In particular, we tested TRACE on the largest dataset, DeepDDI, under two scenarios: S1, where each sample in the test set contains one unseen drug and one known drug (simulating the case of predicting DDI when a new drug is combined with existing drugs); and S2, where both drugs in the test samples are unseen (simulating prediction when two new drugs are combined). Notably, in the more practical S1 scenario, TRACE surpasses the second-best method, CGIB [11], by 2.62% in ACC and 1.81% in AUROC. Despite the increased challenge in the S2 scenario, TRACE still achieves performance comparable with the best existing methods. The superior performance of TRACE in the inductive setting can be attributed to its structurally and chemically informed design. Specifically, the integration of ElementKG likely contributes to improved generalization by introducing transferable chemical prior knowledge based on elemental properties and functional group patterns, which are less dependent on specific molecular scaffolds observed during training. Furthermore, the Graph Transformer architecture enhances global context modeling and alleviates the limitations of traditional local message-passing mechanisms, which may explain the improved robustness of TRACE when encountering unseen drug combinations. These observations suggest that TRACE can better capture generalizable chemical semantics, leading to its superior inductive performance.
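The S1/S2 evaluation protocol described above can be sketched as a simple split routine: hold out a fraction of drugs as "unseen," then route each drug pair by how many of its members are held out. The fraction and dataset below are toy assumptions; the paper's actual split procedure may differ in detail.

```python
import random

# Hedged sketch of constructing inductive S1/S2 splits for DDI evaluation.
def inductive_split(pairs, drugs, unseen_frac=0.2, seed=0):
    rng = random.Random(seed)
    unseen = set(rng.sample(sorted(drugs), int(len(drugs) * unseen_frac)))
    train, s1, s2 = [], [], []
    for a, b in pairs:
        n_unseen = (a in unseen) + (b in unseen)
        if n_unseen == 0:
            train.append((a, b))  # both drugs seen during training
        elif n_unseen == 1:
            s1.append((a, b))     # S1: one unseen drug, one known drug
        else:
            s2.append((a, b))     # S2: both drugs unseen
    return train, s1, s2

# Toy dataset: 10 drugs, all unordered pairs.
drugs = {f"D{i}" for i in range(10)}
pairs = [(f"D{i}", f"D{j}") for i in range(10) for j in range(i + 1, 10)]
train, s1, s2 = inductive_split(pairs, drugs)
```

With 10 drugs and 2 held out, the pairs partition into C(8,2) = 28 training pairs, 2 × 8 = 16 S1 pairs, and C(2,2) = 1 S2 pair, which makes clear why S2 is the harder, data-sparse scenario.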
To further provide a comprehensive evaluation of TRACE beyond binary interaction prediction, we additionally conducted multi-class DDI type prediction following prior studies that adopt the DrugBank event-type dataset. In this setting, each drug pair is associated with a specific interaction type, forming a multi-class classification task widely used in existing DDI research.
We evaluated TRACE and all baseline models on the DrugBank benchmark. As shown in Table 5, TRACE achieves the best performance across all metrics. Specifically, TRACE improves accuracy by 2.26%, and also yields consistent gains in precision, recall, and F1-score. The increase in precision indicates a reduction in false-positive event assignments, while the improvement in recall shows that TRACE more effectively captures true interaction types without missing meaningful events. The overall enhancement in F1-score reflects a balanced improvement in both correctness and coverage, demonstrating TRACE’s strong ability to distinguish diverse DDI types. These results confirm that TRACE generalizes well not only in binary interaction prediction but also in multi-class event classification.
Table 5.
Multi-class DDI type prediction performance on the DrugBank dataset.
| Model | Mean-ACC | Macro-Precision | Macro-Recall | Macro-F1 |
|---|---|---|---|---|
| DeepWalk | | | | |
| DeepDDI | | | | |
| KGDDI | | | | |
| KGNN | | | | |
| MUFFIN | | | | |
| DGNN-DDI | | | | |
| DSN-DDI | | | | |
| TRACE | | | | |
Note: The report includes the average and standard deviation (in percentage) of the test Mean-ACC, Macro-Precision, Macro-Recall, and F1 from five independent runs.
Revealing high-risk molecular structures in DDI mechanisms through TRACE’s attention mechanism
Elucidating the mechanisms underlying DDIs is of great scientific significance for improving medication safety and guiding rational drug design. However, the intrinsic mechanisms of DDIs are often complex and multifaceted. To systematically uncover the specific roles of molecular structures in DDIs, we employed the TRACE model to assign attention-based weights to different nodes within drug molecules and combined this with substructure decomposition algorithms [42, 43] to automatically identify high-risk molecular fragments that may play key roles in DDIs. The specific steps of this method and the relevant conclusions are detailed in the Supplementary Information.
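The attention-based identification step can be sketched as follows: given per-atom attention weights from the model and candidate substructures from the decomposition algorithm, each substructure is scored by the mean attention its atoms receive and the candidates are ranked. The weights, motif names, and atom index sets below are hypothetical toy data, not model output.

```python
# Minimal sketch (toy data): scoring candidate substructures by the mean
# attention their atoms receive, then ranking to flag high-risk motifs.
def score_substructures(atom_attention, substructures):
    """atom_attention: per-atom weights; substructures: {name: atom index set}."""
    scores = {
        name: sum(atom_attention[i] for i in idx) / len(idx)
        for name, idx in substructures.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Toy molecule with 6 atoms; two candidate motifs from the decomposition step.
attn = [0.05, 0.10, 0.30, 0.35, 0.12, 0.08]
motifs = {"quinoline-like": {2, 3}, "alkyl-chain": {0, 5}}
ranked = score_substructures(attn, motifs)
# The top-ranked entry is the motif with the highest mean attention.
```

Averaging over a motif's atoms (rather than summing) keeps scores comparable across substructures of different sizes.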
It is worth noting that previous studies have shown that a significant proportion of harmful DDIs arise from metabolic alterations, where one drug affects the metabolism of another, resulting in changes in plasma concentration and potentially serious adverse effects [44, 45]. This highlights the central role of metabolism in DDI mechanisms. Therefore, in the subsequent analysis and validation of high-risk substructures, we specifically focused on substructures related to drug metabolism as representative cases.
Specifically, we conducted a thorough literature review to theoretically validate the association between these metabolism-related substructures and DDI risk. This approach not only enhances the interpretability of the model but also provides valuable molecular-level insights for drug design and safety assessment. As illustrated in Fig. 2, TRACE highlights key functional groups in representative molecules by assigning elevated attention weights to substructures with known DDI relevance. (1) The first example is the drug quinine, for which we observed a relatively high attention weight assigned to the quinoline functional group. Studies have shown that quinoline derivatives can coordinate with the heme iron in CYP enzymes, acting as reversible inhibitors and thereby reducing the metabolism of co-administered drugs [46]. (2) The second example is zileuton, where the model assigns greater attention to the thiophene substructure. Research indicates that P450 enzymes can oxidize thiophene to generate highly reactive metabolites, which readily interact with other cellular components and affect metabolic processes [47]. (3) The third case is erythromycin, with the amine substructure receiving the highest attention. Studies suggest that, under the catalysis of P450 enzymes, amines can be transformed into nitroso-iron complexes, resulting in enzyme inactivation and consequently impacting the metabolism of other drugs [46, 48]. (4) The fourth example is stiripentol, in which the methylenedioxyphenyl substructure is assigned the highest attention weight. It has been reported that this substructure can form a metabolic intermediate complex with P450 enzymes through its metabolites, leading to quasi-irreversible inactivation of P450 and subsequently affecting the metabolism of other drugs, thus triggering DDIs [47, 49].
Figure 2.
Examples of high-risk molecular substructures, including (a) a schematic illustration of the substructure decomposition algorithm for generating candidate motifs and the subsequent attention-based scoring procedure used to identify high-risk substructures, (b) visualization of substructure-level attention for four representative molecules together with mechanistic explanations of their functional roles, and (c) dataset-wide involvement of high-risk substructures in DDI-positive interactions, where the average number of DDI-positive interactions per drug is compared between molecules containing a given motif and those without it, with summary statistics of motif frequency, DDI-pair counts, and attention consistency reported.
In addition to the representative case studies discussed above, we further assessed whether the high-attention substructures identified by TRACE generalize across the broader chemical space in DeepDDI. These four substructure families are broadly distributed in the dataset, collectively appearing in 739 drugs. TRACE also responds to them in a consistent manner. For instance, it assigns higher-than-average attention to quinoline-containing molecules in nearly 80% of cases, and alkylamines exhibit a similar pattern of consistently elevated attention.
To evaluate their relevance to observed interaction patterns, we quantified how often these motifs are involved in DDI-positive drug pairs. As shown in Fig. 2, motif-containing drugs participate in markedly more DDIs than motif-free drugs. For instance, quinoline-containing molecules exhibit an average of 272.5 DDI-positive interactions per drug, compared with 108.6 for drugs lacking this motif. Alkylamine-containing drugs show a similarly strong trend, with >220 interactions per drug on average. These enriched patterns indicate that the substructures emphasized by TRACE recur frequently in DDI-associated drug pairs.
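The enrichment comparison above reduces to computing, for a given motif, the mean number of DDI-positive interactions per drug among motif-containing versus motif-free drugs. The sketch below uses made-up counts for five hypothetical drugs, not the DeepDDI statistics.

```python
# Hedged sketch of the dataset-wide enrichment comparison (toy data): mean
# number of DDI-positive interactions per drug, split by motif presence.
def motif_enrichment(ddi_counts, has_motif):
    """ddi_counts: {drug: #DDI-positive pairs}; has_motif: {drug: bool}."""
    with_m = [n for d, n in ddi_counts.items() if has_motif[d]]
    without = [n for d, n in ddi_counts.items() if not has_motif[d]]
    return sum(with_m) / len(with_m), sum(without) / len(without)

counts = {"A": 300, "B": 245, "C": 110, "D": 95, "E": 120}
motif = {"A": True, "B": True, "C": False, "D": False, "E": False}
avg_with, avg_without = motif_enrichment(counts, motif)
# In this toy example, motif-containing drugs average far more DDI-positive
# pairs than motif-free drugs, mirroring the quinoline trend reported above.
```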
Together, these analyses demonstrate that the motifs highlighted by TRACE are neither isolated nor molecule-specific; instead, they represent broadly distributed structural patterns that repeatedly arise in DDI-relevant chemical contexts. This supports the generalizability and mechanistic value of TRACE’s substructure-level explanations.
Case study of TRACE for predicting tanshinone IIA–drug interactions
Given the growing importance of natural product-derived compounds in modern pharmacology [50, 51], we investigated the applicability of TRACE in scenarios involving bioactive natural products. Specifically, we conducted a case study on Tanshinone IIA—a representative bioactive natural product extracted from the traditional medicinal plant Salvia miltiorrhiza [52]. By evaluating the performance of TRACE in predicting interactions between Tanshinone IIA and several clinically relevant drugs, and interpreting the results using molecular docking, we aimed to explore whether TRACE could be leveraged to predict interactions involving structurally novel compounds and provide mechanistic insights at the molecular level.
We systematically reviewed the published literature and constructed a dataset comprising seven drugs known to interact with Tanshinone IIA: warfarin, dabigatran, rivaroxaban, apixaban, edoxaban, betrixaban, and losartan [53–55]. For each Tanshinone IIA–drug pair, we employed the pretrained TRACE model to predict the presence or absence of an interaction. Notably, TRACE achieved 100% prediction accuracy on this dataset.
To further elucidate the molecular basis of these interactions, we conducted molecular docking analyses. First, we performed detailed docking studies using Tanshinone IIA and warfarin as a clinically relevant example. Both compounds were individually docked to human serum albumin (HSA), the primary drug-binding protein in blood and a common mediator of DDIs. As shown in Fig. 3, the docking results indicated that both Tanshinone IIA and warfarin bind to the same pocket on HSA with comparable binding affinities. Structural analysis further revealed that both molecules form hydrogen bonds with the same amino acid residue, ARG257, within the binding pocket. This suggests a potential competitive binding relationship between the two compounds. These findings provide a structural explanation for the clinically observed interaction between Tanshinone IIA and warfarin, indicating that their co-administration may lead to competitive displacement and altered pharmacokinetics.
Figure 3.
DDI prediction of Tanshinone IIA using the TRACE model and molecular docking analyses, including (a) an overview of TRACE-based DDI prediction for Tanshinone IIA, (b) molecular docking of Tanshinone IIA and warfarin with human serum albumin (HSA, PDB ID: 1HA2), (c) molecular docking of Tanshinone IIA with P-glycoprotein (P-gp), and (d) molecular docking of Tanshinone IIA and midazolam with CYP3A4 illustrating their relative binding configurations.
For the direct oral anticoagulants (DOACs), including dabigatran, rivaroxaban, apixaban, edoxaban, and betrixaban, we further investigated potential mechanisms by docking Tanshinone IIA to P-glycoprotein (P-gp). The results showed that Tanshinone IIA can occupy the substrate-binding pocket of P-gp and form a hydrogen bond with Y949, which may inhibit P-gp activity, thereby increasing the plasma concentrations of DOACs and consequently enhancing DDIs.
For losartan, we performed molecular docking between Tanshinone IIA and CYP3A4, a major metabolic enzyme [54]. The results indicate that Tanshinone IIA and midazolam [56] occupy the same binding pocket, suggesting that Tanshinone IIA may play a role similar to that of midazolam: it could serve as a substrate for CYP3A4, thereby slowing the metabolism of losartan and increasing the risk of DDIs and adverse effects.
Overall, the integration of AI-based predictions from the TRACE model with mechanistic validation by molecular docking offers a feasible and interpretable approach for systematically discovering and elucidating interactions between natural products and conventional drugs.
Ablation study for the effectiveness of TRACE model design
To investigate the sources of TRACE's performance gains, we conducted a series of ablation studies to assess the contributions of its key components. Specifically, we focused on two modules: the chemical element knowledge graph embedding (ElementKG) and the graph transformer. First, we evaluated a variant of TRACE without the ElementKG embedding (denoted as w/o KG) across all three datasets. As shown in Table 6, removing ElementKG decreased performance on all metrics; the full TRACE model achieves a 1.46% higher ACC on the small-scale ZhangDDI dataset than this variant. These results underscore the effectiveness of incorporating chemical element knowledge into the model. Second, to examine the role of the graph transformer, we replaced it with GCN and GAT architectures, creating the w/o GCN and w/o GAT variants. Evaluations on all three datasets showed that TRACE with the graph transformer consistently outperformed these alternatives, with particularly notable improvements of 7.9% in AUROC and over 12% in ACC on the DeepDDI dataset, demonstrating the substantial contribution of the graph transformer module. We further evaluated a variant of TRACE in which the final layer updates both nodes and edges (TRACE(edge)). This modification yielded a modest improvement only on ChCh-Miner, with no gains on ZhangDDI or DeepDDI and an increase in computational cost. These findings suggest that edge information is already sufficiently captured in earlier layers and that final refinement is most effective at the node level, which aligns with TRACE's chemically grounded design and its use of atom-level ElementKG embeddings.
Table 6.
Ablation study for TRACE on three benchmark datasets.
| | ZhangDDI | | ChChMiner | | DeepDDI | |
|---|---|---|---|---|---|---|
| Baseline model | AUROC | ACC | AUROC | ACC | AUROC | ACC |
| w/o GCN | | | | | | |
| w/o GAT | | | | | | |
| w/o KG | | | | | | |
| TRACE(edge) | | | | | | |
| TRACE | | | | | | |
Note: The report includes the average and standard deviation (in percentage) of the test ROC-AUC and ACC from five independent runs.
In summary, each component of the TRACE model significantly contributes to DDI prediction performance, further validating the rationality and scientific soundness of our model design.
Discussion
In this study, we present TRACE, a novel method for predicting DDIs by integrating graph transformers with chemical knowledge embeddings. TRACE demonstrates outstanding performance in both transductive and inductive DDI prediction scenarios, accurately identifying interactions between both known and previously unseen drugs. By leveraging the attention mechanism of graph transformers, TRACE can effectively pinpoint high-risk molecular substructures that are likely to contribute to DDIs, and these identified substructures closely match those reported in the literature. This level of interpretability not only enhances TRACE’s value as a predictive tool but also offers new perspectives for understanding the mechanisms underlying DDIs.
Importantly, we further discuss how this interpretability can provide actionable insights for drug design. The high-attention substructures highlighted by TRACE—such as quinoline, thiophene, alkylamines, and methylenedioxyphenyl groups—correspond to well-established metabolic "hot spots" frequently involved in CYP inhibition, bioactivation, or reactive intermediate formation. By systematically recognizing these chemically liable regions across diverse molecules, TRACE can help medicinal chemists identify components of a scaffold that may require modification, masking, or replacement during lead optimization.
Notably, this interpretability is not limited to isolated examples. By extending the analysis to 86 distinct DDI mechanism categories, TRACE reveals recurring, mechanism-consistent substructure patterns that generalize across different interaction types. A comprehensive summary of high-risk substructures and representative drugs for all 86 DDI categories is provided in the Supplementary Information (Table S1).
Additionally, because TRACE generalizes well to previously unseen drugs, the model can be used prospectively to prioritize safer analogs, guide scaffold hopping strategies, and evaluate the DDI liabilities of structural variants before synthesis. These capabilities demonstrate that TRACE’s interpretability goes beyond post hoc explanation and provides practical guidance for molecular redesign and pharmacological safety assessment.
Moreover, we extended our case studies to natural products, using tanshinone IIA as a representative example. TRACE achieved near-perfect predictions in this context as well (see Supplementary Information), and the accuracy of these predictions was further confirmed through molecular docking analysis.
Despite TRACE’s robust performance, there are several limitations to consider. For example, the chemical knowledge embedding currently focuses mainly on elemental knowledge and may not fully capture complex molecular interactions. Additionally, while TRACE shows strong performance in predicting interactions between pairs of drugs, predicting interactions among multiple drug combinations remains a challenge.
To address these limitations, we propose several potential future directions. First, we suggest incorporating a broader range of chemical information, such as embedding metabolic pathways and physicochemical descriptors (e.g. LogP, pKa), to enhance TRACE’s chemical knowledge embedding and provide a more comprehensive representation of molecular systems. In particular, well-established chemical and bioactivity databases such as DrugBank, ChEMBL, and PubChem offer rich structural annotations, metabolic reaction records, and experimentally validated property data, which can serve as practical data sources for extending TRACE’s knowledge integration in future work. Secondly, studying the chemical interpretability of the high-risk substructures identified by TRACE could deepen our understanding of their roles in DDIs and molecular design. Finally, combining TRACE’s predictions with molecular dynamics simulations and wet-lab experiments could bridge the gap between computational predictions and pharmacological validation, offering practical insights for molecular optimization and drug combination strategies, thereby accelerating drug discovery.
Conclusion
In this study, we introduce TRACE, a novel framework that combines graph transformer models with chemical knowledge embeddings for DDI prediction. TRACE demonstrates superior predictive performance in both transductive and inductive scenarios, accurately identifying high-risk substructures associated with DDIs and providing new insights into the underlying molecular mechanisms. Our approach not only advances the state of the art in DDI prediction but also offers valuable interpretability, facilitating safer drug development and usage. In future work, we aim to further improve model generalizability and extend TRACE to a wider range of drug discovery and pharmacological safety applications.
Methods
ElementKG construction
To integrate fundamental chemical knowledge into TRACE, we construct an element-oriented knowledge graph (ElementKG) following the methodology of Fang et al. [57]. ElementKG provides structured representations of chemical elements and functional groups and is built through three steps: entity definition, property assignment, and relation modeling.
Entities
ElementKG contains two categories of entities: (i) chemical elements commonly appearing in drug molecules (e.g. C, H, O, N, S, Cl, Br, I), and (ii) functional groups collected from the Wikipedia "Functional group" page. Each entity is assigned to a chemical class hierarchy derived from the Periodic Table (e.g. metals, nonmetals, reactive nonmetals) to capture elemental taxonomy.
Data properties
For each element, we collect physicochemical attributes from the Periodic Table (https://ptable.com), including atomic number, electron affinity, ionization energy, physical state, covalent radius group, electronegativity group, and other relevant descriptors. Functional groups are annotated using characteristic bond-type patterns (e.g. single, double, aromatic bonds). These attributes provide multidimensional chemical priors beyond molecular topology.
Object properties
We model relations between entities by discretizing continuous element attributes into grouped relational categories (e.g. inRadiusGroup1, inElectronegativityGroup2) and by linking elements to functional groups through an isPartOf relation. This yields a relational structure reflecting periodic trends and group membership.
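The discretization above can be sketched as follows; the binning thresholds, relation names, and triple layout here are illustrative assumptions, not the exact values used to build ElementKG:

```python
# Sketch: discretize a continuous element attribute into a grouped relation
# such as "inElectronegativityGroup2". Bin edges below are assumptions made
# for illustration; the electronegativity values themselves are Pauling values.

def relation_for(attr_name, value, bin_edges):
    """Map a continuous attribute value to a grouped relation name."""
    group = sum(1 for edge in bin_edges if value >= edge) + 1
    return f"in{attr_name}Group{group}"

electronegativity = {"C": 2.55, "O": 3.44, "N": 3.04, "Na": 0.93}
edges = [1.5, 2.5, 3.5]  # hypothetical bin edges -> groups 1..4

# Relational triples (subject, relation, object-class), as in a KG.
triples = [(el, relation_for("Electronegativity", v, edges), "ElectronegativityGroup")
           for el, v in electronegativity.items()]
```

This yields triples like `("C", "inElectronegativityGroup3", "ElectronegativityGroup")`, mirroring how periodic trends become relational structure.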
KG embedding
We adopt the OWL2Vec*-based embedding procedure from Fang et al. [57], generating structure and lexical documents via random walks over ElementKG and training a skip-gram Word2Vec model to obtain embedding vectors for all entities. These embeddings are incorporated into TRACE as chemical prior features for enhancing molecular graph representation learning.
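A minimal sketch of the random-walk document generation over a toy ElementKG fragment (entity names and edges are illustrative; in the actual pipeline the resulting walk corpus is fed to a skip-gram Word2Vec model, e.g. via gensim, to produce entity embeddings):

```python
import random

# Generate random-walk "sentences" over a toy knowledge-graph fragment.
# Graph content is illustrative only; the real ElementKG is far larger.

kg = {
    "C":            ["Nonmetal", "Carboxyl", "RadiusGroup2"],
    "O":            ["Nonmetal", "Carboxyl", "Hydroxyl"],
    "Nonmetal":     ["C", "O"],
    "Carboxyl":     ["C", "O"],
    "Hydroxyl":     ["O"],
    "RadiusGroup2": ["C"],
}

def random_walks(graph, walks_per_node=5, walk_len=4, seed=0):
    """Start several fixed-length random walks from every entity."""
    rng = random.Random(seed)
    corpus = []
    for start in graph:
        for _ in range(walks_per_node):
            walk = [start]
            while len(walk) < walk_len:
                walk.append(rng.choice(graph[walk[-1]]))
            corpus.append(walk)
    return corpus

corpus = random_walks(kg)  # each walk is one "sentence" for skip-gram training
```

Each walk is then treated as a sentence so that entities appearing in similar graph neighborhoods receive similar embedding vectors.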
TRACE architecture
Graph representations for molecules
For molecular graph representation, our approach begins with the original data in SMILES format—a line notation that uses ASCII strings to describe the structure of chemical species. We utilize RDKit for feature computation and the Deep Graph Library to generate molecular graphs. The molecular features primarily comprise attributes of atoms and chemical bonds: each atom is represented by a 63D feature vector encompassing chemical properties such as atom type, chirality, and hybridization state, while each bond is represented by a 6D vector that includes bond type, conjugation status, and ring state (see Table 7).
Table 7.
Feature extraction for atoms and bonds.
| Features | Size | Description |
|---|---|---|
| Nodes | ||
| atom_type_one_hot | 40 | One hot encoding for atom type ([’C’, ’O’, ’N’, ’Ti’, ’S’, ’Cl’, ’Br’, ’I’, ’K’, ’Bi’, ’Al’, ’Mg’, ’Ca’, ’Si’, ’Na’, ’P’, ’F’, ’Pt’, ’B’, ’Li’, ’Gd’, ’As’, ’Tc’, ’Fe’, ’Ga’, ’H’, ’Ag’, ’Ra’, ’La’, ’Sr’, ’Zn’, ’Cu’,’Se’, ’Cr’, ’Au’, ’Hg’, ’Co’, ’Sb’,’Mn’,’Lu’]) |
| atom_valence_one_hot | 7 | One hot encoding for atom valence (0 to 6). |
| atom_num_radical_electrons | 2 | One hot encoding for the number of radical electrons (0, 1). |
| atom_degree_one_hot | 7 | One hot encoding for the degree of the atom (0 to 6). |
| atom_formal_charge | 3 | One hot encoding for formal charge (−1, 0, 1). |
| atom_hybridization_one_hot | 4 | One hot encoding for atom hybridization (SP, SP2, SP3, SP3D). |
| Edges | ||
| bond_type_one_hot | 4 | One hot encoding for bond type (SINGLE, DOUBLE, TRIPLE, AROMATIC) |
| bond_is_conjugated | 1 | Whether the bond is conjugated |
| bond_is_in_ring | 1 | Whether the bond is in a ring |
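The Table 7 layout can be mirrored with a simple concatenation of one-hot blocks; this stdlib sketch (in practice RDKit supplies the attribute values) confirms that the block sizes sum to the stated 63 dimensions:

```python
# Sketch of the Table 7 atom featurization: concatenated one-hot blocks
# giving a 63-dimensional vector (40 + 7 + 2 + 7 + 3 + 4).

ATOM_TYPES = ['C','O','N','Ti','S','Cl','Br','I','K','Bi','Al','Mg','Ca','Si',
              'Na','P','F','Pt','B','Li','Gd','As','Tc','Fe','Ga','H','Ag','Ra',
              'La','Sr','Zn','Cu','Se','Cr','Au','Hg','Co','Sb','Mn','Lu']
HYBRIDIZATIONS = ['SP', 'SP2', 'SP3', 'SP3D']

def one_hot(value, choices):
    return [1 if value == c else 0 for c in choices]

def atom_features(symbol, valence, radicals, degree, charge, hybrid):
    return (one_hot(symbol, ATOM_TYPES)            # 40: atom type
            + one_hot(valence, list(range(7)))     # 7: valence 0..6
            + one_hot(radicals, [0, 1])            # 2: radical electrons
            + one_hot(degree, list(range(7)))      # 7: degree 0..6
            + one_hot(charge, [-1, 0, 1])          # 3: formal charge
            + one_hot(hybrid, HYBRIDIZATIONS))     # 4: hybridization

feat = atom_features('C', 4, 0, 4, 0, 'SP3')       # e.g. an sp3 carbon
```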
Additionally, we incorporate chemical element knowledge embeddings into the atomic feature vectors to create chemically enhanced molecular graphs. Specifically, we add element embeddings from ElementKG—the knowledge graph of chemical elements described above—to the atomic feature vectors. Within TRACE, atomic features interact dynamically through an attention mechanism and are updated by aggregating neighborhood information, thereby enhancing the molecular representation. Consequently, the input for each atom is initialized as a 196D vector that integrates both structural and chemical knowledge.
Graph transformer for feature extraction module
First, we initialize the graph transformation process by preparing the input node and edge embeddings. For each node $i$ in a graph $G$, node features are represented as $\alpha_i \in \mathbb{R}^{d_n}$, and for each edge between nodes $i$ and $j$, edge features are $\beta_{ij} \in \mathbb{R}^{d_e}$. These node and edge features are then subjected to two independent linear projections, transforming them into $d$-dimensional hidden representations, denoted as $h_i^{0}$ for nodes and $e_{ij}^{0}$ for edges:

$$h_i^{0} = W_h^{0}\,\alpha_i + b_h^{0}, \qquad e_{ij}^{0} = W_e^{0}\,\beta_{ij} + b_e^{0} \tag{1}$$

where $W_h^{0} \in \mathbb{R}^{d \times d_n}$, $W_e^{0} \in \mathbb{R}^{d \times d_e}$, and $b_h^{0}, b_e^{0} \in \mathbb{R}^{d}$ are the weights and biases of the linear projection layers. The overall architecture of the graph transformer layer, depicted in Fig. 1B, primarily relies on a modified multihead self-attention (MHA) mechanism to update the node and edge features of each graph. Taking the $\ell$th layer as an example, the hidden representations of nodes and edges, $h_i^{\ell}$ and $e_{ij}^{\ell}$, are first normalized through a batch normalization layer. Subsequently, within the MHA mechanism, the corresponding queries ($q$), keys ($k$), values ($v$), and the hidden representations of the edges ($\hat{e}$) are computed for each head $k = 1, \dots, H$:

$$q_i^{k,\ell} = W_Q^{k,\ell}\, h_i^{\ell} \tag{2}$$

$$k_j^{k,\ell} = W_K^{k,\ell}\, h_j^{\ell} \tag{3}$$

$$v_j^{k,\ell} = W_V^{k,\ell}\, h_j^{\ell} \tag{4}$$

$$\hat{e}_{ij}^{k,\ell} = W_E^{k,\ell}\, e_{ij}^{\ell} \tag{5}$$

where $W_Q^{k,\ell}, W_K^{k,\ell}, W_V^{k,\ell}, W_E^{k,\ell} \in \mathbb{R}^{d_k \times d}$, $H$ denotes the number of attention heads, $d_k$ represents the dimension of each head, which equals $d$ divided by $H$, and $\mathcal{N}(i)$ represents the neighboring nodes of node $i$. The self-attention mechanism calculates the attention weight between queries and keys, modulated by the edge representations, and then multiplies by the values. The output attention scores are formulated as

$$w_{ij}^{k,\ell} = \operatorname*{softmax}_{j \in \mathcal{N}(i)}\bigl(\hat{w}_{ij}^{k,\ell}\bigr), \qquad \hat{w}_{ij}^{k,\ell} = \left(\frac{q_i^{k,\ell} \cdot k_j^{k,\ell}}{\sqrt{d_k}}\right) \odot \hat{e}_{ij}^{k,\ell} \tag{6}$$

For numerical stability, the terms inside the softmax are clamped to values between −5 and +5 before exponentiation. Then we compute the intermediate representations for the nodes and edges:

$$\hat{h}_i^{\ell} = O_h^{\ell}\, \Big\Vert_{k=1}^{H}\Bigl(\textstyle\sum_{j \in \mathcal{N}(i)} w_{ij}^{k,\ell}\, v_j^{k,\ell}\Bigr) \tag{7}$$

$$\hat{e}_{ij}^{\ell} = O_e^{\ell}\, \Big\Vert_{k=1}^{H}\bigl(\hat{w}_{ij}^{k,\ell}\bigr) \tag{8}$$

where $O_h^{\ell}, O_e^{\ell} \in \mathbb{R}^{d \times d}$ and $\Vert$ denotes concatenation. The outputs $\hat{h}_i^{\ell}$ and $\hat{e}_{ij}^{\ell}$ are then passed to separate feed-forward networks, preceded and succeeded by residual connections and normalization layers:

$$h_i^{\ell+1} = \operatorname{Norm}\bigl(\tilde{h}_i^{\ell} + \operatorname{FFN}_h(\tilde{h}_i^{\ell})\bigr), \qquad \tilde{h}_i^{\ell} = \operatorname{Norm}\bigl(h_i^{\ell} + \hat{h}_i^{\ell}\bigr) \tag{9}$$

$$e_{ij}^{\ell+1} = \operatorname{Norm}\bigl(\tilde{e}_{ij}^{\ell} + \operatorname{FFN}_e(\tilde{e}_{ij}^{\ell})\bigr), \qquad \tilde{e}_{ij}^{\ell} = \operatorname{Norm}\bigl(e_{ij}^{\ell} + \hat{e}_{ij}^{\ell}\bigr) \tag{10}$$

where $\tilde{h}_i^{\ell}$ and $\tilde{e}_{ij}^{\ell}$ denote intermediate representations. The node representations obtained from the final layer of the graph transformer are passed on to downstream tasks for predicting DDIs.
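To make the edge-modulated attention in Eqs (2)–(8) concrete, the following single-head numpy sketch uses random stand-in weights on a small dense graph; the real model learns these parameters, runs multiple heads, and applies the output projections, residual connections, and feed-forward stages:

```python
import numpy as np

# Single-head sketch of the modified attention: scaled dot-product scores
# are elementwise-modulated by projected edge features, clamped to [-5, 5]
# for numerical stability, then softmaxed over neighbors. All weights are
# random stand-ins for learned parameters.

rng = np.random.default_rng(0)
n, d = 4, 8                                    # nodes, hidden dimension
h = rng.normal(size=(n, d))                    # node hidden states
e = rng.normal(size=(n, n, d))                 # edge hidden states (dense graph)
Wq, Wk, Wv, We = (0.1 * rng.normal(size=(d, d)) for _ in range(4))

q, k, v = h @ Wq, h @ Wk, h @ Wv               # queries, keys, values
e_hat = e @ We                                 # projected edge features

scores = (q @ k.T) / np.sqrt(d)                # (n, n) dot-product term
scores = scores[..., None] * e_hat             # edge modulation -> (n, n, d)
scores = np.clip(scores, -5.0, 5.0)            # stability clamp
w = np.exp(scores)
w = w / w.sum(axis=1, keepdims=True)           # softmax over neighbors j

h_out = np.einsum('ijd,jd->id', w, v)          # aggregated node update
e_out = scores                                 # pre-softmax scores feed the
                                               # edge update
```

In TRACE each layer runs several such heads in parallel, concatenates their outputs, and applies the normalization and feed-forward stages described above.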
Interaction prediction module
Given a set of all drugs $\mathcal{D}$, consider a pair of drugs $(d_x, d_y)$, where $d_x, d_y \in \mathcal{D}$. The enhanced molecular graphs $G_x$ and $G_y$ are processed using graph transformer modules to extract their feature representations $z_x$ and $z_y$:

$$z_x = \operatorname{GraphTransformer}\bigl(A_x, B_x\bigr) \tag{11}$$

$$z_y = \operatorname{GraphTransformer}\bigl(A_y, B_y\bigr) \tag{12}$$

where $A$ and $B$ denote the chemically augmented atom and bond features, respectively. The final interaction probability is computed via MLP projection:

$$\hat{p}_{xy} = \sigma\bigl(\operatorname{MLP}([z_x \,\Vert\, z_y])\bigr) \tag{13}$$

We formulate the cross-entropy loss $\mathcal{L}$ for all DDI pairs:

$$\mathcal{L} = -\sum_{(x,y) \in \mathcal{P}} \bigl[\, y_{xy} \log \hat{p}_{xy} + (1 - y_{xy}) \log\bigl(1 - \hat{p}_{xy}\bigr) \bigr] \tag{14}$$

where $\mathcal{P}$ denotes the drug pair set and $y_{xy}$ is the ground-truth label.
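The cross-entropy objective in Eq. (14) reduces to a standard binary cross-entropy summed over drug pairs; a stdlib sketch with made-up pair probabilities:

```python
import math

# Binary cross-entropy over DDI pairs, as in Eq. (14).
# Each pair carries a predicted interaction probability and a 0/1 label;
# the example values below are illustrative only.

def bce_loss(pairs):
    """pairs: iterable of (predicted_probability, ground_truth_label)."""
    return -sum(y * math.log(p) + (1 - y) * math.log(1 - p) for p, y in pairs)

loss = bce_loss([(0.9, 1), (0.2, 0), (0.5, 1)])
```

A maximally uncertain prediction (p = 0.5) contributes log 2 to the loss regardless of the label, while confident correct predictions contribute close to zero.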
Experimental setup
Dataset
To comprehensively evaluate TRACE, we conducted experiments on three widely used binary DDI datasets—ZhangDDI [39], ChChMiner [40], and DeepDDI [41]—as well as the DrugBank dataset [58] (version 5.0) for multi-class interaction-type prediction. Table 8 summarizes the core statistics of all datasets.
Table 8.
Summary of datasets used in this study.
| Dataset | #Drugs | #Pairs | Pos:Neg | Task | #Labels |
|---|---|---|---|---|---|
| ZhangDDI | 548 | 48 548 | 1:1 | Binary | 2 |
| ChCh-Miner | 1322 | 48 514 | 1:1 | Binary | 2 |
| DeepDDI | 1704 | 191 511 | 1:1 | Binary | 2 |
| DrugBank | 1704 | 191 400 | – | Multi-class | 86 |
Binary classification datasets
For the binary prediction task, we follow prior work such as MIRACLE [27] and CMRL [12] to construct negative samples by randomly sampling drug pairs that do not appear in the positive set. We ensure the following:
a fixed 1:1 positive-to-negative ratio,
no overlap between negative and positive pairs,
the same negative sample set is used for all baselines to guarantee fairness.
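The negative-sampling protocol above can be sketched as follows (drug identifiers and the positive set are illustrative; order within a pair is ignored when checking overlap):

```python
import random

# 1:1 negative sampling: draw drug pairs absent from the positive set
# until the negative count matches the positive count.

def sample_negatives(drugs, positives, seed=42):
    rng = random.Random(seed)
    pos = {frozenset(p) for p in positives}      # unordered pairs
    negs = set()
    while len(negs) < len(pos):
        a, b = rng.sample(drugs, 2)              # two distinct drugs
        pair = frozenset((a, b))
        if pair not in pos:                      # no overlap with positives
            negs.add(pair)
    return [tuple(sorted(p)) for p in negs]

drugs = [f"D{i}" for i in range(20)]
positives = [("D0", "D1"), ("D2", "D3"), ("D4", "D5")]
negatives = sample_negatives(drugs, positives)   # fixed seed -> same set for all baselines
```

Fixing the seed yields one shared negative set across all baselines, as required for fair comparison.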
Experiments are conducted under both transductive and inductive settings.
In the transductive setting, each dataset was randomly split into training, validation, and test sets with a ratio of 6:2:2, ensuring that drugs appearing in the test set were also present in the training set. For each dataset, experiments were repeated five times with different random seeds, and we report the mean and standard deviation of the results. The best test performance was recorded when the model achieved the highest performance on the validation set.
In the inductive setting, we focused on the DeepDDI dataset and evaluated the model’s generalization ability to novel drugs that were not included in the training set. Specifically, let $\mathcal{D}$ denote the set of all drugs in DeepDDI. We randomly partitioned $\mathcal{D}$ into five mutually exclusive and approximately equal-sized subsets $\mathcal{D}_1, \dots, \mathcal{D}_5$, where $\mathcal{D}_i \cap \mathcal{D}_j = \emptyset$ for all $i \neq j$. For the $i$th fold ($i = 1, \dots, 5$), we designate $\mathcal{D}_i$ as the test drug set and $\mathcal{D} \setminus \mathcal{D}_i$ as the training drug set. For each fold, we construct the following drug pair datasets:

Training set: drug pairs $(d_x, d_y)$ where $d_x, d_y \in \mathcal{D} \setminus \mathcal{D}_i$.

s1 set (single-unseen set): drug pairs $(d_x, d_y)$ where $d_x \in \mathcal{D}_i$ or $d_y \in \mathcal{D}_i$, but not both.

s2 set (double-unseen set): drug pairs $(d_x, d_y)$ where $d_x, d_y \in \mathcal{D}_i$.
During each fold, the model was trained exclusively on drug pairs in which both drugs belonged to the training set. Evaluation was then performed on drug pairs where at least one or both drugs were from the test set, thereby measuring the model’s ability to generalize to unseen drugs. This process was repeated five times, with each subset serving as the test set once. The final performance was reported as the mean and standard deviation across the five folds.
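The fold construction can be sketched as follows; drug IDs and the all-pairs universe are illustrative stand-ins for DeepDDI:

```python
import random
from itertools import combinations

# Inductive split: partition drugs into disjoint folds, then route each
# pair to train / s1 (exactly one unseen drug) / s2 (both unseen)
# relative to the held-out fold.

def inductive_split(drugs, pairs, fold, n_folds=5, seed=0):
    shuffled = drugs[:]
    random.Random(seed).shuffle(shuffled)
    test_drugs = set(shuffled[fold::n_folds])    # ~equal-sized disjoint folds
    train, s1, s2 = [], [], []
    for a, b in pairs:
        unseen = (a in test_drugs) + (b in test_drugs)   # 0, 1, or 2
        (train, s1, s2)[unseen].append((a, b))
    return train, s1, s2

drugs = [f"D{i}" for i in range(10)]
pairs = list(combinations(drugs, 2))             # all 45 unordered pairs
train, s1, s2 = inductive_split(drugs, pairs, fold=0)
```

Iterating `fold` over 0–4 reproduces the five-fold protocol, with each drug subset serving as the unseen set exactly once.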
Multi-class classification dataset
For the task of predicting interaction types, we utilized the widely recognized DrugBank dataset [58] (version 5.0), which comprises 191 808 DDI records involving 1706 drugs and 86 interaction types. The dataset was divided into training, validation, and test sets with a 6:2:2 split ratio. Experiments were repeated five times with different random seeds, and we report the mean and standard deviation of the results. The best test performance was recorded when the model achieved the highest performance on the validation set.
Implementation
During training, we first use a linear projection to map the initial atomic and bond features into a 128D hidden space, which is then fed into the graph transformer module. The graph transformer consists of eight layers: the first seven use the multi-head self-attention mechanism to update node and edge features simultaneously, while the final layer refines node features only. Each layer comprises the following key components: first, a parallelized multi-head attention computation, in which four heads operate in parallel, allowing the model to capture different aspects of the data simultaneously; second, a BatchNorm layer that normalizes the attention outputs; and finally, a feed-forward network (input and output dimensions of 128, inner dimension of 256) with SiLU activation, a 0.1 dropout rate, and a second normalization layer.
We train the model on the training set and implement an early stopping mechanism based on the performance on the validation set. The model is trained using the Adam optimizer with a batch size of 1024 and a learning rate of 0.001. TRACE is implemented in PyTorch and runs on a Linux server equipped with NVIDIA A10 GPUs.
Baseline
We compare TRACE with 14 state-of-the-art DDI prediction models, covering GNN-based approaches, contrastive representation learning methods, causal-relational models, and KG-enhanced architectures.
To ensure strict fairness and reproducibility, we adopt the following measures:
Identical dataset splits. All baseline models are trained and evaluated on exactly the same training/validation/test splits as TRACE, including the transductive and inductive settings.
Consistent negative sampling. For binary tasks, all baselines use the same fixed 1:1 positive-to-negative sampling set constructed for TRACE.
Hyperparameter configurations. Hyperparameters strictly follow the settings reported in the original publications or the default values provided in their official implementations.
Unified training protocol. All baselines are trained using the same early-stopping strategy (patience = 20) and evaluated under identical metrics.
Multiple runs. Each experiment is repeated five times with different random seeds, and we report mean ± standard deviation.
These measures ensure that all experimental comparisons are conducted under an identical and reproducible protocol.
GCN [26]: a graph convolutional network (GCN) is employed to perform semi-supervised node classification tasks. We use the GCN to encode drug molecular graphs and utilize their representations as a basis for prediction.
GAT [25]: a graphical attention network (GAT) is used to learn node embeddings through an attentional mechanism on graphs. We utilize GAT to obtain drug embeddings for DDI prediction.
MPNN [24]: it achieves an understanding of complex structures by propagating information through graphs. We utilize MPNN to obtain drug embeddings for predicting DDIs.
MIRACLE [27]: a method that uses multi-view contrastive representation learning to predict DDIs, capturing both inter-molecular structures across views and intra-molecular interactions within views.
DSN-DDI [13]: it learns substructures of drug pairs from both internal and external dual views for drug characterization, enabling the prediction of DDIs.
SSI-DDI [28]: it uses GNNs to extract features, treating each node’s hidden representation as a substructure, and calculates the impact of these substructure interactions on the final DDI prediction.
CMRL [12]: a relational learning approach that leverages causal relationships to learn interactions between pairs of molecules.
CGIB [11]: a relational learning method based on graph information bottleneck to learn interactions between pairs of molecules.
DeepDDI [41]: a similarity-based approach that designs feature structure similarity curves for each drug based on its molecular fingerprint. It then inputs the similarity features of the drugs into an MLP for prediction.
DeepWalk [59]: a method for learning latent representations of network vertices involves generating node sequences through random walks to perform node representation. We utilize DeepWalk to obtain drug representations for predicting DDIs.
KGDDI [29]: a drug prediction method utilizing a CNN-LSTM-based neural network, which integrates the dataset into a knowledge graph (KG) and embeds nodes using the ComplEx embedding technique.
KGNN [30]: a drug prediction method based on GCNs selectively aggregates neighbor information with higher level information to learn node representations within a KG.
MUFFIN [31]: a CNN-based deep learning approach for DDI prediction using drug chemical structures and biomedical KGs.
DGNN-DDI [60]: a dual-graph GNN–based method that predicts DDIs through directed message passing and substructure-level interaction modeling.
Attention-guided high-risk substructure extraction
To provide interpretability and identify chemically meaningful regions associated with DDIs, TRACE extracts high-risk substructures in two steps.
(1) Candidate substructure generation. We adopt a chemistry-inspired molecular substructure decomposition algorithm based on BRICS fragmentation, followed by rule-based refinement [42]. The algorithm first applies BRICS cleavage rules to break chemically meaningful bonds and generate initial fragments. It then refines these fragments by additionally splitting ring–non-ring bonds and isolating highly connected non-ring atoms, producing compact and chemically coherent motifs. When applied across the ChEMBL dataset, this procedure yields a vocabulary of 12 331 unique structural motifs [61]. A schematic illustration of this substructure decomposition process is provided in Supplementary Fig. S1. In TRACE, we reuse this motif vocabulary and apply the same fragmentation procedure to molecules in our DDI datasets, enabling each molecule to be mapped to a standardized set of chemically meaningful motifs.
(2) Attention-based scoring. For each molecule, TRACE produces an atom-level attention vector. The attention score of a candidate substructure is computed as the sum of attention values of its constituent atoms; an average score is obtained by normalizing by substructure size. Substructures with the highest average attention scores are identified as high-risk substructures, representing regions most emphasized by the model.
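The scoring step can be sketched as follows; the attention values and the motif-to-atom index mapping are illustrative:

```python
# Attention-based substructure scoring: sum each candidate fragment's
# atom-level attention, normalize by fragment size, and rank.

def rank_substructures(atom_attention, substructures, top_k=2):
    """substructures: {motif_name: [atom indices]} -> top-k motifs by mean attention."""
    scored = {name: sum(atom_attention[i] for i in idx) / len(idx)
              for name, idx in substructures.items()}
    return sorted(scored, key=scored.get, reverse=True)[:top_k]

# Hypothetical atom-level attention vector and motif decomposition.
attention = [0.05, 0.30, 0.25, 0.10, 0.02, 0.28]
motifs = {"quinoline": [1, 2, 5], "alkylamine": [0, 3], "ring_tail": [4]}
top = rank_substructures(attention, motifs)   # highest-risk motifs first
```

Normalizing by fragment size prevents large fragments from dominating the ranking simply by containing more atoms.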
This approach provides chemically interpretable substructures that align with functional groups or reactive atomic clusters, while remaining computationally efficient and grounded in the model’s learned attention distribution.
Key Points
Chemical knowledge integration and interpretable graph transformer architecture: TRACE integrates element-level chemical knowledge through ElementKG and utilizes a self-attention–based Graph Transformer to jointly capture local and global structural dependencies. This combination enriches molecular representations with chemically grounded semantics and enhances predictive reliability, providing actionable guidance for early-stage compound optimization and DDI liability assessment.
Robust generalization and benchmarking: TRACE consistently outperforms state-of-the-art models in both transductive and inductive settings, demonstrating strong generalization to unseen drugs. This robustness enables reliable evaluation of novel compounds and supports clinical decision-making in scenarios where empirical evidence of DDIs is sparse or unavailable.
High-risk substructure identification: through attention-guided interpretability coupled with substructure decomposition, TRACE identifies chemically meaningful fragments—such as quinoline, thiophene, alkylamines, and methylenedioxyphenyl groups—that are known contributors to DDI mechanisms. These insights help medicinal chemists locate metabolically vulnerable regions within molecular scaffolds and inform rational structural modifications to mitigate interaction risks.
Supplementary Material
Contributor Information
Jinlu Zhang, Pharmaceutical Informatics Institute, College of Pharmaceutical Sciences, Zhejiang University, 866 Yuhangtang Road, Xihu District, Hangzhou 310058, Zhejiang, China; State Key Laboratory of Chinese Medicine Modernization, Innovation Center of Yangtze River Delta, Zhejiang University, Xiangfudang Sci-tech Innovation Green Valley, Jiashan County, Jiaxing 314102, Zhejiang, China.
Xuting Zhang, Pharmaceutical Informatics Institute, College of Pharmaceutical Sciences, Zhejiang University, 866 Yuhangtang Road, Xihu District, Hangzhou 310058, Zhejiang, China; State Key Laboratory of Chinese Medicine Modernization, Innovation Center of Yangtze River Delta, Zhejiang University, Xiangfudang Sci-tech Innovation Green Valley, Jiashan County, Jiaxing 314102, Zhejiang, China.
Yizheng Dai, State Key Laboratory of Chinese Medicine Modernization, Innovation Center of Yangtze River Delta, Zhejiang University, Xiangfudang Sci-tech Innovation Green Valley, Jiashan County, Jiaxing 314102, Zhejiang, China.
Xin Shao, Pharmaceutical Informatics Institute, College of Pharmaceutical Sciences, Zhejiang University, 866 Yuhangtang Road, Xihu District, Hangzhou 310058, Zhejiang, China; State Key Laboratory of Chinese Medicine Modernization, Innovation Center of Yangtze River Delta, Zhejiang University, Xiangfudang Sci-tech Innovation Green Valley, Jiashan County, Jiaxing 314102, Zhejiang, China.
Xiaohui Fan, Pharmaceutical Informatics Institute, College of Pharmaceutical Sciences, Zhejiang University, 866 Yuhangtang Road, Xihu District, Hangzhou 310058, Zhejiang, China; State Key Laboratory of Chinese Medicine Modernization, Innovation Center of Yangtze River Delta, Zhejiang University, Xiangfudang Sci-tech Innovation Green Valley, Jiashan County, Jiaxing 314102, Zhejiang, China.
Competing interests
No competing interest is declared.
Funding
This work was supported by the "Pioneer" and "Leading Goose" R&D Program of Zhejiang (2025C01110, X.S.), and the National Natural Science Foundation of China (U23A20513, X.F.).
Data availability
The source code and data of this work are freely available in the GitHub repository https://github.com/ZJUFanLab/TRACE.
References
- 1. Gilmartin D, O’Mahony D. Polypharmacy and potentially inappropriate prescribing in hospitalised older Irish adults. Age Ageing 2012;41:78. [DOI] [PubMed] [Google Scholar]
- 2. Petrovic M, Van der Cammen T, Onder G. Adverse drug reactions in older people: detection and prevention. Drugs Aging 2012;29:453–62. [DOI] [PubMed] [Google Scholar]
- 3. Scondotto G, Pojero F, Addario SP. et al. The impact of polypharmacy and drug interactions among the elderly population in Western Sicily, Italy. Aging Clin Exp Res 2018;30:81–7. 10.1007/s40520-017-0755-2 [DOI] [PubMed] [Google Scholar]
- 4. Giacomini KM, Krauss RM, Roden DM. et al. When good drugs go bad. Nature 2007;446:975–7. 10.1038/446975a [DOI] [PubMed] [Google Scholar]
- 5. Zhang Y, Deng Z, Xiaoyu X. et al. Application of artificial intelligence in drug–drug interactions prediction: a review. J Chem Inf Model 2023;64:2158–73. [DOI] [PubMed] [Google Scholar]
- 6. Adlung L, Cohen Y, Mor U. et al. Machine learning in clinical decision making. Med 2021;2:642–65. 10.1016/j.medj.2021.04.006 [DOI] [PubMed] [Google Scholar]
- 7. Ouanes K, Farhah N. Effectiveness of artificial intelligence (AI) in clinical decision support systems and care delivery. J Med Syst 2024;48:74. 10.1007/s10916-024-02098-4 [DOI] [PubMed] [Google Scholar]
- 8. Zhang K, Yang X, Wang Y. et al. Artificial intelligence in drug development. Nat Med 2025;31:45–59. 10.1038/s41591-024-03434-4 [DOI] [PubMed] [Google Scholar]
- 9. Lee N, Lee J, Park C. Augmentation-free self-supervised learning on graphs. In: Proceedings of the AAAI Conference on Artificial Intelligence, Vol 36. New York, USA: AAAI Press, 2022, 7372–80. 10.1609/aaai.v36i7.20700 [DOI] [Google Scholar]
- 10. Sun M, Xing J, Wang H. et al. MoCL: Data-driven molecular fingerprint via knowledge-aware contrastive learning from molecular graphs. In: Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining. New York, USA: ACM Press, 2021, 3585–94. [DOI] [PMC free article] [PubMed]
- 11. Lee N, Hyun D, Na GS. et al. Conditional graph information bottleneck for molecular relational learning. In: Koyejo S, Mohamed S, Agarwal A, Belgrave D, Cho K, Oh A (eds.), Proceedings of the 40th International Conference on Machine Learning, Vol 202. Brooklyn, NY, USA: PMLR, 2023, 18852–71. [Google Scholar]
- 12. Lee N, Yoon K, Na GS. et al. Shift-robust molecular relational learning with causal substructure. In: Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery & Data Mining. New York, USA: ACM Press, 2023, 1200–12.
- 13. Li Z, Zhu S, Shao B. et al. DSN-DDI: an accurate and generalized framework for drug–drug interaction prediction by dual-view representation learning. Brief Bioinform 2023;24:bbac597. [DOI] [PubMed] [Google Scholar]
- 14. Chen D, O’Bray L, Borgwardt K. Structure-aware transformer for graph representation learning. In: Chaudhuri K, Salakhutdinov R (eds.), Proceedings of the 39th International Conference on Machine Learning, Vol 162. Brooklyn, NY, USA: PMLR, 2022, 3469–89. [Google Scholar]
- 15. Zonghan W, Pan S, Chen F. et al. A comprehensive survey on graph neural networks. IEEE Trans Neural Networks Learn Syst 2020;32:4–24. 10.1109/TNNLS.2020.2978386 [DOI] [PubMed] [Google Scholar]
- 16. Wu Q, Zhao W, Li Z. et al. NodeFormer: a scalable graph structure learning transformer for node classification. In: Koyejo S, Mohamed S, Agarwal A, Belgrave D, Cho K, Oh A (eds.), Advances in Neural Information Processing Systems, Vol 35. Red Hook, NY, USA: Curran Associates, Inc., 2022, 27387–401. [Google Scholar]
- 17. Dai H, Kozareva Z, Dai B. et al. Learning steady-states of iterative algorithms over graphs. In: Dy J, Krause A (eds.), Proceedings of the 35th International Conference on Machine Learning, Vol 80. Brooklyn, NY, USA: PMLR, 2018, 1106–14. [Google Scholar]
- 18. Dwivedi VP, Bresson X. A generalization of transformer networks to graphs. arXiv preprint arXiv:2012.09699. 2020.
- 19. Morehead A, Chen C, Cheng J. Geometric transformers for protein interface contact prediction. arXiv preprint arXiv:2110.02423. 2021.
- 20. Fang X, Liu L, Lei J. et al. Geometry-enhanced molecular representation learning for property prediction. Nat Mach Intell 2022;4:127–34. 10.1038/s42256-021-00438-4 [DOI] [Google Scholar]
- 21. Ying Z, Bourgeois D, You J. et al. GNNExplainer: generating explanations for graph neural networks. In: Wallach H, Larochelle H, Beygelzimer A, d’Alché-Buc F, Fox E, Garnett R (eds.), Advances in Neural Information Processing Systems, Vol 32. Red Hook, NY, USA: Curran Associates, Inc., 2019, 9240–51. [PMC free article] [PubMed] [Google Scholar]
- 22. Yuan H, Haiyang Y, Gui S. et al. Explainability in graph neural networks: a taxonomic survey. IEEE Trans Pattern Anal Mach Intell 2022;45:5782–99. [DOI] [PubMed] [Google Scholar]
- 23. Fang Y, Zhang Q, Yang H. et al. Molecular contrastive learning with chemical element knowledge graph. In: Proceedings of the AAAI Conference on Artificial Intelligence, Vol 36. New York, USA: AAAI Press, 2022, 3968–76. 10.1609/aaai.v36i4.20313 [DOI] [Google Scholar]
- 24. Gilmer J, Schoenholz SS, Riley PF. et al. Neural message passing for quantum chemistry. In: Precup D, Teh YW (eds.), Proceedings of the 34th International Conference on Machine Learning, Vol 70. Brooklyn, NY, USA: PMLR, 2017, 1263–72. [Google Scholar]
- 25. Veličković P, Cucurull G, Casanova A. et al. Graph attention networks. arXiv preprint arXiv:1710.10903. 2017.
- 26. Kipf TN, Welling M. Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907. 2016.
- 27. Wang Y, Min Y, Chen X. et al. Multi-view graph contrastive representation learning for drug–drug interaction prediction. In: Proceedings of the Web Conference 2021. New York, USA: ACM Press, 2021, 2921–33.
- 28. Nyamabo AK, Hui Y, Shi J-Y. SSI–DDI: substructure–substructure interactions for drug–drug interaction prediction. Brief Bioinform 2021;22:bbab133. [DOI] [PubMed] [Google Scholar]
- 29. Karim MR, Cochez M, Jares JB. et al. Drug-drug interaction prediction based on knowledge graph embeddings and convolutional-lstm network. In: Proceedings of the 10th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics. New York, USA: ACM Press, 2019, 113–23.
- 30. Lin X, Quan Z, Wang Z-J. et al. KGNN: knowledge graph neural network for drug–drug interaction prediction. In: Proceedings of the 29th International Joint Conference on Artificial Intelligence. Yokohama, Japan: International Joint Conferences on Artificial Intelligence Organization, 2020, 2739–45.
- 31. Chen Y, Ma T, Yang X. et al. MUFFIN: multi-scale feature fusion for drug–drug interaction prediction. Bioinformatics 2021;37:2651–8. 10.1093/bioinformatics/btab169 [DOI] [PubMed] [Google Scholar]
- 32. Lin S, Wang Y, Zhang L. et al. MDF-SA-DDI: predicting drug–drug interaction events based on multi-source drug fusion, multi-source feature fusion and transformer self-attention mechanism. Brief Bioinform 2022;23:bbab421. [DOI] [PubMed] [Google Scholar]
- 33. Lin S, Chen W, Chen G. et al. MATT-DDI: predicting multi-type drug-drug interactions via supervised contrastive learning. J Chem 2022;14:81. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 34. Lin S, Mao X, Hong L. et al. MATT-DDI: predicting multi-type drug-drug interactions via heterogeneous attention mechanisms. Methods 2023;220:1–10. 10.1016/j.ymeth.2023.10.007 [DOI] [PubMed] [Google Scholar]
- 35. Chen L, Chu C, Zhang Y-H. et al. Identification of drug–drug interactions using chemical interactions. Current Bioinformatics 2017;12:526–34. 10.2174/1574893611666160618094219 [DOI] [Google Scholar]
- 36. Basak SC. Mathematical descriptors for the prediction of property, bioactivity, and toxicity of chemicals from their structure: a chemical-cum-biochemical approach. Curr Comput Aided Drug Des 2013;9:449–62. [DOI] [PubMed] [Google Scholar]
- 37. He Z, Zhang J, Shi X-H. et al. Predicting drug-target interaction networks based on functional groups and biological features. PloS One 2010;5:e9603. 10.1371/journal.pone.0009603 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 38. Fernández-Torras A, Comajuncosa-Creus A, Duran-Frigola M. et al. Connecting chemistry and biology through molecular descriptors. Curr Opin Chem Biol 2022;66:102090. 10.1016/j.cbpa.2021.09.001 [DOI] [PubMed] [Google Scholar]
- 39. Zhang W, Chen Y, Liu F. et al. Predicting potential drug-drug interactions by integrating chemical, biological, phenotypic and network data. BMC bioinformatics 2017;18:1–12. 10.1186/s12859-016-1415-9 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 40. Zitnik M, Sosič R, Maheshwari S. et al. BioSNAP Datasets: Stanford biomedical network dataset collection. 2018.
- 41. Ryu JY, Kim HU, Lee SY. Deep learning improves prediction of drug–drug and drug–food interactions. Proc Natl Acad Sci 2018;115:E4304–11. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 42. Zhang Z, Liu Q, Wang H. et al. Motif-based graph self-supervised learning for molecular property prediction. In: Ranzato M, Beygelzimer A, Dauphin Y, Liang P, Vaughan JW (eds.), Advances in Neural Information Processing Systems, Vol 34. Red Hook, NY, USA: Curran Associates, Inc., 2021, 15870–82. [Google Scholar]
- 43. Degen J, Wegscheid-Gerlach C, Zaliani A. et al. On the art of compiling and using ’drug-like’ chemical fragment spaces. ChemMedChem 2008;3:1503–7. 10.1002/cmdc.200800178 [DOI] [PubMed] [Google Scholar]
- 44. Tornio A, Filppula AM, Niemi M. et al. Clinical studies on drug–drug interactions involving metabolism and transport: methodology, pitfalls, and interpretation. Clin Pharmacol Ther 2019;105:1345–61. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 45. Cendrós M, Arranz MJ, Torra M. et al. The influence of CYP enzymes and ABCB1 on treatment outcomes in schizophrenia: Association of CYP1A2 activity with adverse effects. J Transl Genet Genomics 2020;4:210–20. [Google Scholar]
- 46. Kamel A, Harriman S. Inhibition of cytochrome P450 enzymes and biochemical aspects of mechanism-based inactivation (MBI). Drug Discov Today Technol 2013;10:e177–89. 10.1016/j.ddtec.2012.09.011 [DOI] [PubMed] [Google Scholar]
- 47. Saeed Mirzaei M, Ivanov MV, Taherpour AA. et al. Mechanism-based inactivation of cytochrome P450 enzymes: computational insights. Chem Res Toxicol 2021;34:959–87. [DOI] [PubMed] [Google Scholar]
- 48. McConn DJ II, Lin YS. et al. Differences in the inhibition of cytochromes P450 3A4 and 3A5 by metabolite-inhibitor complex-forming drugs. Drug Metab Dispos 2004;32:1083–91. 10.1124/dmd.32.10.1083 [DOI] [PubMed] [Google Scholar]
- 49. Murray M. Mechanisms of inhibitory and regulatory effects of methylenedioxyphenyl compounds on cytochrome P450-dependent drug oxidation. Curr Drug Metab 2000;1:67–84. 10.2174/1389200003339270 [DOI] [PubMed] [Google Scholar]
- 50. Shao X, Chen Y, Zhang J. et al. Advancing network pharmacology with artificial intelligence: the next paradigm in traditional Chinese medicine. Chin J Nat Med 2025;23:1358–76. 10.1016/S1875-5364(25)60941-1 [DOI] [PubMed] [Google Scholar]
- 51. Dai Y, Shao X, Zhang J. et al. TCMChat: a generative large language model for traditional Chinese medicine. Pharmacol Res 2024;210:107530. [DOI] [PubMed] [Google Scholar]
- 52. Shao X, Ai N, Donghang X. et al. Exploring the interaction between Salvia miltiorrhiza and human serum albumin: insights from herb–drug interaction reports, computational analysis and experimental studies. Spectrochim Acta A Mol Biomol Spectrosc 2016;161:1–7. 10.1016/j.saa.2016.02.015 [DOI] [PubMed] [Google Scholar]
- 53. Grześk G, Rogowicz D, Wołowiec Ł. et al. The clinical significance of drug–food interactions of direct oral anticoagulants. Int J Mol Sci 2021;22:8531. 10.3390/ijms22168531 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 54. Wang R, Zhang H, Wang Y. et al. Effects of salvianolic acid B and tanshinone IIA on the pharmacokinetics of losartan in rats by regulating the activities and expression of CYP3A4 and CYP2C9. J Ethnopharmacol 2016;180:87–96. 10.1016/j.jep.2016.01.021 [DOI] [PubMed] [Google Scholar]
- 55. Wu WWP, Yeung JHK. Inhibition of warfarin hydroxylation by major tanshinones of danshen (salvia miltiorrhiza) in the rat in vitro and in vivo. Phytomedicine 2010;17:219–26. 10.1016/j.phymed.2009.05.005 [DOI] [PubMed] [Google Scholar]
- 56. Stresser DM, Blanchard AP, Turner SD. et al. Substrate-dependent modulation of CYP3a4 catalytic activity: analysis of 27 test compounds with four fluorometric substrates. Drug Metab Dispos 2000;28:1440–8. [PubMed] [Google Scholar]
- 57. Fang Y, Zhang Q, Zhang N. et al. Knowledge graph-enhanced molecular contrastive learning with functional prompt. Nat Mach Intell 2023;5:542–53. 10.1038/s42256-023-00654-0 [DOI] [Google Scholar]
- 58. Wishart DS, Feunang YD, Guo AC. et al. DrugBank 5.0: a major update to the drugbank database for 2018. Nucleic Acids Res 2018;46:D1074–82. 10.1093/nar/gkx1037 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 59. Perozzi B, Al-Rfou R, Skiena S. DeepWalk: online learning of social representations. In: Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York, USA: ACM Press, 2014, 701–10.
- 60. Ma M, Lei X. A dual graph neural network for drug–drug interactions prediction based on molecular structure and interactions. PLoS Comput Biol 2023;19:e1010812. 10.1371/journal.pcbi.1010812 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 61. Zhong Y, Li G, Yang J. et al. Learning motif-based graphs for drug–drug interaction prediction via local–global self-attention. Nat Mach Intell 2024;6:1094–105. 10.1038/s42256-024-00888-6 [DOI] [Google Scholar]
Data Availability Statement
The source code and data of this work are freely available in the GitHub repository https://github.com/ZJUFanLab/TRACE.