Abstract
The metabolic stability of a drug is a crucial determinant of its pharmacokinetic properties, including clearance, half-life, and oral bioavailability. Accurate predictions of metabolic stability can significantly streamline the drug discovery process. In this study, we present MetaboGNN, an advanced model for predicting liver metabolic stability based on Graph Neural Networks (GNNs) and Graph Contrastive Learning (GCL). Using a high-quality dataset from the 2023 South Korea Data Challenge for Drug Discovery, which comprises 3,498 training molecules and 483 test molecules, we represented molecular structures as graphs to capture the intricate structural relationships that influence metabolic stability. A GCL-driven pretraining step was employed to enhance model generalizability by learning robust, transferable graph-level representations. Notably, incorporating interspecies differences between human liver microsomes (HLM) and mouse liver microsomes (MLM) further improved predictive accuracy, achieving Root Mean Square Error (RMSE) values of 27.91 (HLM) and 27.86 (MLM), both expressed as the percentage of parent compound remaining after a 30-min incubation. Compared to traditional approaches, MetaboGNN demonstrates superior predictive performance and highlights the importance of considering interspecies enzymatic variations. In addition, attention-based analysis identified key molecular fragments associated with metabolic stability, highlighting chemically meaningful structural determinants. These findings establish MetaboGNN as a powerful tool for metabolic stability prediction, supporting more efficient lead optimization processes in drug discovery.
Keywords: Drug metabolism, Liver microsomal stability, Graph neural networks, Graph contrastive learning, Interspecies differences
Scientific contribution
This study presents MetaboGNN, a novel predictive framework for liver metabolic stability that integrates Graph Neural Networks with Graph Contrastive Learning. By explicitly incorporating interspecies metabolic differences as a dedicated learning target, MetaboGNN achieved enhanced prediction accuracy. Attention-based analysis also identified molecular fragments that are strongly associated with stabilizing or destabilizing effects, facilitating chemical insights during lead optimization.
Introduction
Metabolic stability plays a critical role in determining the pharmacokinetic behavior of drugs, including their clearance, half-life, and oral bioavailability [1]. As one of the primary organs of drug metabolism, the liver mediates enzymatic reactions that convert drugs into metabolites for easier excretion [2]. This process can be classified into two phases: Phase I reactions, which involve functionalization processes such as oxidation, reduction, and hydrolysis, and Phase II reactions, which involve conjugation with endogenous molecules like glucuronic acid or sulfate [2, 3]. Understanding and optimizing metabolic stability during the early stages of drug discovery is essential, as it directly influences the selection and refinement of lead compounds [4].
In vitro models, including hepatocytes, hepatic microsomes, and S9 fractions, are indispensable tools for assessing metabolic stability due to their significant correlation with in vivo drug clearance [5]. These models provide critical insights that guide structural modifications, enhancing the pharmacokinetic (PK) profiles of drug candidates to ensure both efficacy and safety [6]. Key parameters measured in these assays include the percentage of the parent compound remaining after 30 min, half-life (t1/2), and intrinsic clearance (CLint), offering valuable data on drug metabolism [7]. Despite these advantages, both in vitro and in vivo methods face significant challenges, including high costs and limited scalability [8]. While in vitro models provide a more feasible option for metabolic stability screening, evaluating large libraries of compounds remains resource-intensive and time-consuming [9]. This highlights the pressing need for innovative, cost-effective, and scalable approaches to streamline metabolic stability screening.
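Under the usual first-order depletion assumption, the percentage of parent compound remaining after a fixed incubation time maps directly onto half-life (t1/2) and intrinsic clearance (CLint). The sketch below shows that conversion; the incubation volume and protein amount are hypothetical example values, not the assay conditions used in this study.

```python
import math

def halflife_from_remaining(pct_remaining, t_min=30.0):
    """Half-life (min) assuming first-order depletion of the parent compound."""
    k = -math.log(pct_remaining / 100.0) / t_min  # elimination rate constant (1/min)
    return math.log(2) / k

def clint_from_remaining(pct_remaining, t_min=30.0,
                         incubation_ml=0.5, protein_mg=0.25):
    """Intrinsic clearance (uL/min/mg protein); volume and protein amount
    here are illustrative assumptions, not this study's assay conditions."""
    k = -math.log(pct_remaining / 100.0) / t_min
    return k * (incubation_ml * 1000.0) / protein_mg

# 50% remaining after a 30-min incubation corresponds to t1/2 = 30 min
print(round(halflife_from_remaining(50.0), 2))  # 30.0
```

As a sanity check, 50% remaining at 30 min gives t1/2 = 30 min exactly, since the half-life equals the sampling time when half the parent compound is depleted.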
With the rapid advancements in artificial intelligence technology, deep learning (DL) is increasingly being applied in the drug discovery process to predict compounds' physicochemical properties and biological activities [10, 11]. Owing to its remarkable ability to capture complex relationships between molecular structures and their experimental values, this technology has been successfully utilized to predict drug-target affinity [12, 13], assess cardiotoxicity [14], and evaluate drug-induced liver injury [15] through specialized models developed for these purposes. In the case of metabolic stability prediction, traditional approaches such as rule-based or QSAR models are commonly employed [16–18]; however, DL-based prediction models remain relatively underdeveloped [19, 20]. This is likely due to the inherent complexity of metabolic pathways and the limited availability of high-quality, comprehensive datasets required to train and validate DL algorithms. Recently, graph neural network (GNN) models such as CMMS-GCL and MS-BACL have applied graph contrastive learning (GCL) to improve molecular representations under limited data conditions [19, 20]. GCL is a self-supervised learning strategy that enhances representation learning by pulling together embeddings of different views of the same molecule while pushing apart those of unrelated molecules [21]. By incorporating contrastive loss as a joint training objective, these models [19, 20] have been shown to enhance prediction performance while also enabling the identification of substructures relevant to metabolic stability. However, since they are trained solely on molecular structure data with single-task labels and contrastive loss, these models have limited access to broader biological or contextual information, which makes it challenging to capture the complex factors that influence metabolic stability in real-world biological systems.
In parallel, traditional machine learning techniques such as random forest have also been employed to develop multi-species prediction models for liver microsomal stability [22, 23]. However, these models typically make parallel predictions for each species without explicitly incorporating interspecies differences into the model design or learning objectives.
The 2023 South Korea Data Challenge for Drug Discovery provided a unique opportunity to address these challenges. Running from August 7 to October 23, 2023, this competition focused on developing regression models to predict NADPH-dependent metabolic stability of compounds in human liver microsomes (HLM) and mouse liver microsomes (MLM). The dataset includes 3,498 training molecules and 483 test molecules with features such as SMILES, AlogP, the number of hydrogen donors and acceptors, and liver microsomal stability measured by liquid chromatography with tandem mass spectrometry (LC–MS/MS). The stability values represent the percentage of the parent compound remaining after a 30-min incubation. HLM or MLM data quantified by LC–MS/MS is a widely used method for assessing drug metabolism and predicting in vivo clearance [24]. Participants in this challenge were tasked with minimizing the prediction error and evaluated using the formula:
Score = (RMSE_HLM + RMSE_MLM) / 2    (1)
where RMSE refers to the Root Mean Square Error, calculated separately for HLM and MLM. The competition drew 1,254 teams to showcase their innovative approaches to this challenging problem. Our model, one of the Top-2 solutions, was built on GNNs [25] and GCL [21], incorporating interspecies differences in metabolic stability into the learning process. It achieved state-of-the-art performance and serves as the foundation for the model MetaboGNN detailed in this article.
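The evaluation can be reproduced with a short script. Averaging the two per-species RMSEs reflects our reading of the challenge metric, since the exact normalization used by the organizers is not restated here.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root Mean Square Error between measured and predicted % remaining."""
    return float(np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)))

def challenge_score(hlm_true, hlm_pred, mlm_true, mlm_pred):
    """Mean of the per-species RMSEs (our interpretation of the leaderboard metric)."""
    return 0.5 * (rmse(hlm_true, hlm_pred) + rmse(mlm_true, mlm_pred))
```

Lower scores are better; a model that predicts HLM and MLM equally well contributes the same RMSE to both terms.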
Learning property differences between molecules, rather than absolute values, has shown promise in understanding structural features affecting various molecular properties [13, 26]. This difference-learning approach is particularly effective with limited datasets, as it can better capture structural patterns from fewer examples [13]. Building on this concept, our approach employed GNNs to represent molecular structures as graphs and captured intricate structural relationships essential for metabolic stability. To further enhance model performance, we adopted GCL as a pretraining strategy, which substantially improves predictive accuracy and generalizability. Finally, we integrated interspecies differences in liver microsomal metabolic stability as a multi-task learning component. This innovative strategy not only boosts predictive accuracy but also provides valuable mechanistic insights into species-specific metabolic variations, a critical aspect of preclinical drug development.
By combining advanced pretraining techniques with multi-task learning, particularly focusing on interspecies metabolic differences, MetaboGNN effectively captures the impact of enzymatic variations on metabolic rates. This integration enables the model to identify how structural properties influence drug metabolism, offering a deeper understanding of compound-specific behaviors. These advancements position MetaboGNN as a robust, interpretable, and practical tool for accelerating drug discovery and optimizing development pipelines.
Results and discussion
Exploratory data analysis of liver microsomal stability data
We used a liver microsomal stability dataset from the 2023 South Korea Data Challenge for Drug Discovery, comprising 3,981 compounds with human and mouse liver microsomal stability (HLM and MLM) data measured by LC–MS/MS. For the challenge, the dataset was split into training (3,498 compounds) and test (483 compounds) sets. Before model training, exploratory data analysis was conducted to examine the distributions of the chemical structures, their key physicochemical properties, and the microsomal stability values. As shown in Fig. 1, kernel density estimation (KDE) plots were utilized to explore the distributions of the data, and the molecular structures represented using Morgan fingerprints were then visualized as a chemical space through principal component analysis (PCA). From these analyses, we confirmed that the molecular structures were evenly distributed between the training and test sets (Fig. 1a, b). In the case of stability data, the KDE plot revealed that MLM values were slightly skewed toward lower values compared to HLM (Fig. 1c, top). To investigate the metabolic differences between HLM and MLM for each compound, we defined HLM–MLM as the difference between the human and mouse liver microsomal stability values (in % remaining). The distribution of HLM–MLM values was then visualized to quantify the interspecies differences in metabolic stability (Fig. 1c, bottom). The HLM–MLM values exhibited a wide distribution, which can be explained by metabolic differences arising from variations in enzyme expression levels and isoform composition between humans and mice [27, 28]. Such differences can cause substantial variability in the metabolism of identical compounds between HLM and MLM.
Fig. 1.
Exploratory data analysis of liver microsomal stability dataset. a Distribution of molecular properties. Kernel density estimation (KDE) plots depict the distributions of key molecular properties, including LogD, AlogP, molecular weight, molecular polar surface area (MPSA), the number of rotatable bonds, the number of hydrogen bond donors, and the number of hydrogen bond acceptors across the datasets (green and orange for training and test sets, respectively). b Principal component analysis (PCA) of molecular features. PCA scatter plots show the balanced distribution of molecular features in the training and test sets. c Distribution of liver microsomal stability and interspecies differences. KDE plots display consistent distributions of the human liver microsomal stability (HLM, %) and the mouse liver microsomal stability (MLM, %) between the training and test sets. The KDE plot of HLM–MLM highlights interspecies variability in metabolic stability, likely influenced by enzymatic differences between humans and mice. d Correlation between molecular features and metabolic stability values. The Pearson correlation matrix shows strong correlations between HLM and MLM while highlighting weak correlations between HLM–MLM and properties such as LogD and MPSA
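The chemical-space projection in Fig. 1b can be sketched with plain numpy PCA. The random binary matrix below is only a stand-in for real 2048-bit Morgan fingerprints, which in practice would come from RDKit (e.g. `GetMorganFingerprintAsBitVect`).

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for Morgan fingerprints (rows: molecules, columns: fingerprint bits);
# real values would be computed with RDKit from the SMILES strings.
X = rng.integers(0, 2, size=(100, 256)).astype(float)

# PCA via SVD: center the matrix, then project onto the top-2 right singular vectors
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
coords = Xc @ Vt[:2].T                      # (n_molecules, 2) chemical-space coordinates
explained = (S[:2] ** 2) / (S ** 2).sum()   # variance explained by the two components
print(coords.shape)  # (100, 2)
```

Plotting `coords` for the training and test sets in different colors reproduces the kind of overlap check shown in Fig. 1b.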
Pearson correlation coefficients were calculated to evaluate the relationships among features, as well as between HLM, MLM, and HLM−MLM (Fig. 1d). The correlation coefficient between HLM and MLM was 0.71, indicating a strong positive correlation. This result suggests that liver microsomal stability is significantly influenced by shared enzymatic pathways in humans and mice [27]. Among the features, LogD and AlogP exhibited the highest correlations, indicating their importance in determining microsomal stability. Note that liver microsomal stability can be influenced not only by metabolic enzymes but also by microsomal membrane permeability [29]. In contrast, the metabolic stability difference between species, represented by HLM–MLM, showed negligible correlations with LogD (− 0.01) and AlogP (− 0.03). This indicates that the interspecies differences in metabolic stability are not significantly influenced by membrane permeability properties such as LogD and AlogP. Instead, these differences appear to arise from enzymatic variations, rather than physicochemical properties, reflecting the complexity of metabolic differences between human and mouse liver microsomes.
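The correlation analysis can be illustrated on synthetic data; the coupling between the toy HLM and MLM values below is arbitrary and only mimics the observed positive correlation, not the real dataset.

```python
import numpy as np

rng = np.random.default_rng(1)
hlm = rng.uniform(0, 100, 500)                  # toy % remaining values (human)
mlm = np.clip(0.7 * hlm + rng.normal(0, 20, 500), 0, 100)  # correlated mouse values
diff = hlm - mlm                                # interspecies difference (HLM - MLM)

# Pearson correlation matrix across HLM, MLM, and HLM - MLM
corr = np.corrcoef(np.vstack([hlm, mlm, diff]))
print(np.round(corr, 2))
```

With real data, the same `np.corrcoef` call over the feature columns plus the three stability quantities yields the matrix visualized in Fig. 1d.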
Ablation study on architecture and pretraining for metabolic stability prediction
This section presents a comprehensive comparison of predictive performance between the proposed MetaboGNN model and other baseline models, including both graph-based architectures [30–33] and pretrained chemical language models (CLMs) [34–36], for liver microsomal stability prediction. All performance metrics reported here are based on the test set. Without pretraining, MetaboGNN (Scratch; Table 1) achieved RMSEs of 30.94 (95% CI 30.52–31.35) for the human dataset and 29.48 (95% CI 29.29–29.67) for the mouse dataset. These RMSE values reflect the prediction error in the same unit as the target, namely the percentage of parent compound remaining after a 30-min incubation. As shown in Table 1, it outperformed all compared graph-based models on the human dataset, although GIN slightly outperformed it on the mouse dataset. This may be due to MetaboGNN’s molecular representation, which encodes atoms, bonds, and ring structures to model intramolecular interactions relevant to metabolic stability [37].
Table 1.
RMSE comparison of MetaboGNN against graph-based baselines and pretrained chemical language models
| Model | Human | Mouse |
|---|---|---|
| MetaboGNN (Scratch) | 30.94 (30.52–31.35) | 29.48 (29.29–29.67) |
| MetaboGNN (GCL) | 30.14 (29.89–30.39) | 28.72 (28.55–28.88) |
| GIN [30] | 31.90 (30.46–33.25) | 29.44 (28.12–30.76) |
| GCN [31] | 31.85 (30.71–33.12) | 29.71 (28.33–30.89) |
| GAT [32] | 32.27 (30.97–33.99) | 29.74 (28.39–31.00) |
| MPNN [33] | 33.09 (31.55–34.80) | 31.20 (30.11–32.48) |
| ChemBERTa [34] | 30.99 (29.35–32.60) | 29.81 (28.24–30.91) |
| MolFormer [35] | 31.14 (29.47–32.52) | 30.56 (29.16–31.87) |
| MolT5 [36] | 32.68 (30.89–34.26) | 32.84 (31.41–34.19) |
The numbers in parentheses represent the 95% confidence interval
To evaluate the relative effectiveness of GCL versus pretrained CLMs in predictive performance, we compared their outcomes in Table 1. MetaboGNN with GCL achieved the lowest RMSE for both human and mouse data, with RMSE values of 30.14 (95% CI 29.89–30.39) and 28.72 (95% CI 28.55–28.88), respectively, representing the best overall performance among all models evaluated. Notably, even without pretraining, MetaboGNN outperformed all CLMs on both datasets, highlighting the effectiveness of graph-based representations for liver microsomal stability prediction. These findings suggest that pretraining based on graph structural information via GCL can be more effective than large-scale pretrained CLMs in predicting liver microsomal stability.
Overall, the findings highlight a consistent performance advantage of graph-based models over CLMs in predicting metabolic stability. This gap likely stems from differences in input representation: CLMs process SMILES strings at the token level [39], which can capture implicit structural patterns but lack the explicit encoding of local chemical environments critical for metabolic transformations. In contrast, graph-based models like MetaboGNN directly model atomic connectivity and substructures. This capability is particularly important for metabolic stability prediction because metabolic transformations such as oxidation and demethylation inherently involve forming and breaking chemical bonds at specific molecular sites. GNNs are invariant to SMILES permutations and emphasize chemically meaningful neighborhoods, whereas CLMs rely on tokenization and sequential order. In addition, the performance gain from GCL-based pretraining suggests that the model has learned generalizable representations over a broad chemical space. While the improvement in RMSE is relatively modest, it suggests that this pretraining process enhances generalization and robustness.
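Contrastive pretraining objectives of the kind referenced here are typically NT-Xent-style losses over two augmented views of each molecule. The numpy sketch below illustrates the idea; it is not the paper's exact implementation, and the temperature value is an illustrative default.

```python
import numpy as np

def nt_xent(z1, z2, tau=0.5):
    """NT-Xent contrastive loss over two augmented views (numpy sketch).

    z1, z2: (n, d) embeddings of two views of the same n molecules.
    Pulls each row of z1 toward its counterpart in z2 and pushes it
    away from all other embeddings in the batch.
    """
    z = np.vstack([z1, z2])
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # cosine-similarity space
    sim = z @ z.T / tau
    n = len(z1)
    np.fill_diagonal(sim, -np.inf)                     # exclude self-similarity
    # the positive pair for row i is its other view (i + n, or i - n)
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])
    log_denom = np.log(np.exp(sim).sum(axis=1))
    return float(np.mean(log_denom - sim[np.arange(2 * n), pos]))
```

Aligned views (identical embeddings) yield a lower loss than unrelated random embeddings, which is the property the pretraining step exploits.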
Incorporating interspecies differences to enhance prediction accuracy
Instead of training separate models for human and mouse metabolic stability, we incorporated interspecies difference learning by adding to the training of all models an auxiliary task that predicts the difference between the two species. This approach significantly improved prediction accuracy. As shown in Table 2, the performance gains were consistent across different deep learning architectures, including both graph-based and language-based models. In particular, incorporating the interspecies difference task into the MetaboGNN (GCL) model reduced the RMSE from 30.14 to 27.91 for the human dataset and from 28.72 to 27.86 for the mouse dataset, achieving the best overall performance.
Table 2.
Impact of interspecies difference task on RMSE across models
| Model | Human | Mouse |
|---|---|---|
| MetaboGNN (Scratch) + Interspecies Difference Task | 28.82 (27.44–30.41) | 28.17 (26.46–29.58) |
| MetaboGNN (GCL) + Interspecies Difference Task | 27.91 (27.76–28.06) | 27.86 (27.69–28.03) |
| GIN [30] + Interspecies Difference Task | 29.46 (28.00–31.12) | 28.66 (27.28–30.60) |
| GCN [31] + Interspecies Difference Task | 30.25 (28.96–31.46) | 29.55 (28.01–31.13) |
| GAT [32] + Interspecies Difference Task | 29.99 (28.71–31.48) | 29.37 (27.85–30.73) |
| MPNN [33] + Interspecies Difference Task | 30.72 (29.35–32.03) | 30.75 (29.05–32.62) |
| ChemBERTa [34] + Interspecies Difference Task | 29.11 (27.75–31.13) | 29.13 (27.70–30.72) |
| MolFormer [35] + Interspecies Difference Task | 29.94 (27.95–31.41) | 29.81 (28.33–31.19) |
| MolT5 [36] + Interspecies Difference Task | 30.64 (29.27–32.06) | 31.70 (30.22–33.71) |
The numbers in parentheses represent the 95% confidence interval
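The joint objective used in these experiments can be sketched as a sum of per-task MSE terms over HLM, MLM, and the HLM − MLM difference head. The equal task weighting below is an illustrative assumption, not the paper's reported setting.

```python
import numpy as np

def multitask_loss(pred_hlm, pred_mlm, pred_diff, true_hlm, true_mlm, w_diff=1.0):
    """Joint MSE over HLM, MLM, and the auxiliary HLM - MLM difference head.

    The difference target is derived from the two species labels, so no
    extra annotation is needed. Equal weighting (w_diff=1.0) is illustrative.
    """
    mse = lambda a, b: float(np.mean((np.asarray(a, float) - np.asarray(b, float)) ** 2))
    return (mse(pred_hlm, true_hlm)
            + mse(pred_mlm, true_mlm)
            + w_diff * mse(pred_diff, np.asarray(true_hlm, float) - np.asarray(true_mlm, float)))
```

Because the auxiliary target is a deterministic function of the two labels, the extra term costs nothing at annotation time but forces the shared representation to encode species-specific metabolic behavior.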
To further investigate the impact of learning interspecies differences, the dataset was divided into two groups based on the absolute values of interspecies differences, i.e., |HLM − MLM|. Within each group, a paired t-test was conducted to assess the performance improvement after incorporating interspecies difference learning, and a two-sided t-test was performed to compare the magnitude of improvement between the two groups. In the group with larger |HLM − MLM| (range: 16.12–99.89), the model's performance improvement was particularly pronounced (Fig. 2a, p < 0.001), whereas the improvement in the group with smaller |HLM − MLM| (range: 0.00–16.12) was not statistically significant (p ≥ 0.05). Furthermore, the performance improvement in the larger |HLM − MLM| group was significantly greater than that in the smaller |HLM − MLM| group (p < 0.001). These results suggest that the improved predictive performance is driven by its ability to learn interspecies metabolic differences for compounds with significant HLM and MLM variations, rather than for those with similar values.
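The test statistics used in this analysis can be sketched in numpy; computing p-values would additionally require the t distribution (e.g. via `scipy.stats`), which is omitted here to keep the sketch self-contained.

```python
import numpy as np

def paired_t(a, b):
    """Paired t statistic, e.g. per-compound error before vs. after the
    interspecies difference task (same compounds in both conditions)."""
    d = np.asarray(a, float) - np.asarray(b, float)
    return float(d.mean() / (d.std(ddof=1) / np.sqrt(len(d))))

def welch_t(a, b):
    """Welch's two-sample t statistic (unequal variances), e.g. comparing
    improvement magnitudes between the high and low |HLM - MLM| groups."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    return float((a.mean() - b.mean())
                 / np.sqrt(a.var(ddof=1) / len(a) + b.var(ddof=1) / len(b)))
```

The paired statistic is appropriate within a group because the same compounds are scored by both model variants; the Welch statistic handles the between-group comparison without assuming equal variances.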
Fig. 2.
Performance comparison between baseline models and the models incorporating interspecies differences across key features. Comparison between the high and low values of (a) |HLM–MLM|, (b) molecular weight (MW), (c) molecular polar surface area (MPSA), (d) the number of rotatable bonds, (e) the number of hydrogen bond donors, and (f) the number of hydrogen bond acceptors. For features (a–f), high–low group differences were evaluated using Welch’s two-sided t-test, while within-group differences were assessed using a paired t-test. For features (g) LogD and (h) AlogP, pairwise comparisons between low, medium, and high bins were conducted using Welch’s t-test with Holm correction, and within-group differences were assessed using a paired t-test. Incorporating interspecies differences markedly reduced prediction error, particularly for compounds with large |HLM − MLM| values, low MW, and low MPSA. For rotatable bonds, hydrogen bond donors, and acceptors, no notable differences emerged between groups, though both improved. For LogD and AlogP, performance improvements were evident in the low and medium groups, with no clear difference between them
Subsequently, we analyzed features closely related to membrane permeability, including LogD, AlogP, molecular weight (MW), and molecular polar surface area (MPSA) [41]. MW and MPSA were divided at their median values to compare prediction performance between low- and high-value groups. The same statistical procedure was applied, with paired t-tests conducted within each group and two-sided t-tests used to compare improvements between groups. Compounds with lower MW and MPSA values showed significantly greater performance improvements than those in the higher groups (Fig. 2b, c, p < 0.05 and p < 0.01, respectively), with the lower groups themselves exhibiting significant improvements (p < 0.001). For the number of rotatable bonds, hydrogen bond donors, and hydrogen bond acceptors, the improvements did not differ significantly between the low- and high-number groups (Fig. 2d–f, p ≥ 0.05), although both groups showed significant performance gains. LogD and AlogP were grouped using quantile-based binning into three categories: Low (bottom 25%), Medium (middle 50%), and High (top 25%). Because compounds with medium values often achieve a favorable balance between permeability and solubility, the medium group was included in the analysis; this setup enabled a more nuanced performance evaluation while maintaining sufficient sample sizes across groups. For LogD (Fig. 2g) and AlogP (Fig. 2h), pairwise comparisons between the low, medium, and high bins were conducted using Welch's t-test with Holm correction, and within-group differences were assessed using paired t-tests. Performance improvements were most pronounced in the low and medium groups (p < 0.001), and pairwise comparisons revealed that these improvements were significantly greater than those in the high group (p < 0.001 and p < 0.01, respectively). However, no significant difference in performance gains was observed between the low and medium groups (p ≥ 0.05).
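The 25/50/25 quantile binning used for LogD and AlogP can be sketched as follows (numpy's default linear interpolation between order statistics is assumed):

```python
import numpy as np

def bin_low_med_high(x):
    """Assign Low (bottom 25%), Medium (middle 50%), High (top 25%) labels
    based on the empirical 25th and 75th percentiles of the feature."""
    x = np.asarray(x, dtype=float)
    q1, q3 = np.quantile(x, [0.25, 0.75])
    return np.where(x < q1, "Low", np.where(x <= q3, "Medium", "High"))

print(bin_low_med_high([1, 2, 3, 4, 5, 6, 7, 8]))
```

Each bin then serves as a group for the paired within-group tests and the Welch between-group comparisons described above.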
Taken together, compounds with larger |HLM − MLM| differences exhibited greater improvements in predictive performance, and the effect was more pronounced in groups with favorable drug-like properties, such as better permeability and solubility. Based on these findings, we speculated that learning interspecies differences improves performance by accounting for metabolic stability changes after compounds cross the microsome membrane, capturing differences in enzymatic landscapes between species. Such differences are well-documented; for example, in humans, a small set of CYP genes is responsible for most drug metabolism, whereas mice possess a much larger repertoire, leading to substantial disparities in substrate specificity, metabolic rates, and metabolite profiles [42].
Performance comparison
To evaluate our model's classification performance, we compared its results against two established models, PredMS [43] and MS-BACL [20]. PredMS is a random forest model designed for predicting metabolic stability [43], while MS-BACL [20], similar to our approach, enhances molecular structure interaction modeling and employs GCL, contributing to improved stability prediction in human datasets. As these models [20, 43] produce binary predictions (stable or unstable), we converted our continuous HLM output into binary classes using a threshold of 50% of parent compound remaining after 30 min. The Receiver Operating Characteristic (ROC) and Precision-Recall curves showed that our model performed on par with or better than the existing approaches across almost all classification metrics (Table 3), with the corresponding AUROC and AUPRC curves illustrated in Fig. 3. Notably, our model achieved a Matthews correlation coefficient (MCC) of 0.4781, substantially higher than MS-BACL (0.4155) and PredMS (0.3425). Since MCC weighs both positive and negative outcomes proportionally, it provides a more balanced measure of performance, making it particularly suitable for evaluating the reliability of binary classification models even on imbalanced datasets [44]. This indicates the robustness of our model for predicting metabolic stability in the presence of class imbalance and suggests its practical applicability in drug discovery. The performance advantage of MetaboGNN over MS-BACL arises from several key differences. While MS-BACL introduces GCL jointly with human stability prediction, MetaboGNN is pretrained on a broader chemical space, enabling more generalizable molecular representations. Unlike prior studies that embed contrastive loss into classification-focused objectives [19, 20], our model combines HLM and MLM losses and explicitly incorporates interspecies differences as a prediction target. This design leverages real interspecies differences instead of relying solely on augmented structural views, providing more informative and biologically grounded supervision. As a result, our model integrates richer task-specific information into training, leading to more accurate predictions of metabolic stability.
Table 3.
Performance comparison for binary classification of metabolic stability
| Metric | MetaboGNN | MS-BACL [20] | PredMS [43] |
|---|---|---|---|
| AUROC | 0.8137 (0.7813–0.8476) | 0.7892 (0.7561–0.8207) | 0.7425 (0.7049–0.7775) |
| AUPRC | 0.8536 (0.8162–0.8894) | 0.8434 (0.8071–0.8769) | 0.8094 (0.7672–0.8487) |
| Accuracy | 0.7474 (0.7101–0.7888) | 0.7164 (0.6770–0.7557) | 0.6915 (0.6460–0.7288) |
| Precision | 0.7943 (0.7482–0.8419) | 0.7706 (0.7218–0.8172) | 0.7112 (0.6616–0.7578) |
| Recall | 0.7778 (0.7308–0.8241) | 0.7465 (0.6976–0.7959) | 0.8125 (0.7621–0.8566) |
| F1-score | 0.7860 (0.7500–0.8237) | 0.7584 (0.7203–0.7968) | 0.7585 (0.7174–0.7931) |
| Matthews correlation coefficient (MCC) | 0.4781 (0.4002–0.5593) | 0.4155 (0.3353–0.4957) | 0.3425 (0.2543–0.4218) |
The numbers in parentheses represent the 95% confidence interval
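The binarization and MCC computation used for this comparison can be sketched as follows. The labeling direction (≥ 50% remaining counted as stable) is our assumption based on the stated threshold.

```python
import numpy as np

def binarize_stable(pct_remaining, threshold=50.0):
    """Label compounds stable (1) when >= threshold% of parent remains
    after the 30-min incubation; the >= direction is assumed here."""
    return (np.asarray(pct_remaining, float) >= threshold).astype(int)

def mcc(y_true, y_pred):
    """Matthews correlation coefficient from the 2x2 confusion matrix."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    denom = np.sqrt(float((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)))
    return float(tp * tn - fp * fn) / denom if denom else 0.0
```

Unlike accuracy, the MCC stays near zero for a classifier that ignores the minority class, which is why it is the headline metric for this imbalanced comparison.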
Fig. 3.
Performance comparison of MetaboGNN with existing methods. a The Receiver Operating Characteristic (ROC) curve analysis. ROC curves show that MetaboGNN outperforms previous methods in distinguishing true positives from false positives as evidenced by its higher AUROC (> 0.8). b Precision-Recall (PR) curve analysis. PR curves demonstrate that MetaboGNN effectively balances precision and recall, showing enhanced or comparable performance (AUPRC > 0.85) in detecting true positives while minimizing false positives in a binary classification context
Visual analysis of bond importance for metabolic stability
To understand the features that the GNN model considers critical for predicting the metabolic stability of molecules, we extracted edge importance scores using EdgeSHAPer [45] and analyzed their contributions using effect size (Cohen's d) [46] for each bond in the test set (Fig. 4). The edge importance scores were visualized as bond highlights on the molecular graphs, allowing us to evaluate their impact on the predicted metabolic stability. In Fig. 4a, red-highlighted bonds represent regions contributing to lower metabolic stability, whereas blue-highlighted bonds indicate regions associated with higher stability. This visualization effectively identifies stabilizing and destabilizing regions within individual molecules, providing actionable insights for structural optimization despite the complex nature of liver microsomal metabolism.
Fig. 4.
Analysis of molecular substructures influencing metabolic stability. a Exemplary molecules marked with the key substructures affecting metabolic stability. Highlighted bonds in the test molecules indicate their impact on metabolic stability. Blue bonds represent stabilizing features, and red bonds indicate destabilizing features, as identified by the model's Attention module. Key examples include O-demethylation and benzylic carbon oxidation. b Statistical analysis of highly contributing substructures on metabolic stability. The top 10 destabilizing (red) and stabilizing (blue) substructures are displayed based on their contributions to metabolic stability. Destabilizing features commonly include aromatic carbons and ether groups, while stabilizing features often involve amines, weakly basic heterocycles, and amides.
For example, in Test 213, the methoxy phenyl group highlighted in red was identified as a destabilizing feature, which aligns with the well-known O-demethylation reactions observed in drugs like Venlafaxine and Diltiazem [47, 48]. Similarly, in Test 17, the bond involving the benzylic carbon, also highlighted in red, corresponds to structures commonly oxidized by CYP450 enzymes [49]. On the other hand, in Test 208, the bonds between two fluorine substituents and aromatic carbons in the phenyl group were highlighted in blue, indicating a stabilizing effect. This observation aligns with a widely used strategy of incorporating fluorine substituents to improve metabolic stability through inductive/resonance effects, as well as conformational or electrostatic effects [50]. While these examples align with known metabolic processes, not all highlighted regions can be solely explained by the chemical stability of specific functional groups. Moreover, because liver microsomal stability is influenced by multiple factors, including membrane permeability and enzymatic affinity, the highlighted regions may also reflect molecular features related to these factors.
Using effect size (Cohen's d) [46], we statistically identified substructures that contribute to either destabilizing or stabilizing liver microsomal stability (Fig. 4b). Among the top 10 destabilizing substructures, common motifs include aromatic carbons connected to ether functional groups, benzylic carbons, and carbon–carbon connections within aromatic rings. These structures are known to be susceptible to CYP450-mediated oxidation [48, 49, 51], as exemplified by Test 213 and Test 17. Conversely, the top 10 stabilizing substructures include amines, amides, nitrogen-containing heterocycles, and sulfonamides. Although amines and amides are often subject to metabolic processes such as N-dealkylation and hydrolysis, respectively [52, 53], they were identified as stabilizing substructures in this analysis. These groups might be involved in reducing membrane permeability, thereby decreasing subsequent enzymatic metabolism. In summary, our graph attention analysis reflects known relationships between liver microsomal metabolism and molecular structures. Taken together, these findings illustrate how our GNN model and graph attention analysis can serve as powerful tools for metabolic screening and also guide effective lead optimization in drug discovery.
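The effect-size computation used to rank substructures can be sketched as follows; the inputs would be the per-bond importance scores for molecules containing versus lacking a given substructure.

```python
import numpy as np

def cohens_d(a, b):
    """Cohen's d with pooled standard deviation: standardized difference
    between two groups of importance scores."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    na, nb = len(a), len(b)
    pooled = np.sqrt(((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1))
                     / (na + nb - 2))
    return float((a.mean() - b.mean()) / pooled)
```

Ranking substructures by the magnitude and sign of d separates the destabilizing (positive-error-associated) from the stabilizing motifs shown in Fig. 4b.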
Methods
Dataset preprocessing
The liver microsomal stability dataset in this study was provided by the 2023 South Korea Data Challenge for Drug Discovery. The complete dataset contained 3,981 compounds, each represented by unique identifiers, SMILES notations describing molecular structures, and physicochemical properties such as LogD at pH 7.4. It also included experimentally measured liver microsomal stability in both human (HLM) and mouse (MLM) liver microsomes. These values represent the percentage of the parent compound remaining after a 30-min incubation with liver microsomes, quantified using LC–MS/MS. The dataset was split by the challenge organizers into training (3,498 compounds) and test (483 compounds) sets to ensure structural diversity and representative distributions. This was achieved using a two-step process: first, compounds were grouped by clustering based on their molecular structures; second, stratified splitting was performed within each cluster, balancing the training and test sets while preserving the structural variation. During our study, we further processed the dataset by converting SMILES notations into graph representations, where nodes (atoms) and edges (bonds) were characterized by their respective features.
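The two-step split described above (clustering, then stratified splitting within each cluster) can be sketched as follows. The cluster assignments, split ratio, and helper names are illustrative assumptions, not the challenge organizers' actual procedure:

```python
import random

def cluster_stratified_split(ids, cluster_of, test_fraction=0.12, seed=0):
    """Split compound ids into train/test within each structural cluster,
    so the test set mirrors the structural (cluster) distribution."""
    rng = random.Random(seed)
    clusters = {}
    for cid in ids:
        clusters.setdefault(cluster_of[cid], []).append(cid)
    train, test = [], []
    for members in clusters.values():
        members = members[:]
        rng.shuffle(members)
        n_test = round(len(members) * test_fraction)
        test.extend(members[:n_test])
        train.extend(members[n_test:])
    return train, test

# Mock data: 100 compounds evenly assigned to 5 structural clusters
ids = list(range(100))
cluster_of = {i: i % 5 for i in ids}
train, test = cluster_stratified_split(ids, cluster_of, test_fraction=0.2)
```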
Model architecture and training
Using a Graph Neural Network (GNN) architecture, we represent molecules as graphs where atoms are nodes, bonds are edges [54], and ring structures are explicitly included to capture critical structural features [37]. Each atom is featurized by its type, hybridization state, aromaticity, and formal charge, while bonds are described by their types (single, double, triple, or aromatic) and conjugation states (Fig. 5a). Ring structures, such as aromatic or heterocyclic rings, are incorporated to enhance the model's ability to capture interactions essential to metabolic stability, including those with cytochrome P450 enzymes. The GNN’s message-passing mechanism enables nodes to iteratively aggregate information from their neighbors and edges. This iterative process refines node embeddings to encode local chemical environments and bonding patterns. A global pooling then assembles these refined embeddings into a comprehensive graph-level representation, which is subsequently utilized to predict metabolic stability.
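A stripped-down sketch of the message-passing and global-pooling steps described above, using toy features and plain Python lists rather than the actual GNN implementation:

```python
def message_passing_layer(node_feats, edges, edge_feats):
    """One round of message passing: each node aggregates (sums) the
    features of its neighbors combined with the connecting bond features,
    then adds the aggregate to its own embedding."""
    updated = []
    for i, h in enumerate(node_feats):
        agg = [0.0] * len(h)
        for (a, b), e in zip(edges, edge_feats):
            if a == i or b == i:
                j = b if a == i else a
                for k in range(len(h)):
                    agg[k] += node_feats[j][k] + e[k]
        updated.append([hv + av for hv, av in zip(h, agg)])
    return updated

def global_mean_pool(node_feats):
    """Collapse refined node embeddings into one graph-level vector."""
    n = len(node_feats)
    return [sum(col) / n for col in zip(*node_feats)]

# Toy 3-atom chain with 2-dimensional placeholder features
nodes = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
edges = [(0, 1), (1, 2)]
edge_feats = [[0.5, 0.5], [0.5, 0.5]]
h1 = message_passing_layer(nodes, edges, edge_feats)
graph_vec = global_mean_pool(h1)  # graph-level representation for prediction
```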
Fig. 5.
Overview of the proposed model framework. a Graph Neural Network (GNN). Molecular structures are represented as graphs with nodes (atoms) and edges (bonds), capturing atomic properties (e.g., valency) and bond characteristics to model molecular and ring structures. b Graph Contrastive Learning (GCL). Node and edge augmentations create diverse molecular representations. Contrastive learning ensures the model learns robust and invariant representations by minimizing differences between augmented graphs. c Multi-task training. Molecular representations are input into fully connected layers to predict mouse liver microsomal stability (MLM) and interspecies differences. Human liver microsomal stability (HLM) is derived by adjusting MLM using interspecies differences. Multi-task learning minimizes the Mean Squared Error (MSE) for both HLM and MLM predictions
The training process consists of two stages. In the first stage, Graph Contrastive Learning (GCL) was applied as a pretraining method (Fig. 5b) using a large-scale unlabeled molecular dataset containing over 2.58 million molecules. Previous studies, such as MS-BACL [20] and CMMS-GCL [19], have shown that GCL can improve metabolic stability prediction when incorporated into joint training, even without a separate pretraining step; these findings motivated its adoption in our framework. Our pretraining dataset comprised approximately 2,000,000 drug-like compounds from ZINC [55] and various subsets from MoleculeNet [56], including BACE (1,513), ESOL (1,128), FreeSolv (642), HIV (41,127), Lipophilicity (4,200), MUV (93,087), PCBA (437,929), SIDER (1,427), and Tox21 (7,831 compounds). Graph augmentation techniques such as node masking and edge dropping introduce controlled variations into molecular graphs, but while commonly used in GCL, they can compromise chemical validity by perturbing atomic or bonding structures [21, 57]. To mitigate this issue, recent studies [20, 58] have emphasized knowledge-guided, chemically valid augmentations. In line with this, we employed an attribute masking strategy [59] that occludes a subset of node and edge features, such as partial charge, aromaticity, or hybridization state, while preserving the structural topology of the molecular graph, thereby maintaining its underlying chemical semantics during pretraining. This self-supervised stage was designed to improve molecular representations under limited data conditions, enhancing downstream predictive accuracy and generalization and enabling the model to capture subtle molecular features without extensive labeled data [40]. In the second stage, task-specific fine-tuning was performed to predict liver microsomal stability for both human (HLM) and mouse (MLM) systems.
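The attribute-masking augmentation can be sketched as follows; the feature layout and masking rate are illustrative assumptions, and the key point is that only feature values are occluded while the graph topology is untouched:

```python
import random

def mask_attributes(atom_feats, bond_feats, mask_rate=0.15, seed=0):
    """Attribute masking for contrastive pretraining: zero out a random
    subset of node/edge feature vectors while leaving the topology
    (which atoms exist, which bonds connect them) unchanged."""
    rng = random.Random(seed)
    masked_atoms = [([0.0] * len(f) if rng.random() < mask_rate else f[:])
                    for f in atom_feats]
    masked_bonds = [([0.0] * len(f) if rng.random() < mask_rate else f[:])
                    for f in bond_feats]
    return masked_atoms, masked_bonds

# Toy graph: 10 atoms, 9 bonds; features are placeholder flags
atoms = [[1.0, 0.0, 1.0] for _ in range(10)]
bonds = [[1.0, 0.0] for _ in range(9)]
view1 = mask_attributes(atoms, bonds, seed=1)
view2 = mask_attributes(atoms, bonds, seed=2)  # second augmented view for contrast
```

Two differently masked views of the same molecule form the positive pair whose representations the contrastive objective pulls together.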
A multi-task learning framework was employed to incorporate interspecies differences as a dedicated task, allowing the model to learn both HLM–MLM differences and individual metabolic stability values simultaneously (Fig. 5c). To optimize the training process, we utilized the AdamW [60] optimizer for parameter updates, the CosineAnnealingWarmRestarts scheduler for dynamic learning rate adjustment, and the Exponential Moving Average (EMA) [61] technique to ensure stable convergence. Together, these techniques significantly improved the model's predictive accuracy and stability, making it highly effective for metabolic stability prediction.
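As an illustration of the EMA technique mentioned above, a minimal sketch with scalar parameters and a hypothetical decay value (the actual optimizer and scheduler settings are described in the hyperparameter section):

```python
class EMA:
    """Exponential moving average of model parameters: after each training
    step, shadow weights are updated as decay * shadow + (1 - decay) * current,
    smoothing out step-to-step fluctuations for more stable convergence."""
    def __init__(self, params, decay=0.999):
        self.decay = decay
        self.shadow = list(params)

    def update(self, params):
        self.shadow = [self.decay * s + (1.0 - self.decay) * p
                       for s, p in zip(self.shadow, params)]

# Toy example: the live weight stays at 1.0; the shadow drifts toward it
ema = EMA([0.0], decay=0.9)
for _ in range(3):
    ema.update([1.0])
```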
Incorporation of inter-species differences as a learning feature
This study adopts the learning of inter-species differences as a core methodology for enhancing prediction accuracy. We focused on both the liver microsomal stability of each species and the inter-species differences between humans and mice (HLM–MLM), framing the latter as a dedicated learning task. The GNN generates 128-dimensional representation vectors from molecular structures, which are then used by separate fully connected layers to predict MLM values and inter-species differences. Metabolic stability varies significantly across species due to differences in liver CYP isoforms and hepatic enzyme expression [27]. We assumed that capturing these species-specific variations would be critical for predicting metabolic stability profiles. By integrating the inter-species difference values as an additional prediction task, the model learns metabolic stability as well as species-specific metabolism, resulting in a synergistic improvement in the performance of both tasks.
The model is trained to predict metabolic stability in MLM and inter-species differences (HLM–MLM), and the refined HLM is derived using the following adjustment process:
Diff = HLM − MLM  (2)

HLM_pred = MLM_pred + Diff_pred  (3)
The training process uses two equally weighted MSE losses: one for MLM and one for HLM, each compared with its ground truth. The combined loss is used in backpropagation to update model parameters, enabling the GNN to jointly learn MLM metabolism and inter-species metabolic differences. This approach rests on the hypothesis that molecular features and metabolic associations are captured more effectively by modeling inter-species variation than by predicting HLM and MLM independently.
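The adjustment and the equally weighted two-part loss can be sketched as follows, with hypothetical predictions and targets:

```python
def mse(pred, target):
    """Mean squared error over a batch of scalar predictions."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

def multitask_loss(mlm_pred, diff_pred, mlm_true, hlm_true):
    """Equally weighted MSE over MLM and the reconstructed HLM, where
    HLM_pred = MLM_pred + predicted inter-species difference."""
    hlm_pred = [m + d for m, d in zip(mlm_pred, diff_pred)]
    return 0.5 * mse(mlm_pred, mlm_true) + 0.5 * mse(hlm_pred, hlm_true)

# Hypothetical batch of two compounds (% remaining after 30 min)
loss = multitask_loss(
    mlm_pred=[40.0, 60.0], diff_pred=[10.0, -5.0],
    mlm_true=[42.0, 58.0], hlm_true=[50.0, 55.0],
)
```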
Performance comparison with available methods
To evaluate our model's effectiveness, we compared its performance with that of two prominent studies on metabolic stability prediction: PredMS [43] and MS-BACL [20]. PredMS employs a Random Forest approach for binary classification of compounds as stable or unstable in human liver microsomes, while MS-BACL utilizes a GNN architecture enhanced with bond graph augmentation and contrastive learning strategies designed to capture both atomic and molecular bond relationships. For a fair comparison, we trained these models on our dataset using their publicly available code and evaluated them on our test set. We adapted the outputs of our regression model to binary classification, aligning with PredMS and MS-BACL, which categorize compounds as stable or unstable. To convert the predicted HLM values into binary classes, a 50% threshold was used (values ≥ 50% as stable, < 50% as unstable), consistent with previous studies. Figure 6 illustrates the class distribution for the HLM and MLM datasets based on this threshold. While the HLM dataset exhibited a relatively balanced distribution, the MLM dataset showed a higher proportion of unstable compounds. This tendency may reflect known interspecies differences in drug metabolism, as mice possess more CYP genes than humans, leading to differences in substrate specificity, metabolic rates, and metabolite profiles [42]. To compare performance across models, we used a set of standard evaluation metrics, including AUROC, accuracy, precision, recall, F1 score, and MCC.
Fig. 6.
Stability class distributions for HLM and MLM datasets (≥ 50%: Stable; < 50%: Unstable). a HLM stability distribution across the full, train, and test sets. The proportion of stable compounds was 55.8% in the full set, 55.3% in the train set, and 59.6% in the test set. b MLM stability distribution. Stable compounds accounted for 37.2% in the full set, 36.9% in the train set, and 39.3% in the test set
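The 50% binarization and one of the evaluation metrics (MCC) can be sketched as follows, with made-up regression outputs and targets:

```python
from math import sqrt

def to_class(values, threshold=50.0):
    """Binarize % remaining: >= 50% stable (1), < 50% unstable (0)."""
    return [1 if v >= threshold else 0 for v in values]

def mcc(y_true, y_pred):
    """Matthews correlation coefficient from the binary confusion counts."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom

# Hypothetical % remaining values: measured vs. regression outputs
y_true = to_class([80.0, 30.0, 55.0, 10.0])
y_pred = to_class([75.0, 45.0, 40.0, 20.0])
score = mcc(y_true, y_pred)
```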
Evaluation and statistical analysis
The model's performance was evaluated using RMSE to assess the accuracy of metabolic stability predictions. The final score was computed as the weighted average of the RMSE values for HLM and MLM (Eq. 1). To verify the stability of the predictions, 30 independent training runs were performed, and the 95% confidence intervals were calculated based on the results. Statistical analysis was performed using the SciPy library in Python [62]. A Welch's t-test was used to compare two groups divided based on high and low values across key features. The effect sizes for molecular substructures were quantified using Cohen's d, which measures the standardized difference between the means of two groups [46]. To ensure reproducibility, all source code and model weights are publicly available on the GitHub repository listed in the Availability of Data and Materials section.
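A minimal sketch of the RMSE, the weighted final score, and a normal-approximation 95% CI over repeated training runs. The equal HLM/MLM weighting here is an assumption for illustration (the actual weighting is defined by Eq. 1):

```python
from math import sqrt
from statistics import mean, stdev

def rmse(pred, true):
    """Root mean square error of a set of predictions."""
    return sqrt(sum((p - t) ** 2 for p, t in zip(pred, true)) / len(pred))

def combined_score(rmse_hlm, rmse_mlm, w_hlm=0.5):
    """Weighted average of the two RMSEs (equal weights assumed here)."""
    return w_hlm * rmse_hlm + (1.0 - w_hlm) * rmse_mlm

def ci95(run_scores):
    """Normal-approximation 95% CI of the mean score over repeated runs."""
    m = mean(run_scores)
    half = 1.96 * stdev(run_scores) / sqrt(len(run_scores))
    return m - half, m + half

r = rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])   # only the last prediction is off
lo, hi = ci95([27.0] * 29 + [30.0])          # mock scores from 30 runs
```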
Hyperparameter selection
For all models, the dataset was split into 80% training and 20% validation sets using a fixed random seed to ensure reproducibility, with the same validation set applied across models for consistent and fair comparison. The final evaluation was performed on the hold-out test set. For MetaboGNN and other graph-based models, hyperparameters were tuned based on validation RMSE using grid search [63], and the configuration minimizing the average RMSE across three runs was selected. The final GNN model consisted of two message-passing layers with a hidden dimension size of 128. The learning rate was set to 5e-4 using the AdamW optimizer. Dropout was not applied to the GNN layers themselves but was set to 0.1 in the fully connected layers used for metabolic stability prediction. For comparison, chemical language models (CLMs) were also fine-tuned using grid-searched hyperparameters. Specifically, the learning rates were set to 2e-5 for ChemBERTa, 1e-5 for MolT5, and 5e-6 for MolFormer. All models were trained for up to 50 epochs, with the checkpoint achieving the best performance on the validation split used for final evaluation. The respective batch sizes were 128 for GNNs and 16 for CLMs. For both GNNs and CLMs, identical search ranges were applied for key hyperparameters (e.g., learning rate, dropout rate, batch size), and all baseline models and our proposed variants were tuned independently on the validation split. The best-performing configuration per model was then selected for final test evaluation.
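The grid-search procedure can be sketched as follows; the mock objective and grid values are illustrative, not the actual search space:

```python
from itertools import product

def grid_search(train_fn, grid, n_runs=3):
    """Exhaustive grid search: every hyperparameter combination is trained
    n_runs times, and the configuration with the lowest mean validation
    RMSE is returned."""
    best_cfg, best_rmse = None, float("inf")
    keys = sorted(grid)
    for values in product(*(grid[k] for k in keys)):
        cfg = dict(zip(keys, values))
        avg = sum(train_fn(cfg, run) for run in range(n_runs)) / n_runs
        if avg < best_rmse:
            best_cfg, best_rmse = cfg, avg
    return best_cfg, best_rmse

# Mock objective standing in for "train the model, return validation RMSE":
# penalizes distance from lr = 5e-4 and any nonzero dropout
def mock_train(cfg, run):
    return abs(cfg["lr"] - 5e-4) * 1e4 + cfg["dropout"]

grid = {"lr": [1e-4, 5e-4, 1e-3], "dropout": [0.0, 0.1]}
best, best_score = grid_search(mock_train, grid)
```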
Conclusions
In this study, we developed MetaboGNN, a novel predictive framework for liver metabolic stability based on GNNs and GCL. Our findings demonstrate that GNN-based models can effectively capture both structural features and their relationship with metabolic stability, while GCL enhances representation learning from unlabeled data and improves the model's generalizability. In particular, by explicitly incorporating interspecies differences between human and mouse liver microsomes into model training, MetaboGNN showed significant performance improvements, especially for compounds with larger metabolic discrepancies between species. Based on the attention mechanism within GNNs, we also highlighted functional groups and substructures that strongly influence metabolic stability. Our model's ability to account for both molecular structure and interspecies metabolic differences represents a major advancement in predicting metabolic stability. However, while some attention patterns matched known structure–metabolism relationships, others were harder to interpret. This may be due to the multifactorial nature of metabolic stability, which depends on factors such as enzyme interactions, microsomal permeability, and the specific site of metabolism (SoM). Comparing MetaboGNN’s attention maps with those derived from SoM or permeability prediction models could provide clearer insights into these ambiguous patterns. Another complementary approach is to reduce interpretability noise by applying graph pooling techniques (e.g., motif-based or hierarchical) to cluster related atoms and bonds into functional group–level representations before prediction and attribution analysis. This can consolidate correlated features into coherent substructures, highlighting fragments most relevant to metabolic stability, albeit with reduced atom- or bond-level detail.
A key limitation of the current approach is the lack of explicit enzyme-level information, such as isoform specificity, activity profiles, or structural characteristics, due to the absence of comprehensive datasets linking these properties to molecular structures and metabolic outcomes. A more comprehensive dataset that integrates SoM, permeability, and enzyme-level information alongside metabolic stability would enable the development of a unified multi-task model with enhanced predictive performance and interpretability. Although this limitation stems from current constraints in publicly available data, MetaboGNN may nonetheless serve as a useful foundation for developing AI-driven predictive models of metabolic stability. By integrating advanced deep learning techniques and interspecies metabolic insights, MetaboGNN can support the prediction of pharmacokinetic properties in lead optimization, offering potential applications in drug discovery and development.
Abbreviations
- GNN
Graph Neural Networks
- GCL
Graph Contrastive Learning
- HLM
Human Liver Microsomes
- MLM
Mouse Liver Microsomes
- CYP
Cytochrome P450
- ADMET
Absorption, Distribution, Metabolism, Excretion, and Toxicity
- LC–MS/MS
Liquid Chromatography–Tandem Mass Spectrometry
- NADPH
Nicotinamide Adenine Dinucleotide Phosphate (reduced form)
- SMILES
Simplified Molecular Input Line Entry System
- AlogP
Atom-based Logarithm of Partition coefficient
- LogD
Distribution coefficient
- MPSA
Molecular Polar Surface Area
- MW
Molecular Weight
- RMSE
Root Mean Square Error
- PCA
Principal Component Analysis
- KDE
Kernel Density Estimation
- AUROC
Area Under the Receiver Operating Characteristic Curve
- AUPRC
Area Under the Precision-Recall Curve
- MCC
Matthews Correlation Coefficient
- CI
Confidence Interval
Author contributions
J.P., R.H., J.J., and J.K. analyzed the dataset and developed the prediction model. J.P., R.H., J.J., J.K., and Y.L. contributed to the data interpretation and wrote the manuscript. J.P., J.H., and Y.L. supervised the study and revised the manuscript.
Funding
This work was supported by grants from the National Research Foundation of Korea (NRF-2022R1C1C1007409 to Y.L. and NRF-RS2024-00343863 to J.P.) and the Institute of Information & communications Technology Planning & Evaluation (IITP) (2021–0-01341 to J.P.) funded by the Korean Ministry of Science and ICT (MSIT); grants from the Korea Health Technology R&D Project through the Korea Health Industry Development Institute (KHIDI) (HR21C1003, HR22C1734 to J.H.) funded by the Korean Ministry of Health & Welfare; and the Environmental Health R&D Program (2021003310005 to Y.L.) funded by the Korean Ministry of Environment. Additional support was provided by the research fund of Ajou University Medical Center (2023) (to J.H.).
Data availability
The data used in the 2023 South Korea Data Challenge for Drug Discovery competition can be accessed at [https://dacon.io/competitions/official/236127/overview/description]. The datasets analyzed in this study were provided by the Korea Chemical Bank (KCB). The complete source code implementation of this study is available on GitHub at [https://github.com/qwon135/MetaboGNN]. All code is released under the MIT License.
Declarations
Competing interests
The authors declare no competing interests.
Footnotes
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
JunHyeong Park, Ri Han, Junbo Jang and Jisan Kim have contributed equally.
Contributor Information
Joonki Paik, Email: paikj@cau.ac.kr.
Jaesung Heo, Email: nahero@ajou.ac.kr.
Yoonji Lee, Email: yoonjilee@cau.ac.kr.
References
- 1.Pritchard JF, Jurima-Romet M, Reimer MLJ, Mortimer E, Rolfe B, Cayen MN (2003) Making better drugs: decision gates in non-clinical drug development. Nat Rev Drug Discovery 2(7):542–553 [DOI] [PubMed] [Google Scholar]
- 2.Almazroo OA, Miah MK, Venkataramanan R (2017) Drug metabolism in the liver. Clin Liver Dis 21(1):1–20 [DOI] [PubMed] [Google Scholar]
- 3.Issa NT, Wathieu H, Ojo A, Byers SW, Dakshanamurthy S (2017) Drug metabolism in preclinical drug development: a survey of the discovery process, toxicology, and computational tools. Curr Drug Metab 18(6):556–565 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 4.Thompson TN (2001) Optimization of metabolic stability as a goal of modern drug design. Med Res Rev 21(5):412–449 [DOI] [PubMed] [Google Scholar]
- 5.Asha S, Vidyavathi M (2010) Role of human liver microsomes in in vitro metabolism of drugs-a review. Appl Biochem Biotechnol 160(6):1699–1722 [DOI] [PubMed] [Google Scholar]
- 6.Nassar AEF, Kamel AM, Clarimont C (2004) Improving the decision-making process in the structural modification of drug candidates: enhancing metabolic stability. Drug Discovery Today 9(23):1020–1028 [DOI] [PubMed] [Google Scholar]
- 7.Perryman AL, Stratton TP, Ekins S, Freundlich JS (2016) Predicting mouse liver microsomal stability with “pruned” machine learning models and public data. Pharm Res 33(2):433–449 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 8.Lin JH (1998) Applications and limitations of interspecies scaling and in vitro extrapolation in pharmacokinetics. Drug Metab Dispos 26(12):1202–1212 [PubMed] [Google Scholar]
- 9.Sodhi JK, Benet LZ (2021) Successful and unsuccessful prediction of human hepatic clearance for lead optimization. J Med Chem 64(7):3546–3559 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 10.Zhang Y, Ye TY, Xi H, Juhas M, Li JY (2021) Deep learning driven drug discovery: tackling severe acute respiratory syndrome coronavirus 2. Front Microbiol 12:739684 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 11.Han R, Yoon H, Kim G, Lee H, Lee Y (2023) Revolutionizing medicinal chemistry: the application of artificial intelligence (AI) in early drug discovery. Pharmaceuticals 16(9):1259 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 12.Lai HT, Wang LY, Qian RY, Huang JH, Zhou P, Ye GY, Wu FD, Wu F, Zeng XX, Liu W (2024) Interformer: an interaction-aware model for protein-ligand docking and affinity prediction. Nat Commun 15(1):10223 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 13.Fralish Z, Chen A, Skaluba P, Reker D (2023) Deepdelta: predicting ADMET improvements of molecular derivatives with deep learning. J Cheminform 15(1):101 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 14.Ryu JY, Lee MY, Lee JH, Lee BH, Oh KS (2020) DeepHIT: a deep learning framework for prediction of hERG-induced cardiotoxicity. Bioinformatics 36(10):3049–3055 [DOI] [PubMed] [Google Scholar]
- 15.Lee S, Yoo S (2024) Interdili: interpretable prediction of drug-induced liver injury through permutation feature importance and attention mechanism. J Cheminform 16(1):1 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 16.Darvas F (1988) Predicting metabolic pathways by logic programming. J Mol Graph 6(2):80–86 [Google Scholar]
- 17.Hu YB, Unwalla R, Denny RA, Bikker J, Di L, Humblet C (2010) Development of QSAR models for microsomal stability: identification of good and bad structural features for rat, human and mouse microsomal stability. J Comput Aided Mol Des 24(1):23–35 [DOI] [PubMed] [Google Scholar]
- 18.Shen M, Xiao YD, Golbraikh A, Gombar VK, Tropsha A (2003) Development and validation of k-nearest-neighbor QSPR models of metabolic stability of drug candidates. J Med Chem 46(14):3013–3020 [DOI] [PubMed] [Google Scholar]
- 19.Du BX, Long Y, Li X, Wu M, Shi JY (2023) CMMS-GCL: cross-modality metabolic stability prediction with graph contrastive learning. Bioinformatics 39(8):btad503 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 20.Wang T, Li Z, Zhuo LL, Chen YF, Fu XZ, Zou Q (2024) Ms-bacl: enhancing metabolic stability prediction through bond graph augmentation and contrastive learning. Brief Bioinform 25(3):bbae127 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 21.You Y, Chen TL, Sui YD, Chen T, Wang ZY, Shen Y (2020) Graph contrastive learning with augmentations. Adv Neural Inform Proc Syst 33:5812–5823 [Google Scholar]
- 22.Long TZ, Jiang DJ, Shi SH, Deng YC, Wang WX, Cao DS (2024) Enhancing multi-species liver microsomal stability prediction through artificial intelligence. J Chem Inf Model 64(8):3222–3236 [DOI] [PubMed] [Google Scholar]
- 23.Li LQ, Lu Z, Liu GX, Tang Y, Li WH (2022) In silico prediction of human and rat liver microsomal stability via machine learning methods. Chem Res Toxicol 35(9):1614–1624 [DOI] [PubMed] [Google Scholar]
- 24.Gajula SNR, Nadimpalli N, Sonti R (2021) Drug metabolic stability in early drug discovery to develop potential lead compounds. Drug Metab Rev 53(3):459–477 [DOI] [PubMed] [Google Scholar]
- 25.Scarselli F, Gori M, Tsoi AC, Hagenbuchner M, Monfardini G (2009) The graph neural network model. IEEE Trans Neural Netw 20(1):61–80 [DOI] [PubMed] [Google Scholar]
- 26.Amara K, Rodríguez-Pérez R, Jiménez-Luna J (2023) Explaining compound activity predictions with a substructure-aware loss for graph neural networks. J Cheminform 15(1):67 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 27.Hammer H, Schmidt F, Marx-Stoelting P, Pötz O, Braeuning A (2021) Cross-species analysis of hepatic cytochrome P450 and transport protein expression. Arch Toxicol 95(1):117–133 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 28.Martignoni M, Groothuis GMM, de Kanter R (2006) Species differences between mouse, rat, dog, monkey and human CYP-mediated drug metabolism, inhibition and induction. Expert Opin Drug Metab Toxicol 2(6):875–894 [DOI] [PubMed] [Google Scholar]
- 29.Keefer C, Chang G, Carlo A, Novak JJ, Banker M, Carey J, Cianfrogna J, Eng H, Jagla C, Johnson N et al (2020) Mechanistic insights on clearance and inhibition discordance between liver microsomes and hepatocytes when clearance in liver microsomes is higher than in hepatocytes. Eur J Pharm Sci 155:105541 [DOI] [PubMed] [Google Scholar]
- 30.Xu K, Hu W, Leskovec J, Jegelka S (2018) How powerful are graph neural networks? arXiv. 10.48550/arXiv.1810.00826 [Google Scholar]
- 31.Kipf TN, Welling M (2016) Semi-supervised classification with graph convolutional networks. arXiv. 10.48550/arXiv.1609.02907 [Google Scholar]
- 32.Velickovic P, Cucurull G, Casanova A, Romero A, Lio P, Bengio Y (2017) Graph attention networks. arXiv. 10.48550/arXiv.1710.10903 [Google Scholar]
- 33.Gilmer J, Schoenholz SS, Riley PF, Vinyals O, Dahl GE (2017) Neural message passing for quantum chemistry. Proc Mach Learn Res 70:1263–1272 [Google Scholar]
- 34.Ahmad W, Simon E, Chithrananda S, Grand G, Ramsundar B (2022) Chemberta-2: towards chemical foundation models. arXiv. 10.48550/arXiv.2209.01712 [Google Scholar]
- 35.Ross J, Belgodere B, Chenthamarakshan V, Padhi I, Mroueh Y, Das P (2022) Large-scale chemical language representations capture molecular structure and properties. Nat Mach Intell. 10.1038/s42256-022-00580-7 [Google Scholar]
- 36.Edwards C, Lai T, Ros K, Honke G, Cho K, Ji H (2022) Translation between molecules and natural language. arXiv. 10.48550/arXiv.2204.11817 [Google Scholar]
- 37.Zhu J, Wu K, Wang B, Xia Y, Xie S, Meng Q, Wu L, Qin T, Zhou W, Li H (2023) O-GNN: incorporating ring priors into molecular modeling. In: ICLR. https://openreview.net/forum?id=5cFfz6yMVPU
- 38.Xu K et al (2018) How powerful are graph neural networks? arXiv. 10.48550/arXiv.1810.00826 [Google Scholar]
- 39.Flores-Hernandez H, Martinez-Ledesma E (2024) A systematic review of deep learning chemical language models in recent era. J Cheminform. 10.1186/s13321-024-00916-y [DOI] [PMC free article] [PubMed] [Google Scholar]
- 40.Wang YY, Wang JR, Cao ZL, Farimani AB (2022) Molecular contrastive learning of representations via graph neural networks. Nat Mach Intell 4(3):279–287 [Google Scholar]
- 41.Matsson P, Doak BC, Over B, Kihlberg J (2016) Cell permeability beyond the rule of 5. Adv Drug Deliv Rev 101:42–61 [DOI] [PubMed] [Google Scholar]
- 42.Macleod K, Coquelin KS, Huertas L, Simeons FRC, Riley J, Casado P, Guijarro L, Casanueva R, Frame L, Pinto EG et al (2024) Acceleration of infectious disease drug discovery and development using a humanized model of drug metabolism. Proc Natl Acad Sci U S A 121(7):e2315069121 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 43.Ryu JY, Lee JH, Lee BH, Song JS, Ahn S, Oh KS (2022) PredMS: a random forest model for predicting metabolic stability of drug candidates in human liver microsomes. Bioinformatics 38(2):364–368 [DOI] [PubMed] [Google Scholar]
- 44.Chicco D, Jurman G (2020) The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation. BMC Genomics 21(1):6 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 45.Mastropietro A, Pasculli G, Feldmann C, Rodríguez-Pérez R, Bajorath J (2022) Edgeshaper: bond-centric Shapley value-based explanation method for graph neural networks. iScience. 10.1016/j.isci.2022.105043 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 46.Sullivan GM, Feinn R (2012) Using effect size-or why the P value is not enough. J Grad Med Educ 4(3):279–282 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 47.Fogelman SM, Schmider J, Venkatakrishnan K, von Moltke LL, Harmatz JS, Shader RI, Greenblatt DJ (1999) O- and N-demethylation of venlafaxine by human liver microsomes and by microsomes from cDNA-transfected cells: Effect of metabolic inhibitors and SSRI antidepressants. Neuropsychopharmacology 20(5):480–490 [DOI] [PubMed] [Google Scholar]
- 48.Molden E, Åsberg A, Christensen H (2000) CYP2D6 is involved in O-demethylation of diltiazem. Eur J Clin Pharmacol 56(8):575–579 [DOI] [PubMed] [Google Scholar]
- 49.Xu WT, Wang WL, Liu T, Xie J, Zhu CJ (2019) Late-stage trifluoromethylthiolation of benzylic C-H bonds. Nat Commun 10(1):4867 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 50.Shah P, Westwell AD (2007) The role of fluorine in medicinal chemistry. J Enzyme Inhib Med Chem 22(5):527–540 [DOI] [PubMed] [Google Scholar]
- 51.Ullrich R, Hofrichter M (2007) Enzymatic hydroxylation of aromatic compounds. Cell Mol Life Sci 64(3):271–293 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 52.Najmi AA, Bischoff R, Permentier HP (2022) N-dealkylation of amines. Molecules 27(10):3293 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 53.Snape TJ, Astles AM, Davies J (2010) Understanding the chemical basis of drug stability and degradation. Pharm J 285(7622):416–417 [Google Scholar]
- 54.David L, Thakkar A, Mercado R, Engkvist O (2020) Molecular representations in AI-driven drug discovery: a review and practical guide. J Cheminform 12(1):56 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 55.Irwin JJ, Shoichet BK (2005) ZINC - a free database of commercially available compounds for virtual screening. J Chem Inf Model 45(1):177–182 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 56.Wu ZQ, Ramsundar B, Feinberg EN, Gomes J, Geniesse C, Pappu AS, Leswing K, Pande V (2018) MoleculeNet: a benchmark for molecular machine learning. Chem Sci 9(2):513–530 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 57.Mahmood O, Mansimov E, Bonneau R, Cho K (2021) Masked graph modeling for molecule generation. Nat Commun 12(1):3156 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 58.Li H, Zhang RT, Min YS, Ma DC, Zhao D, Zeng JY (2023) A knowledge-guided pre-training framework for improving molecular representation learning. Nat Commun 14(1):7568 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 59.Zhang XY, Xu YN, Jiang CZ, Shen L, Liu XR (2024) MoleMCL: a multi-level contrastive learning framework for molecular pre-training. Bioinformatics 40(4):btae164 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 60.Loshchilov I, Hutter F (2019) Decoupled weight decay regularization. In: International Conference on Learning Representations (ICLR). https://openreview.net/forum?id=Bkg6RiCqY7
- 61.Haynes D, Corns S, Venayagamoorthy GK (2012) An exponential moving average algorithm. In: IEEE Congress on Evolutionary Computation (CEC), pp 1–8
- 62.Virtanen P, Gommers R, Oliphant TE, Haberland M, Reddy T, Cournapeau D, Burovski E, Peterson P, Weckesser W, Bright J et al (2020) SciPy 1.0: fundamental algorithms for scientific computing in Python. Nat Methods 17(3):261–272 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 63.Bergstra J, Bengio Y (2012) Random search for hyper-parameter optimization. J Mach Learn Res 13:281–305 [Google Scholar]