J. Chem. Inf. Model. 2025 Oct 22;65(21):11796–11803. doi: 10.1021/acs.jcim.5c00758

Improving ADME Prediction with Multitask Graph Neural Networks and Assessing Explainability in Lead Optimization

Shoma Ito, Takuto Koyama, Shigeyuki Matsumoto, Ryosuke Kojima, Yuji Okamoto, Masataka Kuroda, Hitoshi Kawashima, Reiko Watanabe, Tomoki Yonezawa, Takaaki Sumiyoshi, Kazuyoshi Ikeda, Kenji Mizuguchi, Hiroaki Iwata*, Yasushi Okuno*
PMCID: PMC12606635  PMID: 41125231

Abstract

Early evaluation of absorption, distribution, metabolism, and excretion (ADME) properties is crucial for streamlining drug development. Traditional in vivo and in vitro approaches are often expensive. Moreover, during lead optimization, these methods rely heavily on the expertise of specialists, leading to efficiency challenges. Consequently, in silico methods for ADME prediction are attracting increasing attention. However, existing in silico methods face two major issues: a decline in predictive performance caused by limited ADME data and a lack of clarity regarding the rationale for lead optimization to improve ADME properties. In this study, we built an AI model capable of predicting ten different ADME parameters to overcome these challenges. Our training approach was based on a graph neural network combining multitask learning, which shares information across multiple tasks to increase the number of usable samples, with fine-tuning that adapts to each task. In addition, we applied the integrated gradients method to compound data collected before and after lead optimization to quantify and interpret each input feature’s contribution to the predicted ADME values. Our proposed model achieved the highest performance for seven of the ten ADME parameters compared with conventional methods. Furthermore, visualization of the changes in chemical structures before and after lead optimization revealed that the model’s explanations aligned well with established chemical insights. These results suggest that data-driven approaches may assist molecular design by providing complementary insights into empirical rules.



Introduction

Drug development follows a long timeline and requires significant cost, with the high probability of project discontinuation representing a major challenge. Many drug candidates are screened during the preclinical stage or at various phases of clinical trials; however, more than 75% of the compounds that advance to clinical trials fail to receive approval. One of the main reasons for drug candidate discontinuation is poor absorption, distribution, metabolism, and excretion (ADME) properties. Therefore, to improve development efficiency and success rates, it is essential to evaluate ADME properties during the early stages of drug development. Conventional approaches for evaluating ADME properties rely on in vitro and in vivo experiments, which are extremely costly. Moreover, during lead optimization, ADME parameter adjustments are frequently based on expert intuition and experience, which can hinder the attainment of consistent and efficient results. To address these issues, there has been growing interest in using in silico methods for ADME property prediction. Owing to advancements in computational science and the increasing availability of experimental data, numerous machine-learning models have been developed to enable rapid and quantitative evaluation during the early screening stage.

Earlier machine learning approaches predicted ADME properties using molecular descriptors, such as extended connectivity fingerprints (ECFP), in conjunction with conventional models, including random forests and support vector machines. Recently, graph neural networks (GNNs) that can directly process molecular structures as inputs have gained significant attention. GNNs enable the effective characterization of complex molecular structures, thereby yielding more accurate predictions. Several user-friendly online tools have also been developed to facilitate the prediction of ADME properties.

However, existing in silico methods face two major challenges. First, some ADME parameters lack sufficient training data, which reduces the generalization performance of the predictive models. This issue occurs when low-throughput human in vivo experiments are required or when ADME parameters can be measured only in later stages of development. For example, data on the fraction of unbound drugs in homogenized brain tissue (fubrain), an ADME parameter crucial for understanding drug penetration into the brain, are neither abundant nor readily obtainable, as such experiments are difficult and costly. Consequently, models predicting these parameters exhibit low generalization performance. Second, current predictive models do not explicitly identify the chemical structures that influence ADME properties. As the specific substructures responsible for ADME improvements remain unclear, it is difficult to use these models for lead optimization. Comparing the chemical structures before and after lead optimization enables a quantitative assessment of the effect of structural modifications on ADME properties. Although prior research has attempted to interpret the underlying rationale of predictive models, these efforts have been limited to individual compounds. Consequently, the relationship between explainable results and changes in ADME parameters following lead optimization has not been thoroughly examined, and the impact of structural modifications on ADME properties remains insufficiently understood.

In this study, we aimed to improve predictive models trained on limited ADME data and to evaluate their explainability in lead optimization. Specifically, we utilized experimental data for ten ADME parameters, each containing approximately 200 to 15,000 compounds. To address data scarcity, we adopted a multitask learning approach that allowed information sharing across tasks and trained a GNN on all ten ADME parameters simultaneously. Subsequently, we fine-tuned the model for each parameter, successfully building more accurate predictive models compared with traditional methods. We then applied the integrated gradients (IG) method, an explainable AI technique, to the multitask GNN. IG quantifies the importance of each input feature to a neural network's prediction for a particular data point, enabling both visual and quantitative identification of how individual atoms or substructures in a chemical compound contribute to the prediction. Evaluations using compounds from before and after known lead optimizations showed that our approach could effectively estimate the substructures linked to undesirable ADME properties.

Materials and Methods

Data Sets

In this study, ADME and explainability evaluation data sets were used to build artificial intelligence (AI) models for predicting ADME parameters and performing explainability assessments.

ADME Data Sets

Pairs of experimentally measured ADME values and the corresponding SMILES representations of the compounds were compiled to construct an AI model for predicting ADME parameters. The data set was extracted from DruMAP, which publicly shares in-house data obtained by NIBIOHN. We focused on ten ADME parameters, as detailed in Table 1. When multiple experimental values were available for the same compound, their average was used. Several ADME parameter values (Rb rat, NER human, Papp LLC, CLint, Papp Caco-2, and solubility) were standardized, and the distributions of each standardized parameter are shown in Supporting Information Figure S1.

Table 1. Details and Number of Compounds for Each ADME Parameter.
ADME parameter parameter name number of compounds unit
Rb rat blood-to-plasma concentration ratio of rat 163  
fe fraction excreted in urine 343  
NER human P-gp net efflux ratio (LLC-PK1) 446  
Papp LLC permeability coefficient (LLC-PK1) 462 nm/s
fup rat the fraction unbound in plasma of rat 536  
fubrain the fraction unbound in brain homogenate 587  
fup human the fraction unbound in plasma 3472  
CLint hepatic intrinsic clearance in the liver microsome 5256 μL/min/mg
Papp Caco-2 permeability coefficient (Caco-2) 5581 nm/s
solubility solubility 14,392 μg/mL

Explainability Evaluation Data Sets

Compound pairs before and after lead optimization were collected to evaluate the ADME prediction performance and visualize how chemical structures contribute to the predicted ADME parameters. These pairs were selected from the clinical candidate compounds reported in the Journal of Medicinal Chemistry between 2016 and 2017 and between 2018 and 2021. Data extraction was based on the following criteria: (1) both the pre- and post-lead-optimization compounds had available structural information, and (2) both compounds had experimentally measured values for at least one of the ten ADME parameters. The compound pairs that met these criteria are summarized in Table 2.

Table 2. Number of Compound Pairs before and after Lead Optimization for Each ADME Parameter.
ADME parameter number of compound pairs
fup rat 13
fubrain 1
fup human 13
CLint 11
Papp Caco-2 8
solubility 32

Building ADME Parameter Prediction Models

In this study, we built prediction models for ADME parameters using compound structural information (Figure 1a). Our models include the GNNMT+FT model, which combines multitask learning and fine-tuning based on a GNN, as well as several baseline models. Detailed hyperparameter information is provided in Supporting Information Tables S1–S4.

Figure 1. Workflow of the GNNMT+FT model. (a) Model construction: the model is built via multitask learning and subsequently fine-tuned for each ADME parameter. (b) Explainability: IGs are used to highlight the substructures that contribute to the predicted ADME properties of the preoptimized lead compound.

Constructing GNNMT+FT Model

Each molecule was represented as a graph G, and its experimentally measured ADME parameter was labeled y. Data set D m , which includes all labeled samples for task m, is shown below.

$$D_m = \{(G_i, y_i^{(m)}) \mid i \in C_m\} \tag{1}$$

where $G_i = (V_i, E_i, X_i)$ denotes the graph corresponding to the ith molecule, and $C_m$ represents the set of compounds for which the ADME parameter was experimentally measured in the respective task. This graph comprises the set of atoms $V_i$, the set of edges $E_i$ (representing bonds between atoms), and the node feature matrix $X_i$. Details of $X_i$ are provided in Supporting Information Table S5. In addition, $y_i \in \mathbb{R}^M$ denotes the vector of experimentally measured values of the M ADME parameters, of which $y_i^{(m)}$ is the mth component. In this study, a two-stage approach was employed: first, a pretrained GNN model was constructed using multitask learning; subsequently, the model was fine-tuned for each ADME parameter.
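As an illustration, the graph representation $G_i = (V_i, E_i, X_i)$ described above can be sketched as a simple container. This is a minimal sketch only: the class name, the toy molecule, and the three-dimensional node features are hypothetical and do not reflect the kMoL implementation or the actual feature set in Table S5.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class MolGraph:
    """Molecular graph G_i = (V_i, E_i, X_i).

    atoms:         node indices V_i (one entry per heavy atom)
    edges:         undirected bonds E_i as (u, v) index pairs
    node_features: X_i, one feature vector per atom
    """
    atoms: List[int]
    edges: List[Tuple[int, int]]
    node_features: List[List[float]]

# Toy example: formaldehyde with hydrogens omitted (C=O), using
# made-up features (atomic number, implicit valence, formal charge).
g = MolGraph(
    atoms=[0, 1],                    # C, O
    edges=[(0, 1)],                  # the C=O bond
    node_features=[[6.0, 4.0, 0.0],
                   [8.0, 2.0, 0.0]],
)
assert len(g.node_features) == len(g.atoms)
```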

The GNN models were constructed using the kMoL package. Details of the GNN model construction are provided in Supporting Information S2. The pretrained model consists of a graph-embedding function and individual prediction models for each ADME parameter. The graph-embedding function maps a molecular graph $G_i$ to an embedding vector $h_i$ and is expressed as

$$h_i = f_\theta(G_i) \tag{2}$$

where $\theta$ denotes the learned parameters. The 960-dimensional embedding vector $h_i$ encodes the ith molecule. Using this embedding, we computed the predicted value $\hat{y}_i^{(m)}$ for each ADME parameter m.

$$\hat{y}_i^{(m)} = g_{\theta_m}(h_i) \tag{3}$$

where $g_{\theta_m}$ denotes the prediction function for task m, which incorporates the task-specific model parameters $\theta_m$. Smooth L1 loss was employed as the loss function:

$$L(y, \hat{y}) = \begin{cases} \frac{1}{2}(y - \hat{y})^2, & \text{if } |y - \hat{y}| < 1 \\ |y - \hat{y}| - \frac{1}{2}, & \text{otherwise} \end{cases} \tag{4}$$

In multitask learning, the total loss across all ADME parameters, $L_{\mathrm{MT}}$, is minimized:

$$L_{\mathrm{MT}} = \sum_{m=1}^{M} \frac{1}{|D_m|} \sum_{(G_i, y_i) \in D_m} L(y_i^{(m)}, \hat{y}_i^{(m)}) \tag{5}$$

where $|D_m|$ denotes the number of samples in data set $D_m$. Because some ADME parameters may lack labels in the multitask learning setting, missing values were excluded from the loss calculation.

In fine-tuning, the multitask pretrained GNN model parameters $\theta$ serve as the initialization, and the loss $L_{\mathrm{FT}}^{(m)}$ for each ADME parameter is minimized:

$$L_{\mathrm{FT}}^{(m)} = \frac{1}{|D_m|} \sum_{(G_i, y_i) \in D_m} L(y_i^{(m)}, \hat{y}_i^{(m)}) \tag{6}$$
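The losses in eqs 4–6 can be sketched in plain Python. The masking of missing labels follows the description above; the data layout (per-task lists of (y, ŷ) pairs, with None marking a missing label) is an assumption for illustration, not the kMoL representation.

```python
def smooth_l1(y: float, y_hat: float) -> float:
    """Smooth L1 loss (eq 4): quadratic near zero, linear elsewhere."""
    d = abs(y - y_hat)
    return 0.5 * d * d if d < 1.0 else d - 0.5

def multitask_loss(tasks) -> float:
    """Masked multitask loss L_MT (eq 5).

    tasks: list over the M ADME parameters; each entry is a list of
    (y, y_hat) pairs, where y is None when the label is missing.
    Missing labels are excluded from that task's average, as in the paper.
    """
    total = 0.0
    for pairs in tasks:
        labelled = [(y, yh) for y, yh in pairs if y is not None]
        if labelled:  # per-task mean over labelled samples only
            total += sum(smooth_l1(y, yh) for y, yh in labelled) / len(labelled)
    return total

# Two toy tasks: task 0 has one missing label, task 1 is fully labelled.
loss = multitask_loss([[(0.0, 0.5), (None, 1.0)], [(1.0, 3.0)]])
# 0.5*(0.5)^2 = 0.125 for task 0; |2| - 0.5 = 1.5 for task 1.
assert abs(loss - 1.625) < 1e-9
```

Fine-tuning (eq 6) reuses the same per-task term for a single task, starting from the multitask weights.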

Baseline Models

To assess the utility of the GNNMT+FT models, three baseline models were employed: a single-task model (GNNsingle) without pretraining, a multitask model (GNNMT), and a random forest (RF) regression model. GNNsingle and GNNMT share the same architecture and optimization methods as GNNMT+FT. The RF model uses 2048-bit ECFP4 fingerprints as molecular descriptors, with separate training for each ADME parameter.

Explainability Method

In this study, we employed IG to estimate the contribution of each input feature to the predicted outcome for a compound (Figure 1b). IG is a gradient-based approach that quantifies the contribution of each molecular feature to the prediction. Here, $h_i$ is a vector whose dimensions correspond to individual molecular features, and the contribution of each feature is represented by the IG score $s_i$.

$$s_i = (h_i - h_i') \odot \int_0^1 \frac{\partial g_{\theta_m}(h_i' + \alpha (h_i - h_i'))}{\partial h_i} \, d\alpha \tag{7}$$

where $h_i'$ is the baseline for the feature vector $h_i$; in this study, we set it to the zero vector. In addition, the IG scores for each atom were normalized:

$$s_i^{\mathrm{norm}} = \frac{s_i}{\max(|s_1|, |s_2|, \ldots, |s_n|)}, \quad i = 1, 2, \ldots, n \tag{8}$$
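A minimal numerical sketch of eqs 7 and 8, approximating the path integral with a midpoint Riemann sum and using finite-difference gradients in place of the automatic differentiation a GNN framework would provide. The toy model f is hypothetical; the sanity check exercises IG's completeness property (attributions sum to f(x) − f(baseline)).

```python
def integrated_gradients(f, x, baseline, steps=100):
    """Approximate IG scores s (eq 7) for a scalar model f on feature vector x."""
    n, eps = len(x), 1e-6
    avg_grad = [0.0] * n
    for k in range(steps):
        alpha = (k + 0.5) / steps  # midpoint rule along the straight path
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        for i in range(n):
            bumped = point[:]
            bumped[i] += eps
            avg_grad[i] += (f(bumped) - f(point)) / eps / steps
    # Elementwise product with (x - baseline), as in eq 7.
    return [(xi - b) * g for xi, b, g in zip(x, baseline, avg_grad)]

def normalize(scores):
    """Normalization of eq 8: divide by the maximum absolute score."""
    m = max(abs(s) for s in scores)
    return [s / m for s in scores]

# Toy model (hypothetical): f(h) = 2*h0 - 3*h1 + h0*h1, zero-vector baseline.
f = lambda h: 2 * h[0] - 3 * h[1] + h[0] * h[1]
x, base = [1.0, 2.0], [0.0, 0.0]
s = integrated_gradients(f, x, base)
# Completeness check: attributions sum to f(x) - f(baseline) = -2.
assert abs(sum(s) - (f(x) - f(base))) < 1e-3
```

For this f, the exact attributions are s = [3, −5], so the normalized scores are [0.6, −1.0].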

Performance Evaluation

A 5-fold cross-validation on the ADME data set was performed to evaluate the predictive performance for each ADME parameter. For this cross-validation, the data set was randomly divided into five subsets: one subset (1/5) for testing and four subsets (4/5) for training. In addition, 10% of the training portion was allocated for validation. The performance of the trained models on the test data was then assessed using three metrics: the coefficient of determination (R², eq 9), the mean absolute error (MAE, eq 10), and the root mean squared error (RMSE, eq 11):

$$R^2 = 1 - \frac{\sum_{(G_i, y_i) \in D_m} (y_i^{(m)} - \hat{y}_i^{(m)})^2}{\sum_{(G_i, y_i) \in D_m} (y_i^{(m)} - \bar{y}^{(m)})^2} \tag{9}$$
$$\mathrm{MAE} = \frac{1}{|D_m|} \sum_{(G_i, y_i) \in D_m} |y_i^{(m)} - \hat{y}_i^{(m)}| \tag{10}$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{|D_m|} \sum_{(G_i, y_i) \in D_m} (y_i^{(m)} - \hat{y}_i^{(m)})^2} \tag{11}$$
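The three metrics of eqs 9–11 can be written directly in plain Python (toy inputs, no external dependencies):

```python
import math

def r2(y, y_hat):
    """Coefficient of determination (eq 9)."""
    y_bar = sum(y) / len(y)
    ss_res = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

def mae(y, y_hat):
    """Mean absolute error (eq 10)."""
    return sum(abs(yi - yh) for yi, yh in zip(y, y_hat)) / len(y)

def rmse(y, y_hat):
    """Root mean squared error (eq 11)."""
    return math.sqrt(sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat)) / len(y))

# Toy predictions for illustration only.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.1, 1.9, 3.2, 3.8]
assert abs(mae(y_true, y_pred) - 0.15) < 1e-9
```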

Results and Discussion

Performance Evaluation Results for ADME Parameter Prediction Models

We compared the predictive accuracy of the proposed GNNMT+FT approach with that of a single-task model (GNNsingle), a multitask model (GNNMT), and a conventional machine learning model (RF). Figure 2 and Table 3 present a comparison using R² as the performance metric, while the MAE and RMSE scores for each model are detailed in Supporting Information Tables S6 and S7.

Figure 2. Comparison of predictive performance for the four models (GNNMT+FT, GNNMT, GNNsingle, and RF) across ten ADME parameters. The vertical axis shows the coefficient of determination (R²), while the horizontal axis lists the ADME parameters (Rb rat, fe, NER human, Papp LLC, fup rat, fubrain, fup human, CLint, Papp Caco-2, solubility) along with the number of compounds. Each bar represents the mean R² value and its standard deviation, as determined by 5-fold cross-validation.

Table 3. Performance Evaluation Using R².

ADME parameter RF GNNsingle GNNMT GNNMT+FT
Rb rat 0.246 ± 0.104 0.168 ± 0.177 0.259 ± 0.214 0.372 ± 0.179
fe 0.251 ± 0.023 0.330 ± 0.072 0.214 ± 0.075 0.274 ± 0.081
NER human 0.103 ± 0.102 0.105 ± 0.114 0.011 ± 0.158 0.045 ± 0.150
Papp LLC 0.236 ± 0.039 0.280 ± 0.051 0.300 ± 0.063 0.312 ± 0.062
fup rat 0.259 ± 0.112 0.401 ± 0.093 0.647 ± 0.048 0.656 ± 0.049
fubrain 0.277 ± 0.074 0.592 ± 0.062 0.560 ± 0.062 0.568 ± 0.093
fup human 0.493 ± 0.044 0.587 ± 0.064 0.639 ± 0.059 0.642 ± 0.058
CLint 0.630 ± 0.020 0.610 ± 0.038 0.645 ± 0.027 0.646 ± 0.029
Papp Caco-2 0.488 ± 0.037 0.492 ± 0.028 0.515 ± 0.037 0.519 ± 0.033
solubility 0.515 ± 0.032 0.559 ± 0.030 0.536 ± 0.019 0.539 ± 0.020
average 0.350 0.412 0.433 0.457

When the GNNMT model was compared with the conventional methods (GNNsingle and RF), GNNMT demonstrated superior performance on average. Notably, among the tasks with small amounts of data (fe, Papp LLC, and fup rat), a significant improvement due to multitask learning was observed for Papp LLC (R² = 0.300) and fup rat (R² = 0.647) (Table 3). This is likely because information was effectively shared with their respective similar tasks, Papp Caco-2 and fup human. Accordingly, the predictive performance for Papp Caco-2 (R² = 0.515) and fup human (R² = 0.639) also improved. On the other hand, for tasks where single-task models already achieved sufficient predictive performance, such as fubrain and solubility, no significant improvement was observed. Nevertheless, it remains meaningful to include these tasks in the multitask learning framework, because they are thought to share information with other tasks that have less data. For tasks with very limited data, such as Rb rat and NER human, predictive performance varied considerably across test splits, making accurate evaluation challenging. To further evaluate the uncertainty of the estimates, we conducted additional experiments in which models were repeatedly trained on the same fold with three different random seed initializations for NER human and Rb rat. The results revealed that the multitask GNN models exhibited larger variance across seeds than the single-task GNN and random forest models (Supporting Information Figure S2 and Table S8). This variability is likely attributable to the fact that multitask learning optimizes multiple tasks simultaneously, making the resulting solutions more sensitive to initialization and thus more prone to fluctuations across seeds. These findings highlight that particular caution is required when applying multitask learning to endpoints with limited compound numbers.

Next, the GNNMT model was compared with the GNNMT+FT model, in which each task was fine-tuned individually. Among the eight ADME parameters deemed suitable for reliable evaluation (fe, Papp LLC, fup rat, fubrain, fup human, CLint, Papp Caco-2, and solubility), further performance improvements were observed for five: Papp LLC, fup rat, fup human, CLint, and Papp Caco-2. Specifically, R² increased to 0.656 for fup rat and 0.519 for Papp Caco-2. These findings suggest that individually fine-tuning each task leverages the benefits of multitask learning while reducing interference from less-correlated tasks, thereby enhancing overall performance. To clarify whether the performance gains from fine-tuning are simply attributable to longer training, we conducted a control experiment in which the multitask model was trained for an extended number of steps (GNNextended MT), matching the total training duration of the GNNMT+FT setting. The GNNMT+FT model outperformed this extended multitask model (Supporting Information Table S9), indicating that the improvement is not merely due to longer training. Rather, it suggests that multitask learning has inherent limitations in fully optimizing individual endpoints, and that fine-tuning enables task-specific specialization, thereby improving predictive accuracy for each ADME parameter.

To rigorously evaluate the generalization performance of our proposed method, we conducted a benchmark evaluation using scaffold splitting. In this setting, GNNMT achieved the highest R 2 values in two tasks, while GNNMT+FT achieved the best results in three tasks. More importantly, the average R 2 value of GNNMT+FT (0.268) exceeded that of all other models, demonstrating its overall advantage (Supporting Information Table S10). The smaller performance gain compared to random splitting is likely due to the greater difficulty of scaffold splitting, where model performance tends to plateau, making differences between methods less pronounced.

Our results show that even with limited data, as in the ADME data set, we can build high-performance models by combining multitask learning with task-specific fine-tuning. We leveraged shared information and applied tailored adjustments to effectively overcome data scarcity.

Evaluation of Our GNN Models on Public ADME Data Sets

To further evaluate the performance of our proposed model, we conducted an assessment using the publicly available Polaris Biogen ADME data set (https://polarishub.io/datasets/biogen/adme-fang-v1), which is widely used in benchmarking studies for ADME prediction (Table 4). The performance results of the baseline models were obtained from the work by Fang et al. Our results show that the GNNMT or GNNMT+FT models achieved the highest predictive accuracy for the majority of ADME parameters.

Table 4. Pearson Correlation for the Polaris Biogen ADME Data Set.

ADME parameter RF GNNsingle GNNMT GNNMT+FT
HLM 0.548 0.668 0.720 0.722
MDR1-MDCKER 0.644 0.665 0.7568 0.7574
solubility 0.479 0.550 0.580 0.579
RLM 0.558 0.700 0.748 0.750
hPPB 0.533 0.698 0.719 0.724
rPPB 0.528 0.556 0.689 0.692

When comparing the GNNsingle and GNNMT models, performance improvements were observed for all parameters, demonstrating the benefit of multitask learning. In contrast, the comparison between GNNMT and GNNMT+FT revealed minimal differences in performance for most endpoints. This suggests that additional fine-tuning was not necessary and that multitask learning alone was sufficient to achieve high predictive accuracy. These findings imply that, for the Biogen ADME data set, the large number of compounds available for each parameter allowed effective model training through multitask learning without the need for fine-tuning.

Assessing Explainability in Lead-Optimized Compounds

We applied IG to the GNNMT+FT model to identify the substructures that may contribute to unfavorable ADME properties. Specifically, we visualized the contribution of each atom to the predicted ADME parameter values. We then evaluated the validity of this interpretation using the following procedure: (1) select known compounds from the literature that successfully underwent lead optimization; (2) predict their ADME parameters before and after optimization; (3) visualize the substructure contributions of the preoptimized compounds; and (4) assess the highlighted substructures from a medicinal chemistry perspective. For the evaluation, we focused on three ADME parameters that showed high predictive performance under the GNNMT+FT model: CLint, Papp Caco-2, and solubility. We identified five compound pairs whose predicted changes in ADME parameters were consistent with the experimentally measured changes (Table S11). Specifically, LXH254 and GLPG-1205 were used for CLint and Papp Caco-2, respectively, and BMS-986278 was used for solubility. In each compound pair, the substructures highlighted by the visualization were modified during lead optimization to improve the ADME properties (Figure 3).

Figure 3. Black circles highlight key structural modifications and their impact on the ADME properties. Red indicates a positive contribution to the predicted value, while blue indicates a negative contribution; deeper shades represent stronger contributions. Shown are the CLint evaluations for LXH254 (a) and GLPG-1205 (b), the Papp Caco-2 evaluation for GLPG-1205 (c), the solubility evaluation for BMS-986278 (d), and the Papp Caco-2 evaluation for LXH254 (e).

Interpretation of Lead Optimization for CLint

LXH254

Figure 3a shows the chemical structure of LXH254, both before and after lead optimization. For the preoptimized compound, we also provide an interpretation obtained using IG. In the IG method, the red-highlighted substructures contributed positively to the CLint value, suggesting that their modification may lead to a reduction in CLint. Previous reports support this interpretation: in region (i), substituting a carbon atom with a nitrogen atom improves metabolic stability, whereas in region (ii), replacing a tert-butyl group with a CF3 group retains lipophilicity and enhances metabolic stability. These results support the validity of the explainability approach and suggest that the red-highlighted substructures may be associated with changes in CLint, providing insights into possible directions for structural modification.

GLPG-1205

We used the IG method to assess the influence of lead optimization on CLint in GLPG-1205. For the preoptimized compound, we explained the substructures contributing to CLint and found that the region undergoing structural modification is highlighted in red (Figure 3b). The circled region in the preoptimized molecule contains a methoxy group prone to demethylation, which likely contributed to the increase in CLint. During lead optimization, modifications to this substructure improved the metabolic stability, and the explainability results from the GNNMT+FT model aligned well with existing chemical insights. These results indicate that the red-highlighted region identified by IG may provide supportive evidence for considering strategies to reduce CLint.

For both LXH254 and GLPG-1205, the observed improvements in CLint following lead optimization were consistent with the experimental findings. These results suggest that the explainability approach based on the proposed GNNMT+FT model has the potential to guide effective lead optimizations.

Interpretation of Lead Optimization for Papp Caco-2

GLPG-1205

We evaluated the effects of lead optimization on Papp Caco-2 using GLPG-1205. Explainability of the substructures contributing to Papp Caco-2 revealed that the blue-highlighted atoms were associated with decreased Papp Caco-2 (Figure 3c). The circled nitrogen atoms contribute to the lower permeability; replacing a nitrogen with an oxygen atom removes a proton donor, which is expected to improve permeability. This substructure was modified during lead optimization, resulting in enhanced permeability, and the explanation from the GNNMT+FT model aligned with established chemical insights. These results suggest that the blue-highlighted portion may be related to Papp Caco-2 and could provide insights for considering structural modifications.

Interpretation of Lead Optimization for Solubility

BMS-986278

The effect of lead optimization on the solubility of BMS-986278 was evaluated. Explainability revealed that the atoms highlighted in blue were linked to lower solubility (Figure 3d). In particular, the circled biaryl structure, comprising consecutive benzene rings, exhibits high lipophilicity and planar geometry, which likely reduced solubility before optimization. During lead optimization, modification of this substructure enhanced solubility. The interpretation from the GNNMT+FT model aligns with established chemical insights. These results suggest that the blue-highlighted region may influence solubility and could offer insights into potential directions for modification.

Effectiveness of the Explainability in Lead Optimization for Papp Caco-2

LXH254

We investigated the effect of lead optimization on Papp Caco-2 in LXH254 by visualizing the substructures that contribute to the Papp Caco-2 value. As shown in Figure 3e, the circled region highlighted in red appears to be associated with an increased Papp Caco-2 value, although conventional chemical insights do not directly link this substructure to permeability.

It is possible that this red-highlighted structure influenced Papp Caco-2 in ways that are not captured by traditional medicinal chemistry. After lead optimization, the model predicted a decreased Papp Caco-2 value, and the circled substructure was modified. These results suggest that data-driven approaches may complement conventional medicinal chemistry by highlighting substructures whose influence on permeability is not yet well established.

Robustness of IG Explanations to Baseline Choice

In this study, we used the zero vector as the baseline for IG, following Sundararajan et al., but IG is dependent on the baseline selection. To examine the dependence of IG-based explanations on baseline choice, we compared the results obtained with a zero-vector baseline and with a random-vector baseline. As shown in the Supporting Information Figure S3, substructures identified as highly contributing with the zero baseline also consistently showed high contributions under random baselines, whereas some regions exhibited variability depending on the baseline. These findings suggest that IG provides a certain degree of robustness in identifying major contributing substructures, but interpretations of more ambiguous regions should be treated with caution.

Conclusions

In this study, we propose and evaluate a GNNMT+FT model to improve the accuracy of ADME parameter predictions in drug development and guide molecular design based on these predictions. By integrating multitask learning and fine-tuning, this approach achieved highly accurate ADME predictions, which were previously challenging with conventional methods, and demonstrated the feasibility of constructing models that handle ADME parameters with limited training data.

We also introduced an explainability method using IG, which captures the contribution of molecular structures to each ADME parameter. The IG method clarifies how specific functional groups and substructures influence ADME properties and suggests the possibility of identifying unfavorable structures early in drug development. Our findings, validated by both experimental data and the expert opinion of a medicinal chemist, suggest that the proposed GNNMT+FT model can directly learn meaningful chemical features related to ADME properties.

A limitation of the current study is the relatively small number of compounds used for the evaluation. This is primarily due to the limited availability of public data sets containing compounds suitable for assessing molecular optimization. Furthermore, in this study, we conducted a retrospective evaluation targeting only the regions that were actually modified during lead optimization among the multiple substructures highlighted by IG. However, this evaluation alone cannot provide predictive guidance on which substructures should be preferentially modified to improve ADME properties in future lead optimization. This represents an interpretational limitation of our method. We acknowledge that a more extensive evaluation with a larger and more diverse set of compounds is crucial to further substantiate these findings. Future work will focus on expanding our validation through collaborations with pharmaceutical companies, which will provide access to more extensive proprietary data sets and enable more thorough experimental verification.

Our approach may contribute to improving ADME properties during lead optimization using in silico approaches by leveraging high-performance ADME prediction and explainability. Moreover, it could support more efficient early selection of lead compounds and offer insights that may inform molecular design considering ADME characteristics. These advantages could help accelerate the overall drug discovery process and improve the success rates.

Supplementary Material

ci5c00758_si_001.pdf (946.8KB, pdf)

Acknowledgments

This research was supported by the Japan Agency for Medical Research and Development (AMED) under Grant Number JP22nk0101111. This work was also supported by a Grant-in-Aid for Transformative Research Areas (A) “Latent Chemical Space” [JP24H01771] for HI from the Ministry of Education, Culture, Sports, Science and Technology, Japan, and by the Japan Research Foundation for Clinical Pharmacology. OpenAI ChatGPT was used to improve the wording of some paragraphs, but not to generate new content. We would like to thank Editage (www.editage.jp) for English language editing.

The ADME data sets used in this study were extracted from DruMAP version 2.0. The processed data sets are available at https://github.com/clinfo/ADME_MTFT, and the code for constructing GNN models can be found at https://github.com/elix-tech/kmol.

The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/acs.jcim.5c00758.

  • Distributions of ADME parameters before and after log transformation; best hyperparameters for GNNMT+FT; best hyperparameters for GNNMT; best hyperparameters for GNNsingle; best hyperparameters for RF; node features of a graph representation; performance evaluation using MAE; performance evaluation using RMSE; prediction variability with multiple random seeds for Rb rat and NER human; performance evaluation with multiple random seeds using R²; performance evaluation comparing extended multitask (extended MT) and fine-tuned (MT+FT) models using R²; performance evaluation on scaffold split using R²; experimental and predicted values before and after lead optimization; and assessment of uncertainty of IG to different baselines (zero vs random) (PDF)

S.I. and T.K. contributed equally to this work. S.I., T.K., S.M., R.K., H.I., and Y.O. conceived and designed the study. M.K., H.K., R.W., and K.M. collected ADME data sets. T.Y. and K.I. collected explainability evaluation data sets. S.I. and T.K. performed the calculations and analyzed the data. S.I., T.K., S.M., R.K., T.S., H.I., and Y.O. contributed to interpreting the results. S.I. drafted the original manuscript. S.I., T.K., S.M., R.K., Y.O., H.I., and Y.O. revised the drafts. All the authors approved the final version of the manuscript.

The authors declare no competing financial interest.

References

  1. DiMasi J. A., Grabowski H. G., Hansen R. W.. Innovation in the pharmaceutical industry: new estimates of R&D costs. J. Health Econ. 2016;47:20–33. doi: 10.1016/j.jhealeco.2016.01.012. [DOI] [PubMed] [Google Scholar]
  2. Fleming N.. How artificial intelligence is changing drug discovery. Nature. 2018;557:S55–S55. doi: 10.1038/d41586-018-05267-x. [DOI] [PubMed] [Google Scholar]
  3. Dowden H., Munro J.. Trends in clinical success rates and therapeutic focus. Nat. Rev. Drug Discovery. 2019;18:495–496. doi: 10.1038/d41573-019-00074-z. [DOI] [PubMed] [Google Scholar]
  4. Waring M. J., Arrowsmith J., Leach A. R., Leeson P. D., Mandrell S., Owen R. M., Pairaudeau G., Pennie W. D., Pickett S. D., Wang J.. et al. An analysis of the attrition of drug candidates from four major pharmaceutical companies. Nat. Rev. Drug Discovery. 2015;14:475–486. doi: 10.1038/nrd4609. [DOI] [PubMed] [Google Scholar]
  5. Tsaioun K., Bottlaender M., Mabondzo A.. The Alzheimer's Drug Discovery Foundation. ADDME–Avoiding Drug Development Mistakes Early: central nervous system drug discovery perspective. BMC Neurol. 2009;9:S1. doi: 10.1186/1471-2377-9-S1-S1. [DOI] [PMC free article] [PubMed] [Google Scholar]
  6. Iga K.. Verification of pharmacokinetic approaches in prior drug development. Yakugaku Zasshi: Journal of the Pharmaceutical Society of Japan. 2019;139:437–460. doi: 10.1248/yakushi.18-00190. [DOI] [PubMed] [Google Scholar]
  7. Wang Y., Xing J., Xu Y., Zhou N., Peng J., Xiong Z., Liu X., Luo X., Luo C., Chen K.. et al. In silico ADME/T modelling for rational drug design. Q. Rev. Biophys. 2015;48:488–515. doi: 10.1017/S0033583515000190. [DOI] [PubMed] [Google Scholar]
  8. Obrezanova O.. Artificial intelligence for compound pharmacokinetics prediction. Curr. Opin. Struct. Biol. 2023;79:102546. doi: 10.1016/j.sbi.2023.102546. [DOI] [PubMed] [Google Scholar]
  9. Dulsat J., López-Nieto B., Estrada-Tejedor R., Borrell J. I.. Evaluation of free online ADMET tools for academic or small biotech environments. Molecules. 2023;28:776. doi: 10.3390/molecules28020776. [DOI] [PMC free article] [PubMed] [Google Scholar]
  10. Sakiyama Y., Yuki H., Moriya T., Hattori K., Suzuki M., Shimada K., Honma T.. Predicting human liver microsomal stability with machine learning techniques. J. Mol. Graph. Model. 2008;26:907–915. doi: 10.1016/j.jmgm.2007.06.005. [DOI] [PubMed] [Google Scholar]
  11. Chen B., Sheridan R. P., Hornak V., Voigt J. H.. Comparison of random forest and Pipeline Pilot Naive Bayes in prospective QSAR predictions. J. Chem. Inf. Model. 2012;52:792–803. doi: 10.1021/ci200615h. [DOI] [PubMed] [Google Scholar]
  12. Duvenaud, D. K. ; Maclaurin, D. ; Iparraguirre, J. ; Bombarell, R. ; Hirzel, T. ; Aspuru-Guzik, A. ; Adams, R. P. . Convolutional networks on graphs for learning molecular fingerprints. Adv. Neural Inf. Process. Syst. 2015, 28. [Google Scholar]
  13. Siramshetty V., Williams J., Nguyen Đ-T., Neyra J., Southall N., Mathé E., Xu X., Shah P.. Validating ADME QSAR models using marketed drugs. SLAS Discovery. 2021;26:1326–1336. doi: 10.1177/24725552211017520. [DOI] [PubMed] [Google Scholar]
  14. Fu L., Shi S., Yi J., Wang N., He Y., Wu Z., Peng J., Deng Y., Wang W., Wu C.. et al. ADMETlab 3.0: an updated comprehensive online ADMET prediction platform enhanced with broader coverage, improved performance, API functionality and decision support. Nucleic Acids Res. 2024;52:W422–W431. doi: 10.1093/nar/gkae236. [DOI] [PMC free article] [PubMed] [Google Scholar]
  15. Wei Y., Li S., Li Z., Wan Z., Lin J.. Interpretable-ADMET: a web service for ADMET prediction and optimization based on deep neural representation. Bioinformatics. 2022;38:2863–2871. doi: 10.1093/bioinformatics/btac192. [DOI] [PubMed] [Google Scholar]
  16. Esaki T., Ohashi R., Watanabe R., Natsume-Kitatani Y., Kawashima H., Nagao C., Mizuguchi K.. Computational model to predict the fraction of unbound drug in the brain. J. Chem. Inf. Model. 2019;59:3251–3261. doi: 10.1021/acs.jcim.9b00180. [DOI] [PubMed] [Google Scholar]
  17. Esaki T., Ikeda K.. Difficulties and prospects of data curation for ADME in silico modeling. Chem-Bio Informatics Journal. 2023;23:1–6. doi: 10.1273/cbij.23.1. [DOI] [Google Scholar]
  18. Iwata H., Matsuo T., Mamada H., Motomura T., Matsushita M., Fujiwara T., Maeda K., Handa K.. Predicting total drug clearance and volumes of distribution using the machine learning-mediated multimodal method through the imputation of various nonclinical data. J. Chem. Inf. Model. 2022;62:4057–4065. doi: 10.1021/acs.jcim.2c00318. [DOI] [PMC free article] [PubMed] [Google Scholar]
  19. Sundararajan, M. ; Taly, A. ; Yan, Q. . Axiomatic Attribution for Deep Networks. Proceedings of the 34th International Conference on Machine Learning, 2017; pp 3319–3328.
  20. Kawashima H., Watanabe R., Esaki T., Kuroda M., Nagao C., Natsume-Kitatani Y., Ohashi R., Komura H., Mizuguchi K.. DruMAP: A novel drug metabolism and pharmacokinetics analysis platform. J. Med. Chem. 2023;66:9697–9709. doi: 10.1021/acs.jmedchem.3c00481. [DOI] [PMC free article] [PubMed] [Google Scholar]
  21. Brown D. G., Bostrom J.. Where do recent small molecule clinical development candidates come from? J. Med. Chem. 2018;61:9442–9468. doi: 10.1021/acs.jmedchem.8b00675. [DOI] [PubMed] [Google Scholar]
  22. Brown D. G.. An analysis of successful hit-to-clinical candidate pairs. J. Med. Chem. 2023;66:7101–7139. doi: 10.1021/acs.jmedchem.3c00521. [DOI] [PubMed] [Google Scholar]
  23. Cozac R., Hasic H., Choong J. J., Richard V., Beheshti L., Froehlich C., Koyama T., Matsumoto S., Kojima R., Iwata H.. et al. kMoL: an open-source machine and federated learning library for drug discovery. J. Cheminf. 2025;17:22. doi: 10.1186/s13321-025-00967-9. [DOI] [PMC free article] [PubMed] [Google Scholar]
  24. Rogers D., Hahn M.. Extended-connectivity fingerprints. J. Chem. Inf. Model. 2010;50:742–754. doi: 10.1021/ci100050t. [DOI] [PubMed] [Google Scholar]
  25. Fang C., Wang Y., Grater R., Kapadnis S., Black C., Trapa P., Sciabola S.. Prospective validation of machine learning algorithms for absorption, distribution, metabolism, and excretion prediction: An industrial perspective. J. Chem. Inf. Model. 2023;63:3263–3274. doi: 10.1021/acs.jcim.3c00160. [DOI] [PubMed] [Google Scholar]
  26. Nishiguchi G. A., Rico A., Tanner H., Aversa R. J., Taft B. R., Subramanian S., Setti L., Burger M. T., Wan L., Tamez V.. et al. Design and Discovery of N-(2-Methyl-5′-morpholino-6′-((tetrahydro-2H-pyran-4-yl)oxy)-[3,3′-bipyridin]-5-yl)-3-(trifluoromethyl)benzamide (RAF709): A Potent, Selective, and Efficacious RAF Inhibitor Targeting RAS Mutant Cancers. J. Med. Chem. 2017;60:4869–4881. doi: 10.1021/acs.jmedchem.6b01862. [DOI] [PubMed] [Google Scholar]
  27. Ramurthy S., Taft B. R., Aversa R. J., Barsanti P. A., Burger M. T., Lou Y., Nishiguchi G. A., Rico A., Setti L., Smith A.. et al. Design and discovery of N-(3-(2-(2-Hydroxyethoxy)-6-morpholinopyridin-4-yl)-4-methylphenyl)-2-(trifluoromethyl)isonicotinamide, a selective, efficacious, and well-tolerated RAF inhibitor targeting RAS mutant cancers: the path to the clinic. J. Med. Chem. 2020;63:2013–2027. doi: 10.1021/acs.jmedchem.9b00161. [DOI] [PubMed] [Google Scholar]
  28. Haug J., Zürn S., El-Jiz P., Kasneci G.. On baselines for local feature attributions. arXiv. 2021 doi: 10.48550/arXiv.2101.00905. [DOI] [Google Scholar]


Articles from Journal of Chemical Information and Modeling are provided here courtesy of American Chemical Society