eBioMedicine. 2025 Oct 28;121:105994. doi: 10.1016/j.ebiom.2025.105994

Drug toxicity prediction based on genotype-phenotype differences between preclinical models and humans

Minhyuk Park a,b,d, Woomin Song a,d, Hyunsoo Ahn c,d, Sanguk Kim a,c,
PMCID: PMC12597050  PMID: 41161095

Summary

Background

A major hurdle in drug development is the poor translatability of preclinical toxicity findings to human outcomes, largely due to biological differences between humans and model organisms. This gap leads to high clinical trial attrition and post-marketing drug withdrawals. Existing toxicity prediction methods primarily rely on chemical properties and typically overlook these inter-species (or -organism) differences.

Methods

We developed a machine learning framework that incorporates genotype-phenotype differences (GPD) between preclinical models (cell lines and mice) and humans to improve the prediction of human drug toxicity. Clinical risk information on drugs (e.g., clinical trials or post-marketing surveillance) was obtained without bias from published reports, data sources, and databases. The GPD of drug targets was assessed across three biological contexts: gene essentiality, tissue expression profiles, and network connectivity. We benchmarked the GPD-based model against state-of-the-art toxicity predictors and evaluated its performance using independent datasets and chronological validation.

Findings

Using a dataset of 434 risky and 790 approved drugs, GPD features were significantly associated with drug failures due to severe adverse events. The Random Forest model integrating GPD with chemical features demonstrated enhanced predictive accuracy (AUPRC = 0.63 vs. baseline 0.35; AUROC = 0.75 vs. baseline 0.50), particularly for neurotoxicity and cardiovascular toxicity, two major causes of clinical failure that had been overlooked by assessments based on chemical properties alone. Our model outperformed state-of-the-art chemical structure-based models and demonstrated a practical ability to anticipate future drug withdrawals in real-world settings.

Interpretation

Incorporating differences in genotype-phenotype relationships offers a biologically grounded strategy for drug toxicity prediction. Our framework enables early identification of high-risk drugs in clinical development. This approach holds promise for reducing development costs, improving patient safety, and increasing the success rate of therapeutic approvals.

Funding

Korean National Research Foundation (2020R1A6A1A03047902, RS-2025-16070008), IITP (2019-0-01906, Artificial Intelligence Graduate School Program and IITP-2024-RS-2024-00441244, Global Data-X Leader HRD program, POSTECH)

Keywords: Drug safety, Translational medicine, Cross-species translation, Machine learning, Artificial intelligence


Research in context.

Evidence before this study

Predicting human drug toxicity remains a persistent challenge in pharmaceutical development. A significant proportion of drug candidates that demonstrate safety in preclinical models subsequently fail in clinical trials or are withdrawn post-marketing due to unforeseen severe adverse events (SAEs). Existing predictive approaches predominantly relied on chemical properties such as drug-likeness rules (e.g., Lipinski, Veber) or AI models based on chemical structure-based features. However, these methods often fail to capture human-specific toxicities due to biological differences between preclinical models and humans. Although target gene-centric features, such as gene essentiality, tissue specificity, and network connectivity, have been linked to toxicity, few studies have explored inter-species or inter-organism differences in these features for predicting human toxicity.

Added value of this study

This study presents a machine learning framework that incorporates genotype-phenotype differences (GPD) between preclinical models and humans to enhance the prediction of human-specific drug toxicity. By assessing discrepancies in gene essentiality, tissue specificity, and network connectivity of drug targets, we showed that those GPD-based features are associated with toxic outcomes in humans. When combined with traditional chemical descriptors, GPD features improved predictive performance over chemical structure-based state-of-the-art models. Notably, the GPD features effectively detected high-risk drugs associated with neurological and cardiovascular toxicity—common reasons for late-stage failures—that were previously overlooked by chemical-based assessments. Model robustness was demonstrated through evaluation on independent datasets of drug-adverse event associations, confirming its capability to identify drugs linked to severe toxicities. Chronological validation further confirmed the model's utility to anticipate future drug withdrawals in real-world drug development.

Implications of all the available evidence

Incorporating differences in genotype-phenotype relationships between preclinical models and humans offers a biologically grounded prediction of human drug toxicity. GPD-based models can capture signals that are missed by conventional chemical-based methods, particularly for toxicity classes that frequently lead to clinical trial failures. This strategy holds promise as an early warning system to improve drug safety evaluations, reduce attrition rates, and inform prioritisation in the drug development pipeline. As drug-target annotations and functional genomics datasets continue to expand, GPD-based frameworks are expected to play an increasingly pivotal role in the development of safer and more effective therapeutics.

Introduction

Assessing drug-induced toxicity for humans early in pharmaceutical development remains a critical challenge. Toxicities are a leading cause of clinical trial failures and post-market withdrawals, resulting in wasted costs.1, 2, 3 Unfortunately, current preclinical models, including in vitro cell cultures and in vivo animal studies, often fail to fully capture human-specific drug toxicities. This leads to translational inaccuracies that contribute to unanticipated severe adverse events in humans.4,5 For instance, sibutramine, an appetite suppressant, exhibited no severe cytotoxic effects in preclinical studies6 but was later withdrawn due to life-threatening cardiovascular risks in humans.7

Traditional approaches to drug toxicity assessment have primarily relied on chemical properties, such as rules proposed by Lipinski, Veber, and Ghose, providing valuable but insufficient features to forecast human-specific toxicological outcomes.8 A critical limitation of these approaches is the failure to account for fundamental biological differences between preclinical models and humans, particularly in the genotype-phenotype relationship. These differences can lead to discrepancies between preclinical toxicity assessments and actual human outcomes, often resulting in unforeseen adverse events during clinical trials and post-market phases. Indeed, artificial intelligence (AI) models based on chemical properties struggle to bridge the gap between preclinical data and human toxicity.9

To address this issue, we hypothesised that the genotype-phenotype differences (GPD)—defined as variations in how genetic factors influence traits between preclinical models and humans—play a pivotal role in determining human drug toxicity outcomes. Drug failure due to toxicity often arises from the perturbation of target genes.2 Accordingly, the relationship between characteristics of genes and toxicity has been extensively studied across various biological contexts such as gene essentiality,10, 11, 12 tissue specificity,13,14 and connectivity within biological networks.10,13,15 However, these genetic characteristics are not conserved across species (or organisms), leading to variations in drug-induced phenotypic effects. Our previous research has demonstrated that discrepancies in gene essentiality between cells and humans contribute to drug safety failures.16 Nevertheless, the impact of inter-species differences, particularly between widely used animal models, such as mice, and humans, remains insufficiently explored in various contexts. Evolutionary divergence influences the phenotypic consequences of gene perturbation by altering protein–protein interaction networks17 and noncoding regulatory elements.18,19 These evolutionary modifications may lead to species (or organisms)-specific phenotypic responses to drug-induced gene perturbations, emphasising the need to consider inter-species and inter-organism differences for accurate human drug toxicity assessment.

In this study, we present a GPD-based machine learning framework designed to predict human drug toxicity. Our approach incorporates inter-species and inter-organism differences in genotype-phenotype relationships to improve toxicity assessment. By systematically comparing gene essentiality, tissue expression profiles, and biological network connectivity between preclinical models and humans, we found a significant correlation between human drug toxicity and GPD features. Furthermore, we discovered that GPD features can enhance chemical properties in drug toxicity prediction by identifying risky drugs linked to major toxicity classes that result in drug failure, such as toxicity in the nervous and cardiovascular systems. Our model surpasses other state-of-the-art models in toxicity assessment and was evaluated by predicting real-world outcomes through chronological validation. Our method offers a valuable framework for assessing human drug toxicity, aiding in the reduction of the attrition rate in drug development.

Methods

Drug dataset with toxicity profiles in humans

We compiled drugs that passed toxicity tests in preclinical models and entered the clinical stage but carried a risk of SAEs in humans. Based on their risk of causing SAEs, these drugs were categorised into two groups.

  1. Drugs that failed in clinical trials due to safety issues: This group included drugs that were discontinued during the clinical phase due to safety-related issues. Data on these drugs were obtained from the study by Gayvert et al.20 and the ClinTox database in the MoleculeNet database.21

  2. Drugs with safety issues on the market: This category includes (i) drugs withdrawn from the market due to SAEs identified through post-marketing surveillance and (ii) drugs carrying boxed warnings for life-threatening adverse events. Data on withdrawn drugs were sourced from Onakpoya et al.,22 with drug-withdrawal reason pairs filtered based on the Oxford Centre for Evidence-based Medicine levels 1, 2, or 3. Information on drugs with boxed warnings was obtained from the drug safety data curation in ChEMBL23 (version 32) with toxicity profiles. The dataset comprises drugs classified into 18 distinct toxicities, including neurotoxicity, cardiotoxicity, and vascular toxicity. Psychiatric toxicity was defined to encompass cases associated with drug misuse.

Additionally, a dataset of drugs with no reported SAEs was constructed using drugs approved for any indication in the ChEMBL database24 (version 32). Anticancer drugs were excluded from approved drugs due to their distinct toxicity tolerance profiles.14,16,25

In our study, drugs were categorised as risky if they had experienced clinical trial failures or post-marketing withdrawals due to SAEs in at least one market, or if they carried significant safety warnings such as boxed warnings. We deliberately adopted this conservative definition to prioritise minimising patient harm from SAEs over potential therapeutic benefits, even if such drugs remain approved in certain health markets.

To minimise potential biases arising from chemical structure similarities, we removed duplicate drugs with analogous chemical structures. ChEMBL IDs were mapped to STITCH IDs using the “chemical.sources.v5.0.tsv.gz” file provided by the STITCH 5 database.26 Since drugs with similar chemical formulas (e.g., salts, hydrates, and radioisotopes) often share the same STITCH ID despite having different ChEMBL IDs, we identified and excluded these duplicates. To verify the effective removal of chemically analogous drugs, we quantified drug similarity with the Tanimoto similarity coefficient (Tc), computed on chemical fingerprints derived from SMILES strings provided by the STITCH database, and applied a commonly used threshold for defining similar drug pairs (Tc ≥0.85)27, 28, 29 [Fig. S1]. Chemical fingerprints were generated using MACCS keys and Extended-Connectivity Fingerprints (ECFP4, 1024 bits). All analyses were conducted using the RDKit cheminformatics toolkit in Python (RDKit: Open-source cheminformatics. https://www.rdkit.org).
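The similarity check can be illustrated with a minimal pure-Python sketch. It assumes each fingerprint has already been reduced to the set of its "on" bit positions (in the study these would come from RDKit MACCS keys or ECFP4 fingerprints); the drug names, bit sets, and helper names below are hypothetical and are not taken from the authors' code.

```python
from itertools import combinations

def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient between two fingerprints given as sets of on-bit indices."""
    union = len(bits_a | bits_b)
    if union == 0:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(bits_a & bits_b) / union

def similar_pairs(fingerprints, threshold=0.85):
    """Return (drug, drug, Tc) for every pair whose Tc meets the threshold (0.85 in the text)."""
    pairs = []
    for (name_a, fp_a), (name_b, fp_b) in combinations(sorted(fingerprints.items()), 2):
        tc = tanimoto(fp_a, fp_b)
        if tc >= threshold:
            pairs.append((name_a, name_b, tc))
    return pairs

# Hypothetical toy fingerprints (on-bit positions), not real drugs.
toy_fingerprints = {
    "drug_A": set(range(0, 20)),       # 20 bits
    "drug_A_salt": set(range(0, 19)),  # shares 19 of those 20 bits
    "drug_B": set(range(15, 40)),      # mostly different bits
}
flagged = similar_pairs(toy_fingerprints)
```

Under these assumptions only the pair (drug_A, drug_A_salt) is flagged, since 19 shared bits out of 20 gives Tc = 0.95. Any pair returned by such a check after deduplication would indicate residual analogues.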

In the following section, we utilised 434 risky and 790 approved drugs for this study. To ensure reliable estimation of drug perturbation effects, we included only drugs for which target gene information covered more than 50% of the relevant data.

Estimation of drug perturbation effects in preclinical models and humans

We curated GPD features in three biological contexts—gene essentiality, expression profiles, and network connectivity—representing drug target genes’ perturbation effects in preclinical models (such as cells and mice) and humans [Fig. S2 and Fig. S3A–I].

Gene essentiality

Gene essentiality scores were derived for cells, mice, and humans using different datasets. For cells, we obtained gene fitness scores from Project Score11 (release 1, 5th April 2019) for 324 cell lines and calculated the average fitness score per gene to represent its essentiality. According to the threshold of significant effects defined by Behan et al.,11 genes with an average fitness score ≥0 were considered to show severe perturbation effects, whereas genes with an average fitness score <0 were considered to show non-severe perturbation effects on cells. In mice, gene essentiality was determined by integrating OGEE30 (Online GEne Essentiality, version 3) scores, including experimental viability measurements, and computationally predicted scores based on gene sequence information.31 Each score was normalised (using a robust scaler from the scikit-learn package in Python, version 1.0.2) before averaging, and in cases where only one source was available, the existing score was used without averaging. Genes with integrated scores ≥0.6 were considered to show severe perturbation effects, whereas genes with integrated scores <0.6 were considered to show non-severe perturbation effects on mice. The threshold score of 0.6 was chosen because it resulted in the highest overlap with essential and non-essential genes identified in the independent IMPC (International Mouse Phenotyping Consortium; 1402 essential and 2253 non-essential genes) database provided by OGEE [Fig. S4]. For humans, the LOEUF (loss-of-function observed/expected upper bound fraction) score from gnomAD32 (version 2.1.1) was used. To ensure alignment with the interpretation that a higher essentiality score indicates a more severe perturbation effect, the LOEUF values were adjusted by subtracting each score from the maximum LOEUF value. According to the threshold of intolerant gene perturbation effects defined by Karczewski et al.32 (LOEUF <0.35), genes with an adjusted LOEUF >1.646 were considered to show severe perturbation effects, whereas genes with an adjusted LOEUF ≤1.646 were considered to show non-severe perturbation effects on humans.
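The LOEUF adjustment for humans can be sketched as follows. This is a minimal illustration with hypothetical gene names and values. The maximum LOEUF of 1.996 used in the toy data is inferred from the two stated thresholds (0.35 + 1.646) rather than quoted from the source.

```python
def adjust_loeuf(loeuf_by_gene):
    """Flip LOEUF so that higher values mean stronger constraint (maximum LOEUF minus score)."""
    max_loeuf = max(loeuf_by_gene.values())
    return {gene: max_loeuf - score for gene, score in loeuf_by_gene.items()}

def classify_human(adjusted_by_gene, threshold=1.646):
    """Label a gene 'severe' when its adjusted LOEUF exceeds the threshold given in the text."""
    return {
        gene: ("severe" if score > threshold else "non-severe")
        for gene, score in adjusted_by_gene.items()
    }

# Hypothetical LOEUF values; 1.996 is an inferred maximum, not a quoted one.
toy_loeuf = {"GENE_CONSTRAINED": 0.10, "GENE_TOLERANT": 0.90, "GENE_MAX": 1.996}
adjusted = adjust_loeuf(toy_loeuf)
labels = classify_human(adjusted)
```

In this toy example only GENE_CONSTRAINED (LOEUF 0.10, below the 0.35 cutoff) is labelled severe, which mirrors the intended equivalence between LOEUF <0.35 and adjusted LOEUF >1.646.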

Expression profiles

For expression profiles, gene expression levels (TPM, Transcripts Per Million) for cells, mice, and humans were obtained from the ENCODE database.33 Blood cancer cell lines were excluded from the cell expression profile due to their distinct characteristics.34 The dataset included transcriptome data from 127 cell samples (covering 54 organ types), 141 mouse samples (40 organ types), and 187 human tissue samples (48 organ types). For each organ type, the representative expression value was computed as the average gene expression level across all samples from the same organ type, forming gene expression matrices of dimensions: cell (18,109 genes × 57 organ types), mouse (19,110 genes × 43 organ types), and human (18,775 genes × 51 organ types). Genes with no expression level exceeding 1 TPM across all organ types were excluded. Expression values were transformed using log2(TPM+1). Dimensionality reduction was performed for expression values using t-SNE to profile gene expression levels and tissue specificity across different organ types. To ensure that the reduced-dimensional representations accurately captured gene expression patterns and tissue specificities, we assessed their correlations with measures of average expression levels and tissue specificities (tau value35) across diverse organ types and found them statistically significant [Fig. S5A–F]. The final expression profile for each gene was represented as a single value for cell, mouse, and human: cell (18,109 genes × 1), mouse (19,110 genes × 1), and human (18,775 genes × 1).
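The preprocessing steps before dimensionality reduction, together with the tau index used as a reference measure, can be sketched as below. This is an illustrative outline only: the helper names and toy values are hypothetical. Two choices are assumptions, since the text does not specify them: the 1 TPM filter is applied to organ-averaged values, and tau is computed on log-transformed values. The t-SNE step is not reproduced.

```python
import math

def organ_means(samples):
    """Average TPM per organ type; `samples` is a list of (organ, tpm) pairs for one gene."""
    totals, counts = {}, {}
    for organ, tpm in samples:
        totals[organ] = totals.get(organ, 0.0) + tpm
        counts[organ] = counts.get(organ, 0) + 1
    return {organ: totals[organ] / counts[organ] for organ in totals}

def is_expressed(organ_tpm, min_tpm=1.0):
    """Keep a gene only if it exceeds min_tpm in at least one organ type."""
    return any(value > min_tpm for value in organ_tpm.values())

def log_transform(organ_tpm):
    """Apply log2(TPM + 1) to every organ-level value."""
    return {organ: math.log2(value + 1.0) for organ, value in organ_tpm.items()}

def tau(values):
    """Tau tissue-specificity index: 0 = uniform expression, 1 = single-tissue expression."""
    values = list(values)
    peak = max(values)
    if peak == 0 or len(values) < 2:
        return 0.0
    return sum(1.0 - value / peak for value in values) / (len(values) - 1)

# Hypothetical gene measured in two liver samples and one sample each of brain and heart.
toy_samples = [("liver", 3.0), ("liver", 5.0), ("brain", 0.0), ("heart", 0.0)]
toy_means = organ_means(toy_samples)
toy_log = log_transform(toy_means)
toy_tau = tau(toy_log.values())
```

The toy gene is expressed only in liver, so its tau equals 1, whereas a gene with identical expression in every organ type would score 0.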

Network connectivity

For network connectivity, biological networks specific to cells, mice, and humans were constructed by integrating physical protein–protein interaction (PPI) networks from STRING36 (version 12.0) with co-expression networks generated from ENCODE expression data. To construct the biological network for cells, for example, we combined the STRING-derived human physical network with the co-expression network of cell lines obtained from ENCODE. Gene filtering for co-expression network construction was performed using default settings from the GWENA tool,37 retaining genes with sufficient expression levels. Pairwise gene expression correlations were calculated using the Spearman rank correlation method, and the context likelihood of relatedness (CLR) algorithm38 was applied to identify significant relationships while reducing noise. Following CLR processing, the top 5% of gene–gene links with the highest link scores were retained as meaningful connections. To quantify gene connectivity within the physical and co-expression network, we computed the integrated value of influence (IVI),39 which aggregates multiple network centrality measures (missing values were imputed with the network-wide median). The final multimodal network connectivity score was obtained by multiplying IVI values from the physical and co-expression networks.
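The last two steps of this procedure, retaining the top 5% of links and combining connectivity scores across the two networks, can be sketched as follows. The sketch assumes that CLR link scores and per-gene IVI values have already been computed elsewhere; all names and numbers are hypothetical, and imputing each network with its own median is one reading of "network-wide median".

```python
from statistics import median

def keep_top_links(link_scores, fraction=0.05):
    """Keep the highest-scoring fraction of gene-gene links (at least one link)."""
    n_keep = max(1, round(len(link_scores) * fraction))
    ranked = sorted(link_scores.items(), key=lambda item: item[1], reverse=True)
    return dict(ranked[:n_keep])

def combine_connectivity(ivi_physical, ivi_coexpression):
    """Multiply per-gene IVI values from both networks, imputing gaps with each network's median."""
    fill_physical = median(ivi_physical.values())
    fill_coexpression = median(ivi_coexpression.values())
    genes = set(ivi_physical) | set(ivi_coexpression)
    return {
        gene: ivi_physical.get(gene, fill_physical)
        * ivi_coexpression.get(gene, fill_coexpression)
        for gene in genes
    }

# Hypothetical link scores for 40 gene pairs and IVI values for three genes.
toy_links = {("g%d" % i, "g%d" % (i + 1)): float(i) for i in range(1, 41)}
top_links = keep_top_links(toy_links)

toy_physical = {"A": 2.0, "B": 4.0, "C": 6.0}
toy_coexpression = {"A": 1.0, "B": 3.0}  # gene C is missing here
combined = combine_connectivity(toy_physical, toy_coexpression)
```

In the toy data, gene C lacks a co-expression IVI, so it receives that network's median (2.0) before multiplication.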

The classification of genes into severe and non-severe perturbation groups, based on their effects on expression profiles and network connectivity, was determined to match the proportion of genes exhibiting severe or non-severe effects on gene essentiality in cells, mice, and humans [Table S1A]. Genes exhibiting severe perturbation effects in at least one of the contexts—gene essentiality, expression profile, or network centrality—were finally classified as having severe perturbation effects. Conversely, genes showing non-severe effects across all three contexts were finally classified as having non-severe perturbation effects [Table S1B]. This comparison was restricted to human-mouse orthologous genes to ensure cross-species consistency.

To estimate the perturbation effect of a drug in cells, mice, and humans, we utilised drug-target gene interaction data for humans and mice from the STITCH 5 database (combined interaction score ≥400). Although our chosen cutoff for drug–target interactions may be relatively low, we deliberately included weak associations to avoid excluding potential off-targets or selective targets that may carry safety-relevant signals. Supporting this rationale, when progressively larger fractions of drug–target gene interactions (10–90%) were excluded from model training, the decline in predictive performance became more pronounced, underscoring the importance of incorporating plausible interactions [Fig. S6]. The maximum perturbation score across all genes targeted by the drug was used as the representative drug perturbation effect in each feature. Consequently, each drug was assigned a total of nine perturbation scores (three biological contexts across cells, mice, and humans).
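The mapping from target genes to the nine drug-level features can be illustrated with a short sketch. The data structures, gene names, and scores are hypothetical; only the interaction cutoff (≥400) and the use of the maximum across targets follow the description above.

```python
MODELS = ("cell", "mouse", "human")
CONTEXTS = ("essentiality", "expression", "connectivity")

def filter_targets(interactions, min_score=400):
    """Keep target genes whose combined interaction score meets the cutoff."""
    return [gene for gene, score in interactions if score >= min_score]

def drug_features(targets, gene_scores):
    """Nine features per drug: the maximum target-gene score for each (model, context).

    gene_scores[model][context] maps gene -> perturbation score.
    A feature is None when no target has a score in that context.
    """
    features = {}
    for model in MODELS:
        for context in CONTEXTS:
            scores = gene_scores[model][context]
            hits = [scores[gene] for gene in targets if gene in scores]
            features[f"{model}_{context}"] = max(hits) if hits else None
    return features

# Hypothetical interactions and gene-level scores.
toy_interactions = [("GENE1", 950), ("GENE2", 420), ("GENE3", 150)]
toy_scores = {
    model: {context: {"GENE1": 0.2, "GENE2": 0.7} for context in CONTEXTS}
    for model in MODELS
}
toy_scores["human"]["essentiality"] = {"GENE1": 0.9, "GENE2": 0.1, "GENE3": 2.0}

toy_targets = filter_targets(toy_interactions)
toy_features = drug_features(toy_targets, toy_scores)
```

GENE3 falls below the interaction cutoff, so its high human essentiality score does not contribute to the drug-level feature.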

Correlation of the GPD with human drug toxicity

To assess the severity of the perturbation effect of risky and approved drugs, we performed a random sampling procedure for drug-target gene interactions. For each drug, we randomly selected a set of target genes equal in number to its actual targets. This process was repeated 1000 times to generate a distribution of median perturbation effects across all risky and approved drugs. We then compared the actual median perturbation effects of risky and approved drugs to this distribution. A drug group was classified as having a non-severe effect if its perturbation effect was lower than the random distribution, whereas it was classified as having a severe effect if its perturbation effect was higher. This is based on the majority of genes exhibiting non-severe (average 82%) rather than severe perturbation effects (average 18%) [Table S1A]. The statistical significance of these differences was assessed using a two-sided P-value.

This analysis was conducted across GPD features in three biological contexts, including gene essentiality, expression profile, and network connectivity, in cells, mice, and humans.
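The random sampling procedure can be sketched as below. This is a minimal illustration under several assumptions: each drug's score is taken as the maximum over its randomly drawn targets (following the drug-level definition above), and the two-sided P-value is computed empirically with a +1 correction. The text does not state how the P-value was derived, and the very small values reported suggest a different calculation may have been used. All data and function names are hypothetical.

```python
import random
from statistics import median

def random_median_distribution(target_counts, all_gene_scores, n_iter=1000, seed=0):
    """Null distribution of a drug group's median score under random target assignment.

    target_counts: number of targets of each drug in the group.
    all_gene_scores: list of perturbation scores for every candidate gene.
    """
    rng = random.Random(seed)
    null = []
    for _ in range(n_iter):
        drug_scores = [max(rng.sample(all_gene_scores, k)) for k in target_counts]
        null.append(median(drug_scores))
    return null

def two_sided_empirical_p(observed, null):
    """Two-sided empirical P-value with a +1 correction (an assumption of this sketch)."""
    lower = (sum(value <= observed for value in null) + 1) / (len(null) + 1)
    upper = (sum(value >= observed for value in null) + 1) / (len(null) + 1)
    return min(1.0, 2.0 * min(lower, upper))

def direction(observed, null):
    """'severe' if the observed median lies above the null median, else 'non-severe'."""
    return "severe" if observed > median(null) else "non-severe"

# Hypothetical gene scores and target counts for a group of 20 drugs.
toy_gene_scores = [i / 100 for i in range(100)]
toy_counts = [1, 2, 3, 2, 1] * 4
toy_null = random_median_distribution(toy_counts, toy_gene_scores, n_iter=200, seed=1)
toy_p = two_sided_empirical_p(0.99, toy_null)
```

With an empirical P-value, the smallest attainable value is bounded by the number of iterations, which is one reason a parametric approximation might be preferred in practice.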

GPD-based machine learning for human drug toxicity prediction

We developed a machine learning model to predict human drug toxicity using the GPD and chemical properties as input features. The GPD was considered by including perturbation effects across three biological contexts in cells, mice, and humans, resulting in a total of nine GPD features. To obtain chemical properties, we extracted data using scripts from Gayvert et al.20 (https://github.com/kgayvert/PrOCTOR). The chemical properties included components of drug-likeness rules, such as molecular weight and the number of hydrogen bond donors/acceptors, resulting in a total of 13 features. The model was trained and evaluated using leave-one-out cross-validation. We tested multiple classifiers and selected the random forest classifier (n_estimators = 1000), which demonstrated the best performance [Fig. S7 and Table S2A]. For comparison, we also implemented an XGBoost classifier (XGB classifier, n_estimators = 1000), a support vector classifier (SVC, kernel = ‘rbf’), logistic regression (solver = ‘liblinear’), and a stochastic gradient descent classifier (SGD classifier, loss = ‘modified_huber’). To assess model performance, seven metrics were considered (Area under the precision–recall curve, AUPRC; Area under the receiver operating characteristic curve, AUROC; Accuracy; Precision; Recall; Specificity; Matthews correlation coefficient, MCC). The random forest classifier performed best on five of these seven metrics, the exceptions being precision and specificity. For classification, we defined true positives (TP) as successfully identified risky drugs, true negatives (TN) as successfully identified approved drugs, false negatives (FN) as risky drugs falsely identified as approved, and false positives (FP) as approved drugs falsely identified as risky. A drug was classified as risky if its predicted toxicity probability was 0.4 or higher; otherwise, it was classified as an approved drug. The probability cutoff of 0.4 was chosen because it resulted in the highest combined performance of the model [Fig. S7], while performance remained robust with minimal sensitivity to variations in the cutoff values [Table S3].
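The thresholding and threshold-dependent metrics can be sketched as follows. The sketch assumes that predicted probabilities have already been produced (for example, by a classifier evaluated with leave-one-out cross-validation); the labels and probabilities are hypothetical, and AUPRC and AUROC are not covered because they do not depend on the cutoff.

```python
import math

def confusion_counts(y_true, y_prob, cutoff=0.4):
    """Count TP, TN, FP, FN, calling a drug risky (1) when its probability >= cutoff."""
    tp = tn = fp = fn = 0
    for truth, prob in zip(y_true, y_prob):
        predicted = 1 if prob >= cutoff else 0
        if truth == 1 and predicted == 1:
            tp += 1
        elif truth == 0 and predicted == 0:
            tn += 1
        elif truth == 0 and predicted == 1:
            fp += 1
        else:
            fn += 1
    return tp, tn, fp, fn

def summary_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall, specificity, and MCC from confusion counts."""
    def ratio(numerator, denominator):
        return numerator / denominator if denominator else 0.0

    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "mcc": ratio(tp * tn - fp * fn, mcc_denominator),
    }

# Hypothetical labels (1 = risky, 0 = approved) and predicted probabilities.
toy_truth = [1, 1, 1, 0, 0, 0, 0, 0]
toy_prob = [0.9, 0.45, 0.2, 0.1, 0.3, 0.39, 0.4, 0.05]
toy_counts = confusion_counts(toy_truth, toy_prob)
toy_metrics = summary_metrics(*toy_counts)
```

Note that a probability of exactly 0.4 is classified as risky, consistent with the "0.4 or higher" rule.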

To statistically compare the performance of machine learning models, we utilised DeLong's test and permutation test. For AUROC (AUC), we applied DeLong's test. For the other metrics (AUPRC, Accuracy, Precision, Recall, Specificity, and MCC), we performed statistical comparisons using a permutation test with bootstrap sampling with replacement. Specifically, for each pair of models, we generated 10,000 bootstrap resamples of the prediction values and evaluated the resulting performance differences. The observed performance difference was then compared against this empirical distribution to assess its statistical significance, with the P-value defined as the proportion of bootstrap differences greater than the observed difference.

Functional enrichment analysis

To investigate the biological functions associated with GPD-based gene sets, we conducted functional enrichment analysis using the g:Profiler40 (g:GOSt, accessed March 2025). Gene set enrichment was evaluated using the hypergeometric test, with multiple testing corrections performed via the Benjamini–Hochberg false discovery rate (FDR) method. The analysis was performed with the statistical domain scope set to “Custom over annotated genes”, restricting the background set to genes both annotated in g:Profiler and present in the GPD dataset. In the analysis, only function terms associated with 10 to 1000 genes were included.
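The two statistical components of this analysis, the hypergeometric test and the Benjamini–Hochberg adjustment, can be written out in a few lines. This sketch is independent of g:Profiler and does not reproduce its interface; function names and inputs are hypothetical.

```python
from math import comb

def hypergeom_sf(k, population, annotated, drawn):
    """P(X >= k) when `drawn` genes are sampled from `population`, of which `annotated` carry the term."""
    upper = min(annotated, drawn)
    total = comb(population, drawn)
    favourable = sum(
        comb(annotated, i) * comb(population - annotated, drawn - i)
        for i in range(max(k, 0), upper + 1)
    )
    return favourable / total

def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted P-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted

def term_in_range(term_size, smallest=10, largest=1000):
    """Keep only function terms associated with 10 to 1000 genes, as in the text."""
    return smallest <= term_size <= largest

# Toy example: all 5 query genes fall in a 5-gene term within a 10-gene background.
toy_p = hypergeom_sf(5, population=10, annotated=5, drawn=5)
toy_adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.5])
```

The toy enrichment has exactly one favourable outcome among the 252 possible draws, so its P-value is 1/252.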

Drug toxicity assessments using preclinical models and chemical properties

We obtained viability screening data for small molecules from the DepMap Portal and ToxValDB stored in CompTox,41 corresponding to cell lines and mouse models, respectively. The primary PRISM Repurposing dataset42 (19Q4) from the DepMap Portal provides chemical toxicity data, represented as cell line viability values expressed in log fold change relative to negative controls. For in vivo toxicity, we used mouse toxicology data from the ToxValDB (v9.6.0), considering only LC50/LD50 values from oral exposure routes. To represent the most severe toxicity effects per drug, cell-based toxicity was defined as the minimum log fold change observed across multiple cell lines, and mouse toxicity was defined as the lowest LC50 or LD50 value reported in mouse models. Since lower values of toxicity indices in preclinical models and chemical properties indicate higher toxicity, we transformed these values by subtracting them from 1 to align with the toxicity labelling of drugs. These toxicity metrics were then normalised to a range between 0 and 1 using the MinMaxScaler from the scikit-learn Python package. For independent analysis, we used NCI60 cell toxicity datasets, which provide multiple drug response metrics (GI50, IC50, LC50, and TGI) across diverse cell lines. Toxicity values were calculated using the same procedure described above.
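The transformation of preclinical readouts into a 0 to 1 toxicity scale can be sketched as below. The sketch follows the stated order of operations (most severe value per drug, subtraction from 1, min-max scaling) in plain Python rather than with scikit-learn; drug names and viability values are hypothetical.

```python
def most_severe(values):
    """Most severe readout per drug: the minimum log fold change (or lowest LC50/LD50)."""
    return min(values)

def to_toxicity_scale(raw_by_drug):
    """Flip values (1 - value) so that higher means more toxic, then min-max scale to [0, 1]."""
    flipped = {drug: 1.0 - value for drug, value in raw_by_drug.items()}
    low, high = min(flipped.values()), max(flipped.values())
    span = high - low
    return {
        drug: ((value - low) / span if span else 0.0)
        for drug, value in flipped.items()
    }

# Hypothetical log fold-change viability values across cell lines.
toy_viability = {
    "drug_X": [-0.5, -3.0, -1.0],
    "drug_Y": [-1.0, -0.2],
    "drug_Z": [0.0, 0.0],
}
toy_raw = {drug: most_severe(values) for drug, values in toy_viability.items()}
toy_scaled = to_toxicity_scale(toy_raw)
```

Because min-max scaling is invariant to a constant shift, the subtraction from 1 matters only for reversing the direction of the scale.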

To obtain chemical properties, we extracted data on Lipinski's Rule of Five (Ro5), Ghose, and Veber rule compliance, as well as a weighted Quantitative Estimate of Drug-likeness (wQED) for small molecules, using scripts from Gayvert et al.20

Benchmarking for evaluating the GPD model in predicting human drug toxicity

To benchmark our model, we conducted a performance comparison analysis against state-of-the-art human drug toxicity assessment models. We evaluated various conventional machine learning or deep learning models based on chemical structure information: eToxPred,43 MolToxPred,44 SPMM (Structure–Property Multi-Modal foundation model),45 two types of multitask-toxicity model (FP, FingerPrint; SE, SMILES embedding),46 and MolE (Molecular Embeddings).47 Additionally, we tested the PrOCTOR (Predicting the Odds of Clinical Trial Outcomes using Random forest),20 which incorporates both chemical information and human target gene data. For benchmarking, the chemical structure-based models were trained using SMILES strings. The human target gene information used in the PrOCTOR model was sourced from Gayvert et al. All models were assessed using leave-one-out cross-validation. All models were implemented with default hyperparameter settings provided by each original framework.

Model evaluation using independent datasets related to severe adverse events of drugs

Information on drug-adverse event relationships was obtained from the SIDER48 (version 4.1) and ADReCS49 (version 3.3) databases. In SIDER and ADReCS, only Preferred Terms (PTs) from the MedDRA ontology were considered when identifying adverse events. SAEs were determined based on severity rankings provided by Gottlieb et al.,50 which were generated through a crowdsourcing approach. The top 20 ranked adverse events were selected as SAEs. To assess the model's robustness on independent datasets, we trained the model using labels that distinguished risky drugs from approved drugs. For testing, we evaluated its performance using labels indicating whether a drug was associated with SAEs.

Chronological validation

For chronological validation, we constructed a subset of 639 drugs with accessible approval and withdrawal years. This dataset included 603 approved drugs and 36 risky drugs. Among the 36 risky drugs, 21 were withdrawn after 1991 (the threshold year with the best accuracy for chronological validation [Fig. S9]). To validate the model chronologically for each of these 21 drugs, we used the remaining 15 risky drugs and all 603 approved drugs as the training dataset. Due to the imbalance in drug class, we applied random sampling to match the number of approved drugs to the number of risky drugs, arranging a balanced training set. To ensure reliable prediction, this sampling process was repeated 1000 times and the model predicted toxicity probabilities 1000 times for each drug undergoing chronological validation. The median probability across these predictions was used as the representative value. If the median probability was 0.4 or higher, the drug was classified as risky; otherwise, it was classified as approved. A successful chronological validation was defined as a case in which the model, trained on drugs labelled according to their status up to 1991 (excluding the drug under validation), successfully predicted the withdrawal status of a drug that transitioned from approved to withdrawn after 1991.
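The repeated balanced sampling used for each held-out drug can be sketched as follows. The classifier is abstracted as a callable, and the toy stand-in model below (a one-feature nearest-mean rule) is purely hypothetical and is not the random forest used in the study; only the balancing, the 1000 repeats, the median, and the 0.4 cutoff follow the description above.

```python
import random
from statistics import mean, median

def chronological_call(train_risky, train_approved, candidate, fit_predict,
                       n_repeats=1000, cutoff=0.4, seed=0):
    """Median predicted probability for one held-out drug over balanced resamples.

    fit_predict(rows, labels, candidate) -> probability that the candidate is risky;
    it stands in for training a classifier and scoring the candidate.
    """
    rng = random.Random(seed)
    probabilities = []
    for _ in range(n_repeats):
        # Downsample approved drugs to match the number of risky drugs.
        approved_subset = rng.sample(train_approved, len(train_risky))
        rows = list(train_risky) + approved_subset
        labels = [1] * len(train_risky) + [0] * len(approved_subset)
        probabilities.append(fit_predict(rows, labels, candidate))
    mid = median(probabilities)
    return mid, ("risky" if mid >= cutoff else "approved")

def toy_model(rows, labels, candidate):
    """Hypothetical stand-in: relative closeness of the candidate to the risky-class mean."""
    risky_mean = mean(r for r, label in zip(rows, labels) if label == 1)
    approved_mean = mean(r for r, label in zip(rows, labels) if label == 0)
    d_risky = abs(candidate - risky_mean)
    d_approved = abs(candidate - approved_mean)
    if d_risky + d_approved == 0:
        return 0.5
    return d_approved / (d_risky + d_approved)

# Hypothetical one-feature drugs known before the threshold year.
toy_risky = [0.8, 0.9, 0.85]
toy_approved = [0.1, 0.2, 0.15, 0.3, 0.25, 0.05]
prob_high, call_high = chronological_call(toy_risky, toy_approved, 0.82, toy_model, n_repeats=100)
prob_low, call_low = chronological_call(toy_risky, toy_approved, 0.10, toy_model, n_repeats=100)
```

Using the median over resamples reduces the dependence of each call on any single draw of approved drugs.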

Statistics

Statistical analyses and P-value calculations were performed using two-sided tests throughout, including the t-test, Mann–Whitney U test, paired t-test, Wilcoxon signed-rank test, and Fisher's exact test, as well as the hypergeometric test. To adjust for multiple comparisons, the Benjamini–Hochberg procedure was applied to control the false discovery rate (FDR). Correlation analyses were conducted using Spearman's rank correlation. Normality of data distributions was assessed using the Shapiro–Wilk test. Based on the test results, parametric methods were applied when normality was satisfied, while non-parametric methods were used otherwise. All analyses were carried out in Python (version 3.7.9), utilising the SciPy package (version 1.6.0) and the statsmodels package (version 0.10.1).

Ethics

This study used only publicly available data and did not involve experiments on human participants or animals. Therefore, ethical approval and informed consent were not required.

Role of funders

The funders had no role in this study, including study design, data collection, data analysis, data interpretation, or manuscript writing.

Results

Overview of the GPD-based framework for assessing human drug toxicity

In translational research, which bridges preclinical and clinical studies, drug toxicity assessments based on chemical properties and preclinical models often translate poorly to clinical trials, leading to the failure of many candidates. To address this issue, we developed a framework based on the GPD between preclinical models and humans to enhance human drug toxicity assessment. Our approach aims to identify drugs whose toxicity translates poorly from preclinical models to humans and that therefore fail on safety grounds. To achieve this, we considered differences in three layers of drug target gene-based perturbation effects (gene essentiality, expression profile, and network connectivity) as GPD features [Fig. 1A]. For our analyses, we curated an unbiased and comprehensive dataset of human drug toxicity profiles from various sources to characterise risky drugs (n = 434; see “Methods”) [Table S4]. These include drugs that passed preclinical toxicity tests and entered the clinical stage but exhibited severe safety issues in humans, such as clinical trial failures, post-marketing withdrawals, or boxed warnings for life-threatening conditions. The toxicity profiles of these drugs were diverse, with cardiovascular, nervous system, and teratogenic toxicities being the most prevalent [Fig. 1B]. As a control group, we included a broad selection of approved drugs (n = 790) without reported SAEs in humans. To analyse the association between target-gene GPD and risky drugs, we used drug–target gene interaction information to estimate drug-induced perturbation effects in preclinical models (cell and mouse) and humans across the three GPD features (see “Methods”) [Fig. 1C and Tables S5–6]. Based on the correlation of GPD features with drug toxicity, we developed a target gene-centric, interpretable machine learning model that uses GPD features alongside chemical properties for drug toxicity prediction [Fig. 1D]. To validate the robustness of our model, we compared its performance with that of state-of-the-art models and tested it in predicting SAEs using an independent dataset. Furthermore, we conducted chronological validation to evaluate the model's ability to predict real-world outcomes from past observations in a time-ordered manner.

Fig. 1.


Overview of a machine learning framework for assessing human drug toxicity based on differences between preclinical models and humans. (A) Drug safety failure due to differences in drug-induced perturbation effects between preclinical models and humans. (B) Curation of drug toxicity profiles. (C) Estimating drug-induced perturbation effects across three biological contexts using target gene information. (D) Interpretable machine learning framework using GPD for assessing risky drugs.

GPD features improved chemical property-based human drug toxicity assessment

Given the high attrition rates of drug candidates and the limited predictive power of preclinical toxicity assessment for clinical trials [Fig. S10A–B and Fig. S11A–C], we examined the correlation between GPD and human drug toxicity (see “Methods”). GPD features were correlated with clinical trial failures and with post-marketing withdrawals or boxed warnings due to safety issues in humans. Specifically, risky drugs exhibited non-severe perturbation effects in preclinical models but severe effects in humans. The median gene essentialities of both approved and risky drugs in cells (two-tailed test P = 3.0 × 10⁻⁵² and P = 3.1 × 10⁻¹¹) and mice (P = 3.0 × 10⁻²⁰⁶ and P = 1.0 × 10⁻²⁰⁰) were significantly lower than the random distribution of drug targets in clinical trials [Fig. 2A]. In humans, the median gene essentiality of approved drugs was significantly lower than the random distribution (P = 2.5 × 10⁻¹⁹), whereas that of risky drugs was significantly higher (P = 4.8 × 10⁻³) [Fig. 2B]. A similar trend was seen for expression profiles (tissue broadness) and network connectivity (centrality). In cells and mice, the median tissue broadness and centrality of both drug types were significantly lower than, or not different from, the random distribution [Fig. 2A]. In contrast, in humans, only risky drugs displayed significantly higher median tissue broadness and centrality than the random distribution [Fig. 2B]. Additionally, target gene-level analyses revealed that genes primarily targeted by risky drugs were significantly enriched in GPD-associated genes showing non-severe perturbation effects in preclinical models, including cells and mice, but severe effects in humans [Fig. S12A–B].
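The comparison of an observed median against a random distribution can be sketched as a resampling test. The following is a minimal illustration under stated assumptions, not the procedure used in this study: the function names are hypothetical, random groups are drawn from a per-drug background score list, and the two-tailed P is obtained from a normal approximation to the 1000 resampled medians (the exact test is described in “Methods”).

```python
import math
import random
import statistics

def null_medians(background_scores, group_size, n_shuffles=1000, seed=0):
    """Medians of randomly drawn drug groups with the same size as the observed group."""
    rng = random.Random(seed)
    return [
        statistics.median(rng.sample(background_scores, group_size))
        for _ in range(n_shuffles)
    ]

def two_tailed_p(observed_median, null):
    """Z-score and two-tailed P, assuming the null medians are roughly normal."""
    mu = statistics.fmean(null)
    sigma = statistics.stdev(null)
    z = (observed_median - mu) / sigma
    # P(|Z| >= |z|) for a standard normal variable
    return z, math.erfc(abs(z) / math.sqrt(2))

# Toy usage: hypothetical essentiality scores for 500 background drugs.
toy_rng = random.Random(42)
background = [toy_rng.random() for _ in range(500)]
observed_group = background[:50]
null = null_medians(background, group_size=len(observed_group))
z, p = two_tailed_p(statistics.median(observed_group), null)
```

A negative z would indicate that the group's median lies below the random expectation, as reported for both drug groups in cells and mice.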

Fig. 2.


Prediction performance of the model based on the correlation between GPD and human drug toxicity. (A, B) Gene essentiality, tissue broadness, and centrality were mapped onto the risky and approved drugs. The numbers of risky and approved drugs are shown. The actual median gene essentialities (for cell, mouse, and human) of risky and approved drugs are depicted with red and blue dashed lines, respectively. The histograms show the random distributions of median gene essentiality of drugs (random shuffling 1000 times). Statistical significance (P) was analysed using a two-tailed test. The same description applies to tissue broadness and centrality. (C, D) Comparison of model performance in predicting risky drugs. Precision-recall curve (PRC) and receiver operating characteristic (ROC) curve evaluate model performance, while the area under the PRC (AUPRC) and area under the ROC curve (AUROC) are depicted as bar plots. The grey dashed lines indicate the baseline. (E) Toxicity probability distributions of drugs predicted by the GPD + Chemical and Chemical models. Red dots indicate risky drugs, while blue dots indicate approved drugs. The region where risky drugs are exclusively identified by the GPD + Chemical model is shown in orange. Arrows indicate the drugs listed in Table 1. The statistical significance of the difference between the two distributions was assessed using the two-sided Mann–Whitney U test. Normality of the distribution was evaluated with the Shapiro–Wilk test. (F) Sankey diagram illustrating the identification of risky drugs detected by GPD features. The diagram shows the number of risky drugs successfully identified by the Chemical and GPD + Chemical models, the number of risky drugs falsely identified as approved drugs, and the changes in assessment outcomes across models.

Motivated by the finding that chemical properties alone have limitations in capturing the differences in drug perturbation effects between preclinical models and humans—a key factor in drug safety failures—we developed a machine learning classifier to test whether GPD features could enhance human drug toxicity assessment based on chemical properties (see “Methods”). We discovered that GPD features improve toxicity assessment by complementing chemical properties. The GPD model demonstrated improved predictive performance, with an AUPRC of 0.574 (baseline: 0.355) and an AUROC of 0.710 (baseline: 0.500), outperforming approaches that rely solely on preclinical models or human data [Fig. S13A–B]. Furthermore, integrating GPD features with chemical properties (GPD + Chemical) further boosted performance, achieving an AUPRC of 0.629 and an AUROC of 0.752, compared to using chemical properties alone (Chemical; AUPRC: 0.500, AUROC: 0.632) [Fig. 2C and D and Table S2B]. This trend was consistently observed across various performance metrics, including accuracy, precision, recall, specificity, and Matthews correlation coefficient (MCC) [Fig. S14]. While the feature importance analysis indicated that both GPD features and chemical properties contributed to the performance of toxicity assessment, GPD features had a significantly greater impact than chemical properties [Fig. S15A–B].
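For orientation, the reported baselines are consistent with the class balance of the dataset: a classifier with no skill has an expected AUPRC equal to the fraction of positives (434 risky drugs out of 1224) and an AUROC of 0.5. The sketch below illustrates this arithmetic together with a rank-based AUROC; the function names are hypothetical and the code is not taken from the authors' script.

```python
def auprc_baseline(n_positive, n_negative):
    """Expected AUPRC of a no-skill classifier: the prevalence of the positive class."""
    return n_positive / (n_positive + n_negative)

def auroc(scores_pos, scores_neg):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

print(round(auprc_baseline(434, 790), 3))  # 0.355
```

The pairwise AUROC is quadratic in the number of drugs, which is acceptable for a dataset of this size; a sort-based implementation would be preferable for much larger sets.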

We found that GPD has the potential to reduce drug attrition rates by identifying risky drugs that chemical properties alone could not detect. The improved recall indicates a reduction in false negatives, meaning that fewer drugs that eventually prove risky are misclassified as safe. We observed a more pronounced difference in toxicity probabilities between risky and approved drugs with the GPD + Chemical model (Mann–Whitney U test P = 3.39 × 10⁻⁴⁸) than with the Chemical model (P = 1.77 × 10⁻¹⁵), resulting in higher recall [Fig. 2E]. Consequently, among the 225 risky drugs assigned low toxicity probabilities based on chemical properties, a substantial number (GPD-detected risky drugs, n = 118), such as indomethacin, venlafaxine, and bupivacaine, were assigned higher toxicity probabilities and successfully identified once GPD was also considered [Fig. 2F and Fig. S16].

Indomethacin, a nonsteroidal anti-inflammatory drug (NSAID), was flagged as risky based on GPD features, yielding toxicity probabilities of 0.555 with the GPD + Chemical model and 0.342 with the Chemical model [Fig. 2E]. This drug has been linked to an increased risk of serious cardiovascular and neurological events, including myocardial infarction and stroke.51, 52, 53 The GPD + Chemical model likely predicted indomethacin as toxic because of its interaction with target genes such as nitric oxide synthase 3 (NOS3) and peroxisome proliferator-activated receptor delta (PPARD), which are essential and highly central in humans but non-essential and of low centrality in preclinical cell and mouse models [Table 1]. Previous studies implicating perturbed NOS3 and PPARD in myocardial infarction54, 55, 56 and stroke57,58 support this alignment, indicating that the framework successfully reproduces known phenotypic associations.

Table 1.

Examples of GPD-detected risky drugs.

Drug | Toxicity class | Severe adverse event (in boxed warning) | GPD-associated target gene | GPD feature | Reference (PMID)
--- | --- | --- | --- | --- | ---
Indomethacin | Cardiovascular & Neurotoxicity | Myocardial infarction and stroke | NOS3, PPARD | GE, NC | 33532862, 34594039, 34737426, 21205987, 36180795
Testosterone enanthate | Cardiovascular toxicity | Blood pressure increase | AKT1 | GE | 21842346
Flumazenil | Neurotoxicity | Seizure | GABRA1, 2, 3 | GE | 38024579, 29961870, 28053010
Ketorolac | Cardiovascular & Neurotoxicity | Myocardial infarction and stroke | NOS3 | NC | 36180795
Bupivacaine | Cardiovascular toxicity | Cardiac arrest | KCNA10 | NC | 36965351
Cocaine | Psychiatric toxicity | Drug abuse and dependence | GABRA2 | GE | 16622805
Venlafaxine | Psychiatric toxicity | Suicidality (in children, adolescents) | SLC6A3 | GE | 34199792
Quetiapine | Psychiatric toxicity | Suicidality (in children, adolescents) | SLC6A3 | GE | 34199792
Clomipramine | Psychiatric toxicity | Suicidality (in children, adolescents) | HTR2B, SLC6A3 | NC, GE | 21179162, 34199792
Mirtazapine | Psychiatric toxicity | Suicidality (in children, adolescents) | HTR2B, SLC6A3 | NC, GE | 21179162, 34199792
Fluvoxamine | Psychiatric toxicity | Suicidality (in children, adolescents) | SLC6A3 | GE | 34199792

Examples of risky drugs identified by GPD features but not by chemical properties. The table includes drug names, associated severe adverse events in boxed warnings, corresponding toxicity classes, and target genes linked to the adverse events with supporting references. Listed target genes are GPD-associated and shown with the relevant GPD feature. These targets exhibit non-severe perturbation effects in preclinical models (cells and mice) but severe perturbation effects in humans.

GE, Gene essentiality; NC, Network connectivity.

Venlafaxine, a serotonin-norepinephrine reuptake inhibitor (SNRI) used as an antidepressant, was also identified as a risky drug by the GPD features, with predicted toxicity probabilities of 0.664 using the GPD + Chemical model and 0.287 using the Chemical model [Fig. 2E]. Venlafaxine has been associated with an increased risk of suicidality in children and adolescents.59,60 The toxicity predicted for venlafaxine by the GPD + Chemical model was likely influenced by its interaction with the target gene solute carrier family 6 member 3 (SLC6A3), which is essential in humans but non-essential in preclinical models [Table 1]. Prior research connecting perturbed SLC6A3 to suicidal behaviour61 is consistent with this prediction, again demonstrating the model's ability to reproduce clinically relevant associations.

Bupivacaine, a local anaesthetic, was similarly classified as a risky drug by the GPD features, with toxicity probabilities of 0.581 using the GPD + Chemical model and 0.307 using the Chemical model [Fig. 2E]. Clinically, bupivacaine has been associated with cardiac arrest.62 The GPD + Chemical model likely attributed its predicted toxicity to interaction with the potassium voltage-gated channel subfamily A member 10 (KCNA10), a gene with high centrality in humans but low centrality in preclinical models [Table 1]. Consistent with clinical evidence linking KCNA10 loss-of-function mutations to sudden cardiac death,63 the framework again replicates known associations.

GPD features can identify drugs with major adverse events leading to drug development failure

We found that GPD features enhance assessments of human drug toxicity by identifying major classes of adverse events that lead to drug safety failures. In particular, GPD features demonstrated a strong ability to detect drugs associated with neurotoxicity, cardiovascular toxicity, and gastrointestinal toxicity, which are major causes of safety failures in clinical trials, along with haematological and psychiatric toxicity, the primary reasons for post-marketing withdrawals.2,22 The GPD + Chemical model assigned a higher risk to drugs with safety failures due to these toxicities than the model relying solely on chemical properties. For drugs linked to neuro- and psychiatric toxicities affecting the nervous system, the GPD + Chemical model significantly increased predicted toxicity probabilities (Wilcoxon signed-rank test P = 5.11 × 10⁻³; P = 1.36 × 10⁻³, respectively) [Fig. 3A]. A similar trend was observed for drugs associated with cardio- and vascular toxicities, where the GPD + Chemical model predicted significantly higher toxicity probabilities than the Chemical model (Wilcoxon signed-rank test P = 9.77 × 10⁻⁷; paired t-test P = 2.40 × 10⁻³, respectively). This pattern extended to drugs linked to haematological (paired t-test P = 3.03 × 10⁻⁷), gastrointestinal (paired t-test P = 1.15 × 10⁻²), and respiratory toxicity (paired t-test P = 8.27 × 10⁻³). For these classes of toxicity, the number of GPD-detected risky drugs (false negatives reclassified as true positives) exceeded the number of GPD-missed drugs (true positives reclassified as false negatives). The ratios of GPD-detected to GPD-missed risky drugs were 14:1 for neurotoxicity, 19:9 for psychiatric toxicity, 25:3 for cardiotoxicity, and 15:6 for vascular toxicity [Fig. 3A]. Similarly, for haematological, gastrointestinal, and respiratory toxicity, the ratios were 11:1, 12:3, and 11:3, respectively. However, for hepatotoxicity, carcinogenicity, and teratogenicity, the addition of GPD did not significantly improve the detection of risky drugs.
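The counting of GPD-detected and GPD-missed drugs can be illustrated with a short sketch. The probabilities below are the values reported in the text for indomethacin, venlafaxine, and bupivacaine; the decision threshold of 0.5 and the function name are assumptions made for illustration only, since the cut-off used in the study is defined in “Methods”.

```python
def reclassification_counts(chem_probs, gpd_probs, threshold=0.5):
    """List risky drugs whose call changes between the two models.

    threshold is a hypothetical decision cut-off chosen for illustration.
    """
    detected, missed = [], []
    for drug, p_chem in chem_probs.items():
        p_gpd = gpd_probs[drug]
        if p_chem < threshold <= p_gpd:
            detected.append(drug)  # false negative reclassified as true positive
        elif p_gpd < threshold <= p_chem:
            missed.append(drug)    # true positive reclassified as false negative
    return detected, missed

# Toxicity probabilities reported in the text (Chemical vs. GPD + Chemical).
chemical = {"indomethacin": 0.342, "venlafaxine": 0.287, "bupivacaine": 0.307}
gpd_chemical = {"indomethacin": 0.555, "venlafaxine": 0.664, "bupivacaine": 0.581}

detected, missed = reclassification_counts(chemical, gpd_chemical)
print(len(detected), len(missed))  # 3 0
```

Applied per toxicity class, the two list lengths would give a detected-to-missed ratio of the kind shown below the bars in Fig. 3A.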

Fig. 3.


Comparing toxicity assessment of GPD and Chemical models across toxicity classes. (A) Box plots showing the distributions of toxicity probability predicted by the GPD + Chemical and Chemical models for risky drugs, categorised by toxicity class. Two-sided paired t-test or Wilcoxon signed-rank test was used to assess the statistical significance of differences between the two models. Normality of the distribution was evaluated with the Shapiro–Wilk test. Bar plots illustrate the numbers of drugs successfully (GPD-detected) and falsely identified (GPD-missed) by the GPD + Chemical and Chemical model for each toxicity class. The ratio of GPD-detected/GPD-missed drug counts is displayed below the bars. Toxicity classes with more than 20 associated risky drugs are shown. (B) Functional enrichment analysis of GPD-associated gene sets in terms of gene essentiality, expression profiles, and network connectivity. For each gene set, the top 10 Gene ontology (GO) biological process terms associated with neurotoxicity, psychiatric toxicity, cardiovascular toxicity, haematological toxicity, and genitourinary toxicity are shown. Enrichment was analysed using the hypergeometric test, with P-values adjusted for multiple testing using the Benjamini–Hochberg false discovery rate (FDR) correction. Bar plots represent the −log10 of FDR.

We found that the toxicity classes for which GPD features demonstrate better assessment than chemical properties can be attributed to the critical biological roles of genes associated with those classes. For instance, in terms of gene essentiality, genes with non-severe perturbation effects in cells but severe effects in humans were significantly enriched in functions related to the nervous system, implicating neurotoxicity [Fig. 3B and Table S7]. Similarly, genes with non-severe effects in mice but severe effects in humans were significantly associated with synaptic signalling, which is relevant to psychiatric toxicity. Regarding gene expression profiles, genes that are tissue-specific in mice but broadly expressed in humans were enriched for functions involved in angiogenesis and haematopoietic stem cell differentiation, corresponding to cardiovascular and haematological toxicity. Regarding network connectivity, genes with low centrality in cells but high centrality in humans were significantly associated with reproductive functions, indicating a relationship to genitourinary toxicity. These findings suggest that drugs targeting genes with GPD characteristics are more likely to fail due to these toxicity mechanisms.
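The enrichment statistics behind this analysis (a hypergeometric test followed by Benjamini–Hochberg correction, as stated in the legend of Fig. 3B) can be sketched in plain Python. This is a generic illustration with hypothetical function names and toy numbers, not the authors' pipeline.

```python
from math import comb

def hypergeom_sf(k, M, n, N):
    """P(X >= k) when N genes are drawn from M, of which n carry the GO term."""
    upper = min(n, N)
    total = comb(M, N)
    return sum(comb(n, i) * comb(M - n, N - i) for i in range(k, upper + 1)) / total

def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# Toy usage: 3 hypothetical GO terms tested against a 200-gene set
# drawn from a 10,000-gene background. Each tuple is (overlap, term size).
tests = [(12, 150), (5, 300), (2, 80)]
raw_p = [hypergeom_sf(k, 10_000, n, 200) for k, n in tests]
fdr = benjamini_hochberg(raw_p)
```

The bar heights in an enrichment plot of this kind would then correspond to the negative base-10 logarithm of each adjusted value.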

Validating the robustness of the GPD-based model for practical use in drug development

To thoroughly assess our model, we conducted a comparative analysis against state-of-the-art models used in human drug toxicity prediction. Our benchmarking process included models based solely on chemical properties and those that incorporated both target gene information and chemical properties (see “Methods”). Notably, our model (AUPRC of GPD + Chemical: 0.629) consistently outperformed various state-of-the-art models (AUPRC of eToxPred,43 MolToxPred,44 SPMM,45 multitask-toxicity46 [FP], MolE,47 and multitask-toxicity [SE]: 0.596, 0.555, 0.546, 0.471, 0.454, and 0.445, respectively) [Fig. 4A, Fig. S17A, and Table S2B]. In particular, the GPD + Chemical model outperformed PrOCTOR (AUPRC: 0.601), which combines the chemical properties of drugs with on-target gene information from human tissues only, without considering differences from preclinical models [Fig. 4B, Fig. S17B, and Table S2B]. Because chemical pattern information can provide a more accurate description of drug biology, we further enhanced drug toxicity prediction by integrating such information from existing models, namely chemical fingerprints and SMILES embeddings. Specifically, we utilised fingerprints from eToxPred and SMILES embeddings from SPMM, which represent the best-performing approaches among existing models. Incorporating the eToxPred fingerprint substantially improved our model's performance, increasing the AUPRC from 0.629 to 0.686 [Fig. S18]. In contrast, the addition of the SPMM SMILES embedding yielded a modest gain, raising the AUPRC to 0.641.

Fig. 4.


Validating robustness of GPD-based model in assessing human drug toxicity in practice. (A, B) Comparison of risky drug prediction performance between the GPD + Chemical model and several state-of-the-art models. (C) The performance of predicting drugs with severe adverse events using independent drug-side effect datasets. The grey dashed line represents the baseline. (D) Chronological validation conducted to assess the model's ability to predict drug withdrawals by training on past data and testing on future unseen data. (E) The model's predictions for drugs withdrawn after 1991 are based on training data up to 1991. The year the drug was withdrawn is shown in parentheses. The predicted toxicity probability distributions for withdrawn drugs are derived from models trained on 1000 randomly sampled training datasets. Dots represent the median values of the distribution, with red indicating a median probability above 0.4 and grey indicating a lower probability. The black line spans the interquartile range (25th–75th percentile).

Given the model's strong performance in predicting drug toxicity, we further examined its robustness across various stages of drug development. To this end, we classified risky drugs into two groups: those that failed in clinical trials and those with post-marketing safety issues. Our model performed well for both groups [Fig. S19A]. Interestingly, while both groups showed significantly higher toxicity probabilities than approved drugs (Mann–Whitney U test P = 3.25 × 10⁻⁴³; P = 3.27 × 10⁻²⁵, respectively) [Fig. S19B], drugs that failed in clinical trials had even higher probabilities than those with post-marketing safety issues (P = 1.02 × 10⁻⁴). This result suggests that the model assigns a greater risk to drugs discontinued in the early phases of development.

To further validate the model's robustness, we assessed its ability to identify drugs linked to SAEs using independent datasets. We utilised drug-adverse event data from the SIDER48 and ADReCS49 databases. Our model achieved AUPRCs of 0.628 and 0.727 in the SIDER (baseline AUPRC: 0.382) and ADReCS (baseline AUPRC: 0.505) datasets, respectively [Fig. 4C and Fig. S20A]. Furthermore, our model consistently outperformed existing models across both datasets [Fig. S21A–H and Table S2C]. Accordingly, drugs associated with SAEs exhibited significantly higher toxicity probabilities than those without [Fig. S20B]. To determine whether the model specialises in detecting SAEs, we evaluated its performance in identifying drugs associated with mild adverse events (MAEs). As expected, the model performed worse for MAEs than for SAEs, indicating its focus on detecting severe toxicities [Fig. S20C].

Lastly, we conducted a chronological validation analysis to further evaluate our model's practical ability to assess human drug toxicity in real-world scenarios. We found that our model is capable of predicting unseen future drug withdrawals based on past drug information. Specifically, the model was trained on historical data of drug withdrawals or approvals up to 1991, and its predictions were then validated on drug withdrawals from subsequent periods (see “Methods”) [Fig. 4D]. The model successfully identified 20 out of 21 drugs withdrawn from the market after 1991 (accuracy: 0.95) [Fig. 4E]. For instance, rosiglitazone, which was approved for diabetes treatment in 1999 but withdrawn in Europe in 2011 due to severe cardiovascular toxicity, was recognised as a high-risk drug by the model based only on patterns of drug withdrawals up to 1991, two decades before its actual withdrawal. Furthermore, we confirmed the robustness of the chronological validation. When the threshold year separating past data from unseen future data was varied from 1987 to 2005, the model achieved an average accuracy of 0.87, indicating stable performance across different temporal splits [Fig. S9]. These results suggest that our model has the potential to serve as an early warning system for drug safety risks, enabling proactive identification of drugs that may later be found to have SAEs.
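The time-ordered split can be sketched as follows. This is a minimal illustration under stated assumptions: the record layout, function names, and toy values are hypothetical, and a drug is counted as identified when the median of its predicted probabilities exceeds 0.4, the cut-off used for colouring in Fig. 4E, which may not be the exact criterion applied in the study.

```python
from statistics import median

def chronological_split(records, cutoff_year):
    """Train on events up to and including cutoff_year, test on later events."""
    train = [r for r in records if r["year"] <= cutoff_year]
    test = [r for r in records if r["year"] > cutoff_year]
    return train, test

def withdrawal_hit_rate(prob_samples_by_drug, threshold=0.4):
    """Fraction of withdrawn drugs whose median predicted probability exceeds threshold."""
    flagged = sorted(
        drug for drug, probs in prob_samples_by_drug.items()
        if median(probs) > threshold
    )
    return len(flagged) / len(prob_samples_by_drug), flagged

# Toy records; only the rosiglitazone withdrawal year (2011) comes from the text.
records = [
    {"drug": "drug_a", "year": 1985},
    {"drug": "drug_b", "year": 1991},
    {"drug": "rosiglitazone", "year": 2011},
]
train, test = chronological_split(records, cutoff_year=1991)

# The reported accuracy follows directly from the counts in the text.
print(round(20 / 21, 2))  # 0.95
```

Repeating the split for each cutoff year between 1987 and 2005 and averaging the resulting rates would mirror the sensitivity analysis described above.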

Discussion

In this study, we proposed a machine learning framework based on inter-species and inter-organism GPD to improve human drug toxicity assessment. The implications are significant: incorporating GPD into drug development pipelines can serve as an early-warning mechanism, enabling developers to deprioritise high-risk candidates before costly clinical trials. This ultimately has the potential to reduce development costs, prevent patient harm, and increase the success rate of new drug approvals. Our central hypothesis was that the translational gap between preclinical models and humans, one of the leading causes of clinical trial failures and market withdrawals, stems from species (or organism)-specific differences in how genetic perturbations translate into phenotypic outcomes [Fig. 1A]. Indeed, Caldu-Primo et al.64 reported that genes exhibiting perturbation differences between preclinical models and humans were associated with abnormal human traits. While existing toxicity models primarily focus on chemical properties or static target gene features in humans, our model uniquely incorporates comparative biological information across preclinical models and humans. Through a comprehensive analysis of 1224 drugs [Fig. 1B], we demonstrated that the GPD-based model significantly improved toxicity assessment [Figs. 1C and 2C–D] and outperformed the state-of-the-art models [Fig. 4A and B]. Furthermore, our model was chronologically validated by evaluating its ability to predict the toxicity of future, previously unseen drugs based on historical data in real-world settings [Fig. 4D and E].

We attribute the high performance of our model to its biological relevance and context-awareness. Drug toxicity in humans often stems from complex disturbances of critical biological pathways that are not accurately replicated in preclinical models, such as cell culture systems or mice. By explicitly modelling the differences in perturbation effects (measured through essentiality, expression specificity, and network centrality), our framework captures the biological divergence between humans and model organisms. This design allows the model to detect human-specific toxicological outcomes that preclinical models and chemical descriptors frequently overlook. One of the most significant contributions of our GPD-based model is its ability to identify high-risk drugs that would otherwise be misclassified as safe based solely on chemical properties. Out of 225 risky drugs that received low toxicity probabilities based on chemical features, 118 were successfully identified as high-risk when GPD features were included [Fig. 2E and F]. Notably, our results indicated that the inclusion of GPD features facilitated the identification of adverse event classes such as cardiovascular, neurological, and psychiatric toxicities, which are among the most common causes of clinical and post-market drug failures [Fig. 3A]. Indomethacin, for example, was associated with both cardiovascular and neurotoxic effects, venlafaxine with psychiatric toxicity, and bupivacaine with cardiac toxicity [Table 1]. These associations align with previous studies linking such toxicities to target genes that exhibit the GPD. This result was further supported by the biological functions of GPD-associated genes, which were predominantly involved in the nervous and cardiovascular systems [Fig. 3B]. Additionally, previous reports of genes exhibiting differential perturbation effects18,19,65 and the known associations of those genes with the development of the nervous and cardiovascular systems lend support to our findings.16,17,64

In drug toxicity assessment, incorporating the biological characteristics of target genes is crucial, as it provides insight into how drugs disrupt biological systems at the molecular level. Traditional methods have primarily focused on the chemical and pharmacokinetic properties of drugs; however, emerging evidence highlights the complementary value of gene-centric evaluations. Recent studies have demonstrated significant associations between drug-induced side effects and gene-level features, such as genetic perturbation effects,16 tissue specificity,13 and network centrality.10,15 Our study builds upon a foundational body of work that systematically identified and characterised these biological features. Several key resources enabled our analysis. Drug-target gene associations from the STITCH database26 formed the basis for assessing perturbation effects. Gene essentiality data, which are critical for estimating the impact of gene perturbation, were derived from large-scale functional genomics initiatives, including Project Score,11 OGEE30 (MGI), and gnomAD.32 Tissue-specific gene expression profiles, obtained from the human and mouse ENCODE projects,33 allowed us to model inter-species (and inter-organism) differences in gene activity. Furthermore, protein–protein interaction (PPI) data from databases such as STRING36 provided a framework to evaluate the topological importance of target genes within biological networks. Collectively, these resources established the necessary foundation for our comparative GPD analysis, enhancing both its feasibility and biological relevance.

Despite the model's strengths, our approach has limitations. The performance of our model depends on the completeness and accuracy of drug-target gene relationship information. Small-molecule drugs have been reported to interact, on average, with 6–11 distinct targets beyond their primary pharmacological target.66,67 However, our further investigation revealed that “GPD-missed” drugs (risky drugs that were not detected even after incorporating GPD features) often had an insufficient number of known intended target (or off-target) genes [Fig. S22A–B], limiting our model's ability to evaluate those drugs meaningfully. This highlights a broader challenge: polypharmacology, while offering therapeutic benefits for complex diseases, also raises safety concerns due to unforeseen adverse events from multi-target interactions, making accurate toxicity prediction difficult. A major limitation remains the large number of unknown drug-target interactions, which complicates efforts to fully characterise the safety profile of multi-target drugs.68 Indeed, we found that incomplete knowledge of drug-target interactions significantly impairs the performance of drug toxicity predictions by limiting available information on target genes [Fig. S6]. Encouragingly, several AI-driven efforts are currently underway to predict or expand drug-target interactions.69,70 As drug-target interactions become clearer, integrating systematically identified polypharmacological targets into toxicity models, including our GPD-based framework, will be essential, potentially improving prediction accuracy and enabling more robust frameworks.

Although anticancer drugs were excluded from this study due to their distinct toxicity tolerance profiles, predicting their SAEs remains especially critical in polypharmaceutical regimens, where overlapping toxicities and complex drug-target interactions may amplify cancer patient risk. Because GPD between preclinical models and humans often obscures these liabilities,71 there is a pressing need for oncology-specific predictive frameworks. Recent advances in machine learning suggest that incorporating human-specific genomic and clinical features alongside therapeutic representations can effectively capture known toxicity associations and enhance prediction accuracy.72, 73, 74 Focusing on oncology-specific datasets and mechanistically relevant features, while considering GPD, will therefore be essential for anticipating severe toxicities in combination therapies, ultimately supporting the development of more personalised and safer treatment strategies in oncology.

GPD provides meaningful information regarding human drug toxicity caused by the perturbation effects of drug treatment, but it does not necessarily imply that such perturbations will always result in SAEs or clinical trial failure. This study reveals a significant correlation between GPD and human drug toxicity [Fig. 2A and B]. It is crucial to note, however, that this association is correlational rather than causal. From this perspective, our framework is not intended to uncover new biology and mechanistic findings, but rather to validate its capacity to reproduce established clinically relevant phenotypic associations. Nonetheless, our findings indicate that drugs associated with GPD are more likely to exhibit toxic effects or fail in clinical development. Therefore, while GPD is not a definitive predictor of toxicity, it serves as a valuable early-warning indicator that can inform toxicity assessment during drug discovery. This highlights a probabilistic approach to evaluating toxicity risk, wherein drugs targeting GPD-associated genes should undergo more rigorous safety evaluations when they demonstrate strong therapeutic potential.

To ensure that our model is readily useable by drug development professionals, we have provided a user-friendly and customisable script that allows for toxicity prediction based on the given target genes and a drug's chemical structure. This script includes clear documentation and examples, allowing researchers and industry professionals to evaluate drug candidates early in the development pipeline. We believe that providing such practical tools is essential for bridging the gap between cutting-edge academic research and real-world application in pharmaceutical development.

In summary, we created a machine learning model to predict which drugs may cause harmful adverse events in humans, either during clinical trials or post-marketing surveillance, by analysing how drugs impact biological systems differently in preclinical models compared to humans. The model identifies human genetic features that differ from those in preclinical models, such as gene essentiality, expression profiles, and network connectivity, to identify high-risk drugs, particularly those affecting the heart, nervous system, or mental health. Validated through cross-validation and prospective predictions with historical data, the model outperformed previous methods. A user-friendly tool is provided to facilitate practical use in drug development.

Contributors

MP, WS, and SK conceived and designed the experiments. MP, WS, HA, and SK devised the methodologies. MP and WS curated the data. MP, WS, and HA analysed the data. MP, WS, HA, and SK interpreted the data. MP and WS wrote the paper. SK supervised the study. MP, WS, HA, and SK verified the underlying data. All authors read and approved the final version of the manuscript.

Data sharing statement

All datasets used in this study are publicly available. Information on drugs that failed in clinical trials, classified as risky drugs, was obtained from the script provided by Gayvert et al.20 (https://github.com/kgayvert/PrOCTOR) and the ClinTox dataset in the MoleculeNet database21 (https://moleculenet.org/). Data on drugs withdrawn from markets due to safety issues were sourced from Table S1 of Onakpoya et al.,22 while the list of drugs with boxed warnings was retrieved from the ChEMBL database23,24 (version 32, https://www.ebi.ac.uk/chembl/). The list of approved drugs was also obtained from the ChEMBL database. Drug-target gene interaction data for human and mouse were obtained from the STITCH database26 (version 5.0, http://stitch.embl.de/). Gene essentiality information for cells, mice, and humans was collected from Project Score11 (release 1, https://score.depmap.sanger.ac.uk/), OGEE30 (version 3, https://v3.ogee.info/), and gnomAD32 (version 2.1.1, https://gnomad.broadinstitute.org/), respectively. Gene expression profiles for cell lines, mouse tissues, and human tissues were acquired from the ENCODE database33 (https://www.encodeproject.org/). Data on physical protein–protein interactions for human and mouse were obtained from the STRING database36 (version 12.0, https://string-db.org/). Drug screening results for preclinical models, including cell and mouse models, were sourced from the PRISM Repurposing database42 (19Q4), available through the DepMap portal (https://depmap.org/portal/), and from ToxValDB (version 9.6.0), stored in the CompTox database41 (https://comptox.epa.gov/dashboard/). Information on drug-related adverse events was retrieved from the ADReCS49 (version 3.3, https://bioinf.xmu.edu.cn/ADReCS/) and SIDER48 (version 4.1, http://sideeffects.embl.de/) databases.

The source code for reproducing the results was developed in Python 3.7.12 and R 4.2.2. The scripts used in this study are available in a GitHub repository (https://github.com/pmh0731/DrugToxicityPrediction). For questions or requests regarding data access or source code, please contact the corresponding author at: sukim@postech.ac.kr.

Declaration of interests

The authors declare no competing interests.

Acknowledgments

We thank all members of the Kim laboratory for helpful discussions. This work was supported by grants to SK from the Korean National Research Foundation (2020R1A6A1A03047902, RS-2025-16070008) and IITP (2019-0-01906, Artificial Intelligence Graduate School Program and IITP-2024-RS-2024-00441244, Global Data-X Leader HRD program, POSTECH).

Footnotes

Appendix A

Supplementary data related to this article can be found at https://doi.org/10.1016/j.ebiom.2025.105994.

Appendix A. Supplementary data

Supplementary Figures
mmc1.docx (3.9MB, docx)
Supplementary Table 1
mmc2.xlsx (1.3MB, xlsx)
Supplementary Table 2
mmc3.xlsx (20.3KB, xlsx)
Supplementary Table 3
mmc4.xlsx (13.5KB, xlsx)
Supplementary Table 4
mmc5.xlsx (92.9KB, xlsx)
Supplementary Table 5
mmc6.xlsx (3MB, xlsx)
Supplementary Table 6
mmc7.xlsx (285.3KB, xlsx)
Supplementary Table 7
mmc8.xlsx (1MB, xlsx)

References

1. Sun D., Gao W., Hu H., Zhou S. Why 90% of clinical drug development fails and how to improve it? Acta Pharm Sin B. 2022;12:3049–3062. doi: 10.1016/j.apsb.2022.02.002.
2. Cook D., Brown D., Alexander R., et al. Lessons learned from the fate of AstraZeneca's drug pipeline: a five-dimensional framework. Nat Rev Drug Discov. 2014;13:419–431. doi: 10.1038/nrd4309.
3. DiMasi J.A., Grabowski H.G., Hansen R.W. Innovation in the pharmaceutical industry: new estimates of R&D costs. J Health Econ. 2016;47:20–33. doi: 10.1016/j.jhealeco.2016.01.012.
4. Seyhan A.A. Lost in translation: the valley of death across preclinical and clinical divide – identification of problems and overcoming obstacles. Transl Med Commun. 2019;4:1–19.
5. Van Norman G.A. Limitations of animal studies for predicting toxicity in clinical trials. JACC Basic Transl Sci. 2019;4:845–854. doi: 10.1016/j.jacbts.2019.10.008.
6. Clements M., Millar V., Williams A.S., Kalinka S. Bridging functional and structural cardiotoxicity assays using human embryonic stem cell-derived cardiomyocytes for a more comprehensive risk assessment. Toxicol Sci. 2015;148:241–260. doi: 10.1093/toxsci/kfv180.
7. Qureshi Z.P., Seoane-Vazquez E., Rodriguez-Monguio R., Stevenson K.B., Szeinbach S.L. Market withdrawal of new molecular entities approved in the United States from 1980 to 2009. Pharmacoepidemiol Drug Saf. 2011;20:772–777. doi: 10.1002/pds.2155.
8. Leeson P.D., Springthorpe B. The influence of drug-like concepts on decision-making in medicinal chemistry. Nat Rev Drug Discov. 2007;6:881–890. doi: 10.1038/nrd2445.
9. Masarone S., Beckwith K.V., Wilkinson M.R., et al. Advancing predictive toxicology: overcoming hurdles and shaping the future. Digital Discov. 2025;4:303–315.
10. Wang X., Thijssen B., Yu H. Target essentiality and centrality characterize drug side effects. PLoS Comput Biol. 2013;9. doi: 10.1371/journal.pcbi.1003119.
11. Behan F.M., Iorio F., Picco G., et al. Prioritization of cancer therapeutic targets using CRISPR–Cas9 screens. Nature. 2019;568:511–516. doi: 10.1038/s41586-019-1103-9.
12. Chang L., Ruiz P., Ito T., Sellers W.R. Targeting pan-essential genes in cancer: challenges and opportunities. Cancer Cell. 2021;39:466–479. doi: 10.1016/j.ccell.2020.12.008.
13. Piñero J., Gonzalez-Perez A., Guney E., et al. Network, transcriptomic and genomic features differentiate genes relevant for drug response. Front Genet. 2018;9:412. doi: 10.3389/fgene.2018.00412.
14. Duffy Á., Verbanck M., Dobbyn A., et al. Tissue-specific genetic features inform prediction of drug side effects in clinical trials. Sci Adv. 2020;6. doi: 10.1126/sciadv.abb6242.
15. Perez-Lopez Á.R., Szalay K.Z., Türei D., et al. Targets of drugs are generally and targets of drugs having side effects are specifically good spreaders of human interactome perturbations. Sci Rep. 2015;5. doi: 10.1038/srep10182.
16. Park M., Kim D., Kim I., Im S., Kim S. Drug approval prediction based on the discrepancy in gene perturbation effects between cells and humans. EBioMedicine. 2023;94. doi: 10.1016/j.ebiom.2023.104705.
17. Kim J., Kim I., Han S.K., Bowie J.U., Kim S. Network rewiring is an important mechanism of gene essentiality change. Sci Rep. 2012;2:900. doi: 10.1038/srep00900.
18. Han S.K., Kim D., Lee H., Kim I., Kim S. Divergence of noncoding regulatory elements explains gene–phenotype differences between human and mouse orthologous genes. Mol Biol Evol. 2018;35:1653–1667. doi: 10.1093/molbev/msy056.
19. Ha D., Kim D., Kim I., et al. Evolutionary rewiring of regulatory networks contributes to phenotypic differences between human and mouse orthologous genes. Nucleic Acids Res. 2022;50:1849–1863. doi: 10.1093/nar/gkac050.
20. Gayvert K.M., Madhukar N.S., Elemento O. A data-driven approach to predicting successes and failures of clinical trials. Cell Chem Biol. 2016;23:1294–1301. doi: 10.1016/j.chembiol.2016.07.023.
21. Wu Z., Ramsundar B., Feinberg E.N., et al. MoleculeNet: a benchmark for molecular machine learning. Chem Sci. 2018;9:513–530. doi: 10.1039/c7sc02664a.
22. Onakpoya I.J., Heneghan C.J., Aronson J.K. Post-marketing withdrawal of 462 medicinal products because of adverse drug reactions: a systematic review of the world literature. BMC Med. 2016;14:10. doi: 10.1186/s12916-016-0553-2.
23. Hunter F.M.I., Bento A.P., Bosc N., Gaulton A., Hersey A., Leach A.R. Drug safety data curation and modeling in ChEMBL: boxed warnings and withdrawn drugs. Chem Res Toxicol. 2021;34:385–395. doi: 10.1021/acs.chemrestox.0c00296.
24. Zdrazil B., Felix E., Hunter F., et al. The ChEMBL database in 2023: a drug discovery platform spanning multiple bioactivity data types and time periods. Nucleic Acids Res. 2024;52:D1180–D1192. doi: 10.1093/nar/gkad1004.
25. Nguyen P.A., Born D.A., Deaton A.M., Nioi P., Ward L.D. Phenotypes associated with genes encoding drug targets are predictive of clinical trial side effects. Nat Commun. 2019;10:1579. doi: 10.1038/s41467-019-09407-3.
26. Szklarczyk D., Santos A., von Mering C., Jensen L.J., Bork P., Kuhn M. STITCH 5: augmenting protein–chemical interaction networks with tissue and affinity data. Nucleic Acids Res. 2016;44:D380–D384. doi: 10.1093/nar/gkv1277.
27. Patterson D.E., Cramer R.D., Ferguson A.M., Clark R.D., Weinberger L.E. Neighborhood behavior: a useful concept for validation of “Molecular Diversity” descriptors. J Med Chem. 1996;39:3049–3059. doi: 10.1021/jm960290n.
28. Brown R.D., Martin Y.C. The information content of 2D and 3D structural descriptors relevant to ligand-receptor binding. J Chem Inf Comput Sci. 1997;37:1–9.
29. Jasial S., Hu Y., Vogt M., Bajorath J. Activity-relevant similarity values for fingerprints and implications for similarity searching. F1000Res. 2016;5:591. doi: 10.12688/f1000research.8357.1.
30. Gurumayum S., Jiang P., Hao X., et al. OGEE v3: online GEne essentiality database with increased coverage of organisms and human cell lines. Nucleic Acids Res. 2021;49:D998–D1003. doi: 10.1093/nar/gkaa884.
31. Beder T., Aromolaran O., Dönitz J., et al. Identifying essential genes across eukaryotes by machine learning. NAR Genom Bioinform. 2021;3. doi: 10.1093/nargab/lqab110.
32. Karczewski K.J., Francioli L.C., Tiao G., et al. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature. 2020;581:434–443. doi: 10.1038/s41586-020-2308-7.
33. Sloan C.A., Chan E.T., Davidson J.M., et al. ENCODE data at the ENCODE portal. Nucleic Acids Res. 2016;44:D726–D732. doi: 10.1093/nar/gkv1160.
34. Jin H., Zhang C., Zwahlen M., et al. Systematic transcriptional analysis of human cell lines for gene expression landscape and tumor representation. Nat Commun. 2023;14:5417. doi: 10.1038/s41467-023-41132-w.
35. Kryuchkova-Mostacci N., Robinson-Rechavi M. A benchmark of gene expression tissue-specificity metrics. Brief Bioinform. 2016;18:205–214. doi: 10.1093/bib/bbw008.
36. Szklarczyk D., Kirsch R., Koutrouli M., et al. The STRING database in 2023: protein–protein association networks and functional enrichment analyses for any sequenced genome of interest. Nucleic Acids Res. 2023;51:D638–D646. doi: 10.1093/nar/gkac1000.
37. Lemoine G.G., Scott-Boyer M.-P., Ambroise B., Périn O., Droit A. GWENA: gene co-expression networks analysis and extended modules characterization in a single bioconductor package. BMC Bioinformatics. 2021;22:267. doi: 10.1186/s12859-021-04179-4.
38. Faith J.J., Hayete B., Thaden J.T., et al. Large-scale mapping and validation of Escherichia coli transcriptional regulation from a compendium of expression profiles. PLoS Biol. 2007;5. doi: 10.1371/journal.pbio.0050008.
39. Salavaty A., Ramialison M., Currie P.D. Integrated value of influence: an integrative method for the identification of the most influential nodes within networks. Patterns. 2020;1. doi: 10.1016/j.patter.2020.100052.
40. Kolberg L., Raudvere U., Kuzmin I., Adler P., Vilo J., Peterson H. g:Profiler—interoperable web service for functional enrichment analysis and gene identifier mapping (2023 update). Nucleic Acids Res. 2023;51:W207–W212. doi: 10.1093/nar/gkad347.
41. Williams A.J., Grulke C.M., Edwards J., et al. The CompTox chemistry dashboard: a community data resource for environmental chemistry. J Cheminform. 2017;9:61. doi: 10.1186/s13321-017-0247-6.
42. Corsello S.M., Nagari R.T., Spangler R.D., et al. Discovering the anticancer potential of non-oncology drugs by systematic viability profiling. Nat Cancer. 2020;1:235–248. doi: 10.1038/s43018-019-0018-6.
43. Pu L., Naderi M., Liu T., Wu H.-C., Mukhopadhyay S., Brylinski M. eToxPred: a machine learning-based approach to estimate the toxicity of drug candidates. BMC Pharmacol Toxicol. 2019;20:2. doi: 10.1186/s40360-018-0282-6.
44. Setiya A., Jani V., Sonavane U., Joshi R. MolToxPred: small molecule toxicity prediction using machine learning approach. RSC Adv. 2024;14:4201–4220. doi: 10.1039/d3ra07322j.
45. Chang J., Ye J.C. Bidirectional generation of structure and properties through a single molecular foundation model. Nat Commun. 2024;15:2323. doi: 10.1038/s41467-024-46440-3.
46. Sharma B., Chenthamarakshan V., Dhurandhar A., et al. Accurate clinical toxicity prediction using multi-task deep neural nets and contrastive molecular explanations. Sci Rep. 2023;13:4908. doi: 10.1038/s41598-023-31169-8.
47. Méndez-Lucio O., Nicolaou C.A., Earnshaw B. MolE: a foundation model for molecular graphs using disentangled attention. Nat Commun. 2024;15:9431. doi: 10.1038/s41467-024-53751-y.
48. Kuhn M., Letunic I., Jensen L.J., Bork P. The SIDER database of drugs and side effects. Nucleic Acids Res. 2016;44:D1075–D1079. doi: 10.1093/nar/gkv1075.
49. Cai M.-C., Xu Q., Pan Y.-J., et al. ADReCS: an ontology database for aiding standardization and hierarchical classification of adverse drug reaction terms. Nucleic Acids Res. 2015;43:D907–D913. doi: 10.1093/nar/gku1066.
50. Gottlieb A., Hoehndorf R., Dumontier M., Altman R.B. Ranking adverse drug reactions with crowdsourcing. J Med Internet Res. 2015;17:e80. doi: 10.2196/jmir.3962.
51. Roumie C.L., Choma N.N., Kaltenbach L., Mitchel E.F., Jr., Arbogast P.G., Griffin M.R. Non-aspirin NSAIDs, cyclooxygenase-2 inhibitors and risk for cardiovascular events–stroke, acute myocardial infarction, and death from coronary heart disease. Pharmacoepidemiol Drug Saf. 2009;18:1053–1063. doi: 10.1002/pds.1820.
52. Chang C.-H., Shau W.-Y., Kuo C.-W., Chen S.-T., Lai M.-S. Increased risk of stroke associated with nonsteroidal anti-inflammatory drugs. Stroke. 2010;41:1884–1890. doi: 10.1161/STROKEAHA.110.585828.
53. Masclee G.M.C., Straatman H., Arfè A., et al. Risk of acute myocardial infarction during use of individual NSAIDs: a nested case-control study from the SOS project. PLoS One. 2018;13. doi: 10.1371/journal.pone.0204746.
54. Sakaue S., Kanai M., Tanigawa Y., et al. A cross-population atlas of genetic associations for 220 human phenotypes. Nat Genet. 2021;53:1415–1424. doi: 10.1038/s41588-021-00931-x.
55. Jiang L., Zheng Z., Fang H., Yang J. A generalized linear mixed model association tool for biobank-scale data. Nat Genet. 2021;53:1616–1621. doi: 10.1038/s41588-021-00954-4.
56. Hartiala J.A., Han Y., Jia Q., et al. Genome-wide analysis identifies novel susceptibility loci for myocardial infarction. Eur Heart J. 2021;42:919–933. doi: 10.1093/eurheartj/ehaa1040.
57. Yin K.-J., Deng Z., Hamblin M., Zhang J., Chen Y.E. Vascular PPARδ protects against stroke-induced brain injury. Arterioscler Thromb Vasc Biol. 2011;31:574–581. doi: 10.1161/ATVBAHA.110.221267.
58. Mishra A., Malik R., Hachiya T., et al. Stroke genetics informs drug discovery and risk prediction across ancestries. Nature. 2022;611:115–123. doi: 10.1038/s41586-022-05165-3.
59. Tiihonen J., Lönnqvist J., Wahlbeck K., Klaukka T., Tanskanen A., Haukka J. Antidepressants and the risk of suicide, attempted suicide, and overall mortality in a nationwide cohort. Arch Gen Psychiatry. 2006;63:1358–1367. doi: 10.1001/archpsyc.63.12.1358.
60. Rubino A., Roskell N., Tennis P., Mines D., Weich S., Andrews E. Risk of suicide during treatment with venlafaxine, citalopram, fluoxetine, and dothiepin: retrospective cohort study. BMJ. 2007;334:242. doi: 10.1136/bmj.39041.445104.BE.
61. Rafikova E., Shadrina M., Slominsky P., et al. SLC6A3 (DAT1) as a novel candidate biomarker gene for suicidal behavior. Genes (Basel). 2021;12:861. doi: 10.3390/genes12060861.
62. Viderman D., Ben-David B., Sarria-Santamera A. Analysis of bupivacaine and ropivacaine-related cardiac arrests in regional anesthesia: a systematic review of case reports. Revista Española de Anestesiología y Reanimación (English Edition). 2021;68:472–483. doi: 10.1016/j.redare.2020.10.005.
63. Huang S., Chen J., Song M., et al. Whole-exome sequencing and electrophysiological study reveal a novel loss-of-function mutation of KCNA10 in epinephrine provoked long QT syndrome with familial history of sudden cardiac death. Leg Med. 2023;62. doi: 10.1016/j.legalmed.2023.102245.
64. Caldu-Primo J.L., Verduzco-Martínez J.A., Alvarez-Buylla E.R., Davila-Velderrain J. In vivo and in vitro human gene essentiality estimations capture contrasting functional constraints. NAR Genom Bioinform. 2021;3:1–14. doi: 10.1093/nargab/lqab063.
65. Uhlen M., Fagerberg L., Hallstrom B.M., et al. Tissue-based map of the human proteome. Science. 2015;347. doi: 10.1126/science.1260419.
66. Mestres J., Gregori-Puigjané E., Valverde S., Solé R.V. The topology of drug–target interaction networks: implicit dependence on drug properties and target families. Mol Biosyst. 2009;5:1051. doi: 10.1039/b905821b.
67. Peón A., Naulaerts S., Ballester P.J. Predicting the reliability of drug-target interaction predictions with maximum coverage of target space. Sci Rep. 2017;7:3820. doi: 10.1038/s41598-017-04264-w.
68. Manen-Freixa L., Antolin A.A. Polypharmacology prediction: the long road toward comprehensively anticipating small-molecule selectivity to de-risk drug discovery. Expert Opin Drug Discov. 2024;19:1043–1069. doi: 10.1080/17460441.2024.2376643.
69. Zhao Y., Xing Y., Zhang Y., et al. Evidential deep learning-based drug-target interaction prediction. Nat Commun. 2025;16:6915. doi: 10.1038/s41467-025-62235-6.
70. Liao Q., Zhang Y., Chu Y., et al. Application of artificial intelligence in drug-target interactions prediction: a review. NPJ Biomedical Innovations. 2025;2:1.
71. Rubahamya B., Dong S., Thurber G.M. Clinical translation of antibody drug conjugate dosing in solid tumors from preclinical mouse data. Sci Adv. 2024;10. doi: 10.1126/sciadv.adk1894.
72. Froicu E.-M., Oniciuc O.-M., Afrăsânie V.-A., et al. The use of artificial intelligence in predicting chemotherapy-induced toxicities in metastatic colorectal cancer: a data-driven approach for personalized oncology. Diagnostics. 2024;14:2074. doi: 10.3390/diagnostics14182074.
73. Badwan B.A., Liaropoulos G., Kyrodimos E., Skaltsas D., Tsirigos A., Gorgoulis V.G. Machine learning approaches to predict drug efficacy and toxicity in oncology. Cell Reports Methods. 2023;3. doi: 10.1016/j.crmeth.2023.100413.
74. Guo L., Wang W., Xie X., Wang S., Zhang Y. Machine learning for genetic prediction of chemotherapy toxicity in cervical cancer. Biomed Pharmacother. 2023;161. doi: 10.1016/j.biopha.2023.114518.

Articles from eBioMedicine are provided here courtesy of Elsevier