Author manuscript; available in PMC: 2021 Jul 6.
Published in final edited form as: Mol Pharm. 2020 Jun 8;17(7):2628–2637. doi: 10.1021/acs.molpharmaceut.0c00326

Comparing Machine Learning Algorithms for Predicting Drug-Induced Liver Injury (DILI)

Eni Minerali 1, Daniel H Foil 1, Kimberley M Zorn 1, Thomas R Lane 1, Sean Ekins 1,*
PMCID: PMC7702310  NIHMSID: NIHMS1619903  PMID: 32422053

Abstract

Drug-Induced Liver Injury (DILI) is one of the most unpredictable adverse reactions to xenobiotics in humans and the leading cause of post-marketing withdrawals of approved drugs. To date, these drugs have been collated by the FDA to form the DILIRank database, which classifies DILI severity and potential. These classifications have been used by various research groups in generating computational predictions for this type of liver injury. Recently, groups from Pfizer and AstraZeneca have collated DILI in vitro data and physicochemical properties for compounds that can be used along with data from the FDA to build machine learning models for DILI. In this study, we have used these datasets, as well as the Biopharmaceutics Drug Disposition Classification System dataset, to generate Bayesian machine learning models with our in-house software, Assay Central™. The performance of all machine learning models was assessed through both internal five-fold cross-validation metrics and prediction accuracy on an external test set of compounds with known hepatotoxicity. The best performing Bayesian model was based on the DILI-concern category from the DILIRank database, with an ROC of 0.814, sensitivity of 0.741, specificity of 0.755, and accuracy of 0.746. A comparison of alternative machine learning algorithms, such as k-Nearest Neighbors, support vector classification, AdaBoosted decision trees, and deep learning, produced similar statistics to those generated with the Bayesian algorithm in Assay Central™. This study demonstrates machine learning models grouped in a tool called MegaTox™ that can be used to predict early stage clinical compounds, as well as recent FDA approved drugs, to identify potential DILI.

Keywords: Assay Central™, Bayesian, Drug-Induced Liver Injury, Machine learning, MegaTox™

INTRODUCTION

Bringing a new small molecule therapeutic to market is a complex and expensive process, and even if it reaches the market a drug may still fail due to toxicity that was not observed in the preclinical stages. For instance, drugs as well as herbal and dietary supplements can cause Drug-Induced Liver Injury (DILI), an unpredictable adverse reaction to these xenobiotics whose mechanism of action is still not well understood. DILI is the leading cause of drug withdrawals in the United States1, and its symptoms vary from subtle changes in enzyme levels (e.g. alanine aminotransferase and aspartate aminotransferase) to fatal hepatotoxicity and liver necrosis2, 3. There is evidence that this liver toxicity may be caused by the inhibition of various liver transporters such as the bile salt export pump (BSEP)4 and breast cancer resistance protein5, which has led many research groups to build DILI prediction models that include these inhibition profiles5, 6. Other potential toxicity mechanisms that are used to understand DILI are mitochondrial dysfunction and cytotoxicity, which are often paired with further physicochemical properties such as lipophilicity7.

While DILI can sometimes be related to the prescribed dose of the drug, in many cases a drug may behave idiosyncratically and its DILI effect depends on the summation of genetic, immunologic and metabolic factors of patients8, 9–12. Additional challenges of predicting the DILI potential of drugs are the discrepancies between in vitro data and in vivo results in clinical trials or post-marketing phases of drug development3. For example, the antidepressant nefazodone caused severe liver injury in humans despite performing well in preclinical risk assessment studies13, 14. Therefore, considerable recent research has focused on understanding the biological pathways of DILI and finding novel biomarkers for liver injury15, 16.

Research groups have been approaching DILI through bioinformatics and cheminformatics tools by accessing published biological and chemical data. These tools take advantage of software that builds machine learning models based on well-known algorithms such as Bayesian, support vector machine (SVM), Bernoulli naïve Bayes, AdaBoost decision trees, random forest, and deep neural networks17–20. Simply put, machine learning models are generated by identifying a correlation between molecular structure (i.e. a structure-activity relationship series) and the biological data (e.g. IC50 measurements). Chemical structures can be interpreted computationally by different molecular descriptors such as 2D MOE, MACCS, PaDEL, 3D Volsurf, and extended connectivity fingerprints of different diameters19, 21. For example, one of the earliest Bayesian models used extended connectivity descriptors (ECFC_6) to predict DILI potential for a test set of 237 compounds with a training model composed of 295 compounds, and suggested important substructures involved in this toxicity21. Machine learning models like this can be used to predict the activity of compounds without the need for additional in vitro or in vivo data, but their performance depends on the quantity and quality of available data.

To aid in this research, the National Center for Toxicological Research has been working for a number of years on classifying drugs based on their likelihood to cause liver injury22, 23. Their work has resulted in the creation of the Liver Toxicity Knowledge Base Benchmark Dataset (LTKB-BD), a set of 1036 FDA-approved individual compounds that are organized based on information provided in FDA drug labels. This dataset is easily accessible by research groups that want to understand and predict DILI, so it has consequently become a benchmark dataset. The group of Tong et al. has developed several models based on both the LTKB-BD and its updated iteration, DILIRank17, 22–24. Whilst this manuscript was in preparation a new dataset, DILIst, was published by their group which includes a binary classification of 1279 drugs that cause hepatotoxicity25.

Other approaches to predicting DILI have leveraged the Biopharmaceutics Drug Disposition Classification System (BDDCS), which was developed in 2005 by Wu and Benet as a tool to predict metabolizing enzyme and drug transporter effects on drug disposition26. Wu and Benet recognized that highly permeable compounds were extensively metabolized, while poorly permeable drugs were primarily eliminated unchanged in the urine or bile. Thus, drugs in the BDDCS are simply classified according to their membrane permeability and aqueous solubility, and do not depend on the dose prescribed to humans or animals. Compounds with a high permeability rate and a high solubility belong to Class 1, those with a high permeability rate but low solubility to Class 2, which are then followed by Class 3 (low permeability rate and high solubility) and Class 4 (low permeability rate and low solubility). Benet et al. noticed that BDDCS Class 2 compounds are usually DILI-causing, and that their classification system is a better predictor of DILI than using BSEP or mitochondrial toxicity, either alone or combined27.

Recently, Pfizer published their hepatic risk matrix (HRM), which represents a scoring system combining physicochemical properties (e.g. Rule of two, partition model) with common toxicity mechanisms involved in DILI (cytotoxicity, mitochondrial dysfunction, BSEP inhibition)6. They analyzed 200 drugs from the LTKB together with internal and external drug candidates, and 70–80% of compounds classified as Most-DILI-concern were identified; this approach was proposed as a method to assess molecules in early clinical development. Similarly, AstraZeneca described a Bayesian model with 96 molecules for which they generated in vitro data (using hepatic spheroids, BSEP inhibition, mitochondrial toxicity and bioactivation), physicochemical properties (cLogP) and exposure (Cmaxtotal)28. This model had a balanced accuracy of 63% on held out samples and was also used by the group in decision making and during safety studies.

We now describe our efforts to leverage these datasets from the FDA (DILIRank)22, AstraZeneca28 (original classification), Pfizer6 (partition or rule-of-two HRM scoring systems) as well as Benet et al.26 (BDDCS) to compare various machine learning models in order to predict DILI (Figure 1). We also describe the thorough external testing of models and generation of predictions for FDA-approved drugs from recent years, which may serve as a prospective test for those compounds for which little patient data has been gathered to date.

Figure 1.

Classification systems for the datasets.

EXPERIMENTAL SECTION

Curation of Training Datasets

Curation of DILIRank Dataset

The DILIRank dataset22 provides a list of 1036 compounds categorized based on their DILI potential (i.e. DILI-concern) and severity; models were generated using data from both categories. The former category divides drugs into verified (v) categories of DILI concern: vMost-, vLess-, Ambiguous-, and vNo- DILI concern. The DILI-severity classification divides them based on side effects: severe (score 6 to 8), moderate (score 4 and 5), mild (score 1 to 3), and none (score 0).

Five Bayesian models were generated based on DILIRank, from either vDILI-concern or DILI-severity classifications (Table 1), within the Assay Central framework (described in the following section). Two vDILI-concern models utilized only those compounds classified as vMost-, vLess- and vNo-DILI concern, and excluded the 254 drugs classified as Ambiguous-DILI concern, totaling 705 molecules. The vDILI-concern models evaluated herein were set at two thresholds. The first vDILI-concern model (abbreviated as “AC-Concern-3” herein) aims to predict DILI more broadly, classifying compounds as ‘active’ when assigned a score of three or greater (i.e. Most- and Less-DILI concern), while the other (abbreviated as “AC-Concern-4” herein) predicts only compounds that have the potential to be categorized as Most-DILI, with scores of four. The DILI-severity models built from the DILIRank dataset had a total of 938 compounds, with 98 drugs removed or merged to increase the model performances (more explanation provided in the Assay Central™ section). Three Bayesian models were built based on the DILI-severity score at different thresholds (Table 1). These models aim to predict severe liver damage only (abbreviated as “AC-Severity-6” herein), moderate and severe liver damage (abbreviated as “AC-Severity-4” herein), or any kind of damage (abbreviated as “AC-Severity-1” herein).

Table 1.

Curation thresholds of DILIRank, AstraZeneca and Pfizer datasets, and model abbreviations.

Dataset | Model | Threshold | Abbreviation
AstraZeneca | AstraZeneca Scoring System Threshold 2 | Threshold ≥2 | AC-AZ-2
AstraZeneca | AstraZeneca Scoring System Threshold 3 | Threshold =3 | AC-AZ-3
BDDCS | BDDCS | Threshold =2 | AC-BDDCS-2
DILIRank’s vDILI-Concern | Most- and Less-DILI | Arbitrary Threshold ≥3 | AC-Concern-3
DILIRank’s vDILI-Concern | Most-DILI | Arbitrary Threshold =4 | AC-Concern-4
DILIRank’s DILI-Severity | Severe Liver Damage | Threshold ≥6 | AC-Severity-6
DILIRank’s DILI-Severity | Moderate and Severe Liver Damage | Threshold ≥4 | AC-Severity-4
DILIRank’s DILI-Severity | Any kind of liver damage | Threshold ≥1 | AC-Severity-1
Pfizer | Partition Hybrid Scoring System Threshold 4 | Threshold ≥4 | AC-PF-Par-4
Pfizer | Partition Hybrid Scoring System Threshold 8 | Threshold ≥8 | AC-PF-Par-8
Pfizer | Ro2 Scoring System Threshold 3 | Threshold ≥3 | AC-PF-Ro2–3
Pfizer | Ro2 Scoring System Threshold 8 | Threshold ≥8 | AC-PF-Ro2–8
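
The thresholds in Table 1 amount to a simple binarization rule per model. The following is a minimal sketch of that rule, not the Assay Central™ implementation; the dictionary `THRESHOLDS` and the function `is_active` are hypothetical names introduced here, and only the operator and cutoff pairs are taken from Table 1.

```python
import operator

# (comparison, cutoff) pairs copied from Table 1; all names are illustrative.
THRESHOLDS = {
    "AC-AZ-2": (operator.ge, 2),
    "AC-AZ-3": (operator.eq, 3),
    "AC-BDDCS-2": (operator.eq, 2),
    "AC-Concern-3": (operator.ge, 3),
    "AC-Concern-4": (operator.eq, 4),
    "AC-Severity-6": (operator.ge, 6),
    "AC-Severity-4": (operator.ge, 4),
    "AC-Severity-1": (operator.ge, 1),
    "AC-PF-Par-4": (operator.ge, 4),
    "AC-PF-Par-8": (operator.ge, 8),
    "AC-PF-Ro2–3": (operator.ge, 3),
    "AC-PF-Ro2–8": (operator.ge, 8),
}


def is_active(model: str, score: int) -> bool:
    """Return True when a compound's class score counts as 'active' for a model."""
    compare, cutoff = THRESHOLDS[model]
    return compare(score, cutoff)
```

Under this reading, a compound with a DILI-severity score of 5 would be labelled active for AC-Severity-4 and AC-Severity-1 but inactive for AC-Severity-6.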

Curation of BDDCS Dataset

The classification system provided by Benet et al. was used to predict potential Most-DILI drugs29. Structural SMILES for all the compounds were retrieved from various public databases (i.e. EPA’s Chemistry Dashboard, PubChem, and DrugBank) and depicted with a proprietary script; 914 compounds were used to build the model after excluding those not compatible with machine learning (i.e. cisplatin, n-38, capreomycin 1b, and exp-3174). Class 2 drugs were considered to be DILI-causing (active), and Classes 1, 3 and 4 were considered to be inactive.
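
As an illustration of the labelling used for this dataset, the sketch below maps the two BDDCS properties described in the Introduction to a class number and then to the active/inactive label used here. The function names are hypothetical, and the boolean inputs assume that the high/low calls for permeability rate and solubility have already been made elsewhere.

```python
def bddcs_class(high_permeability: bool, high_solubility: bool) -> int:
    """Map the two BDDCS properties to Class 1-4 (illustrative helper)."""
    if high_permeability:
        return 1 if high_solubility else 2
    return 3 if high_solubility else 4


def bddcs_active(bddcs: int) -> bool:
    """Class 2 is treated as DILI-causing (active); Classes 1, 3 and 4 as inactive."""
    return bddcs == 2
```

For example, a highly permeable but poorly soluble drug falls in Class 2 and is therefore labelled active in the AC-BDDCS-2 training set.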

Curation of Pfizer’s and AstraZeneca’s Datasets

A dataset of 241 compounds was provided by Pfizer as Supplemental Information6, and SMILES were retrieved from EPA’s Chemistry Dashboard, PubChem, and DrugBank. Conflicting data (i.e. the same drug administered at different dosages) resulted in the omission of that drug from the training set, with the final dataset comprising 221 compounds. Multiple classification thresholds were analyzed with Bayesian models (constructed according to the following section); the thresholds listed in Table 1 were found to produce optimal internal statistics for the Aleo et al. data, and four models were built using these 221 compounds (Table 1). Two more models were built using a training set of 96 compounds from Williams et al.28 that were classified into 3 categories (no DILI, intermediate DILI and severe DILI). The DILI severity scores ranged from 1 to 3, where the latter had the highest risk. SMILES for all 96 compounds were retrieved from EPA, PubChem, and DrugBank, and two models were built using score =3 (AC-AZ-3) and score ≥2 (AC-AZ-2) as thresholds.

Assay Central

Our proprietary Assay Central™ software30, 31 is a framework for curating high-quality datasets and generating Bayesian machine learning models for prospective drug discovery and toxicology predictions32–34. We have previously described the Assay Central™ project in more detail32–35 as well as interpretation of prediction and applicability domain scores30. All datasets described previously were subjected to the same structural standardization processes (i.e. removing salts, metal complexes and proteins).

Generating classification models requires defining a threshold of bioactivity. Assay Central™ offers an automated method to select an activity threshold to optimize individual model performance as described previously30, 31. However, considering that all data modelled herein were part of predetermined classification systems, thresholds were set according to those of the original authors and our own discretion for practical application. Each prediction generated with Assay Central™ outputs a Bayesian probability-like score and an applicability score for individual chemicals. Higher applicability values suggest that more of the relevant chemical space is covered by the model and that the predicted compound is represented within the training data. Predictions were evaluated using the standard probability cutoff, with a prediction score of 0.5 as the threshold for an active bioactivity prediction (i.e. DILI-inducing)30. Merged compounds with contrasting results (i.e. two instances of a compound, one DILI-causing result and another negative result) were considered active (DILI-causing) overall to err on the side of caution. Several metrics generated from internal five-fold cross-validation are also included with each model to evaluate and compare predictive performance, including Receiver Operating Characteristic (ROC), Recall, Precision, F1 Score, Cohen’s Kappa36, 37 and Matthews Correlation Coefficient38. Accuracy and Specificity are also used to assess the performance of the models.
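
For readers less familiar with these metrics, the sketch below shows how the threshold-dependent statistics can be derived from a confusion matrix using their standard textbook definitions. It is not the Assay Central™ code: the function names are hypothetical, treating a score at or above the cutoff as active is an assumption, and ROC is omitted because it is computed over all cutoffs rather than at a single one.

```python
from math import sqrt


def confusion(scores, labels, cutoff=0.5):
    """Count (tp, fp, tn, fn); a score >= cutoff is called active (assumed)."""
    tp = fp = tn = fn = 0
    for score, active in zip(scores, labels):
        predicted = score >= cutoff
        if predicted and active:
            tp += 1
        elif predicted:
            fp += 1
        elif active:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def _ratio(num, den):
    # Return 0.0 instead of raising when a class is empty.
    return num / den if den else 0.0


def metrics(tp, fp, tn, fn):
    """Standard definitions of the single-cutoff classification statistics."""
    n = tp + fp + tn + fn
    recall = _ratio(tp, tp + fn)
    precision = _ratio(tp, tp + fp)
    observed = _ratio(tp + tn, n)  # observed agreement (accuracy)
    expected = _ratio((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp), n * n)
    mcc_den = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "sensitivity": recall,
        "specificity": _ratio(tn, tn + fp),
        "accuracy": observed,
        "precision": precision,
        "f1": _ratio(2 * precision * recall, precision + recall),
        "kappa": _ratio(observed - expected, 1 - expected),
        "mcc": _ratio(tp * tn - fp * fn, mcc_den),
    }
```

Cohen's Kappa and the Matthews Correlation Coefficient both correct for chance agreement, which is why they can stay low even when accuracy looks acceptable on an imbalanced dataset.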

Comparison between additional machine learning algorithms

Similar to previous studies32–34, finalized DILI datasets generated in Assay Central™ were used for comparison of multiple machine learning algorithms, consistently using extended-connectivity fingerprints as the sole molecular descriptor. These methods include random forest, k-Nearest Neighbors, support vector classification, naïve Bayesian, AdaBoosted decision trees, and a deep learning architecture33. We then compared the differences in the internal five-fold cross-validation metrics between algorithms with a rank normalized score. We34, 39 and others40 have previously used this score as a performance criterion to compare machine learning methods. Rank normalized scores can be evaluated either pairwise (machine learning comparison per training set) or independently (to give a general machine learning comparison). Further, we have devised an additional measure to compare models called the “difference from the top” (ΔRNS) metric, which gives a rank normalized score for each algorithm subtracted from the highest rank normalized score from a specific training set39. This enables a direct assessment of the performance of two machine learning algorithms by maintaining the pairwise results from each training set cross-validation score by algorithm.
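
A minimal sketch of how these two scores could be computed is given below. The exact formula is defined in the cited references rather than in this section, so the sketch assumes that the rank normalized score is the mean of the cross-validation metrics after rescaling each to the range 0 to 1 (with Cohen's Kappa and MCC mapped from [-1, 1]); the sign convention for ΔRNS follows the wording above (top score minus the algorithm's score). All function and key names are hypothetical.

```python
from statistics import mean

# Metrics defined on [-1, 1]; everything else is assumed to lie on [0, 1].
SIGNED_METRICS = {"kappa", "mcc"}


def rank_normalized_score(metrics: dict) -> float:
    """Mean of the metrics after rescaling each to [0, 1] (assumed definition)."""
    scaled = [
        (value + 1) / 2 if name in SIGNED_METRICS else value
        for name, value in metrics.items()
    ]
    return mean(scaled)


def delta_rns(rns_by_algorithm: dict) -> dict:
    """Difference from the top score within one training set (0 for the best)."""
    top = max(rns_by_algorithm.values())
    return {algo: top - rns for algo, rns in rns_by_algorithm.items()}
```

Because ΔRNS is computed within each training set before results are pooled, an algorithm is always compared with the best performer on the same data rather than with an average across datasets.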

Curation of Testing Datasets

Assay Central™ was utilized to generate predictions and external cross-validation (subvalidation) metrics for testing datasets. Any compounds overlapping between the training and testing sets were not considered, so as to ensure a proper comparison of Bayesian models.
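
The two curation rules that matter most for a fair external test (merging duplicate records so that any DILI-causing result wins, and dropping test compounds already present in the training set) can be sketched as follows. This is an illustration only: the function names are hypothetical, and the compound keys are assumed to be already-standardized structure identifiers, which in practice would require a cheminformatics toolkit.

```python
def merge_duplicates(records):
    """Merge (key, active) records; contrasting results are treated as active."""
    merged = {}
    for key, active in records:
        merged[key] = merged.get(key, False) or bool(active)
    return merged


def remove_overlap(test_set: dict, training_keys: set) -> dict:
    """Keep only test compounds whose key is absent from the training set."""
    return {key: label for key, label in test_set.items() if key not in training_keys}
```

Removing the overlap means the external statistics only reflect compounds the model has not seen during training.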

First, training data used to generate DILIRank models, AC-Concern-4 and AC-Severity-4, were used as test sets for the rest of the Bayesian models (i.e. AstraZeneca, Pfizer, and BDDCS datasets). New compounds introduced in the recently published DILIst classification system were used to assess the performance of all AC-Severity and AC-Concern models25. Another external test set of 14 compounds was used to assess the performance of our models; this was collated from the Withdrawn database41, which annotates compounds withdrawn because of hepatotoxicity42, as well as drugs considered by Hong et al.24 and Aleo et al.6 as Most-DILI.

More extensive research was done to leverage information reported by the FDA such as Novel Drug Approvals43, MedWatch44, Reports45 and Recalls46. SMILES for all the compounds were retrieved from EPA’s Chemistry Dashboard47, DrugBank48 and PubChem49, and we were also able to retrieve from the LiverTox website50 a LiverTox ranking for almost all the compounds, which guided our analysis. This database contains current hepatotoxicity data on drugs curated by the National Institute of Diabetes and Digestive and Kidney Diseases, the National Library of Medicine, and the Drug-Induced Liver Injury Network; these drugs are classified and are continuously updated by these institutions based on scientific literature, public databases and the likelihood scale51. Acknowledging that toxicity can often be caused by drug-drug interactions, drugs that had more than one active ingredient were not included in the predictions of FDA approved drugs.

RESULTS

Summary of different DILI models and external validation

Twelve binary Bayesian machine learning models were built using Assay Central™. Training data were taken from the DILIRank and BDDCS classification systems, as well as smaller datasets published by Pfizer and AstraZeneca groups (Table 1). Internal five-fold cross-validation metrics for Assay Central™ models are presented in Table 2.
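
For orientation, five-fold cross-validation partitions the training compounds into five folds and holds each one out in turn. The sketch below shows a plain shuffled split; whether Assay Central™ stratifies folds by class or uses a particular random seed is not stated here, so the function name, the `seed` parameter and the unstratified split are assumptions.

```python
import random


def five_fold_splits(n_items: int, k: int = 5, seed: int = 0):
    """Yield (train_indices, test_indices) for each of k held-out folds."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k roughly equal folds
    for held_out in range(k):
        test = folds[held_out]
        train = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
        yield train, test
```

Each compound is predicted exactly once by a model that did not see it, and the pooled predictions give the internal statistics of the kind reported in Table 2.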

Table 2.

Internal five-fold cross-validation statistics for all models generated with Assay Central™.

Dataset | Model | Sensitivity | Specificity | Accuracy | ROC
AstraZeneca | AC-AZ-2 | 0.524 | 0.879 | 0.646 | 0.763
AstraZeneca | AC-AZ-3 | 0.696 | 0.575 | 0.604 | 0.619
BDDCS | AC-BDDCS-2 | 0.631 | 0.840 | 0.780 | 0.789
DILIRank’s vDILI-Concern | AC-Concern-3 | 0.741 | 0.755 | 0.746 | 0.814
DILIRank’s vDILI-Concern | AC-Concern-4 | 0.722 | 0.676 | 0.688 | 0.764
DILIRank’s DILI-Severity | AC-Severity-6 | 0.739 | 0.575 | 0.612 | 0.710
DILIRank’s DILI-Severity | AC-Severity-4 | 0.595 | 0.757 | 0.702 | 0.727
DILIRank’s DILI-Severity | AC-Severity-1 | 0.721 | 0.720 | 0.721 | 0.774
Pfizer | AC-PF-Par-4 | 0.652 | 0.814 | 0.715 | 0.786
Pfizer | AC-PF-Par-8 | 0.600 | 0.832 | 0.769 | 0.750
Pfizer | AC-PF-Ro2–3 | 0.545 | 0.890 | 0.715 | 0.774
Pfizer | AC-PF-Ro2–8 | 0.720 | 0.866 | 0.833 | 0.840

After preliminary models were evaluated, the five Bayesian models created with the DILIRank data were of primary interest because the dataset's wide use and reliable data make it a “gold standard”. These were evaluated using five-fold cross-validation to provide performance metrics (Table 2, Figure S1H–L). ROC values for these models ranged from 0.710 to 0.814. The AC-Concern-3 model had the highest ROC of 0.814, followed by AC-Severity-1 with a ROC of 0.774. AC-Concern-3 was also the best performing model overall, with an accuracy of 0.746, sensitivity of 0.741, and specificity of 0.755. New compounds from the DILIst dataset25 (338 compounds) were also utilized as an external test set to validate the performance of these models (Table 3, Figure S4).

Table 3.

Using the DILIRank and DILIst dataset as an external test set to validate models.

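Model | Statistic | AC-Concern-4 | AC-Severity-4 | DILIst
AC-BDDCS-2 | ROC | 0.777 | 0.714 | -
AC-BDDCS-2 | Sensitivity | 0.642 | 0.556 | -
AC-BDDCS-2 | Specificity | 0.834 | 0.812 | -
AC-AZ-2 | ROC | 0.602 | 0.599 | -
AC-AZ-2 | Sensitivity | 0.682 | 0.725 | -
AC-AZ-2 | Specificity | 0.496 | 0.438 | -
AC-AZ-3 | ROC | 0.630 | 0.566 | -
AC-AZ-3 | Sensitivity | 0.493 | 0.437 | -
AC-AZ-3 | Specificity | 0.734 | 0.690 | -
AC-PF-Par-4 | ROC | 0.629 | 0.617 | -
AC-PF-Par-4 | Sensitivity | 0.652 | 0.489 | -
AC-PF-Par-4 | Specificity | 0.587 | 0.727 | -
AC-PF-Par-8 | ROC | 0.635 | 0.591 | -
AC-PF-Par-8 | Sensitivity | 0.536 | 0.333 | -
AC-PF-Par-8 | Specificity | 0.743 | 0.847 | -
AC-PF-Ro2–3 | ROC | 0.672 | 0.645 | -
AC-PF-Ro2–3 | Sensitivity | 0.634 | 0.537 | -
AC-PF-Ro2–3 | Specificity | 0.719 | 0.750 | -
AC-PF-Ro2–8 | ROC | 0.623 | 0.583 | -
AC-PF-Ro2–8 | Sensitivity | 0.473 | 0.390 | -
AC-PF-Ro2–8 | Specificity | 0.775 | 0.773 | -
AC-Concern-3 | ROC | - | - | 0.619
AC-Concern-3 | Sensitivity | - | - | 0.617
AC-Concern-3 | Specificity | - | - | 0.526
AC-Concern-4 | ROC | - | - | 0.584
AC-Concern-4 | Sensitivity | - | - | 0.486
AC-Concern-4 | Specificity | - | - | 0.577
AC-Severity-1 | ROC | - | - | 0.609
AC-Severity-1 | Sensitivity | - | - | 0.672
AC-Severity-1 | Specificity | - | - | 0.443
AC-Severity-4 | ROC | - | - | 0.632
AC-Severity-4 | Sensitivity | - | - | 0.428
AC-Severity-4 | Specificity | - | - | 0.776
AC-Severity-6 | ROC | - | - | 0.592
AC-Severity-6 | Sensitivity | - | - | 0.579
AC-Severity-6 | Specificity | - | - | 0.541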

BDDCS was another classification system of interest because of its simplicity and focus on only two physicochemical parameters. In this model, Class 2 drugs (Most-DILI) were considered active and other classes were considered inactive (i.e. not causing DILI). The sensitivity, specificity, accuracy, and ROC of this model were 0.631, 0.840, 0.780 and 0.789, respectively. The DILIRank training datasets were utilized as external test sets to validate the performance of this model. A total of 258 compounds with vDILI-Concern scores and 344 compounds with DILI-severity scores were predicted (Table 3, Figure S2, Figure S3). The BDDCS model performed slightly better at predicting active DILI-Concern compounds, with an ROC of 0.777 and sensitivity of 0.642, than at predicting DILI-Severity compounds.

Additionally, two models were built with the AstraZeneca data (proprietary classification) and four were built with the Pfizer data (partition model and rule-of-2 classifications). Sensitivity, specificity, accuracy, and ROC scores for all of the models are included in Table 2 and Supplemental Figure S1 (A, B, I–L). Of these, AC-PF-Ro2–8 demonstrated the best internal performance, with an ROC of 0.840. External validation of these models, predicting Most-DILI compounds from the DILIRank DILI-concern training model, is depicted in Table 3, Figure S2 (A–F) and Figure S3 (A–F). All of these models had good specificity (around 0.7) but low precision, sensitivity, F1, Cohen’s Kappa and MCC scores.

Fourteen compounds from Siramshetty et al.,42 Hong et al.,24 and Aleo et al.6 were subsequently used as external test sets to validate all Assay Central™ Bayesian models (Table 4). All DILIRank models correctly classified coumarin, temafloxacin, and etifoxine as DILI-causing drugs. Most models classified beclobrate and medifoxamine as DILI-causing, while febarbamate is considered DILI-inactive across all models. Tacrine, erythromycin estolate, sulfamethoxazole, and zalcitabine were all compounds considered to be Most-DILI by Pfizer based on their literature review6. While some of their models misclassify these compounds, all Bayesian DILIRank models predict them as active (DILI-causing), with AC-Concern-4 performing the best. However, tacrine, erythromycin estolate, and zalcitabine were inaccurately classified as DILI-inactive by the BDDCS model. Hong et al. worked on a two-class model and reported the internal performance as follows: accuracy 72.9%, sensitivity 62.8%, specificity 79.8%24. Our models generated with DILIRank data classified all four compounds analyzed by Hong et al. similarly to their two-class model.

Table 4.

Predicting and Evaluating Compounds from Hong et al.24, Aleo et al.,6 and Withdrawn41 databases.

Name | AC-Concern-3 | AC-Concern-4 | AC-Severity-1 | AC-Severity-4 | AC-Severity-6 | AC-BDDCS-2 | AC-AZ-2 | AC-AZ-3 | AC-PF-Par-4 | AC-PF-Par-8 | AC-PF-Ro2–3 | AC-PF-Ro2–8
(training set) | 709 comp. | 709 comp. | 938 comp. | 938 comp. | 938 comp. | 914 comp. | 96 comp. | 96 comp. | 221 comp. | 221 comp. | 221 comp. | 221 comp.
Solithromycin | 0.993 | 1.730 | 0.853 | 0.819 | 0.826 | 0.593 | −0.641 | 0.354 | 1.741 | 3.399 | 1.181 | 3.288
Paritaprevir | 0.601 | 0.530 | 0.586 | 0.522 | 0.589 | 0.550 | −0.228 | 0.500 | 0.508 | 0.665 | 0.437 | 0.730
Ombitasvir | 0.285 | 0.137 | 0.440 | 0.410 | 0.500 | 0.464 | 0.842 | 0.644 | 0.452 | 0.650 | 0.406 | 0.642
Dasabuvir | 0.658 | 0.927 | 0.613 | 0.567 | 0.600 | 0.695 | 1.092 | 0.842 | 0.685 | 0.868 | 0.577 | 0.847
Tacrine | 0.589 | 0.701 | 0.560 | 0.509 | 0.554 | 0.373 | 1.373 | 0.229 | 0.798 | 0.160 | 0.659 | 0.238
Erythromycin Estolate | 1.050 | 1.748 | 0.865 | 0.850 | 0.794 | −0.135 | −0.277 | 0.475 | 1.698 | 2.069 | 0.680 | 2.136
Sulfamethoxazole | 0.689 | 0.847 | 0.604 | 0.539 | 0.571 | 0.656 | 0.855 | 0.899 | −0.038 | 0.369 | 0.266 | 0.379
Zalcitabine | 0.624 | 0.809 | 0.597 | 0.522 | 0.543 | 0.274 | 0.274 | 0.676 | −0.554 | −0.3423 | −0.063 | −0.212
Coumarin | 0.599 | 0.680 | 0.559 | 0.528 | 0.550 | 0.509 | 0.639 | 0.569 | 0.491 | 0.406 | 0.450 | 0.418
Beclobrate | 0.569 | 0.508 | 0.541 | 0.479 | 0.511 | 0.578 | 0.174 | 0.643 | 0.684 | 0.513 | 0.531 | 0.521
Temafloxacin | 0.786 | 1.148 | 0.726 | 0.678 | 0.696 | 0.337 | −0.005 | 0.469 | 0.457 | 0.085 | 0.438 | 0.148
Febarbamate | 0.460 | 0.432 | 0.481 | 0.466 | 0.486 | 0.440 | 0.062 | 0.522 | 0.347 | 0.086 | 0.362 | 0.129
Etifoxine | 0.534 | 0.670 | 0.521 | 0.516 | 0.545 | 0.535 | 0.297 | 0.644 | 0.505 | 0.461 | 0.467 | 0.458
Medifoxamine | 0.524 | 0.550 | 0.513 | 0.489 | 0.534 | 0.451 | −0.175 | 0.247 | 0.472 | 0.228 | 0.455 | 0.279

red= score higher than 0.7; orange= score between 0.5 and 0.7; cyan= score between 0.3 and 0.5; blue= scores lower than 0.3; white= compounds were already in the training set of the respective models.
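
Since the cell colors of Table 4 are not visible in plain text, the legend can be applied to any score with a small helper such as the sketch below. The function name is hypothetical, and the handling of values that fall exactly on 0.3, 0.5 or 0.7 is an assumption because the legend does not say which band includes the boundary; the white cells (compounds already in the training set) cannot be recovered from the scores alone.

```python
def score_band(score: float) -> str:
    """Map a prediction score to the color band of the Table 4 legend."""
    if score > 0.7:
        return "red"
    if score >= 0.5:  # boundary handling assumed
        return "orange"
    if score >= 0.3:  # boundary handling assumed
        return "cyan"
    return "blue"
```

Because the scores are probability-like rather than true probabilities, values above 1 or below 0 occur in the table and fall into the red and blue bands, respectively.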

Prospective predictions for newly FDA approved drugs

After generating and validating various Bayesian models, predictions are provided for 99 FDA approved drugs from 2017 to 2019 (Table S7a, b, c). Out of all these FDA approved drugs, only acalabrutinib, ribociclib, pretomanid, and pexidartinib had a LiverTox ranking higher than E and are thus the focus of our evaluation.

Acalabrutinib, the active ingredient for a drug approved in 2017, has a D LiverTox score as it has been associated with the reactivation of hepatitis B52. All the DILIRank models as well as the BDDCS model correctly identified acalabrutinib as DILI-causing, while all models built with Pfizer data predict it as inactive (no-DILI). The two Bayesian AstraZeneca models have conflicting predictions for this chemical, but their applicability scores were low (data not shown). Ribociclib was approved in 2017 with a C LiverTox score and has reported cases of jaundice53. Only the DILIRank models identified this compound as DILI-causing, while the rest incorrectly predicted it as no-DILI. We would also like to draw attention to neratinib maleate, delafloxacin, and brigatinib, three drugs with an E* rank and active ingredient prediction scores higher than 0.7 from both AC-Concern-3 and AC-Concern-4. Interestingly, no drugs approved in 2018 had a LiverTox rank that would be of concern, but many of these compounds were found to potentially cause DILI (Table S7). Gilteritinib was the only drug with an E* rank whose prediction score was above 0.7 for the two DILI-concern models.

While most drugs approved in 2019 did not have a LiverTox rank, as they have been in clinical use for a short time only, pretomanid and pexidartinib have a D and B rank respectively. Pretomanid was correctly identified as DILI-causing by all the DILI models except the AC-AZ-2 model. However, predictions for pexidartinib were inconsistent and largely fell at the 0.5 score cutoff. Additional predictions were generated for the remaining compounds as well as drugs reported for liver injury to the FDA through Reports (Table S8) and MedWatch (Table S9).

Comparisons of machine learning models

Models were generated with alternative machine learning algorithms to compare how internal five-fold cross-validation performance was affected. Our comparison of different machine learning approaches shows variability in individual model statistics across all datasets (Figure 2). Yet when comparing rank normalized scores of different algorithms, Assay Central™ performs the best, followed by support vector classification (Figure 3A). These differences are more apparent with the ΔRNS metric, which shows the Bayesian Assay Central™ algorithm and support vector classification as superior methods when evaluating internal five-fold cross-validation statistics (Figure 3B, Tables S1–S6).

Figure 2.

Comparison of different machine learning algorithms for DILI prediction. Radar plots depict the metrics resulting from five-fold cross-validation. AC = Assay Central (Bayesian), rf = Random Forest, knn = k-Nearest Neighbors, svc = Support Vector Classification, bnb = Naïve Bayesian, ada = AdaBoosted Decision Trees, DL = Deep Learning.

Figure 3.

Machine learning algorithm comparisons across multiple five-fold cross-validation metrics. A) Rank normalized scores and B) ΔRNS. Box and whisker plots show individual points for those values that fall outside of the 5–95 percentile. AC = Assay Central (Bayesian), rf = Random Forest, knn = k-Nearest Neighbors, svc = Support Vector Classification, bnb = Naïve Bayesian, ada = AdaBoosted Decision Trees, DL = Deep Learning.

DISCUSSION

Building on many years of research in predicting hepatotoxicity, there has now been over a decade of work on predicting DILI using in vitro and in silico approaches. Examples include our first efforts using Bayesian machine learning21 and the more recent publication from AstraZeneca leveraging internal in vitro data28. Various other groups have used in vivo data to generate DILI predictions17, 22–24, and others have devised decision algorithms based on their in vitro data, such as the work from Pfizer6, or based on physicochemical descriptors like BDDCS27.

Overall, there have been numerous machine learning studies applied to DILI, which will likely be reviewed at length in the future; we provide some examples here. Machine learning methods such as support vector machine (SVM), artificial neural network (ANN), and random forest (RF) were developed using molecular descriptors, biological descriptors or a combination of both54. For example, work from Zhang et al. consisted of a combination of 1317 molecules collected from publications and used with five machine learning methods with MACCS and FP4 fingerprints. The support vector machine algorithm produced the best accuracy of 75% for external validation of 88 compounds from the LTKB55. Kim et al. worked on random forest and support vector machine DILI models trained with weighted fingerprints that produced accuracies around 60%56. In 2018, Ai et al. also worked on a dataset of 1241 compounds used with 3 machine learning algorithms and 12 fingerprint types to create an ensemble model. This was able to produce an area under the curve score of 0.90 for an external test set of 286 compounds from the LTKB57. More recently, He et al. used 1254 compounds with DILI data and produced an ensemble method using Marvin descriptors that achieved a training accuracy of 0.7858. Overall, various other groups have been working on generating models that can predict the DILI potential of compounds based on structure alone17–20, demonstrating the significance of this topic as well as that of in silico work, which can bypass expensive or arduous in vivo testing.

However, this field is in constant need for more data and better classification systems. Building models with the most up to date classification algorithms is accompanied by challenges in finding datasets to validate the models. DILI models have used various classification systems with different algorithms, and many types of fingerprints or descriptors. In this study, we have attempted to solve these challenges by carefully selecting compounds for external validation and using several algorithms to build models with the latest datasets using one descriptor type.

The performance of our models was tested through several external datasets (Table 4). Four compounds analyzed by Hong et al. [24] (solithromycin, paritaprevir, ombitasvir, and dasabuvir) received the same predictions as in their two-class model. Of those four, ombitasvir was classified as Most-DILI by their three-class model, but the confidence for that prediction was low (0.2%). Paritaprevir was classified as Less-DILI by the three-class model, and we observe lower prediction scores for that compound. Additionally, most DILI models classified the Withdrawn Database’s beclobrate and medifoxamine as DILI-causing, while febarbamate was considered no-DILI across all Bayesian models. The withdrawal of the latter was associated with cases of liver injury from Atrium, which is a fixed preparation containing phenobarbital, febarbamate, and difebarbamate [59, 60]. It is possible that the liver injury was not caused by febarbamate but by the other two compounds in the mixture, or that the toxicity was the result of drug-drug interactions. Phenobarbital has been found to cause hepatotoxicity in dogs [61], lending support to this hypothesis. Finally, the four compounds categorized as Most-DILI by Aleo et al. [6] (erythromycin estolate, sulfamethoxazole, tacrine, and zalcitabine) were correctly predicted by all the DILIRank models, while Pfizer’s own classification system misclassifies some of them.

While the recent Pfizer and AstraZeneca DILI datasets summarize important in vitro data, the total number of compounds classified by each is relatively small (221 and 96 drugs, respectively). A smaller training set is usually associated with a lower overlap of molecular fingerprints between the training set and the test set, because a smaller chemical property space is covered. This is a very important consideration for structure-based in silico models and may result in low applicability scores, such as those observed in Table S9. In this study, most of the compounds predicted by the AstraZeneca and Pfizer Bayesian Assay Central™ models have a low applicability score, likely due to the relatively small size of these models. This score measures the representation of the chemicals’ molecular fingerprints in the model, and therefore the confidence in these predictions is lower than in those from larger datasets such as the BDDCS and DILIRank. Future analysis of this topic may be worthwhile and could leverage recent work on other methods to measure model applicability domain [62].
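The link between training set size and applicability score can be illustrated with a minimal sketch. It assumes a simple definition, namely the fraction of a query compound's fingerprint features that also occur anywhere in the training set. This definition, the function name, and the toy fingerprints are illustrative assumptions and are not the Assay Central™ implementation.

```python
def applicability_score(query_bits, training_fps):
    """Fraction of the query's fingerprint bits seen in any training compound.

    Assumed definition for illustration only; fingerprints are sets of
    integer feature identifiers.
    """
    if not query_bits:
        return 0.0
    seen = set().union(*training_fps)  # every feature present in the training set
    return len(query_bits & seen) / len(query_bits)


# Hypothetical fingerprints: a small training set and a larger superset of it.
small_training = [{1, 2, 3}, {2, 4}]
large_training = small_training + [{5, 6, 7}, {8, 9}]
query = {2, 5, 8, 11}

print(applicability_score(query, small_training))  # 0.25
print(applicability_score(query, large_training))  # 0.75
```

In this toy case the larger training set covers more of the query's features, so the score rises, which mirrors the expectation that predictions from larger models fall more often inside the applicability domain.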

As mentioned earlier, the models AC-Concern-3 and AC-Severity-1 had the highest ROC scores of all the DILIRank models. The active threshold for both of them is the lowest in their respective categories, vDILI-Concern and DILI Severity Class, resulting in a higher ratio of active to inactive compounds in the model. The greater prevalence of active fingerprint contributions that results from this balance is reflected in the increased internal performance statistics. The predictions of more balanced models are usually more trustworthy, as has been shown previously [63]. It is important to note that models with low thresholds, such as AC-AZ-2, or high thresholds, such as AC-Severity-1, have a high ratio of active compounds to total compounds, which directly affects the true positive rate (TPR) and true negative rate (TNR).
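How the choice of active threshold changes class balance, and how TPR and TNR are obtained from predictions, can be sketched as follows. The severity values, labels, predictions, and function names are hypothetical, and the sketch assumes that a compound is active when its class value is at or above the threshold.

```python
def active_fraction(class_values, threshold):
    """Share of compounds labelled active when value >= threshold (assumed rule)."""
    return sum(1 for v in class_values if v >= threshold) / len(class_values)


def tpr_tnr(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels coded 1/0."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tpr = tp / (tp + fn) if (tp + fn) else float("nan")
    tnr = tn / (tn + fp) if (tn + fp) else float("nan")
    return tpr, tnr


# Hypothetical ordinal severity values for eight compounds.
severity = [0, 0, 1, 2, 3, 5, 8, 8]
print(active_fraction(severity, threshold=1))  # 0.75
print(active_fraction(severity, threshold=5))  # 0.375

# Hypothetical true labels and model predictions.
y_true = [1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 1]
print(tpr_tnr(y_true, y_pred))  # (0.75, 0.5)
```

With few inactive compounds, as in the second example, a single misclassified inactive shifts the TNR substantially, which is one reason class balance matters when comparing these statistics across models.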

As demonstrated in Table 3, the AC-Concern and AC-Severity models were also assessed through external validation with 338 new compounds from the DILIst dataset [25], resulting in ROCs ranging from 0.58 to 0.63. We expect to iteratively improve the models, as well as these statistics, with new data and will continue to monitor studies in the field of toxicology that can elucidate the complexity of DILI. Ideally, such new data would form a dataset that takes into consideration the dependence of DILI on genetic, immunologic, and metabolic factors. In the meantime, machine learning predictions can help to cost-effectively prioritize compounds for testing or filtering in early drug discovery, a process that would otherwise rely solely on in vivo data or the monitoring of patient symptoms.
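For readers less familiar with the ROC statistic reported for external validation, the following minimal sketch computes the area under the ROC curve as the probability that a randomly chosen active compound receives a higher score than a randomly chosen inactive one, counting ties as one half. The labels and scores are hypothetical and the function is an illustration, not the code used in this study.

```python
def roc_auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison of actives and inactives."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    # Each active/inactive pair scores 1 if ranked correctly, 0.5 if tied.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical external test set: known labels and model prediction scores.
y_true = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.4, 0.5, 0.2, 0.7, 0.4]
print(round(roc_auc(y_true, scores), 3))  # 0.833
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation of actives from inactives.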

Comparing different machine learning models in this study across different datasets suggests the Assay Central™ method is on a par with more advanced methods like support vector classification and deep learning, consistent with our earlier comparisons [32–35]. Efforts to model single properties thought to be important components of DILI, such as BSEP inhibition, using the Bayesian algorithm in Assay Central™ also suggest these models may perform similarly across all datasets (Figure S5), and this is worthy of further exploration in the future.

In summary, these combined efforts to generate machine learning models for differently sized DILI-related datasets suggest an ability to classify DILI potential that is at least on a par with the in vitro datasets used for assigning DILI probability in recent studies (57.9–89.5% accuracy in primary hepatocytes and 53–73% accuracy in 3D spherical human liver microtissues) [27]. Our recommendation would be that larger training datasets, such as those from DILIRank, be used to train machine learning models and that, after sufficient validation, these models be used to prioritize compounds for in vitro DILI testing from molecular structure alone. These predictions can also be further validated by some of the classification methods described by Pfizer [6] and AstraZeneca [28]. These DILI models are the foundation for our software MegaTox™, which compiles an array of ADME/Tox properties.

Supplementary Material

supplemental data

ACKNOWLEDGMENTS

We acknowledge Dr. Alex Clark (Molecular Materials Informatics) for Assay Central™ support and Mr. Valery Tkachenko for machine learning support.

Grant information

We kindly acknowledge NIH funding to develop the software from NIGMS (R44GM122196-02A1, “Centralized assay datasets for modelling support of small drug discovery organizations”) and from NIEHS (1R43ES031038-01, “MegaTox for analyzing and visualizing data across different screening systems”). Research reported in this publication was supported by the National Institute of Environmental Health Sciences of the National Institutes of Health under Award Number R43ES031038. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

ABBREVIATIONS USED

BSEP: Bile Salt Export Pump
DILI: Drug-Induced Liver Injury
LTKB-BD: Liver Toxicity Knowledge Base Benchmark Dataset
AC: Assay Central
MCC: Matthews Correlation Coefficient
AUC: area under the receiver operating characteristic curve
CK: Cohen’s Kappa
rf: Random forest
knn: k-Nearest Neighbors
svc: support vector classification
bnb: Bernoulli Naive Bayes
ada: AdaBoost Decision Trees
DL: Deep Neural Networks

Footnotes

Competing interests:

S.E. is the owner and all other authors are employees of Collaborations Pharmaceuticals, Inc.

SUPPORTING INFORMATION

Further details on the models, structures of public molecules, and computational models are available as Supporting Information. This material is available free of charge via the Internet at http://pubs.acs.org.

REFERENCES

  • 1. Olson H; Betton G; Robinson D; Thomas K; Monro A; Kolaja G; Lilly P; Sanders J; Sipes G; Bracken W; Dorato M; Van Deun K; Smith P; Berger B; Heller A, Concordance of the toxicity of pharmaceuticals in humans and in animals. Regul Toxicol Pharmacol 2000, 32 (1), 56–67.
  • 2. Meunier L; Larrey D, Drug-Induced Liver Injury: Biomarkers, Requirements, Candidates, and Validation. Front Pharmacol 2019, 10, 1482.
  • 3. Weaver RJ; Blomme EA; Chadwick AE; Copple IM; Gerets HHJ; Goldring CE; Guillouzo A; Hewitt PG; Ingelman-Sundberg M; Jensen KG; Juhila S; Klingmuller U; Labbe G; Liguori MJ; Lovatt CA; Morgan P; Naisbitt DJ; Pieters RHH; Snoeys J; van de Water B; Williams DP; Park BK, Managing the challenge of drug-induced liver injury: a roadmap for the development and deployment of preclinical predictive models. Nat Rev Drug Discov 2020, 19 (2), 131–148.
  • 4. Warner DJ; Chen H; Cantin LD; Kenna JG; Stahl S; Walker CL; Noeske T, Mitigating the inhibition of human bile salt export pump by drugs: opportunities provided by physicochemical property modulation, in silico modeling, and structural modification. Drug Metab Dispos 2012, 40 (12), 2332–41.
  • 5. Kotsampasakou E; Ecker GF, Predicting Drug-Induced Cholestasis with the Help of Hepatic Transporters-An in Silico Modeling Approach. J Chem Inf Model 2017, 57 (3), 608–615.
  • 6. Aleo MD; Shah F; Allen S; Barton HA; Costales C; Lazzaro S; Leung L; Nilson A; Obach RS; Rodrigues AD; Will Y, Moving beyond Binary Predictions of Human Drug-Induced Liver Injury (DILI) toward Contrasting Relative Risk Potential. Chem Res Toxicol 2020, 33 (1), 223–238.
  • 7. Chen M; Borlak J; Tong W, High lipophilicity and high daily dose of oral medications are associated with significant risk for drug-induced liver injury. Hepatology 2013, 58 (1), 388–96.
  • 8. Njoku DB, Drug-induced hepatotoxicity: metabolic, genetic and immunological basis. Int J Mol Sci 2014, 15 (4), 6990–7003.
  • 9. Chen M; Suzuki A; Borlak J; Andrade RJ; Lucena MI, Drug-induced liver injury: Interactions between drug properties and host factors. J Hepatol 2015, 63 (2), 503–14.
  • 10. Dahlin DC; Miwa GT; Lu AY; Nelson SD, N-acetyl-p-benzoquinone imine: a cytochrome P-450-mediated oxidation product of acetaminophen. Proc Natl Acad Sci U S A 1984, 81 (5), 1327–31.
  • 11. Lecoeur S; Bonierbale E; Challine D; Gautier JC; Valadon P; Dansette PM; Catinot R; Ballet F; Mansuy D; Beaune PH, Specificity of in vitro covalent binding of tienilic acid metabolites to human liver microsomes in relationship to the type of hepatotoxicity: comparison with two directly hepatotoxic drugs. Chem Res Toxicol 1994, 7 (3), 434–42.
  • 12. Cook JC; Wu H; Aleo MD; Adkins K, Principles of precision medicine and its application in toxicology. J Toxicol Sci 2018, 43 (10), 565–577.
  • 13. Choi S, Nefazodone (Serzone) withdrawn because of hepatotoxicity. CMAJ 2003, 169 (11), 1187.
  • 14. Mendrick DL, Toxicogenomics and classic toxicology: how to improve prediction and mechanistic understanding of human toxicity. Methods Mol Biol 2008, 460, 1–22.
  • 15. Watkins PB, Quantitative Systems Toxicology Approaches to Understand and Predict Drug-Induced Liver Injury. Clin Liver Dis 2020, 24 (1), 49–60.
  • 16. McGill MR; Jaeschke H, Biomarkers of drug-induced liver injury. Adv Pharmacol 2019, 85, 221–239.
  • 17. Liu Z; Shi Q; Ding D; Kelly R; Fang H; Tong W, Translating clinical findings into knowledge in drug safety evaluation--drug induced liver injury prediction system (DILIps). PLoS Comput Biol 2011, 7 (12), e1002310.
  • 18. Zhang S; Golbraikh A; Oloff S; Kohn H; Tropsha A, A novel automated lazy learning QSAR (ALL-QSAR) approach: method development, applications, and virtual screening of chemical databases using validated ALL-QSAR models. J Chem Inf Model 2006, 46 (5), 1984–95.
  • 19. Kotsampasakou E; Montanari F; Ecker GF, Predicting drug-induced liver injury: The importance of data curation. Toxicology 2017, 389, 139–145.
  • 20. Xu Y; Dai Z; Chen F; Gao S; Pei J; Lai L, Deep Learning for Drug-Induced Liver Injury. J Chem Inf Model 2015, 55 (10), 2085–93.
  • 21. Ekins S; Williams AJ; Xu JJ, A predictive ligand-based Bayesian model for human drug-induced liver injury. Drug Metab Dispos 2010, 38 (12), 2302–8.
  • 22. Chen M; Suzuki A; Thakkar S; Yu K; Hu C; Tong W, DILIrank: the largest reference drug list ranked by the risk for developing drug-induced liver injury in humans. Drug Discov Today 2016, 21 (4), 648–53.
  • 23. Chen M; Vijay V; Shi Q; Liu Z; Fang H; Tong W, FDA-approved drug labeling for the study of drug-induced liver injury. Drug Discov Today 2011, 16 (15–16), 697–703.
  • 24. Hong H; Thakkar S; Chen M; Tong W, Development of Decision Forest Models for Prediction of Drug-Induced Liver Injury in Humans Using A Large Set of FDA-approved Drugs. Sci Rep 2017, 7 (1), 17311.
  • 25. Thakkar S; Li T; Liu Z; Wu L; Roberts R; Tong W, Drug-induced liver injury severity and toxicity (DILIst): binary classification of 1279 drugs by human hepatotoxicity. Drug Discov Today 2020, 25 (1), 201–208.
  • 26. Wu CY; Benet LZ, Predicting drug disposition via application of BCS: transport/absorption/ elimination interplay and development of a biopharmaceutics drug disposition classification system. Pharm Res 2005, 22 (1), 11–23.
  • 27. Chan R; Benet LZ, Evaluation of the Relevance of DILI Predictive Hypotheses in Early Drug Development: Review of In Vitro Methodologies vs BDDCS Classification. Toxicol Res (Camb) 2018, 7 (3), 358–370.
  • 28. Williams DP; Lazic SE; Foster AJ; Semenova E; Morgan P, Predicting Drug-Induced Liver Injury with Bayesian Machine Learning. Chem Res Toxicol 2020, 33 (1), 239–248.
  • 29. Benet LZ; Broccatelli F; Oprea TI, BDDCS applied to over 900 drugs. AAPS J 2011, 13 (4), 519–47.
  • 30. Clark AM; Dole K; Coulon-Spektor A; McNutt A; Grass G; Freundlich JS; Reynolds RC; Ekins S, Open Source Bayesian Models. 1. Application to ADME/Tox and Drug Discovery Datasets. J Chem Inf Model 2015, 55 (6), 1231–45.
  • 31. Clark AM; Ekins S, Open Source Bayesian Models. 2. Mining a “Big Dataset” To Create and Validate Models with ChEMBL. J Chem Inf Model 2015, 55 (6), 1246–60.
  • 32. Lane T; Russo DP; Zorn KM; Clark AM; Korotcov A; Tkachenko V; Reynolds RC; Perryman AL; Freundlich JS; Ekins S, Comparing and Validating Machine Learning Models for Mycobacterium tuberculosis Drug Discovery. Mol Pharm 2018, 15 (10), 4346–4360.
  • 33. Russo DP; Zorn KM; Clark AM; Zhu H; Ekins S, Comparing Multiple Machine Learning Algorithms and Metrics for Estrogen Receptor Binding Prediction. Mol Pharm 2018, 15 (10), 4361–4370.
  • 34. Zorn KM; Lane TR; Russo DP; Clark AM; Makarov V; Ekins S, Multiple Machine Learning Comparisons of HIV Cell-based and Reverse Transcriptase Data Sets. Mol Pharm 2019, 16 (4), 1620–1632.
  • 35. Sandoval PJ; Zorn KM; Clark AM; Ekins S; Wright SH, Assessment of Substrate-Dependent Ligand Interactions at the Organic Cation Transporter OCT2 Using Six Model Substrates. Mol Pharmacol 2018, 94 (3), 1057–1068.
  • 36. Carletta J, Assessing agreement on classification tasks: The kappa statistic. Computational Linguistics 1996, 22, 249–254.
  • 37. Cohen J, A coefficient of agreement for nominal scales. Education and Psychological Measurement 1960, 20, 37–46.
  • 38. Matthews BW, Comparison of the predicted and observed secondary structure of T4 phage lysozyme. Biochimica et biophysica acta 1975, 405 (2), 442–51.
  • 39. Korotcov A; Tkachenko V; Russo DP; Ekins S, Comparison of Deep Learning With Multiple Machine Learning Methods and Metrics Using Diverse Drug Discovery Data Sets. Mol Pharm 2017, 14 (12), 4462–4475.
  • 40. Caruana R; Niculescu-Mizil A, An empirical comparison of supervised learning algorithms. In 23rd International Conference on Machine Learning, Pittsburgh, PA, 2006.
  • 41. Anon WITHDRAWN: A Resource for Withdrawn and Discontinued Drugs. http://cheminfo.charite.de/withdrawn/.
  • 42. Siramshetty VB; Nickel J; Omieczynski C; Gohlke BO; Drwal MN; Preissner R, WITHDRAWN--a resource for withdrawn and discontinued drugs. Nucleic Acids Res 2016, 44 (D1), D1080–6.
  • 43. FDA New Drugs at FDA: CDER’s New Molecular Entities and New Therapeutic Biological Products. https://www.fda.gov/drugs/development-approval-process-drugs/new-drugs-fda-cders-new-molecular-entities-and-new-therapeutic-biological-products.
  • 44. Anon MedWatch: The FDA Safety Information and Adverse Event Reporting Program. https://www.fda.gov/safety/medwatch-fda-safety-information-and-adverse-event-reporting-program.
  • 45. FDA Potential Signals of Serious Risks/New Safety Information Identified from the FDA Adverse Event Reporting System (FAERS). https://www.fda.gov/drugs/questions-and-answers-fdas-adverse-event-reporting-system-faers/potential-signals-serious-risksnew-safety-information-identified-fda-adverse-event-reporting-system.
  • 46. FDA Drug Recalls. https://www.fda.gov/drugs/drug-safety-and-availability/drug-recalls.
  • 47. Williams AJ; Grulke CM; Edwards J; McEachran AD; Mansouri K; Baker NC; Patlewicz G; Shah I; Wambaugh JF; Judson RS; Richard AM, The CompTox Chemistry Dashboard: a community data resource for environmental chemistry. J Cheminform 2017, 9 (1), 61.
  • 48. Wishart DS; Feunang YD; Guo AC; Lo EJ; Marcu A; Grant JR; Sajed T; Johnson D; Li C; Sayeeda Z; Assempour N; Iynkkaran I; Liu Y; Maciejewski A; Gale N; Wilson A; Chin L; Cummings R; Le D; Pon A; Knox C; Wilson M, DrugBank 5.0: a major update to the DrugBank database for 2018. Nucleic Acids Res 2018, 46 (D1), D1074–D1082.
  • 49. Wang Y; Cheng T; Bryant SH, PubChem BioAssay: A Decade’s Development toward Open High-Throughput Screening Data Sharing. SLAS Discov 2017, 22 (6), 655–666.
  • 50. Anon LiverTox. https://www.ncbi.nlm.nih.gov/books/NBK547852/.
  • 51. Anon Categorization Of The Likelihood Of Drug Induced Liver Injury. https://www.ncbi.nlm.nih.gov/books/NBK548392/.
  • 52. Herishanu Y; Katchman H; Polliack A, Severe hepatitis B virus reactivation related to ibrutinib monotherapy. Ann Hematol 2017, 96 (4), 689–690.
  • 53. Anon Ribociclib NDA. https://www.accessdata.fda.gov/drugsatfda_docs/nda/2017/209092Orig1s000MultidisciplineR.pdf.
  • 54. Muller C; Pekthong D; Alexandre E; Marcou G; Horvath D; Richert L; Varnek A, Prediction of drug induced liver injury using molecular and biological descriptors. Comb Chem High Throughput Screen 2015, 18 (3), 315–22.
  • 55. Zhang C; Cheng F; Li W; Liu G; Lee PW; Tang Y, In silico Prediction of Drug Induced Liver Toxicity Using Substructure Pattern Recognition Method. Mol Inform 2016, 35 (3–4), 136–44.
  • 56. Kim E; Nam H, Prediction models for drug-induced hepatotoxicity by using weighted molecular fingerprints. BMC Bioinformatics 2017, 18 (Suppl 7), 227.
  • 57. Ai H; Chen W; Zhang L; Huang L; Yin Z; Hu H; Zhao Q; Zhao J; Liu H, Predicting Drug-Induced Liver Injury Using Ensemble Learning Methods and Molecular Fingerprints. Toxicol Sci 2018, 165 (1), 100–107.
  • 58. He S; Ye T; Wang R; Zhang C; Zhang X; Sun G; Sun X, An In Silico Model for Predicting Drug-Induced Hepatotoxicity. Int J Mol Sci 2019, 20 (8).
  • 59. Horsmans Y; Lannes D; Pessayre D; Larrey D, Possible association between poor metabolism of mephenytoin and hepatotoxicity caused by Atrium, a fixed combination preparation containing phenobarbital, febarbamate and difebarbamate. J Hepatol 1994, 21 (6), 1075–9.
  • 60. Dumortier J; Bellemin B; Jacob P; Berger F; Chevallier M; Scoazec JY; Vial T, Liver injury due to tetrabamate (Atrium): an analysis of 11 cases. Eur J Gastroenterol Hepatol 2000, 12 (9), 1007–12.
  • 61. Muller PB; Taboada J; Hosgood G; Partington BP; VanSteenhouse JL; Taylor HW; Wolfsheimer KJ, Effects of long-term phenobarbital treatment on the liver in dogs. J Vet Intern Med 2000, 14 (2), 165–71.
  • 62. Carrio P; Pinto M; Ecker G; Sanz F; Pastor M, Applicability Domain ANalysis (ADAN): a robust method for assessing the reliability of drug property predictions. J Chem Inf Model 2014, 54 (5), 1500–11.
  • 63. Wei Q; Dunbrack RL Jr., The role of balanced training and testing data sets for binary classifiers in bioinformatics. PLoS One 2013, 8 (7), e67863.
