Abstract
As a critical issue in drug development and postmarketing safety surveillance, drug-induced liver injury (DILI) leads to failures in clinical trials as well as withdrawals of approved drugs from the market. It is therefore important to identify DILI compounds at early stages through in silico and in vivo studies. This remains difficult with conventional safety-testing methods because the predictive power of most existing frameworks is insufficient to address this pharmacological issue. In our study, we employ a computational framework inspired by natural language processing (NLP) that combines convolutional neural networks with molecular fingerprint-embedded features. Our development set and independent test set contain 1597 and 322 compounds, respectively. These samples were collected from previous studies and matched against established chemical databases to confirm structural validity. Our model achieves an average accuracy of 0.89, a Matthews correlation coefficient (MCC) of 0.80, and an AUC of 0.96. These results represent a significant improvement in AUC over the recent best model, a boost of 6.67% from 0.90 to 0.96. Our findings also suggest that the molecular fingerprint-embedded featurizer is an effective molecular representation for future biological and biochemical studies, alongside classic molecular fingerprints.
1. Introduction
In pharmacology, an orally administered drug must pass through four phases: absorption, distribution, metabolism, and excretion (abbreviated ADME). During metabolism, drug molecules are structurally transformed by cytochrome P450 family enzymes in the human liver.1 After this conversion, drug molecules have their original properties changed to adapt to the human body.2 Hepatotoxicity, however, is an unintended consequence of this phase because hepatocytes (liver cells) may be damaged to varying degrees depending on the dose and chemical type.3 The term drug-induced liver injury (DILI)4 is used when a large number of hepatocytes are chemically damaged. Many marketed drugs, such as acetaminophen,5 nefazodone,6 and trovafloxacin,7 carry warnings about causing DILI. Changes in the concentrations of hepatic enzymes, including aspartate aminotransferase, alanine aminotransferase, and γ-glutamyl transferase, provide diagnostic information on the condition of the liver (damaged or recovering).8 Once DILI occurs, identifying the hepatotoxin and terminating or reducing its dose are highly recommended before any specific medication for liver recovery is taken. Over the past few years, DILI has become one of the most concerning topics in drug discovery.9−11 An up-to-date search on PubMed using the keyword “DILI” indicates this research trend. Besides experimental approaches,12−15 computational studies16−24 on compounds causing DILI have also been conducted to partially address the limitations of in vitro and in vivo experiments.25
In 2015, Huang et al. performed virtual screening of DILI compounds found in traditional Chinese medicines (TCM) using random forests26−28 and molecular descriptors.17 Huang et al. used the LTKB dataset,29 a small dataset of FDA-labeled DILI compounds. However, an excessive number of built-in model features and the absence of hyperparameter tuning made their model prone to overfitting. In 2017, Kim and Nam proposed another computational framework to predict DILI compounds using two algorithms, random forest (RF) and support vector machine (SVM),30 with molecular fingerprints weighted by the Bayesian probability of each substructure.19 Intuitively, their model is more stable than Huang et al.’s model owing to a larger dataset and a clearly defined modeling process. Nevertheless, while their models obtained accuracies of 73.8 and 72.6% and AUC values of 0.79 and 0.77 on the validation set, test results were far lower, with accuracies of 60.1 and 61.1% for the RF and SVM models, respectively. In the same year, Tharwat et al. developed a classification model for DILI compounds using the whale optimization algorithm31 to optimize the parameters of an SVM model.20 They also processed the data with various techniques, including random undersampling,32 random oversampling,32 and the synthetic minority oversampling technique,33 to deal with class imbalance. Their dataset had only 553 DILI compounds represented by 31 molecular descriptors. Tharwat et al.’s proposed model is relatively robust and efficient. However, applying numerous preprocessing techniques to rebalance the classes of a dataset is not always a suitable option, especially for molecular data; excessive intervention in the original dataset may reduce its fair randomness and generalization. In 2018, Yang et al. reviewed a wide range of in silico toxicity studies using machine learning and structural alerts for various types of toxicity.21 Their study covered useful information about chemical toxicity data sources as well as recent computational approaches. Recently, Ai et al. introduced an ensemble model to better predict DILI compounds.22 They used 12 types of molecular fingerprints and three machine learning algorithms to create 36 corresponding classifiers and averaged the probabilistic values returned by each classifier. According to the reported findings, Ai et al.’s model is the current best model for predicting DILI compounds.
In this study, we introduce a more effective computational framework, inspired by the idea of word embeddings in natural language processing (NLP), that uses convolutional neural networks (CNN) combined with molecular fingerprint-embedded features to address the problem of screening DILI compounds. Chemical data were collected from Zhang et al.,18 Ai et al.,22 Liew et al.,34 Chen et al.,35 and Kotsampasakou et al.36 to obtain a development set and an independent test set after multiple stages of preprocessing.
2. Materials and Methods
2.1. Data Preparation
Compounds collected from Zhang et al.,18 Ai et al.,22 Liew et al.,34 and Kotsampasakou et al.36 were merged and duplicates were removed to build our development set. Compounds collected from Chen et al.35 (DILIrank benchmark dataset) were used as our independent test set. For Chen et al.’s dataset, only compounds labeled “most-DILI” and “non-DILI” were selected as positive and negative samples, respectively. Compounds in the independent test set that also appeared in the development set were removed to ensure no overlap between the two datasets. To check structural validity, all compounds were cross-referenced against PubChem, and unidentified compounds were discarded. Finally, we obtained 1597 samples (946 DILI compounds and 651 non-DILI compounds) for the development set and 322 samples (128 DILI compounds and 194 non-DILI compounds) for the independent test set (Table 1). All compounds were then converted into binary vectors in the form of Morgan molecular fingerprints (2048 bits, radius 2) using RDKit.37
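The fingerprint conversion described above can be reproduced with a few lines of RDKit. The following is a minimal sketch, not the authors' exact preprocessing script, assuming input structures are available as SMILES strings:

```python
# Minimal sketch: SMILES -> 2048-bit Morgan fingerprint (radius 2) with RDKit.
from rdkit import Chem
from rdkit.Chem import AllChem
import numpy as np

def smiles_to_morgan(smiles: str, n_bits: int = 2048, radius: int = 2):
    """Return a Morgan fingerprint as a NumPy array, or None if the SMILES is invalid."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:            # unidentified/invalid structures are discarded
        return None
    fp = AllChem.GetMorganFingerprintAsBitVect(mol, radius, nBits=n_bits)
    return np.array(fp, dtype=np.int8)

# Example: acetaminophen, one of the DILI-labeled drugs mentioned in the Introduction
fp = smiles_to_morgan("CC(=O)Nc1ccc(O)cc1")
print(fp.shape, int(fp.sum()))  # (2048,) and the number of "on" bits
```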
Table 1. Datasets for Model Development and Evaluation.

| data | DILI | non-DILI | total | data source |
|---|---|---|---|---|
| dev^a | 946 (∼60%) | 651 (∼40%) | 1597 | Liew et al.,34 Zhang et al.,18 Kotsampasakou et al.,36 and Ai et al.22 |
| test^b | 128 (∼40%) | 194 (∼60%) | 322 | Chen et al.35 |

^a Development set. ^b Independent test set.
2.2. Molecular Fingerprint-Embedded Features
Recently, deep learning (DL)38 has emerged as one of the most robust and effective computational advances for tackling problems across most disciplines. Its effectiveness has been confirmed by great achievements in face/voice/pattern recognition,39−41 natural language processing (NLP),42 bioinformatics,43 and drug discovery.44 Molecular fingerprint-embedded featurization is inspired by the idea of word embeddings in NLP, owing to the analogous characteristics of molecular and linguistic expression.45 In NLP, words are the smallest units that build up sentences, and each sentence can be considered a bag of words. To train an NLP model, these words need to be embedded into numerical forms. Any compound can be expressed as a molecular fingerprint (MF), which is a binary vector. The length (number of bits) of the binary vector may differ between MF types. MFs are usually converted directly from SMILES, a string-formatted molecular representation. MF generation algorithms read entire SMILES sequences to look for molecular substructures and assign “1” for the presence of a substructure and “0” for its absence. Since each MF type has a different dictionary of substructures, their substructure indexes are also distinct, even when the vector lengths are the same. There are various types of MFs, such as Morgan,46 MACCS,47 and BCI.48 To create MF-embedded vectors of a particular MF type from its binary vectors, the MF is viewed as a bag of molecular substructures in which each substructure is identified by an index. These substructure “indexes” are equivalent to “words” in a sentence and can therefore be embedded using similar methods. The idea of word embedding was first presented in Word2vec,42 a group of models for word embeddings developed by Mikolov et al. in 2013. To deal with chemical and biological data, Seq2vec,49 Mol2vec,50 and FP2VEC51 were introduced in 2016, 2018, and 2019, respectively. However, these approaches were designed to be tested on several benchmark datasets only, and more effort is needed to cope with a nonideal dataset from a specific topic. The samples in experimental chemical datasets may vary widely in length, class, and substructure counts. To address the problem of screening DILI compounds, we developed an effective computational framework inspired by these previous approaches and customized to our collected data.
Figure 1 illustrates our proposed method of creating MF-embedded matrices. In the present study, we used the Morgan fingerprint (n = 2048 bits, radius 2) to convert all chemical data from SMILES to binary vectors of length 2048. First, each binary vector is transformed into an “indexing” vector by collecting the indexes of all present substructures. The indexing vectors are sequences of integers and, because only present substructure indexes are picked, different compounds are expressed by indexing vectors of different lengths. Since the indexing vectors are the inputs of our model, they must have the same length; we chose the maximal length m = 220 because the longest indexing vector in our data has 220 entries, and shorter indexing vectors were zero-padded. The padded indexing vectors are then embedded into continuous matrices. The key point of MF embedding is an (n + 1) × k matrix, also simply termed a “lookup table”. To construct the lookup table, each molecular substructure is first expressed as a continuous vector of embedding dimension k, and these vectors are then stacked. The indexing vectors are turned into MF-embedded matrices by fetching the corresponding rows of the lookup table, with the padding index receiving a row of all zeros. For example, if the embedding vectors of the substructures with indexes 1, 4, 2047, and 2048 are [0.1, 0.2, 0.3, ..., 0.9], [0.1, 0.6, 0.7, ..., 0.9], [0.1, 0.9, 0.2, ..., 0.3], and [0.5, 0.2, 0.7, ..., 0.9], respectively, the matrix built for a compound containing exactly these four substructures is Membedded = [[0.1, 0.2, 0.3, ..., 0.9], [0.1, 0.6, 0.7, ..., 0.9], [0.1, 0.9, 0.2, ..., 0.3], [0.5, 0.2, 0.7, ..., 0.9], [0, 0, 0, ..., 0], ..., [0, 0, 0, ..., 0]]. The lookup table is initialized with random values that are iteratively updated during training to minimize the loss function; the initial values follow a uniform distribution with a mean of 0 and a standard deviation of 1. The lookup table is, in effect, the first layer of our neural network and performs the MF embedding. For compounds sharing many substructures, the distances between their embedded matrices are expected to be as small as possible, and vice versa for the embedded matrices of structurally dissimilar compounds.
Figure 1. Conversion from a fingerprint to an embedded matrix.
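To make the conversion in Figure 1 concrete, the sketch below illustrates the indexing, padding, and lookup-table steps using PyTorch's nn.Embedding as the (n + 1) × k lookup table. It assumes 1-based substructure indexes with 0 reserved for padding; the embedding values are random placeholders, not the trained table:

```python
# Minimal sketch, assuming 1-based substructure indexes (0 reserved for padding), of how
# a binary Morgan fingerprint becomes a padded indexing vector and an MF-embedded matrix.
import torch
import torch.nn as nn

N_BITS, MAX_LEN, EMB_DIM = 2048, 220, 220   # n, m, k in the text

def to_indexing_vector(fp_bits, max_len=MAX_LEN):
    """Collect the indexes of present substructures and zero-pad to a fixed length."""
    idx = [i + 1 for i, b in enumerate(fp_bits) if b == 1]   # shift by 1 so that 0 = padding
    return torch.tensor(idx + [0] * (max_len - len(idx)), dtype=torch.long)

# (n + 1) x k lookup table; row 0 (padding) is fixed at all zeros, the remaining rows are
# randomly initialized and iteratively updated during training to minimize the loss.
lookup = nn.Embedding(N_BITS + 1, EMB_DIM, padding_idx=0)

fp = torch.zeros(N_BITS, dtype=torch.long)
fp[[0, 3, 2046, 2047]] = 1                  # toy fingerprint with substructures 1, 4, 2047, 2048
indexing = to_indexing_vector(fp)           # shape: (220,)
embedded = lookup(indexing.unsqueeze(0))    # shape: (1, 220, 220) MF-embedded matrix
print(indexing[:6], embedded.shape)
```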
2.3. Model Architecture
Figure 2 summarizes our proposed model architecture. Overall, the model consists of one embedding layer, one convolutional block, and one fully connected block. The fully connected block consists of three fully connected layers. Except for the final layer of the fully connected block, all layers were designed with batch normalization. Batch normalization52 is an essential regularization technique that standardizes layer inputs and lessens the effect of weight initialization. The Adam optimizer53 was used to iteratively update the network weights with a learning rate of 0.0025. The leaky rectified linear unit (LeakyReLU) with a slope of 0.01 was used as the activation function of the convolutional layer and the first two fully connected layers; the sigmoid function was the activation function of the last layer. The model was trained over 20 epochs, and the optimal network was selected at the epoch displaying minimal validation loss. The loss function is the binary cross-entropy, expressed as
$$ L(y, \hat{y}) = -\left[ y \log(\hat{y}) + (1 - y) \log(1 - \hat{y}) \right] \qquad (1) $$
where y is the true label and ŷ is the predicted probability. The indexing vectors of size 1 × 220 were the inputs of the model. The embedding layer transformed the indexing vectors into embedded matrices, which were then normalized before passing through the next layer. The outputs of the embedding layer were embedded matrices of 220 × 220. The embedded matrices passed through a two-dimensional convolutional layer with l = 4096 filters and a kernel size of height (h) × embedding size (k) to obtain 4096 convoluted matrices of (m − h + 1) × 1 = 216 × 1. To convolute the input embedded matrices, a convolutional window of 5 × 220 with a stride of 1 and no zero-padding was used; the window slid down the square embedded matrices of 220 × 220. After sliding over an entire embedded matrix, the number of created feature maps (l) was equal to the number of applied filters (4096). These feature maps were continuous vectors of 216 × 1. A max-pooling layer with a kernel size of 216 × 1 was then applied, returning 4096 feature maps of 1 × 1 (nodes). These 4096 values were reshaped into a vector of 1 × 4096, which was the input of the fully connected block. The numbers of nodes in the three fully connected layers were 4096, 512, and 256, respectively. A dropout rate of 0.5 was applied to the first and second fully connected layers to prevent overfitting. In our experiments, all deep learning models were implemented using PyTorch 1.3.1 and trained on an i7-9700 CPU with 32 GB of RAM and one NVIDIA Titan Xp GPU. Training took about 2.4 s per epoch, and testing took about 0.25 s.
Figure 2. Architecture of the proposed deep learning model for screening DILI compounds.
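The following PyTorch sketch is one reading of the layer specification above, not the authors' released implementation (available in the repository listed in the Notes); in particular, it assumes the three fully connected layers map 4096 → 512 → 256 → 1 nodes:

```python
# Sketch of the described architecture under the stated assumptions; shapes follow the text.
import torch
import torch.nn as nn

class DILICNN(nn.Module):
    def __init__(self, n_bits=2048, max_len=220, emb_dim=220, n_filters=4096, kernel_h=5):
        super().__init__()
        self.embedding = nn.Embedding(n_bits + 1, emb_dim, padding_idx=0)
        self.conv = nn.Conv2d(1, n_filters, kernel_size=(kernel_h, emb_dim))   # 5 x 220 window
        self.bn_conv = nn.BatchNorm2d(n_filters)
        self.pool = nn.MaxPool2d(kernel_size=(max_len - kernel_h + 1, 1))      # 216 x 1 -> 1 x 1
        self.act = nn.LeakyReLU(0.01)
        self.fc1, self.bn1 = nn.Linear(n_filters, 512), nn.BatchNorm1d(512)
        self.fc2, self.bn2 = nn.Linear(512, 256), nn.BatchNorm1d(256)
        self.fc3 = nn.Linear(256, 1)
        self.drop = nn.Dropout(0.5)

    def forward(self, idx_vec):                   # idx_vec: (batch, 220) padded indexing vectors
        x = self.embedding(idx_vec).unsqueeze(1)  # (batch, 1, 220, 220) embedded matrices
        x = self.act(self.bn_conv(self.conv(x)))  # (batch, 4096, 216, 1) feature maps
        x = self.pool(x).flatten(1)               # (batch, 4096) after max pooling
        x = self.drop(self.act(self.bn1(self.fc1(x))))
        x = self.drop(self.act(self.bn2(self.fc2(x))))
        return torch.sigmoid(self.fc3(x))         # predicted probability of DILI

model = DILICNN()
criterion = nn.BCELoss()                                            # binary cross-entropy, eq 1
optimizer = torch.optim.Adam(model.parameters(), lr=0.0025)
```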
2.4. Evaluation Metrics
In this study, three evaluation metrics, including the area under the receiver operating characteristic curve (AUC), accuracy (ACC), and Matthews correlation coefficient (MCC), were used to evaluate the model performance. The mathematical expressions of these evaluation metrics are given below
$$ \mathrm{ACC} = \frac{TP + TN}{TP + TN + FP + FN} \qquad (2) $$

$$ \mathrm{MCC} = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}} \qquad (3) $$
where TP, FP, TN, and FN are the numbers of true positives, false positives, true negatives, and false negatives, respectively. These evaluation metrics may vary when the decision threshold for prediction is changed.
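For reference, the three metrics can be computed from predicted probabilities with scikit-learn; the snippet below uses toy values for illustration only:

```python
# Short sketch of the reported metrics, assuming y_true holds binary labels and
# y_prob the predicted probabilities returned by the model.
import numpy as np
from sklearn.metrics import roc_auc_score, accuracy_score, matthews_corrcoef

y_true = np.array([1, 0, 1, 1, 0, 0])            # toy labels (1 = DILI, 0 = non-DILI)
y_prob = np.array([0.9, 0.2, 0.7, 0.4, 0.1, 0.6])

auc = roc_auc_score(y_true, y_prob)              # threshold-independent
y_pred = (y_prob >= 0.5).astype(int)             # ACC and MCC depend on the decision threshold
acc = accuracy_score(y_true, y_pred)
mcc = matthews_corrcoef(y_true, y_pred)
print(f"AUC={auc:.2f}, ACC={acc:.2f}, MCC={mcc:.2f}")
```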
3. Results and Discussion
3.1. Model Evaluation
To train the model, 10% of the development set was randomly selected to form the validation set for model monitoring, while the rest of the development data were used for training. We performed the experiments 30 times with random sampling of the validation data. The results show that the training losses of most models converged at epoch 10 (16/30 trials), followed by epoch 9 (11/30 trials) and other epochs (3/30 trials). After epoch 9 or 10, the training losses continued to drop while the validation losses started to increase. The plots of training loss versus validation loss for the 30 trials are provided in the Supporting Information S1. All models were selected at the epoch where the validation loss was minimal. For each trial, the best model was then reloaded and trained for one, two, or three additional epochs on the entire development set with a learning rate of 0.001 to observe any possible improvements. Accordingly, four models corresponding to four setups were obtained for comparison: (i) the model at the converged epoch, (ii) the model with one additional training epoch, (iii) the model with two additional training epochs, and (iv) the model with three additional training epochs. The detailed evaluation metrics for the 30 trials of the four setups are summarized in the Supporting Information S2. Figure 3 shows the variation in the AUC over the 30 trials and the results of hypothesis testing using an independent t-test. The outcomes indicate that additional training for several epochs can significantly boost the model performance on the independent test set. Besides, the small variation in AUC, accuracy, and MCC over the 30 experimental trials suggests model stability and robustness. The models of setups (ii), (iii), and (iv) outperform the model of setup (i) (p-values < 0.0001). Meanwhile, the differences in AUC among the models of setups (ii), (iii), and (iv) are not statistically significant. The averaged MCC values of the models of setups (ii), (iii), and (iv) are about 0.80, 0.83, and 0.83, respectively, compared with 0.72 for the model of setup (i). The averaged accuracies are also boosted from about 0.84 for setup (i) to 0.89, 0.91, and 0.92 for setups (ii), (iii), and (iv), respectively. The AUC of about 0.96 is therefore highly meaningful for the problem of screening DILI compounds (Table 2).
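The trial protocol described above (90/10 random split, selection at the epoch with minimal validation loss, and one to three additional training epochs on the full development set at a learning rate of 0.001) can be sketched as follows; the helper functions random_split_90_10, train_one_epoch, and evaluate_loss are hypothetical placeholders, not part of the released code:

```python
# Schematic sketch of one experimental trial; helper functions are hypothetical placeholders.
import copy
import torch

def run_trial(model, dev_set, n_epochs=20, extra_epochs=1):
    train_set, val_set = random_split_90_10(dev_set)           # hypothetical helper
    optimizer = torch.optim.Adam(model.parameters(), lr=0.0025)
    best_state, best_val_loss = None, float("inf")
    for epoch in range(n_epochs):
        train_one_epoch(model, train_set, optimizer)           # hypothetical helper
        val_loss = evaluate_loss(model, val_set)               # hypothetical helper
        if val_loss < best_val_loss:                           # keep the epoch with minimal validation loss
            best_val_loss, best_state = val_loss, copy.deepcopy(model.state_dict())
    model.load_state_dict(best_state)                          # setup (i)
    optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
    for _ in range(extra_epochs):                              # setups (ii)-(iv): 1-3 extra epochs
        train_one_epoch(model, dev_set, optimizer)             # retrain on the entire development set
    return model
```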
Figure 3.
AUC values of the different setup models compared with each other and with the state-of-the-art method (****p < 0.00001, ***p < 0.0001, **p < 0.01, *p < 0.05; ns: not significant).
Table 2. Averaged AUC, Accuracy, and MCC Values on the Test Set Over 30 Trials with Different Chosen Models.
| metric | model (i)a | model (ii)b | model (iii)c | model (iv)d |
|---|---|---|---|---|
| AUC | 0.95 ± 0.01 | 0.96 ± 0.01 | 0.96 ± 0.01 | 0.96 ± 0.01 |
| ACC | 0.84 ± 0.06 | 0.89 ± 0.03 | 0.91 ± 0.01 | 0.92 ± 0.01 |
| MCC | 0.72 ± 0.09 | 0.80 ± 0.05 | 0.83 ± 0.02 | 0.83 ± 0.01 |
^a Model at the converged epoch (setup (i)). ^b Model with one additional training epoch (setup (ii)). ^c Model with two additional training epochs (setup (iii)). ^d Model with three additional training epochs (setup (iv)).
3.2. Comparative Analysis
Although there are many computational frameworks for hepatotoxicity prediction, only methods that had been cross-validated were selected for comparison. This comparative analysis does not aim to rank these models but to give readers a general view of how prediction frameworks have improved over time, because the previous state-of-the-art methods were developed under different conditions (amount of data, algorithms, and test methods) (Table 3). Moreover, our work also suggests developing more effective screening tools using recent computational advances. In general, our dataset, with 1919 compounds in total, is larger than those of the other studies. Most of the methods used k-fold cross-validation as the test method. In our implementation, 10% of the development data were randomly sampled to create the validation set, which was used to monitor and manage the training process to avoid overfitting. The experiments for each setup were repeated 30 times to fairly observe the variation. Additionally, we prepared and provide an independent test set with no overlap with the development set, which can be used in future studies.
Table 3. Reported AUC Values and Testing Method of Other State-of-the-Art Methods.
Since the model performances of setups (ii), (iii), and (iv) are not significantly different from each other, the model of setup (ii) was selected for comparison with the state-of-the-art method proposed by Ai et al. Compared with the recent best-performing model of Ai et al., our model shows a significant improvement with a test AUC of 0.96. Besides AUC, the MCC is essential for examining a binary classification model and is considered more informative than accuracy for imbalanced-class datasets. Our study achieves an averaged MCC of over 0.80 and an averaged accuracy of 0.89. Ai et al.’s model shows competitive outcomes with an AUC of 0.904. We compared our models of different setups against this best reported AUC of 0.904 using one-sample t-tests. The results strongly confirm that the AUC values of all our models are significantly higher than that of the state-of-the-art method (p-values < 0.00001). Ai et al.’s use of ensemble learning produced better model performance than other previous state-of-the-art methods. The idea of using ensemble learning to boost model performance is not novel, but it can significantly improve predictive power. However, an ensemble model created by averaging a set of classifiers has limitations. When the variation in the probabilistic values produced by the contributing classifiers for each sample is small, the change in the ensembled probabilistic values is negligible; the ensemble model may be stable under various conditions, but its performance may not improve. In contrast, when the difference in the probabilistic values for each sample is large, the ensembled probabilistic values can be dragged toward those of the weak classifiers. Besides, using numerous contributing classifiers to create an ensemble leads to an unintentionally bulky model with unnecessary complexity.
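The one-sample t-test against the reported AUC of 0.904 can be reproduced with SciPy as sketched below; the AUC values shown are randomly generated placeholders, not the actual trial results:

```python
# Sketch of the one-sample t-test comparison, assuming `auc_trials` holds the 30 test
# AUC values of one setup; 0.904 is the reported AUC of Ai et al.'s ensemble model.
import numpy as np
from scipy import stats

auc_trials = np.random.normal(loc=0.96, scale=0.01, size=30)   # illustrative values only
t_stat, p_value = stats.ttest_1samp(auc_trials, popmean=0.904)
print(f"t = {t_stat:.2f}, one-sided p = {p_value / 2:.2e}")    # tests whether the mean AUC exceeds 0.904
```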
4. Conclusion
The experimental results confirm that our proposed computational framework for predicting DILI compounds performs significantly better than other state-of-the-art methods on all evaluated metrics. Applying embedding techniques in computational modeling is therefore highly recommended to improve model performance. The molecular fingerprint-embedded featurizer is expected to become an effective alternative molecular representation, alongside classic ones, for promoting model efficiency because it is iteratively updated to maximize the difference between distinct classes. Additionally, the molecular fingerprint-embedded featurizer can be used to develop more effective classification models for various biological and biochemical problems. Besides, there is room for model improvement using various types of molecular fingerprints for specific problems. Our research contributes to the current data-driven effort in the detection and prediction of adverse drug reactions.54
Acknowledgments
B.P.N. gratefully acknowledges the support of NVIDIA Corporation with the donation of the Titan Xp and Titan V GPUs used for this research.
Supporting Information Available
The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/acsomega.0c03866.
Plots of training loss versus validation loss of 30 trials; and detailed results of evaluation metrics for 30 trials of four setups (PDF)
Author Contributions
T.-H.N.-V. and L.N. contributed equally to this work. T.-H.N.-V. and B.P.N. designed the experiments, wrote the manuscript, and supervised all technical tasks in this study. L.N. directly performed the experiments and retrieved the data. N.D., T.-N.N., and P.H.L. contributed to data processing, manual correction, and feature calculation. B.P.N. and L.L. interpreted the results and significantly revised this manuscript. All authors have read and approved the final manuscript.
The work of T.-H.N.-V. and B.P.N. was supported in part by the National Science Challenge Science for Technological Innovation (SfTI) under Grant No. RTVU1904 by the Ministry of Business, Innovation and Employment (MBIE) of the New Zealand Government. The work of L.L. was supported in part by the Air Force Office of Scientific Research/Asian Office of Aerospace Research & Development (AOARD) under Grant No. FA2386-19-1-4032.
The authors declare no competing financial interest.
Notes
The benchmark dataset used in this study was collected from the previous work of Liew et al.,34 Zhang et al.,18 Chen et al.,35 Kotsampasakou et al.,36 and Ai et al.22 Source code and data are available at https://github.com/mldlproject/2020-DILI-CNN-MFE.
References
- Hsu M.-H.; Savas Ü.; Griffin K. J.; Johnson E. F. Human cytochrome p450 family 4 enzymes: function, genetic variation and regulation. Drug Metab. Rev. 2007, 39, 515–538. 10.1080/03602530701468573. [DOI] [PubMed] [Google Scholar]
- Singh D.; Cho W. C.; Upadhyay G. Drug-induced liver toxicity and prevention by herbal antioxidants: an overview. Front. Physiol. 2016, 6, 363 10.3389/fphys.2015.00363. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Björnsson E. S. Hepatotoxicity by drugs: the most common implicated agents. Int. J. Mol. Sci. 2016, 17, 224. 10.3390/ijms17020224. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Pandit A.; Sachdeva T.; Bafna P. Drug-induced hepatotoxicity: a review. J. Appl. Pharm. Sci. 2012, 2, 233–243. 10.7324/JAPS.2012.2541. [DOI] [Google Scholar]
- Jaeschke H.; Xie Y.; McGill M. R. Acetaminophen-induced liver injury: from animal models to humans. J. Clin. Trans. Hepatol. 2014, 2, 153. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Voican C. S.; Corruble E.; Naveau S.; Perlemuter G. Antidepressant-induced liver injury: a review for clinicians. Am. J. Psychiatry 2014, 171, 404–415. 10.1176/appi.ajp.2013.13050709. [DOI] [PubMed] [Google Scholar]
- Giustarini G.; Vrisekoop N.; Kruijssen L.; Wagenaar L.; van Staveren S.; van Roest M.; Bleumink R.; Bol-Schoenmakers M.; Weaver R. J.; Koenderman L.; Smit J.; Pieters R. Trovafloxacin-induced liver injury: Lack in regulation of inflammation by inhibition of nucleotide release and neutrophil movement. Toxicol. Sci. 2019, 167, 385–396. 10.1093/toxsci/kfy244. [DOI] [PubMed] [Google Scholar]
- Maronpot R. R.; Yoshizawa K.; Nyska A.; Harada T.; Flake G.; Mueller G.; Singh B.; Ward J. M. Hepatic enzyme induction: histopathology. Toxicol. Pathol. 2010, 38, 776–795. 10.1177/0192623310373778. [DOI] [PubMed] [Google Scholar]
- Regev A. Drug-induced liver injury and drug development: industry perspective. Semin. Liver Dis. 2014, 34, 227–239. 10.1055/s-0034-1375962. [DOI] [PubMed] [Google Scholar]
- Kullak-Ublick G. A.; Andrade R. J.; Merz M.; End P.; Benesic A.; Gerbes A. L.; Aithal G. P. Drug-induced liver injury: recent advances in diagnosis and risk assessment. Gut 2017, 66, 1154–1164. 10.1136/gutjnl-2016-313369. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Kuna L.; Bozic I.; Kizivat T.; Bojanic K.; Mrso M.; Kralj E.; Smolic R.; Wu G. Y.; Smolic M. Models of drug-induced liver injury (DILI)–Current issues and future perspectives. Curr. Drug Metab. 2018, 19, 830–838. 10.2174/1389200219666180523095355. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Kenne K.; Skanberg I.; Glinghammar B.; Berson A.; Pessayre D.; Flinois J.-P.; Beaune P.; Edebert I.; Pohl C. D.; Carlsson S.; Andersson T. B. Prediction of drug-induced liver injury in humans by using in vitro methods: the case of ximelagatran. Toxicol. In Vitro 2008, 22, 730–746. 10.1016/j.tiv.2007.11.014. [DOI] [PubMed] [Google Scholar]
- Pessiot J.-F.; Wong P. S.; Maruyama T.; Morioka R.; Tanaka S. A. M.; Fujibuchi W.; Aburatani S. The impact of collapsing data on microarray analysis and DILI prediction. Syst. Biomed. 2013, 1, 137–143. 10.4161/sysb.24255. [DOI] [Google Scholar]
- Goldring C.; et al. Stem cell-derived models to improve mechanistic understanding and prediction of human drug-induced liver injury. Hepatology 2017, 65, 710–721. 10.1002/hep.28886. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Vinken M. In vitro prediction of drug-induced cholestatic liver injury: a challenge for the toxicologist. Arch. Toxicol. 2018, 92, 1909–1912. 10.1007/s00204-018-2201-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Ekins S.; Williams A. J.; Xu J. J. A predictive ligand-based Bayesian model for human drug-induced liver injury. Drug Metab. Dispos. 2010, 38, 2302–2308. 10.1124/dmd.110.035113. [DOI] [PubMed] [Google Scholar]
- Huang S.-H.; Tung C.-W.; Fülöp F.; Li J.-H. Developing a QSAR model for hepatotoxicity screening of the active compounds in traditional Chinese medicines. Food Chem. Toxicol. 2015, 78, 71–77. 10.1016/j.fct.2015.01.020. [DOI] [PubMed] [Google Scholar]
- Zhang C.; Cheng F.; Li W.; Liu G.; Lee P. W.; Tang Y. In silico prediction of drug induced liver toxicity using substructure pattern recognition method. Mol. Inf. 2016, 35, 136–144. 10.1002/minf.201500055. [DOI] [PubMed] [Google Scholar]
- Kim E.; Nam H. Prediction models for drug-induced hepatotoxicity by using weighted molecular fingerprints. BMC Bioinformatics 2017, 18, 227 10.1186/s12859-017-1638-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Tharwat A.; Moemen Y. S.; Hassanien A. E. Classification of toxicity effects of biotransformed hepatic drugs using whale optimized support vector machines. J. Biomed. Informatics 2017, 68, 132–149. 10.1016/j.jbi.2017.03.002. [DOI] [PubMed] [Google Scholar]
- Yang H.; Sun L.; Li W.; Liu G.; Tang Y. In silico prediction of chemical toxicity for drug design using machine learning methods and structural alerts. Front. Chem. 2018, 6, 30 10.3389/fchem.2018.00030. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Ai H.; Chen W.; Zhang L.; Huang L.; Yin Z.; Hu H.; Zhao Q.; Zhao J.; Liu H. Predicting drug-induced liver injury using ensemble learning methods and molecular fingerprints. Toxicol. Sci. 2018, 165, 100–107. 10.1093/toxsci/kfy121. [DOI] [PubMed] [Google Scholar]
- Chen M.; Hong H.; Fang H.; Kelly R.; Zhou G.; Borlak J.; Tong W. Quantitative structure-activity relationship models for predicting drug-induced liver injury based on FDA-approved drug labeling annotation and using a large collection of drugs. Toxicol. Sci. 2013, 136, 242–249. 10.1093/toxsci/kft189. [DOI] [PubMed] [Google Scholar]
- Zhang H.; Ding L.; Zou Y.; Hu S.-Q.; Huang H.-G.; Kong W.-B.; Zhang J. Predicting drug-induced liver injury in human with Naïve Bayes classifier approach. J. Comput.-Aided Mol. Des. 2016, 30, 889–898. 10.1007/s10822-016-9972-6. [DOI] [PubMed] [Google Scholar]
- Funk C.; Roth A. Current limitations and future opportunities for prediction of DILI from in vitro. Arch. Toxicol. 2017, 91, 131–142. 10.1007/s00204-016-1874-9. [DOI] [PubMed] [Google Scholar]
- Ho T. K. In Random decision forests, Proceedings of 3rd International Conference on Document Analysis and Recognition; IEEE, 1995; pp 278–282.
- Breiman L. Bagging predictors. Mach. Learn. 1996, 24, 123–140. 10.1007/BF00058655. [DOI] [Google Scholar]
- Breiman L. Random forests. Mach. Learn. 2001, 45, 5–32. 10.1023/A:1010933404324. [DOI] [Google Scholar]
- Chen M.; Vijay V.; Shi Q.; Liu Z.; Fang H.; Tong W. FDA-approved drug labeling for the study of drug-induced liver injury. Drug Discovery Today 2011, 16, 697–703. 10.1016/j.drudis.2011.05.007. [DOI] [PubMed] [Google Scholar]
- Steinwart I.; Christmann A.. Support Vector Machines; Springer Science & Business Media, 2008. [Google Scholar]
- Mirjalili S.; Lewis A. The whale optimization algorithm. Adv. Eng. Softw. 2016, 95, 51–67. 10.1016/j.advengsoft.2016.01.008. [DOI] [Google Scholar]
- Kamei Y.; Monden A.; Matsumoto S.; Kakimoto T.; Matsumoto K.-I. In The Effects of Over and Under Sampling on Fault-prone Module Detection, Proceedings of the 1st International Symposium on Empirical Software Engineering and Measurement; IEEE: 2007; pp 196–204.
- Chawla N. V.; Bowyer K. W.; Hall L. O.; Kegelmeyer W. P. SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 2002, 16, 321–357. 10.1613/jair.953. [DOI] [Google Scholar]
- Liew C. Y.; Lim Y. C.; Yap C. W. Mixed learning algorithms and features ensemble in hepatotoxicity prediction. J. Comput.-Aided Mol. Des. 2011, 25, 855. 10.1007/s10822-011-9468-3. [DOI] [PubMed] [Google Scholar]
- Chen M.; Suzuki A.; Thakkar S.; Yu K.; Hu C.; Tong W. DILIrank: the largest reference drug list ranked by the risk for developing drug-induced liver injury in humans. Drug Discovery Today 2016, 21, 648–653. 10.1016/j.drudis.2016.02.015. [DOI] [PubMed] [Google Scholar]
- Kotsampasakou E.; Montanari F.; Ecker G. F. Predicting drug-induced liver injury: The importance of data curation. Toxicology 2017, 389, 139–145. 10.1016/j.tox.2017.06.003. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Landrum G.RDKit: Open-source cheminformatics. 2006, http://www.rdkit.org.
- LeCun Y.; Bengio Y.; Hinton G. Deep Learning. Nature 2015, 521, 436–444. 10.1038/nature14539. [DOI] [PubMed] [Google Scholar]
- Voulodimos A.; Doulamis N.; Doulamis A.; Protopapadakis E. Deep learning for computer vision: A brief review. Comput. Intell. Neurosci. 2018, 2018, 1–13. 10.1155/2018/7068349. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Ling Z.-H.; Kang S.-Y.; Zen H.; Senior A.; Schuster M.; Qian X.-J.; Meng H. M.; Deng L. Deep learning for acoustic modeling in parametric speech generation: A systematic review of existing techniques and future trends. IEEE Signal Process. Mag. 2015, 32, 35–52. 10.1109/MSP.2014.2359987. [DOI] [Google Scholar]
- Wang J.; Chen Y.; Hao S.; Peng X.; Hu L. Deep learning for sensor-based activity recognition: A survey. Pattern Recognit. Lett. 2019, 119, 3–11. 10.1016/j.patrec.2018.02.010. [DOI] [Google Scholar]
- Mikolov T.; Sutskever I.; Chen K.; Corrado G. S.; Dean J.. Distributed Representations of Words and Phrases and their Compositionality, In Advances in Neural Information Processing Systems; NIPS: 2013; pp 3111–3119. [Google Scholar]
- Min S.; Lee B.; Yoon S. Deep learning in bioinformatics. Briefings Bioinf. 2017, 18, 851–869. 10.1093/bib/bbw068. [DOI] [PubMed] [Google Scholar]
- Zhang L.; Tan J.; Han D.; Zhu H. From machine learning to deep learning: progress in machine intelligence for rational drug discovery. Drug Discovery Today 2017, 22, 1680–1685. 10.1016/j.drudis.2017.08.010. [DOI] [PubMed] [Google Scholar]
- Cadeddu A.; Wylie E. K.; Jurczak J.; Wampler-Doty M.; Grzybowski B. A. Organic chemistry as a language and the implications of chemical linguistics for structural and retrosynthetic analyses. Angew. Chem., Int. Ed. 2014, 53, 8108–8112. 10.1002/anie.201403708. [DOI] [PubMed] [Google Scholar]
- Rogers D.; Hahn M. Extended-connectivity fingerprints. J. Chem. Inf. Model. 2010, 50, 742–754. 10.1021/ci100050t. [DOI] [PubMed] [Google Scholar]
- Durant J. L.; Leland B. A.; Henry D. R.; Nourse J. G. Reoptimization of MDL keys for use in drug discovery. J. Chem. Inf. Comput. Sci. 2002, 42, 1273–1280. 10.1021/ci010132r. [DOI] [PubMed] [Google Scholar]
- Cereto-Massagué A.; Ojeda M. J.; Valls C.; Mulero M.; Garcia-Vallvé S.; Pujadas G. Molecular fingerprint similarity search in virtual screening. Methods 2015, 71, 58–63. 10.1016/j.ymeth.2014.08.005. [DOI] [PubMed] [Google Scholar]
- Kimothi D.; Soni A.; Biyani P.; Hogan J. M.. Distributed representations for biological sequence analysis. 2016, 1–7; arXiv:abs/1608.05949. arXiv.org e-Print archive. http://arxiv.org/abs/1608.05949.
- Jaeger S.; Fulle S.; Turk S. Mol2vec: unsupervised machine learning approach with chemical intuition. J. Chem. Inf. Model. 2018, 58, 27–35. 10.1021/acs.jcim.7b00616. [DOI] [PubMed] [Google Scholar]
- Jeon W.; Kim D. FP2VEC: a new molecular featurizer for learning molecular properties. Bioinformatics 2019, 35, 4979–4985. 10.1093/bioinformatics/btz307. [DOI] [PubMed] [Google Scholar]
- Ioffe S.; Szegedy C.. Batch normalization: Accelerating deep network training by reducing internal covariate shift. 2015, arXiv:abs1502.03167. arXiv.org e-Print archive. https://arxiv.org/abs/1502.03167.
- Kingma D. P.; Ba J. Adam: A method for stochastic optimization. 2014, arXiv:1412.6980. arXiv.org e-Print archive. https://arxiv.org/abs/1412.6980.
- Ho T. B.; Le L.; Thai D. T.; Taewijit S. Data-drive Approach to Detect and Predict Adverse Drug Reactions. Curr. Pharm. Des. 2016, 22, 3498–3526. 10.2174/1381612822666160509125047. [DOI] [PubMed] [Google Scholar]