Computational determination of hERG-related cardiotoxicity of drug candidates

Hyang-Mi Lee; Myeong-Sang Yu; Sayada Reemsha Kazmi; Seong Yun Oh; Ki-Hyeong Rhee; Myung-Ae Bae; Byung Ho Lee; Dae-Seop Shin; Kwang-Seok Oh; Hyithaek Ceong; Donghyun Lee; Dokyun Na

BMC Bioinformatics. 2019 May 29;20(Suppl 10):250. doi: 10.1186/s12859-019-2814-5

PMCID: PMC6538553 PMID: 31138104

Abstract

Background

Drug candidates often cause an unwanted blockage of the potassium ion channel of the human ether-a-go-go-related gene (hERG). The blockage leads to long QT syndrome (LQTS), which is a severe life-threatening cardiac side effect. Therefore, a virtual screening method to predict drug-induced hERG-related cardiotoxicity could facilitate drug discovery by filtering out toxic drug candidates.

Results

In this study, we generated a reliable hERG-related cardiotoxicity dataset of 2130 compounds, all assayed under constant conditions. Based on this dataset, we developed a computational hERG-related cardiotoxicity prediction model. In ten-fold cross-validation, the neural network model achieved an area under the receiver operating characteristic curve (AUC) of 0.764, an accuracy of 90.1%, a Matthews correlation coefficient (MCC) of 0.368, a sensitivity of 0.321, and a specificity of 0.967. The model was further evaluated using ten drug compounds tested on guinea pigs and showed an accuracy of 80.0%, an MCC of 0.655, a sensitivity of 0.600, and a specificity of 1.000, outperforming existing hERG-toxicity prediction models.

Conclusion

The neural network model can predict hERG-related cardiotoxicity of chemical compounds with a high accuracy. Therefore, the model can be applied to virtual high-throughput screening for drug candidates that do not cause cardiotoxicity. The prediction tool is available as a web-tool at http://ssbio.cau.ac.kr/CardPred.

Keywords: In silico model, Machine learning, hERG-related cardiotoxicity, Drug discovery

Background

Many drug candidates are withdrawn owing to unexpected side effects. Therefore, it is a major challenge to screen out potential toxic compounds in the drug discovery process. Cardiac toxicity is one of the side effects and a major cause of drug withdrawals in drug discovery. A representative mechanism of cardiotoxicity involves the binding of compounds to the cardiac potassium channel encoded by the human ether-a-go-go-related gene (hERG), which results in long QT syndrome (LQTS) and eventually leads to fatal ventricular arrhythmias and sudden death [1, 2]. Recently, many drugs, such as terfenadine, cisapride, astemizole, sertindole, thioridazine, and grepafloxacin, were withdrawn from the market owing to undesired cardiotoxicity effects [3]. The development of an accurate prediction model for hERG channel blockers is, therefore, essential in the early stage of drug development.

Experimental high-throughput screening methods have been developed [4], but experimental methods for drug-induced cardiotoxicity are time-consuming and costly. Thus, it is necessary to develop a computational approach to accelerate drug discovery. In recent years, several ligand-based in silico models have been developed to predict drug-hERG interactions based on the pharmacophore, quantitative structure-activity relationship (QSAR), and classification models [5–8].

The first pharmacophore model was developed by Ekins et al. from 15 compounds, based on the steric and electronic features associated with hERG binding affinity [9]. Because conventional pharmacophore models were generally developed using small training datasets of fewer than 500 compounds [10, 11], their applicability was highly limited. Thus, ensemble models integrating diverse pharmacophore methods have also been developed for a better prediction of the hERG binding affinity [5, 12].

Three-dimensional (3D)-QSAR models use 3D structure information, such as molecular interaction fields, and relate it to hERG binding affinity by regression analysis. Two representative methods used for 3D-QSAR modelling are comparative molecular field analysis (CoMFA) [13] and grid-independent descriptors (GRIND) [14]. Both 3D-QSAR approaches performed well in predicting the binding affinity of most compounds, with the exception of lipophilic compounds [13, 15].

Classification models for toxicity prediction have been developed using sets of physicochemical descriptors. To improve prediction performance, various machine learning algorithms have been employed, including the support vector machine (SVM), naïve Bayes, decision tree, random forest, and k-nearest neighbors (kNN) [16–19]. Machine learning algorithms have advanced the development of prediction models, but inconsistent experimental data in training datasets hamper the development of accurate models [20]. Available hERG toxicity datasets were compiled from the literature, in which experiments were conducted under different conditions and toxicity was defined differently. To our knowledge, no large hERG toxicity dataset has been obtained from a single study. Recently, Czodrowski developed a hERG toxicity prediction model using a large dataset of 4415 compounds extracted from the ChEMBL database [20]; however, the model showed a low AUC value because the database, having been compiled from the literature, included many inconsistent experimental data.

For this study, we generated a large experimental dataset of hERG assay results for 2130 chemicals, all measured under the same conditions. Publicly available datasets, like the ChEMBL hERG toxicity database, were generally collected from the literature and may contain many inconsistent data, which may lead to inaccurate computational models. Our dataset was used to train machine learning models (linear regression, ridge regression, logistic regression, naïve Bayes, neural network, and random forest), and the neural network model showed a higher Matthews correlation coefficient (MCC), 0.368, than the other models. In addition, when the neural network model was further evaluated using a test dataset of ten drug compounds obtained from in vivo experiments in this study, it showed a high accuracy of 80% (MCC of 0.655). Therefore, the developed hERG-toxicity prediction model can be utilized as a virtual screening tool for identifying the cardiotoxicity of drug candidates in the early stage of drug discovery.

Materials and methods

Binding assay for hERG based on fluorescence polarization

The fluorescence polarization (FP)-based binding assay for hERG was performed according to the protocol of the Predictor™ hERG FP kit (Thermo Fisher Scientific, Inc., Rockford, IL, USA). The membrane fraction containing the hERG channel protein (Predictor™ hERG membrane) and the tracer (Predictor™ hERG tracer red) were diluted in the binding buffer provided by the manufacturer. The binding assay was conducted in a final volume of 20 μL, comprising 10 μL of membrane, 5 μL of 4 nM tracer, and 5 μL of test compound. The assays were conducted in 384-well black flat-bottom microplates (Corning Life Sciences, Lowell, MA, USA). After incubation for 4 h at room temperature, the FP was determined using a multimode reader (Infinite M1000PRO; Tecan, Männedorf, Switzerland) in the FP detection mode, with excitation and emission filters of 535 and 590 nm, respectively.

In vivo experimental procedures and recordings of electrocardiography

Guinea pigs were fasted for 18 h prior to the experimental procedures. The animals were anesthetized with sodium pentobarbital (60 mg/kg, i.p.), followed by artificial respiration using a rodent ventilator (60 strokes/min, 1 mL/100 g body weight). The animals were placed on a heat pad with circulating water at a temperature of 37 °C. A catheter was inserted into the jugular vein for drug administration, and electrocardiography (ECG) pin electrodes were positioned for the standard limb lead and chest lead configurations. All the animals were allowed to stabilize for 20 min after being instrumented, prior to drug administration. When the heart rate of each animal was constant, the lowest concentration of the drug was administered for 1 min through the jugular vein. After 10 min, the next concentration of the test drug was administered according to the cumulative method. The QRS complex and the PR, QT, PRC, and QRc intervals were measured from the ECG recordings, in addition to the heart rate, for the evaluation of cardiac function. The values were expressed as the mean and standard deviation of each group. The data were analyzed using one-way analysis of variance (ANOVA) followed by Dunnett’s test to verify significant differences between the groups.

Data preparation

The hERG toxicities of 2130 compounds were measured as IC₅₀ values. Compounds with IC₅₀ < 10 μM were classified as toxic and the other compounds were classified as nontoxic [19]. Consequently, 221 compounds (10.38%) were identified as hERG-toxic, and 1909 compounds (89.62%) were identified as nontoxic. The toxicities of ten drug compounds obtained from in vivo experiments, which were not included in the 2130 compounds, were used for testing our developed model.
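The labelling rule and class proportions above can be sketched as follows. This is a minimal illustration that assumes the IC₅₀ values are already expressed in μM; the names `label_herg` and `class_percentages` are hypothetical and do not come from the study.

```python
TOXIC_IC50_CUTOFF_UM = 10.0  # cutoff stated in the text: IC50 < 10 uM is toxic


def label_herg(ic50_um):
    """Return 'toxic' if the IC50 (in uM) is below the cutoff, else 'nontoxic'."""
    return "toxic" if ic50_um < TOXIC_IC50_CUTOFF_UM else "nontoxic"


def class_percentages(n_toxic, n_nontoxic):
    """Percentages of toxic and nontoxic compounds, rounded to two decimals."""
    total = n_toxic + n_nontoxic
    return round(100 * n_toxic / total, 2), round(100 * n_nontoxic / total, 2)


# Class counts reported in the text: 221 toxic and 1909 nontoxic compounds.
print(class_percentages(221, 1909))
```

Note that a compound with an IC₅₀ of exactly 10 μM falls in the nontoxic class under the strict inequality used in the text.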

Descriptor calculation

The compounds from the hERG toxicity assays were encoded in the simplified molecular-input line-entry system (SMILES) format [21], and the SMILES strings were supplied to the DRAGON software (version 7.0.10) to calculate physicochemical descriptors and fingerprints (2432 nonconstant molecular descriptors) [22]. In addition, extended connectivity fingerprints (ECFPs) were generated [23] with a maximum diameter of 4 and a length of 1024 bits. Thus, 3456 molecular features (2432 + 1024) were used to train the learning models.

Feature correlation calculation and feature selection

To reduce the number of features in developing the prediction models, 3456 features were ranked in order of their correlation with toxicity. The phi coefficient was calculated for binary features [24], and the point-biserial correlation coefficient was calculated for continuous features [25].

To calculate the point-biserial correlation coefficient, the dataset was divided into toxic and nontoxic molecules. The point-biserial correlation coefficient (r_pb) was calculated as follows:

\begin{array}{c} r_{pb} = \frac{M_{toxic} - M_{nontoxic}}{s_{n}} \sqrt{\frac{n_{toxic} \times n_{nontoxic}}{n^{2}}}, \\ \text{where } s_{n} = \sqrt{\frac{1}{n} \sum_{i = 1}^{n} (X_{i} - \bar{X})^{2}} \end{array}

M_toxic and M_nontoxic denote the mean feature values of the toxic and nontoxic compounds, respectively. n_toxic and n_nontoxic denote the numbers of toxic and nontoxic compounds, respectively, and n is the total number of molecules. s_n denotes the standard deviation of the feature. X_i represents each feature value and $\bar{X}$ denotes the mean value of all the feature values.
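The point-biserial formula can be written directly in code. The following is a minimal sketch, not the authors' implementation; the function name `point_biserial` and its error handling for degenerate inputs are assumptions made for illustration.

```python
import math


def point_biserial(values, is_toxic):
    """Point-biserial correlation between a continuous feature and a binary label.

    values   -- feature values X_i, one per compound
    is_toxic -- booleans of the same length (True = toxic)
    """
    n = len(values)
    toxic = [x for x, t in zip(values, is_toxic) if t]
    nontoxic = [x for x, t in zip(values, is_toxic) if not t]
    if not toxic or not nontoxic:
        raise ValueError("both classes must be present")

    mean_all = sum(values) / n
    # s_n is the population standard deviation (divides by n, as in the formula).
    s_n = math.sqrt(sum((x - mean_all) ** 2 for x in values) / n)
    if s_n == 0:
        raise ValueError("feature is constant")

    m_toxic = sum(toxic) / len(toxic)
    m_nontoxic = sum(nontoxic) / len(nontoxic)
    weight = math.sqrt(len(toxic) * len(nontoxic) / n ** 2)
    return (m_toxic - m_nontoxic) / s_n * weight
```

The result is identical to the Pearson correlation between the feature and a 0/1 encoding of the label, so it lies between −1 and 1.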

The phi coefficient (φ) was calculated as follows:

\phi = \frac{n_{toxic \cdot 1} \times n_{nontoxic \cdot 0} - n_{toxic \cdot 0} \times n_{nontoxic \cdot 1}}{\sqrt{(n_{toxic \cdot 1} + n_{toxic \cdot 0}) (n_{toxic \cdot 1} + n_{nontoxic \cdot 1}) (n_{nontoxic \cdot 1} + n_{nontoxic \cdot 0}) (n_{toxic \cdot 0} + n_{nontoxic \cdot 0})}}

where n_{toxic \cdot 1} and n_{toxic \cdot 0} denote the numbers of toxic compounds for which the binary feature is 1 and 0, respectively, and n_{nontoxic \cdot 1} and n_{nontoxic \cdot 0} denote the numbers of nontoxic compounds for which the binary feature is 1 and 0, respectively.
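The phi coefficient can be computed from the four counts of the 2 × 2 table of class against feature value. The sketch below is illustrative only; the function name `phi_coefficient` is hypothetical, and returning 0.0 when a row or column of the table is empty is an assumption, since the text does not say how such features were handled.

```python
import math


def phi_coefficient(feature_bits, is_toxic):
    """Phi coefficient between a binary feature and a binary toxicity label."""
    n_t1 = n_t0 = n_n1 = n_n0 = 0
    for bit, toxic in zip(feature_bits, is_toxic):
        if toxic and bit:
            n_t1 += 1      # toxic compound, feature = 1
        elif toxic:
            n_t0 += 1      # toxic compound, feature = 0
        elif bit:
            n_n1 += 1      # nontoxic compound, feature = 1
        else:
            n_n0 += 1      # nontoxic compound, feature = 0

    denom = math.sqrt(
        (n_t1 + n_t0) * (n_t1 + n_n1) * (n_n1 + n_n0) * (n_t0 + n_n0)
    )
    if denom == 0:
        return 0.0  # assumed convention for constant features or a single class
    return (n_t1 * n_n0 - n_t0 * n_n1) / denom
```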

Models

Six machine learning algorithms were used to construct the hERG toxicity prediction models. Linear regression is a simple regression algorithm that models the linear relationship between a dependent variable and multiple explanatory variables [26]. Ridge regression is a linear regression model that adds ridge regularization to the optimization of the model [27]. Logistic regression is a regression algorithm that models a logistic relationship and can be used for binary classification [28]. Naïve Bayes is a probabilistic classification model based on Bayes' theorem and the naïve assumption of independence between features [29]. A random forest is an ensemble model that constructs multiple decision trees and combines them to derive a merged result [30]. A neural network is a machine learning model composed of a network of artificial neurons (nodes), which is optimized to recognize patterns in the input data [31]. These algorithms are implemented in the Orange 3 Python machine learning package, which was used in this study to develop the hERG toxicity prediction models [32].

Performance evaluation

The six models trained with our dataset were evaluated by ten-fold cross-validation. In this process, the optimal number of features was also determined using the area under the receiver operating characteristic curve (AUC). Because the dataset was biased toward nontoxic compounds, we also calculated the MCC, a performance measure suited to unbalanced datasets. After the cross-validation and feature number optimization, the best model was determined. This model was further evaluated with ten drug compounds that were not included in the training dataset and were tested in vivo on guinea pigs, to assess whether our model, developed using in vitro data, is applicable to in vivo toxicity. The performance of our model was compared with that of other hERG prediction tools, Pred-hERG 4.1 [6] and OCHEM Predictor [33].
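For readers unfamiliar with the AUC, it can be read as the probability that a randomly chosen toxic compound receives a higher score than a randomly chosen nontoxic one. The sketch below computes it by direct pairwise comparison; it illustrates the definition only and is not how Orange 3 computes the value. The function name `roc_auc` is hypothetical.

```python
def roc_auc(scores, labels):
    """AUC by pairwise comparison of toxic vs. nontoxic scores (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("both classes must be present")

    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

Unlike accuracy, this quantity does not depend on a decision threshold or on the class ratio, which is one reason it is a reasonable criterion for choosing the number of features on an unbalanced dataset.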

Results and discussion

Model construction

Correlation coefficients between the features and toxicity were calculated, and the top-ranked features were used to train the models. The top 20 features are listed in Table 1. Computational hERG prediction models were trained using six machine learning algorithms (linear regression, ridge regression, logistic regression, artificial neural network, naïve Bayes, and random forest), each with varying numbers of top-ranked features. Their ten-fold cross-validation results and respective optimal feature numbers are shown in Fig. 1 and Table 2. Of the six models, those based on the neural network (AUC = 0.764, 1400 features), ridge regression (AUC = 0.774, 400 features), and logistic regression (AUC = 0.764, 350 features) performed better than the other models. Because the performances of these three models were comparable, they were further optimized to determine the best model.
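Selecting the top-ranked features can be sketched as a simple sort. The example below is a hedged illustration: ranking by the absolute value of the coefficient is an assumption (the text only says the features were ranked by their correlation with toxicity), and the function name `top_k_features` is hypothetical. The four coefficients are taken from Table 1.

```python
def top_k_features(correlations, k):
    """Return the k feature names with the largest absolute correlation.

    correlations -- dict mapping feature name to its correlation with toxicity
    """
    ranked = sorted(correlations, key=lambda name: abs(correlations[name]),
                    reverse=True)
    return ranked[:k]


# A few coefficients from Table 1, used here only as example input.
example = {"nC": 0.211, "nRNR2": 0.229, "nR06": 0.201, "Wap": 0.215}
print(top_k_features(example, 2))
```

Repeating the cross-validation for increasing values of `k` and keeping the value with the highest AUC corresponds to the feature-number optimization summarized in Table 2.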

Table 1.

Top 20 features with a high correlation

Descriptor	Coeff.	Description
nRNR2	0.229	Number of tertiary amines (aliphatic)
Wap	0.215	All-path Wiener index
F02[C-C]	0.212	Frequency of C - C at topological distance 2
F03[C-C]	0.212	Frequency of C - C at topological distance 3
nC	0.211	Number of carbon atoms
F04[C-C]	0.210	Frequency of C - C at topological distance 4
D/Dtr06	0.208	Distance/detour ring index of order 6
ATSC5v	0.207	Centred Broto–Moreau autocorrelation of lag 5 (weighted by van der Waals volume)
F01[C-C]	0.205	Frequency of C - C at topological distance 1
SpDiam_Dt	0.205	Spectral diameter from detour matrix
SpAD_Dt	0.204	Spectral absolute deviation from detour matrix
SpPos_Dt	0.204	Spectral positive sum from detour matrix
N-068	0.203	Atom-centered fragment: Al3-N
Wi_Dt	0.203	Wiener-like index from detour matrix (detour index)
SpMax_Dt	0.203	Leading eigenvalue from detour matrix
TI1_L	0.203	First Mohar index from Laplace matrix
H_Dz(p)	0.202	Harary-like index from Barysz matrix (weighted by atomic number)
IDET	0.202	Total information content on the distance equality
F10[C-C]	0.202	Frequency of C - C at topological distance 10
nR06	0.201	Number of six-membered rings

Fig. 1 — AUC with respect to feature number: AUC values of the six models were measured by a ten-fold cross-validation with respect to feature number

Table 2.

Performance (AUC) results of six machine learning methods

Algorithm	Optimal number of features	AUC
Linear regression	40	0.747
Logistic regression	350	0.764
Ridge regression	400	0.774
Neural network	1400	0.764
Naïve Bayes	40	0.687
Random forest	120	0.709

Model optimization

To select the best model, we optimized the threshold values of the three selected models, which discriminated toxic and nontoxic groups. The best threshold values that showed the highest MCC are listed in Table 3. MCC is an accuracy measure for highly unbalanced datasets. Of the three models, the neural network model showed the best performance, with an accuracy of 90.1%, an MCC of 0.368, and a positive predictive value (PPV) of 0.542 after threshold optimization. The low sensitivity and high specificity of the neural network model were due to its high threshold value, but the high threshold improved its performance expressed as MCC. Consequently, the toxicity prediction model based on the neural network was selected for further evaluation.
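Threshold optimization of this kind can be sketched as a scan over candidate thresholds, keeping the one with the highest MCC. This is a minimal sketch under stated assumptions: the 0.01 grid of candidates, the convention that a score at or above the threshold is called toxic, and all function names are hypothetical rather than taken from the study.

```python
import math


def confusion(scores, labels, threshold):
    """Count (tp, fp, tn, fn); a score >= threshold is predicted toxic (assumed)."""
    tp = fp = tn = fn = 0
    for score, is_toxic in zip(scores, labels):
        predicted_toxic = score >= threshold
        if predicted_toxic and is_toxic:
            tp += 1
        elif predicted_toxic:
            fp += 1
        elif is_toxic:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient; taken as 0.0 when a marginal is empty."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def best_threshold(scores, labels, candidates=None):
    """Return the candidate threshold with the highest MCC (first one on ties)."""
    if candidates is None:
        candidates = [i / 100 for i in range(1, 100)]  # hypothetical 0.01 grid
    return max(candidates, key=lambda t: mcc(*confusion(scores, labels, t)))
```

Raising the threshold trades sensitivity for specificity, which is consistent with the low sensitivity and high specificity reported for the neural network at its optimized threshold.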

Table 3.

Performance results of the top three models with optimized thresholds

Algorithm	Threshold	Accuracy	MCC	Sensitivity	Specificity	PPV^a
Logistic regression	0.57	0.814	0.307	0.557	0.844	0.292
Neural network	0.82	0.901	0.368	0.321	0.967	0.542
Ridge regression	0.64	0.864	0.332	0.448	0.912	0.371

^aPPV: Positive predictive value is defined as the number of true positives/(the number of true positives + the number of false positives)

Test of the constructed model on in vivo data

The optimized model was further tested on ten known drug molecules whose cardiotoxicities were measured in vivo using guinea pigs. In vitro experiments are simpler and less expensive than in vivo experiments and can therefore be carried out at a larger scale. However, owing to the complex physiology of in vivo systems, in vitro experimental results are often inconsistent with in vivo results. Thus, we evaluated whether our model, which was trained using in vitro data, is applicable to in vivo toxicity. The prediction results for the test compounds are shown in Tables 4 and 5. Our model showed an overall accuracy of 80.0%, an MCC of 0.655, a sensitivity of 0.600, a specificity of 1.000, and a PPV of 1.000. This performance indicates that our model could also be utilized to predict in vivo cardiotoxicity.

Table 4.

Prediction results of ten drug compounds

Name	In vivo result	Our model	Pred-hERG binary	Pred-hERG multiclass	OCHEM Predictor^a
Haloperidol	Toxic	Toxic	Toxic	Nontoxic	Nontoxic
Cimetidine	Nontoxic	Nontoxic	Toxic	Nontoxic	Nontoxic
Disopyramide	Toxic	Toxic	Nontoxic	Nontoxic	Nontoxic
Quinidine	Toxic	Nontoxic	Toxic	Nontoxic	Toxic
Terazosin	Nontoxic	Nontoxic	Toxic	Nontoxic	Nontoxic
Spironolactone	Nontoxic	Nontoxic	Toxic	Nontoxic	Nontoxic
Sotalol	Toxic	Nontoxic	Nontoxic	Nontoxic	Nontoxic
Cefazolin	Nontoxic	Nontoxic	Toxic	Nontoxic	Nontoxic
Chlorpromazine	Toxic	Toxic	Toxic	Toxic	Nontoxic
Loratadine	Nontoxic	Nontoxic	Toxic	Nontoxic	Nontoxic

^aConsensus II in the predictor was used

Table 5.

Performance comparison on the in vivo test dataset

Models	Accuracy	MCC	Sensitivity	Specificity
Our model	0.800	0.655	0.600	1.000
Pred-hERG binary	0.300	−0.500	0.600	0.000
Pred-hERG multiclass	0.600	0.333	0.200	1.000
OCHEM Predictor	0.600	0.333	0.200	1.000

Several computational methods have been reported for the prediction of hERG toxicity, including Pred-hERG and OCHEM Predictor, and we compared the performance of our model with these methods. Pred-hERG is a web tool based on statistical QSAR models of hERG channel blockers. OCHEM Predictor is a web tool based on eight associative neural network models. The prediction results for the ten test drug compounds and the overall performances of all models are listed in Tables 4 and 5, respectively. Pred-hERG has two models: binary and multiclass. The Pred-hERG binary model decides whether a query compound is a hERG blocker or nonblocker. The Pred-hERG multiclass model assigns a query compound to one of three groups: nonblockers, weak/moderate blockers, or strong blockers. In this study, we considered weak/moderate and strong blockers as hERG-toxic. The Pred-hERG binary model predicted eight out of ten compounds as toxic, with an accuracy of 30%, whereas the Pred-hERG multiclass model predicted nine out of ten compounds as nontoxic, with an accuracy of 60%. Their MCC values were −0.500 and 0.333, respectively. Like the Pred-hERG multiclass model, the OCHEM Predictor predicted nine out of ten compounds as nontoxic; its accuracy and MCC were 60% and 0.333, respectively. The three previous models made biased predictions, resulting in a very low sensitivity or a very low specificity (Table 5). Our model correctly predicted eight out of ten compounds, with an accuracy of 80% and an MCC of 0.655, which indicates that it outperforms the other methods on this test set and would be useful for predicting the in vivo cardiotoxicity of drug candidates. It can also be used for virtual screening in drug discovery.
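The values in Table 5 can be reproduced from the confusion counts read off Table 4. The sketch below is a worked check rather than part of the original analysis; the function name `summarize` and the dictionary layout are hypothetical, and the counts were tallied by hand from the table rows.

```python
import math


def summarize(tp, fp, tn, fn):
    """Accuracy, MCC, sensitivity, and specificity from confusion counts."""
    total = tp + fp + tn + fn
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / total,
        "mcc": 0.0 if denom == 0 else (tp * tn - fp * fn) / denom,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Counts tallied from Table 4 (five toxic and five nontoxic drugs in vivo).
TABLE4_COUNTS = {
    "Our model": dict(tp=3, fp=0, tn=5, fn=2),
    "Pred-hERG binary": dict(tp=3, fp=5, tn=0, fn=2),
    "Pred-hERG multiclass": dict(tp=1, fp=0, tn=5, fn=4),
    "OCHEM Predictor": dict(tp=1, fp=0, tn=5, fn=4),
}

for name, counts in TABLE4_COUNTS.items():
    metrics = {key: round(value, 3) for key, value in summarize(**counts).items()}
    print(name, metrics)
```

With only ten compounds, a single changed prediction shifts the accuracy by 10 percentage points, so these figures should be read as indicative rather than precise.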

Additional comparison with previous models

Because in vivo cardiotoxicity assays require animal experiments, it is difficult to obtain a large amount of in vivo data. A performance comparison based on only ten compounds is of limited reliability, so we also evaluated the previous methods using the dataset of 2130 compounds obtained from in vitro experiments. For a fair comparison, we divided the dataset into training (90%) and test (10%) sets; the training set was used to build our model, and the test set was used to evaluate our model, Pred-hERG, and OCHEM Predictor. The evaluation was repeated ten times, and the results were averaged (Table 6). The MCC values of the previous models were lower than that of our model. Specifically, the Pred-hERG binary model showed an MCC of −0.034, a sensitivity of 0.912, and a specificity of 0.061, indicating that this model classified most query molecules as toxic and produced many false positives. This high number of false positives for the Pred-hERG binary model was also observed on the in vivo test dataset (Tables 4 and 5). In contrast, the Pred-hERG multiclass model and OCHEM Predictor showed a low sensitivity and a high specificity, indicating that they classified most query molecules as nontoxic. Because the dataset was highly unbalanced toward negative (nontoxic) data, the bias of the Pred-hERG multiclass model and OCHEM Predictor toward the nontoxic class yielded high accuracies of 90.2% and 88.5% but low MCCs of 0.218 and 0.133, respectively. Consequently, our model consistently showed better performance on both the small in vivo test dataset and the in vitro dataset.

Table 6.

Performance comparison on the in vitro dataset

Models	Accuracy	MCC	Sensitivity	Specificity
Our model	0.901	0.368	0.321	0.967
Pred-hERG binary	0.150	−0.034	0.912	0.061
Pred-hERG multiclass	0.902	0.218	0.075	0.999
OCHEM Predictor	0.885	0.133	0.099	0.978
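The relationship between accuracy, sensitivity, and specificity on an unbalanced dataset can be checked with a short calculation: accuracy equals prevalence × sensitivity + (1 − prevalence) × specificity. The sketch below applies this identity to the rows of Table 6, assuming that each test split has roughly the same class ratio as the full dataset (221 toxic and 1909 nontoxic compounds); the function name `expected_accuracy` is hypothetical.

```python
N_TOXIC, N_NONTOXIC = 221, 1909  # class counts from the Data preparation section


def expected_accuracy(sensitivity, specificity, n_pos=N_TOXIC, n_neg=N_NONTOXIC):
    """Accuracy implied by sensitivity and specificity at the given class ratio."""
    prevalence = n_pos / (n_pos + n_neg)
    return prevalence * sensitivity + (1 - prevalence) * specificity


# (sensitivity, specificity, reported accuracy) for each row of Table 6
TABLE6 = {
    "Our model": (0.321, 0.967, 0.901),
    "Pred-hERG binary": (0.912, 0.061, 0.150),
    "Pred-hERG multiclass": (0.075, 0.999, 0.902),
    "OCHEM Predictor": (0.099, 0.978, 0.885),
}

for name, (sen, spe, acc) in TABLE6.items():
    print(f"{name}: implied {expected_accuracy(sen, spe):.3f}, reported {acc:.3f}")
```

Because about 90% of the compounds are nontoxic, the specificity term dominates the accuracy. A model that labels nearly everything nontoxic therefore reaches about 90% accuracy while detecting few toxic compounds, which is why the MCC is the more informative column in Table 6.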

Conclusion

In this study, we aimed to produce a reliable hERG toxicity dataset and then to develop a better-performing cardiotoxicity prediction model. Computational models are highly dependent on the reliability of their datasets; however, datasets collected from the literature may include inconsistent experimental results. We generated our own consistent dataset to build a model, and the prediction model developed using this dataset outperformed the other hERG prediction tools. Our model can be useful for the virtual screening of potential drug candidates that do not cause cardiotoxicity and would facilitate the advancement of in silico drug discovery. However, this study did not introduce new features or new machine learning methods; the model could be improved further by including features specialized for describing the cardiotoxicity of molecules or by using machine learning algorithms that classify molecules more efficiently and effectively from these features.

Acknowledgements

Not applicable.

Funding

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. NRF-2018R1A5A1025077). This work was also supported by the Bio-Synergy Research Project (NRF-2018M3A9C4076474) of the Ministry of Science, ICT, and Future Planning through the National Research Foundation. Publication costs are funded by the grant (NRF-2018R1A5A1025077).

Availability of data and materials

The datasets supporting the conclusions of this article are available from the corresponding author upon request.

About this supplement

This article has been published as part of BMC Bioinformatics Volume 20 Supplement 10, 2019: Proceedings of the 12th International Workshop on Data and Text Mining in Biomedical Informatics (DTMBIO 2018). The full contents of the supplement are available online at https://bmcbioinformatics.biomedcentral.com/articles/supplements/volume-20-supplement-10.

Abbreviations

AUC: Area under ROC curve
ECFP: Extended connectivity fingerprint
hERG: Human ether-a-go-go-related gene
LQTS: Long QT syndrome
MCC: Matthews correlation coefficient
PPV: Positive predictive value
SEN: Sensitivity
SMILES: Simplified molecular-input line-entry system
SPE: Specificity

Authors’ contributions

HL and MY developed the prediction models and conducted the evaluations. SK, SO, HC, and KR prepared the training and test datasets and their features. MB, BL, DS, and KO conducted the hERG-related toxicity assays. DL and DN supervised the study. All the authors have read and approved the manuscript.

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that there are no conflicts of interest.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Contributor Information

Hyang-Mi Lee, Email: myhys84@cau.ac.kr.

Donghyun Lee, Email: dhlee@cau.ac.kr.

Dokyun Na, Email: blisszen@cau.ac.kr.

References

1. Tristani-Firouzi M, Chen J, Mitcheson JS, Sanguinetti MC. Molecular biology of K(+) channels and their role in cardiac arrhythmias. Am J Med. 2001;110(1):50–59. doi: 10.1016/s0002-9343(00)00623-9.
2. Sanguinetti MC, Tristani-Firouzi M. hERG potassium channels and cardiac arrhythmia. Nature. 2006;440(7083):463–469. doi: 10.1038/nature04710.
3. Laverty H, Benson C, Cartwright E, Cross M, Garland C, Hammond T, et al. How can we improve our understanding of cardiovascular safety liabilities to develop safer medicines? Br J Pharmacol. 2011;163(4):675–693. doi: 10.1111/j.1476-5381.2011.01255.x.
4. Polak S, Wisniowska B, Brandys J. Collation, assessment and analysis of literature in vitro data on hERG receptor blocking potency for subsequent modeling of drugs' cardiotoxic properties. J Appl Toxicol. 2009;29(3):183–206. doi: 10.1002/jat.1395.
5. Kratz JM, Schuster D, Edtbauer M, Saxena P, Mair CE, Kirchebner J, et al. Experimentally validated hERG pharmacophore models as cardiotoxicity prediction tools. J Chem Inf Model. 2014;54(10):2887–2901. doi: 10.1021/ci5001955.
6. Braga RC, Alves VM, Silva MF, Muratov E, Fourches D, Liao LM, et al. Pred-hERG: a novel web-accessible computational tool for predicting cardiac toxicity. Mol Inform. 2015;34(10):698–701. doi: 10.1002/minf.201500040.
7. Chemi G, Gemma S, Campiani G, Brogi S, Butini S, Brindisi M. Computational tool for fast in silico evaluation of hERG K(+) channel affinity. Front Chem. 2017;5:7. doi: 10.3389/fchem.2017.00007.
8. Munawar S, Windley MJ, Tse EG, Todd MH, Hill AP, Vandenberg JI, et al. Experimentally validated pharmacoinformatics approach to predict hERG inhibition potential of new chemical entities. Front Pharmacol. 2018;9:1035. doi: 10.3389/fphar.2018.01035.
9. Ekins S, Crumb WJ, Sarazan RD, Wikel JH, Wrighton SA. Three-dimensional quantitative structure-activity relationship for inhibition of human ether-a-go-go-related gene potassium channel. J Pharmacol Exp Ther. 2002;301(2):427–434. doi: 10.1124/jpet.301.2.427.
10. Aronov AM. Common pharmacophores for uncharged human ether-a-go-go-related gene (hERG) blockers. J Med Chem. 2006;49(23):6917–6921. doi: 10.1021/jm060500o.
11. Jing Y, Easter A, Peters D, Kim N, Enyedy IJ. In silico prediction of hERG inhibition. Future Med Chem. 2015;7(5):571–586. doi: 10.4155/fmc.15.18.
12. Tan Y, Chen Y, You Q, Sun H, Li M. Predicting the potency of hERG K(+) channel inhibition by combining 3D-QSAR pharmacophore and 2D-QSAR models. J Mol Model. 2012;18(3):1023–1036. doi: 10.1007/s00894-011-1136-y.
13. Cavalli A, Poluzzi E, De Ponti F, Recanatini M. Toward a pharmacophore for drugs inducing the long QT syndrome: insights from a CoMFA study of HERG K(+) channel blockers. J Med Chem. 2002;45(18):3844–3853. doi: 10.1021/jm0208875.
14. Carosati E, Lemoine H, Spogli R, Grittner D, Mannhold R, Tabarrini O, et al. Binding studies and GRIND/ALMOND-based 3D QSAR analysis of benzothiazine type K(ATP)-channel openers. Bioorg Med Chem. 2005;13(19):5581–5591. doi: 10.1016/j.bmc.2005.06.010.
15. Ermondi G, Visentin S, Caron G. GRIND-based 3D-QSAR and CoMFA to investigate topics dominated by hydrophobic interactions: the case of hERG K+ channel blockers. Eur J Med Chem. 2009;44(5):1926–1932. doi: 10.1016/j.ejmech.2008.11.009.
16. Jia L, Sun H. Support vector machines classification of hERG liabilities based on atom types. Bioorg Med Chem. 2008;16(11):6252–6260. doi: 10.1016/j.bmc.2008.04.028.
17. Wang S, Li Y, Wang J, Chen L, Zhang L, Yu H, et al. ADMET evaluation in drug discovery. 12. Development of binary classification models for prediction of hERG potassium channel blockage. Mol Pharm. 2012;9(4):996–1010. doi: 10.1021/mp300023x.
18. Le Guennec JY, Thireau J, Ouille A, Roussel J, Roy J, Richard S, et al. Inter-individual variability and modeling of electrical activity: a possible new approach to explore cardiac safety? Sci Rep. 2016;6:37948. doi: 10.1038/srep37948.
19. Thai KM, Ecker GF. A binary QSAR model for classification of hERG potassium channel blockers. Bioorg Med Chem. 2008;16(7):4107–4119. doi: 10.1016/j.bmc.2008.01.017.
20. Czodrowski P. hERG me out. J Chem Inf Model. 2013;53(9):2240–2251. doi: 10.1021/ci400308z.
21. Weininger D. SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules. J Chem Inf Model. 1988;28(1):31–36.
22. Mauri A, Consonni V, Pavan M, Todeschini R. Dragon software: an easy approach to molecular descriptor calculations. Match-Commun Math Co. 2006;56(2):237–248.
23. Rogers D, Hahn M. Extended-connectivity fingerprints. J Chem Inf Model. 2010;50(5):742–754. doi: 10.1021/ci100050t.
24. Cox DR, Wermuth N. A comment on the coefficient of determination for binary responses. Am Stat. 1992;46(1):1–4.
25. Tate RF. Correlation between a discrete and a continuous variable. Point-biserial correlation. Ann Math Stat. 1954;25(3):603–607.
26. Kutner MH. Applied linear statistical models. 5th ed. Boston: McGraw-Hill Irwin; 2005. xxviii, p. 1396.
27. Hoerl AE, Kennard RW. Ridge regression: biased estimation for nonorthogonal problems. Technometrics. 1970;12(1):55–67.
28. Carey V, Zeger SL, Diggle P. Modelling multivariate binary data with alternating logistic regressions. Biometrika. 1993;80(3):517–526.
29. Yousef M, Nebozhyn M, Shatkay H, Kanterakis S, Showe LC, Showe MK. Combining multi-species genomic data for microRNA identification using a Naïve Bayes classifier. Bioinformatics. 2006;22(11):1325–1334. doi: 10.1093/bioinformatics/btl094.
30. Boulesteix AL, Janitza S, Kruppa J, Konig IR. Overview of random forest methodology and practical guidance with emphasis on computational biology and bioinformatics. Wires Data Min Knowl. 2012;2(6):493–507.
31. Wang YH, Li Y, Yang SL, Yang L. An in silico approach for screening flavonoids as P-glycoprotein inhibitors based on a Bayesian-regularized neural network. J Comput Aided Mol Des. 2005;19(3):137–147. doi: 10.1007/s10822-005-3321-5.
32. Demsar J, Curk T, Erjavec A, Gorup C, Hocevar T, Milutinovic M, et al. Orange: data mining toolbox in python. J Mach Learn Res. 2013;14:2349–2353.
33. Li X, Zhang Y, Li H, Zhao Y. Modeling of the hERG K+ channel blockage using Online Chemical Database and Modeling Environment (OCHEM). Mol Inform. 2017;36(12):1700074. doi: 10.1002/minf.201700074.

Data Availability Statement

The datasets supporting the conclusions of this article are available from the corresponding author upon request.
