J Comput Aided Mol Des. 2016 Feb 10;30:229–236. doi: 10.1007/s10822-016-9898-z

A k-nearest neighbor classification of hERG K+ channel blockers

Swapnil Chavan 1, Ahmed Abdelaziz 2, Jesper G Wiklander 1, Ian A Nicholls 1,3
PMCID: PMC4802000  PMID: 26860111

Abstract

A series of 172 molecular structures that block the hERG K+ channel were used to develop a classification model where, initially, eight types of PaDEL fingerprints were used for k-nearest neighbor model development. A consensus model constructed using Extended-CDK, PubChem and Substructure count fingerprint-based models was found to be a robust predictor of hERG activity. This consensus model demonstrated sensitivity and specificity values of 0.78 and 0.61 for the internal dataset compounds and 0.63 and 0.54 for the external (PubChem) dataset compounds, respectively. This model has identified the highest number of true positives (i.e. 140) from the PubChem dataset so far, as compared to other published models, and can potentially serve as a basis for the prediction of hERG active compounds. Validating this model against FDA-withdrawn substances indicated that it may even be useful for differentiating between mechanisms underlying QT prolongation.

Electronic supplementary material

The online version of this article (doi:10.1007/s10822-016-9898-z) contains supplementary material, which is available to authorized users.

Keywords: Classification model, hERG blockers, Ikr, KCNH2, k-nearest neighbor (k-NN), Toxicity

Introduction

The human ether-a-go-go related gene (hERG, KCNH2) encodes a voltage-dependent K+ ion channel (Kv11.1). Blocking of this channel has been associated with potentially severe cardiac arrhythmia, and several drugs have been withdrawn from the market for this reason [1–6]. Further, drug-induced long QT syndrome may cause avoidable sudden cardiac arrest [3, 4]. With the intention of protecting clinical trial participants and patients, the International Conference on Harmonisation published a guideline (S7B) recommending that “all new drugs” should be tested pre-clinically for hERG sensitivity and cardiac safety before an application is submitted for regulatory review [7]. Accordingly, the early assessment of hERG-related cardiotoxicity has become common practice in drug discovery.

Many in vitro assays exist for the pre-clinical evaluation of hERG-related cardiotoxicity [8]; examples include rubidium-flux assays, radioligand binding assays, in vitro electrophysiology measurements, and fluorescence-based assays [9]. In addition, in silico models have been proposed for identifying potential hERG blockers in drug discovery processes [10, 11].

Efforts to use computational methods for the prediction of hERG blocking effects have ranged from simple rules based on structural and functional features through to more complex quantitative structure–activity relationship (QSAR) models [12–16]. A number of QSAR models have been developed for the hERG toxicity endpoint using different machine learning algorithms, such as multiple linear regression [17], partial least squares (PLS) [18], k-nearest neighbor algorithms (k-NN) [19], artificial neural networks [20], support vector machines (SVM) [21], random forest [22] and naive Bayesian classification [23]. Despite these efforts, there is significant scope for the development of more powerful and more easily deployed predictive models.

The recent development of open source fingerprints, such as PaDEL fingerprints, which are libraries of descriptors [24], allows ready access to tools for predicting biological endpoints. A recent report on the use of PaDEL fingerprints in conjunction with a k-NN strategy aimed at the prediction of chronic toxicity [25] prompted us to apply this approach to hERG-channel blockers, a far more focused system. It was envisaged that publicly available data on a series of hERG-channel blockers could function as a starting point for model construction, and that a series of 1953 PubChem compounds could act as a basis for validation.

Methodology

Description of dataset

IC50 data for 172 Ikr (‘rapid’ delayed rectifier current) channel blockers were retrieved from the webservers OCHEM [26] and Fenichel [27]. These 172 compounds are structurally diverse and belong to different therapeutic classes. The compounds were authenticated with respect to structure and IUPAC name. After authentication, the SMILES notations for all 172 compounds were verified using ChemSpider [28], SigmaAldrich [29] and PubChem [30]. A PubChem dataset comprising 1953 entries was chosen for the external validation [31]. Dataset entries that were mixtures or salts were discarded, leading to a final PubChem validation set of 1795 compounds. More details about the training and test set compounds are provided in Online Resources 1 and 2, respectively.

Descriptor calculation

The descriptor calculation was a primary requirement for the construction of the classification model. Eight types of PaDEL fingerprints were calculated for both the training and test set compounds using PaDEL software [24]. These consisted of the CDK, Extended CDK, CDK Graph, Estate, MACCS, PubChem, Sub-structure and Sub-structure count fingerprints. Each of the eight types of fingerprints was then used, separately, to develop a classification model.

Class assignment

Each training set compound was assigned to one of two classes (active or inactive) using an IC50 threshold value of 5 µM. The test set compounds derived from the PubChem dataset were similarly classified as either active or inactive, here using a % inhibition threshold of 20 %. The numbers of compounds in each class are summarized in Table 1.

Table 1.

Classification of training and test set compounds

| Dataset | Class 1 (hERG active) | Class 2 (hERG inactive) | Total |
|---|---|---|---|
| Training | 93 | 79 | 172 |
| Test | 221 | 1574 | 1795 |
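
The class assignment can be illustrated with a short Python sketch. The two cut-offs come from the text; the direction of each comparison (whether a value exactly at the cut-off counts as active) is not stated in the source and is an assumption here, as are the function names and the example values.

```python
from typing import Literal

Label = Literal["active", "inactive"]

IC50_CUTOFF_UM = 5.0           # training set threshold given in the text
INHIBITION_CUTOFF_PCT = 20.0   # test set threshold given in the text


def label_from_ic50(ic50_um: float) -> Label:
    # Assumption: an IC50 at or below the cut-off is treated as active (class 1).
    return "active" if ic50_um <= IC50_CUTOFF_UM else "inactive"


def label_from_inhibition(percent: float) -> Label:
    # Assumption: inhibition at or above the cut-off is treated as active (class 1).
    return "active" if percent >= INHIBITION_CUTOFF_PCT else "inactive"


# Hypothetical measurements, not taken from either dataset.
training_labels = [label_from_ic50(x) for x in (0.03, 5.0, 12.4)]
test_labels = [label_from_inhibition(x) for x in (3.5, 20.0, 64.0)]
```

Because the two datasets are labelled from different measurements (IC50 versus % inhibition), the class boundaries of the training and test sets are not guaranteed to coincide.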

Software and modules

The Matlab module “classification_toolbox” [32], which is freely available [33], was employed for the development of the k-NN classification model.

Classification model development

The k-nearest neighbor (k-NN) classification method employed used cross validation (CV) to identify optimal k values [34, 35]. Models were constructed for k values from 1 to 10, and the k value giving the lowest class error was taken as optimal.

A five-fold cross validation was implemented by first dividing the training set into five groups of approximately equal size, four of which were used for model construction and the remaining one for validation. This procedure was repeated so that each of the five groups was used for validating the models constructed using the remaining four. After cross validation, the models were subjected to external validation using the 1795 PubChem compounds. The performance of each classification model was assessed by means of statistical parameters, such as non-error rate (NER), sensitivity, specificity, precision and error rate [36]. The models were then analysed and compared on the basis of these statistical parameters.
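
The selection of k by cross validation can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the classification_toolbox routine used in the study: the Tanimoto distance, the tie-breaking behaviour, the definition of class error as 1 − NER, the pooling of predictions across folds, and the toy fingerprints are all choices made here for the example.

```python
import random
from collections import Counter


def tanimoto_distance(a, b):
    """1 - Tanimoto similarity between two equal-length 0/1 fingerprints."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return 0.0 if either == 0 else 1.0 - both / either


def knn_predict(train_x, train_y, query, k):
    """Majority class among the k training fingerprints closest to the query."""
    ranked = sorted(range(len(train_x)),
                    key=lambda i: tanimoto_distance(train_x[i], query))
    # On a tied vote, the class seen first (i.e. the nearer neighbor) wins.
    return Counter(train_y[i] for i in ranked[:k]).most_common(1)[0][0]


def class_error(y_true, y_pred):
    """Assumed definition: 1 - NER, with NER the mean of the per-class recalls."""
    recalls = []
    for cls in sorted(set(y_true)):
        idx = [i for i, y in enumerate(y_true) if y == cls]
        recalls.append(sum(y_pred[i] == cls for i in idx) / len(idx))
    return 1.0 - sum(recalls) / len(recalls)


def select_k(x, y, k_values=range(1, 11), n_folds=5, seed=0):
    """Return (best_k, errors) using n-fold cross validation."""
    order = list(range(len(x)))
    random.Random(seed).shuffle(order)
    folds = [order[f::n_folds] for f in range(n_folds)]
    errors = {}
    for k in k_values:
        y_true, y_pred = [], []
        for held_out in folds:
            held = set(held_out)
            train = [i for i in order if i not in held]
            train_x = [x[i] for i in train]
            train_y = [y[i] for i in train]
            for i in held_out:  # each compound is predicted exactly once
                y_true.append(y[i])
                y_pred.append(knn_predict(train_x, train_y, x[i], k))
        errors[k] = class_error(y_true, y_pred)
    # Lowest class error wins; the smaller k is preferred on a tie.
    best_k = min(errors, key=lambda k: (errors[k], k))
    return best_k, errors


# Toy fingerprints (hypothetical): class 1 favours the left bits, class 2 the right.
rng = random.Random(1)
toy_x, toy_y = [], []
for n in range(40):
    cls = 1 if n % 2 == 0 else 2
    left = [int(rng.random() < (0.8 if cls == 1 else 0.1)) for _ in range(8)]
    right = [int(rng.random() < (0.1 if cls == 1 else 0.8)) for _ in range(8)]
    toy_x.append(left + right)
    toy_y.append(cls)

best_k, cv_errors = select_k(toy_x, toy_y)
```

With the real data, `toy_x` would be replaced by one fingerprint type for the 172 training compounds and the procedure repeated for each of the eight fingerprint types.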

Results and discussion

Construction of eight k-NN classification models

The k-nearest neighbor (k-NN) classification method was employed to construct classification models using each of the eight PaDEL fingerprints. Employing the k-NN algorithm requires that the optimal value of k is determined [34]. There are several ways to determine the k value, e.g. through application of a risk function or empirical rules, or through cross validation. Here, cross validation was used to determine the optimal k value.

Eight k-NN classification models were constructed, one for each type of PaDEL fingerprint, and compared with respect to a series of statistical parameters (Table 2).

Table 2.

Summary of statistical parameters for the k-NN classification models

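| Entry | Fingerprints | Evaluation | NER | k | Sensitivity (Class 1) | Sensitivity (Class 2) | Specificity (Class 1) | Specificity (Class 2) |
|---|---|---|---|---|---|---|---|---|
| 1 | CDK | Fitting | 0.68 | 1 | 0.72 | 0.65 | 0.65 | 0.72 |
| | | CV | 0.66 | 1 | 0.72 | 0.61 | 0.61 | 0.72 |
| | | External | 0.54 | 1 | 0.52 | 0.57 | 0.57 | 0.52 |
| 2 | Estate | Fitting | 0.68 | 1 | 0.73 | 0.62 | 0.62 | 0.73 |
| | | CV | 0.66 | 1 | 0.72 | 0.61 | 0.61 | 0.72 |
| | | External | 0.53 | 1 | 0.49 | 0.57 | 0.57 | 0.49 |
| 3 | Extended CDK | Fitting | 0.67 | 1 | 0.70 | 0.63 | 0.63 | 0.70 |
| | | CV | 0.65 | 1 | 0.70 | 0.61 | 0.61 | 0.70 |
| | | External | 0.56 | 1 | 0.56 | 0.57 | 0.57 | 0.56 |
| 4 | CDK graph | Fitting | 0.64 | 1 | 0.69 | 0.59 | 0.59 | 0.69 |
| | | CV | 0.64 | 1 | 0.70 | 0.58 | 0.58 | 0.70 |
| | | External | 0.55 | 1 | 0.52 | 0.57 | 0.57 | 0.52 |
| 5 | MACCS | Fitting | 0.68 | 6 | 0.76 | 0.59 | 0.59 | 0.76 |
| | | CV | 0.67 | 6 | 0.76 | 0.57 | 0.57 | 0.76 |
| | | External | 0.55 | 6 | 0.54 | 0.55 | 0.55 | 0.54 |
| 6 | PubChem | Fitting | 0.60 | 3 | 0.69 | 0.52 | 0.52 | 0.69 |
| | | CV | 0.60 | 3 | 0.71 | 0.49 | 0.49 | 0.71 |
| | | External | 0.57 | 3 | 0.62 | 0.52 | 0.52 | 0.62 |
| 7 | Sub-structure | Fitting | 0.68 | 1 | 0.70 | 0.67 | 0.67 | 0.70 |
| | | CV | 0.67 | 1 | 0.69 | 0.66 | 0.66 | 0.69 |
| | | External | 0.57 | 1 | 0.54 | 0.59 | 0.59 | 0.54 |
| 8 | Sub-structure count | Fitting | 0.67 | 1 | 0.74 | 0.61 | 0.61 | 0.74 |
| | | CV | 0.68 | 1 | 0.72 | 0.65 | 0.65 | 0.72 |
| | | External | 0.58 | 1 | 0.61 | 0.56 | 0.56 | 0.61 |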

CDK fingerprints are one-dimensional, 1024-bit arrays whose bits are set according to the occurrence of particular structural elements. The Extended CDK fingerprints are extended versions of CDK fingerprints that include ring features. CDK Graph fingerprints are specialized versions of the CDK fingerprints that exclude bond orders. Estate fingerprints represent the influence of substituent electronic effects in a given compound. PubChem fingerprints are binary substructure fingerprints of length 881. MACCS fingerprints consist of 166 keys that are based on SMARTS patterns [37, 38]. The Sub-structure fingerprints represent 307 SMARTS patterns for different functional groups, whereas the count of these SMARTS patterns is referred to as the Sub-structure count fingerprint [37].

The sensitivity expresses the prediction accuracy for hERG-active compounds, whereas the specificity reflects the prediction accuracy for hERG-inactive compounds. The models performed similarly in terms of the statistical parameters examined. Thus, to further improve the predictive power of these models, we developed a series of consensus models. Several methods have been reported for consensus model development [39]. For classification models, the majority principle [40] is commonly employed, and we have used this strategy to develop consensus models based upon three, five and seven different fingerprint-based models. As it is more important to identify hERG-active compounds than hERG-inactive compounds, the eight models (from Table 2) were examined with respect to their sensitivity in the external prediction. The Estate-fingerprint-based model exhibited relatively poor sensitivity (0.49) and was discarded from the consensus model building procedure to provide an odd number (seven) of fingerprints. Six consensus models were built using different combinations of the seven remaining fingerprint-based models (Table 3).
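
The majority principle can be illustrated with a short Python sketch. The function name and the per-model predictions below are hypothetical; only the voting rule and the use of an odd number of models follow the text.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the class predicted by most models for one compound."""
    if len(predictions) % 2 == 0:
        # With two classes, an odd number of votes can never tie.
        raise ValueError("use an odd number of models to avoid ties")
    return Counter(predictions).most_common(1)[0][0]


# Hypothetical predictions for three compounds (1 = hERG active, 2 = hERG inactive).
per_model = {
    "PubChem": [1, 2, 1],
    "Sub-structure count": [1, 2, 2],
    "Extended CDK": [2, 2, 1],
}

# zip(*...) regroups the per-model lists into one tuple of votes per compound.
consensus = [majority_vote(votes) for votes in zip(*per_model.values())]
```

This also shows why the number of contributing models was kept odd: with an even number, a compound could receive equal votes for both classes and the rule would be undefined.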

Table 3.

Statistical parameters for the consensus models

| Model | Dataset | TP | FP | TN | FN | TP + TN | Total | Q | Sens. | Spec. | Prec. | G-mean |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Training | 72 | 25 | 54 | 21 | 126 | 172 | 0.73 | 0.77 | 0.68 | 0.74 | 0.73 |
| | Validation | 130 | 654 | 920 | 91 | 1050 | 1795 | 0.58 | 0.59 | 0.58 | 0.17 | 0.59 |
| 2 | Training | 73 | 31 | 48 | 20 | 121 | 172 | 0.70 | 0.78 | 0.61 | 0.70 | 0.69 |
| | Validation | 140 | 723 | 851 | 81 | 991 | 1795 | 0.55 | 0.63 | 0.54 | 0.16 | 0.59 |
| 3 | Training | 71 | 31 | 48 | 22 | 119 | 172 | 0.69 | 0.76 | 0.61 | 0.70 | 0.68 |
| | Validation | 135 | 707 | 867 | 86 | 1002 | 1795 | 0.56 | 0.61 | 0.55 | 0.16 | 0.58 |
| 4 | Training | 74 | 32 | 47 | 19 | 121 | 172 | 0.70 | 0.80 | 0.59 | 0.70 | 0.69 |
| | Validation | 128 | 718 | 856 | 93 | 984 | 1795 | 0.55 | 0.58 | 0.54 | 0.15 | 0.56 |
| 5 | Training | 73 | 29 | 50 | 20 | 123 | 172 | 0.72 | 0.78 | 0.63 | 0.72 | 0.70 |
| | Validation | 132 | 685 | 889 | 89 | 1021 | 1795 | 0.57 | 0.60 | 0.56 | 0.16 | 0.58 |
| 6 | Training | 73 | 28 | 51 | 20 | 124 | 172 | 0.72 | 0.78 | 0.65 | 0.72 | 0.71 |
| | Validation | 131 | 675 | 899 | 90 | 1030 | 1795 | 0.57 | 0.59 | 0.57 | 0.16 | 0.58 |

Model 1 = substructure (SS) + substructure count (SSC) + extended CDK (ECDK); 2 = PubChem (PC) + SSC + ECDK; 3 = PC + SSC + SS; 4 = PC + SSC + MACCS; 5 = PC + SSC + ECDK + SS + MACCS; 6 = PC + SSC + ECDK + SS + MACCS + CDK + CDK Graph. TP, true positives; FP, false positives; TN, true negatives; FN, false negatives; Total = TP + TN + FP + FN; Q, overall accuracy of prediction; Sens., sensitivity; Spec., specificity; Prec., precision; G-mean = $\sqrt{\text{Sensitivity} \times \text{Specificity}}$

Although consensus model 1 shows better overall accuracy of prediction (Q), consensus model 2 shows higher sensitivity for test set prediction, and was thus chosen for further studies.
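
The trade-off between consensus models 1 and 2 can be checked by recomputing the Table 3 statistics from the four confusion-matrix counts. The sketch below uses the standard definitions of these parameters; the function name is hypothetical, and the counts are the validation rows for models 1 and 2 copied from Table 3.

```python
from math import sqrt


def summarize(tp, fp, tn, fn):
    """Overall accuracy, sensitivity, specificity, precision and G-mean."""
    sensitivity = tp / (tp + fn)   # fraction of actives found
    specificity = tn / (tn + fp)   # fraction of inactives found
    return {
        "Q": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": tp / (tp + fp),
        "g_mean": sqrt(sensitivity * specificity),
    }


# Validation set counts for consensus models 1 and 2 (Table 3).
model_1 = summarize(tp=130, fp=654, tn=920, fn=91)
model_2 = summarize(tp=140, fp=723, tn=851, fn=81)

for name, stats in (("model 1", model_1), ("model 2", model_2)):
    print(name, {key: round(value, 2) for key, value in stats.items()})
```

Rounded to two decimals, the recomputed values match the validation rows of Table 3: model 2 finds ten more actives than model 1 at the cost of 69 additional false positives.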

Individual contribution of each model

With consensus model 2 in hand, we then examined how individual training set compounds were handled by the consensus model and by the three individual models on which it is based, i.e. the Extended CDK, PubChem and Substructure count fingerprint-based models (Fig. 1).

Fig. 1.

Venn diagram representing the number of training set compounds correctly predicted by all three models (yellow), by any two models (magenta), by only one model (blue) and by none of the models (green). The shaded area represents compounds correctly predicted by the consensus model

The consensus model correctly predicted 121 of the 172 training set compounds. Of these 121 compounds, 69 were predicted correctly by all three individual models, while the remaining 52 were correctly predicted by two of the three models. Conversely, the consensus model incorrectly predicted 51 training set compounds. Of these 51, 25 compounds were predicted correctly by only one of the three models, whereas the remaining 26 compounds were incorrectly predicted by all three models.
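
The breakdown above follows directly from the voting rule: with three models and two classes, the consensus is correct exactly when at least two models are correct. A minimal Python sketch of this bookkeeping is given below; the function name and the toy labels are hypothetical, while the `reported` counts are those stated in the text.

```python
from collections import Counter


def agreement_breakdown(truth, *model_predictions):
    """Map 'number of models that were correct' to a count of compounds."""
    counts = Counter()
    for i, actual in enumerate(truth):
        n_correct = sum(pred[i] == actual for pred in model_predictions)
        counts[n_correct] += 1
    return counts


# Hypothetical labels for four compounds and three models.
truth = [1, 1, 2, 2]
toy = agreement_breakdown(truth, [1, 1, 2, 1], [1, 2, 2, 1], [1, 2, 1, 1])

# Counts reported in the text for the 172 training set compounds.
reported = {3: 69, 2: 52, 1: 25, 0: 26}
consensus_correct = reported[3] + reported[2]    # majority of the three is right
consensus_incorrect = reported[1] + reported[0]  # majority of the three is wrong
```

The two sums reproduce the 121 correct and 51 incorrect consensus predictions, which together account for all 172 training set compounds.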

In the case of the Extended CDK fingerprint-based model, 113 of the 172 compounds were correctly predicted, 65 of which were hERG actives. The PubChem fingerprint-based model predicted 105 training set compounds correctly, 66 from class 1 and 39 from class 2. The Substructure count fingerprint-based model predicted 118 training set compounds correctly, 67 from class 1 and 51 from class 2.

Compounds whose activities were not correctly predicted by our models are of interest, as awareness of the factors contributing to incorrect predictions can help in the refinement of models. In this case, the IC50-based endpoints are derived from a range of studies, so the impact of inter-laboratory variation in the reported IC50 data on model performance cannot be excluded.

Comparison of our model with other models

External validation provides an assessment of a QSAR model’s performance, and to compare models it is necessary that the external validations are performed on the same dataset. The PubChem dataset comprises 221 hERG-actives and 1574 hERG-inactives. Sensitivity and specificity are generally used to assess classification performance in imbalanced binary class studies [41]. G-mean, which is the geometric mean of sensitivity and specificity, was also used to measure the performance of the classification method in predicting actives and inactives. In studies aimed at the effective detection of only one class, as in our case where the prediction of hERG-actives is a priority, sensitivity and F-measures are often adopted [41]. Accordingly, we have compared our model with previously published models that were externally validated with the PubChem dataset [18, 42–44], with respect to sensitivity, specificity, G-mean and F-measure (Table 4).
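
As a worked example of these two measures, the external validation counts of consensus model 2 from Table 3 (TP = 140, FP = 723, TN = 851, FN = 81) can be inserted into the standard definitions. The algebraic simplification of the F-measure to counts is added here for convenience and is not part of the original text.

```latex
\begin{aligned}
\text{G-mean} &= \sqrt{\text{Sensitivity} \times \text{Specificity}}
  = \sqrt{\frac{140}{221} \times \frac{851}{1574}} \approx 0.59 \\[4pt]
\text{F-measure} &= \frac{2 \times \text{Precision} \times \text{Sensitivity}}{\text{Precision} + \text{Sensitivity}}
  = \frac{2\,\mathrm{TP}}{2\,\mathrm{TP} + \mathrm{FP} + \mathrm{FN}}
  = \frac{280}{280 + 723 + 81} \approx 0.26
\end{aligned}
```

The low F-measure reflects the class imbalance of the PubChem dataset: with 1574 inactives against 221 actives, even a moderate false positive rate produces many more false positives than true positives, which keeps the precision (140/863 ≈ 0.16) low.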

Table 4.

Comparison of the k-NN classification model with other models

| | Our study | Su et al. [42] | Wang et al. [43] | Su et al. [18] | Li et al. [44] |
|---|---|---|---|---|---|
| Method | k-NN | SVM | Naive Bayesian classifier | PLS transformed into binary QSAR | SVM |
| Descriptors | 2D PaDEL fingerprints | 2D and 3D MOE, 4D fingerprints from MD simulation | Physico-chemical property based and geometry based descriptors, and fingerprints | 2D and 3D MOE descriptors and 4D fingerprints | GRIND descriptors derived from docking |
| Training set | | | | | |
| Cut-off (µM) | 5 | 10 | 40 | | 40 |
| Total | 172 | 546 | 719 | 250 | 495 |
| True positives | 73 | 188 | 247 | | 83 |
| True negatives | 48 | 242 | 315 | | 283 |
| Sensitivity | 0.78 | 0.90 | 0.89 | | 0.55 |
| Specificity | 0.61 | 0.72 | 0.72 | | 0.83 |
| Q | 0.70 | 0.79 | 0.78 | | 0.74 |
| F-measure | 0.74 | 0.76 | 0.76 | | 0.56 |
| G-mean | 0.69 | 0.80 | 0.80 | | 0.67 |
| Test set | | | | | |
| Cut-off (% hERG blockage) | 20 | 20 | 20 | 20 | 20 |
| Total | 1795 | 1668 | 1953 | 1668 | 1877 |
| True positives | 140 | 67 | 135 | 121 | 107 |
| True negatives | 851 | 1298 | 1247 | 963 | 1271 |
| Sensitivity | 0.63 | 0.41 | 0.54 | 0.74 | 0.57 |
| Specificity | 0.54 | 0.86 | 0.73 | 0.64 | 0.75 |
| Q | 0.55 | 0.82 | 0.71 | 0.65 | 0.73 |
| F-measure | 0.26 | 0.31 | 0.32 | 0.29 | 0.30 |
| G-mean | 0.59 | 0.60 | 0.63 | 0.69 | 0.66 |

F-measure = 2[(precision × sensitivity)/(precision + sensitivity)]

As presented in Table 4, three of the four previously described models demonstrate lower test set sensitivities than our model, though it should be pointed out that the IC50 thresholds used in the various studies varied between 5 and 40 µM. From a drug development perspective, it may be argued that it is of more interest to identify the potent hERG blockers (class 1) than the hERG-inactive compounds (class 2). Comparison on this point reveals that our model demonstrates better performance in predicting the hERG-active compounds (true positives = 140, sensitivity = 0.63) than the other models, except the model presented by Su et al. in 2010 [18]. There, 163 hERG actives from the PubChem dataset were used for the external validation, whereas in our study a somewhat more comprehensive external validation was performed using 221 hERG actives.

From a practical perspective, ease of use is important, and an advantage of our model is that PaDEL fingerprints are fast and easy to calculate and do not involve complicated descriptor selection procedures. This is in contrast with the other models presented in Table 4, all of which employed 3D and 4D descriptors that require geometry optimization, a task necessitating significant computational resources. In addition, the application of different descriptor selection procedures makes these tasks more cumbersome. Therefore, in comparison to the other models, our model has the advantage of being fast, simple and relatively efficient in predicting hERG toxic compounds.

To further assess the potential of our consensus model, we turned our attention to the 47 substances withdrawn from use on account of QT prolongation, which can be hERG-derived, listed in the WITHDRAWN database [45] (database last updated December 2015). Our training set included 32 of these 47 drugs (shown in bold in Online Resource 1), and our model correctly predicted the IC50-based classes of 22 of them. We interrogated the remaining 15 withdrawn substances (see Online Resource 3) using our model, which correctly predicted the IC50-based classes of 11 (73 %, see Online Resource 4). It is important to note that our model is based solely upon in vitro data (hERG IC50), while the basis for withdrawal, QT prolongation, is derived from in vivo data. The interpretation of the QT prolongation endpoint is itself a major challenge, as mechanisms other than hERG activity can also underlie QT prolongation [4, 46, 47]. This is reflected in the fact that, of the 11 correctly classified substances, five were class 1 and six were class 2 based on their hERG IC50. This observation suggests that the model may even be useful for differentiating between mechanisms underlying QT prolongation.

A general reflection upon examining the hERG active compounds predicted by our model was the prevalence of aromatic and basic functionalities in these compounds (for example, see Online Resource 2). These features have previously been identified as essential components in a pharmacophore for central nervous system activity [48, 49] and we believe should be considered in future model development. Moreover, this may be considered indicative of a common evolutionary origin for the hERG voltage dependent K+ ion channel and CNS receptors [50, 51].

Conclusion

In conclusion, the PaDEL fingerprint-based k-NN classification models presented here show potential as tools for the prediction of the hERG toxicity endpoint, an important issue in modern drug development. In particular, the consensus model developed using the Extended CDK, PubChem and Sub-structure count fingerprint-based models performed comparably with models employing more complicated descriptors in the validation with external datasets. In terms of identifying hERG-active compounds, the model presented here compares favorably with these previously published models. Furthermore, validating this model against FDA-withdrawn substances indicates that it may be useful for differentiating between hERG-derived QT prolongation and other QT prolongation mechanisms. Accordingly, we believe that this model may provide a basis for improved drug design.

Acknowledgments

We acknowledge financial support from the EU FP-7 Environmental Chemoinformatics (ECO) project (Grant Number-238701) and Linnaeus University, Sweden, and express our sincere thanks to Dr. Igor Tetko for valuable advice, comments and guidance during this work. The authors also thank Dr. Yurii Sushko, Dr. Robert Körner and Dr. Sergii Novotarskyi from eADMET, Germany, for their assistance with data collection and technical support. Finally, the authors sincerely thank Prof. Roberto Todeschini (Chemometrics and QSAR research group, University of Milan, Italy) for sharing the classification_toolbox Matlab routines for the k-NN model development.

Abbreviations

CDK: Chemistry development kit

CV: Cross validation

hERG: Human ether-a-go-go-related gene

IUPAC: International union of pure and applied chemistry

k-NN: k-nearest neighbor

MACCS: Molecular ACCess system

NER: Non-error rate

QSAR: Quantitative structure–activity relationship

SMARTS: SMILES arbitrary target specification

SMILES: Simplified molecular-input line-entry system

Compliance with ethical standards

Conflict of interest

The authors declare no conflict of interest.

Contributor Information

Swapnil Chavan, Email: swapnil.chavan@lnu.se.

Ian A. Nicholls, Email: ian.nicholls@lnu.se

References

  • 1. Warmke JW, Ganetzky B. A family of potassium channel genes related to eag in Drosophila and mammals. Proc Natl Acad Sci. 1994;91(8):3438–3442. doi: 10.1073/pnas.91.8.3438.
  • 2. Choe H, Nah KH, Lee SN, Lee HS, Lee HS, Jo SH, Leem CH, Jang YJ. A novel hypothesis for the binding mode of HERG channel blockers. Biochem Biophys Res Commun. 2006;344(1):72–78. doi: 10.1016/j.bbrc.2006.03.146.
  • 3. Raschi E, Ceccarini L, De Ponti F, Recanatini M. hERG-related drug toxicity and models for predicting hERG liability and QT prolongation. Expert Opin Drug Metab Toxicol. 2009;5(9):1005–1021. doi: 10.1517/17425250903055070.
  • 4. Redfern W, Carlsson L, Davis A, Lynch W, MacKenzie I, Palethorpe S, Siegl P, Strang I, Sullivan A, Wallis R. Relationships between preclinical cardiac electrophysiology, clinical QT interval prolongation and torsade de pointes for a broad range of drugs: evidence for a provisional safety margin in drug development. Cardiovasc Res. 2003;58(1):32–45. doi: 10.1016/S0008-6363(02)00846-5.
  • 5. De Ponti F, Poluzzi E, Montanaro N. QT-interval prolongation by non-cardiac drugs: lessons to be learned from recent experience. Eur J Clin Pharmacol. 2000;56(1):1–18. doi: 10.1007/s002280050714.
  • 6. Meyer T, Boven KH, Günther E, Fejtl M. Micro-electrode arrays in cardiac safety pharmacology. Drug Saf. 2004;27(11):763–772. doi: 10.2165/00002018-200427110-00002.
  • 7. Darpo B, Nebout T, Sager PT. Clinical evaluation of QT/QTc prolongation and proarrhythmic potential for nonantiarrhythmic drugs: the international conference on harmonization of technical requirements for registration of pharmaceuticals for human use E14 guideline. J Clin Pharmacol. 2006;46(5):498–507. doi: 10.1177/0091270006286436.
  • 8. Mitcheson JS. hERG potassium channels and the structural basis of drug-induced arrhythmias. Chem Res Toxicol. 2008;21(5):1005–1010. doi: 10.1021/tx800035b.
  • 9. Polak S, Wiśniowska B, Brandys J. Collation, assessment and analysis of literature in vitro data on hERG receptor blocking potency for subsequent modeling of drugs’ cardiotoxic properties. J Appl Toxicol. 2009;29(3):183–206. doi: 10.1002/jat.1395.
  • 10. Cavalli A, Poluzzi E, De Ponti F, Recanatini M. Toward a pharmacophore for drugs inducing the long QT syndrome: insights from a CoMFA study of HERG K+ channel blockers. J Med Chem. 2002;45(18):3844–3853. doi: 10.1021/jm0208875.
  • 11. Wang S, Li Y, Xu L, Li D, Hou T. Recent developments in computational prediction of HERG blockage. Curr Top Med Chem. 2013;13(11):1317–1326. doi: 10.2174/15680266113139990036.
  • 12. Perry M, Stansfeld PJ, Leaney J, Wood C, de Groot MJ, Leishman D, Sutcliffe MJ, Mitcheson JS. Drug binding interactions in the inner cavity of HERG channels: molecular insights from structure–activity relationships of clofilium and ibutilide analogs. Mol Pharmacol. 2006;69(2):509–519. doi: 10.1124/mol.105.016741.
  • 13. Sánchez-Chapula JA, Ferrer T, Navarro-Polanco RA, Sanguinetti MC. Voltage-dependent profile of human ether-a-go-go-related gene channel block is influenced by a single residue in the S6 transmembrane domain. Mol Pharmacol. 2003;63(5):1051–1058. doi: 10.1124/mol.63.5.1051.
  • 14. Milnes JT, Crociani O, Arcangeli A, Hancox JC, Witchel HJ. Blockade of HERG potassium currents by fluvoxamine: incomplete attenuation by S6 mutations at F656 or Y652. Br J Pharmacol. 2003;139(5):887–898. doi: 10.1038/sj.bjp.0705335.
  • 15. Kamiya K, Niwa R, Mitcheson JS, Sanguinetti MC. Molecular determinants of HERG channel block. Mol Pharmacol. 2006;69(5):1709–1716. doi: 10.1124/mol.105.020990.
  • 16. Aronov AM. Predictive in silico modeling for hERG channel blockers. Drug Discov Today. 2005;10(2):149–155. doi: 10.1016/S1359-6446(04)03278-7.
  • 17. Pourbasheer E, Beheshti A, Khajehsharifi H, Ganjali MR, Norouzi P. QSAR study on hERG inhibitory effect of kappa opioid receptor antagonists by linear and non-linear methods. Med Chem Res. 2013;22(9):4047–4058. doi: 10.1007/s00044-012-0412-4.
  • 18. Su BH, Shen MY, Esposito EX, Hopfinger AJ, Tseng YJ. In silico binary classification QSAR models based on 4D-fingerprints and MOE descriptors for prediction of hERG blockage. J Chem Inf Model. 2010;50(7):1304–1318. doi: 10.1021/ci100081j.
  • 19. Gunturi SB, Archana K, Khandelwal A, Narayanan R. Prediction of hERG potassium channel blockade using kNN-QSAR and local lazy regression methods. QSAR Comb Sci. 2008;27(11–12):1305–1317. doi: 10.1002/qsar.200810072.
  • 20. Thai KM, Ecker GF. Similarity-based SIBAR descriptors for classification of chemically diverse hERG blockers. Mol Divers. 2009;13(3):321–336. doi: 10.1007/s11030-009-9117-0.
  • 21. Yap C, Cai C, Xue Y, Chen Y. Prediction of torsade-causing potential of drugs by support vector machine approach. Toxicol Sci. 2004;79(1):170–177. doi: 10.1093/toxsci/kfh082.
  • 22. Wiśniowska B, Mendyk A, Polak M, Szlęk J, Polak S. Random forest based assessment of the hERG channel inhibition potential for the early drug cardiotoxicity testing. BAMS. 2010;6:131–136.
  • 23. Sun H. An accurate and interpretable Bayesian classification model for prediction of hERG liability. ChemMedChem. 2006;1(3):315–322. doi: 10.1002/cmdc.200500047.
  • 24. Yap CW. PaDEL-descriptor: an open source software to calculate molecular descriptors and fingerprints. J Comput Chem. 2011;32(7):1466–1474. doi: 10.1002/jcc.21707.
  • 25. Chavan S, Friedman R, Nicholls IA. Acute toxicity-supported chronic toxicity prediction: a k-nearest neighbor coupled read-across strategy. Int J Mol Sci. 2015;16(5):11659–11677. doi: 10.3390/ijms160511659.
  • 26. Sushko I, Novotarskyi S, Körner R, Pandey AK, Rupp M, Teetz W, Brandmaier S, Abdelaziz A, Prokopenko VV, Tanchuk VY. Online chemical modeling environment (OCHEM): web platform for data storage, model development and publishing of chemical information. J Comput Aided Mol Des. 2011;25(6):533–554. doi: 10.1007/s10822-011-9440-2.
  • 27. Fenichel dataset. http://www.fenichel.net/pages/Professional/subpages/QT/Tables/pbydrug.htm. Accessed 11 Sept 2015
  • 28. Pence HE, Williams A. ChemSpider: an online chemical information resource. J Chem Educ. 2010;87(11):1123–1124. doi: 10.1021/ed100697w.
  • 29. Lenga RE, Votoupal KL. The Sigma-Aldrich library of regulatory and safety data. Wisconsin: Aldrich Chemical Company; 1993.
  • 30. Bolton EE, Wang Y, Thiessen PA, Bryant SH. PubChem: integrated platform of small molecules and biological activities. Annu Rep Comput Chem. 2008;4:217–241. doi: 10.1016/S1574-1400(08)00012-1.
  • 31. PubChem Bioassay: hERG channel activity. https://pubchem.ncbi.nlm.nih.gov/assay/assaydata.html?aid=376. Accessed 11 Sept 2015
  • 32. Ballabio D, Consonni V. Classification tools in chemistry. Part 1: linear models. PLS-DA. Anal. Methods. 2013;5(16):3790–3798.
  • 33. Classification Toolbox. http://michem.disat.unimib.it/chm/download/classificationinfo.htm. Accessed 11 Sept 2015
  • 34. Kowalski B, Bender C. k-nearest neighbor classification rule (pattern recognition) applied to nuclear magnetic resonance spectral interpretation. Anal Chem. 1972;44(8):1405–1411. doi: 10.1021/ac60316a008.
  • 35. Chavan S, Nicholls IA, Karlsson BC, Rosengren AM, Ballabio D, Consonni V, Todeschini R. Towards global QSAR model building for acute toxicity: Munro database case study. Int J Mol Sci. 2014;15(10):18162–18174. doi: 10.3390/ijms151018162.
  • 36. Ballabio D, Todeschini R. In: Infrared spectroscopy for food quality analysis and control. Sun D-W, editor. Amsterdam: Elsevier; 2009. p. 2009.
  • 37. Chem Des. Molecular fingerprints library. http://www.scbdd.com/chemdes/list-fingerprints/. Accessed 11 Sept 2015
  • 38. Daylight Chemical Information Systems theory manual. http://www.daylight.com/dayhtml/doc/theory/theory.finger.html. Accessed 11 Sept 2015
  • 39. Mansouri K, Ringsted T, Ballabio D, Todeschini R, Consonni V. Quantitative structure–activity relationship models for ready biodegradability of chemicals. J Chem Inf Model. 2013;53(4):867–878. doi: 10.1021/ci4000213.
  • 40. Pavan M, Worth A, Netzeva T (2015) Preliminary analysis of an aquatic toxicity dataset and assessment of QSAR models for narcosis. https://eurl-ecvam.jrc.ec.europa.eu/laboratories-research/predictive_toxicology/information-sources/qsar-document-area/Report_QSAR_model_for_narcosis.pdf. Joint research center, European Commission, Ispra, Italy, 2005. Accessed 5 Nov 2015
  • 41. Tang Y, Zhang YQ, Chawla NV, Krasser S. SVMs modeling for highly imbalanced classification. IEEE Trans Syst Man Cybern. 2009;39(1):281–288. doi: 10.1109/TSMCB.2008.2002909.
  • 42. Su BH, Tu YS, Esposito EX, Tseng YJ. Predictive toxicology modeling: protocols for exploring hERG classification and Tetrahymena pyriformis end point predictions. J Chem Inf Model. 2012;52(6):1660–1673. doi: 10.1021/ci300060b.
  • 43. Wang S, Li Y, Wang J, Chen L, Zhang L, Yu H, Hou T. ADMET evaluation in drug discovery. 12. Development of binary classification models for prediction of hERG potassium channel blockage. Mol Pharm. 2012;9(4):996–1010. doi: 10.1021/mp300023x.
  • 44. Li Q, Jørgensen FS, Oprea T, Brunak S, Taboureau O. hERG classification model based on a combination of support vector machine method and GRIND descriptors. Mol Pharm. 2008;5(1):117–127. doi: 10.1021/mp700124e.
  • 45. WITHDRAWN: A resource for withdrawn and discontinued drugs. http://cheminfo.charite.de/withdrawn/. Accessed 26 Jan 2016
  • 46. Gupta A, Lawrence AT, Krishnan K, Kavinsky CJ, Trohman RG. Current concepts in the mechanisms and management of drug-induced QT prolongation and torsade de pointes. Am Heart J. 2007;153(6):891–899. doi: 10.1016/j.ahj.2007.01.040.
  • 47. Yap YG, Camm AJ. Drug induced QT prolongation and torsades de pointes. Heart. 2003;89(11):1363–1372. doi: 10.1136/heart.89.11.1363.
  • 48. Lloyd EJ, Andrews PR. A common structural model for central nervous system drugs and their receptors. J Med Chem. 1986;29(4):453–462. doi: 10.1021/jm00154a005.
  • 49. Andrews P, Lloyd E. A common structural basis for CNS drug action. J Pharm Pharmacol. 1983;35(8):516–518. doi: 10.1111/j.2042-7158.1983.tb04821.x.
  • 50. Moran Y, Barzilai MG, Liebeskind BJ, Zakon HH. Evolution of voltage-gated ion channels at the emergence of Metazoa. J Exp Biol. 2015;218(4):515–525. doi: 10.1242/jeb.110270.
  • 51. Ranganathan R. Evolutionary origins of ion channels. Proc Natl Acad Sci. 1994;91(9):3484–3486. doi: 10.1073/pnas.91.9.3484.

Articles from Journal of Computer-Aided Molecular Design are provided here courtesy of Springer
