Abstract
Drug-induced cardiotoxicity (DICT) is a major concern in drug development, accounting for 10–14% of postmarket withdrawals. In this study, we explored the capabilities of chemical and biological data to predict cardiotoxicity, using the recently released DICTrank data set from the United States FDA. We found that such data, including protein targets, especially those related to ion channels (e.g., hERG), physicochemical properties (e.g., electrotopological state), and peak concentration in plasma offer strong predictive ability for DICT. Compounds annotated with mechanisms of action such as cyclooxygenase inhibition could distinguish between most-concern and no-concern DICT. Cell Painting features for ER stress discerned most-concern cardiotoxic from nontoxic compounds. Models based on physicochemical properties provided substantial predictive accuracy (AUCPR = 0.93). With the availability of omics data in the future, using biological data promises enhanced predictability and deeper mechanistic insights, paving the way for safer drug development. All models from this study are available at https://broad.io/DICTrank_Predictor.
Introduction
Drug-induced cardiotoxicity (DICT) is a leading cause of drug withdrawals during postmarket surveillance. One study showed that 10% of withdrawals in the last 4 decades were due to cardiovascular safety concerns, including previously successful therapeutics such as rofecoxib, tegaserod, sibutramine, and rosiglitazone.1 Another study found that cardiotoxicity was the third most common reason for adverse drug reactions and accounted for 14% of withdrawals.2 Worryingly, the rate of DICT-related withdrawals may even be increasing, accounting for 17 out of 38 cases among drugs approved between 1994 and 2006.1,3
DICT is associated with both functional damage, such as arrhythmia, which alters mechanical function, and structural damage, such as morphological damage in cardiomyocytes; functional and structural damage in the heart can be interrelated, where one may precipitate the other.4 DICT can be attributed to several underlying mechanisms affecting myocardial function and viability.5 Some drugs, such as anthracyclines, inflict direct myocyte injury by producing reactive oxygen species and compromising DNA replication.6 Electrophysiological disruptions, for example, those caused by hERG potassium channel blockers, can lead to arrhythmias by prolonging the QT interval.7 Cardiac energy demands can be affected by drugs that interfere with mitochondrial function.1 Drugs may also adversely influence vascular supply, inducing ischemic conditions.8 Drugs can also disrupt the homeostasis of intracellular calcium, which regulates cardiomyocyte activity, resulting in contractile and rhythm abnormalities.9 Furthermore, alterations in growth factors and cytokine balances can induce cardiac conditions like fibrosis, and immunologic drug reactions can also cause cardiotoxicity.10,11 Several neurohormonal pathways also offer indirect routes for drug-induced cardiac stress.12 Notably, a single drug might induce cardiotoxicity via multiple mechanisms, and individual patients’ responses (which can often manifest as side effects) can be modulated by genetics, concurrent health conditions, and other medications.13
To move beyond a limited focus on specific adverse reactions or related proxy assays for cardiotoxicity, the FDA recently released the drug-induced cardiotoxicity rank (DICTrank) that categorizes drugs based on their risk of causing cardiotoxicity.14 Similar to the DILIrank data for liver injury,15,16 the DICTrank system uses FDA drug labeling to comprehensively categorize 1318 human drugs into four DICT Concern categories based on their potential risk for cardiotoxicity: (1) most-DICT-concern, (2) less-DICT-concern, (3) no-DICT-concern, and (4) ambiguous-DICT-concern. The DICTrank data set was generated with an expert review from the FDA, keyword searches, and manual curation of FDA labeling documents as well as data from clinical trials, postmarketing, and literature surveys.
Predictive models for DICT could save considerable time, resources, and human suffering, with the ultimate goal of preventing adverse events in clinical trials and the postmarket stage. However, predicting any in vivo effect is not a trivial classification task, and most predictive models are built on proxy end points (which are often reduced to binary end points) without taking into account in vivo parameters such as pharmacokinetic parameters.17,18 To the best of our knowledge, no models for DICTrank are publicly available yet; however, various studies have predicted proxy in vitro assays or side effect data from the Side Effect Resource (SIDER), some of which are related to cardiotoxicity.19 Studies focusing on side effects and proxy targets (such as hERG) are reasonable, given that compounds with cardiac-related indications are more likely to show related side effects or activity on ion channels.7
Previously it was shown that adverse events data and biological data can be used for identifying mechanism hypotheses leading to cardiotoxicity.20 Wang et al. used LINCS L1000 gene expression features to predict a wide range of drug-induced adverse events from the SIDER data set.21 Particularly for acute myocardial infarction, the models developed achieved an AUC-ROC of 0.84 when using chemical structural data and 0.76 when using Gene Ontology annotations (compared to 0.5 for random models). Galeano et al. used a matrix decomposition algorithm to predict side effect frequencies for drugs and provide biologically interpretable insights.22 MoleculeNet predictions for SIDER side effects, trained on chemical structure data, range from 0.65 to 0.70 AUC-ROC when using a bypass network, a modified version of a multitask network.23
Most predictive models mentioned above were built on chemical structure data as input features. Although certain structural motifs or patterns in a molecule can be indicative of toxic properties and analyzing the chemical structure can flag potential cardiotoxic compounds, such models are often limited in their applicability domain; that is, their accuracy is limited to the chemical space of the training data, and they fail to generalize to markedly different chemical structures. Novel chemical and biological data have been previously used to evaluate side effects in general from the SIDER data set.24 Previous studies have shown that Random Forest models trained on a combination of biological, chemical, and phenotypic features achieved an AUCPR of 0.76 for cardiac disorders.25
With the availability of the new DICTrank data set, we used a novel multifaceted approach combining chemical and biological data (which considers a multitude of possible mechanisms that can lead to DICT) to better understand, and gain mechanistic insights into, a drug’s cardiac safety profile. We evaluated a wide range of chemical and biological information, as shown in Figure 1, to determine which feature space is most predictive of DICTrank, and we used these feature spaces to build the first predictive models of DICTrank using machine learning. Biological data sources included Cell Painting, gene expression, and Gene Ontology,26−30 as well as bioactivity, annotated mechanisms of action (MOA),31 and pharmacokinetic parameters for the peak unbound and total concentration of a drug molecule in plasma;32 these offer an alternate feature space to chemical space.33 We aimed to glean insights into which chemical and biological data best capture the carefully curated manual annotations in the DICTrank data. Incorporating data from all these sources as feature spaces for predictive models allows for a multifaceted assessment of a drug’s potential cardiotoxicity, potentially enhancing the model’s accuracy and reliability. Overall, the use of biological data sources along with chemical data improved the detection of cardiotoxic compounds and offered mechanistic insights into their cardiotoxicity. The models based on chemical structures and physicochemical characteristics are readily accessible for direct use on https://broad.io/DICTrank_Predictor (the other models are not implemented on the server due to the lack of public data for other feature types). All code and data for all models can be found on GitHub (https://github.com/srijitseal/DICTrank) for local implementation with further details on https://broad.io/DICTrank_Predictor.
Methods
Data Sources
We obtained the DICTrank data set, as released by Qu et al. which includes comprehensive DICTConcern categories for a diverse set of over 1300 drugs.14 The SIDER database, a pharmacovigilance resource, contained associations for drugs with side effects.23,34 We used data from cardiac disorders from the SIDER data set to compare concordance with DICTrank and enrich the data set as described later. To gain insights into the MOA of various drugs, we assessed relevant data from the Drug Repurposing Hub,35 which contained information on 6777 drugs for 1130 MOAs and 2183 known targets. To explore the potential targets of drugs, we incorporated the CELLSCAPE target predictions on inhibition/antagonism for 2094 targets at four concentrations (0.1, 1, 10, and 100 μM).36 We used morphological profiles from the Cell Painting assay26 which considers the impact of drugs on cellular morphology and function. This data set contained a range of ca. 1700 morphological features for over 15,000 compound perturbations. We obtained gene expression data from LINCS L1000 data which contains over 19,000 drugs as described in Wang et al.21 This study utilized gene expression features derived from LINCS L100027 transcriptomic data, capturing changes in 978 landmark genes across diverse human cell lines in response to compound perturbations. Gene Ontology-transformed expression features,28 which encode biological processes involved with gene expressions affected by the compound perturbations, were extracted from a data set containing 4438 annotated features linked to these compounds in the study.21 The analysis by Wang et al. prioritized the strongest signatures across cell line, concentration, and time point for each compound using characteristic direction and evaluated the enrichment across various gene set libraries via principal angle enrichment analysis.37 Finally, we used pharmacokinetic data, specifically the maximum unbound and total concentrations (Cmax) of 758 drugs in the bloodstream, as compiled by Smith et al.32 This data set contains Cmax (unbound) for 534 compounds and Cmax (total) for 749 compounds.
Standardization of the SMILES
For each data set, we standardized chemical SMILES iteratively using RDKit38 and MolVS39 functionalities. This includes steps for the InChI transformation, molecular cleanup, charge neutralization, tautomer normalization, and final standardization. We carried out up to five iterations of the standardization, stopping as soon as the SMILES no longer changed; if the SMILES had not converged after five iterations, we chose the most frequent SMILES among those generated (tracked with a counter). Finally, the molecule was protonated at pH 7.4 using Dimorphite-DL to reflect its likely state at physiological pH.40 Hence, we obtained standardized SMILES and standardized InChI.
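The iterate-until-stable logic described above can be sketched as follows. This is a minimal illustration rather than the pipeline used in the study: the function name `standardize_until_stable` and the `standardize_once` callable are hypothetical, and the single standardization pass (InChI round trip, cleanup, neutralization, tautomer normalization) is assumed to be supplied by the caller, for example as a wrapper around RDKit/MolVS.

```python
from collections import Counter
from typing import Callable


def standardize_until_stable(
    smiles: str,
    standardize_once: Callable[[str], str],  # hypothetical single-pass standardizer
    max_iterations: int = 5,
) -> str:
    """Repeat one standardization pass until the SMILES stops changing.

    If no fixed point is reached within max_iterations, fall back to the
    SMILES that was produced most often across the passes.
    """
    seen = Counter()
    current = smiles
    for _ in range(max_iterations):
        updated = standardize_once(current)
        seen[updated] += 1
        if updated == current:
            return updated  # converged: another pass would change nothing
        current = updated
    # No convergence: choose the most common SMILES from the counter
    return seen.most_common(1)[0][0] if seen else smiles


# Toy pass that only strips a sodium counterion (an illustrative stand-in)
def strip_sodium(s: str) -> str:
    return s.replace("[Na+].", "")


result = standardize_until_stable("[Na+].CC(=O)[O-]", strip_sodium)
```

Passing the standardizer in as a callable keeps the convergence logic testable without a cheminformatics toolkit installed.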
Preprocessing Data
For the DICTrank data set, we binarized the DICT concern categories to obtain DICTrank labels for machine learning classifiers, coding no-concern as 0 and both less- and most-concern as 1. We removed compounds that were ambiguous and treated a compound as toxic if there was at least one record of toxicity among duplicates. For the SIDER data set, we removed duplicate standardized SMILES and, similar to the above, labeled a compound as toxic if there was at least one record of toxicity among the duplicates. Labels from both SIDER and DICTrank are described in Table 1.
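As a minimal sketch of this labeling rule, the snippet below binarizes concern categories and resolves duplicates. The category strings and the function name `binarize_dictrank` are illustrative assumptions. The sketch also assumes that ambiguous records are simply skipped, so a compound with one ambiguous and one unambiguous record keeps the unambiguous label; the text does not specify this case.

```python
def binarize_dictrank(records):
    """Collapse (standardized_smiles, concern_category) records to binary labels.

    no-concern -> 0, less- and most-concern -> 1. A compound with duplicate
    records is toxic if at least one record is toxic.
    """
    label_map = {"no": 0, "less": 1, "most": 1}
    labels = {}
    for smiles, category in records:
        if category == "ambiguous":
            continue  # assumption: ambiguous records are dropped
        label = label_map[category]
        # max() implements "toxic if any duplicate record is toxic"
        labels[smiles] = max(labels.get(smiles, 0), label)
    return labels


example = binarize_dictrank([
    ("CCO", "no"),
    ("CCO", "less"),       # duplicate with a toxic record -> 1
    ("c1ccccc1", "no"),
    ("CCN", "ambiguous"),  # removed
])
```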
Table 1. Distribution of Compound Toxicity Labels Related to Cardiotoxicity/Cardiac Disorders for All Unique Compounds from Each of the Datasets Used in This Study.
data set | label | number of toxic compounds | number of non-toxic compounds | description |
---|---|---|---|---|
SIDER | cardiac disorders (binary) | 829 | 360 (absence of evidence) | recorded adverse drug reactions from marketed medicines |
DICTrank | DICT concern category (categorical) | most: 299, less: 443 | no: 278 (evidence of absence) | ranking system from DICTrank that categorizes drugs according to risk for cardiotoxicity |
DICTrank | DICTrank label (binary) | 742 | 278 (evidence of absence) | binarized labels obtained from DICT concern categories used in this study |
For the Cell Painting, gene expression, and Gene Ontology data sets, we used median profiles over standardized SMILES, obtaining two data sets: 1783 Cell Painting features for 15,406 compounds, and gene expression features for 978 landmark genes and 4438 Gene Ontology annotations for 9132 compounds. For the MOA data set, we used one-hot encoding of the given annotations for compounds, which effectively gives us data for evidence of the presence of MOA/known targets and the absence of evidence. We used a variance threshold of 0.001 to identify and remove low-variance features, reducing the dimensionality to 264 MOA and 551 known target features with sufficient variability. All data sets are released publicly at figshare (10.6084/m9.figshare.24312274) and https://broad.io/DICTrank_Predictor.
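The one-hot encoding and variance filter can be illustrated with the standard library alone. This is a sketch under stated assumptions: the helper names are hypothetical, the variance is taken as the population variance, and a feature is kept only if its variance is strictly greater than the threshold (the convention of scikit-learn's `VarianceThreshold`, which may or may not match the exact implementation used here).

```python
from statistics import pvariance


def one_hot_encode(annotations):
    """Map {compound: set of annotation terms} to a sorted vocabulary and 0/1 rows."""
    vocabulary = sorted(set().union(*annotations.values()))
    rows = {
        compound: [int(term in terms) for term in vocabulary]
        for compound, terms in annotations.items()
    }
    return vocabulary, rows


def drop_low_variance(vocabulary, rows, threshold=0.001):
    """Keep only the columns whose population variance exceeds the threshold."""
    columns = list(zip(*rows.values()))
    keep = [j for j, column in enumerate(columns) if pvariance(column) > threshold]
    kept_vocabulary = [vocabulary[j] for j in keep]
    kept_rows = {c: [row[j] for j in keep] for c, row in rows.items()}
    return kept_vocabulary, kept_rows


# Toy annotations: "x" is present for every compound and carries no signal
annotations = {"a": {"x", "y"}, "b": {"x"}, "c": {"x", "y"}}
vocabulary, rows = one_hot_encode(annotations)
kept_vocabulary, kept_rows = drop_low_variance(vocabulary, rows)
```

For a binary feature present in a fraction p of compounds the variance is p(1 − p), so a threshold of 0.001 removes annotations carried by roughly fewer than 0.1% (or more than 99.9%) of compounds.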
Analyzing Chemical Space Overlap Between SIDER and DICTrank
We used standardized InChI to calculate the overlap between SIDER and DICTrank data sets. We assessed the physicochemical space using a t-distributed stochastic neighbor embedding (TSNE; as implemented in scikit-learn41) for six physicochemical properties, namely, molecular weight, topological polar surface area, number of rotatable bonds, hydrogen bond donors and acceptors, and the computed logarithm of the partition coefficient. To analyze the chemical space, we used a principal component analysis (PCA) of the FragFP fingerprints from DataWarrior,42 which in our experience yields a higher explained variance in the PCA plot than Morgan fingerprints.
Structural and Physicochemical Features
For structural features, we used 2048-bit Morgan fingerprints as implemented in RDKit.38 For chemical compounds, we computed 1579 descriptors using Mordred.43 These physicochemical descriptors are derived from 2D representations of compounds; that is, we did not consider 3D descriptors. We removed the descriptors that failed to compute and finally obtained 1038 2-D physicochemical descriptors, and these were used for the machine learning models. For the analysis of feature distributions, we used the full set of 208 RDKit descriptors (which are more interpretable than Mordred descriptors) as defined in the descriptors module.38
Predicted Targets from CELLSCAPE
To derive predicted molecular targets for compounds, we utilized the commercially available CELLSCAPE target prediction package (Ignota Labs, 2023).36 This package applies models trained on a mixture of publicly available and proprietary bioactivity data (primarily inhibitory/antagonistic mechanisms) at 0.1, 1, 10, and 100 μM with chemical structural features to output a probability score (between 0 and 1) of predicted activity for 2094 distinct human targets. Publicly available target prediction alternatives, although not used in this study, include PIDGINv444,45 and SwissTargetPrediction.46 We provide the computed CELLSCAPE features for compounds in the DICTrank data set publicly via figshare (10.6084/m9.figshare.24312274) and https://broad.io/DICTrank_Predictor.
Substructure Analysis and Retrospective Analysis of DrugBank
For substructure analysis, we used SARpy47 on the DICTrank data set, in a method similar to the one applied by Hemmrich et al.48 SARpy uses a recursive algorithm for fragmentation. We used two distinct settings for analysis: (1) using both toxic and nontoxic compounds and (2) using only toxic compounds to yield the desired substructures. For both settings, we confined the fragment size to a range of 2–18 atoms, with a minimum occurrence of five times. Furthermore, the positive predictive value (PPV) was adjusted to minimize false negatives. We combined structural alerts from both settings and quantified the frequency of these fragments within the entirety of the DICTrank data set. We eliminated fragments with a PPV below 0.5. We then manually assessed the remaining fragments, for example, removing those having four or fewer atoms and removing substructures like benzene, to obtain 58 structural alerts.
We analyzed all compounds in DrugBank49 for the presence of structural alerts from the above to evaluate the risk of the chemical space of drugs for cardiotoxicity. We only used the compounds that did not overlap with the DICTrank data set for this retrospective analysis to avoid information leaks. We annotated these compounds with labels for cardiac disorders from SIDER and disease area labels from the MOA data set. We then checked for the presence of structural alerts among the subset of compounds that are currently approved, investigational, experimental, or withdrawn drugs.
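The occurrence and PPV filter applied to candidate fragments in this section can be sketched as below. The sketch assumes that substructure matching has already been done (for example with SARpy or RDKit) and that its result is available as a mapping from each fragment to the set of compounds containing it; the function name `filter_alerts` and the input layout are hypothetical. PPV is computed here as the fraction of matching compounds that are labeled toxic.

```python
def filter_alerts(matches, labels, min_ppv=0.5, min_occurrence=5):
    """Keep fragments that occur often enough and are mostly found in toxic compounds.

    matches: {fragment: set of compound ids containing the fragment}
    labels:  {compound id: 1 for toxic, 0 for nontoxic}
    Returns {fragment: PPV} for the retained fragments.
    """
    kept = {}
    for fragment, compounds in matches.items():
        if len(compounds) < min_occurrence:
            continue  # too rare to be a reliable alert
        toxic = sum(labels[c] for c in compounds)
        ppv = toxic / len(compounds)  # TP / (TP + FP) for this fragment
        if ppv >= min_ppv:
            kept[fragment] = ppv
    return kept


# Toy data: compounds 0-5 are toxic, 6-9 are nontoxic
labels = {i: int(i < 6) for i in range(10)}
matches = {
    "frag_A": {0, 1, 2, 3, 6},   # 4 of 5 toxic -> PPV 0.8, kept
    "frag_B": {4, 6, 7, 8, 9},   # 1 of 5 toxic -> PPV 0.2, dropped
    "frag_C": {0, 1},            # occurs only twice, dropped
}
alerts = filter_alerts(matches, labels)
```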
Analysis of Chemical and Biological Data for Differences in Feature Distribution for DICTrank Compounds
We sought features that distinguish highly cardiotoxic compounds. To do this, we identified, for each chemical and biological data set, the features whose distributions differed significantly among the DICT concern categories. For categorical features (SIDER, MOA annotations, and some of the 208 RDKit descriptors), we employed the chi-squared test (as implemented in SciPy50) to evaluate the association between categorical variables. We used a contingency table, delineating the frequency distribution for each combination of category values. The chi-squared test yielded a test statistic alongside a corresponding p-value. For continuous features (as in the Cell Painting, Gene Expression, Gene Ontology data sets, and some of the 208 RDKit descriptors), we chose the Kruskal–Wallis test (as implemented in SciPy50) for evaluating the DICT-Concern labels since it is suited for comparisons involving three or more independent groups. Conversely, when comparing two classes pairwise, we used the Mann–Whitney U test (as implemented in SciPy50), which is adept at discerning differences in distributions between two independent samples. Both tests yield a test statistic alongside its corresponding p-value. For both total and unbound plasma concentrations in the Cmax data set, we used the Mann–Whitney U test to compare the distribution of Cmax among each DICT concern class and the DICTrank label.
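To make the contingency-table test concrete, the following sketch computes the Pearson chi-squared statistic for one binary feature (for example, a single MOA annotation) against the three DICT concern categories. It is written in plain Python for illustration only; in practice a library routine such as `scipy.stats.chi2_contingency` would normally be used. The function name `chi_squared_2x3` is hypothetical. The closed-form p-value relies on the fact that a 2 × 3 table has (2 − 1)(3 − 1) = 2 degrees of freedom, for which the chi-squared survival function is exp(−x/2).

```python
import math


def chi_squared_2x3(table):
    """Pearson chi-squared test for a 2 x 3 contingency table.

    table: two rows (feature absent, feature present), each with three
    counts (most-, less-, no-concern). Returns (statistic, p_value).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("every row and column needs at least one observation")
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # under independence
            statistic += (observed - expected) ** 2 / expected
    # With 2 degrees of freedom, P(X >= x) = exp(-x / 2)
    p_value = math.exp(-statistic / 2)
    return statistic, p_value


# Toy counts: the feature is concentrated in the no-concern column
statistic, p_value = chi_squared_2x3([[20, 10, 0], [0, 10, 20]])
```

The exponential shortcut holds only for two degrees of freedom; tables of any other shape need the general chi-squared distribution.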
Enriching DICTrank Compounds with SIDER Compounds
We next determined the overlap of compounds (and the concordance in their labels) in the DICTrank data set with the compounds in SIDER labeled with “cardiac disorders” using the standardized InChI, yielding 776 compounds in common. We next enriched DICTrank with SIDER, giving preference to the DICTrank label in the case of a conflict. In this manner, we obtained three data sets in addition to the DICTrank data set, with the distribution of toxic/nontoxic compounds given in Supporting Information Table S1. The four data sets are (1) DICTrank, (2) DICTrank enriched with cardiotoxic compounds from SIDER, (3) DICTrank enriched with noncardiotoxic compounds from SIDER, and (4) DICTrank enriched with all compounds from SIDER.
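The enrichment rule can be sketched as a keyed merge in which DICTrank labels always win. The dictionaries below are keyed by standardized InChI with binary labels; the function name `enrich` and its two flags are hypothetical conveniences for producing the three enriched variants.

```python
def enrich(dictrank, sider, add_toxic=True, add_nontoxic=True):
    """Add SIDER compounds to DICTrank, keeping the DICTrank label on conflict.

    dictrank, sider: {standardized InChI: 0 or 1}
    """
    merged = dict(dictrank)
    for inchi, label in sider.items():
        if inchi in merged:
            continue  # compound already in DICTrank: its label takes precedence
        if (label == 1 and add_toxic) or (label == 0 and add_nontoxic):
            merged[inchi] = label
    return merged


dictrank = {"inchi_1": 1, "inchi_2": 0}
sider = {"inchi_2": 1, "inchi_3": 1, "inchi_4": 0}  # inchi_2 conflicts

with_toxic = enrich(dictrank, sider, add_nontoxic=False)    # data set (2)
with_nontoxic = enrich(dictrank, sider, add_toxic=False)    # data set (3)
with_all = enrich(dictrank, sider)                          # data set (4)
```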
Training Predictive Models for DICTrank
We trained 11 Random Forest models, each using one of the following feature spaces (as listed in Table 2): (1) Structural fingerprints, (2) Mordred descriptors, (3) MOA labels, (4) MOA labels along with total Cmax, (5) MOA labels along with unbound Cmax, (6) CELLSCAPE predicted protein targets, (7) CELLSCAPE predicted protein targets along with total Cmax, (8) CELLSCAPE predicted protein targets along with unbound Cmax, (9) Cell Painting features, (10) Gene Expression features, and (11) Gene Ontology features.
Table 2. Description of Various Feature Spaces Used in This Study.
feature space | dimensions after feature selection (where applicable) | description | signal expected | source |
---|---|---|---|---|
chemical structure | 2048 bit vector | ECFP4 (Morgan) fingerprints representing chemical structures | distinctive patterns of chemical bonding and arrangement | (54) |
physicochemical properties (Mordred descriptors) | 1038 2-D descriptors | properties such as lipophilicity, solubility, molecular weight, ionizing potential, and so forth | properties that are associated with negative impacts on ion channels in the heart | (43) |
MOA data set | 264 binary encoded MOAs + 551 known targets | annotations for mechanism of action and known targets based on knowledge | mechanism of action for drugs that inhibit certain ion channels | (31) |
CELLSCAPE target prediction data set | 1893 predictions for 817 unique targets and concentration combinations (0.1, 1, 10, and 100 μM) | predicted protein target for inhibition/antagonism; does not consider the functionality; prediction is based on chemical structure; updated algorithm from PIDGINv445 | understanding how a drug interacts with various biological targets (not just its primary target) can provide insights into potential off-target effects | (36,45) |
Cell Painting | 1783 features | morphological changes in U2OS cells by a chemical perturbation, using a 5-channel fluorescence microscopy assay | morphological changes in cells that reflect basic biological processes | (26,55) |
gene expression | 978 features | transcriptomic changes in response to chemicals using the L1000 assay | upregulation or downregulation of genes associated with cardiac stress, apoptosis in cardiac cells, or ion channel function | (21,27) |
Gene Ontology | 4438 annotations | Gene Ontology manual annotations based on collective knowledge | understanding the biological processes, cellular components, and molecular functions affected, e.g., related to cardiac function, cardiac muscle tissue development, or ion homeostasis | (21,28) |
Cmax | 2 features | the maximum total and unbound concentration of a drug in plasma | High Cmax would indicate a high risk of cardiotoxicity | (32) |
The training data available for these models depended on the number of compounds for which data was available and varied, as given in Supporting Information Table S1. We aimed to keep the external test set fixed, as far as the available data allowed, for a fair evaluation, as shown in Supporting Information Table S2. For models not using Cmax data (where overlaps were larger and hence more data was available), we randomly selected 90 compounds (8.8% of the data set, 65 cardiotoxic and 25 nontoxic) for which all annotations of feature spaces were available (as described in Supporting Information Table S1). These 90 compounds struck a similar balance of DICT concern categories (most: 39, less: 26, and no: 25) as the original DICTrank data set. For models using total Cmax data, we used the same external test set comprising 90 compounds since total Cmax data were available for these compounds. However, for models using unbound Cmax data (which had smaller overlaps compared to the above), we used a subset of 78 compounds (57 cardiotoxic and 21 nontoxic) as the external test set as shown in Supporting Information Table S2.
Among the models that relied on omics data (Cell Painting, Gene Expression, and Gene Ontology), we checked for each training compound whether a profile (feature set) was available. If no profile was available in the respective data set, we imputed one using a v-NN approach, which differs from a fixed k-NN approach in that the number of neighbors v is determined for each query compound by a similarity condition rather than fixed in advance. Specifically, we used the median profile of the v compounds in the respective data set that had a Tanimoto similarity greater than 0.70 to the query compound. We ignored any similar compound that appeared in the external test set to avoid information leaks. Subsequently, we discarded any compounds for which no feature profile was found directly or using the above v-NN approach. Thus, while the test sets for the DICTrank and DICTrank enriched data sets are the same, it is important to note that the training data for them vary for the models (as described in Supporting Information Table S1) since we dropped compounds where no feature data could be found or matched.
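A minimal sketch of the v-NN imputation follows. Fingerprints are represented as sets of on-bit indices so that the example needs no cheminformatics toolkit; the function names are hypothetical, and the reference list is assumed to have already had external test set compounds removed.

```python
from statistics import median


def tanimoto(a, b):
    """Tanimoto similarity between two fingerprints given as sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def vnn_median_profile(query_bits, reference, threshold=0.70):
    """Median profile over all reference compounds above the similarity threshold.

    reference: list of (fingerprint on-bits, feature profile) pairs.
    Returns None when no neighbor qualifies, so the caller can drop the compound.
    """
    neighbors = [
        profile for bits, profile in reference
        if tanimoto(query_bits, bits) > threshold
    ]
    if not neighbors:
        return None
    # Feature-wise median across the v qualifying neighbors
    return [median(values) for values in zip(*neighbors)]


reference = [
    ({1, 2, 3, 4, 5}, [1.0, 10.0]),   # similarity 0.8 to the query
    ({1, 2, 3, 4}, [3.0, 30.0]),      # similarity 1.0
    ({1, 9}, [100.0, 100.0]),         # similarity 0.2, ignored
]
profile = vnn_median_profile({1, 2, 3, 4}, reference)
```

Unlike k-NN, the number of neighbors here is an outcome of the threshold, so isolated compounds get no imputed profile instead of one borrowed from dissimilar structures.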
For each of the 11 models, we used a Random Forest classifier, with hyperparameter optimization on the training data using a halving random search with a 5-fold stratified cross-validation with a random oversampling to account for class imbalance (as implemented in scikit-learn41). We used the best hyperparameter-optimized estimator and obtained out-of-fold predictions with a 5-fold stratified cross-validation. We used the out-of-fold predictions and the true labels to optimize the decision threshold for binary classification using the J statistic, calculated as the difference between the true positive rate and the false positive rate. This determines the threshold from ROC curve values, where the J statistic is maximized. The model was finally refitted on the entire training data set, and we used the optimized threshold to make final predictions based on the predicted probabilities of the external test set.
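The threshold selection step can be written out explicitly. The sketch below scans every distinct out-of-fold probability as a candidate threshold and returns the one maximizing J = TPR − FPR; it is a plain-Python illustration with a hypothetical function name (an ROC routine such as `sklearn.metrics.roc_curve` would give the same candidates). Ties are resolved in favor of the highest threshold, which is an arbitrary choice made here.

```python
def youden_threshold(y_true, y_prob):
    """Return (threshold, J) maximizing J = TPR - FPR over candidate thresholds.

    A compound is predicted toxic when its probability is >= threshold.
    """
    positives = sum(y_true)
    negatives = len(y_true) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both classes must be present")
    best_threshold, best_j = None, -1.0
    for threshold in sorted(set(y_prob), reverse=True):
        tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
        fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
        j = tp / positives - fp / negatives
        if j > best_j:  # strict comparison keeps the highest tied threshold
            best_threshold, best_j = threshold, j
    return best_threshold, best_j


# Toy out-of-fold predictions
threshold, j_stat = youden_threshold([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```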
We trained two ensemble models to combine the models from the 11 feature spaces above. These were based on soft voting, which considered the mean of the scaled predicted probabilities of each model (scaled according to the best threshold of each model). The first ensemble model considered only the six best-performing models (structural, physicochemical, MOA, CELLSCAPE, MOA with Cmax total, and CELLSCAPE with Cmax total) in the cross-validation (AUC > 0.65). The second ensemble model considered all 11 models and thus was evaluated on the reduced external test set of 78 compounds where data from all feature spaces were available.
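One way to realize threshold-scaled soft voting is sketched below. The exact scaling is not spelled out in the text, so the piecewise-linear mapping used here, which sends each model's optimized threshold to 0.5 while keeping 0 and 1 fixed, is an assumption made purely for illustration, as are the function names.

```python
def scale_probability(p, threshold):
    """Rescale p so that the model's own decision threshold maps to 0.5.

    Assumed piecewise-linear scaling: [0, threshold) -> [0, 0.5) and
    [threshold, 1] -> [0.5, 1].
    """
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    if p < threshold:
        return 0.5 * p / threshold
    return 0.5 + 0.5 * (p - threshold) / (1.0 - threshold)


def soft_vote(probabilities, thresholds):
    """Mean of the scaled probabilities; predict toxic when the mean is >= 0.5."""
    scaled = [scale_probability(p, t) for p, t in zip(probabilities, thresholds)]
    mean_scaled = sum(scaled) / len(scaled)
    return mean_scaled, int(mean_scaled >= 0.5)


# Two models with different optimized thresholds scoring the same compound
mean_scaled, prediction = soft_vote([0.3, 0.8], [0.3, 0.6])
```

Scaling before averaging puts models with very different optimal thresholds on a common footing, so a single cut-off of 0.5 can be applied to the ensemble mean.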
Model Evaluation and Applicability Domain
We evaluated the classifiers using the balanced accuracy, sensitivity (or recall), specificity, F1 score, Matthews Correlation Coefficient (MCC), the area under the receiver operating characteristic curve (AUC-ROC), and the area under the precision–recall curve (AUCPR), which focuses on the positive class.
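For reference, the threshold-dependent metrics listed above can all be derived from the four confusion matrix counts, as in this sketch (the function name is hypothetical; library equivalents exist in scikit-learn). The ranking-based metrics, AUC-ROC and AUCPR, need the predicted probabilities and are not covered here.

```python
import math


def classification_metrics(y_true, y_pred):
    """Confusion-matrix metrics for binary labels (1 = toxic, 0 = nontoxic)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if 2 * tp + fp + fn else 0.0
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denominator if denominator else 0.0

    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "balanced_accuracy": (sensitivity + specificity) / 2,
        "f1": f1,
        "mcc": mcc,
    }


# Toy example: TP = 2, FN = 1, TN = 1, FP = 1
metrics = classification_metrics([1, 1, 1, 0, 0], [1, 1, 0, 0, 1])
```

Balanced accuracy and MCC are informative here because the DICTrank labels are imbalanced (742 toxic versus 278 nontoxic compounds), which can inflate plain accuracy.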
To evaluate the applicability domain of the models, for each compound in the external test set, we calculated the Tanimoto similarity to its nearest neighbor with the same DICTrank label (toxic/nontoxic) in the training data set. We grouped compounds into five equal-width bins spanning Tanimoto similarities from 0.0 to 1.0 and evaluated the balanced accuracy and AUCPR within each bin for the models used in this study.
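The binning step can be sketched as follows, again with fingerprints as sets of on-bit indices. The helper names are hypothetical, and the handling of the upper edge (a similarity of exactly 1.0 falls into the last bin) is an assumption.

```python
def tanimoto(a, b):
    """Tanimoto similarity between two fingerprints given as sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def nearest_same_label_similarity(query_bits, query_label, training):
    """Highest Tanimoto similarity to a training compound with the same label.

    training: list of (fingerprint on-bits, label) pairs.
    """
    similarities = [
        tanimoto(query_bits, bits)
        for bits, label in training
        if label == query_label
    ]
    return max(similarities) if similarities else 0.0


def similarity_bin(similarity, n_bins=5):
    """Index of the equal-width bin on [0, 1]; 1.0 is placed in the last bin."""
    return min(int(similarity * n_bins), n_bins - 1)


training = [({1, 2, 3}, 1), ({1, 2, 3, 4}, 0), ({7, 8}, 1)]
similarity = nearest_same_label_similarity({1, 2, 3, 4}, 1, training)
bin_index = similarity_bin(similarity)
```

Metrics such as balanced accuracy can then be computed separately over the test compounds that share a bin index.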
Statistics and Reproducibility
We have released the data sets used in this study, which are publicly available at 10.6084/m9.figshare.24312274. We have also released the Python code for the models, which is publicly available at https://github.com/srijitseal/DICTrank; further details are available on https://broad.io/DICTrank_Predictor.
Results and Discussion
In this study, we used various biological and chemical data sets to discern among the DICT concern categories, deriving insights into the carefully annotated FDA DICTrank data set. We also trained predictive models using these feature spaces. In particular, we used the Cell Painting data from Bray et al., which captures a wide array of cellular phenotypes after perturbation, e.g., drug treatment, and has been shown to carry a signal for various in vitro toxicity end points.26,51 We also used experimental (from the Repurposing Hub31) and predicted bioactivity data derived from models trained on a mixture of publicly available and proprietary data sets (Ignota Labs CELLSCAPE36), mostly relating to inhibitory/antagonist mechanisms. For structure-derived feature spaces, we used Morgan fingerprints derived from chemical structures as well as physicochemical Mordred descriptors, which are often related to pharmacokinetic properties (such as logD, molecular weight, solubility, permeability, and so forth) and implicitly encode the bias between bioactivity classes and chemical structures.52 Finally, we looked at pharmacokinetic parameters for the peak unbound and total concentration of a drug molecule in plasma (Cmax).53 We organized and standardized various chemical and biological data, as shown in Table 2, to analyze their ability to predict DICTrank labels.
DICTrank Labels are Highly Concordant with SIDER Labels
Among the 776 compounds present in both DICTrank and SIDER cardiac disorders data sets (Figure 2a), we found an 87.24% concordance rate in the annotations (labels) between the two data sets (Supporting Information Table S3; SIDER labels have an F1 score of 0.91 when compared against DICTrank labels). This suggests that SIDER labels, which ascertain cardiac disorder events reported as associated with each drug and are often dependent on aggregated dispersed public information and package inserts, agree with DICTrank labels, which ascertain if a compound is classified as cardiotoxic by the FDA.
The physicochemical spaces of SIDER and DICTrank generally overlap (Figure 2b), where the space is defined as a TSNE embedding of six physicochemical properties, namely, molecular weight, topological polar surface area, number of rotatable bonds, hydrogen bond donors and acceptors, and the computed logarithm of the partition coefficient. Still, compounds exclusively available in the SIDER data set could help enrich nontoxic compounds in areas of the chemical space where DICTrank only covers toxic compounds. We see a similar trend for a chemical space defined by fragment fingerprints from DataWarrior42 (Supporting Information Figure S1). Therefore, we chose to assess whether adding SIDER compounds to DICTrank compounds improved the predictive ability. Interestingly, other categories of SIDER adverse effects were highly correlated with DICTrank (Figure 2c); the interrelationships of vascular disorders and nervous system disorders are well-known.5,56 Overall, drug adverse events, as recorded in SIDER, have a high concordance with DICTrank labels from the FDA, and there is a strong rationale to rely on both resources.
Maximum Total and Unbound Compound Concentration in Plasma Predict Cardiotoxicity
We next determined whether a high Cmax indicated compounds more likely to be cardiotoxic, as seen in the case of doxorubicin, where cardiotoxicity was found to be Cmax dependent.57 As a single parameter, Cmax was not sufficiently discerning to differentiate between compounds in the ‘most-concern’ and ‘less-concern’ categories of the DICT concern classification (Figure 3). However, for both peak total and peak unbound (active) plasma concentrations, the Cmax distributions differed significantly between cardiotoxic and nontoxic compounds (Figure 3), suggesting that Cmax can be a useful parameter in determining cardiotoxicity.
Cyclooxygenase Inhibition is Predictive of Cardiotoxicity Concern
Turning to manual annotations of compound MOA and/or targets, we found that cyclooxygenase inhibitors58 along with tyrosine kinase receptor inhibitors were the most significant annotations differentiating the various DICT concern categories (Table 3); this is plausible given cyclooxygenase inhibition, besides reducing inflammation, can also lead to increased blood pressure59 while tyrosine kinase receptor inhibition can induce endoplasmic reticulum stress and inflammation in cardiomyocytes.60 In agreement with this, known targets of prostaglandin endoperoxide synthases (PTGS1 and PTGS2 genes, which encode cyclooxygenases COX-1 and COX-2) could significantly distinguish among most-, less-, and no-DICT concern categories (Table 3).
Table 3. Features from Chemical and Biological Data Sources That Have a Particularly High Significance in the Difference of Distributions for the Three DICT Concern Categories (Most-, Less-, and No-Concern).
feature | P-value | test applied | feature space | description/biological interpretation | source |
---|---|---|---|---|---|
Cmax (total) | 2.97 × 10⁻⁴ | Mann–Whitney Wilcoxon test, two-sided, with Bonferroni correction (most vs no) | pharmacokinetic parameters32 | the peak total concentration of a drug in plasma; indicates how much of the drug reaches the bloodstream | (57) |
Cmax (unbound) | 7.58 × 10⁻⁴ | | | the peak unbound (active) concentration of a drug in plasma; indicates how much of the drug is available for interaction with its target | |
cyclooxygenase inhibitor | 2.98 × 10⁻⁶ | chi-squared test | MOA (Drug Repurposing Hub31) | inhibits cyclooxygenase enzymes, often leading to reduced inflammation but also increased blood pressure | (59) |
tyrosine kinase receptor inhibitor | 3.24 × 10⁻⁵ | | | inhibition of tyrosine kinase receptors can affect cell growth and proliferation and can also induce endoplasmic reticulum stress, hypertension, heart failure, myocardial infarction, and cardiac arrhythmias | (60) |
PDGFR tyrosine kinase receptor inhibitor | 3.24 × 10⁻⁵ | | | | |
PTGS2 | 9.67 × 10⁻⁷ | chi-squared test | known targets (Drug Repurposing Hub31) | prostaglandin-endoperoxide synthase 2 (an enzyme), also known as COX-2 | (59) |
PTGS1 | 9.67 × 10⁻⁷ | | | prostaglandin-endoperoxide synthase 1 (an enzyme), also known as COX-1 | |
HTR1D | 1.27 × 10⁻⁵ | | | 5-hydroxytryptamine receptor 1D, a serotonin receptor subtype. Previous enrichment analysis for methylation differences identified HTR1D among the genes with decreased promoter methylation, suggesting its involvement with serotonin receptors, which influence human cardiac function | (65,66) |
Q12809 at 100 μM (KCNH2) (hERG) | 1.21 × 10⁻¹⁸ | Kruskal–Wallis test | CELLSCAPE predicted target36 | the hERG gene encodes Kv11.1 channels crucial for heart function, linked to genetic and drug-induced arrhythmias | (61) |
cytoplasm granularity 2 ER | 4.64 × 10⁻⁴ | chi-squared test | Cell Painting26 | fine-grained smoothness of the ER staining. Disruptions in ER function can lead to ER stress, which is associated with various cardiovascular diseases | (67) |
cells granularity 2 ER | 1.05 × 10⁻³ | | | contains information about the size, shape, number, or texture of nucleoli within the nucleus. This could encode signals for cellular stress | |
nuclei texture contrast RNA 3 0 | 1.19 × 10⁻³ | | | | (68) |
222103_at (ATF1) | 1.01 × 10⁻⁴ | | gene expression21,27 | ATF1 is essential for cardiomyocyte function | (69) |
201080_at (PIP4K2B) | 6.29 × 10⁻⁴ | | | the role of PIP4k2 in cardiac disorders remains uncertain. PIP4Ks regulate insulin production and immune response, with PIP4k2c impacting TGFβ1 signaling, which is vital in heart disease and other fibrotic conditions | (70) |
209092_s_at (GLOD4) | 8.14 × 10⁻⁴ | | | the physiological function of GLOD4 remains largely unexplored. The glyoxalase gene family, comprising six enzymes with roles in metabolism and disease prevention, is crucial for detoxifying reactive dicarbonyls and maintaining cellular homeostasis | (71) |
transport vesicle (GO:0030133) | 5.53 × 10⁻⁴ | | Gene Ontology21,28 | extracellular vesicles play important roles in cardiovascular communication, transporting bioactive molecules that both maintain heart health and contribute to cardiovascular diseases | (72) |
negative regulation of potassium ion transmembrane transport (GO:1901380) | 1.04 × 10⁻³ | | | cardiac K+ channels play a crucial role in cardiac repolarization and their dysfunction can lead to arrhythmias | (73) |
response to methylmercury (GO:0051597) | 1.56 × 10⁻³ | | | exposure to mercury (Hg) is considered to increase the risk of developing cardiovascular disease | (74) |
VSA_EState6 | 6.67 × 10⁻⁹ | | physicochemical descriptors from RDKit38 | VSA EState descriptor 6 (6.00 ≤ x < 6.07), related to molecular surface area and electronic state | (75) |
Qed | 1.20 × 10⁻⁷ | | | quantitative estimate of drug-likeness, a measure indicating how drug-like a molecule is | (76) |
NumHAcceptors | 1.26 × 10⁻⁷ | | | number of hydrogen bond acceptors in the molecule | (77) |
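To make the chi-squared comparisons reported in Table 3 concrete, the following minimal Python sketch tests whether the presence of a binary annotation (for example, an MOA label) is distributed differently across the three DICT concern categories. The counts are hypothetical and the function is illustrative rather than the code used in this study. For a 2 × 3 table the test has two degrees of freedom, for which the chi-squared tail probability has the closed form exp(−χ²/2).

```python
import math

def chi_squared_2x3(with_annotation, without_annotation):
    """Pearson chi-squared test for a 2 x 3 table (annotation present/absent x three categories)."""
    table = [list(with_annotation), list(without_annotation)]
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    # Survival function of the chi-squared distribution with 2 degrees of freedom.
    p_value = math.exp(-stat / 2)
    return stat, p_value

# Hypothetical counts per category (most-, less-, no-concern); not data from this study.
annotated = [12, 3, 1]
not_annotated = [188, 297, 299]

stat, p_value = chi_squared_2x3(annotated, not_annotated)
print(round(stat, 2), p_value)
```

With very small expected counts, as is common for rare annotations, an exact test may be more appropriate than the chi-squared approximation.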
CELLSCAPE-Predicted Protein Targets Such as hERG are Predictive of Cardiotoxicity
Among CELLSCAPE-predicted protein targets, the predicted activity of compounds against KCNH2 best differentiates among the three DICT concern categories. The KCNH2 gene, also known as the human ether-à-go-go-related gene (hERG), is well-known for its significance in the cardiac electrical cycle and hERG inhibition can lead to cardiac arrhythmias.61 We also found that the top three features to distinguish the two DICTrank labels (cardiotoxic versus nontoxic) were α-l-fucosidase I, P-selectin, and carbonic anhydrase IX. The activity of plasma α-l-fucosidase has been pinpointed as a potential biomarker for cardiac hypertrophy and complements the currently used marker, atrial natriuretic peptide.62 Elevated amounts of soluble P-selectin in the blood are evident in various heart-related conditions, like coronary artery disease, hypertension, and atrial fibrillation.63 Carbonic anhydrase IX plays a role in managing the intracellular pH in the heart muscle, which is vital for the heart’s functionality.64
Hypothesis-free Omics Data for Cardiotoxicity are Related to MOA
Omics data sources such as Cell Painting (imaging), gene expression, and Gene Ontology features cover a broad swath of biology, not specifically targeted to cardiac function. For Cell Painting, the fine-grained smoothness of the ER in the cytoplasm and RNA in the nucleus were the top features that differed significantly among toxicity classes. This is plausible given disruptions in ER function can lead to ER stress, which is associated with various cardiovascular diseases.67 For the gene expression feature space, activating transcription factor 1 (ATF1), which is essential for cardiomyocyte function, was the top feature. The other two gene expression features that could distinguish DICT concern categories were phosphatidylinositol-5-phosphate 4-kinase type 2 beta (PIP4K2B) and glyoxalase domain containing 4 (GLOD4); both have indirect links to heart disease and other fibrotic conditions (Table 3). Among Gene Ontology annotations, we found that biological processes related to vesicle transport, potassium ion transmembrane transport, and response to methylmercury could best differentiate signals for the concern categories. This is plausible given that cardiomyocytes rely on vesicular transport for various functions, including the delivery of membrane proteins and lipids. The potassium ion channels play crucial roles in cardiac cell electrical activity and dysregulation can lead to arrhythmias and other heart complications.72,73 Exposure to mercury (Hg) is also considered a risk for ischemic heart disease.74
Physicochemical Properties can Differ Among DICT Concern Categories
Among the various molecular descriptors evaluated in our study, VSA_EState6 could significantly distinguish among the DICT-concern categories. This electrotopological state descriptor aggregates the differences in electronegativity between an atom and its neighboring atoms in a molecule, adjusted by their relative distances, while focusing on atoms with a specific van der Waals surface area.75 This suggests that specific electronic and spatial properties are captured by the VSA_EState6 descriptor, although it is difficult to interpret directly. The second predictive feature, Qed, captures a quantitative estimation of the drug-likeness score that encapsulates the underlying distribution data for a range of drug properties.76 The third predictive feature, NumHAcceptors, refers to the number of hydrogen bond acceptors in the compound. Munawar et al. showed that the most potent hERG inhibitors typically possess two aromatic groups, one hydrophobic group, and one hydrogen bond acceptor, at specific relative distances from each other.77
Structural Alerts from DICTrank can Detect Compounds Causing Cardiac Disorders from a Retrospective Analysis of DrugBank
We determined 59 structural alerts that distinguish cardiotoxic and nontoxic compounds in the DICTrank data set (Figure 4). Two structural alerts had a high PPV for the DICT most-concern category, including one with aromatic rings. Aromatic rings can lead to π-stacking or hydrophobic interactions with aromatic rings of amino acids within the hERG channel cavity, increasing the potential for blocking and subsequent cardiotoxic effects.78 Six structural alerts distinguished toxic versus nontoxic compounds with a PPV of one and more than ten occurrences in the data set (the PPV was used to filter the structural alerts and hence is not an evaluation metric here). Structural alerts with tertiary amines were consistently protonated at physiological pH in the DICTrank data set, suggesting their importance in biological activity and hERG channel binding.79,80 It is also known that compounds with a secondary amine (more hydrogen bond donors) are likely to be less potent hERG inhibitors than those with a tertiary amine (fewer hydrogen bond donors).80
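The PPV-based filtering of structural alerts described above can be sketched as follows. This minimal Python example assumes that substructure matching has already been performed, so each alert is represented by a list of booleans (one per compound); the alert names and data are hypothetical, and the thresholds mirror the text (PPV of one, more than ten occurrences).

```python
def alert_ppv(alert_hits, is_toxic):
    """Return (PPV, number of occurrences) for one alert across all compounds."""
    true_pos = sum(1 for hit, toxic in zip(alert_hits, is_toxic) if hit and toxic)
    false_pos = sum(1 for hit, toxic in zip(alert_hits, is_toxic) if hit and not toxic)
    n_hits = true_pos + false_pos
    ppv = true_pos / n_hits if n_hits else float("nan")
    return ppv, n_hits

def select_alerts(alert_matrix, is_toxic, min_ppv=1.0, min_occurrences=10):
    """Keep alerts with PPV >= min_ppv that occur in more than min_occurrences compounds."""
    selected = []
    for name, hits in alert_matrix.items():
        ppv, n_hits = alert_ppv(hits, is_toxic)
        if n_hits > min_occurrences and ppv >= min_ppv:
            selected.append(name)
    return selected

# Hypothetical data: 12 toxic compounds followed by 3 nontoxic compounds.
is_toxic = [True] * 12 + [False] * 3
alerts = {
    "alert_A": [True] * 11 + [False] * 4,  # 11 hits, all toxic
    "alert_B": [True] * 13 + [False] * 2,  # 13 hits, one of them nontoxic
    "alert_C": [True] * 5 + [False] * 10,  # only 5 hits
}

print(select_alerts(alerts, is_toxic))
```

Because the PPV is used here to select alerts, it should not be read as an estimate of how the alerts would perform on unseen compounds.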
We next analyzed compounds in DrugBank49 for the presence of at least one of the two structural alerts above for the most-concern category. We annotated these hits with heart-related side effects from SIDER34 and their current status (approved, withdrawn, and so forth) as indicated in DrugBank. We found six drugs, some approved, some experimental, and some investigational, with reported cardiac disorders from SIDER (Table 4). These compounds spanned different classes of compounds, with the presence of a tertiary amine that remains protonated or aminopyridine rings as defined by the structural alerts. We found evidence in the literature for the risk of cardiovascular disorders for three of the six compounds, namely, ipratropium, tiotropium, and mivacurium.81−83 Overall, our analysis shows that the DICTrank data set is a rich source of cardiotoxicity-causing compounds, with the potential to be used to build pharmacophore models and evaluate compounds with reported adverse events for their potential mechanisms of toxicity. We could also detect multiple approved drugs that match the structural alerts for both the DICT most-concern category (as shown in Table 4) and DICTrank labels for cardiotoxicity (further details in Supporting Information Figure S2).
Table 4. Six Hits from SIDER with Structural Alerts for the DICT Most-Concern Category.
Predictive Models for DICTrank Labels
Finally, given the promising signals seen in each data type, as described above, we evaluated whether cardiotoxicity might be predicted using the data sources currently publicly available. Several data sources contained sufficient information to successfully train models to predict DICTrank labels (Table 2). We trained 11 models on four types of training data: the DICTrank compounds alone and DICTrank compounds enriched with cardiotoxic/nontoxic/all compounds in the SIDER data set (as shown in Supporting Information Table S1). A direct comparison of the predictive value of data sources is not possible due to the incomplete intersection of compounds with available data of each type. Still, we fixed the held-out test set of compounds to be those where data were available for all feature spaces such that only the training set of compounds varied among data sources. We trained two ensemble models. The first used six models (structural, physicochemical, MOA, CELLSCAPE, MOA with Cmax total, and CELLSCAPE with Cmax total) that performed relatively well in the internal cross-validation (evaluation metrics from cross-validation for all feature space and data set combinations are given in Supporting Information Table S4). This ensemble model was evaluated on an external test set of 90 compounds. The second ensemble model was built on all 11 models, which required testing on a smaller held-out test set due to the limited overlap of data. Evaluation metrics for all models are given in Supporting Information Table S5.84
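One simple way to combine per-model predictions into an ensemble is to average the predicted probabilities. The Python sketch below illustrates this under explicit assumptions: the text above does not specify the combination rule, so the unweighted mean, the 0.5 decision threshold, and the handling of missing predictions (None) are all hypothetical choices, as are the probabilities shown.

```python
def ensemble_probability(model_probs):
    """Unweighted mean of the cardiotoxicity probabilities from the models that gave a prediction."""
    available = [p for p in model_probs.values() if p is not None]
    if not available:
        raise ValueError("no model produced a prediction for this compound")
    return sum(available) / len(available)

def ensemble_label(model_probs, threshold=0.5):
    """Return 1 (cardiotoxic) if the averaged probability reaches the threshold, else 0."""
    return int(ensemble_probability(model_probs) >= threshold)

# Hypothetical per-model probabilities for one compound; MOA data are missing here.
compound_predictions = {
    "structural": 0.8,
    "physicochemical": 0.9,
    "MOA": None,
    "CELLSCAPE": 0.7,
}

mean_prob = ensemble_probability(compound_predictions)
label = ensemble_label(compound_predictions)
print(round(mean_prob, 3), label)
```

Weighted averaging or a stacked meta-model are common alternatives when the individual models differ substantially in reliability.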
Looking at each data source independently, we found that models using Mordred descriptors evaluated on the 90-compound held-out test set (AUC: 0.84, AUCPR: 0.93; random AUC: 0.50, AUCPR: 0.72) performed better than models trained on predicted protein targets (AUC: 0.77, AUCPR: 0.89) and MOA annotations with Cmax (total) (AUC: 0.77, AUCPR: 0.90) (Figure 5a,b). In fact, models using Mordred descriptors were as good as the ensemble of six selected models (AUC: 0.83, AUCPR: 0.92; random AUCPR: 0.72), also evaluated on the 90-compound held-out test set (Supporting Information Figure S3). Further, models across most data sets performed with high AUCPR and F1 scores, with top-performing models using Mordred descriptors (AUCPR: 0.93; random AUCPR: 0.72) and ensemble models (AUCPR: 0.93 for both ensemble models) when using the DICTrank data set directly (Supporting Information Figure S3a and b). Exceptions were models using the broad-based omics data (Cell Painting, Gene Expression, and Gene Ontology), where the performance was relatively poor and similar to random predictions according to the distribution of the respective training data. This lack of predictive power may be inherent to the data sources but could also be due to the highly unbalanced and sparse training data available for these data sources (see Supporting Information Table S2). When comparing the models evaluated with the smaller test set (Supporting Information Figure S3), we found that models trained on the DICTrank data set enriched with all SIDER compounds and using MOA data with Cmax (unbound) (AUCPR: 0.93, random AUCPR: 0.73) performed on par with the ensemble models that used predictions from all 11 models trained on just the DICTrank data set (AUCPR: 0.93; random AUCPR: 0.73). Overall, the ensemble models and the models using physicochemical descriptors detected cardiotoxicity equally well.
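To help interpret the AUCPR values alongside their random baselines, the following minimal Python sketch computes the area under the precision-recall curve as average precision and compares it with the positive-class prevalence. It assumes that the random baseline corresponds to the fraction of cardiotoxic compounds in the evaluated set; labels and scores are hypothetical, and tied scores are not treated specially.

```python
def average_precision(y_true, scores):
    """Step-wise area under the precision-recall curve: mean precision at each true positive."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    n_positive = sum(y_true)
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / n_positive

def random_baseline_aucpr(y_true):
    """Expected AUCPR of an uninformative ranking, taken as the positive-class prevalence."""
    return sum(y_true) / len(y_true)

# Hypothetical labels (1 = cardiotoxic) and predicted probabilities.
y_true = [1, 1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.2]

aucpr = average_precision(y_true, scores)
baseline = random_baseline_aucpr(y_true)
print(round(aucpr, 4), baseline)
```

Under this assumption, a high baseline such as 0.72 indicates a test set dominated by cardiotoxic compounds, so AUCPR values are best read relative to the baseline rather than in absolute terms.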
We next analyzed the applicability domain of these models based on evaluating the quality of prediction for groups of compounds that are structurally dissimilar to the training data. We found that ensemble models and models using MOA annotations perform consistently well across the similarity range (Figure 5c). Models using Mordred descriptors, on the other hand, perform with slightly lower AUCPR when compounds are structurally dissimilar to the training data.
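An applicability-domain analysis of this kind can be sketched by grouping test compounds according to their highest similarity to any training compound and then evaluating each group separately. The minimal Python example below assumes Tanimoto similarity on binary fingerprints (represented as sets of on-bit indices) and arbitrary bin edges; both are illustrative assumptions rather than details taken from this study, and the fingerprints are hypothetical.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto similarity between two fingerprints given as sets of on-bit indices."""
    if not fp_a and not fp_b:
        return 0.0
    return len(fp_a & fp_b) / len(fp_a | fp_b)

def max_similarity_to_training(test_fp, training_fps):
    """Similarity of a test compound to its nearest neighbour in the training set."""
    return max(tanimoto(test_fp, fp) for fp in training_fps)

def bin_by_similarity(test_fps, training_fps, edges=(0.0, 0.3, 0.6, 1.0)):
    """Group test-compound indices by nearest-neighbour similarity; the last bin includes 1.0."""
    bins = {(low, high): [] for low, high in zip(edges[:-1], edges[1:])}
    for index, fp in enumerate(test_fps):
        sim = max_similarity_to_training(fp, training_fps)
        for low, high in bins:
            if low <= sim < high or (high == edges[-1] and sim == high):
                bins[(low, high)].append(index)
                break
    return bins

# Hypothetical fingerprints as sets of on-bit indices.
training_fps = [{1, 2, 3, 4}, {5, 6, 7}]
test_fps = [{1, 2, 3, 4}, {1, 2, 8, 9}, {10, 11}]

similarity_bins = bin_by_similarity(test_fps, training_fps)
print(similarity_bins)
```

A metric such as AUCPR can then be computed within each bin to see how performance changes as test compounds become less similar to the training data.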
Finally, we predicted the DICTrank labels for 82 unique compounds that were labeled ambiguous in the original DICTrank data set (Supporting Information Table S6). We used Mordred descriptors and retrained the model on all 1020 compounds (training and held-out compounds) in DICTrank, except for the ambiguous compounds. We found that 43 of the 82 ambiguous compounds were predicted to be cardiotoxic and 39 were predicted to be nontoxic and provided this list to the community for further study (Supporting Information Table S6).
Limitations of This Study
While we considered various chemical and biological data sources in this study, it is important to remember that conclusions are based on limited data. Certain feature spaces contain features that are computed based on chemical structure, such as CELLSCAPE target predictions and physicochemical properties, while data sets such as MOA and SIDER are manually gathered and have 'evidence-of-the-presence' and 'absence-of-evidence' annotations. To train models using feature spaces such as Cell Painting, Gene Expression, and Gene Ontology data sets, we dropped compounds where we could not find profiles (whether experimentally captured or imputed based on matching to highly similar compound profiles using the v-nn approach). The amount of training data (and also the class balance of SIDER/DICTrank labels) is lower for these models. Although we compare data sources using the same test compounds, the varying amounts of training data and the differing types of compounds represented therein can disadvantage some data sources versus others, such that we cannot with certainty compare the signal contained across the feature spaces. The poor performance of -omics data should therefore not yet be taken as representative of the signal in the feature space. Rather, in this study, we aim to evaluate the signal present in the data that is available and build the best predictive models possible with public data. Recently, deep learning algorithms have been shown to learn feature representations from various -omics data.
Transfer learning allows for leveraging pretrained models on large data sets (for example general image-based models), which can then be fine-tuned for specific tasks with limited data (such as Cell Painting data).85,86 Similarly, one-shot learning shows potential in enabling models to make precise predictions with minimal data.87 In the future, deep learning models, by learning and generalizing across feature representations, hold the promise of enhancing predictive accuracy and broadening the scope of data analysis in the study of cardiotoxicity. Further, a recurring challenge in using comprehensive -omics data is the sparsity of data, which limits prospective validation.88 This necessitates the development of models that can make reliable predictions even with sparse or incomplete data sets. In this study, we observed that models based on computed physicochemical properties performed on par with other ensemble models. We recommend that this model, which we have made available for public use, be used for prospective validation. In the future, the availability of more data, for example, Cell Painting from JUMP-CP89 and Recursion RxRx390 will significantly improve our ability to ascertain the presence of a signal for cardiotoxicity in -omics data.
Conclusions
In this work, we used biological and chemical data (Figure 1) to predict DICT. We determined the feature contained in each data source that most differed between the most-concern versus nontoxic category for DICTrank and found these could drive mechanistic insights. Features from data sources such as predicted protein targets and annotated MOAs that could distinguish the DICT concern categories resembled activity against targets (ion channels in particular) that are mechanistically most plausible. We further evaluated these feature spaces using machine learning to build the first predictive models of DICTrank. Our findings indicate that models relying on physicochemical properties trained on larger training data sets performed on par with the ensemble models based on diverse data sources. The exploratory data analysis in this study suggests that as more -omics data becomes accessible in the future, it will enhance our ability to predict cardiotoxicity. Therefore, for the present, when constructing models using public data sets, we advocate the use of Mordred descriptors and predicted targets (based on chemical structure), since these computed properties are readily available for compounds; they do not require experimental data and could be used to build models for cardiotoxicity. In the future, using biological data, we can look into the biological pathways and mechanisms of DICT leading to better drug design and safer therapeutic strategies.
Acknowledgments
This work was performed using resources provided by the Cambridge Service for Data-Driven Discovery (CSD3) operated by the University of Cambridge Research Computing Service (www.csd3.cam.ac.uk), provided by Dell EMC and Intel using Tier-2 funding from the Engineering and Physical Sciences Research Council (capital grant EP/T022159/1), and DiRAC funding from the Science and Technology Facilities Council (www.dirac.ac.uk). Cartoons in Figure 1 were created with DALLEv2 (https://openai.com/dall-e-2), Microsoft Designer (https://designer.microsoft.com/), and Bioicons (https://bioicons.com) which compiled images from the Database Centre for the life sciences/TogoTV (https://togotv.dbcls.jp) and Servier (https://smart.servier.com). Technical infrastructure for hosting the app https://dili.serve.scilifelab.se/ was provided by SciLifeLab Serve (https://serve.scilifelab.se), a platform developed and supported by SciLifeLab Data Centre.
Data Availability Statement
The models based on chemical structures and physicochemical characteristics are readily accessible for direct use online at https://broad.io/DICTrank_Predictor. All code and data for all models can be found on GitHub (https://github.com/srijitseal/DICTrank) for local implementation, with further details on https://broad.io/DICTrank_Predictor. All data sets are also available at 10.6084/m9.figshare.24312274.v1.
Supporting Information Available
The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/acs.jcim.3c01834.
The chemical space of all compounds DICTrank and SIDER; DrugBank compounds containing structural alerts; and evaluation metrics for models developed in this study on an external test set (PDF)
Number of compounds in Train data set; number of compounds in Test data set; evaluation metrics for SIDER labels for cardiac disorders considering the DICTrank data as true labels; evaluation metrics for all models and data set combinations on cross validation and on held out test sets; and predicted DICTrank for 82 unique compounds originally labeled ambiguous in DICTrank (XLSX)
Author Contributions
S. Seal designed and performed exploratory data analysis and implemented and trained the models. L.H.G. trained CELLSCAPE for the prediction of drug targets. S. Seal, A.E.C., and S. Singh analyzed the biological interpretation of morphological features. S. Seal, O.S., and A.B. analyzed the performance of machine learning models. S. Seal wrote the manuscript with extensive discussions with all authors. All of the authors (S. Seal, O.S., L.H.G., M.G.O., A.E.C., A.B., and S. Singh) reviewed, edited, and contributed to discussions on the manuscript and approved the final version of the manuscript.
S. Seal acknowledges funding from the Cambridge Centre for Data-Driven Discovery (C2D3) and Accelerate Programme for Scientific Discovery. A.E.C., S. Singh, and S. Seal acknowledge funding from the National Institutes of Health (R35 GM122547 to A.E.C.). O.S. acknowledges funding from the Swedish Research Council (Grants 2020-03731 and 2020-01865), FORMAS (Grant 2022-00940), Swedish Cancer Foundation (22 2412 Pj 03 H), and Horizon Europe (Grant Agreements 101057014 (PARC) and 101057442 (REMEDI4ALL)).
The authors declare the following competing financial interest(s): S. Singh and A.E.C. serve as scientific advisors for companies that use image-based profiling and Cell Painting (A.E.C.: Recursion, SyzOnc; S. Singh: Waypoint Bio, Dewpoint Therapeutics) and receive honoraria for occasional talks at pharmaceutical and biotechnology companies. O.S. declares shares in Phenaros Pharmaceuticals. L.H.G. is an employee at Ignota Labs, where CELLSCAPE is a proprietary software.
References
- Varga Z. V.; Ferdinandy P.; Liaudet L.; Pacher P. Drug-Induced Mitochondrial Dysfunction and Cardiotoxicity. Am. J. Physiol.: Heart Circ. Physiol. 2015, 309 (9), H1453–H1467. 10.1152/ajpheart.00554.2015.
- Onakpoya I. J.; Heneghan C. J.; Aronson J. K. Post-Marketing Withdrawal of 462 Medicinal Products because of Adverse Drug Reactions: A Systematic Review of the World Literature. BMC Med. 2016, 14, 10. 10.1186/s12916-016-0553-2.
- Dykens J. A.; Will Y. The Significance of Mitochondrial Toxicity Testing in Drug Development. Drug Discov. Today 2007, 12 (17–18), 777–785. 10.1016/j.drudis.2007.07.013.
- Clements M.; Millar V.; Williams A. S.; Kalinka S. Bridging Functional and Structural Cardiotoxicity Assays Using Human Embryonic Stem Cell-Derived Cardiomyocytes for a More Comprehensive Risk Assessment. Toxicol. Sci. 2015, 148 (1), 241–260. 10.1093/toxsci/kfv180.
- Mamoshina P.; Rodriguez B.; Bueno-Orovio A. Toward a Broader View of Mechanisms of Drug Cardiotoxicity. Cell Rep. Med. 2021, 2 (3), 100216. 10.1016/j.xcrm.2021.100216.
- McGowan J. V.; Chung R.; Maulik A.; Piotrowska I.; Walker J. M.; Yellon D. M. Anthracycline Chemotherapy and Cardiotoxicity. Cardiovasc. Drugs Ther. 2017, 31 (1), 63–75. 10.1007/s10557-016-6711-0.
- Sanguinetti M. C.; Tristani-Firouzi M. hERG Potassium Channels and Cardiac Arrhythmia. Nature 2006, 440 (7083), 463–469. 10.1038/nature04710.
- Subbiah I. M.; Lenihan D. J.; Tsimberidou A. M. Cardiovascular Toxicity Profiles of Vascular-Disrupting Agents. Oncologist 2011, 16 (8), 1120–1130. 10.1634/theoncologist.2010-0432.
- Gilbert G.; Demydenko K.; Dries E.; Puertas R. D.; Jin X.; Sipido K.; Roderick H. L. Calcium Signaling in Cardiomyocyte Function. Cold Spring Harb. Perspect. Biol. 2020, 12 (3), a035428. 10.1101/cshperspect.a035428.
- Thomas T. P.; Grisanti L. A. The Dynamic Interplay Between Cardiac Inflammation and Fibrosis. Front. Physiol. 2020, 11, 529075. 10.3389/fphys.2020.529075.
- Heinzerling L.; Ott P. A.; Hodi F. S.; Husain A. N.; Tajmir-Riahi A.; Tawbi H.; Pauschinger M.; Gajewski T. F.; Lipson E. J.; Luke J. J. Cardiotoxicity Associated with CTLA4 and PD1 Blocking Immunotherapy. J. Immunother Cancer 2016, 4, 50. 10.1186/s40425-016-0152-y.
- Hartupee J.; Mann D. L. Neurohormonal Activation in Heart Failure with Reduced Ejection Fraction. Nat. Rev. Cardiol. 2017, 14 (1), 30–38. 10.1038/nrcardio.2016.163.
- Li M.-Y.; Peng L.-M.; Chen X.-P. Pharmacogenomics in Drug-Induced Cardiotoxicity: Current Status and the Future. Front Cardiovasc Med. 2022, 9, 966261. 10.3389/fcvm.2022.966261.
- Qu Y.; Li T.; Liu Z.; Li D.; Tong W. DICTrank: The Largest Reference List of 1318 Human Drugs Ranked by Risk of Drug-Induced Cardiotoxicity Using FDA Labeling. Drug Discov. Today 2023, 28 (11), 103770. 10.1016/j.drudis.2023.103770.
- Chen M.; Suzuki A.; Thakkar S.; Yu K.; Hu C.; Tong W. DILIrank: The Largest Reference Drug List Ranked by the Risk for Developing Drug-Induced Liver Injury in Humans. Drug Discov. Today 2016, 21 (4), 648–653. 10.1016/j.drudis.2016.02.015.
- Seal S.; Williams D. P.; Hosseini-Gerami L.; Spjuth O.; Bender A. Improved Early Detection of Drug-Induced Liver Injury by Integrating Predicted in Vivo and in Vitro Data. bioRxiv 2024, (https://doi.org/10.1101/2024.01.10.575128, accessed Jan 29, 2023) 10.1101/2024.01.10.575128.
- Bender A.; Cortés-Ciriano I. Artificial Intelligence in Drug Discovery: What Is Realistic, What Are Illusions? Part 1: Ways to Make an Impact, and Why We Are Not There yet. Drug Discov. Today 2021, 26 (2), 511–524. 10.1016/j.drudis.2020.12.009.
- Horne R. I.; Wilson-Godber J.; Díaz A. G.; Brotzakis Z. F.; Seal S.; Gregory R. C.; Possenti A.; Chia S.; Vendruscolo M. Using Generative Modeling to Endow with Potency Initially Inert Compounds with Good Bioavailability and Low Toxicity. J. Chem. Inf. Model. 2024, (https://doi.org/10.1021/ACS.JCIM.3C01777) 10.1021/ACS.JCIM.3C01777.
- Bassan A.; Alves V. M.; Amberg A.; Anger L. T.; Beilke L.; Bender A.; Bernal A.; Cronin M. T. D.; Hsieh J.-H.; Johnson C.; Kemper R.; Mumtaz M.; Neilson L.; Pavan M.; Pointon A.; Pletz J.; Ruiz P.; Russo D. P.; Sabnis Y.; Sandhu R.; Schaefer M.; Stavitskaya L.; Szabo D. T.; Valentin J.-P.; Woolley D.; Zwickl C.; Myatt G. J. In Silico Approaches in Organ Toxicity Hazard Assessment: Current Status and Future Needs for Predicting Heart, Kidney and Lung Toxicities. Comput. Toxicol 2021, 20, 100188. 10.1016/j.comtox.2021.100188.
- Svensson F.; Zoufir A.; Mahmoud S.; Afzal A. M.; Smit I.; Giblin K. A.; Clements P. J.; Mettetal J. T.; Pointon A.; Harvey J. S.; Greene N.; Williams R. V.; Bender A. Information-Derived Mechanistic Hypotheses for Structural Cardiotoxicity. Chem. Res. Toxicol. 2018, 31 (11), 1119–1127. 10.1021/acs.chemrestox.8b00159.
- Wang Z.; Clark N. R.; Ma’ayan A. Drug-Induced Adverse Events Prediction with the LINCS L1000 Data. Bioinformatics 2016, 32 (15), 2338–2345. 10.1093/bioinformatics/btw168.
- Galeano D.; Li S.; Gerstein M.; Paccanaro A. Predicting the Frequencies of Drug Side Effects. Nat. Commun. 2020, 11 (1), 4575. 10.1038/s41467-020-18305-y.
- Wu Z.; Ramsundar B.; Feinberg E. N.; Gomes J.; Geniesse C.; Pappu A. S.; Leswing K.; Pande V. MoleculeNet: A Benchmark for Molecular Machine Learning. Chem. Sci. 2018, 9 (2), 513–530. 10.1039/C7SC02664A.
- Duran-Frigola M.; Aloy P. Analysis of Chemical and Biological Features Yields Mechanistic Insights into Drug Side Effects. Chem. Biol. 2013, 20 (4), 594–603. 10.1016/j.chembiol.2013.03.017.
- Jamal S.; Ali W.; Nagpal P.; Grover S.; Grover A. Computational Models for the Prediction of Adverse Cardiovascular Drug Reactions. J. Transl. Med. 2019, 17 (1), 171. 10.1186/s12967-019-1918-z.
- Bray M.-A.; Gustafsdottir S. M.; Rohban M. H.; Singh S.; Ljosa V.; Sokolnicki K. L.; Bittker J. A.; Bodycombe N. E.; Dančík V.; Hasaka T. P.; Hon C. S.; Kemp M. M.; Li K.; Walpita D.; Wawer M. J.; Golub T. R.; Schreiber S. L.; Clemons P. A.; Shamji A. F.; Carpenter A. E. A Dataset of Images and Morphological Profiles of 30 000 Small-Molecule Treatments Using the Cell Painting Assay. Gigascience 2017, 6 (12), 1–5. 10.1093/gigascience/giw014.
- Subramanian A.; Narayan R.; Corsello S. M.; Peck D. D.; Natoli T. E.; Lu X.; Gould J.; Davis J. F.; Tubelli A. A.; Asiedu J. K.; Lahr D. L.; Hirschman J. E.; Liu Z.; Donahue M.; Julian B.; Khan M.; Wadden D.; Smith I. C.; Lam D.; Liberzon A.; Toder C.; Bagul M.; Orzechowski M.; Enache O. M.; Piccioni F.; Johnson S. A.; Lyons N. J.; Berger A. H.; Shamji A. F.; Brooks A. N.; Vrcic A.; Flynn C.; Rosains J.; Takeda D. Y.; Hu R.; Davison D.; Lamb J.; Ardlie K.; Hogstrom L.; Greenside P.; Gray N. S.; Clemons P. A.; Silver S.; Wu X.; Zhao W.-N.; Read-Button W.; Wu X.; Haggarty S. J.; Ronco L. V.; Boehm J. S.; Schreiber S. L.; Doench J. G.; Bittker J. A.; Root D. E.; Wong B.; Golub T. R. A Next Generation Connectivity Map: L1000 Platform and the First 1,000,000 Profiles. Cell 2017, 171 (6), 1437–1452.e17. 10.1016/j.cell.2017.10.049.
- Ashburner M.; Ball C. A.; Blake J. A.; Botstein D.; Butler H.; Cherry J. M.; Davis A. P.; Dolinski K.; Dwight S. S.; Eppig J. T.; Harris M. A.; Hill D. P.; Issel-Tarver L.; Kasarskis A.; Lewis S.; Matese J. C.; Richardson J. E.; Ringwald M.; Rubin G. M.; Sherlock G. Gene Ontology: tool for the unification of biology. Nat. Genet. 2000, 25 (1), 25–29. 10.1038/75556.
- Seal S.; Yang H.; Vollmers L.; Bender A. Comparison of Cellular Morphological Descriptors and Molecular Fingerprints for the Prediction of Cytotoxicity- and Proliferation-Related Assays. Chem. Res. Toxicol. 2021, 34 (2), 422–437. 10.1021/acs.chemrestox.0c00303.
- Seal S.; Yang H.; Trapotsi M.-A.; Singh S.; Carreras-Puigvert J.; Spjuth O.; Bender A. Merging Bioactivity Predictions from Cell Morphology and Chemical Fingerprint Models Using Similarity to Training Data. J. Cheminform. 2023, 15 (1), 56. 10.1186/s13321-023-00723-x.
- Corsello S. M.; Bittker J. A.; Liu Z.; Gould J.; McCarren P.; Hirschman J. E.; Johnston S. E.; Vrcic A.; Wong B.; Khan M.; Asiedu J.; Narayan R.; Mader C. C.; Subramanian A.; Golub T. R. The Drug Repurposing Hub: A next-Generation Drug Library and Information Resource. Nat. Med. 2017, 23 (4), 405–408. 10.1038/nm.4306.
- Smit I. A.; Afzal A. M.; Allen C. H. G.; Svensson F.; Hanser T.; Bender A. Systematic Analysis of Protein Targets Associated with Adverse Events of Drugs from Clinical Trials and Postmarketing Reports. Chem. Res. Toxicol. 2021, 34 (2), 365–384. 10.1021/acs.chemrestox.0c00294.
- Liu A.; Seal S.; Yang H.; Bender A. Using Chemical and Biological Data to Predict Drug Toxicity. SLAS Discovery 2023, 28 (3), 53–64. 10.1016/j.slasd.2022.12.003.
- Kuhn M.; Letunic I.; Jensen L. J.; Bork P. The SIDER Database of Drugs and Side Effects. Nucleic Acids Res. 2016, 44 (D1), D1075–D1079. 10.1093/nar/gkv1075.
- Broad Institute of Harvard. Repurposing Related Drug Annotations. https://repo-hub.broadinstitute.org/repurposing (accessed Oct 01, 2023).
- Ignota Labs . Ignota Labs. https://ignotalabs.ai/ (accessed Oct 05, 2023).
- Prediction of Drug Side Effects. http://www.maayanlab.net/SEP-L1000/ (accessed Oct 01, 2023).
- Landrum G. Rdkit Documentation. Release 2013, 1 (1–79), 4.(accessed Jan 29, 2023). [Google Scholar]
- Swain M.MolVS: Molecule Validation and Standardization, 2018. DOI 2019 (accessed Jan 29, 2023). [Google Scholar]
- Sul . Dimorphite_dl: Protonate Your SMILES! Mirror of Https://git.durrantlab.pitt.edu/jdurrant/dimorphite_dl/; Github. (accessed Jan 29, 2023).
- Pedregosa F.; Varoquaux G.; Gramfort A.; Michel V.; Thirion B.; Grisel O.; Blondel M.; Müller A.; Nothman J.; Louppe G.; Prettenhofer P.; Weiss R.; Dubourg V.; Vanderplas J.; Passos A.; Cournapeau D.; Brucher M.; Perrot M.; Duchesnay E. ´.. Scikit-Learn: Machine Learning in Python. 2012, arXiv [cs.LG], 2825–2830. https://www.jmlr.org/papers/volume12/pedregosa11a/pedregosa11a.pdf?ref=https:/. (accessed Oct 13, 2023)
- Sander T.; Freyss J.; von Korff M.; Rufener C. DataWarrior: An Open-Source Program for Chemistry Aware Data Visualization and Analysis. J. Chem. Inf. Model. 2015, 55 (2), 460–473. 10.1021/ci500588j. [DOI] [PubMed] [Google Scholar]
- Moriwaki H.; Tian Y.-S.; Kawashita N.; Takagi T. Mordred: A Molecular Descriptor Calculator. J. Cheminform. 2018, 10 (1), 4. 10.1186/s13321-018-0258-y. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Mervin L. H.; Bulusu K. C.; Kalash L.; Afzal A. M.; Svensson F.; Firth M. A.; Barrett I.; Engkvist O.; Bender A. Orthologue Chemical Space and Its Influence on Target Prediction. Bioinformatics 2018, 34 (1), 72–79. 10.1093/bioinformatics/btx525. [DOI] [PMC free article] [PubMed] [Google Scholar]
- PIDGINv4 . PIDGINv4; Github; (accessed Jan 29, 2023). [Google Scholar]
- Daina A.; Michielin O.; Zoete V. SwissTargetPrediction: Updated Data and New Features for Efficient Prediction of Protein Targets of Small Molecules. Nucleic Acids Res. 2019, 47 (W1), W357–W364. 10.1093/nar/gkz382. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Ferrari T.; Gini G.; Golbamaki Bakhtyari N.; Benfenati E.. Mining Toxicity Structural Alerts from SMILES: A New Way to Derive Structure Activity Relationships. In 2011 IEEE Symposium on Computational Intelligence and Data Mining (CIDM); IEEE, 2011. 10.1109/cidm.2011.5949444. [DOI] [Google Scholar]
- Hemmerich J.; Troger F.; Füzi B.; FEcker G. Using Machine Learning Methods and Structural Alerts for Prediction of Mitochondrial Toxicity. Mol. Inf. 2020, 39 (5), e2000005 10.1002/minf.202000005. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Wishart D. S.; Feunang Y. D.; Guo A. C.; Lo E. J.; Marcu A.; Grant J. R.; Sajed T.; Johnson D.; Li C.; Sayeeda Z.; Assempour N.; Iynkkaran I.; Liu Y.; Maciejewski A.; Gale N.; Wilson A.; Chin L.; Cummings R.; Le D.; Pon A.; Knox C.; Wilson M. DrugBank 5.0: A Major Update to the DrugBank Database for 2018. Nucleic Acids Res. 2018, 46 (D1), D1074–D1082. 10.1093/nar/gkx1037. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Statistical Functions (Scipy.Stats)—SciPy v1.11.3 Manual. https://docs.scipy.org/doc/scipy/reference/stats.html (accessed Oct 08, 2023).
- Seal S.; Carreras-Puigvert J.; Trapotsi M.-A.; Yang H.; Spjuth O.; Bender A. Integrating Cell Morphology with Gene Expression and Chemical Structure to Aid Mitochondrial Toxicity Detection. Commun. Biol. 2022, 5 (1), 858. 10.1038/s42003-022-03763-5. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Lombardo F.; Berellini G.; Obach R. S. Trend Analysis of a Database of Intravenous Pharmacokinetic Parameters in Humans for 1352 Drug Compounds. Drug Metab. Dispos. 2018, 46 (11), 1466–1477. 10.1124/dmd.118.082966. [DOI] [PubMed] [Google Scholar]
- Pang L.; Sager P.; Yang X.; Shi H.; Sannajust F.; Brock M.; Wu J. C.; Abi-Gerges N.; Lyn-Cook B.; Berridge B. R.; Stockbridge N. Workshop Report: FDA Workshop on Improving Cardiotoxicity Assessment With Human-Relevant Platforms. Circ. Res. 2019, 125 (9), 855–867. 10.1161/CIRCRESAHA.119.315378. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Rogers D.; Hahn M. Extended-Connectivity Fingerprints. J. Chem. Inf. Model. 2010, 50 (5), 742–754. 10.1021/ci100050t. [DOI] [PubMed] [Google Scholar]
- Seal S.; Carreras-Puigvert J.; Carpenter A. E.; Spjuth O.; Bender A. From Pixels to Phenotypes: Integrating Image-Based Profiling with Cell Health Data Improves Interpretability. Mol. Biol. Cell. 2024, 10.1101/2023.07.14. [DOI] [PMC free article] [PubMed] [Google Scholar]; (https://doi.org/10.1091/mbc.E23-08-0298).
- Slade W. R. Jr; McNeal A. C.; Tse P. S. Neurologic Complications of Cardiovascular Diseases. J. Natl. Med. Assoc. 1989, 81 (2), 193–197. [PMC free article] [PubMed] [Google Scholar]
- Ishisaka T.; Kishi S.; Okura K.; Horikoshi M.; Yamashita T.; Mitsuke Y.; Shimizu H.; Ueda T. A Precise Pharmacodynamic Study Showing the Advantage of a Marked Reduction in Cardiotoxicity in Continuous Infusion of Doxorubicin. Leuk. Lymphoma 2006, 47 (8), 1599–1607. 10.1080/10428190600580767. [DOI] [PubMed] [Google Scholar]
- Patrono C. Cardiovascular Effects of Cyclooxygenase-2 Inhibitors: A Mechanistic and Clinical Perspective. Br. J. Clin. Pharmacol. 2016, 82 (4), 957–964. 10.1111/bcp.13048. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Singh G.; Miller J. D.; Huse D. M.; Pettitt D.; D’Agostino R. B.; Russell M. W. Consequences of Increased Systolic Blood Pressure in Patients with Osteoarthritis and Rheumatoid Arthritis. J. Rheumatol. 2003, 30 (4), 714–719. [PubMed] [Google Scholar]
- Wang H.; Wang Y.; Li J.; He Z.; Boswell S. A.; Chung M.; You F.; Han S. Three Tyrosine Kinase Inhibitors Cause Cardiotoxicity by Inducing Endoplasmic Reticulum Stress and Inflammation in Cardiomyocytes. BMC Med. 2023, 21 (1), 147. 10.1186/s12916-023-02838-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Vandenberg J. I.; Perry M. D.; Perrin M. J.; Mann S. A.; Ke Y.; Hill A. P. hERG K(+) Channels: Structure, Function, and Clinical Significance. Physiol. Rev. 2012, 92 (3), 1393–1478. 10.1152/physrev.00036.2011. [DOI] [PubMed] [Google Scholar]
- Nagai-Okatani C.; Minamino N. Aberrant Glycosylation in the Left Ventricle and Plasma of Rats with Cardiac Hypertrophy and Heart Failure. PLoS One 2016, 11 (6), e0150210 10.1371/journal.pone.0150210. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Blann A. D.; Nadar S. K.; Lip G. Y. H. The Adhesion Molecule P-Selectin and Cardiovascular Disease. Eur. Heart J. 2003, 24 (24), 2166–2179. 10.1016/j.ehj.2003.08.021. [DOI] [PubMed] [Google Scholar]
- Nolly M. B.; Vargas L. A.; Correa M. V.; Lofeudo J. M.; Pinilla A. O.; Rueda J. O. V.; Guerrero-Gimenez M. E.; Swenson E. R.; Damiani M. T.; Alvarez B. V. Carbonic Anhydrase IX and Hypoxia-Inducible Factor 1 Attenuate Cardiac Dysfunction after Myocardial Infarction. Pflug. Arch. Eur. J. Physiol. 2021, 473 (8), 1273–1285. 10.1007/s00424-021-02592-5. [DOI] [PubMed] [Google Scholar]
- Bain C. R.; Ziemann M.; Kaspi A.; Khan A. W.; Taylor R.; Trahair H.; Khurana I.; Kaipananickal H.; Wallace S.; El-Osta A.; Myles P. S.; Bozaoglu K. DNA Methylation Patterns from Peripheral Blood Separate Coronary Artery Disease Patients with and without Heart Failure. ESC Heart Fail. 2020, 7 (5), 2468–2478. 10.1002/ehf2.12810. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Neumann J.; Hofmann B.; Dhein S.; Gergs U. Cardiac Roles of Serotonin (5-HT) and 5-HT-Receptors in Health and Disease. Int. J. Mol. Sci. 2023, 24 (5), 4765. 10.3390/ijms24054765. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Belmadani S.; Matrougui K. Broken Heart: A Matter of the Endoplasmic Reticulum Stress Bad Management?. World J. Cardiol. 2019, 11 (6), 159–170. 10.4330/wjc.v11.i6.0000. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Hariharan N.; Sussman M. A. Stressing on the Nucleolus in Cardiovascular Disease. Biochim. Biophys. Acta, Mol. Basis Dis. 2014, 1842 (6), 798–801. 10.1016/j.bbadis.2013.09.016. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Matus M.; Schulte J.; Sur H.; Seidl M.; Schütz G.; Schmitz W.; Müller F. U. Knockout of ATF1 Leads to Enhanced Cardiac Contractility and Output. FASEB J. 2008, 22 (Suppl. 1), 1155.14. 10.1096/fasebj.22.1_supplement.1155.14.18039926 [DOI] [Google Scholar]
- Magadum A.; Singh N.; Kurian A. A.; Sharkar M. T. K.; Sultana N.; Chepurko E.; Kaur K.; Żak M. M.; Hadas Y.; Lebeche D.; Sahoo S.; Hajjar R.; Zangi L. Therapeutic Delivery of Pip4k2c-Modified mRNA Attenuates Cardiac Hypertrophy and Fibrosis in the Failing Heart. Adv. Sci. 2021, 8 (10), 2004661. 10.1002/advs.202004661. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Farrera D. O.; Galligan J. J. The Human Glyoxalase Gene Family in Health and Disease. Chem. Res. Toxicol. 2022, 35 (10), 1766–1776. 10.1021/acs.chemrestox.2c00182. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Fu S.; Zhang Y.; Li Y.; Luo L.; Zhao Y.; Yao Y. Extracellular Vesicles in Cardiovascular Diseases. Cell Death Discovery 2020, 6, 68. 10.1038/s41420-020-00305-y. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Grandi E.; Sanguinetti M. C.; Bartos D. C.; Bers D. M.; Chen-Izu Y.; Chiamvimonvat N.; Colecraft H. M.; Delisle B. P.; Heijman J.; Navedo M. F.; Noskov S.; Proenza C.; Vandenberg J. I.; Yarov-Yarovoy V. Potassium Channels in the Heart: Structure, Function and Regulation. J. Physiol. 2017, 595 (7), 2209–2228. 10.1113/JP272864. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Hu X. F.; Lowe M.; Chan H. M. Mercury Exposure, Cardiovascular Disease, and Mortality: A Systematic Review and Dose-Response Meta-Analysis. Environ. Res. 2021, 193, 110538. 10.1016/j.envres.2020.110538. [DOI] [PubMed] [Google Scholar]
- Kier L. B.; Hall L. H. An Electrotopological-State Index for Atoms in Molecules. Pharm. Res. 1990, 07 (8), 801–807. 10.1023/a:1015952613760. [DOI] [PubMed] [Google Scholar]
- Bickerton G. R.; Paolini G. V.; Besnard J.; Muresan S.; Hopkins A. L. Quantifying the Chemical Beauty of Drugs. Nat. Chem. 2012, 4 (2), 90–98. 10.1038/nchem.1243. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Munawar S.; Windley M. J.; Tse E. G.; Todd M. H.; Hill A. P.; Vandenberg J. I.; Jabeen I. Experimentally Validated Pharmacoinformatics Approach to Predict hERG Inhibition Potential of New Chemical Entities. Front. Pharmacol 2018, 9, 1035. 10.3389/fphar.2018.01035. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Polak S.; Wiśniowska B.; Brandys J. Collation, Assessment and Analysis of Literature in Vitro Data on hERG Receptor Blocking Potency for Subsequent Modeling of Drugs’ Cardiotoxic Properties. J. Appl. Toxicol. 2009, 29 (3), 183–206. 10.1002/jat.1395. [DOI] [PubMed] [Google Scholar]
- Cavalli A.; Poluzzi E.; De Ponti F.; Recanatini M. Toward a Pharmacophore for Drugs Inducing the Long QT Syndrome: Insights from a CoMFA Study of HERG K(+) Channel Blockers. J. Med. Chem. 2002, 45 (18), 3844–3853. 10.1021/jm0208875. [DOI] [PubMed] [Google Scholar]
- Garrido A.; Lepailleur A.; Mignani S. M.; Dallemagne P.; Rochais C. hERG Toxicity Assessment: Useful Guidelines for Drug Design. Eur. J. Med. Chem. 2020, 195, 112290. 10.1016/j.ejmech.2020.112290. [DOI] [PubMed] [Google Scholar]
- Ogale S. S.; Lee T. A.; Au D. H.; Boudreau D. M.; Sullivan S. D. Cardiovascular Events Associated with Ipratropium Bromide in COPD. Chest 2010, 137 (1), 13–19. 10.1378/chest.08-2367. [DOI] [PubMed] [Google Scholar]
- Shin J.; Lee J. H. Effects of Tiotropium on the Risk of Coronary Heart Disease in Patients with COPD: A Nationwide Cohort Study. Sci. Rep. 2022, 12 (1), 16674. 10.1038/s41598-022-21038-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Savarese J. J.; Ali H. H.; Basta S. J.; Scott R. F.; Embree P. B.; Wastlla W.; Abou-Donia M. M.; Gelb C. The Cardiovascular Effects of Mivacurium Chloride (BW B1090U) in Patients Receiving Nitrous Oxide-Opiate-Barbiturate Anesthesia. Anesthesiology 1989, 70 (3), 386–394. 10.1097/00000542-198903000-00003. [DOI] [PubMed] [Google Scholar]
- Javed A.; Ajmal M.; Wolfson A. Dabigatran in Cardiovascular Disease Management: A Comprehensive Review. World J. Cardiol. 2021, 13 (12), 710–719. 10.4330/wjc.v13.i12.710. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Martorell-Marugán J.; Tabik S.; Benhammou Y.; del Val C.; Zwir I.; Herrera F.; Carmona-Sáez P.. Deep Learning in Omics Data Analysis and Precision Medicine. In Computational Biology; Husi H., Ed.; Codon Publications: Brisbane (AU). [PubMed] [Google Scholar]
- Cai C.; Wang S.; Xu Y.; Zhang W.; Tang K.; Ouyang Q.; Lai L.; Pei J. Transfer Learning for Drug Discovery. J. Med. Chem. 2020, 63 (16), 8683–8694. 10.1021/acs.jmedchem.9b02147. [DOI] [PubMed] [Google Scholar]
- Altae-Tran H.; Ramsundar B.; Pappu A. S.; Pande V. Low Data Drug Discovery with One-Shot Learning. ACS Cent. Sci. 2017, 3 (4), 283–293. 10.1021/acscentsci.6b00367. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Veríssimo G. C.; Serafim M. S. M.; Kronenberger T.; Ferreira R. S.; Honorio K. M.; Maltarollo V. G. Designing Drugs When There Is Low Data Availability: One-Shot Learning and Other Approaches to Face the Issues of a Long-Term Concern. Expert Opin. Drug Discovery 2022, 17 (9), 929–947. 10.1080/17460441.2022.2114451. [DOI] [PubMed] [Google Scholar]
- Chandrasekaran S. N.; Ackerman J.; Alix E.; Ando D. M.; Arevalo J.; Bennion M.; Boisseau N.; Borowa A.; Boyd J. D.; Brino L.; Byrne P. J.; Ceulemans H.; Ch’ng C.; Cimini B. A.; Clevert D.-A.; Deflaux N.; Doench J. G.; Dorval T.; Doyonnas R.; Dragone V.; Engkvist O.; Faloon P. W.; Fritchman B.; Fuchs F.; Garg S.; Gilbert T. J.; Glazer D.; Gnutt D.; Goodale A.; Grignard J.; Guenther J.; Han Y.; Hanifehlou Z.; Hariharan S.; Hernandez D.; Horman S. R.; Hormel G.; Huntley M.; Icke I.; Iida M.; Jacob C. B.; Jaensch S.; Khetan J.; Kost-Alimova M.; Krawiec T.; Kuhn D.; Lardeau C.-H.; Lembke A.; Lin F.; Little K. D.; Lofstrom K. R.; Lotfi S.; Logan D. J.; Luo Y.; Madoux F.; Marin Zapata P. A.; Marion B. A.; Martin G.; McCarthy N. J.; Mervin L.; Miller L.; Mohamed H.; Monteverde T.; Mouchet E.; Nicke B.; Ogier A.; Ong A.-L.; Osterland M.; Otrocka M.; Peeters P. J.; Pilling J.; Prechtl S.; Qian C.; Rataj K.; Root D. E.; Sakata S. K.; Scrace S.; Shimizu H.; Simon D.; Sommer P.; Spruiell C.; Sumia I.; Swalley S. E.; Terauchi H.; Thibaudeau A.; Unruh A.; Van de Waeter J.; Van Dyck M.; van Staden C.; Warchoł M.; Weisbart E.; Weiss A.; Wiest-Daessle N.; Williams G.; Yu S.; Zapiec B.; Żyła M.; Singh S.; Carpenter A. E.. JUMP Cell Painting Dataset: Morphological Impact of 136,000 Chemical and Genetic Perturbations. bioRxiv 2023, 10.1101/2023.03.23.534023 (accessed Jan 29, 2023). [DOI] [Google Scholar]
- Fay M. M.; Kraus O.; Victors M.; Arumugam L.; Vuggumudi K.; Urbanik J.; Hansen K.; Celik S.; Cernek N.; Jagannathan G.; Christensen J.; Earnshaw B. A.; Haque I. S.; Mabey B.. RxRx3: Phenomics Map of Biology. bioRxiv 2023, 10.1101/2023.02.07.527350 (accessed Jan 29, 2023). [DOI] [Google Scholar]
Data Availability Statement
The models based on chemical structures and physicochemical properties can be used directly online at https://broad.io/DICTrank_Predictor. Code and data for all models are available on GitHub (https://github.com/srijitseal/DICTrank) for local implementation, with further details at https://broad.io/DICTrank_Predictor. All data sets are also available on Figshare (DOI: 10.6084/m9.figshare.24312274.v1).