Author manuscript; available in PMC: 2020 Dec 28.
Published in final edited form as: Pharm Res. 2018 Jun 29;35(9):170. doi: 10.1007/s11095-018-2439-9

Naïve Bayesian Models for Vero Cell Cytotoxicity

Alexander L Perryman 1, Jimmy S Patel 1, Riccardo Russo 2, Eric Singleton 2, Nancy Connell 2, Sean Ekins 3, Joel S Freundlich 1,2
PMCID: PMC7768703  NIHMSID: NIHMS1655592  PMID: 29959603

Abstract

Purpose

To advance translational research of potential therapeutic small molecules against infectious microbes, the compounds must display a relative lack of mammalian cell cytotoxicity. Vero cell cytotoxicity (CC50) is a common initial assay for this metric. We explored the development of naïve Bayesian models that can enhance the probability of identifying non-cytotoxic compounds.

Methods

Vero cell cytotoxicity assays were identified in PubChem, reformatted, and curated to create a training set with 8741 unique small molecules. These data were used to develop Bayesian classifiers, which were assessed with internal cross-validation, external tests with a set of 193 compounds from our laboratory, and independent validation with an additional diverse set of 1609 unique compounds from PubChem.

Results

Evaluation with independent, external test and validation sets indicated that cytotoxicity Bayesian models constructed with the ECFP_6 descriptor were more accurate than those that used FCFP_6 fingerprints. The best cytotoxicity Bayesian model displayed predictive power in external evaluations, according to conventional and chance-corrected statistics, as well as enrichment factors.

Conclusions

The results from external tests demonstrate that our novel cytotoxicity Bayesian model displays sufficient predictive power to help guide translational research. To assist the chemical tool and drug discovery communities, our curated training set is being distributed as part of the Supplementary Material.

Keywords: Bayesian model, machine learning, predicting mammalian cytotoxicity, translational research, Vero cell CC50

INTRODUCTION

Drug discovery is a long and costly process, commencing with hit discovery and translating through optimization efforts, with the long-term goal of entering preclinical and potentially clinical stages. Toxicity issues frequently arise and are a major cause of program failure, which has been recognized for decades (1–3). Researchers have developed many different types of computational models to predict specific toxicities, such as cardiotoxicity (4,5), hepatotoxicity (6–9), renal toxicity (10), and mitochondrial toxicity (11). Interestingly, one study (12) indicated that cell lines derived from three different organs may each be used to predict a compound’s general toxicity overall (but not the organ-specific toxicity, which is largely affected by compound accumulation in vivo). Model cell lines, such as Vero (African green monkey kidney), HepG2 (human hepatocarcinoma), THP-1 (derived from a human monocytic leukemia patient), Huh7 (human hepatocarcinoma), BHK-21 (baby Syrian Golden hamster kidney), HEK 293T (human embryonic kidney), H9c2 (embryonic myocardium), CHO (Chinese hamster ovary), and NRK-52E (kidney proximal tubule), have been utilized to assess general cytotoxicity to mammalian cells or cell death (13–17). Over the years, and in particular the last decade, several published studies have discussed the collation of data from different model cell lines and the subsequent use of an array of machine learning or statistical approaches to predict cytotoxicity (18–21). The majority of these efforts solely implemented internal testing, such as leaving out a percentage of the training set molecules and utilizing this subset as a test set; most were performed without the evaluation of truly independent external test and validation sets.
In addition, most previous studies did not distribute their curated training sets, which hampers model comparison and improvement and limits the sets’ utility for the chemical tool and drug discovery communities (especially in academia). Therefore, improving the ability to construct in vitro cytotoxicity models and apply them successfully to chemical tools or drug discovery series that are structurally different from the training set would be of significant value (22,23).

Our efforts in this area began with naïve Bayesian classifier (hereafter referred to as Bayesian) and other machine learning algorithms to model Vero cell cytotoxicity and growth inhibition of Mycobacterium tuberculosis (24). These so-called dual-event models were instrumental in the rediscovery of a triazine antitubercular with promising whole-cell efficacy (MIC = 0.15 μM; minimum concentration of compound to inhibit the growth of the bacterium by 90%) and lack of significant Vero cell cytotoxicity (CC50 = 9.6 μM; minimum concentration of compound to inhibit the growth of the mammalian cells by 50%). Recently, we have shown how we can extend these computational models of in vitro data to help predict in vivo activity in the mouse model of the disease (25). It is, however, important to use public data sets on Vero cell cytotoxicity to build standalone models to predict in vitro cytotoxicity. These models can then be used broadly across many chemical biology and drug discovery programs, spanning, for example, members of the ESKAPE family of bacteria (Enterococcus faecium, Staphylococcus aureus, Klebsiella pneumoniae, Acinetobacter baumannii, Pseudomonas aeruginosa, Enterobacter cloacae). Herein, we present our efforts to construct, test, and validate a Bayesian model for Vero cell cytotoxicity. Unlike most previous studies, we also make the training set for this model publicly available, as an sdf file in the Supplementary Material.

MATERIALS AND METHODS

Cytotoxicity Models: Curation of Training, Test, and Validation Sets

Seven different large sets of Vero cell cytotoxicity results from the Southern Research Institute (SRI) and SRI’s Specialized Biocontainment Screening Center (SBSC) were identified on PubChem and combined. These assay results were from evaluations of the cytotoxicity of initial hits from high-throughput screening projects in multiple therapeutic areas: West Nile Virus (AID 1650 and 2402), Dengue-2 Virus (AID 588,742), Mycobacterium tuberculosis (AID 435,019 and 492,998), Plasmodium falciparum (AID 602,317), and Human Immunodeficiency Virus (AID 1,117,358). Since we required a model to guide hit-to-lead optimization, and because 881 compounds had CC50 > 40 μM, we selected a cutoff of active = toxic = having a CC50 ≤ 39.4 μM (i.e., inactive, or non-cytotoxic, compounds were defined as displaying a CC50 ≥ 39.5 μM). This was the largest cutoff that we could use without discarding those 881 compounds due to their ambiguous CC50 values, and it was consistent with our typical goal criteria for MIC and SI = CC50/MIC. Only compounds with unambiguous CC50 values were used to define toxic members of the training set. For example, if a compound was listed as having a CC50 > 20 μM, it had to be discarded due to ambiguity when using a 39.4 μM cut-off. When the CC50 value was not explicitly listed on PubChem, the PubChem Activity Score could often be used to make the classification: e.g., a PubChem Activity Score of 0 = inactive = non-cytotoxic was assigned by SRI for compounds that never reduced Vero cell viability below 70% or 80% (depending on the comment section for that AID) at any concentration of the test compound that was evaluated. This was a valid metric for classifying a compound as non-cytotoxic, since these seven sets of assay results utilized a range of concentrations for each compound that exceeded 40 μM. For additional details regarding the collation and curation of the training and test sets, please see the Supplementary Material.

After combining the data from all of the curated sets of assay results, a custom script that we constructed was applied in Pipeline Pilot 9.5 (BIOVIA, Inc.) to remove duplicate compounds. This protocol used the following Pipeline Pilot components and connections: DS Ligand Reader ➔ Keep Largest Fragment ➔ Remove Duplicate Molecules ➔ SD Writer. The “Keep Largest Fragment” component strips out buffers and salts when they are present. The “Remove Duplicate Molecules” component uses canonical SMILES to compare each compound to every other compound in the set and removes duplicate instances of any compound. The “SD Writer” then generates an sdf output, which we used as the input for the Bayesian modeling protocol in Pipeline Pilot.
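The duplicate-removal protocol above can be approximated outside of Pipeline Pilot. The sketch below (pure Python; the data are hypothetical) assumes the input SMILES strings are already canonicalized — in practice a cheminformatics toolkit such as RDKit would generate canonical SMILES — and approximates “Keep Largest Fragment” by taking the longest dot-separated SMILES component (fragments in a SMILES string are separated by “.”):

```python
def keep_largest_fragment(smiles: str) -> str:
    """Strip salts/buffers: fragments in a SMILES string are separated
    by '.'. Longest fragment string is a crude proxy for size; a real
    workflow would compare heavy-atom counts via a toolkit like RDKit."""
    return max(smiles.split("."), key=len)

def remove_duplicates(records):
    """records: list of (smiles, label) tuples, SMILES assumed to be
    pre-canonicalized. Keeps the first occurrence of each parent
    structure, mirroring the 'Remove Duplicate Molecules' component."""
    seen, unique = set(), []
    for smiles, label in records:
        parent = keep_largest_fragment(smiles)
        if parent not in seen:
            seen.add(parent)
            unique.append((parent, label))
    return unique

# Hypothetical example: an HCl salt and its free base collapse to one entry
data = [("c1ccncc1.Cl", 1), ("c1ccncc1", 1), ("CCO", 0)]
print(remove_duplicates(data))  # [('c1ccncc1', 1), ('CCO', 0)]
```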

The curated training sets that we constructed were as follows: (A) the full (i.e., conventional or non-pruned) 40 μM training set had 1391 toxic (“active”) compounds out of 8741 unique compounds, using a CC50 cut-off of ≤39.4 μM to classify compounds as toxic = active = 1; thus, 15.91% of the compounds in this training set were defined as toxic. The data pruning strategy that we developed and validated for Bayesian models that predict mouse liver microsomal stability (26) was also utilized to generate an alternate training set. (B) The pruned 40 μM training set had 923 toxic (“active”) compounds out of 8273 unique compounds, using a CC50 cut-off of ≤20.4 μM to classify compounds as toxic = active = 1; thus, 11.16% of the compounds in the pruned training set were classified as toxic. For this pruned training set, compounds with a CC50 = 20.5–39.49 μM were deleted, but it contained the exact same non-toxic compounds as the full training set. Unlike our mouse liver microsomal stability Bayesian, for which some of the unstable (inactive) compounds were pruned, for the pruned cytotoxicity model we removed some of the actives (i.e., the moderately toxic compounds were deleted, while the most toxic compounds were retained). This provided an alternate test of our pruning strategy. In addition, we could not reliably prune the moderately non-cytotoxic compounds, due to the aforementioned issue of 881 compounds having ambiguous CC50 values of >40 μM.
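The classification and pruning rules described above can be stated compactly. A minimal sketch (the function names are ours, not part of the published workflow):

```python
def classify(cc50, ambiguous_gt=False):
    """Return 1 (toxic), 0 (non-toxic), or None (discard).
    cc50 is in μM; ambiguous_gt=True means the assay reported only
    'CC50 > cc50' rather than an exact value."""
    if ambiguous_gt:
        # 'CC50 > 40' still proves non-toxicity at the 39.4 μM cutoff;
        # 'CC50 > 20' does not, so that compound must be discarded.
        return 0 if cc50 >= 39.5 else None
    return 1 if cc50 <= 39.4 else 0

def prune(entry):
    """Pruned training set: drop moderately toxic compounds
    (CC50 = 20.5-39.49 μM); keep all non-toxic compounds and the
    most toxic compounds. entry is a (cc50, label) tuple."""
    cc50, label = entry
    return not (label == 1 and 20.5 <= cc50 <= 39.49)

assert classify(39.4) == 1 and classify(39.5) == 0
assert classify(40, ambiguous_gt=True) == 0      # kept as non-toxic
assert classify(20, ambiguous_gt=True) is None   # ambiguous: discard
```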

Independent, external test sets were also constructed to enable evaluation of our different cytotoxicity Bayesian models. One external test set, the JSF set, contained 193 compounds from Freundlich lab projects in different therapeutic areas for which Vero cell cytotoxicity values were available. With the 40 μM cutoff, 82 of the 193 compounds (42.49%) in the external JSF set were cytotoxic. An additional external set was constructed by combining two different sets of SRI results from PubChem BioAssay: one from a M. tuberculosis project (AID 434958) and one from a project on VEEV (Venezuelan Equine Encephalitis Virus; AID 588719). To generate an additional external test set that was larger and more structurally diverse than the JSF set, all of the M. tuberculosis results were combined with only the unambiguously toxic compounds from the VEEV set, to generate the Mtb + VEEV set, of which 587 out of 1609 compounds (36.48%) were classified as toxic (using the aforementioned metric). The other compounds in the VEEV set, which were discarded, all had ambiguous CC50 values of >20 μM, which were unsuitable for our purposes. The custom Pipeline Pilot script mentioned previously was used to remove duplicate compounds from within each of these independent, external test sets. In addition, each of these external sets was then concatenated with the full training set, and the custom Pipeline Pilot script was again used to remove any compounds from the external test sets that duplicated training set members (according to canonical SMILES).

Construction of Bayesian Models to Predict Vero Cell Cytotoxicity

We utilized the same Bayesian modeling protocol established in our previous studies (24,26). Nine different descriptors were used in Pipeline Pilot 9.5 (BIOVIA, Inc., San Diego, CA) to characterize the training set, build the model, and evaluate the external test and validation sets. Eight of these descriptors are associated with the physiochemical properties of each compound as a whole entity: ALogP, molecular weight, number of rotatable bonds, number of aromatic rings, total number of all types of rings, number of hydrogen bond donors, number of hydrogen bond acceptors, and the molecular fractional polar surface area. The ninth descriptor classifies the 2D topology of different substructures within each compound. Two different types of these substructural descriptors were tested: FCFP_6 (functional class fingerprints of maximum diameter 6, which uses bits for different functional groups) and ECFP_6 (extended class fingerprints of maximum diameter 6, which uses bits that are atom-type specific). These 2D descriptors characterize which atoms are connected to each other; the 6 at the end of ECFP_6 indicates that one to six different levels of connections are assessed. Thus, one set of models was constructed using FCFP_6 and the eight standard physiochemical property descriptors, while another set of models was constructed using ECFP_6 and the same set of eight other descriptors. The models were constructed in Pipeline Pilot 9.5, which we also utilized to evaluate the independent external test and validation sets. Since cytotoxic compounds were defined as “active” = 1, compounds that have a higher Bayesian score have a higher probability of being cytotoxic (i.e., a higher probability of crossing the CC50 cut-off threshold, but not necessarily a numerically lower CC50).
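Pipeline Pilot’s naïve Bayesian categorization over sparse fingerprint features is commonly described as a Laplacian-corrected estimator. The sketch below is an illustrative stand-in for that scoring scheme, not the BIOVIA implementation; the feature names are hypothetical placeholders for fingerprint bits:

```python
import math
from collections import Counter

def train_laplacian_bayes(samples):
    """samples: list of (features, label) pairs, where features is an
    iterable of fingerprint bits (e.g. ECFP_6 identifiers) and
    label 1 = toxic ('active'). Returns per-feature log weights using
    a Laplacian-corrected estimator:
        P_corr(active | F) = (A_F + P*K) / (T_F + K),  K = 1,
    where A_F / T_F are active/total counts of compounds containing F
    and P is the overall active fraction, so a feature with no training
    data contributes ~0 to the score."""
    n_active = sum(lbl for _, lbl in samples)
    p_active = n_active / len(samples)
    tot, act = Counter(), Counter()
    for feats, lbl in samples:
        for f in set(feats):
            tot[f] += 1
            act[f] += lbl
    K = 1.0
    weights = {f: math.log((act[f] + p_active * K) / ((tot[f] + K) * p_active))
               for f in tot}
    return weights, p_active

def score(weights, feats):
    """Higher score => higher predicted probability of cytotoxicity."""
    return sum(weights.get(f, 0.0) for f in set(feats))

# Toy training data: the bit 'nitrofuran' co-occurs with toxicity
train = [({"nitrofuran", "ring"}, 1), ({"nitrofuran"}, 1),
         ({"triazole", "ring"}, 0), ({"triazole"}, 0)]
w, p = train_laplacian_bayes(train)
assert score(w, {"nitrofuran"}) > score(w, {"triazole"})
```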

RESULTS

Efforts commenced with the location and curation of a model training set of Vero cell cytotoxicity data on PubChem BioAssay (https://pubchem.ncbi.nlm.nih.gov/) (27), all generated by one institution, the Southern Research Institute, in the hope that a more standardized set of protocols and cell lines might generate less noisy data. A cytotoxicity cutoff of 39.4 μM (such that “active” or “good” in the model connotes cytotoxic, and molecules with a CC50 ≥ 39.5 μM were judged to be non-toxic or “inactive”) was chosen because 881 compounds in the training set had ambiguous CC50 values of >40 μM, and we chose not to discard this many compounds. Secondarily, we rationalized that the CC50 cutoff was reasonable given that an SI criterion of ≥10 would necessitate MIC values ≤4 μM. In our experience, MIC values in this range are typically acceptable for antibacterial hit compounds.

Drawing on our past development and implementation of a training set pruning strategy with mouse liver microsomal stability models (26), we explored pruning of the CC50 training set such that compounds with a CC50 = 20.5–39.49 μM were deleted, to remove the potential disinformation carried by modestly cytotoxic molecules. The pruned training set contained the exact same non-toxic compounds as the full training set (Fig. 1). Different descriptors (ECFP-6 and FCFP-6) and full (or unfiltered) versus pruned training sets were evaluated with internal five-fold cross-validation, followed by assessments with independent, external test and validation sets (Figs. 2 and 3, Tables I and II).

Fig. 1. Workflow used to construct and evaluate the new 40 μM cytotoxicity Bayesian models. SRI’s SBSC refers to the Southern Research Institute’s Specialized Biocontainment Screening Center.

Fig. 2. Internal and external ROC curves for the four new 40 μM cytotoxicity Bayesian models. In (a), the internal ROC curves from five-fold cross-validation of the ECFP6- and FCFP6-based models are shown in green and orange, respectively. The internal ROC curves for the two pruned models are displayed in (b), with the pruned ECFP6 model in green and the pruned FCFP6 model in orange. The external ROC curves from evaluating these models on the set of JSF compounds are shown in (c), while (d) depicts the external ROC curves from evaluating these models with the independent, external SRI Mtb + VEEV compound set. In both (c) and (d), the full ECFP6-based model is in green, the full FCFP6-based model is in orange, the pruned ECFP6-based model is in cyan, and the pruned FCFP6-based model is in tan. For all ROC plots, the grey diagonal depicts the ROC curve that a random model would produce.

Fig. 3. Histograms displaying the distribution of Bayesian scores when evaluating the 40 μM cytotoxicity models with the external test set of JSF compounds. The percentage of non-toxic compounds that received a particular Bayesian score is shown in green, while the percentage of toxic compounds with a particular score is rendered in red. Since these models were constructed by defining toxic as the “active” classification and non-toxic as inactive, toxic compounds should have high scores and non-toxic compounds should be ranked with lower scores. In (a), the full ECFP6-based model is depicted, while (b) displays the distribution produced by the full FCFP6-based model.

Table I.

Internal Statistics from Five-Fold Cross-Validation and External Statistics from Independent Test and Validation Sets

Model (all 40 μM cytotoxicity Bayesians):    ECFP6    FCFP6    Pruned(a) ECFP6    Pruned(a) FCFP6

Internal statistics:
 ROC score(b)                                0.806    0.796    0.841              0.829
 Sensitivity %(c)                            90.9     87.7     92.2               93.1
 Specificity %(d)                            83.7     80.9     86.5               79.1
 Concordance %(e)                            84.9     82.0     87.2               80.6
External test statistics on JSF set(f):
 ROC score                                   0.767    0.729    0.794              0.744
 Sensitivity %                               86.6     89.0     85.4               89.0
 Specificity %                               58.6     31.5     55.9               38.7
 Concordance %                               70.5     56.0     68.4               60.1
External validation statistics on SRI’s Mtb + VEEV set(g):
 ROC score                                   0.746    0.727    0.715              0.698
 Sensitivity %                               60.0     66.3     62.2               61.2
 Specificity %                               76.9     64.5     66.6               65.2
 Concordance %                               70.7     65.1     65.0               63.7

(a) “Pruned” indicates that moderately toxic compounds, which had a CC50 of 20.5–39.49 μM, were deleted from the training set before constructing the 40 μM cytotoxicity Bayesian model. Compounds were only pruned from the training set and not from any test or validation sets.
(b) ROC score refers to the area under the curve of the ROC curve. A value of 1.0 represents a “perfect” model.
(c) Sensitivity indicates the total % of toxic compounds (true positives) identified, using a Vero cell CC50 ≤ 39.4 μM to define toxicity.
(d) Specificity indicates the total % of non-toxic compounds (true negatives) correctly identified.
(e) Concordance is a measure of the overall accuracy. It reflects the sum of the number of true positives and true negatives correctly identified, divided by the total number of predictions made (i.e., by the total number of compounds in that set).
(f) The JSF set is a collection of 193 compounds from Freundlich lab projects across multiple therapeutic areas (mostly M. tuberculosis and the ESKAPE bacteria), of which 82 compounds (42.49%) were toxic and 111 (57.51%) were non-toxic, according to the Vero cell CC50 ≤ 39.4 μM = toxic metric.
(g) SRI’s Mtb + VEEV set is a collation of two PubChem sets of cytotoxicity data from the Southern Research Institute. All of the compounds from an SRI project against M. tuberculosis (Mtb) were included in this set, but only the unambiguously toxic compounds from their VEEV set were included (the VEEV project compounds with an ambiguous CC50 > 20 μM were discarded). This external validation set has 1609 compounds, of which 587 (36.48%) are toxic, according to the Vero cell CC50 ≤ 39.4 μM = toxic metric.
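The Table I metrics reduce to simple confusion-matrix arithmetic. A minimal sketch (pure Python; function names are ours), including the rank-statistic identity for the ROC area:

```python
def classification_stats(y_true, y_pred):
    """y_true/y_pred: lists of 0/1 labels, 1 = toxic. Returns the
    Table I metrics: sensitivity = TP/(TP+FN), specificity = TN/(TN+FP),
    concordance = (TP+TN)/total. Assumes both classes are present."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return {"sensitivity": tp / (tp + fn),
            "specificity": tn / (tn + fp),
            "concordance": (tp + tn) / len(y_true)}

def roc_auc(pos_scores, neg_scores):
    """Area under the ROC curve via the rank (Mann-Whitney) identity:
    the probability that a randomly chosen toxic compound outscores a
    randomly chosen non-toxic one, counting ties as half."""
    wins = sum((p > n) + 0.5 * (p == n)
               for p in pos_scores for n in neg_scores)
    return wins / (len(pos_scores) * len(neg_scores))

# Tiny hypothetical example
print(classification_stats([1, 1, 0, 0], [1, 0, 0, 1]))
print(roc_auc([0.9, 0.8], [0.1, 0.85]))  # 0.75
```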

Table II.

Additional External Statistics from Independent Test and Validation Sets, Including Chance-Corrected Statistics and Enrichment Factors

Model (all 40 μM cytotoxicity Bayesians):           ECFP6              FCFP6             Pruned(a) ECFP6    Pruned(a) FCFP6

External test statistics on JSF set:
 Cohen’s Kappa(b)                                   0.43               0.19              0.39               0.25
 Matthew’s correlation coefficient(b)               0.46               0.24              0.42               0.31
 F1 score(c)                                        0.71               0.63              0.70               0.65
 NPV hit rate(d) (% of non-toxic predictions correct)  65/76 (85.53%)  35/44 (79.55%)    62/74 (83.78%)     43/52 (82.69%)
 NPV enrichment factor, overall(e)                  1.49               1.38              1.46               1.44
 NPV enrichment factor, best 10% of scores(f)       1.56               1.46              1.56               1.46
 NPV enrichment factor, best 5% of scores(f)        1.74               1.74              1.74               1.74
External validation statistics on Mtb + VEEV set:
 Cohen’s Kappa                                      0.37               0.29              0.28               0.25
 Matthew’s correlation coefficient                  0.37               0.30              0.28               0.26
 F1 score                                           0.60               0.58              0.56               0.55
 NPV hit rate (% of non-toxic predictions correct)  786/1021 (76.98%)  659/857 (76.90%)  681/903 (75.42%)   666/894 (74.50%)
 NPV enrichment factor, overall(g)                  1.21               1.21              1.19               1.17
 NPV enrichment factor, best 10% of scores(h)       1.35               1.35              1.32               1.33
 NPV enrichment factor, best 5% of scores(h)        1.44               1.44              1.36               1.30

(a) “Pruned” indicates that moderately toxic compounds, which had a CC50 of 20.5–39.49 μM, were deleted from the training set before constructing the 40 μM cytotoxicity Bayesian model. Compounds were pruned only from the training set, not from any test or validation sets.
(b) Cohen’s Kappa and Matthew’s correlation coefficient (MCC) are both chance-corrected statistics. They indicate the overall accuracy at identifying both toxic and non-toxic compounds, with correction factors that account for the expected rate of correct predictions made by random chance.
(c) The F1 score is not chance-corrected. It is the harmonic mean of precision and sensitivity.
(d) NPV hit rate refers to the negative predictive value, which equals the number of true negatives (i.e., correctly identified non-toxic compounds) divided by the total number of negative predictions (i.e., by the sum of true negatives and false negatives).
(e) The NPV enrichment factor is calculated by dividing the NPV hit rate % by the % of compounds in the external set that are non-toxic (i.e., by the random chance of picking non-toxic compounds). For the JSF set, 111/193 (57.51%) are non-toxic. A “perfect” model would thus have a maximum NPV enrichment factor of 100% / 57.51% = 1.74.
(f) The NPV enrichment factors for the best 10% and best 5% of scores refer to the enrichment factors amongst the 10% (19 compounds) or 5% (10 compounds) of the JSF set that received the lowest Bayesian scores. Since these Bayesians are trained to predict toxicity, high scores reflect a prediction of toxicity, while the lowest scores predict non-toxicity.
(g) For SRI’s Mtb + VEEV set, 1022/1609 (63.52%) compounds are non-toxic. The maximum NPV enrichment factor that a “perfect” model could achieve is thus 100% / 63.52% = 1.57.
(h) The NPV enrichment factors for the best 10% and best 5% of scores correspond to the lowest-scoring 161 and 80 compounds, respectively.
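The chance-corrected statistics and NPV enrichment factors in Table II can likewise be computed from a 2×2 confusion matrix. In the sketch below (pure Python), the tp/tn/fp/fn counts in the usage example are not stated in the paper; we reconstructed them from the reported JSF-set sensitivity, specificity, and NPV hit rate for the full ECFP6 model, and they round back to the tabulated values:

```python
import math

def chance_corrected_stats(tp, tn, fp, fn):
    """Table II metrics from a 2x2 confusion matrix (1 = toxic)."""
    n = tp + tn + fp + fn
    po = (tp + tn) / n                      # observed agreement
    pe = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    kappa = (po - pe) / (1 - pe)            # Cohen's Kappa
    mcc = (tp * tn - fp * fn) / math.sqrt(  # Matthew's corr. coefficient
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    f1 = 2 * tp / (2 * tp + fp + fn)        # harmonic mean of PPV, sens.
    npv = tn / (tn + fn)                    # negative predictive value
    return kappa, mcc, f1, npv

# Reconstructed JSF/ECFP6 confusion matrix: tp=71, tn=65, fp=46, fn=11
kappa, mcc, f1, npv = chance_corrected_stats(71, 65, 46, 11)
print(round(kappa, 2), round(mcc, 2), round(f1, 2))  # 0.43 0.46 0.71

# NPV enrichment factor: NPV hit rate over the non-toxic base rate
base_rate = 111 / 193                # 57.51% of the JSF set is non-toxic
print(round(npv / base_rate, 2))     # 1.49, the reported overall EF
```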

All four of the 40 μM Bayesian models displayed sufficient accuracy as viewed through internal five-fold cross-validation (20% of the training set was removed, the model was built with the remaining 80% of the data and used to score the 20% that was removed, and the process was repeated five times; Table I). All four models also had sensitivity, specificity, concordance, and ROC AUC values between 79 and 93% (Table I). For nearly every internal validation metric, the ECFP-6 based model was demonstrated to be similar to or significantly more accurate than the corresponding FCFP-6 based model. The same trend occurred for the internal ROC curves, in which the ECFP-6 based models were slightly better than the corresponding FCFP-6 based model (Fig. 2). For the internal ROC curves (which measure the true positive rate and the false positive rate and, thus, compare the PPV cytotoxicity hit rate, as opposed to the NPV filtering rate or non-toxicity hit rate), the pruned ECFP-6 model had the best shape and AUC out of all four models, but it was only slightly better than the full ECFP-6 model, with ROC AUC values of 84% versus 81%, respectively.
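The internal five-fold procedure described above (hold out 20%, train on the remaining 80%, score the held-out fold, repeat five times) can be sketched generically as follows; the shuffling seed and function name are our own choices:

```python
import random

def five_fold_splits(items, seed=0):
    """Yield five (train, test) partitions so that every compound is
    held out, and therefore scored, exactly once. Each test fold is
    ~20% of the data; the model is rebuilt on the other ~80%."""
    items = items[:]
    random.Random(seed).shuffle(items)          # deterministic shuffle
    folds = [items[i::5] for i in range(5)]     # five interleaved folds
    for i in range(5):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test
```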

When evaluating the four different Bayesian models with an external test set of 193 Freundlich laboratory compounds (prepared during the course of various antibacterial programs), the value of the ECFP-6 based models became more apparent (Table I). Comparing the full ECFP-6 based and the full FCFP-6 based models, the external ROC AUC values (77% versus 73%, respectively), specificity (59% versus 31%), and concordance (70% versus 56%) clearly favored the ECFP-6 based model. Similarly, when comparing the pruned ECFP-6 based model to the pruned FCFP-6 based model, the external ROC AUC values (79% versus 74%), specificity (56% versus 39%), and concordance (68% versus 60%) favored the pruned ECFP-6 model. The same overall trend was observed when comparing the chance-corrected statistics (Cohen’s Kappa and Matthew’s Correlation Coefficient), F1 scores, NPV filtering hit rates (for both the total number of correctly predicted non-cytotoxic compounds and the percentage of compounds predicted to be non-toxic that were non-toxic), and NPV enrichment factors (Table II; Supplementary Table S-I). The ECFP-6 based model was significantly more accurate than the corresponding FCFP-6 based model. Interestingly, for the NPV enrichment factors of the best 5% of compounds (i.e., the ten Freundlich laboratory compounds that had the lowest Bayesian scores from each model), all four cytotoxicity Bayesian models achieved the maximum possible enrichment factor of 1.74 (i.e., 100% correct / 57.51% of the compounds known to be non-cytotoxic = 1.74). According to the specificity (ability to correctly identify non-cytotoxic compounds), the concordance (the overall accuracy), the chance-corrected Kappa and MCC values, the overall NPV enrichment factor, and the histogram-based distribution of scores for cytotoxic and non-cytotoxic compounds (Fig. 3), the full ECFP-6 model was the most accurate of these four Bayesians when evaluated on the external JSF set.

When comparing the same types of statistics and metrics generated by evaluating these four cytotoxicity Bayesian models with the external Mtb + VEEV validation set (1609 compounds; created from PubChem data sets AID 434958 and AID 588719), the same overall trend occurred (Tables I and II). The ECFP-6 based model was more accurate than the corresponding FCFP-6 based model, and the full ECFP-6 based Bayesian was the most accurate model overall for this second external, independent set.

Pertinent to the full ECFP-6 model, the chemical structural features most frequently associated with Vero cell cytotoxicity (Supplementary Fig. S1) as well as lack of cytotoxicity (Supplementary Fig. S2) may be discerned. Substructural features that were most frequently present in toxic compounds include alkyl halide, nitrofuran, pyridine, and imine. In contrast, 1,2,4-triazole, 4-alkoxyaniline, and 2-alkoxy/aryloxyacetamide were rarely or never present within the cytotoxic molecules in the training set, but they were found in many non-cytotoxic compounds.

DISCUSSION

The ability to predict cytotoxicity found initial utility for small structure-activity relationship studies focused on developing compounds that displayed cytotoxicity for cancer applications (28,29) and was then broadened to larger cancer-related datasets (30). The NCI dataset and other sets of public or proprietary assay data have been used in multiple therapeutic areas to make several different types of cytotoxicity/cell viability models using a range of approaches, including 2D kernel models (31), decision trees with MACCS keys (32), a linear model using least squares (32), neural networks (33), Recursive-Partitioning Random Forest and naïve Bayesian models using CATS2D descriptors (34), Kernel Multitask Latent Analysis (35), weighted feature significance (WFS) and sequential minimal optimization compared to a Bayesian classifier (36), Bayesian models with FCFP_6 fingerprints (and the eight default physiochemical property descriptors in Pipeline Pilot) (37), SVM methods with 4D fingerprints (38), and Random Forest models with ECFP_4 descriptors (39). For a detailed description of these previous models and their accuracy, please refer to the Supplementary Material (pages 4–6).

These previous studies used different CC50 cut-off values, and they generally did not release their curated training and test sets, which impedes our ability to make a rigorous comparison of these previous models to the new models we present herein. However, using the internal and external statistics provided for these previously published models, our best model generally performed similarly or displayed enhanced predictive power. Importantly, we provide our curated training set and an external test set as an sdf file in the Supplementary Material, to enable other laboratories to perform further studies that utilize different modeling approaches and/or training sets and compare their models to our best cytotoxicity model in a fair and objective manner.

Considering the internal statistics and the many different external metrics (calculated from evaluating two different independent sets of compounds), the full ECFP-6 based model was the most accurate cytotoxicity Bayesian that we constructed. The pruned ECFP-6 model was generally only slightly less accurate than the full ECFP-6 model. Thus, unlike our previous study, in which the pruned mouse liver microsomal stability Bayesian (from whose training set the moderately inactive, i.e., unstable, compounds were deleted) was the most accurate model (26), pruning the moderately active (cytotoxic) compounds from the training set did not significantly improve the predictive power of the Bayesian model in the external tests. However, the pruning strategy did not significantly degrade performance at predicting cytotoxicity either, even though fewer compounds were used to train that model.

Although our best cytotoxicity model exhibited internal and external statistics sufficient to support the assertion that it will display useful predictive power, no modeling results should ever be considered absolute proof. Models are expected to function best when predicting the property of a test compound with significant similarity to the training set. With Bayesian classifier models, one can have the model output the “applicability domain” score for a test compound, which is essentially a measure of how similar the test compound is to the training set. Conversely, we have demonstrated that some of our models, including the full ECFP6-based model, are able to make useful predictions for molecules that have very different structures and physiochemical properties than the training sets (i.e., they are “generalizable” to different areas of chemical space than what trained them) (26). Based on our own experiences (40) and the results of other labs (41), classifications of compounds that receive the highest or lowest scores tend to be the most accurate, while predictions for compounds with scores near the activity threshold are the least accurate. However, Vero cell cytotoxicity assays will still need to be conducted to validate the predictions that are made for a particular compound or series of interest. This model is not intended to replace experiments. Its purpose is to increase the probability of probing more useful areas of chemical space, thereby increasing efficiency and reducing costs. By filtering libraries of available compounds or virtual libraries of potential analogs of a molecule of interest, this model should focus medicinal chemistry decisions toward a smaller number of compounds that have a higher likelihood of being non-cytotoxic.
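The filtering use case described above can be sketched in a few lines. All names here are hypothetical; the only assumption carried over from the text is that low Bayesian scores predict non-toxicity and that predictions at the score extremes tend to be the most reliable:

```python
def least_toxic_fraction(scored, fraction=0.05):
    """scored: dict mapping compound id -> Bayesian cytotoxicity score.
    Low scores predict non-toxicity, so keep only the lowest-scoring
    slice of a (virtual) library for follow-up, where the model's
    predictions are expected to be most reliable."""
    k = max(1, int(len(scored) * fraction))
    return sorted(scored, key=scored.get)[:k]

# Hypothetical library of 20 analogs with made-up scores
library = {f"analog_{i}": float(i) for i in range(20)}
print(least_toxic_fraction(library, fraction=0.10))
# ['analog_0', 'analog_1']
```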
Many prospective predictions, covering numerous types of scaffolds, will need to be conducted and experimentally assessed over time to fully characterize how generalizable the model is to particular regions of chemical space. By releasing our curated training set and test set to the community, we are facilitating those types of follow-up studies as well as providing datasets that other labs may use when testing their own machine learning algorithms or protocols.

CONCLUSIONS

Based on the conventional and chance-corrected statistics, PPV hit rates, NPV filtering rates, enrichment factors, and histograms produced when evaluating our new cytotoxicity Bayesian models with independent external test and validation sets, cytotoxicity Bayesian models constructed with the ECFP_6 descriptor (and the eight other standard descriptors for physiochemical properties) were significantly more accurate than similar models that were constructed using the FCFP_6 descriptor. Our best cytotoxicity Bayesian model displayed sufficient predictive power when evaluating it with independent external test and validation sets. It could provide a useful approach to filter commercially available libraries or virtual libraries of potential analogs in a hit-to-lead optimization to identify hit derivatives with a higher probability of not being cytotoxic. Studies in multiple therapeutic areas are currently underway that utilize our best cytotoxicity model to make these types of prospective predictions. To assist other researchers in the chemical tool and drug discovery communities, we are releasing our curated training set and the Mtb + VEEV test set as sdf files in the Supplementary Material.

Supplementary Material

ACKNOWLEDGMENTS AND DISCLOSURES

J.S.F., S.E., and N.C. were supported by Award Number U19AI109713 (NIH/NIAID) for the "Center to develop therapeutic countermeasures to high-threat bacterial agents," from the National Institutes of Health: Centers of Excellence for Translational Research (CETR). We thank Tim O'Driscoll at BIOVIA for providing J.S.F. with Discovery Studio and Pipeline Pilot. We also thank Jodi Shaulsky at BIOVIA and Katalin Nadassy (formerly at Accelrys) for assistance with setting up and maintaining the license server and Pipeline Pilot server.

ABBREVIATIONS

ADME/Tox

Absorption, distribution, metabolism, excretion and toxicity

AID

Assay Identification number on PubChem BioAssay

ECFP_6

Extended-connectivity fingerprints of maximum diameter 6

FCFP_6

Functional-class fingerprints of maximum diameter 6

NPV

Negative predictive value (filtering rate)

PPV

Positive predictive value (hit rate)

QSAR

Quantitative Structure-Activity Relationships

ROC

Receiver operating characteristic

SAR

Structure-Activity Relationship

SMILES

Simplified molecular-input line-entry system

Vero CC50

Vero cell (African green monkey kidney cell) 50% cytotoxicity value

Footnotes

Conflicts of Interest S.E. is the Founder and CEO of Collaborations Pharmaceuticals Inc.

Electronic supplementary material The online version of this article (https://doi.org/10.1007/s11095-018-2439-9) contains supplementary material, which is available to authorized users.

REFERENCES

  • 1.Kola I, Landis J. Can the pharmaceutical industry reduce attrition rates? Nat Rev Drug Discov. 2004;3(8):711–5. [DOI] [PubMed] [Google Scholar]
  • 2.Schoonen WG, Westerink WM, Horbach GJ. High-throughput screening for analysis of in vitro toxicity. EXS. 2009;99:401–52. [DOI] [PubMed] [Google Scholar]
  • 3.Segall MD, Barber C. Addressing toxicity risk when designing and selecting compounds in early drug discovery. Drug Discov Today. 2014;19(5):688–93. [DOI] [PubMed] [Google Scholar]
  • 4.Chekmarev DS, Kholodovych V, Balakin KV, Ivanenkov Y, Ekins S, Welsh WJ. Shape signatures: new descriptors for predicting cardiotoxicity in silico. Chem Res Toxicol. 2008;21(6):1304–14. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 5.Polak S, Wisniowska B, Fijorek K, Glinka A, Polak M, Mendyk A. The open-access dataset for in silico cardiotoxicity prediction system. Bioinformation. 2011;6(6):244–5. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 6.Ekins S, Williams AJ, Xu JJ. A predictive ligand-based Bayesian model for human drug-induced liver injury. Drug Metab Dispos. 2010;38(12):2302–8. [DOI] [PubMed] [Google Scholar]
  • 7.Greene N, Fisk L, Naven RT, Note RR, Patel ML, Pelletier DJ. Developing structure-activity relationships for the prediction of hepatotoxicity. Chem Res Toxicol. 2010;23(7):1215–22. [DOI] [PubMed] [Google Scholar]
  • 8.Rodgers AD, Zhu H, Fourches D, Rusyn I, Tropsha A. Modeling liver-related adverse effects of drugs using k-nearest neighbor quantitative structure-activity relationship method. Chem Res Toxicol. 2010;23(4):724–32. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 9.Liew CY, Lim YC, Yap CW. Mixed learning algorithms and features ensemble in hepatotoxicity prediction. J Comput Aided Mol Des. 2011;25(9):855–71. [DOI] [PubMed] [Google Scholar]
  • 10.Ekins S. Progress in computational toxicology. J Pharmacol Toxicol Methods. 2014;69(2):115–40. [DOI] [PubMed] [Google Scholar]
  • 11.Zhang H, Chen QY, Xiang ML, Ma CY, Huang Q, Yang SY. In silico prediction of mitochondrial toxicity by using GA-CG-SVM approach. Toxicol in Vitro. 2009;23(1):134–40. [DOI] [PubMed] [Google Scholar]
  • 12.Lin Z, Will Y. Evaluation of drugs with specific organ toxicities in organ-specific cell lines. Toxicol Sci. 2012;126(1):114–27. [DOI] [PubMed] [Google Scholar]
  • 13.Lakshminarayana SB, Huat TB, Ho PC, Manjunatha UH, Dartois V, Dick T, et al. Comprehensive physicochemical, pharmacokinetic and activity profiling of anti-TB agents. J Antimicrob Chemother. 2015;70(3):857–67. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 14.Riss TL, Moravec RA. Use of multiple assay endpoints to investigate the effects of incubation time, dose of toxin, and plating density in cell-based cytotoxicity assays. Assay Drug Dev Technol. 2004;2(1):51–62. [DOI] [PubMed] [Google Scholar]
  • 15.Manjunatha UH, Smith PW. Perspective: challenges and opportunities in TB drug discovery from phenotypic screening. Bioorg Med Chem. 2015;23(16):5087–97. [DOI] [PubMed] [Google Scholar]
  • 16.Franzblau SG, DeGroote MA, Cho SH, Andries K, Nuermberger E, Orme IM, et al. Comprehensive analysis of methods used for the evaluation of compounds against Mycobacterium tuberculosis. Tuberculosis (Edinb). 2012;92(6):453–88. [DOI] [PubMed] [Google Scholar]
  • 17.Kim H, Yoon SC, Lee TY, Jeong D. Discriminative cytotoxicity assessment based on various cellular damages. Toxicol Lett. 2009;184(1):13–7. [DOI] [PubMed] [Google Scholar]
  • 18.Schrey AK, Nickel-Seeber J, Drwal MN, Zwicker P, Schultze N, Haertel B, et al. Computational prediction of immune cell cytotoxicity. Food Chem Toxicol. 2017;107(Pt A):150–66. [DOI] [PubMed] [Google Scholar]
  • 19.Moon H, Cong M. Predictive models of cytotoxicity as mediated by exposure to chemicals or drugs. SAR QSAR Environ Res. 2016;27(6):455–68. [DOI] [PubMed] [Google Scholar]
  • 20.Adhikari N, Halder AK, Saha A, Das Saha K, Jha T. Structural findings of phenylindoles as cytotoxic antimitotic agents in human breast cancer cell lines through multiple validated QSAR studies. Toxicol in Vitro. 2015;29(7):1392–404. [DOI] [PubMed] [Google Scholar]
  • 21.Ekins S, Freundlich JS, Hobrath JV, Lucile White E, Reynolds RC. Combining computational methods for hit to lead optimization in Mycobacterium tuberculosis drug discovery. Pharm Res. 2014;31(2):414–35. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 22.Stouch TR, Kenyon JR, Johnson SR, Chen XQ, Doweyko A, Li Y. In silico ADME/Tox: why models fail. J Comput Aided Mol Des. 2003;17(2–4):83–92. [DOI] [PubMed] [Google Scholar]
  • 23.Johnson SR. The trouble with QSAR (or how I learned to stop worrying and embrace fallacy). J Chem Inf Model. 2008;48(1):25–6. [DOI] [PubMed] [Google Scholar]
  • 24.Ekins S, Reynolds RC, Kim H, Koo M-S, Ekonomidis M, Talaue M, et al. Bayesian models leveraging bioactivity and cytotoxicity information for drug discovery. Chem Biol. 2013;20:370–8. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 25.Ekins S, Perryman AL, Clark AM, Reynolds RC, Freundlich JS. Machine learning model analysis and data visualization with small molecules tested in a mouse model of Mycobacterium tuberculosis infection (2014–2015). J Chem Inf Model. 2016;56(7):1332–43. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 26.Perryman AL, Stratton TP, Ekins S, Freundlich JS. Predicting mouse liver microsomal stability with “pruned” machine learning models and public data. Pharm Res. 2016;33(2):433–49. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 27.Wang Y, Xiao J, Suzek TO, Zhang J, Wang J, Zhou Z, et al. PubChem’s BioAssay database. Nucleic Acids Res. 2012;40(Database issue):D400–12. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 28.Smith CJ, Hansch C, Morton MJ. QSAR treatment of multiple toxicities: the mutagenicity and cytotoxicity of quinolines. Mutat Res. 1997;379(2):167–75. [DOI] [PubMed] [Google Scholar]
  • 29.Skibo EB, Xing C, Dorr RT. Aziridinyl quinone antitumor agents based on indoles and cyclopent[b]indoles: structure-activity relationships for cytotoxicity and antitumor activity. J Med Chem. 2001;44(22):3545–62. [DOI] [PubMed] [Google Scholar]
  • 30.Weinstein JN, Myers TG, O’Connor PM, Friend SH, Fornace AJ Jr, Kohn KW, et al. An information-intensive approach to the molecular pharmacology of cancer. Science. 1997;275(5298):343–9. [DOI] [PubMed] [Google Scholar]
  • 31.Swamidass SJ, Chen J, Bruand J, Phung P, Ralaivola L, Baldi P. Kernels for small molecules and the prediction of mutagenicity, toxicity and anti-cancer activity. Bioinformatics. 2005;21(Suppl 1):i359–68. [DOI] [PubMed] [Google Scholar]
  • 32.Lee AC, Shedden K, Rosania GR, Crippen GM. Data mining the NCI60 to predict generalized cytotoxicity. J Chem Inf Model. 2008;48(7):1379–88. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 33.Molnar L, Keseru GM, Papp A, Lorincz Z, Ambrus G, Darvas F. A neural network based classification scheme for cytotoxicity predictions: validation on 30,000 compounds. Bioorg Med Chem Lett. 2006;16(4):1037–9. [DOI] [PubMed] [Google Scholar]
  • 34.Guha R, Schurer SC. Utilizing high throughput screening data for predictive toxicology models: protocols and application to MLSCN assays. J Comput Aided Mol Des. 2008;22(6–7):367–84. [DOI] [PubMed] [Google Scholar]
  • 35.Boik JC, Newman RA. Structure-activity models of oral clearance, cytotoxicity, and LD50: a screen for promising anticancer compounds. BMC Pharmacol. 2008;8:12. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 36.Huang R, Southall N, Xia M, Cho MH, Jadhav A, Nguyen DT, et al. Weighted feature significance: a simple, interpretable model of compound toxicity based on the statistical enrichment of structural features. Toxicol Sci. 2009;112(2):385–93. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 37.Langdon SR, Mulgrew J, Paolini GV, van Hoorn WP. Predicting cytotoxicity from heterogeneous data sources with Bayesian learning. J Cheminform. 2010;2(1):11. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 38.Chang CY, Hsu MT, Esposito EX, Tseng YJ. Oversampling to overcome overfitting: exploring the relationship between data set composition, molecular descriptors, and predictive modeling methods. J Chem Inf Model. 2013;53(4):958–71. [DOI] [PubMed] [Google Scholar]
  • 39.Mervin LH, Cao Q, Barrett IP, Firth MA, Murray D, McWilliams L, et al. Understanding cytotoxicity and cytostaticity in a high-throughput screening collection. ACS Chem Biol. 2016;11(11):3007–23. [DOI] [PubMed] [Google Scholar]
  • 40.Stratton TP, Perryman AL, Vilcheze C, Russo R, Li SG, Patel JS, et al. Addressing the metabolic stability of antituberculars through machine learning. ACS Med Chem Lett. 2017;8(10):1099–104. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 41.Hu Y, Unwalla R, Denny RA, Bikker J, Di L, Humblet C. Development of QSAR models for microsomal stability: identification of good and bad structural features for rat, human and mouse microsomal stability. J Comput Aided Mol Des. 2010;24(1):23–35. [DOI] [PubMed] [Google Scholar]
