2023 Jun 16;36(7):1028–1036. doi: 10.1021/acs.chemrestox.2c00404

Leveraging Cell Painting Images to Expand the Applicability Domain and Actively Improve Deep Learning Quantitative Structure–Activity Relationship Models

Dorota Herman, Maciej M Kańduła, Lorena G A Freitas, Caressa van Dongen, Thanh Le Van, Natalie Mesens, Steffen Jaensch, Emmanuel Gustin, Liesbeth Micholt, Charles-Hugues Lardeau, Christos Varsakelis, Joke Reumers, Sannah Zoffmann, Yvonne Will, Pieter J Peeters, Hugo Ceulemans
PMCID: PMC10354798  PMID: 37327474

Abstract

The search for chemical hit material is a lengthy and increasingly expensive drug discovery process. To improve it, ligand-based quantitative structure–activity relationship models have been broadly applied to optimize primary and secondary compound properties. Although these models can be deployed as early as the stage of molecule design, they have a limited applicability domain—if the structures of interest differ substantially from the chemical space on which the model was trained, a reliable prediction will not be possible. Image-informed ligand-based models partly solve this shortcoming by focusing on the phenotype of a cell caused by small molecules, rather than on their structure. While this enables chemical diversity expansion, it limits the application to compounds physically available and imaged. Here, we employ an active learning approach to capitalize on both of these methods’ strengths and boost the model performance of a mitochondrial toxicity assay (Glu/Gal). Specifically, we used a phenotypic Cell Painting screen to build a chemistry-independent model and adopted the results as the main factor in selecting compounds for experimental testing. With the additional Glu/Gal annotation for selected compounds we were able to dramatically improve the chemistry-informed ligand-based model with respect to the increased recognition of compounds from a 10% broader chemical space.

Introduction

High-throughput screening has been deployed for decades as an efficient paradigm to identify primary chemical hit material from large libraries of small molecules in a primary assay, i.e., an assay for a desired biological activity. Upon confirmation in repeat experiments, hits from a high-throughput screen are typically organized in chemical series, i.e., groups of structurally similar compounds. Subsequently, series are explored and prioritized, and the prioritized series are optimized based on not only their primary activity but also their physicochemical, pharmacokinetic, and toxicological properties, the so-called secondary activities. At this stage, experimental evaluation is complemented by computational evaluation, whereby the relevant assays are modeled with machine learning approaches.1–3 Ligand-based quantitative structure–activity relationship (QSAR) models used for molecule design leverage chemical descriptors like SMILES and ECFPs and are broadly deployed to optimize primary and secondary compound properties.4 They are, however, structurally biased, meaning that they have a limited applicability domain.5 If the structures of interest differ substantially from the chemical space on which the model was trained, a reliable prediction will not be possible.

Image-informed ligand-based models (or image-based models, here used interchangeably) that leveraged repurposed high-throughput screens have recently also been shown to effectively predict activity in a selection of assays and increase assay hit rate and chemical hit diversity,2,6 or predict mitochondrial toxicity.7,8 In this scenario, images can be interpreted as representations of the phenotype of a cell caused by the small molecule, instead of focusing on its chemical structure per se. As such, image-based virtual assays may avoid the structural biases inherent to ligand-based QSAR models. In practice, this means that image-informed ligand-based models may suggest compounds relevant to a project that are structurally diverse from those used in training and fall outside of the applicability domain of structural models. However, image-based models can be applied only to compounds that are physically available and imaged, while ligand-based QSAR models can be applied already at the stage of molecule design.

To leverage the advantages of both these approaches and overcome their respective limitations, we propose a computational pipeline in the spirit of active learning. Our approach builds upon previous work showing the positive impact of employing multimodal compound descriptors for bioactivity prediction.9,10 In a first step, initial models are built based on structural information and images, respectively. Then predictions from the image-informed and chemistry-informed ligand-based models guide the selection of molecules to be tested in an actual physical assay. In this way, we aim to collect additional assay labels from these selected compounds, which we then utilize in a retraining loop, improving both initial models, with a focus on boosting the performance of the structural model. This process can be repeated until the performance of the models reaches a satisfactory target.

In the present article, we test out this idea by virtualizing and improving the performance of a toxicity-related assay. Drug-induced toxicity has been estimated to account for the attrition of approximately one-third of drug candidates and is a major contributor to the high costs and loss of time during drug development, particularly when not recognized until late stages.11 Hence, there is a high demand for methods to identify toxicity effects of potential drugs as early as possible. For instance, previous efforts to produce predictive models for safety-relevant readouts of small molecules12 have leveraged clinical or preclinical or in vitro data, concentrating on general toxicity,13 specific organ toxicity (e.g., drug-induced liver injury),14 or specific mechanisms, such as a dysfunction of mitochondrial membrane potential,15 that can lead to toxicity. Compounds identified by these models can then be validated in fit for purpose cellular assays.

Mitochondrial dysfunction has been implicated as one of the major contributors to drug-induced toxicity.16 This is unsurprising, considering the pivotal role played by these organelles, not only in ATP production but also in the integration of various cell-signaling pathways and in the maintenance of cellular homeostasis. In short, when mitochondria fail, the cell dies,17 and this situation is exacerbated by the multitude of mechanisms by which mitochondrial functions can be perturbed. These perturbations are often mediated directly through the uncoupling of ATP synthesis from electron transport, or the inhibition of either of these processes. Furthermore, indirect perturbation is also possible through the disruption of mitochondrial transcription/translation and/or the acceleration of free-radical production. Considerable advances have been made in understanding these mechanisms,18–20 and drug-induced mitochondrial dysfunction has now been shown to contribute to toxicity in the liver, heart, kidney, muscle, and the central nervous system.21

In most mammalian cells, mitochondria generate almost all the energy (in the form of ATP) required for survival. However, many highly proliferative cells generate almost all ATP via glycolysis, despite abundant oxygen and functional mitochondria, a phenomenon known as the Crabtree effect.22 Such anaerobically poised cells have a reduced susceptibility to mitochondrial toxicants. To increase detection of drug-induced mitochondrial toxicity during early stages of drug development, an in vitro cell-based assay has been developed: the Glu/Gal assay. In this assay, HepG2 cells are forced to rely on mitochondrial oxidative phosphorylation rather than glycolysis, by substituting galactose for glucose in the culture medium.23–25 Mitochondrial dysfunction is assessed by the ratio of the cytotoxicity (IC50) of the test substance in glucose versus galactose culture conditions (Glu/Gal). Compounds with a significantly higher cytotoxicity potential in galactose-grown cells are classified as substances that induce mitochondrial dysfunction as the primary mode of action.

The Glu/Gal assay is the primary assay of choice as an early in vitro screen for drug-induced mitochondrial toxicity.26 This is due to its simplicity and amenability to a medium- to semi-high-throughput format, on the one hand, and its potential to capture various mechanisms of mitochondrial toxicity on the other hand. However, the assay requires an assessment of full concentration response curves in different medium conditions and is not suited to screen large libraries. Modeling the Glu/Gal outcome is therefore desired, but nontrivial, given the magnitude of underlying mitochondrial toxicity mechanisms that need to be captured.

Experimental Procedures

Data Preparation

Glu/Gal Label Generation and Data Preparation for Modeling

To assess the mitochondrial toxicity potential of compounds, the cytotoxicity potential (determined by ATP measurement) of the compounds was compared in glucose versus galactose culture conditions. Briefly, HepG2 cells, either grown in glucose or galactose-containing media, were seeded at the same densities in 384-well plates (Corning Inc., New York) and allowed to adhere for 24 h. Compounds were prepared as 100× concentrated stock solutions in dimethyl sulfoxide (DMSO), and the final DMSO concentration was 1%. Cells were exposed to the compounds (concentration range: 0.2–100 μM, dilution factor 2) for 24 h before cellular ATP content was measured. Cellular ATP concentrations were assessed by using the CellTiter-Glo Assay (Promega, Madison, WI), and the readout was performed by detecting luminescence on an EnVision 2105 Multilabel Reader (PerkinElmer, Waltham, MA). Results were imported in Genedata Screener software (Genedata, Basel, Switzerland) to create dose–response curves and calculate the IC50 values of each compound. Subsequently, the ratio between Glu IC50 and Gal IC50 for each compound was calculated. Compounds with a significantly higher cytotoxicity potential in galactose-grown cells (Glu/Gal ratio ≥ 5) are determined as substances that induce mitochondrial dysfunction as the primary mode of action. To prepare these data for our deep-learning classification models, we thus binarized the calculated ratio at 5, meaning compounds with ratios ≥5 correspond to the positive class and ratios <5 correspond to the negative class, i.e., safe compounds. The proportion of positives (from here on also referred to as “hit rate”) is in the range of 4–5%, but the selection of the threshold is based on indications of signal translation to in vivo drug-induced liver toxicity.27 The original set of Glu/Gal labels (v1 set) contained 11280 annotated compounds, collected internally over time.
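As an illustration only, the labeling rule described above can be sketched in Python. The function names and the example IC50 values are hypothetical; the original workflow derived IC50 values in Genedata Screener, and the handling of compounds without a measurable IC50 is not described in the text and is therefore not covered here.

```python
RATIO_THRESHOLD = 5.0  # from the text: Glu/Gal ratio >= 5 defines the positive class


def glu_gal_ratio(ic50_glu_um: float, ic50_gal_um: float) -> float:
    """Ratio of the IC50 in glucose medium to the IC50 in galactose medium."""
    if ic50_glu_um <= 0 or ic50_gal_um <= 0:
        raise ValueError("IC50 values must be positive")
    return ic50_glu_um / ic50_gal_um


def binarize(ratio: float, threshold: float = RATIO_THRESHOLD) -> int:
    """Return 1 (mitochondrial toxicant) if ratio >= threshold, else 0 (safe)."""
    return 1 if ratio >= threshold else 0


# Hypothetical compounds: (IC50 in glucose, IC50 in galactose), both in micromolar.
examples = {"cmpd_A": (50.0, 10.0), "cmpd_B": (40.0, 10.0)}
labels = {name: binarize(glu_gal_ratio(glu, gal)) for name, (glu, gal) in examples.items()}
```

A compound that is five times more potent in galactose than in glucose medium sits exactly on the threshold and is labeled positive.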

Cell Painting Images and Image Preprocessing

Images of cells exposed to compounds were acquired using Cell Painting, a high-content image-based assay for cytological profiling.28,29 It consists of a panel of fluorescent dyes labeling different cell components or organelles (nucleus, endoplasmic reticulum, mitochondria, Golgi apparatus and cytoskeleton, plasma membrane, nucleoli, and cytoplasmic RNA), with the aim to capture the biological state of the cell after chemical perturbation. Briefly, U2OS cells were seeded in 1536 well plates (Aurora, Scottsdale, AZ) and allowed to attach for 24 h. Compounds were diluted in DMSO to a final concentration of 20 μM, and the plates were incubated for 24 h before fixation, permeabilization, and staining. Images of the five fluorescence channels were acquired with a Yokogawa (Tokyo, Japan) CellVoyager 8000 confocal high-content imaging reader. The PerkinElmer (Waltham, MA) Acapella automated image analysis software was used to extract ∼1600 morphological features, including measures of shape, size, texture, intensity, etc., from individual cells. Well-level results were obtained by taking the means and medians over the cells. Results were imported in the Phaedra software30,31 version 1.0.8 for quality control and technical validation, rejecting wells containing experimental artifacts. In every experiment, 70 reference compounds were included at 4 concentrations, and the consistency with a historical reference of 50 features was verified for selected compound-concentration pairs.32 Validated feature data were exported and converted to Z-scores by normalization against the DMSO controls on every plate.
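The plate-wise normalization can be sketched as follows for a single feature on a single plate. This is a minimal sketch under an assumption: the text states only that features were converted to Z-scores against the DMSO controls, so the use of the mean and sample standard deviation (rather than a robust alternative such as median and MAD) is our choice for illustration, and the function name is hypothetical.

```python
from statistics import mean, stdev
from typing import List


def zscore_against_dmso(well_values: List[float], dmso_values: List[float]) -> List[float]:
    """Normalize one feature on one plate against that plate's DMSO control wells.

    Assumption: z = (x - mean_DMSO) / sd_DMSO, with the sample standard deviation.
    """
    if len(dmso_values) < 2:
        raise ValueError("need at least two DMSO wells to estimate a standard deviation")
    mu = mean(dmso_values)
    sigma = stdev(dmso_values)
    if sigma == 0:
        raise ValueError("DMSO controls have zero variance for this feature")
    return [(value - mu) / sigma for value in well_values]


# Hypothetical feature values for three DMSO wells and two compound wells.
z = zscore_against_dmso([2.0, 4.0], dmso_values=[1.0, 2.0, 3.0])
```

In practice the same function would be applied per plate and per feature, so that plate-to-plate shifts in the controls do not leak into the compound profiles.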

Model Building

We combine Glu/Gal labels together with additional activity labels from different assays to model all the tasks together in a multitask learning (MTL) problem.33 This approach aims to take advantage of the inherent relation of multiple assays modeled jointly, improving learning efficiency and the performance of individual tasks through correlations. Every property in the model is treated as a binary classification task. For the ligand-based QSAR models we use a computationally expensive graph convolutional method.34 Thus, we focus on 47 such accompanying tasks; see Ligand-Based QSAR Model Building for details. In practice, this number already allows the model to yield satisfying results. In comparison, due to a comparatively simpler architecture, and based on past experience,2 we expanded the training data set to thousands of accompanying tasks for the image-informed ligand-based modeling.

Ligand-Based QSAR Model Building

The 11280 labels for the Glu/Gal assay combined with labels from 47 accompanying tasks for the ligand-based QSAR model amounted to a total of ∼50000 SMILES,35 a 23% fill rate. Accompanying tasks represent assays that could be correlated with mitochondrial toxicity or other drug-induced liver toxicity mechanisms, e.g., log D,36 oxidative stress, or cholestasis. To minimize the loss of information caused by binarizing the data for a classification experiment, each assay may have been thresholded at multiple values and can therefore be represented by multiple tasks. We employed the Chemprop34 tool in a multitask binary classification experiment. SMILES were normalized with the commercial ChemAxon software.37 A chemical cluster-based folding split allows assessment of how a model generalizes to chemical series never seen during model training. Thus, to split compounds into cross-validation folds that would represent different chemical spaces, binary 4096-bit ECFP fingerprints (radius 6) were first calculated for all compounds with the RDKit38 open-source cheminformatics library, version 2022.03.2. Next, compounds were clustered with the Butina algorithm39 at a Tanimoto similarity threshold of 0.5, using the same RDKit version. The clusters were then split into 5 folds, optimizing for an equal number of compounds in each fold and for representation of each class from each task in every fold. First, hyperparameter optimization was run with respect to ROC AUC with the hyperopt library40 to select the best performing architecture: feed-forward layers = 2, hidden layers size = 512, depth = 0.5, epochs = 60, batch size = 250, and dropout = 0.2. Then, with the selected architecture, 20 models were run, where each of these 20 runs used a different assignment of the 5 folds to 3 folds of training, 1 fold of validation, and 1 fold of testing.
The 1-fold validation sets were used to select the best performing epoch, which is an integrated feature of Chemprop, and to generate calibrators with an approach described later under “Model Calibration with Mondrian Cross-Conformal Predictor (MCCP)”. The 1-fold testing sets were used for model evaluation described later under “Model Evaluation”. For the final model, an ensemble of five models is used, where each model was built on a different combination of 4 folds for training and 1 fold for selecting the best performing epoch.
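The count of 20 runs is consistent with choosing one of 5 folds for testing and one of the remaining 4 for validation (5 × 4 = 20). The sketch below enumerates these assignments; this reading of the text and the function name are our assumptions, not a description of the actual training scripts.

```python
from itertools import permutations
from typing import Dict, List, Tuple, Union

Assignment = Dict[str, Union[int, Tuple[int, ...]]]


def fold_role_assignments(n_folds: int = 5) -> List[Assignment]:
    """Enumerate every way to pick 1 test fold and 1 validation fold; the rest train."""
    folds = range(n_folds)
    assignments: List[Assignment] = []
    for test_fold, val_fold in permutations(folds, 2):
        train_folds = tuple(f for f in folds if f not in (test_fold, val_fold))
        assignments.append({"train": train_folds, "val": val_fold, "test": test_fold})
    return assignments


runs = fold_role_assignments(5)
```

With this scheme every fold serves as the test fold in four runs, which is why test metrics can be averaged over all runs without favoring any part of chemical space.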

Image-Informed Ligand-Based Model Building

A multitask classification model was trained on image-based feature vectors of compounds, with their corresponding experimental results in assays of interest used as labels, to predict activity for new compounds based on their image features. The number of Glu/Gal-annotated compounds available to train the image-based model was limited to 7725 (out of 11280 available labels), as this is the set that had been imaged with the Cell Painting protocol at the time of the modeling experiment. As with the ligand-based QSAR model, the hit rate remained in the range of 4–5%. As previously mentioned, we expanded the training data set to thousands of accompanying tasks, including tasks that were used in the ligand-based QSAR model, resulting in a fill rate of 1% of the labeled data. The model was built using the SparseChem41,42 package, a machine-learning framework that supports very high dimensional models with sparse inputs (in our case thousands of features, over a hundred thousand compounds, and thousands of assays with incomplete results—or labels—coverage for all compounds).
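To make the notion of fill rate and sparse multitask labels concrete, the sketch below computes the fill rate of a label matrix and a binary cross-entropy that skips missing labels. Masking missing compound-task pairs in the loss is an assumption made for illustration; the text does not describe how SparseChem or Chemprop handle them internally, and all names and values here are hypothetical.

```python
import math
from typing import List, Optional

LabelMatrix = List[List[Optional[int]]]  # rows: compounds, columns: tasks, None: not measured


def fill_rate(labels: LabelMatrix) -> float:
    """Fraction of compound-task pairs that carry a label."""
    n_cells = sum(len(row) for row in labels)
    n_filled = sum(1 for row in labels for y in row if y is not None)
    return n_filled / n_cells


def masked_bce(probs: List[List[float]], labels: LabelMatrix) -> float:
    """Mean binary cross-entropy over labeled compound-task pairs only."""
    eps = 1e-12  # keeps log() finite for probabilities of exactly 0 or 1
    total, n_labeled = 0.0, 0
    for prob_row, label_row in zip(probs, labels):
        for p, y in zip(prob_row, label_row):
            if y is None:
                continue  # unmeasured pair: contributes nothing to the loss
            p = min(max(p, eps), 1.0 - eps)
            total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
            n_labeled += 1
    if n_labeled == 0:
        raise ValueError("no labeled compound-task pairs")
    return total / n_labeled


# Two hypothetical compounds and two tasks; each compound is measured in one task only.
toy_labels: LabelMatrix = [[1, None], [None, 0]]
toy_probs = [[0.5, 0.9], [0.1, 0.5]]
```

In this toy case half of the matrix is filled, whereas the image-based training set described above has a fill rate of about 1%.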

To compensate for the smaller size of the imaged data set compared to the number of structures available (due to not every compound in our library having been imaged), we trained image-based models using five different architectures (hidden size: “1024 128 128”, “128 128”, “512 256 256”, “512 256 128”, “1024 512 256”; last dropout: 0.5; middle dropout: 0.3) selected as the best performing out of 100 random sets of hyperparameters using grid search on an independent imaging data set. One final ensemble model43 was then produced based on the aggregated results and evaluation metrics, averaging the outputs across the five neural networks.
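The final averaging step can be sketched as below. We assume here that the averaged outputs are per-compound predicted probabilities for a single task; the function name and the numbers are illustrative only.

```python
from typing import List


def ensemble_average(per_model_probs: List[List[float]]) -> List[float]:
    """Average predicted probabilities across models, compound by compound."""
    if not per_model_probs:
        raise ValueError("need at least one model")
    n_compounds = len(per_model_probs[0])
    if any(len(probs) != n_compounds for probs in per_model_probs):
        raise ValueError("all models must predict the same compounds")
    n_models = len(per_model_probs)
    return [sum(probs[i] for probs in per_model_probs) / n_models for i in range(n_compounds)]


# Two hypothetical networks scoring the same two compounds.
averaged = ensemble_average([[0.2, 0.8], [0.4, 0.6]])
```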

Compounds were split into 5 folds, each containing molecules that are structurally closest to each other based on their Murcko scaffolds.44 This makes the evaluation deliberately harder, because the model is not trained on the chemical series on which it is then evaluated, so the reported metrics better reflect generalization to unseen chemistry. A leave-one-out approach over the folds was then employed, where a held-out fold was used to generate a Mondrian cross-conformal predictor or, simply put, a conformal predictor.
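A group-aware split of this kind can be sketched as follows, given a precomputed scaffold key per compound. The greedy size balancing is our assumption for illustration (the text does not specify how scaffolds were distributed over folds), and the scaffold keys below are placeholders rather than real Murcko scaffolds.

```python
from collections import defaultdict
from typing import Dict, List


def assign_groups_to_folds(group_of: Dict[str, str], n_folds: int = 5) -> Dict[str, int]:
    """Assign whole groups (e.g., scaffolds) to folds, largest group first.

    Every compound of a group lands in the same fold, so no scaffold is shared
    between the folds used for training and the fold used for evaluation.
    """
    members: Dict[str, List[str]] = defaultdict(list)
    for compound_id, group in group_of.items():
        members[group].append(compound_id)

    fold_sizes = [0] * n_folds
    fold_of: Dict[str, int] = {}
    # Largest groups first; ties broken by group name for reproducibility.
    for group, compounds in sorted(members.items(), key=lambda kv: (-len(kv[1]), kv[0])):
        target = fold_sizes.index(min(fold_sizes))  # currently smallest fold
        for compound_id in compounds:
            fold_of[compound_id] = target
        fold_sizes[target] += len(compounds)
    return fold_of


# Hypothetical compounds with placeholder scaffold keys.
toy_folds = assign_groups_to_folds({"c1": "A", "c2": "A", "c3": "B", "c4": "C"}, n_folds=2)
```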

Model Calibration with Mondrian Cross-Conformal Predictor (MCCP)

We employed Mondrian Cross-Conformal Predictor (MCCP), implemented according to Sun et al.,45 to calibrate the models. This is a rigorous approach to define prediction confidence while addressing large data imbalances,27 common in bioactivity data sets.

To ensure generalizability, we divided the data into 5 folds, such that for a model built from a 3-fold training set, MCCP is calibrated on a 1-fold set, and the metrics from the calibrated model are reported on a 1-fold test set (see Model Evaluation for metrics and their definitions). This arrangement rotates in a cross-validation approach until all folds are used for calibration. All calibrators here used significance level ε = 0.05, which means that we accept a misclassification rate of up to 5%. Increasing ε would boost model efficiency, but this would come at the cost of increased misclassifications and a decrease in predictive value.

The predictions returned by the MCCP model may not only be positive or negative but also empty (none of the classes predicted) or uncertain (both of the classes predicted),45 meaning that a model cannot make a reliable single prediction. The returned predictions depend on the predefined ε. For all reported models we apply ε = 0.05 at which, in practice, our models of the Glu/Gal assay have not encountered empty predictions. However, this cannot be excluded as a possibility.
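The four possible outcomes can be illustrated with a minimal class-conditional (Mondrian) conformal predictor for a single calibration fold. This is a sketch under stated assumptions, not the implementation of Sun et al.: the nonconformity score (one minus the predicted probability of the hypothesized class) is our choice, the aggregation of p-values across calibration folds used in the cross-conformal setting is omitted, and all names and calibration values are hypothetical.

```python
from typing import Dict, List


def mondrian_p_value(calibration_scores: List[float], test_score: float) -> float:
    """p-value of a test score against calibration scores of ONE class."""
    n_at_least = sum(1 for score in calibration_scores if score >= test_score)
    return (n_at_least + 1) / (len(calibration_scores) + 1)


def predict_region(prob_positive: float,
                   calibration_scores: Dict[int, List[float]],
                   epsilon: float = 0.05) -> str:
    """Return 'positive', 'negative', 'uncertain' (both classes), or 'empty' (none)."""
    # Nonconformity of the test compound under each hypothesized label.
    test_scores = {1: 1.0 - prob_positive, 0: prob_positive}
    kept = [label for label in (0, 1)
            if mondrian_p_value(calibration_scores[label], test_scores[label]) > epsilon]
    if kept == [1]:
        return "positive"
    if kept == [0]:
        return "negative"
    return "uncertain" if kept else "empty"


# Hypothetical calibration scores, kept separately per class (the Mondrian part).
wide = {0: [i / 100 for i in range(1, 81)], 1: [i / 100 for i in range(1, 81)]}
tight = {0: [i / 100 for i in range(1, 41)], 1: [i / 100 for i in range(1, 41)]}
outcomes = [predict_region(0.95, wide), predict_region(0.05, wide),
            predict_region(0.50, wide), predict_region(0.50, tight)]
```

Because p-values are computed per class, the rare positive class is calibrated against positives only, which is what makes the approach suitable for imbalanced data. Raising ε removes more labels from each prediction region, turning some uncertain predictions into single-label ones at the price of more errors.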

Model Evaluation

Evaluation metrics are reported as the mean of 1-fold test sets over all runs, where the model is built on a train set comprised of 3 folds, and 1 fold is saved for calibration.

We report the following metrics for MCCP:

  • Efficiency: the proportion of compounds for which the model can make reliable predictions (either positive or negative);

  • Predictive Value (PV): the proportion of compounds correctly predicted out of those for which the model made a reliable prediction (either positive or negative). The PV can be calculated for all compounds but also for each class individually, which here is called positive predictive value (PPV) or negative predictive value (NPV).
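The MCCP metrics defined above can be computed from conformal outcomes as sketched below. The outcome strings and the function name are hypothetical conventions chosen for this example.

```python
from typing import Dict, List


def mccp_metrics(predictions: List[str], truths: List[int]) -> Dict[str, float]:
    """Efficiency, PPV, and NPV from conformal outcomes and true binary labels.

    predictions: 'positive', 'negative', 'uncertain', or 'empty' per compound.
    truths: 1 for a Glu/Gal ratio >= 5, otherwise 0.
    """
    if not predictions or len(predictions) != len(truths):
        raise ValueError("predictions and truths must be non-empty and of equal length")
    called_pos = [t for p, t in zip(predictions, truths) if p == "positive"]
    called_neg = [t for p, t in zip(predictions, truths) if p == "negative"]
    nan = float("nan")
    return {
        # Share of compounds that received a single-label (reliable) prediction.
        "efficiency": (len(called_pos) + len(called_neg)) / len(predictions),
        # Share of positive calls that are truly positive.
        "ppv": sum(called_pos) / len(called_pos) if called_pos else nan,
        # Share of negative calls that are truly negative.
        "npv": called_neg.count(0) / len(called_neg) if called_neg else nan,
    }


toy_metrics = mccp_metrics(["positive", "positive", "negative", "uncertain"], [1, 0, 0, 1])
```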

In addition, we provide the following classic model quality evaluation metrics:

  • ROC AUC: a general measure of model quality that takes true positive and false positive rates into account.

  • Precision-Recall AUC: a general measure that considers the validity for the positive class and the true positive rate, which makes it more sensitive to improvements for the positive class. This is especially useful when the data are highly imbalanced, with a small fraction of positive examples.

Clustering and Visualization of Chemical Space

To enable the use of chemical diversity as a criterion for compound selection, compounds were clustered using the Butina algorithm39 with a Tanimoto similarity threshold of 0.5, using the RDKit38 library version 2022.03.2. For a Uniform Manifold Approximation and Projection (UMAP)46,47 of the chemical space of compounds annotated with Glu/Gal, we used the umap Python library version 0.5.3. The representation was built with Jaccard distance on binary 4096-bit ECFP fingerprints (radius 6) calculated with the same RDKit version. To fit the chemical representation data, we used the following UMAP parameters: number of components = 2, number of neighbors = 80, and minimum distance = 0.95. To fit the phenotypic representation data, we used the same feature vectors as for modeling, with Pearson correlation distance and the following UMAP parameters: number of components = 2, number of neighbors = 80, and minimum distance = 0.95.
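For readers unfamiliar with the clustering step, the logic of Butina clustering can be sketched in pure Python on fingerprints represented as sets of on-bits. This is a simplified illustration, not the RDKit implementation used in this work; the toy fingerprints are hypothetical, and the quadratic pairwise loop is suitable only for small sets.

```python
from typing import FrozenSet, List, Set


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto similarity of two binary fingerprints given as sets of on-bits."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def butina_clusters(fps: List[FrozenSet[int]], sim_threshold: float = 0.5) -> List[List[int]]:
    """Greedy clustering: the compound with most neighbors becomes a centroid first."""
    n = len(fps)
    neighbors: List[Set[int]] = [set() for _ in range(n)]
    for i in range(n):
        for j in range(i):
            if tanimoto(fps[i], fps[j]) >= sim_threshold:
                neighbors[i].add(j)
                neighbors[j].add(i)

    assigned: Set[int] = set()
    clusters: List[List[int]] = []
    # Visit candidates by decreasing neighbor count (ties broken by index).
    for centroid in sorted(range(n), key=lambda i: (-len(neighbors[i]), i)):
        if centroid in assigned:
            continue
        cluster = [centroid] + sorted(neighbors[centroid] - assigned)
        assigned.update(cluster)
        clusters.append(cluster)  # centroid is listed first
    return clusters


toy_fps = [frozenset({1, 2, 3}), frozenset({1, 2, 3, 4}), frozenset({1, 2}), frozenset({10, 11})]
toy_clusters = butina_clusters(toy_fps, sim_threshold=0.5)
```

The first three toy fingerprints share most of their bits and form one cluster, while the fourth has no neighbor above the threshold and becomes a singleton.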

Results

Models v1

Both SMILES and image-based models were built separately in multitask classification experiments and calibrated with MCCP. As described under Experimental Procedures, the SMILES-based and image-based models were built on data sets of different sizes, for both the Glu/Gal and the accompanying tasks. Therefore, we focus on baselining the models rather than directly comparing them. We refer to these initial SMILES- and image-based models as version 1 (v1) models. Both achieved reasonably good metrics: ROC AUC of 0.86 and 0.84 and PR AUC of 0.36 and 0.45 for the SMILES-based and image-based model, respectively (Table 1). However, the positive predictive value (for mitochondrial toxic compounds), calibrated with the MCCP, is relatively low, while the negative predictive value (for mitochondrial safe compounds) is extremely high. This can be attributed to the data being substantially imbalanced toward the safe class (see Glu/Gal Label Generation and Data Preparation for Modeling). In both cases, however, uncertain predictions account for slightly more than half of all compounds (efficiency of 43.4% and 48.6%), which provides an estimate of the applicability domain of the models.

Table 1. Model Performance of the SMILES-Based and Image-Based Glu/Gal Classification Models at the Threshold Ratio ≥5.

| ratio ≥5 | efficiency | PPV | NPV | no. of positives/no. of all compounds | ROC AUC | PR AUC |
|---|---|---|---|---|---|---|
| SMILES-based v1 | 43.4% | 30.8% | 99.6% | 486/11280 | 0.86 | 0.36 |
| image-based | 48.6% | 28.4% | 98.6% | 394/7725 | 0.84 | 0.45 |

To ultimately improve the performance of the SMILES-based model and expand its applicability domain, we next utilized the image-based predictions as one of the factors to guide the selection of additional compounds for experimental Glu/Gal testing, as described in the next section.

Phenotypic representation of Glu/Gal annotated compounds shows that there is no particular, clean phenotype driving the image-based model (Figure 1). There might be multiple reasons for that, one of them being that compounds have more modes of action than what they have been optimized for.

Figure 1. UMAP of compounds represented by normalized image feature vectors. Blue: compounds experimentally annotated with Glu/Gal ratio <5; brown: compounds experimentally annotated with Glu/Gal ratio ≥5.

Selection of Compounds

To improve the model quality, in the spirit of active learning,48 we used the v1 model predictions to select compounds for experimental testing in the Glu/Gal assay so the results could be included in the next round of model training. A criterion for selection was the availability of a Cell Painting image, allowing us to make a prediction with the image-based model. With this, we expected to both increase the proportion of positive labels (hit rate) of the assay and expand the annotated chemical space. In the selection, we focused on compounds with different Murcko scaffolds44 compared to the compounds in the training set, without, however, including the minimal Tanimoto distance to the original set in the selection criteria. Our choice was limited by three factors: testing capacity, compound availability, and chemical diversity. To ensure coverage of broader chemical diversity, we limited the number of selected compounds from the same cluster of chemically similar compounds to 10. Within these constraints, priority was given to compounds ranked higher by the model based on their probability toward the positive class. A total of 1809 compounds were ultimately experimentally tested from a selection based on the SMILES and/or image-based predictions, comprising the following:

  • 621 compounds predicted to be positive by both models;

  • 1188 compounds predicted to be positive by the image-based model, while deemed uncertain by the SMILES-based model.
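The ranking and per-cluster cap described above can be sketched as follows. This is a simplified illustration: the record fields, the optional capacity argument, and the tie-breaking by identifier are hypothetical, and the actual selection additionally depended on compound availability and scaffold novelty.

```python
from collections import Counter
from typing import Dict, List, Optional, Union

Candidate = Dict[str, Union[str, int, float]]  # keys: "id", "cluster", "prob_positive"


def select_compounds(candidates: List[Candidate],
                     max_per_cluster: int = 10,
                     capacity: Optional[int] = None) -> List[str]:
    """Pick compounds by decreasing predicted probability, capping each cluster."""
    ranked = sorted(candidates, key=lambda c: (-float(c["prob_positive"]), str(c["id"])))
    taken_per_cluster: Counter = Counter()
    selected: List[str] = []
    for candidate in ranked:
        if capacity is not None and len(selected) >= capacity:
            break  # testing capacity exhausted
        if taken_per_cluster[candidate["cluster"]] >= max_per_cluster:
            continue  # cluster already represented often enough
        taken_per_cluster[candidate["cluster"]] += 1
        selected.append(str(candidate["id"]))
    return selected


toy_candidates: List[Candidate] = [
    {"id": "a", "cluster": 1, "prob_positive": 0.9},
    {"id": "b", "cluster": 1, "prob_positive": 0.8},
    {"id": "c", "cluster": 2, "prob_positive": 0.7},
]
toy_selection = select_compounds(toy_candidates, max_per_cluster=1)
```

With a cap of one compound per cluster, the second-ranked compound is skipped in favor of a lower-ranked compound from a different cluster, which is the trade-off between predicted activity and diversity that the cap enforces.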

The separation between a subset of the selected compounds and the original set of Glu/Gal-labeled compounds in the chemical space, represented by a UMAP with a Jaccard distance, indicates a jump of the selected compounds to distant chemical space (Figure 2). Plots with varying random states are included in the Supporting Information (Figure S1).

Figure 2. UMAP of compounds represented by a binary ECFP, with Jaccard distance. Orange: original set of Glu/Gal annotated compounds; green: compounds predicted positive in the image-based model and uncertain in SMILES-based model; purple: compounds predicted positive by both image- and SMILES-based models; red: validation compound predicted uncertain in v1 and positive in v2 SMILES-based models.

Data Boost Results and Model Improvement

All the 1809 compounds were physically tested in the Glu/Gal assay and yielded a confirmation rate of 34.2% (Table 2). For these compounds the Glu to Gal ratio is confirmed to be ≥5, i.e., the “positive” class. This observed confirmation rate is higher than expected from the original model, where the calculated PPV was 28.4% (Table 2). For the 621 compounds (out of 1809 selected) that were independently predicted positive by both the image- and SMILES-based models, the confirmation rate is 45.5%. For the 1188 compounds initially predicted to be positive by the image-based model and uncertain by the SMILES-based model, the confirmation rate is 28.2%, close to what we expected.

Table 2. Expected and Observed Confirmation Rates for the Set of 1809 Selected Compounds(a).

| | expected confirmation rate (image-based) | observed: all | observed: image-based positive and SMILES-based positive | observed: image-based positive and SMILES-based uncertain |
|---|---|---|---|---|
| ratio ≥5 | 28.4% | 34.2% | 45.5% | 28.2% |

(a) The expected confirmation rate is the PPV from the v1 model (Table 1). Experimentally observed confirmation rates: all: for all selected compounds, and for each of two different selections: compounds predicted to be positive in both models; compounds predicted positive in image-based and uncertain in SMILES-based model.

We then added the newly acquired labels to our initial label set and trained a new SMILES-based model (v2). ROC AUC did not change significantly; however, it had already been high in the v1 model compared to the ROC AUC for a random set with equivalent hit rate. The efficiency, PPV, and sensitivity improved dramatically in the data-boosted v2 model (Table 3). The stark increase of the sensitivity and PPV, from 0.29 to 0.41 and from 30.8% to 47.6%, respectively, is expected when a model built on a highly imbalanced data set is retrained with substantially more positive labels, as is the case here with an increase in the proportion of positives from 4% to 8%. At the same time, the specificity and NPV remained very high and stable compared to the initial numbers. Crucially, we observe a substantial increase in the efficiency, from 43.4% to 54.1%. This means that we were able to expand the applicability domain of the initial model: reliable predictions increased by 10.7 percentage points (25% in relative terms), facilitated by a 16% expansion of the training set.
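The distinction between the absolute and the relative gain can be checked with a few lines of arithmetic, using only the efficiencies and data set sizes reported for the v1 and v2 models. The helper names are ours.

```python
def absolute_change_pp(old_percent: float, new_percent: float) -> float:
    """Difference between two percentages, in percentage points."""
    return new_percent - old_percent


def relative_change_percent(old_value: float, new_value: float) -> float:
    """Change relative to the old value, in percent."""
    return 100.0 * (new_value - old_value) / old_value


# Efficiency of the SMILES-based model: 43.4% (v1) and 54.1% (v2).
efficiency_gain_pp = absolute_change_pp(43.4, 54.1)
efficiency_gain_rel = relative_change_percent(43.4, 54.1)

# Training set size: 11280 (v1) and 13089 (v2) annotated compounds.
training_set_growth = relative_change_percent(11280, 13089)
```

Rounded, these evaluate to 10.7 percentage points, 25%, and 16%, respectively.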

Table 3. Model Performance of the SMILES-Based Glu/Gal Classification Models at the Threshold Ratio ≥5(a).

| ratio ≥5 | efficiency | PPV | NPV | no. of positives/no. of all compounds | ROC AUC | PR AUC | ROC AUC (random) | PR AUC (random) | specificity | sensitivity |
|---|---|---|---|---|---|---|---|---|---|---|
| SMILES-based v1 | 43.4% | 30.8% | 99.6% | 486/11280 | 0.86 | 0.36 | 0.52 | 0.05 | 0.96 | 0.29 |
| SMILES-based v2 | 54.1% | 47.6% | 99.2% | 1105/13089 | 0.88 | 0.48 | 0.50 | 0.08 | 0.97 | 0.41 |

(a) SMILES-based v1 model built on data before data boost. SMILES-based v2 model built on data after data boost.

Expansion of Applicability Domain with an Example of Validation Compound

To further test the expansion of the applicability domain, measured by efficiency, we ran predictions over a chemically diverse set of ∼800000 of our library compounds with both models. The efficiency increased from 17% for the SMILES-based v1 model to 46% for the SMILES-based v2 model, an increase of 170% in relative terms.

To illustrate the jump toward new chemical space of the SMILES-based v2 model versus the SMILES-based v1 model, we selected one compound from our internal data that has been confirmed experimentally to have a Glu/Gal ratio ≥5. This validation compound is predicted uncertain by the SMILES-based v1 model but positive by the SMILES-based v2 model, which indicates the expansion of the applicability domain. More importantly, the compound is chemically closer to the compounds selected for the data boost that were predicted positive by the image-based model only (maximal Tanimoto similarity of 0.51) than to the compounds selected as double hits (predicted positive in both models) or to the original set of Glu/Gal annotated compounds (maximal Tanimoto similarities of 0.12 and 0.19, respectively; Table 4). This finding can be visualized on the UMAP (Figure 2), where the validation compound is close in chemical space to the compounds selected by the image-based model only, a cluster circled in red on the UMAP.

Table 4. Similarity of the Validation Compound to the Most Similar Compound from Sets of Originally Annotated Glu/Gal Compounds, Image-Based Positive and SMILES-Based Positives, and Image-Based Positives and SMILES-Based Uncertain.

Maximum Tanimoto similarity in the sets of compounds:

| | original set of Glu/Gal annotated compounds | image-based positive and SMILES-based positive | image-based positive and SMILES-based uncertain |
|---|---|---|---|
| validation compound | 0.19 | 0.12 | 0.51 |
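A nearest-neighbor similarity of the kind reported in Table 4 can be computed as sketched below, with fingerprints represented as sets of on-bits. The toy fingerprints are hypothetical and unrelated to the actual compounds; in this work the fingerprints were ECFPs calculated with RDKit.

```python
from typing import FrozenSet, Iterable


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto similarity of two binary fingerprints given as sets of on-bits."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def max_tanimoto(query: FrozenSet[int], reference_set: Iterable[FrozenSet[int]]) -> float:
    """Similarity of the query to its most similar compound in a reference set."""
    similarities = [tanimoto(query, fingerprint) for fingerprint in reference_set]
    if not similarities:
        raise ValueError("reference set is empty")
    return max(similarities)


toy_query = frozenset({1, 2, 3, 4})
toy_reference = [frozenset({1, 2, 3}), frozenset({9})]
toy_max_similarity = max_tanimoto(toy_query, toy_reference)
```

A low maximum similarity to a set means that even the closest member of that set is structurally distant from the query compound.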

Discussion

Based on the emerging need of addressing tox liabilities before preclinical studies, the Glu/Gal assay has been developed as a midthroughput proxy to mitochondrial toxicity. It is the primary assay of choice as an early in vitro screen for drug-induced mitochondrial toxicity. In principle, it is an assay simple to perform, but in practice not suited to screen large libraries. Therefore, a predictive model is desired. While ligand-based QSAR ADME predictive models are currently broadly used for molecule design, it immediately raises an idea to address the tox components also. This Glu/Gal predictive model will only be useful globally if it can cover a broad chemical space; that is, its applicability domain is broad enough to support projects over a large therapeutic area portfolio. In order to build such models with a suitable model quality for projects support, or to minimize the cost of compound screening in an assay, we applied an active learning approach, assisted by additional phenotypic data. Specifically, we used the Cell Painting images to aid in selecting compounds for experimental testing. These data are independent of the chemical structure information and, therefore, have the potential to help expand the chemical space of compounds annotated with the Glu/Gal assay. In practice, with the additional Glu/Gal annotation for selected compounds with image-informed and chemistry-informed ligand-based models, we were able to improve the latter with respect to increased recognition of compounds from a broader chemical space, which was identified here by an increase of MCCP efficiency. Specifically, the improved ligand-based QSAR model, built after a 16% expansion of the training set, can reliably predict 25% more compounds than the initial model and 170% more compounds from our internal 800000 diverse compound set, in relative terms. 
The higher number of reliable predictions in our set of 800,000 diverse compounds is mostly due to that set having a lower hit rate than the boosted model itself. Additionally, we were able to increase the hit rate in the improved model. The boosted hit rate is a direct effect of the high experimental confirmation rate of the compounds selected with our approach. At 34.2% (PPV), it is higher than the 28.4% expected from the image-based model alone. We achieved this increase by enriching the selected compounds with “predicted double hits”, that is, compounds predicted to be active by both the ligand-based QSAR and the image-based models. Assessed for these compounds specifically, the confirmation rate is 45.5%. Surprisingly, for compounds predicted positive by the image-based model and uncertain by the ligand-based QSAR model, we observe almost the same confirmation rate as expected, even though from our experience we would have expected it to be slightly lower. Moreover, we have shown the example of a validation compound that was confidently predicted by the SMILES-based model v2 but not by the SMILES-based model v1. At the same time, the validation compound is closer in chemical space to the compounds selected by the image-based model only, which may indicate the added value of the Cell Painting data.
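The confirmation rates discussed above are positive predictive values computed per selection group. The sketch below shows one way such a breakdown could be computed; it is an illustration under stated assumptions, not the authors' pipeline. The call labels ("positive", "negative", "uncertain"), the group names, the decision to leave every other combination unselected, and the example records are all hypothetical.

```python
from collections import defaultdict


def selection_category(image_call, smiles_call):
    """Assign a compound to a selection group from two model calls.

    Assumption: only image-positive compounds are selected, split by the
    SMILES-based call; every other combination is left unselected (None).
    """
    if image_call != "positive":
        return None
    if smiles_call == "positive":
        return "double hit"
    if smiles_call == "uncertain":
        return "image-only hit"
    return None


def confirmation_rates(records):
    """Per-group confirmation rate (PPV) = confirmed actives / tested.

    records: iterable of (image_call, smiles_call, confirmed_active) tuples,
    where confirmed_active is the experimental outcome as a bool.
    """
    tested = defaultdict(int)
    confirmed = defaultdict(int)
    for image_call, smiles_call, confirmed_active in records:
        group = selection_category(image_call, smiles_call)
        if group is None:
            continue  # compound was not selected for testing
        tested[group] += 1
        confirmed[group] += int(confirmed_active)
    return {group: confirmed[group] / tested[group] for group in tested}


# Made-up records for illustration only; not data from the study.
example_records = [
    ("positive", "positive", True),
    ("positive", "positive", False),
    ("positive", "positive", True),
    ("positive", "positive", False),
    ("positive", "uncertain", True),
    ("positive", "uncertain", False),
    ("positive", "uncertain", False),
    ("positive", "uncertain", False),
    ("negative", "positive", True),  # not selected under the assumed rule
]

if __name__ == "__main__":
    print(confirmation_rates(example_records))
```

Keeping the grouping rule in a separate function makes it easy to compare alternative selection strategies on the same experimental outcomes.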

With this straightforward approach, we were able to show and exploit the complementarity of two independent data sets representing small molecules, dramatically boosting the performance of one model by leveraging the other. A more complex method that directly fuses multiple data modalities, integrating image-based representations with chemical structure representations, could be explored to improve the performance of specific virtualized assays even further.

Acknowledgments

This work was funded by Janssen Pharmaceutical Companies of Johnson & Johnson.

Supporting Information Available

The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/acs.chemrestox.2c00404.

  • UMAP of chemical representation with varying random states (PDF)

Author Contributions

CRediT: Dorota Herman conceptualization, data curation, formal analysis, investigation, methodology, visualization, writing-original draft, writing-review & editing; Maciej M. Kańduła data curation, formal analysis, writing-original draft, writing-review & editing; Lorena G. A. Freitas data curation, formal analysis, writing-original draft, writing-review & editing; Caressa van Dongen data curation, validation, writing-original draft; Thanh Le Van data curation, writing-review & editing; Natalie Mesens resources, writing-original draft, writing-review & editing; Steffen Jaensch data curation, writing-review & editing; Emmanuel Gustin data curation, writing-original draft; Liesbeth Micholt data curation, writing-review & editing; Charles-Hugues Lardeau data curation; Christos Varsakelis writing-review & editing; Joke Reumers writing-review & editing; Sannah Zoffmann writing-review & editing; Yvonne Will resources, writing-review & editing; Pieter J. Peeters resources, writing-review & editing; Hugo Ceulemans conceptualization, investigation, methodology, resources, supervision, writing-review & editing.

The authors declare no competing financial interest.

Supplementary Material

tx2c00404_si_001.pdf (327.3KB, pdf)

Articles from Chemical Research in Toxicology are provided here courtesy of American Chemical Society
