Abstract
Mitochondrial toxicity is a significant concern in the drug discovery process, as compounds that disrupt the function of these organelles can lead to serious side effects, including liver injury and cardiotoxicity. Different in vitro assays exist to detect mitochondrial toxicity at varying mechanistic levels: disruption of the respiratory chain, disruption of the membrane potential, or general mitochondrial dysfunction. In parallel, whole cell imaging assays like Cell Painting provide a phenotypic overview of the cellular system upon treatment and enable the assessment of mitochondrial health from cell profiling features. In this study, we aim to establish machine learning models for the prediction of mitochondrial toxicity, making the best use of the available data. For this purpose, we first derived highly curated datasets of mitochondrial toxicity, including subsets for different mechanisms of action. Due to the limited amount of labeled data often associated with toxicological endpoints, we investigated the potential of using morphological features from a large Cell Painting screen to label additional compounds and enrich our dataset. Our results suggest that models incorporating morphological profiles perform better in predicting mitochondrial toxicity than those trained on chemical structures alone (up to +0.08 and +0.09 mean MCC in random and cluster cross-validation, respectively). Toxicity labels derived from Cell Painting images improved the predictions on an external test set by up to +0.08 MCC. However, we also found that further research is needed to improve the reliability of Cell Painting image labeling. Overall, our study provides insights into the importance of considering different mechanisms of action when predicting a complex endpoint like mitochondrial disruption as well as into the challenges and opportunities of using Cell Painting data for toxicity prediction.
Introduction
Mitochondria are vital organelles that serve as the primary source of energy production and play a critical role in cell metabolism, signaling, and cell death. Due to their involvement in numerous cellular processes, they have emerged as a key therapeutic target in the development of new drugs. However, compounds that interfere with mitochondrial function by inducing oxidative stress, disrupting the electron transport chain, or inhibiting other mitochondrial functions can also lead to adverse effects, such as drug-induced liver injury.1−5 Therefore, identifying potential mitochondrial toxic compounds early in the drug development process is critical for reducing the risk of adverse reactions in patients.
Traditionally, mitochondrial toxicity has been evaluated using in vitro assays that measure specific endpoints, such as changes in membrane potential or the inhibition of the respiratory chain. However, these assays can be time-consuming and labor-intensive and may not capture the full range of effects that a compound can have on mitochondrial function.6 Moreover, they require a priori knowledge of the mechanism of toxicity, which may not always be available for new compounds. This type of assay data has already been used to build well-performing machine learning (ML) models based on single-source data for electron transport chain inhibitors7 or on compilations of data from several sources and assays related to membrane potential loss.8,9 Recently, the MitoTox database10 was released and offers the opportunity of developing new models that cover a wider chemical space while remaining specific to a mode of action.
Recent advances in high-throughput imaging technologies have enabled the generation of large-scale datasets that capture cellular and organelle morphology in response to chemical perturbations. The Cell Painting assay,11 developed by the Broad Institute, is one such technology that uses fluorescent dyes to label different cellular components and generate images of the perturbed cells. From these images, a collection of image-based features (currently over 3600) is extracted to capture a morphological profile of the perturbation. This approach has the potential to overcome some of the limitations of traditional assays, as it does not require prior knowledge of the mechanism of toxicity, can capture a wide range of effects on mitochondrial function, and can be applied in a high-throughput manner. By analyzing these large amounts of data with ML models, it may be possible to predict the mitochondrial toxicity of new compounds based on their observed morphological phenotypes.12,13
Morphological profiling data has already been used in various areas of toxicity prediction, including mitochondrial toxicity of small molecules and PROTACs,14,15 cytotoxicity,16 biological assays,17 cell health,18 and drug-induced liver injury.19 Seal et al.14 assembled a mitochondrial toxicity dataset containing 382 compounds (with 62 mitotoxic and 320 nontoxic compounds) with chemical, Cell Painting, and gene expression data and evaluated it within cross-validation (CV) and on an external test set of 244 compounds (with 47 mitotoxic compounds). The Cell Painting and gene expression data proved especially useful in combination with the chemical features (by combining either the three feature types as input or the predictions of three individual models). Models combining the three sources of information showed the highest F1 score on the external test set, indicating a better extrapolation ability.
In this work, we aim to improve the prediction of mitochondrial toxicity with ML models by leveraging both chemical descriptors and Cell Painting data. To this end, we compiled and curated publicly available datasets for mitochondrial toxicity, covering both overall unspecific toxicity and mechanism-specific toxicities. We also demonstrate the relevance of these datasets for the development of accurate and efficient predictive models. Moreover, we use Cell Painting data to answer two questions: (i) whether models built on morphological profiles present an advantage compared to models trained on chemical descriptors of the compiled datasets and (ii) whether the outcome of a hypothesis-free and high-throughput assay like the Cell Painting assay could serve to annotate compounds and expand the chemical space of the mitochondrial toxicity datasets, which is one of the biggest limitations for the development of accurate toxicity prediction models.
Materials and Methods
Mechanistic Assays
The data used in this study were collected from several public sources and aggregated into an overall mitochondrial toxicity dataset and mechanism-specific subsets (based on the analyzed mechanism of action or the assay setup).
Data Sources
Data from the Tox21 database (Toxicology in the 21st Century)20 for the quantitative high-throughput screening assay identifying disruptors of the mitochondrial membrane potential was collected from PubChem AID 720637.21 The corresponding test and score sets from the Tox21 challenge22 for the membrane potential assay were also gathered for use as an external test set.
The MitoTox database10 contains data about the effect of over 1400 compounds on different mitochondrial functions. To obtain data subsets of sufficient size for modeling, an analysis was performed to determine the best method for aggregating the data. The analysis involved evaluating the data description, number of results, and consistency of the results for a compound across different categories and species. Based on this evaluation, three subsets of data were extracted: (a) membrane potential assays in human only, (b) respiratory chain inhibitors and uncouplers, and (c) inhibitors of mitochondrial functions (see Supporting Information on the Methods for details).
Further data measuring mitochondrial function loss in HepG2 cells using a respirometric screening assay were collected from Hallinger et al.23
Finally, data generated at Bayer with the glucose-galactose assay in HepG2 cells were also assembled. As opposed to cells grown in glucose, those grown in galactose media rely on mitochondria for the generation of adenosine triphosphate. Hence, the ratio of IC50 values obtained in glucose and galactose media can be used to estimate the degree to which a chemical may disrupt mitochondrial function. Binary labels were derived by defining ratios below 2 as inactive and ratios above 2.5 as active (compounds with ratios between the two thresholds were filtered out of the dataset).
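For illustration, a minimal sketch of this label derivation is given below; the column names are hypothetical, as the in-house data format is not public.

```python
import pandas as pd

def derive_glu_gal_labels(df: pd.DataFrame) -> pd.DataFrame:
    """Derive binary mitotoxicity labels from glucose/galactose IC50 ratios.

    Assumes hypothetical columns 'ic50_glucose' and 'ic50_galactose'.
    Ratios below 2 are labeled inactive (0), ratios above 2.5 active (1);
    compounds with ratios between the thresholds are dropped, as described above.
    """
    df = df.copy()
    df["ratio"] = df["ic50_glucose"] / df["ic50_galactose"]
    df["label"] = pd.NA
    df.loc[df["ratio"] < 2.0, "label"] = 0
    df.loc[df["ratio"] > 2.5, "label"] = 1
    return df.dropna(subset=["label"]).astype({"label": int})
```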
Data Aggregation
Data from the different sources were aggregated based on the canonical SMILES (see Structure Preparation section). When multiple results for a single compound were found, majority voting was used for aggregation. If a tie occurred, the compound was excluded from the corresponding dataset. This approach was followed for all the presented datasets and subsets. The number of results from single and multiple sources, as well as the number of discarded results (due to tied results among sources), are provided in the Supporting Information (Table S1).
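A minimal sketch of this aggregation step, assuming a long-format table with one row per (compound, source) result:

```python
import pandas as pd

def aggregate_labels(records: pd.DataFrame) -> pd.DataFrame:
    """Aggregate per-source binary labels by canonical SMILES with majority voting.

    `records` is assumed to contain columns 'canonical_smiles' and 'label' (0/1),
    one row per result. Compounds with tied votes are removed.
    """
    counts = (
        records.groupby("canonical_smiles")["label"]
        .agg(n_active="sum", n_total="count")
        .reset_index()
    )
    counts["n_inactive"] = counts["n_total"] - counts["n_active"]
    counts = counts[counts["n_active"] != counts["n_inactive"]]  # drop ties
    counts["label"] = (counts["n_active"] > counts["n_inactive"]).astype(int)
    return counts[["canonical_smiles", "label"]]
```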
The overall dataset considers the mitochondrial toxicity result annotated in all sources (except the Bayer in-house dataset), and the final class labels were determined by majority voting, giving all sources the same weight. This resulted in 5920 labeled compounds (1086 toxic and 4834 nontoxic compounds; Table 1). Different approaches for determining the overall class labels were investigated but only had a minor influence on the labels and downstream results (see Supporting Information on the Methods). To avoid an overlap between the training and external test sets, compounds also present in the Tox21 test and score sets were removed from the overall dataset.
Table 1. Overview of Mitochondrial Toxicity Dataset Composition (Mechanistic Datasets).
Endpoint | Total | Active | Inactive |
---|---|---|---|
Overall | 5920 | 1086 | 4834 |
Membrane potential | 5457 | 771 | 4686 |
Respiratory chain | 1079 | 379 | 700 |
Function of mitochondria | 1249 | 537 | 712 |
Tox21 test + score set | 761 | 94 | 667 |
Bayer | 393 | 168 | 225 |
In order to create more specific and less noisy datasets, three subsets of data were identified corresponding to results from related assays or similar mechanisms of action (Table 1):
Membrane potential: contains data from the Tox21 and the membrane potential assays (in human cells) from the MitoTox database (see Data Sources section). Compounds in the Tox21 test and score sets were also removed from this data subset. The dataset for this endpoint covers 5457 compounds, of which 771 are toxic.
Respiratory chain: contains data from the respiratory chain inhibitors and uncouplers from the MitoTox database, as well as the respirometric screening from Hallinger et al.23 This dataset comprises 1079 compounds, of which 379 are toxic.
Function of mitochondria: contains the same data as the respiratory chain endpoint, plus the inhibitors of mitochondrial functions from the MitoTox database. This dataset covers 1249 compounds, of which 537 are toxic.
These datasets, derived from results in mechanism-specific in vitro assays, are referred to as the mechanistic datasets from here on.
Structure Preparation
The chemical structures were standardized using the Standardizer class from the RDKit library,24 starting from the SMILES strings. This involved applying the functions charge_parent, isotope_parent, stereo_parent, and tautomer_parent to each molecule. This standardization protocol was inspired by the MELLODDY-TUNER standardize_smiles workflow.25
To ensure a consistent representation of chemically equivalent structures, the canonical SMILES for each molecule was calculated. Duplicate structures with conflicting labels within the data from a single source were removed. When merging data from different sources, any duplicate results were resolved by majority voting, with ties leading to the removal of the compound from the respective dataset.
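A minimal sketch of an equivalent standardization using RDKit's rdMolStandardize parent functions is shown below; the exact Standardizer workflow used in the study (inspired by MELLODDY-TUNER) may differ in detail.

```python
from typing import Optional

from rdkit import Chem
from rdkit.Chem.MolStandardize import rdMolStandardize

def standardize_smiles(smiles: str) -> Optional[str]:
    """Approximate the described protocol: charge, isotope, stereo, and tautomer
    parents, followed by canonical SMILES generation."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return None
    mol = rdMolStandardize.ChargeParent(mol)    # largest fragment, neutralized
    mol = rdMolStandardize.IsotopeParent(mol)   # strip isotope labels
    mol = rdMolStandardize.StereoParent(mol)    # remove stereochemistry
    mol = rdMolStandardize.TautomerParent(mol)  # canonical tautomer
    return Chem.MolToSmiles(mol)                # canonical SMILES
```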
Chemical Descriptors
The molecular structures were represented using Continuous and Data-Driven molecular Descriptors (CDDDs).26 These descriptors are obtained from a pretrained model that learns to translate between two different molecular representations. For this purpose, information about the molecular structure is embedded in a low-dimensional vector, which is extracted and serves as the CDDD representation of a compound. The code for calculating these descriptors is publicly available and can be readily utilized.27
Models based on other commonly used descriptors (Morgan fingerprints and physicochemical descriptors) were also evaluated but resulted in comparable results to those of the models presented in this study and were not investigated further for the sake of simplicity.
Cell Painting Assay
Cell Painting Data
Images were generated following the Cell Painting protocol, version 3,28 and imaged with an Opera Phenix (PerkinElmer) microscope in 384-well plates. The dataset consists of 32347 proprietary compounds plus 10057 compounds from the Joint Undertaking in Morphological Profiling - Cell Painting (JUMP-CP) consortium,29 which correspond to a subset of the Selleckchem and MedChemExpress bioactive libraries. Each compound was tested in cells from the U2OS cell line at a concentration of 10 μM in at least two biological repetitions. To ensure data consistency and avoid interlaboratory bias, only assay data from the JUMP-CP consortium generated at Bayer were used.
Morphological Profiles
Images were processed with CellProfiler30 bioimage analysis software using the pipelines provided by the JUMP-CP consortium.31 Single cell profiles were then processed with pycytominer workflows32 to produce well-level normalized features. We used profiles normalized by robust z-scoring to the full plate (normalized.csv.gz) since, in our case, they provided the highest replicability. To reduce feature redundancy, we only considered the subset of 560 features reported by Chandrasekaran et al.,33 which resulted from applying feature selection on pilot data from the JUMP-CP consortium based on a variance threshold. Moreover, the normalized cell count was added as an extra feature. Invalid (-inf, +inf, nan) and outlier features (absolute value >1000) were imputed to the dataset median. Finally, the element-wise mean among well profiles from the same perturbation was computed to generate the consensus profiles (one profile per compound) used for modeling.
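A minimal sketch of this post-processing is given below, assuming the well-level, plate-normalized profiles are loaded into a pandas DataFrame with hypothetical 'compound_id' and 'cell_count_normalized' columns (the pycytominer normalization itself is not reproduced here).

```python
import numpy as np
import pandas as pd

def build_consensus_profiles(wells: pd.DataFrame, selected_features: list) -> pd.DataFrame:
    """Post-process plate-normalized well profiles into per-compound consensus profiles.

    `selected_features` is the list of the 560 retained CellProfiler feature names.
    """
    feats = selected_features + ["cell_count_normalized"]  # 560 features + cell count
    X = wells[feats].replace([np.inf, -np.inf], np.nan)     # flag invalid values
    X = X.mask(X.abs() > 1000)                               # flag outliers (|value| > 1000)
    X = X.fillna(X.median())                                 # impute with the dataset median
    X["compound_id"] = wells["compound_id"].values
    # consensus profile: element-wise mean over replicate wells of the same compound
    return X.groupby("compound_id").mean()
```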
Mitochondrial Toxicity Datasets with Morphological Profiles
Datasets of compounds with both a mitochondrial toxicity label and a morphological profile from the Cell Painting assay were assembled. These datasets were obtained by taking the overlap between the individual mitochondrial toxicity datasets and the available Cell Painting data based on the canonical SMILES strings. An overview of the composition of the overlapping data at this stage is provided in the Supporting Information (Table S2).
Some images may show inactive phenotypes (i.e., a morphological profile similar to the DMSO negative control), which can result, for instance, from low concentrations, low exposure time, low permeability, or bioactivities not leading to morphological changes. To improve data quality and remove false negative profiles that could mislead the models, a filtering approach was applied to reduce the number of inactive phenotypes. First, all the plate-normalized CellProfiler30 features (see Morphological Profiles section) were standardized with the StandardScaler class of scikit-learn.34 Subsequently, active profiles were selected following the criteria suggested by Okolo et al.,35 i.e., a percentage of significantly changed features above 5%. However, we reduced the threshold applied to detect significant features (scaled absolute value ≥0.7) in order to include mildly active compounds and increase the dataset size. The collection of active profiles, referred to as Cell Painting datasets from here on, was used for all modeling experiments (Table 2).
Table 2. Overview of the Overlap between the Mitochondrial Toxicity Datasets and the Active Profiles from the Cell Painting Data Used for Modeling (Cell Painting Datasets).
Endpoint | Total | Active | Inactive |
---|---|---|---|
Overall | 681 | 193 | 488 |
Membrane potential | 643 | 158 | 485 |
Respiratory chain | 166 | 78 | 88 |
Function of mitochondria | 179 | 91 | 88 |
Tox21 test set | 60 | 17 | 43 |
Bayer | 27 | 14 | 13 |
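A minimal sketch of the active-profile filter described above (a feature counts as significantly changed if its scaled absolute value is ≥0.7, and a profile is kept as active if more than 5% of its features are significantly changed):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

def select_active_profiles(profiles: np.ndarray,
                           feature_threshold: float = 0.7,
                           min_fraction: float = 0.05) -> np.ndarray:
    """Return a boolean mask flagging active morphological profiles."""
    scaled = StandardScaler().fit_transform(profiles)
    # fraction of significantly changed features per profile
    frac_changed = (np.abs(scaled) >= feature_threshold).mean(axis=1)
    return frac_changed > min_fraction
```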
Chemical Space Analysis
To visualize the chemical space of the mechanistic datasets, a dimension reduction with the Uniform Manifold Approximation and Projection (UMAP)36 algorithm was conducted based on the CDDDs (with the parameters n_neighbors = 15, min_dist = 0.2, metric = “euclidean”). The dimension reduction was based on the complete mechanistic dataset, and the datapoints corresponding to the mechanism-specific subsets were extracted into further individual plots.
UMAP was also employed to project the chemical space of the Cell Painting datasets based on the CDDDs as well as on the subset of morphological profiles used for modeling (see section Morphological Profiles). For comparability, the plots based on CDDDs for the Cell Painting datasets (containing a subset of the compounds in the mechanistic datasets) were extracted from the same transformation as the plots for the mechanistic datasets. The plots based on morphological profiles were derived by applying the dimensionality reduction on the compounds from the Cell Painting datasets only.
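A minimal sketch of the projection is shown below, assuming the CDDD matrices are available as NumPy arrays; the random_state is an arbitrary choice for reproducibility and not taken from the study.

```python
import numpy as np
import umap  # umap-learn package

def project_chemical_space(cddd_all: np.ndarray, subset_idx: np.ndarray):
    """Fit UMAP on the full mechanistic dataset and reuse the same transformation
    for the Cell Painting subset by selecting the corresponding rows."""
    reducer = umap.UMAP(n_neighbors=15, min_dist=0.2, metric="euclidean", random_state=0)
    embedding_all = reducer.fit_transform(cddd_all)   # complete mechanistic dataset
    embedding_subset = embedding_all[subset_idx]      # Cell Painting compounds, same coordinates
    return embedding_all, embedding_subset
```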
Model Development
Development of Mechanistic Models
ML models were built on the mechanistic datasets for the overall, membrane potential, respiratory chain, function of mitochondria, and Bayer endpoints. For developing these models, random forest (RF) and fully connected neural network (NN) models were trained on CDDDs.
The RF models were developed within a nested CV: an outer CV splitting the data into training and test sets for model evaluation (see Model Performance Evaluation) and an inner CV splitting each training set to obtain validation sets for hyperparameter optimization. The hyperparameter optimization and the model training were conducted with the GridSearchCV and RandomForestClassifier implementations from scikit-learn34 (see Supporting Information on the Methods for details on the hyperparameters).
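A minimal sketch of the nested CV scheme is given below. For simplicity, a stratified random outer split is shown, whereas the study used predefined random and cluster splits (see Model Performance Evaluation); the hyperparameter grid and the number of inner folds are placeholders, with the actual settings listed in the Supporting Information.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import make_scorer, matthews_corrcoef
from sklearn.model_selection import GridSearchCV, StratifiedKFold

def nested_cv_rf(X: np.ndarray, y: np.ndarray, param_grid: dict,
                 outer_splits: int = 7, inner_splits: int = 5) -> list:
    """Nested CV: the outer loop yields test folds for evaluation, while the
    inner GridSearchCV tunes hyperparameters on each outer training fold."""
    mcc_scorer = make_scorer(matthews_corrcoef)
    outer = StratifiedKFold(n_splits=outer_splits, shuffle=True, random_state=0)
    scores = []
    for train_idx, test_idx in outer.split(X, y):
        search = GridSearchCV(RandomForestClassifier(random_state=0),
                              param_grid, scoring=mcc_scorer, cv=inner_splits)
        search.fit(X[train_idx], y[train_idx])
        scores.append(matthews_corrcoef(y[test_idx], search.predict(X[test_idx])))
    return scores

# example call: nested_cv_rf(X, y, {"n_estimators": [200, 500], "max_depth": [None, 20]})
```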
For the NN models, a validation set was first split from the data, and a CV was performed on the remaining data to obtain the training and test sets for model evaluation. The nested CV was not used in this case to avoid running too many computationally intensive hyperparameter optimizations. The NN models were implemented with PyTorch,37 and the hyperparameters were optimized on the validation set with Optuna.38 The hyperparameter space covered different numbers of layers, numbers of units, learning rates, dropout rates, weight decay rates, and epochs (see Supporting Information on the Methods for details). The weights given to the active and inactive classes when calculating the loss of NN models were balanced (based on the class ratio of each dataset) to penalize errors committed on the minority class more strongly and hence account for the class imbalance in the datasets.
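The exact weighting scheme is not specified; one common way to balance a binary loss in PyTorch, sketched under that assumption, is to up-weight the positive (toxic) class by the inactive/active ratio of the training set.

```python
import torch
import torch.nn as nn

def balanced_bce_loss(y_train: torch.Tensor) -> nn.BCEWithLogitsLoss:
    """Binary cross-entropy with the toxic class up-weighted by the class ratio.

    For the overall dataset (4834 inactive vs 1086 active compounds) the
    resulting pos_weight would be roughly 4.5.
    """
    n_active = y_train.sum()
    n_inactive = y_train.numel() - n_active
    pos_weight = n_inactive / n_active
    return nn.BCEWithLogitsLoss(pos_weight=pos_weight)
```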
For each endpoint, the mean MCCs obtained on the test sets during cluster split CV (see Model Performance Evaluation) for the RF and NN models were compared and the results from the best model were reported (Table 4). These models are referred to as mechanistic models from here on.
Table 4. Mean and Standard Deviation of the Performance of the Best Model for Each Endpoint within Random and Cluster CV.
Endpoint | Method | CV | MCC | F1 score | Recall | Precision | Specificity | AUC
---|---|---|---|---|---|---|---|---
Overall | NN | Random | 0.54 (±0.05) | 0.62 (±0.04) | 0.61 (±0.05) | 0.63 (±0.04) | 0.92 (±0.01) | 0.88 (±0.02)
Overall | NN | Cluster | 0.46 (±0.04) | 0.55 (±0.03) | 0.53 (±0.08) | 0.58 (±0.05) | 0.91 (±0.04) | 0.86 (±0.03)
Membrane potential | NN | Random | 0.64 (±0.03) | 0.68 (±0.02) | 0.67 (±0.05) | 0.70 (±0.03) | 0.95 (±0.01) | 0.92 (±0.01)
Membrane potential | NN | Cluster | 0.53 (±0.08) | 0.59 (±0.07) | 0.63 (±0.09) | 0.57 (±0.10) | 0.92 (±0.02) | 0.89 (±0.03)
Respiratory chain | RF | Random | 0.37 (±0.13) | 0.59 (±0.07) | 0.59 (±0.10) | 0.59 (±0.06) | 0.78 (±0.05) | 0.77 (±0.07)
Respiratory chain | RF | Cluster | 0.30 (±0.09) | 0.52 (±0.11) | 0.53 (±0.18) | 0.54 (±0.12) | 0.77 (±0.11) | 0.72 (±0.05)
Function of mitochondria | RF | Random | 0.41 (±0.11) | 0.68 (±0.05) | 0.75 (±0.08) | 0.62 (±0.03) | 0.66 (±0.05) | 0.77 (±0.06)
Function of mitochondria | RF | Cluster | 0.34 (±0.09) | 0.63 (±0.08) | 0.69 (±0.13) | 0.59 (±0.11) | 0.65 (±0.14) | 0.73 (±0.05)
Bayer | RF | Random | 0.45 (±0.10) | 0.68 (±0.07) | 0.70 (±0.06) | 0.68 (±0.15) | 0.76 (±0.10) | 0.81 (±0.05)
Bayer | RF | Cluster | 0.51 (±0.16) | 0.70 (±0.09) | 0.68 (±0.15) | 0.78 (±0.19) | 0.79 (±0.19) | 0.79 (±0.07)
Development of Cell Painting Models
Given the small size of the Cell Painting datasets (see Mitochondrial Toxicity Datasets with Morphological Profiles), only RF models (trained following the same approach described for the mechanistic models) were considered for these data. Note that NN models were also evaluated but led to similar or worse results and were hence not further investigated. Models based on (a) CDDDs, (b) morphological profiles, or (c) CDDDs and morphological profiles were trained on the Cell Painting datasets. The models trained on CDDDs served as a comparison to evaluate the benefits of morphological profiles for mitochondrial toxicity prediction.
Model Training Including Pseudolabels Calculated by Cell Painting Models
The models trained on morphological profiles and CDDDs (described in the section Development of Cell Painting Models) were applied on the remaining Cell Painting profiles with active phenotypes, for which no experimental mitochondrial toxicity annotation is available. The predicted mitochondrial toxicity values from the overall and membrane potential endpoints were used as labels (referred to as pseudolabels) to expand the mechanistic datasets and try to improve the performance of models based on chemical features only. Only these two endpoints were considered, as the performance of the Cell Painting models for the other endpoints was too poor.
Models were trained on the combination of experimentally measured compounds and an increasingly large subset of pseudolabeled compounds. Four main approaches for selecting the pseudolabels to add to the model at each step were investigated:
(a) Confidence-based approach: the 500 pseudolabels predicted with the highest confidence (i.e., the highest and lowest probabilities of being active) were added to the training data of the model. To maintain the data balance, the same number of active and inactive pseudolabels (250 each) was added at each step (a minimal selection sketch is given after this list).

(b) Minority-enriched approach: 500 pseudolabels from the minority class (active class) predicted with the highest confidence (i.e., the highest probability of being active) were added to the training data at each step.

(c) Active learning approach: 500 unlabeled compounds for which the predictions with the mechanistic models were most uncertain (i.e., closest to the decision threshold) were pseudolabeled and added to the training data. This process was repeated at each iteration by retraining the model with the newly pseudolabeled data to select the next batch of uncertain compounds (based on the predictions of the retrained model).

(d) Combined approach: the same procedure as in the active learning approach was followed to select the most relevant compounds, but a filter was applied on the predicted probability of the pseudolabels to remove low-confidence pseudolabels (with a predicted probability between 0.30 and 0.70). Due to the probability filter, the number of available pseudolabels in this approach is smaller, and 200 pseudolabels were added at each step.
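As referenced in approach (a), a minimal sketch of the confidence-based selection step is shown below, assuming an array of predicted probabilities of being active for the unlabeled compounds.

```python
import numpy as np
import pandas as pd

def select_confident_pseudolabels(proba_active: np.ndarray, batch_size: int = 500) -> pd.DataFrame:
    """Select the batch_size/2 compounds with the lowest and the highest predicted
    probability of being active and pseudolabel them as inactive (0) and active (1)."""
    order = np.argsort(proba_active)
    inactive_idx = order[: batch_size // 2]   # lowest probabilities -> inactive
    active_idx = order[-(batch_size // 2):]   # highest probabilities -> active
    return pd.DataFrame({
        "index": np.concatenate([inactive_idx, active_idx]),
        "pseudolabel": [0] * (batch_size // 2) + [1] * (batch_size // 2),
    })
```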
To ensure that any observed performance improvement was not solely due to the increased amount of labeled data, we also compared these four approaches with the addition of the same subset of compounds used in the confidence-based approach, but with randomly assigned toxic and nontoxic labels. This analysis served to evaluate whether the quality of the pseudolabels was indeed superior to that of random labeling.
All models trained on the expanded datasets were based on NN following the same approach as described in the section Development of Mechanistic Models and evaluated only on the true experimental labels.
Multi-Task Model Development
Multi-task models were trained on five experimental mitochondrial toxicity tasks (from the overall, membrane potential, respiratory chain, function of mitochondria, and Bayer in-house datasets) either alone or in combination with tasks composed of pseudolabeled data. The pseudolabels were calculated by models trained on CDDDs and morphological profiles for the overall and membrane potential endpoints (see Development of Cell Painting Models). Two different multi-task models with pseudolabeled data were developed: (a) including two tasks containing all available pseudolabels for the overall and membrane potential endpoints (24313 and 24373 pseudolabeled compounds, respectively) and (b) including two tasks containing the subset of the 8000 most certain pseudolabels (4000 active and 4000 inactive compounds) for each of the two endpoints. The datasets without pseudolabels, with all pseudolabels, and with a subset of pseudolabels were composed of 7219, 31421, and 16332 compounds, corresponding to five, seven, and seven tasks, respectively.
The multi-task models were implemented with PyTorch,37 and the hyperparameters were optimized on the validation set over a variety of hyperparameter combinations (Table S3).
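The multi-task architecture is not described in detail here; the sketch below is a minimal example under the assumption of a shared trunk with one output logit per task and a loss masked to the (compound, task) pairs with known labels (layer sizes are illustrative, not the optimized hyperparameters).

```python
import torch
import torch.nn as nn

class MultiTaskNet(nn.Module):
    """Shared trunk on the CDDD input with one output logit per toxicity task."""
    def __init__(self, n_features: int = 512, n_tasks: int = 7, hidden: int = 256):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(n_features, hidden), nn.ReLU(), nn.Dropout(0.3),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.heads = nn.Linear(hidden, n_tasks)  # one logit per task

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.heads(self.trunk(x))

def masked_multitask_loss(logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """Compute BCE only on (compound, task) pairs with a known label;
    missing labels are assumed to be encoded as NaN and are masked out."""
    mask = ~torch.isnan(labels)
    return nn.functional.binary_cross_entropy_with_logits(logits[mask], labels[mask])
```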
Model Performance Evaluation
Models were evaluated within a 7-fold cross-validation (CV) framework based on both random splits (dividing the dataset into random groups) and cluster splits (dividing the dataset into groups based on chemical similarity). The number of folds was fixed at seven, as this resulted in clusters of comparable size.
Cluster splits were employed to avoid evaluating the models on structures overly similar to those contained in the training set, in order to assess the generalization capacity of the models more accurately. The clusters were derived by applying the Butina clustering algorithm from the deepchem library39 (with a Tanimoto similarity cutoff of 0.6 based on Morgan fingerprints) on the mechanistic datasets. The assigned splits on the mechanistic datasets were also transferred to the Cell Painting datasets.
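A minimal sketch of the clustering step using RDKit's Butina implementation directly (the study used the deepchem wrapper) is given below; note that ClusterData expects a Tanimoto distance threshold, so the stated similarity cutoff is converted as 1 − similarity, which may differ from the deepchem parameter semantics.

```python
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem
from rdkit.ML.Cluster import Butina

def butina_clusters(smiles_list, similarity_cutoff: float = 0.6):
    """Cluster compounds with the Butina algorithm on Morgan fingerprints."""
    mols = [Chem.MolFromSmiles(s) for s in smiles_list]
    fps = [AllChem.GetMorganFingerprintAsBitVect(m, 2, nBits=2048) for m in mols]
    # condensed lower-triangular distance matrix expected by Butina.ClusterData
    dists = []
    for i in range(1, len(fps)):
        sims = DataStructs.BulkTanimotoSimilarity(fps[i], fps[:i])
        dists.extend(1.0 - s for s in sims)
    dist_thresh = 1.0 - similarity_cutoff  # Butina works on distances
    return Butina.ClusterData(dists, len(fps), dist_thresh, isDistData=True)
```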
To ensure comparability between models with and without pseudolabels in their training set, as well as multi-task models, the initial splits computed for the mechanistic datasets were maintained, and new datapoints were assigned to one of the clusters using the k-nearest neighbor implementation from scikit-learn34 (based on Morgan fingerprints). The reported metrics obtained from models partially trained on pseudolabels were exclusively calculated on the experimentally obtained labels.
Results and Discussion
Chemical Space Analysis
To gain insights into the chemical space of the mitochondrial toxicity datasets, we applied UMAP36 to reduce the dimensionality of the chemical descriptors (CDDDs) of compounds in the mechanistic datasets (Figure 1). The chemical space shows some clusters enriched with known mitochondrial toxicants, whereas others appear to contain fewer toxic compounds, suggesting that the chemical characteristics of the molecules play an important role in their toxicity. Moreover, the location of these clusters seems relatively consistent along the different mechanism-specific datasets. While the Bayer in-house dataset covers only a small portion of the chemical space represented by the public data, it is located in close proximity to the public compounds.
Figure 1.
UMAP projection based on CDDDs from the complete mitochondrial toxicity dataset (mechanistic dataset) and subdivided in the different mechanism-specific datasets.
The chemical space of the Cell Painting datasets (the subset of data for which active Cell Painting profile data is available) was also analyzed with UMAP based on either CDDDs or morphological profiles (Figure 2). The separation of mitochondrial toxic compounds from nontoxic compounds is visible and similarly prominent in both cases, indicating that both feature sets contain data patterns that may help identify toxic compounds. A more in-depth data inspection showed that toxic compounds clustering together in the chemical space of one feature set are widely spread in the space of the other feature set. Hence, it can be inferred that both feature sets may provide useful and complementary information for the prediction of mitochondrial toxicity.
Figure 2.
UMAP projection of the subset of compounds with Cell Painting data (Cell Painting datasets) based on CDDDs (left) and on morphological profiles (right) on the example of the overall mitochondrial toxicity dataset. The same data transformation as in Figure 1 was used for the projection based on CDDDs, in order to identify the fraction of the mechanistic datasets covered by the Cell Painting datasets. The projection based on morphological profiles was conducted considering only the Cell Painting datasets.
Differences among Mechanism-Specific Datasets
We investigated how class labels vary between mechanism-specific datasets for those compounds annotated in different datasets (Table 3). The analysis showed that a few toxic compounds in the overall dataset are labeled as nontoxic for some mechanism-specific endpoints. This agrees with the intuition that mitochondrial toxic compounds can be toxic via a single mechanism of action and not necessarily through several of them. The contrary observation (nontoxic compounds in the overall dataset and toxic in any mechanism-specific dataset) was not found.
Table 3. Differing Class Labels between Overlapping Compounds in Different Datasets.
Active \ Inactive | Overall | Membrane potential | Respiratory chain | Function of mitochondria
---|---|---|---|---
Overall | - | 9 | 23 | 21
Membrane potential | 0 | - | 61 | 63
Respiratory chain | 0 | 73 | - | 0
Function of mitochondria | 0 | 94 | 0 | -

Rows indicate the dataset in which the compounds are labeled active; columns indicate the dataset in which the same compounds are labeled inactive.
Moreover, the discrepancies between the membrane potential and respiratory chain or function of mitochondria datasets indicate that the creation of mechanism-specific datasets is pertinent and can help to reduce the noise of the data. Since the function of mitochondria could be considered an extended version of the respiratory chain dataset (extended by data for the inhibition of further mitochondrial functions), no disagreement was found in the labels of these two datasets. This analysis could not be conducted for the Bayer in-house dataset, as no overlap was found with the public data.
Mechanistic Models
RF and fully connected NN models trained on CDDDs were built for all endpoints in the mechanistic datasets (see section Data Aggregation). For each endpoint, the best model out of the two was selected and made available in the Supporting Information.
The best performance was obtained for the membrane potential endpoint, possibly due to the better relationship between data size and quality (Table 4). This endpoint has the largest dataset among those that focus on a specific mode of action instead of a combination of them, as is the case for the overall dataset. For the membrane potential endpoint, a mean MCC of 0.64 (±0.03) was obtained during random CV, while for the overall task the mean MCC was 0.54 (±0.05; Table 4). For the endpoints with smaller datasets (respiratory chain, function of mitochondria, and Bayer), the mean MCC ranged between 0.37 and 0.45, with standard deviations between 0.10 and 0.13, indicating a high performance variability across the analyzed data splits. As the data for the inhibition of the respiratory chain and other functions of mitochondria are scarce, these endpoints combine data from related but different assays and mechanisms and hence yield less specific results. This fact, together with the smaller dataset sizes, could explain the poorer performance of these models.
Despite the small dataset (of only 393 compounds), the model trained on the Bayer in-house data showed promising performance. Since all results were obtained from the same assay and laboratory, the noise in the class labels is potentially lower than in other mechanism-specific datasets. However, due to the limited number and narrow range of compounds used for training, the ability of this model to accurately predict the toxicity of compounds highly different from the training data is likely to be limited.
It was also observed that for the endpoints with larger datasets (overall and membrane potential) the NN models perform best, while for smaller datasets RF is the preferred method. This trend may be due to the fact that the higher complexity of NN models is often only advantageous when there is sufficient data available, as otherwise, these models tend to overfit the training set.
Performance on an External Test Set and Comparison with Literature Models
An external test set composed of the Tox21 test and score datasets, corresponding to membrane potential loss assay data, was used to evaluate the overall and membrane potential models. Our results show that models trained on the membrane potential endpoint (based on data from the same mechanism of action as for the test set) perform better on the external test set than the less specific overall models (Table 5).
Table 5. Performance of the Models Developed in this Study and Literature Models on an External Test Set of Membrane Potential Loss.
Model | Method | MCC | F1 score | Recall | Precision | Specificity |
---|---|---|---|---|---|---|
Overall | NN | 0.48 | 0.54 | 0.68 | 0.45 | 0.88 |
Membrane potential | NN | 0.57 | 0.62 | 0.66 | 0.59 | 0.94 |
Hemmerich et al.8,a | NN | 0.53 | 0.59 | 0.57 | 0.61 | 0.95 |
Bringezu et al.40,a | Ensemble of undersampled gradient boosting models | 0.61 | - | 0.92 | - | 0.87 |
a Results taken from the original papers; not recalculated.
This external test set was also used in other publications to evaluate the performance of their developed models, which enables the comparison of methods and training data quality. Our membrane potential model shows higher performance than the model developed by Hemmerich et al.,8 which was also trained using NN on a slightly larger dataset (5761 compounds, including 824 toxic compounds) from membrane potential loss assays as well as other (unspecified) assays. This finding provides further evidence that utilizing mechanism-specific data can improve the prediction of mitochondrial membrane potential loss.
Bringezu et al.40 developed models using the same dataset as Hemmerich et al. Their models are ensembles of undersampled gradient boosting models, where each model is based on a subset of data with a good class balance. These models show higher MCC, but especially a much higher recall (due to the balanced data used in model training) compared to both the membrane potential models and the models from Hemmerich et al. Nevertheless, the increase in recall comes at the cost of a higher false positive rate, as reflected by the lower specificity. Further investigation on this improved modeling approach trained on our membrane potential dataset would be of interest to fully assess the potential of this combination in the future. Other published models7,9 were not included in this analysis, as they were evaluated on test sets that are not directly comparable, and their code or models are not available for replication.
Cell Painting Models
By exposing cells to mitochondrial toxic chemicals, the morphology and appearance of the mitochondria may change, leading to unique phenotypic properties that can be captured in cell images. For instance, cells treated with papaverine show morphological changes compared to the negative control, such as reduced intensity and increased perinuclear localization of the mitochondrial channel (red channel; Figure 3). To explore the potential of these morphological properties for the prediction of mitochondrial toxicity, we compared the predictive performance of ML models trained on chemical descriptors (CDDDs) and Cell Painting profiles or the combination of both. These models were trained on the subset of data for which both Cell Painting profiles and mitochondrial toxicity annotations were available and are referred to as Cell Painting models (Figure 4).
Figure 3.
Example images of cells treated with the negative control DMSO (A) and cells treated with papaverine (B), an inhibitor of complex I in the mitochondrial respiratory chain. The right image zooms in on the upper left corner of the middle examples.
Figure 4.
Overview of the different sources of data and the datasets used for training each type of model, as well as for the calculation of the pseudolabels.
In general, the performance of models trained on morphological profiles alone was similar to or worse than that of models trained on CDDDs (Figure 5). The results on the overall and membrane potential endpoints were comparable between both types of models, with changes between +0.03 and −0.07 in the MCC of the profile-based model relative to the chemistry-based model. The endpoint benefiting most from using morphological features for model training was the overall endpoint, which has the biggest dataset (composed of 705 compounds). Besides the higher amount of data, one possible explanation for the larger performance gain on the overall endpoint compared to the membrane potential endpoint is that images may capture any type of change in mitochondria, rather than specifically identifying membrane potential loss.
Figure 5.
MCC obtained during random (green) and cluster (orange) CV using chemical descriptors (CDDDs), morphological profiles, or a combination of both as input for the ML models. None of the differences in performance were significant in the Mann–Whitney U test.
Models trained on the combination of morphological profiles and CDDDs showed the best performance for these two endpoints, during both random and cluster CV. Moreover, these models show not only a slight performance increase but also a more stable performance across CV runs (especially for the membrane potential endpoint; Figure 5). However, a statistical analysis of the CV results with the Mann–Whitney U test showed that the difference in the performance is not significant between any pair of models (trained on only one descriptor or the combination of both).
For the two endpoints with the smallest datasets, namely, respiratory chain and function of mitochondria (with dataset sizes of 166 and 179 compounds, respectively), the MCC obtained with the profile-based models dropped by up to 0.38 points compared to the chemistry-based model. These results may indicate that these mechanisms of action cause no (or only subtle) morphological changes, which are thus not captured by the profile features. Moreover, the amount of available data may also be influencing these results. If the data structures present in morphological profiles are more complex or subtle, ML models would need more data to learn the patterns that lead to the toxicity labels.
Another possible factor that may contribute to the limited improvement in performance when adding morphological features is the potential variation in mitochondrial susceptibility across different cell lines. The Cell Painting assay is performed on U2OS cells, chosen for their suitability for this purpose, while the mechanistic mitochondrial assays used to derive the toxicity labels primarily employ HepG2 cells. These potential differences could introduce noise into the data, offsetting the information gained from morphological profile features.
A detailed overview of the results obtained for the different models and endpoints is provided in the Supporting Information (Table S4). Overall, these observations suggest that both feature types contain relevant and complementary information for the prediction of mitochondrial toxicity. In addition, our results suggest that morphological profiles are more suitable to predict general mitochondrial toxicity than specific mechanisms of action.
Similar observations were made by Seal et al.14 when comparing models trained on Morgan fingerprints, Cell Painting data, and gene expression data, as well as the combination of the three. The biological feature sets by themselves did not significantly outperform the chemical descriptors, but their combination, either merging the input features of the three different sets (early stage fusion) or averaging the predictions obtained by three individual models (late stage fusion), seemed beneficial, especially for increasing the predictivity on the external test set. In some studies, late fusion models seemed to slightly outperform early stage fusion models.41 However, in our case, late stage fusion models did not show any benefits and were not further investigated. Alternative ways of combining chemical and morphological profiles have also been explored in other studies42,43 and could be tested in follow-up analysis.
We also evaluated the performance of the Cell Painting models on the subset of the Tox21 external test set for which Cell Painting data were available (60 compounds with 17 active compounds). The predictivity of the models trained on morphological profiles (alone or in combination with CDDDs) on the external test set was worse than for the chemistry-based model. However, inconsistencies between the evaluation of the mechanistic models on the full test set (761 compounds) and the test subset with Cell Painting data (60 compounds) suggest that the latter might be too small for a reliable performance estimate. For instance, we observe a large performance decrease in the membrane potential mechanistic model between the full test set (MCC of 0.57; Table 5) and the Cell Painting subset (MCC of 0.41; Table 6), while the overall mechanistic model shows a performance increase (+0.06 MCC). Moreover, the inflated performance of Cell Painting models trained on CDDDs compared to mechanistic models (trained also on CDDDs but on a much larger dataset) might also be indicative of a data bias (Table 6).
Table 6. Performance of Different Models on a Subset of 60 Compounds from the External Test Set with Cell Painting Data.
Endpoint | Model | Input features | MCC | F1 score | Recall | Precision | Specificity | AUC
---|---|---|---|---|---|---|---|---
Overall | Mechanistic | CDDD | 0.54 | 0.67 | 0.65 | 0.69 | 0.88 | 0.88
Overall | Cell Painting | CDDD | 0.64 | 0.74 | 0.76 | 0.72 | 0.88 | 0.93
Overall | Cell Painting | profile | 0.55 | 0.68 | 0.76 | 0.62 | 0.81 | 0.86
Overall | Cell Painting | profile+CDDD | 0.53 | 0.67 | 0.71 | 0.63 | 0.84 | 0.87
Membrane potential | Mechanistic | CDDD | 0.41 | 0.48 | 0.35 | 0.75 | 0.95 | 0.86
Membrane potential | Cell Painting | CDDD | 0.72 | 0.80 | 0.82 | 0.78 | 0.91 | 0.94
Membrane potential | Cell Painting | profile | 0.58 | 0.70 | 0.76 | 0.65 | 0.83 | 0.88
Membrane potential | Cell Painting | profile+CDDD | 0.61 | 0.72 | 0.76 | 0.68 | 0.86 | 0.87
Another relevant result from our study is the importance of properly cleaning the Cell Painting data prior to modeling. Cell Painting assays were performed at a fixed concentration of 10 μM on the U2OS cell line. However, some compounds may show activity only at higher concentrations and result in inactive image profiles highly similar to that of the negative control. Moreover, other parameters such as the exposure time or tested cell line may prevent morphological changes after treatment. To limit the noise in the data and increase the accuracy of the predictions, inactive profiles were filtered out from the datasets (see section Mitochondrial Toxicity Datasets with Morphological Profiles).
The comparison of the results obtained by training models on the filtered profiles and on all (unfiltered) profiles confirmed the presence of noise in these data. Models trained on all profiles showed lower performance, especially regarding the F1 score (see Figure S1). The most dramatic performance decreases were observed for models trained on morphological profiles (either alone or in combination with chemical descriptors). The fact that models trained on chemical descriptors alone also showed a slight decrease in performance on the unfiltered datasets during cluster CV may indicate that the filtered compounds are less active or have a less clear mitochondrial toxicity label, making them more difficult to predict.
Dataset Expansion with Pseudolabels Calculated by Cell Painting Models
One limitation of models based on morphological profiles is the necessity of conducting the Cell Painting assay before applying the models to new compounds in order to obtain the input features for the prediction. To make the ML models more practical, it would be desirable to predict toxicity directly from the chemical structure without synthesizing and testing the compounds beforehand. However, one of the biggest limitations of toxicity prediction models based on chemical structures is usually the small amount of experimental data and the limited coverage of the chemical space. To address this limitation, we investigated whether Cell Painting assay results could be used to annotate compounds in a high-throughput manner. For this purpose, models trained on morphological profiles (and chemical descriptors) were used to predict labels for mitochondrial toxicity for the remaining, unlabeled Cell Painting data (Figure 4). We refer to these inferred labels as "pseudolabels" in the remainder of this work.
The predicted pseudolabels were added sequentially to the original mitochondrial toxicity datasets to evaluate how models may benefit from different amounts and compositions of pseudolabeled data. The models trained on the combination of experimental and pseudolabeled data were evaluated only on the subset of data with experimental labels. Note that to ensure direct comparability, the data splits used for the evaluation of the mechanistic models (see section Mechanistic Models) were maintained and pseudolabeled compounds were assigned to the existing splits (see section Model Performance Evaluation). This study was conducted only for the overall and membrane potential endpoints due to the limited amount of labeled profiles for other mitochondrial toxicity endpoints as well as the poor performance of the models trained on these data.
Pseudolabels were added to the models iteratively following four compound selection approaches (see section Model Training Including Pseudolabels Calculated by Cell Painting Models for more details): (i) the confidence-based approach prioritizes compounds with the most reliable pseudolabels (based on the predicted probability); (ii) the minority-enriched approach prioritizes compounds with the most reliable pseudolabels from the minority class; (iii) the active learning approach prioritizes compounds for which the mechanistic model is most uncertain; and (iv) the combined approach combines the active learning approach with a reliability threshold for the pseudolabels.
The models trained on the expanded datasets were based on the NN model, as this method scales better to larger datasets in terms of both training speed and performance. Nevertheless, the same approach was also tested on RF models with similar results (data not shown). The best MCC obtained for each endpoint over all combinations of approaches and numbers of included pseudolabels is reported in Table 7. Details on the best results yielded by each approach are provided in the Supporting Information (Table S5).
Table 7. Mean and Standard Deviation of the Performance of the Mechanistic Models and the Best Models Trained on an Extended Dataset with Pseudolabels.
Endpoint | CV | Baseline MCC | MCC with extended training set | Number of pseudolabels | Approach
---|---|---|---|---|---
Overall | Random | 0.54 (±0.05) | 0.58 (±0.03) | 1200 | Combined approach
Overall | Cluster | 0.46 (±0.04) | 0.52 (±0.05) | 8000 | Confidence-based approach
Membrane potential | Random | 0.64 (±0.03) | 0.65 (±0.03) | 4000 | Active learning approach
Membrane potential | Cluster | 0.53 (±0.08) | 0.57 (±0.08) | 8500 | Confidence-based approach
Overall, model performance increased by up to +0.06 and +0.04 mean MCC for the overall and membrane potential endpoints, respectively, when adding pseudolabels to the training data. The highest performance improvements were observed during cluster CV, while the improvement during random CV was minor. These results indicate that the models trained on the extended datasets generalize slightly better to new chemical scaffolds but do not significantly improve the predictions on the already covered chemical space.
No significant differences were observed between the results yielded by the four different approaches (Figure 6). In general, model performance tended to improve after adding the first pseudolabels (up to around 8000 pseudolabels) and to decrease with larger numbers of pseudolabels. These results were compared to the performance of models trained on data in which the pseudolabels were exchanged by randomized labels. The performance obtained with any of the four approaches was better than that with the randomized labels, which, as expected, caused the performance to continuously drop. This demonstrates that pseudolabels derived from Cell Painting profiles provide reasonable annotations of mitochondrial toxicity, as they did not reduce model performance.
Figure 6.
Changes in the MCC obtained for the overall and membrane potential endpoints as pseudolabels selected with four different approaches are added to the training data. For comparison, the addition of compounds with randomized pseudolabels is also shown (purple). For the combined approach (red), a smaller number of pseudolabeled data is available, as only reliable pseudolabels are considered.
The evaluation of the models with extended datasets on the Tox21 external test set did not show better results than the original mechanistic models, suggesting that the extrapolation to the chemical space of this test set is not significantly enhanced by the pseudolabeled data.
Multi-Task Models with and without Pseudolabeled Data
Multi-task models were also explored to try to improve the performance of the single-task mechanistic models and as another possible way of extracting relevant information from the pseudolabeled data. These models are trained on multiple tasks simultaneously to allow for learning shared features that can be suitable for all tasks. Multi-task models have been shown to be useful when there is limited data for each individual task and/or when there are strong relationships between the tasks and shared features that can help the model make more informed predictions.44−46
Here, multi-task models were trained on the mechanistic datasets alone (including five tasks; see Multi-Task Model Development section) or in combination with two datasets of pseudolabeled data for the overall and membrane potential endpoints. Two different strategies were considered for adding tasks with pseudolabeled data: (a) adding all available pseudolabels (over 24000) or (b) adding a subset of the 8000 pseudolabels predicted with the highest (or lowest) probability in order to consider only the most reliable pseudolabels.
In this case, the combination of several tasks (with or without the addition of tasks with pseudolabels) rarely resulted in an improvement in the predictive performance over the single-task mechanistic models (Figure 7). The distribution of the MCC obtained during cluster CV was slightly enhanced for the overall and membrane potential endpoints when including additional tasks with the subset of the most reliable pseudolabels. This improvement was mainly caused by a higher recall, often counteracted by a lower precision (Tables S3 and S4). Overall, the mean MCC of these multi-task models showed improvements of only +0.03 and +0.01 for the respective endpoints. The Mann–Whitney U test also showed that the difference in the MCC between the single-task model and any multi-task model is not significant. Possible explanations for the minor performance improvement are the absence of complementary information among tasks or the inability of the model to derive a more relevant representation of the problem than those for the single-task models.
Figure 7.
MCC obtained for all the mitochondrial toxicity endpoints with the respective best single-task model (mechanistic single-task) and different multi-task models. Three multi-task models were evaluated: including only experimental data (mechanistic multi-task), adding also two tasks with all the available pseudolabels of the overall and membrane potential endpoints (all-pseudolabels multi-task), or adding two tasks containing only the subset of the most reliable pseudolabels (subset-pseudolabels multi-task). None of the differences in performance were significant in the Mann–Whitney U test.
The multi-task models were also applied on the external test set of membrane potential loss to determine whether the generalization capabilities of the models had improved. In the case of the overall endpoint, we observed a considerable improvement in the predictivity of the multi-task models, especially when adding tasks with a subset of the pseudolabels (+0.08 MCC and +0.14 recall compared to the single-task model; Table 8). For the membrane potential endpoint, the overall performance was not improved, but we saw some potential for increasing the recall using the multi-task approach including tasks with pseudolabels (+0.07 recall over the single-task model).
Table 8. Model Performance Comparison between Single-Task and Multi-Task Models on the External Test Set.
Endpoint | Method | Pseudolabel tasks | MCC | F1 score | Recall | Precision | Specificity |
---|---|---|---|---|---|---|---|
Overall | Single-task NN | - | 0.48 | 0.54 | 0.68 | 0.45 | 0.88 |
Overall | Multi-task NN | - | 0.50 | 0.56 | 0.71 | 0.47 | 0.88 |
Overall | Multi-task NN | All pseudolabels | 0.53 | 0.59 | 0.74 | 0.48 | 0.89 |
Overall | Multi-task NN | Subset pseudolabels | 0.56 | 0.61 | 0.82 | 0.48 | 0.88 |
Membrane potential | Single-task NN | - | 0.57 | 0.62 | 0.66 | 0.59 | 0.94 |
Membrane potential | Multi-task NN | - | 0.54 | 0.60 | 0.68 | 0.54 | 0.92 |
Membrane potential | Multi-task NN | All pseudolabels | 0.56 | 0.61 | 0.73 | 0.53 | 0.90 |
Membrane potential | Multi-task NN | Subset pseudolabels | 0.56 | 0.62 | 0.73 | 0.53 | 0.91 |
We also examined whether the observed performance improvement on the external test set could be attributed to the similarity between the test compounds and the pseudolabeled compounds. There is a small overlap between the test set (761 compounds) and the subset of selected compounds with reliable pseudolabels, consisting of 38 compounds for the overall and 39 compounds for the membrane potential endpoints. Interestingly, for these overlapping compounds, the single-task models (without pseudolabel information) already provided correct predictions for 29 and 32 compounds, respectively. Therefore, the performance improvement can mainly be attributed to the correction of predictions for unseen compounds. The remaining compounds with pseudolabels (not present in the test set) exhibit a lower similarity to the compounds in the test set compared to the experimental mechanistic datasets (Figure 8). Consequently, the improved results can be better explained by an enhanced generalization capability of the model or a broader coverage of the chemical space rather than the high similarity between the test compounds and the pseudolabeled data.
Figure 8.
Tanimoto similarity distribution of the nearest neighbors of the external test set compounds in the experimental mechanistic datasets (blue) and in the subset of the 8000 most reliable pseudolabels (orange) for the overall and membrane potential endpoints.
Conclusions
Mitochondrial dysfunction has been linked to various diseases and adverse drug effects, such as drug-induced liver injury. Therefore, developing accurate ML models for the prediction of mitochondrial toxicity can help to identify problematic substances early in the drug discovery pipeline and to develop safer drugs and chemicals. In this study, datasets for general and mechanism-specific mitochondrial toxicity were compiled and curated, and the performance of various ML models was evaluated on these datasets. It was shown that especially well-performing models can be obtained for the prediction of membrane potential loss, showing a mean MCC of 0.64 in random CV and an MCC of 0.57 on an external test set.
We also investigated the benefits of using Cell Painting profiles (alone or in combination with chemical structure features) for the prediction of mitochondrial toxicity. It was observed that models trained on the combination of morphological and chemical features outperform those trained on chemical features alone, indicating that morphological profiles capture important information related to mitochondrial toxicity and complementary to the chemical structure. Furthermore, it was demonstrated that the removal of images from inactive profiles improved the accuracy of profile-based models.
Finally, we attempted to expand the compiled mitochondrial toxicity datasets by labeling new compounds based on the Cell Painting images. However, only minor improvements in model performance were observed. These results suggest that, while the Cell Painting approach has potential for identifying toxic compounds, further work is needed to improve the performance of models based on these image data and enable their broader application in toxicity prediction. Overall, our study investigates various approaches for using Cell Painting data for mitochondrial toxicity prediction and sheds light on the potential benefits and challenges of high-throughput imaging technologies for toxicological assessment.
Acknowledgments
M.G.L. and F.M. received funding from the European Union’s Horizon research and innovation programme under grant agreement No. 963845. The authors thank the JUMP-CP consortium for fostering the Cell Painting assay and granting access to part of the generated data. We also thank Marian Raschke for providing the Bayer in-house data and valuable knowledge on mitochondrial toxicity, as well as Marc Osterland for extracting the morphological features and selecting example Cell Painting images.
Data Availability Statement
The mechanistic models and the code used to prepare the data and apply the models are available on GitHub (https://github.com/Bayer-Group/mitochondrial-tox).
Supporting Information Available
The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/acs.chemrestox.3c00086.
Additional experimental details, methods, and results, including the number of results derived from a single source or from multiple sources, an overview of the overlap between the mitochondrial toxicity datasets and all Cell Painting data, the hyperparameter space covered, and the performance of the models, with detailed CV results for models trained on morphological and chemical features, models expanded with pseudolabels derived from Cell Painting data, and multi-task models (PDF)
mitotox_dataset.xlsx: table containing the mechanistic datasets, including SMILES, class labels and further information on the data aggregation (XLSX)
Author Contributions
CRediT: Marina Garcia de Lomana conceptualization, data curation, formal analysis, investigation, methodology, software, writing-original draft; Paula Andrea Marin Zapata conceptualization, methodology, resources, supervision, writing-review & editing; Floriane Montanari conceptualization, methodology, project administration, supervision, writing-review & editing.
The authors declare no competing financial interest.