Abstract
Drug-induced phospholipidosis (PLD) involves the accumulation of phospholipids in cells of multiple tissues, particularly within lysosomes, and it is associated with prolonged exposure to druglike compounds, predominantly cationic amphiphilic drugs (CADs). PLD affects a significant portion of drugs currently in development and has recently been proven to be responsible for confounding antiviral data during drug repurposing for SARS-CoV-2. In these scenarios, it has become crucial to identify potential safe drug candidates in advance and distinguish them from those that may lead to false in vitro antiviral activity. In this work, we developed a series of machine learning classifiers with the aim of predicting the PLD-inducing potential of drug candidates. The models were built on a high-quality chemical collection comprising 545 curated small molecules extracted from ChEMBL v30. The most effective model, obtained using the balanced random forest algorithm, achieved high performance, including an AUC value computed in validation as high as 0.90. The model was made freely available through a user-friendly web platform named AMALPHI (https://www.ba.ic.cnr.it/softwareic/amalphiportal/), which can represent a valuable tool for medicinal chemists interested in conducting an early evaluation of PLD inducer potential.
Keywords: phospholipidosis, ligand-based classifiers, machine learning, SARS-CoV-2
Introduction
Phospholipidosis (PLD) is a lysosomal storage disorder characterized by excessive accumulation of phospholipids in liver, kidney, brain, cornea, lung, and other organs.1 While it is widely recognized that this phenomenon can arise from prolonged treatment with cationic amphiphilic drugs (CADs), the exact mechanism behind this process remains unclear. Various hypotheses have been explored in the literature, including direct inhibition of lysosomal phospholipases,2 binding to phospholipids,3 the potential regulation of phospholipid synthesis,4 and enhanced cholesterol biosynthesis.5 For a comprehensive review on this topic, the reader is referred to the recent paper by Breiden et al.6 Given that a notable proportion (∼5%7) of drugs can induce PLD, there has been growing interest in recent years in assessing the potential of drug candidates to be inducers of PLD during the early stages of a drug discovery (DD) process. This proactive evaluation is recognized as valuable, as compounds that lead to PLD have a reduced likelihood of being successfully brought to market.8 Recently, highly significant correlations have been demonstrated between lipophilicity, the ability of CADs to induce PLD, and the antiviral activity that these cationic amphiphilic drugs have shown against multiple viruses such as hepatitis C virus (HCV), Japanese encephalitis virus (JEV), severe acute respiratory syndrome coronavirus (SARS-CoV), and Epstein–Barr virus (EBV).9 In light of the recent COVID-19 pandemic, a publication in Science by Tummino et al.10 presented findings that highlight the pivotal role of PLD in the context of drugs with anti-SARS-CoV-2 activity, revealing that most of the molecules that showed antiviral activity in the drug repurposing campaigns conducted during the pandemic induce PLD. This adds another layer of complexity and importance to the understanding and evaluation of PLD in drug development efforts, particularly in the context of the recent global health crisis.
Based on these data, Tummino et al. speculated that the anti-SARS-CoV-2 activity observed in vitro for many molecules would be the consequence of their ability to induce PLD rather than activity on a specific target (false anti-SARS-CoV-2 activity10). This hypothesis was supported by the evidence that many molecules exhibiting antiviral activity in vitro lost such activity when transitioning to in vivo conditions. Further support came from the lack of correlation between the antiviral activity and the affinity of some ligands that interact with host targets identified as important in combating SARS-CoV-2 replication (e.g., sigma-1 receptor).11 Some molecules with high affinity for sigma-1, for instance, show no antiviral activity.12 Although PLD in the context of SARS-CoV-2 antiviral assays remains a subject of ongoing scientific debate with contradictory data,13,14 its importance in the drug development process has been further underscored. Unfortunately, the available in vitro assays able to measure the PLD-inducing potential of drug candidates are laborious, time-consuming, and expensive and, for these reasons, are rarely applied even though several CADs are in clinical use or development. Furthermore, the gold standard method (transmission electron microscopy, TEM) does not allow the screening of a large number of molecules.15 Other in vitro approaches include a method consisting of measuring the binding of dyes to the phospholipids by flow cytometry or fluorescence microscopy,16−20 as well as a method based on quantifying gene biomarkers linked to PLD.15 All these approaches are particularly expensive and often yield conflicting data.21−23 The development of in silico tools able to prioritize safe drug candidates is, therefore, highly desirable, although it is worth noting that they are not free from limitations.
Especially when not used in conjunction with experiments or when not starting from highly curated experimental data, they can lead to a considerable number of false positives, as was clearly seen during the COVID-19 pandemic.24 In the context of antiviral design, if developed effectively, these tools would provide valid support for the identification of compounds with a low probability of providing false antiviral activities during in vitro assays. Accordingly, several models have been developed in the past few years to predict the PLD-inducing potential of drug candidates, based on ligand-based approaches7,25−28 or substructure search methods.29,30 Valuable examples can be found in the papers by Kruhlak et al.7 and Orogo et al.,26 reporting quantitative structure–activity relationship (QSAR) models based on 583 and 743 compounds, respectively, extracted from the published literature, existing pharmaceutical databases, and Food and Drug Administration (FDA) internal reports. Of note are also the papers by Fusani et al. and Schieferdecker et al.27,28 Based on in-house in vitro data, the authors developed machine learning-based models of PLD-inducing potential. However, despite their good performance (accuracy > 80%), the developed models are not available; hence, their accessibility to potentially interested users is strongly limited. Building on this background, in the present study, new classifiers of PLD-inducing potential were developed using four algorithms, namely, random forest (RF), K-nearest neighbors (KNN), gradient boosting (GB), and extreme gradient boosting (XGB), starting from 545 compounds extracted from ChEMBL version (v) 30 (PLD-DB), which were then split into a training set (TS) and a validation set (VS). The top-performing classifier was also tested on two external sets (ESs).
Despite the limited data availability, these models demonstrated satisfactory performance, as evidenced by widely accepted quality metrics, such as the area under the receiver operating characteristic curve (AUC) and balanced accuracy (BA). Following an approach successfully employed by our team for predicting other chemical properties,31,32 the most effective model was incorporated into a user-friendly web platform named AMALPHI (https://www.ba.ic.cnr.it/softwareic/amalphiportal/). Significantly, this platform does not necessitate expertise in cheminformatics or programming and can be a valuable resource for medicinal chemists interested in early evaluations of the PLD inducer potential. To the best of our knowledge, AMALPHI is the first freely accessible tool able to efficiently predict the PLD potential of drug candidates.
Materials and Methods
Data Set Preparation
We extracted 851 entries from ChEMBL v30 according to the Target ID (CHEMBL1626541) assigned to the PLD phenotype. Following an approach described elsewhere,32,33 we checked the validity of each SMILES string using an in-house semiautomated procedure implemented in the KNIME platform. In particular, this procedure removes organometallic and inorganic compounds, chemicals containing unusual elements, and mixtures; it also neutralizes salts and removes stereochemistry information. Finally, the OpenBabel node implemented in KNIME converted the retrieved SMILES into a standardized QSAR-ready format. In this way, we created the PLD-DB, consisting of 545 curated entries. It is worth noting that 70% of the compounds belonging to the PLD-DB data set are approved drugs, while the remaining ones have yet to progress to the clinical phase.
Furthermore, to assess the diversity of molecules in our data set, we employed a metric called internal diversity (ID, defined as the mean over the Tanimoto distances between each molecule and all the others belonging to the same set34), which effectively measures the similarity of molecules within the data set. The resulting ID value of 0.82 indicates that the compounds in our data set exhibit a high degree of diversity. To classify the entries as either PLD inducers (P+) or noninducers (P−), we analyzed the comments field based on the reference CHEMBL ID document. Annotations selected as referring to P+ were: “active”/“positive”/“positive: inducer confirmed by electron microscopy”/“positive: weak inducer based on foamy macrophages and cytoplasmic vacuolations.” Instead, comments selected as indicating no PLD induction (P−) were: “not active”/“negative”/“negative: confirmed by electron microscopy”/“negative: based on the absence of positive reported data from WMDD.” A total of 295 duplicates were removed, assigning each retained compound its most frequent class (P+ or P−). Finally, 11 chemicals were excluded, as their activity was indicated as “Not determined” in ChEMBL v30. As a result, the final curated data set (PLD-DB) comprises 104 P+ and 441 P− for a total of 545 compounds.
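The labeling and duplicate-handling rules described above can be illustrated with a minimal Python sketch. It is not the KNIME workflow used in this work: the function names are hypothetical, matching comments case-insensitively is an assumption, and so is the choice to drop structures whose duplicates are evenly split between the two classes, since the text does not state how such ties were handled.

```python
from collections import Counter, defaultdict

# Annotation strings quoted in the text, lowercased for matching (assumption).
POSITIVE = {c.lower() for c in (
    "active",
    "positive",
    "positive: inducer confirmed by electron microscopy",
    "positive: weak inducer based on foamy macrophages and cytoplasmic vacuolations",
)}
NEGATIVE = {c.lower() for c in (
    "not active",
    "negative",
    "negative: confirmed by electron microscopy",
    "negative: based on the absence of positive reported data from WMDD",
)}


def label_from_comment(comment):
    """Return 'P+', 'P-', or None for comments outside both lists."""
    text = comment.strip().lower()
    if text in POSITIVE:
        return "P+"
    if text in NEGATIVE:
        return "P-"
    return None  # e.g. "Not determined"


def resolve_duplicates(records):
    """Collapse (smiles, comment) records to one class per structure.

    Duplicated structures keep their most frequent class. Exact ties are
    dropped, which is an assumption made only for this sketch.
    """
    votes = defaultdict(Counter)
    for smiles, comment in records:
        label = label_from_comment(comment)
        if label is not None:
            votes[smiles][label] += 1
    resolved = {}
    for smiles, counts in votes.items():
        ranked = counts.most_common()
        if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
            continue  # tie between P+ and P-: skipped (assumption)
        resolved[smiles] = ranked[0][0]
    return resolved
```

In practice the keys would be the standardized, QSAR-ready SMILES strings, so that different notations of the same structure are counted as duplicates.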
External Set Preparation
Two different external sets (ESs), one consisting of 117 compounds (ES1) and the other of 20 compounds (ES2), were built and used in this work. In particular, Orogo et al.26 made available a data set (Or-ds) consisting of 743 compounds that we used to create ES1. Notably, Or-ds comprises compounds along with their associated PLD activity and a corresponding data confidence rating expressed as either high or medium. Compounds associated with keywords related to electron microscopy confirmation of PLD are considered to have high confidence, while those associated with keywords and phrases indicating only the presence of foamy macrophages are considered to have medium confidence. We kept only those compounds with a high confidence rating and processed the SMILES strings using the same semiautomated procedure described above to remove duplicates and compounds already included in PLD-DB. The second external set (ES2) was built based on the work by Przybylak et al.29 The authors used two curated data sets comprising 185 and 331 compounds. These two data sets were merged and processed following the already described data curation approach.
Data Set Splitting
We employed a rational approach to split PLD-DB into a TS and a VS. To this aim, we applied the RDKit Diversity Picker node separately on the two classes (i.e., P+ and P−). This node automatically generates Morgan fingerprints (radius 2, 2048 bits) for each SMILES string and then picks 80% of the most diverse molecules for each class based on the Tanimoto distance.35 In this way, a TS of 431 compounds (80% of each class) and a VS that includes the remaining 114 compounds were obtained. Table 2 summarizes the composition of the starting TS, VS, and ES as well as the relative imbalance ratio (IR) calculated as the ratio between the number of majority and minority instances.36 Note that such a procedure allowed us to keep the ratio between the classes in each subset. To depict the chemical space covered by TS, VS, and ESs, a principal component analysis (PCA) was performed based on 36 physicochemical properties of the molecules calculated by the RDKit Descriptor Calculation KNIME node and then standardized using the Normalizer KNIME node (Figure 1). The score plot of the first three principal components (PC1, PC2, and PC3), which account for 80.8% of the variance, shows each ligand belonging to the different data sets in the resulting 3D chemical space.
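As an illustration of a class-wise, diversity-driven split, the sketch below applies a greedy MaxMin selection on Tanimoto distances. It is only a sketch under stated assumptions: fingerprints are represented as Python sets of on-bit indices, the first molecule of each class is used as the seed, and whether the KNIME node follows exactly this procedure is not confirmed by the text. All function names are hypothetical.

```python
def tanimoto_distance(a, b):
    """1 - Tanimoto similarity between two sets of on-bit indices."""
    union = len(a | b)
    if union == 0:
        return 0.0
    return 1.0 - len(a & b) / union


def maxmin_pick(fingerprints, n_pick, first=0):
    """Greedy MaxMin selection; returns the indices of the picked items."""
    if not 0 < n_pick <= len(fingerprints):
        raise ValueError("n_pick must be between 1 and the number of items")
    picked = [first]
    # Distance from every item to its nearest already-picked item.
    nearest = [tanimoto_distance(fp, fingerprints[first]) for fp in fingerprints]
    while len(picked) < n_pick:
        candidates = [i for i in range(len(fingerprints)) if i not in picked]
        best = max(candidates, key=lambda i: nearest[i])  # farthest from the picks
        picked.append(best)
        for i, fp in enumerate(fingerprints):
            nearest[i] = min(nearest[i], tanimoto_distance(fp, fingerprints[best]))
    return picked


def stratified_diverse_split(fps_by_class, fraction=0.8):
    """Pick `fraction` of each class for training; the rest is validation."""
    split = {}
    for label, fps in fps_by_class.items():
        n_train = round(fraction * len(fps))
        train = maxmin_pick(fps, n_train)
        valid = [i for i in range(len(fps)) if i not in train]
        split[label] = (train, valid)
    return split
```

Running the picker separately on P+ and P− is what keeps the class ratio roughly constant between the training and validation subsets.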
Table 2. Partitioning Schemes Before (Top) and After (Bottom) the Application of the AD
data set | # | P– | P+ | IR |
---|---|---|---|---|
TS | 431 | 351 | 80 | 4.4 |
VS | 114 | 90 | 24 | 3.7 |
ES1 | 133 | 112 | 21 | 5.3 |
ES2 | 20 | 11 | 9 | 1.2 |
within the AD | | | | |
VS | 112 | 88 | 24 | 3.6 |
ES1 | 117 | 99 | 18 | 5.5 |
ES2 | 20 | 11 | 9 | 1.2 |
The number of noninducer (P−) and inducer (P+) chemicals is reported for the training set (TS), the validation set (VS), and the larger (ES1) and smaller (ES2) external sets, together with the total number of chemicals (#) and the imbalance ratio (IR).
Figure 1.
PCA based on the physicochemical properties returned by the compounds belonging to TS, VS, ES1, and ES2.
Development of Statistically Based Models
Development and Validation
In this work, four classification algorithms were used: RF, KNN, GB, and XGB. We employed the following KNIME nodes: tree ensemble learner, tree ensemble predictor, K-nearest neighbor, gradient boosted trees learner, gradient boosted trees predictor, XGBoost tree ensemble learner, XGBoost predictor.37−39 AtomPair fingerprints (AP—1024 bits) calculated by the RDKit Fingerprint KNIME node were used to represent each chemical structure belonging to PLD-DB.
It is worth noting that we opted for AP fingerprints instead of the previously used Morgan fingerprints due to their acknowledged higher sensitivity to molecular global features, such as size and shape.40
Notably, an IR equal to 4.4 was computed for the TS. For this reason, we created an additional set of models using an undersampling ensemble learning model (UELM), employing KNN, GB, and XGB. This technique presents two advantages, as it (i) prevents the algorithms from converging on the majority class while ignoring the class with fewer samples,41 and (ii) preserves information from the majority class through the ensemble technique. In particular, we used the equal size sampling node to generate, from the original TS, 50 sub-TS (characterized by an IR equal to 1) to train 50 models and generate the final ensemble model, able to make predictions on external data (i.e., VS and ES) following a majority voting approach. In all cases, we found the optimal setting (shown in Table 1) for the final model training through hyperparameter tuning performed based on a 5-fold cross-validation (5-CV). Note that, for each algorithm, the hyperparameters known to have the highest impact on the overall performance42,43 were considered.
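The undersampling ensemble idea can be sketched in a few lines of Python. This is a hedged illustration and not the KNIME implementation: `fit` is a hypothetical callable standing in for any base learner (KNN, GB, or XGB in the text), and the names `balanced_subsets`, `fit_ensemble`, and `predict_majority` are introduced here only for the example.

```python
import random
from collections import Counter


def balanced_subsets(X, y, n_subsets=50, seed=0):
    """Yield (X_sub, y_sub) pairs in which every class has the minority-class size."""
    rng = random.Random(seed)
    by_class = {}
    for i, label in enumerate(y):
        by_class.setdefault(label, []).append(i)
    size = min(len(idx) for idx in by_class.values())
    for _ in range(n_subsets):
        chosen = []
        for idx in by_class.values():
            chosen.extend(rng.sample(idx, size))  # undersample without replacement
        yield [X[i] for i in chosen], [y[i] for i in chosen]


def fit_ensemble(X, y, fit, n_subsets=50, seed=0):
    """Train one model per balanced subset.

    `fit(X_sub, y_sub)` is a user-supplied (hypothetical) function that
    returns a callable mapping one sample to a predicted class.
    """
    return [fit(Xs, ys) for Xs, ys in balanced_subsets(X, y, n_subsets, seed)]


def predict_majority(models, x):
    """Return the class predicted by most ensemble members."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]
```

Because each sub-TS draws a different random part of the majority class, the ensemble as a whole sees most of the majority-class information even though every single model is trained at an IR of 1. With an even number of models a tied vote is possible; this sketch then returns whichever class was counted first, which is an arbitrary choice.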
Table 1. Optimized Parameters for Each Algorithm
algorithm | optimized parameter | unbalanced TS | equal size models |
---|---|---|---|
RF | split criterion | gini index | |
RF | attribute sampling | square root | |
RF | set of attributes for each tree | different | |
RF | number of trees | 423 | |
RF | tree depth | 6 | |
RF | equal size sampling | yes | |
KNN | number of neighbors to consider | 5 | 7 |
KNN | weight neighbors by distance | yes | no |
GB | number of trees | 280 | 100 |
GB | learning rate | 0.98 | 1 |
GB | attribute sampling | square root | |
GB | set of attributes for each tree | same | |
GB | maximum tree depth | 8 | |
XGB | eta | 0.589 | 0.28 |
XGB | boosting rounds | 253 | 100 |
XGB | gamma | 0.182 | |
XGB | lambda | 4.842 | |
XGB | alpha | 0.211 | |
XGB | maximum depth | 6 | |
Hyperparameter tuning was carried out with a Bayesian optimization algorithm for RF and XGB and with a grid search for KNN and GB. Finally, after performance evaluation, we selected the best-performing model.
Applicability Domain
An applicability domain (AD) was defined for the TS in order to increase confidence in the predictions. Notably, the AD represents the chemical space from which the models are built and, therefore, where a prediction can be considered reliable.44 The domain-similarity KNIME node was employed to define the AD. This node measures the Euclidean distances between the compounds belonging to the TS and those subjected to prediction. In particular, this approach allows defining an AD threshold (ADP) following these steps: (i) the computation of the Euclidean distances between all the possible pairs of compounds belonging to the TS, based on representative descriptors (AP fingerprints in our case); (ii) the creation of a set of distances that are lower than the average distance calculated in step (i); (iii) the computation of the mean (d) and standard deviation (σ) of the distances in the set created in step (ii); and (iv) the definition of the ADP using the equation

$$\mathrm{ADP} = d + Z\sigma \qquad (1)$$

where Z is an empirical cutoff value equal to 0.5 by default.45 As a result, we excluded 2 compounds from VS and 16 compounds from ES1, while all the compounds belonging to ES2 fell within the AD. Table 2 reports the VS, ES1, and ES2 compositions after the application of the AD filter.
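Steps (i) to (iv) can be written out as a short Python sketch. Two points are assumptions made for illustration and are not stated in the text: the population form of the standard deviation is used, and a query is taken to be inside the AD when its distance to the nearest TS compound does not exceed the threshold. The function names are hypothetical.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def ad_threshold(train, z=0.5):
    """Return the AD threshold d + z * sigma from the training descriptors."""
    distances = [euclidean(a, b) for a, b in combinations(train, 2)]  # step (i)
    if not distances:
        raise ValueError("at least two training compounds are required")
    average = sum(distances) / len(distances)
    below = [d for d in distances if d < average]                     # step (ii)
    if not below:
        raise ValueError("no pairwise distance lies below the average")
    d_mean = sum(below) / len(below)                                  # step (iii)
    # Population standard deviation (assumption of this sketch).
    sigma = math.sqrt(sum((d - d_mean) ** 2 for d in below) / len(below))
    return d_mean + z * sigma                                         # step (iv)


def in_domain(query, train, threshold):
    """Assumption: inside the AD if the nearest TS compound is within the threshold."""
    return min(euclidean(query, t) for t in train) <= threshold
```

With binary fingerprints, each compound would be passed as a tuple of 0/1 values, one per bit.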
Performance Evaluation
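Each classifier was evaluated by using Cooper's statistics. In particular, sensitivity (SE), specificity (SP), and BA were computed as follows

$$\mathrm{SE} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}} \qquad (2)$$

$$\mathrm{SP} = \frac{\mathrm{TN}}{\mathrm{TN} + \mathrm{FP}} \qquad (3)$$

$$\mathrm{BA} = \frac{\mathrm{SE} + \mathrm{SP}}{2} \qquad (4)$$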
where TP (true positives) and TN (true negatives) are, respectively, the positive and negative samples correctly classified by the trained model, whereas FP (false positives) are negative samples incorrectly predicted as positive and FN (false negatives) are positive samples incorrectly predicted as negative. Another quality metric, namely, the Matthews correlation coefficient (MCC), was considered to evaluate model performance. MCC indicates the quality of binary classification and is generally recognized as a reliable metric, although it deteriorates when the TS is unbalanced. MCC ranges between −1 and +1, where a value of +1 means a perfect classification, 0 indicates a random classification, and −1 is a complete misclassification.

$$\mathrm{MCC} = \frac{\mathrm{TP} \times \mathrm{TN} - \mathrm{FP} \times \mathrm{FN}}{\sqrt{(\mathrm{TP} + \mathrm{FP})(\mathrm{TP} + \mathrm{FN})(\mathrm{TN} + \mathrm{FP})(\mathrm{TN} + \mathrm{FN})}} \qquad (5)$$
The AUC was also computed by the ROC curve node46 to measure the ability of a model to distinguish P+ from P– samples. This metric ranges between 0 (all pairs misranked) and 1 (ideal classifier), with 0.5 corresponding to a random ranking, and reflects the probability of a positive compound being ranked higher than a negative one according to the prediction confidence value estimated by the KNIME Predictor nodes with respect to each specific algorithm used.37−39 The ROC curve and, consequently, the AUC are among the key metrics we primarily considered for selecting the most effective model.47
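The ranking interpretation of the AUC can be made concrete with a small pairwise computation. The sketch below is not the KNIME ROC curve node; it is a plain Python illustration in which tied scores count as half a correctly ranked pair (the usual Mann-Whitney convention), and the function name is hypothetical.

```python
def auc_from_scores(scores, labels, positive="P+"):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Tied scores contribute half a pair.
    """
    pos = [s for s, l in zip(scores, labels) if l == positive]
    neg = [s for s, l in zip(scores, labels) if l != positive]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

The double loop scales with the product of the class sizes, which is negligible for sets of a few hundred compounds.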
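Finally, the positive (+LR) and the negative likelihood ratio (−LR) were considered and computed as follows

$$+\mathrm{LR} = \frac{\mathrm{SE}}{1 - \mathrm{SP}} \qquad (6)$$

and

$$-\mathrm{LR} = \frac{1 - \mathrm{SE}}{\mathrm{SP}} \qquad (7)$$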
The classification model becomes more informative as the +LR value increases (or the −LR value decreases).
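All of the confusion-matrix-based metrics used in this section can be derived from the four counts TP, FP, TN, and FN. The following Python sketch uses the standard definitions of these quantities; the function name and the handling of degenerate denominators are choices made for the example, and both classes are assumed to be present.

```python
import math


def confusion_metrics(tp, fp, tn, fn):
    """Compute SE, SP, BA, MCC, +LR, and -LR from confusion-matrix counts."""
    se = tp / (tp + fn)
    sp = tn / (tn + fp)
    ba = (se + sp) / 2
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    plr = se / (1 - sp) if sp < 1 else math.inf   # +LR is unbounded when SP = 1
    nlr = (1 - se) / sp if sp > 0 else math.inf   # -LR is unbounded when SP = 0
    return {"SE": se, "SP": sp, "BA": ba, "MCC": mcc, "+LR": plr, "-LR": nlr}
```

As a check, the counts reported for BRF in 5-CV (TP = 58, FP = 92, TN = 259, FN = 22) give values that round to the SP, BA, MCC, +LR, and −LR listed for that model in Table 3.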
Results and Discussion
In this work, different classifiers of PLD-inducing potential were developed using four ML classification algorithms, namely, RF, KNN, GB, and XGB, all available in the KNIME analytics platform. To this end, a highly curated data set (PLD-DB) consisting of 545 compounds was used to train and then to validate the models. More specifically, PLD-DB was divided into a TS used to perform hyperparameter tuning based on a 5-fold cross-validation (5-CV) and a VS used to validate the models obtained with the best parameters identified. Each compound was described by binary fingerprints, namely, AtomPair FP.48 As already mentioned, PLD-DB is an unbalanced data set with an IR approximately equal to 4. To address the problem and therefore prevent a significant discrepancy between SE and SP, different techniques were undertaken. The RF algorithm was combined with a uniform size sampling strategy (hereinafter referred to as balanced random forest, BRF) to reduce the bias toward the majority class, while an additional technique named UELM (see the section Materials and Methods for methodological details) was implemented for the KNN (hereinafter referred to as uKNN), GB (uGB), and XGB (uXGB) algorithms. In particular, using the equal size sampling KNIME node, we developed, for each of these algorithms, a final ensemble model following the approach described in the section “Development and Validation”. The final prediction was performed following a majority vote approach. Furthermore, to realistically evaluate model predictivity, VS was kept imbalanced, and only compounds within the AD were considered. Finally, two different ESs were employed to assess the predictivity of the best model in a real-life case study. The following section will focus on analyzing the key quality metrics (SE, SP, BA, MCC, and AUC) calculated for each validation process (internal and external), aiming to identify the top-performing classifier.
Hyperparameterization
Table 3 displays the performances achieved using 5-fold cross-validation (5-CV) for each employed algorithm using the TS extracted from PLD-DB for hyperparameter tuning. In this step, ensuring satisfactory performance is critical to guarantee the capability of model generalization, meaning that models fit the data set accurately, avoiding overfitting or underfitting. As expected, the classifiers built based on RF, KNN, GB, and XGB returned SP values significantly higher than SE ones (difference ranging from 0.55 to 0.78). Moreover, in all cases, BA values lower than 0.70 were computed. A remarkable performance improvement is instead observed when a proper treatment of the TS imbalance is undertaken, as is evident from the quality metrics returned by BRF, uKNN, uGB, and uXGB (Table 3). More specifically, no significant difference is observed between SE and SP, with differences ranging from 0.01 (BRF) to 0.08 (uKNN). Furthermore, higher BA values were also detected, with BRF returning the best performance (BA = 0.73 and AUC = 0.78). As mentioned in the introduction, building models of PLD-inducing potential has, as a primary aim, that of directing the experimental efforts toward safe (P−) molecules. Building on that, the prediction of dangerous substances as safe should be avoided. In light of that, we also focused our attention on the computed −LR values. Notice that this quality metric is independent of the TS class distribution and estimates how much a negative prediction lowers the probability that a compound is a PLD inducer (P+) with respect to the initial condition (before querying the relative classifier). Again, BRF (−LR = 0.37) returned the best performance, followed by uGB (0.39), uXGB (0.44), and uKNN (0.48), while significantly worse values were returned by RF (0.84), KNN (0.72), GB (0.80), and XGB (0.63). In summary, the performed 5-CV provided clear evidence that resampling approaches are required to reach satisfactory performances.
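To make the practical meaning of the −LR more tangible, the following worked calculation applies the standard odds form of Bayes' rule to the BRF result in 5-CV. It is an illustration only and assumes that the class ratio of the TS (80 P+ and 351 P−) is taken as the pre-test condition.

$$
\text{pre-test odds of P+} = \frac{80}{351} \approx 0.228, \qquad
\text{post-test odds} = 0.228 \times 0.37 \approx 0.084, \qquad
P(\mathrm{P{+}} \mid \text{predicted P−}) = \frac{0.084}{1 + 0.084} \approx 0.078
$$

Under this assumption, a compound predicted as a noninducer by BRF has a probability of about 8% of being an inducer, compared with about 19% (80/431) before the prediction. The same figure follows directly from the counts in Table 3, since FN/(FN + TN) = 22/281 ≈ 0.078.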
Table 3. Performances in 5-CV Returned by All of the Developed Classifiers
model | SE | SP | BA | AUC | –LR | +LR | MCC | TP | FP | TN | FN |
---|---|---|---|---|---|---|---|---|---|---|---|
RF | 0.19 | 0.97 | 0.58 | 0.77 | 0.84 | 6.58 | 0.26 | 15 | 10 | 341 | 65 |
KNN | 0.31 | 0.95 | 0.63 | 0.70 | 0.72 | 6.86 | 0.35 | 25 | 16 | 335 | 55 |
GB | 0.23 | 0.97 | 0.60 | 0.75 | 0.80 | 6.58 | 0.30 | 18 | 12 | 339 | 62 |
XGB | 0.40 | 0.95 | 0.68 | 0.78 | 0.63 | 8.26 | 0.43 | 32 | 17 | 334 | 48 |
BRF | 0.73 | 0.74 | 0.73 | 0.78 | 0.37 | 2.77 | 0.38 | 58 | 92 | 259 | 22 |
uKNN | 0.65 | 0.73 | 0.69 | 0.72 | 0.48 | 2.41 | 0.31 | 52 | 96 | 255 | 28 |
uGB | 0.73 | 0.70 | 0.71 | 0.73 | 0.39 | 2.43 | 0.34 | 58 | 105 | 246 | 22 |
uXGB | 0.68 | 0.72 | 0.70 | 0.72 | 0.44 | 2.43 | 0.32 | 54 | 98 | 253 | 26 |
For each model, the following statistics are reported: sensitivity (SE), specificity (SP), balanced accuracy (BA), area under the ROC (AUC), negative likelihood ratio (−LR), positive likelihood ratio (+LR), Matthews correlation coefficient (MCC), number of true positives (TPs), false positives (FPs), true negatives (TNs), and false negatives (FNs).
Validation
To select the top-performing classifier, the models built using BRF, uGB, uXGB, and uKNN were validated on the VS previously extracted from PLD-DB, which comprises 112 compounds after the application of the AD filter. Notice that, to realistically evaluate the predictiveness of the models, VS was deliberately left imbalanced, and the performances were computed considering only those compounds within the AD. As evident in Figure 2A, the good performance observed in 5-CV is herein confirmed for all the models, as indicated by the computed BA values ≈ 0.80, AUC > 0.85, and very low −LR (ranging from 0.07 to 0.18). Importantly, acceptable differences were observed between SE and SP, ranging from 0.20 to 0.32. Figure 2B displays a radar plot constructed with the aim of selecting the top-performing classifier. Taken as a whole, the obtained data put forward BRF, whose ROC curve is displayed in Figure 2C, as the model to be selected, being able to provide the best AUC (0.90), BA (0.81), and MCC (0.50) values. It is important to emphasize that we conducted an additional analysis to assess the stability of the built models. This analysis involved creating an additional set of 100 classifiers by using different, randomly selected training sets (TS) and validation sets (VS) while adhering to the data splitting methodology outlined in the Materials and Methods Section. For each of these models, we calculated key metrics, including BA, SE, and SP, and then examined their relative standard deviations to gauge the stability of the classifiers. The obtained results show that all of the classifiers are robust and largely independent of the TS and VS composition (standard deviations ≤ 0.07). To further challenge the BRF model, an additional validation with two different external sets (ES1 and ES2) was performed.
As reported in Figure 3A, an acceptable balance between SE and SP was observed as well as high values of BA (0.72 and 0.90 for ES1 and ES2, respectively) and AUC (0.75 and 0.94, respectively). This is also supported by the relative ROC curves displayed in Figure 3B,C.
Figure 2.
Selection of the top-performing model was based on the performance obtained in validation. (A) Table reporting the quality metrics returned by all the developed models: sensitivity (SE), specificity (SP), balanced accuracy (BA), area under the ROC (AUC), negative likelihood ratio (−LR), positive likelihood ratio (+LR), Matthews correlation coefficient (MCC), number of true positives (TPs), false positives (FPs), true negatives (TNs), and false negatives (FNs); (B) radar plot comparing the performance of the models; and (C) ROC curve derived from the probability-based ranking returned by the selected classifier (BRF).
Figure 3.
Performance of the selected BRF model on external sets ES1 and ES2. (A) Table reporting the computed sensitivity (SE), specificity (SP), balanced accuracy (BA), area under the ROC (AUC), negative likelihood ratio (−LR), positive likelihood ratio (+LR), Matthews correlation coefficient (MCC), number of true positives (TPs), false positives (FPs), true negatives (TNs), and false negatives (FNs); (B, C) ROC curve derived from the probability-based ranking returned by BRF on ES1 and ES2, respectively.
AMALPHI: A Freely Accessible Web Platform
We made available the top-performing classifier, built using the BRF algorithm, in a freely accessible web platform called AMALPHI (A machine learning platform for predicting drug-induced phospholipidosis; https://www.ba.ic.cnr.it/softwareic/amalphiportal/). Following an approach already employed for other web platforms developed by our group,31,32 the user can draw a 2D structure of the query molecule using the JSME canvas applet49 or, alternatively, insert the relative SMILES string directly into the provided text field. Additionally, to facilitate the use of the platform for virtual screening applications, the user can upload a .txt file containing a list of SMILES strings. This can be achieved by clicking on the “MASSIVE” button. Once the file is uploaded or the query molecule is drawn, AMALPHI generates predictions regarding the PLD inducer potential of each compound used as input. The results are displayed as “YES” if the BRF model predicts the query to be a PLD inducer and as “NO” otherwise. Notably, information on the reliability of the performed predictions is also provided, based on the considered AD. Finally, the user can download the produced output as a .csv file. It is worth noting that a link to download the predictions is sent to the user’s registered email address. Additionally, the “History” page maintains a record of all user executions, preserving input SMILES files and their corresponding output. Figure 4 shows an example of an output page generated by the tool.
Figure 4.
Example of the output page returned by the AMALPHI web platform.
Conclusions
In an era where accurate prediction of pharmacological and toxicological properties of organic molecules is becoming fundamental to significantly expedite the drug discovery process in both academia and industry, expensive and time-consuming traditional approaches are increasingly giving way to the use of computational technologies. In this regard, this study focuses on the development of multiple machine learning models capable of predicting the PLD-inducing potential, employing different ML algorithms (RF, KNN, GB, and XGB). Following data extraction from ChEMBL v30 and subsequent analysis of experimental phospholipidosis data concerning 851 entries, we applied rigorous data curation practices to create PLD-DB, comprising 545 compounds, and used it to build different PLD-inducing potential classifiers. The performances obtained in validation were similar for all of the models trained using techniques that consider data imbalance (BRF, uKNN, uGB, and uXGB), among which the top-performing one was BRF, capable of providing the best AUC (0.90), BA (0.81), and MCC (0.50). Furthermore, external validation using two different external sets (ES1 and ES2) returned high values of BA (0.72 and 0.90, respectively) and AUC (0.75 and 0.94, respectively). Collectively, these promising results led us to make the top-performing classifier available through a user-friendly web platform developed by our group and named AMALPHI (https://www.ba.ic.cnr.it/softwareic/amalphiportal/). To the best of our knowledge, AMALPHI is the first freely accessible tool capable of efficiently predicting the PLD potential of drug candidates. It can assist medicinal chemists in proactively identifying safe drug candidates during the research and development of pharmacologically active molecules and in prioritizing drugs with a low probability of exhibiting false in vitro antiviral activity.
Acknowledgments
This work was funded by the National Research Council (CNR–Italy) under the program “PROGETTI DI RICERCA @CNR” (acronym DATIAMO) and by the European Union (Next Generation EU) under the program PRIN of the Ministero Dell'Università e della Ricerca (Progetti di ricerca di Rilevante Interesse Nazionale 2022–2022Z3BBPE—Development of broad-spectrum coronavirus antiviral agents acting as allosteric modulators of the host protein sigma-1). The salary of Dr. Maria Cristina Lomuscio was funded by REGIONE PUGLIA under the program RIPARTI (assegni di RIcerca in PARTenariato con le Imprese). The authors thank Dr. Ivan Mercurio for providing graphical support in creating the AMALPHI logo and Biofordrug srl (via Dante 95/99, 70019, Triggiano, Bari) for scientific and technological support.
Supporting Information Available
The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/acs.molpharmaceut.3c00964.
- PLDBD Excel file containing the 545 SMILES strings of the chemicals belonging to the PLDBD data set and the corresponding experimental values (XLSX)
- ES1 Excel file containing the 117 SMILES strings of the chemicals belonging to the ES1 data set and the corresponding experimental values (XLSX)
- ES2 Excel file containing the 20 SMILES strings of the chemicals belonging to the ES2 data set and the corresponding experimental values (XLSX)
The authors declare no competing financial interest.
References
- Lüllmann H.; Lüllmann-Rauch R.; Wassermann O. Drug-Induced Phospholipidoses. II. Tissue Distribution of the Amphiphilic Drug Chlorphentermine. CRC Crit. Rev. Toxicol. 1975, 4 (2), 185–218. 10.1080/10408447509164014.
- Kubo M.; Hostetler K. Y. Mechanism of Cationic Amphiphilic Drug Inhibition of Purified Lysosomal Phospholipase A1. Biochemistry 1985, 24 (23), 6515–6520. 10.1021/bi00344a031.
- Lüllmann H.; Lüllmann-Rauch R.; Wassermann O. Lipidosis Induced by Amphiphilic Cationic Drugs. Biochem. Pharmacol. 1978, 27 (8), 1103–1108. 10.1016/0006-2952(78)90435-5.
- Pappu A.; Hostetler K. Y. Effect of Cationic Amphiphilic Drugs on the Hydrolysis of Acidic and Neutral Phospholipids by Liver Lysosomal Phospholipase A. Biochem. Pharmacol. 1984, 33 (10), 1639–1644. 10.1016/0006-2952(84)90286-7.
- Lowe R.; Mussa H. Y.; Nigsch F.; Glen R. C.; Mitchell J. B. Predicting the Mechanism of Phospholipidosis. J. Cheminf. 2012, 4 (1), 2. 10.1186/1758-2946-4-2.
- Breiden B.; Sandhoff K. Emerging Mechanisms of Drug-Induced Phospholipidosis. Biol. Chem. 2019, 401 (1), 31–46. 10.1515/hsz-2019-0270.
- Kruhlak N. L.; Choi S. S.; Contrera J. F.; Weaver J. L.; Willard J. M.; Hastings K. L.; Sancilio L. F. Development of a Phospholipidosis Database and Predictive Quantitative Structure-Activity Relationship (QSAR) Models. Toxicol. Mech. Methods 2008, 18 (2–3), 217–227. 10.1080/15376510701857262.
- Ettlin R. A.; Kuroda J.; Plassmann S.; Hayashi M.; Prentice D. E. Successful Drug Development Despite Adverse Preclinical Findings Part 2: Examples. J. Toxicol. Pathol. 2010, 23 (4), 213–234. 10.1293/tox.23.213.
- Gunesch A. P.; Zapatero-Belinchón F. J.; Pinkert L.; Steinmann E.; Manns M. P.; Schneider G.; Pietschmann T.; Brönstrup M.; von Hahn T. Filovirus Antiviral Activity of Cationic Amphiphilic Drugs Is Associated with Lipophilicity and Ability To Induce Phospholipidosis. Antimicrob. Agents Chemother. 2020, 64 (8), 10–1128. 10.1128/aac.00143-20.
- Tummino T. A.; Rezelj V. V.; Fischer B.; Fischer A.; O’Meara M. J.; Monel B.; Vallet T.; White K. M.; Zhang Z.; Alon A.; Schadt H.; O’Donnell H. R.; Lyu J.; Rosales R.; McGovern B. L.; Rathnasinghe R.; Jangra S.; Schotsaert M.; Galarneau J.-R.; Krogan N. J.; Urban L.; Shokat K. M.; Kruse A. C.; García-Sastre A.; Schwartz O.; Moretti F.; Vignuzzi M.; Pognan F.; Shoichet B. K. Drug-Induced Phospholipidosis Confounds Drug Repurposing for SARS-CoV-2. Science 2021, 373 (6554), 541–547. 10.1126/science.abi4708.
- Gordon D. E.; Jang G. M.; Bouhaddou M.; Xu J.; Obernier K.; White K. M.; O’Meara M. J.; Rezelj V. V.; Guo J. Z.; Swaney D. L.; Tummino T. A.; Hüttenhain R.; Kaake R. M.; Richards A. L.; Tutuncuoglu B.; Foussard H.; Batra J.; Haas K.; Modak M.; Kim M.; Haas P.; Polacco B. J.; Braberg H.; Fabius J. M.; Eckhardt M.; Soucheray M.; Bennett M. J.; Cakir M.; McGregor M. J.; Li Q.; Meyer B.; Roesch F.; Vallet T.; Mac Kain A.; Miorin L.; Moreno E.; Naing Z. Z. C.; Zhou Y.; Peng S.; Shi Y.; Zhang Z.; Shen W.; Kirby I. T.; Melnyk J. E.; Chorba J. S.; Lou K.; Dai S. A.; Barrio-Hernandez I.; Memon D.; Hernandez-Armenta C.; Lyu J.; Mathy C. J. P.; Perica T.; Pilla K. B.; Ganesan S. J.; Saltzberg D. J.; Rakesh R.; Liu X.; Rosenthal S. B.; Calviello L.; Venkataramanan S.; Liboy-Lugo J.; Lin Y.; Huang X.-P.; Liu Y.; Wankowicz S. A.; Bohn M.; Safari M.; Ugur F. S.; Koh C.; Savar N. S.; Tran Q. D.; Shengjuler D.; Fletcher S. J.; O’Neal M. C.; Cai Y.; Chang J. C. J.; Broadhurst D. J.; Klippsten S.; Sharp P. P.; Wenzell N. A.; Kuzuoglu-Ozturk D.; Wang H.-Y.; Trenker R.; Young J. M.; Cavero D. A.; Hiatt J.; Roth T. L.; Rathore U.; Subramanian A.; Noack J.; Hubert M.; Stroud R. M.; Frankel A. D.; Rosenberg O. S.; Verba K. A.; Agard D. A.; Ott M.; Emerman M.; Jura N.; von Zastrow M.; Verdin E.; Ashworth A.; Schwartz O.; d’Enfert C.; Mukherjee S.; Jacobson M.; Malik H. S.; Fujimori D. G.; Ideker T.; Craik C. S.; Floor S. N.; Fraser J. S.; Gross J. D.; Sali A.; Roth B. L.; Ruggero D.; Taunton J.; Kortemme T.; Beltrao P.; Vignuzzi M.; García-Sastre A.; Shokat K. M.; Shoichet B. K.; Krogan N. J. A SARS-CoV-2 Protein Interaction Map Reveals Targets for Drug Repurposing. Nature 2020, 583 (7816), 459–468. 10.1038/s41586-020-2286-9.
- Abatematteo F. S.; Delre P.; Mercurio I.; Rezelj V. V.; Siliqi D.; Beaucourt S.; Lattanzi G.; Colabufo N. A.; Leopoldo M.; Saviano M.; Vignuzzi M.; Mangiatordi G. F.; Abate C. A Conformational Rearrangement of the SARS-CoV-2 Host Protein Sigma-1 Is Required for Antiviral Activity: Insights from a Combined in-Silico/in-Vitro Approach. Sci. Rep. 2023, 13 (1), 12798. 10.1038/s41598-023-39662-w.
- Diesendorf V.; Roll V.; Geiger N.; Fähr S.; Obernolte H.; Sewald K.; Bodem J. Drug-Induced Phospholipidosis Is Not Correlated with the Inhibition of SARS-CoV-2 - Inhibition of SARS-CoV-2 Is Cell Line-Specific. Front. Cell Infect. Microbiol. 2023, 13, 1100028. 10.3389/fcimb.2023.1100028.
- Lane T. R.; Ekins S. Defending Antiviral Cationic Amphiphilic Drugs That May Cause Drug-Induced Phospholipidosis. J. Chem. Inf. Model. 2021, 61 (9), 4125–4130. 10.1021/acs.jcim.1c00903.
- Atienzar F.; Gerets H.; Dufrane S.; Tilmant K.; Cornet M.; Dhalluin S.; Ruty B.; Rose G.; Canning M. Determination of Phospholipidosis Potential Based on Gene Expression Analysis in HepG2 Cells. Toxicol. Sci. 2006, 96 (1), 101–114. 10.1093/toxsci/kfl184.
- Ulrich R. G.; Kilgore K. S.; Sun E. L.; Cramer C. T.; Ginsberg L. C. An in Vitro Fluorescence Assay for the Detection of Drug-Induced Cytoplasmic Lamellar Bodies. Toxicol. Methods 1991, 1 (2), 89–105. 10.3109/15376519109044560.
- Gum R. J.; Hickman D.; Fagerland J. A.; Heindel M. A.; Gagne G. D.; Schmidt J. M.; Michaelides M. R.; Davidsen S. K.; Ulrich R. G. Analysis of Two Matrix Metalloproteinase Inhibitors and Their Metabolites for Induction of Phospholipidosis in Rat and Human Hepatocytes(1). Biochem. Pharmacol. 2001, 62 (12), 1661–1673. 10.1016/S0006-2952(01)00823-1.
- Casartelli A.; Bonato M.; Cristofori P.; Crivellente F.; Dal Negro G.; Masotto I.; Mutinelli C.; Valko K.; Bonfante V. A Cell-Based Approach for the Early Assessment of the Phospholipidogenic Potential in Pharmaceutical Research and Drug Development. Cell Biol. Toxicol. 2003, 19 (3), 161–176. 10.1023/A:1024778329320.
- Kasahara T.; Tomita K.; Murano H.; Harada T.; Tsubakimoto K.; Ogihara T.; Ohnishi S.; Kakinuma C. Establishment of an in Vitro High-Throughput Screening Assay for Detecting Phospholipidosis-Inducing Potential. Toxicol. Sci. 2006, 90 (1), 133–141. 10.1093/toxsci/kfj067.
- Morelli J. K.; Buehrle M.; Pognan F.; Barone L. R.; Fieles W.; Ciaccio P. J. Validation of an in Vitro Screen for Phospholipidosis Using a High-Content Biology Platform. Cell Biol. Toxicol. 2006, 22 (1), 15–27. 10.1007/s10565-006-0176-z.
- Sawada H.; Takami K.; Asahi S. A Toxicogenomic Approach to Drug-Induced Phospholipidosis: Analysis of Its Induction Mechanism and Establishment of a Novel in Vitro Screening System. Toxicol. Sci. 2004, 83 (2), 282–292. 10.1093/toxsci/kfh264.
- Reasor M. J.; Hastings K. L.; Ulrich R. G. Drug-Induced Phospholipidosis: Issues and Future Directions. Expert Opin. Drug Saf. 2006, 5 (4), 567–583. 10.1517/14740338.5.4.567.
- Goracci L.; Ceccarelli M.; Bonelli D.; Cruciani G. Modeling Phospholipidosis Induction: Reliability and Warnings. J. Chem. Inf. Model. 2013, 53 (6), 1436–1446. 10.1021/ci400113t.
- Macip G.; Garcia-Segura P.; Mestres-Truyol J.; Saldivar-Espinoza B.; Ojeda-Montes M. J.; Gimeno A.; Cereto-Massagué A.; Garcia-Vallvé S.; Pujadas G. Haste Makes Waste: A Critical Review of Docking-based Virtual Screening in Drug Repurposing for SARS-CoV-2 Main Protease (M-pro) Inhibition. Med. Res. Rev. 2022, 42 (2), 744–769. 10.1002/med.21862.
- Lowe R.; Glen R. C.; Mitchell J. B. O. Predicting Phospholipidosis Using Machine Learning. Mol. Pharm. 2010, 7 (5), 1708–1714. 10.1021/mp100103e.
- Orogo A. M.; Choi S. S.; Minnier B. L.; Kruhlak N. L. Construction and Consensus Performance of (Q)SAR Models for Predicting Phospholipidosis Using a Dataset of 743 Compounds. Mol. Inf. 2012, 31 (10), 725–739. 10.1002/minf.201200048.
- Fusani L.; Brown M.; Chen H.; Ahlberg E.; Noeske T. Predicting the Risk of Phospholipidosis with in Silico Models and an Image-Based in Vitro Screen. Mol. Pharm. 2017, 14 (12), 4346–4352. 10.1021/acs.molpharmaceut.7b00388.
- Schieferdecker S.; Eberlein A.; Vock E.; Beilmann M. Development of an in Silico Consensus Model for the Prediction of the Phospholipigenic Potential of Small Molecules. Comput. Toxicol. 2022, 22, 100226. 10.1016/j.comtox.2022.100226.
- Przybylak K. R.; Alzahrani A. R.; Cronin M. T. D. How Does the Quality of Phospholipidosis Data Influence the Predictivity of Structural Alerts? J. Chem. Inf. Model. 2014, 54 (8), 2224–2232. 10.1021/ci500233k.
- Lagorce D.; Bouslama L.; Becot J.; Miteva M. A.; Villoutreix B. O. FAF-Drugs4: Free ADME-Tox Filtering Computations for Chemical Biology and Early Stages Drug Discovery. Bioinformatics 2017, 33 (22), 3658–3660. 10.1093/bioinformatics/btx491.
- Creanza T. M.; Lamanna G.; Delre P.; Contino M.; Corriero N.; Saviano M.; Mangiatordi G. F.; Ancona N. DeLA-Drug: A Deep Learning Algorithm for Automated Design of Druglike Analogues. J. Chem. Inf. Model. 2022, 62 (6), 1411–1424. 10.1021/acs.jcim.2c00205.
- Delre P.; Contino M.; Alberga D.; Saviano M.; Corriero N.; Mangiatordi G. F. ALPACA: A Machine Learning Platform for Affinity and Selectivity Profiling of CAnnabinoids Receptors Modulators. Comput. Biol. Med. 2023, 164, 107314. 10.1016/j.compbiomed.2023.107314.
- Delre P.; Lavado G. J.; Lamanna G.; Saviano M.; Roncaglioni A.; Benfenati E.; Mangiatordi G. F.; Gadaleta D. Ligand-Based Prediction of hERG-Mediated Cardiotoxicity Based on the Integration of Different Machine Learning Techniques. Front. Pharmacol. 2022, 13, 951083. 10.3389/fphar.2022.951083.
- Benhenda M. ChemGAN Challenge for Drug Discovery: Can AI Reproduce Natural Chemical Diversity? arXiv 2017. 10.48550/arXiv.1708.08227.
- RDKit Diversity Picker - RDKit Nodes Feature - KNIME. KNIME Release 4.6.1: RDKit Nodes Feature; NIBR; 2022.
- Zhu R.; Guo Y.; Xue J.-H. Adjusting the Imbalance Ratio by the Dimensionality of Imbalanced Data. Pattern Recognition Letters 2020, 133, 217–223. 10.1016/j.patrec.2020.03.004.
- Ensemble Learning Wrappers - KNIME. In KNIME Release 4.6.1: KNIME Ensemble Learning Wrappers; KNIME AG: Zurich, 2022.
- K Nearest Neighbor - Base Nodes - KNIME. In KNIME Release 4.6.1: KNIME Base nodes; KNIME AG: Zurich, 2022.
- XGBoost Integration - KNIME. In KNIME Release 4.6.1: KNIME XGBoost Integration; KNIME AG: Zurich, 2022.
- Capecchi A.; Probst D.; Reymond J.-L. One Molecular Fingerprint to Rule Them All: Drugs, Biomolecules, and the Metabolome. J. Cheminf. 2020, 12 (1), 43. 10.1186/s13321-020-00445-4.
- Lu W.; Li Z.; Chu J. Adaptive Ensemble Undersampling-Boost: A Novel Learning Framework for Imbalanced Data. J. Syst. Software 2017, 132, 272–282. 10.1016/j.jss.2017.07.006.
- Agrawal T. Introduction to Hyperparameters. In Hyperparameter Optimization in Machine Learning: Make Your Machine Learning and Deep Learning Models More Efficient; Agrawal T., Ed.; Apress: Berkeley, CA, 2021; pp 1–30. 10.1007/978-1-4842-6579-6_1.
- Probst P.; Bischl B.; Boulesteix A.-L. Tunability: Importance of Hyperparameters of Machine Learning Algorithms. 2018. 10.48550/arXiv.1802.09596.
- Gadaleta D.; Mangiatordi G. F.; Catto M.; Carotti A.; Nicolotti O. Applicability Domain for QSAR Models: Where Theory Meets Reality. Int. J. Quant. Struct.-Prop. Relat. 2016, 1 (1), 45–63. 10.4018/IJQSPR.2016010102.
- Melagraki G.; Afantitis A.; Sarimveis H.; Igglessi-Markopoulou O.; Koutentis P. A.; Kollias G. In Silico Exploration for Identifying Structure–Activity Relationship of MEK Inhibition and Oral Bioavailability for Isothiazole Derivatives. Chem. Biol. Drug Des. 2010, 76 (5), 397–406. 10.1111/j.1747-0285.2010.01029.x.
- ROC Curve - JavaScript Views - KNIME. In KNIME Release 4.6.1: KNIME JavaScript Views; KNIME AG: Zurich, 2022.
- Ling C. X.; Huang J.; Zhang H. AUC: A Better Measure than Accuracy in Comparing Learning Algorithms. In Advances in Artificial Intelligence; Xiang Y.; Chaib-draa B., Eds.; Springer: Berlin, Heidelberg, 2003; pp 329–341. 10.1007/3-540-44886-1_25.
- Carhart R. E.; Smith D. H.; Venkataraghavan R. Atom Pairs as Molecular Features in Structure-Activity Studies: Definition and Applications. J. Chem. Inf. Comput. Sci. 1985, 25 (2), 64–73. 10.1021/ci00046a002.
- Bienfait B.; Ertl P. JSME: A Free Molecule Editor in JavaScript. J. Cheminf. 2013, 5 (1), 24. 10.1186/1758-2946-5-24.