2022 Nov 5;62(22):5351–5360. doi: 10.1021/acs.jcim.2c00787

Selective Inhibitor Design for Kinase Homologs Using Multiobjective Monte Carlo Tree Search

Tatsuya Yoshizawa, Shoichi Ishida †,*, Tomohiro Sato, Masateru Ohta §, Teruki Honma, Kei Terayama †,*
PMCID: PMC9709912  PMID: 36334094

Abstract

Designing highly selective molecules for a drug target protein is a challenging task in drug discovery. This task can be regarded as a multiobjective problem in which criteria for various objectives, such as selectivity for a target protein, pharmacokinetic endpoints, and drug-like indices, must be satisfied simultaneously. Recent breakthroughs in artificial intelligence have accelerated the development of molecular structure generation methods, and various researchers have applied them to computational drug design and successfully proposed promising drug candidates. However, efficiently designing selective inhibitors that avoid activity against the various homologs of a target protein remains difficult. In this study, we developed a de novo structure generator based on reinforcement learning that is capable of optimizing multiple objectives simultaneously. Our structure generator successfully proposed selective inhibitors for tyrosine kinases while optimizing 18 objectives consisting of inhibitory activities against 9 tyrosine kinases, 3 pharmacokinetic endpoints, and 6 other important properties. These results show that our structure generator and optimization strategy for selective inhibitors will contribute to the further development of practical structure generators for drug design.

Introduction

Identifying candidate compounds with high selectivity for a drug target protein is an essential factor for avoiding development failures related to pharmacokinetics and toxicity issues.1 For example, epidermal growth factor receptor (EGFR) inhibitors such as erlotinib and gefitinib were designed to be highly selective for EGFR and much less potent for other kinases in the EGFR family, such as receptor protein-tyrosine kinase erbB-2 (HER2/ERBB2) and erbB-4.2,3 Additionally, various properties, such as pharmacokinetic endpoints4 and drug-likeness indices,5 should be optimized to suitable criteria intended for lead compounds or clinical candidates. To be approved as a marketed drug, a drug candidate must simultaneously meet the criteria of multiple objectives, including the abovementioned properties.6

Various molecular structure generation techniques have been developed with machine learning methods to address the multiobjective problem in drug discovery over the past few decades.7–16 These studies have regarded the problems as optimizing multiple objectives, including the inhibitory activity against target proteins, absorption–distribution–metabolism–excretion–toxicity (ADMET), and drug-likeness properties. For instance, Winter et al.11 prepared 10 objectives for designing EGFR- and/or β-secretase 1 (BACE1)-targeting drugs using a particle swarm optimization algorithm. To follow an actual drug discovery project, Perron et al.15 prepared 11 objectives, including 1 undisclosed primary target and 6 off-target activity assays: serotonin 2A and 2B, alpha 1, dopamine receptor D1, sodium voltage-gated channel alpha subunit 2 (Nav1.2), and human ether-à-go-go-related gene channel (hERG). As shown in the above studies, simultaneous optimization approaches for proteins in different families have been attempted. However, selectivity for a target protein among its homologs has not been fully considered in de novo structure generation methods.

In this study, we attempted to perform de novo generation of molecular structures that simultaneously have high selectivity among kinases and desirable properties related to ADMET, synthesizability, and drug-likeness. As a structure generator, we employed ChemTS,17 which consists of a recurrent neural network (RNN)-based structure generator18 and a Monte Carlo tree search (MCTS)19-based exploration system. ChemTS has been experimentally proven to generate molecular structures with target properties.20–23 To guide ChemTS to optimal solutions, Dscore,11,24 a strategy for multiobjective optimization, was incorporated into the reward calculation part of ChemTS. Here, we targeted tyrosine kinase-selective inhibitor designs while optimizing 18 objectives consisting of inhibitory activities against 9 tyrosine kinases, pharmacokinetic endpoints, and indices of synthesizability and drug-likeness. It should be noted that the values of the inhibitory activities and most of the other properties are based on prediction models reported in previous studies.11,15 Our method successfully designed highly optimized selective inhibitor candidates for six tyrosine kinases among the nine total kinases while optimizing most of the other nine desirable properties simultaneously. Our molecular structure generator is publicly available on GitHub at https://github.com/molecule-generator-collection/ChemTSv2.

Methods

The workflow of our selective inhibitor design method is shown in Figure 1. Our system consists of two parts: the structure generator based on an MCTS and the evaluation strategy for multiobjective optimization. The details are as follows.

Figure 1.

Workflow of multiobjective optimization for selective inhibitor design. This workflow consists of two parts: molecular structure generation and generated structure evaluation. In structure generation, a reinforcement learning-based structure generator, ChemTS, searches for promising structures based on the evaluations of the generated structures. During the evaluation process of a generated structure, first, the objective values of the generated structure are predicted or calculated. Subsequently, reward values are calculated by the scoring function, Dscore, using the objective values and are fed back to the structure generator. By repeating the cycle of molecule generation and evaluation, generated molecules are gradually optimized to have high selectivity against a target protein, good pharmacokinetic properties, and decent druglikeness.

Generative Model for the Molecular Structure

For molecular structure generation, we used ChemTS, which consists of an RNN-based molecule generator and an MCTS-based search algorithm in chemical space. In ChemTS, a node in the MCTS corresponds to a single atom in SMILES format, and the MCTS builds a search tree by expanding nodes through four steps: selection, expansion, simulation, and backpropagation.25 In the selection step, the node that is considered to be the most promising leaf node is selected based on its upper confidence bound (UCB) score. The UCB score balances the trade-off between exploration and exploitation using an exploration parameter C. If C is set to a large value (e.g., C = 1.0), more priority is given to unvisited nodes, whereas if set to a small value (e.g., C = 0.1), more priority is given to nodes that have been revealed to be promising. During the expansion step, new nodes are added to the node selected in the selection step. In the simulation step, the RNN model is iteratively applied to the expanded partial SMILES strings until a terminal symbol is encountered. Subsequently, the generated SMILES strings are evaluated using a reward function. Designing an appropriate reward function is essential to the success of reasonable molecule generation. In the backpropagation step, the reward scores are backpropagated through the selected nodes to update their node states. During the search process, ChemTS generates compounds to maximize the reward value.
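The effect of the exploration parameter C in the selection step can be illustrated with a minimal sketch. It assumes the standard UCB1 form (mean reward plus an exploration bonus); the exact expression used in ChemTS may differ, and the function names and the dictionary layout of a node are hypothetical.

```python
import math


def ucb_score(total_reward, visits, parent_visits, c):
    """UCB1-style score (assumed form): mean reward plus exploration bonus.

    Unvisited children receive +inf so that they are tried first.
    """
    if visits == 0:
        return math.inf
    exploitation = total_reward / visits
    exploration = c * math.sqrt(2.0 * math.log(parent_visits) / visits)
    return exploitation + exploration


def select_child(children, parent_visits, c):
    """Return the child with the highest UCB score.

    Each child is a dict with 'total_reward' and 'visits' keys
    (a hypothetical layout chosen for this illustration).
    """
    return max(
        children,
        key=lambda ch: ucb_score(ch["total_reward"], ch["visits"], parent_visits, c),
    )


# Child A is well explored and promising; child B is rarely visited.
children = [
    {"name": "A", "total_reward": 8.0, "visits": 10},
    {"name": "B", "total_reward": 1.2, "visits": 2},
]
print(select_child(children, parent_visits=12, c=0.1)["name"])  # small C favors A
print(select_child(children, parent_visits=12, c=1.0)["name"])  # large C favors B
```

With the small C the well-explored, high-reward child is selected, whereas the large C shifts priority to the rarely visited child, matching the trade-off described above.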

An RNN model was used as a policy network for SMILES string generation in both the expansion and simulation steps. The RNN model was trained to predict the probability distribution of the next atoms from a given partial SMILES string. Following a previous study,17 the RNN model was constructed based on a gated recurrent unit, and approximately 250,000 compounds from the ZINC database26 were used as the training dataset.

Objectives of Selective Inhibitor Design

To design highly selective tyrosine kinase inhibitors, we focused on three categories of objectives: inhibitory activities against tyrosine kinases, pharmacokinetic endpoints, and other properties, as listed in Table 1. For inhibitory activities, nine representative tyrosine kinases, including kinases from the EGFR family, were selected: EGFR, ERBB2, Abelson tyrosine-protein kinase (ABL), proto-oncogene tyrosine-protein kinase (SRC), lymphocyte-specific tyrosine-protein kinase (LCK), platelet-derived growth factor receptor beta (PDGFRβ), vascular endothelial growth factor receptor 2 (VEGFR2), fibroblast growth factor receptor 1 (FGFR1), and ephrin type-B receptor 4 (EPHB4). The selection procedure was as follows: first, EGFR bioactivity information, which was also analyzed in a previous study,11 was collected. Then, the tyrosine kinases for which the bioactivity of more than 500 compounds was available in ChEMBL2827 were selected. Finally, the 8 tyrosine kinases that shared more than 200 bioactive compounds with the EGFR dataset were selected as the target proteins to investigate selectivity in this study. For pharmacokinetic endpoints, three objectives were chosen: solubility, membrane permeability, and metabolic stability. For the other properties, six objectives were selected: the synthetic accessibility score (SAscore),28 quantitative estimate of drug-likeness (QED) score,29 molecular weight filter,11 oral rat acute toxicity,30 Tox alert filter,31,32 and ChEMBL structure filter.11 The objectives to be maximized were the inhibitory activity against the target protein, solubility, membrane permeability, metabolic stability, and QED. The objectives to be minimized were the inhibitory activities against the off-target proteins, acute toxicity, and SAscore. The binary objectives (0 or 1) were the molecular weight filter, the Tox alert filter, and the ChEMBL structure filter. For the molecular weight filter, the objective value was set to 1 if the weight of a given molecule ranged from 200 to 600 and 0 otherwise. The value of the Tox alert filter was set to 1 if none of the substructures in a given molecular structure were included in the Tox alert list and 0 otherwise; that is, compounds matching the list of substructures that are toxic or can cause side effects were penalized. For the ChEMBL structure filter, the value was set to 1 if the substructure in a given molecule was in any substructure set that appeared in at least five compounds in ChEMBL and 0 otherwise.
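The binary objectives can be sketched as simple indicator functions. This is an illustration only: the function names are hypothetical, the range 200 to 600 is assumed to be inclusive at both ends, and the list of matched alerts is assumed to come from a separate substructure search (for example with RDKit) that is not shown here.

```python
def mw_filter(mol_weight, low=200.0, high=600.0):
    """Return 1 if the molecular weight lies in [low, high], else 0.

    Inclusive bounds are an assumption; the text only states
    'ranged from 200 to 600'.
    """
    return 1 if low <= mol_weight <= high else 0


def tox_alert_filter(matched_alerts):
    """Return 1 if no toxicity alert substructure matched, else 0.

    'matched_alerts' is a hypothetical list of alert names produced
    by an external substructure search.
    """
    return 1 if len(matched_alerts) == 0 else 0


print(mw_filter(350.4))             # inside the allowed range
print(mw_filter(150.0))             # too light
print(tox_alert_filter([]))         # no alerts matched
print(tox_alert_filter(["nitro"]))  # one (hypothetical) alert matched
```

Because these values enter a geometric mean with nonzero weight, a filter value of 0 drives the whole reward to 0, which makes the filters act as hard constraints.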

Table 1. List of Optimization Objectives and Three Weight Patterns for Multiobjective Optimization.

category              objective                   prediction/calculation   weight: target-only   weight: selectivity
inhibitory activity   target protein              LightGBM                 8                     8
                      eight off-target proteins   LightGBM                 0                     1 (a)
pharmacokinetics      solubility                  LightGBM                 1                     1
                      permeability                LightGBM                 1                     1
                      metabolic stability         LightGBM                 1                     1
other property        SAscore                     RDKit                    1                     1
                      QED                         RDKit                    1                     1
                      MW                          RDKit                    1                     1
                      toxicity                    LightGBM                 1                     1
                      tox alert                   RDKit                    1                     1
                      ChEMBL structure            RDKit                    1                     1

(a) In the selectivity pattern, the objective function and the weights for off-target proteins were designed to reduce their inhibitory activities.

Because experimental kinase bioactivity, pharmacokinetic, and toxicity data do not cover the generated structures, prediction models were needed to obtain each objective value. For the bioactivity data, the pChEMBL value, which is equivalent to the pIC50 value, was used. To collect training datasets, the compounds with pChEMBL, membrane permeability, and metabolic stability values were retrieved from ChEMBL28;27 those with solubility and acute toxicity values were retrieved from Therapeutics Data Commons.33 To prepare an input feature for the model, RDKit software34 was used to calculate the Morgan fingerprint, whose radius and dimension were set to 2 and 2048, respectively. The light gradient boosting machine (LightGBM)35 was used as the prediction model. The performance of each model was assessed with 5-fold cross validation (Figure S1). In each fold, each dataset was split into two subsets: 80% for training and 20% for testing. When generating structures using ChemTS, each model was retrained on its full dataset.
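The 80%/20% split per fold follows directly from 5-fold cross validation. The sketch below generates such index splits with the standard library only; the function name and the fixed seed are hypothetical, and fingerprint calculation and LightGBM training are intentionally left out.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross validation.

    Indices are shuffled once, dealt into k folds, and each fold serves
    as the test set exactly once.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


for train, test in k_fold_indices(100):
    print(len(train), len(test))  # 80 training and 20 test samples per fold
```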

Multiobjective Optimization Strategy and Parameter Settings

Appropriately designing the reward function is essential for obtaining well-optimized candidate structures, especially in multiobjective problems. In this study, we used Dscore24 to handle our 18 objectives simultaneously, as shown in Table 1. Dscore is the weighted geometric mean of multiple values and is defined as

$$\mathrm{Dscore} = \left( \prod_{i=1}^{n} u_i^{\,w_i} \right)^{1 / \sum_{i=1}^{n} w_i} \qquad (1)$$

where n is the number of optimized items, w is the weight, and u is the objective value scaled from 0 to 1. Referring to a previous study,36 Gaussian modifiers were used to normalize each of the seven objectives: the inhibitory activities, solubility, membrane permeability, metabolic stability, QED, acute toxicity, and SAscore. Gaussian modifiers were defined for each objective such that the maximum value was returned when given the value that each objective was expected to take, as shown in Figure S2. Weights of each objective were set according to the importance levels of these objectives in a multiobjective problem. To verify the applicability of our method, we attempted to design nine weight patterns for selective inhibitors targeting each of the nine tyrosine kinases, as shown in Table 1. To account for selectivity, eight proteins other than the target protein were set as off-target proteins. The weight for each target protein in the reward function was set to 8 to maximize its inhibitory activity, while those for the other proteins were set to 0 (target-only pattern) or 1 (selectivity pattern). The threshold regarding the number of generated compounds was set to 200,000, and three values were used for the C parameter (0.2, 0.6, and 1.0) in the MCTS. Calculations were performed three times for each condition.
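A minimal sketch of the reward calculation is given below, assuming a plain Gaussian for the modifier. The actual modifiers are defined in Figure S2 and are not reproduced here; the parameter names mu and sigma, the function names, and the example values are hypothetical.

```python
import math


def gaussian_modifier(x, mu, sigma):
    """Scale a raw objective value to (0, 1]; returns 1.0 when x == mu.

    A plain Gaussian is an assumption made for this illustration.
    """
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2)


def dscore(scaled_values, weights):
    """Weighted geometric mean of scaled objective values u_i in [0, 1]."""
    if len(scaled_values) != len(weights):
        raise ValueError("scaled_values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("at least one weight must be positive")
    # Objectives with weight 0 are ignored (u ** 0 == 1).
    active = [(u, w) for u, w in zip(scaled_values, weights) if w > 0]
    if any(u == 0 for u, _ in active):
        return 0.0  # a single zero among weighted objectives zeroes the product
    log_sum = sum(w * math.log(u) for u, w in active)
    return math.exp(log_sum / total_weight)


# Target activity weighted 8, one off-target weighted 1 (selectivity pattern)
# or 0 (target-only pattern); scaled values are made up for illustration.
print(dscore([0.9, 0.4], [8, 1]))
print(dscore([0.9, 0.4], [8, 0]))
```

Computing the product in log space avoids underflow when many small scaled values are multiplied, which is relevant with 18 objectives.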

Results

Selective EGFR Inhibitor Design

Figure 2a,b shows the optimization processes for generating selective EGFR inhibitors using the proposed method with target-only and selectivity patterns. The details of the target-only and selectivity patterns are shown in Table 1. Here, the search parameter C was fixed to 0.2, and the moving average of the results of three runs with 200,000 molecule generations is shown. As shown in Figure 2a, when the off-targets were not considered, the inhibitory activity against EGFR increased, but the activities against the off-target proteins, especially PDGFRβ, also increased. When both the EGFR activity and selectivity for the eight off-targets were considered, only the inhibitory activity against EGFR increased, and sufficient selectivity was ensured (Figure 2b). Note that each line in Figure 2 shows the value of the moving average, and individual structures may have higher or lower selectivity for EGFR. Pharmacokinetics, toxicity, and drug-likeness properties were slightly worse in the selectivity pattern than in the target-only pattern. To design more selective inhibitors, we changed the weights of PDGFRβ, VEGFR2, and EPHB4 from 1 to 2 (emphasized selectivity pattern), as shown in Figure 2c, because these three proteins showed higher inhibitory activities than the other off-target proteins in the target-only and selectivity patterns. As a result, the selectivity here was higher than that in the selectivity pattern, as expected. However, the pharmacokinetics, toxicity, and drug-likeness properties were worse than those in the selectivity pattern. These results indicate that the multiobjective optimization process of our method worked well for generating the structures of selective inhibitors even with a considerably large number of objectives, and our method allowed us to control the strength of selectivity by changing the weight patterns. Since the selectivity pattern provided a relatively good balance between selectivity and the other properties, the following analyses use the results generated with the selectivity pattern.

Figure 2.

Comparison of optimization progress in the design of selective EGFR inhibitors for three weight patterns: (a) using the reward function without eight off-target proteins (target-only pattern), (b) using the default reward function (selectivity pattern), and (c) using the reward function with the emphasized weight setting (emphasized selectivity pattern). Upper panels represent the moving averages of the predicted inhibitory activity against the nine kinase proteins. Thick blue lines correspond to the inhibitory activity against EGFR, and the other lines correspond to the off-target proteins. Lower panels represent the moving averages of the predicted pharmacokinetics and drug-likeness properties scaled from 0 to 1. Details are shown in Figure S2.

Table 2 shows the representative molecules generated by the selectivity pattern. To select representative compounds with high Dscores, k-means clustering (k = 20) was applied to the compounds with Dscores that were in the top 5% (10,000 compounds). Structures with the largest Dscores in each cluster were extracted, and the top five structures and their predicted properties are shown in Table 2. The Dscore values and the predicted or calculated objective properties of the 20 structures are shown in Table S1. Molecular structures shown in Table 2 had sufficiently high predicted activities for EGFR. The difference between EGFR activity and the largest activity against the off-target proteins, ΔBA, was approximately 2, indicating that they had sufficient selectivity for EGFR. Pharmacokinetics, toxicity, and drug-likeness properties were also within the overall acceptable ranges, although some molecules had lower permeability values and SAscores. This result demonstrates the validity of the proposed method, although it is based on the scores of the predicted and calculated results.
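The selectivity measure ΔBA used here can be sketched as follows. The function name and the predicted values are hypothetical and serve only to illustrate the definition; because the activities are on a pIC50 scale, a ΔBA of 2 corresponds to a 100-fold difference in IC50.

```python
def delta_ba(predicted, target):
    """Difference between the target activity and the largest off-target activity.

    'predicted' maps protein names to predicted pIC50 (pChEMBL) values.
    """
    off_target_values = [v for name, v in predicted.items() if name != target]
    return predicted[target] - max(off_target_values)


# Made-up predictions for one generated structure (three of nine kinases shown).
predicted = {"EGFR": 7.6, "ERBB2": 5.5, "SRC": 5.1}
delta = delta_ba(predicted, "EGFR")
fold_selectivity = 10 ** delta  # ratio of IC50 values implied by the pIC50 gap
print(round(delta, 2), round(fold_selectivity))
```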

Table 2. Examples of the Generated Selective Inhibitors with High Dscores for Targeting EGFR and Their Predicted Properties (a).

(a) Generated structures with Dscores in the top 5% were divided into 20 clusters. Structures with the largest Dscores in each cluster were extracted, and top five structures and predicted properties are shown here. Predicted or calculated properties for each structure are as follows: Dscore, inhibitory activity against EGFR (EGFR) in pIC50, difference between the inhibitory activity against EGFR and the maximum inhibitory activity against the off-target proteins (delta bioactivity: ΔBA) in pIC50, solubility (Sol) in logS, Caco-2 permeability (Perm) in log(μcm/s), metabolic stability in human liver microsomes (Stab) in 1 h remaining (%), oral rat acute toxicity (Tox) in LD50, synthetic accessibility score (SA), and drug-likeness (QED).

Selective Inhibitor Designs for Kinase Homologs

To check the applicability of the proposed method, selective inhibitor generations were attempted for eight kinases other than EGFR, as shown in Figure 3. Here, 200,000 molecules were generated for each kinase with the selectivity pattern and the C parameter set to 0.2. For all targets, the proposed method succeeded in raising the activity against each main target above its initial value. In terms of selectivity, the four results obtained for EGFR, PDGFRβ, EPHB4, and ABL showed that each activity against a target protein was higher than the inhibitory activity against the other eight off-targets by at least 10 times (Figure 3a–d). In addition, the two results obtained for FGFR1 and VEGFR2 showed 10 times higher inhibitory activity than that of the other off-target proteins (Figure 3e,f). However, there was no selectivity for LCK, SRC, or ERBB2 (Figure 3g–i). In terms of pharmacokinetics, acute toxicity, metabolic stability, solubility, and membrane permeability were successfully optimized in most target patterns. The scaled acute toxicity, metabolic stability, solubility, and membrane permeability values fluctuated around 1.0, 0.7, 0.7, and 0.9, respectively. In terms of other properties, the SAscore and QED values gradually decreased; nevertheless, the scaled SAscores remained high, ranging from 0.7 to 0.9 (3.5 and 4.5 in nonscaled values).

Figure 3.

Molecule generation processes of the proposed method for selective inhibitors targeting kinase homologs. (a–i) represent the results obtained when targeting EGFR, PDGFRβ, EPHB4, ABL, FGFR1, VEGFR2, LCK, SRC, and ERBB2, respectively. Upper panels represent the moving averages of the predicted inhibitory activity against the nine kinase proteins. Thick lines show the inhibitory activity against the main target, and the other lines show the inhibitory activities against the off-target proteins. Lower panels represent the moving averages of the predicted pharmacokinetics and drug-likeness properties scaled from 0 to 1. Moving averages for each condition were calculated with a moving window size of 1000 using the average of the results of three runs. Here, the value of the exploration parameter C was set to 0.2. Results obtained with other values of C are shown in Figure S3.
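The curves described in this caption can be reproduced in outline as follows. The sketch assumes that the three runs are first averaged position by position and that a trailing window is then applied; the order of these steps, the window alignment, and the function names are assumptions, and a window of 3 replaces the window of 1000 to keep the example short.

```python
def mean_over_runs(runs):
    """Average several equally long runs position by position."""
    return [sum(values) / len(values) for values in zip(*runs)]


def moving_average(values, window):
    """Trailing moving average; the first output covers values[0:window]."""
    if window <= 0:
        raise ValueError("window must be positive")
    averaged = []
    running_sum = 0.0
    for i, value in enumerate(values):
        running_sum += value
        if i >= window:
            running_sum -= values[i - window]  # drop the value leaving the window
        if i >= window - 1:
            averaged.append(running_sum / window)
    return averaged


# Three short made-up runs of a predicted activity.
runs = [[5.0, 5.5, 6.0, 6.5], [5.2, 5.4, 6.1, 6.6], [4.8, 5.6, 5.9, 6.4]]
print(moving_average(mean_over_runs(runs), window=3))
```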

We also designed molecules with three different values of the C parameter to evaluate the effect of C on multiobjective optimization and molecular variety, as shown in Figure S3. Each objective value tended to converge faster as the C value decreased. In most cases, the inhibitory activities against each target protein and selectivity were better when C was set to 0.2. To quantitatively assess the diversity of the generated molecular structures, the number of unique scaffolds was counted, as shown in Figure 4. Blue and orange bars indicate the generic Murcko scaffold and the Murcko scaffold, respectively. As expected, a larger variety of molecules were generated with larger values of C.

Figure 4.

Comparison among the numbers of unique scaffolds in the generated compounds for the three patterns of the parameter C. (a–i) represent the results obtained when targeting EGFR, PDGFRβ, EPHB4, ABL, FGFR1, VEGFR2, LCK, SRC, and ERBB2, respectively. Blue and orange bars correspond to the generic Murcko scaffold and the Murcko scaffold, respectively.

Discussion and Conclusions

In this study, we developed a molecular structure generation method in combination with the Dscore technique as a reward function to simultaneously optimize multiple properties. Our method successfully optimized most of the 18 objectives, including inhibitory activities against tyrosine kinases, pharmacokinetics, toxicity, synthesizability, and drug-likeness properties, and successfully designed promising selective inhibitors for 6 kinases among the 9 total kinases based on the prediction models. The generated results showed that the proposed method could generate various selective inhibitors by adjusting the weight patterns and the exploration parameter C (Figures 3 and S3).

To understand how the generated structures achieved selectivity with the proposed method, additional analyses were performed. As an example of a large selectivity change, we focused on a generation run targeting EGFR with the selectivity pattern (Figure 5). In this run, the inhibitory activity against EGFR increased from approximately 6.0 to over 7.0 at the 117,000th compound, although the other inhibitory activities tended to decrease. To confirm what changes happened at that point, we extracted the 2000 molecular structures before and after the 117,000th structure; we then removed the structures with pChEMBL values that were over 6.5 from the former group and the structures with pChEMBL values that were less than 7.5 from the latter group, resulting in 592 (group A) and 580 (group B) structures remaining, respectively. To obtain representative structures for each group, k-means clustering (k = 6) was applied to each group. Figure 5b,c shows the structures with the largest inhibitory activities in the two groups. Molecular structures in group A had a five-membered ring (blue circle) attached to the thieno[2,3-d]pyrimidine (highlighted in Figure 5b,c), whereas the structures in group B had a six-membered ring (orange circles) attached to it. The proposed method successfully discovered this critical structural change, although laboratory experiments are needed to confirm that the change actually improves the inhibitory activities of the molecules.
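The construction of groups A and B can be sketched as a windowed filter over the sequence of predicted EGFR activities. The function name, the treatment of the pivot structure as part of the later window, and the toy values are assumptions made for illustration.

```python
def split_groups(activities, pivot, span=2000, low=6.5, high=7.5):
    """Return (group_a, group_b) around a pivot position.

    group_a: values in the 'span' structures before the pivot that are <= low.
    group_b: values in the 'span' structures from the pivot onward that are >= high.
    """
    before = activities[max(0, pivot - span):pivot]
    after = activities[pivot:pivot + span]
    group_a = [value for value in before if value <= low]
    group_b = [value for value in after if value >= high]
    return group_a, group_b


# Toy sequence of predicted pChEMBL values with a jump at position 3.
activities = [6.0, 6.8, 6.4, 7.6, 7.2, 7.9]
group_a, group_b = split_groups(activities, pivot=3, span=3)
print(group_a, group_b)
```

In the study the pivot was the 117,000th structure with a span of 2000, and each resulting group was then clustered with k-means (k = 6) to pick representative structures.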

Figure 5.

Structural analysis of the factors contributing to improving the inhibitory activity toward EGFR. (a) Optimization progress of inhibitory activity to kinase proteins in one run of the selectivity pattern targeting EGFR. Lower panel shows a close-up view of the area where the inhibitory activity for EGFR increased. The 2000 structures before and after the 117,000th structure are colored blue and orange, respectively. (b,c) Differences between the structural characteristics of molecules with and without high inhibitory activity against EGFR. (b) Representative molecules without high inhibitory activity against EGFR. Among the 2000 structures before the 117,000th structure [the blue-colored range in (a)], the molecules with inhibitory activity values that were less than 6.5 (group A) were divided into six clusters, and the molecules with the highest Dscores were selected. (c) Representative molecules with high inhibitory activity against EGFR. Among the 2000 structures after the 117,000th structure [the orange-colored range in (a)], the molecules with inhibitory activity values that were greater than 7.5 (group B) were divided into six clusters, and the molecules with the highest Dscores were selected. Molecules in group A had a five-membered ring attached to thieno[2,3-d]pyrimidine (highlighted in gray), whereas the molecules in group B had a six-membered ring attached to it. The predicted inhibitory activity toward EGFR (pChEMBL value) of each molecule is shown below.

Furthermore, to compare our method with that of a previous study,11 we performed molecular structure generation for the same selective inhibitor design tasks. In the previous study, the following 10 objectives were optimized: inhibitory activity against BACE1 and EGFR; pharmacokinetics and toxicity properties; and other properties, such as synthesizability and drug-likeness indices. The authors attempted to design three patterns of selective inhibitors: (1) maximizing the inhibitory activity against EGFR while minimizing the inhibitory activity against BACE1, (2) maximizing the inhibitory activity against BACE1 while minimizing the inhibitory activity against EGFR, and (3) maximizing the inhibitory activity against both EGFR and BACE1. In addition, nine objectives related to pharmacokinetics, toxicity, and other properties described in the Methods section were included in the reward function. In this comparison, LightGBM models were used to predict the inhibitory activities against BACE1 and EGFR. The prediction model for BACE1 was trained using compound information provided by Subramanian et al.,37 and for EGFR the model prepared in this study was used. Our method generally optimized several properties under the three conditions, as shown in Figure S4. Courses of optimization under the three conditions were similar to those in the previous study. In particular, the inhibitory activities against the two proteins were significantly optimized when the MCTS parameter C was set to 0.2. These results indicate that the performances achieved by our method under the setting of the previous study were comparable to those of the method in that study.11

Although the proposed method successfully designed selective inhibitors for six kinases, it generated structures with insufficient selectivity for three kinases (LCK, SRC, and ERBB2). The correlation between the prediction models may be one reason for the low selectivity of the molecules generated for these targets. For example, the correlation coefficients between the prediction models for LCK and SRC and each of the other models were higher than those of the other seven models (Figure S5), which may have resulted in the failure of the LCK and SRC selective inhibitor designs (Figure 3g,h). However, the ABL-selective inhibitor designs exhibited high selectivity despite their prediction model having relatively high correlations with the other models. Another reason could simply be that the number of generated structures was not sufficient for these tasks. Generating a larger number of structures might enable the design of structures with higher selectivity.

It should be noted that the generated structures were optimized based on the objectives, including the prediction values output by the LightGBM models. Although the models achieved satisfactory prediction performances (with R values of approximately 0.8 in most cases, as shown in Figure S1), the predicted values may differ from the experimental activity values. To confirm the reliability of the results obtained from predictions, we further evaluated the generated structures with high Dscores using other in silico software: AutoDock Vina38 and ADMETlab 2.0,39 as shown in Tables S2–S5. Results from these programs suggested that some generated structures are promising regardless of the C values. In addition, the applicability domain of each prediction model is limited to its training dataset. Therefore, the reliability of each prediction model is expected to be lower for generated structures that were not included in the training dataset. To overcome the abovementioned reliability problem, models that consider the applicability domain40 and the actual synthesis and evaluation of compounds are necessary.15,22

Data and Software Availability

Our implementation is publicly available on GitHub at https://github.com/molecule-generator-collection/ChemTSv2 under the MIT License. The README file at https://github.com/molecule-generator-collection/ChemTSv2/blob/master/doc/multiobjective_optimization_using_dscore.md provides information about how to set up and use the application. Training data and trained prediction models are stored at https://github.com/ycu-iil/prediction_model_collection under the MIT License.

Acknowledgments

This research was conducted in “Development of a Next-generation Drug Discovery AI through Industry-Academia Collaboration (DAIIA)” supported by Japan Agency for Medical Research and Development (AMED) under grant no. JP22nk0101111. This work was also supported by MEXT as a “Program for Promoting Researches on the Supercomputer Fugaku (Application of Molecular Dynamics Simulation to Precision Medicine Using Big Data Integration System for Drug Discovery)” and a Research Support Project for Life Science and Drug Discovery (project ID: JP22ama121023) from AMED.

Supporting Information Available

The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/acs.jcim.2c00787.

  • Correlation plots of the predicted objective values and corresponding experimental values; scaling functions for each objective; molecule generation processes of the proposed method for selective inhibitors targeting kinase homologs; molecule generation processes of the proposed method for selective inhibitors targeting EGFR and BACE1; correlation matrix between the prediction models; 20 examples of generated selective inhibitors with high Dscores targeting EGFR and their predicted properties; comparison of estimated binding affinities against EGFR (PDB: 2RGP) of the generated structures for each of the three patterns of parameter C; estimated binding affinity against EGFR (PDB: 2RGP) for the six approved drugs and the generated structures listed in Table S1; comparison of predicted pharmacokinetic and toxicity properties of the generated structures for each of the three patterns of parameter C using ADMETlab 2.0; and predicted pharmacokinetic and toxicity properties of the representative structures listed in Table S1 using ADMETlab 2.0 (PDF)

The authors declare no competing financial interest.

References

  1. Bendels S.; Bissantz C.; Fasching B.; Gerebtzoff G.; Guba W.; Kansy M.; Migeon J.; Mohr S.; Peters J.-U.; Tillier F.; Wyler R.; Lerner C.; Kramer C.; Richter H.; Roberts S. Safety screening in early drug discovery: An optimized assay panel. J. Pharmacol. Toxicol. Methods 2019, 99, 106609. 10.1016/j.vascn.2019.106609.
  2. Karaman M. W.; Herrgard S.; Treiber D. K.; Gallant P.; Atteridge C. E.; Campbell B. T.; Chan K. W.; Ciceri P.; Davis M. I.; Edeen P. T.; Faraoni R.; Floyd M.; Hunt J. P.; Lockhart D. J.; Milanov Z. V.; Morrison M. J.; Pallares G.; Patel H. K.; Pritchard S.; Wodicka L. M.; Zarrinkar P. P. A quantitative analysis of kinase inhibitor selectivity. Nat. Biotechnol. 2008, 26, 127–132. 10.1038/nbt1358.
  3. Selleckchem.com. https://www.selleckchem.com/EGFR(HER).html, 2022. (accessed on 3 June, 2022).
  4. Ruiz-Garcia A.; Bermejo M.; Moss A.; Casabo V. G. Pharmacokinetics in Drug Discovery. J. Pharm. Sci. 2008, 97, 654–690. 10.1002/jps.21009.
  5. Mignani S.; Rodrigues J.; Tomas H.; Jalal R.; Singh P. P.; Majoral J.-P.; Vishwakarma R. A. Present drug-likeness filters in medicinal chemistry during the hit and lead optimization process: how far can they be simplified? Drug Discovery Today 2018, 23, 605–615. 10.1016/j.drudis.2018.01.010.
  6. Nicolaou C. A.; Brown N. Multi-objective optimization methods in drug design. Drug Discovery Today: Technol. 2013, 10, e427–e435. 10.1016/j.ddtec.2013.02.001.
  7. Gillet V. J.; Khatib W.; Willett P.; Fleming P. J.; Green D. V. S. Combinatorial Library Design Using a Multiobjective Genetic Algorithm. J. Chem. Inf. Comput. Sci. 2002, 42, 375–385. 10.1021/ci010375j.
  8. Firth N. C.; Atrash B.; Brown N.; Blagg J. MOARF, an Integrated Workflow for Multiobjective Optimization: Implementation, Synthesis, and Biological Evaluation. J. Chem. Inf. Model. 2015, 55, 1169–1180. 10.1021/acs.jcim.5b00073.
  9. Li Y.; Zhang L.; Liu Z. Multi-objective de novo drug design with conditional graph generative model. J. Cheminf. 2018, 10, 33. 10.1186/s13321-018-0287-6.
  10. Zhou Z.; Kearnes S.; Li L.; Zare R. N.; Riley P. Optimization of Molecules via Deep Reinforcement Learning. Sci. Rep. 2019, 9, 10752. 10.1038/s41598-019-47148-x.
  11. Winter R.; Montanari F.; Steffen A.; Briem H.; Noé F.; Clevert D.-A. Efficient multi-objective molecular optimization in a continuous latent space. Chem. Sci. 2019, 10, 8016–8024. 10.1039/c9sc01928f.
  12. Khemchandani Y.; O’Hagan S.; Samanta S.; Swainston N.; Roberts T. J.; Bollegala D.; Kell D. B. DeepGraphMolGen, a multi-objective, computational strategy for generating molecules with desirable properties: a graph convolution and reinforcement learning approach. J. Cheminf. 2020, 12, 53. 10.1186/s13321-020-00454-3.
  13. Yasonik J. Multiobjective de novo drug design with recurrent neural networks and nondominated sorting. J. Cheminf. 2020, 12, 14. 10.1186/s13321-020-00419-6.
  14. Liu X.; Ye K.; van Vlijmen H. W. T.; Emmerich M. T. M.; IJzerman A. P.; van Westen G. J. P. DrugEx v2: de novo design of drug molecules by Pareto-based multi-objective reinforcement learning in polypharmacology. J. Cheminf. 2021, 13, 85. 10.1186/s13321-021-00561-9.
  15. Perron Q.; Mirguet O.; Tajmouati H.; Skiredj A.; Rojas A.; Gohier A.; Ducrot P.; Bourguignon M.-P.; Sansilvestri-Morel P.; Do Huu N. D.; Gellibert F.; Gaston-Mathé Y. Deep generative models for ligand-based de novo design applied to multi-parametric optimization. J. Comput. Chem. 2022, 43, 692–703. 10.1002/jcc.26826.
  16. Bung N.; Krishnan S. R.; Roy A. An In Silico Explainable Multiparameter Optimization Approach for De Novo Drug Design against Proteins from the Central Nervous System. J. Chem. Inf. Model. 2022, 62, 2685. 10.1021/acs.jcim.2c00462.
  17. Yang X.; Zhang J.; Yoshizoe K.; Terayama K.; Tsuda K. ChemTS: an efficient python library for de novo molecular generation. Sci. Technol. Adv. Mater. 2017, 18, 972–976. 10.1080/14686996.2017.1401424.
  18. Cho K.; van Merrienboer B.; Gulcehre C.; Bahdanau D.; Bougares F.; Schwenk H.; Bengio Y. Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation. Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing; EMNLP, 2014.
  19. Coulom R. Efficient Selectivity and Backup Operators in Monte-Carlo Tree Search. 5th International Conference on Computer and Games, 2006.
  20. Sumita M.; Yang X.; Ishihara S.; Tamura R.; Tsuda K. Hunting for Organic Molecules with Artificial Intelligence: Molecules Optimized for Desired Excitation Energies. ACS Cent. Sci. 2018, 4, 1126–1133. 10.1021/acscentsci.8b00213.
  21. Zhang Y.; Zhang J.; Suzuki K.; Sumita M.; Terayama K.; Li J.; Mao Z.; Tsuda K.; Suzuki Y. Discovery of polymer electret material via de novo molecule generation and functional group enrichment analysis. Appl. Phys. Lett. 2021, 118, 223904. 10.1063/5.0051902.
  22. Sumita M.; Terayama K.; Suzuki N.; Ishihara S.; Tamura R.; Chahal M. K.; Payne D. T.; Yoshizoe K.; Tsuda K. De novo creation of a naked eye–detectable fluorescent molecule based on quantum chemical computation and machine learning. Sci. Adv. 2022, 8, eabj3906. 10.1126/sciadv.abj3906.
  23. Fujita T.; Terayama K.; Sumita M.; Tamura R.; Nakamura Y.; Naito M.; Tsuda K. Understanding the evolution of a de novo molecule generator via characteristic functional group monitoring. Sci. Technol. Adv. Mater. 2022, 23, 352–360. 10.1080/14686996.2022.2075240.
  24. Cummins D. J.; Bell M. A. Integrating Everything: The Molecule Selection Toolkit, a System for Compound Prioritization in Drug Discovery. J. Med. Chem. 2016, 59, 6999–7010. 10.1021/acs.jmedchem.5b01338.
  25. Browne C. B.; Powley E.; Whitehouse D.; Lucas S. M.; Cowling P. I.; Rohlfshagen P.; Tavener S.; Perez D.; Samothrakis S.; Colton S. A survey of Monte Carlo tree search methods. IEEE Trans. Comput. Intell. AI in Games 2012, 4, 1–43. 10.1109/tciaig.2012.2186810.
  26. Sterling T.; Irwin J. J. ZINC 15—Ligand Discovery for Everyone. J. Chem. Inf. Model. 2015, 55, 2324–2337. 10.1021/acs.jcim.5b00559.
  27. Mendez D.; Gaulton A.; Bento A. P.; Chambers J.; De Veij M. D.; Félix E.; Magariños M. P.; Mosquera J. F.; Mutowo P.; Nowotka M.; Gordillo-Marañón M.; Hunter F.; Junco L.; Mugumbate G.; Rodriguez-Lopez M.; Atkinson F.; Bosc N.; Radoux C. J.; Segura-Cabrera A.; Hersey A.; Leach A. R. ChEMBL: towards direct deposition of bioassay data. Nucleic Acids Res. 2018, 47, D930–D940. 10.1093/nar/gky1075.
  28. Ertl P.; Schuffenhauer A. Estimation of synthetic accessibility score of drug-like molecules based on molecular complexity and fragment contributions. J. Cheminf. 2009, 1, 8. 10.1186/1758-2946-1-8.
  29. Bickerton G. R.; Paolini G. V.; Besnard J.; Muresan S.; Hopkins A. L. Quantifying the chemical beauty of drugs. Nat. Chem. 2012, 4, 90–98. 10.1038/nchem.1243.
  30. Zhu H.; Martin T. M.; Ye L.; Sedykh A.; Young D. M.; Tropsha A. Quantitative Structure-Activity Relationship Modeling of Rat Acute Toxicity by Oral Exposure. Chem. Res. Toxicol. 2009, 22, 1913–1921. 10.1021/tx900189p.
  31. Sushko I.; Salmina E.; Potemkin V. A.; Poda G.; Tetko I. V. ToxAlerts: A Web Server of Structural Alerts for Toxic Chemicals and Compounds with Potential Adverse Reactions. J. Chem. Inf. Model. 2012, 52, 2310–2316. 10.1021/ci300245q.
  32. Non MedChem-Friendly SMARTS. https://www.surechembl.org/knowledgebase/169485-non-medchem-friendly-smarts, 2022. (accessed on 3 June, 2022).
  33. Huang K.; Fu T.; Gao W.; Zhao Y.; Roohani Y.; Leskovec J.; Coley C. W.; Xiao C.; Sun J.; Zitnik M. Therapeutics Data Commons: Machine Learning Datasets and Tasks for Drug Discovery and Development. Proceedings of Neural Information Processing Systems; NeurIPS Datasets and Benchmarks, 2021.
  34. Landrum G. RDKit: Open-Source Cheminformatics. https://www.rdkit.org.
  35. Ke G.; Meng Q.; Finley T.; Wang T.; Chen W.; Ma W.; Ye Q.; Liu T.-Y. LightGBM: A Highly Efficient Gradient Boosting Decision Tree, 2017; Vol. 30.
  36. Brown N.; Fiscato M.; Segler M. H.; Vaucher A. C. GuacaMol: Benchmarking Models for de Novo Molecular Design. J. Chem. Inf. Model. 2019, 59, 1096–1108. 10.1021/acs.jcim.8b00839.
  37. Subramanian G.; Ramsundar B.; Pande V.; Denny R. A. Computational Modeling of β-Secretase 1 (BACE-1) Inhibitors Using Ligand Based Approaches. J. Chem. Inf. Model. 2016, 56, 1936–1949. 10.1021/acs.jcim.6b00290.
  38. Eberhardt J.; Santos-Martins D.; Tillack A. F.; Forli S. AutoDock Vina 1.2.0: New Docking Methods, Expanded Force Field, and Python Bindings. J. Chem. Inf. Model. 2021, 61, 3891–3898. 10.1021/acs.jcim.1c00203.
  39. Xiong G.; Wu Z.; Yi J.; Fu L.; Yang Z.; Hsieh C.; Yin M.; Zeng X.; Wu C.; Lu A.; Chen X.; Hou T.; Cao D. ADMETlab 2.0: an integrated online platform for accurate and comprehensive predictions of ADMET properties. Nucleic Acids Res. 2021, 49, W5–W14. 10.1093/nar/gkab255.
  40. Wang Z.; Chen J.; Hong H. Developing QSAR Models with Defined Applicability Domains on PPARγ Binding Affinity Using Large Data Sets and Machine Learning Algorithms. Environ. Sci. Technol. 2021, 55, 6857–6866. 10.1021/acs.est.0c07040.

Articles from Journal of Chemical Information and Modeling are provided here courtesy of American Chemical Society
