Abstract
Generative deep learning models enable data-driven de novo design of molecules with tailored features. Chemical language models (CLM) trained on string representations of molecules such as SMILES have been successfully employed to design new chemical entities with experimentally confirmed activity on intended targets. Here, we probe the application of CLM to generate multi-target ligands for designed polypharmacology. We capitalize on the ability of CLM to learn from small fine-tuning sets of molecules and successfully bias the model towards designing drug-like molecules with similarity to known ligands of target pairs of interest. Designs obtained from CLM after pooled fine-tuning are predicted active on both proteins of interest and comprise pharmacophore elements of ligands for both targets in one molecule. Synthesis and testing of twelve computationally favored CLM designs for six target pairs reveals modulation of at least one intended protein by all selected designs with up to double-digit nanomolar potency and confirms seven compounds as designed dual ligands. These results corroborate CLM for multi-target de novo design as source of innovation in drug discovery.
Subject terms: Computational chemistry, Cheminformatics, Medicinal chemistry
Chemical language models are deep learning models trained with molecules in string representation. They enable data-driven de novo design of molecules with tailored features. Here, the authors used chemical language models to design multi-target ligands.
Introduction
Many prevalent and severe diseases such as the metabolic syndrome (MetS) and chronic inflammatory disorders are multifactorial and involve dysregulation of several signaling systems or metabolic pathways (Fig. 1a)1–5. While typical treatment regimens of multifactorial diseases and multimorbidity heavily rely on drug combinations2,6,7 (also referred to as polypharmacy), designed polypharmacology, i.e., the development of molecules intentionally and simultaneously modulating more than one target involved in a disease, can achieve synergies, improve therapeutic outcomes, and overcome multi-drug treatment8,9. However, designing molecules that intentionally exhibit potent activity on several proteins of interest is challenging since compliance with pharmacophores for two or more target binding sites has to be achieved simultaneously8,9. Machine learning is increasingly used to accelerate innovation in drug discovery10 and might also aid the development of designed multi-target drugs.
Chemical language models (CLM, Fig. 1b)11–13 are deep learning models trained to capture string representations of molecules (i.e., the Simplified Molecular Input Line Entry System14, SMILES) and can be used to automatically design molecules with desired properties15–18. CLM have been successfully employed for de novo design of new chemical entities with intended bioactivity19–22 and enable navigation in the chemical space to obtain tailored molecular designs11,23. The data-driven de novo design approach using CLM is “rule-free” as CLM are capable of extracting relevant features, e.g., for bioactivity from the training molecules23. Therefore, CLM may also serve the task of multi-target design by accessing regions of the chemical space common to ligands of two proteins.
The development of a CLM typically proceeds in two steps20,24. Initial pretraining with a large collection of molecules allows the model to capture the syntax of SMILES. The pretrained general CLM is subsequently fine-tuned by transfer learning25 with a small set of molecules comprising the intended features which introduces a bias towards a region of interest in the chemical space11,23. The fine-tuned model can then be employed to design molecules with desired properties de novo. The critical step of fine-tuning has been successfully implemented with one to tens of known active molecules22. This ability of CLM to capture relevant features for bioactivity purely from molecular templates and to work in low-data regimes suggests great potential to enable data-driven multi-target design.
In this study, we systematically evaluated and optimized the application of CLM to design dual ligands for protein pairs of interest. We focused on combinations of six targets with a validated role in MetS associated disorders: (1) angiotensin II receptor type 1 (AT1), which exhibits a key signaling role in the cardiovascular system26, (2) farnesoid X receptor (FXR), a crucial hepatoprotective transcription factor and metabolic regulator27, (3) free fatty acid receptor 1 (FFAR1, GPR40), which is involved in the control of insulin and gut hormone secretion28, (4) peroxisome proliferator-activated receptor δ (PPARδ), a major regulator of lipid and glucose metabolism in multiple tissues29, (5) thyroid hormone receptor β (THRβ), which is under evaluation as a target in non-alcoholic fatty liver disease30, and (6) soluble epoxide hydrolase (sEH), which is implicated in cardiovascular and inflammatory diseases31,32. These targets cover a broad range of protein families, i.e., G-protein coupled receptors (AT1, FFAR1), nuclear receptors (FXR, PPARδ, THRβ) and enzymes (sEH), and ligands of these proteins have therapeutic relevance or were clinically evaluated for diseases related to the MetS. Of note, some approved drugs selectively targeting one of these proteins are regularly used together in combination therapy33 underlining the potential of designed dual ligands in MetS.
We trained CLM to generate dual ligands of six target pairs and synthesized top-ranking candidates for prospective validation. All twelve CLM designed compounds exhibited biological activity on at least one of the intended targets and seven dual modulators were successfully obtained for three target pairs. These results underscore the value of CLM to access desired regions of the chemical space and corroborate their application for automated de novo design of multi-target ligands.
Results
CLM capture molecular features of pooled ligand classes
To estimate the potential of pairs of the targets of interest to bind a common dual ligand, we evaluated the chemical space covered by their known modulators. Of note, while the numbers of known ligands for the targets of interest were high (Fig. 2a), dual ligands have only been annotated for FXR/sEH (34; potency cutoff <100 µM) and FXR/THRβ (1), and the pairwise similarity between known ligand sets for the targets of interest was low (Fig. 2b). Despite high chemical diversity within all known ligand sets for the targets of interest (EC50/IC50 ≤1 µM; mean ± SD pairwise Tanimoto similarity computed on Morgan fingerprints34: 0.17 ± 0.02 (sEH) to 0.19 ± 0.03 (AT1)), the ligands of all six targets populated defined and distinct regions of the chemical space (Fig. 2c) as illustrated by t-distributed stochastic neighbor embedding (t-SNE) with Morgan fingerprints or the fuzzier pharmacophore descriptors Chemically Advanced Template Search (CATS)35. Among the target pairs, an overlap was evident for FXR agonists and sEH inhibitors as well as FXR and THRβ agonists. Additionally, PPARδ agonists and sEH inhibitors populated proximal regions of the chemical space while ligands of FFAR1, AT1, and sEH were more distant and hence less similar. This was also evident from scaffold analysis indicating that known ligands of FXR and sEH as well as PPARδ and sEH (potency cutoff <100 µM) shared a higher number and more complex (higher no. of atoms) scaffolds than for the other target pairs (Fig. 2d). These results pointed to a substantially higher potential to obtain dual ligands for FXR/sEH, FXR/THRβ, and PPARδ/sEH than for other pairs prompting us to focus on these combinations first.
We intended to achieve multi-target design with CLM using tailored template molecule sets covering ligands for a target pair for fine-tuning. Thus, we retrieved known binders of the proteins of interest from BindingDB36 and clustered the ligands of each target based on their fingerprint similarity. Only the most potent compound from each cluster was kept to cover the entire known ligand space for each target and promote chemical diversity of the training molecules. After manual validation of the intended biological activity (e.g., agonism vs. antagonism) and binding mode (e.g., orthosteric vs. allosteric), 5–9 compounds per target were obtained as fine-tuning sets (Supplementary Table 1). The selected fine-tuning molecules were chemically diverse (Fig. 2e) and represented potent modulators of the targets of interest (Fig. 2f). All templates had no annotated dual activity on the studied targets.
For multi-target design model development, we built on a previously published20 CLM which was pretrained on 365k molecules from ChEMBL37 to capture SMILES syntax and general molecular properties. To achieve multi-target design, this general model was fine-tuned with the template molecule sets for a target pair. Three fine-tuning strategies, (1) sequential, (2) alternating and (3) pooled fine-tuning, were evaluated using beam search20 (width 50) to identify the best performing approach (Fig. 3a and Supplementary Fig. 1a). Beam search sampling is a heuristic method to reveal molecules with high probability to be sampled from a CLM and proved valuable to monitor fine-tuning effects20,22. Evaluation of beam search designs sampled during the transfer learning procedure for similarity to the fine-tuning molecules demonstrated substantial differences in the fine-tuning strategies.
Sequential fine-tuning successfully biased the model towards the first target of each pair within few epochs. However, similarity to known ligands of the first target was also rapidly lost when template molecules for the second target were introduced. Alternating fine-tuning had a similar effect with higher frequency and flipped the bias to either of the targets from epoch to epoch. The intended balanced similarity to both targets of a pair was only achieved by pooling both fine-tuning sets.
De novo designs obtained from the CLM after pooled fine-tuning exhibited favorable validity, novelty and uniqueness (Supplementary Tables 2 and 3) and had quantitative estimation of drug-likeness (QED) scores38 (Fig. 3b and Supplementary Fig. 1b) in the range of the template molecules suggesting that the multi-target design approach by CLM generated novel drug-like molecules. With an increasing number of fine-tuning epochs, the designs approached the pooled template sets in terms of basic features such as molecular weight (MW), clogP and topological polar surface area (TPSA; Fig. 3b and Supplementary Fig. 1b). A bias towards the chemical space around and between the template molecules after fine-tuning was also evident by visualization with t-SNE (Fig. 3c) which revealed that beam search designs from the selected epochs populated a region between known ligands for the targets of interest. Moreover, external target prediction using the Similarity Ensemble Approach (SEA)39 demonstrated substantially enhanced probability for interaction with the targets of interest after fine-tuning compared to baseline designs from epoch 0 (Fig. 3d). Synthetic accessibility40 of the designs was not compromised by pooled fine-tuning and remained in the favorable range of the template molecules (Fig. 3e).
CLM designed dual ligands modulate both intended targets
With these encouraging results, we set out to test the potential of CLM based multi-target design prospectively. From the dual-target CLMs obtained by pooled fine-tuning on ligand sets for FXR/sEH, FXR/THRβ and PPARδ/sEH, we identified epochs complying best with the design criteria, i.e., balanced similarity to the fine-tuning sets for both targets (epochs 51–55 for PPARδ/sEH; cf. Fig. 3c), and sampled 5000 designs by temperature sampling23. From each virtual collection, we selected twelve designs based on sampling frequency22 as model-intrinsic approach for automated design prioritization without external scoring, and twelve designs based on fingerprint similarity to the fine-tuning molecules for further evaluation (Supplementary Table 4). Three compounds per target pair were prioritized from these collections by docking for prospective validation by synthesis and in vitro characterization. Two compounds (1, 2) for the FXR/sEH target pair resembled previously developed dual FXR/sEH modulators41 thus strongly pointing to dual activity. Of note, these previously known dual FXR/sEH ligands were not used to develop the model. As an additional proof of concept, we applied the design approach to the FXR/PPARδ target pair (Supplementary Table 5) for which dual agonists have been previously obtained by rational design42. A CLM fine-tuned with the pooled selective FXR and PPARδ ligand template sets (not containing dual ligands) successfully generated known dual FXR/PPARδ agonists and several structurally similar molecules supporting the applied design strategy.
The CLM designed dual ligand candidates 3–9 were successfully synthesized over 2–6 steps with 1–68% overall yield (Fig. 4) demonstrating that the dual target CLM generated synthesizable molecules. Compound 7 was originally designed with tert-butyloxy motif which was replaced by tert-butyl for improved chemical stability. The dual ligand candidates were characterized in functional assays for the targets of interest using cellular Gal4-hybrid reporter gene assays43 to determine FXR, THRβ and PPARδ modulation and an enzyme activity assay44 with the fluorogenic substrate PHOME to test sEH inhibition (Table 1). In vitro characterization of the CLM designed dual FXR/sEH modulators 1–3 confirmed engagement of both targets by all three compounds with sub-micromolar to low micromolar potencies. The dual FXR/THRβ agonist designs obtained from the model exhibited substantial FXR agonism combined with weak (6) or no detectable (4, 5) THRβ agonism at non-toxic concentrations. All designed PPARδ/sEH modulators (7–9) were confirmed as dual ligands of the intended targets. 7 emerged as remarkably potent sEH inhibitor endowed with micromolar PPARδ agonist activity. 8 and 9 were less potent on both targets.
Table. 1.
ID | Structure | In vitro biol. activity |
---|---|---|
1 |
FXR: EC50 = 0.37 µM sEH: IC50 = 4.1 µM |
|
2 |
FXR: EC50 = 0.44 µM sEH: IC50 = 8.1 µM |
|
3 |
FXR: 3.4-fold act. at 50 µM sEH: IC50 = 21 µM |
|
4 |
FXR: EC50 = 12 µM THRβ: inactive |
|
5 |
FXR: EC50 = 11 µM THRβ: inactive |
|
6 |
FXR: EC50 = 1.0 µM THRβ: 5-fold act. at 15 µM |
|
7 |
PPARδ: EC50 = 10 µM sEH: IC50 = 0.017 µM |
|
8 |
PPARδ: 6.7-fold act. at 80 µM sEH: IC50 = 0.20 µM |
|
9 |
PPARδ: 3.0-fold act. at 80 µM sEH: IC50 = 0.27 µM |
At least one dual ligand was identified for all three target pairs. Activities were determined in Gal4-hybrid reporter gene assays (FXR, THRβ, PPARδ) and an enzyme activity assay (sEH). Activity data are the mean; n ≥ 3.
These results from in vitro testing corroborated multi-target design by CLM. Seven out of nine dual ligands designed by the models modulated both intended targets. With potencies varying between 17 nM and >100 µM, and mostly unbalanced activity on the respective target pair, the CLM designs require structural optimization, but their scaffolds are highly valuable leads for designed polypharmacology.
CLM designs fuse molecular and pharmacophore features of ligands for two targets
The predicted binding modes of the dual modulators 1, 6 and 7 to the targets of interest (Fig. 5a–c) revealed key interactions that were also formed by the co-crystallized ligands. The designs thus complied with the pharmacophore requirements of the respective binding sites indicating that the CLM could capture relevant pharmacophore elements for the respective two binding sites and fuse them in one molecule. This was also evident from structural comparison of the CLM designed dual ligands with the most similar ligands annotated in ChEMBL for the respective targets (Fig. 5d). The dual FXR/THRβ modulator 6 combined the well-known 5-alkyl-3-phenylisoxazole motif of FXR agonists with a (4-benzylphenoxy)acetic acid residue inherited from THRβ agonists. The latter group at the same time resembled aromatic linkers found in FXR agonists bearing the essential carboxylic acid motif. Similarly, the dual PPARδ/sEH ligand 9 comprised a hydrophobic urea scaffold widely found in sEH inhibitors which was merged with a 3-(4-alkyloxyphenyl)propanoic acid pharmacophore of PPARδ agonists. Dual ligands designed by the CLM thus comprised merged pharmacophores8,9, were drug-like with similar QED scores as the template molecules and exhibited preferable synthetic accessibility (Fig. 5e and Supplementary Fig. 2). Due to the structural fusion of ligand features for the proteins of interest, the designed dual ligands exhibited similarity to known modulators of both their targets (Supplementary Table 6) and resembled the respective fine-tuning sets in basic properties like MW, clogP and TPSA (Fig. 5f and Supplementary Fig. 2).
Comparison of the designed dual ligands with the fine-tuning molecules (Fig. 6a and Supplementary Fig. 3) indicated that the CLM fused pharmacophore elements with varying degrees of structural diversity. The designs exhibited low to intermediate Tanimoto similarity to their respective most similar fine-tuning templates and Rapid Similarity Calculation of Maximum Common Edge Subgraph (RascalMCES)45, which identify maximum common sub-skeletons of pairs of molecules, revealed substructure contributions from both selective templates to the dual modulators. To verify that the polypharmacology of the dual ligands was indeed due to the pharmacophore fusion and structural modification designed by the CLM, we tested the commercially available fine-tuning compounds for PPARδ/sEH as representative target pair in the same assays as the designed dual ligands (Table 2). The evaluated fine-tuning molecules were selective for their annotated bioactivity over the second target of the respective designs providing further support for the ability of CLM to fuse selective pharmacophores to dual ligands. Although a potential bias from other molecules used to (pre-)train the CLMs, which act as dual ligands but have not been tested on the targets of interest, cannot be fully excluded, these results indicate that structural fusion and moderate structural modification by the CLM compared to the selective templates was driving the designed polypharmacology.
Table. 2.
Fine-tuning molecule | EC50(PPARδ) [µM] | IC50(sEH) [µM] | Tanimoto |
---|---|---|---|
0.016 | >30 | 0.17; 0.13; 0.13 | |
0.022 | >30 | 0.20; 0.15; 0.16 | |
0.009 | >30 | 0.13; 0.11; 0.11 | |
>50 | 0.005 | 0.55; 0.25; 0.27 | |
>50 | 0.097 | 0.38; 0.19; 0.19 | |
>50 | <0.001 | 0.11; 0.13; 0.13 |
Commercially available fine-tuning molecules for PPARδ/sEH were experimentally confirmed selective for their annotated bioactivity over the respective second target in the assays used to characterize the dual PPARδ/sEH ligand designs 7–9. Data are the mean; n = 3. Tanimoto refers to the fine-tuning molecules’ similarity to the experimentally tested dual ligand designs 7–9.
To validate our design approach further, we evaluated whether the CLM learned target-specific SAR features from the pooled dual fine-tuning by comparing with designs from a native CLM without fine-tuning and with designs obtained after fine-tuning with the pooled templates for all four targets (Fig. 6b–d). In contrast to pooled dual fine-tuning, the Tanimoto similarity to the template sets did not consistently increase when the CLM was fine-tuned with the pooled templates for FXR, THRβ, PPARδ and sEH (Fig. 6b) suggesting potentially that simultaneous training with molecular information for four targets was too diverse to extract the relevant structural features or too complex to be structurally fused. Similarly, target prediction for the designs obtained after pooled fine-tuning for four targets revealed increasing scores but no consistent balanced improvement and strong variance over the fine-tuning procedure (Fig. 6c).
We then used pharmacophore models for the targets of interest (Fig. 6d and Supplementary Fig. 4) to estimate the rates of designed dual ligands obtained from a native baseline CLM, after pooled fine-tuning for target pairs (dual), and after fine-tuning with the pooled templates for all four targets. The fraction of designs from the baseline CLM matching both pharmacophore models of a target pair of interest was small (Fig. 6d) aligning with very low scores in the target prediction for the native model (Fig. 6c, epoch 0—baseline model). Pooled fine-tuning with the templates for all four targets slightly enhanced the fraction of designs matching pharmacophore models for the target pairs but was inferior to the focused dual models. These results further corroborated the pooled fine-tuning for target pairs to achieve dual ligand design by CLM.
Dual modulator design by CLM fails for target pairs with highly diverse ligands
Intrigued by the favorable results for CLM-driven dual ligand design we next addressed the target pairs whose known ligands populated more distant regions in the chemical space (Fig. 2c). Using the same procedures that successfully generated dual FXR/sEH, FXR/THRβ, and PPARδ/sEH modulators, we designed dual ligand candidates for AT1/sEH, FFAR1/sEH, and AT1/FFAR1 and selected one design per target pair for synthesis based on sampling frequency and subsequent docking. Again, the computationally favored designs 10–12 were synthesizable with good overall yields (14–40% over 4 steps; Fig. 7a). For improved accessibility, the dual FFAR1/sEH modulator design 11 was slightly modified by replacing a cyclopropyl substituent suggested by the CLM with an ethyl group. In vitro testing in cellular assays for AT1 and FFAR1, and an enzyme activity assay for sEH revealed biological activity for all three designs (Table 3) but no design exhibited the intended dual modulation. The dual AT1/sEH (10) and FFAR1/sEH (11) modulator candidates potently inhibited sEH but were inactive on the intended GPCRs. The dual AT1/FFAR1 design 12 activated FFAR1 but showed no effect on AT1 activity. Despite only testing one design per target pair, these results may suggest that the CLM was able to capture features relevant for modulation of the targets of interest but failed to access a chemical space region of dual ligands. However, based on the different distribution of known modulators for these targets in the chemical space (cf. Fig. 2c) it can be speculated whether such common region exists. Of note, ChEMBL contains no molecule with an annotated dual activity at <100 µM on AT1/sEH, FFAR1/sEH or AT1/FFAR1 (cf. Fig. 2a). The failure to access a chemical space of dual ligands for the targets of interest could also be observed by target prediction (SEA, Fig. 7b) which suggested improvement for only one target (FFAR1/sEH) or varying performance (AT1/FFAR1). Only the dual AT1/sEH focused designs tended to have a higher probability for activity on both targets at later fine-tuning epochs.
Table 3.
ID | Structure | In vitro biol. activity |
---|---|---|
10 |
AT1: inactive sEH: IC50 = 0.02 µM |
|
11 |
FFAR1: inactive sEH: IC50 = 0.061 µM |
|
12 |
FFAR1: EC50 = 8.5 µM AT1: inactive |
The dual AT1/sEH (10), FFAR1/sEH (11) and FFAR1/AT1 (12) ligand designs modulated only one of the targets of interest. Data are the mean; n ≥ 3.
Discussion
Generative deep learning is on the rise as a tool to design innovative new chemical entities (NCEs) with tailored properties such as bioactivity on an intended target. Such de novo design by deep learning is data-driven and overcomes the need for predefined rules, thus offering potential access to a wider chemical space11,23. Among various graph-based and sequence-based approaches19–22,46–49, CLM have been particularly successful in designing NCEs with experimentally confirmed activity which was achieved by fine-tuning with known ligands for the targets of interest as templates. Based on this valuable performance of CLM in transfer learning even in low-data scenarios22,23, we here sought to extend the scope of CLM application to multi-target design, i.e., de novo design of NCEs with biological activity on two intended targets. Designed multiple ligands of proteins that are co-involved in one pathology can exhibit enhanced efficacy by exploiting synergistic effects and may be valuable for drug development. For example, in multifactorial pathologies like the MetS or chronic inflammatory diseases, designed polypharmacology could be an avenue to improve therapeutic outcomes and overcome polypharmacy. However, the development of multi-target ligands is challenging and typically requires sophisticated design and extensive optimization. CLM could accelerate solving this task as an innovative data-driven approach to designed polypharmacology.
CLM learn from data and can capture higher-order chemical features from the molecular string representation of SMILES to achieve rule-free de novo design. Fine-tuning with a tailored set of molecules comprising the desired features is the critical step in CLM development and allows biasing the model towards a chemical space region of interest. We reasoned that data-driven de novo design by CLM could offer access to multi-target ligands by employing sets of known modulators for target pairs of interest for fine-tuning. Pooled sets of template ligands indeed successfully biased the model to design molecules exhibiting similarity to both template sets. Sequential fine-tuning, in contrast, produced a bias towards one target per pair and the alternating approach seemed to rather disturb the model. Designs obtained from CLM after pooled fine-tuning resembled the combined fine-tuning sets in terms of basic molecular features and had higher predicted probabilities for interaction with the targets of interest than designs from the baseline model.
We selected multi-target de novo designs, that were computationally favored based on similarity and on sampling frequency as a model intrinsic measure, for experimental validation. Synthesis and in vitro evaluation confirmed all twelve tested designs as active on at least one of the targets of interest and seven out of twelve modulated both intended targets with varying potencies. This high success rate in obtaining dual ligands may be due to a focus on target pairs with well compatible binding sites and is likely not representative of a broader application of CLM for multi-target design. Still, this outcome demonstrates potential of CLM in designing dual modulators as innovative bioactive NCEs.
Experimentally confirmed dual ligands were successfully obtained for FXR/sEH, FXR/THRβ, and PPARδ/sEH while the computationally most favored designs obtained from the CLM for AT1/sEH, FFAR1/sEH, and AT1/FFAR1 did not act as dual modulators but were selective for one of the targets of interest. Known ligands of target pairs for which dual ligands were successfully obtained populate proximal regions in the chemical space indicating higher chemical similarity compared to ligands of AT1/FFAR1 and sEH. Despite only testing one design per target pair for the less similar combinations, these observations indicate that de novo design of multi-target ligands by CLM is feasible in scenarios where the template molecules representing the targets of interest are close in the chemical space and more challenging for target combinations whose ligands are chemically more diverse—a scenario that also hinders multiple ligand development by traditional, systematic means8. While target pairs with too diverse ligand binding sites will likely not be accessible for designed polypharmacology due to their inability to bind a common ligand, dual modulation of target combinations with intermediate ligand similarity might be achievable with further improved CLM, combined approaches like interactome learning50, or newer models like GPT.
Comparison with the baseline model and a CLM fine-tuned with all template molecules confirmed that the focused dual ligand CLMs captured target-specific ligand features. Designs with confirmed dual activity obtained from these dual ligand CLM comprised structural contributions from template ligands for both targets highlighting an ability of CLM to merge pharmacophores. In contrast to linking or fusing the chemical features relevant for activity, dual ligands with merged pharmacophores offer favorable drug-like properties and are most difficult to obtain in designed polypharmacology8,9. These favorable results therefore corroborate CLM for multi-target de novo design.
Methods
Computational methods
Software
Python (v3.7.13 for CLM and v3.9.16 for visualizations), RDKit (v2022.09), DataWarrior (v5.5.0), tensorflow (v2.14.0), keras (v.2.14.0) scikit-learn (v1.2.2), matplotlib (v3.8.4), pandas (v.2.2.2) and numpy (v.1.26.4) in Python (v3.7.13). Molecular Operating Environment51 (MOE, version 2022.02, Chemical Computing Group Inc. Montreal, QC, Canada) for molecular docking and pharmacophore generation and search.
Data processing
Molecules were encoded as canonical SMILES using RDKit (v2022.09, www.rdkit.org). Only SMILES up to 140 characters in length were retained and standardized in Python (v3.9.16, www.python.org) by removing stereochemistry, salts, and duplicates.
Fine-tuning sets
Molecules with IC50/EC50 ≤ 1 µM on the targets of interest were selected from BindingDB36 (v2021). After processing (cf. Data processing), the molecules were clustered based on Morgan fingerprints34 (length = 1024, radius = 2) using the k-means algorithm from scikit-learn (v1.2.2, www.scikit-learn.org) in Python (v3.9.16). The cluster number was optimized from 2 to the number of molecules to minimize the silhouette score. From the resulting clusters, the molecule with the lowest EC50/IC50-value was selected as representative. From this collection, the final fine-tuning molecules for each target were selected based on manual binding mode inspection and confirmation of the desired biological activity (agonism, inhibition, etc.).
CLM implementation
We used a recently published framework (www.github.com/ETHmodlab/ molecular_design_with_beam_search)20 to implement the multi-target CLM in Python (v3.7.13) using tensorflow (v2.14.0) and keras API (www.tensorflow.org, v.2.14.0). The model was based on a recurrent neural network with long short-term memory (LSTM) cells and consisted of four layers with a total of 5,820,515 parameters: layer 1, BatchNormalization; layer 2, LSTM with 1024 units; layer 3, LSTM with 256 units; layer 4, BatchNormalization. The CLM was trained using the Adam optimizer (learning rate = 10−3), categorical cross-entropy loss, and a dropout rate of 0.4 for both LSTM layers, with the first layer set to frozen. The CLM was trained on SMILES strings encoded as one-hot vectors and all SMILES were augmented 10-fold.
Training strategies
In each strategy, the CLM was trained over 60 epochs. Pooled: both fine-tuning sets were used together in a single set to fine-tune the CLM. Alternating: the two fine-tuning sets were alternatingly presented to the CLM in each epoch. Sequential: the CLM was first trained with one fine-tuning set for 60 epochs. Then, based on the maximum similarity of the beam designs to the fine-tuning set, one epoch was selected to continue training with the other fine-tuning set.
Beam search and epoch selection
We used beam search20 with a beam width of 50 to monitor fine-tuning and select epochs for sampling. The maximum SMILES string length was defined as 140 tokens.
Temperature sampling
SMILES were sampled using the softmax function parameterized by two different sampling temperatures (0.2 and 0.7). The probability of the ith character to be sampled from the CLM was computed as: where is the CLM prediction for character i, T is the temperature, and qi is the sampling probability of character i.
Top 12 selection
For both sampling temperatures (0.2, 0.7) we selected 12 CLM designs with highest sampling frequency and 12 CLM designs with highest geometric mean similarity to their fine-tuning sets. The geometric mean between frequency and similarity led to the selection of the top 12 candidate designs.
Molecular docking
From the top 12 candidate designs, molecules were selected for synthesis and in vitro characterization by molecular docking. Docking was performed in Molecular Operating Environment51 (MOE, version 2022.02, Chemical Computing Group Inc. Montreal, QC, Canada). The ligand-bound X-ray structures of sEH (pdb ID: 5ali52), FXR (pdb ID: 6a6053), THRβ (pdb ID: 1nax54), PPARδ (pdb ID: 5y7x), AT1 (pdb ID: 4zud55) and FFAR1 (pdb ID: 4phu56) served as templates. The structures were prepared using the MOE QuickPrep tool with default settings, adjusting the protonation state of the complex. Ligands were prepared using the MOE Wash tool with dominant protonation state at pH 7.0; coordinates were rebuilt 3D; existing chirality was maintained. The following settings were used for all docking calculations: Force Field = Amber10:EHT, Receptor = Receptor and Solvent Atoms, Site = Crystallized Ligand Atoms, Placement = Template with 100 poses, Refinement = Rigid Receptor, scoring function = GBVI/WSA dG with 10 poses.
Molecular descriptors
Morgan fingerprints34 (length = 1024, radius = 2), RascalMCES45 (similarityThreshold = 0.5) and quantitative estimation of drug-likeness38 (default settings) were computed using RDKit (v2022.09) in Python (v3.9.16). Total molweight, cLogP and TPSA were computed using DataWarrior57 (v5.5.0). Synthetic accessibility score was computed using SwissADME40. CATS35 descriptors were computed with https://github.com/iwatobipen/CATS2D.
Target prediction
The web interface of the Similarity ensemble approach (SEA)39 (https://sea.bkslab.org/) was used for target prediction.
Stochastic neighbor embedding
The t-SNE projection was performed with scikit-learn (v1.2.2) in Python (v3.9.16) on Morgan fingerprints and on CATS35 descriptors. The following settings were chosen: perplexity = 30, learning_rate = “auto”, init = “pca”.
Pharmacophore modeling and search
For pharmacophore modeling, the ligands of the superimposed X-ray structures of sEH (pdb IDs: 5ali52, 3ant58, 3wke59), FXR (pdb IDs: 6a6053, 3dcu60, 3fli61), THRβ (pdb IDs: 1nax54, 1r6g62, 6kkb63) and PPARδ (pdb IDs: 5y7x, 3tkm64, 5u4665) were used in Molecular Operating Environment51 (MOE, version 2022.02, Chemical Computing Group Inc. Montreal, QC, Canada). A consensus pharmacophore model was generated per target considering features present in at least 2 ligands. Binding site residues of the target were defined as excluded volumes. The sampled designs of the respective target pairs were screened with the respective pharmacophore models for the targets of interest.
In vitro assays
Cell culture for stably transfected CHO-K1 cell lines
CHO-K1 cells (DSMZ, ACC110) were cultured at 5% CO2 and 37 °C in full growth medium (Ham’s F-12 medium supplemented with 10% fetal calf serum (FCS), penicillin (100 U/mL) and streptomycin (100 µg/mL)). For harvesting cells were incubated with Trypsin/EDTA. IP-One assay for AT1 and FFAR1. To investigate the ligand dependent activation of the two GPCRs, inositol mono-phosphate (IP-One) accumulation was monitored using an HTRF based displacement assay between FRET acceptor coupled IP-One and Terbium cryptate coupled anti-IP-One antibody (IP-One assay kit, Cisbio Bioassays, Codolet, France). AT1: the IP-One assay for AT1 was performed according to the protocol published by Hernandez-Olmos et al.66 In brief, by using the sleeping beauty method a CHO-K1 cell line was generated which stably overexpresses AT1 isoform 1 and GNA11 (G-protein subunit α 11) as well as cells only overexpressing the GNA11 without any GPCR, as control. Cells were seeded into white tissue culture 384-well plates (Greiner Bio-One, Frickenhausen, Germany) at 15,000 cells/well and incubated overnight at 37 °C and 5% CO2. The next day the medium was removed and the cells were washed four times with stimulation buffer (146 mM NaCl, 4.2 mM KCl, 1 mM CaCl2, 0.5 mM MgCl2, 50 mM LiCl, 5.5 mM D-glucose, 0.1% (w/v) fatty acid-free bovine serum albumin fraction V buffered with 10 mM HEPES at pH 7.4 (NaOH)) using a Tecan HydroSpeed plate washer (Tecan Deutschland GmbH, Crailsheim, Germany). Thereafter, the compounds and a total of 0.5% DMSO were added to the cells. The plate was sealed and incubated for 90 min at 37 °C. Afterwards the cells were lysed by addition of the detection agents prepared in lysis buffer according to the manufacturer′s instructions. To calculate the IP-One concentration produced by the cells, a standard curve using dilutions of unlabeled IP-One in buffer without cells was used. For control of assay performance, a full dose-response curve of the known agonist [Val5]-Angiotensin II (Sigma-Aldrich #A2900) was conducted. Antagonist activity of compounds was measured with parallel stimulation with 10 nM [Val5]-Angiotensin II. FFAR1: the cell line generation for FFAR1 was conducted according to the protocol published by Ehrler et al.67 and the second messenger assay was performed as described for AT1. A full dose-response curve with FFAR1 reference agonist GW9508 was measured as control for assay performance. All compounds were tested at the indicated concentration and CHO-K1 wt cells were used as control.
sEH activity assay
Soluble epoxide hydrolase (sEH) was purified as described by Hahn et al.68 and Lukin et al.69, respectively. Protein expression was carried out in 500 ml ZYP5052 autoinduction medium containing Kanamycin (100 mg/L) with lactose uptake-dependent induction. The fermentation procedure started at 37 °C and 200 rpm until the OD600 reached approximately 1.0. Then the temperature was reduced to 16 °C. After 24–36 h the cells (OD600 ≈ 15.0) were harvested by centrifugation at 5000 × g (4 °C, 20 min). Pellets were resuspended in a resuspension buffer containing Tris (50 mM), NaCl (0.5 M) and MgCl2 (10 mM) at pH 8 and sonicated twice for 15 s on ice. The sonicated suspension was centrifuged (18,000 × g, 30 min, 4 °C) and the supernatant was ultracentrifuged (100,000 × g, 1 h 10 min, 4 °C). The ultracentrifugation supernatant was loaded on a 5 ml HisTrap® FF (GE Healthcare, München, Germany) column at a flow rate of 2 ml/min and washed with washing buffer containing Tris (50 mM), NaCl (0.5 M), MgCl2 (10 mM), imidazole (20 mM) and DTT (3 mM) at pH 8, and eluted at 120 mM imidazole. The fractions containing hsEH after SDS-PAGE fraction analysis were pooled. The identity of hsEH was confirmed by Western Blotting and hsEH hydrolase activity assay. The inhibitory potency of the compounds was determined using a fluorescence-based enzyme activity assay in which PHOME (3-phenyl-cyano(6-methoxy-2-naphthalenyl)methylester-2-oxiraneacetic acid) is metabolized by sEH hydrolase towards a fluorescent naphthalene aldehyde70. Dilution series of the compounds were incubated for 30 min with recombinant sEH (final protein concentration 3 nM). Subsequently, the fluorogenic substrate PHOME at final concentration of 50 μM was added, and the fluorescence was measured every minute for 45 min using a Tecan plate reader (ex: 360 nm, em: 465 nm; bandwidth 35 nm). Inhibition in percent was plotted against the logarithmic test compound concentration and IC50-values were calculated using the “log(Inhibitor) vs. Response – Variable slope (four parameters)” equation in GraphPad Prism 7.
Hybrid Gal4 reporter gene assays
HEK293T cells (ATCC, CRL-1573) were cultured in Dulbecco’s modified eagle medium (DMEM), high glucose, supplemented with 10% FCS, sodium pyruvate (1 mM), penicillin (100 U/mL), and streptomycin (100 µg/mL) to 70–80% confluence at 37 °C and 5% CO2 and seeded in clear 96-well plates with a density of 40,000 cells/well. After 24 h, the medium was changed to Opti-MEM without supplements. Transient transfection was performed using Lipofectamine LTX reagent (Invitrogen, Carlsbad, CA, USA) according to the manufacturer’s protocol with pFR-Luc (100 ng/well) as reporter plasmid, pRL-SV40 (2 ng/well) for normalization of transfection efficiency and cell growth, and one pFA-CMV-hNR-LBD clone coding for the hinge region and ligand binding domain (LBD) of the NR of interest. Five hours after transfection, the cells were incubated with Opti-MEM supplemented with penicillin (100 U/mL) and streptomycin (100 µg/mL), additionally containing 0.1% DMSO and the respective test compound or 0.1% DMSO alone as untreated control. After overnight (14–16 h) incubation, luminescence was measured using the Dual-Glo® Luciferase Assay System (Promega, Madison, WI, USA) according to the manufacturer’s protocol with a Tecan Spark® 10 M multimode microplate reader. Relative light units (RLU) for normalization of transfection efficiency and cell growth were calculated by dividing firefly luminescence by renilla luminescence and multiplication by 1000. Fold activation was obtained by dividing the mean RLU of a test compound at a respective concentration by the mean RLU of the untreated control. All samples were tested in duplicates in at least three biologically independent experiments. For dose response curve fitting and calculation of the EC50/IC50 values, the equation “[Agonist]/[Inhibitor] vs. response – Variable slope (four parameters)” was used in GraphPad Prism 7. The following pFA-CMV-hNR-LBD clones and reference ligands were used: THRβ (pFA-CMV-hTHRβ-LBD71, 1 µM T3), PPARδ (pFA-CMV-hPPARδ-LBD72, 1 µM L-165,041), FXR (pFA-CMV-hFXR-LBD73, 1 µM GW4064).
Chemistry
General
Unless stated otherwise, reactions were performed in dry, pre-heated Schlenk glassware under argon atmosphere. Chemicals purchased from BLD and Merck were of reagent grade and used without further purification. Dry solvents were purchased from Sigma-Aldrich and stored under argon atmosphere. Other solvents—especially for work-up procedures—were of reagent grade or purified by distillation (iso-hexanes, EtOAc). Reactions were monitored by thin layer chromatography (TLC) on TLC Silica gel 60 F254 aluminum sheets by Merck and visualized under fluorescent light (254 or 366 nm) or by ceric ammonium molybdate staining (5% (NH4)6Mo7O24 × 4 H2O, 0.2% Ce(SO4)2 in 5%iger H2SO4). Reaction and intermediate product control was performed by mass spectrometry on an Advion Interchim MS system with APCI using either ASAP for solutions or an Advion Plate express for TLC. Manual normal phase flash column chromatography was performed under positive N2 pressure using silica 60 M from Macherey-Nagel. Automated normal phase flash column chromatography (CC) was performed on an Interchim Puriflash XS520Plus using corresponding PF-15/30/50SIHP columns. NMR spectroscopy was performed on a Bruker BioSpin AV400 or AV500. Chemical shifts (δ) are reported in ppm relative to tetramethylsilane (TMS) or the residual solvent signal protons. Signal multiplicity is reported as singlet (s), broad singlet (bs), doublet (d), triplet (t), quartet (q), quintet (quin), heptet (hept), multiplet (m), doublet of doublets (dd), triplet of triplets (tt), doublet of triplets (dt), triplet of doublets (td). Coupling constants (J) are given in Hz. Assignments were made by means of two-dimensional experiments (1H-1H-COSY, 1H-13C-HMQC, 1H-13C-HMBC). Quantitative 1H NMR (qHNMR) were acquired according to a method described by Pauli et al.74 with internal calibration. The qHNMR measurements were carried out under conditions allowing complete relaxation to assure the exact determination of peak area ratios. High-resolution mass (HRMS) spectra were recorded on an MALDI LTQ ORBITRAP XL instrument (Thermo Fisher Scientific Inc., Waltham, USA). using a α-cyano-4-hydroxycinnamic acid (HCCA) matrix or on a Thermo Finnigan LTQ FT Ultra FT-ICR (ESI). LCMS analysis was performed on a LCMS 2020 from Shimadzu (Duisburg, Germany). For analytical determination, a Luna 10u C18(2) (250 × 4.6 nm) and for semi-preparative purification, a Luna 10μ C18(2) (250 × 21.20 nm) column from Phenomenex LTD Deutschland (Aschaffenburg, Germany) was used. The system is equipped with a SPD 20 A UV/VIS detector (λ = 240/280 nm) and an ESI-TOF (measuring in the positive- and/or negative-ion mode). Mixtures of MeCN/0.1% aqueous formic acid were used as mobile phase with a flow rate of 0.1 mL/min (Scout Column) or 21 mL/min (semi-preparative) at rt. The following methods were used: Method 1: linear gradient from 50 to 90% MeCN over 10 min, 90% MeCN for 5 min, linear gradient from 90 to 50% MeCN over 1 min, 50% for 2 min; Method 2: 5% MeCN over 2 min, linear gradient from 5% to 90% MeCN over 12 min, 90% MeCN for 6 min, linear gradient from 90% to 5% MeCN over 1 min, 50% for 4 min; Purity of the compounds was determined by integrating the peaks of the UV-chromatogram. All compounds for biological testing had >95% purity according to qHNMR or LCMS analysis.
Synthetic procedures and analytical characterization of the test compounds are described in the Supporting Information.
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Supplementary information
Source data
Acknowledgements
The authors thank Lilia Weizel for technical support and Dr. Kerstin Hiesinger for sharing HK91. This research was funded by the European Union (ERC, NeuRoPROBE, 101040355). Views and opinions expressed are however those of the author(s) only and do not necessarily reflect those of the European Union or the European Research Council. Neither the European Union nor the granting authority can be held responsible for them. This work has received funding from the Innovative Medicines Initiative 2 Joint Undertaking (JU) under grant agreement No. 875510. The JU receives support from the European Union’s Horizon 2020 research and innovation program, EFPIA, Ontario Institute for Cancer Research, Royal Institution for the Advancement of Learning McGill University, Kungliga Tekniska Hoegskolan, and Diamond Light Source Limited. J.H.M.E. and E.P. thank the Deutsche Forschungsgemeinschaft (DFG, SFB1039, TP A07) for financial support.
Author contributions
L.I. and T.H. performed the computations; E.S., K.S., F.F.L., B.H., J.B., and J.P. synthesized the computational designs; E.S., J.H.M.E., and J.A.M. characterized the computational designs; L.I., T.H., J.A.M., J.P., E.P., and D.M. analyzed and interpreted the data; D.M. conceived and supervised the study; L.I. and D.M. prepared the figures and wrote the manuscript with contributions from all authors.
Peer review
Peer review information
Nature Communications thanks Jurgen Bajorath, Giovanni Bottegoni and the other anonymous reviewer(s) for their contribution to the peer review of this work. A peer review file is available.
Funding
Open Access funding enabled and organized by Projekt DEAL.
Data availability
All data generated in this study and supporting the results of this study are provided in the Supplementary Information and Source Data file. Source data are provided with this paper.
Code availability
Code used in this study is available at Zenodo (10.5281/zenodo.12795470)75.
Competing interests
The authors declare no competing interests.
Footnotes
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
The online version contains supplementary material available at 10.1038/s41467-024-52060-8.
References
- 1.Eckel, R. H., Grundy, S. M. & Zimmet, P. Z. The metabolic syndrome. Lancet365, 1415–1428 (2005). 10.1016/S0140-6736(05)66378-7 [DOI] [PubMed] [Google Scholar]
- 2.Grundy, S. M. Drug therapy of the metabolic syndrome: minimizing the emerging crisis in polypharmacy. Nat. Rev. Drug Discov.5, 295–309 (2006). 10.1038/nrd2005 [DOI] [PubMed] [Google Scholar]
- 3.Francque, S. et al. Nonalcoholic steatohepatitis: the role of peroxisome proliferator-activated receptors. Nat. Rev. Gastroenterol. Hepatol.18, 24–39 (2021). 10.1038/s41575-020-00366-5 [DOI] [PubMed] [Google Scholar]
- 4.Levy, M., Kolodziejczyk, A. A., Thaiss, C. A. & Elinav, E. Dysbiosis and the immune system. Nat. Rev. Immunol.17, 219–232 (2017). 10.1038/nri.2017.7 [DOI] [PubMed] [Google Scholar]
- 5.Hotamisligil, G. S. Inflammation, metaflammation and immunometabolic disorders. Nature542, 177–185 (2017). 10.1038/nature21363 [DOI] [PubMed] [Google Scholar]
- 6.Langenberg, C., Hingorani, A. D. & Whitty, C. J. M. Biological and functional multimorbidity—from mechanisms to management. Nat. Med.29, 1649–1657 (2023). 10.1038/s41591-023-02420-6 [DOI] [PubMed] [Google Scholar]
- 7.van Onna, M. & Boonen, A. Challenges in the management of older patients with inflammatory rheumatic diseases. Nat. Rev. Rheumatol.18, 326–334 (2022). 10.1038/s41584-022-00768-6 [DOI] [PubMed] [Google Scholar]
- 8.Proschak, E., Stark, H. & Merk, D. Polypharmacology by design: a medicinal chemist’s perspective on multitargeting compounds. J. Med. Chem.62, 420–444 (2019). 10.1021/acs.jmedchem.8b00760 [DOI] [PubMed] [Google Scholar]
- 9.Peters, J. U. Polypharmacology—foe or friend? J. Med. Chem.56, 8955–8971 (2013). 10.1021/jm400856t [DOI] [PubMed] [Google Scholar]
- 10.Vamathevan, J. et al. Applications of machine learning in drug discovery and development. Nat. Rev. Drug Discov.18, 463–477 (2019). 10.1038/s41573-019-0024-5 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 11.Skinnider, M. A., Stacey, R. G., Wishart, D. S. & Foster, L. J. Chemical language models enable navigation in sparsely populated chemical space. Nat. Mach. Intell.3, 759–770 (2021). 10.1038/s42256-021-00368-1 [DOI] [Google Scholar]
- 12.Grisoni, F. Chemical language models for de novo drug design: challenges and opportunities. Curr. Opin. Struct. Biol.79, 102527 (2023). 10.1016/j.sbi.2023.102527 [DOI] [PubMed] [Google Scholar]
- 13.Chen, H., Engkvist, O., Wang, Y., Olivecrona, M. & Blaschke, T. The rise of deep learning in drug discovery. Drug Discov. Today23, 1241–1250 (2018). 10.1016/j.drudis.2018.01.039 [DOI] [PubMed] [Google Scholar]
- 14.Weininger, D. SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules. J. Chem. Inf. Comput. Sci.28, 31–36 (1988). 10.1021/ci00057a005 [DOI] [Google Scholar]
- 15.Segler, M. H. S., Kogej, T., Tyrchan, C. & Waller, M. P. Generating focussed molecule libraries for drug discovery with recurrent neural networks. arXiv10.48550/arXiv.1701.01329 (2017). [DOI] [PMC free article] [PubMed]
- 16.Segler, M. H. S., Kogej, T., Tyrchan, C. & Waller, M. P. Generating focused molecule libraries for drug discovery with recurrent neural networks. ACS Cent. Sci.4, 120–131 (2018). 10.1021/acscentsci.7b00512 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 17.Gómez-Bombarelli, R. et al. Automatic chemical design using a data-driven continuous representation of molecules. ACS Cent. Sci.4, 268–276 (2018). 10.1021/acscentsci.7b00572 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 18.Gupta, A. et al. Generative recurrent networks for de novo drug design. Mol. Inform. 37, 1700111 (2018). [DOI] [PMC free article] [PubMed]
- 19.Merk, D., Friedrich, L., Grisoni, F. & Schneider, G. De novo design of bioactive small molecules by artificial intelligence. Mol. Inform.37, 1700153 (2018). 10.1002/minf.201700153 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 20.Moret, M., Helmstädter, M., Grisoni, F., Schneider, G. & Merk, D. Beam search for automated design and scoring of novel ROR ligands with machine intelligence**. Angew. Chem. Int. Ed.60, 19477–19482 (2021). [DOI] [PMC free article] [PubMed]
- 21.Moret, M. et al. Leveraging molecular structure and bioactivity with chemical language models for de novo drug design. Nat. Commun.14, 114 (2023). 10.1038/s41467-022-35692-6 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 22.Ballarotto, M. et al. De novo design of Nurr1 agonists via fragment-augmented generative deep learning in low-data regime. J. Med. Chem.66, 8170–8177 (2023). 10.1021/acs.jmedchem.3c00485 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 23.Moret, M., Friedrich, L., Grisoni, F., Merk, D. & Schneider, G. Generative molecular design in low data regimes. Nat. Mach. Intell.2, 171–180 (2020). 10.1038/s42256-020-0160-y [DOI] [Google Scholar]
- 24.Grisoni, F. & Schneider, G. De novo molecular design with chemical language models. In Methods in Molecular Biology (ed. Heifetz, A.) 2390, 207–232 (Humana Press Inc., 2022). [DOI] [PubMed]
- 25.Awale, M., Sirockin, F., Stiefl, N. & Reymond, J. L. Drug analogs from fragment-based long short-term memory generative neural networks. J. Chem. Inf. Model.59, 1347–1356 (2019). 10.1021/acs.jcim.8b00902 [DOI] [PubMed] [Google Scholar]
- 26.Kawai, T. et al. AT1 receptor signaling pathways in the cardiovascular system. Pharmacol. Res.125, 4–13 (2017). 10.1016/j.phrs.2017.05.008 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 27.Sun, L., Cai, J. & Gonzalez, F. J. The role of farnesoid X receptor in metabolic diseases, and gastrointestinal and liver cancer. Nat. Rev. Gastroenterol. Hepatol.18, 335–347 (2021). 10.1038/s41575-020-00404-2 [DOI] [PubMed] [Google Scholar]
- 28.Kimura, I., Ichimura, A., Ohue-Kitano, R. & Igarashi, M. Free fatty acid receptors in health and disease. Physiol. Rev.100, 171–210 (2020). 10.1152/physrev.00041.2018 [DOI] [PubMed] [Google Scholar]
- 29.Montaigne, D., Butruille, L. & Staels, B. PPAR control of metabolism and cardiovascular functions. Nat. Rev. Cardiol.18, 809–823 (2021). 10.1038/s41569-021-00569-6 [DOI] [PubMed] [Google Scholar]
- 30.Saponaro, F., Sestito, S., Runfola, M., Rapposelli, S. & Chiellini, G. Selective thyroid hormone receptor-beta (TRβ) agonists: new perspectives for the treatment of metabolic and neurodegenerative disorders. Front. Med.7, 331 (2020). 10.3389/fmed.2020.00331 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 31.Imig, J. D., Cervenka, L. & Neckar, J. Epoxylipids and soluble epoxide hydrolase in heart diseases. Biochem. Pharmacol.195, 114866 (2022). 10.1016/j.bcp.2021.114866 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 32.Wagner, K. M., McReynolds, C. B., Schmidt, W. K. & Hammock, B. D. Soluble epoxide hydrolase as a therapeutic target for pain, inflammatory and neurodegenerative diseases. Pharmacol. Ther.180, 62–76 (2017). 10.1016/j.pharmthera.2017.06.006 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 33.Cosentino, F. et al. 2019 ESC guidelines on diabetes, pre-diabetes, and cardiovascular diseases developed in collaboration with the EASD. Eur. Heart J.41, 255–323 (2020). 10.1093/eurheartj/ehz486 [DOI] [PubMed] [Google Scholar]
- 34.Morgan, H. L. The generation of a unique machine description for chemical structures—a technique developed at chemical abstracts service. J. Chem. Doc.5, 107–113 (1965). 10.1021/c160017a018 [DOI] [Google Scholar]
- 35.Reutlinger, M. et al. Chemically advanced template search (CATS) for scaffold-hopping and prospective target prediction for ‘orphan’ molecules. Mol. Inform.32, 133–138 (2013). 10.1002/minf.201200141 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 36.Gilson, M. K. et al. BindingDB in 2015: a public database for medicinal chemistry, computational chemistry and systems pharmacology. Nucleic Acids Res.44, D1045–D1053 (2016). 10.1093/nar/gkv1072 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 37.Mendez, D. et al. ChEMBL: towards direct deposition of bioassay data. Nucleic Acids Res.47, D930–D940 (2019). 10.1093/nar/gky1075 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 38.Bickerton, G. R., Paolini, G. V., Besnard, J., Muresan, S. & Hopkins, A. L. Quantifying the chemical beauty of drugs. Nat. Chem.4, 90–98 (2012). 10.1038/nchem.1243 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 39.Keiser, M. J. et al. Relating protein pharmacology by ligand chemistry. Nat. Biotechnol.25, 197–206 (2007). 10.1038/nbt1284 [DOI] [PubMed] [Google Scholar]
- 40.Daina, A., Michielin, O. & Zoete, V. SwissADME: a free web tool to evaluate pharmacokinetics, drug-likeness and medicinal chemistry friendliness of small molecules. Sci. Rep.7, 42717 (2017). 10.1038/srep42717 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 41.Schmidt, J. et al. A dual modulator of farnesoid X receptor and soluble epoxide hydrolase to counter nonalcoholic steatohepatitis. J. Med. Chem.60, 7703–7724 (2017). 10.1021/acs.jmedchem.7b00398 [DOI] [PubMed] [Google Scholar]
- 42.Schierle, S. et al. Design and structural optimization of dual FXR/PPARδ activators. J. Med. Chem.63, 8369–8379 (2020). 10.1021/acs.jmedchem.0c00618 [DOI] [PubMed] [Google Scholar]
- 43.Arifi, S. et al. Targeting the alternative vitamin E metabolite binding site enables noncanonical PPARγ modulation. J. Am. Chem. Soc.145, 14802–14810 (2023). 10.1021/jacs.3c03417 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 44.Blöcher, R. et al. N-benzylbenzamides: a novel merged scaffold for orally available dual soluble epoxide hydrolase/peroxisome proliferator-activated receptor γ modulators. J. Med. Chem.59, 61–81 (2016). 10.1021/acs.jmedchem.5b01239 [DOI] [PubMed] [Google Scholar]
- 45.Raymond, J. W. RASCAL: calculation of graph similarity using maximum common edge subgraphs. Comput. J.45, 631–644 (2002). 10.1093/comjnl/45.6.631 [DOI] [Google Scholar]
- 46.Wang, M. et al. RELATION: a deep generative model for structure-based de novo drug design. J. Med. Chem.65, 9478–9492 (2022). 10.1021/acs.jmedchem.2c00732 [DOI] [PubMed] [Google Scholar]
- 47.Atz, K., Grisoni, F. & Schneider, G. Geometric deep learning on molecular representations. Nat. Mach. Intell.3, 1023–1032 (2021). 10.1038/s42256-021-00418-8 [DOI] [Google Scholar]
- 48.Skalic, M., Jiménez, J., Sabbadin, D. & De Fabritiis, G. Shape-based generative modeling for de novo drug design. J. Chem. Inf. Model.59, 1205–1214 (2019). 10.1021/acs.jcim.8b00706 [DOI] [PubMed] [Google Scholar]
- 49.Blaschke, T. et al. REINVENT 2.0: an AI tool for de novo drug design. J. Chem. Inf. Model.60, 5918–5922 (2020). 10.1021/acs.jcim.0c00915 [DOI] [PubMed] [Google Scholar]
- 50.Atz, K. et al. Prospective de novo drug design with deep interactome learning. Nat. Commun.15, 3408 (2024). 10.1038/s41467-024-47613-w [DOI] [PMC free article] [PubMed] [Google Scholar]
- 51.Molecular Operating Environment (MOE). 2022.02 Chemical Computing Group ULC, 1010 Sherbooke St. West, Suite #910, Montreal, QC, Canada, H3A 2R7. (2022).
- 52.Öster, L., Tapani, S., Xue, Y. & Käck, H. Successful generation of structural information for fragment-based drug discovery. Drug Discov. Today20, 1104–1111 (2015). 10.1016/j.drudis.2015.04.005 [DOI] [PubMed] [Google Scholar]
- 53.Wang, N., Zou, Q., Xu, J., Zhang, J. & Liu, J. Ligand binding and heterodimerization with retinoid X receptor α (RXRα) induce farnesoid X receptor (FXR) conformational changes affecting coactivator binding. J. Biol. Chem.293, 18180–18191 (2018). 10.1074/jbc.RA118.004652 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 54.Ye, L. et al. Thyroid receptor ligands. 1. Agonist ligands selective for the thyroid receptor β1. J. Med. Chem.46, 1580–1588 (2003). 10.1021/jm021080f [DOI] [PubMed] [Google Scholar]
- 55.Zhang, H. et al. Structural basis for ligand recognition and functional selectivity at angiotensin receptor. J. Biol. Chem.290, 29127–29139 (2015). 10.1074/jbc.M115.689000 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 56.Srivastava, A. et al. High-resolution structure of the human GPR40 receptor bound to allosteric agonist TAK-875. Nature513, 124–127 (2014). 10.1038/nature13494 [DOI] [PubMed] [Google Scholar]
- 57.Sander, T., Freyss, J., Von Korff, M. & Rufener, C. DataWarrior: an open-source program for chemistry aware data visualization and analysis. J. Chem. Inf. Model.55, 460–473 (2015). 10.1021/ci500588j [DOI] [PubMed] [Google Scholar]
- 58.Tanaka, D. et al. A practical use of ligand efficiency indices out of the fragment-based approach: ligand efficiency-guided lead identification of soluble epoxide hydrolase inhibitors. J. Med. Chem.54, 851–857 (2011). 10.1021/jm101273e [DOI] [PubMed] [Google Scholar]
- 59.Amano, Y., Yamaguchi, T. & Tanabe, E. Structural insights into binding of inhibitors to soluble epoxide hydrolase gained by fragment screening and X-ray crystallography. Bioorg. Med. Chem.22, 2427–2434 (2014). 10.1016/j.bmc.2014.03.001 [DOI] [PubMed] [Google Scholar]
- 60.Akwabi-Ameyaw, A. et al. Conformationally constrained farnesoid X receptor (FXR) agonists: naphthoic acid-based analogs of GW 4064. Bioorg. Med. Chem. Lett.18, 4339–4343 (2008). 10.1016/j.bmcl.2008.06.073 [DOI] [PubMed] [Google Scholar]
- 61.Flatt, B. et al. Discovery of XL335 (WAY-362450), a highly potent, selective, and orally active agonist of the farnesoid X receptor (FXR). J. Med. Chem.52, 904–907 (2009). 10.1021/jm8014124 [DOI] [PubMed] [Google Scholar]
- 62.Hangeland, J. J. et al. Thyroid receptor ligands. Part 2: thyromimetics with improved selectivity for the thyroid hormone receptor beta. Bioorg. Med. Chem. Lett.14, 3549–3553 (2004). 10.1016/j.bmcl.2004.04.032 [DOI] [PubMed] [Google Scholar]
- 63.Yao, B. et al. Revealing a mutant-induced receptor allosteric mechanism for the thyroid hormone resistance. iScience20, 489–496 (2019). 10.1016/j.isci.2019.10.002 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 64.Batista, F. A. H. et al. Structural insights into human peroxisome proliferator activated receptor delta (PPAR-Delta) selective ligand binding. PLoS ONE7, e33643 (2012). 10.1371/journal.pone.0033643 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 65.Wu, C. C. et al. Structural basis for specific ligation of the peroxisome proliferator-activated receptor. Proc. Natl. Acad. Sci. USA114, E2563–E2570 (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 66.Hernandez-Olmos, V. et al. Discovery of irbesartan derivatives as BLT2 agonists by virtual screening. ACS Med. Chem. Lett.12, 1261–1266 (2021). 10.1021/acsmedchemlett.1c00240 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 67.Ehrler, J. H. M. et al. Compilation and evaluation of a fatty acid mimetics screening library. Biochem. Pharmacol.204, 115191 (2022). 10.1016/j.bcp.2022.115191 [DOI] [PubMed] [Google Scholar]
- 68.Hahn, S. et al. Complementary screening techniques yielded fragments that inhibit the phosphatase activity of soluble epoxide hydrolase. ChemMedChem6, 2146–2149 (2011). 10.1002/cmdc.201100433 [DOI] [PubMed] [Google Scholar]
- 69.Lukin, A. et al. Discovery of polar spirocyclic orally bioavailable urea inhibitors of soluble epoxide hydrolase. Bioorg. Chem.80, 655–667 (2018). 10.1016/j.bioorg.2018.07.014 [DOI] [PubMed] [Google Scholar]
- 70.Wolf, N. M., Morisseau, C., Jones, P. D., Hock, B. & Hammock, B. D. Development of a high-throughput screen for soluble epoxide hydrolase inhibition. Anal. Biochem.355, 71–80 (2006). 10.1016/j.ab.2006.04.045 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 71.Gellrich, L. et al. l-Thyroxin and the nonclassical thyroid hormone TETRAC are potent activators of PPARγ. J. Med. Chem.63, 6727–6740 (2020). 10.1021/acs.jmedchem.9b02150 [DOI] [PubMed] [Google Scholar]
- 72.Rau, O. et al. Carnosic acid and carnosol, phenolic diterpene compounds of the labiate herbs rosemary and sage, are activators of the human peroxisome proliferator-activated receptor gamma. Planta Med.72, 881–887 (2006). 10.1055/s-2006-946680 [DOI] [PubMed] [Google Scholar]
- 73.Schmidt, J. et al. NSAIDs ibuprofen, indometacin, and diclofenac do not interact with farnesoid X receptor. Sci. Rep.5, 14782 (2015). 10.1038/srep14782 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 74.Pauli, G. F. et al. Importance of purity evaluation and the potential of quantitative 1H NMR as a purity assay. J. Med. Chem.57, 9220–9231 (2014). 10.1021/jm500734a [DOI] [PMC free article] [PubMed] [Google Scholar]
- 75.Isigkeit, L. et al. Automated design of multi-target ligands by generative deep learning. Zenodo (2024). [DOI] [PMC free article] [PubMed]
Associated Data
This section collects any data citations, data availability statements, or supplementary materials included in this article.
Supplementary Materials
Data Availability Statement
All data generated in this study and supporting the results of this study are provided in the Supplementary Information and Source Data file. Source data are provided with this paper.
Code used in this study is available at Zenodo (10.5281/zenodo.12795470)75.